# Parallel project notes
We track notes for updates to
[smt/smt_parallel.cpp](https://github.com/Z3Prover/z3/blob/master/src/smt/smt_parallel.cpp)
and possibly
[solver/parallel_tactic.cpp](https://github.com/Z3Prover/z3/blob/master/src/solver/parallel_tactic.cpp).

## Variable selection heuristics

* Lookahead solvers:
  * The lookahead solver in the smt directory performs a simplistic lookahead search using unit propagation.
  * The lookahead solver in the sat directory is a custom lookahead solver based on MARCH. March is described in the Handbook of Satisfiability and Knuth Volume 4.
  * Both rely on a cost model where the most useful variable to branch on is the one that _reduces_ the remaining clauses the most through unit propagation. In other words, if a literal _p_ is set to true, and $\neg p$ occurs in the clause $\neg p \vee q \vee r$, then the clause shrinks from size 3 to size 2 (because $\neg p$ is false after propagating _p_).
  * Selected references: Handbook of Satisfiability, Knuth Volume 4, Marijn's March solver on GitHub, [implementation of march in z3](https://github.com/Z3Prover/z3/blob/master/src/sat/sat_lookahead.cpp)
* VSIDS:
  * As referenced in Matteo's and Antti's solvers.
  * Variable activity is a proxy for how useful it is to case split on a variable during search. Variables with higher VSIDS activity are split first.
  * VSIDS is updated dynamically during search. It was introduced in the Chaff paper by Moskewicz, Madigan, Zhao, Zhang, and Malik at DAC 2001. A good overview is in Armin's tutorial slides (also in my overview of SMT).
  * VSIDS does not keep track of variable phases (whether the variable was set to true or false).
  * Selected references: [DAC 2001](https://www.princeton.edu/~chaff/publication/DAC2001v56.pdf) and [Biere Tutorial, slide 64 on Variable Scoring Schemes](https://alexeyignatiev.github.io/ssa-school-2019/slides/ab-satsmtar19-slides.pdf)
* Proof prefix:
  * Collect the literals that occur in learned clauses and count their occurrences by polarity. This is tracked as a weighted score.
  * The weight function can be formulated to take clause sizes into account.
  * The score assignment may also decay, similar to VSIDS.
  * We could also use a doubly linked list of literals used in conflicts and reinsert literals into the front of the list when they are used. This would be a "variable move to front" (VMTF) variant.
  * Selected references: [Battleman et al](https://www.cs.cmu.edu/~mheule/publications/proofix-SAT25.pdf). A sketch of this scoring scheme follows the list.
* From local search:
  * Local search solvers can also be used to assign variable branch priorities.
  * We are not going to run a local search solver directly in the mix up front, but let us consider this heuristic for completeness.
  * The heuristic is documented in Biere and Cai's journal paper on integrating local search with CDCL.
  * Roughly, it considers clauses that move from the UNSAT set to the SAT set of clauses and keeps track of the literals involved.
  * Selected references: [Cai et al](https://www.jair.org/index.php/jair/article/download/13666/26833/)
* Assignment trails:
  * We could also consider the assignments to variables during search.
  * Variables that are always assigned the same truth value could be considered safe to assign that truth value.
  * The cubes resulting from such variables might be a direction towards finding satisfying solutions.
  * Selected references: [Alex and Vadim](https://link.springer.com/chapter/10.1007/978-3-319-94144-8_7) and most recently [Robin et al](https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SAT.2024.9).
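
The following is a minimal sketch of the proof-prefix scoring scheme described above, with clause-size weighting and VSIDS-style decay. The types and names (`lit`, `proof_prefix_score`) are illustrative, not z3's actual interfaces:

```cpp
#include <unordered_map>
#include <vector>

// Illustrative literal: a variable index plus a sign.
struct lit { unsigned var; bool sign; };

// Tracks polarity-split occurrence scores of literals in learned clauses.
// Each learned clause C bumps its literals by inc/|C| (smaller clauses count
// more), and bumps grow geometrically so older scores decay, as in VSIDS.
class proof_prefix_score {
    std::unordered_map<unsigned, double> m_pos, m_neg;
    double m_inc = 1.0;     // current bump increment
    double m_decay = 0.95;  // effective decay per learned clause

    static double get(std::unordered_map<unsigned, double> const& m, unsigned v) {
        auto it = m.find(v);
        return it == m.end() ? 0.0 : it->second;
    }

public:
    void on_learned_clause(std::vector<lit> const& clause) {
        double w = m_inc / clause.size();
        for (lit l : clause)
            (l.sign ? m_neg : m_pos)[l.var] += w;
        m_inc /= m_decay;   // growing increments == decaying old scores
    }

    // Ranking score for choosing split variables.
    double score(unsigned v) const { return get(m_pos, v) + get(m_neg, v); }

    // Preferred polarity: the phase that occurred more often in conflicts.
    bool preferred_phase(unsigned v) const { return get(m_pos, v) >= get(m_neg, v); }
};
```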
## Algorithms

This section considers various possible algorithms.
In the following, $F$ refers to the original goal and $T$ is the number of CPU cores or CPU threads.

### Base algorithm

The existing algorithm in <b>smt_parallel</b> is as follows (a schematic sketch of steps 3 and 4 follows the list):

1. Run a solver on $F$ with a bounded number of conflicts.
2. If the result is SAT/UNSAT, or UNKNOWN due to an interrupt or timeout, return. If the maximal number of conflicts was reached, continue.
3. Spawn $T$ solvers on $F$ with a bounded number of conflicts; wait until a thread returns UNSAT/SAT or all threads have reached the maximal number of conflicts.
4. Perform a similar check as in 2.
5. Share unit literals learned by each thread.
6. Compute unit cubes for each of the $T$ threads.
7. Spawn $T$ solvers with $F \wedge \ell$, where $\ell$ is a unit literal determined by a lookahead function in each thread.
8. Perform a similar check as in 2. But note that a thread can be UNSAT because the unit cube $\ell$ contradicted $F$. In this case, learn the unit literal $\neg \ell$.
9. Share unit literals learned by each thread, increase the maximal number of conflicts, and go to 3.
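
A compilable sketch of the barrier-style round in steps 3 and 4, under the assumption that each thread runs a bounded solver callback; `result` and `bounded_solver` are illustrative stand-ins, not the actual smt_parallel interfaces:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

enum class result { sat, unsat, undef };

// Illustrative bounded solver: returns undef when the conflict budget runs out.
using bounded_solver = std::function<result(unsigned max_conflicts)>;

// Steps 3-4: run T bounded solvers and record the first definite answer.
// All threads are joined, matching the synchronization barrier discipline.
result spawn_round(std::vector<bounded_solver> const& solvers, unsigned max_conflicts) {
    std::atomic<int> answer{-1};
    std::vector<std::thread> threads;
    for (auto const& s : solvers)
        threads.emplace_back([&answer, s, max_conflicts]() {
            result r = s(max_conflicts);
            if (r != result::undef) {
                int expected = -1;
                answer.compare_exchange_strong(expected, static_cast<int>(r));
            }
        });
    for (auto& t : threads)
        t.join();
    int a = answer.load();
    return a == -1 ? result::undef : static_cast<result>(a);
}
```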
### Algorithm Variants

* Instead of using lookahead solving to find unit cubes, use the proof-prefix based scoring function.
* Instead of using independent unit cubes, perform a systematic (where systematic can mean many things) cube and conquer strategy.
* Spawn some threads to work in "SAT" mode, tuned to find models instead of short resolution proofs.
* Change the synchronization barrier discipline.
* [Future] Include in-processing.

### Cube and Conquer strategy

We could maintain a global decomposition of the search space by maintaining a list of _cubes_.
Initially, the list of cubes has just one element, the cube with no literals: $[ [] ]$.
By using a list of cubes instead of a _set_ of cubes we can refer to an ordering.
For example, cubes can be ordered by a suffix traversal of the _cube tree_ (the tree formed by case splitting on the first literal: children of the _true_ branch are the cubes where the first literal is true, children of the _false_ branch are the cubes where the first literal is false). A sketch of this enumeration follows.
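
A small sketch of this ordering, under the assumption that the split variables are fixed up front: bit $i$ of a counter fixes the polarity of the $i$-th split variable, so counting enumerates the leaves of the cube tree in order. The `cube` encoding (signed variable indices) is illustrative:

```cpp
#include <cstdint>
#include <vector>

// Illustrative cube encoding: +v means variable v true, -v means v false.
using cube = std::vector<int>;

// Enumerate the 2^k leaf cubes of the cube tree over split variables
// vars[0..k-1]; bit i of the counter selects the polarity of vars[i].
std::vector<cube> leaf_cubes(std::vector<int> const& vars) {
    std::vector<cube> cubes;
    for (uint64_t n = 0; n < (uint64_t(1) << vars.size()); ++n) {
        cube c;
        for (size_t i = 0; i < vars.size(); ++i)
            c.push_back(((n >> i) & 1) ? -vars[i] : vars[i]);
        cubes.push_back(c);
    }
    return cubes;
}
```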
The main question is going to be how the cube decomposition is created.
#### Static cubing

We can aim for a static cubing strategy that uses a few initial (concurrent) probes to find cube literals.
This strategy would be a parallel implementation of the proof-prefix approach. The computed cubes are inserted into the list of cubes, and the list is consumed by a second round.

#### Growing cubes on demand

Based on experiences with cubing so far, there is high variance in how easy cubes are to solve.
Some cubes will be harder than others to solve. For hard cubes, it is tempting to develop a recursive cubing strategy. Ideally, a recursive cubing strategy is symmetric to top-level cubing.

* The solver would have to identify hard cubes vs. easy cubes.
* It would have to know when to stop working on a hard cube and replace it in the list of cubes by a new list of sub-cubes (a sketch of this step follows the list).
* Ideally, we don't need any static cubing, and cubing is grown on demand while all threads are utilized.
* If we spawn $T$ threads to initially work with empty cubes, we could extract up to $T$ independent cube literals by examining the proof prefixes of their traces. This can form the basis for the first up to $2^T$ cubes.
* After a round of solving with each thread churning on some cubes, we may obtain more proof prefixes from _hard_ cubes. It is not obvious that we want to share cubes from different proof prefixes at this point. But a starting point is to split a hard cube into two by using the proof prefix from attempting to solve it.
* Suppose we take the proof-prefix sampling algorithm at heart: it says to start with some initial cube prefix and then sample for other cube literals. If we translate it to the case where multiple cubes are being processed in parallel, then an analogy is to share candidates for new cube literals among cubes that are close to each other. For example, if thread $t_1$ processes cube $a, b, c$ and $t_2$ processes $a, b, \neg c$, they are close: they are only separated by Hamming distance 1. If $t_1$ finds cube literal $d$ and $t_2$ finds cube literal $e$, we could consider the cubes $a, b, c, d, e$, $a, b, c, d, \neg e$, $\ldots$, $a, b, \neg c, \neg d, \neg e$.
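
A sketch of the basic refinement step from the list above: replacing a hard cube in the list by its two children, split on a literal $d$ taken from the proof prefix of the failed attempt. The `cube` encoding is the same illustrative one as above:

```cpp
#include <list>
#include <vector>

using cube = std::vector<int>;  // +v / -v encode the polarity of variable v

// Replace *hard by the two sub-cubes hard+[d] and hard+[-d], keeping them
// adjacent in the list so the cube-tree ordering is preserved.
void split_hard_cube(std::list<cube>& cubes, std::list<cube>::iterator hard, int d) {
    cube pos = *hard, neg = *hard;
    pos.push_back(d);
    neg.push_back(-d);
    auto it = cubes.erase(hard);
    it = cubes.insert(it, neg);   // insert the negative child
    cubes.insert(it, pos);        // the positive child goes first
}
```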
#### Representing cubes implicitly

We can represent a list of cubes by using intervals, representing only the start and end points of each interval.
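
One concrete way to realize this, reusing the counter encoding from the cube-tree sketch above: a contiguous block of cubes is just an interval $[lo, hi)$ of counter values, and individual cubes are decoded on demand:

```cpp
#include <cstdint>
#include <vector>

// An interval [lo, hi) of cube-counter values; only the endpoints are stored.
struct cube_interval { uint64_t lo, hi; };

// Decode counter value n into an explicit cube over split variables vars:
// bit i of n selects the polarity of vars[i] (same encoding as above).
std::vector<int> decode_cube(uint64_t n, std::vector<int> const& vars) {
    std::vector<int> c;
    for (size_t i = 0; i < vars.size(); ++i)
        c.push_back(((n >> i) & 1) ? -vars[i] : vars[i]);
    return c;
}
```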
#### Batching
Threads can work on more than one cube in a batch.
### Synchronization
* The first thread to time out or finish could kill the other threads instead of joining on all threads to finish.
* Instead of synchronization barriers, have threads continue concurrently without terminating; they synchronize on signals and new units. This is trickier to implement, but in some guises it is accomplished in [sat/sat_parallel.cpp](https://github.com/Z3Prover/z3/blob/master/src/sat/sat_parallel.cpp).
## Parameter tuning

The idea is to have parallel threads try out different parameter settings and search the parameter space for an optimal parameter setting.

Let us assume that there is a set of tunable parameters $P$, comprising named parameters with initial values:
$P = \{ (p_1, v_1), \ldots, (p_n, v_n) \}$.
With each parameter, associate a set of mutation functions, such as increment ($+=$), decrement ($-=$), or scaling by a non-negative multiplier ($*=$, where the multiplier can be less than 1).
We initialize a search space of parameter settings from parameters, values, and mutation functions with assigned reward values. A reward value is incremented if a parameter mutation step results in an improvement and decremented if a mutation step degrades performance:
$P = \{ (p_1, v_1, \{ (r_{11}, m_{11}), \ldots, (r_{1k_1}, m_{1k_1}) \}), \ldots, (p_n, v_n, \{ (r_{n1}, m_{n1}), \ldots, (r_{nk_n}, m_{nk_n})\}) \}$.
The initial reward values are fixed (to 1) and the initial parameter values are the defaults. A sketch of this data structure follows the list below.

* The batch manager maintains a set of candidate parameter settings $CP = \{ (P_1, r_1), \ldots, (P_n, r_n) \}$.
* A worker thread picks up a parameter setting $P_i$ from $CP$ via the batch manager.
* It picks one or more parameters within $P_i$ whose mutation functions have non-zero rewards and applies a mutation.
* It then runs with a batch of cubes.
* It measures the reward for the new parameter setting based on the number of cubes, cube depth, number of timeouts, and completions with their number of conflicts.
* If the new reward is an improvement over $(P_i, r_i)$, it inserts the new parameter setting $(P_i', r_i')$ into the batch manager.
* The batch manager discards the worst parameter settings, keeping only the top $K$ ($K = 5$).
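
A minimal sketch of the data structure described above; all names are illustrative, not an existing z3 interface:

```cpp
#include <functional>
#include <string>
#include <vector>

// One mutation step m_ij with its reward r_ij; rewards start at 1 and are
// bumped on improvement, decremented on degradation.
struct mutation {
    double reward = 1.0;
    std::function<double(double)> apply;  // e.g. increment, decrement, scale
};

// A tunable parameter (p_i, v_i) with its mutation set.
struct tunable_param {
    std::string name;
    double value;  // initialized to the z3 default
    std::vector<mutation> mutations;
};

// A candidate setting P_i with its reward r_i; the batch manager keeps the
// top K of these.
struct param_setting {
    std::vector<tunable_param> params;
    double reward = 0.0;
};
```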
When picking among mutation steps with reward values, use a weighted sampling algorithm (sketched below).
Weighted sampling works as follows: you are given a set of items with weights $(i_1, w_1), \ldots, (i_k, w_k)$.
Let $w = \sum_j w_j$ and pick a random number $w_0$ in the range $0 \ldots w$.
Then pick the item $i_n$ such that $n$ is the smallest index with $\sum_{j = 1}^n w_j \geq w_0$.
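
A direct implementation of the weighted sampling step, with a guard for floating-point round-off:

```cpp
#include <numeric>
#include <random>
#include <vector>

// Pick index n with probability proportional to w[n]: draw w0 uniformly from
// [0, sum(w)) and return the smallest n with w[0] + ... + w[n] >= w0.
size_t weighted_sample(std::vector<double> const& w, std::mt19937& rng) {
    double total = std::accumulate(w.begin(), w.end(), 0.0);
    std::uniform_real_distribution<double> dist(0.0, total);
    double w0 = dist(rng), sum = 0.0;
    for (size_t n = 0; n < w.size(); ++n) {
        sum += w[n];
        if (sum >= w0)
            return n;
    }
    return w.size() - 1;  // only reachable through round-off
}
```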
SMT parameters that could be tuned:
<pre>
arith.bprop_on_pivoted_rows (bool) (default: true)
arith.branch_cut_ratio (unsigned int) (default: 2)
arith.eager_eq_axioms (bool) (default: true)
arith.enable_hnf (bool) (default: true)
arith.greatest_error_pivot (bool) (default: false)
arith.int_eq_branch (bool) (default: false)
arith.min (bool) (default: false)
arith.nl.branching (bool) (default: true)
arith.nl.cross_nested (bool) (default: true)
arith.nl.delay (unsigned int) (default: 10)
arith.nl.expensive_patching (bool) (default: false)
arith.nl.expp (bool) (default: false)
arith.nl.gr_q (unsigned int) (default: 10)
arith.nl.grobner (bool) (default: true)
arith.nl.grobner_cnfl_to_report (unsigned int) (default: 1)
arith.nl.grobner_eqs_growth (unsigned int) (default: 10)
arith.nl.grobner_expr_degree_growth (unsigned int) (default: 2)
arith.nl.grobner_expr_size_growth (unsigned int) (default: 2)
arith.nl.grobner_frequency (unsigned int) (default: 4)
arith.nl.grobner_max_simplified (unsigned int) (default: 10000)
arith.nl.grobner_row_length_limit (unsigned int) (default: 10)
arith.nl.grobner_subs_fixed (unsigned int) (default: 1)
arith.nl.horner (bool) (default: true)
arith.nl.horner_frequency (unsigned int) (default: 4)
arith.nl.horner_row_length_limit (unsigned int) (default: 10)
arith.nl.horner_subs_fixed (unsigned int) (default: 2)
arith.nl.nra (bool) (default: true)
arith.nl.optimize_bounds (bool) (default: true)
arith.nl.order (bool) (default: true)
arith.nl.propagate_linear_monomials (bool) (default: true)
arith.nl.rounds (unsigned int) (default: 1024)
arith.nl.tangents (bool) (default: true)
arith.propagate_eqs (bool) (default: true)
arith.propagation_mode (unsigned int) (default: 1)
arith.random_initial_value (bool) (default: false)
arith.rep_freq (unsigned int) (default: 0)
arith.simplex_strategy (unsigned int) (default: 0)
dack (unsigned int) (default: 1)
dack.eq (bool) (default: false)
dack.factor (double) (default: 0.1)
dack.gc (unsigned int) (default: 2000)
dack.gc_inv_decay (double) (default: 0.8)
dack.threshold (unsigned int) (default: 10)
delay_units (bool) (default: false)
delay_units_threshold (unsigned int) (default: 32)
dt_lazy_splits (unsigned int) (default: 1)
lemma_gc_strategy (unsigned int) (default: 0)
phase_caching_off (unsigned int) (default: 100)
phase_caching_on (unsigned int) (default: 400)
phase_selection (unsigned int) (default: 3)
qi.eager_threshold (double) (default: 10.0)
qi.lazy_threshold (double) (default: 20.0)
qi.quick_checker (unsigned int) (default: 0)
relevancy (unsigned int) (default: 2)
restart_factor (double) (default: 1.1)
restart_strategy (unsigned int) (default: 1)
seq.max_unfolding (unsigned int) (default: 1000000000)
seq.min_unfolding (unsigned int) (default: 1)
seq.split_w_len (bool) (default: true)
</pre>
# Benchmark setup
## Prepare benchmark data
Data

<pre>
QF_LIA - heavily skewed for selected benchmarks
QF_NIA_small

Others we should try:

Certora
QF_ABV
QF_UFBV
QF_AUFBV (or whatever it is called)
QF_UFLIA (might be too small)

push more instances from smtlib.org benchmarks into z3-poly-testing/inputs or upload tgz files directly in setup.
</pre>
## Naming conventions and basic parameters

The first rounds created a baseline where all features were turned off; variants were then created by turning one feature on at a time.
The data point where all but one feature is turned off can tell us whether a feature has a chance of being useful in isolation.
The following is a sample naming scheme for the associated settings.

<pre>
threads-1
-T:30 smt.threads=1 tactic.default_tactic=smt

threads-4-none-unbounded
-T:30 smt.threads=4 tactic.default_tactic=smt smt_parallel.never_cube=true smt_parallel.share_conflicts=false smt_parallel.share_units=false smt.threads.max_conflicts=4294967295

threads-4-none
-T:30 smt.threads=4 tactic.default_tactic=smt smt_parallel.never_cube=true smt_parallel.share_conflicts=false smt_parallel.share_units=false

threads-4-shareunits
-T:30 smt.threads=4 tactic.default_tactic=smt smt_parallel.never_cube=true smt_parallel.share_conflicts=false smt_parallel.share_units=true

threads-4-cube-frugal
-T:30 smt.threads=4 tactic.default_tactic=smt smt_parallel.never_cube=false smt_parallel.frugal_cube_only=true smt_parallel.share_conflicts=false smt_parallel.share_units=false

threads-4-cube-frugal-shareconflicts
-T:30 smt.threads=4 tactic.default_tactic=smt smt_parallel.never_cube=false smt_parallel.frugal_cube_only=true smt_parallel.share_conflicts=true smt_parallel.share_units=false

threads-4-shareunits-nonrelevant
-T:30 smt.threads=4 tactic.default_tactic=smt smt_parallel.never_cube=true smt_parallel.share_conflicts=true smt_parallel.share_units=false smt_parallel.relevant_units_only=false

threads-4-cube
-T:30 smt.threads=4 tactic.default_tactic=smt smt_parallel.never_cube=false smt_parallel.frugal_cube_only=false smt_parallel.share_conflicts=false smt_parallel.share_units=false

threads-4-cube-shareconflicts
-T:30 smt.threads=4 tactic.default_tactic=smt smt_parallel.never_cube=false smt_parallel.frugal_cube_only=false smt_parallel.share_conflicts=true smt_parallel.share_units=false
</pre>
Ideas for other knobs that can be added:

<ul>
<li> Only cube on literals that exist in the initial formula. Don't cube on literals created during search (such as by theory lemmas).
<li> Only share units for literals that exist in the initial formula.
</ul>

This requires collecting sub-expressions from the solver state at initialization time. It can be programmed by adding an expr_mark structure and marking all expressions used in bool_vars after initialization as initial. Then only initial bool_vars are shared as units.
A simpler approach is just to remember the initial number of bool_vars: fresh atoms are allocated with higher ordinals anyway. A sketch of this filter follows.
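
A sketch of the simpler approach, with illustrative names (`unit_share_filter`, `may_share`); the real hook would sit where smt_parallel exchanges units:

```cpp
// Record how many Boolean variables existed right after initialization;
// since fresh atoms get higher ordinals, a unit literal is "initial" iff
// its variable index is below the recorded bound.
class unit_share_filter {
    unsigned m_initial_bool_vars = 0;
public:
    void set_initialized(unsigned num_bool_vars) { m_initial_bool_vars = num_bool_vars; }
    bool may_share(unsigned bool_var) const { return bool_var < m_initial_bool_vars; }
};
```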