z3/src at 772ee91cbf0a27062502739c745163fc6bf33d05 - mirrors/z3

mirror of https://github.com/Z3Prover/z3 synced 2026-07-05 06:46:11 +00:00

Ilana Shapiro 44966e1733 Terminate on Demand and some algorithmic bugfixes in the search tree (#9336)

* first attempt with codex. Codex notes:
  What changed:
  - Each tree node now tracks:
    - active worker count
    - lease epoch
    - cancel epoch
  - get_cube() now hands each worker an explicit lease: (node, epoch, cancel_epoch).
  - try_split() and backtrack() now operate on that lease, and the batch manager releases the worker’s lease under the tree lock before mutating the node.
  - If another worker closes the leased node or subtree, the batch manager cancels only the workers whose current leased nodes are now closed.
  - Workers detect canceled leases after check(), reset their local cancel flag, abandon the stale lease, and continue instead of turning that into a global exception.
  - The “reopen immediately into the open queue” policy is preserved. I did not add a barrier waiting for all workers on a node to finish.
  - Active-worker accounting is now separate from the open/active/closed scheduling status, so reopening a node no longer erases the fact that other workers are still on it.
  I also updated search_tree bookkeeping so:
  - closure bumps node cancel/lease epochs
  - active-node counting uses actual active-worker presence, not just status == active
* fix(parallel-smt): gate split/backtrack by lease epoch
  What it changes:
  - util/search_tree.h
    - bumps node epoch on split
    - threads epoch through should_split(...) and try_split(...)
    - always records effort, but only split/reopen if the lease epoch still matches
  - smt/smt_parallel.cpp
    - requires is_lease_valid(..., lease.epoch) before backtrack(...)
    - passes lease.epoch into m_search_tree.try_split(...)
* clean up code and add some comments
* fix bug about backtracking condition being too strict: The epoch guard should not block backtrack(...) the same way it blocks try_split(...). A stale worker that proves UNSAT for n should still be able to close n, and that closure should then cancel the other workers on n and its subtree.
  I changed smt/smt_parallel.cpp accordingly:
  - try_split(...) still uses epoch to reject stale structural splits
  - backtrack(...) no longer requires is_lease_valid(..., epoch); it only requires that the lease is not already canceled.
  So the intended asymmetry is now restored:
  - stale split: reject
  - stale unsat/backtrack: allow closure, then cancel affected workers
* ablate to no backtracking on stale leases
* revert codex change about exception handling
* remove old code
* ablate backtracking gate
* attempt to fix linux crashes
* try to fix bug about active worker counts/lease accounting. current policy should hold:
  - stale leases: release/decrement
  - canceled leases: do not release/decrement (just ignore, since we have an invariant that canceled leases mean closed nodes that are never revisited)
* delay premature root activation
* fix major semantic bug about threads continually choosing the root if their lease is reset
* fix cancellation to unknown status
* fix very bad bug about all threads needing to start at the root
* ablate active ranking: now nodes are only reopened if they are truly inactive (active worker count is 0)
* fix some bugs about leases
* ablate adding static effort only
* fix some bugs about leases
* don't explode effort for portfolio nodes
* fix: still accumulate per-node effort, but don't over-accumulate on portfolio solves
* restore dynamically scaled effort
* lease cancellation doesn't touch rlimit now, it just sets max conflicts to 0. also fix a VERY BAD BUG about effort never being updated until all leases are done on a node, which meant we never left the root
* cross-thread modification of max conflicts is unsafe, so create an atomic lease canceled variable that's checked in ctx where max conflicts is also checked
* move atomic lease check in the context to the more global get_cancel_flag function
* Fix new SIGSEGV.
  Root cause: get_cancel_flag() is called from within propagation loops (mid-BCP, mid-equality-propagation, mid-atom-propagation). When it returns true there, the solver exits early and leaves the context in an intermediate state: propagation queues partially processed, theory state potentially inconsistent with boolean state.
  For the global cancel (m.limit().cancel()), this is harmless: the worker exits entirely and the context is destroyed, so the intermediate state doesn't matter. For a lease cancel, the context is reused: the worker gets a new cube and calls ctx->check() again on the same context object. Re-entering check() on a context interrupted mid-propagation causes it to access that corrupted intermediate state → SIGSEGV.
  The m_max_conflicts check is the only checkpoint that's safe for re-entry: it only fires post-conflict-resolution, pre-decision, when propagation queues are empty and theory state is consistent.
  Fix: Remove m_lease_canceled from get_cancel_flag(). Keep it only at safe, between-phase checkpoints where the context is in a known-consistent state. The result is two safe checkpoints for m_lease_canceled: after each conflict (post-resolution, queues empty) and before each theory final check (not yet entered the theory). Neither interrupts the solver mid-mutation. The SIGSEGV should be gone, and NIA performance should improve because long theory final checks (where NIA burns most time) are now preemptable before they start.
* fix new inconsistent theory bug: The problem is returning FC_GIVEUP from inside final_check() after some theories have already run final_check_eh() and pushed propagations into the queue. Those pending propagations reference context state that gets invalidated on the next check() call → SIGSEGV.
  The fix: check m_lease_canceled before entering final_check() in bounded_search(), never from inside it. That way the context is always in a clean pre-final-check state when we bail out.
  This is safe: decide() returned false (all variables assigned, no pending propagations), theories haven't been touched yet, and the context is in a fully consistent state. For NIA, this is still a meaningful win: we avoid entering expensive arithmetic final checks entirely when the lease is already canceled.
* remove second lease cancel check in smt_context, not sure it's safe. only check where we do the max conflicts check
* check epoch match in release_lease_unlocked
* restore exception handling logic to master branch
* restore reslimit cancels since the bug appears to be latent

---------

Co-authored-by: Ilana Shapiro <ilanashapiro@Ilanas-MacBook-Pro.local>
Co-authored-by: Ilana Shapiro <ilanashapiro@Mac.lan1>
2026-04-19 16:21:41 +02:00
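The lease rules in the commit message above can be shown in a small, self-contained C++ sketch. Every name in it (`node`, `worker`, `batch_manager`, `acquire`, `abandon`, and the field names) is hypothetical. `try_split` and `backtrack` are borrowed from the message, but their signatures here are not Z3's, and this is not the code in util/search_tree.h or smt/smt_parallel.cpp. The sketch is also simplified: it creates no child nodes, tracks no effort, and cancels only the workers on the closed node rather than walking its subtree. It illustrates four rules from the message:

- A stale lease cannot split a node but can still close it.
- Stale leases are released and decrement the worker count.
- Canceled leases are not released.
- A node is reopened only when its active-worker count reaches zero.

```cpp
// Hypothetical sketch of lease bookkeeping for a parallel search tree.
// None of these names are Z3's real API; child creation and effort
// tracking are omitted.
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

enum class node_status { open, active, closed };

struct node {
    node_status status = node_status::open;
    unsigned active_workers = 0;  // kept separate from status
    std::uint64_t epoch = 0;      // bumped on split and on closure
    bool is_split = false;
};

struct worker {
    node* leased = nullptr;        // current lease, if any
    std::uint64_t lease_epoch = 0; // node epoch when the lease was taken
    // Written by other threads, hence atomic; the solver is meant to poll
    // it only at a safe checkpoint, not from inside propagation.
    std::atomic<bool> lease_canceled{false};
};

class batch_manager {
    std::mutex m_mutex;
    std::vector<worker*> m_workers;

    // Stale leases are released (counter decremented); canceled leases are
    // not, because their node is closed and never revisited.
    void release_unlocked(worker& w) {
        node* n = w.leased;
        w.leased = nullptr;
        if (n == nullptr || w.lease_canceled.load()) return;
        if (n->active_workers > 0) --n->active_workers;
        // Reopen only when no worker is left on the node.
        if (n->active_workers == 0 && n->status == node_status::active)
            n->status = node_status::open;
    }

public:
    void add_worker(worker& w) {
        std::lock_guard<std::mutex> g(m_mutex);
        m_workers.push_back(&w);
    }

    void acquire(worker& w, node& n) {
        std::lock_guard<std::mutex> g(m_mutex);
        w.lease_canceled.store(false);
        w.leased = &n;
        w.lease_epoch = n.epoch;
        ++n.active_workers;
        n.status = node_status::active;
    }

    // Structural split: rejected if the lease is stale (epoch has moved on).
    bool try_split(worker& w) {
        std::lock_guard<std::mutex> g(m_mutex);
        node* n = w.leased;
        if (n == nullptr || w.lease_canceled.load()) return false;
        bool fresh = n->epoch == w.lease_epoch && n->status != node_status::closed;
        release_unlocked(w);  // release happens whether or not we split
        if (!fresh) return false;
        n->is_split = true;
        ++n->epoch;
        return true;
    }

    // Closure after UNSAT: allowed even on a stale lease, then cancels the
    // other workers on the node (a real tree would also walk the subtree).
    bool backtrack(worker& w) {
        std::lock_guard<std::mutex> g(m_mutex);
        node* n = w.leased;
        if (n == nullptr || w.lease_canceled.load()) return false;
        release_unlocked(w);
        n->status = node_status::closed;
        ++n->epoch;
        for (worker* o : m_workers)
            if (o->leased == n) o->lease_canceled.store(true);
        return true;
    }

    // A worker that saw its flag drops the lease without touching counters.
    void abandon(worker& w) {
        std::lock_guard<std::mutex> g(m_mutex);
        w.leased = nullptr;
        w.lease_canceled.store(false);
    }
};
```

The cancel flag is atomic because, per the message, it is set from another thread. The message also explains where it is checked: only at the same point as the max-conflicts check, not from get_cancel_flag() inside propagation. Checking it during propagation left a reused context in an inconsistent state.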
ackermannization	Fix off-by-one vulnerabilities: use range-based for on goals; cache loop bound	2026-02-19 22:37:22 +00:00
api	Go/OCaml API gaps: substitution, AST introspection, Spacer, Goal completion (#9277)	2026-04-12 14:00:03 -07:00
ast	add flag to control non-linear substitutions: smt.solve_eqs.linear is by default false, setting it to true restricts solutions to substitutions to only use linear terms. This can have an effect on cross-multiplication of nested substitutions	2026-04-12 09:41:46 -07:00
cmd_context	Remove redundant default constructors when they're the only constructor (#8461)	2026-02-18 20:58:01 -08:00
math	fix(anum): give anum move semantics to prevent sort-triggered double-free (#9320)	2026-04-19 15:55:52 +02:00
model	Remove redundant default constructors when they're the only constructor (#8461)	2026-02-18 20:58:01 -08:00
muz	Refactor sls_euf_plugin.cpp validate_model and add SASSERT in udoc_relation.cpp	2026-03-09 16:57:59 +00:00
nlsat	Fix apply_permutation to take perm by const reference	2026-03-30 04:57:08 -10:00
opt	fix box mode: reset bounds before each objective	2026-03-19 17:07:21 -10:00
params	add flag to control non-linear substitutions: smt.solve_eqs.linear is by default false, setting it to true restricts solutions to substitutions to only use linear terms. This can have an effect on cross-multiplication of nested substitutions	2026-04-12 09:41:46 -07:00
parsers	Revert "Refactor find_psort_decl() to return std::optional<psort_decl*> (#8339 )"	2026-02-18 20:57:56 -08:00
qe	Simplify `extract_var_bound` via operator normalization (#9062)	2026-03-22 16:01:12 -07:00
sat	Fix assertion violation in q_mbi diagnostic output	2026-02-28 11:15:28 -10:00
shell	Standardize for-loop increments to prefix form (++i) (#8199)	2026-02-18 20:57:29 -08:00
smt	Terminate on Demand and some algorithmic bugfixes in the search tree (#9336)	2026-04-19 16:21:41 +02:00
solver	Remove copies (#8583)	2026-02-18 21:02:22 -08:00
tactic	Merge pull request #8955 from Z3Prover/copilot/convert-injectivity-to-simplifier	2026-03-12 17:07:19 -07:00
test	Fixes for lar_term== operator (#9284)	2026-04-12 14:31:18 -07:00
util	Terminate on Demand and some algorithmic bugfixes in the search tree (#9336)	2026-04-19 16:21:41 +02:00
CMakeLists.txt	git bindings v1.0	2026-02-18 21:02:25 -08:00