mirrors/z3 - Libre-Chip.org

mirror of https://github.com/Z3Prover/z3 synced 2026-06-26 10:28:48 +00:00

Author	SHA1	Message	Date
Margus Veanes	d3fbefbeb9	perf(seq): right-associate concats in derivative; restore union subsumption; fix mk_regex_concat sort bug The symbolic derivative of a left-associated concat (ab)r2 recurses through the entire left spine, exceeding m_max_depth and emitting stuck symbolic re.derivative terms that accumulate across NFA states and cause an exponential blow-up on contains-pattern intersections. Right-associate the concat (via derive::mk_concat, a single linear pass that does not touch the derivative depth counter) before deriving, so the head stays atomic and recursion is shallow. Also restore the L(a)subset L(b) subsumption in mk_union_core (collapses semantically-equal union states in antimirov intersection derivatives), and fix a latent sort bug in mk_regex_concat where the '..* = .+' rewrite passed the element sort instead of the regex sort to mk_full_char, triggering a 're.+' sort-mismatch exception once exercised by the derivative path. Result on QF_S/20250410-matching wildcard-matching-regex set: 02 15s->0.6s, 04 15s->0.7s, 05 timeout->4.2s, 59/62 timeout->0.1-0.3s (vs master). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-26 03:55:21 +03:00
Margus Veanes	77bce353db	Merge remote-tracking branch 'origin/derive-with-ranges' into derive-with-ranges	2026-06-26 00:36:18 +03:00
Margus Veanes	149549b946	wip(seq): clean_leaf cofactors, mk_union_core simplifications, re_is_empty antimirov emptiness check Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-26 00:36:14 +03:00
Nikolaj Bjorner	618a30f5d8	differentiate A and B derivatives Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-25 14:27:41 -07:00
Margus Veanes	3d42d997c6	perf(seq): collapse adjacent full_seq factors in regex concatenation mk_re_concat only merged Sigma* Sigma* when both stars were direct siblings. Once a literal nests the concatenation, e.g. (Sigma* x Sigma) ++ Sigma, the trailing duplicate Sigma* survived because the left operand is a concat that ends with Sigma* rather than a bare full_seq. Redundant adjacent stars then multiply derivative states during bisimulation. Add two grouping-insensitive rules using the existing starts_with_full_seq/ends_with_full_seq helpers: R ++ Sigma* -> R when R ends with Sigma* Sigma* ++ R -> R when R starts with Sigma* Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-25 22:56:51 +03:00
Margus Veanes	2ecdd15e0f	fix(seq): reject non-character regexes in regex_to_range_predicate Guard regex_to_range_predicate so it only collapses regexes whose element sort is a character sort. Previously a regex over a non-char sequence sort (e.g. (re (Seq Int))) could be silently mis-collapsed into bogus [0, max_char] ranges. Add a negative unit test covering re.empty/re.full_char over (Seq Int). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-25 22:42:26 +03:00
Margus Veanes	ee20c9963b	feat(seq::bisim): hoist all ITEs to top in bisim derivative leaves In the bisimulation equivalence path, derivative leaves are now extracted via the new seq_rewriter::brz_derivative_cofactors (derive::derivative_cofactors), which computes the symbolic derivative and enumerates its reachable leaves in fully ITE-hoisted normal form: every if-then-else over the input character (including ones previously buried under a concat or union) is hoisted to the top via decompose_ite, infeasible minterms are pruned, and unions are kept intact as single states. Each leaf is therefore a ground regex free of (:var 0), so its nullability is always decidable. This replaces collect_leaves (which only split top-level ITEs and left buried (:var 0) ITEs inside leaves), the root cause of bisim returning l_undef and falling through to the slow theory solver. Validation on the regex-equivalence corpus (1523 files, -T:5, 8 workers): - vs master: total solved 1394 vs 1378 (+16), soft_timeouts 129 vs 145, 0 soundness disagreements (was 18 -> 5 -> 0). - vs derive: +242 solved (1394 vs 1152), 25.4% faster on commonly-solved files, fixes 18 soundness disagreements, only 6 regressions. - corpus wall time halved (172s vs 332s/349s). - All 91 unit tests pass, including seq_regex_bisim. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-25 08:40:32 +03:00
Margus Veanes	a2b73b0ee6	misc edits of work in progress	2026-06-25 07:54:00 +03:00
Margus Veanes	8fe488721b	Stage 3: collapse boolean combinations of char-class regexes Introduce src/ast/rewriter/regex_range_collapse.{h,cpp}, a translator between the boolean-combination-of-character-class fragment of regexes and the range_predicate value type added in Stage 2. Recognized fragment (translates to range_predicate): re.empty, re.full_char, re.range, re.union, re.intersection, re.diff of operands recursively in the fragment. Range bounds are accepted in three encodings: string constant ("a"), seq.unit of a const char (seq.unit (Char 97)), and length-1 zstring literal. NOT translated: re.complement -- this is sequence-level complement (Sigma* \ L), not character-class complement. Translating it would incorrectly turn re.comp(re.range "a" "z") into the character class [^a-z], which would drop the empty string and all length>=2 strings. Hook the translator into seq_rewriter at mk_re_union0, mk_re_union, mk_re_inter0, mk_re_inter, and mk_re_diff so that boolean combinations of character classes always reduce to a single canonical range-set form. mk_re_complement is intentionally not hooked. Materialization uses the canonical (seq.unit (Char N)) bound form (matching the rest of seq_rewriter) and right-associates the union with operands sorted by expr_id so the result matches the invariant expected by merge_regex_sets. Unit tests in src/test/regex_range_collapse.cpp cover the recognized fragment, the non-translatable cases, and round-trip identity for multi-range predicates. Corpus validation on bench/inputs/regex-equivalence (1523 .smt2): - 0 soundness regressions vs derive baseline. - Resolves 4 previously-soft-timeout files (now solved correctly). - Resolves 1 pre-existing wrong answer (mut_0404: master/derive say unsat, ground-truth annotation and Stage 3 say sat). - Wall-time: -2.2% vs Stage-3 starting point, -1.5% vs derive. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-25 07:51:09 +03:00
Margus Veanes	764a689092	Add seq::range_predicate value type with unit tests Introduce a specialized range-algebra over the unsigned character domain [0, max_char], with canonical sorted-disjoint-non-adjacent representation and linear-time union, intersection, complement, difference, and symmetric difference operations. This is Stage 1 of the derive-with-ranges plan: the value type only, with unit tests covering factories, ordering, hashing, hand-picked instances, and exhaustive de-Morgan / lattice laws over a small domain (verified by enumerating all 64 subsets). Integration with seq::derive's path conditions, the OneStep cache, and the R&psi smart-constructor rewrite are deferred to later stages. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-25 07:50:35 +03:00
Nikolaj Bjorner	9a5089397d	better cofactoring	2026-06-24 15:55:30 -07:00
Nikolaj Bjorner	d77fe0b0cd	enable distribution of union over intersection Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-23 12:56:23 -07:00
Nikolaj Bjorner	f2e6d05c56	disable hoisting ite over union Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-17 16:26:38 -06:00
Nikolaj Bjorner	29489c3bd8	make concatenation right associative Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-17 12:00:08 -06:00
Nikolaj Bjorner	e011aead11	perf Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-17 10:06:43 -06:00
Nikolaj Bjorner	f261c7732b	fix perf Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-17 10:03:52 -06:00
Nikolaj Bjorner	05c394aa6c	bug fixes Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-17 09:39:08 -06:00
Nikolaj Bjorner	26feb16714	fix bugs Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-17 09:14:16 -06:00
Margus Veanes	3d2d793f1d	fix(seq::derive): remove unsound depth limit in hoist_ite The single-ITE branch of hoist_ite was gated by 'm_path_stack.size() < 8'. When the depth limit was reached, hoist_ite returned nullptr and callers fell back to non-path-aware structural rewrites (mk_union0 / mk_xor0). These rewrites simplify e.g. mk_union(empty, X) -> X and return X unchanged, preserving inner ITE branches that were built at an earlier (less constrained) path. Subsequent derivative computation never path-prunes those branches, which can leak unreachable epsilon-leaves into the final t-regex and cause the bisimulation algorithm to report inequivalence for equivalent regexes. Concrete trigger: derivatives of unions/xors with >= 9 components produce 9-deep ITE chains; at depth 8 the inner ITE returns unprocessed, leaving an unreachable epsilon-leaf that bisim mis-interprets as a distinguishing word. Removing the guard restores soundness. The corpus run against regex-equivalence (1523 files) fixes 22 triangulated soundness bugs (mut_0013, mut_0241, mut_0257, mut_0301 among others) with zero regressions. 89/89 unit tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-15 03:18:15 -07:00
Margus Veanes	0b8bb98656	fix(seq::derive): symbolize top-level cache key to avoid concrete-ele poisoning `seq::derive::operator()(ele, r)` looks up `m_top_cache` keyed only by the regex `r`, but on a miss it used to set `m_ele = ele` (a concrete char) before calling `derive_rec(r)`. The resulting ITE-tree contained constant-folded `(= ele c)` conditions, so the "symbolic" derivative stored in the cache was actually specialized to that one ele. Subsequent calls with the same `r` but a different ele hit the stale cached answer and the substitution at the bottom was a no-op (no `v0` left to replace). Simplest victim: (str.in_re "aP" (re.++ (re.* "a") "P")) returned `unsat`. The first call D_'a'(a*P) computed `a*P` and cached it under key `a*P`; the next call D_'P'(a*P) hit that cache entry and returned `a*P` instead of epsilon, so the membership check ended on a non-nullable state. Fix: set `m_ele = v` (the canonical fresh var) so the derivative is genuinely symbolic. Concrete-ele callers go through the existing substitution at the bottom of `operator()`. Adds a regression test in src/test/seq_regex_bisim.cpp checking that D_'a'(a*P) is not nullable while D_'P'(a*P) is. Note: this is independent of the mut_0013 bisim-level unsoundness; that case still fails and is being tracked separately. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-15 02:28:21 -07:00
Margus Veanes	fb6470a1a1	fix seq::derive::intersect_intervals dropping kept intervals on disjoint case The intersect_intervals routine in seq_derive.cpp maintains a path-tracking disjoint union of character intervals. When intersecting the active suffix with a new constraint [lo, hi], it iterated the suffix and, on encountering the first interval disjoint from [lo, hi], reset the output cursor to the end-marker and broke out of the loop. This both threw away the intervals it had already kept and skipped every remaining interval, so e.g. [(0, 96), (98, max)] intersected with [98, 98] became empty instead of [(98, 98)]. Inside derive that silently killed valid branches of symbolic derivatives. For example D(a|b) collapsed to ite(c='a', eps, empty) -- the 'b' branch vanished -- which made the bisimulation procedure conclude bogus equalities such as a* == (a|b). On the regex-equivalence corpus this single bug accounted for ~510 false-unsat results vs master. Fix: drop only the disjoint interval and continue scanning the rest of the suffix. Add a small assertion-based regression test that builds D(a|b), checks both branches survive, and runs bisim on a vs (a|b)*. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-15 01:38:55 -07:00
Nikolaj Bjorner	cf0e43ab38	polish Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-11 14:10:25 -07:00
Nikolaj Bjorner	e21d154778	cleanup, missing case for nullable xor Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 19:57:51 -07:00
Nikolaj Bjorner	21a3a317f2	cleanup Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 19:50:26 -07:00
Nikolaj Bjorner	f48ec128eb	cleanup Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 19:43:38 -07:00
Nikolaj Bjorner	898178fbe5	merge with master Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:54:33 -07:00
Nikolaj Bjorner	c9cd5147be	merge Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:31:01 -07:00
Nikolaj Bjorner	0c2ed444ca	fix build Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:27:48 -07:00
copilot-swe-agent[bot]	1e906ba585	Remove is_nullable_rec from seq_rewriter, delegate to derive::nullable	2026-06-10 15:27:42 -07:00
copilot-swe-agent[bot]	70a9dbfae2	Apply follow-up derive validation fixes	2026-06-10 15:26:46 -07:00
copilot-swe-agent[bot]	bf9707a316	Address PR feedback on derive, nullability, and requested reverts	2026-06-10 15:26:40 -07:00
Nikolaj Bjorner	458878b5e1	cleanup Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:25:06 -07:00
Nikolaj Bjorner	0e29a35da5	updates Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:25:05 -07:00
Nikolaj Bjorner	dbd725bdc0	Refactor merge_union and mk_union_core functions	2026-06-10 15:25:05 -07:00
Nikolaj Bjorner	72575ff962	reuse char extraction from seq_util	2026-06-10 15:25:04 -07:00
Nikolaj Bjorner	867dc175c5	tune and fix derive	2026-06-10 15:25:03 -07:00
Nikolaj Bjorner	61093fadf6	updates to derive	2026-06-10 15:25:03 -07:00
Nikolaj Bjorner	ff8a1034d6	Refactor seq_derive: inline path pruning with ACI normalization Replace simplify_ite_rec post-hoc pass with inline path pruning: - push/pop API with lbool return (l_true=implied, l_undef=pushed, l_false=contradicts) - apply_ite hoists ITE through union/inter/complement with path-aware pruning - Path-aware caching for mk_union, mk_inter, mk_complement - Incremental path expression maintenance for cache keys - Complement always pushes through ITE for same-condition merge - ACI normalization (flatten/sort/deduplicate) for union base case - is_subset subsumption prevents unbounded union growth - Prefix factoring (a·x ∪ a·y = a·(x ∪ y)) for loop derivatives - seq_rewriter passed as reference to derive class - Depth-limited single-ITE hoisting (path_stack.size() < 8) - pred_implies with signed atoms avoids mk_not allocations - extract_char_range properly checks m_ele identity Results: 0 timeouts on regression suite (vs 2 on master). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-10 15:24:57 -07:00
Nikolaj Bjorner	13d0de42bc	tuning	2026-06-10 15:24:06 -07:00
Nikolaj Bjorner	243ebe0660	remove local copies of benchmarks Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:24:05 -07:00
Nikolaj Bjorner	6b218b47dd	Delete benchmarks/instance08315.smt2	2026-06-10 15:24:05 -07:00
Nikolaj Bjorner	416c676040	Delete benchmarks/instance08175.smt2	2026-06-10 15:24:04 -07:00
Nikolaj Bjorner	9456297046	tuning simplification processing Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:24:04 -07:00
Nikolaj Bjorner	f42172e65a	conservative expansions	2026-06-10 15:24:03 -07:00
Nikolaj Bjorner	98a7992a65	handle more cases with intervals	2026-06-10 15:24:02 -07:00
Nikolaj Bjorner	18a0db9d48	cr updates Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:24:01 -07:00
Nikolaj Bjorner	6b862ddf19	intervals Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>	2026-06-10 15:24:01 -07:00
Nikolaj Bjorner	6289b91f17	Add interval-based range simplification for ITE conditions Introduce exclusion intervals alongside the existing path-based condition tracking in simplify_ite_rec. The intervals track which character values are still possible at each point in the ITE tree, enabling simplification of nested range conditions that the per-entry path approach cannot handle. Key additions: - intervals_t type and push_intervals() to maintain live character ranges - eval_range_cond() checks AND-of-char_le conditions against intervals - intersect_intervals/exclude_interval utilities from seq_rewriter pattern - Negated AND handling: ¬(lo<=x ∧ x<=hi) excludes [lo,hi] from intervals The interval check runs before the existing eval_path_cond logic, catching cases like: if(0<=x<=10, t, if(1<=x<=8, t2, e2)) → if(0<=x<=10, t, e2) where the inner range [1,8] is fully contained in the excluded outer range. Fixes remaining regression timeouts on 5728 P2 and 5731 P4. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-10 15:24:00 -07:00
Nikolaj Bjorner	d54d62a07a	Fix regression timeouts via range condition simplification - Simplify trivial range bounds in derive_range: when lo=0, omit the lo<=x condition; when hi=max_char, omit the x<=hi condition. Full charset ranges return epsilon directly. - Add char_le(0,x)=true and char_le(x,max)=true to eval_cond for always-valid bounds. - Add range implication logic to simplify_ite_rec: when path has negated/positive char_le constraints, detect implied or contradicted char_le conditions (e.g., ¬(x<=127) implies 128<=x). - Add is_subset(a, .+) check: non-nullable regexes are subsets of .+ - In update_state_graph, skip recursive exploration of nullable targets to avoid state explosion. These fixes resolve timeouts on 5724 (all problems), 5721 P1, and 5693. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-10 15:23:59 -07:00
Nikolaj Bjorner	88a177f6c5	Fix derivative instability and recursion bugs - Add top-level cache (m_top_cache) to ensure stable AST node identity across repeated derivative calls, preventing state graph divergence - Add get_head_tail helper for derive_to_re with str.is_unit/str.is_concat - Add ITE hoisting in mk_union/mk_inter to keep ITEs at top level - Add De Morgan rule in mk_complement: ~(A∪B) → ~A ∩ ~B - Add ~ε → .+ simplification in mk_complement - Add prefix factoring: a·x ∪ a·y = a·(x∪y) and a·x ∩ a·y = a·(x∩y) - Add r* ∩ .+ = r+ special case in mk_inter - Enhance is_subset with union/intersection distributivity and complement - Remove De Morgan from mk_inter to prevent infinite recursion loop Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-06-10 15:23:59 -07:00
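
The interval bug described in commit fb6470a1a1 can be illustrated with a minimal Python sketch. This is not the code in seq_derive.cpp: the list-of-pairs representation, both function signatures, and the MAX_CHAR value are assumptions made for illustration, and the "buggy" variant only reconstructs the behaviour as the commit message describes it (clear the output and stop at the first disjoint interval).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [lo, hi] of character codes

MAX_CHAR = 0x2FFFF  # assumed upper bound of the character domain


def intersect_intervals_buggy(intervals: List[Interval], lo: int, hi: int) -> List[Interval]:
    """Reconstruction of the described bug: the first disjoint interval
    discards everything kept so far and ends the scan."""
    kept: List[Interval] = []
    for a, b in intervals:
        new_lo, new_hi = max(a, lo), min(b, hi)
        if new_lo > new_hi:
            kept = []  # throws away intervals already kept
            break      # and skips every remaining interval
        kept.append((new_lo, new_hi))
    return kept


def intersect_intervals(intervals: List[Interval], lo: int, hi: int) -> List[Interval]:
    """Fixed behaviour: drop only the disjoint interval and keep scanning."""
    kept: List[Interval] = []
    for a, b in intervals:
        new_lo, new_hi = max(a, lo), min(b, hi)
        if new_lo > new_hi:
            continue  # disjoint from [lo, hi]: skip just this one
        kept.append((new_lo, new_hi))
    return kept


# Example from the commit message: everything except 'a' (97), restricted to 'b' (98).
path = [(0, 96), (98, MAX_CHAR)]
buggy_result = intersect_intervals_buggy(path, 98, 98)
fixed_result = intersect_intervals(path, 98, 98)
```

With the example input from the commit message, the reconstructed buggy variant yields an empty list, while the fixed variant keeps the single interval (98, 98), which is the branch for 'b' that the commit says had vanished from D(a|b).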

22372 commits