mirror of https://github.com/Z3Prover/z3 synced 2026-06-26 02:20:58 +00:00
Commit graph

22366 commits

Author SHA1 Message Date
Margus Veanes
ee20c9963b feat(seq::bisim): hoist all ITEs to top in bisim derivative leaves
In the bisimulation equivalence path, derivative leaves are now
extracted via the new seq_rewriter::brz_derivative_cofactors
(derive::derivative_cofactors), which computes the symbolic
derivative and enumerates its reachable leaves in fully ITE-hoisted
normal form: every if-then-else over the input character (including
ones previously buried under a concat or union) is hoisted to the top
via decompose_ite, infeasible minterms are pruned, and unions are kept
intact as single states. Each leaf is therefore a ground regex free of
(:var 0), so its nullability is always decidable.

This replaces collect_leaves (which only split top-level ITEs and left
buried (:var 0) ITEs inside leaves), the root cause of bisim returning
l_undef and falling through to the slow theory solver.
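
The hoisting step described above can be sketched in Python on a toy term language. This is only an illustration under stated assumptions, not Z3's decompose_ite: terms are nested tuples, each ITE condition is a single character compared for equality with the input character, and the names `find_cond`, `assume`, and `cofactors` are hypothetical.

```python
# Toy term language (illustrative only, not Z3's AST):
#   ("ite", c, t, e)        branch on "input char == c"
#   ("cat", a, b), ("union", a, b), and string leaves such as "eps"

def find_cond(t):
    """Return the first ITE condition found anywhere in t, or None."""
    if isinstance(t, str):
        return None
    if t[0] == "ite":
        return t[1]
    for child in t[1:]:
        c = find_cond(child)
        if c is not None:
            return c
    return None

def assume(t, c, value):
    """Specialize t under the assumption that (input char == c) is `value`."""
    if isinstance(t, str):
        return t
    if t[0] == "ite":
        if t[1] == c:
            return assume(t[2] if value else t[3], c, value)
        if value:
            # The input char equals c, so a test against another char fails.
            return assume(t[3], c, value)
        return ("ite", t[1], assume(t[2], c, value), assume(t[3], c, value))
    return (t[0],) + tuple(assume(x, c, value) for x in t[1:])

def cofactors(t, path=()):
    """Enumerate (path, leaf) pairs; every leaf is free of ITEs."""
    c = find_cond(t)
    if c is None:
        return [(path, t)]
    return (cofactors(assume(t, c, True), path + ((c, True),)) +
            cofactors(assume(t, c, False), path + ((c, False),)))

# One ITE is buried under a concat, another sits under the union.
example = ("union",
           ("cat", ("ite", "a", "eps", "empty"), "P"),
           ("ite", "b", "eps", "empty"))
leaves = cofactors(example)
```

In this sketch the union is never split, so each leaf stays a single state, and the branch where the input is both 'a' and 'b' is never generated.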

Validation on the regex-equivalence corpus (1523 files, -T:5, 8 workers):
- vs master: total solved 1394 vs 1378 (+16), soft_timeouts 129 vs 145,
  0 soundness disagreements (was 18 -> 5 -> 0).
- vs derive: +242 solved (1394 vs 1152), 25.4% faster on commonly-solved
  files, fixes 18 soundness disagreements, only 6 regressions.
- corpus wall time halved (172s vs 332s/349s).
- All 91 unit tests pass, including seq_regex_bisim.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-25 08:40:32 +03:00
Margus Veanes
a2b73b0ee6 misc edits of work in progress 2026-06-25 07:54:00 +03:00
Margus Veanes
8fe488721b Stage 3: collapse boolean combinations of char-class regexes
Introduce src/ast/rewriter/regex_range_collapse.{h,cpp}, a translator
between the boolean-combination-of-character-class fragment of regexes
and the range_predicate value type added in Stage 2.

Recognized fragment (translates to range_predicate):
  re.empty, re.full_char, re.range, re.union, re.intersection, re.diff
of operands recursively in the fragment.  Range bounds are accepted in
three encodings: string constant ("a"), seq.unit of a const char
(seq.unit (Char 97)), and length-1 zstring literal.

NOT translated:
  re.complement -- this is sequence-level complement (Sigma* \ L), not
  character-class complement.  Translating it would incorrectly turn
  re.comp(re.range "a" "z") into the character class [^a-z], which would
  drop the empty string and all length>=2 strings.
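
The difference between the two complements can be checked with plain membership predicates. The sketch below models languages as Python functions on strings; the function names are hypothetical and nothing here calls Z3.

```python
def in_range_az(s: str) -> bool:
    """Language of (re.range "a" "z"): exactly the one-character strings a..z."""
    return len(s) == 1 and "a" <= s <= "z"

def seq_complement(s: str) -> bool:
    """Sequence-level complement: every string that is not in the language."""
    return not in_range_az(s)

def char_class_complement(s: str) -> bool:
    """Character-class complement [^a-z]: one character outside a..z."""
    return len(s) == 1 and not ("a" <= s <= "z")

samples = ["", "a", "A", "ab"]
# Strings on which the two notions of complement disagree.
differences = [s for s in samples
               if seq_complement(s) != char_class_complement(s)]
```

On these samples the two predicates disagree exactly on the empty string and on the length-2 string, which are the cases the commit message names.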

Hook the translator into seq_rewriter at mk_re_union0, mk_re_union,
mk_re_inter0, mk_re_inter, and mk_re_diff so that boolean combinations
of character classes always reduce to a single canonical range-set
form.  mk_re_complement is intentionally not hooked.

Materialization uses the canonical (seq.unit (Char N)) bound form
(matching the rest of seq_rewriter) and right-associates the union
with operands sorted by expr_id so the result matches the invariant
expected by merge_regex_sets.

Unit tests in src/test/regex_range_collapse.cpp cover the recognized
fragment, the non-translatable cases, and round-trip identity for
multi-range predicates.

Corpus validation on bench/inputs/regex-equivalence (1523 .smt2):
- 0 soundness regressions vs derive baseline.
- Resolves 4 previously-soft-timeout files (now solved correctly).
- Resolves 1 pre-existing wrong answer (mut_0404: master/derive say
  unsat, ground-truth annotation and Stage 3 say sat).
- Wall-time: -2.2% vs Stage-3 starting point, -1.5% vs derive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-25 07:51:09 +03:00
Margus Veanes
764a689092 Add seq::range_predicate value type with unit tests
Introduce a specialized range-algebra over the unsigned character
domain [0, max_char], with canonical sorted-disjoint-non-adjacent
representation and linear-time union, intersection, complement,
difference, and symmetric difference operations.

This is Stage 1 of the derive-with-ranges plan: the value type only,
with unit tests covering factories, ordering, hashing, hand-picked
instances, and exhaustive de-Morgan / lattice laws over a small
domain (verified by enumerating all 64 subsets).

Integration with seq::derive's path conditions, the OneStep cache,
and the R&psi smart-constructor rewrite are deferred to later stages.
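
A canonical interval-set representation of this kind can be sketched in Python as below. This is not the seq::range_predicate code: the names are hypothetical, the domain is shrunk to [0, 5] so that all 2**6 = 64 subsets can be enumerated, and `union` sorts for brevity instead of doing the linear-time merge the commit describes.

```python
from itertools import combinations

MAX_CHAR = 5  # tiny assumed domain [0, 5], giving 64 subsets

def normalize(intervals):
    """Sort and merge overlapping or adjacent (lo, hi) pairs."""
    out = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return tuple(out)

def union(a, b):
    return normalize(a + b)

def complement(a):
    """Gaps of a canonical interval list within [0, MAX_CHAR]."""
    out, nxt = [], 0
    for lo, hi in a:
        if nxt < lo:
            out.append((nxt, lo - 1))
        nxt = hi + 1
    if nxt <= MAX_CHAR:
        out.append((nxt, MAX_CHAR))
    return tuple(out)

def intersection(a, b):
    """Two-pointer sweep over two canonical interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return tuple(out)

def from_set(chars):
    return normalize([(c, c) for c in chars])

# Every subset of the domain, each in canonical form.
subsets = [from_set(s) for n in range(MAX_CHAR + 2)
           for s in combinations(range(MAX_CHAR + 1), n)]

# De Morgan: ~(A | B) == ~A & ~B, checked on all 64 * 64 pairs.
de_morgan_holds = all(
    complement(union(a, b)) == intersection(complement(a), complement(b))
    for a in subsets for b in subsets)
```

Because each set has exactly one canonical form, comparing the tuples directly is enough to compare the sets.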

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-25 07:50:35 +03:00
Nikolaj Bjorner
9a5089397d better cofactoring 2026-06-24 15:55:30 -07:00
Nikolaj Bjorner
d77fe0b0cd enable distribution of union over intersection
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-23 12:56:23 -07:00
Nikolaj Bjorner
f2e6d05c56 disable hoisting ite over union
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-17 16:26:38 -06:00
Nikolaj Bjorner
29489c3bd8 make concatenation right associative
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-17 12:00:08 -06:00
Nikolaj Bjorner
e011aead11 perf
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-17 10:06:43 -06:00
Nikolaj Bjorner
f261c7732b fix perf
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-17 10:03:52 -06:00
Nikolaj Bjorner
05c394aa6c bug fixes
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-17 09:39:08 -06:00
Nikolaj Bjorner
26feb16714 fix bugs
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-17 09:14:16 -06:00
Margus Veanes
3d2d793f1d fix(seq::derive): remove unsound depth limit in hoist_ite
The single-ITE branch of hoist_ite was gated by 'm_path_stack.size() < 8'.
When the depth limit was reached, hoist_ite returned nullptr and callers
fell back to non-path-aware structural rewrites (mk_union0 / mk_xor0).
These rewrites simplify e.g. mk_union(empty, X) -> X and return X unchanged,
preserving inner ITE branches that were built at an earlier (less constrained)
path. Subsequent derivative computation never path-prunes those branches,
which can leak unreachable epsilon-leaves into the final t-regex and cause
the bisimulation algorithm to report inequivalence for equivalent regexes.

Concrete trigger: derivatives of unions/xors with >= 9 components produce
9-deep ITE chains; at depth 8 the inner ITE returns unprocessed, leaving an
unreachable epsilon-leaf that bisim mis-interprets as a distinguishing word.

Removing the guard restores soundness. The corpus run against
regex-equivalence (1523 files) fixes 22 triangulated soundness bugs
(mut_0013, mut_0241, mut_0257, mut_0301 among others) with zero regressions.
89/89 unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-15 03:18:15 -07:00
Margus Veanes
0b8bb98656 fix(seq::derive): symbolize top-level cache key to avoid concrete-ele poisoning
`seq::derive::operator()(ele, r)` looks up `m_top_cache` keyed only by the
regex `r`, but on a miss it used to set `m_ele = ele` (a concrete char)
before calling `derive_rec(r)`. The resulting ITE-tree contained
constant-folded `(= ele c)` conditions, so the "symbolic" derivative
stored in the cache was actually specialized to that one ele. Subsequent
calls with the same `r` but a different ele hit the stale cached answer
and the substitution at the bottom was a no-op (no `v0` left to replace).

Simplest victim:
  (str.in_re "aP" (re.++ (re.* "a") "P"))
returned `unsat`. The first call D_'a'(a*P) computed `a*P` and cached it
under key `a*P`; the next call D_'P'(a*P) hit that cache entry and
returned `a*P` instead of epsilon, so the membership check ended on a
non-nullable state.

Fix: set `m_ele = v` (the canonical fresh var) so the derivative is
genuinely symbolic. Concrete-ele callers go through the existing
substitution at the bottom of `operator()`.
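
The failure mode and the shape of the fix can be reproduced with a small Brzozowski-derivative sketch in Python. Everything here is an assumption made for illustration: regexes are tuples, the class names are hypothetical, and the "symbolic" derivative is modelled as a table with one branch per mentioned character plus a default, rather than Z3's ITE tree over a fresh variable.

```python
EMPTY, EPS = ("empty",), ("eps",)

def cat(a, b):
    if EMPTY in (a, b):
        return EMPTY
    if a == EPS:
        return b
    return a if b == EPS else ("cat", a, b)

def union(a, b):
    if a == EMPTY or a == b:
        return b
    return a if b == EMPTY else ("union", a, b)

def nullable(r):
    if r[0] in ("eps", "star"):
        return True
    if r[0] == "cat":
        return nullable(r[1]) and nullable(r[2])
    if r[0] == "union":
        return nullable(r[1]) or nullable(r[2])
    return False  # "empty" and "lit"

def deriv(r, ch):
    if r[0] == "lit":
        return EPS if r[1] == ch else EMPTY
    if r[0] == "star":
        return cat(deriv(r[1], ch), r)
    if r[0] == "union":
        return union(deriv(r[1], ch), deriv(r[2], ch))
    if r[0] == "cat":
        head = cat(deriv(r[1], ch), r[2])
        return union(head, deriv(r[2], ch)) if nullable(r[1]) else head
    return EMPTY  # "empty" and "eps"

def chars_of(r):
    """Characters mentioned in r; all other characters behave alike."""
    if r[0] == "lit":
        return {r[1]}
    return set().union(*(chars_of(x) for x in r[1:]))

class PoisonedCache:
    """Keyed by r only, but stores a result specialized to one character."""
    def __init__(self):
        self.cache = {}
    def derive(self, ch, r):
        if r not in self.cache:
            self.cache[r] = deriv(r, ch)  # bug: the entry depends on ch
        return self.cache[r]

class SymbolicCache:
    """Keyed by r; the entry covers every character, then is instantiated."""
    def __init__(self):
        self.cache = {}
    def derive(self, ch, r):
        if r not in self.cache:
            branches = {c: deriv(r, c) for c in chars_of(r)}
            self.cache[r] = (branches, deriv(r, None))  # None: any other char
        branches, default = self.cache[r]
        return branches.get(ch, default)

def matches(cache, word, r):
    for ch in word:
        r = cache.derive(ch, r)
    return nullable(r)

a_star_p = ("cat", ("star", ("lit", "a")), ("lit", "P"))
poisoned = matches(PoisonedCache(), "aP", a_star_p)
symbolic = matches(SymbolicCache(), "aP", a_star_p)
```

With the poisoned cache the second lookup reuses the entry computed for 'a', so the run ends on the non-nullable state a*P; with the character-independent entry the same word is accepted.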

Adds a regression test in src/test/seq_regex_bisim.cpp checking that
D_'a'(a*P) is not nullable while D_'P'(a*P) is.

Note: this is independent of the mut_0013 bisim-level unsoundness;
that case still fails and is being tracked separately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-15 02:28:21 -07:00
Margus Veanes
fb6470a1a1 fix seq::derive::intersect_intervals dropping kept intervals on disjoint case
The intersect_intervals routine in seq_derive.cpp maintains a path-tracking
disjoint union of character intervals.  When intersecting the active suffix
with a new constraint [lo, hi], it iterated the suffix and, on encountering
the first interval disjoint from [lo, hi], reset the output cursor to the
end-marker and broke out of the loop.  This both threw away the intervals it
had already kept and skipped every remaining interval, so e.g.
[(0, 96), (98, max)] intersected with [98, 98] became empty instead of
[(98, 98)].
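
The bug and the fix can be contrasted in a few lines of Python. This is a simplified model of the behaviour described above, not the code in seq_derive.cpp: intervals are plain lists of (lo, hi) pairs, the function names are hypothetical, and `MAX_CHAR` is an arbitrary bound chosen for the sketch.

```python
MAX_CHAR = 0xFF  # arbitrary upper bound, for illustration only

def intersect_buggy(intervals, lo, hi):
    """Mimics the described bug: the first disjoint interval clears the
    output and stops the scan."""
    out = []
    for a, b in intervals:
        if b < lo or hi < a:
            out = []
            break
        out.append((max(a, lo), min(b, hi)))
    return out

def intersect_fixed(intervals, lo, hi):
    """Drops only the disjoint interval and keeps scanning."""
    out = []
    for a, b in intervals:
        if b < lo or hi < a:
            continue
        out.append((max(a, lo), min(b, hi)))
    return out

suffix = [(0, 96), (98, MAX_CHAR)]
buggy = intersect_buggy(suffix, 98, 98)
fixed = intersect_fixed(suffix, 98, 98)
```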

Inside derive that silently killed valid branches of symbolic derivatives.
For example D(a|b) collapsed to ite(c='a', eps, empty) -- the 'b' branch
vanished -- which made the bisimulation procedure conclude bogus equalities
such as a* == (a|b)*.  On the regex-equivalence corpus this single bug
accounted for ~510 false-unsat results vs master.

Fix: drop only the disjoint interval and continue scanning the rest of the
suffix.  Add a small assertion-based regression test that builds D(a|b),
checks both branches survive, and runs bisim on a* vs (a|b)*.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-15 01:38:55 -07:00
Nikolaj Bjorner
cf0e43ab38 polish
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-11 14:10:25 -07:00
Nikolaj Bjorner
e21d154778 cleanup, missing case for nullable xor
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 19:57:51 -07:00
Nikolaj Bjorner
21a3a317f2 cleanup
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 19:50:26 -07:00
Nikolaj Bjorner
f48ec128eb cleanup
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 19:43:38 -07:00
Nikolaj Bjorner
898178fbe5 merge with master
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:54:33 -07:00
Nikolaj Bjorner
c9cd5147be merge
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:31:01 -07:00
Nikolaj Bjorner
0c2ed444ca fix build
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:27:48 -07:00
copilot-swe-agent[bot]
1e906ba585 Remove is_nullable_rec from seq_rewriter, delegate to derive::nullable 2026-06-10 15:27:42 -07:00
copilot-swe-agent[bot]
70a9dbfae2 Apply follow-up derive validation fixes 2026-06-10 15:26:46 -07:00
copilot-swe-agent[bot]
bf9707a316 Address PR feedback on derive, nullability, and requested reverts 2026-06-10 15:26:40 -07:00
Nikolaj Bjorner
458878b5e1 cleanup
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:25:06 -07:00
Nikolaj Bjorner
0e29a35da5 updates
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:25:05 -07:00
Nikolaj Bjorner
dbd725bdc0 Refactor merge_union and mk_union_core functions 2026-06-10 15:25:05 -07:00
Nikolaj Bjorner
72575ff962 reuse char extraction from seq_util 2026-06-10 15:25:04 -07:00
Nikolaj Bjorner
867dc175c5 tune and fix derive 2026-06-10 15:25:03 -07:00
Nikolaj Bjorner
61093fadf6 updates to derive 2026-06-10 15:25:03 -07:00
Nikolaj Bjorner
ff8a1034d6 Refactor seq_derive: inline path pruning with ACI normalization
Replace simplify_ite_rec post-hoc pass with inline path pruning:
- push/pop API with lbool return (l_true=implied, l_undef=pushed, l_false=contradicts)
- apply_ite hoists ITE through union/inter/complement with path-aware pruning
- Path-aware caching for mk_union, mk_inter, mk_complement
- Incremental path expression maintenance for cache keys
- Complement always pushes through ITE for same-condition merge
- ACI normalization (flatten/sort/deduplicate) for union base case
- is_subset subsumption prevents unbounded union growth
- Prefix factoring (a·x ∪ a·y = a·(x ∪ y)) for loop derivatives
- seq_rewriter passed as reference to derive class
- Depth-limited single-ITE hoisting (path_stack.size() < 8)
- pred_implies with signed atoms avoids mk_not allocations
- extract_char_range properly checks m_ele identity
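
The ACI normalization item (flatten/sort/deduplicate) can be illustrated with a short Python sketch. It is an assumption-laden toy: regexes are strings or tuples, the name `aci_union` is hypothetical, and operands are ordered by `repr` as a stand-in for whatever ordering the real implementation uses.

```python
def aci_union(*operands):
    """Build a union that is flattened, deduplicated, and sorted, so that
    associativity, commutativity, and idempotence give one normal form."""
    flat = []
    stack = list(reversed(operands))
    while stack:
        r = stack.pop()
        if isinstance(r, tuple) and r[0] == "union":
            stack.extend(reversed(r[1:]))  # flatten nested unions
        else:
            flat.append(r)
    unique = sorted(set(flat), key=repr)   # deduplicate, then sort
    if not unique:
        return "empty"
    if len(unique) == 1:
        return unique[0]
    return ("union",) + tuple(unique)

left = aci_union("a", ("union", "b", "c"))
right = aci_union(("union", "c", "a"), "b", "a")
```

Both calls yield the same tuple, which is what lets a cache or state graph treat the two unions as one node.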

Results: 0 timeouts on regression suite (vs 2 on master).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:24:57 -07:00
Nikolaj Bjorner
13d0de42bc tuning 2026-06-10 15:24:06 -07:00
Nikolaj Bjorner
243ebe0660 remove local copies of benchmarks
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:24:05 -07:00
Nikolaj Bjorner
6b218b47dd Delete benchmarks/instance08315.smt2 2026-06-10 15:24:05 -07:00
Nikolaj Bjorner
416c676040 Delete benchmarks/instance08175.smt2 2026-06-10 15:24:04 -07:00
Nikolaj Bjorner
9456297046 tuning simplification processing
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:24:04 -07:00
Nikolaj Bjorner
f42172e65a conservative expansions 2026-06-10 15:24:03 -07:00
Nikolaj Bjorner
98a7992a65 handle more cases with intervals 2026-06-10 15:24:02 -07:00
Nikolaj Bjorner
18a0db9d48 cr updates
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:24:01 -07:00
Nikolaj Bjorner
6b862ddf19 intervals
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:24:01 -07:00
Nikolaj Bjorner
6289b91f17 Add interval-based range simplification for ITE conditions
Introduce exclusion intervals alongside the existing path-based condition
tracking in simplify_ite_rec. The intervals track which character values
are still possible at each point in the ITE tree, enabling simplification
of nested range conditions that the per-entry path approach cannot handle.

Key additions:
- intervals_t type and push_intervals() to maintain live character ranges
- eval_range_cond() checks AND-of-char_le conditions against intervals
- intersect_intervals/exclude_interval utilities from seq_rewriter pattern
- Negated AND handling: ¬(lo<=x ∧ x<=hi) excludes [lo,hi] from intervals

The interval check runs before the existing eval_path_cond logic, catching
cases like: if(0<=x<=10, t, if(1<=x<=8, t2, e2)) → if(0<=x<=10, t, e2)
where the inner range [1,8] is fully contained in the excluded outer range.
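
The example above can be traced with a small Python sketch of interval tracking. It is not the simplify_ite_rec code: ITEs are tuples whose condition is a single range test lo <= x <= hi, the function names are hypothetical, and the character domain is assumed to be [0, 255].

```python
def restrict(live, lo, hi):
    """Keep only the part of each live interval inside [lo, hi]."""
    return [(max(a, lo), min(b, hi)) for a, b in live if a <= hi and lo <= b]

def exclude(live, lo, hi):
    """Remove [lo, hi] from each live interval."""
    out = []
    for a, b in live:
        if a < lo:
            out.append((a, min(b, lo - 1)))
        if hi < b:
            out.append((max(a, hi + 1), b))
    return out

def simplify(term, live):
    """term is a leaf string or ("ite", lo, hi, then, else), read as
    'if lo <= x <= hi then ... else ...'; live lists the values x may take."""
    if isinstance(term, str):
        return term
    _, lo, hi, t, e = term
    inside = restrict(live, lo, hi)
    outside = exclude(live, lo, hi)
    if not inside:   # condition is false for every live character
        return simplify(e, live)
    if not outside:  # condition is true for every live character
        return simplify(t, live)
    return ("ite", lo, hi, simplify(t, inside), simplify(e, outside))

nested = ("ite", 0, 10, "t", ("ite", 1, 8, "t2", "e2"))
result = simplify(nested, [(0, 255)])
```

In the else branch of the outer test only [11, 255] is still live, so the inner test on [1, 8] can never succeed and collapses to its else branch.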

Fixes remaining regression timeouts on 5728 P2 and 5731 P4.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:24:00 -07:00
Nikolaj Bjorner
d54d62a07a Fix regression timeouts via range condition simplification
- Simplify trivial range bounds in derive_range: when lo=0, omit
  the lo<=x condition; when hi=max_char, omit the x<=hi condition.
  Full charset ranges return epsilon directly.

- Add char_le(0,x)=true and char_le(x,max)=true to eval_cond for
  always-valid bounds.

- Add range implication logic to simplify_ite_rec: when path has
  negated/positive char_le constraints, detect implied or contradicted
  char_le conditions (e.g., ¬(x<=127) implies 128<=x).

- Add is_subset(a, .+) check: non-nullable regexes are subsets of .+

- In update_state_graph, skip recursive exploration of nullable targets
  to avoid state explosion.

These fixes resolve timeouts on 5724 (all problems), 5721 P1, and 5693.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:23:59 -07:00
Nikolaj Bjorner
88a177f6c5 Fix derivative instability and recursion bugs
- Add top-level cache (m_top_cache) to ensure stable AST node identity
  across repeated derivative calls, preventing state graph divergence
- Add get_head_tail helper for derive_to_re with str.is_unit/str.is_concat
- Add ITE hoisting in mk_union/mk_inter to keep ITEs at top level
- Add De Morgan rule in mk_complement: ~(A∪B) → ~A ∩ ~B
- Add ~ε → .+ simplification in mk_complement
- Add prefix factoring: a·x ∪ a·y = a·(x∪y) and a·x ∩ a·y = a·(x∩y)
- Add r* ∩ .+ = r+ special case in mk_inter
- Enhance is_subset with union/intersection distributivity and complement
- Remove De Morgan from mk_inter to prevent infinite recursion loop

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:23:59 -07:00
Nikolaj Bjorner
c1637ab806 Address PR review: push_path helper, lbool eval_cond, fix year
- Add push_path(path, c, sign) that decomposes conjuncts/disjuncts
- Add simplify_ite_rec(path, c, t, e) helper for cleaner recursion
- Change eval_cond signature to return lbool (l_undef = undetermined)
- Fix copyright year from 2025 to 2026

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:23:58 -07:00
Nikolaj Bjorner
0f50702c9e Address PR review: subsumption, is_value, simplify_ite fixes
- Add lightweight structural is_subset for union/inter simplification
- Use m.is_value instead of is_const_char for swap checks
- Move eval_cond to beginning of simplify_ite_rec
- Use path.shrink(sz) instead of copying extended_path
- Fix normalize_reverse stuck case to return mk_reverse(r)
- Expose subsumes() in public API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:23:57 -07:00
Nikolaj Bjorner
2e3dd32b90 Address PR review comments: cache, simplify_ite_rec, itos
- Cache now indexes by (ele, r) pair using obj_pair_map
- Remove eval() function; operator()(ele, r) handles all cases
- Rewrite simplify_ite_rec with path vector of signed conditions
- Add range-based simplification: (lo <= x, false) + (x <= hi, false)
  eliminates ite(x = v, t, e) when v is outside [lo, hi]
- Add is_itos case in derive_to_re: guards on n >= 0, digit range,
  and first character match
- Port is_reverse normalization (previous commit)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:23:56 -07:00
Nikolaj Bjorner
3a22994b80 Port reverse normalization into derive class
Instead of treating reverse(r) as stuck (returning symbolic mk_derivative),
normalize it by pushing reverse inward through the regex structure, then
compute the derivative of the normalized result. Mirrors mk_re_reverse logic.

Handles: concat, union, intersection, diff, ite, opt, complement, star,
plus, loop, to_re (string literals, units, concats), and symmetric cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:23:56 -07:00
Nikolaj Bjorner
2b06d6ddb2 Add simplify_ite_rec and eval for two-phase derivative
- Add simplify_ite post-processing in operator() to simplify ITE conditions
- Add simplify_ite_rec(cond, sign, r) for propagating condition truth values
- Handles c == cond, x=ch1 vs x=ch2 with different constants
- Add eval(ele, d) for efficient two-phase: symbolic derivative + concrete eval
- mk_derivative uses two-phase pattern: m_derive(r) then m_derive.eval(ele, d)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:23:55 -07:00
Nikolaj Bjorner
a59a7296fb make reset private 2026-06-10 15:23:54 -07:00