The single-ITE branch of hoist_ite was gated by 'm_path_stack.size() < 8'.
When the depth limit was reached, hoist_ite returned nullptr and callers
fell back to non-path-aware structural rewrites (mk_union0 / mk_xor0).
These rewrites simplify e.g. mk_union(empty, X) -> X and return X unchanged,
preserving inner ITE branches that were built at an earlier (less constrained)
path. Subsequent derivative computation never path-prunes those branches,
which can leak unreachable epsilon-leaves into the final t-regex and cause
the bisimulation algorithm to report inequivalence for equivalent regexes.
Concrete trigger: derivatives of unions/xors with >= 9 components produce
9-deep ITE chains; at depth 8 the inner ITE returns unprocessed, leaving an
unreachable epsilon-leaf that bisim mis-interprets as a distinguishing word.
Removing the guard restores soundness. On the regex-equivalence corpus
(1523 files) the change fixes 22 triangulated soundness bugs (mut_0013,
mut_0241, mut_0257, mut_0301 among others) with zero regressions.
89/89 unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
`seq::derive::operator()(ele, r)` looks up `m_top_cache` keyed only by the
regex `r`, but on a miss it used to set `m_ele = ele` (a concrete char)
before calling `derive_rec(r)`. The resulting ITE-tree contained
constant-folded `(= ele c)` conditions, so the "symbolic" derivative
stored in the cache was actually specialized to that one ele. Subsequent
calls with the same `r` but a different ele hit the stale cached answer
and the substitution at the bottom was a no-op (no `v0` left to replace).
Simplest victim:
(str.in_re "aP" (re.++ (re.* "a") "P"))
returned `unsat`. The first call D_'a'(a*P) computed `a*P` and cached it
under key `a*P`; the next call D_'P'(a*P) hit that cache entry and
returned `a*P` instead of epsilon, so the membership check ended on a
non-nullable state.
Fix: set `m_ele = v` (the canonical fresh var) so the derivative is
genuinely symbolic. Concrete-ele callers go through the existing
substitution at the bottom of `operator()`.
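A minimal sketch of the fixed caching pattern, under stated assumptions: `SymDeriv`, `derive_symbolic` and `ToyDerive` are hypothetical stand-ins (not the real `seq::derive` types), regexes are plain strings, and only `a*P` is modelled. The cache stores a result that is still symbolic in the character, and the concrete ele is substituted after the lookup.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an ITE-tree over the symbolic character v:
// ite(v == c1, r1, ite(v == c2, r2, ... otherwise)).
struct SymDeriv {
    std::vector<std::pair<char, std::string>> branches;
    std::string otherwise;
};

// Hypothetical symbolic derivative; this sketch only models the regex a*P.
SymDeriv derive_symbolic(const std::string& r) {
    assert(r == "a*P");
    return {{{'a', "a*P"}, {'P', "eps"}}, "empty"};
}

struct ToyDerive {
    std::map<std::string, SymDeriv> top_cache; // keyed by the regex only
    int misses = 0;

    std::string operator()(char ele, const std::string& r) {
        auto it = top_cache.find(r);
        if (it == top_cache.end()) {
            ++misses;
            // Cache the symbolic answer, never one specialized to ele.
            it = top_cache.emplace(r, derive_symbolic(r)).first;
        }
        // Substitution step: pick the branch matching the concrete ele.
        for (const auto& b : it->second.branches)
            if (b.first == ele)
                return b.second;
        return it->second.otherwise;
    }
};
```

With this shape, `ToyDerive d; d('a', "a*P"); d('P', "a*P");` computes the derivative once and the second call reuses the cached symbolic result while still selecting the `'P'` branch.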
Adds a regression test in src/test/seq_regex_bisim.cpp checking that
D_'a'(a*P) is not nullable while D_'P'(a*P) is.
Note: this is independent of the mut_0013 bisim-level unsoundness;
that case still fails and is being tracked separately.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The intersect_intervals routine in seq_derive.cpp maintains a path-tracking
disjoint union of character intervals. When intersecting the active suffix
with a new constraint [lo, hi], it iterated the suffix and, on encountering
the first interval disjoint from [lo, hi], reset the output cursor to the
end-marker and broke out of the loop. This both threw away the intervals it
had already kept and skipped every remaining interval, so e.g.
[(0, 96), (98, max)] intersected with [98, 98] became empty instead of
[(98, 98)].
Inside derive that silently killed valid branches of symbolic derivatives.
For example D(a|b) collapsed to ite(c='a', eps, empty) -- the 'b' branch
vanished -- which made the bisimulation procedure conclude bogus equalities
such as a* == (a|b)*. On the regex-equivalence corpus this single bug
accounted for ~510 false-unsat results vs master.
Fix: drop only the disjoint interval and continue scanning the rest of the
suffix. Add a small assertion-based regression test that builds D(a|b),
checks both branches survive, and runs bisim on a* vs (a|b)*.
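The fixed loop shape can be sketched as follows. This is an illustration only: `intersect_with`, the `interval` alias and `kMaxChar` are names assumed for the sketch, and it works on a plain vector rather than the in-place suffix used by the real intersect_intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using interval = std::pair<unsigned, unsigned>; // inclusive [lo, hi]
using intervals = std::vector<interval>;        // sorted and pairwise disjoint

const unsigned kMaxChar = 0x2FFFF; // assumed upper bound, for this sketch only

// Intersect every interval of `in` with [lo, hi].
intervals intersect_with(const intervals& in, unsigned lo, unsigned hi) {
    assert(lo <= hi);
    intervals out;
    for (const interval& iv : in) {
        if (iv.second < lo || iv.first > hi)
            continue; // disjoint: drop only this interval and keep scanning
        out.push_back({std::max(iv.first, lo), std::min(iv.second, hi)});
    }
    return out;
}
```

On the example from this message, intersecting `[(0, 96), (98, kMaxChar)]` with `[98, 98]` skips the first interval and keeps `(98, 98)` from the second.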
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce exclusion intervals alongside the existing path-based condition
tracking in simplify_ite_rec. The intervals track which character values
are still possible at each point in the ITE tree, enabling simplification
of nested range conditions that the per-entry path approach cannot handle.
Key additions:
- intervals_t type and push_intervals() to maintain live character ranges
- eval_range_cond() checks AND-of-char_le conditions against intervals
- intersect_intervals/exclude_interval utilities, following the seq_rewriter pattern
- Negated AND handling: ¬(lo<=x ∧ x<=hi) excludes [lo,hi] from intervals
The interval check runs before the existing eval_path_cond logic, catching
cases like: if(0<=x<=10, t, if(1<=x<=8, t2, e2)) → if(0<=x<=10, t, e2)
where the inner range [1,8] is fully contained in the excluded outer range.
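A rough sketch of the interval check on that example, assuming hypothetical helpers `exclude` and `eval_range` and a three-valued result; the real eval_range_cond works on AND-of-char_le expressions rather than raw bounds.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using interval = std::pair<unsigned, unsigned>; // inclusive [lo, hi]
using intervals = std::vector<interval>;        // sorted and pairwise disjoint

enum class tri { no, yes, unknown };

// Remove [lo, hi] from the live set (taken on the else-branch of lo<=x<=hi).
intervals exclude(const intervals& live, unsigned lo, unsigned hi) {
    assert(lo <= hi);
    intervals out;
    for (const interval& iv : live) {
        if (iv.second < lo || iv.first > hi) { out.push_back(iv); continue; }
        if (iv.first < lo) out.push_back({iv.first, lo - 1});   // left remainder
        if (iv.second > hi) out.push_back({hi + 1, iv.second}); // right remainder
    }
    return out;
}

// Decide lo <= x <= hi given that x lies in one of the live intervals.
tri eval_range(const intervals& live, unsigned lo, unsigned hi) {
    assert(lo <= hi);
    bool inside = false, outside = false;
    for (const interval& iv : live) {
        if (iv.second < lo || iv.first > hi) { outside = true; continue; }
        inside = true;
        if (iv.first < lo || iv.second > hi) outside = true;
    }
    if (!inside) return tri::no;   // no live value satisfies the condition
    if (!outside) return tri::yes; // every live value satisfies it
    return tri::unknown;
}
```

On the else-branch of `0<=x<=10` the live set becomes `[11, max]`, so `eval_range` on `[1, 8]` yields `tri::no` and the inner ITE reduces to its else-branch `e2`.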
Fixes remaining regression timeouts on 5728 P2 and 5731 P4.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Simplify trivial range bounds in derive_range: when lo=0, omit
the lo<=x condition; when hi=max_char, omit the x<=hi condition.
Full charset ranges return epsilon directly.
- Add char_le(0,x)=true and char_le(x,max)=true to eval_cond for
always-valid bounds.
- Add range implication logic to simplify_ite_rec: when path has
negated/positive char_le constraints, detect implied or contradicted
char_le conditions (e.g., ¬(x<=127) implies 128<=x).
- Add is_subset(a, .+) check: non-nullable regexes are subsets of .+
- In update_state_graph, skip recursive exploration of nullable targets
to avoid state explosion.
These fixes resolve timeouts on 5724 (all problems), 5721 P1, and 5693.
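The trivial-bound rule for range guards can be sketched as below. The function `range_guard` and its string output are illustrative assumptions; derive_range builds expressions, not strings.

```cpp
#include <cassert>
#include <string>

// Build the guard for lo <= x <= hi, omitting bounds that always hold.
std::string range_guard(unsigned lo, unsigned hi, unsigned max_char) {
    assert(lo <= hi && hi <= max_char);
    const bool need_lo = lo != 0;        // 0 <= x is always true
    const bool need_hi = hi != max_char; // x <= max_char is always true
    if (!need_lo && !need_hi)
        return "true"; // full charset: no guard is needed at all
    if (!need_lo)
        return "x <= " + std::to_string(hi);
    if (!need_hi)
        return std::to_string(lo) + " <= x";
    return std::to_string(lo) + " <= x && x <= " + std::to_string(hi);
}
```

For instance, the range `[128, max_char]` produces only `128 <= x`, which is the condition that `¬(x<=127)` on the path already implies.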
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add top-level cache (m_top_cache) to ensure stable AST node identity
across repeated derivative calls, preventing state graph divergence
- Add get_head_tail helper for derive_to_re with str.is_unit/str.is_concat
- Add ITE hoisting in mk_union/mk_inter to keep ITEs at top level
- Add De Morgan rule in mk_complement: ~(A∪B) → ~A ∩ ~B
- Add ~ε → .+ simplification in mk_complement
- Add prefix factoring: a·x ∪ a·y = a·(x∪y) and a·x ∩ a·y = a·(x∩y)
- Add r* ∩ .+ = r+ special case in mk_inter
- Enhance is_subset with union/intersection distributivity and complement
- Remove De Morgan from mk_inter to prevent infinite recursion
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add lightweight structural is_subset for union/inter simplification
- Use m.is_value instead of is_const_char for swap checks
- Move eval_cond to beginning of simplify_ite_rec
- Use path.shrink(sz) instead of copying extended_path
- Fix normalize_reverse stuck case to return mk_reverse(r)
- Expose subsumes() in public API
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Cache now indexes by (ele, r) pair using obj_pair_map
- Remove eval() function; operator()(ele, r) handles all cases
- Rewrite simplify_ite_rec with path vector of signed conditions
- Add range-based simplification: (lo <= x, false) + (x <= hi, false)
eliminates ite(x = v, t, e) when v is outside [lo, hi]
- Add is_itos case in derive_to_re: guards on n >= 0, digit range,
and first character match
- Port is_reverse normalization (previous commit)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of treating reverse(r) as stuck (returning symbolic mk_derivative),
normalize it by pushing reverse inward through the regex structure, then
compute the derivative of the normalized result. Mirrors mk_re_reverse logic.
Handles: concat, union, intersection, diff, ite, opt, complement, star,
plus, loop, to_re (string literals, units, concats), and symmetric cases.
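A small sketch of pushing reverse inward, on a toy AST that covers only literals, concat, union and star. The types and the helpers `lit`, `mk`, `push_reverse`, `normalize` and `show` are assumptions made for illustration, not the Z3 data structures.

```cpp
#include <cassert>
#include <memory>
#include <string>

enum class Kind { Lit, Concat, Union, Star, Reverse };

struct Re;
using ReP = std::shared_ptr<Re>;

struct Re {
    Kind kind;
    std::string lit; // used by Lit only
    ReP a, b;        // operands (b is unused by Star and Reverse)
};

ReP lit(const std::string& s) { return std::make_shared<Re>(Re{Kind::Lit, s, nullptr, nullptr}); }
ReP mk(Kind k, ReP a, ReP b = nullptr) { return std::make_shared<Re>(Re{k, std::string(), a, b}); }

ReP normalize(const ReP& r);

// Returns a reverse-free regex equivalent to reverse(r).
ReP push_reverse(const ReP& r) {
    assert(r);
    switch (r->kind) {
    case Kind::Lit:     return lit(std::string(r->lit.rbegin(), r->lit.rend()));
    case Kind::Concat:  return mk(Kind::Concat, push_reverse(r->b), push_reverse(r->a)); // operands swap
    case Kind::Union:   return mk(Kind::Union, push_reverse(r->a), push_reverse(r->b));
    case Kind::Star:    return mk(Kind::Star, push_reverse(r->a));
    case Kind::Reverse: return normalize(r->a); // reverse(reverse(x)) = x
    }
    return r;
}

// Eliminates every Reverse node in r.
ReP normalize(const ReP& r) {
    assert(r);
    switch (r->kind) {
    case Kind::Lit:     return r;
    case Kind::Concat:  return mk(Kind::Concat, normalize(r->a), normalize(r->b));
    case Kind::Union:   return mk(Kind::Union, normalize(r->a), normalize(r->b));
    case Kind::Star:    return mk(Kind::Star, normalize(r->a));
    case Kind::Reverse: return push_reverse(r->a);
    }
    return r;
}

// Printer used to inspect results.
std::string show(const ReP& r) {
    switch (r->kind) {
    case Kind::Lit:     return r->lit;
    case Kind::Concat:  return "(" + show(r->a) + "." + show(r->b) + ")";
    case Kind::Union:   return "(" + show(r->a) + "|" + show(r->b) + ")";
    case Kind::Star:    return "(" + show(r->a) + ")*";
    case Kind::Reverse: return "rev(" + show(r->a) + ")";
    }
    return "";
}
```

Once the regex is in this reverse-free form, the ordinary derivative rules apply, which is the point of normalizing first instead of returning a stuck term.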
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement a new seq::derive class (seq_derive.h/cpp) that computes
symbolic derivatives of regular expressions using ITE-trees, based on
the RE# approach (Varatalu, Veanes, Ernits - POPL 2025).
Key features:
- Two-argument operator()(ele, r): computes derivative of regex r w.r.t.
element ele (concrete character or de Bruijn variable for symbolic mode)
- ACI canonicalization (flatten, stable_sort, dedup) for union/intersection
- ITE-tree combinators for binary/unary operations
- Info-based nullability with recursive fallback
- Complement absorption rules
- Depth-bounded recursion to prevent stack overflow
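The ACI step (flatten, stable_sort, dedup) might look like the following sketch. It assumes operands are identified by integer ids and uses hypothetical helpers `leaf`, `uni` and `canonical_union`; the real code orders AST nodes, not integers.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Node {
    bool is_union;
    int id;                 // leaf identity (ignored for unions)
    std::vector<Node> kids; // union operands
};

Node leaf(int id) { return Node{false, id, {}}; }
Node uni(std::vector<Node> kids) { return Node{true, 0, std::move(kids)}; }

// Associativity: collect the leaves of arbitrarily nested unions.
void flatten(const Node& n, std::vector<int>& out) {
    if (!n.is_union) { out.push_back(n.id); return; }
    for (const Node& k : n.kids)
        flatten(k, out);
}

// Commutativity via a fixed order, idempotence via duplicate removal.
std::vector<int> canonical_union(const Node& n) {
    std::vector<int> ops;
    flatten(n, ops);
    std::stable_sort(ops.begin(), ops.end());
    ops.erase(std::unique(ops.begin(), ops.end()), ops.end());
    return ops;
}
```

Two unions that differ only in nesting, operand order or repeated operands then map to the same operand list, so they can share one canonical node.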
Integration with seq_rewriter:
- mk_derivative(ele, r) and mk_derivative(r) now delegate to m_derive
- Removed dead mk_derivative_rec function
- Added ITE hoisting in mk_re_star, mk_re_concat, mk_re_union0,
mk_re_inter0, mk_re_complement
- Added depth limiting in Antimirov derivative helpers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The C++ implementation of the fixedpoint engine (in
z3/src/api/api_datalog.cpp) already attempts to read `rlimit` from its
local parameters:
```c++
unsigned rlimit = to_fixedpoint(d)->m_params.get_uint("rlimit", mk_c(c)->get_rlimit());
```
However, because `rlimit` was not registered in the public fp parameter
schema (`fp_params.pyg`), any attempt by clients to set it locally via
`Z3_fixedpoint_set_params` was rejected by the Z3 parameter validator
with an "unknown parameter" error.
Implements the algorithm `Eq(p,q) = Empty(p XOR q)` using a union-find
driven bisimulation closure (per the CAV'26 ERE paper).
### What's added
* **New primitive OP_RE_XOR (re.xor)** wired through seq_decl_plugin:
parser signature, info propagation (nullable, min_length), and
pretty-printer.
* **seq_rewriter**: structural XOR rewrites (`r XOR r = empty`,
  `r XOR empty = r`, `full XOR r = comp(r)`, comp/comp absorption,
  complement push, AC normalisation), nullability
  (`Null(p XOR q) = Null(p) != Null(q)`), derivative
  (`D_a(p XOR q) = D_a(p) XOR D_a(q)`), reverse, Antimirov
  derivative, and `check_deriv_normal_form` coverage.
* **New class seq::regex_bisim** in
`src/ast/rewriter/seq_regex_bisim.{h,cpp}` to keep the bisim logic out
of the already-large `seq_rewriter.cpp`. Uses `basic_union_find` from
`util/union_find.h`, an `obj_map` for the node assignment, and a
50000-step bound (returns `l_undef` on overrun).
* **Integration** in `seq_rewriter::reduce_re_eq` (with a re-entry
guard) and in `seq_regex::propagate_eq` / `propagate_ne` for ground
regexes; on `l_undef` we fall back to the existing axiomatisation.
* **`sls_seq_plugin`**: extend `OP_RE_DIFF` switch arms to also cover
`OP_RE_XOR`.
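As an illustration of the union-find driven closure with a step bound, here is a sketch on explicit DFAs instead of regex derivatives. `Dfa`, `UnionFind`, `verdict` and `equivalent` are hypothetical names; `seq::regex_bisim` itself explores derivatives of `p XOR q` and uses `basic_union_find`. The acceptance test plays the role of `Null(p) != Null(q)`.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

struct Dfa {
    std::vector<std::vector<int>> next; // next[s][c]: successor of s on symbol c
    std::vector<bool> accepting;        // state 0 is the start state
};

struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) {
        while (parent[x] != x)
            x = parent[x] = parent[parent[x]]; // path halving
        return x;
    }
    bool merge(int a, int b) { // true if two distinct classes were joined
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

enum class verdict { equal, not_equal, unknown };

// Both automata must be complete and share the same alphabet size.
verdict equivalent(const Dfa& p, const Dfa& q, unsigned max_steps) {
    assert(!p.next.empty() && !q.next.empty());
    const int np = static_cast<int>(p.next.size());
    UnionFind uf(np + static_cast<int>(q.next.size())); // q states are offset by np
    std::vector<std::pair<int, int>> todo;
    uf.merge(0, np);
    todo.push_back({0, 0});
    unsigned steps = 0;
    while (!todo.empty()) {
        if (++steps > max_steps)
            return verdict::unknown; // bound exceeded: caller falls back
        const std::pair<int, int> st = todo.back();
        todo.pop_back();
        if (p.accepting[st.first] != q.accepting[st.second])
            return verdict::not_equal; // a distinguishing word exists
        for (std::size_t c = 0; c < p.next[st.first].size(); ++c) {
            const int s2 = p.next[st.first][c];
            const int t2 = q.next[st.second][c];
            if (uf.merge(s2, np + t2)) // only newly related pairs are explored
                todo.push_back({s2, t2});
        }
    }
    return verdict::equal;
}
```

Pairs already in the same union-find class are never re-explored, which keeps the closure small; returning `unknown` on overrun mirrors the `l_undef` fallback described above.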
### Validation
* Full release build with MSVC + Ninja.
* `./test-z3 /a` -- 89/89 tests passing.
* `./test-z3 /seq smt2print_parse` -- PASS.
* Smoke tests with `(a|b)*` vs `(a*b*)*` (equal) and `a*` vs `(a|b)*`
(not equal) return the expected `sat`/`unsat` quickly.
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>