Introduce src/ast/rewriter/regex_range_collapse.{h,cpp}, a translator
between the boolean-combination-of-character-class fragment of regexes
and the range_predicate value type added in Stage 2.
Recognized fragment (translates to range_predicate):
re.empty, re.full_char, re.range, re.union, re.intersection, re.diff
of operands recursively in the fragment. Range bounds are accepted in
three encodings: string constant ("a"), seq.unit of a const char
(seq.unit (Char 97)), and length-1 zstring literal.
NOT translated:
re.complement -- this is sequence-level complement (Sigma* \ L), not
character-class complement. Translating it would incorrectly turn
re.comp(re.range "a" "z") into the character class [^a-z], which would
drop the empty string and all length>=2 strings.
Hook the translator into seq_rewriter at mk_re_union0, mk_re_union,
mk_re_inter0, mk_re_inter, and mk_re_diff so that boolean combinations
of character classes always reduce to a single canonical range-set
form. mk_re_complement is intentionally not hooked.
Materialization uses the canonical (seq.unit (Char N)) bound form
(matching the rest of seq_rewriter) and right-associates the union
with operands sorted by expr_id so the result matches the invariant
expected by merge_regex_sets.
Unit tests in src/test/regex_range_collapse.cpp cover the recognized
fragment, the non-translatable cases, and round-trip identity for
multi-range predicates.
Corpus validation on bench/inputs/regex-equivalence (1523 .smt2):
- 0 soundness regressions vs derive baseline.
- Resolves 4 previously-soft-timeout files (now solved correctly).
- Resolves 1 pre-existing wrong answer (mut_0404: master/derive say
unsat, ground-truth annotation and Stage 3 say sat).
- Wall-time: -2.2% vs Stage-3 starting point, -1.5% vs derive.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement a new seq::derive class (seq_derive.h/cpp) that computes
symbolic derivatives of regular expressions using ITE-trees, based on
the RE# approach (Varatalu, Veanes, Ernits - POPL 2025).
Key features:
- Two-argument operator()(ele, r): computes derivative of regex r w.r.t.
element ele (concrete character or de Bruijn variable for symbolic mode)
- ACI canonicalization (flatten, stable_sort, dedup) for union/intersection
- ITE-tree combinators for binary/unary operations
- Info-based nullability with recursive fallback
- Complement absorption rules
- Depth-bounded recursion to prevent stack overflow
Integration with seq_rewriter:
- mk_derivative(ele, r) and mk_derivative(r) now delegate to m_derive
- Removed dead mk_derivative_rec function
- Added ITE hoisting in mk_re_star, mk_re_concat, mk_re_union0,
mk_re_inter0, mk_re_complement
- Added depth limiting in Antimirov derivative helpers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the algorithm of Eq(p,q) = Empty(p XOR q)' using a union-find
driven bisimulation closure (per the CAV'26 ERE paper).
### What's added
* **New primitive OP_RE_XOR (re.xor)** wired through seq_decl_plugin:
parser signature, info propagation (nullable, min_length), and
pretty-printer.
* **seq_rewriter**: structural XOR rewrites ( XOR r = empty, XOR empty =
r, ull XOR r = comp(r), comp/comp absorption, complement push, AC
normalisation), nullability (Null(p XOR q) = Null(p) != Null(q)),
derivative (D_a(p XOR q) = D_a(p) XOR D_a(q)), reverse, antimirov
derivative, and `check_deriv_normal_form` coverage.
* **New class seq::regex_bisim** in
`src/ast/rewriter/seq_regex_bisim.{h,cpp}` to keep the bisim logic out
of the already-large `seq_rewriter.cpp`. Uses `basic_union_find` from
`util/union_find.h`, an `obj_map` for the node assignment, and a
50000-step bound (returns `l_undef` on overrun).
* **Integration** in `seq_rewriter::reduce_re_eq` (with a re-entry
guard) and in `seq_regex::propagate_eq` / `propagate_ne` for ground
regexes; on `l_undef` we fall back to the existing axiomatisation.
* **`sls_seq_plugin`**: extend `OP_RE_DIFF` switch arms to also cover
`OP_RE_XOR`.
### Validation
* Full release build with MSVC + Ninja.
* `./test-z3 /a` -- 89/89 tests passing.
* `./test-z3 /seq smt2print_parse` -- PASS.
* Smoke tests with `(a|b)*` vs `(a*b*)*` (equal) and `a*` vs `(a|b)*`
(not equal) return the expected `sat`/`unsat` quickly.
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement a new seq::derive class (seq_derive.h/cpp) that computes
symbolic derivatives of regular expressions using ITE-trees, based on
the RE# approach (Varatalu, Veanes, Ernits - POPL 2025).
Key features:
- Two-argument operator()(ele, r): computes derivative of regex r w.r.t.
element ele (concrete character or de Bruijn variable for symbolic mode)
- ACI canonicalization (flatten, stable_sort, dedup) for union/intersection
- ITE-tree combinators for binary/unary operations
- Info-based nullability with recursive fallback
- Complement absorption rules
- Depth-bounded recursion to prevent stack overflow
Integration with seq_rewriter:
- mk_derivative(ele, r) and mk_derivative(r) now delegate to m_derive
- Removed dead mk_derivative_rec function
- Added ITE hoisting in mk_re_star, mk_re_concat, mk_re_union0,
mk_re_inter0, mk_re_complement
- Added depth limiting in Antimirov derivative helpers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* improved generated dgml graph
* fixed simplification of negated ranges and did some code cleanup
* do not make loops with lower=upper=0, this is epsilon
* do not add loops with lower=upper=1
* bug fix in normalization: forgotten eps case
* updates related to issue #5140
* updated/simplified some cases
* fixing feedback comments
* fixed comments and added missing case for get_re_head_tail_reversed
* two bug fixes and some other code improvements