mirror of https://github.com/Z3Prover/z3 synced 2026-07-01 12:58:54 +00:00

Stage 3: collapse boolean combinations of char-class regexes

Introduce src/ast/rewriter/regex_range_collapse.{h,cpp}, a translator
between the fragment of regexes formed by boolean combinations of
character classes and the range_predicate value type added in Stage 2.

Recognized fragment (translates to range_predicate):
  re.empty, re.full_char, re.range, and re.union, re.intersection,
  re.diff whose operands are themselves (recursively) in the fragment.
Range bounds are accepted in three encodings: a string constant ("a"),
a seq.unit of a constant char (seq.unit (Char 97)), and a length-1
zstring literal.
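Each regex in the recognized fragment denotes a finite set of character ranges, and union, intersection and diff map to set operations on those ranges. The following is a minimal sketch under simplified assumptions: RangeSet, translate, kMaxChar, the Re node type and its helpers are all hypothetical stand-ins (not the API of regex_range_collapse.h or of range_predicate), and range bounds are assumed to be already decoded from any of the three encodings into code points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for Stage 2's range_predicate: a sorted list of
// disjoint, non-adjacent, inclusive [lo, hi] character ranges.
using CharRange = std::pair<uint32_t, uint32_t>;
using RangeSet = std::vector<CharRange>;

// Assumed upper bound of the character alphabet (illustrative only).
constexpr uint32_t kMaxChar = 0x2FFFF;

// Sort, drop empty ranges (lo > hi), and merge overlapping or adjacent ones.
RangeSet normalize(RangeSet rs) {
    std::sort(rs.begin(), rs.end());
    RangeSet out;
    for (const auto& [lo, hi] : rs) {
        if (lo > hi) continue;
        if (!out.empty() && lo <= out.back().second + 1)
            out.back().second = std::max(out.back().second, hi);
        else
            out.emplace_back(lo, hi);
    }
    return out;
}

RangeSet unite(const RangeSet& a, const RangeSet& b) {
    RangeSet all(a);
    all.insert(all.end(), b.begin(), b.end());
    return normalize(all);
}

// Two-pointer intersection of two normalized range sets.
RangeSet intersect(const RangeSet& a, const RangeSet& b) {
    RangeSet out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        uint32_t lo = std::max(a[i].first, b[j].first);
        uint32_t hi = std::min(a[i].second, b[j].second);
        if (lo <= hi) out.emplace_back(lo, hi);
        if (a[i].second < b[j].second) ++i; else ++j;
    }
    return out;
}

// Complement within the character alphabet [0, kMaxChar]. Used only to
// compute re.diff; this is NOT the sequence-level re.complement.
RangeSet char_complement(const RangeSet& a) {
    RangeSet out;
    uint32_t next = 0;
    for (const auto& [lo, hi] : a) {
        if (lo > next) out.emplace_back(next, lo - 1);
        next = hi + 1;
    }
    if (next <= kMaxChar) out.emplace_back(next, kMaxChar);
    return out;
}

// Toy regex AST: the recognized kinds plus two out-of-fragment ones.
struct Re {
    enum Kind { Empty, FullChar, Range, Union, Inter, Diff, Complement, Star } kind;
    uint32_t lo = 0, hi = 0;        // bounds, used by Range only
    std::shared_ptr<Re> a, b;       // operands
};
using ReP = std::shared_ptr<Re>;
ReP mk(Re::Kind k, ReP a = nullptr, ReP b = nullptr) {
    return std::make_shared<Re>(Re{k, 0, 0, a, b});
}
ReP mk_range(uint32_t lo, uint32_t hi) {
    return std::make_shared<Re>(Re{Re::Range, lo, hi, nullptr, nullptr});
}

// Range set of a regex in the fragment, or nullopt if any sub-term is
// outside it (e.g. re.complement or re.*).
std::optional<RangeSet> translate(const Re& r) {
    switch (r.kind) {
    case Re::Empty:    return RangeSet{};
    case Re::FullChar: return RangeSet{{0, kMaxChar}};
    case Re::Range:    return normalize(RangeSet{{r.lo, r.hi}});
    case Re::Union:
    case Re::Inter:
    case Re::Diff: {
        assert(r.a && r.b);
        auto x = translate(*r.a);
        if (!x) return std::nullopt;
        auto y = translate(*r.b);
        if (!y) return std::nullopt;
        if (r.kind == Re::Union) return unite(*x, *y);
        if (r.kind == Re::Inter) return intersect(*x, *y);
        return intersect(*x, char_complement(*y));
    }
    default:
        return std::nullopt;
    }
}
```

For example, the union of a-z and A-Z translates to the single normalized list [65,90],[97,122] whichever operand comes first, and a-z minus m yields [97,108],[110,122]. Keeping the set sorted and merged is what makes the result canonical, so two different boolean spellings of the same class compare equal.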

NOT translated:
  re.complement -- this is sequence-level complement (Sigma* \ L), not
  character-class complement.  Translating it would incorrectly turn
  re.comp(re.range "a" "z") into the character class [^a-z], which would
  drop the empty string and all length>=2 strings.
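The soundness issue can be seen by comparing membership directly. This sketch uses hypothetical helper names over ASCII strings, for illustration only. It contrasts the sequence-level complement of re.range "a" "z" with the character class [^a-z]:

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers over ASCII strings, for illustration only.

// L(re.range "a" "z"): exactly one character in a..z.
bool matches_range_a_z(const std::string& s) {
    return s.size() == 1 && s[0] >= 'a' && s[0] <= 'z';
}

// Sequence-level complement re.comp(re.range "a" "z") = Sigma* \ L:
// every string the range does NOT match, including "" and "ab".
bool matches_seq_complement(const std::string& s) {
    return !matches_range_a_z(s);
}

// Character-class complement [^a-z]: exactly one character outside a..z.
bool matches_class_complement(const std::string& s) {
    return s.size() == 1 && !matches_range_a_z(s);
}
```

The two agree on every length-1 string and differ on everything else. The character-class complement is exactly the intersection of re.comp R with the set of single characters, which is why it remains expressible inside the fragment as the diff of re.full_char and R.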

Hook the translator into seq_rewriter at mk_re_union0, mk_re_union,
mk_re_inter0, mk_re_inter, and mk_re_diff so that boolean combinations
of character classes always reduce to a single canonical range-set
form.  mk_re_complement is intentionally not hooked.

Materialization uses the canonical (seq.unit (Char N)) bound form
(matching the rest of seq_rewriter) and right-associates the union
with operands sorted by expr_id so the result matches the invariant
expected by merge_regex_sets.
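A sketch of that materialization order follows. It assumes ascending expr_id order and uses a toy Term type in place of Z3 expressions. range_text and materialize_union are hypothetical names, and the real code builds expressions rather than strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an AST node: `id` plays the role of Z3's expr id and
// `text` is the printed form. Both are hypothetical for this sketch.
struct Term {
    unsigned id;
    std::string text;
};

// Print one inclusive range using the (seq.unit (Char N)) bound form.
std::string range_text(uint32_t lo, uint32_t hi) {
    return "(re.range (seq.unit (Char " + std::to_string(lo) +
           ")) (seq.unit (Char " + std::to_string(hi) + ")))";
}

// Sort operands by id (ascending, an assumption) and right-associate:
// t1 U (t2 U (... U tn)). The caller is assumed to handle the empty set.
std::string materialize_union(std::vector<Term> ops) {
    assert(!ops.empty());
    std::sort(ops.begin(), ops.end(),
              [](const Term& x, const Term& y) { return x.id < y.id; });
    std::string result = ops.back().text;
    for (std::size_t i = ops.size() - 1; i-- > 0;)
        result = "(re.union " + ops[i].text + " " + result + ")";
    return result;
}
```

An expression's id does not change once it is created, so sorting by id gives the same nesting regardless of the order in which the operands arrived. For instance, materialize_union({{7, "c"}, {3, "a"}, {5, "b"}}) produces (re.union a (re.union b c)).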

Unit tests in src/test/regex_range_collapse.cpp cover the recognized
fragment, the non-translatable cases, and round-trip identity for
multi-range predicates.

Corpus validation on bench/inputs/regex-equivalence (1523 .smt2 files):
- 0 soundness regressions vs the derive baseline.
- Resolves 4 files that previously hit the soft timeout (now solved
  correctly).
- Resolves 1 pre-existing wrong answer (mut_0404: master/derive say
  unsat; the ground-truth annotation and Stage 3 say sat).
- Wall time: -2.2% vs the Stage 3 starting point, -1.5% vs derive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Margus Veanes 2026-06-14 18:49:31 -07:00
parent bcdbc80ab8
commit c0c826cf5f
8 changed files with 500 additions and 1 deletion


@@ -250,6 +250,14 @@ class seq_rewriter {
 br_status mk_re_union0(expr* a, expr* b, expr_ref& result);
 br_status mk_re_inter0(expr* a, expr* b, expr_ref& result);
 br_status mk_re_complement(expr* a, expr_ref& result);
+// Range-set collapse helpers: if the operands form a boolean
+// combination of character-class regexes, materialize the result as a
+// canonical regex over a single range_predicate. See
+// ast/rewriter/regex_range_collapse.h for the recognized fragment.
+// NOTE: re.complement is intentionally not in this set because it
+// operates at the sequence level, not the character-class level.
+bool try_collapse_re_union(expr* a, expr* b, expr_ref& result);
+bool try_collapse_re_inter(expr* a, expr* b, expr_ref& result);
 br_status mk_re_star(expr* a, expr_ref& result);
 br_status mk_re_diff(expr* a, expr* b, expr_ref& result);
 br_status mk_re_xor(expr* a, expr* b, expr_ref& result);