3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-06-19 23:26:30 +00:00
Commit graph

1 commit

Author SHA1 Message Date
Margus Veanes
c0c826cf5f Stage 3: collapse boolean combinations of char-class regexes
Introduce src/ast/rewriter/regex_range_collapse.{h,cpp}, a translator
between the boolean-combination-of-character-class fragment of regexes
and the range_predicate value type added in Stage 2.

Recognized fragment (translates to range_predicate):
  re.empty, re.full_char, re.range, re.union, re.intersection, re.diff
of operands recursively in the fragment.  Range bounds are accepted in
three encodings: string constant ("a"), seq.unit of a const char
(seq.unit (Char 97)), and length-1 zstring literal.

NOT translated:
  re.complement -- this is sequence-level complement (Sigma* \ L), not
  character-class complement.  Translating it would incorrectly turn
  re.comp(re.range "a" "z") into the character class [^a-z], which would
  drop the empty string and all length>=2 strings.

Hook the translator into seq_rewriter at mk_re_union0, mk_re_union,
mk_re_inter0, mk_re_inter, and mk_re_diff so that boolean combinations
of character classes always reduce to a single canonical range-set
form.  mk_re_complement is intentionally not hooked.

Materialization uses the canonical (seq.unit (Char N)) bound form
(matching the rest of seq_rewriter) and right-associates the union
with operands sorted by expr_id so the result matches the invariant
expected by merge_regex_sets.

Unit tests in src/test/regex_range_collapse.cpp cover the recognized
fragment, the non-translatable cases, and round-trip identity for
multi-range predicates.

Corpus validation on bench/inputs/regex-equivalence (1523 .smt2):
- 0 soundness regressions vs derive baseline.
- Resolves 4 previously-soft-timeout files (now solved correctly).
- Resolves 1 pre-existing wrong answer (mut_0404: master/derive say
  unsat, ground-truth annotation and Stage 3 say sat).
- Wall-time: -2.2% vs Stage-3 starting point, -1.5% vs derive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-15 03:28:01 -07:00