Mirror of https://github.com/Z3Prover/z3 (synced 2026-07-01 12:58:54 +00:00)
Stage 3: collapse boolean combinations of char-class regexes

Introduce src/ast/rewriter/regex_range_collapse.{h,cpp}, a translator
between the boolean-combination-of-character-class fragment of regexes
and the range_predicate value type added in Stage 2.

Recognized fragment (translates to range_predicate):
re.empty, re.full_char, re.range, and re.union, re.intersection, re.diff
whose operands are themselves in the fragment. Range bounds are accepted
in three encodings: a string constant ("a"), a seq.unit of a constant
char (seq.unit (Char 97)), and a length-1 zstring literal.
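
To make the recognized fragment concrete, here is a minimal sketch of how
union, intersection, and difference can be evaluated on sets of character
ranges. The names range_set, normalize, set_union, set_inter, set_diff,
and char_complement are hypothetical and are not the range_predicate API
from Stage 2; the alphabet bound MAX_CHAR = 0x2FFFF is likewise an
assumption made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a character-class value; NOT Z3's range_predicate.
using interval  = std::pair<uint32_t, uint32_t>;  // inclusive [lo, hi]
using range_set = std::vector<interval>;          // canonical: sorted, disjoint, non-adjacent

constexpr uint32_t MAX_CHAR = 0x2FFFF;            // assumed largest character code

// Sort, then merge overlapping or adjacent intervals into canonical form.
range_set normalize(range_set rs) {
    std::sort(rs.begin(), rs.end());
    range_set out;
    for (interval const& iv : rs) {
        assert(iv.first <= iv.second && iv.second <= MAX_CHAR);
        if (!out.empty() && iv.first <= out.back().second + 1)
            out.back().second = std::max(out.back().second, iv.second);
        else
            out.push_back(iv);
    }
    return out;
}

// Union: concatenate both interval lists and re-canonicalize.
range_set set_union(range_set a, range_set const& b) {
    a.insert(a.end(), b.begin(), b.end());
    return normalize(std::move(a));
}

// Intersection of two canonical sets with a two-pointer sweep.
range_set set_inter(range_set const& a, range_set const& b) {
    range_set out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        uint32_t lo = std::max(a[i].first, b[j].first);
        uint32_t hi = std::min(a[i].second, b[j].second);
        if (lo <= hi) out.push_back({lo, hi});
        if (a[i].second < b[j].second) ++i; else ++j;
    }
    return out;
}

// Character-level complement within [0, MAX_CHAR]; input must be canonical.
range_set char_complement(range_set const& a) {
    range_set out;
    uint64_t next = 0;  // smallest character not yet accounted for
    for (interval const& iv : a) {
        if (iv.first > next)
            out.push_back({static_cast<uint32_t>(next), iv.first - 1});
        next = static_cast<uint64_t>(iv.second) + 1;
    }
    if (next <= MAX_CHAR)
        out.push_back({static_cast<uint32_t>(next), MAX_CHAR});
    return out;
}

// Difference a \ b, expressed as a intersected with the complement of b.
range_set set_diff(range_set const& a, range_set const& b) {
    return set_inter(a, char_complement(b));
}
```

In this sketch the union of [a-m] and [n-z] normalizes to the single
interval [a-z], which is the kind of canonical single-value result the
translator aims for.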

NOT translated:
re.complement -- this is sequence-level complement (Sigma* \ L), not
character-class complement. Translating it would incorrectly turn
re.comp(re.range "a" "z") into the character class [^a-z], which would
drop the empty string and all length>=2 strings.
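
The difference between the two complements can be checked on plain
strings. The sketch below is illustrative only: the three predicates are
hypothetical helpers over std::string and do not call into Z3.

```cpp
#include <string>

// Membership in re.range "a" "z": exactly one character in [a-z].
bool in_range_az(std::string const& s) {
    return s.size() == 1 && 'a' <= s[0] && s[0] <= 'z';
}

// Sequence-level complement (Sigma* \ L): every string not in the range language.
bool in_seq_complement(std::string const& s) {
    return !in_range_az(s);
}

// Character-class complement [^a-z]: still exactly one character, but outside [a-z].
bool in_char_class_complement(std::string const& s) {
    return s.size() == 1 && !('a' <= s[0] && s[0] <= 'z');
}
```

The empty string and "ab" belong to the sequence-level complement but not
to [^a-z], while "A" belongs to both and "q" to neither.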

Hook the translator into seq_rewriter at mk_re_union0, mk_re_union,
mk_re_inter0, mk_re_inter, and mk_re_diff so that boolean combinations
of character classes always reduce to a single canonical range-set
form. mk_re_complement is intentionally not hooked.

Materialization uses the canonical (seq.unit (Char N)) bound form
(matching the rest of seq_rewriter) and right-associates the union
with operands sorted by expr_id so the result matches the invariant
expected by merge_regex_sets.
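
The shape of the materialized term can be sketched on strings. Everything
here is hypothetical: node stands in for an AST node with an id,
range_text and right_assoc_union are illustrative names, and returning
re.empty for an empty operand list is an assumption of the sketch rather
than something the commit states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node: an id plus its printed form.
struct node {
    unsigned    id;
    std::string text;
};

// Print one range using the (seq.unit (Char N)) bound form.
std::string range_text(unsigned lo, unsigned hi) {
    assert(lo <= hi);
    return "(re.range (seq.unit (Char " + std::to_string(lo) +
           ")) (seq.unit (Char " + std::to_string(hi) + ")))";
}

// Sort operands by id, then fold from the right:
// (re.union r1 (re.union r2 r3)).
std::string right_assoc_union(std::vector<node> ops) {
    if (ops.empty())
        return "re.empty";  // assumption: the empty set prints as re.empty
    std::sort(ops.begin(), ops.end(),
              [](node const& a, node const& b) { return a.id < b.id; });
    std::string acc = ops.back().text;
    for (std::size_t i = ops.size() - 1; i-- > 0; )
        acc = "(re.union " + ops[i].text + " " + acc + ")";
    return acc;
}
```

Sorting before folding means the same set of operands always prints the
same way regardless of the order in which they were collected.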

Unit tests in src/test/regex_range_collapse.cpp cover the recognized
fragment, the non-translatable cases, and round-trip identity for
multi-range predicates.

Corpus validation on bench/inputs/regex-equivalence (1523 .smt2):
- 0 soundness regressions vs derive baseline.
- Resolves 4 previously-soft-timeout files (now solved correctly).
- Resolves 1 pre-existing wrong answer (mut_0404: master/derive say
  unsat, ground-truth annotation and Stage 3 say sat).
- Wall-time: -2.2% vs Stage-3 starting point, -1.5% vs derive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

parent bcdbc80ab8
commit c0c826cf5f
8 changed files with 500 additions and 1 deletions

@@ -250,6 +250,14 @@ class seq_rewriter {
     br_status mk_re_union0(expr* a, expr* b, expr_ref& result);
     br_status mk_re_inter0(expr* a, expr* b, expr_ref& result);
     br_status mk_re_complement(expr* a, expr_ref& result);
+    // Range-set collapse helpers: if the operands form a boolean
+    // combination of character-class regexes, materialize the result as a
+    // canonical regex over a single range_predicate. See
+    // ast/rewriter/regex_range_collapse.h for the recognized fragment.
+    // NOTE: re.complement is intentionally not in this set because it
+    // operates at the sequence level, not the character-class level.
+    bool try_collapse_re_union(expr* a, expr* b, expr_ref& result);
+    bool try_collapse_re_inter(expr* a, expr* b, expr_ref& result);
     br_status mk_re_star(expr* a, expr_ref& result);
     br_status mk_re_diff(expr* a, expr* b, expr_ref& result);
     br_status mk_re_xor(expr* a, expr* b, expr_ref& result);