3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-06-30 12:28:53 +00:00
Commit graph

149 commits

Author SHA1 Message Date
Margus Veanes
c0c826cf5f Stage 3: collapse boolean combinations of char-class regexes
Introduce src/ast/rewriter/regex_range_collapse.{h,cpp}, a translator
between the boolean-combination-of-character-class fragment of regexes
and the range_predicate value type added in Stage 2.

Recognized fragment (translates to range_predicate):
  re.empty, re.full_char, re.range, re.union, re.intersection, re.diff
of operands recursively in the fragment.  Range bounds are accepted in
three encodings: string constant ("a"), seq.unit of a const char
(seq.unit (Char 97)), and length-1 zstring literal.

NOT translated:
  re.complement -- this is sequence-level complement (Sigma* \ L), not
  character-class complement.  Translating it would incorrectly turn
  re.comp(re.range "a" "z") into the character class [^a-z], which would
  drop the empty string and all length>=2 strings.

Hook the translator into seq_rewriter at mk_re_union0, mk_re_union,
mk_re_inter0, mk_re_inter, and mk_re_diff so that boolean combinations
of character classes always reduce to a single canonical range-set
form.  mk_re_complement is intentionally not hooked.

Materialization uses the canonical (seq.unit (Char N)) bound form
(matching the rest of seq_rewriter) and right-associates the union
with operands sorted by expr_id so the result matches the invariant
expected by merge_regex_sets.

Unit tests in src/test/regex_range_collapse.cpp cover the recognized
fragment, the non-translatable cases, and round-trip identity for
multi-range predicates.

Corpus validation on bench/inputs/regex-equivalence (1523 .smt2):
- 0 soundness regressions vs derive baseline.
- Resolves 4 previously-soft-timeout files (now solved correctly).
- Resolves 1 pre-existing wrong answer (mut_0404: master/derive say
  unsat, ground-truth annotation and Stage 3 say sat).
- Wall-time: -2.2% vs Stage-3 starting point, -1.5% vs derive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-15 03:28:01 -07:00
Nikolaj Bjorner
898178fbe5 merge with master
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:54:33 -07:00
Nikolaj Bjorner
c9cd5147be merge
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:31:01 -07:00
copilot-swe-agent[bot]
1e906ba585 Remove is_nullable_rec from seq_rewriter, delegate to derive::nullable 2026-06-10 15:27:42 -07:00
copilot-swe-agent[bot]
bf9707a316 Address PR feedback on derive, nullability, and requested reverts 2026-06-10 15:26:40 -07:00
Nikolaj Bjorner
0e29a35da5 updates
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-10 15:25:05 -07:00
Nikolaj Bjorner
ff8a1034d6 Refactor seq_derive: inline path pruning with ACI normalization
Replace simplify_ite_rec post-hoc pass with inline path pruning:
- push/pop API with lbool return (l_true=implied, l_undef=pushed, l_false=contradicts)
- apply_ite hoists ITE through union/inter/complement with path-aware pruning
- Path-aware caching for mk_union, mk_inter, mk_complement
- Incremental path expression maintenance for cache keys
- Complement always pushes through ITE for same-condition merge
- ACI normalization (flatten/sort/deduplicate) for union base case
- is_subset subsumption prevents unbounded union growth
- Prefix factoring (a·x ∪ a·y = a·(x ∪ y)) for loop derivatives
- seq_rewriter passed as reference to derive class
- Depth-limited single-ITE hoisting (path_stack.size() < 8)
- pred_implies with signed atoms avoids mk_not allocations
- extract_char_range properly checks m_ele identity

Results: 0 timeouts on regression suite (vs 2 on master).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:24:57 -07:00
Nikolaj Bjorner
e74d2d2151 move seq_derive and fix include paths, remove antimirov code 2026-06-10 15:23:46 -07:00
Nikolaj Bjorner
52c7e89c31 Add seq::derive class for symbolic regex derivatives
Implement a new seq::derive class (seq_derive.h/cpp) that computes
symbolic derivatives of regular expressions using ITE-trees, based on
the RE# approach (Varatalu, Veanes, Ernits - POPL 2025).

Key features:
- Two-argument operator()(ele, r): computes derivative of regex r w.r.t.
  element ele (concrete character or de Bruijn variable for symbolic mode)
- ACI canonicalization (flatten, stable_sort, dedup) for union/intersection
- ITE-tree combinators for binary/unary operations
- Info-based nullability with recursive fallback
- Complement absorption rules
- Depth-bounded recursion to prevent stack overflow

Integration with seq_rewriter:
- mk_derivative(ele, r) and mk_derivative(r) now delegate to m_derive
- Removed dead mk_derivative_rec function
- Added ITE hoisting in mk_re_star, mk_re_concat, mk_re_union0,
  mk_re_inter0, mk_re_complement
- Added depth limiting in Antimirov derivative helpers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 15:21:48 -07:00
Margus Veanes
513b81253b
Add OP_RE_XOR and union-find bisimulation for ground regex equivalence (#9804)
Implements the algorithm of Eq(p,q) = Empty(p XOR q)' using a union-find
driven bisimulation closure (per the CAV'26 ERE paper).

### What's added

* **New primitive OP_RE_XOR (re.xor)** wired through seq_decl_plugin:
parser signature, info propagation (nullable, min_length), and
pretty-printer.
* **seq_rewriter**: structural XOR rewrites ( XOR r = empty, XOR empty =
r, ull XOR r = comp(r), comp/comp absorption, complement push, AC
normalisation), nullability (Null(p XOR q) = Null(p) != Null(q)),
derivative (D_a(p XOR q) = D_a(p) XOR D_a(q)), reverse, antimirov
derivative, and `check_deriv_normal_form` coverage.
* **New class seq::regex_bisim** in
`src/ast/rewriter/seq_regex_bisim.{h,cpp}` to keep the bisim logic out
of the already-large `seq_rewriter.cpp`. Uses `basic_union_find` from
`util/union_find.h`, an `obj_map` for the node assignment, and a
50000-step bound (returns `l_undef` on overrun).
* **Integration** in `seq_rewriter::reduce_re_eq` (with a re-entry
guard) and in `seq_regex::propagate_eq` / `propagate_ne` for ground
regexes; on `l_undef` we fall back to the existing axiomatisation.
* **`sls_seq_plugin`**: extend `OP_RE_DIFF` switch arms to also cover
`OP_RE_XOR`.

### Validation

* Full release build with MSVC + Ninja.
* `./test-z3 /a` -- 89/89 tests passing.
* `./test-z3 /seq smt2print_parse` -- PASS.
* Smoke tests with `(a|b)*` vs `(a*b*)*` (equal) and `a*` vs `(a|b)*`
(not equal) return the expected `sat`/`unsat` quickly.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-10 14:58:20 -07:00
copilot-swe-agent[bot]
b6a29b800b
Remove is_nullable_rec from seq_rewriter, delegate to derive::nullable 2026-06-10 18:53:55 +00:00
copilot-swe-agent[bot]
00fcd3a36d
Address PR feedback on derive, nullability, and requested reverts 2026-06-10 18:18:46 +00:00
Nikolaj Bjorner
77ac58484f updates
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-09 17:42:11 -07:00
Nikolaj Bjorner
f02391a01f Merge branch 'master' of https://github.com/z3prover/z3 into derive 2026-06-09 15:18:29 -07:00
Copilot
f0956a622f
Refactor regex subset logic into seq_subset with depth-bounded recursion and optimized concat traversal (#9777)
`seq_rewriter::is_subset` was too localized and missed key subset
implications for regex concatenations. This change extracts subset
reasoning into a dedicated component and adds heuristic
closure/monotonicity rules, then tunes the recursion strategy based on
profiling feedback.

- **Architecture: isolate subset reasoning**
  - Introduce `seq_subset` in `src/ast/rewriter` (`seq_subset.h/.cpp`).
- Add `seq_subset` as an attribute on `seq_rewriter` and route
`seq_rewriter::is_subset` through it.
- Keep `seq_rewriter` focused on rewrite orchestration while subset
logic evolves independently.

- **Subset rules: broaden inferable cases**
- Add derive-style subset decomposition across `union`, `intersection`,
`complement`, `concat`, and bounded `loop`.
  - Add E3-style closure rules:
    - `R ⊆ R*`
    - `R1* ⊆ R2*  ⇐  R1 ⊆ R2`
    - `R1+ ⊆ R2+  ⇐  R1 ⊆ R2`
  - Add missing cheap cases:
    - `ε ⊆ R` when `R` is nullable
    - `R ⊆ R+`
    - `R+ ⊆ R*`
    - Range containment: `[c1–c2] ⊆ [c3–c4]` when `c3 ≤ c1 ∧ c2 ≤ c4`
    - `to_re(s) ⊆ range` for single-character string constants
    - Difference monotonicity: `a1 \ a2 ⊆ b` when `a1 ⊆ b`
- Star absorption checks for concat/star combinations (`R·R* ⊆ R*`,
`R*·R ⊆ R*`)
- Preserve nullable-based `. +` handling and top/bottom regular-language
shortcuts.

- **Concatenation reasoning and traversal tuning**
- Remove `flatten_concat` and assume right-associative concatenation
traversal.
- Keep containment shortcuts for both `R ⊆ Σ*·R'` and `R ⊆ R'·Σ*` when
`R ⊆ R'`.
  - Make concat/concat handling tail-recursive on second arguments.

- **Depth-bounded recursion (profiling follow-up)**
- Replace visited-pair hash-table recursion state with an explicit depth
parameter in `is_subset_rec`.
  - Add `m_max_depth = 3` and return `false` when the bound is reached.
- Increment depth on recursive calls, except for the tail-recursive
concat-second-argument step.

- **Build integration**
  - Register `seq_subset.cpp` in `src/ast/rewriter/CMakeLists.txt`.

```cpp
// seq_rewriter.cpp
bool seq_rewriter::is_subset(expr* r1, expr* r2) const {
    return m_subset.is_subset(r1, r2);
}
```

---------

Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-06-09 13:42:28 -07:00
Nikolaj Bjorner
8deac03ca8 Refactor seq_derive: inline path pruning with ACI normalization
Replace simplify_ite_rec post-hoc pass with inline path pruning:
- push/pop API with lbool return (l_true=implied, l_undef=pushed, l_false=contradicts)
- apply_ite hoists ITE through union/inter/complement with path-aware pruning
- Path-aware caching for mk_union, mk_inter, mk_complement
- Incremental path expression maintenance for cache keys
- Complement always pushes through ITE for same-condition merge
- ACI normalization (flatten/sort/deduplicate) for union base case
- is_subset subsumption prevents unbounded union growth
- Prefix factoring (a·x ∪ a·y = a·(x ∪ y)) for loop derivatives
- seq_rewriter passed as reference to derive class
- Depth-limited single-ITE hoisting (path_stack.size() < 8)
- pred_implies with signed atoms avoids mk_not allocations
- extract_char_range properly checks m_ele identity

Results: 0 timeouts on regression suite (vs 2 on master).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-08 20:43:43 -07:00
Nikolaj Bjorner
cb2cf913e3 move seq_derive and fix include paths, remove antimirov code 2026-06-03 11:04:19 -07:00
Nikolaj Bjorner
1f28fd0e6b Add seq::derive class for symbolic regex derivatives
Implement a new seq::derive class (seq_derive.h/cpp) that computes
symbolic derivatives of regular expressions using ITE-trees, based on
the RE# approach (Varatalu, Veanes, Ernits - POPL 2025).

Key features:
- Two-argument operator()(ele, r): computes derivative of regex r w.r.t.
  element ele (concrete character or de Bruijn variable for symbolic mode)
- ACI canonicalization (flatten, stable_sort, dedup) for union/intersection
- ITE-tree combinators for binary/unary operations
- Info-based nullability with recursive fallback
- Complement absorption rules
- Depth-bounded recursion to prevent stack overflow

Integration with seq_rewriter:
- mk_derivative(ele, r) and mk_derivative(r) now delegate to m_derive
- Removed dead mk_derivative_rec function
- Added ITE hoisting in mk_re_star, mk_re_concat, mk_re_union0,
  mk_re_inter0, mk_re_complement
- Added depth limiting in Antimirov derivative helpers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-03 10:36:19 -07:00
Nikolaj Bjorner
fcd3a70c92 remove theory_str and classes that are only used by it 2025-08-07 21:05:12 -07:00
Nikolaj Bjorner
99ec42c0d7 additional simplifications to seq 2025-03-19 08:57:31 -10:00
Nikolaj Bjorner
c6f58c8bf7 updates to some_string_in_re per code review comments
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2025-01-11 17:47:27 -08:00
Clemens Eisenhofer
c572fc2e4f
Regex membership (#7506)
* Make finding a word in the regex iterative

* Fixed gc problem
2025-01-11 17:41:37 -08:00
Nikolaj Bjorner
2dd4faf598 sketch expr_inverter approach for eliminating unconstrained regex containment. 2025-01-07 16:53:57 -08:00
Bruce Mitchener
53f89a81c1
Fix some typos. (#7115) 2024-02-07 23:06:43 -08:00
Nikolaj Bjorner
d1c7ff1a36 add unconstrained elimination for sequences 2023-03-20 17:07:04 +01:00
Nikolaj Bjorner
a409a4a677 enforce flat within QF_BV tactic, cap in-processing var-elim loops 2022-10-27 20:10:55 -07:00
Bruce Mitchener
5014b1a34d Use = default for virtual constructors. 2022-08-05 18:11:46 +03:00
Nikolaj Bjorner
4d8e4b5bd3 fix #6052
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2022-05-25 17:21:01 -04:00
Nikolaj Bjorner
c850259f89 rw 2022-05-22 07:54:27 -04:00
Nikolaj Bjorner
87d2a3b4e5 map/mapi/foldl/foldli
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2022-05-04 01:10:18 -07:00
Nikolaj Bjorner
773e829c58 #5804 2022-01-31 10:16:09 -08:00
Margus Veanes
5afb95b34a
improved subset checking for regexes with counters (#5731) 2021-12-22 17:53:34 -08:00
Margus Veanes
a7b1db611c
State graph dgml update and fixes in condition simplifier (#5721)
* improved generated dgml graph

* fixed simplification of negated ranges and did some code cleanup

* do not make loops with lower=upper=0, this is epsilon

* do not add loops with lower=upper=1

* bug fix in normalization: forgotten eps case
2021-12-19 11:09:55 -08:00
Margus Veanes
a288f9048a
Update regex union and intersection to maintain ANF (#5717)
* added merge for unions and intersections

* added normalization rules to ensure ANF

* fixing PR comments related to merge
2021-12-16 19:19:36 -08:00
Margus Veanes
2be93870c8
Cleanup regex info and some fixes in Derivative code (#5709)
* removed unused regex info fields

* cleanup of info and fixes in antimirov derivatives

* removed extra qualification on operator
2021-12-15 10:59:34 -08:00
Nikolaj Bjorner
51fa40ece5 fix spelling 2021-12-09 10:23:37 -08:00
Margus Veanes
146f4621c5
Updated regex derivative engine (#5567)
* updated derivative engine

* some edit

* further improvements in derivative code

* more deriv code edits and re::to_str update

* optimized mk_deriv_accept

* fixed PR comments

* small syntax fix

* updated some simplifications

* bugfix:forgot to_re before reverse

* fixed PR comments

* more PR comment fixes

* more PR comment fixes

* forgot to delete

* deleting unused definition

* fixes

Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>

* fixes

Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>

Co-authored-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2021-10-08 13:04:49 -07:00
Margus Veanes
225204e2f4
updates related to issue #5140 (#5463)
* updates related to issue #5140

* updated/simplified some cases

* fixing feedback comments

* fixed comments and added missing case for get_re_head_tail_reversed

* two bug fixes and some other code improvements
2021-08-09 10:48:56 -07:00
CEisenhofer
0fa4b63d26
Added sbv2s (#5413)
* Added sbv2s

* Fixed indention

Co-authored-by: Clemens Eisenhofer <Clemens.Eisenhofer@tuwien.ac.at>
2021-07-16 17:58:28 +02:00
Nikolaj Bjorner
1bc10cebc5 add ubv2s step 1 2021-07-12 12:53:00 +02:00
Nikolaj Bjorner
4a6083836a call it data instead of c_ptr for approaching C++11 std::vector convention. 2021-04-13 18:17:35 -07:00
Nikolaj Bjorner
804f065215 fixes for #4688
https://github.com/Z3Prover/z3/issues/4866#issuecomment-778721073
2021-04-11 17:42:12 -07:00
Nuno Lopes
c47ab023e5 remove a few trivial destructors so they get inlined 2021-04-04 17:13:59 +01:00
Nikolaj Bjorner
377d060036 move to separate axiom management
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2021-02-23 18:09:45 -08:00
Nikolaj Bjorner
27584d68db more rewrite rules
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2021-02-18 22:14:53 -08:00
Nikolaj Bjorner
b2eb248bad fixes, fix #5034
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2021-02-18 16:47:44 -08:00
Nikolaj Bjorner
9ae3339c33 fixes
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2021-02-18 12:33:17 -08:00
Nikolaj Bjorner
e63dc7efc2 more rewrite rules 2021-02-17 17:32:00 -08:00
Nikolaj Bjorner
70b4822571 patch seq theory using purification to avoid unsoundness caused by interaction with canonization and rewriting
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2021-02-14 17:41:06 -08:00
Nikolaj Bjorner
3ae4c6e9de refactor get_sort 2021-02-02 04:45:54 -08:00