3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-05-20 08:59:34 +00:00
z3/research/docs/nseq-issues/09-crash-word-equation-regex.md
Copilot f518faac9b
Add nseq issue drafts from Ostrich benchmark analysis (discussion #9071) (#9073)
Agent-Logs-Url: https://github.com/Z3Prover/z3/sessions/04321ea7-2a53-4ed5-9f43-816dc6f7476b

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: NikolajBjorner <3085284+NikolajBjorner@users.noreply.github.com>
2026-03-20 19:07:51 -07:00

117 lines
4.8 KiB
Markdown

# [nseq] Crashes: word equation + regex benchmarks (various operations)
**Labels**: bug, c3, nseq, crash
## Summary
The nseq solver crashes or errors on a variety of benchmarks involving combinations of
word equations with regex constraints, contains, replace_all, and SLIA arithmetic.
These represent missing or incomplete operation support in the nseq solver.
## Affected benchmarks
| File | seq verdict | nseq verdict | Operations involved |
|------|-------------|--------------|---------------------|
| `concat-001.smt2` | sat | **bug/crash** | word equation + regex (old syntax) |
| `contains-1.smt2` | unsat | **bug/crash** | contains in equality context |
| `contains-7.smt2` | sat | **bug/crash** | str.replace_all |
| `nonlinear-2.smt2` | sat | **bug/crash** | `a = b ++ b` + regex |
| `noodles-unsat8.smt2` | unknown | **bug/crash** | complex word equation |
| `noodles-unsat10.smt2` | unsat | **bug/crash** | complex word equation |
| `norn-benchmark-9i.smt2` | sat | **bug/crash** | old-syntax regex + complex |
| `pcp-1.smt2` | sat | **bug/crash** | SLIA + complex string ops |
Data from: https://github.com/Z3Prover/z3/discussions/9071
## Reproducing examples
```smt2
; concat-001.smt2 — EXPECTED: sat, nseq crashes
; Uses old-style str.in.re syntax and unambiguous regex constraints
(declare-const x String)
(declare-const y String)
(declare-const z String)
(assert (str.in.re x (re.* (str.to.re "ab"))))
(assert (str.in.re y (re.* (str.to.re "c"))))
(assert (str.in.re z (str.to.re "cccc")))
(assert (= z (str.++ x y)))
(check-sat)
```
```smt2
; contains-1.smt2 — EXPECTED: unsat, nseq crashes
; str.contains used inside an equality with a boolean literal
(set-logic QF_S)
(declare-const x String)
(assert (not (= (str.contains "A" x) true)))
(check-sat)
```
```smt2
; contains-7.smt2 — EXPECTED: sat, nseq crashes
; str.replace_all used with regex constraints
(set-logic QF_S)
(declare-const x String)
(declare-const y String)
(declare-const z String)
(declare-const res String)
(assert (= res (str.replace_all x y z)))
(assert (str.in_re x (re.* (str.to_re "ab"))))
(assert (not (str.in_re y (re.* (str.to_re "ab")))))
(assert (not (= res "")))
(check-sat)
```
```smt2
; nonlinear-2.smt2 — EXPECTED: sat, nseq crashes
; Non-linear word equation: a = b ++ b (self-concatenation)
(set-logic QF_S)
(declare-fun a () String)
(declare-fun b () String)
(assert (= a (str.++ b b)))
(assert (str.in.re a (re.++ (re.* (str.to.re "(")) (re.+ (str.to.re ")")))))
(check-sat)
```
## Analysis of individual crash causes
### concat-001.smt2
Uses the old SMT-LIB `str.in.re` / `str.to.re` syntax with a word equation
`z = x ++ y` where all three variables have concrete regex constraints. If the nseq
solver's parser or internalization does not correctly handle the old-style `str.to.re`
synonym, the membership constraint may not be registered correctly.
### contains-1.smt2
The formula `(not (= (str.contains "A" x) true))` creates a Boolean atom of the form
`(= <bool-expr> true)` rather than the direct `str.contains` predicate.
In `assign_eh`, the direct check `is_contains(e)` will NOT match `(= (str.contains "A" x) true)`.
The catch-all `push_unhandled_pred()` fires, but the outer negation means the atom
`(= (str.contains "A" x) true)` is assigned to false. This should cause `FC_GIVEUP`,
but may manifest as a crash if the solver mishandles the boolean variable assignment.
**Fix**: Normalize `(= <bool-expr> true/false)` patterns in `assign_eh` or during
internalization, or add specific handling for `str.contains` embedded in boolean equalities.
### contains-7.smt2
Uses `str.replace_all` which is handled by `replace_all_axiom`. The crash may occur
when `replace_all_axiom` interacts with regex constraints on the arguments in ways
the Nielsen graph doesn't support.
### nonlinear-2.smt2
The word equation `a = b ++ b` is a non-linear equation (variable `b` appears twice
on the RHS). The Nielsen graph supports some non-linear equations, but this specific
combination with a regex constraint `a ∈ (re.* "(") ++ (re.+ ")")` — which constrains
characters — may expose a gap in the non-linear handling.
### noodles-unsat8 / noodles-unsat10
These are word equations from the Oliver Markgraf OSTRICH benchmark set. They involve
multiple variables with complex concatenation patterns and regex constraints. The crashes
suggest that the combination of 3+ variable word equations with disjunctive regex
constraints exceeds what the current nseq Nielsen graph implementation can handle.
## Files to investigate
- `src/smt/theory_nseq.cpp``assign_eh` (boolean-equality pattern for contains), `internalize_atom`
- `src/ast/rewriter/seq_axioms.cpp``replace_all_axiom`, contains axioms
- `src/smt/seq/seq_nielsen.cpp` — non-linear equation handling, `generate_extensions`
- `src/ast/seq_decl_plugin.cpp` — old-style synonym mapping (str.in.re / str.to.re)