mirror of
https://github.com/Z3Prover/z3
synced 2026-04-20 02:43:35 +00:00
Agent-Logs-Url: https://github.com/Z3Prover/z3/sessions/04321ea7-2a53-4ed5-9f43-816dc6f7476b Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: NikolajBjorner <3085284+NikolajBjorner@users.noreply.github.com>
67 lines
2.6 KiB
Markdown
67 lines
2.6 KiB
Markdown
# [nseq] Crash: non-greedy regex quantifiers (re.+?, re.*?) cause exception
|
|
|
|
**Labels**: bug, c3, nseq, crash
|
|
|
|
## Summary
|
|
|
|
The nseq solver crashes (throws an exception or returns an error) when the input
|
|
formula contains non-greedy regex quantifiers such as `re.+?` (lazy plus) or `re.*?`
|
|
(lazy star). The seq solver handles these correctly by treating them as equivalent to
|
|
their greedy counterparts for satisfiability purposes.
|
|
|
|
## Affected benchmark
|
|
|
|
| File | seq verdict | nseq verdict |
|
|
|------|-------------|--------------|
|
|
| `non-greedy-quantifiers.smt2` | sat | **bug/crash** |
|
|
|
|
Data from: https://github.com/Z3Prover/z3/discussions/9071
|
|
|
|
## Reproducing example
|
|
|
|
```smt2
|
|
; non-greedy-quantifiers.smt2 — EXPECTED: sat, nseq crashes
|
|
(set-logic QF_S)
|
|
(declare-fun a () String)
|
|
(declare-fun b () String)
|
|
(declare-fun c () String)
|
|
(declare-fun d () String)
|
|
(assert (= a (str.++ b c)))
|
|
(assert (= b (str.++ d c)))
|
|
; non-greedy re.+? has same semantics as re.+ for satisfiability
|
|
(assert (str.in.re a (re.+ (re.union (str.to.re "x") (str.to.re "y")))))
|
|
(assert (str.in.re c (re.+? (str.to.re "x"))))
|
|
(check-sat)
|
|
(get-model)
|
|
```
|
|
|
|
## Analysis
|
|
|
|
Z3's SMT-LIB parser supports non-greedy regex quantifiers (`re.+?`, `re.*?`, `re.??`)
|
|
as part of the ECMA-like regex extension. For satisfiability checking, non-greedy
|
|
quantifiers are semantically equivalent to their greedy counterparts (they only differ
|
|
in which match is returned, not in what strings are accepted).
|
|
|
|
The nseq solver's regex handling infrastructure (`seq_regex.cpp`) does not appear to
|
|
recognize or normalize non-greedy quantifiers. When the symbol `re.+?` is encountered
|
|
in a membership constraint, the solver either:
|
|
1. Does not recognize the operator and triggers `push_unhandled_pred()`, causing `FC_GIVEUP`
|
|
(which may manifest as an error/"bug" in the benchmark runner), or
|
|
2. Crashes due to an unhandled case in the Brzozowski derivative computation or
|
|
Parikh constraint generation.
|
|
|
|
### Fix proposal
|
|
|
|
Non-greedy quantifiers should be rewritten to their greedy equivalents before processing:
|
|
- `re.+? e` → `re.+ e` (lazy plus → greedy plus)
|
|
- `re.*? e` → `re.* e` (lazy star → greedy star)
|
|
- `re.?? e` → `re.opt e` (lazy optional → greedy optional)
|
|
|
|
This rewrite is sound for satisfiability/unsatisfiability since both interpretations
|
|
accept exactly the same language.
|
|
|
|
## Files to investigate
|
|
|
|
- `src/smt/seq/seq_regex.cpp` — regex preprocessing and derivative computation
|
|
- `src/ast/rewriter/seq_rewriter.cpp` — regex rewriting rules (add non-greedy normalization)
|
|
- `src/smt/theory_nseq.cpp` — `internalize_atom` / `assign_eh` for regex membership
|