3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-05-21 01:19:34 +00:00
z3/research/docs/nseq-issues/05-soundness-conflicting-regex.md
Copilot f518faac9b
Add nseq issue drafts from Ostrich benchmark analysis (discussion #9071) (#9073)
Agent-Logs-Url: https://github.com/Z3Prover/z3/sessions/04321ea7-2a53-4ed5-9f43-816dc6f7476b

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: NikolajBjorner <3085284+NikolajBjorner@users.noreply.github.com>
2026-03-20 19:07:51 -07:00

65 lines
2.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# [nseq] Soundness bug: conflicting regex memberships not detected (norn-benchmark-9f)
**Labels**: bug, c3, nseq, soundness
## Summary
The nseq solver returns `sat` for a formula asserting that a non-empty string belongs
simultaneously to two disjoint regular languages. The correct answer is `unsat`.
## Affected benchmark
| File | seq verdict | nseq verdict |
|------|-------------|--------------|
| `norn-benchmark-9f.smt2` | unsat | **sat** (WRONG) |
Data from: https://github.com/Z3Prover/z3/discussions/9071
## Reproducing example
```smt2
; norn-benchmark-9f.smt2 — EXPECTED: unsat, nseq returns: sat
(set-logic QF_S)
(declare-fun var_0 () String)
(assert (str.in.re var_0 (re.* (re.range "a" "u"))))
(assert (str.in.re var_0 (re.* (str.to.re "v"))))
(assert (not (= var_0 "")))
(check-sat)
```
The formula asserts:
1. `var_0 ∈ (re.range "a" "u")* ` — all characters in "a""u"
2. `var_0 ∈ "v"*` — only the character "v"
3. `var_0 ≠ ""`
Since "v" is not in the range "a""u" (the range includes up to 'u', not 'v'),
the intersection of the two languages is `{""}`. Combined with `var_0 ≠ ""`, this is
unsatisfiable.
## Note on syntax
This benchmark uses the old SMT-LIB 2.5 syntax `str.in.re` (with dots) rather than
the SMT-LIB 2.6 syntax `str.in_re` (with underscore). Both are supported by Z3's parser.
The bug may be triggered specifically by the old-style syntax interaction with nseq's
regex handling, or may be a pure logic issue.
## Analysis
The nseq solver handles `str.in_re` constraints by computing the intersection of
multiple regex membership constraints for the same string variable (via the sgraph and
Nielsen graph). The Parikh image pre-check (`seq_parikh.cpp`) should detect that
`(re.range "a" "u")* ∩ "v"*` restricted to non-empty strings is empty, since:
- `"v"*` allows only 'v' characters
- `(re.range "a" "u")*` allows only characters in 'a'..'u'
- 'v' ∉ 'a'..'u' → intersection = `{""}` only
The root cause is likely that the nseq solver does not derive the character-level
disjointness between these two regex languages. The seq solver may use a derivative-based
or automaton intersection approach that detects this.
## Files to investigate
- `src/smt/seq/seq_regex.h` / `seq_regex.cpp` — regex membership processing, pre-check
- `src/smt/seq/seq_parikh.h` / `seq_parikh.cpp` — Parikh image constraint generation
- `src/smt/seq/seq_nielsen.cpp` — membership processing in Nielsen nodes
- `src/smt/theory_nseq.cpp``assign_eh` for regex membership