[nseq] Soundness bug: conflicting regex memberships not detected (norn-benchmark-9f)
Labels: bug, c3, nseq, soundness
Summary
The nseq solver returns sat for a formula asserting that a non-empty string belongs
simultaneously to two disjoint regular languages. The correct answer is unsat.
Affected benchmark
| File | seq verdict | nseq verdict |
|---|---|---|
| norn-benchmark-9f.smt2 | unsat | sat (WRONG) |
Data from: https://github.com/Z3Prover/z3/discussions/9071
Reproducing example
; norn-benchmark-9f.smt2 — EXPECTED: unsat, nseq returns: sat
(set-logic QF_S)
(declare-fun var_0 () String)
(assert (str.in.re var_0 (re.* (re.range "a" "u"))))
(assert (str.in.re var_0 (re.* (str.to.re "v"))))
(assert (not (= var_0 "")))
(check-sat)
The formula asserts:
- var_0 ∈ (re.range "a" "u")*, so every character lies in "a"–"u"
- var_0 ∈ "v"*, so the only character allowed is "v"
- var_0 ≠ ""
Since "v" is not in the range "a"–"u" (the range includes up to 'u', not 'v'),
the intersection of the two languages is {""}. Combined with var_0 ≠ "", this is
unsatisfiable.
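The argument above can be sanity-checked outside Z3. The following is a minimal sketch, assuming Python's `re` module as a stand-in for the two SMT-LIB regexes; the function name `common_nonempty_words` and the lowercase test alphabet are illustrative choices, not part of the benchmark. It enumerates short words only, so it is a bounded check and not a proof.

```python
import re
from itertools import product

# Python stand-ins for the two regexes in the benchmark.
RANGE_A_U = re.compile(r"[a-u]*")  # (re.* (re.range "a" "u"))
ONLY_V = re.compile(r"v*")         # (re.* (str.to.re "v"))

def common_nonempty_words(max_len, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return every non-empty word of length <= max_len matched by both regexes."""
    hits = []
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            if RANGE_A_U.fullmatch(word) and ONLY_V.fullmatch(word):
                hits.append(word)
    return hits

# The empty string is the only common member, so no non-empty witness exists.
print(common_nonempty_words(3))  # prints []
```

An empty result is consistent with the expected unsat verdict: any model for var_0 would have to appear in this list once `max_len` reaches its length.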
Note on syntax
This benchmark uses the old SMT-LIB 2.5 syntax str.in.re (with dots) rather than
the SMT-LIB 2.6 syntax str.in_re (with underscore). Both are supported by Z3's parser.
It is not yet established whether the bug is specific to how nseq's regex handling
treats the old-style syntax or is a pure logic issue independent of syntax.
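One way to separate the two hypotheses is to run the same benchmark in both spellings. The sketch below rewrites the dotted operators to their SMT-LIB 2.6 names; the helper `to_smtlib26` and the mapping table are hypothetical and cover only the two operators used in this benchmark. It is a plain textual substitution, so it would also rewrite matches inside comments or string literals.

```python
import re

# Assumed renames, limited to the operators that appear in this benchmark.
OLD_TO_NEW = {"str.in.re": "str.in_re", "str.to.re": "str.to_re"}

def to_smtlib26(text):
    """Rewrite old dotted operator names to their underscore forms."""
    pattern = re.compile("|".join(re.escape(old) for old in OLD_TO_NEW))
    return pattern.sub(lambda m: OLD_TO_NEW[m.group(0)], text)

old_line = '(assert (str.in.re var_0 (re.* (str.to.re "v"))))'
print(to_smtlib26(old_line))
# prints (assert (str.in_re var_0 (re.* (str.to_re "v"))))
```

If nseq still answers sat on the rewritten file, the syntax hypothesis can be ruled out and the problem lies in the regex reasoning itself.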
Analysis
The nseq solver handles str.in_re constraints by computing the intersection of
multiple regex membership constraints for the same string variable (via the sgraph and
Nielsen graph). The Parikh image pre-check (seq_parikh.cpp) should detect that
(re.range "a" "u")* ∩ "v"* restricted to non-empty strings is empty, since:
- "v"* allows only 'v' characters
- (re.range "a" "u")* allows only characters in 'a'..'u'
- 'v' ∉ 'a'..'u' → intersection = {""} only
The root cause is likely that the nseq solver does not derive the character-level disjointness between these two regex languages. The seq solver may use a derivative-based or automaton intersection approach that detects this.
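The character-level disjointness in question can be stated compactly: for sets of single characters A and B, A* ∩ B* = (A ∩ B)*, so a non-empty common word exists only if A ∩ B is non-empty. The sketch below illustrates that check in Python. It is not how nseq, seq_parikh.cpp, or the seq solver implement it, and the helper names are hypothetical.

```python
def char_range(lo, hi):
    """Characters from lo to hi inclusive, mirroring (re.range lo hi)."""
    return {chr(code) for code in range(ord(lo), ord(hi) + 1)}

def star_intersection_alphabet(a, b):
    """Alphabet of A* ∩ B* for character sets A and B, which is A ∩ B."""
    return a & b

def has_nonempty_common_word(a, b):
    """A* and B* share a non-empty word iff they share at least one character."""
    return bool(star_intersection_alphabet(a, b))

range_a_u = char_range("a", "u")
only_v = {"v"}

print(star_intersection_alphabet(range_a_u, only_v))  # prints set()
print(has_nonempty_common_word(range_a_u, only_v))    # prints False
```

With an empty shared alphabet the intersection collapses to {""}, which the constraint var_0 ≠ "" then excludes.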
Files to investigate
- src/smt/seq/seq_regex.h/seq_regex.cpp: regex membership processing, pre-check
- src/smt/seq/seq_parikh.h/seq_parikh.cpp: Parikh image constraint generation
- src/smt/seq/seq_nielsen.cpp: membership processing in Nielsen nodes
- src/smt/theory_nseq.cpp: assign_eh for regex membership