mirror of https://github.com/Z3Prover/z3 synced 2026-04-29 15:23:37 +00:00
z3/research/docs/nseq-issues/03-soundness-indexof-regex.md
Copilot f518faac9b
Add nseq issue drafts from Ostrich benchmark analysis (discussion #9071) (#9073)
Agent-Logs-Url: https://github.com/Z3Prover/z3/sessions/04321ea7-2a53-4ed5-9f43-816dc6f7476b

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: NikolajBjorner <3085284+NikolajBjorner@users.noreply.github.com>
2026-03-20 19:07:51 -07:00


[nseq] Soundness bug: str.indexof unsound when combined with regex membership

Labels: bug, c3, nseq, soundness

Summary

The nseq solver returns sat for benchmarks that constrain str.indexof to values that are impossible given the regex membership of the input string. The seq solver correctly returns unsat for these benchmarks.

Affected benchmarks

| Benchmark | seq verdict | nseq verdict |
|---|---|---|
| indexof_const_index_unsat.smt2 | unsat | sat (WRONG) |
| indexof_var_unsat.smt2 | unsat | sat (WRONG) |

Data from: https://github.com/Z3Prover/z3/discussions/9071

Reproducing examples

```smt2
; indexof_const_index_unsat.smt2: EXPECTED unsat, nseq returns sat
(set-info :status unsat)
(declare-fun a () String)
(declare-fun i () Int)
(declare-fun j () Int)
(assert (str.in_re a (re.union (str.to_re "hhhbbb") (str.to_re "bhhh"))))
(assert (= (str.indexof a "hhh" j) i))
(assert (= i 2))
(assert (> j 0))
(check-sat)
```

```smt2
; indexof_var_unsat.smt2: EXPECTED unsat, nseq returns sat
(set-info :status unsat)
(declare-fun a () String)
(declare-fun i () Int)
(declare-fun j () Int)
(assert (str.in_re a (re.union (str.to_re "hhhbbb") (str.to_re "bhhh"))))
(assert (= (str.indexof a "hhh" j) i))
(assert (> i 1))
(check-sat)
```

Analysis

For indexof_const_index_unsat.smt2:

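  • a ∈ {hhhbbb, bhhh} (the regex denotes exactly these two strings)
  • str.indexof a "hhh" j = 2 with j > 0
  • In "hhhbbb", "hhh" occurs only at index 0; since j > 0 the search starts at index 1 or later, so indexof returns -1
  • In "bhhh", "hhh" occurs only at index 1; with j > 0, indexof returns 1 (for j = 1) or -1 (for j ≥ 2)
  • So i = 2 is impossible → unsat

The same argument covers indexof_var_unsat.smt2: for any j, indexof returns 0 or -1 on "hhhbbb" and 1 or -1 on "bhhh", so i > 1 is impossible → unsat.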
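The case analysis can be checked mechanically. The Python sketch below encodes the SMT-LIB 2.6 definition of str.indexof (result -1 when the start index is negative, exceeds the string length, or has no occurrence at or after it) and enumerates every behavior of j for both words in the regex language. The helper names (`smt_indexof`, `achievable`) are illustrative only and are not part of Z3.

```python
def smt_indexof(s: str, t: str, j: int) -> int:
    """str.indexof per SMT-LIB 2.6: first occurrence of t in s at or after j, else -1."""
    if j < 0 or j > len(s):
        return -1
    # str.find returns the first match at or after j, or -1 if there is none.
    return s.find(t, j)


LANGUAGE = ["hhhbbb", "bhhh"]  # strings accepted by the regex in both benchmarks
PATTERN = "hhh"

# Every word has length <= 6, so any j < 0 or j > 6 yields -1; this finite
# range therefore exercises every possible behavior of the unbounded Int j.
ALL_J = range(-2, 10)


def achievable(words, pattern, js):
    """Set of values str.indexof(a, pattern, j) takes over a in words, j in js."""
    return {smt_indexof(a, pattern, j) for a in words for j in js}


# indexof_const_index_unsat.smt2: j > 0, and i = 2 is required
const_vals = achievable(LANGUAGE, PATTERN, [j for j in ALL_J if j > 0])
print(sorted(const_vals))  # [-1, 1]: 2 is not reachable, so unsat

# indexof_var_unsat.smt2: j unconstrained, and i > 1 is required
var_vals = achievable(LANGUAGE, PATTERN, ALL_J)
print(sorted(var_vals))  # [-1, 0, 1]: no value exceeds 1, so unsat
```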

The indexof_axiom in seq_axioms.cpp generates arithmetic constraints for str.indexof, but these constraints may not be tight enough once concrete regex membership constraints are involved. Specifically, nseq does not appear to combine the regex membership information with the indexof position constraints, and that combination is what is needed to derive the contradiction.
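Dropping the membership constraint shows why the regex has to take part in the refutation. The Python sketch below re-implements the SMT-LIB 2.6 semantics of str.indexof and searches short strings over the letters b and h (an arbitrary choice for brevity; `find_witness` is a hypothetical helper) for a model of the remaining constraints, j > 0 and str.indexof a "hhh" j = 2. It finds one, so unsat cannot follow from the indexof constraints alone.

```python
from itertools import product


def smt_indexof(s, t, j):
    """str.indexof per SMT-LIB 2.6: first occurrence of t in s at or after j, else -1."""
    if j < 0 or j > len(s):
        return -1
    return s.find(t, j)


def find_witness(alphabet="bh", max_len=6, target=2):
    """First (a, j), shortest a first, with j > 0 and str.indexof(a, "hhh", j) == target.

    The regex membership constraint is deliberately omitted.
    """
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            a = "".join(chars)
            for j in range(1, n + 1):
                if smt_indexof(a, "hhh", j) == target:
                    return a, j
    return None


witness = find_witness()
print(witness)  # ('bbhhh', 1)
```

The witness "bbhhh" is not in {hhhbbb, bhhh}, so it is exactly the membership constraint that rules it out.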

The root cause is likely that nseq's indexof_axiom generates axioms about str.indexof without using the constraints that regex membership places on the content of a (here, that a is one of two concrete strings). The seq solver may perform additional propagation (e.g., via character-level analysis of the regex language) that nseq does not.
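One way to make that information available is sketched below as an assumption, not as a description of what the seq solver actually does. When the regex denotes a finite set of words W and the pattern t is nonempty, every result of str.indexof(a, t, j) is either -1 or a start position of t in some word of W, whatever j is. The hypothetical helpers below compute that value set and print it as an SMT-LIB implication lemma; for the benchmarks the set is {-1, 0, 1}, which refutes both i = 2 and i > 1 by arithmetic alone.

```python
def occurrences(w, t):
    """Start positions of nonempty t in w, overlapping matches included."""
    return {k for k in range(len(w) - len(t) + 1) if w[k:k + len(t)] == t}


def indexof_range(words, t):
    """Every value str.indexof(a, t, j) can take when a is in words (t nonempty)."""
    if not t:
        raise ValueError("bound assumes a nonempty pattern")
    values = {-1}  # no occurrence at or after j, or j out of range
    for w in words:
        values |= occurrences(w, t)
    return values


def smt_int(v):
    # SMT-LIB has no negative integer literals; -1 is written (- 1).
    return str(v) if v >= 0 else f"(- {-v})"


def smt_regex(words):
    """Union of string literals (assumes the words contain no double quotes)."""
    parts = [f'(str.to_re "{w}")' for w in words]
    return parts[0] if len(parts) == 1 else f"(re.union {' '.join(parts)})"


def bound_lemma(a, t, j, words):
    """Hypothetical lemma: a in words implies indexof(a, t, j) is in indexof_range."""
    term = f'(str.indexof {a} "{t}" {j})'
    cases = [f"(= {term} {smt_int(v)})" for v in sorted(indexof_range(words, t))]
    body = cases[0] if len(cases) == 1 else f"(or {' '.join(cases)})"
    return f"(=> (str.in_re {a} {smt_regex(words)}) {body})"


words = ["hhhbbb", "bhhh"]
print(sorted(indexof_range(words, "hhh")))  # [-1, 0, 1]
print(bound_lemma("a", "hhh", "j", words))
```

A lemma of this shape relies on the regex language being finite; an infinite language would need a different argument.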

Files to investigate

  • src/ast/rewriter/seq_axioms.cpp: indexof_axiom
  • src/smt/seq/seq_regex.h / seq_regex.cpp: regex membership propagation
  • src/smt/theory_nseq.cpp: interaction between regex constraints and arithmetic axioms