## Summary
Fixes a **soundness regression** in the sequence/regex rewriter: a
symbolic character range such as `(re.range x x)` was unsoundly
collapsed to `re.empty`, causing a satisfiable membership constraint to
be reported `unsat`.
This was surfaced by the `snapshot-regression` corpus in
`Z3Prover/bench`.
- **Originating discussion:**
https://github.com/Z3Prover/bench/discussions/2761
- **Benchmark:** `iss-5873/bug-2.smt2` (in `Z3Prover/bench`, under
`inputs/issues/iss-5873/`)
- **z3 under test at capture:** `z3-4.17.0-x64-glibc-2.39` (Nightly)
## Divergence
The recorded oracle expects `sat`; current z3 returns `unsat`:
```diff
--- bug-2.expected.out (expected)
+++ produced (current z3)
@@ -1,3 +1,4 @@
-sat
-((tmp_str0 "\u{0}"))
+unsat
+(error "line 12 column 10: check annotation that says sat")
+(error "line 14 column 22: model is not available")
(:reason-unknown "")
```
The benchmark asserts (simplified):
```smt2
(assert (= (str.in_re (str.replace tmp_str0 tmp_str0 tmp_str0)
(re.range tmp_str0 tmp_str0))
(str.contains tmp_str0 tmp_str0)))
```
`str.contains x x` is always true and `str.replace x x x = x`, so this
requires `str.in_re x (re.range x x)` to hold, which is satisfiable
exactly when `x` is a single character (`len(x) = 1`).
## Root cause
`seq_rewriter::mk_re_range` treated any bound that is not a concrete
single-character literal as making the whole range **empty**:
```cpp
if (str().is_string(lo, slo) && slo.length() == 1) clo = slo[0];
else if (str().is_unit(lo, lo1) && m_util.is_const_char(lo1, clo)) ;
else is_empty = true; // unsound for a symbolic bound
```
For a symbolic bound this is unsound: `(re.range x x)` denotes `{x}`
whenever `x` is a single character, not `∅`. Collapsing it to `re.empty`
makes `str.in_re x (re.range x x)` false, contradicting the (true)
`str.contains x x`, so the solver derives an unsound `unsat`.
`git blame` attributes this unsound collapse to z3 commit `15f33f458d`
("Derive with ranges (#9965)"), which post-dates the oracle capture.
## Fix
Two surgical changes in `src/ast/rewriter/seq_rewriter.cpp`:
1. **`mk_re_range`** no longer assumes emptiness for symbolic bounds. It
concludes `re.empty` only when it can *prove* emptiness — a bound whose
length can never be 1, or two concrete bounds with `lo > hi`. When a
bound is symbolic it returns `BR_FAILED` and keeps the range. Concrete
single-character ranges keep their existing handling (`lo == hi →
str.to_re`, inverted → `re.empty`).
2. **`mk_str_in_regexp`** reduces membership in a range that has a
symbolic bound to the equivalent length/order constraints, which are
sound and complete under SMT-LIB `re.range` semantics:
`str.in_re e (re.range lo hi)` ⟶ `len(lo)=1 ∧ len(hi)=1 ∧ len(e)=1 ∧ lo
≤ e ∧ e ≤ hi`
(using `str.<=`). The derivative engine only unfolds ranges whose bounds
are concrete characters, so without this reduction a symbolic-bound
range would otherwise be left unsolved.
## Validation
Rebuilt z3 from this branch on the workflow runner (`./configure && make
-C build -j$(nproc)`) and re-ran the failing benchmark with the same
option the snapshot capture uses (`-T:20`):
```
$ z3 -T:20 inputs/issues/iss-5873/bug-2.smt2
sat
((tmp_str0 "A"))
(:reason-unknown "")
```
The verdict is now **`sat`** (was `unsat`) — the soundness regression is
resolved. A correctness battery over concrete and symbolic ranges all
returns the expected results, e.g.:
- `(str.in_re "b" (re.range "a" "c"))` → `sat`, `(str.in_re "d"
(re.range "a" "c"))` → `unsat`
- `(str.in_re x (re.range x x))` → `sat`; with `(= (str.len x) 2)` →
`unsat`
- `(str.in_re "b" (re.range x y))` → `sat`; with `(str.< y x)` → `unsat`
- `(str.in_re "" (re.range x y))` → `unsat`; `(str.in_re "ab" (re.range
"a" "c"))` → `unsat`
The pre-existing concrete-range derivative fast path is unchanged.
### Note on the model value (benign, unrelated to this fix)
The model value differs from the recorded oracle: current z3 prints
`((tmp_str0 "A"))` whereas the oracle recorded `((tmp_str0 "\u{0}"))`.
Both are valid single-character models (the formula has many). This
difference is **pre-existing and unrelated to this fix**: even a bare
`(assert (= (str.len x) 1))` yields `"A"` on current z3. It stems from
the seq/char theory's default character assignment for
otherwise-unconstrained characters (`theory_char.cpp` assigns fresh
characters starting from `'A'`), not from range handling. I deliberately
did **not** force the character to `\u{0}` — adding `x = "\u{0}"` would
be unsound over-constraining, and changing the global default character
is out of scope for this soundness fix and would perturb unrelated
models. The output is therefore semantically equivalent to the oracle
(same `sat` verdict and reason-unknown) but not byte-identical.
---
*Draft for human review. Diagnosed and fixed by the
`snapshot-regression-fixer` maintenance workflow.*
> Generated by [Fix a Z3 snapshot-regression
divergence](https://github.com/Z3Prover/bench/actions/runs/28502614658)
· 890.7 AIC · ⌖ 46.8 AIC · ⊞ 9K ·
[◷](https://github.com/search?q=repo%3AZ3Prover%2Fz3+%22gh-aw-workflow-id%3A+snapshot-regression-fixer%22&type=pullrequests)
<!-- gh-aw-agentic-workflow: Fix a Z3 snapshot-regression divergence,
engine: copilot, version: 1.0.63, model: claude-opus-4.8, id:
28502614658, workflow_id: snapshot-regression-fixer, run:
https://github.com/Z3Prover/bench/actions/runs/28502614658 -->
<!-- gh-aw-workflow-id: snapshot-regression-fixer -->
<!-- gh-aw-workflow-call-id: Z3Prover/bench/snapshot-regression-fixer
-->
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>