34 KiB
theory_nseq — ZIPT-Based String Theory Solver for Z3
| Document Metadata | Details |
|---|---|
| Author(s) | Nikolaj Bjorner |
| Status | In Review (RFC) |
| Team / Owner | Z3 / SMT Strings |
| Created / Last Updated | 2026-03-03 |
1. Executive Summary
This spec defines the completion of the ZIPT algorithm port into Z3's c3 branch as theory_nseq — a new string/sequence theory solver based on Nielsen-style word equation solving with lazy regex membership and Parikh image reasoning. The foundational data structures (sgraph, snode, seq_plugin, nielsen_graph) are already implemented and tested (38 unit tests). What remains is: (1) the theory solver skeleton and SMT integration, (2) the Nielsen transformation algorithms (modifier chain, graph expansion, iterative deepening, conflict detection), (3) Parikh image / length reasoning, (4) regex membership processing, (5) propagation engine, and (6) model generation. The new solver will coexist with theory_seq and be selectable via smt.string_solver=nseq.
2. Context and Motivation
2.1 Current State
Z3's string solver theory_seq uses an axiom-driven approach with a 20-phase final_check_eh cascade. While production-grade, it struggles with certain classes of word equations and regex membership constraints that the ZIPT algorithm handles more effectively.
The ZIPT algorithm (reference C# implementation at github.com/CEisenhofer/ZIPT) uses a fundamentally different approach: Nielsen transformations on a search graph with iterative deepening, lazy regex evaluation, and Parikh image constraints.
Already implemented on c3 branch (per research doc):
- ✅
sgraph+snode— String representation, metadata, hashing, substitution, derivatives, minterms (src/ast/euf/euf_sgraph.h/.cpp,euf_snode.h) - ✅
seq_plugin— Egraph plugin for associativity, identity, absorption, star merge (src/ast/euf/euf_seq_plugin.h/.cpp) - ✅ Nielsen graph data structures —
str_eq,str_mem,nielsen_subst,nielsen_edge,nielsen_node,nielsen_graph,dep_tracker(src/smt/seq/seq_nielsen.h/.cpp) - ✅ 38 unit tests (18 sgraph + 7 seq_plugin + 13 nielsen)
- ✅
theory_nseqskeleton and SMT integration (src/smt/theory_nseq.h/.cpp) - ✅ Parameter integration —
smt.string_solver=nseqandsmt.nseq.max_depth(src/params/smt_params*.{cpp,h,pyg},src/smt/smt_setup.cpp) - ✅ Build system —
theory_nseq.cpp,nseq_state.cpp,nseq_regex.cpp,nseq_model.cppinsrc/smt/CMakeLists.txt - ✅ All 13 Nielsen modifiers (
apply_det_modifier,apply_const_nielsen,apply_var_nielsen,apply_eq_split,apply_regex_char_split,apply_power_epsilon,apply_num_cmp,apply_const_num_unwinding,apply_star_intr,apply_gpower_intr,apply_regex_var_split,apply_power_split,apply_var_num_unwinding) - ✅ Graph expansion / iterative deepening DFS —
nielsen_graph::solve()+search_dfs()(depth 3, +1/failure, bounded bysmt.nseq.max_depth) - ✅ Constraint simplification engine —
nielsen_node::simplify_and_init() - ✅ Conflict detection and explanation —
dep_tracker-based conflict clauses - ✅ Propagation engine —
theory_nseq::propagate(), eq/diseq/regex membership wiring - ✅ Model generation —
nseq_model(src/smt/nseq_model.h/.cpp), wired viainit_model()/mk_value()/finalize_model() - ✅ Regex membership processing —
nseq_regex(src/smt/nseq_regex.h/.cpp) - ✅ Constraint state management —
nseq_state(src/smt/nseq_state.h/.cpp) with push/pop backtracking - ✅ Integration unit tests —
tst_nseq_basic(src/test/nseq_basic.cpp),tst_nseq_regex(src/test/nseq_regex.cpp) - ✅ Length constraint generation —
nielsen_graph::generate_length_constraints()(length equalities, power non-negativity, regex bounds)
Not yet implemented / incomplete:
- ❌ Integer constraint reasoning via
lp::int_solver— length constraints are generated but not yet discharged throughlp::int_solver; currently propagated as theory lemmas viaarith_value - ❌ Correct results on Z3's full string regression suite (
C:\z3test\regressions\smt2\string*.smt2)
2.2 The Problem
- Algorithmic gap: The core solving algorithms are implemented. The main remaining gap is full
lp::int_solverintegration for integer subproblems. - Regression coverage: The solver has not yet been validated against Z3's full string regression suite.
2.3 Reference Documents
spec/reference.md— Points to ZIPT repo (parikh branch), LazyMemberships.pdf, ClemensTableau.pdfspec/plan.md— Task descriptionspec/plan1.md— 7-phase implementation plan with file listresearch/docs/2026-03-03-zipt-string-theory-codebase-analysis.md— Comprehensive codebase research
3. Goals and Non-Goals
3.1 Functional Goals
theory_nseqtheory solver class implementingsmt::theoryinterface- Selectable via
smt.string_solver=nseqparameter - Nielsen transformation-based word equation solving
- Lazy regex membership via Brzozowski derivatives (reusing
sgraph::brzozowski_deriv) - Integer constraint reasoning via
lp::int_solver(length, power terms, Parikh image) - Iterative deepening search: initial depth 3, +1 per failure, upper bound via
smt.nseq.max_depth(0 = infinity) - Conflict detection with dependency-tracked explanations
- Model generation for satisfiable instances
- Correct results on Z3's existing string regression suite (
C:\z3test\regressions\smt2\string*.smt2)
3.2 Non-Goals (Out of Scope)
- Replacing
theory_seqas default —theory_nseqis an alternative, not a replacement - Polymorphic sequences — focus on strings (
Seq[Char]) initially; genericSeq[T]can follow later - Higher-order sequence ops (
seq.map,seq.foldl) — defer totheory_seqor axiomatize minimally - Bitvector-to-string conversions (
ubv2s,sbv2s) — defer - Integer-to-string / string-to-integer conversions (
str.to_int,int.to_str) — defer - Extended string predicates: containment (
str.contains), prefix/suffix (str.prefixof,str.suffixof) — defer - String ordering (
str.<,str.<=) — defer - New Z3 API surface —
theory_nseqis internal only, selected by parameter - Performance parity with
theory_seqon all benchmarks in Phase 1 — correctness first
4. Proposed Solution (High-Level Design)
4.1 System Architecture
┌─────────────────────────────────────────────────────────┐
│ smt::context │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ theory_arith │ │ theory_nseq │ │ theory_bv │ │
│ └──────────────┘ └──────┬───────┘ └──────────────┘ │
│ │ │
│ ┌────────────────────────┼──────────────────────────┐ │
│ │ theory_nseq internals │ │
│ │ ┌─────────────┐ ┌───┴───────┐ ┌────────────┐ │ │
│ │ │ nseq_regex │ │ propagator│ │ nseq_model │ │ │
│ │ └──────┬──────┘ └───┬───────┘ └────────────┘ │ │
│ │ │ │ │ │
│ │ ┌──────┴─────────────┴────────────────────────┐ │ │
│ │ │ nielsen_graph (search engine) │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │ modifier │ │ simplify │ │ │ │
│ │ │ │ chain │ │ engine │ │ │ │
│ │ │ └──────────┘ └──────────┘ │ │ │
│ │ └───────────────────┬─────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌───────────────────┴─────────────────────────┐ │ │
│ │ │ sgraph + snode │ │ │
│ │ │ (string representation layer) │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ Shared infrastructure: seq_util, seq_rewriter, │
│ seq_skolem, arith_value, seq_factory │
└──────────────────────────────────────────────────────────┘
4.2 Architectural Pattern
Nielsen transformation graph with iterative deepening DFS — matching ZIPT's architecture:
- Internalization: String terms/atoms are registered, creating theory variables and snode representations.
- Propagation: Equality/disequality merges and boolean assignments update the constraint store.
- Final check: Build a nielsen_graph from active constraints, run iterative deepening search.
- Search: At each node, apply the modifier chain (13 modifiers in priority order) to generate child nodes. Check for conflicts and satisfaction.
- Conflict/Satisfaction: If all leaves are satisfied → FC_DONE. If a conflict is found → generate explanation clause and return FC_CONTINUE. If stuck → increase depth bound and retry.
4.3 Key Components
| Component | Responsibility | Location | Reuse |
|---|---|---|---|
theory_nseq |
SMT theory interface, internalization, propagation | src/smt/theory_nseq.h/.cpp |
New |
nseq_state |
Constraint store, variable tracking, backtrackable state | src/smt/nseq_state.h/.cpp |
New |
| Nielsen graph (solver) | Graph expansion, iterative deepening, conflict detection | src/smt/seq/seq_nielsen.h/.cpp |
Extend existing |
| Nielsen modifiers | 13 transformation rules (per ZIPT modifier chain) | src/smt/seq/seq_nielsen.h/.cpp |
New (in existing files) |
nseq_regex |
Regex membership processing, derivative dispatch | src/smt/nseq_regex.h/.cpp |
New |
nseq_model |
Model construction from solved constraints | src/smt/nseq_model.h/.cpp |
New |
| Parikh/length | Length reasoning, integer subproblems | src/smt/seq/seq_nielsen.h/.cpp |
New (in existing) |
lp::int_solver |
Integer subproblem solving (replaces ZIPT's PDD subsolver) | src/math/lp/int_solver.h |
Existing ✅ |
| sgraph | String representation, substitution, derivatives | src/ast/euf/euf_sgraph.* |
Existing ✅ |
| seq_plugin | Egraph associativity/simplification | src/ast/euf/euf_seq_plugin.* |
Existing ✅ |
| seq_util | Sort/op declarations and utilities | src/ast/seq_decl_plugin.* |
Existing ✅ |
| seq_rewriter | Rewrite rules, regex derivatives, nullability | src/ast/rewriter/seq_rewriter.* |
Existing ✅ |
| seq_skolem | Skolem function introduction | src/ast/rewriter/seq_skolem.* |
Existing ✅ |
| arith_value | Arithmetic solver queries | src/smt/smt_arith_value.* |
Existing ✅ |
| seq_factory | Model value factory | src/model/seq_factory.h |
Existing ✅ |
5. Detailed Design
5.1 Phase 1: Skeleton & Integration
5.1.1 theory_nseq class (src/smt/theory_nseq.h/.cpp)
Implements smt::theory with these pure virtual methods:
class theory_nseq : public theory {
// Required by smt::theory:
bool internalize_atom(app* atom, bool gate_ctx) override;
bool internalize_term(app* term) override;
void new_eq_eh(theory_var v1, theory_var v2) override;
void new_diseq_eh(theory_var v1, theory_var v2) override;
theory* mk_fresh(context* new_ctx) override;
void display(std::ostream& out) const override;
// Optional overrides:
void assign_eh(bool_var v, bool is_true) override;
bool can_propagate() override;
void propagate() override;
final_check_status final_check_eh() override;
void push_scope_eh() override;
void pop_scope_eh(unsigned num_scopes) override;
void init_model(model_generator& mg) override;
void finalize_model(model_generator& mg) override;
void collect_statistics(statistics& st) const override;
// Members:
seq_util m_seq;
arith_util m_autil;
arith_value m_arith_value;
seq_rewriter m_rewriter;
seq::skolem m_sk;
euf::egraph m_egraph;
euf::sgraph m_sgraph;
nielsen_graph m_nielsen;
// ... state management, constraint store
};
family_id: Uses m_seq.get_family_id() (same as theory_seq).
Internalization pattern (following theory_seq):
internalize_term: For string-sorted terms, create theory variable, register with egraph/sgraph. Forstr.++,str.len,str.in_re, etc., recursively internalize children.internalize_atom: Forstr.in_re— create boolean variable. (str.contains,str.prefixof,str.suffixof,str.<,str.<=are out of scope fortheory_nseq.)
5.1.2 Parameter integration
src/params/smt_params.cpp — Add "nseq" to validate_string_solver():
void smt_params::validate_string_solver(symbol const& s) const {
if (s == "z3str3" || s == "seq" || s == "empty" || s == "auto" || s == "none" || s == "nseq")
return;
throw default_exception("Invalid string solver value. Legal values are z3str3, seq, empty, auto, none, nseq");
}
src/smt/smt_setup.cpp — Add "nseq" dispatch:
void setup::setup_seq_str(static_features const& st) {
if (m_params.m_string_solver == "nseq") {
setup_nseq();
}
else if (m_params.m_string_solver == "seq") { ... }
// ...
}
void setup::setup_nseq() {
m_context.register_plugin(alloc(smt::theory_nseq, m_context));
}
src/smt/smt_setup.h — Declare setup_nseq().
src/params/smt_params_helper.pyg — Document nseq in the string_solver parameter description, and add:
def_param("nseq.max_depth", UINT, 0,
"Maximum Nielsen search depth for theory_nseq (0 = unlimited)")
5.1.3 Build system
src/smt/CMakeLists.txt — Add theory_nseq.cpp and new .cpp files.
5.2 Phase 2: Constraint Store & State Management
5.2.1 nseq_state (src/smt/nseq_state.h/.cpp)
Manages the backtrackable constraint store bridging SMT context to Nielsen graph:
class nseq_state {
// Variable tracking
th_union_find m_find; // Union-find for theory variables
obj_map<expr, theory_var> m_var_map; // expr → theory_var
// Active constraints (backtrackable)
scoped_vector<str_eq> m_eqs; // Active word equations
scoped_vector<str_mem> m_mems; // Active regex memberships
svector<std::pair<theory_var, theory_var>> m_diseqs; // Disequalities
svector<bool_var> m_pending_bools; // Pending boolean assignments
// Scope management
void push();
void pop(unsigned n);
// Constraint registration
void add_eq(euf::snode* lhs, euf::snode* rhs, unsigned dep_id);
void add_mem(euf::snode* str, euf::snode* regex, unsigned dep_id);
void add_diseq(theory_var v1, theory_var v2);
};
Key design decisions:
- Use
scoped_vectorfor backtrackable collections (same pattern astheory_seq) - Map theory variables to snode representations via sgraph
- Dependency IDs correspond to
dep_trackerbit positions for conflict explanation
5.3 Phase 3: Nielsen Transformation Algorithms
This is the core algorithmic phase. All additions go into src/smt/seq/seq_nielsen.h/.cpp.
5.3.1 Simplification Engine
Add to nielsen_node:
// In nielsen_node:
simplify_result simplify_and_init(nielsen_graph& g);
simplify_result simplify_str_eq(nielsen_graph& g, unsigned idx);
simplify_result simplify_str_mem(nielsen_graph& g, unsigned idx);
Simplification rules (matching ZIPT's SimplifyAndInit):
- Trivial equation removal: If
lhs == rhs(same snode), remove - Empty propagation: If
lhs = ε, thenrhsmust be empty → conflict or propagate - Constant prefix matching: If both sides start with same character, strip it
- Constant mismatch: If sides start with different characters → conflict (symbol_clash)
- Unit propagation: If
x = cwhere c is ground, substitute x→c everywhere - Nullable regex: If
str = εand regex is nullable → satisfied
5.3.2 Modifier Chain
Add 13 modifiers matching ZIPT's priority order. Each modifier returns a list of nielsen_edge expansions:
// In nielsen_graph:
struct modifier_result {
ptr_vector<nielsen_edge> edges;
bool deterministic; // true if only one branch is valid
};
// Modifiers (in priority order):
modifier_result apply_det_modifier(nielsen_node& n); // 1. Deterministic simplifications
modifier_result apply_power_epsilon(nielsen_node& n); // 2. Power-epsilon: u="" || n=0
modifier_result apply_num_cmp(nielsen_node& n); // 3. Numeric comparisons
modifier_result apply_const_num_unwinding(nielsen_node& n); // 4. Constant power unwinding
modifier_result apply_eq_split(nielsen_node& n); // 5. Equality prefix/suffix split
modifier_result apply_star_intr(nielsen_node& n); // 6. Stabilizer introduction
modifier_result apply_gpower_intr(nielsen_node& n); // 7. Power introduction
modifier_result apply_const_nielsen(nielsen_node& n); // 8. Nielsen vs constant
modifier_result apply_regex_char_split(nielsen_node& n); // 9. Regex character split
modifier_result apply_regex_var_split(nielsen_node& n); // 10. Regex variable split
modifier_result apply_power_split(nielsen_node& n); // 11. Power unwinding
modifier_result apply_var_nielsen(nielsen_node& n); // 12. Variable Nielsen split
modifier_result apply_var_num_unwinding(nielsen_node& n); // 13. Variable power unwinding
The modifier chain is applied in order; the first modifier that produces edges is used:
modifier_result generate_extensions(nielsen_node& n) {
modifier_result r;
r = apply_det_modifier(n); if (!r.edges.empty()) return r;
r = apply_power_epsilon(n); if (!r.edges.empty()) return r;
// ... through all 13 modifiers
r = apply_var_num_unwinding(n);
return r;
}
5.3.3 Graph Expansion & Iterative Deepening
Add to nielsen_graph:
// Search result
enum class search_result { sat, unsat, unknown };
// Main search entry point
search_result solve();
// Iterative deepening DFS
search_result search_dfs(nielsen_node* node, unsigned depth);
// Expand a single node
search_result expand_node(nielsen_node* node, unsigned depth);
Algorithm (matching ZIPT's NielsenGraph.Run()):
- Set initial depth bound to 3
- Run DFS from root with current depth bound
- If all branches conflict → UNSAT
- If a leaf is satisfied (all constraints trivial) → SAT
- If depth bound hit → increase bound by 1, restart (iterative deepening)
- If
smt.nseq.max_depth > 0and bound exceeds it → give up (FC_GIVEUP); ifsmt.nseq.max_depth == 0(default) → no upper limit
5.3.4 Conflict Detection & Explanation
Add to nielsen_graph:
// Extract conflict explanation as a set of input constraint indices
void explain_conflict(nielsen_node const& conflict_node, svector<unsigned>& deps);
The dep_tracker bitvectors on each constraint track which original input constraints contributed. On conflict, OR together the dep_trackers of the conflicting constraints to get the minimal explanation.
5.4 Phase 4: Integer Constraint Reasoning
Each Nielsen node accumulates integer constraints from all sources:
- Length constraints: from word equations
lhs = rhs→|lhs| = |rhs| - Power terms:
x^nwith symbolic exponent →n ≥ 0,|x^n| = n * |x| - Parikh image: from regex membership
x ∈ L(r)→ length bounds/modular constraints (e.g.x ∈ (ab)*→|x| mod 2 = 0)
All of these are collected together into a single integer constraint system per node and discharged uniformly via Z3's lp::int_solver (src/math/lp/int_solver.h). This replaces ZIPT's PDD-based integer subsolver. There is no separate treatment per source — they are all integer constraints over the same set of length variables.
// In nielsen_graph or a helper:
void generate_integer_constraints(nielsen_node const& node,
vector<expr_ref>& int_constraints);
Length computation from snode:
|ε|= 0|c|= 1 (character)|a · b|=|a| + |b||x^n|=n * |x|(power)|x|= length variable for x
Integration: Use arith_value to query current assignments and check feasibility. Propagate derived equalities/inequalities via theory_nseq::propagate().
5.5 Phase 5: Regex Membership
5.5.1 nseq_regex (src/smt/nseq_regex.h/.cpp)
Handles regex membership constraints using ZIPT's lazy approach:
class nseq_regex {
euf::sgraph& m_sg;
seq_rewriter& m_rewriter;
// Process regex membership after character consumption
simplify_result process_str_mem(str_mem& mem, euf::snode* consumed_char);
// Compute derivative and create new membership constraint
str_mem derive(str_mem const& mem, euf::snode* char_node);
// Check if regex is empty (unsatisfiable)
bool is_empty_regex(euf::snode* regex);
// Check for regex cycles (stabilizer introduction)
bool detect_cycle(str_mem const& mem);
// Compute minterms for character splitting
void get_minterms(euf::snode* regex, vector<euf::snode*>& minterms);
};
Lazy evaluation pattern (per LazyMemberships.pdf):
- Only consume characters from the string when they become concrete (via substitution)
- Use Brzozowski derivatives:
D(c, r)gives the derivative of regexrby characterc - Track history (consumed prefix) for cycle detection
- When a cycle is detected, introduce a stabilizer:
remaining ∈ base*
Reuse: sgraph::brzozowski_deriv() and sgraph::compute_minterms() are already implemented.
5.6 Phase 6: Propagation Engine & Integration
5.6.1 theory_nseq::final_check_eh()
The main solving loop:
final_check_status theory_nseq::final_check_eh() {
// 1. Collect constraints from SMT context into nielsen_graph
m_nielsen.reset();
populate_nielsen_graph();
// 2. Run Nielsen search
auto result = m_nielsen.solve();
switch (result) {
case search_result::sat:
return FC_DONE;
case search_result::unsat:
// Extract conflict explanation
svector<unsigned> deps;
m_nielsen.explain_conflict(deps);
// Build conflict clause from deps
add_conflict_clause(deps);
return FC_CONTINUE;
case search_result::unknown:
// Depth bound hit or timeout
return FC_GIVEUP;
}
}
5.6.2 populate_nielsen_graph()
Convert active SMT constraints to nielsen_graph format:
void theory_nseq::populate_nielsen_graph() {
for (auto& eq : m_state.eqs()) {
euf::snode* lhs = m_sgraph.mk(eq.first);
euf::snode* rhs = m_sgraph.mk(eq.second);
m_nielsen.add_str_eq(lhs, rhs);
}
for (auto& mem : m_state.mems()) {
euf::snode* str = m_sgraph.mk(mem.str);
euf::snode* regex = m_sgraph.mk(mem.regex);
m_nielsen.add_str_mem(str, regex);
}
}
5.6.3 theory_nseq::propagate()
Propagate consequences from Nielsen solving to the SMT context:
- New equalities (from substitutions) →
ctx.assign_eq() - New disequalities →
ctx.assign_diseq() - New literals (regex memberships) →
ctx.assign() - Length constraints →
ctx.assign()for arithmetic literals
5.7 Phase 7: Model Generation
5.7.1 nseq_model (src/smt/nseq_model.h/.cpp)
Build concrete string values from solved Nielsen graph:
class nseq_model {
// Extract model assignments from a satisfied nielsen_node
void extract_assignments(nielsen_node const& leaf,
obj_map<expr, expr*>& assignments);
// Generate fresh string values for unconstrained variables
expr_ref mk_fresh_value(sort* s, seq_factory& factory);
// Ensure consistency with regex memberships
bool validate_regex(expr* str_val, expr* regex);
};
Model construction algorithm:
- From the satisfied leaf node, collect all substitutions along the path from root
- Compose substitutions to get final variable assignments
- For unconstrained variables, use
seq_factory::get_fresh_value() - Register with
model_generatorviainit_model()/finalize_model()
6. Alternatives Considered
| Option | Pros | Cons | Decision |
|---|---|---|---|
Extend theory_seq with Nielsen |
Single solver, shared infrastructure | Massive refactor, high risk to production solver | Rejected: Too risky to production |
| Port ZIPT as standalone user propagator | Minimal Z3 changes | Poor integration, no sharing of Z3 infrastructure, slower | Rejected: Misses optimization opportunities |
New theory_nseq (selected) |
Clean separation, reuses infrastructure, safe coexistence | New code to maintain, some duplication | Selected: Best balance of risk and benefit |
| PDD for Parikh (full ZIPT port) | Exact polynomial reasoning | Complex, Z3 has lp::int_solver already |
Rejected: Use lp::int_solver instead of porting ZIPT's PDD subsolver |
7. Cross-Cutting Concerns
7.1 Backtracking / Scope Management
theory_nseq must correctly implement push_scope_eh() / pop_scope_eh():
nseq_stateusesscoped_vector(same astheory_seq)- sgraph/egraph have their own
push()/pop()— must be synchronized - nielsen_graph is rebuilt from scratch on each
final_check_eh()(no incremental state), so no backtracking needed for the graph itself
7.2 Conflict Clause Quality
Conflict explanations must be minimal for effective CDCL learning:
dep_trackerbitvectors track which input constraints are responsible- On conflict, extract the minimal set of input constraint IDs
- Map these back to Z3 literals for the conflict clause
7.3 Interaction with Arithmetic Solver
Length constraints create coupling between theory_nseq and the arithmetic solver:
- Use
arith_valueto query current arithmetic assignments (same pattern astheory_seq) - Propagate derived length equalities as theory lemmas
- May need to handle
FC_CONTINUEloops when arithmetic and string solvers ping-pong
7.4 Statistics
collect_statistics() should track:
- Number of Nielsen graph nodes explored
- Number of conflicts found
- Number of depth bound increases
- Number of regex derivatives computed
- Number of length constraints generated
8. Migration, Rollout, and Testing
8.1 Deployment Strategy
- Phase 1:
theory_nseqreturnsFC_GIVEUPfor all inputs — proves integration works - Phase 2: Handle trivial word equations (constant strings, simple variables)
- Phase 3: Full modifier chain — handle the string regression suite
- Phase 4: Regex support — handle
re*.smt2regression tests - Phase 5: Performance tuning — compare against
theory_seqon SMT-COMP benchmarks
8.2 Test Plan
Unit Tests (C++ tests in src/test/)
| Test | File | Scope |
|---|---|---|
TST(nseq_basic) |
src/test/nseq_basic.cpp |
theory_nseq instantiation, parameter selection, trivial sat/unsat |
TST(seq_nielsen) |
src/test/seq_nielsen.cpp (existing) |
Extend with simplification, modifier, and search tests |
TST(euf_sgraph) |
src/test/euf_sgraph.cpp (existing) |
Already passing; extend if sgraph changes are needed |
Regression Tests (SMT2 via z3 command line)
Use C:\z3test\regressions\smt2\ (the same suite as nightly CI):
-
String-specific files with
smt.string_solver=nseq:string-concat.smt2,string-eval.smt2,string-itos.smt2,string-literals.smt2,string-ops.smt2,string-simplify.smt2string3.smt2,string6.smt2,string7.smt2,string9.smt2,string11.smt2–string14.smt2re.smt2,re_rewriter.smt2
-
Comparison testing: Run same files with both
smt.string_solver=seqandsmt.string_solver=nseq, verify sat/unsat agreement (no soundness bugs). -
Full regression suite: Ensure
smt.string_solver=nseqdoesn't break non-string tests.
Targeted SMT2 Tests (New)
Create focused test files in tests/nseq/:
eq_basic.smt2— Simple word equationseq_nielsen.smt2— Equations requiring Nielsen transformationslength.smt2— Length constraintsregex.smt2— Regex membershipmixed.smt2— Combined string + arithmeticunsat.smt2— Unsatisfiable benchmarks
9. Design Decisions (Resolved)
-
Q1: egraph ownership —
theory_nseqcreates its own private egraph/sgraph, matching the ZIPT pattern where the string manager is independent. This avoids interference with other theories and keeps the string representation self-contained. -
Q2: Parikh implementation approach — Use Z3's built-in
lp::int_solver(src/math/lp/int_solver.h) for integer subproblems arising from Parikh/length constraints. Generate linear length constraints from word equations and regex membership, propagate them as theory lemmas viaarith_value. ZIPT's PDD subsolver is not ported —lp::int_solverdirectly replaces it. -
Q3: Nielsen graph persistence — The Nielsen graph will be maintained incrementally with backtracking support. The graph persists across
final_check_eh()calls with push/pop scope synchronization. This avoids redundant re-exploration of previously visited nodes when the SMT solver backtracks. -
Q4: Modifier chain completeness — Implement an essential subset of 5 modifiers first: DetModifier, EqSplitModifier, ConstNielsenModifier, VarNielsenModifier, RegexCharSplitModifier. These cover basic word equations and simple regex membership. The remaining 8 modifiers (PowerEpsilon, NumCmp, ConstNumUnwinding, StarIntr, GPowerIntr, RegexVarSplit, PowerSplit, VarNumUnwinding) will be added incrementally in subsequent phases.
-
Q5: Generic sequences —
theory_nseqhandles allSeq[T]types from the start, not just strings. Nielsen transformations work on any element type. Regex membership is string-specific but the word equation engine is generic. -
Q6: Higher-order ops — Axiomatize minimally for
seq.map,seq.mapi,seq.foldl,seq.foldli. Generate basic axioms (similar to theory_seq's approach) without deep reasoning. This ensures theory_nseq doesn't immediately give up on problems containing these operations. -
Q7: Disequality handling — Use separate tracking like ZIPT. Port ZIPT's character disequality and string disequality tracking into the nielsen_graph data structures. This gives the modifier chain direct access to disequality information (particularly needed for character minterm computation in RegexCharSplitModifier).
10. File Summary
New Files
| File | Phase | Description |
|---|---|---|
src/smt/theory_nseq.h |
1 | Theory class header |
src/smt/theory_nseq.cpp |
1 | Theory class implementation |
src/smt/nseq_state.h |
2 | Constraint store header |
src/smt/nseq_state.cpp |
2 | Constraint store implementation |
src/smt/nseq_regex.h |
5 | Regex handling header |
src/smt/nseq_regex.cpp |
5 | Regex handling implementation |
src/smt/nseq_model.h |
7 | Model generation header |
src/smt/nseq_model.cpp |
7 | Model generation implementation |
src/test/nseq_basic.cpp |
1 | Unit tests |
Files to Extend
| File | Phase | Change |
|---|---|---|
src/smt/seq/seq_nielsen.h |
3 | Add simplification, modifier chain, search methods |
src/smt/seq/seq_nielsen.cpp |
3 | Implement above methods |
Files to Modify (Integration)
| File | Phase | Change |
|---|---|---|
src/smt/smt_setup.h |
1 | Declare setup_nseq() |
src/smt/smt_setup.cpp |
1 | Add "nseq" dispatch, implement setup_nseq() |
src/params/smt_params.cpp |
1 | Add "nseq" to validate_string_solver() |
src/params/smt_params_helper.pyg |
1 | Document nseq option |
src/smt/CMakeLists.txt |
1 | Add new .cpp files to SOURCES |
src/test/main.cpp |
1 | Register TST(nseq_basic) |
src/test/CMakeLists.txt |
1 | Add nseq_basic.cpp |
11. Implementation Order
- Phase 1 — Skeleton (
theory_nseqclass, parameter integration, build system) — compilable stub returningFC_GIVEUP - Phase 2 — State management (
nseq_state) — constraint store with backtracking - Phase 3 — Nielsen algorithms (simplification, modifiers, search, conflict) — core solver
- Phase 4 — Integer constraint reasoning (
lp::int_solver) — length, power, Parikh - Phase 5 — Regex membership (
nseq_regex) — lazy derivatives - Phase 6 — Propagation wiring — connect all components in
final_check_eh - Phase 7 — Model generation (
nseq_model) — satisfying assignments