29 KiB
| date | researcher | git_commit | branch | repository | topic | tags | status | last_updated | last_updated_by | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2026-03-03 20:46:00 UTC | Copilot CLI | fb674ac5b2 |
c3 | c3 | ZIPT-related string theory components: sgraph, seq_plugin, Nielsen graphs, theory_seq, seq_rewriter, axioms |
|
complete | 2026-03-03 | Copilot CLI |
Research: ZIPT-Related String Theory Components in Z3
Research Question
Examine the Z3 codebase and specifications in the spec/ directory, focusing on code related to the ZIPT repository (https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT) and the files for sgraph, seq_plugin, Nielsen graphs, theory_seq, and associated files for sequences (seq_rewriter, axioms for seq).
Summary
This document provides a comprehensive analysis of Z3's string/sequence theory infrastructure and its relationship to the ZIPT reference implementation. Z3 has two layers of string solving:
- Existing production solver (
theory_seq): A mature, axiom-driven solver insrc/smt/with 20+ final_check phases, regex derivatives, and Skolemization. - New ZIPT-inspired infrastructure (in progress):
sgraph(string graph),seq_plugin(egraph plugin), andnielsen_graph(Nielsen transformations) — implementing the ZIPT approach of Nielsen-style word equation solving with lazy regex membership and Parikh image reasoning.
The spec directory (spec/plan1.md) describes the goal: implement theory_nseq as a new theory solver using the ZIPT algorithm, selectable via smt.string_solver=nseq. No theory_nseq files exist yet — only the foundational data structures (sgraph, snode, seq_plugin, nielsen_graph) are implemented and tested.
Detailed Findings
1. ZIPT Reference Implementation (C#)
Repository: https://github.com/CEisenhofer/ZIPT (parikh branch)
The ZIPT solver implements Nielsen-style word equation solving with lazy regex membership and Parikh image analysis. It is a Z3 user-propagator written in C#.
Architecture
| Component | File(s) | Purpose |
|---|---|---|
StringPropagator |
StringPropagator.cs |
Base user-propagator; handles Z3 callbacks (Fixed, Created, Eq, DisEq) |
NielsenGraph |
Constraints/NielsenGraph.cs |
Search graph with iterative deepening DFS |
NielsenNode |
Constraints/NielsenNode.cs |
Node = set of constraints; supports simplification and extension |
NielsenEdge |
Constraints/NielsenEdge.cs |
Edge = substitutions + side constraints |
StrManager |
Strings/StrManager.cs |
String canonicalization, concatenation, substitution |
PDD |
IntUtils/PDD.cs |
Polynomial Decision Diagrams for integer reasoning |
Interval |
IntUtils/Interval.cs |
Interval arithmetic for bounds |
Constraint Types
- StrEq: Word equations
LHS = RHS(regex-free) - StrMem: Regex membership
str ∈ L(regex)with history tracking - IntEq: Polynomial equality
P = 0(over length variables) - IntLe: Polynomial inequality
P ≤ 0 - IntNonEq: Polynomial disequality
P ≠ 0 - Character ranges: Symbolic char ∈ [a-z]
- Character disequalities: c₁ ≠ c₂
Modifier Chain (Priority Order)
- DetModifier — Deterministic simplifications (highest priority)
- PowerEpsilonModifier — Handle empty power:
u = "" || n = 0 - NumCmpModifier — Numeric comparisons:
n ≤ m || n > m - ConstNumUnwindingModifier — Constant power unwinding:
n = 0 || n > 0 - EqSplitModifier — Split equality into prefix/suffix parts
- StarIntrModifier — Introduce stabilizer
x ∈ base* - GPowerIntrModifier — Introduce power
x := u^n prefix(u) - ConstNielsenModifier — Nielsen split against constant:
x := ax || x := "" - RegexCharSplitModifier — Character-level regex split via minterms
- RegexVarSplitModifier — Variable-level regex split:
x / "" || x / cx - PowerSplitModifier — Power unwinding
- VarNielsenModifier — Variable Nielsen split (lowest priority)
- VarNumUnwindingModifier — Variable power unwinding
Key Design Patterns
- Lazy regex evaluation: Only consume characters when concrete
- Canonicalization: All strings/polynomials are canonicalized
- Parikh constraints: Length reasoning via PDD polynomials + interval arithmetic
- Iterative deepening: Depth-bounded search with increasing limits
- Subsumption: Avoid redundant branches via node subsumption checking
- Modification counts: Version string variables for Z3 (x → x_0, x_1, ...)
2. sgraph — String Graph (src/ast/euf/)
Files: euf_sgraph.h, euf_sgraph.cpp, euf_snode.h
The sgraph maps Z3 AST expressions to snode (string node) structures organized as binary concatenation trees. This is the Z3-native equivalent of ZIPT's StrManager.
snode Class (euf_snode.h)
snode_kind enum (16 kinds):
| Kind | AST Op | Description |
|---|---|---|
s_empty |
OP_SEQ_EMPTY |
Empty string |
s_char |
OP_SEQ_UNIT (literal) |
Concrete character |
s_var |
uninterpreted const | String variable |
s_unit |
OP_SEQ_UNIT (non-literal) |
Generic unit |
s_concat |
OP_SEQ_CONCAT |
Binary concatenation |
s_power |
OP_SEQ_POWER |
String exponentiation s^n |
s_star |
OP_RE_STAR |
Kleene star r* |
s_loop |
OP_RE_LOOP |
Bounded loop r{lo,hi} |
s_union |
OP_RE_UNION |
Union r₁|r₂ |
s_intersect |
OP_RE_INTERSECT |
Intersection r₁&r₂ |
s_complement |
OP_RE_COMPLEMENT |
Complement ~r |
s_fail |
OP_RE_EMPTY_SET |
Empty language ∅ |
s_full_char |
OP_RE_FULL_CHAR_SET |
Full character set . |
s_full_seq |
OP_RE_FULL_SEQ_SET |
Full sequence set .* |
s_to_re |
OP_SEQ_TO_RE |
String to regex |
s_in_re |
OP_SEQ_IN_RE |
Regex membership |
s_other |
(default) | Other expression |
Key Members:
expr* m_expr— Z3 AST expressionsnode_kind m_kind— Classificationunsigned m_id— Unique IDbool m_ground— No uninterpreted variables in subtreebool m_regex_free— No regex constructs in subtreebool m_nullable— Accepts empty stringunsigned m_level— Tree depthunsigned m_length— Token count (leaf count)unsigned m_hash_matrix[2][2]— 2×2 polynomial hash matrix for O(1) associative hashingsnode_subst_cache* m_subst_cache— ZIPT-style substitution cachesnode* m_args[0]— Flexible array of children
Hash Matrix (HASH_BASE = 31):
- Empty: identity matrix
[[1,0],[0,1]] - Leaf:
[[31, expr_id+1], [0, 1]] - Concat: matrix multiplication
M(left) * M(right) - Hash value:
m_hash_matrix[0][1](assoc_hash)
Navigation: first(), last(), at(i), collect_tokens(), is_token() (everything except empty and concat)
sgraph Class (euf_sgraph.h/cpp)
Constructor: sgraph(ast_manager& m, egraph& eg, bool add_plugin = true)
- Stores reference to egraph (not owned)
- If
add_plugin=true, creates and registersseq_plugin - Registers
on_makecallback with egraph: when any enode is created with seq/re sort, automatically creates corresponding snode
Core Operations:
| Method | Description |
|---|---|
mk(expr* e) |
Create snode from expression (idempotent) |
find(expr* e) |
Lookup existing snode |
mk_var(symbol) |
Create variable snode |
mk_char(unsigned) |
Create character snode |
mk_empty() |
Create empty snode |
mk_concat(a, b) |
Create concat (with identity elimination) |
drop_first/last(n) |
Remove first/last token |
drop_left/right(n, k) |
Remove k tokens from left/right |
subst(n, var, repl) |
Substitute variable with replacement (cached) |
brzozowski_deriv(re, elem) |
Regex derivative via seq_rewriter |
compute_minterms(re, out) |
Extract character classes (capped at 20 predicates) |
push()/pop(n) |
Scope management synchronized with egraph |
Substitution Caching: Each snode has lazy-allocated snode_subst_cache storing (var_id, repl_id) → result mappings, avoiding redundant substitution computations (ZIPT pattern).
3. seq_plugin — Egraph Plugin (src/ast/euf/)
Files: euf_seq_plugin.h, euf_seq_plugin.cpp
The seq_plugin implements equality saturation for sequence/regex concatenation in the egraph. It handles associativity, identity/absorption rules, and Kleene star simplifications.
Class Hierarchy
euf::plugin (abstract base)
└── euf::seq_plugin
Constructor: seq_plugin(egraph& g, sgraph* sg = nullptr)
- If
sgis null, creates and owns a new sgraph - Initializes
seq_util,seq_rewriter, hash table with custom functors
Concat Handling
| Predicate | String | Regex |
|---|---|---|
| Concat | is_str_concat() / OP_SEQ_CONCAT |
is_re_concat() / OP_RE_CONCAT |
| Identity | is_str_empty() / ε |
is_re_epsilon() / to_re("") |
| Absorbing | (none) | is_re_empty() / ∅ |
Simplification Rules (from propagate_simplify):
- Kleene star merging:
concat(v*, v*) = v*when bodies are congruent (root equality) - Extended star merging:
concat(v*, concat(v*, c)) = concat(v*, c) - Nullable absorption:
concat(.*, v) = .*when v is nullable - Symmetric nullable absorption:
concat(v, .*) = .*when v is nullable
Associativity (from propagate_assoc):
- Maintains
hashtable<enode*, enode_concat_hash, enode_concat_eq>that respects associativity - Hash: Uses sgraph's cached snode hash matrices for O(1) hashing, or falls back to leaf-based hashing
- Equality: Compares flattened leaf sequences pointwise
- When duplicate found, merges via
push_merge
Nullability: Delegates to m_rewriter.is_nullable(e) (from seq_rewriter)
Key Relationship: seq_plugin does NOT create snodes. The sgraph's on_make callback handles snode creation. seq_plugin only observes egraph events and triggers merges.
4. Nielsen Graph (src/smt/seq/)
Files: seq_nielsen.h, seq_nielsen.cpp, CMakeLists.txt
The Nielsen graph implements word equation solving via Nielsen transformations. This is the Z3-native port of ZIPT's NielsenGraph, NielsenNode, NielsenEdge, and constraint types.
Classes
dep_tracker (seq_nielsen.h:81-94)
Bitvector-based dependency tracker for conflict explanation.
merge(other): Bitwise ORis_superset(other): All bits of other presentis_empty(): No bits set
str_eq (seq_nielsen.h:98-119)
Word equation lhs = rhs (both regex-free snode trees).
sort(): Canonicalize by snode IDis_trivial(): Same snode or both emptycontains_var(var): Check if variable appears in lhs or rhs
str_mem (seq_nielsen.h:123-143)
Regex membership str ∈ L(regex).
m_history: Consumed prefix for cycle detectionm_id: Unique identifieris_primitive(): True when str is a single variable
nielsen_subst (seq_nielsen.h:147-162)
Variable substitution var → replacement.
is_eliminating(): True when variable doesn't appear in replacement (progress)
nielsen_edge (seq_nielsen.h:166-195)
Directed edge in the Nielsen graph.
m_subst: Vector of substitutions appliedm_side_str_eq/m_side_str_mem: Side constraints addedm_is_progress: Whether this is an eliminating step
nielsen_node (seq_nielsen.h:199-270)
Node = set of constraints at a point in the search.
m_str_eq: String equality constraintsm_str_mem: Regex membership constraintsm_outgoing: Outgoing edgesm_backedge: For cycle detectionm_is_general_conflict,m_reason: Conflict trackingclone_from(parent): Copy constraintsapply_subst(sg, s): Apply substitution to all constraints
nielsen_graph (seq_nielsen.h:274-325)
The overall graph structure.
m_sg: Reference to sgraph infrastructurem_root: Root node with initial constraintsm_run_idx,m_depth_bound: For iterative deepeningmk_node(),mk_child(parent),mk_edge(src, tgt, progress)add_str_eq(lhs, rhs),add_str_mem(str, regex): Populate from external solverreset(),inc_run_idx(): Lifecycle management
Enums
simplify_result: proceed, conflict, satisfied, restart, restart_and_satisfied
backtrack_reason: unevaluated, extended, symbol_clash, parikh_image, subsumption, arithmetic, regex, regex_widening, character_range, smt, children_failed
Build Configuration
src/smt/seq/CMakeLists.txt: Component smt_seq, depends on euf and rewriter.
Current Status
The Nielsen graph provides data structures and infrastructure — node/edge/constraint/substitution management, dependency tracking, backtracking support, display. The actual transformation algorithms (modifier chain, graph expansion, conflict detection, Parikh image analysis) are not yet implemented. Tests cover all data structure operations but not algorithmic behavior.
5. theory_seq — Existing String Theory Solver (src/smt/)
Files: theory_seq.h, theory_seq.cpp, theory_seq_empty.h
The production string solver implementing theory interface. Uses axiom-driven solving with 20+ cascading phases.
Class Structure
smt::theory (base)
seq::eq_solver_context (interface)
└── smt::theory_seq
Component Submodules (declared in theory_seq.h:356-366):
seq::skolem m_sk— Skolemizationseq_axioms m_ax— Axiom generation (bridges rewriter and SMT layers)seq::eq_solver m_eq— Core equality solver (solver-agnostic)seq_regex m_regex— Regular expression handlingseq_offset_eq m_offset_eq— Length offset trackingth_union_find m_find— Union-find for theory variablesdependency_manager m_dm— Justification trackingsolution_map m_rep— Variable → representative mapping
Final Check Cascade (20 phases)
1. simplify_and_solve_eqs() — Solve unitary equalities
2. check_lts() — Lexicographic ordering
3. solve_nqs(0) — Disequalities
4. m_regex.propagate() — Regex propagation
5. check_contains() — Contains constraints
6. check_fixed_length(true,false)— Zero-length sequences
7. len_based_split() — Length-based splitting
8. check_fixed_length(false,false)— Fixed-length expansion
9. check_int_string() — int/string conversions
10. check_ubv_string() — bitvector/string conversions
11. reduce_length_eq() — Reduce by length equality
12. branch_unit_variable() — Branch on unit variables
13. branch_binary_variable() — Branch on binary patterns
14. branch_variable() — General variable branching
15. check_length_coherence() — Length coherence
16. check_extensionality() — Extensionality
17. branch_nqs() — Branch on disequalities
18. branch_itos() — Branch on int-to-string
19. check_fixed_length(false,true)— Fixed-length (long strings)
20. solve_recfuns() — Recursive functions
Returns FC_CONTINUE if progress, FC_GIVEUP if stuck, FC_DONE if solved.
String Solver Selection (smt_setup.cpp)
Parameter: smt.string_solver
"seq"→setup_seq()→alloc(smt::theory_seq, m_context)"empty"→setup_seq()(with empty variant)"none"→ No solver"auto"→setup_seq()"nseq"→ Not yet implemented (planned per spec)
Key Data Structures
solution_map m_rep— Maps variables to representatives (with normalization cache)scoped_vector<depeq> m_eqs— Active equationsscoped_vector<ne> m_nqs— Disequalitiesscoped_vector<nc> m_ncs— Negated containsexclusion_table m_exclude— Asserted disequalitiesexpr_ref_vector m_axioms— Axiom queue (with deduplication)
6. seq_rewriter — Sequence Rewriter (src/ast/rewriter/)
Files: seq_rewriter.h, seq_rewriter.cpp
Provides local syntactic simplification rules for all string/sequence operations. Entry point: mk_app_core(func_decl* f, ...).
Rewrite Categories
| Category | Key Methods | Description |
|---|---|---|
| Construction | mk_seq_unit, mk_seq_concat |
Coalesce chars, right-associate, eliminate empty |
| Extraction | mk_seq_extract, mk_seq_at, mk_seq_nth |
Substring, index access |
| Length | mk_seq_length |
Compute length, simplify for concat/replace/extract |
| Search | mk_seq_contains, mk_seq_index, mk_seq_last_index |
Containment, index-of |
| Replacement | mk_seq_replace, mk_seq_replace_all, mk_seq_replace_re* |
String replacement |
| Comparison | mk_str_le, mk_str_lt, mk_seq_prefix, mk_seq_suffix |
Lexicographic, prefix/suffix |
| Conversion | mk_str_itos, mk_str_stoi, mk_str_from_code, mk_str_to_code |
Type conversions |
| Higher-order | mk_seq_map, mk_seq_mapi, mk_seq_foldl, mk_seq_foldli |
Map, fold |
| Regex | mk_re_concat, mk_re_union, mk_re_inter, mk_re_star, etc. |
Regex construction |
| Derivative | mk_re_derivative → mk_antimirov_deriv |
Antimirov-style regex derivative |
| Nullability | is_nullable → is_nullable_rec |
Nullable check for regex |
Regex Derivative System
Uses Antimirov derivatives with a BDD-like normal form:
- Top level: Union of Antimirov derivatives using
_OP_RE_ANTIMIROV_UNION - Middle level: Nested if-then-else sorted by condition ID
- Leaf level: Regular regex unions
Key derivative rules:
D(e, r₁·r₂) = D(e,r₁)·r₂ ∪ (nullable(r₁) ? D(e,r₂) : ∅)D(e, r₁∪r₂) = D(e,r₁) ∪ D(e,r₂)D(e, r*) = D(e,r)·r*D(e, ¬r) = ¬D(e,r)D(e, [lo-hi]) = ε if lo ≤ e ≤ hi else ∅
Validated by check_deriv_normal_form in debug mode.
7. Axiom System (src/ast/rewriter/ + src/smt/)
Two layers: rewriter-level (solver-independent) and SMT-level (bridges to Z3 context).
Rewriter-Level Axioms (seq::axioms, src/ast/rewriter/seq_axioms.h/cpp)
Solver-agnostic axiom generation via callbacks. Key axioms:
| Operation | Axiom Summary |
|---|---|
extract(s,i,l) |
0≤i≤|s| ∧ 0≤l → s=x·e·y ∧ |x|=i ∧ |e|=min(l,|s|-i) |
indexof(t,s,off) |
contains(t,s) → t=x·s·y ∧ indexof=|x| + tightest prefix |
replace(u,s,t) |
contains(u,s) → u=x·s·y ∧ r=x·t·y |
at(s,i) |
0≤i<|s| → s=x·e·y ∧ |x|=i ∧ |e|=1 |
itos(n) |
n≥0 → stoi(itos(n))=n ∧ no leading zeros |
stoi(s) |
stoi(s)≥-1 ∧ stoi("")=-1 |
Introduced via Skolem functions (see below).
SMT-Level Axioms (smt::seq_axioms, src/smt/seq_axioms.h/cpp)
Wraps seq::axioms with literal creation, clause addition, and phase forcing for Z3's SMT context.
8. Skolemization (src/ast/rewriter/seq_skolem.h/cpp)
seq::skolem introduces auxiliary Skolem functions for structural reasoning:
| Skolem | Symbol | Purpose |
|---|---|---|
seq.pre |
m_pre |
Prefix of s up to index i |
seq.post |
m_post |
Suffix of s from index i |
seq.tail |
m_tail |
Rest after first element |
seq.first |
m_seq_first |
First element |
seq.last |
m_seq_last |
Last element |
seq.idx.l/r |
m_indexof_left/right |
Decomposition for indexof: t = left·s·right |
seq.lidx.l/r |
m_lindexof_left/right |
Decomposition for last_indexof |
seq.cnt.l/r |
— | Decomposition for contains |
aut.accept |
m_accept |
Regex automaton acceptance |
aut.step |
m_aut_step |
Automaton state transition |
seq.eq |
m_eq |
Special equality predicate |
seq.length_limit |
m_length_limit |
Length bound |
9. seq_decl_plugin — Declaration Plugin (src/ast/)
Files: seq_decl_plugin.h, seq_decl_plugin.cpp
Provides the theory declarations for sequences, strings, and regular expressions.
Operation Kinds (seq_op_kind enum)
Sequence: OP_SEQ_UNIT, OP_SEQ_EMPTY, OP_SEQ_CONCAT, OP_SEQ_PREFIX, OP_SEQ_SUFFIX, OP_SEQ_CONTAINS, OP_SEQ_EXTRACT, OP_SEQ_REPLACE, OP_SEQ_AT, OP_SEQ_NTH, OP_SEQ_NTH_I, OP_SEQ_NTH_U, OP_SEQ_LENGTH, OP_SEQ_INDEX, OP_SEQ_LAST_INDEX, OP_SEQ_TO_RE, OP_SEQ_IN_RE, OP_SEQ_REPLACE_RE_ALL, OP_SEQ_REPLACE_RE, OP_SEQ_REPLACE_ALL, OP_SEQ_MAP, OP_SEQ_MAPI, OP_SEQ_FOLDL, OP_SEQ_FOLDLI, OP_SEQ_POWER
Regex: OP_RE_PLUS, OP_RE_STAR, OP_RE_OPTION, OP_RE_RANGE, OP_RE_CONCAT, OP_RE_UNION, OP_RE_DIFF, OP_RE_INTERSECT, OP_RE_LOOP, OP_RE_POWER, OP_RE_COMPLEMENT, OP_RE_EMPTY_SET, OP_RE_FULL_SEQ_SET, OP_RE_FULL_CHAR_SET, OP_RE_OF_PRED, OP_RE_REVERSE, OP_RE_DERIVATIVE
String-specific: OP_STRING_CONST, OP_STRING_ITOS, OP_STRING_STOI, OP_STRING_UBVTOS, OP_STRING_SBVTOS, OP_STRING_LT, OP_STRING_LE, OP_STRING_IS_DIGIT, OP_STRING_TO_CODE, OP_STRING_FROM_CODE
seq_util Utility Class
Nested classes for operation construction/recognition:
seq_util::str— String/sequence operations (mk_concat,mk_length,mk_empty,is_concat,is_empty,mk_power,is_power, etc.)seq_util::rex— Regex operations (mk_star,mk_union,mk_inter,mk_complement,mk_concat,is_star,is_concat,is_nullable, etc.)
Sort System
SEQ_SORT— Generic Seq[T]RE_SORT— RegEx[Seq[T]]_STRING_SORT— Seq[Char] (string)_REGLAN_SORT— RegEx[String]
10. Sequence Equality Solver (src/ast/rewriter/seq_eq_solver.h/cpp)
Solver-agnostic equality inference for sequence equations:
reduce_unit: Handle unit equationsreduce_itos1/2/3: Handleitos(s) = itos(t),itos(s) = "",itos(n) = "d1d2...dk"reduce_binary_eq: Handlex·xs = ys·ypatternsreduce_nth_solved: Solve nth equationsbranch_unit_variable: Split variable as empty or non-empty
11. Test Infrastructure
Test Files
| File | Tests | Lines | Component |
|---|---|---|---|
src/test/euf_sgraph.cpp |
18 | 769 | sgraph: classification, metadata, backtracking, navigation, substitution, derivatives, minterms |
src/test/euf_seq_plugin.cpp |
7 | 242 | seq_plugin: associativity, identity elimination, star merging, nullable absorption, egraph sync |
src/test/seq_nielsen.cpp |
13 | 480 | Nielsen: dep_tracker, str_eq, str_mem, subst, node, edge, graph population, expansion, backedge |
Registration
src/test/main.cpp:TST(euf_sgraph)(line 284),TST(euf_seq_plugin)(line 285),TST(seq_nielsen)(line 289)src/test/CMakeLists.txt: All three .cpp files included; depends onsmt_seq- Build:
ninja test-z3, Run:./test-z3 euf_sgraph,./test-z3 euf_seq_plugin,./test-z3 seq_nielsen
Test Setup Pattern
ast_manager m;
reg_decl_plugins(m);
euf::egraph eg(m);
euf::sgraph sg(m, eg); // Creates seq_plugin automatically
seq_util seq(m);
sort_ref str_sort(seq.str.mk_string_sort(), m);
// For nielsen:
seq::nielsen_graph ng(sg);
Code References
ZIPT Infrastructure (New)
| File | Description |
|---|---|
src/ast/euf/euf_snode.h |
snode class, snode_kind enum, hash matrix, subst cache |
src/ast/euf/euf_sgraph.h |
sgraph class header |
src/ast/euf/euf_sgraph.cpp |
sgraph implementation (classify, metadata, hash, subst, derivatives, minterms) |
src/ast/euf/euf_seq_plugin.h |
seq_plugin class header |
src/ast/euf/euf_seq_plugin.cpp |
seq_plugin implementation (assoc, simplify, nullable) |
src/smt/seq/seq_nielsen.h |
Nielsen graph classes (str_eq, str_mem, subst, edge, node, graph) |
src/smt/seq/seq_nielsen.cpp |
Nielsen graph implementation |
src/smt/seq/CMakeLists.txt |
smt_seq component build config |
Existing String Theory (Production)
| File | Description |
|---|---|
src/smt/theory_seq.h |
theory_seq class with 20-phase final_check |
src/smt/theory_seq.cpp |
Full theory solver implementation |
src/smt/theory_seq_empty.h |
Empty/stub variant |
src/smt/seq_axioms.h/cpp |
SMT-level axiom bridge |
src/smt/seq_regex.h/cpp |
Regex handling for theory_seq |
src/smt/seq_eq_solver.cpp |
SMT-level equality solving |
src/smt/seq_ne_solver.cpp |
Disequality solving |
src/smt/seq_offset_eq.h/cpp |
Length offset equality |
src/smt/smt_setup.h/cpp |
String solver selection/instantiation |
Rewriter Layer (Shared)
| File | Description |
|---|---|
src/ast/rewriter/seq_rewriter.h/cpp |
Rewrite rules + regex derivatives |
src/ast/rewriter/seq_axioms.h/cpp |
Solver-independent axiom generation |
src/ast/rewriter/seq_skolem.h/cpp |
Skolem function introduction |
src/ast/rewriter/seq_eq_solver.h/cpp |
Solver-agnostic equality inference |
Declaration Layer
| File | Description |
|---|---|
src/ast/seq_decl_plugin.h/cpp |
Sort/operation declarations, seq_util |
src/model/seq_factory.h |
Model value factory |
src/params/theory_seq_params.h/cpp |
Theory parameters |
src/params/seq_rewriter_params.pyg |
Rewriter parameters |
Tests
| File | Description |
|---|---|
src/test/euf_sgraph.cpp |
18 sgraph unit tests |
src/test/euf_seq_plugin.cpp |
7 seq_plugin tests |
src/test/seq_nielsen.cpp |
13 Nielsen graph tests |
src/test/main.cpp |
Test registration (lines 284, 285, 289) |
src/test/CMakeLists.txt |
Test build config (lines 54, 55, 134) |
Architecture Documentation
Component Dependency Graph
seq_decl_plugin (AST sort/op declarations)
↓
seq_util (utility wrapper)
↓
seq_rewriter (simplification rules, derivatives)
↓ ↘
seq_axioms (rewriter) seq_skolem
↓ ↓
seq_axioms (SMT) ←───────┘
↓
┌────────────┬──────────────────────┐
│ theory_seq │ theory_nseq (TODO) │
│ (existing) │ (per spec/plan1.md) │
├────────────┤──────────────────────┤
│ seq_regex │ nielsen_graph ←──── sgraph ←── snode
│ seq_eq/ne │ (new ZIPT infra) seq_plugin
│ solution_map│ egraph
└────────────┴──────────────────────┘
ZIPT ↔ Z3 Component Mapping
| ZIPT (C#) | Z3 (C++) | Location |
|---|---|---|
StrManager |
sgraph + snode |
src/ast/euf/euf_sgraph.*, euf_snode.h |
Str (token tree) |
snode (binary concat tree) |
src/ast/euf/euf_snode.h |
StrEq |
seq::str_eq |
src/smt/seq/seq_nielsen.h:98 |
StrMem |
seq::str_mem |
src/smt/seq/seq_nielsen.h:123 |
NielsenGraph |
seq::nielsen_graph |
src/smt/seq/seq_nielsen.h:274 |
NielsenNode |
seq::nielsen_node |
src/smt/seq/seq_nielsen.h:199 |
NielsenEdge |
seq::nielsen_edge |
src/smt/seq/seq_nielsen.h:166 |
Subst |
seq::nielsen_subst |
src/smt/seq/seq_nielsen.h:147 |
DependencyTracker |
seq::dep_tracker |
src/smt/seq/seq_nielsen.h:81 |
SimplifyResult |
seq::simplify_result |
src/smt/seq/seq_nielsen.h:54 |
BacktrackReasons |
seq::backtrack_reason |
src/smt/seq/seq_nielsen.h:64 |
PDD / Interval |
(not yet ported) | — |
| Modifier chain | (not yet ported) | — |
StringPropagator |
theory_nseq (TODO) |
— |
What Exists vs. What's Planned
Implemented:
- ✅ sgraph + snode (string representation, metadata, hashing, substitution, derivatives, minterms)
- ✅ seq_plugin (associativity, identity, absorption, star merge, nullable absorption)
- ✅ Nielsen graph data structures (str_eq, str_mem, subst, edge, node, graph, dep_tracker)
- ✅ All unit tests passing (18 + 7 + 13 = 38 tests)
Not Yet Implemented (per spec/plan1.md):
- ❌
theory_nseqskeleton and SMT integration - ❌ Parameter integration (
smt.string_solver=nseq) - ❌ Nielsen transformation algorithms (modifier chain)
- ❌ Graph expansion / iterative deepening search
- ❌ Conflict detection and explanation
- ❌ Parikh image / PDD / Interval reasoning
- ❌ Regex splitting modifiers
- ❌ String propagator (propagation engine)
- ❌ Model generation
- ❌ Integration tests
Historical Context (from spec/)
spec/reference.md
Points to ZIPT repository (parikh branch) and two PDF papers:
LazyMemberships.pdf— Lazy regex membership approachClemensTableau.pdf— Additional tableau algorithms
spec/plan.md
Task description: Implement theory_nseq in src/smt/, add smt.string_solver parameter support, put self-contained utilities in ast/rewriter/.
spec/plan1.md
Detailed implementation plan with 7 phases:
- Skeleton & Integration
- Core Data Structures (state, constraints, dependencies)
- Word Equation Solver (Nielsen transformations)
- Length Constraint Integration (Parikh image)
- Regex Membership
- Propagation Engine
- Model Generation
Open Questions
- theory_nseq skeleton: When will Phase 1 (skeleton + parameter integration) be implemented?
- Modifier chain priority: Should Z3's modifier priorities match ZIPT's exactly, or be adjusted?
- PDD porting: Will Z3 reuse existing polynomial infrastructure or port ZIPT's PDD?
- Regex derivatives: Will
theory_nseqreuseseq_rewriter::mk_derivativeor implement ZIPT's approach? - Integer solver integration: Will
theory_nsequsearith_value(liketheory_seq) or a sub-solver (like ZIPT)? - Backtracking scope: How will Nielsen graph state interact with SMT solver backtracking?
- Performance target: What benchmarks should
theory_nseqmatch or exceed compared totheory_seq?