--- date: 2026-03-03 20:46:00 UTC researcher: Copilot CLI git_commit: fb674ac5b2e722dfe1086c7d952054a0bc11c963 branch: c3 repository: c3 topic: "ZIPT-related string theory components: sgraph, seq_plugin, Nielsen graphs, theory_seq, seq_rewriter, axioms" tags: [research, codebase, zipt, sgraph, seq_plugin, nielsen, theory_seq, seq_rewriter, seq_axioms, seq_decl_plugin, strings, regex] status: complete last_updated: 2026-03-03 last_updated_by: Copilot CLI --- # Research: ZIPT-Related String Theory Components in Z3 ## Research Question Examine the Z3 codebase and specifications in the `spec/` directory, focusing on code related to the ZIPT repository (https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT) and the files for sgraph, seq_plugin, Nielsen graphs, theory_seq, and associated files for sequences (seq_rewriter, axioms for seq). ## Summary This document provides a comprehensive analysis of Z3's string/sequence theory infrastructure and its relationship to the ZIPT reference implementation. Z3 has **two layers** of string solving: 1. **Existing production solver** (`theory_seq`): A mature, axiom-driven solver in `src/smt/` with 20+ final_check phases, regex derivatives, and Skolemization. 2. **New ZIPT-inspired infrastructure** (in progress): `sgraph` (string graph), `seq_plugin` (egraph plugin), and `nielsen_graph` (Nielsen transformations) — implementing the ZIPT approach of Nielsen-style word equation solving with lazy regex membership and Parikh image reasoning. The spec directory (`spec/plan1.md`) describes the goal: implement `theory_nseq` as a new theory solver using the ZIPT algorithm, selectable via `smt.string_solver=nseq`. No `theory_nseq` files exist yet — only the foundational data structures (sgraph, snode, seq_plugin, nielsen_graph) are implemented and tested. --- ## Detailed Findings ### 1. ZIPT Reference Implementation (C#) **Repository**: https://github.com/CEisenhofer/ZIPT (parikh branch) The ZIPT solver implements Nielsen-style word equation solving with lazy regex membership and Parikh image analysis. It is a Z3 user-propagator written in C#. #### Architecture | Component | File(s) | Purpose | |-----------|---------|---------| | `StringPropagator` | `StringPropagator.cs` | Base user-propagator; handles Z3 callbacks (Fixed, Created, Eq, DisEq) | | `NielsenGraph` | `Constraints/NielsenGraph.cs` | Search graph with iterative deepening DFS | | `NielsenNode` | `Constraints/NielsenNode.cs` | Node = set of constraints; supports simplification and extension | | `NielsenEdge` | `Constraints/NielsenEdge.cs` | Edge = substitutions + side constraints | | `StrManager` | `Strings/StrManager.cs` | String canonicalization, concatenation, substitution | | `PDD` | `IntUtils/PDD.cs` | Polynomial Decision Diagrams for integer reasoning | | `Interval` | `IntUtils/Interval.cs` | Interval arithmetic for bounds | #### Constraint Types - **StrEq**: Word equations `LHS = RHS` (regex-free) - **StrMem**: Regex membership `str ∈ L(regex)` with history tracking - **IntEq**: Polynomial equality `P = 0` (over length variables) - **IntLe**: Polynomial inequality `P ≤ 0` - **IntNonEq**: Polynomial disequality `P ≠ 0` - **Character ranges**: Symbolic char ∈ [a-z] - **Character disequalities**: c₁ ≠ c₂ #### Modifier Chain (Priority Order) 1. **DetModifier** — Deterministic simplifications (highest priority) 2. **PowerEpsilonModifier** — Handle empty power: `u = "" || n = 0` 3. **NumCmpModifier** — Numeric comparisons: `n ≤ m || n > m` 4. **ConstNumUnwindingModifier** — Constant power unwinding: `n = 0 || n > 0` 5. **EqSplitModifier** — Split equality into prefix/suffix parts 6. **StarIntrModifier** — Introduce stabilizer `x ∈ base*` 7. **GPowerIntrModifier** — Introduce power `x := u^n prefix(u)` 8. **ConstNielsenModifier** — Nielsen split against constant: `x := ax || x := ""` 9. **RegexCharSplitModifier** — Character-level regex split via minterms 10. **RegexVarSplitModifier** — Variable-level regex split: `x / "" || x / cx` 11. **PowerSplitModifier** — Power unwinding 12. **VarNielsenModifier** — Variable Nielsen split (lowest priority) 13. **VarNumUnwindingModifier** — Variable power unwinding #### Key Design Patterns - **Lazy regex evaluation**: Only consume characters when concrete - **Canonicalization**: All strings/polynomials are canonicalized - **Parikh constraints**: Length reasoning via PDD polynomials + interval arithmetic - **Iterative deepening**: Depth-bounded search with increasing limits - **Subsumption**: Avoid redundant branches via node subsumption checking - **Modification counts**: Version string variables for Z3 (x → x_0, x_1, ...) --- ### 2. sgraph — String Graph (`src/ast/euf/`) **Files**: `euf_sgraph.h`, `euf_sgraph.cpp`, `euf_snode.h` The sgraph maps Z3 AST expressions to `snode` (string node) structures organized as binary concatenation trees. This is the Z3-native equivalent of ZIPT's `StrManager`. #### snode Class (`euf_snode.h`) **snode_kind enum** (16 kinds): | Kind | AST Op | Description | |------|--------|-------------| | `s_empty` | `OP_SEQ_EMPTY` | Empty string | | `s_char` | `OP_SEQ_UNIT` (literal) | Concrete character | | `s_var` | uninterpreted const | String variable | | `s_unit` | `OP_SEQ_UNIT` (non-literal) | Generic unit | | `s_concat` | `OP_SEQ_CONCAT` | Binary concatenation | | `s_power` | `OP_SEQ_POWER` | String exponentiation s^n | | `s_star` | `OP_RE_STAR` | Kleene star r* | | `s_loop` | `OP_RE_LOOP` | Bounded loop r{lo,hi} | | `s_union` | `OP_RE_UNION` | Union r₁\|r₂ | | `s_intersect` | `OP_RE_INTERSECT` | Intersection r₁&r₂ | | `s_complement` | `OP_RE_COMPLEMENT` | Complement ~r | | `s_fail` | `OP_RE_EMPTY_SET` | Empty language ∅ | | `s_full_char` | `OP_RE_FULL_CHAR_SET` | Full character set . | | `s_full_seq` | `OP_RE_FULL_SEQ_SET` | Full sequence set .* | | `s_to_re` | `OP_SEQ_TO_RE` | String to regex | | `s_in_re` | `OP_SEQ_IN_RE` | Regex membership | | `s_other` | (default) | Other expression | **Key Members**: - `expr* m_expr` — Z3 AST expression - `snode_kind m_kind` — Classification - `unsigned m_id` — Unique ID - `bool m_ground` — No uninterpreted variables in subtree - `bool m_regex_free` — No regex constructs in subtree - `bool m_nullable` — Accepts empty string - `unsigned m_level` — Tree depth - `unsigned m_length` — Token count (leaf count) - `unsigned m_hash_matrix[2][2]` — 2×2 polynomial hash matrix for O(1) associative hashing - `snode_subst_cache* m_subst_cache` — ZIPT-style substitution cache - `snode* m_args[0]` — Flexible array of children **Hash Matrix** (`HASH_BASE = 31`): - Empty: identity matrix `[[1,0],[0,1]]` - Leaf: `[[31, expr_id+1], [0, 1]]` - Concat: matrix multiplication `M(left) * M(right)` - Hash value: `m_hash_matrix[0][1]` (assoc_hash) **Navigation**: `first()`, `last()`, `at(i)`, `collect_tokens()`, `is_token()` (everything except empty and concat) #### sgraph Class (`euf_sgraph.h/cpp`) **Constructor**: `sgraph(ast_manager& m, egraph& eg, bool add_plugin = true)` - Stores reference to egraph (not owned) - If `add_plugin=true`, creates and registers `seq_plugin` - Registers `on_make` callback with egraph: when any enode is created with seq/re sort, automatically creates corresponding snode **Core Operations**: | Method | Description | |--------|-------------| | `mk(expr* e)` | Create snode from expression (idempotent) | | `find(expr* e)` | Lookup existing snode | | `mk_var(symbol)` | Create variable snode | | `mk_char(unsigned)` | Create character snode | | `mk_empty()` | Create empty snode | | `mk_concat(a, b)` | Create concat (with identity elimination) | | `drop_first/last(n)` | Remove first/last token | | `drop_left/right(n, k)` | Remove k tokens from left/right | | `subst(n, var, repl)` | Substitute variable with replacement (cached) | | `brzozowski_deriv(re, elem)` | Regex derivative via seq_rewriter | | `compute_minterms(re, out)` | Extract character classes (capped at 20 predicates) | | `push()/pop(n)` | Scope management synchronized with egraph | **Substitution Caching**: Each snode has lazy-allocated `snode_subst_cache` storing `(var_id, repl_id) → result` mappings, avoiding redundant substitution computations (ZIPT pattern). --- ### 3. seq_plugin — Egraph Plugin (`src/ast/euf/`) **Files**: `euf_seq_plugin.h`, `euf_seq_plugin.cpp` The seq_plugin implements equality saturation for sequence/regex concatenation in the egraph. It handles associativity, identity/absorption rules, and Kleene star simplifications. #### Class Hierarchy ``` euf::plugin (abstract base) └── euf::seq_plugin ``` **Constructor**: `seq_plugin(egraph& g, sgraph* sg = nullptr)` - If `sg` is null, creates and owns a new sgraph - Initializes `seq_util`, `seq_rewriter`, hash table with custom functors #### Concat Handling | Predicate | String | Regex | |-----------|--------|-------| | Concat | `is_str_concat()` / `OP_SEQ_CONCAT` | `is_re_concat()` / `OP_RE_CONCAT` | | Identity | `is_str_empty()` / ε | `is_re_epsilon()` / to_re("") | | Absorbing | (none) | `is_re_empty()` / ∅ | **Simplification Rules** (from `propagate_simplify`): 1. **Kleene star merging**: `concat(v*, v*) = v*` when bodies are congruent (root equality) 2. **Extended star merging**: `concat(v*, concat(v*, c)) = concat(v*, c)` 3. **Nullable absorption**: `concat(.*, v) = .*` when v is nullable 4. **Symmetric nullable absorption**: `concat(v, .*) = .*` when v is nullable **Associativity** (from `propagate_assoc`): - Maintains `hashtable` that respects associativity - Hash: Uses sgraph's cached snode hash matrices for O(1) hashing, or falls back to leaf-based hashing - Equality: Compares flattened leaf sequences pointwise - When duplicate found, merges via `push_merge` **Nullability**: Delegates to `m_rewriter.is_nullable(e)` (from `seq_rewriter`) **Key Relationship**: seq_plugin does NOT create snodes. The sgraph's `on_make` callback handles snode creation. seq_plugin only observes egraph events and triggers merges. --- ### 4. Nielsen Graph (`src/smt/seq/`) **Files**: `seq_nielsen.h`, `seq_nielsen.cpp`, `CMakeLists.txt` The Nielsen graph implements word equation solving via Nielsen transformations. This is the Z3-native port of ZIPT's `NielsenGraph`, `NielsenNode`, `NielsenEdge`, and constraint types. #### Classes ##### dep_tracker (`seq_nielsen.h:81-94`) Bitvector-based dependency tracker for conflict explanation. - `merge(other)`: Bitwise OR - `is_superset(other)`: All bits of other present - `is_empty()`: No bits set ##### str_eq (`seq_nielsen.h:98-119`) Word equation `lhs = rhs` (both regex-free snode trees). - `sort()`: Canonicalize by snode ID - `is_trivial()`: Same snode or both empty - `contains_var(var)`: Check if variable appears in lhs or rhs ##### str_mem (`seq_nielsen.h:123-143`) Regex membership `str ∈ L(regex)`. - `m_history`: Consumed prefix for cycle detection - `m_id`: Unique identifier - `is_primitive()`: True when str is a single variable ##### nielsen_subst (`seq_nielsen.h:147-162`) Variable substitution `var → replacement`. - `is_eliminating()`: True when variable doesn't appear in replacement (progress) ##### nielsen_edge (`seq_nielsen.h:166-195`) Directed edge in the Nielsen graph. - `m_subst`: Vector of substitutions applied - `m_side_str_eq/m_side_str_mem`: Side constraints added - `m_is_progress`: Whether this is an eliminating step ##### nielsen_node (`seq_nielsen.h:199-270`) Node = set of constraints at a point in the search. - `m_str_eq`: String equality constraints - `m_str_mem`: Regex membership constraints - `m_outgoing`: Outgoing edges - `m_backedge`: For cycle detection - `m_is_general_conflict`, `m_reason`: Conflict tracking - `clone_from(parent)`: Copy constraints - `apply_subst(sg, s)`: Apply substitution to all constraints ##### nielsen_graph (`seq_nielsen.h:274-325`) The overall graph structure. - `m_sg`: Reference to sgraph infrastructure - `m_root`: Root node with initial constraints - `m_run_idx`, `m_depth_bound`: For iterative deepening - `mk_node()`, `mk_child(parent)`, `mk_edge(src, tgt, progress)` - `add_str_eq(lhs, rhs)`, `add_str_mem(str, regex)`: Populate from external solver - `reset()`, `inc_run_idx()`: Lifecycle management #### Enums **simplify_result**: `proceed`, `conflict`, `satisfied`, `restart`, `restart_and_satisfied` **backtrack_reason**: `unevaluated`, `extended`, `symbol_clash`, `parikh_image`, `subsumption`, `arithmetic`, `regex`, `regex_widening`, `character_range`, `smt`, `children_failed` #### Build Configuration `src/smt/seq/CMakeLists.txt`: Component `smt_seq`, depends on `euf` and `rewriter`. #### Current Status The Nielsen graph provides **data structures and infrastructure** — node/edge/constraint/substitution management, dependency tracking, backtracking support, display. The **actual transformation algorithms** (modifier chain, graph expansion, conflict detection, Parikh image analysis) are not yet implemented. Tests cover all data structure operations but not algorithmic behavior. --- ### 5. theory_seq — Existing String Theory Solver (`src/smt/`) **Files**: `theory_seq.h`, `theory_seq.cpp`, `theory_seq_empty.h` The production string solver implementing `theory` interface. Uses axiom-driven solving with 20+ cascading phases. #### Class Structure ``` smt::theory (base) seq::eq_solver_context (interface) └── smt::theory_seq ``` **Component Submodules** (declared in `theory_seq.h:356-366`): - `seq::skolem m_sk` — Skolemization - `seq_axioms m_ax` — Axiom generation (bridges rewriter and SMT layers) - `seq::eq_solver m_eq` — Core equality solver (solver-agnostic) - `seq_regex m_regex` — Regular expression handling - `seq_offset_eq m_offset_eq` — Length offset tracking - `th_union_find m_find` — Union-find for theory variables - `dependency_manager m_dm` — Justification tracking - `solution_map m_rep` — Variable → representative mapping #### Final Check Cascade (20 phases) ``` 1. simplify_and_solve_eqs() — Solve unitary equalities 2. check_lts() — Lexicographic ordering 3. solve_nqs(0) — Disequalities 4. m_regex.propagate() — Regex propagation 5. check_contains() — Contains constraints 6. check_fixed_length(true,false)— Zero-length sequences 7. len_based_split() — Length-based splitting 8. check_fixed_length(false,false)— Fixed-length expansion 9. check_int_string() — int/string conversions 10. check_ubv_string() — bitvector/string conversions 11. reduce_length_eq() — Reduce by length equality 12. branch_unit_variable() — Branch on unit variables 13. branch_binary_variable() — Branch on binary patterns 14. branch_variable() — General variable branching 15. check_length_coherence() — Length coherence 16. check_extensionality() — Extensionality 17. branch_nqs() — Branch on disequalities 18. branch_itos() — Branch on int-to-string 19. check_fixed_length(false,true)— Fixed-length (long strings) 20. solve_recfuns() — Recursive functions ``` Returns `FC_CONTINUE` if progress, `FC_GIVEUP` if stuck, `FC_DONE` if solved. #### String Solver Selection (`smt_setup.cpp`) Parameter: `smt.string_solver` - `"seq"` → `setup_seq()` → `alloc(smt::theory_seq, m_context)` - `"empty"` → `setup_seq()` (with empty variant) - `"none"` → No solver - `"auto"` → `setup_seq()` - `"nseq"` → **Not yet implemented** (planned per spec) #### Key Data Structures - `solution_map m_rep` — Maps variables to representatives (with normalization cache) - `scoped_vector m_eqs` — Active equations - `scoped_vector m_nqs` — Disequalities - `scoped_vector m_ncs` — Negated contains - `exclusion_table m_exclude` — Asserted disequalities - `expr_ref_vector m_axioms` — Axiom queue (with deduplication) --- ### 6. seq_rewriter — Sequence Rewriter (`src/ast/rewriter/`) **Files**: `seq_rewriter.h`, `seq_rewriter.cpp` Provides local syntactic simplification rules for all string/sequence operations. Entry point: `mk_app_core(func_decl* f, ...)`. #### Rewrite Categories | Category | Key Methods | Description | |----------|-------------|-------------| | Construction | `mk_seq_unit`, `mk_seq_concat` | Coalesce chars, right-associate, eliminate empty | | Extraction | `mk_seq_extract`, `mk_seq_at`, `mk_seq_nth` | Substring, index access | | Length | `mk_seq_length` | Compute length, simplify for concat/replace/extract | | Search | `mk_seq_contains`, `mk_seq_index`, `mk_seq_last_index` | Containment, index-of | | Replacement | `mk_seq_replace`, `mk_seq_replace_all`, `mk_seq_replace_re*` | String replacement | | Comparison | `mk_str_le`, `mk_str_lt`, `mk_seq_prefix`, `mk_seq_suffix` | Lexicographic, prefix/suffix | | Conversion | `mk_str_itos`, `mk_str_stoi`, `mk_str_from_code`, `mk_str_to_code` | Type conversions | | Higher-order | `mk_seq_map`, `mk_seq_mapi`, `mk_seq_foldl`, `mk_seq_foldli` | Map, fold | | Regex | `mk_re_concat`, `mk_re_union`, `mk_re_inter`, `mk_re_star`, etc. | Regex construction | | Derivative | `mk_re_derivative` → `mk_antimirov_deriv` | Antimirov-style regex derivative | | Nullability | `is_nullable` → `is_nullable_rec` | Nullable check for regex | #### Regex Derivative System Uses **Antimirov derivatives** with a BDD-like normal form: 1. **Top level**: Union of Antimirov derivatives using `_OP_RE_ANTIMIROV_UNION` 2. **Middle level**: Nested if-then-else sorted by condition ID 3. **Leaf level**: Regular regex unions Key derivative rules: - `D(e, r₁·r₂) = D(e,r₁)·r₂ ∪ (nullable(r₁) ? D(e,r₂) : ∅)` - `D(e, r₁∪r₂) = D(e,r₁) ∪ D(e,r₂)` - `D(e, r*) = D(e,r)·r*` - `D(e, ¬r) = ¬D(e,r)` - `D(e, [lo-hi]) = ε if lo ≤ e ≤ hi else ∅` Validated by `check_deriv_normal_form` in debug mode. --- ### 7. Axiom System (`src/ast/rewriter/` + `src/smt/`) Two layers: **rewriter-level** (solver-independent) and **SMT-level** (bridges to Z3 context). #### Rewriter-Level Axioms (`seq::axioms`, `src/ast/rewriter/seq_axioms.h/cpp`) Solver-agnostic axiom generation via callbacks. Key axioms: | Operation | Axiom Summary | |-----------|---------------| | `extract(s,i,l)` | `0≤i≤\|s\| ∧ 0≤l → s=x·e·y ∧ \|x\|=i ∧ \|e\|=min(l,\|s\|-i)` | | `indexof(t,s,off)` | `contains(t,s) → t=x·s·y ∧ indexof=\|x\|` + tightest prefix | | `replace(u,s,t)` | `contains(u,s) → u=x·s·y ∧ r=x·t·y` | | `at(s,i)` | `0≤i<\|s\| → s=x·e·y ∧ \|x\|=i ∧ \|e\|=1` | | `itos(n)` | `n≥0 → stoi(itos(n))=n ∧ no leading zeros` | | `stoi(s)` | `stoi(s)≥-1 ∧ stoi("")=-1` | Introduced via Skolem functions (see below). #### SMT-Level Axioms (`smt::seq_axioms`, `src/smt/seq_axioms.h/cpp`) Wraps `seq::axioms` with literal creation, clause addition, and phase forcing for Z3's SMT context. --- ### 8. Skolemization (`src/ast/rewriter/seq_skolem.h/cpp`) `seq::skolem` introduces auxiliary Skolem functions for structural reasoning: | Skolem | Symbol | Purpose | |--------|--------|---------| | `seq.pre` | `m_pre` | Prefix of s up to index i | | `seq.post` | `m_post` | Suffix of s from index i | | `seq.tail` | `m_tail` | Rest after first element | | `seq.first` | `m_seq_first` | First element | | `seq.last` | `m_seq_last` | Last element | | `seq.idx.l/r` | `m_indexof_left/right` | Decomposition for indexof: t = left·s·right | | `seq.lidx.l/r` | `m_lindexof_left/right` | Decomposition for last_indexof | | `seq.cnt.l/r` | — | Decomposition for contains | | `aut.accept` | `m_accept` | Regex automaton acceptance | | `aut.step` | `m_aut_step` | Automaton state transition | | `seq.eq` | `m_eq` | Special equality predicate | | `seq.length_limit` | `m_length_limit` | Length bound | --- ### 9. seq_decl_plugin — Declaration Plugin (`src/ast/`) **Files**: `seq_decl_plugin.h`, `seq_decl_plugin.cpp` Provides the theory declarations for sequences, strings, and regular expressions. #### Operation Kinds (`seq_op_kind` enum) **Sequence**: `OP_SEQ_UNIT`, `OP_SEQ_EMPTY`, `OP_SEQ_CONCAT`, `OP_SEQ_PREFIX`, `OP_SEQ_SUFFIX`, `OP_SEQ_CONTAINS`, `OP_SEQ_EXTRACT`, `OP_SEQ_REPLACE`, `OP_SEQ_AT`, `OP_SEQ_NTH`, `OP_SEQ_NTH_I`, `OP_SEQ_NTH_U`, `OP_SEQ_LENGTH`, `OP_SEQ_INDEX`, `OP_SEQ_LAST_INDEX`, `OP_SEQ_TO_RE`, `OP_SEQ_IN_RE`, `OP_SEQ_REPLACE_RE_ALL`, `OP_SEQ_REPLACE_RE`, `OP_SEQ_REPLACE_ALL`, `OP_SEQ_MAP`, `OP_SEQ_MAPI`, `OP_SEQ_FOLDL`, `OP_SEQ_FOLDLI`, **`OP_SEQ_POWER`** **Regex**: `OP_RE_PLUS`, `OP_RE_STAR`, `OP_RE_OPTION`, `OP_RE_RANGE`, `OP_RE_CONCAT`, `OP_RE_UNION`, `OP_RE_DIFF`, `OP_RE_INTERSECT`, `OP_RE_LOOP`, `OP_RE_POWER`, `OP_RE_COMPLEMENT`, `OP_RE_EMPTY_SET`, `OP_RE_FULL_SEQ_SET`, `OP_RE_FULL_CHAR_SET`, `OP_RE_OF_PRED`, `OP_RE_REVERSE`, `OP_RE_DERIVATIVE` **String-specific**: `OP_STRING_CONST`, `OP_STRING_ITOS`, `OP_STRING_STOI`, `OP_STRING_UBVTOS`, `OP_STRING_SBVTOS`, `OP_STRING_LT`, `OP_STRING_LE`, `OP_STRING_IS_DIGIT`, `OP_STRING_TO_CODE`, `OP_STRING_FROM_CODE` #### seq_util Utility Class Nested classes for operation construction/recognition: - `seq_util::str` — String/sequence operations (`mk_concat`, `mk_length`, `mk_empty`, `is_concat`, `is_empty`, `mk_power`, `is_power`, etc.) - `seq_util::rex` — Regex operations (`mk_star`, `mk_union`, `mk_inter`, `mk_complement`, `mk_concat`, `is_star`, `is_concat`, `is_nullable`, etc.) #### Sort System - `SEQ_SORT` — Generic Seq[T] - `RE_SORT` — RegEx[Seq[T]] - `_STRING_SORT` — Seq[Char] (string) - `_REGLAN_SORT` — RegEx[String] --- ### 10. Sequence Equality Solver (`src/ast/rewriter/seq_eq_solver.h/cpp`) Solver-agnostic equality inference for sequence equations: - `reduce_unit`: Handle unit equations - `reduce_itos1/2/3`: Handle `itos(s) = itos(t)`, `itos(s) = ""`, `itos(n) = "d1d2...dk"` - `reduce_binary_eq`: Handle `x·xs = ys·y` patterns - `reduce_nth_solved`: Solve nth equations - `branch_unit_variable`: Split variable as empty or non-empty --- ### 11. Test Infrastructure #### Test Files | File | Tests | Lines | Component | |------|-------|-------|-----------| | `src/test/euf_sgraph.cpp` | 18 | 769 | sgraph: classification, metadata, backtracking, navigation, substitution, derivatives, minterms | | `src/test/euf_seq_plugin.cpp` | 7 | 242 | seq_plugin: associativity, identity elimination, star merging, nullable absorption, egraph sync | | `src/test/seq_nielsen.cpp` | 13 | 480 | Nielsen: dep_tracker, str_eq, str_mem, subst, node, edge, graph population, expansion, backedge | #### Registration - `src/test/main.cpp`: `TST(euf_sgraph)` (line 284), `TST(euf_seq_plugin)` (line 285), `TST(seq_nielsen)` (line 289) - `src/test/CMakeLists.txt`: All three .cpp files included; depends on `smt_seq` - Build: `ninja test-z3`, Run: `./test-z3 euf_sgraph`, `./test-z3 euf_seq_plugin`, `./test-z3 seq_nielsen` #### Test Setup Pattern ```cpp ast_manager m; reg_decl_plugins(m); euf::egraph eg(m); euf::sgraph sg(m, eg); // Creates seq_plugin automatically seq_util seq(m); sort_ref str_sort(seq.str.mk_string_sort(), m); // For nielsen: seq::nielsen_graph ng(sg); ``` --- ## Code References ### ZIPT Infrastructure (New) | File | Description | |------|-------------| | `src/ast/euf/euf_snode.h` | snode class, snode_kind enum, hash matrix, subst cache | | `src/ast/euf/euf_sgraph.h` | sgraph class header | | `src/ast/euf/euf_sgraph.cpp` | sgraph implementation (classify, metadata, hash, subst, derivatives, minterms) | | `src/ast/euf/euf_seq_plugin.h` | seq_plugin class header | | `src/ast/euf/euf_seq_plugin.cpp` | seq_plugin implementation (assoc, simplify, nullable) | | `src/smt/seq/seq_nielsen.h` | Nielsen graph classes (str_eq, str_mem, subst, edge, node, graph) | | `src/smt/seq/seq_nielsen.cpp` | Nielsen graph implementation | | `src/smt/seq/CMakeLists.txt` | smt_seq component build config | ### Existing String Theory (Production) | File | Description | |------|-------------| | `src/smt/theory_seq.h` | theory_seq class with 20-phase final_check | | `src/smt/theory_seq.cpp` | Full theory solver implementation | | `src/smt/theory_seq_empty.h` | Empty/stub variant | | `src/smt/seq_axioms.h/cpp` | SMT-level axiom bridge | | `src/smt/seq_regex.h/cpp` | Regex handling for theory_seq | | `src/smt/seq_eq_solver.cpp` | SMT-level equality solving | | `src/smt/seq_ne_solver.cpp` | Disequality solving | | `src/smt/seq_offset_eq.h/cpp` | Length offset equality | | `src/smt/smt_setup.h/cpp` | String solver selection/instantiation | ### Rewriter Layer (Shared) | File | Description | |------|-------------| | `src/ast/rewriter/seq_rewriter.h/cpp` | Rewrite rules + regex derivatives | | `src/ast/rewriter/seq_axioms.h/cpp` | Solver-independent axiom generation | | `src/ast/rewriter/seq_skolem.h/cpp` | Skolem function introduction | | `src/ast/rewriter/seq_eq_solver.h/cpp` | Solver-agnostic equality inference | ### Declaration Layer | File | Description | |------|-------------| | `src/ast/seq_decl_plugin.h/cpp` | Sort/operation declarations, seq_util | | `src/model/seq_factory.h` | Model value factory | | `src/params/theory_seq_params.h/cpp` | Theory parameters | | `src/params/seq_rewriter_params.pyg` | Rewriter parameters | ### Tests | File | Description | |------|-------------| | `src/test/euf_sgraph.cpp` | 18 sgraph unit tests | | `src/test/euf_seq_plugin.cpp` | 7 seq_plugin tests | | `src/test/seq_nielsen.cpp` | 13 Nielsen graph tests | | `src/test/main.cpp` | Test registration (lines 284, 285, 289) | | `src/test/CMakeLists.txt` | Test build config (lines 54, 55, 134) | --- ## Architecture Documentation ### Component Dependency Graph ``` seq_decl_plugin (AST sort/op declarations) ↓ seq_util (utility wrapper) ↓ seq_rewriter (simplification rules, derivatives) ↓ ↘ seq_axioms (rewriter) seq_skolem ↓ ↓ seq_axioms (SMT) ←───────┘ ↓ ┌────────────┬──────────────────────┐ │ theory_seq │ theory_nseq (TODO) │ │ (existing) │ (per spec/plan1.md) │ ├────────────┤──────────────────────┤ │ seq_regex │ nielsen_graph ←──── sgraph ←── snode │ seq_eq/ne │ (new ZIPT infra) seq_plugin │ solution_map│ egraph └────────────┴──────────────────────┘ ``` ### ZIPT ↔ Z3 Component Mapping | ZIPT (C#) | Z3 (C++) | Location | |------------|----------|----------| | `StrManager` | `sgraph` + `snode` | `src/ast/euf/euf_sgraph.*`, `euf_snode.h` | | `Str` (token tree) | `snode` (binary concat tree) | `src/ast/euf/euf_snode.h` | | `StrEq` | `seq::str_eq` | `src/smt/seq/seq_nielsen.h:98` | | `StrMem` | `seq::str_mem` | `src/smt/seq/seq_nielsen.h:123` | | `NielsenGraph` | `seq::nielsen_graph` | `src/smt/seq/seq_nielsen.h:274` | | `NielsenNode` | `seq::nielsen_node` | `src/smt/seq/seq_nielsen.h:199` | | `NielsenEdge` | `seq::nielsen_edge` | `src/smt/seq/seq_nielsen.h:166` | | `Subst` | `seq::nielsen_subst` | `src/smt/seq/seq_nielsen.h:147` | | `DependencyTracker` | `seq::dep_tracker` | `src/smt/seq/seq_nielsen.h:81` | | `SimplifyResult` | `seq::simplify_result` | `src/smt/seq/seq_nielsen.h:54` | | `BacktrackReasons` | `seq::backtrack_reason` | `src/smt/seq/seq_nielsen.h:64` | | `PDD` / `Interval` | (not yet ported) | — | | Modifier chain | (not yet ported) | — | | `StringPropagator` | `theory_nseq` (TODO) | — | ### What Exists vs. What's Planned **Implemented**: - ✅ sgraph + snode (string representation, metadata, hashing, substitution, derivatives, minterms) - ✅ seq_plugin (associativity, identity, absorption, star merge, nullable absorption) - ✅ Nielsen graph data structures (str_eq, str_mem, subst, edge, node, graph, dep_tracker) - ✅ All unit tests passing (18 + 7 + 13 = 38 tests) **Not Yet Implemented** (per spec/plan1.md): - ❌ `theory_nseq` skeleton and SMT integration - ❌ Parameter integration (`smt.string_solver=nseq`) - ❌ Nielsen transformation algorithms (modifier chain) - ❌ Graph expansion / iterative deepening search - ❌ Conflict detection and explanation - ❌ Parikh image / PDD / Interval reasoning - ❌ Regex splitting modifiers - ❌ String propagator (propagation engine) - ❌ Model generation - ❌ Integration tests --- ## Historical Context (from spec/) ### spec/reference.md Points to ZIPT repository (parikh branch) and two PDF papers: - `LazyMemberships.pdf` — Lazy regex membership approach - `ClemensTableau.pdf` — Additional tableau algorithms ### spec/plan.md Task description: Implement `theory_nseq` in `src/smt/`, add `smt.string_solver` parameter support, put self-contained utilities in `ast/rewriter/`. ### spec/plan1.md Detailed implementation plan with 7 phases: 1. Skeleton & Integration 2. Core Data Structures (state, constraints, dependencies) 3. Word Equation Solver (Nielsen transformations) 4. Length Constraint Integration (Parikh image) 5. Regex Membership 6. Propagation Engine 7. Model Generation --- ## Open Questions 1. **theory_nseq skeleton**: When will Phase 1 (skeleton + parameter integration) be implemented? 2. **Modifier chain priority**: Should Z3's modifier priorities match ZIPT's exactly, or be adjusted? 3. **PDD porting**: Will Z3 reuse existing polynomial infrastructure or port ZIPT's PDD? 4. **Regex derivatives**: Will `theory_nseq` reuse `seq_rewriter::mk_derivative` or implement ZIPT's approach? 5. **Integer solver integration**: Will `theory_nseq` use `arith_value` (like `theory_seq`) or a sub-solver (like ZIPT)? 6. **Backtracking scope**: How will Nielsen graph state interact with SMT solver backtracking? 7. **Performance target**: What benchmarks should `theory_nseq` match or exceed compared to `theory_seq`?