3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-03-09 23:00:30 +00:00
z3/research/docs/2026-03-03-zipt-string-theory-codebase-analysis.md
Nikolaj Bjorner ce3ddd1718 added specs and research
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
2026-03-03 13:17:29 -08:00

29 KiB
Raw Permalink Blame History

date researcher git_commit branch repository topic tags status last_updated last_updated_by
2026-03-03 20:46:00 UTC Copilot CLI fb674ac5b2 c3 c3 ZIPT-related string theory components: sgraph, seq_plugin, Nielsen graphs, theory_seq, seq_rewriter, axioms
research
codebase
zipt
sgraph
seq_plugin
nielsen
theory_seq
seq_rewriter
seq_axioms
seq_decl_plugin
strings
regex
complete 2026-03-03 Copilot CLI

Research: ZIPT-Related String Theory Components in Z3

Research Question

Examine the Z3 codebase and specifications in the spec/ directory, focusing on code related to the ZIPT repository (https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT) and the files for sgraph, seq_plugin, Nielsen graphs, theory_seq, and associated files for sequences (seq_rewriter, axioms for seq).

Summary

This document provides a comprehensive analysis of Z3's string/sequence theory infrastructure and its relationship to the ZIPT reference implementation. Z3 has two layers of string solving:

  1. Existing production solver (theory_seq): A mature, axiom-driven solver in src/smt/ with 20+ final_check phases, regex derivatives, and Skolemization.
  2. New ZIPT-inspired infrastructure (in progress): sgraph (string graph), seq_plugin (egraph plugin), and nielsen_graph (Nielsen transformations) — implementing the ZIPT approach of Nielsen-style word equation solving with lazy regex membership and Parikh image reasoning.

The spec directory (spec/plan1.md) describes the goal: implement theory_nseq as a new theory solver using the ZIPT algorithm, selectable via smt.string_solver=nseq. No theory_nseq files exist yet — only the foundational data structures (sgraph, snode, seq_plugin, nielsen_graph) are implemented and tested.


Detailed Findings

1. ZIPT Reference Implementation (C#)

Repository: https://github.com/CEisenhofer/ZIPT (parikh branch)

The ZIPT solver implements Nielsen-style word equation solving with lazy regex membership and Parikh image analysis. It is a Z3 user-propagator written in C#.

Architecture

Component File(s) Purpose
StringPropagator StringPropagator.cs Base user-propagator; handles Z3 callbacks (Fixed, Created, Eq, DisEq)
NielsenGraph Constraints/NielsenGraph.cs Search graph with iterative deepening DFS
NielsenNode Constraints/NielsenNode.cs Node = set of constraints; supports simplification and extension
NielsenEdge Constraints/NielsenEdge.cs Edge = substitutions + side constraints
StrManager Strings/StrManager.cs String canonicalization, concatenation, substitution
PDD IntUtils/PDD.cs Polynomial Decision Diagrams for integer reasoning
Interval IntUtils/Interval.cs Interval arithmetic for bounds

Constraint Types

  • StrEq: Word equations LHS = RHS (regex-free)
  • StrMem: Regex membership str ∈ L(regex) with history tracking
  • IntEq: Polynomial equality P = 0 (over length variables)
  • IntLe: Polynomial inequality P ≤ 0
  • IntNonEq: Polynomial disequality P ≠ 0
  • Character ranges: Symbolic char ∈ [a-z]
  • Character disequalities: c₁ ≠ c₂

Modifier Chain (Priority Order)

  1. DetModifier — Deterministic simplifications (highest priority)
  2. PowerEpsilonModifier — Handle empty power: u = "" || n = 0
  3. NumCmpModifier — Numeric comparisons: n ≤ m || n > m
  4. ConstNumUnwindingModifier — Constant power unwinding: n = 0 || n > 0
  5. EqSplitModifier — Split equality into prefix/suffix parts
  6. StarIntrModifier — Introduce stabilizer x ∈ base*
  7. GPowerIntrModifier — Introduce power x := u^n prefix(u)
  8. ConstNielsenModifier — Nielsen split against constant: x := ax || x := ""
  9. RegexCharSplitModifier — Character-level regex split via minterms
  10. RegexVarSplitModifier — Variable-level regex split: x / "" || x / cx
  11. PowerSplitModifier — Power unwinding
  12. VarNielsenModifier — Variable Nielsen split (lowest priority)
  13. VarNumUnwindingModifier — Variable power unwinding

Key Design Patterns

  • Lazy regex evaluation: Only consume characters when concrete
  • Canonicalization: All strings/polynomials are canonicalized
  • Parikh constraints: Length reasoning via PDD polynomials + interval arithmetic
  • Iterative deepening: Depth-bounded search with increasing limits
  • Subsumption: Avoid redundant branches via node subsumption checking
  • Modification counts: Version string variables for Z3 (x → x_0, x_1, ...)

2. sgraph — String Graph (src/ast/euf/)

Files: euf_sgraph.h, euf_sgraph.cpp, euf_snode.h

The sgraph maps Z3 AST expressions to snode (string node) structures organized as binary concatenation trees. This is the Z3-native equivalent of ZIPT's StrManager.

snode Class (euf_snode.h)

snode_kind enum (16 kinds):

Kind AST Op Description
s_empty OP_SEQ_EMPTY Empty string
s_char OP_SEQ_UNIT (literal) Concrete character
s_var uninterpreted const String variable
s_unit OP_SEQ_UNIT (non-literal) Generic unit
s_concat OP_SEQ_CONCAT Binary concatenation
s_power OP_SEQ_POWER String exponentiation s^n
s_star OP_RE_STAR Kleene star r*
s_loop OP_RE_LOOP Bounded loop r{lo,hi}
s_union OP_RE_UNION Union r₁|r₂
s_intersect OP_RE_INTERSECT Intersection r₁&r₂
s_complement OP_RE_COMPLEMENT Complement ~r
s_fail OP_RE_EMPTY_SET Empty language ∅
s_full_char OP_RE_FULL_CHAR_SET Full character set .
s_full_seq OP_RE_FULL_SEQ_SET Full sequence set .*
s_to_re OP_SEQ_TO_RE String to regex
s_in_re OP_SEQ_IN_RE Regex membership
s_other (default) Other expression

Key Members:

  • expr* m_expr — Z3 AST expression
  • snode_kind m_kind — Classification
  • unsigned m_id — Unique ID
  • bool m_ground — No uninterpreted variables in subtree
  • bool m_regex_free — No regex constructs in subtree
  • bool m_nullable — Accepts empty string
  • unsigned m_level — Tree depth
  • unsigned m_length — Token count (leaf count)
  • unsigned m_hash_matrix[2][2] — 2×2 polynomial hash matrix for O(1) associative hashing
  • snode_subst_cache* m_subst_cache — ZIPT-style substitution cache
  • snode* m_args[0] — Flexible array of children

Hash Matrix (HASH_BASE = 31):

  • Empty: identity matrix [[1,0],[0,1]]
  • Leaf: [[31, expr_id+1], [0, 1]]
  • Concat: matrix multiplication M(left) * M(right)
  • Hash value: m_hash_matrix[0][1] (assoc_hash)

Navigation: first(), last(), at(i), collect_tokens(), is_token() (everything except empty and concat)

sgraph Class (euf_sgraph.h/cpp)

Constructor: sgraph(ast_manager& m, egraph& eg, bool add_plugin = true)

  • Stores reference to egraph (not owned)
  • If add_plugin=true, creates and registers seq_plugin
  • Registers on_make callback with egraph: when any enode is created with seq/re sort, automatically creates corresponding snode

Core Operations:

Method Description
mk(expr* e) Create snode from expression (idempotent)
find(expr* e) Lookup existing snode
mk_var(symbol) Create variable snode
mk_char(unsigned) Create character snode
mk_empty() Create empty snode
mk_concat(a, b) Create concat (with identity elimination)
drop_first/last(n) Remove first/last token
drop_left/right(n, k) Remove k tokens from left/right
subst(n, var, repl) Substitute variable with replacement (cached)
brzozowski_deriv(re, elem) Regex derivative via seq_rewriter
compute_minterms(re, out) Extract character classes (capped at 20 predicates)
push()/pop(n) Scope management synchronized with egraph

Substitution Caching: Each snode has lazy-allocated snode_subst_cache storing (var_id, repl_id) → result mappings, avoiding redundant substitution computations (ZIPT pattern).


3. seq_plugin — Egraph Plugin (src/ast/euf/)

Files: euf_seq_plugin.h, euf_seq_plugin.cpp

The seq_plugin implements equality saturation for sequence/regex concatenation in the egraph. It handles associativity, identity/absorption rules, and Kleene star simplifications.

Class Hierarchy

euf::plugin (abstract base)
  └── euf::seq_plugin

Constructor: seq_plugin(egraph& g, sgraph* sg = nullptr)

  • If sg is null, creates and owns a new sgraph
  • Initializes seq_util, seq_rewriter, hash table with custom functors

Concat Handling

Predicate String Regex
Concat is_str_concat() / OP_SEQ_CONCAT is_re_concat() / OP_RE_CONCAT
Identity is_str_empty() / ε is_re_epsilon() / to_re("")
Absorbing (none) is_re_empty() / ∅

Simplification Rules (from propagate_simplify):

  1. Kleene star merging: concat(v*, v*) = v* when bodies are congruent (root equality)
  2. Extended star merging: concat(v*, concat(v*, c)) = concat(v*, c)
  3. Nullable absorption: concat(.*, v) = .* when v is nullable
  4. Symmetric nullable absorption: concat(v, .*) = .* when v is nullable

Associativity (from propagate_assoc):

  • Maintains hashtable<enode*, enode_concat_hash, enode_concat_eq> that respects associativity
  • Hash: Uses sgraph's cached snode hash matrices for O(1) hashing, or falls back to leaf-based hashing
  • Equality: Compares flattened leaf sequences pointwise
  • When duplicate found, merges via push_merge

Nullability: Delegates to m_rewriter.is_nullable(e) (from seq_rewriter)

Key Relationship: seq_plugin does NOT create snodes. The sgraph's on_make callback handles snode creation. seq_plugin only observes egraph events and triggers merges.


4. Nielsen Graph (src/smt/seq/)

Files: seq_nielsen.h, seq_nielsen.cpp, CMakeLists.txt

The Nielsen graph implements word equation solving via Nielsen transformations. This is the Z3-native port of ZIPT's NielsenGraph, NielsenNode, NielsenEdge, and constraint types.

Classes

dep_tracker (seq_nielsen.h:81-94)

Bitvector-based dependency tracker for conflict explanation.

  • merge(other): Bitwise OR
  • is_superset(other): All bits of other present
  • is_empty(): No bits set
str_eq (seq_nielsen.h:98-119)

Word equation lhs = rhs (both regex-free snode trees).

  • sort(): Canonicalize by snode ID
  • is_trivial(): Same snode or both empty
  • contains_var(var): Check if variable appears in lhs or rhs
str_mem (seq_nielsen.h:123-143)

Regex membership str ∈ L(regex).

  • m_history: Consumed prefix for cycle detection
  • m_id: Unique identifier
  • is_primitive(): True when str is a single variable
nielsen_subst (seq_nielsen.h:147-162)

Variable substitution var → replacement.

  • is_eliminating(): True when variable doesn't appear in replacement (progress)
nielsen_edge (seq_nielsen.h:166-195)

Directed edge in the Nielsen graph.

  • m_subst: Vector of substitutions applied
  • m_side_str_eq/m_side_str_mem: Side constraints added
  • m_is_progress: Whether this is an eliminating step
nielsen_node (seq_nielsen.h:199-270)

Node = set of constraints at a point in the search.

  • m_str_eq: String equality constraints
  • m_str_mem: Regex membership constraints
  • m_outgoing: Outgoing edges
  • m_backedge: For cycle detection
  • m_is_general_conflict, m_reason: Conflict tracking
  • clone_from(parent): Copy constraints
  • apply_subst(sg, s): Apply substitution to all constraints
nielsen_graph (seq_nielsen.h:274-325)

The overall graph structure.

  • m_sg: Reference to sgraph infrastructure
  • m_root: Root node with initial constraints
  • m_run_idx, m_depth_bound: For iterative deepening
  • mk_node(), mk_child(parent), mk_edge(src, tgt, progress)
  • add_str_eq(lhs, rhs), add_str_mem(str, regex): Populate from external solver
  • reset(), inc_run_idx(): Lifecycle management

Enums

simplify_result: proceed, conflict, satisfied, restart, restart_and_satisfied

backtrack_reason: unevaluated, extended, symbol_clash, parikh_image, subsumption, arithmetic, regex, regex_widening, character_range, smt, children_failed

Build Configuration

src/smt/seq/CMakeLists.txt: Component smt_seq, depends on euf and rewriter.

Current Status

The Nielsen graph provides data structures and infrastructure — node/edge/constraint/substitution management, dependency tracking, backtracking support, display. The actual transformation algorithms (modifier chain, graph expansion, conflict detection, Parikh image analysis) are not yet implemented. Tests cover all data structure operations but not algorithmic behavior.


5. theory_seq — Existing String Theory Solver (src/smt/)

Files: theory_seq.h, theory_seq.cpp, theory_seq_empty.h

The production string solver implementing theory interface. Uses axiom-driven solving with 20+ cascading phases.

Class Structure

smt::theory (base)
seq::eq_solver_context (interface)
  └── smt::theory_seq

Component Submodules (declared in theory_seq.h:356-366):

  • seq::skolem m_sk — Skolemization
  • seq_axioms m_ax — Axiom generation (bridges rewriter and SMT layers)
  • seq::eq_solver m_eq — Core equality solver (solver-agnostic)
  • seq_regex m_regex — Regular expression handling
  • seq_offset_eq m_offset_eq — Length offset tracking
  • th_union_find m_find — Union-find for theory variables
  • dependency_manager m_dm — Justification tracking
  • solution_map m_rep — Variable → representative mapping

Final Check Cascade (20 phases)

1.  simplify_and_solve_eqs()     — Solve unitary equalities
2.  check_lts()                   — Lexicographic ordering
3.  solve_nqs(0)                  — Disequalities
4.  m_regex.propagate()           — Regex propagation
5.  check_contains()              — Contains constraints
6.  check_fixed_length(true,false)— Zero-length sequences
7.  len_based_split()             — Length-based splitting
8.  check_fixed_length(false,false)— Fixed-length expansion
9.  check_int_string()            — int/string conversions
10. check_ubv_string()            — bitvector/string conversions
11. reduce_length_eq()            — Reduce by length equality
12. branch_unit_variable()        — Branch on unit variables
13. branch_binary_variable()      — Branch on binary patterns
14. branch_variable()             — General variable branching
15. check_length_coherence()      — Length coherence
16. check_extensionality()        — Extensionality
17. branch_nqs()                  — Branch on disequalities
18. branch_itos()                 — Branch on int-to-string
19. check_fixed_length(false,true)— Fixed-length (long strings)
20. solve_recfuns()               — Recursive functions

Returns FC_CONTINUE if progress, FC_GIVEUP if stuck, FC_DONE if solved.

String Solver Selection (smt_setup.cpp)

Parameter: smt.string_solver

  • "seq"setup_seq()alloc(smt::theory_seq, m_context)
  • "empty"setup_seq() (with empty variant)
  • "none" → No solver
  • "auto"setup_seq()
  • "nseq"Not yet implemented (planned per spec)

Key Data Structures

  • solution_map m_rep — Maps variables to representatives (with normalization cache)
  • scoped_vector<depeq> m_eqs — Active equations
  • scoped_vector<ne> m_nqs — Disequalities
  • scoped_vector<nc> m_ncs — Negated contains
  • exclusion_table m_exclude — Asserted disequalities
  • expr_ref_vector m_axioms — Axiom queue (with deduplication)

6. seq_rewriter — Sequence Rewriter (src/ast/rewriter/)

Files: seq_rewriter.h, seq_rewriter.cpp

Provides local syntactic simplification rules for all string/sequence operations. Entry point: mk_app_core(func_decl* f, ...).

Rewrite Categories

Category Key Methods Description
Construction mk_seq_unit, mk_seq_concat Coalesce chars, right-associate, eliminate empty
Extraction mk_seq_extract, mk_seq_at, mk_seq_nth Substring, index access
Length mk_seq_length Compute length, simplify for concat/replace/extract
Search mk_seq_contains, mk_seq_index, mk_seq_last_index Containment, index-of
Replacement mk_seq_replace, mk_seq_replace_all, mk_seq_replace_re* String replacement
Comparison mk_str_le, mk_str_lt, mk_seq_prefix, mk_seq_suffix Lexicographic, prefix/suffix
Conversion mk_str_itos, mk_str_stoi, mk_str_from_code, mk_str_to_code Type conversions
Higher-order mk_seq_map, mk_seq_mapi, mk_seq_foldl, mk_seq_foldli Map, fold
Regex mk_re_concat, mk_re_union, mk_re_inter, mk_re_star, etc. Regex construction
Derivative mk_re_derivativemk_antimirov_deriv Antimirov-style regex derivative
Nullability is_nullableis_nullable_rec Nullable check for regex

Regex Derivative System

Uses Antimirov derivatives with a BDD-like normal form:

  1. Top level: Union of Antimirov derivatives using _OP_RE_ANTIMIROV_UNION
  2. Middle level: Nested if-then-else sorted by condition ID
  3. Leaf level: Regular regex unions

Key derivative rules:

  • D(e, r₁·r₂) = D(e,r₁)·r₂ (nullable(r₁) ? D(e,r₂) : ∅)
  • D(e, r₁r₂) = D(e,r₁) D(e,r₂)
  • D(e, r*) = D(e,r)·r*
  • D(e, ¬r) = ¬D(e,r)
  • D(e, [lo-hi]) = ε if lo ≤ e ≤ hi else ∅

Validated by check_deriv_normal_form in debug mode.


7. Axiom System (src/ast/rewriter/ + src/smt/)

Two layers: rewriter-level (solver-independent) and SMT-level (bridges to Z3 context).

Rewriter-Level Axioms (seq::axioms, src/ast/rewriter/seq_axioms.h/cpp)

Solver-agnostic axiom generation via callbacks. Key axioms:

Operation Axiom Summary
extract(s,i,l) 0≤i≤|s| ∧ 0≤l → s=x·e·y ∧ |x|=i ∧ |e|=min(l,|s|-i)
indexof(t,s,off) contains(t,s) → t=x·s·y ∧ indexof=|x| + tightest prefix
replace(u,s,t) contains(u,s) → u=x·s·y ∧ r=x·t·y
at(s,i) 0≤i<|s| → s=x·e·y ∧ |x|=i ∧ |e|=1
itos(n) n≥0 → stoi(itos(n))=n ∧ no leading zeros
stoi(s) stoi(s)≥-1 ∧ stoi("")=-1

Introduced via Skolem functions (see below).

SMT-Level Axioms (smt::seq_axioms, src/smt/seq_axioms.h/cpp)

Wraps seq::axioms with literal creation, clause addition, and phase forcing for Z3's SMT context.


8. Skolemization (src/ast/rewriter/seq_skolem.h/cpp)

seq::skolem introduces auxiliary Skolem functions for structural reasoning:

Skolem Symbol Purpose
seq.pre m_pre Prefix of s up to index i
seq.post m_post Suffix of s from index i
seq.tail m_tail Rest after first element
seq.first m_seq_first First element
seq.last m_seq_last Last element
seq.idx.l/r m_indexof_left/right Decomposition for indexof: t = left·s·right
seq.lidx.l/r m_lindexof_left/right Decomposition for last_indexof
seq.cnt.l/r Decomposition for contains
aut.accept m_accept Regex automaton acceptance
aut.step m_aut_step Automaton state transition
seq.eq m_eq Special equality predicate
seq.length_limit m_length_limit Length bound

9. seq_decl_plugin — Declaration Plugin (src/ast/)

Files: seq_decl_plugin.h, seq_decl_plugin.cpp

Provides the theory declarations for sequences, strings, and regular expressions.

Operation Kinds (seq_op_kind enum)

Sequence: OP_SEQ_UNIT, OP_SEQ_EMPTY, OP_SEQ_CONCAT, OP_SEQ_PREFIX, OP_SEQ_SUFFIX, OP_SEQ_CONTAINS, OP_SEQ_EXTRACT, OP_SEQ_REPLACE, OP_SEQ_AT, OP_SEQ_NTH, OP_SEQ_NTH_I, OP_SEQ_NTH_U, OP_SEQ_LENGTH, OP_SEQ_INDEX, OP_SEQ_LAST_INDEX, OP_SEQ_TO_RE, OP_SEQ_IN_RE, OP_SEQ_REPLACE_RE_ALL, OP_SEQ_REPLACE_RE, OP_SEQ_REPLACE_ALL, OP_SEQ_MAP, OP_SEQ_MAPI, OP_SEQ_FOLDL, OP_SEQ_FOLDLI, OP_SEQ_POWER

Regex: OP_RE_PLUS, OP_RE_STAR, OP_RE_OPTION, OP_RE_RANGE, OP_RE_CONCAT, OP_RE_UNION, OP_RE_DIFF, OP_RE_INTERSECT, OP_RE_LOOP, OP_RE_POWER, OP_RE_COMPLEMENT, OP_RE_EMPTY_SET, OP_RE_FULL_SEQ_SET, OP_RE_FULL_CHAR_SET, OP_RE_OF_PRED, OP_RE_REVERSE, OP_RE_DERIVATIVE

String-specific: OP_STRING_CONST, OP_STRING_ITOS, OP_STRING_STOI, OP_STRING_UBVTOS, OP_STRING_SBVTOS, OP_STRING_LT, OP_STRING_LE, OP_STRING_IS_DIGIT, OP_STRING_TO_CODE, OP_STRING_FROM_CODE

seq_util Utility Class

Nested classes for operation construction/recognition:

  • seq_util::str — String/sequence operations (mk_concat, mk_length, mk_empty, is_concat, is_empty, mk_power, is_power, etc.)
  • seq_util::rex — Regex operations (mk_star, mk_union, mk_inter, mk_complement, mk_concat, is_star, is_concat, is_nullable, etc.)

Sort System

  • SEQ_SORT — Generic Seq[T]
  • RE_SORT — RegEx[Seq[T]]
  • _STRING_SORT — Seq[Char] (string)
  • _REGLAN_SORT — RegEx[String]

10. Sequence Equality Solver (src/ast/rewriter/seq_eq_solver.h/cpp)

Solver-agnostic equality inference for sequence equations:

  • reduce_unit: Handle unit equations
  • reduce_itos1/2/3: Handle itos(s) = itos(t), itos(s) = "", itos(n) = "d1d2...dk"
  • reduce_binary_eq: Handle x·xs = ys·y patterns
  • reduce_nth_solved: Solve nth equations
  • branch_unit_variable: Split variable as empty or non-empty

11. Test Infrastructure

Test Files

File Tests Lines Component
src/test/euf_sgraph.cpp 18 769 sgraph: classification, metadata, backtracking, navigation, substitution, derivatives, minterms
src/test/euf_seq_plugin.cpp 7 242 seq_plugin: associativity, identity elimination, star merging, nullable absorption, egraph sync
src/test/seq_nielsen.cpp 13 480 Nielsen: dep_tracker, str_eq, str_mem, subst, node, edge, graph population, expansion, backedge

Registration

  • src/test/main.cpp: TST(euf_sgraph) (line 284), TST(euf_seq_plugin) (line 285), TST(seq_nielsen) (line 289)
  • src/test/CMakeLists.txt: All three .cpp files included; depends on smt_seq
  • Build: ninja test-z3, Run: ./test-z3 euf_sgraph, ./test-z3 euf_seq_plugin, ./test-z3 seq_nielsen

Test Setup Pattern

ast_manager m;
reg_decl_plugins(m);
euf::egraph eg(m);
euf::sgraph sg(m, eg);        // Creates seq_plugin automatically
seq_util seq(m);
sort_ref str_sort(seq.str.mk_string_sort(), m);
// For nielsen:
seq::nielsen_graph ng(sg);

Code References

ZIPT Infrastructure (New)

File Description
src/ast/euf/euf_snode.h snode class, snode_kind enum, hash matrix, subst cache
src/ast/euf/euf_sgraph.h sgraph class header
src/ast/euf/euf_sgraph.cpp sgraph implementation (classify, metadata, hash, subst, derivatives, minterms)
src/ast/euf/euf_seq_plugin.h seq_plugin class header
src/ast/euf/euf_seq_plugin.cpp seq_plugin implementation (assoc, simplify, nullable)
src/smt/seq/seq_nielsen.h Nielsen graph classes (str_eq, str_mem, subst, edge, node, graph)
src/smt/seq/seq_nielsen.cpp Nielsen graph implementation
src/smt/seq/CMakeLists.txt smt_seq component build config

Existing String Theory (Production)

File Description
src/smt/theory_seq.h theory_seq class with 20-phase final_check
src/smt/theory_seq.cpp Full theory solver implementation
src/smt/theory_seq_empty.h Empty/stub variant
src/smt/seq_axioms.h/cpp SMT-level axiom bridge
src/smt/seq_regex.h/cpp Regex handling for theory_seq
src/smt/seq_eq_solver.cpp SMT-level equality solving
src/smt/seq_ne_solver.cpp Disequality solving
src/smt/seq_offset_eq.h/cpp Length offset equality
src/smt/smt_setup.h/cpp String solver selection/instantiation

Rewriter Layer (Shared)

File Description
src/ast/rewriter/seq_rewriter.h/cpp Rewrite rules + regex derivatives
src/ast/rewriter/seq_axioms.h/cpp Solver-independent axiom generation
src/ast/rewriter/seq_skolem.h/cpp Skolem function introduction
src/ast/rewriter/seq_eq_solver.h/cpp Solver-agnostic equality inference

Declaration Layer

File Description
src/ast/seq_decl_plugin.h/cpp Sort/operation declarations, seq_util
src/model/seq_factory.h Model value factory
src/params/theory_seq_params.h/cpp Theory parameters
src/params/seq_rewriter_params.pyg Rewriter parameters

Tests

File Description
src/test/euf_sgraph.cpp 18 sgraph unit tests
src/test/euf_seq_plugin.cpp 7 seq_plugin tests
src/test/seq_nielsen.cpp 13 Nielsen graph tests
src/test/main.cpp Test registration (lines 284, 285, 289)
src/test/CMakeLists.txt Test build config (lines 54, 55, 134)

Architecture Documentation

Component Dependency Graph

seq_decl_plugin  (AST sort/op declarations)
    ↓
seq_util  (utility wrapper)
    ↓
seq_rewriter  (simplification rules, derivatives)
    ↓ ↘
seq_axioms (rewriter)    seq_skolem
    ↓                       ↓
seq_axioms (SMT)   ←───────┘
    ↓
┌────────────┬──────────────────────┐
│ theory_seq │  theory_nseq (TODO)  │
│ (existing) │  (per spec/plan1.md) │
├────────────┤──────────────────────┤
│ seq_regex  │  nielsen_graph ←──── sgraph ←── snode
│ seq_eq/ne  │  (new ZIPT infra)    seq_plugin
│ solution_map│                      egraph
└────────────┴──────────────────────┘

ZIPT ↔ Z3 Component Mapping

ZIPT (C#) Z3 (C++) Location
StrManager sgraph + snode src/ast/euf/euf_sgraph.*, euf_snode.h
Str (token tree) snode (binary concat tree) src/ast/euf/euf_snode.h
StrEq seq::str_eq src/smt/seq/seq_nielsen.h:98
StrMem seq::str_mem src/smt/seq/seq_nielsen.h:123
NielsenGraph seq::nielsen_graph src/smt/seq/seq_nielsen.h:274
NielsenNode seq::nielsen_node src/smt/seq/seq_nielsen.h:199
NielsenEdge seq::nielsen_edge src/smt/seq/seq_nielsen.h:166
Subst seq::nielsen_subst src/smt/seq/seq_nielsen.h:147
DependencyTracker seq::dep_tracker src/smt/seq/seq_nielsen.h:81
SimplifyResult seq::simplify_result src/smt/seq/seq_nielsen.h:54
BacktrackReasons seq::backtrack_reason src/smt/seq/seq_nielsen.h:64
PDD / Interval (not yet ported)
Modifier chain (not yet ported)
StringPropagator theory_nseq (TODO)

What Exists vs. What's Planned

Implemented:

  • sgraph + snode (string representation, metadata, hashing, substitution, derivatives, minterms)
  • seq_plugin (associativity, identity, absorption, star merge, nullable absorption)
  • Nielsen graph data structures (str_eq, str_mem, subst, edge, node, graph, dep_tracker)
  • All unit tests passing (18 + 7 + 13 = 38 tests)

Not Yet Implemented (per spec/plan1.md):

  • theory_nseq skeleton and SMT integration
  • Parameter integration (smt.string_solver=nseq)
  • Nielsen transformation algorithms (modifier chain)
  • Graph expansion / iterative deepening search
  • Conflict detection and explanation
  • Parikh image / PDD / Interval reasoning
  • Regex splitting modifiers
  • String propagator (propagation engine)
  • Model generation
  • Integration tests

Historical Context (from spec/)

spec/reference.md

Points to ZIPT repository (parikh branch) and two PDF papers:

  • LazyMemberships.pdf — Lazy regex membership approach
  • ClemensTableau.pdf — Additional tableau algorithms

spec/plan.md

Task description: Implement theory_nseq in src/smt/, add smt.string_solver parameter support, put self-contained utilities in ast/rewriter/.

spec/plan1.md

Detailed implementation plan with 7 phases:

  1. Skeleton & Integration
  2. Core Data Structures (state, constraints, dependencies)
  3. Word Equation Solver (Nielsen transformations)
  4. Length Constraint Integration (Parikh image)
  5. Regex Membership
  6. Propagation Engine
  7. Model Generation

Open Questions

  1. theory_nseq skeleton: When will Phase 1 (skeleton + parameter integration) be implemented?
  2. Modifier chain priority: Should Z3's modifier priorities match ZIPT's exactly, or be adjusted?
  3. PDD porting: Will Z3 reuse existing polynomial infrastructure or port ZIPT's PDD?
  4. Regex derivatives: Will theory_nseq reuse seq_rewriter::mk_derivative or implement ZIPT's approach?
  5. Integer solver integration: Will theory_nseq use arith_value (like theory_seq) or a sub-solver (like ZIPT)?
  6. Backtracking scope: How will Nielsen graph state interact with SMT solver backtracking?
  7. Performance target: What benchmarks should theory_nseq match or exceed compared to theory_seq?