mirror of
https://github.com/Z3Prover/z3
synced 2026-03-07 05:44:51 +00:00
added specs and research
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
This commit is contained in:
parent
7438ae6975
commit
ce3ddd1718
1 changed files with 667 additions and 0 deletions
667
research/docs/2026-03-03-zipt-string-theory-codebase-analysis.md
Normal file
667
research/docs/2026-03-03-zipt-string-theory-codebase-analysis.md
Normal file
|
|
@ -0,0 +1,667 @@
|
|||
---
|
||||
date: 2026-03-03 20:46:00 UTC
|
||||
researcher: Copilot CLI
|
||||
git_commit: fb674ac5b2e722dfe1086c7d952054a0bc11c963
|
||||
branch: c3
|
||||
repository: c3
|
||||
topic: "ZIPT-related string theory components: sgraph, seq_plugin, Nielsen graphs, theory_seq, seq_rewriter, axioms"
|
||||
tags: [research, codebase, zipt, sgraph, seq_plugin, nielsen, theory_seq, seq_rewriter, seq_axioms, seq_decl_plugin, strings, regex]
|
||||
status: complete
|
||||
last_updated: 2026-03-03
|
||||
last_updated_by: Copilot CLI
|
||||
---
|
||||
|
||||
# Research: ZIPT-Related String Theory Components in Z3
|
||||
|
||||
## Research Question
|
||||
|
||||
Examine the Z3 codebase and specifications in the `spec/` directory, focusing on code related to the ZIPT repository (https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT) and the files for sgraph, seq_plugin, Nielsen graphs, theory_seq, and associated files for sequences (seq_rewriter, axioms for seq).
|
||||
|
||||
## Summary
|
||||
|
||||
This document provides a comprehensive analysis of Z3's string/sequence theory infrastructure and its relationship to the ZIPT reference implementation. Z3 has **two layers** of string solving:
|
||||
|
||||
1. **Existing production solver** (`theory_seq`): A mature, axiom-driven solver in `src/smt/` with 20+ final_check phases, regex derivatives, and Skolemization.
|
||||
2. **New ZIPT-inspired infrastructure** (in progress): `sgraph` (string graph), `seq_plugin` (egraph plugin), and `nielsen_graph` (Nielsen transformations) — implementing the ZIPT approach of Nielsen-style word equation solving with lazy regex membership and Parikh image reasoning.
|
||||
|
||||
The spec directory (`spec/plan1.md`) describes the goal: implement `theory_nseq` as a new theory solver using the ZIPT algorithm, selectable via `smt.string_solver=nseq`. No `theory_nseq` files exist yet — only the foundational data structures (sgraph, snode, seq_plugin, nielsen_graph) are implemented and tested.
|
||||
|
||||
---
|
||||
|
||||
## Detailed Findings
|
||||
|
||||
### 1. ZIPT Reference Implementation (C#)
|
||||
|
||||
**Repository**: https://github.com/CEisenhofer/ZIPT (parikh branch)
|
||||
|
||||
The ZIPT solver implements Nielsen-style word equation solving with lazy regex membership and Parikh image analysis. It is a Z3 user-propagator written in C#.
|
||||
|
||||
#### Architecture
|
||||
|
||||
| Component | File(s) | Purpose |
|
||||
|-----------|---------|---------|
|
||||
| `StringPropagator` | `StringPropagator.cs` | Base user-propagator; handles Z3 callbacks (Fixed, Created, Eq, DisEq) |
|
||||
| `NielsenGraph` | `Constraints/NielsenGraph.cs` | Search graph with iterative deepening DFS |
|
||||
| `NielsenNode` | `Constraints/NielsenNode.cs` | Node = set of constraints; supports simplification and extension |
|
||||
| `NielsenEdge` | `Constraints/NielsenEdge.cs` | Edge = substitutions + side constraints |
|
||||
| `StrManager` | `Strings/StrManager.cs` | String canonicalization, concatenation, substitution |
|
||||
| `PDD` | `IntUtils/PDD.cs` | Polynomial Decision Diagrams for integer reasoning |
|
||||
| `Interval` | `IntUtils/Interval.cs` | Interval arithmetic for bounds |
|
||||
|
||||
#### Constraint Types
|
||||
|
||||
- **StrEq**: Word equations `LHS = RHS` (regex-free)
|
||||
- **StrMem**: Regex membership `str ∈ L(regex)` with history tracking
|
||||
- **IntEq**: Polynomial equality `P = 0` (over length variables)
|
||||
- **IntLe**: Polynomial inequality `P ≤ 0`
|
||||
- **IntNonEq**: Polynomial disequality `P ≠ 0`
|
||||
- **Character ranges**: Symbolic char ∈ [a-z]
|
||||
- **Character disequalities**: c₁ ≠ c₂
|
||||
|
||||
#### Modifier Chain (Priority Order)
|
||||
|
||||
1. **DetModifier** — Deterministic simplifications (highest priority)
|
||||
2. **PowerEpsilonModifier** — Handle empty power: `u = "" || n = 0`
|
||||
3. **NumCmpModifier** — Numeric comparisons: `n ≤ m || n > m`
|
||||
4. **ConstNumUnwindingModifier** — Constant power unwinding: `n = 0 || n > 0`
|
||||
5. **EqSplitModifier** — Split equality into prefix/suffix parts
|
||||
6. **StarIntrModifier** — Introduce stabilizer `x ∈ base*`
|
||||
7. **GPowerIntrModifier** — Introduce power `x := u^n prefix(u)`
|
||||
8. **ConstNielsenModifier** — Nielsen split against constant: `x := ax || x := ""`
|
||||
9. **RegexCharSplitModifier** — Character-level regex split via minterms
|
||||
10. **RegexVarSplitModifier** — Variable-level regex split: `x / "" || x / cx`
|
||||
11. **PowerSplitModifier** — Power unwinding
|
||||
12. **VarNielsenModifier** — Variable Nielsen split (lowest priority)
|
||||
13. **VarNumUnwindingModifier** — Variable power unwinding
|
||||
|
||||
#### Key Design Patterns
|
||||
|
||||
- **Lazy regex evaluation**: Only consume characters when concrete
|
||||
- **Canonicalization**: All strings/polynomials are canonicalized
|
||||
- **Parikh constraints**: Length reasoning via PDD polynomials + interval arithmetic
|
||||
- **Iterative deepening**: Depth-bounded search with increasing limits
|
||||
- **Subsumption**: Avoid redundant branches via node subsumption checking
|
||||
- **Modification counts**: Version string variables for Z3 (x → x_0, x_1, ...)
|
||||
|
||||
---
|
||||
|
||||
### 2. sgraph — String Graph (`src/ast/euf/`)
|
||||
|
||||
**Files**: `euf_sgraph.h`, `euf_sgraph.cpp`, `euf_snode.h`
|
||||
|
||||
The sgraph maps Z3 AST expressions to `snode` (string node) structures organized as binary concatenation trees. This is the Z3-native equivalent of ZIPT's `StrManager`.
|
||||
|
||||
#### snode Class (`euf_snode.h`)
|
||||
|
||||
**snode_kind enum** (16 kinds):
|
||||
|
||||
| Kind | AST Op | Description |
|
||||
|------|--------|-------------|
|
||||
| `s_empty` | `OP_SEQ_EMPTY` | Empty string |
|
||||
| `s_char` | `OP_SEQ_UNIT` (literal) | Concrete character |
|
||||
| `s_var` | uninterpreted const | String variable |
|
||||
| `s_unit` | `OP_SEQ_UNIT` (non-literal) | Generic unit |
|
||||
| `s_concat` | `OP_SEQ_CONCAT` | Binary concatenation |
|
||||
| `s_power` | `OP_SEQ_POWER` | String exponentiation s^n |
|
||||
| `s_star` | `OP_RE_STAR` | Kleene star r* |
|
||||
| `s_loop` | `OP_RE_LOOP` | Bounded loop r{lo,hi} |
|
||||
| `s_union` | `OP_RE_UNION` | Union r₁\|r₂ |
|
||||
| `s_intersect` | `OP_RE_INTERSECT` | Intersection r₁&r₂ |
|
||||
| `s_complement` | `OP_RE_COMPLEMENT` | Complement ~r |
|
||||
| `s_fail` | `OP_RE_EMPTY_SET` | Empty language ∅ |
|
||||
| `s_full_char` | `OP_RE_FULL_CHAR_SET` | Full character set . |
|
||||
| `s_full_seq` | `OP_RE_FULL_SEQ_SET` | Full sequence set .* |
|
||||
| `s_to_re` | `OP_SEQ_TO_RE` | String to regex |
|
||||
| `s_in_re` | `OP_SEQ_IN_RE` | Regex membership |
|
||||
| `s_other` | (default) | Other expression |
|
||||
|
||||
**Key Members**:
|
||||
- `expr* m_expr` — Z3 AST expression
|
||||
- `snode_kind m_kind` — Classification
|
||||
- `unsigned m_id` — Unique ID
|
||||
- `bool m_ground` — No uninterpreted variables in subtree
|
||||
- `bool m_regex_free` — No regex constructs in subtree
|
||||
- `bool m_nullable` — Accepts empty string
|
||||
- `unsigned m_level` — Tree depth
|
||||
- `unsigned m_length` — Token count (leaf count)
|
||||
- `unsigned m_hash_matrix[2][2]` — 2×2 polynomial hash matrix for O(1) associative hashing
|
||||
- `snode_subst_cache* m_subst_cache` — ZIPT-style substitution cache
|
||||
- `snode* m_args[0]` — Flexible array of children
|
||||
|
||||
**Hash Matrix** (`HASH_BASE = 31`):
|
||||
- Empty: identity matrix `[[1,0],[0,1]]`
|
||||
- Leaf: `[[31, expr_id+1], [0, 1]]`
|
||||
- Concat: matrix multiplication `M(left) * M(right)`
|
||||
- Hash value: `m_hash_matrix[0][1]` (assoc_hash)
|
||||
|
||||
**Navigation**: `first()`, `last()`, `at(i)`, `collect_tokens()`, `is_token()` (everything except empty and concat)
|
||||
|
||||
#### sgraph Class (`euf_sgraph.h/cpp`)
|
||||
|
||||
**Constructor**: `sgraph(ast_manager& m, egraph& eg, bool add_plugin = true)`
|
||||
- Stores reference to egraph (not owned)
|
||||
- If `add_plugin=true`, creates and registers `seq_plugin`
|
||||
- Registers `on_make` callback with egraph: when any enode is created with seq/re sort, automatically creates corresponding snode
|
||||
|
||||
**Core Operations**:
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `mk(expr* e)` | Create snode from expression (idempotent) |
|
||||
| `find(expr* e)` | Lookup existing snode |
|
||||
| `mk_var(symbol)` | Create variable snode |
|
||||
| `mk_char(unsigned)` | Create character snode |
|
||||
| `mk_empty()` | Create empty snode |
|
||||
| `mk_concat(a, b)` | Create concat (with identity elimination) |
|
||||
| `drop_first/last(n)` | Remove first/last token |
|
||||
| `drop_left/right(n, k)` | Remove k tokens from left/right |
|
||||
| `subst(n, var, repl)` | Substitute variable with replacement (cached) |
|
||||
| `brzozowski_deriv(re, elem)` | Regex derivative via seq_rewriter |
|
||||
| `compute_minterms(re, out)` | Extract character classes (capped at 20 predicates) |
|
||||
| `push()/pop(n)` | Scope management synchronized with egraph |
|
||||
|
||||
**Substitution Caching**: Each snode has lazy-allocated `snode_subst_cache` storing `(var_id, repl_id) → result` mappings, avoiding redundant substitution computations (ZIPT pattern).
|
||||
|
||||
---
|
||||
|
||||
### 3. seq_plugin — Egraph Plugin (`src/ast/euf/`)
|
||||
|
||||
**Files**: `euf_seq_plugin.h`, `euf_seq_plugin.cpp`
|
||||
|
||||
The seq_plugin implements equality saturation for sequence/regex concatenation in the egraph. It handles associativity, identity/absorption rules, and Kleene star simplifications.
|
||||
|
||||
#### Class Hierarchy
|
||||
|
||||
```
|
||||
euf::plugin (abstract base)
|
||||
└── euf::seq_plugin
|
||||
```
|
||||
|
||||
**Constructor**: `seq_plugin(egraph& g, sgraph* sg = nullptr)`
|
||||
- If `sg` is null, creates and owns a new sgraph
|
||||
- Initializes `seq_util`, `seq_rewriter`, hash table with custom functors
|
||||
|
||||
#### Concat Handling
|
||||
|
||||
| Predicate | String | Regex |
|
||||
|-----------|--------|-------|
|
||||
| Concat | `is_str_concat()` / `OP_SEQ_CONCAT` | `is_re_concat()` / `OP_RE_CONCAT` |
|
||||
| Identity | `is_str_empty()` / ε | `is_re_epsilon()` / to_re("") |
|
||||
| Absorbing | (none) | `is_re_empty()` / ∅ |
|
||||
|
||||
**Simplification Rules** (from `propagate_simplify`):
|
||||
|
||||
1. **Kleene star merging**: `concat(v*, v*) = v*` when bodies are congruent (root equality)
|
||||
2. **Extended star merging**: `concat(v*, concat(v*, c)) = concat(v*, c)`
|
||||
3. **Nullable absorption**: `concat(.*, v) = .*` when v is nullable
|
||||
4. **Symmetric nullable absorption**: `concat(v, .*) = .*` when v is nullable
|
||||
|
||||
**Associativity** (from `propagate_assoc`):
|
||||
- Maintains `hashtable<enode*, enode_concat_hash, enode_concat_eq>` that respects associativity
|
||||
- Hash: Uses sgraph's cached snode hash matrices for O(1) hashing, or falls back to leaf-based hashing
|
||||
- Equality: Compares flattened leaf sequences pointwise
|
||||
- When duplicate found, merges via `push_merge`
|
||||
|
||||
**Nullability**: Delegates to `m_rewriter.is_nullable(e)` (from `seq_rewriter`)
|
||||
|
||||
**Key Relationship**: seq_plugin does NOT create snodes. The sgraph's `on_make` callback handles snode creation. seq_plugin only observes egraph events and triggers merges.
|
||||
|
||||
---
|
||||
|
||||
### 4. Nielsen Graph (`src/smt/seq/`)
|
||||
|
||||
**Files**: `seq_nielsen.h`, `seq_nielsen.cpp`, `CMakeLists.txt`
|
||||
|
||||
The Nielsen graph implements word equation solving via Nielsen transformations. This is the Z3-native port of ZIPT's `NielsenGraph`, `NielsenNode`, `NielsenEdge`, and constraint types.
|
||||
|
||||
#### Classes
|
||||
|
||||
##### dep_tracker (`seq_nielsen.h:81-94`)
|
||||
Bitvector-based dependency tracker for conflict explanation.
|
||||
- `merge(other)`: Bitwise OR
|
||||
- `is_superset(other)`: All bits of other present
|
||||
- `is_empty()`: No bits set
|
||||
|
||||
##### str_eq (`seq_nielsen.h:98-119`)
|
||||
Word equation `lhs = rhs` (both regex-free snode trees).
|
||||
- `sort()`: Canonicalize by snode ID
|
||||
- `is_trivial()`: Same snode or both empty
|
||||
- `contains_var(var)`: Check if variable appears in lhs or rhs
|
||||
|
||||
##### str_mem (`seq_nielsen.h:123-143`)
|
||||
Regex membership `str ∈ L(regex)`.
|
||||
- `m_history`: Consumed prefix for cycle detection
|
||||
- `m_id`: Unique identifier
|
||||
- `is_primitive()`: True when str is a single variable
|
||||
|
||||
##### nielsen_subst (`seq_nielsen.h:147-162`)
|
||||
Variable substitution `var → replacement`.
|
||||
- `is_eliminating()`: True when variable doesn't appear in replacement (progress)
|
||||
|
||||
##### nielsen_edge (`seq_nielsen.h:166-195`)
|
||||
Directed edge in the Nielsen graph.
|
||||
- `m_subst`: Vector of substitutions applied
|
||||
- `m_side_str_eq/m_side_str_mem`: Side constraints added
|
||||
- `m_is_progress`: Whether this is an eliminating step
|
||||
|
||||
##### nielsen_node (`seq_nielsen.h:199-270`)
|
||||
Node = set of constraints at a point in the search.
|
||||
- `m_str_eq`: String equality constraints
|
||||
- `m_str_mem`: Regex membership constraints
|
||||
- `m_outgoing`: Outgoing edges
|
||||
- `m_backedge`: For cycle detection
|
||||
- `m_is_general_conflict`, `m_reason`: Conflict tracking
|
||||
- `clone_from(parent)`: Copy constraints
|
||||
- `apply_subst(sg, s)`: Apply substitution to all constraints
|
||||
|
||||
##### nielsen_graph (`seq_nielsen.h:274-325`)
|
||||
The overall graph structure.
|
||||
- `m_sg`: Reference to sgraph infrastructure
|
||||
- `m_root`: Root node with initial constraints
|
||||
- `m_run_idx`, `m_depth_bound`: For iterative deepening
|
||||
- `mk_node()`, `mk_child(parent)`, `mk_edge(src, tgt, progress)`
|
||||
- `add_str_eq(lhs, rhs)`, `add_str_mem(str, regex)`: Populate from external solver
|
||||
- `reset()`, `inc_run_idx()`: Lifecycle management
|
||||
|
||||
#### Enums
|
||||
|
||||
**simplify_result**: `proceed`, `conflict`, `satisfied`, `restart`, `restart_and_satisfied`
|
||||
|
||||
**backtrack_reason**: `unevaluated`, `extended`, `symbol_clash`, `parikh_image`, `subsumption`, `arithmetic`, `regex`, `regex_widening`, `character_range`, `smt`, `children_failed`
|
||||
|
||||
#### Build Configuration
|
||||
|
||||
`src/smt/seq/CMakeLists.txt`: Component `smt_seq`, depends on `euf` and `rewriter`.
|
||||
|
||||
#### Current Status
|
||||
|
||||
The Nielsen graph provides **data structures and infrastructure** — node/edge/constraint/substitution management, dependency tracking, backtracking support, display. The **actual transformation algorithms** (modifier chain, graph expansion, conflict detection, Parikh image analysis) are not yet implemented. Tests cover all data structure operations but not algorithmic behavior.
|
||||
|
||||
---
|
||||
|
||||
### 5. theory_seq — Existing String Theory Solver (`src/smt/`)
|
||||
|
||||
**Files**: `theory_seq.h`, `theory_seq.cpp`, `theory_seq_empty.h`
|
||||
|
||||
The production string solver implementing `theory` interface. Uses axiom-driven solving with 20+ cascading phases.
|
||||
|
||||
#### Class Structure
|
||||
|
||||
```
|
||||
smt::theory (base)
|
||||
seq::eq_solver_context (interface)
|
||||
└── smt::theory_seq
|
||||
```
|
||||
|
||||
**Component Submodules** (declared in `theory_seq.h:356-366`):
|
||||
- `seq::skolem m_sk` — Skolemization
|
||||
- `seq_axioms m_ax` — Axiom generation (bridges rewriter and SMT layers)
|
||||
- `seq::eq_solver m_eq` — Core equality solver (solver-agnostic)
|
||||
- `seq_regex m_regex` — Regular expression handling
|
||||
- `seq_offset_eq m_offset_eq` — Length offset tracking
|
||||
- `th_union_find m_find` — Union-find for theory variables
|
||||
- `dependency_manager m_dm` — Justification tracking
|
||||
- `solution_map m_rep` — Variable → representative mapping
|
||||
|
||||
#### Final Check Cascade (20 phases)
|
||||
|
||||
```
|
||||
1. simplify_and_solve_eqs() — Solve unitary equalities
|
||||
2. check_lts() — Lexicographic ordering
|
||||
3. solve_nqs(0) — Disequalities
|
||||
4. m_regex.propagate() — Regex propagation
|
||||
5. check_contains() — Contains constraints
|
||||
6. check_fixed_length(true,false)— Zero-length sequences
|
||||
7. len_based_split() — Length-based splitting
|
||||
8. check_fixed_length(false,false)— Fixed-length expansion
|
||||
9. check_int_string() — int/string conversions
|
||||
10. check_ubv_string() — bitvector/string conversions
|
||||
11. reduce_length_eq() — Reduce by length equality
|
||||
12. branch_unit_variable() — Branch on unit variables
|
||||
13. branch_binary_variable() — Branch on binary patterns
|
||||
14. branch_variable() — General variable branching
|
||||
15. check_length_coherence() — Length coherence
|
||||
16. check_extensionality() — Extensionality
|
||||
17. branch_nqs() — Branch on disequalities
|
||||
18. branch_itos() — Branch on int-to-string
|
||||
19. check_fixed_length(false,true)— Fixed-length (long strings)
|
||||
20. solve_recfuns() — Recursive functions
|
||||
```
|
||||
|
||||
Returns `FC_CONTINUE` if progress, `FC_GIVEUP` if stuck, `FC_DONE` if solved.
|
||||
|
||||
#### String Solver Selection (`smt_setup.cpp`)
|
||||
|
||||
Parameter: `smt.string_solver`
|
||||
- `"seq"` → `setup_seq()` → `alloc(smt::theory_seq, m_context)`
|
||||
- `"empty"` → `setup_seq()` (with empty variant)
|
||||
- `"none"` → No solver
|
||||
- `"auto"` → `setup_seq()`
|
||||
- `"nseq"` → **Not yet implemented** (planned per spec)
|
||||
|
||||
#### Key Data Structures
|
||||
|
||||
- `solution_map m_rep` — Maps variables to representatives (with normalization cache)
|
||||
- `scoped_vector<depeq> m_eqs` — Active equations
|
||||
- `scoped_vector<ne> m_nqs` — Disequalities
|
||||
- `scoped_vector<nc> m_ncs` — Negated contains
|
||||
- `exclusion_table m_exclude` — Asserted disequalities
|
||||
- `expr_ref_vector m_axioms` — Axiom queue (with deduplication)
|
||||
|
||||
---
|
||||
|
||||
### 6. seq_rewriter — Sequence Rewriter (`src/ast/rewriter/`)
|
||||
|
||||
**Files**: `seq_rewriter.h`, `seq_rewriter.cpp`
|
||||
|
||||
Provides local syntactic simplification rules for all string/sequence operations. Entry point: `mk_app_core(func_decl* f, ...)`.
|
||||
|
||||
#### Rewrite Categories
|
||||
|
||||
| Category | Key Methods | Description |
|
||||
|----------|-------------|-------------|
|
||||
| Construction | `mk_seq_unit`, `mk_seq_concat` | Coalesce chars, right-associate, eliminate empty |
|
||||
| Extraction | `mk_seq_extract`, `mk_seq_at`, `mk_seq_nth` | Substring, index access |
|
||||
| Length | `mk_seq_length` | Compute length, simplify for concat/replace/extract |
|
||||
| Search | `mk_seq_contains`, `mk_seq_index`, `mk_seq_last_index` | Containment, index-of |
|
||||
| Replacement | `mk_seq_replace`, `mk_seq_replace_all`, `mk_seq_replace_re*` | String replacement |
|
||||
| Comparison | `mk_str_le`, `mk_str_lt`, `mk_seq_prefix`, `mk_seq_suffix` | Lexicographic, prefix/suffix |
|
||||
| Conversion | `mk_str_itos`, `mk_str_stoi`, `mk_str_from_code`, `mk_str_to_code` | Type conversions |
|
||||
| Higher-order | `mk_seq_map`, `mk_seq_mapi`, `mk_seq_foldl`, `mk_seq_foldli` | Map, fold |
|
||||
| Regex | `mk_re_concat`, `mk_re_union`, `mk_re_inter`, `mk_re_star`, etc. | Regex construction |
|
||||
| Derivative | `mk_re_derivative` → `mk_antimirov_deriv` | Antimirov-style regex derivative |
|
||||
| Nullability | `is_nullable` → `is_nullable_rec` | Nullable check for regex |
|
||||
|
||||
#### Regex Derivative System
|
||||
|
||||
Uses **Antimirov derivatives** with a BDD-like normal form:
|
||||
|
||||
1. **Top level**: Union of Antimirov derivatives using `_OP_RE_ANTIMIROV_UNION`
|
||||
2. **Middle level**: Nested if-then-else sorted by condition ID
|
||||
3. **Leaf level**: Regular regex unions
|
||||
|
||||
Key derivative rules:
|
||||
- `D(e, r₁·r₂) = D(e,r₁)·r₂ ∪ (nullable(r₁) ? D(e,r₂) : ∅)`
|
||||
- `D(e, r₁∪r₂) = D(e,r₁) ∪ D(e,r₂)`
|
||||
- `D(e, r*) = D(e,r)·r*`
|
||||
- `D(e, ¬r) = ¬D(e,r)`
|
||||
- `D(e, [lo-hi]) = ε if lo ≤ e ≤ hi else ∅`
|
||||
|
||||
Validated by `check_deriv_normal_form` in debug mode.
|
||||
|
||||
---
|
||||
|
||||
### 7. Axiom System (`src/ast/rewriter/` + `src/smt/`)
|
||||
|
||||
Two layers: **rewriter-level** (solver-independent) and **SMT-level** (bridges to Z3 context).
|
||||
|
||||
#### Rewriter-Level Axioms (`seq::axioms`, `src/ast/rewriter/seq_axioms.h/cpp`)
|
||||
|
||||
Solver-agnostic axiom generation via callbacks. Key axioms:
|
||||
|
||||
| Operation | Axiom Summary |
|
||||
|-----------|---------------|
|
||||
| `extract(s,i,l)` | `0≤i≤\|s\| ∧ 0≤l → s=x·e·y ∧ \|x\|=i ∧ \|e\|=min(l,\|s\|-i)` |
|
||||
| `indexof(t,s,off)` | `contains(t,s) → t=x·s·y ∧ indexof=\|x\|` + tightest prefix |
|
||||
| `replace(u,s,t)` | `contains(u,s) → u=x·s·y ∧ r=x·t·y` |
|
||||
| `at(s,i)` | `0≤i<\|s\| → s=x·e·y ∧ \|x\|=i ∧ \|e\|=1` |
|
||||
| `itos(n)` | `n≥0 → stoi(itos(n))=n ∧ no leading zeros` |
|
||||
| `stoi(s)` | `stoi(s)≥-1 ∧ stoi("")=-1` |
|
||||
|
||||
Introduced via Skolem functions (see below).
|
||||
|
||||
#### SMT-Level Axioms (`smt::seq_axioms`, `src/smt/seq_axioms.h/cpp`)
|
||||
|
||||
Wraps `seq::axioms` with literal creation, clause addition, and phase forcing for Z3's SMT context.
|
||||
|
||||
---
|
||||
|
||||
### 8. Skolemization (`src/ast/rewriter/seq_skolem.h/cpp`)
|
||||
|
||||
`seq::skolem` introduces auxiliary Skolem functions for structural reasoning:
|
||||
|
||||
| Skolem | Symbol | Purpose |
|
||||
|--------|--------|---------|
|
||||
| `seq.pre` | `m_pre` | Prefix of s up to index i |
|
||||
| `seq.post` | `m_post` | Suffix of s from index i |
|
||||
| `seq.tail` | `m_tail` | Rest after first element |
|
||||
| `seq.first` | `m_seq_first` | First element |
|
||||
| `seq.last` | `m_seq_last` | Last element |
|
||||
| `seq.idx.l/r` | `m_indexof_left/right` | Decomposition for indexof: t = left·s·right |
|
||||
| `seq.lidx.l/r` | `m_lindexof_left/right` | Decomposition for last_indexof |
|
||||
| `seq.cnt.l/r` | — | Decomposition for contains |
|
||||
| `aut.accept` | `m_accept` | Regex automaton acceptance |
|
||||
| `aut.step` | `m_aut_step` | Automaton state transition |
|
||||
| `seq.eq` | `m_eq` | Special equality predicate |
|
||||
| `seq.length_limit` | `m_length_limit` | Length bound |
|
||||
|
||||
---
|
||||
|
||||
### 9. seq_decl_plugin — Declaration Plugin (`src/ast/`)
|
||||
|
||||
**Files**: `seq_decl_plugin.h`, `seq_decl_plugin.cpp`
|
||||
|
||||
Provides the theory declarations for sequences, strings, and regular expressions.
|
||||
|
||||
#### Operation Kinds (`seq_op_kind` enum)
|
||||
|
||||
**Sequence**: `OP_SEQ_UNIT`, `OP_SEQ_EMPTY`, `OP_SEQ_CONCAT`, `OP_SEQ_PREFIX`, `OP_SEQ_SUFFIX`, `OP_SEQ_CONTAINS`, `OP_SEQ_EXTRACT`, `OP_SEQ_REPLACE`, `OP_SEQ_AT`, `OP_SEQ_NTH`, `OP_SEQ_NTH_I`, `OP_SEQ_NTH_U`, `OP_SEQ_LENGTH`, `OP_SEQ_INDEX`, `OP_SEQ_LAST_INDEX`, `OP_SEQ_TO_RE`, `OP_SEQ_IN_RE`, `OP_SEQ_REPLACE_RE_ALL`, `OP_SEQ_REPLACE_RE`, `OP_SEQ_REPLACE_ALL`, `OP_SEQ_MAP`, `OP_SEQ_MAPI`, `OP_SEQ_FOLDL`, `OP_SEQ_FOLDLI`, **`OP_SEQ_POWER`**
|
||||
|
||||
**Regex**: `OP_RE_PLUS`, `OP_RE_STAR`, `OP_RE_OPTION`, `OP_RE_RANGE`, `OP_RE_CONCAT`, `OP_RE_UNION`, `OP_RE_DIFF`, `OP_RE_INTERSECT`, `OP_RE_LOOP`, `OP_RE_POWER`, `OP_RE_COMPLEMENT`, `OP_RE_EMPTY_SET`, `OP_RE_FULL_SEQ_SET`, `OP_RE_FULL_CHAR_SET`, `OP_RE_OF_PRED`, `OP_RE_REVERSE`, `OP_RE_DERIVATIVE`
|
||||
|
||||
**String-specific**: `OP_STRING_CONST`, `OP_STRING_ITOS`, `OP_STRING_STOI`, `OP_STRING_UBVTOS`, `OP_STRING_SBVTOS`, `OP_STRING_LT`, `OP_STRING_LE`, `OP_STRING_IS_DIGIT`, `OP_STRING_TO_CODE`, `OP_STRING_FROM_CODE`
|
||||
|
||||
#### seq_util Utility Class
|
||||
|
||||
Nested classes for operation construction/recognition:
|
||||
- `seq_util::str` — String/sequence operations (`mk_concat`, `mk_length`, `mk_empty`, `is_concat`, `is_empty`, `mk_power`, `is_power`, etc.)
|
||||
- `seq_util::rex` — Regex operations (`mk_star`, `mk_union`, `mk_inter`, `mk_complement`, `mk_concat`, `is_star`, `is_concat`, `is_nullable`, etc.)
|
||||
|
||||
#### Sort System
|
||||
|
||||
- `SEQ_SORT` — Generic Seq[T]
|
||||
- `RE_SORT` — RegEx[Seq[T]]
|
||||
- `_STRING_SORT` — Seq[Char] (string)
|
||||
- `_REGLAN_SORT` — RegEx[String]
|
||||
|
||||
---
|
||||
|
||||
### 10. Sequence Equality Solver (`src/ast/rewriter/seq_eq_solver.h/cpp`)
|
||||
|
||||
Solver-agnostic equality inference for sequence equations:
|
||||
|
||||
- `reduce_unit`: Handle unit equations
|
||||
- `reduce_itos1/2/3`: Handle `itos(s) = itos(t)`, `itos(s) = ""`, `itos(n) = "d1d2...dk"`
|
||||
- `reduce_binary_eq`: Handle `x·xs = ys·y` patterns
|
||||
- `reduce_nth_solved`: Solve nth equations
|
||||
- `branch_unit_variable`: Split variable as empty or non-empty
|
||||
|
||||
---
|
||||
|
||||
### 11. Test Infrastructure
|
||||
|
||||
#### Test Files
|
||||
|
||||
| File | Tests | Lines | Component |
|
||||
|------|-------|-------|-----------|
|
||||
| `src/test/euf_sgraph.cpp` | 18 | 769 | sgraph: classification, metadata, backtracking, navigation, substitution, derivatives, minterms |
|
||||
| `src/test/euf_seq_plugin.cpp` | 7 | 242 | seq_plugin: associativity, identity elimination, star merging, nullable absorption, egraph sync |
|
||||
| `src/test/seq_nielsen.cpp` | 13 | 480 | Nielsen: dep_tracker, str_eq, str_mem, subst, node, edge, graph population, expansion, backedge |
|
||||
|
||||
#### Registration
|
||||
|
||||
- `src/test/main.cpp`: `TST(euf_sgraph)` (line 284), `TST(euf_seq_plugin)` (line 285), `TST(seq_nielsen)` (line 289)
|
||||
- `src/test/CMakeLists.txt`: All three .cpp files included; depends on `smt_seq`
|
||||
- Build: `ninja test-z3`, Run: `./test-z3 euf_sgraph`, `./test-z3 euf_seq_plugin`, `./test-z3 seq_nielsen`
|
||||
|
||||
#### Test Setup Pattern
|
||||
|
||||
```cpp
|
||||
ast_manager m;
|
||||
reg_decl_plugins(m);
|
||||
euf::egraph eg(m);
|
||||
euf::sgraph sg(m, eg); // Creates seq_plugin automatically
|
||||
seq_util seq(m);
|
||||
sort_ref str_sort(seq.str.mk_string_sort(), m);
|
||||
// For nielsen:
|
||||
seq::nielsen_graph ng(sg);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Code References
|
||||
|
||||
### ZIPT Infrastructure (New)
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `src/ast/euf/euf_snode.h` | snode class, snode_kind enum, hash matrix, subst cache |
|
||||
| `src/ast/euf/euf_sgraph.h` | sgraph class header |
|
||||
| `src/ast/euf/euf_sgraph.cpp` | sgraph implementation (classify, metadata, hash, subst, derivatives, minterms) |
|
||||
| `src/ast/euf/euf_seq_plugin.h` | seq_plugin class header |
|
||||
| `src/ast/euf/euf_seq_plugin.cpp` | seq_plugin implementation (assoc, simplify, nullable) |
|
||||
| `src/smt/seq/seq_nielsen.h` | Nielsen graph classes (str_eq, str_mem, subst, edge, node, graph) |
|
||||
| `src/smt/seq/seq_nielsen.cpp` | Nielsen graph implementation |
|
||||
| `src/smt/seq/CMakeLists.txt` | smt_seq component build config |
|
||||
|
||||
### Existing String Theory (Production)
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `src/smt/theory_seq.h` | theory_seq class with 20-phase final_check |
|
||||
| `src/smt/theory_seq.cpp` | Full theory solver implementation |
|
||||
| `src/smt/theory_seq_empty.h` | Empty/stub variant |
|
||||
| `src/smt/seq_axioms.h/cpp` | SMT-level axiom bridge |
|
||||
| `src/smt/seq_regex.h/cpp` | Regex handling for theory_seq |
|
||||
| `src/smt/seq_eq_solver.cpp` | SMT-level equality solving |
|
||||
| `src/smt/seq_ne_solver.cpp` | Disequality solving |
|
||||
| `src/smt/seq_offset_eq.h/cpp` | Length offset equality |
|
||||
| `src/smt/smt_setup.h/cpp` | String solver selection/instantiation |
|
||||
|
||||
### Rewriter Layer (Shared)
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `src/ast/rewriter/seq_rewriter.h/cpp` | Rewrite rules + regex derivatives |
|
||||
| `src/ast/rewriter/seq_axioms.h/cpp` | Solver-independent axiom generation |
|
||||
| `src/ast/rewriter/seq_skolem.h/cpp` | Skolem function introduction |
|
||||
| `src/ast/rewriter/seq_eq_solver.h/cpp` | Solver-agnostic equality inference |
|
||||
|
||||
### Declaration Layer
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `src/ast/seq_decl_plugin.h/cpp` | Sort/operation declarations, seq_util |
|
||||
| `src/model/seq_factory.h` | Model value factory |
|
||||
| `src/params/theory_seq_params.h/cpp` | Theory parameters |
|
||||
| `src/params/seq_rewriter_params.pyg` | Rewriter parameters |
|
||||
|
||||
### Tests
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `src/test/euf_sgraph.cpp` | 18 sgraph unit tests |
|
||||
| `src/test/euf_seq_plugin.cpp` | 7 seq_plugin tests |
|
||||
| `src/test/seq_nielsen.cpp` | 13 Nielsen graph tests |
|
||||
| `src/test/main.cpp` | Test registration (lines 284, 285, 289) |
|
||||
| `src/test/CMakeLists.txt` | Test build config (lines 54, 55, 134) |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Documentation
|
||||
|
||||
### Component Dependency Graph
|
||||
|
||||
```
|
||||
seq_decl_plugin (AST sort/op declarations)
|
||||
↓
|
||||
seq_util (utility wrapper)
|
||||
↓
|
||||
seq_rewriter (simplification rules, derivatives)
|
||||
↓ ↘
|
||||
seq_axioms (rewriter) seq_skolem
|
||||
↓ ↓
|
||||
seq_axioms (SMT) ←───────┘
|
||||
↓
|
||||
┌────────────┬──────────────────────┐
|
||||
│ theory_seq │ theory_nseq (TODO) │
|
||||
│ (existing) │ (per spec/plan1.md) │
|
||||
├────────────┤──────────────────────┤
|
||||
│ seq_regex │ nielsen_graph ←──── sgraph ←── snode
|
||||
│ seq_eq/ne │ (new ZIPT infra) seq_plugin
|
||||
│ solution_map│ egraph
|
||||
└────────────┴──────────────────────┘
|
||||
```
|
||||
|
||||
### ZIPT ↔ Z3 Component Mapping
|
||||
|
||||
| ZIPT (C#) | Z3 (C++) | Location |
|
||||
|------------|----------|----------|
|
||||
| `StrManager` | `sgraph` + `snode` | `src/ast/euf/euf_sgraph.*`, `euf_snode.h` |
|
||||
| `Str` (token tree) | `snode` (binary concat tree) | `src/ast/euf/euf_snode.h` |
|
||||
| `StrEq` | `seq::str_eq` | `src/smt/seq/seq_nielsen.h:98` |
|
||||
| `StrMem` | `seq::str_mem` | `src/smt/seq/seq_nielsen.h:123` |
|
||||
| `NielsenGraph` | `seq::nielsen_graph` | `src/smt/seq/seq_nielsen.h:274` |
|
||||
| `NielsenNode` | `seq::nielsen_node` | `src/smt/seq/seq_nielsen.h:199` |
|
||||
| `NielsenEdge` | `seq::nielsen_edge` | `src/smt/seq/seq_nielsen.h:166` |
|
||||
| `Subst` | `seq::nielsen_subst` | `src/smt/seq/seq_nielsen.h:147` |
|
||||
| `DependencyTracker` | `seq::dep_tracker` | `src/smt/seq/seq_nielsen.h:81` |
|
||||
| `SimplifyResult` | `seq::simplify_result` | `src/smt/seq/seq_nielsen.h:54` |
|
||||
| `BacktrackReasons` | `seq::backtrack_reason` | `src/smt/seq/seq_nielsen.h:64` |
|
||||
| `PDD` / `Interval` | (not yet ported) | — |
|
||||
| Modifier chain | (not yet ported) | — |
|
||||
| `StringPropagator` | `theory_nseq` (TODO) | — |
|
||||
|
||||
### What Exists vs. What's Planned
|
||||
|
||||
**Implemented**:
|
||||
- ✅ sgraph + snode (string representation, metadata, hashing, substitution, derivatives, minterms)
|
||||
- ✅ seq_plugin (associativity, identity, absorption, star merge, nullable absorption)
|
||||
- ✅ Nielsen graph data structures (str_eq, str_mem, subst, edge, node, graph, dep_tracker)
|
||||
- ✅ All unit tests passing (18 + 7 + 13 = 38 tests)
|
||||
|
||||
**Not Yet Implemented** (per spec/plan1.md):
|
||||
- ❌ `theory_nseq` skeleton and SMT integration
|
||||
- ❌ Parameter integration (`smt.string_solver=nseq`)
|
||||
- ❌ Nielsen transformation algorithms (modifier chain)
|
||||
- ❌ Graph expansion / iterative deepening search
|
||||
- ❌ Conflict detection and explanation
|
||||
- ❌ Parikh image / PDD / Interval reasoning
|
||||
- ❌ Regex splitting modifiers
|
||||
- ❌ String propagator (propagation engine)
|
||||
- ❌ Model generation
|
||||
- ❌ Integration tests
|
||||
|
||||
---
|
||||
|
||||
## Historical Context (from spec/)
|
||||
|
||||
### spec/reference.md
|
||||
Points to ZIPT repository (parikh branch) and two PDF papers:
|
||||
- `LazyMemberships.pdf` — Lazy regex membership approach
|
||||
- `ClemensTableau.pdf` — Additional tableau algorithms
|
||||
|
||||
### spec/plan.md
|
||||
Task description: Implement `theory_nseq` in `src/smt/`, add `smt.string_solver` parameter support, put self-contained utilities in `ast/rewriter/`.
|
||||
|
||||
### spec/plan1.md
|
||||
Detailed implementation plan with 7 phases:
|
||||
1. Skeleton & Integration
|
||||
2. Core Data Structures (state, constraints, dependencies)
|
||||
3. Word Equation Solver (Nielsen transformations)
|
||||
4. Length Constraint Integration (Parikh image)
|
||||
5. Regex Membership
|
||||
6. Propagation Engine
|
||||
7. Model Generation
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **theory_nseq skeleton**: When will Phase 1 (skeleton + parameter integration) be implemented?
|
||||
2. **Modifier chain priority**: Should Z3's modifier priorities match ZIPT's exactly, or be adjusted?
|
||||
3. **PDD porting**: Will Z3 reuse existing polynomial infrastructure or port ZIPT's PDD?
|
||||
4. **Regex derivatives**: Will `theory_nseq` reuse `seq_rewriter::mk_derivative` or implement ZIPT's approach?
|
||||
5. **Integer solver integration**: Will `theory_nseq` use `arith_value` (like `theory_seq`) or a sub-solver (like ZIPT)?
|
||||
6. **Backtracking scope**: How will Nielsen graph state interact with SMT solver backtracking?
|
||||
7. **Performance target**: What benchmarks should `theory_nseq` match or exceed compared to `theory_seq`?
|
||||
Loading…
Add table
Add a link
Reference in a new issue