3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-04-24 04:43:31 +00:00
z3/src/smt/seq/seq_nielsen.h
copilot-swe-agent[bot] 0bdec633d7 Implement ZIPT string solver skeleton (theory_nseq)
Add theory_nseq, a Nielsen-graph-based string solver plugin for Z3.

## New files
- src/smt/nseq_state.h/.cpp: constraint store bridging SMT context to
  Nielsen graph with manual push/pop backtracking
- src/smt/nseq_regex.h/.cpp: regex membership handling via Brzozowski
  derivatives (stub delegates to sgraph::brzozowski_deriv)
- src/smt/nseq_model.h/.cpp: model generation stub
- src/smt/theory_nseq.h/.cpp: main theory class implementing smt::theory
  with its own private egraph/sgraph, returns FC_GIVEUP as skeleton
- src/test/nseq_basic.cpp: unit tests covering instantiation, parameter
  validation, trivial-equality SAT, and node simplification

## Extensions to seq_nielsen.h/.cpp
- Add search_result enum and solve() iterative-deepening DFS entry point
- Add search_dfs() recursive DFS driver
- Add simplify_node(), generate_extensions(), collect_conflict_deps()
- Add nielsen_node::simplify_and_init(): trivial removal, empty
  propagation, prefix matching, symbol clash detection
- Add nielsen_node::is_satisfied(), is_subsumed_by()
- Implement Det, Const Nielsen, and Eq-split modifiers in
  generate_extensions()

## Integration
- smt_params.cpp: accept 'nseq' as valid string_solver value
- smt_params_helper.pyg: document 'nseq' option
- smt_setup.h/.cpp: add setup_nseq(), wire into setup_QF_S() and
  setup_seq_str()
- smt/CMakeLists.txt: add new sources and smt_seq dependency

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-03 21:50:21 +00:00

548 lines
23 KiB
C++

/*++
Copyright (c) 2026 Microsoft Corporation
Module Name:
seq_nielsen.h
Abstract:
Nielsen graph for string constraint solving.
Ports the constraint types and Nielsen graph structures from
ZIPT (https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT/Constraints)
into Z3's smt/seq framework.
The Nielsen graph is used for solving word equations and regex
membership constraints via Nielsen transformations. Each node
contains a set of constraints (string equalities, regex memberships,
integer equalities/inequalities) and edges represent substitutions
that transform one constraint set into another.
Key components:
-- str_eq: string equality constraint (lhs = rhs)
-- str_mem: regex membership constraint (str in regex)
-- nielsen_subst: variable substitution (var -> replacement)
-- nielsen_edge: graph edge with substitutions and side constraints
-- nielsen_node: graph node with constraint set and outgoing edges
-- nielsen_graph: the overall Nielsen transformation graph
-----------------------------------------------------------------------
ZIPT PORT COMPARISON SUMMARY
-----------------------------------------------------------------------
The ZIPT reference is organized as follows (all under ZIPT/Constraints/):
NielsenGraph.cs -- the graph manager class
NielsenNode.cs -- node class + BacktrackReasons enum
NielsenEdge.cs -- edge class with string and character substitutions
ConstraintElement/
Constraint.cs -- abstract base for all constraints
StrEqBase.cs -- abstract base for StrEq and StrMem
StrEq.cs -- string equality with full simplification/splitting
StrMem.cs -- regex membership with Brzozowski derivatives
IntEq.cs -- integer equality over length polynomials
IntLe.cs -- integer inequality over length polynomials
Modifier/ -- ~15 modifier types driving graph expansion
A. PORTED FAITHFULLY
--------------------
1. backtrack_reason enum (BacktrackReasons): all eleven values (Unevaluated,
Extended, SymbolClash, ParikhImage, Subsumption, Arithmetic, Regex,
RegexWidening, CharacterRange, SMT, ChildrenFailed) are present with
identical semantics.
2. simplify_result enum (SimplifyResult): all five values (Proceed, Conflict,
Satisfied, Restart, RestartAndSatisfied) are present with identical semantics.
Note: RestartAndSatisfied is declared but not yet exercised in this port.
3. nielsen_node status fields and accessors: m_is_general_conflict,
m_is_extended, m_reason, m_eval_idx map directly to IsGeneralConflict,
IsExtended, CurrentReason, evalIdx. The is_currently_conflict() predicate
faithfully mirrors IsCurrentlyConflict (GeneralConflict || (reason !=
Unevaluated && IsExtended)).
4. nielsen_node::reset_counter() mirrors NielsenNode.ResetCounter() exactly.
5. nielsen_node::clone_from() mirrors the copy constructor
NielsenNode(graph, parent) for str_eq and str_mem constraints.
6. nielsen_edge identity (operator==) mirrors NielsenEdge.Equals(): both
compare by source and target node pointer identity.
7. nielsen_graph::inc_run_idx() mirrors the RunIdx increment in NielsenGraph.
Check(), including the UINT_MAX overflow guard that calls reset_counter()
on all nodes.
8. str_eq::sort() mirrors StrEqBase.SortStr(): swaps lhs/rhs when lhs > rhs.
(Z3 compares by snode id; ZIPT compares Str lexicographically.)
9. str_eq::is_trivial() mirrors the trivially-satisfied check when both sides
are empty.
10. str_mem fields (m_str, m_regex, m_history, m_id, m_dep) mirror StrMem
fields (Str, Regex, History, Id, Reason) faithfully, including the unique
identifier used for cycle tracking.
11. str_mem::is_primitive() mirrors StrMem.IsPrimitiveRegex(): single variable
on the left side of the membership constraint.
12. nielsen_subst::is_eliminating() mirrors the logic behind
NielsenEdge.BumpedModCount: a substitution is non-eliminating (bumps the
modification counter) when the substituted variable appears in the
replacement.
13. nielsen_graph::mk_edge() faithfully mirrors NielsenEdge construction: it
links src to tgt and registers the outgoing edge.
B. PORTED WITH ALGORITHMIC CHANGES
------------------------------------
1. dep_tracker (DependencyTracker): ZIPT's DependencyTracker is a .NET
class using a BitArray-like structure for tracking constraint origins.
Z3's dep_tracker uses a dense bitvector stored as svector<unsigned>
(32-bit words). The merge/is_superset/empty semantics are equivalent,
but the representation is more cache-friendly and avoids managed-heap
allocation.
2. Substitution application (nielsen_node::apply_subst): ZIPT uses an
immutable, functional style -- Apply() returns a new constraint if
changed, using C# reference equality to detect no-ops. Z3 uses
in-place mutation via sgraph::subst(), modifying the constraint vectors
directly. The functional change also propagates the substitution's
dependency to the merged constraint.
3. Node constraint containers: ZIPT's NielsenNode stores str_eq constraints
in NList<StrEq> (a sorted list for O(log n) subsumption lookup) and str_mem
constraints in Dictionary<uint, StrMem> (keyed by id for O(1) cycle lookup).
Z3 uses plain vector<str_eq> and vector<str_mem>, which is simpler but
loses the sorted-list subsumption candidate structure.
4. nielsen_edge substitution list: ZIPT's NielsenEdge carries two substitution
lists -- Subst (string-level, mapping string variables to strings) and
SubstC (character-level, mapping symbolic character variables to concrete
characters). Z3's nielsen_edge carries a single vector<nielsen_subst>,
covering only string-level substitutions; character substitutions are not
represented.
5. nielsen_graph node registry: ZIPT keeps nodes in a HashSet<NielsenNode> plus
a Dictionary<NList<StrEq>, List<NielsenNode>> for subsumption candidate
lookup. Z3 uses a ptr_vector<nielsen_node> without any subsumption map,
simplifying memory management at the cost of subsumption checking.
6. nielsen_graph::display() vs NielsenGraph.ToDot(): ZIPT outputs a DOT-format
graph with color highlighting for the current satisfying path. Z3 outputs
plain human-readable text with node/edge details but no DOT syntax or path
highlighting.
7. str_eq::contains_var() / str_mem::contains_var(): ZIPT performs occurrence
checks through StrManager.Subst() (which uses hash-consing and reference
equality). Z3 walks the snode tree via collect_tokens(), which is correct
but re-traverses the DAG on every call.
C. NOT PORTED
-------------
The following ZIPT components are absent from this implementation.
They represent the algorithmic core of the search procedure and
are expected to be ported in subsequent work.
Constraint simplification and propagation:
- Constraint.SimplifyAndPropagate() / SimplifyAndPropagateInternal(): the
main constraint-driven simplification loop is not ported. str_eq and
str_mem have no Simplify methods.
- StrEq.SimplifyDir() / SimplifyFinal() / AddDefinition(): forward/backward
simplification passes, including Makanin-style prefix cancellation, power
token handling, and variable definition propagation.
- StrEq.GetNielsenDep() / SplitEq(): the Nielsen dependency analysis and
equation-splitting heuristic used to choose the best split point.
- StrMem.SimplifyCharRegex() / SimplifyDir(): Brzozowski derivative-based
simplification consuming ground prefixes/suffixes of the string.
- StrMem.TrySubsume(): stabilizer-based subsumption that drops leading
variables already covered by regex stabilizers.
- StrMem.ExtractCycle() / StabilizerFromCycle(): cycle detection over the
search path and extraction of a Kleene-star stabilizer to generalize the
cycle. This is the key termination argument for regex membership.
- StrMem.Extend(): the splitting driver that produces the next modifier
(RegexVarSplitModifier, RegexCharSplitModifier, StarIntrModifier, etc.).
Integer constraints:
- IntEq / IntLe: integer equality and inequality constraints over Presburger
arithmetic polynomials (PDD<BigInteger>) are entirely absent. The Z3 port
has no ConstraintsIntEq or ConstraintsIntLe in nielsen_node.
- IntBounds / VarBoundWatcher: per-variable integer interval bounds and the
watcher mechanism that reruns bound propagation when a string variable is
substituted are not ported.
- AddLowerIntBound() / AddHigherIntBound(): incremental interval tightening
with restart signaling is not ported.
Character-level handling:
- CharSubst: character-level variable substitution (symbolic char -> concrete
char) is absent. ZIPT uses this to handle symbolic character tokens
(SymCharToken) that represent a single unknown character.
- SymCharToken / CharacterSet: symbolic character tokens with associated
character range constraints (CharRanges) are not ported.
- DisEqualities: per-node character disequality constraints used for conflict
detection during character substitution are not ported.
Modifier hierarchy (Constraints/Modifier/):
- All ~15 Modifier subclasses driving graph expansion are not ported:
VarNielsenModifier, ConstNielsenModifier, DirectedNielsenModifier,
EqSplitModifier, RegexVarSplitModifier, RegexCharSplitModifier,
StarIntrModifier, PowerSplitModifier, GPowerIntrModifier,
NumCmpModifier, NumUnwindingModifier, PowerEpsilonModifier,
DecomposeModifier, CombinedModifier, DetModifier.
- The modifier pattern (each Modifier produces one or more child nodes by
applying substitutions + side conditions to the parent node) is not ported.
Search procedure:
- NielsenNode.GraphExpansion(): the recursive search with iterative deepening
(depth-bounded DFS with SAT/UNSAT/CYCLIC return codes) is not ported.
- NielsenNode.SimplifyAndInit(): the simplification-and-initialization pass
run at node creation is not ported.
- NielsenGraph.Check(): the top-level entry point with iterative deepening,
inner solver setup and subsumption-node lookup is not ported.
- NielsenGraph.FindExisting(): the subsumption cache lookup over
subsumptionCandidates is not ported.
Auxiliary infrastructure:
- LocalInfo: thread-local search bookkeeping (current path, modification
counts, regex occurrence cache for cycle detection, current node pointer)
is not ported.
- NielsenGraph.SubSolver / InnerStringPropagator: the auxiliary Z3 solver
for arithmetic lemma generation and the inner string propagator for
model-based refinement are not ported.
- PowerToken: word-repetition tokens of the form u^n (distinct from regex
Kleene star) are not represented in Z3's snode.
- GetSignature(): the constraint-pair signature used for subsumption
candidate matching is not ported.
- Constraint.Shared: the flag indicating whether a constraint should be
forwarded to the outer solver is not ported.
- Interpretation: the model-extraction class mapping string and integer
variables to concrete values is not ported.
-----------------------------------------------------------------------
Author:
Nikolaj Bjorner (nbjorner) 2026-03-02
Clemens Eisenhofer 2026-03-02
--*/
#pragma once
#include "util/vector.h"
#include "util/uint_set.h"
#include "ast/ast.h"
#include "ast/seq_decl_plugin.h"
#include "ast/euf/euf_sgraph.h"
namespace seq {
// forward declarations
class nielsen_node;
class nielsen_edge;
class nielsen_graph;
// simplification result for constraint processing
// mirrors ZIPT's SimplifyResult enum
enum class simplify_result {
proceed, // no change, continue
conflict, // constraint is unsatisfiable
satisfied, // constraint is trivially satisfied
restart, // constraint was simplified, restart
restart_and_satisfied, // simplified and satisfied
};
// reason for backtracking in the Nielsen graph
// mirrors ZIPT's BacktrackReasons enum
enum class backtrack_reason {
unevaluated,
extended,
symbol_clash,
parikh_image,
subsumption,
arithmetic,
regex,
regex_widening,
character_range,
smt,
children_failed,
};
// dependency tracker: bitvector tracking which input constraints
// contributed to deriving a given constraint
// mirrors ZIPT's DependencyTracker
class dep_tracker {
svector<unsigned> m_bits;
public:
dep_tracker() = default;
explicit dep_tracker(unsigned num_bits);
dep_tracker(unsigned num_bits, unsigned set_bit);
void merge(dep_tracker const& other);
bool is_superset(dep_tracker const& other) const;
bool empty() const;
bool operator==(dep_tracker const& other) const { return m_bits == other.m_bits; }
bool operator!=(dep_tracker const& other) const { return !(*this == other); }
};
// string equality constraint: lhs = rhs
// mirrors ZIPT's StrEq (both sides are regex-free snode trees)
struct str_eq {
euf::snode* m_lhs;
euf::snode* m_rhs;
dep_tracker m_dep;
str_eq(): m_lhs(nullptr), m_rhs(nullptr) {}
str_eq(euf::snode* lhs, euf::snode* rhs, dep_tracker const& dep):
m_lhs(lhs), m_rhs(rhs), m_dep(dep) {}
bool operator==(str_eq const& other) const {
return m_lhs == other.m_lhs && m_rhs == other.m_rhs;
}
// sort so that lhs <= rhs by snode id
void sort();
// check if both sides are empty (trivially satisfied)
bool is_trivial() const;
// check if the constraint contains a given variable
bool contains_var(euf::snode* var) const;
};
// regex membership constraint: str in regex
// mirrors ZIPT's StrMem
struct str_mem {
euf::snode* m_str;
euf::snode* m_regex;
euf::snode* m_history; // tracks derivation history for cycle detection
unsigned m_id; // unique identifier
dep_tracker m_dep;
str_mem(): m_str(nullptr), m_regex(nullptr), m_history(nullptr), m_id(UINT_MAX) {}
str_mem(euf::snode* str, euf::snode* regex, euf::snode* history, unsigned id, dep_tracker const& dep):
m_str(str), m_regex(regex), m_history(history), m_id(id), m_dep(dep) {}
bool operator==(str_mem const& other) const {
return m_id == other.m_id && m_str == other.m_str && m_regex == other.m_regex;
}
// check if the constraint has the form x in R with x a single variable
bool is_primitive() const;
// check if the constraint contains a given variable
bool contains_var(euf::snode* var) const;
};
// string variable substitution: var -> replacement
// mirrors ZIPT's Subst
struct nielsen_subst {
euf::snode* m_var;
euf::snode* m_replacement;
dep_tracker m_dep;
nielsen_subst(): m_var(nullptr), m_replacement(nullptr) {}
nielsen_subst(euf::snode* var, euf::snode* repl, dep_tracker const& dep):
m_var(var), m_replacement(repl), m_dep(dep) {}
// an eliminating substitution does not contain the variable in the replacement
bool is_eliminating() const;
bool operator==(nielsen_subst const& other) const {
return m_var == other.m_var && m_replacement == other.m_replacement;
}
};
// edge in the Nielsen graph connecting two nodes
// mirrors ZIPT's NielsenEdge
class nielsen_edge {
nielsen_node* m_src;
nielsen_node* m_tgt;
vector<nielsen_subst> m_subst;
ptr_vector<str_eq> m_side_str_eq; // side constraints: string equalities
ptr_vector<str_mem> m_side_str_mem; // side constraints: regex memberships
bool m_is_progress; // does this edge represent progress?
public:
nielsen_edge(nielsen_node* src, nielsen_node* tgt, bool is_progress);
nielsen_node* src() const { return m_src; }
nielsen_node* tgt() const { return m_tgt; }
void set_tgt(nielsen_node* tgt) { m_tgt = tgt; }
vector<nielsen_subst> const& subst() const { return m_subst; }
void add_subst(nielsen_subst const& s) { m_subst.push_back(s); }
void add_side_str_eq(str_eq* eq) { m_side_str_eq.push_back(eq); }
void add_side_str_mem(str_mem* mem) { m_side_str_mem.push_back(mem); }
ptr_vector<str_eq> const& side_str_eq() const { return m_side_str_eq; }
ptr_vector<str_mem> const& side_str_mem() const { return m_side_str_mem; }
bool is_progress() const { return m_is_progress; }
bool operator==(nielsen_edge const& other) const {
return m_src == other.m_src && m_tgt == other.m_tgt;
}
};
// node in the Nielsen graph
// mirrors ZIPT's NielsenNode
class nielsen_node {
friend class nielsen_graph;
unsigned m_id;
nielsen_graph* m_graph;
// constraints at this node
vector<str_eq> m_str_eq; // string equalities
vector<str_mem> m_str_mem; // regex memberships
// edges
ptr_vector<nielsen_edge> m_outgoing;
nielsen_node* m_backedge = nullptr;
// status flags
bool m_is_general_conflict = false;
bool m_is_extended = false;
backtrack_reason m_reason = backtrack_reason::unevaluated;
bool m_is_progress = false;
// evaluation index for run tracking
unsigned m_eval_idx = 0;
public:
nielsen_node(nielsen_graph* graph, unsigned id);
unsigned id() const { return m_id; }
nielsen_graph* graph() const { return m_graph; }
// constraint access
vector<str_eq> const& str_eqs() const { return m_str_eq; }
vector<str_eq>& str_eqs() { return m_str_eq; }
vector<str_mem> const& str_mems() const { return m_str_mem; }
vector<str_mem>& str_mems() { return m_str_mem; }
void add_str_eq(str_eq const& eq) { m_str_eq.push_back(eq); }
void add_str_mem(str_mem const& mem) { m_str_mem.push_back(mem); }
// edge access
ptr_vector<nielsen_edge> const& outgoing() const { return m_outgoing; }
void add_outgoing(nielsen_edge* e) { m_outgoing.push_back(e); }
nielsen_node* backedge() const { return m_backedge; }
void set_backedge(nielsen_node* n) { m_backedge = n; }
// status
bool is_general_conflict() const { return m_is_general_conflict; }
void set_general_conflict(bool v) { m_is_general_conflict = v; }
bool is_extended() const { return m_is_extended; }
void set_extended(bool v) { m_is_extended = v; }
bool is_currently_conflict() const {
return m_is_general_conflict ||
(m_reason != backtrack_reason::unevaluated && m_is_extended);
}
backtrack_reason reason() const { return m_reason; }
void set_reason(backtrack_reason r) { m_reason = r; }
bool is_progress() const { return m_is_progress; }
unsigned eval_idx() const { return m_eval_idx; }
void set_eval_idx(unsigned idx) { m_eval_idx = idx; }
void reset_counter() { m_eval_idx = 0; }
// clone constraints from a parent node
void clone_from(nielsen_node const& parent);
// apply a substitution to all constraints
void apply_subst(euf::sgraph& sg, nielsen_subst const& s);
// simplify all constraints at this node and initialize status.
// Returns proceed, conflict, satisfied, or restart.
simplify_result simplify_and_init(nielsen_graph& g);
// true if all str_eqs are trivial and there are no str_mems
bool is_satisfied() const;
// true if other's constraint set is a subset of this node's
bool is_subsumed_by(nielsen_node const& other) const;
};
// the overall Nielsen transformation graph
// mirrors ZIPT's NielsenGraph
class nielsen_graph {
euf::sgraph& m_sg;
region m_region;
ptr_vector<nielsen_node> m_nodes;
ptr_vector<nielsen_edge> m_edges;
nielsen_node* m_root = nullptr;
unsigned m_run_idx = 0;
unsigned m_depth_bound = 0;
unsigned m_next_mem_id = 0;
public:
nielsen_graph(euf::sgraph& sg);
~nielsen_graph();
euf::sgraph& sg() { return m_sg; }
// node management
nielsen_node* mk_node();
nielsen_node* mk_child(nielsen_node* parent);
// edge management
nielsen_edge* mk_edge(nielsen_node* src, nielsen_node* tgt, bool is_progress);
// root node access
nielsen_node* root() const { return m_root; }
void set_root(nielsen_node* n) { m_root = n; }
// add constraints to the root node from external solver
void add_str_eq(euf::snode* lhs, euf::snode* rhs);
void add_str_mem(euf::snode* str, euf::snode* regex);
// run management
unsigned run_idx() const { return m_run_idx; }
void inc_run_idx();
// access all nodes
ptr_vector<nielsen_node> const& nodes() const { return m_nodes; }
unsigned num_nodes() const { return m_nodes.size(); }
// depth bound for iterative deepening
unsigned depth_bound() const { return m_depth_bound; }
void set_depth_bound(unsigned d) { m_depth_bound = d; }
// generate next unique regex membership id
unsigned next_mem_id() { return m_next_mem_id++; }
// display for debugging
std::ostream& display(std::ostream& out) const;
// reset all nodes and state
void reset();
// search result returned by solve() / search_dfs()
enum class search_result { sat, unsat, unknown };
// main search entry point: iterative deepening DFS
search_result solve();
// simplify a node's constraints (delegate to node)
simplify_result simplify_node(nielsen_node* node);
// generate child nodes by applying modifier rules
// returns true if at least one child was generated
bool generate_extensions(nielsen_node* node, unsigned depth);
// collect dependency information from conflicting constraints
void collect_conflict_deps(dep_tracker& deps) const;
private:
search_result search_dfs(nielsen_node* node, unsigned depth);
};
}