3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-03-19 11:33:09 +00:00
z3/src/smt/seq/seq_nielsen.h
copilot-swe-agent[bot] a51ba544ea Fix typo: rename m_parith to m_parikh in seq_nielsen.h/.cpp
Co-authored-by: NikolajBjorner <3085284+NikolajBjorner@users.noreply.github.com>
2026-03-11 19:08:11 +00:00

975 lines
45 KiB
C++

/*++
Copyright (c) 2026 Microsoft Corporation
Module Name:
seq_nielsen.h
Abstract:
Nielsen graph for string constraint solving.
Ports the constraint types and Nielsen graph structures from
ZIPT (https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT/Constraints)
into Z3's smt/seq framework.
The Nielsen graph is used for solving word equations and regex
membership constraints via Nielsen transformations. Each node
contains a set of constraints (string equalities, regex memberships,
integer equalities/inequalities) and edges represent substitutions
that transform one constraint set into another.
Key components:
-- str_eq: string equality constraint (lhs = rhs)
-- str_mem: regex membership constraint (str in regex)
-- nielsen_subst: variable substitution (var -> replacement)
-- nielsen_edge: graph edge with substitutions and side constraints
-- nielsen_node: graph node with constraint set and outgoing edges
-- nielsen_graph: the overall Nielsen transformation graph
-----------------------------------------------------------------------
ZIPT PORT COMPARISON SUMMARY
-----------------------------------------------------------------------
The ZIPT reference is organized as follows (all under ZIPT/Constraints/):
NielsenGraph.cs -- the graph manager class
NielsenNode.cs -- node class + BacktrackReasons enum
NielsenEdge.cs -- edge class with string and character substitutions
ConstraintElement/
Constraint.cs -- abstract base for all constraints
StrEqBase.cs -- abstract base for StrEq and StrMem
StrEq.cs -- string equality with full simplification/splitting
StrMem.cs -- regex membership with Brzozowski derivatives
IntEq.cs -- integer equality over length polynomials
IntLe.cs -- integer inequality over length polynomials
Modifier/ -- ~15 modifier types driving graph expansion
A. PORTED FAITHFULLY
--------------------
1. backtrack_reason enum (BacktrackReasons): all eleven values (Unevaluated,
Extended, SymbolClash, ParikhImage, Subsumption, Arithmetic, Regex,
RegexWidening, CharacterRange, SMT, ChildrenFailed) are present with
identical semantics.
2. simplify_result enum (SimplifyResult): all five values (Proceed, Conflict,
Satisfied, Restart, RestartAndSatisfied) are present with identical semantics.
Note: RestartAndSatisfied is declared but not yet exercised in this port.
3. nielsen_node status fields and accessors: m_is_general_conflict,
m_is_extended, m_reason, m_eval_idx map directly to IsGeneralConflict,
IsExtended, CurrentReason, evalIdx. The is_currently_conflict() predicate
faithfully mirrors IsCurrentlyConflict (GeneralConflict || (reason !=
Unevaluated && IsExtended)).
4. nielsen_node::reset_counter() mirrors NielsenNode.ResetCounter() exactly.
5. nielsen_node::clone_from() mirrors the copy constructor
NielsenNode(graph, parent) for str_eq and str_mem constraints.
6. nielsen_edge identity (operator==) mirrors NielsenEdge.Equals(): both
compare by source and target node pointer identity.
7. nielsen_graph::inc_run_idx() mirrors the RunIdx increment in NielsenGraph.
Check(), including the UINT_MAX overflow guard that calls reset_counter()
on all nodes.
8. str_eq::sort() mirrors StrEqBase.SortStr(): swaps lhs/rhs when lhs > rhs.
(Z3 compares by snode id; ZIPT compares Str lexicographically.)
9. str_eq::is_trivial() mirrors the trivially-satisfied check when both sides
are empty.
10. str_mem fields (m_str, m_regex, m_history, m_id, m_dep) mirror StrMem
fields (Str, Regex, History, Id, Reason) faithfully, including the unique
identifier used for cycle tracking.
11. str_mem::is_primitive() mirrors StrMem.IsPrimitiveRegex(): single variable
on the left side of the membership constraint.
12. nielsen_subst::is_eliminating() mirrors the logic behind
NielsenEdge.BumpedModCount: a substitution is non-eliminating (bumps the
modification counter) when the substituted variable appears in the
replacement.
13. nielsen_graph::mk_edge() faithfully mirrors NielsenEdge construction: it
links src to tgt and registers the outgoing edge.
B. PORTED WITH ALGORITHMIC CHANGES
------------------------------------
1. dep_tracker (DependencyTracker): ZIPT's DependencyTracker is a .NET
class using a BitArray-like structure for tracking constraint origins.
Z3 uses uint_set (a dense bitvector from util/uint_set.h) for the same
purpose. The |=/subset_of/empty semantics are equivalent.
2. Substitution application (nielsen_node::apply_subst): ZIPT uses an
immutable, functional style -- Apply() returns a new constraint if
changed, using C# reference equality to detect no-ops. Z3 uses
in-place mutation via sgraph::subst(), modifying the constraint vectors
directly. The functional change also propagates the substitution's
dependency to the merged constraint.
3. Node constraint containers: ZIPT's NielsenNode stores str_eq constraints
in NList<StrEq> (a sorted list for O(log n) subsumption lookup) and str_mem
constraints in Dictionary<uint, StrMem> (keyed by id for O(1) cycle lookup).
Z3 uses plain vector<str_eq> and vector<str_mem>, which is simpler.
4. nielsen_edge substitution list: ZIPT's NielsenEdge carries two substitution
lists -- Subst (string-level, mapping string variables to strings) and
SubstC (character-level, mapping symbolic character variables to concrete
characters). Z3's nielsen_edge carries a single vector<nielsen_subst>,
covering only string-level substitutions; character substitutions are not
represented.
5. nielsen_graph node registry: ZIPT keeps nodes in a HashSet<NielsenNode> plus
a Dictionary<NList<StrEq>, List<NielsenNode>> for subsumption candidate
lookup. Z3 uses a ptr_vector<nielsen_node>, simplifying memory management.
6. nielsen_graph::display() vs NielsenGraph.ToDot(): ZIPT outputs a DOT-format
graph with color highlighting for the current satisfying path. Z3 outputs
plain human-readable text with node/edge details but no DOT syntax or path
highlighting.
7. str_eq::contains_var() / str_mem::contains_var(): ZIPT performs occurrence
checks through StrManager.Subst() (which uses hash-consing and reference
equality). Z3 walks the snode tree via collect_tokens(), which is correct
but re-traverses the DAG on every call.
C. NOT PORTED
-------------
The following ZIPT components are absent from this implementation.
They represent the algorithmic core of the search procedure and
are expected to be ported in subsequent work.
Constraint simplification and propagation:
- Constraint.SimplifyAndPropagate() / SimplifyAndPropagateInternal(): the
main constraint-driven simplification loop is not ported. str_eq and
str_mem have no Simplify methods.
- StrEq.SimplifyDir() / SimplifyFinal() / AddDefinition(): forward/backward
simplification passes, including Makanin-style prefix cancellation, power
token handling, and variable definition propagation.
- StrEq.GetNielsenDep() / SplitEq(): the Nielsen dependency analysis and
equation-splitting heuristic used to choose the best split point.
- StrMem.SimplifyCharRegex() / SimplifyDir(): Brzozowski derivative-based
simplification consuming ground prefixes/suffixes of the string.
- StrMem.TrySubsume(): stabilizer-based subsumption (not ported, not needed).
- StrMem.ExtractCycle() / StabilizerFromCycle(): cycle detection over the
search path and extraction of a Kleene-star stabilizer to generalize the
cycle. This is the key termination argument for regex membership.
- StrMem.Extend(): the splitting driver that produces the next modifier
(RegexVarSplitModifier, RegexCharSplitModifier, StarIntrModifier, etc.).
Integer constraints:
- IntEq / IntLe: integer equality and inequality constraints over Presburger
arithmetic polynomials (PDD<BigInteger>) are entirely absent. The Z3 port
has no ConstraintsIntEq or ConstraintsIntLe in nielsen_node.
- IntBounds / VarBoundWatcher: per-variable integer interval bounds and the
watcher mechanism that reruns bound propagation when a string variable is
substituted — PORTED as nielsen_node::{add_lower_int_bound,
add_upper_int_bound, watch_var_bounds, init_var_bounds_from_mems}.
- AddLowerIntBound() / AddHigherIntBound(): incremental interval tightening
— PORTED as the above add_lower/upper_int_bound methods.
Character-level handling:
- CharSubst: character-level variable substitution (symbolic char -> concrete
char) is absent. ZIPT uses this to handle symbolic character tokens
(SymCharToken) that represent a single unknown character.
- SymCharToken / CharacterSet: symbolic character tokens with associated
character range constraints (CharRanges) are not ported.
- DisEqualities: per-node character disequality constraints used for conflict
detection during character substitution are not ported.
Modifier hierarchy (Constraints/Modifier/):
- 13 Modifier subclasses driving graph expansion are ported as
apply_* methods in generate_extensions, matching ZIPT's TypeOrder
priority: DetModifier(1), PowerEpsilonModifier(2), NumCmpModifier(3),
ConstNumUnwindingModifier(4), EqSplitModifier(5), StarIntrModifier(6),
GPowerIntrModifier(7), ConstNielsenModifier(8), RegexCharSplitModifier(9),
RegexVarSplitModifier(10), PowerSplitModifier(11), VarNielsenModifier(12),
VarNumUnwindingModifier(13).
- NOT PORTED: DirectedNielsenModifier, DecomposeModifier, CombinedModifier.
- NumCmp, ConstNumUnwinding, VarNumUnwinding are approximated (no PDD
integer polynomial infrastructure; power tokens are replaced with ε
or peeled with fresh variables instead of exact exponent arithmetic).
Search procedure:
- NielsenGraph.Check() / NielsenNode.GraphExpansion(): ported as
nielsen_graph::solve() (iterative deepening, starting at depth 3,
incrementing by 1 per failure, bounded by smt.nseq.max_depth) and
search_dfs() (depth-bounded DFS with eval_idx cycle detection and
node status tracking).
- NielsenNode.SimplifyAndInit(): ported as
nielsen_node::simplify_and_init() with prefix matching, symbol clash,
empty propagation, and Brzozowski derivative consumption.
- NielsenGraph.FindExisting() / subsumption cache lookup: not ported,
not needed.
Auxiliary infrastructure:
- LocalInfo: thread-local search bookkeeping (current path, modification
counts, regex occurrence cache for cycle detection, current node pointer)
is not ported.
- NielsenGraph.SubSolver / InnerStringPropagator: the auxiliary Z3 solver
for arithmetic lemma generation and the inner string propagator for
model-based refinement are not ported.
- PowerToken: word-repetition tokens of the form u^n (distinct from regex
Kleene star) are not represented in Z3's snode.
- GetSignature(): the constraint-pair signature used for subsumption
candidate matching is not ported.
- Constraint.Shared: the flag indicating whether a constraint should be
forwarded to the outer solver — PORTED as
nielsen_graph::assert_root_constraints_to_solver(), called at the start
of solve() to make all root-level length/Parikh constraints immediately
visible to m_solver.
- Interpretation: the model-extraction class mapping string and integer
variables to concrete values is not ported.
-----------------------------------------------------------------------
Author:
Clemens Eisenhofer 2026-03-02
Nikolaj Bjorner (nbjorner) 2026-03-02
--*/
#pragma once
#include "util/vector.h"
#include "util/uint_set.h"
#include "util/map.h"
#include "util/lbool.h"
#include "ast/ast.h"
#include "ast/arith_decl_plugin.h"
#include "ast/seq_decl_plugin.h"
#include "ast/euf/euf_sgraph.h"
#include <functional>
namespace seq {
// forward declarations
class nielsen_node;
class nielsen_edge;
class nielsen_graph;
class seq_parikh; // Parikh image filter (see seq_parikh.h)
/**
* Abstract interface for an incremental solver used by nielsen_graph
* to check integer/arithmetic feasibility of path constraints.
*
* Users of nielsen_graph can wrap smt::kernel or any other solver
* to serve as the arithmetic back-end. When no solver is provided,
* integer feasibility checks are skipped (optimistically assumed feasible).
*/
class simple_solver {
public:
virtual ~simple_solver() {}
virtual lbool check() = 0;
virtual void assert_expr(expr* e) = 0;
virtual void push() = 0;
virtual void pop(unsigned num_scopes) = 0;
};
// simplification result for constraint processing
// mirrors ZIPT's SimplifyResult enum
enum class simplify_result {
proceed, // no change, continue
conflict, // constraint is unsatisfiable
satisfied, // constraint is trivially satisfied
restart, // constraint was simplified, restart
restart_and_satisfied, // simplified and satisfied
};
// reason for backtracking in the Nielsen graph
// mirrors ZIPT's BacktrackReasons enum
enum class backtrack_reason {
unevaluated,
extended,
symbol_clash,
parikh_image,
subsumption, // not used; retained for enum completeness
arithmetic,
regex,
regex_widening,
character_range,
smt,
children_failed,
};
// dependency tracker: bitvector tracking which input constraints
// contributed to deriving a given constraint
// mirrors ZIPT's DependencyTracker
using dep_tracker = uint_set;
// -----------------------------------------------
// character-level substitution
// mirrors ZIPT's CharSubst
// -----------------------------------------------
// maps a symbolic char (s_unit snode) to a concrete or symbolic char
struct char_subst {
euf::snode* m_var; // the symbolic char being substituted (s_unit)
euf::snode* m_val; // replacement: s_char (concrete) or s_unit (symbolic)
char_subst(): m_var(nullptr), m_val(nullptr) {}
char_subst(euf::snode* var, euf::snode* val):
m_var(var), m_val(val) {
SASSERT(var && var->is_unit());
SASSERT(val && (val->is_char() || val->is_unit()));
}
// true when the replacement is a concrete character
bool is_eliminating() const { return m_val && m_val->is_char(); }
bool operator==(char_subst const& o) const {
return m_var == o.m_var && m_val == o.m_val;
}
};
// string equality constraint: lhs = rhs
// mirrors ZIPT's StrEq (both sides are regex-free snode trees)
struct str_eq {
euf::snode* m_lhs;
euf::snode* m_rhs;
dep_tracker m_dep;
str_eq(): m_lhs(nullptr), m_rhs(nullptr) {}
str_eq(euf::snode* lhs, euf::snode* rhs, dep_tracker const& dep):
m_lhs(lhs), m_rhs(rhs), m_dep(dep) {}
bool operator==(str_eq const& other) const {
return m_lhs == other.m_lhs && m_rhs == other.m_rhs;
}
// sort so that lhs <= rhs by snode id
void sort();
// check if both sides are empty (trivially satisfied)
bool is_trivial() const;
// check if the constraint contains a given variable
bool contains_var(euf::snode* var) const;
};
// regex membership constraint: str in regex
// mirrors ZIPT's StrMem
struct str_mem {
euf::snode* m_str;
euf::snode* m_regex;
euf::snode* m_history; // tracks derivation history for cycle detection
unsigned m_id; // unique identifier
dep_tracker m_dep;
str_mem(): m_str(nullptr), m_regex(nullptr), m_history(nullptr), m_id(UINT_MAX) {}
str_mem(euf::snode* str, euf::snode* regex, euf::snode* history, unsigned id, dep_tracker const& dep):
m_str(str), m_regex(regex), m_history(history), m_id(id), m_dep(dep) {}
bool operator==(str_mem const& other) const {
return m_id == other.m_id && m_str == other.m_str && m_regex == other.m_regex;
}
// check if the constraint has the form x in R with x a single variable
bool is_primitive() const;
// check if the constraint contains a given variable
bool contains_var(euf::snode* var) const;
};
// string variable substitution: var -> replacement
// mirrors ZIPT's Subst
struct nielsen_subst {
euf::snode* m_var;
euf::snode* m_replacement;
dep_tracker m_dep;
nielsen_subst(): m_var(nullptr), m_replacement(nullptr) {}
nielsen_subst(euf::snode* var, euf::snode* repl, dep_tracker const& dep):
m_var(var), m_replacement(repl), m_dep(dep) {
SASSERT(var != nullptr);
SASSERT(repl != nullptr);
// var may be s_var or s_power; sgraph::subst uses pointer identity matching
SASSERT(var->is_var() || var->is_power());
}
// an eliminating substitution does not contain the variable in the replacement
bool is_eliminating() const;
bool operator==(nielsen_subst const& other) const {
return m_var == other.m_var && m_replacement == other.m_replacement;
}
};
// kind of length constraint determines propagation strategy
enum class length_kind {
nonneg, // len(x) >= 0: unconditional axiom
eq, // len(lhs) = len(rhs): conditional on string equality
bound // Parikh bound: conditional on regex membership
};
// arithmetic length constraint derived from string equations
struct length_constraint {
expr_ref m_expr; // arithmetic expression (e.g., len(x) + len(y) = len(a) + 1)
dep_tracker m_dep; // tracks which input constraints contributed
length_kind m_kind; // determines propagation strategy
length_constraint(ast_manager& m): m_expr(m), m_kind(length_kind::nonneg) {}
length_constraint(expr* e, dep_tracker const& dep, length_kind kind, ast_manager& m):
m_expr(e, m), m_dep(dep), m_kind(kind) {}
};
// -----------------------------------------------
// integer constraint: equality or inequality over length expressions
// mirrors ZIPT's IntEq and IntLe
// -----------------------------------------------
enum class int_constraint_kind {
eq, // lhs = rhs
le, // lhs <= rhs
ge, // lhs >= rhs
};
// integer constraint stored per nielsen_node, tracking arithmetic
// relationships between length variables and power exponents.
// mirrors ZIPT's IntEq / IntLe over Presburger arithmetic polynomials.
struct int_constraint {
expr_ref m_lhs; // left-hand side (arithmetic expression)
expr_ref m_rhs; // right-hand side (arithmetic expression)
int_constraint_kind m_kind; // eq, le, or ge
dep_tracker m_dep; // tracks which input constraints contributed
int_constraint(ast_manager& m):
m_lhs(m), m_rhs(m), m_kind(int_constraint_kind::eq) {}
int_constraint(expr* lhs, expr* rhs, int_constraint_kind kind, dep_tracker const& dep, ast_manager& m):
m_lhs(lhs, m), m_rhs(rhs, m), m_kind(kind), m_dep(dep) {}
std::ostream& display(std::ostream& out) const;
};
// edge in the Nielsen graph connecting two nodes
// mirrors ZIPT's NielsenEdge
class nielsen_edge {
nielsen_node* m_src;
nielsen_node* m_tgt;
vector<nielsen_subst> m_subst;
vector<char_subst> m_char_subst; // character-level substitutions (mirrors ZIPT's SubstC)
ptr_vector<str_eq> m_side_str_eq; // side constraints: string equalities
ptr_vector<str_mem> m_side_str_mem; // side constraints: regex memberships
vector<int_constraint> m_side_int; // side constraints: integer equalities/inequalities
bool m_is_progress; // does this edge represent progress?
public:
nielsen_edge(nielsen_node* src, nielsen_node* tgt, bool is_progress);
nielsen_node* src() const { return m_src; }
nielsen_node* tgt() const { return m_tgt; }
void set_tgt(nielsen_node* tgt) { m_tgt = tgt; }
vector<nielsen_subst> const& subst() const { return m_subst; }
void add_subst(nielsen_subst const& s) { m_subst.push_back(s); }
vector<char_subst> const& char_substs() const { return m_char_subst; }
void add_char_subst(char_subst const& s) { m_char_subst.push_back(s); }
void add_side_str_eq(str_eq* eq) { m_side_str_eq.push_back(eq); }
void add_side_str_mem(str_mem* mem) { m_side_str_mem.push_back(mem); }
void add_side_int(int_constraint const& ic) { m_side_int.push_back(ic); }
ptr_vector<str_eq> const& side_str_eq() const { return m_side_str_eq; }
ptr_vector<str_mem> const& side_str_mem() const { return m_side_str_mem; }
vector<int_constraint> const& side_int() const { return m_side_int; }
bool is_progress() const { return m_is_progress; }
bool operator==(nielsen_edge const& other) const {
return m_src == other.m_src && m_tgt == other.m_tgt;
}
};
// node in the Nielsen graph
// mirrors ZIPT's NielsenNode
class nielsen_node {
friend class nielsen_graph;
unsigned m_id;
nielsen_graph* m_graph;
// constraints at this node
vector<str_eq> m_str_eq; // string equalities
vector<str_mem> m_str_mem; // regex memberships
vector<int_constraint> m_int_constraints; // integer equalities/inequalities (mirrors ZIPT's IntEq/IntLe)
// per-variable integer bounds for len(var). Mirrors ZIPT's IntBounds.
// key: snode id of the string variable.
// default lb = 0 (unrestricted); default ub = UINT_MAX (unrestricted).
u_map<unsigned> m_var_lb; // lower bound: lb <= len(var)
u_map<unsigned> m_var_ub; // upper bound: len(var) <= ub
// character constraints (mirrors ZIPT's DisEqualities and CharRanges)
// key: snode id of the s_unit symbolic character
u_map<ptr_vector<euf::snode>> m_char_diseqs; // ?c != {?d, ?e, ...}
u_map<char_set> m_char_ranges; // ?c in [lo, hi)
// edges
ptr_vector<nielsen_edge> m_outgoing;
nielsen_node* m_backedge = nullptr;
nielsen_edge* m_parent_edge = nullptr;
// status flags
bool m_is_general_conflict = false;
bool m_is_extended = false;
backtrack_reason m_reason = backtrack_reason::unevaluated;
bool m_is_progress = false;
// evaluation index for run tracking
unsigned m_eval_idx = 0;
// Parikh filter: set to true once apply_parikh_to_node has been applied
// to this node. Prevents duplicate constraint generation across DFS runs.
bool m_parikh_applied = false;
public:
nielsen_node(nielsen_graph* graph, unsigned id);
unsigned id() const { return m_id; }
nielsen_graph* graph() const { return m_graph; }
// constraint access
vector<str_eq> const& str_eqs() const { return m_str_eq; }
vector<str_eq>& str_eqs() { return m_str_eq; }
vector<str_mem> const& str_mems() const { return m_str_mem; }
vector<str_mem>& str_mems() { return m_str_mem; }
void add_str_eq(str_eq const& eq) { m_str_eq.push_back(eq); }
void add_str_mem(str_mem const& mem) { m_str_mem.push_back(mem); }
void add_int_constraint(int_constraint const& ic) { m_int_constraints.push_back(ic); }
vector<int_constraint> const& int_constraints() const { return m_int_constraints; }
vector<int_constraint>& int_constraints() { return m_int_constraints; }
// IntBounds: tighten the lower bound for len(var).
// Returns true if the bound was tightened (lb > current lower bound).
// When tightened without conflict, adds an int_constraint len(var) >= lb.
// When lb > current upper bound, sets arithmetic conflict (no constraint added)
// and still returns true (the bound value changed). Check is_general_conflict()
// separately to distinguish tightening-with-conflict from normal tightening.
// Mirrors ZIPT's AddLowerIntBound().
bool add_lower_int_bound(euf::snode* var, unsigned lb, dep_tracker const& dep);
// IntBounds: tighten the upper bound for len(var).
// Returns true if the bound was tightened (ub < current upper bound).
// When tightened without conflict, adds an int_constraint len(var) <= ub.
// When current lower bound > ub, sets arithmetic conflict (no constraint added)
// and still returns true (the bound value changed). Check is_general_conflict()
// separately to distinguish tightening-with-conflict from normal tightening.
// Mirrors ZIPT's AddHigherIntBound().
bool add_upper_int_bound(euf::snode* var, unsigned ub, dep_tracker const& dep);
// Query current bounds for a variable (default: 0 / UINT_MAX if not set).
unsigned var_lb(euf::snode* var) const;
unsigned var_ub(euf::snode* var) const;
// character constraint access (mirrors ZIPT's DisEqualities / CharRanges)
u_map<ptr_vector<euf::snode>> const& char_diseqs() const { return m_char_diseqs; }
u_map<char_set> const& char_ranges() const { return m_char_ranges; }
// add a character range constraint for a symbolic char.
// intersects with existing range; sets conflict if result is empty.
void add_char_range(euf::snode* sym_char, char_set const& range);
// add a character disequality: sym_char != other
void add_char_diseq(euf::snode* sym_char, euf::snode* other);
// apply a character-level substitution to all constraints.
// checks disequalities and ranges; sets conflict on violation.
void apply_char_subst(euf::sgraph& sg, char_subst const& s);
// edge access
ptr_vector<nielsen_edge> const& outgoing() const { return m_outgoing; }
void add_outgoing(nielsen_edge* e) { m_outgoing.push_back(e); }
nielsen_node* backedge() const { return m_backedge; }
void set_backedge(nielsen_node* n) { m_backedge = n; }
nielsen_edge* parent_edge() const { return m_parent_edge; }
void set_parent_edge(nielsen_edge* e) { m_parent_edge = e; }
// status
bool is_general_conflict() const { return m_is_general_conflict; }
void set_general_conflict(bool v) { m_is_general_conflict = v; }
bool is_extended() const { return m_is_extended; }
void set_extended(bool v) { m_is_extended = v; }
bool is_currently_conflict() const {
return m_is_general_conflict ||
(m_reason != backtrack_reason::unevaluated && m_is_extended);
}
backtrack_reason reason() const { return m_reason; }
void set_reason(backtrack_reason r) { m_reason = r; }
bool is_progress() const { return m_is_progress; }
unsigned eval_idx() const { return m_eval_idx; }
void set_eval_idx(unsigned idx) { m_eval_idx = idx; }
void reset_counter() { m_eval_idx = 0; }
// clone constraints from a parent node
void clone_from(nielsen_node const& parent);
// apply a substitution to all constraints
void apply_subst(euf::sgraph& sg, nielsen_subst const& s);
// simplify all constraints at this node and initialize status.
// Returns proceed, conflict, satisfied, or restart.
simplify_result simplify_and_init(nielsen_graph& g);
// true if all str_eqs are trivial and there are no str_mems
bool is_satisfied() const;
// true if any constraint has opaque (s_other) terms that
// the Nielsen graph cannot decompose
bool has_opaque_terms() const;
// render constraint set as an HTML fragment for DOT node labels.
// mirrors ZIPT's NielsenNode.ToHtmlString()
std::ostream& display_html(std::ostream& out, ast_manager& m) const;
private:
// Helper: handle one empty vs one non-empty side of a string equality.
// Collects tokens from non_empty_side; if any token causes a conflict
// (is a concrete character or an unexpected kind), sets conflict flags
// and returns true. Otherwise substitutes all variables to empty,
// sets changed=true, and returns false.
bool handle_empty_side(euf::sgraph& sg, euf::snode* non_empty_side,
dep_tracker const& dep, bool& changed);
// VarBoundWatcher: after applying substitution s, propagate the bounds
// of s.m_var to variables appearing in s.m_replacement.
// When var has bounds [lo, hi], derives bounds for variables in replacement
// using the known constant-length contribution of non-variable tokens.
// Mirrors ZIPT's VarBoundWatcher re-check mechanism.
void watch_var_bounds(nielsen_subst const& s);
// Initialize per-variable Parikh bounds from this node's regex memberships.
// For each str_mem constraint (str ∈ regex) where regex has length bounds
// [min_len, max_len], adds lower/upper bound constraints for len(str).
// Called from simplify_and_init to populate IntBounds at node creation.
void init_var_bounds_from_mems();
};
// search statistics collected during Nielsen graph solving
struct nielsen_stats {
unsigned m_num_solve_calls = 0;
unsigned m_num_dfs_nodes = 0;
unsigned m_num_sat = 0;
unsigned m_num_unsat = 0;
unsigned m_num_unknown = 0;
unsigned m_num_simplify_conflict = 0;
unsigned m_num_extensions = 0;
unsigned m_num_fresh_vars = 0;
unsigned m_max_depth = 0;
// modifier application counts
unsigned m_mod_det = 0;
unsigned m_mod_power_epsilon = 0;
unsigned m_mod_num_cmp = 0;
unsigned m_mod_const_num_unwinding = 0;
unsigned m_mod_eq_split = 0;
unsigned m_mod_star_intr = 0;
unsigned m_mod_gpower_intr = 0;
unsigned m_mod_const_nielsen = 0;
unsigned m_mod_regex_char_split = 0;
unsigned m_mod_regex_var_split = 0;
unsigned m_mod_power_split = 0;
unsigned m_mod_var_nielsen = 0;
unsigned m_mod_var_num_unwinding = 0;
void reset() { memset(this, 0, sizeof(nielsen_stats)); }
};
// the overall Nielsen transformation graph
// mirrors ZIPT's NielsenGraph
class nielsen_graph {
euf::sgraph& m_sg;
region m_region;
ptr_vector<nielsen_node> m_nodes;
ptr_vector<nielsen_edge> m_edges;
nielsen_node* m_root = nullptr;
nielsen_node* m_sat_node = nullptr;
svector<nielsen_edge*> m_sat_path;
unsigned m_run_idx = 0;
unsigned m_depth_bound = 0;
unsigned m_max_search_depth = 0;
unsigned m_next_mem_id = 0;
unsigned m_fresh_cnt = 0;
unsigned m_num_input_eqs = 0;
unsigned m_num_input_mems = 0;
nielsen_stats m_stats;
// external cancellation callback: returns true if solving should abort
std::function<bool()> m_cancel_fn;
// -----------------------------------------------
// Integer subsolver (abstract interface)
// When m_solver is null, check_int_feasibility skips arithmetic checking
// and optimistically assumes feasibility.
// -----------------------------------------------
simple_solver& m_solver;
// Constraint.Shared: guards re-assertion of root-level constraints.
// Set to true after assert_root_constraints_to_solver() is first called.
bool m_root_constraints_asserted = false;
// Parikh image filter: generates modular length constraints from regex
// memberships. Allocated in the constructor; owned by this graph.
seq_parikh* m_parikh = nullptr;
public:
// Construct with a caller-supplied solver. Ownership is NOT transferred;
// the caller is responsible for keeping the solver alive.
nielsen_graph(euf::sgraph& sg, simple_solver& solver);
~nielsen_graph();
euf::sgraph& sg() { return m_sg; }
// node management
nielsen_node* mk_node();
nielsen_node* mk_child(nielsen_node* parent);
// edge management
nielsen_edge* mk_edge(nielsen_node* src, nielsen_node* tgt, bool is_progress);
// root node access
nielsen_node* root() const { return m_root; }
void set_root(nielsen_node* n) { m_root = n; }
// satisfying leaf node (set by solve() when result is sat)
nielsen_node* sat_node() const { return m_sat_node; }
// path of edges from root to sat_node (set when sat_node is set)
svector<nielsen_edge*> const& sat_path() const { return m_sat_path; }
// add constraints to the root node from external solver
void add_str_eq(euf::snode* lhs, euf::snode* rhs);
void add_str_mem(euf::snode* str, euf::snode* regex);
// run management
unsigned run_idx() const { return m_run_idx; }
void inc_run_idx();
// access all nodes
ptr_vector<nielsen_node> const& nodes() const { return m_nodes; }
unsigned num_nodes() const { return m_nodes.size(); }
// maximum overall search depth (0 = unlimited)
void set_max_search_depth(unsigned d) { m_max_search_depth = d; }
// set a cancellation callback; solve() checks this periodically
void set_cancel_fn(std::function<bool()> fn) { m_cancel_fn = std::move(fn); }
// generate next unique regex membership id
unsigned next_mem_id() { return m_next_mem_id++; }
// number of input constraints (for dep_tracker bit mapping)
unsigned num_input_eqs() const { return m_num_input_eqs; }
unsigned num_input_mems() const { return m_num_input_mems; }
// display for debugging
std::ostream& display(std::ostream& out) const;
// output the graph in graphviz DOT format.
// nodes on the sat_path are highlighted green; conflict nodes red/darkred.
// mirrors ZIPT's NielsenGraph.ToDot()
std::ostream& to_dot(std::ostream& out) const;
std::string to_dot() const;
// reset all nodes and state
void reset();
// search result returned by solve() / search_dfs()
enum class search_result { sat, unsat, unknown };
// main search entry point: iterative deepening DFS
search_result solve();
// generate child nodes by applying modifier rules
// returns true if at least one child was generated
bool generate_extensions(nielsen_node *node);
// collect dependency information from conflicting constraints
void collect_conflict_deps(dep_tracker& deps) const;
// explain a conflict: partition the set bits into str_eq indices
// (bits 0..num_eqs-1) and str_mem indices (bits num_eqs..num_eqs+num_mems-1).
// Must be called after solve() returns unsat.
void explain_conflict(unsigned_vector& eq_indices, unsigned_vector& mem_indices) const;
// accumulated search statistics
nielsen_stats const& stats() const { return m_stats; }
void reset_stats() { m_stats.reset(); }
// generate arithmetic length constraints from the root node's string
// equalities and regex memberships. For each non-trivial equation lhs = rhs,
// produces len(lhs) = len(rhs) by expanding concatenations into sums.
// For each regex membership str in regex, produces Parikh interval
// constraints: len(str) >= min_len and len(str) <= max_len.
// Also generates len(x) >= 0 for each variable appearing in the equations.
void generate_length_constraints(vector<length_constraint>& constraints);
// build an arithmetic expression representing the length of an snode tree.
// concatenations are expanded to sums, chars to 1, empty to 0,
// variables to (str.len var_expr).
expr_ref compute_length_expr(euf::snode* n);
// compute Parikh length interval [min_len, max_len] for a regex snode.
// uses seq_util::rex min_length/max_length on the underlying expression.
// max_len == UINT_MAX means unbounded.
void compute_regex_length_interval(euf::snode* regex, unsigned& min_len, unsigned& max_len);
private:
search_result search_dfs(nielsen_node* node, unsigned depth, svector<nielsen_edge*>& cur_path);
// Apply the Parikh image filter to a node: generate modular length
// constraints from regex memberships and append them to the node's
// int_constraints. Also performs a lightweight feasibility pre-check;
// if a Parikh conflict is detected the node's conflict flag is set with
// backtrack_reason::parikh_image.
//
// Guarded by node.m_parikh_applied so that constraints are generated
// only once per node across DFS iterations.
void apply_parikh_to_node(nielsen_node& node);
// create a fresh variable with a unique name
euf::snode* mk_fresh_var();
// create a fresh symbolic character: seq.unit(fresh_char_const)
// analogous to ZIPT's SymCharToken creation
euf::snode* mk_fresh_char_var();
// deterministic modifier: var = ε, same-head cancel
bool apply_det_modifier(nielsen_node* node);
// const nielsen modifier: char vs var (2 branches per case)
bool apply_const_nielsen(nielsen_node* node);
// variable Nielsen modifier: var vs var, all progress (3 branches)
bool apply_var_nielsen(nielsen_node* node);
// eq split modifier: splits a regex-free equation at a chosen index into
// two shorter equalities with optional padding (single progress child).
// Mirrors ZIPT's EqSplitModifier + StrEq.SplitEq.
bool apply_eq_split(nielsen_node* node);
// helper: classify whether a token has variable (symbolic) length
// returns true for variables, powers, etc.; false for chars, units, string literals
bool token_has_variable_length(euf::snode* tok) const;
// helper: get the constant length of a token (only valid when !token_has_variable_length)
unsigned token_const_length(euf::snode* tok) const;
// helper: find a split point in a regex-free equation.
// ports ZIPT's StrEq.SplitEq algorithm.
// returns true if a valid split point is found, with results in out params.
bool find_eq_split_point(
euf::snode_vector const& lhs_toks,
euf::snode_vector const& rhs_toks,
unsigned& out_lhs_idx,
unsigned& out_rhs_idx,
int& out_padding) const;
// apply regex character split modifier to a node.
// for a str_mem constraint x·s ∈ R where x is a variable:
// (1) x → c·z for each char c accepted by R at first position
// (2) x → ε (x is empty)
// returns true if children were generated.
bool apply_regex_char_split(nielsen_node* node);
// power epsilon modifier: for a power token u^n in an equation,
// branch: (1) base u = ε, (2) power is empty (n = 0 semantics).
// mirrors ZIPT's PowerEpsilonModifier
bool apply_power_epsilon(nielsen_node* node);
// numeric comparison modifier: for equations involving power tokens
// u^m and u^n with the same base, branch on m < n vs n <= m.
// mirrors ZIPT's NumCmpModifier
bool apply_num_cmp(nielsen_node* node);
// constant numeric unwinding: for a power token u^n vs a constant
// (non-variable), branch: (1) n = 0 (u^n = ε), (2) n >= 1 (peel one u).
// mirrors ZIPT's ConstNumUnwindingModifier
bool apply_const_num_unwinding(nielsen_node* node);
// star introduction: for a str_mem x·s ∈ R where a cycle is detected
// (backedge exists), introduce stabilizer: x ∈ base* with x split.
// mirrors ZIPT's StarIntrModifier
bool apply_star_intr(nielsen_node* node);
// generalized power introduction: for an equation where one head is
// a variable v and the other side has ground prefix + a variable x
// forming a cycle back to v, introduce v = base^n · suffix.
// mirrors ZIPT's GPowerIntrModifier
bool apply_gpower_intr(nielsen_node* node);
// helper for apply_gpower_intr: fires the substitution
bool fire_gpower_intro(nielsen_node* node, str_eq const& eq,
euf::snode* var, euf::snode_vector const& ground_prefix);
// regex variable split: for str_mem x·s ∈ R where x is a variable,
// split using minterms: x → ε, or x → c·x' for each minterm c.
// More general than regex_char_split, uses minterm partitioning.
// mirrors ZIPT's RegexVarSplitModifier
bool apply_regex_var_split(nielsen_node* node);
// power split: for a variable x facing a power token u^n,
// branch: x = u^m · prefix(u) with m < n, or x = u^n · x.
// mirrors ZIPT's PowerSplitModifier
bool apply_power_split(nielsen_node* node);
// variable numeric unwinding: for a power token u^n vs a variable,
// branch: (1) n = 0 (u^n = ε), (2) n >= 1 (peel one u).
// mirrors ZIPT's VarNumUnwindingModifier
bool apply_var_num_unwinding(nielsen_node* node);
// collect concrete first-position characters from a regex snode
void collect_first_chars(euf::snode* re, euf::snode_vector& chars);
// find the first power token in any str_eq at this node
euf::snode* find_power_token(nielsen_node* node) const;
// find a power token facing a constant (char) head
bool find_power_vs_const(nielsen_node* node, euf::snode*& power, euf::snode*& other_head, str_eq const*& eq_out) const;
// find a power token facing a variable head
bool find_power_vs_var(nielsen_node* node, euf::snode*& power, euf::snode*& var_head, str_eq const*& eq_out) const;
// -----------------------------------------------
// Integer feasibility subsolver methods
// -----------------------------------------------
// Constraint.Shared: assert all root-level length/Parikh constraints to
// m_solver at the base level (outside push/pop). Called once at the start
// of solve(). Makes derived constraints immediately visible to m_solver
// for arithmetic pruning at every DFS node, not just the root.
// Mirrors ZIPT's Constraint.Shared forwarding mechanism.
void assert_root_constraints_to_solver();
// collect int_constraints along the path from root to the given node,
// including constraints from edges and nodes.
void collect_path_int_constraints(nielsen_node* node,
svector<nielsen_edge*> const& cur_path,
vector<int_constraint>& out);
// check integer feasibility of the constraints along the current path.
// returns true if feasible, false if infeasible.
bool check_int_feasibility(nielsen_node* node, svector<nielsen_edge*> const& cur_path);
// create an integer constraint: lhs <kind> rhs
int_constraint mk_int_constraint(expr* lhs, expr* rhs, int_constraint_kind kind, dep_tracker const& dep);
// get the exponent expression from a power snode (arg(1))
expr* get_power_exponent(euf::snode* power);
// create a fresh integer variable expression (for power exponents)
expr_ref mk_fresh_int_var();
// convert an int_constraint to an expr* assertion
expr_ref int_constraint_to_expr(int_constraint const& ic);
};
}