diff --git a/spec/plan.md b/spec/plan.md new file mode 100644 index 000000000..41ea69e20 --- /dev/null +++ b/spec/plan.md @@ -0,0 +1,12 @@ +Your task is to implement the theory solver for strings that is described in the resources under reference.md + +The solver should be implemented as a theory plugin in src/smt. Call it theory\_nseq. + +Add a parameter setting that selects among the default string solver theory\_seq, the new theory\_nseq, and the empty string solver. + +Utilities that can be made self-contained go in the directory ast/rewriter. + + + +Start out by creating a plan for this implementation. + diff --git a/spec/plan1.md b/spec/plan1.md new file mode 100644 index 000000000..e30c778ee --- /dev/null +++ b/spec/plan1.md @@ -0,0 +1,265 @@ +# Implementation Plan: theory_nseq — New String Theory Solver for Z3 + +## Problem Statement + +Implement a new theory solver `theory_nseq` for strings/sequences in Z3's SMT solver framework (`src/smt/`). The solver is based on the algorithms described in **LazyMemberships.pdf** and **ClemensTableau.pdf**, with a reference C# implementation at [ZIPT](https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT). The new solver should coexist with the existing `theory_seq` and be selectable via the `smt.string_solver` parameter.
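To make the algorithm family referenced above concrete, here is a self-contained toy model of the Nielsen-transformation loop (constant matching, variable elimination, case splits) that the plan builds around. Everything in it — `Eq`, `solve`, `erase_var`, `prefix_var`, the uppercase-letters-as-variables encoding — is invented for illustration and is unrelated to Z3's or ZIPT's actual types; a real solver adds a transformation graph with cycle detection and length/Parikh pruning that this bounded search omits:

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <utility>

// Toy word equation: uppercase letters are string variables, lowercase
// letters are constant characters. solve() answers whether lhs = rhs has
// a solution, exploring at most `budget` states.
using Eq = std::pair<std::string, std::string>;

// Substitute X := epsilon (drop every occurrence of X).
static std::string erase_var(const std::string& w, char x) {
    std::string out;
    for (char c : w) if (c != x) out += c;
    return out;
}

// Substitute X := s.X (prepend s to every occurrence of X).
static std::string prefix_var(const std::string& w, char x, char s) {
    std::string out;
    for (char c : w) { if (c == x) out += s; out += c; }
    return out;
}

static bool solve(Eq eq, std::set<Eq>& seen, int& budget) {
    std::string& l = eq.first;
    std::string& r = eq.second;
    if (l.empty() && r.empty()) return true;                      // epsilon = epsilon: solved
    if (budget-- <= 0 || !seen.insert(eq).second) return false;   // bound reached or state revisited
    // Keep a variable-headed side (if any) on the left.
    if (!r.empty() && isupper((unsigned char)r[0]) &&
        (l.empty() || !isupper((unsigned char)l[0])))
        std::swap(l, r);
    if (l.empty() || r.empty()) {
        // epsilon = w: solvable iff w consists only of variables,
        // each of which is mapped to epsilon.
        for (char c : (l.empty() ? r : l))
            if (!isupper((unsigned char)c)) return false;
        return true;
    }
    if (l[0] == r[0])                                             // matching heads: strip (left cancellation)
        return solve({l.substr(1), r.substr(1)}, seen, budget);
    if (!isupper((unsigned char)l[0]))
        return false;                                             // two distinct constants: conflict
    char X = l[0], s = r[0];
    // Branch 1: X := epsilon.
    if (solve({erase_var(l, X), erase_var(r, X)}, seen, budget)) return true;
    // Branch 2: X := s.X; afterwards both sides start with s, so strip it.
    if (solve({prefix_var(l, X, s).substr(1), prefix_var(r, X, s).substr(1)},
              seen, budget)) return true;
    if (isupper((unsigned char)s)) {
        // Both heads are variables: symmetric branches for s.
        if (solve({erase_var(l, s), erase_var(r, s)}, seen, budget)) return true;
        if (solve({prefix_var(l, s, X).substr(1), prefix_var(r, s, X).substr(1)},
                  seen, budget)) return true;
    }
    return false;
}
```

For example, `Xa = aX` is satisfiable (take X = ε), while `aX = Xb` is not (the left side always starts with `a` and ends with whatever X ends with, forcing `a = b`). The `budget` parameter stands in for the state limits a production solver would enforce on the Nielsen graph.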
+ +## Reference Architecture + +The ZIPT reference implementation (C#) contains these key components: +- **StringPropagator** — core constraint propagation engine +- **Nielsen Graph/Node/Edge** — word equation solving via Nielsen transformations +- **Constraint system** — StrEq, StrMem (regex membership), StrContains, prefix/suffix, disequalities +- **Modifier chain** — decomposition, equation splitting, Nielsen transforms, regex splitting +- **IntUtils (PDD, Interval)** — Parikh image / integer constraint reasoning for lengths +- **StrManager** — string state tracking, canonicalization + +## Approach + +Build `theory_nseq` as a new theory plugin in `src/smt/`, following the same patterns as `theory_seq`. Reuse existing Z3 infrastructure (seq_decl_plugin, seq_rewriter, seq_skolem, arith_value) and add new algorithm-specific modules. Self-contained rewriter utilities go in `src/ast/rewriter/`. + +--- + +## Module Breakdown + +### Phase 1: Skeleton & Integration + +#### 1.1 theory_nseq skeleton (`src/smt/theory_nseq.h`, `src/smt/theory_nseq.cpp`) +- New class `theory_nseq : public theory` +- Implement all required virtual methods from `smt::theory`: + - `internalize_atom`, `internalize_term` + - `new_eq_eh`, `new_diseq_eh`, `assign_eh` + - `can_propagate`, `propagate` + - `final_check_eh` + - `push_scope_eh`, `pop_scope_eh` + - `mk_fresh`, `get_name` (return `"nseq"`) + - `mk_value`, `init_model`, `finalize_model` + - `display`, `collect_statistics` +- Initially stub all methods (FC_GIVEUP on final_check) +- Use `seq_util` and `arith_util` from existing infrastructure +- Hold references to `seq_rewriter`, `seq::skolem`, `arith_value` + +#### 1.2 Parameter integration +- **`src/params/smt_params_helper.pyg`**: Add `'nseq'` to string_solver options documentation +- **`src/params/smt_params.cpp`**: Add `"nseq"` to `validate_string_solver()` +- **`src/smt/smt_setup.cpp`**: Add `"nseq"` case in `setup_QF_S()` and `setup_seq_str()`, calling new `setup_nseq()` method +- 
**`src/smt/smt_setup.h`**: Declare `setup_nseq()` + +#### 1.3 Build system integration +- **`src/smt/CMakeLists.txt`**: Add `theory_nseq.cpp` and all new `.cpp` files to SOURCES + +### Phase 2: Core Data Structures + +#### 2.1 String representation & state management (`src/smt/nseq_state.h/.cpp`) +- String variable tracking (analogous to ZIPT's StrManager) +- Concatenation decomposition into token sequences +- Canonicalization of string terms +- Variable-to-representative mapping (union-find or solution map) +- Backtrackable state via `scoped_vector` / trail + +#### 2.2 Constraint representation (`src/smt/nseq_constraint.h`) +- Constraint types: StrEq, StrNe, StrMem (regex membership), StrContains, StrPrefix, StrSuffix +- IntEq, IntLe, IntNe for length constraints +- Dependency tracking for conflict explanations +- Constraint simplification results + +#### 2.3 Dependency tracker (`src/smt/nseq_dep.h/.cpp`) +- Track justifications for derived facts +- Integrate with Z3's `scoped_dependency_manager` pattern (same as theory_seq) +- Support linearization for conflict clause generation + +### Phase 3: Word Equation Solver (Nielsen Transformations) + +#### 3.1 Nielsen graph (`src/ast/rewriter/nseq_nielsen.h/.cpp`) +- Nielsen transformation graph: nodes represent equation states, edges represent transformations +- Node: stores a set of simplified word equations +- Edge: transformation applied (variable elimination, constant matching, split) +- Graph exploration with cycle detection +- Self-contained rewriter utility (no dependency on smt context) + +#### 3.2 Nielsen node operations +- **Constant matching**: if both sides start with same constant, strip it +- **Variable elimination**: if LHS starts with variable x, case-split: x=ε or x=a·x' +- **Equation decomposition**: split compound equations +- **Determinism detection**: identify forced assignments +- **Directed Nielsen**: apply transformations in a directed manner + +#### 3.3 Modifier chain (integrated into nielsen 
module) +- DecomposeModifier — decompose strings into concatenation components +- EqSplitModifier — case analysis on equalities +- VarNielsenModifier / ConstNielsenModifier / DirectedNielsenModifier +- StarIntrModifier — Kleene star introduction/elimination +- PowerSplitModifier / PowerEpsilonModifier +- RegexCharSplitModifier / RegexVarSplitModifier + +### Phase 4: Length Constraint Integration (Parikh Image) + +#### 4.1 Length reasoning (`src/ast/rewriter/nseq_length.h/.cpp`) +- Interval arithmetic for length bounds +- Length variable management +- Integration with Z3's arithmetic solver via `arith_value` +- Parikh image constraints: derive integer constraints from string equations +- Self-contained rewriter utility + +#### 4.2 Arithmetic interface in theory_nseq +- Query arithmetic solver for current length assignments +- Propagate length equalities/inequalities derived from string equations +- Use `arith_value` (same pattern as theory_seq) + +### Phase 5: Regex Membership + +#### 5.1 Regex handling (`src/smt/nseq_regex.h/.cpp`) +- Lazy membership checking (per LazyMemberships.pdf) +- Regex derivative computation (reuse `seq_rewriter.mk_derivative`) +- Regex splitting for equation solving +- Integration with Nielsen transformations +- Nullable checking + +### Phase 6: Propagation Engine + +#### 6.1 String propagator (`src/smt/nseq_propagator.h/.cpp`) +- Main propagation loop integrating all components +- Equation simplification pipeline +- Constraint interaction (string eq + length + regex) +- Case splitting decisions +- Conflict detection and clause generation + +#### 6.2 Wire into theory_nseq +- `internalize_term/atom`: register string terms, create theory variables, enqueue axioms +- `assign_eh`: process boolean assignments (e.g., regex memberships, contains) +- `new_eq_eh/new_diseq_eh`: process equality/disequality merges +- `propagate`: run propagation engine +- `final_check_eh`: comprehensive consistency check, case splits + +### Phase 7: Model Generation 
+ +#### 7.1 Model construction +- Build string values from solved equations +- Handle unconstrained variables +- Generate models consistent with length constraints and regex memberships +- Register `seq_factory` for value generation + +--- + +## Testing Plan + +### Unit Tests + +#### T1: Basic smoke test (`src/test/nseq_basic.cpp`) +- Register test in `src/test/main.cpp` via `TST(nseq_basic)` +- Test that theory_nseq can be selected via `smt.string_solver=nseq` +- Verify basic sat/unsat on trivial string equalities + +#### T2: Nielsen transformation tests +- Unit tests for the Nielsen graph/node rewriter module +- Test constant matching, variable elimination, equation decomposition + +#### T3: Length reasoning tests +- Test interval propagation, Parikh image constraint generation + +### Integration Tests (via z3 command line) + +#### T4: z3test regression benchmarks (local at `C:\z3test`) +The z3test repository is available locally at `C:\z3test`. It contains the same regression suite used by nightly CI (`.github/workflows/nightly.yml:198`). + +**Key string/regex benchmarks in `C:\z3test\regressions\smt2`:** +- `string-concat.smt2`, `string-eval.smt2`, `string-itos.smt2`, `string-literals.smt2`, `string-ops.smt2`, `string-simplify.smt2` +- `string3.smt2`, `string6.smt2`, `string7.smt2`, `string9.smt2`, `string11.smt2` through `string14.smt2` +- `re.smt2`, `re_rewriter.smt2` +- `testfoldl.smt2`, `testmap.smt2`, `testmapi.smt2` (sequence higher-order) +- `byte_string.smt2` (disabled) + +**Testing approach:** +1. Run full regression suite: `python C:\z3test\scripts\test_benchmarks.py C:\z3test\regressions\smt2` + - The test harness runs each `.smt2` file with `model_validate=true` and compares output to `.expected.out` + - Script supports parallel execution (uses all CPU cores by default) +2. 
Run string-specific subset with `smt.string_solver=nseq`: + - For each `string*.smt2` and `re*.smt2` in `C:\z3test\regressions\smt2`, run: `z3 smt.string_solver=nseq <file>` + - Compare output against `.expected.out` files +3. Also test `C:\z3test\regressions\issues` (8 issue-specific regression tests) + +**Other regression directories to verify (non-string, but ensure no regressions):** +- `C:\z3test\regressions\smt2-extra` +- `C:\z3test\regressions\smt2-debug` + +#### T5: Targeted string SMT2 tests +Create focused test files exercising specific features: +- `tests/nseq/eq_basic.smt2` — simple word equations (x·a = a·x) +- `tests/nseq/eq_nielsen.smt2` — equations requiring Nielsen transformations +- `tests/nseq/concat.smt2` — concatenation reasoning +- `tests/nseq/contains.smt2` — str.contains, str.prefixof, str.suffixof +- `tests/nseq/length.smt2` — length constraints and Parikh image +- `tests/nseq/regex.smt2` — regex membership +- `tests/nseq/mixed.smt2` — combinations of string ops with arithmetic +- `tests/nseq/unsat.smt2` — unsatisfiable benchmarks + +#### T6: Comparison testing with parallel runner +- Use `~/dev/test_scripts/src/run_in_parallel.py` for parallel testing +- Run the same SMT files with both `smt.string_solver=seq` and `smt.string_solver=nseq` +- Verify no soundness bugs (sat/unsat agree) + +### Performance Testing + +#### T7: Performance benchmarks (release builds only) +- Use SMT-COMP string benchmarks (QF_S, QF_SLIA tracks) +- Compare solve times between theory_seq and theory_nseq +- Use `z3 -st` for statistics +- **Only on release builds** (never debug builds) + +--- + +## File Summary + +### New files to create: +| File | Description | +|------|-------------| +| `src/smt/theory_nseq.h` | Theory class header | +| `src/smt/theory_nseq.cpp` | Theory class implementation | +| `src/smt/nseq_state.h` | String state management header | +| `src/smt/nseq_state.cpp` | String state management impl | +| `src/smt/nseq_constraint.h` | Constraint type 
definitions | +| `src/smt/nseq_dep.h` | Dependency tracking header | +| `src/smt/nseq_dep.cpp` | Dependency tracking impl | +| `src/smt/nseq_propagator.h` | Propagation engine header | +| `src/smt/nseq_propagator.cpp` | Propagation engine impl | +| `src/smt/nseq_regex.h` | Regex handling header | +| `src/smt/nseq_regex.cpp` | Regex handling impl | +| `src/ast/rewriter/nseq_nielsen.h` | Nielsen transformation header | +| `src/ast/rewriter/nseq_nielsen.cpp` | Nielsen transformation impl | +| `src/ast/rewriter/nseq_length.h` | Length/Parikh reasoning header | +| `src/ast/rewriter/nseq_length.cpp` | Length/Parikh reasoning impl | +| `src/test/nseq_basic.cpp` | Unit tests | + +### Existing files to modify: +| File | Change | +|------|--------| +| `src/smt/smt_setup.h` | Declare `setup_nseq()` | +| `src/smt/smt_setup.cpp` | Add `"nseq"` to string_solver dispatch, implement `setup_nseq()` | +| `src/smt/CMakeLists.txt` | Add new .cpp files | +| `src/ast/rewriter/CMakeLists.txt` | Add new rewriter .cpp files | +| `src/params/smt_params_helper.pyg` | Document `nseq` option | +| `src/params/smt_params.cpp` | Add `"nseq"` to validation | +| `src/test/main.cpp` | Register `TST(nseq_basic)` | +| `src/test/CMakeLists.txt` | Add `nseq_basic.cpp` (if needed) | + +--- + +## Implementation Order + +1. **Phase 1** (Skeleton & Integration) — get a compilable stub that Z3 can load +2. **Phase 2** (Core Data Structures) — state management and constraint representation +3. **Phase 3** (Nielsen Transformations) — core word equation solving +4. **Phase 4** (Length/Parikh) — arithmetic integration +5. **Phase 5** (Regex) — regex membership +6. **Phase 6** (Propagation Engine) — wire everything together +7. **Phase 7** (Model Generation) — produce satisfying assignments +8. 
**Testing** — runs in parallel with each phase; benchmarks after Phase 6 + +## Notes + +- The ZIPT C# reference at `github.com/CEisenhofer/ZIPT` (parikh branch) is the primary algorithm reference +- LazyMemberships.pdf describes the lazy regex membership approach +- ClemensTableau.pdf describes additional algorithms +- Reuse Z3 infrastructure wherever possible: `seq_util`, `seq_rewriter`, `seq_skolem`, `arith_value`, `seq_factory` +- Self-contained utilities (Nielsen graph, length reasoning) go in `src/ast/rewriter/` per the spec +- All timing-sensitive builds require 30+ minute timeouts diff --git a/spec/reference.md b/spec/reference.md new file mode 100644 index 000000000..6a6238917 --- /dev/null +++ b/spec/reference.md @@ -0,0 +1,8 @@ +A C# implementation of the procedure is here: https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT + +A description of the algorithm is in LazyMemberships.pdf + +Other algorithms are described in ClemensTableau.pdf + + + diff --git a/src/smt/seq/seq_nielsen.cpp b/src/smt/seq/seq_nielsen.cpp index 0df729467..a747debf4 100644 --- a/src/smt/seq/seq_nielsen.cpp +++ b/src/smt/seq/seq_nielsen.cpp @@ -762,11 +762,109 @@ namespace seq { return r; } - // Helper: render an snode as an HTML label for DOT output. 
+ static std::string regex_expr_html(expr* e, ast_manager& m, seq_util& seq) { + if (!e) return "null"; + expr* a = nullptr, * b = nullptr; + + if (seq.re.is_to_re(e, a)) { + zstring s; + if (seq.str.is_string(a, s)) { + return "\"" + dot_html_escape(s.encode()) + "\""; + } + std::ostringstream os; + os << mk_pp(a, m); + return dot_html_escape(os.str()); + } + if (seq.re.is_concat(e)) { + app* ap = to_app(e); + std::string res; + if (ap->get_num_args() == 0) return "()"; + for (unsigned i = 0; i < ap->get_num_args(); ++i) { + if (i > 0) res += " "; + bool needs_parens = seq.re.is_union(ap->get_arg(i)); + if (needs_parens) res += "("; + res += regex_expr_html(ap->get_arg(i), m, seq); + if (needs_parens) res += ")"; + } + return res; + } + if (seq.re.is_union(e)) { + app* ap = to_app(e); + std::string res; + if (ap->get_num_args() == 0) return "∅"; + for (unsigned i = 0; i < ap->get_num_args(); ++i) { + if (i > 0) res += " | "; + res += regex_expr_html(ap->get_arg(i), m, seq); + } + return res; + } + if (seq.re.is_intersection(e)) { + app* ap = to_app(e); + std::string res; + for (unsigned i = 0; i < ap->get_num_args(); ++i) { + if (i > 0) res += " & "; + bool needs_parens = seq.re.is_union(ap->get_arg(i)) || seq.re.is_concat(ap->get_arg(i)); + if (needs_parens) res += "("; + res += regex_expr_html(ap->get_arg(i), m, seq); + if (needs_parens) res += ")"; + } + return res; + } + if (seq.re.is_star(e, a)) { + bool needs_parens = seq.re.is_union(a) || seq.re.is_concat(a) || seq.re.is_intersection(a); + std::string res = needs_parens ? "(" : ""; + res += regex_expr_html(a, m, seq); + res += needs_parens ? ")*" : "*"; + return res; + } + if (seq.re.is_plus(e, a)) { + bool needs_parens = seq.re.is_union(a) || seq.re.is_concat(a) || seq.re.is_intersection(a); + std::string res = needs_parens ? "(" : ""; + res += regex_expr_html(a, m, seq); + res += needs_parens ? 
")+" : "+"; + return res; + } + if (seq.re.is_opt(e, a)) { + bool needs_parens = seq.re.is_union(a) || seq.re.is_concat(a) || seq.re.is_intersection(a); + std::string res = needs_parens ? "(" : ""; + res += regex_expr_html(a, m, seq); + res += needs_parens ? ")?" : "?"; + return res; + } + if (seq.re.is_complement(e, a)) { + bool needs_parens = seq.re.is_union(a) || seq.re.is_concat(a) || seq.re.is_intersection(a); + std::string res = "~"; + res += needs_parens ? "(" : ""; + res += regex_expr_html(a, m, seq); + res += needs_parens ? ")" : ""; + return res; + } + if (seq.re.is_range(e, a, b)) { + zstring s1, s2; + std::string c1 = seq.str.is_string(a, s1) ? dot_html_escape(s1.encode()) : arith_expr_html(a, m); + std::string c2 = seq.str.is_string(b, s2) ? dot_html_escape(s2.encode()) : arith_expr_html(b, m); + return "[" + c1 + "-" + c2 + "]"; + } + if (seq.re.is_full_char(e)) { + return "Σ"; // Sigma + } + if (seq.re.is_full_seq(e)) { + return "Σ*"; // Sigma* + } + if (seq.re.is_empty(e)) { + return "∅"; // empty set + } + + std::ostringstream os; + os << mk_pp(e, m); + return dot_html_escape(os.str()); + } + + // Helper: render a snode as an HTML label for DOT output. // Groups consecutive s_char tokens into quoted strings, renders s_var by name, // shows s_power with superscripts, s_unit by its inner expression, // and falls back to mk_pp (HTML-escaped) for other token kinds. 
- static std::string snode_label_html(euf::snode const* n, ast_manager& m) { + std::string snode_label_html(euf::snode const* n, ast_manager& m) { if (!n) return "null"; seq_util seq(m); @@ -840,6 +938,8 @@ namespace seq { expr* exp_expr = to_app(e)->get_arg(1); result += arith_expr_html(exp_expr, m); result += ""; + } else if (e && seq.is_re(e)) { + result += regex_expr_html(e, m, seq); } else { std::ostringstream os; os << mk_pp(e, m); @@ -3397,9 +3497,6 @@ namespace seq { // ----------------------------------------------------------------------- bool nielsen_graph::apply_gpower_intr(nielsen_node* node) { - ast_manager& m = m_sg.get_manager(); - arith_util arith(m); - for (str_eq const& eq : node->str_eqs()) { if (eq.is_trivial()) continue; if (!eq.m_lhs || !eq.m_rhs) continue; @@ -3468,11 +3565,12 @@ namespace seq { if (n % p != 0) continue; bool match = true; for (unsigned i = p; i < n && match; ++i) - match = (ground_prefix_orig[i]->id() == ground_prefix_orig[i % p]->id()); + match = ground_prefix_orig[i]->id() == ground_prefix_orig[i % p]->id(); if (match) { period = p; break; } } - for (unsigned i = 0; i < period; ++i) + for (unsigned i = 0; i < period; ++i) { ground_prefix.push_back(ground_prefix_orig[i]); + } // If the compressed prefix is a single power snode, unwrap it to use // its base tokens, avoiding nested powers. 
diff --git a/src/smt/seq/seq_nielsen.h b/src/smt/seq/seq_nielsen.h index e5070524f..12092e772 100644 --- a/src/smt/seq/seq_nielsen.h +++ b/src/smt/seq/seq_nielsen.h @@ -494,6 +494,8 @@ namespace seq { } }; + std::string snode_label_html(euf::snode const* n, ast_manager& m); + // node in the Nielsen graph // mirrors ZIPT's NielsenNode class nielsen_node { diff --git a/src/smt/seq/seq_regex.cpp b/src/smt/seq/seq_regex.cpp index 252991e6a..abeb2b38f 100644 --- a/src/smt/seq/seq_regex.cpp +++ b/src/smt/seq/seq_regex.cpp @@ -519,135 +519,25 @@ namespace seq { if (regexes.empty()) return l_false; // empty intersection = full language (vacuously non-empty) - // Quick checks: if any regex is fail/empty, intersection is empty - for (euf::snode* re : regexes) { - if (!re || !re->get_expr()) - return l_undef; - if (re->is_fail() || is_empty_regex(re)) - return l_true; - } - - // Check if all are nullable (intersection accepts ε) - bool all_nullable = true; - for (euf::snode* re : regexes) { - if (!re->is_nullable()) { all_nullable = false; break; } - } - if (all_nullable) - return l_false; - // Single regex: delegate to is_empty_bfs if (regexes.size() == 1) return is_empty_bfs(regexes[0], max_states); - // Build product BFS. State = tuple of regex snode ids. - // Use a map from state hash to visited set. 
- using state_t = svector<unsigned>; + seq_util& seq = m_sg.get_seq_util(); + ast_manager& mgr = m_sg.get_manager(); - auto state_hash = [](state_t const& s) -> unsigned { - unsigned h = 0; - for (unsigned id : s) - h = h * 31 + id; - return h; - }; - - auto state_eq = [](state_t const& a, state_t const& b) -> bool { - if (a.size() != b.size()) return false; - for (unsigned i = 0; i < a.size(); ++i) - if (a[i] != b[i]) return false; - return true; - }; - - // Use simple set via sorted vector of hashes (good enough for bounded BFS) - std::unordered_set<unsigned> visited_hashes; - - struct bfs_state { - ptr_vector<euf::snode> regexes; - }; - - std::vector<bfs_state> worklist; - bfs_state initial; - initial.regexes.append(regexes); - worklist.push_back(std::move(initial)); - - state_t init_ids; - for (euf::snode* re : regexes) - init_ids.push_back(re->id()); - visited_hashes.insert(state_hash(init_ids)); - - unsigned states_explored = 0; - bool had_failed = false; - - // Collect alphabet representatives from the intersection of all regexes - // (merge boundaries from all) - unsigned_vector all_bounds; - all_bounds.push_back(0); - for (euf::snode* re : regexes) - collect_char_boundaries(re, all_bounds); - std::sort(all_bounds.begin(), all_bounds.end()); - - euf::snode_vector reps; - unsigned prev = UINT_MAX; - for (unsigned b : all_bounds) { - if (b != prev) { - reps.push_back(m_sg.mk_char(b)); - prev = b; - } - } - if (reps.empty()) - reps.push_back(m_sg.mk_char('a')); - - while (!worklist.empty()) { - if (states_explored >= max_states) + euf::snode* result = regexes[0]; + for (unsigned i = 1; i < regexes.size(); ++i) { + expr* r1 = result->get_expr(); + expr* r2 = regexes[i]->get_expr(); + if (!r1 || !r2) return l_undef; + expr_ref inter(seq.re.mk_inter(r1, r2), mgr); + result = m_sg.mk(inter); + if (!result) return l_undef; - - bfs_state current = std::move(worklist.back()); - worklist.pop_back(); - ++states_explored; - - for (euf::snode* ch : reps) { - ptr_vector<euf::snode> derivs; - bool any_fail = false; - bool 
all_null = true; - bool deriv_failed = false; - - for (euf::snode* re : current.regexes) { - euf::snode* d = m_sg.brzozowski_deriv(re, ch); - if (!d) { deriv_failed = true; break; } - if (d->is_fail()) { any_fail = true; break; } - if (!d->is_nullable()) all_null = false; - derivs.push_back(d); - } - - if (deriv_failed) { had_failed = true; continue; } - if (any_fail) continue; // this character leads to empty intersection - - if (all_null) - return l_false; // found an accepting state in the product - - // Check if any component is structurally empty - bool any_empty = false; - for (euf::snode* d : derivs) { - if (is_empty_regex(d)) { any_empty = true; break; } - } - if (any_empty) continue; - - // Compute state hash and check visited - state_t ids; - for (euf::snode* d : derivs) - ids.push_back(d->id()); - unsigned h = state_hash(ids); - if (visited_hashes.count(h) == 0) { - visited_hashes.insert(h); - bfs_state next; - next.regexes.append(derivs); - worklist.push_back(std::move(next)); - } - } } - if (had_failed) - return l_undef; - return l_true; // exhausted all states, intersection is empty + return is_empty_bfs(result, max_states); } // ----------------------------------------------------------------------- diff --git a/src/smt/theory_nseq.cpp b/src/smt/theory_nseq.cpp index a299b13b2..8b64f811b 100644 --- a/src/smt/theory_nseq.cpp +++ b/src/smt/theory_nseq.cpp @@ -425,7 +425,12 @@ namespace smt { // here the actual Nielsen solving happens auto result = m_nielsen.solve(); - +#ifdef Z3DEBUG + // Examining the Nielsen graph is probably the best way of debugging + std::string dot = m_nielsen.to_dot(); + IF_VERBOSE(1, verbose_stream() << dot << "\n";); +#endif + if (result == seq::nielsen_graph::search_result::unsat) { IF_VERBOSE(1, verbose_stream() << "nseq final_check: solve UNSAT\n";); explain_nielsen_conflict(); @@ -464,8 +469,8 @@ namespace smt { if (mem_idx < m_nielsen_to_state_mem.size()) { unsigned state_mem_idx = m_nielsen_to_state_mem[mem_idx]; 
mem_source const& src = m_state.get_mem_source(state_mem_idx); - if (ctx.get_assignment(src.m_lit) == l_true) - lits.push_back(src.m_lit); + SASSERT(ctx.get_assignment(src.m_lit) == l_true); + lits.push_back(src.m_lit); } } } @@ -823,9 +828,8 @@ namespace smt { auto& vec = var_to_mems.insert_if_not_there(mem.m_str->id(), unsigned_vector()); vec.push_back(i); } - else { + else all_var_str = false; - } } if (var_to_mems.empty()) @@ -855,19 +859,19 @@ namespace smt { // jointly unsatisfiable. Assert a conflict from all their literals. enode_pair_vector eqs; literal_vector lits; for (unsigned i : mem_indices) { mem_source const& src = m_state.get_mem_source(i); - if (ctx.get_assignment(src.m_lit) == l_true) - lits.push_back(src.m_lit); + SASSERT(ctx.get_assignment(src.m_lit) == l_true); // we already stored the polarity of the literal + lits.push_back(src.m_lit); + TRACE(seq, tout << "conflict lit: " << mk_pp(ctx.literal2expr(src.m_lit), m) << "\n";); } TRACE(seq, tout << "nseq regex precheck: empty intersection for var " << var_id << ", conflict with " << lits.size() << " lits\n";); set_conflict(eqs, lits); return l_true; // conflict asserted } - else if (result == l_undef) { + if (result == l_undef) any_undef = true; - } // l_false = non-empty intersection, this variable's constraints are satisfiable }
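The seq_regex.cpp change above reroutes multi-way intersection emptiness through `re.mk_inter` plus the derivative-based `is_empty_bfs`. As a reminder of the derivative operation underpinning that path (and the lazy-membership approach generally), here is a minimal standalone model of Brzozowski derivatives; the toy AST (`Re`, `mk`, `nullable`, `deriv`, `matches`) is invented for illustration and is not Z3's representation:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy regex AST. kind: 0 = empty set, 'e' = epsilon, 'c' = single char,
// '.' = concat, '|' = union, '*' = Kleene star.
struct Re;
using P = std::shared_ptr<Re>;
struct Re { char kind; char c = 0; P a, b; };
static P mk(char k, char c = 0, P a = nullptr, P b = nullptr) {
    return std::make_shared<Re>(Re{k, c, a, b});
}

// nullable(r): does L(r) contain the empty word?
static bool nullable(const P& r) {
    switch (r->kind) {
        case 'e': case '*': return true;
        case '.': return nullable(r->a) && nullable(r->b);
        case '|': return nullable(r->a) || nullable(r->b);
        default:  return false;              // empty set or single char
    }
}

// deriv(r, x): the language { w | x.w is in L(r) }.
static P deriv(const P& r, char x) {
    switch (r->kind) {
        case 'c': return r->c == x ? mk('e') : mk(0);
        case '.': {
            P d = mk('.', 0, deriv(r->a, x), r->b);
            // If the left factor accepts epsilon, x may also start in the right factor.
            return nullable(r->a) ? mk('|', 0, d, deriv(r->b, x)) : d;
        }
        case '|': return mk('|', 0, deriv(r->a, x), deriv(r->b, x));
        case '*': return mk('.', 0, deriv(r->a, x), r);
        default:  return mk(0);              // epsilon or empty set
    }
}

// Membership by repeated derivation: w in L(r) iff nullable(d_w(r)).
static bool matches(P r, const std::string& s) {
    for (char x : s) r = deriv(r, x);
    return nullable(r);
}
```

An emptiness check like `is_empty_bfs` builds on the same operation: it explores the set of derivatives reachable from a regex and reports non-empty as soon as a nullable one is found (with simplification to keep the state space finite, which this sketch does not attempt).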