Regex intersection bug fixe

2026-05-10 12:22:25 +00:00 · 2026-03-16 16:30:20 +01:00 · 2026-03-16 16:30:20 +01:00 · 16f693b09a
commit 16f693b09a
parent 256f1bdf1a
7 changed files with 420 additions and 137 deletions
--- a/spec/plan.md
+++ b/spec/plan.md
@ -0,0 +1,12 @@
+Your task is to implement the theory solver for strings that is described in the resources under reference.md
+
+The theory solver should be implemented as a theory solver in src/smt. Call it theory\_nseq. 
+
+Add a parameter setting to control whether using the default string solver theory\_seq or theory\_nseq or the empty string solver. 
+
+Utilities that can be made self-contained go in the directory ast/rewriter.
+
+
+
+Start out by creating a plan for this implementation.
+
--- a/spec/plan1.md
+++ b/spec/plan1.md
@ -0,0 +1,265 @@
+# Implementation Plan: theory_nseq — New String Theory Solver for Z3
+
+## Problem Statement
+
+Implement a new theory solver `theory_nseq` for strings/sequences in Z3's SMT solver framework (`src/smt/`). The solver is based on the algorithms described in **LazyMemberships.pdf** and **ClemensTableau.pdf**, with a reference C# implementation at [ZIPT](https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT). The new solver should coexist with the existing `theory_seq` and be selectable via the `smt.string_solver` parameter.
+
+## Reference Architecture
+
+The ZIPT reference implementation (C#) contains these key components:
+- **StringPropagator** — core constraint propagation engine
+- **Nielsen Graph/Node/Edge** — word equation solving via Nielsen transformations
+- **Constraint system** — StrEq, StrMem (regex membership), StrContains, prefix/suffix, disequalities
+- **Modifier chain** — decomposition, equation splitting, Nielsen transforms, regex splitting
+- **IntUtils (PDD, Interval)** — Parikh image / integer constraint reasoning for lengths
+- **StrManager** — string state tracking, canonicalization
+
+## Approach
+
+Build `theory_nseq` as a new theory plugin in `src/smt/`, following the same patterns as `theory_seq`. Reuse existing Z3 infrastructure (seq_decl_plugin, seq_rewriter, seq_skolem, arith_value) and add new algorithm-specific modules. Self-contained rewriter utilities go in `src/ast/rewriter/`.
+
+---
+
+## Module Breakdown
+
+### Phase 1: Skeleton & Integration
+
+#### 1.1 theory_nseq skeleton (`src/smt/theory_nseq.h`, `src/smt/theory_nseq.cpp`)
+- New class `theory_nseq : public theory`
+- Implement all required virtual methods from `smt::theory`:
+  - `internalize_atom`, `internalize_term`
+  - `new_eq_eh`, `new_diseq_eh`, `assign_eh`
+  - `can_propagate`, `propagate`
+  - `final_check_eh`
+  - `push_scope_eh`, `pop_scope_eh`
+  - `mk_fresh`, `get_name` (return `"nseq"`)
+  - `mk_value`, `init_model`, `finalize_model`
+  - `display`, `collect_statistics`
+- Initially stub all methods (FC_GIVEUP on final_check)
+- Use `seq_util` and `arith_util` from existing infrastructure
+- Hold references to `seq_rewriter`, `seq::skolem`, `arith_value`
+
+#### 1.2 Parameter integration
+- **`src/params/smt_params_helper.pyg`**: Add `'nseq'` to string_solver options documentation
+- **`src/params/smt_params.cpp`**: Add `"nseq"` to `validate_string_solver()`
+- **`src/smt/smt_setup.cpp`**: Add `"nseq"` case in `setup_QF_S()` and `setup_seq_str()`, calling new `setup_nseq()` method
+- **`src/smt/smt_setup.h`**: Declare `setup_nseq()`
+
+#### 1.3 Build system integration
+- **`src/smt/CMakeLists.txt`**: Add `theory_nseq.cpp` and all new `.cpp` files to SOURCES
+
+### Phase 2: Core Data Structures
+
+#### 2.1 String representation & state management (`src/smt/nseq_state.h/.cpp`)
+- String variable tracking (analogous to ZIPT's StrManager)
+- Concatenation decomposition into token sequences
+- Canonicalization of string terms
+- Variable-to-representative mapping (union-find or solution map)
+- Backtrackable state via `scoped_vector` / trail
+
+#### 2.2 Constraint representation (`src/smt/nseq_constraint.h`)
+- Constraint types: StrEq, StrNe, StrMem (regex membership), StrContains, StrPrefix, StrSuffix
+- IntEq, IntLe, IntNe for length constraints
+- Dependency tracking for conflict explanations
+- Constraint simplification results
+
+#### 2.3 Dependency tracker (`src/smt/nseq_dep.h/.cpp`)
+- Track justifications for derived facts
+- Integrate with Z3's `scoped_dependency_manager<assumption>` pattern (same as theory_seq)
+- Support linearization for conflict clause generation
+
+### Phase 3: Word Equation Solver (Nielsen Transformations)
+
+#### 3.1 Nielsen graph (`src/ast/rewriter/nseq_nielsen.h/.cpp`)
+- Nielsen transformation graph: nodes represent equation states, edges represent transformations
+- Node: stores a set of simplified word equations
+- Edge: transformation applied (variable elimination, constant matching, split)
+- Graph exploration with cycle detection
+- Self-contained rewriter utility (no dependency on smt context)
+
+#### 3.2 Nielsen node operations
+- **Constant matching**: if both sides start with same constant, strip it
+- **Variable elimination**: if LHS starts with variable x, case-split: x=ε or x=a·x' 
+- **Equation decomposition**: split compound equations
+- **Determinism detection**: identify forced assignments
+- **Directed Nielsen**: apply transformations in a directed manner
+
+#### 3.3 Modifier chain (integrated into nielsen module)
+- DecomposeModifier — decompose strings into concatenation components
+- EqSplitModifier — case analysis on equalities
+- VarNielsenModifier / ConstNielsenModifier / DirectedNielsenModifier
+- StarIntrModifier — Kleene star introduction/elimination
+- PowerSplitModifier / PowerEpsilonModifier
+- RegexCharSplitModifier / RegexVarSplitModifier
+
+### Phase 4: Length Constraint Integration (Parikh Image)
+
+#### 4.1 Length reasoning (`src/ast/rewriter/nseq_length.h/.cpp`)
+- Interval arithmetic for length bounds
+- Length variable management
+- Integration with Z3's arithmetic solver via `arith_value`
+- Parikh image constraints: derive integer constraints from string equations
+- Self-contained rewriter utility
+
+#### 4.2 Arithmetic interface in theory_nseq
+- Query arithmetic solver for current length assignments
+- Propagate length equalities/inequalities derived from string equations
+- Use `arith_value` (same pattern as theory_seq)
+
+### Phase 5: Regex Membership
+
+#### 5.1 Regex handling (`src/smt/nseq_regex.h/.cpp`)
+- Lazy membership checking (per LazyMemberships.pdf)
+- Regex derivative computation (reuse `seq_rewriter.mk_derivative`)
+- Regex splitting for equation solving
+- Integration with Nielsen transformations
+- Nullable checking
+
+### Phase 6: Propagation Engine
+
+#### 6.1 String propagator (`src/smt/nseq_propagator.h/.cpp`)
+- Main propagation loop integrating all components
+- Equation simplification pipeline
+- Constraint interaction (string eq + length + regex)
+- Case splitting decisions
+- Conflict detection and clause generation
+
+#### 6.2 Wire into theory_nseq
+- `internalize_term/atom`: register string terms, create theory variables, enqueue axioms
+- `assign_eh`: process boolean assignments (e.g., regex memberships, contains)
+- `new_eq_eh/new_diseq_eh`: process equality/disequality merges
+- `propagate`: run propagation engine
+- `final_check_eh`: comprehensive consistency check, case splits
+
+### Phase 7: Model Generation
+
+#### 7.1 Model construction
+- Build string values from solved equations
+- Handle unconstrained variables
+- Generate models consistent with length constraints and regex memberships
+- Register `seq_factory` for value generation
+
+---
+
+## Testing Plan
+
+### Unit Tests
+
+#### T1: Basic smoke test (`src/test/nseq_basic.cpp`)
+- Register test in `src/test/main.cpp` via `TST(nseq_basic)`
+- Test that theory_nseq can be selected via `smt.string_solver=nseq`
+- Verify basic sat/unsat on trivial string equalities
+
+#### T2: Nielsen transformation tests
+- Unit tests for the Nielsen graph/node rewriter module
+- Test constant matching, variable elimination, equation decomposition
+
+#### T3: Length reasoning tests
+- Test interval propagation, Parikh image constraint generation
+
+### Integration Tests (via z3 command line)
+
+#### T4: z3test regression benchmarks (local at `C:\z3test`)
+The z3test repository is available locally at `C:\z3test`. It contains the same regression suite used by nightly CI (`.github/workflows/nightly.yml:198`).
+
+**Key string/regex benchmarks in `C:\z3test\regressions\smt2`:**
+- `string-concat.smt2`, `string-eval.smt2`, `string-itos.smt2`, `string-literals.smt2`, `string-ops.smt2`, `string-simplify.smt2`
+- `string3.smt2`, `string6.smt2`, `string7.smt2`, `string9.smt2`, `string11.smt2` through `string14.smt2`
+- `re.smt2`, `re_rewriter.smt2`
+- `testfoldl.smt2`, `testmap.smt2`, `testmapi.smt2` (sequence higher-order)
+- `byte_string.smt2` (disabled)
+
+**Testing approach:**
+1. Run full regression suite: `python C:\z3test\scripts\test_benchmarks.py <z3exe> C:\z3test\regressions\smt2`
+   - The test harness runs each `.smt2` file with `model_validate=true` and compares output to `.expected.out`
+   - Script supports parallel execution (uses all CPU cores by default)
+2. Run string-specific subset with `smt.string_solver=nseq`:
+   - For each `string*.smt2` and `re*.smt2` in `C:\z3test\regressions\smt2`, run: `z3 smt.string_solver=nseq <file>`
+   - Compare output against `.expected.out` files
+3. Also test `C:\z3test\regressions\issues` (8 issue-specific regression tests)
+
+**Other regression directories to verify (non-string, but ensure no regressions):**
+- `C:\z3test\regressions\smt2-extra`
+- `C:\z3test\regressions\smt2-debug`
+
+#### T5: Targeted string SMT2 tests
+Create focused test files exercising specific features:
+- `tests/nseq/eq_basic.smt2` — simple word equations (x·a = a·x)
+- `tests/nseq/eq_nielsen.smt2` — equations requiring Nielsen transformations
+- `tests/nseq/concat.smt2` — concatenation reasoning
+- `tests/nseq/contains.smt2` — str.contains, str.prefixof, str.suffixof
+- `tests/nseq/length.smt2` — length constraints and Parikh image
+- `tests/nseq/regex.smt2` — regex membership
+- `tests/nseq/mixed.smt2` — combinations of string ops with arithmetic
+- `tests/nseq/unsat.smt2` — unsatisfiable benchmarks
+
+#### T6: Comparison testing with parallel runner
+- Use `~/dev/test_scripts/src/run_in_parallel.py` for parallel testing
+- Run the same SMT files with both `smt.string_solver=seq` and `smt.string_solver=nseq`
+- Verify no soundness bugs (sat/unsat agree)
+
+### Performance Testing
+
+#### T7: Performance benchmarks (release builds only)
+- Use SMT-COMP string benchmarks (QF_S, QF_SLIA tracks)
+- Compare solve times between theory_seq and theory_nseq
+- Use `z3 -st` for statistics
+- **Only on release builds** (never debug builds)
+
+---
+
+## File Summary
+
+### New files to create:
+| File | Description |
+|------|-------------|
+| `src/smt/theory_nseq.h` | Theory class header |
+| `src/smt/theory_nseq.cpp` | Theory class implementation |
+| `src/smt/nseq_state.h` | String state management header |
+| `src/smt/nseq_state.cpp` | String state management impl |
+| `src/smt/nseq_constraint.h` | Constraint type definitions |
+| `src/smt/nseq_dep.h` | Dependency tracking header |
+| `src/smt/nseq_dep.cpp` | Dependency tracking impl |
+| `src/smt/nseq_propagator.h` | Propagation engine header |
+| `src/smt/nseq_propagator.cpp` | Propagation engine impl |
+| `src/smt/nseq_regex.h` | Regex handling header |
+| `src/smt/nseq_regex.cpp` | Regex handling impl |
+| `src/ast/rewriter/nseq_nielsen.h` | Nielsen transformation header |
+| `src/ast/rewriter/nseq_nielsen.cpp` | Nielsen transformation impl |
+| `src/ast/rewriter/nseq_length.h` | Length/Parikh reasoning header |
+| `src/ast/rewriter/nseq_length.cpp` | Length/Parikh reasoning impl |
+| `src/test/nseq_basic.cpp` | Unit tests |
+
+### Existing files to modify:
+| File | Change |
+|------|--------|
+| `src/smt/smt_setup.h` | Declare `setup_nseq()` |
+| `src/smt/smt_setup.cpp` | Add `"nseq"` to string_solver dispatch, implement `setup_nseq()` |
+| `src/smt/CMakeLists.txt` | Add new .cpp files |
+| `src/ast/rewriter/CMakeLists.txt` | Add new rewriter .cpp files |
+| `src/params/smt_params_helper.pyg` | Document `nseq` option |
+| `src/params/smt_params.cpp` | Add `"nseq"` to validation |
+| `src/test/main.cpp` | Register `TST(nseq_basic)` |
+| `src/test/CMakeLists.txt` | Add `nseq_basic.cpp` (if needed) |
+
+---
+
+## Implementation Order
+
+1. **Phase 1** (Skeleton & Integration) — get a compilable stub that Z3 can load
+2. **Phase 2** (Core Data Structures) — state management and constraint representation
+3. **Phase 3** (Nielsen Transformations) — core word equation solving
+4. **Phase 4** (Length/Parikh) — arithmetic integration
+5. **Phase 5** (Regex) — regex membership
+6. **Phase 6** (Propagation Engine) — wire everything together
+7. **Phase 7** (Model Generation) — produce satisfying assignments
+8. **Testing** — runs in parallel with each phase; benchmarks after Phase 6
+
+## Notes
+
+- The ZIPT C# reference at `github.com/CEisenhofer/ZIPT` (parikh branch) is the primary algorithm reference
+- LazyMemberships.pdf describes the lazy regex membership approach
+- ClemensTableau.pdf describes additional algorithms
+- Reuse Z3 infrastructure wherever possible: `seq_util`, `seq_rewriter`, `seq_skolem`, `arith_value`, `seq_factory`
+- Self-contained utilities (Nielsen graph, length reasoning) go in `src/ast/rewriter/` per the spec
+- All timing-sensitive builds require 30+ minute timeouts
--- a/spec/reference.md
+++ b/spec/reference.md
@ -0,0 +1,8 @@
+A C# implementation of the procedure is here: https://github.com/CEisenhofer/ZIPT/tree/parikh/ZIPT
+
+A description of the algorithm is in LazyMemberships.pdf
+
+Other algorithms are descried in ClemensTableau.pdf
+
+
+
--- a/src/smt/seq/seq_nielsen.cpp
+++ b/src/smt/seq/seq_nielsen.cpp
@ -762,11 +762,109 @@ namespace seq {
        return r;
    }

-    // Helper: render an snode as an HTML label for DOT output.
+    static std::string regex_expr_html(expr* e, ast_manager& m, seq_util& seq) {
+        if (!e) return "null";
+        expr* a = nullptr, * b = nullptr;
+
+        if (seq.re.is_to_re(e, a)) {
+            zstring s;
+            if (seq.str.is_string(a, s)) {
+                return "\"" + dot_html_escape(s.encode()) + "\"";
+            }
+            std::ostringstream os;
+            os << mk_pp(a, m);
+            return dot_html_escape(os.str());
+        }
+        if (seq.re.is_concat(e)) {
+            app* ap = to_app(e);
+            std::string res;
+            if (ap->get_num_args() == 0) return "()";
+            for (unsigned i = 0; i < ap->get_num_args(); ++i) {
+                if (i > 0) res += " ";
+                bool needs_parens = seq.re.is_union(ap->get_arg(i));
+                if (needs_parens) res += "(";
+                res += regex_expr_html(ap->get_arg(i), m, seq);
+                if (needs_parens) res += ")";
+            }
+            return res;
+        }
+        if (seq.re.is_union(e)) {
+            app* ap = to_app(e);
+            std::string res;
+            if (ap->get_num_args() == 0) return "&#8709;";
+            for (unsigned i = 0; i < ap->get_num_args(); ++i) {
+                if (i > 0) res += " | ";
+                res += regex_expr_html(ap->get_arg(i), m, seq);
+            }
+            return res;
+        }
+        if (seq.re.is_intersection(e)) {
+            app* ap = to_app(e);
+            std::string res;
+            for (unsigned i = 0; i < ap->get_num_args(); ++i) {
+                if (i > 0) res += " &amp; ";
+                bool needs_parens = seq.re.is_union(ap->get_arg(i)) || seq.re.is_concat(ap->get_arg(i));
+                if (needs_parens) res += "(";
+                res += regex_expr_html(ap->get_arg(i), m, seq);
+                if (needs_parens) res += ")";
+            }
+            return res;
+        }
+        if (seq.re.is_star(e, a)) {
+            bool needs_parens = seq.re.is_union(a) || seq.re.is_concat(a) || seq.re.is_intersection(a);
+            std::string res = needs_parens ? "(" : "";
+            res += regex_expr_html(a, m, seq);
+            res += needs_parens ? ")<SUP>*</SUP>" : "<SUP>*</SUP>";
+            return res;
+        }
+        if (seq.re.is_plus(e, a)) {
+            bool needs_parens = seq.re.is_union(a) || seq.re.is_concat(a) || seq.re.is_intersection(a);
+            std::string res = needs_parens ? "(" : "";
+            res += regex_expr_html(a, m, seq);
+            res += needs_parens ? ")<SUP>+</SUP>" : "<SUP>+</SUP>";
+            return res;
+        }
+        if (seq.re.is_opt(e, a)) {
+            bool needs_parens = seq.re.is_union(a) || seq.re.is_concat(a) || seq.re.is_intersection(a);
+            std::string res = needs_parens ? "(" : "";
+            res += regex_expr_html(a, m, seq);
+            res += needs_parens ? ")?" : "?";
+            return res;
+        }
+        if (seq.re.is_complement(e, a)) {
+            bool needs_parens = seq.re.is_union(a) || seq.re.is_concat(a) || seq.re.is_intersection(a);
+            std::string res = "~";
+            res += needs_parens ? "(" : "";
+            res += regex_expr_html(a, m, seq);
+            res += needs_parens ? ")" : "";
+            return res;
+        }
+        if (seq.re.is_range(e, a, b)) {
+            zstring s1, s2;
+            std::string c1 = seq.str.is_string(a, s1) ? dot_html_escape(s1.encode()) : arith_expr_html(a, m);
+            std::string c2 = seq.str.is_string(b, s2) ? dot_html_escape(s2.encode()) : arith_expr_html(b, m);
+            return "[" + c1 + "-" + c2 + "]";
+        }
+        if (seq.re.is_full_char(e)) {
+            return "&#931;"; // Sigma
+        }
+        if (seq.re.is_full_seq(e)) {
+            return "&#931;<SUP>*</SUP>"; // Sigma*
+        }
+        if (seq.re.is_empty(e)) {
+            return "&#8709;"; // empty set
+        }
+
+        std::ostringstream os;
+        os << mk_pp(e, m);
+        return dot_html_escape(os.str());
+    }
+
+    // Helper: render a snode as an HTML label for DOT output.
    // Groups consecutive s_char tokens into quoted strings, renders s_var by name,
    // shows s_power with superscripts, s_unit by its inner expression,
    // and falls back to mk_pp (HTML-escaped) for other token kinds.
-    static std::string snode_label_html(euf::snode const* n, ast_manager& m) {
+    std::string snode_label_html(euf::snode const* n, ast_manager& m) {
        if (!n) return "null";
        seq_util seq(m);

@ -840,6 +938,8 @@ namespace seq {
                expr* exp_expr = to_app(e)->get_arg(1);
                result += arith_expr_html(exp_expr, m);
                result += "</SUP>";
+            } else if (e && seq.is_re(e)) {
+                result += regex_expr_html(e, m, seq);
            } else {
                std::ostringstream os;
                os << mk_pp(e, m);
@ -3397,9 +3497,6 @@ namespace seq {
    // -----------------------------------------------------------------------

    bool nielsen_graph::apply_gpower_intr(nielsen_node* node) {
-        ast_manager& m = m_sg.get_manager();
-        arith_util arith(m);
-
        for (str_eq const& eq : node->str_eqs()) {
            if (eq.is_trivial()) continue;
            if (!eq.m_lhs || !eq.m_rhs) continue;
@ -3468,11 +3565,12 @@ namespace seq {
            if (n % p != 0) continue;
            bool match = true;
            for (unsigned i = p; i < n && match; ++i)
-                match = (ground_prefix_orig[i]->id() == ground_prefix_orig[i % p]->id());
+                match = ground_prefix_orig[i]->id() == ground_prefix_orig[i % p]->id();
            if (match) { period = p; break; }
        }
-        for (unsigned i = 0; i < period; ++i)
+        for (unsigned i = 0; i < period; ++i) {
            ground_prefix.push_back(ground_prefix_orig[i]);
+        }

        // If the compressed prefix is a single power snode, unwrap it to use
        // its base tokens, avoiding nested powers.
--- a/src/smt/seq/seq_nielsen.h
+++ b/src/smt/seq/seq_nielsen.h
@ -494,6 +494,8 @@ namespace seq {
        }
    };

+    std::string snode_label_html(euf::snode const* n, ast_manager& m);
+
    // node in the Nielsen graph
    // mirrors ZIPT's NielsenNode
    class nielsen_node {
--- a/src/smt/seq/seq_regex.cpp
+++ b/src/smt/seq/seq_regex.cpp
@ -519,135 +519,25 @@ namespace seq {
        if (regexes.empty())
            return l_false; // empty intersection = full language (vacuously non-empty)

-        // Quick checks: if any regex is fail/empty, intersection is empty
-        for (euf::snode* re : regexes) {
-            if (!re || !re->get_expr())
-                return l_undef;
-            if (re->is_fail() || is_empty_regex(re))
-                return l_true;
-        }
-
-        // Check if all are nullable (intersection accepts ε)
-        bool all_nullable = true;
-        for (euf::snode* re : regexes) {
-            if (!re->is_nullable()) { all_nullable = false; break; }
-        }
-        if (all_nullable)
-            return l_false;
-
        // Single regex: delegate to is_empty_bfs
        if (regexes.size() == 1)
            return is_empty_bfs(regexes[0], max_states);

-        // Build product BFS. State = tuple of regex snode ids.
-        // Use a map from state hash to visited set.
-        using state_t = svector<unsigned>;
+        seq_util& seq = m_sg.get_seq_util();
+        ast_manager& mgr = m_sg.get_manager();

-        auto state_hash = [](state_t const& s) -> unsigned {
-            unsigned h = 0;
-            for (unsigned id : s)
-                h = h * 31 + id;
-            return h;
-        };
-
-        auto state_eq = [](state_t const& a, state_t const& b) -> bool {
-            if (a.size() != b.size()) return false;
-            for (unsigned i = 0; i < a.size(); ++i)
-                if (a[i] != b[i]) return false;
-            return true;
-        };
-
-        // Use simple set via sorted vector of hashes (good enough for bounded BFS)
-        std::unordered_set<unsigned> visited_hashes;
-
-        struct bfs_state {
-            ptr_vector<euf::snode> regexes;
-        };
-
-        std::vector<bfs_state> worklist;
-        bfs_state initial;
-        initial.regexes.append(regexes);
-        worklist.push_back(std::move(initial));
-
-        state_t init_ids;
-        for (euf::snode* re : regexes)
-            init_ids.push_back(re->id());
-        visited_hashes.insert(state_hash(init_ids));
-
-        unsigned states_explored = 0;
-        bool had_failed = false;
-
-        // Collect alphabet representatives from the intersection of all regexes
-        // (merge boundaries from all)
-        unsigned_vector all_bounds;
-        all_bounds.push_back(0);
-        for (euf::snode* re : regexes)
-            collect_char_boundaries(re, all_bounds);
-        std::sort(all_bounds.begin(), all_bounds.end());
-
-        euf::snode_vector reps;
-        unsigned prev = UINT_MAX;
-        for (unsigned b : all_bounds) {
-            if (b != prev) {
-                reps.push_back(m_sg.mk_char(b));
-                prev = b;
-            }
-        }
-        if (reps.empty())
-            reps.push_back(m_sg.mk_char('a'));
-
-        while (!worklist.empty()) {
-            if (states_explored >= max_states)
+        euf::snode* result = regexes[0];
+        for (unsigned i = 1; i < regexes.size(); ++i) {
+            expr* r1 = result->get_expr();
+            expr* r2 = regexes[i]->get_expr();
+            if (!r1 || !r2) return l_undef;
+            expr_ref inter(seq.re.mk_inter(r1, r2), mgr);
+            result = m_sg.mk(inter);
+            if (!result)
                return l_undef;
-
-            bfs_state current = std::move(worklist.back());
-            worklist.pop_back();
-            ++states_explored;
-
-            for (euf::snode* ch : reps) {
-                ptr_vector<euf::snode> derivs;
-                bool any_fail = false;
-                bool all_null = true;
-                bool deriv_failed = false;
-
-                for (euf::snode* re : current.regexes) {
-                    euf::snode* d = m_sg.brzozowski_deriv(re, ch);
-                    if (!d) { deriv_failed = true; break; }
-                    if (d->is_fail()) { any_fail = true; break; }
-                    if (!d->is_nullable()) all_null = false;
-                    derivs.push_back(d);
-                }
-
-                if (deriv_failed) { had_failed = true; continue; }
-                if (any_fail) continue; // this character leads to empty intersection
-
-                if (all_null)
-                    return l_false; // found an accepting state in the product
-
-                // Check if any component is structurally empty
-                bool any_empty = false;
-                for (euf::snode* d : derivs) {
-                    if (is_empty_regex(d)) { any_empty = true; break; }
-                }
-                if (any_empty) continue;
-
-                // Compute state hash and check visited
-                state_t ids;
-                for (euf::snode* d : derivs)
-                    ids.push_back(d->id());
-                unsigned h = state_hash(ids);
-                if (visited_hashes.count(h) == 0) {
-                    visited_hashes.insert(h);
-                    bfs_state next;
-                    next.regexes.append(derivs);
-                    worklist.push_back(std::move(next));
-                }
-            }
        }

-        if (had_failed)
-            return l_undef;
-        return l_true; // exhausted all states, intersection is empty
+        return is_empty_bfs(result, max_states);
    }

    // -----------------------------------------------------------------------
--- a/src/smt/theory_nseq.cpp
+++ b/src/smt/theory_nseq.cpp
@ -425,7 +425,12 @@ namespace smt {
        // here the actual Nielsen solving happens
        auto result = m_nielsen.solve();

-        
+#ifdef Z3DEBUG
+        // Examining the Nielsen graph is probably the best way of debugging
+        std::string dot = m_nielsen.to_dot();
+        IF_VERBOSE(1, verbose_stream() << dot << "\n";);
+#endif
+
        if (result == seq::nielsen_graph::search_result::unsat) {
            IF_VERBOSE(1, verbose_stream() << "nseq final_check: solve UNSAT\n";);
            explain_nielsen_conflict();
@ -464,8 +469,8 @@ namespace smt {
                if (mem_idx < m_nielsen_to_state_mem.size()) {
                    unsigned state_mem_idx = m_nielsen_to_state_mem[mem_idx];
                    mem_source const& src = m_state.get_mem_source(state_mem_idx);
-                    if (ctx.get_assignment(src.m_lit) == l_true)
-                        lits.push_back(src.m_lit);
+                    SASSERT(ctx.get_assignment(src.m_lit) == l_true);
+                    lits.push_back(src.m_lit);
                }
            }
        }
@ -823,9 +828,8 @@ namespace smt {
                auto& vec = var_to_mems.insert_if_not_there(mem.m_str->id(), unsigned_vector());
                vec.push_back(i);
            }
-            else {
+            else
                all_var_str = false;
-            }
        }

        if (var_to_mems.empty())
@ -855,19 +859,23 @@ namespace smt {
                // jointly unsatisfiable.  Assert a conflict from all their literals.
                enode_pair_vector eqs;
                literal_vector lits;
+                std::cout << "CONFLICT:\n";
                for (unsigned i : mem_indices) {
                    mem_source const& src = m_state.get_mem_source(i);
-                    if (ctx.get_assignment(src.m_lit) == l_true)
-                        lits.push_back(src.m_lit);
+                    SASSERT(ctx.get_assignment(src.m_lit) == l_true); // we already stored the polarity of the literal
+                    lits.push_back(src.m_lit);
+                    std::cout << "\t\t";
+                    std::cout << mk_pp(ctx.literal2expr(src.m_lit), m) << std::endl;
+                    std::cout << "\t\t";
+                    std::cout << src.m_lit << std::endl;
                }
                TRACE(seq, tout << "nseq regex precheck: empty intersection for var "
                                << var_id << ", conflict with " << lits.size() << " lits\n";);
                set_conflict(eqs, lits);
                return l_true;   // conflict asserted
            }
-            else if (result == l_undef) {
+            if (result == l_undef)
                any_undef = true;
-            }
            // l_false = non-empty intersection, this variable's constraints are satisfiable
        }