Replace seq::derive's m_intervals + m_intervals_start append-only
char-range stack and the imperative intersect_intervals /
exclude_interval helpers with a single canonical range_predicate
m_path_pred that tracks the feasible character set under the
current path.
* Add range_predicate_translator: pure AST -> range_predicate
translator for the boolean-over-char_le fragment
(true/false, eq with const, char_le with const, not/and/or any
nesting). Returns false on the first sub-term outside the
fragment so the caller can fall back to other reasoning.
* push_intervals_impl: translate the candidate predicate to a
range_predicate and reduce path tracking to set arithmetic
(intersection + subset/empty checks). The legacy top-level
and/or descent is preserved for mixed char / non-char
conditions.
* eval_range_cond: implication becomes subset_of and contradiction
becomes !intersects, both linear in the number of ranges with no
AST allocation.
* range_predicate gains subset_of / intersects / disjoint_from to
support allocation-free path queries.
* path_save now stores the saved range_predicate by value; the
stack switches from svector (CallDestructors=false) to vector
because range_predicate owns an inner svector.
Tests: 91/91 pass with /a, including the new
range_predicate_translator unit test exercising true/false, eq,
char_le, and/or/not, and De Morgan agreement.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The single-ITE branch of hoist_ite was gated by 'm_path_stack.size() < 8'.
When the depth limit was reached, hoist_ite returned nullptr and callers
fell back to non-path-aware structural rewrites (mk_union0 / mk_xor0).
These rewrites simplify e.g. mk_union(empty, X) -> X and return X unchanged,
preserving inner ITE branches that were built at an earlier (less constrained)
path. Subsequent derivative computation never path-prunes those branches,
which can leak unreachable epsilon-leaves into the final t-regex and cause
the bisimulation algorithm to report inequivalence for equivalent regexes.
Concrete trigger: derivatives of unions/xors with >= 9 components produce
9-deep ITE chains; at depth 8 the inner ITE returns unprocessed, leaving an
unreachable epsilon-leaf that bisim mis-interprets as a distinguishing word.
Removing the guard restores soundness. The corpus run against
regex-equivalence (1523 files) fixes 22 triangulated soundness bugs
(mut_0013, mut_0241, mut_0257, mut_0301 among others) with zero regressions.
89/89 unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
`seq::derive::operator()(ele, r)` looks up `m_top_cache` keyed only by the
regex `r`, but on a miss it used to set `m_ele = ele` (a concrete char)
before calling `derive_rec(r)`. The resulting ITE-tree contained
constant-folded `(= ele c)` conditions, so the "symbolic" derivative
stored in the cache was actually specialized to that one ele. Subsequent
calls with the same `r` but a different ele hit the stale cached answer
and the substitution at the bottom was a no-op (no `v0` left to replace).
Simplest victim:
(str.in_re "aP" (re.++ (re.* "a") "P"))
returned `unsat`. The first call D_'a'(a*P) computed `a*P` and cached it
under key `a*P`; the next call D_'P'(a*P) hit that cache entry and
returned `a*P` instead of epsilon, so the membership check ended on a
non-nullable state.
Fix: set `m_ele = v` (the canonical fresh var) so the derivative is
genuinely symbolic. Concrete-ele callers go through the existing
substitution at the bottom of `operator()`.
Adds a regression test in src/test/seq_regex_bisim.cpp checking that
D_'a'(a*P) is not nullable while D_'P'(a*P) is.
Note: this is independent of the mut_0013 bisim-level unsoundness;
that case still fails and is being tracked separately.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The intersect_intervals routine in seq_derive.cpp maintains a path-tracking
disjoint union of character intervals. When intersecting the active suffix
with a new constraint [lo, hi], it iterated the suffix and, on encountering
the first interval disjoint from [lo, hi], reset the output cursor to the
end-marker and broke out of the loop. This both threw away the intervals it
had already kept and skipped every remaining interval, so e.g.
[(0, 96), (98, max)] intersected with [98, 98] became empty instead of
[(98, 98)].
Inside derive that silently killed valid branches of symbolic derivatives.
For example D(a|b) collapsed to ite(c='a', eps, empty) -- the 'b' branch
vanished -- which made the bisimulation procedure conclude bogus equalities
such as a* == (a|b)*. On the regex-equivalence corpus this single bug
accounted for ~510 false-unsat results vs master.
Fix: drop only the disjoint interval and continue scanning the rest of the
suffix. Add a small assertion-based regression test that builds D(a|b),
checks both branches survive, and runs bisim on a* vs (a|b)*.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce exclusion intervals alongside the existing path-based condition
tracking in simplify_ite_rec. The intervals track which character values
are still possible at each point in the ITE tree, enabling simplification
of nested range conditions that the per-entry path approach cannot handle.
Key additions:
- intervals_t type and push_intervals() to maintain live character ranges
- eval_range_cond() checks AND-of-char_le conditions against intervals
- intersect_intervals/exclude_interval utilities from seq_rewriter pattern
- Negated AND handling: ¬(lo<=x ∧ x<=hi) excludes [lo,hi] from intervals
The interval check runs before the existing eval_path_cond logic, catching
cases like: if(0<=x<=10, t, if(1<=x<=8, t2, e2)) → if(0<=x<=10, t, e2)
where the inner range [1,8] is fully contained in the excluded outer range.
Fixes remaining regression timeouts on 5728 P2 and 5731 P4.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Simplify trivial range bounds in derive_range: when lo=0, omit
the lo<=x condition; when hi=max_char, omit the x<=hi condition.
Full charset ranges return epsilon directly.
- Add char_le(0,x)=true and char_le(x,max)=true to eval_cond for
always-valid bounds.
- Add range implication logic to simplify_ite_rec: when path has
negated/positive char_le constraints, detect implied or contradicted
char_le conditions (e.g., ¬(x<=127) implies 128<=x).
- Add is_subset(a, .+) check: non-nullable regexes are subsets of .+
- In update_state_graph, skip recursive exploration of nullable targets
to avoid state explosion.
These fixes resolve timeouts on 5724 (all problems), 5721 P1, and 5693.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add top-level cache (m_top_cache) to ensure stable AST node identity
across repeated derivative calls, preventing state graph divergence
- Add get_head_tail helper for derive_to_re with str.is_unit/str.is_concat
- Add ITE hoisting in mk_union/mk_inter to keep ITEs at top level
- Add De Morgan rule in mk_complement: ~(A∪B) → ~A ∩ ~B
- Add ~ε → .+ simplification in mk_complement
- Add prefix factoring: a·x ∪ a·y = a·(x∪y) and a·x ∩ a·y = a·(x∩y)
- Add r* ∩ .+ = r+ special case in mk_inter
- Enhance is_subset with union/intersection distributivity and complement
- Remove De Morgan from mk_inter to prevent infinite recursion loop
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add lightweight structural is_subset for union/inter simplification
- Use m.is_value instead of is_const_char for swap checks
- Move eval_cond to beginning of simplify_ite_rec
- Use path.shrink(sz) instead of copying extended_path
- Fix normalize_reverse stuck case to return mk_reverse(r)
- Expose subsumes() in public API
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Cache now indexes by (ele, r) pair using obj_pair_map
- Remove eval() function; operator()(ele, r) handles all cases
- Rewrite simplify_ite_rec with path vector of signed conditions
- Add range-based simplification: (lo <= x, false) + (x <= hi, false)
eliminates ite(x = v, t, e) when v is outside [lo, hi]
- Add is_itos case in derive_to_re: guards on n >= 0, digit range,
and first character match
- Port is_reverse normalization (previous commit)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of treating reverse(r) as stuck (returning symbolic mk_derivative),
normalize it by pushing reverse inward through the regex structure, then
compute the derivative of the normalized result. Mirrors mk_re_reverse logic.
Handles: concat, union, intersection, diff, ite, opt, complement, star,
plus, loop, to_re (string literals, units, concats), and symmetric cases.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>