mirror of https://github.com/Z3Prover/z3 synced 2025-08-02 01:13:18 +00:00

Regex solver updates (#4636)

* std::cout debugging statements

* comment out std::cout debugging as this is now a shared fork

* convert std::cout to TRACE statements for seq_rewriter and seq_regex

* add cases to min_length and max_length for regexes

* bug fix

* update min_length and max_length functions for REs

* initial pass on simplifying derivative normal forms by eliminating redundant predicates locally

* add seq_regex_brief trace statements

* working on debugging ref count issue

* fix ref count bug and convert trace statements to seq_regex_brief

* add compact tracing for cache hits/misses

* seq_regex fix cache hit/miss tracing and wrapper around is_nullable

* minor

* label and disable more experimental changes for testing

* minor documentation / tracing

* a few more @EXP annotations

* dead state elimination skeleton code

* progress on dead state elimination

* more progress on dead state elimination

* refactor dead state class to separate self-contained state_graph class

* finish factoring state_graph to only work with unsigned values, and implement separate functionality for expr* logic

* implement get_all_derivatives, add debug tracing

* trace statements for debugging is_nullable loop bug

* fix is_nullable loop bug

* comment out local nullable change and mark experimental

* pretty printing for state_graph

* rewrite state graph to remove the fragile assumption that all edges from a state are added at once

* start of general cycle detection check + fix some comments

* implement full cycle detection procedure

* normalize derivative conditions to form 'ele <= a'

* order derivative conditions by character code

* fix confusing names m_to and m_from

* assign increasing state IDs from 1 instead of using get_id on AST node

* remove elim_condition call in get_all_derivatives

* use u_map instead of uint_map to avoid memory leak

* remove unnecessary call to is_ground

* debugging

* small improvements to seq_regex_brief tracing

* fix bug on evil2 example

* save work

* new propagate code

* work in progress on using same seq sort for deriv calls

* avoid re-computing derivatives: use same head var for every derivative call

* use min_length on regexes to prune search

* simple implementation of can_be_in_cycle using rank function idea

* add a disabled experimental change

* minor cleanup comments, etc.

* seq_rewriter cleanup for PR

* typo noticed by Nikolaj

* move state graph to util/state_graph

* re-add accidentally removed line

* clean up seq_regex code removing obsolete functions and comments

* a few more cleanup items

* oops, missed merge change to fix compilation

* disabled change to lift unions to the top level and treat them separately in seq_regex solver

* added get_overapprox_regex to over-approximate regex membership constraints

* replace calls to is_epsilon with a centrally available method in seq_decl_plugin

* simplifications and modifications in get_overapprox_regex and related

* added approximation support for sequence expressions that use ite

* removed is_app check that was redundant

* tweak differences with upstream

* rewrite derivative leaves

* enable Antimirov-style derivatives via lifting unions in the solver

* TODO placeholders for outputting state graph

* change order in seq_regex propagate_in_re

* implement a more restricted form of Antimirov derivatives via a special op code to indicate lifting unions

* minor

* new Antimirov optimizations based on BDD compatibility checking

* seq regex tracing for # of derivatives

* fix get_cofactors (currently this fix is buggy)

* partially revert get_cofactors buggy change

* re-implement get_cofactors to more efficiently explore nodes in the derivative expression

* dgml generation for state graph

* fix release build

* improved dgml output

* bug fixes in dgml generation

* dot output support for state_graph and moved dgml and dot output under CASSERT

* updated tracing of what regex corresponds to what state id with /tr:state_graph

* clean up & document Antimirov derivative support

* remove op cache tracing

* remove re_rank experimental idea

* small fix

* fix Antimirov derivative (important change for good performance)

* remove unused and unnecessary code

* implemented simpler efficient get_cofactors alternative mk_deriv_accept

* simplifications in propagate_accept, and trace unusual cases

* document the various seq_regex tracing & debugging command-line options

* fix debug build (broken tracing)

* guard eager Antimirov lifting for possible disabling

* fix bug in propagate_accept Rule 1

* disable eager version of Antimirov lifting for performance reasons

* remove some remaining obsolete comments

Co-authored-by: calebstanford-msr <t-casta@microsoft.com>
Co-authored-by: Margus Veanes <margus@microsoft.com>
Caleb Stanford 2020-08-13 15:47:36 -04:00 committed by GitHub
parent 9df6c10ad8
commit 2c02264a94
GPG key ID: 4AEE18F83AFDEB23
7 changed files with 556 additions and 140 deletions
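
Several items in the commit message concern dead state elimination and the `state_graph` class, which works only with unsigned state IDs and tracks which regex states are live or dead. The following is a minimal sketch of that idea, not the code in `util/state_graph`: the class name `ToyStateGraph` and all of its methods are hypothetical, and it recomputes reachability on every query, which a real implementation would presumably avoid. It assumes a state is live if it can reach a state marked live, and dead if every state reachable from it has had all of its outgoing edges added and none of them is live.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a state graph over unsigned IDs.
// This is an illustration only; it is not Z3's state_graph API.
class ToyStateGraph {
    std::map<unsigned, std::set<unsigned>> m_succ; // outgoing edges per state
    std::set<unsigned> m_live;  // states marked live directly (e.g. nullable regexes)
    std::set<unsigned> m_done;  // states whose outgoing edges have all been added

    // Collect every state reachable from s, including s itself.
    std::set<unsigned> reachable(unsigned s) const {
        std::set<unsigned> seen;
        std::vector<unsigned> todo{s};
        while (!todo.empty()) {
            unsigned u = todo.back();
            todo.pop_back();
            if (!seen.insert(u).second) continue; // already visited
            auto it = m_succ.find(u);
            if (it == m_succ.end()) continue;
            for (unsigned v : it->second) todo.push_back(v);
        }
        return seen;
    }

public:
    // Edges may be added one at a time, in any order.
    void add_edge(unsigned from, unsigned to) {
        m_succ[from].insert(to);
        m_succ[to]; // make sure the target state exists
    }
    void mark_live(unsigned s) { m_succ[s]; m_live.insert(s); }
    void mark_done(unsigned s) { m_succ[s]; m_done.insert(s); }

    // Live: some state marked live is reachable.
    bool is_live(unsigned s) const {
        for (unsigned u : reachable(s))
            if (m_live.count(u)) return true;
        return false;
    }

    // Dead: the reachable part is fully explored and contains no live state.
    bool is_dead(unsigned s) const {
        for (unsigned u : reachable(s))
            if (m_live.count(u) || !m_done.count(u)) return false;
        return true;
    }
};
```

In this sketch a state that is neither live nor dead is simply undecided: more edges may still be added below it. A cycle such as 1 → 2 → 1 becomes dead only once both states are marked done, which is why some form of cycle detection is needed before a regex can be declared dead.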

@@ -23,6 +23,71 @@ Author:
#include "smt/smt_context.h"
#include "smt/seq_skolem.h"
/*
*** Tracing and debugging in this module and related modules ***
Tracing and debugging for the regex solver are split across several
command-line flags.
TRACING
-tr:seq_regex and -tr:seq_regex_brief
These are the main tags to trace what the regex solver is doing.
They mostly trace the same things, except that seq_regex_brief
avoids printing out expressions and tries to abbreviate the output
as much as possible. seq_regex_brief shows the following output:
Top-level propagations:
PIR: Propagating an in_re constraint
PE/PNE: Propagating an empty/non-empty constraint
PEQ/PNEQ: Propagating an equal/not-equal constraint
PA: Propagating an accept constraint
In tracing, arguments are generally put in parentheses.
To achieve abbreviated output, expressions are traced in one of two
ways:
id243 (expr ID): the regex or expression with id 243
3 (state ID): the regex with state ID 3
When a regex is newly assigned to a state ID, we print this:
new(id606)=4
Of these, PA is the most important, and traces as follows:
PA(x@i,r): propagate accept for string x at index i, regex r.
(empty), (dead), (blocked), (unfold): info about whether this
PA was cut off early, or unfolded into the derivatives
(next states)
d(r1)=r2: r2 is the derivative of r1
n(r1)=b: b = whether r1 is nullable or not
USG(r): updating state graph for regex r (add all derivatives)
-tr:state_graph
This is the tracing done by util/state_graph, the data structure
that seq_regex uses to track live and dead regexes, which can
alternatively be used to get a high-level picture of what states
are being explored and updated as the solver progresses.
-tr:seq_regex_verbose
Used for some more frequent tracing (in the style of seq_regex,
not in the style of seq_regex_brief)
-tr:seq and -tr:seq_verbose
These are the underlying sequence theory tracing, often used by
the rewriter.
DEBUGGING AND VIEWING STATE GRAPH GRAPHICAL OUTPUT
-dbg:seq_regex
Debugging that checks invariants. Currently, checks that derivative
normal form is correctly preserved in the rewriter.
-dbg:state_graph
Debugging for the state graph, which
1. Checks state graph invariants, and
2. Generates the files .z3-state-graph.dgml and .z3-state-graph.dot
which can be used to visualize the state graph being explored,
during or after executing Z3.
The output can be viewed:
- Using Visual Studio for .dgml
- Using a tool such as xdot (`xdot .z3-state-graph.dot`) for .dot
*/
namespace smt {
class theory_seq;
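
The header comment in this hunk documents the abbreviated tokens printed under `-tr:seq_regex_brief`, such as `new(id606)=4` and `d(r1)=r2`. As a rough illustration of how that output might be post-processed, the sketch below extracts state assignments and derivative edges from trace text. The names `BriefTrace` and `parse_brief_trace` are hypothetical, the sample input mentioned afterwards is made up, and the sketch assumes that both sides of a `d(...)=...` token are numeric state IDs; real traces may differ.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for what we pull out of a brief trace.
struct BriefTrace {
    std::map<unsigned, unsigned> state_of_expr;        // expr ID -> state ID
    std::vector<std::pair<unsigned, unsigned>> edges;  // d(r1)=r2 as (r1, r2)
};

// Scan trace text for "new(id<N>)=<M>" and "d(<A>)=<B>" tokens.
// Assumption: tokens appear exactly in this form, with numeric state IDs.
BriefTrace parse_brief_trace(const std::string& text) {
    static const std::regex new_pat(R"re(new\(id(\d+)\)=(\d+))re");
    static const std::regex deriv_pat(R"re(\bd\((\d+)\)=(\d+))re");

    BriefTrace out;
    const std::sregex_iterator end;

    // Record which state ID each expression ID was assigned.
    for (std::sregex_iterator it(text.begin(), text.end(), new_pat); it != end; ++it) {
        unsigned expr_id = static_cast<unsigned>(std::stoul((*it)[1].str()));
        unsigned state_id = static_cast<unsigned>(std::stoul((*it)[2].str()));
        out.state_of_expr[expr_id] = state_id;
    }

    // Record derivative edges between state IDs.
    for (std::sregex_iterator it(text.begin(), text.end(), deriv_pat); it != end; ++it) {
        unsigned from = static_cast<unsigned>(std::stoul((*it)[1].str()));
        unsigned to = static_cast<unsigned>(std::stoul((*it)[2].str()));
        out.edges.emplace_back(from, to);
    }
    return out;
}
```

On the made-up text `new(id606)=4 d(3)=4`, this records that expression 606 was assigned state 4 and that state 4 is a derivative of state 3. Tokens whose arguments are expression IDs (`id243` style) are deliberately ignored by the edge pattern.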
@@ -93,12 +158,13 @@ namespace smt {
expr_ref is_nullable_wrapper(expr* r);
expr_ref derivative_wrapper(expr* hd, expr* r);
void get_cofactors(expr* r, expr_ref_vector& conds, expr_ref_pair_vector& result);
void get_cofactors(expr* r, expr_ref_pair_vector& result) {
expr_ref_vector conds(m);
get_cofactors(r, conds, result);
}
// Various support for unfolding derivative expressions that are
// returned by derivative_wrapper
expr_ref mk_deriv_accept(expr* s, unsigned i, expr* r);
void get_all_derivatives(expr* r, expr_ref_vector& results);
void get_cofactors(expr* r, expr_ref_pair_vector& result);
void get_cofactors_rec(expr* r, expr_ref_vector& conds,
expr_ref_pair_vector& result);
public: