diff --git a/.github/skills/benchmark/SKILL.md b/.github/skills/benchmark/SKILL.md
index cffacde54..1d3494784 100644
--- a/.github/skills/benchmark/SKILL.md
+++ b/.github/skills/benchmark/SKILL.md
@@ -7,16 +7,37 @@ Given an SMT-LIB2 formula or file, run Z3 with statistics enabled and report per
 
 # Step 1: Run Z3 with statistics
 
+Action:
+  Invoke benchmark.py with the formula or file. Use `--runs N` for
+  repeated timing.
+
+Expectation:
+  The script invokes `z3 -st`, parses the statistics block, and prints
+  a performance summary. A run entry is logged to z3agent.db.
+
+Result:
+  Timing and statistics are displayed. Proceed to Step 2 to interpret.
+
 ```bash
 python3 scripts/benchmark.py --file problem.smt2
 python3 scripts/benchmark.py --file problem.smt2 --runs 5
 python3 scripts/benchmark.py --formula "(declare-const x Int)..." --debug
 ```
 
-The script invokes `z3 -st` and parses the `:key value` statistics block.
-
 # Step 2: Interpret the output
 
+Action:
+  Review wall-clock time, memory usage, conflict counts, and per-theory
+  breakdowns.
+
+Expectation:
+  A complete performance profile including min/median/max timing when
+  multiple runs are requested.
+
+Result:
+  If performance is acceptable, no action needed.
+  If slow, try **simplify** to reduce the formula or adjust tactic strategies.
+
 The output includes:
 
 - wall-clock time (ms)
@@ -29,7 +50,17 @@ With `--runs N`, the script runs Z3 N times and reports min/median/max timing.
 
 # Step 3: Compare over time
 
-Past benchmark runs are logged to `z3agent.db`. Query them:
+Action:
+  Query past benchmark runs from z3agent.db to detect regressions or
+  improvements.
+
+Expectation:
+  Historical run data is available for comparison, ordered by recency.
+
+Result:
+  If performance regressed, investigate recent formula or tactic changes.
+  If improved, record the successful configuration.
+
 ```bash
 python3 ../../shared/z3db.py runs --skill benchmark --last 20
 python3 ../../shared/z3db.py query "SELECT smtlib2, result, stats FROM formulas WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') ORDER BY run_id DESC LIMIT 5"
diff --git a/.github/skills/deeptest/SKILL.md b/.github/skills/deeptest/SKILL.md
index ead3f5b84..2e2cd747e 100644
--- a/.github/skills/deeptest/SKILL.md
+++ b/.github/skills/deeptest/SKILL.md
@@ -7,6 +7,20 @@ Given a strategy and count, generate SMT-LIB2 formulas targeting Z3 internals an
 
 # Step 1: Choose a strategy and run
 
+Action:
+  Select a generation strategy and invoke the script with the desired
+  count and seed.
+
+Expectation:
+  The script generates SMT-LIB2 formulas according to the chosen
+  strategy, runs each through Z3, and records results to z3agent.db.
+
+Result:
+  On completion: a summary is printed with formula count, anomaly count,
+  and elapsed time. Proceed to Step 2.
+  On early exit: verify the Z3 binary is accessible and review timeout
+  settings.
+
 ```bash
 python3 scripts/deeptest.py --strategy random --count 100 --seed 42
 python3 scripts/deeptest.py --strategy metamorphic --seed-file base.smt2 --count 50
@@ -23,7 +37,19 @@ Available strategies:
 
 # Step 2: Interpret the output
 
-The script prints a summary after completion:
+Action:
+  Review the summary printed after the run completes.
+
+Expectation:
+  The summary shows strategy, seed, formula count, anomaly count, and
+  elapsed time.
+
+Result:
+  On zero anomalies: Z3 handled all generated formulas without issue.
+  On nonzero anomalies: crashes, assertion failures, solver errors, or
+  result disagreements were detected. Proceed to Step 3 for details.
+
+Example summary:
 
 ```
 strategy: random
@@ -33,21 +59,38 @@ anomalies: 2
 elapsed: 4500ms
 ```
 
-A nonzero anomaly count means the run detected crashes (nonzero exit code), assertion failures in stderr, solver errors, or result disagreements between a base formula and its metamorphic variants.
-
 # Step 3: Inspect findings
 
-Findings are logged to `z3agent.db` with category, severity, and details:
+Action:
+  Query z3agent.db for detailed finding records from the run.
+
+Expectation:
+  Each finding includes category, severity, message, formula index, exit
+  code, and a stderr excerpt.
+
+Result:
+  Use findings to identify reproducible failure patterns and prioritize
+  fixes by severity. If a finding appears nondeterministic, proceed to
+  Step 4 with the same seed to confirm.
 
 ```bash
 python3 ../../shared/z3db.py query "SELECT category, severity, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='deeptest') ORDER BY finding_id DESC LIMIT 20"
 ```
 
-Each finding includes the formula index, exit code, and a stderr excerpt for triage.
-
 # Step 4: Reproduce
 
-Use the `--seed` parameter to reproduce a run exactly:
+Action:
+  Re-run the script with the same seed to reproduce the exact sequence
+  of generated formulas.
+
+Expectation:
+  Identical formulas are generated, producing the same anomalies if the
+  underlying bug persists.
+
+Result:
+  On same anomalies: bug confirmed and suitable for a regression test.
+  On zero anomalies: the issue may be nondeterministic or already fixed;
+  investigate further before closing.
 
 ```bash
 python3 scripts/deeptest.py --strategy random --count 100 --seed 42
diff --git a/.github/skills/encode/SKILL.md b/.github/skills/encode/SKILL.md
index eef343bef..32d5dd1da 100644
--- a/.github/skills/encode/SKILL.md
+++ b/.github/skills/encode/SKILL.md
@@ -7,7 +7,17 @@ Given a problem description (natural language, pseudocode, or a partial formulat
 
 # Step 1: Identify the problem class
 
-Common encodings:
+Action:
+  Determine the SMT theory and variable sorts required by the problem
+  description.
+
+Expectation:
+  A clear mapping from the problem to one of the supported theories
+  (LIA, LRA, QF_BV, etc.).
+
+Result:
+  If the theory is identified, proceed to Step 2. If the problem spans
+  multiple theories, select the appropriate combined logic.
 
 | Problem class | Theory | Typical sorts |
 |---------------|--------|---------------|
@@ -21,17 +31,35 @@ Common encodings:
 
 # Step 2: Generate the encoding
 
+Action:
+  Invoke encode.py with the problem description and desired output format.
+
+Expectation:
+  The script produces a complete SMT-LIB2 file or Z3 Python script with
+  all declarations, constraints, and check-sat commands.
+
+Result:
+  For `smtlib2` format: pass the output to **solve**.
+  For `python` format: execute the script directly.
+  Proceed to Step 3 for validation.
+
 ```bash
 python3 scripts/encode.py --problem "Find integers x, y such that x^2 + y^2 = 25 and x > 0" --format smtlib2
 python3 scripts/encode.py --problem "Schedule 4 tasks on 2 machines minimizing makespan" --format python
 ```
 
-For `--format smtlib2`, the output is a complete .smt2 file ready for the **solve** skill.
-For `--format python`, the output is a standalone Z3 Python script.
-
 # Step 3: Validate the encoding
 
-The script checks that the generated formula is syntactically valid by running a quick `z3 -in` parse check (no solving, just syntax). Parse errors are reported with the offending line.
+Action:
+  The script runs a syntax check by piping the output through `z3 -in`
+  in parse-only mode.
+
+Expectation:
+  No parse errors. If errors occur, the offending line is reported.
+
+Result:
+  On success: the encoding is ready for **solve**, **prove**, or **optimize**.
+  On parse error: fix the reported line and re-run.
 
 # Parameters
 
diff --git a/.github/skills/explain/SKILL.md b/.github/skills/explain/SKILL.md
index 515b51378..b7ffe25af 100644
--- a/.github/skills/explain/SKILL.md
+++ b/.github/skills/explain/SKILL.md
@@ -7,6 +7,17 @@ Given raw Z3 output (from the **solve**, **prove**, **optimize**, or **benchmark
 
 # Step 1: Identify the output type
 
+Action:
+  Determine the category of Z3 output to explain: model, core,
+  statistics, error, or proof.
+
+Expectation:
+  The output type maps to one of the recognized formats in the table below.
+
+Result:
+  If the type is ambiguous, use `--type auto` and let the script detect it.
+  Proceed to Step 2.
+
 | Output contains | Explanation type |
 |----------------|-----------------|
 | `(define-fun ...)` blocks | model explanation |
@@ -17,16 +28,36 @@ Given raw Z3 output (from the **solve**, **prove**, **optimize**, or **benchmark
 
 # Step 2: Run the explainer
 
+Action:
+  Invoke explain.py with the output file or stdin.
+
+Expectation:
+  The script auto-detects the output type and produces a structured
+  plain-language summary.
+
+Result:
+  A formatted explanation is printed. If detection fails, re-run with
+  an explicit `--type` flag.
+
 ```bash
 python3 scripts/explain.py --file output.txt
 python3 scripts/explain.py --stdin < output.txt
 python3 scripts/explain.py --file output.txt --debug
 ```
 
-The script auto-detects the output type and produces a structured summary.
-
 # Step 3: Interpret the explanation
 
+Action:
+  Review the structured explanation for accuracy and completeness.
+
+Expectation:
+  Models list each variable with its value and sort. Cores list
+  conflicting assertions. Statistics show time and memory breakdowns.
+
+Result:
+  Use the explanation to answer the user query or to guide the next
+  skill invocation.
+
 For models:
 - Each variable is listed with its value and sort
 - Array and function interpretations are expanded
diff --git a/.github/skills/memory-safety/SKILL.md b/.github/skills/memory-safety/SKILL.md
index 75a7861c2..8e2eee686 100644
--- a/.github/skills/memory-safety/SKILL.md
+++ b/.github/skills/memory-safety/SKILL.md
@@ -7,7 +7,22 @@ Build Z3 with compiler-based sanitizer instrumentation, execute the test suite, 
 
 # Step 1: Configure and build
 
-The script invokes cmake with the appropriate `-fsanitize` flags and builds the `test-z3` target. Each sanitizer uses a separate build directory to avoid flag conflicts. If a prior instrumented build exists with matching flags, only incremental compilation runs.
+Action:
+  Invoke the script with the desired sanitizer flag. The script calls
+  cmake with the appropriate `-fsanitize` flags and builds the `test-z3`
+  target. Each sanitizer uses a separate build directory to avoid flag
+  conflicts.
+
+Expectation:
+  cmake configures successfully and make compiles the instrumented binary.
+  If a prior build exists with matching flags, only incremental compilation
+  runs.
+
+Result:
+  On success: an instrumented `test-z3` binary is ready in the build
+  directory. Proceed to Step 2.
+  On failure: verify compiler support for the requested sanitizer flags
+  and review cmake output.
 
 ```bash
 python3 scripts/memory_safety.py --sanitizer asan
@@ -22,7 +37,21 @@ python3 scripts/memory_safety.py --sanitizer asan --skip-build --build-dir build
 
 # Step 2: Run and collect
 
-The test binary runs with `halt_on_error=0` so the sanitizer reports all violations rather than aborting on the first. The script parses `ERROR: AddressSanitizer`, `runtime error:`, and `ERROR: LeakSanitizer` patterns from the combined output, extracts source locations where available, and deduplicates by category, file, and line.
+Action:
+  Execute the instrumented test binary with `halt_on_error=0` so all
+  violations are reported rather than aborting on the first.
+
+Expectation:
+  The script parses AddressSanitizer, UndefinedBehaviorSanitizer, and
+  LeakSanitizer patterns from combined output, extracts source locations,
+  and deduplicates by category/file/line.
+
+Result:
+  On `clean`: no violations detected.
+  On `findings`: one or more violations found, each printed with severity,
+  category, message, and source location.
+  On `timeout`: test suite did not finish; increase timeout or investigate.
+  On `error`: build or execution failed before sanitizer output.
 
 ```bash
 python3 scripts/memory_safety.py --sanitizer asan --timeout 900 --debug
@@ -30,10 +59,20 @@ python3 scripts/memory_safety.py --sanitizer asan --timeout 900 --debug
 ```
 
 # Step 3: Interpret results
 
-- `clean`: no sanitizer violations detected.
-- `findings`: one or more violations found. Each is printed with severity, category, message, and source location.
-- `timeout`: the test suite did not complete within the deadline. Increase the timeout or investigate a possible infinite loop.
-- `error`: build or execution failed before sanitizer output could be collected.
+Action:
+  Review printed findings and query z3agent.db for historical comparison.
+
+Expectation:
+  Each finding includes severity, category, message, and source location.
+  The database query returns prior runs for trend analysis.
+
+Result:
+  On `clean`: no action required; proceed.
+  On `findings`: triage by severity and category. Compare against prior
+  runs to distinguish new regressions from known issues.
+  On `timeout`: increase the deadline or investigate a possible infinite
+  loop.
+  On `error`: inspect build logs before re-running.
 Query past runs:
 ```bash
diff --git a/.github/skills/optimize/SKILL.md b/.github/skills/optimize/SKILL.md
index fc93a7e2c..0414e7e9f 100644
--- a/.github/skills/optimize/SKILL.md
+++ b/.github/skills/optimize/SKILL.md
@@ -7,7 +7,17 @@ Given a set of constraints and an objective function, find the optimal value. Z3
 
 # Step 1: Formulate the problem
 
-The formula uses the `(minimize ...)` or `(maximize ...)` directives followed by `(check-sat)` and `(get-model)`.
+Action:
+  Write constraints and an objective using `(minimize ...)` or
+  `(maximize ...)` directives, followed by `(check-sat)` and `(get-model)`.
+
+Expectation:
+  A valid SMT-LIB2 formula with at least one optimization directive and
+  all variables declared.
+
+Result:
+  If the formula is well-formed, proceed to Step 2. For multi-objective
+  problems, list directives in priority order for lexicographic optimization.
 
 Example: minimize `x + y` subject to `x >= 1`, `y >= 2`, `x + y <= 10`:
 ```smtlib
@@ -23,6 +33,17 @@ Example: minimize `x + y` subject to `x >= 1`, `y >= 2`, `x + y <= 10`:
 
 # Step 2: Run the optimizer
 
+Action:
+  Invoke optimize.py with the formula or file path.
+
+Expectation:
+  The script prints `sat` with the optimal assignment, `unsat`, `unknown`,
+  or `timeout`. A run entry is logged to z3agent.db.
+
+Result:
+  On `sat`: proceed to Step 3 to read the optimal values.
+  On `unsat` or `timeout`: check constraints for contradictions or simplify.
+
 ```bash
 python3 scripts/optimize.py --file scheduling.smt2
 python3 scripts/optimize.py --formula "" --debug
@@ -30,11 +51,17 @@ python3 scripts/optimize.py --formula "" --debug
 ```
 
 # Step 3: Interpret the output
 
-- `sat` with a model: the optimal assignment satisfying all constraints.
-- `unsat`: the constraints are contradictory; no feasible solution exists.
-- `unknown` or `timeout`: Z3 could not determine optimality.
+Action:
+  Parse the objective value and satisfying assignment from the output.
 
-The script prints the objective value and the satisfying assignment.
+Expectation:
+  `sat` with a model containing the optimal value, `unsat` indicating
+  infeasibility, or `unknown`/`timeout`.
+
+Result:
+  On `sat`: report the optimal value and assignment.
+  On `unsat`: the constraints are contradictory; no feasible solution exists.
+  On `unknown`/`timeout`: relax constraints or try **simplify**.
 
 # Parameters
 
diff --git a/.github/skills/prove/SKILL.md b/.github/skills/prove/SKILL.md
index a67d57758..60ddc8ea6 100644
--- a/.github/skills/prove/SKILL.md
+++ b/.github/skills/prove/SKILL.md
@@ -7,7 +7,17 @@ Given a conjecture (an SMT-LIB2 assertion or a natural language claim), determin
 
 # Step 1: Prepare the negated formula
 
-Wrap the conjecture in `(assert (not ...))` and append `(check-sat)(get-model)`.
+Action:
+  Wrap the conjecture in `(assert (not ...))` and append
+  `(check-sat)(get-model)`.
+
+Expectation:
+  A complete SMT-LIB2 formula that negates the original conjecture with
+  all variables declared.
+
+Result:
+  If the negation is well-formed, proceed to Step 2.
+  If the conjecture is natural language, run **encode** first.
 
 Example: to prove that `(> x 3)` implies `(> x 1)`:
 ```smtlib
@@ -19,6 +29,18 @@ Example: to prove that `(> x 3)` implies `(> x 1)`:
 
 # Step 2: Run the prover
 
+Action:
+  Invoke prove.py with the conjecture and variable declarations.
+
+Expectation:
+  The script prints `valid`, `invalid` (with counterexample), `unknown`,
+  or `timeout`. A run entry is logged to z3agent.db.
+
+Result:
+  On `valid`: proceed to **explain** if the user needs a summary.
+  On `invalid`: report the counterexample directly.
+  On `unknown`/`timeout`: try **simplify** first, or increase the timeout.
+
 ```bash
 python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
 ```
@@ -35,9 +57,16 @@ python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" --de
 
 # Step 3: Interpret the output
 
-- `valid`: the negation was unsat, so the conjecture holds for all inputs.
-- `invalid` followed by a counterexample: the negation was sat; the model shows a concrete assignment where the conjecture fails.
-- `unknown` or `timeout`: Z3 could not decide. The conjecture may require auxiliary lemmas or induction.
+Action:
+  Read the prover output to determine validity of the conjecture.
+
+Expectation:
+  One of `valid`, `invalid` (with counterexample), `unknown`, or `timeout`.
+
+Result:
+  On `valid`: the conjecture holds universally.
+  On `invalid`: the model shows a concrete counterexample.
+  On `unknown`/`timeout`: the conjecture may require auxiliary lemmas or induction.
 
 # Parameters
 
diff --git a/.github/skills/simplify/SKILL.md b/.github/skills/simplify/SKILL.md
index 5803b7148..bab3ad5b6 100644
--- a/.github/skills/simplify/SKILL.md
+++ b/.github/skills/simplify/SKILL.md
@@ -7,7 +7,16 @@ Given a formula, apply a sequence of Z3 tactics to produce an equivalent but sim
 
 # Step 1: Choose tactics
 
-Z3 provides dozens of tactics. Common ones:
+Action:
+  Select a tactic chain from the available Z3 tactics based on the
+  formula's theory.
+
+Expectation:
+  A comma-separated list of tactic names suitable for the formula domain.
+
+Result:
+  If unsure, use the default chain: `simplify,propagate-values,ctx-simplify`.
+  For bitvector formulas, add `bit-blast`. Proceed to Step 2.
 
 | Tactic | What it does |
 |--------|-------------|
@@ -22,6 +31,17 @@ Z3 provides dozens of tactics. Common ones:
 
 # Step 2: Run simplification
 
+Action:
+  Invoke simplify.py with the formula and optional tactic chain.
+
+Expectation:
+  The script applies each tactic in sequence and prints the simplified
+  formula. A run entry is logged to z3agent.db.
+
+Result:
+  If the output is simpler, pass it to **solve** or **prove**.
+  If unchanged, try a different tactic chain.
+
 ```bash
 python3 scripts/simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
 python3 scripts/simplify.py --file formula.smt2 --tactics "simplify,propagate-values,ctx-simplify"
@@ -32,7 +52,15 @@ Without `--tactics`, the script applies the default chain: `simplify`, `propagat
 
 # Step 3: Interpret the output
 
-The script prints the simplified formula in SMT-LIB2 syntax. Subgoals are printed as separate `(assert ...)` blocks.
+Action:
+  Read the simplified formula output in SMT-LIB2 syntax.
+
+Expectation:
+  One or more `(assert ...)` blocks representing equivalent subgoals.
+
+Result:
+  A smaller formula indicates successful reduction. Pass the result to
+  **solve**, **prove**, or **optimize** as needed.
 
 # Parameters
 
diff --git a/.github/skills/solve/SKILL.md b/.github/skills/solve/SKILL.md
index a7385635b..81293fc35 100644
--- a/.github/skills/solve/SKILL.md
+++ b/.github/skills/solve/SKILL.md
@@ -7,12 +7,31 @@ Given an SMT-LIB2 formula (or a set of constraints described in natural language
 
 # Step 1: Prepare the formula
 
-If the input is already valid SMT-LIB2, use it directly. If it is a natural language description, use the **encode** skill first to produce SMT-LIB2.
+Action:
+  Convert the input to valid SMT-LIB2. If the input is natural language,
+  use the **encode** skill first.
 
-The formula must include `(check-sat)` at the end. Append `(get-model)` for satisfiable queries or `(get-unsat-core)` when named assertions are used.
+Expectation:
+  A syntactically valid SMT-LIB2 formula ending with `(check-sat)` and
+  either `(get-model)` or `(get-unsat-core)` as appropriate.
+
+Result:
+  If valid SMT-LIB2 is ready, proceed to Step 2.
+  If encoding is needed, run **encode** first and return here.
 
 # Step 2: Run Z3
 
+Action:
+  Invoke solve.py with the formula string or file path.
+
+Expectation:
+  The script pipes the formula to `z3 -in`, logs the run to
+  `.z3-agent/z3agent.db`, and prints the result.
+
+Result:
+  Output is one of `sat`, `unsat`, `unknown`, or `timeout`.
+  Proceed to Step 3 to interpret.
+
 ```bash
 python3 scripts/solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
 ```
@@ -27,14 +46,20 @@ With debug tracing:
 python3 scripts/solve.py --file problem.smt2 --debug
 ```
 
-The script pipes the formula to `z3 -in` via subprocess (no shell expansion), logs the run to `.z3-agent/z3agent.db`, and prints the result.
-
 # Step 3: Interpret the output
 
-- `sat` followed by a model: the formula is satisfiable; the model assigns concrete values to each declared constant.
-- `unsat`: no assignment exists. If `(get-unsat-core)` was used, the conflicting named assertions are listed.
-- `unknown`: Z3 could not decide within the timeout. Consider increasing the timeout or simplifying the formula.
-- `timeout`: the process was killed after the deadline. Try the **simplify** skill to reduce complexity.
+Action:
+  Parse the Z3 output to determine satisfiability and extract any model
+  or unsat core.
+
+Expectation:
+  `sat` with a model, `unsat` optionally with a core, `unknown`, or
+  `timeout`.
+
+Result:
+  On `sat`: report the model to the user.
+  On `unsat`: report the core if available.
+  On `unknown`/`timeout`: try **simplify** or increase the timeout.
 
 # Parameters
 
diff --git a/.github/skills/static-analysis/SKILL.md b/.github/skills/static-analysis/SKILL.md
index 566999813..0fd9018da 100644
--- a/.github/skills/static-analysis/SKILL.md
+++ b/.github/skills/static-analysis/SKILL.md
@@ -7,28 +7,63 @@ Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plis
 
 # Step 1: Run the analysis
 
+Action:
+  Invoke the script pointing at the CMake build directory. The script
+  runs `scan-build cmake ..` followed by `scan-build make` and writes
+  checker output to the output directory.
+
+Expectation:
+  scan-build completes within the timeout, producing plist diagnostic
+  files in the output directory (defaults to a `scan-results` subdirectory
+  of the build directory).
+
+Result:
+  On success: diagnostics are parsed and findings are printed. Proceed to
+  Step 2.
+  On failure: verify that clang and scan-build are installed and that the
+  build directory contains a valid CMake configuration.
+
 ```bash
 python3 scripts/static_analysis.py --build-dir build
 python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug
 python3 scripts/static_analysis.py --build-dir build --timeout 1800
 ```
 
-The script invokes `scan-build cmake ..` followed by `scan-build make` inside the specified build directory. Clang checker output is written to `--output-dir` (defaults to a `scan-results` subdirectory of the build directory).
-
 # Step 2: Interpret the output
 
-Each finding is printed with its source location, category, and description:
+Action:
+  Review the printed findings and the summary table grouped by category.
+
+Expectation:
+  Each finding shows its source location, category, and description.
+  The summary table ranks categories by frequency for quick triage.
+
+Result:
+  On zero findings: the codebase passes all enabled static checks.
+  On findings: prioritize by category frequency and severity. Address
+  null dereferences and use-after-free classes first.
+
+Example output:
 
 ```
 [Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read
 [Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer
 ```
 
-A summary table groups findings by category so that high-frequency classes are visible at a glance.
-
 # Step 3: Review historical findings
 
-All findings are logged to `z3agent.db`. Query them to track trends:
+Action:
+  Query z3agent.db to compare current results against prior analysis
+  runs.
+
+Expectation:
+  Queries return category counts and run history, enabling regression
+  detection across commits.
+
+Result:
+  On stable or decreasing counts: no regressions introduced.
+  On increased counts: cross-reference new findings with recent commits
+  to identify the responsible change.
 
 ```bash
 python3 ../../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC"