mirror of
https://github.com/Z3Prover/z3
synced 2026-03-14 09:09:58 +00:00
Add Copilot skill architecture with 10 skills, 2 agents, and shared infra
Introduce .github/skills/ with solve, prove, optimize, simplify, encode, explain, benchmark, memory-safety, static-analysis, and deeptest skills. Each skill follows a SKILL.md + scripts/ pattern with Python scripts backed by a shared SQLite logging library (z3db.py). Two orchestrator agents (z3-solver, z3-verifier) route requests to the appropriate skills. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
parent
1cba7cb5ee
commit
d349b93d1d
25 changed files with 2784 additions and 0 deletions
129
.github/agents/z3-solver.md
vendored
Normal file
@ -0,0 +1,129 @@
---
name: z3-solver
description: 'Z3 theorem prover assistant: satisfiability checking, validity proofs, optimization, simplification, encoding, and performance analysis.'
---

## Instructions

You are the Z3 Solver Agent, a Copilot agent for SMT solving workflows using the Z3 theorem prover. You help users formulate, solve, optimize, and interpret constraint satisfaction problems. Follow the workflow below. Use subagents for long-running skill invocations such as benchmarking.

### Workflow

1. **Understand the Request**: Determine what the user needs: a satisfiability check, a validity proof, an optimization, a simplification, an encoding from natural language, an explanation of output, or a performance analysis.

2. **Encode (if needed)**: If the user provides a problem in natural language, pseudocode, or a domain-specific formulation, translate it into SMT-LIB2 using the **encode** skill before proceeding.

3. **Solve or Transform**: Route to the appropriate skill based on the request type. Multiple skills may be chained when the task requires it (for example, encoding followed by optimization followed by explanation).

4. **Explain Results**: After solving, invoke **explain** to present the result in clear, human-readable language. Always interpret models, proofs, and optimization results for the user.

5. **Iterate**: On follow-up queries, refine the formulation or re-invoke skills with adjusted parameters. Do not re-run the full pipeline when only a narrow adjustment is needed.

### Available Skills

| # | Skill | Purpose |
|---|-------|---------|
| 1 | solve | Check satisfiability of a formula. Extract models when satisfiable. Report unsatisfiable cores when unsat. |
| 2 | prove | Establish validity of a formula by checking the negation for unsatisfiability. If the negation is unsat, the original is valid. |
| 3 | optimize | Solve constrained optimization problems. Supports minimize and maximize objectives, lexicographic and Pareto modes. |
| 4 | simplify | Apply Z3 tactics to reduce formula complexity. Useful for preprocessing, normal form conversion, and human-readable reformulation. |
| 5 | encode | Translate a problem description into SMT-LIB2 syntax. Handles sort selection, quantifier introduction, and theory annotation. |
| 6 | explain | Interpret Z3 output (models, unsat cores, proofs, optimization results, statistics) and present it in plain language. |
| 7 | benchmark | Measure solving performance. Collect statistics, compare tactic configurations, identify bottlenecks, and suggest parameter tuning. |

### Skill Dependencies

The planner respects these edges:

```
encode --> solve
encode --> prove
encode --> optimize
encode --> simplify
solve --> explain
prove --> explain
optimize --> explain
simplify --> explain
benchmark --> explain
solve --> benchmark
optimize --> benchmark
```

Skills on the left must complete before skills on the right when both appear in a pipeline. Independent skills (for example, solve and optimize on separate formulas) may run in parallel.
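
The ordering rule above can be sketched with Python's standard `graphlib` module. This is a minimal illustration, not the agent's actual planner: the `plan()` helper is hypothetical, the edges are copied from the list above, and skills returned in the same stage have no edge between them, so they could run in parallel.

```python
from graphlib import TopologicalSorter

# Edges from the "Skill Dependencies" list: (before, after).
EDGES = [
    ("encode", "solve"), ("encode", "prove"), ("encode", "optimize"),
    ("encode", "simplify"), ("solve", "explain"), ("prove", "explain"),
    ("optimize", "explain"), ("simplify", "explain"),
    ("benchmark", "explain"), ("solve", "benchmark"),
    ("optimize", "benchmark"),
]


def plan(requested):
    """Group the requested skills into stages that respect every edge.

    Edges touching a skill outside `requested` are ignored, since the
    ordering only applies when both skills appear in the pipeline.
    """
    wanted = set(requested)
    sorter = TopologicalSorter({skill: set() for skill in wanted})
    for before, after in EDGES:
        if before in wanted and after in wanted:
            sorter.add(after, before)  # `after` depends on `before`
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


print(plan(["explain", "optimize", "encode"]))
```

Here the three skills land in separate stages in the order encode, optimize, explain, while `plan(["solve", "optimize"])` returns a single stage because neither depends on the other.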

### Skill Selection

Given a user request, select skills as follows:

- "Is this formula satisfiable?" : `solve`
- "Find a model for these constraints" : `solve` then `explain`
- "Prove that P implies Q" : `encode` (if needed) then `prove` then `explain`
- "Prove this is always true" : `prove` then `explain`
- "Optimize this scheduling problem" : `encode` then `optimize` then `explain`
- "Minimize cost subject to constraints" : `optimize` then `explain`
- "Simplify this expression" : `simplify` then `explain`
- "Convert to CNF" : `simplify`
- "Translate this problem to SMT-LIB2" : `encode`
- "Why is Z3 returning unknown?" : `explain`
- "Why is this query slow?" : `benchmark` then `explain`
- "Compare these two tactic pipelines" : `benchmark` then `explain`
- "What does this model mean?" : `explain`
- "Get the unsat core" : `solve` then `explain`

When the request is ambiguous, prefer the most informative pipeline. For example, "check this formula" should invoke `solve` followed by `explain`, not `solve` alone.
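
As a rough illustration of the ambiguity rule, part of the selection list can be written as data with a default pipeline. The keywords and the `route()` helper are hypothetical; the agent itself selects skills by reading the request, not by substring matching.

```python
# Hypothetical keyword table covering a few entries from the list above.
ROUTES = {
    "satisfiable": ["solve"],
    "unsat core": ["solve", "explain"],
    "prove": ["prove", "explain"],
    "minimize": ["optimize", "explain"],
    "simplify": ["simplify", "explain"],
    "cnf": ["simplify"],
    "slow": ["benchmark", "explain"],
}

# Ambiguous requests ("check this formula") get the more informative pipeline.
DEFAULT = ["solve", "explain"]


def route(request: str) -> list[str]:
    text = request.lower()
    for keyword, pipeline in ROUTES.items():
        if keyword in text:
            return list(pipeline)
    return list(DEFAULT)
```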

### Examples

User: "Is (x > 0 and y > 0 and x + y < 1) satisfiable over the reals?"

1. **solve**: Assert the conjunction over real-valued variables. Run `(check-sat)`.
2. **explain**: If sat, present the model. If unsat, state that no assignment satisfies all three constraints simultaneously.

User: "Prove that for all integers x, if x^2 is even then x is even."

1. **encode**: Formalize the statement. Negate it: assert there exists an integer x such that x^2 is even and x is odd.
2. **prove**: Check the negation for unsatisfiability.
3. **explain**: If unsat, the original statement is valid. Present the reasoning. If sat (counterexample found), report the model and explain why the conjecture fails.
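
The **encode** step for this example might emit a query like the one below. The helper name is hypothetical; the encoding writes x^2 as `(* x x)` and tests parity with `mod`. Because the constraint is nonlinear, Z3 could answer `unknown` instead of `unsat`, in which case the error-handling guidance in this document applies.

```python
def negated_parity_query() -> str:
    """SMT-LIB2 for the negation of: forall x:Int, x^2 even => x even.

    An `unsat` answer means no counterexample exists, so the original
    statement is valid.
    """
    return "\n".join([
        "(set-option :produce-models true)",
        "(declare-const x Int)",
        "(assert (= (mod (* x x) 2) 0))  ; x^2 is even",
        "(assert (= (mod x 2) 1))        ; x is odd",
        "(check-sat)",
    ])


print(negated_parity_query())
```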

User: "Schedule five tasks on two machines to minimize makespan."

1. **encode**: Define integer variables for task assignments and start times. Encode machine capacity, precedence, and non-overlap constraints.
2. **optimize**: Minimize the makespan variable subject to the encoded constraints.
3. **explain**: Present the optimal schedule, makespan value, and any binding constraints.
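
A sketch of what the **encode** step could produce for this request, assuming identical machines, integer durations supplied by the caller, and no precedence constraints. The `makespan_query()` helper and the sample durations are hypothetical. The output uses Z3's `(minimize ...)` and `(get-objectives)` optimization commands.

```python
from itertools import combinations


def makespan_query(durations, machines=2):
    """Emit an SMT-LIB2 query that minimizes the makespan.

    m<i> is the machine of task i and s<i> its start time. Tasks sharing a
    machine must not overlap. Precedence constraints are omitted.
    """
    n = len(durations)
    out = []
    for i in range(n):
        out += [f"(declare-const m{i} Int)", f"(declare-const s{i} Int)"]
    out.append("(declare-const makespan Int)")
    for i, d in enumerate(durations):
        out.append(f"(assert (and (>= m{i} 0) (< m{i} {machines}) (>= s{i} 0)))")
        out.append(f"(assert (>= makespan (+ s{i} {d})))")  # every task ends by makespan
    for i, j in combinations(range(n), 2):
        di, dj = durations[i], durations[j]
        # Same machine implies one task finishes before the other starts.
        out.append(f"(assert (=> (= m{i} m{j}) "
                   f"(or (<= (+ s{i} {di}) s{j}) (<= (+ s{j} {dj}) s{i}))))")
    out += ["(minimize makespan)", "(check-sat)", "(get-objectives)", "(get-model)"]
    return "\n".join(out)


print(makespan_query([3, 2, 4, 1, 2]))
```

The constraints are always satisfiable (tasks can be spread out in time), so the interesting output is the objective value and the model that **explain** turns into a schedule.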

User: "Why is my bitvector query so slow?"

1. **benchmark**: Run the query with `(set-option :timeout 30000)` and collect statistics via `(get-info :all-statistics)`.
2. **explain**: Identify dominant cost centers (conflict count, propagation ratio, theory combination overhead). Suggest tactic or parameter adjustments such as `:blast_full` for bitvectors or a different `smt.relevancy` level.

### Error Handling

Z3 may return results other than `sat` or `unsat`. Handle each case as follows:

**unknown**: Z3 could not determine satisfiability, either because a resource limit was reached or because the formula falls outside a fragment Z3 can decide.
- Check if a timeout was active. If so, suggest increasing it.
- Inspect the reason with `(get-info :reason-unknown)`.
- If the reason is "incomplete," the formula may use a theory fragment that Z3 cannot decide. Suggest alternative encodings (for example, replacing nonlinear arithmetic with linearization or bit-blasting).
- If the reason is "timeout" or "max-conflicts," suggest parameter tuning: increase `:timeout`, adjust `:smt.relevancy`, or try a different tactic pipeline.
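
The triage above can be sketched as a lookup from the reason string to a follow-up action. The `suggest()` helper is hypothetical, and it matches substrings because the exact wording Z3 reports varies between versions and solvers (an assumption, not a documented contract).

```python
def suggest(reason: str) -> str:
    """Map a `(get-info :reason-unknown)` reply to a follow-up action."""
    r = reason.lower()
    if "incomplete" in r:
        return "re-encode: linearize nonlinear terms, bit-blast, or remove quantifiers"
    if "timeout" in r or "canceled" in r:
        return "raise :timeout or try a different tactic pipeline"
    if "conflict" in r:
        return "tune parameters such as smt.relevancy or change tactics"
    if "memout" in r or "memory" in r:
        return "simplify or split the problem"
    return "report the raw reason to the user"
```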

**error (syntax or sort mismatch)**: The input is malformed.
- Report the exact error message from Z3.
- Identify the offending declaration or assertion.
- Suggest a corrected encoding.

**error (resource exhaustion)**: Z3 ran out of memory or hit an internal limit.
- Suggest simplifying the problem: reduce variable count, eliminate quantifiers where possible, split into subproblems.
- Suggest incremental solving with `(push)` / `(pop)` to reuse learned information.

**unsat with no core requested**: The formula is unsatisfiable but the user may want to understand why.
- Offer to re-run with `(set-option :produce-unsat-cores true)` and named assertions to extract an explanation. Cores are not guaranteed to be minimal; enable `smt.core.minimize` when a minimal core is needed.
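
The re-run could look like the following, where each assertion carries a `:named` label so that `(get-unsat-core)` can refer to it. The `core_query()` helper and the sample constraints are hypothetical.

```python
def core_query(decls, named):
    """Build an SMT-LIB2 script whose assertions carry :named labels."""
    lines = ["(set-option :produce-unsat-cores true)", *decls]
    for label, body in named.items():
        lines.append(f"(assert (! {body} :named {label}))")
    lines += ["(check-sat)", "(get-unsat-core)"]
    return "\n".join(lines)


print(core_query(["(declare-const x Int)"],
                 {"pos": "(> x 0)", "neg": "(< x 0)", "small": "(< x 10)"}))
```

In this sample, `pos` and `neg` contradict each other while `small` is not needed for unsatisfiability; a core that is not minimized may still include it.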

### Notes

- Always validate SMT-LIB2 syntax before invoking Z3. A malformed input wastes time and produces confusing errors.
- Prefer incremental mode (`(push)` / `(pop)`) when the user is iterating on a formula.
- Use `(set-option :produce-models true)` by default for satisfiability queries.
- Use `(set-option :produce-proofs true)` when the user requests validity proofs.
- Collect statistics with `z3 -st` when performance is relevant.
- Present models in a readable table format, not raw S-expressions, unless the user requests SMT-LIB2 output.
- Never fabricate results. If a skill fails or Z3 produces an unexpected answer, report the raw output and explain what went wrong.
131
.github/agents/z3-verifier.md
vendored
Normal file
@ -0,0 +1,131 @@
---
name: z3-verifier
description: 'Z3 code quality agent: memory safety checking, static analysis, and stress testing for the Z3 codebase itself.'
---

## Instructions

You are the Z3 Verifier Agent, a Copilot agent for code quality and correctness verification of the Z3 theorem prover codebase. You do not solve SMT problems (use **z3-solver** for that). Instead, you detect bugs, enforce code quality, and stress-test Z3 internals. Follow the workflow below. Use subagents for long-running skill invocations such as fuzzing campaigns.

### Workflow

1. **Identify the Verification Goal**: Determine what the user needs: memory bug detection, static analysis findings, or stress testing results. If the request is broad ("verify this code" or "full verification pass"), run all three skills.

2. **Build the Target**: Ensure a Z3 build exists with the required instrumentation (sanitizers, debug symbols, coverage). If not, build one before proceeding.

3. **Run Verification Skills**: Invoke the appropriate skill(s). When running a full verification pass, execute all three skills and aggregate results.

4. **Report Findings**: Present results sorted by severity. Each finding should include: location (file, function, line), category, severity, and reproduction steps where applicable.

5. **Iterate**: On follow-ups, narrow scope to specific files, functions, or bug categories. Do not re-run the full pipeline unnecessarily.

### Available Skills

| # | Skill | Purpose |
|---|-------|---------|
| 1 | memory-safety | Build Z3 with AddressSanitizer (ASan), MemorySanitizer (MSan), or UndefinedBehaviorSanitizer (UBSan). Run the test suite under instrumentation to detect memory corruption, use-after-free, buffer overflows, uninitialized reads, and undefined behavior. |
| 2 | static-analysis | Run the Clang Static Analyzer over the Z3 source tree. Detects null pointer dereferences, resource leaks, dead stores, logic errors, and API misuse without executing the code. |
| 3 | deeptest | Stress-test Z3 with randomized inputs, differential testing against known-good solvers, and targeted fuzzing of parser and solver components. Detects crashes, assertion failures, and correctness regressions. |

### Skill Dependencies

```
memory-safety (independent)
static-analysis (independent)
deeptest (independent)
```

All three skills are independent and may run in parallel. None requires the output of another as input. When running a full verification pass, launch all three simultaneously via subagents.
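
The fan-out/fan-in shape of a full pass can be sketched with `concurrent.futures`. Subagents are a Copilot feature; this sketch only illustrates the pattern, and the three runner functions are hypothetical stand-ins. Threads are adequate here on the assumption that the real work happens in child processes (builds, test binaries, `scan-build`).

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins: real runners would build Z3 and run the sanitizer
# tests, scan-build, or the fuzzer, each returning a list of findings.
def run_memory_safety():
    return []


def run_static_analysis():
    return []


def run_deeptest():
    return []


SKILLS = {
    "memory-safety": run_memory_safety,
    "static-analysis": run_static_analysis,
    "deeptest": run_deeptest,
}


def full_pass():
    """Start all independent skills at once, then collect every result."""
    with ThreadPoolExecutor(max_workers=len(SKILLS)) as pool:
        futures = {name: pool.submit(fn) for name, fn in SKILLS.items()}
        results = {}
        for name, fut in futures.items():
            try:
                results[name] = fut.result()
            except Exception as exc:  # one failing skill must not hide the others
                results[name] = exc
        return results
```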

### Skill Selection

Given a user request, select skills as follows:

- "Check for memory bugs" : `memory-safety`
- "Run ASan on the test suite" : `memory-safety`
- "Find undefined behavior" : `memory-safety` (with UBSan configuration)
- "Run static analysis" : `static-analysis`
- "Find null pointer bugs" : `static-analysis`
- "Check for resource leaks" : `static-analysis`
- "Fuzz Z3" : `deeptest`
- "Stress test the parser" : `deeptest`
- "Run differential testing" : `deeptest`
- "Full verification pass" : `memory-safety` + `static-analysis` + `deeptest`
- "Verify this pull request" : `memory-safety` + `static-analysis` (scope to changed files)
- "Is this change safe?" : `memory-safety` + `static-analysis` (scope to changed files)

### Examples

User: "Check for memory bugs in the SAT solver."

1. **memory-safety**: Build Z3 with ASan enabled (`cmake -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" ..`). Run the SAT solver tests. Collect any sanitizer reports.
2. Report findings with stack traces, categorized by bug type (heap-buffer-overflow, use-after-free, stack-buffer-overflow, etc.).

User: "Run static analysis on src/ast/."

1. **static-analysis**: Invoke `scan-build` (which wraps the build) or `clang-tidy` (which reads the `compile_commands.json` database produced by configuring CMake with `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON`) over `src/ast/`.
2. Report findings sorted by severity. Include checker name, file, line, and a brief description of each issue.

User: "Fuzz the SMT-LIB2 parser."

1. **deeptest**: Generate randomized SMT-LIB2 inputs targeting the parser. Run Z3 on each input with a timeout. Collect crashes, assertion failures, and unexpected error messages.
2. Report crash-inducing inputs with minimized reproduction cases. Classify findings as crashes, assertion violations, or incorrect results.

User: "Full verification pass before the release."

1. Launch all three skills in parallel via subagents:
   - **memory-safety**: Full test suite under ASan and UBSan.
   - **static-analysis**: Full source tree scan.
   - **deeptest**: Broad fuzzing campaign across theories (arithmetic, bitvectors, arrays, strings).
2. Aggregate all findings. Deduplicate issues that appear in multiple skills (for example, a null dereference found by both static analysis and ASan). Sort by severity: Critical, High, Medium, Low.
3. Present a summary table followed by detailed findings.
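
The deduplication and severity ordering in step 2 might be sketched as below. The finding fields (`skill`, `file`, `line`, `category`, `severity`) are assumptions, and the duplicate key presumes each skill normalizes its category names first, since ASan and the static analyzer describe a null dereference in different words.

```python
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def aggregate(findings):
    """Merge cross-skill duplicates and sort by severity, then location.

    Findings sharing (file, line, category) are merged; the merged entry
    keeps the highest severity and records every skill that reported it.
    """
    merged = {}
    for f in findings:
        key = (f["file"], f["line"], f["category"])
        if key not in merged:
            merged[key] = {**f, "skills": [f["skill"]]}
            continue
        entry = merged[key]
        entry["skills"].append(f["skill"])
        if SEVERITY_ORDER[f["severity"]] < SEVERITY_ORDER[entry["severity"]]:
            entry["severity"] = f["severity"]
    return sorted(merged.values(),
                  key=lambda e: (SEVERITY_ORDER[e["severity"]], e["file"], e["line"]))
```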

### Build Configurations

Each skill may require a specific build configuration:

**memory-safety (ASan)**:
```bash
mkdir build-asan && cd build-asan
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_C_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_BUILD_TYPE=Debug
make -j$(nproc)
```

**memory-safety (UBSan)**:
```bash
mkdir build-ubsan && cd build-ubsan
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=undefined" -DCMAKE_C_FLAGS="-fsanitize=undefined" -DCMAKE_BUILD_TYPE=Debug
make -j$(nproc)
```

**static-analysis**:
```bash
mkdir build-analyze && cd build-analyze
scan-build cmake .. -DCMAKE_BUILD_TYPE=Debug
scan-build make -j$(nproc)
```

**deeptest**: Uses a standard Release build for performance, with Debug builds reserved for reproducing crashes:
```bash
mkdir build-fuzz && cd build-fuzz
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```

### Error Handling

**Build failure**: If the instrumented build fails, report the compiler errors. Common causes: sanitizer flags incompatible with certain optimization levels, or missing sanitizer runtime libraries.

**Flaky sanitizer reports**: Some sanitizer findings may be nondeterministic (especially under MSan with uninitialized memory). Re-run flagged tests three times to confirm reproducibility. Mark non-reproducible findings as "intermittent" rather than discarding them.
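
The three-attempt rule can be expressed as a small helper. `rerun` is a hypothetical callable; in practice it might re-execute the test binary and check whether its stderr contains an `AddressSanitizer` report.

```python
def classify_report(rerun, attempts=3):
    """Re-run a flagged test and label its sanitizer finding.

    Findings are never discarded: anything short of a hit on every
    attempt is labeled "intermittent". Returns (label, hits).
    """
    hits = sum(1 for _ in range(attempts) if rerun())
    label = "reproducible" if hits == attempts else "intermittent"
    return label, hits
```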

**Fuzzing timeouts**: Individual fuzz inputs that cause Z3 to exceed the timeout threshold should be collected separately and reported as potential performance regressions, not crashes.

**False positives in static analysis**: The Clang Static Analyzer may produce false positives, particularly around custom allocators and reference-counted smart pointers used in Z3. Flag likely false positives but do not suppress them without user confirmation.

### Notes

- Sanitizer builds are significantly slower than Release builds. Set timeouts to at least 3x the normal test suite duration.
- Store sanitizer reports and fuzzing artifacts in `.z3-verifier/` unless the user specifies otherwise.
- When scoping to changed files for pull request verification, use `git diff` to determine the affected source files and limit skill invocations accordingly.
- Never suppress or ignore sanitizer findings automatically. Every report should be presented to the user for triage.
- Prefer ASan as the default sanitizer. It catches the broadest class of memory errors with the lowest false-positive rate.
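
Scoping a pull-request verification to changed files, as the notes describe, might look like the sketch below. The base branch name `origin/master` and the `src/` prefix filter are assumptions; `--diff-filter=d` drops deleted files, which cannot be analyzed.

```python
import subprocess

CPP_SUFFIXES = (".cpp", ".h", ".c", ".hpp")


def changed_sources(base="origin/master", cwd="."):
    """List C/C++ sources changed on this branch relative to `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", f"{base}...HEAD"],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout
    return filter_sources(out.splitlines())


def filter_sources(paths):
    """Keep C/C++ files under src/, the only inputs the skills analyze."""
    return [p for p in paths if p.startswith("src/") and p.endswith(CPP_SUFFIXES)]
```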
74
.github/skills/README.md
vendored
Normal file
@ -0,0 +1,74 @@
# Z3 Agent Skills

Reusable, composable verification primitives for the Z3 theorem prover.
Each skill is a self-contained unit: a `SKILL.md` prompt that guides the
LLM agent, backed by a Python validation script in `scripts/`.

## Skill Catalogue

| Skill | Status | Description |
|-------|--------|-------------|
| solve | implemented | Check satisfiability of SMT-LIB2 formulas; return models or unsat cores |
| prove | implemented | Prove validity by negation and satisfiability checking |
| encode | implemented | Translate constraint problems into SMT-LIB2 or Z3 Python API code |
| simplify | implemented | Reduce formula complexity using configurable Z3 tactic chains |
| optimize | implemented | Solve constrained optimization (minimize/maximize) over numeric domains |
| explain | implemented | Parse and interpret Z3 output: models, cores, statistics, errors |
| benchmark | implemented | Measure Z3 performance and collect solver statistics |
| static-analysis | planned | Run Clang Static Analyzer on Z3 source and log structured findings |
| deeptest | planned | Deep property-based testing of Z3 internals |
| memory-safety | planned | Memory safety verification of Z3 C++ source |

## Agents

Two orchestration agents compose these skills into end-to-end workflows:

| Agent | Role |
|-------|------|
| z3-solver | Formulation, solving, and performance: encode, solve, prove, simplify, optimize, explain, benchmark |
| z3-verifier | Codebase quality: static-analysis, deeptest, memory-safety |

## Shared Infrastructure

All scripts share a common library at `shared/z3db.py` with:

* `Z3DB`: SQLite wrapper for tracking runs, formulas, findings, and interaction logs.
* `run_z3()`: Pipe SMT-LIB2 into `z3 -in` with timeout handling.
* `find_z3()`: Locate the Z3 binary across build directories and PATH.
* Parsers: `parse_model()`, `parse_stats()`, `parse_unsat_core()`.

The database schema lives in `shared/schema.sql`.
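
For orientation, a helper in the spirit of `run_z3()` might look like the sketch below. It is not the code in `shared/z3db.py`; the function names are hypothetical, and the returned keys simply mirror the ones `benchmark.py` reads (`result`, `stdout`, `duration_ms`, `exit_code`).

```python
import subprocess
import time


def run_z3_sketch(smtlib2, z3_bin="z3", timeout=60, extra_args=()):
    """Pipe an SMT-LIB2 script into `z3 -in` and summarize the outcome."""
    cmd = [z3_bin, *extra_args, "-in"]
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, input=smtlib2, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return {"result": "timeout", "stdout": "", "exit_code": None,
                "duration_ms": int(timeout * 1000)}
    return {
        "result": first_verdict(proc.stdout),
        "stdout": proc.stdout,
        "exit_code": proc.returncode,
        "duration_ms": int((time.monotonic() - start) * 1000),
    }


def first_verdict(stdout):
    """Return the first sat/unsat/unknown line, or "error" if none appears."""
    for line in stdout.splitlines():
        if line.strip() in ("sat", "unsat", "unknown"):
            return line.strip()
    return "error"
```

Calling `run_z3_sketch(text, extra_args=["-st"])` requires a `z3` binary on `PATH`; the real `find_z3()` also searches build directories.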

## Relationship to a3/ Workflows

The `a3/` directory at the repository root contains two existing agentic workflow
prompts that predate the skill architecture:

* `a3/a3-python.md`: A3 Python Code Analysis agent (uses the a3-python pip tool
  to scan Python source, classifies findings, creates GitHub issues).
* `a3/a3-rust.md`: A3 Rust Verifier Output Analyzer (downloads a3-rust build
  artifacts, parses bug reports, creates GitHub discussions).

These workflows are complementary to the skills defined here, not replaced by
them. The a3 prompts focus on external analysis tooling and GitHub integration,
while skills focus on Z3 solver operations and their validation. Both may be
composed by the same orchestrating agent.

## Usage

Check database status and recent runs:

```
python shared/z3db.py status
python shared/z3db.py runs --skill solve --last 5
python shared/z3db.py log --run-id 12
python shared/z3db.py query "SELECT skill, COUNT(*) FROM runs GROUP BY skill"
```

Run an individual skill script directly:

```
python solve/scripts/solve.py --file problem.smt2
python encode/scripts/encode.py --validate formula.smt2
python benchmark/scripts/benchmark.py --file problem.smt2
```
48
.github/skills/benchmark/SKILL.md
vendored
Normal file
@ -0,0 +1,48 @@
---
name: benchmark
description: Measure Z3 performance on a formula or file. Collects wall-clock time, theory solver statistics, memory usage, and conflict counts. Results are logged to z3agent.db for longitudinal tracking.
---

Given an SMT-LIB2 formula or file, run Z3 with statistics enabled and report performance characteristics. This is useful for identifying performance regressions, comparing tactic strategies, and profiling theory solver workload distribution.

# Step 1: Run Z3 with statistics

```bash
python3 scripts/benchmark.py --file problem.smt2
python3 scripts/benchmark.py --file problem.smt2 --runs 5
python3 scripts/benchmark.py --formula "(declare-const x Int)..." --debug
```

The script invokes `z3 -st` and parses the `:key value` statistics block.
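
A minimal parser for that block might look like the sketch below; it is not the shared `parse_stats()`. It assumes the block starts at the first `(:` in stdout and that every value is numeric. Statistic names vary across Z3 versions and theories, and the sample values in the comment are illustrative only.

```python
import re

# Shape of the block `z3 -st` appends after the result (illustrative values):
#   (:conflicts 12
#    :decisions 40
#    :memory 19.35
#    :time 0.02)
STAT = re.compile(r":([A-Za-z0-9._-]+)\s+(-?\d+(?:\.\d+)?)")


def parse_stats_sketch(stdout):
    """Collect `:key value` pairs from Z3's statistics block as numbers."""
    start = stdout.find("(:")
    if start == -1:
        return {}
    stats = {}
    for key, value in STAT.findall(stdout[start:]):
        stats[key] = float(value) if "." in value else int(value)
    return stats
```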

# Step 2: Interpret the output

The output includes:

- wall-clock time (ms)
- result (sat/unsat/unknown/timeout)
- memory usage (MB)
- conflicts, decisions, propagations
- per-theory breakdown (arithmetic, bv, array, etc.)

The last three items come from Z3's statistics block, so the exact keys depend on the Z3 version and the theories involved.

With `--runs N`, the script runs Z3 N times, logs every run, and reports min/median/max timing and the result of the last run. Per-run statistics are stored in the database rather than printed.

# Step 3: Compare over time

Past benchmark runs are logged to `z3agent.db`. Query them:
```bash
python3 ../shared/z3db.py runs --skill benchmark --last 20
python3 ../shared/z3db.py query "SELECT smtlib2, result, stats FROM formulas WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') ORDER BY run_id DESC LIMIT 5"
```

# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula (provide this or `file`) |
| file | path | no | | path to .smt2 file (provide this or `formula`) |
| runs | int | no | 1 | number of repeated runs for timing (at least 1) |
| timeout | int | no | 60 | seconds per run |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |
74
.github/skills/benchmark/scripts/benchmark.py
vendored
Normal file
@ -0,0 +1,74 @@
#!/usr/bin/env python3
"""
benchmark.py: measure Z3 performance with statistics.

Usage:
    python benchmark.py --file problem.smt2
    python benchmark.py --file problem.smt2 --runs 5
"""

import argparse
import statistics
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_stats, setup_logging


def main():
    parser = argparse.ArgumentParser(prog="benchmark")
    parser.add_argument("--formula")
    parser.add_argument("--file")
    parser.add_argument("--runs", type=int, default=1)
    parser.add_argument("--timeout", type=int, default=60)
    parser.add_argument("--z3", default=None)
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    # With --runs 0 the loop below never executes and `result` is undefined.
    if args.runs < 1:
        parser.error("--runs must be at least 1")

    setup_logging(args.debug)

    if args.file:
        formula = Path(args.file).read_text()
    elif args.formula:
        formula = args.formula
    else:
        parser.error("provide --formula or --file")
        return

    db = Z3DB(args.db)
    timings = []

    for i in range(args.runs):
        run_id = db.start_run("benchmark", formula)
        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
                        args=["-st"], debug=args.debug)

        stats = parse_stats(result["stdout"])
        db.log_formula(run_id, formula, result["result"], stats=stats)
        db.finish_run(run_id, result["result"], result["duration_ms"],
                      result["exit_code"])
        timings.append(result["duration_ms"])

    if args.runs == 1:
        print(f"result: {result['result']}")
        print(f"time: {result['duration_ms']}ms")
        if stats:
            print("statistics:")
            for k, v in sorted(stats.items()):
                print(f" :{k} {v}")

    if args.runs > 1:
        print(f"runs: {args.runs}")
        print(f"min: {min(timings)}ms")
        print(f"median: {statistics.median(timings):.0f}ms")
        print(f"max: {max(timings)}ms")
        print(f"result: {result['result']}")

    db.close()
    sys.exit(0 if result["exit_code"] == 0 else 1)


if __name__ == "__main__":
    main()
70
.github/skills/deeptest/SKILL.md
vendored
Normal file
@ -0,0 +1,70 @@
---
name: deeptest
description: Generate stress tests and differential tests for Z3 theories. Creates random or structured SMT-LIB2 formulas, runs them through Z3, and checks for crashes, assertion failures, or result inconsistencies. Inspired by fuzzing and metamorphic testing approaches applied to SMT solvers.
---

Given a strategy and count, generate SMT-LIB2 formulas targeting Z3 internals and report anomalies. Strategies range from pure random generation to structured metamorphic and cross-theory combinations. Every formula and finding is logged to z3agent.db.

# Step 1: Choose a strategy and run

```bash
python3 scripts/deeptest.py --strategy random --count 100 --seed 42
python3 scripts/deeptest.py --strategy metamorphic --seed-file base.smt2 --count 50
python3 scripts/deeptest.py --strategy cross-theory --theories "LIA,BV" --count 80
python3 scripts/deeptest.py --strategy incremental --count 60 --debug
```

Available strategies:

- `random`: generate formulas with random declarations (Int, Bool, BitVec), random arithmetic and boolean assertions, and check-sat.
- `metamorphic`: start from a base formula (generated or loaded from file), apply equisatisfiable transformations (tautology insertion, double negation, assertion duplication), and verify the result stays consistent.
- `cross-theory`: combine multiple theories (LIA, Bool, BV) in a single formula with bridging constraints to stress theory combination.
- `incremental`: generate push/pop sequences with per-frame assertions to stress incremental solving.

# Step 2: Interpret the output

The script prints a summary after completion:

```
strategy: random
seed: 42
formulas: 100
anomalies: 2
elapsed: 4500ms
```

A nonzero anomaly count means the run detected crashes (nonzero exit code), assertion failures in stderr, solver errors, or result disagreements between a base formula and its metamorphic variants.
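
The four anomaly kinds can be sketched as a per-run classifier. The label names and the `find_anomalies()` helper are hypothetical, and looking for the word "assertion" in stderr is an assumption about how failures surface in debug builds. An `unknown` verdict is deliberately not treated as a disagreement, since it is a legal answer for either formula.

```python
VERDICTS = ("sat", "unsat", "unknown")


def find_anomalies(exit_code, stdout, stderr, base_result=None):
    """Label one Z3 run with the anomaly kinds listed above.

    base_result is the verdict of the unmodified formula when checking a
    metamorphic variant, or None for other strategies.
    """
    labels = []
    verdict = next((ln.strip() for ln in stdout.splitlines()
                    if ln.strip() in VERDICTS), None)
    if exit_code != 0:
        labels.append("crash")
    if "assertion" in stderr.lower():
        labels.append("assertion-failure")
    if "(error" in stdout:
        labels.append("solver-error")
    if (base_result in ("sat", "unsat") and verdict in ("sat", "unsat")
            and verdict != base_result):
        labels.append("result-disagreement")
    return labels
```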

# Step 3: Inspect findings

Findings are logged to `z3agent.db` with category, severity, and details:

```bash
python3 ../shared/z3db.py query "SELECT category, severity, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='deeptest') ORDER BY finding_id DESC LIMIT 20"
```

Each finding includes the formula index, exit code, and a stderr excerpt for triage.

# Step 4: Reproduce

Use the `--seed` parameter, together with the same strategy, count, and Z3 binary, to reproduce a run exactly:

```bash
python3 scripts/deeptest.py --strategy random --count 100 --seed 42
```

The seed is printed in every run summary and logged in the run record.
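
Seeded reproducibility relies on drawing every random choice from one generator created from the seed, rather than from the global `random` state. The sketch below shows the idea with a hypothetical `tiny_formula()` generator; it is not the script's own generator.

```python
import random


def tiny_formula(seed, num_vars=3):
    """Generate a small random SMT-LIB2 formula from a private seeded RNG."""
    rng = random.Random(seed)  # all choices below depend only on `seed`
    lines = [f"(declare-const v{i} Int)" for i in range(num_vars)]
    for _ in range(num_vars):
        a, b = rng.sample(range(num_vars), 2)
        op = rng.choice([">", "<", ">=", "<=", "="])
        lines.append(f"(assert ({op} (+ v{a} v{b}) {rng.randint(-10, 10)}))")
    lines.append("(check-sat)")
    return "\n".join(lines)


assert tiny_formula(42) == tiny_formula(42)
```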

# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| strategy | string | no | random | test generation strategy: random, metamorphic, cross-theory, incremental |
| count | int | no | 50 | number of formulas to generate |
| seed | int | no | clock | random seed for reproducibility |
| seed-file | path | no | | base .smt2 file for metamorphic strategy |
| theories | string | no | LIA,BV | comma-separated theories for cross-theory strategy |
| timeout | int | no | 10 | per-formula Z3 timeout in seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |
393
.github/skills/deeptest/scripts/deeptest.py
vendored
Normal file
@ -0,0 +1,393 @@
#!/usr/bin/env python3
"""
deeptest.py: generate and run stress tests for Z3.

Usage:
    python deeptest.py --strategy random --count 100
    python deeptest.py --strategy metamorphic --seed-file base.smt2
    python deeptest.py --strategy cross-theory --theories "LIA,BV" --debug
"""

import argparse
import logging
import random
import sys
import time
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, setup_logging

log = logging.getLogger("deeptest")

# Sort and operator tables

THEORY_SORTS = {
    "LIA": "Int",
    "Bool": "Bool",
    "BV": "(_ BitVec 32)",
}

INT_ARITH = ["+", "-", "*"]
INT_CMP = [">", "<", ">=", "<=", "="]
BV_ARITH = ["bvadd", "bvsub", "bvand", "bvor", "bvxor"]
BV_CMP = ["bvslt", "bvsgt", "bvsle", "bvsge", "="]

# Assertion generators (one per sort)


def _int_lit(n):
    """SMT-LIB2 numerals are non-negative; write negative values as (- n)."""
    return str(n) if n >= 0 else f"(- {-n})"


def _int_assertion(rng, vs):
    if len(vs) < 2:
        return f"(assert ({rng.choice(INT_CMP)} {vs[0]} {_int_lit(rng.randint(-10, 10))}))"
    a, b = rng.sample(vs, 2)
    return f"(assert ({rng.choice(INT_CMP)} ({rng.choice(INT_ARITH)} {a} {b}) {_int_lit(rng.randint(-10, 10))}))"


def _bool_assertion(rng, vs):
    if len(vs) == 1:
        return f"(assert {vs[0]})" if rng.random() < 0.5 else f"(assert (not {vs[0]}))"
    a, b = rng.sample(vs, 2)
    return f"(assert ({rng.choice(['and', 'or', '=>'])} {a} {b}))"


def _bv_assertion(rng, vs):
    lit = f"(_ bv{rng.randint(0, 255)} 32)"
    if len(vs) < 2:
        return f"(assert ({rng.choice(BV_CMP)} {vs[0]} {lit}))"
    a, b = rng.sample(vs, 2)
    return f"(assert ({rng.choice(BV_CMP)} ({rng.choice(BV_ARITH)} {a} {b}) {lit}))"


SORT_ASSERTION = {
    "Int": _int_assertion,
    "Bool": _bool_assertion,
    "(_ BitVec 32)": _bv_assertion,
}


def _random_assertion(rng, vars_by_sort):
    """Pick a populated sort and emit one random assertion."""
    available = [s for s in vars_by_sort if vars_by_sort[s]]
    if not available:
        return None
    sort = rng.choice(available)
    return SORT_ASSERTION[sort](rng, vars_by_sort[sort])

# Formula generators


def gen_random_formula(rng, num_vars=5, num_assertions=5):
    """Random declarations, random assertions, single check-sat."""
    lines = []
    vars_by_sort = {}
    sorts = list(THEORY_SORTS.values())

    for i in range(num_vars):
        sort = rng.choice(sorts)
        name = f"v{i}"
        lines.append(f"(declare-const {name} {sort})")
        vars_by_sort.setdefault(sort, []).append(name)

    for _ in range(num_assertions):
        a = _random_assertion(rng, vars_by_sort)
        if a:
            lines.append(a)

    lines.append("(check-sat)")
    return "\n".join(lines)


def gen_metamorphic_variant(rng, base_formula):
    """Apply an equisatisfiable transformation to a formula.

    Transformations:
        tautology  : insert (assert true) before check-sat
        double_neg : wrap one assertion body in double negation
        duplicate  : repeat an existing assertion

    Only single-line assertions with balanced parentheses are rewritten, so
    multi-line assertions in a user-supplied seed file are left intact.
    """
    lines = base_formula.strip().split("\n")
    transform = rng.choice(["tautology", "double_neg", "duplicate"])
    assertion_idxs = [i for i, l in enumerate(lines)
                      if l.strip().startswith("(assert ")
                      and l.strip().endswith(")")
                      and l.count("(") == l.count(")")]

    if transform == "tautology":
        pos = next((i for i, l in enumerate(lines) if "check-sat" in l),
                   len(lines))
        lines.insert(pos, "(assert true)")

    elif transform == "double_neg" and assertion_idxs:
        idx = rng.choice(assertion_idxs)
        body = lines[idx].strip()
        inner = body[len("(assert "):-1]
        lines[idx] = f"(assert (not (not {inner})))"

    elif transform == "duplicate" and assertion_idxs:
        idx = rng.choice(assertion_idxs)
        lines.insert(idx + 1, lines[idx])

    return "\n".join(lines)


def gen_cross_theory_formula(rng, theories, num_vars=4, num_assertions=6):
    """Combine variables from several theories; add an Int/Bool bridging
    constraint when both sorts are present."""
    lines = []
    vars_by_sort = {}
    sorts = [THEORY_SORTS[t] for t in theories if t in THEORY_SORTS]
    if not sorts:
        sorts = list(THEORY_SORTS.values())

    for i in range(num_vars):
        sort = sorts[i % len(sorts)]
        name = f"v{i}"
        lines.append(f"(declare-const {name} {sort})")
        vars_by_sort.setdefault(sort, []).append(name)

    for _ in range(num_assertions):
        a = _random_assertion(rng, vars_by_sort)
        if a:
            lines.append(a)

    # Bridge Int and Bool when both present (the default LIA,BV selection
    # has no Bool variables, so no bridge is added there)
    int_vs = vars_by_sort.get("Int", [])
    bool_vs = vars_by_sort.get("Bool", [])
    if int_vs and bool_vs:
        iv = rng.choice(int_vs)
        bv = rng.choice(bool_vs)
        lines.append(f"(assert (= {bv} (> {iv} 0)))")

    lines.append("(check-sat)")
    return "\n".join(lines)


def gen_incremental_formula(rng, num_frames=3, num_vars=4,
                            asserts_per_frame=3):
    """Push/pop sequence: all variables declared globally, assertions scoped."""
    lines = []
    vars_by_sort = {}
    sorts = list(THEORY_SORTS.values())

    for i in range(num_vars):
        sort = rng.choice(sorts)
        name = f"v{i}"
        lines.append(f"(declare-const {name} {sort})")
        vars_by_sort.setdefault(sort, []).append(name)

    for _ in range(num_frames):
        lines.append("(push 1)")
        for _ in range(asserts_per_frame):
            a = _random_assertion(rng, vars_by_sort)
            if a:
                lines.append(a)
        lines.append("(check-sat)")
        lines.append("(pop 1)")

    lines.append("(check-sat)")
    return "\n".join(lines)

# Anomaly detection


def classify_result(result):
    """Return an anomaly category string or None if the result looks normal.

    Assertion violations usually also end with a nonzero exit code, so they
    are checked first; otherwise they would always be reported as crashes.
    """
    if "assertion" in result["stderr"].lower():
        return "assertion_failure"
    if result["exit_code"] != 0 and result["result"] != "timeout":
        return "crash"
    if result["result"] == "error":
        return "error"
    return None

# Strategy runners


def run_random(args, rng, db, run_id):
    anomalies = 0
    for i in range(args.count):
        formula = gen_random_formula(rng, rng.randint(2, 8),
                                     rng.randint(1, 10))
        log.debug("formula %d:\n%s", i, formula)
        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
                        debug=args.debug)
        db.log_formula(run_id, formula, result["result"])

        cat = classify_result(result)
        if cat:
            anomalies += 1
            db.log_finding(
                run_id, cat,
                f"random formula #{i}: {cat} (exit={result['exit_code']})",
                severity="high" if cat == "crash" else "medium",
                details={"formula_index": i,
                         "exit_code": result["exit_code"],
                         "stderr": result["stderr"][:500]})
            log.warning("anomaly in formula %d: %s", i, cat)
    return anomalies


def run_metamorphic(args, rng, db, run_id):
    if args.seed_file:
        base = Path(args.seed_file).read_text()
    else:
        base = gen_random_formula(rng, num_vars=4, num_assertions=3)

    base_out = run_z3(base, z3_bin=args.z3, timeout=args.timeout,
                      debug=args.debug)
    base_status = base_out["result"]
    db.log_formula(run_id, base, base_status)
    log.info("base formula result: %s", base_status)

    if base_status not in ("sat", "unsat"):
        db.log_finding(run_id, "skip",
                       f"base formula not definite: {base_status}",
                       severity="info")
        return 0

    anomalies = 0
    for i in range(args.count):
        variant = gen_metamorphic_variant(rng, base)
        log.debug("variant %d:\n%s", i, variant)
        result = run_z3(variant, z3_bin=args.z3, timeout=args.timeout,
                        debug=args.debug)
        db.log_formula(run_id, variant, result["result"])

        cat = classify_result(result)
        if cat:
            anomalies += 1
            db.log_finding(
                run_id, cat,
                f"metamorphic variant #{i}: {cat}",
                severity="high",
                details={"variant_index": i,
                         "stderr": result["stderr"][:500]})
            log.warning("anomaly in variant %d: %s", i, cat)
            continue

        if result["result"] in ("sat", "unsat") \
                and result["result"] != base_status:
            anomalies += 1
            db.log_finding(
                run_id, "disagreement",
                f"variant #{i}: expected {base_status}, "
                f"got {result['result']}",
                severity="critical",
                details={"variant_index": i,
                         "expected": base_status,
                         "actual": result["result"]})
            log.warning("disagreement in variant %d: expected %s, got %s",
                        i, base_status, result["result"])
    return anomalies


def run_cross_theory(args, rng, db, run_id):
    theories = [t.strip() for t in args.theories.split(",")]
    anomalies = 0
    for i in range(args.count):
        formula = gen_cross_theory_formula(rng, theories,
                                           rng.randint(3, 8),
                                           rng.randint(2, 10))
        log.debug("cross-theory formula %d:\n%s", i, formula)
        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
                        debug=args.debug)
        db.log_formula(run_id, formula, result["result"])

        cat = classify_result(result)
        if cat:
            anomalies += 1
            db.log_finding(
                run_id, cat,
                f"cross-theory #{i} ({','.join(theories)}): {cat}",
                severity="high" if cat == "crash" else "medium",
                details={"formula_index": i, "theories": theories,
                         "exit_code": result["exit_code"],
                         "stderr": result["stderr"][:500]})
            log.warning("anomaly in cross-theory formula %d: %s", i, cat)
    return anomalies


def run_incremental(args, rng, db, run_id):
    anomalies = 0
    for i in range(args.count):
        num_frames = rng.randint(2, 6)
        formula = gen_incremental_formula(rng, num_frames)
        log.debug("incremental formula %d:\n%s", i, formula)
        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
                        debug=args.debug)
        db.log_formula(run_id, formula, result["result"])

        cat = classify_result(result)
        if cat:
            anomalies += 1
            db.log_finding(
                run_id, cat,
                f"incremental #{i} ({num_frames} frames): {cat}",
                severity="high" if cat == "crash" else "medium",
                details={"formula_index": i, "num_frames": num_frames,
                         "exit_code": result["exit_code"],
                         "stderr": result["stderr"][:500]})
            log.warning("anomaly in incremental formula %d: %s", i, cat)
    return anomalies


STRATEGIES = {
    "random": run_random,
    "metamorphic": run_metamorphic,
    "cross-theory": run_cross_theory,
    "incremental": run_incremental,
}

# Entry point


def main():
    parser = argparse.ArgumentParser(
        prog="deeptest",
        description="Generate and run stress tests for Z3.",
    )
    parser.add_argument("--strategy", choices=list(STRATEGIES),
                        default="random",
                        help="test generation strategy")
    parser.add_argument("--count", type=int, default=50,
                        help="number of formulas to generate")
    parser.add_argument("--seed", type=int, default=None,
                        help="random seed for reproducibility")
    parser.add_argument("--seed-file", default=None,
                        help="base .smt2 file for metamorphic strategy")
    parser.add_argument("--theories", default="LIA,BV",
                        help="comma-separated theories for cross-theory")
    parser.add_argument("--timeout", type=int, default=10,
                        help="per-formula Z3 timeout in seconds")
    parser.add_argument("--z3", default=None, help="path to z3 binary")
    parser.add_argument("--db", default=None, help="path to z3agent.db")
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    seed = args.seed if args.seed is not None else int(time.time())
    rng = random.Random(seed)
    log.info("seed: %d", seed)

    db = Z3DB(args.db)
    run_id = db.start_run(
        "deeptest",
        f"strategy={args.strategy} count={args.count} seed={seed}")

    t0 = time.monotonic()
    anomalies = STRATEGIES[args.strategy](args, rng, db, run_id)
    elapsed_ms = int((time.monotonic() - t0) * 1000)

    status = "success" if anomalies == 0 else "findings"
    db.finish_run(run_id, status, elapsed_ms)

    print(f"strategy: {args.strategy}")
    print(f"seed: {seed}")
    print(f"formulas: {args.count}")
    print(f"anomalies: {anomalies}")
    print(f"elapsed: {elapsed_ms}ms")

    db.close()
    sys.exit(1 if anomalies > 0 else 0)


if __name__ == "__main__":
    main()
45
.github/skills/encode/SKILL.md
vendored
Normal file
@ -0,0 +1,45 @@
---
name: encode
description: Translate constraint problems into SMT-LIB2 or Z3 Python API code. Handles common problem classes including scheduling, graph coloring, arithmetic puzzles, and verification conditions.
---

Given a problem description (natural language, pseudocode, or a partial formulation), produce a complete, syntactically valid SMT-LIB2 encoding or Z3 Python script. The encoding should declare all variables, assert all constraints, and include the appropriate check-sat / get-model commands.

# Step 1: Identify the problem class

Common encodings:

| Problem class | Theory | Typical sorts |
|---------------|--------|---------------|
| Integer arithmetic | LIA / NIA | Int |
| Real arithmetic | LRA / NRA | Real |
| Bitvector operations | QF_BV | (_ BitVec N) |
| Arrays and maps | QF_AX | (Array Int Int) |
| Strings and regex | QF_S | String, RegLan |
| Uninterpreted functions | QF_UF | custom sorts |
| Mixed theories | AUFLIA, etc. | combination |
|
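Graph coloring is one of the problem classes named in the skill description. Here is a hypothetical helper (not part of `encode.py`) that emits a QF_LIA encoding for it. Each vertex gets an Int color in `[0, k)`, and the two endpoints of every edge must differ. It sketches the shape an encoding takes; it is not the skill's prescribed output.

```python
from typing import List, Tuple


def encode_graph_coloring(num_vertices: int, edges: List[Tuple[int, int]],
                          colors: int) -> str:
    """Return an SMT-LIB2 script asking for a proper coloring with `colors` colors."""
    lines = ["(set-logic QF_LIA)"]
    for v in range(num_vertices):
        lines.append(f"(declare-const c{v} Int)")
        # each color is an integer in [0, colors)
        lines.append(f"(assert (and (>= c{v} 0) (< c{v} {colors})))")
    for a, b in edges:
        # adjacent vertices must receive different colors
        lines.append(f"(assert (distinct c{a} c{b}))")
    lines.append("(check-sat)")
    lines.append("(get-model)")
    return "\n".join(lines)
```

The resulting text can be piped to `python3 scripts/encode.py --validate -` for the syntax check.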
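# Step 2: Generate the encoding

Write the encoding directly from the problem description, then pass it to the script for checking. `scripts/encode.py` does not generate encodings itself. It takes an existing SMT-LIB2 file (or `-` for stdin), validates it, and echoes it to stdout or to `--output`.

```bash
python3 scripts/encode.py --validate formula.smt2
python3 scripts/encode.py --validate formula.smt2 --output checked.smt2
python3 scripts/encode.py --validate - < formula.smt2
```

For `--format smtlib2` (the default), the validated output is a complete .smt2 file ready for the **solve** skill.
For `--format python`, the agent writes a standalone Z3 Python script itself. The script exits with an error for this format because it only handles SMT-LIB2.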
|
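# Step 3: Validate the encoding

The script pipes the formula through `z3 -in` with a 5-second timeout and collects any `(error "...")` messages from Z3's output. This is not a parse-only check: Z3 also executes `(check-sat)` and the other commands. A formula that parses without errors but takes longer than the timeout to solve is therefore still reported as valid. A nonzero Z3 exit code or any error message marks the formula invalid. In that case the errors are printed, followed by the numbered formula if it has at most 20 lines.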
|
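Because validation sends the whole formula to Z3, a hard formula can spend the full timeout solving. One way to get closer to a parse-only check is sketched below; it is an assumption, not what `encode.py` does. It drops single-line solving commands before validation, then pulls `(error "...")` messages out of Z3's output with the same regular expression the script uses.

```python
import re

# Commands that trigger solving or query results; everything else is kept.
SOLVING_COMMANDS = ("(check-sat", "(get-model", "(get-value", "(get-unsat-core")
ERROR_RE = re.compile(r'\(error\s+"([^"]+)"\)')


def strip_solving_commands(formula: str) -> str:
    """Drop single-line solving commands so Z3 only processes declarations and assertions."""
    kept = [l for l in formula.splitlines()
            if not l.strip().startswith(SOLVING_COMMANDS)]
    return "\n".join(kept)


def find_errors(output: str) -> list:
    """Extract the message text of every (error "...") line."""
    return ERROR_RE.findall(output)
```

Declarations and assertions are still checked for unknown symbols and sort errors. Only the search itself is skipped.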
# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| problem | string | yes | | problem description (read by the agent, not by the script) |
| validate | path | yes | | .smt2 file to check, or `-` for stdin |
| format | string | no | smtlib2 | output format: smtlib2 or python |
| output | path | no | stdout | write to file instead of stdout |
| z3 | path | no | | path to the z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |
144
.github/skills/encode/scripts/encode.py
vendored
Normal file
@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
encode.py: validate and format SMT-LIB2 encodings.

Usage:
    python encode.py --validate formula.smt2
    python encode.py --validate formula.smt2 --debug
"""

import argparse
import re
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, setup_logging


VALIDATION_TIMEOUT = 5


def read_input(path_or_stdin: str) -> str:
    """Read formula from a file path or stdin (when path is '-')."""
    if path_or_stdin == "-":
        return sys.stdin.read()
    p = Path(path_or_stdin)
    if not p.exists():
        print(f"file not found: {p}", file=sys.stderr)
        sys.exit(1)
    return p.read_text()


def find_errors(output: str) -> list:
    """Extract (error ...) messages from Z3 output."""
    return re.findall(r'\(error\s+"([^"]+)"\)', output)


def validate(formula: str, z3_bin: str = None, debug: bool = False) -> dict:
    """
    Validate an SMT-LIB2 formula by piping it through z3 -in.
    Returns a dict with 'valid' (bool), 'errors' (list), and 'raw' output.
    """
    result = run_z3(
        formula, z3_bin=z3_bin, timeout=VALIDATION_TIMEOUT, debug=debug,
    )
    errors = find_errors(result["stdout"]) + find_errors(result["stderr"])

    if result["result"] == "timeout" and not errors:
        # A timeout with no (error ...) output is not a syntax error: the
        # formula parsed, but solving exceeded the limit. That counts as
        # syntactically valid.
        return {"valid": True, "errors": [], "raw": result}

    if errors or result["exit_code"] != 0:
        return {"valid": False, "errors": errors, "raw": result}

    return {"valid": True, "errors": [], "raw": result}


def report_errors(errors: list, formula: str):
    """Print each syntax error, then the numbered formula if it is short."""
    lines = formula.splitlines()
    print(f"validation failed: {len(errors)} error(s)", file=sys.stderr)
    for err in errors:
        print(f" : {err}", file=sys.stderr)
    if len(lines) <= 20:
        print("formula:", file=sys.stderr)
        for i, line in enumerate(lines, 1):
            print(f" {i:3d} {line}", file=sys.stderr)


def write_output(formula: str, output_path: str, fmt: str):
    """Write the validated formula to a file or stdout."""
    if fmt == "python":
        print("python format output is generated by the agent, "
              "not by this script", file=sys.stderr)
        sys.exit(1)

    if output_path:
        Path(output_path).write_text(formula)
        print(f"written to {output_path}")
    else:
        print(formula)


def main():
    parser = argparse.ArgumentParser(prog="encode")
    parser.add_argument(
        "--validate",
        metavar="FILE",
        help="path to .smt2 file to validate, or '-' for stdin",
    )
    parser.add_argument(
        "--format",
        choices=["smtlib2", "python"],
        default="smtlib2",
        help="output format (default: smtlib2)",
    )
    parser.add_argument(
        "--output",
        metavar="FILE",
        default=None,
        help="write result to file instead of stdout",
    )
    parser.add_argument("--z3", default=None, help="path to z3 binary")
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if not args.validate:
        parser.error("provide --validate FILE")  # exits the program
        return

    formula = read_input(args.validate)

    db = Z3DB(args.db)
    run_id = db.start_run("encode", formula)

    result = validate(formula, z3_bin=args.z3, debug=args.debug)

    if result["valid"]:
        db.log_formula(run_id, formula, "valid")
        db.finish_run(run_id, "valid", result["raw"]["duration_ms"], 0)
        write_output(formula, args.output, args.format)
        db.close()
        sys.exit(0)
    else:
        db.log_formula(run_id, formula, "error")
        for err in result["errors"]:
            db.log_finding(run_id, "syntax", err, severity="error")
        db.finish_run(
            run_id, "error",
            result["raw"]["duration_ms"],
            result["raw"]["exit_code"],
        )
        report_errors(result["errors"], formula)
        db.close()
        sys.exit(1)


if __name__ == "__main__":
    main()
52
.github/skills/explain/SKILL.md
vendored
Normal file
@ -0,0 +1,52 @@
---
name: explain
description: Parse and interpret Z3 output for human consumption. Handles models, unsat cores, proofs, statistics, and error messages. Translates solver internals into plain-language explanations.
---

Given raw Z3 output (from the **solve**, **prove**, **optimize**, or **benchmark** skills), produce a structured explanation. This skill is for cases where the solver output is large, nested, or otherwise difficult to read directly.

# Step 1: Identify the output type

| Output contains | Explanation type |
|----------------|-----------------|
| `(define-fun ...)` blocks | model explanation |
| unsat core labels | conflict explanation |
| `:key value` statistics | performance breakdown |
| `(error ...)` | error diagnosis |
| proof terms | proof sketch |
|
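Core labels only appear when the input names its assertions and asks for a core. The sketch below uses standard SMT-LIB2 commands (`:produce-unsat-cores`, `!`/`:named`, `get-unsat-core`) to produce such input, and reads the labels back from output of the form `unsat` followed by `(label ...)`. Both helpers are hypothetical. `parse_core` only stands in for `parse_unsat_core` in `z3db.py`, whose implementation is not shown here.

```python
from typing import Dict, List


def with_named_assertions(decls: List[str], named: Dict[str, str]) -> str:
    """Build SMT-LIB2 text whose assertions carry :named labels."""
    lines = ["(set-option :produce-unsat-cores true)"]  # must precede assertions
    lines.extend(decls)
    for label, expr in named.items():
        lines.append(f"(assert (! {expr} :named {label}))")
    lines += ["(check-sat)", "(get-unsat-core)"]
    return "\n".join(lines)


def parse_core(output: str) -> List[str]:
    """Return core labels from 'unsat' followed by a parenthesized list."""
    lines = [l.strip() for l in output.strip().splitlines()]
    if not lines or lines[0] != "unsat":
        return []
    rest = " ".join(lines[1:]).strip()
    if not (rest.startswith("(") and rest.endswith(")")):
        return []
    return rest[1:-1].split()
```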
# Step 2: Run the explainer

```bash
python3 scripts/explain.py --file output.txt
python3 scripts/explain.py --stdin < output.txt
python3 scripts/explain.py --file output.txt --debug
```

The script auto-detects the output type and produces a structured summary. It checks for a model first, then errors, statistics, and finally a leading `unsat`. Pass `--type` to override the guess.
|
# Step 3: Interpret the explanation

For models:
- Each variable is listed with its value and sort
- Array and function interpretations are expanded
- Bitvector values are shown in decimal and hex

For unsat cores:
- The conflicting named assertions are listed
- The core Z3 returns is not guaranteed to be minimal; enable core minimization (for example `(set-option :smt.core.minimize true)`) when a smaller conflict set is needed

For statistics:
- Time breakdown by phase (preprocessing, solving, model construction)
- Theory solver load distribution
- Memory high-water mark
|
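Z3 prints bitvector model values as `#x` (hexadecimal) or `#b` (binary) literals rather than as plain integers. Below is a minimal sketch of rendering such a literal in decimal and hex, together with its two's-complement signed reading. The helper name is hypothetical and not part of `explain.py`.

```python
def describe_bv(literal: str) -> str:
    """Render a Z3 bitvector literal such as #x000000ff or #b0101."""
    if literal.startswith("#x"):
        digits, base, width = literal[2:], 16, 4 * len(literal[2:])
    elif literal.startswith("#b"):
        digits, base, width = literal[2:], 2, len(literal[2:])
    else:
        raise ValueError(f"not a bitvector literal: {literal}")
    value = int(digits, base)
    # two's complement: if the top bit is set, subtract 2**width
    signed = value - (1 << width) if value >> (width - 1) else value
    return f"{value} (0x{value:x}, signed {signed})"
```

For example, `#xffffffff` in a 32-bit context reads as 4294967295 unsigned and -1 signed. That distinction matters when explaining `bvslt` versus `bvult` constraints.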
# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file | path | no | | file containing Z3 output |
| stdin | flag | no | off | read from stdin |
| type | string | no | auto | force output type: model, core, stats, error |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |
128
.github/skills/explain/scripts/explain.py
vendored
Normal file
@ -0,0 +1,128 @@
#!/usr/bin/env python3
"""
explain.py: interpret Z3 output in a readable form.

Usage:
    python explain.py --file output.txt
    printf 'sat\\n(model ...)\\n' | python explain.py --stdin
"""

import argparse
import re
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, parse_model, parse_stats, parse_unsat_core, setup_logging


def detect_type(text: str) -> str:
    if "(define-fun" in text:
        return "model"
    if "(error" in text:
        return "error"
    if re.search(r':\S+\s+[\d.]+', text):
        return "stats"
    first = text.strip().split("\n")[0].strip()
    if first == "unsat":
        return "core"
    return "unknown"


def explain_model(text: str):
    model = parse_model(text)
    if not model:
        print("no model found in output")
        return
    print("satisfying assignment:")
    for name, val in model.items():
        s = str(val)
        try:
            if s.startswith("#x"):
                # bitvector literal in hex: also show the decimal value
                print(f" {name} = {s} ({int(s[2:], 16)})")
            elif s.startswith("#b"):
                # bitvector literal in binary: also show the decimal value
                print(f" {name} = {s} ({int(s[2:], 2)})")
            elif abs(int(s)) > 255:
                # large integer: also show hex; hex() keeps the sign (-0x100)
                print(f" {name} = {s} ({hex(int(s))})")
            else:
                print(f" {name} = {s}")
        except ValueError:
            print(f" {name} = {s}")


def explain_core(text: str):
    core = parse_unsat_core(text)
    if core:
        print(f"conflicting assertions ({len(core)}):")
        for label in core:
            print(f" {label}")
    else:
        print("unsat (no named assertions for core extraction)")


def explain_stats(text: str):
    stats = parse_stats(text)
    if not stats:
        print("no statistics found")
        return
    print("performance breakdown:")
    for k in sorted(stats):
        print(f" :{k} {stats[k]}")

    if "time" in stats:
        print(f"\ntotal time: {stats['time']}s")
    if "memory" in stats:
        print(f"peak memory: {stats['memory']} MB")


def explain_error(text: str):
    errors = re.findall(r'\(error\s+"([^"]+)"\)', text)
    if errors:
        print(f"Z3 reported {len(errors)} error(s):")
        for e in errors:
            print(f" {e}")
    else:
        print("error in output but could not parse message")


def main():
    parser = argparse.ArgumentParser(prog="explain")
    parser.add_argument("--file")
    parser.add_argument("--stdin", action="store_true")
    parser.add_argument("--type", choices=["model", "core", "stats", "error", "auto"],
                        default="auto")
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        text = Path(args.file).read_text()
    elif args.stdin:
        text = sys.stdin.read()
    else:
        parser.error("provide --file or --stdin")  # exits the program
        return

    output_type = args.type if args.type != "auto" else detect_type(text)

    db = Z3DB(args.db)
    run_id = db.start_run("explain", text[:200])

    if output_type == "model":
        explain_model(text)
    elif output_type == "core":
        explain_core(text)
    elif output_type == "stats":
        explain_stats(text)
    elif output_type == "error":
        explain_error(text)
    else:
        print("could not determine output type")
        print("raw output:")
        print(text[:500])

    db.finish_run(run_id, "success", 0, 0)
    db.close()


if __name__ == "__main__":
    main()
53
.github/skills/memory-safety/SKILL.md
vendored
Normal file
@ -0,0 +1,53 @@
---
name: memory-safety
description: Run AddressSanitizer and UndefinedBehaviorSanitizer on the Z3 test suite to detect memory errors, undefined behavior, and leaks. Logs each finding to z3agent.db.
---

Build Z3 with compiler-based sanitizer instrumentation, execute the test suite, and parse the runtime output for memory safety violations. Supported sanitizers are AddressSanitizer (heap and stack buffer overflows, use-after-free, double-free, memory leaks) and UndefinedBehaviorSanitizer (signed integer overflow, null pointer dereference, misaligned access, shift errors). Findings are deduplicated and stored in z3agent.db for triage and longitudinal tracking.
|
# Step 1: Configure and build

The script invokes cmake with the appropriate `-fsanitize` flags and builds the `test-z3` target. Each sanitizer uses a separate build directory to avoid flag conflicts. If a prior instrumented build exists with matching flags, only incremental compilation runs.

```bash
python3 scripts/memory_safety.py --sanitizer asan
python3 scripts/memory_safety.py --sanitizer ubsan
python3 scripts/memory_safety.py --sanitizer both
```

To reuse an existing build:
```bash
python3 scripts/memory_safety.py --sanitizer asan --skip-build --build-dir build/sanitizer-asan
```
|
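Here is a minimal sketch of how `--sanitizer both` might expand into one configure command per build directory. `cmake_commands` is a hypothetical helper. Its flags are modeled on `SANITIZER_FLAGS` in `memory_safety.py`, with `-fsanitize-recover=address` added because ASan ignores `halt_on_error=0` without it. The directory layout follows the `build/sanitizer-{name}` default from the parameter table.

```python
from pathlib import Path
from typing import Dict, List

SANITIZER_FLAGS = {
    # -fsanitize-recover=address lets ASAN_OPTIONS=halt_on_error=0 take effect
    "asan": "-fsanitize=address -fsanitize-recover=address -fno-omit-frame-pointer",
    "ubsan": "-fsanitize=undefined -fno-omit-frame-pointer",
}


def cmake_commands(choice: str, repo_root: Path) -> Dict[str, List[str]]:
    """Map each build directory to the cmake command that configures it."""
    names = ["asan", "ubsan"] if choice == "both" else [choice]
    plan = {}
    for name in names:
        flags = SANITIZER_FLAGS[name]
        build_dir = repo_root / "build" / f"sanitizer-{name}"
        plan[str(build_dir)] = [
            "cmake", str(repo_root),
            f"-DCMAKE_C_FLAGS={flags}",
            f"-DCMAKE_CXX_FLAGS={flags}",
            f"-DCMAKE_EXE_LINKER_FLAGS={flags}",  # sanitizer runtime must be linked too
            "-DCMAKE_BUILD_TYPE=Debug",
        ]
    return plan
```

Keeping one directory per sanitizer means an ASan build and a UBSan build never share a `CMakeCache.txt`, so switching between them does not force a full reconfigure.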
# Step 2: Run and collect

The test binary runs with `halt_on_error=0` so the sanitizer reports all violations rather than aborting on the first. For ASan this option only takes effect when the code is compiled with `-fsanitize-recover=address`; without that flag the run still stops at the first ASan error. The script parses `ERROR: AddressSanitizer`, `runtime error:`, and `ERROR: LeakSanitizer` patterns from the combined output, extracts source locations where available, and deduplicates by category, file, and line.

```bash
python3 scripts/memory_safety.py --sanitizer asan --timeout 900 --debug
```
|
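Below is a minimal sketch of the parse-and-deduplicate step, reusing the regular expressions from `memory_safety.py`. One assumption of this sketch: each report is attributed to the first source location found on or within 20 lines after the report line. The script's own `parse_findings` may choose locations differently.

```python
import re
from typing import List, Optional, Tuple

ASAN_ERROR = re.compile(r"ERROR:\s*AddressSanitizer:\s*(\S+)")
UBSAN_ERROR = re.compile(r":\d+:\d+:\s*runtime error:\s*(.+)")
LEAK_ERROR = re.compile(r"ERROR:\s*LeakSanitizer:")
LOCATION = re.compile(r"(\S+\.(?:cpp|c|h|hpp)):(\d+)")


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (category, message) if the line starts a sanitizer report."""
    m = ASAN_ERROR.search(line)
    if m:
        return "asan", m.group(1)
    if LEAK_ERROR.search(line):
        return "leak", "detected memory leaks"
    m = UBSAN_ERROR.search(line)
    if m:
        return "ubsan", m.group(1).strip()
    return None


def dedup_findings(output: str) -> List[dict]:
    """One finding per (category, file, line)."""
    lines = output.splitlines()
    seen, findings = set(), []
    for i, line in enumerate(lines):
        hit = classify(line)
        if not hit:
            continue
        category, message = hit
        file, lineno = None, None
        for candidate in lines[i:i + 20]:  # report line plus nearby stack frames
            loc = LOCATION.search(candidate)
            if loc:
                file, lineno = loc.group(1), int(loc.group(2))
                break
        key = (category, file, lineno)
        if key not in seen:
            seen.add(key)
            findings.append({"category": category, "message": message,
                             "file": file, "line": lineno})
    return findings
```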
# Step 3: Interpret results

- `clean`: no sanitizer violations detected.
- `findings`: one or more violations found. Each is printed with severity, category, message, and source location.
- `timeout`: the test suite did not complete within the deadline. Increase the timeout or investigate a possible infinite loop.
- `error`: build or execution failed before sanitizer output could be collected.

Query past runs:
```bash
python3 ../../shared/z3db.py runs --skill memory-safety --last 10
python3 ../../shared/z3db.py query "SELECT category, severity, file, line, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='memory-safety') ORDER BY run_id DESC LIMIT 20"
```
|
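The same query can be run from Python with the standard `sqlite3` module. The sketch below builds an in-memory database with a minimal `runs`/`findings` schema inferred from the query above. The real schema lives in `z3db.py`, is not shown here, and probably has more columns.

```python
import sqlite3

QUERY = """
SELECT category, severity, file, line, message FROM findings
WHERE run_id IN (SELECT run_id FROM runs WHERE skill = ?)
ORDER BY run_id DESC LIMIT ?
"""


def recent_findings(conn, skill="memory-safety", limit=20):
    """Latest findings for one skill, newest run first."""
    return conn.execute(QUERY, (skill, limit)).fetchall()


def demo_db():
    """In-memory database with an assumed minimal schema, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE runs (run_id INTEGER PRIMARY KEY, skill TEXT);
        CREATE TABLE findings (run_id INTEGER, category TEXT, severity TEXT,
                               file TEXT, line INTEGER, message TEXT);
        INSERT INTO runs VALUES (1, 'memory-safety'), (2, 'deeptest');
        INSERT INTO findings VALUES
            (1, 'asan', 'high', 'src/util/vector.h', 45, 'heap-use-after-free'),
            (2, 'crash', 'high', NULL, NULL, 'random formula #3: crash');
    """)
    return conn
```

Binding `skill` and `limit` as parameters avoids quoting problems when the same query is reused for other skills.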
# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sanitizer | choice | no | asan | which sanitizer to enable: asan, ubsan, or both |
| build-dir | path | no | build/sanitizer-{name} | path to the build directory |
| timeout | int | no | 600 | seconds before killing the test run |
| skip-build | flag | no | off | reuse an existing instrumented build |
| debug | flag | no | off | verbose cmake, make, and test output |
| db | path | no | .z3-agent/z3agent.db | path to the logging database |
266
.github/skills/memory-safety/scripts/memory_safety.py
vendored
Normal file
@ -0,0 +1,266 @@
#!/usr/bin/env python3
"""
memory_safety.py: run sanitizer checks on Z3 test suite.

Usage:
    python memory_safety.py --sanitizer asan
    python memory_safety.py --sanitizer ubsan --debug
"""

import argparse
import logging
import os
import re
import subprocess
import sys
import time
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, setup_logging

logger = logging.getLogger("z3agent")

# -fsanitize-recover=address is required for ASAN_OPTIONS=halt_on_error=0
# to take effect; without it ASan stops at the first error.
SANITIZER_FLAGS = {
    "asan": "-fsanitize=address -fsanitize-recover=address -fno-omit-frame-pointer",
    "ubsan": "-fsanitize=undefined -fno-omit-frame-pointer",
}

ASAN_ERROR = re.compile(r"ERROR:\s*AddressSanitizer:\s*(\S+)")
UBSAN_ERROR = re.compile(r":\d+:\d+:\s*runtime error:\s*(.+)")
LEAK_ERROR = re.compile(r"ERROR:\s*LeakSanitizer:")
LOCATION = re.compile(r"(\S+\.(?:cpp|c|h|hpp)):(\d+)")


def find_repo_root() -> Path:
    d = Path.cwd()
    for _ in range(10):
        if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
            return d
        parent = d.parent
        if parent == d:
            break
        d = parent
    logger.error("could not locate Z3 repository root")
    sys.exit(1)


def build_is_configured(build_dir: Path, sanitizer: str) -> bool:
    """Check whether the build directory already has a matching cmake config."""
    cache = build_dir / "CMakeCache.txt"
    if not cache.is_file():
        return False
    # compare the full flag string so a build configured with different
    # flags (for example, without the recover flag) is reconfigured
    expected = SANITIZER_FLAGS[sanitizer]
    return expected in cache.read_text()


def configure(build_dir: Path, sanitizer: str, repo_root: Path) -> bool:
    """Run cmake with the requested sanitizer flags."""
    flags = SANITIZER_FLAGS[sanitizer]
    build_dir.mkdir(parents=True, exist_ok=True)
    cmd = [
        "cmake", str(repo_root),
        f"-DCMAKE_C_FLAGS={flags}",
        f"-DCMAKE_CXX_FLAGS={flags}",
        f"-DCMAKE_EXE_LINKER_FLAGS={flags}",
        "-DCMAKE_BUILD_TYPE=Debug",
        "-DZ3_BUILD_TEST_EXECUTABLES=ON",
    ]
    logger.info("configuring %s build in %s", sanitizer, build_dir)
    logger.debug("cmake command: %s", " ".join(cmd))
    proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
    if proc.returncode != 0:
        logger.error("cmake failed:\n%s", proc.stderr)
        return False
    return True


def compile_tests(build_dir: Path) -> bool:
    """Compile the test-z3 target."""
    nproc = os.cpu_count() or 4
    cmd = ["make", f"-j{nproc}", "test-z3"]
    logger.info("compiling test-z3 (%d parallel jobs)", nproc)
    proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
    if proc.returncode != 0:
        logger.error("compilation failed:\n%s", proc.stderr[-2000:])
        return False
    return True


def run_tests(build_dir: Path, timeout: int) -> dict:
    """Execute test-z3 under sanitizer runtime and capture output."""
    test_bin = build_dir / "test-z3"
    if not test_bin.is_file():
        logger.error("test-z3 not found at %s", test_bin)
        return {"stdout": "", "stderr": "binary not found", "exit_code": -1,
                "duration_ms": 0}

    env = os.environ.copy()
    env["ASAN_OPTIONS"] = "detect_leaks=1:halt_on_error=0:print_stacktrace=1"
    env["UBSAN_OPTIONS"] = "print_stacktrace=1:halt_on_error=0"

    cmd = [str(test_bin), "/a"]
    logger.info("running: %s", " ".join(cmd))
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout,
            cwd=build_dir, env=env,
        )
    except subprocess.TimeoutExpired:
        ms = int((time.monotonic() - start) * 1000)
        logger.warning("test-z3 timed out after %dms", ms)
        return {"stdout": "", "stderr": "timeout", "exit_code": -1,
                "duration_ms": ms}

    ms = int((time.monotonic() - start) * 1000)
    logger.debug("exit_code=%d duration=%dms", proc.returncode, ms)
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "duration_ms": ms,
    }
|
||||
def parse_findings(output: str) -> list:
|
||||
"""Extract sanitizer error reports from combined stdout and stderr."""
|
||||
findings = []
|
||||
lines = output.split("\n")
|
||||
|
||||
for i, line in enumerate(lines):
|
||||
entry = None
|
||||
|
||||
m = ASAN_ERROR.search(line)
|
||||
if m:
|
||||
entry = {"category": "asan", "message": m.group(1),
|
||||
"severity": "high"}
|
||||
|
||||
if not entry:
|
||||
m = LEAK_ERROR.search(line)
|
||||
if m:
|
||||
entry = {"category": "leak",
|
||||
"message": "detected memory leaks",
|
||||
"severity": "high"}
|
||||
|
||||
if not entry:
|
||||
m = UBSAN_ERROR.search(line)
|
||||
if m:
|
||||
entry = {"category": "ubsan", "message": m.group(1),
|
||||
"severity": "medium"}
|
||||
|
||||
if not entry:
|
||||
continue
|
||||
|
||||
file_path, line_no = None, None
|
||||
window = lines[max(0, i - 2):i + 5]
|
||||
for ctx in window:
|
||||
loc = LOCATION.search(ctx)
|
||||
if loc and "/usr/" not in loc.group(1):
|
||||
file_path = loc.group(1)
|
||||
line_no = int(loc.group(2))
|
||||
break
|
||||
|
||||
entry["file"] = file_path
|
||||
entry["line"] = line_no
|
||||
entry["raw"] = line.strip()
|
||||
findings.append(entry)
|
||||
|
||||
return findings
|
||||
|
||||
|
||||
def deduplicate(findings: list) -> list:
|
||||
    """Drop repeated reports with the same category, file, line, and message."""
|
||||
seen = set()
|
||||
result = []
|
||||
for f in findings:
|
||||
key = (f["category"], f["file"], f["line"], f["message"])
|
||||
if key in seen:
|
||||
continue
|
||||
seen.add(key)
|
||||
result.append(f)
|
||||
return result
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="memory-safety")
|
||||
parser.add_argument("--sanitizer", choices=["asan", "ubsan", "both"],
|
||||
default="asan",
|
||||
help="sanitizer to enable (default: asan)")
|
||||
parser.add_argument("--build-dir", default=None,
|
||||
help="path to build directory")
|
||||
parser.add_argument("--timeout", type=int, default=600,
|
||||
help="seconds before killing test run")
|
||||
parser.add_argument("--skip-build", action="store_true",
|
||||
help="reuse existing instrumented build")
|
||||
parser.add_argument("--db", default=None,
|
||||
help="path to z3agent.db")
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
repo_root = find_repo_root()
|
||||
|
||||
sanitizers = ["asan", "ubsan"] if args.sanitizer == "both" else [args.sanitizer]
|
||||
all_findings = []
|
||||
|
||||
db = Z3DB(args.db)
|
||||
|
||||
for san in sanitizers:
|
||||
if args.build_dir:
|
||||
build_dir = Path(args.build_dir)
|
||||
else:
|
||||
build_dir = repo_root / "build" / f"sanitizer-{san}"
|
||||
|
||||
run_id = db.start_run("memory-safety", f"sanitizer={san}")
|
||||
db.log(f"sanitizer: {san}, build: {build_dir}", run_id=run_id)
|
||||
|
||||
if not args.skip_build:
|
||||
needs_configure = not build_is_configured(build_dir, san)
|
||||
if needs_configure and not configure(build_dir, san, repo_root):
|
||||
db.finish_run(run_id, "error", 0, exit_code=1)
|
||||
print(f"FAIL: cmake configuration failed for {san}")
|
||||
continue
|
||||
if not compile_tests(build_dir):
|
||||
db.finish_run(run_id, "error", 0, exit_code=1)
|
||||
print(f"FAIL: compilation failed for {san}")
|
||||
continue
|
||||
|
||||
result = run_tests(build_dir, args.timeout)
|
||||
combined = result["stdout"] + "\n" + result["stderr"]
|
||||
findings = deduplicate(parse_findings(combined))
|
||||
|
||||
for f in findings:
|
||||
db.log_finding(
|
||||
run_id,
|
||||
category=f["category"],
|
||||
message=f["message"],
|
||||
severity=f["severity"],
|
||||
file=f["file"],
|
||||
line=f["line"],
|
||||
details={"raw": f["raw"]},
|
||||
)
|
||||
|
||||
status = "clean" if not findings else "findings"
|
||||
if result["exit_code"] == -1:
|
||||
status = "timeout" if "timeout" in result["stderr"] else "error"
|
||||
|
||||
db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])
|
||||
all_findings.extend(findings)
|
||||
print(f"{san}: {len(findings)} finding(s), {result['duration_ms']}ms")
|
||||
|
||||
if all_findings:
|
||||
print(f"\nTotal: {len(all_findings)} finding(s)")
|
||||
for f in all_findings:
|
||||
loc = f"{f['file']}:{f['line']}" if f["file"] else "unknown location"
|
||||
print(f" [{f['severity']}] {f['category']}: {f['message']} at {loc}")
|
||||
db.close()
|
||||
sys.exit(1)
|
||||
else:
|
||||
print("\nNo sanitizer findings.")
|
||||
db.close()
|
||||
sys.exit(0)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
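`parse_findings` attributes each report to the first source location in a small window around it, skipping system headers. The standalone sketch below reproduces that lookup on a sample report, so the behavior can be checked without an instrumented build. `LOCATION` is copied from the script. `ASAN_HEADER` is a hypothetical stand-in for the script's `ASAN_ERROR` pattern, which is defined outside this excerpt. The report text is illustrative, not captured output.

```python
import re

# LOCATION is copied from the memory-safety script. ASAN_HEADER is a
# hypothetical stand-in for its ASAN_ERROR pattern (not shown here).
LOCATION = re.compile(r"(\S+\.(?:cpp|c|h|hpp)):(\d+)")
ASAN_HEADER = re.compile(r"ERROR: AddressSanitizer: (\S+)")

# Illustrative report in AddressSanitizer's usual layout (not real output).
SAMPLE = """\
==4242==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010
READ of size 4 at 0x602000000010 thread T0
    #0 0x7f3a in std::vector<int>::operator[] /usr/include/c++/11/bits/stl_vector.h:1061:9
    #1 0x4f5b in smt::context::propagate /home/dev/z3/src/smt/smt_context.cpp:512:17
    #2 0x4f6c in main /home/dev/z3/src/test/main.cpp:42:3
"""


def locate(lines, i):
    """Return (file, line) for the first non-system frame near lines[i].

    Uses the same window as parse_findings: two lines before the report
    line through four lines after it. Paths containing /usr/ are skipped.
    """
    for ctx in lines[max(0, i - 2):i + 5]:
        loc = LOCATION.search(ctx)
        if loc and "/usr/" not in loc.group(1):
            return loc.group(1), int(loc.group(2))
    return None, None


lines = SAMPLE.split("\n")
for i, line in enumerate(lines):
    m = ASAN_HEADER.search(line)
    if m:
        # Report category followed by the attributed file and line.
        print(m.group(1), *locate(lines, i))
```

One consequence of the fixed window: if the first project frame appears more than four lines after the report line, no location is found and the finding is logged with `file` and `line` set to `None`.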
48
.github/skills/optimize/SKILL.md
vendored
Normal file
|
|
@@ -0,0 +1,48 @@
|
|||
---
name: optimize
description: Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains.
---

Given a set of constraints and an objective function, find the optimal value. Z3 supports both hard constraints (must hold) and soft constraints (weighted preferences), as well as lexicographic multi-objective optimization.

# Step 1: Formulate the problem

Declare the variables, assert the constraints, state the objective with `(minimize ...)` or `(maximize ...)`, and end with `(check-sat)` and `(get-model)`.

Example: minimize `x + y` subject to `x >= 1`, `y >= 2`, `x + y <= 10`:

```smtlib
(declare-const x Int)
(declare-const y Int)
(assert (>= x 1))
(assert (>= y 2))
(assert (<= (+ x y) 10))
(minimize (+ x y))
(check-sat)
(get-model)
```

# Step 2: Run the optimizer

```bash
python3 scripts/optimize.py --file scheduling.smt2
python3 scripts/optimize.py --formula "<inline smt-lib2>" --debug
```

# Step 3: Interpret the output

- `sat` with a model: the optimal assignment satisfying all constraints.
- `unsat`: the constraints are contradictory; no feasible solution exists.
- `unknown` or `timeout`: Z3 could not determine optimality.
- `error`: Z3 did not print a solver result, for example because the formula failed to parse.

The script prints the solver result followed by the model assignment. It does not print the objective value separately: evaluate the objective on the model, or add `(get-objectives)` after `(check-sat)` and rerun with `--debug`, which logs Z3's raw output.

# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula with minimize/maximize |
| file | path | no | | path to .smt2 file |
| timeout | int | no | 60 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

Either `formula` or `file` must be provided.
|
||||
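Step 1 shows a single objective. For the soft constraints and multi-objective mode mentioned at the top, a script can be assembled along the lines below. `build_opt_script` is a hypothetical helper, not part of `optimize.py`; it relies on Z3's `assert-soft ... :weight` command and the `opt.priority` option (`lex`, `pareto`, or `box`), and appends `(get-objectives)` so Z3 also reports objective values.

```python
def build_opt_script(decls, hard, soft=(), objectives=(), priority="lex"):
    """Assemble an SMT-LIB2 optimization script for `z3 -in`.

    decls:      (name, sort) pairs, e.g. ("x", "Int")
    hard:       constraints that must hold
    soft:       (constraint, weight) pairs Z3 tries to satisfy
    objectives: ("minimize" | "maximize", term) pairs, ranked in order
    priority:   how several objectives are combined: lex, pareto, or box
    """
    lines = [f"(set-option :opt.priority {priority})"]
    lines += [f"(declare-const {n} {s})" for n, s in decls]
    lines += [f"(assert {c})" for c in hard]
    lines += [f"(assert-soft {c} :weight {w})" for c, w in soft]
    for kind, term in objectives:
        if kind not in ("minimize", "maximize"):
            raise ValueError(f"unknown objective kind: {kind}")
        lines.append(f"({kind} {term})")
    # get-objectives prints one (term value) pair per objective.
    lines += ["(check-sat)", "(get-objectives)", "(get-model)"]
    return "\n".join(lines)


script = build_opt_script(
    decls=[("x", "Int"), ("y", "Int")],
    hard=["(>= x 1)", "(>= y 2)", "(<= (+ x y) 10)"],
    soft=[("(> x y)", 3)],
    objectives=[("minimize", "(+ x y)"), ("maximize", "x")],
)
print(script)
```

The resulting string can be passed to `optimize.py --formula`. The first output line is still the `check-sat` answer, because `set-option` prints nothing unless `:print-success` is enabled.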
60
.github/skills/optimize/scripts/optimize.py
vendored
Normal file
|
|
@@ -0,0 +1,60 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
optimize.py: solve constrained optimization problems via Z3.
|
||||
|
||||
Usage:
|
||||
python optimize.py --file scheduling.smt2
|
||||
python optimize.py --formula "(declare-const x Int)..." --debug
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, run_z3, parse_model, setup_logging
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="optimize")
|
||||
parser.add_argument("--formula")
|
||||
parser.add_argument("--file")
|
||||
parser.add_argument("--timeout", type=int, default=60)
|
||||
parser.add_argument("--z3", default=None)
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if args.file:
|
||||
formula = Path(args.file).read_text()
|
||||
elif args.formula:
|
||||
formula = args.formula
|
||||
else:
|
||||
parser.error("provide --formula or --file")
|
||||
return
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("optimize", formula)
|
||||
|
||||
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
|
||||
|
||||
model = parse_model(result["stdout"]) if result["result"] == "sat" else None
|
||||
|
||||
db.log_formula(run_id, formula, result["result"],
|
||||
str(model) if model else None)
|
||||
db.finish_run(run_id, result["result"], result["duration_ms"],
|
||||
result["exit_code"])
|
||||
|
||||
print(result["result"])
|
||||
if model:
|
||||
for name, val in model.items():
|
||||
print(f" {name} = {val}")
|
||||
|
||||
db.close()
|
||||
sys.exit(0 if result["exit_code"] == 0 else 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
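`optimize.py` prints the model but not the objective value. When `(get-objectives)` follows `(check-sat)`, Z3 reports each objective as a `(term value)` pair inside an `(objectives ...)` block. The sketch below extracts those pairs with a small s-expression reader. `read_sexprs` and `objective_values` are hypothetical names, the sample output is illustrative, and string literals (which may contain parentheses) are not handled.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def read_sexprs(text):
    """Parse text into nested lists of atom strings (string literals unsupported)."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            inner = stack.pop()
            stack[-1].append(inner)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def to_smt(expr):
    """Render a parsed s-expression back to single-line SMT-LIB2 text."""
    if isinstance(expr, list):
        return "(" + " ".join(to_smt(e) for e in expr) + ")"
    return expr


def objective_values(stdout):
    """Map each objective term to its value from an (objectives ...) block."""
    values = {}
    for top in read_sexprs(stdout):
        if isinstance(top, list) and top[:1] == ["objectives"]:
            for entry in top[1:]:
                if isinstance(entry, list) and len(entry) == 2:
                    term, value = entry
                    values[to_smt(term)] = to_smt(value)
    return values


# Illustrative stdout for the Step 1 example with (get-objectives) added.
SAMPLE = """sat
(objectives
 ((+ x y) 3)
)
(
  (define-fun y () Int
    2)
  (define-fun x () Int
    1)
)
"""
print(objective_values(SAMPLE))
```

Inside `optimize.py`, `objective_values(result["stdout"])` could be printed next to the model.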
54
.github/skills/prove/SKILL.md
vendored
Normal file
|
|
@@ -0,0 +1,54 @@
|
|||
---
name: prove
description: Prove validity of logical statements by negation and satisfiability checking. If the negation is unsatisfiable, the original statement is valid. Otherwise a counterexample is returned.
---

Given a conjecture (an SMT-LIB2 assertion or a natural language claim), determine whether it holds universally. The method is standard: negate the conjecture and check satisfiability. If the negation is unsatisfiable, the original is valid. If satisfiable, the model is a counterexample.

# Step 1: Prepare the negated formula

Wrap the conjecture in `(assert (not ...))` and append `(check-sat)` and `(get-model)`. With `--conjecture`, the script adds this wrapper, and the `declare-const` lines from `--vars`, itself.

Example: to prove that `(> x 3)` implies `(> x 1)`:

```smtlib
(declare-const x Int)
(assert (not (=> (> x 3) (> x 1))))
(check-sat)
(get-model)
```

# Step 2: Run the prover

```bash
python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
```

For file input where the file contains the full negated formula:

```bash
python3 scripts/prove.py --file negated.smt2
```

With debug tracing:

```bash
python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" --debug
```

# Step 3: Interpret the output

- `valid`: the negation was unsat, so the conjecture holds for all inputs.
- `invalid` followed by a counterexample: the negation was sat; the model shows a concrete assignment where the conjecture fails.
- `unknown` or `timeout`: Z3 could not decide. The conjecture may require auxiliary lemmas or induction.
- `error`: Z3 did not print a solver result, for example because the formula failed to parse.

The script exits with status 0 for `valid` and `invalid`, and 1 otherwise.

# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| conjecture | string | no | | the assertion to prove (without negation) |
| vars | string | no | | variable declarations as "name:sort" pairs, comma-separated |
| file | path | no | | .smt2 file with the negated formula |
| timeout | int | no | 30 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

Either `conjecture` (with `vars`) or `file` must be provided.
|
||||
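The negate-and-check method can be seen in miniature without Z3. Over a finite Boolean domain, enumerating every assignment is a complete satisfiability check, so a conjecture is valid exactly when no assignment satisfies its negation. The sketch below is illustrative only; `check_validity` is a hypothetical name, and Z3 replaces enumeration with decision procedures that also cover infinite sorts such as `Int`, where enumeration is impossible.

```python
from itertools import product


def check_validity(conjecture, nvars):
    """Decide a Boolean conjecture by searching for a model of its negation.

    Returns ("valid", None) if `not conjecture` is unsatisfiable over all
    2**nvars assignments, else ("invalid", counterexample).
    """
    for assignment in product((False, True), repeat=nvars):
        if not conjecture(*assignment):  # this assignment satisfies the negation
            return "invalid", assignment
    return "valid", None


# (p and q) implies p: valid.
print(check_validity(lambda p, q: (not (p and q)) or p, 2))
# p implies q: invalid, with counterexample p=True, q=False.
print(check_validity(lambda p, q: (not p) or q, 2))
```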
82
.github/skills/prove/scripts/prove.py
vendored
Normal file
|
|
@@ -0,0 +1,82 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
prove.py: prove validity by negation + satisfiability check.
|
||||
|
||||
Usage:
|
||||
python prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
|
||||
python prove.py --file negated.smt2
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, run_z3, parse_model, setup_logging
|
||||
|
||||
|
||||
def build_formula(conjecture: str, vars_str: str) -> str:
|
||||
lines = []
|
||||
if vars_str:
|
||||
for v in vars_str.split(","):
|
||||
v = v.strip()
|
||||
name, sort = v.split(":")
|
||||
lines.append(f"(declare-const {name.strip()} {sort.strip()})")
|
||||
lines.append(f"(assert (not {conjecture}))")
|
||||
lines.append("(check-sat)")
|
||||
lines.append("(get-model)")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="prove")
|
||||
parser.add_argument("--conjecture", help="assertion to prove")
|
||||
parser.add_argument("--vars", help="variable declarations, e.g. 'x:Int,y:Bool'")
|
||||
parser.add_argument("--file", help="path to .smt2 file with negated formula")
|
||||
parser.add_argument("--timeout", type=int, default=30)
|
||||
parser.add_argument("--z3", default=None)
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if args.file:
|
||||
formula = Path(args.file).read_text()
|
||||
elif args.conjecture:
|
||||
formula = build_formula(args.conjecture, args.vars or "")
|
||||
else:
|
||||
parser.error("provide --conjecture or --file")
|
||||
return
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("prove", formula)
|
||||
|
||||
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
|
||||
|
||||
if result["result"] == "unsat":
|
||||
verdict = "valid"
|
||||
elif result["result"] == "sat":
|
||||
verdict = "invalid"
|
||||
else:
|
||||
verdict = result["result"]
|
||||
|
||||
model = parse_model(result["stdout"]) if verdict == "invalid" else None
|
||||
|
||||
db.log_formula(run_id, formula, verdict, str(model) if model else None)
|
||||
db.finish_run(run_id, verdict, result["duration_ms"], result["exit_code"])
|
||||
|
||||
print(verdict)
|
||||
if model:
|
||||
print("counterexample:")
|
||||
for name, val in model.items():
|
||||
print(f" {name} = {val}")
|
||||
|
||||
db.close()
|
||||
# Exit 0 when we successfully determined validity or invalidity;
|
||||
# exit 1 only for errors/timeouts.
|
||||
sys.exit(0 if verdict in ("valid", "invalid") else 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
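`build_formula` in `prove.py` unpacks `v.split(":")` directly, so a trailing comma in `--vars` or an entry without a sort raises a bare `ValueError` from tuple unpacking. A hardened sketch of the same step is below; `parse_vars` and `build_negated_formula` are hypothetical names, and the generated formula has the same shape as `build_formula`'s output.

```python
def parse_vars(vars_str):
    """Parse 'x:Int, y:Bool' into [("x", "Int"), ("y", "Bool")].

    Empty entries (from trailing or doubled commas) are ignored; an entry
    missing its name or sort raises ValueError naming the bad entry.
    """
    decls = []
    for entry in vars_str.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, sort = entry.partition(":")
        name, sort = name.strip(), sort.strip()
        if not sep or not name or not sort:
            raise ValueError(f"expected name:sort, got {entry!r}")
        decls.append((name, sort))
    return decls


def build_negated_formula(conjecture, vars_str):
    """Declarations, the negated conjecture, then check-sat and get-model."""
    lines = [f"(declare-const {n} {s})" for n, s in parse_vars(vars_str)]
    lines += [f"(assert (not {conjecture}))", "(check-sat)", "(get-model)"]
    return "\n".join(lines)


print(build_negated_formula("(=> (> x 3) (> x 1))", "x:Int,"))
```

In `main`, a `ValueError` from `parse_vars` could be reported with `parser.error(str(e))`, so the user sees which entry was malformed.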
57
.github/skills/shared/schema.sql
vendored
Normal file
|
|
@@ -0,0 +1,57 @@
|
|||
-- z3agent schema v1
|
||||
|
||||
PRAGMA journal_mode=WAL;
|
||||
PRAGMA foreign_keys=ON;
|
||||
|
||||
CREATE TABLE IF NOT EXISTS runs (
|
||||
run_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
skill TEXT NOT NULL,
|
||||
input_hash TEXT,
|
||||
status TEXT NOT NULL DEFAULT 'running',
|
||||
duration_ms INTEGER,
|
||||
exit_code INTEGER,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_runs_skill ON runs(skill);
|
||||
CREATE INDEX IF NOT EXISTS idx_runs_status ON runs(status);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS formulas (
|
||||
formula_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
|
||||
smtlib2 TEXT NOT NULL,
|
||||
result TEXT,
|
||||
model TEXT,
|
||||
stats TEXT,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_formulas_run ON formulas(run_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_formulas_result ON formulas(result);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS findings (
|
||||
finding_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
|
||||
category TEXT NOT NULL,
|
||||
severity TEXT,
|
||||
file TEXT,
|
||||
line INTEGER,
|
||||
message TEXT NOT NULL,
|
||||
details TEXT,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_findings_run ON findings(run_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_findings_category ON findings(category);
|
||||
CREATE INDEX IF NOT EXISTS idx_findings_severity ON findings(severity);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS interaction_log (
|
||||
log_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
run_id INTEGER REFERENCES runs(run_id) ON DELETE SET NULL,
|
||||
level TEXT NOT NULL DEFAULT 'info',
|
||||
message TEXT NOT NULL,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_log_run ON interaction_log(run_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_log_level ON interaction_log(level);
|
||||
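The `findings` table is keyed to `runs` through `run_id`, so a per-run summary is a single grouped query. The sketch below builds a reduced copy of the two tables in memory and counts findings by category and severity for the latest run of one skill. `latest_summary` is a hypothetical helper, and the inserted rows are illustrative, not real results.

```python
import sqlite3

# Subset of schema.sql (runs and findings), enough to run the query below.
SCHEMA = """
CREATE TABLE runs (
    run_id INTEGER PRIMARY KEY AUTOINCREMENT,
    skill TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'running'
);
CREATE TABLE findings (
    finding_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
    category TEXT NOT NULL,
    severity TEXT,
    message TEXT NOT NULL
);
"""

# Finding counts per category and severity for the latest run of one skill.
LATEST_RUN_SUMMARY = """
SELECT f.category, f.severity, COUNT(*) AS n
FROM findings AS f
WHERE f.run_id = (SELECT MAX(run_id) FROM runs WHERE skill = ?)
GROUP BY f.category, f.severity
ORDER BY n DESC, f.category
"""


def latest_summary(conn, skill):
    return [tuple(row) for row in conn.execute(LATEST_RUN_SUMMARY, (skill,))]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys=ON")
conn.executescript(SCHEMA)
# Illustrative rows only: two memory-safety runs, findings on the second.
conn.execute("INSERT INTO runs (skill, status) VALUES ('memory-safety', 'clean')")
conn.execute("INSERT INTO runs (skill, status) VALUES ('memory-safety', 'findings')")
conn.executemany(
    "INSERT INTO findings (run_id, category, severity, message) VALUES (2, ?, ?, ?)",
    [("asan", "high", "heap-use-after-free"),
     ("asan", "high", "heap-buffer-overflow"),
     ("ubsan", "medium", "signed integer overflow")],
)
print(latest_summary(conn, "memory-safety"))
```

Because `Z3DB.query` takes no bound parameters, the same statement can be run through `python z3db.py query "..."` with the skill name written inline as a string literal.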
328
.github/skills/shared/z3db.py
vendored
Normal file
|
|
@@ -0,0 +1,328 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
z3db: shared library and CLI for Z3 skill scripts.
|
||||
|
||||
Library usage:
|
||||
from z3db import Z3DB, find_z3, run_z3
|
||||
|
||||
CLI usage:
|
||||
python z3db.py init
|
||||
python z3db.py status
|
||||
python z3db.py log [--run-id N]
|
||||
python z3db.py runs [--skill solve] [--last N]
|
||||
python z3db.py query "SELECT ..."
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import shutil
|
||||
import sqlite3
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
|
||||
SCHEMA_PATH = Path(__file__).parent / "schema.sql"
|
||||
DEFAULT_DB_DIR = ".z3-agent"
|
||||
DEFAULT_DB_NAME = "z3agent.db"
|
||||
|
||||
logger = logging.getLogger("z3agent")
|
||||
|
||||
|
||||
def setup_logging(debug: bool = False):
|
||||
level = logging.DEBUG if debug else logging.INFO
|
||||
fmt = "[%(levelname)s] %(message)s" if not debug else \
|
||||
"[%(levelname)s %(asctime)s] %(message)s"
|
||||
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
|
||||
|
||||
|
||||
class Z3DB:
|
||||
"""SQLite handle for z3agent.db, tracks runs, formulas, findings, logs."""
|
||||
|
||||
def __init__(self, db_path: Optional[str] = None):
|
||||
if db_path is None:
|
||||
db_dir = Path(DEFAULT_DB_DIR)
|
||||
db_dir.mkdir(exist_ok=True)
|
||||
db_path = str(db_dir / DEFAULT_DB_NAME)
|
||||
self.db_path = db_path
|
||||
self.conn = sqlite3.connect(db_path)
|
||||
self.conn.execute("PRAGMA foreign_keys=ON")
|
||||
self.conn.row_factory = sqlite3.Row
|
||||
self._init_schema()
|
||||
|
||||
def _init_schema(self):
|
||||
self.conn.executescript(SCHEMA_PATH.read_text())
|
||||
|
||||
def close(self):
|
||||
self.conn.close()
|
||||
|
||||
def start_run(self, skill: str, input_text: str = "") -> int:
|
||||
input_hash = hashlib.sha256(input_text.encode()).hexdigest()[:16]
|
||||
cur = self.conn.execute(
|
||||
"INSERT INTO runs (skill, input_hash) VALUES (?, ?)",
|
||||
(skill, input_hash),
|
||||
)
|
||||
self.conn.commit()
|
||||
run_id = cur.lastrowid
|
||||
logger.debug("started run %d (skill=%s, hash=%s)", run_id, skill, input_hash)
|
||||
return run_id
|
||||
|
||||
def finish_run(self, run_id: int, status: str, duration_ms: int,
|
||||
exit_code: int = 0):
|
||||
self.conn.execute(
|
||||
"UPDATE runs SET status=?, duration_ms=?, exit_code=? WHERE run_id=?",
|
||||
(status, duration_ms, exit_code, run_id),
|
||||
)
|
||||
self.conn.commit()
|
||||
logger.debug("finished run %d: %s (%dms)", run_id, status, duration_ms)
|
||||
|
||||
def log_formula(self, run_id: int, smtlib2: str, result: str = None,
|
||||
model: str = None, stats: dict = None) -> int:
|
||||
cur = self.conn.execute(
|
||||
"INSERT INTO formulas (run_id, smtlib2, result, model, stats) "
|
||||
"VALUES (?, ?, ?, ?, ?)",
|
||||
(run_id, smtlib2, result, model,
|
||||
json.dumps(stats) if stats else None),
|
||||
)
|
||||
self.conn.commit()
|
||||
return cur.lastrowid
|
||||
|
||||
def log_finding(self, run_id: int, category: str, message: str,
|
||||
severity: str = None, file: str = None,
|
||||
line: int = None, details: dict = None) -> int:
|
||||
cur = self.conn.execute(
|
||||
"INSERT INTO findings (run_id, category, severity, file, line, "
|
||||
"message, details) VALUES (?, ?, ?, ?, ?, ?, ?)",
|
||||
(run_id, category, severity, file, line, message,
|
||||
json.dumps(details) if details else None),
|
||||
)
|
||||
self.conn.commit()
|
||||
return cur.lastrowid
|
||||
|
||||
def log(self, message: str, level: str = "info", run_id: int = None):
|
||||
"""Write to stderr and to the interaction_log table."""
|
||||
getattr(logger, level, logger.info)(message)
|
||||
self.conn.execute(
|
||||
"INSERT INTO interaction_log (run_id, level, message) "
|
||||
"VALUES (?, ?, ?)",
|
||||
(run_id, level, message),
|
||||
)
|
||||
self.conn.commit()
|
||||
|
||||
def get_runs(self, skill: str = None, last: int = 10):
|
||||
sql = "SELECT * FROM runs"
|
||||
params = []
|
||||
if skill:
|
||||
sql += " WHERE skill = ?"
|
||||
params.append(skill)
|
||||
sql += " ORDER BY run_id DESC LIMIT ?"
|
||||
params.append(last)
|
||||
return self.conn.execute(sql, params).fetchall()
|
||||
|
||||
def get_status(self) -> dict:
|
||||
rows = self.conn.execute(
|
||||
"SELECT status, COUNT(*) as cnt FROM runs GROUP BY status"
|
||||
).fetchall()
|
||||
total = sum(r["cnt"] for r in rows)
|
||||
by_status = {r["status"]: r["cnt"] for r in rows}
|
||||
last = self.conn.execute(
|
||||
"SELECT timestamp FROM runs ORDER BY run_id DESC LIMIT 1"
|
||||
).fetchone()
|
||||
return {
|
||||
"total": total,
|
||||
**by_status,
|
||||
"last_run": last["timestamp"] if last else None,
|
||||
}
|
||||
|
||||
def get_logs(self, run_id: int = None, last: int = 50):
|
||||
if run_id:
|
||||
return self.conn.execute(
|
||||
"SELECT * FROM interaction_log WHERE run_id=? "
|
||||
"ORDER BY log_id DESC LIMIT ?", (run_id, last)
|
||||
).fetchall()
|
||||
return self.conn.execute(
|
||||
"SELECT * FROM interaction_log ORDER BY log_id DESC LIMIT ?",
|
||||
(last,)
|
||||
).fetchall()
|
||||
|
||||
def query(self, sql: str):
|
||||
return self.conn.execute(sql).fetchall()
|
||||
|
||||
|
||||
def find_z3(hint: str = None) -> str:
|
||||
"""Locate the z3 binary: explicit path > build dirs > PATH."""
|
||||
candidates = []
|
||||
if hint:
|
||||
candidates.append(hint)
|
||||
|
||||
repo_root = _find_repo_root()
|
||||
if repo_root:
|
||||
for build_dir in ["build", "build/release", "build/debug"]:
|
||||
candidates.append(str(repo_root / build_dir / "z3"))
|
||||
|
||||
path_z3 = shutil.which("z3")
|
||||
if path_z3:
|
||||
candidates.append(path_z3)
|
||||
|
||||
for c in candidates:
|
||||
p = Path(c)
|
||||
if p.is_file() and os.access(p, os.X_OK):
|
||||
logger.debug("found z3: %s", p)
|
||||
return str(p)
|
||||
|
||||
logger.error("z3 binary not found. Searched: %s", candidates)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def _find_repo_root() -> Optional[Path]:
|
||||
d = Path.cwd()
|
||||
for _ in range(10):
|
||||
if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
|
||||
return d
|
||||
parent = d.parent
|
||||
if parent == d:
|
||||
break
|
||||
d = parent
|
||||
return None
|
||||
|
||||
|
||||
def run_z3(formula: str, z3_bin: str = None, timeout: int = 30,
|
||||
args: list = None, debug: bool = False) -> dict:
|
||||
"""Pipe an SMT-LIB2 formula into z3 -in, return parsed output."""
|
||||
z3_path = find_z3(z3_bin)
|
||||
cmd = [z3_path, "-in"] + (args or [])
|
||||
|
||||
logger.debug("cmd: %s", " ".join(cmd))
|
||||
logger.debug("stdin:\n%s", formula)
|
||||
|
||||
start = time.monotonic()
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
cmd, input=formula, capture_output=True, text=True,
|
||||
timeout=timeout,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
duration_ms = int((time.monotonic() - start) * 1000)
|
||||
logger.warning("z3 timed out after %dms", duration_ms)
|
||||
return {
|
||||
"stdout": "", "stderr": "timeout", "exit_code": -1,
|
||||
"duration_ms": duration_ms, "result": "timeout",
|
||||
}
|
||||
|
||||
duration_ms = int((time.monotonic() - start) * 1000)
|
||||
|
||||
logger.debug("exit_code=%d duration=%dms", proc.returncode, duration_ms)
|
||||
logger.debug("stdout:\n%s", proc.stdout)
|
||||
if proc.stderr:
|
||||
logger.debug("stderr:\n%s", proc.stderr)
|
||||
|
||||
first_line = proc.stdout.strip().split("\n")[0].strip() if proc.stdout else ""
|
||||
result = first_line if first_line in ("sat", "unsat", "unknown") else "error"
|
||||
|
||||
return {
|
||||
"stdout": proc.stdout,
|
||||
"stderr": proc.stderr,
|
||||
"exit_code": proc.returncode,
|
||||
"duration_ms": duration_ms,
|
||||
"result": result,
|
||||
}
|
||||
|
||||
|
||||
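def parse_model(stdout: str) -> Optional[dict]:
    """Pull nullary define-fun entries from a (get-model) response.

    The sort may be atomic (Int) or one parenthesized level ((_ BitVec 8));
    the value may be an atom (5, #x05, true) or a term nested at most two
    levels deep, such as (- 3) or (- (/ 1.0 3.0)). Entries whose sort or
    value nests more deeply are skipped rather than truncated.
    """
    model = {}
    for m in re.finditer(
        r'\(define-fun\s+(\S+)\s+\(\)\s+'
        r'(?:\([^()]*\)|[^\s()]+)\s+'
        r'(\((?:[^()]|\([^()]*\))*\)|[^\s()]+)',
        stdout,
    ):
        model[m.group(1)] = m.group(2).strip()
    return model if model else None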
|
||||
|
||||
|
||||
def parse_stats(stdout: str) -> Optional[dict]:
|
||||
"""Parse :key value pairs from z3 -st output."""
|
||||
stats = {}
|
||||
for m in re.finditer(r':(\S+)\s+([\d.]+)', stdout):
|
||||
key, val = m.group(1), m.group(2)
|
||||
stats[key] = float(val) if '.' in val else int(val)
|
||||
return stats if stats else None
|
||||
|
||||
|
||||
def parse_unsat_core(stdout: str) -> Optional[list]:
|
||||
for line in stdout.strip().split("\n"):
|
||||
line = line.strip()
|
||||
if line.startswith("(") and not line.startswith("(error"):
|
||||
labels = line.strip("()").split()
|
||||
if labels:
|
||||
return labels
|
||||
return None
|
||||
|
||||
|
||||
def cli():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Z3 Agent database CLI",
|
||||
prog="z3db",
|
||||
)
|
||||
parser.add_argument("--db", default=None, help="path to z3agent.db")
|
||||
parser.add_argument("--debug", action="store_true", help="verbose output")
|
||||
|
||||
sub = parser.add_subparsers(dest="command")
|
||||
|
||||
sub.add_parser("init", help="initialize the database")
|
||||
|
||||
status_p = sub.add_parser("status", help="show run summary")
|
||||
|
||||
log_p = sub.add_parser("log", help="show interaction log")
|
||||
log_p.add_argument("--run-id", type=int, help="filter by run ID")
|
||||
log_p.add_argument("--last", type=int, default=50)
|
||||
|
||||
runs_p = sub.add_parser("runs", help="list runs")
|
||||
runs_p.add_argument("--skill", help="filter by skill name")
|
||||
runs_p.add_argument("--last", type=int, default=10)
|
||||
|
||||
query_p = sub.add_parser("query", help="run raw SQL")
|
||||
query_p.add_argument("sql", help="SQL query string")
|
||||
|
||||
args = parser.parse_args()
|
||||
setup_logging(args.debug)
|
||||
|
||||
db = Z3DB(args.db)
|
||||
|
||||
if args.command == "init":
|
||||
print(f"Database initialized at {db.db_path}")
|
||||
|
||||
    elif args.command == "status":
        s = db.get_status()
        # Skills record their own status strings (sat, unsat, valid, clean,
        # findings, timeout, error, ...), so report whichever ones occur.
        counts = "".join(f" | {k}: {v}" for k, v in s.items()
                         if k not in ("total", "last_run"))
        print(f"Runs: {s['total']}{counts}"
              f" | Last: {s['last_run'] or 'never'}")
|
||||
|
||||
elif args.command == "log":
|
||||
for row in db.get_logs(args.run_id, args.last):
|
||||
print(f"[{row['level']}] {row['timestamp']} "
|
||||
f"(run {row['run_id']}): {row['message']}")
|
||||
|
||||
elif args.command == "runs":
|
||||
for row in db.get_runs(args.skill, args.last):
|
||||
print(f"#{row['run_id']} {row['skill']} {row['status']} "
|
||||
f"{row['duration_ms']}ms @ {row['timestamp']}")
|
||||
|
||||
elif args.command == "query":
|
||||
for row in db.query(args.sql):
|
||||
print(dict(row))
|
||||
|
||||
else:
|
||||
parser.print_help()
|
||||
|
||||
db.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
cli()
|
||||
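`z3db.py` provides `parse_unsat_core` but no helper that produces a core. Z3 returns one only when `:produce-unsat-cores` is enabled and the assertions carry `:named` labels. The sketch below builds such a script; `build_core_script` is a hypothetical helper, `parse_unsat_core` is copied from `z3db.py` so the block runs standalone, and the sample output strings are illustrative.

```python
def build_core_script(decls, named):
    """SMT-LIB2 script whose (get-unsat-core) reply lists conflicting labels.

    decls: (name, sort) pairs; named: (label, constraint) pairs.
    """
    lines = ["(set-option :produce-unsat-cores true)"]
    lines += [f"(declare-const {n} {s})" for n, s in decls]
    # (! F :named label) attaches a label Z3 can report in the core.
    lines += [f"(assert (! {c} :named {label}))" for label, c in named]
    lines += ["(check-sat)", "(get-unsat-core)"]
    return "\n".join(lines)


def parse_unsat_core(stdout):
    """Same logic as z3db.parse_unsat_core: first non-error parenthesized line."""
    for line in stdout.strip().split("\n"):
        line = line.strip()
        if line.startswith("(") and not line.startswith("(error"):
            labels = line.strip("()").split()
            if labels:
                return labels
    return None


script = build_core_script(
    [("x", "Int")],
    [("a1", "(> x 5)"), ("a2", "(< x 3)"), ("a3", "(> x 0)")],
)
print(script)
print(parse_unsat_core("unsat\n(a1 a2)\n"))
```

The script can be piped through `run_z3`, and `parse_unsat_core(result["stdout"])` applied when `result["result"] == "unsat"`.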
48
.github/skills/simplify/SKILL.md
vendored
Normal file
|
|
@@ -0,0 +1,48 @@
|
|||
---
name: simplify
description: Reduce formula complexity using Z3 tactic chains. Supports configurable tactic pipelines for boolean, arithmetic, and bitvector simplification.
---

Given a formula, apply a sequence of Z3 tactics to produce an equivalent but simpler form. This is useful for understanding what Z3 sees after preprocessing, debugging tactic selection, and reducing formula size before solving.

# Step 1: Choose tactics

Z3 provides dozens of tactics. Common ones:

| Tactic | What it does |
|--------|-------------|
| simplify | constant folding, algebraic identities |
| propagate-values | substitute known equalities |
| ctx-simplify | context-dependent simplification |
| elim-uncnstr | remove unconstrained variables |
| solve-eqs | eliminate variables by solving equations (Gaussian elimination) |
| bit-blast | reduce bitvectors to booleans |
| tseitin-cnf | convert to CNF |
| aig | simplify Boolean structure using and-inverter graphs |

# Step 2: Run simplification

```bash
python3 scripts/simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
python3 scripts/simplify.py --file formula.smt2 --tactics "simplify,propagate-values,ctx-simplify"
python3 scripts/simplify.py --file formula.smt2 --debug
```

Without `--tactics`, the script applies the default chain: `simplify`, `propagate-values`, `ctx-simplify`.

# Step 3: Interpret the output

The script prints the simplified formula in SMT-LIB2 syntax. Subgoals are printed as separate `(assert ...)` blocks.

# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula to simplify |
| vars | string | no | | variable declarations as "name:sort" pairs, comma-separated; used only with `formula` |
| file | path | no | | path to .smt2 file |
| tactics | string | no | simplify,propagate-values,ctx-simplify | comma-separated tactic names |
| timeout | int | no | 30 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

Either `formula` or `file` must be provided.
|
||||
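Z3 answers an `(apply ...)` command with a `(goals ...)` block holding one `(goal ...)` per subgoal; each goal lists its formulas followed by attributes such as `:precision` and `:depth`. The part of `simplify.py` that formats this reply falls outside this excerpt, so the following is only a sketch of one way to split it into per-goal formula lists and emit `(assert ...)` lines. `read_sexprs`, `to_smt`, and `goal_formulas` are hypothetical names, string literals are not handled, and the sample reply is illustrative.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def read_sexprs(text):
    """Parse text into nested lists of atom strings (string literals unsupported)."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            inner = stack.pop()
            stack[-1].append(inner)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def to_smt(expr):
    """Render a parsed s-expression back to single-line SMT-LIB2 text."""
    if isinstance(expr, list):
        return "(" + " ".join(to_smt(e) for e in expr) + ")"
    return expr


def goal_formulas(stdout):
    """Return one list of formula strings per (goal ...) in an (apply ...) reply."""
    goals = []
    for top in read_sexprs(stdout):
        if not (isinstance(top, list) and top[:1] == ["goals"]):
            continue
        for goal in top[1:]:
            if not (isinstance(goal, list) and goal[:1] == ["goal"]):
                continue
            formulas = []
            for item in goal[1:]:
                if isinstance(item, str) and item.startswith(":"):
                    break  # attributes such as :precision and :depth follow the formulas
                formulas.append(to_smt(item))
            goals.append(formulas)
    return goals


# Illustrative reply in the (goals (goal ...)) shape; not captured Z3 output.
SAMPLE = """(goals
(goal
  (> x 0)
  (<= y 3)
  :precision precise :depth 3)
)"""

for formulas in goal_formulas(SAMPLE):
    for f in formulas:
        print(f"(assert {f})")
```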
83
.github/skills/simplify/scripts/simplify.py
vendored
Normal file
|
|
@@ -0,0 +1,83 @@
|
|||
#!/usr/bin/env python3
"""
simplify.py: apply Z3 tactics to simplify an SMT-LIB2 formula.

Usage:
    python simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
    python simplify.py --file formula.smt2 --tactics "simplify,solve-eqs"
"""

import argparse
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, setup_logging


DEFAULT_TACTICS = "simplify,propagate-values,ctx-simplify"


def build_tactic_formula(base_formula: str, tactics: str) -> str:
    tactic_list = [t.strip() for t in tactics.split(",")]
    if len(tactic_list) == 1:
        tactic_expr = f"(then {tactic_list[0]} skip)"
    else:
        tactic_expr = "(then " + " ".join(tactic_list) + ")"
    return base_formula + f"\n(apply {tactic_expr})\n"


def build_formula_from_parts(formula_str: str, vars_str: str) -> str:
    lines = []
    if vars_str:
        for v in vars_str.split(","):
            v = v.strip()
            name, sort = v.split(":")
            lines.append(f"(declare-const {name.strip()} {sort.strip()})")
    lines.append(formula_str)
    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser(prog="simplify")
    parser.add_argument("--formula")
    parser.add_argument("--vars")
    parser.add_argument("--file")
    parser.add_argument("--tactics", default=DEFAULT_TACTICS)
    parser.add_argument("--timeout", type=int, default=30)
    parser.add_argument("--z3", default=None)
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        base = Path(args.file).read_text()
    elif args.formula:
        base = build_formula_from_parts(args.formula, args.vars or "")
    else:
        parser.error("provide --formula or --file")
        return

    formula = build_tactic_formula(base, args.tactics)

    db = Z3DB(args.db)
    run_id = db.start_run("simplify", formula)

    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)

    status = "success" if result["exit_code"] == 0 else "error"
    db.log_formula(run_id, formula, status)
    db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])

    print(result["stdout"])
    if result["stderr"] and result["exit_code"] != 0:
        print(result["stderr"], file=sys.stderr)

    db.close()
    sys.exit(0 if result["exit_code"] == 0 else 1)


if __name__ == "__main__":
    main()
50
.github/skills/solve/SKILL.md
vendored
Normal file
@ -0,0 +1,50 @@
---
name: solve
description: Check satisfiability of SMT-LIB2 formulas using Z3. Returns sat/unsat with models or unsat cores. Logs every invocation to z3agent.db for auditability.
---

Given an SMT-LIB2 formula (or a set of constraints described in natural language), determine whether the formula is satisfiable. If sat, extract a satisfying assignment. If unsat and the assertions are named with `(! ... :named label)`, extract the unsat core.

# Step 1: Prepare the formula

If the input is already valid SMT-LIB2, use it directly. If it is a natural language description, use the **encode** skill first to produce SMT-LIB2.

The formula must include `(check-sat)`. After it, append `(get-model)` to retrieve a satisfying assignment, or `(get-unsat-core)` when named assertions are used. Unsat cores also require `(set-option :produce-unsat-cores true)` at the top of the script.

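A minimal sketch of preparing an unsat-core query, assuming the constraints arrive as `(label, formula)` pairs; `build_core_query` is a hypothetical helper, not part of `solve.py`. It enables core production first and names each assertion with the SMT-LIB2 `(! ... :named ...)` annotation.

```python
def build_core_query(decls: list[str], labeled: list[tuple[str, str]]) -> str:
    """Assemble an SMT-LIB2 script whose unsat core refers to the given labels."""
    lines = ["(set-option :produce-unsat-cores true)"]  # set before any assertion
    lines.extend(decls)
    for label, formula in labeled:
        # The :named annotation lets the assertion appear in (get-unsat-core).
        lines.append(f"(assert (! {formula} :named {label}))")
    lines.append("(check-sat)")
    lines.append("(get-unsat-core)")
    return "\n".join(lines)


script = build_core_query(
    ["(declare-const x Int)"],
    [("pos", "(> x 0)"), ("neg", "(< x 0)")],
)
print(script)
```

Saved to a file and passed with `--file`, this script should make Z3 report `unsat` with both `pos` and `neg` in the core, since neither assertion is contradictory on its own.
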
# Step 2: Run Z3

```bash
python3 scripts/solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
```

For file input:

```bash
python3 scripts/solve.py --file problem.smt2
```

With debug tracing:

```bash
python3 scripts/solve.py --file problem.smt2 --debug
```

The script pipes the formula to `z3 -in` via subprocess (no shell expansion), logs the run to `.z3-agent/z3agent.db`, and prints the result.

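The shared `run_z3` helper lives in `z3db.py`, which is not shown here. The sketch below illustrates the same idea under stated assumptions: `run_z3_sketch` and `classify` are hypothetical stand-ins that pass an argument list (no shell) to `z3 -in`, feed the formula on stdin, enforce a timeout, and classify the first output line. The returned keys mirror the ones `solve.py` reads, but the real helper may behave differently.

```python
import shutil
import subprocess
import time
from typing import Optional


def classify(stdout: str) -> str:
    """Map the first output line to sat/unsat/unknown, else 'error'."""
    # Assumes (check-sat) produces the first line of output; an (error ...)
    # printed by an earlier command therefore classifies as 'error'.
    lines = stdout.strip().splitlines()
    first = lines[0].strip() if lines else ""
    return first if first in ("sat", "unsat", "unknown") else "error"


def run_z3_sketch(formula: str, z3_bin: Optional[str] = None, timeout: int = 30) -> dict:
    """Hypothetical stand-in for z3db.run_z3: feed `formula` to `z3 -in`."""
    z3_path = z3_bin or shutil.which("z3") or "z3"
    start = time.monotonic()
    try:
        # An argument list (not a shell string) means the formula is never
        # interpreted by a shell; it reaches z3 only through stdin.
        proc = subprocess.run([z3_path, "-in"], input=formula,
                              capture_output=True, text=True, timeout=timeout)
    except OSError as exc:  # binary missing or not executable
        return {"result": "error", "stdout": "", "stderr": str(exc),
                "exit_code": -1, "duration_ms": 0}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return {"result": "timeout", "stdout": "", "stderr": "",
                "exit_code": -1,
                "duration_ms": int((time.monotonic() - start) * 1000)}
    return {"result": classify(proc.stdout), "stdout": proc.stdout,
            "stderr": proc.stderr, "exit_code": proc.returncode,
            "duration_ms": int((time.monotonic() - start) * 1000)}
```

With `z3` on `PATH`, `run_z3_sketch("(declare-const x Int)(assert (> x 0))(check-sat)")["result"]` should be `"sat"`.
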
# Step 3: Interpret the output

- `sat` followed by `name = value` lines: the formula is satisfiable; the model assigns concrete values to each declared constant.
- `unsat`: no assignment exists. If `(get-unsat-core)` was used, the conflicting named assertions are listed after `unsat core:`.
- `unknown`: Z3 could not decide the formula, for example because it involves quantifiers or nonlinear integer arithmetic, or because a resource limit was reached. Consider increasing the timeout or simplifying the formula.
- `timeout`: the process was killed after the deadline. Try the **simplify** skill to reduce complexity.

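`solve.py` relies on `parse_model` from `z3db.py`, which is not shown. As a hedged illustration of what such a parser has to handle, the sketch below reads the model Z3 prints after `(get-model)`, a list of `(define-fun name () Sort value)` entries, and keeps only zero-argument definitions. It assumes that output layout (with or without a leading `model` symbol, which older Z3 versions print); `parse_model_sketch` is a hypothetical name, and function-valued entries are skipped.

```python
import re


def tokenize(text: str) -> list[str]:
    # Parentheses and atoms; quoted strings containing spaces are not handled.
    return re.findall(r"[()]|[^\s()]+", text)


def read_sexpr(tokens: list[str], pos: int = 0):
    """Read one s-expression starting at tokens[pos]; return (value, next_pos)."""
    if tokens[pos] == "(":
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read_sexpr(tokens, pos)
            items.append(item)
        return items, pos + 1
    return tokens[pos], pos + 1


def to_text(expr) -> str:
    if isinstance(expr, list):
        return "(" + " ".join(to_text(e) for e in expr) + ")"
    return expr


def parse_model_sketch(stdout: str) -> dict[str, str]:
    """Map each constant in a Z3 model to its value, rendered as SMT-LIB2 text."""
    lines = stdout.strip().splitlines()
    if len(lines) < 2 or lines[0].strip() != "sat":
        return {}
    model, _ = read_sexpr(tokenize("\n".join(lines[1:])))
    values = {}
    for entry in model:
        # Expected shape of a constant: ['define-fun', name, [], sort, value]
        if (isinstance(entry, list) and len(entry) == 5
                and entry[0] == "define-fun" and entry[2] == []):
            values[entry[1]] = to_text(entry[4])
    return values


out = """sat
(
  (define-fun y () Int
    (- 3))
  (define-fun x () Int
    1)
)"""
print(parse_model_sketch(out))  # {'y': '(- 3)', 'x': '1'}
```

Negative integers come back as `(- 3)` rather than `-3`, because that is how SMT-LIB2 writes them; a caller that needs Python numbers would convert them separately.
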
# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula as a string |
| file | path | no | | path to an .smt2 file |
| timeout | int | no | 30 | seconds before killing z3 |
| z3 | path | no | auto | explicit path to z3 binary |
| debug | flag | no | off | print z3 command, stdin, stdout, stderr, timing |
| db | path | no | .z3-agent/z3agent.db | path to the logging database |

Either `formula` or `file` must be provided.
66
.github/skills/solve/scripts/solve.py
vendored
Normal file
@ -0,0 +1,66 @@
#!/usr/bin/env python3
"""
solve.py: check satisfiability of an SMT-LIB2 formula via Z3.

Usage:
    python solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
    python solve.py --file problem.smt2
    python solve.py --file problem.smt2 --debug --timeout 60
"""

import argparse
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_model, parse_unsat_core, setup_logging


def main():
    parser = argparse.ArgumentParser(prog="solve")
    parser.add_argument("--formula", help="SMT-LIB2 formula string")
    parser.add_argument("--file", help="path to .smt2 file")
    parser.add_argument("--timeout", type=int, default=30)
    parser.add_argument("--z3", default=None, help="path to z3 binary")
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        formula = Path(args.file).read_text()
    elif args.formula:
        formula = args.formula
    else:
        parser.error("provide --formula or --file")
        return

    db = Z3DB(args.db)
    run_id = db.start_run("solve", formula)

    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)

    model = parse_model(result["stdout"]) if result["result"] == "sat" else None
    core = parse_unsat_core(result["stdout"]) if result["result"] == "unsat" else None

    db.log_formula(run_id, formula, result["result"],
                   str(model) if model else None)
    db.finish_run(run_id, result["result"], result["duration_ms"],
                  result["exit_code"])

    print(result["result"])
    if model:
        for name, val in model.items():
            print(f" {name} = {val}")
    if core:
        print("unsat core:", " ".join(core))
    if result["stderr"] and result["result"] == "error":
        print(result["stderr"], file=sys.stderr)

    db.close()
    sys.exit(0 if result["exit_code"] == 0 else 1)


if __name__ == "__main__":
    main()
46
.github/skills/static-analysis/SKILL.md
vendored
Normal file
@ -0,0 +1,46 @@
---
name: static-analysis
description: Run Clang Static Analyzer (scan-build) on Z3 source and log structured findings to z3agent.db.
---

Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plist diagnostics, and record each finding with file, line, category, and description. This skill wraps scan-build into a reproducible, logged workflow suitable for regular analysis sweeps and regression tracking.

# Step 1: Run the analysis

```bash
python3 scripts/static_analysis.py --build-dir build
python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug
python3 scripts/static_analysis.py --build-dir build --timeout 1800
```

The script runs `scan-build cmake ..` and then `scan-build make -jN` (N is the CPU count) inside the specified build directory, so the build directory must sit directly under the Z3 source root. Clang checker output is written to `--output-dir` (defaults to a `scan-results` subdirectory of the build directory).

# Step 2: Interpret the output

Each finding is printed with its bug type (or its category when no type is recorded), source location, and description, for example:

```
[Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read
[Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer
```

A summary follows with the total number of findings and a per-category count, sorted from most to least frequent, so that high-frequency classes are visible at a glance.

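The category summary can be reproduced from the plist files alone. The sketch below builds a small plist in memory with the keys `static_analysis.py` reads (`files`, `diagnostics`, and per-diagnostic `location`, `category`, `type`, `description`) and tallies it with `collections.Counter`. The diagnostics are synthetic, made up for illustration; real scan-build reports carry many more keys, and `summarize` is a hypothetical helper.

```python
import plistlib
from collections import Counter


def summarize(plist_bytes: bytes) -> Counter:
    """Count diagnostics per category in one Clang plist report."""
    data = plistlib.loads(plist_bytes)
    return Counter(d.get("category", "uncategorized")
                   for d in data.get("diagnostics", []))


# Synthetic report with the same key layout that static_analysis.py parses.
sample = plistlib.dumps({
    "files": ["src/example.cpp"],
    "diagnostics": [
        {"category": "Dead store", "type": "Dead assignment",
         "description": "Value stored to 'r' is never read",
         "location": {"file": 0, "line": 10, "col": 5}},
        {"category": "Dead store", "type": "Dead initialization",
         "description": "Value stored to 'k' during its initialization is never read",
         "location": {"file": 0, "line": 20, "col": 9}},
        {"category": "Logic error", "type": "Dereference of null pointer",
         "description": "Access to field 'next' results in a dereference of a null pointer",
         "location": {"file": 0, "line": 30, "col": 3}},
    ],
})

counts = summarize(sample)
for category, n in counts.most_common():
    print(f"  {category}: {n}")
```

`Counter.most_common()` gives the same most-to-least-frequent ordering that the script's summary uses.
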
# Step 3: Review historical findings

All findings are logged to `z3agent.db`. Query them to track trends:

```bash
python3 ../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC"
python3 ../shared/z3db.py runs --skill static-analysis --last 10
```

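The same query can be issued with Python's built-in `sqlite3` module. The real schema of `z3agent.db` is defined in `z3db.py` and is not shown here, so this sketch assumes only the columns the query above references (`runs.run_id`, `runs.skill`, `findings.run_id`, `findings.category`) and demonstrates it on a throwaway in-memory database with made-up rows. Pointing `sqlite3.connect` at `.z3-agent/z3agent.db` would query real data, provided the actual schema matches.

```python
import sqlite3

QUERY = """
SELECT category, COUNT(*) AS cnt
FROM findings
WHERE run_id IN (SELECT run_id FROM runs WHERE skill = 'static-analysis')
GROUP BY category
ORDER BY cnt DESC
"""


def category_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (category, count) pairs for static-analysis runs only."""
    return conn.execute(QUERY).fetchall()


# Assumed minimal schema and sample rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, skill TEXT);
CREATE TABLE findings (run_id INTEGER, category TEXT);
INSERT INTO runs VALUES (1, 'static-analysis'), (2, 'solve');
INSERT INTO findings VALUES
    (1, 'Dead store'), (1, 'Dead store'), (1, 'Logic error'), (2, 'Dead store');
""")
rows = category_counts(conn)
conn.close()

for category, cnt in rows:
    print(category, cnt)
```

The subquery restricts the count to runs logged by this skill, so the finding attached to the `solve` run is excluded.
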
# Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| build-dir | path | yes | | path to the CMake build directory (created if missing); its parent is used as the Z3 source root |
| output-dir | path | no | BUILD/scan-results | directory for scan-build output |
| timeout | int | no | 1200 | seconds allowed for each step (configure, then build) |
| db | path | no | .z3-agent/z3agent.db | logging database |
| debug | flag | no | off | verbose tracing |
255
.github/skills/static-analysis/scripts/static_analysis.py
vendored
Normal file
@ -0,0 +1,255 @@
#!/usr/bin/env python3
"""
static_analysis.py: run Clang Static Analyzer on Z3 source.

Usage:
    python static_analysis.py --build-dir build
    python static_analysis.py --build-dir build --output-dir /tmp/sa-results
    python static_analysis.py --build-dir build --debug
"""

import argparse
import logging
import os
import plistlib
import shutil
import subprocess
import sys
import time
from collections import Counter
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, setup_logging

logger = logging.getLogger("z3agent")

SCAN_BUILD_NAMES = ["scan-build", "scan-build-14", "scan-build-15", "scan-build-16"]


def find_scan_build() -> str:
    """Locate the scan-build binary on PATH."""
    for name in SCAN_BUILD_NAMES:
        path = shutil.which(name)
        if path:
            logger.debug("found scan-build: %s", path)
            return path
    logger.error(
        "scan-build not found. Install clang-tools or set PATH. "
        "Searched: %s", ", ".join(SCAN_BUILD_NAMES)
    )
    sys.exit(1)


def run_configure(scan_build: str, build_dir: Path, output_dir: Path,
                  timeout: int) -> bool:
    """Run scan-build cmake to configure the project."""
    repo_root = build_dir.parent
    cmd = [
        scan_build,
        "-o", str(output_dir),
        "cmake",
        str(repo_root),
    ]
    logger.info("configuring: %s", " ".join(cmd))
    try:
        proc = subprocess.run(
            cmd, cwd=str(build_dir),
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        logger.error("cmake configuration timed out after %ds", timeout)
        return False

    if proc.returncode != 0:
        logger.error("cmake configuration failed (exit %d)", proc.returncode)
        logger.error("stderr: %s", proc.stderr[:2000])
        return False

    logger.info("configuration complete")
    return True


def run_build(scan_build: str, build_dir: Path, output_dir: Path,
              timeout: int) -> bool:
    """Run scan-build make to build and analyze."""
    nproc = os.cpu_count() or 4
    cmd = [
        scan_build,
        "-o", str(output_dir),
        # scan-build writes only HTML reports by default; -plist-html also
        # emits the .plist files that collect_plist_files() parses.
        "-plist-html",
        "--status-bugs",
        "make",
        f"-j{nproc}",
    ]
    logger.info("building with analysis: %s", " ".join(cmd))
    try:
        proc = subprocess.run(
            cmd, cwd=str(build_dir),
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        logger.error("build timed out after %ds", timeout)
        return False

    # scan-build returns nonzero when bugs are found (due to --status-bugs),
    # so a nonzero exit code is not necessarily a build failure.
    if proc.returncode != 0:
        logger.info(
            "scan-build exited with code %d (nonzero may indicate findings)",
            proc.returncode,
        )
    else:
        logger.info("build complete, no bugs reported by scan-build")

    if proc.stderr:
        logger.debug("build stderr (last 2000 chars): %s", proc.stderr[-2000:])
    return True


def collect_plist_files(output_dir: Path) -> list:
    """Recursively find all .plist diagnostic files under the output directory."""
    plists = sorted(output_dir.rglob("*.plist"))
    logger.debug("found %d plist files in %s", len(plists), output_dir)
    return plists


def parse_plist_findings(plist_path: Path) -> list:
    """Extract findings from a single Clang plist diagnostic file.

    Returns a list of dicts with keys: file, line, col, category, type, description.
    """
    findings = []
    try:
        with open(plist_path, "rb") as f:
            data = plistlib.load(f)
    except Exception as exc:
        logger.warning("could not parse %s: %s", plist_path, exc)
        return findings

    source_files = data.get("files", [])
    for diag in data.get("diagnostics", []):
        location = diag.get("location", {})
        file_idx = location.get("file", 0)
        source_file = source_files[file_idx] if file_idx < len(source_files) else "<unknown>"
        findings.append({
            "file": source_file,
            "line": location.get("line", 0),
            "col": location.get("col", 0),
            "category": diag.get("category", "uncategorized"),
            "type": diag.get("type", ""),
            "description": diag.get("description", ""),
        })
    return findings


def collect_all_findings(output_dir: Path) -> list:
    """Parse every plist file under output_dir and return merged findings."""
    all_findings = []
    for plist_path in collect_plist_files(output_dir):
        all_findings.extend(parse_plist_findings(plist_path))
    return all_findings


def log_findings(db, run_id: int, findings: list):
    """Persist each finding to z3agent.db."""
    for f in findings:
        db.log_finding(
            run_id,
            category=f["category"],
            message=f["description"],
            severity=f.get("type"),
            file=f["file"],
            line=f["line"],
            details={"col": f["col"], "type": f["type"]},
        )


def print_findings(findings: list):
    """Print individual findings and a category summary."""
    if not findings:
        print("No findings reported.")
        return

    for f in findings:
        label = f["category"]
        if f["type"]:
            label = f["type"]
        print(f"[{label}] {f['file']}:{f['line']}: {f['description']}")

    print()
    counts = Counter(f["category"] for f in findings)
    print(f"Total findings: {len(findings)}")
    print("By category:")
    for cat, cnt in counts.most_common():
        print(f" {cat}: {cnt}")


def main():
    parser = argparse.ArgumentParser(
        prog="static_analysis",
        description="Run Clang Static Analyzer on Z3 and log findings.",
    )
    parser.add_argument(
        "--build-dir", required=True,
        help="path to the CMake build directory",
    )
    parser.add_argument(
        "--output-dir", default=None,
        help="directory for scan-build results (default: BUILD/scan-results)",
    )
    parser.add_argument(
        "--timeout", type=int, default=1200,
        help="seconds allowed for each scan-build step (configure, then build)",
    )
    parser.add_argument("--db", default=None, help="path to z3agent.db")
    parser.add_argument("--debug", action="store_true", help="verbose tracing")
    args = parser.parse_args()

    setup_logging(args.debug)

    scan_build = find_scan_build()

    build_dir = Path(args.build_dir).resolve()
    build_dir.mkdir(parents=True, exist_ok=True)

    output_dir = Path(args.output_dir) if args.output_dir else build_dir / "scan-results"
    output_dir = output_dir.resolve()
    output_dir.mkdir(parents=True, exist_ok=True)

    db = Z3DB(args.db)
    run_id = db.start_run("static-analysis", f"build_dir={build_dir}")
    start = time.monotonic()

    if not run_configure(scan_build, build_dir, output_dir, timeout=args.timeout):
        elapsed = int((time.monotonic() - start) * 1000)
        db.finish_run(run_id, "error", elapsed, exit_code=1)
        db.close()
        sys.exit(1)

    if not run_build(scan_build, build_dir, output_dir, timeout=args.timeout):
        elapsed = int((time.monotonic() - start) * 1000)
        db.finish_run(run_id, "error", elapsed, exit_code=1)
        db.close()
        sys.exit(1)

    elapsed = int((time.monotonic() - start) * 1000)

    findings = collect_all_findings(output_dir)
    log_findings(db, run_id, findings)

    status = "clean" if len(findings) == 0 else "findings"
    db.finish_run(run_id, status, elapsed, exit_code=0)

    db.log(
        f"static analysis complete: {len(findings)} finding(s) in {elapsed}ms",
        run_id=run_id,
    )

    print_findings(findings)

    db.close()
    sys.exit(0)


if __name__ == "__main__":
    main()