mirror of
https://github.com/Z3Prover/z3
synced 2026-03-15 17:49:59 +00:00
Merge pull request #8948 from angelica-moreira/z3-skill-exploration
Add Copilot skills and agent for Z3
This commit is contained in:
commit
9c2d34ce43
24 changed files with 3045 additions and 0 deletions
180
.github/agents/z3.md
vendored
Normal file
180
.github/agents/z3.md
vendored
Normal file
|
|
@ -0,0 +1,180 @@
|
|||
---
|
||||
name: z3
|
||||
description: 'Z3 theorem prover agent: SMT solving, code quality analysis, and verification.'
|
||||
---
|
||||
|
||||
## Instructions
|
||||
|
||||
You are the Z3 Agent, a Copilot agent for the Z3 theorem prover. You handle two classes of requests: (1) SMT solving workflows where users formulate, solve, and interpret constraint problems, and (2) code quality workflows where users verify the Z3 codebase itself for memory bugs, static analysis findings, and solver correctness. Route to the appropriate skills based on the request.
|
||||
|
||||
### Workflow
|
||||
|
||||
1. **Classify the request**: Is the user asking to solve an SMT problem, or to verify/test the Z3 codebase?
|
||||
|
||||
2. **For SMT problems**:
|
||||
- Encode the problem into SMT-LIB2 if needed (via **encode**).
|
||||
- Route to the appropriate solving skill (**solve**, **prove**, **optimize**, **simplify**).
|
||||
- Interpret the result (via **explain**).
|
||||
- Measure performance if relevant (via **benchmark**).
|
||||
|
||||
3. **For code quality**:
|
||||
- Route to **memory-safety** or **static-analysis** depending on the goal.
|
||||
- Independent skills may run in parallel.
|
||||
- Aggregate and deduplicate findings across skills.
|
||||
|
||||
4. **Report**: Present results clearly. For SMT problems, interpret models and proofs. For code quality, sort findings by severity with file locations.
|
||||
|
||||
5. **Iterate**: On follow-ups, refine the formulation or narrow the scope. Do not re-run the full pipeline when only a narrow adjustment is needed.
|
||||
|
||||
### Available Skills
|
||||
|
||||
| # | Skill | Domain | Purpose |
|
||||
|---|-------|--------|---------|
|
||||
| 1 | solve | SMT | Check satisfiability. Extract models or unsat cores. |
|
||||
| 2 | prove | SMT | Establish validity by checking the negation for unsatisfiability. |
|
||||
| 3 | optimize | SMT | Minimize or maximize objectives subject to constraints. |
|
||||
| 4 | simplify | SMT | Apply tactic chains to reduce formula complexity. |
|
||||
| 5 | encode | SMT | Translate problem descriptions into SMT-LIB2 syntax. |
|
||||
| 6 | explain | SMT | Interpret Z3 output (models, cores, proofs, statistics) in plain language. |
|
||||
| 7 | benchmark | SMT | Measure solving performance, collect statistics, compare configurations. |
|
||||
| 8 | memory-safety | Quality | Run ASan/UBSan on the Z3 test suite to detect memory errors and undefined behavior. |
|
||||
| 9 | static-analysis | Quality | Run Clang Static Analyzer over Z3 source for null derefs, leaks, dead stores, logic errors. |
|
||||
|
||||
### Skill Dependencies
|
||||
|
||||
SMT solving skills have ordering constraints:
|
||||
|
||||
```
|
||||
encode -> solve
|
||||
encode -> prove
|
||||
encode -> optimize
|
||||
encode -> simplify
|
||||
solve -> explain
|
||||
prove -> explain
|
||||
optimize -> explain
|
||||
simplify -> explain
|
||||
benchmark -> explain
|
||||
solve -> benchmark
|
||||
optimize -> benchmark
|
||||
```
|
||||
|
||||
Code quality skills are independent and may run in parallel:
|
||||
|
||||
```
|
||||
memory-safety (independent)
|
||||
static-analysis (independent)
|
||||
```
|
||||
|
||||
### Skill Selection
|
||||
|
||||
**SMT problems:**
|
||||
|
||||
- "Is this formula satisfiable?" : `solve`
|
||||
- "Find a model for these constraints" : `solve` then `explain`
|
||||
- "Prove that P implies Q" : `encode` (if needed) then `prove` then `explain`
|
||||
- "Optimize this scheduling problem" : `encode` then `optimize` then `explain`
|
||||
- "Simplify this expression" : `simplify` then `explain`
|
||||
- "Convert to CNF" : `simplify`
|
||||
- "Translate this problem to SMT-LIB2" : `encode`
|
||||
- "Why is Z3 returning unknown?" : `explain`
|
||||
- "Why is this query slow?" : `benchmark` then `explain`
|
||||
- "What does this model mean?" : `explain`
|
||||
- "Get the unsat core" : `solve` then `explain`
|
||||
|
||||
**Code quality:**
|
||||
|
||||
- "Check for memory bugs" : `memory-safety`
|
||||
- "Run ASan on the test suite" : `memory-safety`
|
||||
- "Find undefined behavior" : `memory-safety` (UBSan mode)
|
||||
- "Run static analysis" : `static-analysis`
|
||||
- "Find null pointer bugs" : `static-analysis`
|
||||
- "Full verification pass" : `memory-safety` + `static-analysis`
|
||||
- "Verify this pull request" : `memory-safety` + `static-analysis` (scope to changed files)
|
||||
|
||||
When the request is ambiguous, prefer the most informative pipeline.
|
||||
|
||||
### Examples
|
||||
|
||||
User: "Is (x > 0 and y > 0 and x + y < 1) satisfiable over the reals?"
|
||||
|
||||
1. **solve**: Assert the conjunction over real-valued variables. Run `(check-sat)`.
|
||||
2. **explain**: Present the model or state unsatisfiability.
|
||||
|
||||
User: "Prove that for all integers x, if x^2 is even then x is even."
|
||||
|
||||
1. **encode**: Formalize and negate the statement.
|
||||
2. **prove**: Check the negation for unsatisfiability.
|
||||
3. **explain**: Present the validity result or counterexample.
|
||||
|
||||
User: "Schedule five tasks on two machines to minimize makespan."
|
||||
|
||||
1. **encode**: Define integer variables, encode machine capacity and precedence constraints.
|
||||
2. **optimize**: Minimize the makespan variable.
|
||||
3. **explain**: Present the optimal schedule and binding constraints.
|
||||
|
||||
User: "Why is my bitvector query so slow?"
|
||||
|
||||
1. **benchmark**: Run with statistics collection.
|
||||
2. **explain**: Identify cost centers and suggest parameter adjustments.
|
||||
|
||||
User: "Check for memory bugs in the SAT solver."
|
||||
|
||||
1. **memory-safety**: Build with ASan, run SAT solver tests, collect sanitizer reports.
|
||||
2. Report findings with stack traces categorized by bug type.
|
||||
|
||||
User: "Full verification pass before the release."
|
||||
|
||||
1. Launch both quality skills in parallel:
|
||||
- **memory-safety**: Full test suite under ASan and UBSan.
|
||||
- **static-analysis**: Full source tree scan.
|
||||
2. Aggregate findings, deduplicate, sort by severity.
|
||||
|
||||
### Build Configurations
|
||||
|
||||
Code quality skills may require specific builds:
|
||||
|
||||
**memory-safety (ASan)**:
|
||||
```bash
|
||||
mkdir build-asan && cd build-asan
|
||||
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_C_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_BUILD_TYPE=Debug
|
||||
make -j$(nproc)
|
||||
```
|
||||
|
||||
**memory-safety (UBSan)**:
|
||||
```bash
|
||||
mkdir build-ubsan && cd build-ubsan
|
||||
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=undefined" -DCMAKE_C_FLAGS="-fsanitize=undefined" -DCMAKE_BUILD_TYPE=Debug
|
||||
make -j$(nproc)
|
||||
```
|
||||
|
||||
**static-analysis**:
|
||||
```bash
|
||||
mkdir build-analyze && cd build-analyze
|
||||
scan-build cmake .. -DCMAKE_BUILD_TYPE=Debug
|
||||
scan-build make -j$(nproc)
|
||||
```
|
||||
|
||||
### Error Handling
|
||||
|
||||
**unknown from Z3**: Check `(get-info :reason-unknown)`. If "incomplete," suggest alternative encodings. If "timeout," suggest parameter tuning or the **simplify** skill.
|
||||
|
||||
**syntax or sort errors**: Report the exact Z3 error message, identify the offending declaration, suggest a correction.
|
||||
|
||||
**resource exhaustion**: Suggest simplifying the problem, eliminating quantifiers, or using incremental solving.
|
||||
|
||||
**build failure**: Report compiler errors. Common cause: sanitizer flags incompatible with optimization levels.
|
||||
|
||||
**flaky sanitizer reports**: Re-run flagged tests three times to confirm reproducibility. Mark non-reproducible findings as "intermittent."
|
||||
|
||||
**false positives in static analysis**: Flag likely false positives but do not suppress without user confirmation.
|
||||
|
||||
### Notes
|
||||
|
||||
- Validate SMT-LIB2 syntax before invoking Z3.
|
||||
- Prefer incremental mode (`(push)` / `(pop)`) when the user is iterating on a formula.
|
||||
- Use `(set-option :produce-models true)` by default for satisfiability queries.
|
||||
- Collect statistics with `z3 -st` when performance is relevant.
|
||||
- Present models in readable table format, not raw S-expressions.
|
||||
- Sanitizer builds are slower than Release builds. Set timeouts to at least 3x normal.
|
||||
- Store code quality artifacts in `.z3-agent/`.
|
||||
- Never fabricate results or suppress findings.
|
||||
72
.github/skills/README.md
vendored
Normal file
72
.github/skills/README.md
vendored
Normal file
|
|
@ -0,0 +1,72 @@
|
|||
# Z3 Agent Skills
|
||||
|
||||
Reusable, composable verification primitives for the Z3 theorem prover.
|
||||
Each skill is a self-contained unit: a `SKILL.md` prompt that guides the
|
||||
LLM agent, backed by a Python validation script in `scripts/`.
|
||||
|
||||
## Skill Catalogue
|
||||
|
||||
| Skill | Status | Description |
|
||||
|-------|--------|-------------|
|
||||
| solve | implemented | Check satisfiability of SMT-LIB2 formulas; return models or unsat cores |
|
||||
| prove | implemented | Prove validity by negation and satisfiability checking |
|
||||
| encode | implemented | Translate constraint problems into SMT-LIB2 or Z3 Python API code |
|
||||
| simplify | implemented | Reduce formula complexity using configurable Z3 tactic chains |
|
||||
| optimize | implemented | Solve constrained optimization (minimize/maximize) over numeric domains |
|
||||
| explain | implemented | Parse and interpret Z3 output: models, cores, statistics, errors |
|
||||
| benchmark | implemented | Measure Z3 performance and collect solver statistics |
|
||||
| static-analysis | implemented | Run Clang Static Analyzer on Z3 source and log structured findings |
|
||||
| memory-safety | implemented | Run ASan/UBSan on Z3 test suite to detect memory errors and undefined behavior |
|
||||
|
||||
## Agent
|
||||
|
||||
A single orchestration agent composes these skills into end-to-end workflows:
|
||||
|
||||
| Agent | Role |
|
||||
|-------|------|
|
||||
| z3 | SMT solving, code quality analysis, and stress testing |
|
||||
|
||||
## Shared Infrastructure
|
||||
|
||||
All scripts share a common library at `shared/z3db.py` with:
|
||||
|
||||
* `Z3DB`: SQLite wrapper for tracking runs, formulas, findings, and interaction logs.
|
||||
* `run_z3()`: Pipe SMT-LIB2 into `z3 -in` with timeout handling.
|
||||
* `find_z3()`: Locate the Z3 binary across build directories and PATH.
|
||||
* Parsers: `parse_model()`, `parse_stats()`, `parse_unsat_core()`.
|
||||
|
||||
The database schema lives in `shared/schema.sql`.
|
||||
|
||||
## Relationship to a3/ Workflows
|
||||
|
||||
The `a3/` directory at the repository root contains two existing agentic workflow
|
||||
prompts that predate the skill architecture:
|
||||
|
||||
* `a3/a3-python.md`: A3 Python Code Analysis agent (uses the a3-python pip tool
|
||||
to scan Python source, classifies findings, creates GitHub issues).
|
||||
* `a3/a3-rust.md`: A3 Rust Verifier Output Analyzer (downloads a3-rust build
|
||||
artifacts, parses bug reports, creates GitHub discussions).
|
||||
|
||||
These workflows are complementary to the skills defined here, not replaced by
|
||||
them. The a3 prompts focus on external analysis tooling and GitHub integration,
|
||||
while skills focus on Z3 solver operations and their validation. Both may be
|
||||
composed by the same orchestrating agent.
|
||||
|
||||
## Usage
|
||||
|
||||
Check database status and recent runs:
|
||||
|
||||
```
|
||||
python shared/z3db.py status
|
||||
python shared/z3db.py runs --skill solve --last 5
|
||||
python shared/z3db.py log --run-id 12
|
||||
python shared/z3db.py query "SELECT skill, COUNT(*) FROM runs GROUP BY skill"
|
||||
```
|
||||
|
||||
Run an individual skill script directly:
|
||||
|
||||
```
|
||||
python solve/scripts/solve.py --file problem.smt2
|
||||
python encode/scripts/encode.py --validate formula.smt2
|
||||
python benchmark/scripts/benchmark.py --file problem.smt2
|
||||
```
|
||||
79
.github/skills/benchmark/SKILL.md
vendored
Normal file
79
.github/skills/benchmark/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,79 @@
|
|||
---
|
||||
name: benchmark
|
||||
description: Measure Z3 performance on a formula or file. Collects wall-clock time, theory solver statistics, memory usage, and conflict counts. Results are logged to z3agent.db for longitudinal tracking.
|
||||
---
|
||||
|
||||
Given an SMT-LIB2 formula or file, run Z3 with statistics enabled and report performance characteristics. This is useful for identifying performance regressions, comparing tactic strategies, and profiling theory solver workload distribution.
|
||||
|
||||
# Step 1: Run Z3 with statistics
|
||||
|
||||
Action:
|
||||
Invoke benchmark.py with the formula or file. Use `--runs N` for
|
||||
repeated timing.
|
||||
|
||||
Expectation:
|
||||
The script invokes `z3 -st`, parses the statistics block, and prints
|
||||
a performance summary. A run entry is logged to z3agent.db.
|
||||
|
||||
Result:
|
||||
Timing and statistics are displayed. Proceed to Step 2 to interpret.
|
||||
|
||||
```bash
|
||||
python3 scripts/benchmark.py --file problem.smt2
|
||||
python3 scripts/benchmark.py --file problem.smt2 --runs 5
|
||||
python3 scripts/benchmark.py --formula "(declare-const x Int)..." --debug
|
||||
```
|
||||
|
||||
# Step 2: Interpret the output
|
||||
|
||||
Action:
|
||||
Review wall-clock time, memory usage, conflict counts, and per-theory
|
||||
breakdowns.
|
||||
|
||||
Expectation:
|
||||
A complete performance profile including min/median/max timing when
|
||||
multiple runs are requested.
|
||||
|
||||
Result:
|
||||
If performance is acceptable, no action needed.
|
||||
If slow, try **simplify** to reduce the formula or adjust tactic strategies.
|
||||
|
||||
The output includes:
|
||||
|
||||
- wall-clock time (ms)
|
||||
- result (sat/unsat/unknown/timeout)
|
||||
- memory usage (MB)
|
||||
- conflicts, decisions, propagations
|
||||
- per-theory breakdown (arithmetic, bv, array, etc.)
|
||||
|
||||
With `--runs N`, the script runs Z3 N times and reports min/median/max timing.
|
||||
|
||||
# Step 3: Compare over time
|
||||
|
||||
Action:
|
||||
Query past benchmark runs from z3agent.db to detect regressions or
|
||||
improvements.
|
||||
|
||||
Expectation:
|
||||
Historical run data is available for comparison, ordered by recency.
|
||||
|
||||
Result:
|
||||
If performance regressed, investigate recent formula or tactic changes.
|
||||
If improved, record the successful configuration.
|
||||
|
||||
```bash
|
||||
python3 ../../shared/z3db.py runs --skill benchmark --last 20
|
||||
python3 ../../shared/z3db.py query "SELECT smtlib2, result, stats FROM formulas WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') ORDER BY run_id DESC LIMIT 5"
|
||||
```
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| formula | string | no | | SMT-LIB2 formula |
|
||||
| file | path | no | | path to .smt2 file |
|
||||
| runs | int | no | 1 | number of repeated runs for timing |
|
||||
| timeout | int | no | 60 | seconds per run |
|
||||
| z3 | path | no | auto | path to z3 binary |
|
||||
| debug | flag | no | off | verbose tracing |
|
||||
| db | path | no | .z3-agent/z3agent.db | logging database |
|
||||
80
.github/skills/benchmark/scripts/benchmark.py
vendored
Normal file
80
.github/skills/benchmark/scripts/benchmark.py
vendored
Normal file
|
|
@ -0,0 +1,80 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
benchmark.py: measure Z3 performance with statistics.
|
||||
|
||||
Usage:
|
||||
python benchmark.py --file problem.smt2
|
||||
python benchmark.py --file problem.smt2 --runs 5
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import statistics
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, run_z3, parse_stats, setup_logging
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="benchmark")
|
||||
parser.add_argument("--formula")
|
||||
parser.add_argument("--file")
|
||||
parser.add_argument("--runs", type=int, default=1)
|
||||
parser.add_argument("--timeout", type=int, default=60)
|
||||
parser.add_argument("--z3", default=None)
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if args.file:
|
||||
formula = Path(args.file).read_text()
|
||||
elif args.formula:
|
||||
formula = args.formula
|
||||
else:
|
||||
parser.error("provide --formula or --file")
|
||||
return
|
||||
|
||||
db = Z3DB(args.db)
|
||||
timings = []
|
||||
|
||||
for i in range(args.runs):
|
||||
run_id = db.start_run("benchmark", formula)
|
||||
result = run_z3(
|
||||
formula,
|
||||
z3_bin=args.z3,
|
||||
timeout=args.timeout,
|
||||
args=["-st"],
|
||||
debug=args.debug,
|
||||
)
|
||||
|
||||
stats = parse_stats(result["stdout"])
|
||||
db.log_formula(run_id, formula, result["result"], stats=stats)
|
||||
db.finish_run(
|
||||
run_id, result["result"], result["duration_ms"], result["exit_code"]
|
||||
)
|
||||
timings.append(result["duration_ms"])
|
||||
|
||||
if args.runs == 1:
|
||||
print(f"result: {result['result']}")
|
||||
print(f"time: {result['duration_ms']}ms")
|
||||
if stats:
|
||||
print("statistics:")
|
||||
for k, v in sorted(stats.items()):
|
||||
print(f" :{k} {v}")
|
||||
|
||||
if args.runs > 1:
|
||||
print(f"runs: {args.runs}")
|
||||
print(f"min: {min(timings)}ms")
|
||||
print(f"median: {statistics.median(timings):.0f}ms")
|
||||
print(f"max: {max(timings)}ms")
|
||||
print(f"result: {result['result']}")
|
||||
|
||||
db.close()
|
||||
sys.exit(0 if result["exit_code"] == 0 else 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
73
.github/skills/encode/SKILL.md
vendored
Normal file
73
.github/skills/encode/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,73 @@
|
|||
---
|
||||
name: encode
|
||||
description: Translate constraint problems into SMT-LIB2 or Z3 Python API code. Handles common problem classes including scheduling, graph coloring, arithmetic puzzles, and verification conditions.
|
||||
---
|
||||
|
||||
Given a problem description (natural language, pseudocode, or a partial formulation), produce a complete, syntactically valid SMT-LIB2 encoding or Z3 Python script. The encoding should declare all variables, assert all constraints, and include the appropriate check-sat / get-model commands.
|
||||
|
||||
# Step 1: Identify the problem class
|
||||
|
||||
Action:
|
||||
Determine the SMT theory and variable sorts required by the problem
|
||||
description.
|
||||
|
||||
Expectation:
|
||||
A clear mapping from the problem to one of the supported theories
|
||||
(LIA, LRA, QF_BV, etc.).
|
||||
|
||||
Result:
|
||||
If the theory is identified, proceed to Step 2. If the problem spans
|
||||
multiple theories, select the appropriate combined logic.
|
||||
|
||||
| Problem class | Theory | Typical sorts |
|
||||
|---------------|--------|---------------|
|
||||
| Integer arithmetic | LIA / NIA | Int |
|
||||
| Real arithmetic | LRA / NRA | Real |
|
||||
| Bitvector operations | QF_BV | (_ BitVec N) |
|
||||
| Arrays and maps | QF_AX | (Array Int Int) |
|
||||
| Strings and regex | QF_S | String, RegLan |
|
||||
| Uninterpreted functions | QF_UF | custom sorts |
|
||||
| Mixed theories | AUFLIA, etc. | combination |
|
||||
|
||||
# Step 2: Generate the encoding
|
||||
|
||||
Action:
|
||||
Invoke encode.py with the problem description and desired output format.
|
||||
|
||||
Expectation:
|
||||
The script produces a complete SMT-LIB2 file or Z3 Python script with
|
||||
all declarations, constraints, and check-sat commands.
|
||||
|
||||
Result:
|
||||
For `smtlib2` format: pass the output to **solve**.
|
||||
For `python` format: execute the script directly.
|
||||
Proceed to Step 3 for validation.
|
||||
|
||||
```bash
|
||||
python3 scripts/encode.py --problem "Find integers x, y such that x^2 + y^2 = 25 and x > 0" --format smtlib2
|
||||
python3 scripts/encode.py --problem "Schedule 4 tasks on 2 machines minimizing makespan" --format python
|
||||
```
|
||||
|
||||
# Step 3: Validate the encoding
|
||||
|
||||
Action:
|
||||
The script runs a syntax check by piping the output through `z3 -in`
|
||||
in parse-only mode.
|
||||
|
||||
Expectation:
|
||||
No parse errors. If errors occur, the offending line is reported.
|
||||
|
||||
Result:
|
||||
On success: the encoding is ready for **solve**, **prove**, or **optimize**.
|
||||
On parse error: fix the reported line and re-run.
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| problem | string | yes | | problem description |
|
||||
| format | string | no | smtlib2 | output format: smtlib2 or python |
|
||||
| output | path | no | stdout | write to file instead of stdout |
|
||||
| validate | flag | no | on | run syntax check on the output |
|
||||
| debug | flag | no | off | verbose tracing |
|
||||
| db | path | no | .z3-agent/z3agent.db | logging database |
|
||||
149
.github/skills/encode/scripts/encode.py
vendored
Normal file
149
.github/skills/encode/scripts/encode.py
vendored
Normal file
|
|
@ -0,0 +1,149 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
encode.py: validate and format SMT-LIB2 encodings.
|
||||
|
||||
Usage:
|
||||
python encode.py --validate formula.smt2
|
||||
python encode.py --validate formula.smt2 --debug
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, run_z3, setup_logging
|
||||
|
||||
VALIDATION_TIMEOUT = 5
|
||||
|
||||
|
||||
def read_input(path_or_stdin: str) -> str:
|
||||
"""Read formula from a file path or stdin (when path is '-')."""
|
||||
if path_or_stdin == "-":
|
||||
return sys.stdin.read()
|
||||
p = Path(path_or_stdin)
|
||||
if not p.exists():
|
||||
print(f"file not found: {p}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
return p.read_text()
|
||||
|
||||
|
||||
def find_errors(output: str) -> list:
|
||||
"""Extract (error ...) messages from Z3 output."""
|
||||
return re.findall(r'\(error\s+"([^"]+)"\)', output)
|
||||
|
||||
|
||||
def validate(formula: str, z3_bin: str = None, debug: bool = False) -> dict:
|
||||
"""
|
||||
Validate an SMT-LIB2 formula by piping it through z3 -in.
|
||||
Returns a dict with 'valid' (bool), 'errors' (list), and 'raw' output.
|
||||
"""
|
||||
result = run_z3(
|
||||
formula,
|
||||
z3_bin=z3_bin,
|
||||
timeout=VALIDATION_TIMEOUT,
|
||||
debug=debug,
|
||||
)
|
||||
errors = find_errors(result["stdout"]) + find_errors(result["stderr"])
|
||||
|
||||
if result["result"] == "timeout":
|
||||
# Timeout during validation is not a syntax error: the formula
|
||||
# parsed successfully but solving exceeded the limit. That counts
|
||||
# as syntactically valid.
|
||||
return {"valid": True, "errors": [], "raw": result}
|
||||
|
||||
if errors or result["exit_code"] != 0:
|
||||
return {"valid": False, "errors": errors, "raw": result}
|
||||
|
||||
return {"valid": True, "errors": [], "raw": result}
|
||||
|
||||
|
||||
def report_errors(errors: list, formula: str):
|
||||
"""Print each syntax error with surrounding context."""
|
||||
lines = formula.splitlines()
|
||||
print(f"validation failed: {len(errors)} error(s)", file=sys.stderr)
|
||||
for err in errors:
|
||||
print(f" : {err}", file=sys.stderr)
|
||||
if len(lines) <= 20:
|
||||
print("formula:", file=sys.stderr)
|
||||
for i, line in enumerate(lines, 1):
|
||||
print(f" {i:3d} {line}", file=sys.stderr)
|
||||
|
||||
|
||||
def write_output(formula: str, output_path: str, fmt: str):
|
||||
"""Write the validated formula to a file or stdout."""
|
||||
if fmt == "python":
|
||||
print(
|
||||
"python format output is generated by the agent, " "not by this script",
|
||||
file=sys.stderr,
|
||||
)
|
||||
sys.exit(1)
|
||||
|
||||
if output_path:
|
||||
Path(output_path).write_text(formula)
|
||||
print(f"written to {output_path}")
|
||||
else:
|
||||
print(formula)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="encode")
|
||||
parser.add_argument(
|
||||
"--validate",
|
||||
metavar="FILE",
|
||||
help="path to .smt2 file to validate, or '-' for stdin",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--format",
|
||||
choices=["smtlib2", "python"],
|
||||
default="smtlib2",
|
||||
help="output format (default: smtlib2)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output",
|
||||
metavar="FILE",
|
||||
default=None,
|
||||
help="write result to file instead of stdout",
|
||||
)
|
||||
parser.add_argument("--z3", default=None, help="path to z3 binary")
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if not args.validate:
|
||||
parser.error("provide --validate FILE")
|
||||
return
|
||||
|
||||
formula = read_input(args.validate)
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("encode", formula)
|
||||
|
||||
result = validate(formula, z3_bin=args.z3, debug=args.debug)
|
||||
|
||||
if result["valid"]:
|
||||
db.log_formula(run_id, formula, "valid")
|
||||
db.finish_run(run_id, "valid", result["raw"]["duration_ms"], 0)
|
||||
write_output(formula, args.output, args.format)
|
||||
db.close()
|
||||
sys.exit(0)
|
||||
else:
|
||||
db.log_formula(run_id, formula, "error")
|
||||
for err in result["errors"]:
|
||||
db.log_finding(run_id, "syntax", err, severity="error")
|
||||
db.finish_run(
|
||||
run_id,
|
||||
"error",
|
||||
result["raw"]["duration_ms"],
|
||||
result["raw"]["exit_code"],
|
||||
)
|
||||
report_errors(result["errors"], formula)
|
||||
db.close()
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
83
.github/skills/explain/SKILL.md
vendored
Normal file
83
.github/skills/explain/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,83 @@
|
|||
---
|
||||
name: explain
|
||||
description: Parse and interpret Z3 output for human consumption. Handles models, unsat cores, proofs, statistics, and error messages. Translates solver internals into plain-language explanations.
|
||||
---
|
||||
|
||||
Given raw Z3 output (from the **solve**, **prove**, **optimize**, or **benchmark** skills), produce a structured explanation. This skill is for cases where the solver output is large, nested, or otherwise difficult to read directly.
|
||||
|
||||
# Step 1: Identify the output type
|
||||
|
||||
Action:
|
||||
Determine the category of Z3 output to explain: model, core,
|
||||
statistics, error, or proof.
|
||||
|
||||
Expectation:
|
||||
The output type maps to one of the recognized formats in the table below.
|
||||
|
||||
Result:
|
||||
If the type is ambiguous, use `--type auto` and let the script detect it.
|
||||
Proceed to Step 2.
|
||||
|
||||
| Output contains | Explanation type |
|
||||
|----------------|-----------------|
|
||||
| `(define-fun ...)` blocks | model explanation |
|
||||
| unsat core labels | conflict explanation |
|
||||
| `:key value` statistics | performance breakdown |
|
||||
| `(error ...)` | error diagnosis |
|
||||
| proof terms | proof sketch |
|
||||
|
||||
# Step 2: Run the explainer
|
||||
|
||||
Action:
|
||||
Invoke explain.py with the output file or stdin.
|
||||
|
||||
Expectation:
|
||||
The script auto-detects the output type and produces a structured
|
||||
plain-language summary.
|
||||
|
||||
Result:
|
||||
A formatted explanation is printed. If detection fails, re-run with
|
||||
an explicit `--type` flag.
|
||||
|
||||
```bash
|
||||
python3 scripts/explain.py --file output.txt
|
||||
python3 scripts/explain.py --stdin < output.txt
|
||||
python3 scripts/explain.py --file output.txt --debug
|
||||
```
|
||||
|
||||
# Step 3: Interpret the explanation
|
||||
|
||||
Action:
|
||||
Review the structured explanation for accuracy and completeness.
|
||||
|
||||
Expectation:
|
||||
Models list each variable with its value and sort. Cores list
|
||||
conflicting assertions. Statistics show time and memory breakdowns.
|
||||
|
||||
Result:
|
||||
Use the explanation to answer the user query or to guide the next
|
||||
skill invocation.
|
||||
|
||||
For models:
|
||||
- Each variable is listed with its value and sort
|
||||
- Array and function interpretations are expanded
|
||||
- Bitvector values are shown in decimal and hex
|
||||
|
||||
For unsat cores:
|
||||
- The conflicting named assertions are listed
|
||||
- A minimal conflict set is highlighted
|
||||
|
||||
For statistics:
|
||||
- Time breakdown by phase (preprocessing, solving, model construction)
|
||||
- Theory solver load distribution
|
||||
- Memory high-water mark
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| file | path | no | | file containing Z3 output |
|
||||
| stdin | flag | no | off | read from stdin |
|
||||
| type | string | no | auto | force output type: model, core, stats, error |
|
||||
| debug | flag | no | off | verbose tracing |
|
||||
| db | path | no | .z3-agent/z3agent.db | logging database |
|
||||
129
.github/skills/explain/scripts/explain.py
vendored
Normal file
129
.github/skills/explain/scripts/explain.py
vendored
Normal file
|
|
@ -0,0 +1,129 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
explain.py: interpret Z3 output in a readable form.
|
||||
|
||||
Usage:
|
||||
python explain.py --file output.txt
|
||||
echo "sat\n(model ...)" | python explain.py --stdin
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, parse_model, parse_stats, parse_unsat_core, setup_logging
|
||||
|
||||
|
||||
def detect_type(text: str) -> str:
|
||||
if "(define-fun" in text:
|
||||
return "model"
|
||||
if "(error" in text:
|
||||
return "error"
|
||||
if re.search(r":\S+\s+[\d.]+", text):
|
||||
return "stats"
|
||||
first = text.strip().split("\n")[0].strip()
|
||||
if first == "unsat":
|
||||
return "core"
|
||||
return "unknown"
|
||||
|
||||
|
||||
def explain_model(text: str):
|
||||
model = parse_model(text)
|
||||
if not model:
|
||||
print("no model found in output")
|
||||
return
|
||||
print("satisfying assignment:")
|
||||
for name, val in model.items():
|
||||
# show hex for large integers (likely bitvectors)
|
||||
try:
|
||||
n = int(val)
|
||||
if abs(n) > 255:
|
||||
print(f" {name} = {val} (0x{n:x})")
|
||||
else:
|
||||
print(f" {name} = {val}")
|
||||
except ValueError:
|
||||
print(f" {name} = {val}")
|
||||
|
||||
|
||||
def explain_core(text: str):
|
||||
core = parse_unsat_core(text)
|
||||
if core:
|
||||
print(f"conflicting assertions ({len(core)}):")
|
||||
for label in core:
|
||||
print(f" {label}")
|
||||
else:
|
||||
print("unsat (no named assertions for core extraction)")
|
||||
|
||||
|
||||
def explain_stats(text: str):
|
||||
stats = parse_stats(text)
|
||||
if not stats:
|
||||
print("no statistics found")
|
||||
return
|
||||
print("performance breakdown:")
|
||||
for k in sorted(stats):
|
||||
print(f" :{k} {stats[k]}")
|
||||
|
||||
if "time" in stats:
|
||||
print(f"\ntotal time: {stats['time']}s")
|
||||
if "memory" in stats:
|
||||
print(f"peak memory: {stats['memory']} MB")
|
||||
|
||||
|
||||
def explain_error(text: str):
|
||||
errors = re.findall(r'\(error\s+"([^"]+)"\)', text)
|
||||
if errors:
|
||||
print(f"Z3 reported {len(errors)} error(s):")
|
||||
for e in errors:
|
||||
print(f" {e}")
|
||||
else:
|
||||
print("error in output but could not parse message")
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="explain")
|
||||
parser.add_argument("--file")
|
||||
parser.add_argument("--stdin", action="store_true")
|
||||
parser.add_argument(
|
||||
"--type", choices=["model", "core", "stats", "error", "auto"], default="auto"
|
||||
)
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if args.file:
|
||||
text = Path(args.file).read_text()
|
||||
elif args.stdin:
|
||||
text = sys.stdin.read()
|
||||
else:
|
||||
parser.error("provide --file or --stdin")
|
||||
return
|
||||
|
||||
output_type = args.type if args.type != "auto" else detect_type(text)
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("explain", text[:200])
|
||||
|
||||
if output_type == "model":
|
||||
explain_model(text)
|
||||
elif output_type == "core":
|
||||
explain_core(text)
|
||||
elif output_type == "stats":
|
||||
explain_stats(text)
|
||||
elif output_type == "error":
|
||||
explain_error(text)
|
||||
else:
|
||||
print("could not determine output type")
|
||||
print("raw output:")
|
||||
print(text[:500])
|
||||
|
||||
db.finish_run(run_id, "success", 0, 0)
|
||||
db.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
92
.github/skills/memory-safety/SKILL.md
vendored
Normal file
92
.github/skills/memory-safety/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,92 @@
|
|||
---
|
||||
name: memory-safety
|
||||
description: Run AddressSanitizer and UndefinedBehaviorSanitizer on the Z3 test suite to detect memory errors, undefined behavior, and leaks. Logs each finding to z3agent.db.
|
||||
---
|
||||
|
||||
Build Z3 with compiler-based sanitizer instrumentation, execute the test suite, and parse the runtime output for memory safety violations. Supported sanitizers are AddressSanitizer (heap and stack buffer overflows, use-after-free, double-free, memory leaks) and UndefinedBehaviorSanitizer (signed integer overflow, null pointer dereference, misaligned access, shift errors). Findings are deduplicated and stored in z3agent.db for triage and longitudinal tracking.
|
||||
|
||||
# Step 1: Configure and build
|
||||
|
||||
Action:
|
||||
Invoke the script with the desired sanitizer flag. The script calls
|
||||
cmake with the appropriate `-fsanitize` flags and builds the `test-z3`
|
||||
target. Each sanitizer uses a separate build directory to avoid flag
|
||||
conflicts.
|
||||
|
||||
Expectation:
|
||||
cmake configures successfully and make compiles the instrumented binary.
|
||||
If a prior build exists with matching flags, only incremental compilation
|
||||
runs.
|
||||
|
||||
Result:
|
||||
On success: an instrumented `test-z3` binary is ready in the build
|
||||
directory. Proceed to Step 2.
|
||||
On failure: verify compiler support for the requested sanitizer flags
|
||||
and review cmake output.
|
||||
|
||||
```bash
|
||||
python3 scripts/memory_safety.py --sanitizer asan
|
||||
python3 scripts/memory_safety.py --sanitizer ubsan
|
||||
python3 scripts/memory_safety.py --sanitizer both
|
||||
```
|
||||
|
||||
To reuse an existing build:
|
||||
```bash
|
||||
python3 scripts/memory_safety.py --sanitizer asan --skip-build --build-dir build/sanitizer-asan
|
||||
```
|
||||
|
||||
# Step 2: Run and collect
|
||||
|
||||
Action:
|
||||
Execute the instrumented test binary with halt_on_error=0 so all
|
||||
violations are reported rather than aborting on the first.
|
||||
|
||||
Expectation:
|
||||
The script parses AddressSanitizer, UndefinedBehaviorSanitizer, and
|
||||
LeakSanitizer patterns from combined output, extracts source locations,
|
||||
and deduplicates by category/file/line.
|
||||
|
||||
Result:
|
||||
On `clean`: no violations detected.
|
||||
On `findings`: one or more violations found, each printed with severity,
|
||||
category, message, and source location.
|
||||
On `timeout`: test suite did not finish; increase timeout or investigate.
|
||||
On `error`: build or execution failed before sanitizer output.
|
||||
|
||||
```bash
|
||||
python3 scripts/memory_safety.py --sanitizer asan --timeout 900 --debug
|
||||
```
|
||||
|
||||
# Step 3: Interpret results
|
||||
|
||||
Action:
|
||||
Review printed findings and query z3agent.db for historical comparison.
|
||||
|
||||
Expectation:
|
||||
Each finding includes severity, category, message, and source location.
|
||||
The database query returns prior runs for trend analysis.
|
||||
|
||||
Result:
|
||||
On `clean`: no action required; proceed.
|
||||
On `findings`: triage by severity and category. Compare against prior
|
||||
runs to distinguish new regressions from known issues.
|
||||
On `timeout`: increase the deadline or investigate a possible infinite
|
||||
loop.
|
||||
On `error`: inspect build logs before re-running.
|
||||
|
||||
Query past runs:
|
||||
```bash
|
||||
python3 ../../shared/z3db.py runs --skill memory-safety --last 10
|
||||
python3 ../../shared/z3db.py query "SELECT category, severity, file, line, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='memory-safety') ORDER BY run_id DESC LIMIT 20"
|
||||
```
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| sanitizer | choice | no | asan | which sanitizer to enable: asan, ubsan, or both |
|
||||
| build-dir | path | no | build/sanitizer-{name} | path to the build directory |
|
||||
| timeout | int | no | 600 | seconds before killing the test run |
|
||||
| skip-build | flag | no | off | reuse an existing instrumented build |
|
||||
| debug | flag | no | off | verbose cmake, make, and test output |
|
||||
| db | path | no | .z3-agent/z3agent.db | path to the logging database |
|
||||
287
.github/skills/memory-safety/scripts/memory_safety.py
vendored
Normal file
287
.github/skills/memory-safety/scripts/memory_safety.py
vendored
Normal file
|
|
@ -0,0 +1,287 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
memory_safety.py: run sanitizer checks on Z3 test suite.
|
||||
|
||||
Usage:
|
||||
python memory_safety.py --sanitizer asan
|
||||
python memory_safety.py --sanitizer ubsan --debug
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, setup_logging
|
||||
|
||||
logger = logging.getLogger("z3agent")
|
||||
|
||||
SANITIZER_FLAGS = {
|
||||
"asan": "-fsanitize=address -fno-omit-frame-pointer",
|
||||
"ubsan": "-fsanitize=undefined -fno-omit-frame-pointer",
|
||||
}
|
||||
|
||||
ASAN_ERROR = re.compile(r"ERROR:\s*AddressSanitizer:\s*(\S+)")
|
||||
UBSAN_ERROR = re.compile(r":\d+:\d+:\s*runtime error:\s*(.+)")
|
||||
LEAK_ERROR = re.compile(r"ERROR:\s*LeakSanitizer:")
|
||||
LOCATION = re.compile(r"(\S+\.(?:cpp|c|h|hpp)):(\d+)")
|
||||
|
||||
|
||||
def check_dependencies():
|
||||
"""Fail early if required build tools are not on PATH."""
|
||||
missing = []
|
||||
if not shutil.which("cmake"):
|
||||
missing.append(("cmake", "sudo apt install cmake"))
|
||||
if not shutil.which("make"):
|
||||
missing.append(("make", "sudo apt install build-essential"))
|
||||
|
||||
cc = shutil.which("clang") or shutil.which("gcc")
|
||||
if not cc:
|
||||
missing.append(("clang or gcc", "sudo apt install clang"))
|
||||
|
||||
if missing:
|
||||
print("required tools not found:", file=sys.stderr)
|
||||
for tool, install in missing:
|
||||
print(f" {tool}: {install}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def find_repo_root() -> Path:
|
||||
d = Path.cwd()
|
||||
for _ in range(10):
|
||||
if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
|
||||
return d
|
||||
parent = d.parent
|
||||
if parent == d:
|
||||
break
|
||||
d = parent
|
||||
logger.error("could not locate Z3 repository root")
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def build_is_configured(build_dir: Path, sanitizer: str) -> bool:
|
||||
"""Check whether the build directory already has a matching cmake config."""
|
||||
cache = build_dir / "CMakeCache.txt"
|
||||
if not cache.is_file():
|
||||
return False
|
||||
expected = SANITIZER_FLAGS[sanitizer].split()[0]
|
||||
return expected in cache.read_text()
|
||||
|
||||
|
||||
def configure(build_dir: Path, sanitizer: str, repo_root: Path) -> bool:
|
||||
"""Run cmake with the requested sanitizer flags."""
|
||||
flags = SANITIZER_FLAGS[sanitizer]
|
||||
build_dir.mkdir(parents=True, exist_ok=True)
|
||||
cmd = [
|
||||
"cmake", str(repo_root),
|
||||
f"-DCMAKE_C_FLAGS={flags}",
|
||||
f"-DCMAKE_CXX_FLAGS={flags}",
|
||||
f"-DCMAKE_EXE_LINKER_FLAGS={flags}",
|
||||
"-DCMAKE_BUILD_TYPE=Debug",
|
||||
"-DZ3_BUILD_TEST=ON",
|
||||
]
|
||||
logger.info("configuring %s build in %s", sanitizer, build_dir)
|
||||
logger.debug("cmake command: %s", " ".join(cmd))
|
||||
proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
|
||||
if proc.returncode != 0:
|
||||
logger.error("cmake failed:\n%s", proc.stderr)
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def compile_tests(build_dir: Path) -> bool:
|
||||
"""Compile the test-z3 target."""
|
||||
nproc = os.cpu_count() or 4
|
||||
cmd = ["make", f"-j{nproc}", "test-z3"]
|
||||
logger.info("compiling test-z3 (%d parallel jobs)", nproc)
|
||||
proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
|
||||
if proc.returncode != 0:
|
||||
logger.error("compilation failed:\n%s", proc.stderr[-2000:])
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def run_tests(build_dir: Path, timeout: int) -> dict:
|
||||
"""Execute test-z3 under sanitizer runtime and capture output."""
|
||||
test_bin = build_dir / "test-z3"
|
||||
if not test_bin.is_file():
|
||||
logger.error("test-z3 not found at %s", test_bin)
|
||||
return {"stdout": "", "stderr": "binary not found", "exit_code": -1,
|
||||
"duration_ms": 0}
|
||||
|
||||
env = os.environ.copy()
|
||||
env["ASAN_OPTIONS"] = "detect_leaks=1:halt_on_error=0:print_stacktrace=1"
|
||||
env["UBSAN_OPTIONS"] = "print_stacktrace=1:halt_on_error=0"
|
||||
|
||||
cmd = [str(test_bin), "/a"]
|
||||
logger.info("running: %s", " ".join(cmd))
|
||||
start = time.monotonic()
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
cmd, capture_output=True, text=True, timeout=timeout,
|
||||
cwd=build_dir, env=env,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
ms = int((time.monotonic() - start) * 1000)
|
||||
logger.warning("test-z3 timed out after %dms", ms)
|
||||
return {"stdout": "", "stderr": "timeout", "exit_code": -1,
|
||||
"duration_ms": ms}
|
||||
|
||||
ms = int((time.monotonic() - start) * 1000)
|
||||
logger.debug("exit_code=%d duration=%dms", proc.returncode, ms)
|
||||
return {
|
||||
"stdout": proc.stdout,
|
||||
"stderr": proc.stderr,
|
||||
"exit_code": proc.returncode,
|
||||
"duration_ms": ms,
|
||||
}
|
||||
|
||||
|
||||
def parse_findings(output: str) -> list:
|
||||
"""Extract sanitizer error reports from combined stdout and stderr."""
|
||||
findings = []
|
||||
lines = output.split("\n")
|
||||
|
||||
for i, line in enumerate(lines):
|
||||
entry = None
|
||||
|
||||
m = ASAN_ERROR.search(line)
|
||||
if m:
|
||||
entry = {"category": "asan", "message": m.group(1),
|
||||
"severity": "high"}
|
||||
|
||||
if not entry:
|
||||
m = LEAK_ERROR.search(line)
|
||||
if m:
|
||||
entry = {"category": "leak",
|
||||
"message": "detected memory leaks",
|
||||
"severity": "high"}
|
||||
|
||||
if not entry:
|
||||
m = UBSAN_ERROR.search(line)
|
||||
if m:
|
||||
entry = {"category": "ubsan", "message": m.group(1),
|
||||
"severity": "medium"}
|
||||
|
||||
if not entry:
|
||||
continue
|
||||
|
||||
file_path, line_no = None, None
|
||||
window = lines[max(0, i - 2):i + 5]
|
||||
for ctx in window:
|
||||
loc = LOCATION.search(ctx)
|
||||
if loc and "/usr/" not in loc.group(1):
|
||||
file_path = loc.group(1)
|
||||
line_no = int(loc.group(2))
|
||||
break
|
||||
|
||||
entry["file"] = file_path
|
||||
entry["line"] = line_no
|
||||
entry["raw"] = line.strip()
|
||||
findings.append(entry)
|
||||
|
||||
return findings
|
||||
|
||||
|
||||
def deduplicate(findings: list) -> list:
|
||||
"""Remove duplicate reports at the same category, file, and line."""
|
||||
seen = set()
|
||||
result = []
|
||||
for f in findings:
|
||||
key = (f["category"], f["file"], f["line"], f["message"])
|
||||
if key in seen:
|
||||
continue
|
||||
seen.add(key)
|
||||
result.append(f)
|
||||
return result
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="memory-safety")
|
||||
parser.add_argument("--sanitizer", choices=["asan", "ubsan", "both"],
|
||||
default="asan",
|
||||
help="sanitizer to enable (default: asan)")
|
||||
parser.add_argument("--build-dir", default=None,
|
||||
help="path to build directory")
|
||||
parser.add_argument("--timeout", type=int, default=600,
|
||||
help="seconds before killing test run")
|
||||
parser.add_argument("--skip-build", action="store_true",
|
||||
help="reuse existing instrumented build")
|
||||
parser.add_argument("--db", default=None,
|
||||
help="path to z3agent.db")
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
check_dependencies()
|
||||
repo_root = find_repo_root()
|
||||
|
||||
sanitizers = ["asan", "ubsan"] if args.sanitizer == "both" else [args.sanitizer]
|
||||
all_findings = []
|
||||
|
||||
db = Z3DB(args.db)
|
||||
|
||||
for san in sanitizers:
|
||||
if args.build_dir:
|
||||
build_dir = Path(args.build_dir)
|
||||
else:
|
||||
build_dir = repo_root / "build" / f"sanitizer-{san}"
|
||||
|
||||
run_id = db.start_run("memory-safety", f"sanitizer={san}")
|
||||
db.log(f"sanitizer: {san}, build: {build_dir}", run_id=run_id)
|
||||
|
||||
if not args.skip_build:
|
||||
needs_configure = not build_is_configured(build_dir, san)
|
||||
if needs_configure and not configure(build_dir, san, repo_root):
|
||||
db.finish_run(run_id, "error", 0, exit_code=1)
|
||||
print(f"FAIL: cmake configuration failed for {san}")
|
||||
continue
|
||||
if not compile_tests(build_dir):
|
||||
db.finish_run(run_id, "error", 0, exit_code=1)
|
||||
print(f"FAIL: compilation failed for {san}")
|
||||
continue
|
||||
|
||||
result = run_tests(build_dir, args.timeout)
|
||||
combined = result["stdout"] + "\n" + result["stderr"]
|
||||
findings = deduplicate(parse_findings(combined))
|
||||
|
||||
for f in findings:
|
||||
db.log_finding(
|
||||
run_id,
|
||||
category=f["category"],
|
||||
message=f["message"],
|
||||
severity=f["severity"],
|
||||
file=f["file"],
|
||||
line=f["line"],
|
||||
details={"raw": f["raw"]},
|
||||
)
|
||||
|
||||
status = "clean" if not findings else "findings"
|
||||
if result["exit_code"] == -1:
|
||||
status = "timeout" if "timeout" in result["stderr"] else "error"
|
||||
|
||||
db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])
|
||||
all_findings.extend(findings)
|
||||
print(f"{san}: {len(findings)} finding(s), {result['duration_ms']}ms")
|
||||
|
||||
if all_findings:
|
||||
print(f"\nTotal: {len(all_findings)} finding(s)")
|
||||
for f in all_findings:
|
||||
loc = f"{f['file']}:{f['line']}" if f["file"] else "unknown location"
|
||||
print(f" [{f['severity']}] {f['category']}: {f['message']} at {loc}")
|
||||
db.close()
|
||||
sys.exit(1)
|
||||
else:
|
||||
print("\nNo sanitizer findings.")
|
||||
db.close()
|
||||
sys.exit(0)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
75
.github/skills/optimize/SKILL.md
vendored
Normal file
75
.github/skills/optimize/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,75 @@
|
|||
---
|
||||
name: optimize
|
||||
description: Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains.
|
||||
---
|
||||
|
||||
Given a set of constraints and an objective function, find the optimal value. Z3 supports both hard constraints (must hold) and soft constraints (weighted preferences), as well as lexicographic multi-objective optimization.
|
||||
|
||||
# Step 1: Formulate the problem
|
||||
|
||||
Action:
|
||||
Write constraints and an objective using `(minimize ...)` or
|
||||
`(maximize ...)` directives, followed by `(check-sat)` and `(get-model)`.
|
||||
|
||||
Expectation:
|
||||
A valid SMT-LIB2 formula with at least one optimization directive and
|
||||
all variables declared.
|
||||
|
||||
Result:
|
||||
If the formula is well-formed, proceed to Step 2. For multi-objective
|
||||
problems, list directives in priority order for lexicographic optimization.
|
||||
|
||||
Example: minimize `x + y` subject to `x >= 1`, `y >= 2`, `x + y <= 10`:
|
||||
```smtlib
|
||||
(declare-const x Int)
|
||||
(declare-const y Int)
|
||||
(assert (>= x 1))
|
||||
(assert (>= y 2))
|
||||
(assert (<= (+ x y) 10))
|
||||
(minimize (+ x y))
|
||||
(check-sat)
|
||||
(get-model)
|
||||
```
|
||||
|
||||
# Step 2: Run the optimizer
|
||||
|
||||
Action:
|
||||
Invoke optimize.py with the formula or file path.
|
||||
|
||||
Expectation:
|
||||
The script prints `sat` with the optimal assignment, `unsat`, `unknown`,
|
||||
or `timeout`. A run entry is logged to z3agent.db.
|
||||
|
||||
Result:
|
||||
On `sat`: proceed to Step 3 to read the optimal values.
|
||||
On `unsat` or `timeout`: check constraints for contradictions or simplify.
|
||||
|
||||
```bash
|
||||
python3 scripts/optimize.py --file scheduling.smt2
|
||||
python3 scripts/optimize.py --formula "<inline smt-lib2>" --debug
|
||||
```
|
||||
|
||||
# Step 3: Interpret the output
|
||||
|
||||
Action:
|
||||
Parse the objective value and satisfying assignment from the output.
|
||||
|
||||
Expectation:
|
||||
`sat` with a model containing the optimal value, `unsat` indicating
|
||||
infeasibility, or `unknown`/`timeout`.
|
||||
|
||||
Result:
|
||||
On `sat`: report the optimal value and assignment.
|
||||
On `unsat`: the constraints are contradictory, no feasible solution exists.
|
||||
On `unknown`/`timeout`: relax constraints or try **simplify**.
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| formula | string | no | | SMT-LIB2 formula with minimize/maximize |
|
||||
| file | path | no | | path to .smt2 file |
|
||||
| timeout | int | no | 60 | seconds |
|
||||
| z3 | path | no | auto | path to z3 binary |
|
||||
| debug | flag | no | off | verbose tracing |
|
||||
| db | path | no | .z3-agent/z3agent.db | logging database |
|
||||
58
.github/skills/optimize/scripts/optimize.py
vendored
Normal file
58
.github/skills/optimize/scripts/optimize.py
vendored
Normal file
|
|
@ -0,0 +1,58 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
optimize.py: solve constrained optimization problems via Z3.
|
||||
|
||||
Usage:
|
||||
python optimize.py --file scheduling.smt2
|
||||
python optimize.py --formula "(declare-const x Int)..." --debug
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, run_z3, parse_model, setup_logging
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="optimize")
|
||||
parser.add_argument("--formula")
|
||||
parser.add_argument("--file")
|
||||
parser.add_argument("--timeout", type=int, default=60)
|
||||
parser.add_argument("--z3", default=None)
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if args.file:
|
||||
formula = Path(args.file).read_text()
|
||||
elif args.formula:
|
||||
formula = args.formula
|
||||
else:
|
||||
parser.error("provide --formula or --file")
|
||||
return
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("optimize", formula)
|
||||
|
||||
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
|
||||
|
||||
model = parse_model(result["stdout"]) if result["result"] == "sat" else None
|
||||
|
||||
db.log_formula(run_id, formula, result["result"], str(model) if model else None)
|
||||
db.finish_run(run_id, result["result"], result["duration_ms"], result["exit_code"])
|
||||
|
||||
print(result["result"])
|
||||
if model:
|
||||
for name, val in model.items():
|
||||
print(f" {name} = {val}")
|
||||
|
||||
db.close()
|
||||
sys.exit(0 if result["exit_code"] == 0 else 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
83
.github/skills/prove/SKILL.md
vendored
Normal file
83
.github/skills/prove/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,83 @@
|
|||
---
|
||||
name: prove
|
||||
description: Prove validity of logical statements by negation and satisfiability checking. If the negation is unsatisfiable, the original statement is valid. Otherwise a counterexample is returned.
|
||||
---
|
||||
|
||||
Given a conjecture (an SMT-LIB2 assertion or a natural language claim), determine whether it holds universally. The method is standard: negate the conjecture and check satisfiability. If the negation is unsatisfiable, the original is valid. If satisfiable, the model is a counterexample.
|
||||
|
||||
# Step 1: Prepare the negated formula
|
||||
|
||||
Action:
|
||||
Wrap the conjecture in `(assert (not ...))` and append
|
||||
`(check-sat)(get-model)`.
|
||||
|
||||
Expectation:
|
||||
A complete SMT-LIB2 formula that negates the original conjecture with
|
||||
all variables declared.
|
||||
|
||||
Result:
|
||||
If the negation is well-formed, proceed to Step 2.
|
||||
If the conjecture is natural language, run **encode** first.
|
||||
|
||||
Example: to prove that `(> x 3)` implies `(> x 1)`:
|
||||
```smtlib
|
||||
(declare-const x Int)
|
||||
(assert (not (=> (> x 3) (> x 1))))
|
||||
(check-sat)
|
||||
(get-model)
|
||||
```
|
||||
|
||||
# Step 2: Run the prover
|
||||
|
||||
Action:
|
||||
Invoke prove.py with the conjecture and variable declarations.
|
||||
|
||||
Expectation:
|
||||
The script prints `valid`, `invalid` (with counterexample), `unknown`,
|
||||
or `timeout`. A run entry is logged to z3agent.db.
|
||||
|
||||
Result:
|
||||
On `valid`: proceed to **explain** if the user needs a summary.
|
||||
On `invalid`: report the counterexample directly.
|
||||
On `unknown`/`timeout`: try **simplify** first, or increase the timeout.
|
||||
|
||||
```bash
|
||||
python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
|
||||
```
|
||||
|
||||
For file input where the file contains the full negated formula:
|
||||
```bash
|
||||
python3 scripts/prove.py --file negated.smt2
|
||||
```
|
||||
|
||||
With debug tracing:
|
||||
```bash
|
||||
python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" --debug
|
||||
```
|
||||
|
||||
# Step 3: Interpret the output
|
||||
|
||||
Action:
|
||||
Read the prover output to determine validity of the conjecture.
|
||||
|
||||
Expectation:
|
||||
One of `valid`, `invalid` (with counterexample), `unknown`, or `timeout`.
|
||||
|
||||
Result:
|
||||
On `valid`: the conjecture holds universally.
|
||||
On `invalid`: the model shows a concrete counterexample.
|
||||
On `unknown`/`timeout`: the conjecture may require auxiliary lemmas or induction.
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| conjecture | string | no | | the assertion to prove (without negation) |
|
||||
| vars | string | no | | variable declarations as "name:sort" pairs, comma-separated |
|
||||
| file | path | no | | .smt2 file with the negated formula |
|
||||
| timeout | int | no | 30 | seconds |
|
||||
| z3 | path | no | auto | path to z3 binary |
|
||||
| debug | flag | no | off | verbose tracing |
|
||||
| db | path | no | .z3-agent/z3agent.db | logging database |
|
||||
|
||||
Either `conjecture` (with `vars`) or `file` must be provided.
|
||||
82
.github/skills/prove/scripts/prove.py
vendored
Normal file
82
.github/skills/prove/scripts/prove.py
vendored
Normal file
|
|
@ -0,0 +1,82 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
prove.py: prove validity by negation + satisfiability check.
|
||||
|
||||
Usage:
|
||||
python prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
|
||||
python prove.py --file negated.smt2
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, run_z3, parse_model, setup_logging
|
||||
|
||||
|
||||
def build_formula(conjecture: str, vars_str: str) -> str:
|
||||
lines = []
|
||||
if vars_str:
|
||||
for v in vars_str.split(","):
|
||||
v = v.strip()
|
||||
name, sort = v.split(":")
|
||||
lines.append(f"(declare-const {name.strip()} {sort.strip()})")
|
||||
lines.append(f"(assert (not {conjecture}))")
|
||||
lines.append("(check-sat)")
|
||||
lines.append("(get-model)")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="prove")
|
||||
parser.add_argument("--conjecture", help="assertion to prove")
|
||||
parser.add_argument("--vars", help="variable declarations, e.g. 'x:Int,y:Bool'")
|
||||
parser.add_argument("--file", help="path to .smt2 file with negated formula")
|
||||
parser.add_argument("--timeout", type=int, default=30)
|
||||
parser.add_argument("--z3", default=None)
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if args.file:
|
||||
formula = Path(args.file).read_text()
|
||||
elif args.conjecture:
|
||||
formula = build_formula(args.conjecture, args.vars or "")
|
||||
else:
|
||||
parser.error("provide --conjecture or --file")
|
||||
return
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("prove", formula)
|
||||
|
||||
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
|
||||
|
||||
if result["result"] == "unsat":
|
||||
verdict = "valid"
|
||||
elif result["result"] == "sat":
|
||||
verdict = "invalid"
|
||||
else:
|
||||
verdict = result["result"]
|
||||
|
||||
model = parse_model(result["stdout"]) if verdict == "invalid" else None
|
||||
|
||||
db.log_formula(run_id, formula, verdict, str(model) if model else None)
|
||||
db.finish_run(run_id, verdict, result["duration_ms"], result["exit_code"])
|
||||
|
||||
print(verdict)
|
||||
if model:
|
||||
print("counterexample:")
|
||||
for name, val in model.items():
|
||||
print(f" {name} = {val}")
|
||||
|
||||
db.close()
|
||||
# Exit 0 when we successfully determined validity or invalidity;
|
||||
# exit 1 only for errors/timeouts.
|
||||
sys.exit(0 if verdict in ("valid", "invalid") else 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
57
.github/skills/shared/schema.sql
vendored
Normal file
57
.github/skills/shared/schema.sql
vendored
Normal file
|
|
@ -0,0 +1,57 @@
|
|||
-- z3agent schema v1
|
||||
|
||||
PRAGMA journal_mode=WAL;
|
||||
PRAGMA foreign_keys=ON;
|
||||
|
||||
CREATE TABLE IF NOT EXISTS runs (
|
||||
run_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
skill TEXT NOT NULL,
|
||||
input_hash TEXT,
|
||||
status TEXT NOT NULL DEFAULT 'running',
|
||||
duration_ms INTEGER,
|
||||
exit_code INTEGER,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_runs_skill ON runs(skill);
|
||||
CREATE INDEX IF NOT EXISTS idx_runs_status ON runs(status);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS formulas (
|
||||
formula_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
|
||||
smtlib2 TEXT NOT NULL,
|
||||
result TEXT,
|
||||
model TEXT,
|
||||
stats TEXT,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_formulas_run ON formulas(run_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_formulas_result ON formulas(result);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS findings (
|
||||
finding_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
|
||||
category TEXT NOT NULL,
|
||||
severity TEXT,
|
||||
file TEXT,
|
||||
line INTEGER,
|
||||
message TEXT NOT NULL,
|
||||
details TEXT,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_findings_run ON findings(run_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_findings_category ON findings(category);
|
||||
CREATE INDEX IF NOT EXISTS idx_findings_severity ON findings(severity);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS interaction_log (
|
||||
log_id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
run_id INTEGER REFERENCES runs(run_id) ON DELETE SET NULL,
|
||||
level TEXT NOT NULL DEFAULT 'info',
|
||||
message TEXT NOT NULL,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_log_run ON interaction_log(run_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_log_level ON interaction_log(level);
|
||||
364
.github/skills/shared/z3db.py
vendored
Normal file
364
.github/skills/shared/z3db.py
vendored
Normal file
|
|
@ -0,0 +1,364 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
z3db: shared library and CLI for Z3 skill scripts.
|
||||
|
||||
Library usage:
|
||||
from z3db import Z3DB, find_z3, run_z3
|
||||
|
||||
CLI usage:
|
||||
python z3db.py init
|
||||
python z3db.py status
|
||||
python z3db.py log [--run-id N]
|
||||
python z3db.py runs [--skill solve] [--last N]
|
||||
python z3db.py query "SELECT ..."
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import shutil
|
||||
import sqlite3
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
SCHEMA_PATH = Path(__file__).parent / "schema.sql"
|
||||
DEFAULT_DB_DIR = ".z3-agent"
|
||||
DEFAULT_DB_NAME = "z3agent.db"
|
||||
|
||||
logger = logging.getLogger("z3agent")
|
||||
|
||||
|
||||
def setup_logging(debug: bool = False):
|
||||
level = logging.DEBUG if debug else logging.INFO
|
||||
fmt = (
|
||||
"[%(levelname)s] %(message)s"
|
||||
if not debug
|
||||
else "[%(levelname)s %(asctime)s] %(message)s"
|
||||
)
|
||||
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
|
||||
|
||||
|
||||
class Z3DB:
|
||||
"""SQLite handle for z3agent.db, tracks runs, formulas, findings, logs."""
|
||||
|
||||
def __init__(self, db_path: Optional[str] = None):
|
||||
if db_path is None:
|
||||
db_dir = Path(DEFAULT_DB_DIR)
|
||||
db_dir.mkdir(exist_ok=True)
|
||||
db_path = str(db_dir / DEFAULT_DB_NAME)
|
||||
self.db_path = db_path
|
||||
self.conn = sqlite3.connect(db_path)
|
||||
self.conn.execute("PRAGMA foreign_keys=ON")
|
||||
self.conn.row_factory = sqlite3.Row
|
||||
self._init_schema()
|
||||
|
||||
def _init_schema(self):
|
||||
self.conn.executescript(SCHEMA_PATH.read_text())
|
||||
|
||||
def close(self):
|
||||
self.conn.close()
|
||||
|
||||
def start_run(self, skill: str, input_text: str = "") -> int:
|
||||
input_hash = hashlib.sha256(input_text.encode()).hexdigest()[:16]
|
||||
cur = self.conn.execute(
|
||||
"INSERT INTO runs (skill, input_hash) VALUES (?, ?)",
|
||||
(skill, input_hash),
|
||||
)
|
||||
self.conn.commit()
|
||||
run_id = cur.lastrowid
|
||||
logger.debug("started run %d (skill=%s, hash=%s)", run_id, skill, input_hash)
|
||||
return run_id
|
||||
|
||||
def finish_run(
|
||||
self, run_id: int, status: str, duration_ms: int, exit_code: int = 0
|
||||
):
|
||||
self.conn.execute(
|
||||
"UPDATE runs SET status=?, duration_ms=?, exit_code=? WHERE run_id=?",
|
||||
(status, duration_ms, exit_code, run_id),
|
||||
)
|
||||
self.conn.commit()
|
||||
logger.debug("finished run %d: %s (%dms)", run_id, status, duration_ms)
|
||||
|
||||
def log_formula(
|
||||
self,
|
||||
run_id: int,
|
||||
smtlib2: str,
|
||||
result: str = None,
|
||||
model: str = None,
|
||||
stats: dict = None,
|
||||
) -> int:
|
||||
cur = self.conn.execute(
|
||||
"INSERT INTO formulas (run_id, smtlib2, result, model, stats) "
|
||||
"VALUES (?, ?, ?, ?, ?)",
|
||||
(run_id, smtlib2, result, model, json.dumps(stats) if stats else None),
|
||||
)
|
||||
self.conn.commit()
|
||||
return cur.lastrowid
|
||||
|
||||
def log_finding(
|
||||
self,
|
||||
run_id: int,
|
||||
category: str,
|
||||
message: str,
|
||||
severity: str = None,
|
||||
file: str = None,
|
||||
line: int = None,
|
||||
details: dict = None,
|
||||
) -> int:
|
||||
cur = self.conn.execute(
|
||||
"INSERT INTO findings (run_id, category, severity, file, line, "
|
||||
"message, details) VALUES (?, ?, ?, ?, ?, ?, ?)",
|
||||
(
|
||||
run_id,
|
||||
category,
|
||||
severity,
|
||||
file,
|
||||
line,
|
||||
message,
|
||||
json.dumps(details) if details else None,
|
||||
),
|
||||
)
|
||||
self.conn.commit()
|
||||
return cur.lastrowid
|
||||
|
||||
def log(self, message: str, level: str = "info", run_id: int = None):
|
||||
"""Write to stderr and to the interaction_log table."""
|
||||
getattr(logger, level, logger.info)(message)
|
||||
self.conn.execute(
|
||||
"INSERT INTO interaction_log (run_id, level, message) " "VALUES (?, ?, ?)",
|
||||
(run_id, level, message),
|
||||
)
|
||||
self.conn.commit()
|
||||
|
||||
def get_runs(self, skill: str = None, last: int = 10):
|
||||
sql = "SELECT * FROM runs"
|
||||
params = []
|
||||
if skill:
|
||||
sql += " WHERE skill = ?"
|
||||
params.append(skill)
|
||||
sql += " ORDER BY run_id DESC LIMIT ?"
|
||||
params.append(last)
|
||||
return self.conn.execute(sql, params).fetchall()
|
||||
|
||||
def get_status(self) -> dict:
|
||||
rows = self.conn.execute(
|
||||
"SELECT status, COUNT(*) as cnt FROM runs GROUP BY status"
|
||||
).fetchall()
|
||||
total = sum(r["cnt"] for r in rows)
|
||||
by_status = {r["status"]: r["cnt"] for r in rows}
|
||||
last = self.conn.execute(
|
||||
"SELECT timestamp FROM runs ORDER BY run_id DESC LIMIT 1"
|
||||
).fetchone()
|
||||
return {
|
||||
"total": total,
|
||||
**by_status,
|
||||
"last_run": last["timestamp"] if last else None,
|
||||
}
|
||||
|
||||
def get_logs(self, run_id: int = None, last: int = 50):
|
||||
if run_id:
|
||||
return self.conn.execute(
|
||||
"SELECT * FROM interaction_log WHERE run_id=? "
|
||||
"ORDER BY log_id DESC LIMIT ?",
|
||||
(run_id, last),
|
||||
).fetchall()
|
||||
return self.conn.execute(
|
||||
"SELECT * FROM interaction_log ORDER BY log_id DESC LIMIT ?", (last,)
|
||||
).fetchall()
|
||||
|
||||
def query(self, sql: str):
|
||||
return self.conn.execute(sql).fetchall()
|
||||
|
||||
|
||||
def find_z3(hint: str = None) -> str:
|
||||
"""Locate the z3 binary: explicit path > build dirs > PATH."""
|
||||
candidates = []
|
||||
if hint:
|
||||
candidates.append(hint)
|
||||
|
||||
repo_root = _find_repo_root()
|
||||
if repo_root:
|
||||
for build_dir in ["build", "build/release", "build/debug"]:
|
||||
candidates.append(str(repo_root / build_dir / "z3"))
|
||||
|
||||
path_z3 = shutil.which("z3")
|
||||
if path_z3:
|
||||
candidates.append(path_z3)
|
||||
|
||||
for c in candidates:
|
||||
p = Path(c)
|
||||
if p.is_file() and os.access(p, os.X_OK):
|
||||
logger.debug("found z3: %s", p)
|
||||
return str(p)
|
||||
|
||||
logger.error("z3 binary not found. Searched: %s", candidates)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def _find_repo_root() -> Optional[Path]:
|
||||
d = Path.cwd()
|
||||
for _ in range(10):
|
||||
if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
|
||||
return d
|
||||
parent = d.parent
|
||||
if parent == d:
|
||||
break
|
||||
d = parent
|
||||
return None
|
||||
|
||||
|
||||
def run_z3(
|
||||
formula: str,
|
||||
z3_bin: str = None,
|
||||
timeout: int = 30,
|
||||
args: list = None,
|
||||
debug: bool = False,
|
||||
) -> dict:
|
||||
"""Pipe an SMT-LIB2 formula into z3 -in, return parsed output."""
|
||||
z3_path = find_z3(z3_bin)
|
||||
cmd = [z3_path, "-in"] + (args or [])
|
||||
|
||||
logger.debug("cmd: %s", " ".join(cmd))
|
||||
logger.debug("stdin:\n%s", formula)
|
||||
|
||||
start = time.monotonic()
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
cmd,
|
||||
input=formula,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
duration_ms = int((time.monotonic() - start) * 1000)
|
||||
logger.warning("z3 timed out after %dms", duration_ms)
|
||||
return {
|
||||
"stdout": "",
|
||||
"stderr": "timeout",
|
||||
"exit_code": -1,
|
||||
"duration_ms": duration_ms,
|
||||
"result": "timeout",
|
||||
}
|
||||
|
||||
duration_ms = int((time.monotonic() - start) * 1000)
|
||||
|
||||
logger.debug("exit_code=%d duration=%dms", proc.returncode, duration_ms)
|
||||
logger.debug("stdout:\n%s", proc.stdout)
|
||||
if proc.stderr:
|
||||
logger.debug("stderr:\n%s", proc.stderr)
|
||||
|
||||
first_line = proc.stdout.strip().split("\n")[0].strip() if proc.stdout else ""
|
||||
result = first_line if first_line in ("sat", "unsat", "unknown") else "error"
|
||||
|
||||
return {
|
||||
"stdout": proc.stdout,
|
||||
"stderr": proc.stderr,
|
||||
"exit_code": proc.returncode,
|
||||
"duration_ms": duration_ms,
|
||||
"result": result,
|
||||
}
|
||||
|
||||
|
||||
def parse_model(stdout: str) -> Optional[dict]:
|
||||
"""Pull define-fun entries from a (get-model) response."""
|
||||
model = {}
|
||||
for m in re.finditer(r"\(define-fun\s+(\S+)\s+\(\)\s+\S+\s+(.+?)\)", stdout):
|
||||
model[m.group(1)] = m.group(2).strip()
|
||||
return model if model else None
|
||||
|
||||
|
||||
def parse_stats(stdout: str) -> Optional[dict]:
|
||||
"""Parse :key value pairs from z3 -st output."""
|
||||
stats = {}
|
||||
for m in re.finditer(r":(\S+)\s+([\d.]+)", stdout):
|
||||
key, val = m.group(1), m.group(2)
|
||||
stats[key] = float(val) if "." in val else int(val)
|
||||
return stats if stats else None
|
||||
|
||||
|
||||
def parse_unsat_core(stdout: str) -> Optional[list]:
|
||||
for line in stdout.strip().split("\n"):
|
||||
line = line.strip()
|
||||
if line.startswith("(") and not line.startswith("(error"):
|
||||
labels = line.strip("()").split()
|
||||
if labels:
|
||||
return labels
|
||||
return None
|
||||
|
||||
|
||||
def cli():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Z3 Agent database CLI",
|
||||
prog="z3db",
|
||||
)
|
||||
parser.add_argument("--db", default=None, help="path to z3agent.db")
|
||||
parser.add_argument("--debug", action="store_true", help="verbose output")
|
||||
|
||||
sub = parser.add_subparsers(dest="command")
|
||||
|
||||
sub.add_parser("init", help="initialize the database")
|
||||
|
||||
sub.add_parser("status", help="show run summary")
|
||||
|
||||
log_p = sub.add_parser("log", help="show interaction log")
|
||||
log_p.add_argument("--run-id", type=int, help="filter by run ID")
|
||||
log_p.add_argument("--last", type=int, default=50)
|
||||
|
||||
runs_p = sub.add_parser("runs", help="list runs")
|
||||
runs_p.add_argument("--skill", help="filter by skill name")
|
||||
runs_p.add_argument("--last", type=int, default=10)
|
||||
|
||||
query_p = sub.add_parser("query", help="run raw SQL")
|
||||
query_p.add_argument("sql", help="SQL query string")
|
||||
|
||||
args = parser.parse_args()
|
||||
setup_logging(args.debug)
|
||||
|
||||
db = Z3DB(args.db)
|
||||
|
||||
if args.command == "init":
|
||||
print(f"Database initialized at {db.db_path}")
|
||||
|
||||
elif args.command == "status":
|
||||
s = db.get_status()
|
||||
print(
|
||||
f"Runs: {s['total']}"
|
||||
f" | success: {s.get('success', 0)}"
|
||||
f" | error: {s.get('error', 0)}"
|
||||
f" | timeout: {s.get('timeout', 0)}"
|
||||
f" | Last: {s['last_run'] or 'never'}"
|
||||
)
|
||||
|
||||
elif args.command == "log":
|
||||
for row in db.get_logs(args.run_id, args.last):
|
||||
print(
|
||||
f"[{row['level']}] {row['timestamp']} "
|
||||
f"(run {row['run_id']}): {row['message']}"
|
||||
)
|
||||
|
||||
elif args.command == "runs":
|
||||
for row in db.get_runs(args.skill, args.last):
|
||||
print(
|
||||
f"#{row['run_id']} {row['skill']} {row['status']} "
|
||||
f"{row['duration_ms']}ms @ {row['timestamp']}"
|
||||
)
|
||||
|
||||
elif args.command == "query":
|
||||
for row in db.query(args.sql):
|
||||
print(dict(row))
|
||||
|
||||
else:
|
||||
parser.print_help()
|
||||
|
||||
db.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
cli()
|
||||
76
.github/skills/simplify/SKILL.md
vendored
Normal file
76
.github/skills/simplify/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,76 @@
|
|||
---
|
||||
name: simplify
|
||||
description: Reduce formula complexity using Z3 tactic chains. Supports configurable tactic pipelines for boolean, arithmetic, and bitvector simplification.
|
||||
---
|
||||
|
||||
Given a formula, apply a sequence of Z3 tactics to produce an equivalent but simpler form. This is useful for understanding what Z3 sees after preprocessing, debugging tactic selection, and reducing formula size before solving.
|
||||
|
||||
# Step 1: Choose tactics
|
||||
|
||||
Action:
|
||||
Select a tactic chain from the available Z3 tactics based on the
|
||||
formula's theory.
|
||||
|
||||
Expectation:
|
||||
A comma-separated list of tactic names suitable for the formula domain.
|
||||
|
||||
Result:
|
||||
If unsure, use the default chain: `simplify,propagate-values,ctx-simplify`.
|
||||
For bitvector formulas, add `bit-blast`. Proceed to Step 2.
|
||||
|
||||
| Tactic | What it does |
|
||||
|--------|-------------|
|
||||
| simplify | constant folding, algebraic identities |
|
||||
| propagate-values | substitute known equalities |
|
||||
| ctx-simplify | context-dependent simplification |
|
||||
| elim-uncnstr | remove unconstrained variables |
|
||||
| solve-eqs | Gaussian elimination |
|
||||
| bit-blast | reduce bitvectors to booleans |
|
||||
| tseitin-cnf | convert to CNF |
|
||||
| aig | and-inverter graph reduction |
|
||||
|
||||
# Step 2: Run simplification
|
||||
|
||||
Action:
|
||||
Invoke simplify.py with the formula and optional tactic chain.
|
||||
|
||||
Expectation:
|
||||
The script applies each tactic in sequence and prints the simplified
|
||||
formula. A run entry is logged to z3agent.db.
|
||||
|
||||
Result:
|
||||
If the output is simpler, pass it to **solve** or **prove**.
|
||||
If unchanged, try a different tactic chain.
|
||||
|
||||
```bash
|
||||
python3 scripts/simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
|
||||
python3 scripts/simplify.py --file formula.smt2 --tactics "simplify,propagate-values,ctx-simplify"
|
||||
python3 scripts/simplify.py --file formula.smt2 --debug
|
||||
```
|
||||
|
||||
Without `--tactics`, the script applies the default chain: `simplify`, `propagate-values`, `ctx-simplify`.
|
||||
|
||||
# Step 3: Interpret the output
|
||||
|
||||
Action:
|
||||
Read the simplified formula output in SMT-LIB2 syntax.
|
||||
|
||||
Expectation:
|
||||
One or more `(assert ...)` blocks representing equivalent subgoals.
|
||||
|
||||
Result:
|
||||
A smaller formula indicates successful reduction. Pass the result to
|
||||
**solve**, **prove**, or **optimize** as needed.
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| formula | string | no | | SMT-LIB2 formula to simplify |
|
||||
| vars | string | no | | variable declarations as "name:sort" pairs |
|
||||
| file | path | no | | path to .smt2 file |
|
||||
| tactics | string | no | simplify,propagate-values,ctx-simplify | comma-separated tactic names |
|
||||
| timeout | int | no | 30 | seconds |
|
||||
| z3 | path | no | auto | path to z3 binary |
|
||||
| debug | flag | no | off | verbose tracing |
|
||||
| db | path | no | .z3-agent/z3agent.db | logging database |
|
||||
82
.github/skills/simplify/scripts/simplify.py
vendored
Normal file
82
.github/skills/simplify/scripts/simplify.py
vendored
Normal file
|
|
@ -0,0 +1,82 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
simplify.py: apply Z3 tactics to simplify an SMT-LIB2 formula.
|
||||
|
||||
Usage:
|
||||
python simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
|
||||
python simplify.py --file formula.smt2 --tactics "simplify,solve-eqs"
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, run_z3, setup_logging
|
||||
|
||||
DEFAULT_TACTICS = "simplify,propagate-values,ctx-simplify"
|
||||
|
||||
|
||||
def build_tactic_formula(base_formula: str, tactics: str) -> str:
|
||||
tactic_list = [t.strip() for t in tactics.split(",")]
|
||||
if len(tactic_list) == 1:
|
||||
tactic_expr = f"(then {tactic_list[0]} skip)"
|
||||
else:
|
||||
tactic_expr = "(then " + " ".join(tactic_list) + ")"
|
||||
return base_formula + f"\n(apply {tactic_expr})\n"
|
||||
|
||||
|
||||
def build_formula_from_parts(formula_str: str, vars_str: str) -> str:
|
||||
lines = []
|
||||
if vars_str:
|
||||
for v in vars_str.split(","):
|
||||
v = v.strip()
|
||||
name, sort = v.split(":")
|
||||
lines.append(f"(declare-const {name.strip()} {sort.strip()})")
|
||||
lines.append(formula_str)
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="simplify")
|
||||
parser.add_argument("--formula")
|
||||
parser.add_argument("--vars")
|
||||
parser.add_argument("--file")
|
||||
parser.add_argument("--tactics", default=DEFAULT_TACTICS)
|
||||
parser.add_argument("--timeout", type=int, default=30)
|
||||
parser.add_argument("--z3", default=None)
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if args.file:
|
||||
base = Path(args.file).read_text()
|
||||
elif args.formula:
|
||||
base = build_formula_from_parts(args.formula, args.vars or "")
|
||||
else:
|
||||
parser.error("provide --formula or --file")
|
||||
return
|
||||
|
||||
formula = build_tactic_formula(base, args.tactics)
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("simplify", formula)
|
||||
|
||||
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
|
||||
|
||||
status = "success" if result["exit_code"] == 0 else "error"
|
||||
db.log_formula(run_id, formula, status)
|
||||
db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])
|
||||
|
||||
print(result["stdout"])
|
||||
if result["stderr"] and result["exit_code"] != 0:
|
||||
print(result["stderr"], file=sys.stderr)
|
||||
|
||||
db.close()
|
||||
sys.exit(0 if result["exit_code"] == 0 else 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
75
.github/skills/solve/SKILL.md
vendored
Normal file
75
.github/skills/solve/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,75 @@
|
|||
---
|
||||
name: solve
|
||||
description: Check satisfiability of SMT-LIB2 formulas using Z3. Returns sat/unsat with models or unsat cores. Logs every invocation to z3agent.db for auditability.
|
||||
---
|
||||
|
||||
Given an SMT-LIB2 formula (or a set of constraints described in natural language), determine whether the formula is satisfiable. If sat, extract a satisfying assignment. If unsat and tracking labels are present, extract the unsat core.
|
||||
|
||||
# Step 1: Prepare the formula
|
||||
|
||||
Action:
|
||||
Convert the input to valid SMT-LIB2. If the input is natural language,
|
||||
use the **encode** skill first.
|
||||
|
||||
Expectation:
|
||||
A syntactically valid SMT-LIB2 formula ending with `(check-sat)` and
|
||||
either `(get-model)` or `(get-unsat-core)` as appropriate.
|
||||
|
||||
Result:
|
||||
If valid SMT-LIB2 is ready, proceed to Step 2.
|
||||
If encoding is needed, run **encode** first and return here.
|
||||
|
||||
# Step 2: Run Z3
|
||||
|
||||
Action:
|
||||
Invoke solve.py with the formula string or file path.
|
||||
|
||||
Expectation:
|
||||
The script pipes the formula to `z3 -in`, logs the run to
|
||||
`.z3-agent/z3agent.db`, and prints the result.
|
||||
|
||||
Result:
|
||||
Output is one of `sat`, `unsat`, `unknown`, or `timeout`.
|
||||
Proceed to Step 3 to interpret.
|
||||
|
||||
```bash
|
||||
python3 scripts/solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
|
||||
```
|
||||
|
||||
For file input:
|
||||
```bash
|
||||
python3 scripts/solve.py --file problem.smt2
|
||||
```
|
||||
|
||||
With debug tracing:
|
||||
```bash
|
||||
python3 scripts/solve.py --file problem.smt2 --debug
|
||||
```
|
||||
|
||||
# Step 3: Interpret the output
|
||||
|
||||
Action:
|
||||
Parse the Z3 output to determine satisfiability and extract any model
|
||||
or unsat core.
|
||||
|
||||
Expectation:
|
||||
`sat` with a model, `unsat` optionally with a core, `unknown`, or
|
||||
`timeout`.
|
||||
|
||||
Result:
|
||||
On `sat`: report the model to the user.
|
||||
On `unsat`: report the core if available.
|
||||
On `unknown`/`timeout`: try **simplify** or increase the timeout.
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| formula | string | no | | SMT-LIB2 formula as a string |
|
||||
| file | path | no | | path to an .smt2 file |
|
||||
| timeout | int | no | 30 | seconds before killing z3 |
|
||||
| z3 | path | no | auto | explicit path to z3 binary |
|
||||
| debug | flag | no | off | print z3 command, stdin, stdout, stderr, timing |
|
||||
| db | path | no | .z3-agent/z3agent.db | path to the logging database |
|
||||
|
||||
Either `formula` or `file` must be provided.
|
||||
64
.github/skills/solve/scripts/solve.py
vendored
Normal file
64
.github/skills/solve/scripts/solve.py
vendored
Normal file
|
|
@ -0,0 +1,64 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
solve.py: check satisfiability of an SMT-LIB2 formula via Z3.
|
||||
|
||||
Usage:
|
||||
python solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
|
||||
python solve.py --file problem.smt2
|
||||
python solve.py --file problem.smt2 --debug --timeout 60
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, run_z3, parse_model, parse_unsat_core, setup_logging
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(prog="solve")
|
||||
parser.add_argument("--formula", help="SMT-LIB2 formula string")
|
||||
parser.add_argument("--file", help="path to .smt2 file")
|
||||
parser.add_argument("--timeout", type=int, default=30)
|
||||
parser.add_argument("--z3", default=None, help="path to z3 binary")
|
||||
parser.add_argument("--db", default=None)
|
||||
parser.add_argument("--debug", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
if args.file:
|
||||
formula = Path(args.file).read_text()
|
||||
elif args.formula:
|
||||
formula = args.formula
|
||||
else:
|
||||
parser.error("provide --formula or --file")
|
||||
return
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("solve", formula)
|
||||
|
||||
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
|
||||
|
||||
model = parse_model(result["stdout"]) if result["result"] == "sat" else None
|
||||
core = parse_unsat_core(result["stdout"]) if result["result"] == "unsat" else None
|
||||
|
||||
db.log_formula(run_id, formula, result["result"], str(model) if model else None)
|
||||
db.finish_run(run_id, result["result"], result["duration_ms"], result["exit_code"])
|
||||
|
||||
print(result["result"])
|
||||
if model:
|
||||
for name, val in model.items():
|
||||
print(f" {name} = {val}")
|
||||
if core:
|
||||
print("unsat core:", " ".join(core))
|
||||
if result["stderr"] and result["result"] == "error":
|
||||
print(result["stderr"], file=sys.stderr)
|
||||
|
||||
db.close()
|
||||
sys.exit(0 if result["exit_code"] == 0 else 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
81
.github/skills/static-analysis/SKILL.md
vendored
Normal file
81
.github/skills/static-analysis/SKILL.md
vendored
Normal file
|
|
@ -0,0 +1,81 @@
|
|||
---
|
||||
name: static-analysis
|
||||
description: Run Clang Static Analyzer (scan-build) on Z3 source and log structured findings to z3agent.db.
|
||||
---
|
||||
|
||||
Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plist diagnostics, and record each finding with file, line, category, and description. This skill wraps scan-build into a reproducible, logged workflow suitable for regular analysis sweeps and regression tracking.
|
||||
|
||||
# Step 1: Run the analysis
|
||||
|
||||
Action:
|
||||
Invoke the script pointing at the CMake build directory. The script
|
||||
runs `scan-build cmake ..` followed by `scan-build make` and writes
|
||||
checker output to the output directory.
|
||||
|
||||
Expectation:
|
||||
scan-build completes within the timeout, producing plist diagnostic
|
||||
files in the output directory (defaults to a `scan-results` subdirectory
|
||||
of the build directory).
|
||||
|
||||
Result:
|
||||
On success: diagnostics are parsed and findings are printed. Proceed to
|
||||
Step 2.
|
||||
On failure: verify that clang and scan-build are installed and that the
|
||||
build directory contains a valid CMake configuration.
|
||||
|
||||
```bash
|
||||
python3 scripts/static_analysis.py --build-dir build
|
||||
python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug
|
||||
python3 scripts/static_analysis.py --build-dir build --timeout 1800
|
||||
```
|
||||
|
||||
# Step 2: Interpret the output
|
||||
|
||||
Action:
|
||||
Review the printed findings and the summary table grouped by category.
|
||||
|
||||
Expectation:
|
||||
Each finding shows its source location, category, and description.
|
||||
The summary table ranks categories by frequency for quick triage.
|
||||
|
||||
Result:
|
||||
On zero findings: the codebase passes all enabled static checks.
|
||||
On findings: prioritize by category frequency and severity. Address
|
||||
null dereferences and use-after-free classes first.
|
||||
|
||||
Example output:
|
||||
|
||||
```
|
||||
[Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read
|
||||
[Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer
|
||||
```
|
||||
|
||||
# Step 3: Review historical findings
|
||||
|
||||
Action:
|
||||
Query z3agent.db to compare current results against prior analysis
|
||||
runs.
|
||||
|
||||
Expectation:
|
||||
Queries return category counts and run history, enabling regression
|
||||
detection across commits.
|
||||
|
||||
Result:
|
||||
On stable or decreasing counts: no regressions introduced.
|
||||
On increased counts: cross-reference new findings with recent commits
|
||||
to identify the responsible change.
|
||||
|
||||
```bash
|
||||
python3 ../../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC"
|
||||
python3 ../../shared/z3db.py runs --skill static-analysis --last 10
|
||||
```
|
||||
|
||||
# Parameters
|
||||
|
||||
| Parameter | Type | Required | Default | Description |
|
||||
|-----------|------|----------|---------|-------------|
|
||||
| build-dir | path | yes | | path to the CMake build directory |
|
||||
| output-dir | path | no | BUILD/scan-results | directory for scan-build output |
|
||||
| timeout | int | no | 1200 | seconds allowed for the full build |
|
||||
| db | path | no | .z3-agent/z3agent.db | logging database |
|
||||
| debug | flag | no | off | verbose tracing |
|
||||
260
.github/skills/static-analysis/scripts/static_analysis.py
vendored
Normal file
260
.github/skills/static-analysis/scripts/static_analysis.py
vendored
Normal file
|
|
@ -0,0 +1,260 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
static_analysis.py: run Clang Static Analyzer on Z3 source.
|
||||
|
||||
Usage:
|
||||
python static_analysis.py --build-dir build
|
||||
python static_analysis.py --build-dir build --output-dir /tmp/sa-results
|
||||
python static_analysis.py --build-dir build --debug
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import logging
|
||||
import os
|
||||
import plistlib
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
from collections import Counter
|
||||
from pathlib import Path
|
||||
|
||||
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
|
||||
from z3db import Z3DB, setup_logging
|
||||
|
||||
logger = logging.getLogger("z3agent")
|
||||
|
||||
SCAN_BUILD_NAMES = ["scan-build", "scan-build-14", "scan-build-15", "scan-build-16"]
|
||||
|
||||
|
||||
def find_scan_build() -> str:
|
||||
"""Locate the scan-build binary on PATH."""
|
||||
for name in SCAN_BUILD_NAMES:
|
||||
path = shutil.which(name)
|
||||
if path:
|
||||
logger.debug("found scan-build: %s", path)
|
||||
return path
|
||||
print(
|
||||
"scan-build not found on PATH.\n"
|
||||
"Install one of the following:\n"
|
||||
" Ubuntu/Debian: sudo apt install clang-tools\n"
|
||||
" macOS: brew install llvm\n"
|
||||
" Fedora: sudo dnf install clang-tools-extra\n"
|
||||
f"searched for: {', '.join(SCAN_BUILD_NAMES)}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def run_configure(scan_build: str, build_dir: Path, output_dir: Path,
|
||||
timeout: int) -> bool:
|
||||
"""Run scan-build cmake to configure the project."""
|
||||
repo_root = build_dir.parent
|
||||
cmd = [
|
||||
scan_build,
|
||||
"-o", str(output_dir),
|
||||
"cmake",
|
||||
str(repo_root),
|
||||
]
|
||||
logger.info("configuring: %s", " ".join(cmd))
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
cmd, cwd=str(build_dir),
|
||||
capture_output=True, text=True, timeout=timeout,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.error("cmake configuration timed out after %ds", timeout)
|
||||
return False
|
||||
|
||||
if proc.returncode != 0:
|
||||
logger.error("cmake configuration failed (exit %d)", proc.returncode)
|
||||
logger.error("stderr: %s", proc.stderr[:2000])
|
||||
return False
|
||||
|
||||
logger.info("configuration complete")
|
||||
return True
|
||||
|
||||
|
||||
def run_build(scan_build: str, build_dir: Path, output_dir: Path,
|
||||
timeout: int) -> bool:
|
||||
"""Run scan-build make to build and analyze."""
|
||||
nproc = os.cpu_count() or 4
|
||||
cmd = [
|
||||
scan_build,
|
||||
"-o", str(output_dir),
|
||||
"--status-bugs",
|
||||
"make",
|
||||
f"-j{nproc}",
|
||||
]
|
||||
logger.info("building with analysis: %s", " ".join(cmd))
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
cmd, cwd=str(build_dir),
|
||||
capture_output=True, text=True, timeout=timeout,
|
||||
)
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.error("build timed out after %ds", timeout)
|
||||
return False
|
||||
|
||||
# scan-build returns nonzero when bugs are found (due to --status-bugs),
|
||||
# so a nonzero exit code is not necessarily a build failure.
|
||||
if proc.returncode != 0:
|
||||
logger.info(
|
||||
"scan-build exited with code %d (nonzero may indicate findings)",
|
||||
proc.returncode,
|
||||
)
|
||||
else:
|
||||
logger.info("build complete, no bugs reported by scan-build")
|
||||
|
||||
if proc.stderr:
|
||||
logger.debug("build stderr (last 2000 chars): %s", proc.stderr[-2000:])
|
||||
return True
|
||||
|
||||
|
||||
def collect_plist_files(output_dir: Path) -> list:
|
||||
"""Recursively find all .plist diagnostic files under the output directory."""
|
||||
plists = sorted(output_dir.rglob("*.plist"))
|
||||
logger.debug("found %d plist files in %s", len(plists), output_dir)
|
||||
return plists
|
||||
|
||||
|
||||
def parse_plist_findings(plist_path: Path) -> list:
|
||||
"""Extract findings from a single Clang plist diagnostic file.
|
||||
|
||||
Returns a list of dicts with keys: file, line, col, category, type, description.
|
||||
"""
|
||||
findings = []
|
||||
try:
|
||||
with open(plist_path, "rb") as f:
|
||||
data = plistlib.load(f)
|
||||
except Exception as exc:
|
||||
logger.warning("could not parse %s: %s", plist_path, exc)
|
||||
return findings
|
||||
|
||||
source_files = data.get("files", [])
|
||||
for diag in data.get("diagnostics", []):
|
||||
location = diag.get("location", {})
|
||||
file_idx = location.get("file", 0)
|
||||
source_file = source_files[file_idx] if file_idx < len(source_files) else "<unknown>"
|
||||
findings.append({
|
||||
"file": source_file,
|
||||
"line": location.get("line", 0),
|
||||
"col": location.get("col", 0),
|
||||
"category": diag.get("category", "uncategorized"),
|
||||
"type": diag.get("type", ""),
|
||||
"description": diag.get("description", ""),
|
||||
})
|
||||
return findings
|
||||
|
||||
|
||||
def collect_all_findings(output_dir: Path) -> list:
|
||||
"""Parse every plist file under output_dir and return merged findings."""
|
||||
all_findings = []
|
||||
for plist_path in collect_plist_files(output_dir):
|
||||
all_findings.extend(parse_plist_findings(plist_path))
|
||||
return all_findings
|
||||
|
||||
|
||||
def log_findings(db, run_id: int, findings: list):
|
||||
"""Persist each finding to z3agent.db."""
|
||||
for f in findings:
|
||||
db.log_finding(
|
||||
run_id,
|
||||
category=f["category"],
|
||||
message=f["description"],
|
||||
severity=f.get("type"),
|
||||
file=f["file"],
|
||||
line=f["line"],
|
||||
details={"col": f["col"], "type": f["type"]},
|
||||
)
|
||||
|
||||
|
||||
def print_findings(findings: list):
|
||||
"""Print individual findings and a category summary."""
|
||||
if not findings:
|
||||
print("No findings reported.")
|
||||
return
|
||||
|
||||
for f in findings:
|
||||
label = f["category"]
|
||||
if f["type"]:
|
||||
label = f["type"]
|
||||
print(f"[{label}] {f['file']}:{f['line']}: {f['description']}")
|
||||
|
||||
print()
|
||||
counts = Counter(f["category"] for f in findings)
|
||||
print(f"Total findings: {len(findings)}")
|
||||
print("By category:")
|
||||
for cat, cnt in counts.most_common():
|
||||
print(f" {cat}: {cnt}")
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
prog="static_analysis",
|
||||
description="Run Clang Static Analyzer on Z3 and log findings.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--build-dir", required=True,
|
||||
help="path to the CMake build directory",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output-dir", default=None,
|
||||
help="directory for scan-build results (default: BUILD/scan-results)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--timeout", type=int, default=1200,
|
||||
help="seconds allowed for the full analysis build",
|
||||
)
|
||||
parser.add_argument("--db", default=None, help="path to z3agent.db")
|
||||
parser.add_argument("--debug", action="store_true", help="verbose tracing")
|
||||
args = parser.parse_args()
|
||||
|
||||
setup_logging(args.debug)
|
||||
|
||||
scan_build = find_scan_build()
|
||||
|
||||
build_dir = Path(args.build_dir).resolve()
|
||||
build_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
output_dir = Path(args.output_dir) if args.output_dir else build_dir / "scan-results"
|
||||
output_dir = output_dir.resolve()
|
||||
output_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
db = Z3DB(args.db)
|
||||
run_id = db.start_run("static-analysis", f"build_dir={build_dir}")
|
||||
start = time.monotonic()
|
||||
|
||||
if not run_configure(scan_build, build_dir, output_dir, timeout=args.timeout):
|
||||
elapsed = int((time.monotonic() - start) * 1000)
|
||||
db.finish_run(run_id, "error", elapsed, exit_code=1)
|
||||
db.close()
|
||||
sys.exit(1)
|
||||
|
||||
if not run_build(scan_build, build_dir, output_dir, timeout=args.timeout):
|
||||
elapsed = int((time.monotonic() - start) * 1000)
|
||||
db.finish_run(run_id, "error", elapsed, exit_code=1)
|
||||
db.close()
|
||||
sys.exit(1)
|
||||
|
||||
elapsed = int((time.monotonic() - start) * 1000)
|
||||
|
||||
findings = collect_all_findings(output_dir)
|
||||
log_findings(db, run_id, findings)
|
||||
|
||||
status = "clean" if len(findings) == 0 else "findings"
|
||||
db.finish_run(run_id, status, elapsed, exit_code=0)
|
||||
|
||||
db.log(
|
||||
f"static analysis complete: {len(findings)} finding(s) in {elapsed}ms",
|
||||
run_id=run_id,
|
||||
)
|
||||
|
||||
print_findings(findings)
|
||||
|
||||
db.close()
|
||||
sys.exit(0)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
1
.gitignore
vendored
1
.gitignore
vendored
|
|
@ -117,3 +117,4 @@ genaisrc/genblogpost.genai.mts
|
|||
bazel-*
|
||||
# Local issue tracking
|
||||
.beads
|
||||
.z3-agent/
|
||||
|
|
|
|||
463
Z3-AGENT.md
Normal file
463
Z3-AGENT.md
Normal file
|
|
@ -0,0 +1,463 @@
|
|||
# Z3 Agent
|
||||
|
||||
A Copilot agent for the Z3 theorem prover. It wraps 9 skills that cover
|
||||
SMT solving and code quality analysis.
|
||||
|
||||
## What it does
|
||||
|
||||
The agent handles two kinds of requests:
|
||||
|
||||
1. **SMT solving**: formulate constraints, check satisfiability, prove
|
||||
properties, optimize objectives, simplify expressions.
|
||||
2. **Code quality**: run sanitizers (ASan, UBSan) and Clang Static Analyzer
|
||||
against the Z3 codebase to catch memory bugs and logic errors.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
You need a built Z3 binary. The scripts look for it in this order:
|
||||
|
||||
1. Explicit `--z3 path/to/z3`
|
||||
2. `build/z3`, `build/release/z3`, `build/debug/z3` (relative to repo root)
|
||||
3. `z3` on your PATH
|
||||
|
||||
For code quality skills you also need:
|
||||
|
||||
- **memory-safety**: cmake, make, and a compiler with sanitizer support
|
||||
(gcc or clang). The script checks at startup and tells you what is missing.
|
||||
- **static-analysis**: scan-build (part of clang-tools). Same early check
|
||||
with install instructions if absent.
|
||||
|
||||
## Using the agent in Copilot Chat
|
||||
|
||||
Mention `@z3` and describe what you want in plain language.
|
||||
The agent figures out which skill to use, builds the formula if needed,
|
||||
runs Z3, and gives you the result.
|
||||
|
||||
### solve: check satisfiability
|
||||
|
||||
```
|
||||
@z3 is (x > 0 and y > 0 and x + y < 5) satisfiable over the integers?
|
||||
```
|
||||
|
||||
Expected response: **sat** with a model like `x = 1, y = 1`.
|
||||
|
||||
```
|
||||
@z3 can x + y = 10 and x - y = 4 both hold at the same time?
|
||||
```
|
||||
|
||||
Expected response: **sat** with `x = 7, y = 3`.
|
||||
|
||||
```
|
||||
@z3 is there an integer x where x > 0, x < 0?
|
||||
```
|
||||
|
||||
Expected response: **unsat** (no such integer exists).
|
||||
|
||||
### prove: check if something is always true
|
||||
|
||||
```
|
||||
@z3 prove that x * x >= 0 for all integers x
|
||||
```
|
||||
|
||||
Expected response: **valid** (the negation is unsatisfiable, so the property holds).
|
||||
|
||||
```
|
||||
@z3 is it true that (a + b)^2 >= 0 for all real a and b?
|
||||
```
|
||||
|
||||
Expected response: **valid**.
|
||||
|
||||
```
|
||||
@z3 prove that if x > y and y > z then x > z, for integers
|
||||
```
|
||||
|
||||
Expected response: **valid** (transitivity of >).
|
||||
|
||||
### optimize: find the best value
|
||||
|
||||
```
|
||||
@z3 maximize 3x + 2y where x >= 1, y >= 1, and x + y <= 20
|
||||
```
|
||||
|
||||
Expected response: **sat** with `x = 19, y = 1` (objective = 59).
|
||||
|
||||
```
|
||||
@z3 minimize x + y where x >= 5, y >= 3, and x + y >= 10
|
||||
```
|
||||
|
||||
Expected response: **sat** with `x = 5, y = 5` or similar (objective = 10).
|
||||
|
||||
### simplify: reduce an expression
|
||||
|
||||
```
|
||||
@z3 simplify x + 0 + 1*x
|
||||
```
|
||||
|
||||
Expected response: `2*x`.
|
||||
|
||||
```
|
||||
@z3 simplify (a and true) or (a and false)
|
||||
```
|
||||
|
||||
Expected response: `a`.
|
||||
|
||||
### encode: translate a problem to SMT-LIB2
|
||||
|
||||
```
|
||||
@z3 encode this as SMT-LIB2: find integers x and y where x + y = 10 and x > y
|
||||
```
|
||||
|
||||
Expected response: the SMT-LIB2 formula:
|
||||
```
|
||||
(declare-const x Int)
|
||||
(declare-const y Int)
|
||||
(assert (= (+ x y) 10))
|
||||
(assert (> x y))
|
||||
(check-sat)
|
||||
(get-model)
|
||||
```
|
||||
|
||||
### explain: interpret Z3 output
|
||||
|
||||
```
|
||||
@z3 what does this Z3 output mean?
|
||||
sat
|
||||
(
|
||||
(define-fun x () Int 7)
|
||||
(define-fun y () Int 3)
|
||||
)
|
||||
```
|
||||
|
||||
Expected response: a readable summary like "satisfying assignment: x = 7, y = 3".
|
||||
|
||||
```
|
||||
@z3 Z3 returned unknown, what does that mean?
|
||||
```
|
||||
|
||||
Expected response: an explanation of common causes (timeout, incomplete theory, quantifiers).
|
||||
|
||||
### benchmark: measure performance
|
||||
|
||||
```
|
||||
@z3 how fast can Z3 solve (x > 0 and y > 0 and x + y < 100)? run it 5 times
|
||||
```
|
||||
|
||||
Expected response: timing stats like min/median/max in milliseconds and the result.
|
||||
|
||||
### memory-safety: find memory bugs in Z3
|
||||
|
||||
```
|
||||
@z3 run AddressSanitizer on the Z3 test suite
|
||||
```
|
||||
|
||||
Expected response: builds Z3 with ASan, runs the tests, reports any findings
|
||||
with category, file, and line number. If clean, reports no findings.
|
||||
|
||||
```
|
||||
@z3 check for undefined behavior in Z3
|
||||
```
|
||||
|
||||
Expected response: runs UBSan, same format.
|
||||
|
||||
```
|
||||
@z3 run both sanitizers
|
||||
```
|
||||
|
||||
Expected response: runs ASan and UBSan, aggregates findings from both.
|
||||
|
||||
### static-analysis: find bugs without running the code
|
||||
|
||||
```
|
||||
@z3 run static analysis on the Z3 source
|
||||
```
|
||||
|
||||
Expected response: runs Clang Static Analyzer, reports findings grouped by
|
||||
category (null dereference, dead store, memory leak, etc.) with file and line.
|
||||
|
||||
### multi-skill: the agent chains skills when needed
|
||||
|
||||
```
|
||||
@z3 prove that for all integers, if x^2 is even then x is even
|
||||
```
|
||||
|
||||
The agent uses **encode** to formalize and negate the statement, then
|
||||
**prove** to check it, then **explain** to present the result.
|
||||
|
||||
```
|
||||
@z3 full verification pass before the release
|
||||
```
|
||||
|
||||
The agent runs **memory-safety** (ASan + UBSan) and **static-analysis**
|
||||
in parallel, then aggregates and deduplicates findings sorted by severity.
|
||||
|
||||
## Using the scripts directly
|
||||
|
||||
Every skill lives under `.github/skills/<name>/scripts/`. All scripts
|
||||
accept `--debug` for full tracing and `--db path` to specify where the
|
||||
SQLite log goes (defaults to `z3agent.db` in the current directory).
|
||||
|
||||
### solve
|
||||
|
||||
Check whether a set of constraints has a solution.
|
||||
|
||||
```
|
||||
python3 .github/skills/solve/scripts/solve.py \
|
||||
--z3 build/release/z3 \
|
||||
--formula '
|
||||
(declare-const x Int)
|
||||
(declare-const y Int)
|
||||
(assert (> x 0))
|
||||
(assert (> y 0))
|
||||
(assert (< (+ x y) 5))
|
||||
(check-sat)
|
||||
(get-model)'
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
sat
|
||||
x = 1
|
||||
y = 1
|
||||
```
|
||||
|
||||
### prove
|
||||
|
||||
Check whether a property holds for all values. The script negates your
|
||||
conjecture and asks Z3 if the negation is satisfiable. If it is not,
|
||||
the property is valid.
|
||||
|
||||
```
|
||||
python3 .github/skills/prove/scripts/prove.py \
|
||||
--z3 build/release/z3 \
|
||||
--conjecture '(>= (* x x) 0)' \
|
||||
--vars 'x:Int'
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
valid
|
||||
```
|
||||
|
||||
If Z3 finds a counterexample, it prints `invalid` followed by the
|
||||
counterexample values.
|
||||
|
||||
### optimize
|
||||
|
||||
Find the best value of an objective subject to constraints.
|
||||
|
||||
```
|
||||
python3 .github/skills/optimize/scripts/optimize.py \
|
||||
--z3 build/release/z3 \
|
||||
--formula '
|
||||
(declare-const x Int)
|
||||
(declare-const y Int)
|
||||
(assert (>= x 1))
|
||||
(assert (>= y 1))
|
||||
(assert (<= (+ x y) 20))
|
||||
(maximize (+ (* 3 x) (* 2 y)))
|
||||
(check-sat)
|
||||
(get-model)'
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
sat
|
||||
x = 19
|
||||
y = 1
|
||||
```
|
||||
|
||||
Here Z3 maximizes `3x + 2y` under the constraint `x + y <= 20`, so it
|
||||
pushes x as high as possible (19) and keeps y at its minimum (1),
|
||||
giving `3*19 + 2*1 = 59`.
|
||||
|
||||
### simplify
|
||||
|
||||
Reduce expressions using Z3 tactic chains.
|
||||
|
||||
```
|
||||
python3 .github/skills/simplify/scripts/simplify.py \
|
||||
--z3 build/release/z3 \
|
||||
--formula '(declare-const x Int)(simplify (+ x 0 (* 1 x)))'
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
(* 2 x)
|
||||
(goals
|
||||
(goal
|
||||
:precision precise :depth 1)
|
||||
)
|
||||
```
|
||||
|
||||
Z3 simplified `x + 0 + 1*x` down to `2*x`.
|
||||
|
||||
### benchmark
|
||||
|
||||
Measure solving time over multiple runs.
|
||||
|
||||
```
|
||||
python3 .github/skills/benchmark/scripts/benchmark.py \
|
||||
--z3 build/release/z3 \
|
||||
--runs 5 \
|
||||
--formula '
|
||||
(declare-const x Int)
|
||||
(declare-const y Int)
|
||||
(assert (> x 0))
|
||||
(assert (> y 0))
|
||||
(assert (< (+ x y) 100))
|
||||
(check-sat)'
|
||||
```
|
||||
|
||||
Output (times will vary on your machine):
|
||||
|
||||
```
|
||||
runs: 5
|
||||
min: 27ms
|
||||
median: 28ms
|
||||
max: 30ms
|
||||
result: sat
|
||||
```
|
||||
|
||||
### explain
|
||||
|
||||
Interpret Z3 output in readable form. It reads from stdin:
|
||||
|
||||
```
|
||||
echo 'sat
|
||||
(
|
||||
(define-fun x () Int
|
||||
19)
|
||||
(define-fun y () Int
|
||||
1)
|
||||
)' | python3 .github/skills/explain/scripts/explain.py --stdin --type model
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
satisfying assignment:
|
||||
x = 19
|
||||
y = 1
|
||||
```
|
||||
|
||||
### encode
|
||||
|
||||
Validate that an SMT-LIB2 file is well-formed by running it through Z3:
|
||||
|
||||
```
|
||||
python3 .github/skills/encode/scripts/encode.py \
|
||||
--z3 build/release/z3 \
|
||||
--validate problem.smt2
|
||||
```
|
||||
|
||||
If the file parses and runs without errors, it prints the formula back.
|
||||
If there are syntax or sort errors, it prints the Z3 error message.
|
||||
|
||||
### memory-safety
|
||||
|
||||
Build Z3 with AddressSanitizer or UndefinedBehaviorSanitizer, run the
|
||||
test suite, and collect any findings.
|
||||
|
||||
```
|
||||
python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer asan
|
||||
python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer ubsan
|
||||
python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer both
|
||||
```
|
||||
|
||||
Use `--skip-build` to reuse a previous instrumented build. Use
|
||||
`--build-dir path` to control where the build goes (defaults to
|
||||
`build/sanitizer-asan` or `build/sanitizer-ubsan` under the repo root).
|
||||
|
||||
If cmake, make, or a C compiler is not found, the script prints what
|
||||
you need to install and exits.
|
||||
|
||||
### static-analysis
|
||||
|
||||
Run Clang Static Analyzer over the Z3 source tree.
|
||||
|
||||
```
|
||||
python3 .github/skills/static-analysis/scripts/static_analysis.py \
|
||||
--build-dir build/scan
|
||||
```
|
||||
|
||||
Results go to `build/scan/scan-results/` by default. Findings are
|
||||
printed grouped by category with file and line number.
|
||||
|
||||
If `scan-build` is not on your PATH, the script prints install
|
||||
instructions for Ubuntu, macOS, and Fedora.
|
||||
|
||||
## Debug tracing
|
||||
|
||||
Add `--debug` to any command to see the full trace: run IDs, z3 binary
|
||||
path, the exact command and stdin sent to Z3, stdout/stderr received,
|
||||
timing, and database logging. Example:
|
||||
|
||||
```
|
||||
python3 .github/skills/solve/scripts/solve.py \
|
||||
--z3 build/release/z3 --debug \
|
||||
--formula '
|
||||
(declare-const x Int)
|
||||
(declare-const y Int)
|
||||
(assert (> x 0))
|
||||
(assert (> y 0))
|
||||
(assert (< (+ x y) 5))
|
||||
(check-sat)
|
||||
(get-model)'
|
||||
```
|
||||
|
||||
```
|
||||
[DEBUG] started run 31 (skill=solve, hash=d64beb5a61842362)
|
||||
[DEBUG] found z3: build/release/z3
|
||||
[DEBUG] cmd: build/release/z3 -in
|
||||
[DEBUG] stdin:
|
||||
(declare-const x Int)
|
||||
(declare-const y Int)
|
||||
(assert (> x 0))
|
||||
(assert (> y 0))
|
||||
(assert (< (+ x y) 5))
|
||||
(check-sat)
|
||||
(get-model)
|
||||
[DEBUG] exit_code=0 duration=28ms
|
||||
[DEBUG] stdout:
|
||||
sat
|
||||
(
|
||||
(define-fun x () Int
|
||||
1)
|
||||
(define-fun y () Int
|
||||
1)
|
||||
)
|
||||
[DEBUG] finished run 31: sat (28ms)
|
||||
sat
|
||||
x = 1
|
||||
y = 1
|
||||
```
|
||||
|
||||
## Logging
|
||||
|
||||
Every run is logged to a SQLite database (`z3agent.db` by default).
|
||||
You can query it directly:
|
||||
|
||||
```
|
||||
sqlite3 z3agent.db "SELECT id, skill, status, duration_ms FROM runs ORDER BY id DESC LIMIT 10;"
|
||||
```
|
||||
|
||||
Use `--db /path/to/file.db` on any script to put the database somewhere
|
||||
else.
|
||||
|
||||
## Skill list
|
||||
|
||||
| Skill | What it does |
|
||||
|-------|-------------|
|
||||
| solve | check satisfiability, extract models or unsat cores |
|
||||
| prove | prove validity by negating and checking unsatisfiability |
|
||||
| optimize | minimize or maximize objectives under constraints |
|
||||
| simplify | reduce formulas with Z3 tactic chains |
|
||||
| encode | translate problems into SMT-LIB2, validate syntax |
|
||||
| explain | interpret Z3 output (models, cores, stats, errors) |
|
||||
| benchmark | measure solving time, collect statistics |
|
||||
| memory-safety | run ASan/UBSan on Z3 test suite |
|
||||
| static-analysis | run Clang Static Analyzer on Z3 source |
|
||||
Loading…
Add table
Add a link
Reference in a new issue