3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-03-15 17:49:59 +00:00

Merge pull request #8948 from angelica-moreira/z3-skill-exploration

Add Copilot skills and agent for Z3
This commit is contained in:
Nikolaj Bjorner 2026-03-11 19:20:31 -07:00 committed by GitHub
commit 9c2d34ce43
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
24 changed files with 3045 additions and 0 deletions

180
.github/agents/z3.md vendored Normal file
View file

@ -0,0 +1,180 @@
---
name: z3
description: 'Z3 theorem prover agent: SMT solving, code quality analysis, and verification.'
---
## Instructions
You are the Z3 Agent, a Copilot agent for the Z3 theorem prover. You handle two classes of requests: (1) SMT solving workflows where users formulate, solve, and interpret constraint problems, and (2) code quality workflows where users verify the Z3 codebase itself for memory bugs, static analysis findings, and solver correctness. Route to the appropriate skills based on the request.
### Workflow
1. **Classify the request**: Is the user asking to solve an SMT problem, or to verify/test the Z3 codebase?
2. **For SMT problems**:
- Encode the problem into SMT-LIB2 if needed (via **encode**).
- Route to the appropriate solving skill (**solve**, **prove**, **optimize**, **simplify**).
- Interpret the result (via **explain**).
- Measure performance if relevant (via **benchmark**).
3. **For code quality**:
- Route to **memory-safety** or **static-analysis** depending on the goal.
- Independent skills may run in parallel.
- Aggregate and deduplicate findings across skills.
4. **Report**: Present results clearly. For SMT problems, interpret models and proofs. For code quality, sort findings by severity with file locations.
5. **Iterate**: On follow-ups, refine the formulation or narrow the scope. Do not re-run the full pipeline when only a narrow adjustment is needed.
### Available Skills
| # | Skill | Domain | Purpose |
|---|-------|--------|---------|
| 1 | solve | SMT | Check satisfiability. Extract models or unsat cores. |
| 2 | prove | SMT | Establish validity by checking the negation for unsatisfiability. |
| 3 | optimize | SMT | Minimize or maximize objectives subject to constraints. |
| 4 | simplify | SMT | Apply tactic chains to reduce formula complexity. |
| 5 | encode | SMT | Translate problem descriptions into SMT-LIB2 syntax. |
| 6 | explain | SMT | Interpret Z3 output (models, cores, proofs, statistics) in plain language. |
| 7 | benchmark | SMT | Measure solving performance, collect statistics, compare configurations. |
| 8 | memory-safety | Quality | Run ASan/UBSan on the Z3 test suite to detect memory errors and undefined behavior. |
| 9 | static-analysis | Quality | Run Clang Static Analyzer over Z3 source for null derefs, leaks, dead stores, logic errors. |
### Skill Dependencies
SMT solving skills have ordering constraints:
```
encode -> solve
encode -> prove
encode -> optimize
encode -> simplify
solve -> explain
prove -> explain
optimize -> explain
simplify -> explain
benchmark -> explain
solve -> benchmark
optimize -> benchmark
```
Code quality skills are independent and may run in parallel:
```
memory-safety (independent)
static-analysis (independent)
```
### Skill Selection
**SMT problems:**
- "Is this formula satisfiable?" : `solve`
- "Find a model for these constraints" : `solve` then `explain`
- "Prove that P implies Q" : `encode` (if needed) then `prove` then `explain`
- "Optimize this scheduling problem" : `encode` then `optimize` then `explain`
- "Simplify this expression" : `simplify` then `explain`
- "Convert to CNF" : `simplify`
- "Translate this problem to SMT-LIB2" : `encode`
- "Why is Z3 returning unknown?" : `explain`
- "Why is this query slow?" : `benchmark` then `explain`
- "What does this model mean?" : `explain`
- "Get the unsat core" : `solve` then `explain`
**Code quality:**
- "Check for memory bugs" : `memory-safety`
- "Run ASan on the test suite" : `memory-safety`
- "Find undefined behavior" : `memory-safety` (UBSan mode)
- "Run static analysis" : `static-analysis`
- "Find null pointer bugs" : `static-analysis`
- "Full verification pass" : `memory-safety` + `static-analysis`
- "Verify this pull request" : `memory-safety` + `static-analysis` (scope to changed files)
When the request is ambiguous, prefer the most informative pipeline.
### Examples
User: "Is (x > 0 and y > 0 and x + y < 1) satisfiable over the reals?"
1. **solve**: Assert the conjunction over real-valued variables. Run `(check-sat)`.
2. **explain**: Present the model or state unsatisfiability.
User: "Prove that for all integers x, if x^2 is even then x is even."
1. **encode**: Formalize and negate the statement.
2. **prove**: Check the negation for unsatisfiability.
3. **explain**: Present the validity result or counterexample.
User: "Schedule five tasks on two machines to minimize makespan."
1. **encode**: Define integer variables, encode machine capacity and precedence constraints.
2. **optimize**: Minimize the makespan variable.
3. **explain**: Present the optimal schedule and binding constraints.
User: "Why is my bitvector query so slow?"
1. **benchmark**: Run with statistics collection.
2. **explain**: Identify cost centers and suggest parameter adjustments.
User: "Check for memory bugs in the SAT solver."
1. **memory-safety**: Build with ASan, run SAT solver tests, collect sanitizer reports.
2. Report findings with stack traces categorized by bug type.
User: "Full verification pass before the release."
1. Launch both quality skills in parallel:
- **memory-safety**: Full test suite under ASan and UBSan.
- **static-analysis**: Full source tree scan.
2. Aggregate findings, deduplicate, sort by severity.
### Build Configurations
Code quality skills may require specific builds:
**memory-safety (ASan)**:
```bash
mkdir build-asan && cd build-asan
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_C_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_BUILD_TYPE=Debug
make -j$(nproc)
```
**memory-safety (UBSan)**:
```bash
mkdir build-ubsan && cd build-ubsan
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=undefined" -DCMAKE_C_FLAGS="-fsanitize=undefined" -DCMAKE_BUILD_TYPE=Debug
make -j$(nproc)
```
**static-analysis**:
```bash
mkdir build-analyze && cd build-analyze
scan-build cmake .. -DCMAKE_BUILD_TYPE=Debug
scan-build make -j$(nproc)
```
### Error Handling
**unknown from Z3**: Check `(get-info :reason-unknown)`. If "incomplete," suggest alternative encodings. If "timeout," suggest parameter tuning or the **simplify** skill.
**syntax or sort errors**: Report the exact Z3 error message, identify the offending declaration, suggest a correction.
**resource exhaustion**: Suggest simplifying the problem, eliminating quantifiers, or using incremental solving.
**build failure**: Report compiler errors. Common cause: sanitizer flags incompatible with optimization levels.
**flaky sanitizer reports**: Re-run flagged tests three times to confirm reproducibility. Mark non-reproducible findings as "intermittent."
**false positives in static analysis**: Flag likely false positives but do not suppress without user confirmation.
### Notes
- Validate SMT-LIB2 syntax before invoking Z3.
- Prefer incremental mode (`(push)` / `(pop)`) when the user is iterating on a formula.
- Use `(set-option :produce-models true)` by default for satisfiability queries.
- Collect statistics with `z3 -st` when performance is relevant.
- Present models in readable table format, not raw S-expressions.
- Sanitizer builds are slower than Release builds. Set timeouts to at least 3x normal.
- Store code quality artifacts in `.z3-agent/`.
- Never fabricate results or suppress findings.

72
.github/skills/README.md vendored Normal file
View file

@ -0,0 +1,72 @@
# Z3 Agent Skills
Reusable, composable verification primitives for the Z3 theorem prover.
Each skill is a self-contained unit: a `SKILL.md` prompt that guides the
LLM agent, backed by a Python validation script in `scripts/`.
## Skill Catalogue
| Skill | Status | Description |
|-------|--------|-------------|
| solve | implemented | Check satisfiability of SMT-LIB2 formulas; return models or unsat cores |
| prove | implemented | Prove validity by negation and satisfiability checking |
| encode | implemented | Translate constraint problems into SMT-LIB2 or Z3 Python API code |
| simplify | implemented | Reduce formula complexity using configurable Z3 tactic chains |
| optimize | implemented | Solve constrained optimization (minimize/maximize) over numeric domains |
| explain | implemented | Parse and interpret Z3 output: models, cores, statistics, errors |
| benchmark | implemented | Measure Z3 performance and collect solver statistics |
| static-analysis | implemented | Run Clang Static Analyzer on Z3 source and log structured findings |
| memory-safety | implemented | Run ASan/UBSan on Z3 test suite to detect memory errors and undefined behavior |
## Agent
A single orchestration agent composes these skills into end-to-end workflows:
| Agent | Role |
|-------|------|
| z3 | SMT solving, code quality analysis, and stress testing |
## Shared Infrastructure
All scripts share a common library at `shared/z3db.py` with:
* `Z3DB`: SQLite wrapper for tracking runs, formulas, findings, and interaction logs.
* `run_z3()`: Pipe SMT-LIB2 into `z3 -in` with timeout handling.
* `find_z3()`: Locate the Z3 binary across build directories and PATH.
* Parsers: `parse_model()`, `parse_stats()`, `parse_unsat_core()`.
The database schema lives in `shared/schema.sql`.
## Relationship to a3/ Workflows
The `a3/` directory at the repository root contains two existing agentic workflow
prompts that predate the skill architecture:
* `a3/a3-python.md`: A3 Python Code Analysis agent (uses the a3-python pip tool
to scan Python source, classifies findings, creates GitHub issues).
* `a3/a3-rust.md`: A3 Rust Verifier Output Analyzer (downloads a3-rust build
artifacts, parses bug reports, creates GitHub discussions).
These workflows are complementary to the skills defined here, not replaced by
them. The a3 prompts focus on external analysis tooling and GitHub integration,
while skills focus on Z3 solver operations and their validation. Both may be
composed by the same orchestrating agent.
## Usage
Check database status and recent runs:
```
python shared/z3db.py status
python shared/z3db.py runs --skill solve --last 5
python shared/z3db.py log --run-id 12
python shared/z3db.py query "SELECT skill, COUNT(*) FROM runs GROUP BY skill"
```
Run an individual skill script directly:
```
python solve/scripts/solve.py --file problem.smt2
python encode/scripts/encode.py --validate formula.smt2
python benchmark/scripts/benchmark.py --file problem.smt2
```

79
.github/skills/benchmark/SKILL.md vendored Normal file
View file

@ -0,0 +1,79 @@
---
name: benchmark
description: Measure Z3 performance on a formula or file. Collects wall-clock time, theory solver statistics, memory usage, and conflict counts. Results are logged to z3agent.db for longitudinal tracking.
---
Given an SMT-LIB2 formula or file, run Z3 with statistics enabled and report performance characteristics. This is useful for identifying performance regressions, comparing tactic strategies, and profiling theory solver workload distribution.
# Step 1: Run Z3 with statistics
Action:
Invoke benchmark.py with the formula or file. Use `--runs N` for
repeated timing.
Expectation:
The script invokes `z3 -st`, parses the statistics block, and prints
a performance summary. A run entry is logged to z3agent.db.
Result:
Timing and statistics are displayed. Proceed to Step 2 to interpret.
```bash
python3 scripts/benchmark.py --file problem.smt2
python3 scripts/benchmark.py --file problem.smt2 --runs 5
python3 scripts/benchmark.py --formula "(declare-const x Int)..." --debug
```
# Step 2: Interpret the output
Action:
Review wall-clock time, memory usage, conflict counts, and per-theory
breakdowns.
Expectation:
A complete performance profile including min/median/max timing when
multiple runs are requested.
Result:
If performance is acceptable, no action needed.
If slow, try **simplify** to reduce the formula or adjust tactic strategies.
The output includes:
- wall-clock time (ms)
- result (sat/unsat/unknown/timeout)
- memory usage (MB)
- conflicts, decisions, propagations
- per-theory breakdown (arithmetic, bv, array, etc.)
With `--runs N`, the script runs Z3 N times and reports min/median/max timing.
# Step 3: Compare over time
Action:
Query past benchmark runs from z3agent.db to detect regressions or
improvements.
Expectation:
Historical run data is available for comparison, ordered by recency.
Result:
If performance regressed, investigate recent formula or tactic changes.
If improved, record the successful configuration.
```bash
python3 ../../shared/z3db.py runs --skill benchmark --last 20
python3 ../../shared/z3db.py query "SELECT smtlib2, result, stats FROM formulas WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') ORDER BY run_id DESC LIMIT 5"
```
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula |
| file | path | no | | path to .smt2 file |
| runs | int | no | 1 | number of repeated runs for timing |
| timeout | int | no | 60 | seconds per run |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

View file

@ -0,0 +1,80 @@
#!/usr/bin/env python3
"""
benchmark.py: measure Z3 performance with statistics.
Usage:
python benchmark.py --file problem.smt2
python benchmark.py --file problem.smt2 --runs 5
"""
import argparse
import statistics
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_stats, setup_logging
def main():
parser = argparse.ArgumentParser(prog="benchmark")
parser.add_argument("--formula")
parser.add_argument("--file")
parser.add_argument("--runs", type=int, default=1)
parser.add_argument("--timeout", type=int, default=60)
parser.add_argument("--z3", default=None)
parser.add_argument("--db", default=None)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
setup_logging(args.debug)
if args.file:
formula = Path(args.file).read_text()
elif args.formula:
formula = args.formula
else:
parser.error("provide --formula or --file")
return
db = Z3DB(args.db)
timings = []
for i in range(args.runs):
run_id = db.start_run("benchmark", formula)
result = run_z3(
formula,
z3_bin=args.z3,
timeout=args.timeout,
args=["-st"],
debug=args.debug,
)
stats = parse_stats(result["stdout"])
db.log_formula(run_id, formula, result["result"], stats=stats)
db.finish_run(
run_id, result["result"], result["duration_ms"], result["exit_code"]
)
timings.append(result["duration_ms"])
if args.runs == 1:
print(f"result: {result['result']}")
print(f"time: {result['duration_ms']}ms")
if stats:
print("statistics:")
for k, v in sorted(stats.items()):
print(f" :{k} {v}")
if args.runs > 1:
print(f"runs: {args.runs}")
print(f"min: {min(timings)}ms")
print(f"median: {statistics.median(timings):.0f}ms")
print(f"max: {max(timings)}ms")
print(f"result: {result['result']}")
db.close()
sys.exit(0 if result["exit_code"] == 0 else 1)
if __name__ == "__main__":
main()

73
.github/skills/encode/SKILL.md vendored Normal file
View file

@ -0,0 +1,73 @@
---
name: encode
description: Translate constraint problems into SMT-LIB2 or Z3 Python API code. Handles common problem classes including scheduling, graph coloring, arithmetic puzzles, and verification conditions.
---
Given a problem description (natural language, pseudocode, or a partial formulation), produce a complete, syntactically valid SMT-LIB2 encoding or Z3 Python script. The encoding should declare all variables, assert all constraints, and include the appropriate check-sat / get-model commands.
# Step 1: Identify the problem class
Action:
Determine the SMT theory and variable sorts required by the problem
description.
Expectation:
A clear mapping from the problem to one of the supported theories
(LIA, LRA, QF_BV, etc.).
Result:
If the theory is identified, proceed to Step 2. If the problem spans
multiple theories, select the appropriate combined logic.
| Problem class | Theory | Typical sorts |
|---------------|--------|---------------|
| Integer arithmetic | LIA / NIA | Int |
| Real arithmetic | LRA / NRA | Real |
| Bitvector operations | QF_BV | (_ BitVec N) |
| Arrays and maps | QF_AX | (Array Int Int) |
| Strings and regex | QF_S | String, RegLan |
| Uninterpreted functions | QF_UF | custom sorts |
| Mixed theories | AUFLIA, etc. | combination |
# Step 2: Generate the encoding
Action:
Invoke encode.py with the problem description and desired output format.
Expectation:
The script produces a complete SMT-LIB2 file or Z3 Python script with
all declarations, constraints, and check-sat commands.
Result:
For `smtlib2` format: pass the output to **solve**.
For `python` format: execute the script directly.
Proceed to Step 3 for validation.
```bash
python3 scripts/encode.py --problem "Find integers x, y such that x^2 + y^2 = 25 and x > 0" --format smtlib2
python3 scripts/encode.py --problem "Schedule 4 tasks on 2 machines minimizing makespan" --format python
```
# Step 3: Validate the encoding
Action:
The script runs a syntax check by piping the output through `z3 -in`
in parse-only mode.
Expectation:
No parse errors. If errors occur, the offending line is reported.
Result:
On success: the encoding is ready for **solve**, **prove**, or **optimize**.
On parse error: fix the reported line and re-run.
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| problem | string | yes | | problem description |
| format | string | no | smtlib2 | output format: smtlib2 or python |
| output | path | no | stdout | write to file instead of stdout |
| validate | flag | no | on | run syntax check on the output |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

149
.github/skills/encode/scripts/encode.py vendored Normal file
View file

@ -0,0 +1,149 @@
#!/usr/bin/env python3
"""
encode.py: validate and format SMT-LIB2 encodings.
Usage:
python encode.py --validate formula.smt2
python encode.py --validate formula.smt2 --debug
"""
import argparse
import re
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, setup_logging
VALIDATION_TIMEOUT = 5
def read_input(path_or_stdin: str) -> str:
"""Read formula from a file path or stdin (when path is '-')."""
if path_or_stdin == "-":
return sys.stdin.read()
p = Path(path_or_stdin)
if not p.exists():
print(f"file not found: {p}", file=sys.stderr)
sys.exit(1)
return p.read_text()
def find_errors(output: str) -> list:
"""Extract (error ...) messages from Z3 output."""
return re.findall(r'\(error\s+"([^"]+)"\)', output)
def validate(formula: str, z3_bin: str = None, debug: bool = False) -> dict:
"""
Validate an SMT-LIB2 formula by piping it through z3 -in.
Returns a dict with 'valid' (bool), 'errors' (list), and 'raw' output.
"""
result = run_z3(
formula,
z3_bin=z3_bin,
timeout=VALIDATION_TIMEOUT,
debug=debug,
)
errors = find_errors(result["stdout"]) + find_errors(result["stderr"])
if result["result"] == "timeout":
# Timeout during validation is not a syntax error: the formula
# parsed successfully but solving exceeded the limit. That counts
# as syntactically valid.
return {"valid": True, "errors": [], "raw": result}
if errors or result["exit_code"] != 0:
return {"valid": False, "errors": errors, "raw": result}
return {"valid": True, "errors": [], "raw": result}
def report_errors(errors: list, formula: str):
"""Print each syntax error with surrounding context."""
lines = formula.splitlines()
print(f"validation failed: {len(errors)} error(s)", file=sys.stderr)
for err in errors:
print(f" : {err}", file=sys.stderr)
if len(lines) <= 20:
print("formula:", file=sys.stderr)
for i, line in enumerate(lines, 1):
print(f" {i:3d} {line}", file=sys.stderr)
def write_output(formula: str, output_path: str, fmt: str):
"""Write the validated formula to a file or stdout."""
if fmt == "python":
print(
"python format output is generated by the agent, " "not by this script",
file=sys.stderr,
)
sys.exit(1)
if output_path:
Path(output_path).write_text(formula)
print(f"written to {output_path}")
else:
print(formula)
def main():
parser = argparse.ArgumentParser(prog="encode")
parser.add_argument(
"--validate",
metavar="FILE",
help="path to .smt2 file to validate, or '-' for stdin",
)
parser.add_argument(
"--format",
choices=["smtlib2", "python"],
default="smtlib2",
help="output format (default: smtlib2)",
)
parser.add_argument(
"--output",
metavar="FILE",
default=None,
help="write result to file instead of stdout",
)
parser.add_argument("--z3", default=None, help="path to z3 binary")
parser.add_argument("--db", default=None)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
setup_logging(args.debug)
if not args.validate:
parser.error("provide --validate FILE")
return
formula = read_input(args.validate)
db = Z3DB(args.db)
run_id = db.start_run("encode", formula)
result = validate(formula, z3_bin=args.z3, debug=args.debug)
if result["valid"]:
db.log_formula(run_id, formula, "valid")
db.finish_run(run_id, "valid", result["raw"]["duration_ms"], 0)
write_output(formula, args.output, args.format)
db.close()
sys.exit(0)
else:
db.log_formula(run_id, formula, "error")
for err in result["errors"]:
db.log_finding(run_id, "syntax", err, severity="error")
db.finish_run(
run_id,
"error",
result["raw"]["duration_ms"],
result["raw"]["exit_code"],
)
report_errors(result["errors"], formula)
db.close()
sys.exit(1)
if __name__ == "__main__":
main()

83
.github/skills/explain/SKILL.md vendored Normal file
View file

@ -0,0 +1,83 @@
---
name: explain
description: Parse and interpret Z3 output for human consumption. Handles models, unsat cores, proofs, statistics, and error messages. Translates solver internals into plain-language explanations.
---
Given raw Z3 output (from the **solve**, **prove**, **optimize**, or **benchmark** skills), produce a structured explanation. This skill is for cases where the solver output is large, nested, or otherwise difficult to read directly.
# Step 1: Identify the output type
Action:
Determine the category of Z3 output to explain: model, core,
statistics, error, or proof.
Expectation:
The output type maps to one of the recognized formats in the table below.
Result:
If the type is ambiguous, use `--type auto` and let the script detect it.
Proceed to Step 2.
| Output contains | Explanation type |
|----------------|-----------------|
| `(define-fun ...)` blocks | model explanation |
| unsat core labels | conflict explanation |
| `:key value` statistics | performance breakdown |
| `(error ...)` | error diagnosis |
| proof terms | proof sketch |
# Step 2: Run the explainer
Action:
Invoke explain.py with the output file or stdin.
Expectation:
The script auto-detects the output type and produces a structured
plain-language summary.
Result:
A formatted explanation is printed. If detection fails, re-run with
an explicit `--type` flag.
```bash
python3 scripts/explain.py --file output.txt
python3 scripts/explain.py --stdin < output.txt
python3 scripts/explain.py --file output.txt --debug
```
# Step 3: Interpret the explanation
Action:
Review the structured explanation for accuracy and completeness.
Expectation:
Models list each variable with its value and sort. Cores list
conflicting assertions. Statistics show time and memory breakdowns.
Result:
Use the explanation to answer the user query or to guide the next
skill invocation.
For models:
- Each variable is listed with its value and sort
- Array and function interpretations are expanded
- Bitvector values are shown in decimal and hex
For unsat cores:
- The conflicting named assertions are listed
- A minimal conflict set is highlighted
For statistics:
- Time breakdown by phase (preprocessing, solving, model construction)
- Theory solver load distribution
- Memory high-water mark
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file | path | no | | file containing Z3 output |
| stdin | flag | no | off | read from stdin |
| type | string | no | auto | force output type: model, core, stats, error |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

View file

@ -0,0 +1,129 @@
#!/usr/bin/env python3
"""
explain.py: interpret Z3 output in a readable form.
Usage:
python explain.py --file output.txt
echo "sat\n(model ...)" | python explain.py --stdin
"""
import argparse
import re
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, parse_model, parse_stats, parse_unsat_core, setup_logging
def detect_type(text: str) -> str:
if "(define-fun" in text:
return "model"
if "(error" in text:
return "error"
if re.search(r":\S+\s+[\d.]+", text):
return "stats"
first = text.strip().split("\n")[0].strip()
if first == "unsat":
return "core"
return "unknown"
def explain_model(text: str):
model = parse_model(text)
if not model:
print("no model found in output")
return
print("satisfying assignment:")
for name, val in model.items():
# show hex for large integers (likely bitvectors)
try:
n = int(val)
if abs(n) > 255:
print(f" {name} = {val} (0x{n:x})")
else:
print(f" {name} = {val}")
except ValueError:
print(f" {name} = {val}")
def explain_core(text: str):
core = parse_unsat_core(text)
if core:
print(f"conflicting assertions ({len(core)}):")
for label in core:
print(f" {label}")
else:
print("unsat (no named assertions for core extraction)")
def explain_stats(text: str):
stats = parse_stats(text)
if not stats:
print("no statistics found")
return
print("performance breakdown:")
for k in sorted(stats):
print(f" :{k} {stats[k]}")
if "time" in stats:
print(f"\ntotal time: {stats['time']}s")
if "memory" in stats:
print(f"peak memory: {stats['memory']} MB")
def explain_error(text: str):
errors = re.findall(r'\(error\s+"([^"]+)"\)', text)
if errors:
print(f"Z3 reported {len(errors)} error(s):")
for e in errors:
print(f" {e}")
else:
print("error in output but could not parse message")
def main():
parser = argparse.ArgumentParser(prog="explain")
parser.add_argument("--file")
parser.add_argument("--stdin", action="store_true")
parser.add_argument(
"--type", choices=["model", "core", "stats", "error", "auto"], default="auto"
)
parser.add_argument("--db", default=None)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
setup_logging(args.debug)
if args.file:
text = Path(args.file).read_text()
elif args.stdin:
text = sys.stdin.read()
else:
parser.error("provide --file or --stdin")
return
output_type = args.type if args.type != "auto" else detect_type(text)
db = Z3DB(args.db)
run_id = db.start_run("explain", text[:200])
if output_type == "model":
explain_model(text)
elif output_type == "core":
explain_core(text)
elif output_type == "stats":
explain_stats(text)
elif output_type == "error":
explain_error(text)
else:
print("could not determine output type")
print("raw output:")
print(text[:500])
db.finish_run(run_id, "success", 0, 0)
db.close()
if __name__ == "__main__":
main()

92
.github/skills/memory-safety/SKILL.md vendored Normal file
View file

@ -0,0 +1,92 @@
---
name: memory-safety
description: Run AddressSanitizer and UndefinedBehaviorSanitizer on the Z3 test suite to detect memory errors, undefined behavior, and leaks. Logs each finding to z3agent.db.
---
Build Z3 with compiler-based sanitizer instrumentation, execute the test suite, and parse the runtime output for memory safety violations. Supported sanitizers are AddressSanitizer (heap and stack buffer overflows, use-after-free, double-free, memory leaks) and UndefinedBehaviorSanitizer (signed integer overflow, null pointer dereference, misaligned access, shift errors). Findings are deduplicated and stored in z3agent.db for triage and longitudinal tracking.
# Step 1: Configure and build
Action:
Invoke the script with the desired sanitizer flag. The script calls
cmake with the appropriate `-fsanitize` flags and builds the `test-z3`
target. Each sanitizer uses a separate build directory to avoid flag
conflicts.
Expectation:
cmake configures successfully and make compiles the instrumented binary.
If a prior build exists with matching flags, only incremental compilation
runs.
Result:
On success: an instrumented `test-z3` binary is ready in the build
directory. Proceed to Step 2.
On failure: verify compiler support for the requested sanitizer flags
and review cmake output.
```bash
python3 scripts/memory_safety.py --sanitizer asan
python3 scripts/memory_safety.py --sanitizer ubsan
python3 scripts/memory_safety.py --sanitizer both
```
To reuse an existing build:
```bash
python3 scripts/memory_safety.py --sanitizer asan --skip-build --build-dir build/sanitizer-asan
```
# Step 2: Run and collect
Action:
Execute the instrumented test binary with halt_on_error=0 so all
violations are reported rather than aborting on the first.
Expectation:
The script parses AddressSanitizer, UndefinedBehaviorSanitizer, and
LeakSanitizer patterns from combined output, extracts source locations,
and deduplicates by category/file/line.
Result:
On `clean`: no violations detected.
On `findings`: one or more violations found, each printed with severity,
category, message, and source location.
On `timeout`: test suite did not finish; increase timeout or investigate.
On `error`: build or execution failed before sanitizer output.
```bash
python3 scripts/memory_safety.py --sanitizer asan --timeout 900 --debug
```
# Step 3: Interpret results
Action:
Review printed findings and query z3agent.db for historical comparison.
Expectation:
Each finding includes severity, category, message, and source location.
The database query returns prior runs for trend analysis.
Result:
On `clean`: no action required; proceed.
On `findings`: triage by severity and category. Compare against prior
runs to distinguish new regressions from known issues.
On `timeout`: increase the deadline or investigate a possible infinite
loop.
On `error`: inspect build logs before re-running.
Query past runs:
```bash
python3 ../../shared/z3db.py runs --skill memory-safety --last 10
python3 ../../shared/z3db.py query "SELECT category, severity, file, line, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='memory-safety') ORDER BY run_id DESC LIMIT 20"
```
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sanitizer | choice | no | asan | which sanitizer to enable: asan, ubsan, or both |
| build-dir | path | no | build/sanitizer-{name} | path to the build directory |
| timeout | int | no | 600 | seconds before killing the test run |
| skip-build | flag | no | off | reuse an existing instrumented build |
| debug | flag | no | off | verbose cmake, make, and test output |
| db | path | no | .z3-agent/z3agent.db | path to the logging database |

View file

@ -0,0 +1,287 @@
#!/usr/bin/env python3
"""
memory_safety.py: run sanitizer checks on Z3 test suite.
Usage:
python memory_safety.py --sanitizer asan
python memory_safety.py --sanitizer ubsan --debug
"""
import argparse
import logging
import os
import re
import shutil
import subprocess
import sys
import time
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, setup_logging
logger = logging.getLogger("z3agent")
SANITIZER_FLAGS = {
"asan": "-fsanitize=address -fno-omit-frame-pointer",
"ubsan": "-fsanitize=undefined -fno-omit-frame-pointer",
}
ASAN_ERROR = re.compile(r"ERROR:\s*AddressSanitizer:\s*(\S+)")
UBSAN_ERROR = re.compile(r":\d+:\d+:\s*runtime error:\s*(.+)")
LEAK_ERROR = re.compile(r"ERROR:\s*LeakSanitizer:")
LOCATION = re.compile(r"(\S+\.(?:cpp|c|h|hpp)):(\d+)")
def check_dependencies():
"""Fail early if required build tools are not on PATH."""
missing = []
if not shutil.which("cmake"):
missing.append(("cmake", "sudo apt install cmake"))
if not shutil.which("make"):
missing.append(("make", "sudo apt install build-essential"))
cc = shutil.which("clang") or shutil.which("gcc")
if not cc:
missing.append(("clang or gcc", "sudo apt install clang"))
if missing:
print("required tools not found:", file=sys.stderr)
for tool, install in missing:
print(f" {tool}: {install}", file=sys.stderr)
sys.exit(1)
def find_repo_root() -> Path:
d = Path.cwd()
for _ in range(10):
if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
return d
parent = d.parent
if parent == d:
break
d = parent
logger.error("could not locate Z3 repository root")
sys.exit(1)
def build_is_configured(build_dir: Path, sanitizer: str) -> bool:
"""Check whether the build directory already has a matching cmake config."""
cache = build_dir / "CMakeCache.txt"
if not cache.is_file():
return False
expected = SANITIZER_FLAGS[sanitizer].split()[0]
return expected in cache.read_text()
def configure(build_dir: Path, sanitizer: str, repo_root: Path) -> bool:
"""Run cmake with the requested sanitizer flags."""
flags = SANITIZER_FLAGS[sanitizer]
build_dir.mkdir(parents=True, exist_ok=True)
cmd = [
"cmake", str(repo_root),
f"-DCMAKE_C_FLAGS={flags}",
f"-DCMAKE_CXX_FLAGS={flags}",
f"-DCMAKE_EXE_LINKER_FLAGS={flags}",
"-DCMAKE_BUILD_TYPE=Debug",
"-DZ3_BUILD_TEST=ON",
]
logger.info("configuring %s build in %s", sanitizer, build_dir)
logger.debug("cmake command: %s", " ".join(cmd))
proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
if proc.returncode != 0:
logger.error("cmake failed:\n%s", proc.stderr)
return False
return True
def compile_tests(build_dir: Path) -> bool:
"""Compile the test-z3 target."""
nproc = os.cpu_count() or 4
cmd = ["make", f"-j{nproc}", "test-z3"]
logger.info("compiling test-z3 (%d parallel jobs)", nproc)
proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
if proc.returncode != 0:
logger.error("compilation failed:\n%s", proc.stderr[-2000:])
return False
return True
def run_tests(build_dir: Path, timeout: int) -> dict:
"""Execute test-z3 under sanitizer runtime and capture output."""
test_bin = build_dir / "test-z3"
if not test_bin.is_file():
logger.error("test-z3 not found at %s", test_bin)
return {"stdout": "", "stderr": "binary not found", "exit_code": -1,
"duration_ms": 0}
env = os.environ.copy()
env["ASAN_OPTIONS"] = "detect_leaks=1:halt_on_error=0:print_stacktrace=1"
env["UBSAN_OPTIONS"] = "print_stacktrace=1:halt_on_error=0"
cmd = [str(test_bin), "/a"]
logger.info("running: %s", " ".join(cmd))
start = time.monotonic()
try:
proc = subprocess.run(
cmd, capture_output=True, text=True, timeout=timeout,
cwd=build_dir, env=env,
)
except subprocess.TimeoutExpired:
ms = int((time.monotonic() - start) * 1000)
logger.warning("test-z3 timed out after %dms", ms)
return {"stdout": "", "stderr": "timeout", "exit_code": -1,
"duration_ms": ms}
ms = int((time.monotonic() - start) * 1000)
logger.debug("exit_code=%d duration=%dms", proc.returncode, ms)
return {
"stdout": proc.stdout,
"stderr": proc.stderr,
"exit_code": proc.returncode,
"duration_ms": ms,
}
def parse_findings(output: str) -> list:
"""Extract sanitizer error reports from combined stdout and stderr."""
findings = []
lines = output.split("\n")
for i, line in enumerate(lines):
entry = None
m = ASAN_ERROR.search(line)
if m:
entry = {"category": "asan", "message": m.group(1),
"severity": "high"}
if not entry:
m = LEAK_ERROR.search(line)
if m:
entry = {"category": "leak",
"message": "detected memory leaks",
"severity": "high"}
if not entry:
m = UBSAN_ERROR.search(line)
if m:
entry = {"category": "ubsan", "message": m.group(1),
"severity": "medium"}
if not entry:
continue
file_path, line_no = None, None
window = lines[max(0, i - 2):i + 5]
for ctx in window:
loc = LOCATION.search(ctx)
if loc and "/usr/" not in loc.group(1):
file_path = loc.group(1)
line_no = int(loc.group(2))
break
entry["file"] = file_path
entry["line"] = line_no
entry["raw"] = line.strip()
findings.append(entry)
return findings
def deduplicate(findings: list) -> list:
"""Remove duplicate reports at the same category, file, and line."""
seen = set()
result = []
for f in findings:
key = (f["category"], f["file"], f["line"], f["message"])
if key in seen:
continue
seen.add(key)
result.append(f)
return result
def main():
parser = argparse.ArgumentParser(prog="memory-safety")
parser.add_argument("--sanitizer", choices=["asan", "ubsan", "both"],
default="asan",
help="sanitizer to enable (default: asan)")
parser.add_argument("--build-dir", default=None,
help="path to build directory")
parser.add_argument("--timeout", type=int, default=600,
help="seconds before killing test run")
parser.add_argument("--skip-build", action="store_true",
help="reuse existing instrumented build")
parser.add_argument("--db", default=None,
help="path to z3agent.db")
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
setup_logging(args.debug)
check_dependencies()
repo_root = find_repo_root()
sanitizers = ["asan", "ubsan"] if args.sanitizer == "both" else [args.sanitizer]
all_findings = []
db = Z3DB(args.db)
for san in sanitizers:
if args.build_dir:
build_dir = Path(args.build_dir)
else:
build_dir = repo_root / "build" / f"sanitizer-{san}"
run_id = db.start_run("memory-safety", f"sanitizer={san}")
db.log(f"sanitizer: {san}, build: {build_dir}", run_id=run_id)
if not args.skip_build:
needs_configure = not build_is_configured(build_dir, san)
if needs_configure and not configure(build_dir, san, repo_root):
db.finish_run(run_id, "error", 0, exit_code=1)
print(f"FAIL: cmake configuration failed for {san}")
continue
if not compile_tests(build_dir):
db.finish_run(run_id, "error", 0, exit_code=1)
print(f"FAIL: compilation failed for {san}")
continue
result = run_tests(build_dir, args.timeout)
combined = result["stdout"] + "\n" + result["stderr"]
findings = deduplicate(parse_findings(combined))
for f in findings:
db.log_finding(
run_id,
category=f["category"],
message=f["message"],
severity=f["severity"],
file=f["file"],
line=f["line"],
details={"raw": f["raw"]},
)
status = "clean" if not findings else "findings"
if result["exit_code"] == -1:
status = "timeout" if "timeout" in result["stderr"] else "error"
db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])
all_findings.extend(findings)
print(f"{san}: {len(findings)} finding(s), {result['duration_ms']}ms")
if all_findings:
print(f"\nTotal: {len(all_findings)} finding(s)")
for f in all_findings:
loc = f"{f['file']}:{f['line']}" if f["file"] else "unknown location"
print(f" [{f['severity']}] {f['category']}: {f['message']} at {loc}")
db.close()
sys.exit(1)
else:
print("\nNo sanitizer findings.")
db.close()
sys.exit(0)
if __name__ == "__main__":
main()

75
.github/skills/optimize/SKILL.md vendored Normal file
View file

@ -0,0 +1,75 @@
---
name: optimize
description: Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains.
---
Given a set of constraints and an objective function, find the optimal value. Z3 supports both hard constraints (must hold) and soft constraints (weighted preferences), as well as lexicographic multi-objective optimization.
# Step 1: Formulate the problem
Action:
Write constraints and an objective using `(minimize ...)` or
`(maximize ...)` directives, followed by `(check-sat)` and `(get-model)`.
Expectation:
A valid SMT-LIB2 formula with at least one optimization directive and
all variables declared.
Result:
If the formula is well-formed, proceed to Step 2. For multi-objective
problems, list directives in priority order for lexicographic optimization.
Example: minimize `x + y` subject to `x >= 1`, `y >= 2`, `x + y <= 10`:
```smtlib
(declare-const x Int)
(declare-const y Int)
(assert (>= x 1))
(assert (>= y 2))
(assert (<= (+ x y) 10))
(minimize (+ x y))
(check-sat)
(get-model)
```
# Step 2: Run the optimizer
Action:
Invoke optimize.py with the formula or file path.
Expectation:
The script prints `sat` with the optimal assignment, `unsat`, `unknown`,
or `timeout`. A run entry is logged to z3agent.db.
Result:
On `sat`: proceed to Step 3 to read the optimal values.
On `unsat` or `timeout`: check constraints for contradictions or simplify.
```bash
python3 scripts/optimize.py --file scheduling.smt2
python3 scripts/optimize.py --formula "<inline smt-lib2>" --debug
```
# Step 3: Interpret the output
Action:
Parse the objective value and satisfying assignment from the output.
Expectation:
`sat` with a model containing the optimal value, `unsat` indicating
infeasibility, or `unknown`/`timeout`.
Result:
On `sat`: report the optimal value and assignment.
On `unsat`: the constraints are contradictory, no feasible solution exists.
On `unknown`/`timeout`: relax constraints or try **simplify**.
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula with minimize/maximize |
| file | path | no | | path to .smt2 file |
| timeout | int | no | 60 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

View file

@ -0,0 +1,58 @@
#!/usr/bin/env python3
"""
optimize.py: solve constrained optimization problems via Z3.
Usage:
python optimize.py --file scheduling.smt2
python optimize.py --formula "(declare-const x Int)..." --debug
"""
import argparse
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_model, setup_logging
def main():
parser = argparse.ArgumentParser(prog="optimize")
parser.add_argument("--formula")
parser.add_argument("--file")
parser.add_argument("--timeout", type=int, default=60)
parser.add_argument("--z3", default=None)
parser.add_argument("--db", default=None)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
setup_logging(args.debug)
if args.file:
formula = Path(args.file).read_text()
elif args.formula:
formula = args.formula
else:
parser.error("provide --formula or --file")
return
db = Z3DB(args.db)
run_id = db.start_run("optimize", formula)
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
model = parse_model(result["stdout"]) if result["result"] == "sat" else None
db.log_formula(run_id, formula, result["result"], str(model) if model else None)
db.finish_run(run_id, result["result"], result["duration_ms"], result["exit_code"])
print(result["result"])
if model:
for name, val in model.items():
print(f" {name} = {val}")
db.close()
sys.exit(0 if result["exit_code"] == 0 else 1)
if __name__ == "__main__":
main()

83
.github/skills/prove/SKILL.md vendored Normal file
View file

@ -0,0 +1,83 @@
---
name: prove
description: Prove validity of logical statements by negation and satisfiability checking. If the negation is unsatisfiable, the original statement is valid. Otherwise a counterexample is returned.
---
Given a conjecture (an SMT-LIB2 assertion or a natural language claim), determine whether it holds universally. The method is standard: negate the conjecture and check satisfiability. If the negation is unsatisfiable, the original is valid. If satisfiable, the model is a counterexample.
# Step 1: Prepare the negated formula
Action:
Wrap the conjecture in `(assert (not ...))` and append
`(check-sat)(get-model)`.
Expectation:
A complete SMT-LIB2 formula that negates the original conjecture with
all variables declared.
Result:
If the negation is well-formed, proceed to Step 2.
If the conjecture is natural language, run **encode** first.
Example: to prove that `(> x 3)` implies `(> x 1)`:
```smtlib
(declare-const x Int)
(assert (not (=> (> x 3) (> x 1))))
(check-sat)
(get-model)
```
# Step 2: Run the prover
Action:
Invoke prove.py with the conjecture and variable declarations.
Expectation:
The script prints `valid`, `invalid` (with counterexample), `unknown`,
or `timeout`. A run entry is logged to z3agent.db.
Result:
On `valid`: proceed to **explain** if the user needs a summary.
On `invalid`: report the counterexample directly.
On `unknown`/`timeout`: try **simplify** first, or increase the timeout.
```bash
python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
```
For file input where the file contains the full negated formula:
```bash
python3 scripts/prove.py --file negated.smt2
```
With debug tracing:
```bash
python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" --debug
```
# Step 3: Interpret the output
Action:
Read the prover output to determine validity of the conjecture.
Expectation:
One of `valid`, `invalid` (with counterexample), `unknown`, or `timeout`.
Result:
On `valid`: the conjecture holds universally.
On `invalid`: the model shows a concrete counterexample.
On `unknown`/`timeout`: the conjecture may require auxiliary lemmas or induction.
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| conjecture | string | no | | the assertion to prove (without negation) |
| vars | string | no | | variable declarations as "name:sort" pairs, comma-separated |
| file | path | no | | .smt2 file with the negated formula |
| timeout | int | no | 30 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |
Either `conjecture` (with `vars`) or `file` must be provided.

82
.github/skills/prove/scripts/prove.py vendored Normal file
View file

@ -0,0 +1,82 @@
#!/usr/bin/env python3
"""
prove.py: prove validity by negation + satisfiability check.
Usage:
python prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
python prove.py --file negated.smt2
"""
import argparse
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_model, setup_logging
def build_formula(conjecture: str, vars_str: str) -> str:
lines = []
if vars_str:
for v in vars_str.split(","):
v = v.strip()
name, sort = v.split(":")
lines.append(f"(declare-const {name.strip()} {sort.strip()})")
lines.append(f"(assert (not {conjecture}))")
lines.append("(check-sat)")
lines.append("(get-model)")
return "\n".join(lines)
def main():
parser = argparse.ArgumentParser(prog="prove")
parser.add_argument("--conjecture", help="assertion to prove")
parser.add_argument("--vars", help="variable declarations, e.g. 'x:Int,y:Bool'")
parser.add_argument("--file", help="path to .smt2 file with negated formula")
parser.add_argument("--timeout", type=int, default=30)
parser.add_argument("--z3", default=None)
parser.add_argument("--db", default=None)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
setup_logging(args.debug)
if args.file:
formula = Path(args.file).read_text()
elif args.conjecture:
formula = build_formula(args.conjecture, args.vars or "")
else:
parser.error("provide --conjecture or --file")
return
db = Z3DB(args.db)
run_id = db.start_run("prove", formula)
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
if result["result"] == "unsat":
verdict = "valid"
elif result["result"] == "sat":
verdict = "invalid"
else:
verdict = result["result"]
model = parse_model(result["stdout"]) if verdict == "invalid" else None
db.log_formula(run_id, formula, verdict, str(model) if model else None)
db.finish_run(run_id, verdict, result["duration_ms"], result["exit_code"])
print(verdict)
if model:
print("counterexample:")
for name, val in model.items():
print(f" {name} = {val}")
db.close()
# Exit 0 when we successfully determined validity or invalidity;
# exit 1 only for errors/timeouts.
sys.exit(0 if verdict in ("valid", "invalid") else 1)
if __name__ == "__main__":
main()

57
.github/skills/shared/schema.sql vendored Normal file
View file

@ -0,0 +1,57 @@
-- z3agent schema v1
PRAGMA journal_mode=WAL;
PRAGMA foreign_keys=ON;
CREATE TABLE IF NOT EXISTS runs (
run_id INTEGER PRIMARY KEY AUTOINCREMENT,
skill TEXT NOT NULL,
input_hash TEXT,
status TEXT NOT NULL DEFAULT 'running',
duration_ms INTEGER,
exit_code INTEGER,
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_runs_skill ON runs(skill);
CREATE INDEX IF NOT EXISTS idx_runs_status ON runs(status);
CREATE TABLE IF NOT EXISTS formulas (
formula_id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
smtlib2 TEXT NOT NULL,
result TEXT,
model TEXT,
stats TEXT,
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_formulas_run ON formulas(run_id);
CREATE INDEX IF NOT EXISTS idx_formulas_result ON formulas(result);
CREATE TABLE IF NOT EXISTS findings (
finding_id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
category TEXT NOT NULL,
severity TEXT,
file TEXT,
line INTEGER,
message TEXT NOT NULL,
details TEXT,
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_findings_run ON findings(run_id);
CREATE INDEX IF NOT EXISTS idx_findings_category ON findings(category);
CREATE INDEX IF NOT EXISTS idx_findings_severity ON findings(severity);
CREATE TABLE IF NOT EXISTS interaction_log (
log_id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id INTEGER REFERENCES runs(run_id) ON DELETE SET NULL,
level TEXT NOT NULL DEFAULT 'info',
message TEXT NOT NULL,
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_log_run ON interaction_log(run_id);
CREATE INDEX IF NOT EXISTS idx_log_level ON interaction_log(level);

364
.github/skills/shared/z3db.py vendored Normal file
View file

@ -0,0 +1,364 @@
#!/usr/bin/env python3
"""
z3db: shared library and CLI for Z3 skill scripts.
Library usage:
from z3db import Z3DB, find_z3, run_z3
CLI usage:
python z3db.py init
python z3db.py status
python z3db.py log [--run-id N]
python z3db.py runs [--skill solve] [--last N]
python z3db.py query "SELECT ..."
"""
import argparse
import hashlib
import json
import logging
import os
import re
import shutil
import sqlite3
import subprocess
import sys
import time
from pathlib import Path
from typing import Optional
SCHEMA_PATH = Path(__file__).parent / "schema.sql"
DEFAULT_DB_DIR = ".z3-agent"
DEFAULT_DB_NAME = "z3agent.db"
logger = logging.getLogger("z3agent")
def setup_logging(debug: bool = False):
level = logging.DEBUG if debug else logging.INFO
fmt = (
"[%(levelname)s] %(message)s"
if not debug
else "[%(levelname)s %(asctime)s] %(message)s"
)
logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
class Z3DB:
"""SQLite handle for z3agent.db, tracks runs, formulas, findings, logs."""
def __init__(self, db_path: Optional[str] = None):
if db_path is None:
db_dir = Path(DEFAULT_DB_DIR)
db_dir.mkdir(exist_ok=True)
db_path = str(db_dir / DEFAULT_DB_NAME)
self.db_path = db_path
self.conn = sqlite3.connect(db_path)
self.conn.execute("PRAGMA foreign_keys=ON")
self.conn.row_factory = sqlite3.Row
self._init_schema()
def _init_schema(self):
self.conn.executescript(SCHEMA_PATH.read_text())
def close(self):
self.conn.close()
def start_run(self, skill: str, input_text: str = "") -> int:
input_hash = hashlib.sha256(input_text.encode()).hexdigest()[:16]
cur = self.conn.execute(
"INSERT INTO runs (skill, input_hash) VALUES (?, ?)",
(skill, input_hash),
)
self.conn.commit()
run_id = cur.lastrowid
logger.debug("started run %d (skill=%s, hash=%s)", run_id, skill, input_hash)
return run_id
def finish_run(
self, run_id: int, status: str, duration_ms: int, exit_code: int = 0
):
self.conn.execute(
"UPDATE runs SET status=?, duration_ms=?, exit_code=? WHERE run_id=?",
(status, duration_ms, exit_code, run_id),
)
self.conn.commit()
logger.debug("finished run %d: %s (%dms)", run_id, status, duration_ms)
def log_formula(
self,
run_id: int,
smtlib2: str,
result: str = None,
model: str = None,
stats: dict = None,
) -> int:
cur = self.conn.execute(
"INSERT INTO formulas (run_id, smtlib2, result, model, stats) "
"VALUES (?, ?, ?, ?, ?)",
(run_id, smtlib2, result, model, json.dumps(stats) if stats else None),
)
self.conn.commit()
return cur.lastrowid
def log_finding(
self,
run_id: int,
category: str,
message: str,
severity: str = None,
file: str = None,
line: int = None,
details: dict = None,
) -> int:
cur = self.conn.execute(
"INSERT INTO findings (run_id, category, severity, file, line, "
"message, details) VALUES (?, ?, ?, ?, ?, ?, ?)",
(
run_id,
category,
severity,
file,
line,
message,
json.dumps(details) if details else None,
),
)
self.conn.commit()
return cur.lastrowid
def log(self, message: str, level: str = "info", run_id: int = None):
"""Write to stderr and to the interaction_log table."""
getattr(logger, level, logger.info)(message)
self.conn.execute(
"INSERT INTO interaction_log (run_id, level, message) " "VALUES (?, ?, ?)",
(run_id, level, message),
)
self.conn.commit()
def get_runs(self, skill: str = None, last: int = 10):
sql = "SELECT * FROM runs"
params = []
if skill:
sql += " WHERE skill = ?"
params.append(skill)
sql += " ORDER BY run_id DESC LIMIT ?"
params.append(last)
return self.conn.execute(sql, params).fetchall()
def get_status(self) -> dict:
rows = self.conn.execute(
"SELECT status, COUNT(*) as cnt FROM runs GROUP BY status"
).fetchall()
total = sum(r["cnt"] for r in rows)
by_status = {r["status"]: r["cnt"] for r in rows}
last = self.conn.execute(
"SELECT timestamp FROM runs ORDER BY run_id DESC LIMIT 1"
).fetchone()
return {
"total": total,
**by_status,
"last_run": last["timestamp"] if last else None,
}
def get_logs(self, run_id: int = None, last: int = 50):
if run_id:
return self.conn.execute(
"SELECT * FROM interaction_log WHERE run_id=? "
"ORDER BY log_id DESC LIMIT ?",
(run_id, last),
).fetchall()
return self.conn.execute(
"SELECT * FROM interaction_log ORDER BY log_id DESC LIMIT ?", (last,)
).fetchall()
def query(self, sql: str):
return self.conn.execute(sql).fetchall()
def find_z3(hint: str = None) -> str:
"""Locate the z3 binary: explicit path > build dirs > PATH."""
candidates = []
if hint:
candidates.append(hint)
repo_root = _find_repo_root()
if repo_root:
for build_dir in ["build", "build/release", "build/debug"]:
candidates.append(str(repo_root / build_dir / "z3"))
path_z3 = shutil.which("z3")
if path_z3:
candidates.append(path_z3)
for c in candidates:
p = Path(c)
if p.is_file() and os.access(p, os.X_OK):
logger.debug("found z3: %s", p)
return str(p)
logger.error("z3 binary not found. Searched: %s", candidates)
sys.exit(1)
def _find_repo_root() -> Optional[Path]:
d = Path.cwd()
for _ in range(10):
if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
return d
parent = d.parent
if parent == d:
break
d = parent
return None
def run_z3(
formula: str,
z3_bin: str = None,
timeout: int = 30,
args: list = None,
debug: bool = False,
) -> dict:
"""Pipe an SMT-LIB2 formula into z3 -in, return parsed output."""
z3_path = find_z3(z3_bin)
cmd = [z3_path, "-in"] + (args or [])
logger.debug("cmd: %s", " ".join(cmd))
logger.debug("stdin:\n%s", formula)
start = time.monotonic()
try:
proc = subprocess.run(
cmd,
input=formula,
capture_output=True,
text=True,
timeout=timeout,
)
except subprocess.TimeoutExpired:
duration_ms = int((time.monotonic() - start) * 1000)
logger.warning("z3 timed out after %dms", duration_ms)
return {
"stdout": "",
"stderr": "timeout",
"exit_code": -1,
"duration_ms": duration_ms,
"result": "timeout",
}
duration_ms = int((time.monotonic() - start) * 1000)
logger.debug("exit_code=%d duration=%dms", proc.returncode, duration_ms)
logger.debug("stdout:\n%s", proc.stdout)
if proc.stderr:
logger.debug("stderr:\n%s", proc.stderr)
first_line = proc.stdout.strip().split("\n")[0].strip() if proc.stdout else ""
result = first_line if first_line in ("sat", "unsat", "unknown") else "error"
return {
"stdout": proc.stdout,
"stderr": proc.stderr,
"exit_code": proc.returncode,
"duration_ms": duration_ms,
"result": result,
}
def parse_model(stdout: str) -> Optional[dict]:
"""Pull define-fun entries from a (get-model) response."""
model = {}
for m in re.finditer(r"\(define-fun\s+(\S+)\s+\(\)\s+\S+\s+(.+?)\)", stdout):
model[m.group(1)] = m.group(2).strip()
return model if model else None
def parse_stats(stdout: str) -> Optional[dict]:
"""Parse :key value pairs from z3 -st output."""
stats = {}
for m in re.finditer(r":(\S+)\s+([\d.]+)", stdout):
key, val = m.group(1), m.group(2)
stats[key] = float(val) if "." in val else int(val)
return stats if stats else None
def parse_unsat_core(stdout: str) -> Optional[list]:
for line in stdout.strip().split("\n"):
line = line.strip()
if line.startswith("(") and not line.startswith("(error"):
labels = line.strip("()").split()
if labels:
return labels
return None
def cli():
parser = argparse.ArgumentParser(
description="Z3 Agent database CLI",
prog="z3db",
)
parser.add_argument("--db", default=None, help="path to z3agent.db")
parser.add_argument("--debug", action="store_true", help="verbose output")
sub = parser.add_subparsers(dest="command")
sub.add_parser("init", help="initialize the database")
sub.add_parser("status", help="show run summary")
log_p = sub.add_parser("log", help="show interaction log")
log_p.add_argument("--run-id", type=int, help="filter by run ID")
log_p.add_argument("--last", type=int, default=50)
runs_p = sub.add_parser("runs", help="list runs")
runs_p.add_argument("--skill", help="filter by skill name")
runs_p.add_argument("--last", type=int, default=10)
query_p = sub.add_parser("query", help="run raw SQL")
query_p.add_argument("sql", help="SQL query string")
args = parser.parse_args()
setup_logging(args.debug)
db = Z3DB(args.db)
if args.command == "init":
print(f"Database initialized at {db.db_path}")
elif args.command == "status":
s = db.get_status()
print(
f"Runs: {s['total']}"
f" | success: {s.get('success', 0)}"
f" | error: {s.get('error', 0)}"
f" | timeout: {s.get('timeout', 0)}"
f" | Last: {s['last_run'] or 'never'}"
)
elif args.command == "log":
for row in db.get_logs(args.run_id, args.last):
print(
f"[{row['level']}] {row['timestamp']} "
f"(run {row['run_id']}): {row['message']}"
)
elif args.command == "runs":
for row in db.get_runs(args.skill, args.last):
print(
f"#{row['run_id']} {row['skill']} {row['status']} "
f"{row['duration_ms']}ms @ {row['timestamp']}"
)
elif args.command == "query":
for row in db.query(args.sql):
print(dict(row))
else:
parser.print_help()
db.close()
if __name__ == "__main__":
cli()

76
.github/skills/simplify/SKILL.md vendored Normal file
View file

@ -0,0 +1,76 @@
---
name: simplify
description: Reduce formula complexity using Z3 tactic chains. Supports configurable tactic pipelines for boolean, arithmetic, and bitvector simplification.
---
Given a formula, apply a sequence of Z3 tactics to produce an equivalent but simpler form. This is useful for understanding what Z3 sees after preprocessing, debugging tactic selection, and reducing formula size before solving.
# Step 1: Choose tactics
Action:
Select a tactic chain from the available Z3 tactics based on the
formula's theory.
Expectation:
A comma-separated list of tactic names suitable for the formula domain.
Result:
If unsure, use the default chain: `simplify,propagate-values,ctx-simplify`.
For bitvector formulas, add `bit-blast`. Proceed to Step 2.
| Tactic | What it does |
|--------|-------------|
| simplify | constant folding, algebraic identities |
| propagate-values | substitute known equalities |
| ctx-simplify | context-dependent simplification |
| elim-uncnstr | remove unconstrained variables |
| solve-eqs | Gaussian elimination |
| bit-blast | reduce bitvectors to booleans |
| tseitin-cnf | convert to CNF |
| aig | and-inverter graph reduction |
# Step 2: Run simplification
Action:
Invoke simplify.py with the formula and optional tactic chain.
Expectation:
The script applies each tactic in sequence and prints the simplified
formula. A run entry is logged to z3agent.db.
Result:
If the output is simpler, pass it to **solve** or **prove**.
If unchanged, try a different tactic chain.
```bash
python3 scripts/simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
python3 scripts/simplify.py --file formula.smt2 --tactics "simplify,propagate-values,ctx-simplify"
python3 scripts/simplify.py --file formula.smt2 --debug
```
Without `--tactics`, the script applies the default chain: `simplify`, `propagate-values`, `ctx-simplify`.
# Step 3: Interpret the output
Action:
Read the simplified formula output in SMT-LIB2 syntax.
Expectation:
One or more `(assert ...)` blocks representing equivalent subgoals.
Result:
A smaller formula indicates successful reduction. Pass the result to
**solve**, **prove**, or **optimize** as needed.
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula to simplify |
| vars | string | no | | variable declarations as "name:sort" pairs |
| file | path | no | | path to .smt2 file |
| tactics | string | no | simplify,propagate-values,ctx-simplify | comma-separated tactic names |
| timeout | int | no | 30 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

View file

@ -0,0 +1,82 @@
#!/usr/bin/env python3
"""
simplify.py: apply Z3 tactics to simplify an SMT-LIB2 formula.
Usage:
python simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
python simplify.py --file formula.smt2 --tactics "simplify,solve-eqs"
"""
import argparse
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, setup_logging
DEFAULT_TACTICS = "simplify,propagate-values,ctx-simplify"
def build_tactic_formula(base_formula: str, tactics: str) -> str:
tactic_list = [t.strip() for t in tactics.split(",")]
if len(tactic_list) == 1:
tactic_expr = f"(then {tactic_list[0]} skip)"
else:
tactic_expr = "(then " + " ".join(tactic_list) + ")"
return base_formula + f"\n(apply {tactic_expr})\n"
def build_formula_from_parts(formula_str: str, vars_str: str) -> str:
lines = []
if vars_str:
for v in vars_str.split(","):
v = v.strip()
name, sort = v.split(":")
lines.append(f"(declare-const {name.strip()} {sort.strip()})")
lines.append(formula_str)
return "\n".join(lines)
def main():
parser = argparse.ArgumentParser(prog="simplify")
parser.add_argument("--formula")
parser.add_argument("--vars")
parser.add_argument("--file")
parser.add_argument("--tactics", default=DEFAULT_TACTICS)
parser.add_argument("--timeout", type=int, default=30)
parser.add_argument("--z3", default=None)
parser.add_argument("--db", default=None)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
setup_logging(args.debug)
if args.file:
base = Path(args.file).read_text()
elif args.formula:
base = build_formula_from_parts(args.formula, args.vars or "")
else:
parser.error("provide --formula or --file")
return
formula = build_tactic_formula(base, args.tactics)
db = Z3DB(args.db)
run_id = db.start_run("simplify", formula)
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
status = "success" if result["exit_code"] == 0 else "error"
db.log_formula(run_id, formula, status)
db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])
print(result["stdout"])
if result["stderr"] and result["exit_code"] != 0:
print(result["stderr"], file=sys.stderr)
db.close()
sys.exit(0 if result["exit_code"] == 0 else 1)
if __name__ == "__main__":
main()

75
.github/skills/solve/SKILL.md vendored Normal file
View file

@ -0,0 +1,75 @@
---
name: solve
description: Check satisfiability of SMT-LIB2 formulas using Z3. Returns sat/unsat with models or unsat cores. Logs every invocation to z3agent.db for auditability.
---
Given an SMT-LIB2 formula (or a set of constraints described in natural language), determine whether the formula is satisfiable. If sat, extract a satisfying assignment. If unsat and tracking labels are present, extract the unsat core.
# Step 1: Prepare the formula
Action:
Convert the input to valid SMT-LIB2. If the input is natural language,
use the **encode** skill first.
Expectation:
A syntactically valid SMT-LIB2 formula ending with `(check-sat)` and
either `(get-model)` or `(get-unsat-core)` as appropriate.
Result:
If valid SMT-LIB2 is ready, proceed to Step 2.
If encoding is needed, run **encode** first and return here.
# Step 2: Run Z3
Action:
Invoke solve.py with the formula string or file path.
Expectation:
The script pipes the formula to `z3 -in`, logs the run to
`.z3-agent/z3agent.db`, and prints the result.
Result:
Output is one of `sat`, `unsat`, `unknown`, or `timeout`.
Proceed to Step 3 to interpret.
```bash
python3 scripts/solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
```
For file input:
```bash
python3 scripts/solve.py --file problem.smt2
```
With debug tracing:
```bash
python3 scripts/solve.py --file problem.smt2 --debug
```
# Step 3: Interpret the output
Action:
Parse the Z3 output to determine satisfiability and extract any model
or unsat core.
Expectation:
`sat` with a model, `unsat` optionally with a core, `unknown`, or
`timeout`.
Result:
On `sat`: report the model to the user.
On `unsat`: report the core if available.
On `unknown`/`timeout`: try **simplify** or increase the timeout.
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula as a string |
| file | path | no | | path to an .smt2 file |
| timeout | int | no | 30 | seconds before killing z3 |
| z3 | path | no | auto | explicit path to z3 binary |
| debug | flag | no | off | print z3 command, stdin, stdout, stderr, timing |
| db | path | no | .z3-agent/z3agent.db | path to the logging database |
Either `formula` or `file` must be provided.

64
.github/skills/solve/scripts/solve.py vendored Normal file
View file

@ -0,0 +1,64 @@
#!/usr/bin/env python3
"""
solve.py: check satisfiability of an SMT-LIB2 formula via Z3.
Usage:
python solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
python solve.py --file problem.smt2
python solve.py --file problem.smt2 --debug --timeout 60
"""
import argparse
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_model, parse_unsat_core, setup_logging
def main():
parser = argparse.ArgumentParser(prog="solve")
parser.add_argument("--formula", help="SMT-LIB2 formula string")
parser.add_argument("--file", help="path to .smt2 file")
parser.add_argument("--timeout", type=int, default=30)
parser.add_argument("--z3", default=None, help="path to z3 binary")
parser.add_argument("--db", default=None)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
setup_logging(args.debug)
if args.file:
formula = Path(args.file).read_text()
elif args.formula:
formula = args.formula
else:
parser.error("provide --formula or --file")
return
db = Z3DB(args.db)
run_id = db.start_run("solve", formula)
result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
model = parse_model(result["stdout"]) if result["result"] == "sat" else None
core = parse_unsat_core(result["stdout"]) if result["result"] == "unsat" else None
db.log_formula(run_id, formula, result["result"], str(model) if model else None)
db.finish_run(run_id, result["result"], result["duration_ms"], result["exit_code"])
print(result["result"])
if model:
for name, val in model.items():
print(f" {name} = {val}")
if core:
print("unsat core:", " ".join(core))
if result["stderr"] and result["result"] == "error":
print(result["stderr"], file=sys.stderr)
db.close()
sys.exit(0 if result["exit_code"] == 0 else 1)
if __name__ == "__main__":
main()

81
.github/skills/static-analysis/SKILL.md vendored Normal file
View file

@ -0,0 +1,81 @@
---
name: static-analysis
description: Run Clang Static Analyzer (scan-build) on Z3 source and log structured findings to z3agent.db.
---
Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plist diagnostics, and record each finding with file, line, category, and description. This skill wraps scan-build into a reproducible, logged workflow suitable for regular analysis sweeps and regression tracking.
# Step 1: Run the analysis
Action:
Invoke the script pointing at the CMake build directory. The script
runs `scan-build cmake ..` followed by `scan-build make` and writes
checker output to the output directory.
Expectation:
scan-build completes within the timeout, producing plist diagnostic
files in the output directory (defaults to a `scan-results` subdirectory
of the build directory).
Result:
On success: diagnostics are parsed and findings are printed. Proceed to
Step 2.
On failure: verify that clang and scan-build are installed and that the
build directory contains a valid CMake configuration.
```bash
python3 scripts/static_analysis.py --build-dir build
python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug
python3 scripts/static_analysis.py --build-dir build --timeout 1800
```
# Step 2: Interpret the output
Action:
Review the printed findings and the summary table grouped by category.
Expectation:
Each finding shows its source location, category, and description.
The summary table ranks categories by frequency for quick triage.
Result:
On zero findings: the codebase passes all enabled static checks.
On findings: prioritize by category frequency and severity. Address
null dereferences and use-after-free classes first.
Example output:
```
[Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read
[Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer
```
# Step 3: Review historical findings
Action:
Query z3agent.db to compare current results against prior analysis
runs.
Expectation:
Queries return category counts and run history, enabling regression
detection across commits.
Result:
On stable or decreasing counts: no regressions introduced.
On increased counts: cross-reference new findings with recent commits
to identify the responsible change.
```bash
python3 ../../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC"
python3 ../../shared/z3db.py runs --skill static-analysis --last 10
```
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| build-dir | path | yes | | path to the CMake build directory |
| output-dir | path | no | BUILD/scan-results | directory for scan-build output |
| timeout | int | no | 1200 | seconds allowed for the full build |
| db | path | no | .z3-agent/z3agent.db | logging database |
| debug | flag | no | off | verbose tracing |

View file

@ -0,0 +1,260 @@
#!/usr/bin/env python3
"""
static_analysis.py: run Clang Static Analyzer on Z3 source.
Usage:
python static_analysis.py --build-dir build
python static_analysis.py --build-dir build --output-dir /tmp/sa-results
python static_analysis.py --build-dir build --debug
"""
import argparse
import logging
import os
import plistlib
import shutil
import subprocess
import sys
import time
from collections import Counter
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, setup_logging
logger = logging.getLogger("z3agent")
SCAN_BUILD_NAMES = ["scan-build", "scan-build-14", "scan-build-15", "scan-build-16"]
def find_scan_build() -> str:
"""Locate the scan-build binary on PATH."""
for name in SCAN_BUILD_NAMES:
path = shutil.which(name)
if path:
logger.debug("found scan-build: %s", path)
return path
print(
"scan-build not found on PATH.\n"
"Install one of the following:\n"
" Ubuntu/Debian: sudo apt install clang-tools\n"
" macOS: brew install llvm\n"
" Fedora: sudo dnf install clang-tools-extra\n"
f"searched for: {', '.join(SCAN_BUILD_NAMES)}",
file=sys.stderr,
)
sys.exit(1)
def run_configure(scan_build: str, build_dir: Path, output_dir: Path,
timeout: int) -> bool:
"""Run scan-build cmake to configure the project."""
repo_root = build_dir.parent
cmd = [
scan_build,
"-o", str(output_dir),
"cmake",
str(repo_root),
]
logger.info("configuring: %s", " ".join(cmd))
try:
proc = subprocess.run(
cmd, cwd=str(build_dir),
capture_output=True, text=True, timeout=timeout,
)
except subprocess.TimeoutExpired:
logger.error("cmake configuration timed out after %ds", timeout)
return False
if proc.returncode != 0:
logger.error("cmake configuration failed (exit %d)", proc.returncode)
logger.error("stderr: %s", proc.stderr[:2000])
return False
logger.info("configuration complete")
return True
def run_build(scan_build: str, build_dir: Path, output_dir: Path,
timeout: int) -> bool:
"""Run scan-build make to build and analyze."""
nproc = os.cpu_count() or 4
cmd = [
scan_build,
"-o", str(output_dir),
"--status-bugs",
"make",
f"-j{nproc}",
]
logger.info("building with analysis: %s", " ".join(cmd))
try:
proc = subprocess.run(
cmd, cwd=str(build_dir),
capture_output=True, text=True, timeout=timeout,
)
except subprocess.TimeoutExpired:
logger.error("build timed out after %ds", timeout)
return False
# scan-build returns nonzero when bugs are found (due to --status-bugs),
# so a nonzero exit code is not necessarily a build failure.
if proc.returncode != 0:
logger.info(
"scan-build exited with code %d (nonzero may indicate findings)",
proc.returncode,
)
else:
logger.info("build complete, no bugs reported by scan-build")
if proc.stderr:
logger.debug("build stderr (last 2000 chars): %s", proc.stderr[-2000:])
return True
def collect_plist_files(output_dir: Path) -> list:
"""Recursively find all .plist diagnostic files under the output directory."""
plists = sorted(output_dir.rglob("*.plist"))
logger.debug("found %d plist files in %s", len(plists), output_dir)
return plists
def parse_plist_findings(plist_path: Path) -> list:
"""Extract findings from a single Clang plist diagnostic file.
Returns a list of dicts with keys: file, line, col, category, type, description.
"""
findings = []
try:
with open(plist_path, "rb") as f:
data = plistlib.load(f)
except Exception as exc:
logger.warning("could not parse %s: %s", plist_path, exc)
return findings
source_files = data.get("files", [])
for diag in data.get("diagnostics", []):
location = diag.get("location", {})
file_idx = location.get("file", 0)
source_file = source_files[file_idx] if file_idx < len(source_files) else "<unknown>"
findings.append({
"file": source_file,
"line": location.get("line", 0),
"col": location.get("col", 0),
"category": diag.get("category", "uncategorized"),
"type": diag.get("type", ""),
"description": diag.get("description", ""),
})
return findings
def collect_all_findings(output_dir: Path) -> list:
"""Parse every plist file under output_dir and return merged findings."""
all_findings = []
for plist_path in collect_plist_files(output_dir):
all_findings.extend(parse_plist_findings(plist_path))
return all_findings
def log_findings(db, run_id: int, findings: list):
"""Persist each finding to z3agent.db."""
for f in findings:
db.log_finding(
run_id,
category=f["category"],
message=f["description"],
severity=f.get("type"),
file=f["file"],
line=f["line"],
details={"col": f["col"], "type": f["type"]},
)
def print_findings(findings: list):
"""Print individual findings and a category summary."""
if not findings:
print("No findings reported.")
return
for f in findings:
label = f["category"]
if f["type"]:
label = f["type"]
print(f"[{label}] {f['file']}:{f['line']}: {f['description']}")
print()
counts = Counter(f["category"] for f in findings)
print(f"Total findings: {len(findings)}")
print("By category:")
for cat, cnt in counts.most_common():
print(f" {cat}: {cnt}")
def main():
parser = argparse.ArgumentParser(
prog="static_analysis",
description="Run Clang Static Analyzer on Z3 and log findings.",
)
parser.add_argument(
"--build-dir", required=True,
help="path to the CMake build directory",
)
parser.add_argument(
"--output-dir", default=None,
help="directory for scan-build results (default: BUILD/scan-results)",
)
parser.add_argument(
"--timeout", type=int, default=1200,
help="seconds allowed for the full analysis build",
)
parser.add_argument("--db", default=None, help="path to z3agent.db")
parser.add_argument("--debug", action="store_true", help="verbose tracing")
args = parser.parse_args()
setup_logging(args.debug)
scan_build = find_scan_build()
build_dir = Path(args.build_dir).resolve()
build_dir.mkdir(parents=True, exist_ok=True)
output_dir = Path(args.output_dir) if args.output_dir else build_dir / "scan-results"
output_dir = output_dir.resolve()
output_dir.mkdir(parents=True, exist_ok=True)
db = Z3DB(args.db)
run_id = db.start_run("static-analysis", f"build_dir={build_dir}")
start = time.monotonic()
if not run_configure(scan_build, build_dir, output_dir, timeout=args.timeout):
elapsed = int((time.monotonic() - start) * 1000)
db.finish_run(run_id, "error", elapsed, exit_code=1)
db.close()
sys.exit(1)
if not run_build(scan_build, build_dir, output_dir, timeout=args.timeout):
elapsed = int((time.monotonic() - start) * 1000)
db.finish_run(run_id, "error", elapsed, exit_code=1)
db.close()
sys.exit(1)
elapsed = int((time.monotonic() - start) * 1000)
findings = collect_all_findings(output_dir)
log_findings(db, run_id, findings)
status = "clean" if len(findings) == 0 else "findings"
db.finish_run(run_id, status, elapsed, exit_code=0)
db.log(
f"static analysis complete: {len(findings)} finding(s) in {elapsed}ms",
run_id=run_id,
)
print_findings(findings)
db.close()
sys.exit(0)
if __name__ == "__main__":
main()

1
.gitignore vendored
View file

@ -117,3 +117,4 @@ genaisrc/genblogpost.genai.mts
bazel-*
# Local issue tracking
.beads
.z3-agent/

463
Z3-AGENT.md Normal file
View file

@ -0,0 +1,463 @@
# Z3 Agent
A Copilot agent for the Z3 theorem prover. It wraps 9 skills that cover
SMT solving and code quality analysis.
## What it does
The agent handles two kinds of requests:
1. **SMT solving**: formulate constraints, check satisfiability, prove
properties, optimize objectives, simplify expressions.
2. **Code quality**: run sanitizers (ASan, UBSan) and Clang Static Analyzer
against the Z3 codebase to catch memory bugs and logic errors.
## Prerequisites
You need a built Z3 binary. The scripts look for it in this order:
1. Explicit `--z3 path/to/z3`
2. `build/z3`, `build/release/z3`, `build/debug/z3` (relative to repo root)
3. `z3` on your PATH
For code quality skills you also need:
- **memory-safety**: cmake, make, and a compiler with sanitizer support
(gcc or clang). The script checks at startup and tells you what is missing.
- **static-analysis**: scan-build (part of clang-tools). Same early check
with install instructions if absent.
## Using the agent in Copilot Chat
Mention `@z3` and describe what you want in plain language.
The agent figures out which skill to use, builds the formula if needed,
runs Z3, and gives you the result.
### solve: check satisfiability
```
@z3 is (x > 0 and y > 0 and x + y < 5) satisfiable over the integers?
```
Expected response: **sat** with a model like `x = 1, y = 1`.
```
@z3 can x + y = 10 and x - y = 4 both hold at the same time?
```
Expected response: **sat** with `x = 7, y = 3`.
```
@z3 is there an integer x where x > 0, x < 0?
```
Expected response: **unsat** (no such integer exists).
### prove: check if something is always true
```
@z3 prove that x * x >= 0 for all integers x
```
Expected response: **valid** (the negation is unsatisfiable, so the property holds).
```
@z3 is it true that (a + b)^2 >= 0 for all real a and b?
```
Expected response: **valid**.
```
@z3 prove that if x > y and y > z then x > z, for integers
```
Expected response: **valid** (transitivity of >).
### optimize: find the best value
```
@z3 maximize 3x + 2y where x >= 1, y >= 1, and x + y <= 20
```
Expected response: **sat** with `x = 19, y = 1` (objective = 59).
```
@z3 minimize x + y where x >= 5, y >= 3, and x + y >= 10
```
Expected response: **sat** with `x = 5, y = 5` or similar (objective = 10).
### simplify: reduce an expression
```
@z3 simplify x + 0 + 1*x
```
Expected response: `2*x`.
```
@z3 simplify (a and true) or (a and false)
```
Expected response: `a`.
### encode: translate a problem to SMT-LIB2
```
@z3 encode this as SMT-LIB2: find integers x and y where x + y = 10 and x > y
```
Expected response: the SMT-LIB2 formula:
```
(declare-const x Int)
(declare-const y Int)
(assert (= (+ x y) 10))
(assert (> x y))
(check-sat)
(get-model)
```
### explain: interpret Z3 output
```
@z3 what does this Z3 output mean?
sat
(
(define-fun x () Int 7)
(define-fun y () Int 3)
)
```
Expected response: a readable summary like "satisfying assignment: x = 7, y = 3".
```
@z3 Z3 returned unknown, what does that mean?
```
Expected response: an explanation of common causes (timeout, incomplete theory, quantifiers).
### benchmark: measure performance
```
@z3 how fast can Z3 solve (x > 0 and y > 0 and x + y < 100)? run it 5 times
```
Expected response: timing stats like min/median/max in milliseconds and the result.
### memory-safety: find memory bugs in Z3
```
@z3 run AddressSanitizer on the Z3 test suite
```
Expected response: builds Z3 with ASan, runs the tests, reports any findings
with category, file, and line number. If clean, reports no findings.
```
@z3 check for undefined behavior in Z3
```
Expected response: runs UBSan, same format.
```
@z3 run both sanitizers
```
Expected response: runs ASan and UBSan, aggregates findings from both.
### static-analysis: find bugs without running the code
```
@z3 run static analysis on the Z3 source
```
Expected response: runs Clang Static Analyzer, reports findings grouped by
category (null dereference, dead store, memory leak, etc.) with file and line.
### multi-skill: the agent chains skills when needed
```
@z3 prove that for all integers, if x^2 is even then x is even
```
The agent uses **encode** to formalize and negate the statement, then
**prove** to check it, then **explain** to present the result.
```
@z3 full verification pass before the release
```
The agent runs **memory-safety** (ASan + UBSan) and **static-analysis**
in parallel, then aggregates and deduplicates findings sorted by severity.
## Using the scripts directly
Every skill lives under `.github/skills/<name>/scripts/`. All scripts
accept `--debug` for full tracing and `--db path` to specify where the
SQLite log goes (defaults to `z3agent.db` in the current directory).
### solve
Check whether a set of constraints has a solution.
```
python3 .github/skills/solve/scripts/solve.py \
--z3 build/release/z3 \
--formula '
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (> y 0))
(assert (< (+ x y) 5))
(check-sat)
(get-model)'
```
Output:
```
sat
x = 1
y = 1
```
### prove
Check whether a property holds for all values. The script negates your
conjecture and asks Z3 if the negation is satisfiable. If it is not,
the property is valid.
```
python3 .github/skills/prove/scripts/prove.py \
--z3 build/release/z3 \
--conjecture '(>= (* x x) 0)' \
--vars 'x:Int'
```
Output:
```
valid
```
If Z3 finds a counterexample, it prints `invalid` followed by the
counterexample values.
### optimize
Find the best value of an objective subject to constraints.
```
python3 .github/skills/optimize/scripts/optimize.py \
--z3 build/release/z3 \
--formula '
(declare-const x Int)
(declare-const y Int)
(assert (>= x 1))
(assert (>= y 1))
(assert (<= (+ x y) 20))
(maximize (+ (* 3 x) (* 2 y)))
(check-sat)
(get-model)'
```
Output:
```
sat
x = 19
y = 1
```
Here Z3 maximizes `3x + 2y` under the constraint `x + y <= 20`, so it
pushes x as high as possible (19) and keeps y at its minimum (1),
giving `3*19 + 2*1 = 59`.
### simplify
Reduce expressions using Z3 tactic chains.
```
python3 .github/skills/simplify/scripts/simplify.py \
--z3 build/release/z3 \
--formula '(declare-const x Int)(simplify (+ x 0 (* 1 x)))'
```
Output:
```
(* 2 x)
(goals
(goal
:precision precise :depth 1)
)
```
Z3 simplified `x + 0 + 1*x` down to `2*x`.
### benchmark
Measure solving time over multiple runs.
```
python3 .github/skills/benchmark/scripts/benchmark.py \
--z3 build/release/z3 \
--runs 5 \
--formula '
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (> y 0))
(assert (< (+ x y) 100))
(check-sat)'
```
Output (times will vary on your machine):
```
runs: 5
min: 27ms
median: 28ms
max: 30ms
result: sat
```
### explain
Interpret Z3 output in readable form. It reads from stdin:
```
echo 'sat
(
(define-fun x () Int
19)
(define-fun y () Int
1)
)' | python3 .github/skills/explain/scripts/explain.py --stdin --type model
```
Output:
```
satisfying assignment:
x = 19
y = 1
```
### encode
Validate that an SMT-LIB2 file is well-formed by running it through Z3:
```
python3 .github/skills/encode/scripts/encode.py \
--z3 build/release/z3 \
--validate problem.smt2
```
If the file parses and runs without errors, it prints the formula back.
If there are syntax or sort errors, it prints the Z3 error message.
### memory-safety
Build Z3 with AddressSanitizer or UndefinedBehaviorSanitizer, run the
test suite, and collect any findings.
```
python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer asan
python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer ubsan
python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer both
```
Use `--skip-build` to reuse a previous instrumented build. Use
`--build-dir path` to control where the build goes (defaults to
`build/sanitizer-asan` or `build/sanitizer-ubsan` under the repo root).
If cmake, make, or a C compiler is not found, the script prints what
you need to install and exits.
### static-analysis
Run Clang Static Analyzer over the Z3 source tree.
```
python3 .github/skills/static-analysis/scripts/static_analysis.py \
--build-dir build/scan
```
Results go to `build/scan/scan-results/` by default. Findings are
printed grouped by category with file and line number.
If `scan-build` is not on your PATH, the script prints install
instructions for Ubuntu, macOS, and Fedora.
## Debug tracing
Add `--debug` to any command to see the full trace: run IDs, z3 binary
path, the exact command and stdin sent to Z3, stdout/stderr received,
timing, and database logging. Example:
```
python3 .github/skills/solve/scripts/solve.py \
--z3 build/release/z3 --debug \
--formula '
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (> y 0))
(assert (< (+ x y) 5))
(check-sat)
(get-model)'
```
```
[DEBUG] started run 31 (skill=solve, hash=d64beb5a61842362)
[DEBUG] found z3: build/release/z3
[DEBUG] cmd: build/release/z3 -in
[DEBUG] stdin:
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (> y 0))
(assert (< (+ x y) 5))
(check-sat)
(get-model)
[DEBUG] exit_code=0 duration=28ms
[DEBUG] stdout:
sat
(
(define-fun x () Int
1)
(define-fun y () Int
1)
)
[DEBUG] finished run 31: sat (28ms)
sat
x = 1
y = 1
```
## Logging
Every run is logged to a SQLite database (`z3agent.db` by default).
You can query it directly:
```
sqlite3 z3agent.db "SELECT id, skill, status, duration_ms FROM runs ORDER BY id DESC LIMIT 10;"
```
Use `--db /path/to/file.db` on any script to put the database somewhere
else.
## Skill list
| Skill | What it does |
|-------|-------------|
| solve | check satisfiability, extract models or unsat cores |
| prove | prove validity by negating and checking unsatisfiability |
| optimize | minimize or maximize objectives under constraints |
| simplify | reduce formulas with Z3 tactic chains |
| encode | translate problems into SMT-LIB2, validate syntax |
| explain | interpret Z3 output (models, cores, stats, errors) |
| benchmark | measure solving time, collect statistics |
| memory-safety | run ASan/UBSan on Z3 test suite |
| static-analysis | run Clang Static Analyzer on Z3 source |