mirror of https://github.com/Z3Prover/z3 synced 2026-03-14 09:09:58 +00:00

Add Copilot skill architecture with 10 skills, 2 agents, and shared infra

Introduce .github/skills/ with solve, prove, optimize, simplify,
encode, explain, benchmark, memory-safety, static-analysis, and
deeptest skills. Each skill follows a SKILL.md + scripts/ pattern
with Python scripts backed by a shared SQLite logging library
(z3db.py). Two orchestrator agents (z3-solver, z3-verifier) route
requests to the appropriate skills.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Angelica Moreira 2026-03-11 17:41:29 +00:00
parent 1cba7cb5ee
commit d349b93d1d
25 changed files with 2784 additions and 0 deletions

.github/agents/z3-solver.md

@ -0,0 +1,129 @@
---
name: z3-solver
description: 'Z3 theorem prover assistant: satisfiability checking, validity proofs, optimization, simplification, encoding, and performance analysis.'
---
## Instructions
You are the Z3 Solver Agent, a Copilot agent for SMT solving workflows using the Z3 theorem prover. You help users formulate, solve, optimize, and interpret constraint satisfaction problems. Follow the workflow below. Use subagents for long-running skill invocations such as benchmarking.
### Workflow
1. **Understand the Request**: Determine what the user needs: a satisfiability check, a validity proof, an optimization, a simplification, an encoding from natural language, an explanation of output, or a performance analysis.
2. **Encode (if needed)**: If the user provides a problem in natural language, pseudocode, or a domain-specific formulation, translate it into SMT-LIB2 using the **encode** skill before proceeding.
3. **Solve or Transform**: Route to the appropriate skill based on the request type. Multiple skills may be chained when the task requires it (for example, encoding followed by optimization followed by explanation).
4. **Explain Results**: After solving, invoke **explain** to present the result in clear, human-readable language. Always interpret models, proofs, and optimization results for the user.
5. **Iterate**: On follow-up queries, refine the formulation or re-invoke skills with adjusted parameters. Do not re-run the full pipeline when only a narrow adjustment is needed.
### Available Skills
| # | Skill | Purpose |
|---|-------|---------|
| 1 | solve | Check satisfiability of a formula. Extract models when satisfiable. Report unsatisfiable cores when unsat. |
| 2 | prove | Establish validity of a formula by checking the negation for unsatisfiability. If the negation is unsat, the original is valid. |
| 3 | optimize | Solve constrained optimization problems. Supports minimize and maximize objectives, lexicographic and Pareto modes. |
| 4 | simplify | Apply Z3 tactics to reduce formula complexity. Useful for preprocessing, normal form conversion, and human-readable reformulation. |
| 5 | encode | Translate a problem description into SMT-LIB2 syntax. Handles sort selection, quantifier introduction, and theory annotation. |
| 6 | explain | Interpret Z3 output (models, unsat cores, proofs, optimization results, statistics) and present it in plain language. |
| 7 | benchmark | Measure solving performance. Collect statistics, compare tactic configurations, identify bottlenecks, and suggest parameter tuning. |
### Skill Dependencies
The planner respects these edges:
```
encode --> solve
encode --> prove
encode --> optimize
encode --> simplify
solve --> explain
prove --> explain
optimize --> explain
simplify --> explain
benchmark --> explain
solve --> benchmark
optimize --> benchmark
```
Skills on the left must complete before skills on the right when both appear in a pipeline. Independent skills (for example, solve and optimize on separate formulas) may run in parallel.
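A minimal sketch of how a planner could order a selected set of skills under these edges, assuming Python 3.9+ for `graphlib`. The names `EDGES` and `plan` are illustrative and are not part of the skill scripts.

```python
from graphlib import TopologicalSorter

# Dependency edges copied from the graph above: (prerequisite, dependent).
EDGES = [
    ("encode", "solve"), ("encode", "prove"), ("encode", "optimize"),
    ("encode", "simplify"), ("solve", "explain"), ("prove", "explain"),
    ("optimize", "explain"), ("simplify", "explain"),
    ("benchmark", "explain"), ("solve", "benchmark"),
    ("optimize", "benchmark"),
]


def plan(selected):
    """Order the selected skills so that every prerequisite runs first."""
    selected = set(selected)
    sorter = TopologicalSorter({skill: set() for skill in selected})
    for before, after in EDGES:
        # Only edges whose endpoints are both in the pipeline constrain it.
        if before in selected and after in selected:
            sorter.add(after, before)
    return list(sorter.static_order())


print(plan(["explain", "optimize", "encode"]))
```

Skills with no edge between them come back in no particular relative order, which matches the statement that independent skills may run in parallel.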
### Skill Selection
Given a user request, select skills as follows:
- "Is this formula satisfiable?" : `solve`
- "Find a model for these constraints" : `solve` then `explain`
- "Prove that P implies Q" : `encode` (if needed) then `prove` then `explain`
- "Prove this is always true" : `prove` then `explain`
- "Optimize this scheduling problem" : `encode` then `optimize` then `explain`
- "Minimize cost subject to constraints" : `optimize` then `explain`
- "Simplify this expression" : `simplify` then `explain`
- "Convert to CNF" : `simplify`
- "Translate this problem to SMT-LIB2" : `encode`
- "Why is Z3 returning unknown?" : `explain`
- "Why is this query slow?" : `benchmark` then `explain`
- "Compare these two tactic pipelines" : `benchmark` then `explain`
- "What does this model mean?" : `explain`
- "Get the unsat core" : `solve` then `explain`
When the request is ambiguous, prefer the most informative pipeline. For example, "check this formula" should invoke `solve` followed by `explain`, not `solve` alone.
### Examples
User: "Is (x > 0 and y > 0 and x + y < 1) satisfiable over the reals?"
1. **solve**: Assert the conjunction over real-valued variables. Run `(check-sat)`.
2. **explain**: If sat, present the model. If unsat, state that no assignment satisfies all three constraints simultaneously.
User: "Prove that for all integers x, if x^2 is even then x is even."
1. **encode**: Formalize the statement. Negate it: assert there exists an integer x such that x^2 is even and x is odd.
2. **prove**: Check the negation for unsatisfiability.
3. **explain**: If unsat, the original statement is valid. Present the reasoning. If sat (counterexample found), report the model and explain why the conjecture fails.
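The negation step in this example can be sketched as a small script builder. This is an illustration only: `negation_query` is a hypothetical helper, and the `mod`-based encoding of evenness is one possible formalization, not necessarily what the **encode** skill produces.

```python
def negation_query(declarations, claim):
    """Build an SMT-LIB2 script that asserts the negation of `claim`.

    If Z3 answers `unsat` on this script, the claim holds for every value
    of the declared constants; `sat` means a counterexample exists.
    """
    lines = list(declarations)
    lines.append(f"(assert (not {claim}))")
    lines.append("(check-sat)")
    return "\n".join(lines)


# "If x^2 is even then x is even", with x left as a free integer constant.
script = negation_query(
    ["(declare-const x Int)"],
    "(=> (= (mod (* x x) 2) 0) (= (mod x 2) 0))",
)
print(script)
```

The claim contains a nonlinear term (`(* x x)`), so Z3 may answer `unknown` instead of `unsat`; that case is covered under Error Handling.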
User: "Schedule five tasks on two machines to minimize makespan."
1. **encode**: Define integer variables for task assignments and start times. Encode machine capacity, precedence, and non-overlap constraints.
2. **optimize**: Minimize the makespan variable subject to the encoded constraints.
3. **explain**: Present the optimal schedule, makespan value, and any binding constraints.
User: "Why is my bitvector query so slow?"
1. **benchmark**: Run the query with `(set-option :timeout 30000)` and collect statistics via `(get-info :all-statistics)`.
2. **explain**: Identify dominant cost centers (conflict count, propagation ratio, theory combination overhead). Suggest tactic or parameter adjustments such as `:blast_full` for bitvectors or changing the `smt.relevancy` level.
### Error Handling
Z3 may return results other than `sat` or `unsat`. Handle each case as follows:
**unknown**: Z3 could not determine satisfiability within the given resource limits.
- Check if a timeout was active. If so, suggest increasing it.
- Inspect the reason with `(get-info :reason-unknown)`.
- If the reason is "incomplete," the formula may use a theory fragment that Z3 cannot decide. Suggest alternative encodings (for example, replacing nonlinear arithmetic with linearization or bit-blasting).
- If the reason is "timeout" or "max-conflicts," suggest parameter tuning: increase `:timeout`, adjust `:smt.relevancy`, or try a different tactic pipeline.
**error (syntax or sort mismatch)**: The input is malformed.
- Report the exact error message from Z3.
- Identify the offending declaration or assertion.
- Suggest a corrected encoding.
**error (resource exhaustion)**: Z3 ran out of memory or hit an internal limit.
- Suggest simplifying the problem: reduce variable count, eliminate quantifiers where possible, split into subproblems.
- Suggest incremental solving with `(push)` / `(pop)` to reuse learned information.
**unsat with no core requested**: The formula is unsatisfiable but the user may want to understand why.
- Offer to re-run with `(set-option :produce-unsat-cores true)` and named assertions to extract a minimal explanation.
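A minimal sketch of the re-run described in the last case, assuming the assertions are available as separate strings. The helper `with_named_assertions` and the labels `a0`, `a1`, ... are hypothetical; only the SMT-LIB2 commands themselves are standard.

```python
def with_named_assertions(declarations, assertions):
    """Wrap each assertion in a named term so it can appear in an unsat core."""
    lines = ["(set-option :produce-unsat-cores true)"]
    lines.extend(declarations)
    for i, body in enumerate(assertions):
        lines.append(f"(assert (! {body} :named a{i}))")
    lines.append("(check-sat)")
    lines.append("(get-unsat-core)")
    return "\n".join(lines)


script = with_named_assertions(
    ["(declare-const x Int)"],
    ["(> x 0)", "(< x 0)", "(> x -5)"],
)
print(script)
```

The labels returned by `(get-unsat-core)` can then be mapped back to the original assertions when the **explain** skill presents the conflict.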
### Notes
- Always validate SMT-LIB2 syntax before invoking Z3. A malformed input wastes time and produces confusing errors.
- Prefer incremental mode (`(push)` / `(pop)`) when the user is iterating on a formula.
- Use `(set-option :produce-models true)` by default for satisfiability queries.
- Use `(set-option :produce-proofs true)` when the user requests validity proofs.
- Collect statistics with `z3 -st` when performance is relevant.
- Present models in a readable table format, not raw S-expressions, unless the user requests SMT-LIB2 output.
- Never fabricate results. If a skill fails or Z3 produces an unexpected answer, report the raw output and explain what went wrong.

.github/agents/z3-verifier.md

@ -0,0 +1,131 @@
---
name: z3-verifier
description: 'Z3 code quality agent: memory safety checking, static analysis, and stress testing for the Z3 codebase itself.'
---
## Instructions
You are the Z3 Verifier Agent, a Copilot agent for code quality and correctness verification of the Z3 theorem prover codebase. You do not solve SMT problems (use **z3-solver** for that). Instead, you detect bugs, enforce code quality, and stress-test Z3 internals. Follow the workflow below. Use subagents for long-running skill invocations such as fuzzing campaigns.
### Workflow
1. **Identify the Verification Goal**: Determine what the user needs: memory bug detection, static analysis findings, or stress testing results. If the request is broad ("verify this code" or "full verification pass"), run all three skills.
2. **Build the Target**: Ensure a Z3 build exists with the required instrumentation (sanitizers, debug symbols, coverage). If not, build one before proceeding.
3. **Run Verification Skills**: Invoke the appropriate skill(s). When running a full verification pass, execute all three skills and aggregate results.
4. **Report Findings**: Present results sorted by severity. Each finding should include: location (file, function, line), category, severity, and reproduction steps where applicable.
5. **Iterate**: On follow-ups, narrow scope to specific files, functions, or bug categories. Do not re-run the full pipeline unnecessarily.
### Available Skills
| # | Skill | Purpose |
|---|-------|---------|
| 1 | memory-safety | Build Z3 with AddressSanitizer (ASan), MemorySanitizer (MSan), or UndefinedBehaviorSanitizer (UBSan). Run the test suite under instrumentation to detect memory corruption, use-after-free, buffer overflows, uninitialized reads, and undefined behavior. |
| 2 | static-analysis | Run the Clang Static Analyzer over the Z3 source tree. Detects null pointer dereferences, resource leaks, dead stores, logic errors, and API misuse without executing the code. |
| 3 | deeptest | Stress-test Z3 with randomized inputs, differential testing against known-good solvers, and targeted fuzzing of parser and solver components. Detects crashes, assertion failures, and correctness regressions. |
### Skill Dependencies
```
memory-safety (independent)
static-analysis (independent)
deeptest (independent)
```
All three skills are independent and may run in parallel. None requires the output of another as input. When running a full verification pass, launch all three simultaneously via subagents.
### Skill Selection
Given a user request, select skills as follows:
- "Check for memory bugs" : `memory-safety`
- "Run ASan on the test suite" : `memory-safety`
- "Find undefined behavior" : `memory-safety` (with UBSan configuration)
- "Run static analysis" : `static-analysis`
- "Find null pointer bugs" : `static-analysis`
- "Check for resource leaks" : `static-analysis`
- "Fuzz Z3" : `deeptest`
- "Stress test the parser" : `deeptest`
- "Run differential testing" : `deeptest`
- "Full verification pass" : `memory-safety` + `static-analysis` + `deeptest`
- "Verify this pull request" : `memory-safety` + `static-analysis` (scope to changed files)
- "Is this change safe?" : `memory-safety` + `static-analysis` (scope to changed files)
### Examples
User: "Check for memory bugs in the SAT solver."
1. **memory-safety**: Build Z3 with ASan enabled (`cmake -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" ..`). Run the SAT solver tests. Collect any sanitizer reports.
2. Report findings with stack traces, categorized by bug type (heap-buffer-overflow, use-after-free, stack-buffer-overflow, etc.).
User: "Run static analysis on src/ast/."
1. **static-analysis**: Invoke `scan-build` or `clang-tidy` over `src/ast/` with Z3's compile commands database.
2. Report findings sorted by severity. Include checker name, file, line, and a brief description of each issue.
User: "Fuzz the SMT-LIB2 parser."
1. **deeptest**: Generate randomized SMT-LIB2 inputs targeting the parser. Run Z3 on each input with a timeout. Collect crashes, assertion failures, and unexpected error messages.
2. Report crash-inducing inputs with minimized reproduction cases. Classify findings as crashes, assertion violations, or incorrect results.
User: "Full verification pass before the release."
1. Launch all three skills in parallel via subagents:
- **memory-safety**: Full test suite under ASan and UBSan.
- **static-analysis**: Full source tree scan.
- **deeptest**: Broad fuzzing campaign across theories (arithmetic, bitvectors, arrays, strings).
2. Aggregate all findings. Deduplicate issues that appear in multiple skills (for example, a null dereference found by both static analysis and ASan). Sort by severity: Critical, High, Medium, Low.
3. Present a summary table followed by detailed findings.
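The aggregation step can be sketched as follows. The record fields (`skill`, `file`, `line`, `category`, `severity`) and the choice of deduplication key are assumptions made for illustration; the agent definition does not prescribe a data format.

```python
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def aggregate(findings):
    """Deduplicate findings by location and category, then sort by severity."""
    merged = {}
    for f in findings:
        key = (f["file"], f["line"], f["category"])
        if key in merged:
            # Same issue seen by another skill: note the source and keep
            # the more severe of the two ratings.
            entry = merged[key]
            entry["skills"].add(f["skill"])
            if SEVERITY_ORDER[f["severity"]] < SEVERITY_ORDER[entry["severity"]]:
                entry["severity"] = f["severity"]
        else:
            merged[key] = {**f, "skills": {f["skill"]}}
    return sorted(
        merged.values(),
        key=lambda e: (SEVERITY_ORDER[e["severity"]], e["file"], e["line"]),
    )


# Made-up sample records, for illustration only.
sample = [
    {"skill": "static-analysis", "file": "a.cpp", "line": 10,
     "category": "null-deref", "severity": "Medium"},
    {"skill": "memory-safety", "file": "a.cpp", "line": 10,
     "category": "null-deref", "severity": "High"},
    {"skill": "deeptest", "file": "b.cpp", "line": 3,
     "category": "crash", "severity": "Critical"},
]
for entry in aggregate(sample):
    print(entry["severity"], entry["file"], entry["line"], sorted(entry["skills"]))
```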
### Build Configurations
Each skill may require a specific build configuration:
**memory-safety (ASan)**:
```bash
mkdir build-asan && cd build-asan
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_C_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_BUILD_TYPE=Debug
make -j$(nproc)
```
**memory-safety (UBSan)**:
```bash
mkdir build-ubsan && cd build-ubsan
cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=undefined" -DCMAKE_C_FLAGS="-fsanitize=undefined" -DCMAKE_BUILD_TYPE=Debug
make -j$(nproc)
```
**static-analysis**:
```bash
mkdir build-analyze && cd build-analyze
scan-build cmake .. -DCMAKE_BUILD_TYPE=Debug
scan-build make -j$(nproc)
```
**deeptest**: Uses a standard Release build for performance, with Debug builds reserved for reproducing crashes:
```bash
mkdir build-fuzz && cd build-fuzz
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```
### Error Handling
**Build failure**: If the instrumented build fails, report the compiler errors. Common causes: sanitizer flags incompatible with certain optimization levels, or missing sanitizer runtime libraries.
**Flaky sanitizer reports**: Some sanitizer findings may be nondeterministic (especially under MSan with uninitialized memory). Re-run flagged tests three times to confirm reproducibility. Mark non-reproducible findings as "intermittent" rather than discarding them.
**Fuzzing timeouts**: Individual fuzz inputs that cause Z3 to exceed the timeout threshold should be collected separately and reported as potential performance regressions, not crashes.
**False positives in static analysis**: The Clang Static Analyzer may produce false positives, particularly around custom allocators and reference-counted smart pointers used in Z3. Flag likely false positives but do not suppress them without user confirmation.
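The re-run policy for flaky sanitizer reports can be sketched as below. This is one reading of the rule, assuming a finding counts as reproducible only if every re-run triggers it again; `classify_reproducibility` and `replay` are hypothetical names.

```python
def classify_reproducibility(run_test, attempts=3):
    """Re-run a flagged test and label the finding.

    `run_test` is a zero-argument callable that returns True when the
    sanitizer report shows up again.
    """
    hits = sum(1 for _ in range(attempts) if run_test())
    # Findings that do not reproduce every time are kept, not discarded.
    return "reproducible" if hits == attempts else "intermittent"


def replay(outcomes):
    """Stand-in for a real test runner: replays a fixed list of outcomes."""
    it = iter(outcomes)
    return lambda: next(it)


print(classify_reproducibility(replay([True, True, True])))
print(classify_reproducibility(replay([True, False, True])))
```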
### Notes
- Sanitizer builds are significantly slower than Release builds. Set timeouts to at least 3x the normal test suite duration.
- Store sanitizer reports and fuzzing artifacts in `.z3-verifier/` unless the user specifies otherwise.
- When scoping to changed files for pull request verification, use `git diff` to determine the affected source files and limit skill invocations accordingly.
- Never suppress or ignore sanitizer findings automatically. Every report should be presented to the user for triage.
- Prefer ASan as the default sanitizer. It catches the broadest class of memory errors with the lowest false-positive rate.

.github/skills/README.md

@ -0,0 +1,74 @@
# Z3 Agent Skills
Reusable, composable verification primitives for the Z3 theorem prover.
Each skill is a self-contained unit: a `SKILL.md` prompt that guides the
LLM agent, backed by a Python validation script in `scripts/`.
## Skill Catalogue
| Skill | Status | Description |
|-------|--------|-------------|
| solve | implemented | Check satisfiability of SMT-LIB2 formulas; return models or unsat cores |
| prove | implemented | Prove validity by negation and satisfiability checking |
| encode | implemented | Translate constraint problems into SMT-LIB2 or Z3 Python API code |
| simplify | implemented | Reduce formula complexity using configurable Z3 tactic chains |
| optimize | implemented | Solve constrained optimization (minimize/maximize) over numeric domains |
| explain | implemented | Parse and interpret Z3 output: models, cores, statistics, errors |
| benchmark | implemented | Measure Z3 performance and collect solver statistics |
| static-analysis | planned | Run Clang Static Analyzer on Z3 source and log structured findings |
| deeptest | planned | Deep property-based testing of Z3 internals |
| memory-safety | planned | Memory safety verification of Z3 C++ source |
## Agents
Two orchestration agents compose these skills into end-to-end workflows:
| Agent | Role |
|-------|------|
| z3-solver | Formulation and solving: encode, solve, prove, simplify, optimize, explain, benchmark |
| z3-verifier | Codebase quality: static-analysis, deeptest, memory-safety |
## Shared Infrastructure
All scripts share a common library at `shared/z3db.py` with:
* `Z3DB`: SQLite wrapper for tracking runs, formulas, findings, and interaction logs.
* `run_z3()`: Pipe SMT-LIB2 into `z3 -in` with timeout handling.
* `find_z3()`: Locate the Z3 binary across build directories and PATH.
* Parsers: `parse_model()`, `parse_stats()`, `parse_unsat_core()`.
The database schema lives in `shared/schema.sql`.
## Relationship to a3/ Workflows
The `a3/` directory at the repository root contains two existing agentic workflow
prompts that predate the skill architecture:
* `a3/a3-python.md`: A3 Python Code Analysis agent (uses the a3-python pip tool
to scan Python source, classifies findings, creates GitHub issues).
* `a3/a3-rust.md`: A3 Rust Verifier Output Analyzer (downloads a3-rust build
artifacts, parses bug reports, creates GitHub discussions).
These workflows are complementary to the skills defined here, not replaced by
them. The a3 prompts focus on external analysis tooling and GitHub integration,
while skills focus on Z3 solver operations and their validation. Both may be
composed by the same orchestrating agent.
## Usage
Check database status and recent runs:
```
python shared/z3db.py status
python shared/z3db.py runs --skill solve --last 5
python shared/z3db.py log --run-id 12
python shared/z3db.py query "SELECT skill, COUNT(*) FROM runs GROUP BY skill"
```
Run an individual skill script directly:
```
python solve/scripts/solve.py --file problem.smt2
python encode/scripts/encode.py --validate formula.smt2
python benchmark/scripts/benchmark.py --file problem.smt2
```

.github/skills/benchmark/SKILL.md

@ -0,0 +1,48 @@
---
name: benchmark
description: Measure Z3 performance on a formula or file. Collects wall-clock time, theory solver statistics, memory usage, and conflict counts. Results are logged to z3agent.db for longitudinal tracking.
---
Given an SMT-LIB2 formula or file, run Z3 with statistics enabled and report performance characteristics. This is useful for identifying performance regressions, comparing tactic strategies, and profiling theory solver workload distribution.
# Step 1: Run Z3 with statistics
```bash
python3 scripts/benchmark.py --file problem.smt2
python3 scripts/benchmark.py --file problem.smt2 --runs 5
python3 scripts/benchmark.py --formula "(declare-const x Int)..." --debug
```
The script invokes `z3 -st` and parses the `:key value` statistics block.
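A rough sketch of what parsing a `:key value` block can look like. It is not the `parse_stats()` implementation from `z3db.py`, which is not shown here; the regular expression, the function name `parse_stats_sketch`, and the sample numbers are all assumptions.

```python
import re

# Matches pairs such as ":max-memory   17.05" or ":num-allocs 4521".
STAT_RE = re.compile(r":([\w.\-]+)\s+(-?\d+(?:\.\d+)?)")


def parse_stats_sketch(output):
    """Collect numeric `:key value` pairs from Z3 statistics output."""
    stats = {}
    for key, value in STAT_RE.findall(output):
        stats[key] = float(value) if "." in value else int(value)
    return stats


# Made-up sample in the general shape of a statistics block.
sample = """sat
(:max-memory   17.05
 :memory       16.80
 :num-allocs   4521
 :rlimit-count 310
 :time         0.01)"""

print(parse_stats_sketch(sample))
```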
# Step 2: Interpret the output
The output includes:
- wall-clock time (ms)
- result (sat/unsat/unknown/timeout)
- memory usage (MB)
- conflicts, decisions, propagations
- per-theory breakdown (arithmetic, bv, array, etc.)
With `--runs N`, the script runs Z3 N times and reports min/median/max timing.
# Step 3: Compare over time
Past benchmark runs are logged to `z3agent.db`. Query them:
```bash
python3 ../../shared/z3db.py runs --skill benchmark --last 20
python3 ../../shared/z3db.py query "SELECT smtlib2, result, stats FROM formulas WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') ORDER BY run_id DESC LIMIT 5"
```
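The second query can be tried against an in-memory database. The table layout below is only inferred from the column names used in the query and is an assumption; the real schema lives in `shared/schema.sql` and may differ, as may the storage format of `stats` (JSON text is assumed here).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema, inferred from the query above.
conn.executescript("""
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, skill TEXT);
CREATE TABLE formulas (run_id INTEGER, smtlib2 TEXT, result TEXT, stats TEXT);
""")
conn.execute("INSERT INTO runs (run_id, skill) VALUES (1, 'benchmark')")
conn.execute("INSERT INTO runs (run_id, skill) VALUES (2, 'solve')")
conn.execute(
    "INSERT INTO formulas VALUES (1, '(check-sat)', 'sat', ?)",
    (json.dumps({"time": 0.01}),),
)
conn.execute("INSERT INTO formulas VALUES (2, '(check-sat)', 'sat', NULL)")

rows = conn.execute(
    "SELECT smtlib2, result, stats FROM formulas "
    "WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') "
    "ORDER BY run_id DESC LIMIT 5"
).fetchall()
for smtlib2, result, stats in rows:
    print(result, json.loads(stats))
conn.close()
```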
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula |
| file | path | no | | path to .smt2 file |
| runs | int | no | 1 | number of repeated runs for timing |
| timeout | int | no | 60 | seconds per run |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |


@ -0,0 +1,74 @@
#!/usr/bin/env python3
"""
benchmark.py: measure Z3 performance with statistics.

Usage:
    python benchmark.py --file problem.smt2
    python benchmark.py --file problem.smt2 --runs 5
"""

import argparse
import statistics
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_stats, setup_logging


def main():
    parser = argparse.ArgumentParser(prog="benchmark")
    parser.add_argument("--formula")
    parser.add_argument("--file")
    parser.add_argument("--runs", type=int, default=1)
    parser.add_argument("--timeout", type=int, default=60)
    parser.add_argument("--z3", default=None)
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        formula = Path(args.file).read_text()
    elif args.formula:
        formula = args.formula
    else:
        parser.error("provide --formula or --file")
        return

    db = Z3DB(args.db)
    timings = []

    for i in range(args.runs):
        run_id = db.start_run("benchmark", formula)
        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
                        args=["-st"], debug=args.debug)
        stats = parse_stats(result["stdout"])
        db.log_formula(run_id, formula, result["result"], stats=stats)
        db.finish_run(run_id, result["result"], result["duration_ms"],
                      result["exit_code"])
        timings.append(result["duration_ms"])

        if args.runs == 1:
            print(f"result: {result['result']}")
            print(f"time: {result['duration_ms']}ms")
            if stats:
                print("statistics:")
                for k, v in sorted(stats.items()):
                    print(f"  :{k} {v}")

    if args.runs > 1:
        print(f"runs: {args.runs}")
        print(f"min: {min(timings)}ms")
        print(f"median: {statistics.median(timings):.0f}ms")
        print(f"max: {max(timings)}ms")
        print(f"result: {result['result']}")

    db.close()
    sys.exit(0 if result["exit_code"] == 0 else 1)


if __name__ == "__main__":
    main()

.github/skills/deeptest/SKILL.md

@ -0,0 +1,70 @@
---
name: deeptest
description: Generate stress tests and differential tests for Z3 theories. Creates random or structured SMT-LIB2 formulas, runs them through Z3, and checks for crashes, assertion failures, or result inconsistencies. Inspired by fuzzing and metamorphic testing approaches applied to SMT solvers.
---
Given a strategy and count, generate SMT-LIB2 formulas targeting Z3 internals and report anomalies. Strategies range from pure random generation to structured metamorphic and cross-theory combinations. Every formula and finding is logged to z3agent.db.
# Step 1: Choose a strategy and run
```bash
python3 scripts/deeptest.py --strategy random --count 100 --seed 42
python3 scripts/deeptest.py --strategy metamorphic --seed-file base.smt2 --count 50
python3 scripts/deeptest.py --strategy cross-theory --theories "LIA,BV" --count 80
python3 scripts/deeptest.py --strategy incremental --count 60 --debug
```
Available strategies:
- `random`: generate formulas with random declarations (Int, Bool, BitVec), random arithmetic and boolean assertions, and check-sat.
- `metamorphic`: start from a base formula (generated or loaded from file), apply equisatisfiable transformations (tautology insertion, double negation, assertion duplication), and verify the result stays consistent.
- `cross-theory`: combine multiple theories (LIA, Bool, BV) in a single formula with bridging constraints to stress theory combination.
- `incremental`: generate push/pop sequences with per-frame assertions to stress incremental solving.
# Step 2: Interpret the output
The script prints a summary after completion:
```
strategy: random
seed: 42
formulas: 100
anomalies: 2
elapsed: 4500ms
```
A nonzero anomaly count means the run detected crashes (nonzero exit code), assertion failures in stderr, solver errors, or result disagreements between a base formula and its metamorphic variants.
# Step 3: Inspect findings
Findings are logged to `z3agent.db` with category, severity, and details:
```bash
python3 ../../shared/z3db.py query "SELECT category, severity, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='deeptest') ORDER BY finding_id DESC LIMIT 20"
```
Each finding includes the formula index, exit code, and a stderr excerpt for triage.
# Step 4: Reproduce
Use the `--seed` parameter to reproduce a run exactly:
```bash
python3 scripts/deeptest.py --strategy random --count 100 --seed 42
```
The seed is printed in every run summary and logged in the run record.
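Why a fixed seed reproduces a run: every random choice is drawn from a generator seeded once, so the same seed yields the same sequence of formulas. A minimal sketch, where `sample_assertions` is a hypothetical stand-in for the script's generators:

```python
import random


def sample_assertions(seed, count=3):
    """Generate a few integer assertions from a privately seeded generator."""
    rng = random.Random(seed)
    ops = [">", "<", ">=", "<=", "="]
    return [
        f"(assert ({rng.choice(ops)} v0 {rng.randint(-10, 10)}))"
        for _ in range(count)
    ]


first = sample_assertions(42)
second = sample_assertions(42)
print(first == second)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sequence independent of any other code that draws random numbers.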
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| strategy | string | no | random | test generation strategy: random, metamorphic, cross-theory, incremental |
| count | int | no | 50 | number of formulas to generate |
| seed | int | no | clock | random seed for reproducibility |
| seed-file | path | no | | base .smt2 file for metamorphic strategy |
| theories | string | no | LIA,BV | comma-separated theories for cross-theory strategy |
| timeout | int | no | 10 | per-formula Z3 timeout in seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |


@ -0,0 +1,393 @@
#!/usr/bin/env python3
"""
deeptest.py: generate and run stress tests for Z3.

Usage:
    python deeptest.py --strategy random --count 100
    python deeptest.py --strategy metamorphic --seed-file base.smt2
    python deeptest.py --strategy cross-theory --theories "LIA,BV" --debug
"""

import argparse
import logging
import random
import sys
import time
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, setup_logging

log = logging.getLogger("deeptest")

# Sort and operator tables

THEORY_SORTS = {
    "LIA": "Int",
    "Bool": "Bool",
    "BV": "(_ BitVec 32)",
}

INT_ARITH = ["+", "-", "*"]
INT_CMP = [">", "<", ">=", "<=", "="]
BV_ARITH = ["bvadd", "bvsub", "bvand", "bvor", "bvxor"]
BV_CMP = ["bvslt", "bvsgt", "bvsle", "bvsge", "="]


# Assertion generators (one per sort)

def _int_assertion(rng, vs):
    if len(vs) < 2:
        return f"(assert ({rng.choice(INT_CMP)} {vs[0]} {rng.randint(-10, 10)}))"
    a, b = rng.sample(vs, 2)
    return f"(assert ({rng.choice(INT_CMP)} ({rng.choice(INT_ARITH)} {a} {b}) {rng.randint(-10, 10)}))"


def _bool_assertion(rng, vs):
    if len(vs) == 1:
        return f"(assert {vs[0]})" if rng.random() < 0.5 else f"(assert (not {vs[0]}))"
    a, b = rng.sample(vs, 2)
    return f"(assert ({rng.choice(['and', 'or', '=>'])} {a} {b}))"


def _bv_assertion(rng, vs):
    lit = f"(_ bv{rng.randint(0, 255)} 32)"
    if len(vs) < 2:
        return f"(assert ({rng.choice(BV_CMP)} {vs[0]} {lit}))"
    a, b = rng.sample(vs, 2)
    return f"(assert ({rng.choice(BV_CMP)} ({rng.choice(BV_ARITH)} {a} {b}) {lit}))"


SORT_ASSERTION = {
    "Int": _int_assertion,
    "Bool": _bool_assertion,
    "(_ BitVec 32)": _bv_assertion,
}


def _random_assertion(rng, vars_by_sort):
    """Pick a populated sort and emit one random assertion."""
    available = [s for s in vars_by_sort if vars_by_sort[s]]
    if not available:
        return None
    sort = rng.choice(available)
    return SORT_ASSERTION[sort](rng, vars_by_sort[sort])


# Formula generators

def gen_random_formula(rng, num_vars=5, num_assertions=5):
    """Random declarations, random assertions, single check-sat."""
    lines = []
    vars_by_sort = {}
    sorts = list(THEORY_SORTS.values())

    for i in range(num_vars):
        sort = rng.choice(sorts)
        name = f"v{i}"
        lines.append(f"(declare-const {name} {sort})")
        vars_by_sort.setdefault(sort, []).append(name)

    for _ in range(num_assertions):
        a = _random_assertion(rng, vars_by_sort)
        if a:
            lines.append(a)

    lines.append("(check-sat)")
    return "\n".join(lines)


def gen_metamorphic_variant(rng, base_formula):
    """Apply an equisatisfiable transformation to a formula.

    Transformations:
        tautology  : insert (assert true) before check-sat
        double_neg : wrap one assertion body in double negation
        duplicate  : repeat an existing assertion
    """
    lines = base_formula.strip().split("\n")
    transform = rng.choice(["tautology", "double_neg", "duplicate"])
    assertion_idxs = [i for i, l in enumerate(lines)
                      if l.strip().startswith("(assert")]

    if transform == "tautology":
        pos = next((i for i, l in enumerate(lines) if "check-sat" in l),
                   len(lines))
        lines.insert(pos, "(assert true)")
    elif transform == "double_neg" and assertion_idxs:
        idx = rng.choice(assertion_idxs)
        body = lines[idx].strip()
        inner = body[len("(assert "):-1]
        lines[idx] = f"(assert (not (not {inner})))"
    elif transform == "duplicate" and assertion_idxs:
        idx = rng.choice(assertion_idxs)
        lines.insert(idx + 1, lines[idx])

    return "\n".join(lines)


def gen_cross_theory_formula(rng, theories, num_vars=4, num_assertions=6):
    """Combine variables from multiple theories with bridging constraints."""
    lines = []
    vars_by_sort = {}
    sorts = [THEORY_SORTS[t] for t in theories if t in THEORY_SORTS]
    if not sorts:
        sorts = list(THEORY_SORTS.values())

    for i in range(num_vars):
        sort = sorts[i % len(sorts)]
        name = f"v{i}"
        lines.append(f"(declare-const {name} {sort})")
        vars_by_sort.setdefault(sort, []).append(name)

    for _ in range(num_assertions):
        a = _random_assertion(rng, vars_by_sort)
        if a:
            lines.append(a)

    # Bridge Int and Bool when both present
    int_vs = vars_by_sort.get("Int", [])
    bool_vs = vars_by_sort.get("Bool", [])
    if int_vs and bool_vs:
        iv = rng.choice(int_vs)
        bv = rng.choice(bool_vs)
        lines.append(f"(assert (= {bv} (> {iv} 0)))")

    lines.append("(check-sat)")
    return "\n".join(lines)


def gen_incremental_formula(rng, num_frames=3, num_vars=4,
                            asserts_per_frame=3):
    """Push/pop sequence: all variables declared globally, assertions scoped."""
    lines = []
    vars_by_sort = {}
    sorts = list(THEORY_SORTS.values())

    for i in range(num_vars):
        sort = rng.choice(sorts)
        name = f"v{i}"
        lines.append(f"(declare-const {name} {sort})")
        vars_by_sort.setdefault(sort, []).append(name)

    for _ in range(num_frames):
        lines.append("(push 1)")
        for _ in range(asserts_per_frame):
            a = _random_assertion(rng, vars_by_sort)
            if a:
                lines.append(a)
        lines.append("(check-sat)")
        lines.append("(pop 1)")

    lines.append("(check-sat)")
    return "\n".join(lines)


# Anomaly detection

def classify_result(result):
    """Return an anomaly category string or None if the result looks normal."""
    if result["exit_code"] != 0 and result["result"] != "timeout":
        return "crash"
    if "assertion" in result["stderr"].lower():
        return "assertion_failure"
    if result["result"] == "error":
        return "error"
    return None


# Strategy runners

def run_random(args, rng, db, run_id):
    anomalies = 0
    for i in range(args.count):
        formula = gen_random_formula(rng, rng.randint(2, 8),
                                     rng.randint(1, 10))
        log.debug("formula %d:\n%s", i, formula)
        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
                        debug=args.debug)
        db.log_formula(run_id, formula, result["result"])

        cat = classify_result(result)
        if cat:
            anomalies += 1
            db.log_finding(
                run_id, cat,
                f"random formula #{i}: {cat} (exit={result['exit_code']})",
                severity="high" if cat == "crash" else "medium",
                details={"formula_index": i,
                         "exit_code": result["exit_code"],
                         "stderr": result["stderr"][:500]})
            log.warning("anomaly in formula %d: %s", i, cat)
    return anomalies
def run_metamorphic(args, rng, db, run_id):
if args.seed_file:
base = Path(args.seed_file).read_text()
else:
base = gen_random_formula(rng, num_vars=4, num_assertions=3)
base_out = run_z3(base, z3_bin=args.z3, timeout=args.timeout,
debug=args.debug)
base_status = base_out["result"]
db.log_formula(run_id, base, base_status)
log.info("base formula result: %s", base_status)
if base_status not in ("sat", "unsat"):
db.log_finding(run_id, "skip",
f"base formula not definite: {base_status}",
severity="info")
return 0
anomalies = 0
for i in range(args.count):
variant = gen_metamorphic_variant(rng, base)
log.debug("variant %d:\n%s", i, variant)
result = run_z3(variant, z3_bin=args.z3, timeout=args.timeout,
debug=args.debug)
db.log_formula(run_id, variant, result["result"])
cat = classify_result(result)
if cat:
anomalies += 1
db.log_finding(
run_id, cat,
f"metamorphic variant #{i}: {cat}",
severity="high",
details={"variant_index": i,
"stderr": result["stderr"][:500]})
log.warning("anomaly in variant %d: %s", i, cat)
continue
if result["result"] in ("sat", "unsat") \
and result["result"] != base_status:
anomalies += 1
db.log_finding(
run_id, "disagreement",
f"variant #{i}: expected {base_status}, "
f"got {result['result']}",
severity="critical",
details={"variant_index": i,
"expected": base_status,
"actual": result["result"]})
log.warning("disagreement in variant %d: expected %s, got %s",
i, base_status, result["result"])
return anomalies

def run_cross_theory(args, rng, db, run_id):
    theories = [t.strip() for t in args.theories.split(",")]
    anomalies = 0
    for i in range(args.count):
        formula = gen_cross_theory_formula(rng, theories,
                                           rng.randint(3, 8),
                                           rng.randint(2, 10))
        log.debug("cross-theory formula %d:\n%s", i, formula)
        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
                        debug=args.debug)
        db.log_formula(run_id, formula, result["result"])
        cat = classify_result(result)
        if cat:
            anomalies += 1
            db.log_finding(
                run_id, cat,
                f"cross-theory #{i} ({','.join(theories)}): {cat}",
                severity="high" if cat == "crash" else "medium",
                details={"formula_index": i, "theories": theories,
                         "exit_code": result["exit_code"],
                         "stderr": result["stderr"][:500]})
            log.warning("anomaly in cross-theory formula %d: %s", i, cat)
    return anomalies


def run_incremental(args, rng, db, run_id):
    anomalies = 0
    for i in range(args.count):
        num_frames = rng.randint(2, 6)
        formula = gen_incremental_formula(rng, num_frames)
        log.debug("incremental formula %d:\n%s", i, formula)
        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
                        debug=args.debug)
        db.log_formula(run_id, formula, result["result"])
        cat = classify_result(result)
        if cat:
            anomalies += 1
            db.log_finding(
                run_id, cat,
                f"incremental #{i} ({num_frames} frames): {cat}",
                severity="high" if cat == "crash" else "medium",
                details={"formula_index": i, "num_frames": num_frames,
                         "exit_code": result["exit_code"],
                         "stderr": result["stderr"][:500]})
            log.warning("anomaly in incremental formula %d: %s", i, cat)
    return anomalies


STRATEGIES = {
    "random": run_random,
    "metamorphic": run_metamorphic,
    "cross-theory": run_cross_theory,
    "incremental": run_incremental,
}


# Entry point

def main():
    parser = argparse.ArgumentParser(
        prog="deeptest",
        description="Generate and run stress tests for Z3.",
    )
    parser.add_argument("--strategy", choices=list(STRATEGIES),
                        default="random",
                        help="test generation strategy")
    parser.add_argument("--count", type=int, default=50,
                        help="number of formulas to generate")
    parser.add_argument("--seed", type=int, default=None,
                        help="random seed for reproducibility")
    parser.add_argument("--seed-file", default=None,
                        help="base .smt2 file for metamorphic strategy")
    parser.add_argument("--theories", default="LIA,BV",
                        help="comma-separated theories for cross-theory")
    parser.add_argument("--timeout", type=int, default=10,
                        help="per-formula Z3 timeout in seconds")
    parser.add_argument("--z3", default=None, help="path to z3 binary")
    parser.add_argument("--db", default=None, help="path to z3agent.db")
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    seed = args.seed if args.seed is not None else int(time.time())
    rng = random.Random(seed)
    log.info("seed: %d", seed)

    db = Z3DB(args.db)
    run_id = db.start_run(
        "deeptest",
        f"strategy={args.strategy} count={args.count} seed={seed}")

    t0 = time.monotonic()
    anomalies = STRATEGIES[args.strategy](args, rng, db, run_id)
    elapsed_ms = int((time.monotonic() - t0) * 1000)

    status = "success" if anomalies == 0 else "findings"
    db.finish_run(run_id, status, elapsed_ms)

    print(f"strategy: {args.strategy}")
    print(f"seed: {seed}")
    print(f"formulas: {args.count}")
    print(f"anomalies: {anomalies}")
    print(f"elapsed: {elapsed_ms}ms")

    db.close()
    sys.exit(1 if anomalies > 0 else 0)


if __name__ == "__main__":
    main()

.github/skills/encode/SKILL.md vendored Normal file

@ -0,0 +1,45 @@
---
name: encode
description: Translate constraint problems into SMT-LIB2 or Z3 Python API code. Handles common problem classes including scheduling, graph coloring, arithmetic puzzles, and verification conditions.
---
Given a problem description (natural language, pseudocode, or a partial formulation), produce a complete, syntactically valid SMT-LIB2 encoding or Z3 Python script. The encoding should declare all variables, assert all constraints, and include the appropriate check-sat / get-model commands.
# Step 1: Identify the problem class
Common encodings:
| Problem class | Theory | Typical sorts |
|---------------|--------|---------------|
| Integer arithmetic | LIA / NIA | Int |
| Real arithmetic | LRA / NRA | Real |
| Bitvector operations | QF_BV | (_ BitVec N) |
| Arrays and maps | QF_AX | (Array Int Int) |
| Strings and regex | QF_S | String, RegLan |
| Uninterpreted functions | QF_UF | custom sorts |
| Mixed theories | AUFLIA, etc. | combination |
# Step 2: Generate the encoding
```bash
python3 scripts/encode.py --validate formula.smt2
python3 scripts/encode.py --validate formula.smt2 --output checked.smt2
```
The agent writes the encoding itself; `encode.py` only validates an existing SMT-LIB2 file (or stdin when the path is `-`) and writes it to stdout or to the `--output` file.
With `--format smtlib2` (the default), the validated output is a complete .smt2 file ready for the **solve** skill.
Standalone Z3 Python scripts are produced by the agent directly; the script rejects `--format python`.
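
To make the shape of a complete encoding concrete, the following is a minimal sketch that assembles an SMT-LIB2 script for a small example problem (find integers x, y with x^2 + y^2 = 25 and x > 0). The helper `build_encoding` and its parameters are illustrative assumptions and are not part of `encode.py`.

```python
def build_encoding(decls, constraints, want_model=True):
    """Assemble a complete SMT-LIB2 script (hypothetical helper).

    decls: mapping of variable name -> sort, e.g. {"x": "Int"}
    constraints: iterable of SMT-LIB2 boolean terms
    """
    # Declare every variable before it is used.
    lines = [f"(declare-const {name} {sort})" for name, sort in decls.items()]
    # One assert per constraint keeps parse errors easy to localize.
    lines += [f"(assert {c})" for c in constraints]
    lines.append("(check-sat)")
    if want_model:
        lines.append("(get-model)")
    return "\n".join(lines)


script = build_encoding(
    {"x": "Int", "y": "Int"},
    ["(= (+ (* x x) (* y y)) 25)", "(> x 0)"],
)
print(script)
```

The multiplication of two variables places this example in nonlinear integer arithmetic (NIA) rather than LIA.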
# Step 3: Validate the encoding
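The script checks that the generated formula is syntactically valid by piping it through `z3 -in` with a 5-second timeout. A timeout counts as valid, because the formula parsed and only solving exceeded the limit. Any `(error ...)` message or non-zero exit code is reported as a validation failure, and formulas of 20 lines or fewer are echoed with line numbers for context.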
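
As a rough illustration of how error messages might be pulled out of solver output, the sketch below extracts `(error "...")` responses and splits off a position prefix when one is present. The `line N column M:` prefix format and the sample string are assumptions written by hand for illustration; they are not captured Z3 output, and `extract_errors` is a hypothetical helper.

```python
import re

# SMT-LIB2 error responses have the shape: (error "message")
ERROR_RE = re.compile(r'\(error\s+"([^"]+)"\)')
# Assumed prefix format for positioned messages: "line N column M: text"
POSITION_RE = re.compile(r"line (\d+) column (\d+): (.*)")


def extract_errors(output):
    """Return (line, column, message) tuples; line and column are None if absent."""
    found = []
    for msg in ERROR_RE.findall(output):
        m = POSITION_RE.match(msg)
        if m:
            found.append((int(m.group(1)), int(m.group(2)), m.group(3)))
        else:
            found.append((None, None, msg))
    return found


# Hand-written sample, not real solver output.
sample = '(error "line 2 column 11: unknown constant y")\nsat\n'
print(extract_errors(sample))
```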
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| problem | string | yes | | problem description |
| format | string | no | smtlib2 | output format: smtlib2 or python |
| output | path | no | stdout | write to file instead of stdout |
| validate | flag | no | on | run syntax check on the output |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

.github/skills/encode/scripts/encode.py vendored Normal file

@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
encode.py: validate and format SMT-LIB2 encodings.

Usage:
    python encode.py --validate formula.smt2
    python encode.py --validate formula.smt2 --debug
"""

import argparse
import re
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, setup_logging

VALIDATION_TIMEOUT = 5


def read_input(path_or_stdin: str) -> str:
    """Read formula from a file path or stdin (when path is '-')."""
    if path_or_stdin == "-":
        return sys.stdin.read()
    p = Path(path_or_stdin)
    if not p.exists():
        print(f"file not found: {p}", file=sys.stderr)
        sys.exit(1)
    return p.read_text()


def find_errors(output: str) -> list:
    """Extract (error ...) messages from Z3 output."""
    return re.findall(r'\(error\s+"([^"]+)"\)', output)


def validate(formula: str, z3_bin: str = None, debug: bool = False) -> dict:
    """
    Validate an SMT-LIB2 formula by piping it through z3 -in.
    Returns a dict with 'valid' (bool), 'errors' (list), and 'raw' output.
    """
    result = run_z3(
        formula, z3_bin=z3_bin, timeout=VALIDATION_TIMEOUT, debug=debug,
    )
    errors = find_errors(result["stdout"]) + find_errors(result["stderr"])

    if result["result"] == "timeout":
        # Timeout during validation is not a syntax error: the formula
        # parsed successfully but solving exceeded the limit. That counts
        # as syntactically valid.
        return {"valid": True, "errors": [], "raw": result}

    if errors or result["exit_code"] != 0:
        return {"valid": False, "errors": errors, "raw": result}

    return {"valid": True, "errors": [], "raw": result}


def report_errors(errors: list, formula: str):
    """Print each syntax error with surrounding context."""
    lines = formula.splitlines()
    print(f"validation failed: {len(errors)} error(s)", file=sys.stderr)
    for err in errors:
        print(f" : {err}", file=sys.stderr)
    if len(lines) <= 20:
        print("formula:", file=sys.stderr)
        for i, line in enumerate(lines, 1):
            print(f" {i:3d} {line}", file=sys.stderr)


def write_output(formula: str, output_path: str, fmt: str):
    """Write the validated formula to a file or stdout."""
    if fmt == "python":
        print("python format output is generated by the agent, "
              "not by this script", file=sys.stderr)
        sys.exit(1)

    if output_path:
        Path(output_path).write_text(formula)
        print(f"written to {output_path}")
    else:
        print(formula)


def main():
    parser = argparse.ArgumentParser(prog="encode")
    parser.add_argument(
        "--validate",
        metavar="FILE",
        help="path to .smt2 file to validate, or '-' for stdin",
    )
    parser.add_argument(
        "--format",
        choices=["smtlib2", "python"],
        default="smtlib2",
        help="output format (default: smtlib2)",
    )
    parser.add_argument(
        "--output",
        metavar="FILE",
        default=None,
        help="write result to file instead of stdout",
    )
    parser.add_argument("--z3", default=None, help="path to z3 binary")
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if not args.validate:
        parser.error("provide --validate FILE")
        return

    formula = read_input(args.validate)

    db = Z3DB(args.db)
    run_id = db.start_run("encode", formula)

    result = validate(formula, z3_bin=args.z3, debug=args.debug)

    if result["valid"]:
        db.log_formula(run_id, formula, "valid")
        db.finish_run(run_id, "valid", result["raw"]["duration_ms"], 0)
        write_output(formula, args.output, args.format)
        db.close()
        sys.exit(0)
    else:
        db.log_formula(run_id, formula, "error")
        for err in result["errors"]:
            db.log_finding(run_id, "syntax", err, severity="error")
        db.finish_run(
            run_id, "error",
            result["raw"]["duration_ms"],
            result["raw"]["exit_code"],
        )
        report_errors(result["errors"], formula)
        db.close()
        sys.exit(1)


if __name__ == "__main__":
    main()

.github/skills/explain/SKILL.md vendored Normal file

@ -0,0 +1,52 @@
---
name: explain
description: Parse and interpret Z3 output for human consumption. Handles models, unsat cores, proofs, statistics, and error messages. Translates solver internals into plain-language explanations.
---
Given raw Z3 output (from the **solve**, **prove**, **optimize**, or **benchmark** skills), produce a structured explanation. This skill is for cases where the solver output is large, nested, or otherwise difficult to read directly.
# Step 1: Identify the output type
| Output contains | Explanation type |
|----------------|-----------------|
| `(define-fun ...)` blocks | model explanation |
| unsat core labels | conflict explanation |
| `:key value` statistics | performance breakdown |
| `(error ...)` | error diagnosis |
| proof terms | proof sketch |
# Step 2: Run the explainer
```bash
python3 scripts/explain.py --file output.txt
python3 scripts/explain.py --stdin < output.txt
python3 scripts/explain.py --file output.txt --debug
```
The script auto-detects the output type and produces a structured summary.
# Step 3: Interpret the explanation
For models:
- Each variable is listed with its value and sort
- Array and function interpretations are expanded
- Bitvector values are shown in decimal and hex
For unsat cores:
- The conflicting named assertions are listed
- A minimal conflict set is highlighted
For statistics:
- Time breakdown by phase (preprocessing, solving, model construction)
- Theory solver load distribution
- Memory high-water mark
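
For the model case, a minimal sketch of listing each variable with its sort and value, with bitvector literals shown in decimal and hex, might look like the following. It is not the parser used by `explain.py`; `describe_model` and the regular expression are assumptions that handle only zero-arity constants with literal values (no arrays or function interpretations), and the sample text is hand-written.

```python
import re

DEFINE_RE = re.compile(
    r"\(define-fun\s+(\S+)\s+\(\)\s+"      # name, empty argument list
    r"(\(_ BitVec \d+\)|\S+)\s+"           # sort: bitvector or a simple sort name
    r"(#x[0-9a-fA-F]+|#b[01]+|\(-\s+\d+\)|\d+|true|false)\s*\)"  # literal value
)


def describe_model(text):
    """Return (name, sort, rendered value) for each simple constant in a model."""
    rows = []
    for name, sort, raw in DEFINE_RE.findall(text):
        if raw.startswith("#x"):
            n = int(raw[2:], 16)
            shown = f"{n} (0x{n:x})"
        elif raw.startswith("#b"):
            n = int(raw[2:], 2)
            shown = f"{n} (0x{n:x})"
        elif raw.startswith("("):
            # Negative integers are written as (- N) in SMT-LIB2.
            shown = "-" + raw.strip("()").split()[-1]
        else:
            shown = raw
        rows.append((name, sort, shown))
    return rows


sample = """sat
(
  (define-fun x () Int
    5)
  (define-fun y () Int
    (- 3))
  (define-fun b () (_ BitVec 8)
    #xff)
)"""

for name, sort, shown in describe_model(sample):
    print(f"{name} : {sort} = {shown}")
```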
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| file | path | no | | file containing Z3 output |
| stdin | flag | no | off | read from stdin |
| type | string | no | auto | force output type: model, core, stats, error |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

@ -0,0 +1,128 @@
#!/usr/bin/env python3
"""
explain.py: interpret Z3 output in a readable form.

Usage:
    python explain.py --file output.txt
    printf 'sat\n(model ...)\n' | python explain.py --stdin
"""

import argparse
import re
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, parse_model, parse_stats, parse_unsat_core, setup_logging


def detect_type(text: str) -> str:
    if "(define-fun" in text:
        return "model"
    if "(error" in text:
        return "error"
    if re.search(r':\S+\s+[\d.]+', text):
        return "stats"
    first = text.strip().split("\n")[0].strip()
    if first == "unsat":
        return "core"
    return "unknown"


def explain_model(text: str):
    model = parse_model(text)
    if not model:
        print("no model found in output")
        return
    print("satisfying assignment:")
    for name, val in model.items():
        # show hex for large integers (likely bitvectors)
        try:
            n = int(val)
            if abs(n) > 255:
                print(f" {name} = {val} (0x{n:x})")
            else:
                print(f" {name} = {val}")
        except ValueError:
            print(f" {name} = {val}")


def explain_core(text: str):
    core = parse_unsat_core(text)
    if core:
        print(f"conflicting assertions ({len(core)}):")
        for label in core:
            print(f" {label}")
    else:
        print("unsat (no named assertions for core extraction)")


def explain_stats(text: str):
    stats = parse_stats(text)
    if not stats:
        print("no statistics found")
        return
    print("performance breakdown:")
    for k in sorted(stats):
        print(f" :{k} {stats[k]}")

    if "time" in stats:
        print(f"\ntotal time: {stats['time']}s")
    if "memory" in stats:
        print(f"peak memory: {stats['memory']} MB")


def explain_error(text: str):
    errors = re.findall(r'\(error\s+"([^"]+)"\)', text)
    if errors:
        print(f"Z3 reported {len(errors)} error(s):")
        for e in errors:
            print(f" {e}")
    else:
        print("error in output but could not parse message")


def main():
    parser = argparse.ArgumentParser(prog="explain")
    parser.add_argument("--file")
    parser.add_argument("--stdin", action="store_true")
    parser.add_argument("--type", choices=["model", "core", "stats", "error", "auto"],
                        default="auto")
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        text = Path(args.file).read_text()
    elif args.stdin:
        text = sys.stdin.read()
    else:
        parser.error("provide --file or --stdin")
        return

    output_type = args.type if args.type != "auto" else detect_type(text)

    db = Z3DB(args.db)
    run_id = db.start_run("explain", text[:200])

    if output_type == "model":
        explain_model(text)
    elif output_type == "core":
        explain_core(text)
    elif output_type == "stats":
        explain_stats(text)
    elif output_type == "error":
        explain_error(text)
    else:
        print("could not determine output type")
        print("raw output:")
        print(text[:500])

    db.finish_run(run_id, "success", 0, 0)
    db.close()


if __name__ == "__main__":
    main()

.github/skills/memory-safety/SKILL.md vendored Normal file

@ -0,0 +1,53 @@
---
name: memory-safety
description: Run AddressSanitizer and UndefinedBehaviorSanitizer on the Z3 test suite to detect memory errors, undefined behavior, and leaks. Logs each finding to z3agent.db.
---
Build Z3 with compiler-based sanitizer instrumentation, execute the test suite, and parse the runtime output for memory safety violations. Supported sanitizers are AddressSanitizer (heap and stack buffer overflows, use-after-free, double-free, memory leaks) and UndefinedBehaviorSanitizer (signed integer overflow, null pointer dereference, misaligned access, shift errors). Findings are deduplicated and stored in z3agent.db for triage and longitudinal tracking.
# Step 1: Configure and build
The script invokes cmake with the appropriate `-fsanitize` flags and builds the `test-z3` target. Each sanitizer uses a separate build directory to avoid flag conflicts. If a prior instrumented build exists with matching flags, only incremental compilation runs.
```bash
python3 scripts/memory_safety.py --sanitizer asan
python3 scripts/memory_safety.py --sanitizer ubsan
python3 scripts/memory_safety.py --sanitizer both
```
To reuse an existing build:
```bash
python3 scripts/memory_safety.py --sanitizer asan --skip-build --build-dir build/sanitizer-asan
```
# Step 2: Run and collect
The test binary runs with `halt_on_error=0` so the sanitizer reports all violations rather than aborting on the first. For AddressSanitizer this option only takes effect when the binary is built with `-fsanitize-recover=address`. The script parses `ERROR: AddressSanitizer`, `runtime error:`, and `ERROR: LeakSanitizer` patterns from the combined output, extracts source locations where available, and deduplicates by category, file, line, and message.
```bash
python3 scripts/memory_safety.py --sanitizer asan --timeout 900 --debug
```
# Step 3: Interpret results
- `clean`: no sanitizer violations detected.
- `findings`: one or more violations found. Each is printed with severity, category, message, and source location.
- `timeout`: the test suite did not complete within the deadline. Increase the timeout or investigate a possible infinite loop.
- `error`: build or execution failed before sanitizer output could be collected.
Query past runs:
```bash
python3 ../../shared/z3db.py runs --skill memory-safety --last 10
python3 ../../shared/z3db.py query "SELECT category, severity, file, line, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='memory-safety') ORDER BY run_id DESC LIMIT 20"
```
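
The same kind of query can also be issued from Python with the standard `sqlite3` module. The sketch below runs against an in-memory database so it is self-contained. The `findings` table layout is an assumption inferred from the columns named in the query above, the `runs` table is reduced to the columns the query needs, and the inserted rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (
    run_id INTEGER PRIMARY KEY AUTOINCREMENT,
    skill TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'running'
);
-- Assumed layout: only the columns used by the query are modeled.
CREATE TABLE findings (
    finding_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER REFERENCES runs(run_id),
    category TEXT, severity TEXT, file TEXT, line INTEGER, message TEXT
);
""")

# Made-up sample rows for illustration only.
conn.execute("INSERT INTO runs (skill, status) VALUES ('memory-safety', 'findings')")
conn.execute("INSERT INTO runs (skill, status) VALUES ('prove', 'valid')")
conn.execute(
    "INSERT INTO findings (run_id, category, severity, file, line, message) "
    "VALUES (1, 'asan', 'high', 'src/util/example.cpp', 42, 'heap-use-after-free')"
)
conn.execute(
    "INSERT INTO findings (run_id, category, severity, file, line, message) "
    "VALUES (2, 'syntax', 'error', NULL, NULL, 'unknown constant y')"
)

# Parameter binding avoids quoting problems in the skill name.
rows = conn.execute(
    "SELECT category, severity, file, line, message FROM findings "
    "WHERE run_id IN (SELECT run_id FROM runs WHERE skill = ?) "
    "ORDER BY run_id DESC LIMIT 20",
    ("memory-safety",),
).fetchall()

for row in rows:
    print(row)
conn.close()
```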
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| sanitizer | choice | no | asan | which sanitizer to enable: asan, ubsan, or both |
| build-dir | path | no | build/sanitizer-{name} | path to the build directory |
| timeout | int | no | 600 | seconds before killing the test run |
| skip-build | flag | no | off | reuse an existing instrumented build |
| debug | flag | no | off | verbose cmake, make, and test output |
| db | path | no | .z3-agent/z3agent.db | path to the logging database |

@ -0,0 +1,266 @@
#!/usr/bin/env python3
"""
memory_safety.py: run sanitizer checks on Z3 test suite.

Usage:
    python memory_safety.py --sanitizer asan
    python memory_safety.py --sanitizer ubsan --debug
"""

import argparse
import logging
import os
import re
import subprocess
import sys
import time
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, setup_logging

logger = logging.getLogger("z3agent")

# -fsanitize-recover=address is required for ASAN_OPTIONS=halt_on_error=0
# to continue after the first report.
SANITIZER_FLAGS = {
    "asan": "-fsanitize=address -fsanitize-recover=address -fno-omit-frame-pointer",
    "ubsan": "-fsanitize=undefined -fno-omit-frame-pointer",
}

ASAN_ERROR = re.compile(r"ERROR:\s*AddressSanitizer:\s*(\S+)")
UBSAN_ERROR = re.compile(r":\d+:\d+:\s*runtime error:\s*(.+)")
LEAK_ERROR = re.compile(r"ERROR:\s*LeakSanitizer:")
LOCATION = re.compile(r"(\S+\.(?:cpp|c|h|hpp)):(\d+)")


def find_repo_root() -> Path:
    d = Path.cwd()
    for _ in range(10):
        if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
            return d
        parent = d.parent
        if parent == d:
            break
        d = parent
    logger.error("could not locate Z3 repository root")
    sys.exit(1)


def build_is_configured(build_dir: Path, sanitizer: str) -> bool:
    """Check whether the build directory already has a matching cmake config."""
    cache = build_dir / "CMakeCache.txt"
    if not cache.is_file():
        return False
    expected = SANITIZER_FLAGS[sanitizer].split()[0]
    return expected in cache.read_text()


def configure(build_dir: Path, sanitizer: str, repo_root: Path) -> bool:
    """Run cmake with the requested sanitizer flags."""
    flags = SANITIZER_FLAGS[sanitizer]
    build_dir.mkdir(parents=True, exist_ok=True)
    cmd = [
        "cmake", str(repo_root),
        f"-DCMAKE_C_FLAGS={flags}",
        f"-DCMAKE_CXX_FLAGS={flags}",
        f"-DCMAKE_EXE_LINKER_FLAGS={flags}",
        "-DCMAKE_BUILD_TYPE=Debug",
        "-DZ3_BUILD_TEST=ON",
    ]
    logger.info("configuring %s build in %s", sanitizer, build_dir)
    logger.debug("cmake command: %s", " ".join(cmd))
    proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
    if proc.returncode != 0:
        logger.error("cmake failed:\n%s", proc.stderr)
        return False
    return True


def compile_tests(build_dir: Path) -> bool:
    """Compile the test-z3 target."""
    nproc = os.cpu_count() or 4
    cmd = ["make", f"-j{nproc}", "test-z3"]
    logger.info("compiling test-z3 (%d parallel jobs)", nproc)
    proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
    if proc.returncode != 0:
        logger.error("compilation failed:\n%s", proc.stderr[-2000:])
        return False
    return True


def run_tests(build_dir: Path, timeout: int) -> dict:
    """Execute test-z3 under sanitizer runtime and capture output."""
    test_bin = build_dir / "test-z3"
    if not test_bin.is_file():
        logger.error("test-z3 not found at %s", test_bin)
        return {"stdout": "", "stderr": "binary not found", "exit_code": -1,
                "duration_ms": 0}

    env = os.environ.copy()
    env["ASAN_OPTIONS"] = "detect_leaks=1:halt_on_error=0:print_stacktrace=1"
    env["UBSAN_OPTIONS"] = "print_stacktrace=1:halt_on_error=0"

    cmd = [str(test_bin), "/a"]
    logger.info("running: %s", " ".join(cmd))
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout,
            cwd=build_dir, env=env,
        )
    except subprocess.TimeoutExpired:
        ms = int((time.monotonic() - start) * 1000)
        logger.warning("test-z3 timed out after %dms", ms)
        return {"stdout": "", "stderr": "timeout", "exit_code": -1,
                "duration_ms": ms}

    ms = int((time.monotonic() - start) * 1000)
    logger.debug("exit_code=%d duration=%dms", proc.returncode, ms)
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "duration_ms": ms,
    }


def parse_findings(output: str) -> list:
    """Extract sanitizer error reports from combined stdout and stderr."""
    findings = []
    lines = output.split("\n")

    for i, line in enumerate(lines):
        entry = None

        m = ASAN_ERROR.search(line)
        if m:
            entry = {"category": "asan", "message": m.group(1),
                     "severity": "high"}

        if not entry:
            m = LEAK_ERROR.search(line)
            if m:
                entry = {"category": "leak",
                         "message": "detected memory leaks",
                         "severity": "high"}

        if not entry:
            m = UBSAN_ERROR.search(line)
            if m:
                entry = {"category": "ubsan", "message": m.group(1),
                         "severity": "medium"}

        if not entry:
            continue

        # Look for the nearest non-system source location around the report.
        file_path, line_no = None, None
        window = lines[max(0, i - 2):i + 5]
        for ctx in window:
            loc = LOCATION.search(ctx)
            if loc and "/usr/" not in loc.group(1):
                file_path = loc.group(1)
                line_no = int(loc.group(2))
                break

        entry["file"] = file_path
        entry["line"] = line_no
        entry["raw"] = line.strip()
        findings.append(entry)

    return findings


def deduplicate(findings: list) -> list:
    """Remove duplicate reports with the same category, file, line, and message."""
    seen = set()
    result = []
    for f in findings:
        key = (f["category"], f["file"], f["line"], f["message"])
        if key in seen:
            continue
        seen.add(key)
        result.append(f)
    return result


def main():
    parser = argparse.ArgumentParser(prog="memory-safety")
    parser.add_argument("--sanitizer", choices=["asan", "ubsan", "both"],
                        default="asan",
                        help="sanitizer to enable (default: asan)")
    parser.add_argument("--build-dir", default=None,
                        help="path to build directory")
    parser.add_argument("--timeout", type=int, default=600,
                        help="seconds before killing test run")
    parser.add_argument("--skip-build", action="store_true",
                        help="reuse existing instrumented build")
    parser.add_argument("--db", default=None,
                        help="path to z3agent.db")
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)
    repo_root = find_repo_root()

    sanitizers = ["asan", "ubsan"] if args.sanitizer == "both" else [args.sanitizer]
    all_findings = []

    db = Z3DB(args.db)

    for san in sanitizers:
        if args.build_dir:
            build_dir = Path(args.build_dir)
        else:
            build_dir = repo_root / "build" / f"sanitizer-{san}"

        run_id = db.start_run("memory-safety", f"sanitizer={san}")
        db.log(f"sanitizer: {san}, build: {build_dir}", run_id=run_id)

        if not args.skip_build:
            needs_configure = not build_is_configured(build_dir, san)
            if needs_configure and not configure(build_dir, san, repo_root):
                db.finish_run(run_id, "error", 0, exit_code=1)
                print(f"FAIL: cmake configuration failed for {san}")
                continue
            if not compile_tests(build_dir):
                db.finish_run(run_id, "error", 0, exit_code=1)
                print(f"FAIL: compilation failed for {san}")
                continue

        result = run_tests(build_dir, args.timeout)
        combined = result["stdout"] + "\n" + result["stderr"]
        findings = deduplicate(parse_findings(combined))

        for f in findings:
            db.log_finding(
                run_id,
                category=f["category"],
                message=f["message"],
                severity=f["severity"],
                file=f["file"],
                line=f["line"],
                details={"raw": f["raw"]},
            )

        status = "clean" if not findings else "findings"
        if result["exit_code"] == -1:
            status = "timeout" if "timeout" in result["stderr"] else "error"

        db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])
        all_findings.extend(findings)
        print(f"{san}: {len(findings)} finding(s), {result['duration_ms']}ms")

    if all_findings:
        print(f"\nTotal: {len(all_findings)} finding(s)")
        for f in all_findings:
            loc = f"{f['file']}:{f['line']}" if f["file"] else "unknown location"
            print(f" [{f['severity']}] {f['category']}: {f['message']} at {loc}")
        db.close()
        sys.exit(1)
    else:
        print("\nNo sanitizer findings.")
        db.close()
        sys.exit(0)


if __name__ == "__main__":
    main()

.github/skills/optimize/SKILL.md vendored Normal file

@ -0,0 +1,48 @@
---
name: optimize
description: Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains.
---
Given a set of constraints and an objective function, find the optimal value. Z3 supports both hard constraints (must hold) and soft constraints (weighted preferences), as well as lexicographic multi-objective optimization.
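
To illustrate the difference between hard and soft constraints, here is a small sketch that assembles a script using Z3's `(assert-soft ... :weight N)` command. The helper `build_soft_script` is hypothetical and not part of `optimize.py`; the constraint set is a toy example chosen for illustration.

```python
def build_soft_script(decls, hard, soft):
    """Build an SMT-LIB2 script with hard and weighted soft constraints.

    decls: mapping name -> sort
    hard:  terms that must hold
    soft:  (term, weight) pairs; violating a term costs its weight
    """
    lines = [f"(declare-const {name} {sort})" for name, sort in decls.items()]
    lines += [f"(assert {term})" for term in hard]
    lines += [f"(assert-soft {term} :weight {weight})" for term, weight in soft]
    lines += ["(check-sat)", "(get-model)"]
    return "\n".join(lines)


script = build_soft_script(
    {"a": "Bool", "b": "Bool"},
    hard=["(or a b)", "(not (and a b))"],   # exactly one of a, b
    soft=[("a", 1), ("b", 3)],              # prefer b over a
)
print(script)
```

Because exactly one of `a` and `b` can hold, the cheapest assignment violates the weight-1 preference, so an optimal model should set `b` to true and `a` to false.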
# Step 1: Formulate the problem
The formula uses the `(minimize ...)` or `(maximize ...)` directives followed by `(check-sat)` and `(get-model)`.
Example: minimize `x + y` subject to `x >= 1`, `y >= 2`, `x + y <= 10`:
```smtlib
(declare-const x Int)
(declare-const y Int)
(assert (>= x 1))
(assert (>= y 2))
(assert (<= (+ x y) 10))
(minimize (+ x y))
(check-sat)
(get-model)
```
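
As a sanity check on the example above, the optimum can be cross-checked by brute force over a small window of integers. This is only a sketch for building intuition: the search bounds are an arbitrary assumption, and the enumeration is not how Z3 solves the problem.

```python
from itertools import product


def feasible(x, y):
    """Hard constraints from the example: x >= 1, y >= 2, x + y <= 10."""
    return x >= 1 and y >= 2 and x + y <= 10


# Bounded window chosen for illustration; Z3 reasons over all integers.
candidates = [
    (x, y) for x, y in product(range(-20, 21), repeat=2) if feasible(x, y)
]
best = min(candidates, key=lambda p: p[0] + p[1])
print(best, best[0] + best[1])
```

The enumeration finds `x = 1`, `y = 2` with objective value 3, which is the model the optimizer is expected to return for this input.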
# Step 2: Run the optimizer
```bash
python3 scripts/optimize.py --file scheduling.smt2
python3 scripts/optimize.py --formula "<inline smt-lib2>" --debug
```
# Step 3: Interpret the output
- `sat` with a model: the optimal assignment satisfying all constraints.
- `unsat`: the constraints are contradictory; no feasible solution exists.
- `unknown` or `timeout`: Z3 could not determine optimality.
The script prints the solver result followed by the satisfying assignment; the objective value follows from the model.
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula with minimize/maximize |
| file | path | no | | path to .smt2 file |
| timeout | int | no | 60 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |

@ -0,0 +1,60 @@
#!/usr/bin/env python3
"""
optimize.py: solve constrained optimization problems via Z3.

Usage:
    python optimize.py --file scheduling.smt2
    python optimize.py --formula "(declare-const x Int)..." --debug
"""

import argparse
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_model, setup_logging


def main():
    parser = argparse.ArgumentParser(prog="optimize")
    parser.add_argument("--formula")
    parser.add_argument("--file")
    parser.add_argument("--timeout", type=int, default=60)
    parser.add_argument("--z3", default=None)
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        formula = Path(args.file).read_text()
    elif args.formula:
        formula = args.formula
    else:
        parser.error("provide --formula or --file")
        return

    db = Z3DB(args.db)
    run_id = db.start_run("optimize", formula)

    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)

    model = parse_model(result["stdout"]) if result["result"] == "sat" else None

    db.log_formula(run_id, formula, result["result"],
                   str(model) if model else None)
    db.finish_run(run_id, result["result"], result["duration_ms"],
                  result["exit_code"])

    print(result["result"])
    if model:
        for name, val in model.items():
            print(f" {name} = {val}")

    db.close()
    sys.exit(0 if result["exit_code"] == 0 else 1)


if __name__ == "__main__":
    main()

.github/skills/prove/SKILL.md vendored Normal file

@ -0,0 +1,54 @@
---
name: prove
description: Prove validity of logical statements by negation and satisfiability checking. If the negation is unsatisfiable, the original statement is valid. Otherwise a counterexample is returned.
---
Given a conjecture (an SMT-LIB2 assertion or a natural language claim), determine whether it holds universally. The method is standard: negate the conjecture and check satisfiability. If the negation is unsatisfiable, the original is valid. If satisfiable, the model is a counterexample.
# Step 1: Prepare the negated formula
Wrap the conjecture in `(assert (not ...))` and append `(check-sat)(get-model)`.
Example: to prove that `(> x 3)` implies `(> x 1)`:
```smtlib
(declare-const x Int)
(assert (not (=> (> x 3) (> x 1))))
(check-sat)
(get-model)
```
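
The negation idea can be illustrated without a solver: a satisfying assignment of the negated conjecture is exactly a value for which the conjecture fails. The sketch below searches a small finite range for such a value. The function names are hypothetical, and a finite search is not a proof; Z3 decides the question over all integers.

```python
def implies(p, q):
    """Material implication, matching SMT-LIB2 (=> p q)."""
    return (not p) or q


def find_counterexample(conjecture, domain):
    """Return the first value falsifying the conjecture, or None."""
    for x in domain:
        if not conjecture(x):  # x satisfies the negated conjecture
            return x
    return None


def holds(x):
    return implies(x > 3, x > 1)   # the conjecture from the example


def fails(x):
    return implies(x > 1, x > 3)   # the converse, which is not valid


window = range(-50, 51)  # arbitrary illustrative bounds
print(find_counterexample(holds, window))
print(find_counterexample(fails, window))
```

No counterexample exists for the first conjecture in this window, while the converse already fails at `x = 2`.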
# Step 2: Run the prover
```bash
python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
```
For file input where the file contains the full negated formula:
```bash
python3 scripts/prove.py --file negated.smt2
```
With debug tracing:
```bash
python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" --debug
```
# Step 3: Interpret the output
- `valid`: the negation was unsat, so the conjecture holds for all inputs.
- `invalid` followed by a counterexample: the negation was sat; the model shows a concrete assignment where the conjecture fails.
- `unknown` or `timeout`: Z3 could not decide. The conjecture may require auxiliary lemmas or induction.
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| conjecture | string | no | | the assertion to prove (without negation) |
| vars | string | no | | variable declarations as "name:sort" pairs, comma-separated |
| file | path | no | | .smt2 file with the negated formula |
| timeout | int | no | 30 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |
Either `conjecture` (with `vars`) or `file` must be provided.

.github/skills/prove/scripts/prove.py vendored Normal file

@ -0,0 +1,82 @@
#!/usr/bin/env python3
"""
prove.py: prove validity by negation + satisfiability check.

Usage:
    python prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
    python prove.py --file negated.smt2
"""

import argparse
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_model, setup_logging


def build_formula(conjecture: str, vars_str: str) -> str:
    lines = []
    if vars_str:
        for v in vars_str.split(","):
            v = v.strip()
            # Split on the first colon only, so the sort is taken verbatim.
            name, sort = v.split(":", 1)
            lines.append(f"(declare-const {name.strip()} {sort.strip()})")
    lines.append(f"(assert (not {conjecture}))")
    lines.append("(check-sat)")
    lines.append("(get-model)")
    return "\n".join(lines)



def main():
    parser = argparse.ArgumentParser(prog="prove")
    parser.add_argument("--conjecture", help="assertion to prove")
    parser.add_argument("--vars", help="variable declarations, e.g. 'x:Int,y:Bool'")
    parser.add_argument("--file", help="path to .smt2 file with negated formula")
    parser.add_argument("--timeout", type=int, default=30)
    parser.add_argument("--z3", default=None)
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        formula = Path(args.file).read_text()
    elif args.conjecture:
        formula = build_formula(args.conjecture, args.vars or "")
    else:
        parser.error("provide --conjecture or --file")
        return

    db = Z3DB(args.db)
    run_id = db.start_run("prove", formula)

    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)

    # The formula is the negated conjecture, so unsat means the conjecture holds.
    if result["result"] == "unsat":
        verdict = "valid"
    elif result["result"] == "sat":
        verdict = "invalid"
    else:
        verdict = result["result"]

    model = parse_model(result["stdout"]) if verdict == "invalid" else None

    db.log_formula(run_id, formula, verdict, str(model) if model else None)
    db.finish_run(run_id, verdict, result["duration_ms"], result["exit_code"])

    print(verdict)
    if model:
        print("counterexample:")
        for name, val in model.items():
            print(f" {name} = {val}")

    db.close()
    # Exit 0 when we successfully determined validity or invalidity;
    # exit 1 for unknown results, errors, and timeouts.
    sys.exit(0 if verdict in ("valid", "invalid") else 1)


if __name__ == "__main__":
    main()

57
.github/skills/shared/schema.sql vendored Normal file

@ -0,0 +1,57 @@
-- z3agent schema v1
PRAGMA journal_mode=WAL;
PRAGMA foreign_keys=ON;

CREATE TABLE IF NOT EXISTS runs (
    run_id INTEGER PRIMARY KEY AUTOINCREMENT,
    skill TEXT NOT NULL,
    input_hash TEXT,
    status TEXT NOT NULL DEFAULT 'running',
    duration_ms INTEGER,
    exit_code INTEGER,
    timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_runs_skill ON runs(skill);
CREATE INDEX IF NOT EXISTS idx_runs_status ON runs(status);

CREATE TABLE IF NOT EXISTS formulas (
    formula_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
    smtlib2 TEXT NOT NULL,
    result TEXT,
    model TEXT,
    stats TEXT,
    timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_formulas_run ON formulas(run_id);
CREATE INDEX IF NOT EXISTS idx_formulas_result ON formulas(result);

CREATE TABLE IF NOT EXISTS findings (
    finding_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
    category TEXT NOT NULL,
    severity TEXT,
    file TEXT,
    line INTEGER,
    message TEXT NOT NULL,
    details TEXT,
    timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_findings_run ON findings(run_id);
CREATE INDEX IF NOT EXISTS idx_findings_category ON findings(category);
CREATE INDEX IF NOT EXISTS idx_findings_severity ON findings(severity);

CREATE TABLE IF NOT EXISTS interaction_log (
    log_id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id INTEGER REFERENCES runs(run_id) ON DELETE SET NULL,
    level TEXT NOT NULL DEFAULT 'info',
    message TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_log_run ON interaction_log(run_id);
CREATE INDEX IF NOT EXISTS idx_log_level ON interaction_log(level);

328
.github/skills/shared/z3db.py vendored Normal file

@ -0,0 +1,328 @@
#!/usr/bin/env python3
"""
z3db: shared library and CLI for Z3 skill scripts.
Library usage:
from z3db import Z3DB, find_z3, run_z3
CLI usage:
python z3db.py init
python z3db.py status
python z3db.py log [--run-id N]
python z3db.py runs [--skill solve] [--last N]
python z3db.py query "SELECT ..."
"""
import argparse
import hashlib
import json
import logging
import os
import re
import shutil
import sqlite3
import subprocess
import sys
import time
from pathlib import Path
from typing import Optional
SCHEMA_PATH = Path(__file__).parent / "schema.sql"
DEFAULT_DB_DIR = ".z3-agent"
DEFAULT_DB_NAME = "z3agent.db"
logger = logging.getLogger("z3agent")


def setup_logging(debug: bool = False):
    level = logging.DEBUG if debug else logging.INFO
    fmt = "[%(levelname)s] %(message)s" if not debug else \
        "[%(levelname)s %(asctime)s] %(message)s"
    logging.basicConfig(level=level, format=fmt, stream=sys.stderr)


class Z3DB:
    """SQLite handle for z3agent.db, tracks runs, formulas, findings, logs."""

    def __init__(self, db_path: Optional[str] = None):
        if db_path is None:
            db_dir = Path(DEFAULT_DB_DIR)
            db_dir.mkdir(exist_ok=True)
            db_path = str(db_dir / DEFAULT_DB_NAME)
        self.db_path = db_path
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("PRAGMA foreign_keys=ON")
        self.conn.row_factory = sqlite3.Row
        self._init_schema()

    def _init_schema(self):
        self.conn.executescript(SCHEMA_PATH.read_text())

    def close(self):
        self.conn.close()

    def start_run(self, skill: str, input_text: str = "") -> int:
        input_hash = hashlib.sha256(input_text.encode()).hexdigest()[:16]
        cur = self.conn.execute(
            "INSERT INTO runs (skill, input_hash) VALUES (?, ?)",
            (skill, input_hash),
        )
        self.conn.commit()
        run_id = cur.lastrowid
        logger.debug("started run %d (skill=%s, hash=%s)", run_id, skill, input_hash)
        return run_id

    def finish_run(self, run_id: int, status: str, duration_ms: int,
                   exit_code: int = 0):
        self.conn.execute(
            "UPDATE runs SET status=?, duration_ms=?, exit_code=? WHERE run_id=?",
            (status, duration_ms, exit_code, run_id),
        )
        self.conn.commit()
        logger.debug("finished run %d: %s (%dms)", run_id, status, duration_ms)

    def log_formula(self, run_id: int, smtlib2: str, result: str = None,
                    model: str = None, stats: dict = None) -> int:
        cur = self.conn.execute(
            "INSERT INTO formulas (run_id, smtlib2, result, model, stats) "
            "VALUES (?, ?, ?, ?, ?)",
            (run_id, smtlib2, result, model,
             json.dumps(stats) if stats else None),
        )
        self.conn.commit()
        return cur.lastrowid

    def log_finding(self, run_id: int, category: str, message: str,
                    severity: str = None, file: str = None,
                    line: int = None, details: dict = None) -> int:
        cur = self.conn.execute(
            "INSERT INTO findings (run_id, category, severity, file, line, "
            "message, details) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (run_id, category, severity, file, line, message,
             json.dumps(details) if details else None),
        )
        self.conn.commit()
        return cur.lastrowid

    def log(self, message: str, level: str = "info", run_id: int = None):
        """Write to stderr and to the interaction_log table."""
        getattr(logger, level, logger.info)(message)
        self.conn.execute(
            "INSERT INTO interaction_log (run_id, level, message) "
            "VALUES (?, ?, ?)",
            (run_id, level, message),
        )
        self.conn.commit()

    def get_runs(self, skill: str = None, last: int = 10):
        sql = "SELECT * FROM runs"
        params = []
        if skill:
            sql += " WHERE skill = ?"
            params.append(skill)
        sql += " ORDER BY run_id DESC LIMIT ?"
        params.append(last)
        return self.conn.execute(sql, params).fetchall()

    def get_status(self) -> dict:
        rows = self.conn.execute(
            "SELECT status, COUNT(*) as cnt FROM runs GROUP BY status"
        ).fetchall()
        total = sum(r["cnt"] for r in rows)
        by_status = {r["status"]: r["cnt"] for r in rows}
        last = self.conn.execute(
            "SELECT timestamp FROM runs ORDER BY run_id DESC LIMIT 1"
        ).fetchone()
        return {
            "total": total,
            **by_status,
            "last_run": last["timestamp"] if last else None,
        }

    def get_logs(self, run_id: int = None, last: int = 50):
        if run_id:
            return self.conn.execute(
                "SELECT * FROM interaction_log WHERE run_id=? "
                "ORDER BY log_id DESC LIMIT ?", (run_id, last)
            ).fetchall()
        return self.conn.execute(
            "SELECT * FROM interaction_log ORDER BY log_id DESC LIMIT ?",
            (last,)
        ).fetchall()

    def query(self, sql: str):
        return self.conn.execute(sql).fetchall()


def find_z3(hint: str = None) -> str:
    """Locate the z3 binary: explicit path > build dirs > PATH."""
    candidates = []
    if hint:
        candidates.append(hint)

    repo_root = _find_repo_root()
    if repo_root:
        for build_dir in ["build", "build/release", "build/debug"]:
            candidates.append(str(repo_root / build_dir / "z3"))

    path_z3 = shutil.which("z3")
    if path_z3:
        candidates.append(path_z3)

    for c in candidates:
        p = Path(c)
        if p.is_file() and os.access(p, os.X_OK):
            logger.debug("found z3: %s", p)
            return str(p)

    logger.error("z3 binary not found. Searched: %s", candidates)
    sys.exit(1)


def _find_repo_root() -> Optional[Path]:
    d = Path.cwd()
    for _ in range(10):
        if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
            return d
        parent = d.parent
        if parent == d:
            break
        d = parent
    return None


def run_z3(formula: str, z3_bin: str = None, timeout: int = 30,
           args: list = None, debug: bool = False) -> dict:
    """Pipe an SMT-LIB2 formula into z3 -in, return parsed output."""
    z3_path = find_z3(z3_bin)
    cmd = [z3_path, "-in"] + (args or [])

    logger.debug("cmd: %s", " ".join(cmd))
    logger.debug("stdin:\n%s", formula)

    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, input=formula, capture_output=True, text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        duration_ms = int((time.monotonic() - start) * 1000)
        logger.warning("z3 timed out after %dms", duration_ms)
        return {
            "stdout": "", "stderr": "timeout", "exit_code": -1,
            "duration_ms": duration_ms, "result": "timeout",
        }

    duration_ms = int((time.monotonic() - start) * 1000)

    logger.debug("exit_code=%d duration=%dms", proc.returncode, duration_ms)
    logger.debug("stdout:\n%s", proc.stdout)
    if proc.stderr:
        logger.debug("stderr:\n%s", proc.stderr)

    first_line = proc.stdout.strip().split("\n")[0].strip() if proc.stdout else ""
    result = first_line if first_line in ("sat", "unsat", "unknown") else "error"

    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "duration_ms": duration_ms,
        "result": result,
    }
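

def parse_model(stdout: str) -> Optional[dict]:
    """Pull define-fun entries from a (get-model) response."""
    model = {}
    # Match the header up to the value; the sort may be an atom (Int) or a
    # single parenthesized term such as (_ BitVec 8).
    for m in re.finditer(
        r'\(define-fun\s+(\S+)\s+\(\)\s+(?:\([^()]*\)|\S+)\s+', stdout
    ):
        # Scan to the parenthesis that closes this define-fun so that nested
        # values such as (- 5) are captured whole rather than cut at the
        # first ")".
        depth, end = 1, m.end()
        while end < len(stdout) and depth > 0:
            if stdout[end] == "(":
                depth += 1
            elif stdout[end] == ")":
                depth -= 1
            end += 1
        if depth == 0:
            model[m.group(1)] = stdout[m.end():end - 1].strip()
    return model if model else None


def parse_stats(stdout: str) -> Optional[dict]:
    """Parse :key value pairs from z3 -st output."""
    stats = {}
    for m in re.finditer(r':(\S+)\s+([\d.]+)', stdout):
        key, val = m.group(1), m.group(2)
        stats[key] = float(val) if '.' in val else int(val)
    return stats if stats else None


def parse_unsat_core(stdout: str) -> Optional[list]:
    for line in stdout.strip().split("\n"):
        line = line.strip()
        if line.startswith("(") and not line.startswith("(error"):
            labels = line.strip("()").split()
            if labels:
                return labels
    return None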


def cli():
    parser = argparse.ArgumentParser(
        description="Z3 Agent database CLI",
        prog="z3db",
    )
    parser.add_argument("--db", default=None, help="path to z3agent.db")
    parser.add_argument("--debug", action="store_true", help="verbose output")

    sub = parser.add_subparsers(dest="command")
    sub.add_parser("init", help="initialize the database")

    status_p = sub.add_parser("status", help="show run summary")

    log_p = sub.add_parser("log", help="show interaction log")
    log_p.add_argument("--run-id", type=int, help="filter by run ID")
    log_p.add_argument("--last", type=int, default=50)

    runs_p = sub.add_parser("runs", help="list runs")
    runs_p.add_argument("--skill", help="filter by skill name")
    runs_p.add_argument("--last", type=int, default=10)

    query_p = sub.add_parser("query", help="run raw SQL")
    query_p.add_argument("sql", help="SQL query string")

    args = parser.parse_args()
    setup_logging(args.debug)

    db = Z3DB(args.db)

    if args.command == "init":
        print(f"Database initialized at {db.db_path}")

    elif args.command == "status":
        s = db.get_status()
        print(f"Runs: {s['total']}"
              f" | success: {s.get('success', 0)}"
              f" | error: {s.get('error', 0)}"
              f" | timeout: {s.get('timeout', 0)}"
              f" | Last: {s['last_run'] or 'never'}")

    elif args.command == "log":
        for row in db.get_logs(args.run_id, args.last):
            print(f"[{row['level']}] {row['timestamp']} "
                  f"(run {row['run_id']}): {row['message']}")

    elif args.command == "runs":
        for row in db.get_runs(args.skill, args.last):
            print(f"#{row['run_id']} {row['skill']} {row['status']} "
                  f"{row['duration_ms']}ms @ {row['timestamp']}")

    elif args.command == "query":
        for row in db.query(args.sql):
            print(dict(row))

    else:
        parser.print_help()

    db.close()


if __name__ == "__main__":
    cli()

48
.github/skills/simplify/SKILL.md vendored Normal file

@ -0,0 +1,48 @@
---
name: simplify
description: Reduce formula complexity using Z3 tactic chains. Supports configurable tactic pipelines for boolean, arithmetic, and bitvector simplification.
---
Given a formula, apply a sequence of Z3 tactics to produce an equivalent but simpler form. This is useful for understanding what Z3 sees after preprocessing, debugging tactic selection, and reducing formula size before solving.
# Step 1: Choose tactics
Z3 provides dozens of tactics. Common ones:
| Tactic | What it does |
|--------|-------------|
| simplify | constant folding, algebraic identities |
| propagate-values | substitute known equalities |
| ctx-simplify | context-dependent simplification |
| elim-uncnstr | remove unconstrained variables |
| solve-eqs | Gaussian elimination |
| bit-blast | reduce bitvectors to booleans |
| tseitin-cnf | convert to CNF |
| aig | and-inverter graph reduction |
# Step 2: Run simplification
```bash
python3 scripts/simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
python3 scripts/simplify.py --file formula.smt2 --tactics "simplify,propagate-values,ctx-simplify"
python3 scripts/simplify.py --file formula.smt2 --debug
```
Without `--tactics`, the script applies the default chain: `simplify`, `propagate-values`, `ctx-simplify`.
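
For reference, a tactic chain reaches Z3 as a single `(apply ...)` command appended to the formula, with the `then` combinator sequencing the tactics. Below is a minimal sketch of that composition; `apply_command` is a hypothetical helper name, and the single-tactic branch is a simplification made for this example.

```python
def apply_command(tactics: str) -> str:
    """Turn a comma-separated tactic list into one SMT-LIB2 apply command."""
    names = [t.strip() for t in tactics.split(",") if t.strip()]
    if not names:
        raise ValueError("at least one tactic is required")
    if len(names) == 1:
        # A single tactic can be applied directly, without a combinator.
        return f"(apply {names[0]})"
    return "(apply (then " + " ".join(names) + "))"


print(apply_command("simplify, propagate-values, ctx-simplify"))
# prints: (apply (then simplify propagate-values ctx-simplify))
```

Order matters in the chain: each tactic receives the goals produced by the one before it.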
# Step 3: Interpret the output
The script prints the simplified formula in SMT-LIB2 syntax. Subgoals are printed as separate `(assert ...)` blocks.
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula to simplify |
| vars | string | no | | variable declarations as "name:sort" pairs |
| file | path | no | | path to .smt2 file |
| tactics | string | no | simplify,propagate-values,ctx-simplify | comma-separated tactic names |
| timeout | int | no | 30 | seconds |
| z3 | path | no | auto | path to z3 binary |
| debug | flag | no | off | verbose tracing |
| db | path | no | .z3-agent/z3agent.db | logging database |


@ -0,0 +1,83 @@
#!/usr/bin/env python3
"""
simplify.py: apply Z3 tactics to simplify an SMT-LIB2 formula.
Usage:
python simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
python simplify.py --file formula.smt2 --tactics "simplify,solve-eqs"
"""
import argparse
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, setup_logging
DEFAULT_TACTICS = "simplify,propagate-values,ctx-simplify"


def build_tactic_formula(base_formula: str, tactics: str) -> str:
    tactic_list = [t.strip() for t in tactics.split(",")]
    if len(tactic_list) == 1:
        tactic_expr = f"(then {tactic_list[0]} skip)"
    else:
        tactic_expr = "(then " + " ".join(tactic_list) + ")"
    return base_formula + f"\n(apply {tactic_expr})\n"


def build_formula_from_parts(formula_str: str, vars_str: str) -> str:
    lines = []
    if vars_str:
        for v in vars_str.split(","):
            v = v.strip()
            name, sort = v.split(":")
            lines.append(f"(declare-const {name.strip()} {sort.strip()})")
    lines.append(formula_str)
    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser(prog="simplify")
    parser.add_argument("--formula")
    parser.add_argument("--vars")
    parser.add_argument("--file")
    parser.add_argument("--tactics", default=DEFAULT_TACTICS)
    parser.add_argument("--timeout", type=int, default=30)
    parser.add_argument("--z3", default=None)
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        base = Path(args.file).read_text()
    elif args.formula:
        base = build_formula_from_parts(args.formula, args.vars or "")
    else:
        parser.error("provide --formula or --file")
        return

    formula = build_tactic_formula(base, args.tactics)

    db = Z3DB(args.db)
    run_id = db.start_run("simplify", formula)

    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)

    status = "success" if result["exit_code"] == 0 else "error"
    db.log_formula(run_id, formula, status)
    db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])

    print(result["stdout"])
    if result["stderr"] and result["exit_code"] != 0:
        print(result["stderr"], file=sys.stderr)

    db.close()
    sys.exit(0 if result["exit_code"] == 0 else 1)


if __name__ == "__main__":
    main()

50
.github/skills/solve/SKILL.md vendored Normal file

@ -0,0 +1,50 @@
---
name: solve
description: Check satisfiability of SMT-LIB2 formulas using Z3. Returns sat/unsat with models or unsat cores. Logs every invocation to z3agent.db for auditability.
---
Given an SMT-LIB2 formula (or a set of constraints described in natural language), determine whether the formula is satisfiable. If sat, extract a satisfying assignment. If unsat and tracking labels are present, extract the unsat core.
# Step 1: Prepare the formula
If the input is already valid SMT-LIB2, use it directly. If it is a natural language description, use the **encode** skill first to produce SMT-LIB2.
The formula must include `(check-sat)` at the end. Append `(get-model)` for satisfiable queries, or `(get-unsat-core)` when named assertions are used; unsat cores also require `(set-option :produce-unsat-cores true)` before the assertions.
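
As an illustration of the named-assertion form, the sketch below assembles a small unsatisfiable query whose core can be requested. The labels `a1` and `a2` and the helper `with_named_assertions` are made up for this example; `(! ... :named ...)` and `:produce-unsat-cores` are standard SMT-LIB2.

```python
def with_named_assertions(decls, assertions):
    """Assemble an SMT-LIB2 query that labels each assertion for core extraction."""
    lines = ["(set-option :produce-unsat-cores true)"]
    lines += [f"(declare-const {name} {sort})" for name, sort in decls]
    lines += [f"(assert (! {term} :named {label}))" for label, term in assertions]
    lines += ["(check-sat)", "(get-unsat-core)"]
    return "\n".join(lines)


query = with_named_assertions(
    decls=[("x", "Int")],
    assertions=[("a1", "(> x 0)"), ("a2", "(< x 0)")],
)
print(query)
```

The resulting text can be saved to a file and passed with `--file`, or passed inline with `--formula`.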
# Step 2: Run Z3
```bash
python3 scripts/solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
```
For file input:
```bash
python3 scripts/solve.py --file problem.smt2
```
With debug tracing:
```bash
python3 scripts/solve.py --file problem.smt2 --debug
```
The script pipes the formula to `z3 -in` via subprocess (no shell expansion), logs the run to `.z3-agent/z3agent.db`, and prints the result.
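
The piping pattern can be sketched as follows, assuming only the standard library. `run_piped` is an illustrative name, and the echo command is a stand-in for `["z3", "-in"]` so that the example runs on machines without Z3 installed.

```python
import subprocess
import sys


def run_piped(cmd, stdin_text, timeout=30):
    """Run `cmd` as an argument list (no shell) and feed `stdin_text` on stdin."""
    try:
        proc = subprocess.run(
            cmd, input=stdin_text, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"result": "timeout", "stdout": "", "exit_code": -1}
    lines = proc.stdout.strip().splitlines()
    first = lines[0].strip() if lines else ""
    # Anything other than a recognized check-sat answer is treated as an error.
    result = first if first in ("sat", "unsat", "unknown") else "error"
    return {"result": result, "stdout": proc.stdout, "exit_code": proc.returncode}


# Stand-in for ["z3", "-in"]: a child Python process that echoes its stdin.
echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
print(run_piped(echo, "sat\n")["result"])
```

Passing the command as a list means the formula never passes through a shell, so parentheses and quotes in SMT-LIB2 need no escaping.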
# Step 3: Interpret the output
- `sat` followed by a model: the formula is satisfiable; the model assigns concrete values to each declared constant.
- `unsat`: no assignment exists. If `(get-unsat-core)` was used, the conflicting named assertions are listed.
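- `unknown`: Z3 returned without deciding, typically because the formula lies in a fragment where its procedures are incomplete (for example, quantifiers or nonlinear arithmetic). Consider simplifying or reformulating the formula.
- `timeout`: the process was killed after the deadline. Raise `--timeout` or try the **simplify** skill to reduce complexity.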
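
To make the `sat` case concrete, here is a hedged sketch of reading a model response into name/value pairs. The sample text is hand-written in the shape Z3 typically prints, not captured from a real run, and `read_model` is an illustrative name.

```python
import re


def read_model(stdout: str) -> dict:
    """Collect zero-argument define-fun entries from a (get-model) response."""
    model = {}
    header = re.compile(r"\(define-fun\s+(\S+)\s+\(\)\s+(?:\([^()]*\)|\S+)\s+")
    for m in header.finditer(stdout):
        # Walk to the parenthesis that closes this define-fun so that nested
        # values such as (- 5) are kept whole.
        depth, end = 1, m.end()
        while end < len(stdout) and depth > 0:
            if stdout[end] == "(":
                depth += 1
            elif stdout[end] == ")":
                depth -= 1
            end += 1
        if depth == 0:
            model[m.group(1)] = stdout[m.end():end - 1].strip()
    return model


sample = """sat
(
  (define-fun x () Int
    (- 5))
  (define-fun flag () Bool
    true)
)
"""
print(read_model(sample))
```

Values are kept as SMT-LIB2 text, so a negative integer appears as `(- 5)` rather than `-5`.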
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| formula | string | no | | SMT-LIB2 formula as a string |
| file | path | no | | path to an .smt2 file |
| timeout | int | no | 30 | seconds before killing z3 |
| z3 | path | no | auto | explicit path to z3 binary |
| debug | flag | no | off | print z3 command, stdin, stdout, stderr, timing |
| db | path | no | .z3-agent/z3agent.db | path to the logging database |
Either `formula` or `file` must be provided.

66
.github/skills/solve/scripts/solve.py vendored Normal file

@ -0,0 +1,66 @@
#!/usr/bin/env python3
"""
solve.py: check satisfiability of an SMT-LIB2 formula via Z3.
Usage:
python solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
python solve.py --file problem.smt2
python solve.py --file problem.smt2 --debug --timeout 60
"""
import argparse
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, run_z3, parse_model, parse_unsat_core, setup_logging


def main():
    parser = argparse.ArgumentParser(prog="solve")
    parser.add_argument("--formula", help="SMT-LIB2 formula string")
    parser.add_argument("--file", help="path to .smt2 file")
    parser.add_argument("--timeout", type=int, default=30)
    parser.add_argument("--z3", default=None, help="path to z3 binary")
    parser.add_argument("--db", default=None)
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    setup_logging(args.debug)

    if args.file:
        formula = Path(args.file).read_text()
    elif args.formula:
        formula = args.formula
    else:
        parser.error("provide --formula or --file")
        return

    db = Z3DB(args.db)
    run_id = db.start_run("solve", formula)

    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)

    model = parse_model(result["stdout"]) if result["result"] == "sat" else None
    core = parse_unsat_core(result["stdout"]) if result["result"] == "unsat" else None

    db.log_formula(run_id, formula, result["result"],
                   str(model) if model else None)
    db.finish_run(run_id, result["result"], result["duration_ms"],
                  result["exit_code"])

    print(result["result"])
    if model:
        for name, val in model.items():
            print(f" {name} = {val}")
    if core:
        print("unsat core:", " ".join(core))
    if result["stderr"] and result["result"] == "error":
        print(result["stderr"], file=sys.stderr)

    db.close()
    sys.exit(0 if result["exit_code"] == 0 else 1)


if __name__ == "__main__":
    main()

46
.github/skills/static-analysis/SKILL.md vendored Normal file

@ -0,0 +1,46 @@
---
name: static-analysis
description: Run Clang Static Analyzer (scan-build) on Z3 source and log structured findings to z3agent.db.
---
Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plist diagnostics, and record each finding with file, line, category, and description. This skill wraps scan-build into a reproducible, logged workflow suitable for regular analysis sweeps and regression tracking.
# Step 1: Run the analysis
```bash
python3 scripts/static_analysis.py --build-dir build
python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug
python3 scripts/static_analysis.py --build-dir build --timeout 1800
```
The script invokes `scan-build cmake ..` followed by `scan-build make` inside the specified build directory. Clang checker output is written to `--output-dir` (defaults to a `scan-results` subdirectory of the build directory).
# Step 2: Interpret the output
Each finding is printed with its source location, category, and description:
```
[Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read
[Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer
```
A summary table groups findings by category so that high-frequency classes are visible at a glance.
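
The plist reports have a simple shape: a `files` list plus `diagnostics` entries whose `location.file` indexes into that list. The sketch below parses a synthetic report built in memory and prints the per-category summary; every field value is invented for illustration, and `findings_from` is a hypothetical helper name.

```python
import plistlib
from collections import Counter

# Synthetic report: values are invented, only the layout mirrors the
# analyzer's plist output.
report = plistlib.dumps({
    "files": ["src/ast/ast.cpp", "src/smt/theory_lra.cpp"],
    "diagnostics": [
        {"category": "Dead store", "type": "Dead assignment",
         "description": "Value stored to 'result' is never read",
         "location": {"file": 0, "line": 142, "col": 5}},
        {"category": "Dead store", "type": "Dead initialization",
         "description": "Value stored to 'tmp' during its initialization is never read",
         "location": {"file": 1, "line": 30, "col": 9}},
        {"category": "Logic error", "type": "Null dereference",
         "description": "Access to field 'next' results in a dereference of a null pointer",
         "location": {"file": 1, "line": 87, "col": 12}},
    ],
})


def findings_from(plist_bytes: bytes) -> list:
    """Flatten one plist report into a list of finding dicts."""
    data = plistlib.loads(plist_bytes)
    files = data.get("files", [])
    out = []
    for diag in data.get("diagnostics", []):
        loc = diag.get("location", {})
        idx = loc.get("file", 0)
        out.append({
            "file": files[idx] if idx < len(files) else "<unknown>",
            "line": loc.get("line", 0),
            "category": diag.get("category", "uncategorized"),
            "description": diag.get("description", ""),
        })
    return out


findings = findings_from(report)
for category, count in Counter(f["category"] for f in findings).most_common():
    print(f"{category}: {count}")
```

Sorting with `most_common()` puts the highest-frequency category first, which is the ordering the summary relies on.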
# Step 3: Review historical findings
All findings are logged to `z3agent.db`. Query them to track trends:
```bash
python3 ../../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC"
python3 ../../shared/z3db.py runs --skill static-analysis --last 10
```
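
The same trend query can be issued from Python. The sketch below runs against an in-memory database with a reduced copy of the `runs` and `findings` tables and invented rows, so it does not touch a real `z3agent.db`; `findings_by_category` is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Reduced copy of the runs/findings tables, just enough for the query.
conn.executescript("""
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, skill TEXT NOT NULL);
CREATE TABLE findings (
    finding_id INTEGER PRIMARY KEY,
    run_id INTEGER REFERENCES runs(run_id),
    category TEXT NOT NULL,
    message TEXT NOT NULL
);
INSERT INTO runs (run_id, skill) VALUES (1, 'static-analysis'), (2, 'solve');
INSERT INTO findings (run_id, category, message) VALUES
    (1, 'Dead store', 'example message A'),
    (1, 'Dead store', 'example message B'),
    (1, 'Logic error', 'example message C');
""")


def findings_by_category(conn, skill):
    """Count findings per category for all runs of one skill."""
    sql = (
        "SELECT f.category, COUNT(*) AS cnt "
        "FROM findings f JOIN runs r ON r.run_id = f.run_id "
        "WHERE r.skill = ? GROUP BY f.category ORDER BY cnt DESC"
    )
    return [(row["category"], row["cnt"]) for row in conn.execute(sql, (skill,))]


print(findings_by_category(conn, "static-analysis"))
```

Binding the skill name as a parameter avoids quoting problems that can arise when the SQL is assembled as a shell string.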
# Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| build-dir | path | yes | | path to the CMake build directory |
| output-dir | path | no | BUILD/scan-results | directory for scan-build output |
| timeout | int | no | 1200 | seconds allowed for the full build |
| db | path | no | .z3-agent/z3agent.db | logging database |
| debug | flag | no | off | verbose tracing |


@ -0,0 +1,255 @@
#!/usr/bin/env python3
"""
static_analysis.py: run Clang Static Analyzer on Z3 source.
Usage:
python static_analysis.py --build-dir build
python static_analysis.py --build-dir build --output-dir /tmp/sa-results
python static_analysis.py --build-dir build --debug
"""
import argparse
import logging
import os
import plistlib
import shutil
import subprocess
import sys
import time
from collections import Counter
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
from z3db import Z3DB, setup_logging
logger = logging.getLogger("z3agent")
SCAN_BUILD_NAMES = ["scan-build", "scan-build-14", "scan-build-15", "scan-build-16"]


def find_scan_build() -> str:
    """Locate the scan-build binary on PATH."""
    for name in SCAN_BUILD_NAMES:
        path = shutil.which(name)
        if path:
            logger.debug("found scan-build: %s", path)
            return path
    logger.error(
        "scan-build not found. Install clang-tools or set PATH. "
        "Searched: %s", ", ".join(SCAN_BUILD_NAMES)
    )
    sys.exit(1)


def run_configure(scan_build: str, build_dir: Path, output_dir: Path,
                  timeout: int) -> bool:
    """Run scan-build cmake to configure the project."""
    repo_root = build_dir.parent
    cmd = [
        scan_build,
        "-o", str(output_dir),
        "cmake",
        str(repo_root),
    ]
    logger.info("configuring: %s", " ".join(cmd))
    try:
        proc = subprocess.run(
            cmd, cwd=str(build_dir),
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        logger.error("cmake configuration timed out after %ds", timeout)
        return False

    if proc.returncode != 0:
        logger.error("cmake configuration failed (exit %d)", proc.returncode)
        logger.error("stderr: %s", proc.stderr[:2000])
        return False

    logger.info("configuration complete")
    return True


def run_build(scan_build: str, build_dir: Path, output_dir: Path,
              timeout: int) -> bool:
    """Run scan-build make to build and analyze."""
    nproc = os.cpu_count() or 4
    cmd = [
        scan_build,
        "-o", str(output_dir),
        # scan-build writes HTML reports by default; request .plist output
        # as well so the findings can be parsed afterwards.
        "-plist-html",
        "--status-bugs",
        "make",
        f"-j{nproc}",
    ]
    logger.info("building with analysis: %s", " ".join(cmd))
    try:
        proc = subprocess.run(
            cmd, cwd=str(build_dir),
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        logger.error("build timed out after %ds", timeout)
        return False

    # scan-build returns nonzero when bugs are found (due to --status-bugs),
    # so a nonzero exit code is not necessarily a build failure.
    if proc.returncode != 0:
        logger.info(
            "scan-build exited with code %d (nonzero may indicate findings)",
            proc.returncode,
        )
    else:
        logger.info("build complete, no bugs reported by scan-build")

    if proc.stderr:
        logger.debug("build stderr (last 2000 chars): %s", proc.stderr[-2000:])
    return True


def collect_plist_files(output_dir: Path) -> list:
    """Recursively find all .plist diagnostic files under the output directory."""
    plists = sorted(output_dir.rglob("*.plist"))
    logger.debug("found %d plist files in %s", len(plists), output_dir)
    return plists


def parse_plist_findings(plist_path: Path) -> list:
    """Extract findings from a single Clang plist diagnostic file.

    Returns a list of dicts with keys: file, line, col, category, type, description.
    """
    findings = []
    try:
        with open(plist_path, "rb") as f:
            data = plistlib.load(f)
    except Exception as exc:
        logger.warning("could not parse %s: %s", plist_path, exc)
        return findings

    source_files = data.get("files", [])
    for diag in data.get("diagnostics", []):
        location = diag.get("location", {})
        file_idx = location.get("file", 0)
        source_file = source_files[file_idx] if file_idx < len(source_files) else "<unknown>"
        findings.append({
            "file": source_file,
            "line": location.get("line", 0),
            "col": location.get("col", 0),
            "category": diag.get("category", "uncategorized"),
            "type": diag.get("type", ""),
            "description": diag.get("description", ""),
        })
    return findings


def collect_all_findings(output_dir: Path) -> list:
    """Parse every plist file under output_dir and return merged findings."""
    all_findings = []
    for plist_path in collect_plist_files(output_dir):
        all_findings.extend(parse_plist_findings(plist_path))
    return all_findings


def log_findings(db, run_id: int, findings: list):
    """Persist each finding to z3agent.db."""
    for f in findings:
        db.log_finding(
            run_id,
            category=f["category"],
            message=f["description"],
            severity=f.get("type"),
            file=f["file"],
            line=f["line"],
            details={"col": f["col"], "type": f["type"]},
        )


def print_findings(findings: list):
    """Print individual findings and a category summary."""
    if not findings:
        print("No findings reported.")
        return

    for f in findings:
        label = f["category"]
        if f["type"]:
            label = f["type"]
        print(f"[{label}] {f['file']}:{f['line']}: {f['description']}")

    print()
    counts = Counter(f["category"] for f in findings)
    print(f"Total findings: {len(findings)}")
    print("By category:")
    for cat, cnt in counts.most_common():
        print(f" {cat}: {cnt}")


def main():
    parser = argparse.ArgumentParser(
        prog="static_analysis",
        description="Run Clang Static Analyzer on Z3 and log findings.",
    )
    parser.add_argument(
        "--build-dir", required=True,
        help="path to the CMake build directory",
    )
    parser.add_argument(
        "--output-dir", default=None,
        help="directory for scan-build results (default: BUILD/scan-results)",
    )
    parser.add_argument(
        "--timeout", type=int, default=1200,
        help="seconds allowed for the full analysis build",
    )
    parser.add_argument("--db", default=None, help="path to z3agent.db")
    parser.add_argument("--debug", action="store_true", help="verbose tracing")
    args = parser.parse_args()

    setup_logging(args.debug)

    scan_build = find_scan_build()

    build_dir = Path(args.build_dir).resolve()
    build_dir.mkdir(parents=True, exist_ok=True)

    output_dir = Path(args.output_dir) if args.output_dir else build_dir / "scan-results"
    output_dir = output_dir.resolve()
    output_dir.mkdir(parents=True, exist_ok=True)

    db = Z3DB(args.db)
    run_id = db.start_run("static-analysis", f"build_dir={build_dir}")
    start = time.monotonic()

    if not run_configure(scan_build, build_dir, output_dir, timeout=args.timeout):
        elapsed = int((time.monotonic() - start) * 1000)
        db.finish_run(run_id, "error", elapsed, exit_code=1)
        db.close()
        sys.exit(1)

    if not run_build(scan_build, build_dir, output_dir, timeout=args.timeout):
        elapsed = int((time.monotonic() - start) * 1000)
        db.finish_run(run_id, "error", elapsed, exit_code=1)
        db.close()
        sys.exit(1)

    elapsed = int((time.monotonic() - start) * 1000)

    findings = collect_all_findings(output_dir)
    log_findings(db, run_id, findings)

    status = "clean" if len(findings) == 0 else "findings"
    db.finish_run(run_id, status, elapsed, exit_code=0)

    db.log(
        f"static analysis complete: {len(findings)} finding(s) in {elapsed}ms",
        run_id=run_id,
    )

    print_findings(findings)

    db.close()
    sys.exit(0)


if __name__ == "__main__":
    main()