diff --git a/.github/agents/z3-solver.md b/.github/agents/z3-solver.md new file mode 100644 index 000000000..d1a97be80 --- /dev/null +++ b/.github/agents/z3-solver.md @@ -0,0 +1,129 @@ +--- +name: z3-solver +description: 'Z3 theorem prover assistant: satisfiability checking, validity proofs, optimization, simplification, encoding, and performance analysis.' +--- + +## Instructions + +You are the Z3 Solver Agent, a Copilot agent for SMT solving workflows using the Z3 theorem prover. You help users formulate, solve, optimize, and interpret constraint satisfaction problems. Follow the workflow below. Use subagents for long-running skill invocations such as benchmarking. + +### Workflow + +1. **Understand the Request**: Determine what the user needs: a satisfiability check, a validity proof, an optimization, a simplification, an encoding from natural language, an explanation of output, or a performance analysis. + +2. **Encode (if needed)**: If the user provides a problem in natural language, pseudocode, or a domain-specific formulation, translate it into SMT-LIB2 using the **encode** skill before proceeding. + +3. **Solve or Transform**: Route to the appropriate skill based on the request type. Multiple skills may be chained when the task requires it (for example, encoding followed by optimization followed by explanation). + +4. **Explain Results**: After solving, invoke **explain** to present the result in clear, human-readable language. Always interpret models, proofs, and optimization results for the user. + +5. **Iterate**: On follow-up queries, refine the formulation or re-invoke skills with adjusted parameters. Do not re-run the full pipeline when only a narrow adjustment is needed. + +### Available Skills + +| # | Skill | Purpose | +|---|-------|---------| +| 1 | solve | Check satisfiability of a formula. Extract models when satisfiable. Report unsatisfiable cores when unsat. | +| 2 | prove | Establish validity of a formula by checking the negation for unsatisfiability. If the negation is unsat, the original is valid. | +| 3 | optimize | Solve constrained optimization problems. Supports minimize and maximize objectives, lexicographic and Pareto modes. | +| 4 | simplify | Apply Z3 tactics to reduce formula complexity. Useful for preprocessing, normal form conversion, and human-readable reformulation. | +| 5 | encode | Translate a problem description into SMT-LIB2 syntax. Handles sort selection, quantifier introduction, and theory annotation. | +| 6 | explain | Interpret Z3 output (models, unsat cores, proofs, optimization results, statistics) and present it in plain language. | +| 7 | benchmark | Measure solving performance. Collect statistics, compare tactic configurations, identify bottlenecks, and suggest parameter tuning. | + +### Skill Dependencies + +The planner respects these edges: + +``` +encode --> solve +encode --> prove +encode --> optimize +encode --> simplify +solve --> explain +prove --> explain +optimize --> explain +simplify --> explain +benchmark --> explain +solve --> benchmark +optimize --> benchmark +``` + +Skills on the left must complete before skills on the right when both appear in a pipeline. Independent skills (for example, solve and optimize on separate formulas) may run in parallel. + +### Skill Selection + +Given a user request, select skills as follows: + +- "Is this formula satisfiable?" : `solve` +- "Find a model for these constraints" : `solve` then `explain` +- "Prove that P implies Q" : `encode` (if needed) then `prove` then `explain` +- "Prove this is always true" : `prove` then `explain` +- "Optimize this scheduling problem" : `encode` then `optimize` then `explain` +- "Minimize cost subject to constraints" : `optimize` then `explain` +- "Simplify this expression" : `simplify` then `explain` +- "Convert to CNF" : `simplify` +- "Translate this problem to SMT-LIB2" : `encode` +- "Why is Z3 returning unknown?" : `explain` +- "Why is this query slow?" : `benchmark` then `explain` +- "Compare these two tactic pipelines" : `benchmark` then `explain` +- "What does this model mean?" : `explain` +- "Get the unsat core" : `solve` then `explain` + +When the request is ambiguous, prefer the most informative pipeline. For example, "check this formula" should invoke `solve` followed by `explain`, not `solve` alone. + +### Examples + +User: "Is (x > 0 and y > 0 and x + y < 1) satisfiable over the reals?" + +1. **solve**: Assert the conjunction over real-valued variables. Run `(check-sat)`. +2. **explain**: If sat, present the model. If unsat, state that no assignment satisfies all three constraints simultaneously. + +User: "Prove that for all integers x, if x^2 is even then x is even." + +1. **encode**: Formalize the statement. Negate it: assert there exists an integer x such that x^2 is even and x is odd. +2. **prove**: Check the negation for unsatisfiability. +3. **explain**: If unsat, the original statement is valid. Present the reasoning. If sat (counterexample found), report the model and explain why the conjecture fails. + +User: "Schedule five tasks on two machines to minimize makespan." + +1. **encode**: Define integer variables for task assignments and start times. Encode machine capacity, precedence, and non-overlap constraints. +2. **optimize**: Minimize the makespan variable subject to the encoded constraints. +3. **explain**: Present the optimal schedule, makespan value, and any binding constraints. + +User: "Why is my bitvector query so slow?" + +1. **benchmark**: Run the query with `(set-option :timeout 30000)` and collect statistics via `(get-info :all-statistics)`. +2. **explain**: Identify dominant cost centers (conflict count, propagation ratio, theory combination overhead). Suggest tactic or parameter adjustments such as `:blast_full` for bitvectors or increasing the relevancy threshold. + +### Error Handling + +Z3 may return results other than `sat` or `unsat`. Handle each case as follows: + +**unknown**: Z3 could not determine satisfiability within the given resource limits. +- Check if a timeout was active. If so, suggest increasing it. +- Inspect the reason with `(get-info :reason-unknown)`. +- If the reason is "incomplete," the formula may use a theory fragment that Z3 cannot decide. Suggest alternative encodings (for example, replacing nonlinear arithmetic with linearization or bit-blasting). +- If the reason is "timeout" or "max-conflicts," suggest parameter tuning: increase `:timeout`, adjust `:smt.relevancy`, or try a different tactic pipeline. + +**error (syntax or sort mismatch)**: The input is malformed. +- Report the exact error message from Z3. +- Identify the offending declaration or assertion. +- Suggest a corrected encoding. + +**error (resource exhaustion)**: Z3 ran out of memory or hit an internal limit. +- Suggest simplifying the problem: reduce variable count, eliminate quantifiers where possible, split into subproblems. +- Suggest incremental solving with `(push)` / `(pop)` to reuse learned information. + +**unsat with no core requested**: The formula is unsatisfiable but the user may want to understand why. +- Offer to re-run with `(set-option :produce-unsat-cores true)` and named assertions to extract a minimal explanation. + +### Notes + +- Always validate SMT-LIB2 syntax before invoking Z3. A malformed input wastes time and produces confusing errors. +- Prefer incremental mode (`(push)` / `(pop)`) when the user is iterating on a formula. +- Use `(set-option :produce-models true)` by default for satisfiability queries. +- Use `(set-option :produce-proofs true)` when the user requests validity proofs. +- Collect statistics with `z3 -st` when performance is relevant. +- Present models in a readable table format, not raw S-expressions, unless the user requests SMT-LIB2 output. +- Never fabricate results. If a skill fails or Z3 produces an unexpected answer, report the raw output and explain what went wrong. diff --git a/.github/agents/z3-verifier.md b/.github/agents/z3-verifier.md new file mode 100644 index 000000000..246ce1b5a --- /dev/null +++ b/.github/agents/z3-verifier.md @@ -0,0 +1,131 @@ +--- +name: z3-verifier +description: 'Z3 code quality agent: memory safety checking, static analysis, and stress testing for the Z3 codebase itself.' +--- + +## Instructions + +You are the Z3 Verifier Agent, a Copilot agent for code quality and correctness verification of the Z3 theorem prover codebase. You do not solve SMT problems (use **z3-solver** for that). Instead, you detect bugs, enforce code quality, and stress-test Z3 internals. Follow the workflow below. Use subagents for long-running skill invocations such as fuzzing campaigns. + +### Workflow + +1. **Identify the Verification Goal**: Determine what the user needs: memory bug detection, static analysis findings, or stress testing results. If the request is broad ("verify this code" or "full verification pass"), run all three skills. + +2. **Build the Target**: Ensure a Z3 build exists with the required instrumentation (sanitizers, debug symbols, coverage). If not, build one before proceeding. + +3. **Run Verification Skills**: Invoke the appropriate skill(s). When running a full verification pass, execute all three skills and aggregate results. + +4. **Report Findings**: Present results sorted by severity. Each finding should include: location (file, function, line), category, severity, and reproduction steps where applicable. + +5. **Iterate**: On follow-ups, narrow scope to specific files, functions, or bug categories. Do not re-run the full pipeline unnecessarily. + +### Available Skills + +| # | Skill | Purpose | +|---|-------|---------| +| 1 | memory-safety | Build Z3 with AddressSanitizer (ASan), MemorySanitizer (MSan), or UndefinedBehaviorSanitizer (UBSan). Run the test suite under instrumentation to detect memory corruption, use-after-free, buffer overflows, uninitialized reads, and undefined behavior. | +| 2 | static-analysis | Run the Clang Static Analyzer over the Z3 source tree. Detects null pointer dereferences, resource leaks, dead stores, logic errors, and API misuse without executing the code. | +| 3 | deeptest | Stress-test Z3 with randomized inputs, differential testing against known-good solvers, and targeted fuzzing of parser and solver components. Detects crashes, assertion failures, and correctness regressions. | + +### Skill Dependencies + +``` +memory-safety (independent) +static-analysis (independent) +deeptest (independent) +``` + +All three skills are independent and may run in parallel. None requires the output of another as input. When running a full verification pass, launch all three simultaneously via subagents. + +### Skill Selection + +Given a user request, select skills as follows: + +- "Check for memory bugs" : `memory-safety` +- "Run ASan on the test suite" : `memory-safety` +- "Find undefined behavior" : `memory-safety` (with UBSan configuration) +- "Run static analysis" : `static-analysis` +- "Find null pointer bugs" : `static-analysis` +- "Check for resource leaks" : `static-analysis` +- "Fuzz Z3" : `deeptest` +- "Stress test the parser" : `deeptest` +- "Run differential testing" : `deeptest` +- "Full verification pass" : `memory-safety` + `static-analysis` + `deeptest` +- "Verify this pull request" : `memory-safety` + `static-analysis` (scope to changed files) +- "Is this change safe?" : `memory-safety` + `static-analysis` (scope to changed files) + +### Examples + +User: "Check for memory bugs in the SAT solver." + +1. **memory-safety**: Build Z3 with ASan enabled (`cmake -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" ..`). Run the SAT solver tests. Collect any sanitizer reports. +2. Report findings with stack traces, categorized by bug type (heap-buffer-overflow, use-after-free, stack-buffer-overflow, etc.). + +User: "Run static analysis on src/ast/." + +1. **static-analysis**: Invoke `scan-build` or `clang-tidy` over `src/ast/` with Z3's compile commands database. +2. Report findings sorted by severity. Include checker name, file, line, and a brief description of each issue. + +User: "Fuzz the SMT-LIB2 parser." + +1. **deeptest**: Generate randomized SMT-LIB2 inputs targeting the parser. Run Z3 on each input with a timeout. Collect crashes, assertion failures, and unexpected error messages. +2. Report crash-inducing inputs with minimized reproduction cases. Classify findings as crashes, assertion violations, or incorrect results. + +User: "Full verification pass before the release." + +1. Launch all three skills in parallel via subagents: + - **memory-safety**: Full test suite under ASan and UBSan. + - **static-analysis**: Full source tree scan. + - **deeptest**: Broad fuzzing campaign across theories (arithmetic, bitvectors, arrays, strings). +2. Aggregate all findings. Deduplicate issues that appear in multiple skills (for example, a null dereference found by both static analysis and ASan). Sort by severity: Critical, High, Medium, Low. +3. Present a summary table followed by detailed findings. + +### Build Configurations + +Each skill may require a specific build configuration: + +**memory-safety (ASan)**: +```bash +mkdir build-asan && cd build-asan +cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_C_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_BUILD_TYPE=Debug +make -j$(nproc) +``` + +**memory-safety (UBSan)**: +```bash +mkdir build-ubsan && cd build-ubsan +cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=undefined" -DCMAKE_C_FLAGS="-fsanitize=undefined" -DCMAKE_BUILD_TYPE=Debug +make -j$(nproc) +``` + +**static-analysis**: +```bash +mkdir build-analyze && cd build-analyze +scan-build cmake .. -DCMAKE_BUILD_TYPE=Debug +scan-build make -j$(nproc) +``` + +**deeptest**: Uses a standard Release build for performance, with Debug builds reserved for reproducing crashes: +```bash +mkdir build-fuzz && cd build-fuzz +cmake .. -DCMAKE_BUILD_TYPE=Release +make -j$(nproc) +``` + +### Error Handling + +**Build failure**: If the instrumented build fails, report the compiler errors. Common causes: sanitizer flags incompatible with certain optimization levels, or missing sanitizer runtime libraries. + +**Flaky sanitizer reports**: Some sanitizer findings may be nondeterministic (especially under MSan with uninitialized memory). Re-run flagged tests three times to confirm reproducibility. Mark non-reproducible findings as "intermittent" rather than discarding them. + +**Fuzzing timeouts**: Individual fuzz inputs that cause Z3 to exceed the timeout threshold should be collected separately and reported as potential performance regressions, not crashes. + +**False positives in static analysis**: The Clang Static Analyzer may produce false positives, particularly around custom allocators and reference-counted smart pointers used in Z3. Flag likely false positives but do not suppress them without user confirmation. + +### Notes + +- Sanitizer builds are significantly slower than Release builds. Set timeouts to at least 3x the normal test suite duration. +- Store sanitizer reports and fuzzing artifacts in `.z3-verifier/` unless the user specifies otherwise. +- When scoping to changed files for pull request verification, use `git diff` to determine the affected source files and limit skill invocations accordingly. +- Never suppress or ignore sanitizer findings automatically. Every report should be presented to the user for triage. +- Prefer ASan as the default sanitizer. It catches the broadest class of memory errors with the lowest false-positive rate. diff --git a/.github/skills/README.md b/.github/skills/README.md new file mode 100644 index 000000000..53fc9f80d --- /dev/null +++ b/.github/skills/README.md @@ -0,0 +1,74 @@ +# Z3 Agent Skills + +Reusable, composable verification primitives for the Z3 theorem prover. +Each skill is a self-contained unit: a `SKILL.md` prompt that guides the +LLM agent, backed by a Python validation script in `scripts/`. + +## Skill Catalogue + +| Skill | Status | Description | +|-------|--------|-------------| +| solve | implemented | Check satisfiability of SMT-LIB2 formulas; return models or unsat cores | +| prove | implemented | Prove validity by negation and satisfiability checking | +| encode | implemented | Translate constraint problems into SMT-LIB2 or Z3 Python API code | +| simplify | implemented | Reduce formula complexity using configurable Z3 tactic chains | +| optimize | implemented | Solve constrained optimization (minimize/maximize) over numeric domains | +| explain | implemented | Parse and interpret Z3 output: models, cores, statistics, errors | +| benchmark | implemented | Measure Z3 performance and collect solver statistics | +| static-analysis | planned | Run Clang Static Analyzer on Z3 source and log structured findings | +| deeptest | planned | Deep property-based testing of Z3 internals | +| memory-safety | planned | Memory safety verification of Z3 C++ source | + +## Agents + +Two orchestration agents compose these skills into end-to-end workflows: + +| Agent | Role | +|-------|------| +| z3-solver | Formulation and solving: encode, solve, prove, simplify, optimize, explain | +| z3-verifier | Codebase quality: benchmark, static-analysis, deeptest, memory-safety | + +## Shared Infrastructure + +All scripts share a common library at `shared/z3db.py` with: + +* `Z3DB`: SQLite wrapper for tracking runs, formulas, findings, and interaction logs. +* `run_z3()`: Pipe SMT-LIB2 into `z3 -in` with timeout handling. +* `find_z3()`: Locate the Z3 binary across build directories and PATH. +* Parsers: `parse_model()`, `parse_stats()`, `parse_unsat_core()`. + +The database schema lives in `shared/schema.sql`. + +## Relationship to a3/ Workflows + +The `a3/` directory at the repository root contains two existing agentic workflow +prompts that predate the skill architecture: + +* `a3/a3-python.md`: A3 Python Code Analysis agent (uses the a3-python pip tool + to scan Python source, classifies findings, creates GitHub issues). +* `a3/a3-rust.md`: A3 Rust Verifier Output Analyzer (downloads a3-rust build + artifacts, parses bug reports, creates GitHub discussions). + +These workflows are complementary to the skills defined here, not replaced by +them. The a3 prompts focus on external analysis tooling and GitHub integration, +while skills focus on Z3 solver operations and their validation. Both may be +composed by the same orchestrating agent. + +## Usage + +Check database status and recent runs: + +``` +python shared/z3db.py status +python shared/z3db.py runs --skill solve --last 5 +python shared/z3db.py log --run-id 12 +python shared/z3db.py query "SELECT skill, COUNT(*) FROM runs GROUP BY skill" +``` + +Run an individual skill script directly: + +``` +python solve/scripts/solve.py --file problem.smt2 +python encode/scripts/encode.py --validate formula.smt2 +python benchmark/scripts/benchmark.py --file problem.smt2 +``` diff --git a/.github/skills/benchmark/SKILL.md b/.github/skills/benchmark/SKILL.md new file mode 100644 index 000000000..cffacde54 --- /dev/null +++ b/.github/skills/benchmark/SKILL.md @@ -0,0 +1,48 @@ +--- +name: benchmark +description: Measure Z3 performance on a formula or file. Collects wall-clock time, theory solver statistics, memory usage, and conflict counts. Results are logged to z3agent.db for longitudinal tracking. +--- + +Given an SMT-LIB2 formula or file, run Z3 with statistics enabled and report performance characteristics. This is useful for identifying performance regressions, comparing tactic strategies, and profiling theory solver workload distribution. + +# Step 1: Run Z3 with statistics + +```bash +python3 scripts/benchmark.py --file problem.smt2 +python3 scripts/benchmark.py --file problem.smt2 --runs 5 +python3 scripts/benchmark.py --formula "(declare-const x Int)..." --debug +``` + +The script invokes `z3 -st` and parses the `:key value` statistics block. + +# Step 2: Interpret the output + +The output includes: + +- wall-clock time (ms) +- result (sat/unsat/unknown/timeout) +- memory usage (MB) +- conflicts, decisions, propagations +- per-theory breakdown (arithmetic, bv, array, etc.) + +With `--runs N`, the script runs Z3 N times and reports min/median/max timing. + +# Step 3: Compare over time + +Past benchmark runs are logged to `z3agent.db`. Query them: +```bash +python3 ../../shared/z3db.py runs --skill benchmark --last 20 +python3 ../../shared/z3db.py query "SELECT smtlib2, result, stats FROM formulas WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') ORDER BY run_id DESC LIMIT 5" +``` + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| formula | string | no | | SMT-LIB2 formula | +| file | path | no | | path to .smt2 file | +| runs | int | no | 1 | number of repeated runs for timing | +| timeout | int | no | 60 | seconds per run | +| z3 | path | no | auto | path to z3 binary | +| debug | flag | no | off | verbose tracing | +| db | path | no | .z3-agent/z3agent.db | logging database | diff --git a/.github/skills/benchmark/scripts/benchmark.py b/.github/skills/benchmark/scripts/benchmark.py new file mode 100644 index 000000000..1e23abe1f --- /dev/null +++ b/.github/skills/benchmark/scripts/benchmark.py @@ -0,0 +1,74 @@ +#!/usr/bin/env python3 +""" +benchmark.py: measure Z3 performance with statistics. + +Usage: + python benchmark.py --file problem.smt2 + python benchmark.py --file problem.smt2 --runs 5 +""" + +import argparse +import statistics +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, run_z3, parse_stats, setup_logging + + +def main(): + parser = argparse.ArgumentParser(prog="benchmark") + parser.add_argument("--formula") + parser.add_argument("--file") + parser.add_argument("--runs", type=int, default=1) + parser.add_argument("--timeout", type=int, default=60) + parser.add_argument("--z3", default=None) + parser.add_argument("--db", default=None) + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + + if args.file: + formula = Path(args.file).read_text() + elif args.formula: + formula = args.formula + else: + parser.error("provide --formula or --file") + return + + db = Z3DB(args.db) + timings = [] + + for i in range(args.runs): + run_id = db.start_run("benchmark", formula) + result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, + args=["-st"], debug=args.debug) + + stats = parse_stats(result["stdout"]) + db.log_formula(run_id, formula, result["result"], stats=stats) + db.finish_run(run_id, result["result"], result["duration_ms"], + result["exit_code"]) + timings.append(result["duration_ms"]) + + if args.runs == 1: + print(f"result: {result['result']}") + print(f"time: {result['duration_ms']}ms") + if stats: + print("statistics:") + for k, v in sorted(stats.items()): + print(f" :{k} {v}") + + if args.runs > 1: + print(f"runs: {args.runs}") + print(f"min: {min(timings)}ms") + print(f"median: {statistics.median(timings):.0f}ms") + print(f"max: {max(timings)}ms") + print(f"result: {result['result']}") + + db.close() + sys.exit(0 if result["exit_code"] == 0 else 1) + + +if __name__ == "__main__": + main() diff --git a/.github/skills/deeptest/SKILL.md b/.github/skills/deeptest/SKILL.md new file mode 100644 index 000000000..ead3f5b84 --- /dev/null +++ b/.github/skills/deeptest/SKILL.md @@ -0,0 +1,70 @@ +--- +name: deeptest +description: Generate stress tests and differential tests for Z3 theories. Creates random or structured SMT-LIB2 formulas, runs them through Z3, and checks for crashes, assertion failures, or result inconsistencies. Inspired by fuzzing and metamorphic testing approaches applied to SMT solvers. +--- + +Given a strategy and count, generate SMT-LIB2 formulas targeting Z3 internals and report anomalies. Strategies range from pure random generation to structured metamorphic and cross-theory combinations. Every formula and finding is logged to z3agent.db. + +# Step 1: Choose a strategy and run + +```bash +python3 scripts/deeptest.py --strategy random --count 100 --seed 42 +python3 scripts/deeptest.py --strategy metamorphic --seed-file base.smt2 --count 50 +python3 scripts/deeptest.py --strategy cross-theory --theories "LIA,BV" --count 80 +python3 scripts/deeptest.py --strategy incremental --count 60 --debug +``` + +Available strategies: + +- `random`: generate formulas with random declarations (Int, Bool, BitVec), random arithmetic and boolean assertions, and check-sat. +- `metamorphic`: start from a base formula (generated or loaded from file), apply equisatisfiable transformations (tautology insertion, double negation, assertion duplication), and verify the result stays consistent. +- `cross-theory`: combine multiple theories (LIA, Bool, BV) in a single formula with bridging constraints to stress theory combination. +- `incremental`: generate push/pop sequences with per-frame assertions to stress incremental solving. + +# Step 2: Interpret the output + +The script prints a summary after completion: + +``` +strategy: random +seed: 42 +formulas: 100 +anomalies: 2 +elapsed: 4500ms +``` + +A nonzero anomaly count means the run detected crashes (nonzero exit code), assertion failures in stderr, solver errors, or result disagreements between a base formula and its metamorphic variants. + +# Step 3: Inspect findings + +Findings are logged to `z3agent.db` with category, severity, and details: + +```bash +python3 ../../shared/z3db.py query "SELECT category, severity, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='deeptest') ORDER BY finding_id DESC LIMIT 20" +``` + +Each finding includes the formula index, exit code, and a stderr excerpt for triage. + +# Step 4: Reproduce + +Use the `--seed` parameter to reproduce a run exactly: + +```bash +python3 scripts/deeptest.py --strategy random --count 100 --seed 42 +``` + +The seed is printed in every run summary and logged in the run record. + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| strategy | string | no | random | test generation strategy: random, metamorphic, cross-theory, incremental | +| count | int | no | 50 | number of formulas to generate | +| seed | int | no | clock | random seed for reproducibility | +| seed-file | path | no | | base .smt2 file for metamorphic strategy | +| theories | string | no | LIA,BV | comma-separated theories for cross-theory strategy | +| timeout | int | no | 10 | per-formula Z3 timeout in seconds | +| z3 | path | no | auto | path to z3 binary | +| debug | flag | no | off | verbose tracing | +| db | path | no | .z3-agent/z3agent.db | logging database | diff --git a/.github/skills/deeptest/scripts/deeptest.py b/.github/skills/deeptest/scripts/deeptest.py new file mode 100644 index 000000000..5d513a6bd --- /dev/null +++ b/.github/skills/deeptest/scripts/deeptest.py @@ -0,0 +1,393 @@ +#!/usr/bin/env python3 +""" +deeptest.py: generate and run stress tests for Z3. + +Usage: + python deeptest.py --strategy random --count 100 + python deeptest.py --strategy metamorphic --seed-file base.smt2 + python deeptest.py --strategy cross-theory --theories "LIA,BV" --debug +""" + +import argparse +import logging +import random +import sys +import time +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, run_z3, setup_logging + +log = logging.getLogger("deeptest") + +# Sort and operator tables + +THEORY_SORTS = { + "LIA": "Int", + "Bool": "Bool", + "BV": "(_ BitVec 32)", +} + +INT_ARITH = ["+", "-", "*"] +INT_CMP = [">", "<", ">=", "<=", "="] +BV_ARITH = ["bvadd", "bvsub", "bvand", "bvor", "bvxor"] +BV_CMP = ["bvslt", "bvsgt", "bvsle", "bvsge", "="] + +# Assertion generators (one per sort) + + +def _int_assertion(rng, vs): + if len(vs) < 2: + return f"(assert ({rng.choice(INT_CMP)} {vs[0]} {rng.randint(-10, 10)}))" + a, b = rng.sample(vs, 2) + return f"(assert ({rng.choice(INT_CMP)} ({rng.choice(INT_ARITH)} {a} {b}) {rng.randint(-10, 10)}))" + + +def _bool_assertion(rng, vs): + if len(vs) == 1: + return f"(assert {vs[0]})" if rng.random() < 0.5 else f"(assert (not {vs[0]}))" + a, b = rng.sample(vs, 2) + return f"(assert ({rng.choice(['and', 'or', '=>'])} {a} {b}))" + + +def _bv_assertion(rng, vs): + lit = f"(_ bv{rng.randint(0, 255)} 32)" + if len(vs) < 2: + return f"(assert ({rng.choice(BV_CMP)} {vs[0]} {lit}))" + a, b = rng.sample(vs, 2) + return f"(assert ({rng.choice(BV_CMP)} ({rng.choice(BV_ARITH)} {a} {b}) {lit}))" + + +SORT_ASSERTION = { + "Int": _int_assertion, + "Bool": _bool_assertion, + "(_ BitVec 32)": _bv_assertion, +} + + +def _random_assertion(rng, vars_by_sort): + """Pick a populated sort and emit one random assertion.""" + available = [s for s in vars_by_sort if vars_by_sort[s]] + if not available: + return None + sort = rng.choice(available) + return SORT_ASSERTION[sort](rng, vars_by_sort[sort]) + +# Formula generators + + +def gen_random_formula(rng, num_vars=5, num_assertions=5): + """Random declarations, random assertions, single check-sat.""" + lines = [] + vars_by_sort = {} + sorts = list(THEORY_SORTS.values()) + + for i in range(num_vars): + sort = rng.choice(sorts) + name = f"v{i}" + lines.append(f"(declare-const {name} {sort})") + vars_by_sort.setdefault(sort, []).append(name) + + for _ in range(num_assertions): + a = _random_assertion(rng, vars_by_sort) + if a: + lines.append(a) + + lines.append("(check-sat)") + return "\n".join(lines) + + +def gen_metamorphic_variant(rng, base_formula): + """Apply an equisatisfiable transformation to a formula. + + Transformations: + tautology : insert (assert true) before check-sat + double_neg : wrap one assertion body in double negation + duplicate : repeat an existing assertion + """ + lines = base_formula.strip().split("\n") + transform = rng.choice(["tautology", "double_neg", "duplicate"]) + assertion_idxs = [i for i, l in enumerate(lines) + if l.strip().startswith("(assert")] + + if transform == "tautology": + pos = next((i for i, l in enumerate(lines) if "check-sat" in l), + len(lines)) + lines.insert(pos, "(assert true)") + + elif transform == "double_neg" and assertion_idxs: + idx = rng.choice(assertion_idxs) + body = lines[idx].strip() + inner = body[len("(assert "):-1] + lines[idx] = f"(assert (not (not {inner})))" + + elif transform == "duplicate" and assertion_idxs: + idx = rng.choice(assertion_idxs) + lines.insert(idx + 1, lines[idx]) + + return "\n".join(lines) + + +def gen_cross_theory_formula(rng, theories, num_vars=4, num_assertions=6): + """Combine variables from multiple theories with bridging constraints.""" + lines = [] + vars_by_sort = {} + sorts = [THEORY_SORTS[t] for t in theories if t in THEORY_SORTS] + if not sorts: + sorts = list(THEORY_SORTS.values()) + + for i in range(num_vars): + sort = sorts[i % len(sorts)] + name = f"v{i}" + lines.append(f"(declare-const {name} {sort})") + vars_by_sort.setdefault(sort, []).append(name) + + for _ in range(num_assertions): + a = _random_assertion(rng, vars_by_sort) + if a: + lines.append(a) + + # Bridge Int and Bool when both present + int_vs = vars_by_sort.get("Int", []) + bool_vs = vars_by_sort.get("Bool", []) + if int_vs and bool_vs: + iv = rng.choice(int_vs) + bv = rng.choice(bool_vs) + lines.append(f"(assert (= {bv} (> {iv} 0)))") + + lines.append("(check-sat)") + return "\n".join(lines) + + +def gen_incremental_formula(rng, num_frames=3, num_vars=4, + asserts_per_frame=3): + """Push/pop sequence: all variables declared globally, assertions scoped.""" + lines = [] + vars_by_sort = {} + sorts = list(THEORY_SORTS.values()) + + for i in range(num_vars): + sort = rng.choice(sorts) + name = f"v{i}" + lines.append(f"(declare-const {name} {sort})") + vars_by_sort.setdefault(sort, []).append(name) + + for _ in range(num_frames): + lines.append("(push 1)") + for _ in range(asserts_per_frame): + a = _random_assertion(rng, vars_by_sort) + if a: + lines.append(a) + lines.append("(check-sat)") + lines.append("(pop 1)") + + lines.append("(check-sat)") + return "\n".join(lines) + +# Anomaly detection + + +def classify_result(result): + """Return an anomaly category string or None if the result looks normal.""" + if result["exit_code"] != 0 and result["result"] != "timeout": + return "crash" + if "assertion" in result["stderr"].lower(): + return "assertion_failure" + if result["result"] == "error": + return "error" + return None + +# Strategy runners + + +def run_random(args, rng, db, run_id): + anomalies = 0 + for i in range(args.count): + formula = gen_random_formula(rng, rng.randint(2, 8), + rng.randint(1, 10)) + log.debug("formula %d:\n%s", i, formula) + result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, + debug=args.debug) + db.log_formula(run_id, formula, result["result"]) + + cat = classify_result(result) + if cat: + anomalies += 1 + db.log_finding( + run_id, cat, + f"random formula #{i}: {cat} (exit={result['exit_code']})", + severity="high" if cat == "crash" else "medium", + details={"formula_index": i, + "exit_code": result["exit_code"], + "stderr": result["stderr"][:500]}) + log.warning("anomaly in formula %d: %s", i, cat) + return anomalies + + +def run_metamorphic(args, rng, db, run_id): + if args.seed_file: + base = Path(args.seed_file).read_text() + else: + base = gen_random_formula(rng, num_vars=4, num_assertions=3) + + base_out = run_z3(base, z3_bin=args.z3, timeout=args.timeout, + debug=args.debug) + base_status = base_out["result"] + db.log_formula(run_id, base, base_status) + log.info("base formula result: %s", base_status) + + if base_status not in ("sat", "unsat"): + db.log_finding(run_id, "skip", + f"base formula not definite: {base_status}", + severity="info") + return 0 + + anomalies = 0 + for i in range(args.count): + variant = gen_metamorphic_variant(rng, base) + log.debug("variant %d:\n%s", i, variant) + result = run_z3(variant, z3_bin=args.z3, timeout=args.timeout, + debug=args.debug) + db.log_formula(run_id, variant, result["result"]) + + cat = classify_result(result) + if cat: + anomalies += 1 + db.log_finding( + run_id, cat, + f"metamorphic variant #{i}: {cat}", + severity="high", + details={"variant_index": i, + "stderr": result["stderr"][:500]}) + log.warning("anomaly in variant %d: %s", i, cat) + continue + + if result["result"] in ("sat", "unsat") \ + and result["result"] != base_status: + anomalies += 1 + db.log_finding( + run_id, "disagreement", + f"variant #{i}: expected {base_status}, " + f"got {result['result']}", + severity="critical", + details={"variant_index": i, + "expected": base_status, + "actual": result["result"]}) + log.warning("disagreement in variant %d: expected %s, got %s", + i, base_status, result["result"]) + return anomalies + + +def run_cross_theory(args, rng, db, run_id): + theories = [t.strip() for t in args.theories.split(",")] + anomalies = 0 + for i in range(args.count): + formula = gen_cross_theory_formula(rng, theories, + rng.randint(3, 8), + rng.randint(2, 10)) + log.debug("cross-theory formula %d:\n%s", i, formula) + result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, + debug=args.debug) + db.log_formula(run_id, formula, result["result"]) + + cat = classify_result(result) + if cat: + anomalies += 1 + db.log_finding( + run_id, cat, + f"cross-theory #{i} ({','.join(theories)}): {cat}", + severity="high" if cat == "crash" else "medium", + details={"formula_index": i, "theories": theories, + "exit_code": result["exit_code"], + "stderr": result["stderr"][:500]}) + log.warning("anomaly in cross-theory formula %d: %s", i, cat) + return anomalies + + +def run_incremental(args, rng, db, run_id): + anomalies = 0 + for i in range(args.count): + num_frames = rng.randint(2, 6) + formula = gen_incremental_formula(rng, num_frames) + log.debug("incremental formula %d:\n%s", i, formula) + result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, + debug=args.debug) + db.log_formula(run_id, formula, result["result"]) + + cat = classify_result(result) + if cat: + anomalies += 1 + db.log_finding( + run_id, cat, + f"incremental #{i} ({num_frames} frames): {cat}", + severity="high" if cat == "crash" else "medium", + details={"formula_index": i, "num_frames": num_frames, + "exit_code": result["exit_code"], + "stderr": result["stderr"][:500]}) + log.warning("anomaly in incremental formula %d: %s", i, cat) + return anomalies + + +STRATEGIES = { + "random": run_random, + "metamorphic": run_metamorphic, + "cross-theory": run_cross_theory, + "incremental": run_incremental, +} + +# Entry point + + +def main(): + parser = argparse.ArgumentParser( + prog="deeptest", + description="Generate and run stress tests for Z3.", + ) + parser.add_argument("--strategy", choices=list(STRATEGIES), + default="random", + help="test generation strategy") + parser.add_argument("--count", type=int, default=50, + help="number of formulas to generate") + parser.add_argument("--seed", type=int, default=None, + help="random seed for reproducibility") + parser.add_argument("--seed-file", default=None, + help="base .smt2 file for metamorphic strategy") + parser.add_argument("--theories", default="LIA,BV", + help="comma-separated theories for cross-theory") + parser.add_argument("--timeout", type=int, default=10, + help="per-formula Z3 timeout in seconds") + parser.add_argument("--z3", default=None, help="path to z3 binary") + parser.add_argument("--db", default=None, help="path to z3agent.db") + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + + seed = args.seed if args.seed is not None else int(time.time()) + rng = random.Random(seed) + log.info("seed: %d", seed) + + db = Z3DB(args.db) + run_id = db.start_run( + "deeptest", + f"strategy={args.strategy} count={args.count} seed={seed}") + + t0 = time.monotonic() + anomalies = STRATEGIES[args.strategy](args, rng, db, run_id) + elapsed_ms = int((time.monotonic() - t0) * 1000) + + status = "success" if anomalies == 0 else "findings" + db.finish_run(run_id, status, elapsed_ms) + + print(f"strategy: {args.strategy}") + print(f"seed: {seed}") + print(f"formulas: {args.count}") + print(f"anomalies: {anomalies}") + print(f"elapsed: {elapsed_ms}ms") + + db.close() + sys.exit(1 if anomalies > 0 else 0) + + +if __name__ == "__main__": + main() diff --git a/.github/skills/encode/SKILL.md b/.github/skills/encode/SKILL.md new file mode 100644 index 000000000..eef343bef --- /dev/null +++ b/.github/skills/encode/SKILL.md @@ -0,0 +1,45 @@ +--- +name: encode +description: Translate constraint problems into SMT-LIB2 or Z3 Python API code. Handles common problem classes including scheduling, graph coloring, arithmetic puzzles, and verification conditions. +--- + +Given a problem description (natural language, pseudocode, or a partial formulation), produce a complete, syntactically valid SMT-LIB2 encoding or Z3 Python script. The encoding should declare all variables, assert all constraints, and include the appropriate check-sat / get-model commands. + +# Step 1: Identify the problem class + +Common encodings: + +| Problem class | Theory | Typical sorts | +|---------------|--------|---------------| +| Integer arithmetic | LIA / NIA | Int | +| Real arithmetic | LRA / NRA | Real | +| Bitvector operations | QF_BV | (_ BitVec N) | +| Arrays and maps | QF_AX | (Array Int Int) | +| Strings and regex | QF_S | String, RegLan | +| Uninterpreted functions | QF_UF | custom sorts | +| Mixed theories | AUFLIA, etc. | combination | + +# Step 2: Generate the encoding + +```bash +python3 scripts/encode.py --problem "Find integers x, y such that x^2 + y^2 = 25 and x > 0" --format smtlib2 +python3 scripts/encode.py --problem "Schedule 4 tasks on 2 machines minimizing makespan" --format python +``` + +For `--format smtlib2`, the output is a complete .smt2 file ready for the **solve** skill. +For `--format python`, the output is a standalone Z3 Python script. + +# Step 3: Validate the encoding + +The script checks that the generated formula is syntactically valid by running a quick `z3 -in` parse check (no solving, just syntax). Parse errors are reported with the offending line. + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| problem | string | yes | | problem description | +| format | string | no | smtlib2 | output format: smtlib2 or python | +| output | path | no | stdout | write to file instead of stdout | +| validate | flag | no | on | run syntax check on the output | +| debug | flag | no | off | verbose tracing | +| db | path | no | .z3-agent/z3agent.db | logging database | diff --git a/.github/skills/encode/scripts/encode.py b/.github/skills/encode/scripts/encode.py new file mode 100644 index 000000000..67f3ea87d --- /dev/null +++ b/.github/skills/encode/scripts/encode.py @@ -0,0 +1,144 @@ +#!/usr/bin/env python3 +""" +encode.py: validate and format SMT-LIB2 encodings. + +Usage: + python encode.py --validate formula.smt2 + python encode.py --validate formula.smt2 --debug +""" + +import argparse +import re +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, run_z3, setup_logging + + +VALIDATION_TIMEOUT = 5 + + +def read_input(path_or_stdin: str) -> str: + """Read formula from a file path or stdin (when path is '-').""" + if path_or_stdin == "-": + return sys.stdin.read() + p = Path(path_or_stdin) + if not p.exists(): + print(f"file not found: {p}", file=sys.stderr) + sys.exit(1) + return p.read_text() + + +def find_errors(output: str) -> list: + """Extract (error ...) messages from Z3 output.""" + return re.findall(r'\(error\s+"([^"]+)"\)', output) + + +def validate(formula: str, z3_bin: str = None, debug: bool = False) -> dict: + """ + Validate an SMT-LIB2 formula by piping it through z3 -in. + Returns a dict with 'valid' (bool), 'errors' (list), and 'raw' output. + """ + result = run_z3( + formula, z3_bin=z3_bin, timeout=VALIDATION_TIMEOUT, debug=debug, + ) + errors = find_errors(result["stdout"]) + find_errors(result["stderr"]) + + if result["result"] == "timeout": + # Timeout during validation is not a syntax error: the formula + # parsed successfully but solving exceeded the limit. That counts + # as syntactically valid. + return {"valid": True, "errors": [], "raw": result} + + if errors or result["exit_code"] != 0: + return {"valid": False, "errors": errors, "raw": result} + + return {"valid": True, "errors": [], "raw": result} + + +def report_errors(errors: list, formula: str): + """Print each syntax error with surrounding context.""" + lines = formula.splitlines() + print(f"validation failed: {len(errors)} error(s)", file=sys.stderr) + for err in errors: + print(f" : {err}", file=sys.stderr) + if len(lines) <= 20: + print("formula:", file=sys.stderr) + for i, line in enumerate(lines, 1): + print(f" {i:3d} {line}", file=sys.stderr) + + +def write_output(formula: str, output_path: str, fmt: str): + """Write the validated formula to a file or stdout.""" + if fmt == "python": + print("python format output is generated by the agent, " + "not by this script", file=sys.stderr) + sys.exit(1) + + if output_path: + Path(output_path).write_text(formula) + print(f"written to {output_path}") + else: + print(formula) + + +def main(): + parser = argparse.ArgumentParser(prog="encode") + parser.add_argument( + "--validate", + metavar="FILE", + help="path to .smt2 file to validate, or '-' for stdin", + ) + parser.add_argument( + "--format", + choices=["smtlib2", "python"], + default="smtlib2", + help="output format (default: smtlib2)", + ) + parser.add_argument( + "--output", + metavar="FILE", + default=None, + help="write result to file instead of stdout", + ) + parser.add_argument("--z3", default=None, help="path to z3 binary") + parser.add_argument("--db", default=None) + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + + if not args.validate: + parser.error("provide --validate FILE") + return + + formula = read_input(args.validate) + + db = Z3DB(args.db) + run_id = db.start_run("encode", formula) + + result = validate(formula, z3_bin=args.z3, debug=args.debug) + + if result["valid"]: + db.log_formula(run_id, formula, "valid") + db.finish_run(run_id, "valid", result["raw"]["duration_ms"], 0) + write_output(formula, args.output, args.format) + db.close() + sys.exit(0) + else: + db.log_formula(run_id, formula, "error") + for err in result["errors"]: + db.log_finding(run_id, "syntax", err, severity="error") + db.finish_run( + run_id, "error", + result["raw"]["duration_ms"], + result["raw"]["exit_code"], + ) + report_errors(result["errors"], formula) + db.close() + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/.github/skills/explain/SKILL.md b/.github/skills/explain/SKILL.md new file mode 100644 index 000000000..515b51378 --- /dev/null +++ b/.github/skills/explain/SKILL.md @@ -0,0 +1,52 @@ +--- +name: explain +description: Parse and interpret Z3 output for human consumption. Handles models, unsat cores, proofs, statistics, and error messages. Translates solver internals into plain-language explanations. +--- + +Given raw Z3 output (from the **solve**, **prove**, **optimize**, or **benchmark** skills), produce a structured explanation. This skill is for cases where the solver output is large, nested, or otherwise difficult to read directly. + +# Step 1: Identify the output type + +| Output contains | Explanation type | +|----------------|-----------------| +| `(define-fun ...)` blocks | model explanation | +| unsat core labels | conflict explanation | +| `:key value` statistics | performance breakdown | +| `(error ...)` | error diagnosis | +| proof terms | proof sketch | + +# Step 2: Run the explainer + +```bash +python3 scripts/explain.py --file output.txt +python3 scripts/explain.py --stdin < output.txt +python3 scripts/explain.py --file output.txt --debug +``` + +The script auto-detects the output type and produces a structured summary. + +# Step 3: Interpret the explanation + +For models: +- Each variable is listed with its value and sort +- Array and function interpretations are expanded +- Bitvector values are shown in decimal and hex + +For unsat cores: +- The conflicting named assertions are listed +- A minimal conflict set is highlighted + +For statistics: +- Time breakdown by phase (preprocessing, solving, model construction) +- Theory solver load distribution +- Memory high-water mark + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| file | path | no | | file containing Z3 output | +| stdin | flag | no | off | read from stdin | +| type | string | no | auto | force output type: model, core, stats, error | +| debug | flag | no | off | verbose tracing | +| db | path | no | .z3-agent/z3agent.db | logging database | diff --git a/.github/skills/explain/scripts/explain.py b/.github/skills/explain/scripts/explain.py new file mode 100644 index 000000000..d2704085a --- /dev/null +++ b/.github/skills/explain/scripts/explain.py @@ -0,0 +1,128 @@ +#!/usr/bin/env python3 +""" +explain.py: interpret Z3 output in a readable form. + +Usage: + python explain.py --file output.txt + echo "sat\n(model ...)" | python explain.py --stdin +""" + +import argparse +import re +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, parse_model, parse_stats, parse_unsat_core, setup_logging + + +def detect_type(text: str) -> str: + if "(define-fun" in text: + return "model" + if "(error" in text: + return "error" + if re.search(r':\S+\s+[\d.]+', text): + return "stats" + first = text.strip().split("\n")[0].strip() + if first == "unsat": + return "core" + return "unknown" + + +def explain_model(text: str): + model = parse_model(text) + if not model: + print("no model found in output") + return + print("satisfying assignment:") + for name, val in model.items(): + # show hex for large integers (likely bitvectors) + try: + n = int(val) + if abs(n) > 255: + print(f" {name} = {val} (0x{n:x})") + else: + print(f" {name} = {val}") + except ValueError: + print(f" {name} = {val}") + + +def explain_core(text: str): + core = parse_unsat_core(text) + if core: + print(f"conflicting assertions ({len(core)}):") + for label in core: + print(f" {label}") + else: + print("unsat (no named assertions for core extraction)") + + +def explain_stats(text: str): + stats = parse_stats(text) + if not stats: + print("no statistics found") + return + print("performance breakdown:") + for k in sorted(stats): + print(f" :{k} {stats[k]}") + + if "time" in stats: + print(f"\ntotal time: {stats['time']}s") + if "memory" in stats: + print(f"peak memory: {stats['memory']} MB") + + +def explain_error(text: str): + errors = re.findall(r'\(error\s+"([^"]+)"\)', text) + if errors: + print(f"Z3 reported {len(errors)} error(s):") + for e in errors: + print(f" {e}") + else: + print("error in output but could not parse message") + + +def main(): + parser = argparse.ArgumentParser(prog="explain") + parser.add_argument("--file") + parser.add_argument("--stdin", action="store_true") + parser.add_argument("--type", choices=["model", "core", "stats", "error", "auto"], + default="auto") + parser.add_argument("--db", default=None) + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + + if args.file: + text = Path(args.file).read_text() + elif args.stdin: + text = sys.stdin.read() + else: + parser.error("provide --file or --stdin") + return + + output_type = args.type if args.type != "auto" else detect_type(text) + + db = Z3DB(args.db) + run_id = db.start_run("explain", text[:200]) + + if output_type == "model": + explain_model(text) + elif output_type == "core": + explain_core(text) + elif output_type == "stats": + explain_stats(text) + elif output_type == "error": + explain_error(text) + else: + print("could not determine output type") + print("raw output:") + print(text[:500]) + + db.finish_run(run_id, "success", 0, 0) + db.close() + + +if __name__ == "__main__": + main() diff --git a/.github/skills/memory-safety/SKILL.md b/.github/skills/memory-safety/SKILL.md new file mode 100644 index 000000000..75a7861c2 --- /dev/null +++ b/.github/skills/memory-safety/SKILL.md @@ -0,0 +1,53 @@ +--- +name: memory-safety +description: Run AddressSanitizer and UndefinedBehaviorSanitizer on the Z3 test suite to detect memory errors, undefined behavior, and leaks. Logs each finding to z3agent.db. +--- + +Build Z3 with compiler-based sanitizer instrumentation, execute the test suite, and parse the runtime output for memory safety violations. Supported sanitizers are AddressSanitizer (heap and stack buffer overflows, use-after-free, double-free, memory leaks) and UndefinedBehaviorSanitizer (signed integer overflow, null pointer dereference, misaligned access, shift errors). Findings are deduplicated and stored in z3agent.db for triage and longitudinal tracking. + +# Step 1: Configure and build + +The script invokes cmake with the appropriate `-fsanitize` flags and builds the `test-z3` target. Each sanitizer uses a separate build directory to avoid flag conflicts. If a prior instrumented build exists with matching flags, only incremental compilation runs. + +```bash +python3 scripts/memory_safety.py --sanitizer asan +python3 scripts/memory_safety.py --sanitizer ubsan +python3 scripts/memory_safety.py --sanitizer both +``` + +To reuse an existing build: +```bash +python3 scripts/memory_safety.py --sanitizer asan --skip-build --build-dir build/sanitizer-asan +``` + +# Step 2: Run and collect + +The test binary runs with `halt_on_error=0` so the sanitizer reports all violations rather than aborting on the first. The script parses `ERROR: AddressSanitizer`, `runtime error:`, and `ERROR: LeakSanitizer` patterns from the combined output, extracts source locations where available, and deduplicates by category, file, and line. + +```bash +python3 scripts/memory_safety.py --sanitizer asan --timeout 900 --debug +``` + +# Step 3: Interpret results + +- `clean`: no sanitizer violations detected. +- `findings`: one or more violations found. Each is printed with severity, category, message, and source location. +- `timeout`: the test suite did not complete within the deadline. Increase the timeout or investigate a possible infinite loop. +- `error`: build or execution failed before sanitizer output could be collected. + +Query past runs: +```bash +python3 ../../shared/z3db.py runs --skill memory-safety --last 10 +python3 ../../shared/z3db.py query "SELECT category, severity, file, line, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='memory-safety') ORDER BY run_id DESC LIMIT 20" +``` + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| sanitizer | choice | no | asan | which sanitizer to enable: asan, ubsan, or both | +| build-dir | path | no | build/sanitizer-{name} | path to the build directory | +| timeout | int | no | 600 | seconds before killing the test run | +| skip-build | flag | no | off | reuse an existing instrumented build | +| debug | flag | no | off | verbose cmake, make, and test output | +| db | path | no | .z3-agent/z3agent.db | path to the logging database | diff --git a/.github/skills/memory-safety/scripts/memory_safety.py b/.github/skills/memory-safety/scripts/memory_safety.py new file mode 100644 index 000000000..cab818a63 --- /dev/null +++ b/.github/skills/memory-safety/scripts/memory_safety.py @@ -0,0 +1,266 @@ +#!/usr/bin/env python3 +""" +memory_safety.py: run sanitizer checks on Z3 test suite. + +Usage: + python memory_safety.py --sanitizer asan + python memory_safety.py --sanitizer ubsan --debug +""" + +import argparse +import logging +import os +import re +import subprocess +import sys +import time +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, setup_logging + +logger = logging.getLogger("z3agent") + +SANITIZER_FLAGS = { + "asan": "-fsanitize=address -fno-omit-frame-pointer", + "ubsan": "-fsanitize=undefined -fno-omit-frame-pointer", +} + +ASAN_ERROR = re.compile(r"ERROR:\s*AddressSanitizer:\s*(\S+)") +UBSAN_ERROR = re.compile(r":\d+:\d+:\s*runtime error:\s*(.+)") +LEAK_ERROR = re.compile(r"ERROR:\s*LeakSanitizer:") +LOCATION = re.compile(r"(\S+\.(?:cpp|c|h|hpp)):(\d+)") + + +def find_repo_root() -> Path: + d = Path.cwd() + for _ in range(10): + if (d / "CMakeLists.txt").exists() and (d / "src").is_dir(): + return d + parent = d.parent + if parent == d: + break + d = parent + logger.error("could not locate Z3 repository root") + sys.exit(1) + + +def build_is_configured(build_dir: Path, sanitizer: str) -> bool: + """Check whether the build directory already has a matching cmake config.""" + cache = build_dir / "CMakeCache.txt" + if not cache.is_file(): + return False + expected = SANITIZER_FLAGS[sanitizer].split()[0] + return expected in cache.read_text() + + +def configure(build_dir: Path, sanitizer: str, repo_root: Path) -> bool: + """Run cmake with the requested sanitizer flags.""" + flags = SANITIZER_FLAGS[sanitizer] + build_dir.mkdir(parents=True, exist_ok=True) + cmd = [ + "cmake", str(repo_root), + f"-DCMAKE_C_FLAGS={flags}", + f"-DCMAKE_CXX_FLAGS={flags}", + f"-DCMAKE_EXE_LINKER_FLAGS={flags}", + "-DCMAKE_BUILD_TYPE=Debug", + "-DZ3_BUILD_TEST=ON", + ] + logger.info("configuring %s build in %s", sanitizer, build_dir) + logger.debug("cmake command: %s", " ".join(cmd)) + proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True) + if proc.returncode != 0: + logger.error("cmake failed:\n%s", proc.stderr) + return False + return True + + +def compile_tests(build_dir: Path) -> bool: + """Compile the test-z3 target.""" + nproc = os.cpu_count() or 4 + cmd = ["make", f"-j{nproc}", "test-z3"] + logger.info("compiling test-z3 (%d parallel jobs)", nproc) + proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True) + if proc.returncode != 0: + logger.error("compilation failed:\n%s", proc.stderr[-2000:]) + return False + return True + + +def run_tests(build_dir: Path, timeout: int) -> dict: + """Execute test-z3 under sanitizer runtime and capture output.""" + test_bin = build_dir / "test-z3" + if not test_bin.is_file(): + logger.error("test-z3 not found at %s", test_bin) + return {"stdout": "", "stderr": "binary not found", "exit_code": -1, + "duration_ms": 0} + + env = os.environ.copy() + env["ASAN_OPTIONS"] = "detect_leaks=1:halt_on_error=0:print_stacktrace=1" + env["UBSAN_OPTIONS"] = "print_stacktrace=1:halt_on_error=0" + + cmd = [str(test_bin), "/a"] + logger.info("running: %s", " ".join(cmd)) + start = time.monotonic() + try: + proc = subprocess.run( + cmd, capture_output=True, text=True, timeout=timeout, + cwd=build_dir, env=env, + ) + except subprocess.TimeoutExpired: + ms = int((time.monotonic() - start) * 1000) + logger.warning("test-z3 timed out after %dms", ms) + return {"stdout": "", "stderr": "timeout", "exit_code": -1, + "duration_ms": ms} + + ms = int((time.monotonic() - start) * 1000) + logger.debug("exit_code=%d duration=%dms", proc.returncode, ms) + return { + "stdout": proc.stdout, + "stderr": proc.stderr, + "exit_code": proc.returncode, + "duration_ms": ms, + } + + +def parse_findings(output: str) -> list: + """Extract sanitizer error reports from combined stdout and stderr.""" + findings = [] + lines = output.split("\n") + + for i, line in enumerate(lines): + entry = None + + m = ASAN_ERROR.search(line) + if m: + entry = {"category": "asan", "message": m.group(1), + "severity": "high"} + + if not entry: + m = LEAK_ERROR.search(line) + if m: + entry = {"category": "leak", + "message": "detected memory leaks", + "severity": "high"} + + if not entry: + m = UBSAN_ERROR.search(line) + if m: + entry = {"category": "ubsan", "message": m.group(1), + "severity": "medium"} + + if not entry: + continue + + file_path, line_no = None, None + window = lines[max(0, i - 2):i + 5] + for ctx in window: + loc = LOCATION.search(ctx) + if loc and "/usr/" not in loc.group(1): + file_path = loc.group(1) + line_no = int(loc.group(2)) + break + + entry["file"] = file_path + entry["line"] = line_no + entry["raw"] = line.strip() + findings.append(entry) + + return findings + + +def deduplicate(findings: list) -> list: + """Remove duplicate reports at the same category, file, and line.""" + seen = set() + result = [] + for f in findings: + key = (f["category"], f["file"], f["line"], f["message"]) + if key in seen: + continue + seen.add(key) + result.append(f) + return result + + +def main(): + parser = argparse.ArgumentParser(prog="memory-safety") + parser.add_argument("--sanitizer", choices=["asan", "ubsan", "both"], + default="asan", + help="sanitizer to enable (default: asan)") + parser.add_argument("--build-dir", default=None, + help="path to build directory") + parser.add_argument("--timeout", type=int, default=600, + help="seconds before killing test run") + parser.add_argument("--skip-build", action="store_true", + help="reuse existing instrumented build") + parser.add_argument("--db", default=None, + help="path to z3agent.db") + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + repo_root = find_repo_root() + + sanitizers = ["asan", "ubsan"] if args.sanitizer == "both" else [args.sanitizer] + all_findings = [] + + db = Z3DB(args.db) + + for san in sanitizers: + if args.build_dir: + build_dir = Path(args.build_dir) + else: + build_dir = repo_root / "build" / f"sanitizer-{san}" + + run_id = db.start_run("memory-safety", f"sanitizer={san}") + db.log(f"sanitizer: {san}, build: {build_dir}", run_id=run_id) + + if not args.skip_build: + needs_configure = not build_is_configured(build_dir, san) + if needs_configure and not configure(build_dir, san, repo_root): + db.finish_run(run_id, "error", 0, exit_code=1) + print(f"FAIL: cmake configuration failed for {san}") + continue + if not compile_tests(build_dir): + db.finish_run(run_id, "error", 0, exit_code=1) + print(f"FAIL: compilation failed for {san}") + continue + + result = run_tests(build_dir, args.timeout) + combined = result["stdout"] + "\n" + result["stderr"] + findings = deduplicate(parse_findings(combined)) + + for f in findings: + db.log_finding( + run_id, + category=f["category"], + message=f["message"], + severity=f["severity"], + file=f["file"], + line=f["line"], + details={"raw": f["raw"]}, + ) + + status = "clean" if not findings else "findings" + if result["exit_code"] == -1: + status = "timeout" if "timeout" in result["stderr"] else "error" + + db.finish_run(run_id, status, result["duration_ms"], result["exit_code"]) + all_findings.extend(findings) + print(f"{san}: {len(findings)} finding(s), {result['duration_ms']}ms") + + if all_findings: + print(f"\nTotal: {len(all_findings)} finding(s)") + for f in all_findings: + loc = f"{f['file']}:{f['line']}" if f["file"] else "unknown location" + print(f" [{f['severity']}] {f['category']}: {f['message']} at {loc}") + db.close() + sys.exit(1) + else: + print("\nNo sanitizer findings.") + db.close() + sys.exit(0) + + +if __name__ == "__main__": + main() diff --git a/.github/skills/optimize/SKILL.md b/.github/skills/optimize/SKILL.md new file mode 100644 index 000000000..fc93a7e2c --- /dev/null +++ b/.github/skills/optimize/SKILL.md @@ -0,0 +1,48 @@ +--- +name: optimize +description: Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains. +--- + +Given a set of constraints and an objective function, find the optimal value. Z3 supports both hard constraints (must hold) and soft constraints (weighted preferences), as well as lexicographic multi-objective optimization. + +# Step 1: Formulate the problem + +The formula uses the `(minimize ...)` or `(maximize ...)` directives followed by `(check-sat)` and `(get-model)`. + +Example: minimize `x + y` subject to `x >= 1`, `y >= 2`, `x + y <= 10`: +```smtlib +(declare-const x Int) +(declare-const y Int) +(assert (>= x 1)) +(assert (>= y 2)) +(assert (<= (+ x y) 10)) +(minimize (+ x y)) +(check-sat) +(get-model) +``` + +# Step 2: Run the optimizer + +```bash +python3 scripts/optimize.py --file scheduling.smt2 +python3 scripts/optimize.py --formula "" --debug +``` + +# Step 3: Interpret the output + +- `sat` with a model: the optimal assignment satisfying all constraints. +- `unsat`: the constraints are contradictory; no feasible solution exists. +- `unknown` or `timeout`: Z3 could not determine optimality. + +The script prints the objective value and the satisfying assignment. + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| formula | string | no | | SMT-LIB2 formula with minimize/maximize | +| file | path | no | | path to .smt2 file | +| timeout | int | no | 60 | seconds | +| z3 | path | no | auto | path to z3 binary | +| debug | flag | no | off | verbose tracing | +| db | path | no | .z3-agent/z3agent.db | logging database | diff --git a/.github/skills/optimize/scripts/optimize.py b/.github/skills/optimize/scripts/optimize.py new file mode 100644 index 000000000..8c7462ccb --- /dev/null +++ b/.github/skills/optimize/scripts/optimize.py @@ -0,0 +1,60 @@ +#!/usr/bin/env python3 +""" +optimize.py: solve constrained optimization problems via Z3. + +Usage: + python optimize.py --file scheduling.smt2 + python optimize.py --formula "(declare-const x Int)..." --debug +""" + +import argparse +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, run_z3, parse_model, setup_logging + + +def main(): + parser = argparse.ArgumentParser(prog="optimize") + parser.add_argument("--formula") + parser.add_argument("--file") + parser.add_argument("--timeout", type=int, default=60) + parser.add_argument("--z3", default=None) + parser.add_argument("--db", default=None) + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + + if args.file: + formula = Path(args.file).read_text() + elif args.formula: + formula = args.formula + else: + parser.error("provide --formula or --file") + return + + db = Z3DB(args.db) + run_id = db.start_run("optimize", formula) + + result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug) + + model = parse_model(result["stdout"]) if result["result"] == "sat" else None + + db.log_formula(run_id, formula, result["result"], + str(model) if model else None) + db.finish_run(run_id, result["result"], result["duration_ms"], + result["exit_code"]) + + print(result["result"]) + if model: + for name, val in model.items(): + print(f" {name} = {val}") + + db.close() + sys.exit(0 if result["exit_code"] == 0 else 1) + + +if __name__ == "__main__": + main() diff --git a/.github/skills/prove/SKILL.md b/.github/skills/prove/SKILL.md new file mode 100644 index 000000000..a67d57758 --- /dev/null +++ b/.github/skills/prove/SKILL.md @@ -0,0 +1,54 @@ +--- +name: prove +description: Prove validity of logical statements by negation and satisfiability checking. If the negation is unsatisfiable, the original statement is valid. Otherwise a counterexample is returned. +--- + +Given a conjecture (an SMT-LIB2 assertion or a natural language claim), determine whether it holds universally. The method is standard: negate the conjecture and check satisfiability. If the negation is unsatisfiable, the original is valid. If satisfiable, the model is a counterexample. + +# Step 1: Prepare the negated formula + +Wrap the conjecture in `(assert (not ...))` and append `(check-sat)(get-model)`. + +Example: to prove that `(> x 3)` implies `(> x 1)`: +```smtlib +(declare-const x Int) +(assert (not (=> (> x 3) (> x 1)))) +(check-sat) +(get-model) +``` + +# Step 2: Run the prover + +```bash +python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" +``` + +For file input where the file contains the full negated formula: +```bash +python3 scripts/prove.py --file negated.smt2 +``` + +With debug tracing: +```bash +python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" --debug +``` + +# Step 3: Interpret the output + +- `valid`: the negation was unsat, so the conjecture holds for all inputs. +- `invalid` followed by a counterexample: the negation was sat; the model shows a concrete assignment where the conjecture fails. +- `unknown` or `timeout`: Z3 could not decide. The conjecture may require auxiliary lemmas or induction. + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| conjecture | string | no | | the assertion to prove (without negation) | +| vars | string | no | | variable declarations as "name:sort" pairs, comma-separated | +| file | path | no | | .smt2 file with the negated formula | +| timeout | int | no | 30 | seconds | +| z3 | path | no | auto | path to z3 binary | +| debug | flag | no | off | verbose tracing | +| db | path | no | .z3-agent/z3agent.db | logging database | + +Either `conjecture` (with `vars`) or `file` must be provided. diff --git a/.github/skills/prove/scripts/prove.py b/.github/skills/prove/scripts/prove.py new file mode 100644 index 000000000..b4656fdd7 --- /dev/null +++ b/.github/skills/prove/scripts/prove.py @@ -0,0 +1,82 @@ +#!/usr/bin/env python3 +""" +prove.py: prove validity by negation + satisfiability check. + +Usage: + python prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" + python prove.py --file negated.smt2 +""" + +import argparse +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, run_z3, parse_model, setup_logging + + +def build_formula(conjecture: str, vars_str: str) -> str: + lines = [] + if vars_str: + for v in vars_str.split(","): + v = v.strip() + name, sort = v.split(":") + lines.append(f"(declare-const {name.strip()} {sort.strip()})") + lines.append(f"(assert (not {conjecture}))") + lines.append("(check-sat)") + lines.append("(get-model)") + return "\n".join(lines) + + +def main(): + parser = argparse.ArgumentParser(prog="prove") + parser.add_argument("--conjecture", help="assertion to prove") + parser.add_argument("--vars", help="variable declarations, e.g. 'x:Int,y:Bool'") + parser.add_argument("--file", help="path to .smt2 file with negated formula") + parser.add_argument("--timeout", type=int, default=30) + parser.add_argument("--z3", default=None) + parser.add_argument("--db", default=None) + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + + if args.file: + formula = Path(args.file).read_text() + elif args.conjecture: + formula = build_formula(args.conjecture, args.vars or "") + else: + parser.error("provide --conjecture or --file") + return + + db = Z3DB(args.db) + run_id = db.start_run("prove", formula) + + result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug) + + if result["result"] == "unsat": + verdict = "valid" + elif result["result"] == "sat": + verdict = "invalid" + else: + verdict = result["result"] + + model = parse_model(result["stdout"]) if verdict == "invalid" else None + + db.log_formula(run_id, formula, verdict, str(model) if model else None) + db.finish_run(run_id, verdict, result["duration_ms"], result["exit_code"]) + + print(verdict) + if model: + print("counterexample:") + for name, val in model.items(): + print(f" {name} = {val}") + + db.close() + # Exit 0 when we successfully determined validity or invalidity; + # exit 1 only for errors/timeouts. + sys.exit(0 if verdict in ("valid", "invalid") else 1) + + +if __name__ == "__main__": + main() diff --git a/.github/skills/shared/schema.sql b/.github/skills/shared/schema.sql new file mode 100644 index 000000000..90c365e6d --- /dev/null +++ b/.github/skills/shared/schema.sql @@ -0,0 +1,57 @@ +-- z3agent schema v1 + +PRAGMA journal_mode=WAL; +PRAGMA foreign_keys=ON; + +CREATE TABLE IF NOT EXISTS runs ( + run_id INTEGER PRIMARY KEY AUTOINCREMENT, + skill TEXT NOT NULL, + input_hash TEXT, + status TEXT NOT NULL DEFAULT 'running', + duration_ms INTEGER, + exit_code INTEGER, + timestamp TEXT NOT NULL DEFAULT (datetime('now')) +); + +CREATE INDEX IF NOT EXISTS idx_runs_skill ON runs(skill); +CREATE INDEX IF NOT EXISTS idx_runs_status ON runs(status); + +CREATE TABLE IF NOT EXISTS formulas ( + formula_id INTEGER PRIMARY KEY AUTOINCREMENT, + run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE, + smtlib2 TEXT NOT NULL, + result TEXT, + model TEXT, + stats TEXT, + timestamp TEXT NOT NULL DEFAULT (datetime('now')) +); + +CREATE INDEX IF NOT EXISTS idx_formulas_run ON formulas(run_id); +CREATE INDEX IF NOT EXISTS idx_formulas_result ON formulas(result); + +CREATE TABLE IF NOT EXISTS findings ( + finding_id INTEGER PRIMARY KEY AUTOINCREMENT, + run_id INTEGER REFERENCES runs(run_id) ON DELETE CASCADE, + category TEXT NOT NULL, + severity TEXT, + file TEXT, + line INTEGER, + message TEXT NOT NULL, + details TEXT, + timestamp TEXT NOT NULL DEFAULT (datetime('now')) +); + +CREATE INDEX IF NOT EXISTS idx_findings_run ON findings(run_id); +CREATE INDEX IF NOT EXISTS idx_findings_category ON findings(category); +CREATE INDEX IF NOT EXISTS idx_findings_severity ON findings(severity); + +CREATE TABLE IF NOT EXISTS interaction_log ( + log_id INTEGER PRIMARY KEY AUTOINCREMENT, + run_id INTEGER REFERENCES runs(run_id) ON DELETE SET NULL, + level TEXT NOT NULL DEFAULT 'info', + message TEXT NOT NULL, + timestamp TEXT NOT NULL DEFAULT (datetime('now')) +); + +CREATE INDEX IF NOT EXISTS idx_log_run ON interaction_log(run_id); +CREATE INDEX IF NOT EXISTS idx_log_level ON interaction_log(level); diff --git a/.github/skills/shared/z3db.py b/.github/skills/shared/z3db.py new file mode 100644 index 000000000..ca959073d --- /dev/null +++ b/.github/skills/shared/z3db.py @@ -0,0 +1,328 @@ +#!/usr/bin/env python3 +""" +z3db: shared library and CLI for Z3 skill scripts. + +Library usage: + from z3db import Z3DB, find_z3, run_z3 + +CLI usage: + python z3db.py init + python z3db.py status + python z3db.py log [--run-id N] + python z3db.py runs [--skill solve] [--last N] + python z3db.py query "SELECT ..." +""" + +import argparse +import hashlib +import json +import logging +import os +import re +import shutil +import sqlite3 +import subprocess +import sys +import time +from pathlib import Path +from typing import Optional + + +SCHEMA_PATH = Path(__file__).parent / "schema.sql" +DEFAULT_DB_DIR = ".z3-agent" +DEFAULT_DB_NAME = "z3agent.db" + +logger = logging.getLogger("z3agent") + + +def setup_logging(debug: bool = False): + level = logging.DEBUG if debug else logging.INFO + fmt = "[%(levelname)s] %(message)s" if not debug else \ + "[%(levelname)s %(asctime)s] %(message)s" + logging.basicConfig(level=level, format=fmt, stream=sys.stderr) + + +class Z3DB: + """SQLite handle for z3agent.db, tracks runs, formulas, findings, logs.""" + + def __init__(self, db_path: Optional[str] = None): + if db_path is None: + db_dir = Path(DEFAULT_DB_DIR) + db_dir.mkdir(exist_ok=True) + db_path = str(db_dir / DEFAULT_DB_NAME) + self.db_path = db_path + self.conn = sqlite3.connect(db_path) + self.conn.execute("PRAGMA foreign_keys=ON") + self.conn.row_factory = sqlite3.Row + self._init_schema() + + def _init_schema(self): + self.conn.executescript(SCHEMA_PATH.read_text()) + + def close(self): + self.conn.close() + + def start_run(self, skill: str, input_text: str = "") -> int: + input_hash = hashlib.sha256(input_text.encode()).hexdigest()[:16] + cur = self.conn.execute( + "INSERT INTO runs (skill, input_hash) VALUES (?, ?)", + (skill, input_hash), + ) + self.conn.commit() + run_id = cur.lastrowid + logger.debug("started run %d (skill=%s, hash=%s)", run_id, skill, input_hash) + return run_id + + def finish_run(self, run_id: int, status: str, duration_ms: int, + exit_code: int = 0): + self.conn.execute( + "UPDATE runs SET status=?, duration_ms=?, exit_code=? WHERE run_id=?", + (status, duration_ms, exit_code, run_id), + ) + self.conn.commit() + logger.debug("finished run %d: %s (%dms)", run_id, status, duration_ms) + + def log_formula(self, run_id: int, smtlib2: str, result: str = None, + model: str = None, stats: dict = None) -> int: + cur = self.conn.execute( + "INSERT INTO formulas (run_id, smtlib2, result, model, stats) " + "VALUES (?, ?, ?, ?, ?)", + (run_id, smtlib2, result, model, + json.dumps(stats) if stats else None), + ) + self.conn.commit() + return cur.lastrowid + + def log_finding(self, run_id: int, category: str, message: str, + severity: str = None, file: str = None, + line: int = None, details: dict = None) -> int: + cur = self.conn.execute( + "INSERT INTO findings (run_id, category, severity, file, line, " + "message, details) VALUES (?, ?, ?, ?, ?, ?, ?)", + (run_id, category, severity, file, line, message, + json.dumps(details) if details else None), + ) + self.conn.commit() + return cur.lastrowid + + def log(self, message: str, level: str = "info", run_id: int = None): + """Write to stderr and to the interaction_log table.""" + getattr(logger, level, logger.info)(message) + self.conn.execute( + "INSERT INTO interaction_log (run_id, level, message) " + "VALUES (?, ?, ?)", + (run_id, level, message), + ) + self.conn.commit() + + def get_runs(self, skill: str = None, last: int = 10): + sql = "SELECT * FROM runs" + params = [] + if skill: + sql += " WHERE skill = ?" + params.append(skill) + sql += " ORDER BY run_id DESC LIMIT ?" + params.append(last) + return self.conn.execute(sql, params).fetchall() + + def get_status(self) -> dict: + rows = self.conn.execute( + "SELECT status, COUNT(*) as cnt FROM runs GROUP BY status" + ).fetchall() + total = sum(r["cnt"] for r in rows) + by_status = {r["status"]: r["cnt"] for r in rows} + last = self.conn.execute( + "SELECT timestamp FROM runs ORDER BY run_id DESC LIMIT 1" + ).fetchone() + return { + "total": total, + **by_status, + "last_run": last["timestamp"] if last else None, + } + + def get_logs(self, run_id: int = None, last: int = 50): + if run_id: + return self.conn.execute( + "SELECT * FROM interaction_log WHERE run_id=? " + "ORDER BY log_id DESC LIMIT ?", (run_id, last) + ).fetchall() + return self.conn.execute( + "SELECT * FROM interaction_log ORDER BY log_id DESC LIMIT ?", + (last,) + ).fetchall() + + def query(self, sql: str): + return self.conn.execute(sql).fetchall() + + +def find_z3(hint: str = None) -> str: + """Locate the z3 binary: explicit path > build dirs > PATH.""" + candidates = [] + if hint: + candidates.append(hint) + + repo_root = _find_repo_root() + if repo_root: + for build_dir in ["build", "build/release", "build/debug"]: + candidates.append(str(repo_root / build_dir / "z3")) + + path_z3 = shutil.which("z3") + if path_z3: + candidates.append(path_z3) + + for c in candidates: + p = Path(c) + if p.is_file() and os.access(p, os.X_OK): + logger.debug("found z3: %s", p) + return str(p) + + logger.error("z3 binary not found. Searched: %s", candidates) + sys.exit(1) + + +def _find_repo_root() -> Optional[Path]: + d = Path.cwd() + for _ in range(10): + if (d / "CMakeLists.txt").exists() and (d / "src").is_dir(): + return d + parent = d.parent + if parent == d: + break + d = parent + return None + + +def run_z3(formula: str, z3_bin: str = None, timeout: int = 30, + args: list = None, debug: bool = False) -> dict: + """Pipe an SMT-LIB2 formula into z3 -in, return parsed output.""" + z3_path = find_z3(z3_bin) + cmd = [z3_path, "-in"] + (args or []) + + logger.debug("cmd: %s", " ".join(cmd)) + logger.debug("stdin:\n%s", formula) + + start = time.monotonic() + try: + proc = subprocess.run( + cmd, input=formula, capture_output=True, text=True, + timeout=timeout, + ) + except subprocess.TimeoutExpired: + duration_ms = int((time.monotonic() - start) * 1000) + logger.warning("z3 timed out after %dms", duration_ms) + return { + "stdout": "", "stderr": "timeout", "exit_code": -1, + "duration_ms": duration_ms, "result": "timeout", + } + + duration_ms = int((time.monotonic() - start) * 1000) + + logger.debug("exit_code=%d duration=%dms", proc.returncode, duration_ms) + logger.debug("stdout:\n%s", proc.stdout) + if proc.stderr: + logger.debug("stderr:\n%s", proc.stderr) + + first_line = proc.stdout.strip().split("\n")[0].strip() if proc.stdout else "" + result = first_line if first_line in ("sat", "unsat", "unknown") else "error" + + return { + "stdout": proc.stdout, + "stderr": proc.stderr, + "exit_code": proc.returncode, + "duration_ms": duration_ms, + "result": result, + } + + +def parse_model(stdout: str) -> Optional[dict]: + """Pull define-fun entries from a (get-model) response.""" + model = {} + for m in re.finditer( + r'\(define-fun\s+(\S+)\s+\(\)\s+\S+\s+(.+?)\)', stdout + ): + model[m.group(1)] = m.group(2).strip() + return model if model else None + + +def parse_stats(stdout: str) -> Optional[dict]: + """Parse :key value pairs from z3 -st output.""" + stats = {} + for m in re.finditer(r':(\S+)\s+([\d.]+)', stdout): + key, val = m.group(1), m.group(2) + stats[key] = float(val) if '.' in val else int(val) + return stats if stats else None + + +def parse_unsat_core(stdout: str) -> Optional[list]: + for line in stdout.strip().split("\n"): + line = line.strip() + if line.startswith("(") and not line.startswith("(error"): + labels = line.strip("()").split() + if labels: + return labels + return None + + +def cli(): + parser = argparse.ArgumentParser( + description="Z3 Agent database CLI", + prog="z3db", + ) + parser.add_argument("--db", default=None, help="path to z3agent.db") + parser.add_argument("--debug", action="store_true", help="verbose output") + + sub = parser.add_subparsers(dest="command") + + sub.add_parser("init", help="initialize the database") + + status_p = sub.add_parser("status", help="show run summary") + + log_p = sub.add_parser("log", help="show interaction log") + log_p.add_argument("--run-id", type=int, help="filter by run ID") + log_p.add_argument("--last", type=int, default=50) + + runs_p = sub.add_parser("runs", help="list runs") + runs_p.add_argument("--skill", help="filter by skill name") + runs_p.add_argument("--last", type=int, default=10) + + query_p = sub.add_parser("query", help="run raw SQL") + query_p.add_argument("sql", help="SQL query string") + + args = parser.parse_args() + setup_logging(args.debug) + + db = Z3DB(args.db) + + if args.command == "init": + print(f"Database initialized at {db.db_path}") + + elif args.command == "status": + s = db.get_status() + print(f"Runs: {s['total']}" + f" | success: {s.get('success', 0)}" + f" | error: {s.get('error', 0)}" + f" | timeout: {s.get('timeout', 0)}" + f" | Last: {s['last_run'] or 'never'}") + + elif args.command == "log": + for row in db.get_logs(args.run_id, args.last): + print(f"[{row['level']}] {row['timestamp']} " + f"(run {row['run_id']}): {row['message']}") + + elif args.command == "runs": + for row in db.get_runs(args.skill, args.last): + print(f"#{row['run_id']} {row['skill']} {row['status']} " + f"{row['duration_ms']}ms @ {row['timestamp']}") + + elif args.command == "query": + for row in db.query(args.sql): + print(dict(row)) + + else: + parser.print_help() + + db.close() + + +if __name__ == "__main__": + cli() diff --git a/.github/skills/simplify/SKILL.md b/.github/skills/simplify/SKILL.md new file mode 100644 index 000000000..5803b7148 --- /dev/null +++ b/.github/skills/simplify/SKILL.md @@ -0,0 +1,48 @@ +--- +name: simplify +description: Reduce formula complexity using Z3 tactic chains. Supports configurable tactic pipelines for boolean, arithmetic, and bitvector simplification. +--- + +Given a formula, apply a sequence of Z3 tactics to produce an equivalent but simpler form. This is useful for understanding what Z3 sees after preprocessing, debugging tactic selection, and reducing formula size before solving. + +# Step 1: Choose tactics + +Z3 provides dozens of tactics. Common ones: + +| Tactic | What it does | +|--------|-------------| +| simplify | constant folding, algebraic identities | +| propagate-values | substitute known equalities | +| ctx-simplify | context-dependent simplification | +| elim-uncnstr | remove unconstrained variables | +| solve-eqs | Gaussian elimination | +| bit-blast | reduce bitvectors to booleans | +| tseitin-cnf | convert to CNF | +| aig | and-inverter graph reduction | + +# Step 2: Run simplification + +```bash +python3 scripts/simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int" +python3 scripts/simplify.py --file formula.smt2 --tactics "simplify,propagate-values,ctx-simplify" +python3 scripts/simplify.py --file formula.smt2 --debug +``` + +Without `--tactics`, the script applies the default chain: `simplify`, `propagate-values`, `ctx-simplify`. + +# Step 3: Interpret the output + +The script prints the simplified formula in SMT-LIB2 syntax. Subgoals are printed as separate `(assert ...)` blocks. + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| formula | string | no | | SMT-LIB2 formula to simplify | +| vars | string | no | | variable declarations as "name:sort" pairs | +| file | path | no | | path to .smt2 file | +| tactics | string | no | simplify,propagate-values,ctx-simplify | comma-separated tactic names | +| timeout | int | no | 30 | seconds | +| z3 | path | no | auto | path to z3 binary | +| debug | flag | no | off | verbose tracing | +| db | path | no | .z3-agent/z3agent.db | logging database | diff --git a/.github/skills/simplify/scripts/simplify.py b/.github/skills/simplify/scripts/simplify.py new file mode 100644 index 000000000..9abef32fb --- /dev/null +++ b/.github/skills/simplify/scripts/simplify.py @@ -0,0 +1,83 @@ +#!/usr/bin/env python3 +""" +simplify.py: apply Z3 tactics to simplify an SMT-LIB2 formula. + +Usage: + python simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int" + python simplify.py --file formula.smt2 --tactics "simplify,solve-eqs" +""" + +import argparse +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, run_z3, setup_logging + + +DEFAULT_TACTICS = "simplify,propagate-values,ctx-simplify" + + +def build_tactic_formula(base_formula: str, tactics: str) -> str: + tactic_list = [t.strip() for t in tactics.split(",")] + if len(tactic_list) == 1: + tactic_expr = f"(then {tactic_list[0]} skip)" + else: + tactic_expr = "(then " + " ".join(tactic_list) + ")" + return base_formula + f"\n(apply {tactic_expr})\n" + + +def build_formula_from_parts(formula_str: str, vars_str: str) -> str: + lines = [] + if vars_str: + for v in vars_str.split(","): + v = v.strip() + name, sort = v.split(":") + lines.append(f"(declare-const {name.strip()} {sort.strip()})") + lines.append(formula_str) + return "\n".join(lines) + + +def main(): + parser = argparse.ArgumentParser(prog="simplify") + parser.add_argument("--formula") + parser.add_argument("--vars") + parser.add_argument("--file") + parser.add_argument("--tactics", default=DEFAULT_TACTICS) + parser.add_argument("--timeout", type=int, default=30) + parser.add_argument("--z3", default=None) + parser.add_argument("--db", default=None) + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + + if args.file: + base = Path(args.file).read_text() + elif args.formula: + base = build_formula_from_parts(args.formula, args.vars or "") + else: + parser.error("provide --formula or --file") + return + + formula = build_tactic_formula(base, args.tactics) + + db = Z3DB(args.db) + run_id = db.start_run("simplify", formula) + + result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug) + + status = "success" if result["exit_code"] == 0 else "error" + db.log_formula(run_id, formula, status) + db.finish_run(run_id, status, result["duration_ms"], result["exit_code"]) + + print(result["stdout"]) + if result["stderr"] and result["exit_code"] != 0: + print(result["stderr"], file=sys.stderr) + + db.close() + sys.exit(0 if result["exit_code"] == 0 else 1) + + +if __name__ == "__main__": + main() diff --git a/.github/skills/solve/SKILL.md b/.github/skills/solve/SKILL.md new file mode 100644 index 000000000..a7385635b --- /dev/null +++ b/.github/skills/solve/SKILL.md @@ -0,0 +1,50 @@ +--- +name: solve +description: Check satisfiability of SMT-LIB2 formulas using Z3. Returns sat/unsat with models or unsat cores. Logs every invocation to z3agent.db for auditability. +--- + +Given an SMT-LIB2 formula (or a set of constraints described in natural language), determine whether the formula is satisfiable. If sat, extract a satisfying assignment. If unsat and tracking labels are present, extract the unsat core. + +# Step 1: Prepare the formula + +If the input is already valid SMT-LIB2, use it directly. If it is a natural language description, use the **encode** skill first to produce SMT-LIB2. + +The formula must include `(check-sat)` at the end. Append `(get-model)` for satisfiable queries or `(get-unsat-core)` when named assertions are used. + +# Step 2: Run Z3 + +```bash +python3 scripts/solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)" +``` + +For file input: +```bash +python3 scripts/solve.py --file problem.smt2 +``` + +With debug tracing: +```bash +python3 scripts/solve.py --file problem.smt2 --debug +``` + +The script pipes the formula to `z3 -in` via subprocess (no shell expansion), logs the run to `.z3-agent/z3agent.db`, and prints the result. + +# Step 3: Interpret the output + +- `sat` followed by a model: the formula is satisfiable; the model assigns concrete values to each declared constant. +- `unsat`: no assignment exists. If `(get-unsat-core)` was used, the conflicting named assertions are listed. +- `unknown`: Z3 could not decide within the timeout. Consider increasing the timeout or simplifying the formula. +- `timeout`: the process was killed after the deadline. Try the **simplify** skill to reduce complexity. + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| formula | string | no | | SMT-LIB2 formula as a string | +| file | path | no | | path to an .smt2 file | +| timeout | int | no | 30 | seconds before killing z3 | +| z3 | path | no | auto | explicit path to z3 binary | +| debug | flag | no | off | print z3 command, stdin, stdout, stderr, timing | +| db | path | no | .z3-agent/z3agent.db | path to the logging database | + +Either `formula` or `file` must be provided. diff --git a/.github/skills/solve/scripts/solve.py b/.github/skills/solve/scripts/solve.py new file mode 100644 index 000000000..b283243f2 --- /dev/null +++ b/.github/skills/solve/scripts/solve.py @@ -0,0 +1,66 @@ +#!/usr/bin/env python3 +""" +solve.py: check satisfiability of an SMT-LIB2 formula via Z3. + +Usage: + python solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)" + python solve.py --file problem.smt2 + python solve.py --file problem.smt2 --debug --timeout 60 +""" + +import argparse +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, run_z3, parse_model, parse_unsat_core, setup_logging + + +def main(): + parser = argparse.ArgumentParser(prog="solve") + parser.add_argument("--formula", help="SMT-LIB2 formula string") + parser.add_argument("--file", help="path to .smt2 file") + parser.add_argument("--timeout", type=int, default=30) + parser.add_argument("--z3", default=None, help="path to z3 binary") + parser.add_argument("--db", default=None) + parser.add_argument("--debug", action="store_true") + args = parser.parse_args() + + setup_logging(args.debug) + + if args.file: + formula = Path(args.file).read_text() + elif args.formula: + formula = args.formula + else: + parser.error("provide --formula or --file") + return + + db = Z3DB(args.db) + run_id = db.start_run("solve", formula) + + result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug) + + model = parse_model(result["stdout"]) if result["result"] == "sat" else None + core = parse_unsat_core(result["stdout"]) if result["result"] == "unsat" else None + + db.log_formula(run_id, formula, result["result"], + str(model) if model else None) + db.finish_run(run_id, result["result"], result["duration_ms"], + result["exit_code"]) + + print(result["result"]) + if model: + for name, val in model.items(): + print(f" {name} = {val}") + if core: + print("unsat core:", " ".join(core)) + if result["stderr"] and result["result"] == "error": + print(result["stderr"], file=sys.stderr) + + db.close() + sys.exit(0 if result["exit_code"] == 0 else 1) + + +if __name__ == "__main__": + main() diff --git a/.github/skills/static-analysis/SKILL.md b/.github/skills/static-analysis/SKILL.md new file mode 100644 index 000000000..566999813 --- /dev/null +++ b/.github/skills/static-analysis/SKILL.md @@ -0,0 +1,46 @@ +--- +name: static-analysis +description: Run Clang Static Analyzer (scan-build) on Z3 source and log structured findings to z3agent.db. +--- + +Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plist diagnostics, and record each finding with file, line, category, and description. This skill wraps scan-build into a reproducible, logged workflow suitable for regular analysis sweeps and regression tracking. + +# Step 1: Run the analysis + +```bash +python3 scripts/static_analysis.py --build-dir build +python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug +python3 scripts/static_analysis.py --build-dir build --timeout 1800 +``` + +The script invokes `scan-build cmake ..` followed by `scan-build make` inside the specified build directory. Clang checker output is written to `--output-dir` (defaults to a `scan-results` subdirectory of the build directory). + +# Step 2: Interpret the output + +Each finding is printed with its source location, category, and description: + +``` +[Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read +[Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer +``` + +A summary table groups findings by category so that high-frequency classes are visible at a glance. + +# Step 3: Review historical findings + +All findings are logged to `z3agent.db`. Query them to track trends: + +```bash +python3 ../../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC" +python3 ../../shared/z3db.py runs --skill static-analysis --last 10 +``` + +# Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| build-dir | path | yes | | path to the CMake build directory | +| output-dir | path | no | BUILD/scan-results | directory for scan-build output | +| timeout | int | no | 1200 | seconds allowed for the full build | +| db | path | no | .z3-agent/z3agent.db | logging database | +| debug | flag | no | off | verbose tracing | diff --git a/.github/skills/static-analysis/scripts/static_analysis.py b/.github/skills/static-analysis/scripts/static_analysis.py new file mode 100644 index 000000000..aa64d883d --- /dev/null +++ b/.github/skills/static-analysis/scripts/static_analysis.py @@ -0,0 +1,255 @@ +#!/usr/bin/env python3 +""" +static_analysis.py: run Clang Static Analyzer on Z3 source. + +Usage: + python static_analysis.py --build-dir build + python static_analysis.py --build-dir build --output-dir /tmp/sa-results + python static_analysis.py --build-dir build --debug +""" + +import argparse +import logging +import os +import plistlib +import shutil +import subprocess +import sys +import time +from collections import Counter +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared")) +from z3db import Z3DB, setup_logging + +logger = logging.getLogger("z3agent") + +SCAN_BUILD_NAMES = ["scan-build", "scan-build-14", "scan-build-15", "scan-build-16"] + + +def find_scan_build() -> str: + """Locate the scan-build binary on PATH.""" + for name in SCAN_BUILD_NAMES: + path = shutil.which(name) + if path: + logger.debug("found scan-build: %s", path) + return path + logger.error( + "scan-build not found. Install clang-tools or set PATH. " + "Searched: %s", ", ".join(SCAN_BUILD_NAMES) + ) + sys.exit(1) + + +def run_configure(scan_build: str, build_dir: Path, output_dir: Path, + timeout: int) -> bool: + """Run scan-build cmake to configure the project.""" + repo_root = build_dir.parent + cmd = [ + scan_build, + "-o", str(output_dir), + "cmake", + str(repo_root), + ] + logger.info("configuring: %s", " ".join(cmd)) + try: + proc = subprocess.run( + cmd, cwd=str(build_dir), + capture_output=True, text=True, timeout=timeout, + ) + except subprocess.TimeoutExpired: + logger.error("cmake configuration timed out after %ds", timeout) + return False + + if proc.returncode != 0: + logger.error("cmake configuration failed (exit %d)", proc.returncode) + logger.error("stderr: %s", proc.stderr[:2000]) + return False + + logger.info("configuration complete") + return True + + +def run_build(scan_build: str, build_dir: Path, output_dir: Path, + timeout: int) -> bool: + """Run scan-build make to build and analyze.""" + nproc = os.cpu_count() or 4 + cmd = [ + scan_build, + "-o", str(output_dir), + "--status-bugs", + "make", + f"-j{nproc}", + ] + logger.info("building with analysis: %s", " ".join(cmd)) + try: + proc = subprocess.run( + cmd, cwd=str(build_dir), + capture_output=True, text=True, timeout=timeout, + ) + except subprocess.TimeoutExpired: + logger.error("build timed out after %ds", timeout) + return False + + # scan-build returns nonzero when bugs are found (due to --status-bugs), + # so a nonzero exit code is not necessarily a build failure. + if proc.returncode != 0: + logger.info( + "scan-build exited with code %d (nonzero may indicate findings)", + proc.returncode, + ) + else: + logger.info("build complete, no bugs reported by scan-build") + + if proc.stderr: + logger.debug("build stderr (last 2000 chars): %s", proc.stderr[-2000:]) + return True + + +def collect_plist_files(output_dir: Path) -> list: + """Recursively find all .plist diagnostic files under the output directory.""" + plists = sorted(output_dir.rglob("*.plist")) + logger.debug("found %d plist files in %s", len(plists), output_dir) + return plists + + +def parse_plist_findings(plist_path: Path) -> list: + """Extract findings from a single Clang plist diagnostic file. + + Returns a list of dicts with keys: file, line, col, category, type, description. + """ + findings = [] + try: + with open(plist_path, "rb") as f: + data = plistlib.load(f) + except Exception as exc: + logger.warning("could not parse %s: %s", plist_path, exc) + return findings + + source_files = data.get("files", []) + for diag in data.get("diagnostics", []): + location = diag.get("location", {}) + file_idx = location.get("file", 0) + source_file = source_files[file_idx] if file_idx < len(source_files) else "" + findings.append({ + "file": source_file, + "line": location.get("line", 0), + "col": location.get("col", 0), + "category": diag.get("category", "uncategorized"), + "type": diag.get("type", ""), + "description": diag.get("description", ""), + }) + return findings + + +def collect_all_findings(output_dir: Path) -> list: + """Parse every plist file under output_dir and return merged findings.""" + all_findings = [] + for plist_path in collect_plist_files(output_dir): + all_findings.extend(parse_plist_findings(plist_path)) + return all_findings + + +def log_findings(db, run_id: int, findings: list): + """Persist each finding to z3agent.db.""" + for f in findings: + db.log_finding( + run_id, + category=f["category"], + message=f["description"], + severity=f.get("type"), + file=f["file"], + line=f["line"], + details={"col": f["col"], "type": f["type"]}, + ) + + +def print_findings(findings: list): + """Print individual findings and a category summary.""" + if not findings: + print("No findings reported.") + return + + for f in findings: + label = f["category"] + if f["type"]: + label = f["type"] + print(f"[{label}] {f['file']}:{f['line']}: {f['description']}") + + print() + counts = Counter(f["category"] for f in findings) + print(f"Total findings: {len(findings)}") + print("By category:") + for cat, cnt in counts.most_common(): + print(f" {cat}: {cnt}") + + +def main(): + parser = argparse.ArgumentParser( + prog="static_analysis", + description="Run Clang Static Analyzer on Z3 and log findings.", + ) + parser.add_argument( + "--build-dir", required=True, + help="path to the CMake build directory", + ) + parser.add_argument( + "--output-dir", default=None, + help="directory for scan-build results (default: BUILD/scan-results)", + ) + parser.add_argument( + "--timeout", type=int, default=1200, + help="seconds allowed for the full analysis build", + ) + parser.add_argument("--db", default=None, help="path to z3agent.db") + parser.add_argument("--debug", action="store_true", help="verbose tracing") + args = parser.parse_args() + + setup_logging(args.debug) + + scan_build = find_scan_build() + + build_dir = Path(args.build_dir).resolve() + build_dir.mkdir(parents=True, exist_ok=True) + + output_dir = Path(args.output_dir) if args.output_dir else build_dir / "scan-results" + output_dir = output_dir.resolve() + output_dir.mkdir(parents=True, exist_ok=True) + + db = Z3DB(args.db) + run_id = db.start_run("static-analysis", f"build_dir={build_dir}") + start = time.monotonic() + + if not run_configure(scan_build, build_dir, output_dir, timeout=args.timeout): + elapsed = int((time.monotonic() - start) * 1000) + db.finish_run(run_id, "error", elapsed, exit_code=1) + db.close() + sys.exit(1) + + if not run_build(scan_build, build_dir, output_dir, timeout=args.timeout): + elapsed = int((time.monotonic() - start) * 1000) + db.finish_run(run_id, "error", elapsed, exit_code=1) + db.close() + sys.exit(1) + + elapsed = int((time.monotonic() - start) * 1000) + + findings = collect_all_findings(output_dir) + log_findings(db, run_id, findings) + + status = "clean" if len(findings) == 0 else "findings" + db.finish_run(run_id, status, elapsed, exit_code=0) + + db.log( + f"static analysis complete: {len(findings)} finding(s) in {elapsed}ms", + run_id=run_id, + ) + + print_findings(findings) + + db.close() + sys.exit(0) + + +if __name__ == "__main__": + main()