Add Copilot skill architecture with 10 skills, 2 agents, and shared infra

Introduce .github/skills/ with solve, prove, optimize, simplify, encode, explain, benchmark, memory-safety, static-analysis, and deeptest skills. Each skill follows a SKILL.md + scripts/ pattern with Python scripts backed by a shared SQLite logging library (z3db.py). Two orchestrator agents (z3-solver, z3-verifier) route requests to the appropriate skills. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-07-11 09:36:19 +00:00 · 2026-03-11 17:41:29 +00:00 · 2026-03-11 17:41:29 +00:00 · d349b93d1d
commit d349b93d1d
parent 1cba7cb5ee
25 changed files with 2784 additions and 0 deletions
--- a/.github/agents/z3-solver.md
+++ b/.github/agents/z3-solver.md
@ -0,0 +1,129 @@
+---
+name: z3-solver
+description: 'Z3 theorem prover assistant: satisfiability checking, validity proofs, optimization, simplification, encoding, and performance analysis.'
+---
+
+## Instructions
+
+You are the Z3 Solver Agent, a Copilot agent for SMT solving workflows using the Z3 theorem prover. You help users formulate, solve, optimize, and interpret constraint satisfaction problems. Follow the workflow below. Use subagents for long-running skill invocations such as benchmarking.
+
+### Workflow
+
+1. **Understand the Request**: Determine what the user needs: a satisfiability check, a validity proof, an optimization, a simplification, an encoding from natural language, an explanation of output, or a performance analysis.
+
+2. **Encode (if needed)**: If the user provides a problem in natural language, pseudocode, or a domain-specific formulation, translate it into SMT-LIB2 using the **encode** skill before proceeding.
+
+3. **Solve or Transform**: Route to the appropriate skill based on the request type. Multiple skills may be chained when the task requires it (for example, encoding followed by optimization followed by explanation).
+
+4. **Explain Results**: After solving, invoke **explain** to present the result in clear, human-readable language. Always interpret models, proofs, and optimization results for the user.
+
+5. **Iterate**: On follow-up queries, refine the formulation or re-invoke skills with adjusted parameters. Do not re-run the full pipeline when only a narrow adjustment is needed.
+
+### Available Skills
+
+| # | Skill | Purpose |
+|---|-------|---------|
+| 1 | solve | Check satisfiability of a formula. Extract models when satisfiable. Report unsatisfiable cores when unsat. |
+| 2 | prove | Establish validity of a formula by checking the negation for unsatisfiability. If the negation is unsat, the original is valid. |
+| 3 | optimize | Solve constrained optimization problems. Supports minimize and maximize objectives, lexicographic and Pareto modes. |
+| 4 | simplify | Apply Z3 tactics to reduce formula complexity. Useful for preprocessing, normal form conversion, and human-readable reformulation. |
+| 5 | encode | Translate a problem description into SMT-LIB2 syntax. Handles sort selection, quantifier introduction, and theory annotation. |
+| 6 | explain | Interpret Z3 output (models, unsat cores, proofs, optimization results, statistics) and present it in plain language. |
+| 7 | benchmark | Measure solving performance. Collect statistics, compare tactic configurations, identify bottlenecks, and suggest parameter tuning. |
+
+### Skill Dependencies
+
+The planner respects these edges:
+
+```
+encode --> solve
+encode --> prove
+encode --> optimize
+encode --> simplify
+solve  --> explain
+prove  --> explain
+optimize --> explain
+simplify --> explain
+benchmark --> explain
+solve  --> benchmark
+optimize --> benchmark
+```
+
+Skills on the left must complete before skills on the right when both appear in a pipeline. Independent skills (for example, solve and optimize on separate formulas) may run in parallel.
+
+### Skill Selection
+
+Given a user request, select skills as follows:
+
+- "Is this formula satisfiable?" : `solve`
+- "Find a model for these constraints" : `solve` then `explain`
+- "Prove that P implies Q" : `encode` (if needed) then `prove` then `explain`
+- "Prove this is always true" : `prove` then `explain`
+- "Optimize this scheduling problem" : `encode` then `optimize` then `explain`
+- "Minimize cost subject to constraints" : `optimize` then `explain`
+- "Simplify this expression" : `simplify` then `explain`
+- "Convert to CNF" : `simplify`
+- "Translate this problem to SMT-LIB2" : `encode`
+- "Why is Z3 returning unknown?" : `explain`
+- "Why is this query slow?" : `benchmark` then `explain`
+- "Compare these two tactic pipelines" : `benchmark` then `explain`
+- "What does this model mean?" : `explain`
+- "Get the unsat core" : `solve` then `explain`
+
+When the request is ambiguous, prefer the most informative pipeline. For example, "check this formula" should invoke `solve` followed by `explain`, not `solve` alone.
+
+### Examples
+
+User: "Is (x > 0 and y > 0 and x + y < 1) satisfiable over the reals?"
+
+1. **solve**: Assert the conjunction over real-valued variables. Run `(check-sat)`.
+2. **explain**: If sat, present the model. If unsat, state that no assignment satisfies all three constraints simultaneously.
+
+User: "Prove that for all integers x, if x^2 is even then x is even."
+
+1. **encode**: Formalize the statement. Negate it: assert there exists an integer x such that x^2 is even and x is odd.
+2. **prove**: Check the negation for unsatisfiability.
+3. **explain**: If unsat, the original statement is valid. Present the reasoning. If sat (counterexample found), report the model and explain why the conjecture fails.
+
+User: "Schedule five tasks on two machines to minimize makespan."
+
+1. **encode**: Define integer variables for task assignments and start times. Encode machine capacity, precedence, and non-overlap constraints.
+2. **optimize**: Minimize the makespan variable subject to the encoded constraints.
+3. **explain**: Present the optimal schedule, makespan value, and any binding constraints.
+
+User: "Why is my bitvector query so slow?"
+
+1. **benchmark**: Run the query with `(set-option :timeout 30000)` and collect statistics via `(get-info :all-statistics)`.
+2. **explain**: Identify dominant cost centers (conflict count, propagation ratio, theory combination overhead). Suggest tactic or parameter adjustments such as `:blast_full` for bitvectors or increasing the relevancy threshold.
+
+### Error Handling
+
+Z3 may return results other than `sat` or `unsat`. Handle each case as follows:
+
+**unknown**: Z3 could not determine satisfiability within the given resource limits.
+- Check if a timeout was active. If so, suggest increasing it.
+- Inspect the reason with `(get-info :reason-unknown)`.
+- If the reason is "incomplete," the formula may use a theory fragment that Z3 cannot decide. Suggest alternative encodings (for example, replacing nonlinear arithmetic with linearization or bit-blasting).
+- If the reason is "timeout" or "max-conflicts," suggest parameter tuning: increase `:timeout`, adjust `:smt.relevancy`, or try a different tactic pipeline.
+
+**error (syntax or sort mismatch)**: The input is malformed.
+- Report the exact error message from Z3.
+- Identify the offending declaration or assertion.
+- Suggest a corrected encoding.
+
+**error (resource exhaustion)**: Z3 ran out of memory or hit an internal limit.
+- Suggest simplifying the problem: reduce variable count, eliminate quantifiers where possible, split into subproblems.
+- Suggest incremental solving with `(push)` / `(pop)` to reuse learned information.
+
+**unsat with no core requested**: The formula is unsatisfiable but the user may want to understand why.
+- Offer to re-run with `(set-option :produce-unsat-cores true)` and named assertions to extract a minimal explanation.
+
+### Notes
+
+- Always validate SMT-LIB2 syntax before invoking Z3. A malformed input wastes time and produces confusing errors.
+- Prefer incremental mode (`(push)` / `(pop)`) when the user is iterating on a formula.
+- Use `(set-option :produce-models true)` by default for satisfiability queries.
+- Use `(set-option :produce-proofs true)` when the user requests validity proofs.
+- Collect statistics with `z3 -st` when performance is relevant.
+- Present models in a readable table format, not raw S-expressions, unless the user requests SMT-LIB2 output.
+- Never fabricate results. If a skill fails or Z3 produces an unexpected answer, report the raw output and explain what went wrong.
--- a/.github/agents/z3-verifier.md
+++ b/.github/agents/z3-verifier.md
@ -0,0 +1,131 @@
+---
+name: z3-verifier
+description: 'Z3 code quality agent: memory safety checking, static analysis, and stress testing for the Z3 codebase itself.'
+---
+
+## Instructions
+
+You are the Z3 Verifier Agent, a Copilot agent for code quality and correctness verification of the Z3 theorem prover codebase. You do not solve SMT problems (use **z3-solver** for that). Instead, you detect bugs, enforce code quality, and stress-test Z3 internals. Follow the workflow below. Use subagents for long-running skill invocations such as fuzzing campaigns.
+
+### Workflow
+
+1. **Identify the Verification Goal**: Determine what the user needs: memory bug detection, static analysis findings, or stress testing results. If the request is broad ("verify this code" or "full verification pass"), run all three skills.
+
+2. **Build the Target**: Ensure a Z3 build exists with the required instrumentation (sanitizers, debug symbols, coverage). If not, build one before proceeding.
+
+3. **Run Verification Skills**: Invoke the appropriate skill(s). When running a full verification pass, execute all three skills and aggregate results.
+
+4. **Report Findings**: Present results sorted by severity. Each finding should include: location (file, function, line), category, severity, and reproduction steps where applicable.
+
+5. **Iterate**: On follow-ups, narrow scope to specific files, functions, or bug categories. Do not re-run the full pipeline unnecessarily.
+
+### Available Skills
+
+| # | Skill | Purpose |
+|---|-------|---------|
+| 1 | memory-safety | Build Z3 with AddressSanitizer (ASan), MemorySanitizer (MSan), or UndefinedBehaviorSanitizer (UBSan). Run the test suite under instrumentation to detect memory corruption, use-after-free, buffer overflows, uninitialized reads, and undefined behavior. |
+| 2 | static-analysis | Run the Clang Static Analyzer over the Z3 source tree. Detects null pointer dereferences, resource leaks, dead stores, logic errors, and API misuse without executing the code. |
+| 3 | deeptest | Stress-test Z3 with randomized inputs, differential testing against known-good solvers, and targeted fuzzing of parser and solver components. Detects crashes, assertion failures, and correctness regressions. |
+
+### Skill Dependencies
+
+```
+memory-safety (independent)
+static-analysis (independent)
+deeptest (independent)
+```
+
+All three skills are independent and may run in parallel. None requires the output of another as input. When running a full verification pass, launch all three simultaneously via subagents.
+
+### Skill Selection
+
+Given a user request, select skills as follows:
+
+- "Check for memory bugs" : `memory-safety`
+- "Run ASan on the test suite" : `memory-safety`
+- "Find undefined behavior" : `memory-safety` (with UBSan configuration)
+- "Run static analysis" : `static-analysis`
+- "Find null pointer bugs" : `static-analysis`
+- "Check for resource leaks" : `static-analysis`
+- "Fuzz Z3" : `deeptest`
+- "Stress test the parser" : `deeptest`
+- "Run differential testing" : `deeptest`
+- "Full verification pass" : `memory-safety` + `static-analysis` + `deeptest`
+- "Verify this pull request" : `memory-safety` + `static-analysis` (scope to changed files)
+- "Is this change safe?" : `memory-safety` + `static-analysis` (scope to changed files)
+
+### Examples
+
+User: "Check for memory bugs in the SAT solver."
+
+1. **memory-safety**: Build Z3 with ASan enabled (`cmake -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" ..`). Run the SAT solver tests. Collect any sanitizer reports.
+2. Report findings with stack traces, categorized by bug type (heap-buffer-overflow, use-after-free, stack-buffer-overflow, etc.).
+
+User: "Run static analysis on src/ast/."
+
+1. **static-analysis**: Invoke `scan-build` or `clang-tidy` over `src/ast/` with Z3's compile commands database.
+2. Report findings sorted by severity. Include checker name, file, line, and a brief description of each issue.
+
+User: "Fuzz the SMT-LIB2 parser."
+
+1. **deeptest**: Generate randomized SMT-LIB2 inputs targeting the parser. Run Z3 on each input with a timeout. Collect crashes, assertion failures, and unexpected error messages.
+2. Report crash-inducing inputs with minimized reproduction cases. Classify findings as crashes, assertion violations, or incorrect results.
+
+User: "Full verification pass before the release."
+
+1. Launch all three skills in parallel via subagents:
+   - **memory-safety**: Full test suite under ASan and UBSan.
+   - **static-analysis**: Full source tree scan.
+   - **deeptest**: Broad fuzzing campaign across theories (arithmetic, bitvectors, arrays, strings).
+2. Aggregate all findings. Deduplicate issues that appear in multiple skills (for example, a null dereference found by both static analysis and ASan). Sort by severity: Critical, High, Medium, Low.
+3. Present a summary table followed by detailed findings.
+
+### Build Configurations
+
+Each skill may require a specific build configuration:
+
+**memory-safety (ASan)**:
+```bash
+mkdir build-asan && cd build-asan
+cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_C_FLAGS="-fsanitize=address -fno-omit-frame-pointer" -DCMAKE_BUILD_TYPE=Debug
+make -j$(nproc)
+```
+
+**memory-safety (UBSan)**:
+```bash
+mkdir build-ubsan && cd build-ubsan
+cmake .. -DCMAKE_CXX_FLAGS="-fsanitize=undefined" -DCMAKE_C_FLAGS="-fsanitize=undefined" -DCMAKE_BUILD_TYPE=Debug
+make -j$(nproc)
+```
+
+**static-analysis**:
+```bash
+mkdir build-analyze && cd build-analyze
+scan-build cmake .. -DCMAKE_BUILD_TYPE=Debug
+scan-build make -j$(nproc)
+```
+
+**deeptest**: Uses a standard Release build for performance, with Debug builds reserved for reproducing crashes:
+```bash
+mkdir build-fuzz && cd build-fuzz
+cmake .. -DCMAKE_BUILD_TYPE=Release
+make -j$(nproc)
+```
+
+### Error Handling
+
+**Build failure**: If the instrumented build fails, report the compiler errors. Common causes: sanitizer flags incompatible with certain optimization levels, or missing sanitizer runtime libraries.
+
+**Flaky sanitizer reports**: Some sanitizer findings may be nondeterministic (especially under MSan with uninitialized memory). Re-run flagged tests three times to confirm reproducibility. Mark non-reproducible findings as "intermittent" rather than discarding them.
+
+**Fuzzing timeouts**: Individual fuzz inputs that cause Z3 to exceed the timeout threshold should be collected separately and reported as potential performance regressions, not crashes.
+
+**False positives in static analysis**: The Clang Static Analyzer may produce false positives, particularly around custom allocators and reference-counted smart pointers used in Z3. Flag likely false positives but do not suppress them without user confirmation.
+
+### Notes
+
+- Sanitizer builds are significantly slower than Release builds. Set timeouts to at least 3x the normal test suite duration.
+- Store sanitizer reports and fuzzing artifacts in `.z3-verifier/` unless the user specifies otherwise.
+- When scoping to changed files for pull request verification, use `git diff` to determine the affected source files and limit skill invocations accordingly.
+- Never suppress or ignore sanitizer findings automatically. Every report should be presented to the user for triage.
+- Prefer ASan as the default sanitizer. It catches the broadest class of memory errors with the lowest false-positive rate.
--- a/.github/skills/README.md
+++ b/.github/skills/README.md
@ -0,0 +1,74 @@
+# Z3 Agent Skills
+
+Reusable, composable verification primitives for the Z3 theorem prover.
+Each skill is a self-contained unit: a `SKILL.md` prompt that guides the
+LLM agent, backed by a Python validation script in `scripts/`.
+
+## Skill Catalogue
+
+| Skill | Status | Description |
+|-------|--------|-------------|
+| solve | implemented | Check satisfiability of SMT-LIB2 formulas; return models or unsat cores |
+| prove | implemented | Prove validity by negation and satisfiability checking |
+| encode | implemented | Translate constraint problems into SMT-LIB2 or Z3 Python API code |
+| simplify | implemented | Reduce formula complexity using configurable Z3 tactic chains |
+| optimize | implemented | Solve constrained optimization (minimize/maximize) over numeric domains |
+| explain | implemented | Parse and interpret Z3 output: models, cores, statistics, errors |
+| benchmark | implemented | Measure Z3 performance and collect solver statistics |
+| static-analysis | planned | Run Clang Static Analyzer on Z3 source and log structured findings |
+| deeptest | planned | Deep property-based testing of Z3 internals |
+| memory-safety | planned | Memory safety verification of Z3 C++ source |
+
+## Agents
+
+Two orchestration agents compose these skills into end-to-end workflows:
+
+| Agent | Role |
+|-------|------|
+| z3-solver | Formulation and solving: encode, solve, prove, simplify, optimize, explain |
+| z3-verifier | Codebase quality: benchmark, static-analysis, deeptest, memory-safety |
+
+## Shared Infrastructure
+
+All scripts share a common library at `shared/z3db.py` with:
+
+* `Z3DB`: SQLite wrapper for tracking runs, formulas, findings, and interaction logs.
+* `run_z3()`: Pipe SMT-LIB2 into `z3 -in` with timeout handling.
+* `find_z3()`: Locate the Z3 binary across build directories and PATH.
+* Parsers: `parse_model()`, `parse_stats()`, `parse_unsat_core()`.
+
+The database schema lives in `shared/schema.sql`.
+
+## Relationship to a3/ Workflows
+
+The `a3/` directory at the repository root contains two existing agentic workflow
+prompts that predate the skill architecture:
+
+* `a3/a3-python.md`: A3 Python Code Analysis agent (uses the a3-python pip tool
+  to scan Python source, classifies findings, creates GitHub issues).
+* `a3/a3-rust.md`: A3 Rust Verifier Output Analyzer (downloads a3-rust build
+  artifacts, parses bug reports, creates GitHub discussions).
+
+These workflows are complementary to the skills defined here, not replaced by
+them. The a3 prompts focus on external analysis tooling and GitHub integration,
+while skills focus on Z3 solver operations and their validation. Both may be
+composed by the same orchestrating agent.
+
+## Usage
+
+Check database status and recent runs:
+
+```
+python shared/z3db.py status
+python shared/z3db.py runs --skill solve --last 5
+python shared/z3db.py log --run-id 12
+python shared/z3db.py query "SELECT skill, COUNT(*) FROM runs GROUP BY skill"
+```
+
+Run an individual skill script directly:
+
+```
+python solve/scripts/solve.py --file problem.smt2
+python encode/scripts/encode.py --validate formula.smt2
+python benchmark/scripts/benchmark.py --file problem.smt2
+```
--- a/.github/skills/benchmark/SKILL.md
+++ b/.github/skills/benchmark/SKILL.md
@ -0,0 +1,48 @@
+---
+name: benchmark
+description: Measure Z3 performance on a formula or file. Collects wall-clock time, theory solver statistics, memory usage, and conflict counts. Results are logged to z3agent.db for longitudinal tracking.
+---
+
+Given an SMT-LIB2 formula or file, run Z3 with statistics enabled and report performance characteristics. This is useful for identifying performance regressions, comparing tactic strategies, and profiling theory solver workload distribution.
+
+# Step 1: Run Z3 with statistics
+
+```bash
+python3 scripts/benchmark.py --file problem.smt2
+python3 scripts/benchmark.py --file problem.smt2 --runs 5
+python3 scripts/benchmark.py --formula "(declare-const x Int)..." --debug
+```
+
+The script invokes `z3 -st` and parses the `:key value` statistics block.
+
+# Step 2: Interpret the output
+
+The output includes:
+
+- wall-clock time (ms)
+- result (sat/unsat/unknown/timeout)
+- memory usage (MB)
+- conflicts, decisions, propagations
+- per-theory breakdown (arithmetic, bv, array, etc.)
+
+With `--runs N`, the script runs Z3 N times and reports min/median/max timing.
+
+# Step 3: Compare over time
+
+Past benchmark runs are logged to `z3agent.db`. Query them:
+```bash
+python3 ../../shared/z3db.py runs --skill benchmark --last 20
+python3 ../../shared/z3db.py query "SELECT smtlib2, result, stats FROM formulas WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') ORDER BY run_id DESC LIMIT 5"
+```
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| formula | string | no | | SMT-LIB2 formula |
+| file | path | no | | path to .smt2 file |
+| runs | int | no | 1 | number of repeated runs for timing |
+| timeout | int | no | 60 | seconds per run |
+| z3 | path | no | auto | path to z3 binary |
+| debug | flag | no | off | verbose tracing |
+| db | path | no | .z3-agent/z3agent.db | logging database |
--- a/.github/skills/benchmark/scripts/benchmark.py
+++ b/.github/skills/benchmark/scripts/benchmark.py
@ -0,0 +1,74 @@
+#!/usr/bin/env python3
+"""
+benchmark.py: measure Z3 performance with statistics.
+
+Usage:
+    python benchmark.py --file problem.smt2
+    python benchmark.py --file problem.smt2 --runs 5
+"""
+
+import argparse
+import statistics
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, run_z3, parse_stats, setup_logging
+
+
+def main():
+    parser = argparse.ArgumentParser(prog="benchmark")
+    parser.add_argument("--formula")
+    parser.add_argument("--file")
+    parser.add_argument("--runs", type=int, default=1)
+    parser.add_argument("--timeout", type=int, default=60)
+    parser.add_argument("--z3", default=None)
+    parser.add_argument("--db", default=None)
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    if args.file:
+        formula = Path(args.file).read_text()
+    elif args.formula:
+        formula = args.formula
+    else:
+        parser.error("provide --formula or --file")
+        return
+
+    db = Z3DB(args.db)
+    timings = []
+
+    for i in range(args.runs):
+        run_id = db.start_run("benchmark", formula)
+        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
+                        args=["-st"], debug=args.debug)
+
+        stats = parse_stats(result["stdout"])
+        db.log_formula(run_id, formula, result["result"], stats=stats)
+        db.finish_run(run_id, result["result"], result["duration_ms"],
+                      result["exit_code"])
+        timings.append(result["duration_ms"])
+
+        if args.runs == 1:
+            print(f"result: {result['result']}")
+            print(f"time: {result['duration_ms']}ms")
+            if stats:
+                print("statistics:")
+                for k, v in sorted(stats.items()):
+                    print(f"  :{k} {v}")
+
+    if args.runs > 1:
+        print(f"runs: {args.runs}")
+        print(f"min: {min(timings)}ms")
+        print(f"median: {statistics.median(timings):.0f}ms")
+        print(f"max: {max(timings)}ms")
+        print(f"result: {result['result']}")
+
+    db.close()
+    sys.exit(0 if result["exit_code"] == 0 else 1)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/deeptest/SKILL.md
+++ b/.github/skills/deeptest/SKILL.md
@ -0,0 +1,70 @@
+---
+name: deeptest
+description: Generate stress tests and differential tests for Z3 theories. Creates random or structured SMT-LIB2 formulas, runs them through Z3, and checks for crashes, assertion failures, or result inconsistencies. Inspired by fuzzing and metamorphic testing approaches applied to SMT solvers.
+---
+
+Given a strategy and count, generate SMT-LIB2 formulas targeting Z3 internals and report anomalies. Strategies range from pure random generation to structured metamorphic and cross-theory combinations. Every formula and finding is logged to z3agent.db.
+
+# Step 1: Choose a strategy and run
+
+```bash
+python3 scripts/deeptest.py --strategy random --count 100 --seed 42
+python3 scripts/deeptest.py --strategy metamorphic --seed-file base.smt2 --count 50
+python3 scripts/deeptest.py --strategy cross-theory --theories "LIA,BV" --count 80
+python3 scripts/deeptest.py --strategy incremental --count 60 --debug
+```
+
+Available strategies:
+
+- `random`: generate formulas with random declarations (Int, Bool, BitVec), random arithmetic and boolean assertions, and check-sat.
+- `metamorphic`: start from a base formula (generated or loaded from file), apply equisatisfiable transformations (tautology insertion, double negation, assertion duplication), and verify the result stays consistent.
+- `cross-theory`: combine multiple theories (LIA, Bool, BV) in a single formula with bridging constraints to stress theory combination.
+- `incremental`: generate push/pop sequences with per-frame assertions to stress incremental solving.
+
+# Step 2: Interpret the output
+
+The script prints a summary after completion:
+
+```
+strategy:  random
+seed:      42
+formulas:  100
+anomalies: 2
+elapsed:   4500ms
+```
+
+A nonzero anomaly count means the run detected crashes (nonzero exit code), assertion failures in stderr, solver errors, or result disagreements between a base formula and its metamorphic variants.
+
+# Step 3: Inspect findings
+
+Findings are logged to `z3agent.db` with category, severity, and details:
+
+```bash
+python3 ../../shared/z3db.py query "SELECT category, severity, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='deeptest') ORDER BY finding_id DESC LIMIT 20"
+```
+
+Each finding includes the formula index, exit code, and a stderr excerpt for triage.
+
+# Step 4: Reproduce
+
+Use the `--seed` parameter to reproduce a run exactly:
+
+```bash
+python3 scripts/deeptest.py --strategy random --count 100 --seed 42
+```
+
+The seed is printed in every run summary and logged in the run record.
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| strategy | string | no | random | test generation strategy: random, metamorphic, cross-theory, incremental |
+| count | int | no | 50 | number of formulas to generate |
+| seed | int | no | clock | random seed for reproducibility |
+| seed-file | path | no | | base .smt2 file for metamorphic strategy |
+| theories | string | no | LIA,BV | comma-separated theories for cross-theory strategy |
+| timeout | int | no | 10 | per-formula Z3 timeout in seconds |
+| z3 | path | no | auto | path to z3 binary |
+| debug | flag | no | off | verbose tracing |
+| db | path | no | .z3-agent/z3agent.db | logging database |
--- a/.github/skills/deeptest/scripts/deeptest.py
+++ b/.github/skills/deeptest/scripts/deeptest.py
@ -0,0 +1,393 @@
+#!/usr/bin/env python3
+"""
+deeptest.py: generate and run stress tests for Z3.
+
+Usage:
+    python deeptest.py --strategy random --count 100
+    python deeptest.py --strategy metamorphic --seed-file base.smt2
+    python deeptest.py --strategy cross-theory --theories "LIA,BV" --debug
+"""
+
+import argparse
+import logging
+import random
+import sys
+import time
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, run_z3, setup_logging
+
+log = logging.getLogger("deeptest")
+
+# Sort and operator tables
+
+THEORY_SORTS = {
+    "LIA":  "Int",
+    "Bool": "Bool",
+    "BV":   "(_ BitVec 32)",
+}
+
+INT_ARITH = ["+", "-", "*"]
+INT_CMP   = [">", "<", ">=", "<=", "="]
+BV_ARITH  = ["bvadd", "bvsub", "bvand", "bvor", "bvxor"]
+BV_CMP    = ["bvslt", "bvsgt", "bvsle", "bvsge", "="]
+
+# Assertion generators (one per sort)
+
+
+def _int_assertion(rng, vs):
+    if len(vs) < 2:
+        return f"(assert ({rng.choice(INT_CMP)} {vs[0]} {rng.randint(-10, 10)}))"
+    a, b = rng.sample(vs, 2)
+    return f"(assert ({rng.choice(INT_CMP)} ({rng.choice(INT_ARITH)} {a} {b}) {rng.randint(-10, 10)}))"
+
+
+def _bool_assertion(rng, vs):
+    if len(vs) == 1:
+        return f"(assert {vs[0]})" if rng.random() < 0.5 else f"(assert (not {vs[0]}))"
+    a, b = rng.sample(vs, 2)
+    return f"(assert ({rng.choice(['and', 'or', '=>'])} {a} {b}))"
+
+
+def _bv_assertion(rng, vs):
+    lit = f"(_ bv{rng.randint(0, 255)} 32)"
+    if len(vs) < 2:
+        return f"(assert ({rng.choice(BV_CMP)} {vs[0]} {lit}))"
+    a, b = rng.sample(vs, 2)
+    return f"(assert ({rng.choice(BV_CMP)} ({rng.choice(BV_ARITH)} {a} {b}) {lit}))"
+
+
+SORT_ASSERTION = {
+    "Int":            _int_assertion,
+    "Bool":           _bool_assertion,
+    "(_ BitVec 32)":  _bv_assertion,
+}
+
+
+def _random_assertion(rng, vars_by_sort):
+    """Pick a populated sort and emit one random assertion."""
+    available = [s for s in vars_by_sort if vars_by_sort[s]]
+    if not available:
+        return None
+    sort = rng.choice(available)
+    return SORT_ASSERTION[sort](rng, vars_by_sort[sort])
+
+# Formula generators
+
+
+def gen_random_formula(rng, num_vars=5, num_assertions=5):
+    """Random declarations, random assertions, single check-sat."""
+    lines = []
+    vars_by_sort = {}
+    sorts = list(THEORY_SORTS.values())
+
+    for i in range(num_vars):
+        sort = rng.choice(sorts)
+        name = f"v{i}"
+        lines.append(f"(declare-const {name} {sort})")
+        vars_by_sort.setdefault(sort, []).append(name)
+
+    for _ in range(num_assertions):
+        a = _random_assertion(rng, vars_by_sort)
+        if a:
+            lines.append(a)
+
+    lines.append("(check-sat)")
+    return "\n".join(lines)
+
+
+def gen_metamorphic_variant(rng, base_formula):
+    """Apply an equisatisfiable transformation to a formula.
+
+    Transformations:
+      tautology   : insert (assert true) before check-sat
+      double_neg  : wrap one assertion body in double negation
+      duplicate   : repeat an existing assertion
+    """
+    lines = base_formula.strip().split("\n")
+    transform = rng.choice(["tautology", "double_neg", "duplicate"])
+    assertion_idxs = [i for i, l in enumerate(lines)
+                      if l.strip().startswith("(assert")]
+
+    if transform == "tautology":
+        pos = next((i for i, l in enumerate(lines) if "check-sat" in l),
+                   len(lines))
+        lines.insert(pos, "(assert true)")
+
+    elif transform == "double_neg" and assertion_idxs:
+        idx = rng.choice(assertion_idxs)
+        body = lines[idx].strip()
+        inner = body[len("(assert "):-1]
+        lines[idx] = f"(assert (not (not {inner})))"
+
+    elif transform == "duplicate" and assertion_idxs:
+        idx = rng.choice(assertion_idxs)
+        lines.insert(idx + 1, lines[idx])
+
+    return "\n".join(lines)
+
+
+def gen_cross_theory_formula(rng, theories, num_vars=4, num_assertions=6):
+    """Combine variables from multiple theories with bridging constraints."""
+    lines = []
+    vars_by_sort = {}
+    sorts = [THEORY_SORTS[t] for t in theories if t in THEORY_SORTS]
+    if not sorts:
+        sorts = list(THEORY_SORTS.values())
+
+    for i in range(num_vars):
+        sort = sorts[i % len(sorts)]
+        name = f"v{i}"
+        lines.append(f"(declare-const {name} {sort})")
+        vars_by_sort.setdefault(sort, []).append(name)
+
+    for _ in range(num_assertions):
+        a = _random_assertion(rng, vars_by_sort)
+        if a:
+            lines.append(a)
+
+    # Bridge Int and Bool when both present
+    int_vs = vars_by_sort.get("Int", [])
+    bool_vs = vars_by_sort.get("Bool", [])
+    if int_vs and bool_vs:
+        iv = rng.choice(int_vs)
+        bv = rng.choice(bool_vs)
+        lines.append(f"(assert (= {bv} (> {iv} 0)))")
+
+    lines.append("(check-sat)")
+    return "\n".join(lines)
+
+
+def gen_incremental_formula(rng, num_frames=3, num_vars=4,
+                            asserts_per_frame=3):
+    """Push/pop sequence: all variables declared globally, assertions scoped."""
+    lines = []
+    vars_by_sort = {}
+    sorts = list(THEORY_SORTS.values())
+
+    for i in range(num_vars):
+        sort = rng.choice(sorts)
+        name = f"v{i}"
+        lines.append(f"(declare-const {name} {sort})")
+        vars_by_sort.setdefault(sort, []).append(name)
+
+    for _ in range(num_frames):
+        lines.append("(push 1)")
+        for _ in range(asserts_per_frame):
+            a = _random_assertion(rng, vars_by_sort)
+            if a:
+                lines.append(a)
+        lines.append("(check-sat)")
+        lines.append("(pop 1)")
+
+    lines.append("(check-sat)")
+    return "\n".join(lines)
+
+# Anomaly detection
+
+
+def classify_result(result):
+    """Return an anomaly category string or None if the result looks normal."""
+    if result["exit_code"] != 0 and result["result"] != "timeout":
+        return "crash"
+    if "assertion" in result["stderr"].lower():
+        return "assertion_failure"
+    if result["result"] == "error":
+        return "error"
+    return None
+
+# Strategy runners
+
+
+def run_random(args, rng, db, run_id):
+    anomalies = 0
+    for i in range(args.count):
+        formula = gen_random_formula(rng, rng.randint(2, 8),
+                                     rng.randint(1, 10))
+        log.debug("formula %d:\n%s", i, formula)
+        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
+                        debug=args.debug)
+        db.log_formula(run_id, formula, result["result"])
+
+        cat = classify_result(result)
+        if cat:
+            anomalies += 1
+            db.log_finding(
+                run_id, cat,
+                f"random formula #{i}: {cat} (exit={result['exit_code']})",
+                severity="high" if cat == "crash" else "medium",
+                details={"formula_index": i,
+                         "exit_code": result["exit_code"],
+                         "stderr": result["stderr"][:500]})
+            log.warning("anomaly in formula %d: %s", i, cat)
+    return anomalies
+
+
+def run_metamorphic(args, rng, db, run_id):
+    if args.seed_file:
+        base = Path(args.seed_file).read_text()
+    else:
+        base = gen_random_formula(rng, num_vars=4, num_assertions=3)
+
+    base_out = run_z3(base, z3_bin=args.z3, timeout=args.timeout,
+                      debug=args.debug)
+    base_status = base_out["result"]
+    db.log_formula(run_id, base, base_status)
+    log.info("base formula result: %s", base_status)
+
+    if base_status not in ("sat", "unsat"):
+        db.log_finding(run_id, "skip",
+                       f"base formula not definite: {base_status}",
+                       severity="info")
+        return 0
+
+    anomalies = 0
+    for i in range(args.count):
+        variant = gen_metamorphic_variant(rng, base)
+        log.debug("variant %d:\n%s", i, variant)
+        result = run_z3(variant, z3_bin=args.z3, timeout=args.timeout,
+                        debug=args.debug)
+        db.log_formula(run_id, variant, result["result"])
+
+        cat = classify_result(result)
+        if cat:
+            anomalies += 1
+            db.log_finding(
+                run_id, cat,
+                f"metamorphic variant #{i}: {cat}",
+                severity="high",
+                details={"variant_index": i,
+                         "stderr": result["stderr"][:500]})
+            log.warning("anomaly in variant %d: %s", i, cat)
+            continue
+
+        if result["result"] in ("sat", "unsat") \
+                and result["result"] != base_status:
+            anomalies += 1
+            db.log_finding(
+                run_id, "disagreement",
+                f"variant #{i}: expected {base_status}, "
+                f"got {result['result']}",
+                severity="critical",
+                details={"variant_index": i,
+                         "expected": base_status,
+                         "actual": result["result"]})
+            log.warning("disagreement in variant %d: expected %s, got %s",
+                        i, base_status, result["result"])
+    return anomalies
+
+
+def run_cross_theory(args, rng, db, run_id):
+    theories = [t.strip() for t in args.theories.split(",")]
+    anomalies = 0
+    for i in range(args.count):
+        formula = gen_cross_theory_formula(rng, theories,
+                                           rng.randint(3, 8),
+                                           rng.randint(2, 10))
+        log.debug("cross-theory formula %d:\n%s", i, formula)
+        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
+                        debug=args.debug)
+        db.log_formula(run_id, formula, result["result"])
+
+        cat = classify_result(result)
+        if cat:
+            anomalies += 1
+            db.log_finding(
+                run_id, cat,
+                f"cross-theory #{i} ({','.join(theories)}): {cat}",
+                severity="high" if cat == "crash" else "medium",
+                details={"formula_index": i, "theories": theories,
+                         "exit_code": result["exit_code"],
+                         "stderr": result["stderr"][:500]})
+            log.warning("anomaly in cross-theory formula %d: %s", i, cat)
+    return anomalies
+
+
+def run_incremental(args, rng, db, run_id):
+    anomalies = 0
+    for i in range(args.count):
+        num_frames = rng.randint(2, 6)
+        formula = gen_incremental_formula(rng, num_frames)
+        log.debug("incremental formula %d:\n%s", i, formula)
+        result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout,
+                        debug=args.debug)
+        db.log_formula(run_id, formula, result["result"])
+
+        cat = classify_result(result)
+        if cat:
+            anomalies += 1
+            db.log_finding(
+                run_id, cat,
+                f"incremental #{i} ({num_frames} frames): {cat}",
+                severity="high" if cat == "crash" else "medium",
+                details={"formula_index": i, "num_frames": num_frames,
+                         "exit_code": result["exit_code"],
+                         "stderr": result["stderr"][:500]})
+            log.warning("anomaly in incremental formula %d: %s", i, cat)
+    return anomalies
+
+
+STRATEGIES = {
+    "random":        run_random,
+    "metamorphic":   run_metamorphic,
+    "cross-theory":  run_cross_theory,
+    "incremental":   run_incremental,
+}
+
+# Entry point
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        prog="deeptest",
+        description="Generate and run stress tests for Z3.",
+    )
+    parser.add_argument("--strategy", choices=list(STRATEGIES),
+                        default="random",
+                        help="test generation strategy")
+    parser.add_argument("--count", type=int, default=50,
+                        help="number of formulas to generate")
+    parser.add_argument("--seed", type=int, default=None,
+                        help="random seed for reproducibility")
+    parser.add_argument("--seed-file", default=None,
+                        help="base .smt2 file for metamorphic strategy")
+    parser.add_argument("--theories", default="LIA,BV",
+                        help="comma-separated theories for cross-theory")
+    parser.add_argument("--timeout", type=int, default=10,
+                        help="per-formula Z3 timeout in seconds")
+    parser.add_argument("--z3", default=None, help="path to z3 binary")
+    parser.add_argument("--db", default=None, help="path to z3agent.db")
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    seed = args.seed if args.seed is not None else int(time.time())
+    rng = random.Random(seed)
+    log.info("seed: %d", seed)
+
+    db = Z3DB(args.db)
+    run_id = db.start_run(
+        "deeptest",
+        f"strategy={args.strategy} count={args.count} seed={seed}")
+
+    t0 = time.monotonic()
+    anomalies = STRATEGIES[args.strategy](args, rng, db, run_id)
+    elapsed_ms = int((time.monotonic() - t0) * 1000)
+
+    status = "success" if anomalies == 0 else "findings"
+    db.finish_run(run_id, status, elapsed_ms)
+
+    print(f"strategy:  {args.strategy}")
+    print(f"seed:      {seed}")
+    print(f"formulas:  {args.count}")
+    print(f"anomalies: {anomalies}")
+    print(f"elapsed:   {elapsed_ms}ms")
+
+    db.close()
+    sys.exit(1 if anomalies > 0 else 0)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/encode/SKILL.md
+++ b/.github/skills/encode/SKILL.md
@ -0,0 +1,45 @@
+---
+name: encode
+description: Translate constraint problems into SMT-LIB2 or Z3 Python API code. Handles common problem classes including scheduling, graph coloring, arithmetic puzzles, and verification conditions.
+---
+
+Given a problem description (natural language, pseudocode, or a partial formulation), produce a complete, syntactically valid SMT-LIB2 encoding or Z3 Python script. The encoding should declare all variables, assert all constraints, and include the appropriate check-sat / get-model commands.
+
+# Step 1: Identify the problem class
+
+Common encodings:
+
+| Problem class | Theory | Typical sorts |
+|---------------|--------|---------------|
+| Integer arithmetic | LIA / NIA | Int |
+| Real arithmetic | LRA / NRA | Real |
+| Bitvector operations | QF_BV | (_ BitVec N) |
+| Arrays and maps | QF_AX | (Array Int Int) |
+| Strings and regex | QF_S | String, RegLan |
+| Uninterpreted functions | QF_UF | custom sorts |
+| Mixed theories | AUFLIA, etc. | combination |
+
+# Step 2: Generate the encoding
+
+```bash
+python3 scripts/encode.py --problem "Find integers x, y such that x^2 + y^2 = 25 and x > 0" --format smtlib2
+python3 scripts/encode.py --problem "Schedule 4 tasks on 2 machines minimizing makespan" --format python
+```
+
+For `--format smtlib2`, the output is a complete .smt2 file ready for the **solve** skill.
+For `--format python`, the output is a standalone Z3 Python script.
+
+# Step 3: Validate the encoding
+
+The script checks that the generated formula is syntactically valid by running a quick `z3 -in` parse check (no solving, just syntax). Parse errors are reported with the offending line.
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| problem | string | yes | | problem description |
+| format | string | no | smtlib2 | output format: smtlib2 or python |
+| output | path | no | stdout | write to file instead of stdout |
+| validate | flag | no | on | run syntax check on the output |
+| debug | flag | no | off | verbose tracing |
+| db | path | no | .z3-agent/z3agent.db | logging database |
--- a/.github/skills/encode/scripts/encode.py
+++ b/.github/skills/encode/scripts/encode.py
@ -0,0 +1,144 @@
+#!/usr/bin/env python3
+"""
+encode.py: validate and format SMT-LIB2 encodings.
+
+Usage:
+    python encode.py --validate formula.smt2
+    python encode.py --validate formula.smt2 --debug
+"""
+
+import argparse
+import re
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, run_z3, setup_logging
+
+
+VALIDATION_TIMEOUT = 5
+
+
+def read_input(path_or_stdin: str) -> str:
+    """Read formula from a file path or stdin (when path is '-')."""
+    if path_or_stdin == "-":
+        return sys.stdin.read()
+    p = Path(path_or_stdin)
+    if not p.exists():
+        print(f"file not found: {p}", file=sys.stderr)
+        sys.exit(1)
+    return p.read_text()
+
+
+def find_errors(output: str) -> list:
+    """Extract (error ...) messages from Z3 output."""
+    return re.findall(r'\(error\s+"([^"]+)"\)', output)
+
+
+def validate(formula: str, z3_bin: str = None, debug: bool = False) -> dict:
+    """
+    Validate an SMT-LIB2 formula by piping it through z3 -in.
+    Returns a dict with 'valid' (bool), 'errors' (list), and 'raw' output.
+    """
+    result = run_z3(
+        formula, z3_bin=z3_bin, timeout=VALIDATION_TIMEOUT, debug=debug,
+    )
+    errors = find_errors(result["stdout"]) + find_errors(result["stderr"])
+
+    if result["result"] == "timeout":
+        # Timeout during validation is not a syntax error: the formula
+        # parsed successfully but solving exceeded the limit.  That counts
+        # as syntactically valid.
+        return {"valid": True, "errors": [], "raw": result}
+
+    if errors or result["exit_code"] != 0:
+        return {"valid": False, "errors": errors, "raw": result}
+
+    return {"valid": True, "errors": [], "raw": result}
+
+
+def report_errors(errors: list, formula: str):
+    """Print each syntax error with surrounding context."""
+    lines = formula.splitlines()
+    print(f"validation failed: {len(errors)} error(s)", file=sys.stderr)
+    for err in errors:
+        print(f"  : {err}", file=sys.stderr)
+    if len(lines) <= 20:
+        print("formula:", file=sys.stderr)
+        for i, line in enumerate(lines, 1):
+            print(f"  {i:3d}  {line}", file=sys.stderr)
+
+
+def write_output(formula: str, output_path: str, fmt: str):
+    """Write the validated formula to a file or stdout."""
+    if fmt == "python":
+        print("python format output is generated by the agent, "
+              "not by this script", file=sys.stderr)
+        sys.exit(1)
+
+    if output_path:
+        Path(output_path).write_text(formula)
+        print(f"written to {output_path}")
+    else:
+        print(formula)
+
+
+def main():
+    parser = argparse.ArgumentParser(prog="encode")
+    parser.add_argument(
+        "--validate",
+        metavar="FILE",
+        help="path to .smt2 file to validate, or '-' for stdin",
+    )
+    parser.add_argument(
+        "--format",
+        choices=["smtlib2", "python"],
+        default="smtlib2",
+        help="output format (default: smtlib2)",
+    )
+    parser.add_argument(
+        "--output",
+        metavar="FILE",
+        default=None,
+        help="write result to file instead of stdout",
+    )
+    parser.add_argument("--z3", default=None, help="path to z3 binary")
+    parser.add_argument("--db", default=None)
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    if not args.validate:
+        parser.error("provide --validate FILE")
+        return
+
+    formula = read_input(args.validate)
+
+    db = Z3DB(args.db)
+    run_id = db.start_run("encode", formula)
+
+    result = validate(formula, z3_bin=args.z3, debug=args.debug)
+
+    if result["valid"]:
+        db.log_formula(run_id, formula, "valid")
+        db.finish_run(run_id, "valid", result["raw"]["duration_ms"], 0)
+        write_output(formula, args.output, args.format)
+        db.close()
+        sys.exit(0)
+    else:
+        db.log_formula(run_id, formula, "error")
+        for err in result["errors"]:
+            db.log_finding(run_id, "syntax", err, severity="error")
+        db.finish_run(
+            run_id, "error",
+            result["raw"]["duration_ms"],
+            result["raw"]["exit_code"],
+        )
+        report_errors(result["errors"], formula)
+        db.close()
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/explain/SKILL.md
+++ b/.github/skills/explain/SKILL.md
@ -0,0 +1,52 @@
+---
+name: explain
+description: Parse and interpret Z3 output for human consumption. Handles models, unsat cores, proofs, statistics, and error messages. Translates solver internals into plain-language explanations.
+---
+
+Given raw Z3 output (from the **solve**, **prove**, **optimize**, or **benchmark** skills), produce a structured explanation. This skill is for cases where the solver output is large, nested, or otherwise difficult to read directly.
+
+# Step 1: Identify the output type
+
+| Output contains | Explanation type |
+|----------------|-----------------|
+| `(define-fun ...)` blocks | model explanation |
+| unsat core labels | conflict explanation |
+| `:key value` statistics | performance breakdown |
+| `(error ...)` | error diagnosis |
+| proof terms | proof sketch |
+
+# Step 2: Run the explainer
+
+```bash
+python3 scripts/explain.py --file output.txt
+python3 scripts/explain.py --stdin < output.txt
+python3 scripts/explain.py --file output.txt --debug
+```
+
+The script auto-detects the output type and produces a structured summary.
+
+# Step 3: Interpret the explanation
+
+For models:
+- Each variable is listed with its value and sort
+- Array and function interpretations are expanded
+- Bitvector values are shown in decimal and hex
+
+For unsat cores:
+- The conflicting named assertions are listed
+- A minimal conflict set is highlighted
+
+For statistics:
+- Time breakdown by phase (preprocessing, solving, model construction)
+- Theory solver load distribution
+- Memory high-water mark
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| file | path | no | | file containing Z3 output |
+| stdin | flag | no | off | read from stdin |
+| type | string | no | auto | force output type: model, core, stats, error |
+| debug | flag | no | off | verbose tracing |
+| db | path | no | .z3-agent/z3agent.db | logging database |
--- a/.github/skills/explain/scripts/explain.py
+++ b/.github/skills/explain/scripts/explain.py
@ -0,0 +1,128 @@
+#!/usr/bin/env python3
+"""
+explain.py: interpret Z3 output in a readable form.
+
+Usage:
+    python explain.py --file output.txt
+    echo "sat\n(model ...)" | python explain.py --stdin
+"""
+
+import argparse
+import re
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, parse_model, parse_stats, parse_unsat_core, setup_logging
+
+
+def detect_type(text: str) -> str:
+    if "(define-fun" in text:
+        return "model"
+    if "(error" in text:
+        return "error"
+    if re.search(r':\S+\s+[\d.]+', text):
+        return "stats"
+    first = text.strip().split("\n")[0].strip()
+    if first == "unsat":
+        return "core"
+    return "unknown"
+
+
+def explain_model(text: str):
+    model = parse_model(text)
+    if not model:
+        print("no model found in output")
+        return
+    print("satisfying assignment:")
+    for name, val in model.items():
+        # show hex for large integers (likely bitvectors)
+        try:
+            n = int(val)
+            if abs(n) > 255:
+                print(f"  {name} = {val} (0x{n:x})")
+            else:
+                print(f"  {name} = {val}")
+        except ValueError:
+            print(f"  {name} = {val}")
+
+
+def explain_core(text: str):
+    core = parse_unsat_core(text)
+    if core:
+        print(f"conflicting assertions ({len(core)}):")
+        for label in core:
+            print(f"  {label}")
+    else:
+        print("unsat (no named assertions for core extraction)")
+
+
+def explain_stats(text: str):
+    stats = parse_stats(text)
+    if not stats:
+        print("no statistics found")
+        return
+    print("performance breakdown:")
+    for k in sorted(stats):
+        print(f"  :{k} {stats[k]}")
+
+    if "time" in stats:
+        print(f"\ntotal time: {stats['time']}s")
+    if "memory" in stats:
+        print(f"peak memory: {stats['memory']} MB")
+
+
+def explain_error(text: str):
+    errors = re.findall(r'\(error\s+"([^"]+)"\)', text)
+    if errors:
+        print(f"Z3 reported {len(errors)} error(s):")
+        for e in errors:
+            print(f"  {e}")
+    else:
+        print("error in output but could not parse message")
+
+
+def main():
+    parser = argparse.ArgumentParser(prog="explain")
+    parser.add_argument("--file")
+    parser.add_argument("--stdin", action="store_true")
+    parser.add_argument("--type", choices=["model", "core", "stats", "error", "auto"],
+                        default="auto")
+    parser.add_argument("--db", default=None)
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    if args.file:
+        text = Path(args.file).read_text()
+    elif args.stdin:
+        text = sys.stdin.read()
+    else:
+        parser.error("provide --file or --stdin")
+        return
+
+    output_type = args.type if args.type != "auto" else detect_type(text)
+
+    db = Z3DB(args.db)
+    run_id = db.start_run("explain", text[:200])
+
+    if output_type == "model":
+        explain_model(text)
+    elif output_type == "core":
+        explain_core(text)
+    elif output_type == "stats":
+        explain_stats(text)
+    elif output_type == "error":
+        explain_error(text)
+    else:
+        print("could not determine output type")
+        print("raw output:")
+        print(text[:500])
+
+    db.finish_run(run_id, "success", 0, 0)
+    db.close()
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/memory-safety/SKILL.md
+++ b/.github/skills/memory-safety/SKILL.md
@ -0,0 +1,53 @@
+---
+name: memory-safety
+description: Run AddressSanitizer and UndefinedBehaviorSanitizer on the Z3 test suite to detect memory errors, undefined behavior, and leaks. Logs each finding to z3agent.db.
+---
+
+Build Z3 with compiler-based sanitizer instrumentation, execute the test suite, and parse the runtime output for memory safety violations. Supported sanitizers are AddressSanitizer (heap and stack buffer overflows, use-after-free, double-free, memory leaks) and UndefinedBehaviorSanitizer (signed integer overflow, null pointer dereference, misaligned access, shift errors). Findings are deduplicated and stored in z3agent.db for triage and longitudinal tracking.
+
+# Step 1: Configure and build
+
+The script invokes cmake with the appropriate `-fsanitize` flags and builds the `test-z3` target. Each sanitizer uses a separate build directory to avoid flag conflicts. If a prior instrumented build exists with matching flags, only incremental compilation runs.
+
+```bash
+python3 scripts/memory_safety.py --sanitizer asan
+python3 scripts/memory_safety.py --sanitizer ubsan
+python3 scripts/memory_safety.py --sanitizer both
+```
+
+To reuse an existing build:
+```bash
+python3 scripts/memory_safety.py --sanitizer asan --skip-build --build-dir build/sanitizer-asan
+```
+
+# Step 2: Run and collect
+
+The test binary runs with `halt_on_error=0` so the sanitizer reports all violations rather than aborting on the first. The script parses `ERROR: AddressSanitizer`, `runtime error:`, and `ERROR: LeakSanitizer` patterns from the combined output, extracts source locations where available, and deduplicates by category, file, and line.
+
+```bash
+python3 scripts/memory_safety.py --sanitizer asan --timeout 900 --debug
+```
+
+# Step 3: Interpret results
+
+- `clean`: no sanitizer violations detected.
+- `findings`: one or more violations found. Each is printed with severity, category, message, and source location.
+- `timeout`: the test suite did not complete within the deadline. Increase the timeout or investigate a possible infinite loop.
+- `error`: build or execution failed before sanitizer output could be collected.
+
+Query past runs:
+```bash
+python3 ../../shared/z3db.py runs --skill memory-safety --last 10
+python3 ../../shared/z3db.py query "SELECT category, severity, file, line, message FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='memory-safety') ORDER BY run_id DESC LIMIT 20"
+```
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| sanitizer | choice | no | asan | which sanitizer to enable: asan, ubsan, or both |
+| build-dir | path | no | build/sanitizer-{name} | path to the build directory |
+| timeout | int | no | 600 | seconds before killing the test run |
+| skip-build | flag | no | off | reuse an existing instrumented build |
+| debug | flag | no | off | verbose cmake, make, and test output |
+| db | path | no | .z3-agent/z3agent.db | path to the logging database |
--- a/.github/skills/memory-safety/scripts/memory_safety.py
+++ b/.github/skills/memory-safety/scripts/memory_safety.py
@ -0,0 +1,266 @@
+#!/usr/bin/env python3
+"""
+memory_safety.py: run sanitizer checks on Z3 test suite.
+
+Usage:
+    python memory_safety.py --sanitizer asan
+    python memory_safety.py --sanitizer ubsan --debug
+"""
+
+import argparse
+import logging
+import os
+import re
+import subprocess
+import sys
+import time
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, setup_logging
+
+logger = logging.getLogger("z3agent")
+
+SANITIZER_FLAGS = {
+    "asan": "-fsanitize=address -fno-omit-frame-pointer",
+    "ubsan": "-fsanitize=undefined -fno-omit-frame-pointer",
+}
+
+ASAN_ERROR = re.compile(r"ERROR:\s*AddressSanitizer:\s*(\S+)")
+UBSAN_ERROR = re.compile(r":\d+:\d+:\s*runtime error:\s*(.+)")
+LEAK_ERROR = re.compile(r"ERROR:\s*LeakSanitizer:")
+LOCATION = re.compile(r"(\S+\.(?:cpp|c|h|hpp)):(\d+)")
+
+
+def find_repo_root() -> Path:
+    d = Path.cwd()
+    for _ in range(10):
+        if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
+            return d
+        parent = d.parent
+        if parent == d:
+            break
+        d = parent
+    logger.error("could not locate Z3 repository root")
+    sys.exit(1)
+
+
+def build_is_configured(build_dir: Path, sanitizer: str) -> bool:
+    """Check whether the build directory already has a matching cmake config."""
+    cache = build_dir / "CMakeCache.txt"
+    if not cache.is_file():
+        return False
+    expected = SANITIZER_FLAGS[sanitizer].split()[0]
+    return expected in cache.read_text()
+
+
+def configure(build_dir: Path, sanitizer: str, repo_root: Path) -> bool:
+    """Run cmake with the requested sanitizer flags."""
+    flags = SANITIZER_FLAGS[sanitizer]
+    build_dir.mkdir(parents=True, exist_ok=True)
+    cmd = [
+        "cmake", str(repo_root),
+        f"-DCMAKE_C_FLAGS={flags}",
+        f"-DCMAKE_CXX_FLAGS={flags}",
+        f"-DCMAKE_EXE_LINKER_FLAGS={flags}",
+        "-DCMAKE_BUILD_TYPE=Debug",
+        "-DZ3_BUILD_TEST=ON",
+    ]
+    logger.info("configuring %s build in %s", sanitizer, build_dir)
+    logger.debug("cmake command: %s", " ".join(cmd))
+    proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
+    if proc.returncode != 0:
+        logger.error("cmake failed:\n%s", proc.stderr)
+        return False
+    return True
+
+
+def compile_tests(build_dir: Path) -> bool:
+    """Compile the test-z3 target."""
+    nproc = os.cpu_count() or 4
+    cmd = ["make", f"-j{nproc}", "test-z3"]
+    logger.info("compiling test-z3 (%d parallel jobs)", nproc)
+    proc = subprocess.run(cmd, cwd=build_dir, capture_output=True, text=True)
+    if proc.returncode != 0:
+        logger.error("compilation failed:\n%s", proc.stderr[-2000:])
+        return False
+    return True
+
+
+def run_tests(build_dir: Path, timeout: int) -> dict:
+    """Execute test-z3 under sanitizer runtime and capture output."""
+    test_bin = build_dir / "test-z3"
+    if not test_bin.is_file():
+        logger.error("test-z3 not found at %s", test_bin)
+        return {"stdout": "", "stderr": "binary not found", "exit_code": -1,
+                "duration_ms": 0}
+
+    env = os.environ.copy()
+    env["ASAN_OPTIONS"] = "detect_leaks=1:halt_on_error=0:print_stacktrace=1"
+    env["UBSAN_OPTIONS"] = "print_stacktrace=1:halt_on_error=0"
+
+    cmd = [str(test_bin), "/a"]
+    logger.info("running: %s", " ".join(cmd))
+    start = time.monotonic()
+    try:
+        proc = subprocess.run(
+            cmd, capture_output=True, text=True, timeout=timeout,
+            cwd=build_dir, env=env,
+        )
+    except subprocess.TimeoutExpired:
+        ms = int((time.monotonic() - start) * 1000)
+        logger.warning("test-z3 timed out after %dms", ms)
+        return {"stdout": "", "stderr": "timeout", "exit_code": -1,
+                "duration_ms": ms}
+
+    ms = int((time.monotonic() - start) * 1000)
+    logger.debug("exit_code=%d duration=%dms", proc.returncode, ms)
+    return {
+        "stdout": proc.stdout,
+        "stderr": proc.stderr,
+        "exit_code": proc.returncode,
+        "duration_ms": ms,
+    }
+
+
+def parse_findings(output: str) -> list:
+    """Extract sanitizer error reports from combined stdout and stderr."""
+    findings = []
+    lines = output.split("\n")
+
+    for i, line in enumerate(lines):
+        entry = None
+
+        m = ASAN_ERROR.search(line)
+        if m:
+            entry = {"category": "asan", "message": m.group(1),
+                     "severity": "high"}
+
+        if not entry:
+            m = LEAK_ERROR.search(line)
+            if m:
+                entry = {"category": "leak",
+                         "message": "detected memory leaks",
+                         "severity": "high"}
+
+        if not entry:
+            m = UBSAN_ERROR.search(line)
+            if m:
+                entry = {"category": "ubsan", "message": m.group(1),
+                         "severity": "medium"}
+
+        if not entry:
+            continue
+
+        file_path, line_no = None, None
+        window = lines[max(0, i - 2):i + 5]
+        for ctx in window:
+            loc = LOCATION.search(ctx)
+            if loc and "/usr/" not in loc.group(1):
+                file_path = loc.group(1)
+                line_no = int(loc.group(2))
+                break
+
+        entry["file"] = file_path
+        entry["line"] = line_no
+        entry["raw"] = line.strip()
+        findings.append(entry)
+
+    return findings
+
+
+def deduplicate(findings: list) -> list:
+    """Remove duplicate reports at the same category, file, and line."""
+    seen = set()
+    result = []
+    for f in findings:
+        key = (f["category"], f["file"], f["line"], f["message"])
+        if key in seen:
+            continue
+        seen.add(key)
+        result.append(f)
+    return result
+
+
+def main():
+    parser = argparse.ArgumentParser(prog="memory-safety")
+    parser.add_argument("--sanitizer", choices=["asan", "ubsan", "both"],
+                        default="asan",
+                        help="sanitizer to enable (default: asan)")
+    parser.add_argument("--build-dir", default=None,
+                        help="path to build directory")
+    parser.add_argument("--timeout", type=int, default=600,
+                        help="seconds before killing test run")
+    parser.add_argument("--skip-build", action="store_true",
+                        help="reuse existing instrumented build")
+    parser.add_argument("--db", default=None,
+                        help="path to z3agent.db")
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+    repo_root = find_repo_root()
+
+    sanitizers = ["asan", "ubsan"] if args.sanitizer == "both" else [args.sanitizer]
+    all_findings = []
+
+    db = Z3DB(args.db)
+
+    for san in sanitizers:
+        if args.build_dir:
+            build_dir = Path(args.build_dir)
+        else:
+            build_dir = repo_root / "build" / f"sanitizer-{san}"
+
+        run_id = db.start_run("memory-safety", f"sanitizer={san}")
+        db.log(f"sanitizer: {san}, build: {build_dir}", run_id=run_id)
+
+        if not args.skip_build:
+            needs_configure = not build_is_configured(build_dir, san)
+            if needs_configure and not configure(build_dir, san, repo_root):
+                db.finish_run(run_id, "error", 0, exit_code=1)
+                print(f"FAIL: cmake configuration failed for {san}")
+                continue
+            if not compile_tests(build_dir):
+                db.finish_run(run_id, "error", 0, exit_code=1)
+                print(f"FAIL: compilation failed for {san}")
+                continue
+
+        result = run_tests(build_dir, args.timeout)
+        combined = result["stdout"] + "\n" + result["stderr"]
+        findings = deduplicate(parse_findings(combined))
+
+        for f in findings:
+            db.log_finding(
+                run_id,
+                category=f["category"],
+                message=f["message"],
+                severity=f["severity"],
+                file=f["file"],
+                line=f["line"],
+                details={"raw": f["raw"]},
+            )
+
+        status = "clean" if not findings else "findings"
+        if result["exit_code"] == -1:
+            status = "timeout" if "timeout" in result["stderr"] else "error"
+
+        db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])
+        all_findings.extend(findings)
+        print(f"{san}: {len(findings)} finding(s), {result['duration_ms']}ms")
+
+    if all_findings:
+        print(f"\nTotal: {len(all_findings)} finding(s)")
+        for f in all_findings:
+            loc = f"{f['file']}:{f['line']}" if f["file"] else "unknown location"
+            print(f"  [{f['severity']}] {f['category']}: {f['message']} at {loc}")
+        db.close()
+        sys.exit(1)
+    else:
+        print("\nNo sanitizer findings.")
+        db.close()
+        sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/optimize/SKILL.md
+++ b/.github/skills/optimize/SKILL.md
@ -0,0 +1,48 @@
+---
+name: optimize
+description: Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains.
+---
+
+Given a set of constraints and an objective function, find the optimal value. Z3 supports both hard constraints (must hold) and soft constraints (weighted preferences), as well as lexicographic multi-objective optimization.
+
+# Step 1: Formulate the problem
+
+The formula uses the `(minimize ...)` or `(maximize ...)` directives followed by `(check-sat)` and `(get-model)`.
+
+Example: minimize `x + y` subject to `x >= 1`, `y >= 2`, `x + y <= 10`:
+```smtlib
+(declare-const x Int)
+(declare-const y Int)
+(assert (>= x 1))
+(assert (>= y 2))
+(assert (<= (+ x y) 10))
+(minimize (+ x y))
+(check-sat)
+(get-model)
+```
+
+# Step 2: Run the optimizer
+
+```bash
+python3 scripts/optimize.py --file scheduling.smt2
+python3 scripts/optimize.py --formula "<inline smt-lib2>" --debug
+```
+
+# Step 3: Interpret the output
+
+- `sat` with a model: the optimal assignment satisfying all constraints.
+- `unsat`: the constraints are contradictory; no feasible solution exists.
+- `unknown` or `timeout`: Z3 could not determine optimality.
+
+The script prints the objective value and the satisfying assignment.
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| formula | string | no | | SMT-LIB2 formula with minimize/maximize |
+| file | path | no | | path to .smt2 file |
+| timeout | int | no | 60 | seconds |
+| z3 | path | no | auto | path to z3 binary |
+| debug | flag | no | off | verbose tracing |
+| db | path | no | .z3-agent/z3agent.db | logging database |
--- a/.github/skills/optimize/scripts/optimize.py
+++ b/.github/skills/optimize/scripts/optimize.py
@ -0,0 +1,60 @@
+#!/usr/bin/env python3
+"""
+optimize.py: solve constrained optimization problems via Z3.
+
+Usage:
+    python optimize.py --file scheduling.smt2
+    python optimize.py --formula "(declare-const x Int)..." --debug
+"""
+
+import argparse
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, run_z3, parse_model, setup_logging
+
+
+def main():
+    parser = argparse.ArgumentParser(prog="optimize")
+    parser.add_argument("--formula")
+    parser.add_argument("--file")
+    parser.add_argument("--timeout", type=int, default=60)
+    parser.add_argument("--z3", default=None)
+    parser.add_argument("--db", default=None)
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    if args.file:
+        formula = Path(args.file).read_text()
+    elif args.formula:
+        formula = args.formula
+    else:
+        parser.error("provide --formula or --file")
+        return
+
+    db = Z3DB(args.db)
+    run_id = db.start_run("optimize", formula)
+
+    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
+
+    model = parse_model(result["stdout"]) if result["result"] == "sat" else None
+
+    db.log_formula(run_id, formula, result["result"],
+                   str(model) if model else None)
+    db.finish_run(run_id, result["result"], result["duration_ms"],
+                  result["exit_code"])
+
+    print(result["result"])
+    if model:
+        for name, val in model.items():
+            print(f"  {name} = {val}")
+
+    db.close()
+    sys.exit(0 if result["exit_code"] == 0 else 1)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/prove/SKILL.md
+++ b/.github/skills/prove/SKILL.md
@ -0,0 +1,54 @@
+---
+name: prove
+description: Prove validity of logical statements by negation and satisfiability checking. If the negation is unsatisfiable, the original statement is valid. Otherwise a counterexample is returned.
+---
+
+Given a conjecture (an SMT-LIB2 assertion or a natural language claim), determine whether it holds universally. The method is standard: negate the conjecture and check satisfiability. If the negation is unsatisfiable, the original is valid. If satisfiable, the model is a counterexample.
+
+# Step 1: Prepare the negated formula
+
+Wrap the conjecture in `(assert (not ...))` and append `(check-sat)(get-model)`.
+
+Example: to prove that `(> x 3)` implies `(> x 1)`:
+```smtlib
+(declare-const x Int)
+(assert (not (=> (> x 3) (> x 1))))
+(check-sat)
+(get-model)
+```
+
+# Step 2: Run the prover
+
+```bash
+python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
+```
+
+For file input where the file contains the full negated formula:
+```bash
+python3 scripts/prove.py --file negated.smt2
+```
+
+With debug tracing:
+```bash
+python3 scripts/prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int" --debug
+```
+
+# Step 3: Interpret the output
+
+- `valid`: the negation was unsat, so the conjecture holds for all inputs.
+- `invalid` followed by a counterexample: the negation was sat; the model shows a concrete assignment where the conjecture fails.
+- `unknown` or `timeout`: Z3 could not decide. The conjecture may require auxiliary lemmas or induction.
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| conjecture | string | no | | the assertion to prove (without negation) |
+| vars | string | no | | variable declarations as "name:sort" pairs, comma-separated |
+| file | path | no | | .smt2 file with the negated formula |
+| timeout | int | no | 30 | seconds |
+| z3 | path | no | auto | path to z3 binary |
+| debug | flag | no | off | verbose tracing |
+| db | path | no | .z3-agent/z3agent.db | logging database |
+
+Either `conjecture` (with `vars`) or `file` must be provided.
--- a/.github/skills/prove/scripts/prove.py
+++ b/.github/skills/prove/scripts/prove.py
@ -0,0 +1,82 @@
+#!/usr/bin/env python3
+"""
+prove.py: prove validity by negation + satisfiability check.
+
+Usage:
+    python prove.py --conjecture "(=> (> x 3) (> x 1))" --vars "x:Int"
+    python prove.py --file negated.smt2
+"""
+
+import argparse
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, run_z3, parse_model, setup_logging
+
+
+def build_formula(conjecture: str, vars_str: str) -> str:
+    lines = []
+    if vars_str:
+        for v in vars_str.split(","):
+            v = v.strip()
+            name, sort = v.split(":")
+            lines.append(f"(declare-const {name.strip()} {sort.strip()})")
+    lines.append(f"(assert (not {conjecture}))")
+    lines.append("(check-sat)")
+    lines.append("(get-model)")
+    return "\n".join(lines)
+
+
+def main():
+    parser = argparse.ArgumentParser(prog="prove")
+    parser.add_argument("--conjecture", help="assertion to prove")
+    parser.add_argument("--vars", help="variable declarations, e.g. 'x:Int,y:Bool'")
+    parser.add_argument("--file", help="path to .smt2 file with negated formula")
+    parser.add_argument("--timeout", type=int, default=30)
+    parser.add_argument("--z3", default=None)
+    parser.add_argument("--db", default=None)
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    if args.file:
+        formula = Path(args.file).read_text()
+    elif args.conjecture:
+        formula = build_formula(args.conjecture, args.vars or "")
+    else:
+        parser.error("provide --conjecture or --file")
+        return
+
+    db = Z3DB(args.db)
+    run_id = db.start_run("prove", formula)
+
+    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
+
+    if result["result"] == "unsat":
+        verdict = "valid"
+    elif result["result"] == "sat":
+        verdict = "invalid"
+    else:
+        verdict = result["result"]
+
+    model = parse_model(result["stdout"]) if verdict == "invalid" else None
+
+    db.log_formula(run_id, formula, verdict, str(model) if model else None)
+    db.finish_run(run_id, verdict, result["duration_ms"], result["exit_code"])
+
+    print(verdict)
+    if model:
+        print("counterexample:")
+        for name, val in model.items():
+            print(f"  {name} = {val}")
+
+    db.close()
+    # Exit 0 when we successfully determined validity or invalidity;
+    # exit 1 only for errors/timeouts.
+    sys.exit(0 if verdict in ("valid", "invalid") else 1)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/shared/schema.sql
+++ b/.github/skills/shared/schema.sql
@ -0,0 +1,57 @@
+-- z3agent schema v1
+
+PRAGMA journal_mode=WAL;
+PRAGMA foreign_keys=ON;
+
+CREATE TABLE IF NOT EXISTS runs (
+    run_id      INTEGER PRIMARY KEY AUTOINCREMENT,
+    skill       TEXT NOT NULL,
+    input_hash  TEXT,
+    status      TEXT NOT NULL DEFAULT 'running',
+    duration_ms INTEGER,
+    exit_code   INTEGER,
+    timestamp   TEXT NOT NULL DEFAULT (datetime('now'))
+);
+
+CREATE INDEX IF NOT EXISTS idx_runs_skill ON runs(skill);
+CREATE INDEX IF NOT EXISTS idx_runs_status ON runs(status);
+
+CREATE TABLE IF NOT EXISTS formulas (
+    formula_id  INTEGER PRIMARY KEY AUTOINCREMENT,
+    run_id      INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
+    smtlib2     TEXT NOT NULL,
+    result      TEXT,
+    model       TEXT,
+    stats       TEXT,
+    timestamp   TEXT NOT NULL DEFAULT (datetime('now'))
+);
+
+CREATE INDEX IF NOT EXISTS idx_formulas_run ON formulas(run_id);
+CREATE INDEX IF NOT EXISTS idx_formulas_result ON formulas(result);
+
+CREATE TABLE IF NOT EXISTS findings (
+    finding_id  INTEGER PRIMARY KEY AUTOINCREMENT,
+    run_id      INTEGER REFERENCES runs(run_id) ON DELETE CASCADE,
+    category    TEXT NOT NULL,
+    severity    TEXT,
+    file        TEXT,
+    line        INTEGER,
+    message     TEXT NOT NULL,
+    details     TEXT,
+    timestamp   TEXT NOT NULL DEFAULT (datetime('now'))
+);
+
+CREATE INDEX IF NOT EXISTS idx_findings_run ON findings(run_id);
+CREATE INDEX IF NOT EXISTS idx_findings_category ON findings(category);
+CREATE INDEX IF NOT EXISTS idx_findings_severity ON findings(severity);
+
+CREATE TABLE IF NOT EXISTS interaction_log (
+    log_id      INTEGER PRIMARY KEY AUTOINCREMENT,
+    run_id      INTEGER REFERENCES runs(run_id) ON DELETE SET NULL,
+    level       TEXT NOT NULL DEFAULT 'info',
+    message     TEXT NOT NULL,
+    timestamp   TEXT NOT NULL DEFAULT (datetime('now'))
+);
+
+CREATE INDEX IF NOT EXISTS idx_log_run ON interaction_log(run_id);
+CREATE INDEX IF NOT EXISTS idx_log_level ON interaction_log(level);
--- a/.github/skills/shared/z3db.py
+++ b/.github/skills/shared/z3db.py
@ -0,0 +1,328 @@
+#!/usr/bin/env python3
+"""
+z3db: shared library and CLI for Z3 skill scripts.
+
+Library usage:
+    from z3db import Z3DB, find_z3, run_z3
+
+CLI usage:
+    python z3db.py init
+    python z3db.py status
+    python z3db.py log [--run-id N]
+    python z3db.py runs [--skill solve] [--last N]
+    python z3db.py query "SELECT ..."
+"""
+
+import argparse
+import hashlib
+import json
+import logging
+import os
+import re
+import shutil
+import sqlite3
+import subprocess
+import sys
+import time
+from pathlib import Path
+from typing import Optional
+
+
+SCHEMA_PATH = Path(__file__).parent / "schema.sql"
+DEFAULT_DB_DIR = ".z3-agent"
+DEFAULT_DB_NAME = "z3agent.db"
+
+logger = logging.getLogger("z3agent")
+
+
+def setup_logging(debug: bool = False):
+    level = logging.DEBUG if debug else logging.INFO
+    fmt = "[%(levelname)s] %(message)s" if not debug else \
+          "[%(levelname)s %(asctime)s] %(message)s"
+    logging.basicConfig(level=level, format=fmt, stream=sys.stderr)
+
+
+class Z3DB:
+    """SQLite handle for z3agent.db, tracks runs, formulas, findings, logs."""
+
+    def __init__(self, db_path: Optional[str] = None):
+        if db_path is None:
+            db_dir = Path(DEFAULT_DB_DIR)
+            db_dir.mkdir(exist_ok=True)
+            db_path = str(db_dir / DEFAULT_DB_NAME)
+        self.db_path = db_path
+        self.conn = sqlite3.connect(db_path)
+        self.conn.execute("PRAGMA foreign_keys=ON")
+        self.conn.row_factory = sqlite3.Row
+        self._init_schema()
+
+    def _init_schema(self):
+        self.conn.executescript(SCHEMA_PATH.read_text())
+
+    def close(self):
+        self.conn.close()
+
+    def start_run(self, skill: str, input_text: str = "") -> int:
+        input_hash = hashlib.sha256(input_text.encode()).hexdigest()[:16]
+        cur = self.conn.execute(
+            "INSERT INTO runs (skill, input_hash) VALUES (?, ?)",
+            (skill, input_hash),
+        )
+        self.conn.commit()
+        run_id = cur.lastrowid
+        logger.debug("started run %d (skill=%s, hash=%s)", run_id, skill, input_hash)
+        return run_id
+
+    def finish_run(self, run_id: int, status: str, duration_ms: int,
+                   exit_code: int = 0):
+        self.conn.execute(
+            "UPDATE runs SET status=?, duration_ms=?, exit_code=? WHERE run_id=?",
+            (status, duration_ms, exit_code, run_id),
+        )
+        self.conn.commit()
+        logger.debug("finished run %d: %s (%dms)", run_id, status, duration_ms)
+
+    def log_formula(self, run_id: int, smtlib2: str, result: str = None,
+                    model: str = None, stats: dict = None) -> int:
+        cur = self.conn.execute(
+            "INSERT INTO formulas (run_id, smtlib2, result, model, stats) "
+            "VALUES (?, ?, ?, ?, ?)",
+            (run_id, smtlib2, result, model,
+             json.dumps(stats) if stats else None),
+        )
+        self.conn.commit()
+        return cur.lastrowid
+
+    def log_finding(self, run_id: int, category: str, message: str,
+                    severity: str = None, file: str = None,
+                    line: int = None, details: dict = None) -> int:
+        cur = self.conn.execute(
+            "INSERT INTO findings (run_id, category, severity, file, line, "
+            "message, details) VALUES (?, ?, ?, ?, ?, ?, ?)",
+            (run_id, category, severity, file, line, message,
+             json.dumps(details) if details else None),
+        )
+        self.conn.commit()
+        return cur.lastrowid
+
+    def log(self, message: str, level: str = "info", run_id: int = None):
+        """Write to stderr and to the interaction_log table."""
+        getattr(logger, level, logger.info)(message)
+        self.conn.execute(
+            "INSERT INTO interaction_log (run_id, level, message) "
+            "VALUES (?, ?, ?)",
+            (run_id, level, message),
+        )
+        self.conn.commit()
+
+    def get_runs(self, skill: str = None, last: int = 10):
+        sql = "SELECT * FROM runs"
+        params = []
+        if skill:
+            sql += " WHERE skill = ?"
+            params.append(skill)
+        sql += " ORDER BY run_id DESC LIMIT ?"
+        params.append(last)
+        return self.conn.execute(sql, params).fetchall()
+
+    def get_status(self) -> dict:
+        rows = self.conn.execute(
+            "SELECT status, COUNT(*) as cnt FROM runs GROUP BY status"
+        ).fetchall()
+        total = sum(r["cnt"] for r in rows)
+        by_status = {r["status"]: r["cnt"] for r in rows}
+        last = self.conn.execute(
+            "SELECT timestamp FROM runs ORDER BY run_id DESC LIMIT 1"
+        ).fetchone()
+        return {
+            "total": total,
+            **by_status,
+            "last_run": last["timestamp"] if last else None,
+        }
+
+    def get_logs(self, run_id: int = None, last: int = 50):
+        if run_id:
+            return self.conn.execute(
+                "SELECT * FROM interaction_log WHERE run_id=? "
+                "ORDER BY log_id DESC LIMIT ?", (run_id, last)
+            ).fetchall()
+        return self.conn.execute(
+            "SELECT * FROM interaction_log ORDER BY log_id DESC LIMIT ?",
+            (last,)
+        ).fetchall()
+
+    def query(self, sql: str):
+        return self.conn.execute(sql).fetchall()
+
+
+def find_z3(hint: str = None) -> str:
+    """Locate the z3 binary: explicit path > build dirs > PATH."""
+    candidates = []
+    if hint:
+        candidates.append(hint)
+
+    repo_root = _find_repo_root()
+    if repo_root:
+        for build_dir in ["build", "build/release", "build/debug"]:
+            candidates.append(str(repo_root / build_dir / "z3"))
+
+    path_z3 = shutil.which("z3")
+    if path_z3:
+        candidates.append(path_z3)
+
+    for c in candidates:
+        p = Path(c)
+        if p.is_file() and os.access(p, os.X_OK):
+            logger.debug("found z3: %s", p)
+            return str(p)
+
+    logger.error("z3 binary not found. Searched: %s", candidates)
+    sys.exit(1)
+
+
+def _find_repo_root() -> Optional[Path]:
+    d = Path.cwd()
+    for _ in range(10):
+        if (d / "CMakeLists.txt").exists() and (d / "src").is_dir():
+            return d
+        parent = d.parent
+        if parent == d:
+            break
+        d = parent
+    return None
+
+
+def run_z3(formula: str, z3_bin: str = None, timeout: int = 30,
+           args: list = None, debug: bool = False) -> dict:
+    """Pipe an SMT-LIB2 formula into z3 -in, return parsed output."""
+    z3_path = find_z3(z3_bin)
+    cmd = [z3_path, "-in"] + (args or [])
+
+    logger.debug("cmd: %s", " ".join(cmd))
+    logger.debug("stdin:\n%s", formula)
+
+    start = time.monotonic()
+    try:
+        proc = subprocess.run(
+            cmd, input=formula, capture_output=True, text=True,
+            timeout=timeout,
+        )
+    except subprocess.TimeoutExpired:
+        duration_ms = int((time.monotonic() - start) * 1000)
+        logger.warning("z3 timed out after %dms", duration_ms)
+        return {
+            "stdout": "", "stderr": "timeout", "exit_code": -1,
+            "duration_ms": duration_ms, "result": "timeout",
+        }
+
+    duration_ms = int((time.monotonic() - start) * 1000)
+
+    logger.debug("exit_code=%d duration=%dms", proc.returncode, duration_ms)
+    logger.debug("stdout:\n%s", proc.stdout)
+    if proc.stderr:
+        logger.debug("stderr:\n%s", proc.stderr)
+
+    first_line = proc.stdout.strip().split("\n")[0].strip() if proc.stdout else ""
+    result = first_line if first_line in ("sat", "unsat", "unknown") else "error"
+
+    return {
+        "stdout": proc.stdout,
+        "stderr": proc.stderr,
+        "exit_code": proc.returncode,
+        "duration_ms": duration_ms,
+        "result": result,
+    }
+
+
+def parse_model(stdout: str) -> Optional[dict]:
+    """Pull define-fun entries from a (get-model) response."""
+    model = {}
+    for m in re.finditer(
+        r'\(define-fun\s+(\S+)\s+\(\)\s+\S+\s+(.+?)\)', stdout
+    ):
+        model[m.group(1)] = m.group(2).strip()
+    return model if model else None
+
+
+def parse_stats(stdout: str) -> Optional[dict]:
+    """Parse :key value pairs from z3 -st output."""
+    stats = {}
+    for m in re.finditer(r':(\S+)\s+([\d.]+)', stdout):
+        key, val = m.group(1), m.group(2)
+        stats[key] = float(val) if '.' in val else int(val)
+    return stats if stats else None
+
+
+def parse_unsat_core(stdout: str) -> Optional[list]:
+    for line in stdout.strip().split("\n"):
+        line = line.strip()
+        if line.startswith("(") and not line.startswith("(error"):
+            labels = line.strip("()").split()
+            if labels:
+                return labels
+    return None
+
+
+def cli():
+    parser = argparse.ArgumentParser(
+        description="Z3 Agent database CLI",
+        prog="z3db",
+    )
+    parser.add_argument("--db", default=None, help="path to z3agent.db")
+    parser.add_argument("--debug", action="store_true", help="verbose output")
+
+    sub = parser.add_subparsers(dest="command")
+
+    sub.add_parser("init", help="initialize the database")
+
+    status_p = sub.add_parser("status", help="show run summary")
+
+    log_p = sub.add_parser("log", help="show interaction log")
+    log_p.add_argument("--run-id", type=int, help="filter by run ID")
+    log_p.add_argument("--last", type=int, default=50)
+
+    runs_p = sub.add_parser("runs", help="list runs")
+    runs_p.add_argument("--skill", help="filter by skill name")
+    runs_p.add_argument("--last", type=int, default=10)
+
+    query_p = sub.add_parser("query", help="run raw SQL")
+    query_p.add_argument("sql", help="SQL query string")
+
+    args = parser.parse_args()
+    setup_logging(args.debug)
+
+    db = Z3DB(args.db)
+
+    if args.command == "init":
+        print(f"Database initialized at {db.db_path}")
+
+    elif args.command == "status":
+        s = db.get_status()
+        print(f"Runs: {s['total']}"
+              f" | success: {s.get('success', 0)}"
+              f" | error: {s.get('error', 0)}"
+              f" | timeout: {s.get('timeout', 0)}"
+              f" | Last: {s['last_run'] or 'never'}")
+
+    elif args.command == "log":
+        for row in db.get_logs(args.run_id, args.last):
+            print(f"[{row['level']}] {row['timestamp']} "
+                  f"(run {row['run_id']}): {row['message']}")
+
+    elif args.command == "runs":
+        for row in db.get_runs(args.skill, args.last):
+            print(f"#{row['run_id']} {row['skill']} {row['status']} "
+                  f"{row['duration_ms']}ms @ {row['timestamp']}")
+
+    elif args.command == "query":
+        for row in db.query(args.sql):
+            print(dict(row))
+
+    else:
+        parser.print_help()
+
+    db.close()
+
+
+if __name__ == "__main__":
+    cli()
--- a/.github/skills/simplify/SKILL.md
+++ b/.github/skills/simplify/SKILL.md
@ -0,0 +1,48 @@
+---
+name: simplify
+description: Reduce formula complexity using Z3 tactic chains. Supports configurable tactic pipelines for boolean, arithmetic, and bitvector simplification.
+---
+
+Given a formula, apply a sequence of Z3 tactics to produce an equivalent but simpler form. This is useful for understanding what Z3 sees after preprocessing, debugging tactic selection, and reducing formula size before solving.
+
+# Step 1: Choose tactics
+
+Z3 provides dozens of tactics. Common ones:
+
+| Tactic | What it does |
+|--------|-------------|
+| simplify | constant folding, algebraic identities |
+| propagate-values | substitute known equalities |
+| ctx-simplify | context-dependent simplification |
+| elim-uncnstr | remove unconstrained variables |
+| solve-eqs | Gaussian elimination |
+| bit-blast | reduce bitvectors to booleans |
+| tseitin-cnf | convert to CNF |
+| aig | and-inverter graph reduction |
+
+# Step 2: Run simplification
+
+```bash
+python3 scripts/simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
+python3 scripts/simplify.py --file formula.smt2 --tactics "simplify,propagate-values,ctx-simplify"
+python3 scripts/simplify.py --file formula.smt2 --debug
+```
+
+Without `--tactics`, the script applies the default chain: `simplify`, `propagate-values`, `ctx-simplify`.
+
+# Step 3: Interpret the output
+
+The script prints the simplified formula in SMT-LIB2 syntax. Subgoals are printed as separate `(assert ...)` blocks.
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| formula | string | no | | SMT-LIB2 formula to simplify |
+| vars | string | no | | variable declarations as "name:sort" pairs |
+| file | path | no | | path to .smt2 file |
+| tactics | string | no | simplify,propagate-values,ctx-simplify | comma-separated tactic names |
+| timeout | int | no | 30 | seconds |
+| z3 | path | no | auto | path to z3 binary |
+| debug | flag | no | off | verbose tracing |
+| db | path | no | .z3-agent/z3agent.db | logging database |
--- a/.github/skills/simplify/scripts/simplify.py
+++ b/.github/skills/simplify/scripts/simplify.py
@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+"""
+simplify.py: apply Z3 tactics to simplify an SMT-LIB2 formula.
+
+Usage:
+    python simplify.py --formula "(assert (and (> x 0) (> x 0)))" --vars "x:Int"
+    python simplify.py --file formula.smt2 --tactics "simplify,solve-eqs"
+"""
+
+import argparse
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, run_z3, setup_logging
+
+
+DEFAULT_TACTICS = "simplify,propagate-values,ctx-simplify"
+
+
+def build_tactic_formula(base_formula: str, tactics: str) -> str:
+    tactic_list = [t.strip() for t in tactics.split(",")]
+    if len(tactic_list) == 1:
+        tactic_expr = f"(then {tactic_list[0]} skip)"
+    else:
+        tactic_expr = "(then " + " ".join(tactic_list) + ")"
+    return base_formula + f"\n(apply {tactic_expr})\n"
+
+
+def build_formula_from_parts(formula_str: str, vars_str: str) -> str:
+    lines = []
+    if vars_str:
+        for v in vars_str.split(","):
+            v = v.strip()
+            name, sort = v.split(":")
+            lines.append(f"(declare-const {name.strip()} {sort.strip()})")
+    lines.append(formula_str)
+    return "\n".join(lines)
+
+
+def main():
+    parser = argparse.ArgumentParser(prog="simplify")
+    parser.add_argument("--formula")
+    parser.add_argument("--vars")
+    parser.add_argument("--file")
+    parser.add_argument("--tactics", default=DEFAULT_TACTICS)
+    parser.add_argument("--timeout", type=int, default=30)
+    parser.add_argument("--z3", default=None)
+    parser.add_argument("--db", default=None)
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    if args.file:
+        base = Path(args.file).read_text()
+    elif args.formula:
+        base = build_formula_from_parts(args.formula, args.vars or "")
+    else:
+        parser.error("provide --formula or --file")
+        return
+
+    formula = build_tactic_formula(base, args.tactics)
+
+    db = Z3DB(args.db)
+    run_id = db.start_run("simplify", formula)
+
+    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
+
+    status = "success" if result["exit_code"] == 0 else "error"
+    db.log_formula(run_id, formula, status)
+    db.finish_run(run_id, status, result["duration_ms"], result["exit_code"])
+
+    print(result["stdout"])
+    if result["stderr"] and result["exit_code"] != 0:
+        print(result["stderr"], file=sys.stderr)
+
+    db.close()
+    sys.exit(0 if result["exit_code"] == 0 else 1)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/solve/SKILL.md
+++ b/.github/skills/solve/SKILL.md
@ -0,0 +1,50 @@
+---
+name: solve
+description: Check satisfiability of SMT-LIB2 formulas using Z3. Returns sat/unsat with models or unsat cores. Logs every invocation to z3agent.db for auditability.
+---
+
+Given an SMT-LIB2 formula (or a set of constraints described in natural language), determine whether the formula is satisfiable. If sat, extract a satisfying assignment. If unsat and tracking labels are present, extract the unsat core.
+
+# Step 1: Prepare the formula
+
+If the input is already valid SMT-LIB2, use it directly. If it is a natural language description, use the **encode** skill first to produce SMT-LIB2.
+
+The formula must include `(check-sat)` at the end. Append `(get-model)` for satisfiable queries or `(get-unsat-core)` when named assertions are used.
+
+# Step 2: Run Z3
+
+```bash
+python3 scripts/solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
+```
+
+For file input:
+```bash
+python3 scripts/solve.py --file problem.smt2
+```
+
+With debug tracing:
+```bash
+python3 scripts/solve.py --file problem.smt2 --debug
+```
+
+The script pipes the formula to `z3 -in` via subprocess (no shell expansion), logs the run to `.z3-agent/z3agent.db`, and prints the result.
+
+# Step 3: Interpret the output
+
+- `sat` followed by a model: the formula is satisfiable; the model assigns concrete values to each declared constant.
+- `unsat`: no assignment exists. If `(get-unsat-core)` was used, the conflicting named assertions are listed.
+- `unknown`: Z3 could not decide within the timeout. Consider increasing the timeout or simplifying the formula.
+- `timeout`: the process was killed after the deadline. Try the **simplify** skill to reduce complexity.
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| formula | string | no | | SMT-LIB2 formula as a string |
+| file | path | no | | path to an .smt2 file |
+| timeout | int | no | 30 | seconds before killing z3 |
+| z3 | path | no | auto | explicit path to z3 binary |
+| debug | flag | no | off | print z3 command, stdin, stdout, stderr, timing |
+| db | path | no | .z3-agent/z3agent.db | path to the logging database |
+
+Either `formula` or `file` must be provided.
--- a/.github/skills/solve/scripts/solve.py
+++ b/.github/skills/solve/scripts/solve.py
@ -0,0 +1,66 @@
+#!/usr/bin/env python3
+"""
+solve.py: check satisfiability of an SMT-LIB2 formula via Z3.
+
+Usage:
+    python solve.py --formula "(declare-const x Int)(assert (> x 0))(check-sat)(get-model)"
+    python solve.py --file problem.smt2
+    python solve.py --file problem.smt2 --debug --timeout 60
+"""
+
+import argparse
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, run_z3, parse_model, parse_unsat_core, setup_logging
+
+
+def main():
+    parser = argparse.ArgumentParser(prog="solve")
+    parser.add_argument("--formula", help="SMT-LIB2 formula string")
+    parser.add_argument("--file", help="path to .smt2 file")
+    parser.add_argument("--timeout", type=int, default=30)
+    parser.add_argument("--z3", default=None, help="path to z3 binary")
+    parser.add_argument("--db", default=None)
+    parser.add_argument("--debug", action="store_true")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    if args.file:
+        formula = Path(args.file).read_text()
+    elif args.formula:
+        formula = args.formula
+    else:
+        parser.error("provide --formula or --file")
+        return
+
+    db = Z3DB(args.db)
+    run_id = db.start_run("solve", formula)
+
+    result = run_z3(formula, z3_bin=args.z3, timeout=args.timeout, debug=args.debug)
+
+    model = parse_model(result["stdout"]) if result["result"] == "sat" else None
+    core = parse_unsat_core(result["stdout"]) if result["result"] == "unsat" else None
+
+    db.log_formula(run_id, formula, result["result"],
+                   str(model) if model else None)
+    db.finish_run(run_id, result["result"], result["duration_ms"],
+                  result["exit_code"])
+
+    print(result["result"])
+    if model:
+        for name, val in model.items():
+            print(f"  {name} = {val}")
+    if core:
+        print("unsat core:", " ".join(core))
+    if result["stderr"] and result["result"] == "error":
+        print(result["stderr"], file=sys.stderr)
+
+    db.close()
+    sys.exit(0 if result["exit_code"] == 0 else 1)
+
+
+if __name__ == "__main__":
+    main()
--- a/.github/skills/static-analysis/SKILL.md
+++ b/.github/skills/static-analysis/SKILL.md
@ -0,0 +1,46 @@
+---
+name: static-analysis
+description: Run Clang Static Analyzer (scan-build) on Z3 source and log structured findings to z3agent.db.
+---
+
+Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plist diagnostics, and record each finding with file, line, category, and description. This skill wraps scan-build into a reproducible, logged workflow suitable for regular analysis sweeps and regression tracking.
+
+# Step 1: Run the analysis
+
+```bash
+python3 scripts/static_analysis.py --build-dir build
+python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug
+python3 scripts/static_analysis.py --build-dir build --timeout 1800
+```
+
+The script invokes `scan-build cmake ..` followed by `scan-build make` inside the specified build directory. Clang checker output is written to `--output-dir` (defaults to a `scan-results` subdirectory of the build directory).
+
+# Step 2: Interpret the output
+
+Each finding is printed with its source location, category, and description:
+
+```
+[Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read
+[Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer
+```
+
+A summary table groups findings by category so that high-frequency classes are visible at a glance.
+
+# Step 3: Review historical findings
+
+All findings are logged to `z3agent.db`. Query them to track trends:
+
+```bash
+python3 ../../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC"
+python3 ../../shared/z3db.py runs --skill static-analysis --last 10
+```
+
+# Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| build-dir | path | yes | | path to the CMake build directory |
+| output-dir | path | no | BUILD/scan-results | directory for scan-build output |
+| timeout | int | no | 1200 | seconds allowed for the full build |
+| db | path | no | .z3-agent/z3agent.db | logging database |
+| debug | flag | no | off | verbose tracing |
--- a/.github/skills/static-analysis/scripts/static_analysis.py
+++ b/.github/skills/static-analysis/scripts/static_analysis.py
@ -0,0 +1,255 @@
+#!/usr/bin/env python3
+"""
+static_analysis.py: run Clang Static Analyzer on Z3 source.
+
+Usage:
+    python static_analysis.py --build-dir build
+    python static_analysis.py --build-dir build --output-dir /tmp/sa-results
+    python static_analysis.py --build-dir build --debug
+"""
+
+import argparse
+import logging
+import os
+import plistlib
+import shutil
+import subprocess
+import sys
+import time
+from collections import Counter
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent / "shared"))
+from z3db import Z3DB, setup_logging
+
+logger = logging.getLogger("z3agent")
+
+SCAN_BUILD_NAMES = ["scan-build", "scan-build-14", "scan-build-15", "scan-build-16"]
+
+
+def find_scan_build() -> str:
+    """Locate the scan-build binary on PATH."""
+    for name in SCAN_BUILD_NAMES:
+        path = shutil.which(name)
+        if path:
+            logger.debug("found scan-build: %s", path)
+            return path
+    logger.error(
+        "scan-build not found. Install clang-tools or set PATH. "
+        "Searched: %s", ", ".join(SCAN_BUILD_NAMES)
+    )
+    sys.exit(1)
+
+
+def run_configure(scan_build: str, build_dir: Path, output_dir: Path,
+                  timeout: int) -> bool:
+    """Run scan-build cmake to configure the project."""
+    repo_root = build_dir.parent
+    cmd = [
+        scan_build,
+        "-o", str(output_dir),
+        "cmake",
+        str(repo_root),
+    ]
+    logger.info("configuring: %s", " ".join(cmd))
+    try:
+        proc = subprocess.run(
+            cmd, cwd=str(build_dir),
+            capture_output=True, text=True, timeout=timeout,
+        )
+    except subprocess.TimeoutExpired:
+        logger.error("cmake configuration timed out after %ds", timeout)
+        return False
+
+    if proc.returncode != 0:
+        logger.error("cmake configuration failed (exit %d)", proc.returncode)
+        logger.error("stderr: %s", proc.stderr[:2000])
+        return False
+
+    logger.info("configuration complete")
+    return True
+
+
+def run_build(scan_build: str, build_dir: Path, output_dir: Path,
+              timeout: int) -> bool:
+    """Run scan-build make to build and analyze."""
+    nproc = os.cpu_count() or 4
+    cmd = [
+        scan_build,
+        "-o", str(output_dir),
+        "--status-bugs",
+        "make",
+        f"-j{nproc}",
+    ]
+    logger.info("building with analysis: %s", " ".join(cmd))
+    try:
+        proc = subprocess.run(
+            cmd, cwd=str(build_dir),
+            capture_output=True, text=True, timeout=timeout,
+        )
+    except subprocess.TimeoutExpired:
+        logger.error("build timed out after %ds", timeout)
+        return False
+
+    # scan-build returns nonzero when bugs are found (due to --status-bugs),
+    # so a nonzero exit code is not necessarily a build failure.
+    if proc.returncode != 0:
+        logger.info(
+            "scan-build exited with code %d (nonzero may indicate findings)",
+            proc.returncode,
+        )
+    else:
+        logger.info("build complete, no bugs reported by scan-build")
+
+    if proc.stderr:
+        logger.debug("build stderr (last 2000 chars): %s", proc.stderr[-2000:])
+    return True
+
+
+def collect_plist_files(output_dir: Path) -> list:
+    """Recursively find all .plist diagnostic files under the output directory."""
+    plists = sorted(output_dir.rglob("*.plist"))
+    logger.debug("found %d plist files in %s", len(plists), output_dir)
+    return plists
+
+
+def parse_plist_findings(plist_path: Path) -> list:
+    """Extract findings from a single Clang plist diagnostic file.
+
+    Returns a list of dicts with keys: file, line, col, category, type, description.
+    """
+    findings = []
+    try:
+        with open(plist_path, "rb") as f:
+            data = plistlib.load(f)
+    except Exception as exc:
+        logger.warning("could not parse %s: %s", plist_path, exc)
+        return findings
+
+    source_files = data.get("files", [])
+    for diag in data.get("diagnostics", []):
+        location = diag.get("location", {})
+        file_idx = location.get("file", 0)
+        source_file = source_files[file_idx] if file_idx < len(source_files) else "<unknown>"
+        findings.append({
+            "file": source_file,
+            "line": location.get("line", 0),
+            "col": location.get("col", 0),
+            "category": diag.get("category", "uncategorized"),
+            "type": diag.get("type", ""),
+            "description": diag.get("description", ""),
+        })
+    return findings
+
+
+def collect_all_findings(output_dir: Path) -> list:
+    """Parse every plist file under output_dir and return merged findings."""
+    all_findings = []
+    for plist_path in collect_plist_files(output_dir):
+        all_findings.extend(parse_plist_findings(plist_path))
+    return all_findings
+
+
+def log_findings(db, run_id: int, findings: list):
+    """Persist each finding to z3agent.db."""
+    for f in findings:
+        db.log_finding(
+            run_id,
+            category=f["category"],
+            message=f["description"],
+            severity=f.get("type"),
+            file=f["file"],
+            line=f["line"],
+            details={"col": f["col"], "type": f["type"]},
+        )
+
+
+def print_findings(findings: list):
+    """Print individual findings and a category summary."""
+    if not findings:
+        print("No findings reported.")
+        return
+
+    for f in findings:
+        label = f["category"]
+        if f["type"]:
+            label = f["type"]
+        print(f"[{label}] {f['file']}:{f['line']}: {f['description']}")
+
+    print()
+    counts = Counter(f["category"] for f in findings)
+    print(f"Total findings: {len(findings)}")
+    print("By category:")
+    for cat, cnt in counts.most_common():
+        print(f"  {cat}: {cnt}")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        prog="static_analysis",
+        description="Run Clang Static Analyzer on Z3 and log findings.",
+    )
+    parser.add_argument(
+        "--build-dir", required=True,
+        help="path to the CMake build directory",
+    )
+    parser.add_argument(
+        "--output-dir", default=None,
+        help="directory for scan-build results (default: BUILD/scan-results)",
+    )
+    parser.add_argument(
+        "--timeout", type=int, default=1200,
+        help="seconds allowed for the full analysis build",
+    )
+    parser.add_argument("--db", default=None, help="path to z3agent.db")
+    parser.add_argument("--debug", action="store_true", help="verbose tracing")
+    args = parser.parse_args()
+
+    setup_logging(args.debug)
+
+    scan_build = find_scan_build()
+
+    build_dir = Path(args.build_dir).resolve()
+    build_dir.mkdir(parents=True, exist_ok=True)
+
+    output_dir = Path(args.output_dir) if args.output_dir else build_dir / "scan-results"
+    output_dir = output_dir.resolve()
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    db = Z3DB(args.db)
+    run_id = db.start_run("static-analysis", f"build_dir={build_dir}")
+    start = time.monotonic()
+
+    if not run_configure(scan_build, build_dir, output_dir, timeout=args.timeout):
+        elapsed = int((time.monotonic() - start) * 1000)
+        db.finish_run(run_id, "error", elapsed, exit_code=1)
+        db.close()
+        sys.exit(1)
+
+    if not run_build(scan_build, build_dir, output_dir, timeout=args.timeout):
+        elapsed = int((time.monotonic() - start) * 1000)
+        db.finish_run(run_id, "error", elapsed, exit_code=1)
+        db.close()
+        sys.exit(1)
+
+    elapsed = int((time.monotonic() - start) * 1000)
+
+    findings = collect_all_findings(output_dir)
+    log_findings(db, run_id, findings)
+
+    status = "clean" if len(findings) == 0 else "findings"
+    db.finish_run(run_id, status, elapsed, exit_code=0)
+
+    db.log(
+        f"static analysis complete: {len(findings)} finding(s) in {elapsed}ms",
+        run_id=run_id,
+    )
+
+    print_findings(findings)
+
+    db.close()
+    sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()