mirror of https://github.com/Z3Prover/z3 synced 2026-03-17 10:33:48 +00:00

Add action/expectation/result structure to all skill definitions

Each step in every SKILL.md now carries labeled Action, Expectation,
and Result blocks so the agent can mechanically execute, verify, and
branch at each stage. Format chosen after comparing three variants
(indented blocks, inline keywords, tables) on a prove-validity
simulation; indented blocks scored highest on routing completeness
and checkability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Angelica Moreira 2026-03-11 19:51:59 +00:00
parent d349b93d1d
commit 9d674404c8
10 changed files with 364 additions and 48 deletions


@@ -7,28 +7,63 @@ Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plis
# Step 1: Run the analysis
Action:
Invoke the script pointing at the CMake build directory. The script
runs `scan-build cmake ..` followed by `scan-build make` and writes
checker output to the output directory.
Expectation:
scan-build completes within the timeout, producing plist diagnostic
files in the output directory (defaults to a `scan-results` subdirectory
of the build directory).
Result:
On success: diagnostics are parsed and findings are printed. Proceed to
Step 2.
On failure: verify that clang and scan-build are installed and that the
build directory contains a valid CMake configuration.
```bash
python3 scripts/static_analysis.py --build-dir build
python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug
python3 scripts/static_analysis.py --build-dir build --timeout 1800
```
The script invokes `scan-build cmake ..` followed by `scan-build make` inside the specified build directory. Clang checker output is written to `--output-dir` (defaults to a `scan-results` subdirectory of the build directory).
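As a rough illustration of the two-stage invocation described above, the sketch below runs a list of commands in order under a time limit. It is not the code of `scripts/static_analysis.py`: the function names `analysis_commands` and `run_with_budget` are hypothetical, and sharing one time budget across both commands is an assumption about how `--timeout` might behave.

```python
import subprocess
import sys
import time


def analysis_commands():
    """The two commands named in the skill text, in execution order."""
    return [["scan-build", "cmake", ".."], ["scan-build", "make"]]


def run_with_budget(commands, cwd, timeout_s):
    """Run commands in order under one shared time budget (an assumption).

    Returns True only if every command exits with status 0 before the
    budget runs out. A missing executable counts as a failure.
    """
    deadline = time.monotonic() + timeout_s
    for cmd in commands:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            done = subprocess.run(cmd, cwd=cwd, timeout=remaining)
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return False
        if done.returncode != 0:
            return False
    return True


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs without clang installed.
    ok = run_with_budget([[sys.executable, "-c", "pass"]], ".", 30)
    print("success" if ok else "failure")
```

Returning a plain boolean maps directly onto the success and failure branches of the Result block; a real implementation would probably also report which command failed.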
# Step 2: Interpret the output
Each finding is printed with its source location, category, and description:
Action:
Review the printed findings and the summary table grouped by category.
Expectation:
Each finding shows its source location, category, and description.
The summary table ranks categories by frequency for quick triage.
Result:
On zero findings: the codebase passes all enabled static checks.
On findings: prioritize by category frequency and severity. Address
null dereferences and use-after-free classes first.
Example output:
```
[Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read
[Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer
```
A summary table groups findings by category so that high-frequency classes are visible at a glance.
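The grouping step can be sketched with the standard library alone. The example below assumes the analyzer's plist layout (a top-level `files` list and a `diagnostics` list whose entries carry `category`, `description`, and a `location` that indexes into `files`). The helper names `load_findings` and `summarize` are hypothetical and are not taken from the Z3 script, and the sample category labels simply mirror the example output above.

```python
import plistlib
from collections import Counter


def load_findings(plist_bytes):
    """Return (category, path, line, description) tuples from one plist report."""
    report = plistlib.loads(plist_bytes)
    files = report.get("files", [])
    findings = []
    for diag in report.get("diagnostics", []):
        loc = diag.get("location", {})
        idx = loc.get("file", -1)
        path = files[idx] if 0 <= idx < len(files) else "<unknown>"
        findings.append((
            diag.get("category", "Uncategorized"),
            path,
            loc.get("line", 0),
            diag.get("description", ""),
        ))
    return findings


def summarize(findings):
    """Rank categories by how many findings each one has, most frequent first."""
    return Counter(category for category, *_ in findings).most_common()


if __name__ == "__main__":
    # Synthetic report built in memory; not real analyzer output.
    sample = plistlib.dumps({
        "files": ["src/ast/ast.cpp"],
        "diagnostics": [{
            "category": "Dead store",
            "description": "Value stored to 'result' is never read",
            "location": {"file": 0, "line": 142, "col": 5},
        }],
    })
    for category, path, line, description in load_findings(sample):
        print(f"[{category}] {path}:{line}: {description}")
    print(summarize(load_findings(sample)))
```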
# Step 3: Review historical findings
All findings are logged to `z3agent.db`. Query them to track trends:
Action:
Query `z3agent.db` to compare current results against prior analysis
runs.
Expectation:
Queries return category counts and run history, enabling regression
detection across commits.
Result:
On stable or decreasing counts: no regressions introduced.
On increased counts: cross-reference new findings with recent commits
to identify the responsible change.
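One way to make the "increased counts" branch checkable is to diff per-category counts between two runs. The sketch below uses an in-memory SQLite database. The table and column names (`findings.run_id`, `findings.category`) are assumed from the query shown in this step, and the real `z3agent.db` schema may differ. The helpers `category_counts` and `regressions` are hypothetical.

```python
import sqlite3


def category_counts(conn, run_id):
    """Map each category to its finding count for one run."""
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM findings WHERE run_id = ? GROUP BY category",
        (run_id,),
    )
    return dict(rows.fetchall())


def regressions(conn, old_run, new_run):
    """Return {category: (old_count, new_count)} for categories that grew."""
    old = category_counts(conn, old_run)
    new = category_counts(conn, new_run)
    return {
        category: (old.get(category, 0), count)
        for category, count in new.items()
        if count > old.get(category, 0)
    }


if __name__ == "__main__":
    # Assumed minimal schema with synthetic rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE findings (run_id INTEGER, category TEXT)")
    conn.executemany(
        "INSERT INTO findings VALUES (?, ?)",
        [(1, "Dead store"), (2, "Dead store"), (2, "Dead store")],
    )
    print(regressions(conn, old_run=1, new_run=2))
```

An empty result corresponds to the "stable or decreasing counts" branch. Any entry names a category to cross-reference against recent commits.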
```bash
python3 ../../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC"