Add action/expectation/result structure to all skill definitions

Each step in every SKILL.md now carries labeled Action, Expectation, and Result blocks so the agent can mechanically execute, verify, and branch at each stage. Format chosen after comparing three variants (indented blocks, inline keywords, tables) on a prove-validity simulation; indented blocks scored highest on routing completeness and checkability. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-16 07:05:35 +00:00 · 2026-03-11 19:51:59 +00:00 · 2026-03-11 19:51:59 +00:00 · 9d674404c8
commit 9d674404c8
parent d349b93d1d
10 changed files with 364 additions and 48 deletions
--- a/.github/skills/static-analysis/SKILL.md
+++ b/.github/skills/static-analysis/SKILL.md
@ -7,28 +7,63 @@ Run the Clang Static Analyzer over a CMake build of Z3, parse the resulting plis

 # Step 1: Run the analysis

+Action:
+    Invoke the script pointing at the CMake build directory. The script
+    runs `scan-build cmake ..` followed by `scan-build make` and writes
+    checker output to the output directory.
+
+Expectation:
+    scan-build completes within the timeout, producing plist diagnostic
+    files in the output directory (defaults to a `scan-results` subdirectory
+    of the build directory).
+
+Result:
+    On success: diagnostics are parsed and findings are printed. Proceed to
+    Step 2.
+    On failure: verify that clang and scan-build are installed and that the
+    build directory contains a valid CMake configuration.
+
 ```bash
 python3 scripts/static_analysis.py --build-dir build
 python3 scripts/static_analysis.py --build-dir build --output-dir /tmp/sa-results --debug
 python3 scripts/static_analysis.py --build-dir build --timeout 1800
 ```

-The script invokes `scan-build cmake ..` followed by `scan-build make` inside the specified build directory. Clang checker output is written to `--output-dir` (defaults to a `scan-results` subdirectory of the build directory).
-
 # Step 2: Interpret the output

-Each finding is printed with its source location, category, and description:
+Action:
+    Review the printed findings and the summary table grouped by category.
+
+Expectation:
+    Each finding shows its source location, category, and description.
+    The summary table ranks categories by frequency for quick triage.
+
+Result:
+    On zero findings: the codebase passes all enabled static checks.
+    On findings: prioritize by category frequency and severity. Address
+    null dereferences and use-after-free classes first.
+
+Example output:

 ```
 [Dead store] src/ast/ast.cpp:142: Value stored to 'result' is never read
 [Null dereference] src/smt/theory_lra.cpp:87: Access to field 'next' results in a dereference of a null pointer
 ```

-A summary table groups findings by category so that high-frequency classes are visible at a glance.
-
 # Step 3: Review historical findings

-All findings are logged to `z3agent.db`. Query them to track trends:
+Action:
+    Query z3agent.db to compare current results against prior analysis
+    runs.
+
+Expectation:
+    Queries return category counts and run history, enabling regression
+    detection across commits.
+
+Result:
+    On stable or decreasing counts: no regressions introduced.
+    On increased counts: cross-reference new findings with recent commits
+    to identify the responsible change.

 ```bash
 python3 ../../shared/z3db.py query "SELECT category, COUNT(*) as cnt FROM findings WHERE run_id IN (SELECT run_id FROM runs WHERE skill='static-analysis') GROUP BY category ORDER BY cnt DESC"