mirror of https://github.com/Z3Prover/z3 synced 2026-03-21 20:18:50 +00:00

Angelica Moreira 9d674404c8 Add action/expectation/result structure to all skill definitions

Each step in every SKILL.md now carries labeled Action, Expectation,
and Result blocks so the agent can mechanically execute, verify, and
branch at each stage. Format chosen after comparing three variants
(indented blocks, inline keywords, tables) on a prove-validity
simulation; indented blocks scored highest on routing completeness
and checkability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-03-11 19:51:59 +00:00


name	description
benchmark	Measure Z3 performance on a formula or file. Collects wall-clock time, theory solver statistics, memory usage, and conflict counts. Results are logged to z3agent.db for longitudinal tracking.

Given an SMT-LIB2 formula or file, run Z3 with statistics enabled and report performance characteristics. This is useful for identifying performance regressions, comparing tactic strategies, and profiling theory solver workload distribution.

Step 1: Run Z3 with statistics

Action: Invoke benchmark.py with the formula or file. Use --runs N for repeated timing.

Expectation: The script invokes z3 -st, parses the statistics block, and prints a performance summary. A run entry is logged to z3agent.db.

Result: Timing and statistics are displayed. Proceed to Step 2 to interpret.

python3 scripts/benchmark.py --file problem.smt2
python3 scripts/benchmark.py --file problem.smt2 --runs 5
python3 scripts/benchmark.py --formula "(declare-const x Int)..." --debug
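
The statistics block that `z3 -st` prints is typically a parenthesized list of `:key value` pairs, so the parsing described in this step can be approximated with a regular expression. The following is a minimal sketch under that assumption, not the contents of benchmark.py; the helper names `parse_stats` and `run_z3` are hypothetical, and `run_z3` assumes a `z3` binary is reachable on PATH.

```python
import re
import subprocess
import time

# Matches one ":key value" pair with a numeric value, e.g. ":conflicts 12".
_STAT_RE = re.compile(r":([A-Za-z0-9_.\-]+)\s+([0-9][0-9.eE+\-]*)")


def parse_stats(output):
    """Collect numeric ":key value" pairs from Z3 output (hypothetical helper)."""
    stats = {}
    for key, raw in _STAT_RE.findall(output):
        try:
            # Values containing '.', 'e' or 'E' are treated as floats, others as ints.
            stats[key] = float(raw) if any(c in raw for c in ".eE") else int(raw)
        except ValueError:
            continue  # skip anything that only looked numeric
    return stats


def run_z3(path, timeout=60, z3="z3"):
    """Run `z3 -st` once on a file and return result, wall-clock ms, and stats."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [z3, "-st", path], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return {"result": "timeout", "wall_ms": elapsed_ms, "stats": {}}
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Assumption: the first sat/unsat/unknown line is the answer of interest.
    result = "unknown"
    for line in proc.stdout.splitlines():
        if line.strip() in ("sat", "unsat", "unknown"):
            result = line.strip()
            break
    return {"result": result, "wall_ms": elapsed_ms, "stats": parse_stats(proc.stdout)}
```

Measuring wall-clock time around the subprocess call, rather than reading a time value from the statistics block, keeps process start-up cost in the measurement; whether benchmark.py does the same is not stated in this document.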

Step 2: Interpret the output

Action: Review wall-clock time, memory usage, conflict counts, and per-theory breakdowns.

Expectation: The output is a complete performance profile, with min/median/max timing included when multiple runs are requested.

Result: If performance is acceptable, no action is needed. If the run is slow, try simplify to reduce the formula, or adjust the tactic strategy.

The output includes:

wall-clock time (ms)
result (sat/unsat/unknown/timeout)
memory usage (MB)
conflicts, decisions, propagations
per-theory breakdown (arithmetic, bv, array, etc.)

With --runs N, the script runs Z3 N times and reports min/median/max timing.
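
The min/median/max summary for repeated runs can be computed with the standard library alone. This is a small sketch of that aggregation; the function name `summarize_timings` is hypothetical and the real script may compute or format the summary differently.

```python
import statistics


def summarize_timings(wall_ms):
    """Return min/median/max over a list of per-run wall-clock times in ms."""
    if not wall_ms:
        raise ValueError("at least one timing is required")
    return {
        "min": min(wall_ms),
        # For an even number of runs, the median is the mean of the two middle values.
        "median": statistics.median(wall_ms),
        "max": max(wall_ms),
    }
```

The median is less sensitive than the mean to a single slow outlier, such as a cold-cache first run, which is presumably why it is reported alongside the extremes.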

Step 3: Compare over time

Action: Query past benchmark runs from z3agent.db to detect regressions or improvements.

Expectation: Historical run data is available for comparison, ordered by recency.

Result: If performance regressed, investigate recent formula or tactic changes. If improved, record the successful configuration.

python3 ../../shared/z3db.py runs --skill benchmark --last 20
python3 ../../shared/z3db.py query "SELECT smtlib2, result, stats FROM formulas WHERE run_id IN (SELECT run_id FROM runs WHERE skill='benchmark') ORDER BY run_id DESC LIMIT 5"
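
If z3agent.db is a SQLite file (an assumption; this document does not state the storage engine), the same query can be issued from Python with the `sqlite3` module. The sketch below reuses only the table and column names that appear in the query above; the function name `recent_formulas` is hypothetical.

```python
import sqlite3

# Same query as the z3db.py example, with the skill name and limit as parameters.
QUERY = """
SELECT smtlib2, result, stats
FROM formulas
WHERE run_id IN (SELECT run_id FROM runs WHERE skill = ?)
ORDER BY run_id DESC
LIMIT ?
"""


def recent_formulas(conn, skill="benchmark", limit=5):
    """Return the most recent (smtlib2, result, stats) rows logged for a skill."""
    return conn.execute(QUERY, (skill, limit)).fetchall()
```

A caller would open the database with `sqlite3.connect(".z3-agent/z3agent.db")` and pass the connection in. Binding `skill` and `limit` as parameters avoids building SQL by string concatenation.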

Parameters

Parameter	Type	Required	Default	Description
formula	string	no		SMT-LIB2 formula
file	path	no		path to .smt2 file
runs	int	no	1	number of repeated runs for timing
timeout	int	no	60	seconds per run
z3	path	no	auto	path to z3 binary
debug	flag	no	off	verbose tracing
db	path	no	.z3-agent/z3agent.db	logging database
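
For reference, the parameter table maps naturally onto a command-line parser. The following `argparse` sketch mirrors the names, types, and defaults listed above; it is an illustration of the interface, not the parser that benchmark.py actually uses, and `build_parser` is a hypothetical name. The table marks both `formula` and `file` as optional, so the sketch does not enforce that either one is supplied.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented parameter table."""
    parser = argparse.ArgumentParser(
        prog="benchmark.py", description="Measure Z3 performance on a formula or file."
    )
    parser.add_argument("--formula", help="SMT-LIB2 formula")
    parser.add_argument("--file", help="path to .smt2 file")
    parser.add_argument("--runs", type=int, default=1,
                        help="number of repeated runs for timing")
    parser.add_argument("--timeout", type=int, default=60, help="seconds per run")
    parser.add_argument("--z3", default="auto", help="path to z3 binary")
    parser.add_argument("--debug", action="store_true", help="verbose tracing")
    parser.add_argument("--db", default=".z3-agent/z3agent.db",
                        help="logging database")
    return parser
```

For example, `build_parser().parse_args(["--file", "problem.smt2", "--runs", "5"])` yields `runs == 5` with every other option left at its documented default.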
