mirrors/z3

Fork 0

mirror of https://github.com/Z3Prover/z3 synced 2026-03-15 09:39:59 +00:00

Angelica Moreira f120cc6903 add per-skill @z3 usage examples to agent readme

2026-03-12 00:16:06 +00:00

10 KiB

Raw Permalink Blame History

Z3 Agent

A Copilot agent for the Z3 theorem prover. It wraps 9 skills that cover SMT solving and code quality analysis.

What it does

The agent handles two kinds of requests:

SMT solving: formulate constraints, check satisfiability, prove properties, optimize objectives, simplify expressions.
Code quality: run sanitizers (ASan, UBSan) and Clang Static Analyzer against the Z3 codebase to catch memory bugs and logic errors.

Prerequisites

You need a built Z3 binary. The scripts look for it in this order:

Explicit --z3 path/to/z3
build/z3, build/release/z3, build/debug/z3 (relative to repo root)
z3 on your PATH

For code quality skills you also need:

memory-safety: cmake, make, and a compiler with sanitizer support (gcc or clang). The script checks at startup and tells you what is missing.
static-analysis: scan-build (part of clang-tools). Same early check with install instructions if absent.

Using the agent in Copilot Chat

Mention @z3 and describe what you want in plain language. The agent figures out which skill to use, builds the formula if needed, runs Z3, and gives you the result.

solve: check satisfiability

@z3 is (x > 0 and y > 0 and x + y < 5) satisfiable over the integers?

Expected response: sat with a model like x = 1, y = 1.

@z3 can x + y = 10 and x - y = 4 both hold at the same time?

Expected response: sat with x = 7, y = 3.

@z3 is there an integer x where x > 0, x < 0?

Expected response: unsat (no such integer exists).

prove: check if something is always true

@z3 prove that x * x >= 0 for all integers x

Expected response: valid (the negation is unsatisfiable, so the property holds).

@z3 is it true that (a + b)^2 >= 0 for all real a and b?

Expected response: valid.

@z3 prove that if x > y and y > z then x > z, for integers

Expected response: valid (transitivity of >).

optimize: find the best value

@z3 maximize 3x + 2y where x >= 1, y >= 1, and x + y <= 20

Expected response: sat with x = 19, y = 1 (objective = 59).

@z3 minimize x + y where x >= 5, y >= 3, and x + y >= 10

Expected response: sat with x = 5, y = 5 or similar (objective = 10).

simplify: reduce an expression

@z3 simplify x + 0 + 1*x

Expected response: 2*x.

@z3 simplify (a and true) or (a and false)

Expected response: a.

encode: translate a problem to SMT-LIB2

@z3 encode this as SMT-LIB2: find integers x and y where x + y = 10 and x > y

Expected response: the SMT-LIB2 formula:

(declare-const x Int)
(declare-const y Int)
(assert (= (+ x y) 10))
(assert (> x y))
(check-sat)
(get-model)

explain: interpret Z3 output

@z3 what does this Z3 output mean?
sat
(
  (define-fun x () Int 7)
  (define-fun y () Int 3)
)

Expected response: a readable summary like "satisfying assignment: x = 7, y = 3".

@z3 Z3 returned unknown, what does that mean?

Expected response: an explanation of common causes (timeout, incomplete theory, quantifiers).

benchmark: measure performance

@z3 how fast can Z3 solve (x > 0 and y > 0 and x + y < 100)? run it 5 times

Expected response: timing stats like min/median/max in milliseconds and the result.

memory-safety: find memory bugs in Z3

@z3 run AddressSanitizer on the Z3 test suite

Expected response: builds Z3 with ASan, runs the tests, reports any findings with category, file, and line number. If clean, reports no findings.

@z3 check for undefined behavior in Z3

Expected response: runs UBSan, same format.

@z3 run both sanitizers

Expected response: runs ASan and UBSan, aggregates findings from both.

static-analysis: find bugs without running the code

@z3 run static analysis on the Z3 source

Expected response: runs Clang Static Analyzer, reports findings grouped by category (null dereference, dead store, memory leak, etc.) with file and line.

multi-skill: the agent chains skills when needed

@z3 prove that for all integers, if x^2 is even then x is even

The agent uses encode to formalize and negate the statement, then prove to check it, then explain to present the result.

@z3 full verification pass before the release

The agent runs memory-safety (ASan + UBSan) and static-analysis in parallel, then aggregates and deduplicates findings sorted by severity.

Using the scripts directly

Every skill lives under .github/skills/<name>/scripts/. All scripts accept --debug for full tracing and --db path to specify where the SQLite log goes (defaults to z3agent.db in the current directory).

solve

Check whether a set of constraints has a solution.

python3 .github/skills/solve/scripts/solve.py \
  --z3 build/release/z3 \
  --formula '
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (> y 0))
(assert (< (+ x y) 5))
(check-sat)
(get-model)'

Output:

sat
  x = 1
  y = 1

prove

Check whether a property holds for all values. The script negates your conjecture and asks Z3 if the negation is satisfiable. If it is not, the property is valid.

python3 .github/skills/prove/scripts/prove.py \
  --z3 build/release/z3 \
  --conjecture '(>= (* x x) 0)' \
  --vars 'x:Int'

Output:

valid

If Z3 finds a counterexample, it prints invalid followed by the counterexample values.

optimize

Find the best value of an objective subject to constraints.

python3 .github/skills/optimize/scripts/optimize.py \
  --z3 build/release/z3 \
  --formula '
(declare-const x Int)
(declare-const y Int)
(assert (>= x 1))
(assert (>= y 1))
(assert (<= (+ x y) 20))
(maximize (+ (* 3 x) (* 2 y)))
(check-sat)
(get-model)'

Output:

sat
  x = 19
  y = 1

Here Z3 maximizes 3x + 2y under the constraint x + y <= 20, so it pushes x as high as possible (19) and keeps y at its minimum (1), giving 3*19 + 2*1 = 59.

simplify

Reduce expressions using Z3 tactic chains.

python3 .github/skills/simplify/scripts/simplify.py \
  --z3 build/release/z3 \
  --formula '(declare-const x Int)(simplify (+ x 0 (* 1 x)))'

Output:

(* 2 x)
(goals
(goal
  :precision precise :depth 1)
)

Z3 simplified x + 0 + 1*x down to 2*x.

benchmark

Measure solving time over multiple runs.

python3 .github/skills/benchmark/scripts/benchmark.py \
  --z3 build/release/z3 \
  --runs 5 \
  --formula '
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (> y 0))
(assert (< (+ x y) 100))
(check-sat)'

Output (times will vary on your machine):

runs: 5
min: 27ms
median: 28ms
max: 30ms
result: sat

explain

Interpret Z3 output in readable form. It reads from stdin:

echo 'sat
(
  (define-fun x () Int
    19)
  (define-fun y () Int
    1)
)' | python3 .github/skills/explain/scripts/explain.py --stdin --type model

Output:

satisfying assignment:
  x = 19
  y = 1

encode

Validate that an SMT-LIB2 file is well-formed by running it through Z3:

python3 .github/skills/encode/scripts/encode.py \
  --z3 build/release/z3 \
  --validate problem.smt2

If the file parses and runs without errors, it prints the formula back. If there are syntax or sort errors, it prints the Z3 error message.

memory-safety

Build Z3 with AddressSanitizer or UndefinedBehaviorSanitizer, run the test suite, and collect any findings.

python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer asan
python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer ubsan
python3 .github/skills/memory-safety/scripts/memory_safety.py --sanitizer both

Use --skip-build to reuse a previous instrumented build. Use --build-dir path to control where the build goes (defaults to build/sanitizer-asan or build/sanitizer-ubsan under the repo root).

If cmake, make, or a C compiler is not found, the script prints what you need to install and exits.

static-analysis

Run Clang Static Analyzer over the Z3 source tree.

python3 .github/skills/static-analysis/scripts/static_analysis.py \
  --build-dir build/scan

Results go to build/scan/scan-results/ by default. Findings are printed grouped by category with file and line number.

If scan-build is not on your PATH, the script prints install instructions for Ubuntu, macOS, and Fedora.

Debug tracing

Add --debug to any command to see the full trace: run IDs, z3 binary path, the exact command and stdin sent to Z3, stdout/stderr received, timing, and database logging. Example:

python3 .github/skills/solve/scripts/solve.py \
  --z3 build/release/z3 --debug \
  --formula '
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (> y 0))
(assert (< (+ x y) 5))
(check-sat)
(get-model)'

[DEBUG] started run 31 (skill=solve, hash=d64beb5a61842362)
[DEBUG] found z3: build/release/z3
[DEBUG] cmd: build/release/z3 -in
[DEBUG] stdin:
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (> y 0))
(assert (< (+ x y) 5))
(check-sat)
(get-model)
[DEBUG] exit_code=0 duration=28ms
[DEBUG] stdout:
sat
(
  (define-fun x () Int
    1)
  (define-fun y () Int
    1)
)
[DEBUG] finished run 31: sat (28ms)
sat
  x = 1
  y = 1

Logging

Every run is logged to a SQLite database (z3agent.db by default). You can query it directly:

sqlite3 z3agent.db "SELECT id, skill, status, duration_ms FROM runs ORDER BY id DESC LIMIT 10;"

Use --db /path/to/file.db on any script to put the database somewhere else.

Skill list

Skill	What it does
solve	check satisfiability, extract models or unsat cores
prove	prove validity by negating and checking unsatisfiability
optimize	minimize or maximize objectives under constraints
simplify	reduce formulas with Z3 tactic chains
encode	translate problems into SMT-LIB2, validate syntax
explain	interpret Z3 output (models, cores, stats, errors)
benchmark	measure solving time, collect statistics
memory-safety	run ASan/UBSan on Z3 test suite
static-analysis	run Clang Static Analyzer on Z3 source

10 KiB Raw Permalink Blame History

Z3 Agent

What it does

Prerequisites

Using the agent in Copilot Chat

solve: check satisfiability

prove: check if something is always true

optimize: find the best value

simplify: reduce an expression

encode: translate a problem to SMT-LIB2

explain: interpret Z3 output

benchmark: measure performance

memory-safety: find memory bugs in Z3

static-analysis: find bugs without running the code

multi-skill: the agent chains skills when needed

Using the scripts directly

solve

prove

optimize

simplify

benchmark

explain

encode

memory-safety

static-analysis

Debug tracing

Logging

Skill list

10 KiB

Raw Permalink Blame History