mirror of https://github.com/Z3Prover/z3 synced 2026-06-04 16:10:50 +00:00

ci(nightly): always-on issue-oracle smoke test (~2 min, never fails the build) (#9688)

Adds a tightly-bounded issue-oracle smoke test as a sibling of the
existing `test_benchmarks.py` step in the nightly's `ubuntu-build` job.
The step always runs as part of every nightly, can never fail the build,
and completes in ~2 min.

## Why

`Z3Prover/bench` ships a per-issue regression corpus
(`inputs/issues/iss-N/`) plus a runner
(`scripts/issues_check_oracle.py`) that diffs current z3 output against
captured `<stem>.expected.out` byte streams. Wiring that into the
nightly gives us a daily smoke signal that detects regressions on
benchmarks distilled from real z3 issues — without requiring any z3
contributor to ever touch the bench repo.
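
The byte-stream comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code of `issues_check_oracle.py`: the function name `check_one`, its parameters, and the status labels other than `exec-error` are assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path


def check_one(cmd: list[str], expected_path: Path, timeout_s: float = 5.0) -> str:
    """Run `cmd` and compare its stdout, byte for byte, with a captured file.

    Hypothetical helper: the real runner's statuses and bookkeeping may differ.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"      # per-file cap exceeded
    except OSError:
        return "exec-error"   # e.g. the binary does not exist
    # No decoding or whitespace normalization: the comparison is on raw bytes.
    return "ok" if proc.stdout == expected_path.read_bytes() else "DIFF"
```

A call such as `check_one(["build-dist/z3", "case.smt2"], Path("case.expected.out"))` would then classify a single benchmark; both file names here are placeholders.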

## What

A two-step block added right after the existing `Clone z3test` + `Test`
steps in `ubuntu-build`:

1. **Clone bench (sparse, ~800 MB of ~12 GB total)**
`git clone --depth 1 --filter=blob:none --sparse
https://github.com/Z3Prover/bench bench` then `sparse-checkout set
scripts inputs/issues`.

2. **Run issue-oracle smoke test (~2 min)**
   ```yaml
   continue-on-error: true
   run: |
     timeout 90 python bench/scripts/issues_check_oracle.py \
       --z3 build-dist/z3 \
       --all bench/inputs/issues \
       --max 200 --timeout 5 --wallclock 60 \
       --jobs 0 --quiet \
       --json-report issue-oracle-report.json
   ```

The JSON report is then uploaded as a workflow artifact
(`issue-oracle-report`, 7-day retention) for inspection.

### Wall-clock bounds (defense in depth)

| Bound | Where | Purpose |
|---|---|---|
| `--max 200` | issues_check_oracle CLI | walk only first 200 of ~2,700 `iss-*` dirs (alphabetic; stable across nightlies) |
| `--timeout 5` | issues_check_oracle CLI | per-file z3 cap |
| `--wallclock 60` | issues_check_oracle CLI | hard global cap inside the script |
| `timeout 90` | shell wrapper | belt-and-braces backstop, leaves 30 s headroom for the script to flush its JSON report before SIGTERM |
| `continue-on-error: true` | step gate | absorbs every failure mode (missing z3, sparse-clone failure, outer timeout firing, etc.) so the smoke test can **never** red the nightly build |
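
How a sorted sample and a global deadline compose can be sketched as below. This is an assumption-laden illustration rather than the script's implementation: `run_bounded`, its parameters, and the `results`/`skipped` keys are hypothetical, and only the `wallclock_hit` field name comes from the report described in this PR.

```python
import time
from typing import Callable, Iterable


def run_bounded(items: Iterable[str], check: Callable[[str], str],
                max_items: int = 200, wallclock_s: float = 60.0,
                clock: Callable[[], float] = time.monotonic) -> dict:
    """Check a stable, sorted prefix of `items` until a global deadline passes."""
    # Plain string sort: "iss-10" sorts before "iss-2", but the order is the
    # same on every run, so the sample is stable across nightlies.
    sample = sorted(items)[:max_items]
    deadline = clock() + wallclock_s
    results: dict[str, str] = {}
    wallclock_hit = False
    for item in sample:
        if clock() >= deadline:
            wallclock_hit = True   # stop starting new work, keep what we have
            break
        results[item] = check(item)  # `check` is expected to enforce the per-file cap
    return {"results": results, "wallclock_hit": wallclock_hit,
            "skipped": len(sample) - len(results)}
```

In this sketch the deadline is only tested between items, so the worst-case overrun is one per-file timeout. That is one reason an outer shell-level `timeout` remains useful as a backstop.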

### Scope

Only `ubuntu-build` and only one place in `nightly.yml`. The push/PR
lanes (`ci.yml`, `Windows.yml`) and the other scheduled/dispatch lanes
(`coverage.yml`, `memory-safety.yml`, `nightly-validation.yml`,
`release.yml`, `wip.yml`, `daily-test-improver`) are intentionally left
untouched so this gate runs exactly once per night.

## Local verification

On Mac (16 cores, capped to 8 jobs by `--jobs 0` resolving to `min(jobs,
cores)`):
```
[issues_check_oracle] 368 file-check(s) | timeout=5s | wallclock=60s
=== summary ===
  total: 368   ok: 286   DIFF: 4 (per-file timeouts)   skipped: 78
  elapsed: 8.3s / 60s
exit code: 0
```
GHA Ubuntu (4 cores → 4 jobs) extrapolation: ~17 s typical, well under
all wall-clock caps.
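
One way to arrive at a figure like ~17 s is a simple inverse-linear scaling from the local run. The sketch below assumes that elapsed time scales inversely with the number of jobs and that per-core speed is comparable between the two machines; neither assumption is verified here.

```python
def extrapolate_elapsed(measured_s: float, measured_jobs: int, target_jobs: int) -> float:
    """Scale a measured wall-clock time to a different job count (linear model)."""
    return measured_s * measured_jobs / target_jobs


# 8.3 s with 8 jobs locally, extrapolated to 4 jobs on the CI runner.
estimate = extrapolate_elapsed(8.3, measured_jobs=8, target_jobs=4)
print(f"{estimate:.1f} s")  # 16.6 s, i.e. roughly 17 s
```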

### Adversarial cases (all leave the workflow green via step-level `continue-on-error: true`)

| Failure mode | Result |
|---|---|
| z3 binary missing | each per-file run records `exec-error`, script summary-exits 0 → green |
| Sparse clone fails (previous step's continue-on-error absorbs it) | oracle finds no `bench/` → script `sys.exit(1)` → step's continue-on-error absorbs → green |
| Wallclock fires | script writes report with `wallclock_hit: true`, exits 0 → green |
| Outer `timeout 90` fires | SIGTERM → bash exits 124 → step's continue-on-error absorbs → green |
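
The table can be read as a small model of how GitHub Actions treats a step with `continue-on-error: true`: the step's outcome may be `failure`, but its conclusion is reported as `success`. The sketch below encodes that rule. The function `step_result` and the `CASES` mapping are illustrative names, and the exit codes are the ones listed in the table.

```python
def step_result(exit_code: int, continue_on_error: bool) -> tuple[str, str]:
    """Return (outcome, conclusion) for a step under a simplified Actions model."""
    outcome = "success" if exit_code == 0 else "failure"
    # With continue-on-error, a failed outcome still yields a successful conclusion.
    conclusion = "success" if continue_on_error else outcome
    return outcome, conclusion


# Exit codes taken from the failure-mode table above.
CASES = {
    "z3 binary missing": 0,
    "sparse clone failed, no bench/": 1,
    "wallclock cap fired": 0,
    "outer timeout fired": 124,
}

for name, code in CASES.items():
    outcome, conclusion = step_result(code, continue_on_error=True)
    print(f"{name}: exit {code}, outcome={outcome}, conclusion={conclusion}")
```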

## Companion bench-repo PR

The data side of this (per-bench sidecar schema, `bug-K.json` +
`<stem>.expected.out`, oracle rewrite) lands in `Z3Prover/bench` as PR
[#2503](https://github.com/Z3Prover/bench/pull/2503). The nightly step
here depends on that PR's `scripts/issues_check_oracle.py` and the
migrated corpus. Both PRs should be merged together; bench can also
merge first (the script handles a missing corpus gracefully).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Lev Nachmanson 2026-06-04 00:35:43 +00:00 committed by GitHub
parent b2401b87db
commit 400fe313d9

@@ -244,6 +244,62 @@ jobs:
      - name: Test
        run: python z3test/scripts/test_benchmarks.py build-dist/z3 z3test/regressions/smt2

      - name: Clone bench (for issue-oracle smoke test)
        continue-on-error: true
        run: |
          # Sparse clone: bench is ~12 GB total but only scripts/ and
          # inputs/issues/ (~800 MB) are needed by the oracle. The
          # smoke-test runs on a sampled subset of iss-* dirs (see
          # next step), so even this 800 MB pull is upper-bounded
          # at the clone level rather than runtime.
          git clone --depth 1 --filter=blob:none --sparse \
            https://github.com/Z3Prover/bench bench
          git -C bench sparse-checkout set scripts inputs/issues

      - name: Run issue-oracle smoke test
        # continue-on-error: true means this step can NEVER red the
        # nightly build. Its job is to produce a JSON report artifact,
        # not to gate the build. The bounds below (--max 200, --timeout
        # 5, --wallclock 20, outer timeout 90) keep the step under ~2
        # minutes regardless of corpus growth.
        continue-on-error: true
        run: |
          # SMOKE-TEST budget (sampled subset of the corpus):
          #   --max 200        first 200 of ~2,700 iss-* dirs
          #                    (sorted, so the same sample every run
          #                    and easy to diff across nightlies)
          #   --timeout 5      per-file z3 cap (matches capture-time
          #                    timeout/4, so any bench that escapes
          #                    this cap was already on the edge)
          #   --wallclock 20   hard global cap INSIDE the script
          #   --quiet          suppress per-issue progress lines
          #   timeout 90       shell-level belt-and-braces wrapper,
          #                    leaves 30 s headroom for the script
          #                    to flush its JSON report before SIGTERM
          # The script ALWAYS exits 0 on normal operation, so the only
          # ways this step can non-zero are: missing z3 binary, sparse
          # clone failure (the previous step's continue-on-error
          # absorbs that), or the outer `timeout` firing. All three
          # are absorbed by this step's continue-on-error: true.
          timeout 90 python bench/scripts/issues_check_oracle.py \
            --z3 build-dist/z3 \
            --all bench/inputs/issues \
            --max 200 \
            --timeout 5 \
            --wallclock 20 \
            --jobs 0 \
            --quiet \
            --json-report issue-oracle-report.json

      - name: Upload issue-oracle report
        if: always()
        continue-on-error: true
        uses: actions/upload-artifact@v7.0.1
        with:
          name: issue-oracle-report
          path: issue-oracle-report.json
          retention-days: 7

      - name: Upload artifact
        uses: actions/upload-artifact@v7.0.1