mirror of https://github.com/Z3Prover/z3 synced 2026-06-05 08:30:50 +00:00
z3/.github/workflows
Lev Nachmanson 400fe313d9
ci(nightly): always-on issue-oracle smoke test (~2 min, never fails the build) (#9688)
Adds a tightly-bounded issue-oracle smoke test as a sibling of the
existing `test_benchmarks.py` step in the nightly's `ubuntu-build` job.
The step always runs as part of every nightly, can never fail the build,
and completes in ~2 min.

## Why

`Z3Prover/bench` ships a per-issue regression corpus
(`inputs/issues/iss-N/`) plus a runner
(`scripts/issues_check_oracle.py`) that diffs current z3 output against
captured `<stem>.expected.out` byte streams. Wiring that into the
nightly gives us a daily smoke signal that detects regressions on
benchmarks distilled from real z3 issues — without requiring any z3
contributor to ever touch the bench repo.
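
The byte-stream comparison described above can be sketched as follows. This is a minimal illustration, not the actual `issues_check_oracle.py`: the helper names, the `bug-1.smt2` file name, and the rule that a missing oracle file counts as `skipped` are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def expected_path_for(benchmark: Path) -> Path:
    # Assumed layout: <stem>.expected.out sits next to the benchmark file.
    return benchmark.with_name(benchmark.stem + ".expected.out")


def check_against_oracle(actual: bytes, benchmark: Path) -> str:
    """Compare captured z3 output with the stored bytes, byte for byte."""
    expected = expected_path_for(benchmark)
    if not expected.exists():
        return "skipped"  # assumption: no oracle file means nothing to check
    return "ok" if actual == expected.read_bytes() else "DIFF"


with tempfile.TemporaryDirectory() as tmp:
    bench = Path(tmp) / "bug-1.smt2"  # hypothetical benchmark name
    expected_path_for(bench).write_bytes(b"sat\n")
    results = [
        check_against_oracle(b"sat\n", bench),
        check_against_oracle(b"unsat\n", bench),
        check_against_oracle(b"sat\n", Path(tmp) / "bug-2.smt2"),
    ]
print(results)
```

Comparing raw bytes rather than decoded text keeps the check sensitive to whitespace and line-ending changes, which is presumably why the expected outputs are stored as byte streams.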

## What

A two-step block added right after the existing `Clone z3test` + `Test`
steps in `ubuntu-build`:

1. **Clone bench (sparse, ~800 MB of ~12 GB total)**
   `git clone --depth 1 --filter=blob:none --sparse https://github.com/Z3Prover/bench bench`, then `sparse-checkout set scripts inputs/issues`.

2. **Run issue-oracle smoke test (~2 min)**
   ```yaml
   continue-on-error: true
   run: |
     timeout 90 python bench/scripts/issues_check_oracle.py \
       --z3 build-dist/z3 \
       --all bench/inputs/issues \
       --max 200 --timeout 5 --wallclock 60 \
       --jobs 0 --quiet \
       --json-report issue-oracle-report.json
   ```

The JSON report is then uploaded as a workflow artifact
(`issue-oracle-report`, 7-day retention) for inspection.

### Wall-clock bounds (defense in depth)

| Bound | Where | Purpose |
|---|---|---|
| `--max 200` | issues_check_oracle CLI | walk only first 200 of ~2,700 `iss-*` dirs (alphabetic; stable across nightlies) |
| `--timeout 5` | issues_check_oracle CLI | per-file z3 cap |
| `--wallclock 60` | issues_check_oracle CLI | hard global cap inside the script |
| `timeout 90` | shell wrapper | belt-and-braces backstop, leaves 30 s headroom for the script to flush its JSON report before SIGTERM |
| `continue-on-error: true` | step gate | absorbs every failure mode (missing z3, sparse-clone failure, outer timeout firing, etc.) so the smoke test can **never** red the nightly build |
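
How the three in-script bounds could compose is sketched below. It is an illustration under assumptions, not the real runner: `select_issue_dirs`, `run_bounded`, and the fake clock are hypothetical names, and the real script may enforce its caps differently. The sketch uses small numbers (4 directories, a 12 s wall clock) so the cutoff is easy to follow.

```python
import time
from typing import Callable, Iterable


def select_issue_dirs(names: Iterable[str], max_dirs: int) -> list[str]:
    # Alphabetic (lexicographic) order: "iss-10" sorts before "iss-2".
    return sorted(names)[:max_dirs]


def run_bounded(items, run_one: Callable[[str, float], str],
                per_file_timeout: float, wallclock: float,
                clock: Callable[[], float] = time.monotonic) -> dict:
    """Run each item with a per-file cap, stopping at a global deadline."""
    deadline = clock() + wallclock
    results, wallclock_hit = {}, False
    for item in items:
        remaining = deadline - clock()
        if remaining <= 0:
            wallclock_hit = True  # global cap reached: stop, still report
            break
        # Never give one file more time than the global budget has left.
        results[item] = run_one(item, min(per_file_timeout, remaining))
    return {"results": results, "wallclock_hit": wallclock_hit}


class FakeClock:
    """Deterministic clock so the example does not actually sleep."""
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


fake_clock = FakeClock()


def worst_case_run(item: str, timeout: float) -> str:
    fake_clock.now += timeout  # pretend every check uses its full budget
    return "timeout"


selected = select_issue_dirs(["iss-2", "iss-10", "iss-1", "iss-3", "iss-20"], 4)
report = run_bounded(selected, worst_case_run, per_file_timeout=5.0,
                     wallclock=12.0, clock=fake_clock)
print(selected, report)
```

In this worst case the first three directories consume the whole 12 s budget, the fourth is never started, and the report still records `wallclock_hit` so the run exits cleanly instead of being killed.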

### Scope

Only `ubuntu-build` and only one place in `nightly.yml`. The push/PR
lanes (`ci.yml`, `Windows.yml`) and the other scheduled/dispatch lanes
(`coverage.yml`, `memory-safety.yml`, `nightly-validation.yml`,
`release.yml`, `wip.yml`, `daily-test-improver`) are intentionally left
untouched so this gate runs exactly once per night.

## Local verification

On Mac (16 cores, capped to 8 jobs by `--jobs 0` resolving to `min(jobs,
cores)`):
```
[issues_check_oracle] 368 file-check(s) | timeout=5s | wallclock=60s
=== summary ===
  total: 368   ok: 286   DIFF: 4 (per-file timeouts)   skipped: 78
  elapsed: 8.3s / 60s
exit code: 0
```
GHA Ubuntu (4 cores → 4 jobs) extrapolation: ~17 s typical, well under
all wall-clock caps.
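
The extrapolation above is consistent with simple inverse scaling in the job count. The sketch below makes that arithmetic explicit; it assumes ideal scaling and equal per-core speed on both machines, which is an approximation, and the function name is hypothetical.

```python
def extrapolate_elapsed(measured_s: float, measured_jobs: int,
                        target_jobs: int) -> float:
    # Assumption: elapsed time scales inversely with the number of jobs.
    return measured_s * measured_jobs / target_jobs


# 8.3 s measured with 8 jobs on the Mac, projected onto 4 jobs on GHA Ubuntu.
estimate = extrapolate_elapsed(8.3, measured_jobs=8, target_jobs=4)
print(f"{estimate:.1f} s")
```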

### Adversarial cases (all leave the workflow green via step-level `continue-on-error: true`)

| Failure mode | Result |
|---|---|
| z3 binary missing | each per-file run records `exec-error`, script summary-exits 0 → green |
| Sparse clone fails (previous step's continue-on-error absorbs it) | oracle finds no `bench/` → script `sys.exit(1)` → step's continue-on-error absorbs → green |
| Wallclock fires | script writes report with `wallclock_hit: true`, exits 0 → green |
| Outer `timeout 90` fires | SIGTERM → bash exits 124 → step's continue-on-error absorbs → green |
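
The table can be read as a small model of how GitHub Actions treats a step's exit code. In Actions, a step's `outcome` is its result before `continue-on-error` is applied and its `conclusion` is the result afterwards. The sketch below encodes only that rule; `StepResult`, `evaluate_step`, and the `FAILURE_MODES` mapping are illustrative names, with exit codes taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    outcome: str     # result before continue-on-error is applied
    conclusion: str  # result after continue-on-error is applied


def evaluate_step(exit_code: int, continue_on_error: bool) -> StepResult:
    failed = exit_code != 0
    outcome = "failure" if failed else "success"
    conclusion = "failure" if failed and not continue_on_error else "success"
    return StepResult(outcome, conclusion)


# Exit codes as described in the table above.
FAILURE_MODES = {
    "z3 binary missing": 0,       # script records exec-error, exits 0
    "sparse clone failed": 1,     # script calls sys.exit(1)
    "wallclock fired": 0,         # report has wallclock_hit, exits 0
    "outer timeout fired": 124,   # exit status after `timeout 90` fires
}

conclusions = {
    mode: evaluate_step(code, continue_on_error=True).conclusion
    for mode, code in FAILURE_MODES.items()
}
print(conclusions)
```

Every mode concludes as `success` with the gate in place, while the same exit code 124 without `continue-on-error` would fail the step, which is the case the step-level setting is there to absorb.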

## Companion bench-repo PR

The data side of this (per-bench sidecar schema, `bug-K.json` +
`<stem>.expected.out`, oracle rewrite) lands in `Z3Prover/bench` as PR
[#2503](https://github.com/Z3Prover/bench/pull/2503). The nightly step
here depends on that PR's `scripts/issues_check_oracle.py` and the
migrated corpus. Both PRs should be merged together; bench can also
merge first (the script handles a missing corpus gracefully).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-03 17:35:43 -07:00
shared Upgrade agentic workflows to gh-aw v0.36.0 (#8122) 2026-01-08 11:50:35 -08:00
a3-python.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
a3-python.md Add noop report-as-issue: false to all agentic workflow frontmatter 2026-03-12 20:01:30 +00:00
academic-citation-tracker.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
academic-citation-tracker.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
agentics-maintenance.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
android-build.yml Bump actions/upload-artifact from 7.0.0 to 7.0.1 (#9300) 2026-04-19 16:51:02 +02:00
api-coherence-checker.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
api-coherence-checker.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
build-warning-fixer.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
build-warning-fixer.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
build-z3-cache.yml Bump actions/cache from 5.0.4 to 5.0.5 (#9299) 2026-04-19 15:57:29 +02:00
ci.yml Make manylinux Python selection dynamic in CI and release workflows (#9502) 2026-05-12 12:42:04 -04:00
code-conventions-analyzer.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
code-conventions-analyzer.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
code-simplifier.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
code-simplifier.md Add noop report-as-issue: false to code-simplifier workflow (#9397) 2026-04-26 18:28:30 +02:00
compare-stats-anomaly-reporter.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
compare-stats-anomaly-reporter.md Update compare-stats anomaly reporter to read benchmark stats from /z3/ (#9650) 2026-05-27 09:57:20 -07:00
copilot-setup-steps.yml update aw to current version 2026-01-08 18:15:03 +00:00
coverage.yml Bump actions/upload-artifact from 7.0.0 to 7.0.1 (#9300) 2026-04-19 16:51:02 +02:00
cross-build.yml Bump actions/checkout from 5.0.1 to 6.0.2 (#9018) 2026-03-16 15:52:35 -07:00
csa-analysis.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
csa-analysis.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
docs.yml Bump actions/upload-artifact from 7.0.0 to 7.0.1 (#9300) 2026-04-19 16:51:02 +02:00
issue-backlog-processor.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
issue-backlog-processor.md Fix Issue Backlog Processor: prevent context exhaustion by batching and requiring safe output (#9272) 2026-04-11 10:21:01 -07:00
mark-prs-ready-for-review.yml Bump actions/github-script from 8.0.0 to 9.0.0 (#9296) 2026-04-19 16:49:03 +02:00
memory-safety-report.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
memory-safety-report.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
memory-safety.yml Bump actions/upload-artifact from 7.0.0 to 7.0.1 (#9300) 2026-04-19 16:51:02 +02:00
msvc-static-build-clang-cl.yml Bump actions/checkout from 5.0.1 to 6.0.2 (#9018) 2026-03-16 15:52:35 -07:00
msvc-static-build.yml Bump actions/checkout from 5.0.1 to 6.0.2 (#9018) 2026-03-16 15:52:35 -07:00
nightly-validation.yml Add riscv64 wheel builds to nightly and release PyPI publishing (#9153) 2026-03-28 15:26:59 -07:00
nightly.yml ci(nightly): always-on issue-oracle smoke test (~2 min, never fails the build) (#9688) 2026-06-03 17:35:43 -07:00
nuget-build.yml Bump nuget/setup-nuget from 3 to 4 (#9350) 2026-04-21 19:26:55 +02:00
ocaml.yaml Bump actions/cache from 5.0.4 to 5.0.5 (#9299) 2026-04-19 15:57:29 +02:00
ostrich-benchmark.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
ostrich-benchmark.md fix(ostrich-benchmark): add safeoutputs keepalive noop calls before long benchmark run (#9313) 2026-04-16 03:22:33 +02:00
pyodide.yml Bump actions/checkout from 5.0.1 to 6.0.2 (#9018) 2026-03-16 15:52:35 -07:00
qf-s-benchmark.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
qf-s-benchmark.md fix(qf-s-benchmark): add safeoutputs keepalive noop after build, reduce cap 500→300 (#9290) 2026-04-12 18:26:55 -07:00
release-notes-updater.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
release-notes-updater.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
release.yml CI: validate libz3.dylib architecture on macOS to prevent #9662 regression (#9669) 2026-05-29 16:00:36 -07:00
smtlib-benchmark-finder.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
smtlib-benchmark-finder.md Add smtlib-benchmark-finder agentic workflow (#9629) 2026-05-26 15:28:11 -07:00
specbot-crash-analyzer.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
specbot-crash-analyzer.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
tactic-to-simplifier.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
tactic-to-simplifier.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
tptp-benchmark.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
tptp-benchmark.md Add weekly TPTP front-end benchmark workflow (#9523) 2026-05-13 06:05:08 -04:00
wasm-release.yml Bump mymindstorm/setup-emsdk from 15 to 16 (#9297) 2026-04-19 15:57:13 +02:00
wasm.yml Bump mymindstorm/setup-emsdk from 15 to 16 (#9297) 2026-04-19 15:57:13 +02:00
Windows.yml Bump microsoft/setup-msbuild from 2 to 3 (#9109) 2026-03-23 16:33:25 -07:00
wip.yml Bump actions/checkout from 5.0.1 to 6.0.2 (#9018) 2026-03-16 15:52:35 -07:00
workflow-suggestion-agent.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
workflow-suggestion-agent.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00
zipt-code-reviewer.lock.yml Bump github/gh-aw-actions from 0.76.1 to 0.77.0 (#9661) 2026-06-01 16:01:32 -07:00
zipt-code-reviewer.md Fix agentic workflow compilation errors (gh-aw v0.68 compat) (#9275) 2026-04-11 10:19:45 -07:00