Adds a tightly-bounded issue-oracle smoke test as a sibling of the
existing `test_benchmarks.py` step in the nightly's `ubuntu-build` job.
The step always runs as part of every nightly, can never fail the build,
and completes in ~2 min.
## Why
`Z3Prover/bench` ships a per-issue regression corpus
(`inputs/issues/iss-N/`) plus a runner
(`scripts/issues_check_oracle.py`) that diffs current z3 output against
captured `<stem>.expected.out` byte streams. Wiring that into the
nightly gives us a daily smoke signal that detects regressions on
benchmarks distilled from real z3 issues — without requiring any z3
contributor to ever touch the bench repo.
## What
A two-step block added right after the existing `Clone z3test` + `Test`
steps in `ubuntu-build`:
1. **Clone bench (sparse, ~800 MB of ~12 GB total)**
`git clone --depth 1 --filter=blob:none --sparse
https://github.com/Z3Prover/bench bench` then `sparse-checkout set
scripts inputs/issues`.
2. **Run issue-oracle smoke test (~2 min)**
```yaml
continue-on-error: true
run: |
timeout 90 python bench/scripts/issues_check_oracle.py \
--z3 build-dist/z3 \
--all bench/inputs/issues \
--max 200 --timeout 5 --wallclock 60 \
--jobs 0 --quiet \
--json-report issue-oracle-report.json
```
The JSON report is then uploaded as a workflow artifact
(`issue-oracle-report`, 7-day retention) for inspection.
### Wall-clock bounds (defense in depth)
| Bound | Where | Purpose |
|---|---|---|
| `--max 200` | issues_check_oracle CLI | walk only first 200 of ~2,700
`iss-*` dirs (alphabetic; stable across nightlies) |
| `--timeout 5` | issues_check_oracle CLI | per-file z3 cap |
| `--wallclock 60` | issues_check_oracle CLI | hard global cap inside
the script |
| `timeout 90` | shell wrapper | belt-and-braces backstop, leaves 30 s
headroom for the script to flush its JSON report before SIGTERM |
| `continue-on-error: true` | step gate | absorbs every failure mode
(missing z3, sparse-clone failure, outer timeout firing, etc.) so the
smoke test can **never** red the nightly build |
### Scope
Only `ubuntu-build` and only one place in `nightly.yml`. The push/PR
lanes (`ci.yml`, `Windows.yml`) and the other scheduled/dispatch lanes
(`coverage.yml`, `memory-safety.yml`, `nightly-validation.yml`,
`release.yml`, `wip.yml`, `daily-test-improver`) are intentionally left
untouched so this gate runs exactly once per night.
## Local verification
On Mac (16 cores, capped to 8 jobs by `--jobs 0` resolving to `min(jobs,
cores)`):
```
[issues_check_oracle] 368 file-check(s) | timeout=5s | wallclock=60s
=== summary ===
total: 368 ok: 286 DIFF: 4 (per-file timeouts) skipped: 78
elapsed: 8.3s / 60s
exit code: 0
```
GHA Ubuntu (4 cores → 4 jobs) extrapolation: ~17 s typical, well under
all wall-clock caps.
### Adversarial cases (all leave the workflow green via step-level
`continue-on-error: true`)
| Failure mode | Result |
|---|---|
| z3 binary missing | each per-file run records `exec-error`, script
summary-exits 0 → green |
| Sparse clone fails (previous step's continue-on-error absorbs it) |
oracle finds no `bench/` → script `sys.exit(1)` → step's
continue-on-error absorbs → green |
| Wallclock fires | script writes report with `wallclock_hit: true`,
exits 0 → green |
| Outer `timeout 90` fires | SIGTERM → bash exits 124 → step's
continue-on-error absorbs → green |
## Companion bench-repo PR
The data side of this (per-bench sidecar schema, `bug-K.json` +
`<stem>.expected.out`, oracle rewrite) lands in `Z3Prover/bench` as PR
[#2503](https://github.com/Z3Prover/bench/pull/2503). The nightly step
here depends on that PR's `scripts/issues_check_oracle.py` and the
migrated corpus. Both PRs should be merged together; bench can also
merge first (the script handles a missing corpus gracefully).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bumps [github/gh-aw-actions](https://github.com/github/gh-aw-actions)
from 0.76.1 to 0.77.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/gh-aw-actions/releases">github/gh-aw-actions's
releases</a>.</em></p>
<blockquote>
<h2>v0.77.0</h2>
<p>Sync of actions from <a
href="https://github.com/github/gh-aw">gh-aw</a> at
<code>v0.77.0</code>.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b11be78086"><code>b11be78</code></a>
chore: sync actions from gh-aw@v0.77.0 (<a
href="https://redirect.github.com/github/gh-aw-actions/issues/122">#122</a>)</li>
<li>See full diff in <a
href="https://github.com/github/gh-aw-actions/compare/v0.76.1...v0.77.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This updates the compare-stats anomaly reporter to retrieve benchmark
data from `http://mtzguido.tplinkdns.com:8081/z3/` instead of the old
`compare_stats.html` endpoint. The workflow prompt and generated lock
file now consistently reference the root benchmark stats URL.
- **Workflow source**
- Repoint the benchmark source URL from `.../compare_stats.html` to
`.../z3/`
- Update prompt text to describe the source as benchmark statistics
rather than a specific HTML file
- Rename the temporary fetched artifact from `compare_stats.html` to
`benchmark_stats.html` for consistency
- **Generated workflow**
- Regenerate `compare-stats-anomaly-reporter.lock.yml` so the compiled
workflow matches the markdown source
- Refresh the embedded workflow description and prompt payload to
reference the new endpoint
- **Report output**
- Update the discussion template’s source link to point at the root
benchmark stats page
```md
Source URL:
`http://mtzguido.tplinkdns.com:8081/z3/`
curl -fsSL --max-time 60 "http://mtzguido.tplinkdns.com:8081/z3/" \
-o /tmp/gh-aw/agent/benchmark_stats.html
```
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
This adds an agentic workflow that analyzes `compare_stats.html` over a
rolling 30-hour window and publishes a GitHub Discussion summarizing
bugs, crashes, and anomalies. It explicitly captures unknown-outlier
patterns where a benchmark is `unknown` while peers in the same set are
mostly `sat`/`unsat`/`timeout`.
- **Workflow added**
- Introduces `.github/workflows/compare-stats-anomaly-reporter.md` (plus
compiled `.lock.yml`).
- Supports `workflow_dispatch` and scheduled execution.
- Uses safe discussion output with auto-close of older reports for the
same stream.
- **Data acquisition + robustness**
- Fetches `http://mtzguido.tplinkdns.com:8081/z3/compare_stats.html`
with `curl` and `wget` fallback.
- Adds integrity checks (non-empty HTML/table presence) and explicit
incomplete-report behavior on fetch/parse failures.
- **30-hour analysis semantics**
- Filters rows by timestamp candidates (`time`, `timestamp`, `date`,
`run`, etc.) using UTC.
- Falls back to full-table analysis when timestamps are unavailable, and
marks the report accordingly.
- **Classification logic**
- Detects bug/crash signals from status/details (`crash`, `segfault`,
`assert`, `abort`, `exception`, `error`, `failed`, `bug`).
- Detects:
- unknown-outlier anomalies (thresholded minority `unknown` in otherwise
decisive SAT-family outcomes),
- status divergences (conflicting non-timeout outcomes for same
benchmark),
- repeated hard-failure anomalies.
- **Discussion output shape**
- Produces a compact report with executive counts, bug/crash table,
anomaly subsections, and raw extraction summary/limitations.
```yaml
safe-outputs:
create-discussion:
title-prefix: "[Compare Stats] "
category: "agentic workflows"
close-older-discussions: true
```
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Monthly agentic workflow that discovers community-contributed SMT-LIB
benchmark repositories on GitHub, excluding official distributions from
smtlib.org and Zenodo, and posts a categorised summary as a GitHub
Discussion.
## Workflow steps
- **Exclusion list** — scrapes `smtlib.cs.uiowa.edu` + paginates the
[Zenodo SMT-LIB community](https://zenodo.org/communities/smt-lib/) (up
to 300 records) to extract official repo URLs; hard-excludes `SMT-LIB/*`
and `SMT-Competition/*` orgs
- **GitHub search** — 7 search strategies (topic tags
`smtlib`/`smt-lib`/`smt2`, filename/path patterns, README mentions)
scoped to repos pushed since the last run (90-day window on first run)
- **Filter + classify** — removes official sets and noise (homework
repos, empty repos); categorises survivors into: solver evaluation,
verification, security/CTF, theory/logic, tool output, education, other
- **Discussion report** — posts `[SMT-LIB Benchmarks] Community
Benchmark Repository Survey — [Month YYYY]`; older discussions
auto-closed after 90 days
- **Cache** — stores classified repo sets and run date so subsequent
runs are incremental
## Files
- `.github/workflows/smtlib-benchmark-finder.md` — workflow definition
- `.github/workflows/smtlib-benchmark-finder.lock.yml` — compiled GitHub
Actions YAML (`gh aw compile`)
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
This updates the release and nightly pipelines so the Pyodide wheel is
built alongside the existing Python wheels and carried through the
normal publication flow. As a result, Pyodide wheels are included in
nightly assets and in release-time Python package publishing without
introducing a separate publishing path.
- **Pyodide build in release/nightly**
- Added a `pyodide-python` job to both `release.yml` and `nightly.yml`
- Reused the existing Pyodide build flow: install
`pyodide-build`/`pyodide-cli`, provision the matching Emscripten
toolchain, build the wheel, and run `z3test.py` against the built
artifact
- **Artifact integration**
- Upload the built wheel as a dedicated workflow artifact:
`PyodidePythonBuild`
- Extend the existing `python-package` aggregation job in both workflows
to depend on and download that artifact
- **Publication path**
- No new publish logic was added
- The existing downstream steps already publish everything collected in
`src/api/python/dist`, so the Pyodide wheel now flows into:
- nightly GitHub release assets
- release GitHub assets
- release-time PyPI publishing when enabled
```yaml
pyodide-python:
name: "Python bindings (Pyodide)"
...
- name: Upload artifact
uses: actions/upload-artifact@v7.0.1
with:
name: PyodidePythonBuild
path: src/api/python/dist/*.whl
python-package:
needs: [..., pyodide-python]
...
- name: Download Pyodide Build
uses: actions/download-artifact@v8.0.1
with:
name: PyodidePythonBuild
path: artifacts
```
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: NikolajBjorner <3085284+NikolajBjorner@users.noreply.github.com>
- Remove .csproj overwrite in macOS x64 and ARM64 NuGet jobs that
replaced the versioned PackageReference with Version="*", causing
FileNotFoundException for Microsoft.Z3 assembly
- Add validate-build-script-tests job to run scripts/tests/test_*.py,
ensuring JNI arch flag tests from PR #8896 are exercised nightly
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>