mirrors/z3 - Libre-Chip.org

mirror of https://github.com/Z3Prover/z3 synced 2026-06-01 14:47:51 +00:00

Commit graph

Author SHA1 Message Date


Copilot

eb4c3a0756

Update compare-stats anomaly reporter to read benchmark stats from /z3/ (#9650)

This updates the compare-stats anomaly reporter to retrieve benchmark
data from `http://mtzguido.tplinkdns.com:8081/z3/` instead of the old
`compare_stats.html` endpoint. The workflow prompt and generated lock
file now consistently reference the root benchmark stats URL.

- **Workflow source**
  - Repoint the benchmark source URL from `.../compare_stats.html` to
    `.../z3/`
  - Update prompt text to describe the source as benchmark statistics
    rather than a specific HTML file
  - Rename the temporary fetched artifact from `compare_stats.html` to
    `benchmark_stats.html` for consistency

- **Generated workflow**
  - Regenerate `compare-stats-anomaly-reporter.lock.yml` so the compiled
    workflow matches the markdown source
  - Refresh the embedded workflow description and prompt payload to
    reference the new endpoint

- **Report output**
  - Update the discussion template’s source link to point at the root
    benchmark stats page

```md
Source URL:
`http://mtzguido.tplinkdns.com:8081/z3/`

curl -fsSL --max-time 60 "http://mtzguido.tplinkdns.com:8081/z3/" \
  -o /tmp/gh-aw/agent/benchmark_stats.html
```

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

2026-05-27 09:57:20 -07:00

Copilot

b3fff5b399

Add compare-stats anomaly reporter workflow for 30h bug/crash triage (#9647)

This adds an agentic workflow that analyzes `compare_stats.html` over a
rolling 30-hour window and publishes a GitHub Discussion summarizing
bugs, crashes, and anomalies. It explicitly captures unknown-outlier
patterns where a benchmark is `unknown` while peers in the same set are
mostly `sat`/`unsat`/`timeout`.

- **Workflow added**
  - Introduces `.github/workflows/compare-stats-anomaly-reporter.md` (plus
    compiled `.lock.yml`).
  - Supports `workflow_dispatch` and scheduled execution.
  - Uses safe discussion output with auto-close of older reports for the
    same stream.

- **Data acquisition + robustness**
  - Fetches `http://mtzguido.tplinkdns.com:8081/z3/compare_stats.html`
    with `curl` and `wget` fallback.
  - Adds integrity checks (non-empty HTML/table presence) and explicit
    incomplete-report behavior on fetch/parse failures.

- **30-hour analysis semantics**
  - Filters rows by timestamp candidates (`time`, `timestamp`, `date`,
    `run`, etc.) using UTC.
  - Falls back to full-table analysis when timestamps are unavailable, and
    marks the report accordingly.

- **Classification logic**
  - Detects bug/crash signals from status/details (`crash`, `segfault`,
    `assert`, `abort`, `exception`, `error`, `failed`, `bug`).
  - Detects:
    - unknown-outlier anomalies (thresholded minority `unknown` in otherwise
      decisive SAT-family outcomes),
    - status divergences (conflicting non-timeout outcomes for same
      benchmark),
    - repeated hard-failure anomalies.

- **Discussion output shape**
  - Produces a compact report with executive counts, bug/crash table,
    anomaly subsections, and raw extraction summary/limitations.
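
The 30-hour filtering and its full-table fallback could look roughly like the
sketch below. This is an illustration only, not the workflow's actual code: it
assumes rows are already parsed into dicts with ISO-8601 timestamp strings, and
the names `TIMESTAMP_KEYS`, `parse_utc`, `row_timestamp`, and `filter_window`
are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical column names, mirroring the candidates listed in the commit message.
TIMESTAMP_KEYS = ("time", "timestamp", "date", "run")


def parse_utc(value):
    """Parse an ISO-8601 string into an aware UTC datetime, or return None."""
    try:
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def row_timestamp(row):
    """Return the first parseable timestamp among the candidate columns."""
    for key in TIMESTAMP_KEYS:
        ts = parse_utc(row.get(key))
        if ts is not None:
            return ts
    return None


def filter_window(rows, now, hours=30):
    """Return (rows_in_window, used_fallback).

    If no row carries a usable timestamp, every row is returned and
    used_fallback is True so the report can be marked accordingly.
    Otherwise rows without a usable timestamp are dropped.
    """
    stamped = [(row, row_timestamp(row)) for row in rows]
    if not any(ts is not None for _, ts in stamped):
        return list(rows), True
    cutoff = now - timedelta(hours=hours)
    kept = [row for row, ts in stamped if ts is not None and cutoff <= ts <= now]
    return kept, False
```

Passing `now` explicitly, rather than reading the clock inside the function,
keeps the window boundary reproducible when a report is regenerated.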

```yaml
safe-outputs:
  create-discussion:
    title-prefix: "[Compare Stats] "
    category: "agentic workflows"
    close-older-discussions: true
```
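
For the classification rules listed in the commit message, a minimal sketch of
the unknown-outlier and status-divergence checks might look as follows. The row
keys (`set`, `benchmark`, `status`), the function names, and the 0.2 threshold
are assumptions made for illustration; the commit message says only that the
unknown-outlier rule is thresholded and does not give the value.

```python
from collections import Counter, defaultdict

# Peer outcomes named in the commit message.
DECISIVE = {"sat", "unsat", "timeout"}


def unknown_outliers(rows, max_unknown_share=0.2):
    """Return (set, benchmark) pairs that are `unknown` in mostly decisive sets.

    The 0.2 share is an illustrative threshold, not a value from the workflow.
    """
    by_set = defaultdict(list)
    for row in rows:
        by_set[row["set"]].append(row)

    outliers = []
    for set_name, members in by_set.items():
        counts = Counter(m["status"] for m in members)
        unknown = counts["unknown"]
        decisive = sum(counts[s] for s in DECISIVE)
        if unknown == 0 or decisive == 0:
            continue  # nothing to flag, or no decisive peers to compare against
        if unknown / len(members) <= max_unknown_share:
            outliers.extend(
                (set_name, m["benchmark"])
                for m in members
                if m["status"] == "unknown"
            )
    return outliers


def status_divergences(rows):
    """Return benchmarks that have more than one distinct non-timeout status."""
    seen = defaultdict(set)
    for row in rows:
        if row["status"] != "timeout":
            seen[row["benchmark"]].add(row["status"])
    return {name: sorted(s) for name, s in seen.items() if len(s) > 1}
```

Timeouts are ignored in `status_divergences` because the commit message defines
a divergence as conflicting non-timeout outcomes for the same benchmark.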

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

2026-05-27 09:25:33 -07:00

2 commits