---
description: Analyze compare_stats.html for the latest 30 hours and publish bug/crash/anomaly summary as a GitHub Discussion
on:
schedule:
- cron: "0 */12 * * *"
workflow_dispatch:
permissions: read-all
strict: false
timeout-minutes: 45
network:
allowed:
- defaults
- mtzguido.tplinkdns.com
tools:
bash: [":*"]
github:
toolsets: [default]
safe-outputs:
create-discussion:
title-prefix: "[Compare Stats] "
category: "agentic workflows"
close-older-discussions: true
missing-tool:
create-issue: true
noop:
report-as-issue: false
---
# Compare Stats Bug/Crash/Anomaly Reporter
Your name is ${{ github.workflow }}. You are a Z3 benchmarking analysis agent for `${{ github.repository }}`.
Analyze the benchmark comparison page below, focusing on results from the last 30 hours, then create a GitHub Discussion with a concise but actionable summary of:
- Bugs
- Crashes
- Anomalies
Source URL:
`http://mtzguido.tplinkdns.com:8081/z3/compare_stats.html`
Note: this endpoint is currently HTTP-only. Treat fetched data as non-sensitive benchmark telemetry and do not include secrets in requests or reports.
Note: the workflow runs every 12 hours but analyzes 30 hours intentionally to provide overlap and avoid missing transient failures between runs.
Overlapping windows are expected; `close-older-discussions: true` keeps only the latest report thread active.
## Requirements
### 1) Fetch and save the source page
Use bash to fetch the page into `/tmp/gh-aw/agent/compare_stats.html`.
Try this first:
```bash
curl -fsSL --max-time 60 "http://mtzguido.tplinkdns.com:8081/z3/compare_stats.html" -o /tmp/gh-aw/agent/compare_stats.html
```
If that fails, retry once with:
```bash
wget -q -T 60 -O /tmp/gh-aw/agent/compare_stats.html "http://mtzguido.tplinkdns.com:8081/z3/compare_stats.html"
```
If both fail, still create a discussion that explains the fetch failure, includes stderr output, and marks the report as incomplete.
After a successful fetch, perform basic integrity checks before parsing:
- file is non-empty
- content includes `= 4`, `unknown_count / total_rows <= 0.4`, and `(sat_count + unsat_count + timeout_count) / total_rows >= 0.6`.
- If set/suite/group columns are missing, fallback grouping order is: directory prefix of benchmark path/name, then benchmark name prefix before first separator (`/`, `:`, `::`), then a single global group.
2. **Status divergence anomaly**:
- Same benchmark name appears multiple times with conflicting non-timeout statuses (for example `sat` vs `unsat`).
- Ignore timeout-only disagreements here; timeout behavior is covered under the repeated hard-failure anomaly section to reduce noise from transient runtime variance.
3. **Repeated hard-failure anomaly**:
- Same benchmark appears repeatedly with crash/error-like status in the time window.
### 5) Generate discussion report
Create a GitHub Discussion using `create-discussion` safe output.
Use this structure:
```markdown
### Compare Stats Analysis Report
**Source**: [compare_stats.html](http://mtzguido.tplinkdns.com:8081/z3/compare_stats.html)
**Workflow Run**: [#${{ github.run_id }}](https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }})
**Analysis Time (UTC)**:
**Window**: last 30 hours (or fallback mode)
### Executive Summary
- Rows analyzed: N
- Rows in 30h window: M (or "timestamp unavailable")
- Bugs/crashes: B
- Anomalies: A
### Bugs and Crashes
| Benchmark Set | Benchmark | Status | Details | Timestamp |
|---|---|---|---|---|
| ... |
### Anomalies
#### Unknown-Outlier Cases
| Benchmark Set | Benchmark | Status | Peer Status Distribution | Timestamp |
|---|---|---|---|---|
| ... |
#### Status Divergences
| Benchmark | Observed Statuses | Benchmark Set(s) | Timestamp(s) |
|---|---|---|---|
| ... |
#### Repeated Hard Failures
| Benchmark | Failure Count | Representative Status/Details | Benchmark Set(s) |
|---|---|---|---|
| ... |
### Notes and Limitations
- Mention parsing assumptions
- Mention missing columns/timestamps if any
Raw Extraction Summary
- Table count
- Candidate columns used
- Top status distribution
- Up to 30 representative raw rows (sanitized)
```
## Reporting Rules
- Be factual and concise.
- Do not claim certainty when column mapping is heuristic.
- If no bugs/crashes/anomalies are found, still create the discussion and explicitly state "No issues detected in analyzed window."
- Do not open PRs or modify repository files.