mirror of
https://github.com/Z3Prover/z3
synced 2026-06-14 21:05:39 +00:00
Add smtlib-benchmark-finder agentic workflow (#9629)
Monthly agentic workflow that discovers community-contributed SMT-LIB benchmark repositories on GitHub, excluding official distributions from smtlib.org and Zenodo, and posts a categorised summary as a GitHub Discussion. ## Workflow steps - **Exclusion list** — scrapes `smtlib.cs.uiowa.edu` + paginates the [Zenodo SMT-LIB community](https://zenodo.org/communities/smt-lib/) (up to 300 records) to extract official repo URLs; hard-excludes `SMT-LIB/*` and `SMT-Competition/*` orgs - **GitHub search** — 7 search strategies (topic tags `smtlib`/`smt-lib`/`smt2`, filename/path patterns, README mentions) scoped to repos pushed since the last run (90-day window on first run) - **Filter + classify** — removes official sets and noise (homework repos, empty repos); categorises survivors into: solver evaluation, verification, security/CTF, theory/logic, tool output, education, other - **Discussion report** — posts `[SMT-LIB Benchmarks] Community Benchmark Repository Survey — [Month YYYY]`; older discussions auto-closed after 90 days - **Cache** — stores classified repo sets and run date so subsequent runs are incremental ## Files - `.github/workflows/smtlib-benchmark-finder.md` — workflow definition - `.github/workflows/smtlib-benchmark-finder.lock.yml` — compiled GitHub Actions YAML (`gh aw compile`) --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
This commit is contained in:
parent
c3f5365a95
commit
29c4e539f6
3 changed files with 1791 additions and 5 deletions
12
.github/dependabot.yml
vendored
12
.github/dependabot.yml
vendored
|
|
@ -1,6 +1,8 @@
|
||||||
version: 2
|
|
||||||
updates:
|
updates:
|
||||||
- package-ecosystem: "github-actions"
|
- directory: /
|
||||||
directory: "/"
|
ignore:
|
||||||
schedule:
|
- dependency-name: "github/gh-aw-actions/**" # Managed by gh aw compile. Version-locked to the gh-aw compiler; do not bump.
|
||||||
interval: "weekly"
|
package-ecosystem: github-actions
|
||||||
|
schedule:
|
||||||
|
interval: weekly
|
||||||
|
version: 2
|
||||||
|
|
|
||||||
1442
.github/workflows/smtlib-benchmark-finder.lock.yml
generated
vendored
Normal file
1442
.github/workflows/smtlib-benchmark-finder.lock.yml
generated
vendored
Normal file
File diff suppressed because it is too large
Load diff
342
.github/workflows/smtlib-benchmark-finder.md
vendored
Normal file
342
.github/workflows/smtlib-benchmark-finder.md
vendored
Normal file
|
|
@ -0,0 +1,342 @@
|
||||||
|
---
|
||||||
|
description: >
|
||||||
|
Monthly SMTLIB Benchmark Finder.
|
||||||
|
Searches GitHub for repositories containing SMT-LIB benchmarks (.smt2 files),
|
||||||
|
excludes repositories that belong to the official SMT-LIB benchmark sets
|
||||||
|
(linked from smtlib.org and hosted on Zenodo), and posts a curated summary
|
||||||
|
of community-contributed benchmark links as a GitHub Discussion.
|
||||||
|
|
||||||
|
on:
|
||||||
|
schedule:
|
||||||
|
- cron: "0 8 1 * *"
|
||||||
|
workflow_dispatch:
|
||||||
|
|
||||||
|
timeout-minutes: 60
|
||||||
|
|
||||||
|
permissions: read-all
|
||||||
|
|
||||||
|
network:
|
||||||
|
allowed:
|
||||||
|
- defaults
|
||||||
|
- github
|
||||||
|
- smtlib.cs.uiowa.edu
|
||||||
|
- zenodo.org
|
||||||
|
|
||||||
|
tools:
|
||||||
|
cache-memory: true
|
||||||
|
web-fetch: {}
|
||||||
|
github:
|
||||||
|
toolsets: [default, repos]
|
||||||
|
bash: [":*"]
|
||||||
|
|
||||||
|
safe-outputs:
|
||||||
|
mentions: false
|
||||||
|
allowed-github-references: []
|
||||||
|
max-bot-mentions: 1
|
||||||
|
create-discussion:
|
||||||
|
title-prefix: "[SMT-LIB Benchmarks] "
|
||||||
|
category: "Agentic Workflows"
|
||||||
|
close-older-discussions: true
|
||||||
|
expires: 90d
|
||||||
|
missing-tool:
|
||||||
|
create-issue: true
|
||||||
|
noop:
|
||||||
|
report-as-issue: false
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
# SMTLIB Benchmark Finder
|
||||||
|
|
||||||
|
## Job Description
|
||||||
|
|
||||||
|
Your name is ${{ github.workflow }}. You are a research analyst for the Z3 theorem
|
||||||
|
prover repository `${{ github.repository }}`. Your mission is to discover GitHub
|
||||||
|
repositories that host SMT-LIB benchmarks, exclude the ones that are already part
|
||||||
|
of the official SMT-LIB benchmark distribution (linked from smtlib.org and published
|
||||||
|
on Zenodo), and post a curated summary of community-contributed benchmark links as a
|
||||||
|
GitHub Discussion.
|
||||||
|
|
||||||
|
## Step 1: Load Cache and Determine Run Mode
|
||||||
|
|
||||||
|
Check cache memory for:
|
||||||
|
- `official_repos`: set of GitHub repository full names (`owner/repo`) and Zenodo
|
||||||
|
record IDs already identified as official SMT-LIB benchmark sets
|
||||||
|
- `known_community_repos`: set of repo full names already listed in a previous report
|
||||||
|
- `last_run_date`: ISO-8601 date string of the previous run
|
||||||
|
|
||||||
|
Use the cache to skip repos already classified. On the very first run (no cache),
|
||||||
|
perform a full discovery pass. On subsequent runs focus on repos pushed or created
|
||||||
|
since `last_run_date`.
|
||||||
|
|
||||||
|
## Step 2: Collect Official SMT-LIB Benchmark Sets to Exclude
|
||||||
|
|
||||||
|
### 2.1 Scrape smtlib.org
|
||||||
|
|
||||||
|
Fetch the SMT-LIB benchmarks page and extract all linked Zenodo DOIs and GitHub URLs:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -s "https://smtlib.cs.uiowa.edu/benchmarks.shtml" -o /tmp/smtlib-benchmarks.html
|
||||||
|
# Also try the main page and any mirror
|
||||||
|
curl -s "https://smtlib.cs.uiowa.edu/" -o /tmp/smtlib-home.html
|
||||||
|
```
|
||||||
|
|
||||||
|
Parse both files for:
|
||||||
|
- Zenodo DOI links (`doi.org/10.5281/zenodo.*` or `zenodo.org/record/*`)
|
||||||
|
- GitHub repository URLs (`github.com/...`)
|
||||||
|
- Any other hosted benchmark archive links
|
||||||
|
|
||||||
|
### 2.2 Enumerate Zenodo SMT-LIB Community Records
|
||||||
|
|
||||||
|
Query the Zenodo API for all records in the SMT-LIB community:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=1" \
|
||||||
|
-o /tmp/zenodo-smtlib-page1.json
|
||||||
|
|
||||||
|
# Check if there are more pages (paginate until empty)
|
||||||
|
curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=2" \
|
||||||
|
-o /tmp/zenodo-smtlib-page2.json 2>/dev/null || true
|
||||||
|
|
||||||
|
curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=3" \
|
||||||
|
-o /tmp/zenodo-smtlib-page3.json 2>/dev/null || true
|
||||||
|
```
|
||||||
|
|
||||||
|
For each Zenodo record extract:
|
||||||
|
- Record ID (e.g. `5827900`)
|
||||||
|
- Title
|
||||||
|
- Any GitHub repository URLs listed in the description or related identifiers
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 - <<'PYEOF'
|
||||||
|
import json, re
|
||||||
|
|
||||||
|
def extract_github_repos(text):
|
||||||
|
pattern = r'github\.com/([A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+)'
|
||||||
|
return set(re.findall(pattern, text or ''))
|
||||||
|
|
||||||
|
official_repos = set()
|
||||||
|
official_zenodo_ids = set()
|
||||||
|
|
||||||
|
for fname in ['/tmp/zenodo-smtlib-page1.json', '/tmp/zenodo-smtlib-page2.json',
|
||||||
|
'/tmp/zenodo-smtlib-page3.json']:
|
||||||
|
try:
|
||||||
|
data = json.load(open(fname))
|
||||||
|
except Exception:
|
||||||
|
continue
|
||||||
|
for hit in data.get('hits', {}).get('hits', []):
|
||||||
|
rid = str(hit.get('id', ''))
|
||||||
|
official_zenodo_ids.add(rid)
|
||||||
|
metadata = hit.get('metadata', {})
|
||||||
|
description = metadata.get('description', '')
|
||||||
|
related = ' '.join(
|
||||||
|
r.get('identifier', '')
|
||||||
|
for r in metadata.get('related_identifiers', [])
|
||||||
|
)
|
||||||
|
title = metadata.get('title', '')
|
||||||
|
for repo in extract_github_repos(description + ' ' + related + ' ' + title):
|
||||||
|
official_repos.add(repo.lower().rstrip('.'))
|
||||||
|
|
||||||
|
with open('/tmp/smtlib-benchmarks.html') as f:
|
||||||
|
html = f.read()
|
||||||
|
for repo in extract_github_repos(html):
|
||||||
|
official_repos.add(repo.lower().rstrip('.'))
|
||||||
|
|
||||||
|
print("OFFICIAL_ZENODO_IDS:", ','.join(sorted(official_zenodo_ids)) or '(none)')
|
||||||
|
print("OFFICIAL_GITHUB_REPOS:", ','.join(sorted(official_repos)) or '(none)')
|
||||||
|
PYEOF
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2.3 Well-Known Official Repository Patterns
|
||||||
|
|
||||||
|
Regardless of the above scrape, always exclude:
|
||||||
|
- Any repo under the `SMT-LIB` GitHub organization (`SMT-LIB/*`)
|
||||||
|
- Any repo whose name matches `smt-comp-*` that is under `SMT-Competition` org
|
||||||
|
- The Z3 repo itself (`Z3Prover/z3`) and its forks
|
||||||
|
|
||||||
|
Combine all official sources into a single exclusion set stored at
|
||||||
|
`/tmp/official_exclusions.txt` (one `owner/repo` pattern per line, lowercase).
|
||||||
|
|
||||||
|
## Step 3: Search GitHub for Community SMT-LIB Benchmark Repositories
|
||||||
|
|
||||||
|
Use multiple GitHub search strategies to find repos containing `.smt2` benchmark
|
||||||
|
files that are NOT part of the official distribution. Search for repos updated
|
||||||
|
since the last run date (or the last 90 days for the initial run).
|
||||||
|
|
||||||
|
Compute the cutoff date first:
|
||||||
|
```bash
|
||||||
|
CUTOFF=$(date -d "90 days ago" +%Y-%m-%d 2>/dev/null || date -v-90d +%Y-%m-%d)
|
||||||
|
echo "Using cutoff date: $CUTOFF"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Search Strategies
|
||||||
|
|
||||||
|
Use the GitHub MCP server tools to run these searches:
|
||||||
|
|
||||||
|
1. **Topic search**: `topic:smtlib pushed:>$CUTOFF`
|
||||||
|
2. **Topic search variant**: `topic:smt-lib pushed:>$CUTOFF`
|
||||||
|
3. **Topic search variant**: `topic:smt2 pushed:>$CUTOFF`
|
||||||
|
4. **Benchmark filename pattern**: `filename:*.smt2 pushed:>$CUTOFF` (limit to top results)
|
||||||
|
5. **Benchmarks directory pattern**: `path:benchmarks *.smt2 in:path pushed:>$CUTOFF`
|
||||||
|
6. **README mention**: `SMT-LIB benchmarks in:readme stars:>2 pushed:>$CUTOFF`
|
||||||
|
7. **Organization-level search**: repos under `SMT-Competition` org, if any are not already excluded
|
||||||
|
|
||||||
|
For each search, collect: `full_name`, `html_url`, `description`, `stargazers_count`,
|
||||||
|
`updated_at`, `pushed_at`, `default_branch`, `topics`.
|
||||||
|
|
||||||
|
Deduplicate by `full_name`. Limit to 200 total candidates before filtering.
|
||||||
|
|
||||||
|
## Step 4: Filter Out Official Benchmark Sets
|
||||||
|
|
||||||
|
For each candidate repo:
|
||||||
|
|
||||||
|
1. Check if `full_name.lower()` is in the exclusion set from Step 2.
|
||||||
|
2. Check if the repo is owned by `SMT-LIB` or `SMT-Competition` org (case-insensitive).
|
||||||
|
3. Check if the repo is a fork of a known official repo (if `fork: true` and parent is
|
||||||
|
in the exclusion set).
|
||||||
|
|
||||||
|
Discard repos that match any of the above. Keep the rest as community benchmarks.
|
||||||
|
|
||||||
|
Also apply quality filters to reduce noise — skip repos that:
|
||||||
|
- Have 0 stars and fewer than 3 `.smt2` files (likely a student homework or test repo
|
||||||
|
with minimal public value); use your judgement — if the repo description clearly
|
||||||
|
describes a research benchmark, keep it regardless of star count.
|
||||||
|
- Were created but never pushed after creation (empty repos).
|
||||||
|
- Have names that are clearly course assignment repositories
|
||||||
|
(e.g. contain `homework`, `assignment`, `hw[0-9]`, `cs[0-9]{3}`).
|
||||||
|
|
||||||
|
## Step 5: Classify Remaining Repos
|
||||||
|
|
||||||
|
For each repo that survives filtering, classify it into one of these categories:
|
||||||
|
|
||||||
|
| Category | Description |
|
||||||
|
|----------|-------------|
|
||||||
|
| **Solver evaluation** | Benchmarks used to evaluate or compare SMT solvers |
|
||||||
|
| **Verification** | Benchmarks from program verification or model checking |
|
||||||
|
| **Security / CTF** | Benchmarks from security research or CTF challenges |
|
||||||
|
| **Theory / logic** | Benchmarks exploring specific SMT theories or logics |
|
||||||
|
| **Tool output** | Benchmarks generated by another tool (e.g. a compiler, fuzzer) |
|
||||||
|
| **Education** | Course materials or tutorials with benchmark examples |
|
||||||
|
| **Other / unknown** | Does not fit another category |
|
||||||
|
|
||||||
|
Base the classification on: repo description, README (fetch if web-fetch is available),
|
||||||
|
topics, and directory structure. A single brief `web-fetch` of the repo's README is
|
||||||
|
sufficient; do not fetch individual `.smt2` files.
|
||||||
|
|
||||||
|
Note the dominant SMT logic(s) present, if discernible from the description or topics
|
||||||
|
(e.g. QF_BV, QF_LIA, QF_S, NIA, …).
|
||||||
|
|
||||||
|
## Step 6: Generate the Discussion Report
|
||||||
|
|
||||||
|
Create a GitHub Discussion. Use heading level 3 or deeper (`###`, `####`, …) for all
|
||||||
|
section headers; never use `##` or `#` in the body.
|
||||||
|
Wrap long tables in `<details>` tags to keep the report scannable.
|
||||||
|
|
||||||
|
Title: `[SMT-LIB Benchmarks] Community Benchmark Repository Survey — [Month YYYY]`
|
||||||
|
|
||||||
|
Structure the report as follows:
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
**Period covered**: [cutoff date] – [today's date]
|
||||||
|
**Repositories found**: N community repos (after excluding M official sets)
|
||||||
|
**New this run**: N (not listed in previous report)
|
||||||
|
|
||||||
|
### Overview
|
||||||
|
|
||||||
|
1–2 sentences summarising the breadth of community SMT-LIB benchmarks found.
|
||||||
|
|
||||||
|
### Community Benchmark Repositories
|
||||||
|
|
||||||
|
Use `###` for category headers. Within each category, list repos as a markdown table.
|
||||||
|
For each repo include:
|
||||||
|
- Repo link (`[owner/repo](html_url)`)
|
||||||
|
- Stars
|
||||||
|
- Last pushed date
|
||||||
|
- Dominant logic(s) (if known)
|
||||||
|
- Brief description (from repo description or README, max 120 chars)
|
||||||
|
|
||||||
|
#### Solver Evaluation
|
||||||
|
|
||||||
|
| Repository | ⭐ | Last pushed | Logic(s) | Description |
|
||||||
|
|------------|-----|------------|---------|-------------|
|
||||||
|
| [owner/repo](url) | N | YYYY-MM-DD | QF_BV, QF_LIA | … |
|
||||||
|
|
||||||
|
#### Verification
|
||||||
|
|
||||||
|
[same table structure]
|
||||||
|
|
||||||
|
#### Security / CTF
|
||||||
|
|
||||||
|
[same table structure]
|
||||||
|
|
||||||
|
#### Theory / Logic
|
||||||
|
|
||||||
|
[same table structure]
|
||||||
|
|
||||||
|
#### Tool Output
|
||||||
|
|
||||||
|
[same table structure]
|
||||||
|
|
||||||
|
#### Education
|
||||||
|
|
||||||
|
[same table structure]
|
||||||
|
|
||||||
|
#### Other / Unknown
|
||||||
|
|
||||||
|
[same table structure]
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Exclusions Applied
|
||||||
|
|
||||||
|
<details>
|
||||||
|
<summary>Official SMT-LIB sets excluded from this report</summary>
|
||||||
|
|
||||||
|
List Zenodo record IDs and GitHub repos identified as official distributions.
|
||||||
|
|
||||||
|
| Source | Identifier | Notes |
|
||||||
|
|--------|-----------|-------|
|
||||||
|
| Zenodo | [10.5281/zenodo.XXXXXXX](https://zenodo.org/record/XXXXXXX) | Official QF_BV benchmark set |
|
||||||
|
| GitHub | [SMT-LIB/benchmarks-non-incremental](https://github.com/SMT-LIB/benchmarks-non-incremental) | |
|
||||||
|
|
||||||
|
</details>
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Methodology
|
||||||
|
|
||||||
|
Brief note on search queries used, cutoff date, and any quality filters applied.
|
||||||
|
```
|
||||||
|
|
||||||
|
## Step 7: Update Cache Memory
|
||||||
|
|
||||||
|
After posting the discussion, update cache memory with:
|
||||||
|
- `official_repos`: updated exclusion set (union of previous + newly found)
|
||||||
|
- `known_community_repos`: union of previous + repos listed in this report
|
||||||
|
- `last_run_date`: today's ISO-8601 date
|
||||||
|
- `report_url`: URL of the GitHub Discussion created
|
||||||
|
|
||||||
|
## Guidelines
|
||||||
|
|
||||||
|
- **Be conservative with exclusions**: when in doubt whether a repo is "official",
|
||||||
|
keep it in the community list rather than silently dropping it.
|
||||||
|
- **Be accurate**: only include repos that genuinely contain SMT-LIB `.smt2` files
|
||||||
|
or clearly describe themselves as SMT-LIB benchmark collections.
|
||||||
|
- **Avoid noise**: student homework repos and trivially small repos add clutter;
|
||||||
|
apply the quality filters from Step 4 judiciously.
|
||||||
|
- **No source code changes**: DO NOT create pull requests or modify any source files.
|
||||||
|
- **No copyrighted content**: DO NOT reproduce benchmark file contents; only post
|
||||||
|
links and metadata.
|
||||||
|
- **Always cite sources**: include the full GitHub URL for every listed repository.
|
||||||
|
- **Use cache**: skip repos already classified in a previous run to keep runtime short.
|
||||||
|
- **Fail gracefully**: if GitHub search rate-limits the workflow, post whatever was
|
||||||
|
collected so far with a note that the search was incomplete.
|
||||||
|
|
||||||
|
## Important Notes
|
||||||
|
|
||||||
|
- DO NOT create pull requests or modify source files.
|
||||||
|
- DO close older SMT-LIB Benchmarks discussions automatically (configured).
|
||||||
|
- DO always call `create_discussion` or `noop` before the workflow ends.
|
||||||
|
Failing to produce any safe output triggers an automatic failure issue.
|
||||||
|
- DO use cache memory to avoid re-processing repos already surveyed.
|
||||||
|
- DO limit individual `web-fetch` calls (README fetches) to repos where the
|
||||||
|
description alone is insufficient for classification.
|
||||||
Loading…
Add table
Add a link
Reference in a new issue