mirror of
https://github.com/Z3Prover/z3
synced 2026-05-27 12:26:25 +00:00
Add smtlib-benchmark-finder agentic workflow (#9629)
Monthly agentic workflow that discovers community-contributed SMT-LIB benchmark repositories on GitHub, excluding official distributions from smtlib.org and Zenodo, and posts a categorised summary as a GitHub Discussion. ## Workflow steps - **Exclusion list** — scrapes `smtlib.cs.uiowa.edu` + paginates the [Zenodo SMT-LIB community](https://zenodo.org/communities/smt-lib/) (up to 300 records) to extract official repo URLs; hard-excludes `SMT-LIB/*` and `SMT-Competition/*` orgs - **GitHub search** — 7 search strategies (topic tags `smtlib`/`smt-lib`/`smt2`, filename/path patterns, README mentions) scoped to repos pushed since the last run (90-day window on first run) - **Filter + classify** — removes official sets and noise (homework repos, empty repos); categorises survivors into: solver evaluation, verification, security/CTF, theory/logic, tool output, education, other - **Discussion report** — posts `[SMT-LIB Benchmarks] Community Benchmark Repository Survey — [Month YYYY]`; older discussions auto-closed after 90 days - **Cache** — stores classified repo sets and run date so subsequent runs are incremental ## Files - `.github/workflows/smtlib-benchmark-finder.md` — workflow definition - `.github/workflows/smtlib-benchmark-finder.lock.yml` — compiled GitHub Actions YAML (`gh aw compile`) --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
This commit is contained in:
parent
c3f5365a95
commit
29c4e539f6
3 changed files with 1791 additions and 5 deletions
12
.github/dependabot.yml
vendored
12
.github/dependabot.yml
vendored
|
|
@ -1,6 +1,8 @@
|
|||
version: 2
|
||||
updates:
|
||||
- package-ecosystem: "github-actions"
|
||||
directory: "/"
|
||||
schedule:
|
||||
interval: "weekly"
|
||||
- directory: /
|
||||
ignore:
|
||||
- dependency-name: "github/gh-aw-actions/**" # Managed by gh aw compile. Version-locked to the gh-aw compiler; do not bump.
|
||||
package-ecosystem: github-actions
|
||||
schedule:
|
||||
interval: weekly
|
||||
version: 2
|
||||
|
|
|
|||
1442
.github/workflows/smtlib-benchmark-finder.lock.yml
generated
vendored
Normal file
1442
.github/workflows/smtlib-benchmark-finder.lock.yml
generated
vendored
Normal file
File diff suppressed because it is too large
Load diff
342
.github/workflows/smtlib-benchmark-finder.md
vendored
Normal file
342
.github/workflows/smtlib-benchmark-finder.md
vendored
Normal file
|
|
@ -0,0 +1,342 @@
|
|||
---
|
||||
description: >
|
||||
Monthly SMTLIB Benchmark Finder.
|
||||
Searches GitHub for repositories containing SMT-LIB benchmarks (.smt2 files),
|
||||
excludes repositories that belong to the official SMT-LIB benchmark sets
|
||||
(linked from smtlib.org and hosted on Zenodo), and posts a curated summary
|
||||
of community-contributed benchmark links as a GitHub Discussion.
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: "0 8 1 * *"
|
||||
workflow_dispatch:
|
||||
|
||||
timeout-minutes: 60
|
||||
|
||||
permissions: read-all
|
||||
|
||||
network:
|
||||
allowed:
|
||||
- defaults
|
||||
- github
|
||||
- smtlib.cs.uiowa.edu
|
||||
- zenodo.org
|
||||
|
||||
tools:
|
||||
cache-memory: true
|
||||
web-fetch: {}
|
||||
github:
|
||||
toolsets: [default, repos]
|
||||
bash: [":*"]
|
||||
|
||||
safe-outputs:
|
||||
mentions: false
|
||||
allowed-github-references: []
|
||||
max-bot-mentions: 1
|
||||
create-discussion:
|
||||
title-prefix: "[SMT-LIB Benchmarks] "
|
||||
category: "Agentic Workflows"
|
||||
close-older-discussions: true
|
||||
expires: 90d
|
||||
missing-tool:
|
||||
create-issue: true
|
||||
noop:
|
||||
report-as-issue: false
|
||||
|
||||
---
|
||||
|
||||
# SMTLIB Benchmark Finder
|
||||
|
||||
## Job Description
|
||||
|
||||
Your name is ${{ github.workflow }}. You are a research analyst for the Z3 theorem
|
||||
prover repository `${{ github.repository }}`. Your mission is to discover GitHub
|
||||
repositories that host SMT-LIB benchmarks, exclude the ones that are already part
|
||||
of the official SMT-LIB benchmark distribution (linked from smtlib.org and published
|
||||
on Zenodo), and post a curated summary of community-contributed benchmark links as a
|
||||
GitHub Discussion.
|
||||
|
||||
## Step 1: Load Cache and Determine Run Mode
|
||||
|
||||
Check cache memory for:
|
||||
- `official_repos`: set of GitHub repository full names (`owner/repo`) and Zenodo
|
||||
record IDs already identified as official SMT-LIB benchmark sets
|
||||
- `known_community_repos`: set of repo full names already listed in a previous report
|
||||
- `last_run_date`: ISO-8601 date string of the previous run
|
||||
|
||||
Use the cache to skip repos already classified. On the very first run (no cache),
|
||||
perform a full discovery pass. On subsequent runs focus on repos pushed or created
|
||||
since `last_run_date`.
|
||||
|
||||
## Step 2: Collect Official SMT-LIB Benchmark Sets to Exclude
|
||||
|
||||
### 2.1 Scrape smtlib.org
|
||||
|
||||
Fetch the SMT-LIB benchmarks page and extract all linked Zenodo DOIs and GitHub URLs:
|
||||
|
||||
```bash
|
||||
curl -s "https://smtlib.cs.uiowa.edu/benchmarks.shtml" -o /tmp/smtlib-benchmarks.html
|
||||
# Also try the main page and any mirror
|
||||
curl -s "https://smtlib.cs.uiowa.edu/" -o /tmp/smtlib-home.html
|
||||
```
|
||||
|
||||
Parse both files for:
|
||||
- Zenodo DOI links (`doi.org/10.5281/zenodo.*` or `zenodo.org/record/*`)
|
||||
- GitHub repository URLs (`github.com/...`)
|
||||
- Any other hosted benchmark archive links
|
||||
|
||||
### 2.2 Enumerate Zenodo SMT-LIB Community Records
|
||||
|
||||
Query the Zenodo API for all records in the SMT-LIB community:
|
||||
|
||||
```bash
|
||||
curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=1" \
|
||||
-o /tmp/zenodo-smtlib-page1.json
|
||||
|
||||
# Check if there are more pages (paginate until empty)
|
||||
curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=2" \
|
||||
-o /tmp/zenodo-smtlib-page2.json 2>/dev/null || true
|
||||
|
||||
curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=3" \
|
||||
-o /tmp/zenodo-smtlib-page3.json 2>/dev/null || true
|
||||
```
|
||||
|
||||
For each Zenodo record extract:
|
||||
- Record ID (e.g. `5827900`)
|
||||
- Title
|
||||
- Any GitHub repository URLs listed in the description or related identifiers
|
||||
|
||||
```bash
|
||||
python3 - <<'PYEOF'
|
||||
import json, re
|
||||
|
||||
def extract_github_repos(text):
|
||||
pattern = r'github\.com/([A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+)'
|
||||
return set(re.findall(pattern, text or ''))
|
||||
|
||||
official_repos = set()
|
||||
official_zenodo_ids = set()
|
||||
|
||||
for fname in ['/tmp/zenodo-smtlib-page1.json', '/tmp/zenodo-smtlib-page2.json',
|
||||
'/tmp/zenodo-smtlib-page3.json']:
|
||||
try:
|
||||
data = json.load(open(fname))
|
||||
except Exception:
|
||||
continue
|
||||
for hit in data.get('hits', {}).get('hits', []):
|
||||
rid = str(hit.get('id', ''))
|
||||
official_zenodo_ids.add(rid)
|
||||
metadata = hit.get('metadata', {})
|
||||
description = metadata.get('description', '')
|
||||
related = ' '.join(
|
||||
r.get('identifier', '')
|
||||
for r in metadata.get('related_identifiers', [])
|
||||
)
|
||||
title = metadata.get('title', '')
|
||||
for repo in extract_github_repos(description + ' ' + related + ' ' + title):
|
||||
official_repos.add(repo.lower().rstrip('.'))
|
||||
|
||||
with open('/tmp/smtlib-benchmarks.html') as f:
|
||||
html = f.read()
|
||||
for repo in extract_github_repos(html):
|
||||
official_repos.add(repo.lower().rstrip('.'))
|
||||
|
||||
print("OFFICIAL_ZENODO_IDS:", ','.join(sorted(official_zenodo_ids)) or '(none)')
|
||||
print("OFFICIAL_GITHUB_REPOS:", ','.join(sorted(official_repos)) or '(none)')
|
||||
PYEOF
|
||||
```
|
||||
|
||||
### 2.3 Well-Known Official Repository Patterns
|
||||
|
||||
Regardless of the above scrape, always exclude:
|
||||
- Any repo under the `SMT-LIB` GitHub organization (`SMT-LIB/*`)
|
||||
- Any repo whose name matches `smt-comp-*` that is under `SMT-Competition` org
|
||||
- The Z3 repo itself (`Z3Prover/z3`) and its forks
|
||||
|
||||
Combine all official sources into a single exclusion set stored at
|
||||
`/tmp/official_exclusions.txt` (one `owner/repo` pattern per line, lowercase).
|
||||
|
||||
## Step 3: Search GitHub for Community SMT-LIB Benchmark Repositories
|
||||
|
||||
Use multiple GitHub search strategies to find repos containing `.smt2` benchmark
|
||||
files that are NOT part of the official distribution. Search for repos updated
|
||||
since the last run date (or the last 90 days for the initial run).
|
||||
|
||||
Compute the cutoff date first:
|
||||
```bash
|
||||
CUTOFF=$(date -d "90 days ago" +%Y-%m-%d 2>/dev/null || date -v-90d +%Y-%m-%d)
|
||||
echo "Using cutoff date: $CUTOFF"
|
||||
```
|
||||
|
||||
### Search Strategies
|
||||
|
||||
Use the GitHub MCP server tools to run these searches:
|
||||
|
||||
1. **Topic search**: `topic:smtlib pushed:>$CUTOFF`
|
||||
2. **Topic search variant**: `topic:smt-lib pushed:>$CUTOFF`
|
||||
3. **Topic search variant**: `topic:smt2 pushed:>$CUTOFF`
|
||||
4. **Benchmark filename pattern**: `filename:*.smt2 pushed:>$CUTOFF` (limit to top results)
|
||||
5. **Benchmarks directory pattern**: `path:benchmarks *.smt2 in:path pushed:>$CUTOFF`
|
||||
6. **README mention**: `SMT-LIB benchmarks in:readme stars:>2 pushed:>$CUTOFF`
|
||||
7. **Organization-level search**: repos under `SMT-Competition` org, if any are not already excluded
|
||||
|
||||
For each search, collect: `full_name`, `html_url`, `description`, `stargazers_count`,
|
||||
`updated_at`, `pushed_at`, `default_branch`, `topics`.
|
||||
|
||||
Deduplicate by `full_name`. Limit to 200 total candidates before filtering.
|
||||
|
||||
## Step 4: Filter Out Official Benchmark Sets
|
||||
|
||||
For each candidate repo:
|
||||
|
||||
1. Check if `full_name.lower()` is in the exclusion set from Step 2.
|
||||
2. Check if the repo is owned by `SMT-LIB` or `SMT-Competition` org (case-insensitive).
|
||||
3. Check if the repo is a fork of a known official repo (if `fork: true` and parent is
|
||||
in the exclusion set).
|
||||
|
||||
Discard repos that match any of the above. Keep the rest as community benchmarks.
|
||||
|
||||
Also apply quality filters to reduce noise — skip repos that:
|
||||
- Have 0 stars and fewer than 3 `.smt2` files (likely a student homework or test repo
|
||||
with minimal public value); use your judgement — if the repo description clearly
|
||||
describes a research benchmark, keep it regardless of star count.
|
||||
- Were created but never pushed after creation (empty repos).
|
||||
- Have names that are clearly course assignment repositories
|
||||
(e.g. contain `homework`, `assignment`, `hw[0-9]`, `cs[0-9]{3}`).
|
||||
|
||||
## Step 5: Classify Remaining Repos
|
||||
|
||||
For each repo that survives filtering, classify it into one of these categories:
|
||||
|
||||
| Category | Description |
|
||||
|----------|-------------|
|
||||
| **Solver evaluation** | Benchmarks used to evaluate or compare SMT solvers |
|
||||
| **Verification** | Benchmarks from program verification or model checking |
|
||||
| **Security / CTF** | Benchmarks from security research or CTF challenges |
|
||||
| **Theory / logic** | Benchmarks exploring specific SMT theories or logics |
|
||||
| **Tool output** | Benchmarks generated by another tool (e.g. a compiler, fuzzer) |
|
||||
| **Education** | Course materials or tutorials with benchmark examples |
|
||||
| **Other / unknown** | Does not fit another category |
|
||||
|
||||
Base the classification on: repo description, README (fetch if web-fetch is available),
|
||||
topics, and directory structure. A single brief `web-fetch` of the repo's README is
|
||||
sufficient; do not fetch individual `.smt2` files.
|
||||
|
||||
Note the dominant SMT logic(s) present, if discernible from the description or topics
|
||||
(e.g. QF_BV, QF_LIA, QF_S, NIA, …).
|
||||
|
||||
## Step 6: Generate the Discussion Report
|
||||
|
||||
Create a GitHub Discussion. Use heading level 3 or deeper (`###`, `####`, …) for all
|
||||
section headers; never use `##` or `#` in the body.
|
||||
Wrap long tables in `<details>` tags to keep the report scannable.
|
||||
|
||||
Title: `[SMT-LIB Benchmarks] Community Benchmark Repository Survey — [Month YYYY]`
|
||||
|
||||
Structure the report as follows:
|
||||
|
||||
```markdown
|
||||
**Period covered**: [cutoff date] – [today's date]
|
||||
**Repositories found**: N community repos (after excluding M official sets)
|
||||
**New this run**: N (not listed in previous report)
|
||||
|
||||
### Overview
|
||||
|
||||
1–2 sentences summarising the breadth of community SMT-LIB benchmarks found.
|
||||
|
||||
### Community Benchmark Repositories
|
||||
|
||||
Use `###` for category headers. Within each category, list repos as a markdown table.
|
||||
For each repo include:
|
||||
- Repo link (`[owner/repo](html_url)`)
|
||||
- Stars
|
||||
- Last pushed date
|
||||
- Dominant logic(s) (if known)
|
||||
- Brief description (from repo description or README, max 120 chars)
|
||||
|
||||
#### Solver Evaluation
|
||||
|
||||
| Repository | ⭐ | Last pushed | Logic(s) | Description |
|
||||
|------------|-----|------------|---------|-------------|
|
||||
| [owner/repo](url) | N | YYYY-MM-DD | QF_BV, QF_LIA | … |
|
||||
|
||||
#### Verification
|
||||
|
||||
[same table structure]
|
||||
|
||||
#### Security / CTF
|
||||
|
||||
[same table structure]
|
||||
|
||||
#### Theory / Logic
|
||||
|
||||
[same table structure]
|
||||
|
||||
#### Tool Output
|
||||
|
||||
[same table structure]
|
||||
|
||||
#### Education
|
||||
|
||||
[same table structure]
|
||||
|
||||
#### Other / Unknown
|
||||
|
||||
[same table structure]
|
||||
|
||||
---
|
||||
|
||||
### Exclusions Applied
|
||||
|
||||
<details>
|
||||
<summary>Official SMT-LIB sets excluded from this report</summary>
|
||||
|
||||
List Zenodo record IDs and GitHub repos identified as official distributions.
|
||||
|
||||
| Source | Identifier | Notes |
|
||||
|--------|-----------|-------|
|
||||
| Zenodo | [10.5281/zenodo.XXXXXXX](https://zenodo.org/record/XXXXXXX) | Official QF_BV benchmark set |
|
||||
| GitHub | [SMT-LIB/benchmarks-non-incremental](https://github.com/SMT-LIB/benchmarks-non-incremental) | |
|
||||
|
||||
</details>
|
||||
|
||||
---
|
||||
|
||||
### Methodology
|
||||
|
||||
Brief note on search queries used, cutoff date, and any quality filters applied.
|
||||
```
|
||||
|
||||
## Step 7: Update Cache Memory
|
||||
|
||||
After posting the discussion, update cache memory with:
|
||||
- `official_repos`: updated exclusion set (union of previous + newly found)
|
||||
- `known_community_repos`: union of previous + repos listed in this report
|
||||
- `last_run_date`: today's ISO-8601 date
|
||||
- `report_url`: URL of the GitHub Discussion created
|
||||
|
||||
## Guidelines
|
||||
|
||||
- **Be conservative with exclusions**: when in doubt whether a repo is "official",
|
||||
keep it in the community list rather than silently dropping it.
|
||||
- **Be accurate**: only include repos that genuinely contain SMT-LIB `.smt2` files
|
||||
or clearly describe themselves as SMT-LIB benchmark collections.
|
||||
- **Avoid noise**: student homework repos and trivially small repos add clutter;
|
||||
apply the quality filters from Step 4 judiciously.
|
||||
- **No source code changes**: DO NOT create pull requests or modify any source files.
|
||||
- **No copyrighted content**: DO NOT reproduce benchmark file contents; only post
|
||||
links and metadata.
|
||||
- **Always cite sources**: include the full GitHub URL for every listed repository.
|
||||
- **Use cache**: skip repos already classified in a previous run to keep runtime short.
|
||||
- **Fail gracefully**: if GitHub search rate-limits the workflow, post whatever was
|
||||
collected so far with a note that the search was incomplete.
|
||||
|
||||
## Important Notes
|
||||
|
||||
- DO NOT create pull requests or modify source files.
|
||||
- DO close older SMT-LIB Benchmarks discussions automatically (configured).
|
||||
- DO always call `create_discussion` or `noop` before the workflow ends.
|
||||
Failing to produce any safe output triggers an automatic failure issue.
|
||||
- DO use cache memory to avoid re-processing repos already surveyed.
|
||||
- DO limit individual `web-fetch` calls (README fetches) to repos where the
|
||||
description alone is insufficient for classification.
|
||||
Loading…
Add table
Add a link
Reference in a new issue