Add smtlib-benchmark-finder agentic workflow (#9629)

Monthly agentic workflow that discovers community-contributed SMT-LIB benchmark repositories on GitHub, excluding official distributions from smtlib.org and Zenodo, and posts a categorised summary as a GitHub Discussion. ## Workflow steps - **Exclusion list** — scrapes `smtlib.cs.uiowa.edu` + paginates the [Zenodo SMT-LIB community](https://zenodo.org/communities/smt-lib/) (up to 300 records) to extract official repo URLs; hard-excludes `SMT-LIB/*` and `SMT-Competition/*` orgs - **GitHub search** — 7 search strategies (topic tags `smtlib`/`smt-lib`/`smt2`, filename/path patterns, README mentions) scoped to repos pushed since the last run (90-day window on first run) - **Filter + classify** — removes official sets and noise (homework repos, empty repos); categorises survivors into: solver evaluation, verification, security/CTF, theory/logic, tool output, education, other - **Discussion report** — posts `[SMT-LIB Benchmarks] Community Benchmark Repository Survey — [Month YYYY]`; older discussions auto-closed after 90 days - **Cache** — stores classified repo sets and run date so subsequent runs are incremental ## Files - `.github/workflows/smtlib-benchmark-finder.md` — workflow definition - `.github/workflows/smtlib-benchmark-finder.lock.yml` — compiled GitHub Actions YAML (`gh aw compile`) --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-07-29 18:23:49 +00:00 · 2026-05-26 15:28:11 -07:00 · 2026-05-26 15:28:11 -07:00 · 29c4e539f6
commit 29c4e539f6
parent c3f5365a95
3 changed files with 1791 additions and 5 deletions
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@ -1,6 +1,8 @@
-version: 2
 updates:
-  - package-ecosystem: "github-actions"
-    directory: "/"
-    schedule:
-      interval: "weekly"
+- directory: /
+  ignore:
+  - dependency-name: "github/gh-aw-actions/**" # Managed by gh aw compile. Version-locked to the gh-aw compiler; do not bump.
+  package-ecosystem: github-actions
+  schedule:
+    interval: weekly
+version: 2
--- a/.github/workflows/smtlib-benchmark-finder.lock.yml
+++ b/.github/workflows/smtlib-benchmark-finder.lock.yml
--- a/.github/workflows/smtlib-benchmark-finder.md
+++ b/.github/workflows/smtlib-benchmark-finder.md
@ -0,0 +1,342 @@
+---
+description: >
+  Monthly SMTLIB Benchmark Finder.
+  Searches GitHub for repositories containing SMT-LIB benchmarks (.smt2 files),
+  excludes repositories that belong to the official SMT-LIB benchmark sets
+  (linked from smtlib.org and hosted on Zenodo), and posts a curated summary
+  of community-contributed benchmark links as a GitHub Discussion.
+
+on:
+  schedule:
+    - cron: "0 8 1 * *"
+  workflow_dispatch:
+
+timeout-minutes: 60
+
+permissions: read-all
+
+network:
+  allowed:
+    - defaults
+    - github
+    - smtlib.cs.uiowa.edu
+    - zenodo.org
+
+tools:
+  cache-memory: true
+  web-fetch: {}
+  github:
+    toolsets: [default, repos]
+  bash: [":*"]
+
+safe-outputs:
+  mentions: false
+  allowed-github-references: []
+  max-bot-mentions: 1
+  create-discussion:
+    title-prefix: "[SMT-LIB Benchmarks] "
+    category: "Agentic Workflows"
+    close-older-discussions: true
+    expires: 90d
+  missing-tool:
+    create-issue: true
+  noop:
+    report-as-issue: false
+
+---
+
+# SMTLIB Benchmark Finder
+
+## Job Description
+
+Your name is ${{ github.workflow }}. You are a research analyst for the Z3 theorem
+prover repository `${{ github.repository }}`. Your mission is to discover GitHub
+repositories that host SMT-LIB benchmarks, exclude the ones that are already part
+of the official SMT-LIB benchmark distribution (linked from smtlib.org and published
+on Zenodo), and post a curated summary of community-contributed benchmark links as a
+GitHub Discussion.
+
+## Step 1: Load Cache and Determine Run Mode
+
+Check cache memory for:
+- `official_repos`: set of GitHub repository full names (`owner/repo`) and Zenodo
+  record IDs already identified as official SMT-LIB benchmark sets
+- `known_community_repos`: set of repo full names already listed in a previous report
+- `last_run_date`: ISO-8601 date string of the previous run
+
+Use the cache to skip repos already classified. On the very first run (no cache),
+perform a full discovery pass. On subsequent runs focus on repos pushed or created
+since `last_run_date`.
+
+## Step 2: Collect Official SMT-LIB Benchmark Sets to Exclude
+
+### 2.1 Scrape smtlib.org
+
+Fetch the SMT-LIB benchmarks page and extract all linked Zenodo DOIs and GitHub URLs:
+
+```bash
+curl -s "https://smtlib.cs.uiowa.edu/benchmarks.shtml" -o /tmp/smtlib-benchmarks.html
+# Also try the main page and any mirror
+curl -s "https://smtlib.cs.uiowa.edu/" -o /tmp/smtlib-home.html
+```
+
+Parse both files for:
+- Zenodo DOI links (`doi.org/10.5281/zenodo.*` or `zenodo.org/record/*`)
+- GitHub repository URLs (`github.com/...`)
+- Any other hosted benchmark archive links
+
+### 2.2 Enumerate Zenodo SMT-LIB Community Records
+
+Query the Zenodo API for all records in the SMT-LIB community:
+
+```bash
+curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=1" \
+  -o /tmp/zenodo-smtlib-page1.json
+
+# Check if there are more pages (paginate until empty)
+curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=2" \
+  -o /tmp/zenodo-smtlib-page2.json 2>/dev/null || true
+
+curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=3" \
+  -o /tmp/zenodo-smtlib-page3.json 2>/dev/null || true
+```
+
+For each Zenodo record extract:
+- Record ID (e.g. `5827900`)
+- Title
+- Any GitHub repository URLs listed in the description or related identifiers
+
+```bash
+python3 - <<'PYEOF'
+import json, re
+
+def extract_github_repos(text):
+    pattern = r'github\.com/([A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+)'
+    return set(re.findall(pattern, text or ''))
+
+official_repos = set()
+official_zenodo_ids = set()
+
+for fname in ['/tmp/zenodo-smtlib-page1.json', '/tmp/zenodo-smtlib-page2.json',
+              '/tmp/zenodo-smtlib-page3.json']:
+    try:
+        data = json.load(open(fname))
+    except Exception:
+        continue
+    for hit in data.get('hits', {}).get('hits', []):
+        rid = str(hit.get('id', ''))
+        official_zenodo_ids.add(rid)
+        metadata = hit.get('metadata', {})
+        description = metadata.get('description', '')
+        related = ' '.join(
+            r.get('identifier', '')
+            for r in metadata.get('related_identifiers', [])
+        )
+        title = metadata.get('title', '')
+        for repo in extract_github_repos(description + ' ' + related + ' ' + title):
+            official_repos.add(repo.lower().rstrip('.'))
+
+with open('/tmp/smtlib-benchmarks.html') as f:
+    html = f.read()
+for repo in extract_github_repos(html):
+    official_repos.add(repo.lower().rstrip('.'))
+
+print("OFFICIAL_ZENODO_IDS:", ','.join(sorted(official_zenodo_ids)) or '(none)')
+print("OFFICIAL_GITHUB_REPOS:", ','.join(sorted(official_repos)) or '(none)')
+PYEOF
+```
+
+### 2.3 Well-Known Official Repository Patterns
+
+Regardless of the above scrape, always exclude:
+- Any repo under the `SMT-LIB` GitHub organization (`SMT-LIB/*`)
+- Any repo whose name matches `smt-comp-*` that is under `SMT-Competition` org
+- The Z3 repo itself (`Z3Prover/z3`) and its forks
+
+Combine all official sources into a single exclusion set stored at
+`/tmp/official_exclusions.txt` (one `owner/repo` pattern per line, lowercase).
+
+## Step 3: Search GitHub for Community SMT-LIB Benchmark Repositories
+
+Use multiple GitHub search strategies to find repos containing `.smt2` benchmark
+files that are NOT part of the official distribution. Search for repos updated
+since the last run date (or the last 90 days for the initial run).
+
+Compute the cutoff date first:
+```bash
+CUTOFF=$(date -d "90 days ago" +%Y-%m-%d 2>/dev/null || date -v-90d +%Y-%m-%d)
+echo "Using cutoff date: $CUTOFF"
+```
+
+### Search Strategies
+
+Use the GitHub MCP server tools to run these searches:
+
+1. **Topic search**: `topic:smtlib pushed:>$CUTOFF`
+2. **Topic search variant**: `topic:smt-lib pushed:>$CUTOFF`
+3. **Topic search variant**: `topic:smt2 pushed:>$CUTOFF`
+4. **Benchmark filename pattern**: `filename:*.smt2 pushed:>$CUTOFF` (limit to top results)
+5. **Benchmarks directory pattern**: `path:benchmarks *.smt2 in:path pushed:>$CUTOFF`
+6. **README mention**: `SMT-LIB benchmarks in:readme stars:>2 pushed:>$CUTOFF`
+7. **Organization-level search**: repos under `SMT-Competition` org, if any are not already excluded
+
+For each search, collect: `full_name`, `html_url`, `description`, `stargazers_count`,
+`updated_at`, `pushed_at`, `default_branch`, `topics`.
+
+Deduplicate by `full_name`. Limit to 200 total candidates before filtering.
+
+## Step 4: Filter Out Official Benchmark Sets
+
+For each candidate repo:
+
+1. Check if `full_name.lower()` is in the exclusion set from Step 2.
+2. Check if the repo is owned by `SMT-LIB` or `SMT-Competition` org (case-insensitive).
+3. Check if the repo is a fork of a known official repo (if `fork: true` and parent is
+   in the exclusion set).
+
+Discard repos that match any of the above. Keep the rest as community benchmarks.
+
+Also apply quality filters to reduce noise — skip repos that:
+- Have 0 stars and fewer than 3 `.smt2` files (likely a student homework or test repo
+  with minimal public value); use your judgement — if the repo description clearly
+  describes a research benchmark, keep it regardless of star count.
+- Were created but never pushed after creation (empty repos).
+- Have names that are clearly course assignment repositories
+  (e.g. contain `homework`, `assignment`, `hw[0-9]`, `cs[0-9]{3}`).
+
+## Step 5: Classify Remaining Repos
+
+For each repo that survives filtering, classify it into one of these categories:
+
+| Category | Description |
+|----------|-------------|
+| **Solver evaluation** | Benchmarks used to evaluate or compare SMT solvers |
+| **Verification** | Benchmarks from program verification or model checking |
+| **Security / CTF** | Benchmarks from security research or CTF challenges |
+| **Theory / logic** | Benchmarks exploring specific SMT theories or logics |
+| **Tool output** | Benchmarks generated by another tool (e.g. a compiler, fuzzer) |
+| **Education** | Course materials or tutorials with benchmark examples |
+| **Other / unknown** | Does not fit another category |
+
+Base the classification on: repo description, README (fetch if web-fetch is available),
+topics, and directory structure. A single brief `web-fetch` of the repo's README is
+sufficient; do not fetch individual `.smt2` files.
+
+Note the dominant SMT logic(s) present, if discernible from the description or topics
+(e.g. QF_BV, QF_LIA, QF_S, NIA, …).
+
+## Step 6: Generate the Discussion Report
+
+Create a GitHub Discussion. Use heading level 3 or deeper (`###`, `####`, …) for all
+section headers; never use `##` or `#` in the body.
+Wrap long tables in `<details>` tags to keep the report scannable.
+
+Title: `[SMT-LIB Benchmarks] Community Benchmark Repository Survey — [Month YYYY]`
+
+Structure the report as follows:
+
+```markdown
+**Period covered**: [cutoff date] – [today's date]
+**Repositories found**: N community repos (after excluding M official sets)
+**New this run**: N (not listed in previous report)
+
+### Overview
+
+1–2 sentences summarising the breadth of community SMT-LIB benchmarks found.
+
+### Community Benchmark Repositories
+
+Use `###` for category headers. Within each category, list repos as a markdown table.
+For each repo include:
+- Repo link (`[owner/repo](html_url)`)
+- Stars
+- Last pushed date
+- Dominant logic(s) (if known)
+- Brief description (from repo description or README, max 120 chars)
+
+#### Solver Evaluation
+
+| Repository | ⭐ | Last pushed | Logic(s) | Description |
+|------------|-----|------------|---------|-------------|
+| [owner/repo](url) | N | YYYY-MM-DD | QF_BV, QF_LIA | … |
+
+#### Verification
+
+[same table structure]
+
+#### Security / CTF
+
+[same table structure]
+
+#### Theory / Logic
+
+[same table structure]
+
+#### Tool Output
+
+[same table structure]
+
+#### Education
+
+[same table structure]
+
+#### Other / Unknown
+
+[same table structure]
+
+---
+
+### Exclusions Applied
+
+<details>
+<summary>Official SMT-LIB sets excluded from this report</summary>
+
+List Zenodo record IDs and GitHub repos identified as official distributions.
+
+| Source | Identifier | Notes |
+|--------|-----------|-------|
+| Zenodo | [10.5281/zenodo.XXXXXXX](https://zenodo.org/record/XXXXXXX) | Official QF_BV benchmark set |
+| GitHub | [SMT-LIB/benchmarks-non-incremental](https://github.com/SMT-LIB/benchmarks-non-incremental) | |
+
+</details>
+
+---
+
+### Methodology
+
+Brief note on search queries used, cutoff date, and any quality filters applied.
+```
+
+## Step 7: Update Cache Memory
+
+After posting the discussion, update cache memory with:
+- `official_repos`: updated exclusion set (union of previous + newly found)
+- `known_community_repos`: union of previous + repos listed in this report
+- `last_run_date`: today's ISO-8601 date
+- `report_url`: URL of the GitHub Discussion created
+
+## Guidelines
+
+- **Be conservative with exclusions**: when in doubt whether a repo is "official",
+  keep it in the community list rather than silently dropping it.
+- **Be accurate**: only include repos that genuinely contain SMT-LIB `.smt2` files
+  or clearly describe themselves as SMT-LIB benchmark collections.
+- **Avoid noise**: student homework repos and trivially small repos add clutter;
+  apply the quality filters from Step 4 judiciously.
+- **No source code changes**: DO NOT create pull requests or modify any source files.
+- **No copyrighted content**: DO NOT reproduce benchmark file contents; only post
+  links and metadata.
+- **Always cite sources**: include the full GitHub URL for every listed repository.
+- **Use cache**: skip repos already classified in a previous run to keep runtime short.
+- **Fail gracefully**: if GitHub search rate-limits the workflow, post whatever was
+  collected so far with a note that the search was incomplete.
+
+## Important Notes
+
+- DO NOT create pull requests or modify source files.
+- DO close older SMT-LIB Benchmarks discussions automatically (configured).
+- DO always call `create_discussion` or `noop` before the workflow ends.
+  Failing to produce any safe output triggers an automatic failure issue.
+- DO use cache memory to avoid re-processing repos already surveyed.
+- DO limit individual `web-fetch` calls (README fetches) to repos where the
+  description alone is insufficient for classification.