Add smtlib-benchmark-finder agentic workflow (#9629)

Monthly agentic workflow that discovers community-contributed SMT-LIB benchmark repositories on GitHub, excluding official distributions from smtlib.org and Zenodo, and posts a categorised summary as a GitHub Discussion. ## Workflow steps - **Exclusion list** — scrapes `smtlib.cs.uiowa.edu` + paginates the [Zenodo SMT-LIB community](https://zenodo.org/communities/smt-lib/) (up to 300 records) to extract official repo URLs; hard-excludes `SMT-LIB/*` and `SMT-Competition/*` orgs - **GitHub search** — 7 search strategies (topic tags `smtlib`/`smt-lib`/`smt2`, filename/path patterns, README mentions) scoped to repos pushed since the last run (90-day window on first run) - **Filter + classify** — removes official sets and noise (homework repos, empty repos); categorises survivors into: solver evaluation, verification, security/CTF, theory/logic, tool output, education, other - **Discussion report** — posts `[SMT-LIB Benchmarks] Community Benchmark Repository Survey — [Month YYYY]`; older discussions auto-closed after 90 days - **Cache** — stores classified repo sets and run date so subsequent runs are incremental ## Files - `.github/workflows/smtlib-benchmark-finder.md` — workflow definition - `.github/workflows/smtlib-benchmark-finder.lock.yml` — compiled GitHub Actions YAML (`gh aw compile`) --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-06-14 21:05:39 +00:00 · 2026-05-26 15:28:11 -07:00 · 2026-05-26 15:28:11 -07:00 · 29c4e539f6
commit 29c4e539f6
parent c3f5365a95
3 changed files with 1791 additions and 5 deletions
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@ -1,6 +1,8 @@
 version: 2
 updates:
-  - package-ecosystem: "github-actions"
+- directory: /
-    directory: "/"
+  ignore:
-    schedule:
+  - dependency-name: "github/gh-aw-actions/**" # Managed by gh aw compile. Version-locked to the gh-aw compiler; do not bump.
-      interval: "weekly"
+  package-ecosystem: github-actions
  schedule:
    interval: weekly
 version: 2
--- a/.github/workflows/smtlib-benchmark-finder.lock.yml
+++ b/.github/workflows/smtlib-benchmark-finder.lock.yml
--- a/.github/workflows/smtlib-benchmark-finder.md
+++ b/.github/workflows/smtlib-benchmark-finder.md
@ -0,0 +1,342 @@
 ---
 description: >
  Monthly SMTLIB Benchmark Finder.
  Searches GitHub for repositories containing SMT-LIB benchmarks (.smt2 files),
  excludes repositories that belong to the official SMT-LIB benchmark sets
  (linked from smtlib.org and hosted on Zenodo), and posts a curated summary
  of community-contributed benchmark links as a GitHub Discussion.
 on:
  schedule:
    - cron: "0 8 1 * *"
  workflow_dispatch:
 timeout-minutes: 60
 permissions: read-all
 network:
  allowed:
    - defaults
    - github
    - smtlib.cs.uiowa.edu
    - zenodo.org
 tools:
  cache-memory: true
  web-fetch: {}
  github:
    toolsets: [default, repos]
  bash: [":*"]
 safe-outputs:
  mentions: false
  allowed-github-references: []
  max-bot-mentions: 1
  create-discussion:
    title-prefix: "[SMT-LIB Benchmarks] "
    category: "Agentic Workflows"
    close-older-discussions: true
    expires: 90d
  missing-tool:
    create-issue: true
  noop:
    report-as-issue: false
 ---
 # SMTLIB Benchmark Finder
 ## Job Description
 Your name is ${{ github.workflow }}. You are a research analyst for the Z3 theorem
 prover repository `${{ github.repository }}`. Your mission is to discover GitHub
 repositories that host SMT-LIB benchmarks, exclude the ones that are already part
 of the official SMT-LIB benchmark distribution (linked from smtlib.org and published
 on Zenodo), and post a curated summary of community-contributed benchmark links as a
 GitHub Discussion.
 ## Step 1: Load Cache and Determine Run Mode
 Check cache memory for:
 - `official_repos`: set of GitHub repository full names (`owner/repo`) and Zenodo
  record IDs already identified as official SMT-LIB benchmark sets
 - `known_community_repos`: set of repo full names already listed in a previous report
 - `last_run_date`: ISO-8601 date string of the previous run
 Use the cache to skip repos already classified. On the very first run (no cache),
 perform a full discovery pass. On subsequent runs focus on repos pushed or created
 since `last_run_date`.
 ## Step 2: Collect Official SMT-LIB Benchmark Sets to Exclude
 ### 2.1 Scrape smtlib.org
 Fetch the SMT-LIB benchmarks page and extract all linked Zenodo DOIs and GitHub URLs:
 ```bash
 curl -s "https://smtlib.cs.uiowa.edu/benchmarks.shtml" -o /tmp/smtlib-benchmarks.html
 # Also try the main page and any mirror
 curl -s "https://smtlib.cs.uiowa.edu/" -o /tmp/smtlib-home.html
 ```
 Parse both files for:
 - Zenodo DOI links (`doi.org/10.5281/zenodo.*` or `zenodo.org/record/*`)
 - GitHub repository URLs (`github.com/...`)
 - Any other hosted benchmark archive links
 ### 2.2 Enumerate Zenodo SMT-LIB Community Records
 Query the Zenodo API for all records in the SMT-LIB community:
 ```bash
 curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=1" \
  -o /tmp/zenodo-smtlib-page1.json
 # Check if there are more pages (paginate until empty)
 curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=2" \
  -o /tmp/zenodo-smtlib-page2.json 2>/dev/null || true
 curl -s "https://zenodo.org/api/records?communities=smt-lib&size=100&page=3" \
  -o /tmp/zenodo-smtlib-page3.json 2>/dev/null || true
 ```
 For each Zenodo record extract:
 - Record ID (e.g. `5827900`)
 - Title
 - Any GitHub repository URLs listed in the description or related identifiers
 ```bash
 python3 - <<'PYEOF'
 import json, re
 def extract_github_repos(text):
    pattern = r'github\.com/([A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+)'
    return set(re.findall(pattern, text or ''))
 official_repos = set()
 official_zenodo_ids = set()
 for fname in ['/tmp/zenodo-smtlib-page1.json', '/tmp/zenodo-smtlib-page2.json',
              '/tmp/zenodo-smtlib-page3.json']:
    try:
        data = json.load(open(fname))
    except Exception:
        continue
    for hit in data.get('hits', {}).get('hits', []):
        rid = str(hit.get('id', ''))
        official_zenodo_ids.add(rid)
        metadata = hit.get('metadata', {})
        description = metadata.get('description', '')
        related = ' '.join(
            r.get('identifier', '')
            for r in metadata.get('related_identifiers', [])
        )
        title = metadata.get('title', '')
        for repo in extract_github_repos(description + ' ' + related + ' ' + title):
            official_repos.add(repo.lower().rstrip('.'))
 with open('/tmp/smtlib-benchmarks.html') as f:
    html = f.read()
 for repo in extract_github_repos(html):
    official_repos.add(repo.lower().rstrip('.'))
 print("OFFICIAL_ZENODO_IDS:", ','.join(sorted(official_zenodo_ids)) or '(none)')
 print("OFFICIAL_GITHUB_REPOS:", ','.join(sorted(official_repos)) or '(none)')
 PYEOF
 ```
 ### 2.3 Well-Known Official Repository Patterns
 Regardless of the above scrape, always exclude:
 - Any repo under the `SMT-LIB` GitHub organization (`SMT-LIB/*`)
 - Any repo whose name matches `smt-comp-*` that is under `SMT-Competition` org
 - The Z3 repo itself (`Z3Prover/z3`) and its forks
 Combine all official sources into a single exclusion set stored at
 `/tmp/official_exclusions.txt` (one `owner/repo` pattern per line, lowercase).
 ## Step 3: Search GitHub for Community SMT-LIB Benchmark Repositories
 Use multiple GitHub search strategies to find repos containing `.smt2` benchmark
 files that are NOT part of the official distribution. Search for repos updated
 since the last run date (or the last 90 days for the initial run).
 Compute the cutoff date first:
 ```bash
 CUTOFF=$(date -d "90 days ago" +%Y-%m-%d 2>/dev/null || date -v-90d +%Y-%m-%d)
 echo "Using cutoff date: $CUTOFF"
 ```
 ### Search Strategies
 Use the GitHub MCP server tools to run these searches:
 1. **Topic search**: `topic:smtlib pushed:>$CUTOFF`
 2. **Topic search variant**: `topic:smt-lib pushed:>$CUTOFF`
 3. **Topic search variant**: `topic:smt2 pushed:>$CUTOFF`
 4. **Benchmark filename pattern**: `filename:*.smt2 pushed:>$CUTOFF` (limit to top results)
 5. **Benchmarks directory pattern**: `path:benchmarks *.smt2 in:path pushed:>$CUTOFF`
 6. **README mention**: `SMT-LIB benchmarks in:readme stars:>2 pushed:>$CUTOFF`
 7. **Organization-level search**: repos under `SMT-Competition` org, if any are not already excluded
 For each search, collect: `full_name`, `html_url`, `description`, `stargazers_count`,
 `updated_at`, `pushed_at`, `default_branch`, `topics`.
 Deduplicate by `full_name`. Limit to 200 total candidates before filtering.
 ## Step 4: Filter Out Official Benchmark Sets
 For each candidate repo:
 1. Check if `full_name.lower()` is in the exclusion set from Step 2.
 2. Check if the repo is owned by `SMT-LIB` or `SMT-Competition` org (case-insensitive).
 3. Check if the repo is a fork of a known official repo (if `fork: true` and parent is
   in the exclusion set).
 Discard repos that match any of the above. Keep the rest as community benchmarks.
 Also apply quality filters to reduce noise — skip repos that:
 - Have 0 stars and fewer than 3 `.smt2` files (likely a student homework or test repo
  with minimal public value); use your judgement — if the repo description clearly
  describes a research benchmark, keep it regardless of star count.
 - Were created but never pushed after creation (empty repos).
 - Have names that are clearly course assignment repositories
  (e.g. contain `homework`, `assignment`, `hw[0-9]`, `cs[0-9]{3}`).
 ## Step 5: Classify Remaining Repos
 For each repo that survives filtering, classify it into one of these categories:
 | Category | Description |
 |----------|-------------|
 | **Solver evaluation** | Benchmarks used to evaluate or compare SMT solvers |
 | **Verification** | Benchmarks from program verification or model checking |
 | **Security / CTF** | Benchmarks from security research or CTF challenges |
 | **Theory / logic** | Benchmarks exploring specific SMT theories or logics |
 | **Tool output** | Benchmarks generated by another tool (e.g. a compiler, fuzzer) |
 | **Education** | Course materials or tutorials with benchmark examples |
 | **Other / unknown** | Does not fit another category |
 Base the classification on: repo description, README (fetch if web-fetch is available),
 topics, and directory structure. A single brief `web-fetch` of the repo's README is
 sufficient; do not fetch individual `.smt2` files.
 Note the dominant SMT logic(s) present, if discernible from the description or topics
 (e.g. QF_BV, QF_LIA, QF_S, NIA, …).
 ## Step 6: Generate the Discussion Report
 Create a GitHub Discussion. Use heading level 3 or deeper (`###`, `####`, …) for all
 section headers; never use `##` or `#` in the body.
 Wrap long tables in `<details>` tags to keep the report scannable.
 Title: `[SMT-LIB Benchmarks] Community Benchmark Repository Survey — [Month YYYY]`
 Structure the report as follows:
 ```markdown
 **Period covered**: [cutoff date] – [today's date]
 **Repositories found**: N community repos (after excluding M official sets)
 **New this run**: N (not listed in previous report)
 ### Overview
 1–2 sentences summarising the breadth of community SMT-LIB benchmarks found.
 ### Community Benchmark Repositories
 Use `###` for category headers. Within each category, list repos as a markdown table.
 For each repo include:
 - Repo link (`[owner/repo](html_url)`)
 - Stars
 - Last pushed date
 - Dominant logic(s) (if known)
 - Brief description (from repo description or README, max 120 chars)
 #### Solver Evaluation
 | Repository | ⭐ | Last pushed | Logic(s) | Description |
 |------------|-----|------------|---------|-------------|
 | [owner/repo](url) | N | YYYY-MM-DD | QF_BV, QF_LIA | … |
 #### Verification
 [same table structure]
 #### Security / CTF
 [same table structure]
 #### Theory / Logic
 [same table structure]
 #### Tool Output
 [same table structure]
 #### Education
 [same table structure]
 #### Other / Unknown
 [same table structure]
 ---
 ### Exclusions Applied
 <details>
 <summary>Official SMT-LIB sets excluded from this report</summary>
 List Zenodo record IDs and GitHub repos identified as official distributions.
 | Source | Identifier | Notes |
 |--------|-----------|-------|
 | Zenodo | [10.5281/zenodo.XXXXXXX](https://zenodo.org/record/XXXXXXX) | Official QF_BV benchmark set |
 | GitHub | [SMT-LIB/benchmarks-non-incremental](https://github.com/SMT-LIB/benchmarks-non-incremental) | |
 </details>
 ---
 ### Methodology
 Brief note on search queries used, cutoff date, and any quality filters applied.
 ```
 ## Step 7: Update Cache Memory
 After posting the discussion, update cache memory with:
 - `official_repos`: updated exclusion set (union of previous + newly found)
 - `known_community_repos`: union of previous + repos listed in this report
 - `last_run_date`: today's ISO-8601 date
 - `report_url`: URL of the GitHub Discussion created
 ## Guidelines
 - **Be conservative with exclusions**: when in doubt whether a repo is "official",
  keep it in the community list rather than silently dropping it.
 - **Be accurate**: only include repos that genuinely contain SMT-LIB `.smt2` files
  or clearly describe themselves as SMT-LIB benchmark collections.
 - **Avoid noise**: student homework repos and trivially small repos add clutter;
  apply the quality filters from Step 4 judiciously.
 - **No source code changes**: DO NOT create pull requests or modify any source files.
 - **No copyrighted content**: DO NOT reproduce benchmark file contents; only post
  links and metadata.
 - **Always cite sources**: include the full GitHub URL for every listed repository.
 - **Use cache**: skip repos already classified in a previous run to keep runtime short.
 - **Fail gracefully**: if GitHub search rate-limits the workflow, post whatever was
  collected so far with a note that the search was incomplete.
 ## Important Notes
 - DO NOT create pull requests or modify source files.
 - DO close older SMT-LIB Benchmarks discussions automatically (configured).
 - DO always call `create_discussion` or `noop` before the workflow ends.
  Failing to produce any safe output triggers an automatic failure issue.
 - DO use cache memory to avoid re-processing repos already surveyed.
 - DO limit individual `web-fetch` calls (README fetches) to repos where the
  description alone is insufficient for classification.