Agent-Logs-Url: https://github.com/Z3Prover/z3/sessions/35080496-0ba4-4fbc-a441-5f5c430e978c Co-authored-by: NikolajBjorner <3085284+NikolajBjorner@users.noreply.github.com>
4.4 KiB
TPTP Integration Test Report — copilot/integrate-tptp-into-src-shell
Date: 2026-05-12
Branch: copilot/integrate-tptp-into-src-shell
Z3 version: 4.17.0 (64-bit)
Test corpus: 500 randomly-selected TPTP v8.1.2 problems
Domains sampled: ARI (Arithmetic, n=247), SET (Set Theory, n=168), SYN (Syntactic, n=85)
Invocation: z3 -T:5 <file.p>
Note:
tptp.orgwas not accessible from the test sandbox (DNS blocked). Problems were drawn from the GoelandProver/GoelandBenchmarks repository, which mirrors TPTP v8.1.2 problem files in the same format.
Executive Summary
| Category | Count | Pct |
|---|---|---|
| Correct answer | 178 | 35.6% |
| Wrong answer | 25 | 5.0% |
| Parse error - negative literal | 52 | 10.4% |
| Parse error - rational literal | 40 | 8.0% |
| Parse error - TFF type declaration | 96 | 19.2% |
| Sort mismatch (TFF arithmetic) | 46 | 9.2% |
| Crash (unhandled z3_error) | 40 | 8.0% |
| Timeout (process killed at 5s) | 23 | 4.6% |
Overall correct rate: 178/500 = 35.6%
Timing
| Metric | Value |
|---|---|
| Minimum | 0.010 s |
| Maximum | 10.141 s |
| Mean | 1.099 s |
| Median | 0.029 s |
| Under 1 s | 435 (87.0%) |
| 1-5 s | 2 (0.4%) |
| Over 5 s | 63 (12.6%) |
Per-Domain Results
| Domain | Correct | Total | Rate |
|---|---|---|---|
| ARI (Arithmetic) | 0 | 247 | 0.0% |
| SET (Set Theory) | 113 | 168 | 67.3% |
| SYN (Syntactic) | 65 | 85 | 76.5% |
Bug Findings
Bug 1 - Negative integer literals not parsed (52 files)
The TFF parser rejects negative integer literals such as -2 or -4.
Example (ARI007_1.p):
tff(n2_less_2,conjecture, $less(-2,2) ).
Error: TPTP parse error: unexpected character '-' at 32:11
Every TFF problem using negative constants fails immediately.
Bug 2 - Rational/real literals not parsed (40 files)
The parser rejects rational literals written as p/q (TPTP syntax for rationals, e.g. 1/2).
Error pattern: TPTP parse error: unexpected character '/' at 32:11
Bug 3 - TFF user-type declarations parse error (96 files)
TFF problems that introduce predicate or function type signatures fail to parse.
Example (ARI094_1.p):
tff(p_type,type, p: $int > $o ).
Error: TPTP parse error: expected ')' at 32:19
The : $int > $o type product syntax is not supported.
Bug 4 - Sort mismatch for TFF typed variables (46 files)
When a TFF formula uses typed quantifier variables such as [X: $int], the TPTP frontend binds them to the uninterpreted sort U rather than Z3's Int sort, causing a sort mismatch when they are passed to arithmetic built-ins.
Example (ARI005_1.p):
tff(prove_12_less_something,conjecture, ? [X: $int] : $less(12,X) ).
Error: TPTP frontend error: Sort mismatch at argument #2 for function (declare-fun $less (U U) Bool) supplied sort: Int
Bug 5 - Arithmetic built-ins treated as uninterpreted (25 wrong answers)
Even when parsing succeeds, simple TFF arithmetic conjectures return CounterSatisfiable instead of Theorem. The arithmetic predicates $less, $lesseq, $greater, $greatereq are being declared as uninterpreted predicates over sort U, not mapped to Z3 built-in arithmetic relations.
Examples of wrong answers:
- ARI002_1.p:
~ $less(3,2)→ expected Theorem, got CounterSatisfiable - ARI014_1.p:
$lesseq(2,2)→ expected Theorem, got CounterSatisfiable - ARI015_1.p:
$less(2,3)→ expected Theorem, got CounterSatisfiable
Bug 6 - Unhandled z3_error exception on timeout (40 crashes)
When a problem hits the timeout, Z3 crashes with:
terminate called after throwing an instance of 'z3_error'
what(): timeout
instead of printing % SZS status Timeout and exiting cleanly. The TPTP frontend does not catch z3_error around the solver call.
SET and SYN Domain (67-77% Success)
The SET and SYN domains use purely first-order FOF (untyped) formulas with no arithmetic and work reasonably well. Main failure modes are timeouts on harder combinatorial problems.
Recommendations
- Fix negative/rational literal parsing in the TPTP5 lexer
- Map TFF arithmetic sorts ($int, $rat, $real) to Z3 Int/Real sorts
- Map arithmetic predicates ($less, $lesseq, etc.) to Z3 built-in arithmetic relations
- Support TFF type declaration syntax (name: sort > sort)
- Catch z3_error in the TPTP frontend and output SZS Timeout status instead of crashing