3
0
Fork 0
mirror of https://github.com/Z3Prover/z3 synced 2026-05-16 23:25:36 +00:00
z3/tptp-test-report.md
copilot-swe-agent[bot] a07922f2b7
Add TPTP integration test report (500 files, z3 -T:5)
Agent-Logs-Url: https://github.com/Z3Prover/z3/sessions/35080496-0ba4-4fbc-a441-5f5c430e978c

Co-authored-by: NikolajBjorner <3085284+NikolajBjorner@users.noreply.github.com>
2026-05-12 17:40:32 +00:00

4.4 KiB

TPTP Integration Test Report — copilot/integrate-tptp-into-src-shell

Date: 2026-05-12
Branch: copilot/integrate-tptp-into-src-shell
Z3 version: 4.17.0 (64-bit)
Test corpus: 500 randomly-selected TPTP v8.1.2 problems
Domains sampled: ARI (Arithmetic, n=247), SET (Set Theory, n=168), SYN (Syntactic, n=85)
Invocation: z3 -T:5 <file.p>

Note: tptp.org was not accessible from the test sandbox (DNS blocked). Problems were drawn from the GoelandProver/GoelandBenchmarks repository, which mirrors TPTP v8.1.2 problem files in the same format.


Executive Summary

Category Count Pct
Correct answer 178 35.6%
Wrong answer 25 5.0%
Parse error - negative literal 52 10.4%
Parse error - rational literal 40 8.0%
Parse error - TFF type declaration 96 19.2%
Sort mismatch (TFF arithmetic) 46 9.2%
Crash (unhandled z3_error) 40 8.0%
Timeout (process killed at 5s) 23 4.6%

Overall correct rate: 178/500 = 35.6%


Timing

Metric Value
Minimum 0.010 s
Maximum 10.141 s
Mean 1.099 s
Median 0.029 s
Under 1 s 435 (87.0%)
1-5 s 2 (0.4%)
Over 5 s 63 (12.6%)

Per-Domain Results

Domain Correct Total Rate
ARI (Arithmetic) 0 247 0.0%
SET (Set Theory) 113 168 67.3%
SYN (Syntactic) 65 85 76.5%

Bug Findings

Bug 1 - Negative integer literals not parsed (52 files)

The TFF parser rejects negative integer literals such as -2 or -4.

Example (ARI007_1.p):

tff(n2_less_2,conjecture, $less(-2,2) ).

Error: TPTP parse error: unexpected character '-' at 32:11

Every TFF problem using negative constants fails immediately.


Bug 2 - Rational/real literals not parsed (40 files)

The parser rejects rational literals written as p/q (TPTP syntax for rationals, e.g. 1/2).

Error pattern: TPTP parse error: unexpected character '/' at 32:11


Bug 3 - TFF user-type declarations parse error (96 files)

TFF problems that introduce predicate or function type signatures fail to parse.

Example (ARI094_1.p):

tff(p_type,type, p: $int > $o ).

Error: TPTP parse error: expected ')' at 32:19

The : $int > $o type product syntax is not supported.


Bug 4 - Sort mismatch for TFF typed variables (46 files)

When a TFF formula uses typed quantifier variables such as [X: $int], the TPTP frontend binds them to the uninterpreted sort U rather than Z3's Int sort, causing a sort mismatch when they are passed to arithmetic built-ins.

Example (ARI005_1.p):

tff(prove_12_less_something,conjecture, ? [X: $int] : $less(12,X) ).

Error: TPTP frontend error: Sort mismatch at argument #2 for function (declare-fun $less (U U) Bool) supplied sort: Int


Bug 5 - Arithmetic built-ins treated as uninterpreted (25 wrong answers)

Even when parsing succeeds, simple TFF arithmetic conjectures return CounterSatisfiable instead of Theorem. The arithmetic predicates $less, $lesseq, $greater, $greatereq are being declared as uninterpreted predicates over sort U, not mapped to Z3 built-in arithmetic relations.

Examples of wrong answers:

  • ARI002_1.p: ~ $less(3,2) → expected Theorem, got CounterSatisfiable
  • ARI014_1.p: $lesseq(2,2) → expected Theorem, got CounterSatisfiable
  • ARI015_1.p: $less(2,3) → expected Theorem, got CounterSatisfiable

Bug 6 - Unhandled z3_error exception on timeout (40 crashes)

When a problem hits the timeout, Z3 crashes with:

terminate called after throwing an instance of 'z3_error'
  what():  timeout

instead of printing % SZS status Timeout and exiting cleanly. The TPTP frontend does not catch z3_error around the solver call.


SET and SYN Domain (67-77% Success)

The SET and SYN domains use purely first-order FOF (untyped) formulas with no arithmetic and work reasonably well. Main failure modes are timeouts on harder combinatorial problems.


Recommendations

  1. Fix negative/rational literal parsing in the TPTP5 lexer
  2. Map TFF arithmetic sorts ($int, $rat, $real) to Z3 Int/Real sorts
  3. Map arithmetic predicates ($less, $lesseq, etc.) to Z3 built-in arithmetic relations
  4. Support TFF type declaration syntax (name: sort > sort)
  5. Catch z3_error in the TPTP frontend and output SZS Timeout status instead of crashing