Benchmark leaderboard performance

Model                      Score
GPT 5.4 (OpenAI)           67.2% ± 2.4%
Opus 4.6 (Anthropic)       65.7% ± 2.6%
GPT 5.2 Codex (OpenAI)     65.3% ± 2.6%
Gemini 3.1 Pro (Google)    65.1% ± 2.4%

4 of 8 results shown.