
chess-bench

13 models · 150 puzzles

Puzzle Browser
Model score: 88/150 (58.7%)
Showing all puzzle types.
150 puzzles, showing 1 / 150
Puzzle 1
Mate 1800-1200
Black to move and find checkmate in 1 move.
Last move: d2c3
Result: Correct
Expected: Qc2# (g2c2)
Model: Qc2# (g2c2)
Prompt tokens: 590
Completion tokens: 2,738
Cost: $0.000000
Latency: 22138 ms
Raw Output
g2c2
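The puzzle above lists both a SAN move (Qc2#) and a coordinate move (g2c2), and the raw output is the coordinate form. As a minimal sketch of how such an answer could be checked, the snippet below pulls the last UCI-style move out of a raw completion and compares it with the expected move. The names `extract_uci` and `is_correct`, and the rule "take the last move-like token, exact match", are assumptions for illustration; the page does not describe the grader chess-bench actually uses.

```python
import re
from typing import Optional

# UCI-style coordinate move: from-square, to-square, optional promotion piece.
UCI_MOVE = re.compile(r"\b([a-h][1-8][a-h][1-8][qrbn]?)\b")

def extract_uci(raw_output: str) -> Optional[str]:
    """Return the last UCI-looking move in the raw output, or None (hypothetical helper)."""
    matches = UCI_MOVE.findall(raw_output.lower())
    return matches[-1] if matches else None

def is_correct(raw_output: str, expected_uci: str) -> bool:
    """Assumed grading rule: exact match on the normalized UCI string."""
    return extract_uci(raw_output) == expected_uci.lower()

# Puzzle 1 from the page: expected g2c2, raw output g2c2.
print(is_correct("g2c2", "g2c2"))
```

A grader that also accepts SAN answers such as Qc2# would need the board position to resolve the move, which this sketch deliberately leaves out.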
Benchmarks
Sortable leaderboard across models
13 models
Total Cost: $54.4444
Total Tokens: 21,202,143
Rank  Model                           Accuracy  Track 1  Track 2  Track 3  Track 4  Track 5  Cost     Tokens
#1    Grok 4.1 Fast                   58.7%     33.3%    53.3%    93.3%    73.3%    40%      2.3765   2,294,216
#2    Gemini 3.1 Pro Preview          55.3%     43.3%    50%      83.3%    56.7%    43.3%    4.5200   433,511
#3    Grok 4.20 Beta                  53.3%     20%      43.3%    90%      83.3%    30%      7.8989   1,730,162
#4    Gemini 3 Flash Preview          48.7%     36.7%    30%      83.3%    53.3%    40%      0.8700   624,635
#5    Gemini 3.1 Flash Image Preview  46.0%     30%      33.3%    90%      40%      36.7%    2.4400   481,223
#6    GPT-5.4                         32.7%     6.7%     23.3%    76.7%    43.3%    13.3%    7.6800   607,885
#7    Qwen 3.6 Plus                   28.0%     20%      10%      86.7%    16.7%    6.7%     0.0000   7,667,154
#8    Qwen 3.6 Plus Preview           22.0%     13.3%    23.3%    56.7%    16.7%    0%       0.0000   4,428,821
#9    Claude Opus 4.6                 16.7%     13.3%    6.7%     46.7%    10%      6.7%     13.0000  617,547
#10   GLM-5                           12.0%     20%      6.7%     33.3%    0%       0%       1.8700   689,388
#11   Claude Sonnet 4.6               10.7%     3.3%     6.7%     40%      3.3%     0%       8.9400   704,037
#12   Claude Haiku 4.5                8.7%      13.3%    0%       30%      0%       0%       0.1490   359,946
#13   Gemini 2.5 Pro                  5.3%      10%      0%       16.7%    0%       0%       4.7000   563,618
Sort by accuracy and track-level breakdown; includes total cost and total tokens.
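The overall accuracy can be cross-checked against the track-level breakdown. The sketch below assumes the 150 puzzles are split evenly into five tracks of 30 puzzles each; the page does not state this, but it is consistent with the listed percentages. It turns each track percentage back into a solved count and recomputes the overall figure. The helper names and the three-row sample are illustrative only.

```python
PUZZLES_PER_TRACK = 30  # assumption: 150 puzzles split evenly over 5 tracks

def solved_count(track_pcts, per_track=PUZZLES_PER_TRACK):
    """Convert rounded track percentages back to a total solved count."""
    return sum(round(p / 100 * per_track) for p in track_pcts)

def overall_accuracy(track_pcts, per_track=PUZZLES_PER_TRACK):
    """Overall accuracy in percent, rounded to one decimal place."""
    total = per_track * len(track_pcts)
    return round(100 * solved_count(track_pcts, per_track) / total, 1)

# Track percentages copied from three leaderboard rows.
rows = {
    "Grok 4.1 Fast": [33.3, 53.3, 93.3, 73.3, 40.0],
    "Gemini 3.1 Pro Preview": [43.3, 50.0, 83.3, 56.7, 43.3],
    "Claude Opus 4.6": [13.3, 6.7, 46.7, 10.0, 6.7],
}

# Sort by recomputed accuracy, highest first, as the leaderboard does.
for name in sorted(rows, key=lambda n: overall_accuracy(rows[n]), reverse=True):
    print(name, solved_count(rows[name]), overall_accuracy(rows[name]))
```

Under that assumption the first row works out to 88 solved puzzles, matching the 88/150 shown in the puzzle browser.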