Best AI models for reasoning
GPQA Diamond is a set of graduate-level science questions written by domain experts and filtered so that PhD students with internet access still struggle. It's the most reliable signal we have for "does this model actually reason" vs "is it pattern-matching training data".
- 🥇grok-4xAI Corp.·$3.00 / $15.00 per 1M87.5%87.5%
- 🥈
gpt-5.4OpenAI Inc.·$2.50 / $15.00 per 1M86.5%86.5% - 🥉
gpt-5.2OpenAI Inc.·$1.75 / $14.00 per 1M84.8%84.8% - 4
gemini-2.5-proGoogle LLC (Gemini API)·$1.25 / $10.00 per 1M84.0%84.0% - 5claude-opus-4-7Anthropic PBC·$5.00 / $25.00 per 1M83.4%83.4%
- 6
o3OpenAI Inc.·$2.00 / $8.00 per 1M83.3%83.3% - 7
gpt-5.1OpenAI Inc.·$1.25 / $10.00 per 1M83.2%83.2% - 8
gpt-5.2-codexOpenAI Responses·$1.75 / $14.00 per 1M82.1%82.1% - 9
gpt-5-chatOpenAI Inc.·$1.25 / $10.00 per 1M81.7%81.7% - 10claude-opus-4-6Anthropic PBC·$5.00 / $25.00 per 1M81.2%81.2%
- 11claude-opus-4-5Anthropic PBC·$5.00 / $25.00 per 1M79.6%79.6%
- 12
o1OpenAI Inc.·$15.00 / $60.00 per 1M78.0%78.0% - 13claude-sonnet-4-6Anthropic PBC·$3.00 / $15.00 per 1M76.8%76.8%
- 14grok-3xAI Corp.·$5.00 / $25.00 per 1M75.4%75.4%
- 15
o3-miniOpenAI Inc.·$1.10 / $4.40 per 1M74.8%74.8% - 16claude-sonnet-4-5Anthropic PBC·$3.00 / $15.00 per 1M74.2%74.2%
- 17
gpt-4.1OpenAI Inc.·$2.00 / $8.00 per 1M74.1%74.1% - 18
deepseek-ai/DeepSeek-R1Together AI Inc.·$3.00 / $7.00 per 1M71.5%71.5% - 19claude-sonnet-4Anthropic PBC·$3.00 / $15.00 per 1M70.1%70.1%
- 20
kimi-k2Google LLC (Vertex AI)·$0.60 / $2.50 per 1M70.0%70.0% - 21
gemini-2.5-flashGoogle LLC (Gemini API)·$0.30 / $2.50 per 1M68.3%68.3% - 22
claude-3-7-sonnet@us-east5Google LLC (Vertex AI)·$3.00 / $15.00 per 1M65.2%65.2% - 23
gpt-4.1-miniOpenAI Inc.·$0.40 / $1.60 per 1M64.8%64.8% - 24MiniMax-M2MiniMax·$0.30 / $1.20 per 1M62.5%62.5%
- 25claude-haiku-4-5Anthropic PBC·$1.00 / $5.00 per 1M62.4%62.4%
- 26
deepseek-ai/DeepSeek-V3Together AI Inc.·$1.25 / $1.25 per 1M59.1%59.1% - 27
gpt-4oOpenAI Inc.·$2.50 / $10.00 per 1M53.6%53.6% - 28
meta-llama/llama-3.3-70b-instructNovita AI·$0.39 / $0.39 per 1M50.5%50.5% - 29
gpt-4.1-nanoOpenAI Inc.·$0.10 / $0.40 per 1M47.3%47.3%
Explore other rankings
How we rank
Scores for GPQA Diamond are sourced from official model cards, Artificial Analysis, and public leaderboards. When a model is available through multiple providers (e.g. Anthropic direct, AWS Bedrock, Google Vertex), we show one canonical entry per model family so the ranking isn't polluted by duplicates. Benchmarks measure specific skills — always validate on your own workload before committing.
One API for every model on this list
Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.
Get started free