Best AI models for reasoning
GPQA Diamond is a set of graduate-level science questions written by domain experts and filtered so that PhD students with internet access still struggle. It's the most reliable signal we have for "does this model actually reason" vs "is it pattern-matching training data".
- 🥇
gemini-3.1-pro-previewGoogle LLC (Gemini API)·$2.00 / $12.00 per 1M94.1%94.1% - 🥈
gpt-5.5OpenAI Inc.·$5.00 / $30.00 per 1M93.5%93.5% - 🥉minimax-m3MiniMax·$0.30 / $1.20 per 1M92.9%92.9%
- 4
qwen3.7-maxAlibaba Cloud·$2.50 / $7.50 per 1M92.3%92.3% - 5
gemini-3.5-flash@usGoogle LLC (Vertex AI)·$1.50 / $9.00 per 1M92.2%92.2% - 6claude-opus-4-8Anthropic PBC·$5.00 / $25.00 per 1M92.0%92.0%
- 7
gpt-5.4OpenAI Inc.·$2.50 / $15.00 per 1M92.0%92.0% - 8
gpt-5.3-codexOpenAI Responses·$1.75 / $14.00 per 1M91.5%91.5% - 9claude-opus-4-7Anthropic PBC·$5.00 / $25.00 per 1M91.4%91.4%
- 10
kimi-k2.6Moonshot AI·$0.95 / $4.00 per 1M91.1%91.1% - 11
gemini-3-pro-previewGoogle LLC (Gemini API)·$2.00 / $12.00 per 1M90.8%90.8% - 12
gpt-5.2-chatOpenAI Inc.·$1.75 / $14.00 per 1M90.3%90.3% - 13grok-4.3xAI Corp.·$1.25 / $2.50 per 1M90.1%90.1%
- 14
gpt-5.2-codexOpenAI Responses·$1.75 / $14.00 per 1M89.9%89.9% - 15
gemini-3-flash-previewGoogle LLC (Gemini API)·$0.50 / $3.00 per 1M89.8%89.8% - 16claude-opus-4-6Anthropic PBC·$5.00 / $25.00 per 1M89.6%89.6%
- 17
deepseek-v4-flashDeepSeek·$0.14 / $0.28 per 1M89.4%89.4% - 18
qwen/qwen3.5-397b-a17bNovita AI·$0.60 / $3.60 per 1M89.3%89.3% - 19
deepseek-v4-proDeepSeek·$0.43 / $0.87 per 1M88.8%88.8% - 20
qwen3.6-plusAlibaba Cloud·$0.50 / $3.00 per 1M88.2%88.2% - 21
kimi-k2.5Moonshot AI·$0.60 / $3.00 per 1M87.9%87.9% - 22grok-4xAI Corp.·$3.00 / $15.00 per 1M87.7%87.7%
- 23claude-sonnet-4-6Anthropic PBC·$3.00 / $15.00 per 1M87.5%87.5%
- 24
gpt-5.4-miniOpenAI Inc.·$0.75 / $4.50 per 1M87.5%87.5% - 25MiniMax-M2.7MiniMax·$0.30 / $1.20 per 1M87.4%87.4%
- 26
gpt-5.1OpenAI Inc.·$1.25 / $10.00 per 1M87.3%87.3% - 27
xiaomimimo/mimo-v2-proNovita AI·$2.00 / $6.00 per 1M87.0%87.0% - 28
GLM-5.1Z AI·$1.40 / $4.40 per 1M86.8%86.8% - 29claude-opus-4-5Anthropic PBC·$5.00 / $25.00 per 1M86.6%86.6%
- 30
XiaomiMiMo/MiMo-V2.5-ProDeepInfra Inc.·$1.00 / $3.00 per 1M86.6%86.6%
Explore other rankings
How we rank
Scores for GPQA Diamond come from Artificial Analysis, an independent AI benchmarking service. When a model is available through multiple providers (e.g. Anthropic direct, AWS Bedrock, Google Vertex), we show one canonical entry per model family so the ranking isn't polluted by duplicates. Benchmarks measure specific skills — always validate on your own workload before committing.
One API for every model on this list
Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.
Get started free