Requesty

Best AI models for reasoning

GPQA Diamond is a set of graduate-level science questions written by domain experts and filtered so that PhD students with internet access still struggle. It's the most reliable signal we have for "does this model actually reason" vs "is it pattern-matching training data".

  1. 🥇
    Google LLC (Gemini API) logo
    gemini-3.1-pro-preview
    Google LLC (Gemini API)·$2.00 / $12.00 per 1M
    94.1%
  2. 🥈
    OpenAI Inc. logo
    gpt-5.5
    OpenAI Inc.·$5.00 / $30.00 per 1M
    93.5%
  3. 🥉
    minimax-m3
    MiniMax·$0.30 / $1.20 per 1M
    92.9%
  4. 4
    Alibaba Cloud logo
    qwen3.7-max
    Alibaba Cloud·$2.50 / $7.50 per 1M
    92.3%
  5. 5
    Google LLC (Vertex AI) logo
    gemini-3.5-flash@us
    Google LLC (Vertex AI)·$1.50 / $9.00 per 1M
    92.2%
  6. 6
    Anthropic PBC logo
    claude-opus-4-8
    Anthropic PBC·$5.00 / $25.00 per 1M
    92.0%
  7. 7
    OpenAI Inc. logo
    gpt-5.4
    OpenAI Inc.·$2.50 / $15.00 per 1M
    92.0%
  8. 8
    OpenAI Responses logo
    gpt-5.3-codex
    OpenAI Responses·$1.75 / $14.00 per 1M
    91.5%
  9. 9
    Anthropic PBC logo
    claude-opus-4-7
    Anthropic PBC·$5.00 / $25.00 per 1M
    91.4%
  10. 10
    Moonshot AI logo
    kimi-k2.6
    Moonshot AI·$0.95 / $4.00 per 1M
    91.1%
  11. 11
    Google LLC (Gemini API) logo
    gemini-3-pro-preview
    Google LLC (Gemini API)·$2.00 / $12.00 per 1M
    90.8%
  12. 12
    OpenAI Inc. logo
    gpt-5.2-chat
    OpenAI Inc.·$1.75 / $14.00 per 1M
    90.3%
  13. 13
    grok-4.3
    xAI Corp.·$1.25 / $2.50 per 1M
    90.1%
  14. 14
    OpenAI Responses logo
    gpt-5.2-codex
    OpenAI Responses·$1.75 / $14.00 per 1M
    89.9%
  15. 15
    Google LLC (Gemini API) logo
    gemini-3-flash-preview
    Google LLC (Gemini API)·$0.50 / $3.00 per 1M
    89.8%
  16. 16
    Anthropic PBC logo
    claude-opus-4-6
    Anthropic PBC·$5.00 / $25.00 per 1M
    89.6%
  17. 17
    DeepSeek logo
    deepseek-v4-flash
    DeepSeek·$0.14 / $0.28 per 1M
    89.4%
  18. 18
    Novita AI logo
    qwen/qwen3.5-397b-a17b
    Novita AI·$0.60 / $3.60 per 1M
    89.3%
  19. 19
    DeepSeek logo
    deepseek-v4-pro
    DeepSeek·$0.43 / $0.87 per 1M
    88.8%
  20. 20
    Alibaba Cloud logo
    qwen3.6-plus
    Alibaba Cloud·$0.50 / $3.00 per 1M
    88.2%
  21. 21
    Moonshot AI logo
    kimi-k2.5
    Moonshot AI·$0.60 / $3.00 per 1M
    87.9%
  22. 22
    grok-4
    xAI Corp.·$3.00 / $15.00 per 1M
    87.7%
  23. 23
    Anthropic PBC logo
    claude-sonnet-4-6
    Anthropic PBC·$3.00 / $15.00 per 1M
    87.5%
  24. 24
    OpenAI Inc. logo
    gpt-5.4-mini
    OpenAI Inc.·$0.75 / $4.50 per 1M
    87.5%
  25. 25
    MiniMax-M2.7
    MiniMax·$0.30 / $1.20 per 1M
    87.4%
  26. 26
    OpenAI Inc. logo
    gpt-5.1
    OpenAI Inc.·$1.25 / $10.00 per 1M
    87.3%
  27. 27
    Novita AI logo
    xiaomimimo/mimo-v2-pro
    Novita AI·$2.00 / $6.00 per 1M
    87.0%
  28. 28
    Z AI logo
    GLM-5.1
    Z AI·$1.40 / $4.40 per 1M
    86.8%
  29. 29
    Anthropic PBC logo
    claude-opus-4-5
    Anthropic PBC·$5.00 / $25.00 per 1M
    86.6%
  30. 30
    DeepInfra Inc. logo
    XiaomiMiMo/MiMo-V2.5-Pro
    DeepInfra Inc.·$1.00 / $3.00 per 1M
    86.6%

How we rank

Scores for GPQA Diamond come from Artificial Analysis, an independent AI benchmarking service. When a model is available through multiple providers (e.g. Anthropic direct, AWS Bedrock, Google Vertex), we show one canonical entry per model family so the ranking isn't polluted by duplicates. Benchmarks measure specific skills — always validate on your own workload before committing.

One API for every model on this list

Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.

Get started free