Best AI models for math
AIME 2024 is 15 competition math problems each solved by the top US high schoolers. Models need real symbolic reasoning to succeed — memorization gets you nowhere since solutions involve multiple novel steps.
- 🥇
gpt-5.4OpenAI Inc.·$2.50 / $15.00 per 1M94.8%94.8% - 🥈
gpt-5.2-chatOpenAI Inc.·$1.75 / $14.00 per 1M92.3%92.3% - 🥉
o3OpenAI Inc.·$2.00 / $8.00 per 1M91.6%91.6% - 4claude-opus-4-7Anthropic PBC·$5.00 / $25.00 per 1M91.5%91.5%
- 5grok-4xAI Corp.·$3.00 / $15.00 per 1M90.1%90.1%
- 6
gpt-5.1OpenAI Inc.·$1.25 / $10.00 per 1M90.1%90.1% - 7
gpt-5.2-codexOpenAI Responses·$1.75 / $14.00 per 1M89.4%89.4% - 8
gpt-5-chatOpenAI Inc.·$1.25 / $10.00 per 1M88.5%88.5% - 9
gemini-2.5-proGoogle LLC (Gemini API)·$1.25 / $10.00 per 1M88.0%88.0% - 10claude-opus-4-6Anthropic PBC·$5.00 / $25.00 per 1M87.3%87.3%
- 11grok-3xAI Corp.·$5.00 / $25.00 per 1M83.9%83.9%
- 12
o3-miniOpenAI Inc.·$1.10 / $4.40 per 1M83.2%83.2% - 13claude-opus-4-5Anthropic PBC·$5.00 / $25.00 per 1M83.1%83.1%
- 14claude-sonnet-4-6Anthropic PBC·$3.00 / $15.00 per 1M80.4%80.4%
- 15
kimi-k2Google LLC (Vertex AI)·$0.60 / $2.50 per 1M80.1%80.1% - 16
deepseek-ai/DeepSeek-R1Together AI Inc.·$3.00 / $7.00 per 1M79.8%79.8% - 17claude-sonnet-4-5Anthropic PBC·$3.00 / $15.00 per 1M76.1%76.1%
- 18
o1OpenAI Inc.·$15.00 / $60.00 per 1M74.4%74.4% - 19
gpt-4.1OpenAI Inc.·$2.00 / $8.00 per 1M72.4%72.4% - 20
gemini-2.5-flashGoogle LLC (Gemini API)·$0.30 / $2.50 per 1M72.1%72.1% - 21claude-sonnet-4Anthropic PBC·$3.00 / $15.00 per 1M69.4%69.4%
- 22MiniMax-M2MiniMax·$0.30 / $1.20 per 1M68.4%68.4%
- 23
gpt-4.1-miniOpenAI Inc.·$0.40 / $1.60 per 1M58.3%58.3% - 24
claude-3-7-sonnet@europe-west1Google LLC (Vertex AI)·$3.00 / $15.00 per 1M55.4%55.4% - 25claude-haiku-4-5Anthropic PBC·$1.00 / $5.00 per 1M52.1%52.1%
- 26
deepseek-ai/DeepSeek-V3Together AI Inc.·$1.25 / $1.25 per 1M39.2%39.2%
Explore other rankings
How we rank
Scores for AIME 2024 are sourced from official model cards, Artificial Analysis, and public leaderboards. When a model is available through multiple providers (e.g. Anthropic direct, AWS Bedrock, Google Vertex), we show one canonical entry per model family so the ranking isn't polluted by duplicates. Benchmarks measure specific skills — always validate on your own workload before committing.
One API for every model on this list
Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.
Get started free