Best AI models for math
The Math Index aggregates competition and advanced math evaluations (including AIME). These problems require real symbolic reasoning across multiple novel steps — memorization gets a model nowhere.
- 🥇
gpt-5.2-chatOpenAI Inc.·$1.75 / $14.00 per 1M99.099.0 - 🥈
gpt-5-codexOpenAI Responses·$1.25 / $10.00 per 1M98.798.7 - 🥉
gemini-3-flash-previewGoogle LLC (Gemini API)·$0.50 / $3.00 per 1M97.097.0 - 4
xiaomimimo/mimo-v2-flashNovita AI·$0.10 / $0.30 per 1M96.396.3 - 5
gemini-3-pro-previewGoogle LLC (Gemini API)·$2.00 / $12.00 per 1M95.795.7 - 6
gpt-5.1-codexOpenAI Responses·$1.25 / $10.00 per 1M95.795.7 - 7
GLM-4.7Z AI·$0.60 / $2.20 per 1M95.095.0 - 8
kimi-k2Google LLC (Vertex AI)·$0.60 / $2.50 per 1M94.794.7 - 9
gpt-5OpenAI Inc.·$1.25 / $10.00 per 1M94.394.3 - 10
gpt-5.1-chatOpenAI Inc.·$1.25 / $10.00 per 1M94.094.0 - 11gpt-oss-120bGroq Inc.·$0.15 / $0.75 per 1M93.493.4
- 12grok-4xAI Corp.·$3.00 / $15.00 per 1M92.792.7
- 13
deepseek-v3.2Google LLC (Vertex AI)·$0.56 / $1.68 per 1M92.092.0 - 14claude-opus-4-5Anthropic PBC·$5.00 / $25.00 per 1M91.391.3
- 15
Qwen/Qwen3-235B-A22B-Instruct-2507DeepInfra Inc.·$0.07 / $0.10 per 1M91.091.0 - 16
o4-miniOpenAI Inc.·$1.10 / $4.40 per 1M90.790.7 - 17
gpt-5-miniOpenAI Inc.·$0.25 / $2.00 per 1M90.790.7 - 18grok-4-fastxAI Corp.·$0.20 / $0.50 per 1M89.789.7
- 19gpt-oss-20bGroq Inc.·$0.10 / $0.50 per 1M89.389.3
- 20
o3OpenAI Inc.·$2.00 / $8.00 per 1M88.388.3 - 21
gemini-2.5-proGoogle LLC (Gemini API)·$1.25 / $10.00 per 1M87.787.7 - 22
GLM-4.6Z AI·$0.60 / $2.20 per 1M86.086.0 - 23grok-3-minixAI Corp.·$0.30 / $0.50 per 1M84.784.7
- 24
gpt-5-nanoOpenAI Inc.·$0.05 / $0.40 per 1M83.783.7 - 25
zai-org/GLM-4.5-AirDeepInfra Inc.·$0.20 / $1.10 per 1M80.780.7 - 26MiniMax-M2MiniMax·$0.30 / $1.20 per 1M78.378.3
- 27
deepseek-r1-turboNovita AI·$0.70 / $2.50 per 1M76.076.0 - 28
gemini-2.5-flashGoogle LLC (Gemini API)·$0.30 / $2.50 per 1M73.373.3 - 29
deepseek-r1-distill-qwen-32bNovita AI·$0.30 / $0.30 per 1M63.063.0 - 30grok-3xAI Corp.·$5.00 / $25.00 per 1M58.058.0
Explore other rankings
Smartest overall
Ranked by Intelligence Index
Best for coding
Ranked by Coding Index
Best coding agent
Ranked by Terminal-Bench Hard
Best for reasoning
Ranked by GPQA Diamond
Best for tool use
Ranked by τ²-Bench
Best for knowledge
Ranked by MMLU Pro
Cheapest
Lowest input + output price per 1M tokens
Longest context
Max tokens in a single prompt
How we rank
Scores for Math Index come from Artificial Analysis, an independent AI benchmarking service. When a model is available through multiple providers (e.g. Anthropic direct, AWS Bedrock, Google Vertex), we show one canonical entry per model family so the ranking isn't polluted by duplicates. Benchmarks measure specific skills — always validate on your own workload before committing.
One API for every model on this list
Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.
Get started free