finish_reason mix per model, April 2026
Successful completions only, normalized to 100%.
Which AI models are used most for tool calling? In April 2026, Claude Opus 4.6 returned `finish_reason = tool_calls` 59% of the time on the Requesty gateway, making it the most agentic model on the platform. Gemini 2.5 Flash came second at 37%. Same-family Claude Sonnet 4.5 managed only 9%, and the entire OpenAI lineup (GPT-4o, GPT-4.1-mini, GPT-4.1-nano, GPT-5-mini) sat under 4%.
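For context, `finish_reason` is the per-completion field an OpenAI-compatible gateway returns: `tool_calls` means the model ended its turn by requesting a tool invocation rather than emitting final text. A minimal sketch of reading it (payload heavily abbreviated; field names follow the OpenAI chat-completions convention, not anything Requesty-specific):

```python
# A trimmed OpenAI-style chat completion payload. Real responses carry
# many more fields; only the ones relevant to finish_reason are shown.
response = {
    "choices": [
        {
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "type": "function",
                        # get_weather is a hypothetical tool, for illustration
                        "function": {"name": "get_weather",
                                     "arguments": '{"city": "Oslo"}'},
                    }
                ],
            },
        }
    ]
}

def is_agentic_turn(resp):
    """True when the model ended its turn by requesting a tool call."""
    return resp["choices"][0]["finish_reason"] == "tool_calls"

print(is_agentic_turn(response))  # True
```

The stats in this post are exactly this check, tallied per model over a month of gateway traffic.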
Why it matters
Two models from the same provider can have completely different agentic profiles, which means choosing a frontier model for an agent based on brand alone is a coin flip. The headline "Anthropic is agentic" framing on the per-provider chart is really an Opus 4.6 effect: Sonnet 4.5 behaves more like a chat model in production traffic, despite both being marketed as agentic-capable.
Key findings
- claude-opus-4-6: 59% tool_calls. The single most agentic model on the platform.
- gemini-2.5-flash: 37% tool_calls. A mid-tier general-purpose model doing real agentic work.
- claude-sonnet-4-5: 9% tool_calls. Same provider, same family, dramatically less agentic.
- OpenAI lineup (gpt-4o, gpt-4.1-mini, gpt-4.1-nano, gpt-5-mini): all under 4% tool_calls.
- Practical implication: the "agentic provider" framing on the per-provider chart is really an "Opus 4.6 effect". Anthropic-direct looks agentic because Opus is.
Data
| Model | tool_calls (%) | stop (%) | length (%) |
|---|---|---|---|
| claude-opus-4-6 | 59.40 | 39.50 | 1.10 |
| gemini-2.5-flash | 36.60 | 61.20 | 2.10 |
| claude-sonnet-4-5 | 9.10 | 90.70 | 0.20 |
| gpt-5-mini | 3.50 | 94.00 | 2.40 |
| deepseek-chat | 0.50 | 97.20 | 2.30 |
| gpt-4o | 0.20 | 99.80 | 0.00 |
| gpt-4.1-mini | 0.20 | 99.80 | 0.00 |
| grok-4-1-fast | 0.10 | 99.80 | 0.10 |
| gpt-4.1-nano | 0.00 | 99.90 | 0.00 |
| gemini-2.5-flash-lite | 0.00 | 99.80 | 0.20 |
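The percentages above are shares of successful completions, normalized per model so each row sums to roughly 100%. A minimal sketch of that normalization, assuming each log line carries a `finish_reason` string (the log shape is an assumption, not Requesty's actual schema):

```python
from collections import Counter

def finish_reason_mix(reasons):
    """Normalize a list of finish_reason strings to percentage shares.

    Returns a dict mapping reason -> percent of successful completions,
    rounded to one decimal place, matching the table's precision."""
    counts = Counter(reasons)
    total = sum(counts.values())
    return {reason: round(100 * n / total, 1)
            for reason, n in counts.items()}

# Hypothetical log sample for one model: 6 tool_calls, 4 stop
sample = ["tool_calls"] * 6 + ["stop"] * 4
print(finish_reason_mix(sample))  # {'tool_calls': 60.0, 'stop': 40.0}
```

Rows that sum to 99.9% rather than 100% (e.g. gemini-2.5-flash) are the expected artifact of this per-category rounding.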
