Token-weighted tool_calls share per provider, April 2026
The same finish_reason mix, but weighted by output tokens instead of request count.
What share of LLM output tokens is spent on tool calls vs chat? In April 2026 on the Requesty gateway, Anthropic emitted 38.8% of its output tokens on `tool_calls` completions, which made up 54.2% of its requests, so its tool-call completions carried roughly 30% fewer output tokens than its average completion (38.8 / 54.2 ≈ 0.72). OpenAI Responses showed the opposite: 34.2% of tokens vs 26.4% of requests. Vertex (Claude) had the biggest negative gap (6.1% of tokens vs 27.6% of requests).
Why it matters
Counting requests overweights short tool-call payloads; counting tokens overweights long chat replies. Two providers with the same request-level agentic share can have wildly different agentic token shares, which matters for capacity planning, billing reconciliation, and any benchmark that aggregates over tokens rather than calls. Pick the wrong axis and the same provider can look 5× or more agentic, or that much less: xAI is 17.2% by tokens but 2.9% by requests, while Vertex (Gemini) is 1.5% by tokens but 14.1% by requests.
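To make the two weightings concrete, here is a minimal sketch of how both shares could be computed from per-request logs. The record layout (`provider`, `finish_reason`, `output_tokens`) and the sample values are hypothetical and chosen only for illustration; they are not Requesty's schema or data.

```python
from collections import defaultdict

# Hypothetical log records: (provider, finish_reason, output_tokens).
# Illustrative values only: three short tool calls and one long chat reply.
records = [
    ("provider_a", "tool_calls", 40),
    ("provider_a", "tool_calls", 60),
    ("provider_a", "tool_calls", 50),
    ("provider_a", "stop", 850),
]


def tool_call_shares(rows):
    """Return {provider: (request_share, token_share)} for tool_calls completions."""
    # Per provider: [requests, tool-call requests, tokens, tool-call tokens]
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for provider, finish_reason, output_tokens in rows:
        t = totals[provider]
        t[0] += 1
        t[2] += output_tokens
        if finish_reason == "tool_calls":
            t[1] += 1
            t[3] += output_tokens
    return {
        p: (t[1] / t[0], t[3] / t[2] if t[2] else 0.0)
        for p, t in totals.items()
    }


shares = tool_call_shares(records)
request_share, token_share = shares["provider_a"]
print(f"request share: {request_share:.1%}, token share: {token_share:.1%}")
# prints "request share: 75.0%, token share: 15.0%"
```

In this toy sample the same traffic is 75% agentic by request count and 15% agentic by output tokens, which is the kind of divergence the chart shows per provider.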
Key findings
- 01 Anthropic: 38.8% of output tokens vs 54.2% of requests. Tool-call completions carry about 30% fewer output tokens than the average Anthropic completion, so `tool_calls` payloads are compact.
- 02 OpenAI Responses: 34.2% of output tokens vs 26.4% of requests. The opposite shape: agentic completions emit more tokens than chat ones.
- 03 Vertex (Claude): 6.1% of tokens vs 27.6% of requests. The biggest negative gap on the chart. Claude on Vertex is dominated by lots of small tool-call payloads, while chat completions on the same route are heavy.
- 04 Vertex (Gemini): 1.5% of tokens vs 14.1% of requests. Same shape as Vertex (Claude) and more extreme in relative terms (token share is about a tenth of request share), although the absolute gap is smaller. Gemini chat replies are huge, so agentic completions barely register on the token-weighted view.
- 05 xAI: 17.2% of tokens vs 2.9% of requests. Few agentic calls, but each one is verbose.
- 06 OpenAI direct: 2.7% of tokens vs 3.4% of requests. The two views agree: there is barely any agentic load on this route in either framing.
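The pair of shares in each finding implies a relative completion size. A minimal sketch of that arithmetic, under stated assumptions: with token share t and request share r, the average tool-call completion is t / r times the provider's average completion, and (t / r) / ((1 - t) / (1 - r)) times the average of all other completions. Note that "other" here means every non-`tool_calls` finish reason, which may include more than plain chat replies. The function names are illustrative.

```python
def size_vs_average(token_share, request_share):
    """Average tool-call completion size relative to the provider's overall average."""
    return token_share / request_share


def size_vs_other(token_share, request_share):
    """Average tool-call completion size relative to all non-tool-call completions."""
    tool = token_share / request_share
    other = (1 - token_share) / (1 - request_share)
    return tool / other


# (token share, request share) taken from the findings above.
providers = {
    "Anthropic": (0.388, 0.542),
    "OpenAI Responses": (0.342, 0.264),
    "xAI": (0.172, 0.029),
}

for name, (t, r) in providers.items():
    print(f"{name}: {size_vs_average(t, r):.2f}x average, "
          f"{size_vs_other(t, r):.2f}x other completions")
```

For Anthropic this gives about 0.72 of the average completion and about 0.54 of the average non-tool-call completion, so the size of the difference depends on which baseline is used.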
Data
| Provider | Tool-call output-token share (%) | Tool-call request share (%) | Gap (token - request, percentage points) |
|---|---|---|---|
| Moonshot | 54.70% | 75.00% | -20.30% |
| Minimaxi | 52.50% | 50.80% | 1.70% |
| Anthropic | 38.80% | 54.20% | -15.40% |
| OpenAI Responses | 34.20% | 26.40% | 7.80% |
| Azure | 18.00% | 27.90% | -9.90% |
| xAI | 17.20% | 2.90% | 14.30% |
| Bedrock | 14.40% | 7.00% | 7.40% |
| Alibaba | 12.20% | 1.70% | 10.50% |
| Vertex (Claude) | 6.10% | 27.60% | -21.50% |
| Novita | 3.00% | 1.90% | 1.10% |
| OpenAI | 2.70% | 3.40% | -0.70% |
| Vertex (Gemini) | 1.50% | 14.10% | -12.60% |
| DeepSeek | 1.20% | 1.50% | -0.30% |
| Mistral | 1.00% | 1.90% | -0.90% |
| Nebius | 0.90% | 3.50% | -2.60% |
| Groq | 0.80% | 1.00% | -0.20% |
| DeepInfra | 0.30% | 0.10% | 0.20% |
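The gap column is the token share minus the request share. As a small sketch, the values from the table can be loaded and the gaps recomputed to find the providers where the two views disagree most; the variable names are illustrative.

```python
# (provider, tool-call token share %, tool-call request share %) from the table above.
rows = [
    ("Moonshot", 54.7, 75.0),
    ("Minimaxi", 52.5, 50.8),
    ("Anthropic", 38.8, 54.2),
    ("OpenAI Responses", 34.2, 26.4),
    ("Azure", 18.0, 27.9),
    ("xAI", 17.2, 2.9),
    ("Bedrock", 14.4, 7.0),
    ("Alibaba", 12.2, 1.7),
    ("Vertex (Claude)", 6.1, 27.6),
    ("Novita", 3.0, 1.9),
    ("OpenAI", 2.7, 3.4),
    ("Vertex (Gemini)", 1.5, 14.1),
    ("DeepSeek", 1.2, 1.5),
    ("Mistral", 1.0, 1.9),
    ("Nebius", 0.9, 3.5),
    ("Groq", 0.8, 1.0),
    ("DeepInfra", 0.3, 0.1),
]

# Gap in percentage points, rounded to one decimal to absorb float noise.
gaps = {name: round(tok - req, 1) for name, tok, req in rows}

most_negative = min(gaps, key=gaps.get)  # tool calls look smaller by tokens
most_positive = max(gaps, key=gaps.get)  # tool calls look larger by tokens

print(most_negative, gaps[most_negative])
print(most_positive, gaps[most_positive])
```

A negative gap means tool calls take a smaller share of tokens than of requests; a positive gap means the reverse.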
