---
id: tool-call-token-share-april-2026
slug: tool-call-token-share-april-2026
title: "Token-weighted tool_calls share per provider, April 2026"
topic: agentic
period: Apr 2026
updated: 2026-05-09
license: CC BY 4.0
canonical: https://requesty.ai/data/tool-call-token-share-april-2026
---

# Token-weighted tool_calls share per provider, April 2026

> What share of LLM output tokens is spent on tool calls vs chat? In April 2026 on the Requesty gateway, Anthropic emitted 38.8% of its output tokens on `tool_calls` vs 54.2% of requests, so its agentic completions are roughly 30% smaller than the average completion. OpenAI Responses showed the opposite: 34.2% of tokens vs 26.4% of requests. Vertex (Claude) had the biggest negative gap (6.1% of tokens vs 27.6% of requests).

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-09.*

## Why it matters

Counting requests overweights short tool-call payloads; counting tokens overweights long chat replies. Two providers with the same request-level agentic share can have wildly different agentic token shares, which matters for capacity planning, billing reconciliation, and any benchmark that aggregates over tokens rather than calls. Pick the wrong axis and the same provider can look 5× more or less agentic than it actually is.
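The divergence is easy to reproduce with a toy workload (the request mix and token counts below are invented for illustration, not gateway data): three short tool calls plus one long chat reply yield a 75% request-weighted agentic share but only a 15% token-weighted share, exactly the kind of 5× split described above.

```python
# Hypothetical workload: 3 short tool-call completions, 1 long chat reply.
requests = [
    {"kind": "tool_call", "output_tokens": 50},
    {"kind": "tool_call", "output_tokens": 60},
    {"kind": "tool_call", "output_tokens": 40},
    {"kind": "chat", "output_tokens": 850},
]

tool = [r for r in requests if r["kind"] == "tool_call"]

# Request-weighted: fraction of completions that are tool calls.
request_share = len(tool) / len(requests)  # 3/4 = 0.75

# Token-weighted: fraction of output tokens spent on tool calls.
token_share = sum(r["output_tokens"] for r in tool) / sum(
    r["output_tokens"] for r in requests
)  # 150/1000 = 0.15

print(f"request-weighted agentic share: {request_share:.0%}")  # 75%
print(f"token-weighted agentic share:   {token_share:.0%}")    # 15%
```

Same traffic, same provider: one axis says the workload is three-quarters agentic, the other says it is barely agentic at all.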

## Questions this answers

- What share of AI output tokens is spent on tool calls?
- Are tool-call payloads bigger or smaller than chat replies?
- Why do request-counts and token-counts disagree on agentic share?
- Which providers have the most token-heavy tool calls?

## Key findings

1. Anthropic: 38.8% of output tokens vs 54.2% of requests. Agentic completions are ~30% smaller than the average completion; `tool_calls` payloads are compact.
2. OpenAI Responses: 34.2% of output tokens vs 26.4% of requests. The opposite shape: agentic completions emit more tokens than chat ones.
3. Vertex (Claude): 6.1% of tokens vs 27.6% of requests. The biggest negative gap on the chart. Claude on Vertex handles many small tool-call payloads, while chat completions on the same route are token-heavy.
4. Vertex (Gemini): 1.5% of tokens vs 14.1% of requests. Same shape as Vertex (Claude) but more extreme: Gemini chat replies are huge, so agentic completions barely register in the token-weighted view.
5. xAI: 17.2% of tokens vs 2.9% of requests. Few agentic calls, but each one is verbose.
6. OpenAI direct: 2.7% of tokens vs 3.4% of requests. The two views agree: there is barely any agentic load on this route in either framing.
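The gaps above can be restated as relative payload sizes. Since token share is (agentic tokens / all tokens) and request share is (agentic requests / all requests), the ratio token share ÷ request share equals the mean size of an agentic completion relative to the average completion on that route. A minimal sketch using four rows from the table below:

```python
# (token_share, request_share) as fractions, taken from the data table.
shares = {
    "Anthropic":        (0.388, 0.542),
    "OpenAI Responses": (0.342, 0.264),
    "Vertex (Claude)":  (0.061, 0.276),
    "xAI":              (0.172, 0.029),
}

for provider, (token_share, request_share) in shares.items():
    # Mean agentic completion size relative to the mean completion size:
    # (T_tool / N_tool) / (T / N) = token_share / request_share.
    ratio = token_share / request_share
    print(f"{provider:18s} agentic completion is {ratio:.2f}x the average")
```

Anthropic comes out at ~0.72× (the "~30% smaller" in finding 1), OpenAI Responses at ~1.3×, Vertex (Claude) at ~0.22×, and xAI at roughly 6×, matching the "few but verbose" reading.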

## Data

| Provider | Tool-call output-token share | Tool-call request share | Gap (token - request, pp) |
| --- | --- | --- | --- |
| Moonshot | 54.70% | 75.00% | -20.30% |
| Minimaxi | 52.50% | 50.80% | 1.70% |
| Anthropic | 38.80% | 54.20% | -15.40% |
| OpenAI Responses | 34.20% | 26.40% | 7.80% |
| Azure | 18.00% | 27.90% | -9.90% |
| xAI | 17.20% | 2.90% | 14.30% |
| Bedrock | 14.40% | 7.00% | 7.40% |
| Alibaba | 12.20% | 1.70% | 10.50% |
| Vertex (Claude) | 6.10% | 27.60% | -21.50% |
| Novita | 3.00% | 1.90% | 1.10% |
| OpenAI | 2.70% | 3.40% | -0.70% |
| Vertex (Gemini) | 1.50% | 14.10% | -12.60% |
| DeepSeek | 1.20% | 1.50% | -0.30% |
| Mistral | 1.00% | 1.90% | -0.90% |
| Nebius | 0.90% | 3.50% | -2.60% |
| Groq | 0.80% | 1.00% | -0.20% |
| DeepInfra | 0.30% | 0.10% | 0.20% |

## Cite as

**APA.** Requesty (2026). Token-weighted tool_calls share per provider, April 2026. Requesty Data. https://requesty.ai/data/tool-call-token-share-april-2026

```bibtex
@misc{requesty_tool_call_token_share_april_2026,
  author       = {{Requesty}},
  title        = {Token-weighted tool\_calls share per provider, April 2026},
  year         = {2026},
  howpublished = {\url{https://requesty.ai/data/tool-call-token-share-april-2026}},
  note         = {Requesty Data}
}
```

---

Downloads: [JSON](https://requesty.ai/data/tool-call-token-share-april-2026/data.json) · [CSV](https://requesty.ai/data/tool-call-token-share-april-2026/data.csv) · [Markdown](https://requesty.ai/data/tool-call-token-share-april-2026/data.md)