How much LLM output is reasoning tokens?

How much of LLM output is reasoning/thinking tokens? In April 2026 on the Requesty gateway, Groq led at 82%, followed by Coding (79%), xAI (60%) and z.ai (51%). These routes are dominated by thinking models. Frontier routes ran around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%. Anthropic and Bedrock report 0% because Anthropic does not surface reasoning tokens separately; extended thinking is delivered inline. The industry narrative is "everything is reasoning now", but the data says reasoning is concentrated in a specific subset of routes, and even there, absolute volume is dwarfed by regular completion output. The Anthropic and Bedrock 0% is a measurement artefact, not a usage signal, which matters for any cost or quality comparison that relies on the reasoning-tokens column.

Which providers use the most reasoning models in 2026?

How much of LLM output is reasoning/thinking tokens? In April 2026 on the Requesty gateway, Groq led at 82%, followed by Coding (79%), xAI (60%) and z.ai (51%). These routes are dominated by thinking models. Frontier routes ran around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%. Anthropic and Bedrock report 0% because Anthropic does not surface reasoning tokens separately; extended thinking is delivered inline. The industry narrative is "everything is reasoning now", but the data says reasoning is concentrated in a specific subset of routes, and even there, absolute volume is dwarfed by regular completion output. The Anthropic and Bedrock 0% is a measurement artefact, not a usage signal, which matters for any cost or quality comparison that relies on the reasoning-tokens column.

Why does Anthropic show 0% reasoning tokens?

How much of LLM output is reasoning/thinking tokens? In April 2026 on the Requesty gateway, Groq led at 82%, followed by Coding (79%), xAI (60%) and z.ai (51%). These routes are dominated by thinking models. Frontier routes ran around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%. Anthropic and Bedrock report 0% because Anthropic does not surface reasoning tokens separately; extended thinking is delivered inline. The industry narrative is "everything is reasoning now", but the data says reasoning is concentrated in a specific subset of routes, and even there, absolute volume is dwarfed by regular completion output. The Anthropic and Bedrock 0% is a measurement artefact, not a usage signal, which matters for any cost or quality comparison that relies on the reasoning-tokens column.

Are AI agents mostly thinking or mostly responding?

How much of LLM output is reasoning/thinking tokens? In April 2026 on the Requesty gateway, Groq led at 82%, followed by Coding (79%), xAI (60%) and z.ai (51%). These routes are dominated by thinking models. Frontier routes ran around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%. Anthropic and Bedrock report 0% because Anthropic does not surface reasoning tokens separately; extended thinking is delivered inline. The industry narrative is "everything is reasoning now", but the data says reasoning is concentrated in a specific subset of routes, and even there, absolute volume is dwarfed by regular completion output. The Anthropic and Bedrock 0% is a measurement artefact, not a usage signal, which matters for any cost or quality comparison that relies on the reasoning-tokens column.

Data/Agentic workloads/Apr 2026

Reasoning-token share of provider output, April 2026

Name: Reasoning-token share of provider output, April 2026
Creator: Requesty
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Agentic workloads, LLM, gateway, provider, metrics, How much LLM output is reasoning tokens?, Which providers use the most reasoning models in 2026?, Why does Anthropic show 0% reasoning tokens?, Are AI agents mostly thinking or mostly responding?

How much of LLM output is reasoning/thinking tokens? In April 2026 on the Requesty gateway, Groq led at 82%, followed by Coding (79%), xAI (60%) and z.ai (51%). These routes are dominated by thinking models. Frontier routes ran around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%. Anthropic and Bedrock report 0% because Anthropic does not surface reasoning tokens separately; extended thinking is delivered inline.

Why it mattersThe industry narrative is "everything is reasoning now", but the data says reasoning is concentrated in a specific subset of routes, and even there, absolute volume is dwarfed by regular completion output. The Anthropic and Bedrock 0% is a measurement artefact, not a usage signal, which matters for any cost or quality comparison that relies on the reasoning-tokens column.

Period

Apr 2026

Updated

May 9, 2026

ID

reasoning-share-april-2026

§ 01

Key findings

01High-reasoning routes: Groq 82%, Coding 79%, xAI 60%, z.ai 51%.
02Frontier routes around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%.
03Vertex (Claude) does not appear here: Anthropic does not report reasoning tokens separately, so Claude thinking output is not counted.
04Azure at 18%, leans on GPT-4.1-class models more than the latest reasoning checkpoints.
05Anthropic, Bedrock, Mistral, Moonshot: 0%. Anthropic does not report reasoning tokens separately (thinking is inline). Mistral and Moonshot have no reasoning models routed.
06Industry narrative is "everything is reasoning now". The data says reasoning is concentrated in a specific subset of providers and even there the absolute volume is dwarfed by regular completion output.

§ 02

Data

Provider	Reasoning share(percent)
Groq	82.30%
Coding	79.00%
xAI	59.70%
z.ai	51.30%
Vertex (Gemini)	39.90%
Minimaxi	37.20%
OpenAI	35.90%
OpenAI Responses	32.50%
Azure	18.10%
Novita	3.00%
DeepSeek	2.70%

§ 03

Cite as

APA

Click to copy

BibTeX

Click to copy

§ 04

Cited in

What the gateway saw in April 2026/blog/provider-trends-april-2026-agentic-share-latency

ID: reasoning-share-april-2026·Updated May 9, 2026·Period Apr 2026