What the routing layer measured.

Empirical observations from Requesty's production gateway: latency distributions, caching behaviour, agentic tool-use patterns, and failure taxonomies across every provider we route to. Each dataset ships with methodology, machine-readable exports, and a citation block.

No. 001

Agentic workloads

11 datasets

finish_reason mix by provider

Which AI providers serve the most agentic traffic? In April 2026 Anthropic-direct returned `finish_reason = tool_calls` on 52% of successful completions on the Requesty gateway, about 2× the next provider and 17× higher than OpenAI direct. OpenAI Responses (26%), Vertex (Claude) (23%) and Azure (23%) formed a clear second tier. Splitting Vertex into Gemini and Claude cohorts shows the gap inside that route: Vertex (Claude) 23% vs Vertex (Gemini) 13%.

Anthropic tool_calls: 52%

Apr 2026 · View data →
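As a rough sketch of how a share like this can be computed from request logs: count successful completions per provider, then divide the ones that ended in `tool_calls` by that total. The record fields (`provider`, `finish_reason`, `success`) and the sample rows are assumptions for illustration, not the gateway's actual log schema or data.

```python
from collections import Counter


def tool_call_share(records):
    """Return {provider: share of successful completions ending in tool_calls}."""
    totals = Counter()
    tool_calls = Counter()
    for rec in records:
        if not rec["success"]:
            continue  # the metric is defined over successful completions only
        totals[rec["provider"]] += 1
        if rec["finish_reason"] == "tool_calls":
            tool_calls[rec["provider"]] += 1
    return {provider: tool_calls[provider] / n for provider, n in totals.items()}


# Made-up rows, for illustration only (not Requesty data).
sample = [
    {"provider": "anthropic", "finish_reason": "tool_calls", "success": True},
    {"provider": "anthropic", "finish_reason": "stop", "success": True},
    {"provider": "anthropic", "finish_reason": "tool_calls", "success": False},
    {"provider": "openai", "finish_reason": "stop", "success": True},
]
print(tool_call_share(sample))
```

Failed requests are dropped before counting, so a provider with many errors is not penalised or rewarded in this particular metric.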

finish_reason by model

Which AI models are used most for tool calling? In April 2026 Claude Opus 4.6 returned `finish_reason = tool_calls` 59% of the time on the Requesty gateway, making it the most agentic model on the platform. Gemini 2.5 Flash came second at 37%. Same-family Claude Sonnet 4.5 reached only 9%, and the entire OpenAI lineup (GPT-4o, GPT-4.1-mini, GPT-4.1-nano, GPT-5-mini) sat under 4%.

opus-4-6 tool_calls: 59%

Apr 2026 · View data →

Token-weighted tool_calls

What share of LLM output tokens is spent on tool calls vs chat? In April 2026 on the Requesty gateway, Anthropic emitted 38.8% of its output tokens on `tool_calls` completions vs 54.2% of requests, so agentic completions are roughly 30% smaller than the average completion. OpenAI Responses showed the opposite: 34.2% of tokens vs 26.4% of requests. Vertex (Claude) had the biggest negative gap (6.1% of tokens vs 27.6% of requests).

Anthropic token share: 38.8% of tokens

Apr 2026 · View data →
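The ratio of token share to request share gives the mean output size of a tool-call completion relative to the mean over all completions; dividing by the same ratio for the remaining completions compares it with non-tool-call completions instead. A minimal sketch of that arithmetic with the Anthropic figures quoted above, assuming both shares are measured over the same set of requests and that everything not ending in `tool_calls` is treated as one group:

```python
def relative_size(token_share, request_share):
    """Mean output size of tool-call completions against two baselines.

    Returns (size vs. the average completion, size vs. non-tool-call completions).
    """
    vs_average = token_share / request_share
    other_vs_average = (1 - token_share) / (1 - request_share)
    return vs_average, vs_average / other_vs_average


# Anthropic, April 2026: 38.8% of output tokens vs 54.2% of requests.
vs_average, vs_other = relative_size(0.388, 0.542)
print(f"{vs_average:.2f} of the average completion")   # 0.72
print(f"{vs_other:.2f} of a non-tool-call completion")  # 0.54
```

A value above 1.0 (as for OpenAI Responses, 0.342 / 0.264) means tool-call completions are larger than average rather than smaller.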

OSS family share

Which open-weight AI model is most popular in 2026? On the Requesty gateway, OSS-routed traffic went from Qwen-dominated in late 2025 (34-38% share in Nov-Dec) to DeepSeek-dominated in January 2026 (77% after the R1 launch), and back to a genuinely diversified state by April (DeepSeek 47%, Kimi 17%, MiniMax 15%). Qwen collapsed from 38% to under 4% almost overnight when DeepSeek R1 shipped.

DeepSeek share: 10% to 77% to 47%

Nov 2025 - Apr 2026 · View data →

Reasoning-token share

How much of LLM output is reasoning/thinking tokens? In April 2026 on the Requesty gateway, Groq led at 82%, followed by Coding (79%), xAI (60%) and z.ai (51%). These routes are dominated by thinking models. Frontier routes ran around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%. Anthropic and Bedrock report 0% because Anthropic does not surface reasoning tokens separately; extended thinking is delivered inline.

High-reasoning: Groq 82%

Apr 2026 · View data →

Cost per user by agent

How much does a typical coding agent user spend per month? Across nine agents observed over twelve months through the Requesty gateway, the weighted average rose from $14/month to $54/month ($91 for active users with 2+ active days). Claude Code active users average $108/month (median $23, P95 $296) in April 2026. Roo Code active users spend $79/month, OpenCode $104/month, and Cline $49/month.

Claude Code, Apr 2026: $78/month

May 2025 to Apr 2026 · View data →

Agent cache hit rate

Which coding agents use prompt caching most effectively? In April 2026, Claude Code led at 92% cache hit rate (cached_tokens / input_tokens), followed by OpenCode at 89%. Kilo Code sits at 46% with 62K avg input tokens. The gap is architectural: agents that maintain consistent context prefixes across sequential calls achieve dramatically higher cache reuse.

Cache leader: Claude Code 92%

Apr 2026 · View data →
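The hit-rate definition above (cached_tokens / input_tokens) can be sketched as follows. The aggregation is an assumption: this version divides summed cached tokens by summed input tokens, so long prompts weigh more than short ones; the per-call field names and the two sample calls are made up for illustration.

```python
def cache_hit_rate(calls):
    """Token-weighted cache hit rate: sum(cached_tokens) / sum(input_tokens)."""
    cached = sum(call["cached_tokens"] for call in calls)
    total = sum(call["input_tokens"] for call in calls)
    return cached / total if total else 0.0


# Hypothetical two-call session: the first call warms the cache,
# the second reuses most of the same prefix.
session = [
    {"input_tokens": 10_000, "cached_tokens": 0},
    {"input_tokens": 10_000, "cached_tokens": 9_000},
]
print(cache_hit_rate(session))
```

Under this definition a session only approaches a high hit rate when most calls reuse a long, stable prefix, which is consistent with the architectural explanation given above.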

Agent model share

How much of coding agent spend goes to Claude? In April 2026, Claude models power 79% to 100% of spend across all nine coding agents observed through the Requesty gateway. Claude Code is nearly 100% locked to Claude (expected, as Anthropic's own product). Zed is the most model-diverse at 59% Claude / 41% OpenAI. OpenCode has the highest non-Claude adoption among open-source agents at 13% OpenAI.

Claude dominance: 79% to 100%

Apr 2026 · View data →

Agent finish reasons

How do coding agent API calls end? In April 2026, Roo Code leads with 91% of calls finishing via tool_calls, the primary agentic pattern. Claude Code follows at 73%. Cline (81% stop) and Aider (87% stop) favor single-turn completions. Kilo Code shows 63% tool_calls and 28% stop, a balanced mix of agentic and single-turn patterns.

Most agentic: Roo Code 91%

Apr 2026 · View data →

Agent session depth

How many API calls does a single coding session make? In April 2026, Claude Code has the deepest sessions at 16 median calls per trace and reaches 209 calls at P95, reflecting complex multi-step coding workflows. Roo Code sessions are shallower at 11 median calls but more numerous (6,247 traces vs 594 for Claude Code).

Deepest sessions: Claude Code 16 calls

Apr 2026 · View data →
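One way to derive median and P95 calls per trace, sketched under the assumption that each API call carries a trace identifier and that one trace corresponds to one session. The percentile method (linear interpolation, `method="inclusive"`) is also an assumption; the dataset's methodology may use a different one.

```python
from collections import Counter
from statistics import median, quantiles


def session_depth(trace_ids):
    """Return (median, p95) of calls per trace from one trace id per call."""
    calls_per_trace = list(Counter(trace_ids).values())
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(calls_per_trace, n=100, method="inclusive")[94]
    return median(calls_per_trace), p95


# Made-up call log: trace "a" has 1 call, "b" has 2, "c" has 3.
calls = ["a", "b", "b", "c", "c", "c"]
print(session_depth(calls))
```

With heavy-tailed session lengths the median and P95 can differ by an order of magnitude, which is why both are reported.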

Agent streaming adoption

Do coding agents stream their API responses? In April 2026, most agents stream nearly 100% of calls. Aider is the major outlier at 22% streaming, preferring batch completions. Claude Code streams 93% of calls. Aider also has the highest reasoning token intensity at 82%, suggesting it relies on reasoning models in non-streaming mode.

Streaming leader: 100% (4 agents)

Apr 2026 · View data →

No. 002

Latency and performance

6 datasets

Latency leaderboard

Which AI provider has the lowest latency in April 2026? On the Requesty gateway xAI led p50 at 0.6 s, with Novita (0.8 s), Azure (1.0 s) and Mistral (1.4 s) close behind. Vertex (Claude) was the slowest at 13.7 s, 23× the fastest and 2.8× slower than Vertex (Gemini) at 4.9 s on the same Vertex route. Anthropic-direct sat mid-pack at 5.8 s with a 52.6 s p95 long tail.

p50 spread: 0.6 s to 13.7 s

Apr 2026 · View data →

Throughput density

How many tokens per second can each LLM provider sustain? In April 2026 on the Requesty gateway Groq led at 320 output tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon. Vertex (Gemini) was second at 130 tok/sec, Mistral 120 tok/sec; OSS aggregator routes (Nebius, Minimaxi, DeepInfra) clustered at 23-26 tok/sec; Bedrock was slowest at 15 tok/sec, 21× behind Groq.

Throughput leader: Groq 320 t/s

Apr 2026 · View data →

Streaming TTFT

Which AI provider has the fastest time-to-first-token? In April 2026 on streaming-and-successful Requesty requests, Azure led TTFT at 593 ms with a 960 ms p50 total, the streaming-UX winner on both axes. xAI was among the fastest on total latency (5.68 s) but slowest to first token (3.27 s), which suggests buffered upstream behaviour rather than true streaming. Vertex (Gemini) and Vertex (Claude) sit at very different points: Gemini totals 3.05 s, Claude totals 8.03 s on the same Vertex route.

Fastest TTFT: Azure 596 ms

Apr 2026 · View data →
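TTFT and total latency can be measured from the client side by timing the first chunk and the end of the stream. This is a minimal sketch, not the gateway's instrumentation: `measure_stream` is a hypothetical helper, and it assumes `chunks` is a lazy iterator so that the request effectively starts when iteration begins.

```python
import time


def measure_stream(chunks, clock=time.perf_counter):
    """Consume streamed chunks; return (ttft_seconds, total_seconds)."""
    start = clock()
    ttft = None
    for _ in chunks:
        if ttft is None:
            ttft = clock() - start  # first chunk has arrived
    total = clock() - start
    return ttft, total


# Deterministic demo with a fake clock: start at 0.0 s,
# first chunk at 0.5 s, stream finished at 2.0 s.
ticks = iter([0.0, 0.5, 2.0])
print(measure_stream(["Hel", "lo", "!"], clock=lambda: next(ticks)))
```

A route whose TTFT is a large fraction of its total latency is delivering most of the response at once, which is the buffered pattern described above.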

p50 latency YoY

Has LLM latency improved over the past year? On the Requesty gateway, open-source aggregator routes compressed dramatically between April 2025 and April 2026. xAI fell 93% (9.1 s to 0.6 s), DeepInfra 91% (15.8 s to 1.4 s), DeepSeek 62% (24.3 s to 9.2 s). Frontier providers barely moved (OpenAI -5%, Anthropic 0%). Vertex (Claude) is the only major route that got slower, +131%, as heavy agentic Claude Code workloads landed on it.

OSS YoY: -89% to -93%

Apr 2025 to Apr 2026 · View data →
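The year-over-year percentages follow directly from the quoted p50 values. A short check using only the figures stated above (`pct_change` is an illustrative helper name):

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# p50 latency in seconds: (April 2025, April 2026), as quoted above.
routes = {"xAI": (9.1, 0.6), "DeepInfra": (15.8, 1.4), "DeepSeek": (24.3, 9.2)}
for route, (before, after) in routes.items():
    print(f"{route}: {pct_change(before, after):+.0f}%")
# xAI: -93%
# DeepInfra: -91%
# DeepSeek: -62%
```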

Cache hit rate

Which AI providers have the highest prompt-cache hit rate? In April 2026 Anthropic-direct led the Requesty gateway at 77% (cached_tokens / input_tokens), Bedrock Claude was healthy at 57%, and Vertex (Claude) trailed at 24%: the same Claude model family, with a roughly 3× lower hit rate than Anthropic-direct. Vertex (Gemini) sat at 10% and Mistral at 4%, the floor among major routes.

Cache leader: Anthropic 77%

Apr 2026 · View data →

Claude Code latency by provider

How does Claude Code latency vary by cloud provider? In April 2026, Anthropic Haiku is the fastest at 1.8s median provider latency. Opus latency is remarkably consistent across providers (4.5-4.9s). Vertex Sonnet is the slowest at 6.2s, roughly 40% slower than the same model on Anthropic direct.

Apr 2026 · View data →

No. 003

Reliability and ops

4 datasets

Operational metrics

How reliable is each LLM provider in production? In April 2026 the top eight providers on the Requesty gateway (among them OpenAI, Anthropic, Vertex (Gemini), Bedrock, DeepSeek, Novita and xAI) sat at a 95-99% success rate. Azure trailed at 78%, Vertex (Claude) at 84%, Mistral at 86%, and Moonshot at 6%, a real reliability outlier. Streaming adoption is bimodal too: Azure 68%, Anthropic 57%, everyone else under 30%.

Reliability leader: xAI 99%

Apr 2026 · View data →

Provider errors

Why do LLM provider requests fail? Among April 2026 requests on the Requesty gateway where the upstream provider returned a non-success response, 65.8% were 429 (rate limit), 19.4% were 400 (bad request: schema mismatches, oversized payloads), and 9.4% were 403 (forbidden). 5xx availability incidents (503, 502, 529, 500, 504, 520) summed to ~4.8%. Router- and gateway-level rejections are filtered out so the chart shows only what providers themselves emit when they fail.

Top code: 429 (65.8%)

Apr 2026 · View data →
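The taxonomy above can be expressed as a small classifier over upstream HTTP status codes. The bucket names are hypothetical labels chosen for this sketch, and the sample list is made up; only the grouping (429, 400, 403, and the 5xx family including 529 and 520) comes from the text.

```python
from collections import Counter


def classify(status):
    """Map an upstream HTTP status code to a failure bucket."""
    if status == 429:
        return "rate_limit"
    if status == 400:
        return "bad_request"
    if status == 403:
        return "forbidden"
    if 500 <= status <= 599:
        return "availability_5xx"  # covers 500, 502, 503, 504, 520, 529
    return "other"


def error_shares(statuses):
    """Return {bucket: share of failed requests}."""
    counts = Counter(classify(s) for s in statuses)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


# Made-up provider failures, for illustration only.
print(error_shares([429, 429, 400, 503]))
```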

Policy vs direct reliability

How much does using a routing policy improve LLM reliability? In April 2026 the Requesty managed-fallback policy cohort hit 99.25% eventual success rate, vs 85.01% for users calling a single provider directly. That is a 14.2 pp lift, up from a +3.0 pp gap in January. Policy reliability held a tight 97.5-99.3% band across all four months while the direct cohort swung 12 pp; the widening is driven by direct-cohort regressions, not policy degradation.

Apr 2026 policy lead: +14.2 pp vs direct

Jan 2026 - Apr 2026 · View data →
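"Eventual success" under a fallback policy means a request counts as successful if any provider in the chain answers it. The sketch below illustrates that idea only; it is not Requesty's implementation, and `ProviderError`, `call_with_fallback`, and the two stub providers are hypothetical names invented for the example.

```python
class ProviderError(Exception):
    """Hypothetical error type for a failed upstream call."""


def call_with_fallback(providers, request):
    """Try each (name, send) pair in order; return the first success."""
    failures = []
    for name, send in providers:
        try:
            return name, send(request)
        except ProviderError as exc:
            failures.append((name, str(exc)))  # remember why this one failed
    raise ProviderError(f"all providers failed: {failures}")


def flaky(_request):
    # Stub standing in for a rate-limited provider.
    raise ProviderError("429 rate limited")


def healthy(request):
    # Stub standing in for a provider that answers normally.
    return f"ok: {request}"


print(call_with_fallback([("primary", flaky), ("backup", healthy)], "ping"))
```

A direct caller sees the first provider's failure as a failed request, whereas the chained caller only fails when every provider in the list does, which is one plausible reason the two cohorts diverge.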

Agent error rate

How reliable are AI coding agents? In April 2026, Roo Code leads with a 2.5% error rate across 147K calls. Claude Code sits at 7.0% across 494K calls. Forge trails at 11.2% across 1.1K calls. Kilo Code shows 10.0% error rate across 23K calls.

Most reliable: Roo Code 2.5%

Apr 2026 · View data →

For agents

Crawl /data/llms.txt for an indexed list of every dataset with abstracts and machine-readable links.

llms.txt
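For an agent consuming the index, a parser might look like the sketch below. It assumes the file follows the common llms.txt convention of markdown list items of the form `- [title](url): abstract`; the actual layout of `/data/llms.txt` is not shown here, and the sample text and `example.com` URLs are placeholders.

```python
import re

# One markdown list item: "- [title](url)" with an optional ": abstract" tail.
LINK = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<abstract>.*))?$"
)


def parse_llms_txt(text):
    """Extract dataset entries from llms.txt-style markdown."""
    entries = []
    for line in text.splitlines():
        match = LINK.match(line)
        if match:
            entries.append({
                "title": match["title"],
                "url": match["url"],
                "abstract": (match["abstract"] or "").strip(),
            })
    return entries


# Placeholder content, not the real index.
sample_index = """# Datasets

- [Latency leaderboard](https://example.com/data/latency.json): p50 and p95 by provider
- [Cache hit rate](https://example.com/data/cache.json)
"""
for entry in parse_llms_txt(sample_index):
    print(entry["title"], "->", entry["url"])
```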

For citations

Each dataset ships APA + BibTeX, a permanent slug, and revision history. Time-windowed slugs never break.

See an example

For analysts

Every dataset exports machine-readable JSON, CSV, and Markdown. Schemas are stable, units are explicit.

See JSON