Latency leaderboard per provider, April 2026
Latency leaderboard, April 2026 (top 10 providers by volume)
Which AI provider has the lowest latency in April 2026? On the Requesty gateway, xAI led p50 at 0.6 s, with Novita (0.8 s), Azure (1.0 s) and Mistral (1.4 s) close behind. Vertex (Claude) was the slowest at 13.7 s, about 23× the fastest p50 and 2.8× the 4.9 s of Vertex (Gemini) on the same Vertex route. Anthropic-direct sat mid-pack at 5.8 s p50, with a long tail reaching 52.6 s at p95.
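The two ratios quoted above can be re-derived from the p50 figures in the Data table. The snippet below is a minimal sketch of that arithmetic; the dictionary and function names are illustrative and not part of any Requesty tooling.

```python
# p50 total latency in seconds, copied from the Data table in this report.
p50_s = {
    "xAI": 0.6,
    "Vertex (Gemini)": 4.9,
    "Vertex (Claude)": 13.7,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times larger the slower provider's p50 is."""
    return p50_s[slower] / p50_s[faster]


# Fastest-to-slowest spread across the leaderboard.
spread = ratio("Vertex (Claude)", "xAI")

# Same Vertex route, different model family.
vertex_split = ratio("Vertex (Claude)", "Vertex (Gemini)")

print(f"fastest-to-slowest spread: {spread:.1f}x")
print(f"Vertex Claude vs Gemini:   {vertex_split:.1f}x")
```

The spread rounds to 23× and the Vertex split to 2.8×, matching the figures in the text.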
Why it matters
Total p50 latency is dominated by workload type, not pure provider speed. The 23× spread is partly silicon and partly streaming behaviour, but mostly the size and tool-call complexity of the requests being sent. The Vertex (Claude) tail reflects heavy agentic Claude Code traffic, not slow inference. Reading the leaderboard literally, without that context, will mislead any provider-selection decision.
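To illustrate how workload mix alone can move a median, here is a small synthetic sketch. Every number in it is hypothetical (the token counts, the 1.0 s time to first token, the 100 tokens/s generation rate) and none of it is gateway data. It assumes a simple model in which total latency is time to first token plus output tokens divided by generation speed, applied to one and the same imaginary provider.

```python
from statistics import median


def total_latency_s(output_tokens: int,
                    ttft_s: float = 1.0,
                    tokens_per_s: float = 100.0) -> float:
    """Toy latency model: first-token delay plus generation time.

    Both defaults are made-up values for illustration only.
    """
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical output sizes (tokens) for two request mixes hitting
# the SAME imaginary provider at the SAME speed.
chat_mix = [50, 80, 120, 150, 200]
agentic_mix = [800, 1200, 1500, 2500, 4000]

chat_p50 = median(total_latency_s(n) for n in chat_mix)
agentic_p50 = median(total_latency_s(n) for n in agentic_mix)

print(f"chat p50:    {chat_p50:.1f} s")
print(f"agentic p50: {agentic_p50:.1f} s")
print(f"ratio:       {agentic_p50 / chat_p50:.1f}x")
```

In this toy setup the provider is identical in both cases, yet the agentic mix shows a p50 several times higher, purely because its requests generate more output.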
Key findings
- p50 spans 23× from fastest to slowest: xAI at 0.6 s to Vertex (Claude) at 13.7 s.
- Fast tier: xAI (0.6 s), Novita (0.8 s), Azure (1.0 s), Mistral (1.4 s).
- The Vertex split is striking: Vertex (Gemini) 4.9 s versus Vertex (Claude) 13.7 s. Same provider routing, very different workload weight.
- Frontier-Claude tier: Anthropic at 5.8 s p50. Long-tail variance is high, with Anthropic p95 at 52.6 s and DeepSeek p95 at 74.0 s.
- TTFT is decoupled from total latency. Azure is fastest to first token (0.6 s) while ranking third on total p50 (1.0 s).
- xAI is fast on total p50 but slowest to first token (3.27 s TTFT), which suggests buffered or non-streaming upstream behaviour.
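The decoupling between total latency and TTFT can be checked by ranking the same providers on each metric. The following is a minimal sketch using the values from the Data table; the variable names are illustrative, and how the gateway computes each median is not stated in this report.

```python
# (p50 total latency, p50 TTFT) in seconds, copied from the Data table.
rows = {
    "xAI": (0.6, 3.27),
    "Novita": (0.8, 3.10),
    "Azure": (1.0, 0.6),
    "Mistral": (1.4, 1.01),
    "OpenAI": (2.5, 1.84),
    "Bedrock": (2.8, 1.86),
    "Vertex (Gemini)": (4.9, 1.28),
    "Anthropic": (5.8, 2.14),
    "Moonshot": (5.9, 2.62),
    "DeepSeek": (9.0, 1.17),
    "Vertex (Claude)": (13.7, 1.44),
}

# Rank providers from fastest to slowest on each metric.
by_total = sorted(rows, key=lambda p: rows[p][0])
by_ttft = sorted(rows, key=lambda p: rows[p][1])

# Providers whose median TTFT is larger than their median total latency.
ttft_above_total = [p for p in by_total if rows[p][1] > rows[p][0]]

print("fastest total p50:", by_total[0])
print("fastest TTFT p50: ", by_ttft[0])
print("slowest TTFT p50: ", by_ttft[-1])
print("TTFT p50 > total p50:", ttft_above_total)
```

xAI ranks first on total p50 and last on TTFT, while Azure ranks first on TTFT. For xAI and Novita the TTFT median exceeds the total-latency median, which is consistent with the buffered or non-streaming behaviour suggested above.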
Data
| Provider | p50 latency | p95 latency | p50 TTFT |
|---|---|---|---|
| xAI | 600 ms | 10.9 s | 3.27 s |
| Novita | 800 ms | 18.5 s | 3.10 s |
| Azure | 1.00 s | 8.80 s | 600 ms |
| Mistral | 1.40 s | 9.80 s | 1.01 s |
| OpenAI | 2.50 s | 17.9 s | 1.84 s |
| Bedrock | 2.80 s | 23.8 s | 1.86 s |
| Vertex (Gemini) | 4.90 s | 27.2 s | 1.28 s |
| Anthropic | 5.80 s | 52.6 s | 2.14 s |
| Moonshot | 5.90 s | 64.1 s | 2.62 s |
| DeepSeek | 9.00 s | 74.0 s | 1.17 s |
| Vertex (Claude) | 13.7 s | 115.2 s | 1.44 s |
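One way to compare tail heaviness across providers is the ratio of p95 to p50. This ratio is a derived figure and does not appear in the report itself; the sketch below computes it from the table values, with illustrative names.

```python
# (p50, p95) total latency in seconds, copied from the Data table.
latency_s = {
    "xAI": (0.6, 10.9),
    "Novita": (0.8, 18.5),
    "Azure": (1.0, 8.8),
    "Mistral": (1.4, 9.8),
    "OpenAI": (2.5, 17.9),
    "Bedrock": (2.8, 23.8),
    "Vertex (Gemini)": (4.9, 27.2),
    "Anthropic": (5.8, 52.6),
    "Moonshot": (5.9, 64.1),
    "DeepSeek": (9.0, 74.0),
    "Vertex (Claude)": (13.7, 115.2),
}

# Tail ratio: how many times larger p95 is than p50 for each provider.
tail_ratio = {name: p95 / p50 for name, (p50, p95) in latency_s.items()}

heaviest = max(tail_ratio, key=tail_ratio.get)
lightest = min(tail_ratio, key=tail_ratio.get)

# Print providers from heaviest to lightest relative tail.
for name in sorted(tail_ratio, key=tail_ratio.get, reverse=True):
    print(f"{name:<16} p95/p50 = {tail_ratio[name]:.1f}x")
```

On this measure the providers with the smallest p50 (Novita, xAI) show the largest relative tails, whereas the largest absolute p95 values belong to Vertex (Claude), DeepSeek and Moonshot. Relative and absolute tails answer different questions, so both are worth reading alongside the workload caveat above.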
Cited in
- What the gateway saw in April 2026 (/blog/provider-trends-april-2026-agentic-share-latency)
