---
id: latency-leaderboard-april-2026
slug: provider-latency-leaderboard-april-2026
title: "Latency leaderboard per provider, April 2026"
topic: latency
period: Apr 2026
updated: 2026-05-09
license: CC BY 4.0
canonical: https://requesty.ai/data/provider-latency-leaderboard-april-2026
---

# Latency leaderboard per provider, April 2026

> Which AI provider has the lowest latency in April 2026? On the Requesty gateway xAI led p50 at 0.6 s, with Novita (0.8 s), Azure (1.0 s) and Mistral (1.4 s) close behind. Vertex (Claude) was the slowest at 13.7 s, 23× the fastest and 2.8× the 4.9 s that Vertex (Gemini) posted on the same Vertex route. Anthropic direct sat mid-pack at 5.8 s with a 52.6 s p95 long tail.

*Topic: Latency and performance. Period: Apr 2026. Last updated 2026-05-09.*

## Why it matters

Total p50 latency is dominated by workload type, not pure provider speed. The 23× spread is partly silicon, partly streaming behaviour, but mostly the size and tool-call complexity of requests being sent. The Vertex-Claude tail is heavy agentic Claude Code traffic, not slow inference. Reading the leaderboard literally without that context will mislead any provider-selection decision.

## Questions this answers

- Which LLM provider has the lowest latency in 2026?
- What is the fastest LLM provider for chat completions?
- Why is Vertex Claude so slow compared to Anthropic direct?
- What is the p95 latency of OpenAI vs Anthropic?

## Key findings

1. p50 spans 23× from fastest to slowest: xAI 0.6 s to Vertex (Claude) 13.7 s.
2. Fast tier: xAI (0.6 s), Novita (0.8 s), Azure (1.0 s), Mistral (1.4 s).
3. Vertex split is striking: Vertex (Gemini) 4.9 s, Vertex (Claude) 13.7 s. Same provider routing, very different workload weight.
4. Frontier-Claude tier: Anthropic direct at 5.8 s, with heavy long-tail variance further out (Anthropic p95 52.6 s, DeepSeek p95 74.0 s).
5. TTFT is decoupled from total latency: Azure is fastest to first token (0.6 s) despite a 1.0 s total p50.
6. xAI is fast on total latency but slow to first token (3.27 s TTFT), which suggests buffered or non-streaming upstream behaviour.

## Data

| Provider | p50 total latency | p95 total latency | p50 TTFT |
| --- | --- | --- | --- |
| xAI | 0.60 s | 10.9 s | 3.27 s |
| Novita | 0.80 s | 18.5 s | 3.10 s |
| Azure | 1.00 s | 8.80 s | 0.60 s |
| Mistral | 1.40 s | 9.80 s | 1.01 s |
| OpenAI | 2.50 s | 17.9 s | 1.84 s |
| Bedrock | 2.80 s | 23.8 s | 1.86 s |
| Vertex (Gemini) | 4.90 s | 27.2 s | 1.28 s |
| Anthropic | 5.80 s | 52.6 s | 2.14 s |
| Moonshot | 5.90 s | 64.1 s | 2.62 s |
| DeepSeek | 9.00 s | 74.0 s | 1.17 s |
| Vertex (Claude) | 13.7 s | 115.2 s | 1.44 s |
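
As a quick cross-check against the table, a minimal Python sketch along these lines reproduces the headline ratios from the CSV download. The column names `provider` and `p50_latency_ms` are assumptions about the file's schema, not a documented contract; adjust them to the actual header.

```python
import csv
import io
import urllib.request

URL = "https://requesty.ai/data/provider-latency-leaderboard-april-2026/data.csv"

# Assumed columns: provider, p50_latency_ms. Adjust if the published schema differs.
with urllib.request.urlopen(URL) as resp:
    rows = list(csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8")))

p50_ms = {row["provider"]: float(row["p50_latency_ms"]) for row in rows}

fastest = min(p50_ms, key=p50_ms.get)   # expected: xAI at ~600 ms
slowest = max(p50_ms, key=p50_ms.get)   # expected: Vertex (Claude) at ~13,700 ms

print(f"p50 spread: {slowest} / {fastest} = {p50_ms[slowest] / p50_ms[fastest]:.0f}x")
print(f"Vertex split: {p50_ms['Vertex (Claude)'] / p50_ms['Vertex (Gemini)']:.1f}x")
```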

## Caveats

- TTFT (first_token_latency_ns) was not populated before 2026, so no year-over-year TTFT comparison is possible.
- p95 is highly sensitive to the tail of long completions; treat it as an upper bound for "what the worst 5% of users feel" rather than a steady-state operating point.
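
To illustrate that second caveat, here is a small synthetic sketch (made-up latency distributions, not Requesty data) showing how a ~5% slice of long agentic completions barely moves p50 while pulling p95 far out:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic total latencies in seconds: a bulk of short chat completions
# plus a small slice of long agentic runs. Numbers are illustrative only.
chat = rng.lognormal(mean=0.5, sigma=0.6, size=9_500)    # median ~1.6 s
agentic = rng.lognormal(mean=3.5, sigma=0.5, size=500)   # median ~33 s
mixed = np.concatenate([chat, agentic])

for label, sample in (("chat only", chat), ("chat + 5% agentic", mixed)):
    p50, p95 = np.percentile(sample, [50, 95])
    print(f"{label:>18}: p50 = {p50:4.1f} s, p95 = {p95:5.1f} s")
```

The median is nearly unchanged, while p95 jumps out to the agentic tail; the same mechanism is what the Vertex (Claude) row is showing.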

## Cite as

**APA.** Requesty (2026). Latency leaderboard per provider, April 2026. Requesty Data. https://requesty.ai/data/provider-latency-leaderboard-april-2026

```bibtex
@misc{requesty_provider_latency_leaderboard_april_2026,
  author       = {{Requesty}},
  title        = {Latency leaderboard per provider, April 2026},
  year         = {2026},
  howpublished = {\url{https://requesty.ai/data/provider-latency-leaderboard-april-2026}},
  note         = {Requesty Data}
}
```

## Cited in

- [What the gateway saw in April 2026](https://requesty.ai/blog/provider-trends-april-2026-agentic-share-latency)

---

Downloads: [JSON](https://requesty.ai/data/provider-latency-leaderboard-april-2026/data.json) · [CSV](https://requesty.ai/data/provider-latency-leaderboard-april-2026/data.csv) · [Markdown](https://requesty.ai/data/provider-latency-leaderboard-april-2026/data.md)