---
id: latency-yoy-april-2026
slug: provider-latency-yoy-april-2026
title: "p50 latency YoY: April 2025 vs April 2026"
topic: latency
period: Apr 2025 to Apr 2026
updated: 2026-05-09
license: CC BY 4.0
canonical: https://requesty.ai/data/provider-latency-yoy-april-2026
---

# p50 latency YoY: April 2025 vs April 2026

> Has LLM latency improved over the past year? On the Requesty gateway, open-source aggregator routes compressed dramatically between April 2025 and April 2026. xAI fell 93% (9.1 s to 0.6 s), DeepInfra 91% (15.8 s to 1.4 s), DeepSeek 62% (24.3 s to 9.2 s). Frontier providers barely moved (OpenAI -5%, Anthropic 0%). Vertex (Claude) is the only major route that got slower, +131%, as heavy agentic Claude Code workloads landed on it.

*Topic: Latency and performance. Period: Apr 2025 to Apr 2026. Last updated 2026-05-09.*

## Why it matters

The OSS-aggregator tier closed most of the latency gap to frontier providers in 12 months: routing easy work onto a cheap OSS path used to cost 5-25 seconds at the median and now costs roughly 0.5-2.3 seconds on the aggregator routes. Workload composition, not just the serving stack, drives aggregate latency. Vertex (Claude) got 2.3× slower while the underlying inference stack barely changed, which shows that "is provider X fast?" is the wrong question to ask in isolation.

## Questions this answers

- How has LLM latency changed from 2025 to 2026?
- Are open-source LLMs as fast as OpenAI now?
- Which AI providers got faster in 2026?
- Why are some LLM routes getting slower year-over-year?

## Key findings

1. OSS aggregator routes (xAI, DeepInfra, Alibaba, Novita, Nebius) compressed 89-93% YoY.
2. xAI: 9.1 s to 0.6 s (-93%). DeepInfra: 15.8 s to 1.4 s (-91%).
3. DeepSeek: 24.3 s to 9.2 s (-62%). Still slow, but dramatically faster.
4. Frontier providers barely moved: OpenAI -5%, Anthropic 0%.
5. Vertex (Claude) is the lone exception: 6.0 s to 13.8 s (+131%). The route stayed put while heavy agentic Claude Code workloads moved onto it, so the work itself got bigger.
6. Practical implication: routing easy work to a cheap OSS path used to cost 5-25 seconds and now costs roughly 0.5-2.3 seconds on the aggregator routes, sub-second on the fastest of them (xAI, Alibaba, Novita).

## Data

| Provider | Apr 2025 p50 (s) | Apr 2026 p50 (s) | YoY delta |
| --- | --- | --- | --- |
| xAI | 9.1 | 0.6 | -93% |
| DeepInfra | 15.8 | 1.4 | -91% |
| Alibaba | 5.8 | 0.5 | -91% |
| Novita | 8.8 | 0.8 | -91% |
| Nebius | 22.1 | 2.3 | -89% |
| DeepSeek | 24.3 | 9.2 | -62% |
| Coding | 7.9 | 6.1 | -23% |
| OpenAI | 2.6 | 2.5 | -5% |
| Anthropic | 5.9 | 5.9 | 0% |
| Vertex (Claude) | 6.0 | 13.8 | +131% |
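
The YoY delta column is a relative change in p50, which can be sketched as below. This is a minimal illustration, not the pipeline that produced the table: the function names are hypothetical, and the inputs are the rounded p50 values from the table, so recomputed deltas can differ from the published column by a point or two (for example, Vertex (Claude) comes out at +130% and OpenAI at about -4%).

```python
# p50 latency in seconds as (Apr 2025, Apr 2026), copied from the table.
# These are the rounded published values, not the raw measurements.
P50_SECONDS = {
    "xAI": (9.1, 0.6),
    "DeepInfra": (15.8, 1.4),
    "Alibaba": (5.8, 0.5),
    "Novita": (8.8, 0.8),
    "Nebius": (22.1, 2.3),
    "DeepSeek": (24.3, 9.2),
    "Coding": (7.9, 6.1),
    "OpenAI": (2.6, 2.5),
    "Anthropic": (5.9, 5.9),
    "Vertex (Claude)": (6.0, 13.8),
}


def yoy_delta_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def slowdown_factor(before: float, after: float) -> float:
    """How many times longer `after` is than `before` (below 1.0 means faster)."""
    return after / before


for provider, (apr_2025, apr_2026) in P50_SECONDS.items():
    delta = yoy_delta_percent(apr_2025, apr_2026)
    factor = slowdown_factor(apr_2025, apr_2026)
    print(f"{provider:<16} {apr_2025:>5.1f} s -> {apr_2026:>5.1f} s  {delta:+7.1f}%  x{factor:.2f}")
```

The same arithmetic links the two ways the Vertex (Claude) change is quoted: a factor of 13.8 / 6.0 = 2.3 corresponds to a delta of roughly +130%.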

## Caveats

- Vertex (Gemini) had no meaningful 2025 traffic so it is not in this chart. Only Vertex (Claude) is YoY-comparable.
- Vertex (Claude) Apr 2025 sample is small and the workload that lived on it has changed substantially, so the +131% delta is more about workload mix than a true latency regression.
- Customer-base composition changed YoY, so the workload mix hitting these providers is different. The p50 is a plain wall-clock measurement, so the figures are well-defined in both years, but they are not adjusted for request mix. Interpret each delta as "providers behave differently AND the work has shifted", not as a controlled experiment.
- The `successful` flag semantics may have changed between 2025 and 2026, but quantiles over wall-clock duration are not affected.
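
The workload-mix caveats can be made concrete with a toy example. The sketch below uses made-up latencies and request counts (they are not Requesty data, and `route_p50` is a hypothetical helper) to show how a route's p50 can rise several-fold when only the share of heavy requests changes and the per-request latency of each class stays fixed.

```python
from statistics import median

# Hypothetical per-request latencies in seconds, chosen for illustration only.
CHAT_LATENCY_S = 3.0      # short chat-style request
AGENTIC_LATENCY_S = 15.0  # long agentic coding request


def route_p50(n_chat: int, n_agentic: int) -> float:
    """p50 latency of a route serving n_chat short and n_agentic long requests."""
    samples = [CHAT_LATENCY_S] * n_chat + [AGENTIC_LATENCY_S] * n_agentic
    return median(samples)


# Same per-class latencies in both periods; only the mix changes.
p50_before = route_p50(n_chat=80, n_agentic=20)  # mostly chat traffic
p50_after = route_p50(n_chat=30, n_agentic=70)   # mostly agentic traffic

print(f"p50 before: {p50_before} s, p50 after: {p50_after} s")
```

In this toy setup the median moves from the chat latency to the agentic latency even though neither class got slower, which is why a per-route delta cannot on its own separate provider speed from workload mix.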

## Cite as

**APA.** Requesty (2026). p50 latency YoY: April 2025 vs April 2026. Requesty Data. https://requesty.ai/data/provider-latency-yoy-april-2026

```bibtex
@misc{requesty_provider_latency_yoy_april_2026,
  author       = {{Requesty}},
  title        = {p50 latency YoY: April 2025 vs April 2026},
  year         = {2026},
  howpublished = {\url{https://requesty.ai/data/provider-latency-yoy-april-2026}},
  note         = {Requesty Data}
}
```

## Cited in

- [What the gateway saw in April 2026](https://requesty.ai/blog/provider-trends-april-2026-agentic-share-latency)

---

Downloads: [JSON](https://requesty.ai/data/provider-latency-yoy-april-2026/data.json) · [CSV](https://requesty.ai/data/provider-latency-yoy-april-2026/data.csv) · [Markdown](https://requesty.ai/data/provider-latency-yoy-april-2026/data.md)