---
id: provider-throughput-april-2026
slug: provider-throughput-density-april-2026
title: "Provider throughput density, April 2026"
topic: latency
period: Apr 2026
updated: 2026-05-10
license: CC BY 4.0
canonical: https://requesty.ai/data/provider-throughput-density-april-2026
---

# Provider throughput density, April 2026

> How many tokens per second can each LLM provider sustain? In April 2026 on the Requesty gateway, Groq led at 320 output tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon. Vertex (Gemini) was second at 130 tok/sec and Mistral third at 120 tok/sec; OSS aggregator routes (Nebius, Minimaxi, DeepInfra) clustered at 23-26 tok/sec; Bedrock was slowest at 15 tok/sec, 21× behind Groq.

*Topic: Latency and performance. Period: Apr 2026. Last updated 2026-05-10.*

## Why it matters

Throughput density (output tokens per second of total wall-clock latency) is the right number to optimise for streaming UX, not raw p50 latency. Two providers with identical p50 totals can deliver wildly different perceived speed depending on token rate. Vertex (Claude) is actually faster per-token than Anthropic-direct, despite higher total latency, because Vertex (Claude) requests emit roughly 3× more output tokens on average.
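As a minimal sketch of how such a number could be computed, assuming each request record exposes the `output_tokens` and `total_latency_ns` fields named in the Caveats below (the record shape itself is an assumption):

```python
from statistics import median

def p50_throughput_density(requests):
    """p50 of per-request throughput density: output tokens per second
    of total wall-clock latency. The filter matches the Caveats section."""
    rates = [
        r["output_tokens"] / (r["total_latency_ns"] / 1e9)  # tok/sec
        for r in requests
        if r["output_tokens"] > 0 and r["total_latency_ns"] > 0
    ]
    return median(rates)
```

Run per provider over a month of successful completions, this would yield the tok/sec column in the table below.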

## Questions this answers

- What is the fastest LLM provider in tokens per second?
- How fast does Groq stream compared to Anthropic?
- Which LLM has the best streaming throughput?
- Is Vertex Claude faster than Anthropic direct in practice?

## Key findings

1. Groq leads at 320 tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon.
2. Vertex (Gemini) is second at 130 tok/sec, followed by Mistral at 120 tok/sec.
3. Vertex (Claude) at 56 tok/sec is faster per-token than Anthropic-direct at 46 tok/sec, even though its total request latency is 2.4× higher: Vertex (Claude) requests emit ~3× more output tokens on average (see the worked check after this list).
4. OSS aggregator routes (Nebius, Minimaxi, DeepInfra) cluster in the 23-26 tok/sec band.
5. Bedrock is the slowest at 15 tok/sec, 21× behind Groq.
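As a quick plausibility check on finding 3, using the p50 rates from the table below (exact per-request output sizes are not published, so the ~3× output-size ratio is taken from the text):

```python
# Back-of-envelope: total latency scales roughly with output_tokens / rate.
anthropic_rate = 46        # tok/sec, p50 (table)
vertex_claude_rate = 56    # tok/sec, p50 (table)
output_ratio = 3           # Vertex (Claude) emits ~3x more output tokens

latency_ratio = output_ratio * anthropic_rate / vertex_claude_rate
print(f"expected total-latency ratio: {latency_ratio:.2f}x")  # ~2.46x
```

This lands close to the reported 2.4× total-latency gap.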

## Data

| Provider | p50 tokens / sec | p50 ms / token |
| --- | --- | --- |
| Groq | 320 | 3 |
| Vertex (Gemini) | 130 | 8 |
| Mistral | 120 | 8 |
| xAI | 65 | 16 |
| OpenAI | 57 | 18 |
| Novita | 56 | 18 |
| Vertex (Claude) | 56 | 18 |
| Anthropic | 46 | 22 |
| OpenAI Responses | 44 | 23 |
| Azure | 39 | 26 |
| DeepSeek | 31 | 32 |
| Alibaba | 28 | 36 |
| Moonshot | 27 | 37 |
| Nebius | 26 | 39 |
| Minimaxi | 24 | 41 |
| DeepInfra | 24 | 42 |
| Bedrock | 15 | 66 |

## Caveats

- p50 of a per-request rate, not a global (pooled) rate; the sketch after this list shows how the two can diverge. Two providers with the same throughput density can also have very different total latencies if their typical output sizes differ (Vertex (Claude) vs Anthropic-direct is the clearest example).
- Computed on successful completions with `output_tokens > 0` and `total_latency_ns > 0`.
- Apr 2026 only; this is a snapshot, not a trend.
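The sketch below uses invented numbers to show why the p50 of per-request rates and a pooled global rate can diverge:

```python
from statistics import median

# Hypothetical requests: one short/fast, one long/slow.
requests = [
    {"output_tokens": 50,   "seconds": 1.0},
    {"output_tokens": 2000, "seconds": 60.0},
]

per_request_rates = [r["output_tokens"] / r["seconds"] for r in requests]
p50_rate = median(per_request_rates)   # 41.7 tok/sec (what this page reports)
pooled_rate = (
    sum(r["output_tokens"] for r in requests)
    / sum(r["seconds"] for r in requests)
)                                      # ~33.6 tok/sec (a global rate)
```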

## Cite as

**APA.** Requesty. (2026). *Provider throughput density, April 2026*. Requesty Data. https://requesty.ai/data/provider-throughput-density-april-2026

```bibtex
@misc{requesty_provider_throughput_density_april_2026,
  author       = {{Requesty}},
  title        = {Provider throughput density, April 2026},
  year         = {2026},
  howpublished = {\url{https://requesty.ai/data/provider-throughput-density-april-2026}},
  note         = {Requesty Data}
}
```

---

Downloads: [JSON](https://requesty.ai/data/provider-throughput-density-april-2026/data.json) · [CSV](https://requesty.ai/data/provider-throughput-density-april-2026/data.csv) · [Markdown](https://requesty.ai/data/provider-throughput-density-april-2026/data.md)
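For programmatic use, the CSV loads directly; the column names below are guesses at the published schema, so verify against the file's header before relying on them:

```python
import pandas as pd

# URL from the Downloads line above; "p50_tokens_per_sec" is an assumed
# column name -- check the actual CSV header after downloading.
url = "https://requesty.ai/data/provider-throughput-density-april-2026/data.csv"
df = pd.read_csv(url)
print(df.sort_values("p50_tokens_per_sec", ascending=False).head())
```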