---
id: cache-hit-april-2026
slug: cache-hit-rate-by-provider-april-2026
title: "Prompt-cache hit rate per provider, April 2026"
topic: latency
period: Apr 2026
updated: 2026-05-09
license: CC BY 4.0
canonical: https://requesty.ai/data/cache-hit-rate-by-provider-april-2026
---

# Prompt-cache hit rate per provider, April 2026

> Which AI providers have the highest prompt-cache hit rate? In April 2026, Anthropic-direct led the Requesty gateway at 77% (cached_tokens / input_tokens), Bedrock Claude was healthy at 57%, and Vertex (Claude) trailed at 24%: the same Claude model family, 3× lower hit rate. Vertex (Gemini) sat at 10% and Mistral at 4%, the floor among major routes.

*Topic: Latency and performance. Period: Apr 2026. Last updated 2026-05-09.*

## Why it matters

Prompt caching directly cuts the per-request cost of long, repeated context. The difference between a 77% hit rate and a 24% hit rate on the same model family means roughly 3× more input tokens billed at full price on the lower route. The Vertex-Claude gap looks like a configuration issue rather than a platform limitation, which means Claude users on Vertex are leaving substantial savings on the table that could likely be recovered without a code change.
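The cost effect of a hit rate is easy to sketch. A minimal illustration, assuming a hypothetical base price of $3.00 per 1M input tokens and cached reads billed at 10% of base (check your provider's actual cache-read pricing):

```python
def effective_input_cost(hit_rate: float, base_price: float,
                         cached_discount: float = 0.9) -> float:
    """Blended per-token input price for a given cache hit rate.

    hit_rate: fraction of input tokens served from cache
              (cached_tokens / input_tokens).
    cached_discount: fraction knocked off the base price for cached
                     reads; 0.9 assumes cached tokens bill at 10% of
                     base, which is an assumption, not a quoted rate.
    """
    full_price_part = (1 - hit_rate) * base_price
    cached_part = hit_rate * base_price * (1 - cached_discount)
    return full_price_part + cached_part

# April 2026 hit rates from the table below, hypothetical $3.00/1M base:
anthropic_direct = effective_input_cost(0.775, 3.00)   # ≈ $0.91 per 1M
vertex_claude = effective_input_cost(0.235, 3.00)      # ≈ $2.37 per 1M
```

Under those assumptions the same Claude traffic costs roughly 2.6× more per input token on the Vertex route than on Anthropic-direct.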

## Questions this answers

- Which AI providers have the best prompt caching hit rate?
- Why is prompt caching so much worse on Vertex Claude than on Anthropic direct?
- How much does prompt caching reduce LLM inference cost in production?
- Which providers should I avoid if I rely on prompt caching?

## Key findings

1. Anthropic-direct: 77% cache hit, the leader by a wide margin.
2. Bedrock Claude: 57%, DeepSeek: 48%, OpenAI: 36%. All healthy mid-table results.
3. Vertex (Claude): 24%. Same model family as Anthropic-direct (77%) and Bedrock (57%), yet roughly 3× lower hit rate. A configuration gap, not a platform limit.
4. Vertex (Gemini): 10%. Near the bottom among major routes.
5. Mistral: 4%. The floor; prompt caching is not a meaningful lever on that route today.
6. Moonshot nominally reports 88%, but that reading is a measurement artefact at a 6% success rate; do not quote it.

## Data

| Provider | Cache hit rate |
| --- | --- |
| Anthropic | 77.50% |
| Bedrock | 56.90% |
| DeepSeek | 48.30% |
| Azure | 41.00% |
| OpenAI | 36.40% |
| xAI | 35.70% |
| Novita | 31.90% |
| Vertex (Claude) | 23.50% |
| Vertex (Gemini) | 9.60% |
| Mistral | 4.10% |
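The headline metric is simply total cached_tokens over total input_tokens across a provider's requests. A minimal sketch of that aggregation, assuming per-request usage records with OpenAI-style field names (adjust the keys to your provider's response schema):

```python
from typing import Iterable, Mapping

def cache_hit_rate(requests: Iterable[Mapping]) -> float:
    """Aggregate hit rate = sum(cached_tokens) / sum(input_tokens).

    Aggregating before dividing weights each request by its size,
    rather than averaging per-request ratios.
    """
    cached = sum(r.get("cached_tokens", 0) for r in requests)
    total = sum(r["input_tokens"] for r in requests)
    return cached / total if total else 0.0

# Hypothetical usage records:
sample = [
    {"input_tokens": 1200, "cached_tokens": 900},
    {"input_tokens": 800, "cached_tokens": 650},
]
rate = cache_hit_rate(sample)  # 1550 / 2000 = 0.775
```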

## Caveats

- The Moonshot 88% cache-hit reading is a measurement artefact at a 6% success rate; it is excluded from the leader panel.
- cached_tokens semantics differ slightly by provider (which tokens count as "cached"), so the ratio is meaningful but not strictly apples-to-apples across providers.

## Cite as

**APA.** Requesty (2026). Prompt-cache hit rate per provider, April 2026. Requesty Data. https://requesty.ai/data/cache-hit-rate-by-provider-april-2026

```bibtex
@misc{requesty_cache_hit_rate_by_provider_april_2026,
  author       = {{Requesty}},
  title        = {Prompt-cache hit rate per provider, April 2026},
  year         = {2026},
  howpublished = {\url{https://requesty.ai/data/cache-hit-rate-by-provider-april-2026}},
  note         = {Requesty Data}
}
```

## Cited in

- [What the gateway saw in April 2026](https://requesty.ai/blog/provider-trends-april-2026-agentic-share-latency)

---

Downloads: [JSON](https://requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.json) · [CSV](https://requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.csv) · [Markdown](https://requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.md)