AI models with the longest context window
A larger context window lets you fit more tokens into a single request, which is useful for whole-codebase analysis, long-document Q&A, and agentic workflows. Note: effective quality often degrades past roughly 128K tokens, and for repeated long context, prompt caching (supported by many models) usually beats brute-forcing more tokens into every call.
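A minimal sketch of cache-friendly prompt structure, using the OpenAI Python SDK: the model name and file path are placeholders, and exact caching behavior varies by provider, but the general pattern (prefix-based caching of sufficiently long, byte-identical prefixes) is why the static material should come first.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Long, static context goes first: providers with prefix-based prompt caching
# can reuse the cached prefix across calls that share it exactly.
STATIC_CONTEXT = open("docs/handbook.md").read()  # placeholder path

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any cache-supporting model
        messages=[
            {"role": "system", "content": STATIC_CONTEXT},  # stable prefix
            {"role": "user", "content": question},          # variable suffix
        ],
    )
    return resp.choices[0].message.content

# Repeated questions over the same document hit the cached prefix instead of
# re-processing the full context at full price on every call.
print(ask("What is the refund policy?"))
```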
| Rank | Model | Provider | Max output | Context window |
|------|-------|----------|------------|----------------|
| 🥇 | grok-4-fast | xAI Corp. | — | 2M |
| 🥈 | grok-4.2-beta | xAI Corp. | — | 2M |
| 🥉 | grok-4-1-fast-reasoning | xAI Corp. | — | 2M |
| 4 | grok-4-fast-non-reasoning | xAI Corp. | — | 2M |
| 5 | grok-4-1-fast-non-reasoning | xAI Corp. | — | 2M |
| 6 | gpt-5.5@eastus2 | Microsoft Azure AI | 128K | 1.1M |
| 7 | gpt-5.4@francecentral | Microsoft Azure AI | 128K | 1.1M |
| 8 | gpt-5.4@eastus2 | Microsoft Azure AI | 128K | 1.1M |
| 9 | gpt-5.4@swedencentral | Microsoft Azure AI | 128K | 1.1M |
| 10 | openai-responses/gpt-5.4@eastus2 | Microsoft Azure AI | 128K | 1.1M |
| 11 | openai-responses/gpt-5.4-pro@eastus2 | Microsoft Azure AI | 128K | 1.1M |
| 12 | gpt-5.5-pro | OpenAI Inc. | 128K | 1.1M |
| 13 | gpt-5.4-pro | OpenAI Inc. | 128K | 1.1M |
| 14 | gpt-5.4 | OpenAI Inc. | 128K | 1.1M |
| 15 | gpt-5.5 | OpenAI Inc. | 128K | 1.1M |
| 16 | gpt-5.5-pro | OpenAI Responses | 128K | 1.1M |
| 17 | gpt-5.4-pro | OpenAI Responses | 128K | 1.1M |
| 18 | gpt-5.5 | OpenAI Responses | 128K | 1.1M |
| 19 | gpt-5.4 | OpenAI Responses | 128K | 1.1M |
| 20 | gemini-3.1-flash-lite-preview | Google LLC (Gemini API) | 66K | 1.0M |
| 21 | gemini-3-flash-preview | Google LLC (Gemini API) | 66K | 1.0M |
| 22 | gemini-3.1-pro-preview | Google LLC (Gemini API) | 66K | 1.0M |
| 23 | gemini-3-pro-preview | Google LLC (Gemini API) | 66K | 1.0M |
| 24 | gemini-2.5-flash-lite | Google LLC (Gemini API) | 66K | 1.0M |
| 25 | gemini-2.5-pro | Google LLC (Gemini API) | 66K | 1.0M |
| 26 | gemini-2.5-flash | Google LLC (Gemini API) | 66K | 1.0M |
| 27 | gemini-2.0-flash-001 | Google LLC (Gemini API) | 8K | 1.0M |
| 28 | gemini-2.5-pro@us-south1 | Coding API | 66K | 1.0M |
| 29 | gemini-2.5-pro@europe-central2 | Coding API | 66K | 1.0M |
| 30 | gemini-2.5-flash@us-central1 | Coding API | 66K | 1.0M |
How we rank
Ranked by the model's maximum context window. Context window is the total tokens (input + output) the model can process in a single request. Note that effective quality often degrades well below the advertised maximum — most production workloads get better results from prompt caching and retrieval than from stuffing more tokens in every call.
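One practical consequence: since the window counts input and output together, the usable input budget is the window minus whatever you reserve for the reply. A quick back-of-the-envelope check, with numbers taken from the table above:

```python
# Context window = input tokens + output tokens, so the usable input budget
# is whatever remains after reserving room for the reply.
def max_input_tokens(context_window: int, reserved_output: int) -> int:
    return context_window - reserved_output

# A 1.1M-token window with the full 128K output reserved leaves ~972K for input.
print(max_input_tokens(1_100_000, 128_000))  # -> 972000
```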
One API for every model on this list
Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.
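As a concrete sketch, the stock OpenAI Python SDK can target an OpenAI-compatible router by overriding `base_url`. The router URL and API-key variable below are assumptions (check Requesty's docs for the exact values); the model IDs come from the table above.

```python
import os
from openai import OpenAI

# Assumed router endpoint and environment variable; everything else is the
# unmodified OpenAI SDK talking to an OpenAI-compatible API.
client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key=os.environ["REQUESTY_API_KEY"],
)

# Switching providers is just a different `model` string on the same client.
for model in ["grok-4-fast", "gemini-2.5-pro", "gpt-5.4"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(model, "->", resp.choices[0].message.content)
```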