Requesty

AI models with the longest context window

A larger context window lets you fit more tokens in a single prompt — useful for whole-codebase analysis, long-document Q&A, and agentic workflows. Note: effective quality often degrades past 128K tokens, and for repeated long-context calls, prompt caching (supported by many models) is usually a better approach than brute-forcing more tokens into every request.

| Rank | Model | Provider | Max output | Context window |
|------|-------|----------|------------|----------------|
| 1 🥇 | grok-4-fast | xAI Corp. | — | 2M |
| 2 🥈 | grok-4.2-beta | xAI Corp. | — | 2M |
| 3 🥉 | grok-4-1-fast-reasoning | xAI Corp. | — | 2M |
| 4 | grok-4-fast-non-reasoning | xAI Corp. | — | 2M |
| 5 | grok-4-1-fast-non-reasoning | xAI Corp. | — | 2M |
| 6 | gpt-5.5@eastus2 | Microsoft Azure AI | 128K | 1.1M |
| 7 | gpt-5.4@francecentral | Microsoft Azure AI | 128K | 1.1M |
| 8 | gpt-5.4@eastus2 | Microsoft Azure AI | 128K | 1.1M |
| 9 | gpt-5.4@swedencentral | Microsoft Azure AI | 128K | 1.1M |
| 10 | openai-responses/gpt-5.4@eastus2 | Microsoft Azure AI | 128K | 1.1M |
| 11 | openai-responses/gpt-5.4-pro@eastus2 | Microsoft Azure AI | 128K | 1.1M |
| 12 | gpt-5.5-pro | OpenAI Inc. | 128K | 1.1M |
| 13 | gpt-5.4-pro | OpenAI Inc. | 128K | 1.1M |
| 14 | gpt-5.4 | OpenAI Inc. | 128K | 1.1M |
| 15 | gpt-5.5 | OpenAI Inc. | 128K | 1.1M |
| 16 | gpt-5.5-pro | OpenAI Responses | 128K | 1.1M |
| 17 | gpt-5.4-pro | OpenAI Responses | 128K | 1.1M |
| 18 | gpt-5.5 | OpenAI Responses | 128K | 1.1M |
| 19 | gpt-5.4 | OpenAI Responses | 128K | 1.1M |
| 20 | gemini-3.1-flash-lite-preview | Google LLC (Gemini API) | 66K | 1.0M |
| 21 | gemini-3-flash-preview | Google LLC (Gemini API) | 66K | 1.0M |
| 22 | gemini-3.1-pro-preview | Google LLC (Gemini API) | 66K | 1.0M |
| 23 | gemini-3-pro-preview | Google LLC (Gemini API) | 66K | 1.0M |
| 24 | gemini-2.5-flash-lite | Google LLC (Gemini API) | 66K | 1.0M |
| 25 | gemini-2.5-pro | Google LLC (Gemini API) | 66K | 1.0M |
| 26 | gemini-2.5-flash | Google LLC (Gemini API) | 66K | 1.0M |
| 27 | gemini-2.0-flash-001 | Google LLC (Gemini API) | 8K | 1.0M |
| 28 | gemini-2.5-pro@us-south1 | Coding API | 66K | 1.0M |
| 29 | gemini-2.5-pro@europe-central2 | Coding API | 66K | 1.0M |
| 30 | gemini-2.5-flash@us-central1 | Coding API | 66K | 1.0M |

How we rank

Ranked by the model's maximum context window. Context window is the total tokens (input + output) the model can process in a single request. Note that effective quality often degrades well below the advertised maximum — most production workloads get better results from prompt caching and retrieval than from stuffing more tokens in every call.
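Because the context window covers input and output combined, the output you can request shrinks as your prompt grows. A minimal sketch of that budgeting (the function name and figures are illustrative, loosely based on the 1M-context, 128K-output models above):

```python
def max_output_budget(context_window: int, input_tokens: int,
                      max_output_cap: int) -> int:
    """Tokens left for the response in one request.

    The response is bounded both by what remains of the context window
    after the input and by the model's own max-output cap.
    """
    remaining = context_window - input_tokens
    return max(0, min(remaining, max_output_cap))

# A 1M-context model with a 128K output cap, given a 950K-token prompt:
print(max_output_budget(1_000_000, 950_000, 128_000))  # 50000
```

This is why a huge context window does not guarantee a long response: with the prompt above, only 50K tokens remain even though the model could otherwise emit 128K.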

One API for every model on this list

Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.
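A minimal sketch of what "one parameter" means in practice, using only the Python standard library. The base URL is an assumption for illustration, and the model IDs come from the list above:

```python
import json
from urllib import request

REQUESTY_BASE_URL = "https://router.requesty.ai/v1"  # assumed endpoint


def chat_body(model: str, prompt: str) -> dict:
    """OpenAI-style chat request body; switching models changes one field."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str, api_key: str) -> str:
    """POST a chat completion and return the assistant's reply text."""
    req = request.Request(
        f"{REQUESTY_BASE_URL}/chat/completions",
        data=json.dumps(chat_body(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Same call, different model — only the model string changes:
# chat("grok-4-fast", "Summarize this codebase", api_key)
# chat("gpt-5.5", "Summarize this codebase", api_key)
```

The same body works with any OpenAI-compatible SDK; only the `model` field (and your base URL) differs per provider.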

Get started free