Claude Code has taken the developer world by storm. As a terminal-first agentic CLI tool, it can ingest entire codebases, run tests, execute shell commands, and perform complex refactors with an impressive success rate.
However, power comes at a cost. Because Claude Code operates over codebase-scale context windows and runs continuous sub-agent loops, it consumes massive volumes of tokens. Developers frequently hit Anthropic's rate limits (429 errors) and encounter significant API bills on complex projects.
Fortunately, Claude Code is designed to be highly configurable. By pointing the CLI to Requesty, you can solve the rate limit bottleneck, track full execution analytics, use native web search, and even run alternative frontier models like OpenAI's GPT-5.5 or Google's Gemini 3.5 Flash directly inside the Anthropic CLI.
Here is a step-by-step guide to connecting Claude Code to Requesty.
Why Route Claude Code Through an LLM Gateway?
When you run Claude Code natively, it connects directly to Anthropic's endpoints. While this works well for quick tasks, production workflows and large refactoring jobs benefit from an intermediate control plane.
1. Run Any Model in the Claude CLI
By default, the CLI is hardcoded to Anthropic's models. When you route through Requesty, you can pass any model string. Want to handle rapid classification or simple file checks with an ultra-cheap, low-latency model like Gemini 3.5 Flash? Or compare code synthesis results against OpenAI's GPT-5.5? Requesty handles the protocol translation, so you can run more than 300 models from the same CLI interface.
2. Multi-Provider Fallbacks (Zero Downtime)
If Anthropic experiences temporary latency or downtime, your agentic loop normally crashes, losing all state. With Requesty, you can configure an automatic failover policy. If a request to Claude 4 or Opus 4.8 fails, Requesty routes the call to an equivalent-tier model like GPT-5.5 or Gemini 3.5 Pro. The CLI keeps running, and your task completes without interruption.
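To make the failover idea concrete, here is a minimal sketch of the "try the next model when one fails" logic. Requesty applies its policy on the gateway side, so nothing like this needs to run locally; the function name `try_models` and the way a call is passed in as a command are assumptions made purely for illustration.

```shell
# Illustrative only: run a command against each model in order and stop at
# the first one that succeeds. try_models is a hypothetical helper, not part
# of Claude Code or Requesty.
try_models() {
  cmd="$1"
  shift
  for model in "$@"; do
    if "$cmd" "$model"; then
      echo "served by: $model"
      return 0
    fi
    echo "failed: $model, trying next" >&2
  done
  # Every model in the list failed.
  return 1
}
```

The order of the arguments is the fallback order, which mirrors how a failover policy lists a primary model followed by equivalent-tier alternatives.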
3. Unified Cost and Execution Analytics
Claude Code's sub-agents run in the background, making dozens of calls that are hard to track individually. Requesty automatically captures full execution telemetry. You can view time to first token (TTFT), cost per session, and precise token usage in the Requesty Live Logs.
By labeling your CLI API keys, you can isolate your local development spend from your production applications.
4. Native Web Search Without Configuration
Requesty offers native web search capabilities handled directly at the routing layer. If your model needs real-time information—such as looking up the latest API changes for a library or researching a newly released package—Requesty can perform the web search, retrieve markdown results, and supply them to the model context.
Step 1: Generate Your Requesty API Key
First, you need to create an API key to authenticate your CLI calls:
- Sign in to your Requesty Dashboard.
- Navigate to Manage API Keys and click Create API Key.
- Give it a descriptive name (e.g., claude-code-cli).
- Copy the generated key.
Step 2: Configure Claude Code to Use Requesty
Claude Code looks for specific environment variables to resolve its API endpoints and credentials. You can configure this either for a single terminal session or permanently.
Option A: Shell Environment Variables (Temporary)
To quickly test the integration, export these variables in your active terminal:
export ANTHROPIC_BASE_URL="https://router.requesty.ai/v1"
export ANTHROPIC_API_KEY="rqy_your_requesty_api_key"

Once exported, run claude as usual. All API traffic will route through Requesty.
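Because a missing or mistyped variable silently sends traffic to the default endpoint, a small pre-flight check can be useful. The sketch below is one possible way to do it; the function name `check_gateway_env` is hypothetical, and the https check is an added precaution rather than something Claude Code or Requesty requires.

```shell
# Hypothetical pre-flight check: confirm both gateway variables are set
# before starting the CLI.
check_gateway_env() {
  rc=0
  if [ -z "${ANTHROPIC_BASE_URL:-}" ]; then
    echo "ANTHROPIC_BASE_URL is not set" >&2
    rc=1
  fi
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    rc=1
  fi
  # Reject plain-http URLs so the key is never sent unencrypted.
  case "${ANTHROPIC_BASE_URL:-}" in
    https://*|"") ;;
    *) echo "ANTHROPIC_BASE_URL should use https://" >&2; rc=1 ;;
  esac
  return "$rc"
}
```

A typical use would be `check_gateway_env && claude`, so the CLI only starts when both variables are present.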
Option B: Persistent Configuration via settings.json (Recommended)
To avoid exporting environment variables every time you open a new terminal, you can add them directly to Claude Code's global configuration file.
- Open (or create) the user-level configuration file at ~/.claude/settings.json.
- Add your Requesty credentials under the env block:
{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "env": {
    "ANTHROPIC_BASE_URL": "https://router.requesty.ai/v1",
    "ANTHROPIC_API_KEY": "rqy_your_requesty_api_key"
  }
}

Note: For project-specific settings, you can also create a .claude/settings.local.json file in your repository root, which will override the global settings.
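If you would rather script this step, the following sketch writes a settings file containing only the env block. It is a rough illustration under a few assumptions: the helper name `write_gateway_settings` is hypothetical, the key is assumed to contain no quotes or backslashes, and the function refuses to touch an existing file so it cannot clobber settings you already have.

```shell
# Hypothetical helper: create a settings file with an env block.
# Usage: write_gateway_settings PATH API_KEY
write_gateway_settings() {
  target="$1"
  api_key="$2"
  # Never overwrite an existing configuration; merge by hand instead.
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  {
    printf '{\n'
    printf '  "env": {\n'
    printf '    "ANTHROPIC_BASE_URL": "%s",\n' "https://router.requesty.ai/v1"
    printf '    "ANTHROPIC_API_KEY": "%s"\n' "$api_key"
    printf '  }\n'
    printf '}\n'
  } > "$target"
  # The file holds a secret, so restrict it to the current user.
  chmod 600 "$target"
}
```

For example, `write_gateway_settings "$HOME/.claude/settings.json" "rqy_your_requesty_api_key"` would create the user-level file on a machine that does not have one yet.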
Step 3: Running Custom Models
Once Claude Code is pointing to Requesty, you can specify custom models. The CLI supports this in three ways:
1. Literal Model Passthrough
You can pass any model identifier directly in the CLI call using the --model flag:
claude --model google/gemini-2.5-pro

Because Requesty is fully compatible with the Anthropic Messages format, it receives the literal string google/gemini-2.5-pro, translates the payload on the fly, and routes the request to Google. The Claude Code interface remains completely functional.
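Typing full provider/model identifiers gets tedious, so one option is a small alias function in your shell profile. This is only a sketch: `resolve_model` and the alias names `gemini` and `gpt` are made up for illustration, and the identifiers are the ones used in this guide, so check them against the models enabled in your account.

```shell
# Hypothetical alias resolver: map a short name to a provider/model string.
resolve_model() {
  case "$1" in
    gemini) echo "google/gemini-2.5-pro" ;;
    gpt)    echo "openai/gpt-5.5" ;;
    # Anything that already looks like provider/model passes through as-is.
    */*)    echo "$1" ;;
    *)      echo "unknown model alias: $1" >&2; return 1 ;;
  esac
}
```

With this in place, `claude --model "$(resolve_model gemini)"` is equivalent to passing the full identifier.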
2. Custom Model Options in the CLI Picker
If you prefer using the interactive model selector inside the CLI, you can define a custom model option using environment variables:
export ANTHROPIC_CUSTOM_MODEL_OPTION="openai/gpt-5.5"
export ANTHROPIC_CUSTOM_MODEL_OPTION_NAME="GPT-5.5"
export ANTHROPIC_CUSTOM_MODEL_OPTION_DESCRIPTION="OpenAI's latest frontier model via Requesty"

This inserts "GPT-5.5" as a selectable option directly inside the interactive /model command menu.
3. Gateway Model Auto-Discovery
If you are running the latest version of Claude Code, you can enable automatic model discovery:
export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1

When active, Claude Code queries Requesty's /v1/models endpoint at startup and dynamically populates your local model selector with all models currently enabled in your Requesty account.
Step 4: Enabling Web Search
One of the standout features of the Requesty gateway is native web search. Traditional setups require configuring custom search nodes or managing separate search API quotas.
To enable native search for your Claude Code sessions:
- In the Requesty Dashboard, go to your active routing policy or API key settings.
- Toggle Enable Web Search to on.
- Configure your desired Web Search Context Size (low, medium, or high).
Now, when Claude Code determines it needs external or real-time information to complete a task, Requesty automatically executes the search, parses the markdown results, and merges them into the model's prompt context—giving your agent a live connection to the web.
Analyzing Your CLI Runs
Once your integration is active, you can monitor everything in real time. Open your Requesty Live Logs to see:
- Full Cost Attribution: Inspect the exact cost of each multi-file refactor.
- Latency Breakdowns: Compare the Time to First Token (TTFT) and total generation time across different models.
- Prompt Caching Efficiency: Check your cache hit rates to see how much input token cost you are saving on large codebase context evaluations.
By combining the powerful terminal-first capabilities of Claude Code with the flexibility, cost savings, and reliability of Requesty, you get a highly optimized AI development environment.
To get started, sign up at Requesty and configure your ~/.claude/settings.json today!
Frequently asked questions
- Can I use models other than Claude inside Claude Code?
- Yes. By pointing Claude Code to Requesty, you can route your requests to any model supported by our gateway—including OpenAI GPT-5.5, Gemini 3.5 Flash, and Llama 3. This lets you optimize for speed and cost depending on the complexity of your refactoring task.
- Does native web search work when routing Claude Code through Requesty?
- Yes. Requesty's native web search capabilities are handled directly at the routing layer. If you enable web search on your API key or routing policy, Requesty injects real-time search context into the model's environment, providing up-to-date information without requiring local configuration.
- How do I set the custom gateway URL in Claude Code?
- You can set the gateway URL by exporting the `ANTHROPIC_BASE_URL` environment variable or by adding it permanently to the `env` block in your global `~/.claude/settings.json` file.

