Supercharging Claude Code: How to Connect Anthropic's Official CLI to Requesty

Claude Code has taken the developer world by storm. As a terminal-first agentic CLI tool, it can ingest entire codebases, run tests, execute shell commands, and perform complex refactors.

However, power comes at a cost. Because Claude Code operates over codebase-scale context windows and runs continuous sub-agent loops, it consumes massive volumes of tokens. Developers frequently hit Anthropic's rate limits (429 errors) and encounter significant API bills on complex projects.

Fortunately, Claude Code is designed to be highly configurable. By pointing the CLI to Requesty, you can solve the rate limit bottleneck, track full execution analytics, use native web search, and even run alternative frontier models like OpenAI's GPT-5.5 or Google's Gemini 3.5 Flash directly inside the Anthropic CLI.

Here is a step-by-step guide to connecting Claude Code to Requesty.

Why Route Claude Code Through an LLM Gateway?

When you run Claude Code natively, it connects directly to Anthropic's endpoints. While this works well for quick tasks, production workflows and large refactoring jobs benefit from an intermediate control plane.

1. Run Any Model in the Claude CLI

By default, the CLI is hardcoded to Anthropic's models. When you route through Requesty, you can pass any model string the gateway supports. Want to handle rapid classification or simple file checks with an ultra-cheap, low-latency model like Gemini 3.5 Flash? Or compare code synthesis results against OpenAI's GPT-5.5? Requesty handles the protocol translation, letting you run more than 300 models within the same CLI interface.

2. Multi-Provider Fallbacks (Zero Downtime)

If Anthropic experiences elevated latency or downtime, an agentic loop that talks to it directly can stall or fail partway through a task. With Requesty, you can configure an automatic failover policy. If a request to Claude 4 or Opus 4.8 fails, Requesty routes the call to an equivalent-tier model like GPT-5.5 or Gemini 3.5 Pro. The CLI keeps running, and the task can continue without manual intervention.
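
The failover idea can be pictured as "try each model in order until one answers". The sketch below is a conceptual illustration only: Requesty applies its policy on the gateway side, and `try_model`, `first_available`, and the placeholder identifier `anthropic/primary` are hypothetical names made up for this example.

```shell
# Conceptual illustration only; not how Requesty is implemented.
# try_model is a hypothetical stand-in for sending one request to a model.
try_model() {
  # Simulate an outage of the first model in the chain.
  [ "$1" != "anthropic/primary" ]
}

# Print the first model in the argument list that responds.
first_available() {
  for model in "$@"; do
    if try_model "$model"; then
      echo "$model"
      return 0
    fi
  done
  return 1
}

first_available anthropic/primary openai/gpt-5.5   # prints openai/gpt-5.5
```

With a gateway in front of the CLI, this loop lives on the server, so the CLI itself never has to know that a fallback happened.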

3. Unified Cost and Execution Analytics

Claude Code's sub-agents run in the background, making dozens of calls that are hard to track individually. Requesty automatically captures full execution telemetry. You can view time to first token (TTFT), cost per session, and precise token usage in the Requesty Live Logs.

By labeling your CLI API keys, you can isolate your local development spend from your production applications.

4. Native Web Search Without Configuration

Requesty offers native web search capabilities handled directly at the routing layer. If your model needs real-time information—such as looking up the latest API changes for a library or researching a newly released package—Requesty can perform the web search, retrieve markdown results, and supply them to the model context.

Step 1: Generate Your Requesty API Key

First, you need to create an API key to authenticate your CLI calls:

1. Sign in to your Requesty Dashboard.
2. Navigate to Manage API Keys and click Create API Key.
3. Give it a descriptive name (e.g., `claude-code-cli`).
4. Copy the generated key.

Step 2: Configure Claude Code to Use Requesty

Claude Code looks for specific environment variables to resolve its API endpoints and credentials. You can configure this either for a single terminal session or permanently.

Option A: Shell Environment Variables (Temporary)

To quickly test the integration, export these variables in your active terminal:

```shell
export ANTHROPIC_BASE_URL="https://router.requesty.ai/v1"
export ANTHROPIC_API_KEY="rqy_your_requesty_api_key"
```

Once exported, run `claude` as usual. All API traffic will route through Requesty.
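
Before launching the CLI, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch; `check_requesty_env` is a hypothetical helper name and is not part of Claude Code or Requesty.

```shell
# Hypothetical helper: report whether the two variables from Option A are set.
# Returns 0 when both are non-empty, 1 otherwise.
check_requesty_env() {
  rc=0
  for name in ANTHROPIC_BASE_URL ANTHROPIC_API_KEY; do
    # Indirect variable lookup that also works in plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      rc=1
    fi
  done
  # Print the URL only; never echo the key itself.
  if [ "$rc" -eq 0 ]; then
    echo "gateway: $ANTHROPIC_BASE_URL"
  fi
  return "$rc"
}
```

Running `check_requesty_env && claude` would then start the CLI only when both values are present.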

Option B: Persistent Configuration via `settings.json` (Recommended)

To avoid exporting environment variables every time you open a new terminal, you can add them directly to Claude Code's global configuration file.

1. Open (or create) the user-level configuration file at `~/.claude/settings.json`.
2. Add your Requesty credentials under the `env` block:

```json
{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "env": {
    "ANTHROPIC_BASE_URL": "https://router.requesty.ai/v1",
    "ANTHROPIC_API_KEY": "rqy_your_requesty_api_key"
  }
}
```

Note: For project-specific settings, you can also create a `.claude/settings.local.json` file in your repository root, which will override the global settings.
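
If you would rather script this step, the sketch below writes the same `env` block to a path of your choice. It is an illustration under stated assumptions: `write_requesty_settings` is a hypothetical helper, it only handles the case where no settings file exists yet, and it assumes the key contains no characters that need JSON escaping.

```shell
# Hypothetical helper: create a settings file containing the env block above.
# $1 = target path, e.g. "$HOME/.claude/settings.json"
# $2 = Requesty API key (assumed to need no JSON escaping)
# Refuses to overwrite an existing file so other settings are not lost.
write_requesty_settings() {
  target=$1
  key=$2
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target; edit its env block by hand" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  # Restrict permissions in a subshell, since the file will hold a secret.
  ( umask 077
    cat > "$target" <<EOF
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://router.requesty.ai/v1",
    "ANTHROPIC_API_KEY": "$key"
  }
}
EOF
  )
}
```

For example, `write_requesty_settings "$HOME/.claude/settings.json" "rqy_your_requesty_api_key"` creates the user-level file, while pointing the first argument at `.claude/settings.local.json` inside a repository creates the project-level override.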

Step 3: Running Custom Models

Once Claude Code is pointing to Requesty, you can specify custom models. The CLI supports this in three ways:

1. Literal Model Passthrough

You can pass any model identifier directly in the CLI call using the `--model` flag:

```shell
claude --model google/gemini-2.5-pro
```

Because Requesty is fully compatible with the Anthropic Messages format, it receives the literal string `google/gemini-2.5-pro`, translates the payload on the fly, and routes the request to Google. The Claude Code interface remains completely functional.
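
Since the model is just a string, choosing one per task can be reduced to a small lookup. The sketch below is only an illustration: `pick_model` and its task labels are hypothetical, the mapping is arbitrary, and the identifiers are simply the two that appear in this article, so substitute models that are enabled in your own account.

```shell
# Hypothetical mapping from a task label to a gateway model identifier.
# The labels and the mapping are arbitrary examples, not recommendations.
pick_model() {
  case "$1" in
    review)   echo "google/gemini-2.5-pro" ;;
    refactor) echo "openai/gpt-5.5" ;;
    *)        echo "unknown task: $1" >&2; return 1 ;;
  esac
}

# Usage (not run here):
#   claude --model "$(pick_model review)"
```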

2. Custom Model Options in the CLI Picker

If you prefer using the interactive model selector inside the CLI, you can define a custom model option using environment variables:

```shell
export ANTHROPIC_CUSTOM_MODEL_OPTION="openai/gpt-5.5"
export ANTHROPIC_CUSTOM_MODEL_OPTION_NAME="GPT-5.5"
export ANTHROPIC_CUSTOM_MODEL_OPTION_DESCRIPTION="OpenAI's latest frontier model via Requesty"
```

This inserts "GPT-5.5" as a selectable option directly inside the interactive `/model` command menu.

3. Gateway Model Auto-Discovery

If you are running the latest version of Claude Code, you can enable automatic model discovery:

```shell
export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1
```

When active, Claude Code queries Requesty's `/v1/models` endpoint at startup and dynamically populates your local model selector with all models currently enabled in your Requesty account.
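
To see roughly what discovery would return, you can query the same endpoint by hand. This is a sketch under assumptions: it assumes the gateway accepts your key as a Bearer token and that `curl` is installed, and `models_url` and `list_gateway_models` are hypothetical helper names.

```shell
# Derive the model-listing URL from the base URL configured in Step 2.
# Strips one trailing slash so the result never contains "//models".
models_url() {
  base=${1:-$ANTHROPIC_BASE_URL}
  echo "${base%/}/models"
}

# Assumption: the gateway accepts the API key as a Bearer token.
# -f makes curl return a non-zero exit status on HTTP errors.
list_gateway_models() {
  curl -fsS -H "Authorization: Bearer $ANTHROPIC_API_KEY" "$(models_url)"
}
```

With the variables from Step 2 exported, `list_gateway_models` requests `https://router.requesty.ai/v1/models`; an authentication error here usually points at the key rather than at Claude Code.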

Step 4: Enabling Web Search

One of the standout features of the Requesty gateway is native web search. Traditional setups require configuring custom search nodes or managing separate search API quotas.

To enable native search for your Claude Code sessions:

1. In the Requesty Dashboard, go to your active routing policy or API key settings.
2. Toggle Enable Web Search to on.
3. Configure your desired Web Search Context Size (low, medium, or high).

Now, when Claude Code determines it needs external or real-time information to complete a task, Requesty automatically executes the search, parses the markdown results, and merges them into the model's prompt context—giving your agent a live connection to the web.

Analyzing Your CLI Runs

Once your integration is active, you can monitor everything in real time. Open your Requesty Live Logs to see:

Full Cost Attribution: Inspect the exact cost of each multi-file refactor.
Latency Breakdowns: Compare the Time to First Token (TTFT) and total generation time across different models.
Prompt Caching Efficiency: Check your cache hit rates to see how much input token cost you are saving on large codebase context evaluations.
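
As a worked example of the caching figure, a hit rate can be computed as cached input tokens divided by total input tokens. The sketch below assumes that definition and uses made-up token counts; `cache_hit_pct` is a hypothetical helper, and the numbers would come from whatever a log entry reports.

```shell
# Integer cache hit rate in percent: cached input tokens / total input tokens.
# $1 = cached input tokens, $2 = total input tokens (illustrative values).
cache_hit_pct() {
  if [ "$2" -le 0 ]; then
    echo "total input tokens must be positive" >&2
    return 1
  fi
  echo $(( $1 * 100 / $2 ))
}

cache_hit_pct 150000 200000   # prints 75
```

In this made-up case, three quarters of the input tokens were served from cache, so only the remaining quarter would be billed at the full input rate.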

By combining the powerful terminal-first capabilities of Claude Code with the flexibility, cost savings, and reliability of Requesty, you get a highly optimized AI development environment.

To get started, sign up at Requesty and configure your ~/.claude/settings.json today!

Frequently asked questions

Can I use models other than Claude inside Claude Code?

Yes. By pointing Claude Code to Requesty, you can route your requests to any model supported by our gateway, including OpenAI GPT-5.5, Gemini 3.5 Flash, and Llama 3. This lets you optimize for speed and cost depending on the complexity of your refactoring task.

Does native web search work when routing Claude Code through Requesty?

Yes. Requesty's native web search capabilities are handled directly at the routing layer. If you enable web search on your API key or routing policy, Requesty injects real-time search context into the model's environment, providing up-to-date information without requiring local configuration.

How do I set the custom gateway URL in Claude Code?

You can set the gateway URL by exporting the `ANTHROPIC_BASE_URL` environment variable or by adding it permanently to the `env` block in your global `~/.claude/settings.json` file.
