Requesty
INTEGRATIONS / AGENTS · APR '26 · 8 MIN READ

Claude Cowork, on 300+ models: the Requesty integration

Thibault Jaigu
CEO & Co-Founder

Claude Cowork is Anthropic's agentic desktop app for knowledge work. It runs on your machine, touches local files and applications, and completes multi-step tasks the way a junior analyst would. As of April 9, 2026 it ships with every paid Claude plan (Pro, Team, Enterprise) and a quieter feature that matters much more for teams: Third Party Inference mode, a setting that lets Cowork route all its model calls through a gateway you control instead of Anthropic's cloud (Anthropic docs).

Point that setting at Requesty and the same desktop app (same UI, same Cowork behavior) runs against 300+ models (Sonnet, Opus, Haiku, GPT, Gemini, Mistral, Llama, Bedrock variants), with gateway-level fallback, load balancing, prompt caching, and per-request cost tracking bolted on. No Anthropic account is required at launch. No code changes. Ten minutes of setup.

This post explains what Cowork does, why the integration exists, how to configure it, and what it changes for the three teams who care most: developers, platform operators, and security or finance leads signing off on the rollout.

The 150 token answer

Claude Cowork with Requesty is Anthropic's desktop agent pointed at a 300+ model gateway. Same Cowork features (local file access, scheduled tasks, multi step execution, Zoom and DocuSign connectors) but model choice, fallback, caching, EU residency, and cost attribution now live at the gateway instead of the client. You set one header (X-Title: Claude-Cowork) and everything that runs through the Cowork tab is tagged, billed, and observable separately from the rest of your AI stack. Enterprise fleets deploy via .mobileconfig (macOS) or .reg (Windows) without enabling Developer Mode on individual machines.

What Claude Cowork actually does

Anthropic's framing is that Cowork is "built around the outcome" rather than the prompt (Anthropic product page). Instead of "answer this question", the pattern is "organize my vendor contracts folder by renewal date" or "pull this week's metrics and draft the board update". Cowork breaks the goal into steps, uses local files and applications, and loops the user in before consequential actions (Claude research preview post).

Anthropic positions it at a specific audience: researchers, analysts, operations teams, legal, finance. Not developers (that's Claude Code). The CNBC coverage described it as giving "the average office worker a productivity boost". The February 2026 enterprise push added role-based access controls, group spend limits, usage analytics through an Analytics API, and expanded OpenTelemetry support (eWeek).

One feature is worth flagging because it is the reason this integration exists.

Third Party Inference: the feature teams were quietly waiting for

Tucked inside Cowork is a mode Anthropic calls Cowork on 3P. It ships a single configuration surface that says: route every model call somewhere else. Anthropic's own overview doc is unusually direct about when to use it:

  - Data never touches Anthropic: "organizations whose security, regulatory, or contractual requirements prevent them from sending data to Anthropic's first party infrastructure"
  - Data stays in region: "in region data residency and cannot send conversation data to the United States"
  - Public sector and defense: "agencies and contractors operating under FedRAMP, ITAR, or sovereign cloud mandates"
  - Multi-provider flexibility: any gateway exposing /v1/messages with the right headers

In other words Anthropic built a sanctioned path out of its own API. The gateway "needs to expose /v1/messages and forward the anthropic-beta and anthropic-version headers" (ProductCompass walkthrough). Requesty exposes exactly that endpoint with exactly those headers, which is why Cowork can treat Requesty as if it were Anthropic and the user cannot tell the difference.
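To make the contract concrete, here is a minimal sketch of the request shape involved, assuming the bearer auth scheme and X-Title header from the setup section below. The endpoint path and anthropic-version header come from the Anthropic Messages API; the key and prompt values are placeholders.

```python
import json

# Hypothetical sketch of a Cowork-style call once Third Party Inference
# points at Requesty. Header names follow the Anthropic Messages API.
GATEWAY = "https://router.requesty.ai"

def build_cowork_request(api_key: str, model: str, prompt: str):
    """Assemble URL, headers, and JSON body for a gateway /v1/messages call."""
    url = f"{GATEWAY}/v1/messages"
    headers = {
        "Authorization": f"Bearer {api_key}",  # auth scheme: bearer
        "anthropic-version": "2023-06-01",     # forwarded by the gateway
        "content-type": "application/json",
        "X-Title": "Claude-Cowork",            # tags traffic in analytics
    }
    body = json.dumps({
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_cowork_request(
    "rq-your-key", "anthropic/claude-sonnet-4-5", "Summarize this folder."
)
```

Because Requesty answers at the same path with the same headers, nothing on the client side has to know it is not talking to Anthropic.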

Why this integration matters, in three audiences

For developers and power users

Cowork ships with a model picker. Without Third Party Inference, that picker is Anthropic only. With Requesty configured, the same picker exposes 300+ models using the same provider/model-name shape Requesty uses elsewhere (Requesty integration docs):

anthropic/claude-sonnet-4-5
openai/gpt-4o
google/gemini-2.5-pro
bedrock/claude-opus-4-6
mistral/mistral-large-latest

You can also reference a policy by name (policy/prod, policy/eu-only, policy/cheap-first) and let the gateway resolve model selection, fallback order, and load balance weights from Requesty's config. Same Cowork UI, one string in the model picker, a full routing graph behind it. This matters because Cowork's appeal is that it works on whole outcomes: if the frontier model is down or slow, a mid turn failover keeps the task moving instead of stopping halfway through a spreadsheet build.
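Conceptually, a policy string expands to an ordered fallback chain on the gateway side. The toy resolver below illustrates the failover behavior only; the chain contents are made up, and real policies live in Requesty's routing config, not in client code.

```python
# Illustrative only: a hypothetical expansion of two policy strings.
POLICIES = {
    "policy/prod": [
        "anthropic/claude-sonnet-4-5",
        "openai/gpt-4o",
        "google/gemini-2.5-pro",
    ],
    "policy/eu-only": ["mistral/mistral-large-latest"],
}

def resolve_and_call(model_string: str, call):
    """Try each model in the chain until one succeeds (mid-turn failover)."""
    chain = POLICIES.get(model_string, [model_string])  # plain strings pass through
    last_err = None
    for model in chain:
        try:
            return call(model)
        except RuntimeError as err:  # e.g. provider outage or timeout
            last_err = err
    raise last_err

# Stubbed caller where the first-choice model is "down":
def flaky(model):
    if model == "anthropic/claude-sonnet-4-5":
        raise RuntimeError("provider unavailable")
    return f"answered by {model}"
```

With `flaky` standing in for a real provider call, `resolve_and_call("policy/prod", flaky)` falls through to the second model in the chain instead of failing the whole task.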

Latency-wise, Requesty's benchmark post puts gateway overhead at roughly 16ms per request, against 55ms for OpenRouter and 124ms for self-hosted LiteLLM. On a Cowork task that makes twenty model calls, that compounds to a ~0.3s total tax for Requesty versus ~1.1s for OpenRouter and ~2.5s for LiteLLM.

For platform and DevEx teams

The operational story is cost attribution. Cowork is an agent: one user session can fire hundreds of model calls across Sonnet, Haiku, and a frontier escalation. Without a way to tag that traffic, Cowork spend looks identical to any other Anthropic API traffic, and finance cannot distinguish "legal team drafting" from "engineering team building" from "one runaway loop".

The Requesty configuration sets an X-Title: Claude-Cowork header on every outbound request. In Requesty's Usage Analytics, that header becomes a filter dimension. Cowork traffic sits next to Claude Code, Cline, Cursor, Aider, or a direct API integration in the same dashboard, each isolated by title. Per API key spend attribution lets you hand each team its own key and see exactly who spent what without standing up a separate billing system (labeling API keys for cost attribution).
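The rollup this enables is simple to picture. The sketch below groups per-request costs by the X-Title tag, the same dimension the Usage Analytics dashboard filters on; the log field names here are illustrative, not Requesty's actual export schema.

```python
from collections import defaultdict

# Sketch: summing per-request cost by X-Title tag. Field names
# (x_title, cost_usd) are assumptions for illustration.
def spend_by_title(request_logs):
    totals = defaultdict(float)
    for log in request_logs:
        totals[log["x_title"]] += log["cost_usd"]
    return dict(totals)

logs = [
    {"x_title": "Claude-Cowork", "cost_usd": 0.042},
    {"x_title": "Claude-Code",   "cost_usd": 0.310},
    {"x_title": "Claude-Cowork", "cost_usd": 0.011},
]
```

Hand each team its own labeled key and the same grouping splits spend by team as well as by application.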

This also solves a quieter problem. Cowork runs locally with conversation history on device (Anthropic 3P overview), so by default admins have no visibility into what employees are doing with it. Requesty's request logs (opt in, with PII redaction) give you the missing half: the prompts and completions flowing through the gateway, visible to whichever admin role you grant.

For security, compliance, and finance leads

Three things land in this audience's in tray.

EU data residency. Switch the gateway URL from https://router.requesty.ai to https://router.eu.requesty.ai and Cowork traffic stays on EU infrastructure. For GDPR conscious industries (legal, finance, healthcare analytics) this is the difference between an approved rollout and a six week InfoSec review followed by a no.

Fleet deployment without Developer Mode. Developer Mode is a per-user setting, which is awful for IT. Requesty's setup page exports a signed configuration profile in either .mobileconfig format (pushable through Jamf, Intune, Workspace ONE) or .reg format (Windows Group Policy). Machines pick up the gateway config on next login. Users never see a developer toggle.

Spend controls. Cowork is agentic, which means an enthusiastic prompt can chain into many model calls. Requesty Budgets and Alerts lets you cap org level spend, alert at 80% of budget, and hard stop at 100%, while individual API keys can have their own lower caps. Anthropic's enterprise tier offers group spend limits (eWeek coverage) but those apply only to Anthropic model spend. The gateway version works for every model in the picker.
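The threshold logic described above is easy to state precisely. This is a sketch of the behavior, not Requesty's implementation: the 80% alert and 100% hard stop mirror the post, and the same check applies per key with a lower cap.

```python
# Sketch of budget-guard behavior: alert at 80% of a cap, hard stop at 100%.
def budget_status(spent: float, cap: float) -> str:
    if spent >= cap:
        return "blocked"  # hard stop: further requests refused
    if spent >= 0.8 * cap:
        return "alert"    # notify admins, keep serving
    return "ok"
```

An org cap and a per-key cap compose naturally: a request is served only if both checks return something other than "blocked".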

Setup in five steps

From the Requesty docs page:

  1. Enable Developer Mode. Help → Troubleshooting → Enable Developer Mode. (Skip this step for fleet rollouts using the exported profile.)
  2. Open Third Party Inference. Developer → Configure Third Party Inference, select "Gateway" backend.
  3. Configure the gateway.
    Gateway base URL:  https://router.requesty.ai
    API key:           <your Requesty key from app.requesty.ai/api-keys>
    Auth scheme:       bearer
    Extra headers:     X-Title: Claude-Cowork
  4. Apply locally. Saves the configuration on the machine.
  5. Restart Cowork and choose "Continue with Gateway" at launch. No Anthropic sign in required.

For EU traffic, substitute https://router.eu.requesty.ai in step 3. If you see 401 Unauthorized on first run, the usual culprit is stray whitespace in the pasted API key or an auth scheme set to something other than bearer.
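Both 401 causes can be caught before the first request ever leaves the machine. A quick local check, purely illustrative:

```python
# Sanity-check the two usual 401 culprits: whitespace pasted into the
# API key, and a non-bearer auth scheme. Illustrative helper, not part
# of any Requesty or Cowork tooling.
def validate_gateway_config(api_key: str, auth_scheme: str):
    problems = []
    if api_key != api_key.strip():
        problems.append("API key has leading/trailing whitespace")
    if any(ch.isspace() for ch in api_key.strip()):
        problems.append("API key contains embedded whitespace")
    if auth_scheme.strip().lower() != "bearer":
        problems.append(f"auth scheme should be 'bearer', got '{auth_scheme}'")
    return problems
```

An empty result means the config is at least well-formed; an actual test request against the gateway confirms the key is live.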

Models and policies available from the Cowork picker

Anything Requesty routes becomes a valid Cowork model string. The useful ones to know:

  - anthropic/claude-sonnet-4-5: default Cowork workhorse, stays on Anthropic
  - anthropic/claude-opus-4-6: frontier reasoning for hard synthesis steps
  - openai/gpt-4o: broad tool use, strong structured output
  - google/gemini-2.5-pro: long context, often cheaper on large file workflows
  - bedrock/claude-opus-4-6: same model, AWS Bedrock execution and data path
  - mistral/mistral-large-latest: EU hosted option, useful for residency cases
  - policy/prod: resolved by Requesty (fallback chain, weights, region)
  - policy/eu-only: forces EU region models regardless of picker selection

The [1m] suffix unlocks 1M token context variants on supported models (Requesty docs).

What this does not do

Three honest limitations.

  1. Plugins and connectors still run through Anthropic. Zoom summaries, DocuSign integration, Google Drive (TestingCatalog coverage) use Anthropic's plugin infrastructure even when model inference is routed through a gateway. If your compliance bar is "nothing touches Anthropic", you need to disable those connectors.
  2. Mobile Cowork (iOS and Android) does not support Third Party Inference. Desktop only. The mobile apps still route through Anthropic.
  3. Org level Approved Models rules still apply. If your Requesty org restricts what models any key can call, Cowork inherits that allow list. Features and models not enabled at the org do not appear in the Cowork picker either.
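The allow-list inheritance in point 3 amounts to a filter over the picker contents. A minimal sketch, with made-up model and allow-list values:

```python
# Sketch: an org-level Approved Models list pruning the Cowork picker.
def picker_models(all_models, approved):
    """Return only the model strings the org allow-list permits, in order."""
    allowed = set(approved)
    return [m for m in all_models if m in allowed]

catalog = [
    "anthropic/claude-sonnet-4-5",
    "openai/gpt-4o",
    "google/gemini-2.5-pro",
]
org_allow_list = ["anthropic/claude-sonnet-4-5", "google/gemini-2.5-pro"]
```

Anything outside the allow-list simply never appears in the Cowork picker, so users cannot route around the policy from the client.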

The upgrade path

If you already route Claude Code or Cursor or Cline through Requesty, this is a one header addition. Same API key, same policies, same analytics, new source (X-Title: Claude-Cowork) showing up in the usage filter dropdown within minutes of first use. If you have not onboarded yet, this is the gentlest possible first integration: no code to write, no agent framework to stand up, no CI pipeline to modify. A product leader with a Requesty key and admin rights on their laptop can validate the whole loop in a coffee break and show finance a line item broken out by application by the end of the day.

The headline Anthropic used for the GA announcement was that Cowork brings "Claude Code power for knowledge work" (research preview post). The honest version for teams is: it brings Claude Code's model flexibility story to knowledge work, once you point it at a gateway. That gateway is what this integration is.

Setup page: docs.requesty.ai/integrations/claude-cowork. Sign up: app.requesty.ai.

Frequently asked questions

What is Claude Cowork?
Claude Cowork is Anthropic's desktop agent for knowledge work. It runs on macOS and Windows, accesses local files and applications, and completes multi-step tasks (organizing files, preparing reports, extracting data from unstructured documents) with the user looped in before significant actions. It went generally available on all paid plans on April 9, 2026.
What does the Requesty integration unlock?
Point Cowork's Third Party Inference setting at https://router.requesty.ai and the app's model picker exposes 300+ models from Anthropic, OpenAI, Google, AWS Bedrock, Mistral, Meta, and others. One gateway, one API key, one invoice, with automatic fallback, load balancing, and per request cost attribution. No Anthropic account is required when launching with 'Continue with Gateway'.
Why would a team route Cowork through a gateway instead of Anthropic's direct API?
Four reasons Anthropic's own documentation flags: regulatory or contractual constraints that block sending data to Anthropic first party infrastructure, in region data residency requirements (FedRAMP, ITAR, EU GDPR), existing enterprise gateway investments that already proxy Claude Code and API traffic, and multi model flexibility (GPT, Gemini, Llama, or a local model in the same desktop app).
How do you set it up?
Enable Developer Mode (Help → Troubleshooting → Enable Developer Mode), open Developer → Configure Third Party Inference, select 'Gateway' as the backend, enter base URL https://router.requesty.ai, paste your Requesty API key, set auth scheme to 'bearer', and add header X-Title: Claude-Cowork. Apply locally, restart Cowork, choose 'Continue with Gateway' at launch. For EU data residency use https://router.eu.requesty.ai.
Can this be deployed across a fleet without enabling Developer Mode on each laptop?
Yes. Requesty's configuration page exports a .mobileconfig profile for macOS (Jamf, Intune, Workspace ONE) or a .reg file for Windows Group Policy. Admins push the config through MDM and machines pick up the gateway endpoint on next login, without individual users touching Developer Mode.
How is traffic separated from other Requesty usage in analytics?
The integration requires the header X-Title: Claude-Cowork on every request. In Requesty Usage Analytics, filter by X-Title to see Cowork spend separately from Claude Code, Cline, Cursor, Aider, or programmatic API traffic. Per API key attribution lets finance split spend by team without standing up another billing system.