Requesty's model governance stack solves the problem every platform team hits once they pass five engineers using LLMs: who gets access to what, and how do you enforce it without becoming a bottleneck? The answer is three layers that compose. Approved Models sets the org wide ceiling. Access Lists carve it into team shaped slices. Expiring API keys ensure credentials rotate without manual intervention. This post walks through how to wire them up together.
Full feature docs: Approved Models and Access Lists.
The problem you hit at scale
Small teams share one API key and nobody cares which model gets called. Then reality arrives:
- The intern calls GPT 5 in a loop and burns $4,000 overnight on a summarization experiment that could have used a model priced at $0.10 per million tokens.
- A compliance requirement lands saying customer data can only touch EU resident models, but half your keys route everywhere.
- Production breaks because someone switched the model field in a config file to a reasoning model that takes 40 seconds per request.
- An old key leaks in a public repo and nobody knows which team owned it or what it could access.
Each of these is a governance failure. The traditional answer is "just be careful" or "put it in a wiki." The actual answer is policy as configuration at the gateway layer.
Layer 1: Approved Models (the org wide whitelist)
This is the broadest control. From the Admin Panel you define which models exist for your organization. Everything not on this list is invisible to your members and inaccessible to your keys.
| What it controls | How it works |
|---|---|
| Model visibility in dashboard | Non approved models hidden from Model Library and Chat |
| GET /v1/models response | Only returns approved models for the calling key |
| Routing policy targets | Fallback chains and load balancers only pick from approved set |
| New model releases | Not approved by default, requires explicit admin action |
Quick start presets
If you are just getting started, Requesty ships one click presets:
| Preset | What it approves |
|---|---|
| EU Only | Models hosted in EU data centers exclusively |
| EU + ZDR | EU models plus those with Zero Data Retention policies |
| US Only | US hosted models |
| ZDR Only | Any model with Zero Data Retention regardless of region |
Pick a preset, then fine tune. Remove what you do not need, add specific models from other regions if your compliance posture allows it.
The design choice: deny by default
New models that appear in the Requesty catalog are not automatically approved. This is the opposite of how most teams operate (where everything is allowed until someone complains). Deny by default means your security posture never degrades silently. You review new models on your schedule and approve them deliberately.
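A minimal sketch of the deny by default idea, assuming the approved set and the catalog are plain sets of model IDs. The function names and the `example/new-model-v1` ID are hypothetical and are not part of Requesty's API; the point is only that a new catalog entry stays denied until someone adds it to the approved set.

```python
def is_allowed(model_id: str, approved: set[str]) -> bool:
    """Deny by default: only explicitly approved models pass."""
    return model_id in approved


def pending_review(catalog: set[str], approved: set[str]) -> list[str]:
    """Catalog models that no admin has approved yet (still denied)."""
    return sorted(catalog - approved)


approved = {"openai/gpt-4.1-mini", "anthropic/claude-haiku-4-5"}

# Hypothetical new release appearing in the catalog after the last review.
catalog = approved | {"example/new-model-v1"}

for model_id in pending_review(catalog, approved):
    print(f"awaiting approval: {model_id}")
```

Because the check is an allow list lookup rather than a block list, nothing has to be updated when the catalog grows; the new model simply fails `is_allowed` until it is approved.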
This pattern mirrors how Cloudflare's AI Gateway handles rate limiting at the infrastructure layer, and how LiteLLM's proxy enforces model scoping per virtual key. The difference is Requesty combines all three controls (model whitelist, team scoping, and key expiration) in a single managed surface rather than requiring you to wire up separate systems.
Layer 2: Access Lists (team shaped subsets)
Approved Models answers "what can the org use?" Access Lists answer "what can this specific team or workload use?"
An access list is a named, reusable collection of model IDs. You create it once, then attach it to groups (for team wide policies) or directly to API keys (for workload specific restrictions).
How the hierarchy resolves
When a request arrives, Requesty checks these layers in order. The first non empty layer wins:
API key's own access list
↓ (if none)
Union of access lists from the key's groups
↓ (if none)
Organization Approved Models
↓ (if nothing configured)
Full catalog (everything allowed)
This means you can set broad permissions at the org level and progressively tighten for specific teams or production workloads without touching the org config.
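The resolution order can be sketched as a small function. This is an illustration of the "first non empty layer wins" rule as described here, not Requesty's implementation; the function name and parameters are hypothetical, and treating an empty list the same as "not configured" is an assumption.

```python
from typing import Iterable, Optional


def resolve_allowed_models(
    key_list: Optional[Iterable[str]],
    group_lists: Iterable[Iterable[str]],
    org_approved: Optional[Iterable[str]],
    full_catalog: Iterable[str],
) -> set[str]:
    """Return the models a key may call: the first non-empty layer wins."""
    # Layer 1: an access list attached directly to the key.
    key_models = set(key_list or ())
    if key_models:
        return key_models

    # Layer 2: union of the access lists from every group the key belongs to.
    group_union = set().union(*group_lists)
    if group_union:
        return group_union

    # Layer 3: the organization's Approved Models.
    org_models = set(org_approved or ())
    if org_models:
        return org_models

    # Nothing configured anywhere: the full catalog is allowed.
    return set(full_catalog)


catalog = {"openai/gpt-4.1", "openai/gpt-4.1-mini",
           "anthropic/claude-haiku-4-5", "anthropic/claude-sonnet-4-5"}
org = catalog - {"openai/gpt-4.1"}

# A key with no list of its own, in two groups with different lists.
allowed = resolve_allowed_models(
    key_list=None,
    group_lists=[["openai/gpt-4.1-mini"], ["anthropic/claude-haiku-4-5"]],
    org_approved=org,
    full_catalog=catalog,
)
print(sorted(allowed))
```

Note that the layers do not intersect: a key level list replaces the group union outright rather than narrowing it.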
Real world example: three teams, one org
Imagine you have approved 40 models at the org level. Here is how you carve that into team policies:
| Team | Access list name | Models included | Attached to |
|---|---|---|---|
| Frontend (chat features) | Chat Production | openai/gpt-4.1-mini, anthropic/claude-haiku-4-5 | Group: Frontend Eng |
| Data Science (experimentation) | Research Wide | 30 models across all providers | Group: Data Science |
| Production Agents | Agent Strict | anthropic/claude-sonnet-4-5, openai/gpt-4.1 | Directly on 3 API keys |
| Customer Support Bot | Support EU | vertex/gemini-2.5-flash, anthropic/claude-haiku-4-5 | Group: Support Ops |
The data science team can experiment freely across 30 models. The production agent keys can only ever call two. The support bot is locked to EU resident models. All from one admin panel, no application code changes.
Creating and attaching (step by step)
Create the list:
- Admin Panel → Access Lists → Create Access List
- Name it clearly (e.g. "Production Agents Q2 2026")
- Search and select models by provider or name
- Save
Attach to a group:
- Admin Panel → Groups → expand target group
- Access List section → Manage → select from dropdown
- Save. Every key in that group immediately inherits the restriction.
Attach to a specific key:
- Admin Panel → API Keys → select the key (or select multiple for bulk)
- Action bar → Attach Access List → pick from dropdown
- Done. This overrides the group list for that key only.
What developers see
When a developer calls GET /v1/models with their key, they only see models their key is allowed to use. Tools like Claude Code, Cursor, GitHub Copilot, and Open WebUI all call this endpoint to populate their model dropdowns. The restriction is invisible in the best way: developers never see models they cannot use, so there is no confusion and no Slack threads asking "why did my request 403?"
If someone does manage to hardcode a model ID that is not on their list, the request fails with a "provider violates policy" error and is never forwarded to the upstream provider. Zero data leaves your gateway.
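On the client side, a tool can use the model list to catch a hardcoded ID before sending a request. The sketch below assumes the GET /v1/models response follows the OpenAI compatible shape (a `data` array of objects with an `id` field); that shape, the helper names, and the sample payload are assumptions for illustration, not taken from Requesty's documentation.

```python
def visible_model_ids(models_response: dict) -> set[str]:
    """Extract model IDs from an OpenAI-style list response (assumed shape)."""
    return {entry["id"] for entry in models_response.get("data", [])}


def ensure_model_allowed(model_id: str, visible: set[str]) -> None:
    """Fail fast locally instead of waiting for the gateway to reject the call."""
    if model_id not in visible:
        raise ValueError(f"{model_id!r} is not available to this key")


# Hypothetical payload, as a key restricted to two models might receive it.
sample_response = {
    "object": "list",
    "data": [
        {"id": "openai/gpt-4.1-mini"},
        {"id": "anthropic/claude-haiku-4-5"},
    ],
}

visible = visible_model_ids(sample_response)
ensure_model_allowed("openai/gpt-4.1-mini", visible)  # passes silently
```

This is a convenience check only. Enforcement still happens at the gateway, so skipping the check does not let a request through.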
Layer 3: Expiring API keys (enforced rotation)
The third piece of the governance stack is temporal. Even with perfect model access controls, a key that lives forever is a key that eventually leaks. Requesty supports key expiration: you set a date, the key auto revokes at that time, and your security log records the event.
Why expiration matters for team governance
| Scenario | Without expiration | With expiration |
|---|---|---|
| Contractor engagement ends | Key lives on, forgotten | Key dies on contract end date |
| Quarterly security rotation | Someone files a ticket, maybe | Automatic, zero human action |
| Hackathon or POC | Temporary key becomes permanent | Key expires Monday morning |
| Incident response | Revoke all keys manually | Expired keys already dead |
This pattern is standard in mature credential management. AWS IAM temporary credentials, GitHub fine grained tokens with expiry, and GCP service account key rotation all enforce the same principle: credentials should have a natural death. Requesty brings that same discipline to LLM access keys.
Pairing expiration with access lists
The combination is powerful. A contractor gets a key that:
- Expires in 90 days (matches their contract)
- Has an access list limiting them to 3 models (matches their workload)
- Has a monthly spend limit of $500 (matches their budget)
Three constraints, one key, zero ongoing admin work. When the contract ends the key dies on its own. No offboarding ticket, no forgotten revocation.
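The three constraints can be modeled as one policy object checked on every request. This is a sketch under stated assumptions: `KeyPolicy`, `authorize`, and the check order are hypothetical and only mirror the contractor example above; the gateway's real evaluation order and error messages may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class KeyPolicy:
    expires_at: datetime               # key stops working at this instant
    allowed_models: frozenset[str]     # the key's access list
    monthly_spend_limit_usd: float     # budget for the current month


def authorize(policy: KeyPolicy, model_id: str, now: datetime,
              spent_this_month_usd: float) -> tuple[bool, str]:
    """Return (allowed, reason); every constraint must pass."""
    if now >= policy.expires_at:
        return False, "key expired"
    if model_id not in policy.allowed_models:
        return False, "model not on access list"
    if spent_this_month_usd >= policy.monthly_spend_limit_usd:
        return False, "monthly spend limit reached"
    return True, "ok"


issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
contractor = KeyPolicy(
    expires_at=issued + timedelta(days=90),
    allowed_models=frozenset({
        "openai/gpt-4.1-mini",
        "anthropic/claude-haiku-4-5",
        "vertex/gemini-2.5-flash",
    }),
    monthly_spend_limit_usd=500.0,
)

print(authorize(contractor, "openai/gpt-4.1-mini",
                issued + timedelta(days=10), spent_this_month_usd=120.0))
```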
Putting it all together: a governance playbook
Here is the sequence for a platform engineer setting this up from scratch:
Step 1: Set your org wide ceiling
Go to Approved Models. Start with a preset if compliance dictates a region, otherwise approve the models your teams have asked for. Be generous here. This is the ceiling, not the assignment.
Step 2: Create access lists per policy boundary
Think in terms of policies, not teams. A team might need different policies for different workloads:
| Policy | Models | Rationale |
|---|---|---|
| Cheap and fast | gpt-4.1-mini, claude-haiku-4-5, gemini-2.5-flash | High volume, low cost workloads |
| Frontier reasoning | o3, claude-sonnet-4-5 | Complex tasks that justify the cost |
| EU compliant | vertex/gemini, any ZDR model | Customer data workloads under GDPR |
| Production locked | Exactly 2 models | Stable, tested, no surprises |
Step 3: Map policies to groups and keys
Attach the broad lists to groups. Attach the strict lists directly to production keys. The hierarchy handles the rest.
Step 4: Set key expiration
For every key that serves a temporary purpose (contractor, POC, hackathon, staging environment), set an expiration date at creation time. For production keys, set quarterly expiration and rotate on schedule.
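Picking the expiration date is simple date arithmetic. The helpers below are a hypothetical sketch for computing the value you would enter at key creation: a fixed term for temporary keys, and the first day of the next calendar quarter for production keys. Aligning rotation to calendar quarters is an assumption; any fixed schedule works.

```python
from datetime import date, timedelta


def fixed_term_expiry(start: date, days: int = 90) -> date:
    """Expiry for a temporary key, e.g. a 90 day contractor engagement."""
    return start + timedelta(days=days)


def next_quarter_start(today: date) -> date:
    """First day of the calendar quarter after the one containing `today`."""
    quarter_index = (today.month - 1) // 3  # 0 for Jan-Mar ... 3 for Oct-Dec
    if quarter_index == 3:
        return date(today.year + 1, 1, 1)
    return date(today.year, quarter_index * 3 + 4, 1)


print(fixed_term_expiry(date(2026, 1, 1)))
print(next_quarter_start(date(2026, 5, 15)))
```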
Step 5: Review monthly
New models ship constantly. Review the Approved Models list when a new release drops. Check Usage Analytics to find models that are approved but unused (remove them to reduce surface area). Audit which keys have expired and confirm replacements are in place.
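The monthly review boils down to two set style checks. The sketch below assumes you have per model request counts and a list of key records available as plain Python data; the field names (`name`, `expires_on`, `replaced_by`) and the sample values are hypothetical, since the export format is not described here.

```python
from datetime import date


def approved_but_unused(approved: set[str],
                        request_counts: dict[str, int]) -> list[str]:
    """Approved models with no traffic: candidates for removal."""
    return sorted(m for m in approved if request_counts.get(m, 0) == 0)


def expired_without_replacement(keys: list[dict], today: date) -> list[str]:
    """Names of keys that have expired and have no recorded successor."""
    return [
        k["name"] for k in keys
        if k["expires_on"] <= today and not k.get("replaced_by")
    ]


approved = {"openai/gpt-4.1-mini", "openai/o3", "anthropic/claude-haiku-4-5"}
request_counts = {"openai/gpt-4.1-mini": 18250, "anthropic/claude-haiku-4-5": 940}

keys = [
    {"name": "agents-prod-q1", "expires_on": date(2026, 4, 1), "replaced_by": "agents-prod-q2"},
    {"name": "hackathon-demo", "expires_on": date(2026, 3, 9), "replaced_by": None},
    {"name": "agents-prod-q2", "expires_on": date(2026, 7, 1), "replaced_by": None},
]

print(approved_but_unused(approved, request_counts))
print(expired_without_replacement(keys, today=date(2026, 4, 15)))
```

An expired key with no successor is not always a problem (a hackathon key should have none), so treat the second list as items to confirm rather than errors.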
What this replaces
Without a gateway governance layer, teams cobble together:
| Approach | Problem |
|---|---|
| One shared API key for everyone | No attribution, no access control, catastrophic blast radius on leak |
| Per provider key management | N providers × M teams = NM keys to track, no unified policy |
| Application code checks | Scattered, inconsistent, bypassable, not auditable |
| Honor system wiki pages | Nobody reads wikis under deadline pressure |
| Manual Slack approvals | Bottleneck on one person, no audit trail |
Requesty replaces all five with configuration. The admin panel is the single source of truth. The API enforces it. The audit log proves it.
Cross referencing: how other platforms handle this
The pattern of inserting a governance layer between teams and model providers is becoming industry standard:
LiteLLM Proxy takes the open source, self hosted approach. Virtual keys scoped to teams with model routing aliases give platform engineers full control, but require you to run and maintain the proxy yourself.
Cloudflare AI Gateway focuses on the infrastructure angle (rate limiting, caching, observability) but leaves fine grained team RBAC to other tools.
Portkey offers a commercial gateway with organization level credential sharing and model allow lists per workspace.
Requesty's differentiator is combining the model whitelist, the named access list hierarchy, key expiration, and the guardrails layer into one managed surface. You do not need to compose three tools to get governance. And because Requesty is the router itself (not a proxy in front of another proxy), access control adds zero latency overhead to the routing decision.
TL;DR
- Approved Models is your org wide whitelist. New models are denied by default. Start broad.
- Access Lists are named subsets you attach to groups or keys. They narrow the whitelist per team or per workload without touching org config.
- Resolution order: key list beats group union beats org approved beats full catalog.
- Expiring keys enforce rotation automatically. Set an expiry at creation, the key auto revokes on schedule.
- Developers see nothing they cannot use. GET /v1/models respects the resolved list, so tool dropdowns are always correct.
- Zero application code. All governance lives in the admin panel and applies at the gateway.
- Docs: Approved Models | Access Lists
Frequently asked questions
- What is the difference between Approved Models and Access Lists?
- Approved Models is the organization wide whitelist. It defines the broadest set of models anyone in your org can use. Access Lists are named subsets of that whitelist that you attach to specific groups or API keys to narrow access further. Think of Approved Models as the ceiling and Access Lists as the room dividers.
- Can one API key belong to multiple groups with different access lists?
- Yes. When a key belongs to multiple groups that each have an access list, Requesty takes the union of all group lists. The key can use any model that appears in at least one of its groups' lists. If you need to restrict further, attach a list directly to the key itself, which overrides the group union entirely.
- What happens when I remove a model from an access list that a team is actively using?
- The change takes effect immediately. Any subsequent request targeting that model from a key governed by that list will receive a "provider violates policy" error. The request is never forwarded upstream. Check Usage Analytics before removing to confirm the model is not in active use.
- Do expiring API keys delete themselves or just stop working?
- They stop working. The key is auto revoked at the expiration time, meaning requests return an authentication error, but the key record remains visible in the Admin Panel for audit. You can see expired keys in the security log and confirm the rotation happened.
- How does this compare to managing access at the provider level directly?
- Managing per provider means maintaining separate key sets, separate policies, and separate audit trails for every provider you use. Requesty collapses that into one governance layer regardless of whether you route to OpenAI, Anthropic, Google, or all three. One access list can contain models from five providers and still be managed as a single policy.

