Most LLM teams treat user feedback like an afterthought: a thumbs-up widget in the UI that nobody reads. That's a mistake. A single bit of post-hoc quality signal per request is the cheapest eval you will ever run, and the one that's closest to what users actually want. Requesty's Request Feedback API is a metadata sidecar that attaches arbitrary signals — ratings, tags, comments, user IDs — to any completed request by its request_id, so the feedback can drive routing, prompt changes, and quality regression alerts.
This post covers the why, the wire format, and three patterns worth copying.
## Why feedback beats offline evals at the 80% mark
Offline evals catch the cases you thought to write. Users hit the cases you didn't. A 2024 audit I keep citing found about 15% of production LLM failures happen on inputs that never appear in the eval set — because users are creative and your test writers aren't. That gap is where feedback wins.
| Signal | Latency to detect issue | Covers unknown cases? | Cost |
|---|---|---|---|
| Offline evals | Next eval run (daily) | No — only what's tested | High (eng time) |
| Canary metrics | Minutes | Partial — aggregate only | Low |
| User feedback | Real-time | Yes | Near-zero |
| Post-hoc audit | Days | Yes | High (human) |
Feedback plugs the gap between "we have tests" and "we know the product is working."
## The wire format (10 lines of code)
Every Requesty chat completion returns a request_id. Capture it, store it against whatever user action produced it, and POST back when you learn whether the output was good.
```python
from openai import OpenAI
import httpx

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="your-requesty-api-key",
)

user_question = "How do I reset my 2FA?"  # placeholder: whatever your product collects

resp = client.chat.completions.create(
    model="policy/support-v2",
    messages=[{"role": "user", "content": user_question}],
)
answer = resp.choices[0].message.content
request_id = resp.id  # keep this around

# Later — when the user clicks 👍 / 👎, or your eval pipeline scores it:
httpx.post(
    f"https://api.requesty.ai/feedback/{request_id}",
    headers={"Authorization": "Bearer your-requesty-api-key"},
    json={
        "data": {
            "rating": 5,  # or 1, whatever scale you use
            "helpful": True,
            "message": "Resolved the user's issue.",
            "user_id": "u_28419",
            "tags": ["support", "resolved", "first-reply"],
        }
    },
)
```

The `data` object is completely free-form. `rating`, `helpful`, `message`, `user_id`, `tags` — or whatever schema your team uses. Multiple submissions merge: you can POST a thumbs-up from the UI immediately and a structured eval score from your nightly pipeline later, and both end up on the same request.
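As a concrete version of that second submission, here's a sketch of a nightly enrichment pass. The `eval_score` field, tag values, and `request_id` are arbitrary examples, since the `data` object is free-form:

```python
import httpx

# A request_id captured earlier and stored alongside the transcript.
request_id = "req_abc123"  # placeholder

# Nightly enrichment pass: append a structured eval score to the same
# request the UI already thumbed up. Field names are arbitrary examples;
# this merges with the earlier submission rather than replacing it.
httpx.post(
    f"https://api.requesty.ai/feedback/{request_id}",
    headers={"Authorization": "Bearer your-requesty-api-key"},
    json={
        "data": {
            "eval_score": 0.87,  # from your offline grader
            "tags": ["nightly-eval", "prompt_v=2025-11-08"],
        }
    },
)
```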
## Three patterns worth copying

### 1. Feedback-driven canary promotion
Run a 90/10 load-balancing policy. Your stable routing policy gets 90%, an experimental one gets 10%. Pipe feedback into your warehouse. Daily: compute `avg(rating)` and `count(helpful=false)` per policy. If the experimental one wins by a margin you set, promote its weight to 20, then 50, then 100.
This is the cleanest version of "progressive delivery" for LLMs — you don't need an eval harness or a team of labellers. Real users are the evaluator.
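A minimal sketch of the daily check, assuming feedback rows land in a warehouse table named `feedback` with `policy`, `rating`, `helpful`, and `submitted_at` columns (the table, column, and policy names are assumptions, not a Requesty schema):

```python
# Hypothetical daily promotion job. Table and column names are assumptions;
# swap in your own warehouse client and schema.
import duckdb

LADDER = [10, 20, 50, 100]  # weight steps for the experimental policy
MARGIN = 0.2                # avg-rating lead required before promoting

con = duckdb.connect("feedback.db")
rows = con.execute("""
    SELECT policy,
           avg(rating)                         AS avg_rating,
           count(*) FILTER (WHERE NOT helpful) AS bad_count
    FROM feedback
    WHERE submitted_at >= now() - INTERVAL 1 DAY
    GROUP BY policy
""").fetchall()
stats = {policy: (avg_rating, bad_count) for policy, avg_rating, bad_count in rows}

stable = stats["policy/stable"]             # hypothetical policy names
experimental = stats["policy/experimental"]
current_weight = 10  # hypothetical: read from your policy config

# Promote only if the experimental policy rates higher AND isn't
# generating more explicit "not helpful" signals.
if experimental[0] >= stable[0] + MARGIN and experimental[1] <= stable[1]:
    next_weight = LADDER[min(LADDER.index(current_weight) + 1, len(LADDER) - 1)]
    print(f"Promote experimental policy: {current_weight}% -> {next_weight}%")
```

The promotion itself stays a human or scripted config change; the feedback stream just supplies the evidence.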
### 2. Bad-feedback → auto-escalate
Tag low-rated requests in real time and route the next turn from that same user to a stronger model. Pseudocode:
```python
if last_rating_from(user_id) <= 2:
    # The cheap model failed this user recently; give them the strong tier.
    resp = client.chat.completions.create(
        model="policy/escalated",  # opus / gpt-5 tier
        messages=messages,
    )
else:
    resp = client.chat.completions.create(
        model="policy/default",
        messages=messages,
    )
```

You're using the feedback stream as a routing primitive — a "the cheap model failed this user once, give them the good one" signal. See also: Routing policies 101 for how to build policies like `policy/escalated`.
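The `last_rating_from` helper above is whatever lookup your app keeps over its own feedback writes. A purely illustrative in-memory version:

```python
# Illustrative only: in production this would read from the same store
# your feedback POSTs are written to (Postgres, Redis, the warehouse, ...).
_last_ratings: dict[str, int] = {}

def record_rating(user_id: str, rating: int) -> None:
    # Call this wherever you also POST feedback to Requesty.
    _last_ratings[user_id] = rating

def last_rating_from(user_id: str) -> int:
    # Default to a neutral score for users with no feedback yet,
    # so they stay on the cheap policy.
    return _last_ratings.get(user_id, 5)
```

Wire `record_rating` into the same handler that POSTs feedback, and the escalation check costs one dictionary lookup per turn.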
### 3. Regression alerts with tags
Tag every feedback submission with the product surface that generated it (support, onboarding, checkout, etc.) and the prompt version (`prompt_v=2025-11-08`). When you ship a new prompt, watch the rolling `helpful=false` rate per surface. If checkout drops 3 points overnight, the new prompt is the suspect. Roll back.
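A sketch of that check, reusing the hypothetical warehouse table from pattern 1 plus a `surface` column derived from the tags (schema and threshold are assumptions):

```python
# Hypothetical regression alert: flag any surface whose helpful=false rate
# rose more than 3 points vs. the trailing week. Schema is an assumption.
import duckdb

THRESHOLD = 0.03  # 3 percentage points

con = duckdb.connect("feedback.db")
rows = con.execute("""
    SELECT surface,
           avg(CASE WHEN helpful THEN 0.0 ELSE 1.0 END)
             FILTER (WHERE submitted_at >= now() - INTERVAL 1 DAY) AS rate_today,
           avg(CASE WHEN helpful THEN 0.0 ELSE 1.0 END)
             FILTER (WHERE submitted_at <  now() - INTERVAL 1 DAY) AS rate_baseline
    FROM feedback
    WHERE submitted_at >= now() - INTERVAL 8 DAY
    GROUP BY surface
""").fetchall()

for surface, today, baseline in rows:
    if today is not None and baseline is not None and today - baseline > THRESHOLD:
        print(f"ALERT {surface}: helpful=false rate up "
              f"{(today - baseline) * 100:.1f} pts; check the latest prompt_v tag")
```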
That's the whole workflow. It's not complicated — it's just that most teams don't have the plumbing to wire feedback back to requests in the first place, which is what the API gives you.
## What to track, and what to ignore
A common mistake is to capture too much. For most teams, the high-signal fields are:
- `rating` (1–5 or thumbs) — always capture
- `helpful` (boolean) — yes/no is easier for users than a 5-point scale; capture both if you can
- `user_id` — for cohort analysis
- `tags` with the prompt version and the routing policy — so you can attribute regressions
Don't make free-form text the primary field in `data`. Collect it in `message` if users volunteer it, but don't gate feedback on writing a comment — 90% of users won't.
## The one thing to take away
Production feedback is a continuous, free, real-user eval. It catches regressions your test suite can't, it pays for itself the first time you roll back a bad prompt because of it, and it's 15 lines of code to wire up. Routing gateways make it cheap; your product team makes it valuable.
Route → observe → feedback → route again. That's the loop.
## Frequently asked questions
**What is Request Feedback in Requesty?**
Request Feedback is a Requesty API endpoint that lets your application attach post-hoc signals — ratings, comments, booleans, tags, user IDs — to a completed LLM request using its `request_id`. The feedback is stored alongside the request and surfaced in the dashboard analytics, so you can slice quality by model, prompt, user segment, or any custom tag.

**How do I send feedback for an LLM response?**
Capture the `request_id` from the chat completion response, then POST to `https://api.requesty.ai/feedback/{request_id}` with a JSON body containing any fields you care about — `rating`, `helpful`, `message`, `user_id`, `tags`. Multiple submissions merge, so you can append enrichment later (e.g. after an eval job runs).

**What's the difference between feedback and evals?**
Feedback is a lightweight production signal you collect continuously from real users or downstream checks. Evals are heavier, structured test suites run offline. Feedback tells you which specific production requests failed and why — a signal your evals probably don't catch, because users hit edge cases you didn't write tests for.

**Can I use feedback to pick routing policies automatically?**
Not automatically on Requesty today — policies are configured explicitly, not learned. But you can feed the feedback stream into your own pipeline: filter requests with `rating < 3` in the last 24h, check which model served them, and promote or demote candidates in your load-balancing policy. Some teams do this on a daily cadence.

**Does feedback data affect billing or rate limits?**
No. The feedback API is a metadata sidecar — no token cost, no rate-limit impact on your inference traffic. It writes to the request log, nothing else.
## Related posts

- JAN '26 · Routing policies 101: fallback, load balancing, and latency in production. The three routing-policy primitives every LLM gateway needs — fallback chains, weighted load balancing, and latency-based selection — and when to use each. Written for teams deploying multi-model production setups.
- APR '26 · Agentic routing, benchmarked: Requesty adds 16ms of overhead, OpenRouter adds 55ms. Agentic routing is the decision layer inside a multi-agent LLM system that picks which model or sub-agent handles an incoming request. Here's what it does, what it costs, and how the gateways compare.
- JAN '26 · Designing fallback retries: why Requesty uses 500ms → 4s with jitter. A look at the retry schedule behind Requesty's fallback policies, why exponential backoff with jitter beats a tight retry loop, and the failure modes it actually protects against.

