Requesty Blog · Routing / Requesty Features · May '25 · 4 min read

Smarter-Than-Human Model Picking: Introducing Requesty Smart Routing

Thibault Jaigu
CEO & Co-Founder

https://www.youtube.com/watch?v=fx3gX7ZSC9c

TL;DR – Stop guessing which LLM is “best” for every prompt. Requesty Smart Routing automatically classifies each task (code, chat, SQL, creative writing, etc.) in ~50 ms and forwards it to the optimal model (GPT-4o, Claude, Gemini-Flash, DeepSeek, Mistral-Large, you name it). One API key, zero context-switching, up to 80% cost savings, and consistent latency.


1. Why We Built Smart Routing

Even power users struggle to juggle the expanding LLM zoo:

| Task | “Best” model today | Tokens / $1 | Latency |
| --- | --- | --- | --- |
| Short chit-chat | Gemini-Flash-2.5 | ~3,000 | ⚡ Fast |
| Mid-sized coding | Claude-4 Sonnet | ~1,100 | 🟡 Medium |
| Long-form blog | GPT-4o | ~240 | 🔴 Slow |

Tomorrow the table changes again.

Developers either (a) hard-code a single premium model and overpay, or (b) expose end-users to an intimidating “Pick your engine” drop-down. Both hurt UX and margins.

Smart Routing removes that decision entirely.


2. How It Works (Under the Hood)

  1. Task Classifier: A compact, in-house transformer (≈65M params, distilled from 50k annotated examples) inspects the system + user prompt and predicts a task label in 20–100 ms. Example labels: chat_small, code_medium, sql, creative_long, image_insight.
  2. Policy Engine: A YAML/JSON policy maps each label to:
  • Preferred model(s)
  • Budget ceiling
  • Max latency SLA
  • Fallback chain

YAML
code_medium:
  primary: "anthropic/claude-4-sonnet"
  fallback: ["openai/gpt-4o-mini", "mistral/mixtral-8x7b"]
  max_usd: 0.005        # per request
  max_latency_ms: 20000

  3. Router Gateway: The same endpoint you’re already using: https://router.requesty.ai/v1/chat/completions. Simply set model: "smart-task" (or any alias you choose) and pass your prompt. The gateway:
  • Calls the classifier
  • Consults the policy
  • Forwards to the chosen provider
  • Logs everything in Live Logs & Analytics
  4. Observability Loop: Every response is tagged with chosen_model, tokens, latency_ms, and cost_usd. These metrics feed back into policy tuning and your dashboards.
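Putting the pieces together, the classify → policy → forward flow can be sketched in a few lines of Python. This is an illustrative sketch, not our production code: the POLICY dict mirrors the code_medium policy shown above, classify() is a toy stand-in for the real transformer classifier, and the chat_small model ID is an assumption.

```python
# Illustrative sketch of the routing decision (not the production router).
# POLICY mirrors the YAML policy format above; model IDs are examples.
POLICY = {
    "chat_small": {
        "primary": "google/gemini-flash-2.5",
        "fallback": [],
    },
    "code_medium": {
        "primary": "anthropic/claude-4-sonnet",
        "fallback": ["openai/gpt-4o-mini", "mistral/mixtral-8x7b"],
    },
}

def classify(prompt: str) -> str:
    # Toy stand-in: the real router runs a ~65M-param distilled transformer here.
    return "code_medium" if "code" in prompt.lower() else "chat_small"

def route(prompt: str) -> list[str]:
    """Return the ordered model chain: primary first, then fallbacks."""
    rule = POLICY[classify(prompt)]
    return [rule["primary"], *rule["fallback"]]
```

The fallback chain is what keeps latency consistent: if the primary model errors or blows past max_latency_ms, the gateway retries down the list instead of surfacing the failure to your user.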

3. Live Demo Recap

In the launch video we:

  1. Connected OpenWebUI to router.requesty.ai/v1 with the alias smart-task.
  2. Asked “Who built you?” → Router picked Gemini-Flash in 2.8 s for under $0.001.
  3. Asked “Code a Snake game in Python.” → Router switched to Claude-4 Sonnet.
  4. Follow-up “Write a blog post about this Snake game.” → Router used Perplexity Sonar-Pro (fast, cheap long-form).

Three requests, three providers, zero manual switches.

Latency tax: ~65 ms average over 1 000 runs – imperceptible to humans.


4. Key Benefits

| 💡 | What you get | Why it matters |
| --- | --- | --- |
| One API, all models | No more env-var gymnastics or vendor-specific SDKs. | Faster prototyping, simpler back-end. |
| Automatic cost trimming | Cheaper models handle lightweight tasks. | Teams report 40–80% savings. |
| Consistent UX | Users never face “model selector anxiety”. | Higher retention, fewer support tickets. |
| Live analytics | Per-task spend, latency, error rates. | Data to renegotiate budgets or tweak prompts. |
| No vendor lock-in | Swap vendors via config, not code. | Future-proof as new models drop weekly. |

5. Smart Routing vs. DIY Prompt Engineering

| | Manual approach | Requesty Smart Routing |
| --- | --- | --- |
| Effort | Build & host a classifier, maintain policies, integrate N SDKs. | Plug & play. |
| Coverage | Depends on your own data set. | 50k-prompt corpus, updated monthly. |
| Edge cases | You chase a moving target alone. | We ship global fixes once, for everyone. |
| Observability | Stitch together logs from each provider. | Unified Live Logs & Analytics. |

6. Quick Start

Shell
curl https://router.requesty.ai/v1/chat/completions \
  -H "Authorization: Bearer $REQUESTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "smart-task",
        "messages": [
           {"role":"user","content":"Generate a SQL query to find the 5 most active users last month"}
        ]
      }'

That’s literally it. 💫
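Prefer Python? The router speaks the standard chat-completions wire format, so the same request can be built with nothing but the standard library. A minimal sketch, mirroring the curl payload above (build_request is a hypothetical helper name; nothing is sent until you call urllib.request.urlopen on the result):

```python
import json
import os
import urllib.request

# Build the same request the curl command sends. Nothing is transmitted here;
# pass the returned Request to urllib.request.urlopen() to actually fire it.
def build_request(prompt: str, model: str = "smart-task") -> urllib.request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://router.requesty.ai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('REQUESTY_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Generate a SQL query to find the 5 most active users last month")
```

Because the wire format is the standard one, any OpenAI-compatible SDK also works: point its base URL at https://router.requesty.ai/v1 and set the model to your alias.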


7. Road-Map Sneak Peek

  • Reinforcement Learning loop – We’ll let your app vote 👍/👎 so the router learns your domain-specific preferences.
  • Fine-grained policy UI – Non-dev teammates can tweak cost limits and fallbacks without touching YAML.
  • Hybrid local + cloud routing – Seamlessly blend on-prem models with cloud giants.

8. Try It Today

  • 🆓 $6 in credits for every new workspace → requesty.ai
  • 📺 Watch the 2-min demo in the launch post
  • 🐙 Star us on GitHub (open-sourcing the policy spec soon)
  • 💬 Join our Discord to suggest new routing rules or models

Stop debating “which model should I use?” — let Requesty decide in real time and focus on building products your users love.