Requesty Blog · Routing / Requesty Features · May '25 · 4 min read

Smarter-Than-Human Model Picking: Introducing Requesty Smart Routing

Thibault Jaigu
CEO & Co-Founder

https://www.youtube.com/watch?v=fx3gX7ZSC9c

TL;DR – Stop guessing which LLM is “best” for every prompt. Requesty Smart Routing automatically classifies each task (code, chat, SQL, creative writing, etc.) in ~50 ms and forwards it to the optimal model (GPT-4o, Claude, Gemini-Flash, DeepSeek, Mistral-Large, you name it). One API key, zero context-switching, up to 80% cost savings, and consistent latency.


1. Why We Built Smart Routing

Even power users struggle to juggle the expanding LLM zoo:

| Task | “Best” model today | Tokens / $1 | Latency |
| --- | --- | --- | --- |
| Short chit-chat | Gemini-Flash-2.5 | ~3,000 | ⚡ Fast |
| Mid-sized coding | Claude-4 Sonnet | ~1,100 | 🟡 Medium |
| Long-form blog | GPT-4o | ~240 | 🔴 Slow |

Tomorrow the table changes again.

Developers either (a) hard-code a single premium model and overpay, or (b) expose end-users to an intimidating “Pick your engine” drop-down. Both hurt UX and margins.

Smart Routing removes that decision entirely.


2. How It Works (Under the Hood)

  1. Task Classifier: A compact, in-house transformer (≈65M params, distilled from 50k annotated examples) inspects the system + user prompt and predicts a task label in 20–100 ms. Example labels: chat_small, code_medium, sql, creative_long, image_insight.
  2. Policy Engine: A YAML/JSON policy maps each label to:
  • Preferred model(s)
  • Budget ceiling
  • Max latency SLA
  • Fallback chain

YAML
code_medium:
  primary: "anthropic/claude-4-sonnet"
  fallback: ["openai/gpt-4o-mini", "mistral/mixtral-8x7b"]
  max_usd: 0.005        # per request
  max_latency_ms: 20000

  3. Router Gateway: The same endpoint you’re already using: https://router.requesty.ai/v1/chat/completions. Simply set model: "smart-task" (or any alias you choose) and pass your prompt. The gateway:
  • Calls the classifier
  • Consults the policy
  • Forwards to the chosen provider
  • Logs everything in Live Logs & Analytics
  4. Observability Loop: Every response is tagged with chosen_model, tokens, latency_ms, and cost_usd. These metrics feed back into policy tuning and your dashboards.
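Putting the pieces together, the classify → policy → forward flow can be sketched in a few lines of Python. This is an illustrative sketch, not our production code: the POLICY dict mirrors the code_medium policy shown above, classify() is a toy stand-in for the real transformer classifier, and the chat_small model ID is an assumption.

```python
# Illustrative sketch of the routing decision (not the production router).
# POLICY mirrors the YAML policy format above; model IDs are examples.
POLICY = {
    "chat_small": {
        "primary": "google/gemini-flash-2.5",
        "fallback": [],
    },
    "code_medium": {
        "primary": "anthropic/claude-4-sonnet",
        "fallback": ["openai/gpt-4o-mini", "mistral/mixtral-8x7b"],
    },
}

def classify(prompt: str) -> str:
    # Toy stand-in: the real router runs a ~65M-param distilled transformer here.
    return "code_medium" if "code" in prompt.lower() else "chat_small"

def route(prompt: str) -> list[str]:
    """Return the ordered model chain: primary first, then fallbacks."""
    rule = POLICY[classify(prompt)]
    return [rule["primary"], *rule["fallback"]]
```

The fallback chain is what keeps latency consistent: if the primary model errors or blows past max_latency_ms, the gateway retries down the list instead of surfacing the failure to your user.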

3. Live Demo Recap

In the launch video we:

  1. Connected OpenWebUI to router.requesty.ai/v1 with the alias smart-task.
  2. Asked “Who built you?” → Router picked Gemini-Flash in 2.8 s for under $0.001.
  3. Asked “Code a Snake game in Python.” → Router switched to Claude-4 Sonnet.
  4. Follow-up “Write a blog post about this Snake game.” → Router used Perplexity Sonar-Pro (fast, cheap long-form).

Three requests, three providers, zero manual switches.

Latency tax: ~65 ms average over 1 000 runs – imperceptible to humans.


4. Key Benefits

| 💡 | What you get | Why it matters |
| --- | --- | --- |
| One API, all models | No more env-var gymnastics or vendor-specific SDKs. | Faster prototyping, simpler back-end. |
| Automatic cost trimming | Cheaper models handle lightweight tasks. | Teams report 40–80% savings. |
| Consistent UX | Users never face “model selector anxiety”. | Higher retention, fewer support tickets. |
| Live analytics | Per-task spend, latency, error rates. | Data to renegotiate budgets or tweak prompts. |
| No vendor lock-in | Swap vendors via config, not code. | Future-proof as new models drop weekly. |

5. Smart Routing vs. DIY Prompt Engineering

| | Manual approach | Requesty Smart Routing |
| --- | --- | --- |
| Effort | Build & host a classifier, maintain policies, integrate N SDKs. | Plug & play. |
| Coverage | Depends on your own data set. | 50k-prompt corpus, updated monthly. |
| Edge cases | You chase a moving target alone. | We ship global fixes once, for everyone. |
| Observability | Stitch together logs from each provider. | Unified Live Logs & Analytics. |

6. Quick Start

Shell
curl https://router.requesty.ai/v1/chat/completions \
  -H "Authorization: Bearer $REQUESTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "smart-task",
        "messages": [
           {"role":"user","content":"Generate a SQL query to find the 5 most active users last month"}
        ]
      }'

That’s literally it. 💫
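Prefer Python? The router speaks the standard chat-completions wire format, so the same request can be built with nothing but the standard library. A minimal sketch, mirroring the curl payload above (build_request is a hypothetical helper name; nothing is sent until you call urllib.request.urlopen on the result):

```python
import json
import os
import urllib.request

# Build the same request the curl command sends. Nothing is transmitted here;
# pass the returned Request to urllib.request.urlopen() to actually fire it.
def build_request(prompt: str, model: str = "smart-task") -> urllib.request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://router.requesty.ai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('REQUESTY_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Generate a SQL query to find the 5 most active users last month")
```

Because the wire format is the standard one, any OpenAI-compatible SDK also works: point its base URL at https://router.requesty.ai/v1 and set the model to your alias.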


7. Road-Map Sneak Peek

  • Reinforcement Learning loop – We’ll let your app vote 👍/👎 so the router learns your domain-specific preferences.
  • Fine-grained policy UI – Non-dev teammates can tweak cost limits and fallbacks without touching YAML.
  • Hybrid local + cloud routing – Seamlessly blend on-prem models with cloud giants.

8. Try It Today

  • 🆓 $6 in credits for every new workspace → requesty.ai
  • 📺 Watch the 2-min demo in the launch post
  • 🐙 Star us on GitHub (open-sourcing the policy spec soon)
  • 💬 Join our Discord to suggest new routing rules or models

Stop debating “which model should I use?” — let Requesty decide in real time and focus on building products your users love.