Requesty · Best Practices / Integrations · Jul '25 · 7 min read

# Self-Hosting Requesty on Kubernetes: The Complete Helm Deployment Guide

Thibault Jaigu
CEO & Co-Founder

As organizations scale their AI applications, many are looking for ways to maintain complete control over their infrastructure while leveraging the power of unified LLM routing. Self-hosting Requesty on Kubernetes provides the perfect balance of flexibility, security, and performance for teams with specific compliance requirements or existing Kubernetes infrastructure.

In this comprehensive guide, we'll walk through everything you need to know about deploying Requesty on Kubernetes using Helm, from initial setup to production-ready configurations. Whether you're managing air-gapped environments or simply want full control over your LLM gateway, this guide will help you get Requesty running smoothly in your Kubernetes cluster.

## Why Self-Host Requesty on Kubernetes?

Before diving into the technical details, let's explore why self-hosting Requesty might be the right choice for your organization.

**Complete Data Control:** When you self-host Requesty, all your LLM traffic, caching data, and usage analytics remain within your infrastructure. This is crucial for organizations handling sensitive data or operating in regulated industries.

**Compliance and Data Residency:** Many enterprises have strict requirements about where data can be stored and processed. Self-hosting ensures you meet these requirements while still benefiting from Requesty's smart routing and optimization features.

**Air-Gapped Environments:** For organizations operating in secure, isolated networks, self-hosting is often the only option. Requesty's Helm chart supports air-gapped installations, allowing you to deploy without internet access.

**Cost Optimization at Scale:** While Requesty's cloud offering already provides up to 80% cost savings through intelligent caching and routing, self-hosting gives you additional control over resource allocation and scaling strategies.

## Prerequisites and System Requirements

Before beginning your Requesty deployment, ensure you have the following prerequisites in place:

### Kubernetes Cluster Requirements

- **Kubernetes Version:** 1.19 or higher (1.24+ recommended for the latest features)
- **Cluster Resources:** Minimum 3 nodes with 4 CPU cores and 16GB RAM each
- **Storage:** SSD-class persistent storage with at least 100GB available
- **Networking:** Cluster networking configured with DNS and an ingress controller

### Required Tools

- **Helm:** Version 3.8 or higher installed on your local machine
- **kubectl:** Configured with access to your target cluster
- **Storage Class:** Dynamic provisioning enabled for persistent volumes
- **Ingress Controller:** NGINX or similar for external access

For production deployments handling significant traffic through Requesty's 160+ supported models, we recommend:

- **API Gateway Pods:** 3 replicas with 2 CPU / 4GB RAM each
- **Cache Layer:** Redis with 8GB+ memory for optimal performance
- **Database:** PostgreSQL with 4 CPU / 16GB RAM for analytics and metadata
- **Object Storage:** S3-compatible storage for long-term cache persistence

## Step-by-Step Deployment Guide

Now let's walk through the complete deployment process for Requesty on Kubernetes.

### Step 1: Add the Requesty Helm Repository

First, add the official Requesty Helm repository to your local Helm installation:

```bash
helm repo add requesty https://helm.requesty.ai
helm repo update
```

For air-gapped environments, download the chart package separately and transfer it to your isolated network.

### Step 2: Create a Namespace

Create a dedicated namespace for your Requesty deployment:

```bash
kubectl create namespace requesty
```

This isolation helps with resource management and security policies.

### Step 3: Configure Your values.yaml

Create a custom values.yaml file to configure your Requesty deployment. Here's a production-ready example:

```yaml
# Requesty API Configuration
requesty:
  replicaCount: 3
  resources:
    limits:
      cpu: "2"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"

  # Core configuration
  config:
    # Enable all routing features
    smartRouting:
      enabled: true
    fallbackPolicies:
      enabled: true
    caching:
      enabled: true
      ttl: 3600

    # Security settings
    guardrails:
      enabled: true
      promptInjection: true
      piiRedaction: true

# PostgreSQL Configuration
postgresql:
  deploy: true  # Set to false for external database
  auth:
    database: requesty
    username: requesty
    password: "changeme"  # Use secrets in production
  resources:
    limits:
      cpu: "4"
      memory: "16Gi"
    requests:
      cpu: "2"
      memory: "8Gi"
  persistence:
    enabled: true
    size: 100Gi
    storageClass: fast-ssd

# Redis Configuration
redis:
  deploy: true  # Set to false for external Redis
  auth:
    enabled: true
    password: "changeme"  # Use secrets in production
  master:
    resources:
      limits:
        cpu: "2"
        memory: "8Gi"
      requests:
        cpu: "1"
        memory: "4Gi"
    persistence:
      enabled: true
      size: 50Gi

# Ingress Configuration
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: requesty.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: requesty-tls
      hosts:
        - requesty.yourdomain.com

# Object Storage (S3-compatible)
objectStorage:
  provider: s3
  s3:
    bucket: requesty-cache
    region: us-east-1
    endpoint: https://s3.amazonaws.com
    accessKeyId:
      valueFrom:
        secretKeyRef:
          name: s3-credentials
          key: access-key-id
    secretAccessKey:
      valueFrom:
        secretKeyRef:
          name: s3-credentials
          key: secret-access-key
```
 
### Step 4: Create Required Secrets
 
Before deploying, create the necessary Kubernetes secrets:
 
```bash
# S3 credentials
kubectl create secret generic s3-credentials \
  --from-literal=access-key-id=YOUR_ACCESS_KEY \
  --from-literal=secret-access-key=YOUR_SECRET_KEY \
  -n requesty

# Database passwords (if using external)
kubectl create secret generic db-credentials \
  --from-literal=postgres-password=YOUR_DB_PASSWORD \
  --from-literal=redis-password=YOUR_REDIS_PASSWORD \
  -n requesty
```
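A note on what these commands actually store: Kubernetes keeps Secret values base64-encoded, which is an encoding, not encryption. That is why RBAC restrictions and encryption at rest still matter. A small Python sketch of the encoding step:

```python
import base64

def secret_data(literals: dict[str, str]) -> dict[str, str]:
    """Mimic how kubectl --from-literal values land in a Secret's
    data field: base64-encoded, readable by anyone who can get the object."""
    return {k: base64.b64encode(v.encode()).decode() for k, v in literals.items()}

data = secret_data({"postgres-password": "changeme"})
print(data["postgres-password"])  # Y2hhbmdlbWU=
```

Anyone with `kubectl get secret -o yaml` access can decode these values, so scope read access to the `requesty` namespace tightly.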
 
### Step 5: Deploy Requesty
 
Now deploy Requesty using your custom configuration:
 
```bash
helm install requesty requesty/requesty \
  --namespace requesty \
  --values values.yaml \
  --wait
```
 
The `--wait` flag ensures Helm waits for all pods to be ready before completing.
 
### Step 6: Verify the Deployment
 
Check that all pods are running successfully:
 
```bash
kubectl get pods -n requesty
```
 
You should see output similar to:

```
NAME                             READY   STATUS    RESTARTS   AGE
requesty-api-7d9b8c6f5-abc123    1/1     Running   0          2m
requesty-api-7d9b8c6f5-def456    1/1     Running   0          2m
requesty-api-7d9b8c6f5-ghi789    1/1     Running   0          2m
requesty-postgresql-0            1/1     Running   0          2m
requesty-redis-master-0          1/1     Running   0          2m
```
 
## Advanced Configuration Options
 
Requesty's Helm chart supports numerous advanced configurations to optimize your deployment for specific use cases.
 
### External Database Configuration
 
For production environments, you may want to use managed database services:
 
```yaml
postgresql:
  deploy: false
  external:
    host: your-rds-instance.region.rds.amazonaws.com
    port: 5432
    database: requesty
    username: requesty
    passwordSecret:
      name: db-credentials
      key: postgres-password
```
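Inside the pods, a `passwordSecret` like this typically surfaces as an environment variable, and the application assembles the connection string at startup. A hedged sketch of that pattern (the `postgres_dsn` helper and the env var name are illustrative, not part of Requesty's chart):

```python
import os

def postgres_dsn(host: str, port: int, database: str, username: str,
                 password_env: str = "POSTGRES_PASSWORD") -> str:
    """Build a connection string from external-database settings; the
    password arrives via the mounted secret, never via values.yaml."""
    password = os.environ[password_env]
    return f"postgresql://{username}:{password}@{host}:{port}/{database}"

os.environ["POSTGRES_PASSWORD"] = "example-only"  # in-cluster, injected from db-credentials
print(postgres_dsn("your-rds-instance.region.rds.amazonaws.com", 5432, "requesty", "requesty"))
```

Keeping the password out of the values file means your Helm releases and Git history never contain the credential.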
 
### High Availability Setup
 
Enable high availability for critical components:
 
```yaml
requesty:
  replicaCount: 5
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - requesty
            topologyKey: kubernetes.io/hostname
```
 
### Custom Model Configuration
 
Configure specific models and their routing preferences:
 
```yaml
requesty:
  config:
    models:
      # Configure model-specific settings
      gpt-4o:
        maxTokens: 8192
        temperature: 0.7
      claude-4:
        maxTokens: 16384
        temperature: 0.5

    # Smart routing preferences
    smartRouting:
      costWeight: 0.4
      latencyWeight: 0.3
      qualityWeight: 0.3
```
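To build intuition for the three weights: a router of this shape combines normalized per-model scores into a single weighted sum and picks the maximum. The sketch below is an illustration of that idea under assumed candidate scores, not Requesty's actual routing implementation:

```python
def route_score(cost: float, latency: float, quality: float,
                cost_w: float = 0.4, latency_w: float = 0.3,
                quality_w: float = 0.3) -> float:
    """Each input is a 0-1 score where higher is better
    (cheaper, faster, higher quality)."""
    return cost_w * cost + latency_w * latency + quality_w * quality

# Hypothetical normalized scores for two candidate models
candidates = {
    "gpt-4o":   route_score(cost=0.5, latency=0.8, quality=0.9),
    "claude-4": route_score(cost=0.4, latency=0.7, quality=0.95),
}
best = max(candidates, key=candidates.get)
print(best)  # gpt-4o
```

Raising `costWeight` relative to the others shifts traffic toward cheaper models; raising `qualityWeight` does the opposite.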
 
### Resource Monitoring
 
Enable Prometheus metrics for monitoring:
 
```yaml
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    namespace: monitoring
    labels:
      prometheus: kube-prometheus
```
 
## Security Best Practices
 
When self-hosting Requesty, security should be a top priority. Here are essential security configurations:
 
### Network Policies
 
Implement strict network policies to control traffic:
 
```yaml
networkPolicy:
  enabled: true
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: requesty
        - podSelector:
            matchLabels:
              app: requesty
```
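These selectors use standard Kubernetes `matchLabels` semantics: every key/value pair in the selector must be present and equal on the target's labels. A quick sketch of that matching rule:

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """matchLabels semantics: all selector pairs must appear in the labels.
    Extra labels on the target are fine; a missing or different value is not."""
    return all(labels.get(k) == v for k, v in selector.items())

pod_selector = {"app": "requesty"}
print(matches(pod_selector, {"app": "requesty", "tier": "api"}))  # True
print(matches(pod_selector, {"app": "other"}))                    # False
```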
 
### Secret Management
 
Use external secret management solutions:
 
```yaml
externalSecrets:
  enabled: true
  backend: vault
  vaultServer: https://vault.yourdomain.com
  auth:
    method: kubernetes
    role: requesty
```
 
### Enable Guardrails
 
Requesty's [security guardrails](https://docs.requesty.ai/features/security) protect against various threats:
 
```yaml
requesty:
  config:
    guardrails:
      promptInjection:
        enabled: true
        action: block
      piiRedaction:
        enabled: true
        patterns:
          - ssn
          - credit_card
          - email
      contentFiltering:
        enabled: true
        categories:
          - violence
          - hate_speech
```

## Troubleshooting Common Issues

Here are solutions to common deployment challenges:

### Pod Restart Loops

If pods are restarting frequently, check resource limits:

- Increase memory limits if pods are being OOMKilled
- Check database connectivity
- Verify secret configurations

### Slow Response Times

Optimize caching and routing: raise the cache `ttl` in your values.yaml so more responses are served from Redis, and shift the smart routing weights toward `latencyWeight` if model selection is favoring slow providers.

### Storage Issues

For persistent volume problems:

- Verify that the StorageClass supports dynamic provisioning
- Check available disk space on nodes
- Ensure proper permissions for volume mounts

## Monitoring and Maintenance

Once deployed, maintain optimal performance with these practices:

### Health Checks

Configure comprehensive health checks:

```yaml
requesty:
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5
```
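The kubelet simply issues HTTP GETs against those paths and treats any 2xx as healthy. A minimal stand-in service (illustrative only; Requesty's real endpoints perform deeper checks against the database, cache, and providers) shows the contract the probes expect:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ProbeHandler(BaseHTTPRequestHandler):
    """Toy liveness/readiness endpoints matching the probe paths above."""
    def do_GET(self):
        if self.path in ("/health", "/ready"):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ProbeHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
status = urllib.request.urlopen(f"http://127.0.0.1:{port}/health").status
print(status)  # 200
server.shutdown()
```

A failing liveness probe restarts the container; a failing readiness probe only removes the pod from the Service endpoints, so traffic drains without a restart.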

### Backup Strategy

Implement regular backups:

- Database snapshots every 6 hours
- Redis persistence for cache data
- Configuration backups in version control

### Scaling Policies

Configure horizontal pod autoscaling:

```yaml
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
```
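The horizontal pod autoscaler sizes the deployment with the standard formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the configured bounds:

```python
from math import ceil

def desired_replicas(current: int, current_util: float, target_util: float,
                     min_r: int = 3, max_r: int = 10) -> int:
    """Standard HPA scaling formula, clamped to minReplicas/maxReplicas."""
    desired = ceil(current * current_util / target_util)
    return max(min_r, min(max_r, desired))

print(desired_replicas(3, 90, 70))  # 4  (CPU at 90% vs a 70% target -> scale up)
print(desired_replicas(6, 20, 70))  # 3  (would be 2, clamped to minReplicas)
```

With both CPU and memory targets set, the autoscaler computes a desired count per metric and takes the largest.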

## Integration with Your Applications

After deployment, integrate your applications with self-hosted Requesty:

### API Configuration

Update your application to use the self-hosted endpoint:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-requesty-api-key",
    base_url="https://requesty.yourdomain.com/v1",
)

# Use any of the 160+ models
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
 
### Enable Advanced Features
 
Take advantage of Requesty's advanced features:
 
- [Smart routing](https://www.requesty.ai/solution/smart-routing) for automatic model selection
- [Structured outputs](https://docs.requesty.ai/features/structured-outputs) for consistent JSON responses
- [Streaming](https://docs.requesty.ai/features/streaming) for real-time responses
 
## Conclusion
 
Self-hosting Requesty on Kubernetes gives you complete control over your LLM infrastructure while maintaining all the benefits of unified routing, intelligent caching, and cost optimization. With proper configuration and monitoring, your self-hosted Requesty deployment can handle millions of requests while reducing costs by up to 80%.
 
Whether you're managing sensitive data, operating in air-gapped environments, or simply prefer full control over your infrastructure, Requesty's Helm chart makes deployment straightforward and maintainable.
 
Ready to get started? [Sign up for Requesty](https://app.requesty.ai/sign-up) to get your API keys and access to our complete documentation. For enterprise deployments and dedicated support, check out our [enterprise features](https://www.requesty.ai/enterprise).
 
Have questions about self-hosting? Join our [Discord community](https://discord.gg/Td3rwAHgt4) where our team and 15k+ developers are ready to help you optimize your LLM infrastructure.