LiteLLM & OpenRouter (Universal LLM Gateway)
One API to Rule All LLM Providers
Learn how to use a single unified API to access OpenAI, Anthropic, Google, local models, and 100+ providers. Switch models without changing code, optimize costs, and build provider-agnostic applications.
Why You Need an LLM Gateway
Provider Lock-in is the Biggest Risk in AI Development
The Multi-Provider Problem
Every AI provider has a different API format: OpenAI uses messages with roles, Anthropic puts system prompt separately, Google has parts and contents, local models need different endpoints. If you hard-code one provider, switching is a nightmare.
Think of it like payment gateways in India. If you directly integrate Razorpay API, switching to PayU means rewriting everything. But if you use a universal payment layer (like Juspay), you can switch gateways without changing your app code. LLM gateways do the same for AI.
Why Multi-Provider Matters:
- Provider Outages - OpenAI had multiple outages in 2024-25. You need failover to Anthropic/Google.
- Cost Optimization - Different models are cheaper for different tasks. Route simple tasks to cheap models.
- Best Model for Task - Claude is better at code, GPT-4 at general chat, Gemini at long docs. Use each where it excels.
- Regulatory Compliance - Some data cannot go to US providers. Route to local models or EU providers as needed.
- New Model Testing - New models launch weekly. Easy switching lets you test and adopt quickly.
Two Main Solutions:
- LiteLLM - Open-source Python library and proxy server. Self-hosted. Full control. 100+ providers supported.
- OpenRouter - Hosted service. One API key, all providers. Simple pricing. No infrastructure to manage.
Note: If you are building a production AI application, using an LLM gateway from day one saves months of refactoring later. Provider lock-in is the technical debt of AI development.
LiteLLM - The Open-Source Universal API
100+ Providers, One OpenAI-Format API
What is LiteLLM?
LiteLLM is an open-source Python package that provides a unified OpenAI-format interface to 100+ LLM providers. You use the standard OpenAI completion() function, and LiteLLM translates it to whatever provider you specify. It also comes as a proxy server that any language can use via HTTP.
How LiteLLM Works:
- Model String Format - Specify provider/model as a string: "openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "ollama/llama3.1", "gemini/gemini-2.0-flash"
- Automatic Translation - Converts OpenAI format to each provider native format behind the scenes
- Same Response Format - All responses normalized to OpenAI format regardless of provider
- Streaming - Unified streaming interface across all providers
- Tool Calling - Function calling works the same way across providers that support it
LiteLLM Proxy Server:
LiteLLM can run as a standalone proxy server that exposes OpenAI-compatible endpoints. Any application (Python, Node.js, Rust, Go, mobile apps) can use it via standard HTTP requests.
- Virtual Keys - Create API keys for different teams/users with spending limits
- Rate Limiting - Per-key and per-model rate limits
- Spend Tracking - Track costs per key, team, model, and user
- Load Balancing - Distribute requests across multiple API keys or model deployments
- Fallbacks - Automatic failover: if OpenAI fails, try Anthropic, then Google
Supported Providers (Selection):
OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Cohere, Mistral, Groq, Together AI, Ollama, vLLM, HuggingFace, Replicate, Perplexity, DeepSeek, and 80+ more.
Note: LiteLLM proxy is like having your own private API gateway for AI. Deploy it once, and every service in your architecture talks to it instead of directly to providers.
OpenRouter - Hosted LLM Gateway
One API Key, Every Model, Zero Infrastructure
What is OpenRouter?
OpenRouter is a hosted LLM gateway service. You get one API key, and through it, you can access models from OpenAI, Anthropic, Google, Meta, Mistral, and many more. No need to create accounts with each provider or manage multiple API keys.
If LiteLLM is like hosting your own payment gateway server, OpenRouter is like using Stripe - someone else manages the infrastructure, you just use the API.
OpenRouter Key Features:
- Unified API - OpenAI-compatible format. Change the base URL and model name, everything else stays same.
- Model Marketplace - Browse 200+ models with pricing, context length, and quality comparisons side by side.
- Automatic Routing - Set preferences (cheapest, fastest, best quality) and OpenRouter picks the optimal model.
- Free Tier Models - Some models available for free (with rate limits). Great for experimentation.
- Usage Dashboard - Track spending, popular models, request counts across all providers in one place.
LiteLLM vs OpenRouter:
| Factor | LiteLLM | OpenRouter |
|---|---|---|
| Hosting | Self-hosted (you manage) | Hosted (managed service) |
| API Keys | Need each provider key | Single OpenRouter key |
| Cost | Free (OSS) + hosting | Small markup on model costs |
| Control | Full control, customizable | Limited to their features |
| Data Privacy | Your infrastructure | Through OpenRouter servers |
| Best For | Enterprise, compliance | Startups, quick setup |
Note: OpenRouter is perfect for startups and indie developers who want access to all models without managing multiple provider accounts. LiteLLM is better for enterprises who need full control.
Building Provider-Agnostic AI Applications
Architecture Patterns That Scale
Pattern 1: Smart Router with Fallback Chain
Configure a priority chain: try the best model first, fall back to alternatives if it fails or is slow:
- Primary - Claude Sonnet (best quality for your use case)
- Fallback 1 - GPT-4o (if Claude is down)
- Fallback 2 - Gemini Flash (if both are down, at least serve something)
- Fallback 3 - Local Llama via Ollama (offline fallback)
Pattern 2: Cost-Optimized Model Routing
Route requests to different models based on complexity:
- Simple questions (classification, yes/no) - GPT-4o-mini or Gemini Flash Lite (~Rs 0.01/request)
- Standard tasks (chat, summarization) - GPT-4o or Claude Sonnet (~Rs 0.5/request)
- Complex reasoning (analysis, coding) - Claude Opus or o1 (~Rs 5/request)
- Savings - 60-80% cost reduction vs using top model for everything
Pattern 3: A/B Testing Models
Run the same requests through different models and compare quality:
- Send 50% traffic to Claude Sonnet, 50% to GPT-4o
- Collect user feedback (thumbs up/down) on responses
- After 1000 requests, analyze which model performs better for your use case
- Gradually shift traffic to the winner
Pattern 4: Hybrid Local + Cloud
Use local models for sensitive data, cloud for everything else:
- Local (Ollama) - Process PII, medical data, financial records
- Cloud (OpenAI/Claude) - General queries, creative tasks, complex reasoning
- LiteLLM routes based on content classification
Note: The cost-optimized routing pattern alone can save 60-80% on AI costs. Classify request complexity first (using a cheap model), then route to the appropriate tier.
Production Considerations
What to Watch Out For
Pitfall 1: Assuming All Models Are Interchangeable
Different models have different strengths. A prompt optimized for GPT-4 may not work well with Claude or Gemini. You may need model-specific prompt templates even when using a unified API. Test your prompts on each model you plan to use.
Pitfall 2: Added Latency
Every proxy layer adds latency. LiteLLM proxy adds ~10-50ms. OpenRouter adds ~50-200ms (network hop to their servers). For latency-sensitive applications, measure the impact carefully.
Best Practices:
- Test Prompts Per Model - Do not assume one prompt fits all. Evaluate quality on each model in your fallback chain.
- Monitor Latency - Track P50, P95, P99 latency per provider. Set appropriate timeouts.
- Set Spending Limits - Use LiteLLM virtual keys with budgets. Prevent runaway costs from bugs or attacks.
- Log Everything - Log request/response metadata (model, tokens, latency, cost) for debugging and optimization.
- Handle Provider Differences - Some features (vision, tool use, JSON mode) are not available on all models. Check capabilities before routing.
Data Privacy Considerations:
- LiteLLM (self-hosted) - Data only goes to the provider you choose. You control the routing.
- OpenRouter - Your data passes through OpenRouter servers before reaching the provider. Check their data policy.
- Enterprise - For sensitive data, self-hosted LiteLLM with on-premise models is the safest option.
Note: A unified API does not mean models are interchangeable. Always test your specific prompts on each model in your routing configuration.
Interview Questions
Q: Why would you use an LLM gateway like LiteLLM instead of calling providers directly?
LLM gateways prevent provider lock-in, enable automatic failover during outages, simplify multi-provider access through a unified API, enable cost-optimized routing (cheap models for simple tasks), provide centralized logging and spend tracking, and make it easy to A/B test models. Without a gateway, switching providers requires rewriting integration code.
Q: What is the difference between LiteLLM and OpenRouter?
LiteLLM is self-hosted open-source. You need your own provider API keys, host the proxy yourself, but get full control and data privacy. OpenRouter is a hosted service with one API key for all providers but adds a markup and routes data through their servers. LiteLLM for enterprises needing control; OpenRouter for startups wanting simplicity.
Q: How would you implement cost-optimized model routing?
First, classify request complexity using a cheap model (GPT-4o-mini) or rule-based logic. Route simple tasks (classification, yes/no) to the cheapest model, standard tasks to mid-tier models, and complex reasoning to premium models. This can reduce costs by 60-80%. Use LiteLLM fallback chains for reliability and track per-model quality metrics to ensure routing does not degrade user experience.
Q: What are the risks of using an LLM proxy layer?
(1) Added latency (10-200ms per hop). (2) Single point of failure if the proxy goes down. (3) Prompts optimized for one model may not work well on fallback models. (4) Data privacy concerns with hosted solutions. (5) Feature parity issues - not all models support the same features. Mitigate with: redundant proxy instances, model-specific prompt testing, self-hosting for sensitive data.
Q: How would you handle a scenario where your primary LLM provider goes down?
Configure a fallback chain in LiteLLM: primary model (e.g., Claude) -> fallback 1 (GPT-4o) -> fallback 2 (Gemini Flash) -> fallback 3 (local Ollama). Set timeouts per provider (e.g., 15s). Monitor provider health with synthetic requests. Alert on failover events. Ensure prompts are tested on all fallback models. Keep local model as ultimate backup for critical paths.
Frequently Asked Questions
What is LiteLLM & OpenRouter?
Learn how to use a single unified API to access OpenAI, Anthropic, Google, local models, and 100+ providers. Switch models without changing code, optimize costs, and build provider-agnostic applications.
How does LiteLLM & OpenRouter work?
Provider Lock-in is the Biggest Risk in AI Development The Multi-Provider Problem Every AI provider has a different API format: OpenAI uses messages with roles, Anthropic puts system prompt separately, Google has parts and contents, local models need different endpoints. If you hard-code one provider, switching is a…
Related topics
Practice this on DevInterviewMaster
Read the full LiteLLM & OpenRouter (Universal LLM Gateway) breakdown with interactive demos, quizzes, and Hinglish notes.
800+ system-design, LLD, coding, and design-pattern topics. Unlock everything with Pro (₹499, one-time) or Ultimate (₹999, one-time) — lifetime access, no subscription.