AI & AutomationFree to read

LiteLLM & OpenRouter (Universal LLM Gateway)

One API to Rule All LLM Providers

Learn how to use a single unified API to access OpenAI, Anthropic, Google, local models, and 100+ providers. Switch models without changing code, optimize costs, and build provider-agnostic applications.

Why You Need an LLM Gateway

Provider Lock-in is the Biggest Risk in AI Development

The Multi-Provider Problem

Every AI provider has a different API format: OpenAI uses messages with roles, Anthropic puts system prompt separately, Google has parts and contents, local models need different endpoints. If you hard-code one provider, switching is a nightmare.

Think of it like payment gateways in India. If you directly integrate Razorpay API, switching to PayU means rewriting everything. But if you use a universal payment layer (like Juspay), you can switch gateways without changing your app code. LLM gateways do the same for AI.

Why Multi-Provider Matters:

Provider Outages - OpenAI had multiple outages in 2024-25. You need failover to Anthropic/Google.
Cost Optimization - Different models are cheaper for different tasks. Route simple tasks to cheap models.
Best Model for Task - Claude is better at code, GPT-4 at general chat, Gemini at long docs. Use each where it excels.
Regulatory Compliance - Some data cannot go to US providers. Route to local models or EU providers as needed.
New Model Testing - New models launch weekly. Easy switching lets you test and adopt quickly.

Two Main Solutions:

LiteLLM - Open-source Python library and proxy server. Self-hosted. Full control. 100+ providers supported.
OpenRouter - Hosted service. One API key, all providers. Simple pricing. No infrastructure to manage.

Note: If you are building a production AI application, using an LLM gateway from day one saves months of refactoring later. Provider lock-in is the technical debt of AI development.

LiteLLM - The Open-Source Universal API

100+ Providers, One OpenAI-Format API

What is LiteLLM?

LiteLLM is an open-source Python package that provides a unified OpenAI-format interface to 100+ LLM providers. You use the standard OpenAI completion() function, and LiteLLM translates it to whatever provider you specify. It also comes as a proxy server that any language can use via HTTP.

How LiteLLM Works:

Model String Format - Specify provider/model as a string: "openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "ollama/llama3.1", "gemini/gemini-2.0-flash"
Automatic Translation - Converts OpenAI format to each provider native format behind the scenes
Same Response Format - All responses normalized to OpenAI format regardless of provider
Streaming - Unified streaming interface across all providers
Tool Calling - Function calling works the same way across providers that support it

LiteLLM Proxy Server:

LiteLLM can run as a standalone proxy server that exposes OpenAI-compatible endpoints. Any application (Python, Node.js, Rust, Go, mobile apps) can use it via standard HTTP requests.

Virtual Keys - Create API keys for different teams/users with spending limits
Rate Limiting - Per-key and per-model rate limits
Spend Tracking - Track costs per key, team, model, and user
Load Balancing - Distribute requests across multiple API keys or model deployments
Fallbacks - Automatic failover: if OpenAI fails, try Anthropic, then Google

Supported Providers (Selection):

OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Cohere, Mistral, Groq, Together AI, Ollama, vLLM, HuggingFace, Replicate, Perplexity, DeepSeek, and 80+ more.

Note: LiteLLM proxy is like having your own private API gateway for AI. Deploy it once, and every service in your architecture talks to it instead of directly to providers.

OpenRouter - Hosted LLM Gateway

One API Key, Every Model, Zero Infrastructure

What is OpenRouter?

OpenRouter is a hosted LLM gateway service. You get one API key, and through it, you can access models from OpenAI, Anthropic, Google, Meta, Mistral, and many more. No need to create accounts with each provider or manage multiple API keys.

If LiteLLM is like hosting your own payment gateway server, OpenRouter is like using Stripe - someone else manages the infrastructure, you just use the API.

OpenRouter Key Features:

Unified API - OpenAI-compatible format. Change the base URL and model name, everything else stays same.
Model Marketplace - Browse 200+ models with pricing, context length, and quality comparisons side by side.
Automatic Routing - Set preferences (cheapest, fastest, best quality) and OpenRouter picks the optimal model.
Free Tier Models - Some models available for free (with rate limits). Great for experimentation.
Usage Dashboard - Track spending, popular models, request counts across all providers in one place.

LiteLLM vs OpenRouter:

Factor	LiteLLM	OpenRouter
Hosting	Self-hosted (you manage)	Hosted (managed service)
API Keys	Need each provider key	Single OpenRouter key
Cost	Free (OSS) + hosting	Small markup on model costs
Control	Full control, customizable	Limited to their features
Data Privacy	Your infrastructure	Through OpenRouter servers
Best For	Enterprise, compliance	Startups, quick setup

Note: OpenRouter is perfect for startups and indie developers who want access to all models without managing multiple provider accounts. LiteLLM is better for enterprises who need full control.

Building Provider-Agnostic AI Applications

Architecture Patterns That Scale

Pattern 1: Smart Router with Fallback Chain

Configure a priority chain: try the best model first, fall back to alternatives if it fails or is slow:

Primary - Claude Sonnet (best quality for your use case)
Fallback 1 - GPT-4o (if Claude is down)
Fallback 2 - Gemini Flash (if both are down, at least serve something)
Fallback 3 - Local Llama via Ollama (offline fallback)

Pattern 2: Cost-Optimized Model Routing

Route requests to different models based on complexity:

Simple questions (classification, yes/no) - GPT-4o-mini or Gemini Flash Lite (~Rs 0.01/request)
Standard tasks (chat, summarization) - GPT-4o or Claude Sonnet (~Rs 0.5/request)
Complex reasoning (analysis, coding) - Claude Opus or o1 (~Rs 5/request)
Savings - 60-80% cost reduction vs using top model for everything

Pattern 3: A/B Testing Models

Run the same requests through different models and compare quality:

Send 50% traffic to Claude Sonnet, 50% to GPT-4o
Collect user feedback (thumbs up/down) on responses
After 1000 requests, analyze which model performs better for your use case
Gradually shift traffic to the winner

Pattern 4: Hybrid Local + Cloud

Use local models for sensitive data, cloud for everything else:

Local (Ollama) - Process PII, medical data, financial records
Cloud (OpenAI/Claude) - General queries, creative tasks, complex reasoning
LiteLLM routes based on content classification

Note: The cost-optimized routing pattern alone can save 60-80% on AI costs. Classify request complexity first (using a cheap model), then route to the appropriate tier.

Production Considerations

What to Watch Out For

Pitfall 1: Assuming All Models Are Interchangeable

Different models have different strengths. A prompt optimized for GPT-4 may not work well with Claude or Gemini. You may need model-specific prompt templates even when using a unified API. Test your prompts on each model you plan to use.

Pitfall 2: Added Latency

Every proxy layer adds latency. LiteLLM proxy adds ~10-50ms. OpenRouter adds ~50-200ms (network hop to their servers). For latency-sensitive applications, measure the impact carefully.

Best Practices:

Test Prompts Per Model - Do not assume one prompt fits all. Evaluate quality on each model in your fallback chain.
Monitor Latency - Track P50, P95, P99 latency per provider. Set appropriate timeouts.
Set Spending Limits - Use LiteLLM virtual keys with budgets. Prevent runaway costs from bugs or attacks.
Log Everything - Log request/response metadata (model, tokens, latency, cost) for debugging and optimization.
Handle Provider Differences - Some features (vision, tool use, JSON mode) are not available on all models. Check capabilities before routing.

Data Privacy Considerations:

LiteLLM (self-hosted) - Data only goes to the provider you choose. You control the routing.
OpenRouter - Your data passes through OpenRouter servers before reaching the provider. Check their data policy.
Enterprise - For sensitive data, self-hosted LiteLLM with on-premise models is the safest option.

Note: A unified API does not mean models are interchangeable. Always test your specific prompts on each model in your routing configuration.

Interview Questions

Q: Why would you use an LLM gateway like LiteLLM instead of calling providers directly?

LLM gateways prevent provider lock-in, enable automatic failover during outages, simplify multi-provider access through a unified API, enable cost-optimized routing (cheap models for simple tasks), provide centralized logging and spend tracking, and make it easy to A/B test models. Without a gateway, switching providers requires rewriting integration code.

Q: What is the difference between LiteLLM and OpenRouter?

LiteLLM is self-hosted open-source. You need your own provider API keys, host the proxy yourself, but get full control and data privacy. OpenRouter is a hosted service with one API key for all providers but adds a markup and routes data through their servers. LiteLLM for enterprises needing control; OpenRouter for startups wanting simplicity.

Q: How would you implement cost-optimized model routing?

First, classify request complexity using a cheap model (GPT-4o-mini) or rule-based logic. Route simple tasks (classification, yes/no) to the cheapest model, standard tasks to mid-tier models, and complex reasoning to premium models. This can reduce costs by 60-80%. Use LiteLLM fallback chains for reliability and track per-model quality metrics to ensure routing does not degrade user experience.

Q: What are the risks of using an LLM proxy layer?

(1) Added latency (10-200ms per hop). (2) Single point of failure if the proxy goes down. (3) Prompts optimized for one model may not work well on fallback models. (4) Data privacy concerns with hosted solutions. (5) Feature parity issues - not all models support the same features. Mitigate with: redundant proxy instances, model-specific prompt testing, self-hosting for sensitive data.

Q: How would you handle a scenario where your primary LLM provider goes down?

Configure a fallback chain in LiteLLM: primary model (e.g., Claude) -> fallback 1 (GPT-4o) -> fallback 2 (Gemini Flash) -> fallback 3 (local Ollama). Set timeouts per provider (e.g., 15s). Monitor provider health with synthetic requests. Alert on failover events. Ensure prompts are tested on all fallback models. Keep local model as ultimate backup for critical paths.

Frequently Asked Questions

What is LiteLLM & OpenRouter?

Learn how to use a single unified API to access OpenAI, Anthropic, Google, local models, and 100+ providers. Switch models without changing code, optimize costs, and build provider-agnostic applications.

How does LiteLLM & OpenRouter work?

Provider Lock-in is the Biggest Risk in AI Development The Multi-Provider Problem Every AI provider has a different API format: OpenAI uses messages with roles, Anthropic puts system prompt separately, Google has parts and contents, local models need different endpoints. If you hard-code one provider, switching is a…

Browse all AI & Automation topics →

Practice this on DevInterviewMaster

Read the full LiteLLM & OpenRouter (Universal LLM Gateway) breakdown with interactive demos, quizzes, and Hinglish notes.

Open the interactive topic →

800+ system-design, LLD, coding, and design-pattern topics. Unlock everything with Pro (₹499, one-time) or Ultimate (₹999, one-time) — lifetime access, no subscription.

LiteLLM & OpenRouter (Universal LLM Gateway)

Why You Need an LLM Gateway

LiteLLM - The Open-Source Universal API

OpenRouter - Hosted LLM Gateway

Building Provider-Agnostic AI Applications

Production Considerations

Interview Questions

Frequently Asked Questions

Related topics

Practice this on DevInterviewMaster