OpenAI API
The Most Popular AI API in the World
Master the OpenAI API ecosystem - from Chat Completions to Assistants, Vision, Function Calling, and beyond. The API that started the AI revolution and powers millions of applications.
OpenAI API Landscape
The Gateway to GPT-4, DALL-E, Whisper, and More
What is the OpenAI API?
The OpenAI API gives developers programmatic access to OpenAI models - GPT-4o, GPT-4 Turbo, o1, o3, DALL-E 3, Whisper, TTS, and Embeddings. It is a REST API with client libraries for Python, Node.js, and more.
Think of OpenAI API as the Paytm of AI - it was the first to make powerful AI accessible to everyone through a simple interface, and now it is the default that everyone builds on.
Core API Endpoints:
- Chat Completions - The main endpoint. Send messages, get AI responses. Powers chatbots, code assistants, content generation.
- Assistants API - Managed agents with built-in memory, file handling, code execution, and tool use.
- Embeddings - Convert text to vectors for search, RAG, similarity matching.
- Images (DALL-E) - Generate and edit images from text descriptions.
- Audio (Whisper/TTS) - Speech-to-text and text-to-speech.
- Moderations - Content safety classification. Free to use!
Model Tiers (2025-26):
| Model | Best For | Input Cost/1M tokens |
|---|---|---|
| GPT-4o | Best all-rounder, vision, fast | ~$2.50 |
| GPT-4o-mini | Cost-effective, simple tasks | ~$0.15 |
| o1 / o3 | Complex reasoning, math, code | ~$15-60 |
| text-embedding-3-small | Embeddings, RAG | ~$0.02 |
Note: OpenAI API format has become the de facto standard. Most other providers (Anthropic, Google, local models) offer OpenAI-compatible endpoints, making it easy to switch providers.
Chat Completions API - The Foundation
Messages In, Completion Out
How Chat Completions Work
You send an array of messages with roles (system, user, assistant) and the model returns a completion. The system message defines behavior, user messages are inputs, and assistant messages are previous AI responses for conversation context.
Key Parameters That Matter:
- temperature (0-2) - Controls randomness. 0 = deterministic, 0.7 = creative, 1.5+ = very random. Use 0 for factual tasks, 0.7-1.0 for creative writing.
- max_tokens - Maximum tokens in the response. Always set this to prevent runaway costs. A typical response is 200-500 tokens.
- top_p - Nucleus sampling. Alternative to temperature. 0.1 = only top 10% probability tokens considered.
- response_format - Force JSON output. Set to { type: "json_object" } for structured responses. Prevents hallucinated format.
- stream - Get tokens as they generate. Essential for chat UIs - user sees response building in real-time.
Function Calling (Tool Use):
The most powerful feature. You define functions with JSON Schema, and the model decides when to call them and with what arguments. The model outputs a structured tool call, your code executes it, and you feed the result back.
- Use Cases - Database queries, API calls, calculations, web search, file operations
- Parallel Tool Calls - GPT-4o can request multiple tool calls in a single response
- Forced Tool Use - Set tool_choice to force the model to use a specific function
Structured Outputs:
New feature that guarantees the response matches a JSON Schema exactly. No more parsing errors or validation failures. The model is constrained during generation to only output valid JSON matching your schema.
Note: Function calling turns GPT from a text generator into an action-taker. It is the foundation for building AI agents, chatbots with backend integrations, and automated workflows.
Assistants API - Managed AI Agents
Threads, Memory, Files, and Code Execution - All Managed
What is the Assistants API?
The Assistants API is OpenAI managed infrastructure for building AI agents. Instead of you managing conversation history, file storage, code execution, and tool orchestration, OpenAI handles it all server-side.
Think of Chat Completions as raw ingredients - you cook everything yourself. Assistants API is like ordering a thali from Swiggy - everything is prepared, plated, and delivered ready to eat.
Key Concepts:
- Assistant - A configured AI entity with instructions, model, tools, and file access. Think of it as a specialized employee.
- Thread - A conversation session. Automatically manages message history. No need to resend all messages each time.
- Run - When you ask the assistant to respond. It processes messages, calls tools, generates responses.
- Vector Store - Upload files (PDFs, docs, code) and the assistant can search them using RAG automatically.
Built-in Tools:
- Code Interpreter - Runs Python code in a sandbox. Can analyze data, generate charts, process files. Like having a data scientist on call.
- File Search - RAG built-in. Upload documents, assistant searches them to answer questions. Vector embeddings handled automatically.
- Function Calling - Same as Chat Completions but integrated into the Run lifecycle.
Assistants vs Chat Completions:
| Feature | Chat Completions | Assistants API |
|---|---|---|
| Memory | You manage | Automatic (Threads) |
| File handling | You build RAG | Built-in Vector Store |
| Code execution | Not available | Built-in sandbox |
| Control | Full control | Less control |
| Cost | Pay per token | Higher (storage + compute) |
Note: Use Assistants API for rapid prototyping and when you need file processing or code execution. Use Chat Completions when you need full control or want to minimize costs.
Vision, Audio, and Multimodal Capabilities
GPT Can See, Hear, and Speak
Vision (GPT-4o)
GPT-4o natively understands images. You can send images (as URLs or base64) alongside text in the Chat Completions API. The model can describe images, read text from screenshots, analyze charts, understand diagrams, and even solve visual math problems.
- Use Cases - Receipt scanning, UI screenshot analysis, document OCR, product image analysis, medical image triage
- Pricing - Based on image resolution. Low-res (~85 tokens), high-res (~765 tokens per 512x512 tile)
- Tips - Use detail: "low" for simple images to save cost. Use detail: "high" only when fine details matter.
Audio APIs:
- Whisper (Speech-to-Text) - Industry-leading accuracy. Supports 97+ languages. Can transcribe and translate. Rs 0.5/minute approximately.
- TTS (Text-to-Speech) - Natural-sounding voices (alloy, echo, fable, onyx, nova, shimmer). Great for audio content, accessibility, voice assistants.
- Realtime API - WebSocket-based real-time speech conversation. Like talking to AI on a phone call. Low latency, interruption support.
Image Generation (DALL-E 3):
- Generate images from text descriptions
- 1024x1024, 1024x1792, 1792x1024 sizes
- High quality but no fine-grained editing control
- Good for: marketing content, concept art, illustrations
Note: GPT-4o is truly multimodal - it can process text, images, and audio in the same request. This opens up use cases that were impossible with text-only models.
Best Practices and Production Tips
Ship Reliable AI Applications
Error Handling & Resilience:
- Retry with Exponential Backoff - Rate limits (429) and server errors (500, 503) are common. Always retry with backoff: 1s, 2s, 4s, 8s.
- Timeout Settings - Set request timeout to 30-60 seconds. Streaming helps - you get tokens incrementally so timeout is less of an issue.
- Fallback Models - If GPT-4o is down/slow, fall back to GPT-4o-mini. If OpenAI is down entirely, fall back to Anthropic Claude.
- Idempotency - Same input should produce consistent results. Use temperature=0 and seed parameter for reproducibility.
Cost Optimization:
- Use the Right Model - GPT-4o-mini for simple tasks (10x cheaper). GPT-4o only when quality demands it. o1 only for complex reasoning.
- Reduce Token Usage - Shorter system prompts, summarize conversation history, use structured outputs to avoid retries.
- Caching - Cache responses for identical/similar queries. Semantic caching can match similar questions.
- Batch API - 50% discount for non-time-sensitive requests. Process thousands of queries at half price with 24-hour turnaround.
Security Considerations:
- API Key Security - Never expose keys in frontend code. Use server-side proxy. Rotate keys regularly.
- Prompt Injection - Users can manipulate the model through crafted inputs. Validate and sanitize user input. Use system prompt guardrails.
- Data Privacy - By default, OpenAI does not train on API data. But verify your data processing agreement for compliance.
- Content Filtering - Use the Moderations API (free!) to screen user input and model output for harmful content.
Note: Never expose your OpenAI API key in frontend JavaScript. Always proxy requests through your backend server. A leaked key can result in thousands of dollars in unauthorized usage.
Interview Questions
Q: What is function calling in the OpenAI API and how does it work?
Function calling allows the model to output structured JSON requesting execution of developer-defined functions. You define functions with names, descriptions, and JSON Schema parameters. The model decides when to call them based on the conversation. Your code executes the function and returns results to the model. This enables agents, database queries, API integrations, and any external action.
Q: When would you use the Assistants API vs Chat Completions?
Use Assistants API when you need: automatic conversation memory (threads), file processing with built-in RAG (vector stores), code execution (code interpreter), or rapid prototyping. Use Chat Completions when you need: full control over the conversation flow, cost optimization (Assistants has storage costs), custom RAG implementation, or multi-provider support.
Q: How would you handle rate limits and errors in a production OpenAI integration?
(1) Exponential backoff retry for 429/500/503 errors. (2) Request timeouts of 30-60 seconds. (3) Fallback to cheaper/different models when primary is unavailable. (4) Queue requests during rate limit windows. (5) Monitor usage dashboards and set spending limits. (6) Use streaming to get partial results even if connection drops.
Q: How do you optimize OpenAI API costs for a high-volume application?
(1) Use GPT-4o-mini for simple tasks (10x cheaper than GPT-4o). (2) Cache responses for repeated queries. (3) Shorten system prompts and summarize conversation history. (4) Use Batch API for 50% discount on non-urgent requests. (5) Set max_tokens to limit response length. (6) Use structured outputs to avoid retries from parsing failures.
Q: What are the security risks of using the OpenAI API and how do you mitigate them?
Risks: (1) API key exposure in frontend code - use server-side proxy. (2) Prompt injection - validate/sanitize user input, use system prompt guardrails. (3) Data leakage - understand data retention policies, use API for business data (not trained on by default). (4) Harmful content - use free Moderations API to screen input/output. (5) Cost attacks - set spending limits, rate limit per user.
Frequently Asked Questions
What is OpenAI API?
Master the OpenAI API ecosystem - from Chat Completions to Assistants, Vision, Function Calling, and beyond. The API that started the AI revolution and powers millions of applications.
How does OpenAI API work?
The Gateway to GPT-4, DALL-E, Whisper, and More What is the OpenAI API? The OpenAI API gives developers programmatic access to OpenAI models - GPT-4o, GPT-4 Turbo, o1, o3, DALL-E 3, Whisper, TTS, and Embeddings.
Related topics
Practice this on DevInterviewMaster
Read the full OpenAI API breakdown with interactive demos, quizzes, and Hinglish notes.
800+ system-design, LLD, coding, and design-pattern topics. Unlock everything with Pro (₹499, one-time) or Ultimate (₹999, one-time) — lifetime access, no subscription.