Instructor (Pydantic-based Structured Output)
The Easiest Way to Get Type-Safe Data from LLMs
Learn Instructor - the most popular library for extracting validated, type-safe structured data from LLMs using Pydantic models. Write Python classes, get validated data back.
What is Instructor?
Define a Python Class, Get Perfect Structured Data from Any LLM
Simple Definition:
Instructor is a Python library that lets you define the exact structure of LLM output using Pydantic models. Instead of parsing free text or writing JSON schemas manually, you write a Python class with type hints, and Instructor handles the rest - calling the LLM, validating the output, retrying if validation fails.
Think of it like ordering food at a restaurant with a fixed menu. Instead of describing your meal in free text and hoping the kitchen understands, you pick from a structured menu (Pydantic model) and the kitchen (LLM) delivers exactly what the menu promised.
Real-World Analogy - Income Tax Return Filing:
When you file an Income Tax Return (ITR), you fill in a structured form with income fields, deduction fields, and tax computation. The form validates everything (Section 80C deductions cannot exceed ₹1.5 lakh, a PAN must be exactly 10 characters). Instructor is like that ITR form for LLM outputs: structured fields with built-in validation.
Why Instructor Over Raw Function Calling:
| Feature | Raw Function Calling | Instructor |
|---|---|---|
| Schema Definition | Manual JSON Schema | Python class with type hints |
| Validation | You implement it | Pydantic auto-validates |
| Retry on Failure | You implement it | Built-in with max_retries |
| Custom Validators | Post-processing code | Pydantic validators in the model |
| Multi-Provider | Different API per provider | Same code, swap provider |
| IDE Support | Dict access (no autocomplete) | Full autocomplete + type checking |
Note: Instructor is the most popular structured output library with 10K+ GitHub stars. It is used by companies like Zapier, Notion, and hundreds of startups. If you work with LLMs in Python, you need to know Instructor.
How Instructor Works - The Core Flow
Define, Call, Validate, Retry - All Automated
The Instructor Pipeline:
- Define: Create a Pydantic model describing your desired output structure
- Patch: Wrap your LLM client with Instructor (one line of code)
- Call: Make an API call specifying the Pydantic model as response_model
- Validate: Instructor automatically validates the output against the model
- Retry: If validation fails, Instructor sends the error back to the LLM with a request to fix it
- Return: You get a fully typed Python object with IDE autocomplete
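The pipeline above can be sketched end to end roughly as follows. This is a minimal sketch, assuming the `instructor` and `openai` packages are installed and an API key is configured. The model name `gpt-4o-mini` and the helper names `make_client` and `extract_user` are illustrative assumptions, not part of the original text.

```python
from pydantic import BaseModel, Field


class UserInfo(BaseModel):
    """Step 1 (Define): the shape we want back from the LLM."""
    name: str
    age: int = Field(ge=0, le=150)
    email: str


def make_client():
    """Step 2 (Patch): wrap an OpenAI client with Instructor.

    Imports are local so the model above stays usable without these packages.
    """
    import instructor
    from openai import OpenAI

    return instructor.from_openai(OpenAI())


def extract_user(client, text: str, model: str = "gpt-4o-mini") -> UserInfo:
    """Steps 3-6 (Call, Validate, Retry, Return) happen inside this one call."""
    return client.chat.completions.create(
        model=model,              # assumed model name; use whichever you have access to
        response_model=UserInfo,  # Instructor validates the reply against this model
        max_retries=2,            # cap on validation-driven re-asks
        messages=[{"role": "user", "content": f"Extract the user details:\n{text}"}],
    )
```

Calling `extract_user(make_client(), resume_text)` would return a `UserInfo` instance, so `user.age` is already an `int` rather than a string to be parsed.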
Conceptual Flow:
1. You define:

    class UserInfo(BaseModel):
        name: str
        age: int = Field(ge=0, le=150)
        email: str

2. You call:

    user = client.chat.completions.create(
        response_model=UserInfo,
        messages=[{"role": "user", "content": resume_text}]
    )

3. You get:

    user.name   # "Rahul Sharma" (string, autocomplete works)
    user.age    # 28 (integer, validated 0-150)
    user.email  # "rahul@gmail.com" (string)

The Magic: Automatic Retry with Validation Feedback
This is where Instructor truly shines. If the LLM returns age: -5 (fails ge=0 validation), Instructor automatically sends the error message back to the LLM asking it to fix the output. The LLM sees the specific validation error and corrects itself. This retry loop runs up to max_retries times.
Like a teacher marking a student's answer wrong and saying "Age cannot be negative, try again." The student corrects the mistake.
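Note: The automatic retry with validation feedback is what makes Instructor special. Because the LLM is told exactly what went wrong, many extractions that would fail on the first attempt succeed on the second (for illustration, lifting a 90% success rate to 99% or more; the actual gain depends on the model and the schema).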
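To make the mechanism concrete, here is a minimal sketch of a validation-feedback loop written with plain Pydantic. It is not Instructor's actual implementation; `fake_llm`, `extract_with_retries`, and the canned replies are hypothetical stand-ins for a real model call.

```python
from pydantic import BaseModel, Field, ValidationError


class UserInfo(BaseModel):
    name: str
    age: int = Field(ge=0, le=150)


def fake_llm(messages: list[dict]) -> str:
    """Hypothetical stand-in for an LLM: wrong at first, corrected after feedback."""
    saw_feedback = any("validation error" in m["content"].lower() for m in messages)
    if saw_feedback:
        return '{"name": "Rahul", "age": 28}'
    return '{"name": "Rahul", "age": -5}'


def extract_with_retries(prompt: str, max_retries: int = 2) -> tuple[UserInfo, int]:
    """Return the validated object and the number of LLM calls it took."""
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(1, max_retries + 2):  # first call plus up to max_retries retries
        raw = fake_llm(messages)
        try:
            return UserInfo.model_validate_json(raw), attempt
        except ValidationError as exc:
            # Feed the bad answer and the exact error back so the model can correct it
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Validation error, please fix:\n{exc}"})
    raise RuntimeError("validation still failing after all retries")


user, calls = extract_with_retries("Extract the user: Rahul, 28 years old")
print(user.age, calls)  # 28 2
```

The first reply fails the `ge=0` constraint, the error text is appended to the conversation, and the second reply passes, which is why two calls are reported.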
Advanced Instructor Patterns
Beyond Basic Extraction - Power Features
Pattern 1: Custom Validators
Add business logic validation directly in your Pydantic model. Instructor will enforce these rules and retry if they fail.
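    from pydantic import BaseModel, Field, ValidationInfo, field_validator

    class InvoiceItem(BaseModel):
        description: str
        quantity: int = Field(ge=1)
        unit_price: float = Field(ge=0)
        total: float

        @field_validator("total")
        @classmethod
        def validate_total(cls, v: float, info: ValidationInfo) -> float:
            # info.data holds the fields validated before "total"
            data = info.data
            # Skip the cross-check if quantity or unit_price already failed validation
            if "quantity" in data and "unit_price" in data:
                expected = data["quantity"] * data["unit_price"]
                if abs(v - expected) > 0.01:
                    raise ValueError("Total must equal quantity * unit_price")
            return v

Pattern 2: Streaming Structured Output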
For large outputs, Instructor supports streaming partial objects. You get partial results as the LLM generates them, each partial result validated against the schema. Great for real-time UI updates.
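A streaming call might look like the sketch below. It assumes `client` is already an Instructor-patched client (for example from `instructor.from_openai`) and uses its `create_partial` method. The `Article` model, the `stream_article` helper, and the model name are illustrative assumptions.

```python
from typing import Iterator

from pydantic import BaseModel


class Article(BaseModel):
    title: str
    summary: str
    tags: list[str]


def stream_article(client, text: str, model: str = "gpt-4o-mini") -> Iterator[BaseModel]:
    """Yield progressively more complete Article objects as tokens arrive."""
    partials = client.chat.completions.create_partial(
        model=model,             # assumed model name
        response_model=Article,  # fields that have not arrived yet are typically None
        messages=[{"role": "user", "content": f"Summarise this article:\n{text}"}],
    )
    for partial in partials:
        # Each partial can be rendered immediately, e.g. to update a UI
        yield partial
```

A caller would loop over `stream_article(client, text)` and redraw the view on every partial, rather than waiting for the full object.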
Pattern 3: Nested Models and Lists
Extract complex hierarchical data using nested Pydantic models:
    from typing import Literal
    from pydantic import BaseModel, Field

    class Address(BaseModel):
        street: str
        city: str
        state: str
        pincode: str = Field(pattern=r"^\d{6}$")  # exactly six digits

    class Employee(BaseModel):
        name: str
        department: Literal["engineering", "sales", "hr", "finance"]
        addresses: list[Address]
        skills: list[str] = Field(min_length=1, max_length=20)

Pattern 4: Multi-Provider Support
Instructor works with OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Ollama (local models), and more. Same Pydantic model, same code, different provider. Just change the client initialization.
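Swapping providers can be sketched as a small factory. This assumes the `instructor`, `openai`, and `anthropic` packages are installed and API keys are set in the environment; `make_client` and the `SUPPORTED` tuple are hypothetical names, and only two of the providers listed above are shown.

```python
SUPPORTED = ("openai", "anthropic")


def make_client(provider: str):
    """Return an Instructor-patched client for the given provider name."""
    if provider not in SUPPORTED:
        raise ValueError(f"unsupported provider: {provider!r}")

    # SDKs are imported lazily so only the one you use needs to be installed
    import instructor

    if provider == "openai":
        from openai import OpenAI
        return instructor.from_openai(OpenAI())

    from anthropic import Anthropic
    return instructor.from_anthropic(Anthropic())
```

The Pydantic model and the `response_model=` argument stay the same whichever client is returned. Provider-specific arguments may still differ; Anthropic's API, for example, requires `max_tokens` on every request.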
Note: Custom validators are the killer feature for production use. You can encode complex business rules directly in the model, and Instructor enforces them automatically with retries.
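To see what a failing business rule looks like in practice, the sketch below validates two payloads against an invoice model written for Pydantic v2. The `first_error` helper and the sample payloads are illustrative assumptions. The returned message is the kind of text a retry loop would send back to the LLM.

```python
from pydantic import BaseModel, Field, ValidationError, ValidationInfo, field_validator


class InvoiceItem(BaseModel):
    description: str
    quantity: int = Field(ge=1)
    unit_price: float = Field(ge=0)
    total: float

    @field_validator("total")
    @classmethod
    def validate_total(cls, v: float, info: ValidationInfo) -> float:
        data = info.data  # fields validated before "total"
        if "quantity" in data and "unit_price" in data:
            expected = data["quantity"] * data["unit_price"]
            if abs(v - expected) > 0.01:
                raise ValueError("Total must equal quantity * unit_price")
        return v


def first_error(payload: dict) -> str:
    """Return 'ok' or the first validation message for the payload."""
    try:
        InvoiceItem.model_validate(payload)
        return "ok"
    except ValidationError as exc:
        return exc.errors()[0]["msg"]


good = {"description": "Cable", "quantity": 2, "unit_price": 50.0, "total": 100.0}
bad = {"description": "Cable", "quantity": 2, "unit_price": 50.0, "total": 120.0}
print(first_error(good))
print(first_error(bad))
```

The second payload is well-formed JSON with correct types, so only the custom rule catches it.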
Real-World Use Cases
Production Applications of Instructor
Use Case 1: Resume Parser for Indian Job Market
Extract structured data from resumes - name, phone, email, skills, experience, education, certifications. Add validators for Indian-specific formats (10-digit phone, 6-digit pincode, PAN format).
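A minimal sketch of such validators follows, assuming Pydantic v2. The `ResumeContact` model and its field names are hypothetical. The PAN pattern follows the standard five letters, four digits, one letter layout, and the phone rule assumes Indian mobile numbers start with 6 to 9.

```python
import re
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, field_validator


class ResumeContact(BaseModel):
    name: str
    phone: str
    pincode: str = Field(pattern=r"^\d{6}$")
    pan: Optional[str] = Field(default=None, pattern=r"^[A-Z]{5}[0-9]{4}[A-Z]$")

    @field_validator("phone")
    @classmethod
    def normalise_phone(cls, v: str) -> str:
        digits = re.sub(r"\D", "", v)  # drop spaces, dashes, and "+"
        if len(digits) == 12 and digits.startswith("91"):
            digits = digits[2:]        # strip the country code
        if not re.fullmatch(r"[6-9]\d{9}", digits):
            raise ValueError("phone must be a 10-digit Indian mobile number")
        return digits


contact = ResumeContact(
    name="Rahul Sharma", phone="+91 98765-43210", pincode="560001", pan="ABCDE1234F"
)
print(contact.phone)  # 9876543210
```

Normalising inside the validator means downstream code always sees the same 10-digit form, whatever formatting the resume used.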
Use Case 2: Customer Support Ticket Classifier
Classify support tickets into categories (billing, technical, delivery, returns), extract priority, sentiment, and key entities. Use Literal types for enums and validators for business rules.
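One possible shape for that classifier output is sketched below. The field names, the priority and sentiment labels, and the rule that delivery and returns tickets need an order ID are hypothetical examples of a business rule, not requirements from the text.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError, model_validator


class TicketTriage(BaseModel):
    category: Literal["billing", "technical", "delivery", "returns"]
    priority: Literal["low", "medium", "high"]
    sentiment: Literal["negative", "neutral", "positive"]
    order_id: Optional[str] = None
    summary: str = Field(max_length=200)

    @model_validator(mode="after")
    def order_id_required_for_shipments(self):
        # Hypothetical business rule: shipment-related tickets must reference an order
        if self.category in ("delivery", "returns") and not self.order_id:
            raise ValueError("delivery and returns tickets must include an order_id")
        return self


ticket = TicketTriage(
    category="returns",
    priority="high",
    sentiment="negative",
    order_id="ORD-1042",
    summary="Customer wants to return a damaged phone case.",
)
print(ticket.category, ticket.priority)  # returns high
```

Because `category` is a `Literal`, a reply such as `"refunds"` is rejected outright instead of silently creating a new category.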
Use Case 3: Invoice Data Extraction
Extract line items, totals, GST numbers, dates from scanned invoices. Use nested models for line items and validators to verify math (total = sum of line items + tax).
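The math check can be sketched with a model-level validator, assuming Pydantic v2. The `Invoice` and `LineItem` field names and the sample GSTIN are illustrative; the only GST rule encoded here is that a GSTIN is 15 characters long.

```python
from pydantic import BaseModel, Field, ValidationError, model_validator


class LineItem(BaseModel):
    description: str
    quantity: int = Field(ge=1)
    unit_price: float = Field(ge=0)


class Invoice(BaseModel):
    gstin: str = Field(min_length=15, max_length=15)
    items: list[LineItem] = Field(min_length=1)
    tax: float = Field(ge=0)
    total: float

    @model_validator(mode="after")
    def check_total(self):
        # total = sum of line items + tax, with a small tolerance for rounding
        expected = sum(i.quantity * i.unit_price for i in self.items) + self.tax
        if abs(self.total - expected) > 0.01:
            raise ValueError(f"total should be {expected:.2f} (line items + tax)")
        return self


invoice = Invoice(
    gstin="29ABCDE1234F1Z5",
    items=[
        {"description": "Keyboard", "quantity": 2, "unit_price": 100.0},
        {"description": "Mouse", "quantity": 1, "unit_price": 50.0},
    ],
    tax=45.0,
    total=295.0,
)
print(len(invoice.items), invoice.total)  # 2 295.0
```

Putting the expected value in the error message gives a retry loop something specific to correct.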
Use Case 4: Content Moderation Pipeline
    class ModerationResult(BaseModel):
        is_safe: bool
        categories: list[Literal["hate", "violence", "sexual", "self-harm", "spam"]]
        severity: Literal["none", "low", "medium", "high"]
        explanation: str
        action: Literal["allow", "flag", "block"]

    # One call, fully structured, validated, type-safe

Note: Instructor is particularly powerful for data extraction tasks. It turns hours of regex/parser development into a few lines of Pydantic model definition.
Limitations and When NOT to Use Instructor
Know When Instructor Is Not the Right Tool
Limitation 1: Not for Free-Form Text Generation
If you want the LLM to write an essay, poem, email, or any free-form text, Instructor adds unnecessary overhead. It is designed for structured data extraction, not text generation.
Limitation 2: Retry Costs
Each retry is a full API call. If your validator is too strict and fails frequently, you pay for multiple calls per request. Monitor your retry rate and loosen validators if more than roughly 20% of requests need a retry (a rule of thumb, not a hard limit).
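Tracking the retry rate needs very little code. The sketch below uses only the standard library and assumes your own wrapper records how many LLM calls each request took. `RetryStats` and its methods are hypothetical names, not part of Instructor.

```python
from dataclasses import dataclass


@dataclass
class RetryStats:
    requests: int = 0
    retried: int = 0

    def record(self, attempts: int) -> None:
        """Record one finished request and how many LLM calls it needed."""
        self.requests += 1
        if attempts > 1:
            self.retried += 1

    @property
    def retry_rate(self) -> float:
        return self.retried / self.requests if self.requests else 0.0

    def too_strict(self, threshold: float = 0.20) -> bool:
        """True when more than `threshold` of requests needed at least one retry."""
        return self.retry_rate > threshold


stats = RetryStats()
for attempts in [1, 1, 2, 1, 3]:
    stats.record(attempts)
print(stats.retry_rate, stats.too_strict())  # 0.4 True
```

Two of the five requests needed more than one call, so the rate of 0.4 is above the 0.20 threshold.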
Limitation 3: Python-First
Instructor is a Python library. If your stack is TypeScript/JavaScript, look at Zod-based alternatives or the TypeScript port (instructor-js). The ecosystem is Python-first.
Limitation 4: Complex Validators Can Confuse the LLM
If your validator has complex inter-field dependencies, the LLM might struggle to satisfy all constraints simultaneously. The error messages from complex validators can be confusing for the retry mechanism.
Fix: Keep validators simple and focused. For complex validation, do it in a separate post-processing step rather than in the Pydantic model.
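One way to apply this fix is sketched below: the Pydantic model keeps only simple per-field checks, and cross-field rules run in a plain function afterwards. The `LeaveRequest` model and the `business_issues` helper are hypothetical examples.

```python
from datetime import date

from pydantic import BaseModel, Field


class LeaveRequest(BaseModel):
    # Only simple, per-field constraints live in the model the LLM must satisfy
    employee: str
    start: date
    end: date
    days: int = Field(ge=1)


def business_issues(req: LeaveRequest) -> list[str]:
    """Cross-field checks run after extraction, outside the retry loop."""
    issues = []
    if req.end < req.start:
        issues.append("end date is before start date")
    elif (req.end - req.start).days + 1 != req.days:
        issues.append("days does not match the date range")
    return issues


ok = LeaveRequest(employee="Asha", start="2024-03-04", end="2024-03-06", days=3)
odd = LeaveRequest(employee="Asha", start="2024-03-04", end="2024-03-06", days=5)
print(business_issues(ok))   # []
print(business_issues(odd))  # ['days does not match the date range']
```

With this split, a cross-field inconsistency becomes something your application can flag or route for review, instead of triggering further paid retries.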
Note: Instructor is best for structured data extraction with clear schemas. For free-form text or very complex validation, consider other approaches.
Interview Questions - Instructor
Q: What problem does Instructor solve?
Instructor solves the problem of getting reliable, type-safe, validated structured data from LLMs. Instead of parsing free text or writing manual JSON schemas, you define a Pydantic model and Instructor handles API calls, validation, and retries automatically. It bridges the gap between unstructured LLM output and typed application code.
Q: How does Instructor handle validation failures?
When Pydantic validation fails, Instructor automatically sends the validation error message back to the LLM along with the original request, asking it to fix the output. This retry loop runs up to max_retries times. The LLM sees the specific error (e.g., "age must be >= 0") and corrects itself. This significantly improves extraction accuracy.
Q: What is the advantage of Instructor over raw Function Calling?
Instructor provides: (1) Pythonic schema definition via Pydantic models instead of manual JSON Schema. (2) Automatic validation with custom validators. (3) Built-in retry with validation feedback. (4) IDE autocomplete and type checking. (5) Multi-provider support with the same code. Raw Function Calling requires you to implement all of this manually.
Q: When should you NOT use Instructor?
Do not use Instructor for: (1) Free-form text generation (essays, emails, poems). (2) When retries are too expensive for your use case. (3) Non-Python stacks (though TypeScript port exists). (4) When output does not need structured validation. Use it when you need typed, validated data extraction.
Q: How does Instructor support multiple LLM providers?
Instructor wraps (patches) different LLM client libraries. You write your Pydantic model once, and it works with OpenAI, Anthropic, Google Gemini, Mistral, Cohere, and even local models via Ollama. The same extraction code works across all providers - you only change the client initialization.