
Instructor (Pydantic-based Structured Output)

The Easiest Way to Get Type-Safe Data from LLMs

Learn Instructor - a widely used library for extracting validated, type-safe structured data from LLMs using Pydantic models. Write Python classes, get validated data back.

What is Instructor?

Define a Python Class, Get Validated Structured Data from Any Supported LLM

Simple Definition:

Instructor is a Python library that lets you define the exact structure of LLM output using Pydantic models. Instead of parsing free text or writing JSON schemas manually, you write a Python class with type hints, and Instructor handles the rest - calling the LLM, validating the output, and retrying if validation fails.

Think of it like ordering food at a restaurant with a fixed menu. Instead of describing your meal in free text and hoping the kitchen understands, you pick from a structured menu (Pydantic model) and the kitchen (LLM) delivers exactly what the menu promised.

Real-World Analogy - Income Tax Return Filing:

When you file ITR, you fill in a structured form - income fields, deduction fields, tax computation. The form validates everything (Section 80C deductions cannot exceed ₹1.5 lakh, PAN must be 10 characters). Instructor is like that ITR form for LLM outputs - structured fields with built-in validation.

Why Instructor Over Raw Function Calling:

Feature	Raw Function Calling	Instructor
Schema Definition	Manual JSON Schema	Python class with type hints
Validation	You implement it	Pydantic auto-validates
Retry on Failure	You implement it	Built-in with max_retries
Custom Validators	Post-processing code	Pydantic validators in the model
Multi-Provider	Different API per provider	Same code, swap provider
IDE Support	Dict access (no autocomplete)	Full autocomplete + type checking

Note: Instructor is one of the most popular structured output libraries, with 10K+ GitHub stars. It is used by companies like Zapier, Notion, and hundreds of startups. If you work with LLMs in Python, you need to know Instructor.

How Instructor Works - The Core Flow

Define, Call, Validate, Retry - All Automated

The Instructor Pipeline:

Define: Create a Pydantic model describing your desired output structure
Patch: Wrap your LLM client with Instructor (one line of code)
Call: Make an API call specifying the Pydantic model as response_model
Validate: Instructor automatically validates the output against the model
Retry: If validation fails, Instructor sends the error back to the LLM with a request to fix it
Return: You get a fully typed Python object with IDE autocomplete

Conceptual Flow:

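1. You define:
   from pydantic import BaseModel, Field

   class UserInfo(BaseModel):
       name: str
       age: int = Field(ge=0, le=150)
       email: str

2. You patch the client and call it:
   import instructor
   from openai import OpenAI

   client = instructor.from_openai(OpenAI())
   user = client.chat.completions.create(
       model="gpt-4o-mini",  # any chat model your account can use
       response_model=UserInfo,
       max_retries=3,
       messages=[{"role": "user", "content": resume_text}],  # resume_text: your input string
   )

3. You get:
   user.name  # "Rahul Sharma" (str, autocomplete works)
   user.age   # 28 (int, validated 0-150)
   user.email # "rahul@gmail.com" (str)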

The Magic: Automatic Retry with Validation Feedback

This is where Instructor truly shines. If the LLM returns age: -5 (fails ge=0 validation), Instructor automatically sends the error message back to the LLM asking it to fix the output. The LLM sees the specific validation error and corrects itself. This retry loop runs up to max_retries times.

Like a teacher marking a student's answer wrong and saying "Age cannot be negative, try again." The student corrects the mistake.
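The retry loop described above can be sketched without any library. This is a simplified illustration, not Instructor's internals. `fake_llm` is a hypothetical stand-in for a chat model that returns a negative age on its first attempt. `validate_user` plays the role of Pydantic validation. In this sketch, `max_retries` counts extra attempts after the first call.

```python
import json
from typing import Callable


def validate_user(raw: str) -> dict:
    """Parse the model's JSON and enforce the same rule as UserInfo (age 0-150)."""
    data = json.loads(raw)
    age = data.get("age")
    if not isinstance(age, int) or not 0 <= age <= 150:
        raise ValueError(f"age must be an integer between 0 and 150, got {age!r}")
    return data


def extract_with_retries(
    llm: Callable[[list[dict]], str],
    messages: list[dict],
    max_retries: int = 3,
) -> tuple[dict, int]:
    """Call the LLM, validate, and feed validation errors back until it passes."""
    history = list(messages)
    for attempt in range(max_retries + 1):
        raw = llm(history)
        try:
            return validate_user(raw), attempt
        except ValueError as err:  # json.JSONDecodeError is also a ValueError
            # Show the model its own answer plus the specific error, then ask again
            history.append({"role": "assistant", "content": raw})
            history.append({"role": "user", "content": f"Validation error: {err}. Please fix it."})
    raise RuntimeError(f"still invalid after {max_retries} retries")


def fake_llm(history: list[dict]) -> str:
    """Hypothetical model: wrong on the first try, corrected once it sees feedback."""
    saw_feedback = any("Validation error" in m["content"] for m in history)
    age = 28 if saw_feedback else -5
    return json.dumps({"name": "Rahul Sharma", "age": age, "email": "rahul@gmail.com"})


user, retries = extract_with_retries(fake_llm, [{"role": "user", "content": "resume text"}])
print(user["age"], retries)  # 28 1
```

The key design point is that the error text goes back into the conversation. The second attempt is not a blind re-roll: the model is told exactly which constraint it broke.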

Note: The automatic retry with validation feedback is what makes Instructor special. By giving the LLM specific feedback about what went wrong, it can turn a roughly 90% accurate extraction into a 99%+ accurate one.

Advanced Instructor Patterns

Beyond Basic Extraction - Power Features

Pattern 1: Custom Validators

Add business logic validation directly in your Pydantic model. Instructor will enforce these rules and retry if they fail.

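from pydantic import BaseModel, Field, ValidationInfo, field_validator

class InvoiceItem(BaseModel):
    description: str
    quantity: int = Field(ge=1)
    unit_price: float = Field(ge=0)
    total: float

    @field_validator("total")
    @classmethod
    def validate_total(cls, v: float, info: ValidationInfo) -> float:
        # info.data holds fields validated so far (declared before "total");
        # a key is missing if that field itself failed validation
        quantity = info.data.get("quantity")
        unit_price = info.data.get("unit_price")
        if quantity is None or unit_price is None:
            return v  # the earlier field error is already reported
        expected = quantity * unit_price
        if abs(v - expected) > 0.01:
            raise ValueError(f"total must equal quantity * unit_price ({expected:.2f})")
        return v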

Pattern 2: Streaming Structured Output

For large outputs, Instructor supports streaming partial objects. You get partial results as the LLM generates them, each partial result validated against the schema. Great for real-time UI updates.
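To see why partial results are useful, here is a library-free simulation. It assumes the streaming layer has already turned the token stream into a series of "fields finished so far" updates (the hypothetical `events` list). Each update is poured into a model whose fields are all optional, which is the shape a partial object has mid-stream. In Instructor, the parsing of partial JSON is done for you. `PartialTicket` and `stream_partials` are illustrative names only.

```python
from dataclasses import dataclass, replace
from typing import Iterator, Optional


@dataclass(frozen=True)
class PartialTicket:
    """Every field is optional, because mid-stream some fields have not arrived yet."""
    category: Optional[str] = None
    priority: Optional[str] = None
    summary: Optional[str] = None


def stream_partials(events: list[dict]) -> Iterator[PartialTicket]:
    """Merge each batch of newly completed fields and yield a fresh snapshot."""
    current = PartialTicket()
    for fields in events:
        # replace() raises TypeError on unknown keys, a minimal schema check
        current = replace(current, **fields)
        yield current


# Hypothetical stream: fields complete one by one as the LLM generates them
events = [
    {"category": "delivery"},
    {"priority": "high"},
    {"summary": "Order delayed by 5 days"},
]

snapshots = list(stream_partials(events))
for snap in snapshots:
    print(snap)  # a UI could re-render on every snapshot
```

Each snapshot is immutable, so a UI can compare the old and new snapshots to decide what to redraw.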

Pattern 3: Nested Models and Lists

Extract complex hierarchical data using nested Pydantic models:

from typing import Literal
from pydantic import BaseModel, Field

class Address(BaseModel):
    street: str
    city: str
    state: str
    pincode: str = Field(pattern=r"^\d{6}$")  # exactly 6 digits, anchored at both ends

class Employee(BaseModel):
    name: str
    department: Literal["engineering", "sales", "hr", "finance"]
    addresses: list[Address]
    skills: list[str] = Field(min_length=1, max_length=20)

Pattern 4: Multi-Provider Support

Instructor works with OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Ollama (local models), and more. Same Pydantic model, same code, different provider. Just change the client initialization.
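The "same code, different provider" idea is an adapter pattern. The sketch below is not Instructor's API. `ChatProvider`, `FakeOpenAI`, and `FakeLocalModel` are hypothetical stand-ins showing how one extraction function and one schema can sit on top of interchangeable backends. With the real library, the swap is the client constructor, for example `instructor.from_openai(OpenAI())` versus `instructor.from_anthropic(Anthropic())`.

```python
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Ticket:
    category: str
    priority: str


class ChatProvider(Protocol):
    """Anything that can turn a prompt into raw text is a valid backend."""
    def complete(self, prompt: str) -> str: ...


class FakeOpenAI:
    def complete(self, prompt: str) -> str:
        return '{"category": "billing", "priority": "high"}'


class FakeLocalModel:
    def complete(self, prompt: str) -> str:
        # Different backend, same JSON contract (key order does not matter)
        return '{"priority": "high", "category": "billing"}'


def extract_ticket(provider: ChatProvider, text: str) -> Ticket:
    """Provider-agnostic: the schema and the parsing never change."""
    raw = provider.complete(f"Classify this ticket as JSON: {text}")
    return Ticket(**json.loads(raw))


for backend in (FakeOpenAI(), FakeLocalModel()):
    print(extract_ticket(backend, "I was charged twice"))
```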

Note: Custom validators are the killer feature for production use. You can encode complex business rules directly in the model, and Instructor enforces them automatically with retries.

Real-World Use Cases

Production Applications of Instructor

Use Case 1: Resume Parser for Indian Job Market

Extract structured data from resumes - name, phone, email, skills, experience, education, certifications. Add validators for Indian-specific formats (10-digit phone, 6-digit pincode, PAN format).
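The Indian-format rules mentioned above reduce to regular expressions. This sketch makes three assumptions:

- a phone number is a bare 10-digit number with no +91 prefix;
- a pincode is exactly 6 digits;
- a PAN follows the standard layout of five letters, four digits, and one letter.

In a Pydantic model you would attach the same patterns to fields, so that Instructor retries on a mismatch. `check_resume_fields` is a hypothetical helper.

```python
import re

# Assumed formats; adjust if you accept +91 prefixes or spaces
PATTERNS = {
    "phone": re.compile(r"[0-9]{10}"),
    "pincode": re.compile(r"[0-9]{6}"),
    "pan": re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]"),
}


def check_resume_fields(fields: dict[str, str]) -> list[str]:
    """Return readable errors, the kind of message fed back to the LLM on retry."""
    errors = []
    for name, pattern in PATTERNS.items():
        value = fields.get(name, "")
        # fullmatch anchors both ends, so an 11-digit phone is rejected
        if not pattern.fullmatch(value):
            errors.append(f"{name} {value!r} does not match {pattern.pattern}")
    return errors


print(check_resume_fields({"phone": "9876543210", "pincode": "560001", "pan": "ABCDE1234F"}))  # []
print(check_resume_fields({"phone": "98765", "pincode": "560001", "pan": "ABCDE1234F"}))
```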

Use Case 2: Customer Support Ticket Classifier

Classify support tickets into categories (billing, technical, delivery, returns), extract priority, sentiment, and key entities. Use Literal types for enums and validators for business rules.
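Literal types act as closed enums: anything outside the listed values is a validation error. Pydantic enforces this automatically. The standard-library sketch below does the same check by reading the allowed values out of each Literal with `typing.get_args`. The `Priority` levels and the `check_ticket` helper are assumptions for illustration.

```python
from typing import Literal, get_args

Category = Literal["billing", "technical", "delivery", "returns"]
Priority = Literal["low", "medium", "high"]  # assumed priority levels


def check_ticket(category: str, priority: str) -> list[str]:
    """Reject values outside each Literal, mirroring Pydantic's Literal check."""
    errors = []
    if category not in get_args(Category):
        errors.append(f"category must be one of {get_args(Category)}, got {category!r}")
    if priority not in get_args(Priority):
        errors.append(f"priority must be one of {get_args(Priority)}, got {priority!r}")
    return errors


print(check_ticket("delivery", "high"))  # []
print(check_ticket("shipping", "urgent"))
```

Listing the allowed values in the error message matters: on a retry, the LLM sees exactly which labels it may choose from.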

Use Case 3: Invoice Data Extraction

Extract line items, totals, GST numbers, dates from scanned invoices. Use nested models for line items and validators to verify math (total = sum of line items + tax).
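The invoice arithmetic check can be written as a single cross-field rule. This sketch uses plain dataclasses and assumes tax is one amount added on top of the line items. Real GST invoices may split it into CGST/SGST/IGST. With Pydantic, the same logic would live in a model-level validator. A small tolerance absorbs rounding. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: float

    @property
    def amount(self) -> float:
        return self.quantity * self.unit_price


@dataclass
class Invoice:
    items: list[LineItem]
    tax: float
    total: float

    @property
    def expected_total(self) -> float:
        """Sum of line items plus tax."""
        return sum(item.amount for item in self.items) + self.tax

    def check_math(self, tolerance: float = 0.01) -> None:
        """Raise if the stated total disagrees with the computed one."""
        if abs(self.total - self.expected_total) > tolerance:
            raise ValueError(f"total {self.total} != items + tax ({self.expected_total:.2f})")


invoice = Invoice(
    items=[LineItem("Keyboard", 2, 1500.0), LineItem("Mouse", 1, 700.0)],
    tax=666.0,  # 18% of 3700
    total=4366.0,
)
invoice.check_math()  # passes: 3000 + 700 + 666 = 4366
```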

Use Case 4: Content Moderation Pipeline

from typing import Literal
from pydantic import BaseModel

class ModerationResult(BaseModel):
    is_safe: bool
    categories: list[Literal["hate", "violence", "sexual", "self-harm", "spam"]]
    severity: Literal["none", "low", "medium", "high"]
    explanation: str
    action: Literal["allow", "flag", "block"]

# One call, fully structured, validated, type-safe

Note: Instructor is particularly powerful for data extraction tasks. It turns hours of regex/parser development into a few lines of Pydantic model definition.

Limitations and When NOT to Use Instructor

Know When Instructor Is Not the Right Tool

Limitation 1: Not for Free-Form Text Generation

If you want the LLM to write an essay, poem, email, or any free-form text, Instructor adds unnecessary overhead. It is designed for structured data extraction, not text generation.

Limitation 2: Retry Costs

Each retry is a full API call. If your validator is too strict and fails frequently, you pay for multiple calls per request. Monitor retry rates and loosen validators if more than about 20% of requests need a retry.
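Monitoring retry rates needs only a retry count per request. This sketch assumes you log how many retries each request needed. It treats "retry rate" as the share of requests that needed at least one retry. The 20% threshold is the rule of thumb from the text, and `retry_report` is a hypothetical helper.

```python
def retry_report(retries_per_request: list[int], threshold: float = 0.20) -> dict:
    """Summarise API cost and retry pressure from per-request retry counts."""
    requests = len(retries_per_request)
    total_calls = sum(1 + r for r in retries_per_request)  # first call + each retry
    retried = sum(1 for r in retries_per_request if r > 0)
    rate = retried / requests if requests else 0.0
    return {
        "requests": requests,
        "api_calls": total_calls,
        "calls_per_request": total_calls / requests if requests else 0.0,
        "retry_rate": rate,
        "loosen_validators": rate > threshold,
    }


# 10 requests: 7 passed first time, 2 needed one retry, 1 needed two
print(retry_report([0, 0, 0, 0, 0, 0, 0, 1, 1, 2]))
# api_calls 14, retry_rate 0.3 -> loosen_validators True
```

Tracking `calls_per_request` alongside the rate shows the cost side directly: here, 1.4 calls per request means the retries add about 40% to the bill.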

Limitation 3: Python Only (for now)

Instructor is a Python library. If your stack is TypeScript/JavaScript, look at Zod-based alternatives or the TypeScript port (instructor-js). The ecosystem is Python-first.

Limitation 4: Complex Validators Can Confuse the LLM

If your validator has complex inter-field dependencies, the LLM might struggle to satisfy all constraints simultaneously. The error messages from complex validators can be confusing for the retry mechanism.

Fix: Keep validators simple and focused. For complex validation, do it in a separate post-processing step rather than in the Pydantic model.

Note: Instructor is best for structured data extraction with clear schemas. For free-form text or very complex validation, consider other approaches.

Interview Questions - Instructor

Q: What problem does Instructor solve?

Instructor solves the problem of getting reliable, type-safe, validated structured data from LLMs. Instead of parsing free text or writing manual JSON schemas, you define a Pydantic model and Instructor handles API calls, validation, and retries automatically. It bridges the gap between unstructured LLM output and typed application code.

Q: How does Instructor handle validation failures?

When Pydantic validation fails, Instructor automatically sends the validation error message back to the LLM along with the original request, asking it to fix the output. This retry loop runs up to max_retries times. The LLM sees the specific error (e.g., "age must be >= 0") and corrects itself. This significantly improves extraction accuracy.

Q: What is the advantage of Instructor over raw Function Calling?

Instructor provides: (1) Pythonic schema definition via Pydantic models instead of manual JSON Schema. (2) Automatic validation with custom validators. (3) Built-in retry with validation feedback. (4) IDE autocomplete and type checking. (5) Multi-provider support with the same code. Raw Function Calling requires you to implement all of this manually.

Q: When should you NOT use Instructor?

Do not use Instructor for: (1) Free-form text generation (essays, emails, poems). (2) When retries are too expensive for your use case. (3) Non-Python stacks (though TypeScript port exists). (4) When output does not need structured validation. Use it when you need typed, validated data extraction.

Q: How does Instructor support multiple LLM providers?

Instructor wraps (patches) different LLM client libraries. You write your Pydantic model once, and it works with OpenAI, Anthropic, Google Gemini, Mistral, Cohere, and even local models via Ollama. The same extraction code works across all providers - you only change the client initialization.



Practice this on DevInterviewMaster
