Chunking Strategies (Recursive, Semantic, Agentic)
How You Split Your Documents Decides How Smart Your RAG Gets
Master the art and science of document chunking - from basic character splitting to advanced semantic and agentic approaches. Learn why chunk size, overlap, and strategy choice can make or break your retrieval quality.
What is Chunking and Why Does It Matter?
Breaking Big Documents Into Bite-Sized Pieces for AI
The Index Card Analogy
Imagine trying to find one specific fact in a 500-page book. If someone gives you the entire book, you will spend forever searching. But if someone cuts the book into labeled index cards (one topic per card), you can find the right card in seconds. Chunking does exactly this for your AI system. It breaks large documents into smaller pieces (chunks) that embedding models can process and vector databases can search efficiently. The way you cut determines whether the right card shows up when your user asks a question.
Why Not Just Embed the Whole Document?
- Token Limits: Embedding models cap their input length, and 512 tokens is a common limit. A 50-page PDF runs to roughly 25,000 tokens, so it cannot be embedded in one pass.
- Signal Dilution: A 10,000 word document embedded as one vector mixes many topics. The embedding becomes a vague average of everything, good at matching nothing specifically.
- Precision: If a user asks about "GST rates for electronics," you want to retrieve the specific paragraph about it, not the entire 100-page tax guide.
- LLM Context: You need to fit retrieved chunks into the LLM prompt. Smaller, focused chunks give the LLM exactly the context it needs.
The Chunking Dilemma
- Too Small: "The price is Rs 499." - Missing context. Price of what? For whom?
- Too Large: Entire chapter about pricing - too much noise, embedding becomes vague
- Just Right: Complete paragraph about the specific product pricing with enough context to be self-contained
Note: Chunking is where most teams get it wrong. The difference between a mediocre RAG and an excellent one often comes down to chunking strategy, not the LLM or embedding model.
Basic Strategies - Character, Token, and Recursive
The Foundation - Three Approaches Everyone Should Know
1. Fixed-Size Character Splitting (Simplest)
Cut text every N characters, regardless of content. Like cutting a roti with a ruler every 10 cm - you might cut right through a sabzi piece.
- How: Split every 500 characters with 50-character overlap
- Pros: Dead simple, predictable chunk sizes
- Cons: Cuts mid-sentence, mid-paragraph, mid-thought. Destroys meaning.
- Use When: Almost never in production. Only for quick prototypes.
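A minimal sketch of fixed-size splitting, using the 500-character size and 50-character overlap from the bullet above as defaults. The function name `split_fixed` is illustrative and not taken from any library.

```python
def split_fixed(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Cut text every `chunk_size` characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


# The cut positions ignore words and sentences entirely.
for piece in split_fixed("The penalty for late filing is Rs 10,000 per day.", 20, 5):
    print(repr(piece))
```

Printing the pieces makes the weakness visible: boundaries land wherever the character count says, including in the middle of a word or a number.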
2. Recursive Character Splitting (The Workhorse)
The most popular strategy in production. Tries to split at natural boundaries in order of preference: double newlines (paragraphs), single newlines, sentences, words, then individual characters. Like cutting roti at natural fold lines.
- How: Try splitting at "\n\n" first. If chunks are still too big, split at "\n". Then at ". ". Then at " ". Last resort: characters.
- Pros: Respects natural text boundaries, consistent chunk sizes, easy to implement
- Cons: Unaware of meaning - a paragraph about two different topics stays together
- Use When: Default choice for most RAG applications. Start here.
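The separator hierarchy described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used by any particular library: the name `recursive_split` and the default separator list are chosen for this example, chunk size is measured in characters, and overlap is left out to keep the recursion easy to follow.

```python
SEPARATORS = ["\n\n", "\n", ". ", " ", ""]  # paragraphs, lines, sentences, words, characters


def recursive_split(text: str, chunk_size: int = 500, separators=SEPARATORS) -> list[str]:
    """Split at the coarsest separator that works, recursing into oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging neighbours while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    # Note: a separator that falls exactly on a chunk boundary is dropped.
    return [c for c in chunks if c.strip()]
```

Every returned chunk is at most `chunk_size` characters, and paragraphs are only broken up when they do not fit on their own.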
3. Token-Based Splitting
Similar to character splitting but counts tokens (the actual units embedding models process) instead of characters. Ensures no chunk exceeds the model's token limit.
- How: Use tiktoken (OpenAI) or the model's own tokenizer to count actual tokens
- Pros: Guaranteed to fit within model token limits
- Cons: Still unaware of meaning boundaries
- Use When: When you need precise token count control (e.g., strict LLM context limits)
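A sketch of token-based splitting with the tokenizer passed in as a pair of functions. To stay runnable without extra packages, the example uses a whitespace tokenizer (`toy_encode` / `toy_decode`) as a stand-in; that stand-in is an assumption of this sketch and does not match how a real model counts tokens. With tiktoken installed, the `encode` and `decode` methods of an encoding object could be passed instead.

```python
from typing import Callable, Sequence


def split_by_tokens(
    text: str,
    max_tokens: int,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    overlap: int = 0,
) -> list[str]:
    """Return chunks of at most `max_tokens` tokens, sharing `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    tokens = encode(text)
    step = max_tokens - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
        start += step
    return chunks


# Stand-in tokenizer (assumption): one token per whitespace-separated word.
def toy_encode(text: str) -> list[str]:
    return text.split()


def toy_decode(tokens: Sequence[str]) -> str:
    return " ".join(tokens)


print(split_by_tokens("one two three four five six seven", 3, toy_encode, toy_decode, overlap=1))
```

Injecting the tokenizer keeps the splitting logic identical whichever model is used; only the counting changes.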
Note: Recursive character splitting is the default recommendation for 80% of RAG use cases. It is simple, respects natural text boundaries, and works well with most content types.
Advanced Strategies - Semantic and Agentic
Intelligent Chunking That Understands Meaning
Semantic Chunking
Instead of splitting at fixed sizes, semantic chunking uses embeddings to detect topic boundaries. It embeds each sentence, then measures similarity between consecutive sentences. When similarity drops sharply (meaning the topic changed), it creates a chunk boundary.
- How It Works: Embed sentence 1, embed sentence 2, compute similarity. If similarity drops below threshold, split here. This means each chunk contains sentences about the same topic.
- Pros: Chunks are topically coherent - one chunk = one idea. Strong retrieval quality on documents where topics shift.
- Cons: Slower (needs embeddings), variable chunk sizes (some huge, some tiny), requires tuning the similarity threshold
- Best For: Long documents where topics shift frequently (research papers, legal documents, manuals)
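The boundary-detection loop can be sketched as follows. The embedding here is a toy bag-of-words vector (`toy_embed`), an assumption made so the example runs with the standard library only; a real system would call an embedding model instead, and the 0.2 threshold is illustrative and would need tuning.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    """Stand-in embedding (assumption): word counts instead of a model vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2, embed=toy_embed) -> list[str]:
    """Start a new chunk whenever consecutive sentences fall below the threshold."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    previous = embed(sentences[0])
    for sentence in sentences[1:]:
        vector = embed(sentence)
        if cosine(previous, vector) < threshold:
            groups.append([])  # similarity dropped: treat as a topic change
        groups[-1].append(sentence)
        previous = vector
    return [" ".join(group) for group in groups]


sentences = [
    "GST on laptops is 18 percent.",
    "GST on phones is 18 percent too.",
    "Refunds take ten days.",
    "Refunds take longer on weekends.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

With these four sentences the split falls between the GST sentences and the refund sentences, because the second and third sentences share no words under the toy embedding.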
Agentic Chunking
Use an LLM to intelligently decide where to chunk. The LLM reads the document and creates meaning-preserving chunks with summaries and context. Like having a human editor cut the document into self-contained sections.
- How It Works: Feed document sections to an LLM with instructions: "Split this into self-contained chunks. Each chunk should make sense on its own. Add a brief summary to each chunk."
- Pros: Highest quality chunks, each chunk is self-contained with context, can handle complex structures
- Cons: Expensive (LLM API cost per document), slow, non-deterministic
- Best For: High-value documents where quality justifies cost (legal contracts, medical records, financial reports)
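A sketch of the agentic flow with the LLM passed in as a plain function, so no specific provider API is assumed. The JSON reply format, the names `agentic_chunks` and `fake_llm`, and the fallback behaviour are all assumptions of this example; `fake_llm` is a canned stub that stands in for a real model call.

```python
import json
from typing import Callable

PROMPT = (
    "Split this into self-contained chunks. Each chunk should make sense on its own. "
    "Add a brief summary to each chunk. "
    'Reply with JSON only, shaped like [{"summary": "...", "text": "..."}].\n\n'
)


def agentic_chunks(section: str, llm: Callable[[str], str]) -> list[dict]:
    """Ask the LLM for chunks; fall back to one whole-section chunk on bad output."""
    fallback = [{"summary": "", "text": section}]
    try:
        items = json.loads(llm(PROMPT + section))
    except json.JSONDecodeError:
        return fallback  # LLM output is non-deterministic, so never trust it blindly
    if not isinstance(items, list):
        return fallback
    chunks = [
        {"summary": str(item.get("summary", "")), "text": str(item["text"])}
        for item in items
        if isinstance(item, dict) and item.get("text")
    ]
    return chunks or fallback


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call (assumption for this example)."""
    return json.dumps([
        {"summary": "Late filing penalty",
         "text": "The penalty for late filing is Rs 10,000 per day."}
    ])


print(agentic_chunks("The penalty for late filing is Rs 10,000 per day.", fake_llm))
```

Validating the reply and keeping a fallback matters here because the same document can produce differently shaped output on different runs.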
Document Structure-Aware Chunking
Use the document structure itself (headings, sections, HTML tags) as chunk boundaries. Each section becomes a chunk.
- How: Split at H1/H2/H3 headings. Each heading + its content = one chunk.
- Pros: Respects author intent (they organized the doc this way for a reason)
- Cons: Sections can be wildly different sizes. Some too small, some too large.
- Best For: Well-structured documents (documentation, wikis, articles with clear headings)
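A sketch of heading-based splitting, assuming the document is Markdown with `#`, `##`, and `###` headings. The function name `split_by_headings` is illustrative. The regex is a simplification: it would also match heading-like lines inside code fences, which a real parser should skip.

```python
import re

HEADING = re.compile(r"^(#{1,3})\s+(.*)$")  # H1 to H3 only


def split_by_headings(markdown: str) -> list[dict]:
    """Return one chunk per heading, each holding the heading and its body text."""
    sections = []
    current = {"heading": "", "lines": []}

    def flush():
        if current["heading"] or any(line.strip() for line in current["lines"]):
            sections.append({
                "heading": current["heading"],
                "text": "\n".join(current["lines"]).strip(),
            })

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            current = {"heading": match.group(2).strip(), "lines": []}
        else:
            current["lines"].append(line)
    flush()
    return sections


doc = "# Guide\nIntro text.\n## Rates\nThe rate is 18%.\n"
for section in split_by_headings(doc):
    print(section)
```

Sections that come out too large can then be passed through a size-based splitter, and very small ones merged with a neighbour.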
Note: Semantic chunking typically gives the best retrieval quality for complex documents. Agentic chunking can be better still but costs roughly 10-100x more. Choose based on document value and budget.
Chunk Size, Overlap, and Practical Tuning
The Numbers That Matter
Chunk Size Guidelines
Use Case                | Chunk Size       | Why
------------------------|------------------|----------------------------------
Fact-based QA           | 200-500 tokens   | Small, precise answers
General knowledge RAG   | 500-1000 tokens  | Balance of context and precision
Legal/Contract analysis | 1000-2000 tokens | Need full clause context
Code documentation      | Function/class   | Natural boundaries
Conversation logs       | Per message      | Each message is a unit
Overlap - The Safety Net
Overlap means consecutive chunks share some text. If chunk 1 ends at sentence 10, chunk 2 starts at sentence 9, so sentences 9 and 10 appear in both (a 2-sentence overlap).
- Why Overlap? Without it, a key fact at the boundary of two chunks might be split and neither chunk has the complete information.
- How Much? 10-20% of chunk size is standard. For 500-token chunks, use 50-100 token overlap.
- Too Much Overlap: Wastes storage and increases processing (many near-duplicate chunks).
- Too Little: Risk losing context at boundaries.
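The storage cost of overlap follows from simple arithmetic: each new chunk advances by `chunk_size - overlap` tokens. The sketch below applies that to the roughly 25,000-token document mentioned earlier; the helper name `chunk_count` is illustrative.

```python
def chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover `total_tokens`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    remaining = total_tokens - chunk_size
    return 1 + -(-remaining // step)  # integer ceiling division


for overlap in (0, 50, 100, 250):
    print(overlap, chunk_count(25_000, 500, overlap))
# prints:
# 0 50
# 50 56
# 100 63
# 250 99
```

Going from no overlap to 10-20% adds about 12-26% more chunks for this document, while a 50% overlap nearly doubles the count.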
Adding Context to Chunks
A standalone chunk like "The rate is 18%." is useless without context. Three techniques to fix this:
- Contextual Headers: Prepend the document title and section heading to every chunk. "[GST Guide > Electronics Rates] The rate is 18% for laptops..."
- Parent Document Retrieval: Embed small chunks for precision but retrieve the larger parent section for context. Small chunk matches accurately, parent provides full context to LLM.
- Chunk Summary Prefix: Add a 1-line summary at the start of each chunk: "Summary: GST rates for electronic goods in India. | The rate is 18%..."
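A sketch of the first two techniques together: a contextual header is prepended to each small chunk, and each small chunk remembers which parent section it came from. The names `ChildChunk`, `add_header`, and `expand_to_parents`, and the `parent_id` key scheme, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildChunk:
    text: str       # small, header-prefixed text that gets embedded
    parent_id: str  # key of the larger section this chunk was cut from


def add_header(doc_title: str, section: str, chunk_text: str) -> str:
    """Prefix a chunk with its document title and section heading."""
    return f"[{doc_title} > {section}] {chunk_text}"


def expand_to_parents(hits: list[ChildChunk], parents: dict[str, str]) -> list[str]:
    """Swap matched child chunks for their parent sections, without duplicates."""
    ordered_ids = list(dict.fromkeys(hit.parent_id for hit in hits))
    return [parents[parent_id] for parent_id in ordered_ids]


parents = {"gst/electronics": "(full text of the Electronics Rates section)"}
child = ChildChunk(
    text=add_header("GST Guide", "Electronics Rates", "The rate is 18% for laptops."),
    parent_id="gst/electronics",
)
print(child.text)
print(expand_to_parents([child], parents))
```

The small chunk is what the vector search matches against; the parent text is what gets placed in the LLM prompt.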
Note: Always add context to your chunks. A chunk without its section heading and document title is like a page torn out of a book - technically readable but missing crucial context.
Chunking Mistakes That Wreck RAG
Avoid These and Your RAG Will Thank You
Mistake 1: One Chunk Size for All Content
Using 500-token chunks for everything - FAQ answers, legal contracts, code files. A 50-word FAQ answer merged with neighbouring text to fill 500 tokens adds noise. A 5000-word legal clause split into 500-token chunks loses clause context. Fix: Adapt chunk size to content type. Short for FAQ, longer for legal, function-level for code.
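One way to make the fix concrete is a small lookup from content type to chunk size. The specific numbers below are illustrative picks from within the ranges in the chunk size guidelines table, and the content-type labels and function name are assumptions of this sketch.

```python
# Illustrative values chosen from the guideline ranges; tune them per corpus.
CHUNK_TOKENS_BY_TYPE = {
    "faq": 300,       # fact-based QA: 200-500 tokens
    "general": 750,   # general knowledge RAG: 500-1000 tokens
    "legal": 1500,    # legal/contract analysis: 1000-2000 tokens
}
DEFAULT_CHUNK_TOKENS = 750


def chunk_tokens_for(content_type: str) -> int:
    """Return the chunk size for a content type, with a general-purpose default."""
    return CHUNK_TOKENS_BY_TYPE.get(content_type.lower(), DEFAULT_CHUNK_TOKENS)


print(chunk_tokens_for("FAQ"), chunk_tokens_for("legal"), chunk_tokens_for("unknown"))
```

Code and conversation logs are left out on purpose, since the guidelines split those at function or message boundaries rather than by token count.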
Mistake 2: No Overlap at All
Chunks with zero overlap mean critical information at boundaries gets split. "The penalty for late filing is" in chunk 1, "Rs 10,000 per day" in chunk 2. Neither chunk has the complete fact. Fix: Always use 10-20% overlap.
Mistake 3: Chunks Without Context
"The price is Rs 499." Price of what? Which product? Which plan? A chunk without its section heading, document title, or topic context is ambiguous. Fix: Prepend contextual headers or use parent document retrieval.
Mistake 4: Not Evaluating Chunk Quality
Changing chunk size from 500 to 1000 tokens - did retrieval improve or degrade? Most teams never measure. Fix: Build an evaluation set. Test different chunk sizes. Measure Recall@10 and answer accuracy for each configuration.
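Recall@k is straightforward to compute once each test question has a list of chunk IDs that count as relevant. A minimal sketch, with illustrative function names and made-up chunk IDs:

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of the relevant chunks that appear in the top-k retrieved results."""
    if not relevant_ids:
        raise ValueError("each evaluation question needs at least one relevant chunk")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    """Average Recall@k over (retrieved_ids, relevant_ids) pairs, one per question."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# One question where only one of the two relevant chunks was retrieved.
print(recall_at_k(["c1", "c7", "c3"], {"c3", "c9"}, k=10))  # 0.5
```

Running the same question set against each chunking configuration gives a like-for-like number to compare, keeping in mind that chunk IDs change when the chunking changes, so relevance labels are usually tied to source passages and mapped to chunks per configuration.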
Mistake 5: Splitting Tables Across Chunks
A table that starts in chunk 1 and continues in chunk 2 is useless in both chunks. Fix: Detect tables during parsing and keep each table as a single chunk (or convert to a self-contained format).
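A sketch of keeping tables whole during parsing, assuming tables appear as consecutive lines containing a pipe character. That detection rule is a rough heuristic chosen for this example; real documents (PDF, HTML) need format-specific table detection.

```python
def blocks_with_atomic_tables(text: str) -> list[tuple[str, str]]:
    """Group lines into ("text", ...) and ("table", ...) blocks; tables stay whole."""
    grouped: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        kind = "table" if "|" in line else "text"  # heuristic: a pipe marks a table row
        if grouped and grouped[-1][0] == kind:
            grouped[-1][1].append(line)
        else:
            grouped.append((kind, [line]))
    blocks = [(kind, "\n".join(lines).strip()) for kind, lines in grouped]
    return [(kind, body) for kind, body in blocks if body]


doc = "Rates below.\n\nItem | Rate\n-----|-----\nLaptop | 18%\n\nSee notes."
for kind, body in blocks_with_atomic_tables(doc):
    print(kind, repr(body))
```

Only the "text" blocks would then go through the normal splitter; each "table" block is stored as a single chunk regardless of the size limit.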
Note: The most impactful improvement to most RAG systems is not a better LLM or embedding model - it is better chunking. Experiment with sizes, add context, and measure the results.
Interview Questions
Q: What is chunking in RAG and why is it necessary?
Chunking breaks large documents into smaller pieces for embedding and retrieval. It is necessary because: (1) Embedding models have token limits (often 512 tokens). (2) Large document embeddings become vague averages that match nothing specifically. (3) Users need precise answers from specific paragraphs, not entire documents. (4) LLM context windows need focused, relevant chunks, not entire books. The way you chunk directly determines retrieval quality.
Q: Compare recursive character splitting with semantic chunking.
Recursive splitting breaks text at natural boundaries (paragraphs, then lines, then sentences) in order of preference. It is simple and fast with predictable sizes, but unaware of meaning - two different topics in one paragraph stay together. Semantic chunking uses embeddings to detect topic boundaries by measuring similarity between consecutive sentences. It creates topically coherent chunks but is slower, produces variable sizes, and needs threshold tuning. Use recursive for 80% of cases; semantic for complex documents where topics shift frequently.
Q: What is the role of chunk overlap and how much should you use?
Overlap ensures information at chunk boundaries is not lost. Without overlap, a fact split between two chunks is incomplete in both. Standard overlap is 10-20% of chunk size (50-100 tokens for 500-token chunks). Too much overlap wastes storage and creates near-duplicate chunks. Too little risks losing boundary context. The goal is that any important fact appears fully in at least one chunk.
Q: What is agentic chunking and when would you use it?
Agentic chunking uses an LLM to intelligently decide chunk boundaries, creating self-contained pieces with summaries and context. Each chunk makes sense on its own. It produces the highest quality chunks but is expensive (LLM cost per document), slow, and non-deterministic. Use it for high-value documents where quality justifies cost - legal contracts, medical records, financial reports. Not practical for millions of documents due to cost.