Vector Databases (Pinecone, Weaviate, ChromaDB, Qdrant)
Where AI Stores Its Memory - The Database Built for Embeddings
Learn how vector databases store, index, and search through millions of embeddings in milliseconds. Understand Pinecone, Weaviate, ChromaDB, and Qdrant - the infrastructure backbone of most RAG systems and semantic search engines.
What is a Vector Database?
A Database Designed for AI - Not Your Grandfather's SQL
The Spotify Analogy
When Spotify recommends songs, it does not search by song title or artist name (that would be keyword search). Instead, every song is represented as a vector - a list of numbers capturing its vibe, tempo, mood, genre. When you play a sad Hindi song, Spotify finds other songs whose vectors are close to it in this numerical space. A vector database is the engine that makes this "find similar things" search blazing fast, even across millions of items.
Why Not Just Use PostgreSQL?
Traditional databases are built for exact matches: "Find user where email = tanuj@gmail.com." Vector databases are built for nearest neighbor search: "Find the 10 documents most similar to this query embedding." This is fundamentally different:
- Regular DB: Exact match on structured data using B-tree indexes
- Vector DB: Approximate similarity search on high-dimensional vectors using ANN indexes (HNSW, IVF)
- Scale: Comparing one 768-dim query vector against 10 million vectors by brute force can take on the order of seconds. With a vector DB's ANN index, it typically takes milliseconds.
Core Operations of a Vector DB
- Insert/Upsert: Store vectors with metadata (document text, source URL, timestamps)
- Search (KNN/ANN): Given a query vector, find the K nearest neighbors
- Filter: Combine vector search with metadata filters ("similar documents from last 30 days only")
- Delete: Remove vectors by ID or metadata filter
- Update: Modify vectors or their metadata
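To make these five operations concrete, here is a minimal in-memory sketch in Python. `TinyVectorStore` and its method names are hypothetical, not the API of Pinecone, Weaviate, ChromaDB, or Qdrant, and it scores every record with brute-force cosine similarity instead of an ANN index, so it shows what the operations do, not how a real database makes them fast.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TinyVectorStore:
    """Hypothetical toy store: record id -> (vector, metadata). Brute-force search, no ANN index."""

    def __init__(self) -> None:
        self.records: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, record_id: str, vector: list[float], metadata: dict) -> None:
        # Insert a new record, or overwrite the existing one with the same id.
        self.records[record_id] = (vector, metadata)

    def update_metadata(self, record_id: str, **fields) -> None:
        # Modify metadata in place; the stored vector is untouched.
        self.records[record_id][1].update(fields)

    def delete(self, record_id: str) -> None:
        # Remove by id; silently ignore ids that do not exist.
        self.records.pop(record_id, None)

    def search(self, query: list[float], k: int,
               where: Optional[Callable[[dict], bool]] = None) -> list[tuple[str, float]]:
        # Score every record that passes the optional metadata filter, then keep the top k.
        scored = [(rid, cosine(query, vec))
                  for rid, (vec, meta) in self.records.items()
                  if where is None or where(meta)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

For example, `store.search([1.0, 0.0], k=2, where=lambda m: m["lang"] == "hi")` ranks only the records whose metadata passes the filter, which is the "Filter" operation in miniature.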
Note: Vector databases are to AI what SQL databases are to web apps - the essential storage layer. Most RAG systems, semantic search engines, and recommendation systems rely on one.
How Vector Search Actually Works - ANN Indexes
The Algorithms Behind Millisecond Search
The Problem: Brute Force is Too Slow
If you have 10 million vectors with 768 dimensions each, comparing your query against every single vector means 10 million distance calculations on 768-dimensional vectors. That is 7.68 billion multiply-add operations (roughly 15 billion floating-point operations) per query, and about 30 GB of float32 data has to be read for every single search. That is far too slow for real-time search at any meaningful query rate. Vector databases solve this using Approximate Nearest Neighbor (ANN) algorithms.
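A minimal standard-library sketch of that brute-force baseline (the function names are illustrative): every query touches all N stored vectors in all d dimensions, which is exactly the N x d cost that ANN indexes avoid.

```python
import heapq


def squared_l2(a: list[float], b: list[float]) -> float:
    # Touches every one of the d dimensions: about d subtract/multiply/add steps.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exhaustive scan: compute the distance to every stored vector, keep the k smallest."""
    return heapq.nsmallest(k, range(len(vectors)),
                           key=lambda i: squared_l2(query, vectors[i]))


def multiply_adds_per_query(num_vectors: int, dims: int) -> int:
    # One multiply-add per dimension per stored vector (as in a dot product).
    return num_vectors * dims
```

`multiply_adds_per_query(10_000_000, 768)` returns 7,680,000,000, the figure quoted above.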
HNSW (Hierarchical Navigable Small World) - Most Popular
Think of it like a multi-level city map. Level 3 has major highways (few nodes, long-distance connections). Level 2 has state roads. Level 1 has city streets. Level 0 has every single address. To find a destination, start at the top level (highway), jump to the right region, then descend through levels until you reach the exact neighborhood. Instead of checking all 10M vectors, HNSW computes distances to only a tiny fraction of them (typically hundreds to a few thousand, depending on its parameters).
- Speed: Often sub-millisecond to a few milliseconds per query over millions of vectors
- Accuracy: Typically 95-99% recall, tunable via parameters (95% recall@100 means it finds 95 of the true top-100 nearest neighbors)
- Tradeoff: Uses more memory (stores graph structure in RAM)
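The heart of HNSW is greedy navigation over a proximity graph. The sketch below is deliberately simplified and is not a full HNSW implementation: it has a single layer (the equivalent of Level 0), builds the graph by brute force, and skips HNSW's neighbor-pruning heuristic. In real HNSW, the upper layers are searched the same way with a candidate list of size 1, just to find a good entry point for Level 0. Here `m` plays the role of HNSW's M (links per node) and `ef` the size of the candidate list; the function names are hypothetical.

```python
import heapq
import math


def build_graph(points: list[list[float]], m: int) -> dict[int, set[int]]:
    """Link each point to its m nearest neighbors, keeping edges bidirectional.
    Simplification: brute-force O(n^2) construction, one layer only."""
    graph: dict[int, set[int]] = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        for j in heapq.nsmallest(m, others, key=lambda j: math.dist(p, points[j])):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_search(query: list[float], points: list[list[float]],
                  graph: dict[int, set[int]], entry: int, ef: int) -> tuple[list[int], int]:
    """Best-first search keeping the ef closest nodes found so far (HNSW's per-layer routine).
    Returns (node ids sorted by distance, number of distance computations)."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    best = [(-d0, entry)]        # max-heap via negated distance: current ef results
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:      # closest unexpanded node is worse than our worst result
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, points[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)   # drop the current worst result
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked], len(visited)
```

On a 10 x 10 grid of points, starting from the corner, `greedy_search([7.2, 6.9], pts, g, entry=0, ef=5)` walks toward the query and returns the point (7, 7) first; the second return value lets you compare how many distances were computed against the 100 a brute-force scan would need.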
IVF (Inverted File Index) - The Clustering Approach
Imagine organizing a library by topic shelves. First, cluster all books into 100 topic shelves. When someone asks for a book about cooking, go to the "food" shelf and search only there instead of the entire library. IVF clusters vectors into groups (Voronoi cells, usually found with k-means), then at query time searches only the few clusters whose centroids are closest to the query.
- Speed: Fast, especially with quantization
- Accuracy: Depends on how many clusters are probed (often called nprobe); probing more clusters raises recall but slows the search
- Tradeoff: Less memory than HNSW, but needs retraining when data distribution changes
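A toy IVF sketch in plain Python, assuming k-means (Lloyd's algorithm) for the clustering step; `IVFIndex` is a hypothetical name, not a library API. Each cluster keeps an inverted list of the vector ids assigned to it, and a query scans only the `nprobe` clusters whose centroids are closest.

```python
import math
import random


def kmeans(points: list[list[float]], k: int, iters: int = 10, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets: list[list[list[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(bucket) for dim in zip(*bucket)]
    return centroids


class IVFIndex:
    """Hypothetical toy IVF index: cluster once, then scan only nprobe clusters per query."""

    def __init__(self, points: list[list[float]], n_clusters: int, seed: int = 0) -> None:
        self.points = points
        self.centroids = kmeans(points, n_clusters, seed=seed)
        # Inverted lists: cluster id -> ids of the points assigned to that cluster
        self.lists: dict[int, list[int]] = {c: [] for c in range(n_clusters)}
        for i, p in enumerate(points):
            self.lists[self._nearest_centroids(p, 1)[0]].append(i)

    def _nearest_centroids(self, v: list[float], n: int) -> list[int]:
        return sorted(range(len(self.centroids)),
                      key=lambda c: math.dist(v, self.centroids[c]))[:n]

    def search(self, query: list[float], k: int, nprobe: int = 1) -> list[int]:
        # Only points in the nprobe closest clusters are compared with the query.
        candidates = [i for c in self._nearest_centroids(query, nprobe) for i in self.lists[c]]
        candidates.sort(key=lambda i: math.dist(query, self.points[i]))
        return candidates[:k]
```

Raising `nprobe` scans more clusters, so recall goes up and speed goes down. If new data drifts away from the original centroids, the clusters stop matching the data, which is why IVF indexes may need retraining.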
Product Quantization (PQ) - Compression
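Shrinks vectors to use much less memory by splitting each vector into sub-vectors and replacing each sub-vector with the ID of its nearest centroid in a learned codebook. A 768-dim float32 vector uses 3 KB (768 x 4 bytes = 3,072 bytes). With PQ, it can be compressed to 96 bytes (for example, 96 sub-vectors with a 1-byte code each) - 32x smaller. Some accuracy is lost, but for large datasets the memory savings are essential.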
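A small product-quantization sketch, assuming k-means codebooks trained independently per sub-space (the standard PQ recipe) and at most 256 centroids per codebook, so each code fits in one byte. Split 768 dimensions into 96 sub-vectors of 8 dimensions and you get the 96-byte code from the text; the constants at the bottom reproduce that arithmetic. Function names are hypothetical.

```python
import math
import random


def train_codebooks(vectors: list[list[float]], m: int, ks: int,
                    iters: int = 10, seed: int = 0) -> list[list[list[float]]]:
    """Train one k-means codebook (ks centroids, ks <= 256) per sub-space of size d/m."""
    d = len(vectors[0])
    assert d % m == 0, "dimension must split evenly into m sub-vectors"
    sub = d // m
    rng = random.Random(seed)
    codebooks = []
    for s in range(m):
        chunk = [v[s * sub:(s + 1) * sub] for v in vectors]
        cents = [list(c) for c in rng.sample(chunk, ks)]
        for _ in range(iters):
            groups: list[list[list[float]]] = [[] for _ in range(ks)]
            for x in chunk:
                groups[min(range(ks), key=lambda c: math.dist(x, cents[c]))].append(x)
            # New centroid = mean of its group; keep the old one if the group is empty.
            cents = [[sum(col) / len(g) for col in zip(*g)] if g else cents[c]
                     for c, g in enumerate(groups)]
        codebooks.append(cents)
    return codebooks


def encode(v: list[float], codebooks: list[list[list[float]]]) -> bytes:
    """Replace each sub-vector with the index of its nearest centroid (one byte each)."""
    sub = len(v) // len(codebooks)
    return bytes(
        min(range(len(cb)), key=lambda c: math.dist(v[s * sub:(s + 1) * sub], cb[c]))
        for s, cb in enumerate(codebooks)
    )


def decode(code: bytes, codebooks: list[list[list[float]]]) -> list[float]:
    """Approximate reconstruction: concatenate the chosen centroids."""
    return [x for s, c in enumerate(code) for x in codebooks[s][c]]


# Memory arithmetic from the text, assuming 96 sub-vectors with 256 centroids each:
RAW_BYTES = 768 * 4   # float32 = 4 bytes per dimension -> 3,072 bytes
PQ_BYTES = 96 * 1     # one uint8 code per sub-vector
```

The codebooks are stored once and shared by every vector, so their cost is negligible at scale. `decode` only approximates the original vector, which is where the accuracy loss comes from.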
Note: HNSW is the most widely used algorithm in production vector databases. It offers the best speed-accuracy tradeoff for most use cases at the cost of higher memory usage.
The Big Four - Pinecone, Weaviate, ChromaDB, Qdrant
Choosing Your Vector Database
Pinecone - The Managed King
- Type: Fully managed cloud service (no self-hosting option)
- Best For: Teams that want zero infrastructure headache
- Strengths: Dead simple API, automatic scaling, serverless tier, great docs
- Weaknesses: Vendor lock-in, no self-hosting, can get expensive at scale
- Think of it as: Zomato cloud kitchen - you just place orders, they handle everything
Weaviate - The Feature-Rich Hybrid
- Type: Open-source + managed cloud option
- Best For: Teams needing hybrid search (vector + keyword) and built-in ML modules
- Strengths: GraphQL API, built-in vectorizers, hybrid search, multi-tenancy
- Weaknesses: Heavier resource footprint, steeper learning curve
- Think of it as: Full-service restaurant - does everything, but more complex to manage
ChromaDB - The Developer Favorite
- Type: Open-source, embeddable (runs in your process)
- Best For: Prototyping, small-medium datasets, local development
- Strengths: Simplest API, in-memory mode, Python-native, great for notebooks
- Weaknesses: Not designed for massive scale, limited production features
- Think of it as: Street food stall - fast, simple, perfect for quick meals but not a banquet
Qdrant - The Performance Champion
- Type: Open-source (Rust-based) + managed cloud
- Best For: Teams needing high performance and advanced filtering
- Strengths: Written in Rust (fast), excellent filtering, payload indexes, quantization built-in
- Weaknesses: Smaller community than Pinecone/Weaviate
- Think of it as: German-engineered sports car - pure performance focus
Note: ChromaDB for prototyping, Pinecone for managed simplicity, Qdrant for performance, Weaviate for feature-richness. There is no single best - it depends on your specific needs.
Vector DB in a RAG Pipeline
How Vector Databases Power Real AI Applications
The RAG Pipeline Flow
INGESTION PHASE (once):
[Documents: 1000 PDFs] --> [Chunking: 500-word chunks] --> [Embedding Model: all-mpnet-base-v2, 768-dim vectors] --> [Vector DB: Qdrant, vectors stored with metadata]
QUERY PHASE (every request):
[User Query: "What is the GST rate?"] --> [Embed Query: 768-dim vector] --> [Vector DB Search: cosine similarity + metadata filter] --> [Top 5 chunks: most relevant document pieces]
[Top 5 chunks: context documents] --> [LLM Prompt: "Based on the context..."] --> [Generated Answer: "The GST rate for laptops is 18%"]
Metadata Filtering - The Secret Weapon
Vector similarity alone is not enough. You often need to combine it with metadata filters:
- Time filter: "Similar documents from last 7 days only" - for news or time-sensitive data
- Category filter: "Similar products in Electronics category only" - for Flipkart-style search
- Access control: "Documents this user has permission to see" - for enterprise apps
- Language filter: "Hindi documents only" - for multilingual systems
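A minimal sketch of combining these filters with similarity ranking. The record layout (`category`, `created_at`, `allowed_groups`) is a hypothetical schema, and this toy version filters first and then ranks the survivors by brute force; production vector DBs typically apply filters inside the index search, but the goal is the same: the top-k comes only from documents that pass every filter.

```python
import math
from datetime import datetime, timedelta, timezone


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def filtered_search(query, records, k, *, category=None, max_age=None,
                    user_groups=None, now=None):
    """Apply metadata filters, then rank only the surviving records by cosine similarity.
    Each record is a dict with hypothetical keys: id, vector, category, created_at, allowed_groups."""
    now = now or datetime.now(timezone.utc)

    def keep(r):
        if category is not None and r["category"] != category:
            return False                      # category filter
        if max_age is not None and now - r["created_at"] > max_age:
            return False                      # time filter
        if user_groups is not None and not set(user_groups) & set(r["allowed_groups"]):
            return False                      # access-control filter
        return True

    survivors = [r for r in records if keep(r)]
    survivors.sort(key=lambda r: cosine(query, r["vector"]), reverse=True)
    return [r["id"] for r in survivors[:k]]
```

Because filtering happens before ranking, a query like "similar Electronics items from the last 7 days that this user may see" never returns an old, off-category, or forbidden document, even if its vector is the closest match.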
Hybrid Search (Vector + Keyword)
Sometimes pure vector search misses exact terms. If someone searches for "error code ERR_2847", vector search might return documents about generic errors. Hybrid search combines:
- Vector search: Finds semantically similar documents (understands meaning)
- Keyword search (BM25): Finds documents with exact terms (catches specific codes, names)
- Fusion: Merge the two ranked lists, commonly with Reciprocal Rank Fusion (RRF), which gives each document a score of 1/(k + rank) from every list it appears in and sums those scores
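Reciprocal Rank Fusion needs only the rank of each document in each list, not the raw scores, so it can merge BM25 and vector results that live on different scales. A minimal sketch, assuming the commonly used constant k = 60; the document ids in the example are made up.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Each list contributes 1/(k + rank); documents found by both lists add up.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Fusing `["doc_a", "doc_b", "doc_c"]` (vector search) with `["doc_err2847", "doc_a", "doc_d"]` (BM25) puts `doc_a` first, because it appears in both lists, followed by `doc_err2847`, the exact-match hit that vector search alone missed.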
Note: Metadata filtering and hybrid search are what separate production-quality RAG from demo-quality RAG. Always include metadata with your vectors and consider hybrid search for technical content.
Common Vector DB Mistakes
Pitfalls That Can Sink Your AI Application
Mistake 1: Not Storing Enough Metadata
Many teams store only the vector and document text. Then they realize they need to filter by date, category, source, or user access level - but the metadata is not there. Fix: Store rich metadata from day one. It is much easier to add it upfront than to re-index millions of vectors later.
Mistake 2: Wrong Distance Metric
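Cosine similarity, dot product, and Euclidean distance can rank the same results differently. Many embedding models are trained for cosine similarity. If you use dot product on vectors that are not normalized, a vector's length inflates its score alongside its direction, which skews the rankings. Once vectors are L2-normalized, cosine similarity, dot product, and Euclidean distance all produce the same ranking. Fix: Check your embedding model documentation for the recommended distance metric, and normalize vectors when it calls for cosine.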
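A tiny standard-library demo of how the metric changes rankings, using made-up 2-D vectors: a short vector pointing almost exactly at the query, and a long vector pointing 45 degrees away.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def normalize(v: list[float]) -> list[float]:
    # Scale to unit length so only direction matters.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def rank(query: list[float], docs: dict[str, list[float]], score) -> list[str]:
    # Highest score first.
    return sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)


query = [1.0, 0.0]
docs = {"on_topic": [1.0, 0.1], "long_doc": [3.0, 3.0]}
```

With raw dot product, `long_doc` wins (3.0 vs 1.0) purely because of its length; with cosine similarity, `on_topic` wins (about 0.995 vs 0.707). After `normalize`, dot product and cosine agree.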
Mistake 3: Over-Engineering for Scale You Do Not Have
Deploying a distributed Qdrant cluster for 50,000 vectors is like renting a godown to store 10 boxes. ChromaDB or even pgvector can handle small datasets comfortably. Fix: Start simple. As a rough rule of thumb, use ChromaDB for under 100K vectors and a dedicated vector DB for millions.
Mistake 4: Ignoring Index Tuning
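Default HNSW parameters (M, ef_construction, and the search-time ef) are a compromise. Larger M and ef_construction build a denser, higher-quality graph at the cost of memory and indexing time; a larger search-time ef raises recall at the cost of latency. Depending on your data, tuning these can improve recall by several percentage points or cut latency substantially. Fix: Benchmark different parameter values on your data and tune for your quality-speed requirements.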
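Whatever database you use, the tuning loop looks the same: compute exact top-k results for a sample of queries once by brute force, then measure recall@k and latency for each parameter setting. A minimal harness sketch; `search(query, k, ef)` is a hypothetical wrapper you would write around your own index client, and sweeping `ef` is only one example (the same loop works for M or ef_construction if you rebuild the index for each setting).

```python
import time
from typing import Callable, Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that the ANN search returned in its top k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def sweep(search: Callable[[list[float], int, int], list[int]],
          queries: Sequence[list[float]], ground_truth: Sequence[list[int]],
          k: int, ef_values: Sequence[int]) -> list[tuple[int, float, float]]:
    """For each ef, return (ef, mean recall@k, mean latency in ms per query)."""
    rows = []
    for ef in ef_values:
        start = time.perf_counter()
        results = [search(q, k, ef) for q in queries]
        elapsed_ms = (time.perf_counter() - start) * 1000 / len(queries)
        recall = sum(recall_at_k(r, gt, k) for r, gt in zip(results, ground_truth)) / len(queries)
        rows.append((ef, recall, elapsed_ms))
    return rows
```

Pick the smallest setting that meets your recall target; anything beyond that only adds latency.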
Mistake 5: No Backup or Versioning Strategy
Vector databases can corrupt or lose data like any database. If you cannot re-embed everything quickly (re-embedding is expensive and slow), you need backups. Fix: Always keep the original documents, record which embedding model version produced each vector, and take regular snapshots of the vector DB.
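One lightweight way to follow that advice, sketched under the assumption that your records are plain dicts (the record fields are hypothetical): serialize the original text, metadata, and vector together with the embedding model name and version as JSON Lines, so a restore can refuse vectors from a different model. Writing the string to disk or object storage is left to the caller.

```python
import json


def dump_snapshot(records: list[dict], model_name: str, model_version: str) -> str:
    """Serialize records as JSON Lines, tagging each with the embedding model that produced it."""
    lines = []
    for record in records:
        row = {**record, "embedding_model": model_name, "model_version": model_version}
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


def load_snapshot(text: str, model_name: str, model_version: str) -> list[dict]:
    """Parse a snapshot, refusing vectors produced by a different model or version."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        if (row["embedding_model"], row["model_version"]) != (model_name, model_version):
            raise ValueError(f"record {row.get('id')} came from a different embedding model; re-embed it")
        records.append(row)
    return records
```

Vectors from different embedding models live in different vector spaces and cannot be compared, so mixing them silently breaks search; the version check turns that into a loud error.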
Note: The most painful mistake is not storing metadata upfront. Re-indexing millions of vectors because you forgot to add a timestamp field can take days and cost real money.
Interview Questions
Q: Why do we need a specialized vector database instead of using PostgreSQL?
Traditional databases like PostgreSQL use B-tree indexes optimized for exact matches and range queries. Vector databases use ANN (Approximate Nearest Neighbor) indexes like HNSW that are designed for high-dimensional similarity search. Comparing a 768-dim query against 10M vectors with brute force can take seconds; with HNSW it typically takes milliseconds. PostgreSQL does have the pgvector extension (with HNSW and IVFFlat indexes), which works well for smaller datasets, but purpose-built vector DBs are designed to handle many millions of vectors with better performance, filtering, and scaling.
Q: Explain how HNSW index works.
HNSW builds a multi-level graph. Top levels have few nodes with long-range connections (like highways). Lower levels have more nodes with short-range connections (like local streets). To find nearest neighbors, start at the top level, greedily navigate to the closest node, then descend to the next level and repeat; at the bottom level, keep a candidate list of size ef and return the best k. This narrows the search from millions of comparisons to typically hundreds or a few thousand, giving fast (often sub-millisecond) search with typically 95-99% recall, depending on parameters.
Q: When would you use ChromaDB vs Pinecone vs Qdrant?
ChromaDB: Prototyping, local development, small datasets (under 100K vectors), Python notebooks. Pinecone: Production apps where you want zero infrastructure management, auto-scaling, and are okay with vendor lock-in and higher costs. Qdrant: Production apps needing high performance, advanced filtering, and you want open-source with self-hosting option. Qdrant (Rust-based) is often cited for strong raw performance, though results depend on the workload.
Q: What is hybrid search and why is it important?
Hybrid search combines vector (semantic) search with keyword (BM25) search. Vector search understands meaning but may miss exact terms like error codes or product IDs. Keyword search finds exact matches but misses semantic understanding. Combining them with Reciprocal Rank Fusion gives the best of both worlds. Critical for technical documentation, e-commerce search, and any system where both meaning and exact terms matter.