Portfolio Project: RAG Chatbot with Citations

Build a Production-Grade AI Chatbot That Cites Its Sources

Build a portfolio project that stands out: a Retrieval-Augmented Generation (RAG) chatbot that answers questions from your documents and backs each answer with verifiable source citations. It is the kind of project that gets you hired.

Why This Project Will Get You Hired

The #1 Portfolio Project for AI Engineer Interviews

A RAG chatbot with citations demonstrates the most in-demand skills in AI engineering: working with LLMs, vector databases, document processing, and building production-grade AI systems. Many AI teams are building some form of RAG system, so this project directly shows you can do the job.

Why RAG Chatbots Are Everywhere

Think about it - every company has internal documents that employees need to search through. HR policies, product documentation, legal contracts, customer support knowledge bases. A RAG chatbot turns this mountain of documents into a conversational interface. Companies like Flipkart, Infosys, and Wipro are all building internal RAG systems.

What Makes This Project Stand Out

  • Source Citations: Most chatbot demos just give answers. Yours will show EXACTLY which document and paragraph the answer came from. This is what enterprises need.
  • Multi-Format Support: Handle PDFs, Word docs, web pages, and markdown files. Not just plain text.
  • Conversation Memory: Follow-up questions work naturally. The bot remembers context.
  • Evaluation Pipeline: Built-in quality metrics showing your bot's accuracy. This impresses senior engineers.
  • Production Architecture: Not a Jupyter notebook demo. A real application with API, frontend, and monitoring.

Skills You Will Demonstrate

  • Document processing and chunking strategies
  • Embedding models and vector databases
  • Retrieval strategies (hybrid search, reranking)
  • Prompt engineering for grounded answers
  • Citation extraction and source linking
  • Conversation memory management
  • API design and deployment

Note: This single project can replace 5 smaller projects on your resume. It covers document processing, vector search, LLM integration, API design, and evaluation - all the core AI engineering skills.

Architecture Overview

How the System Works - End to End

The RAG Pipeline

The system works in two phases:

Phase 1 - Ingestion (Offline):

  1. User uploads documents (PDF, DOCX, MD, web URLs)
  2. Documents are parsed and cleaned
  3. Text is split into chunks (500-1000 tokens each, with overlap)
  4. Each chunk is converted to an embedding vector
  5. Vectors are stored in a vector database with metadata (source, page, section)
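
Step 3 of the ingestion phase can be sketched as follows. This is a minimal illustration, not a definitive implementation: it treats whitespace-separated words as a stand-in for tokens (an assumption; a real pipeline would count tokens with the tokenizer of its embedding model), and the function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(words: list[str], size: int = 500, overlap: int = 100) -> list[list[str]]:
    """Split a word list into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the document
    return chunks


# Tiny example so the overlap is easy to see.
words = [f"w{i}" for i in range(12)]
windows = chunk_with_overlap(words, size=5, overlap=2)
```

Each window starts `size - overlap` words after the previous one, so the tail of one chunk is repeated at the head of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.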

Phase 2 - Query (Real-time):

  1. User asks a question
  2. Question is converted to an embedding vector
  3. Vector database finds the top-K most similar chunks
  4. Retrieved chunks + question are sent to the LLM
  5. LLM generates an answer grounded in the retrieved context
  6. System extracts citations linking answer parts to source chunks
  7. Answer with citations is returned to the user
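
Steps 2 and 3 of the query phase come down to "embed the question, then rank chunks by similarity". The sketch below illustrates that idea with a toy bag-of-words vector in place of a real embedding model and a linear scan in place of a vector database. Both are assumptions made to keep the example self-contained, and the names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the question and keep the k best."""
    q = embed(question)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "Returns are accepted within 30 days of delivery.",
    "International orders have a 15 day return window.",
    "Our office is closed on public holidays.",
]
results = top_k("What is the return window for international orders?", chunks, k=2)
```

In a real system the `embed` call would go to the embedding model and the ranking would be done by the vector database, but the contract stays the same: a question goes in, the K highest-scoring chunks come out.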

Technology Stack Recommendation

| Component | Recommended | Alternative |
| --- | --- | --- |
| Backend | Python FastAPI | Node.js Express |
| Vector DB | ChromaDB (simple) | Pinecone (managed) |
| LLM | OpenAI GPT-4o-mini | Anthropic Claude |
| Embeddings | OpenAI text-embedding-3-small | Sentence Transformers |
| Frontend | React + Tailwind | Next.js |
| Document Parser | LangChain loaders | Unstructured.io |

Note: Start with ChromaDB for development. It runs locally with minimal setup. Switch to Pinecone or Weaviate when deploying to production.

Document Processing & Chunking

The Foundation - Getting Your Documents Ready for AI

Chunking is the most underrated part of RAG. Bad chunking leads to bad retrieval, which leads to bad answers. No amount of prompt engineering can fix poorly chunked documents.

Chunking Strategies

  • Fixed Size Chunking: Split every N tokens with M token overlap. Simple but can break mid-sentence or mid-paragraph. Use 500-1000 tokens with 100-200 overlap.
  • Semantic Chunking: Split on natural boundaries - paragraphs, sections, headings. Preserves meaning but chunks vary in size.
  • Recursive Chunking: Try to split on double newlines first, then single newlines, then sentences, then tokens. Best general-purpose approach.
  • Parent-Child Chunking: Store small chunks for retrieval but return the parent chunk (larger context) to the LLM. Best of both worlds.
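
The recursive strategy above can be sketched in a few lines. This is a simplified illustration under stated assumptions: chunk size is measured in whitespace-separated words rather than model tokens, overlap is left out for brevity, and the function name `recursive_split` and its separator list are hypothetical rather than taken from any library.

```python
def recursive_split(text: str, max_words: int = 50,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ")) -> list[str]:
    """Split on the coarsest separator first; use finer ones only for oversized pieces."""
    if len(text.split()) <= max_words:
        return [text.strip()] if text.strip() else []
    if not separators:
        # Last resort: hard split on word count.
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    head, rest = separators[0], separators[1:]
    chunks: list[str] = []
    buffer = ""
    for piece in text.split(head):
        candidate = f"{buffer}{head}{piece}" if buffer else piece
        if len(candidate.split()) <= max_words:
            buffer = candidate  # keep merging small neighbouring pieces
        else:
            if buffer:
                chunks.extend(recursive_split(buffer, max_words, rest))
            buffer = piece  # may still be oversized; handled when it is flushed
    if buffer:
        chunks.extend(recursive_split(buffer, max_words, rest))
    return chunks


para_a = " ".join(["alpha"] * 30)
para_b = " ".join(["beta"] * 30)
chunks = recursive_split(para_a + "\n\n" + para_b, max_words=40)
```

Two 30-word paragraphs with a 40-word limit come back as two chunks split at the paragraph break, which is the behaviour that makes this approach a good default. One known simplification: when the sentence separator is used, the separator text at a chunk boundary is dropped.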

Metadata Is Critical for Citations

Each chunk must carry metadata that enables citation:

  • source_file: Original filename or URL
  • page_number: Page number in the original document
  • section_title: The heading or section this chunk belongs to
  • chunk_index: Position within the document
  • created_at: When the document was ingested

Without proper metadata, you cannot generate accurate citations. This is what separates a toy project from a production one.
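
One way to carry these fields alongside each chunk is a small record type. The sketch below mirrors the field names listed above. The class name `ChunkMetadata` and the helpers `make_metadata` and `format_citation` are hypothetical, and the citation string format is only an illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ChunkMetadata:
    source_file: str               # original filename or URL
    page_number: Optional[int]     # None for formats without pages (e.g. markdown)
    section_title: Optional[str]   # heading this chunk belongs to, if known
    chunk_index: int               # position within the document
    created_at: str                # ISO 8601 ingestion timestamp


def make_metadata(source_file: str, chunk_index: int,
                  page_number: Optional[int] = None,
                  section_title: Optional[str] = None) -> ChunkMetadata:
    """Stamp a chunk with its provenance at ingestion time."""
    return ChunkMetadata(source_file, page_number, section_title, chunk_index,
                         datetime.now(timezone.utc).isoformat())


def format_citation(meta: ChunkMetadata) -> str:
    """Render a human-readable citation from whatever fields are available."""
    parts = [meta.source_file]
    if meta.page_number is not None:
        parts.append(f"p. {meta.page_number}")
    if meta.section_title:
        parts.append(meta.section_title)
    return ", ".join(parts)


meta = make_metadata("policy.pdf", chunk_index=3, page_number=12, section_title="Returns")
label = format_citation(meta)
```

Making the record immutable (`frozen=True`) is a design choice rather than a requirement: provenance should not change after ingestion, so accidental mutation is better caught early.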

Common Chunking Mistakes

  • Chunks too small - lose context, retrieval is poor
  • Chunks too large - too much noise, LLM gets confused
  • No overlap - important information at chunk boundaries gets lost
  • Ignoring document structure - tables, lists, and code blocks get broken
  • Not cleaning data - headers, footers, page numbers pollute chunks
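
The last mistake in the list, polluted chunks, is often the cheapest to fix. A rough sketch of one possible heuristic is shown here: treat any line that repeats on most pages as a header or footer, and drop bare page numbers. The function name `clean_pages`, the 0.6 ratio, and the page-number pattern are all assumptions that would need tuning on real documents.

```python
import re
from collections import Counter

PAGE_NUMBER = re.compile(r"^(page\s+)?\d+(\s+of\s+\d+)?$", re.IGNORECASE)


def clean_pages(pages: list[str], min_repeat_ratio: float = 0.6) -> list[str]:
    """Drop lines repeated on most pages (likely headers/footers) and bare page numbers."""
    line_counts: Counter = Counter()
    for page in pages:
        # Count each distinct line once per page.
        line_counts.update({line.strip() for line in page.splitlines() if line.strip()})
    threshold = max(2, int(len(pages) * min_repeat_ratio))
    boilerplate = {line for line, n in line_counts.items() if n >= threshold}
    cleaned = []
    for page in pages:
        kept = [line.strip() for line in page.splitlines()
                if line.strip()
                and line.strip() not in boilerplate
                and not PAGE_NUMBER.match(line.strip())]
        cleaned.append("\n".join(kept))
    return cleaned


pages = [
    "ACME Handbook\nReturns are accepted within 30 days.\nPage 1 of 3",
    "ACME Handbook\nRefunds take 5 days.\nPage 2 of 3",
    "ACME Handbook\nContact support by email.\nPage 3 of 3",
]
cleaned = clean_pages(pages)
```

A heuristic like this can remove legitimate content (for example a body line that is just a number), so it is worth spot-checking the output before chunking.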

Note: Spend 40% of your project time on document processing and chunking. This is where most RAG projects fail. Perfect retrieval with a mediocre LLM beats bad retrieval with GPT-4.

Citation System Design

The Feature That Makes Your Project Interview-Ready

Citations transform your chatbot from a toy to a trustworthy tool. When the AI says something, users can verify it by clicking through to the original source. This is what enterprise customers demand.

How Citation Extraction Works

The citation system works by instructing the LLM to reference specific sources in its answer:

  • Step 1: When sending context to the LLM, number each source chunk: [Source 1], [Source 2], etc.
  • Step 2: Instruct the LLM: "When you use information from a source, cite it as [1], [2] etc."
  • Step 3: Parse the LLM response to extract citation markers
  • Step 4: Map citation numbers back to the original chunk metadata
  • Step 5: Return the answer with clickable source links
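
Steps 1, 3, and 4 can be sketched with plain string handling. The example assumes each retrieved chunk is a dict with `text` and `metadata` keys. That shape, and the function names `build_context` and `extract_citations`, are hypothetical choices for illustration. The LLM call itself (step 2) is left out, and the answer string is hand-written to stand in for a model response.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def build_context(chunks: list[dict]) -> str:
    """Step 1: number each chunk so the LLM can refer to it."""
    return "\n\n".join(f"[Source {i}] {c['text']}" for i, c in enumerate(chunks, start=1))


def extract_citations(answer: str, chunks: list[dict]) -> list[dict]:
    """Steps 3-4: find [n] markers and map each valid one back to chunk metadata."""
    seen: set[int] = set()
    cited: list[dict] = []
    for match in CITATION.finditer(answer):
        n = int(match.group(1))
        if 1 <= n <= len(chunks) and n not in seen:  # ignore out-of-range markers
            seen.add(n)
            cited.append({"marker": n, **chunks[n - 1]["metadata"]})
    return cited


chunks = [
    {"text": "Returns are accepted within 30 days.",
     "metadata": {"source_file": "returns.pdf", "page_number": 2}},
    {"text": "International orders have a 15-day window.",
     "metadata": {"source_file": "intl.pdf", "page_number": 5}},
]
context = build_context(chunks)
answer = ("The return policy allows 30-day returns [1]. "
          "International orders have a 15-day window [2]. See also [1] and [7].")
citations = extract_citations(answer, chunks)
```

The range check matters: a model can emit a marker such as `[7]` that matches no source, and silently dropping or flagging it is safer than raising an error in the response path.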

Citation Display Formats

  • Inline Citations: "The return policy allows 30-day returns [1]. International orders have a 15-day window [2]." Each number links to the source.
  • Source Cards: Below the answer, show cards with document name, page number, and a relevant snippet from the source.
  • Highlight Mode: Users can click a citation to see the exact text in the original document that was used.

Handling Citation Edge Cases

  • What if the LLM synthesizes from multiple sources? Show all relevant sources.
  • What if the LLM adds knowledge not in the sources? Flag it as "AI-generated, not from your documents."
  • What if no relevant source exists? Say "I could not find this information in your documents" instead of guessing.
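
These edge cases can be handled in a post-processing step. The following is a sketch under assumptions, not a complete solution: it uses the best retrieval score as a proxy for "no relevant source exists" (the 0.3 threshold is arbitrary), it splits sentences with a simple regex, and it assumes citation markers appear before the sentence-ending punctuation. The function name `label_answer` is hypothetical.

```python
import re

NOT_FOUND = "I could not find this information in your documents."
UNSUPPORTED_LABEL = "AI-generated, not from your documents"


def label_answer(answer: str, retrieval_scores: list[float],
                 num_sources: int, min_score: float = 0.3) -> dict:
    """Attach sources to each sentence and flag sentences with no valid citation."""
    if not retrieval_scores or max(retrieval_scores) < min_score:
        # Nothing relevant was retrieved: refuse instead of guessing.
        return {"answer": NOT_FOUND, "sentences": []}
    sentences = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        markers = {int(n) for n in re.findall(r"\[(\d+)\]", sentence)}
        sources = sorted(n for n in markers if 1 <= n <= num_sources)
        sentences.append({
            "text": sentence,
            "sources": sources,  # multiple sources are all kept
            "flag": None if sources else UNSUPPORTED_LABEL,
        })
    return {"answer": answer, "sentences": sentences}


result = label_answer(
    "Returns are allowed within 30 days [1][2]. Shipping is usually free.",
    retrieval_scores=[0.8, 0.5],
    num_sources=2,
)
```

A missing citation does not prove a sentence is hallucinated, only that it is unverified, so the flag is best presented to users as a caution rather than a verdict.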

Note: In interviews, the citation system is what interviewers will ask about most. Be prepared to explain how you ensure citation accuracy and handle cases where the LLM does not cite correctly.

Evaluation & Quality Metrics

Proving Your Bot Actually Works

Having an evaluation pipeline shows you think like a production engineer, not just a hobbyist. This is what separates senior candidates from junior ones.

Metrics to Track

  • Retrieval Precision: Of the chunks retrieved, how many were actually relevant? Target: above 80%.
  • Retrieval Recall: Of all relevant chunks in the database, how many were retrieved? Target: above 70%.
  • Answer Faithfulness: Does the answer use only information from the retrieved context, with no hallucinations? Target: above 95%.
  • Citation Accuracy: Do the citations actually point to the right sources? Target: above 90%.
  • Answer Relevance: Does the answer actually address the question asked? Target: above 85%.
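
The two retrieval metrics have simple set-based definitions once each test question is labelled with the chunk IDs that are truly relevant. A minimal sketch follows. The function name and the chunk IDs are made up for illustration.

```python
def retrieval_precision_recall(retrieved_ids: list[str],
                               relevant_ids: list[str]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant, for one question."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The retriever returned four chunks; three chunks in the database are relevant.
precision, recall = retrieval_precision_recall(
    retrieved_ids=["c1", "c2", "c3", "c4"],
    relevant_ids=["c1", "c3", "c9"],
)
```

Here two of the four retrieved chunks are relevant (precision 0.5) and two of the three relevant chunks were found (recall about 0.67). Averaging these per-question values over the whole test set gives the numbers to compare against the targets above.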

Building Your Test Suite

Create a golden dataset of 50+ question-answer pairs:

  • Simple factual questions with single-source answers
  • Multi-hop questions requiring information from multiple documents
  • Questions with no answer in the documents (should say "not found")
  • Ambiguous questions that need clarification
  • Questions that test citation accuracy specifically
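
A golden dataset can be as simple as a list of typed records plus a loop that calls the bot. The sketch below assumes the bot is a callable returning a dict with `answer` and `sources` keys. That interface, the `GoldenCase` fields, and the `fake_bot` used for the demonstration are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GoldenCase:
    question: str
    category: str                                   # e.g. "factual", "multi_hop", "no_answer"
    expected_sources: set[str] = field(default_factory=set)
    expect_not_found: bool = False


def run_suite(bot: Callable[[str], dict], cases: list[GoldenCase]) -> dict[str, float]:
    """Return the pass rate per category."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for case in cases:
        result = bot(case.question)
        if case.expect_not_found:
            ok = "could not find" in result["answer"].lower()
        else:
            ok = case.expected_sources <= set(result["sources"])
        total[case.category] += 1
        passed[case.category] += int(ok)
    return {category: passed[category] / total[category] for category in total}


def fake_bot(question: str) -> dict:
    """Stand-in for the real chatbot, used only to exercise the harness."""
    if "refund" in question.lower():
        return {"answer": "Refunds take 5 days [1].", "sources": ["policy.pdf"]}
    return {"answer": "I could not find this information in your documents.", "sources": []}


cases = [
    GoldenCase("How long do refunds take?", "factual", {"policy.pdf"}),
    GoldenCase("What is the refund fee?", "factual", {"fees.pdf"}),
    GoldenCase("Who is the CEO?", "no_answer", expect_not_found=True),
]
report = run_suite(fake_bot, cases)
```

Reporting per category rather than one overall number makes regressions easier to spot, for example a prompt change that improves factual answers but breaks the "not found" behaviour.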

Using RAGAS Framework

RAGAS is an open-source framework specifically designed for RAG evaluation. It provides automated metrics for faithfulness, answer relevance, context precision, and context recall. Include RAGAS scores in your README to impress reviewers.

Note: Include your evaluation results in the project README. Numbers like 95% faithfulness, 87% retrieval precision tell reviewers you take quality seriously.

Deployment & Showcasing

Making Your Project Demo-Ready

Deployment Architecture

  • Backend: Deploy FastAPI on Railway, Render, or Fly.io (free tiers available)
  • Vector DB: Use the Pinecone free tier (about 100K vectors at the time of writing; free-tier limits change, so check the current quota)
  • Frontend: Deploy React app on Vercel
  • Demo Data: Pre-load with interesting documents (Indian Constitution, company policies template, technical documentation)

GitHub README Must-Haves

  • Architecture diagram showing the full RAG pipeline
  • Demo GIF or video showing the chatbot in action with citations
  • Evaluation metrics and how you measured them
  • Clear setup instructions that actually work
  • Design decisions documented - why this chunking strategy, why this vector DB
  • Known limitations and future improvements

Interview Talking Points

  • Why you chose recursive chunking over fixed-size
  • How you handle multi-hop questions across documents
  • Your approach to citation accuracy validation
  • How you would scale this to millions of documents
  • Trade-offs between different embedding models
  • Cost optimization strategies for production deployment

Note: A live demo beats a GitHub repo every time. Even if the free tier is slow, having a working URL that interviewers can try is incredibly powerful.

Interview Questions - RAG Chatbot

Q1: How would you improve retrieval quality in your RAG system?

Answer: Multiple approaches: (1) Hybrid search combining vector similarity with BM25 keyword search for better coverage. (2) Query expansion - rewrite the user question into multiple search queries. (3) Reranking - use a cross-encoder model to rerank retrieved chunks by relevance. (4) Metadata filtering - use document metadata to narrow search scope. (5) Better chunking - experiment with semantic chunking that preserves document structure. The most impactful is usually hybrid search + reranking.
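
The answer does not say how the vector and BM25 result lists are merged. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the method this project prescribes. The chunk IDs are made up, and `k = 60` is the conventional default for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["c3", "c1", "c7"]   # ranked by embedding similarity
keyword_hits = ["c1", "c9", "c3"]  # ranked by BM25
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF uses only ranks, so it avoids having to normalise cosine scores against BM25 scores, which live on different scales. Chunks that both retrievers rank highly rise to the top of the fused list.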

Q2: How do you ensure the chatbot does not hallucinate?

Answer: Use a multi-layer approach: (1) A strong system prompt instructing the LLM to use only the provided context and to say "not found" if unsure. (2) A citation requirement that pushes the LLM to ground every claim in a source. (3) Post-processing validation that checks whether the answer text is actually supported by the cited sources. (4) Confidence scoring based on retrieval similarity scores. (5) An evaluation pipeline measuring faithfulness on a test dataset. (6) A temperature of 0, or very low, for factual responses.
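
The post-processing validation in point (3) can start as a crude lexical check: what fraction of a claim's words appear in the source it cites? The sketch below is only that first approximation. Word overlap misses paraphrases and can be fooled by shared common words, so a production system would more likely use an entailment model or an LLM judge. The function names are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim: str, source_text: str) -> float:
    """Fraction of the claim's words that also occur in the cited source."""
    claim_tokens = tokens(re.sub(r"\[\d+\]", "", claim))  # ignore citation markers
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(source_text)) / len(claim_tokens)


source = "Returns are accepted within 30 days of delivery."
supported = support_score("Returns are accepted within 30 days [1].", source)
unsupported = support_score("Shipping is free worldwide [1].", source)
```

Claims scoring below a chosen threshold can be flagged for review or dropped. The threshold itself is something to calibrate against the golden dataset rather than guess.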

Q3: How would you scale this system to handle millions of documents?

Answer: (1) Switch to a managed vector database like Pinecone or Weaviate that handles scaling automatically. (2) Implement async document processing pipeline with a message queue for ingestion. (3) Add semantic caching to avoid re-computing answers for similar queries. (4) Use metadata-based pre-filtering to reduce the search space. (5) Implement tiered storage - frequently accessed documents in fast storage, archived ones in cold storage. (6) Horizontal scaling of the API layer with load balancing.
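
Semantic caching, point (3), means reusing a stored answer when a new question's embedding is close enough to one seen before. The sketch below assumes query embeddings are already available as plain float lists and uses a linear scan. The class name `SemanticCache` and the 0.95 threshold are hypothetical, and at scale the lookup would itself go through a vector index.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def get(self, query_vec: list[float]) -> Optional[str]:
        """Return the cached answer of the most similar past query, if similar enough."""
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query_vec: list[float], answer: str) -> None:
        self.entries.append((query_vec, answer))


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "Returns are accepted within 30 days [1].")
hit = cache.get([0.99, 0.05])   # nearly the same direction as the stored query
miss = cache.get([0.0, 1.0])    # unrelated query
```

The threshold is the main trade-off: set it too low and users get answers to a different question, set it too high and the cache rarely hits. Cached entries also need invalidating when the underlying documents change.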
