
Key AI Research Papers (Must-Read)

The Papers That Built Modern AI - From Attention to Agents

A curated guide to the most important AI papers every AI engineer must read - Transformers, Scaling Laws, ReAct, Toolformer, RAG, Self-RAG, RETRO, and more.

Why Read Papers?

Papers Are the Source Code of AI:

Every framework (LangChain, CrewAI, AutoGen) is built on ideas from research papers. If you only use frameworks without understanding the papers, you're cargo-culting - you won't know why things work or how to debug when they break.

Foundation Papers

1. Attention Is All You Need (Vaswani et al., 2017)

  • Impact: Introduced the Transformer architecture - the foundation of nearly all modern LLMs
  • Key ideas: Self-attention, multi-head attention, positional encoding, encoder-decoder architecture
  • Why read: GPT, BERT, Claude, Gemini - all are Transformers. This is paper zero.
  • Read time: ~2 hours (dense but essential)
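The core operation of the paper, scaled dot-product attention, computes softmax(QKᵀ / √d_k)·V. The sketch below is a minimal pure-Python version for intuition only. It assumes plain lists of floats and leaves out the learned query/key/value projections, multiple heads, masking, and positional encoding; the function names are illustrative, not from any library.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

# Toy self-attention: queries, keys, and values all come from the same 3 tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each output row is a blend of all value vectors, weighted by how strongly that token's query matches each key. In a real Transformer the three inputs are separate learned linear projections of the token embeddings.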

2. Scaling Laws for Neural Language Models (Kaplan et al., 2020)

  • Impact: Showed that test loss falls predictably, as a power law, with compute, dataset size, and parameter count
  • Key ideas: Power-law relationships, compute-optimal training, diminishing returns
  • Why read: Explains why labs keep making bigger models and how to predict performance
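To make "power-law relationship" concrete, here is a small sketch of a loss curve of the form L(N) = (N_c / N)^α. The default constants are close to the fit the paper reports for parameter count, but treat them as illustrative assumptions and check the paper before relying on them; the function names are hypothetical.

```python
def power_law_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Loss predicted by L(N) = (N_c / N) ** alpha (constants are assumptions)."""
    return (n_c / n_params) ** alpha

def loss_ratio_when_scaling(factor, alpha=0.076):
    """Multiplicative change in loss when the parameter count grows by `factor`."""
    return factor ** (-alpha)

# Under a power law, every doubling of parameters shrinks the loss by the
# same fixed ratio, regardless of the starting size.
small_to_medium = power_law_loss(2e9) / power_law_loss(1e9)
medium_to_large = power_law_loss(2e11) / power_law_loss(1e11)
```

Both ratios come out equal to `loss_ratio_when_scaling(2)`, which is what "predictable" and "diminishing returns" mean here: each doubling buys the same relative improvement, so each additional absolute gain costs more.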

3. Training Compute-Optimal Large Language Models - Chinchilla (Hoffmann et al., 2022)

  • Impact: Showed that, for a fixed compute budget, a smaller model trained on more data beats a larger model trained on less
  • Key ideas: ~20 tokens/parameter optimal ratio, compute-efficient frontier
  • Why read: Shifted post-2022 training toward far more data per parameter (LLaMA, Mistral, Gemma)
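The ~20 tokens/parameter rule can be turned into a back-of-the-envelope sizing calculation. The sketch below assumes the common approximation that training cost is about 6 × parameters × tokens FLOPs and treats 20 tokens per parameter as a fixed ratio; both are simplifications, and the function name is illustrative.

```python
import math

def compute_optimal_split(flops, tokens_per_param=20.0):
    """Split a FLOP budget into (parameters, tokens).

    Assumes C ~= 6 * N * D and D = tokens_per_param * N,
    so N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Sanity check against Chinchilla itself: 70B parameters, 1.4T tokens.
chinchilla_budget = 6 * 70e9 * 1.4e12
n_params, n_tokens = compute_optimal_split(chinchilla_budget)
```

Feeding in Chinchilla's own budget recovers roughly 70B parameters and 1.4T tokens, since that model sits at the 20:1 ratio by construction.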

Agent & Reasoning Papers

4. ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)

  • Impact: Established the reason-then-act loop that most modern LLM agents build on
  • Key ideas: Interleave thinking (Thought) with doing (Action/Observation) in a loop
  • Why read: LangChain agents, AutoGen, and most other agent frameworks implement a variant of this loop
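The Thought → Action → Observation loop is short enough to sketch directly. In the example below, `scripted_model` is a hard-coded stand-in for an LLM call, `TOOLS` is a hypothetical tool registry, and the `name[argument]` action syntax is an assumption modeled on the paper's examples. It shows the control flow only, not any framework's API.

```python
import re

# Hypothetical tool registry: each tool maps a string argument to an observation.
TOOLS = {
    "lookup": lambda key: {"capital of France": "Paris"}.get(key, "no result"),
}

def scripted_model(transcript):
    """Stand-in for an LLM: returns the next Thought/Action for the transcript."""
    if "Observation:" not in transcript:
        return "Thought: I need to look this up.\nAction: lookup[capital of France]"
    return "Thought: The observation answers the question.\nAction: finish[Paris]"

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")

def react_loop(question, model, tools, max_steps=5):
    """Alternate model steps and tool calls until the model emits finish[...]."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += "\n" + step
        match = ACTION_RE.search(step)
        if match is None:
            break  # the model produced no parseable action
        name, arg = match.groups()
        if name == "finish":
            return arg, transcript
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool: {name}"
        # Feed the tool result back so the next Thought can use it.
        transcript += f"\nObservation: {observation}"
    return None, transcript

answer, trace = react_loop("What is the capital of France?", scripted_model, TOOLS)
```

The `max_steps` cap matters in practice: a real model can loop indefinitely, so agent implementations typically bound the number of iterations.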

5. Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023)

  • Impact: Showed LLMs can learn WHEN and HOW to call external tools (calculators, search, APIs)
  • Key ideas: Self-supervised tool learning, API call insertion, tool selection
  • Why read: An early, influential demonstration of the idea behind function calling and tool use in GPT, Claude, and agent frameworks
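In the paper, API calls appear inline in the text, for example `[Calculator(400 / 1400) → 0.29]`. The sketch below covers only the execution step: it finds unfilled calculator calls and splices in the result. The regex, the two-decimal rounding, and the helper names are assumptions for illustration; the self-supervised step that decides which calls are worth keeping is not shown.

```python
import ast
import operator
import re

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_arithmetic(expression):
    """Evaluate +, -, *, / over numeric literals without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression.strip(), mode="eval").body)

# Matches an unfilled call such as [Calculator(400 / 1400)] (no nested parentheses).
CALL_RE = re.compile(r"\[Calculator\(([^)]*)\)\]")

def fill_calculator_calls(text):
    """Replace each unfilled calculator call with the call plus its result."""
    def replace(match):
        result = safe_arithmetic(match.group(1))
        return f"[Calculator({match.group(1)}) → {result:.2f}]"
    return CALL_RE.sub(replace, text)

annotated = fill_calculator_calls(
    "Out of 1400 participants, 400 [Calculator(400 / 1400)] passed the test."
)
```

Modern function-calling APIs use structured JSON rather than inline markers, but the shape is the same: the model emits a call, the runtime executes it, and the result is fed back into the context.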

6. CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society (Li et al., 2023)

  • Impact: Pioneered role-playing multi-agent collaboration
  • Key ideas: Inception prompting, AI-AI conversations, role assignment, task decomposition
  • Why read: Foundation for multi-agent systems like CrewAI and AutoGen

7. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (Wu et al., 2023)

  • Impact: Microsoft's framework paper for multi-agent conversations
  • Key ideas: Conversable agents, human-in-the-loop, flexible agent topologies
  • Why read: Defines the multi-agent conversation paradigm used by the AutoGen framework

RAG & Retrieval Papers

8. RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)

  • Impact: Introduced the RAG paradigm - retrieve context, then generate
  • Key ideas: Parametric + non-parametric memory, DPR retriever, sequence-to-sequence generator
  • Why read: Most RAG pipelines today follow this paper's retrieve-then-generate pattern
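The retrieve-then-generate pattern can be sketched in a few lines. This version swaps the paper's dense retriever (DPR) for a simple word-overlap score and stops at building the prompt instead of calling a generator, so it shows the data flow only. All names and the sample documents are illustrative.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (stand-in for a dense retriever)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, passages):
    """Place retrieved passages ahead of the question for the generator."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

DOCS = [
    "The Transformer architecture was introduced in 2017.",
    "Chinchilla was trained on about 1.4 trillion tokens.",
    "ReAct interleaves reasoning and acting.",
]
query = "How many tokens was Chinchilla trained on?"
top_passages = retrieve(query, DOCS, k=1)
prompt = build_prompt(query, top_passages)
```

The retrieved passages are the non-parametric memory; the generator's weights are the parametric memory. A production pipeline would replace `retrieve` with an embedding-based vector search and pass `prompt` to an LLM.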

9. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Asai et al., 2023)

  • Impact: The model decides WHEN to retrieve and critiques its own outputs
  • Key ideas: Reflection tokens, adaptive retrieval, self-assessment of output quality
  • Why read: Points toward RAG systems that know when they need more context

10. RETRO: Improving Language Models by Retrieving from Trillions of Tokens (Borgeaud et al., 2022)

  • Impact: Built retrieval into the model architecture itself rather than adding it only at inference
  • Key ideas: Chunked cross-attention, retrieval database, architecture-level RAG
  • Why read: Shows that a model can be trained to use retrieved text, not just handed it at inference time

Reading Strategy for Engineers

How to Read ML Papers Efficiently:

  1. Abstract + Intro: Understand the problem and claimed contribution (5 min)
  2. Figures + Tables: Most important results are in the figures (10 min)
  3. Method section: How they did it - the technical meat (30 min)
  4. Skip the proofs: Unless you're doing research, skip mathematical proofs
  5. YouTube explainers: Watch Yannic Kilcher or AI Coffee Break for paper walkthroughs

Priority Order (Read These First):

  1. Attention Is All You Need
  2. ReAct paper
  3. RAG paper
  4. Chinchilla / Scaling Laws
  5. Toolformer
  6. CAMEL / AutoGen
  7. Self-RAG / RETRO

Interview Questions

  1. Q: What is the most important paper for understanding modern LLMs?
    A: "Attention Is All You Need" (2017) - introduced the Transformer architecture that powers GPT, BERT, Claude, Gemini, and nearly all other modern language models.
  2. Q: How does the ReAct paper relate to AI agents?
    A: ReAct showed that interleaving reasoning (Thought) with acting (Action → Observation) in a loop enables LLMs to use tools and solve complex tasks. Most major agent frameworks implement a variant of this pattern.
  3. Q: What's the difference between RAG and RETRO?
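    A: RAG pairs a retriever with a generator: retrieved passages are combined with the input before generation, and in most pipelines today this means adding them to the prompt at inference time. RETRO integrates retrieval into the model architecture itself via chunked cross-attention, and the model is trained with retrieval in place, making retrieval a core part of the model.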
