Agent Memory
Giving AI Agents the Ability to Remember
Learn how AI agents maintain memory across conversations - from short-term chat history to long-term knowledge persistence. Build agents that remember user preferences, learn from past interactions, and improve over time.
Why Do Agents Need Memory?
The Goldfish Problem - Every Conversation Starts from Zero
The Problem:
By default, LLMs are stateless - they have no memory between API calls. Each conversation starts completely fresh, as if the AI has amnesia. This is like a customer support agent who forgets you the moment you hang up and call back.
Memory transforms a stateless AI into a personalized assistant that knows your preferences, remembers past conversations, and builds knowledge over time.
Real-World Example - Your Favorite Chai Wala:
Think about the chai wala outside your office. After a few visits:
- He remembers you like cutting chai with less sugar
- He knows you come at 10 AM and 4 PM
- He remembers you asked about green tea last week and tells you he got it now
- He knows your colleague prefers black coffee and asks if he is coming today
This is exactly what agent memory does - it turns a generic AI into one that knows you and adapts to you.
Types of Agent Memory:
| Memory Type | Duration | What It Stores | Human Analogy |
|---|---|---|---|
| Working Memory | Current conversation | Chat messages, tool results | What you are thinking about right now |
| Short-term Memory | Recent sessions | Recent interactions, summaries | What happened today/this week |
| Long-term Memory | Permanent | User preferences, facts, knowledge | Learned knowledge, people you know |
| Episodic Memory | Permanent | Past experiences, outcomes | Specific events you remember |
Note: Memory is what turns a generic AI chatbot into a personalized assistant. Without memory, every conversation is like meeting a stranger. With memory, it is like talking to a colleague who knows you.
Working Memory - The Conversation Buffer
What the Agent Remembers Within a Single Conversation
How It Works:
Working memory is simply the list of messages in the current conversation that gets sent to the LLM with each API call. It includes user messages, assistant responses, tool calls, and tool results.
The challenge: LLMs have a finite context window (4K to 200K tokens). Long conversations eventually exceed this limit. You need strategies to manage this.
Context Window Management Strategies:
- Sliding Window: Keep only the last N messages. Simple but loses early context. Like forgetting what was discussed at the start of a long meeting.
- Summarization: Periodically summarize older messages into a compact summary. Keep the summary + recent messages. Like meeting notes - you do not remember every word but you have the key points.
- Token Budget: Set a max token budget for history. Truncate oldest messages when approaching the limit.
- Smart Selection: Use an LLM to decide which past messages are most relevant to the current question and include only those.
Best Practice - Hybrid Approach:
Context sent to LLM:
1. System Prompt (fixed)
2. Long-term memory facts (from DB)
"User prefers vegetarian food, lives in Bangalore"
3. Summary of older conversation
"Earlier we discussed flight options to Goa for March..."
4. Last 10 messages (full detail)
5. Current user message
Total: Fits within context window while preserving key informationNote: The hybrid approach (summary + recent messages + relevant long-term facts) gives the best balance of memory and context window efficiency.
Long-term Memory - Remembering Across Conversations
Persistent Knowledge That Survives Session Boundaries
The Core Challenge:
When a conversation ends, all working memory is lost. Long-term memory solves this by persisting important information to a database so it can be retrieved in future conversations.
What to Store in Long-term Memory:
- User Preferences: "Prefers window seats, vegetarian, likes technical explanations"
- Learned Facts: "Works at TCS as a senior developer, team of 8 people"
- Past Decisions: "Chose React over Angular for the new project (Feb 2026)"
- Interaction Patterns: "Usually asks about code review on Mondays, deployment help on Fridays"
Storage Approaches:
| Approach | How | Best For |
|---|---|---|
| Key-Value Store | Redis/DynamoDB with user_id keys | Structured preferences, settings |
| Vector Database | Embed memories as vectors, semantic search | Unstructured knowledge, fuzzy recall |
| Knowledge Graph | Entities and relationships in Neo4j | Complex relationships between facts |
| Hybrid | KV for structured + Vectors for unstructured | Production systems (most common) |
Memory Extraction - How the Agent Decides What to Remember:
After each conversation turn, run a "memory extraction" prompt:
"Based on this conversation, extract any new facts about the user
that should be remembered for future conversations.
Conversation: [last few messages]
Extract in JSON format:
{
"preferences": [...],
"facts": [...],
"action_items": [...]
}"
Example output:
{
"preferences": ["Prefers dark mode", "Likes concise explanations"],
"facts": ["Working on e-commerce project with Next.js"],
"action_items": ["Follow up on deployment issue next week"]
}Note: Long-term memory is what makes AI assistants truly personal. The combination of vector search (for fuzzy recall) and key-value storage (for structured facts) works best in production.
Episodic Memory - Learning from Past Experiences
Remembering What Worked and What Did Not
What is Episodic Memory?
Episodic memory stores complete past experiences - not just facts, but the full context of what happened, what the agent did, and what the outcome was. This lets the agent learn from past successes and failures.
Think of it like a doctor who remembers: "Last time a patient had these symptoms, I prescribed X and it worked. But for patient Y with similar symptoms, X did not work because of their allergy, so I used Z instead."
Example - Customer Support Agent:
Episodic Memory Store:
Episode 1: (Success)
Context: User complaint about slow Swiggy delivery
Action: Checked order status, found restaurant delay
Resolution: Offered 20% discount coupon
Outcome: User satisfied (5-star rating)
Learning: Restaurant delays -> proactive discount works well
Episode 2: (Failure)
Context: User complaint about wrong item delivered
Action: Offered refund
Resolution: User wanted replacement, not refund
Outcome: User frustrated (1-star rating)
Learning: Wrong items -> always ask "refund or replacement?" first
Now when a new complaint comes in, the agent searches
episodic memory for similar situations and applies
the learned strategy.How to Implement Episodic Memory:
- Store episodes: After each completed task, save the context, actions taken, and outcome
- Embed episodes: Convert to vector embeddings for semantic search
- Retrieve similar episodes: When a new task comes in, search for similar past episodes
- Apply learnings: Include relevant past episodes in the prompt as examples
- Score outcomes: Track success/failure to weight episodes (prefer successful strategies)
Note: Episodic memory is what makes agents get better over time. By learning from past successes and failures, the agent builds practical wisdom that no amount of pre-training can provide.
Memory Architecture for Production
Building a Complete Memory System
Production Memory Architecture:
[Agent receives user message]
|
v
[Memory Retrieval Phase]
|-- Fetch user profile (Redis/Postgres)
|-- Search semantic memory (vector DB)
| Query: embed(user_message) -> top 5 relevant memories
|-- Search episodic memory (vector DB)
| Query: similar past situations -> top 3 episodes
|-- Get conversation summary (Redis)
|
v
[Compose Context]
System prompt + user profile + relevant memories
+ episodes + conversation summary + recent messages
|
v
[LLM generates response]
|
v
[Memory Storage Phase (async, after response)]
|-- Extract new facts -> store in long-term memory
|-- Update user profile if new preferences detected
|-- If task completed -> store as episode
|-- Update conversation summaryFrameworks with Built-in Memory:
- Mem0 (formerly MemGPT): Purpose-built memory layer for AI agents. Handles extraction, storage, and retrieval automatically. Very popular.
- LangGraph Persistence: Built-in checkpointing and state persistence for LangGraph agents.
- Zep: Long-term memory service for AI assistants. Provides automatic fact extraction and temporal awareness.
- Letta: Stateful AI agents with self-editing memory. The agent decides what to remember and forget.
Memory Gotchas:
- Stale Memories: User changed preferences but old memory persists. Solution: timestamp memories and prefer recent ones.
- Contradictions: New info contradicts stored memory. Solution: detect conflicts and update (or ask user).
- Privacy: Storing personal data has legal implications (GDPR, DPDPA). Solution: clear consent, data deletion capability, encryption.
- Memory Bloat: Storing too much irrelevant info degrades retrieval quality. Solution: periodic cleanup, relevance scoring.
Note: Memory systems store personal data. Always implement consent mechanisms, encryption, and the ability for users to view and delete their stored memories.
Interview Questions - Agent Memory
Q: What are the different types of memory in AI agents?
Four types: (1) Working memory - current conversation messages (limited by context window). (2) Short-term memory - recent session summaries. (3) Long-term memory - persistent facts and preferences stored in databases. (4) Episodic memory - past experiences with outcomes, enabling the agent to learn from successes and failures.
Q: How do you handle context window limits for long conversations?
Best approach is hybrid: (1) Keep a running summary of the conversation that gets updated periodically. (2) Include the last N messages in full detail. (3) Pull relevant long-term memories via semantic search. (4) Set a token budget and prioritize recent + relevant content. This gives the LLM the context it needs without exceeding the window.
Q: Vector DB vs Key-Value store for agent memory?
Vector DB (Pinecone, Qdrant) is best for unstructured memories that need semantic/fuzzy search - "find memories related to travel preferences." Key-Value stores (Redis) are best for structured, exact-match data - "get this user's dietary preference." Production systems typically use both: KV for structured profiles, vectors for unstructured knowledge.
Q: How does episodic memory help agents improve over time?
Episodic memory stores past task executions with their outcomes (success/failure). When a new similar task comes in, the agent retrieves relevant episodes and uses successful strategies while avoiding failed approaches. It is like an experienced employee who has "seen this before" and knows what works - practical wisdom from real experiences rather than just training data.
Frequently Asked Questions
What is Agent Memory?
Learn how AI agents maintain memory across conversations - from short-term chat history to long-term knowledge persistence. Build agents that remember user preferences, learn from past interactions, and improve over time.
How does Agent Memory work?
The Goldfish Problem - Every Conversation Starts from Zero The Problem: By default, LLMs are stateless - they have no memory between API calls. Each conversation starts completely fresh, as if the AI has amnesia.
Related topics
Practice this on DevInterviewMaster
Read the full Agent Memory breakdown with interactive demos, quizzes, and Hinglish notes.
800+ system-design, LLD, coding, and design-pattern topics. Unlock everything with Pro (₹499, one-time) or Ultimate (₹999, one-time) — lifetime access, no subscription.