AI & AutomationFree to read

Agent Memory

Giving AI Agents the Ability to Remember

Learn how AI agents maintain memory across conversations - from short-term chat history to long-term knowledge persistence. Build agents that remember user preferences, learn from past interactions, and improve over time.

Why Do Agents Need Memory?

The Goldfish Problem - Every Conversation Starts from Zero

The Problem:

By default, LLMs are stateless - they have no memory between API calls. Each conversation starts completely fresh, as if the AI has amnesia. This is like a customer support agent who forgets you the moment you hang up and call back.

Memory transforms a stateless AI into a personalized assistant that knows your preferences, remembers past conversations, and builds knowledge over time.

Real-World Example - Your Favorite Chai Wala:

Think about the chai wala outside your office. After a few visits:

He remembers you like cutting chai with less sugar
He knows you come at 10 AM and 4 PM
He remembers you asked about green tea last week and tells you he got it now
He knows your colleague prefers black coffee and asks if he is coming today

This is exactly what agent memory does - it turns a generic AI into one that knows you and adapts to you.

Types of Agent Memory:

Memory Type	Duration	What It Stores	Human Analogy
Working Memory	Current conversation	Chat messages, tool results	What you are thinking about right now
Short-term Memory	Recent sessions	Recent interactions, summaries	What happened today/this week
Long-term Memory	Permanent	User preferences, facts, knowledge	Learned knowledge, people you know
Episodic Memory	Permanent	Past experiences, outcomes	Specific events you remember

Note: Memory is what turns a generic AI chatbot into a personalized assistant. Without memory, every conversation is like meeting a stranger. With memory, it is like talking to a colleague who knows you.

Working Memory - The Conversation Buffer

What the Agent Remembers Within a Single Conversation

How It Works:

Working memory is simply the list of messages in the current conversation that gets sent to the LLM with each API call. It includes user messages, assistant responses, tool calls, and tool results.

The challenge: LLMs have a finite context window (4K to 200K tokens). Long conversations eventually exceed this limit. You need strategies to manage this.

Context Window Management Strategies:

Sliding Window: Keep only the last N messages. Simple but loses early context. Like forgetting what was discussed at the start of a long meeting.
Summarization: Periodically summarize older messages into a compact summary. Keep the summary + recent messages. Like meeting notes - you do not remember every word but you have the key points.
Token Budget: Set a max token budget for history. Truncate oldest messages when approaching the limit.
Smart Selection: Use an LLM to decide which past messages are most relevant to the current question and include only those.

Best Practice - Hybrid Approach:

Context sent to LLM:

1. System Prompt (fixed)
2. Long-term memory facts (from DB)
   "User prefers vegetarian food, lives in Bangalore"
3. Summary of older conversation
   "Earlier we discussed flight options to Goa for March..."
4. Last 10 messages (full detail)
5. Current user message

Total: Fits within context window while preserving key information

Note: The hybrid approach (summary + recent messages + relevant long-term facts) gives the best balance of memory and context window efficiency.

Long-term Memory - Remembering Across Conversations

Persistent Knowledge That Survives Session Boundaries

The Core Challenge:

When a conversation ends, all working memory is lost. Long-term memory solves this by persisting important information to a database so it can be retrieved in future conversations.

What to Store in Long-term Memory:

User Preferences: "Prefers window seats, vegetarian, likes technical explanations"
Learned Facts: "Works at TCS as a senior developer, team of 8 people"
Past Decisions: "Chose React over Angular for the new project (Feb 2026)"
Interaction Patterns: "Usually asks about code review on Mondays, deployment help on Fridays"

Storage Approaches:

Approach	How	Best For
Key-Value Store	Redis/DynamoDB with user_id keys	Structured preferences, settings
Vector Database	Embed memories as vectors, semantic search	Unstructured knowledge, fuzzy recall
Knowledge Graph	Entities and relationships in Neo4j	Complex relationships between facts
Hybrid	KV for structured + Vectors for unstructured	Production systems (most common)

Memory Extraction - How the Agent Decides What to Remember:

After each conversation turn, run a "memory extraction" prompt:

"Based on this conversation, extract any new facts about the user
that should be remembered for future conversations.

Conversation: [last few messages]

Extract in JSON format:
{
  "preferences": [...],
  "facts": [...],
  "action_items": [...]
}"

Example output:
{
  "preferences": ["Prefers dark mode", "Likes concise explanations"],
  "facts": ["Working on e-commerce project with Next.js"],
  "action_items": ["Follow up on deployment issue next week"]
}

Note: Long-term memory is what makes AI assistants truly personal. The combination of vector search (for fuzzy recall) and key-value storage (for structured facts) works best in production.

Episodic Memory - Learning from Past Experiences

Remembering What Worked and What Did Not

What is Episodic Memory?

Episodic memory stores complete past experiences - not just facts, but the full context of what happened, what the agent did, and what the outcome was. This lets the agent learn from past successes and failures.

Think of it like a doctor who remembers: "Last time a patient had these symptoms, I prescribed X and it worked. But for patient Y with similar symptoms, X did not work because of their allergy, so I used Z instead."

Example - Customer Support Agent:

Episodic Memory Store:

Episode 1: (Success)
  Context: User complaint about slow Swiggy delivery
  Action: Checked order status, found restaurant delay
  Resolution: Offered 20% discount coupon
  Outcome: User satisfied (5-star rating)
  Learning: Restaurant delays -> proactive discount works well

Episode 2: (Failure)
  Context: User complaint about wrong item delivered
  Action: Offered refund
  Resolution: User wanted replacement, not refund
  Outcome: User frustrated (1-star rating)
  Learning: Wrong items -> always ask "refund or replacement?" first

Now when a new complaint comes in, the agent searches
episodic memory for similar situations and applies
the learned strategy.

How to Implement Episodic Memory:

Store episodes: After each completed task, save the context, actions taken, and outcome
Embed episodes: Convert to vector embeddings for semantic search
Retrieve similar episodes: When a new task comes in, search for similar past episodes
Apply learnings: Include relevant past episodes in the prompt as examples
Score outcomes: Track success/failure to weight episodes (prefer successful strategies)

Note: Episodic memory is what makes agents get better over time. By learning from past successes and failures, the agent builds practical wisdom that no amount of pre-training can provide.

Memory Architecture for Production

Building a Complete Memory System

Production Memory Architecture:

[Agent receives user message]
        |
        v
[Memory Retrieval Phase]
  |-- Fetch user profile (Redis/Postgres)
  |-- Search semantic memory (vector DB)
  |     Query: embed(user_message) -> top 5 relevant memories
  |-- Search episodic memory (vector DB)
  |     Query: similar past situations -> top 3 episodes
  |-- Get conversation summary (Redis)
        |
        v
[Compose Context]
  System prompt + user profile + relevant memories
  + episodes + conversation summary + recent messages
        |
        v
[LLM generates response]
        |
        v
[Memory Storage Phase (async, after response)]
  |-- Extract new facts -> store in long-term memory
  |-- Update user profile if new preferences detected
  |-- If task completed -> store as episode
  |-- Update conversation summary

Frameworks with Built-in Memory:

Mem0 (formerly MemGPT): Purpose-built memory layer for AI agents. Handles extraction, storage, and retrieval automatically. Very popular.
LangGraph Persistence: Built-in checkpointing and state persistence for LangGraph agents.
Zep: Long-term memory service for AI assistants. Provides automatic fact extraction and temporal awareness.
Letta: Stateful AI agents with self-editing memory. The agent decides what to remember and forget.

Memory Gotchas:

Stale Memories: User changed preferences but old memory persists. Solution: timestamp memories and prefer recent ones.
Contradictions: New info contradicts stored memory. Solution: detect conflicts and update (or ask user).
Privacy: Storing personal data has legal implications (GDPR, DPDPA). Solution: clear consent, data deletion capability, encryption.
Memory Bloat: Storing too much irrelevant info degrades retrieval quality. Solution: periodic cleanup, relevance scoring.

Note: Memory systems store personal data. Always implement consent mechanisms, encryption, and the ability for users to view and delete their stored memories.

Interview Questions - Agent Memory

Q: What are the different types of memory in AI agents?

Four types: (1) Working memory - current conversation messages (limited by context window). (2) Short-term memory - recent session summaries. (3) Long-term memory - persistent facts and preferences stored in databases. (4) Episodic memory - past experiences with outcomes, enabling the agent to learn from successes and failures.

Q: How do you handle context window limits for long conversations?

Best approach is hybrid: (1) Keep a running summary of the conversation that gets updated periodically. (2) Include the last N messages in full detail. (3) Pull relevant long-term memories via semantic search. (4) Set a token budget and prioritize recent + relevant content. This gives the LLM the context it needs without exceeding the window.

Q: Vector DB vs Key-Value store for agent memory?

Vector DB (Pinecone, Qdrant) is best for unstructured memories that need semantic/fuzzy search - "find memories related to travel preferences." Key-Value stores (Redis) are best for structured, exact-match data - "get this user's dietary preference." Production systems typically use both: KV for structured profiles, vectors for unstructured knowledge.

Q: How does episodic memory help agents improve over time?

Episodic memory stores past task executions with their outcomes (success/failure). When a new similar task comes in, the agent retrieves relevant episodes and uses successful strategies while avoiding failed approaches. It is like an experienced employee who has "seen this before" and knows what works - practical wisdom from real experiences rather than just training data.

Frequently Asked Questions

What is Agent Memory?

Learn how AI agents maintain memory across conversations - from short-term chat history to long-term knowledge persistence. Build agents that remember user preferences, learn from past interactions, and improve over time.

How does Agent Memory work?

The Goldfish Problem - Every Conversation Starts from Zero The Problem: By default, LLMs are stateless - they have no memory between API calls. Each conversation starts completely fresh, as if the AI has amnesia.

Browse all AI & Automation topics →

Practice this on DevInterviewMaster

Read the full Agent Memory breakdown with interactive demos, quizzes, and Hinglish notes.

Open the interactive topic →

800+ system-design, LLD, coding, and design-pattern topics. Unlock everything with Pro (₹499, one-time) or Ultimate (₹999, one-time) — lifetime access, no subscription.

Agent Memory

Why Do Agents Need Memory?

Working Memory - The Conversation Buffer

Long-term Memory - Remembering Across Conversations

Episodic Memory - Learning from Past Experiences

Memory Architecture for Production

Interview Questions - Agent Memory

Frequently Asked Questions

Related topics

Practice this on DevInterviewMaster