Memory Management Pattern
Why your agent keeps 'forgetting'
Imagine talking to someone who can only remember the last 5 minutes of a conversation. Helpful for a quick chat, useless for a long project. LLMs work in a similar way: they only 'see' what fits in their context window (a limited amount of text). The Memory Management pattern is how we give an agent a notebook and a filing cabinet so it can remember more than the last 5 minutes.
Key points
- Short-term memory = the context window of the current task (limited size).
- Long-term memory = facts saved outside the model and recalled later.
- Techniques: summarise old turns, store key facts, retrieve relevant memories.
The one-line definition
Memory Management is the practice of deciding what an agent keeps in its limited context window right now (short-term memory) and what it saves outside the model to recall later (long-term memory), so the agent stays smart without running out of space.
Note: Context window = the agent's desk. It's small. Long-term memory = the filing cabinet.
First, what is a 'context window'?
An LLM reads everything as text, and there is a hard limit on how much text it can read at once. That limit is the context window, measured in tokens (small chunks of text, typically a word or part of a word). When a conversation gets long, the oldest messages must be dropped or summarised; otherwise the request exceeds the limit and is typically rejected or truncated. So memory is really about fitting the most useful information into a small space.
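To make "fitting into a small space" concrete, here is a minimal sketch of trimming a conversation to a token budget. The `estimate_tokens` heuristic (about 4 characters per token) and the names `trim_to_budget` and `budget` are assumptions for illustration only; a real system would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: assumes ~4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose combined estimated size fits in `budget`."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # each is ~10 estimated tokens
trimmed = trim_to_budget(history, budget=25)  # only the two newest messages fit
```

Dropping the oldest messages is the bluntest option; summarising them instead keeps some of their content.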
Two kinds of memory (the big picture)
SHORT-TERM MEMORY                 LONG-TERM MEMORY
(context window, this task)       (database, forever)
──────────────────────────        ────────────────────
┌──────────────────────┐          ┌──────────────────┐
│ system instructions  │          │ user prefers     │
│ last few messages    │          │ window seats     │
│ recent tool results  │          │ user's name      │
└──────────┬───────────┘          │ past decisions   │
           │ small + temporary    │ ...thousands...  │
           │                      └────────┬─────────┘
           ▼                               │ recall
     ┌───────────┐   needs old fact?       │ relevant
     │  🧠 LLM   │ ◄───────────────────────┘ bits
     └───────────┘
fits on the 'desk'                lives in the 'cabinet'
What happens when the desk gets full
Turn 1   Turn 2   Turn 3   ...   Turn 40  Turn 41
┌────┐   ┌────┐   ┌────┐         ┌────┐   ┌────┐
│ msg│   │ msg│   │ msg│   ...   │ msg│   │ msg│
└────┘   └────┘   └────┘         └────┘   └────┘
\________________/
        │  too big for the context window!
        ▼
┌───────────────┐   summarise old turns
│ 'Earlier the  │   into ONE short note
│  user asked   │ ───────────────┐
│  about X, we  │                │
│  decided Y.'  │                ▼
└───────────────┘   ┌────────────────────┐
                    │ summary + recent   │
                    │ turns now FIT ✅   │
                    └────────────────────┘
A tiny code example (read it like English)
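Here's the core idea: keep recent messages as-is, but once history gets too long, summarise the old part and recall only the relevant facts from long-term memory. (Saving facts to long-term memory happens separately, whenever the agent learns something worth keeping.) Notice we never throw information away blindly: we compress it or file it.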
MAX_RECENT = 6  # keep only the last 6 turns on the 'desk'

def build_context(history, long_term_store, user_query):
    # `llm` stands in for your model call; `long_term_store` is any object
    # with a search(query, top_k) method. Both are placeholders.
    # 1. SHORT-TERM: keep just the most recent turns
    recent = history[-MAX_RECENT:]
    # 2. Summarise the older turns so we don't lose them
    older = history[:-MAX_RECENT]
    summary = [llm("Summarise briefly:", older)] if older else []
    # 3. LONG-TERM: recall facts relevant to this query
    memories = long_term_store.search(user_query, top_k=3)
    # 4. Assemble the prompt: summary (if any) + memories + recent turns + query
    return summary + memories + recent + [user_query]
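The `long_term_store.search(...)` call above assumes some store object exists. As a rough sketch of what such an object might look like, the class below (a hypothetical `KeywordMemoryStore`, not a real library) ranks saved facts by how many words they share with the query. Production systems typically use embeddings and a vector store instead; word overlap is used here only to keep the example dependency-free, and the stored facts are made up for the demo.

```python
class KeywordMemoryStore:
    """Toy long-term memory: ranks saved facts by word overlap with the query."""

    def __init__(self) -> None:
        self._facts: list[str] = []

    def save(self, fact: str) -> None:
        if fact not in self._facts:  # avoid storing exact duplicates
            self._facts.append(fact)

    def search(self, query: str, top_k: int = 3) -> list[str]:
        query_words = set(query.lower().split())
        scored = []
        for fact in self._facts:
            overlap = len(query_words & set(fact.lower().split()))
            if overlap > 0:  # skip facts that share no words with the query
                scored.append((overlap, fact))
        # Highest overlap first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:top_k]]


store = KeywordMemoryStore()
store.save("user prefers window seats")
store.save("user's name is Asha")
store.save("user booked a flight to Pune last month")

# The flight fact shares two words with the query, the seat fact shares one,
# and the name fact shares none, so it is left out.
results = store.search("book a flight with good seats", top_k=2)
```

An instance of this class could be passed as `long_term_store`, since it exposes the same `search(query, top_k)` shape.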
When do you need real memory management?
| Scenario | Recommendation | Why |
|---|---|---|
| A short one-off question | ❌ Not needed | Everything already fits in the context window. |
| A long, multi-turn conversation | ✅ Summarise old turns | Old messages won't fit; compress them to stay under the limit. |
| Remembering a user across separate sessions | ✅ Long-term memory | The context window resets each session; a store persists facts. |
| A personal assistant that learns your preferences | ✅ Long-term memory | Saved facts let it act consistently every time. |
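To illustrate the "across separate sessions" rows, here is a minimal sketch that persists facts in a SQLite file using Python's standard `sqlite3` module. The table name `memories`, the `user_id` column, and the helper names `open_store`, `remember`, and `recall` are assumptions for illustration; any durable store would serve the same purpose.

```python
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "user_id TEXT NOT NULL, fact TEXT NOT NULL, "
        "UNIQUE (user_id, fact))"
    )
    return conn


def remember(conn: sqlite3.Connection, user_id: str, fact: str) -> None:
    # INSERT OR IGNORE skips facts that are already stored for this user.
    conn.execute(
        "INSERT OR IGNORE INTO memories (user_id, fact) VALUES (?, ?)",
        (user_id, fact),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, user_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT fact FROM memories WHERE user_id = ? ORDER BY rowid", (user_id,)
    )
    return [row[0] for row in rows]


db_path = os.path.join(tempfile.mkdtemp(), "agent_memory.db")

# Session 1: the agent learns a preference, then the session ends.
session_1 = open_store(db_path)
remember(session_1, "user-42", "prefers window seats")
session_1.close()

# Session 2: a fresh connection (with an empty context window) can still recall it.
session_2 = open_store(db_path)
facts = recall(session_2, "user-42")
session_2.close()
```

The recalled facts would then be added to the prompt at the start of the new session.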
Memory mistakes beginners make
| Mistake | Consequence | Fix |
|---|---|---|
| Pasting the entire conversation into every call. | You blow past the context window and the call fails or gets expensive. | Keep only recent turns; summarise the rest; store key facts long-term. |
| Treating the context window as 'permanent memory'. | The agent 'forgets' everything when the session ends. | Persist anything that must survive in an external store (DB / vector store). |
| Saving everything to long-term memory. | Recall returns noise; the agent gets confused and slow. | Save only durable, useful facts; let trivial chatter expire. |
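One way to "let trivial chatter expire" is to give each saved item an optional time-to-live. The sketch below is only an illustration: the `ExpiringMemory` class, its `ttl_seconds` parameter, and the explicit `now` argument (used so the demo does not depend on the real clock) are hypothetical names chosen for this example, not part of any library.

```python
import time
from typing import Optional


class ExpiringMemory:
    """Stores facts; each may carry an expiry time after which it is dropped."""

    def __init__(self) -> None:
        # Maps fact -> expiry timestamp, or None for facts that never expire.
        self._items: dict[str, Optional[float]] = {}

    def save(self, fact: str, ttl_seconds: Optional[float] = None,
             now: Optional[float] = None) -> None:
        current = time.time() if now is None else now
        # ttl_seconds=None marks a durable fact that never expires.
        self._items[fact] = None if ttl_seconds is None else current + ttl_seconds

    def active(self, now: Optional[float] = None) -> list[str]:
        current = time.time() if now is None else now
        # Forget anything whose expiry time has passed.
        self._items = {
            fact: expiry for fact, expiry in self._items.items()
            if expiry is None or expiry > current
        }
        return list(self._items)


memory = ExpiringMemory()
memory.save("user's name is Asha", now=0.0)                      # durable
memory.save("user said 'brb, coffee'", ttl_seconds=60, now=0.0)  # trivial chatter
fresh = memory.active(now=30.0)   # both facts are still present
later = memory.active(now=120.0)  # only the durable fact remains
```

Deciding which facts count as durable is the hard part in practice; the time-to-live only handles the forgetting.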
Remember these lines
- Short-term = context window (small, this task). Long-term = external store (durable).
- Don't dump everything in the prompt: summarise old turns, recall only relevant facts.
- If a fact must survive the session, it belongs in long-term memory, not the prompt.
Key takeaways
- Memory management splits into short-term (the limited context window) and long-term (external store).
- The context window is measured in tokens and has a hard limit you must respect.
- Core techniques: summarise old turns, store key facts, retrieve only relevant memories.
- Persist anything that must outlive the current session; never rely on the prompt for that.