Memory Management Pattern

Why your agent keeps 'forgetting'

Imagine talking to someone who can only remember the last 5 minutes of a conversation. Helpful for a quick chat, useless for a long project. LLMs behave much the same way: they only 'see' what fits in their context window (a limited amount of text). The Memory Management pattern gives an agent a notebook and a filing cabinet so it can remember more than the last 5 minutes.

Key points

Short-term memory = the context window of the current task (limited size).
Long-term memory = facts saved outside the model and recalled later.
Techniques: summarise old turns, store key facts, retrieve relevant memories.

The one-line definition

Memory Management is the practice of deciding what an agent keeps in its limited context window right now (short-term memory) and what it saves outside the model to recall later (long-term memory), so the agent stays smart without running out of space.

Note: Context window = the agent's desk. It's small. Long-term memory = the filing cabinet.

First, what is a 'context window'?

An LLM reads everything as text and there is a hard limit on how much text it can read at once — that limit is the context window, measured in tokens (roughly chunks of words). When a conversation gets long, the oldest messages must be dropped or summarised, or you hit the limit and the call fails. So memory is really about fitting the most useful information into a small space.
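To make the 'desk' concrete, here is a minimal Python sketch of the simplest strategy: drop the oldest messages until the history fits a token budget. It assumes a crude estimate of about 4 characters per token; real models use their own tokenizers, so treat this as an approximation. `estimate_tokens`, `trim_to_budget` and the budget value are hypothetical names chosen for illustration.

```python
def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token for English text.
    # Real models use their own tokenizer, so this is only an estimate.
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose combined estimated size fits in `budget` tokens."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]  # about 10 tokens each
print(trim_to_budget(history, budget=25))  # only the last two messages fit
```

Dropping is cheap but lossy; the summarising approach described next keeps a compressed trace of what was removed.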

Two kinds of memory (the big picture)

Short-term memory (the context window, this task) fits on the 'desk': system instructions, the last few messages, and recent tool results. It is small and temporary.
Long-term memory (an external database that persists) lives in the 'cabinet': facts such as "user prefers window seats", the user's name, and past decisions, potentially thousands of them.
When the LLM needs an old fact, only the relevant bits are recalled from the cabinet onto the desk.
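The 'filing cabinet' side can be sketched as a tiny in-memory store with `save` and `search` methods. This is a toy under stated assumptions: it ranks facts by keyword overlap with the query, standing in for the embedding similarity a real vector store would use, and `MemoryStore`, `_keywords` and the stopword list are hypothetical, not any specific library's API.

```python
import re

# Tiny stopword list so common words like "the" don't count as a match.
STOPWORDS = {"a", "an", "the", "for", "to", "of", "and", "is", "my", "we", "i"}


def _keywords(text: str) -> set[str]:
    # Lowercased words minus stopwords: a crude stand-in for an embedding.
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS


class MemoryStore:
    """Toy long-term memory: save facts, recall the most relevant ones."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def save(self, fact: str) -> None:
        self.facts.append(fact)

    def search(self, query: str, top_k: int = 3) -> list[str]:
        q = _keywords(query)
        # Score each fact by how many keywords it shares with the query.
        scored = [(len(q & _keywords(f)), f) for f in self.facts]
        # Best matches first; facts with no overlap are never recalled.
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [f for score, f in ranked if score > 0][:top_k]


store = MemoryStore()
store.save("User prefers window seats")
store.save("User's name is Priya")
store.save("We decided to use PostgreSQL for the project")
print(store.search("book a window seat for the flight", top_k=1))
# ['User prefers window seats']
```

Only the matching fact reaches the desk; the other entries stay in the cabinet until a query needs them.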

What happens when the desk gets full

Every turn (Turn 1, Turn 2, Turn 3 ... Turn 40, Turn 41) adds a message to the history, and eventually the full history is too big for the context window. The fix is to summarise the old turns into ONE short note, for example 'Earlier the user asked about X, we decided Y.' The summary plus the recent turns now FIT. ✅
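The compaction step can be sketched as follows: if the estimated size of the history exceeds a budget, fold the oldest turns into one summary note and keep the most recent turns verbatim. `summarise` is a hypothetical placeholder that just truncates and joins text (in a real agent it would be an LLM call), and the 4-characters-per-token estimate is a rough assumption.

```python
def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def summarise(messages: list[str]) -> str:
    # Hypothetical placeholder for an LLM summarisation call:
    # it just keeps the first 20 characters of each old message.
    return "Earlier: " + " | ".join(m[:20] for m in messages)


def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Return history as-is if it fits, else [summary of old turns] + recent turns."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # fits on the desk, or there is nothing old to fold
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    return [summarise(old)] + recent


turns = [f"Turn {i}: user asked about topic {i}" for i in range(1, 42)]
compacted = compact(turns, budget=100, keep_recent=2)
print(len(compacted))   # 3: one summary note + the last two turns
print(compacted[-1])    # Turn 41: user asked about topic 41
```

A fuller version would also check that the summary plus the recent turns really fits the budget, and re-summarise if it does not.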

A tiny code example (read it like English)

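Here's the core idea: keep recent messages as-is, but once history gets too long, summarise the old part and recall relevant facts from long-term memory (`llm` stands in for whatever model call you use). Notice we never throw information away blindly: older turns are compressed into a summary, and facts filed in long-term memory can still be recalled.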

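```python
MAX_RECENT = 6   # keep only the last 6 turns on the 'desk'


def llm(instruction, messages):
    # Hypothetical stand-in for a real model call; replace with your LLM client.
    return f"{instruction} [{len(messages)} older turns]"


def build_context(history, long_term_store, user_query):
    # 1. SHORT-TERM: keep just the most recent turns
    recent = history[-MAX_RECENT:]

    # 2. Summarise the older turns so we don't lose them
    #    (an empty list when there is nothing older, so no blank entry is added)
    older = history[:-MAX_RECENT]
    summary = [llm("Summarise briefly:", older)] if older else []

    # 3. LONG-TERM: recall facts relevant to this query
    #    (long_term_store: any object with a search(query, top_k) method returning a list)
    memories = long_term_store.search(user_query, top_k=3)

    # 4. Stuff it all into the prompt (fits in the window now)
    return summary + memories + recent + [user_query]
```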

When do you need real memory management?

Scenario	Recommendation	Why
A short one-off question	❌ Not needed	Everything already fits in the context window.
A long, multi-turn conversation	✅ Summarise old turns	Old messages won't fit; compress them to stay under the limit.
Remembering a user across separate sessions	✅ Long-term memory	The context window resets each session; a store persists facts.
A personal assistant that learns your preferences	✅ Long-term memory	Saved facts let it act consistently every time.
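Because the context window resets every session, anything that must survive has to be written somewhere durable. A minimal sketch, assuming a local JSON file is an acceptable store (a real system would more likely use a database or vector store): `load_memories` and `save_memory` are hypothetical helper names, and the temporary path exists only for the demo.

```python
import json
import os
import tempfile


def load_memories(path: str) -> list[str]:
    # A brand-new user has no file yet, so start with an empty memory.
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_memory(path: str, fact: str) -> None:
    facts = load_memories(path)
    if fact not in facts:  # avoid storing the same fact twice
        facts.append(fact)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(facts, f, indent=2)


# Session 1: the agent learns a preference and persists it.
path = os.path.join(tempfile.mkdtemp(), "user_42_memory.json")
save_memory(path, "User prefers window seats")

# Session 2: a fresh context window, but the fact is reloaded from disk.
print(load_memories(path))  # ['User prefers window seats']
```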

Memory mistakes beginners make

Mistake	Consequence	Fix
Pasting the entire conversation into every call.	You blow past the context window and the call fails or gets expensive.	Keep only recent turns; summarise the rest; store key facts long-term.
Treating the context window as 'permanent memory'.	The agent 'forgets' everything when the session ends.	Persist anything that must survive in an external store (DB / vector store).
Saving everything to long-term memory.	Recall returns noise; the agent gets confused and slow.	Save only durable, useful facts; let trivial chatter expire.
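One way to apply the last fix is to gate writes to long-term memory with a filter. The sketch below uses a deliberately simple, hypothetical heuristic: it keeps only messages that look like lasting preferences, identity facts, or decisions (matched by a few regular-expression patterns chosen purely for illustration) and lets small talk expire. A production agent might instead ask the LLM to judge what is worth remembering.

```python
import re

# Hypothetical patterns for facts worth keeping long-term.
DURABLE_PATTERNS = [
    r"\bi prefer\b",
    r"\bmy name is\b",
    r"\bwe decided\b",
    r"\balways\b|\bnever\b",
]


def is_durable(message: str) -> bool:
    text = message.lower()
    return any(re.search(p, text) for p in DURABLE_PATTERNS)


def facts_to_save(messages: list[str]) -> list[str]:
    # Keep durable facts; trivial chatter is never written to long-term memory.
    return [m for m in messages if is_durable(m)]


chat = [
    "Hi there!",
    "My name is Priya.",
    "lol ok",
    "I prefer window seats.",
    "We decided to ship on Friday.",
]
print(facts_to_save(chat))
# ['My name is Priya.', 'I prefer window seats.', 'We decided to ship on Friday.']
```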

Remember these lines

Short-term = context window (small, this task). Long-term = external store (durable).
Don't dump everything in the prompt: summarise old turns, recall only relevant facts.
If a fact must survive the session, it belongs in long-term memory, not the prompt.

Key takeaways

Memory management splits into short-term (the limited context window) and long-term (external store).
The context window is measured in tokens and has a hard limit you must respect.
Core techniques: summarise old turns, store key facts, retrieve only relevant memories.
Persist anything that must outlive the current session; never rely on the prompt for that.

Practice this on DevInterviewMaster

Read the full Memory Management Pattern breakdown with interactive demos, quizzes, and Hinglish notes.
