Smart Memory: Context Management
How to Spend a 200K Budget
Claude's context window is 200K tokens (roughly 150,000 English words). Sounds massive, but in an agent scenario it gets consumed faster than you'd expect:
A single Read call might return thousands of tokens of file content. A Bash call might dump massive log output. After a dozen turns, the history alone eats most of the budget.
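A back-of-envelope sketch makes the arithmetic concrete (the numbers are invented but plausible; ~4 characters per token is a common heuristic for English text):

```typescript
// How fast does 200K go? Rough estimates only.
const CONTEXT_WINDOW = 200_000;
const estimateTokens = (text: string) => Math.ceil(text.length / 4); // ~4 chars/token

// A single 60KB log dump from one Bash call is ~15K tokens by itself:
console.log(estimateTokens("x".repeat(60_000))); // 15000

// If each agent turn averages ~15K tokens across its tool round-trips,
// the whole window is gone in roughly a dozen turns:
console.log(Math.floor(CONTEXT_WINDOW / 15_000)); // 13
```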
What happens when you run out? That's where Claude Code's compression strategy kicks in.
Auto-Compaction: Summarize, Don't Discard
When context usage reaches a threshold in the 75-92% range, Claude Code automatically triggers compaction:
Compaction isn't deletion. It summarizes earlier turns into concise recaps, keeping essential information (e.g., "user asked me to fix a bug in X, I edited and tests pass") while dropping verbose details (full file contents, raw command output).
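A minimal sketch of threshold-triggered compaction, assuming a hypothetical `summarize()` LLM call and invented constants (the real trigger logic is internal to Claude Code):

```typescript
interface Message { role: "user" | "assistant" | "tool"; content: string }

const COMPACT_THRESHOLD = 0.92; // invented; the real trigger point varies
const KEEP_RECENT = 10;         // keep the latest turns verbatim

async function maybeCompact(
  history: Message[],
  usedTokens: number,
  windowTokens: number,
  summarize: (msgs: Message[]) => Promise<string>, // an LLM call in practice
): Promise<Message[]> {
  if (usedTokens / windowTokens < COMPACT_THRESHOLD) return history;
  if (history.length <= KEEP_RECENT) return history;

  const old = history.slice(0, -KEEP_RECENT);
  const recent = history.slice(-KEEP_RECENT);

  // Older turns collapse into one concise recap; recent turns stay verbatim.
  const recap = await summarize(old);
  return [{ role: "assistant", content: `[Summary of earlier work] ${recap}` }, ...recent];
}
```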
Customizable compaction
Claude Code has pre/post hooks for the compaction step. You can mark certain messages as "never compress" or inject additional context after compression.
This is useful for critical project constraints — like "never modify production.config.js" — that must survive every compaction cycle.
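As an illustration, a pre-compaction hook could be a small script that re-prints those constraints. The hook mechanism exists, but the payload field names used here (e.g. `trigger`) and the idea that stdout feeds the compaction step are assumptions; check the Claude Code hooks documentation for the exact schema:

```typescript
// Sketch of a pre-compaction hook script; field names are assumptions.
import { readFileSync } from "node:fs";

const input = JSON.parse(readFileSync(0, "utf8")); // hook payload arrives on stdin

// Re-state constraints that must survive every compaction cycle.
console.log(`[pre-compact] trigger=${input.trigger ?? "auto"}`);
console.log("CRITICAL: never modify production.config.js");
```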
"Search, Don't Index"
This is Claude Code's most controversial — and most educational — design decision.
What you'd expect
Many AI coding tools use RAG (Retrieval-Augmented Generation). The standard pipeline (sketched in code after this list):
- On startup, embed the entire codebase into vectors
- When the user asks a question, find the most relevant code snippets
- Inject those snippets into the model's context
- Model answers based on the retrieved context
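In code, the pipeline looks roughly like this. `embed`, `vectorStore`, and `llm` are hypothetical stand-ins for an embedding API, a vector database, and an LLM client, not any particular library:

```typescript
// Hypothetical interfaces standing in for the RAG machinery.
declare function embed(text: string): Promise<number[]>;
declare const vectorStore: {
  search(vec: number[], k: number): Promise<{ text: string }[]>;
};
declare const llm: { complete(prompt: string): Promise<string> };

async function ragAnswer(question: string): Promise<string> {
  const queryVec = await embed(question);             // remote embedding call
  const hits = await vectorStore.search(queryVec, 5); // top-5 nearest chunks
  const context = hits.map(h => h.text).join("\n---\n");
  return llm.complete(`Context:\n${context}\n\nQuestion: ${question}`);
}
```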
What Claude Code actually does
No RAG. It gives the model Grep (backed by ripgrep) and Glob tools and lets it search the codebase directly.
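For contrast, here's a minimal sketch of what such a grep tool can look like under the hood, shelling out to `rg`. This is illustrative, not Claude Code's actual implementation:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const exec = promisify(execFile);

async function grepTool(pattern: string, path = "."): Promise<string> {
  // rg exits with code 1 when there are no matches; treat that as empty.
  try {
    const { stdout } = await exec("rg", ["--line-number", pattern, path]);
    return stdout;
  } catch (err: any) {
    if (err.code === 1) return "(no matches)";
    throw err;
  }
}

// The model might chain searches: find the definition, then its call sites.
// await grepTool("function parseConfig");   // where is it defined?
// await grepTool("parseConfig\\(");         // who calls it?
```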
Why grep beats RAG here
| Dimension | RAG | grep search |
|---|---|---|
| Needs indexing | Yes. Slow startup, must stay in sync | No. Search on demand |
| External dependencies | Needs embedding service | Just ripgrep (local) |
| Security | Code may be sent to external embedding API | Fully local, no code leakage |
| Accuracy | Semantic approximation, may miss relevant code | Exact match; finds every literal occurrence |
| Complexity | Must maintain index, handle incremental updates | Zero maintenance |
| Token cost | Lower (only inject relevant snippets) | Higher (model may search multiple times) |
The core trade-off
RAG saves tokens but adds complexity. Grep costs more tokens but is dead simple.
When the model is smart enough to know what to search for, and the context window is large enough (200K), the "brute force" approach of grep + model judgment outperforms a carefully designed retrieval pipeline.
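Some rough, invented numbers make the trade-off concrete:

```typescript
// Made-up but plausible numbers: even a "wasteful" multi-search grep detour
// is a small fraction of a 200K window.
const windowTokens = 200_000;
const searches = 4;            // the model iterates a few times
const tokensPerResult = 2_000; // a generous chunk of matches per search

const grepCost = searches * tokensPerResult;                            // 8,000 tokens
console.log(`${((grepCost / windowTokens) * 100).toFixed(0)}% of the window`); // 4%
```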
Anthropic's engineers have said they tried RAG, and in their internal benchmarks the grep approach performed better.
The Memory Hierarchy
Claude Code's "memory" isn't just conversation history. It has a layered system:
| Layer | Lifetime | Contents |
|---|---|---|
| CLAUDE.md | Permanent (you maintain it) | Project rules, preferences, architecture info |
| Auto-memory | Across sessions | Patterns Claude learns about your project |
| Message history | Current session | Your conversation with Claude |
| Tool cache | Current session | File contents already read (avoids duplicate reads) |
| System prompt | Each API call | Tool list, project info, rules |
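To see how the layers could fit together at prompt-build time, here's an illustrative sketch. The structure mirrors the table above, but the names are invented, not Claude Code's actual internals:

```typescript
type Msg = { role: "user" | "assistant" | "tool"; content: string };

interface MemoryLayers {
  claudeMd: string;               // permanent: loaded from CLAUDE.md
  autoMemory: string[];           // cross-session: learned project patterns
  history: Msg[];                 // current session: (possibly compacted) transcript
  toolCache: Map<string, string>; // current session: file contents already read
}

// The system prompt is rebuilt from the durable layers on every API call;
// the message history rides along as the conversation body.
function buildRequest(m: MemoryLayers, toolList: string) {
  const system = [
    toolList,
    "## Project rules (CLAUDE.md)",
    m.claudeMd,
    "## Learned notes",
    ...m.autoMemory,
  ].join("\n");
  return { system, messages: m.history };
}
```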
What This Looks Like in Practice
- Short conversations (a few turns): No compaction needed, 200K is plenty
- Medium conversations (a dozen turns): One compaction, barely noticeable
- Long conversations (dozens of turns): Multiple compactions, key info preserved but early details may fade
- Across sessions: CLAUDE.md and auto-memory ensure project knowledge is never lost
Key Takeaways
- Summarize, don't discard — old messages become concise recaps, not empty holes
- Simple beats clever — grep search is more reliable than RAG in this context
- Layered memory — different lifetimes need different storage strategies
- Security first — never send code to external services for indexing
You now understand all of Claude Code's core systems. Final chapter — Build Your Own: From Reader to Builder.