Context Compact
Core Loop | Keep Active Context Small and Stable | 493 LOC | 4 tools
Compaction isn't deleting history -- it's relocating detail so the agent can keep working.
What You'll Learn
- Why long sessions inevitably run out of context space, and what happens when they do
- A four-lever compression strategy: persisted output, micro-compact, auto-compact, and manual compact
- How to move detail out of active memory without losing it
- How to keep a session alive indefinitely by summarizing and continuing
Your agent from s05 is capable. It reads files, runs commands, edits code, and delegates subtasks. But try something ambitious -- ask it to refactor a module that touches 30 files. After reading all of them and running 20 shell commands, you will notice the responses get worse. The model starts forgetting what it already read. It repeats work. Eventually the API rejects your request entirely. You have hit the context window limit, and without a plan for that, your agent is stuck.
The Problem
Every API call to the model includes the entire conversation so far: every user message, every assistant response, every tool call and its result. The model's context window (the total amount of text it can hold in working memory at once) is finite. Tokens are roughly word-sized pieces of text; a single read_file on a 1,000-line source file costs roughly 4,000 tokens. Read 30 files and run 20 bash commands, and you have burned through 100,000+ tokens. The context is full, but the work is only half done.
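The arithmetic above can be sketched with a rough heuristic. This is an assumption (about 4 characters per token), not a real tokenizer; the function name `estimateTokens` is hypothetical.

```typescript
// Rough token estimate: ~4 characters per token. This is a heuristic for
// budgeting, not a tokenizer -- real counts vary by model and content.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 1,000-line file at ~16 chars per line is about 16,000 chars, i.e. ~4,000
// tokens. Thirty such files plus twenty command outputs quickly pass 100,000.
const oneFile = "x".repeat(1_000 * 16);
console.log(estimateTokens(oneFile)); // ~4,000
```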
The naive fix -- just truncating old messages -- throws away information the agent might need later. A smarter approach compresses strategically: keep the important bits, move the bulky details to disk, and summarize when the conversation gets too long. That is what this chapter builds.
The Solution
We use four levers, each working at a different stage of the pipeline, from output-time filtering to full conversation summarization.
Every tool call:
+------------------+
| Tool call result |
+------------------+
|
v
[Lever 0: persisted-output] (at tool execution time)
Large outputs (>50KB, bash >30KB) are written to disk
and replaced with a <persisted-output> preview marker.
|
v
[Lever 1: micro_compact] (silent, every turn)
Replace tool_result > 3 turns old
with "[Previous: used {tool_name}]"
(preserves read_file results as reference material)
|
v
[Check: tokens > 50000?]
| |
no yes
| |
v v
continue [Lever 2: auto_compact]
Save transcript to .transcripts/
LLM summarizes conversation.
Replace all messages with [summary].
|
v
[Lever 3: compact tool]
Model calls compact explicitly.
Same summarization as auto_compact.
How It Works
Step 1: Lever 0 -- Persisted Output
The first line of defense runs at tool execution time, before a result even enters the conversation. When a tool result exceeds a size threshold, we write the full output to disk and replace it with a short preview. This prevents a single giant command output from consuming half the context window.
// Tree-shaken bundle: chapter wiring + only used runtime code.
// agents_self_contained/_runtime.ts
import Anthropic from "@anthropic-ai/sdk";
import dotenv from "dotenv";
import { execSync, spawn, spawnSync } from "node:child_process";
import fs from "node:fs";
import fsp from "node:fs/promises";
import path from "node:path";
import process from "node:process";
import readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";
dotenv.config({ override: true });
var WORKDIR = process.cwd();
var DEFAULT_MODEL = "claude-3-5-sonnet-latest";
var anthropicClient = null;
function getModelId() {
return process.env.MODEL_ID || DEFAULT_MODEL;
}
function getAnthropicClient() {
if (anthropicClient) {
return anthropicClient;
}
anthropicClient = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY || process.env.ANTHROPIC_AUTH_TOKEN || "missing-api-key",
baseURL: process.env.ANTHROPIC_BASE_URL || void 0
});
return anthropicClient;
}
function createLoopContext() {
return { workdir: WORKDIR, messages: [], meta: {} };
}
function safePath(relativePath) {
  // Reconstructed body (the source is truncated here): resolve against the
  // workdir and refuse paths that would escape it.
  const resolved = path.resolve(WORKDIR, relativePath);
  if (resolved !== WORKDIR && !resolved.startsWith(WORKDIR + path.sep)) {
    throw new Error(`Path escapes workdir: ${relativePath}`);
  }
  return resolved;
}
The model can later read_file the stored path to access the full content if needed. Nothing is lost -- the detail just lives on disk instead of in the conversation.
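A minimal sketch of Lever 0 might look like the following. The thresholds (50KB general, 30KB for bash) come from the chapter's diagram; the function name `persistIfLarge`, the file-naming scheme, and the exact marker format are assumptions for illustration.

```typescript
import fs from "node:fs";
import path from "node:path";

// Assumed size limits, per the chapter's diagram: bash output over 30KB,
// anything else over 50KB, gets persisted to disk.
const LIMITS: Record<string, number> = { bash: 30_000, default: 50_000 };

// Sketch of Lever 0: if a tool result is too large, write the full output to
// disk and return a short preview marker instead. Nothing enters the context
// that the model can't re-read later via read_file.
function persistIfLarge(toolName: string, result: string, dir: string): string {
  const limit = LIMITS[toolName] ?? LIMITS.default;
  if (result.length <= limit) return result; // small results pass through untouched
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${toolName}-${Date.now()}.txt`);
  fs.writeFileSync(file, result); // full output lives on disk, not in context
  const preview = result.slice(0, 500);
  return (
    `<persisted-output path="${file}">\n${preview}\n` +
    `... (${result.length} bytes total; read_file the path for the rest)\n` +
    `</persisted-output>`
  );
}
```

The key design choice is that the marker carries the on-disk path, so recovering the detail costs the model one ordinary tool call.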
Step 2: Lever 1 -- Micro-Compact
Before each LLM call, we scan for old tool results and replace them with one-line placeholders. This is invisible to the user and runs every turn. The key subtlety: we preserve read_file results because those serve as reference material the model often needs to look back at.
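The micro-compact pass can be sketched as a pure function over the message history. The `Turn` shape, the `keep = 3` cutoff, and the function name are assumptions; the real harness operates on Anthropic's message types, but the logic is the same: collapse old tool results into one-line placeholders while leaving read_file results intact.

```typescript
// Hypothetical, simplified message shape for the sketch.
type ToolResult = { tool: string; content: string };
type Turn = { role: "user" | "assistant"; toolResult?: ToolResult };

// Sketch of Lever 1: replace tool results more than `keep` turns old with a
// placeholder, but preserve read_file results as reference material.
function microCompact(turns: Turn[], keep = 3): Turn[] {
  return turns.map((turn, i) => {
    const age = turns.length - 1 - i; // 0 = most recent turn
    if (!turn.toolResult || age <= keep) return turn; // recent or no result: keep
    if (turn.toolResult.tool === "read_file") return turn; // reference material: keep
    return {
      ...turn,
      toolResult: {
        ...turn.toolResult,
        content: `[Previous: used ${turn.toolResult.tool}]`,
      },
    };
  });
}
```

Because this runs before every LLM call and never mutates the stored history, it is invisible to the user and cheap to repeat.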
Step 3: Lever 2 -- Auto-Compact
When micro-compaction is not enough and the token count crosses a threshold, the harness takes a bigger step: it saves the full transcript to disk for recovery, asks the LLM to summarize the entire conversation, and then replaces all messages with that summary. The agent continues from the summary as if nothing happened.
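The auto-compact flow might look like the sketch below. The 50,000-token threshold and the transcript-to-disk step follow the chapter's diagram; the function names, the injectable `summarize` callback (which in the real harness is an LLM call), and the 4-chars-per-token estimate are assumptions.

```typescript
import fs from "node:fs";
import path from "node:path";

type Msg = { role: string; content: string };

const AUTO_COMPACT_THRESHOLD = 50_000; // tokens, per the chapter's check

// Rough heuristic: ~4 characters per token (assumption, not a tokenizer).
function estimateTokens(messages: Msg[]): number {
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

// Sketch of Lever 2: when the estimate crosses the threshold, save the full
// transcript for recovery, summarize, and replace everything with the summary.
// `summarize` is injected so the flow is testable; in the harness it calls the LLM.
async function maybeAutoCompact(
  messages: Msg[],
  transcriptsDir: string,
  summarize: (msgs: Msg[]) => Promise<string>,
): Promise<Msg[]> {
  if (estimateTokens(messages) <= AUTO_COMPACT_THRESHOLD) return messages;
  fs.mkdirSync(transcriptsDir, { recursive: true });
  const file = path.join(transcriptsDir, `transcript-${Date.now()}.json`);
  fs.writeFileSync(file, JSON.stringify(messages, null, 2)); // full history kept on disk
  const summary = await summarize(messages);
  return [{ role: "user", content: `[Conversation summary]\n${summary}` }];
}
```

The agent then continues from the single summary message as if nothing happened, and the on-disk transcript remains available if anything important was lost in summarization.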
Step 4: Lever 3 -- Manual Compact
The compact tool lets the model itself trigger summarization on demand. It uses exactly the same mechanism as auto-compact. The difference is who decides: auto-compact fires on a threshold, manual compact fires when the agent judges it is the right time to compress.
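The tool itself needs almost no surface area, since it reuses the auto-compact machinery. A plausible Anthropic-style tool definition might look like this; the description wording is an assumption.

```typescript
// Sketch of the compact tool's definition. It takes no input: triggering it
// is the whole point, and the harness handles the summarization.
const compactTool = {
  name: "compact",
  description:
    "Summarize the conversation so far and replace the history with the summary. " +
    "Use when the context is getting long and older detail is no longer needed.",
  input_schema: { type: "object" as const, properties: {}, required: [] as string[] },
};
```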
Step 5: Integration in the Agent Loop
All four levers compose naturally inside the main loop:
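The composition might be sketched as a single turn function with injected dependencies. The names (`runTurn`, `microCompact`, `maybeAutoCompact`, `callLLM`) are assumptions for illustration, not the chapter's actual loop; Lever 0 runs inside tool execution and Lever 3 is just a tool the model can call, so neither appears as a separate step here.

```typescript
type Message = { role: string; content: string };

// Sketch of one iteration of the main loop, with the levers as injected
// dependencies so the ordering is explicit and testable.
async function runTurn(
  messages: Message[],
  deps: {
    microCompact: (m: Message[]) => Message[];            // Lever 1
    maybeAutoCompact: (m: Message[]) => Promise<Message[]>; // Lever 2
    callLLM: (m: Message[]) => Promise<Message>;
  },
): Promise<Message[]> {
  let active = deps.microCompact(messages);      // every turn, silent
  active = await deps.maybeAutoCompact(active);  // only past the token threshold
  const reply = await deps.callLLM(active);      // Lever 0 already ran at tool time;
  return [...active, reply];                     // Lever 3 arrives as a tool call in `reply`
}
```

Ordering matters: micro-compact runs first so the token estimate that drives auto-compact reflects the already-shrunk context.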
Transcripts preserve full history on disk. Large outputs are saved to .task_outputs/tool-results/. Nothing is truly lost -- just moved out of active context.
What Changed From s05
| Component | Before (s05) | After (s06) |
|---|---|---|
| Tools | 5 | 5 (base + compact) |
| Context mgmt | None | Four-lever compression |
| Persisted-output | None | Large outputs -> disk + preview |
| Micro-compact | None | Old results -> placeholders |
| Auto-compact | None | Token threshold trigger |
| Transcripts | None | Saved to .transcripts/ |
Try It
npm run s06
- Ask the agent to run `pwd`
- Ask it to run `ls -la`
- Ask it to summarize the current workspace in one sentence
- Ask it to create `notes/hello.ts` and print the file content
What You've Mastered
At this point, you can:
- Explain why a long agent session degrades and eventually fails without compression
- Intercept oversized tool outputs before they enter the context window
- Silently replace stale tool results with lightweight placeholders each turn
- Trigger a full conversation summarization -- automatically on a threshold or manually via a tool call
- Preserve full transcripts on disk so nothing is permanently lost
Stage 1 Complete
You now have a complete single-agent system. Starting from a bare API call in s01, you have built up tool use, structured planning, sub-agent delegation, dynamic skill loading, and context compression. Your agent can read, write, execute, plan, delegate, and work indefinitely without running out of memory. That is a real coding agent.
Before moving on, consider going back to s01 and rebuilding the whole stack from scratch without looking at the code. If you can write all six layers from memory, you truly own the ideas -- not just the implementation.
Stage 2 begins with s07 and hardens this foundation. You will add permission controls, hook systems, persistent memory, error recovery, and more. The single agent you built here becomes the kernel that everything else wraps around.
Key Takeaway
Compaction is not deleting history -- it is relocating detail so the agent can keep working.