Context Compact
Core Loop | Keep Active Context Small and Stable | 493 LOC | 4 tools
Compaction isn't deleting history -- it's relocating detail so the agent can keep working.
What You'll Learn
- Why long sessions inevitably run out of context space, and what happens when they do
- A four-lever compression strategy: persisted output, micro-compact, auto-compact, and manual compact
- How to move detail out of active memory without losing it
- How to keep a session alive indefinitely by summarizing and continuing
Your agent from s05 is capable. It reads files, runs commands, edits code, and delegates subtasks. But try something ambitious -- ask it to refactor a module that touches 30 files. After reading all of them and running 20 shell commands, you will notice the responses get worse. The model starts forgetting what it already read. It repeats work. Eventually the API rejects your request entirely. You have hit the context window limit, and without a plan for that, your agent is stuck.
The Problem
Every API call to the model includes the entire conversation so far: every user message, every assistant response, every tool call and its result. The model's context window (the total amount of text it can hold in working memory at once) is finite. Tokens are roughly word-sized pieces of text; a single read_file on a 1,000-line source file costs roughly 4,000 tokens. Read 30 files and run 20 bash commands, and you have burned through 100,000+ tokens. The context is full, but the work is only half done.
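The arithmetic above can be sketched with a rough heuristic. This is an assumption (about 4 characters per token), not a real tokenizer; the function name `estimateTokens` is hypothetical.

```typescript
// Rough token estimate: ~4 characters per token. This is a heuristic for
// budgeting, not a tokenizer -- real counts vary by model and content.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 1,000-line file at ~16 chars per line is about 16,000 chars, i.e. ~4,000
// tokens. Thirty such files plus twenty command outputs quickly pass 100,000.
const oneFile = "x".repeat(1_000 * 16);
console.log(estimateTokens(oneFile)); // ~4,000
```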
The naive fix -- just truncating old messages -- throws away information the agent might need later. A smarter approach compresses strategically: keep the important bits, move the bulky details to disk, and summarize when the conversation gets too long. That is what this chapter builds.
The Solution
We use four levers, each working at a different stage of the pipeline, from output-time filtering to full conversation summarization.
Every tool call:
+------------------+
| Tool call result |
+------------------+
|
v
[Lever 0: persisted-output] (at tool execution time)
Large outputs (>50KB, bash >30KB) are written to disk
and replaced with a <persisted-output> preview marker.
|
v
[Lever 1: micro_compact] (silent, every turn)
Replace tool_result > 3 turns old
with "[Previous: used {tool_name}]"
(preserves read_file results as reference material)
|
v
[Check: tokens > 50000?]
| |
no yes
| |
v v
continue [Lever 2: auto_compact]
Save transcript to .transcripts/
LLM summarizes conversation.
Replace all messages with [summary].
|
v
[Lever 3: compact tool]
Model calls compact explicitly.
Same summarization as auto_compact.
How It Works
Step 1: Lever 0 -- Persisted Output
The first line of defense runs at tool execution time, before a result even enters the conversation. When a tool result exceeds a size threshold, we write the full output to disk and replace it with a short preview. This prevents a single giant command output from consuming half the context window.
// Tree-shaken bundle: chapter wiring + only used runtime code.
// agents_self_contained/_runtime.ts
import Anthropic from "@anthropic-ai/sdk";
import dotenv from "dotenv";
import { execSync, spawn, spawnSync } from "node:child_process";
import fs from "node:fs";
import fsp from "node:fs/promises";
import path from "node:path";
import process from "node:process";
import readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";
dotenv.config({ override: true });
var WORKDIR = process.cwd();
var DEFAULT_MODEL = "claude-3-5-sonnet-latest";
var anthropicClient = null;
function getModelId() {
return process.env.MODEL_ID || DEFAULT_MODEL;
}
function getAnthropicClient() {
if (anthropicClient) {
return anthropicClient;
}
anthropicClient = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY || process.env.ANTHROPIC_AUTH_TOKEN || "missing-api-key",
baseURL: process.env.ANTHROPIC_BASE_URL || void 0
});
return anthropicClient;
}
function createLoopContext() {
return { workdir: WORKDIR, messages: [], meta: {} };
}
function safePath(relativePath) {
  // Reconstructed body (the source is truncated here): resolve against the
  // workdir and refuse paths that would escape it.
  const resolved = path.resolve(WORKDIR, relativePath);
  if (resolved !== WORKDIR && !resolved.startsWith(WORKDIR + path.sep)) {
    throw new Error(`Path escapes workdir: ${relativePath}`);
  }
  return resolved;
}
The model can later read_file the stored path to access the full content if needed. Nothing is lost -- the detail just lives on disk instead of in the conversation.
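A minimal sketch of Lever 0 might look like the following. The thresholds (50KB general, 30KB for bash) come from the chapter's diagram; the function name `persistIfLarge`, the file-naming scheme, and the exact marker format are assumptions for illustration.

```typescript
import fs from "node:fs";
import path from "node:path";

// Assumed size limits, per the chapter's diagram: bash output over 30KB,
// anything else over 50KB, gets persisted to disk.
const LIMITS: Record<string, number> = { bash: 30_000, default: 50_000 };

// Sketch of Lever 0: if a tool result is too large, write the full output to
// disk and return a short preview marker instead. Nothing enters the context
// that the model can't re-read later via read_file.
function persistIfLarge(toolName: string, result: string, dir: string): string {
  const limit = LIMITS[toolName] ?? LIMITS.default;
  if (result.length <= limit) return result; // small results pass through untouched
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${toolName}-${Date.now()}.txt`);
  fs.writeFileSync(file, result); // full output lives on disk, not in context
  const preview = result.slice(0, 500);
  return (
    `<persisted-output path="${file}">\n${preview}\n` +
    `... (${result.length} bytes total; read_file the path for the rest)\n` +
    `</persisted-output>`
  );
}
```

The key design choice is that the marker carries the on-disk path, so recovering the detail costs the model one ordinary tool call.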
Step 2: Lever 1 -- Micro-Compact
Before each LLM call, we scan for old tool results and replace them with one-line placeholders. This is invisible to the user and runs every turn. The key subtlety: we preserve read_file results because those serve as reference material the model often needs to look back at.
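The micro-compact pass can be sketched as a pure function over the message history. The `Turn` shape, the `keep = 3` cutoff, and the function name are assumptions; the real harness operates on Anthropic's message types, but the logic is the same: collapse old tool results into one-line placeholders while leaving read_file results intact.

```typescript
// Hypothetical, simplified message shape for the sketch.
type ToolResult = { tool: string; content: string };
type Turn = { role: "user" | "assistant"; toolResult?: ToolResult };

// Sketch of Lever 1: replace tool results more than `keep` turns old with a
// placeholder, but preserve read_file results as reference material.
function microCompact(turns: Turn[], keep = 3): Turn[] {
  return turns.map((turn, i) => {
    const age = turns.length - 1 - i; // 0 = most recent turn
    if (!turn.toolResult || age <= keep) return turn; // recent or no result: keep
    if (turn.toolResult.tool === "read_file") return turn; // reference material: keep
    return {
      ...turn,
      toolResult: {
        ...turn.toolResult,
        content: `[Previous: used ${turn.toolResult.tool}]`,
      },
    };
  });
}
```

Because this runs before every LLM call and never mutates the stored history, it is invisible to the user and cheap to repeat.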
Step 3: Lever 2 -- Auto-Compact
When micro-compaction is not enough and the token count crosses a threshold, the harness takes a bigger step: it saves the full transcript to disk for recovery, asks the LLM to summarize the entire conversation, and then replaces all messages with that summary. The agent continues from the summary as if nothing happened.
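The auto-compact flow might look like the sketch below. The 50,000-token threshold and the transcript-to-disk step follow the chapter's diagram; the function names, the injectable `summarize` callback (which in the real harness is an LLM call), and the 4-chars-per-token estimate are assumptions.

```typescript
import fs from "node:fs";
import path from "node:path";

type Msg = { role: string; content: string };

const AUTO_COMPACT_THRESHOLD = 50_000; // tokens, per the chapter's check

// Rough heuristic: ~4 characters per token (assumption, not a tokenizer).
function estimateTokens(messages: Msg[]): number {
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

// Sketch of Lever 2: when the estimate crosses the threshold, save the full
// transcript for recovery, summarize, and replace everything with the summary.
// `summarize` is injected so the flow is testable; in the harness it calls the LLM.
async function maybeAutoCompact(
  messages: Msg[],
  transcriptsDir: string,
  summarize: (msgs: Msg[]) => Promise<string>,
): Promise<Msg[]> {
  if (estimateTokens(messages) <= AUTO_COMPACT_THRESHOLD) return messages;
  fs.mkdirSync(transcriptsDir, { recursive: true });
  const file = path.join(transcriptsDir, `transcript-${Date.now()}.json`);
  fs.writeFileSync(file, JSON.stringify(messages, null, 2)); // full history kept on disk
  const summary = await summarize(messages);
  return [{ role: "user", content: `[Conversation summary]\n${summary}` }];
}
```

The agent then continues from the single summary message as if nothing happened, and the on-disk transcript remains available if anything important was lost in summarization.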
Step 4: Lever 3 -- Manual Compact
The compact tool lets the model itself trigger summarization on demand. It uses exactly the same mechanism as auto-compact. The difference is who decides: auto-compact fires on a threshold, manual compact fires when the agent judges it is the right time to compress.
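The tool itself needs almost no surface area, since it reuses the auto-compact machinery. A plausible Anthropic-style tool definition might look like this; the description wording is an assumption.

```typescript
// Sketch of the compact tool's definition. It takes no input: triggering it
// is the whole point, and the harness handles the summarization.
const compactTool = {
  name: "compact",
  description:
    "Summarize the conversation so far and replace the history with the summary. " +
    "Use when the context is getting long and older detail is no longer needed.",
  input_schema: { type: "object" as const, properties: {}, required: [] as string[] },
};
```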
Step 5: Integration in the Agent Loop
All four levers compose naturally inside the main loop:
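The composition might be sketched as a single turn function with injected dependencies. The names (`runTurn`, `microCompact`, `maybeAutoCompact`, `callLLM`) are assumptions for illustration, not the chapter's actual loop; Lever 0 runs inside tool execution and Lever 3 is just a tool the model can call, so neither appears as a separate step here.

```typescript
type Message = { role: string; content: string };

// Sketch of one iteration of the main loop, with the levers as injected
// dependencies so the ordering is explicit and testable.
async function runTurn(
  messages: Message[],
  deps: {
    microCompact: (m: Message[]) => Message[];            // Lever 1
    maybeAutoCompact: (m: Message[]) => Promise<Message[]>; // Lever 2
    callLLM: (m: Message[]) => Promise<Message>;
  },
): Promise<Message[]> {
  let active = deps.microCompact(messages);      // every turn, silent
  active = await deps.maybeAutoCompact(active);  // only past the token threshold
  const reply = await deps.callLLM(active);      // Lever 0 already ran at tool time;
  return [...active, reply];                     // Lever 3 arrives as a tool call in `reply`
}
```

Ordering matters: micro-compact runs first so the token estimate that drives auto-compact reflects the already-shrunk context.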
Transcripts preserve full history on disk. Large outputs are saved to .task_outputs/tool-results/. Nothing is truly lost -- just moved out of active context.
What Changed From s05
| Component | Before (s05) | After (s06) |
|---|---|---|
| Tools | 5 | 5 (base + compact) |
| Context mgmt | None | Four-lever compression |
| Persisted-output | None | Large outputs -> disk + preview |
| Micro-compact | None | Old results -> placeholders |
| Auto-compact | None | Token threshold trigger |
| Transcripts | None | Saved to .transcripts/ |
Try It
npm run s06
- Ask the agent to run `pwd`
- Ask it to run `ls -la`
- Ask it to summarize the current workspace in one sentence
- Ask it to create `notes/hello.ts` and print the file content
What You've Mastered
At this point, you can:
- Explain why a long agent session degrades and eventually fails without compression
- Intercept oversized tool outputs before they enter the context window
- Silently replace stale tool results with lightweight placeholders each turn
- Trigger a full conversation summarization -- automatically on a threshold or manually via a tool call
- Preserve full transcripts on disk so nothing is permanently lost
Stage 1 Complete
You now have a complete single-agent system. Starting from a bare API call in s01, you have built up tool use, structured planning, sub-agent delegation, dynamic skill loading, and context compression. Your agent can read, write, execute, plan, delegate, and work indefinitely without running out of memory. That is a real coding agent.
Before moving on, consider going back to s01 and rebuilding the whole stack from scratch without looking at the code. If you can write all six layers from memory, you truly own the ideas -- not just the implementation.
Stage 2 begins with s07 and hardens this foundation. You will add permission controls, hook systems, persistent memory, error recovery, and more. The single agent you built here becomes the kernel that everything else wraps around.
Key Takeaway
Compaction is not deleting history -- it is relocating detail so the agent can keep working.