s05

Skills

Core Loop

Discover Cheaply, Load Deeply|481 LOC|5 tools

Discover cheaply, load deeply -- only when needed.

s01 > s02 > s03 > s04 > [ s05 ] > s06 > s07 > s08 > s09 > s10 > s11 > s12 > s13 > s14 > s15 > s16 > s17 > s18 > s19

What You'll Learn

Why stuffing all domain knowledge into the system prompt wastes tokens
The two-layer loading pattern: cheap names up front, expensive bodies on demand
How frontmatter (YAML metadata at the top of a file) gives each skill a name and description
How the model decides for itself which skill to load and when

You don't memorize every recipe in every cookbook you own. You know which shelf each cookbook sits on, and you pull one down only when you're actually cooking that dish. An agent's domain knowledge works the same way. You might have expertise files for git workflows, testing patterns, code review checklists, PDF processing -- dozens of topics. Loading all of them into the system prompt on every request is like reading every cookbook cover to cover before cracking a single egg. Most of that knowledge is irrelevant to any given task.

The Problem

You want your agent to follow domain-specific workflows: git conventions, testing best practices, code review checklists. The naive approach is to put everything in the system prompt. But 10 skills at 2,000 tokens each means 20,000 tokens of instructions on every API call -- most of which have nothing to do with the current question. You pay for those tokens every turn, and worse, all that irrelevant text competes for the model's attention with the content that actually matters.

The Solution

Split knowledge into two layers. Layer 1 lives in the system prompt and is cheap: just skill names and one-line descriptions (~100 tokens per skill). Layer 2 is the full skill body, loaded on demand through a tool call only when the model decides it needs that knowledge.

System prompt (Layer 1 -- always present):
+--------------------------------------+
| You are a coding agent.              |
| Skills available:                    |
|   - git: Git workflow helpers        |  ~100 tokens/skill
|   - test: Testing best practices     |
+--------------------------------------+

When model calls load_skill("git"):
+--------------------------------------+
| tool_result (Layer 2 -- on demand):  |
| <skill name="git">                   |
|   Full git workflow instructions...  |  ~2000 tokens
|   Step 1: ...                        |
| </skill>                             |
+--------------------------------------+

How It Works

Step 1. Each skill is a directory containing a SKILL.md file. The file starts with YAML frontmatter (a metadata block delimited by --- lines) that declares the skill's name and description, followed by the full instruction body.

skills/
  pdf/
    SKILL.md       # ---\n name: pdf\n description: Process PDF files\n ---\n ...
  code-review/
    SKILL.md       # ---\n name: code-review\n description: Review code\n ---\n ...

Step 2. SkillLoader scans for all SKILL.md files at startup. It parses the frontmatter to extract names and descriptions, and stores the full body for later retrieval.

// Tree-shaken bundle: chapter wiring + only used runtime code.
// agents_self_contained/_runtime.ts
import Anthropic from "@anthropic-ai/sdk";
import dotenv from "dotenv";
import { execSync, spawn, spawnSync } from "node:child_process";
import fs from "node:fs";
import path from "node:path";
import process from "node:process";
import readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";
dotenv.config({ override: true });
var WORKDIR = process.cwd();
var DEFAULT_MODEL = "claude-3-5-sonnet-latest";
var anthropicClient = null;
function getModelId() {
  return process.env.MODEL_ID || DEFAULT_MODEL;
}
function getAnthropicClient() {
  if (anthropicClient) {
    return anthropicClient;
  }
  anthropicClient = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY || process.env.ANTHROPIC_AUTH_TOKEN || "missing-api-key",
    baseURL: process.env.ANTHROPIC_BASE_URL || void 0
  });
  return anthropicClient;
}
function createLoopContext() {
  return { workdir: WORKDIR, messages: [], meta: {} };
}
function safePath(relativePath) {
  const resolved = path.resolve(WORKDIR, relativePath);

Step 3. Layer 1 goes into the system prompt so the model always knows what skills exist. Layer 2 is wired up as a normal tool handler -- the model calls load_skill when it decides it needs the full instructions.

// Tree-shaken bundle: chapter wiring + only used runtime code.
// agents_self_contained/_runtime.ts
import Anthropic from "@anthropic-ai/sdk";
import dotenv from "dotenv";
import { execSync, spawn, spawnSync } from "node:child_process";
import fs from "node:fs";
import path from "node:path";
import process from "node:process";
import readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";
dotenv.config({ override: true });
var WORKDIR = process.cwd();
var DEFAULT_MODEL = "claude-3-5-sonnet-latest";
var anthropicClient = null;
function getModelId() {
  return process.env.MODEL_ID || DEFAULT_MODEL;
}
function getAnthropicClient() {
  if (anthropicClient) {
    return anthropicClient;
  }
  anthropicClient = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY || process.env.ANTHROPIC_AUTH_TOKEN || "missing-api-key",
    baseURL: process.env.ANTHROPIC_BASE_URL || void 0
  });
  return anthropicClient;
}
function createLoopContext() {
  return { workdir: WORKDIR, messages: [], meta: {} };
}
function safePath(relativePath) {
  const resolved = path.resolve(WORKDIR, relativePath);

The model learns what skills exist (cheap, ~100 tokens each) and loads them only when relevant (expensive, ~2000 tokens each). On a typical turn, only one skill is loaded instead of all ten.

What Changed From s04

Component	Before (s04)	After (s05)
Tools	5 (base + task)	5 (base + load_skill)
System prompt	Static string	+ skill descriptions
Knowledge	None	skills/*/SKILL.md files
Injection	None	Two-layer (system + result)

Try It

npm run s05

Ask the agent to run pwd
Ask it to run ls -la
Ask it to summarize the current workspace in one sentence
Ask it to create notes/hello.ts and print the file content

What You've Mastered

At this point, you can:

Explain why "list first, load later" beats stuffing everything into the system prompt
Write a SKILL.md with YAML frontmatter that a SkillLoader can discover
Wire up two-layer loading: cheap descriptions in the system prompt, full bodies via tool_result
Let the model decide for itself when domain knowledge is worth loading

You don't need skill ranking systems, multi-provider merging, parameterized templates, or recovery-time restoration rules. The core pattern is simple: advertise cheaply, load on demand.

What's Next

You now know how to keep knowledge out of context until it's needed. But what happens when context grows large anyway -- after dozens of turns of real work? In s06, you'll learn how to compress a long conversation down to its essentials so the agent can keep working without hitting token limits.

Key Takeaway

Advertise skill names cheaply in the system prompt; load the full body through a tool call only when the model actually needs it.