CCA Foundations twisthandai.co.uk

Scenario 02 · Coding Assistant

~15 min read · interactive · 3‑question self‑check

The context window is a budget. A twelve‑step working session.

A skill that indexes your codebase in forty thousand tokens is forty thousand tokens the refactor will not have. The window is a ledger, not a stage.

Skills that do a lot of work should do that work elsewhere.

§ 01 · Setup

The drift.

A team ships /analyze-codebase — a custom skill that walks the repository and returns a report: dependency graph, test coverage counts, complexity hotspots, unused exports. It works. It is useful. Developers invoke it when they need the lay of the land before a refactor.

Then the complaints start. “Claude is less responsive after I run that skill.” “It forgets what we were doing.” “It starts referencing files that aren’t in scope.” “It talks to me about coverage metrics when I’ve already moved on.” The complaints are consistent enough that the skill gets disabled, which solves the drift and loses the capability.

Both reactions are wrong. The skill is not broken. The session is not broken. The architecture is — the skill is producing its tens of thousands of tokens of output into the same context window that has to hold the original task, the design decisions made since, and the rest of the working session. Every token the skill writes is a token the rest of the work does not get.

The tempting framing — treat a long session getting worse as a model problem to be solved with a better model, a smaller prompt, or /compact — reads correctly and fails the exam. The exam rewards answers that treat the context window as a budget and manage where work gets spent.

§ 02 · Evidence

Twelve steps. Twice.

We simulated one working session twice: twelve steps, identical workflow, identical skill. The user task is “refactor the auth middleware”. The skill is /analyze-codebase. The only difference between the two runs is a single line of frontmatter in the skill’s SKILL.md.

Each panel steps through the same session in lockstep. The bar at the top of each panel is the main context window filling up as tokens accumulate. The cells below are the steps: green if Claude stayed on the original task, amber if the context crossed into pressured territory, red if the original task had been crowded out.

The twelve‑step session

Same session. Same skill. One frontmatter line apart.

Workflow

  01 user · refactor the auth middleware
  02 skill · /analyze-codebase
  03 user · which files import authMiddleware?
  04 skill · /analyze-codebase
  05 user · how’s the test coverage?
  06 skill · /analyze-codebase
  07 user · ok, back to the refactor
  08 user · split into verify() and authorise()
  09 skill · /analyze-codebase
  10 user · apply the refactor
  11 user · update the callers
  12 user · run the test suite

Mode A · Direct execution. No isolation: the skill runs in the main session. One bar: main context, out of 200k.

Mode B · context: fork. The skill runs in an isolated subagent. Two bars: main context and subagent context, each out of 200k.

Legend: on‑task, pressured, off‑track. Each panel counts on‑task steps out of 12.

Mode A’s main context crosses the pressured threshold after the third skill invocation and never comes back. Every subsequent step happens in a window where the original task has been pushed so far up the history that the model’s ability to act on it has measurably degraded. The cells go red not because the model got worse but because the input it’s acting on no longer centres the work.

Mode B’s main context barely moves. The skill still does all the same work — the subagent bar fills to the same height every time — but that work is done in a separate context that terminates when the skill finishes. What returns to the main session is a compact structured summary, a few hundred tokens of signal rather than forty thousand of raw output.
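The arithmetic behind the two panels can be sketched roughly as follows. The 40,000-token skill output and the 200k window come from the text. The 500-token user turn, the 400-token summary, and the 60% "pressured" threshold are assumptions chosen for illustration, and the model's own replies are left out to keep the sum simple.

```python
from typing import List, Optional

WINDOW = 200_000        # window size shown in the panels
SKILL_OUTPUT = 40_000   # raw skill output per run (figure from the text)
USER_TURN = 500         # assumption: tokens per user turn
FORK_SUMMARY = 400      # assumption: "a few hundred tokens" of summary
PRESSURED_AT = 0.60     # assumption: fraction of the window that counts as pressured

# The twelve-step workflow: four skill runs, eight user turns.
STEPS = ["user", "skill", "user", "skill", "user", "skill",
         "user", "user", "skill", "user", "user", "user"]


def main_context_usage(forked: bool) -> List[int]:
    """Cumulative main-context tokens after each of the twelve steps."""
    total, history = 0, []
    for kind in STEPS:
        if kind == "user":
            total += USER_TURN
        else:
            # Forked: only the summary returns. Direct: the raw output lands.
            total += FORK_SUMMARY if forked else SKILL_OUTPUT
        history.append(total)
    return history


def first_pressured_step(history: List[int]) -> Optional[int]:
    """1-based step at which usage first exceeds the threshold, else None."""
    for step, used in enumerate(history, start=1):
        if used > PRESSURED_AT * WINDOW:
            return step
    return None


direct = main_context_usage(forked=False)
forked = main_context_usage(forked=True)
print("direct:", direct[-1], "tokens, pressured at step", first_pressured_step(direct))
print("forked:", forked[-1], "tokens, pressured at step", first_pressured_step(forked))
```

Under these assumptions the direct run ends at 164,000 tokens and first crosses the threshold at step 06, the third skill invocation. The forked run ends at 5,600 tokens and never crosses it.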

§ 03 · Why context fills

Context is a ledger.

The context window is not a stage where the current thought plays to an audience and then exits. It is a ledger that accumulates everything the model has been shown in this session — user turns, tool calls, tool results, its own outputs, everything — in order, in full, until the window is closed.

A skill that returns forty thousand tokens of analysis writes forty thousand tokens to that ledger. Claude’s next response reads the ledger to decide what to do. If the ledger now contains three such invocations and the original task has been displaced from its high-signal position near the end of the context, the model’s decision quality on that original task drops. This is not a failure mode. It is the architecture operating exactly as designed.

The three reactive fixes all miss. /compact summarises the ledger but is lossy — specific values, dates, and intermediate decisions are exactly what gets paraphrased away. An instruction inside the skill to “summarise before displaying” runs after the verbose work has already been written into the main context; the pollution has already happened. Splitting the skill into three smaller skills produces the same total output volume at the same destination. None of these treats the window as a budget.

The exam tests the reactive fixes as distractors. “Instruct the skill to summarise” is the strongest of them — it sounds like good engineering. It is not, because the summary instruction runs inside the polluting context. The only intervention that treats the window as a budget is isolation: do the verbose work somewhere the main context cannot see.
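A rough way to compare these interventions is to count what each one adds to the main context for a single skill run. The sketch below is illustrative only: the 40,000-token figure is from the text, the 400-token summary size is an assumption, and /compact is left out because its cost is lost detail rather than added tokens.

```python
RAW = 40_000     # raw analysis per run (figure from the text)
SUMMARY = 400    # assumption: size of a compact structured summary


def summarise_in_place() -> int:
    # The raw work enters the main context first; the summary is added on top.
    return RAW + SUMMARY


def split_into_three() -> int:
    # Three smaller skills: same total volume, same destination.
    third = RAW // 3
    parts = [third, third, RAW - 2 * third]
    return sum(parts)


def fork() -> int:
    # The raw work stays in the subagent; only the summary returns.
    return SUMMARY


costs = {
    "summarise in place": summarise_in_place(),
    "split into three": split_into_three(),
    "context: fork": fork(),
}
for name, tokens in costs.items():
    print(f"{name:>20}: {tokens:>6} tokens added to main context")
```

Only the forked variant changes where the raw output is written. The other two change its shape.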

§ 04 · The pattern

What context: fork actually is.

A skill in Claude Code lives at .claude/skills/<name>/SKILL.md — a single Markdown file with YAML frontmatter on top and the skill’s prompt below. Frontmatter configures how the skill is invoked and how it runs. Three keys matter for this scenario. Together they are the skill’s contract.

File: .claude/skills/analyze-codebase/SKILL.md
---
context: fork
allowed-tools: ["Read", "Grep", "Glob"]
argument-hint: "Path to the directory to analyse"
---

# Analyse codebase structure
Walk the directory at $ARGUMENTS. Produce a report covering
dependency graph, test coverage, complexity hotspots, and unused
exports. Return a structured summary, not raw output.

context: fork runs the skill in a subagent — a fresh context window spawned for this invocation and torn down when it returns. The subagent sees only what the skill prompt tells it to see. It does not inherit the main session’s history and its output does not flow into the main session’s history. The only thing that returns is the subagent’s final response, which the skill is instructed to make structured and compact.

allowed-tools is the subagent’s toolbox. Scoping it to Read, Grep, Glob means the analysis skill physically cannot write files, run shell commands, or touch anything outside its remit. The exam treats this as a security control; it is also a context control, because a tool the subagent doesn’t have is a tool it can’t consume context with.
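The two properties described so far, a fresh context and a restricted toolbox, can be modelled in a few lines. This is a toy model, not how Claude Code implements forking: `Context`, `run_forked`, `ToolNotAllowed`, and the `plan` argument are all hypothetical names, and the tool results are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Context:
    """An append-only ledger of everything one session has seen."""
    entries: List[str] = field(default_factory=list)

    def append(self, entry: str) -> None:
        self.entries.append(entry)


class ToolNotAllowed(Exception):
    """Raised when a tool call falls outside the allowlist."""


def run_forked(prompt: str, allowed_tools: Set[str],
               plan: List[Tuple[str, str]]) -> Tuple[str, Context]:
    """Run a hypothetical skill in a fresh context.

    `plan` stands in for the tool calls a real subagent would choose:
    a list of (tool_name, result_text) pairs.
    """
    sub = Context()            # fresh ledger: no main-session history inherited
    sub.append(prompt)
    for tool, result in plan:
        if tool not in allowed_tools:
            raise ToolNotAllowed(tool)
        sub.append(f"{tool}: {result}")   # verbose work is written here only
    summary = f"{len(plan)} tool calls analysed"
    return summary, sub


main = Context()
main.append("user: refactor the auth middleware")
summary, sub = run_forked(
    "Walk src/ and report",
    {"Read", "Grep", "Glob"},
    [("Glob", "<long file listing>"), ("Grep", "<long match listing>")],
)
main.append(f"skill: {summary}")   # the only thing that crosses back
print(len(main.entries), len(sub.entries))
```

The main ledger grows by one entry per invocation regardless of how much the subagent reads, and a call to a tool outside the allowlist fails before it can write anything.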

argument-hint is the prompt the skill shows the developer when they invoke it without arguments. It turns “what did I name this skill’s argument” into a guided input.

Identify the operation whose output volume greatly exceeds its decision value to the main session. Put that operation in a skill. Put context: fork in the skill’s frontmatter. Constrain the skill’s tools to the minimum. Instruct the skill to return a structured summary — the main session consumes the summary, never the raw work.
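One way to keep that recipe honest is a small check that a skill file declares all three keys. The sketch below is a hypothetical lint, not part of Claude Code. It reads only flat `key: value` frontmatter, is not a full YAML parser, and treats Read, Grep, and Glob as the assumed minimum for an analysis skill.

```python
import json

SKILL_MD = '''---
context: fork
allowed-tools: ["Read", "Grep", "Glob"]
argument-hint: "Path to the directory to analyse"
---

# Analyse codebase structure
Walk the directory at $ARGUMENTS.
'''

READ_ONLY = {"Read", "Grep", "Glob"}   # assumption: the minimum for analysis


def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter between the two `---` lines.

    Values are read as JSON where possible, otherwise kept as plain strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        try:
            meta[key.strip()] = json.loads(value)
        except json.JSONDecodeError:
            meta[key.strip()] = value
    return meta


def check_contract(meta: dict) -> list:
    """Return a list of problems; an empty list means the contract holds."""
    problems = []
    if meta.get("context") != "fork":
        problems.append("context is not fork")
    tools = meta.get("allowed-tools")
    tools = set(tools) if isinstance(tools, list) else set()
    if not tools or not tools <= READ_ONLY:
        problems.append("allowed-tools is missing or wider than read-only")
    if not meta.get("argument-hint"):
        problems.append("argument-hint is missing")
    return problems


print(check_contract(parse_frontmatter(SKILL_MD)))
```

For the sample skill above the check returns an empty list. Removing the `context: fork` line would make it report that the skill runs in the main session.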

§ 05 · Pattern reach

Isolation as discipline.

The fork pattern shows up across the exam whenever an operation’s output volume greatly exceeds its decision value. Three places it appears, briefly.

Example 01

Explore subagent · multi‑phase refactor

Discovery, then design, then apply

A 120‑file error‑handling wrapper project runs discovery in an Explore subagent that returns a summary, then keeps the collaborative design and the consistent implementation phases in the main session where retained context earns its keep.

Example 02

context: fork · brainstorming skill

Explore alternatives cleanly

An /explore-alternatives skill that evaluates three implementation paths runs forked, so the rejected approaches don’t bleed into the subsequent implementation. The skill returns the chosen approach; the losers stay in the subagent’s private transcript.

Example 03

Frontmatter trio · production skill

Argument hint, allowed tools, fork

A /migration skill that scaffolds database migrations uses argument-hint to force a migration name on invocation, allowed-tools to block destructive operations, and context: fork to prevent leak‑through from unrelated prior conversation. One skill, three safeguards.

In each case the subagent is doing work the main session doesn’t need to witness, only conclude from. The division of labour is the point.

§ 06 · Adjacent

When it belongs in CLAUDE.md instead.

Skills are one half of Claude Code’s configuration surface. The other half is CLAUDE.md — the instruction file loaded on every session. The distinction the exam tests, repeatedly, is which kind of guidance goes where. The answer is a single question: should this apply every time Claude wakes up, or only when a specific task is being done?

CLAUDE.md

Always loaded

Universal standards and conventions that should apply to every response: coding style, testing approach, naming conventions, architectural invariants. The three‑level hierarchy — user (~/.claude/CLAUDE.md), project (.claude/CLAUDE.md), directory — lets standards compose.

.claude/skills/

Loaded on invocation

Workflow‑specific guidance that applies only when the workflow is being done: a /review skill for code review, a /migration skill for database migrations, a /commit skill with team‑specific message conventions. The skill loads when invoked and unloads when done.

.claude/rules/

Loaded by file match

Path‑specific conventions that apply when Claude is editing matching files. YAML frontmatter with a paths: glob pattern — for example, paths: ["**/*.test.*"] — loads testing conventions only when a test file is being touched. Ideal for cross‑cutting conventions that span directories.

Putting every convention in the root CLAUDE.md and trusting Claude to apply only the relevant section is the distractor on every version of the question. It reads as organised and fails in practice — the model has to infer which heading applies, every time, for every response. Deterministic loading beats inferred relevance.
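The three loading behaviours can be written as one deterministic function. This is a simplified model under stated assumptions: the rule file name `testing.md` and the skill set are hypothetical, and `fnmatchcase` only approximates whatever glob semantics the real loader uses.

```python
from fnmatch import fnmatchcase

# Hypothetical rule files mapped to their `paths:` globs.
RULES = {
    "testing.md": ["**/*.test.*"],
}
# Hypothetical skills, keyed by slash command.
SKILLS = {"/review", "/migration", "/commit"}


def loaded_guidance(invoked=None, editing=None):
    """Return the guidance sources in play at one moment of a session."""
    loaded = ["CLAUDE.md"]                      # always loaded
    if invoked in SKILLS:
        loaded.append(f"skill:{invoked}")       # loaded on invocation
    for rule, globs in RULES.items():
        if editing and any(fnmatchcase(editing, g) for g in globs):
            loaded.append(f"rule:{rule}")       # loaded by file match
    return loaded


print(loaded_guidance())
print(loaded_guidance(invoked="/review"))
print(loaded_guidance(editing="src/auth/login.test.ts"))
print(loaded_guidance(editing="src/auth/login.ts"))
```

Nothing in the function asks the model to judge relevance. What loads is decided by the session state alone, which is the property the paragraph above argues for.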

§ 07 · Self‑check

Three questions. Scroll after answering.


Question 01 · Isolation

A team’s /analyze-codebase skill produces a comprehensive report: dependency graph, coverage counts, quality metrics. Developers report that after running it, Claude becomes less responsive in the session and loses the original task. What closes the gap?

Why A

Do the verbose work somewhere the main context cannot see.

context: fork spawns a subagent with its own context window. The verbose analysis happens there. When the subagent returns, only its structured final response appears in the main session — the raw intermediate output never does. The main context stays lean and the original task stays on its decision‑critical path.

B · Haiku is a model selection. Tokens still accumulate in the main context at the same rate.
C · Three skills produce the same total output volume to the same destination.
D · The summary instruction runs inside the polluting context. The raw work has already entered the ledger by the time the summary fires.

Question 02 · Loading surface

A project’s CLAUDE.md has grown to 400 lines: coding standards, testing conventions, a PR‑review checklist, a deployment runbook, and migration procedures. Standards should always apply. Workflow guidance should apply only when doing that workflow. How should this be restructured?

Why D

Always‑load goes in CLAUDE.md. On‑demand goes in Skills.

CLAUDE.md is read on every session — the right home for universal standards that should apply regardless of task. Skills are loaded on invocation — the right home for workflow guidance that should only apply when the workflow is being done. Splitting by when it should fire matches the loading model to the intent.

A · Universal standards should always apply. Moving them out of CLAUDE.md breaks that.
B · @path organises files without changing loading behaviour. Everything still loads every session.
C · .claude/rules/ loads by file‑path match. PR review doesn’t map to a file glob; it’s a workflow.

Question 03 · Planning mode

A codebase has clean notification patterns for email, SMS, and push. A ticket says “add Slack support” — no further detail. Slack offers three fundamentally different integration shapes: incoming webhooks (one‑way), bot tokens (two‑way with delivery state), Slack Apps (events with workspace approval). How should the task begin?

Why B

Plan first when the approaches differ architecturally, not cosmetically.

The ticket is ambiguous and the three integration shapes are not interchangeable — they differ in auth, two‑way capability, and operational surface. Planning mode investigates without making changes and produces a plan the user approves. That alignment comes cheap; an integration approach that has to be torn out in week three does not.

A · “Match the existing pattern” reads as good citizenship. The existing pattern is one‑way and Slack’s value often hinges on two‑way features.
C · Scaffolding without the integration decided means the scaffold gets thrown away once the integration is chosen.
D · Bot tokens is a defensible choice, but the ticket didn’t ask for delivery confirmation, and you’re committing to workspace‑auth overhead for no stated reason.

The takeaway

The main context holds the goal.
The fork holds the work.