Subagent-Driven Development

In one line: One fresh subagent per task — the orchestrator directs and reviews, never implements — so context from earlier tasks never pollutes later ones.

What: The default execution model for implementation. Rather than running all tasks in one long session, dispatch a fresh subagent per task. The orchestrating session reviews output between tasks via two-stage review (spec compliance, then code quality).

Why: Long-running sessions degrade in quality — a structural property of context-window management, not speculation. A session that has done five tasks carries all five contexts: files read, errors hit, debugging output, intermediate states. By the sixth task, irrelevant context leaks: a variable name from task 2 surfaces in task 6, an error-handling pattern from task 3 lands where it doesn't belong.

Fresh subagents start each task with only the relevant context: the design, the plan, the task description, CLAUDE.md, and memory files. The orchestrator keeps the big picture while implementers focus without distraction.

The Controller Pattern:

The orchestrating session serves as a controller that:

Reads the implementation plan and extracts all tasks with their full text
For each task, dispatches an implementer subagent with the task description, relevant context (file paths, design spec section), and any constraints from earlier tasks
Receives the implementer's result and status
Dispatches a spec reviewer subagent to verify the implementation matches the design
Dispatches a code quality reviewer subagent to verify engineering standards
Tracks task completion and proceeds to the next task

The controller does not implement. It directs, reviews, and decides.
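The dispatch step above can be sketched as a function that assembles an implementer's brief. This is a minimal illustration, not the skill's actual prompt template: the `Task` fields and the `build_brief` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical record for one plan task; field names are illustrative."""
    number: int
    text: str                                  # full task text copied from the plan
    file_paths: list[str] = field(default_factory=list)
    spec_section: str = ""                     # relevant design spec excerpt


def build_brief(task: Task, constraints: list[str]) -> str:
    """Assemble what one implementer subagent needs, and nothing else."""
    lines = [f"Task {task.number}", task.text]
    if task.file_paths:
        lines.append("Files: " + ", ".join(task.file_paths))
    if task.spec_section:
        lines.append("Design spec: " + task.spec_section)
    for constraint in constraints:             # carried forward from earlier tasks
        lines.append("Constraint: " + constraint)
    return "\n".join(lines)
```

The controller only builds and sends this text; it writes no implementation code itself.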

Wave Dispatch:

When the plan annotates tasks with ‖ parallelisable markers (Section 3.1), the controller dispatches independent tasks concurrently rather than sequentially. The unit of dispatch is a wave — a set of tasks that can execute in parallel because they share no implementation dependencies, file collisions, or sequential ordering constraints.

The dispatch shape:

Wave 1: [Task 1]                                  ← single task, no parallelism
Wave 2: [Task 2 ‖ Task 3 ‖ Task 4]                ← three tasks dispatched in parallel
Wave 3: [Task 5]                                  ← waits for all of Wave 2 to complete
Wave 4: [Task 6 ‖ Task 7]                         ← two tasks in parallel
Wave 5: [Task 8]                                  ← waits for Wave 4

Each subagent in a wave receives the same context preparation (full task text, scene-setting context, constraints) but executes in its own isolated context window. The controller awaits the wave's completion before reviewing — review is the synchronization point, not dispatch.
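The dispatch shape above can also be derived mechanically. The sketch below assumes the plan's dependencies are available as a mapping from task to prerequisite tasks, which is an assumption of this example; the text itself derives waves from ‖ annotations. The function name `plan_waves` is hypothetical.

```python
def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; a task lands one wave after its last dependency."""
    remaining = {task: set(d) for task, d in deps.items()}
    waves: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # A task is ready once every dependency finished in an earlier wave.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown task in plan")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves
```

This only captures ordering constraints. File collisions and host-resource limits still need a separate check before a group of tasks is treated as one wave.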

Why waves matter beyond speedup: sequential-only dispatch costs the sum of every task's duration, while a wave costs only as long as its slowest task. For plans where 60-70% of tasks are independent, sequential execution wastes most of the available parallel capacity. Wave dispatch preserves the discipline of the controller pattern (fresh subagent per task, review between tasks) while reclaiming that capacity. The orchestrating session's context budget is unchanged because each subagent has its own context; the controller carries only plan state and review verdicts.
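One way to picture wave execution is with `asyncio.gather`, where each wave runs concurrently and the next wave starts only after the previous one has finished. This is a sketch under assumptions: `dispatch` is a hypothetical coroutine standing in for launching a subagent, and the real platform's dispatch mechanism may look nothing like this.

```python
import asyncio
from typing import Awaitable, Callable


async def run_waves(
    waves: list[list[str]],
    dispatch: Callable[[str], Awaitable[str]],
) -> list[dict[str, str]]:
    """Run each wave concurrently; waves themselves run strictly in order."""
    results = []
    for wave in waves:
        # All tasks in the wave start together; gather waits for every one.
        statuses = await asyncio.gather(*(dispatch(task) for task in wave))
        # Review is the synchronization point: it would happen here,
        # after the whole wave has returned.
        results.append(dict(zip(wave, statuses)))
    return results
```

The controller holds only the task names and returned statuses, which mirrors the point that its context budget does not grow with the work done inside each subagent.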

Honest annotation rule. A task is ‖ parallelisable only when it is genuinely independent of every other task in its wave along two dimensions: (a) file-disjoint: no two parallel tasks touch the same source file, otherwise git merge conflicts in the post-task commit phase derail the wave; (b) host-resource-disjoint: no two parallel tasks contend for the same hard-quota host resource (Docker VM memory, port allocations, file descriptors). For testcontainer-heavy projects the practical ceiling is roughly floor(docker_memory_GB / 1.5): about 5 parallel suites at 8 GB, 8 at 12 GB, and 12 at 18 GB. A task that depends on an upstream task cannot be dispatched in the same wave as that upstream task. Design review is the moment to verify every ‖ annotation along both dimensions.

The host-resource dimension exists because a 5-way fan-out can OOM-kill subagents during simultaneous testcontainer spawn even when the file-disjoint test passes — five suites each spinning Postgres + Keycloak + Redis peak right at a typical 8GB Docker ceiling. Bump Docker memory to clear the immediate blocker; the rule prevents relearning the lesson at every project's first 5+ way fan-out.
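Both dimensions of the annotation rule can be checked before dispatch. The sketch below is illustrative only: the 1.5 GB-per-suite figure is the rule of thumb quoted above, not a measured constant, and the helper names and the task-to-files mapping are assumptions of this example.

```python
import math
from itertools import combinations

GB_PER_SUITE = 1.5  # rule-of-thumb figure from the text; measure for your own suites


def max_parallel_suites(docker_memory_gb: float) -> int:
    """Practical ceiling on testcontainer suites running at once."""
    return math.floor(docker_memory_gb / GB_PER_SUITE)


def file_collisions(wave: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of tasks in a wave that touch a common file."""
    return [
        (a, b, wave[a] & wave[b])
        for a, b in combinations(sorted(wave), 2)
        if wave[a] & wave[b]
    ]


wave = {"T2": {"a.py"}, "T3": {"b.py"}, "T4": {"a.py", "c.py"}}
print(max_parallel_suites(8))   # 5
print(file_collisions(wave))    # [('T2', 'T4', {'a.py'})]
```

A wave passes only if `file_collisions` returns an empty list and the wave's size does not exceed `max_parallel_suites` for the configured Docker memory.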

Three subagent failure modes to recognise, all sharing one files-on-disk recovery pattern but with distinct causes:

OOM (host-resource exhaustion). Subagent crashes when its testcontainers can't allocate memory. Root cause: parallel fan-out × per-suite memory > Docker ceiling. Mitigation: the host-resource-disjoint dimension above; bump Docker memory.
Tail-pipe blocking (anti-pattern 7 in skill:s4u-loop-dispatch). The verification command pipes pytest through tail -N and blocks on the pipe after pytest finishes. Mitigation: capture-to-file (pytest > /tmp/out.txt 2>&1; cat /tmp/out.txt).
Stream-watchdog stall (anti-pattern 8 in skill:s4u-loop-dispatch). Subagent burns its budget in research before writing code, or stalls in verify+commit waiting on monitor notifications it can't arm; the platform watchdog kills it after ~600s of no output. Mitigation: pre-resolve file pointers and narrow scope at the orchestrator before dispatch. This is common enough to be treated as the expected default, not an exception.

Recovery for all three is identical: the orchestrator inspects git status for files the dead subagent wrote, runs verification, and commits if green — typically 2-5 minutes versus ~20 for the original dispatch.
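The recovery steps can be sketched with two small helpers. This is an illustration under assumptions: the function names are hypothetical, and the parser handles only the simple `XY path` and rename lines of `git status --porcelain` (v1) output, not quoted paths.

```python
import subprocess
from pathlib import Path


def changed_paths(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) > 3:
            path = line[3:]                       # skip the two status letters and a space
            if " -> " in path:                    # rename entry: keep the new name
                path = path.split(" -> ", 1)[1]
            paths.append(path)
    return paths


def run_captured(cmd: list[str], log: Path) -> int:
    """Run a command with stdout and stderr written to a file, never through a pipe."""
    with log.open("w") as out:
        return subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT).returncode
```

An orchestrator could feed the output of `git status --porcelain` to `changed_paths`, run the targeted pytest command through `run_captured`, and commit only when the return code is 0.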

Parallel-wave recovery as default workflow. When dispatching parallel waves, expect the verify+commit phase to stall on a meaningful fraction of agents, and build the recovery step (the orchestrator finishes verify+commit itself) into the workflow rather than treating it as a fallback:

After dispatching N parallel agents, the orchestrator sleeps until each reports back. Treat "subagent stalled" as a normal branch, not an exception.
When any subagent stalls (reports "tests still running, waiting for monitor", or no commit lands within ~10 minutes after the file writes), the orchestrator runs git status for unstaged files, runs the targeted pytest invocation with capture-to-file, and commits each task separately if green.
This is the normal operating mode for parallel-wave dispatch at current platform stability, not a degraded one. In practice a large fraction of parallel agents stall in some way; cumulative recovery (~5 min each) is far cheaper than giving up and re-dispatching.
It only works because subagents reliably write files to disk before stalling. If they ever start losing files mid-stall, revisit this rule.
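The stall conditions in the list above could be expressed as a single predicate. This is a rough sketch: the function name, its parameters, and the idea of tracking the last file-write time are assumptions of this example, and the 10-minute threshold is the approximate figure from the text.

```python
STALL_SECONDS = 10 * 60  # "~10 minutes" from the text; tune per platform


def is_stalled(
    last_file_write: float,
    commit_seen: bool,
    now: float,
    last_message: str = "",
) -> bool:
    """Decide whether the orchestrator should take over verify+commit."""
    if commit_seen:
        return False                               # the subagent finished on its own
    if "waiting for monitor" in last_message:
        return True                                # explicit stall report
    return now - last_file_write > STALL_SECONDS   # files written, no commit since
```

Because a stall is an expected branch, the orchestrator would evaluate this for every agent in the wave rather than only when something looks wrong.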

Stub-test ownership transfer. When a brief says "X will replace this stub later" and adds a stub-assertion test, the later task's brief MUST include "delete or update the stub-assertion test." Otherwise the later task replaces the production code but leaves the test asserting the old stub event — a full-suite gate failure that is annoying but mechanical. Pin the ownership transfer in the brief that ships the replacement code, not the one that shipped the stub.

Wave failures. When any subagent in a wave returns BLOCKED or NEEDS_CONTEXT, the wave does not abort — sibling subagents continue. The controller addresses the blocked task (re-dispatch with more context, escalate, or decompose) and re-sequences the remaining waves accordingly. The two-stage review for the wave's successful tasks proceeds normally; the blocked task is reviewed only after it completes via the resolution path.
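A wave's results can be split into tasks that go to review and tasks that need a resolution path. The helper name `split_wave` is hypothetical, and the sketch assumes results arrive as a mapping from task name to one of the four implementer statuses.

```python
def split_wave(results: dict[str, str]) -> tuple[list[str], list[str]]:
    """Separate reviewable tasks from those the controller must resolve first."""
    reviewable = [
        task for task, status in results.items()
        if status in ("DONE", "DONE_WITH_CONCERNS")
    ]
    unresolved = [
        task for task, status in results.items()
        if status in ("BLOCKED", "NEEDS_CONTEXT")
    ]
    return reviewable, unresolved


reviewable, unresolved = split_wave(
    {"T2": "DONE", "T3": "BLOCKED", "T4": "DONE_WITH_CONCERNS"}
)
print(reviewable)   # ['T2', 'T4']
print(unresolved)   # ['T3']
```

The reviewable tasks proceed through two-stage review immediately, while each unresolved task is re-dispatched, decomposed, or escalated before later waves are re-sequenced.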

Two-Stage Review:

After each task, the implementation undergoes two independent reviews:

Stage 1 (Spec Compliance): A reviewer subagent reads the design specification and the implementation, checking whether the implementation delivers what was specified. This catches missing features, scope creep, and silent divergences.
Stage 2 (Code Quality): A reviewer subagent checks the implementation for engineering quality — correct patterns, sufficient tests, proper error handling, security considerations.

The stages are ordered deliberately. Spec compliance first ensures the right thing was built before evaluating how well it was built. Reversing this order risks polishing code that solves the wrong problem.
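The ordering can be made explicit with a short-circuiting function. This is a sketch only: the `Review` callable type, which returns a list of findings, and the stage labels are hypothetical stand-ins for dispatching reviewer subagents.

```python
from typing import Callable

# Hypothetical reviewer interface: takes a task name, returns a list of findings.
Review = Callable[[str], list[str]]


def two_stage_review(
    task: str, spec_review: Review, quality_review: Review
) -> tuple[str, list[str]]:
    """Run spec compliance first; run code quality only if the spec stage passes."""
    findings = spec_review(task)
    if findings:
        return "spec", findings        # wrong thing built: skip the quality stage
    findings = quality_review(task)
    if findings:
        return "quality", findings     # right thing built, but not well enough
    return "passed", []
```

Returning early on spec findings is what prevents effort being spent polishing code that solves the wrong problem.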

Implementer Status Handling:

Implementer subagents report one of four statuses:

Status	Meaning	Controller Response
`DONE`	Task completed, ready for review	Proceed to spec compliance review
`DONE_WITH_CONCERNS`	Completed but flagged doubts	Read concerns, address if correctness-related, then review
`NEEDS_CONTEXT`	Missing information	Provide context and re-dispatch
`BLOCKED`	Cannot complete the task	Assess: provide more context, use a more capable model, split the task, or escalate to human

The BLOCKED status deserves emphasis. When a subagent reports BLOCKED, retrying with the same context and the same model is almost never the correct response. Something about the task needs to change — more context, a more capable model, a different decomposition, or human judgment.
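The four statuses and the BLOCKED retry rule can be sketched as follows. The response strings paraphrase the table above, and the `Dispatch` record and `retry_is_meaningful` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DONE = "DONE"
    DONE_WITH_CONCERNS = "DONE_WITH_CONCERNS"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"
    BLOCKED = "BLOCKED"


# Controller responses, paraphrased from the status table.
RESPONSES = {
    Status.DONE: "start spec compliance review",
    Status.DONE_WITH_CONCERNS: "read concerns, then review",
    Status.NEEDS_CONTEXT: "provide context and re-dispatch",
    Status.BLOCKED: "change context, model, or decomposition, or escalate",
}


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical record of what was sent to an implementer subagent."""
    task_text: str
    context: str
    model: str


def retry_is_meaningful(previous: Dispatch, retry: Dispatch) -> bool:
    """A retry after BLOCKED must differ from the failed dispatch in some way."""
    return retry != previous
```

A controller could refuse to re-dispatch a BLOCKED task unless `retry_is_meaningful` returns True, which enforces the rule that something about the task has to change.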

Evidence: The two-stage review catches two distinct defect categories — spec compliance catches "built the wrong thing," code quality catches "built it wrong" — and both are common in AI-assisted development, so they need separate checks. Reproducible: the /subagent-driven-development skill dispatches a fresh agent per task and runs both review stages between them; the dispatch patterns and prompt templates are covered in Section 5.4.