Subagent-Driven Development

What: Subagent-driven development is the default execution model for implementation tasks. Rather than executing all tasks in a single long-running session, the methodology dispatches a fresh subagent for each task in the implementation plan. The orchestrating session reviews output between tasks using a two-stage review process (spec compliance, then code quality).

Why: Long-running AI sessions degrade in quality. This is not speculation — it is a structural property of context window management. A session that has implemented 5 tasks carries the context of all 5 implementations: file contents read, error messages encountered, debugging outputs, intermediate states. By the sixth task, the AI is making decisions influenced by irrelevant context from earlier tasks — a variable name from task 2 leaks into task 6, an error handling pattern from task 3 is applied where it does not belong, a debugging workaround from task 4 is treated as the correct approach.

Fresh subagents solve this by starting each task with only the relevant context: the design specification, the implementation plan, the specific task description, and the project's CLAUDE.md and memory files. The orchestrating session retains the big picture (which tasks are complete, what issues were found, how the overall implementation is progressing) while the implementing subagents focus on the immediate task without distraction.

The Controller Pattern:

The orchestrating session serves as a controller that:

  1. Reads the implementation plan and extracts all tasks with their full text
  2. For each task, dispatches an implementer subagent with the task description, relevant context (file paths, design spec section), and any constraints from earlier tasks
  3. Receives the implementer's result and status
  4. Dispatches a spec reviewer subagent to verify the implementation matches the design
  5. Dispatches a code quality reviewer subagent to verify engineering standards
  6. Tracks task completion and proceeds to the next task

The controller does not implement. It directs, reviews, and decides.
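The six steps above can be sketched as a small loop. This is a minimal illustration, not the methodology's actual implementation: `Task`, `Controller`, and the three callables (`implement`, `review_spec`, `review_quality`) are hypothetical stand-ins for real subagent dispatch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    task_id: int
    text: str  # full task text extracted from the implementation plan


@dataclass
class Controller:
    # Hypothetical callables standing in for subagent dispatch.
    implement: Callable[[Task], str]        # returns an implementer status string
    review_spec: Callable[[Task], bool]     # stage 1: spec compliance
    review_quality: Callable[[Task], bool]  # stage 2: code quality
    completed: list[int] = field(default_factory=list)

    def run(self, tasks: list[Task]) -> list[int]:
        """Dispatch, review, and track each task; stop when a decision is needed."""
        for task in tasks:
            status = self.implement(task)
            if status not in ("DONE", "DONE_WITH_CONCERNS"):
                break  # NEEDS_CONTEXT / BLOCKED require a controller decision
            if not self.review_spec(task):
                break  # wrong thing built: do not bother with quality review
            if not self.review_quality(task):
                break
            self.completed.append(task.task_id)
        return self.completed
```

Note that the controller never touches the implementation itself; it only calls out to subagents and records verdicts.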

Wave Dispatch:

When the plan annotates tasks with ‖ parallelisable markers (Section 3.1), the controller dispatches independent tasks concurrently rather than sequentially. The unit of dispatch is a wave — a set of tasks that can execute in parallel because they share no implementation dependencies, file collisions, or sequential ordering constraints.

The dispatch shape:

Wave 1: [Task 1] ← single task, no parallelism
Wave 2: [Task 2 ‖ Task 3 ‖ Task 4] ← three tasks dispatched in parallel
Wave 3: [Task 5] ← waits for all of Wave 2 to complete
Wave 4: [Task 6 ‖ Task 7] ← two tasks in parallel
Wave 5: [Task 8] ← waits for Wave 4

Each subagent in a wave receives the same context preparation (full task text, scene-setting context, constraints) but executes in its own isolated context window. The controller awaits the wave's completion before reviewing — review is the synchronization point, not dispatch.
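A rough sketch of the dispatch shape, assuming threads as a stand-in for isolated subagents. `dispatch_waves`, `run_task`, and `review` are hypothetical names; the point is only that dispatch is concurrent within a wave while review happens after the whole wave has returned.

```python
from concurrent.futures import ThreadPoolExecutor

# The dispatch shape from the example above, as lists of task numbers.
WAVES = [[1], [2, 3, 4], [5], [6, 7], [8]]


def dispatch_waves(waves, run_task, review):
    """Run each wave's tasks concurrently; review only once the wave is complete."""
    verdicts = {}
    for wave in waves:
        with ThreadPoolExecutor(max_workers=len(wave)) as pool:
            # list(...) blocks until every task in the wave has returned.
            results = list(pool.map(run_task, wave))
        # Review is the synchronization point, not dispatch.
        for task_id, result in zip(wave, results):
            verdicts[task_id] = review(task_id, result)
    return verdicts
```

The controller here holds only the wave list and the verdict map, which mirrors the claim that its context carries plan state and review verdicts rather than implementation detail.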

Why waves matter beyond speedup: under sequential-only dispatch, wall-clock time is the sum of every task's duration, whereas a wave is bounded only by its longest task. For plans where 60-70% of tasks are independent, sequential execution wastes the majority of available parallel capacity. Wave dispatch preserves the discipline of the controller pattern (fresh subagent per task, review between tasks) while reclaiming that capacity. The orchestrating session's context budget is unchanged because each subagent has its own context; the controller only carries plan state and review verdicts.

Honest annotation rule. A task is ‖ parallelisable only when it is genuinely independent of every other task in the same wave along two dimensions: (a) file-disjoint: no two parallel tasks touch the same source file, since git merge conflicts during the post-task commit phase will derail the wave; (b) host-resource-disjoint (added in v2.1.1, 2026-04-28): no two parallel tasks contend for the same hard-quota host resource (Docker VM memory, port allocations, file descriptor limits). For testcontainer-heavy projects, the practical parallelism ceiling is roughly floor(docker_memory_GB / 1.5): about 5 parallel suites per 8GB of Docker memory, 8-way parallel needs ~12GB, and 12-way parallel needs ~18GB. A task that depends on an upstream task's completion cannot dispatch in the same wave as that upstream task. Design review (Section 3.1 second-party scrutiny where it applies, otherwise a self-check at design time) is the right moment to verify every annotation along both dimensions.

The host-resource dimension was added because Ratiba's M5 Wave 2 5-way fan-out OOM-killed three subagents during simultaneous testcontainer spawn even though the file-disjoint test passed: (Postgres + Keycloak + Redis) × 5 parallel suites = ~7.5GB peak, right at the host's 7.57GB Docker ceiling. Bumping Docker memory removes the immediate blocker; codifying the dimension in the rule prevents the same lesson from being relearned at every project's first 5+ way fan-out.
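Both dimensions of the annotation rule can be checked mechanically at design time. The sketch below assumes each task declares the set of files it will touch and that every task in the wave spawns one testcontainer suite; `file_collisions` and `wave_is_honest` are hypothetical helpers, and the 1.5GB-per-suite constant is taken from the rule of thumb above.

```python
import math
from itertools import combinations

GB_PER_SUITE = 1.5  # from the rule of thumb floor(docker_memory_GB / 1.5)


def parallelism_ceiling(docker_memory_gb: float) -> int:
    """Rough upper bound on parallel testcontainer suites for a Docker memory limit."""
    return math.floor(docker_memory_gb / GB_PER_SUITE)


def file_collisions(wave: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of tasks in the wave that touch at least one common file."""
    return [
        (a, b, wave[a] & wave[b])
        for a, b in combinations(sorted(wave), 2)
        if wave[a] & wave[b]
    ]


def wave_is_honest(wave: dict[str, set[str]], docker_memory_gb: float) -> bool:
    """True only if the wave is file-disjoint and fits under the memory ceiling."""
    return not file_collisions(wave) and len(wave) <= parallelism_ceiling(docker_memory_gb)
```

One caveat worth noting: the rule of thumb yields 5 for a 7.57GB ceiling, which is the same fan-out that hit the OOM described above, so the result is probably best read as an upper bound with no headroom rather than a safe target.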

Three subagent failure modes to recognise when dispatching, all sharing the same files-on-disk recovery pattern but with distinct causes:

  1. OOM (host-resource exhaustion). Subagent crashes when its testcontainers can't allocate memory. Root cause: parallel fan-out × per-suite memory > Docker ceiling. Mitigation: the host-resource-disjoint dimension above; bump Docker memory.
  2. Tail-pipe blocking (anti-pattern 7 in appendix-k-loop-dispatch.md). Subagent's verification command pipes pytest through tail -N, blocking on the pipe even after pytest finishes. Mitigation: capture-to-file pattern (pytest > /tmp/out.txt 2>&1; cat /tmp/out.txt).
  3. Stream-watchdog stall (anti-pattern 8 in appendix-k-loop-dispatch.md, added v2.1.2). Subagent burns its execution budget in research before writing code OR stalls during the verify+commit phase waiting on monitor notifications it can't arm; platform watchdog kills it after ~600s of no output. Mitigation: pre-resolve file pointers + narrow scope at the orchestrator before dispatch. Codified after Ratiba M6 surfaced 5 stall instances; upgraded to expected-default in v2.1.3 after Ratiba M9 hit 11 instances in a single milestone.

Recovery for all three is identical: orchestrator inspects git status for files written by the dead subagent, runs verification + commits if green; recovery cost is typically 2-5 minutes versus the ~20 minutes of the original dispatch.
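The recovery path can be sketched with the standard library and plain git commands. This is an illustration under assumptions: `changed_files`, `run_captured`, and `recover` are hypothetical helper names, the parser handles only simple `git status --porcelain` lines (not quoted paths), and the verification command is whatever the task brief specifies.

```python
import subprocess
from pathlib import Path


def changed_files(porcelain: str) -> list[str]:
    """Parse `git status --porcelain` output into a list of paths."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status columns and the separating space
        if " -> " in path:  # renames are reported as "old -> new"
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def run_captured(cmd: list[str], out_path: str) -> tuple[int, str]:
    """Capture-to-file: write output to a file, then read it back (no pipe to block on)."""
    with open(out_path, "w") as out:
        code = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT).returncode
    return code, Path(out_path).read_text()


def recover(verify_cmd: list[str], message: str, out_path: str = "/tmp/out.txt") -> bool:
    """Finish a dead subagent's task: verify what is on disk and commit if green."""
    status = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    ).stdout
    files = changed_files(status)
    if not files:
        return False  # nothing on disk to recover
    code, _ = run_captured(verify_cmd, out_path)
    if code != 0:
        return False  # not green: leave the working tree for inspection
    subprocess.run(["git", "add", "--", *files], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    return True
```

A call such as `recover(["pytest", "tests/test_router.py"], "M9 T4: recovered after stall")` (path and message invented for illustration) would run the targeted tests and commit only on a zero exit code.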

Parallel-wave recovery as default workflow (v2.1.3, added 2026-05-05). When dispatching parallel waves of subagents, expect the verify+commit phase to stall on a meaningful fraction of the agents and bake the orchestrator-finishes-verify+commit recovery into the workflow rather than treating it as fallback. Concretely:

  • After dispatching N parallel agents, the orchestrator sleeps until each subagent reports back. Don't treat "subagent stalled" as exceptional — handle it as a normal branch.
  • When any subagent stalls (e.g., reports "tests still running, waiting for monitor" or no commit lands within ~10 minutes after the file writes), the orchestrator runs git status to identify unstaged files, runs the targeted pytest invocation against them with the capture-to-file pattern, and commits each task separately if green.
  • This isn't workflow degradation — it's the calibrated workflow shape for parallel-wave dispatch with current platform stability. Ratiba session evidence: 14+ stalls across M6+M7+M9 (~70% of parallel agents stalled in some way). Cumulative recovery overhead (~5 min × 14) was ~70 minutes — far less than the dispatch cost would have been if the orchestrator had instead given up and re-dispatched.
  • The pattern only works because subagents reliably write files to disk before stalling. If they ever start losing files mid-stall, this rule needs revisiting.
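Treating a stall as a normal branch amounts to a three-way triage after dispatch. A minimal sketch, assuming the orchestrator can observe each agent's last file-write time and whether its commit has landed; `AgentState`, `triage`, and the 600-second threshold (standing in for "~10 minutes") are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

STALL_AFTER_S = 600  # assumption: "~10 minutes" with no commit after the last write


@dataclass
class AgentState:
    task_id: str
    last_write_ts: Optional[float]  # time of the last file write seen on disk
    commit_ts: Optional[float]      # time the task's commit landed, if any


def triage(agents: list[AgentState], now: float) -> dict[str, list[str]]:
    """Sort agents into review (committed), recover (stalled), or wait (still working)."""
    plan: dict[str, list[str]] = {"review": [], "recover": [], "wait": []}
    for agent in agents:
        if agent.commit_ts is not None:
            plan["review"].append(agent.task_id)
        elif agent.last_write_ts is not None and now - agent.last_write_ts > STALL_AFTER_S:
            plan["recover"].append(agent.task_id)  # orchestrator finishes verify+commit
        else:
            plan["wait"].append(agent.task_id)
    return plan
```

The "recover" bucket is expected to be non-empty on most parallel waves, which is why it sits alongside the other two outcomes rather than in an error handler.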

Stub-test ownership transfer (v2.1.3). When a task brief mentions "X will replace this stub later" and the task adds a stub-assertion test, the later task's brief MUST include "delete or update the stub-assertion test" in its scope. Otherwise the later task replaces the production code but leaves the test asserting the old stub event, which surfaces as a full-suite gate failure that is annoying but mechanical to fix. Codified after Ratiba M9 T1's admin.ws_message_received_stub test failed at the Wave 2 full-suite gate when M9 T4 replaced the stub with the real AdminMessageRouter. Pin the test ownership transfer in the brief that ships the replacement code, not the brief that shipped the stub.

Wave failures. When any subagent in a wave returns BLOCKED or NEEDS_CONTEXT, the wave does not abort — sibling subagents continue. The controller addresses the blocked task (re-dispatch with more context, escalate, or decompose) and re-sequences the remaining waves accordingly. The two-stage review for the wave's successful tasks proceeds normally; the blocked task is reviewed only after it completes via the resolution path.
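One way to picture the re-sequencing step: tasks that depend, directly or transitively, on a blocked task are held back, while everything else in the remaining waves proceeds. The sketch assumes the controller keeps an explicit dependency map, which the text does not specify; `resequence` and the example dependencies are hypothetical.

```python
def resequence(remaining_waves, deps, blocked):
    """Drop tasks that depend on a blocked task from the remaining waves.

    remaining_waves: list of waves, each a list of task ids
    deps:            task id -> set of task ids it depends on (assumed to exist)
    blocked:         task ids currently BLOCKED or NEEDS_CONTEXT
    Returns (new_waves, deferred), where deferred tasks wait for the resolution path.
    """
    held = set(blocked)
    new_waves, deferred = [], []
    for wave in remaining_waves:
        runnable = []
        for task in wave:
            if deps.get(task, set()) & held:
                held.add(task)  # anything downstream of this task is held too
                deferred.append(task)
            else:
                runnable.append(task)
        if runnable:
            new_waves.append(runnable)
    return new_waves, deferred
```

Once the blocked task completes via its resolution path, the deferred tasks can be re-annotated into fresh waves using the same independence checks as the original plan.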

Two-Stage Review:

After each task, the implementation undergoes two independent reviews:

  • Stage 1 (Spec Compliance): A reviewer subagent reads the design specification and the implementation, checking whether the implementation delivers what was specified. This catches missing features, scope creep, and silent divergences.

  • Stage 2 (Code Quality): A reviewer subagent checks the implementation for engineering quality — correct patterns, sufficient tests, proper error handling, security considerations.

The stages are ordered deliberately. Spec compliance first ensures the right thing was built before evaluating how well it was built. Reversing this order risks polishing code that solves the wrong problem.

Implementer Status Handling:

Implementer subagents report one of four statuses:

Status | Meaning | Controller Response
DONE | Task completed, ready for review | Proceed to spec compliance review
DONE_WITH_CONCERNS | Completed but flagged doubts | Read concerns, address if correctness-related, then review
NEEDS_CONTEXT | Missing information | Provide context and re-dispatch
BLOCKED | Cannot complete the task | Assess: provide more context, use a more capable model, split the task, or escalate to human

The BLOCKED status deserves emphasis. When a subagent reports BLOCKED, retrying with the same context and the same model is almost never the correct response. Something about the task needs to change — more context, a more capable model, a different decomposition, or human judgment.
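The status table and the BLOCKED rule can be expressed as a small mapping plus a guard. This is a sketch with assumed names: `Status`, `controller_response`, `Dispatch`, and `is_meaningful_retry` are illustrative, and the returned action strings are placeholders rather than a defined protocol.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DONE = "DONE"
    DONE_WITH_CONCERNS = "DONE_WITH_CONCERNS"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"
    BLOCKED = "BLOCKED"


def controller_response(status: Status, concerns_affect_correctness: bool = False) -> str:
    """Map an implementer status to the controller's next action."""
    if status is Status.DONE:
        return "spec_review"
    if status is Status.DONE_WITH_CONCERNS:
        # Only correctness-related concerns are addressed before review.
        return "address_concerns_then_review" if concerns_affect_correctness else "spec_review"
    if status is Status.NEEDS_CONTEXT:
        return "provide_context_and_redispatch"
    return "assess"  # BLOCKED: more context, stronger model, split, or escalate


@dataclass(frozen=True)
class Dispatch:
    context: str
    model: str
    subtasks: tuple[str, ...] = ()  # non-empty when the task has been decomposed


def is_meaningful_retry(previous: Dispatch, retry: Dispatch) -> bool:
    """A BLOCKED task may be re-dispatched only if something about it has changed."""
    return retry != previous
```

The guard makes the rule explicit: re-dispatching with an identical context, model, and decomposition is rejected before it costs another subagent run.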

Evidence: Subagent-driven development is the default execution model specified in the Trust Relay memory files ("always use subagent-driven development — never ask, never offer choices, just proceed with it"). The model's effectiveness is demonstrated by the 22 commits/day velocity on active days — a rate that would not be sustainable in long-running sessions where each subsequent task degrades in quality as context accumulates. The two-stage review pattern catches different categories of defects: spec compliance catches "built the wrong thing" while code quality catches "built it wrong." Both categories are common in AI-assisted development and require separate checks. See appendix-c-agents.md for the subagent dispatch patterns and prompt templates.