The Full Cycle
The full development cycle applies to any non-trivial new feature or architectural change. Trivial changes (configuration updates, typo fixes, single-line adjustments with no architectural impact) may skip directly to implementation.
Phase details:
Brainstorm (/brainstorming). Open-ended problem exploration. The skill guides the conversation through problem statement articulation, constraint identification, and generation of at least 3 candidate approaches with explicit tradeoffs. The output is a conversation log, not a document. Its purpose is to prevent the common failure mode of committing to the first approach that seems reasonable.
Duration: 10-30 minutes. Skip threshold: if the implementation approach is obvious and has no meaningful alternatives (e.g., "add a new column to an existing table with an obvious type and no migration concerns"), skip to design or planning.
The Brainstorm Gate (Pre-Mortem Block). When a brainstorm proposal triggers any of the conditions below, the brainstorm phase cannot conclude — and no spec/plan/code work begins — until a Pre-Mortem Block is emitted that addresses the Decision-Cost Rubric (Section 2.7) and three reflection fields.
Trigger conditions (any one fires the gate):
- Adds a new dependency (library, framework, external service)
- Replicates a pattern across 3 or more files / call sites (the pydantic-ai migration hit 8)
- Estimated to change hot-path latency by more than 100 ms (either direction)
- Modifies a public API surface, schema, or data contract
- Spans more than 2 hours of estimated implementation work
- Changes safety policy, guard/refusal behavior, or any end-user-protection mechanism (relaxations AND additions). This is the hardest trigger: the gate cannot be cleared by the proposer alone. The block must end with an explicit human sign-off line (Safety sign-off: <name, date>), and the Pre-Mortem must name the incident class the change could re-open.
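The trigger list can be read as a set of independent boolean checks. The sketch below is one possible encoding, not part of the skill itself: the `Proposal` class, its field names, and the trigger labels are hypothetical, while the thresholds (3 sites, 100 ms, 2 hours) are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical summary of a brainstorm proposal (field names are illustrative)."""
    new_dependencies: int = 0               # libraries, frameworks, external services added
    replicated_sites: int = 0               # files / call sites repeating the same pattern
    latency_delta_ms: float = 0.0           # estimated hot-path change, signed
    touches_public_contract: bool = False   # public API surface, schema, or data contract
    estimated_hours: float = 0.0            # estimated implementation work
    touches_safety_policy: bool = False     # guard/refusal behavior, end-user protection


def fired_triggers(p: Proposal) -> list[str]:
    """Return the labels of every gate condition this proposal fires."""
    checks = [
        ("new dependency", p.new_dependencies > 0),
        ("pattern replicated across 3+ sites", p.replicated_sites >= 3),
        # "Either direction": a large speed-up fires the gate as well as a slow-down.
        ("hot-path latency change > 100 ms", abs(p.latency_delta_ms) > 100),
        ("public API / schema / data contract", p.touches_public_contract),
        ("more than 2 hours of work", p.estimated_hours > 2),
        ("safety policy change", p.touches_safety_policy),
    ]
    return [label for label, fired in checks if fired]


def gate_required(p: Proposal) -> bool:
    """Any single trigger is enough to require a Pre-Mortem Block."""
    return bool(fired_triggers(p))


def needs_human_sign_off(p: Proposal) -> bool:
    """Only the safety trigger requires the explicit human sign-off line."""
    return p.touches_safety_policy
```

Values sitting exactly on a threshold (100 ms, 2 hours) do not fire in this sketch because the list says "more than"; replication fires at exactly 3 because the list says "3 or more".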
Pre-Mortem Block format (exact shape — the structure is the gate):
## Pre-Mortem — <proposal name>
Proposal in one sentence: ...
Triggers fired: <which of the six conditions above>
Rubric axes:
- Latency: <estimate, or "not measured because…">
- Dependency surface: <new deps + transitive deps + lines we own vs. depend on>
- Debuggability: <what a 3am stack trace looks like; who can fix it>
- Reversibility: <hours to undo>
- Blast radius: <code paths affected; additive vs. substitutive>
- Alternative considered: <one credible alternative + one-sentence "why rejected">
Strongest risk I see: <specific, named-component, falsifiable>
What would change my mind: <concrete signal — measurement, benchmark, user report>
Confidence: <low / medium / high, with reason>
The two non-skippable lines are Strongest risk I see and What would change my mind. If those fields read "no significant risks" and "nothing comes to mind," the gate has not been cleared — the brainstorm continues until they can be filled with specifics. Generic risks (complexity, maintenance burden) do not satisfy the format; the field demands a specific failure mode tied to a named system component. For safety-policy triggers a third line is non-skippable: Safety sign-off: <name, date> — without it the gate is not cleared, regardless of who proposed the change.
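The non-skippable rule lends itself to a mechanical first pass. The following is a minimal sketch, assuming the block is available as plain text with one `Label: value` pair per line; the function names and the list of rejected phrases are illustrative. It can only catch empty or boilerplate answers. Whether a risk is truly specific and tied to a named component still needs a reader.

```python
import re

REQUIRED = ("Strongest risk I see", "What would change my mind")
# Illustrative stop-phrases only; the real judgement is made by a human reader.
GENERIC_ANSWERS = ("no significant risks", "nothing comes to mind")


def field_value(block: str, label: str) -> str:
    """Return the text after 'label:' on its own line, or '' if the label is absent."""
    match = re.search(rf"^{re.escape(label)}:[ \t]*(.*)$", block, flags=re.MULTILINE)
    return match.group(1).strip() if match else ""


def gate_problems(block: str, safety_trigger: bool = False) -> list[str]:
    """List the reasons the block does not clear the gate (empty list = passes this check)."""
    labels = REQUIRED + (("Safety sign-off",) if safety_trigger else ())
    problems = []
    for label in labels:
        value = field_value(block, label)
        if not value or value.startswith("<"):
            problems.append(f"{label}: missing or still a template placeholder")
        elif any(phrase in value.lower() for phrase in GENERIC_ANSWERS):
            problems.append(f"{label}: generic answer, needs a specific failure mode")
    return problems


example = """Proposal in one sentence: add a read-through cache in front of the profile service
Strongest risk I see: no significant risks
What would change my mind: p95 latency above 200 ms in the staging benchmark
"""
print(gate_problems(example))
```

An empty result means only that the fields are filled in, not that the reasoning is sound.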
Why this lives at brainstorm specifically. The cost of pivoting compounds non-linearly with phase progression. At brainstorm a pivot costs a sentence. At spec it costs the spec. At plan it costs the plan plus stakeholder re-alignment. At code it costs the code. By the time the rubric fires at commit-message phase, the work is done and the only available action is revert. Firing the rubric at brainstorm changes the question from "did we catch the bad decision" to "did we make the right decision in the first place" — which is where the leverage lives.
Why this is harder than commit-phase enforcement. At brainstorm the proposer (often the human collaborator) is most engaged and most invested in the proposal. In AI-human collaboration, this is also where the AI's training-time gradients toward agreement peak. The visible-artifact design is what compensates: if the AI drifts past the gate without emitting the Pre-Mortem Block, the absence is detectable to the human in real time. The norm is enforceable because a missing artifact is visible, not because the AI remembers to push back.
The case study (pydantic-ai, May 2026) that demonstrates the cost of not having this gate lives in Section 2.7's Evidence paragraph. It is reproduced as a polished narrative in the ZOL project's Docusaurus site under methodology/decision-cost-rubric.md, the showcase used to argue for adoption of this rubric beyond the original project.
Design (/writing-plans). One artifact replaces the former specification + plan pair (template: templates/spec-template.md in the methodology repo). It captures both the reasoning and the ordered task list: audience, problem statement, design decisions table (decision / choice / rationale), the Pre-Mortem Block when the Section 3.1 gate fires, data flow, API contracts, error handling strategy, testing strategy, and the implementation tasks. The 2026-06-11 assessment found that the old dual artifact had produced 3.85 MB of process prose, with specs written 7-30 minutes before their code by the same session (finding PW-4): ceremony, not scrutiny. Trivial changes (single file, no schema/API/safety surface) need only a one-sentence design note in the PR description.
Each task in the artifact specifies: task number and title; dependencies; acceptance criteria; files likely to be created or modified. Tasks are sized for a single subagent session (~60 minutes max; split anything larger). Annotate a task with ‖ parallelisable only when it is genuinely independent (Section 5.4's wave-dispatch consumes the marker).
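To make the task fields concrete, here is a minimal sketch of a task record and one way a dispatcher could group tasks into waves. Section 5.4 is not reproduced here, so the grouping policy below is an assumption for illustration only: ready tasks marked parallelisable run together, and any other ready task runs alone. The `Task` class and `plan_waves` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One entry in the design artifact's task list (illustrative structure)."""
    number: int
    title: str
    depends_on: list[int] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)
    parallelisable: bool = False      # the "‖ parallelisable" annotation
    estimate_minutes: int = 60


def oversized(tasks: list[Task], limit_minutes: int = 60) -> list[int]:
    """Task numbers that exceed a single subagent session and should be split."""
    return [t.number for t in tasks if t.estimate_minutes > limit_minutes]


def plan_waves(tasks: list[Task]) -> list[list[int]]:
    """Group tasks into ordered waves that respect dependencies (assumed policy)."""
    done: set[int] = set()
    pending = sorted(tasks, key=lambda t: t.number)
    waves: list[list[int]] = []
    while pending:
        ready = [t for t in pending if set(t.depends_on) <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        parallel = [t for t in ready if t.parallelisable]
        wave = parallel if parallel else ready[:1]   # unmarked tasks run one at a time
        waves.append([t.number for t in wave])
        done.update(t.number for t in wave)
        pending = [t for t in pending if t.number not in done]
    return waves


tasks = [
    Task(1, "Add schema migration"),
    Task(2, "Repository layer", depends_on=[1], parallelisable=True),
    Task(3, "API handler", depends_on=[1], parallelisable=True),
    Task(4, "End-to-end test", depends_on=[2, 3], estimate_minutes=90),
]
print(plan_waves(tasks))   # [[1], [2, 3], [4]]
print(oversized(tasks))    # [4]
```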
Duration: 30-60 minutes. The artifact is committed to docs/ before any implementation begins.
Second-party scrutiny threshold. Designs touching a safety path, schema, public API, or auth get review by a second party: a human, or a fresh-context agent in a SEPARATE session that has not seen the design being written. Same-session self-review does not count: the assessment found its catch rate indistinguishable from zero, and found "Draft (awaiting user review)" designs merging unreviewed (PW-4). The Plan Walkthrough ritual this threshold replaces had zero recorded completions in three months, and its founding instance was still pending after 7 weeks. Review by fresh context is the form of scrutiny that survives contact with throughput pressure.
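The threshold reduces to two questions: does the design touch a sensitive surface, and does the proposed review count? A small sketch under assumed names (the surface labels and both functions are hypothetical):

```python
# Surfaces named in the threshold; the string labels are illustrative.
SENSITIVE_SURFACES = {"safety", "schema", "public_api", "auth"}


def needs_second_party_review(surfaces_touched: set[str]) -> bool:
    """True when the design touches any surface listed in the threshold."""
    return bool(surfaces_touched & SENSITIVE_SURFACES)


def review_counts(reviewer: str, same_session_as_author: bool) -> bool:
    """A human second party counts; an agent counts only from a separate session."""
    if reviewer == "human":
        return True
    return reviewer == "agent" and not same_session_as_author


print(needs_second_party_review({"auth", "logging"}))        # True
print(review_counts("agent", same_session_as_author=True))   # False
```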
Create Worktree (/using-git-worktrees). Creates a git worktree for the feature branch. Worktrees provide filesystem isolation: the feature branch exists in a separate directory, preventing accidental commits to the wrong branch. The skill handles branch creation, worktree setup, and context transfer (copying relevant memory files to the worktree's context).
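The underlying git operation is `git worktree add -b <branch> <path>`, which creates the branch and checks it out in a separate directory. The sketch below wraps that call and copies context files afterwards. It is not the skill's actual implementation: the function names and the idea that memory files are plain files passed in by the caller are assumptions.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Sequence


def worktree_command(branch: str, worktree_dir: Path) -> list[str]:
    """Build the git invocation that creates a new branch in its own worktree."""
    return ["git", "worktree", "add", "-b", branch, str(worktree_dir)]


def create_worktree(
    repo: Path,
    branch: str,
    worktree_dir: Path,
    memory_files: Sequence[Path] = (),   # hypothetical: context files to carry over
) -> None:
    """Create the worktree, then copy each memory file into its top level."""
    subprocess.run(worktree_command(branch, worktree_dir), cwd=repo, check=True)
    for src in memory_files:
        shutil.copy2(src, worktree_dir / src.name)


print(worktree_command("feature/export-csv", Path("wt-export-csv")))
```

Keeping the command builder separate from the function that runs it makes the invocation easy to inspect or test without touching a repository.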
Implement with TDD (/test-driven-development). The implementation phase uses test-driven development as the default execution model. The skill enforces the red-green-refactor cycle:
- Write a failing test that describes the desired behavior.
- Write the minimal implementation that makes the test pass.
- Refactor the implementation while keeping tests green.
The skill includes an anti-rationalization table — a structured check that prevents the developer or AI from rationalizing why a test failure is "expected" or "not a real problem." Every test failure must either be fixed or explicitly documented as a known limitation.
In PoC mode, the cycle is relaxed: code-first, tests-after is acceptable. The tests are still mandatory; only the ordering is flexible.
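As an illustration of one pass through the cycle, the sketch below uses a hypothetical `slugify` helper and a plain assertion-based test; neither comes from the methodology itself.

```python
# Step 1 (red): the test is written first. Run before slugify() exists,
# it fails with NameError, which is the expected failing state.
def test_slugify_lowercases_and_joins_words():
    assert slugify("The Full Cycle") == "the-full-cycle"


# Step 2 (green): the smallest implementation that makes the test pass.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


# Step 3 (refactor): restructure freely, re-running the test after each change.
test_slugify_lowercases_and_joins_words()
```

In PoC mode the same test would be written after `slugify`, but it would still be written.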
Verify Completion (/verification-before-completion). Before marking any task complete, this skill requires evidence:
- Fresh test output with timestamps
- Coverage report showing threshold compliance
- Linter output with zero new warnings
- For UI changes: visual verification
This is the skill that implements the "evidence over claims" principle (Section 2.3) at the task level.
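The evidence list can be expressed as a checklist that returns what is still missing. This is a sketch under stated assumptions: the `Evidence` record is hypothetical, and the 80% coverage threshold and 30-minute freshness window are placeholder values, since the text above does not specify either.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Evidence:
    """Hypothetical record of what a task submits before it is marked complete."""
    tests_passed: bool
    test_run_at: datetime            # timestamp taken from the test output
    coverage_percent: float
    new_linter_warnings: int
    is_ui_change: bool = False
    visually_verified: bool = False


def missing_evidence(
    e: Evidence,
    now: datetime,
    coverage_threshold: float = 80.0,              # assumed value
    max_age: timedelta = timedelta(minutes=30),    # assumed meaning of "fresh"
) -> list[str]:
    """Return the gaps that block completion; an empty list means all evidence is present."""
    gaps = []
    if not e.tests_passed:
        gaps.append("tests are not passing")
    if now - e.test_run_at > max_age:
        gaps.append("test output is stale")
    if e.coverage_percent < coverage_threshold:
        gaps.append("coverage below threshold")
    if e.new_linter_warnings > 0:
        gaps.append("new linter warnings")
    if e.is_ui_change and not e.visually_verified:
        gaps.append("UI change lacks visual verification")
    return gaps


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
report = Evidence(True, now - timedelta(hours=2), 91.0, 0)
print(missing_evidence(report, now))   # ['test output is stale']
```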
Code Review (/requesting-code-review). Two-stage review process (detailed in Section 3.3). First stage: spec compliance (does the implementation match the design?). Second stage: code quality (are patterns correct, tests sufficient, edge cases handled?).
Finish & Merge (/finishing-a-development-branch). Merges the feature branch, cleans up the worktree, and updates any memory files that should reflect the new state of the system. The skill ensures that documentation was updated alongside code (documentation commit pattern) and that the merge is clean.