
Note (canon v3): the operational content of this appendix is carried by the s4u-product-scale-planning skill — agents load THAT; this appendix remains the full reference.

Appendix L: Product-Scale Planning + Cross-Consistency Review

This appendix expands Section 5.6 of the main methodology document with the concrete process for product-scale planning, the cross-consistency review checklist, and the bounded-autonomy checkpoint shape. The three load-bearing rules — plan-at-the-right-resolution-per-distance, cross-consistency-as-discrete-phase, bounded-autonomy-checkpoints — live in §5.6 and are the canonical reference. This appendix is where the process earns its keep through specifics.

Why this is in its own appendix

Product-scale planning is conceptually a small extension to per-milestone planning (§3.1) but operationally a meaningful shift: it moves the human from "between every milestone" to "between every milestone-cluster." The process steps and review checklist need to be specific enough that anyone running the methodology can execute them mechanically — not "design and review your plans," but "apply this checklist to these inputs and produce this output."

The Process: From Empty Project to /loop-Ready

The product-scale planning pass produces six artifacts, in order:

  1. Product brief / PRD — single document. The "what are we building" anchor. Already part of the methodology (§3.1's design-before-code rule).
  2. ADR set — every architectural decision the product needs. Includes mandatory ADR-0001 (tech stack, references the canonical stack from §4.5 + appendix-m), plus per-product-area ADRs (data model, multi-tenancy, payment routing, etc.). Each ADR is independently reviewable; the cross-consistency review pass below verifies they cohere.
  3. Milestone plan set — one plan document per milestone, drafted at the resolution-per-distance described in §5.6 rule 1. The current milestone gets a fully-walked-through plan; near-future milestones get wave structure + acceptance criteria; far-future milestones get sketches.
  4. Roadmap — orders the milestones, lists their dependencies on each other, marks the bounded-autonomy checkpoint between each milestone pair.
  5. Cross-consistency review report — the output of running the review checklist below against artifacts 1-4. Either "all clear, dispatch authorised" or "N issues, fix and re-review."
  6. STATE.md — initialised with the project's starting position. Updated per-milestone-shipped during /loop execution.

Each artifact is independently authored (or reviewed) by the human; /loop only kicks off after all six exist and the cross-consistency review report says "all clear."
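The dispatch gate described above can be sketched as a small predicate. This is a minimal illustration, not part of the methodology's tooling: the artifact keys, the function name `loop_dispatch_allowed`, and the convention that the report's verdict line starts with "all clear" are assumptions made for the example.

```python
# Hypothetical short keys for the six artifacts listed above.
ARTIFACTS = ("prd", "adr_set", "milestone_plans", "roadmap", "review_report", "state")


def loop_dispatch_allowed(present, report_verdict):
    """Return (allowed, reasons). `present` maps artifact key -> bool."""
    reasons = [f"missing artifact: {name}" for name in ARTIFACTS
               if not present.get(name, False)]
    # Assumption: the report's verdict line begins with "all clear" when clean.
    if not report_verdict.strip().lower().startswith("all clear"):
        reasons.append(f"review report not clear: {report_verdict.strip()!r}")
    return (not reasons, reasons)
```

Returning the reasons alongside the boolean keeps the gate explainable: a blocked dispatch says which artifact or verdict blocked it.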

The Cross-Consistency Review Checklist

Run this checklist as a discrete phase — not ambient verification during execution. Output a written report (Markdown, committed to docs/reviews/cross-consistency-{date}.md) listing every check performed and its result. The report's existence is the gate for /loop dispatch.

Check A — ADR↔plan↔baseline-migration cross-reference integrity (mechanical)

Extended in v2.1.2 (2026-04-30) to include the schema-name leg. Extended in v2.1.3 (2026-05-05) to include column nullability, frontend filesystem prerequisites, and schema/enum constraints on test-data shapes. The original ADR↔ADR scope missed Ratiba M6's four schema-vs-plan divergences and Ratiba M9's three plan-vs-reality discoveries at dispatch time; the extended grep would have caught all of them at plan-freeze.

Inputs: every accepted ADR in docs/adr/, every milestone plan in docs/superpowers/plans/, every existing migration in alembic/versions/, the frontend scaffold filesystem (if applicable), and the schema/enum definitions of any test framework that plans claim to populate.

ADR cross-references:

  • A.1: List every other ADR referenced in any ADR body (e.g., "per ADR-0003"). Verify each referenced ADR exists in the same docs/adr/ directory.
  • A.2: List every "deferred to ADR-NNNN" or "to be decided in ADR-NNNN" marker. Verify the target ADR exists and is in Accepted status. A deferred decision pointing at a Draft ADR is a yellow flag (acceptable if the draft is being held for legitimate reasons, e.g., privacy ADR awaiting GDPR review); a deferred decision pointing at a non-existent ADR is a red flag.
  • A.3: Identify "supersedes" / "superseded by" pairs. Verify both sides reference each other and the superseded one is in Superseded status.
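As a rough sketch of how A.1 through A.3 might be mechanised, the function below works on an in-memory map of ADRs. The input shape (`{"0001": {"status": ..., "body": ...}}`), the four-digit ADR numbering, and the exact marker phrasings matched by the regular expressions are assumptions; a real script would first load these from docs/adr/.

```python
import re

ADR_REF = re.compile(r"\bADR-(\d{4})\b")
DEFERRED = re.compile(r"(?:deferred to|to be decided in) ADR-(\d{4})", re.I)
SUPERSEDES = re.compile(r"\bsupersedes ADR-(\d{4})", re.I)
SUPERSEDED_BY = re.compile(r"\bsuperseded by ADR-(\d{4})", re.I)


def check_adrs(adrs):
    """adrs: {'0001': {'status': 'Accepted', 'body': '...'}} (assumed shape)."""
    findings = []
    for adr_id in sorted(adrs):
        body = adrs[adr_id]["body"]
        # A.1: every referenced ADR must exist.
        for ref in sorted(set(ADR_REF.findall(body))):
            if ref not in adrs:
                findings.append(f"A.1 ADR-{adr_id}: references missing ADR-{ref}")
        # A.2: deferral targets must exist (red) and be Accepted (yellow).
        for ref in sorted(set(DEFERRED.findall(body))):
            if ref not in adrs:
                findings.append(f"A.2 RED ADR-{adr_id}: defers to missing ADR-{ref}")
            elif adrs[ref]["status"] != "Accepted":
                findings.append(f"A.2 YELLOW ADR-{adr_id}: defers to "
                                f"{adrs[ref]['status']} ADR-{ref}")
        # A.3: supersedes pairs must be mutual, and the old ADR Superseded.
        for old in sorted(set(SUPERSEDES.findall(body))):
            if old not in adrs:
                continue  # already reported under A.1
            if adr_id not in SUPERSEDED_BY.findall(adrs[old]["body"]):
                findings.append(f"A.3 ADR-{old}: no 'superseded by ADR-{adr_id}'")
            if adrs[old]["status"] != "Superseded":
                findings.append(f"A.3 ADR-{old}: status is {adrs[old]['status']}")
        for new in sorted(set(SUPERSEDED_BY.findall(body))):
            if new in adrs and adr_id not in SUPERSEDES.findall(adrs[new]["body"]):
                findings.append(f"A.3 ADR-{new}: no 'supersedes ADR-{adr_id}'")
    return findings
```

The yellow/red distinction from A.2 is kept in the finding text so the reviewer can decide whether a Draft target is being held for a legitimate reason.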

Schema-name cross-references (added v2.1.2):

  • A.4: For each plan, extract every fully-qualified schema name it references — <schema>.<table>.<column> patterns (e.g., public.tenants.payment_enabled, <tenant>.payments.status, payment_routing.expires_at). Build a "claimed schema names" set per plan.
  • A.5: For each claimed schema name, check whether it exists in any baseline migration under alembic/versions/ or whether the same plan contains an explicit migration task that creates it. A name claimed by plan N that is neither in an existing migration nor produced by a migration task within plan N is a red flag.
  • A.6: For each migration filename pinned in any plan (0004_X.py, 0005_Y.py), check whether that exact filename already exists. A pinned filename collision is a red flag — plans should refer to migrations by purpose, never by filename (see also Section 5.6 rule 2(d)).

Column nullability + filesystem + test-data shape cross-references (added v2.1.3):

  • A.7: For each schema column referenced in plan logic with a nullability assumption (e.g., "filter WHERE phone_number IS NOT NULL", "INSERT skipping the optional phone_number column"), verify the column's actual NULL/NOT NULL shape against the migration. Codified after Ratiba M9 T3 — plan brief assumed tenant_admins.phone_number IS NOT NULL filter was meaningful, but the column is NOT NULL (so the filter is a no-op; subagent had to use is_active=TRUE as proxy).
  • A.8: For each plan-claimed file path under a directory that requires scaffolding (e.g., frontend/src/... requires create-next-app; docs/site/... requires Docusaurus init; infra/k8s/... requires Helm chart), verify the parent scaffold exists. If frontend/ is empty except for .env.example, a plan listing frontend/src/app/admin/chat/page.tsx is making a scaffold assumption that won't survive dispatch. Codified after Ratiba M9 T6 — plan listed Next.js paths as if scaffolded; reality forced a T6 split into T6a (bootstrap) + T6b (chat panel).
  • A.9: For each plan-claimed test scenario or fixture, verify it fits the schema/enum constraints of the test framework's data shape. If a plan adds "admin command scenarios" to a calibration set whose schema is Literal["customer", "agent"] for the role field, the scenarios literally can't be authored against that schema. Codified after Ratiba M9 T10 — plan listed admin command scenarios for the customer-facing calibration set; required scope correction at dispatch time.
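Checks A.8 and A.9 lend themselves to small mechanical probes. The sketch below assumes two things that the appendix does not specify: that each scaffolded directory can be recognised by a marker file (for example, `package.json` under `frontend/`), and that the test framework's role field is available as a `Literal` type at review time. The `Role` type mirrors the A.9 example; the function names are hypothetical.

```python
from pathlib import Path
from typing import Literal, get_args

# Mirrors the A.9 example schema for the calibration set's role field.
Role = Literal["customer", "agent"]


def roles_outside_schema(scenario_roles, role_type=Role):
    """A.9: return planned roles the Literal schema cannot represent."""
    allowed = set(get_args(role_type))
    return sorted(set(scenario_roles) - allowed)


def paths_missing_scaffold(repo_root, claimed_paths, scaffold_markers):
    """A.8: return plan-claimed paths whose top-level scaffold is absent.

    scaffold_markers maps a top-level directory to a file whose presence
    is taken as proof of scaffolding (an assumption of this sketch).
    """
    root = Path(repo_root)
    missing = []
    for claimed in claimed_paths:
        top = Path(claimed).parts[0]
        marker = scaffold_markers.get(top)
        if marker is not None and not (root / top / marker).exists():
            missing.append(claimed)
    return missing
```

For the Ratiba M9 T10 case, `roles_outside_schema(["customer", "admin"])` would report `admin` as unrepresentable, which is the signal to correct scope before dispatch rather than at it.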

Tooling: a Python script that parses each plan markdown for backtick-quoted schema names, migration filenames, file paths, and test-data shapes; cross-references against ls alembic/versions/, grep -r "CREATE TABLE\|ADD COLUMN" alembic/versions/, the project's filesystem scaffold (parent-dir-exists checks for plan-claimed paths), and the test framework's Pydantic / TypedDict / Literal schemas. The script's output: per-plan list of unsatisfied references across all 9 sub-checks.
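A minimal sketch of the A.4 through A.6 portion of such a script follows. It assumes the caller has already built the set of `table.column` names present in the baseline migrations (for instance from the grep above) and the set created by the plan's own migration tasks; the regular expressions and function names are illustrative. Note that any backtick-quoted dotted identifier is treated as a schema-name candidate, so module paths or file names with one dot would need an exclusion list in a real script.

```python
import re

QUOTED = re.compile(r"`([^`\n]+)`")
# Two or three dotted parts; the first may be a placeholder such as <tenant>.
SCHEMA_NAME = re.compile(r"^[A-Za-z_<][\w<>]*(?:\.[A-Za-z_]\w*){1,2}$")
MIGRATION_FILE = re.compile(r"^\d{4}_\w+\.py$")


def _table_column(name):
    """Reduce schema.table.column (or table.column) to table.column."""
    return ".".join(name.split(".")[-2:])


def unsatisfied_schema_names(plan_md, baseline, created_by_plan):
    """A.4 + A.5: claimed names found neither in baseline nor in this plan."""
    claimed = {q for q in QUOTED.findall(plan_md) if SCHEMA_NAME.match(q)}
    known = set(baseline) | set(created_by_plan)
    return sorted(n for n in claimed if _table_column(n) not in known)


def pinned_filename_collisions(plan_md, existing_files):
    """A.6: migration filenames pinned in the plan that already exist."""
    pinned = {q for q in QUOTED.findall(plan_md) if MIGRATION_FILE.match(q)}
    return sorted(pinned & set(existing_files))
```

Matching on `table.column` rather than the fully qualified name is a deliberate simplification so that per-tenant schema placeholders such as `<tenant>` do not cause false flags.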

Check B — Plan capability handoffs (mechanical)

Inputs: every milestone plan in docs/superpowers/plans/.

For each plan:

  • B.1: Extract the "Files to create / modify" section. The set of new files a plan creates is its produced capability surface.
  • B.2: For each subsequent plan, scan its task descriptions and code snippets for imports of any path. If plan N+k imports a path that no earlier plan (M0..N+k-1) produces, flag it.
  • B.3: Special case: forward references to "TBD in milestone X" or "deferred to plan Y" are tracked. Verify each forward-reference target plan exists.

Tooling: a Python script that parses the plan markdown ("Files to create / modify" is a stable section header), builds a "produced capabilities" map, then for each plan does grep -E '^from app\.|^import app\.' and checks each import against the produced-by-prior-plans map. Output: a CSV of unsatisfied imports.
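The core of Check B can be sketched as follows, under stated assumptions: plans arrive already parsed as an ordered list of `(name, files_created, text)` tuples, a module such as `app.llm.router` is considered produced if either `app/llm/router.py` or `app/llm/router/__init__.py` was created, and a plan may import modules it creates itself. Writing the rows out as CSV is left to the caller.

```python
import re

# Same shape as the grep pattern above, capturing the dotted module path.
IMPORT = re.compile(r"^(?:from|import)\s+(app(?:\.\w+)+)", re.MULTILINE)


def module_files(module):
    """Candidate files that would satisfy an import of `module`."""
    base = module.replace(".", "/")
    return {f"{base}.py", f"{base}/__init__.py"}


def unsatisfied_imports(plans):
    """plans: ordered list of (plan_name, files_created, plan_text).

    Returns (plan_name, module) rows for imports no plan up to and
    including the importing plan produces.
    """
    produced = set()
    rows = []
    for name, files_created, text in plans:
        # Assumption: a plan may import what it creates itself.
        produced |= set(files_created)
        for module in sorted(set(IMPORT.findall(text))):
            if not (module_files(module) & produced):
                rows.append((name, module))
    return rows
```

Because `produced` only grows as the list is walked in milestone order, an import of something a later plan creates is still flagged, which is the ordering property B.2 asks for.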

Check C — Decision contradictions across plans (judgment)

Inputs: every milestone plan + every accepted ADR.

This check is a judgment call. The pattern that catches contradictions:

  • C.1: Read every plan's "Review-resolved decisions" sections (if the design received second-party review per §3.1) and every ADR's "Decision" section. List the load-bearing decisions per plan/ADR.
  • C.2: Look for the same property decided differently in two places. Examples from real Ratiba history: ADR-0001 said "360dialog BSP" → ADR-0008 said "Cloud API direct" (resolved by superseding); plan v2 said "soft-fail on cost ceiling" → plan v3 said "M5 ships SOFT-only, hard deferred" (consistent — same call expressed two ways). The latter is fine; the former requires either the supersedes pair OR a contradiction flag.
  • C.3: Look for properties that should logically agree but don't. Examples: plan N says "use Redis SETNX with 30s TTL" + plan N+1 says "Redis SETNX with 60s TTL" — pick one or document why they differ.

Tooling: this check relies on human judgment; the report should list every load-bearing decision found and pair contradictions explicitly. No script can catch "the schema field name was tenant_id in plan 3 and tenantId in plan 5" as a meaningful contradiction without semantic understanding.
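A script cannot make the judgment, but it can help with the bookkeeping once the reviewer has written decisions down. The sketch below assumes the reviewer records each load-bearing decision as a `(source, property, value)` tuple with property names they have normalised by hand; the function only groups them and surfaces properties with more than one distinct value. Deciding whether two values are the same call expressed two ways remains with the human.

```python
from collections import defaultdict


def pair_contradictions(decisions):
    """decisions: iterable of (source, property, value) tuples.

    Returns {property: {source: value}} for every property that was
    decided with more than one distinct value.
    """
    by_property = defaultdict(dict)
    for source, prop, value in decisions:
        by_property[prop][source] = value
    return {
        prop: dict(by_source)
        for prop, by_source in by_property.items()
        if len(set(by_source.values())) > 1
    }
```

With the C.3 example recorded as `("plan N", "redis_setnx_ttl", "30s")` and `("plan N+1", "redis_setnx_ttl", "60s")`, the result pairs the two plans under one property so the report can state which value wins or why they differ.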

Check D — Canonical stack drift (mechanical)

Inputs: project's ADR-0001 + the canonical stack from §4.5 + appendix-m.

For each library in the canonical stack:

  • D.1: If mandatory and the project's ADR-0001 doesn't include it, that's a flag — the project must explicitly accept or override the mandatory item.
  • D.2: If default and the project picks something else, ADR-0001 must have a "deviation rationale" section for that swap.
  • D.3: If forbidden and any project plan references the forbidden library, that's a hard fail.

Tooling: a roughly 50-line shell script that diffs the project's ADR-0001 stack section against the canonical-stack list.
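The D.1 through D.3 logic can be illustrated as below. Python is used here only to keep one language across this appendix's examples; the tooling named above is a shell script. The data shapes are assumptions: the canonical stack as `(role, library, level)` triples, the project's ADR-0001 choices as a role-to-library map, and deviation rationales as a set of roles. Library names such as `lib-a` are placeholders, not entries from the real canonical stack, and the forbidden-library scan is a crude substring match.

```python
def stack_drift(canonical, project_stack, deviation_roles, plan_texts):
    """Return D.1/D.2/D.3 findings.

    canonical: list of (role, library, level), level in
               {"mandatory", "default", "forbidden"} (assumed shape).
    project_stack: {role: library} taken from the project's ADR-0001.
    deviation_roles: roles for which ADR-0001 has a deviation rationale.
    plan_texts: {plan_name: plan_markdown}.
    """
    findings = []
    for role, library, level in canonical:
        chosen = project_stack.get(role)
        if level == "mandatory" and chosen != library:
            findings.append(f"D.1 flag: mandatory {library} ({role}) not in ADR-0001")
        elif level == "default":
            swapped = chosen is not None and chosen != library
            if swapped and role not in deviation_roles:
                findings.append(
                    f"D.2 flag: {role} uses {chosen} instead of {library} "
                    "without a deviation rationale")
        elif level == "forbidden":
            for plan_name in sorted(plan_texts):
                # Crude case-insensitive substring scan (sketch only).
                if library.lower() in plan_texts[plan_name].lower():
                    findings.append(f"D.3 HARD FAIL: {plan_name} references {library}")
    return findings
```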

Check E — Roadmap dependency consistency (mechanical)

Inputs: ROADMAP.md + the milestone plan set.

  • E.1: Roadmap milestone order matches plan dependency order. If plan M5 imports from plan M3, the roadmap must list M3 before M5.
  • E.2: Bounded-autonomy checkpoints are explicitly marked between milestones in the roadmap (e.g., a horizontal rule + "Checkpoint: review M3 discoveries before M4 dispatch").
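A sketch of both roadmap checks, assuming the roadmap has already been parsed into an ordered list of milestone names plus the set of adjacent pairs that carry an explicit checkpoint marker, and that plan dependencies are available as a map from each milestone to the milestones it imports from. Parsing ROADMAP.md into these shapes is not shown.

```python
def roadmap_findings(order, depends_on, checkpoints):
    """order: milestone names in roadmap order.
    depends_on: {milestone: set of milestones it imports from}.
    checkpoints: set of (before, after) pairs marked in the roadmap.
    """
    position = {name: index for index, name in enumerate(order)}
    findings = []
    # E.1: every dependency must appear earlier in the roadmap.
    for milestone in sorted(depends_on):
        for dep in sorted(depends_on[milestone]):
            if milestone not in position or dep not in position:
                findings.append(f"E.1 {milestone} -> {dep}: not both on roadmap")
            elif position[dep] >= position[milestone]:
                findings.append(f"E.1 {milestone} depends on {dep}, listed later")
    # E.2: each adjacent milestone pair needs an explicit checkpoint.
    for before, after in zip(order, order[1:]):
        if (before, after) not in checkpoints:
            findings.append(f"E.2 no checkpoint between {before} and {after}")
    return findings
```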

The Bounded-Autonomy Checkpoint Shape

At each checkpoint (between milestones), /loop pauses and produces a structured prompt for the human:

# Checkpoint: M{N} → M{N+1}

## What landed in M{N}
- {commit list with one-line summaries}
- {test count delta}
- {coverage delta}

## Surprises pinned during M{N}
- {list of surprises that went into the close-out memo}

## What M{N+1} plan assumes
- {the load-bearing assumptions M{N+1} makes about the state at this point}

## Question
Did anything in M{N} invalidate the assumptions M{N+1} is making?

## Options
- **Proceed** — M{N+1} plan is still valid; dispatch Wave 1 of M{N+1}
- **Amend** — M{N+1} plan needs revision; pause loop, re-review M{N+1}
- **Halt** — fundamental discovery; suspend product-scale execution, return to ad-hoc planning

Typical answer: "proceed" (5 minutes). Sometimes: "amend" (15-30 minutes of plan revision before re-dispatch). Rarely: "halt" (the discovery is fundamental enough to break product-scale assumptions; revert to per-milestone planning until the discovery is reconciled).
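One way the prompt and its three answers might be wired together is sketched below. The function names and the mapping from answer to next action are illustrative assumptions drawn from the option descriptions above; how /loop actually produces the prompt is not specified here. The rendered option lines carry only the option names, leaving the descriptions to the template.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    AMEND = "amend"
    HALT = "halt"


def render_checkpoint(n, landed, surprises, assumptions):
    """Build the checkpoint prompt for the boundary between M{n} and M{n+1}."""
    def section(title, items):
        return [f"## {title}"] + [f"- {item}" for item in items] + [""]

    lines = [f"# Checkpoint: M{n} → M{n + 1}", ""]
    lines += section(f"What landed in M{n}", landed)
    lines += section(f"Surprises pinned during M{n}", surprises)
    lines += section(f"What M{n + 1} plan assumes", assumptions)
    lines += ["## Question",
              f"Did anything in M{n} invalidate the assumptions M{n + 1} is making?",
              ""]
    lines += ["## Options"] + [f"- **{d.value.capitalize()}**" for d in Decision]
    return "\n".join(lines)


def next_action(answer, n):
    """Map the human's answer to the loop's next step (raises on other input)."""
    decision = Decision(answer.strip().lower())
    if decision is Decision.PROCEED:
        return f"dispatch Wave 1 of M{n + 1}"
    if decision is Decision.AMEND:
        return f"pause loop, re-review M{n + 1} plan"
    return "suspend product-scale execution"
```

Rejecting any answer outside the three options, rather than defaulting to proceed, keeps the checkpoint a real gate.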

Worked Example: Ratiba M5 → M6 (planned)

This is a planning-time worked example, not a post-execution one. Ratiba's M5 plan v3 is currently dispatching (Wave 1 in flight). M6 is "M-Pesa STK push integration." Product-scale planning would draft the M6 plan now (not after M5 closes) and run cross-consistency review.

M6 plan sketch (resolution per §5.6 rule 1, since M6 is current+1):

  • Wave structure: 5-6 tasks across 2-3 waves.
  • Acceptance criteria: per ADR-0007 (payments orchestration). PesaPal exclusively for cards, M-Pesa direct via Daraja, 8min/30min nudge/abandon timing, daily 3am EAT consolidated reaper, dead-letter table.
  • Imports from M5: app.orchestrator.booking_graph, app.persistence.tenant_scoped_saver, app.llm.router. All produced by M5 plans.
  • Imports from M3: app.tenancy.context.current_tenant. Produced.

Cross-consistency review against M3+M4+M5 (Check B):

  • ✅ Every import M6 needs is produced by an earlier milestone.
  • ⚠️ M6 references app.orchestrator.booking_graph.PAYMENT_PENDING state — M5 plan v3 lists BookingState.fsm_state enum but doesn't enumerate PAYMENT_PENDING. Either M5 plan v3 needs to add it (and we're discovering a missing field at planning time, not at execution time) OR M6 needs to explicitly add the state via an FSM amendment task. Flag this explicitly.

Bounded-autonomy checkpoint between M5 and M6:

  • "Did M5's FSM state model include payment-related states?" — if yes, proceed; if no, amend M6 plan to start with a state-extension task.

This is the kind of cross-cutting concern that front-loaded planning surfaces before /loop dispatches anything, rather than after M5 closes and M6 hits a missing state at task 1.

Summary

Product-scale planning is the discipline of producing all six artifacts — PRD, ADR set, milestone plan set, roadmap, cross-consistency review, STATE.md — before any subagent dispatches Wave 1 of milestone 1. The cost is 1-2 days of upfront planning effort; the benefit is multi-day autonomous /loop execution with 5-minute checkpoints between milestones. The cross-consistency review checklist (A-E) is the gate for /loop authorisation; the bounded-autonomy checkpoint shape is the human contract that surfaces discovery-during-execution at the right boundaries.

This is not "no humans in the loop." It is "humans at the boundaries that matter, machines through the territory between."