Product-Scale Planning + Cross-Consistency Review
Added in methodology v2.1 (2026-04-28). Amended in v2.1.2 (2026-04-30): Check (a) extended from ADR↔ADR to ADR↔plan↔baseline-migration cross-grep; Check (d) forbids migration filename pinning; Rule 4 added (schema-invariant tests pin behavior, not raw counts). Amended in v2.1.3 (2026-05-05): Checks (e) column nullability, (f) frontend filesystem prerequisites, and (g) schema/enum constraints on test-data shapes added. The v2.1.3 amendments emerged from Ratiba M9.
The /loop pattern (Section 5.5) gave us autonomous execution within a milestone. This section gives us autonomous execution across milestones: multi-plan upfront drafting, cross-consistency review as a discrete phase, then /loop dispatching the whole sequence with explicit human checkpoints between milestones.
What: Product-scale planning is the practice of front-loading the architectural decisions and implementation plans for every known milestone of a product before /loop begins dispatching the first one. Instead of designing M3, building M3, then designing M4, then building M4, the project produces the plans for M3 through Mn (in decreasing detail the further ahead they sit) plus the ADR set that gates them, then runs a cross-consistency review pass before any subagent dispatches Wave 1 of M3. Once the review passes, /loop autonomously runs M3 → M4 → M5 → ... with bounded human checkpoints (typically a 5-minute review) between milestones.
Why: Plan-build-plan-build is wasteful in two distinct ways. First, the per-milestone planning cycle takes 30-90 minutes of high-judgment human attention each time (brainstorm → design draft → review → revision). Across an n-milestone product, that is roughly n × 60 minutes of context-switching. Second, cross-cutting concerns surface late: a decision in M5 reveals that M3's schema needs a field, which requires an ADR amendment, an M3 follow-up commit, and memory of why. Front-loading surfaces those cross-cutting concerns at planning time, when amendments are cheap edits rather than data migrations.
The compounding benefit is that /loop becomes meaningful at product scale. With per-milestone planning the human is on the critical path between milestones; the loop terminates at the milestone boundary. With product-scale planning the loop persists across boundaries, only halting at the explicit checkpoints. A two-week product becomes a multi-day autonomous run with an hour of upfront planning effort and 5-minute review checkpoints.
Four load-bearing rules:
- Plan at the right resolution per milestone distance. M{current} gets a fully reviewed design with task-level deliverables and `‖ parallelisable` annotations. M{current+1} gets a plan with wave structure, task names, and acceptance criteria, with task internals deferred until design-review time. M{current+2..n} get a plan sketch: milestone goal, expected dependencies on prior milestones, expected ADRs gating it, expected wave count. Detail decreases with distance because the cost of churn on far-future plans is high (every M{current} discovery may force a far-future plan rewrite). Trying to plan M{current+5} at the same fidelity as M{current} is the over-spec failure mode this rule exists to prevent.
- Cross-consistency review is a discrete phase, not an ambient property. After all plans are drafted, and before any subagent dispatches the first task, run a single review pass with seven checks:
  - (a) ADR↔plan↔baseline-migration cross-references (extended in v2.1.2): every ADR referenced from another ADR or plan must exist; every "deferred to ADR-NNNN" marker must point at a real, accepted ADR; and every schema name (table, column, constraint) mentioned in any plan must appear either in the existing baseline migrations or in an explicit migration task within the same plan. The third leg of the cross-grep was added because Ratiba M6 hit four schema-vs-plan divergences at dispatch time; a mechanical schema-name grep during cross-consistency review would have surfaced all four.
  - (b) Plan↔plan capability handoffs: if plan N+1 imports `app.frob.bar`, plan N (or earlier) must deliver `app.frob.bar`. List every cross-plan import and verify that the producer plan exists.
  - (c) No contradictory decisions across plans: if plan N says X is JSONB and plan M says X is a foreign key, pick one and amend the other.
  - (d) Migration filename pinning is forbidden (v2.1.2): plans refer to migrations by purpose ("the payment_callbacks_unrouted migration"), never by pinned filename. Filename collisions are common when an earlier task ships a different filename than the plan anticipated.
  - (e) Column nullability cross-grep (v2.1.3, added 2026-05-05): for each schema column referenced in plan logic (e.g., "filter `WHERE phone_number IS NOT NULL`"), verify the column's actual `NULL`/`NOT NULL` shape against the migration. Codified after Ratiba M9 T3: the plan brief said "filter on `tenant_admins.phone_number IS NOT NULL`", but the column is already `NOT NULL`, so the filter can never exclude a row.
  - (f) Frontend filesystem prerequisites (v2.1.3): for every plan-claimed frontend file path (e.g., `frontend/src/app/admin/chat/page.tsx`), verify the parent scaffold exists. Codified after Ratiba M9 T6: the plan listed the file as if Next.js were scaffolded; in reality `frontend/` was an empty directory. Extends to all "scaffold-assumed" file paths beyond frontend (e.g., `docs/site/...` for Docusaurus, `infra/...` for IaC).
  - (g) Schema/enum constraints on test-data shapes (v2.1.3): for every plan-claimed test scenario or fixture, verify it fits the schema/enum constraints of the test framework's data shape. Codified after Ratiba M9 T10: the plan listed admin command scenarios that could not fit `TurnSpec.role: Literal["customer", "agent"]`.

  All three v2.1.3 checks are mechanical greps. The review's output is either "all clear, dispatch" or "N issues, fix and re-review." Checks (a), (b), (d), (e), (f), and (g) are mechanical; (c) is a judgment call, but the review forces the judgment to happen before execution rather than during.
- Bounded autonomy: explicit checkpoints between milestones. Autonomy doesn't mean "no humans"; it means "no humans for known territory." Between milestone N close and milestone N+1 first dispatch, /loop pauses for an explicit human checkpoint with one question: "did we discover anything in M{N} that invalidates the plan for M{N+1}?" Typical answer: no, proceed (5 minutes). Sometimes: yes, plan N+1 needs an amendment (15-30 minutes of plan revision). Either way the checkpoint surfaces the discovery-during-execution risk that purely autonomous systems suppress. Without it, an M3 discovery silently invalidates an M5 plan that runs to completion before anyone notices.
- Schema-invariant tests pin behavior, not raw counts (v2.1.2). Tests that pin a single source of truth (expected column sets, scenario counts, downgrade walks) are valuable but brittle when written as raw integers. A column add to `public.tenants` should not require updating three unrelated test files. Two specific prescriptions: (a) Downgrade walks use an explicit revision id, never a `-N` count: `alembic downgrade ${m3_baseline_rev}` survives every future migration; `alembic downgrade -3` breaks on every additive migration. (b) Count assertions derive from behavior, not raw integers: prefer `assert all(s.final.state for s in scenarios)` ("every scenario has a non-empty final state") over `assert len(scenarios) == 10` ("exactly ten scenarios, fail loudly on every addition"). The behavior-based form catches genuine drift (a malformed scenario) while tolerating intentional additions. Codified after Ratiba M6 T11 added one column and broke three invariant tests; the fixes were trivial, but the breakage was noise that the prescription would have suppressed.
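The behavior-versus-count prescription in the last rule can be sketched as follows. This is a minimal illustration, not the project's test suite: `Scenario`, `FinalTurn`, and `check_scenarios` are hypothetical names, and the only thing taken from the text is the shape `s.final.state`.

```python
from dataclasses import dataclass


@dataclass
class FinalTurn:
    state: str  # hypothetical: an empty string means no final state was recorded


@dataclass
class Scenario:
    name: str
    final: FinalTurn


def check_scenarios(scenarios: list[Scenario]) -> None:
    """Behavior-based invariant: every scenario ends in a non-empty state."""
    assert scenarios, "expected at least one scenario"
    missing = [s.name for s in scenarios if not s.final.state]
    assert not missing, f"scenarios without a final state: {missing}"


scenarios = [Scenario(f"s{i}", FinalTurn("closed")) for i in range(10)]
check_scenarios(scenarios)

# An intentional addition passes, where `len(scenarios) == 10` would fail.
scenarios.append(Scenario("s10", FinalTurn("escalated")))
check_scenarios(scenarios)

# A malformed scenario is genuine drift and is still caught.
scenarios.append(Scenario("broken", FinalTurn("")))
drift_detected = ""
try:
    check_scenarios(scenarios)
except AssertionError as exc:
    drift_detected = str(exc)
```

Naming the offending scenarios in the failure message keeps the test useful as a diagnostic, which a bare integer comparison does not.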
Evidence: Ratiba.chat is the first S4U project to attempt product-scale planning (planned for the M5+M6 transition; M3 and M4 used per-milestone planning). The cross-consistency review machinery is mechanically tractable: ADR cross-references can be checked by grep plus a list of accepted ADR numbers; plan capability handoffs can be checked by parsing each plan's "Files to create" section against later plans' import statements. The bounded-autonomy checkpoint is the explicit acknowledgment that real engineering reveals decisions plans cannot anticipate, consistent with the methodology's commitment to the discovery-during-execution model from Section 3 (development lifecycle). The detailed process, checklist, and worked example are in appendix-l-product-scale-planning.md.
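As a rough illustration of the "grep plus a list of accepted ADR numbers" check, the sketch below scans plan text for ADR references and reports any that are not in the accepted set. It assumes references take the four-digit form `ADR-NNNN`; the file names, ADR numbers, and the `dangling_adr_refs` helper are hypothetical.

```python
import re

# Assumption: ADR references look like "ADR-0007" (four digits).
ADR_REF = re.compile(r"ADR-(\d{4})")


def dangling_adr_refs(
    documents: dict[str, str], accepted: set[str]
) -> dict[str, list[str]]:
    """Map each document name to the cited ADR numbers that are not accepted."""
    problems: dict[str, list[str]] = {}
    for name, text in documents.items():
        cited = sorted(set(ADR_REF.findall(text)))
        missing = [num for num in cited if num not in accepted]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical inputs, for illustration only.
accepted_adrs = {"0007", "0012"}
plans = {
    "m3-plan.md": "Storage format decided in ADR-0007.",
    "m4-plan.md": "Retry policy deferred to ADR-0015; auth per ADR-0012.",
}

issues = dangling_adr_refs(plans, accepted_adrs)
print(issues or "all clear, dispatch")  # {'m4-plan.md': ['0015']}
```

The same pattern, with a different regular expression and a set built from the baseline migrations, would plausibly cover the schema-name leg of check (a).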
Product-scale planning interacts with prior amendments:
- Design review (Section 3.1) happens at design draft time for every milestone in the product-scale set, not just the next one.
- Wave dispatch (Section 5.4) operates within each milestone's plan unchanged.
- /loop (Section 5.5) sequences across milestones rather than terminating at milestone boundaries; the bounded-autonomy checkpoints become the explicit pause points in the loop.
- STATE.md (Appendix J) tracks "current milestone + last-shipped within milestone" identically to per-milestone planning; the checkpoint moments are the obvious update boundaries.
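The cross-milestone sequencing with pause points can be pictured with a small driver. This is a sketch under stated assumptions, not the /loop implementation: `run_product`, `run_milestone`, and `checkpoint` are hypothetical names, and the human checkpoint is modelled as a callback that returns `True` when nothing discovered in the finished milestone invalidates the next plan.

```python
from typing import Callable, Iterable


def run_product(
    milestones: Iterable[str],
    run_milestone: Callable[[str], None],
    checkpoint: Callable[[str, str], bool],
) -> list[str]:
    """Run milestones in order, pausing for a checkpoint between each pair.

    checkpoint(done, upcoming) returns True when the plan for `upcoming`
    still holds. Returns the milestones that completed.
    """
    completed: list[str] = []
    queue = list(milestones)
    for index, milestone in enumerate(queue):
        run_milestone(milestone)
        completed.append(milestone)
        is_last = index == len(queue) - 1
        if not is_last and not checkpoint(milestone, queue[index + 1]):
            break  # halt so the next plan can be amended before dispatch
    return completed


# Hypothetical run: the reviewer flags the M5 plan after M4 closes.
log: list[str] = []
done = run_product(
    ["M3", "M4", "M5"],
    run_milestone=log.append,
    checkpoint=lambda finished, upcoming: upcoming != "M5",
)
print(done)  # ['M3', 'M4']
```

The checkpoint sits between milestones rather than inside `run_milestone`, so wave dispatch within a milestone is untouched.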