Rule Inventory — methodology.md v3 Traceability Audit

Method: full read of docs/methodology.md (1,819 lines); extracted every MUST / NEVER / forbidden / mandatory / non-negotiable / gate plus imperatively-stated rules in the lifecycle, quality-gate, testing, memory, instruction-hierarchy, and tool sections; de-duplicated across sections (multi-section rules list all sources). Disposition vocabulary: card (operating card), skill:<name>, kit:<file>, DELETE: <reason>.
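
The disposition vocabulary is small enough to validate mechanically. A minimal sketch, assuming each disposition arrives as a plain string in one of the four forms listed above; `Disposition` and `parse_disposition` are hypothetical names used for illustration, not part of the audit tooling:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disposition:
    kind: str    # "card", "skill", "kit", or "DELETE"
    detail: str  # skill name, kit file, or deletion reason; empty for "card"


# Prefixes taken from the vocabulary: skill:<name>, kit:<file>, DELETE: <reason>
_PREFIXES = (("skill:", "skill"), ("kit:", "kit"), ("DELETE:", "DELETE"))


def parse_disposition(text: str) -> Disposition:
    """Parse one disposition cell; raise ValueError on anything off-vocabulary."""
    text = text.strip()
    if text == "card":
        return Disposition("card", "")
    for prefix, kind in _PREFIXES:
        if text.startswith(prefix):
            detail = text[len(prefix):].strip()
            if not detail:
                raise ValueError(f"{kind} disposition needs a value: {text!r}")
            return Disposition(kind, detail)
    raise ValueError(f"unknown disposition: {text!r}")
```

Rejecting unknown strings instead of defaulting them is a deliberate choice in this sketch: a typo in a disposition cell should fail loudly, not silently land in a catch-all bucket.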

| # | Rule (one line, faithful) | Source § | Disposition |
|---|---|---|---|
| 1 | Every non-trivial change follows the mandatory pipeline brainstorm → design → execution; "No code is written until a design exists." | §2.1, §3.1 | card |
| 2 | Pipeline skip threshold is judgment-based: "if you can fully hold the change and all its implications in your head, skip the pipeline." | §2.1, §3.1 | skill:s4u-lifecycle |
| 3 | "No implementation is considered complete until fresh verification output... confirms the claim"; "it should work now" = "I have not verified it." | §2.3 | card |
| 4 | Completion evidence set: fresh test output (recent timestamps), coverage report at threshold, linter output with zero new warnings, visual verification for UI changes. | §2.3, §3.1, §8 | skill:s4u-lifecycle |
| 5 | All schema changes via Alembic migrations — "never a manual ALTER TABLE." | §2.4, §4.2 | card |
| 6 | "A feature is not considered complete until its documentation exists" — documentation is a deliverable on the same tier as code and tests. | §2.5, §11 | card |
| 7 | Documentation commit pattern: branches changing architecture ship code + ADR (if a decision was made) + doc page update + doc build verification; the reviewer checks for it or confirms none was needed. | §2.5, §11.4 | skill:s4u-doc-excellence |
| 8 | Architecture-as-code: structured frontmatter (components, tests, data_flow, depends_on, last_verified, status); auto-generated architecture index that agents read before modifying code; tests: mapping so the Stop hook verifies the RIGHT tests ran. | §2.5 ext. | skill:s4u-doc-excellence |
| 9 | Doc-staleness sync check "blocks commits when documentation is stale (last_verified > 30 days for modified files)." | §2.5 ext. | skill:s4u-doc-excellence |
| 10 | Documentation-first: brainstorming creates a placeholder page (status: planned) before any code exists; the page tracks planned → in-progress → implemented. | §2.5 ext. | skill:s4u-doc-excellence |
| 11 | "A failing test is REAL until proven stale" — default-investigate the prod code change in the touched area before classifying as stale, transient, or unrelated. | §2.6, §3.2 | card |
| 12 | Stale-test procedure: compare git log -- <prod-file> vs git log -- <test-file>, run the prod code path; declare stale only if the prod change was intentional AND the test was missed in that commit — itself a debt entry, not a no-op closure. | §2.6 | skill:s4u-lifecycle |
| 13 | Defensive-guard comments distinguish Verified <date> via <method>: <cause> from Hypothesized <date>: <cause>; update the comment when the actual cause is later identified. | §2.6 | card |
| 14 | Convergent design intuition is a quality signal — when human and AI independently reach the same design, document the convergence in the commit message or design doc. | §2.6 | skill:s4u-lifecycle |
| 15 | Six-axis Decision-Cost Rubric is mandatory for every architectural change, dependency adoption, or pattern shift (latency, dependency surface, debuggability, reversibility, blast radius, alternative considered); unquantified axes state "not measured because…". | §2.7 | card |
| 16 | Load-bearing architectural commits and ADRs include a Decision context: block recording which axes were considered and the estimates. | §2.7 | skill:s4u-adr |
| 17 | Consolidation review runs monthly on a generated work-list (feature-flag census with born-dates, modules >3,000 lines, ADR-register integrity). | §2.8 | kit:scripts/consolidation-census.sh |
| 18 | Standing consolidation mandate: "each cycle retires at least two rollout flags... or merges one duplicated code lane — or records in the census report why not"; no retirement + no justification = unfinished review. | §2.8 | card |
| 19 | Brainstorm Gate: any of six triggers (new dependency; pattern across ≥3 files/call sites; >100 ms hot-path latency change; public API/schema/data-contract change; >2 h estimated work; safety-policy/guard/refusal change) blocks all spec/plan/code work until a Pre-Mortem Block is emitted. | §2.7, §3.1 | card |
| 20 | Safety-policy trigger cannot be cleared by the proposer alone: the block must end with Safety sign-off: <name, date> and name the incident class the change could re-open. | §3.1 | card |
| 21 | Pre-Mortem Block exact format: proposal in one sentence, triggers fired, six rubric axes, strongest risk, what would change my mind, confidence — "the structure is the gate." | §3.1 | skill:s4u-lifecycle |
| 22 | Non-skippable lines: "Strongest risk I see" demands a specific failure mode tied to a named component; "What would change my mind" demands a falsifiable signal; generic answers do not clear the gate. | §3.1 | card |
| 23 | Brainstorm produces a problem statement, constraint analysis, and ≥3 candidate approaches with explicit tradeoffs. | §2.1, §3.1 | skill:s4u-lifecycle |
| 24 | One design artifact (merged specification + ordered task list, template templates/spec-template.md) is committed to docs/ before any implementation begins. | §3.1 | card |
| 25 | Trivial changes (single file, no schema/API/safety surface) need only a one-sentence design note in the PR description. | §3.1 | skill:s4u-lifecycle |
| 26 | Tasks are sized for a single subagent session (~60 minutes max — split anything larger) and specify number/title, dependencies, acceptance criteria, files touched. | §3.1 | skill:s4u-lifecycle |
| 27 | Second-party scrutiny: designs touching a safety path, schema, public API, or auth get review by a human or a fresh-context agent in a SEPARATE session; "Same-session self-review does not count." | §3.1 | card |
| 28 | Agent feature work happens in a git worktree (filesystem isolation; prevents wrong-branch commits). | §3.1, §14 | card |
| 29 | TDD (red-green-refactor) is the default execution model; PoC mode may reorder to code-first but "The tests are still mandatory; only the ordering is flexible." | §3.1, §8 | card |
| 30 | Anti-rationalization: every test failure is either fixed or explicitly documented as a known limitation — never rationalized as "expected." | §3.1 | skill:s4u-testing-standard |
| 31 | Bug fixes start with four-phase systematic investigation (root cause → pattern analysis → hypothesis testing → implementation); guess-and-check debugging is prohibited. | §3.2 | skill:s4u-lifecycle |
| 32 | A failing test reproducing the bug is mandatory before any fix: "A bug fix without a failing test is a fix without proof." | §3.2 | card |
| 33 | Minimal fix: the smallest change that makes the failing test pass; verify all existing tests still pass. | §3.2 | skill:s4u-lifecycle |
| 34 | Two-stage review on every task: Stage 1 spec compliance, then Stage 2 code quality (patterns, tests, error handling, security, maintainability) — in that order. | §3.3, §3.4, §5.4 | skill:s4u-code-review |
| 35 | "'Silently diverging' — implementing something different from the spec without updating the spec — is never acceptable"; fix the implementation or update the spec with rationale. | §3.3 | card |
| 36 | Receiving review: evaluate each finding on technical merit; agree with evidence, not deference; push back with technical reasoning when a finding is wrong. | §3.3 | skill:s4u-code-review |
| 37 | Every review finding gets one of three dispositions — fixed, acknowledged (deferred with tracking issue), or disputed (with technical reasoning); "No finding is silently ignored." | §3.3 | card |
| 38 | Subagent-driven development is the default execution model: fresh subagent per task; "The controller does not implement. It directs, reviews, and decides." | §3.4, §5.4 | card |
| 39 | /executing-plans replaces subagent dispatch only for parallel human sessions, plans of ≤3 low-complexity tasks, or tightly coupled tasks — a pragmatic judgment call. | §3.4 | skill:s4u-lifecycle |
| 40 | Instruction placement by tier: cross-project rules in global CLAUDE.md; project stack/architecture/testing context in project CLAUDE.md; ephemeral state in memory — no tech-specific rules in global, no ephemera in project instructions. | §4.1–4.3 | skill:s4u-memory-discipline |
| 41 | Three-tier stack semantics with proportional deviation cost: mandatory deviation = new project ADR; default deviation = ADR-0001 entry; "Forbidden tier has no deviation path." | §4.5 | card |
| 42 | Single-source rule: every normative rule lives in exactly one loadable home (operating card, named skill, or appendix-m); project CLAUDE.md carries a one-line pointer plus ADR-backed deltas; "Verbatim copies are forbidden." | §4.5 | card |
| 43 | Canon/CLAUDE.md drift is checked mechanically in CI, not by periodic human re-reading. | §4.5 | kit:scripts/check-canon-consistency.sh |
| 44 | Stack tier item lists (mandatory / default / forbidden inventories with lineage pointers). | §4.5 | DELETE: duplicate — single-source home is appendix-m-canonical-stack.md per §4.5 rule 1; canon keeps the tier semantics (row 41) and a pointer |
| 45 | Stack amendment from real-project evidence only: default→mandatory after ≥2 projects shipped ≥3 months; →forbidden only after ≥1 cited real failure mode; new libraries enter at default tier. | §4.5 | skill:s4u-adr |
| 46 | Capability-based tool prescription: capability tiers are the durable contract; model names are point-in-time bindings — re-bind on deprecation without a methodology change. | §5.2 | skill:s4u-loop-dispatch |
| 47 | Reviewer model selection: judgment-intensive review (security, compliance, architecture) on the high-capability tier; checklist review mid-tier; mechanical lookups fast tier. | §5.2, §5.3 | skill:s4u-code-review |
| 48 | Review-gate dispatch is change-type-driven, not voluntary: API routes→API reviewer; auth/authz→security; models/migrations→migration; AI decision/audit→compliance; RLS patterns→security+migration; new services→API+security; reviews run after each task, before merge, with structured output (severity/location/issue/fix). | §5.3, §7.3 | skill:s4u-code-review |
| 49 | Project reviewer agent definitions and checklists (API, security, compliance, migration). | §5.1, §5.3 | kit:templates/agents/ (api-, security-, compliance-, migration-reviewer.md) |
| 50 | Honest annotation: ‖ parallelisable only when genuinely independent on BOTH dimensions — file-disjoint and host-resource-disjoint (testcontainer ceiling ≈ floor(docker_memory_GB / 1.5)); verify every ‖ at design review. | §3.1, §5.4 | skill:s4u-loop-dispatch |
| 51 | Tasks depending on a shared upstream task cannot dispatch in the same wave as that upstream; review is the wave's synchronization point, not dispatch. | §5.4 | skill:s4u-loop-dispatch |
| 52 | Dispatch mitigations: subagent verification uses capture-to-file (pytest > /tmp/out.txt 2>&1; cat /tmp/out.txt), never piped through tail; pre-resolve file pointers + narrow scope at the orchestrator before dispatch. | §5.4 | skill:s4u-loop-dispatch |
| 53 | Parallel-wave recovery is the default workflow: expect verify+commit stalls; the orchestrator inspects git status, runs targeted verification on the dead subagent's files, and commits if green. | §5.4 | skill:s4u-loop-dispatch |
| 54 | Stub-test ownership transfer: the brief of the task replacing a stub MUST include "delete or update the stub-assertion test." | §5.4 | skill:s4u-loop-dispatch |
| 55 | Wave failures don't abort the wave: siblings continue; the controller re-dispatches with more context, decomposes, or escalates the blocked task and re-sequences remaining waves. | §5.4 | skill:s4u-loop-dispatch |
| 56 | Implementer status protocol DONE / DONE_WITH_CONCERNS / NEEDS_CONTEXT / BLOCKED; "retrying with the same context and the same model is almost never the correct response" to BLOCKED. | §5.4 | skill:s4u-loop-dispatch |
| 57 | Wake-signal hierarchy: notifications > monitors > heartbeats; a dispatched subagent's completion notification IS the wake signal (no double-wake Monitor); polling-as-primary-signal is an anti-pattern. | §5.5 | skill:s4u-loop-dispatch |
| 58 | Cache-aware loop cadence: under 270 s only when actively watching an imminent change, otherwise 1200–1800 s; "never 300s" — the 300–600 s range is a trap. | §5.5 | skill:s4u-loop-dispatch |
| 59 | Loop prompts are self-contained cron-messages-to-future-self: complete subject/verb/object, explicit file/task/wave pointers, no "the X we discussed." | §5.5 | skill:s4u-loop-dispatch |
| 60 | Plan resolution decreases with milestone distance: fully reviewed design for M{current}, wave structure + acceptance criteria for M+1, plan sketch beyond. | §5.6 | skill:s4u-product-scale-planning |
| 61 | Cross-consistency review is a discrete phase before first dispatch, with seven checks (a)–(g): ADR↔plan↔baseline-migration cross-refs; plan↔plan capability handoffs; no contradictory decisions; no migration filename pinning; column nullability; filesystem prerequisites for claimed paths; schema/enum fit of test-data shapes. | §5.6 | skill:s4u-product-scale-planning |
| 62 | Migration filename pinning is forbidden — plans refer to migrations by purpose, never by pinned filename. | §5.6 | skill:s4u-product-scale-planning |
| 63 | Bounded autonomy: explicit human checkpoint between milestones — "did we discover anything in M{N} that invalidates the plan for M{N+1}?" | §5.6 | skill:s4u-product-scale-planning |
| 64 | Milestone ADRs must reference the canonical stack; canon↔ADR-0001 drift is a cross-consistency review issue. | §4.5, §5.6 | skill:s4u-product-scale-planning |
| 65 | Downgrade walks use an explicit revision id (alembic downgrade ${rev}), never -N counts. | §5.6 r4 | skill:s4u-testing-standard |
| 66 | Count assertions derive from behavior, not raw integers — pin invariants ("every scenario has a non-empty final state"), tolerate intentional additions. | §5.6 r4 | skill:s4u-testing-standard |
| 67 | "The hub budget is enforced in BYTES — the loader's unit: 24,000 bytes" (line counts are gameable and were gamed). | §6.1 | kit:templates/hooks/memory-budget-check.sh |
| 68 | Under truncation, section order is policy: durable sections come FIRST; perishable active-work entries come LAST, one line each (≤250 characters). | §6.1 | skill:s4u-memory-discipline |
| 69 | Hub-and-spoke: MEMORY.md is an index (mission, quick reference, preferences, known debt, links); depth lives in single-topic spoke files. | §6.1 | skill:s4u-memory-discipline |
| 70 | Memory files declare one of four types in YAML frontmatter — user, feedback, project, reference — each with its own update pattern. | §6.2 | skill:s4u-memory-discipline |
| 71 | What NOT to save: code patterns, git history, debugging session details, test commands, file-level docs, current task status; filtering principle — save only what a fresh session would otherwise get wrong. | §6.3 | skill:s4u-memory-discipline |
| 72 | Verify before acting: specific memory claims (counts, file paths, line numbers) are verified against current code before use — "a starting point for investigation, not a conclusion." | §6.4 | card |
| 73 | Update stale memories as part of the current task; treat volatile claims with lower trust than stable ones. | §6.4 | skill:s4u-memory-discipline |
| 74 | Memory staleness warnings displayed on files older than a threshold. | §6.4 | DELETE: describes Claude Code platform behavior, not an agent-actionable rule — the actionable half is row 72 |
| 75 | Gate admission rule: every gate declares (a) per-occurrence cost, (b) enforcement mechanism — "'the agent will remember' is not a mechanism" — and (c) retirement condition; new gates missing any field are rejected at canon review. | §7 | card |
| 76 | Live contract smoke: any feature crossing an external SDK/API boundary ships a marker-gated live smoke (3–5 real-credential calls) run before deploy; "Mocked-only coverage of an external boundary is an unverified boundary." | §3.1, §7 | card |
| 77 | Migrated-schema oracle: integration-test databases are built via the migration chain (alembic upgrade head), "never via ORM metadata create_all." | §7 | card |
| 78 | Name the oracle: every review names the production artifact each verification observes; proxy observations are recorded and the gap either closed or accepted in writing. | §7 | card |
| 79 | Layer 1 — linting runs automatically after every file edit. | §2.3, §7.1 | kit:templates/hooks/lint-on-edit.sh |
| 80 | Layer 2 — Stop verification fires at task completion (diff-aware, advisory, command-type). | §7.1 | kit:templates/hooks/verify-before-stop.sh |
| 81 | Layer 3 — pre-push blocking gate: failing tests, coverage below threshold, or lint regressions block the push. | §7.1 | kit:templates/hooks/pre-push-gate.sh |
| 82 | Layer 4 — security scanning: dev-time SAST/secrets/IaC with auto-remediation loop; PR-time semantic security review on every pull request (compliance requirement for regulated systems). | §7.1 | skill:s4u-code-review |
| 83 | Blocking prompt-type Stop hook (verbatim Trust Relay JSON, "the prompt is not a suggestion — it is a structural gate"). | §7.2 | DELETE: superseded — the blocking prompt variant trapped sessions in completion loops (zol-rag 2026); v3 ships the diff-aware advisory command hook (templates/hooks/verify-before-stop.sh) |
| 84 | No mocking by default: real services via testcontainers (PostgreSQL, MinIO, Redis, Temporal); unittest.mock of internal classes forbidden — mock at process boundaries only. | §2.3, §4.5, §8 | card |
| 85 | Every approved mock carries a MOCK APPROVED comment stating the reason, the approver, the date, and the alternative for running against real services. | §2.3, §8 | card |
| 86 | No time.sleep() in tests; deterministic tests via the standard time-control library (canonical default freezegun; Temporal-workflow tests use WorkflowEnvironment.start_time_skipping()). | §4.5, §8 | card |
| 87 | In-memory databases as test fixtures are forbidden — use real Postgres in testcontainers. | §4.5, §8 | card |
| 88 | Tests requiring more than 30 seconds are marked @pytest.mark.slow and excluded from the default run. | §8 | skill:s4u-testing-standard |
| 89 | Tiered coverage targets by mode (PoC: 90% core / 70% elsewhere; Production: 90% across layers, 85% failure-branch, 80% real-integration ratio), enforced by tooling. | §2.3, §8 | skill:s4u-testing-standard |
| 90 | shadcn/ui is the required design system, added per-component (npx shadcn@latest add <component>); building custom components when a shadcn/ui equivalent exists is forbidden. | §9 | skill:s4u-ui-review |
| 91 | Skeleton loaders for all async content; full-page loading spinners are forbidden. | §9 | skill:s4u-ui-review |
| 92 | Native alert() / confirm() / browser dialogs are forbidden — Sonner toasts + inline confirmation. | §4.5, §9 | card |
| 93 | Inline confirmation for destructive actions; modals reserved for complex multi-step flows that cannot be accomplished inline. | §9 | skill:s4u-ui-review |
| 94 | Accessibility minimums: WCAG AA contrast 4.5:1 for every color combination; 44 px minimum touch targets on mobile; 320/768/1024+ responsive breakpoints. | §9 | skill:s4u-ui-review |
| 95 | ADR required for: technology choices, architectural patterns, data-model decisions, integration approaches, security decisions. | §10 | card |
| 96 | ADR template: Title, Date, Status (Proposed/Accepted/Deprecated/Superseded by ADR-YYYY), Context, Decision, Consequences, Alternatives Considered with rejection rationale. | §10 | skill:s4u-adr |
| 97 | Supersession tracking: the superseded ADR's status becomes "Superseded by ADR-YYYY" and the new ADR references the old. | §10 | kit:scripts/check-adr-register.sh |
| 98 | Documentation is honest, not aspirational: planned features labeled planned; "A skeptic... should find the documentation understated, not overstated." | §11.2 | skill:s4u-doc-excellence |
| 99 | Evidence-grade claims: every factual doc assertion is backed by a metric, test result, commit reference, or reproducible command. | §11.2 | skill:s4u-doc-excellence |
| 100 | Multi-audience by structure: docs organized so each audience navigates to its content instead of reading everything and filtering. | §11.2 | skill:s4u-doc-excellence |
| 101 | Docs deploy from the same repo and commit as code; diagrams authored as Mermaid in version control. | §11.3 | skill:s4u-doc-excellence |
| 102 | Documentation update triggers are structural changes, not tactical fixes (per the §11.1 trigger table). | §11.1 | skill:s4u-doc-excellence |
| 103 | STATE.md is generated from git/gh data or absent — "a missing file is safer than a misleading one." | §11.6, §14.1 | card |
| 104 | STATE.md cadence is per-shipped-artifact (not a daily journal); it records milestone position, last shipped, blockers, next dispatch — not a roadmap/plan/memory substitute. | §11.6 | skill:s4u-memory-discipline |
| 105 | Five requirements for every AI output: input provenance, model identification, chain of thought, confidence scoring, immutable audit log. | §12 | DELETE: Trust-Relay-specific compliance architecture — §14 (v3) classifies domain compliance as Project-specific, adopt-with-ADR; belongs in the showcase/case studies, not canon rules |
| 106 | Non-suppression principle: "The system can ADD scrutiny but NEVER suppress risk signals." | §12 | DELETE: Trust-Relay-specific (KYB risk workflow) — same §14 Project-specific classification as row 105 |
| 107 | AML record-keeping: 5-year retention, SAR-supporting audit trail, auditable risk methodology. | §12 | DELETE: Trust Relay regulatory scope, not methodology canon |
| 108 | AGENTS.md symlinked to CLAUDE.md for cross-tool compatibility. | §13 | kit:setup/SETUP-GUIDE.md |
| 109 | The operating card is the ≤5K-token rule surface; always-loaded instruction artifacts stay within declared byte budgets. | §14 | kit:scripts/check-context-budget.sh |
| 110 | Core CI gates are required status checks on the default branch: full test suite (never -x first-failure abort) and full lint (ruff check + ruff format --check, not rule subsets). | §14 | card |
| 111 | CODEOWNERS-backed human review on safety paths. | §14 | card |
| 112 | Products with a user-facing safety surface run a deterministic safety-floor eval subset as a required CI check. | §14 | card |
| 113 | Silent-failure discipline R1–R3: log collection sizes before return; regression-pin every silent-failure branch; contract-test cross-component shared state. | §14 | card |
| 114 | Project-specific tier components (domain compliance architectures, living-doc sites, MCP integrations) are adopted only with a deviation ADR. | §14 | skill:s4u-adr |
| 115 | Permission mode is a security control, not a preference: promptless agent action against anything production-reaching is a team-level decision; stated default = plan mode or ask-permission for production-touching commands. | §14 | card |
| 116 | The repo is the memory: anything a teammate would need lives in repo-versioned docs; a repo-versioned ONBOARDING.md is the single entry point. | §14.1 | card |
| 117 | Org-owned repo with branch protection — required status checks must be enforceable, not advisory. | §14.1 | card |
| 118 | Deploy serialization: deploy scripts take a lock (flock or equivalent) and refuse when another deploy holds it or a long-running job (ingest, migration) is active. | §14.1 | card |
| 119 | Incident roles are named in the runbook before the first incident: who is paged, who can roll back, where forensic artifacts land. | §14.1 | card |
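
Row 50's two independence dimensions reduce to a set check and a piece of arithmetic. The sketch below assumes Docker memory is known in GB and that each task declares the files it touches; `container_ceiling` and `file_disjoint` are illustrative names, not functions from the methodology or its kit:

```python
import math


def container_ceiling(docker_memory_gb: float, gb_per_container: float = 1.5) -> int:
    """Approximate concurrent-testcontainer ceiling: floor(docker_memory_GB / 1.5)."""
    if docker_memory_gb < 0 or gb_per_container <= 0:
        raise ValueError("memory must be non-negative and per-container size positive")
    return math.floor(docker_memory_gb / gb_per_container)


def file_disjoint(files_a: set[str], files_b: set[str]) -> bool:
    """True when two tasks touch no file in common (the first dimension)."""
    return files_a.isdisjoint(files_b)


# Hypothetical host with 8 GB allocated to Docker.
print(container_ceiling(8))  # 5
print(file_disjoint({"api/routes.py"}, {"db/models.py"}))  # True
```

A pair of tasks would earn the ‖ mark only if both checks pass; file-disjoint tasks that together exceed the container ceiling still contend for the host.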

Counts per disposition

| Disposition | Count |
|---|---|
| card | 42 |
| skill:s4u-lifecycle | 11 |
| skill:s4u-loop-dispatch | 11 |
| skill:s4u-doc-excellence | 9 |
| skill:s4u-memory-discipline | 7 |
| skill:s4u-code-review | 5 |
| skill:s4u-testing-standard | 5 |
| skill:s4u-product-scale-planning | 5 |
| skill:s4u-adr | 4 |
| skill:s4u-ui-review | 4 |
| kit | 10 |
| DELETE | 6 |
| Total | 119 |

Skill subtotal: 61. Kit files referenced: scripts/consolidation-census.sh, scripts/check-canon-consistency.sh, scripts/check-adr-register.sh, scripts/check-context-budget.sh, templates/hooks/lint-on-edit.sh, templates/hooks/verify-before-stop.sh, templates/hooks/pre-push-gate.sh, templates/hooks/memory-budget-check.sh, templates/agents/, setup/SETUP-GUIDE.md.
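
The totals above can be re-derived from the per-disposition counts. A minimal sketch that copies the counts and kit list from this section into plain Python structures (the names `COUNTS` and `KIT_FILES` are illustrative) and recomputes the skill subtotal, the grand total, and the number of kit files:

```python
# Per-disposition counts, copied from the table above.
COUNTS = {
    "card": 42,
    "skill:s4u-lifecycle": 11,
    "skill:s4u-loop-dispatch": 11,
    "skill:s4u-doc-excellence": 9,
    "skill:s4u-memory-discipline": 7,
    "skill:s4u-code-review": 5,
    "skill:s4u-testing-standard": 5,
    "skill:s4u-product-scale-planning": 5,
    "skill:s4u-adr": 4,
    "skill:s4u-ui-review": 4,
    "kit": 10,
    "DELETE": 6,
}

# Kit files referenced, copied from the line above.
KIT_FILES = [
    "scripts/consolidation-census.sh",
    "scripts/check-canon-consistency.sh",
    "scripts/check-adr-register.sh",
    "scripts/check-context-budget.sh",
    "templates/hooks/lint-on-edit.sh",
    "templates/hooks/verify-before-stop.sh",
    "templates/hooks/pre-push-gate.sh",
    "templates/hooks/memory-budget-check.sh",
    "templates/agents/",
    "setup/SETUP-GUIDE.md",
]

skill_subtotal = sum(n for name, n in COUNTS.items() if name.startswith("skill:"))
total = sum(COUNTS.values())

print(skill_subtotal)  # 61
print(total)           # 119
print(len(KIT_FILES) == COUNTS["kit"])  # True
```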

Projected operating-card size

42 card-bound rules. Measured rule-text bytes for the 42 card rows in this table: 6,782 bytes; rendered at ~1.5 lines each (bullet + grouping headers + ~10% formatting overhead): ~7,500 bytes projected — within the ≤20,000-byte target with ~62% headroom.
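
The projection can be reproduced as follows, assuming the ~10% formatting overhead is applied multiplicatively to the measured rule-text bytes and the result is rounded to the nearest hundred (both are readings of the sentence above, not stated procedure; the constant names are illustrative):

```python
MEASURED_RULE_BYTES = 6_782   # measured rule text for the 42 card rows
FORMATTING_OVERHEAD = 0.10    # assumed multiplicative overhead (~10%)
CARD_TARGET_BYTES = 20_000    # the byte target for the operating card

projected = MEASURED_RULE_BYTES * (1 + FORMATTING_OVERHEAD)
rounded = round(projected, -2)             # nearest 100 bytes
headroom = 1 - rounded / CARD_TARGET_BYTES

print(f"projected ≈ {rounded:,.0f} bytes")  # projected ≈ 7,500 bytes
print(f"headroom ≈ {headroom:.1%}")         # headroom ≈ 62.5%
```

Under these assumptions the headroom comes out at 62.5%, consistent with the ~62% quoted.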

Rules exceeding the ~1.5-line (~180-byte) budget as quoted here: #15 (rubric axes), #19 (six gate triggers), #42 (single-source), #75 (gate admission), #76 (live contract smoke) — 5 of 42 (12%, under the 20% flag threshold); all compress to ≤2 lines on the card by dropping quoted clauses already carried verbatim in skills (e.g., the rubric axis list lives once, in the card's rubric block).
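
A sketch of how the over-budget check might be mechanised, assuming rule text is measured in UTF-8 bytes (so a non-ASCII character such as → or ≥ costs more than one byte) against the ~180-byte line budget and the 20% flag threshold; `over_budget` and `flag_share` are hypothetical helpers, not kit scripts:

```python
def over_budget(rules: dict[int, str], budget_bytes: int = 180) -> list[int]:
    """Return the row numbers whose rule text exceeds the byte budget."""
    return sorted(
        number
        for number, text in rules.items()
        if len(text.encode("utf-8")) > budget_bytes
    )


def flag_share(over: int, total: int, threshold: float = 0.20) -> tuple[float, bool]:
    """Return the over-budget share and whether it trips the flag threshold."""
    share = over / total
    return share, share > threshold


# The figures quoted above: 5 of the 42 card rows exceed the budget.
share, flagged = flag_share(5, 42)
print(f"{share:.0%}", flagged)  # 12% False
```

Measuring encoded bytes instead of characters keeps the check in the same unit as the byte budgets used elsewhere in the inventory.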

De-duplication notes (survive review)

  • Verification-before-completion appears in §2.3, §3.1, §7.1, §7.2 — inventoried once as procedure (row 4) plus one row per enforcement layer (rows 79–81); the §7.2 blocking-prompt variant is the only deleted copy (row 83), deleted as superseded, not as duplicate.
  • Two-stage review appears in §3.3, §3.4, §5.4 — one row (34). Subagent-default appears in §2.2, §3.4, §5.4 — one row (38). Doc-commit pattern appears in §2.5 and §11.4 — one row (7).
  • The freezegun/Temporal contradiction (assessment §5.2 item 1) is resolved in the v3 §8 text; row 86 carries the harmonized form.
  • The Plan Walkthrough (assessment G2) no longer exists in v3 text — retired in favor of second-party scrutiny (row 27); no inventory row needed.
  • The assessment's keyword census counted 84 normative lines; full imperative extraction across lifecycle/agent-ops/adoption sections, after merging multi-section duplicates and splitting compound rules into separately-enforceable units, yields 119.