S4U Operating Card
The complete rule surface of the S4U methodology — rules only, one to two lines each. Rationale and evidence live in
methodology.md (reference, never a startup read). Procedures load on demand via the s4u skills. On conflict, this card wins. Traceability: docs/rule-inventory.md.
Lifecycle
- Every non-trivial change follows brainstorm → design → execution. No code until a design exists. Trivial = single file, no schema/API/safety surface → one-sentence design note in the PR. [→ skill:s4u-lifecycle]
- ONE design artifact (spec + ordered tasks, templates/spec-template.md), committed to docs/ before implementation.
- Second-party scrutiny: designs touching a safety path, schema, public API, or auth are reviewed by a human or a fresh-context agent in a SEPARATE session. Same-session self-review does not count.
- Agent feature work happens in a git worktree.
- TDD (red-green-refactor) is the default test discipline during execution; PoC mode may reorder to code-first, but the tests are still mandatory.
- A bug fix without a failing test is a fix without proof: reproduce first.
- Subagent-driven development is the default execution model: fresh subagent per task; the controller directs, reviews, decides — it does not implement.
Brainstorm Gate (Pre-Mortem)
Any ONE trigger fires the gate — no design/plan/code until the Pre-Mortem Block is emitted [full format → skill:s4u-lifecycle]:
- New dependency (library, framework, external service)
- Pattern replicated across ≥3 files / call sites
- Hot-path latency change >100 ms (either direction)
- Public API, schema, or data-contract change
- More than 2 hours of estimated work
- Safety policy / guard / refusal behavior changes (relaxations AND additions) — hardest trigger: requires a literal Safety sign-off: <name, date> line and must name the incident class the change could re-open. The proposer cannot clear it alone.
The block addresses all six Decision-Cost axes — latency, dependency surface, debuggability, reversibility, blast radius, alternative considered — plus: "Strongest risk I see" (specific failure mode, named component) and "What would change my mind" (falsifiable signal). Generic answers do not clear the gate.
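The "any ONE trigger fires the gate" rule can be sketched as a small check. This is only an illustration: `ChangeSummary`, its field names, and the helper functions are hypothetical; the thresholds are the ones listed in the triggers above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChangeSummary:
    """Hypothetical description of a proposed change (field names are assumptions)."""
    new_dependency: bool = False
    replicated_sites: int = 0                # files / call sites sharing the pattern
    hot_path_latency_delta_ms: float = 0.0   # signed: either direction counts
    touches_public_contract: bool = False    # public API, schema, or data contract
    estimated_hours: float = 0.0
    touches_safety_behavior: bool = False    # relaxations AND additions


def fired_triggers(change: ChangeSummary) -> list[str]:
    """Return the name of every trigger that fires for this change."""
    checks = {
        "new dependency": change.new_dependency,
        "pattern replicated across >=3 sites": change.replicated_sites >= 3,
        "hot-path latency change >100 ms": abs(change.hot_path_latency_delta_ms) > 100,
        "public API/schema/data-contract change": change.touches_public_contract,
        "more than 2 hours of work": change.estimated_hours > 2,
        "safety behavior change": change.touches_safety_behavior,
    }
    return [name for name, fired in checks.items() if fired]


def gate_required(change: ChangeSummary) -> bool:
    """A single fired trigger is enough to require the Pre-Mortem Block."""
    return bool(fired_triggers(change))
```

Note that a latency improvement of 150 ms fires the gate just like a regression, because the trigger applies in either direction.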
Verification & review
- Nothing is complete until fresh verification output confirms it. "It should work now" means "I have not verified it."
- A failing test is REAL until proven stale — investigate the touched area's prod code before classifying it stale, transient, or unrelated.
- Name the oracle: every review names the production artifact each check observes. A proxy (mock, ORM-built schema, intermediate frame, spot-check subset) is recorded as a proxy; the gap is closed or accepted in writing.
- Live contract smoke: any external SDK/API boundary ships a marker-gated live smoke (3–5 real-credential calls) run before deploy. Mocked-only coverage of an external boundary is an unverified boundary.
- Migrated-schema oracle: integration-test databases are built via the migration chain (alembic upgrade head), never ORM create_all.
- Silently diverging from the spec is never acceptable: fix the code or amend the spec with rationale.
- Every review finding gets one of three dispositions — fixed / acknowledged (with tracking issue) / disputed (with technical reasoning). None ignored. [→ skill:s4u-code-review]
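The three-disposition rule lends itself to a mechanical check. The sketch below assumes a hypothetical `Finding` record; the field names and the `disposition_problems` helper are illustrative and not part of any s4u skill.

```python
from __future__ import annotations

from dataclasses import dataclass

ALLOWED_DISPOSITIONS = {"fixed", "acknowledged", "disputed"}


@dataclass
class Finding:
    """Hypothetical review-finding record (field names are assumptions)."""
    summary: str
    disposition: str = ""
    tracking_issue: str = ""   # required when acknowledged
    reasoning: str = ""        # required when disputed


def disposition_problems(findings: list[Finding]) -> list[str]:
    """List every finding that was ignored or dispositioned incompletely."""
    problems = []
    for finding in findings:
        if finding.disposition not in ALLOWED_DISPOSITIONS:
            problems.append(f"{finding.summary}: no disposition recorded")
        elif finding.disposition == "acknowledged" and not finding.tracking_issue.strip():
            problems.append(f"{finding.summary}: acknowledged without a tracking issue")
        elif finding.disposition == "disputed" and not finding.reasoning.strip():
            problems.append(f"{finding.summary}: disputed without technical reasoning")
    return problems
```

An empty return value means every finding was handled; anything else blocks the review from closing.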
Testing
- No mocking by default: real services via testcontainers (PostgreSQL, MinIO, Redis, …). Mock only at process boundaries; mocking internal classes is forbidden. [→ skill:s4u-testing-standard]
- Every approved mock carries a MOCK APPROVED comment: what, why, approver, date, and how to run without it.
- No time.sleep() in tests. Time control via the stack's standard library (default freezegun; Temporal tests use time-skipping).
- In-memory databases as fixtures for Postgres-targeting code are forbidden.
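A lint-style check for the MOCK APPROVED comment could look like the sketch below. The card names the five required pieces of information but not a comment syntax, so the `key=value; ...` layout and the key names here (including `run-without`) are assumptions made for illustration.

```python
from __future__ import annotations

import re

# Assumed key names for the five required items: what, why, approver, date,
# and how to run without the mock.
REQUIRED_FIELDS = ("what", "why", "approver", "date", "run-without")


def missing_mock_fields(comment: str) -> list[str]:
    """Return the required fields that a MOCK APPROVED comment fails to state."""
    if "MOCK APPROVED" not in comment:
        return list(REQUIRED_FIELDS)
    present = {match.group(1).lower() for match in re.finditer(r"([\w-]+)\s*=", comment)}
    return [field for field in REQUIRED_FIELDS if field not in present]


example = (
    "# MOCK APPROVED: what=SMTP client; why=no sandbox available; "
    "approver=<name>; date=<date>; run-without=<command>"
)
```

Here `missing_mock_fields(example)` returns an empty list, while a comment that omits the approver, date, or run instructions is reported field by field.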
Gates are mechanisms
- Gate admission rule: every gate declares (a) per-occurrence cost, (b) enforcement mechanism — a hook, CI check, or script; "the agent will remember" is not a mechanism — and (c) retirement condition.
- Core CI gates are REQUIRED status checks on the default branch: full test suite (never -x), full lint (ruff check + ruff format --check, never rule subsets).
- Products with a user-facing safety surface run a deterministic safety-floor eval subset as a required CI check.
- Flag-flips to default-ON go through the probe-gated path (templates/scripts/flip-flag.sh): no recorded falsification-probe artifact, no flip.
- Monthly consolidation review with a generated work-list (scripts/consolidation-census.sh): each cycle retires ≥2 rollout flags or merges one duplicated lane — or records why not. Unjustified no-op = review not done.
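The gate admission rule can itself be expressed as a check on a gate's declaration. This is a minimal sketch: `GateDeclaration`, the mechanism labels, and `admission_errors` are hypothetical names, not an existing s4u interface.

```python
from __future__ import annotations

from dataclasses import dataclass

# Accepted enforcement mechanisms per the admission rule: hook, CI check, or script.
MECHANISMS = {"hook", "ci-check", "script"}


@dataclass(frozen=True)
class GateDeclaration:
    """Hypothetical declaration a new gate must file before it is admitted."""
    name: str
    per_occurrence_cost: str    # (a) e.g. time added per push or per review
    mechanism: str              # (b) must be one of MECHANISMS
    retirement_condition: str   # (c) when the gate can be removed


def admission_errors(gate: GateDeclaration) -> list[str]:
    """Return the reasons a gate fails admission; empty means admitted."""
    errors = []
    if not gate.per_occurrence_cost.strip():
        errors.append("missing per-occurrence cost")
    if gate.mechanism not in MECHANISMS:
        errors.append(f"'{gate.mechanism}' is not an enforcement mechanism")
    if not gate.retirement_condition.strip():
        errors.append("missing retirement condition")
    return errors
```

A declaration whose mechanism is "the agent will remember" fails on point (b) regardless of how well the other two fields are filled in.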
Silent-failure discipline (R1–R3)
- R1: any function returning a collection logs its size before returning.
- R2: every silent-failure branch (empty fallback, NULL filter, swallowed exception) gets a regression-pinning test, shipped WITH the fix.
- R3: cross-component shared state (WS/HTTP/DB handoffs) gets a contract test that simulates the wire format and asserts both sides agree.
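R1 and R3 can be illustrated with a short sketch. The decorator, the `active_users` helper, the logger name, and the job-event wire format are all hypothetical; they show one way to satisfy the rules under those assumptions, not a prescribed implementation.

```python
from __future__ import annotations

import functools
import json
import logging

logger = logging.getLogger("s4u.example")  # assumed logger name


def logs_size(func):
    """R1: log the size of the returned collection before handing it back."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logger.info("%s returned %d item(s)", func.__name__, len(result))
        return result
    return wrapper


@logs_size
def active_users(rows: list[dict]) -> list[dict]:
    """Hypothetical filter: an unexpectedly empty result now shows up in logs."""
    return [row for row in rows if row.get("active")]


def encode_job_event(job_id: int, status: str) -> str:
    """Producer side of a hypothetical WS/HTTP handoff."""
    return json.dumps({"job_id": job_id, "status": status})


def decode_job_event(payload: str) -> tuple[int, str]:
    """Consumer side: reads the same keys the producer writes."""
    data = json.loads(payload)
    return data["job_id"], data["status"]


def test_job_event_contract():
    """R3: simulate the wire format and assert both sides agree on it."""
    wire = encode_job_event(7, "done")
    assert set(json.loads(wire)) == {"job_id", "status"}
    assert decode_job_event(wire) == (7, "done")
```

If either side renames a key, the contract test fails at the boundary instead of the consumer silently receiving an empty or partial value.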
Data & schema
- All schema changes via Alembic migrations — never a manual ALTER TABLE.
Documentation & state
- A feature is not complete until its documentation exists — same tier as code and tests. [→ skill:s4u-doc-excellence]
- STATE.md is generated from git/gh data, or absent. A missing file is safer than a misleading one.
- Defensive-guard comments distinguish Verified <date> via <method>: <cause> from Hypothesized <date>: <cause>; upgrade the comment when the real cause is found.
- Public doc syncs pass the classification gate (scripts/check-doc-classification.sh): no infra/secret-shaped content without an explicit audience: public.
Stack & deviations
- Single source, pointers elsewhere: every rule lives in exactly one loadable home; project CLAUDE.md = pointer + ADR-backed deltas only. Verbatim copies are forbidden.
- Three-tier stack (appendix-m is the single source): mandatory deviation = new ADR; default deviation = ADR-0001 entry; forbidden tier has no deviation path.
- Native alert()/confirm()/browser dialogs are forbidden — toasts + inline confirmation. [→ skill:s4u-ui-review]
ADRs
- ADR required for: technology choices, architectural patterns, data-model decisions, integration approaches, security decisions. [→ skill:s4u-adr; register checked by scripts/check-adr-register.sh]
Memory
- Hub budget: 24,000 BYTES, durable sections first, one-line entries. [→ skill:s4u-memory-discipline]
- Verify memory claims (counts, paths, line numbers) against current code before acting on them — memory is a starting point, not a conclusion.
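Because the hub budget is stated in bytes rather than characters, a size check has to measure the encoded text. The sketch below assumes the hub file is stored as UTF-8; the function names are illustrative.

```python
HUB_BUDGET_BYTES = 24_000  # budget from the rule above


def hub_size_bytes(text: str) -> int:
    """Size of the hub content as stored on disk, assuming UTF-8 encoding."""
    return len(text.encode("utf-8"))


def over_budget(text: str, budget: int = HUB_BUDGET_BYTES) -> bool:
    """True when the hub exceeds its byte budget."""
    return hub_size_bytes(text) > budget
```

Counting characters would undercount any non-ASCII content: an arrow such as "→" is one character but three bytes in UTF-8, so a hub that looks under budget by `len()` can still exceed it.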
Team operation (N>1)
- The repo is the memory: anything a teammate needs lives in repo-versioned docs; ONBOARDING.md is the single entry point.
- Org-owned repo with branch protection — required checks must be enforceable, not advisory.
- CODEOWNERS-backed human review on safety paths.
- Permission mode is a security control: promptless agent action against production-reaching surfaces is a team decision; default = plan/ask mode.
- Deploys take a lock and refuse during active jobs or concurrent deploys.
- Incident roles (who is paged, who rolls back, where forensics land) are named in the runbook before the first incident.
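The deploy-lock rule could be sketched as follows. This is a single-host illustration under stated assumptions: the lock is a file created atomically with `O_CREAT | O_EXCL`, and `active_jobs` is a hypothetical callable that reports how many jobs are running. A real deployment may need a distributed lock and stale-lock handling, which this sketch does not cover.

```python
import os
from contextlib import contextmanager


class DeployRefused(RuntimeError):
    """Raised when a deploy must not proceed."""


@contextmanager
def deploy_lock(lock_path, active_jobs):
    """Hold an exclusive deploy lock; refuse on active jobs or a concurrent deploy."""
    if active_jobs() > 0:
        raise DeployRefused("active jobs are running")
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise DeployRefused("another deploy holds the lock") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for forensics
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

Usage would wrap the deploy steps in `with deploy_lock(path, count_active_jobs):`, where `count_active_jobs` is whatever job-queue query the product provides.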