S4U Operating Card

The complete rule surface of the S4U methodology — rules only, one to two lines each. Rationale and evidence live in methodology.md (reference, never a startup read). Procedures load on demand via the s4u skills. On conflict, this card wins. Traceability: docs/rule-inventory.md.

Lifecycle

  • Every non-trivial change follows brainstorm → design → execution. No code until a design exists. Trivial = single file, no schema/API/safety surface → one-sentence design note in the PR. [→ skill:s4u-lifecycle]
  • ONE design artifact (spec + ordered tasks, templates/spec-template.md), committed to docs/ before implementation.
  • Second-party scrutiny: designs touching a safety path, schema, public API, or auth are reviewed by a human or a fresh-context agent in a SEPARATE session. Same-session self-review does not count.
  • Agent feature work happens in a git worktree.
  • TDD (red-green-refactor) is the default coding discipline; PoC mode may reorder to code-first, but tests remain mandatory.
  • A bug fix without a failing test is a fix without proof: reproduce first.
  • Subagent-driven development is the default execution model: fresh subagent per task; the controller directs, reviews, decides — it does not implement.

Brainstorm Gate (Pre-Mortem)

Any ONE trigger fires the gate — no design/plan/code until the Pre-Mortem Block is emitted [full format → skill:s4u-lifecycle]:

  1. New dependency (library, framework, external service)
  2. Pattern replicated across ≥3 files / call sites
  3. Hot-path latency change >100 ms (either direction)
  4. Public API, schema, or data-contract change
  5. More than 2 hours of estimated work
  6. Safety policy / guard / refusal behavior changes (relaxations AND additions) — hardest trigger: requires a literal Safety sign-off: <name, date> line and must name the incident class the change could re-open. The proposer cannot clear it alone.

The block addresses all six Decision-Cost axes — latency, dependency surface, debuggability, reversibility, blast radius, alternative considered — plus: "Strongest risk I see" (specific failure mode, named component) and "What would change my mind" (falsifiable signal). Generic answers do not clear the gate.
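
The six triggers are mechanical enough to evaluate in code. The sketch below is one possible encoding, not part of the methodology's tooling: the `ChangeProposal` class and its field names are hypothetical, and only the thresholds (≥3 sites, >100 ms in either direction, more than 2 hours) come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChangeProposal:
    """Hypothetical summary of a proposed change; field names are illustrative."""
    new_dependencies: int = 0              # libraries, frameworks, external services
    replicated_sites: int = 0              # files / call sites the pattern is copied to
    hot_path_delta_ms: float = 0.0         # signed latency change on the hot path
    touches_public_contract: bool = False  # public API, schema, or data contract
    estimated_hours: float = 0.0
    touches_safety_behavior: bool = False  # policy, guard, or refusal behavior


def fired_triggers(change: ChangeProposal) -> list[int]:
    """Return the numbers (1-6) of the Brainstorm Gate triggers this change fires."""
    checks = [
        change.new_dependencies > 0,
        change.replicated_sites >= 3,
        abs(change.hot_path_delta_ms) > 100,  # either direction counts
        change.touches_public_contract,
        change.estimated_hours > 2,
        change.touches_safety_behavior,
    ]
    return [number for number, fired in enumerate(checks, start=1) if fired]


def gate_fires(change: ChangeProposal) -> bool:
    """Any ONE trigger is enough to require the Pre-Mortem Block."""
    return bool(fired_triggers(change))
```

A latency improvement of 150 ms still fires trigger 3, because the rule applies in either direction. Exactly 2 hours of estimated work does not fire trigger 5, since the rule says "more than".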

Verification & review

  • Nothing is complete until fresh verification output confirms it. "It should work now" means "I have not verified it."
  • A failing test is REAL until proven stale — investigate the touched area's prod code before classifying it stale, transient, or unrelated.
  • Name the oracle: every review names the production artifact each check observes. A proxy (mock, ORM-built schema, intermediate frame, spot-check subset) is recorded as a proxy; the gap is closed or accepted in writing.
  • Live contract smoke: any external SDK/API boundary ships a marker-gated live smoke (3–5 real-credential calls) run before deploy. Mocked-only coverage of an external boundary is an unverified boundary.
  • Migrated-schema oracle: integration-test databases are built via the migration chain (alembic upgrade head), never ORM create_all.
  • Silently diverging from the spec is never acceptable: fix the code or amend the spec with rationale.
  • Every review finding gets one of three dispositions — fixed / acknowledged (with tracking issue) / disputed (with technical reasoning). None ignored. [→ skill:s4u-code-review]
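
The three-disposition rule can be checked mechanically before a review is closed. This is a minimal sketch under assumed names (`Finding`, `Disposition`, `unresolved` are hypothetical and not taken from the s4u-code-review skill); it only encodes what the rule states: every finding has a disposition, acknowledged findings carry a tracking issue, and disputed findings carry technical reasoning.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    FIXED = "fixed"
    ACKNOWLEDGED = "acknowledged"
    DISPUTED = "disputed"


@dataclass
class Finding:
    """Hypothetical record of one review finding."""
    summary: str
    disposition: Optional[Disposition] = None
    tracking_issue: Optional[str] = None  # required when acknowledged
    reasoning: Optional[str] = None       # required when disputed


def unresolved(findings: list[Finding]) -> list[str]:
    """Return one message per finding that does not satisfy the disposition rule."""
    problems = []
    for finding in findings:
        if finding.disposition is None:
            problems.append(f"{finding.summary}: no disposition")
        elif finding.disposition is Disposition.ACKNOWLEDGED and not finding.tracking_issue:
            problems.append(f"{finding.summary}: acknowledged without tracking issue")
        elif finding.disposition is Disposition.DISPUTED and not finding.reasoning:
            problems.append(f"{finding.summary}: disputed without technical reasoning")
    return problems
```

An empty return value means no finding was ignored; anything else blocks closing the review.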

Testing

  • No mocking by default: real services via testcontainers (PostgreSQL, MinIO, Redis, …). Mock only at process boundaries; mocking internal classes is forbidden. [→ skill:s4u-testing-standard]
  • Every approved mock carries a MOCK APPROVED comment: what, why, approver, date, and how to run without it.
  • No time.sleep() in tests. Time control via the stack's standard time-control library (default: freezegun; Temporal tests use time-skipping).
  • In-memory databases as fixtures for Postgres-targeting code are forbidden.
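
The time.sleep() ban lends itself to a scripted check rather than reviewer vigilance. The following is a sketch of such a check using only Python's `ast` module; the function name `find_sleep_calls` is hypothetical and this is not the check shipped with the s4u skills. It catches `time.sleep(...)` and names imported via `from time import sleep`, but not aliased modules such as `import time as t`.

```python
from __future__ import annotations

import ast


def find_sleep_calls(source: str) -> list[int]:
    """Return the line numbers of time.sleep(...) calls in Python source text."""
    tree = ast.parse(source)

    # Names bound by `from time import sleep` or `from time import sleep as x`.
    sleep_aliases = {
        alias.asname or alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom) and node.module == "time"
        for alias in node.names
        if alias.name == "sleep"
    }

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_module_call = (
            isinstance(func, ast.Attribute)
            and func.attr == "sleep"
            and isinstance(func.value, ast.Name)
            and func.value.id == "time"
        )
        is_alias_call = isinstance(func, ast.Name) and func.id in sleep_aliases
        if is_module_call or is_alias_call:
            hits.append(node.lineno)
    return sorted(hits)
```

Run over each test file in a hook or CI step, a non-empty result fails the check and points at the offending lines.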

Gates are mechanisms

  • Gate admission rule: every gate declares (a) per-occurrence cost, (b) enforcement mechanism — a hook, CI check, or script; "the agent will remember" is not a mechanism — and (c) retirement condition.
  • Core CI gates are REQUIRED status checks on the default branch: full test suite (never -x), full lint (ruff check + ruff format --check, never rule subsets).
  • Products with a user-facing safety surface run a deterministic safety-floor eval subset as a required CI check.
  • Flag-flips to default-ON go through the probe-gated path (templates/scripts/flip-flag.sh): no recorded falsification-probe artifact, no flip.
  • Monthly consolidation review with a generated work-list (scripts/consolidation-census.sh): each cycle retires ≥2 rollout flags or merges one duplicated lane — or records why not. Unjustified no-op = review not done.
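
The gate admission rule can itself be validated by a script. The sketch below assumes a hypothetical `GateDeclaration` record and a hypothetical set of mechanism labels; the three required parts (cost, mechanism, retirement condition) and the accepted mechanism kinds (hook, CI check, script) come from the rule above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Labels are illustrative; the rule names a hook, a CI check, or a script.
ACCEPTED_MECHANISMS = {"hook", "ci-check", "script"}


@dataclass(frozen=True)
class GateDeclaration:
    """Hypothetical declaration a gate must file before it is admitted."""
    name: str
    per_occurrence_cost: str
    mechanism: str
    retirement_condition: str


def admission_errors(gate: GateDeclaration) -> list[str]:
    """Return the reasons a gate fails the admission rule (empty list = admitted)."""
    errors = []
    if not gate.per_occurrence_cost.strip():
        errors.append("missing per-occurrence cost")
    if gate.mechanism not in ACCEPTED_MECHANISMS:
        errors.append(f"'{gate.mechanism}' is not an enforcement mechanism")
    if not gate.retirement_condition.strip():
        errors.append("missing retirement condition")
    return errors
```

A declaration whose mechanism is "the agent will remember" is rejected here for the same reason the rule gives: nothing enforces it.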

Silent-failure discipline (R1–R3)

  • R1: any function returning a collection logs its size before returning.
  • R2: every silent-failure branch (empty fallback, NULL filter, swallowed exception) gets a regression-pinning test, shipped WITH the fix.
  • R3: cross-component shared state (WS/HTTP/DB handoffs) gets a contract test that simulates the wire format and asserts both sides agree.
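
R1 is easy to apply uniformly with a decorator. This is one possible shape, not the methodology's own helper: the decorator name `logs_size` and the logger name `s4u.r1` are assumptions made for the example.

```python
import functools
import logging
from collections.abc import Sized

logger = logging.getLogger("s4u.r1")  # hypothetical logger name


def logs_size(func):
    """Log the size of the returned collection just before returning it (R1)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # A None or otherwise unsized result is logged as such rather than hidden.
        size = len(result) if isinstance(result, Sized) else "unsized"
        logger.info("%s returned %s items", func.__name__, size)
        return result

    return wrapper


@logs_size
def active_sessions(rows):
    """Example collection-returning function; the filter is illustrative."""
    return [row for row in rows if row.get("active")]
```

With this in place, an empty result shows up in the logs as "returned 0 items" instead of disappearing, which is the signal R2's regression tests then pin.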

Data & schema

  • All schema changes via Alembic migrations — never a manual ALTER TABLE.

Documentation & state

  • A feature is not complete until its documentation exists — same tier as code and tests. [→ skill:s4u-doc-excellence]
  • STATE.md is generated from git/gh data, or absent. A missing file is safer than a misleading one.
  • Defensive-guard comments distinguish Verified <date> via <method>: <cause> from Hypothesized <date>: <cause>; upgrade the comment when the real cause is found.
  • Public doc syncs pass the classification gate (scripts/check-doc-classification.sh): no infra/secret-shaped content without an explicit audience: public.
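
The Verified / Hypothesized comment convention is regular enough to lint. The sketch below assumes ISO dates (YYYY-MM-DD) and `#`-style comments, neither of which the rule specifies; `classify_guard_comment` is a hypothetical helper name.

```python
import re

# Assumed shapes: "Verified <date> via <method>: <cause>" and
# "Hypothesized <date>: <cause>", with <date> written as YYYY-MM-DD.
VERIFIED = re.compile(r"Verified (\d{4}-\d{2}-\d{2}) via ([^:]+): (.+)")
HYPOTHESIZED = re.compile(r"Hypothesized (\d{4}-\d{2}-\d{2}): (.+)")


def classify_guard_comment(comment: str) -> str:
    """Classify a guard comment as 'verified', 'hypothesized', or 'unclassified'."""
    text = comment.lstrip("# ").strip()  # drop the leading comment marker
    if VERIFIED.fullmatch(text):
        return "verified"
    if HYPOTHESIZED.fullmatch(text):
        return "hypothesized"
    return "unclassified"
```

A linter built on this could flag "unclassified" comments on defensive guards and list "hypothesized" ones as candidates for upgrade once the real cause is found.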

Stack & deviations

  • Single source, pointers elsewhere: every rule lives in exactly one loadable home; project CLAUDE.md = pointer + ADR-backed deltas only. Verbatim copies are forbidden.
  • Three-tier stack (appendix-m is the single source): mandatory deviation = new ADR; default deviation = ADR-0001 entry; forbidden tier has no deviation path.
  • Native alert()/confirm()/browser dialogs are forbidden — toasts + inline confirmation. [→ skill:s4u-ui-review]

ADRs

  • ADR required for: technology choices, architectural patterns, data-model decisions, integration approaches, security decisions. [→ skill:s4u-adr; register checked by scripts/check-adr-register.sh]

Memory

  • Hub budget: 24,000 BYTES, durable sections first, one-line entries. [→ skill:s4u-memory-discipline]
  • Verify memory claims (counts, paths, line numbers) against current code before acting on them — memory is a starting point, not a conclusion.
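
The hub budget is stated in bytes, not characters, which matters once the hub contains non-ASCII text. A minimal check, assuming the hub is stored as UTF-8 (the encoding is an assumption, and the function names are hypothetical):

```python
HUB_BUDGET_BYTES = 24_000  # from the Memory rule above


def hub_size_bytes(text: str) -> int:
    """Size of the hub content in bytes, assuming UTF-8 storage."""
    return len(text.encode("utf-8"))


def within_budget(text: str, budget: int = HUB_BUDGET_BYTES) -> bool:
    """True when the hub content fits the byte budget."""
    return hub_size_bytes(text) <= budget
```

Counting `len(text)` instead would under-report: a hub of 12,001 two-byte characters has fewer than 24,000 characters but exceeds the byte budget.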

Team operation (N>1)

  • The repo is the memory: anything a teammate needs lives in repo-versioned docs; ONBOARDING.md is the single entry point.
  • Org-owned repo with branch protection — required checks must be enforceable, not advisory.
  • CODEOWNERS-backed human review on safety paths.
  • Permission mode is a security control: promptless agent action against production-reaching surfaces is a team decision; default = plan/ask mode.
  • Deploys take a lock and refuse during active jobs or concurrent deploys.
  • Incident roles (who is paged, who rolls back, where forensics land) are named in the runbook before the first incident.
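
The deploy-lock rule can be sketched with an exclusively created lock file. This is an illustration under stated assumptions, not the project's deploy tooling: the names `deploy_lock` and `DeployRefused` are hypothetical, the active-job count is passed in by the caller, and a file lock only coordinates deploys that share one filesystem.

```python
import os
from contextlib import contextmanager


class DeployRefused(RuntimeError):
    """Raised when a deploy must not proceed."""


@contextmanager
def deploy_lock(lock_path: str, active_jobs: int):
    """Hold an exclusive deploy lock; refuse if jobs are active or a lock exists."""
    if active_jobs > 0:
        raise DeployRefused(f"{active_jobs} active job(s); refusing to deploy")
    try:
        # O_EXCL makes creation atomic: it fails if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise DeployRefused("another deploy holds the lock") from None
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the holder for forensics
        yield
    finally:
        os.remove(lock_path)  # release only the lock this call created
```

A process that crashes without running the `finally` block leaves a stale lock behind; a real implementation would need a documented way to detect and clear it.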