
The Canonical Technology Stack

Added in methodology v2.1 (2026-04-28). The instruction hierarchy (§4.1-4.4) tells AI agents what to read and which skills to invoke. The canonical tech stack tells them what to build with — methodology-level defaults that flow downstream into every project's CLAUDE.md and ADR-0001 instead of being re-derived per project.

What: The canonical tech stack is a three-tier list of the libraries, tools, and patterns that S4U projects use across the board. Every project's ADR-0001 references this list as its starting point and either accepts the canonical defaults or amends them with explicit rationale. The three tiers have distinct semantics:

  • Mandatory. Always use this. Deviating requires an ADR explicitly justifying why and what the alternative buys. Examples: pydantic v2 for runtime validation, structlog for structured logging (never stdlib logging directly in application code), testcontainers for integration tests against external services, pytest-asyncio in auto mode (asyncio_mode = auto) for async tests, pyright for static type checking. The bar for deviation is high because mandatory items are load-bearing for the methodology's other guarantees (e.g., evidence-over-claims requires structured logs that are mechanically queryable; testcontainers is what makes the no-mocking-by-default rule from §7 enforceable).

  • Default. Use this unless you have a project-specific reason to pick differently; the reason goes in ADR-0001. Examples: httpx for HTTP clients, respx for HTTP testing, freezegun for time-travel tests, asyncpg for application Postgres queries, psycopg 3 only when needed (e.g., LangGraph's PostgresSaver per Ratiba ADR-0001), SQLAlchemy 2.0 async + Alembic for ORM + migrations. The bar for deviation is moderate — these are the right defaults but the cost of swapping is bounded.

  • Forbidden. Never use this regardless of project. Examples: time.sleep() in tests (use fake clocks); mocking-by-default (use testcontainers, see §7); in-memory databases as test fixtures (use real Postgres in testcontainers); the requests library for new code (sync, not async-native); unittest.mock of internal classes (mock at process boundaries only, see §5.4); alert() / confirm() / native browser dialogs in UI code (use Sonner toasts + inline UI, see §UI/UX rules).
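
The forbidden-tier entry on time.sleep() in tests is easiest to see with a small example. The sketch below uses only the standard library and an injected clock. FakeClock and TokenCache are hypothetical names invented for illustration, not part of any S4U project. freezegun (the default-tier choice for time-travel tests) reaches the same goal by patching the clock instead of injecting one.

```python
import time
from typing import Callable, Optional


class FakeClock:
    """Hypothetical test double: time advances only when the test says so."""

    def __init__(self, start: float = 0.0) -> None:
        self._now = start

    def __call__(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds


class TokenCache:
    """Hypothetical unit under test: holds a token for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injected, so tests never need to sleep
        self._token: Optional[str] = None
        self._stored_at = 0.0

    def put(self, token: str) -> None:
        self._token = token
        self._stored_at = self._clock()

    def get(self) -> Optional[str]:
        if self._token is None:
            return None
        if self._clock() - self._stored_at >= self._ttl:
            self._token = None  # expired: drop it
        return self._token


def test_token_expires_without_sleeping() -> None:
    clock = FakeClock()
    cache = TokenCache(ttl=30.0, clock=clock)
    cache.put("abc")

    clock.advance(29.0)  # instant, no wall-clock wait
    assert cache.get() == "abc"

    clock.advance(1.0)  # reaches the 30-second TTL exactly
    assert cache.get() is None
```

Production code would construct TokenCache with the default time.monotonic. Only the test passes the fake, so the expiry path is exercised deterministically and the suite never waits on real time.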

Why: Cross-project consistency compounds. Trust Relay, Zol-RAG, and Ratiba.chat have slightly different domains, but roughly 80% of their stacks are identical, and where they overlap, contributors can move between projects without re-learning HTTP testing patterns, logging conventions, or migration shapes. Currently that consistency is encoded by osmosis (a contributor on project N looks at project N-1 for the pattern) rather than by methodology rule. Codifying the stack means a fresh project starts at the right defaults from day zero, and a fresh contributor reads one document instead of three codebases.

The three-tier distinction is load-bearing. A flat "preferred stack" list collapses the meaningful difference between "you must use structlog" (the no-stdlib-logging rule is what makes the evidence pattern in §2.3 tractable) and "you should reach for httpx first" (a defensible default that doesn't break the methodology if a project picks differently). Mandatory items are requirements; defaults are opinions; forbidden items are guardrails. Each tier has a different review cost when a project wants to deviate.

Three load-bearing rules:

  1. Single source, pointers elsewhere. Every normative rule — including the mandatory stack list — lives in exactly one loadable home (the operating card, a named skill, or appendix-m for the stack). The project's CLAUDE.md carries a one-line pointer to that home plus only genuinely project-specific deltas, each delta backed by a deviation ADR (e.g., Ratiba's psycopg-for-LangGraph exception). Verbatim copies are forbidden: the 2026-06-11 assessment traced every contradiction pair it found (freezegun, the stack table stored three ways and wrong in all three, alembic-state, PR-state) to a hand-maintained copy that drifted (findings CE-5/PW-5). Drift is checked mechanically by scripts/check-canon-consistency.sh in this repo's CI, not by periodic human re-reading. Mandatory items remain non-negotiable per-PR; what changed in v3 is only WHERE they live — once, not four times.

  2. Deviation has an ADR cost proportional to the tier. Mandatory deviation: a new ADR in the project's docs/adr/ justifying why this project departs from canon. Default deviation: an entry in the project's existing ADR-0001 explaining the swap (no new ADR needed — the tech-stack ADR already exists). Forbidden tier has no deviation path; the cost is "you can't ship that and still claim to follow the methodology." This proportionality is what gives the tiers teeth — without it, every choice slides toward "default" and the list loses its enforcement value.

  3. The canonical stack is amended from real-project evidence, not from speculation. A library moves from default to mandatory only after ≥2 projects have shipped with it for ≥3 months. A library moves to forbidden only after ≥1 project has hit a real failure mode that the library caused (concretely cited in the canonical-stack appendix). New library additions land in the default tier first; promotion or demotion happens by amendment. This rule prevents the list from accumulating wishful-thinking entries that nobody actually uses.
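
Rules 2 and 3 above are mechanical enough to write down as code. The following is a minimal sketch, not tooling that exists in the methodology repo. Tier, Usage, and the function names are hypothetical, and the reading of "≥2 projects for ≥3 months" as "each of at least two distinct projects has shipped with the library for at least three months" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    MANDATORY = "mandatory"
    DEFAULT = "default"
    FORBIDDEN = "forbidden"


def deviation_path(tier: Tier) -> str:
    """Rule 2: what a project must write in order to deviate from a tier."""
    if tier is Tier.MANDATORY:
        return "new ADR in docs/adr/"
    if tier is Tier.DEFAULT:
        return "entry in existing ADR-0001"
    # The forbidden tier deliberately has no deviation path.
    raise ValueError("forbidden tier has no deviation path")


@dataclass(frozen=True)
class Usage:
    """Hypothetical record: one project's shipping history with a library."""

    project: str
    months_shipped: int


def eligible_for_mandatory(usages: List[Usage]) -> bool:
    """Rule 3: default -> mandatory needs >=2 projects at >=3 months each."""
    qualifying = {u.project for u in usages if u.months_shipped >= 3}
    return len(qualifying) >= 2


def eligible_for_forbidden(cited_failures: List[str]) -> bool:
    """Rule 3: forbidding needs >=1 concretely cited real failure."""
    return len(cited_failures) >= 1
```

For example, eligible_for_mandatory([Usage("project-a", 4), Usage("project-b", 2)]) is False because only one project clears the three-month bar, and deviation_path(Tier.FORBIDDEN) raises rather than returning a cheaper option.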

Evidence: The current canonical stack is curated from the union of three S4U projects (Trust Relay, Zol-RAG, Ratiba.chat) with explicit "inspired-from" pointers per entry so the lineage is auditable. The mandatory items are the ones that all three projects shipped with from day one; the defaults are the ones that emerged as obvious consensus across the projects' early phases; the forbidden items are codified failures (e.g., the "no mocking by default" rule comes from Trust Relay losing two days to mock/prod divergence in 2026-Q1). The full stack list, organised by category (backend / frontend / data / testing / observability / infrastructure) with inspired-from references and deviation-rationale templates, is in appendix-m-canonical-stack.md.

The canonical stack interacts with the rest of the methodology:

  • Project ADR-0001 (tech stack) becomes a thin amendment on top of the canonical stack, not a from-scratch derivation. Most ADR-0001s become 1-2 pages instead of 5-10.
  • §7 quality gates (no mocking, evidence-over-claims) are enforceable only because the mandatory tier guarantees the underlying primitives exist. The structlog mandate makes "evidence is structured-log queries" a methodology rule rather than a project preference.
  • §5.6 product-scale planning includes a step "verify each milestone's ADRs reference the canonical stack" — drift between methodology canon and project ADR-0001 is a cross-consistency review issue.
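
The two drift concerns in this section (verbatim copies banned by rule 1, and project documents that fail to reference the canonical stack) can both be checked by a few lines of code. The sketch below is illustrative only: it is not scripts/check-canon-consistency.sh, whose contents are not shown here, and check_project_doc and its exact-line matching are assumptions.

```python
from typing import List

# The single home of the stack list, as named in this section.
CANON_HOME = "appendix-m-canonical-stack.md"


def check_project_doc(doc_text: str, canon_lines: List[str]) -> List[str]:
    """Return findings for one project document; an empty list means clean.

    Hypothetical check: flags a missing pointer to the canonical home and
    any canonical rule line that was copied verbatim into the document.
    """
    findings: List[str] = []

    if CANON_HOME not in doc_text:
        findings.append(f"missing pointer to {CANON_HOME}")

    doc_lines = {line.strip() for line in doc_text.splitlines()}
    for rule in canon_lines:
        rule = rule.strip()
        if rule and rule in doc_lines:
            findings.append(f"verbatim copy of canonical rule: {rule!r}")

    return findings


canon = ["structlog for structured logging", "pyright for static type checking"]

clean_doc = "Stack: see appendix-m-canonical-stack.md\nDelta: psycopg 3 for LangGraph"
drifted_doc = "Stack:\nstructlog for structured logging\nhttpx for HTTP clients"

assert check_project_doc(clean_doc, canon) == []
assert len(check_project_doc(drifted_doc, canon)) == 2
```

Exact-line matching only catches literal copies. A paraphrased copy would pass, so a real check would likely need something stricter.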