Note (canon v3): the operational content of this appendix is carried by the s4u-testing-standard skill — agents load THAT; this appendix remains the full reference.

Appendix A: Testing Standard

The comprehensive testing reference for S4U projects. It distills the core testing principles, adds patterns for workflow orchestration and async Python services, and defines the coverage targets and evidence requirements that apply across all projects.

In one line: tests exist before merge, verify behaviour against real services, are deterministic and fast, and ship with copy-pasted output as evidence.

For the hub document, see methodology.md.

Core Principles
PoC vs Production Mode
Coverage Targets
No Mocking by Default
Testcontainers Fixtures
Temporal Workflow Testing
Async Python Testing Patterns
HTTP Boundary Mocking
Security Tests
API Contract Alignment
Evidence Requirements
Frontend Testing
Quick Reference

1. Core Principles

Five principles govern all testing decisions. They apply in both PoC and Production mode, with thresholds adjusted per mode.

Principle 1: Tests Must Exist Before Merge

In one line: no code merges without passing tests for the changed code.

Every pull request must include tests covering the changed code.

# Pre-merge verification (both modes)
pytest tests/ -v
pytest --cov=app --cov-report=term-missing

The coverage threshold depends on the mode (see Section 3); the requirement that tests exist is absolute. A PR with zero tests for new functionality is never mergeable.

Principle 2: Tests Must Catch Real Bugs

In one line: assert observable behaviour against real state, not that a function was called.

A test that passes regardless of whether the code works correctly has negative value — it provides false confidence.

Bad — proves nothing:

def test_user_service(mock_service):
    mock_service.create.assert_called()  # What did it create? Was it correct?

Good — verifies behavior:

def test_user_service_creates_user(db_session):
    result = user_service.create(email="test@example.com")

    assert result.id is not None
    assert result.email == "test@example.com"

    # Verify the user was actually persisted
    persisted = db_session.query(User).filter_by(email="test@example.com").first()
    assert persisted is not None
    assert persisted.id == result.id

The bad test verifies a function was called. The good test verifies a user with the correct attributes was created and persisted in a real database. If the implementation breaks, the good test fails and the bad test might still pass.

Principle 3: Real Services Always (No Mocking by Default)

In one line: mocking and in-memory databases are forbidden unless explicitly approved — see Section 4.

Principle 4: Deterministic Tests Only

In one line: identical results on every run — no wall-clock, no sleeps, no unseeded randomness, no order dependence.

Non-determinism destroys confidence in the suite and wastes time on phantom failures.

Forbidden:

time.sleep() — creates timing-dependent tests that are slow and flaky
Reliance on wall-clock time — use controlled time mechanisms
Uncontrolled random inputs without seeding
Tests that depend on execution order
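
For the seeding rule, a minimal sketch: the test owns a local random.Random instance with a fixed seed, so the generated inputs repeat on every run. The helper make_order_amounts and the seed value are hypothetical, chosen only for illustration.

```python
import random


def make_order_amounts(rng: random.Random, count: int) -> list[int]:
    """Hypothetical input generator: amounts in cents, drawn from the given RNG."""
    return [rng.randint(1, 10_000) for _ in range(count)]


def test_amounts_are_reproducible():
    # A local, seeded RNG leaves the global random state untouched,
    # so test order and other tests cannot change the generated inputs.
    first = make_order_amounts(random.Random(1234), count=5)
    second = make_order_amounts(random.Random(1234), count=5)

    assert first == second
    assert all(1 <= amount <= 10_000 for amount in first)
```

Passing the RNG in as a parameter, rather than calling random.seed() globally, also removes one source of order dependence between tests.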

For workflow-engine code, use the engine's built-in time-skipping environment instead of clock-manipulation libraries. See Section 6 for the correct pattern.

For cache/TTL testing and other time-dependent logic:

# Use a controllable clock abstraction appropriate to your domain.
# The key requirement: time must be deterministic and test-controlled.
# Temporal provides start_time_skipping(). For other contexts,
# use the framework's native time control or inject a clock dependency.
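
One way to inject a clock dependency, sketched with the standard library only. FakeClock and TTLCache are hypothetical names, not part of any S4U codebase; the point is that the code under test reads time through a callable the test controls.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeClock:
    """Test-controlled clock: time only moves when the test says so."""
    now: float = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class TTLCache:
    """Hypothetical cache; the clock is injected instead of read directly."""
    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic  # production default
    _entries: dict = field(default_factory=dict)

    def set(self, key, value) -> None:
        self._entries[key] = (value, self.clock() + self.ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value


def test_entry_expires_after_ttl():
    clock = FakeClock()
    cache = TTLCache(ttl_seconds=60, clock=clock)
    cache.set("token", "abc")

    clock.advance(59)
    assert cache.get("token") == "abc"  # still inside the TTL

    clock.advance(1)
    assert cache.get("token") is None  # expired at 60s, no sleep needed
```

The test covers a 60-second TTL without waiting, and the result does not depend on how fast the machine runs.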

Principle 5: Fast by Default

In one line: the default suite runs in seconds; slow tests are opt-in behind a marker + env gate.

The default test suite must complete quickly. Slow tests are opt-in.

import os
import pytest

# Default: runs in every test suite execution (<30s)
def test_validation():
    assert User(email="test@example.com").is_valid()

# Opt-in slow test: only runs when explicitly requested
@pytest.mark.slow
@pytest.mark.skipif(not os.getenv("RUN_SLOW_TESTS"), reason="Slow test")
def test_full_pipeline():
    """End-to-end pipeline test requiring all services."""
    ...

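Register the slow marker in the pytest configuration so pytest recognizes it. The RUN_SLOW_TESTS gate on the test itself is what keeps slow tests out of the default run:

# pytest.ini (in pyproject.toml, use [tool.pytest.ini_options] with a list of strings)
[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')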

# Normal development: fast tests only
pytest tests/ -v

# CI or explicit: include slow tests
RUN_SLOW_TESTS=1 pytest tests/ -v

2. PoC vs Production Mode

The methodology supports two quality modes. The mode is declared in the project's CLAUDE.md and governs coverage thresholds, required test types, and workflow expectations.

PoC Mode

In one line: test-after is fine, but tests still exist before merge, hit 70%, pass clean, and obey the no-sleep / no-unapproved-mock rules.

Use for: exploratory work, prototypes, proof-of-concept features, early-stage development.

Allowed:

Write code first, tests after (test-after workflow)
Skip edge case tests initially
Use simpler assertions
Skip E2E tests

Still required:

Tests must exist before PR merge
70% coverage minimum (may be overridden higher for critical layers)
All tests must pass (pytest exit code 0)
No time.sleep() in tests
No mocking without approval (Principle 3 applies in full)

Workflow:

Implement feature (explore freely)
Get it working (manual testing acceptable during development)
Write tests for what you built
Verify: pytest --cov=app --cov-fail-under=70
Commit & PR

Production Mode

In one line: 90% line / 85% failure-branch coverage, real-service integration tests, security tests on auth/data endpoints, E2E for user-facing apps; bug fixes start with a failing test.

Use for: user-facing features, bug fixes, refactoring, anything that ships to users.

Required:

90% line coverage minimum
85% failure branch coverage
Security tests for auth/data endpoints (see Section 9)
Real services in integration tests
E2E tests for user-facing applications

Bug fix workflow (TDD strongly encouraged):

Write failing test that reproduces the bug
Run test — confirm it fails (RED)
Fix the bug
Run test — confirm it passes (GREEN)
Refactor if needed
Commit with test evidence

New feature workflow:

If requirements are clear: TDD approach
If exploring: test-after approach
Either way: 90% coverage before merge

3. Coverage Targets

In one line: PoC gates at 70% line; Production gates at 90% line / 85% failure-branch / 80% real-integration, and critical layers get Production-grade coverage even in PoC.

By Mode

Mode	Line Coverage	Failure Branch Coverage	Real Integration Ratio
PoC	70%	--	--
Production	90%	85%	80%

Failure branch coverage: percentage of error/exception paths that have dedicated tests. The 85% target leaves little slack: for a function with 3 failure modes, testing 2 gives 67%, so in Production mode plan to test all 3.

Real integration ratio: percentage of integration tests that use real services (via testcontainers) vs mocks. In Production mode, at least 80% of integration tests must run against real databases, queues, and object stores.
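
The two ratios are simple arithmetic once the counts are known. The sketch below assumes the counts are gathered by hand or by project tooling (the standard does not prescribe how); SuiteStats and production_gaps are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteStats:
    failure_paths_total: int      # error/exception paths identified in the code
    failure_paths_tested: int     # of those, paths with a dedicated test
    integration_tests_total: int
    integration_tests_real: int   # integration tests running on real services


def percent(part: int, whole: int) -> float:
    # An empty denominator counts as fully covered: there is nothing to miss.
    return 100.0 * part / whole if whole else 100.0


def production_gaps(stats: SuiteStats) -> list[str]:
    """Return the Production thresholds (85% / 80%) that this suite misses."""
    gaps = []
    failure_branch = percent(stats.failure_paths_tested, stats.failure_paths_total)
    real_ratio = percent(stats.integration_tests_real, stats.integration_tests_total)
    if failure_branch < 85.0:
        gaps.append(f"failure branch coverage {failure_branch:.0f}% < 85%")
    if real_ratio < 80.0:
        gaps.append(f"real integration ratio {real_ratio:.0f}% < 80%")
    return gaps
```

For example, 17 of 20 failure paths tested and 8 of 10 integration tests on real services sits exactly on both thresholds and reports no gaps.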

By Layer (PoC Mode Overrides)

Projects may override per-layer targets in their CLAUDE.md. A typical split:

Layer	Target	Rationale
Core business logic (state machines, domain services)	90%	Errors here are correctness/compliance failures
API endpoints	70%	Standard CRUD; less risk
UI components	70%	UI logic; caught by manual testing
Infrastructure integration	70%	Stable once working

Do this: if a bug in a layer would cause compliance, financial, or data-integrity failures, give that layer Production-grade coverage regardless of overall project mode.

Verification Commands

# Overall coverage gate
pytest --cov=app --cov-fail-under=70 --cov-report=term-missing

# Per-module coverage (for layer-specific targets)
pytest --cov=app.workflows --cov-fail-under=90 --cov-report=term-missing
pytest --cov=app.api --cov-fail-under=70 --cov-report=term-missing

4. No Mocking by Default

In one line: mocking and in-memory databases are forbidden unless approved with a MOCK APPROVED comment naming what, who, and how to run real.

The Rule

Mocking and in-memory databases are forbidden unless explicitly approved. Refactoring mocked tests onto real services costs more than the time saved up front, and mocked tests verify that your code interacts correctly with a mock — not that it works with the real service.

What Is Forbidden (Without Approval)

unittest.mock.Mock() for services
MagicMock for database or API clients
SQLite as PostgreSQL substitute
In-memory databases (H2, SQLite :memory:)
localStorage/IndexedDB mocks in frontend tests
fakeredis, moto, or similar fake services

When Mocking Is Allowed

Mocking is permitted in these cases, with documented justification:

Allowed Mock Target	Rationale
Third-party APIs with rate limits or costs (Stripe, OpenAI, etc.)	Economic and practical constraint
Services that cannot run in containers (proprietary systems)	Technical impossibility
Pure functions with no I/O (unit tests)	No external dependency to mock — this is standard unit testing

Approval Process

Every use of mocking requires a structured comment in the test file:

# MOCK APPROVED: OpenAI API - cost and rate limit concerns
# Approved by: [name] on [date]
# Alternative: Set REAL_OPENAI=1 to run against real API
@pytest.fixture
def mock_openai():
    """Mocked OpenAI client for cost-sensitive tests."""
    with respx.mock:
        respx.post("https://api.openai.com/v1/chat/completions").mock(
            return_value=httpx.Response(200, json={"choices": [...]})
        )
        yield

The approval comment must include:

What is being mocked and why (the MOCK APPROVED: line)
Who approved and when (the Approved by: line)
How to run against the real service when needed (the Alternative: line)

Files using mocks without the approval comment are flagged during code review.

5. Testcontainers Fixtures

Testcontainers provide ephemeral, real instances of infrastructure services: start a real Docker container, run tests against it, tear it down. This is the standard approach for all integration tests.

In one line: container fixtures are session-scoped (slow to start); session/client fixtures are function-scoped and roll back for per-test isolation.

Prerequisites

Docker must be running on the development machine
Python package: testcontainers (add to dev dependencies)

PostgreSQL Fixture

# conftest.py
import pytest
from testcontainers.postgres import PostgresContainer
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker


@pytest.fixture(scope="session")
def postgres_container():
    """Start a real PostgreSQL instance for the test session."""
    with PostgresContainer("postgres:16-alpine") as postgres:
        yield postgres


@pytest.fixture(scope="session")
def db_url(postgres_container):
    """Async database URL from the running container."""
    sync_url = postgres_container.get_connection_url()
    # Convert psycopg2 URL to asyncpg URL
    return sync_url.replace("psycopg2", "asyncpg")


@pytest.fixture
async def db_session(db_url):
    """Per-test async database session with automatic rollback."""
    engine = create_async_engine(db_url)

    # Create tables (use your project's Base.metadata)
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

    async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
    async with async_session() as session:
        yield session
        await session.rollback()

    await engine.dispose()

MinIO (S3-Compatible) Fixture

# conftest.py
import pytest
import boto3
from testcontainers.minio import MinioContainer


@pytest.fixture(scope="session")
def minio_container():
    """Start a real MinIO instance for the test session."""
    with MinioContainer() as minio:
        yield minio


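@pytest.fixture
def s3_client(minio_container):
    """S3 client configured to talk to the test MinIO instance."""
    config = minio_container.get_config()
    client = boto3.client(
        "s3",
        # get_config() returns "host:port"; boto3 needs a URL with a scheme
        endpoint_url=f"http://{config['endpoint']}",
        aws_access_key_id=config["access_key"],
        aws_secret_access_key=config["secret_key"],
    )
    # The container is session-scoped, so create the bucket only once
    existing = {bucket["Name"] for bucket in client.list_buckets()["Buckets"]}
    if "test-bucket" not in existing:
        client.create_bucket(Bucket="test-bucket")
    yield client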

Session vs Function Scope

Use scope="session" for container fixtures — containers are expensive to start (2-5 seconds each). Use scope="function" (default) for session/client fixtures so each test gets a clean state.

# Container: starts once per test session
@pytest.fixture(scope="session")
def postgres_container(): ...

# Session: new per test, rolls back for isolation
@pytest.fixture  # scope="function" is the default
async def db_session(postgres_container): ...

6. Temporal Workflow Testing

Workflow-engine code requires specialized testing patterns. The engine's Python SDK provides a built-in testing environment that runs an in-memory server with time-skipping capabilities — drive timers through the SDK, never with external clock manipulation.

In one line: use the engine's time-skipping WorkflowEnvironment; never freezegun or time.sleep() for engine timers, and poll state with short async sleeps when you must wait.

The Correct Pattern: `WorkflowEnvironment.start_time_skipping()`

import pytest
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker
from temporalio.client import Client

from app.workflows.compliance_case import ComplianceCaseWorkflow
from app.workflows.activities import (
    process_documents,
    run_enrichment_lookup,
    generate_follow_up_tasks,
)


@pytest.fixture
async def temporal_env():
    """In-memory Temporal server with automatic time-skipping."""
    async with await WorkflowEnvironment.start_time_skipping() as env:
        yield env


@pytest.fixture
async def temporal_worker(temporal_env):
    """Worker registered with the workflow and all activities."""
    async with Worker(
        temporal_env.client,
        task_queue="test-queue",
        workflows=[ComplianceCaseWorkflow],
        activities=[
            process_documents,
            run_enrichment_lookup,
            generate_follow_up_tasks,
        ],
    ) as worker:
        yield worker


async def test_workflow_happy_path(temporal_env, temporal_worker):
    """Test the full compliance workflow from creation to approval."""
    # Start the workflow
    handle = await temporal_env.client.start_workflow(
        ComplianceCaseWorkflow.run,
        args=["case-123", "company-456"],
        id="test-workflow-1",
        task_queue="test-queue",
    )

    # Signal: documents submitted
    await handle.signal(ComplianceCaseWorkflow.documents_submitted, {
        "files": ["invoice.pdf", "registration.pdf"],
    })

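    # Query: wait until processing has finished, then check status.
    # wait_for_status is the polling helper defined under "What NOT to Use";
    # querying immediately after the signal would race the activities.
    status = await wait_for_status(handle, "REVIEW_PENDING")
    assert status["iteration"] == 1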

    # Signal: officer approves
    await handle.signal(ComplianceCaseWorkflow.officer_decision, {
        "decision": "APPROVED",
        "notes": "All documents verified.",
    })

    # Wait for workflow completion
    result = await handle.result()
    assert result["final_status"] == "APPROVED"

What NOT to Use

Do not use freezegun for workflow tests. The engine manages its own internal clock for timers, retries, and deadlines. External clock manipulation (freezegun, unittest.mock.patch("time.time"), manual datetime stubbing) does not affect the engine's internal timers and produces incorrect test behavior.

Do not use time.sleep() to wait for workflow state transitions. The time-skipping environment advances timers automatically. To wait for a specific state, query the workflow in a loop with a short async sleep:

import asyncio

async def wait_for_status(handle, expected_status, timeout=10.0):
    """Wait for workflow to reach a specific status."""
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside a coroutine
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        status = await handle.query(ComplianceCaseWorkflow.get_status)
        if status["state"] == expected_status:
            return status
        await asyncio.sleep(0.1)  # Short poll interval, not a delay
    raise TimeoutError(f"Workflow did not reach {expected_status} within {timeout}s")

Testing Signals and Queries

Workflows communicate via signals (inbound events) and queries (state inspection). Test both explicitly:

async def test_workflow_follow_up_loop(temporal_env, temporal_worker):
    """Test that follow-up creates a new iteration."""
    handle = await temporal_env.client.start_workflow(
        ComplianceCaseWorkflow.run,
        args=["case-789", "company-012"],
        id="test-workflow-follow-up",
        task_queue="test-queue",
    )

    # First iteration: submit docs, wait for review
    await handle.signal(ComplianceCaseWorkflow.documents_submitted, {
        "files": ["partial-doc.pdf"],
    })

    status = await wait_for_status(handle, "REVIEW_PENDING")
    assert status["iteration"] == 1

    # Officer requests follow-up
    await handle.signal(ComplianceCaseWorkflow.officer_decision, {
        "decision": "FOLLOW_UP_REQUIRED",
        "notes": "Missing bank statement.",
        "required_documents": ["bank_statement"],
    })

    # Workflow should loop back to AWAITING_DOCUMENTS
    status = await wait_for_status(handle, "AWAITING_DOCUMENTS")
    assert status["iteration"] == 2

7. Async Python Testing Patterns

pytest-asyncio Configuration

Set asyncio_mode=auto in your pytest configuration. This eliminates the need for @pytest.mark.asyncio decorators on every async test function.

# pytest.ini
[pytest]
asyncio_mode = auto

With this setting, all async def test_* functions are automatically recognized as async tests:

# No decorator needed — asyncio_mode=auto handles it
async def test_create_case(db_session):
    case = await case_service.create(db_session, company_name="Test Corp")
    assert case.id is not None

asyncpg Compatibility: CAST Syntax

In one line: with asyncpg, use CAST(:param AS type) — the ::type cast operator breaks on bound parameters.

asyncpg does not support the :: cast operator with bound parameters.

Forbidden:

# This fails at execution time, typically with a PostgreSQL syntax error near ":"
await session.execute(
    text("INSERT INTO cases (metadata) VALUES (:data::jsonb)"),
    {"data": json.dumps({"key": "value"})},
)

Correct:

# Use CAST() function instead
await session.execute(
    text("INSERT INTO cases (metadata) VALUES (CAST(:data AS jsonb))"),
    {"data": json.dumps({"key": "value"})},
)

This applies to all ::type casts in parameterized SQL: ::jsonb, ::uuid, ::integer, ::text[], etc. Always use CAST(:param AS type) instead.
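
A small helper can find and rewrite the forbidden form before it reaches the database. This is a sketch, assuming simple one-word type names with an optional [] suffix; types such as numeric(10,2) or double precision are not handled, and the function names are hypothetical.

```python
import re

# Matches :name::type, with an optional [] suffix for array types such as text[].
# The lookbehind skips the second colon of a plain column cast like created_at::date.
_BOUND_CAST = re.compile(r"(?<!:):(\w+)::([A-Za-z_]\w*(?:\[\])?)")


def find_bound_casts(sql: str) -> list[str]:
    """Return every ':param::type' occurrence in a SQL string."""
    return [match.group(0) for match in _BOUND_CAST.finditer(sql)]


def to_cast_syntax(sql: str) -> str:
    """Rewrite ':param::type' to 'CAST(:param AS type)'."""
    return _BOUND_CAST.sub(r"CAST(:\1 AS \2)", sql)
```

find_bound_casts suits a lint step or a test that scans the project's SQL strings; casts on columns (for example created_at::date) contain no bound parameter and are left alone.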

Async Fixture Patterns

@pytest.fixture
async def case_with_documents(db_session, s3_client):
    """Create a case with uploaded documents for testing."""
    case = await case_service.create(db_session, company_name="Test Corp")

    # Upload a test document to MinIO
    s3_client.put_object(
        Bucket="test-bucket",
        Key=f"{case.id}/iteration-1/invoice.pdf",
        Body=b"%PDF-1.4 test content",
    )

    await case_service.update_status(db_session, case.id, "DOCUMENTS_RECEIVED")
    # flush, not commit: committed rows would survive the per-test rollback
    await db_session.flush()

    yield case

    # Cleanup happens via session rollback (db) and container teardown (MinIO)

8. HTTP Boundary Mocking

For services that call external HTTP APIs (third-party enrichment, AI providers), mock at the HTTP transport boundary using respx (for httpx) or responses (for requests). Never mock the service function itself — only the HTTP call it makes.

In one line: mock the wire, not the function — so the request assembly, response parsing, and business rules still get tested.

The Pattern: Mock the Wire, Not the Function

import httpx
import respx
import pytest

from app.workflows.activities import run_enrichment_lookup


# MOCK APPROVED: enrichment API - external service, not available in test environment
# Approved by: [architect] on [date]
# Alternative: Set ENRICHMENT_MOCK_MODE=false with the real enrichment service running
@pytest.fixture
def mock_enrichment_api():
    """Mock the external enrichment API HTTP boundary."""
    with respx.mock:
        respx.post("https://enrichment.example.com/api/lookup").mock(
            return_value=httpx.Response(
                200,
                json={
                    "findings": [
                        {
                            "source": "commercial_register",
                            "entity": "Test Corp BV",
                            "status": "active",
                            "confidence": 0.95,
                        }
                    ],
                    "risk_score": 0.3,
                },
            )
        )
        yield


async def test_enrichment_lookup_processes_findings(mock_enrichment_api, db_session):
    """Test that the activity correctly processes the enrichment API response."""
    result = await run_enrichment_lookup(
        case_id="case-123",
        company_name="Test Corp BV",
        documents_markdown=["# Invoice\nAmount: EUR 10,000"],
    )

    # Verify the activity processed the response correctly
    assert result.risk_score == 0.3
    assert len(result.findings) == 1
    assert result.findings[0].source == "commercial_register"

Why This Pattern

The function under test contains real business logic: it assembles the request, calls the API, parses the response, applies business rules, and returns a structured result. Mocking only the HTTP wire exercises all of that logic. Mocking the function itself would test nothing.

respx vs httpx.MockTransport

Both work. respx is preferred for its declarative API:

# respx: declarative, context-managed
with respx.mock:
    respx.get("https://api.example.com/data").mock(
        return_value=httpx.Response(200, json={"key": "value"})
    )
    # tests run here

# httpx.MockTransport: lower-level, useful for complex routing
def handler(request):
    if request.url.path == "/data":
        return httpx.Response(200, json={"key": "value"})
    return httpx.Response(404)

transport = httpx.MockTransport(handler)
client = httpx.AsyncClient(transport=transport)

9. Security Tests

Required in Production mode for any application handling user data or implementing multi-tenancy. Strongly recommended (not enforced) in PoC mode for auth and tenant-isolation endpoints.

In one line: test that protected endpoints reject unauthenticated requests (401), cross-tenant access is blocked (403/404), PII never reaches logs, and RLS filters rows at the DB.

Authentication Enforcement

Every protected endpoint must be tested without credentials to verify it returns 401:

async def test_endpoint_requires_auth(client):
    """Verify that unauthenticated requests are rejected."""
    response = await client.get("/api/cases")
    assert response.status_code == 401

Tenant Isolation

Multi-tenant applications must verify that users from one tenant cannot access another tenant's data:

async def test_tenant_isolation(client, tenant_a_token, tenant_b_case):
    """Verify that Tenant A cannot access Tenant B's case."""
    response = await client.get(
        f"/api/cases/{tenant_b_case.id}",
        headers={"Authorization": f"Bearer {tenant_a_token}"},
    )
    assert response.status_code in (403, 404)
    # 403 = explicitly denied; 404 = filtered by RLS (both are acceptable)

PII Leak Prevention

Sensitive data must never appear in log output:

def test_pii_not_in_logs(caplog):
    """Verify that PII is redacted from log output."""
    with caplog.at_level("DEBUG"):
        user = UserService.create(
            email="private@example.com",
            national_id="85.07.15-123.45",
        )

    log_text = caplog.text
    assert "85.07.15-123.45" not in log_text
    assert "private@example.com" not in log_text
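
For context, one way a service could satisfy a test like this is a logging filter that redacts before anything is emitted. The sketch below is an assumption about implementation, not the S4U approach: it covers e-mail addresses only, and RedactEmails is a hypothetical name.

```python
import logging
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class RedactEmails(logging.Filter):
    """Replace e-mail addresses in the final message before handlers see it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, so addresses passed as %-style args are caught too.
        record.msg = EMAIL.sub("[REDACTED]", record.getMessage())
        record.args = None  # the message is already fully formatted
        return True  # keep the record, now redacted
```

The filter is attached to the logger or handler the service writes through, for example logging.getLogger("app").addFilter(RedactEmails()). Other identifiers, such as national IDs, need their own patterns.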

Row-Level Security Verification

For PostgreSQL RLS-protected tables, verify that the policy enforcement works at the database level:

async def test_rls_prevents_cross_tenant_access(db_session_tenant_a, tenant_b_case_id):
    """Verify that RLS prevents Tenant A from querying Tenant B's data."""
    result = await db_session_tenant_a.execute(
        text("SELECT * FROM cases WHERE id = :id"),
        {"id": tenant_b_case_id},
    )
    rows = result.fetchall()
    assert len(rows) == 0  # RLS filters the row — it's invisible, not forbidden

10. API Contract Alignment

In full-stack apps, the Python (Pydantic) and TypeScript type systems must stay aligned. A mismatch between backend response shapes and frontend types causes runtime errors that are invisible to backend and frontend tests individually.

In one line: Pydantic models are the contract source of truth; any PR changing a response model updates the matching TypeScript type in the same PR, enforced at code review.

The Problem

# Backend (Python) — added a new field
class CaseResponse(BaseModel):
    id: str
    status: str
    risk_score: float  # New field added in latest sprint

// Frontend (TypeScript) — not updated
interface CaseResponse {
  id: string;
  status: string;
  // risk_score is missing — frontend code accessing it gets undefined
}

The Practice

Pydantic models are the source of truth. The backend defines the API contract via Pydantic response models. FastAPI generates an OpenAPI schema from these models automatically.
TypeScript interfaces must mirror Pydantic models. When a Pydantic model changes, the corresponding TypeScript interface must be updated in the same PR.
Review gate. During code review, any PR that modifies a Pydantic response model must include the corresponding TypeScript type update. Reviewers check for this explicitly.
OpenAPI as alignment mechanism. FastAPI's auto-generated /docs endpoint provides the canonical API contract. Use it to verify alignment:

# Export the OpenAPI schema
curl http://localhost:8002/openapi.json > openapi.json

# Compare against frontend types (manual or via tooling)

This is a process discipline, not an automated tool. The methodology does not prescribe a specific type generation tool because the right choice depends on your stack and build system. What it does prescribe: Pydantic and TypeScript types must match, and code review enforces this.
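
If a team wants a lightweight aid for the review gate, the comparison can be approximated in a few lines. This is a sketch only, not a prescribed tool: it assumes an OpenAPI 3 document with models under components.schemas and flat TypeScript interfaces (no nested object types), and all function names are hypothetical.

```python
import re


def openapi_fields(schema: dict, model: str) -> set[str]:
    """Property names of one model in an OpenAPI 3 document."""
    return set(schema["components"]["schemas"][model].get("properties", {}))


def typescript_fields(source: str, interface: str) -> set[str]:
    """Field names of a flat TypeScript interface; nested braces are not handled."""
    pattern = rf"interface\s+{re.escape(interface)}\s*\{{(.*?)\}}"
    match = re.search(pattern, source, re.S)
    if match is None:
        raise ValueError(f"interface {interface} not found")
    body = re.sub(r"//[^\n]*", "", match.group(1))  # drop line comments
    return set(re.findall(r"(\w+)\??\s*:", body))


def contract_drift(schema: dict, ts_source: str, name: str) -> dict[str, set[str]]:
    """Fields present on one side of the contract but not the other."""
    backend = openapi_fields(schema, name)
    frontend = typescript_fields(ts_source, name)
    return {
        "missing_in_frontend": backend - frontend,
        "missing_in_backend": frontend - backend,
    }
```

Run against the CaseResponse example above, this would report risk_score under missing_in_frontend. It compares field names only, not types, so it supplements code review rather than replacing it.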

11. Evidence Requirements

Every response that includes code changes must include actual test output. This is how the methodology verifies that work is complete, and it is not optional.

In one line: paste real pytest output and a term-missing coverage report; "tests should pass" and green-checkmark screenshots are not evidence.

Minimum Evidence (Both Modes)

## Test Results

$ pytest tests/test_feature.py -v
PASSED tests/test_feature.py::test_create_feature
PASSED tests/test_feature.py::test_feature_validation
PASSED tests/test_feature.py::test_feature_error_handling
=== 3 passed in 1.25s ===

## Coverage

$ pytest --cov=app.feature --cov-report=term-missing
Name             Stmts   Miss  Cover   Missing
app/feature.py      50      5    90%   42-44, 67-68
TOTAL               50      5    90%

Production Mode Additional Evidence

Production mode also requires evidence of failure path coverage and security test results when applicable:

## Security Tests

$ pytest tests/security/ -v
PASSED tests/security/test_auth.py::test_endpoint_requires_auth
PASSED tests/security/test_auth.py::test_tenant_isolation
PASSED tests/security/test_auth.py::test_pii_not_in_logs
=== 3 passed in 2.10s ===

What Constitutes Valid Evidence

Actual pytest output (copy-pasted, not paraphrased)
Coverage report with line-level detail (term-missing format)
All tests passing (exit code 0)
Coverage meeting the applicable threshold

Not valid evidence:

"Tests should pass" or "I believe this works"
Screenshots of green checkmarks without detail
Coverage reports that exclude the changed files

12. Frontend Testing

Stack

Test runner: Jest
Component testing: React Testing Library
API mocking: jest.mock for API client modules

Patterns

// Component test: verify rendering and interaction
import { render, screen, fireEvent } from "@testing-library/react";
import { RecordList } from "@/components/dashboard/RecordList";

// Mock the API module (not individual fetch calls)
jest.mock("@/lib/api", () => ({
  getRecords: jest.fn().mockResolvedValue([
    { id: "record-1", company_name: "Test Corp", status: "REVIEW_PENDING" },
  ]),
}));

test("renders record list with status badges", async () => {
  render(<RecordList />);

  expect(await screen.findByText("Test Corp")).toBeInTheDocument();
  expect(screen.getByText("REVIEW_PENDING")).toBeInTheDocument();
});

test("clicking a record navigates to detail view", async () => {
  const mockRouter = { push: jest.fn() };
  render(<RecordList router={mockRouter} />);

  const recordRow = await screen.findByText("Test Corp");
  fireEvent.click(recordRow);

  expect(mockRouter.push).toHaveBeenCalledWith("/dashboard/record-1");
});

Frontend Mock Policy

In one line: mock the typed API client module (it's a real network boundary); never mock fetch/axios globally; use a real backend for E2E.

API calls from the frontend cross a network boundary — they are external I/O. Mock the typed API client module, not the fetch implementation:

// Acceptable: mock the typed API client
jest.mock("@/lib/api");

// Not acceptable: mock fetch/axios globally
jest.mock("axios"); // Too broad — hides real integration issues

For E2E tests (Production mode), use a real backend instance. Frontend E2E tests with mocked backends provide limited value.

13. Quick Reference

Commands

# Run all tests
pytest tests/ -v

# Run single file
pytest tests/test_workflow.py -v

# Run single test
pytest tests/test_workflow.py::test_happy_path -v

# Coverage (PoC threshold)
pytest --cov=app --cov-fail-under=70 --cov-report=term-missing

# Coverage (Production threshold)
pytest --cov=app --cov-fail-under=90 --cov-report=term-missing

# Include slow tests
RUN_SLOW_TESTS=1 pytest tests/ -v

# Frontend tests
npm test
npm run test:watch

Coverage Targets Summary

Mode	Line	Failure Branch	Real Integration
PoC	70%	--	--
Production	90%	85%	80%

Forbidden List (Without `MOCK APPROVED` Comment)

Forbidden	Use Instead
`unittest.mock.Mock()` for services	Testcontainers
`MagicMock` for DB/API clients	Testcontainers
SQLite as PostgreSQL substitute	`PostgresContainer`
In-memory databases	`PostgresContainer`
`fakeredis`	`RedisContainer`
`moto` (AWS mock)	`MinioContainer` or `LocalStackContainer`
`time.sleep()` in tests	`start_time_skipping()` / async polling
`freezegun` for workflow-engine tests	`WorkflowEnvironment.start_time_skipping()`
`::jsonb` cast with asyncpg	`CAST(:param AS jsonb)`

Mock Approval Template

# MOCK APPROVED: [Service Name] - [reason]
# Approved by: [name] on [YYYY-MM-DD]
# Alternative: [how to run against real service]

End of Appendix A: Testing Standard
