Testing Standard (Summary)

In one line: No mocking by default, deterministic tests only, actual output as evidence, fast by default — the enforcement behind "evidence over claims" (Section 2.3).

The testing standard governs how tests are written, what counts as valid, and what thresholds apply.

Key Principles:

No mocking by default. Mocking and in-memory databases are forbidden unless explicitly approved with a documented MOCK APPROVED comment that states the reason, the approver, the date, and the alternative for running against real services. Use testcontainers for PostgreSQL, MinIO, Redis, and Temporal.
Deterministic tests only. No time.sleep() in tests. For time-dependent behavior, use the project's standard time-control library (canonical default per Section 4.5: freezegun; Temporal-workflow tests use WorkflowEnvironment.start_time_skipping() instead). Tests must produce the same result regardless of execution timing.
Evidence requirements. Every implementation response must include actual test output — pytest results and coverage report with timestamps. The statement "tests should pass" is not evidence.
Fast by default. Tests that require more than 30 seconds are marked with @pytest.mark.slow and excluded from the default test run. Long-running integration tests are opt-in, not mandatory for every commit.

PoC vs Production Mode:

Dimension	PoC Mode	Production Mode
Test timing	Code first, tests after	TDD encouraged
Edge case tests	Skip initially	Required
E2E tests	Skip	Required for user-facing apps
Security tests	Deferred	Required for auth/data endpoints
Failure branch coverage	Not measured	85% minimum

Coverage Targets:

Layer	PoC	Production
Workflow state machine + activities	90%	90%
FastAPI endpoints	70%	90%
React components	70%	90%
Integration layers (document parsing, object storage)	70%	90%
Line coverage (overall)	70%	90%
Failure branch coverage	--	85%
Real integration ratio	--	80%

Evidence: Enforced by the testcontainers requirement and the MOCK APPROVED annotation pattern — a mock without the structured approval comment is flagged at review, and the pre-push gate requires fresh passing output. Reproducible: add a bare mock and watch review flag it. See appendix-a-testing.md.

Full specification: appendix-a-testing.md