S4U Methodology — Showcase & Evidence
Case studies and evidence narratives were moved out of the agent-facing canon (2026-06-12, canon v3). Agents load rules (operating-card.md) and procedures (skills/); humans read this document for the story and the numbers.
Case study: Trust Relay regulatory compliance architecture (former §12)
Trust Relay is a KYB/KYC compliance platform operating under three regulatory frameworks. These regulations are not features to add later — they are architectural constraints that shape every design decision from the database schema to the API response format.
Applicable Regulations
EU AI Act (Annex III — High-Risk AI System). Trust Relay's AI-driven risk assessment qualifies as a high-risk AI system. The Act imposes specific requirements:
- Article 11 (Technical Documentation): Complete documentation of the AI system's design, development, and performance. Every model version, every prompt template, every training dataset must be recorded.
- Article 12 (Automatic Logging): All AI operations must be logged automatically, with sufficient granularity to reconstruct any individual decision.
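- Article 13 (Transparency): The system must be transparent enough for deployers to interpret its output and use it appropriately, and the people affected must be able to understand that an AI system is involved and how its decisions affect them.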
- Article 14 (Human Oversight): Human officers must be able to override any AI decision. The system must support meaningful human review, not rubber-stamping.
- Article 15 (Accuracy & Robustness): Ongoing monitoring of AI system accuracy, with documented methodology for measuring and reporting performance.
GDPR (Articles 22, 25, 30, 35). Automated decision-making about individuals triggers specific rights:
- Article 22: Right to human review of automated decisions. Every AI-generated risk assessment must be reviewable by a human compliance officer.
- Article 25: Data protection by design. Privacy considerations are architectural constraints, not post-hoc additions.
- Article 30: Records of processing activities. Every data processing operation must be documented.
- Article 35: Data Protection Impact Assessment required for high-risk processing.
AML Directives (6AMLD / AMLR). Anti-money laundering regulations impose record-keeping and audit trail requirements:
- 5-year minimum retention for all KYB/KYC records.
- Audit trail supporting Suspicious Activity Report (SAR) generation.
- Documented risk-based methodology for risk assessment — the methodology itself must be auditable, not just the results.
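The retention rule above can be expressed as a small date check. This is a minimal sketch, not Trust Relay's implementation: the function names are hypothetical, and the event that starts the retention clock (passed in here as `retention_start`) is an assumption the source does not specify.

```python
from datetime import date

RETENTION_YEARS = 5  # minimum stated in the text; the start event is an assumption


def earliest_deletion_date(retention_start: date, years: int = RETENTION_YEARS) -> date:
    """Return the first date on which a record may be deleted."""
    try:
        return retention_start.replace(year=retention_start.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year. Round forward to
        # Mar 1 so the record is never kept for less than the minimum.
        return date(retention_start.year + years, 3, 1)


def may_delete(retention_start: date, today: date) -> bool:
    """True once the minimum retention period has fully elapsed."""
    return today >= earliest_deletion_date(retention_start)
```

Rounding a leap-day start forward rather than backward is a deliberate choice in this sketch: it errs toward keeping the record one day longer.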
The Five Requirements for Every AI Output
Every AI-driven decision, recommendation, or risk assessment in the system must satisfy five requirements. These are non-negotiable architectural constraints — the system is designed so that producing an AI output without meeting all five requirements is structurally impossible.
1. Input Provenance. What data was the decision based on? Every AI output records its input sources as SourcedFact and EvidenceReference objects — structured references to the specific documents, database records, or external data sources that contributed to the decision. A regulator reviewing an AI risk assessment can trace every claim to its source data.
2. Model Identification. Which model, which version, which prompt template? Every AI execution records the model identifier (e.g., claude-opus-4-20250514), the prompt template name, and the prompt version ID. When a prompt template is updated, the prompt_version_id foreign key ensures that historical decisions reference the exact prompt that was used, not the current version.
3. Chain of Thought. The full reasoning is captured. The all_messages() method on a PydanticAI run result returns the complete message history between the system and the AI model: the prompt, the model's response, any tool calls, and the final output. This history is stored as immutable evidence, not as a summary.
4. Confidence Scoring. Quantified certainty with documented methodology. Every AI assessment includes a confidence score with a methodology reference explaining how the score was calculated. The scoring methodology is itself an auditable artifact — a regulator can evaluate not just the score but the method that produced it.
5. Immutable Audit Log. An append-only audit_events table records every state transition in the compliance workflow. The table schema prevents updates and deletes — the immutability guarantee is enforced at the database level, not by application code that might forget to call the audit function. This means the audit trail is tamper-evident: any gap in the sequence of events is detectable.
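One way to make an incomplete AI output hard to construct is to put all five requirements on a single immutable record and validate them at creation time. The sketch below illustrates that idea only. SourcedFact, EvidenceReference, and prompt_version_id are names from the text, but every field layout, the `AIOutputRecord` class, and the 0 to 1 confidence range are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceReference:
    source_type: str  # hypothetical field, e.g. "document" or "db_record"
    source_id: str    # hypothetical field


@dataclass(frozen=True)
class SourcedFact:
    claim: str
    evidence: tuple[EvidenceReference, ...]


@dataclass(frozen=True)
class AIOutputRecord:
    sourced_facts: tuple[SourcedFact, ...]  # 1. input provenance
    model_id: str                           # 2. model identification
    prompt_template: str
    prompt_version_id: int
    messages_json: str                      # 3. full message history
    confidence: float                       # 4. score, assumed to lie in [0, 1]
    confidence_methodology_ref: str
    audit_event_id: int                     # 5. link to the audit log row

    def __post_init__(self) -> None:
        problems = []
        if not self.sourced_facts:
            problems.append("no sourced facts")
        if any(not fact.evidence for fact in self.sourced_facts):
            problems.append("a fact has no evidence reference")
        if not (self.model_id and self.prompt_template):
            problems.append("model or prompt template missing")
        if not self.messages_json:
            problems.append("message history missing")
        if not 0.0 <= self.confidence <= 1.0:
            problems.append("confidence outside [0, 1]")
        if not self.confidence_methodology_ref:
            problems.append("confidence methodology reference missing")
        if problems:
            raise ValueError("incomplete AI output: " + "; ".join(problems))
```

Because the dataclass is frozen and every field is required, a record either satisfies all five checks when it is built or is never created, and it cannot be edited afterwards.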
The Non-Suppression Principle
The system can ADD scrutiny but NEVER suppress risk signals. This is the foundational design constraint for all AI-driven risk assessment in the platform.
Concretely: if an AI model identifies a risk indicator (a sanctions match, a negative media mention, a registration anomaly), the system records the indicator and presents it to the compliance officer. A subsequent AI analysis that does not find the same risk does not remove the indicator — it adds a second opinion alongside the first. The compliance officer sees both and makes the final determination.
Any AI recommendation that would reduce the level of scrutiny applied to a case must be:
- Traceable to specific evidence that justifies the reduction (not just "the model thinks the risk is low")
- Flagged for human review before taking effect
- Recorded in the audit trail with the full reasoning chain
This principle exists because the regulatory consequences of suppressing a legitimate risk signal (missed SAR filing, compliance failure) are orders of magnitude more severe than the operational cost of investigating a false positive. The system is architecturally biased toward caution.
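The asymmetry described above (escalation is immediate, reduction is gated) can be sketched as a small in-memory model. All names here (`Scrutiny`, `Finding`, `CaseFile`, and their methods) are hypothetical and chosen for illustration; the source does not show how Trust Relay implements the principle.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Scrutiny(IntEnum):
    STANDARD = 1
    ENHANCED = 2


@dataclass(frozen=True)
class Finding:
    indicator: str      # e.g. "sanctions_match"
    source: str         # which analysis run produced the finding
    risk_present: bool
    reasoning: str


@dataclass
class CaseFile:
    scrutiny: Scrutiny = Scrutiny.STANDARD
    findings: list[Finding] = field(default_factory=list)
    pending_reductions: list[dict] = field(default_factory=list)
    audit: list[str] = field(default_factory=list)

    def add_finding(self, finding: Finding) -> None:
        # Findings are only ever appended. A later "no risk" result sits
        # next to the earlier "risk" result instead of replacing it.
        self.findings.append(finding)
        self.audit.append(f"finding:{finding.indicator}:{finding.source}")
        if finding.risk_present and self.scrutiny < Scrutiny.ENHANCED:
            self.scrutiny = Scrutiny.ENHANCED  # adding scrutiny is immediate

    def propose_reduction(self, target: Scrutiny, evidence_ids: list[str],
                          reasoning: str) -> int:
        if target >= self.scrutiny:
            raise ValueError("target is not a reduction in scrutiny")
        if not evidence_ids:
            raise ValueError("a reduction must cite specific evidence")
        self.pending_reductions.append(
            {"target": target, "evidence_ids": list(evidence_ids),
             "reasoning": reasoning, "approved_by": None}
        )
        self.audit.append(f"reduction_proposed:{target.name}:{reasoning}")
        return len(self.pending_reductions) - 1

    def approve_reduction(self, proposal_id: int, officer_id: str) -> None:
        proposal = self.pending_reductions[proposal_id]
        proposal["approved_by"] = officer_id
        self.scrutiny = proposal["target"]  # takes effect only after review
        self.audit.append(f"reduction_approved:{officer_id}")
```

In this sketch there is no method that removes a finding, so suppression is not expressible through the class's interface, and a reduction changes the scrutiny level only once a named officer approves it.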
Evidence: Trust Relay implements all five requirements through its evidence bundle system (SourcedFact, EvidenceReference, evidence_service.py), the prompt_version_id foreign key pattern across AI execution records, PydanticAI all_messages() capture, the confidence scoring methodology (Pillar 1), and the append-only audit_events table, together with row-level security (RLS) on 33 tables. See "Evidence metrics (former appendix-f)" below for the architectural rigor metrics.
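Database-level immutability of the kind described for audit_events can be illustrated with triggers that reject every UPDATE and DELETE. The sketch below uses SQLite from the Python standard library so that it runs standalone; that is a substitution for illustration, and the column names and helper functions are assumptions. The source does not show the actual schema or the mechanism Trust Relay uses.

```python
import sqlite3

# Hypothetical schema: only the table name audit_events comes from the text.
SCHEMA = """
CREATE TABLE audit_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    case_id    TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER audit_events_no_update
BEFORE UPDATE ON audit_events
BEGIN
    SELECT RAISE(ABORT, 'audit_events is append-only');
END;
CREATE TRIGGER audit_events_no_delete
BEFORE DELETE ON audit_events
BEGIN
    SELECT RAISE(ABORT, 'audit_events is append-only');
END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the audit table and its guard triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn: sqlite3.Connection, case_id: str,
                 event_type: str, payload: str) -> int:
    """Append one event and return its row id."""
    cur = conn.execute(
        "INSERT INTO audit_events (case_id, event_type, payload) VALUES (?, ?, ?)",
        (case_id, event_type, payload),
    )
    conn.commit()
    return cur.lastrowid
```

With the triggers in place, application code that tries to modify or remove a row gets a database error regardless of which code path issued the statement, which is the property the text attributes to schema-level enforcement.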
Evidence metrics (former appendix-f)
All metrics collected from the Trust Relay codebase on 2026-03-21, pinned to commit 2d98ebb on branch master. Every metric includes the exact command used to collect it, enabling independent verification.
Scope: backend/app/ for production Python code, frontend/src/ for production TypeScript (excluding tests, dependencies, and generated files).
1. Codebase Scale
| Metric | Count |
|---|---|
| Total lines of code | 144,821 |
| Backend Python | 69,985 LOC (237 files) |
| Frontend TypeScript/TSX | 74,836 LOC (304 files) |
| API endpoints | 233 |
| API router files | 40 (excluding deps/ and __init__.py) |
| ORM models (SQLAlchemy 2.0) | 43 |
| Service modules | 102 (excluding __init__.py) |
| Pydantic model files | 35 |
Collection commands
# Backend file count and LOC
find backend/app -name "*.py" -not -path "*__pycache__*" | wc -l
find backend/app -name "*.py" -not -path "*__pycache__*" | xargs wc -l | tail -1
# Frontend file count and LOC
find frontend/src \( -name "*.ts" -o -name "*.tsx" \) | wc -l
find frontend/src \( -name "*.ts" -o -name "*.tsx" \) | xargs wc -l | tail -1
# API endpoints
grep -r "@router\.\(get\|post\|put\|patch\|delete\)" backend/app/api/ --include="*.py" | wc -l
# API router files
find backend/app/api -name "*.py" -not -name "__init__.py" -not -path "*/deps/*" | wc -l
# ORM models
grep -c "^class.*Base):" backend/app/db/models.py
# Service modules
find backend/app/services -name "*.py" -not -name "__init__.py" | wc -l
# Pydantic model files
find backend/app/models -name "*.py" | wc -l
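For environments without `find` and `wc`, the file and line counts can be approximated in Python. This is a sketch rather than the command used for the table: `count_source` is a hypothetical helper, it counts newline bytes the way `wc -l` does, and it excludes any path with a `__pycache__` directory component, which is slightly narrower than the `-not -path "*__pycache__*"` pattern.

```python
from pathlib import Path


def count_source(root, suffixes, exclude_dir="__pycache__"):
    """Return (file_count, line_count) for files under root with the given suffixes."""
    files = [
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes and exclude_dir not in p.parts
    ]
    # wc -l counts newline characters, so count b"\n" rather than logical lines.
    lines = sum(p.read_bytes().count(b"\n") for p in files)
    return len(files), lines
```

Typical calls would be `count_source("backend/app", {".py"})` and `count_source("frontend/src", {".ts", ".tsx"})`.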
2. Development Velocity
| Metric | Value |
|---|---|
| Total commits | 1,264 |
| Development timeframe | 29 calendar days, 25 active development days (2026-02-20 to 2026-03-21) |
| Commits per week | ~305 (~51/day on active days; derived from 1,264 commits over 29 calendar days and 25 active days) |
| AI co-authored commits | 1,211 (95.8%) |
| Commit convention | Conventional commits (feat/fix/docs/test with scope) |
Collection commands
# Total commits
git rev-list --count master
# First and last commit dates
git log --reverse --format="%ai" | head -1
git log -1 --format="%ai"
# Active development days
git log --format="%ad" --date=short | sort -u | wc -l
# AI co-authored commits
git log --all --grep="Co-Authored-By" --oneline | wc -l
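The derived rates follow from the totals and dates in the table. The short calculation below recomputes them from those inputs; the `velocity` function name and the dictionary keys are chosen for illustration.

```python
from datetime import date


def velocity(total_commits: int, co_authored: int, active_days: int,
             first: date, last: date) -> dict:
    """Derive rate metrics from raw commit totals and the first/last commit dates."""
    calendar_days = (last - first).days
    return {
        "calendar_days": calendar_days,
        "commits_per_week": total_commits / (calendar_days / 7),
        "commits_per_active_day": total_commits / active_days,
        "co_authored_share": co_authored / total_commits,
    }


stats = velocity(1264, 1211, 25, date(2026, 2, 20), date(2026, 3, 21))
print(stats["calendar_days"])                      # 29
print(round(stats["commits_per_week"], 1))         # 305.1
print(round(stats["commits_per_active_day"], 2))   # 50.56
print(f"{stats['co_authored_share']:.1%}")         # 95.8%
```

Note that the co-authored count above is collected with `--all` while the total is collected on master, so the share is exact only if no other branches contribute matching commits.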
3. Testing & Quality
| Metric | Count |
|---|---|
| Backend test files | 225 |
| Backend test functions | 3,769 |
| Frontend test files | 59 |
| Total test files | 284 |
| Documented mock approval comments | 241 (across 232 files under backend/tests/) |
| Files using testcontainers | 82 (290 total references) |
Collection commands
# Backend test files
find backend/tests -name "test_*.py" | wc -l
# Backend test functions
grep -r "def test_" backend/tests/ | wc -l
# Frontend test files
find frontend/src \( -name "*.test.*" -o -name "*.spec.*" \) | wc -l
# Documented mock approvals (total comments)
grep -r "MOCK APPROVED" backend/tests/ | wc -l
# Files containing mock approvals
grep -rl "MOCK APPROVED" backend/tests/ | wc -l
# Files using testcontainers
grep -r "testcontainers\|TestContainer\|PostgresContainer" backend/ --include="*.py" -l | wc -l
4. Architectural Rigor
| Metric | Count |
|---|---|
| Architecture Decision Records | 17 (with supersession tracking) |
| Alembic database migrations | 31 |
| RLS-protected tables | 33 (22 core + 2 diagnostics + 9 added in migrations 023-030) |
| Completed architectural pillars | 6 of 7 planned |
Collection commands
# ADR count
ls docs/adr/ | grep -c "^ADR-"
# Alembic migrations
ls backend/alembic/versions/*.py | wc -l
# RLS tables (grep + manual counting of TENANT_TABLES, DIAGNOSTICS_TABLES,
# and individual statements in migrations 023-030)
5. Living Documentation
| Metric | Count |
|---|---|
| Total Docusaurus documents | 72 |
| Architecture documents | 30 |
| Architecture Decision Records | 17 |
| API Reference documents | 8 |
| Strategy & business documents | 6 |
| Feature showcase documents | 2 |
| Documentation commits (Mar 2-18) | 20+ |
| Publicly deployed at | trust-relay.pages.dev |
| Code-to-documentation commit ratio | ~1:1 |
6. Methodology Infrastructure
| Component | Count |
|---|---|
| Custom agent definitions | 18 (14 global + 4 project) |
| Persistent memory files | 22 |
| Superpowers lifecycle skills | 14 |
| Quality gate hook layers | 3 |
| MCP server integrations | 3 |
7. Updated Metrics (March 2026)
| Metric | Value |
|---|---|
| Architecture pages with structured frontmatter | 35 |
| Components mapped in architecture index | 56 |
| Backend coverage (documented/total) | 38/153 (25%) |
| Plugins installed | 7 (Superpowers, code-review, code-simplifier, typescript-lsp, Serena, explanatory-output-style, context7) |
| MCP servers configured | 2 (Neo4j, Temporal) |
| AGENTS.md cross-tool compatibility | Yes (Linux Foundation standard) |
| Verification tools | 7 (OpenSanctions, jurisdiction risk, email security, Wayback, consumer reviews, Interpol, virtual office) |
| Network investigation | EVOI-driven recursive scanning (16 entities, 4 countries, 50 directors) |
Velocity Context
These metrics represent work by a single architect (Adrian, Soft4U BV) collaborating with Claude Opus via the methodology described in this document over 25 active development days within a 29-day calendar period. The 95.8% co-authoring rate reflects the human-AI collaboration model described in Section 2.2 of the hub document: the human architects, reviews, and validates; Claude implements, tests, and iterates.
This methodology specification was itself designed using the brainstorm-to-spec-to-plan lifecycle it describes, authored collaboratively with Claude Opus, and reviewed by project agents — a practical demonstration of the process.