
S4U Methodology — Showcase & Evidence

Case studies and evidence narratives moved out of the agent-facing canon (2026-06-12, canon v3): agents load rules (operating-card.md) and procedures (skills/); humans read this for the story and the numbers.

Case study: Trust Relay regulatory compliance architecture (former §12)

Trust Relay is a KYB/KYC compliance platform operating under three regulatory frameworks. These regulations are not features to add later — they are architectural constraints that shape every design decision from the database schema to the API response format.


Applicable Regulations

EU AI Act (Annex III — High-Risk AI System). Trust Relay's AI-driven risk assessment qualifies as a high-risk AI system. The Act imposes specific requirements:

  • Article 11 (Technical Documentation): Complete documentation of the AI system's design, development, and performance. Every model version, every prompt template, every training dataset must be recorded.
  • Article 12 (Automatic Logging): All AI operations must be logged automatically, with sufficient granularity to reconstruct any individual decision.
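  • Article 13 (Transparency): The system must be transparent enough that deployers can interpret its output and use it appropriately, supported by instructions for use. The related duty to tell people that they are interacting with an AI system sits in Article 50; Trust Relay treats both as design inputs so that users understand how AI decisions affect them.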
  • Article 14 (Human Oversight): Human officers must be able to override any AI decision. The system must support meaningful human review, not rubber-stamping.
  • Article 15 (Accuracy & Robustness): Ongoing monitoring of AI system accuracy, with documented methodology for measuring and reporting performance.

GDPR (Articles 22, 25, 30, 35). Automated decision-making about individuals triggers specific rights:

  • Article 22: Right to human review of automated decisions. Every AI-generated risk assessment must be reviewable by a human compliance officer.
  • Article 25: Data protection by design. Privacy considerations are architectural constraints, not post-hoc additions.
  • Article 30: Records of processing activities. Every data processing operation must be documented.
  • Article 35: Data Protection Impact Assessment required for high-risk processing.

AML Directives (6AMLD / AMLR). Anti-money laundering regulations impose record-keeping and audit trail requirements:

  • 5-year minimum retention for all KYB/KYC records.
  • Audit trail supporting Suspicious Activity Report (SAR) generation.
  • Documented risk-based methodology for risk assessment — the methodology itself must be auditable, not just the results.
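
The retention rule can be expressed as a guard that refuses deletion until the window has elapsed. The following is a minimal sketch, not Trust Relay's implementation: it assumes the five years are counted from the end of the business relationship, and the function and constant names are hypothetical.

```python
from datetime import date

RETENTION_YEARS = 5  # minimum retention from the AML bullet above


def earliest_deletion_date(relationship_ended: date, years: int = RETENTION_YEARS) -> date:
    """Return the first date on which a KYB/KYC record may be deleted.

    Assumption: the clock starts when the business relationship ends.
    """
    try:
        return relationship_ended.replace(year=relationship_ended.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year; roll to 1 March
        # so the record is never released early.
        return date(relationship_ended.year + years, 3, 1)


def may_delete(relationship_ended: date, today: date) -> bool:
    """True only once the minimum retention period has fully elapsed."""
    return today >= earliest_deletion_date(relationship_ended)
```

Rolling a leap-day start forward rather than backward keeps the sketch biased toward longer retention, which matches the direction of the regulatory risk.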

The Five Requirements for Every AI Output

Every AI-driven decision, recommendation, or risk assessment in the system must satisfy five requirements. These are non-negotiable architectural constraints — the system is designed so that producing an AI output without meeting all five requirements is structurally impossible.

1. Input Provenance. What data was the decision based on? Every AI output records its input sources as SourcedFact and EvidenceReference objects — structured references to the specific documents, database records, or external data sources that contributed to the decision. A regulator reviewing an AI risk assessment can trace every claim to its source data.

2. Model Identification. Which model, which version, which prompt template? Every AI execution records the model identifier (e.g., claude-opus-4-20250514), the prompt template name, and the prompt version ID. When a prompt template is updated, the prompt_version_id foreign key ensures that historical decisions reference the exact prompt that was used, not the current version.

3. Chain of Thought. The full reasoning captured. PydanticAI's all_messages() method, called on the run result, returns the complete conversation between the system and the AI model: the prompt, the model's response, any tool calls, and the final output. This is stored as immutable evidence, not as a summary.

4. Confidence Scoring. Quantified certainty with documented methodology. Every AI assessment includes a confidence score with a methodology reference explaining how the score was calculated. The scoring methodology is itself an auditable artifact — a regulator can evaluate not just the score but the method that produced it.

5. Immutable Audit Log. An append-only audit_events table records every state transition in the compliance workflow. The table schema prevents updates and deletes — the immutability guarantee is enforced at the database level, not by application code that might forget to call the audit function. This means the audit trail is tamper-evident: any gap in the sequence of events is detectable.
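
Database-level immutability can be illustrated with triggers that reject every UPDATE and DELETE. The sketch below uses SQLite only because it ships with Python and keeps the example runnable; the column names, trigger names, and helper functions are assumptions, and the source does not say which mechanism Trust Relay's database uses.

```python
import sqlite3

# Hypothetical schema: only the table name audit_events comes from the text above.
SCHEMA = """
CREATE TABLE audit_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    case_id    TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER audit_events_no_update BEFORE UPDATE ON audit_events
BEGIN
    SELECT RAISE(ABORT, 'audit_events is append-only');
END;
CREATE TRIGGER audit_events_no_delete BEFORE DELETE ON audit_events
BEGIN
    SELECT RAISE(ABORT, 'audit_events is append-only');
END;
"""


def open_audit_db() -> sqlite3.Connection:
    """Create an in-memory database whose audit table rejects mutation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def append_event(conn: sqlite3.Connection, case_id: str, event_type: str, payload: str) -> int:
    """Insert one event and return its sequence number."""
    cur = conn.execute(
        "INSERT INTO audit_events (case_id, event_type, payload) VALUES (?, ?, ?)",
        (case_id, event_type, payload),
    )
    conn.commit()
    return cur.lastrowid


def find_gaps(conn: sqlite3.Connection) -> list[int]:
    """Return ids missing from the 1..max sequence, assuming ids are assigned contiguously."""
    ids = [row[0] for row in conn.execute("SELECT id FROM audit_events ORDER BY id")]
    present = set(ids)
    return [i for i in range(1, (ids[-1] if ids else 0) + 1) if i not in present]
```

Because the rejection happens inside the database, application code that forgets a rule, or tries to bypass it, still cannot rewrite history. The gap check shows how a missing sequence number would surface.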


The Non-Suppression Principle

The system can ADD scrutiny but NEVER suppress risk signals. This is the foundational design constraint for all AI-driven risk assessment in the platform.

Concretely: if an AI model identifies a risk indicator (a sanctions match, a negative media mention, a registration anomaly), the system records the indicator and presents it to the compliance officer. A subsequent AI analysis that does not find the same risk does not remove the indicator — it adds a second opinion alongside the first. The compliance officer sees both and makes the final determination.

Any AI recommendation that would reduce the level of scrutiny applied to a case must be:

  1. Traceable to specific evidence that justifies the reduction (not just "the model thinks the risk is low")
  2. Flagged for human review before taking effect
  3. Recorded in the audit trail with the full reasoning chain

This principle exists because the regulatory consequences of suppressing a legitimate risk signal (missed SAR filing, compliance failure) are orders of magnitude more severe than the operational cost of investigating a false positive. The system is architecturally biased toward caution.
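
The principle can be sketched as a case file whose signal list only grows and whose scrutiny level changes only after explicit human approval. This is an illustrative model under stated assumptions: the class names, field names, and scrutiny level labels are hypothetical and are not taken from the Trust Relay codebase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RiskSignal:
    indicator: str  # e.g. "sanctions_match"
    source: str     # which analysis produced this opinion
    found: bool     # True if the analysis detected the risk


@dataclass(frozen=True)
class ScrutinyReduction:
    proposed_level: str
    evidence_refs: tuple[str, ...]  # specific evidence justifying the reduction
    reasoning: str


@dataclass
class CaseFile:
    scrutiny_level: str = "enhanced"
    signals: list[RiskSignal] = field(default_factory=list)
    pending: list[ScrutinyReduction] = field(default_factory=list)
    audit_trail: list[str] = field(default_factory=list)

    def record_signal(self, signal: RiskSignal) -> None:
        # Signals are only appended; a later "not found" sits beside an earlier hit.
        self.signals.append(signal)
        self.audit_trail.append(f"signal:{signal.indicator}:{signal.source}:{signal.found}")

    def open_indicators(self) -> set[str]:
        # An indicator stays open if any analysis ever reported it.
        return {s.indicator for s in self.signals if s.found}

    def propose_reduction(self, reduction: ScrutinyReduction) -> None:
        # Requirement 1: a reduction must cite specific evidence.
        if not reduction.evidence_refs:
            raise ValueError("a scrutiny reduction must cite specific evidence")
        # Requirement 2: it is queued for human review and has no effect yet.
        self.pending.append(reduction)
        # Requirement 3: the reasoning is written to the audit trail.
        self.audit_trail.append(f"reduction_proposed:{reduction.proposed_level}:{reduction.reasoning}")

    def approve_reduction(self, reduction: ScrutinyReduction, officer: str) -> None:
        # Only an explicit human approval changes the scrutiny level.
        self.pending.remove(reduction)
        self.scrutiny_level = reduction.proposed_level
        self.audit_trail.append(f"reduction_approved:{reduction.proposed_level}:{officer}")
```

There is deliberately no method that removes a signal: the only way to lower scrutiny is the propose-then-approve path, which keeps the design biased toward caution.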

Evidence: Trust Relay implements all five requirements through its evidence bundle system (SourcedFact, EvidenceReference, evidence_service.py), the prompt_version_id foreign key pattern across AI execution records, PydanticAI all_messages() capture, the confidence scoring methodology (Pillar 1), and the append-only audit_events table, alongside row-level security (RLS) on 33 tables. The architectural rigor metrics appear in the evidence metrics section of this page (former appendix-f-evidence.md).
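
One way to make an incomplete AI output "structurally impossible" is to validate the requirements in the record's constructor, so that an object missing provenance, model identification, transcript, or confidence methodology can never exist. The sketch below is a hedged illustration: SourcedFact, EvidenceReference, and prompt_version_id are names from the text, but every field layout, the AIExecutionRecord class, and the 0 to 1 confidence range are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceReference:
    source_type: str  # hypothetical values: "document", "db_record", "external_api"
    locator: str      # hypothetical: path, primary key, or URL of the source


@dataclass(frozen=True)
class SourcedFact:
    claim: str
    evidence: tuple[EvidenceReference, ...]


@dataclass(frozen=True)
class AIExecutionRecord:
    model_id: str                  # e.g. "claude-opus-4-20250514"
    prompt_template: str
    prompt_version_id: int         # pins the exact prompt version that was used
    facts: tuple[SourcedFact, ...]
    messages: tuple[str, ...]      # full transcript, stored verbatim
    confidence: float              # assumed to lie in [0, 1]
    confidence_methodology: str    # reference to the documented scoring method
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if not self.model_id or not self.prompt_template:
            raise ValueError("model and prompt template must be identified")
        if not self.facts or any(not fact.evidence for fact in self.facts):
            raise ValueError("every fact must cite at least one evidence reference")
        if not self.messages:
            raise ValueError("the full message transcript is required")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.confidence_methodology:
            raise ValueError("confidence needs a methodology reference")
```

The frozen dataclasses mirror the "immutable evidence" wording: once a record is built it cannot be edited in place, only superseded by a new record.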


Evidence metrics (former appendix-f)

All metrics collected from the Trust Relay codebase on 2026-03-21, pinned to commit 2d98ebb on branch master. Every metric includes the exact command used to collect it, enabling independent verification.

Scope: backend/app/ for production Python code, frontend/src/ for production TypeScript (excluding tests, dependencies, and generated files).


1. Codebase Scale

| Metric | Count |
| --- | --- |
| Total lines of code | 144,821 |
| Backend Python | 69,985 LOC (237 files) |
| Frontend TypeScript/TSX | 74,836 LOC (304 files) |
| API endpoints | 233 |
| API router files | 40 (excluding deps/ and __init__.py) |
| ORM models (SQLAlchemy 2.0) | 43 |
| Service modules | 102 (excluding __init__.py) |
| Pydantic model files | 35 |

Collection commands

```shell
# Backend file count and LOC
find backend/app -name "*.py" -not -path "*__pycache__*" | wc -l
find backend/app -name "*.py" -not -path "*__pycache__*" | xargs wc -l | tail -1

# Frontend file count and LOC
find frontend/src \( -name "*.ts" -o -name "*.tsx" \) | wc -l
find frontend/src \( -name "*.ts" -o -name "*.tsx" \) | xargs wc -l | tail -1

# API endpoints
grep -r "@router\.\(get\|post\|put\|patch\|delete\)" backend/app/api/ --include="*.py" | wc -l

# API router files
find backend/app/api -name "*.py" -not -name "__init__.py" -not -path "*/deps/*" | wc -l

# ORM models
grep -c "^class.*Base):" backend/app/db/models.py

# Service modules
find backend/app/services -name "*.py" -not -name "__init__.py" | wc -l

# Pydantic model files
find backend/app/models -name "*.py" | wc -l
```

2. Development Velocity

| Metric | Value |
| --- | --- |
| Total commits | 1,264 |
| Development timeframe | 29 calendar days, 25 active development days (2026-02-20 to 2026-03-21) |
| Commits per week | ~305 (1,264 commits over 29 calendar days; ~51/day on active days) |
| AI co-authored commits | 1,211 (95.8%) |
| Commit convention | Conventional commits (feat/fix/docs/test with scope) |

Collection commands

```shell
# Total commits
git rev-list --count master

# First and last commit dates
git log --reverse --format="%ai" | head -1
git log -1 --format="%ai"

# Active development days
git log --format="%ad" --date=short | sort -u | wc -l

# AI co-authored commits
git log --all --grep="Co-Authored-By" --oneline | wc -l
```
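
The derived rates in this section (commits per week, commits per active day, co-authoring share) are not produced by the commands directly; they follow from the raw counts and dates. A small sketch for recomputing them from the figures reported above, with variable names chosen for illustration:

```python
from datetime import date

# Raw figures as reported in the Development Velocity table.
total_commits = 1264
co_authored_commits = 1211
first_commit = date(2026, 2, 20)
last_commit = date(2026, 3, 21)
active_days = 25

# Calendar span between the first and last commit.
calendar_days = (last_commit - first_commit).days

# Rates derived from the raw figures.
commits_per_week = total_commits / (calendar_days / 7)
commits_per_active_day = total_commits / active_days
co_author_share = co_authored_commits / total_commits

print(f"calendar days:          {calendar_days}")
print(f"commits per week:       {commits_per_week:.0f}")
print(f"commits per active day: {commits_per_active_day:.1f}")
print(f"co-authored share:      {co_author_share:.1%}")
```

Keeping the derivation next to the collection commands lets a reviewer check the summary rates as well as the raw counts.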

3. Testing & Quality

| Metric | Count |
| --- | --- |
| Backend test files | 225 |
| Backend test functions | 3,769 |
| Frontend test files | 59 |
| Total test files | 284 |
| Documented mock approval comments | 241 (across 232 files under backend/tests/) |
| Files using testcontainers | 82 (290 total references) |

Collection commands

```shell
# Backend test files
find backend/tests -name "test_*.py" | wc -l

# Backend test functions
grep -r "def test_" backend/tests/ | wc -l

# Frontend test files
find frontend/src \( -name "*.test.*" -o -name "*.spec.*" \) | wc -l

# Documented mock approvals (total comments)
grep -r "MOCK APPROVED" backend/tests/ | wc -l

# Files containing mock approvals
grep -rl "MOCK APPROVED" backend/tests/ | wc -l

# Files using testcontainers
grep -r "testcontainers\|TestContainer\|PostgresContainer" backend/ --include="*.py" -l | wc -l
```

4. Architectural Rigor

| Metric | Count |
| --- | --- |
| Architecture Decision Records | 17 (with supersession tracking) |
| Alembic database migrations | 31 |
| RLS-protected tables | 33 (22 core + 2 diagnostics + 9 added in migrations 023-030) |
| Completed architectural pillars | 6 of 7 planned |

Collection commands

```shell
# ADR count
ls docs/adr/ | grep -c "^ADR-"

# Alembic migrations
ls backend/alembic/versions/*.py | wc -l

# RLS tables (grep + manual counting of TENANT_TABLES, DIAGNOSTICS_TABLES,
# and individual statements in migrations 023-030)
```

5. Living Documentation

| Metric | Count |
| --- | --- |
| Total Docusaurus documents | 72 |
| Architecture documents | 30 |
| Architecture Decision Records | 17 |
| API Reference documents | 8 |
| Strategy & business documents | 6 |
| Feature showcase documents | 2 |
| Documentation commits (Mar 2-18) | 20+ |
| Publicly deployed at | trust-relay.pages.dev |
| Code-to-documentation commit ratio | ~1:1 |

6. Methodology Infrastructure

| Component | Count |
| --- | --- |
| Custom agent definitions | 18 (14 global + 4 project) |
| Persistent memory files | 22 |
| Superpowers lifecycle skills | 14 |
| Quality gate hook layers | 3 |
| MCP server integrations | 3 |

7. Updated Metrics (March 2026)

| Metric | Value |
| --- | --- |
| Architecture pages with structured frontmatter | 35 |
| Components mapped in architecture index | 56 |
| Backend coverage (documented/total) | 38/153 (25%) |
| Plugins installed | 7 (Superpowers, code-review, code-simplifier, typescript-lsp, Serena, explanatory-output-style, context7) |
| MCP servers configured | 2 (Neo4j, Temporal) |
| AGENTS.md cross-tool compatibility | Yes (Linux Foundation standard) |
| Verification tools | 7 (OpenSanctions, jurisdiction risk, email security, Wayback, consumer reviews, Interpol, virtual office) |
| Network investigation | EVOI-driven recursive scanning (16 entities, 4 countries, 50 directors) |

Velocity Context

These metrics represent work by a single architect (Adrian, Soft4U BV) collaborating with Claude Opus via the methodology described in this document over 25 active development days within a 29-day calendar period. The 95.8% co-authoring rate reflects the human-AI collaboration model described in Section 2.2 of the hub document: the human architects, reviews, and validates; Claude implements, tests, and iterates.

This methodology specification was itself designed using the brainstorm-to-spec-to-plan lifecycle it describes, authored collaboratively with Claude Opus, and reviewed by project agents — a practical demonstration of the process.