ADR-0008: User Feedback and Think Harder

Status: Accepted

Context

No search system achieves perfect results on every query, including RAG systems (Lewis et al., 2020). Users need a mechanism to signal when they are dissatisfied with a response, and the system needs a way to attempt a better answer. Two distinct needs emerged from user testing:

Feedback collection: Understanding which responses are helpful and which are not, enabling continuous improvement
Immediate improvement: When a response is poor, users want a better answer now, not after the development team reviews their feedback

Decision

Implement a two-part feedback system:

Part 1: Thumbs Up / Thumbs Down

Every response includes a feedback widget with thumbs-up and thumbs-down buttons. This feedback is:

Stored per session (no authentication required for the public interface)
Aggregated in Prometheus metrics for quality trend analysis
Used to identify content areas with consistently poor satisfaction

Part 2: "Think Harder" Escalation

When a user signals dissatisfaction (thumbs down), a "Think Harder" button appears. Activating this triggers an escalated search pipeline that invests significantly more computational resources:

Normal vs. Escalated Pipeline

Parameter	Normal	Think Harder
Candidate retrieval	20 chunks (full mode)	100 chunks
Reranking	BGE-reranker-v2-m3 (full mode)	BGE-reranker-v2-m3
LLM model	Tier 2 / Tier 3	Escalation tier
Expected latency	~7s (full mode)	~12-15s
API cost	~$0.0015	~$0.005

Cross-Encoder Reranker

The cross-encoder reranker (Jina Reranker v2, with bge-reranker-v2-m3 as local fallback) is always-on in full mode (rag_full_mode=True, the default). In full mode, 20 candidates are reranked to the top 10 for context assembly (ADR-0034). The escalated Think Harder pipeline expands this to 100 candidates reranked to top 20 for even broader coverage.

Retrieves 20 candidates (full mode) or 100 candidates (escalated)
Scores each candidate against the original query using a cross-attention mechanism
Returns the top 10 (normal) or top 20 (escalated) most relevant candidates for context assembly

See ADR-0024 for the full mode feature flag design.

Rate Limiting

To prevent abuse of the computationally expensive escalated pipeline:

Constraint	Value
Per session	3 Think Harder requests per hour
Per query	1 Think Harder per response
Session tracking	Cookie-based (no authentication required)

Consequences

Positive

Users have an immediate recourse when results are poor, rather than simply abandoning the system
Feedback data drives continuous improvement of retrieval and generation quality
The escalated pipeline demonstrably produces better results for complex queries
Session-based tracking works without authentication, supporting the public interface

Negative

"Think Harder" is ~3x more expensive per query than normal search
The 12-15 second latency of escalated search may test user patience
Rate limiting may frustrate users who encounter multiple poor results in succession
The existence of "Think Harder" implicitly acknowledges that normal search is sometimes insufficient

Feedback Loop

The feedback loop creates a virtuous cycle: user signals identify weak areas, which guide content improvements, which produce better results, which increase user satisfaction.

Context​

Decision​

Part 1: Thumbs Up / Thumbs Down​

Part 2: "Think Harder" Escalation​

Normal vs. Escalated Pipeline​

Cross-Encoder Reranker​

Rate Limiting​

Consequences​

Positive​

Negative​

Feedback Loop​