ADR-0008: User Feedback and Think Harder
Status: Accepted
Context
No search system achieves perfect results on every query, including RAG systems (Lewis et al., 2020). Users need a mechanism to signal when they are dissatisfied with a response, and the system needs a way to attempt a better answer. Two distinct needs emerged from user testing:
- Feedback collection: Understanding which responses are helpful and which are not, enabling continuous improvement
- Immediate improvement: When a response is poor, users want a better answer now, not after the development team reviews their feedback
Decision
Implement a two-part feedback system:
Part 1: Thumbs Up / Thumbs Down
Every response includes a feedback widget with thumbs-up and thumbs-down buttons. This feedback is:
- Stored per session (no authentication required for the public interface)
- Aggregated in Prometheus metrics for quality trend analysis
- Used to identify content areas with consistently poor satisfaction
Part 2: "Think Harder" Escalation
When a user signals dissatisfaction (thumbs down), a "Think Harder" button appears. Activating this triggers an escalated search pipeline that invests significantly more computational resources:
Normal vs. Escalated Pipeline
| Parameter | Normal | Think Harder |
|---|---|---|
| Candidate retrieval | 20 chunks (full mode) | 100 chunks |
| Reranking | BGE-reranker-v2-m3 (full mode) | BGE-reranker-v2-m3 |
| LLM model | Tier 2 / Tier 3 | Escalation tier |
| Expected latency | ~7s (full mode) | ~12-15s |
| API cost | ~$0.0015 | ~$0.005 |
Cross-Encoder Reranker
The cross-encoder reranker (Jina Reranker v2, with bge-reranker-v2-m3 as local fallback) is always-on in full mode (rag_full_mode=True, the default). In full mode, 20 candidates are reranked to the top 10 for context assembly (ADR-0034). The escalated Think Harder pipeline expands this to 100 candidates reranked to top 20 for even broader coverage.
- Retrieves 20 candidates (full mode) or 100 candidates (escalated)
- Scores each candidate against the original query using a cross-attention mechanism
- Returns the top 10 (normal) or top 20 (escalated) most relevant candidates for context assembly
See ADR-0024 for the full mode feature flag design.
Rate Limiting
To prevent abuse of the computationally expensive escalated pipeline:
| Constraint | Value |
|---|---|
| Per session | 3 Think Harder requests per hour |
| Per query | 1 Think Harder per response |
| Session tracking | Cookie-based (no authentication required) |
Consequences
Positive
- Users have an immediate recourse when results are poor, rather than simply abandoning the system
- Feedback data drives continuous improvement of retrieval and generation quality
- The escalated pipeline demonstrably produces better results for complex queries
- Session-based tracking works without authentication, supporting the public interface
Negative
- "Think Harder" is ~3x more expensive per query than normal search
- The 12-15 second latency of escalated search may test user patience
- Rate limiting may frustrate users who encounter multiple poor results in succession
- The existence of "Think Harder" implicitly acknowledges that normal search is sometimes insufficient
Feedback Loop
The feedback loop creates a virtuous cycle: user signals identify weak areas, which guide content improvements, which produce better results, which increase user satisfaction.