
ADR-0008: User Feedback and Think Harder

Status: Accepted

Context

No search system, RAG systems included (Lewis et al., 2020), achieves perfect results on every query. Users need a way to signal dissatisfaction with a response, and the system needs a way to attempt a better answer. Two distinct needs emerged from user testing:

  1. Feedback collection: Understanding which responses are helpful and which are not, enabling continuous improvement
  2. Immediate improvement: When a response is poor, users want a better answer now, not after the development team reviews their feedback

Decision

Implement a two-part feedback system:

Part 1: Thumbs Up / Thumbs Down

Every response includes a feedback widget with thumbs-up and thumbs-down buttons. This feedback is:

  • Stored per session (no authentication required for the public interface)
  • Aggregated in Prometheus metrics for quality trend analysis
  • Used to identify content areas with consistently poor satisfaction
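As a rough illustration of the three points above, the following sketch stores votes per session and aggregates them by content area. The class name `FeedbackStore`, the `content_area` label, and the `min_votes` / `max_satisfaction` thresholds are assumptions made for this example, not part of the decision. A deployment would presumably export the counts as Prometheus metrics; here they stay in a plain dictionary so the example is self-contained.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    """In-memory stand-in for per-session feedback storage (hypothetical)."""

    # (session_id, response_id) -> True for thumbs-up, False for thumbs-down
    votes: dict = field(default_factory=dict)
    # content_area -> [up_count, down_count]
    by_area: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def record(self, session_id: str, response_id: str,
               content_area: str, thumbs_up: bool) -> None:
        key = (session_id, response_id)
        if key in self.votes:
            # A repeat vote replaces the earlier one, so undo its count first.
            previous = self.votes[key]
            self.by_area[content_area][0 if previous else 1] -= 1
        self.votes[key] = thumbs_up
        self.by_area[content_area][0 if thumbs_up else 1] += 1

    def weak_areas(self, min_votes: int = 5,
                   max_satisfaction: float = 0.5) -> list:
        """Content areas with enough votes and a low thumbs-up share."""
        weak = []
        for area, (up, down) in self.by_area.items():
            total = up + down
            if total >= min_votes and up / total <= max_satisfaction:
                weak.append(area)
        return sorted(weak)
```

Keying votes on the session and response identifiers means a user who changes their mind overwrites the earlier vote instead of being counted twice.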

Part 2: "Think Harder" Escalation

When a user signals dissatisfaction (thumbs down), a "Think Harder" button appears. Clicking it triggers an escalated search pipeline that spends significantly more compute per query:

Normal vs. Escalated Pipeline

| Parameter | Normal | Think Harder |
|---|---|---|
| Candidate retrieval | 20 chunks (full mode) | 100 chunks |
| Reranking | BGE-reranker-v2-m3 (full mode) | BGE-reranker-v2-m3 |
| LLM model | Tier 2 / Tier 3 | Escalation tier |
| Expected latency | ~7s (full mode) | ~12-15s |
| API cost | ~$0.0015 | ~$0.005 |
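One way to carry these parameters in code is a pair of immutable configuration objects. This is a minimal sketch, not the project's actual configuration: the names `PipelineConfig`, `NORMAL`, `THINK_HARDER`, and `select_pipeline` are hypothetical, and the `top_k` values are taken from the reranker description in this ADR (top 10 in normal mode, top 20 when escalated).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    candidates: int      # chunks retrieved before reranking
    top_k: int           # chunks kept after reranking
    llm_tier: str        # label only; tier names here are illustrative
    est_cost_usd: float  # approximate API cost per query


NORMAL = PipelineConfig(candidates=20, top_k=10,
                        llm_tier="tier-2/3", est_cost_usd=0.0015)
THINK_HARDER = PipelineConfig(candidates=100, top_k=20,
                              llm_tier="escalation", est_cost_usd=0.005)


def select_pipeline(think_harder: bool) -> PipelineConfig:
    """Pick the escalated settings only when the user asked for them."""
    return THINK_HARDER if think_harder else NORMAL


# Ratio of the two approximate cost figures: 0.005 / 0.0015, about 3.3.
cost_ratio = THINK_HARDER.est_cost_usd / NORMAL.est_cost_usd
```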

Cross-Encoder Reranker

The cross-encoder reranker (Jina Reranker v2, with bge-reranker-v2-m3 as local fallback) is always on in full mode (rag_full_mode=True, the default). In full mode, 20 candidates are reranked and the top 10 are kept for context assembly (ADR-0034). The escalated Think Harder pipeline widens this to 100 candidates reranked to the top 20, giving broader coverage. The reranking stage:

  1. Retrieves 20 candidates (full mode) or 100 candidates (escalated)
  2. Scores each candidate against the original query using a cross-attention mechanism
  3. Returns the top 10 (normal) or top 20 (escalated) most relevant candidates for context assembly

See ADR-0024 for the full mode feature flag design.
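The retrieve, score, and truncate flow described above can be sketched as follows. The scoring function is passed in as a parameter because the real cross-encoder call is not specified here; `token_overlap` is a toy stand-in used only to make the example runnable and is not a cross-encoder. The function names are assumptions for illustration.

```python
from typing import Callable, Sequence


def rerank(query: str, candidates: Sequence[str],
           score: Callable[[str, str], float], top_k: int) -> list:
    """Score every candidate against the query and keep the best top_k."""
    scored = [(score(query, c), c) for c in candidates]
    # Highest score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def token_overlap(query: str, passage: str) -> float:
    """Toy scorer: fraction of query tokens that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


docs = [
    "reset your password in settings",
    "billing cycles explained",
    "password reset email not arriving",
]
top = rerank("password reset email", docs, token_overlap, top_k=2)
```

With the figures from this ADR, the normal pipeline would call `rerank` with 20 candidates and `top_k=10`, and the escalated pipeline with 100 candidates and `top_k=20`.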

Rate Limiting

To prevent abuse of the computationally expensive escalated pipeline:

| Constraint | Value |
|---|---|
| Per session | 3 Think Harder requests per hour |
| Per query | 1 Think Harder per response |
| Session tracking | Cookie-based (no authentication required) |
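A minimal sketch of how these two constraints might be enforced together is shown below. It assumes a sliding one-hour window (the ADR does not say whether the window is sliding or fixed), an in-memory store, and a session identifier already extracted from the cookie. The class name `ThinkHarderLimiter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class ThinkHarderLimiter:
    """In-memory limiter: N escalations per window, one per response."""

    def __init__(self, max_per_hour: int = 3, window_s: float = 3600.0,
                 clock=time.monotonic):
        self.max_per_hour = max_per_hour
        self.window_s = window_s
        self.clock = clock                   # injectable for testing
        self._recent = defaultdict(deque)    # session_id -> timestamps
        self._escalated = set()              # (session_id, response_id)

    def allow(self, session_id: str, response_id: str) -> bool:
        now = self.clock()
        recent = self._recent[session_id]
        # Drop timestamps that have aged out of the sliding window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if (session_id, response_id) in self._escalated:
            return False                     # already escalated this response
        if len(recent) >= self.max_per_hour:
            return False                     # hourly budget exhausted
        recent.append(now)
        self._escalated.add((session_id, response_id))
        return True
```

A rejected request does not consume budget, so a user who hits the limit is not penalized further for retrying.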

Consequences

Positive

  • Users have an immediate recourse when results are poor, rather than simply abandoning the system
  • Feedback data drives continuous improvement of retrieval and generation quality
  • The escalated pipeline demonstrably produces better results for complex queries
  • Session-based tracking works without authentication, supporting the public interface

Negative

  • "Think Harder" costs roughly 3.3x as much per query as normal search (~$0.005 vs. ~$0.0015 in API cost)
  • The 12-15 second latency of escalated search may test user patience
  • Rate limiting may frustrate users who encounter multiple poor results in succession
  • The existence of "Think Harder" implicitly acknowledges that normal search is sometimes insufficient

Feedback Loop

The feedback loop creates a virtuous cycle: user signals identify weak areas, which guide content improvements, which produce better results, which increase user satisfaction.