ADR-0022: Dynamic Retrieval Future Consideration

Date: 2026-02-10 | Status: Deferred

Context

Dynamic retrieval techniques extend the standard RAG pipeline (Lewis et al., 2020). Approaches such as FLARE (Forward-Looking Active REtrieval) and DRAGIN (Dynamic Retrieval Augmented Generation based on Information Needs) retrieve additional context mid-generation when the model detects low-confidence tokens:

  1. Generate a partial response
  2. Detect uncertain tokens (low probability, hedging language)
  3. Formulate a targeted retrieval query based on the uncertain passage
  4. Retrieve additional context
  5. Continue generation with enriched context
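The five steps can be sketched as a FLARE-style loop. This is a minimal illustration under stated assumptions, not FLARE's or DRAGIN's reference implementation. `generate_sentence` and `retrieve` are hypothetical callables standing in for the LLM and the retriever. The 0.4 probability threshold and the hedge-word list are illustrative. Uncertain tokens are dropped from the follow-up query, similar to FLARE's masked-query variant. DRAGIN additionally weighs attention-based signals when deciding when and what to retrieve.

```python
from typing import Callable, List, Tuple

# A generated token paired with the model's probability for it.
Token = Tuple[str, float]
# Hypothetical LLM wrapper: (question, context, answer_so_far) -> next sentence as tokens.
SentenceGenerator = Callable[[str, List[str], List[str]], List[Token]]
# Hypothetical retriever: query -> list of passages.
Retriever = Callable[[str], List[str]]

# Illustrative hedge words (assumption); a real list would be tuned on the domain.
HEDGES = {"possibly", "perhaps", "might", "may", "unclear"}


def is_uncertain(sentence: List[Token], threshold: float) -> bool:
    """Step 2: flag the sentence if any token is low-probability or a hedge word."""
    return any(p < threshold or tok.lower() in HEDGES for tok, p in sentence)


def masked_query(sentence: List[Token], threshold: float) -> str:
    """Step 3: build a query from the confident tokens only, dropping uncertain ones."""
    return " ".join(
        tok for tok, p in sentence if p >= threshold and tok.lower() not in HEDGES
    )


def dynamic_generate(
    question: str,
    generate_sentence: SentenceGenerator,
    retrieve: Retriever,
    threshold: float = 0.4,
    max_sentences: int = 5,
) -> str:
    context = retrieve(question)  # standard upfront retrieval
    answer: List[str] = []
    for _ in range(max_sentences):
        draft = generate_sentence(question, context, answer)  # Step 1: partial response
        if draft and is_uncertain(draft, threshold):
            query = masked_query(draft, threshold)
            # Step 4: fall back to the question if every token was masked out.
            context = context + retrieve(query or question)
            # Step 5: regenerate the sentence once with the enriched context.
            draft = generate_sentence(question, context, answer)
        if not draft:
            break  # an empty draft means the generator considers the answer complete
        answer.append(" ".join(tok for tok, _ in draft))
    return " ".join(answer)
```

The sketch retries each sentence at most once and appends retrieved passages to the existing context; both are simplifying assumptions.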

This is particularly effective for long-form generation where the initial retrieval may not cover all sub-topics.

Decision

Defer implementation. Rationale:

  1. Short-form answers: The medical search chatbot produces short, focused answers (typically 2-5 sentences). Dynamic retrieval provides the most value for multi-paragraph generation where context needs shift mid-response.

  2. Streaming complexity: The pipeline uses WebSocket streaming for real-time token delivery. Dynamic retrieval requires pausing mid-stream, performing a retrieval round-trip, and resuming generation from the tokens already delivered (which cannot be retracted), adding significant architectural complexity.

  3. Latency sensitivity: Each mid-generation retrieval adds 200-500ms (embedding + vector search + reranking). For short answers, this overhead exceeds the generation time itself.

  4. Upfront retrieval sufficiency: Retrieving 50-100 candidates, fusing them with RRF, and reranking them with BGE down to the top 15 captures sufficient context for short-form answers.
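The upfront pipeline in item 4 can be sketched as follows. This is an illustration, not the production code. RRF scores each document as the sum of 1/(k + rank) over the ranked lists it appears in. Here k = 60, the default from the original RRF paper; ADR-0020 may use a different value. `rerank_score` is a hypothetical callable standing in for the BGE reranker. `top_n=15` mirrors the figure above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def rrf_fuse(ranked_lists: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def upfront_retrieve(
    query: str,
    ranked_lists: Sequence[Sequence[str]],
    rerank_score: Callable[[str, str], float],
    top_n: int = 15,
) -> List[str]:
    """Fuse the candidate lists, rerank every fused candidate, keep the top_n."""
    fused = rrf_fuse(ranked_lists)
    # sorted() is stable, so equal rerank scores keep their RRF order.
    reranked = sorted(fused, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:top_n]
```

In practice each entry of `ranked_lists` would hold the 50-100 candidates from one retriever (for example, a vector search and a keyword search, if the pipeline is hybrid).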

Consequences

  • Revisit if expanding to multi-step reasoning, long-form generation, or report-style outputs
  • Monitor FLARE/DRAGIN research for latency-optimized variants
  • Current architecture supports adding retrieval hooks at the service layer if needed later

Related

  • ADR-0021: Self-RAG Future Consideration (related adaptive retrieval pattern)
  • ADR-0020: Reciprocal Rank Fusion (upfront retrieval quality)
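If dynamic retrieval is revisited, the streaming complexity from the Decision section becomes the main integration cost for a service-layer retrieval hook. Tokens already sent over the WebSocket cannot be retracted. A mid-stream retrieval must therefore pause delivery, tell the client why, and resume generation from the delivered prefix. The asyncio sketch below shows one way to do this.

Everything in it is hypothetical and not part of the current pipeline:

- the `TokenSource` callable;
- the `("token" | "retrieve", payload)` event shape;
- the frame dictionaries passed to `send`, which stands in for the service's WebSocket send call;
- the `max_retrievals` cap.

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable, Dict, List, Tuple

# Hypothetical event shape: ("token", text) or ("retrieve", query).
Event = Tuple[str, str]
# Hypothetical generator: (context, already_sent_prefix) -> events, continuing from the prefix.
TokenSource = Callable[[List[str], str], AsyncGenerator[Event, None]]
Retrieve = Callable[[str], Awaitable[List[str]]]
Send = Callable[[Dict[str, str]], Awaitable[None]]


async def stream_with_retrieval(
    source: TokenSource,
    retrieve: Retrieve,
    send: Send,
    context: List[str],
    max_retrievals: int = 2,
) -> str:
    sent = ""  # text already delivered to the client; it cannot be retracted
    retrievals = 0
    while True:
        restart = False
        gen = source(context, sent)
        try:
            async for kind, payload in gen:
                if kind == "token":
                    sent += payload
                    await send({"type": "token", "text": payload})
                elif kind == "retrieve" and retrievals < max_retrievals:
                    # Pause: tell the client why tokens stopped, then do the round-trip.
                    await send({"type": "status", "text": "retrieving"})
                    context = context + await retrieve(payload)
                    retrievals += 1
                    restart = True
                    break  # drop this generation; resume from `sent` with new context
                # Retrieval requests beyond the cap are ignored.
        finally:
            await gen.aclose()
        if not restart:
            break
    await send({"type": "done", "text": ""})
    return sent


if __name__ == "__main__":
    async def demo_source(context, prefix):
        if "passage" not in context:
            yield ("token", "Partial ")
            yield ("retrieve", "follow-up query")
        else:
            yield ("token", "answer.")

    async def demo_retrieve(query):
        return ["passage"]

    async def demo_send(frame):
        print(frame)

    print(asyncio.run(stream_with_retrieval(demo_source, demo_retrieve, demo_send, [])))
```

Restarting generation from the delivered prefix, rather than keeping the original generator alive, is a design assumption. It avoids holding an LLM stream open across the retrieval round-trip. The cost is that the model must support continuing from a given prefix.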