Retrieval-augmented generation
Compute interpretation
Inference-time external-memory pattern that trades retrieval latency and indexing work for grounding and freshness.
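The trade described above can be sketched in a few lines. This is a minimal illustration, not the method of any of the cards listed on this page: it assumes a toy bag-of-words index with cosine similarity in place of a learned dense retriever, and it stops at prompt assembly instead of calling a model. All names (`build_index`, `retrieve`, `build_prompt`) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Indexing work: paid once, ahead of query time.
    # Re-running this on updated docs is what provides freshness.
    return [(doc, Counter(tokenize(doc))) for doc in docs]


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(index, query, k=1):
    # Retrieval latency: paid on every query, at inference time.
    q = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, passages):
    # Grounding: retrieved text is placed in the context the model conditions on.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical sample corpus, for illustration only.
docs = [
    "The index was rebuilt on Tuesday after the new documents arrived.",
    "Retrieval adds a lookup step before the model generates text.",
]
index = build_index(docs)
query = "When was the index rebuilt?"
prompt = build_prompt(query, retrieve(index, query, k=1))
```

The resulting `prompt` would then be passed to whatever generator is in use. The model weights stay fixed; only the index changes when the underlying documents change.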
Supporting reading cards
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020, inference_time_compute_post_training)
- REALM: Retrieval-Augmented Language Model Pre-Training (2020, inference_time_compute_post_training)
- WebGPT: Browser-assisted question-answering with human feedback (2021, inference_time_compute_post_training)
Obsolete or less central under later compute
Track this only through linked reading cards; do not treat this method page as standalone evidence.