Inference-time compute and post-training
The frontier shifts from pretraining scale to how compute is allocated in post-training and at inference time: RLHF, chain-of-thought, verifiers, retrieval, tools, and agents.
20 papers
Regime 8 of 10
Device/setup
Large pretrained models served with retrieval systems, human-feedback pipelines, external tools, and inference-time sampling or search budgets.
Bottleneck
Static model weights cannot cheaply store all knowledge, nor can they guarantee sound reasoning, alignment, correct tool use, or factuality at serving time.
Methods that fit
RAG/REALM, RLHF and preference optimization (including DPO), chain-of-thought, self-consistency, Tree of Thoughts, ReAct, Toolformer, process supervision, and Voyager all spend additional compute after pretraining, either in post-training or at inference time.
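As one concrete illustration of spending extra compute at inference time, self-consistency can be sketched as sampling several answers and taking a majority vote. The sketch below is a minimal version under stated assumptions: `sample_answer` is a hypothetical callable standing in for one sampled chain-of-thought whose final answer has already been extracted, and `make_toy_sampler` is a made-up stochastic stand-in, not a real model call.

```python
import random
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample_answer: Callable[[], str], n_samples: int) -> Tuple[str, float]:
    """Draw n_samples answers and return the majority answer with its vote share."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Each call to sample_answer() represents one independent sampled reasoning path.
    votes = Counter(sample_answer() for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / n_samples


def make_toy_sampler(seed: int) -> Callable[[], str]:
    """Hypothetical stand-in for a model: answers "42" about 60% of the time."""
    rng = random.Random(seed)

    def sample() -> str:
        if rng.random() < 0.6:
            return "42"
        return rng.choice(["41", "43", "44"])

    return sample


if __name__ == "__main__":
    answer, share = self_consistency(make_toy_sampler(seed=0), n_samples=25)
    print(f"majority answer: {answer}, vote share: {share:.2f}")
```

The sampling budget `n_samples` is the knob that trades inference cost for reliability. The vote share is only a rough agreement signal and should not be read as a calibrated probability.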
Methods that became obsolete or less central
Single-pass next-token prediction without retrieval, feedback, tool use, or deliberation became less central for capable assistants and agents.
Representative papers
Open questions
- Quantify when extra inference-time compute beats larger pretrained weights for reliability and cost.
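A rough way to frame the cost side of this question is a break-even calculation. The sketch below assumes the common approximation that a dense transformer spends about 2 FLOPs per parameter per generated token, and it ignores attention cost over long contexts, batching, and memory bandwidth. The function names and the model sizes in the usage line are hypothetical, chosen only for illustration. The sketch says nothing about reliability, which would have to be measured.

```python
def inference_flops(n_params: float, tokens_per_sample: int, n_samples: int = 1) -> float:
    """Approximate decode cost, assuming ~2 FLOPs per parameter per generated token."""
    return 2.0 * n_params * tokens_per_sample * n_samples


def break_even_samples(small_params: float, large_params: float,
                       small_tokens: int, large_tokens: int) -> float:
    """How many small-model samples cost the same as one large-model sample."""
    large_cost = inference_flops(large_params, large_tokens)
    small_cost = inference_flops(small_params, small_tokens)
    return large_cost / small_cost


if __name__ == "__main__":
    # Hypothetical sizes: a 7B-parameter model versus a 70B-parameter model.
    print(break_even_samples(7e9, 70e9, small_tokens=500, large_tokens=500))
```

With equal tokens per sample the ratio reduces to the parameter ratio, so under these assumptions about ten samples from the smaller model cost as much as one from the larger. Longer reasoning traces on the smaller model shrink that budget proportionally.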