Inference-time compute and post-training

The frontier shifts to post-training and inference-time compute allocation: RLHF, chain-of-thought, verifiers, retrieval, tools, and agents.

20 papers; regime 8 of 10

Device/setup

Large pretrained models served with retrieval systems, human-feedback pipelines, external tools, and inference-time sampling or search budgets.
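As an illustration of what a "sampling budget" can mean in this setup, the sketch below shows best-of-N selection: draw N candidates and keep the one a verifier scores highest. It is a minimal sketch, not a method taken from any one of the listed papers. The names `best_of_n`, `generate`, `score`, `toy_generate`, and `toy_score` are hypothetical; in a real system `generate` would call a language model and `score` a learned verifier or reward model.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, random.Random], str],
    score: Callable[[str, str], float],
    n: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Draw n candidates and return the one the verifier scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    # The budget n is the knob: more samples cost more inference compute.
    candidates: List[str] = [generate(prompt, rng) for _ in range(n)]
    scored = [(score(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Toy stand-ins (hypothetical): a noisy "model" that guesses a number and an
# exact-match "verifier". They exist only to make the sketch runnable.
def toy_generate(prompt: str, rng: random.Random) -> str:
    return str(rng.choice([5, 6, 7, 8]))


def toy_score(prompt: str, answer: str) -> float:
    return 1.0 if answer == "7" else 0.0


if __name__ == "__main__":
    print(best_of_n("What is 3 + 4?", toy_generate, toy_score, n=16))
```

The design choice to note is that the model weights never change; only the number of samples and the quality of the scorer determine how much the extra compute helps.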

Bottleneck

Static model weights cannot cheaply store all knowledge or guarantee reasoning, alignment, tool use, or factuality at serve time.

Methods that fit

RAG/REALM, RLHF and preference optimization (including DPO), chain-of-thought, self-consistency, ReAct, Toolformer, process supervision, Tree of Thoughts, and Voyager all spend extra compute after pretraining, either in post-training or at inference time.
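To make one of these concrete, self-consistency samples several reasoning paths and takes a majority vote over their final answers. The sketch below covers only the voting step and assumes the final answers have already been extracted from each sampled chain of thought (with `None` for samples that could not be parsed). The function name `self_consistency_vote` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def self_consistency_vote(final_answers: Iterable[Optional[str]]) -> Tuple[str, float]:
    """Return the most common final answer and its share of the valid samples."""
    # Drop unparsable samples and normalize surrounding whitespace.
    valid = [a.strip() for a in final_answers if a is not None and a.strip()]
    if not valid:
        raise ValueError("no parsable answers to vote over")
    answer, count = Counter(valid).most_common(1)[0]
    return answer, count / len(valid)


if __name__ == "__main__":
    samples = ["18", "18", "26", None, "18 "]
    print(self_consistency_vote(samples))  # ('18', 0.75)
```

The returned share can serve as a rough agreement signal, though it is not a calibrated confidence.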

Methods that became obsolete or less central

One-shot next-token prediction without retrieval, feedback, tool use, or deliberation became less central for capable assistants and agents.

Representative papers

Rank | Year | Paper | Priority | Status
87 | 2020 | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | 5 | downloaded / read_complete
88 | 2022 | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | 5 | downloaded / read_complete
89 | 2022 | Training language models to follow instructions with human feedback | 5 | downloaded / read_complete
90 | 2022 | ReAct: Synergizing Reasoning and Acting in Language Models | 5 | downloaded / read_complete
91 | 2017 | Deep Reinforcement Learning from Human Preferences | 4 | downloaded / read_complete
92 | 2020 | Learning to summarize from human feedback | 4 | downloaded / read_complete
93 | 2022 | Self-Consistency Improves Chain of Thought Reasoning in Language Models | 4 | downloaded / read_complete
94 | 2023 | Toolformer: Language Models Can Teach Themselves to Use Tools | 4 | downloaded / read_complete
95 | 2023 | Let's Verify Step by Step | 4 | downloaded / read_complete
96 | 2020 | REALM: Retrieval-Augmented Language Model Pre-Training | 3 | downloaded / read_complete
97 | 2021 | WebGPT: Browser-assisted question-answering with human feedback | 3 | downloaded / read_complete
98 | 2022 | Constitutional AI: Harmlessness from AI Feedback | 3 | downloaded / read_complete
99 | 2022 | Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | 3 | downloaded / read_complete
100 | 2023 | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | 3 | downloaded / read_complete
101 | 2023 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | 3 | downloaded / read_complete
102 | 2023 | Voyager: An Open-Ended Embodied Agent with Large Language Models | 3 | downloaded / read_complete
121 | 2025 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | 5 | downloaded / read_complete
122 | 2025 | Kimi k1.5: Scaling Reinforcement Learning with LLMs | 5 | downloaded / read_complete
123 | 2025 | s1: Simple test-time scaling | 4 | downloaded / read_complete
131 | 2026 | Kimi K2.5: Visual Agentic Intelligence | 4 | downloaded / read_complete

Open questions

  • Quantify when extra inference-time compute beats larger pretrained weights for reliability and cost.
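One possible starting point for the cost side of this question is a compute-matched comparison: how many samples from a smaller model cost about as much as one response from a larger one. The sketch below assumes the common rule of thumb that a dense transformer forward pass costs roughly 2 FLOPs per parameter per generated token; it ignores attention cost, memory bandwidth, batching, and KV-cache effects. The function names and the parameter counts in the usage example are hypothetical, and the result says nothing about reliability, which would still need to be measured empirically.

```python
def forward_flops_per_token(n_params: float) -> float:
    """Rough dense-transformer forward cost: about 2 FLOPs per parameter per token."""
    return 2.0 * n_params


def compute_matched_samples(
    small_params: float,
    large_params: float,
    small_tokens: float,
    large_tokens: float,
) -> float:
    """How many small-model samples cost about as much as one large-model response."""
    if min(small_params, large_params, small_tokens, large_tokens) <= 0:
        raise ValueError("all arguments must be positive")
    small_cost = forward_flops_per_token(small_params) * small_tokens
    large_cost = forward_flops_per_token(large_params) * large_tokens
    return large_cost / small_cost


if __name__ == "__main__":
    # Hypothetical sizes: a 7B model versus a 70B model, 500 tokens per response.
    print(compute_matched_samples(7e9, 70e9, 500, 500))  # 10.0
```

Under these assumptions, the open question becomes whether that many samples from the smaller model, combined by voting or a verifier, match the larger model's single-response reliability.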

Papers in this compute regime: 20