Distillation

English source file: distillation.md

Compute interpretation

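A technique pattern that trades high training-time compute for compactness at deployment: the behavior of a large teacher model, or of an ensemble of models, is transferred into a lighter student model.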
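As a rough illustration of what "transferring behavior" can mean in practice, the sketch below computes a temperature-softened distillation loss in the spirit of the soft-target formulation from Distilling the Knowledge in a Neural Network. It is a minimal, dependency-free example rather than the method of any particular reading card: the function names, the default temperature of 2.0, and the example logits are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature (assumed > 0)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor keeps the gradient scale comparable across temperatures.
    Hypothetical helper for illustration only.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Illustrative logits for a single 3-class example (made-up values).
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.2]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two softened distributions diverge. In a real training loop this term is typically combined with the ordinary hard-label loss, and the teacher's forward passes are where the extra training-time compute is spent.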

Supporting reading cards

Distilling the Knowledge in a Neural Network (2015, efficient_edge_inference)
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016, efficient_edge_inference)
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (2019, efficient_edge_inference)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025, inference_time_compute_post_training)
Kimi k1.5: Scaling Reinforcement Learning with LLMs (2025, inference_time_compute_post_training)
s1: Simple test-time scaling (2025, inference_time_compute_post_training)
Qwen3 Technical Report (2025, hyperscale_dense_llm_training)
Gemma 3 Technical Report (2025, efficient_edge_inference)

What became obsolete or secondary under later compute paradigms

Trace this only through the linked reading cards; do not treat this method page as an independent source of evidence.