Distillation
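English source file: distillation.md
Compute interpretation
A technique pattern that trades high compute cost at training time for compactness at deployment: the behavior of a large teacher model, or of an ensemble of models, is transferred into a lighter-weight student model.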
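To make the teacher-to-student transfer concrete, here is a minimal sketch of a soft-target distillation loss in the style of Distilling the Knowledge in a Neural Network (2015). It is written in plain Python rather than a training framework, and the function names (`log_softmax`, `softmax`, `distillation_loss`), the default `temperature=4.0`, the mixing weight `alpha=0.5`, and the example logits are all illustrative assumptions, not values taken from any of the linked papers. The student is pushed toward the teacher's temperature-softened output distribution (a KL term scaled by T^2) and, optionally, toward the hard label (ordinary cross-entropy).

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, using the log-sum-exp trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a larger temperature gives a softer distribution."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,  # illustrative default, not a recommended value
    alpha: float = 0.5,        # illustrative weight on the soft-target term
) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student_1)."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL divergence from the softened teacher distribution to the softened student one.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student))
    # Soft-target gradients scale as 1/T^2, so multiply by T^2 to keep their
    # magnitude comparable to the hard-label term as T changes.
    soft_term = (temperature ** 2) * kl
    # Ordinary cross-entropy against the hard label, at temperature 1.
    hard_term = -log_softmax(student_logits, 1.0)[label]
    return alpha * soft_term + (1.0 - alpha) * hard_term


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.2]  # hypothetical teacher logits for a 3-class problem
    student = [2.5, 0.8, 0.1]  # hypothetical student logits
    print(distillation_loss(student, teacher, label=0))
```

In an actual training loop the same expression would be written in an autodiff framework so gradients flow into the student; the pure-Python version only shows the arithmetic. Setting `alpha=1.0` trains on teacher soft targets alone, which is the case where the teacher's full output distribution, not just its top prediction, carries the transferred behavior.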
Supporting reading cards
- Distilling the Knowledge in a Neural Network (2015, efficient_edge_inference)
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016, efficient_edge_inference)
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (2019, efficient_edge_inference)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025, inference_time_compute_post_training)
- Kimi k1.5: Scaling Reinforcement Learning with LLMs (2025, inference_time_compute_post_training)
- s1: Simple test-time scaling (2025, inference_time_compute_post_training)
- Qwen3 Technical Report (2025, hyperscale_dense_llm_training)
- Gemma 3 Technical Report (2025, efficient_edge_inference)
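Content rendered obsolete or secondary by later compute paradigms
Tracked only through the linked reading cards; this method page is not treated as an independent source of evidence.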