Scaling laws
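English source file: scaling_laws.md
Compute interpretation
Empirical rules that guide how a dense-training budget is allocated among compute, data volume, and model size.
Supporting reading cards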
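As a rough illustration of such an allocation rule, the sketch below splits a fixed compute budget between parameter count and training tokens. It rests on two assumptions that do not come from this page: the common approximation C ≈ 6·N·D for the training FLOPs of a dense transformer, and a target ratio of about 20 tokens per parameter, which is how the compute-optimal result is often summarized. The function name `compute_optimal_split` and both constants are hypothetical and only serve the example.

```python
import math

# Assumption: training cost of a dense transformer is about 6 FLOPs
# per parameter per token, i.e. C ≈ 6 * N * D.
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption: a rule-of-thumb ratio of roughly 20 training tokens per parameter.
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Return (n_params, n_tokens) for a given training compute budget in FLOPs.

    Solves C = 6 * N * D together with D = r * N, which gives
    N = sqrt(C / (6 * r)) and D = r * N.
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    if tokens_per_param <= 0:
        raise ValueError("tokens_per_param must be positive")
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens
```

Under these assumptions, `compute_optimal_split(1.2e20)` yields about 1e9 parameters and 2e10 tokens. Changing `tokens_per_param` shows how the split moves toward more data or a larger model when a different ratio is assumed.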
- Language Models are Unsupervised Multitask Learners (2019, hyperscale_dense_llm_training)
- Language Models are Few-Shot Learners (2020, hyperscale_dense_llm_training)
- Scaling Laws for Neural Language Models (2020, hyperscale_dense_llm_training)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (2021, sparse_memory_efficient_scaling)
- Training Compute-Optimal Large Language Models (2022, hyperscale_dense_llm_training)
- PaLM: Scaling Language Modeling with Pathways (2022, hyperscale_dense_llm_training)
- Gemini: A Family of Highly Capable Multimodal Models (2023, hyperscale_dense_llm_training)
- Textbooks Are All You Need (2023, hyperscale_dense_llm_training)
- A Survey of Large Language Models (2023, hyperscale_dense_llm_training)
- The Llama 3 Herd of Models (2024, hyperscale_dense_llm_training)
- Qwen3 Technical Report (2025, hyperscale_dense_llm_training)
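Content that is obsolete or secondary under later compute paradigms
Tracked only through the linked reading cards; this method page is not treated as an independent source of evidence.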