SGD and stochastic optimizers
English source file: sgd.md
Compute interpretation
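An optimization paradigm that gives up exact gradients in exchange for scalable, noisy updates; it becomes a core technique once dataset and model sizes exceed what full-batch training can handle.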
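The trade-off described above can be sketched on a toy problem. The following is a minimal illustration, assuming a synthetic least-squares task; the function names (`full_gradient`, `minibatch_gradient`, `sgd`), the data, and the hyperparameters are hypothetical and do not come from the linked reading cards.

```python
import numpy as np

def full_gradient(w, X, y):
    """Exact gradient of the mean squared error over the whole dataset."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def minibatch_gradient(w, X, y, batch_size, rng):
    """Noisy but unbiased gradient estimate from one random minibatch."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / batch_size

def sgd(X, y, lr=0.05, batch_size=32, steps=2000, seed=0):
    """Plain minibatch SGD: each step touches batch_size rows, not all of X."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * minibatch_gradient(w, X, y, batch_size, rng)
    return w

# Synthetic data (an assumption for illustration only).
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(1000, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.01 * data_rng.normal(size=1000)

w_hat = sgd(X, y)
# The exact gradient is used here only to check how close SGD got.
residual_grad_norm = np.linalg.norm(full_gradient(w_hat, X, y))
```

Each update costs work proportional to `batch_size` rather than to the dataset size, which is the scalability side of the trade; the price is that individual steps point in a noisy direction that is only correct on average.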
Supporting reading cards
- Large-scale machine learning with stochastic gradient descent (2010, pre_2012_cpu_statistical_foundations)
- ImageNet Classification with Deep Convolutional Neural Networks (2012, single_gpu_deep_learning)
- Adam: A Method for Stochastic Optimization (2014, single_gpu_deep_learning)
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (2017, multi_gpu_dense_training)
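Content that is obsolete or secondary under later compute paradigms
Tracked only through the linked reading cards; this method page is not treated as an independent source of evidence.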