SGD and stochastic optimizers
Compute interpretation
An optimization style that trades exact full-dataset gradients for cheap, noisy minibatch estimates. It becomes central once datasets and models outgrow full-batch training.
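A minimal sketch of that trade, assuming a toy 1-D least-squares problem. The data generator, the helper names (`make_data`, `grad`, `full_batch_gd`, `minibatch_sgd`), and every hyperparameter are hypothetical illustration choices, not taken from the reading cards. Full-batch gradient descent computes the exact mean gradient over all examples on every step. Minibatch SGD estimates that gradient from a small random subset, so each update is noisier but touches far fewer examples.

```python
import random

# Hypothetical toy setup for illustration; not taken from the reading cards.

def make_data(n, seed=0):
    """Synthetic 1-D regression data: y = 3x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def grad(w, b, xs, ys):
    """Mean gradient of the loss 0.5 * (w*x + b - y)^2 over the given examples."""
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        gw += err * x
        gb += err
    m = len(xs)
    return gw / m, gb / m

def full_batch_gd(xs, ys, lr=0.5, steps=200):
    """Exact gradient every step: each update reads all n examples."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = grad(w, b, xs, ys)
        w -= lr * gw
        b -= lr * gb
    return w, b

def minibatch_sgd(xs, ys, lr=0.1, batch_size=8, epochs=20, seed=0):
    """Noisy gradient estimate every step: each update reads only batch_size examples."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(idx)  # sample minibatches without replacement within each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw, gb = grad(w, b, [xs[i] for i in batch], [ys[i] for i in batch])
            w -= lr * gw
            b -= lr * gb
    return w, b

xs, ys = make_data(2000)
print("full-batch GD:", full_batch_gd(xs, ys))
print("minibatch SGD:", minibatch_sgd(xs, ys))
```

With n = 2000 examples, each full-batch step here costs 2000 per-example gradient evaluations, while each SGD step costs 8. Both should land near w = 3, b = 1 on this easy problem. The point of the sketch is the cost per update, not a claim about which method wins in general.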
Supporting reading cards
- Large-Scale Machine Learning with Stochastic Gradient Descent (2010, pre_2012_cpu_statistical_foundations)
- ImageNet Classification with Deep Convolutional Neural Networks (2012, single_gpu_deep_learning)
- Adam: A Method for Stochastic Optimization (2014, single_gpu_deep_learning)
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (2017, multi_gpu_dense_training)
Obsolete or less central under later compute
Track this only through linked reading cards; do not treat this method page as standalone evidence.