
SGD and stochastic optimizers

Compute interpretation

Optimization approach that trades exact full-batch gradients for cheap, noisy minibatch estimates, so the cost of each update no longer grows with dataset size. It becomes central as datasets and models outgrow full-batch training.
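A minimal sketch of that trade-off follows. It assumes a toy one-dimensional least-squares problem. The data generator, the function names (`make_data`, `grad`, `full_batch_gd`, `minibatch_sgd`) and the hyperparameters are all illustrative and are not taken from the linked papers. Both loops use the same gradient formula. They differ only in how many examples each update reads.

```python
import random


def make_data(n, w_true=3.0, b_true=-1.0, noise=0.1, seed=0):
    """Toy data y = w_true * x + b_true + Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x + b_true + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def grad(w, b, xs, ys):
    """Gradient of the mean squared error 0.5 * mean((w*x + b - y)**2) over the given points."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    gw = sum(r * x for r, x in zip(residuals, xs)) / n
    gb = sum(residuals) / n
    return gw, gb


def full_batch_gd(xs, ys, lr=0.5, steps=200):
    """Exact gradient: every update reads all n examples."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = grad(w, b, xs, ys)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def minibatch_sgd(xs, ys, lr=0.5, batch_size=32, epochs=5, seed=0):
    """Noisy gradient: every update reads only batch_size shuffled examples."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(order)  # fresh random order on each pass over the data
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            gw, gb = grad(w, b, [xs[i] for i in batch], [ys[i] for i in batch])
            w, b = w - lr * gw, b - lr * gb
    return w, b


if __name__ == "__main__":
    xs, ys = make_data(2000)
    # Both estimates should land close to the generating values (3.0, -1.0).
    print("full-batch GD:", full_batch_gd(xs, ys))  # 200 passes over all 2000 examples
    print("minibatch SGD:", minibatch_sgd(xs, ys))  # 5 passes in total
```

On this toy problem the full-batch loop computes 200 × 2000 = 400,000 per-example gradients. The minibatch loop computes 5 × 2000 = 10,000, yet both recover roughly the same line. The price is that the SGD estimate keeps jittering around the optimum, at a scale set by the learning rate and the batch size.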

Supporting reading cards

Large-Scale Machine Learning with Stochastic Gradient Descent (2010, pre_2012_cpu_statistical_foundations)
ImageNet Classification with Deep Convolutional Neural Networks (2012, single_gpu_deep_learning)
Adam: A Method for Stochastic Optimization (2014, single_gpu_deep_learning)
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (2017, multi_gpu_dense_training)
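The Adam card in the list above describes a stochastic optimizer that keeps running averages of each coordinate's gradient and squared gradient. It then divides the first by the square root of the second. Below is a minimal plain-Python sketch of that update rule. It assumes the default hyperparameters reported in the Adam paper (lr = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8). `AdamState`, `adam_init`, `adam_step` and the toy `loss` are hypothetical names, not any library's API. The demonstration feeds exact gradients only to keep it deterministic, whereas in practice the gradients would come from minibatches.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamState:
    """Hypothetical container for Adam's per-coordinate moment estimates."""
    m: list  # exponential moving average of gradients (first moment)
    v: list  # exponential moving average of squared gradients (second moment)
    t: int = 0  # number of updates taken so far


def adam_init(n_params):
    # Both moment estimates start at zero, which is why bias correction is needed.
    return AdamState(m=[0.0] * n_params, v=[0.0] * n_params)


def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return new parameters after one Adam update; updates `state` in place."""
    state.t += 1
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * g
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * g * g
        m_hat = state.m[i] / (1.0 - beta1 ** state.t)  # undo the bias toward zero
        v_hat = state.v[i] / (1.0 - beta2 ** state.t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


def loss(p):
    """Badly scaled toy quadratic with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + 100.0 * (p[1] + 2.0) ** 2


def loss_grad(p):
    return [2.0 * (p[0] - 1.0), 200.0 * (p[1] + 2.0)]


if __name__ == "__main__":
    params, state = [0.0, 0.0], adam_init(2)
    for _ in range(2000):
        params = adam_step(params, loss_grad(params), state, lr=0.01)
    print(params, loss(params))
```

With the same lr = 0.01, plain gradient descent would not settle in the second coordinate. Its update maps (y + 2) to (1 - 0.01 · 200)(y + 2) = -(y + 2), so the error just flips sign every step. Adam divides by √v̂, so on the first update each coordinate moves by about lr regardless of its gradient scale.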

Obsolete or less central under later compute

Track this only through linked reading cards; do not treat this method page as standalone evidence.