Normalization methods
Compute interpretation
Training-stability methods that adapt optimization to depth, batch size, and distributed hardware constraints.
Supporting reading cards
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015, single_gpu_deep_learning)
- Identity Mappings in Deep Residual Networks (2016, multi_gpu_dense_training)
- Layer Normalization (2016, multi_gpu_dense_training)
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (2017, multi_gpu_dense_training)
- Group Normalization (2018, multi_gpu_dense_training)
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour with Batch Normalization (2018, multi_gpu_dense_training)
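
As a rough illustration of why batch size matters for the normalization cards listed above, the sketch below contrasts which axes batch, layer, and group normalization take their statistics over. It is a minimal NumPy sketch, not any paper's reference implementation: it assumes an NCHW activation tensor, omits the learned scale and shift as well as running statistics, and the names `normalize`, `batch_norm`, `layer_norm`, and `group_norm` are illustrative only. Treating layer normalization as pooling over all channels and spatial positions is one common convention for convolutional activations and is an assumption here.

```python
import numpy as np


def normalize(x, axes, eps=1e-5):
    """Standardize x to zero mean and unit variance over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def batch_norm(x, eps=1e-5):
    # One mean/variance per channel, pooled over the batch and spatial axes.
    # The result for a sample therefore depends on the rest of the batch.
    return normalize(x, (0, 2, 3), eps)


def layer_norm(x, eps=1e-5):
    # One mean/variance per sample, pooled over channels and spatial axes.
    # Independent of batch size.
    return normalize(x, (1, 2, 3), eps)


def group_norm(x, num_groups, eps=1e-5):
    # One mean/variance per (sample, channel group). Independent of batch size.
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    grouped = normalize(grouped, (2, 3, 4), eps)
    return grouped.reshape(n, c, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 6, 5, 5))  # hypothetical NCHW activations
    for name, y in [
        ("batch", batch_norm(x)),
        ("layer", layer_norm(x)),
        ("group", group_norm(x, num_groups=3)),
    ]:
        print(name, y.shape)
```

In this sketch only `batch_norm` couples samples within a batch, which is the property that ties it to per-device batch size in distributed training; `layer_norm` and `group_norm` compute the same output for a sample regardless of what else is in the batch.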
Obsolete or less central under later compute
Track this only through linked reading cards; do not treat this method page as standalone evidence.