Multi-GPU dense training

The main bottlenecks shift to synchronization across GPUs, batch size, depth, and memory stability.

Paradigm 3 of 10 · 12 papers

English source file: README.md

Device / setup

Synchronous multi-GPU servers and small clusters, where interconnects such as PCIe, NVLink, and InfiniBand become a core design constraint.

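Bottlenecks

Depth, batch-size scaling, gradient synchronization, numerical range, and communication overhead become the limiting factors.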
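The numerical-range bottleneck can be made concrete with a small sketch. It rounds a value to IEEE 754 half precision using the standard-library `struct` module and shows how multiplying by a scale factor before the cast keeps a small gradient from underflowing, which is the idea behind loss scaling in mixed precision training. The gradient magnitude `1e-8` and the scale `1024` are illustrative assumptions, not values taken from any of the listed papers.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical gradient magnitude, chosen to sit below the smallest
# positive fp16 subnormal (about 6e-8).
tiny_grad = 1e-8
loss_scale = 1024.0  # assumed scale factor, for illustration only

unscaled = to_fp16(tiny_grad)             # underflows to zero in fp16
scaled = to_fp16(tiny_grad * loss_scale)  # survives as an fp16 subnormal
recovered = scaled / loss_scale           # unscale in higher precision

print("unscaled fp16 value:", unscaled)
print("recovered after scaling:", recovered)
```

The recovered value is only approximately equal to the original, because the scaled number is still rounded to the fp16 grid before being unscaled.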

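Methods that fit

Residual connections, large-batch SGD, mixed precision, normalization variants, dense connectivity, depthwise separable convolutions, and distributed sequence models are all adapted to this cluster scale.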
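As one example of how large-batch SGD is adapted to this scale, the sketch below follows the linear scaling rule with gradual warmup described in "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour": the learning rate grows in proportion to the global batch size, and the first steps ramp up to that target. The function names and the step-based warmup granularity are assumptions made for illustration; the defaults of 0.1 per 256 images match the reference setting in that paper.

```python
def scaled_lr(global_batch: int, base_lr: float = 0.1, base_batch: int = 256) -> float:
    """Linear scaling rule: learning rate proportional to global batch size."""
    return base_lr * global_batch / base_batch


def lr_at_step(step: int, warmup_steps: int, global_batch: int,
               base_lr: float = 0.1, base_batch: int = 256) -> float:
    """Ramp linearly from base_lr to the scaled rate, then hold it.

    Any later decay schedule is deliberately left out of this sketch.
    """
    target = scaled_lr(global_batch, base_lr, base_batch)
    if warmup_steps <= 0 or step >= warmup_steps:
        return target
    return base_lr + (target - base_lr) * step / warmup_steps


if __name__ == "__main__":
    # Hypothetical run: 8192 images per step, 100 warmup steps.
    for s in (0, 50, 100):
        print(s, lr_at_step(s, warmup_steps=100, global_batch=8192))
```

Starting the ramp at the small-batch rate rather than at zero is a design choice taken from the paper's gradual warmup; other schedules are possible.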

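Methods that became obsolete or less central

Single-device methods that ignore all-reduce cost, activation memory, and batch-size effects transfer less well to this setting.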
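To give a rough sense of the all-reduce cost that single-device methods leave out, here is a back-of-the-envelope estimate for a bandwidth-optimal ring all-reduce, in which each GPU sends about 2(N-1)/N times the gradient payload. It is a sketch under stated assumptions: latency, overlap with computation, and topology effects are ignored, and the parameter count and link bandwidth in the usage example are hypothetical.

```python
def ring_allreduce_seconds(param_count: int, bytes_per_param: int,
                           num_gpus: int, link_gbytes_per_s: float) -> float:
    """Estimate the bandwidth-bound time of one ring all-reduce.

    Each GPU transfers roughly 2 * (N - 1) / N times the payload size.
    Per-message latency is ignored, so this is a lower bound.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single device
    payload_bytes = param_count * bytes_per_param
    traffic_bytes = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return traffic_bytes / (link_gbytes_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical setup: about 25.6M fp32 parameters, 8 GPUs, 10 GB/s links.
    t = ring_allreduce_seconds(25_600_000, 4, 8, 10.0)
    print(f"estimated all-reduce time per step: {t * 1e3:.2f} ms")
```

Because the per-GPU traffic approaches twice the payload as the number of GPUs grows, this estimate depends far more on gradient size and link bandwidth than on GPU count.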

Representative papers

Rank	Year	Paper	Priority	Status
21	2016	Identity Mappings in Deep Residual Networks	6	downloaded / read_complete
22	2015	Deep Residual Learning for Image Recognition	5	downloaded / read_complete
23	2016	Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation	5	downloaded / read_complete
24	2017	Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour	5	downloaded / read_complete
25	2017	Mixed Precision Training	5	downloaded / read_complete
26	2016	Layer Normalization	4	downloaded / read_complete
27	2016	Xception: Deep Learning with Depthwise Separable Convolutions	3	downloaded / read_complete
28	2018	Group Normalization	3	downloaded / read_complete
29	2019	EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks	3	downloaded / read_complete
30	2015	Rethinking the Inception Architecture for Computer Vision	2	downloaded / read_complete
31	2016	Densely Connected Convolutional Networks	2	downloaded / read_complete
32	2018	Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour with Batch Normalization	2	downloaded / read_complete

Open questions

Trace how communication-aware optimization paved the way for Transformer-scale distributed training.

Related papers (12)

2016 Identity Mappings in Deep Residual Networks

2015 Deep Residual Learning for Image Recognition

2016 Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

2017 Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

2017 Mixed Precision Training

2016 Layer Normalization

2016 Xception: Deep Learning with Depthwise Separable Convolutions

2018 Group Normalization

2019 EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

2015 Rethinking the Inception Architecture for Computer Vision

2016 Densely Connected Convolutional Networks

2018 Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour with Batch Normalization