Single-GPU deep learning

Consumer GPUs made high-throughput dense-tensor training practical. CNNs, dropout, and batch normalization became mainstream methods.

12 papers · Paradigm 2 of 10


Original English source file: README.md

Hardware / setup

Mostly a single workstation GPU or a handful of them, typically memory-limited NVIDIA Kepler/Fermi-era cards fed by a CPU-side data pipeline.

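Bottleneck

The key challenge was fitting deeper networks into GPU memory and getting convolutional, recurrent, normalization, and detection pipelines to train stably.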
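To make the memory bottleneck concrete, here is a rough fp32 accounting sketch for training a single convolution layer. It is illustrative only: the function names are hypothetical, and it assumes 4-byte floats, output activations kept for the backward pass, and no cuDNN workspace, allocator overhead, or fragmentation.

```python
# Rough fp32 memory accounting for training one conv layer.
# Assumptions (illustrative only): 4-byte floats, activations kept for backprop,
# no cuDNN workspace, allocator overhead, or fragmentation.
BYTES_FP32 = 4
MIB = 1024 ** 2


def conv_param_count(k, c_in, c_out, bias=True):
    """Parameters of a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def activation_bytes(batch, channels, height, width, bytes_per=BYTES_FP32):
    """Output feature map (N x C x H x W) that must be stored for the backward pass."""
    return batch * channels * height * width * bytes_per


def parameter_state_bytes(n_params, optimizer_slots=0, bytes_per=BYTES_FP32):
    """Weights + gradients + per-parameter optimizer slots (Adam keeps 2)."""
    return n_params * (2 + optimizer_slots) * bytes_per


# Usage: a 3x3, 64 -> 64 conv on 224x224 inputs with a mini-batch of 32.
params = conv_param_count(3, 64, 64)
print("params:", params)
print("weights + grads + Adam state (MiB):", parameter_state_bytes(params, 2) / MIB)
print("activations, batch 32 (MiB):", activation_bytes(32, 64, 224, 224) / MIB)
```

For this layer the parameters, gradients, and Adam state together take about 0.56 MiB, while the stored activations take 392 MiB. Under these assumptions, activation memory rather than weight memory is what limits depth and batch size on a few-GB card.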

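Methods that fit

Convolutions, dropout, Adam, BatchNorm, attention-based encoder-decoders, region-based detectors, U-Net, VGG/Inception-style depth, and GPU-friendly mini-batch training all fit this compute paradigm.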
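Several of the training techniques listed above are small enough to sketch directly. The following pure-Python example uses hypothetical function names and plain lists in place of GPU tensors. It shows inverted dropout, training-mode batch normalization of one feature over a mini-batch, and the Adam update. Inverted dropout rescales surviving activations during training. This is a widely used variant of the original formulation, which instead scales weights at test time. The BatchNorm `eps=1e-5` is an assumed framework-style default, and inference-time running statistics are omitted. The Adam hyperparameters are the defaults suggested in the Adam paper.

```python
import math
import random


def inverted_dropout(xs, p_drop, rng):
    """Training mode: zero each value with probability p_drop and rescale the
    survivors by 1 / (1 - p_drop) so the expected value is unchanged."""
    keep = 1.0 - p_drop
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


def batch_norm_train(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch using the batch mean and biased
    variance, then apply the learned scale gamma and shift beta."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in xs]


class Adam:
    """Per-parameter Adam with bias-corrected first and second moment estimates."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of gradients
        self.v = [0.0] * n_params  # running mean of squared gradients
        self.t = 0                 # step count, used for bias correction

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Usage on a toy mini-batch.
rng = random.Random(0)
print(inverted_dropout([1.0, 2.0, 3.0, 4.0], 0.5, rng))
print(batch_norm_train([1.0, 2.0, 3.0, 4.0]))
opt = Adam(n_params=1)
print(opt.step([1.0], [0.5]))  # the first step moves the parameter by roughly lr
```

On a GPU, each of these becomes an elementwise operation or a per-channel reduction over the whole mini-batch. That is one reason they map well onto dense GPU kernels.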

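Methods that became obsolete or less central

CPU-only training and hand-engineered vision/NLP pipelines moved to the sidelines once dense GPU kernels could learn features directly.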

Representative papers

Rank | Year | Paper | Priority | Status
9 | 2012 | ImageNet Classification with Deep Convolutional Neural Networks | 10 | downloaded / read_complete
10 | 2014 | Neural Machine Translation by Jointly Learning to Align and Translate | 8 | downloaded / read_complete
11 | 2014 | Sequence to Sequence Learning with Neural Networks | 8 | downloaded / read_complete
12 | 2015 | Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | 8 | downloaded / read_complete
13 | 2014 | Adam: A Method for Stochastic Optimization | 7 | downloaded / read_complete
14 | 2014 | Dropout: A Simple Way to Prevent Neural Networks from Overfitting | 7 | downloaded / read_complete
15 | 2014 | Rich feature hierarchies for accurate object detection and semantic segmentation | 7 | downloaded / read_complete
16 | 2014 | Very Deep Convolutional Networks for Large-Scale Image Recognition | 7 | downloaded / read_complete
17 | 2015 | Fast R-CNN | 7 | downloaded / read_complete
18 | 2015 | Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | 7 | downloaded / read_complete
19 | 2015 | U-Net: Convolutional Networks for Biomedical Image Segmentation | 6 | downloaded / read_complete
20 | 2014 | Going Deeper with Convolutions | 3 | downloaded / read_complete

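Open questions

  • Disentangle how much of the gain came from the algorithms themselves, from CUDA kernels and memory layout, or from larger labeled datasets.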

Related papers: 12