Single-GPU deep learning

Device/setup

One or a few workstation GPUs, most visibly Fermi- and Kepler-era NVIDIA cards with limited on-board memory, fed by CPU-side data pipelines.

Bottleneck

Fitting deeper networks into limited GPU memory while keeping training numerically stable across convolutional, recurrent, normalization, and detection pipelines.
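
To make the memory side of this bottleneck concrete, the following is a rough back-of-the-envelope sketch, not a measurement. It assumes float32 storage and counts only weights, gradients, optimizer state, and activations saved for the backward pass; workspace buffers and framework overhead are ignored. The function names, the per-example activation count, and the 3 GiB budget are hypothetical values chosen for illustration; the 60 million parameter figure is the size reported for the 2012 ImageNet network.

```python
def training_memory_bytes(n_params, activations_per_example, batch_size,
                          optimizer_slots=1, bytes_per_value=4):
    """Rough float32 training footprint in bytes (illustrative model only)."""
    # Weights and gradients, plus per-parameter optimizer state:
    # 1 slot for SGD with momentum, 2 slots for Adam's two moment estimates.
    param_copies = 2 + optimizer_slots
    param_bytes = n_params * param_copies * bytes_per_value
    # Activations kept for the backward pass grow linearly with batch size.
    activation_bytes = activations_per_example * batch_size * bytes_per_value
    return param_bytes + activation_bytes


def largest_batch_that_fits(n_params, activations_per_example, budget_bytes,
                            optimizer_slots=1, bytes_per_value=4):
    """Largest batch size whose estimated footprint stays within budget_bytes."""
    fixed = n_params * (2 + optimizer_slots) * bytes_per_value
    if fixed >= budget_bytes:
        return 0  # the parameters alone already exceed the budget
    per_example = activations_per_example * bytes_per_value
    return (budget_bytes - fixed) // per_example


# Hypothetical inputs: 60M parameters, 1M stored activations per example,
# and a 3 GiB memory budget.
budget = 3 * 1024 ** 3
print(training_memory_bytes(60_000_000, 1_000_000, batch_size=128))
print(largest_batch_that_fits(60_000_000, 1_000_000, budget))
```

Under this model the parameter cost is fixed while the activation cost scales with batch size, which is why batch size and network depth were the main knobs for fitting a model onto a single card.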

Methods that fit

Convolutions, dropout, Adam, batch normalization, encoder-decoder attention, region-based detectors, U-Net, VGG/Inception-style depth, and GPU-friendly minibatching matched the single-GPU regime.
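
Two of the methods listed above, batch normalization and Adam, reduce to a few arithmetic steps per minibatch, which is part of why they map well onto dense GPU kernels. The sketch below is a minimal pure-Python illustration on scalars and short lists, assuming the standard training-mode formulas and the commonly used default hyperparameters. The function names are hypothetical, and the running averages that batch normalization uses at inference time are omitted.

```python
import math


def batch_norm_forward(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a minibatch (training-mode statistics)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # biased minibatch variance
    # Scale and shift the normalized values with learnable gamma and beta.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Normalize a toy minibatch of three values for a single feature.
print(batch_norm_forward([1.0, 2.0, 3.0]))

# Take a few Adam steps on f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(theta)
```

Both functions touch each value a constant number of times and need only per-feature or per-parameter state, so the same arithmetic extends elementwise to full tensors.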

Methods that became obsolete or less central

CPU-only training and hand-crafted vision/NLP pipelines lost ground wherever networks trained with dense GPU kernels could learn features directly from data.

Representative papers

Rank	Year	Paper	Priority	Status
9	2012	ImageNet Classification with Deep Convolutional Neural Networks	10	downloaded / read_complete
10	2014	Neural Machine Translation by Jointly Learning to Align and Translate	8	downloaded / read_complete
11	2014	Sequence to Sequence Learning with Neural Networks	8	downloaded / read_complete
12	2015	Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift	8	downloaded / read_complete
13	2014	Adam: A Method for Stochastic Optimization	7	downloaded / read_complete
14	2014	Dropout: A Simple Way to Prevent Neural Networks from Overfitting	7	downloaded / read_complete
15	2014	Rich feature hierarchies for accurate object detection and semantic segmentation	7	downloaded / read_complete
16	2014	Very Deep Convolutional Networks for Large-Scale Image Recognition	7	downloaded / read_complete
17	2015	Fast R-CNN	7	downloaded / read_complete
18	2015	Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks	7	downloaded / read_complete
19	2015	U-Net: Convolutional Networks for Biomedical Image Segmentation	6	downloaded / read_complete
20	2014	Going Deeper with Convolutions	3	downloaded / read_complete

Open questions

Separate which gains came from algorithms, which from CUDA kernels and memory layout, and which from larger labeled datasets.