Single-GPU deep learning
Commodity GPUs make high-throughput dense tensor training practical. CNNs, dropout, and batch normalization become dominant.
12 papers
Regime 2 of 10
Device/setup
One or a few workstation GPUs, most visibly Fermi- and Kepler-era NVIDIA cards with limited memory, plus CPU-side data pipelines.
Bottleneck
Fitting deeper networks into GPU memory while keeping convolution, recurrent training, normalization, and detection pipelines numerically stable.
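To make the memory side of this bottleneck concrete, the sketch below gives a rough back-of-the-envelope estimate of training memory. It is only an illustration: the function names, layer shapes, and parameter count are hypothetical, and it assumes float32 values and SGD with a single momentum buffer. It ignores convolution workspace buffers, framework overhead, and fragmentation, so real usage would likely be higher.

```python
# Rough training-memory estimate; all shapes and counts below are hypothetical.

def activation_bytes(batch, channels, height, width, bytes_per_value=4):
    """Bytes to store one layer's output activations (float32 by default)."""
    return batch * channels * height * width * bytes_per_value


def training_bytes(layer_shapes, batch, param_count, bytes_per_value=4):
    """Activations kept for backprop plus weights, gradients, and momentum."""
    activations = sum(
        activation_bytes(batch, c, h, w, bytes_per_value)
        for (c, h, w) in layer_shapes
    )
    # Three copies of the parameters: weights, gradients, one momentum buffer.
    parameter_state = 3 * param_count * bytes_per_value
    return activations + parameter_state


# Hypothetical three-layer stack given as (channels, height, width) per layer.
example_layers = [(64, 224, 224), (128, 112, 112), (256, 56, 56)]
for batch in (16, 64, 256):
    gib = training_bytes(example_layers, batch, param_count=10_000_000) / 2**30
    print(f"batch={batch:4d}  approx {gib:.2f} GiB")
```

Under these assumptions, activation storage grows linearly with batch size while parameter state stays fixed. This is one reason minibatch size and network depth had to be traded against each other on a single card.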
Methods that fit
Convolutions, dropout, Adam, batch normalization, encoder-decoder attention, region-based detectors, U-Net, VGG/Inception-style depth, and GPU-friendly minibatching matched the single-GPU regime.
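Two of the methods listed above, batch normalization and dropout, are simple enough to sketch in a few lines. The following is a minimal pure-Python illustration for a single feature across one minibatch, not a GPU implementation. The function names and the example values are hypothetical. The batch normalization shown uses training-mode statistics only (no running averages), and the dropout is the "inverted" variant that rescales surviving values at training time.

```python
import math
import random


def batch_norm_forward(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a minibatch using the batch statistics."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (divide-by-n) variance of the minibatch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def inverted_dropout(values, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob; rescale the survivors."""
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in values]


# Hypothetical minibatch of four activations for one feature.
rng = random.Random(0)  # seeded so the dropout mask is reproducible
normalized = batch_norm_forward([1.0, 2.0, 3.0, 4.0])
dropped = inverted_dropout(normalized, keep_prob=0.5, rng=rng)
```

Both operations are elementwise or simple reductions over the minibatch dimension, which is part of why they map well onto dense GPU kernels.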
Methods that became obsolete or less central
CPU-only training and hand-crafted vision/NLP pipelines lost ground where dense GPU kernels could learn features directly.
Representative papers
Open questions
- Separate which gains came from algorithms, which from CUDA kernels and memory layout, and which from larger labeled datasets.