Pre-2012 CPU and statistical foundations
CPU-centric training, small datasets, and hand-engineered features: the measurement and optimization culture that later regimes built on.
8 papers
Regime 1 of 10
Device/setup
CPU-centric research machines, small benchmark datasets, early web/data infrastructure, and no neural-network accelerators.
Bottleneck
Credit assignment, representation learning, benchmark reproducibility, and CPU-time limits mattered more than raw accelerator throughput.
Methods that fit
Backpropagation, support-vector machines, compact convolutional nets, autoencoders, deep belief nets, and SGD recipes were all feasible on CPUs, and standardized datasets made the resulting experiments reproducible.
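As a rough illustration of why backpropagation and SGD suited this regime, the sketch below trains a one-hidden-layer network in pure Python, with no accelerator or numerical library. It is not taken from any of the listed papers: the network size, the toy dataset, the learning rate, and every function name (`forward`, `backprop`, `sgd_step`, `train`) are assumptions chosen for illustration.

```python
import math
import random


def init_params(n_in, n_hidden, seed=0):
    """Small random weights; a fixed seed keeps the run reproducible."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return W1, b1, W2, 0.0


def forward(params, x):
    """One tanh hidden layer and a linear output. Returns (hidden, output)."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(W2, h)) + b2
    return h, y


def loss(params, x, t):
    """Squared error for a single example."""
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    """Gradient of the squared error w.r.t. every parameter via the chain rule."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    dy = y - t                                   # dL/dy
    gW2 = [dy * hi for hi in h]
    gb2 = dy
    # Credit assignment: push the output error back through each tanh unit.
    dpre = [dy * w * (1.0 - hi * hi) for w, hi in zip(W2, h)]
    gW1 = [[d * xi for xi in x] for d in dpre]
    return gW1, dpre, gW2, gb2


def sgd_step(params, grads, lr):
    """One plain SGD update: parameter minus learning rate times gradient."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return W1, b1, W2, b2 - lr * gb2


def mean_loss(params, data):
    return sum(loss(params, x, t) for x, t in data) / len(data)


def train(params, data, lr=0.1, epochs=200, seed=0):
    """Visit examples one at a time in shuffled order, updating after each."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            params = sgd_step(params, backprop(params, x, t), lr)
    return params


# Toy regression target t = x0 - x1 (an assumption, chosen only for the demo).
DATA = [([0.0, 0.0], 0.0), ([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0)]

if __name__ == "__main__":
    p0 = init_params(n_in=2, n_hidden=3)
    p1 = train(p0, DATA)
    print("mean loss before:", mean_loss(p0, DATA))
    print("mean loss after: ", mean_loss(p1, DATA))
```

Each update touches one example and a few dozen weights, which is why recipes of this kind stayed practical under CPU-time limits. Fixing the seed is the small-scale analogue of the reproducibility that standardized datasets provided.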
Methods that became obsolete or less central
Hand-designed feature pipelines and tiny task-specific demonstrations became less central once GPUs made end-to-end representation learning economical.
Representative papers
| Rank | Year | Paper | Priority | Status |
|---|---|---|---|---|
| 1 | 2009 | ImageNet: A large-scale hierarchical image database | 10 | downloaded / read_complete |
| 2 | 1986 | Learning representations by back-propagating errors | 9 | downloaded / read_complete |
| 3 | 1998 | Gradient-based learning applied to document recognition | 9 | downloaded / read_complete |
| 4 | 2006 | Reducing the dimensionality of data with neural networks | 8 | downloaded / read_complete |
| 5 | 2006 | A fast learning algorithm for deep belief nets | 8 | downloaded / read_complete |
| 6 | 1995 | Support-vector networks | 7 | downloaded / read_complete |
| 7 | 2010 | Large-scale machine learning with stochastic gradient descent | 5 | downloaded / read_complete |
| 8 | 1998 | The MNIST database of handwritten digit images for machine learning research | 3 | html_saved_no_pdf / read_complete |
Open questions
- How did benchmark and data infrastructure co-evolve with learning algorithms before GPU deep learning?