Pre-2012 CPU and statistical foundations

CPU-centric training, small datasets, and hand-engineered features: the measurement and optimization culture that was a prerequisite for later regimes.

8 papers | Regime 1 of 10

Device/setup

CPU-centric research machines, small benchmark datasets, early web/data infrastructure, and no neural-network accelerators.

Bottleneck

Credit assignment, representation learning, benchmark reproducibility, and CPU-time limits were the dominant constraints, rather than raw accelerator throughput.

Methods that fit

Backpropagation, support-vector machines, compact convolutional nets, autoencoders, deep belief nets, SGD recipes, and standardized datasets made learning experiments reproducible on CPUs.
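
As a rough illustration of two of the methods listed above, the sketch below runs backpropagation and a single SGD update on a tiny one-hidden-layer network, using only the Python standard library so that it runs on any CPU. It is not taken from any of the listed papers. The function names (init_params, backprop, sgd_step, numeric_grad_w2), the sigmoid units, the squared-error loss, and the layer sizes are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, rng):
    """Small random weights, zero biases (illustrative choice)."""
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return W1, b1, W2, 0.0


def forward(params, x):
    """One hidden layer of sigmoid units feeding one sigmoid output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def loss(params, x, t):
    """Squared error on a single example."""
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    """Gradients of the loss w.r.t. every parameter via the chain rule."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    # Error signal at the output pre-activation.
    delta_out = (y - t) * y * (1.0 - y)
    gW2 = [delta_out * hi for hi in h]
    gb2 = delta_out
    # Propagate the error signal back through the hidden sigmoids.
    delta_h = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(W2, h)]
    gW1 = [[d * xi for xi in x] for d in delta_h]
    gb1 = list(delta_h)
    return gW1, gb1, gW2, gb2


def sgd_step(params, grads, lr):
    """One stochastic gradient descent update from a single example."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    W1 = [[w - lr * g for w, g in zip(row, grow)]
          for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return W1, b1, W2, b2 - lr * gb2


def numeric_grad_w2(params, x, t, j, eps=1e-6):
    """Central finite difference for output weight j, used as a check."""
    W1, b1, W2, b2 = params
    up, down = list(W2), list(W2)
    up[j] += eps
    down[j] -= eps
    return (loss((W1, b1, up, b2), x, t)
            - loss((W1, b1, down, b2), x, t)) / (2 * eps)
```

A typical use would be to call init_params(2, 3, random.Random(0)), compute grads = backprop(params, x, t) for one example, and apply sgd_step(params, grads, lr=0.1). Comparing the backprop result with numeric_grad_w2 is a cheap sanity check that the analytic gradients are right.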

Methods that became obsolete or less central

Hand-designed feature pipelines and tiny task-specific demonstrations became less central once GPUs made end-to-end representation learning economical.

Representative papers

Rank  Year  Paper                                                                         Priority  Status
1     2009  ImageNet: A large-scale hierarchical image database                           10        downloaded / read_complete
2     1986  Learning representations by back-propagating errors                           9         downloaded / read_complete
3     1998  Gradient-based learning applied to document recognition                       9         downloaded / read_complete
4     2006  Reducing the dimensionality of data with neural networks                      8         downloaded / read_complete
5     2006  A fast learning algorithm for deep belief nets                                8         downloaded / read_complete
6     1995  Support-vector networks                                                       7         downloaded / read_complete
7     2010  Large-scale machine learning with stochastic gradient descent                 5         downloaded / read_complete
8     1998  The MNIST database of handwritten digit images for machine learning research  3         html_saved_no_pdf / read_complete

Open questions

  • How did benchmark and data infrastructure co-evolve with learning algorithms before GPU deep learning?

Papers in this compute regime: 8