Pre-2012 CPU and statistical foundations

Device/setup

CPU-centric research machines, small benchmark datasets, early web/data infrastructure, and no neural-network accelerators.

Bottleneck

The binding constraints were credit assignment (training the hidden layers of multi-layer networks), representation learning, benchmark reproducibility, and CPU-time limits, rather than raw accelerator throughput.

Methods that fit

Backpropagation, support vector machines, compact convolutional nets, autoencoders, deep belief nets, SGD recipes, and standardized datasets made learning experiments reproducible on CPUs.
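SGD recipes of this kind are cheap enough to run entirely on a CPU, and a fixed random seed makes them repeatable. The following is a minimal sketch in pure Python (no accelerator, no third-party libraries) of per-example SGD on L2-regularized logistic regression. The decaying step size gamma0 / (1 + gamma0 * lam * t) is a common choice for regularized objectives and is assumed here. The toy dataset, the function names (`make_toy_data`, `sgd_logistic`, `log_loss`) and the hyperparameters are hypothetical illustrations, not taken from any of the papers below.

```python
import math
import random


def make_toy_data(n, seed):
    """Hypothetical toy set: 2-D points labelled 1 when x0 + x1 > 0, else 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        data.append((x, 1.0 if x[0] + x[1] > 0 else 0.0))
    return data


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w, b, data):
    """Mean cross-entropy of the linear model (w, b) on data."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(data)


def sgd_logistic(data, epochs=20, gamma0=0.5, lam=1e-3, seed=0):
    """Per-example SGD on logistic loss + (lam / 2) * ||w||^2 (assumed recipe)."""
    rng = random.Random(seed)  # fixed seed -> same visiting order every run
    w, b, t = [0.0, 0.0], 0.0, 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            gamma = gamma0 / (1.0 + gamma0 * lam * t)  # decaying step size
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y  # dLoss/dz
            w[0] -= gamma * (err * x[0] + lam * w[0])
            w[1] -= gamma * (err * x[1] + lam * w[1])
            b -= gamma * err  # bias is not regularized
            t += 1
    return w, b


if __name__ == "__main__":
    data = make_toy_data(200, seed=1)
    w, b = sgd_logistic(data, seed=0)
    print("loss at w = 0:", log_loss([0.0, 0.0], 0.0, data))
    print("loss after SGD:", log_loss(w, b, data))
```

Seeding the shuffle is what makes the run repeatable: with the same data, seed, and hyperparameters, two runs on the same machine and Python build produce identical weights, which is the property that let CPU-era experiments be rerun and compared.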

Methods that became obsolete or less central

Hand-designed feature pipelines and tiny task-specific demonstrations became less central once GPUs made end-to-end representation learning economical.

Representative papers

Rank	Year	Paper	Priority	Status
1	2009	ImageNet: A large-scale hierarchical image database	10	downloaded / read_complete
2	1986	Learning representations by back-propagating errors	9	downloaded / read_complete
3	1998	Gradient-based learning applied to document recognition	9	downloaded / read_complete
4	2006	Reducing the dimensionality of data with neural networks	8	downloaded / read_complete
5	2006	A fast learning algorithm for deep belief nets	8	downloaded / read_complete
6	1995	Support-vector networks	7	downloaded / read_complete
7	2010	Large-scale machine learning with stochastic gradient descent	5	downloaded / read_complete
8	1998	The MNIST database of handwritten digit images for machine learning research	3	html_saved_no_pdf / read_complete

Open questions

How did benchmark and data infrastructure co-evolve with learning algorithms before GPU deep learning?