All papers
132 papers
1986 Learning representations by back-propagating errors Pre-2012 CPU and statistical foundations 30757
1998 Gradient-based learning applied to document recognition Pre-2012 CPU and statistical foundations 58089
1998 The MNIST database of handwritten digit images for machine learning research Pre-2012 CPU and statistical foundations 4604
2006 Reducing the dimensionality of data with neural networks Pre-2012 CPU and statistical foundations 20914
2009 ImageNet: A large-scale hierarchical image database Pre-2012 CPU and statistical foundations 61711
2010 Large-scale machine learning with stochastic gradient descent Pre-2012 CPU and statistical foundations 5624
2014 Neural Machine Translation by Jointly Learning to Align and Translate Single-GPU deep learning 14620
2014 Dropout: A Simple Way to Prevent Neural Networks from Overfitting Single-GPU deep learning 34275
2014 Rich feature hierarchies for accurate object detection and semantic segmentation Single-GPU deep learning 31710
2014 Very Deep Convolutional Networks for Large-Scale Image Recognition Single-GPU deep learning 75538
2015 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Single-GPU deep learning 24377
2015 Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Single-GPU deep learning 18238
2015 Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Generative media compute —
2015 Human-level control through deep reinforcement learning Search, simulation, and science compute —
2016 Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Multi-GPU dense training 5668
2016 SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size Efficient and edge inference —
2016 Mastering the game of Go with deep neural networks and tree search Search, simulation, and science compute —
2017 In-Datacenter Performance Analysis of a Tensor Processing Unit TPU and accelerator Transformer era 4406
2017 Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer Sparse and memory-efficient scaling —
2017 Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks Generative media compute —
2017 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Efficient and edge inference —
2018 Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour Multi-GPU dense training 53
2018 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding TPU and accelerator Transformer era 1631
2018 GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism TPU and accelerator Transformer era —
2018 A Style-Based Generator Architecture for Generative Adversarial Networks Generative media compute —
2018 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play Search, simulation, and science compute —
2019 EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Multi-GPU dense training 5013
2019 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer TPU and accelerator Transformer era 3692
2019 ALBERT: A Lite BERT for Self-supervised Learning of Language Representations TPU and accelerator Transformer era 984
2019 Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context TPU and accelerator Transformer era 3146
2019 XLNet: Generalized Autoregressive Pretraining for Language Understanding TPU and accelerator Transformer era 1857
2019 Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Hyperscale dense LLM training —
2019 ZeRO: Memory Optimizations Toward Training Trillion Parameter Models Hyperscale dense LLM training —
2019 DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Efficient and edge inference —
2019 Mastering Atari, Go, chess and shogi by planning with a learned model Search, simulation, and science compute —
2019 Grandmaster level in StarCraft II using multi-agent reinforcement learning Search, simulation, and science compute —
2020 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale TPU and accelerator Transformer era —
2020 GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding Sparse and memory-efficient scaling —
2020 Score-Based Generative Modeling through Stochastic Differential Equations Generative media compute —
2020 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Inference-time compute and post-training —
2020 REALM: Retrieval-Augmented Language Model Pre-Training Inference-time compute and post-training —
2020 Improved protein structure prediction using potentials from deep learning Search, simulation, and science compute —
2021 Scaling Language Models: Methods, Analysis and Insights from Training Gopher Hyperscale dense LLM training —
2021 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Sparse and memory-efficient scaling —
2021 GLaM: Efficient Scaling of Language Models with Mixture-of-Experts Sparse and memory-efficient scaling —
2021 WebGPT: Browser-assisted question-answering with human feedback Inference-time compute and post-training —
2021 Highly accurate protein structure prediction with AlphaFold Search, simulation, and science compute —
2022 BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Hyperscale dense LLM training —
2022 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness Sparse and memory-efficient scaling —
2022 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Inference-time compute and post-training —
2022 Training language models to follow instructions with human feedback Inference-time compute and post-training —
2022 ReAct: Synergizing Reasoning and Acting in Language Models Inference-time compute and post-training —
2022 Self-Consistency Improves Chain of Thought Reasoning in Language Models Inference-time compute and post-training —
2022 Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks Inference-time compute and post-training —
2022 GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers Efficient and edge inference —
2023 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning Sparse and memory-efficient scaling —
2023 Toolformer: Language Models Can Teach Themselves to Use Tools Inference-time compute and post-training —
2023 Direct Preference Optimization: Your Language Model is Secretly a Reward Model Inference-time compute and post-training —
2023 Tree of Thoughts: Deliberate Problem Solving with Large Language Models Inference-time compute and post-training —
2023 Voyager: An Open-Ended Embodied Agent with Large Language Models Inference-time compute and post-training —
2024 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Sparse and memory-efficient scaling —
2024 Accurate structure prediction of biomolecular interactions with AlphaFold 3 Search, simulation, and science compute —
2025 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Inference-time compute and post-training —
2025 Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Search, simulation, and science compute —
2025 AlphaEvolve: A coding agent for scientific and algorithmic discovery Search, simulation, and science compute —