All papers
132 papers
1998 | The MNIST database of handwritten digit images for machine learning research | Pre-2012 CPU and statistical foundations | 4604
2014 | Rich feature hierarchies for accurate object detection and semantic segmentation | Single-GPU deep learning | 31710
2015 | Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | Single-GPU deep learning | 24377
2015 | Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | Single-GPU deep learning | 18238
2015 | Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks | Generative media compute | n/a
2016 | Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Multi-GPU dense training | 5668
2018 | Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour with Batch Normalization | Multi-GPU dense training | 53
2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | TPUs, accelerators, and the Transformer era | 1631
2018 | GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | TPUs, accelerators, and the Transformer era | n/a
2018 | A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play | Search, simulation, and scientific computing | n/a
2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | TPUs, accelerators, and the Transformer era | 3692
2019 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | TPUs, accelerators, and the Transformer era | 984
2019 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | TPUs, accelerators, and the Transformer era | 3146
2019 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | TPUs, accelerators, and the Transformer era | 1857
2019 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Hyperscale dense LLM training | n/a
2020 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | TPUs, accelerators, and the Transformer era | n/a
2021 | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Sparsity and memory-efficient scaling | n/a
2022 | Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Inference-time compute and post-training | n/a
2024 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Sparsity and memory-efficient scaling | n/a