All papers

132 papers

1986 Learning representations by back-propagating errors Pre-2012 CPU and statistical foundations 30757

1995 Support-vector networks Pre-2012 CPU and statistical foundations 40427

1998 Gradient-based learning applied to document recognition Pre-2012 CPU and statistical foundations 58089

1998 The MNIST database of handwritten digit images for machine learning research Pre-2012 CPU and statistical foundations 4604

2006 Reducing the dimensionality of data with neural networks Pre-2012 CPU and statistical foundations 20914

2006 A fast learning algorithm for deep belief nets Pre-2012 CPU and statistical foundations 16386

2009 ImageNet: A large-scale hierarchical image database Pre-2012 CPU and statistical foundations 61711

2010 Large-scale machine learning with stochastic gradient descent Pre-2012 CPU and statistical foundations 5624

2012 ImageNet Classification with Deep Convolutional Neural Networks Single-GPU deep learning 766 ★

2013 Auto-Encoding Variational Bayes Generative media compute —

2014 Neural Machine Translation by Jointly Learning to Align and Translate Single-GPU deep learning 14620

2014 Sequence to Sequence Learning with Neural Networks Single-GPU deep learning 13358

2014 Adam: A Method for Stochastic Optimization Single-GPU deep learning 84773

2014 Dropout: A Simple Way to Prevent Neural Networks from Overfitting Single-GPU deep learning 34275

2014 Rich feature hierarchies for accurate object detection and semantic segmentation Single-GPU deep learning 31710

2014 Very Deep Convolutional Networks for Large-Scale Image Recognition Single-GPU deep learning 75538

2014 Going Deeper with Convolutions Single-GPU deep learning 1390

2014 Generative Adversarial Nets Generative media compute —

2015 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Single-GPU deep learning 24377

2015 Fast R-CNN Single-GPU deep learning 27853

2015 Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Single-GPU deep learning 18238

2015 U-Net: Convolutional Networks for Biomedical Image Segmentation Single-GPU deep learning 88677

2015 Deep Residual Learning for Image Recognition Multi-GPU dense training 4712

2015 Rethinking the Inception Architecture for Computer Vision Multi-GPU dense training 565

2015 Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Generative media compute —

2015 Distilling the Knowledge in a Neural Network Efficient inference and edge deployment —

2015 Human-level control through deep reinforcement learning Search, simulation, and scientific computing —

2016 Identity Mappings in Deep Residual Networks Multi-GPU dense training 10082

2016 Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Multi-GPU dense training 5668

2016 Layer Normalization Multi-GPU dense training 498

2016 Xception: Deep Learning with Depthwise Separable Convolutions Multi-GPU dense training 358

2016 Densely Connected Convolutional Networks Multi-GPU dense training 1912

2016 Image-to-Image Translation with Conditional Adversarial Networks Generative media compute —

2016 SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size Efficient inference and edge deployment —

2016 Mastering the game of Go with deep neural networks and tree search Search, simulation, and scientific computing —

2017 Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour Multi-GPU dense training 2619

2017 Mixed Precision Training Multi-GPU dense training 880

2017 Attention Is All You Need TPUs, accelerators, and the Transformer era 331 ★

2017 In-Datacenter Performance Analysis of a Tensor Processing Unit TPUs, accelerators, and the Transformer era 4406

2017 Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer Sparsity and memory-efficient scaling —

2017 Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks Generative media compute —

2017 Deep Reinforcement Learning from Human Preferences Inference-time compute and post-training —

2017 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Efficient inference and edge deployment —

2017 Mastering the game of Go without human knowledge Search, simulation, and scientific computing —

2018 Group Normalization Multi-GPU dense training 478

2018 Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour with Batch Normalization Multi-GPU dense training 53

2018 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding TPUs, accelerators, and the Transformer era 1631

2018 GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism TPUs, accelerators, and the Transformer era —

2018 Mesh-TensorFlow: Deep Learning for Supercomputers TPUs, accelerators, and the Transformer era —

2018 A Style-Based Generator Architecture for Generative Adversarial Networks Generative media compute —

2018 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play Search, simulation, and scientific computing —

2019 EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks Multi-GPU dense training 5013

2019 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer TPUs, accelerators, and the Transformer era 3692

2019 ALBERT: A Lite BERT for Self-supervised Learning of Language Representations TPUs, accelerators, and the Transformer era 984

2019 Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context TPUs, accelerators, and the Transformer era 3146

2019 RoBERTa: A Robustly Optimized BERT Pretraining Approach TPUs, accelerators, and the Transformer era 8

2019 XLNet: Generalized Autoregressive Pretraining for Language Understanding TPUs, accelerators, and the Transformer era 1857

2019 What Does BERT Look at? An Analysis of BERT's Attention TPUs, accelerators, and the Transformer era —

2019 Language Models are Unsupervised Multitask Learners Hyperscale dense LLM training —

2019 Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Hyperscale dense LLM training —

2019 ZeRO: Memory Optimizations Toward Training Trillion Parameter Models Hyperscale dense LLM training —

2019 DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Efficient inference and edge deployment —

2019 Mastering Atari, Go, chess and shogi by planning with a learned model Search, simulation, and scientific computing —

2019 Grandmaster level in StarCraft II using multi-agent reinforcement learning Search, simulation, and scientific computing —

2019 Dota 2 with Large Scale Deep Reinforcement Learning Search, simulation, and scientific computing —

2020 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale TPUs, accelerators, and the Transformer era —

2020 Language Models are Few-Shot Learners Hyperscale dense LLM training —

2020 Scaling Laws for Neural Language Models Hyperscale dense LLM training —

2020 GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding Sparsity and memory-efficient scaling —

2020 Big Bird: Transformers for Longer Sequences Sparsity and memory-efficient scaling —

2020 Linformer: Self-Attention with Linear Complexity Sparsity and memory-efficient scaling —

2020 Longformer: The Long-Document Transformer Sparsity and memory-efficient scaling —

2020 Reformer: The Efficient Transformer Sparsity and memory-efficient scaling —

2020 Denoising Diffusion Probabilistic Models Generative media compute —

2020 Score-Based Generative Modeling through Stochastic Differential Equations Generative media compute —

2020 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Inference-time compute and post-training —

2020 Learning to summarize from human feedback Inference-time compute and post-training —

2020 REALM: Retrieval-Augmented Language Model Pre-Training Inference-time compute and post-training —

2020 Improved protein structure prediction using potentials from deep learning Search, simulation, and scientific computing —

2021 Scaling Language Models: Methods, Analysis and Insights from Training Gopher Hyperscale dense LLM training —

2021 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Sparsity and memory-efficient scaling —

2021 GLaM: Efficient Scaling of Language Models with Mixture-of-Experts Sparsity and memory-efficient scaling —

2021 High-Resolution Image Synthesis with Latent Diffusion Models Generative media compute —

2021 Zero-Shot Text-to-Image Generation Generative media compute —

2021 Improved Denoising Diffusion Probabilistic Models Generative media compute —

2021 WebGPT: Browser-assisted question-answering with human feedback Inference-time compute and post-training —

2021 LoRA: Low-Rank Adaptation of Large Language Models Efficient inference and edge deployment —

2021 Highly accurate protein structure prediction with AlphaFold Search, simulation, and scientific computing —

2022 Training Compute-Optimal Large Language Models Hyperscale dense LLM training —

2022 PaLM: Scaling Language Modeling with Pathways Hyperscale dense LLM training —

2022 BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Hyperscale dense LLM training —

2022 GPT-NeoX-20B: An Open-Source Autoregressive Language Model Hyperscale dense LLM training —

2022 OPT: Open Pre-trained Transformer Language Models Hyperscale dense LLM training —

2022 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness Sparsity and memory-efficient scaling —

2022 Scalable Diffusion Models with Transformers Generative media compute —

2022 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Inference-time compute and post-training —

2022 Training language models to follow instructions with human feedback Inference-time compute and post-training —

2022 ReAct: Synergizing Reasoning and Acting in Language Models Inference-time compute and post-training —

2022 Self-Consistency Improves Chain of Thought Reasoning in Language Models Inference-time compute and post-training —

2022 Constitutional AI: Harmlessness from AI Feedback Inference-time compute and post-training —

2022 Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks Inference-time compute and post-training —

2022 GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers Efficient inference and edge deployment —

2023 Llama 2: Open Foundation and Fine-Tuned Chat Models Hyperscale dense LLM training —

2023 LLaMA: Open and Efficient Foundation Language Models Hyperscale dense LLM training —

2023 Gemini: A Family of Highly Capable Multimodal Models Hyperscale dense LLM training —

2023 Mistral 7B Hyperscale dense LLM training —

2023 Textbooks Are All You Need Hyperscale dense LLM training —

2023 A Survey of Large Language Models Hyperscale dense LLM training —

2023 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning Sparsity and memory-efficient scaling —

2023 Toolformer: Language Models Can Teach Themselves to Use Tools Inference-time compute and post-training —

2023 Let's Verify Step by Step Inference-time compute and post-training —

2023 Direct Preference Optimization: Your Language Model is Secretly a Reward Model Inference-time compute and post-training —

2023 Tree of Thoughts: Deliberate Problem Solving with Large Language Models Inference-time compute and post-training —

2023 Voyager: An Open-Ended Embodied Agent with Large Language Models Inference-time compute and post-training —

2023 QLoRA: Efficient Finetuning of Quantized LLMs Efficient inference and edge deployment —

2023 Fast Inference from Transformers via Speculative Decoding Efficient inference and edge deployment —

2024 The Llama 3 Herd of Models Hyperscale dense LLM training —

2024 Mixtral of Experts Sparsity and memory-efficient scaling —

2024 DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Sparsity and memory-efficient scaling —

2024 Accurate structure prediction of biomolecular interactions with AlphaFold 3 Search, simulation, and scientific computing —

2025 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Inference-time compute and post-training —

2025 Kimi k1.5: Scaling Reinforcement Learning with LLMs Inference-time compute and post-training —

2025 s1: Simple test-time scaling Inference-time compute and post-training —

2025 Qwen3 Technical Report Hyperscale dense LLM training —

2025 Gemma 3 Technical Report Efficient inference and edge deployment —

2025 BitNet b1.58 2B4T Technical Report Efficient inference and edge deployment —

2025 Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Search, simulation, and scientific computing —

2025 AlphaEvolve: A coding agent for scientific and algorithmic discovery Search, simulation, and scientific computing —

2025 Kimi K2: Open Agentic Intelligence Sparsity and memory-efficient scaling —

2025 DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Sparsity and memory-efficient scaling —

2026 Kimi K2.5: Visual Agentic Intelligence Inference-time compute and post-training —

2026 Qwen3.5-Omni Technical Report Generative media compute —