| 33 | 2017 | Attention Is All You Need | 10 | downloaded / read_complete |
| 34 | 2017 | In-Datacenter Performance Analysis of a Tensor Processing Unit | 5 | downloaded / read_complete |
| 35 | 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 5 | downloaded / read_complete |
| 36 | 2018 | GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | 5 | downloaded / read_complete |
| 37 | 2018 | Mesh-TensorFlow: Deep Learning for Supercomputers | 5 | downloaded / read_complete |
| 38 | 2020 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | 5 | downloaded / read_complete |
| 39 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 4 | downloaded / read_complete |
| 40 | 2019 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 3 | downloaded / read_complete |
| 41 | 2019 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 3 | downloaded / read_complete |
| 42 | 2019 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2 | downloaded / read_complete |
| 43 | 2019 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 2 | downloaded / read_complete |
| 44 | 2019 | What Does BERT Look at? An Analysis of BERT's Attention | 1 | downloaded / read_complete |