| 45 | 2019 | Language Models are Unsupervised Multitask Learners | 5 | downloaded / read_complete |
| 46 | 2019 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | 5 | downloaded / read_complete |
| 47 | 2019 | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | 5 | downloaded / read_complete |
| 48 | 2020 | Language Models are Few-Shot Learners | 5 | downloaded / read_complete |
| 49 | 2020 | Scaling Laws for Neural Language Models | 5 | downloaded / read_complete |
| 50 | 2022 | Training Compute-Optimal Large Language Models | 5 | downloaded / read_complete |
| 51 | 2022 | PaLM: Scaling Language Modeling with Pathways | 5 | downloaded / read_complete |
| 52 | 2021 | Scaling Language Models: Methods, Analysis and Insights from Training Gopher | 4 | downloaded / read_complete |
| 53 | 2023 | Llama 2: Open Foundation and Fine-Tuned Chat Models | 4 | downloaded / read_complete |
| 54 | 2023 | LLaMA: Open and Efficient Foundation Language Models | 4 | downloaded / read_complete |
| 55 | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 3 | downloaded / read_complete |
| 56 | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | 3 | downloaded / read_complete |
| 57 | 2022 | OPT: Open Pre-trained Transformer Language Models | 3 | downloaded / read_complete |
| 58 | 2023 | Gemini: A Family of Highly Capable Multimodal Models | 3 | downloaded / read_complete |
| 59 | 2023 | Mistral 7B | 3 | downloaded / read_complete |
| 60 | 2023 | Textbooks Are All You Need | 3 | downloaded / read_complete |
| 61 | 2024 | The Llama 3 Herd of Models | 3 | downloaded / read_complete |
| 62 | 2023 | A Survey of Large Language Models | 1 | downloaded / read_complete |
| 124 | 2025 | Qwen3 Technical Report | 4 | downloaded / read_complete |