Source: History of AI Compute Architecture

SOURCE URL: https://www.nature.com/articles/s41586-019-1724-z
RAW HTML: history/sources/text/2019-alphastar_2019.html

Grandmaster level in StarCraft II using multi-agent reinforcement learning

Abstract
Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions [1,2,3], the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems [4]. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks [5,6]. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.
Fig. 1: Training setup.
Fig. 2: Results.
Fig. 3: Ablations for key components of AlphaStar.
Fig. 4: AlphaStar training progression.
Data availability
All the games that AlphaStar played online can be found in the file ‘replays.zip’ in the Supplementary Data, and the raw data from the Battle.net experiment can be found in ‘bnet.json’ in the Supplementary Data.
Code availability
The StarCraft II environment was open sourced in 2017 by Blizzard and DeepMind [7]. All the human replays used for imitation learning can be found at https://github.com/Blizzard/s2client-proto. The pseudocode for the supervised learning, reinforcement learning, and multi-agent learning components of AlphaStar can be found in the file ‘pseudocode.zip’ in the Supplementary Data. All the neural architecture details and hyper-parameters can be found in the file ‘detailed-architecture.txt’ in the Supplementary Data.
References
1. AIIDE StarCraft AI Competition. https://www.cs.mun.ca/dchurchill/starcraftaicomp/
2. Student StarCraft AI Tournament and Ladder. https://sscaitournament.com/
3. Starcraft 2 AI ladder. https://sc2ai.net/
4. Churchill, D., Lin, Z. & Synnaeve, G. An analysis of model-based heuristic search techniques for StarCraft combat scenarios. In Artificial Intelligence and Interactive Digital Entertainment Conf. (AAAI, 2017).
5. Sutton, R. & Barto, A. Reinforcement Learning: An Introduction (MIT Press, 1998).
6. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
7. Vinyals, O. et al. StarCraft II: a new challenge for reinforcement learning. Preprint at https://arxiv.org/abs/1708.04782 (2017).
8. Vaswani, A. et al. Attention is all you need. Adv. Neural Information Process. Syst. 30, 5998–6008 (2017).
9. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
10. Mikolov, T., Karafiat, M., Burget, L., Cernocky, J. & Khudanpur, S. Recurrent neural network based language model. INTERSPEECH-2010 1045–1048 (2010).
11. Metz, L., Ibarz, J., Jaitly, N. & Davidson, J. Discrete sequential prediction of continuous actions for deep RL. Preprint at https://arxiv.org/abs/1705.05035v3 (2017).
12. Vinyals, O., Fortunato, M. & Jaitly, N. Pointer networks. Adv. Neural Information Process. Syst. 28, 2692–2700 (2015).
13. Mnih, V. et al. Asynchronous methods for deep reinforcement learning. Proc. Machine Learning Res. 48, 1928–1937 (2016).
14. Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. Proc. Machine Learning Res. 80, 1407–1416 (2018).
15. Wang, Z. et al. Sample efficient actor-critic with experience replay. Preprint at https://arxiv.org/abs/1611.01224v2 (2017).
16. Sutton, R. Learning to predict by the method of temporal differences. Mach. Learn. 3, 9–44 (1988).
17. Oh, J., Guo, Y., Singh, S. & Lee, H. Self-Imitation Learning. Proc. Machine Learning Res. 80, 3875–3884 (2018).
18. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
19. Balduzzi, D. et al. Open-ended learning in symmetric zero-sum games. Proc. Machine Learning Res. 97, 434–443 (2019).
20. Brown, G. W. Iterative solution of games by fictitious play. Act. Anal. Prod. Alloc. 13, 374–376 (1951).
21. Leslie, D. S. & Collins, E. J. Generalised weakened fictitious play. Games Econ. Behav. 56, 285–298 (2006).
22. Heinrich, J., Lanctot, M. & Silver, D. Fictitious self-play in extensive-form games. Proc. Intl Conf. Machine Learning 32, 805–813 (2015).
23. Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. Preprint at https://arxiv.org/abs/1704.04760v1 (2017).
24. Elo, A. E. The Rating of Chessplayers, Past and Present (Arco, 2017).
25. Campbell, M., Hoane, A. & Hsu, F. Deep Blue. Artif. Intell. 134, 57–83 (2002).
26. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
27. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
28. Pathak, D., Agrawal, P., Efros, A. A. & Darrell, T. Curiosity-driven exploration by self-supervised prediction. Proc. IEEE Conf. Computer Vision Pattern Recognition Workshops 16–17 (IEEE, 2017).
29. Jaderberg, M. et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019).
30. OpenAI. OpenAI Five. https://blog.openai.com/openai-five/ (2018).
31. Buro, M. Real-time strategy games: a new AI research challenge. Intl Joint Conf. Artificial Intelligence 1534–1535 (2003).
32. Samvelyan, M. et al. The StarCraft multi-agent challenge. Intl Conf. Autonomous Agents and MultiAgent Systems 2186–2188 (2019).
33. Zambaldi, V. et al. Relational deep reinforcement learning. Preprint at https://arxiv.org/abs/1806.01830v2 (2018).
34. Usunier, N., Synnaeve, G., Lin, Z. & Chintala, S. Episodic exploration for deep deterministic policies: an application to StarCraft micromanagement tasks. Preprint at https://arxiv.org/abs/1609.02993v3 (2017).
35. Weber, B. G. & Mateas, M. Case-based reasoning for build order in real-time strategy games. AIIDE ’09 Proc. 5th AAAI Conf. Artificial Intelligence and Interactive Digital Entertainment 106–111 (2009).
36. Buro, M. ORTS: a hack-free RTS game environment. Intl Conf. Computers and Games 280–291 (Springer, 2002).
37. Churchill, D. SparCraft: open source StarCraft combat simulation. https://code.google.com/archive/p/sparcraft/ (2013).
38. Weber, B. G. AIIDE 2010 StarCraft competition. Artificial Intelligence and Interactive Digital Entertainment Conf. (2010).
39. Uriarte, A. & Ontañón, S. Improving Monte Carlo tree search policies in StarCraft via probabilistic models learned from replay data. Artificial Intelligence and Interactive Digital Entertainment Conf. 101–106 (2016).
40. Hsieh, J.-L. & Sun, C.-T. Building a player strategy model by analyzing replays of real-time strategy games. IEEE Intl Joint Conf. Neural Networks 3106–3111 (2008).
41. Synnaeve, G. & Bessiere, P. A Bayesian model for plan recognition in RTS games applied to StarCraft. Artificial Intelligence and Interactive Digital Entertainment Conf. 79–84 (2011).
42. Shao, K., Zhu, Y. & Zhao, D. StarCraft micromanagement with reinforcement learning and curriculum transfer learning. IEEE Trans. Emerg. Top. Comput. Intell. 3, 73–84 (2019).
43. Facebook CherryPi. https://torchcraft.github.io/TorchCraftAI/
44. Berkeley Overmind. https://www.icsi.berkeley.edu/icsi/news/2010/10/klein-berkeley-overmind (2010).
45. Justesen, N. & Risi, S. Learning macromanagement in StarCraft from replays using deep learning. IEEE Conf. Computational Intelligence and Games (CIG) 162–169 (2017).
46. Synnaeve, G. et al. Forward modeling for partial observation strategy games—a StarCraft defogger. Adv. Neural Information Process. Syst. 31, 10738–10748 (2018).
47. Farooq, S. S., Oh, I.-S., Kim, M.-J. & Kim, K. J. StarCraft AI competition report. AI Mag. 37, 102–107 (2016).
48. Sun, P. et al. TStarBots: defeating the cheating level builtin AI in StarCraft II in the full game. Preprint at https://arxiv.org/abs/1809.07193v3 (2018).
49. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347v2 (2017).
50. Ibarz, B. et al. Reward learning from human preferences and demonstrations in Atari. Adv. Neural Information Process. Syst. 31, 8011–8023 (2018).
51. Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W. & Abbeel, P. Overcoming exploration in reinforcement learning with demonstrations. IEEE Intl Conf. Robotics and Automation 6292–6299 (2018).
52. Christiano, P. F. et al. Deep reinforcement learning from human preferences. Adv. Neural Information Process. Syst. 30, 4299–4307 (2017).
53. Lanctot, M. et al. A unified game-theoretic approach to multiagent reinforcement learning. Adv. Neural Information Process. Syst. 30, 4190–4203 (2017).
54. Perez, E., Strub, F., De Vries, H., Dumoulin, V. & Courville, A. FiLM: visual reasoning with a general conditioning layer. Preprint at https://arxiv.org/abs/1709.07871v2 (2018).
55. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proc. IEEE Conf. Computer Vision and Pattern Recognition 770–778 (2016).
56. Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at https://arxiv.org/abs/1503.02531v1 (2015).
57. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980v9 (2014).
58. Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).
59. Rusu, A. A. et al. Policy distillation. Preprint at https://arxiv.org/abs/1511.06295 (2016).
60. Parisotto, E., Ba, J. & Salakhutdinov, R. Actor-mimic: deep multitask and transfer reinforcement learning. Preprint at https://arxiv.org/abs/1511.06342 (2016).
61. Precup, D., Sutton, R. S. & Singh, S. P. Eligibility traces for off-policy policy evaluation. ICML ’00 Proc. 17th Intl Conf. Machine Learning 759–766 (2016).
62. DeepMind Research on Ladder. https://starcraft2.com/en-us/news/22933138 (2019).
63. Vinyals, O. et al. AlphaStar: mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii (DeepMind, 2019).
Acknowledgements
We thank Blizzard for creating StarCraft and for their continued support of the research environment, and for enabling AlphaStar to participate in Battle.net. In particular, we thank A. Hudelson, C. Lee, K. Calderone, and T. Morten. We also thank StarCraft II professional players G. ‘MaNa’ Komincz and D. ‘Kelazhur’ Schwimer for their StarCraft expertise and advice. We thank A. Cain, A. Razavi, D. Toyama, D. Balduzzi, D. Fritz, E. Aygün, F. Strub, G. Ostrovski, G. Alain, H. Tang, J. Sanchez, J. Fildes, J. Schrittwieser, J. Novosad, K. Simonyan, K. Kurach, P. Hamel, R. Barreira, S. Reed, S. Bartunov, S. Mourad, S. Gaffney, T. Hubert, the team that created PySC2 and the whole DeepMind Team, with special thanks to the research platform team, comms and events teams, for their support, ideas, and encouragement.
Author information
Author notes
These authors contributed equally: Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Chris Apps, David Silver
Authors and Affiliations
DeepMind, London, UK
Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps & David Silver
Team Liquid, Utrecht, Netherlands
Dario Wünsch
Contributions
O.V., I.B., W.M.C., M.M., A.D., J.C., D.H.C., R.P., T.E., P.G., J.O., D. Horgan, M.K., I.D., A.H., L.S., T.C., J.P.A., C.A., and D.S. contributed equally. O.V., I.B., W.M.C., M.M., A.D., J.C., D.H.C., R.P., T.E., P.G., J.O., D. Horgan, M.K., I.D., A.H., L.S., T.C., J.P.A., C.A., R.L., M.J., V.D., Y.S., A.S.V., D.B., T.L.P., C.G., Z.W., T. Pfaff, T. Pohlen, Y.W., and D.S. designed and built AlphaStar with advice from T.S. and T.L. J.M. and R.R. contributed to software engineering. D.W. and D.Y. provided expertise in the StarCraft II domain. K.K., D. Hassabis, K.M., O.S., and C.A. managed the project. D.S., W.M.C., O.V., J.O., I.B., and D.H.C. wrote the paper with contributions from M.M., J.C., D. Horgan, L.S., R.L., T.C., T.S., and T.L. O.V. and D.S. led the team.
Corresponding authors
Correspondence to Oriol Vinyals or David Silver.
Ethics declarations
Competing interests
M.J., W.M.C., O.V., and D.S. have filed provisional patent application 62/796,567 about the contents of this manuscript. The remaining authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Peer review information
Nature thanks Dave Churchill, Santiago Ontanon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
Extended Data Fig. 1 APM limits.
Top, win probability of AlphaStar Supervised against itself, when applying various agent action rate limits. Our limit does not affect supervised performance and is acceptable when compared to humans. Bottom, distributions of APMs of AlphaStar Final (blue) and humans (red) during games on Battle.net. Dashed lines show mean values.
Extended Data Fig. 2 Delays.
Left, distribution of delays between when the game generates an observation and when the game executes the corresponding agent action. Right, distribution of how long agents request to wait without observing between observations.
Extended Data Fig. 3 Overview of the architecture of AlphaStar.
A detailed description is provided in the Supplementary Data, Detailed Architecture.
Extended Data Fig. 4 Distribution of units built in a game.
Units built by Protoss AlphaStar Supervised (left) and AlphaStar Final (right) over multiple self-play games. AlphaStar Supervised can build every unit.
Extended Data Fig. 5 A more detailed analysis of multi-agent ablations from Fig. 3c, d.
PFSP-based training outperforms FSP under all measures considered: it has a stronger population measured by relative population performance, provides a less exploitable solution, and has better final agent performance against the corresponding league.
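As a rough illustration of the two opponent-selection schemes compared in this caption, the sketch below contrasts uniform sampling over opponents (FSP-style) with sampling that favors opponents the learner rarely beats (PFSP-style). The weighting `(1 - win_rate) ** p`, the exponent `p`, and every function name are assumptions made for illustration; the authors' actual procedure is in the pseudocode in the Supplementary Data.

```python
import random


def fsp_weights(win_rates):
    """Uniform weights: every opponent is equally likely (FSP-style)."""
    return [1.0 for _ in win_rates]


def pfsp_weights(win_rates, p=2.0):
    """Weights that favor opponents the learner rarely beats (PFSP-style).

    win_rates[i] is the learner's estimated probability of beating opponent i.
    The form (1 - x) ** p is an illustrative assumption.
    """
    return [(1.0 - x) ** p for x in win_rates]


def normalize(weights):
    """Turn non-negative weights into a probability distribution."""
    total = sum(weights)
    if total == 0.0:
        # All weights are zero (learner always wins): fall back to uniform.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def sample_opponent(win_rates, weight_fn, rng=random):
    """Draw an opponent index according to the chosen weighting scheme."""
    probs = normalize(weight_fn(win_rates))
    return rng.choices(range(len(win_rates)), weights=probs, k=1)[0]
```

With win rates of 0.9, 0.5 and 0.1 against three opponents, the PFSP-style weights put most of the probability mass on the third opponent, whereas the FSP-style weights treat all three alike.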
Extended Data Fig. 6 Training infrastructure.
Diagram of the training setup for the entire league.
Extended Data Fig. 7 Battle.net performance details.
Top, visualization of all the matches played by AlphaStar Final (right) and matches against opponents above 4,500 MMR of AlphaStar Mid (left). Each Gaussian represents an opponent MMR (with uncertainty): AlphaStar won against opponents shown in green and lost to those shown in red. Blue is our MMR estimate, and black is the MMR reported by StarCraft II. The orange background is the Grandmaster league range. Bottom, win probability versus gap in MMR. The shaded grey region shows MMR model predictions when players’ uncertainty is varied. The red and blue lines are empirical win rates for players above 6,000 MMR and AlphaStar Final, respectively. Both human and AlphaStar win rates closely follow the MMR model.
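The caption does not state the functional form of the MMR model. As a stand-in, the sketch below uses the standard Elo logistic curve with a scale of 400 rating points, together with a helper that bins games by rating gap to obtain an empirical win rate. The formula, the scale, and the function names are assumptions and may differ from the model StarCraft II actually uses.

```python
def elo_win_probability(rating_gap, scale=400.0):
    """Expected win probability under an Elo-style logistic model.

    rating_gap is (own rating - opponent rating). scale=400 is the
    conventional Elo constant, used here as an assumption.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


def empirical_win_rate(games, gap_low, gap_high):
    """Fraction of games won among those with gap_low <= rating gap < gap_high.

    games is a list of (rating_gap, won) pairs; returns None for an empty bin.
    """
    outcomes = [won for gap, won in games if gap_low <= gap < gap_high]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)
```

Comparing `empirical_win_rate` per bin against `elo_win_probability` at the bin centre is one simple way to check how closely observed results follow a rating model.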
Extended Data Fig. 8 Payoff matrix (limited to only Protoss versus Protoss games for simplicity) split into agent types of the league.
Blue means a row agent wins, red loses, and white draws. The main agents behave transitively: the more recent agents win consistently against older main agents and exploiters. Interactions between exploiters are highly non-transitive: across the full payoff, there are around 3,000,000 rock–paper–scissor cycles (with requirement of at least 70% win rates to form a cycle) that involve at least one exploiter, and around 200 that involve only main agents.
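To make the cycle count concrete, here is a minimal sketch of how rock–paper–scissors cycles could be counted in a payoff matrix, assuming a cycle means three agents A, B, C where A beats B, B beats C, and C beats A, each with a win rate at or above the threshold. The function name and the matrix layout are hypothetical; the source does not say how the authors computed their figures.

```python
from itertools import combinations


def count_rps_cycles(payoff, threshold=0.7):
    """Count 3-cycles (A beats B beats C beats A) in a win-rate matrix.

    payoff[i][j] is the probability that agent i beats agent j.
    An edge i -> j exists when payoff[i][j] >= threshold.
    Each cycle is counted once, regardless of which agent it starts from.
    """
    n = len(payoff)
    beats = [
        [i != j and payoff[i][j] >= threshold for j in range(n)]
        for i in range(n)
    ]
    cycles = 0
    for a, b, c in combinations(range(n), 3):
        # A triple of distinct agents admits two cyclic orientations.
        if beats[a][b] and beats[b][c] and beats[c][a]:
            cycles += 1
        if beats[a][c] and beats[c][b] and beats[b][a]:
            cycles += 1
    return cycles
```

This brute-force version examines every triple, so its cost grows cubically with the number of agents; it is meant to illustrate the definition rather than to scale to a full league.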
Extended Data Table 1 Agent input space
Extended Data Table 2 Agent action space
Supplementary information
Reporting Summary
Supplementary Data
This zipped file contains the pseudocode, StarCraft II replay files, detailed neural network architecture and raw data from the Battle.net experiment.
About this article
Cite this article
Vinyals, O., Babuschkin, I., Czarnecki, W.M. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019). https://doi.org/10.1038/s41586-019-1724-z
Received: 30 August 2019
Accepted: 10 October 2019
Published: 30 October 2019
Version of record: 30 October 2019
Issue date: 14 November 2019
DOI: https://doi.org/10.1038/s41586-019-1724-z


Author information
Author notes
These authors contributed equally: Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Chris Apps, David Silver
Authors and Affiliations
DeepMind, London, UK
Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps & David Silver
Team Liquid, Utrecht, Netherlands
Dario Wünsch
Contributions
O.V., I.B., W.M.C., M.M., A.D., J.C., D.H.C., R.P., T.E., P.G., J.O., D. Horgan, M.K., I.D., A.H., L.S., T.C., J.P.A., C.A., and D.S. contributed equally. O.V., I.B., W.M.C., M.M., A.D., J.C., D.H.C., R.P., T.E., P.G., J.O., D. Horgan, M.K., I.D., A.H., L.S., T.C., J.P.A., C.A., R.L., M.J., V.D., Y.S., A.S.V., D.B., T.L.P., C.G., Z.W., T. Pfaff, T. Pohlen, Y.W., and D.S. designed and built AlphaStar with advice from T.S. and T.L. J.M. and R.R. contributed to software engineering. D.W. and D.Y. provided expertise in the StarCraft II domain. K.K., D. Hassabis, K.M., O.S., and C.A. managed the project. D.S., W.M.C., O.V., J.O., I.B., and D.H.C. wrote the paper with contributions from M.M., J.C., D. Horgan, L.S., R.L., T.C., T.S., and T.L. O.V. and D.S. led the team.
Corresponding authors
Correspondence to Oriol Vinyals or David Silver.

Ethics declarations
Competing interests
M.J., W.M.C., O.V., and D.S. have filed provisional patent application 62/796,567 about the contents of this manuscript. The remaining authors declare no competing interests.

Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Peer review information
Nature thanks Dave Churchill, Santiago Ontanon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables
Extended Data Fig. 1 APM limits.
Top, win probability of AlphaStar Supervised against itself, when applying various agent action rate limits. Our limit does not affect supervised performance and is acceptable when compared to humans. Bottom, distributions of APMs of AlphaStar Final (blue) and humans (red) during games on Battle.net. Dashed lines show mean values.
Extended Data Fig. 2 Delays.
Left, distribution of delays between when the game generates an observation and when the game executes the corresponding agent action. Right, distribution of how long agents request to wait without observing between observations.
Extended Data Fig. 3 Overview of the architecture of AlphaStar.
A detailed description is provided in the Supplementary Data, Detailed Architecture.
Extended Data Fig. 4 Distribution of units built in a game.
Units built by Protoss AlphaStar Supervised (left) and AlphaStar Final (right) over multiple self-play games. AlphaStar Supervised can build every unit.
Extended Data Fig. 5 A more detailed analysis of multi-agent ablations from Fig. 3c, d.
PFSP-based training outperforms FSP under all measures considered: it has a stronger population measured by relative population performance, provides a less exploitable solution, and has better final agent performance against the corresponding league.
Extended Data Fig. 6 Training infrastructure.
Diagram of the training setup for the entire league.
Extended Data Fig. 7 Battle.net performance details.
Top, visualization of all the matches played by AlphaStar Final (right) and of the matches played by AlphaStar Mid against opponents above 4,500 MMR (left). Each Gaussian represents an opponent MMR (with uncertainty): AlphaStar won against opponents shown in green and lost to those shown in red. Blue is our MMR estimate, and black is the MMR reported by StarCraft II. The orange background is the Grandmaster league range. Bottom, win probability versus gap in MMR. The shaded grey region shows MMR model predictions when players’ uncertainty is varied. The red and blue lines are empirical win rates for players above 6,000 MMR and AlphaStar Final, respectively. Both human and AlphaStar win rates closely follow the MMR model.
Extended Data Fig. 8 Payoff matrix (limited to only Protoss versus Protoss games for simplicity) split into agent types of the league.
Blue means the row agent wins, red means it loses, and white means a draw. The main agents behave transitively: the more recent agents win consistently against older main agents and exploiters. Interactions between exploiters are highly non-transitive: across the full payoff, there are around 3,000,000 rock–paper–scissors cycles (requiring win rates of at least 70% to form a cycle) that involve at least one exploiter, and around 200 that involve only main agents.
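The cycle count quoted in this caption can be illustrated with a small sketch. It assumes that a "cycle" means a directed triple A beats B, B beats C, C beats A, with every edge at or above the 70% win-rate threshold; the caption does not spell out the cycle length, and the function name, the toy payoff matrix and the `is_exploiter` flags below are hypothetical, not taken from the article or its Supplementary Data.

```python
from itertools import combinations


def count_rps_cycles(payoff, is_exploiter, threshold=0.7):
    """Count length-3 win cycles in a payoff matrix.

    payoff[i][j] is the win rate of agent i against agent j.
    Returns (main_only, with_exploiter): cycles made only of main agents,
    and cycles that involve at least one exploiter.
    Assumption: a cycle is a directed triangle whose three edges all
    have a win rate >= threshold.
    """
    n = len(payoff)
    beats = [[payoff[i][j] >= threshold for j in range(n)] for i in range(n)]
    main_only = 0
    with_exploiter = 0
    for a, b, c in combinations(range(n), 3):
        # Each unordered triple can form a cycle in two orientations.
        found = 0
        if beats[a][b] and beats[b][c] and beats[c][a]:
            found += 1
        if beats[a][c] and beats[c][b] and beats[b][a]:
            found += 1
        if found == 0:
            continue
        if is_exploiter[a] or is_exploiter[b] or is_exploiter[c]:
            with_exploiter += found
        else:
            main_only += found
    return main_only, with_exploiter


# Toy example (made-up numbers): agents 0, 1, 2 form one cycle,
# agent 3 is even against everyone.
P = [
    [0.5, 0.9, 0.1, 0.5],
    [0.1, 0.5, 0.9, 0.5],
    [0.9, 0.1, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5],
]
is_exploiter = [True, False, False, False]
print(count_rps_cycles(P, is_exploiter))
```

This brute-force enumeration is cubic in the number of agents, which is fine for a toy matrix; counting over a full league payoff would likely need a vectorised or graph-based approach.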
Extended Data Table 1 Agent input space
Extended Data Table 2 Agent action space

Supplementary information
Reporting Summary (download PDF)
Supplementary Data (download ZIP)
This zipped file contains the pseudocode, StarCraft II replay files, detailed neural network architecture and raw data from the Battle.net experiment.

About this article
Cite this article
Vinyals, O., Babuschkin, I., Czarnecki, W.M. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019). https://doi.org/10.1038/s41586-019-1724-z
Received: 30 August 2019
Accepted: 10 October 2019
Published: 30 October 2019
Version of record: 30 October 2019
Issue date: 14 November 2019
DOI: https://doi.org/10.1038/s41586-019-1724-z
