Original Paper: Dota 2 with Large Scale Deep Reinforcement Learning
General Comment / Insight
So, like 40% of the online games that they won were abandoned by humans for whatever reason. But still, a 99.4% win rate is no joke.
Explanation
These phrases ("long time horizons", "partial observability", and "high dimensionality") are described five paragraphs later, so don't worry if you don't understand them right now!
Summary
What they are essentially saying here is that, given the nature of the game, their main problem was not "what kind of network could we train to play this game?" but rather "how could we train our mod...
Summary
Ooh. Since their game environment *and* code both changed from time to time, they would have had to keep throwing away the model they had trained. The way to get around this was to develop tools th...
Question
Hmmm, it's not clear whether they're trying to focus on challenges specific to Dota 2 compared to OTHER games, or challenges from ALL video games. If it's the former, then talking about only chess and Go d...
General Comment / Insight
For long time horizons, it's a little nonsensical to compare Dota 2 with chess and Go! Of course readers know that video games and board games are different. They should compare it with DeepMind's ATAR...
Question
Can anyone who actually plays Dota talk about the implications of the choice of heroes? In Appendix P of the original paper, they say that expanding the hero pool led to much slowe...
General Comment / Insight
Yep, acting only on every 4th timestep seems standard now; the original Atari paper by DeepMind also acted on every 4th frame.
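For readers who haven't seen frame skipping before, here is a minimal sketch of the idea, assuming a Gym-style `env.step()` API (the names `policy` and `FRAME_SKIP` are illustrative, not OpenAI Five's actual code): the agent decides once, then repeats that action for the skipped frames.

```python
# Minimal sketch of frame skipping: choose an action once, repeat it for
# the next few frames, and accumulate the reward over the skipped frames.
# Assumes a Gym-style env.step() API; purely illustrative.
FRAME_SKIP = 4

def act_with_frame_skip(env, policy, obs):
    action = policy(obs)           # decide once...
    total_reward = 0.0
    done = False
    for _ in range(FRAME_SKIP):    # ...then repeat while frames are skipped
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done
```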
Question
Argh. Once again, it seems that experience playing Dota 2 would be very beneficial in reading this paper. I wonder how many AI researchers play video games that often. :)
General Comment / Insight
About the claim that the discrepancies don't introduce bias when benchmarking against human players: during training that's definitely true. I'm going to nitpick a little here: during an actual ...
General Comment / Insight
Overall comments: this paper is like a cutting-edge tutorial in how to *structure and implement* an AI project/experiment. The key contribution is in the *way* they perform this massive experiment. T...
General Comment / Insight
770+ PFlops for 10 straight months! I'm curious how much it cost to train OpenAI Five. This article estimates that AlphaGo Zero (which is definitely much less compute-intensive) alone cost over $30 million ...
Explanation
The training samples correspond only to *small portions* of many different games, played with many different policies: games were played in small chunks, and the policies were constantly being updated in between.
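To make that concrete, here is a hedged sketch of how a rollout worker could produce such data. This is not the actual Rapid training code; `FRAGMENT_LEN`, `optimizer_queue`, and `param_server` are made-up names. The worker plays a long game in short fragments, ships each fragment to the optimizers, and pulls fresh policy parameters in between, so each sample covers only a small slice of one game under whatever policy version was current at the time.

```python
# Illustrative sketch only: a rollout worker streams short fragments of a
# long game to the optimizer and refreshes its policy between fragments.
FRAGMENT_LEN = 256  # hypothetical number of timesteps per fragment

def rollout_worker(env, policy, optimizer_queue, param_server):
    obs = env.reset()
    done = False
    while not done:
        fragment = []
        for _ in range(FRAGMENT_LEN):
            action = policy(obs)
            next_obs, reward, done, _ = env.step(action)
            fragment.append((obs, action, reward))
            obs = next_obs
            if done:
                break
        optimizer_queue.put(fragment)          # send partial-game experience
        policy = param_server.latest_policy()  # policy may change mid-game
```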
Explanation
Put simply, the point of surgery was to do experiments more quickly without losing much performance, and *not* to produce the perfect model. Once they arrived at the final model architecture etc. and...
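To illustrate the flavor of surgery (this is a generic net2net-style sketch, not OpenAI's implementation), here is one way to grow a layer's input dimension while preserving the trained behavior: copy the old weights and zero-initialize the rows for the new features, so the expanded model initially behaves exactly like the old one on the old inputs.

```python
import numpy as np

# Hypothetical sketch of the idea behind "surgery": when a layer's input
# grows from old_dim to new_dim features, build the new weight matrix so
# the old features are processed exactly as before, and the weights for
# the newly added features start at zero.
def surgery_expand_input(old_weights, new_dim):
    old_dim, out_dim = old_weights.shape
    new_weights = np.zeros((new_dim, out_dim), dtype=old_weights.dtype)
    new_weights[:old_dim, :] = old_weights  # reuse trained weights
    # rows old_dim..new_dim stay zero: new features have no effect at first
    return new_weights
```

The design choice here is that training continues from a model that computes the same function as before, instead of restarting from scratch after every environment or code change.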