"Another sub-problem of the wider StarCraft (Blizzard Entertainment, 1998) playing problem is build order planning. The problem here is in which order to build certain improvements to the player’s base and in which order to research certain technology, a complex planning problem at a considerably higher level of abstraction than micro-battles. Here, Weber et al have data-mined logs of existing StarCraft (Blizzard Entertainment, 1998) matches to find successful build orders that can be applied in games played by agents."
With the ultimate goal being to destroy the opponent, different build orders can meet that goal. Scouting could show which direction the opponent is moving in, and the action plan could then be adjusted to counter it or take advantage of a weakness.
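For concreteness, here's a minimal sketch of the easy part of build order planning: checking that an order respects tech prerequisites. The `PREREQS` table is a simplified, hand-written slice of the Terran tech tree, and `first_violation` is a made-up helper name, not something from Weber et al. or any bot framework. It ignores resources, timing and the opponent entirely.

```python
# Simplified, hand-written slice of the Terran tech tree (an assumption for
# illustration): each building maps to the buildings it requires.
PREREQS = {
    "Command Center": [],
    "Supply Depot": ["Command Center"],
    "Barracks": ["Command Center"],
    "Factory": ["Barracks"],
    "Starport": ["Factory"],
}


def first_violation(build_order, prereqs=PREREQS, already_built=("Command Center",)):
    """Return (index, item, missing_prereq) for the first illegal step, or None.

    Only checks prerequisite ordering; it says nothing about whether the
    order is affordable or survivable.
    """
    built = set(already_built)
    for i, item in enumerate(build_order):
        for req in prereqs[item]:
            if req not in built:
                return (i, item, req)
        built.add(item)
    return None


if __name__ == "__main__":
    print(first_violation(["Supply Depot", "Barracks", "Factory", "Starport"]))
    print(first_violation(["Supply Depot", "Factory"]))
```

The first call finds no violation; the second flags the Factory at index 1 because no Barracks exists yet. Legality like this is cheap to check, which is why it isn't the interesting part of the problem.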
The catch is that "can I build this without dying in the process" is an almost completely unsolved problem. Even "can my opponent just kill me now" is currently only solved by fairly crude approximation. Without being able to evaluate that statically we're nowhere near evaluating it over the course of a build order, accounting for incomplete information and multiple possible responses.
In general, StarCraft is tricky because very small differences in unrelated concerns can cause wild swings in expected outcome. Compare a Terran wall-in that blocks zealots vs. one with a zealot-sized gap. A few pixels of space -- a pathfinding concern -- radically impacts build order concerns.
Come check out the current state-of-the-art (and lots of earlier stage work as well) at http://twitch.tv/sscait
But it does have an equivalent in scalable cloud infrastructure, where you can literally download more RAM, cores, etc.
You can't just buy additional execution units? Sure there are limiting factors - in games as well.
I don't know of a better way to do it, but it really makes it tough to jump in and commit to giving it a read through.
Backprop can then refine the most efficient net architectures. Curiously, some evolve structures akin to LSTM units.
The draft book that is linked shows RL discussed in Sections 2.6 (pp. 75-79) and 3.3.2 (pp. 122-125), while Neural Networks are briefly covered in Section 2.5.1 (pp. 62-74) with a note about DeepMind's DQN RL agent on p. 91.
The cost of waking a sleeping CPU core is perhaps comparable to the cost of building a factory in-game, whereas building a ton of factories that can't be saturated by the incoming cash money is comparable to building a superscalar processor architecture that can't be saturated by the cache memory.
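To put rough numbers on the saturation point, here's a back-of-the-envelope sketch. The function names and all the figures are made up for illustration, not taken from the game's actual data: a factory that continuously produces units spends `unit_cost / build_time_min` per minute, so income divided by that rate gives the number of factories you can keep busy.

```python
def saturated_factories(income_per_min, unit_cost, build_time_min):
    """How many continuously producing factories a given income can feed.

    All inputs are hypothetical; each factory is assumed to spend
    unit_cost / build_time_min resources per minute.
    """
    spend_per_factory = unit_cost / build_time_min
    return int(income_per_min // spend_per_factory)


def idle_factories(factories, income_per_min, unit_cost, build_time_min):
    """Factories beyond the saturation point, i.e. capacity you paid for but can't use."""
    usable = saturated_factories(income_per_min, unit_cost, build_time_min)
    return max(0, factories - usable)


if __name__ == "__main__":
    # Illustrative numbers only: 600 income per minute, units costing 100
    # that take half a minute each to build.
    print(saturated_factories(600, 100, 0.5))
    print(idle_factories(5, 600, 100, 0.5))
```

With those numbers each factory burns 200 per minute, so the income feeds three of them and the other two out of five sit idle, much like execution units starved by the memory system.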