Beating the World’s Best at Super Smash Bros. with Deep Reinforcement Learning (arxiv.org)

202 points by willwhitney 9 years ago | 55 comments

gwern 9 years ago

Note: it doesn't learn from pixels but from features read directly from RAM, and it has superhuman reaction time; performance degrades badly when human-like delays are added.

Good discussions on Reddit: https://www.reddit.com/r/MachineLearning/comments/5vh4ae/r_a... https://www.reddit.com/r/smashbros/comments/5vin8x/beating_t...

stcredzero 9 years ago

I could see this technology being used to bootstrap highly emergent MMO game worlds. It could be used to populate a world with fake "player" NPCs that are actually part of a simulated online ecosystem. Give the NPCs a population large enough that players cannot exert significant selection pressure, but give the NPCs real selection pressure through interaction with artificial life evolved with genetic algorithms. The rate of evolution of the a-life and the NPCs could be tuned to provide a comfortable rate of change for the human players, and the NPCs would insulate the players from the frustrations GAs might cause.

malikNF 9 years ago

League of Legends, for example, has bots appear in PvP games. These bots are not produced by the game's developers, yet not much was done to get rid of them. I guess they were tolerated since they made queue times shorter for human players.

( http://boards.na.leagueoflegends.com/en/c/gameplay-balance/b... )

jbf1001 9 years ago

Or even some kind of "always online" MMO like Chronicles of Elyria. Being able to just tell your character to play while you are away, without 'scripting' it, would be nice.

https://chroniclesofelyria.com/

SerLava 9 years ago

This reminds me of Starcraft AI experiments. They can't actually make the computer smart, so they just jam 2000 button presses per second down the tube, giving every single unit its own simultaneous AI, and it out-micromanages anyone.

With Marines usually.

RoboTeddy 9 years ago

I heard that the DeepMind Starcraft project intends to limit their AI's APM (actions per minute) to something human-like.
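One simple way to cap an agent's APM is a sliding-window rate limiter: an action is allowed only if fewer than the cap were issued in the preceding 60 seconds, otherwise the agent does nothing that step. This is a minimal sketch of that idea; `APMLimiter` and `try_act` are hypothetical names, and it is not how DeepMind (or anyone in this thread) actually implements the limit.

```python
from collections import deque


class APMLimiter:
    """Hypothetical sliding-window cap on actions per minute.

    Timestamps are in seconds. An action is allowed only if fewer than
    max_apm actions were taken within the last window_s seconds.
    """

    def __init__(self, max_apm, window_s=60.0):
        self.max_apm = max_apm
        self.window_s = window_s
        self._times = deque()  # timestamps of recently allowed actions

    def try_act(self, now):
        # Forget actions that have fallen out of the window.
        while self._times and now - self._times[0] >= self.window_s:
            self._times.popleft()
        if len(self._times) < self.max_apm:
            self._times.append(now)
            return True
        return False  # over budget: the agent must issue a no-op instead
```

With `APMLimiter(300)`, for example, the agent could burst, but could never exceed 300 actions in any 60-second window.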

nickpsecurity 9 years ago

That's not what the Starcraft AI field is about. It actually started with a combination of people building planner-oriented systems and micro-oriented systems; hybrids followed. There are many methods at play. Here's a survey:

https://www.cs.mun.ca/~dchurchill/pdf/starcraft_survey.pdf

The competitions that involved humans showed that humans destroyed the bots by spotting their patterns and exploiting them. Humans also used bluffs or distractions, such as having one unit do weird things around the bot's base while the human player built up an army. Bots that beat humans will have to learn to spot bluffs and the other weird patterns humans use to screw with them, on top of everything prior AI did with human-level talent. My money is on humans for DeepMind vs Starcraft, although I'm happy to be proven wrong.

vladfi1 9 years ago

In Starcraft there's a much bigger advantage since humans are inherently "single threaded", so you can get much bigger discrepancies in APM (or EPM). Smash is more like 1 unit vs 1 unit micro. The precision and timing are still advantages for the AI, but raw parallel compute isn't so much of one.

erik 9 years ago

Broodwar bots perform poorly against competent humans though. Micro advantage or not, the strategic decision making isn't there yet.

hkmurakami 9 years ago

Or individual muta micro. That was the winning "strategy" in the first BWAI cup many years ago.

stale2002 9 years ago

Honestly, it still isn't even that good. The best Starcraft AI in the world, even one that cheats, still can't beat low-tier pros.

empath75 9 years ago

That's an interesting definition of smart that doesn't include being able to manage hundreds of units simultaneously.

modeless 9 years ago

I was similarly disappointed when I read this, but upon further reflection I still like this paper. It is very plausible that both of these problems could be fixed, it would just take a lot more time/power to train, and the resulting system would likely not run in real time making it impossible to test against real humans.

Further advancement in this area will require huge leaps in hardware performance. Luckily in the next few years I expect that the pace of improvement in specialized hardware for neural nets will far outpace Moore's Law.

gwern 9 years ago

I'm not nearly that pessimistic. Beating SSBM is well within the capability of a well-tuned A3C, and definitely within the capabilities of a group like DeepMind. More neuromorphic hardware is unnecessary and with current RL methods, they are more CPU-bound than GPU-bound (take a look at the NN they use, it's trivially small; most of the computation goes towards running many SSB games in parallel in order to generate any data to do some small updates on the NN).
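To illustrate how small the learning step is compared with the simulation, here is the per-rollout arithmetic of an A3C-style actor-critic: n-step returns are computed backwards from a bootstrap value V(s_{t+n}), and the advantage is the return minus the critic's estimate. This is a minimal sketch of the standard n-step advantage, not the paper's training code; the function name and argument layout are assumptions.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """n-step advantages for one rollout, as in A3C-style actor-critic.

    rewards[t] and values[t] (the critic's V(s_t)) cover the rollout's n
    steps; bootstrap_value is V(s_{t+n}) for the state after the rollout,
    or 0.0 if the episode ended.
    """
    advantages = [0.0] * len(rewards)
    ret = bootstrap_value
    # Walk backwards so each return reuses the next one:
    # R_t = r_t + gamma * R_{t+1}
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages
```

Everything expensive happens before this point: each worker has to play out n frames of the game to produce `rewards` and `values`, which is why running many emulators in parallel dominates the cost.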

I believe they've handicapped themselves, actually, with their shortcuts: the performance of agents is crippled by the inability to see projectiles due to the choice to avoid learning from pixels (which I bet would actually be quite fast, as learning from pixels is not the bottleneck in ALE), and likewise the use of the other RAM features is the path of the Dark Side - allowing immediate quick learning through huge dimensionality reduction, seductively simple, yes, yet poison in the end as the agent is unable to learn all the other things it would've learned (such as projectiles). I suspect that this is why their current implementation is unable to learn to play multiple characters: because it can't see which character it is and what play style it should use.

So I would not be surprised at all to hear in a year or two that human-delay-equivalent agent using raw pixels could beat human champs routinely.

willwhitney 9 years ago

Handling delays (and the uncertainty they entail) is a huge challenge, and I think it'll be a rich area of research. The simplest part of the problem is that delays in action or perception also slow the propagation of reward signals, and credit assignment is still a really hard problem.
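A simplified way to see one effect of delay on reward propagation: with discount factor gamma, a reward that arrives d frames later contributes gamma**d times as much to the return, and d more intermediate actions compete for the credit. The sketch below only illustrates that arithmetic under a plain discounted-return assumption; it is not a model of the paper's setup, and `delay` is a hypothetical helper.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * r_k over a sequence of per-frame rewards."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def delay(rewards, d):
    """Hypothetical helper: shift every reward d frames later."""
    return [0.0] * d + list(rewards)
```

For instance, at gamma = 0.99 a reward delayed by 15 frames is worth about 0.86 of its undelayed value, and the signal has to be traced back through 15 extra decisions.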

Thinking further afield, future models could learn to adapt their expectations to fit the behavior of a particular opponent. This kind of metalearning is pretty much a wide-open problem, though a pair of (roughly equivalent) papers in this direction recently came out from DeepMind (https://arxiv.org/abs/1611.05763) and OpenAI (https://arxiv.org/abs/1611.02779). It's going to be really exciting to see how these techniques scale.

hyperbovine 9 years ago

Really naive question: can't they just train the net to react instantaneously on a d-delayed screen? I don't see conceptually why this approach would succeed with d=0 but fail for (say) d=25ms. (I am too busy/lazy to read the papers and understand what breaks down.)
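Mechanically, training on a d-delayed screen is easy: wrap the environment so the agent always receives the observation from d steps ago. A minimal sketch follows, assuming a simple environment interface with `reset() -> obs` and `step(action) -> (obs, reward, done)`; that interface and the `DelayedObservations` name are hypothetical, not the paper's API. What changes is the learning problem: the policy must now act on where the opponent will be d frames after what it sees, which is a prediction task rather than a reaction task.

```python
from collections import deque


class DelayedObservations:
    """Hypothetical wrapper that shows the agent the frame from d steps ago.

    Assumes env.reset() returns an observation and env.step(action)
    returns (observation, reward, done).
    """

    def __init__(self, env, d):
        self.env = env
        self.d = d
        self.buffer = deque(maxlen=d + 1)  # holds the last d + 1 frames

    def reset(self):
        obs = self.env.reset()
        # Until d real frames have passed, repeat the initial observation.
        self.buffer.clear()
        self.buffer.extend([obs] * (self.d + 1))
        return self.buffer[0]

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.buffer.append(obs)  # maxlen silently drops the oldest frame
        return self.buffer[0], reward, done
```

Note that only observations are delayed here; rewards still arrive immediately, which is one of several simplifications.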

revelation 9 years ago

"we instead use features read from the game’s memory on each frame, consisting of each player’s position, velocity, and action state, along with several other values"

So it's cheating, presumably knowing the opponent's action before the animation even starts to play.
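For readers wondering what such a per-frame feature vector might look like, here is a minimal sketch covering only the quoted fields (position, velocity, action state) for each player. Encoding the action state as a one-hot vector and the vocabulary size `NUM_ACTION_STATES` are assumptions for illustration; the paper's exact encoding and its "several other values" are not reproduced here.

```python
from dataclasses import dataclass

NUM_ACTION_STATES = 400  # hypothetical size of the action-state vocabulary


@dataclass
class PlayerState:
    x: float
    y: float
    vx: float
    vy: float
    action_state: int  # integer ID of the current action/animation


def encode(p: PlayerState) -> list:
    # Action-state IDs are categorical, so use one-hot rather than the raw int.
    one_hot = [0.0] * NUM_ACTION_STATES
    one_hot[p.action_state] = 1.0
    return [p.x, p.y, p.vx, p.vy] + one_hot


def encode_frame(me: PlayerState, opponent: PlayerState) -> list:
    # Fixed ordering (self first, then opponent) so the network can tell them apart.
    return encode(me) + encode(opponent)
```

The one-hot choice matters: feeding the raw ID as a number would imply that action 51 is "close to" action 52, which is meaningless for animation states. A learned embedding would be the other common option.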

scythe 9 years ago

Smash is played on analog displays precisely so that the lag between RAM and the display can be as small as possible, usually 50 ms. In fact there's a 50 ms delay added to the AI for this reason. However, the AI takes no account of the fact that it takes about 230 ms for a signal to travel from a human's retina through the occipital lobe and motor cortex and activate the motor neurons in the hand. The AI can also generate input sequences that are nearly impossible for a human, such as the "dustless [i.e. perfect] dashdance".
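Since Melee players think in frames, it helps to convert the comment's numbers, assuming the game's nominal 60 frames per second: the 50 ms display/AI delay is about 3 frames, while the roughly 230 ms human pathway is closer to 14 frames.

```python
FPS = 60  # assumption: Melee's nominal frame rate


def ms_to_frames(ms, fps=FPS):
    """Convert a latency in milliseconds to a (fractional) number of frames."""
    return ms * fps / 1000.0
```

So even with the added delay, the agent would still be reacting roughly 11 frames sooner than the human figure quoted above.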

But this is what a top player (who regularly beats both of the players tested in the study) looks like playing against a hand-coded bot:

https://www.youtube.com/watch?v=9qWHM8DNdr8

and this is what the humans eventually learned to do:

https://www.youtube.com/watch?v=be8UDlVuAl8

Even if you add reaction time, a big part of Smash skill for humans comprises accurately manipulating the analog stick. The computer can just declare any angle it wants; you're not having a fair competition until you build a robot thumb that manipulates a joystick the way humans do, IMO. Otherwise a character like Pikachu can recover perfectly every time.

vladfi1 9 years ago

It does not see the opponent's actions before they take effect on screen, and the actual controller states are not part of the feature representation we used (though they actually are somewhere in the RAM).

stagbeetle 9 years ago

Part of the skill in competitive play is being able to predict what move your opponent is going to do next.

Most mid-level players already have a good grasp of prediction, which is arguably along the same lines as knowing with certainty what action your opponent is taking a few frames before he does it.

Couple that with Smash's pretty obscene frame lag, and it's not really that much of an advantage.

Also, beating competitive play isn't really that impressive, considering how limited your actions are once items and the more dynamic stages are banned (see: restricting RNG). In this way, it's nothing more than a simple chess bot. Now, if it could actually take in complex environments and multiple tools, that'd be pretty next level.

brilee 9 years ago

Video of the AI here, playing as the black Captain Falcon: https://www.youtube.com/watch?v=dXJUlqBsZtE

swanson 9 years ago

We all know that Mew2King is the first reinforcement learning AI capable of beating Super Smash Bros pro players.

https://www.youtube.com/watch?v=z-1YfhUFtbY&feature=youtu.be...

forgotmysn 9 years ago

and he still can't beat Armada

Sniffnoy 9 years ago

I may be the person here who accidentally takes the joke literally, but Mew2King has in fact beaten Armada on three occasions: once at SKTAR 3, once at Smash Summit 2, and most recently at UGC Smash Open.

jwtadvice 9 years ago

While the AI might be cheating by taking salient features from RAM rather than from pixel values, this is still an incredible feat. Just a few years ago we did not have generic algorithms that could take even salient features and self-learn policies to near this level this quickly.

willwhitney 9 years ago

Yup, it's definitely an advantage to get all the correct values from the game state. But not as much as you might think; the vision portion of a DQN or similar trains quite quickly.

Plus, our bot doesn't have any clue about projectiles. We don't know where they live in memory, so the network doesn't get to know about them at all.

dyselon 9 years ago

Can I ask what the feature set looked like? I always kind of wanted to do this with the Skullgirls AI, but never had the time while we were developing it. As a developer, I obviously had full access to the game state, but I'm still not really sure what the best way to represent that state to a neural network is.

Blackthorn 9 years ago

Getting them from RAM instead of the screen doesn't give you an advantage on (for example) DI or ledge teching?

smaili 9 years ago

As someone who's played for quite a while, I can tell you SSBM is one of the most complex games I've ever come across.

jensv 9 years ago

Why do you think the game is complex? Fairly simple game with low barrier to entry which is great when you invite guests over for games. Super Simple Button Mash!

chrisdbaldwin 9 years ago

Likely due to the advanced, non-intuitive mechanics that have been discovered over the years. The entry barrier may be low, but the skill cap is high.

lanius 9 years ago

I'm impressed it beat the likes of S2J and Zhu. I wonder how it'd fare against the Five Gods?

WhitneyLand 9 years ago

What's the key insight here compared to previous systems? As far as I can tell, still no one can beat simple non-deterministic games that require some planning.

My favorite example is Ms. Pac-Man, because it seems so old and simplistic. It's been tried by a dozen teams, and none of them can beat a decent human.

cerved 9 years ago

Civ AI has denounced this research

fiatjaf 9 years ago

I was expecting a video.