Agents that imagine and plan

Agents that imagine and plan (deepmind.com)

173 points by interconnector 8 years ago | 58 comments

interfixus 8 years ago |

"This form of deliberative reasoning is essentially ‘imagination’, it is a distinctly human ability"

A completely unfounded supposition, as so often appears to be the case when some human monopoly is claimed. We didn't magically sprout whole new categories of ability during a measly few million years of evolution.

Anecdotally, I see crows getting out of the way of my car. Not confused and haphazardly as many birds do, but in calculated, deliberate, unhurried steps to somewhere just outside my trajectory - steps which clearly take into account such elements as my speed and the state of other traffic on the road. Furthermore, when it's season for walnuts and the like, they'll calmly drop their haul on the asphalt, expecting my tyres to crush it for them. This - in my rural bit of Northern Europe - appears to be a recent import or invention; I never saw it done until two years ago.

And there's The Case of the Dog and the Peanut Butter Jars. My dog, my peanut butter jars, and they were empty, but not cleaned. Alone at home, she found them one day, and clearly had experimented on the first one, which had bite marks aplenty on the lid. The rest she managed to unscrew without damage. Having licked the jars clean, apparently she got to thinking of the grumpy guy who would eventually be coming home. I can think of no other explanation why I found the entire stash of licked-clean jars hidden - although not successfully - under a rug.

Tell me again about imagination and its distinctly human nature.

bambax 8 years ago | |

Of course. This is extremely annoying, esp. now that the Internets are chock-full of counterexamples. Descartes wrote many stupid things about animals being "automatons", but at least he had the excuse of living in a pre-YouTube era.

> When placing a glass on the edge of a table, for example, we will likely pause to consider how stable it is and whether it might fall. On the basis of that imagined consequence we might readjust the glass to prevent it from falling and breaking.

Or, if you're a cat, you might push it over the edge for the fun of it:

https://www.youtube.com/watch?v=RI1rv3re7as

In fact, the cat in this video appears to have more imagination than the paper's authors.

bryanrasmussen 8 years ago | |

Reminds me of my dog and cat when I was a kid in Germany. Whenever we left the house, the cat, a Siamese who had learned to open doors, would open the door to the garbage, and the dog would then pull it into the kitchen to spread on the floor for a party.

zimpenfish 8 years ago | |

> I can think of no other explanation

Well, just because you can't think of one, doesn't mean your explanation is correct, surely. This could easily be explained by an instinctual "hide food remnants to avoid attracting bigger things".

interfixus 8 years ago | | |

In some formal scheme, yes. In the actual situation, no, it could not easily. Or we can reduce the question to a squabble of semantics: Alright, the dog's actions were not conscious and actively planned, but then neither are ours. I fail to see the fundamental difference, and have never really heard a coherent case made that there is one. You are of course right that an argument from one's own lack of imagination is no proof of anything.

akvadrako 8 years ago | |

> Tell me again about imagination and its distinctly human nature.

https://en.wikipedia.org/wiki/Bicameralism_(psychology)

interfixus 8 years ago | | |

Thank you. Yet another exhibit in the case against psychology as a valid scientific endeavour.

ansgri 8 years ago |

https://en.wikipedia.org/wiki/Model_predictive_control

Of course imagining possible outcomes before executing is useful! And it has many uses outside deep learning. No reason to invent new words, really, at least not without referring to the established ones.

Maybe there is a serious novel idea, but I've missed it.

Basically, if you need to control a complex process (i.e. bring some future outcome in accordance with your plan), you can build a forward model of the system under control (which is simpler than a reverse model), and employ some optimization technique (combinatorial, e.g. graph-based; numeric derivative-free, e.g. pattern search; or differential) to find the optimal current action.
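To make the loop described above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the forward model is a hypothetical cart whose velocity is set directly, the cost is squared distance to a target, and a crude grid search over constant-action plans stands in for the optimizer. It is not the method from the DeepMind papers.

```python
def forward_model(x, a, dt=0.1):
    """Hypothetical forward model: a cart whose velocity we set directly."""
    return x + a * dt

def plan_cost(x, a, target, horizon):
    """Simulate holding action `a` for `horizon` steps; sum the squared error."""
    cost = 0.0
    for _ in range(horizon):
        x = forward_model(x, a)
        cost += (x - target) ** 2
    return cost

def mpc_action(x, target, horizon=10):
    """Derivative-free optimization: grid search over constant-action plans."""
    candidates = [i / 10 for i in range(-10, 11)]  # -1.0, -0.9, ..., 1.0
    return min(candidates, key=lambda a: plan_cost(x, a, target, horizon))

def control_loop(x=0.0, target=1.0, steps=100):
    """Receding horizon: plan ahead, apply only the first action, replan."""
    for _ in range(steps):
        a = mpc_action(x, target)
        x = forward_model(x, a)  # here the "real" system equals the model
    return x
```

Calling `control_loop()` drives the cart from 0.0 to near the target of 1.0. In a real setting the plant and the model would differ, which is why the plan is recomputed at every step instead of being executed open-loop.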

gradstudent 8 years ago |

I'm not a planning guy but I work in a closely related community so I'm at least somewhat familiar with the area.

Looking at the first paper (https://arxiv.org/pdf/1707.06170.pdf), it seems surprisingly shallow and light on details. So they have a learning system for continuous planning. So what? The AI Planning community has been doing this for ages with MDPs and POMDPs, solving problems where the planning domain has some discrete variables and some continuous variables. Here's a summary tutorial from Scott Sanner at ICAPS 2012: http://icaps12.icaps-conference.org/planningschool/slides-Sa...

Speaking of ICAPS: this conference is the primary venue for disseminating scientific results to researchers in the area. Yet the authors here cite exactly one ICAPS paper. WTF?

My bullshit detector is blaring.

tnecniv 8 years ago | |

I agree. Besides (PO)MDPs, the control people also get into neural networks whenever they come in vogue.

This thesis from 2000 was the first hit for "reinforcement learning control theory" from google: http://www.cs.colostate.edu/~anderson/res/rl/matt-diss.pdf

BTW, people in related fields may work on similar things but don't always publish at the same venue -- labels matter. For example, ICRA and RSS are some of the top robotics venues and people trying to sell themselves as roboticists will prefer to publish there.

EDIT: In the second paper, they learn the model only from the images, not from the game state, which is neat. That should be highlighted more than the one sentence it was given.

gone35 8 years ago | |

Not my field either but prima facie it does seem suspiciously close to good-old hallucinated feedback-like techniques, POMDPs, etc in the planning / ML-oriented robotics community (see e.g. [1]). Didn't read too carefully though...

[1] Boots, et al. (2011) Closing the learning-planning loop with predictive state representations. http://journals.sagepub.com/doi/10.1177/0278364911404092

aqsalose 8 years ago |

The obvious caveat: this is quite far away from my field of expertise. Doubly so, because I'm not an expert in neural net ML, nor in cognitive science. So take this with a spoonful of salt. But anyhow, I don't like the word "imagine" here. It seems to suggest cognitive capabilities that their model probably does not have.

As far as I do understand the papers, their model builds (in unsupervised fashion which sounds very cool) an internal simulation of the agent's environment and runs it to evaluate different actions, so I can see why they'd call it imagination / planning, because that's the obvious inspiration for the model and so it sort of fits. But in common parlance, "imagination" [1] also means something that relatively conscious agents do, often with originality, and it does not seem that their models are yet that advanced.

I'm tempted to compare the choice of terminology to DeepDream, which is not exactly a replication of the mental states associated with human sleep, either.

[1] https://en.wikipedia.org/wiki/Imagination

jtraffic 8 years ago |

Off topic: I posted this exact article four days ago: https://news.ycombinator.com/item?id=14813807

In the past, when I post exact duplicates, HN redirects me and automatically upvotes the original instead. I wonder why this doesn't always happen. (I'm not bothered, just curious.)

Double off topic: It's very interesting to see how much difference timing makes. My original had a single upvote, and this hit the front page.

boulos 8 years ago | |

The merging is fairly narrowly windowed in time (I think ~hours not >1 day). Sometimes the mods will send you an email (if you have one stored in your account profile) and ask you to repost with a front-page bonus attached. But yeah, timing is everything :).

seanwilson 8 years ago |

I'm likely completely missing the point but how is this concept of imagination different from looking ahead in a search tree? Isn't exploring a search tree like in Chess or Go exploring future possibilities and their consequences before you decide on what to do next?

sullyj3 8 years ago | |

A search tree in something like chess is quite small, and very discrete. You can enumerate every possible action, and exploring the tree to a useful depth is computationally tractable. By contrast, for an agent operating in a complex environment, like a robot in the real world, even if you somehow came up with a coherent process for listing every possible action the robot could take, you might not even be able to store them all, let alone compute their consequences. Think about the sheer amount of information you'd need to process. Moreover, the real world is (for practical purposes) continuous. The robot would have the option of engaging one of its motors for one millisecond, or two milliseconds, or three milliseconds, etc.

This seems to be tackling the issue of what to do when there are just too many options, and the depth of exploration necessary to make useful predictions is too high, for you to just enumerate everything, heuristically prune, and pick the optimum.

seanwilson 8 years ago | | |

> Moreover, the real world is (for practical purposes) continuous. The robot would have the option of engaging one of its motors for one millisecond, or two milliseconds, or three milliseconds, etc.

Are there not techniques similar to search trees that are used here? Obviously you wouldn't enumerate all options, but you'd think you could guess at some practical ones and then guess options between the most promising. Either way, it just feels like "imagination" makes it sound like an entirely new approach, when heuristically pruned search trees could be described in the same way to me.
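The "guess a few options, then guess between the most promising" idea can be sketched as a coarse-to-fine search over one continuous action. This is only an illustration under stated assumptions: `burn_score` is a made-up, smooth, single-peaked payoff for a motor burn duration, and `refine_search` is a hypothetical helper, not anything taken from the papers.

```python
def refine_search(score, lo, hi, samples=5, rounds=12):
    """Score a few evenly spaced options, then re-sample around the best one."""
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / (samples - 1)
        options = [lo + i * step for i in range(samples)]
        best = max(options, key=score)
        # Narrow the interval to the neighbourhood of the most promising option.
        lo, hi = max(lo, best - step), min(hi, best + step)
    return best

def burn_score(duration_ms):
    """Hypothetical payoff: best when the motor runs for 137.4 ms."""
    return -(duration_ms - 137.4) ** 2

best = refine_search(burn_score, 0.0, 1000.0)
```

With these defaults the search scores 60 candidates (12 rounds of 5) instead of one per millisecond. The catch is that it relies on the payoff being reasonably smooth: with several separate peaks, narrowing around the early leader can discard the true optimum.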

PeterisP 8 years ago | |

The difference is that in chess or go, generating a search tree is trivial - predicting the world state after a couple turns of go or enumerating all the possible opponent moves in chess takes just a small bit of straightforward code encoding the (simple) rules of the game.

But how does an agent (not you) figure out a search tree of some nontrivial problem? How do you predict what the world state will be after taking some action if a programmer hasn't done that for you? Heck, even how do you predict what the world state might be after a second of doing absolutely nothing in a real-time environment? This is what this research is about.
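A toy version of that problem: the agent cannot read the rules, so it fits a forward model from observed transitions and then rolls that model forward to predict consequences. The sketch below assumes a one-dimensional, linear, noise-free environment (`true_env` is hypothetical) and swaps in ordinary least squares for the neural networks used in the papers, which learn from images.

```python
import random

def true_env(s, a):
    """Hidden dynamics the agent cannot inspect (hypothetical)."""
    return 0.9 * s + 0.5 * a

def collect(n=200, seed=0):
    """Act randomly and record (state, action, next_state) transitions."""
    rng = random.Random(seed)
    data, s = [], 0.0
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        s_next = true_env(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_linear_model(data):
    """Least squares for s' ~ w_s*s + w_a*a via the 2x2 normal equations."""
    ss = sum(s * s for s, a, y in data)
    sa = sum(s * a for s, a, y in data)
    aa = sum(a * a for s, a, y in data)
    sy = sum(s * y for s, a, y in data)
    ay = sum(a * y for s, a, y in data)
    det = ss * aa - sa * sa
    return (sy * aa - sa * ay) / det, (ss * ay - sa * sy) / det

def imagine(model, s, actions):
    """Roll the learned model forward without touching the real environment."""
    w_s, w_a = model
    trajectory = []
    for a in actions:
        s = w_s * s + w_a * a
        trajectory.append(s)
    return trajectory

model = fit_linear_model(collect())
```

Once `model` is fitted, `imagine(model, state, candidate_actions)` gives predicted future states that a planner can score, which is the part a chess programmer gets for free from the rules of the game.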

e9 8 years ago | |

I am thinking exactly the same thing. Maybe they are trying to get some hype from media?

GuiA 8 years ago |

Why do we need to explicitly design architectures such as the "imagination encoder" the article describes? A proposed long term goal of deep learning is to have AI that surpasses human cognition (e.g. DeepMind's About page touts that they are "developing programs that can learn to solve any complex problem without needing to be taught how"), which was not explicitly designed in terms of architectural components such as an "imagination encoder".

Shouldn't imagination and planning be observed spontaneously as emergent properties of a sufficiently complex neural network? Conversely, if we have to explicitly account for these properties and come up with specific designs to emulate them, how do we know that we are on the right track to beyond human levels of cognition, and not just building "one-trick networks"?

deepnet 8 years ago |

> particularly in programs like AlphaGo, which use an ‘internal model’ to analyse how actions lead to future outcomes in order to reason and plan.

I was under the impression that AlphaGo makes no plan but responds to the current board state with expert move probabilities that prune MCTS random playouts.

There is no plan (AFAIK) or strategy in the AlphaGo papers so I find this statement that AlphaGo is an imaginative planner quite curious.

Perhaps someone can reconcile these statements or correct my knowledge of AlphaGo?

Very interesting papers; it will be nice to see the imagination encoder methods applied to highly stochastic environments or indeed a robot in the real world.

ebalit 8 years ago | |

In AlphaGo, MCTS is used to explore many plans and select the best. As far as I know, it then executes only the first action of the selected plan and starts planning afresh for the next action. As such, it doesn't "stick to the plan", so you could say that it doesn't have a strategy. But the MCTS is definitely a planner.

deepnet 8 years ago | | |

Yes absolutely, I think your explication is perfectly correct.

Though (IMHO) MCTS is better characterised as evaluating moves rather than exploring plans.

The MCTS only explores the moves in order of likelihood using the most basic of heuristics, random playout.

The Net outputs likely moves based only on the current board position; it formulates no strategy.

No state is stored across moves - each play is independent, relying only on the current board position.
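For readers who have not seen move evaluation by random playout, here is a deliberately tiny sketch. It is flat Monte Carlo evaluation on a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins), not full MCTS and certainly not AlphaGo; the game, the function names, and the playout count are all assumptions for illustration.

```python
import random

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def random_playout(stones, my_turn, rng):
    """Play uniformly random moves to the end; True if we take the last stone."""
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return my_turn  # the player who just moved took the last stone
        my_turn = not my_turn
    return not my_turn  # pile was already empty: the previous mover won

def choose_move(stones, playouts=500, rng=None):
    """Score each legal move by its random-playout win rate; pick the best."""
    rng = rng or random.Random(0)

    def win_rate(move):
        # After our move it is the opponent's turn, hence my_turn=False.
        wins = sum(random_playout(stones - move, False, rng)
                   for _ in range(playouts))
        return wins / playouts

    return max(legal_moves(stones), key=win_rate)
```

`choose_move` takes only the current pile size and keeps nothing between calls, which mirrors the "each play is independent" point above. Random playouts are a weak evaluator, so on larger piles the chosen move is not guaranteed to be the game-theoretically best one.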

I still don't see anything anywhere in AlphaGo that is a plan, trajectory or strategy.

Neither is there an evaluation of the opponent nor any attempt to outwit them.

That it performs so astonishingly well without a plan is very, very interesting and should perhaps give us pause: is planning a hubris? Do we undervalue our use of heuristics in our own behaviour?

mehh 8 years ago |

Painful paper to read because of the inaccurate use of the word 'imagination'.

I'm sure the guys who wrote this are smart enough to know it's not imagination (perhaps arguably a small subset of the attributes that contribute to what we know as imagination, but not imagination itself).

Which leads me to assume this hyperbole is there purely for the benefit of PR and stock price.

mehh 8 years ago | |

Trying hard to resist saying this, but all I can think of is "infinite polygon engine" ...

ww520 8 years ago |

Evaluating different outcomes far ahead may be very computationally intensive. One thing that AlphaGo shows is that a simple approach with Monte Carlo tree search can drastically cut down the search space. The "imagine" part could be just a guided random walk ahead in planning, with something like Monte Carlo tree search.

miguelrochefort 8 years ago |

I'm confused. I thought AI was already about this.

Are they introducing something new, or is it just a gimmick and buzzwords?

landon32 8 years ago | |

Their architecture is definitely novel. What are you referring to that made it sound like the AI we already have could do this?

thinkloop 8 years ago |

> imagination is a distinctly human ability

Right...