Agents that imagine and plan (deepmind.com)
A completely unfounded supposition, as so often appears to be the case when some human monopoly is claimed. We didn't magically sprout whole new categories of ability during a measly few million years of evolution.
Anecdotally, I see crows getting out of the way of my car. Not confused and haphazardly as many birds do, but in calculated, deliberate, unhurried steps to somewhere just outside my trajectory - steps which clearly take into account such elements as my speed and the state of other traffic on the road. Furthermore, when it's season for walnuts and the like, they'll calmly drop their haul on the asphalt, expecting my tyres to crush it for them. This - in my rural bit of Northern Europe - appears to be a recent import or invention; I never saw it done until two years ago.
And there's The Case of the Dog and the Peanut Butter Jars. My dog, my peanut butter jars, and they were empty, but not cleaned. Alone at home, she found them one day, and clearly had experimented on the first one, which had bitemarks aplenty on the lid. The rest she managed to unscrew without damage. Having licked the jars clean, apparently she got to thinking of the grumpy guy who would eventually be coming home. I can think of no other explanation why I found the entire stash of licked-clean jars hidden - although not successfully - under a rug.
Tell me again about imagination and its distinctly human nature.
> When placing a glass on the edge of a table, for example, we will likely pause to consider how stable it is and whether it might fall. On the basis of that imagined consequence we might readjust the glass to prevent it from falling and breaking.
Or, if you're a cat, you might push it over the edge for the fun of it:
https://www.youtube.com/watch?v=RI1rv3re7as
In fact, the cat in this video appears to have more imagination than the paper's authors.
Well, just because you can't think of one, doesn't mean your explanation is correct, surely. This could easily be explained by an instinctual "hide food remnants to avoid attracting bigger things".
Of course imagining possible outcomes before executing is useful! And it has many uses outside deep learning. No reason to reinvent new words, really. At least without referring to the established ones.
Maybe there is a serious novel idea, but I've missed it.
Basically, if you need to control a complex process (i.e. bring some future outcome into accordance with your plan), you can build a forward model of the system under control (which is simpler than a reverse model), and employ some optimization technique (combinatorial, e.g. graph-based; numeric derivative-free, e.g. pattern search; or differential) to find the optimal current action.
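To make the forward-model-plus-optimizer idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the "system" is a hypothetical 1-D point mass, the optimizer is a crude derivative-free grid search over constant actions, and the names `forward_model`, `rollout_cost`, `plan_action` and `run_episode` are made up. It is not taken from the papers.

```python
def forward_model(state, action, dt=0.1):
    """Toy forward model (an assumption): a 1-D point mass pushed by `action`."""
    position, velocity = state
    velocity = velocity + action * dt
    position = position + velocity * dt
    return (position, velocity)


def rollout_cost(state, action, target, horizon):
    """Simulate holding `action` for `horizon` steps and sum the squared error."""
    cost = 0.0
    for _ in range(horizon):
        state = forward_model(state, action)
        # Distance from the target, plus a small penalty on control effort.
        cost += (state[0] - target) ** 2 + 0.01 * action ** 2
    return cost


def plan_action(state, target, horizon=15):
    """Derivative-free search: try a fixed grid of actions, keep the cheapest."""
    candidates = [i / 10.0 for i in range(-10, 11)]  # -1.0, -0.9, ..., 1.0
    return min(candidates, key=lambda a: rollout_cost(state, a, target, horizon))


def run_episode(steps=100, target=1.0):
    """Re-plan at every step and apply only the chosen action (receding horizon)."""
    state = (0.0, 0.0)
    for _ in range(steps):
        state = forward_model(state, plan_action(state, target))
    return state
```

Calling `run_episode()` drives the point mass from position 0 toward the target at 1. Here the planner's model and the "real" system are the same function, which is exactly the luxury a hand-built forward model gives you.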
Looking at the first paper (https://arxiv.org/pdf/1707.06170.pdf), it seems surprisingly shallow and light on details. So they have a learning system for continuous planning. So what? The AI Planning community has been doing this for ages with MDPs and POMDPs, solving problems where the planning domain has some discrete variables and some continuous variables. Here's a summary tutorial from Scott Sanner at ICAPS 2012: http://icaps12.icaps-conference.org/planningschool/slides-Sa...
Speaking of ICAPS: this conference is the primary venue for disseminating scientific results to researchers in the area. Yet the authors here cite exactly one ICAPS paper. WTF?
My bullshit detector is blaring.
This thesis from 2000 was the first hit for "reinforcement learning control theory" from google: http://www.cs.colostate.edu/~anderson/res/rl/matt-diss.pdf
BTW, people in related fields may work on similar things but don't always publish at the same venue -- labels matter. For example, ICRA and RSS are some of the top robotics venues and people trying to sell themselves as roboticists will prefer to publish there.
EDIT: In the second paper, they learn the model only from the images, not from the game state, which is neat. That should be highlighted more than the one sentence it was given.
[1] Boots, et al. (2011) Closing the learning-planning loop with predictive state representations. http://journals.sagepub.com/doi/10.1177/0278364911404092
As far as I understand the papers, their model builds (in an unsupervised fashion, which sounds very cool) an internal simulation of the agent's environment and runs it to evaluate different actions, so I can see why they'd call it imagination / planning, because that's the obvious inspiration for the model and so it sort of fits. But in common parlance, "imagination" [1] also means something that relatively conscious agents do, often with originality, and it does not seem that their models are yet that advanced.
I'm tempted to compare the choice of terminology to DeepDream, which is not exactly a replication of the mental states associated with human sleep, either.
In the past, when I post exact duplicates, HN redirects me and automatically upvotes the original instead. I wonder why this doesn't always happen. (I'm not bothered, just curious.)
Double off topic: It's very interesting to see how much difference timing makes. My original had a single upvote, and this hit the front page.
This seems to be tackling the issue of what to do when there are just too many options, and the depth of exploration necessary to make useful predictions is too high, for you to just enumerate everything, heuristically prune, and pick the optimum.
Are there not techniques similar to search trees that are used here? Obviously you wouldn't enumerate all options, but you'd think you could guess at some practical ones and then guess options between the most promising. Either way, it just feels like "imagination" is making it sound like an entirely new approach, when heuristically pruned search trees could be described in the same way to me.
But how does an agent (not you) figure out a search tree of some nontrivial problem? How do you predict what the world state will be after taking some action if a programmer hasn't done that for you? Heck, even how do you predict what the world state might be after a second of doing absolutely nothing in a real-time environment? This is what this research is about.
Shouldn't imagination and planning be observed spontaneously as emergent properties of a sufficiently complex neural network? Conversely, if we have to explicitly account for these properties and come up with specific designs to emulate them, how do we know that we are on the right track to beyond human levels of cognition, and not just building "one-trick networks"?
I was under the impression that AlphaGo makes no plan but responds to the current board state with expert move probabilities that prune the MCTS random playouts.
There is no plan (AFAIK) or strategy in the AlphaGo papers so I find this statement that AlphaGo is an imaginative planner quite curious.
Perhaps someone can reconcile these statements or correct my knowledge of AlphaGo?
Very interesting papers. It will be nice to see the imagination encoder methods applied to highly stochastic environments or indeed a robot in the real world.
Though (IMHO) MCTS is better characterised as evaluating moves rather than exploring plans.
The MCTS only explores the moves in order of likelihood using the most basic of heuristics, random playout.
The Net outputs likely moves based only on the current board position; it formulates no strategy.
No state is stored across moves - each play is independent, relying only on the current board position.
I still don't see anything anywhere in AlphaGo that is a plan, trajectory or strategy.
Neither is there an evaluation of the opponent nor any attempt to outwit them.
That it performs so astonishingly well without a plan is very very interesting and should perhaps give us pause - is planning a hubris? Do we undervalue our use of heuristics in our own behaviour?
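For readers unfamiliar with the "evaluate moves by random playout" idea described above, here is a minimal sketch on a toy game. It is not AlphaGo's implementation: it is flat Monte Carlo with no tree, no network and no exploration bonus, and the game (single-pile Nim, take 1 to 3 stones, last stone wins) and all function names are assumptions chosen for brevity.

```python
import random


def legal_moves(pile):
    """Toy game (an assumption): take 1-3 stones; taking the last stone wins."""
    return [n for n in (1, 2, 3) if n <= pile]


def random_playout(pile, rng):
    """Play uniformly random moves; return True if the player to move wins."""
    player_to_move_wins = False  # with an empty pile the player to move has lost
    mover_is_first = True
    while pile > 0:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            player_to_move_wins = mover_is_first
        mover_is_first = not mover_is_first
    return player_to_move_wins


def evaluate_moves(pile, playouts=2000, seed=0):
    """Score each legal move by the fraction of random playouts we go on to win."""
    rng = random.Random(seed)
    scores = {}
    for move in legal_moves(pile):
        remaining = pile - move
        # After our move the opponent is to move; we win whenever they lose.
        wins = sum(not random_playout(remaining, rng) for _ in range(playouts))
        scores[move] = wins / playouts
    return scores


def best_move(pile):
    """Pick the move with the highest playout score for the current position."""
    scores = evaluate_moves(pile)
    return max(scores, key=scores.get)
```

Note that `best_move` depends only on the current pile and keeps nothing between calls, which mirrors the "no state is stored across moves" point: each decision is a fresh evaluation of moves, not a step in a stored plan.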
I'm sure the guys who wrote this are smart enough to know it's not imagination (perhaps arguably a small subset of the attributes that contribute to what we know as imagination, but not imagination itself).
Which leads me to assume this hyperbole is there purely for the benefit of PR and stock price.
Are they introducing something new, or is it just gimmick and buzzwords?
Right...
MPC is a useful concept if you have a predictive model that's at least vaguely close to the actual behavior. In some contexts (e.g. modeling of particular industrial systems) programmers could build such a model, but in the general case that's absolutely not feasible: the world is full of problems where, practically speaking, you cannot manually build a forward model of the system under control.
So this article is about initial research on systems that can construct such a predictive model/imagination from experience, with a proof of concept that current deep learning approaches let us build systems that can learn such predictive models (which wasn't really possible before). Further development of this concept seems to be the way we can actually apply things like MPC to problems where we won't build a forward model ourselves; and in the long run, that means pretty much all problems.
I just want to emphasize this point as the crux here. We have many, many techniques for AI that involve doing roll-outs once a smart human with domain knowledge hands the system a fully-formed model of the dynamics. Not so many where the dynamics are learned.
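As a toy illustration of dynamics that are learned rather than handed over, the sketch below fits a forward model from logged transitions alone. It is far simpler than what the papers do: the hidden environment is an assumed scalar linear system, the "learning" is ordinary least squares, and every name (`true_environment`, `collect_experience`, `fit_linear_model`) is hypothetical.

```python
import random


def true_environment(x, u):
    """Hidden dynamics (an assumption); the learner only sees its outputs."""
    return 0.8 * x + 0.3 * u


def collect_experience(samples=200, seed=0):
    """Act randomly and log (state, action, next_state) transitions."""
    rng = random.Random(seed)
    data = []
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        data.append((x, u, true_environment(x, u)))
    return data


def fit_linear_model(data):
    """Least-squares fit of next_state ~ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b
```

`fit_linear_model(collect_experience())` recovers coefficients close to the hidden 0.8 and 0.3, and the fitted model can then be rolled out in place of a hand-written one. The hard part the research addresses is doing this when the state is an image and the dynamics are nowhere near linear.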
Once you have a model, you can invert it to make a controller, as the post above points out. For classical linear models, this can be done analytically. For non-linear models, you can use the model to train a controller, running the model with random inputs to generate a training set.
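A minimal sketch of both cases, under stated assumptions: the linear model is a made-up scalar system that can be inverted in closed form, and for the nonlinear case a nearest-neighbour lookup over randomly generated samples stands in for a trained controller (a real system would fit a function approximator to the same training set). All names and coefficients are hypothetical.

```python
import math
import random


def linear_model(x, u, a=0.9, b=0.5):
    """Assumed linear forward model: next state from state x and input u."""
    return a * x + b * u


def linear_controller(x, target, a=0.9, b=0.5):
    """Analytic inverse: solve a*x + b*u = target for u."""
    return (target - a * x) / b


def nonlinear_model(x, u):
    """Assumed nonlinear forward model with no convenient closed-form use here."""
    return 0.9 * x + math.sin(u)


def build_training_set(samples=5000, seed=0):
    """Run the model with random states and inputs; log (x, next_x, u) rows."""
    rng = random.Random(seed)
    data = []
    for _ in range(samples):
        x = rng.uniform(-2.0, 2.0)
        u = rng.uniform(-1.5, 1.5)
        data.append((x, nonlinear_model(x, u), u))
    return data


def lookup_controller(data, x, target):
    """Nearest-neighbour 'controller': reuse the input of the most similar sample."""
    nearest = min(data, key=lambda row: (row[0] - x) ** 2 + (row[1] - target) ** 2)
    return nearest[2]
```

For example, `linear_controller(2.0, 1.0)` returns the input that takes the linear model from 2.0 to 1.0 in one step, while `lookup_controller(build_training_set(), 0.5, 1.0)` returns an input that gets the nonlinear model only approximately to 1.0, with accuracy limited by how densely the random samples cover the space.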
(I spent several years working on the simulator problem, shipped a simulation product ("Falling Bodies", the first ragdoll simulator that didn't suck)[1] and eventually sold the technology to a physics engine startup and went on to other things. Even today, as Sony and Boston Dynamics have demonstrated at great time and expense, there's no market for legged robots yet.)
'Imagination' isn't even a good word for it---in conversational English, we often use the word for thinking about models of fictitious states that can't happen, which has subtle value for humans, but not yet for machines.
I don't know if learned models are novel, but they certainly aren't vanilla MPC. (In my quick scan of them, only the second paper mentions learning models.)
Originality seems not to be the boundary, since even this simple model seems to imagine world states that they never saw, never will see, and possibly even aren't possible in their environment, i.e. they are "original" in some sense.
If I look at the common understanding of "imagination" and myself, what can I imagine? I can imagine 'what-if' scenarios of my future, e.g. what could be the outcome if I do this or that, or if something particular happens; I can imagine scenarios of my past, i.e., "replay" memories; I can imagine counterfactual scenarios that never happened and never will; I can imagine various senses - i.e. how a particular melody (which I'm "constructing" right now, iteratively, with the help of this imagination to guide my iterations) might sound when played in a band, or how something I'm drawing might look like when it's completed - all of this seems different variations on essentially the same thing, which is an internal simulation (model) generating data about various hypothetical states.
This might be used to evaluate different actions, but it might also be used to simply experience these states (i.e. daydream) or do something else - that's more of a question on how the agent would want to use the "imagination module", not a particular property of the imagination/internal simulation model itself.
But consider that many would object if I stated jumping spiders have an active imagination. Yet, this is not far fetched if you accept that imagination includes planning against a learned model.
Insects are known to be capable of learning. They need to be able to remember routes or learn which locations to prefer or avoid. Jumping spiders are known for their ability to carry out and hold complex plans in their head. Though a form of imagination, most would hesitate to call it that.
(1) http://science.sciencemag.org/content/355/6327/833, https://youtu.be/exsrX6qsKkA?t=44s
There are also a lot of indications that ultimately you need some tricks (i.e. specialized portions of the architecture that bias the kinds of solutions the AI can learn) to be able to learn effectively in the environments we're interested in. For example, we know that there is a time dimension to agent tasks, and that objects don't pop in and out of existence, they tend to exist continuously. These are biases we are free to add to a learning system without worrying about it limiting the ultimate intelligence of the system.
In the limit, the No Free Lunch theorems indicate that there's no such thing as a general learning system that doesn't sacrifice performance on some kinds of tasks. The goal of AI research is to sacrifice performance on tasks that we'll never encounter in favor of getting good performance on tasks we care about.
That is precisely the core of my interrogation. The papers mentioned in the article seem to be about "hand designing" the weird tricks; shouldn't the goal be to build a system that enables the emergence of these weird tricks without involving human design?
No! There's never been any scientific guarantee that "sufficiently complex" neural networks will give rise to anything in specific as an "emergent property", let alone human cognitive abilities like imagination and planning.
>how do we know that we are on the right track to beyond human levels of cognition, and not just building "one-trick networks"?
Steps to write a deep learning paper (from the Cynic's Guide to Artificial Intelligence):
1) Use a training set orders of magnitude larger than a human could learn from, build a one-trick network that gets superhuman performance on its one trick of a task.
2) Hype it up.
3) Research funding and/or profit and/or world domination!
(World domination has never been supplied when requested.)
Not necessarily. I think it comes down to what you mean by "sufficiently complex". If we took a classic feedforward Multi-Layer Perceptron and gave it massive amounts of good data, a long time to train, and a nearly unbounded network size, I'm not sure it would ever develop architecture within itself to plan or develop a robust internal model.
Our neurology took millions of years/generations to get where it is today through natural selection. We might want to tip the scales a bit by engineering the broad architectural pieces and letting emergent behavior fill the gaps.
Although it would be fun to try producing human level intelligence by seeding a physics simulation of primordial soup and letting it run for millions of "years", I don't think that's feasible for most researchers.
And what would be the seed for the random number generator?
Why would you think that? We have no theoretical knowledge of how human "consciousness" emerges, and obviously no experimental data either.
>Conversely, if we have to explicitly account for these properties and come up with specific designs to emulate them, how do we know that we are on the right track to beyond human levels of cognition, and not just building "one-trick networks"?
We don't know how far the path leads, but the capabilities from the past 5-ish years of progress are leaps and bounds beyond what anything else can do, and this is part of the work of pushing further down that path.
But besides some models being useful, as that old adage says, some are also useful-er than others. Adding stuff such as "imagination" functions as a constraint on the number of behavioral patterns that we are willing to consider, and that might lead us to find that one which we're looking for (i.e. "intelligent behavior") faster than a naïve approach.
Besides, it might not be the case that the likelihood of observing "intelligent behavior" increases over the complexity of the behavior generating process.
It sounds like the second paper is the more interesting one from your description though, so I will give that a read.
It depends on your goals - if your goal is to build a system that can perform smart actions (e.g. build/simulate something comparable to a brain), then that's not required (it may happen to be useful, or not); if your goal is to build a system that can create and build systems that can perform smart actions (e.g. build/simulate something comparable to the evolution process of an intelligent species) then it should.
Two comments:
1. Just because evolution came up with them for humans, doesn't mean if we run an evolutionary algorithm we'll come up with an intelligent system in any reasonable amount of time. There's no reason to believe it's easy to evolve such systems given that we only know of one human-level intelligence in the universe, and it seems to have taken billions of years to come about.
2. This is unnecessarily tying our hands. Evolution often builds very inefficient, overly complicated versions of things that can be simplified dramatically once humans understand the underlying principles behind why they work. In addition we have a huge body of theoretical work on planning, decision theory etc that improves dramatically on our natural learning processes that we can take advantage of. We get no points for not "cheating" here.
A bootstrap intelligence in order to self-plan, self-experiment, and self-modify. Escher hands drawing each other basically... or
Similar conditions to our only known spontaneous intelligence (us). That includes some sort of base code (genetics), competitive environments for rewarding good architectures, and lots of time in simulation. No guarantee this would work either.
Well, some of ours are. At least a few. It's not clear that any of the dog's actions are consciously planned, is it?
A search tree is an approximation of a continuous search problem and needs to be built by someone. This approach builds its own search tree.
That is solving problems creatively.