Teaching Robots to Understand Semantic Concepts (research.googleblog.com)
I don't want to discount the value of this research. It's absolutely necessary to do this sort of basic proof-of-concept testing of these ideas. But the claim being made implicitly here is way beyond what's actually going on. The software understands nothing, and the "semantics" extend to simple image-matching of objects, but there's no deeper meaning associated with the labels, so I think calling that "semantics" is a major stretch.
This approach is not going to teach a robot how to pick fruit, or serve food, or clean floors anytime soon. In the best case where this is even a workable approach, research like this is just the first of millions more tiny steps along the path. Anyway I think it's naive to assume that a good way to approach automation is to write software to let robots learn by watching humans do the desired task. As cool as that sounds, chances are that approach would ultimately be a massively inefficient way to solve the problem. It'd be like trying to invent the automobile by building a steam-powered horse robot that can tow carriages. The critical purpose is being overlooked in favor of a cool-looking but totally impractical toy demo.
Google is still using what we could call a very rudimentary form of AI. As they describe it: "Unsupervised learning on very small datasets is one of the most challenging scenarios in machine learning. To make this feasible, we use deep visual features from a large network trained for image recognition on ImageNet".
For example:
1. 'grab that red ball'
2. 'turn the handle on the door 90 degrees, then pull it out'
Just as you would teach a person: provide a video, a written instruction, and a label indicating whether the task succeeded. Then show a different setting with a new instruction; if the model has generalized and understood the semantics behind it, it should carry out the instruction successfully.
So yeah, maybe it's not anywhere near human intelligence. But it's still cool they've made a robot smarter than my dog.
EDIT: To provide a bit more information. I provide a video because I think it is much more impactful to see what happens than to read it. In this video you can see how an appropriate history of reinforcement will lead to very complex behavior in simple animals. By complex I mean behavior like "talking" and "problem solving".
Here is the 2nd part: https://www.youtube.com/watch?v=erhmslcHvaw
What I can't tell at all from this article is whether that day is years or decades away.
Semantics part: Seems like the idea is we can "transfer" knowledge from prior labeled samples so that we don't need to do as much new work labeling sample images with semantic labels.
Grasping part: "Emulating human movements with self-supervision and imitation." High-level imitation based on visual frame differences avoids needing to manually control actuators. Not sure how this works exactly.
Two-stream model: the ventral network asks "What class is this?" and the dorsal network asks "Is this how we should grasp this object?" The benefit is that we can make use of all the automatically generated (robot-generated) grasping data without having a human supervise every grasp with annotations such as "This is a successful way to pick up this object, and this object is an apple." The ventral network ties the grasping data (which has no object labels) back to object labels, which allows for semantic control of the trained robot, e.g. "Pick up that apple".
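To make the two-stream idea concrete, here is a minimal sketch of how I imagine the two outputs could be combined. This is my own assumption, not something the article confirms: `GraspCandidate`, `pick_grasp`, and the rule of multiplying the two scores are all hypothetical, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class GraspCandidate:
    name: str
    grasp_success: float  # dorsal stream: estimated P(grasp succeeds), hypothetical value
    class_probs: dict     # ventral stream: estimated P(class | image), hypothetical values


def pick_grasp(candidates, target_class):
    """Return the candidate with the highest P(success) * P(target class).

    Multiplying the two scores is an assumption made for this sketch.
    """
    def score(candidate):
        return candidate.grasp_success * candidate.class_probs.get(target_class, 0.0)

    return max(candidates, key=score)


# Made-up candidates: grasp_a is an easy grasp but probably a banana,
# grasp_b is a harder grasp but probably an apple.
candidates = [
    GraspCandidate("grasp_a", 0.9, {"apple": 0.1, "banana": 0.9}),
    GraspCandidate("grasp_b", 0.7, {"apple": 0.8, "banana": 0.2}),
]

best = pick_grasp(candidates, "apple")
print(best.name)
```

The point is just that the grasp-success score can be learned from unlabeled robot attempts, while the class score only needs a comparatively small amount of labeled data.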
My personal opinion, having worked a little on this problem, is that it's very much like autonomous driving. Getting 90% of the job done is fairly straightforward, but getting that final 10% to make a system commercially viable will take years. Commercial growers don't (yet) have any pressure on labour - it's too readily available and too cheap.
http://www.agrobot.com/ solves this in quite a neat way. Rather than grasping the fruit, they just scoop up each berry in a cup with a blade on one side which severs the stem. This means you don't care about the precise shape of the berry either.
Yes, it's both. Depends on the task.
Afaik we can already program industrial robots by showing them what to do. Robot records movement in its actuators, then keeps replaying over and over.
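Roughly, record-and-replay programming looks like the sketch below. Everything here is hypothetical (the `WaypointRecorder` class, the `move_to` callback, the joint-angle tuples); real controllers have their own interfaces, this only shows the idea.

```python
class WaypointRecorder:
    """Stores joint positions while a human guides the arm, then replays them."""

    def __init__(self):
        self.waypoints = []

    def record(self, joint_angles):
        # Store an immutable snapshot of the actuator positions.
        self.waypoints.append(tuple(joint_angles))

    def replay(self, move_to, cycles=1):
        # move_to is a hypothetical callback that drives the arm to one waypoint.
        for _ in range(cycles):
            for waypoint in self.waypoints:
                move_to(waypoint)


recorder = WaypointRecorder()
recorder.record([0.0, 45.0, 90.0])   # made-up joint angles in degrees
recorder.record([10.0, 30.0, 80.0])

executed = []                         # stand-in for a real robot controller
recorder.replay(executed.append, cycles=2)
print(len(executed))
```

Note there is no perception anywhere in this loop, which is why it only works when the parts show up in exactly the same place every time.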
And we already have machine learning agents that can observe your behavior and learn which news stories are "good" news stories and which aren't. (algo newsfeeds). You could use them to observe a magazine editor and after a few issues, you'd have a robo editor.
It really, really depends on the job. And in a job, maybe some subtasks can be automated.
Perhaps the picking cannot be automated, but the transportation can. Or instead of computer vision, have a person click the fruit on a screen so the robot knows where to pick.