Why Deep Learning surprises me

Why Deep Learning surprises me (thevivekpandey.github.io)

69 points by thevivekpandey 8 years ago | 80 comments

Most comments here are to the tune of "Well DL is just a bunch of correlations and statistics, it's not really understanding anything"

Ok, well I can also say "humans are just a bunch of chemical reactions and electrical signals."

The beauty of DL is in its simplicity, and really we're at the very starting point of seeing it work with extremely sparse networks (compared to biological intelligence). The fact that it works so well with such limited data in narrow domains should be energizing.

candiodari 8 years ago | |

I really do want to ask the author that question, given that he focuses so much on the "weird" idea that everything turns out to be just numbers moving around over time.

"What do you suppose the input to the human brain looks like?"

Since I have kids I have come to realize that the same thing you see in neural nets you see in human beings. Understanding exists, but it is mostly not how human beings respond to the world around them. Mostly we are a minimally generalized dictionary: we know a long, long, very damn long list of "tricks". If A happens, B will follow. There's very little along the lines of "objects fall along a parabolic trajectory".

This leads to generalization errors, and the surprising thing is you see those in humans! Kids who have learned to open one type of door do not know how to deal with an (even very slightly) jammed doorknob, they don't recognize differently shaped doorknobs as doorknobs, etc. For the first few days they don't even realize that if pushing won't work, pulling might. So the understanding of opening doors really does start out on the level of "move the free end of the small cylindrical object in the middle of the door that's parallel to the floor down, and then push", and if any of those conditions fails, well, the door's going to stay shut.

And this is exactly the very hard problem you encounter with neural nets: finding the right balance between specificity and generalization. But one saving grace is that if you specialize in enough special cases, you can get around without having a general understanding, and that's exactly what's happening with kids.

AndrewKemendo 8 years ago | | |

Yes, my advice to any aspiring AI researcher is to have a kid and approach them much as Piaget did.

OtterCoder 8 years ago | |

Enervating? I find it the opposite. It's exciting and energizing to think of what we can do with this.

AndrewKemendo 8 years ago | | |

Gah thanks for the catch, gotta love autocorrect.

carapace 8 years ago | |

Babbage is said to have owned a dancing automaton he called "The Silver Lady" that was delightfully lifelike in its movements. I wouldn't say that such a device "understood" dance, no matter how perfectly it moved.

kthejoker2 8 years ago | | |

Given today's technology and sufficient time, you could devise an AI that could watch dance videos, "understand" dance, and create its own Silver Lady.

AndrewKemendo 8 years ago | | |

You're arguing a strawman. I never claimed that understanding was based on a phenomenological evaluation of an output. Rather, reductionism is not an argument against complexity.

jeremynixon 8 years ago |

The mysticism around ‘Emergence’ is just a modeling error where people only abstract in one way (say, down to cells) and don’t include something important like the interaction between cells in their reductionist model of the system. It’s like creating a graph without the edges. And so when those effects have manifest consequences at a higher level, it feels like they appeared as if by magic.

ktRolster 8 years ago | |

It's kind of like saying, "This river is not the same river that it was upstream, and yet it is. The river is not the same water of last year, and yet it is the same river."

The phenomenon is entirely well understood by all involved, and yet coming up with a reasonable definition is hard. http://existentialcomics.com/comic/164 So it's easier to be mystical.

bitL 8 years ago |

I think the author is stretching arguments here a bit. DL is just partitioning space according to some pre-baked associations given to it during training; in this case it's more like a non-linear optimization where we want to end up with N-million-dimensional objects of a certain shape, obtained by optimizing some objective function that allows predicting similar associations. It doesn't have much to do with the actual innate quality of understanding. Maybe reinforcement learning together with deep learning (DRL) can move us towards such a quality, at least in a mechanical sense.

mathgenius 8 years ago |

I don't see why "understanding" is equivalent to mere pattern recognition. Even using this word "recognition", what does that mean? It's another word like "understand". These algorithms are just pattern patterning. They don't even know they are patterning, that is a meta-property assigned in (or by) a context.

ffwd 8 years ago | |

I agree, but I think human knowledge can be represented as either a graph, hierarchy or network of patterns. Like my knowledge of the letter 'A' is a network of connections to patterns 'language', 'english', 'alphabet', whatever else, and if the computer can do the same, it can use that knowledge of that network (as a whole separate entity) to make a decision, so to speak.

Consciousness does come into it since we have a pretty visceral sense of it, and especially when we mentally trawl through our patterns to make some story, but really understanding should just be creating new patterns from existing patterns and the ability to utilize them as distinct entities in some way (rather than being emergent in the system implicitly and only being utilized by accident, say as emergent behavior randomly occurring because of local constraints)

AndrewKemendo 8 years ago | |

> I don't see why "understanding" is equivalent to mere pattern recognition.

You're underestimating what goes into high accuracy pattern recognition as well as assuming that patterns exist for only one vector and in a single context.

If I asked you to explain how you "understand" some concept, it will inevitably be how the structure and mechanics of it relate to others and in what context. All of those are simply patterns that are abstracted or made more granular.

For example, how do you "understand" what a car is? You would inevitably describe some definition of a car mechanically and the context in which a car operates. So it's a contained combination of metal and plastic objects and usually liquids with a mechanism to transfer power through gearing and wheels, a compartment for humans, some control mechanisms etc... (definition of the technical), but it can't operate in water (boat) or in the air (airplane).

Each of these things is learned through exposure over time, and recognized as connected, to come up with an "understanding" of a car even before it's formally defined. This is why children ask if cars can fly or go in the water.

macleodnine 8 years ago | | |

I wholeheartedly agree with this. I think your first paragraph sums up what the majority of people are missing.

iamleppert 8 years ago |

He's making the age-old mistake of conflating the mapping of inputs to outputs with intelligence.

Intelligence is not defined by the ability to recognize letters. Or play a game of Go.

Deep learning is a powerful tool for creating systems that have an ability to map inputs to outputs with very noisy, non-linear or complex data.

The mapping itself may be complex, but it's not going about solving problems like a person would. It has no idea what letters are, and how they fit into its world. It has no concept of self, cannot contemplate its own existence -- and perhaps most important of all, has no free will.

The moment we have some kind of deep learning or AI that has free will and can express interest in something other than what it has been trained on, I would say we are closer to unraveling the mystery of consciousness and human intellect.

Even babies and animals exhibit many forms of free will, decision making, and novel behavior that cannot be explained with our current observations of rote deep learning techniques.

deafcalculus 8 years ago |

Consciousness is likely just a whole bunch of computation.

I suspect "What is consciousness?" will go the way of "What is life?". We more or less understand the things that make up a bacterium. Those components aren't alive although the bacterium is. So, it's just a matter of definition.

RivieraKid 8 years ago | |

Couldn't disagree more.

Consciousness is misunderstood by a surprisingly large number of smart people. The common view is that there's science and that's it, when actually, science just describes the patterns of what we observe via consciousness, which is in a way above science.

Regarding "what is life?", that's fundamentally different. Life can have fairly concrete definitions. Basically, it's physical matter with specific properties, that's it. With consciousness, it's much more complicated: defining, say, the feeling of pain as physical matter with specific properties doesn't make much sense. "Pain is when these neurons are charged."

Also, what is a computation? A falling rock does perform a computation of a physical process. Any physical system can be said to perform a computation - or even a myriad of different computations, depending on how the physical state is interpreted.

mbrock 8 years ago | |

What do you mean by computation? What's an example of something that isn't computation?

deafcalculus 8 years ago | | |

In this context, I intended it to mean a combination of addition, multiplication, and a small set of relatively simple non-linear functions.
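
To make that definition concrete, here is a minimal sketch of a two-layer network built from nothing but additions, multiplications, and one simple non-linear function (ReLU). The weights are hypothetical and hand-picked for illustration, not trained; they happen to make the network compute the absolute difference of its two inputs.

```python
def relu(values):
    # The "relatively simple non-linear function": clamp negatives to zero.
    return [max(0.0, v) for v in values]


def affine(weights, bias, inputs):
    # Each output is a sum of products (multiplication + addition) plus a bias.
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def forward(inputs):
    # Hypothetical hand-picked weights, chosen only for illustration.
    w1 = [[1.0, -1.0], [-1.0, 1.0]]
    b1 = [0.0, 0.0]
    w2 = [[1.0, 1.0]]
    b2 = [0.0]
    hidden = relu(affine(w1, b1, inputs))  # relu(x0 - x1), relu(x1 - x0)
    return affine(w2, b2, hidden)[0]       # their sum equals |x0 - x1|


print(forward([3.0, 1.0]))  # 2.0
print(forward([1.0, 3.0]))  # 2.0
```

Nothing here goes beyond the three ingredients named above; a deep network differs mainly in how many such layers it stacks and in how the weights are found.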

komaromy 8 years ago |

> Computers understand things as well as us, perhaps better.

If this was limited to chess, I would unquestionably agree.

If it was limited to image recognition, I would tentatively agree, although things like [0] make me cautious (admittedly, that was from March, and I'm not familiar with progress since then).

However, the author seems to be generalizing beyond those two domains, to the limits of human understanding. That seems like a couple-orders-of-magnitude leap too far to me. For example, I don't know of any autonomous system capable of understanding a short novel with simple language and writing a one-page summary of it, as might be expected of a human ten-year-old.

[0] https://twitter.com/Meaningness/status/846478348947668992

bitL 8 years ago | |

To what extent are the problems with those universal perturbations due to insufficient image augmentation? Or due to a deficient optimizer used while training CNNs (all optimizers are just heuristics with nasty failure cases)? Could we train a GAN-like DNN on those perturbations to make their effect disappear?

amelius 8 years ago | |

Perhaps your reference [0] would work on the human brain too, if only we could know all the weights assigned to all neurons/axons of the given human this should apply to :)

statusgraph 8 years ago | | |

Interestingly, you can treat the NN as a black box (ie, not look at individual weights or even the architecture) and still derive adversarial cases:

https://arxiv.org/abs/1602.02697
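
As a rough illustration of the black-box idea, the sketch below attacks a toy classifier using only its predictions. It is a simple random search, not the method of the linked paper, and every name in it (`make_black_box`, `query_only_attack`, the hidden weights) is a hypothetical stand-in. The attacker never reads the weights; it only calls `predict`.

```python
import math
import random


def make_black_box():
    # Hypothetical hidden linear classifier. The weights live in a closure,
    # so the attacker below can only observe predicted labels.
    w = [2.0, -1.0, 0.5]
    b = 0.1

    def predict(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if score > 0 else 0

    return predict


def random_unit_vector(rng, dim):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def query_only_attack(predict, x, rng, tries=200, step=0.05, max_scale=5.0):
    # Walk outward along random directions until the label flips,
    # keeping the smallest perturbation size found so far.
    original = predict(x)
    best = None  # (perturbation size, perturbed input)
    for _ in range(tries):
        direction = random_unit_vector(rng, len(x))
        scale = step
        while scale <= max_scale and (best is None or scale < best[0]):
            candidate = [xi + scale * di for xi, di in zip(x, direction)]
            if predict(candidate) != original:
                best = (scale, candidate)
                break
            scale += step
    return best


predict = make_black_box()
x = [1.0, 0.5, 0.2]
result = query_only_attack(predict, x, random.Random(0))
if result is not None:
    size, adversarial = result
    print(predict(x), predict(adversarial), round(size, 2))
```

The search is crude and query-hungry, but it shows that label access alone is enough to find an input that flips the decision.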

komaromy 8 years ago | | |

Could be! We'll need a volunteer comfortable with having their neuronal weights experimented on.

jcoffland 8 years ago |

> Now I find it hard to hold on to the belief that I understand what is "A" and what is "B", while computer can only compute.

Humans being surprised by the computer should not be the yardstick for AI. A trained neural net can recognize the letter "A" and differentiate it from things that are not "A" but it does not know that "A" is part of the Latin alphabet and that there are other alphabets that form written human languages.

The day the computer spontaneously invents a new and usable alphabet without having been specifically designed to do so is the day I will concede we have hard AI. We have a long way to go. Until then it's just a bunch of hotdog/not hotdog classifiers.

mannykannot 8 years ago |

I have always believed that understanding is an emergent property of physical processes that could be modeled computationally, but I do not think deep learning has yet demonstrated that it has achieved it. Some of the evidence comes from the ways it fails, such as 'recognizing' images that humans would understand are not what the systems think they are, and being confident in decisions that make no sense. These situations occur precisely because of a lack of understanding. I am open to the possibility that deep learning alone might achieve understanding, but I think it is more likely to succumb to the law of diminishing returns before it gets there.

inventtheday 8 years ago |

Actually, computers are conscious as well. Consciousness is simply a system of information that operates on a continuous sense/plan/act loop. You could argue that they are "less" conscious, but to say that they are unconscious is to make the same mistake as people have made for years by saying that computers cannot "understand" anything.

Some people push back on this by saying computers have no sense of self. That's not true. Most computers do have internal state representations about themselves. Take a driverless car for example. When it does localization, it's constantly referencing its own shape and speed and comparing it to the environment. That's a sense of self.

Whatever philosophical barriers we place between ourselves and machines (and animals/nature for that matter), one thing is for certain: they will eventually be debunked.

westoncb 8 years ago | |

This is what's often referred to as the easy problem of consciousness—there is also a 'hard problem' (https://en.wikipedia.org/wiki/Hard_problem_of_consciousness). The tricky thing is people often just use 'consciousness' to refer to either, so most discussions of the subject are talking about completely different things.

inventtheday 8 years ago | | |

That's interesting. Thanks for sharing - I'd never come across that term before. Lack of a common definition for consciousness definitely plagues the discussions around it.

In this case though, it is my opinion that there is no definite distinction between the two "types" of consciousness you are referring to. In my opinion, all consciousness exists on one vast spectrum. The distinctions between types are just constructs of human thought that were erected to preserve our sense of self and specialness as people.

AndrewOMartin 8 years ago |

Searle's Chinese Room Argument was specifically aimed at people claiming an algorithm could understand something because of its behaviour.

It applies to Deep Learning as much as it does to Schank and Abelson's script understanding system.

inventtheday 8 years ago | |

The Chinese Room Argument is deeply flawed because it assumes that language translation in humans is a conscious phenomenon. In fact, if you're proficient in a foreign language, you can relate to the fact that for the most part translation happens in the black box of the subconscious mind. The words "bubble out" naturally. The black box of the subconscious mind is no different than the black box of the Chinese room. "Understanding" in the traditional sense is absent from both processes.

nightski 8 years ago | |

If a human can translate perfectly without understanding the conversation, then that to me implies that the mind itself gives no innate intelligence similar to the computer. It must be taught the meaning of things, exactly as a computer would need to be. I'm just not following his logic, it feels like a straw man. Of course the computer doesn't understand the meaning of the symbols it is translating, because it was never given data to teach it that (similar to a human in the scenario).

tomxor 8 years ago |

Perhaps I'm arguing semantics and this is what the author means but... in your primitive mind, you are able to recognise something even if you have no idea what it is; you can learn to recognise.

The ability to introspect and analyse what makes that thing unique, or understand what its purpose or origin is, has everything to do with being sentient.

We might not know what exactly being sentient is but recognising an image is like lobotomising the brain to just be a visual cortex, it can match but the other networks that work in the abstract are not there.

freech 8 years ago |

http://lesswrong.com/lw/iv/the_futility_of_emergence/

kumartanmay 8 years ago |

Isn't humans' greatest power the ability to think and imagine? Even animals are conscious and understand their surroundings.

dna_polymerase 8 years ago |

> Given enough examples, computers can understand what is letter "A" and what is letter "B".

Meh.

Given enough examples, computers now can distinguish letter A and B but distinguishing is not understanding. You could argue that after learning the Network just uses an instruction set and from the outside that may leave the impression of understanding but it really does not. Isn't that basically the Chinese room thing?

hyperbovine 8 years ago | |

In fact recent research indicates that you can randomly relabel the training examples and the network still achieves zero training error (https://arxiv.org/abs/1611.03530). So it is not "understanding" anything intrinsic or fundamental about the letter "A". Rather, it's just storing training examples somewhere inside of its millions of parameters, which sounds a lot less impressive.

jimfleming 8 years ago | | |

That is not a conclusion that can be drawn from the findings in the paper. While the models they evaluate can achieve zero training error on random labels, the test error is obviously not zero: it doesn't generalize at all. However, training on real labels often finds solutions which can generalize quite well.

A better way to summarize the central question of this paper would be: "Why is it that a large-parameter model trained with gradient descent on real data _could_ just memorize all of the training data (it has the capacity) yet finds solutions which generalize well to an unseen test set?"

To say that deep learning is _just_ memorizing its training data would be incorrect. We have empirical evidence to the contrary and this paper is part of that evidence.
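
The distinction between fitting the training set and generalizing can be sketched with a toy experiment. This is not the paper's setup: it swaps the neural network for a 1-nearest-neighbour memorizer (a model that explicitly stores its training examples) and uses made-up 2-D points with a hypothetical labelling rule. It only illustrates why zero training error on random labels says nothing about test error.

```python
import random


def nearest_label(train, point):
    # 1-nearest-neighbour: return the label of the closest stored example.
    closest = min(
        train,
        key=lambda ex: (ex[0][0] - point[0]) ** 2 + (ex[0][1] - point[1]) ** 2,
    )
    return closest[1]


def accuracy(train, data):
    hits = sum(1 for x, y in data if nearest_label(train, x) == y)
    return hits / len(data)


def true_label(point):
    # Hypothetical "real" rule: which side of the line x0 + x1 = 0.
    return 1 if point[0] + point[1] > 0 else 0


def run_experiment(seed=0, n_train=300, n_test=300):
    rng = random.Random(seed)
    train_x = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_train)]
    test_x = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_test)]
    test = [(p, true_label(p)) for p in test_x]

    real_train = [(p, true_label(p)) for p in train_x]
    # Same inputs, but labels drawn at random: no relationship left to learn.
    random_train = [(p, rng.randint(0, 1)) for p in train_x]

    return {
        "real_train_acc": accuracy(real_train, real_train),
        "real_test_acc": accuracy(real_train, test),
        "random_train_acc": accuracy(random_train, random_train),
        "random_test_acc": accuracy(random_train, test),
    }


for name, value in run_experiment().items():
    print(name, round(value, 3))
```

Both runs reach perfect training accuracy, since every stored point is its own nearest neighbour, but only the run with real labels should do much better than chance on held-out points.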

nonbel 8 years ago | | |

They say some weird stuff in this paper:

>"Specifically, we take a candidate architecture and train it both on the true data and on a copy of the data in which the true labels were replaced by random labels. In the second case, there is no longer any relationship between the instances and the class labels. As a result, learning is impossible."

This is like saying learning someone's phone number is impossible because there is no relationship between the person and the number.

nonbel 8 years ago | | |

>'So it is not "understanding" anything intrinsic or fundamental about the letter "A".'

What is there to understand? As far as I know the shapes we use for letters are arbitrary (at least at this point).

singham 8 years ago |

Daniel Dennett has been saying this for quite a while.