Resurgence of Neural Networks

Resurgence of Neural Networks (tjake.github.com)

262 points by marmalade 13 years ago | 57 comments

moron4hire 13 years ago |

Really interesting stuff.

I had once attempted to build a genetic algorithm for manipulating the synapse weights, specifically because of the problems of traditional back-propagation falling into local minima (unfortunately, some serious shit at work made it drop by the wayside). This RBM approach sounds better than back-propagation, but it also sounds like it would be prone to runaway feedback.

One of the performance problems with neural networks is that the number of cores on a typical machine are far less than the number of input and intermediate nodes in the network. The output nodes are less of a concern as you're trying to distill a lot of data down to a little data, but there is no reason to treat them differently. There are (very few) examples of NNs on GPUs, so that helps, but I've recently been curious to try a different, more hardware-driven approach, just because one could.

Texas Instruments has a cheap microcontroller that you'ns are probably familiar with called the MSP430. It's pretty easy to use, the tool chain is free and fairly easy to set up (especially for a bunch of professional software devs like us, right? right? Well, there's an Arduino-like tool now, too, if not), costs around 10 cents in bulk for the simplest version, requires very few external parts to run (something like a power source, 1 cap, and two resistors), and it has a couple of serial communication protocols built in. I'm quite fond of the chip; I've used it to build a number of digital synthesizers and toys.

For about $50 and quite a bit of soldering time, you could build a grid of 100 of these, each running at 16 MHz, and I bet with a clever design you could make them self-programmable, i.e. propagate the program EEPROM over the grid. Load up a simple neural network program, maybe even having each chip simulate more than one node, and interface it with a PC to pump data in one end and draw it out the other. It might not be more useful than the GPGPU approach, but having physical hardware to play with and visualize node activity through other hardware would be a lot of fun.

stiff 13 years ago | |

There are many ways to avoid this, for example have a look at:

http://en.wikipedia.org/wiki/Rprop http://en.wikipedia.org/wiki/Conjugate_gradient_method

In traditional (I am not talking about the "deep" stuff) neural networks, optimization is hardly ever the problem, though; most often under- or overfitting is the issue that produces poor performance.

> One of the performance problems with neural networks is that the number of cores on a typical machine are far less than the number of input and intermediate nodes in the network.

This sounds odd. Certainly doing neural networks in hardware is interesting, but it sounds a bit like an imagined problem: when multiplying 20 numbers, one does not complain that the number of cores is less than 20. And most often it is the training of the network that is resource-intensive, not the actual running of it.

mierle 13 years ago | | |

Both RProp and Conjugate Gradient are descent methods: they find a path to a local minimum from the starting configuration. They do not help with finding the global minimum.

Using simulated annealing to guide random initializations can help find a better minimum, but getting the global minimum with simulated annealing takes an inordinate amount of time.
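
As a rough illustration of the simulated annealing idea above, here is a minimal sketch on a toy one-dimensional loss. The `bumpy` function, the linear cooling schedule, and every parameter value are assumptions made for illustration; they do not come from the thread. The sketch only tracks the best point it has visited and gives no guarantee of reaching the global minimum.

```python
import math
import random

def bumpy(x):
    """Toy 1-D loss with many local minima (hypothetical, for illustration only)."""
    return x * x + 10.0 * math.sin(3.0 * x)

def anneal(loss, x0, steps=5000, t0=5.0, seed=0):
    """Simulated annealing with Gaussian proposals and a linear cooling schedule."""
    rng = random.Random(seed)
    x, fx = x0, loss(x0)
    best_x, best_f = x, fx
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9  # temperature falls towards zero
        cand = x + rng.gauss(0.0, 1.0)
        fc = loss(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if fc <= fx or rng.random() < math.exp((fx - fc) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

start = 4.0
best_x, best_f = anneal(bumpy, start)
print(best_x, best_f)
```

Early on, the high temperature lets the search climb out of shallow basins; as the temperature falls it behaves more and more like plain descent, which is why long schedules are needed to have a good chance of reaching the global minimum.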

PaulHoule 13 years ago | | |

Well, if you've got 20 numbers to multiply you'll get the job done fastest if you do them in parallel with 20 dedicated multipliers.

There's an obvious vision of building a "neural circuit" where there is some specialized processor for each neuron but my guess is that it gets difficult when you consider the communication fabric required between the layers.

Retric 13 years ago | |

The problem with stuff like this is that 16 MHz * 100 is far less than 3500 MHz * 4. So doing this in software on the desktop is generally a much better idea, let alone 750 MHz * 1000+ if you can get a good GPU implementation. You also hit significant speed-of-light and bandwidth issues if you want to network a lot of these together, because neurons don't just talk to their 4 closest friends.

PS: Still a fun project, just harder than you might think to scale.
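
The clock-rate comparison in this comment can be worked through directly. This is only the back-of-the-envelope arithmetic from the comment: aggregate clock rate is a crude proxy that ignores word size, instructions per cycle, and communication cost, and the variable names are made up for illustration.

```python
# Aggregate clock rate in MHz, using the figures quoted in the comment.
grid_mhz = 16 * 100       # 100 microcontrollers at 16 MHz
desktop_mhz = 3500 * 4    # quad-core desktop CPU at 3.5 GHz
gpu_mhz = 750 * 1000      # roughly 1000 GPU cores at 750 MHz

print(grid_mhz, desktop_mhz, gpu_mhz)  # 1600 14000 750000
print(desktop_mhz / grid_mhz)          # 8.75
print(gpu_mhz / grid_mhz)              # 468.75
```

By this measure alone the desktop has almost 9 times the raw cycles of the grid and the GPU several hundred times.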

moron4hire 13 years ago | | |

It's no doubt difficult to scale, but there is no hobby-scale project that I'm aware of doing something like this. A quad-core Beast-PU from Intel or AMD is great if you're looking to get work done, but terrible if you're looking to open the hood and poke at the bits and innards. The most I've seen lately has been someone loading an existing cluster OS on 64 Raspberry Pis and calling it a day. There might even be some interesting considerations for power efficiency and algorithms for disabling and reenabling nodes in the graph. There might even be some insights learned on improvements and/or exploits for other processor network systems that currently exist, like vehicle CAN buses.

paxswill 13 years ago | |

It sounds almost like you want to recreate a Connection Machine [0] using modern(ish) hardware.

0: http://en.wikipedia.org/wiki/Connection_Machine

moron4hire 13 years ago | | |

Yes, I think the advent of cheaply available hardware prototyping is making something like the Connection Machine and Transputers a more viable target for home-grown research. Wouldn't it be neat to be able to make a kit for building one's own Lisp machine, complete with an open, hackable, live-inspectable, Lisp OS? I.e. recreate 40 years ago. How about converting nodes between data storage and processing? There's just a lot of potential fun here.

jerf 13 years ago | |

You may want this: http://www.greenarraychips.com/index.html

moron4hire 13 years ago | | |

That's pretty neat. Thanks for the link, I know a few other people who would be interested in this, too.

regularfry 13 years ago | |

Unless you're specifically trying to build a network with rather interesting temporal characteristics, there's no reason to try to map neurons to cores. Update rules are typically just matrix operations - no need to break out entertaining physical architectures for that.
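
To make the "just matrix operations" point concrete, here is a minimal sketch of one dense layer computed as a matrix-vector product followed by a sigmoid, written with plain Python lists so it has no dependencies. The function name and the tiny weights are hypothetical; a real implementation would normally hand the same operation to an optimized linear algebra library.

```python
import math

def layer_forward(weights, bias, inputs):
    """One dense layer: sigmoid(W x + b), with W given as a list of rows."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, bias)
    ]

# Two inputs feeding two units (illustrative values only).
W = [[0.0, 0.0], [1.0, -1.0]]
b = [0.0, 0.0]
out = layer_forward(W, b, [2.0, 2.0])
print(out)  # [0.5, 0.5]
```

Every unit in the layer falls out of the same product, so there is no need to assign one core to one neuron.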

bsenftner 13 years ago |

I've been working for several years as the "applications developer" for a neural net lab. The neural lab has spent 11 years developing and refining a neural net pipeline: a series of neural nets that, given one or more photos of a person's face, performs forensically accurate 3D reconstructions of the person's face and head. The system is used by government & police agencies the world over when trying to determine what a "person of interest" looks like given random photos of their subject. I've additionally exposed an "entertainment" version of the technology which can be seen at www.3d-avatar-store.com. There one can create a 3D avatar, get a Maya rigged version for professional quality animation, as well as license my WebAPI to embed avatar creation into your own software. And the best part is the avatars look just like the person in the source photo.

chanced 13 years ago | |

That's some pretty impressive stuff. I would suggest changing your site name though; it seems a bit spammy (I almost didn't go for that reason).

cpeterso 13 years ago | |

Very cool! Have you seen FaceGen? It's another 3D face modeling library that's popular with game developers (e.g. Elder Scrolls IV: Oblivion).

http://www.facegen.com/

jph00 13 years ago |

I'm the President and Chief Scientist of Kaggle, which ran the drug discovery project mentioned in the article. As it happens, I did my Strata talk on Tuesday about just this topic. I will be repeating the talk in webcast form (for free) in a few weeks: http://oreillynet.com/pub/e/2538 . I'll be focussing more on the data science implications, rather than implementation details.

tansey 13 years ago |

Nice write up. I gave a presentation on DBNs for my Neural Networks class in Fall 2011. If you'd like references to the relevant papers and some more details on the algorithms and applications, here are the slides: https://docs.google.com/presentation/d/18vJ2mOmb-Cbqsk0aNoUM...

SatvikBeri 13 years ago | |

This was a really fun read, thank you! If you don't mind answering, I've got a few questions:

1. "Vanishing gradients after 2-3 layers": does this mean that the partial derivatives tend to be smaller on the higher layers, and therefore the network finds local minima that aren't very useful?

2. Step 3 (p 18) mentions that the outputs are not continuous variables, they're binary. What's the reasoning behind that?

tansey 13 years ago | | |

1. Basically. It means that the network has a hard time pulling itself in any direction since the weights in the deeper layers are never really adjusted by very much.

2. It's been a while since I read the paper, but I believe that the justification has to do with the proof of convergence of Gibbs sampling. I haven't tried using continuous values, so I can't give an intuition for what happens in those cases.
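
Point 1 can be illustrated with a small sketch. It assumes a chain of single-unit sigmoid layers with all weights set to 1.0, which is a deliberate simplification and not the setup from the slides. Each layer multiplies the backpropagated gradient by the weight times the sigmoid derivative, and the derivative is at most 0.25, so the product shrinks quickly with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_through_chain(x, weights):
    """d(output)/d(input) for a chain of 1-unit sigmoid layers (toy model)."""
    grad = 1.0
    a = x
    for w in weights:
        a = sigmoid(w * a)
        grad *= w * a * (1.0 - a)  # chain rule: weight times sigmoid'(z)
    return grad

for depth in (1, 2, 3, 5, 10):
    print(depth, gradient_through_chain(0.5, [1.0] * depth))
```

With weights of this size the gradient that reaches the layers furthest from the output is bounded by 0.25 raised to the depth, which is why those layers barely move during training.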

nicholasjarnold 13 years ago |

If you're really interested in understanding more about the "hierarchy of filters" quote, and much more related to that theory of how our brains operate, I strongly suggest the book On Intelligence by Jeff Hawkins. Super interesting stuff!

krenoten 13 years ago | |

Here's a page that gives a high level overview of the technology that he has helped to develop: https://www.numenta.com/technology.html

On Intelligence has dramatically changed the way I think about thinking. It's an awesome book.

nicholasjarnold 13 years ago | | |

Thanks for this, I forgot to mention Numenta. Your comment prompted me to search my old archives for the setup file to "Vitamin D Video", a motion and object detection program that was a very early example of Numenta technology being successfully implemented.

Now, it looks like those Vitamin D people have their own company: http://www.vitamindinc.com/

Even the really early versions of Vitamin D were impressive. Anybody use it for anything interesting now?

mikecsh 13 years ago | |

I cannot agree more, this is one of the most interesting books I have ever read.

kolektiv 13 years ago | | |

Encouraging to hear, it turned up from Amazon earlier! Coincidental. Also arrived is Connectome by Seung - anyone read it yet? Opinions?

return0 13 years ago |

First, it's Geoffrey, not Gregory Hinton.

Here's a very good tech talk from him about RBMs: http://www.youtube.com/watch?v=AyzOUbkUf3M

That said, both approaches only loosely mirror the function of the brain, as neurons are not simple threshold devices, and neither backpropagation nor the RBM training algorithms have a biophysical equivalent.

wfn 13 years ago | |

That's a very good lecture by the way, basically explaining RBMs in more detail, and showcasing some interesting applications of deep unsupervised learning.

tjake 13 years ago | |

Oh sorry. I fixed it. Sorry Geoffrey!

freyr 13 years ago | | |

Second, it's Geoffrey, not Gregory Hinton.

dave_sullivan 13 years ago |

Oh, backprop isn't so bad...

After all, a deep belief network starts with an RBM for unsupervised pre-training, but the finetuning stage that follows just treats the network as a standard MLP using backprop.

Also, you can use an autoencoder instead of an RBM; I think autoencoders are getting better results these days? And there are better regularization techniques for backprop now: weight decay, momentum, L1/L2 regularization, dropout, probably more that I'm leaving out.

The pre-training (RBM or autoencoder) helps to not get stuck in local minima, but there's also interesting research that suggests you're not even getting stuck in local minima so much as you're getting stuck in these low slope, high curvature corridors that gradient descent is blind to, so people are looking into second-order methods that can take curvature into account so you can take big steps through these canyons and smaller steps when things are a bit steeper. Or something like that :-)
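
A toy sketch of the "canyon" picture, assuming a quadratic loss f(x, y) = 0.5 * (x^2 + 100 y^2) chosen purely for illustration: the slope along x is gentle while the curvature along y is high. Plain gradient descent has to use a small learning rate to stay stable in y and therefore crawls along x, while a Newton-style step divides each gradient component by its curvature. Real second-order methods for neural networks are far more involved than this.

```python
CURVATURE = (1.0, 100.0)  # diagonal of the Hessian of the toy loss

def grad(p):
    """Gradient of f(x, y) = 0.5 * (x**2 + 100 * y**2)."""
    return (CURVATURE[0] * p[0], CURVATURE[1] * p[1])

def gd_step(p, lr=0.01):
    """Plain gradient descent; lr must stay below 0.02 or y diverges."""
    g = grad(p)
    return (p[0] - lr * g[0], p[1] - lr * g[1])

def newton_step(p):
    """Scale each gradient component by the inverse curvature."""
    g = grad(p)
    return (p[0] - g[0] / CURVATURE[0], p[1] - g[1] / CURVATURE[1])

p = (1.0, 1.0)
for _ in range(100):
    p = gd_step(p)
q = newton_step((1.0, 1.0))
print("after 100 gradient steps:", p)
print("after 1 Newton step:", q)
```

On this toy problem one curvature-aware step lands on the minimum, while 100 gradient steps still leave x well away from zero.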

All that being said, anyone care to weigh in on the pros/cons of RBMs vs something like a contractive autoencoder? No such thing as a free lunch, so what are the key selling points of RBMs at this point? I keep seeing them pop up, but afaik, they don't provide a particular advantage over autoencoder variants.

Great article though, I'm really glad to see more and more people getting interested in neural networks, they've come a long way and people are just starting to wake up to that.

jghrng 13 years ago | |

> All that being said, anyone care to weigh in on the pros/cons of RBMs vs something like a contractive autoencoder?

For some problems, it may be nice to have a generative model as offered by RBMs (although Rifai et al. published a sampling method for contractive auto-encoders recently: http://icml.cc/2012/papers/910.pdf). I feel like with RBMs, you can design models which incorporate prior knowledge more "easily" (you may end up with pretty complex models...), e.g. the conditional RBM, the mean-covariance RBM or the spike & slab RBM. Additionally, there's the deep Boltzmann machine that consists of multiple layers that are jointly trained in an RBM-like fashion.

Auto-encoders are straightforward to understand and implement. With contractive terms or denoising, they are powerful feature extractors as well.

But as you already noted, if you "just" want to have a good classifier, I think it pretty much boils down to personal preference since you're going to spend some effort on making these techniques work well on your problem anyway.

visarga 13 years ago | |

> After all, a deep belief network starts with an RBM for unsupervised pre-training, but the finetuning stage that follows just treats the network as a standard MLP using backprop.

We could also pipe the raw data through an RBM and then slap an SVM or some other classifier on top.

smalieslami 13 years ago |

In fact we're only scratching the surface when it comes to the generative capabilities of deep models. See e.g. our recent work on using Deep Boltzmann Machines to learn how to draw object silhouettes: http://arkitus.com/ShapeBM/

boothead 13 years ago |

As mentioned in this thread by nicholasjarnold, Jeff Hawkins's work on HTM (detailed in his excellent book "On Intelligence") seems superficially similar to this. Has anyone had experience with both approaches? HTM seems to have much more structure in the network, but I know next to nothing about AI and would love to hear from those who know a bit more.

rm999 13 years ago | |

I have some experience with both, but can't give a great comparison. The tldr is I've always had a better impression of Hinton than Hawkins, and have studied/followed Hinton's approaches much more carefully.

In late 2006/early 2007 I was working a lot with standard two-layer feed-forward neural networks (first for my research and then for my job). Hinton had a great paper on practical deep networks at NIPS 2006 (a big AI/machine learning conference), which sparked my interest in more complex neural networks. I had read Hawkins' book a few years earlier, and my impression of it was somewhat negative; I thought it was a really interesting book, but it was too fluffy and high-level to be intriguing. He hit a lot of points about hierarchies in intelligence that were intriguing but not new or drastically insightful. After NIPS I downloaded some of Numenta's code (Numenta is Hawkins' company) and it was pretty slow on toy problems so I didn't spend too much time with it - this isn't a judgement of their code, I just didn't have the time to dig deeply into it. My impression at the time, which may be unfair, is that Numenta's approach was ad-hoc while Hinton's was principled. I was negatively biased by Hawkins' book and my professors' opinions of him vs Hinton.

leot 13 years ago |

I remember running into Hinton one afternoon back in 2005 while on St. George. He was walking home, and especially cheerful from having just figured out how to do learning efficiently on deep belief nets. It's amazing to see the influence this work has had.

sherjilozair 13 years ago |

MNIST is not a good dataset to show any artificial intelligence on. The dataset is so simple that a good programmer can probably write a classifier for it in 100 lines of Python, with no machine learning at all.

Neural Network techniques which work so well in small, easy and trivial datasets like MNIST do not generalize to more serious datasets, and that's where the "and this is where the magic happens" component is needed.

spin 13 years ago | |

Deep neural networks are now being developed for use in Google's speech recognition tools and Microsoft's. Microsoft claims a 30% increase in accuracy by using deep neural networks (developed in cooperation with Hinton's group at U Toronto)...

theschwa 13 years ago |

That Coursera class has been showing a start date of "Oct 1st 2012" for a while. Does anyone know when the next class might be?

SatvikBeri 13 years ago | |

Haven't heard anything about the next class starting, but I believe you can still sign up and download all the videos.

sk2code 13 years ago | | |

I believe you can. Your work won't be evaluated, but at least you have the material to get hands-on.

scottmp10 13 years ago |

It is great to see more interest in neural networks, but the types of neural networks the author describes are missing some key aspects of what the brain is doing. I work with Jeff Hawkins at Numenta, and while our product is based on a type of neural network, it is quite different from the class of NN described in this post. For background:

A recent blog post by Jeff: https://grok.numenta.com/blog/not-your-fathers-neural-networ...

And more detailed information on the technology (I would recommend the CLA white paper): https://grok.numenta.com/technology.html

Rnnguy 13 years ago |

Sitting in a class right now reading this while Hinton is teaching neural nets.

textminer 13 years ago | |

I can remember times being a student, learning a lot, but losing focus of how amazing it was I had dedicated time solely to study. As someone working full-time now, who only gets to learn new fascinating math in his precious free time, I implore you to fully embrace the awesome opportunity in front of you. Enjoy the lecture, and milk whatever you can from this man's teaching.

tripzilch 13 years ago | | |

Similarly, I am glad I did not have a laptop/smartphone with wireless Internet [in the classroom] in my college days. I fear I'd have learned so much less. In fact, there's a pretty strong correlation between my study results going down the moment we got cable Internet at my students-home... :-/

(on the positive side, Internet made me loads of international friends, and there's truckloads of things I could not have learned without it)

tjake 13 years ago | |

Tell him about it :)

m12k 13 years ago |

I looked at Restricted Boltzmann Machines for a while when searching for a topic for my master's thesis. One very interesting use is to train an RBM with animations, and then use it generatively to create new animations - Hinton and one of his students, Graham Taylor, wrote a paper about it (http://www.cs.utoronto.ca/~hinton/csc2515/readings/nipsmocap... (PDF)). Imagine if it was expanded, so animators could train an RBM with a body of animation from a character, then simply specify "go from here to here" and the RBM would create an interstitial animation. Afaik a lot of animation work is just boilerplate like "line the character up so we can fire the sit down animation".

maaku 13 years ago |

Great post, and thank you for the link to Hinton's Coursera page - I didn't know about that. I also hope to learn a thing or two from your GitHub code. But it was so depressing to read this:

> Now, when I say Artificial Intelligence I’m really only referring to Neural Networks. There are many other kinds of A.I. out there (e.g. Expert Systems, Classifiers and the like) but none of those store information like our brain does (between connections across billions of neurons).

This is a middle-brow dismissal of almost the entire field of A.I. because it does not meet an unnecessarily narrow restriction. (Which, by the way, neural nets don't either. Real neurons are analog-in, digital-out, stochastic processes with behavior influenced by neural chemistry and with physical interconnectivity and timing among other things not accurately modeled at all by any neural net. It's closer modeling to the mechanisms of the brain, but far from equivalent and as a CogSci student you should know that.)

A.I. is the science of building artifacts exhibiting intelligent behavior, intelligence being loosely defined as what human minds do. But in theory and in practice, what human minds do is not the same thing as how they do it.

The human mind does appear to be a pattern matching engine, with components that might indeed be well described as a hidden Markov model or restricted Boltzmann machine. It may be that our brains are nothing more than an amalgamation of some 300 million or so interconnected hidden Markov models. That's Ray Kurzweil's view in How to Create a Mind, at any rate.

However it is a logical fallacy to infer that neural nets are the only or even the best mechanism for implementing all aspects of human-level intelligence. It's merely the first thing evolution was able to come up with through trial and error.

Take the classical opposite of neural nets, for example: symbolic logic. If given a suitable base of facts to work from and appropriate goals, a theorem prover on your cell phone could derive all mathematics known up to the early 20th century (and perhaps beyond), without the possibility of making a single mistake. And do it on a fraction of the energy you spend splitting a bill and calculating tip. A theorem prover alone does not solve the problem of initial learning of ontologies or reasoning about uncertainty in a partially observable and even sometimes inconsistent world. But analyzing memories and perception for new knowledge is a large part of what human minds do (consciously, at least), and if you have a better tool, why not use it?

Now I myself am enamored of Hinton-like RBM nets. This sort of unassisted deep learning is probably a cornerstone in creating a general process for extracting any information from any environment, a central task of artificial general intelligence. However, compared with specialized alternatives, neural nets are hideously inefficient for many things. Doesn't it make sense then to use an amalgam of specialized techniques when applicable, and fall back on neural nets for unstructured learning and other non-specialized tasks? Indeed this integrative approach is taken by OpenCog, although they plan to use DeSTIN deep learning instead of Hinton-esque RBMs, in part because the output of DeSTIN is supposedly more easily stored in their knowledge base and parsed by their symbolic logic engine.

mhluongo 13 years ago |

Check out the rest of Geoffrey Hinton's work as well: http://scholr.ly/person/3595934/geoffrey-e-hinton

spin 13 years ago |

You can play with a Python version of this same algorithm (contrastive divergence for RBMs) here: https://github.com/Wizcorp/Eruditio

(I wrote it... :-)
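
For readers who want the shape of the algorithm before opening a repository, here is a minimal standard-library sketch of one-step contrastive divergence (CD-1) for a binary RBM. It is not the code from the linked project. The function names, the tiny two-pattern dataset, the learning rate, and the epoch count are all assumptions made for illustration, and it updates on one example at a time with no mini-batches or momentum.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, W, c):
    # W[i][j] connects visible unit i to hidden unit j; c holds hidden biases.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # b holds visible biases.
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]

def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single binary vector; updates W, b, c in place."""
    h0p = hidden_probs(v0, W, c)               # positive phase
    h0 = sample(h0p, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # one Gibbs step: reconstruction
    h1p = hidden_probs(v1, W, c)               # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * h0p[j] - v1[i] * h1p[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (h0p[j] - h1p[j])

rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # toy patterns
for epoch in range(500):
    for v in data:
        cd1_update(v, W, b, c, 0.1, rng)
print(hidden_probs(data[0], W, c), hidden_probs(data[1], W, c))
```

The update pushes the statistics of the reconstruction towards those of the data. Stacking several RBMs trained this way, layer by layer, is the unsupervised pre-training step discussed elsewhere in the thread.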

frooxie 13 years ago |

Does anyone have a link to a web page (or to a book) that would be useful if you want to learn to program a Deep Belief Network?

countersixte 13 years ago | |

For a (slightly technical) in-depth guide to training RBMs: http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf

It discusses choosing a learning algorithm, selecting hyperparameters, picking the number of hidden units, etc.

jghrng 13 years ago | |

I'd recommend Theano and the accompanying Deep Learning Tutorials at http://deeplearning.net/tutorial/