Neural Language Modeling from Scratch

Neural Language Modeling from Scratch (ofir.io)

97 points by ofirpress 8 years ago | 10 comments

hacker_9 8 years ago

"In recent months, we’ve seen further improvements to the state of the art in RNN language modeling. The current state of the art results are held by two recent papers by Melis et al. and Merity et al.. These models make use of most, if not all, of the methods shown above, and extend them by using better optimizations techniques, new regularization methods, and by finding better hyperparameters for existing models. Some of these methods will be presented in part two of this guide."

Am I right in saying that Google's recently publicised Transformer [1] neural network is now the state of the art, ahead of RNNs?

[1] https://research.googleblog.com/2017/08/transformer-novel-ne...

yorwba 8 years ago

The Transformer network solves a different problem: translating a given sentence into another language while preserving its meaning.

The problem discussed here is predicting the next word of a partial sentence, where, AFAIK, some variety of RNN is still the best. It might be possible to adapt the Transformer architecture to that task, but that would make it a different model.
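To make "predicting the next word" concrete, here is a minimal sketch of what an RNN language model computes: it folds the prefix into a hidden state one word at a time, then maps the final state to a probability distribution over the vocabulary. Everything here is an illustrative assumption, not code from the linked guide: the class name `ToyRNNLM`, the five-word vocabulary, the hidden size, and the weights, which are random and untrained, so the predictions themselves are meaningless. Biases are omitted for brevity.

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    # Plain-Python matrix-vector product (W is a list of rows).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class ToyRNNLM:
    """Hypothetical untrained Elman-style RNN language model with random weights."""

    def __init__(self, vocab, hidden_size=8, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.vocab = list(vocab)
        self.index = {w: i for i, w in enumerate(self.vocab)}
        V, H = len(self.vocab), hidden_size
        self.E = rand(V, H)     # word embeddings, one row per word
        self.W_xh = rand(H, H)  # input-to-hidden weights
        self.W_hh = rand(H, H)  # hidden-to-hidden (recurrent) weights
        self.W_hy = rand(V, H)  # hidden-to-vocabulary output weights

    def next_word_distribution(self, prefix):
        h = [0.0] * len(self.W_hh)
        for word in prefix:
            x = self.E[self.index[word]]
            # h_t = tanh(W_xh x_t + W_hh h_{t-1})
            h = [math.tanh(a + b) for a, b in zip(matvec(self.W_xh, x), matvec(self.W_hh, h))]
        # p(next word | prefix) = softmax(W_hy h_t)
        return dict(zip(self.vocab, softmax(matvec(self.W_hy, h))))


lm = ToyRNNLM(["the", "cat", "sat", "on", "mat"])
dist = lm.next_word_distribution(["the", "cat"])
print(max(dist, key=dist.get))  # most likely next word under the random weights
```

Training would adjust the weights to maximize the log-probability the model assigns to each observed next word in a corpus.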

technomalogical 8 years ago

I misread this; I thought it was Neural Language Modeling in Scratch, the visual programming language.

tw1010 8 years ago

Whenever I see stuff like this I think: why RNNs, why not just multivariate polynomials? Every property you want from an RNN you can get from polynomials, and polynomials have been studied far more exhaustively. Want certain invariants to be guaranteed? You got it! Just look them up in any undergraduate textbook on algebraic geometry. I'm glad that Yann LeCun went against the established paradigm of designing image filters by hand. But why stop there? Let's go beyond the constraint of using only the mathematics commonly taught in engineering schools and take some inspiration from other departments. Cross-pollination is the key to revolutionary jumps in innovation.

yorwba 8 years ago

Two reasons multivariate polynomials are not commonly used in machine learning:

1. The number of parameters grows roughly as (number of variables)^(degree of polynomial), which is highly inefficient. You could assume that the polynomial is a linear combination of easily factored ones, but that's equivalent to a neural network with one logarithmic-activation layer and one exponential-activation layer, followed by a linear layer. And most multivariate-polynomial theory probably hasn't focused on this special case.
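A small sketch of both points in item 1, assuming real-valued inputs. `num_monomials` counts the coefficients of a general polynomial of total degree at most d in n variables, C(n + d, d), which for fixed d grows like n^d / d!. The remaining functions check the log/exp equivalence: a product of linear factors equals the exp of a sum of logs, provided every factor is positive on the input (the real logarithm is undefined otherwise, so positivity is an assumption of this sketch). All function names are hypothetical.

```python
import math


def num_monomials(n_vars, degree):
    # Number of monomials of total degree <= degree in n_vars variables: C(n + d, d).
    return math.comb(n_vars + degree, degree)


def product_of_linear_forms(x, factors):
    # Direct evaluation of prod_k (a_k . x + b_k).
    out = 1.0
    for a, b in factors:
        out *= sum(ai * xi for ai, xi in zip(a, x)) + b
    return out


def log_exp_network(x, factors):
    # Layer 1: one linear unit per factor, followed by a log activation.
    # Assumes every a_k . x + b_k > 0 so the real log is defined.
    hidden = [math.log(sum(ai * xi for ai, xi in zip(a, x)) + b) for a, b in factors]
    # Layer 2: sum the hidden units (all-ones weights), then an exp activation.
    return math.exp(sum(hidden))


def factored_polynomial(x, terms):
    # Final linear layer: sum_j c_j * prod_k (a_jk . x + b_jk).
    return sum(c * log_exp_network(x, factors) for c, factors in terms)


factors = [([1.0, 2.0], 0.5), ([0.5, -1.0], 3.0)]
print(product_of_linear_forms([1.0, 1.0], factors), log_exp_network([1.0, 1.0], factors))
```

For example, `num_monomials(100, 3)` is 176,851, so even a general cubic in 100 variables already needs about 177k coefficients.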

2. To handle potentially unbounded sequences, you'll have to use your multivariate polynomial in some kind of iterative/recursive scheme. That's what an RNN is. You could build an RNN out of multivariate polynomials, but it probably won't work very well, because accumulated error will push the state into a region of fast divergence. LSTMs avoid this by updating their cell state through addition of a bounded function.

tw1010 8 years ago

The growth issue is only really a problem if you assume you're picking your polynomials from R[x,y,...]. But there are other choices that would be more appropriate. Often you want the model to be invariant to rotation (e.g. if you're doing computer vision), in which case you'd use R[x,y,...]^G, the ring of polynomials left unchanged by G, where G is the group of rotations.
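A minimal illustration of the invariant-ring idea, assuming the simplest case: the rotation group SO(2) acting on points (x, y) in the plane, rather than the much harder action on image pixels. For this group the invariant ring R[x, y]^SO(2) is generated by the single polynomial x² + y², so an invariant model needs only one coefficient per power of x² + y². The function names below are hypothetical.

```python
import math


def rotate(x, y, theta):
    # Rotate the point (x, y) by angle theta (an element of SO(2)).
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


def invariant_poly(x, y, coeffs):
    # An element of R[x, y]^SO(2): a polynomial in r2 = x^2 + y^2 only.
    # coeffs[k] multiplies (x^2 + y^2)^k, so the total degree is 2 * (len(coeffs) - 1).
    r2 = x * x + y * y
    return sum(c * r2 ** k for k, c in enumerate(coeffs))


def num_params_general(degree):
    # Coefficients of a general polynomial in R[x, y] of total degree <= degree.
    return math.comb(degree + 2, 2)


def num_params_invariant(degree):
    # Coefficients of an SO(2)-invariant polynomial of total degree <= degree.
    return degree // 2 + 1


coeffs = [0.5, -1.0, 0.25]
print(invariant_poly(1.0, 2.0, coeffs), invariant_poly(*rotate(1.0, 2.0, 0.7), coeffs))
```

For total degree at most 10, a general polynomial in x and y has 66 coefficients, while the rotation-invariant one has 6.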

yosyp 8 years ago

Could you recommend further reading about multivariate polynomials and their applications to problems of this type?