Am I right in saying that the recently publicised Google Transformer [1] Neural Network is actually the state of the art now, over RNNs?
[1] https://research.googleblog.com/2017/08/transformer-novel-ne...
The problem discussed here is about completing the next word in a partial sentence, where AFAIK some variety of RNN is still best. It might be possible to adapt the Transformer architecture to that task, but that would make it a different model.
1. The number of parameters grows as (number of variables)^(degree of polynomial), which is highly inefficient. You could assume that the polynomial is a linear combination of easily factored ones, but that's equivalent to a neural network with one logarithmic-activation layer and one exponential-activation layer, followed by a linear layer. And most multivariate-polynomial theory probably hasn't focused on this special case.
2. To handle potentially unbounded sequences you'll have to use your multivariate polynomial in some kind of iterative/recursive scheme. That's what an RNN is. You could build an RNN out of multivariate polynomials. It probably won't work very well, because accumulating error will put you in an area of fast divergence. LSTMs use addition with a bounded function to avoid this.
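To make both points concrete, here is a rough Python sketch (not from the thread; all names are made up for illustration). For point 1 it counts the coefficients of a general polynomial of total degree at most d in n variables, which is C(n + d, d) and behaves like n^d / d! for fixed d. For point 2 it compares a toy polynomial recurrence with a toy additive update that uses a bounded function. The additive update is only a stand-in for the idea behind the LSTM cell state, not an actual LSTM.

```python
from math import comb, tanh


def num_coefficients(n_vars: int, degree: int) -> int:
    """Count monomials of total degree <= degree in n_vars variables."""
    return comb(n_vars + degree, degree)


def iterate(step, x0: float, n_steps: int) -> float:
    """Apply the update function `step` to the state n_steps times."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x


# Hypothetical toy recurrences, chosen only to illustrate the argument.
def poly_step(x: float) -> float:
    # Degree-2 polynomial update: small errors get squared at every step.
    return x * x


def additive_step(c: float) -> float:
    # Addition of a bounded term: the state changes by less than 1 per step.
    return c + tanh(c)


if __name__ == "__main__":
    # Point 1: coefficient counts for degree 2 and 3 polynomials.
    for n in (10, 100, 1000):
        print(n, num_coefficients(n, 2), num_coefficients(n, 3))

    # Point 2: two nearby starting states under each recurrence.
    for x0 in (0.99, 1.01):
        print(x0, iterate(poly_step, x0, 12), iterate(additive_step, x0, 12))
```

Under the polynomial update, the two starting states 0.99 and 1.01 end up many orders of magnitude apart after 12 steps, while under the additive update they can differ from their starting values by at most 12.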
Even assuming that rotations aren't lossy, I get at best a reduction in the number of parameters by a factor of √(number of variables), by fixing the rotation of a set of variables (representing sample points) so that one of them lies on a specific axis. In other words, this reduces the exponent by 1/2, which is still not small enough to make even second-degree polynomials feasible.
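For a sense of scale, here is a quick back-of-the-envelope check in Python. It takes the estimate above at face value (parameter count scaling as n^d, reduced to n^(d - 1/2)) and assumes a hypothetical 256×256 single-channel image, which is my choice and not something stated in the thread.

```python
def param_scale(n_vars: int, degree: float, exponent_reduction: float = 0.0) -> float:
    """Rough parameter-count scale n^(degree - exponent_reduction)."""
    return float(n_vars) ** (degree - exponent_reduction)


# Assumption: one variable per pixel of a 256x256 grayscale image.
n_pixels = 256 * 256

full = param_scale(n_pixels, 2)          # n^2, no symmetry reduction
reduced = param_scale(n_pixels, 2, 0.5)  # n^(3/2), exponent lowered by 1/2

if __name__ == "__main__":
    print(f"n = {n_pixels}")
    print(f"n^2     ~ {full:.3g}")
    print(f"n^(3/2) ~ {reduced:.3g}")
```

With these assumed numbers the reduction takes the count from billions to tens of millions of parameters for a second-degree polynomial alone.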
However, that doesn't mean I think symmetry priors like this are useless, so if you can point out further literature on this topic, that would be great! (It might also help me understand how exponentiating a group by another makes sense.)
Can you confirm or refute my calculation of the growth of the number of parameters when the input variables are the sample points of an image and the polynomial has to be rotation-invariant?