Time Series Prediction Using LSTM Deep Neural Networks (altumintelligence.com)

213 points by shivinski 7 years ago | 70 comments

zwaps 7 years ago |

I find it interesting that Computer Scientists are basically rediscovering statistics.

Now when predicting time series, an issue is that most models (like ARIMA, GARCH, etc.) are short-memory processes. When you look at the full-series prediction of LSTMs, you observe the same thing.

So in terms of Time Series, Machine Learning is currently in the mid to late 80's compared to Financial Econometrics.

So if you are a computer scientist, you should now probably take a look at fractional GARCH models and incorporate this into the LSTM logic. If the statistical issues are the same, then this may give you that hot new paper.
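To make the short-memory point concrete, here is a minimal sketch of what it means for the simplest case, a stationary AR(1) process. It is an illustration only, not taken from the linked article: the values of `phi`, `mu` and the last observation are assumptions picked for the example. The h-step-ahead forecast is mu + phi^h * (last_value - mu), so the influence of the last observation dies off geometrically.

```python
def ar1_forecast(last_value, mu, phi, horizon):
    """h-step-ahead forecasts of a stationary AR(1) process (|phi| < 1).

    Each forecast is mu + phi**h * (last_value - mu), so the forecast
    path decays geometrically toward the long-run mean mu.
    """
    return [mu + phi ** h * (last_value - mu) for h in range(1, horizon + 1)]


# Assumed illustrative values: mean 0, persistence 0.8, last observation 10.
forecasts = ar1_forecast(last_value=10.0, mu=0.0, phi=0.8, horizon=30)

for h in (1, 5, 10, 30):
    print(f"h={h:2d}  forecast={forecasts[h - 1]:.4f}")
```

After a few dozen steps the forecast is indistinguishable from the mean, which is the flat-line behaviour you tend to see in long full-series predictions.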

RA_Fisher 7 years ago | |

It's been amazing to watch CS (really the Python community, save statsmodels and patsy) discover statistics. For a while I thought perhaps it was me and statistics that were "behind." Over time I realized that it was mostly re-invention of old ideas: one-hot encoding = dummy variables, neural networks approximating polynomial regression, etc. I decided to double down on statistics and it's really paid off. NN / random forests and the stats-founded but CS-led approaches are very general models. That leaves statisticians a big opening because a more specific model can be chosen to obtain more accurate predictions. These days I'm positioning myself to clean up the messes / save broken ML models. Turns out [stats] theory is very practical. :-)

curiousgal 7 years ago | | |

Because saying "relevant username" is frowned upon I'll just point out that R A Fisher is "a genius who almost single-handedly created the foundations for modern statistical science"[0]

[0] https://en.m.wikipedia.org/wiki/Ronald_Fisher

thanatropism 7 years ago | | |

I know a handful of Econ PhDs working in data science, and Google, FB, etc. have hired top economists as well.

The Phineas Gage of applied quantitative Econ is demand estimation. You typically want to know the elasticity of quantity sold with respect to price so as to inform pricing policies. But the problem is that causality is cloudy -- low prices cause a decrease in supply -- so you never know what you're looking at.

People with a decent training in econometrics know how to treat this problem.

I'm pretty sure orgs like Amazon were trying to do naive demand estimation, fell flat on their noses and copped to having to hire people who have thought about the underlying conceptual issues before.

Eridrus 7 years ago | | |

I'm curious what resources you found useful to learn stats modelling and what sorts of approaches have been useful.

On one hand, it's almost a tautology that specific models should be better than general models, but I worked on some 2D time series classification with a statistician and afterwards, for kicks, I replaced the entire thing with a CNN+LSTM and it worked just as well as the whole complicated model he had come up with.

cs702 7 years ago | | |

True.

On the other hand, the "more ignorant CS approach" has produced impressive achievements in language tasks (e.g., translation), visual tasks (e.g., image generation), game playing tasks (e.g., Go), agent-in-virtual-world tasks (e.g., DOTA), and robot-in-real-world tasks (e.g., self-driving cars).

Academic statistics departments often seem to be "20 years behind" on all those fronts...

laichzeit0 7 years ago | |

I don't think it's entirely fair to say "Computer Scientists are basically rediscovering statistics". LSTMs are used beyond just time series prediction. They are also quite common in language modelling, which is also a sequence modelling task, and where they work quite well. I'm not familiar at all with using GARCH/ARIMA for something like this.

Also, with neural networks it's very easy and natural to build complex models where different "layers" perform different tasks. So an LSTM can very easily be extended to work bi-directionally (taking data from the beginning of the sequence, and the end of the sequence), to add things like attention, to use word vectors before the recurrent network, or just to use a character model.

What are the statistical equivalents for this? Because most of the papers on this topic seem to come from Computer Science. Take a look at the epilogue of [1] for a thorough discussion on where statistical theory needs to catch up.

[1] Computer Age Statistical Inference - Efron, Hastie.

srean 7 years ago | | |

> What are the statistical equivalents for this?

That would be nonparametric statistics.

md2be 7 years ago | |

I agree with your sentiments, but there is a contribution the CS departments made that the statistics, math, and Econ (as in econometrics) departments seem to have overlooked. I remember going to each of these departments in 2002 and asking them why don’t we split the data sets to train and update the coefficients and automate the process. The answer was always the same: “that’s trivial and adds nothing to the field”.

digitalzombie 7 years ago | | |

> why don’t we split the data sets to train and update the coefficients and automate the process.

What you just stated is just a pipeline. You can split the data, train on it, and automate the process with tree ensembles that aren't boosted, that is, if you're talking about doing it in parallel.

If you just mean splitting and running it as a batch process over different time intervals, you can do that with nonparametric Bayesian methods.

CS's contribution in creating deep learning, and having it be the most accurate algorithm for certain data domains, is pretty nice. But again, stats cares about a lot more than prediction.

zwaps 7 years ago | | |

I think that ML is very useful, but remember that forecasting is really not the main objective of econometric models.

Basically, forecasting implies you have a good handle on all properties of the relevant distributions, which in my opinion is a lost cause in social sciences (think external validity).

Instead, econometrics is nowadays mainly concerned with the identification of causal effects using non-parametric or semi-parametric approaches. Basically, you can believably estimate the directionality of some mechanism, but you probably never have the data or model to make a good out-of-sample prediction. You can try, but it's basically implied that approaches that consistently estimate some marginal of a conditional expectation will NOT be that useful for predicting a whole stochastic process.

Also, using training and test sets kind of presupposes that your process is very stable. Otherwise the "test" set is not really a good test, is it? Again, in social sciences these things are hard to argue. You usually wanna generalize some mechanism from this industry to that industry, not find a good predictor in the same industry. Test datasets still run on the same data!

ML is successful because in practice we DO care about prediction. This allows us to do all the cool things. Because econometrics/stats is so conservative and comes from a causal standpoint, people are just really shy to develop a model for prediction (not true everywhere, but that's the gist). For ML, the primary question is basically how well the thing predicts. When I first tried scikit-learn way back, I was so confused that it didn't offer standard errors or some other statistical measure. But then I saw how ingrained the in-sample, out-of-sample process is and I thought, well, that's really useful.

tl;dr: Stats and ML have different objectives, but there is a lot to learn in stats for ML

jkabrg 7 years ago | |

Nassim Taleb had some negative things to say about GARCH.

"GARCH does not work out of sample. It is a good story, but I was unable to use it in predicting squared deviations or mean deviations"

I haven't found it in Rob J Hyndman's forecasting tutorial either.

How does it fare in the Makridakis competitions?

VHRanger 7 years ago | | |

You shouldn't listen to N. Taleb on technical matters. He's been a crank in the classic mold for the last decade or so when it comes to anything serious, relegated instead to writing fluffy books on whatever he thinks is important.

zwaps 7 years ago | | |

GARCH, like I said, is a short-memory process and is inherently inadequate for (longer) out-of-sample predictions. Doing this is possible, but not really correct. Taleb is basically right; of course, what he says is probably inflammatory and half wrong, as usual.

Don't forget that most econometrics models are also concerned with identification and causality, less with prediction.

unhammer 7 years ago | |

Apropos, here's a "Time series shootout: ARIMA vs. LSTM": https://www.youtube.com/watch?v=h9QWefYBfJg&list=PL51yKFtVfM...

jarym 7 years ago |

Why does everyone naively try to predict price? No ‘traders’ are interested in predicting it - what traders do is identify good locations to enter or exit the market.

I.e., places with defined risk: you know you’re wrong if the price goes against you by x%, you expect a y% gain if you’re right, AND having y>x is worth more than the number of times you’re wrong.

The types of algos that work well for this are edge identification ones - I know this because I am (not as well as I’d like) successfully doing it.

LSTMs haven’t performed so well for me in this task but non-NN algos have. CNNs however were promising but didn’t match what I’d come up with - still searching for the holy grail that’ll make me rich!
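One way to read the risk/reward condition above (a y% gain when right, an x% loss when wrong) is as a plain expected-value calculation. The sketch below is an interpretation, not the commenter's actual method; the function names and the 40% win rate are assumptions made up for the example.

```python
def expectancy(win_rate, gain_pct, loss_pct):
    """Expected return per trade, in percent.

    win_rate is the fraction of trades that hit the y% target; the rest
    are assumed to be stopped out at exactly -x%.
    """
    return win_rate * gain_pct - (1.0 - win_rate) * loss_pct


def breakeven_win_rate(gain_pct, loss_pct):
    """Win rate at which the expectancy is exactly zero."""
    return loss_pct / (gain_pct + loss_pct)


# Assumed example: risk 1% to make 3%, and be right only 40% of the time.
edge = expectancy(win_rate=0.4, gain_pct=3.0, loss_pct=1.0)
needed = breakeven_win_rate(gain_pct=3.0, loss_pct=1.0)
print(f"expectancy per trade: {edge:.2f}%  break-even win rate: {needed:.0%}")
```

With a 3:1 payoff the setup stays profitable even when most trades lose, which is why the entry location matters more than a point forecast of price.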

cshenton 7 years ago |

For anyone considering this, LSTM only starts to pay off if you have many many time series. For a single time series like this one you’re better off using classical time series approaches like ARIMA or other Gaussian state space models.

lettergram 7 years ago |

I've built quite a few of these kinds of models. The real trick is to compare it against other methods AND to properly split A LOT of data. In many cases (depending on the input data), a random walk does roughly as well as "predicting". This is because signal data (such as stock data) often just follows a random (or seemingly random) trend.
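A minimal sketch of that comparison, under loud assumptions: the data is a simulated random walk rather than real prices, and the "model" is a hypothetical stand-in (a 5-step moving average), not an LSTM. The point is only the procedure: split chronologically, then score any candidate against the naive "tomorrow equals today" baseline on the held-out tail.

```python
import random


def mean_abs_error(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Simulated random walk (an assumption standing in for real price data).
rng = random.Random(0)
series = [0.0]
for _ in range(999):
    series.append(series[-1] + rng.gauss(0.0, 1.0))

# Chronological split: the test set is strictly later than the training set.
split = 600
test_idx = range(split, len(series))
actual = [series[t] for t in test_idx]

# Baseline: the one-step forecast is simply the previous observation.
naive = [series[t - 1] for t in test_idx]

# Hypothetical stand-in "model": mean of the previous 5 observations.
window = 5
moving_avg = [sum(series[t - window:t]) / window for t in test_idx]

naive_mae = mean_abs_error(actual, naive)
model_mae = mean_abs_error(actual, moving_avg)
print(f"naive baseline MAE: {naive_mae:.3f}")
print(f"stand-in model MAE: {model_mae:.3f}")
```

If a real model cannot beat `naive_mae` on the held-out tail, the nice-looking chart is mostly the previous value shifted by one step.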

glial 7 years ago |

Seems to me that this is almost dangerous unless the uncertainty (and therefore confidence) of the prediction can be quantified.

RA_Fisher 7 years ago | |

Yep, it is dangerous. If you're not quantifying uncertainty, you can't make safe predictions. I think this is the reason for the obsession with "data cleaning" in the ML community: "outliers" (aka rare observations) sink general models.

GChevalier 7 years ago |

I see here that the original poster (OP) tried to use many-to-one LSTMs instead of many-to-many LSTMs. I could tell that first by looking at the charts. Then I saw the method named "predict_point_by_point" with the comment "Predict each timestep given the last sequence of true data, in effect only predicting 1 step ahead each time" in his code here: https://github.com/jaungiers/LSTM-Neural-Network-for-Time-Se...

I strongly think the system would do better by performing many predictions at once instead, using seq2seq neural networks. The problem is properly explained here at the beginning of this other post: https://github.com/LukeTonin/keras-seq-2-seq-signal-predicti... This other post is, in turn, derived from my original project here doing seq2seq predictions with TensorFlow: https://github.com/guillaume-chevalier/seq2seq-signal-predic...

OP also forgot to cite the image I made: https://en.wikipedia.org/wiki/Long_short-term_memory#/media/...

Well, glad to see that work similar to mine can get this much traction on HN. I would have loved to get this much traction when I did my post, too. Anyway, I would suggest that OP take a look at seq2seq, as it objectively performs better (and without the "laggy drift" visual effect observed in OP's figure named "S&P500 multi-sequence prediction").

In other words, using many-to-one neural architectures creates a kind of feedback that doesn't happen with seq2seq, which doesn't build on its own accumulated error. It has a decoder with different weights than the encoder, and can be deep (stacked).
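The feedback effect described above can be shown without any neural network. The sketch below is a toy illustration only: `one_step_model` is a hypothetical, slightly biased one-step predictor (it is neither OP's LSTM nor a seq2seq model), applied once with the true previous value at every step and once recursively on its own outputs.

```python
def one_step_model(x):
    """Hypothetical one-step predictor with a small bias.

    The true series below increases by 1.0 per step; this model adds 0.9.
    """
    return x + 0.9


true_series = [float(t) for t in range(21)]  # 0.0, 1.0, ..., 20.0

# Point-by-point: every prediction starts from the TRUE previous value.
point_by_point = [one_step_model(true_series[t - 1]) for t in range(1, 21)]

# Recursive: every prediction starts from the PREVIOUS PREDICTION.
recursive = []
current = true_series[0]
for _ in range(1, 21):
    current = one_step_model(current)
    recursive.append(current)

point_errors = [abs(p - a) for p, a in zip(point_by_point, true_series[1:])]
recursive_errors = [abs(p - a) for p, a in zip(recursive, true_series[1:])]

print(f"point-by-point error at step 20: {point_errors[-1]:.2f}")
print(f"recursive error at step 20:      {recursive_errors[-1]:.2f}")
```

The per-step error stays constant when true data is fed in, but grows with the horizon once the model consumes its own predictions. Whether a seq2seq decoder avoids this in practice depends on how it is trained and decoded, which this toy example does not address.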

luanton 7 years ago | |

https://news.ycombinator.com/item?id=17902967

The aim of this post is to explain why sequence to sequence models appear to perform better than "many to one" RNNs on signal prediction problems. It also describes an implementation of a sequence 2 sequence model using the Keras API.

djhworld 7 years ago |

I'm currently learning machine learning at the most basic level, but this is the sort of stuff I want to work towards.

I deal with time series data a lot at work. I work in broadcasting/media, and 99% of the time the data is fairly "predictable" and follows a regular daily pattern, peppered with the odd spike during big, unpredictable news events.

dafrie 7 years ago | |

A year ago, the original blog post [1] (it was recently updated, and the updated version is the one linked here on HN) helped me with a semester thesis, where I quite successfully used LSTMs for short-term electricity load forecasting, which also has very strong daily, weekly and seasonal patterns. I used multiple features/variables such as calendar and weather data and found the LSTM models to easily beat ARIMA/TBATS forecasts.

You can find the code repo on my GitHub [2], but please bear with the code quality. I only have an economics background, so my coding experience is fairly limited :)

[1] http://www.jakob-aungiers.com/articles/a/LSTM-Neural-Network...

[2] https://github.com/dafrie/lstm-load-forecasting

md2be 7 years ago | |

Time series analysis requires the data to be stationary.

dafrie 7 years ago | | |

Well, I don't want to be pedantic, but don't you rather mean "Most TSA MODELS require data to be stationary"? My experience has been that practical TSA is often about how to deal with non-stationarity (testing, differencing, smoothing...), which is often not a trivial task...
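As a small illustration of the differencing step mentioned above, here is a sketch on two made-up series (both are assumptions for the example): a linear trend, which a first difference turns into a constant, and a repeating period-4 pattern, which a lag-4 difference removes. Testing for stationarity and smoothing are left out.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up series with a linear trend: 3, 5, 7, ...
trend = [3.0 + 2.0 * t for t in range(10)]
detrended = difference(trend)

# Made-up series with a repeating period-4 pattern.
seasonal = [10.0, 20.0, 30.0, 40.0] * 3
deseasonalized = difference(seasonal, lag=4)

print(detrended)
print(deseasonalized)
```

Real data rarely behaves this cleanly, which is the non-trivial part: you usually have to decide how many differences to take, at which lags, and then check the result.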

daviddumenil 7 years ago |

Could this approach be applied to a metric monitoring framework to give earlier/more accurate notifications of when a threshold would be crossed?

Typically these are triggered when e.g. 90% of a threshold has been crossed.
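To make the question concrete, here is a minimal sketch of forecast-based alerting. It deliberately swaps the LSTM for the simplest possible forecaster, a least-squares line through recent samples, so it illustrates the alerting idea rather than the article's method. The function name, the sample values and the upper threshold of 90 are all assumptions.

```python
def steps_until_threshold(values, threshold):
    """Estimate how many steps after the last sample an upper threshold is hit.

    Fits a least-squares line to equally spaced samples and extrapolates.
    Returns None if the fitted trend is flat or moving away from the
    threshold, and 0.0 if the fitted line is already at or above it.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    if slope <= 0:
        return None
    crossing_index = (threshold - intercept) / slope
    return max(0.0, crossing_index - (n - 1))


# Assumed example: disk usage in percent, sampled at a fixed interval.
usage = [50.0, 52.0, 54.0, 56.0, 58.0]
eta = steps_until_threshold(usage, threshold=90.0)
print(f"estimated steps until 90%: {eta}")
```

An alert on `eta` falling below some lead time fires earlier than a static "90% of threshold" rule whenever the metric is climbing steadily. A learned model would replace the straight line, not the alerting logic around it.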

fooker 7 years ago |

So, curve fitting?

shawn 7 years ago |

If anyone is looking to get into machine learning, I've found "Introduction to Data Mining" very useful:

https://news.ycombinator.com/item?id=17808349

First edition: http://www.uokufa.edu.iq/staff/ehsanali/Tan.pdf

Also see "Mining of Massive Datasets", usually available at this link, though it seems to be down: http://infolab.stanford.edu/~ullman/mmds/book.pdf

Which leads me to another point: Many of these books cost $100+. If you don't have those kinds of resources, try Library Genesis. It's been very helpful for getting started.