Why Economic Models are Always Wrong (scientificamerican.com)
http://en.wikipedia.org/wiki/Overfitting
It is quite a newbie mistake for a scientist to be surprised by it. It affects every kind of modelling.
I thought maybe this article would talk about why economic models are worse than other kinds of models. There are issues that arise when applying scientific models to the economy: even when good models are used to predict markets, using the models themselves to trade distorts the markets. When multiple parties use good models to compete in markets, they distort the markets in a way that destroys the predictive power of the models.
There is a great explanation by Glen Whitman of Agoraphilia that uses grocery line wait time predictions as a metaphor for this:
http://agoraphilia.blogspot.com/2005/03/doing-lines.html
See also:
I think radical underspecification is much more likely than overspecification, really.
(Since I encounter this a lot, let me pre-answer one question, which is "What if only 300 bits really matter and the rest don't matter as much?" and the answer is that the term bit in information theory encompasses that idea already. If you have ten "bits", but they tend to be highly correlated together such that they are usually all 0 or all 1, you in fact don't have ten bits in information theory. Ten bits are, by definition, ten fully-independent true or false values. Bits-in-memory are not the same as information-theory-bits. A real system with 10,000 bits cannot, pretty much by definition, be modeled by 100 bits. If it could, it would be a system with only 100 bits in the first place. Information theory cares about the true degrees of freedom available, not about your particular representation of the system.)
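To make the bits-in-memory versus information-theory-bits distinction concrete, here is a minimal sketch that measures Shannon entropy of two hypothetical ten-bit sources. The sample sets and the `entropy_bits` helper are assumptions made up for illustration, and the perfectly correlated source is an idealized version of the "usually all 0 or all 1" case.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples."""
    total = len(samples)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(samples).values())

# Ten stored bits that are always all 0 or all 1, each half of the time.
correlated = [(0,) * 10, (1,) * 10] * 512

# Ten stored bits that take each of the 2**10 patterns exactly once.
independent = [tuple((i >> k) & 1 for k in range(10)) for i in range(1024)]

correlated_entropy = entropy_bits(correlated)
independent_entropy = entropy_bits(independent)
print(correlated_entropy, independent_entropy)
```

Both sources occupy ten bits of storage, but the first carries about one bit of information and the second about ten.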
This article speaks of the separate problem that economic models are not evaluated in any sort of experiments, and thus are prone to overfitting. This makes them unlikely to even approximate well.
Consider a basic multilayer perceptron-style neural network. Overfitting is a well-understood problem in training an MLP. We work around it by training on a part of the data, and then measuring its accuracy on another part -- much as Carter did in his analysis. If the accuracy is poor, something is adjusted: the size of the hidden layer can be increased, the training set expanded, the duration of the training increased or decreased, or the MLP model discarded entirely.
If enlarging the training set or shortening the training improves accuracy against the test set, this means we had an overfitting problem.
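A rough sketch of that held-out check follows. To stay dependency-free it does not train an MLP: a 1-nearest-neighbour "memoriser" stands in for an over-trained network and a straight-line fit stands in for a simpler model. The data generator (y = 2x + 1 plus unit Gaussian noise) and all function names are assumptions for illustration.

```python
import random

random.seed(0)

def make_data(n):
    """Hypothetical data: y = 2x + 1 plus unit Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + random.gauss(0, 1)) for x in xs]

train, test = make_data(500), make_data(500)

def fit_line(data):
    """Ordinary least-squares straight line."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx
    return lambda x: slope * x + (my - slope * mx)

def fit_memoriser(data):
    """1-nearest-neighbour: return the y of the closest training x."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

line, memoriser = fit_line(train), fit_memoriser(train)
for name, model in (("line", line), ("memoriser", memoriser)):
    print(name, "train:", mse(model, train), "test:", mse(model, test))
```

The memoriser reproduces the training set exactly, so a large gap between its training and test error is the overfitting signature; the line has a small gap.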
If you require notions such as "the true minimum number of bits" to be practical, you have to put additional restrictions on the language by which you describe the system -- such as your probability model. The representation does matter.
In the (ML) terms I'm used to, this is an error surface with many local minima. That is, if you start out with a guess for the parameters and try to progressively optimize the cost function to reach a point where the error is lowest (i.e. where the gradient of the error is 0), where you end up is extremely dependent on where you start out. When you find a local minimum, you have found a point where there is no nearby point that is better, but there may be some other point (or many) somewhere else in the model that is better. The very best one is the global minimum.
This is a well known problem in ML for non-convex error functions, and there are various methods for trying to avoid local minima and reach a global minimum.
But this case is actually worse than that -- it is an error surface with many global minima. Each is effectively a perfect fit for the data to date, but each gives different predictions about future data. Since each function is a perfect fit, it is literally impossible to predict the proper parameters. Which is what underspecification is.
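A toy illustration of "many global minima", under assumptions of my own choosing: a three-parameter model observed only at x = 0, 1, 2, where the third parameter has no effect on the fit. The model form and the numbers are hypothetical and are not taken from the article.

```python
def model(x, a, b, c):
    """Hypothetical model; the c term vanishes at x = 0, 1 and 2."""
    return a + b * x + c * x * (x - 1) * (x - 2)

# Observations generated by the "true" parameters a=1, b=2, c=1.
observed = [(x, model(x, 1, 2, 1)) for x in (0, 1, 2)]

def squared_error(params):
    return sum((model(x, *params) - y) ** 2 for x, y in observed)

candidates = [(1, 2, c) for c in (-1, 0, 1, 2)]
errors = [squared_error(p) for p in candidates]
forecasts = [model(3, *p) for p in candidates]

for p, e, f in zip(candidates, errors, forecasts):
    print("params", p, "error on history", e, "forecast at x=3", f)
```

Every candidate has zero error on the history, yet the forecasts at x = 3 range from 1 to 19, so no amount of optimization on this data can pick out c.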
If I'm correct, though, the OP is talking about creating a model with 100 bits of specification, and then creating a model of that model and trying to train those 100 bits, which seems like it should be a more tractable problem.
To me it sounds more like he's just rediscovered the fact that when you try to set a model's parameters based on a limited set of observations (he generated 3 years' worth of data from his model, then trained parameters based on that data), there's a lot of uncertainty left over, and you won't necessarily get the right model.
This is quite obvious - if your observations only cover a limited portion of phase space, then you shouldn't be surprised that in a complex enough model multiple parameterizations will fit the observations equally well. You just didn't have enough freaking data to distinguish between the models! In all branches of science, we deal with this problem, and the solution is that you try to find the simplest possible model that accurately explains your data (or, as is happening in physics right now, you try to enumerate the next level of theories that reproduce current data so that you can figure out which experiments you'll need to run to distinguish between them).
So this doesn't hint at any sort of fundamental flaw with modeling in general (and yeegads, it has even less to do with finance...) - it's just that he didn't have enough data to infer a proper parameterization. Don't build complex models and expect to train them on small datasets...
It doesn't look like overfitting to me. The input data is perfect, and the model is perfect, so it doesn't look like overfitting can occur.
I agree with many of the commenters in this article. This should be common knowledge.
I also, like many commenters, couldn't help but think of model-based climate predictions.
On the other hand, you can model by building conceptual models, calibrating them by hand (using computer methods for the number crunching only) and reasoning about divergences between model results and observed data rather than computing them away with raw power. This is what modeling should be about - a tool for understanding.
(this topic is dear to my heart - I have had this discussion so often. Models are not crystal balls, they are tools for understanding processes. Which is why I am so desperate when another economist, mathematician or computer scientist stands up and wants to model processes that require understanding with their barbaric brute force statistical methods to not have to study things that are outside of their comfort zone. When all you have is a hammer etc.)
IMHO, it was much better when most stock market decisions were mostly based on "fundamentals". Because that way the market was incentivising sound business decisions.
Actually, most traders' models do take market impact into account. If you had a perfect model for the market, I'm pretty sure that you (as a participant) would be included. In fact, your own actions are the easiest part of the model to get right, because you control them entirely.
I don't recall such a period. Is there a particular interval you're thinking of?
The solution to the hypothetical problem posed in the article is to separate the historical dataset into training and testing groups. The models should be generated while only 'seeing' the training data. You will, as the author mentioned, get many models that appear to fit the data. Most of these models will be garbage.
The fun part is when the testing data is introduced against the many models generated above. Most of the models will completely bomb, but a handful may actually predict the previously 'unseen' testing data with high accuracy. Those few models which pass the testing stage are the ones worth their salt.
Due to the self-aware nature of the markets, successful models probably will not be true indefinitely, but it's very possible they may be true long enough to be profitable. The less known your successful models are, the longer they will be successful predictors of the market. Hence why successful quant funds are notoriously secretive with their approaches. Open source would never work in finance.
That's not a problem if you take it as an incentive to improve how much you know about the real world. It's a problem when you put the model before the people, and say that "models got us in trouble because of calibration problems".
An economic crisis is not an unavoidable natural disaster, it's people screwing up other people.
You might be fooled into thinking it says something useful if you don't know what a 'model' means in any science.
So what is the point of the article? The author is trying to sell you his book, where he most probably makes people who don't know anything about economics feel good or pushes an ideological agenda.
All his arguments apply equally well to any scientific models which require fitting, in geophysics (as he acknowledges), atmosphere/ocean science, climate modelling, most of biology, ecology, etc.
Why he singled out economics is beyond me.
For instance, a great part of growth in the last 100 years has come from man's ability to harness energy from fossil fuels. If your timeline is narrow enough, you can disregard the point that fossil fuels are not unlimited, and project a continued rise in extraction.
Another example is the baby boom, and the introduction of women into the paid work force which led to continued rise in property prices.
One more is the introduction of laws which suddenly compel people to invest in the stock market. It leads to short-term asset inflation but generally makes for worse investment all round.
That said, it is fitting that an economy is well modelled using the principles of hydraulics. See http://en.wikipedia.org/wiki/MONIAC_Computer
Is this accurate? I remember reading that all the alarms were going off, they were just ignored or the models were "adjusted".
Ballester, P. J., & Carter, J. N. (2006). Characterising the parameter space of a highly nonlinear inverse problem. Inverse Problems in Science and Engineering, 14(2), 171-191. doi:10.1080/17415970500258162.
Ballester, P., & Carter, J. (2007). A parallel real-coded genetic algorithm for history matching and its application to a real petroleum reservoir. Journal of Petroleum Science and Engineering, 59(3-4), 157-168. doi:10.1016/j.petrol.2007.03.012.
All models are wrong. Some are useful.
In an economy, some actors have an interest in defeating the prediction, even if it is costly for them: it is often valuable to be unpredictable.
It's really no different than the meteorology simulations in the 60's that first discovered the butterfly effect.
http://en.wikipedia.org/wiki/Butterfly_effect#Origin_of_the_...
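For anyone who wants to see that sensitivity directly, here is a small sketch. It uses the logistic map as a stand-in chaotic system, not the weather model from the 60's, and the starting value and perturbation size are arbitrary assumptions.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), which is chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)  # nudge the start in the 10th decimal
gaps = [abs(p - q) for p, q in zip(a, b)]

print("gap after 5 steps:", gaps[5])
print("largest gap within 60 steps:", max(gaps))
```

The two runs agree closely at first and then separate completely, which is why a tiny rounding difference in the initial conditions was enough to change a long-range forecast.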
http://en.wikipedia.org/wiki/Cross-validation_(statistics)
Also who the heck is Wilmott? He just pops up in the last paragraph with no introduction.
Greek mythology resembled a science in those same two ways.
This is absolutely false, except at the micro level. In terms of policy, economics can only inform us of the relative costs of various alternatives. It cannot tell us which alternative is right.
Consider the question of free trade. Virtually every economist agrees that free trade improves total efficiency (viz the Law of Comparative Advantage). However, at the margins, it may harm some individuals. Economics cannot tell us if it is morally right to incur those individualized harms in order to improve the lot of the whole, nor what if anything we should do to make whole those who were affected.
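The efficiency claim can be checked with simple arithmetic. The sketch below uses the two-country, two-good labour costs usually quoted from Ricardo's textbook example (treat the exact figures as illustrative assumptions) and compares total output when each country makes one unit of each good against total output when each specializes.

```python
# Labour needed per unit of each good (illustrative figures).
hours = {
    "England": {"cloth": 100, "wine": 120},
    "Portugal": {"cloth": 90, "wine": 80},
}

def specialty(country):
    """Good in which the country has the lower opportunity cost."""
    other = next(c for c in hours if c != country)
    own_ratio = hours[country]["cloth"] / hours[country]["wine"]
    other_ratio = hours[other]["cloth"] / hours[other]["wine"]
    return "cloth" if own_ratio < other_ratio else "wine"

# Without trade, each country makes one unit of each good.
autarky = {"cloth": 2.0, "wine": 2.0}

# With trade, each country spends the same total labour on its specialty.
trade = {"cloth": 0.0, "wine": 0.0}
for country, cost in hours.items():
    labour = cost["cloth"] + cost["wine"]
    good = specialty(country)
    trade[good] += labour / cost[good]

print("autarky:", autarky)
print("trade:  ", trade)
```

Total output of both goods rises with the same labour, even though Portugal is better at producing both. The calculation says nothing about who ends up with the gains, which is the part economics cannot settle.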
Economics makes predictions, giving us insights. Our morals are then needed to prescribe courses of action.
Of course, a great deal of physics is ultimately about trying to make the parameters go away by explaining them using more fundamental models. But at any given level of abstraction, you'll have parts of the model that are reasoned intuitively, and parts of the model that just are the way they are, for no good particular reason other than that's what you happen to get by measuring.
"Essentially, all models are wrong, but some are useful" -George E.P. Box
This is partly correct but, in general, too strong.
Am I commenting on the OP? Not really!
Why too strong? Because it assumes too little and sometimes more information is available and with the extra information a 'testing data set' may not be needed.
Why are 'testing data sets' important? If about all you have to go on is the 'historical data' and then are just searching for a 'model' based mostly just on what 'fits' the data, then, sure, a 'testing data set' will likely be just crucial. One way to get such a 'testing data set' is to partition the 'historical data' into two parts, use the first to 'fit' a model and the second to 'test' the fit. Of course, there are still risks: If fit 10,000 models, find 10 that fit well and test each of the 10 with the 'testing data set' and accept the model that fits the testing data the best, then still may have some problems from a 'generalized version of overfitting'! As I recall, there has been some mathematical statistics to address this issue.
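The 'generalized version of overfitting' is easy to reproduce. In this sketch every one of 10,000 hypothetical 'models' is pure noise (a coin flip per prediction), the labels are coin flips too, and we keep whichever model scores best on a 50-point testing set. All sizes and the seed are arbitrary assumptions.

```python
import random

random.seed(1)

N_MODELS, N_TEST, N_FRESH = 10_000, 50, 1_000

def coin_flips(n):
    return [random.random() < 0.5 for _ in range(n)]

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

test_labels = coin_flips(N_TEST)
fresh_labels = coin_flips(N_FRESH)

# Track the best test score and how that same model does on fresh data.
best_test_acc, best_fresh_acc = 0.0, 0.0
for _ in range(N_MODELS):
    test_acc = accuracy(coin_flips(N_TEST), test_labels)
    if test_acc > best_test_acc:
        best_test_acc = test_acc
        # The winner is still guessing at random on data it was not picked on.
        best_fresh_acc = accuracy(coin_flips(N_FRESH), fresh_labels)

print("winner on testing set:", best_test_acc)
print("winner on fresh data: ", best_fresh_acc)
```

The winner looks impressive on the testing set only because it was selected on it; on fresh data it falls back to roughly chance.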
Where can get by without a 'testing data set'? Broadly if know more than the meager assumptions common in 'machine learning' or 'curve fitting'.
What more can be known? In principle the variety is large.
Examples? Sure: Broadly just simple, old 'regression analysis', looked at as statistical estimation, makes a long list of quite detailed assumptions. E.g., we assume that there is a model that works and that we know in good detail the form of that model. We assume a lot about the 'historical data' we have, e.g., we assume 'homoscedasticity' and mean zero, independent and identically distributed (i.i.d.) Gaussian for the errors. We make some assumptions about dimensionality (e.g., to get around 'overfitting'). Then the usual derivations give minimum variance, unbiased estimates of the unknown parameters and more, all without any use of 'testing data'. "Look Ma, no testing data required!".
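As a small sketch of what those derivations deliver, here are the closed-form least-squares estimates for a straight line, together with a standard error for the slope that is valid only under the i.i.d. Gaussian error assumptions listed above. The five data points are made up for illustration.

```python
import math

# Hypothetical observations, roughly y = 1 + 2x with small errors.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Unbiased estimate of the error variance: n - 2 degrees of freedom.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sigma2 = sum(r * r for r in residuals) / (n - 2)
slope_se = math.sqrt(sigma2 / sxx)

print("slope", slope, "intercept", intercept, "slope std. error", slope_se)
```

The uncertainty on the slope comes out of the assumptions and the fitted residuals alone; no data is held back for testing.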
"Yes, son, but as your father kept telling you, a LOT of assumptions are required, and the assumptions are not all easy to verify. Or the regression derivations are a nice logical trip from island A to island B we would like to get to but we don't always know how to get to island A.".
Other examples? Sure: Calculate the trajectory of a space craft doing 'slingshots' in the inner solar system and then reaching, say, Saturn. We start with Newton's second law, his law of gravity, maybe a little about the solar wind, a lot of details about the orbits of the planets, and do some good numerical work with an initial value problem of an ordinary differential equation. We build a 'model' but don't really 'fit for parameters' or use 'historical data' and have no real use for 'testing data'. Why? Because we believe in Newton's laws and our numerical work. A 'model'? Yes. Fitting 'parameters'? No.
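A heavily simplified sketch of that kind of calculation: a single body orbiting one point mass, integrated as an initial value problem with the velocity Verlet scheme. The units (GM = 1), the circular starting orbit and the step count are assumptions for illustration; a real mission calculation involves many bodies and far more care.

```python
import math

GM = 1.0  # gravitational parameter, hypothetical units

def accel(x, y):
    """Newtonian gravity from a point mass at the origin."""
    r3 = math.hypot(x, y) ** 3
    return -GM * x / r3, -GM * y / r3

def propagate(state, dt, steps):
    """Velocity Verlet integration of the equations of motion."""
    x, y, vx, vy = state
    ax, ay = accel(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return x, y, vx, vy

steps = 10_000
period = 2 * math.pi  # circular orbit of radius 1 when GM = 1
x, y, vx, vy = propagate((1.0, 0.0, 0.0, 1.0), period / steps, steps)
miss = math.hypot(x - 1.0, y)
print("distance from start after one period:", miss)
```

Nothing here is fitted to past observations. The only inputs are the law of gravity and the initial state, and the check is that the body returns to its starting point after one period.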
Can there be a connection between space craft trajectories and economic models? Sure: Bring more assumptions than just curve fitting. An example is to bring, essentially, accounting. So, then can get a Leontief input/output model. We bring basically just accounting data and not other historical data, do no real 'parameter' estimation, and use no 'testing' data. If the input data is noisy, then, sure, so will be the output and we might do some work with confidence intervals. Still we don't check with 'testing data'.
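For concreteness, a two-sector sketch of the input/output calculation, solving x = Ax + d for total output x. The coefficient matrix `A` and the final demand vector are invented numbers, and the 2x2 solver is written out by hand to keep the example dependency-free.

```python
def solve_2x2(m, rhs):
    """Solve a 2x2 linear system m @ x = rhs by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(d * rhs[0] - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det]

# A[i][j]: units of sector i's output used to make one unit of sector j's.
A = [[0.2, 0.3],
     [0.4, 0.1]]
demand = [10.0, 20.0]  # final demand for each sector's output

# Total output x must cover intermediate use plus final demand:
# x = A x + d, which rearranges to (I - A) x = d.
I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)]
             for i in range(2)]
output = solve_2x2(I_minus_A, demand)
print("required total output per sector:", output)
```

The coefficients come from accounting records rather than from fitting, so there is nothing to hold out as testing data; noise in `A` or `demand` simply propagates into `output`.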
More examples? Sure: The broad field, with many techniques, of distribution-free statistical hypothesis testing is based on historical data and some assumptions and really needs no testing data. What is obtained is much like a 'model' where can plug in new data and get the intended results. The assumptions are typically that the data is i.i.d.
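One member of that family is the exact permutation test, sketched below. It assumes only that, under the null hypothesis, the pooled observations are exchangeable. The two samples are made-up numbers chosen so the result is easy to check by hand.

```python
from itertools import combinations

# Hypothetical measurements from two groups.
treated = [6, 7, 8, 9, 10]
control = [1, 2, 3, 4, 5]

pooled = treated + control
observed = sum(treated)  # with fixed group sizes, equivalent to the mean

# Under the null hypothesis every way of choosing which five values
# are labelled "treated" is equally likely, so enumerate all of them.
splits = list(combinations(pooled, len(treated)))
extreme = sum(1 for group in splits if sum(group) >= observed)
p_value = extreme / len(splits)

print(len(splits), "splits,", extreme, "at least as extreme, p =", p_value)
```

No distributional form is assumed and no data is set aside; the p-value comes from counting relabellings of the historical data itself.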
Net, a lot can be done beyond the common approach of machine learning curve fitting.
He used a perfect model (of a hypothetical world) which had exactly the right parameters, and then he calibrated it using exactly correct data.
So I don't see how this could be underspecified or overfitted. Can you please explain?
"The information-theoretic argument demonstrate that a model cannot exactly match the reality unless it's as complex as the reality."
In this case he defined his model to be reality.
As far as overfitting goes, that applies when you have a parameterized general model and need to discover the correct parameters. You probably won't get the exact correct parameters; instead, you'll (hopefully) get parameters that approximate reality well.
More closely matching the training data can actually make it a worse approximation in the general case.
What if reality is self-similar at certain scales? You could generate something that resembles the whole from one part of it.