Are ML and Statistics Complementary? [pdf](ics.uci.edu) |
Unless your startup's core strategy involves machine learning, statistics tends to come in handier than machine learning in the early days. Most likely, what moves your company is not a data product built atop machine learning models but the ability to draw less wrong conclusions from your data, which is the very definition of statistics. Also, in the early days of a startup, you experience small/missing data problems: you have very few customers and very incomplete datasets with a lot of gotchas. Interpreting such bad data is no small feat, but it's definitely different from training your Random Forest model against millions of observations.
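To make the small-data point concrete, here is a minimal sketch of how much uncertainty a handful of customers leaves behind. It assumes a conversion-style yes/no outcome, a uniform Beta(1, 1) prior, and a Monte Carlo approximation of the posterior interval; the function name and the example counts are hypothetical, not from the thread.

```python
import random


def beta_posterior_interval(successes, trials, level=0.95, draws=20000, seed=0):
    """Approximate a central credible interval for a rate.

    Assumes a Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + failures). The interval is estimated
    from sorted Monte Carlo draws rather than an exact quantile function.
    """
    rng = random.Random(seed)
    a = 1 + successes
    b = 1 + trials - successes
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return samples[lo_idx], samples[hi_idx]


if __name__ == "__main__":
    # Same observed rate (30%), very different amounts of evidence.
    for successes, trials in [(3, 10), (300, 1000)]:
        lo, hi = beta_posterior_interval(successes, trials)
        print(f"{successes}/{trials}: roughly {lo:.2f} to {hi:.2f}")
```

With ten customers the interval is wide enough that "30% convert" says very little; with a thousand it tightens considerably. That gap is the kind of thing a statistical mindset keeps visible.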
Great read for anyone interested in the debate.
Probabilistic programming is already a hint of this. The most general class of probability distributions is that of non-deterministic programs. ML is just a quick and dirty way to write these programs.
The correct complement to machine learning is cryptography -- trying to intentionally build things that are provably intractable to reverse engineer.
I like the complement with cryptography. I would add another coding method: compression - Approximating the simplest model with explanatory power.
I find the machine learning approach is far more humble. It starts out by saying that I, as a domain expert or a statistician, probably don't know any better than a lay person what is going to work for prediction or how to best attribute efficacy for explanation. Instead of coming at the problem from a position of hubris, that me and my stats background know what to do, I will instead try to arrive at an algorithmic solution that has provable inference properties, and then allow it to work and commit to it.
Either side can lead to failings if you just try to throw an off-the-shelf method at a problem without thinking, but there's a difference between criticizing the naivety with which a given practitioner uses the method versus criticizing the method itself.
When we look at the methods themselves I see much more care, humility, and effort to avoid statistical fallacies in the machine learning world. I see a lot of sloppy hacks and from-first-principles-invalid (like NHST) approaches on the 'statistics' side. And even when we consider how practitioners use them, both sides are pretty much equally guilty of trying to just throw methods at a problem like a black box. Machine learning is no more of a black box than a garbage-can regression from which t-stats will be used for model selection. However, all of the notorious misuses of p-values and conflation over policy questions (questions for which a conditional posterior is necessarily required, but for which likelihood functions are substituted as a proxy for the posterior) seem uniquely problematic for the 'statistics' side.
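For readers who want the likelihood-versus-posterior distinction spelled out, the standard textbook statement is below. The symbols are my notation, not the commenter's: H is a hypothesis, D the data, T a test statistic, and the p-value is shown in its one-sided form.

```latex
% Posterior probability of a hypothesis H given data D (Bayes' theorem):
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}

% A p-value conditions the other way round, on the null hypothesis H_0:
p = P\bigl(T \ge t_{\mathrm{obs}} \mid H_0\bigr)
```

A policy question asks for the left-hand side of the first line. The likelihood P(D | H) and the p-value both condition on the hypothesis, so neither can stand in for the posterior without a prior P(H).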
Three papers that I recommend for this sort of discussion are:
[1] "Bayesian estimation supersedes the t-test" by Kruschke, http://www.indiana.edu/~kruschke/BEST/BEST.pdf
[2] "Statistical Modeling: The Two Cultures" by Breiman, https://projecteuclid.org/euclid.ss/1009213726
[3] "Let's put the garbage-can regressions and garbage-can probits where they belong" by Achen, http://www.columbia.edu/~gjw10/achen04.pdf
Besides, it is easy to get a wrong explanation and, as Vladimir Vapnik observed in his three metaphors for a complex world, http://www.lancaster.ac.uk/users/esqn/windsor04/handouts/vap... , "actions based on your understanding of God’s thoughts can bring you to catastrophe".
SVMs were so popular pretty much because they had a firm theoretical basis on which they were designed (or "cute math", as deep learners may call it). As Patrick Winston would ask his students (paraphrasing): "Did God really mean it this way, or did humans create it because it was useful to them?" Except maybe for the LSTM, deep learning models are not God-given. We use them because, in practice, they beat other modeling techniques. Now we need to find the theoretical grounding to explain why they work so well, and allow for better model interpretability, so these models can more readily be deployed in health care and under regulation.
If regulations require such explanations, the end result will be fake stories, like the ones parents tell their children: that the Moon does not fall because it is nailed to the sky.
The problem is to replace inept employees who believe "business decisions" are not scientific questions, so that over time there is a convergence to using the scientific method, with legitimate statistical rigor, when making a so-called business decision.
Generally speaking, the only people who want for there to be a distinction between a "business question" and a "scientific question" are people who can profit from the political manipulation that becomes possible once a question is decoupled from technological and statistical rigor. Once that decoupling happens, you can use almost anything as the basis of a decision, and you can secure blame insurance against almost any outcome.
This is why many of the experiments testing whether prediction markets, when used internally to a company, can force projects to be completed on time and under budget are generally met with extreme resistance from managers even when they are resounding successes.
The managers don't care if the projects are delivered on time or under budget. What they care about is being able to use political tools to argue for bonuses, create pockets of job security, backstab colleagues, block opposing coalitions within the firm. You can't do that stuff if everyone is expected to be scientific, so you have to introduce the arbitrary buzzword "business" into the mix, and start demanding nonsense stuff like "actionable insight" -- things that are intentionally not scientifically rigorous to ensure there is room for pliable political manipulation for self-serving and/or rent-seeking executives, all with plausible deniability that it's supposed to be "quantitative."
> machine learning is more concerned with making predictions, even if the prediction can not be explained very well (a.k.a. “a blackbox prediction”)
So in your example: an algo may explain that a car slows down before taking a turn because otherwise it would likely crash. It may even get to a threshold ("under these weather conditions, anything over 55 mph is unsafe when taking a turn of such and such degree"). Statistics can help with that.
Welling is not asking for deep learning models to explain why a person got cancer, but to explain their reasoning when they diagnose a person with cancer ("I am confident, because in a random population of 1,000 other patients, these variables are within ..."). Statistics can help with that. It aligns with the statistician's mindset and tool set.
Regulations are cheated even with these kinds of explanations, but that is another story (black box models may provide some plausible deniability).
> Thus, for many applications, in order to successfully interact with humans, machines will need to explain their reasoning, including some quantification of confidence, to humans.
No doubt there are cases when an explanation is easy. Often this is because we have a very solid model, like the physics of a car. In fact, since we know the model, we do not need an explanation; we must demand that the algorithm follows the model or declare it unfit.
But how can we expect an explanation for behavior in a critical situation on the road that was not explicitly programmed, when the algorithm decided to turn by a particular degree based on a non-trivial inference? Similarly, when an algorithm decides whether a patient needs an emergency operation or can wait, why should we expect a simple explanation, especially for a patient with rare conditions, when the algorithm again must perform an inference, not a deduction from 1,000 very similar cases?
Yet another way is that data analytics platforms are built from the ground up with hard-wired priority given to scaling out the ability to test multiple hypotheses without any attempt to correct the significance metrics for the multiplicity of testing (or, even subtler, for subjective researcher degrees of freedom that further affect the multiplicity of testing). Often, the business stakeholders who are demanding such an "analytics" system aren't even aware of the statistical fallacies they are inexorably baking right into the platform itself (one might call this the "Hadoop disease", though it's not strictly the fault of Hadoop or Hadoop-like tools).
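As a rough illustration of the multiplicity problem described above, the sketch below applies two textbook corrections, Bonferroni and Benjamini-Hochberg, to a simulated dashboard in which every metric is pure noise. The simulation setup and all function names are hypothetical; the Benjamini-Hochberg guarantee assumes independent (or positively dependent) tests.

```python
import random


def bonferroni(p_values, alpha=0.05):
    """Reject test i only if p_i <= alpha / m (controls family-wise error)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= k/m * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= max_rank
    return rejected


def simulate_null_p_values(n_tests, seed=0):
    """Hypothetical dashboard where no effect is real: p-values are uniform."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_tests)]


if __name__ == "__main__":
    p_values = simulate_null_p_values(1000)
    naive = sum(p <= 0.05 for p in p_values)
    print("uncorrected 'discoveries':", naive)
    print("after Bonferroni:", sum(bonferroni(p_values)))
    print("after Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
```

With 1,000 null tests at a 0.05 threshold, about 50 uncorrected "significant" results are expected by chance alone. A platform that makes testing cheap without applying something like the above makes those false discoveries cheap too.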
At any rate, I would say in the current climate of "analytics" in business environments, to a good first approximation, one can assume that "make it easy to understand" is exactly equivalent to "throw out any and all difficult yet rigorous science until the thing is cheap and easy, and then just use that."
What I've seen around this is analytics professionals hired under the pretense that their skills to produce accurate scientific conclusions will be used for the good of a business, yet having their conclusions and efforts dismissed for no good reason other than decision makers 'didn't get it' or otherwise just refused to heed the results. So why did they hire experts in the first place, then? To lend the company credibility that it doesn't really deserve? I'm sure lots of the reason for this type of thing is politically motivated, as you previously mentioned.
I do not know enough about statistics to make a (negative) quality statement about it. I know a bit more about machine learning though, and there I also see things like: Picking the most favorable cross-validation evaluation metric, comparing to "state-of-the-art" while ignoring the real SotA, generating your own data sets instead of using real-life data, improving performance by "reverse engineering" the data sets, reporting only on problems where your algo works, and other such tricks. I believe you when you say much the same is happening for statisticians.
Maybe it was my choice of words (careful, sober). I think it's fair to say that (especially applied) machine learners care more about the result, and less about how they got to that result. Cowboys, in the most positive sense of the word. I retraced where I got the cliff analogy. It's from Caruana in his video "Intelligible Machine Learning Models for Health Care" https://vimeo.com/125940125 @37:30.
"We are going too far. I think that our models are a little more complicated and higher variance than they should be. And what we really want to do is to be somewhere in the middle. We want this guy to stop and we want that statistician to get there, together we will find an optimal point, but we are not there yet."
Even if your definition of "statistician" only applied to Wasserman or Gelman types, I'd still say that the machine learning folks of the same level exhibit hugely more caution about the theoretical properties of their models (not a knock against Wasserman or Gelman, just a property of the rigor of e.g. PAC learning versus some ad hoc hierarchical model).
As for the comparison with ML, I think a large chunk of the ML community aims for (with good reason) evidence of predictive capacity rather than theoretical soundness. Not everyone. I'll grant that a good portion care deeply about theory. Look at the arguments between SVM folks and "Neural" Nets folks.
It comes down to a difference in focus. Statistics cares about causal inference. Machine Learning cares about prediction. Nothing wrong with either, but their techniques are sometimes ill-suited for the other purpose.
I would just add a big third category that probably encompasses the vast majority of people who "work in statistics": people who are interested in neither causal inference nor predictive efficacy but in a much less rigorous idea of "explanatory modeling" -- and this group is generally very poor with statistical hygiene.