“The Unreasonable Effectiveness of Deep Learning Representations” (blog.insightdatascience.com)
For instance, 'wedding pictures'. A cake being cut; a cute kid throwing flower petals; a black-clad clergyman; a hand with a ring on it. Any human could categorize a pile of pictures into those that are in the 'wedding' category, and those that aren't. But no strategy based on weighting pixels is ever going to get there.
Likewise, 'cute' or 'scary' or 'funny'. And on and on.
That is what feed forward networks and back propagation do for us. So why do we keep using them?
Then there's the statistics of it all. What are we actually modeling? 'The real world', you say? Think again.
Data has to be changed and manipulated into i.i.d. form, or the algorithms won't work. How does an independent set of random variables give us a model of the actual dataset which is a very limited representation of the real world? It doesn't. It's modeling something else.
Okay, why don't we take dependence into account? Surely that would represent the real world better. Good question! (Shirley has nothing to do with it.)
It's because there is no formal definition of dependence in statistics. Let that sink in for a minute.
So the math needs work, statistics needs a revolution, and then we can begin to change AI enough for it to finally start making sense. Focus on explainable algorithms and on the actual ability to validate that what models generate makes sense and will not be unlawfully biased or have outliers that will cause harm.
There appears to be only one company that has something like this. But few actually care.
What? Statistical dependence (of random variables) is defined clearly and precisely.
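For reference, the standard definition: X and Y are independent when P(X in A, Y in B) = P(X in A) P(Y in B) for all measurable sets A and B, and dependent otherwise. For discrete variables that reduces to the joint distribution factorizing into its marginals. A minimal sketch of that check, where the function name `is_independent` and the two toy distributions are made up purely for illustration:

```python
from itertools import product


def is_independent(joint, tol=1e-12):
    """Check whether a discrete joint distribution factorizes.

    `joint` maps (x, y) pairs to probabilities; missing pairs count as 0.
    """
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginal distributions of X and Y.
    px = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    py = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    # Independent iff P(x, y) == P(x) * P(y) for every pair.
    return all(
        abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
        for x, y in product(xs, ys)
    )


# Illustrative data: two separate fair coins vs. one coin read twice.
two_fair_coins = {(a, b): 0.25 for a in "HT" for b in "HT"}
same_coin_twice = {("H", "H"): 0.5, ("T", "T"): 0.5}

print(is_independent(two_fair_coins))   # the joint factorizes
print(is_independent(same_coin_twice))  # the joint does not factorize
```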
"Data has to be changed and manipulated into i.i.d. form, or the algorithms won't work"
Neural networks don't use the iid assumption.
I downvoted you because it seems like you don't really know what you're talking about and you're currently the top post in the thread. Please don't spread misinformation.
They have magical ways of 'explaining' black box models.
But it's not what DARPA is pushing (box remains black), rather the opposite, illuminating what's inside the box, making it a transparent open box. So much so, that the models they make you can edit by hand, since they make sense (to mere humans). Has rather immense implications.
Here's their crappy website: https://optimizingmind.com
Finally! I thought I was alone (and stupid) for thinking like this.
Is there any literature or any meta-work that discusses the notion of probability itself? What is expectation? What is dependence?
There is a formal mathematical definition:
Let (\Omega, \mathcal{F}, P) be a probability space, and let X: \Omega -> S be a random variable taking values in some measurable space (S, \mathcal{S}).
Then, for real-valued X (S = \mathbb{R}), the expectation is E[X] = \int_\Omega X(\omega) dP(\omega)
In computer science terms, do an experiment with every possible random seed and average the outcome (set \Omega to be the set of all seeds, and set P to be the uniform measure on them).
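A rough sketch of that seed analogy, assuming a made-up experiment (one roll of a fair die) and a finite range of seeds standing in for "every possible seed". The names `experiment` and `expectation_over_seeds` are hypothetical, not from any library:

```python
import random


def experiment(rng):
    """Hypothetical experiment X(omega): roll one fair six-sided die."""
    return rng.randint(1, 6)


def expectation_over_seeds(experiment, seeds):
    """Average the outcome over all given seeds, each weighted equally.

    This plays the role of the integral of X over Omega with P uniform.
    """
    outcomes = [experiment(random.Random(seed)) for seed in seeds]
    return sum(outcomes) / len(outcomes)


# Assumption: 20,000 seeds as a stand-in for the full seed space.
estimate = expectation_over_seeds(experiment, range(20_000))
print(estimate)  # should land near the exact value (1+2+...+6)/6 = 3.5
```

With the full seed space this would be the expectation itself; with a subset it is only an estimate, which is why the result is near 3.5 rather than exactly 3.5.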
I would be surprised if Khan Academy didn't cover at least expectation.
probability was mastered far before computers were a thing
It might be their crappy website, but I don't feel like Optimizing Mind is likely to create a better solution than that DARPA project. Their single "Static Demo" shows the importance of various factors in a linear regression model ... but that isn't exactly revolutionary. There might be some value in nicely packaging this for decision makers who use linear regression models but don't know how they work, but I doubt that it scales to much larger models.
Also, in real-world statistical modeling, there's nuance. Just like for any assumption of a parametric model, the data not being iid doesn't mean that the model is 100% crap, it means that you can't draw specific conclusions about the quality of the model.
Which is fine, because maybe you don't care to draw those conclusions, anyway. One of the key differences between machine learning and traditional statistical analysis is that you aren't so worried about developing parsimonious models with well-defined parameters. You're typically just empirically interested in the model's predictive or descriptive utility. This difference isn't a result of one school being more principled and the other being more lackadaisical. It's reflective of differing goals: One approach was developed for use in scientific hypothesis testing, where your primary deliverable is (in the case of something like regression, anyway) the model's parameters, and its estimates are a means to evaluate those parameters. The other approach is used for modeling processes, where the primary deliverable is the estimates, and the parameters are a means to get those estimates.
But "Data has to be changed and manipulated into i.i.d. form, or the algorithms won't work. How does an independent set of random variables give us a model of the actual dataset which is a very limited representation of the real world?" strongly implies that the data itself should be decomposed into iid variables. While whitening ("manipulating into iid form") is a common preprocessing technique because it's simple and effective, that doesn't mean that learning algorithms wouldn't work without it. They'd just take a bit longer to arrive at the same result.
1. Bayesian probability is about degrees of belief. But that's always subjective and belief about what, if not probability? It's circular.
2. Frequentist probability says that, after X >> 1 runs of an experiment, an outcome with probability Y occurs about Y x X times. But that's only exact with an infinite number of runs, which never happens. And what are the odds of exactly Y x 1000 outcomes after 1000 runs? Again, that's circular.
My favourite way to think about probability is the multiverse kind:
3. Assuming there are an infinite number of fungible identical worlds, if a coin flip has a 50% chance of heads, it means observers in exactly half the worlds see heads. However, this isn't actually probability at all - from a god's eye view it's objectively certain what happens.
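On the question in point 2 about hitting exactly the expected count after 1000 runs: for a fair coin that number can be computed from the binomial distribution, and it comes out to only about 2.5%. A small sketch, assuming a fair coin and independent flips (`prob_exact_heads` is just an illustrative name):

```python
from math import comb


def prob_exact_heads(n, k):
    """Probability of exactly k heads in n independent fair coin flips."""
    # There are comb(n, k) sequences with k heads out of 2**n equally likely ones.
    return comb(n, k) / 2**n


p_500 = prob_exact_heads(1000, 500)
print(p_500)  # roughly 0.025
```

So even under the frequentist reading, landing on exactly Y x 1000 is itself an unlikely event, and saying how unlikely requires a probability statement.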
E.T. Jaynes fleshes out his worldview in "Probability Theory: The Logic of Science", which was published posthumously in 2003.
Except for the infinite number of universes nonsense :-)
Isn't this a bit like saying there are two main camps when it comes to coins:
1. "heads"
2. and "tails"
?
At least to me it felt like the different forms of statistics were only different techniques.
And that only happened in 1933, which is around the time that computers became a thing. Not general purpose ones yet - I agree it was before computers were widespread, but definitely not far before they were a thing.