Team Groups Round_16 Quarters Semis Finals Out_In Probability
Brazil 87.5% 60.8% 42.0% 27.9% 18.5% Quarters 18.8%
France 81.4% 58.4% 36.6% 19.9% 11.3% Won 11.3%
Germany 80.5% 49.5% 30.5% 18.8% 10.7% Groups 19.5%
Portugal 75.2% 52.8% 32.2% 17.3% 9.4% Round_16 22.4%
Belgium 78.5% 51.1% 27.7% 15.8% 8.2% Semis 11.9%
Spain 72.3% 50.1% 28.8% 15.4% 7.8% Round_16 22.2%
England 73.1% 46.6% 24.4% 13.4% 6.5% Semis 11.0%
Argentina 79.7% 44.2% 24.1% 11.8% 5.7% Round_16 35.5%
Colombia 74.9% 37.3% 17.0% 8.5% 3.7% Round_16 37.6%
Uruguay 74.4% 34.6% 17.2% 7.2% 3.2% Quarters 17.4%
Poland 68.5% 30.5% 12.8% 5.8% 2.3% Groups 31.5%
Denmark 47.8% 26.3% 12.4% 5.2% 2.0% Round_16 21.5%
Mexico 52.0% 23.2% 10.5% 4.9% 1.9% Round_16 28.8%
Sweden 45.9% 19.4% 8.3% 3.7% 1.3% Quarters 11.1%
Iran 35.4% 18.1% 7.2% 2.6% 0.8% Groups 64.6%
Peru 37.3% 17.2% 6.8% 2.5% 0.8% Groups 62.7%
Australia 33.5% 15.4% 6.3% 2.3% 0.7% Groups 66.5%
Russia 47.9% 16.3% 6.0% 2.0% 0.7% Quarters 10.3%
Croatia 49.8% 16.9% 6.3% 2.1% 0.6% Finals 4.2%
Switzerland 52.8% 15.9% 6.1% 2.0% 0.6% Round_16 36.9%
Iceland 45.2% 15.1% 5.6% 1.8% 0.5% Groups 54.8%
Costa_Rica 36.8% 13.3% 4.7% 1.6% 0.5% Groups 63.2%
Serbia 32.9% 12.1% 4.5% 1.5% 0.5% Groups 67.1%
Japan 36.5% 12.8% 3.8% 1.3% 0.4% Round_16 23.7%
Saudi_Arabia 43.4% 12.7% 4.2% 1.3% 0.4% Groups 56.6%
Tunisia 35.2% 13.3% 4.1% 1.3% 0.4% Groups 64.8%
Egypt 34.4% 8.7% 2.5% 0.7% 0.2% Groups 65.6%
South_Korea 21.6% 5.9% 7.1% 0.5% 0.2% Groups 78.4%
Morocco 17.1% 6.8% 1.8% 0.5% 0.1% Groups 82.9%
Nigeria 25.2% 6.5% 1.7% 0.4% 0.0% Groups 74.8%
Senegal 20.1% 4.9% 1.2% 0.3% 0.0% Groups 79.9%
Panama 13.2% 3.3% 0.5% 0.1% 0.0% Groups 86.8%
[0]: Exhibit 2 in http://www.goldmansachs.com/our-thinking/pages/world-cup-201...
Edit: Fix copy-paste errors and atrocious maths.
Croatia went out in Finals
And I do not understand what the last column means (except for France and teams out in group phase)
The first two were just me making a mistake, because I wrote those in manually.
That last column makes no sense. It was supposed to be the probability that the model gave to the outcome that occurred, but I got the maths wrong.
So all in all, the only teams for which the prediction was more than 1/2 were teams out in groups. That is a little underwhelming.
Ah, for Croatia, I believe, it should read 1.5%.
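The calculation the last column was meant to hold (the probability the model gave to the outcome that occurred) can be sketched as a difference between adjacent columns of the table. This is only a sketch, assuming each column is the probability of getting past that stage; the function name and list layout are made up for illustration and are not from the Goldman report.

```python
# Stages in tournament order; each table column is assumed to hold the
# probability of getting past that stage (the last one means winning it all).
STAGES = ["Groups", "Round_16", "Quarters", "Semis", "Finals"]


def realized_outcome_probability(advance, outcome):
    """Probability (in percent) the model gave to the outcome that occurred.

    advance: percentages for getting past each stage, in STAGES order.
    outcome: the stage where the team went out, or "Won".
    """
    if outcome == "Won":
        return advance[-1]
    i = STAGES.index(outcome)
    reached = 100.0 if i == 0 else advance[i - 1]  # chance of reaching stage i
    return round(reached - advance[i], 1)          # reached it but did not pass it


# Rows copied from the table above.
brazil = [87.5, 60.8, 42.0, 27.9, 18.5]
croatia = [49.8, 16.9, 6.3, 2.1, 0.6]
germany = [80.5, 49.5, 30.5, 18.8, 10.7]

print(realized_outcome_probability(brazil, "Quarters"))  # 18.8
print(realized_outcome_probability(croatia, "Finals"))   # 1.5
print(realized_outcome_probability(germany, "Groups"))   # 19.5
```

Under that reading, a team that lost the final gets the "Semis" column minus the "Finals" column, which is where the 1.5% for Croatia comes from.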
Cheating.
Before people start booing, let's not forget where this tournament is being held, and all the other nefarious things that country has been up to recently.
> [...]
> But Goldman Sachs’s misfire is perhaps the most curious.
The model said that there is a lot of uncertainty and, as it happens, it was entirely correct. A World Cup chance of 18.5 percent means that 4 out of 5 times the team will not win, and the fact that this was the highest chance does not say much about the model.
And in general this is one instance of the well-practiced journalistic technique of waiting for results first and then defining a bar afterwards, so the results can be criticized according to standards that did not exist when the performance happened. (I guess in this case it is even worse: we could construct a reasonable test of how the model performed. I suspect that was in the original paper and that the journalist either did not understand it or, more likely, chose to ignore it in favor of writing a better story.)
They overranked Germany, and underranked Croatia. Nearly every other person in the world did the same.
Look how disingenuous the Bloomberg article is. "Goldman Sachs updated the model throughout the tournament. It predicted a Brazil-Spain final on June 29 and Brazil-France on July 4. Its most recent prediction had England and Belgium squaring off for the cup. Both were eliminated in the semifinals." But their actual Brazil-France prediction had 8 teams left, and the winners of that round were all in the top 5. https://twitter.com/GoldmanSachs/statuses/101448576794142720... They even had Croatia over England, and France over Belgium.
A modern model would account for the fact that those numbers alone mean nothing. Those are the numbers broadcasters reluctantly put on a screen for entertainment value, but they don't have real analytical power because they have no comparative metric.
How up or down were each of those numbers against previous wins and losses for each team?
What was Brazil's conversion from on-target shots before the tournament?
What was Belgium's success/failure rate on on-target shots they were defending against?
Likewise the other way around: were Brazil guilty of particularly poor defending? Were Belgium finding ways of making on-target shots count against all opposition, or was it luck on this game?
Any human analyst could tell you going into that game that Belgium were "lucky" and scoring freely beyond expectations, able to make more of fewer opportunities. Likewise the consensus from most experts was that Brazil were guilty of mild complacency, the team were young and not yet formed into a strong unit (rather, still just 11 strong individuals at any one point in time), and their on-target shots - whilst frequent - had a lower probability of turning into goals due to distance, power, position, etc.
So why did the Goldman model not pick that up?
I actually think they did pretty well, all things considered, but I'd love to see whether they did any runs on previous World Cups to try and check their thinking and whether they over-fitted a little to a couple of key metrics. I think the lack of metrics from previous games might mean they relied on some headline numbers, but there's more that they could have done to get a better model here...
Still, it's not their job, is it? Just a bit of fun... which is a good job, because I find it just a little bit amusing.
But do you need a sophisticated model and lots of so-called "AI" to arrive at the conclusion that there's a lot of uncertainty?? The point of the model is to reduce uncertainty, not find that it's there and do nothing about it.
And no, you don’t need statistics or machine learning to say “there is a lot of uncertainty”, but you do in order to quantify that uncertainty.
When predicting the result of an A-or-B contest, the bar is already defined: either the system gets it right or it doesn't. If it gets it right more often than not, then (despite this being poor grounds mathematically, on a small result pool) the popular press will report it as successful.
IMO if matches become easy to predict then rules will change to reduce that predictability.
I disagree: If team A has a 10-30% chance of winning, and A pulls off the upset, the correct answer was not "A wins"; it was "B has a 70-90% chance of winning".
For Goldman Sachs' investments, the bar is not to predict that A wins or that B wins, it's to predict the probability and variance regarding which team will win. Of course, from a single upset game, it's impossible to tell whether these estimates are correct. You'd need to see the success or failure of many trials.
Score it yourself against implied probabilities from Betfair for example and marvel at the suckage.
I think this is a smoke signal. Soccer is corrupt; you can't predict the winner unless you know what's being passed around under the table. Goldman Sachs does these predictions so people read between the lines to see how corrupt it is.
My argument is: "Goldman is amazing at statistical analysis and they routinely practice it on much tougher models (the global economy), so they should have no problem predicting a simpler model (soccer). But since they drastically failed at predicting soccer, then there must be an equally drastic variable missing from their predictions. Since we can trust Goldman to use all available public information in their analysis, there must be critical information that is hidden from the public which affects the outcomes". I make some assumptions, but it's fairly sound, no?
In the world of sports betting/analytics, you have baseball and basketball at the forefront, and then American football, soccer, and hockey (roughly in that order).
Off the top of my head, there are several reasons why the latter three sports have all lagged behind:
-Lack of data
It wasn't until the last 4-5 years that affordable and accurate data for soccer matches became widely available. Companies like Opta have accomplished this by outsourcing the watching of games and the manual tagging of events, which was made possible by the advent of cheap cloud computing.
It should be self-evident why tracking the position and actions of 22 players is more complicated than something like baseball, where for the most part you are looking at one pitcher vs. one batter, much of which can be automated with computer vision that tracks pitch position, speed, and spin.
-Complexity
It's no accident that baseball was the first sport to be revolutionized by analytics. Most of the time, it's a static game, with a clearly defined action set. I.e. do I swing at the pitch or not. Do I throw a fastball or not. Do I attempt to steal a base or not.
In games like American football, soccer, and hockey, you have anywhere from 12-22 players on the field at a time. Tracking what the players without the ball or the puck are doing is a difficult task technically, as is quantifying their impact. Concepts like expected goals and expected goals added are recent ones.
-Sample size
Typical elite soccer leagues see each team play each other twice. In England and Spain, this means you have 38 games per season.
Baseball has a 162 game season and playoff games, basketball has an 82 game season and playoff games, etc. Couple that with the fact that quality data has only been collected for a few years, and you get other problems.
In basketball and baseball, the effects of aging on player performance and statistics are fairly well understood now. We can generally calculate the 5-year market value of a player etc. In the other sports I mentioned, we don't yet have that kind of time series data to be able to make those judgements.
--
Specific to the World Cup, there are other reasons why you may find it hard to predict results.
-Team chemistry and style
Even though the World Cup is the most high-profile soccer event in the world, most players are spending 1-3 months a year with their national teams. Their "day jobs" with their club teams take up most of their playing time and attention.
As anyone who has played the game Football Manager will know, managing a national team is a tough job. You have no say over how the players are practicing when they're away from you, and no control over the physical condition in which they arrive at the World Cup. This year, there was barely a month between the end of the regular European seasons and the start of the World Cup.
In that month's time, you have to get at least 11 players who have not played with each other, to learn your style of play. Do you want to play a pressing style? Are you attempting a slow buildup, or trying long balls? Etc. etc.
-Home field advantage
In baseball and basketball, most modern statistical models account for home field advantage. Having 60,000 Russian fans chanting and heckling likely played a role in the team's ability to upset Spain, particularly during penalty kicks.
This goes back to the sample issue. How many times before have Spain played Russia IN Russia in front of a large crowd? Probably never.
---
All this is to say, cut Goldman some slack. There are a number of non-nefarious reasons why you may expect a soccer model to produce some spectacular miscues.
Pretty sure you just inadvertently identified why GS is so “great” at predicting economic movements.
The World Cup predictions from Goldman Sachs (and also UBS) are a form of recreation and entertainment with machine learning. It's an expression of quant nerd humor.
Analogous intellectual games would be engineers devising ridiculous Rube Goldberg contraptions[1] or programmers building "enterprise" FizzBuzz[2].
(I think it would add to the fun if GS uploaded their raw data and models to Github for others to play with.)
>It certainly didn't predict the final opposing France and Croatia on Sunday.
True, but it did predict that France had a better chance of winning overall, though handicapped by a tougher draw. It also predicted France beating Croatia in the round of 16 instead of the final. The pdf says:
>While Germany is more likely to get to the final, France has a marginally higher overall chance of winning the tournament,
[1] https://en.wikipedia.org/wiki/Rube_Goldberg_Machine_Contest#...
[2] https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...
It was not wrong to say Hillary had a 95% chance of winning the presidential election, but the confidence was low and that value still allowed for the opposite result to happen.
Also football has a lot of variance between team capability and end results. The better team might (and often does) lose, especially when it goes to a penalty shootout.
With basketball, the stronger team will be easily scoring more in most cases.
Also, I've seen some people say (not in this forum) that banks now look stupid because they're in the business of making predictions and they can't even get the world cup right. Guess what? Banks make no money on predictions. They make money on flows and taking spreads on trades they do with clients. Any research or prediction is meant to be a catalyst for that trade.
You're mostly right but to further clarify, an investment bank like Goldman Sachs has revenue from mostly "market making" spreads but it does also have activities that depend on predictions such as their proprietary trading (before the Volcker Rule shut them down) and their GSAM (Goldman Sachs Asset Management) fund. The GSAM is basically a hedge fund for their wealthy clients' money. They will run predictions on macro trends on data like interest rates, commodities, indexes, etc to help them pick stocks for their portfolio.
As the pdf noted, the World Cup data models and simulations came from Adam Atkins of GSAM.
The rule was too complex and onerous to be implementable. Case in point, it's already being rolled back... Partly because of the current administration, certainly, but more because it was just a poorly written and poorly thought-out idea to start with.
This is an enterprise for bookies, not Goldman Sachs.
> The ludic fallacy, identified by Nassim Nicholas Taleb in his 2007 book The Black Swan, is "the misuse of games to model real-life situations."
...
> The alleged fallacy is a central argument in the book and a rebuttal of the predictive mathematical models used to predict the future – as well as an attack on the idea of applying naïve and simplified statistical models in complex domains. According to Taleb, statistics is applicable only in some domains, for instance casinos in which the odds are visible and defined.
Both of Taleb's books, "The Black Swan" and "Fooled by Randomness", offer an interesting take on such models. Meanwhile, most economists know about "Knightian Uncertainty" [1], which distinguishes risk from uncertainty.
> "Uncertainty must be taken in a sense radically distinct from the familiar notion of Risk, from which it has never been properly separated.... The essential fact is that 'risk' means in some cases a quantity susceptible of measurement, while at other times it is something distinctly not of this character; and there are far-reaching and crucial differences in the bearings of the phenomena depending on which of the two is really present and operating.... It will appear that a measurable uncertainty, or 'risk' proper, as we shall use the term, is so far different from an unmeasurable one that it is not in effect an uncertainty at all."
There's no downside, only free publicity. If they, by good fortune and a following wind, get it right - then the publicity is incredible. If it's wrong they laugh and say "well, better stick to predicting what we're good at!" and they still get a shitload of headlines and awareness of their product.
This was not a mistake.
I compared the logloss for their predictions with the "uniform" benchmark (giving each team 1/32 probability of winning, 1/16 probability of getting to the finals, etc) and the results are the following (if I transcribed the data properly):
Getting to second round:
GS: 0.495 UBS: 0.495 bench: 0.693
Getting to quarter-finals:
GS: 0.463 UBS: 0.459 bench: 0.562
Getting to semi-finals:
GS: 0.310 UBS: 0.327 bench: 0.377
Getting to final:
GS: 0.231 UBS: 0.269 bench: 0.234
World Cup winner:
GS: 0.097 UBS: 0.113 bench: 0.139
The performance of the models was ok until Croatia got to the final. This especially hurt UBS, who predicted less than 0.9% probability of such an event (compared to 2.1% in Goldman's model).
Edit: these would have been the "best case" scores (if the high-probability teams had qualified for each round, ignoring that this may be impossible due to the structure of the tournament):
GS: 0.432 0.302 0.220 0.141 0.079
UBS: 0.365 0.251 0.176 0.111 0.070
UBS could potentially achieve lower logloss metrics because it had more extreme predictions.
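For anyone who wants to redo the comparison, here is a minimal sketch of the scoring, assuming the logloss is the usual binary log loss (natural log) averaged over the 32 teams for each round. The function names are made up for illustration; only the uniform benchmark is computed, since it needs no transcribed data.

```python
import math


def mean_log_loss(probs, outcomes):
    """Average binary log loss over teams (natural log).

    probs: predicted probability that each team reaches the round.
    outcomes: 1 if the team actually reached it, else 0.
    """
    total = 0.0
    for p, y in zip(probs, outcomes):
        total += -math.log(p) if y else -math.log(1.0 - p)
    return total / len(probs)


def uniform_benchmark(n_teams, n_advancing):
    """Log loss when every team gets the same probability n_advancing / n_teams."""
    p = n_advancing / n_teams
    probs = [p] * n_teams
    outcomes = [1] * n_advancing + [0] * (n_teams - n_advancing)
    return mean_log_loss(probs, outcomes)


# 32 teams; 16, 8, 4, 2 and 1 of them get through each successive cut.
for label, k in [("second round", 16), ("quarter-finals", 8),
                 ("semi-finals", 4), ("final", 2), ("winner", 1)]:
    print(f"{label}: {uniform_benchmark(32, k):.3f}")
```

This reproduces the bench column above (0.693, 0.562, 0.377, 0.234, 0.139). To score a real model, pass its per-team probabilities as fractions rather than percentages; a prediction of exactly 0 or 1 (such as a rounded 0.0%) would need clipping before taking the log.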
The author entirely misses the way models work: the larger the entity, the more statistics and averages kick in and, as a result, the better the model that can be built.
- motivation (Germany and Croatia were the two extremes here, no idea how to measure it)
- team cohesion (number of articles in a few journals questioning the team cohesion, maybe also articles about individual players)
- creativity in offense (maybe measurable via "target missed from close distance" + "ball passed front of the goal")
- number of errors in defense that didn't lead to a goal
- percentage of times ball possession was lost from own goal to enemy's area (England was really bad here against Croatia)
If anything, it worked worse.
"If anything"? All the results are available, so it would be easy to put a precise number on this. Measure the Bayesian regret, or just report the winnings if you had used the GS model to bet on the outcomes. Unless it reports some concrete numbers, this article is garbage.
It doesn't report any concrete numbers.
The reason is easy to see. The game can be decided by one, two or three key plays. Compare that to basketball. To win a game you have to consistently score more and defend better. Rarely is the game decided by one or two plays. That only happens when the game is already very tight.
The odds shortened as the tournament progressed, and I was able to hedge because the shortened odds made lay betting profitable.
(High variance in football outcomes means there's no guarantee of profit, so I don't bet big sums.)
If someone were to bet during the round of 16, putting $1 on the bottom 8 and $2 on the top 8, the strategy would most likely yield a small profit or a small loss, rather than a total loss.
https://news.ycombinator.com/item?id=17509407
Did this investment bank use same set of algorithms that they use for financial predictions?
...And then I remember there was this Octopus[1] who used to predict winners with 85% accuracy
It's as silly as saying that my model of a coin toss (approximately 50/50) is wrong because a series of 10 coin tosses shows results that differ from the model. The model is not any less correct.
If I toss a fair coin you cannot predict the next outcome. You can only say that if I toss the coin 1000 times, then close to 500 are going to turn up heads, and roughly another 500 are going to turn up tails.
It was stupid of Goldman Sachs or whoever to predict an outcome. It was stupid of anyone else to lend credence to that prediction.
Hopefully, Goldman Sachs is not relying on prediction of singular outcomes to make their investment decisions. I don't think they are. Probably just marketing brouhaha to ride the soccer wave. Although I'm not sure if that worked as expected.
If you read the actual report they did[0], they never claimed that any single outcome was more than 18.5% likely.
[0]: http://www.goldmansachs.com/our-thinking/pages/world-cup-201...
>"You can only say that if I toss the coin a 1000 times, then close to 500 are going to turn up heads, and another 500 are going to turn up tails."
Sometimes you can do that and every single flip will be heads. It's unlikely, and across zillions of universes you'd only find it once - but we don't have a pool of universes that we can sample statistically.
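To put rough numbers on both sides of this exchange, here is a small sketch using the exact binomial distribution for 1000 fair flips. The window of 450 to 550 heads is an arbitrary choice of mine for what "close to 500" might mean.

```python
from math import comb

N = 1000

# Chance that every one of the N fair flips lands heads.
p_all_heads = 0.5 ** N

# Exact chance that the number of heads lands within 50 of 500,
# summed from the binomial distribution with integer arithmetic.
favourable = sum(comb(N, k) for k in range(450, 551))
p_near_half = favourable / 2 ** N

print(p_all_heads)   # astronomically small, but not zero
print(p_near_half)   # close to, but below, 1
```

Both statements hold at once: "close to 500" is overwhelmingly likely, and the all-heads run is possible but has a probability far too small to ever expect to see.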
It's also Goldman Sachs and UBS choosing to attach their names to these and stake some reputation on these predictions. If they had hit the bullseye, they would be lauding these results.
For example, imagine a tournament with a large number of participants, where the winner is picked simply by fairly choosing a single random participant.
If I then gave you all the perfect historical data going back decades, you could do statistical analysis and determine that the winner is completely random and therefore the probability of success, for any particular participant, is p~=(1/n), where n is the number of participants. Your confidence in correctly predicting any particular outcome will drop as n rises.
Not everything can be easily predicted just because you have enough data.
So she had 95% chance of winning with 50% probability or what?
Say you make the assumption that the quantity being estimated is truly fixed: that there's some true value for the force of gravity or some true value for the number of people that vote for X or Y.
The second assumption that comes along is that the stochasticity observed comes from your perspective of observation, and not from the ground truth. To be more blunt, you know that of all the observations you make 95% of them have the probability of yielding the result observed... but the ground truth is still fixed. Gravity has a fixed quantity, despite your experimental error, and you may have been lucky enough to observe it in your sample.
Predicting elections with frequentist methods has this same characteristic, except the observed quantity itself shapeshifts and even lies... so then there are other complications that need to be dealt with.
This is where that 50% feeling comes from. There are two outcomes, one will be true. Your data analysis just tells you that if you repeat your procedure, you'd expect 95% of those results to give you the outcome you observed.
Consider: If someone offered to give you $2 every time a fair coin toss came up heads, or take $0.50 every time it came up tails, you'd be foolish not to take that bet as many times as you can, because you know that the coin has exactly a 50% chance of coming up heads.
However, if it was an unfair coin, you'd want to know the degree to which it was unfair, and you'd have to measure it. How much do you trust those measurements? You might say that you're 90% sure that the coin has a 40-60% chance of coming up heads, or give a probability of 2% that a $1.04 to $0.96 wager would be profitable while a $1.03 to $0.97 wager would be unprofitable.
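The wagers in this example come down to a break-even probability. A minimal sketch, assuming you win the first amount on heads and pay the second on tails (the function names are mine):

```python
def expected_value(p_heads, win, loss):
    """Expected profit per toss: win `win` on heads, pay `loss` on tails."""
    return p_heads * win - (1.0 - p_heads) * loss


def break_even_probability(win, loss):
    """Heads probability at which the wager has zero expected profit."""
    return loss / (win + loss)


# The fair-coin offer: $2 on heads, -$0.50 on tails.
print(expected_value(0.5, 2.00, 0.50))     # 0.75 dollars per toss

# The two unfair-coin wagers.
print(break_even_probability(1.04, 0.96))  # roughly 0.48
print(break_even_probability(1.03, 0.97))  # roughly 0.485
```

So "the $1.04 to $0.96 wager is profitable while the $1.03 to $0.97 wager is not" is the same as saying the heads probability sits between about 48% and 48.5%, and the 2% is how much belief you place on that narrow band.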
Hillary had a 95% chance to win the election. But on top of the fact that 1 in 20 times she'd lose that election if that really was the probability, the 95% number was uncertain because the measurements were difficult to pin down - maybe she'd have lost 1 in 40 times, or maybe she'd have lost 1 in 5 times. All we know now is that she lost, and that many of the assumptions and measurements the pollsters had to make concerning factors like voter turnout, nationalism, corruption, foreign interference, debate results, and fundraising turned out to be inaccurate.
With unfair coin measurements, you can get very accurate numbers with just a handful of tests. When predicting election results or World Cup games, you're much less likely to make an accurate estimate. The confidence is an estimate of how likely that estimate is to be accurate.
What's wrong is thinking 95% chance of winning means they will win
For example, if you are estimating the height of a male in the US, you would collect data on US males and get the average. But unless you surveyed every male in the US, there is some error associated with your estimate. So you would either construct error bounds (a frequentist approach) or a probability distribution (a Bayesian approach) around the mean height. So your results may dictate that the mean height of the American male is 5’11, plus or minus 2 inches. Those two inches represent uncertainty around your data collection. That’s the exact same thing that is done here, but with a percentage instead of a height. Outlets may predict Hillary winning at 95%, but the reality is their methodology should provide a plus-minus value around that. The problem is that few of them actually report that.
But it gets more confusing. That error bound is only around the mean. Pick a random guy out and not only will he likely not be 5’11, there is a decent chance he will be outside of that range of 5’9 - 6’1. You will get 5’7 guys and 6’4 guys pretty commonly. In the case of the election, it may actually be true that Hillary had a chance between say, 93% and 97% of winning. But even if that is the case, she will still lose between 3-7% of the time. But since we only have one reality to observe, we can’t know if she lost simply because we saw that 3-7% realized, or because the people coming up with that number screwed up. That’s why groups like 538 deserve more leeway. When they say that Donald Trump has a 30% chance of winning, and he does, that’s not that crazy. And therefore there is much less reason to assume they screwed something up than the people who predicted a 5% chance of Trump winning. It’s possible those models were right, but much less so.
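One hedged way to put a number on "much less so" is to compare the probability each forecast gave to what actually happened. The sketch below uses the 30% and 5% figures from the comment; the function names are mine, and a single event is weak evidence either way.

```python
import math


def likelihood_ratio(p_model_a, p_model_b):
    """How many times more probable the observed event was under model A."""
    return p_model_a / p_model_b


def surprise_bits(p_event):
    """Surprisal of the observed event in bits; higher means more surprised."""
    return -math.log2(p_event)


ratio = likelihood_ratio(0.30, 0.05)
print(f"30% forecast vs 5% forecast: {ratio:.1f}x")
print(f"surprise at 30%: {surprise_bits(0.30):.2f} bits")
print(f"surprise at 5%:  {surprise_bits(0.05):.2f} bits")
```

The outcome was about six times more probable under the 30% forecast, which shifts belief toward that model without proving the 5% one wrong.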
The reason events like the World Cup are far more interesting is that they play out over a shorter period of time.
I think the problem here isn't the event but rather the sport. Something like snooker or tennis will offer the same brevity over the period but with chance playing a less significant role due to the number of games played per match.
That all said, if my years of watching snooker have taught me anything, it's that people are not machines and thus will perform vastly differently from day to day depending on what mood they're in.
Analytically the difference between the Premier League and the World Cup is that you have momentum and continuity in the Premier League, whereas the World Cup is essentially one shot. So in the PL team A will play team E and G and H before it plays team B, team B may play team E and H and Q (which played G). Team A may be winning games that your strength model shows they should lose, Team B may be losing games... and so on and so on. There is more evidence that might matter. More importantly you can be wrong quite a lot of the time in a season and still be right at the end of it (as the bounces of the ball even out over time). Not so much in the World Cup - one goal knocks you out and there is no coming back! Basically the World Cup demands an algorithm that works with less evidence and with a much higher degree of accuracy.
Can you give us some examples?
In the NBA, NHL, or MLB, seven game series tend to even out the variance, so the best team usually wins. And even in NCAA basketball, there's enough scoring that any individual play loses significance.
In [0] you have the following:
> The ludic fallacy, identified by Nassim Nicholas Taleb in his 2007 book The Black Swan, is "the misuse of games to model real-life situations."
And he gives an example of this:
> One example given in the book is the following thought experiment. Two people are involved:
> Dr. John who is regarded as a man of science and logical thinking
> Fat Tony who is regarded as a man who lives by his wits
> A third party asks them to "assume that a coin is fair, i.e., has an equal probability of coming up heads or tails when flipped. I flip it ninety-nine times and get heads each time. What are the odds of my getting tails on my next throw?"
> Dr. John says that the odds are not affected by the previous outcomes so the odds must still be 50:50.
> Fat Tony says that the odds of the coin coming up heads 99 times in a row are so low that the initial assumption that the coin had a 50:50 chance of coming up heads is most likely incorrect. "The coin gotta be loaded. It can't be a fair game."
> The ludic fallacy here is to assume that in real life the rules from the purely hypothetical model (where Dr. John is correct) apply. Would a reasonable person bet on black on a roulette table that has come up red 99 times in a row (especially as the reward for a correct guess is so low when compared with the probable odds that the game is fixed)?
So Nassim Taleb wanted to discuss "using games to model real-life situations" and to demonstrate the pitfalls he uses two characters. He _portrays_ the characters as "man of logical thinking" vs "man who lives by his wits", but as we'll see he's missing one dimension to his characterization.
The first problem here is that he's implicitly suggesting to the reader that the decisions of the "man of logical thinking" represent the pitfalls of "applying games to model real-life situations", whereas the other guy's decisions represent... it's not specified, but it clearly has a better outcome.
The second problem is that he conflates "applying something you read on some textbook to real life without thinking" with "modelling real-life". He suggests to the reader that those two people are actually "logical" vs "instinct", but they're not. They're a dumb guy who knows maths vs a smart guy who doesn't know maths. _Obviously_ real-life is more complex than your textbook examples, and so the smart guy is going to win because his fuzzy heuristics beat the first guy's decisions, which are optimal within his flawed model. An actual smart and logical person would update his model based on new evidence (i.e. "I was told that this coin was 50-50 but actually the chance of what I just saw is so small that it's more likely that I was just lied to") and then use maths to make predictions and beat the guy who's smart but doesn't know maths.
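That update can be written down in a few lines of Bayes' rule. This is a sketch under loud assumptions that are mine, not Taleb's: a prior of one in a million that the coin is loaded, and a loaded coin that always lands heads.

```python
def posterior_loaded(prior_loaded, n_heads, p_heads_if_loaded=1.0):
    """Posterior probability the coin is loaded after n_heads heads in a row.

    prior_loaded: belief that the coin is loaded before any flips (assumed).
    p_heads_if_loaded: heads probability of a loaded coin (assumed to be 1).
    """
    like_loaded = p_heads_if_loaded ** n_heads
    like_fair = 0.5 ** n_heads
    evidence = prior_loaded * like_loaded + (1.0 - prior_loaded) * like_fair
    return prior_loaded * like_loaded / evidence


# Start out almost certain the coin is fair, then watch the run of heads grow.
for n in (0, 10, 20, 99):
    print(n, posterior_loaded(1e-6, n))
```

With these assumptions the posterior crosses one half at around 20 straight heads, and by 99 it is indistinguishable from certainty, so the person doing the maths properly ends up agreeing with Fat Tony.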
So ironically, he wants to portray the dangers of using over-simplified models and to do that he uses an example where he obscured one dimension.
Nassim Taleb is really good at rhetoric but light on substance.
Basically a book by Nassim Taleb is an incoherent summary of the books that Nassim Taleb has read within the past year, with a few morsels of recycled insight here and there.
I’m not sure why there are so many people who take him seriously.
This one would benefit possession-based teams, so it would fail to give decent odds to the current world and European champions (France and Portugal respectively) which don't play possession. Of course it's possible they're outliers but we'll never know.
As you identified, motivation could be pretty hard to measure ... but even if we could it might be a pretty poor predictor anyway. France in the early stages didn't look very motivated, while England and Colombia looked pretty lively.
Team cohesion - the German team were pretty consistent (not dazzling, but consistent) and we know how that ended. Again France didn't really impress until the latter stages of the WC.
Creativity in offense - I guess it can indicate a sort of calm or confidence in front of goal, but it can actually be seen as pretty negative. For example Arsenal a few years back came under fire for having plenty of possession in the 18 yard box but failing to convert. Spain's confident quick pass-and-move "tiki-taka" was ever-present and has in my eyes been impotent in the last few years (and more important as a neutral viewer - very frustrating to watch).
Defensive errors that didn't lead to a goal could be a nice indicator of the ability of a defence to pick up after each other's mistakes - but at the same time these errors that lead to goals (e.g. Croatia's second goal in the final) are relatively rare, and the lack of a goal could just point to the opposing team's inability to convert due to poor organisation or a lack of opportunism from their strikers.
I'm not sure what you mean with the last one, but I think this could be a nice one - if you mean "times you lost possession in your own half". A profligate midfield and defence is bound to ship goals, I doubt there are many teams that can either fight back after trailing by a goal or two or score enough to maintain a reasonable buffer.
I applaud the effort though - it takes more creativity and care to think of some new angles (like you did) than to think of some possible counter examples (like I did)!
> I'm not sure what you mean with the last one, but I think this could be a nice one - if you mean "times you lost possession in your own half"
Almost. England lost the ball frequently (well over 50% of the time, as far as I could see) due to the keeper sending out long balls. I'd like to measure that somehow. It could be done via the number of seconds in possession after a goal kick, an indicator of whether a hypothetical 85% marker of the field was reached, or a check on whether the ball was successfully passed at least 5 times (or the move resulted in a goal).
I think people have a strange view of finance. Most people aren't paid obscene amounts of money in finance, just like most software developers don't make the salary of a senior developer at Big Tech. They also work an obscene amount of hours. During earnings season, I would be at my desk by 5am and work 80+ hours per week. Nowadays, it's a rarity to go more than 50. My brother currently works at Big Bank, and makes more than I do on an absolute basis, but I definitely make more than he does hourly. I get to work at 9:30-10, he gets to work at 7:30-8. I get home at 6:30-7:00, he gets home 8:30-9. He works at least a half day every Sunday, I enjoy my hobbies. I'm also commenting on HN at 11:00...
Most of my college friends still work in finance. I make more money than a few of them based on overly honest drunken conversations, and we're all more than 10 years into our careers. There is a glass ceiling in tech that is a lot more all-encompassing, but it's not like it doesn't exist in other industries. There are only so many higher-up positions, and most people burn out (or aren't capable of competing) before they even get in position for that promotion. The running joke when someone was getting poor performance reviews was "That's it, I'm moving to Vermont to open an antique store".
For some more comparison, I grew up in a 1%er town in the suburbs of NY. The average lawyer family lived in nicer houses than the average finance family, who in turn lived in nicer houses than the average medicine family. However, the most expensive house was owned by the CFO of Big Bank. Income is very right-skewed in finance.
Long Answer: Full retirement is hard; healthcare is a pain in the US. You can't really "save" for it in the US, it can swallow all your savings, so you'll always need some job or another to cover healthcare and catastrophic needs. That said, you can very easily down-shift once you have a house, savings, etc.
Longer Answer: Could have, if I wanted to -- but you always give something up in exchange. These jobs will take everything you give them (time, health, life) and give back a decent percentage (income.) But you cannot easily dial up or down the work, it comes in chunks and you have to complete it. My life was increasingly unhinged at 27 and I decided to jump off the treadmill after seeing a colleague continue to work through his mother's terminal illness and death. Inertia and greed are a toxic combination. Numerous colleagues were on drugs, uppers, anti-depressants, etc. One died from stress (heart attack in his 30s.)
I chose to get married, have two kids. I switched to a pure tech job (now an ML product owner at a Series A pure tech firm.) We have dinner together almost every single day. Weekends are completely ours. We go to the park most warm days. We take 3 to 4 vacations a year, many with my mom as well. There is a decent amount of work but I can choose when to do it (unlike Wall St.) and the work is longer term and I can dial it up/down as family requires. I sit outside and read during lunch. I turn off the markets when I step out of work.
Many of my colleagues were easy millionaires by ~30 and multi-millionaires if they stuck it out till their mid 30s and were focused. Many others blew through their bonuses (or snorted them away) and ended up with nothing, and just live bonus to bonus. It also depends on the job (business/deal side vs quant side vs tech -- the money is a waterfall across the 3 sections.) As with all industries, you get ripped off if you don't fight for your share of the pie. Plenty of people avoid conflict and live comfortable lives and nothing more. I also saw several C++ programmers/managers earn double-digit millions of dollars over several years; one earned over 100MM USD over the course of his time at the hedge fund (public records, check out AIG-FP https://en.wikipedia.org/wiki/AIG_bonus_payments_controversy)
I think I did well and hopefully don't have to worry about poverty anymore. You either get lucky (early FB employee, hot product at xyz.com) or you have to give up something. I haven't seen someone truthfully say they got both money and family and happiness all together.
In the election model it's not clear to me what's to gain by saying that there are two probabilities (or more) instead of one. There is one single event.
>Hillary had a 95% chance to win the election. But on top of the fact that 1 in 20 times she'd lose that election if that really was the probability,
Which is the only thing that matters if we say that the probability was 95%.
> the 95% number was uncertain because the measurements were difficult to pin down - maybe she'd have lost 1 in 40 times, or maybe she'd have lost 1 in 5 times.
You have lost me here. Did she have a 95% chance to win or not?
If this 95% is uncertain, because it could have been 97.5% or 80%, then the probability would be the weighted average of those numbers and not 95%. And if it was so uncertain that nothing was known at all it would be 50%.
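To make the weighted-average point concrete, here is a minimal sketch. The three candidate probabilities are the ones mentioned above (losing 1 in 40, 1 in 20, or 1 in 5 times); the weights attached to them are made up purely for illustration, as is the helper name `marginal_probability`.

```python
# Candidate values for the "true" win probability, each paired with a
# made-up weight expressing how much we believe in that candidate.
scenarios = [
    (0.975, 0.25),  # she loses 1 time in 40
    (0.95, 0.50),   # she loses 1 time in 20
    (0.80, 0.25),   # she loses 1 time in 5
]


def marginal_probability(scenarios):
    """Collapse uncertainty about p into a single probability for the one
    event, using the law of total probability (a weighted average)."""
    total_weight = sum(weight for _, weight in scenarios)
    return sum(p * weight for p, weight in scenarios) / total_weight


p_win = marginal_probability(scenarios)
print(f"probability for the single event: {p_win:.5f}")
```

With these particular weights the single-event probability comes out below 95%, which is the point: if the uncertainty is lopsided, the number you should quote moves with it.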
Consider the following cases:
a) You are going to flip a coin that I know is completely fair. I would say that the probability of heads is 50%.
b) You have flipped a coin that I know is completely fair. Nobody knows what has been the result. I would say that the probability of heads is 50%.
c) You have flipped a coin that I know is completely fair. You know the result but I don't. I would say that the probability of heads is 50%.
In some cases you would say that I'm 100% right in my assessment of the probability being 50%, while in others the actual probability is either 100% (with probability 50%) or 0% (with probability 50%). This seems irrelevant as far as my statement about the probability being 50% is concerned.
Interestingly something like this is a tactic used in Rugby (https://www.youtube.com/watch?v=cbti6mLvSJs). I used to play a lot of football when I was younger and at our level (waaaay down the scottish league pyramid) against tired, hungover or generally weak opposition, keeping them under pressure by dominating the territorial game but sacrificing possession was criminally underrated. Usually if you could keep hammering them for 60 minutes and had the legs to step up a gear in the last 30 or so you could grab a valuable goal or two :-)
If you look at both Belgium/England games, you see number 2 against number 12. The ranking was respected there.
https://www.fifa.com/fifa-world-ranking/ranking-table/men/in...
Used to be a silly ranking system, but it's Elo-based these days, so it's not too shabby.
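For anyone unfamiliar with Elo, here is a minimal sketch of the textbook rating update. The scale of 400 and K-factor of 20 are conventional chess-style defaults, used here only for illustration; FIFA's ranking uses its own constants and match weightings, which this sketch does not try to reproduce. The ratings in the example are made up.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected result for A against B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a, rating_b, result_a, k=20.0, scale=400.0):
    """Return both new ratings after one match.

    result_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    Whatever A gains, B loses, so the rating total is conserved.
    """
    delta = k * (result_a - expected_score(rating_a, rating_b, scale))
    return rating_a + delta, rating_b - delta


# Made-up ratings: a strong side is upset by a weaker one.
favourite, underdog = 1700.0, 1500.0
print(expected_score(favourite, underdog))
print(update(favourite, underdog, result_a=0.0))
```

The useful property is that an upset moves the ratings more than an expected result does, so the table corrects itself over time instead of rewarding sheer number of matches played.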
You're not going to find a statistical approach that will account for the subtleties that led to this outcome. The problem with soccer stats in general is that everything hinges on low-frequency events based on subtle differences of timing and space.
Basketball by comparison is much more stat-rich, and there are a lot of cool advanced analytics, but even still they are full of gaps that are obvious to any expert watching the game. Afterwards maybe you can find the statistical signature of something you saw, but then you risk overfitting again, just the same as soccer.
I think this deserves to be elaborated a bit: a game in which 1 is a good score, and often a game-winning score, is never going to be accurately predicted based on a statistical approach, because scoring is too rare for a statistical approach to work well. Low scores mean that individual games have an extremely large element of chance.
Imagine one team is about 4% better than another team; they should be favored about 51-49 to score a point. If a game scored 300 points, that difference would be perceptible within one game. But to resolve the same difference accurately in games that score 3 points each takes many, many, many games.
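A rough way to put numbers on this, as a sketch under simplifying assumptions: every point is decided independently, the better team takes each one with probability 0.51, and the total number of points in a game is fixed. None of that is true of real soccer; the function name `win_probability` is just for illustration.

```python
from math import comb


def win_probability(points, p=0.51):
    """Probability that the better team takes strictly more than half of
    `points`, when each point independently goes its way with probability p.
    For an even number of points, ties are not counted as wins."""
    return sum(
        comb(points, k) * p**k * (1 - p) ** (points - k)
        for k in range(points // 2 + 1, points + 1)
    )


for n in (3, 30, 300):
    print(n, round(win_probability(n), 3))
```

Under these assumptions the 3-point game is barely better than a coin flip for the stronger side, and the edge grows as the number of points per game grows. Even at 300 points a single game is still far from a sure thing, so it takes a number of games at any scoring level to pin down a 51-49 edge; the low-scoring game just needs far more of them.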
Soccer is a pretty data-poor environment, or at least was historically. Before movement trackers, there was very little data to play with. With movement tracking data slowly building up, I suspect that soccer analytics will soon have their "Moneyball" moment the way baseball did.
The reason baseball got there sooner is that, even without advanced player movement tracking, baseball is a data-rich environment. There are ~2500 MLB games played per year in the 30-team era, and we have at least box scores going back to the late 19th century for most professional games, and pitch-by-pitch data going back to the eighties. In addition, a lot of the most important data is cleaner in nature (pitcher-batter match-ups) and also abundant (compare ~200 pitches in a baseball game to ~15 shots on goal in a soccer game, to take a guess at the order of magnitude).
Computing power can help squeeze more information from the soccer data we collect going forward, but there is a century or more of player tracking data that we can just never ever have, since it wasn't being collected. We know Babe Ruth's batting line but we will never have the soccer equivalent of UZR for Pele. I don't know if there is a retrosheet-equivalent effort for soccer to collect stats from old film, but that would be one way to partially bridge the gap.
The 2018 World Cup is not a repeatable event, Elon Musk buying $10M of Tesla shares is not a repeatable event, and Donald Trump winning the 2016 presidential election is not a repeatable event. Therefore, to meaningfully discuss any of these in the context of probabilities and confidence intervals, we must assume that we generalize them to any soccer game, a stock purchase, or an election, and can do this meaningfully by adjusting our priors. It does make the mathematics a lot less pure.
They did indeed win the league with a budget far below many of the normal contenders, but it was a mixture of good management, luck, a few players having the breakout seasons of their careers which took them to the point where only big teams can now afford them, and a few other players having great runs of form that saw them playing better than they would before or after.
Despite the elements of luck, it was an incredible achievement. But the following season they were back to being a team with no realistic chance of competing for the title, and were actually in a relegation fight to stay in the top division.
So would you argue that creating a statistical model of soccer is harder than creating one for global economies? I think it's harder to model economies.
I'm not even trying to give Goldman a hard time! I'm saying that Goldman probably put together a very accurate model of "soccer", but we aren't watching an accurate model of soccer; we're watching the corrupted one where the players and skills don't matter.
If you're talking about GDP growth forecasting, or forecasting unemployment numbers, these are ultimately questions of aggregation. Yes, there are 7.5 billion people, but at the end of the day each individual agent's actions don't make a tremendous difference for an aggregate measure like GDP. During periods of low volatility, as we are currently experiencing, it's really not all that impressive to forecast the unemployment rate +/- 0.25%, or GDP growth within 0.5%.
If you're talking about their market-making and trading businesses, they've had some horrendous quarters recently as well (http://www.businessinsider.com/goldman-sachs-just-had-a-hist...). A very small portion of Goldman's business is taking an opinionated stance; most of their income comes through relatively low-risk market-making activities.
And let's not forget that during the 2008 financial crisis, certain departments within the company correctly wagered against credit default swaps, while others had exposure to subprime mortgages. The company still needed an injection of capital from Warren Buffett and the US Treasury to weather the crisis. Point being, they aren't clairvoyant oracles.
---
Regarding your last point, which was also made in your original comment, you seem to be claiming some form of what economists call "omitted variable bias", and seem to be hypothesizing that the "omitted variable" is corruption or cheating.
From the purely technical standpoint of building models, the tiny samples (https://www.theringer.com/soccer/2018/7/11/17557720/world-cu...) and the nature of the "data" being collected means that there are plenty of other explanations, like incorrectly estimated parameters or measurement error.
If you're trying to suggest that there is corruption or cheating in soccer, please point to a concrete example of a team in a critical game receiving a disproportionate number of calls. Unsure if you're aware, but this was the first World Cup with instant video replays for the referees to use. Had this replay been in use more widely in international soccer, the US might've qualified for this World Cup (https://deadspin.com/u-s-a-out-of-world-cup-on-phantom-goal-...), England might've won/tied that pivotal 2010 World Cup game (https://en.wikipedia.org/wiki/Ghost_goal#England_v_Germany_a...), etc.
Soccer may have had a sordid past with the picking of host countries, but the trends in the actual game itself point to technology reducing the ability of referees to make blatantly terrible calls.
> Point being, they aren't clairvoyant oracles.
Yeah, my argument was weak in that regard. They aren't anywhere close to perfect or accurate, I'll admit.
> you seem to be claiming some form of what economists call "omitted variable bias"
Yes! Is that what it's called?
> please point to a concrete example of a team in a critical game receiving a disproportionate number of calls
Corruption doesn't have to be that explicit. Maybe key players or coaches are paid to perform poorly? It doesn't always come down to the ref. But I admit I have no examples.
-Predictions from the general public
-Predictions from football experts
-Predictions from other mathematical models
For example: If over time, the new model is 5% better than the best of the old models, then it's very good.
It doesn't make much sense to compare it with reality and jump to the conclusion that the model doesn't work, because no prediction can be 100% accurate.
A model should be judged both on how accurately it characterizes its uncertainty and how much evidence it's able to successfully make use of.
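One common way to compare probabilistic forecasters against each other, rather than against perfect accuracy, is a proper scoring rule such as the Brier score. This is a minimal sketch; the outcomes and forecasts below are made up for illustration and have nothing to do with the actual Goldman numbers, and the "always say 50%" baseline is just an assumed reference point.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and what
    happened (1 = event occurred, 0 = it did not). Lower is better."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Made-up example: five events, a hypothetical model, and a baseline
# forecaster who always says 50%.
outcomes = [1, 0, 1, 1, 0]
new_model = [0.8, 0.3, 0.6, 0.7, 0.2]
baseline = [0.5] * len(outcomes)

model_score = brier_score(new_model, outcomes)
baseline_score = brier_score(baseline, outcomes)

# Skill score: fraction of the baseline's error that the model removes.
skill = 1 - model_score / baseline_score
print(model_score, baseline_score, skill)
```

A score like this penalises both overconfidence and vagueness, so a model that says 95% and is wrong pays heavily, while one that always hedges at 50% never beats the baseline.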
Basically, confidence refers to a hypothetical scenario in which the data-gathering process is repeated and the same analysis done: X% of the confidence intervals (essentially, the +/- bounds around your estimate) will contain the true value of what you are trying to estimate.
So in this hypothetical scenario, we say we have the power to go back in time and recollect the polling data in 2016 and run the same analysis used to arrive at that 95% number. And let’s say we use this power over and over again, a very large number of times. Then 95% of the error bounds we construct should contain the true value of the probability Hillary wins, whatever that is.
The thing is that those error bounds can be huge. You can have 95% confidence that the probability that Hillary wins is between 3% and 98%, for example. You can also have 10% confidence that the probability of a Hillary win is between 94% and 96%. Without the confidence intervals, a “confidence level” doesn’t say much. It’s also predicated on the assumption you haven’t screwed up your data collection process or analysis methodology. And if you are predicting something will occur with a probability of 95%, and it doesn’t, that doesn’t automatically mean you are wrong, but the likelihood of you having screwed something up is definitely higher.
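The "repeat the data gathering many times" idea can be sketched with a simulation. This is a deliberately simplified stand-in: it estimates a candidate's vote share from a simple random poll, not the probability of winning an election, and the true share, sample size, and number of repeats are arbitrary assumptions. The interval is the usual normal-approximation (Wald) interval.

```python
import random
from math import sqrt


def coverage(true_p=0.52, sample_size=1000, repeats=2000, z=1.96, seed=0):
    """Fraction of repeated polls whose 95% interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        # Re-run the "poll": each respondent backs the candidate with
        # probability true_p.
        successes = sum(rng.random() < true_p for _ in range(sample_size))
        p_hat = successes / sample_size
        half_width = z * sqrt(p_hat * (1 - p_hat) / sample_size)
        if p_hat - half_width <= true_p <= p_hat + half_width:
            hits += 1
    return hits / repeats


print(coverage())
```

The fraction printed should land close to 0.95. Note what that number describes: how often the procedure captures the true value across repeats, not the probability that any one particular interval is right, and certainly not the probability of the election outcome itself.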
The message I replied to said that > It was not wrong to say Hillary had a 95% chance of winning the presidential election,
Frequentist inference cannot be interpreted as a probability unless one goes through some (often misunderstood, as you pointed out) contortions. In your scenario where you have 95% confidence of something it would be wrong to say that Clinton had a 95% chance of winning.
You have a lot of data about donkeys vs elephants. But this contest is between a mule and a mammoth. If you assume a mule is equivalent to a donkey and a mammoth is equivalent to an elephant, the mule has 95% odds in its favor. But you recognize the assumptions so your prediction doesn't have a high confidence.