Goldman Sachs model to predict World Cup game results didn’t come close (bloomberg.com)

245 points by rodionos 7 years ago | 130 comments

yk 7 years ago |

> And in any case, the model only generated probabilities of winning a game and advancing, and no team was given more than an 18.5 percent chance of winning the World Cup.

> [...]

> But Goldman Sachs’ misfire is perhaps the most curious.

The model said that there is a lot of uncertainty, and as it happens, it was entirely correct. A World Cup chance of 18.5 percent means that roughly 4 out of 5 times the team will not win, and the fact that this is the highest chance does not say much about the model.

And in general this is one instance of the well-practiced journalistic technique of waiting for results first and then defining a bar afterwards, to criticize the results according to standards that did not exist when the performance happened. (I guess in this case it is even worse: we could construct a reasonable test of how the model performed. I have the suspicion that one was in the original paper and that the journalist either did not understand it or, more likely, chose to ignore it in favor of writing a better story.)

basch 7 years ago | |

Their model also had France as 2nd most likely, Belgium as 5th, and England as 7th. Three of their top 7 made the semi-finals, and they called the eventual winner as second most likely, and more likely than Germany. They actually predicted the Brazil/Belgium game in the quarter-finals, but got the winner wrong. Brazil had 27 shots and 9 on target with 59% possession. Belgium only had three shots on target, and made two of them to win.

They overranked Germany, and underranked Croatia. Nearly every other person in the world did the same.

Look how disingenuous the Bloomberg article is. "Goldman Sachs updated the model throughout the tournament. It predicted a Brazil-Spain final on June 29 and Brazil-France on July 4. Its most recent prediction had England and Belgium squaring off for the cup. Both were eliminated in the semifinals." But their actual Brazil-France prediction had 8 teams left, and the winners of that round were all in the top 5. https://twitter.com/GoldmanSachs/statuses/101448576794142720... They even had Croatia over England, and France over Belgium.

PaulRobinson 7 years ago | | |

> Brazil had 27 shots and 9 on target with 59% possession. Belgium only had three shots on target, and made two of them to win.

A modern model would account for the fact that those numbers alone mean nothing, because they don't. Those are the numbers broadcasters reluctantly put on a screen for entertainment value, but they don't have real analytical power because they have no comparative metric.

How up or down were each of those numbers against previous wins and losses for each team?

What was Brazil's conversion from on-target shots before the tournament?

What was Belgium's success/failure rate on on-target shots they were defending against?

Likewise the other way around: were Brazil guilty of particularly poor defending? Were Belgium finding ways of making on-target shots count against all opposition, or was it luck on this game?

Any human analyst could tell you going into that game that Belgium were "lucky" and scoring freely beyond expectations, able to make more of fewer opportunities. Likewise the consensus from most experts was that Brazil were guilty of mild complacency, the team were young and not yet formed into a strong unit (rather still just 11 strong individuals at any one point in time), and their on-target shots, whilst frequent, had a lower probability of turning into goals due to distance, power, position, etc.

So why did the Goldman model not pick that up?

I actually think they did pretty well all things considered, but I'd love to see whether they did any runs on previous World Cups to try and check their thinking and whether they over-fitted a little to a couple of key metrics. I think the lack of metrics from previous games might mean they relied on some headline numbers, but there's more that they could have done to get a better model here...

Still, it's not their job is it? Just a bit of fun... which is a good job, because I find it just a little bit amusing.

bambax 7 years ago | |

> The model said that there is a lot of uncertainty, and as it happens, it was entirely correct. A World Cup chance of 18.5 percent means that roughly 4 out of 5 times the team will not win, and the fact that this is the highest chance does not say much about the model.

But do you need a sophisticated model and lots of so-called "AI" to arrive at the conclusion that there's a lot of uncertainty?? The point of the model is to reduce uncertainty, not find that it's there and do nothing about it.

thousandautumns 7 years ago | | |

The point of the model is absolutely not to reduce uncertainty, it is to quantify it, which are two very different things. No model reduces uncertainty in a probabilistic sense.

And no, you don’t need statistics or machine learning to say “there is a lot of uncertainty”, but you do in order to quantify that uncertainty.

pbhjpbhj 7 years ago | |

Uncertainty is a truism; that's why people want to use a prediction algo. Did the system do better on results it was more certain about?

Predicting the result of an A or B contest the bar is already defined. Either the system gets it right or it doesn't; if it gets it right more often than not then (despite this being poor grounds mathematically, on a small result pool) the popular press will report it as successful.

IMO if matches become easy to predict then rules will change to reduce that predictability.

LeifCarrotson 7 years ago | | |

> Predicting the result of an A or B contest the bar is already defined.

I disagree: if team A has a 10-30% chance of winning, and A pulls off the upset, the correct answer was not "A wins"; it was "B has a 70-90% chance of winning".

For Goldman Sachs' investments, the bar is not to predict that A wins or that B wins, it's to predict the probability and variance regarding which team will win. Of course, from a single upset game, it's impossible to tell whether these estimates are correct. You'd need to see the success or failure of many trials.
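The "many trials" point can be illustrated with a minimal sketch. It assumes a hypothetical forecaster whose stated upset probability (20% here, an illustrative number) is exactly right, and shows that one game says almost nothing while a long run of games converges on the stated probability. The function name and the seed are made up for the example.

```python
import random

def observed_upset_rate(p_upset, n_games, seed=0):
    """Simulate n_games in which the underdog truly wins with probability p_upset.

    Returns the fraction of games that ended in an upset.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    upsets = sum(rng.random() < p_upset for _ in range(n_games))
    return upsets / n_games

if __name__ == "__main__":
    # With one game the observed rate can only be 0.0 or 1.0,
    # whatever the true probability is.
    for n in (1, 10, 100, 10_000):
        print(n, observed_upset_rate(0.2, n))
```

A World Cup has 64 matches, which sits much closer to the noisy end of this range than to the 10,000-game end.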

usgroup 7 years ago | |

Model totally sucked against betting odds and if you used the model probabilities to price bets you would have lost a lot of money vs even an average bookmaker.

Score it yourself against implied probabilities from Betfair for example and marvel at the suckage.
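For anyone who wants to try that scoring, here is a rough sketch of one way to do it. The odds and model probabilities below are hypothetical placeholders, not Betfair or Goldman numbers, and normalising away the bookmaker margin proportionally is just one common convention.

```python
import math

def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities, normalising away the bookmaker margin."""
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())  # exceeds 1 when the book carries a margin
    return {team: p / total for team, p in raw.items()}

def log_loss(probabilities, winner):
    """Negative log of the probability assigned to what actually happened."""
    return -math.log(probabilities[winner])

# Hypothetical numbers for a single two-way market, for illustration only.
market_odds = {"A": 1.50, "B": 2.80}
model_probs = {"A": 0.75, "B": 0.25}

market_probs = implied_probabilities(market_odds)
for winner in ("A", "B"):
    print(winner,
          round(log_loss(model_probs, winner), 3),
          round(log_loss(market_probs, winner), 3))
```

Lower is better, and the comparison only means something when averaged over many matches rather than read off one result.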

jbob2000 7 years ago | |

But Goldman Sachs are the kings of predicting uncertainty! This is their whole business! They make billions predicting certainty through the murky, uncertain waters of the global economy. Would you argue that the global economy is more uncertain than soccer? I'd say so. How is it that they can find success in the market but not in soccer?

I think this is a smoke signal. Soccer is corrupt; you can't predict the winner unless you know what's being passed around under the table. Goldman Sachs does these predictions so people read between the lines to see how corrupt it is.

My argument is: "Goldman is amazing at statistical analysis and they routinely practice it on much tougher models (the global economy), so they should have no problem predicting a simpler model (soccer). But since they drastically failed at predicting soccer, then there must be an equally drastic variable missing from their predictions. Since we can trust Goldman to use all available public information in their analysis, there must be critical information that is hidden from the public which affects the outcomes". I make some assumptions, but it's fairly sound, no?

appleiigs 7 years ago | | |

Goldman's business model is not to predict the future. Goldman has 2 business models: 1) transfer risk, 2) provide advice. For #1, it's a middleman. For #2, it's paid for brain power, experience and speed.

cepth 7 years ago | | |

Unclear if your comment is tongue in cheek, but assuming that you're serious, I'd encourage you to give a listen to a podcast episode like this: https://soundcloud.com/bettheprocess/episode-35-ted-knutson.

In the world of sports betting/analytics, you have baseball and basketball at the forefront, and then American football, soccer, and hockey (roughly in that order).

Off the top of my head, there are several reasons why the latter three sports have all lagged behind:

-Lack of data

It wasn't until the last 4-5 years that affordable and accurate data for soccer matches became widely available. Companies like Opta have accomplished this by outsourcing the watching of games and the manual tagging of events, which was made possible by the advent of cheap cloud computing.

It should be self-evident why tracking the position and actions of 22 players is more complicated than something like baseball, where for the most part you are looking at one pitcher vs. one batter, much of which can be automated with computer vision that tracks pitch position, speed, and spin.

-Complexity

It's no accident that baseball was the first sport to be revolutionized by analytics. Most of the time, it's a static game, with a clearly defined action set. I.e. do I swing at the pitch or not. Do I throw a fastball or not. Do I attempt to steal a base or not.

In games like American football, soccer, and hockey, you have anywhere from 12-22 players on the field at a time. Tracking what the players without the ball or the puck are doing is a difficult task technically, as is quantifying their impact. Concepts like expected goals and expected goals added are recent ones.

-Sample size

Typical elite soccer leagues see each team play each other twice. In England and Spain, this means you have 38 games per season.

Baseball has a 162-game season and playoff games, basketball has an 82-game season and playoff games, etc. Couple that with the fact that quality data has only been collected for a few years, and you get other problems.

In basketball and baseball, the effects of aging on player performance and statistics are fairly well understood now. We can generally calculate the 5-year market value of a player etc. In the other sports I mentioned, we don't yet have that kind of time series data to be able to make those judgements.

Specific to the World Cup, there are other reasons why you may find it hard to predict results.

-Team chemistry and style

Even though the World Cup is the most high-profile soccer event in the world, most players are spending 1-3 months a year with their national teams. Their "day jobs" with their club teams take up most of their playing time and attention.

As anyone who has played the game Football Manager will know, managing a national team is a tough job. You have no say over how the players are practicing when they're away from you, and no control over the physical condition in which they arrive at the World Cup. This year, there was barely a month between the end of the regular European seasons and the start of the World Cup.

In that month's time, you have to get at least 11 players who have not played with each other, to learn your style of play. Do you want to play a pressing style? Are you attempting a slow buildup, or trying long balls? Etc. etc.

-Home field advantage

In baseball and basketball, most modern statistical models account for home field advantage. Having 60,000 Russian fans chanting and heckling likely played a role in the team's ability to upset Spain, particularly during penalty kicks.

This goes back to the sample issue. How many times before have Spain played Russia IN Russia in front of a large crowd? Probably never.

---

All this is to say, cut Goldman some slack. There are a number of non-nefarious reasons why you may expect a soccer model to produce some spectacular miscues.

rco8786 7 years ago | | |

> you can't predict the winner unless you know what's being passed around under the table

Pretty sure you just inadvertently identified why GS is so “great” at predicting economic movements.

jasode 7 years ago |

Leonid Bershidsky and a lot of other journalists laughing at Goldman Sachs' incorrect predictions seem to miss the point.

The World Cup predictions from Goldman Sachs (and also UBS) are a form of recreation and entertainment with machine learning. It's an expression of quant nerd humor.

Analogous intellectual games would be engineers devising ridiculous Rube Goldberg contraptions[1] or programmers building "enterprise" FizzBuzz[2].

(I think it would add to the fun if GS uploaded their raw data and models to Github for others to play with.)

>It certainly didn't predict the final opposing France and Croatia on Sunday.

True, but it did predict France having a better chance of winning overall despite being handicapped by a tougher draw. It also predicted France beating Croatia in the round of 16 instead of the final. The pdf says:

>While Germany is more likely to get to the final, France has a marginally higher overall chance of winning the tournament,

[1] https://en.wikipedia.org/wiki/Rube_Goldberg_Machine_Contest#...

[2] https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...

raverbashing 7 years ago |

People conflate statistics with actual results more often than not and I think those reporting on such stories and maybe even the original authors might fall for this.

It was not wrong to say Hillary had a 95% chance of winning the presidential election, but the confidence was low and that value still allowed for the opposite result to happen.

Also football has a lot of variance concerning team capability and end results. The better team might (and does) lose often, especially when going to penalty shoot-outs.

With basketball, the stronger team will be easily scoring more in most cases.

boomboomsubban 7 years ago |

The World Cup is about the worst sporting event for data-led predictions like this; far too much can rely on a few events that are basically a coin flip. It would be interesting to see how the predictions went for something like the Premier League tables.

anonu 7 years ago |

People love to beat up on these companies because of this stupid World Cup prediction. Yes, Goldman is a giant vampire squid wrapped around the face of humanity (Matt Taibbi quote). But it turns out it's really just great marketing for their research teams.

Also, I've seen some people say (not in this forum) that banks now look stupid because they're in the business of making predictions and they can't even get the world cup right. Guess what? Banks make no money on predictions. They make money on flows and taking spreads on trades they do with clients. Any research or prediction is meant to be a catalyst for that trade.

jasode 7 years ago | |

>Banks make no money on predictions. They make money on flows and taking spreads on trades they do with clients.

You're mostly right but to further clarify, an investment bank like Goldman Sachs has revenue from mostly "market making" spreads but it does also have activities that depend on predictions such as their proprietary trading (before the Volcker Rule shut them down) and their GSAM (Goldman Sachs Asset Management) fund. The GSAM is basically a hedge fund for their wealthy clients' money. They will run predictions on macro trends on data like interest rates, commodities, indexes, etc to help them pick stocks for their portfolio.

As the pdf noted, the World Cup data models and simulations came from Adam Atkins of GSAM.

anonu 7 years ago | | |

The Volcker Rule shut down approximately zero proprietary trading on Wall Street. Any articles you can point me to were merely media stunts by their respective firms.

The rule was too complex and onerous to be implementable. Case in point, it's already being rolled back... Certainly because of the current administration we're in. But more because it was just a poorly written and thought-out idea to start with.

ig1 7 years ago | |

The line between market making and prop trading is blurrier than you think. Whenever you quote a price you're implicitly making a prediction on the future of the market.

chopin 7 years ago | |

I am pretty sure banks make money on predictions if they get people to follow them.

anonu 7 years ago | | |

Yes... This is less a prediction and more a legal form of front running.

crispyambulance 7 years ago |

I am somewhat shocked that GS would jump into the prediction business of the World Cup, even as a joke. The risk of people getting the wrong idea about the prediction and GS itself is too great, even with a perfectly defensible model.

This is an enterprise for bookies, not Goldman Sachs.

TuringNYC 7 years ago | |

FYI - I worked at Goldman Sachs and then a hedge fund for a decade. On the Capital Markets / Trading side, you are literally a bookie. In fact the nomenclature is "you have a book." You are setting trading spreads based on where you think things will go. Depending on the market, your work may be more or less statistical and you're trying to gain a statistical advantage.

Maro 7 years ago | | |

Off-topic: were you able to retire after that decade?

soVeryTired 7 years ago | | |

Surely your book is meant to be hedged though?

denzil_correa 7 years ago |

The "Ludic Fallacy" strikes again [0].

> The ludic fallacy, identified by Nassim Nicholas Taleb in his 2007 book The Black Swan, is "the misuse of games to model real-life situations."

...

> The alleged fallacy is a central argument in the book and a rebuttal of the predictive mathematical models used to predict the future – as well as an attack on the idea of applying naïve and simplified statistical models in complex domains. According to Taleb, statistics is applicable only in some domains, for instance casinos in which the odds are visible and defined.

Both Taleb's books, "The Black Swan" and "Fooled by Randomness" are an interesting take for such models. Meanwhile, most economists know about "Knightian Uncertainty" [1] which talks about differentiation of risk and uncertainty.

> "Uncertainty must be taken in a sense radically distinct from the familiar notion of Risk, from which it has never been properly separated.... The essential fact is that 'risk' means in some cases a quantity susceptible of measurement, while at other times it is something distinctly not of this character; and there are far-reaching and crucial differences in the bearings of the phenomena depending on which of the two is really present and operating.... It will appear that a measurable uncertainty, or 'risk' proper, as we shall use the term, is so far different from an unmeasurable one that it is not in effect an uncertainty at all."

[0] https://en.wikipedia.org/wiki/Ludic_fallacy

[1] https://en.wikipedia.org/wiki/Knightian_uncertainty

lowkeyokay 7 years ago |

If anything, this is a clear illustration of poor use of probabilistic prediction. When used for investments you have many outcomes. If the model is any good, you will get most of them right. In the World Cup you have very few, even if you count all games played. Definitely not excusing Goldman Sachs here; they should have known better than to try to predict this. There was only a tiny chance this could be a great advertisement for their model.

Ntrails 7 years ago | |

> they should have known better than to try to predict this.

There's no downside, only free publicity. If they, by good fortune and a following wind, get it right - then the publicity is incredible. If it's wrong they laugh and say "well, better stick to predicting what we're good at!" and they still get a shitload of headlines and awareness of their product.

This was not a mistake.

jamespo 7 years ago | | |

Well, there's the downside of articles like this pointing out they've had 4 years to work on their models and they've got worse

geraldbauer 7 years ago |

PS: If you want to build or train your own model or make predictions, you can find open (structured) data about all World Cups at the football.db, see https://github.com/openfootball/world-cup and https://github.com/openfootball/world-cup.json Enjoy the beautiful game.

kgwgk 7 years ago |

The predictions were not so bad. At least one of the favourites won in the end. GS had France winning with 11.3% probability, second to Brazil with 18.5%. UBS was less fortunate, they had Germany (24%), Brazil (19.8%), Spain (16.1%) and England (8.5%) before France (7.3%).

I compared the logloss for their predictions with the "uniform" benchmark (giving each team 1/32 probability of winning, 1/16 probability of getting to the finals, etc) and the results are the following (if I transcribed the data properly):

Getting to second round:

GS: 0.495 UBS: 0.495 bench: 0.693

Getting to quarter-finals:

GS: 0.463 UBS: 0.459 bench: 0.562

Getting to semi-finals:

GS: 0.310 UBS: 0.327 bench: 0.377

Getting to final:

GS: 0.231 UBS: 0.269 bench: 0.234

World Cup winner:

GS: 0.097 UBS: 0.113 bench: 0.139

The performance of the models was OK until Croatia got to the final. This especially hurt UBS, who predicted less than 0.9% probability of such an event (compared to 2.1% in Goldman's model).

Edit: these would have been the "best case" scores (if the high-probability teams had qualified for each round, ignoring that this may be impossible due to the structure of the tournament):

GS: 0.432 0.302 0.220 0.141 0.079

UBS: 0.365 0.251 0.176 0.111 0.070

UBS could potentially achieve lower logloss metrics because it had more extreme predictions.
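The "bench" column can be reproduced with a short sketch. It assumes the score is the mean binary log loss (natural log) over all 32 teams, with every team given the same probability of reaching a round; that assumption is not stated in the comment, but it gives the same benchmark figures to three decimals. The function name is made up for the example.

```python
import math

def uniform_benchmark_logloss(n_teams, n_advancing):
    """Mean binary log loss when every team gets probability n_advancing / n_teams."""
    p = n_advancing / n_teams
    hit = -math.log(p)        # loss for each team that did reach the round
    miss = -math.log(1 - p)   # loss for each team that did not
    return (n_advancing * hit + (n_teams - n_advancing) * miss) / n_teams

# Number of the 32 teams that reach each stage.
rounds = {"second round": 16, "quarter-finals": 8, "semi-finals": 4,
          "final": 2, "winner": 1}
benchmark = {name: round(uniform_benchmark_logloss(32, k), 3)
             for name, k in rounds.items()}
print(benchmark)
```

Scoring the GS or UBS columns the same way would need their per-team probabilities for each round, which are not reproduced in the comment.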

cascom 7 years ago |

Isn’t this a little like flipping a coin four times - getting heads four times in a row, and looking at your friend and saying “but you told me the odds were 50/50 each flip?!”

thousandautumns 7 years ago | |

Yes, it is.

rcdmd 7 years ago |

This article didn't compare the Goldman Sachs model to any other models-- why not compare it with sports betting odds? Would Goldman have made or lost money betting their model was better than the crowd?

sunstone 7 years ago | |

Or compare it with the fivethirtyeight blog predictions.

vl 7 years ago |

>Soccer, with the many factors that affect game outcomes — players’ injuries and intra-team conflicts, the refereeing, the weather, coaches’ errors and moments of inspiration — remains only a tightly-regulated game involving a few dozen people. The behavior and performance of big corporations, entire industries and nations is arguably even more difficult to model based on data about the past.

The author misses the way models work entirely: the larger the entity, the more statistics and averages kick in, and as a result, a better model can be built.

Donald 7 years ago | |

Depends on the complexity of the interactions between variables. There are plenty of examples where we have excellent local models, but make (comparatively) worse prediction at scale. A pretty classic example is biology - we have excellent knowledge about how genotypes work and their interactions in cells, but models of phenotypes are typically expensive, error-prone, or non-existent.

dmichulke 7 years ago |

I watched quite a few matches and among the things I saw in the matches but not in any statistics are:

- motivation (Germany and Croatia were the two extremes here, no idea how to measure it)

- team cohesion (number of articles in a few journals questioning the team cohesion, maybe also articles about individual players)

- creativity in offense (maybe measurable via "target missed from close distance" + "ball passed in front of the goal")

- number of errors in defense that didn't lead to a goal

- percentage of times ball possession was lost from own goal to enemy's area (England was really bad here against Croatia)

iainmerrick 7 years ago |

> Thanks to the use of more granular data, made possible by AI, this year’s model should have worked better than the 2014 one.

> If anything, it worked worse.

"If anything"? All the results are available, so it would be easy to put a precise number on this. Measure the Bayesian regret, or just report the winnings if you had used the GS model to bet on the outcomes. Unless it reports some concrete numbers, this article is garbage.

It doesn't report any concrete numbers.

corpMaverick 7 years ago |

Soccer is a sport with a big random component. This is probably why it is so exciting. An average team can beat a better team.

The reason is easy to see. The game can be decided by one, two or three key plays. Compare that to basketball. To win a game you have to consistently score more and defend better. Rarely is the game decided by one or two plays. That only happens when the game is already very tight.

barrkel 7 years ago |

I put money on Belgium (12.0 decimal odds) and Croatia (15.0) after the group stages, where some form was visible, combined with knowledge that they had some of the world's best players.

The odds shortened as the tournament progressed, and I was able to hedge as the shortened odds made lay betting profitable.

(High variance in football outcomes means there's no guarantee of profit, I don't bet big sums.)
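The back-then-lay hedge described above comes down to a little arithmetic. This is only a sketch: the 12.0 back price comes from the comment, while the 10-unit stake and the 4.0 lay price are hypothetical, and exchange commission is ignored.

```python
def lay_stake_for_equal_profit(back_stake, back_odds, lay_odds):
    """Lay stake that locks in the same profit whether or not the team wins."""
    return back_stake * back_odds / lay_odds

def hedged_profit(back_stake, back_odds, lay_odds):
    """Return (profit if the team wins, profit if it loses) after hedging."""
    lay_stake = lay_stake_for_equal_profit(back_stake, back_odds, lay_odds)
    # Team wins: the back bet pays out, the lay bet costs its liability.
    if_win = back_stake * (back_odds - 1) - lay_stake * (lay_odds - 1)
    # Team loses: the back stake is lost, the lay stake is kept.
    if_lose = lay_stake - back_stake
    return if_win, if_lose

# Back at 12.0 early on, lay at a hypothetical 4.0 once the odds have shortened.
print(hedged_profit(10.0, 12.0, 4.0))  # (20.0, 20.0)
```

The locked-in profit exists only if the odds actually shorten; if the team goes out first, the back stake is simply lost.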

anoncoward111 7 years ago | |

This answer is very useful and contains proper strategy advice :)

If someone were to bet during the round of 16, putting $1 on each of the bottom 8 and $2 on each of the top 8, the strategy would most likely yield a small profit or a small loss, rather than a total loss.

tirumaraiselvan 7 years ago |

It's a fool's errand to predict high variance events like football games.

pbhjpbhj 7 years ago | |

Only predict events that are easy to predict, never fail!

patagonia 7 years ago |

Financial modeling is about risk-adjusted return. Because GS knows they cannot determine with certainty the outcome of a given investment, they diversify and hedge. Most of all, GS is a market maker, the equivalent of a bookie. To say that GS’s models “didn’t come close” is to ignore all the ways in which such a grading scheme is different from GS’s actual business model. If their WC prediction efforts acted as anything more than a fun-spirited PR project, it was likely that GS wanted to somehow keep its employees engaged and adding business value during the WC, which they otherwise would certainly have been watching all month.

rossdavidh 7 years ago |

In addition to the many other problems with this article, I would like to point out that if, somehow, Goldman Sachs had managed to create a model that could accurately predict the results, the game of soccer would have to be changed to make it more unpredictable somehow. It is intrinsic to the nature of sport that, in order to be entertaining, there has to be a realistic chance for more than one team to win. Not many people (even from the winning country) would bother watching if it were accurately predictable.

kulu2002 7 years ago |

Good... There was this discussion thread a few days back on HN

https://news.ycombinator.com/item?id=17509407

Did this investment bank use same set of algorithms that they use for financial predictions?

...And then I remember there was this Octopus[1] who used to predict winners with 85% accuracy

[1]https://en.wikipedia.org/wiki/Paul_the_Octopus

IkmoIkmo 7 years ago |

You'd have to run this World Cup thousands of times by simulation. Running it a single time and determining that the results are not in line with the model is meaningless and silly.

It's as silly as saying my claim for the odds of nearly perfectly modelling a coin toss (approximately 50/50%) is wrong because a series of 10 coin tosses show different results from my model. The model is not any less correct.
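To make "run it thousands of times" concrete, here is a toy Monte Carlo sketch of an eight-team knockout. Everything in it is an assumption for illustration: the team names, the strength ratings, the bracket order, and the Bradley-Terry style win probability. None of it is taken from the Goldman Sachs model.

```python
import random
from collections import Counter

# Hypothetical strength ratings; NOT taken from the Goldman Sachs report.
STRENGTH = {"A": 8.0, "B": 6.0, "C": 5.0, "D": 4.0,
            "E": 3.0, "F": 2.5, "G": 2.0, "H": 1.5}

def play(team_a, team_b, rng):
    """Bradley-Terry style match: win probability proportional to strength."""
    p_a = STRENGTH[team_a] / (STRENGTH[team_a] + STRENGTH[team_b])
    return team_a if rng.random() < p_a else team_b

def simulate_knockout(bracket, rng):
    """Play adjacent pairs round by round until one team is left."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [play(teams[i], teams[i + 1], rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]

def title_probabilities(bracket, n_runs=20_000, seed=1):
    """Estimate each team's chance of winning by repeated simulation."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(bracket, rng) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in bracket}

probs = title_probabilities(["A", "H", "D", "E", "B", "G", "C", "F"])
print(probs)
```

Even with a clear favourite in this toy setup, the favourite's title probability stays well under one half, which is the same shape as the 18.5% headline figure.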

Keyframe 7 years ago |

It's as good a time as any to plug in EA's simulation results: https://www.easports.com/fifa/news/2018/ea-sports-predicts-w...

msravi 7 years ago |

Duh. Looks like there's a fundamental misunderstanding of how statistics works all around. The probability of an event does NOT predict a particular outcome. Ever. It only says that if the experiment is performed again and again and again, like a few thousand times, then X% of those will match that probability.

If I toss a fair coin you cannot predict the next outcome. You can only say that if I toss the coin a 1000 times, then close to 500 are going to turn up heads, and another 500 are going to turn up tails.

It was stupid of Goldman Sachs or whoever to predict an outcome. It was stupid of anyone else to lend credence to that prediction.

Hopefully, Goldman Sachs is not relying on prediction of singular outcomes to make their investment decisions. I don't think they are. Probably just marketing brouhaha to ride the soccer wave. Although I'm not sure if that worked as expected.

Sean1708 7 years ago | |

> It was stupid of Goldman Sachs or whoever to predict an outcome.

If you read the actual report they did[0], they never claimed that any single outcome was more than 18.5% likely.

[0]: http://www.goldmansachs.com/our-thinking/pages/world-cup-201...

pbhjpbhj 7 years ago | |

I agree completely with your opening remarks.

>"You can only say that if I toss the coin a 1000 times, then close to 500 are going to turn up heads, and another 500 are going to turn up tails."

Sometimes you can do that and every single flip will be heads. It's unlikely, and across zillions of universes you'd only find it once - but we don't have a pool of universes that we can sample statistically.
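The coin-toss figures in this sub-thread can be checked exactly rather than argued about. A small sketch, assuming a fair coin and independent tosses (the function name and the 450-550 window are chosen for the example):

```python
from math import comb

def prob_heads_between(n, low, high):
    """Exact probability that a fair coin shows between low and high heads in n tosses."""
    favourable = sum(comb(n, k) for k in range(low, high + 1))
    return favourable / 2 ** n  # exact integers, converted to float at the end

exactly_500 = prob_heads_between(1000, 500, 500)
within_50 = prob_heads_between(1000, 450, 550)
all_heads = prob_heads_between(1000, 1000, 1000)

print(exactly_500)  # about 0.025: "exactly 500" is itself unlikely
print(within_50)    # above 0.99: "close to 500" is nearly certain
print(all_heads)    # 2**-1000, positive but astronomically small
```

So "close to 500" is a safe statement, "exactly 500" is not, and 1000 straight heads is possible in principle only.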

hsienmaneja 7 years ago |

They don’t have an edge like they do in their bread and butter markets, combined with a small sample set == high probability of a single year of sports predictions falling over like this

gesman 7 years ago |

If GS needed to bet money, their actual business model would likely be to sell a bit of each of the higher-probability losers (less risk) vs. buy big on the projected winners (higher risk).

blattimwind 7 years ago |

This site is a good counter-example for website optimization: While it uses many assets, so a CDN domain makes sense, it spreads them out thinly. It loads over 100 CSS files, most of which are below 1K. Similarly it loads approximately 30 JS scripts, most of which are just a few K each. This is mitigated to a large extent by using HTTP/2.0, which permits a few dozen or so parallel requests, but it still means that a repeated load of the page takes 2-3 seconds. (Without HTTP/2.0 this probably takes ages, since browsers open only a few connections to each origin at most). There is also almost no difference between reloading with and without the cache.

rdlecler1 7 years ago |

In the world of models, increasing precision does not necessarily increase accuracy.

Sean1708 7 years ago |

In case anyone was interested here is a table of how likely the model thought each team was to make it through any particular stage[0] along with the stage that that team went out in and the probability that the model gave for that particular outcome (i.e. [probability of making it through the final stage they made it through] - [probability of making it through the stage they went out in]).

                Groups  Round_16  Quarters  Semis  Finals    Out_In  Probability
        Brazil   87.5%     60.8%     42.0%  27.9%   18.5%  Quarters        18.8%
        France   81.4%     58.4%     36.6%  19.9%   11.3%       Won        11.3%
       Germany   80.5%     49.5%     30.5%  18.8%   10.7%    Groups        19.5%
      Portugal   75.2%     52.8%     32.2%  17.3%    9.4%  Round_16        22.4%
       Belgium   78.5%     51.1%     27.7%  15.8%    8.2%     Semis        11.9%
         Spain   72.3%     50.1%     28.8%  15.4%    7.8%  Round_16        22.2%
       England   73.1%     46.6%     24.4%  13.4%    6.5%     Semis        11.0%
     Argentina   79.7%     44.2%     24.1%  11.8%    5.7%  Round_16        35.5%
      Colombia   74.9%     37.3%     17.0%   8.5%    3.7%  Round_16        37.6%
       Uruguay   74.4%     34.6%     17.2%   7.2%    3.2%  Quarters        17.4%
        Poland   68.5%     30.5%     12.8%   5.8%    2.3%    Groups        31.5%
       Denmark   47.8%     26.3%     12.4%   5.2%    2.0%  Round_16        21.5%
        Mexico   52.0%     23.2%     10.5%   4.9%    1.9%  Round_16        28.8%
        Sweden   45.9%     19.4%      8.3%   3.7%    1.3%  Quarters        11.1%
          Iran   35.4%     18.1%      7.2%   2.6%    0.8%    Groups        64.6%
          Peru   37.3%     17.2%      6.8%   2.5%    0.8%    Groups        62.7%
     Australia   33.5%     15.4%      6.3%   2.3%    0.7%    Groups        66.5%
        Russia   47.9%     16.3%      6.0%   2.0%    0.7%  Quarters        10.3%
       Croatia   49.8%     16.9%      6.3%   2.1%    0.6%    Finals         4.2%
   Switzerland   52.8%     15.9%      6.1%   2.0%    0.6%  Round_16        36.9%
       Iceland   45.2%     15.1%      5.6%   1.8%    0.5%    Groups        54.8%
    Costa_Rica   36.8%     13.3%      4.7%   1.6%    0.5%    Groups        63.2%
        Serbia   32.9%     12.1%      4.5%   1.5%    0.5%    Groups        67.1%
         Japan   36.5%     12.8%      3.8%   1.3%    0.4%  Round_16        23.7%
  Saudi_Arabia   43.4%     12.7%      4.2%   1.3%    0.4%    Groups        56.6%
       Tunisia   35.2%     13.3%      4.1%   1.3%    0.4%    Groups        64.8%
         Egypt   34.4%      8.7%      2.5%   0.7%    0.2%    Groups        65.6%
   South_Korea   21.6%      5.9%      7.1%   0.5%    0.2%    Groups        78.4%
       Morocco   17.1%      6.8%      1.8%   0.5%    0.1%    Groups        82.9%
       Nigeria   25.2%      6.5%      1.7%   0.4%    0.0%    Groups        74.8%
       Senegal   20.1%      4.9%      1.2%   0.3%    0.0%    Groups        79.9%
        Panama   13.2%      3.3%      0.5%   0.1%    0.0%    Groups        86.8%

[0]: Exhibit 2 in http://www.goldmansachs.com/our-thinking/pages/world-cup-201...

Edit: Fix copy-paste errors and atrocious maths.

ernesth 7 years ago | |

Japan went out in Round_16

Croatia went out in Finals

And I do not understand what the last column means (except for France and teams out in group phase)

Sean1708 7 years ago | | |

Urgh, I hate that you can't edit HN comments.

First two were just me making a mistake because I wrote that in manually.

That last column makes no sense. It was supposed to be the probability that the model gave to the outcome that occurred, but I got the maths wrong.

ernesth 7 years ago | |

Great work :)

So all in all, the only teams for which the prediction was more than 1/2 were teams out in groups. That is a little underwhelming.

Ah, for Croatia, I believe, it should read 1.5%.
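For reference, the last column as described (probability of getting through the previous stage minus probability of getting through the stage the team went out in) can be recomputed with a small sketch. The two rows are copied from the table; the function name and the assumption that every team has a 100% chance of starting the group stage are mine.

```python
STAGES = ["Groups", "Round_16", "Quarters", "Semis", "Finals"]

def exit_probability(advance, out_in):
    """Probability the model gave to a team's actual exit stage, in percent.

    advance: chance of getting through each stage, in STAGES order.
    out_in:  stage in which the team went out, or "Won".
    """
    if out_in == "Won":
        return advance[-1]
    k = STAGES.index(out_in)
    # Every team starts the group stage, so the "previous stage" is 100% there.
    reached = 100.0 if k == 0 else advance[k - 1]
    return round(reached - advance[k], 1)

# Rows copied from the table above.
croatia = exit_probability([49.8, 16.9, 6.3, 2.1, 0.6], "Finals")
brazil = exit_probability([87.5, 60.8, 42.0, 27.9, 18.5], "Quarters")
print(croatia, brazil)  # 1.5 18.8
```

This agrees with the 1.5% correction for Croatia and with the table's 18.8% for Brazil.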

known 7 years ago |

GarbageIn = ML = GarbageOut

known 7 years ago |

I worked in GS; Soccer/football prediction is not their forte

tomelders 7 years ago |

While I agree that it's somewhat silly to try and predict a World Cup winner like this (and I suspect it was just a bit of fun anyway), there is one other reason that could explain why all these attempts got it so wrong.

Cheating.

Before people start booing, let's not forget where this tournament is being held, and all the other nefarious things that country has been up to recently.

teamk 7 years ago | |

FIFA has been corrupt for decades. Although supposedly it's been cleaned up since Blatter was removed, it is doubtful the institutional corruption has been eliminated completely. The only question is how pervasive it is.