Scaling Machine Learning at Uber with Michelangelo

Scaling Machine Learning at Uber with Michelangelo (eng.uber.com)

112 points by dpandya 7 years ago | 59 comments

marmaduke 7 years ago |

I love what Uber does with machines (ML), hate what it (currently) does to people.

We recently ported some models from Stan to Pyro (SVI on PyTorch), and it’s been really exciting (except for the dark corner of poutines). It really has the performance of something being used in production, except for the occasional NaN explosion.

Edit: we are lazy and use our GitLab CI/CD to drive model development iteration. It’s not as fully featured as what’s in the article, but it’s a zero-effort start.

cocobongo 7 years ago | |

What does Uber currently do to people that you hate? Uber currently provides people with more than 2 Million jobs [1]. Uber drivers/couriers made almost $13 Billion in the US alone last year [2].

[1] https://medium.com/@gc/ubers-path-forward-b59ec9bd4ef6 [2] https://www.sfchronicle.com/business/article/Uber-drivers-in...

marmaduke 7 years ago | | |

Treat people like freelancers while paying them like low-wage waiters? $13bn/2mn jobs is about $6.5k/job/yr, not so impressive compared to welfare.

jquery 7 years ago | |

I like what Uber (currently) does to people. Gets passengers from point A to point B efficiently while saving them significant money in the process over alternatives. Metaphorically puts dinner on the table of hundreds of thousands of drivers. Literally puts dinner on the table of millions (UberEats). Has a business model that doesn't rely on exposing more eyeballs to more ads, corrupting the press, media, and privacy in the process. Reduces car ownership and dependence. Moving towards encouraging people to ride green vehicles. Literally saves lives (reducing DUI). Yeah, I'm okay with the Uber of 2018.*

*Disclaimer: I work at Uber, and my opinions are solely my own. We're hiring.

platz 7 years ago | | |

"Silicon Valley innovation now is directly aimed at oppressing the underclass, and everybody knows it and can see it. They hate Uber. People hate Uber. It means the death of the era of good feelings that came with this constant Moore's Law style innovation.

And that was an unforced error, by Silicon Valley. It was in their DNA. They didn't have to give Travis Kalanick, a guy they despised and never trusted, for good reason—They didn't have to give him all that venture capital.

But they saw him as an expendable probe, so they cynically gave him money, to see how much law-breaking he could get away with in the name of their disruption activities.

That was hubris—and nemesis is well on the way."

- NEXT17 | Bruce Sterling | Live from 2027

UncleEntity 7 years ago | | |

> Has a business model that doesn't rely exposing more eyeballs to more ads, corrupting the press, media, and privacy in the process.

Though it does have a business model that (did?) flagrantly disregards the law in pretty much every market it moved into.

And we'll see how the privacy thing turns out when they figure out the data they have on millions/billions of people is worth a bunch of money and Wall Street is demanding "more cowbell".

marmaduke 7 years ago | | |

Sorry, I don't believe any of that. It reads like "let them eat cake".

mmq 7 years ago | |

Can you elaborate a bit more on your usage of GitLab CI/CD for model management/development? I am currently working on a platform [1] that tries to solve some of the issues mentioned in the article, i.e. improving data scientists' productivity and velocity, comparing models, solving reproducibility issues...

[1] https://github.com/polyaxon/polyaxon

marmaduke 7 years ago | | |

We, uh, treat models as code, but also have NFS shares set up for the storage and a GitLab runner talking to a Slurm cluster to run the models. Results and cross-validation upload to GitLab. The main thing we haven’t built out yet is performance dashboards for showing improvement across commits, but with the GitLab APIs that’s a script away (currently we do it by hand).
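For anyone wondering what the "script away" part might look like, here is a minimal sketch in plain Python. It assumes each commit's metrics have already been downloaded (for example from job artifacts) into a dict; the metric name `cv_rmse`, the commit hashes, and the function names are all made up for illustration, and nothing here calls the GitLab API itself.

```python
from typing import Dict, List, Tuple

# One entry per commit, oldest first: (commit hash, metrics reported by that run).
# This input format is an assumption for the sketch, not a GitLab data structure.
History = List[Tuple[str, Dict[str, float]]]


def metric_deltas(history: History, metric: str) -> List[Tuple[str, float, float]]:
    """Return (commit, value, change vs. previous reporting commit) for `metric`.

    Commits that did not report the metric are skipped. The first reporting
    commit gets a change of 0.0 because there is nothing to compare against.
    """
    rows = []
    previous = None
    for commit, metrics in history:
        if metric not in metrics:
            continue
        value = metrics[metric]
        delta = 0.0 if previous is None else value - previous
        rows.append((commit, value, delta))
        previous = value
    return rows


def format_report(rows: List[Tuple[str, float, float]]) -> str:
    """Render the rows as a small plain-text table."""
    lines = [f"{'commit':<10}{'value':>10}{'change':>10}"]
    for commit, value, delta in rows:
        lines.append(f"{commit:<10}{value:>10.4f}{delta:>+10.4f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data, only to show the call pattern.
    example = [
        ("a1b2c3", {"cv_rmse": 0.5}),
        ("d4e5f6", {"cv_rmse": 0.25}),
    ]
    print(format_report(metric_deltas(example, "cv_rmse")))
```

Keeping the comparison separate from the fetching means the same function works whether the numbers come from GitLab artifacts, the NFS share, or a hand-written CSV.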

marmaduke 7 years ago | | |

Polyaxon looks nice, but we don’t admin the majority of the GPU resources we use (which is why being able to tell GitLab-runner to invoke Slurm is cool).

Pachyderm is another one I’ve looked at, but we don’t have the sysadmin bandwidth for that stuff right now.

mlthoughts2018 7 years ago | |

Why would you do this instead of using PyMC3?

marmaduke 7 years ago | | |

PyMC3 didn’t run well on GPUs last I tried. That may have changed but I find PyTorch easier to work with than Theano or TensorFlow.

sandGorgon 7 years ago | |

Would love to know what your model development iteration looks like, especially how you do testing, etc.

marmaduke 7 years ago | | |

See my comment here, but I can answer other questions if you have them:

https://news.ycombinator.com/item?id=18376567

Tickon 7 years ago |

This is not a product, nor is it open sourced - so this is basically just a PR stunt. Or am I missing anything??

typon 7 years ago | |

Looks like a blog post about an internal tool. Not sure why this is interesting to people.

paulie_a 7 years ago |

It's kinda funny they tout their usage of GPS. I use Uber on a near-daily basis and drivers by and large use Google Maps. They have outright said "Uber sucks for directions".

And if you use express pools it will always say to go to the wrong side of an intersection. I like Uber because of the drivers, but their fancy technology is flawed.

googlemike 7 years ago | |

Please do not conflate GPS with navigation. There is a massive set of problems you can solve with high-fidelity GPS data (Uber knows it is a driver in a car, verifies it with another GPS entity (the rider app also reports GPS), etc.). There is not that much overlap between great GPS data and great maps: no amount of great GPS data will give you a good basemap. Please let me know if I am not making sense, I am more than happy to provide examples / explain further!
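As one concrete example of the "verify it with another GPS entity" idea, here is a rough Python sketch that checks whether a driver fix and a rider fix are close enough to plausibly be in the same car. This is only an illustration, not a description of what Uber actually runs: the function names and the 50 m threshold are assumptions, and a real system would also have to account for timestamps and reported accuracy.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def fixes_agree(driver_fix, rider_fix, max_distance_m: float = 50.0) -> bool:
    """True if the two (lat, lon) fixes are within `max_distance_m` of each other.

    The default threshold is a hypothetical value chosen for the example.
    """
    return haversine_m(*driver_fix, *rider_fix) <= max_distance_m


if __name__ == "__main__":
    driver = (37.7749, -122.4194)
    rider = (37.7750, -122.4195)
    print(fixes_agree(driver, rider))
```

Note that none of this needs a map: it only compares two position reports, which is the distinction between GPS data and navigation.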

mi_lk 7 years ago | | |

Can you expand on the difference between GPS and navigation?

martinald 7 years ago | |

I've never seen an Uber driver not use Waze in London.

melling 7 years ago | | |

I was in Colombia and South Africa last year. Those Uber drivers also used Waze.

magoghm 7 years ago | | |

In Mexico City I always see them use Waze.

freyir 7 years ago | |

I believe they’re using GPS data here more for analytics, rather than navigation.

They can use GPS data to chart usage metrics, plan pool rides, check for anomalies, and harass journalists, for example.

srean 7 years ago | |

Indeed. What is even stranger about the use of Google Maps is that Uber bought part of Bing Maps, I am sure for a hefty sum.