Rides of Glory – Uber Blog (2012)(web.archive.org) |
Anyone (especially the HN crowd) should know they have the data, and if you think they're not carefully analyzing it behind the scenes (like every other tech company that has your data), I've got things to sell you. I personally think a tiny peek like this into the data, much like the usage posts that OKCupid, YouPorn, and others publish, is neat.
To test this, we took pairs of bad matches (actual 30% match) and told them they were exceptionally good for each other (displaying a 90% match.)
That's really not something people like having done to them. And the "HN crowd" shouldn't have an expectation of privacy and decency in data? Of course they're analyzing data, but it's really the viewpoint from which they do it that is unsettling. OKCupid says "no, duh, we're unethical. Deal with it." Uber says "Check it out! We drew a line between social security checks and prostitution!" (as waterlesscloud notes at https://news.ycombinator.com/item?id=8644138). There are a million more beneficial ways that people could be using the data. Fighting hunger, poverty, illiteracy, etc., to me, is a "good" use of Big Data. Looking at sexual habits (when you're not selling sex) or openly manipulating people to get data is, to me, a "bad" use.
I'm sorry if the idea that "people's short overnight stays are evident in their travel data" makes you blush, but that isn't anyone else's problem.
(I say they "probably dropped PII" because when you do work of this sort, PII is boring data that slows down your calculations.)
Similarly, what's wrong with observing a correlation between welfare checks and prostitution? It's an interesting observation. It's potentially useful for public policy and fighting poverty (at least American style relative poverty), though of course a more detailed investigation needs to be done.
By contrast, it's simply unprofessional and reeks of juvenile behavior for Uber to write a post like this. Just because you have the data and these thoughts doesn't mean you have to do the analysis and show it to the world. It doesn't help their users, it's not even that interesting, and it's not relevant to their value proposition as a business.
But since they are accused of trying to dig up dirt on people, this is a chilling reminder that they are more than capable of doing that, and apparently quite willing.
That's the creepy bit. Who owns that data? I want to live in a world where I own my data, and it can't be used for creepy purposes like this, or to extract additional value through arbitrage based on asymmetric information availability.
The Streisand effect is so well-known that I'm surprised anyone would delete a blog post nowadays.
EDIT: I actually hadn't read the blog post in detail until now, which was more than a little dumb. I thought it was just an analysis of rides along with some neat heatmap images. I didn't realize it was about sexual datapoints.
And yet.
See, there's another company that occasionally releases interesting data analytics: Google.
See: Word frequency over time, Predicting the spread of viruses from searches, etc.
The issue is that Uber is trying to explain motive and behavior at the individual level ("I know something about you!"). This is something that would be a definite no-no for Google. The cheekiness of the language certainly doesn't help either.
The more I hear about this company, the more thankful I am that we have heavily regulated taxis/cabs.
I think I'd rather my data only be available to a private company and their handful of engineers than the whole world.
(though it's not very well written, some of the analysis is a bit iffy, and the guesswork about the peaks and dips in the graph is rather low-effort)
Creepy/evil, maybe not, because the data is clearly anonymised. However, the cringe is all over this article. OKCupid's stuff could easily be just as cringey, but they know it's important to steer clear of that. Also, they're a dating site: if they wrote an article about data-mining one-night stands, that would make sense. Not so much for a taxi company, especially in light of Uber's general attitude.
The final sentence of the article definitely crossed from "cringe" into "creepy" for me, though. In particular from someone called "Uber".
The PDF in which this article was referenced did so to illustrate the availability of this data for frivolous purposes, and is right to call it "questionable" when considered in light of Hourdajian's statements about privacy and Uber's data-policies.
I mean, I found the idea behind the post interesting: of course you can analyze trends in ridership to draw interesting conclusions. At the end of the day, however, it's a horrible idea to say "Hey, we know which of you are being 'frisky' and where!"
Perhaps with a different motivation, this post wouldn't be nearly as ruinous. How about ridership patterns of sick or socioeconomically disadvantaged people? That's the kind of data that can change lives for the better.
The service they provide doesn't allow the "Ministry of Truth"[1] to doctor historical documents to meet their present day narrative.
archive.org respects the robots.txt of the current website owner. This can mean that they have the data but choose not to give you access to it. I have seen cases in the past where a website I once frequented became defunct, then the domain expired, then someone parked a holding page on that domain with a robots.txt that keeps archive.org from displaying the old data (which doesn't even belong to the current owner of the domain!).
If they wanted to, there are a number of ways Uber could prevent archive.org from displaying that blog post. Many of these ways are due to the good faith under which archive.org operates (nobody is forcing them to respect robots.txt), and some even involve resorting to legal methods. But history is always mutable.
(Nothing but love on my end for archive.org, believe me! But I do want to point out the lengths some people will go to in order to alter the historical record.)
And yes, I don't care what you think, but a company with a billion(ish) dollars of funding is more powerful than YOU.
[0] https://web.archive.org/web/20140827195715/http://blog.uber....
Internal metrics teams nearly always have access to complete data. The issue is sharing non-anonymized data externally.
However, there's no mention in these posts of such safeguards, and subjectively the post reads more like the analyst is just fishing around in the full raw dataset of ride times, start and end locations, and names. To wit:
"What else can we learn? First, we can devise a way to statistically assess whether there are more women or men in a neighborhood than we’d expect. [...] We used Rapleaf’s Name to Gender API to assess the likelihood of a rider’s gender given their name, only accepting a match if the probability was >= 95%."
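The quoted method has two steps. First, keep an inferred gender only when its probability is at least 95%. Second, check whether a neighborhood's share differs from what you'd expect. Below is a minimal sketch. It assumes a hypothetical in-memory table `NAME_P_FEMALE` standing in for the Rapleaf API, whose interface isn't reproduced here. It also uses a normal-approximation z-score as one plausible test; the post doesn't say which test it actually used.

```python
import math

# Hypothetical name -> P(female) table standing in for an external
# name-to-gender service; the real Rapleaf API is not reproduced here.
NAME_P_FEMALE = {"alice": 0.99, "bob": 0.01, "sam": 0.55, "maria": 0.98}

THRESHOLD = 0.95  # accept a label only if its probability is >= 95%


def infer_gender(name):
    """Return 'F', 'M', or None when the name is unknown or ambiguous."""
    p_female = NAME_P_FEMALE.get(name.lower())
    if p_female is None:
        return None
    if p_female >= THRESHOLD:
        return "F"
    if 1 - p_female >= THRESHOLD:
        return "M"
    return None  # below the confidence cutoff: drop the rider


def female_share_z(names, expected_share):
    """z-score of the observed female share among confidently labelled
    riders vs. an expected share (normal approximation to the binomial)."""
    labels = [g for g in map(infer_gender, names) if g is not None]
    n = len(labels)
    if n == 0:
        raise ValueError("no confidently labelled riders")
    observed = labels.count("F") / n
    se = math.sqrt(expected_share * (1 - expected_share) / n)
    return (observed - expected_share) / se
```

For example, `female_share_z(["alice", "maria", "bob", "sam"], 0.5)` keeps three riders (two F, one M) and returns about 0.577. Discarding ambiguous names shrinks the sample, and the normal approximation is rough when counts are this small.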
And in the original post, he categorizes rides as possibly related to a late-night hookup based on whether the destination and departure points for 2 rides are within 0.1 mi of each other.
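The pairing rule described here flags a later ride that starts within 0.1 mi of an earlier ride's drop-off. It can be sketched with the haversine great-circle distance. The sketch rests on stated assumptions: the `Ride` record, the same-rider grouping, and the `max_gap` time window are hypothetical illustrations, not the original post's code. It also compares only consecutive rides per rider.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


@dataclass
class Ride:
    rider_id: str
    pickup: tuple   # (lat, lon) in degrees
    dropoff: tuple  # (lat, lon) in degrees
    start: datetime
    end: datetime


def haversine_mi(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


def paired_rides(rides, max_dist_mi=0.1, max_gap=timedelta(hours=8)):
    """Yield (first, second) consecutive rides by the same rider where the
    second starts within max_dist_mi of the first's drop-off and within
    max_gap after it ends. max_gap is a hypothetical parameter."""
    by_rider = {}
    for r in rides:
        by_rider.setdefault(r.rider_id, []).append(r)
    for trips in by_rider.values():
        trips.sort(key=lambda r: r.start)
        for first, second in zip(trips, trips[1:]):
            gap = second.start - first.end
            close = haversine_mi(first.dropoff, second.pickup) <= max_dist_mi
            if timedelta(0) <= gap <= max_gap and close:
                yield first, second
```

A rule this loose also matches anyone who goes out late and heads home from the same spot the next morning, which is one reason to read the resulting heat map cautiously.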
>Internal metrics teams nearly always have access to complete data. The issue is sharing non-anonymized data externally.
I disagree pretty strongly with this. Do you think that your average Uber rider would be OK with Uber employees analyzing their ride patterns (with their real names attached) to try to figure out where and when they are having sex? Do you think Uber should allow such access to its employees by policy? (It seems we agree that writing a blog post about it is not a great idea.)
This would also explain the spike near the weekend, among other things.
One could do an analysis like this while still working with anonymized data. Still a bit creepy, but not that different from reports and blog posts you see from other startups and tech companies.
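One common way to do that is to replace names and rider IDs with keyed pseudonyms before the data reaches analysts. Ride patterns can then still be linked per rider without revealing who the rider is. Below is a minimal sketch using HMAC-SHA256 from Python's standard library; the key handling and the `rider_id`/`name` field names are assumptions for illustration. A keyed hash matters because a plain unsalted hash of a small ID space can be reversed by brute force. Pseudonyms alone are not full anonymization, since location traces can still re-identify people.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held by a data-governance team, never given to analysts.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonymize(rider_id, key=PSEUDONYM_KEY):
    """Map a rider ID to a stable, opaque token using HMAC-SHA256.

    Without the key, nobody can recompute the token from a guessed ID.
    """
    return hmac.new(key, rider_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_rides(rides, key=PSEUDONYM_KEY):
    """Drop the (assumed) name and ID fields and attach a pseudonym instead."""
    out = []
    for ride in rides:
        row = {k: v for k, v in ride.items() if k not in ("rider_id", "name")}
        row["rider"] = pseudonymize(ride["rider_id"], key)
        out.append(row)
    return out
```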
Nothing they've done so far is, in isolation, IMO worth the pitchforks being handed out in tech and mainstream consciousness right now, but taken as a whole it's pretty easy to see why people aren't willing to cut Uber any slack or give them the benefit of the doubt.
So yeah, this thing by itself isn't "that bad", but it's one piece of a large puzzle of Uber's misbehavior.
There have been very, very few times when a company's webpage was down and I needed to go to google-archive or archive.org to refer to some innocuous information. However, the times that I've used those sites to gather evidence of possible whitewashing? Many, many times, in comparison.
OKCupid is a dating website that deliberately branded itself toward the "edgy" and "hookup" end of dating websites. Then you have POF somewhere in the middle, with eHarmony way out at the opposite end from OKCupid.
I'm not sure why Uber would want to put themselves anywhere on that same scale (i.e. aligning your brand with notions of sex and one night stands). There's a time and a place for everything, and for edgy data analysis like this -- that "place" is edgy dating websites who want to be known for hooking up.
It's unprofessional and out of line with their brand image, which is obviously why the post got deleted. IMO this further validates all the bad press the media has been publishing about Uber.
https://web.archive.org/web/20140827195709/http://blog.uber....
Note that both of these posts had been up for years and only disappeared in the last few days.
"uberdata-how-prostitution-and-alcohol-make-uber-better/"
So if you took an Uber to some bar/club/friend's place at 10-11pm and again after 2am, when the bars are closed and the T has stopped running, you're likely counted. I doubt this represents customers having one-night stands; it's more likely just a heat map of people going out and coming home late. This is further supported by the small pocket in Somerville that isn't accessible by train, only by bus, where people may opt for an Uber instead.
That's not to say that there are no rides of glory or whatever the hell kids call it today.
Would Google publish data that shows how searches for porn spike at different times of the day or year, as if it's some "cool and hip and edgy!" insight?
I don't think so.
And for the same reason they don't (whatever reason that is), it would probably also be wise for Uber not to post stuff like this.
I really don't care, nor am I offended. I'm just speculating that Uber doesn't have the brightest team of execs and still has a lot of "growing up" to do.
Google have been fighting a public relations war for a long time now to not appear creepy or stalkerish. I can think of few things they could blog about to make people consider not using Google more than "we know when you're looking for porn".
Uber have not (yet?) been widely called out as being creepy the way Google have. But Uber have data that can be every bit as personal as your search history, and posts like these make it obvious that people at Uber are thinking hard about putting those data to use.
There's a lot lurking under what at first glance appears to be merely a poorly considered, sophomoric post.
It's the actions of an unscrupulous minority that ruin this for the rest of us. I personally believe that most of the time when companies say "We simply aren't that interested in you," they're probably telling the truth. Statistics are pointless if you only look at single data points. But it only takes one person to snoop on an ex, or otherwise blow everything up. Unfortunately you have to mitigate that risk, but properly sanitising the database before handing it over to the analysts should be sufficient. Provided there is no overlap between the sensitive database and the one the analysts have access to, there shouldn't be a problem.
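The "no overlap" idea can be sketched as copying only an allowlist of columns into a separate analyst database. Below is a minimal sketch with Python's built-in `sqlite3`; the table layout, the `ANALYST_COLUMNS` allowlist, and the coarse zone columns are hypothetical. Dropping columns is only one layer of protection, because precise times and places can still identify people. That is why the sketch assumes zones rather than exact coordinates.

```python
import sqlite3

# Hypothetical allowlist: the only columns analysts may ever see.
ANALYST_COLUMNS = ("ride_id", "pickup_hour", "pickup_zone", "dropoff_zone", "fare")


def build_analyst_db(sensitive):
    """Copy allowlisted columns from the sensitive DB's `rides` table into a
    brand-new in-memory DB, so the analyst copy shares nothing with the source."""
    analyst = sqlite3.connect(":memory:")
    cols = ", ".join(ANALYST_COLUMNS)  # constants only, never user input
    analyst.execute(f"CREATE TABLE rides ({cols})")
    rows = sensitive.execute(f"SELECT {cols} FROM rides").fetchall()
    placeholders = ", ".join("?" for _ in ANALYST_COLUMNS)
    analyst.executemany(f"INSERT INTO rides VALUES ({placeholders})", rows)
    analyst.commit()
    return analyst
```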
I guess it's a side effect of becoming 'big' that you can no longer run these kinds of public posts without looking extremely unprofessional.
Does it really matter these days?
There was a related story published recently: "NYC Taxicab Dataset Exposes Strip Club Johns and Celebrity Trips"
http://research.neustar.biz/2014/09/15/riding-with-the-stars...
We are watching them pretty close, aren't we?
Is this data fascinating? I guess the time of year patterns and holiday anomalies are interesting but aside from that this behavior seems obvious?
Now that they have critical mass, they can transition into "full boring corp speak".
HN, don't throw stones: what boundaries are you pushing to get traction right now?
We're not pushing ethical boundaries.
You mean the sort of people who take the bus, who are pretty much the opposite of their target customers?
> Between June 2011 and August 2011 I worked with my friends over at Uber as their data scientist, writing (what I thought were) amusing, data-driven blog posts (among other, more serious roles).
Uber is mining information about people's movements in the real world, which feels ickier.
A correct observation does not shield it from being inappropriate or in poor taste.
Makes sense to remove it now, since it isn't something they want to highlight in light of recent events.
Let's say I post something that I shouldn't have posted -- insider stock information, nude photos, whatever. Perhaps something illegal for me to post. I need to make it go away.
I need to be able to create a robots.txt today which affects stuff I posted yesterday.
This is why archive.org respects the current robots.txt for access to past content.
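The policy described here is to check the site's current robots.txt before showing archived copies. It can be modeled with Python's standard `urllib.robotparser`. Below is a minimal sketch. It assumes the archive's crawler identifies as `ia_archiver`, the user agent commonly targeted by robots.txt rules aimed at archive.org. It also assumes a plain allow/deny check; archive.org's actual logic is not reproduced here.

```python
from urllib import robotparser


def archive_may_show(robots_txt, url, agent="ia_archiver"):
    """Return True if the *current* robots.txt text lets `agent` fetch `url`.

    In this model, archived snapshots are hidden whenever this returns False,
    no matter what robots.txt said when the snapshot was taken.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Because the check runs against whatever robots.txt is live now, a new domain owner who adds `User-agent: ia_archiver` / `Disallow: /` hides snapshots taken under the previous owner.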
Uber drivers in the last year have, without fail, become much worse at pathfinding than cab drivers. I've had drivers completely miss major turns or get lost while the meter's running. And more recently, I've noticed a pattern where I'd call an Uber and the car wouldn't even begin moving for more than 5 minutes.
I'm not sure what that's all about, maybe they're waiting for Surge to kick in in the hopes of getting a fatter fare? Either way, I have not had an Uber arrive within the estimated time for over a year.
Lyft drivers on the other hand start moving right away after they're assigned.
> "oh, the credit card machine is broken," and "I'll only drive uptown right now,"
It sucks that you have to deal with it, but the solution is really simple. Just get in the cab first; don't be a sucker who tells the driver where you're going through the window. If they balk, take out your phone, take a picture of their license in the back, and tell them you're dialing 311. They will immediately fold and take you where you're going.
Ditto credit card - if the credit card machine is broken they are obligated to tell you at the beginning of the ride, and if they don't you can walk away for free.
Ditto the JFK thing - a cab cannot refuse a fare within city limits.
I've never seen a cabbie not fold like a house of cards when threatened with a 311 call. For all its warts the T&LC actually polices driver complaints pretty hard.
I don't see how this is any different from Google analyzing search data to try to figure out whether I'm pregnant. You could make the argument that "it's an algorithm," but at some point someone had to sit down and build that model.
Sure, as long as Uber isn't broadcasting that information with their name attached. The average person really doesn't care about (or understand the extent of) data analysis (by companies or the government); what they care about is public disclosure, which may mean personal embarrassment, a lawsuit, or some other inconvenience. People who want to control all their data are hoping for a fantasy world where observations and inferences by third parties are magically made impossible. The reasonable thing to focus lawmaking efforts on is limiting legal forms of disclosure and standardizing safe storage requirements for the raw data; indeed, such laws already exist, with the HIPAA Privacy Rule perhaps being the best known in the US.
>People who want to control all their data are hoping for a fantasy world where observations and inferences by third parties are magically made impossible.
I think you are setting up a straw man here. What I suspect the average user expects is for their sensitive personal data to be handled in a professional and respectful way, with protections against abuse by rogue employees. There are plenty of companies who deal with private data and understand this well. Potatolicious had a comment on another Uber thread detailing the hoops an Amazon employee has to jump through to access private customer data [2].
Scrubbing these posts suggests that Uber realizes that they have a real problem, at least at the PR level. I wouldn't be surprised if they are also getting more serious about controls on internal access to ride data.
[0] http://www.hhs.gov/ocr/privacy/hipaa/understanding/covereden...
[1] http://dailybruin.com/2010/05/05/former-ucla-medical-center-...
People aren't used to transit companies interrogating them about the purposes of their journeys; they just want the transit company to get them from point A to point B (imagine if they did this when you got in the car: "Where are you going? Why?").
And obviously from a business perspective the more you understand your customers and their motivations the better you can serve them.
But let's not kid ourselves. This isn't anonymized data. Uber is publishing it in a non-specific format, but they have all of the detailed data and can poke through it and infer things at their leisure, and they have no compunction about how they're doing it or why.
This is why ethics and trust around data collectors is really important. Uber seems pretty cavalier about it, and that actually is a problem.
That's a fairly large accusation to make.
This blog post was originally published in 2012, two years ago. Since then, has anything come out that would confirm your suspicions? I haven't seen anything.
I meant "decent" in an ethical sense, not in a conservative "don't you look at my 'short overnight stays'" sense.
I don't disagree that they've not scaled any sort of pinnacle in data science, but neither do I think what they're reporting is uninteresting.
In what way is what they're saying outlandish and unethical?
..even if it's from TC (I won't hold it against you).
This is exactly why research that deals with humans at Universities invariably must pass a human subjects review process. "How else would we discover X?" is certainly not reason to subject anyone to an unethical experiment. Subjecting people to what you likely believe to be a bad date should very definitely raise red flags, even if the details in practice would pass a human subjects review.
And that's the trouble: there's a tremendous space of research that just isn't ethical to carry out on actual living humans. As such, we have to find methods to determine answers to those questions that don't breach ethical standards. The burdens of discovery must lie squarely on the researchers, not on the (often unwitting) experimental subjects.
It's actually far, far more invasive than what Uber did as they described it in the blog post.
Then it would be equivalent.