An Update from Robinhood’s Founders (blog.robinhood.com)

212 points by Beowolve 6 years ago | 277 comments

czbond 6 years ago |

On the professional side of this: if you're an engineer at RH in the thick of this, many have been there. It seems dire now, but in a few years the fog, panic, and haze of no sleep will become a story you tell your peers at happy hour.

Many will cast stones - but they have been there too. If they haven't, well, maybe their day will also come. You may feel bad at the moment - but the best way forward professionally is "We try our best tomorrow."

cheschire 6 years ago | |

If this were an outage directly caused by a natural disaster, I could understand. This outage was an availability problem. This clearly points to some prioritization problems within the leadership layers if robust and resilient infrastructure was not emphasized.

The prioritization problems may not be due to ignorance or malice though, and may be justifiable if there are other fires that are burning brighter. It's still pointing to problems though, and I think it's completely legitimate for engineers to question the stability of the company when this sort of thing happens.

At the very least, as an engineer I would be asking some pointed questions of my leadership. Maybe not dusting off the resume yet, but I'd still want internal reassurance that the leadership problems that caused this are being addressed.

malux85 6 years ago | | |

Sometimes you just have to cut them some slack. Have you engineered a highly available cluster before? I'm not talking about the hot-standby Postgres master that gets called on once every 2 years, but I'm talking about a 180 node Cassandra cluster that's doing 15,000 writes a second 24/7 and peaking at 60,000 writes a second every day, and you have to do node replacements every week or two because of the high load.

Or I'm talking about a 200 node Hadoop cluster that's doing the electrical metering and billing for 8 million people, and is NOT allowed to stop.

Or the trading platform that's running sub-millisecond trades, where downtime means $300,000 USD per minute.

These are systems I have engineered over the last 10 years, and I can say: These things are complex and have failures in 1000 different ways, and while you're monitoring 999 of them that one thing you're not looking at is festering under the surface (your monitoring system is tracking IRQ hardware interrupt response times, right???)

Part of being in a team is everyone pulling together, and yes it's stressful at the time, but even very good management can't see all ends, just like very good engineering can't predict everything. I don't think it's useful to start pointing the finger at management and "asking some pointed questions of leadership" because sometimes everyone is doing their best. Yes, we should analyse our failures so we can do better, but your tone is very accusatory, and I believe that a better approach is an all-inclusive chat about how we can do better, and management saying "great job engineering" for fixing it, and giving them a break after the stressful event.

donavanm 6 years ago | | |

I've seen bigger, scarier, potentially costlier time-based bugs personally. I don't think this would make me reevaluate my employment if I was at Robinhood. As the parent says, you either learn these lessons the hard way or you haven't learned them yet. That doesn't translate to being a “leadership failure.”

Your smaller point about prioritization is spot on though. I don't believe I've seen any similar incidents lead to business-ending outcomes. I personally point to Sony or, more recently, Equifax as examples of the disparity between actual business impact and technical abhorrence. In light of that, why is it worth trying to preemptively solve technical challenges instead of business needs? Every calorie spent on “what if” subtracts from “what's needed.”

kerng 6 years ago | |

Reminds me of the book Showstopper and the personal stories in it - it's about the creation of Windows NT. Pretty interesting how things were not so different some 30 years ago.

In case anyone is interested: https://www.amazon.com/Show-Stopper-Breakneck-Generation-Mic...

dmix 6 years ago | | |

Interesting, so it took 5yrs, $150-million, and 250-employees to get NT shipped. Adding this one to my reading list!

bertil 6 years ago | |

Important step though: have a retro (maybe several) and write a report explaining what was messed up and how you might mitigate it in the future. It looks like it’s going to be a good one. If you can share a sanitised version publicly, that would hopefully make it all a little bit more worth it.

I think I speak for everyone here if I say that, if that report is public and interesting, everyone on this thread will be happy to get you a drink.

vinaypai 6 years ago | |

This is all true for a company that is actually pushing any boundaries as opposed to failing pathetically at a well solved problem.

indecisive_user 6 years ago | | |

Robinhood opened up stock trading to a large portion of the population that would otherwise not have been interested in traditional trading platforms with high commissions.

Their success helped to pressure companies such as TD and Schwab to mostly get rid of commissions as well, which is great for the average trader.

I think Robinhood has a lot of problems, but to say they're not pushing any boundaries ignores the huge changes they've brought to the industry.

unicornmama 6 years ago | | |

Pushing the boundaries? They wrote an app that gamifies stock and options trading...

RayVR 6 years ago |

Having worked as a professional investor since 2012, I can say these outages can happen anywhere. I've seen day-long outages at exchanges where tens or hundreds of billions of dollars would have been trading, and at brokers where who knows how much would have traded. I've also experienced these outages at retail companies that are more established, including TD Ameritrade (I became a customer when ThinkOrSwim was acquired.) I have also seen brokers screw over individuals on a significant scale without real ramifications.

The fact that Robinhood is telling people anything about the outage is only because they are the company they are, operating in the startup world/mentality.

To the people thinking they should be compensated in some way...If you are doing >$1m daily volume, maybe you can contact them to see what they can do but even then, I doubt it. The way this should be handled is to have multiple executing brokers. You can implement offsetting positions if needed and transfer positions when your main account becomes available, if you are using a broker that can clear. Right now it seems Robinhood is working to implement clearing but you could still go to neutral or put on your positions.

twic 6 years ago | |

> The fact that Robinhood is telling people anything about the outage is only because they are the company they are, operating in the startup world/mentality.

Yep. Intercontinental Exchange and Eurex, two huge capital markets exchanges, routinely have multi-hour outages and don't even acknowledge that they've happened, let alone explain them.

whb07 6 years ago | | |

Multi-hour isn't a day and a half.

Itsdijital 6 years ago |

I have mixed feelings of sympathy about this whole RH thing.

Anyone who has used RH regularly should be well aware of how inept it is. Any spikes in volume or volatility, even on a single stock, bring it to its knees pretty often. Not just in the last week, but even during calm periods. I've personally lost 20-30% on positions solely because RH was bugging out; thankfully I use RH just for "fun trades", usually <$100.

I cannot fathom having the balls to trade any real amount of money on the platform while being aware of these long term issues.

On the flipside I feel for new users and perhaps even generally inactive users who weren't aware of RH's incredible flakiness. I'd imagine (or hope) that the losses of most of those users were small, assuming they were new or casual and just testing the waters.

Even if one of my small plays hit it big on RH, the money would just go to my main account on TD (which has been smooth all week shy of a few hiccups Fri morning during record volume). It's been obvious for a long time that RH should not and cannot be trusted. If you're trading options with a $60K account on RH, well, I don't even have words for that level of ignorance.

stef25 6 years ago | |

I abandoned Coinbase after having difficulties getting a few 1000 bucks out of there. It worked out in the end.

Problems with my data I can tolerate up to a point. Problems with my money I absolutely can not tolerate. As you said, it's unfathomable how people can trade money on a platform that's flaky.

robinson-wall 6 years ago | | |

The interesting thing about working for a UK challenger bank - I now have visibility into all of the outages going on at large, high-street banks here.

Complete outages are rare, and well-publicised, but things go wrong a lot more[1] than you might think without any communications to customers that anything is wrong, sometimes outright denying[2] that there's a problem.

1: https://twitter.com/nickrw/status/1141058572547215360

2: https://twitter.com/nickrw/status/1164162320672669696

0x8BADF00D 6 years ago | |

It’s another example of why DevOps has become a buzzword and most teams just pay lip service to it.

jennyyang 6 years ago |

I know quite a few people that were personally affected by this and lost money due to the two outages and they are all pulling their money from Robinhood. The fact that they can't offer any compensation might be a big problem for them, since they already have zero trading fees, which is what most brokerages offer as compensation.

Personally it doesn't pass the smell test for me. The load was much higher the previous week and load problems go away once the load disappears. They probably had a lot less load the rest of the day, so the fact they were down the entire day suggests it was something else. I would need a fully transparent post mortem before I believed anything they said.

RestlessMind 6 years ago |

This is such an empty update. At the very least, they should have published a detailed postmortem or committed to one by a certain date. How are we supposed to know that they have learned their lessons?

harikb 6 years ago | |

I don’t work for them, but I am pretty sure we can blame the litigious nature of this industry for the lack of detail in the postmortem. Not everyone can afford to be cloudflare :)

Even for Cloudflare, I thought the company would get sued out of existence after the proxy data leak, but the finance industry, the SEC, etc. are a completely different ballgame.

dx034 6 years ago | | |

I believe it's the fear of litigation rather than actual litigation. Other companies also manage to publish postmortems and don't get sued out of existence.

elliekelly 6 years ago | |

The compliance world isn’t quite as fast-moving as tech. Even a “high priority” business continuity post mortem at a financial institution is going to take at least a week for all of the lawyers & senior management to agree on the language.

dilly_li 6 years ago |

Start from the email notification. They have been asking themselves the easy questions.

Just look at the top questions in their email:

* Are the funds in my account safe? Yes, your funds are safe.

* Was my personal information affected? No, your personal information was not affected.

* Can I use my Robinhood debit card? Yes. If you have a debit card, you should have been—and should still be able to—use your card, but you may have had issues receiving notifications, viewing your balance, and seeing transactions in your app.

------------

The real question is: How is Robinhood compensating for the missed trades?

Stop asking yourself the easy questions, RH.

jsf01 6 years ago |

I’d be interested to read a deep technical post-mortem like those which have become fairly standard among other big tech companies. Hoping Robinhood does the right thing here.

0xy 6 years ago |

Still silence on the traders who lost tens of thousands of dollars? Are they going to be compensating or not?

This blog post doesn't appear to say anything. It's not an apology, it's not an explanation, it doesn't say what they're going to do in response.

This is after the incident in which there were no status updates or support availability for multiple hours. Why can't they commit to updates every hour or every 30 minutes?

dang 6 years ago |

Recent and related:

https://news.ycombinator.com/item?id=22477567

https://news.ycombinator.com/item?id=22475019

https://news.ycombinator.com/item?id=22468361

https://news.ycombinator.com/item?id=22465178

aloknnikhil 6 years ago |

Genuine question: With no-commission trading at places like Schwab and eTrade, is it even worth trading on Robinhood? For as long as I can remember (about 2 years), Robinhood has always failed to scale.

manigandham 6 years ago | |

Options are completely free on Robinhood while they still have a per-contract fee at other brokerages. If you don't care about that then no, there's no reason to stick with Robinhood.

benmanns 6 years ago | | |

Additionally Robinhood self clears options (or for some other reason?) and does not charge the Options Clearing Corp fee of $0.055/contract or the Options Regulatory Fee of $0.0388/contract which all other brokers charge (incl. ones with $0 or flat rate commissions/fees like WeBull, Gatsby, Tradier). All you pay is the FINRA and SEC fees on sells of about a penny each for small trades.
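To put those two per-contract fees in perspective, here is a rough sketch of the arithmetic using only the figures quoted above. The function name and the contract counts are made up for illustration, and the rates are the ones stated in this comment, not checked against any current fee schedule.

```python
from decimal import Decimal

# Per-contract fees as quoted in the comment above (assumed, not verified).
OCC_CLEARING_FEE = Decimal("0.055")
OPTIONS_REGULATORY_FEE = Decimal("0.0388")


def pass_through_fees(contracts: int) -> Decimal:
    """Total of the two quoted per-contract fees for a given number of contracts."""
    return (OCC_CLEARING_FEE + OPTIONS_REGULATORY_FEE) * contracts


# Hypothetical trade sizes, just to show the scale.
for n in (1, 10, 100):
    print(n, "contracts ->", pass_through_fees(n), "USD")
```

Decimal is used instead of float so the fractions of a cent add up exactly.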

Actually, if anyone knows of another broker who _doesn't_ charge these, please let me know. If you're first for the broker I'll give you $20 for the tip.

mjs33 6 years ago |

Their DNS system failed? How?! Unless DNS stands for “Do Not Sell”

tbrock 6 years ago | |

This happened to us at Hustle years ago. Basically if you run on AWS there’s a DNS server provided inside each VPC that usually works fine but which has no observable load metrics etc... so you don’t really know you are slamming it and are about to have a problem unless you audit your entire codebase.

Why? Well, that tiny DNS server has certain capacity constraints, and if you don’t cache DNS lookups (by using an http/https agent in NodeJS, for example) you wind up looking up the same DNS info over and over and churning sockets like it’s going out of style. If you run really, really hot the poor thing falls over (rightly so).

The limits are high and DNS is fast so you usually don’t notice but when you are under load bugs like this come out of the woodwork. When it falls down you look up the AWS docs, lean back in your chair upon finding this isn’t an “elastic” part of AWS and say “FUUUUUUUUCK” so loud it can be heard from outer space.

If you are Robinhood though don’t you have some former Netflix SRE/DevOps beast on staff that knows this and so you run your own DNS and monitor it?
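For readers who want to see the caching idea concretely, here is a minimal sketch in Python rather than NodeJS. It is not what Hustle or Robinhood ran: `CachingResolver`, `system_resolver`, and the fixed 30-second TTL are all hypothetical, and a real cache should honor the TTL on the DNS record instead of a hard-coded one. The point is only that repeated lookups of the same name within the TTL never reach the upstream DNS server.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

Resolver = Callable[[str], List[str]]


def system_resolver(host: str) -> List[str]:
    """Resolve a hostname through the OS resolver (may hit DNS on every call)."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Hypothetical helper: remembers answers for ttl_seconds to spare the DNS server."""

    def __init__(
        self,
        resolve: Resolver = system_resolver,
        ttl_seconds: float = 30.0,  # assumed fixed TTL, a simplification
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, host: str) -> List[str]:
        now = self._clock()
        hit = self._cache.get(host)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no DNS query issued
        addresses = self._resolve(host)
        self._cache[host] = (now + self._ttl, addresses)
        return addresses
```

The resolver and clock are injectable so the cache can be exercised without touching the network; reusing keep-alive connections, as the comment suggests, avoids the lookups in a different way and is usually the first thing to try.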

jcheng 6 years ago | | |

I read this and thought, “surely there’s an OS-level DNS cache?”

Apparently not on Linux! https://stackoverflow.com/questions/11020027/dns-caching-in-...

ajsharp 6 years ago | | |

Wait, what?? There's an invisible DNS server running inside your VPC? I get what you're saying wrt cached DNS lookups but this seems wild.

PaywallBuster 6 years ago | | |

AWS should simply provide monitoring and alerting by default on these footnote service limits.

manigandham 6 years ago | | |

What scenarios cause this many DNS lookups though? Connections should be kept alive after the IP translation, so if it's really new connections being set up constantly then wouldn't that show up as a major bottleneck first?

tempsy 6 years ago |

Sad that there isn’t an actual apology anywhere to be found in the letter at all.

And now with the fed rate cut the interest on cash is only 1.3%, with more cuts expected later in the year, which was the last big differentiator. I don’t see how they don’t see massive net withdrawals going forward.

CamelCaseName 6 years ago | |

> And now with the fed rate cut the interest on cash is only 1.3%, with more cuts expected later in the year, which was the last big differentiator. I don’t see how they don’t see massive net withdrawals going forward.

This isn't really an issue because the fed rate cut impacts everyone. Other institutions will cut their interest rates as well. I know of a few banks (Canadian) that have already lowered their GIC rates.

If anything, this is actually good for RH. Now instead of comparing 1.8% at RH and 1% at another Financial Institution, you're comparing 1.3% and 0.5% -- a much bigger multiple.
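Spelling out that comparison as a quick sketch (the function name is made up; the rates are the ones in this comment): the multiple does grow, while the gap in percentage points stays the same.

```python
def rate_comparison(rate_a_pct: float, rate_b_pct: float) -> tuple:
    """Return (multiple, gap in percentage points) between two interest rates."""
    multiple = rate_a_pct / rate_b_pct
    gap = rate_a_pct - rate_b_pct
    return round(multiple, 2), round(gap, 2)


before = rate_comparison(1.8, 1.0)  # before the cut
after = rate_comparison(1.3, 0.5)   # after the cut

print("before:", before)
print("after:", after)
```

Which of the two numbers matters more to a depositor is exactly what the replies here disagree about.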

tempsy 6 years ago | | |

most brokerages don’t actually pay anything. With another cut it’s going to be <1% vs 0%. Hardly anything even with a six figure balance. That’s my point.

GaryNumanVevo 6 years ago | |

Yeah because if they're culpable then they can be sued via class-action

xyst 6 years ago |

The boys down in the salt mines of WSB will want a blood sacrifice.

Founders should be fired. CTO/CIO should be replaced.

vinaypai 6 years ago |

Historic... Unprecedented... Thundering herd, a bunch of excuses to explain why they couldn't handle the volume that most real brokerages handle every second.

alishan-l 6 years ago |

I heard it was related to the leap year. Apparently they had downtime 4 years ago as well.

ablekh 6 years ago |

I'm curious about your thoughts on why a technical infrastructure, which, by nature of being cloud-native, is supposed to be (and likely has been) architected as a highly elastic platform, has not stood the test of time in this regard.

Based on the information from Robinhood's careers site, their platform is largely based on the following technology stack:

  - Python, Django, Django Rest Framework
  - Go
  - PostgreSQL
  - Container and container orchestration technologies (Docker, Kubernetes)
  - Microservice-oriented architectures and related OSS technologies (Kafka, Celery/RabbitMQ, nginx, Redis, Memcached, Airflow, Consul)
  - Cloud-native infrastructure (AWS, GCP)
  - Infrastructure as Code and configuration management (Terraform, SaltStack, Ansible, Chef, Puppet)
  - CI/CD and test automation frameworks (Cypress.io, Jenkins, Appium, UIAutomation, Bazel)

vbtemp 6 years ago |

On Reddit I've been trying to ask this ELI5:

Why would you use RH instead of a normal, mainstream brokerage like Vanguard, Fidelity, etc that already has (1) an app and (2) commission-free trades?

lkbm 6 years ago | |

Easy answer: As someone who's used Vanguard for index funds and the like for a couple decades now, I had no idea they had an app or commission-free trades. They don't market this at all.

As a secondary answer, normal, mainstream brokerages have pretty bad tech, tbh. I don't expect it to be worse than Robinhood in terms of things like security, but I do expect UX to be worse. (Side note: I just discovered that Vanguard actually has a secret security key option hidden under Account maintenance, so I can finally switch from sms 2fa. +1 to Vanguard.)

infinite8s 6 years ago | | |

> Side note: I just discovered that Vanguard actually has a secret security key option hidden under Account maintenance, so I can finally switch from sms 2fa. +1 to Vanguard.

It looks like you still need security codes set up:

"You'll need to register for both security codes and security keys, however. That's because keys and codes go hand in hand—if you lose your key or don't have it, we'll need to send you a code in order for you to log on. In addition, you'll always need a code to access your accounts from a mobile device."

If an attacker can skip the security key you might as well not use one.

fny 6 years ago | |

My brother has a Fidelity account and apparently even he was blocked from putting in orders online last Thursday, so I'm not sure they're immune either.

acchow 6 years ago |

Can't wait for the post mortem

joobus 6 years ago | |

I don't think we will get a postmortem. Their lawyers will kill it because it will be an admission of guilt and open them up to even more legal liability.

wbl 6 years ago | | |

The SEC did one for Knight Capital.

numlock86 6 years ago | |

Maybe in another four years when they finally realize they still haven't fixed the leap bug. Didn't work out for this year apparently. Last leap year had the exact same problem. The problem is that the ticket is very low priority because right now it is working again and won't happen again until at least 2024 ... By then it will most likely be forgotten. Again.
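For what it's worth, here is the kind of leap-day bug people usually mean, as a small Python sketch. This is purely illustrative of the class of bug and says nothing about Robinhood's actual code; `add_years` is a hypothetical helper, and falling back to Feb 28 is just one possible policy.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return d.replace(year=d.year + years, day=28)


leap_day = date(2020, 2, 29)
print(add_years(leap_day, 1))  # lands in a non-leap year
print(add_years(leap_day, 4))  # lands in the next leap year
```

The naive version, `leap_day.replace(year=2021)`, raises ValueError, which is why code like this only ever fails one day every four years.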

lemmox 6 years ago |

I'm always amazed by how tricky DNS failures can be.

Ambele 6 years ago |

I can't help but think this glitch was a good thing and Robinhood investors would do better if they traded less anyhow. According to an OpenFolio correlational study, traders who trade more than 12 times per year make 0.5% less than traders who trade less than 12 times per year. OpenFolio was one of the first three websites to have an API integration with Robinhood portfolios.

VWWHFSfQ 6 years ago |

every time I see a company that has "co-CEOs" I always wonder what kind of weird stuff is going on in that company

winrid 6 years ago | |

Wasn't Steve Jobs a "co-CEO" of Apple for a while? (Edit, I mean after he came back from NeXT)

LaserToy 6 years ago |

Blame the load. If only Robinhood were not famous for saying they are hiring only the best...

The best do not go down like that.

vsareto 6 years ago | |

That’s always marketing speak and never to be believed.

dirtydroog 6 years ago |

Companies like Robinhood regularly go down when markets are volatile. It was quite frustrating when the financial crisis was in full swing not being able to log in to my trading account. I reckon I would have made a killing.

0xDEEPFAC 6 years ago |

"Multiple factors contributed to the unprecedented load that ultimately led to the outages. The factors included, among others, highly volatile and historic market conditions; record volume; and record account sign-ups. "

What a sad press release, I am sure people at their corporate office were sweating over this. The long and short of it is that users trusted the service would work and had possibly a great deal invested only to get a comment when everything breaks down deflecting blame "OMG we weren't prepared for what our users did!"

We live in a sad state of software. I expect things like this and the Equifax scandal to continue if things like software security, reliability, and performance aren't taken into account.

homero 6 years ago |

This is common in Bitcoin exchanges. Never thought I'd see a stock broker go down but I guess it's the same issues.

buryat 6 years ago |

> Traditionally depicted dressed in Lincoln green, he is said to have robbed from the rich and given to the poor.

Does the name still stand?

shrimpx 6 years ago |

Wow this is Coinbase in 2017. Trading mass hysteria takes down unprepared Silicon Valley trading service.

shiado 6 years ago |

Does Robinhood have stop-loss orders, and if so did they execute when it was down?

sitzkrieg 6 years ago |

volatility always shakes out the trading companies with lame infrastructure

gadders 6 years ago |

I mean this is OK as an apology, but is there an actual post mortem anywhere we can read?

c930 6 years ago |

dnsmasq is your friend.

egdod 6 years ago |

Pretty light on information.

ilrwbwrkhv 6 years ago |

wasn't this caused by the leap year and them not taking that into account?

illnewsthat 6 years ago | |

They denied that on Twitter: https://twitter.com/AskRobinhood/status/1234861941413351434

tomc1985 6 years ago | | |

So what of the screenshot in the original twitter post? Was it doctored? Showing GMT?

beart 6 years ago | |

Not according to the linked blog post.

wyxuan 6 years ago | |

I feel like it was that and a bunch of other things that led to this

lowdose 6 years ago |

Did any of the complaining people actually pay RH for their service or is this 1st world entitlement?

yaur 6 years ago |

"That in turn led to a “thundering herd” effect—triggering a failure of our DNS system."

I'm just a spectator but I can not imagine that this was somehow caused by a DNS failure.
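As background on the term: a "thundering herd" is many clients retrying or reconnecting at the same moment, so every retry wave hits the recovering service at once. A common client-side mitigation is exponential backoff with random jitter. The sketch below shows that general technique only; the blog post does not say what Robinhood did, and `backoff_delays` and its default values are hypothetical.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 0.1,   # assumed first-retry ceiling, in seconds
    cap: float = 10.0,   # assumed maximum ceiling, in seconds
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one randomized delay per retry attempt ("full jitter" backoff)."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles each attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Picking uniformly below the ceiling spreads clients out in time.
        delays.append(rng.uniform(0.0, ceiling))
    return delays


print(backoff_delays(5, rng=random.Random(42)))
```

Without the jitter, every client that failed at the same instant would also retry at the same instant, which recreates the herd on each round.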

zippergz 6 years ago | |

I’m not sure I can even count the number of outages I’ve been involved in that had DNS issues as at least part of the cause.

yaur 6 years ago | | |

sure, I've seen outages that are caused by DNS config problems. But I don't think I've ever seen one caused by a "thundering herd" overwhelming DNS servers.

Another giveaway that this is a lie is that support emails were getting a stock Postfix error message, which means that MX records at least were resolving.

rhizome 6 years ago |

Is every bitcoin company run by ex cellphone store employees? Just the exchanges?

dickjocke 6 years ago | |

Robinhood isn't a bitcoin company. That's just a feature they have. Its main product offering is the commission-free trading--and their presence pushed a lot of big players to adopt the same offering. The wallstreetbets gang is silly and all, but I think they have really democratized stock trading, and made the whole idea seem much more accessible. I think the founders are former finance guys. I hope this doesn't sound like guerrilla marketing. I don't even use the app; I have used it but I'm just not that interested in picking stocks. I just think it's a cool ex-code-monkey-to-entrepreneur story.

triceratops 6 years ago | | |

> I think they have really democratized stock trading

I would think Vanguard did that already. Most people should be trading ETFs, not individual stocks.

tree3 6 years ago | | |

> I think they have really democratized stock trading

How does "free" = "democratizing"? Stocks have been easily accessible for years to retail investors.

> their presence pushed a lot of big players to adopt the same offering

Misleading, big brokers were already going down this path.

mkchoi212 6 years ago |

It's cool that the founders of the company publish blog posts like this for a short outage. Hope other CEOs learn from this and become even more transparent in the future :D

tmpz22 6 years ago | |

A day long outage for a $7B company is a big deal. They don’t deserve credit for this.

manigandham 6 years ago | |

Short outage? They were down for almost the entire trading day yesterday and for hours today. And there's barely any transparency in this post compared to standard post-mortems.

rpdillon 6 years ago | | |

Was this their post-mortem? I didn't see anything to indicate that.

tomc1985 6 years ago | |

It was a useless puff piece that said absolutely nothing of interest. How is that worthy of a pat on the back?

bayonetz 6 years ago |

Just use Square’s Cash App! Free stock trades AND you can buy fractional shares AND a bunch of other stuff like P2P payments and bitcoin. I work there and so can say with some authority that we can handle more volume without going down than RH can.

frockington1 6 years ago | |

As an alternative, use a real brokerage that has a history of success. It's becoming clearer every day that fintech startups are not responsible.

minimaxir 6 years ago | |

If you actually work at Square, it's poor form to advertise in this manner.

redis_mlc 6 years ago | | |

Actually, bayonetz's posting is the only useful one in the comments for this article. Most of us are here for information from actual industry insiders, and this qualifies.

Here's some more inside info ...

If your "financial app" provider doesn't have a banking charter, run. None of the recent trendy fintech companies have a charter, and are thus clown cars.

bayonetz 6 years ago | | |

Disagree. I’m suggesting a better alternative at a contextually relevant time based on personally earned experience.

pensatoio 6 years ago | | |

Or they’re just taking pride in their work?

kortilla 6 years ago | |

Everyone is a Super Bowl winner when they’re armchair quarterbacking.