An Update from Robinhood’s Founders(blog.robinhood.com) |
Does the name still stand?
I'm just a spectator, but I cannot imagine that this was somehow caused by a DNS failure.
Another giveaway that this is a lie is that support emails were getting a stock Postfix error message, which means that the MX records, at least, were resolving.
I would think Vanguard did that already. Most people should be trading ETFs, not individual stocks.
How does "free" = "democratizing"? Stocks have been easily accessible for years to retail investors.
> their presence pushed a lot of big players to adopt the same offering
Misleading, big brokers were already going down this path.
Here's some more inside info ...
If your "financial app" provider doesn't have a banking charter, run. None of the recent trendy fintech companies have a charter, and are thus clown cars.
Many will cast stones - but they have been there too. If they haven't, well maybe their day will also come. You may feel bad at the moment - but the best way professionally forward is "We try our best tomorrow"
The prioritization problems may not be due to ignorance or malice though, and may be justifiable if there are other fires that are burning brighter. It's still pointing to problems though, and I think it's completely legitimate for engineers to question the stability of the company when this sort of thing happens.
At the very least as an engineer I would be asking some pointed questions of my leadership. Maybe not dusting off the resume yet, but still I'd want to get reassurance from internally that the leadership problems that caused this are being addressed.
Or I'm talking about a 200 node Hadoop cluster that's doing the electrical metering and billing for 8 million people, and is NOT allowed to stop.
Or the trading platform that's running sub millisecond trades and downtime means $300,000 USD per minute.
These are systems I have engineered over the last 10 years, and I can say: These things are complex and have failures in 1000 different ways, and while you're monitoring 999 of them that one thing you're not looking at is festering under the surface (your monitoring system is tracking IRQ hardware interrupt response times, right???)
Part of being in a team is everyone pulling together, and yes it's stressful at the time, but even very good management can't see all ends, just like very good engineering can't predict everything. I don't think it's useful to start pointing the finger at management and "asking some pointed questions at leadership" because sometimes everyone is doing their best. Yes, we should analyse our failures so we can do better, but your tone is very accusatory. I believe a better approach is an all-inclusive chat about how we can do better, with management saying "great job engineering" for fixing it and giving them a break after the stressful event.
Your smaller point about prioritization is spot on though. I don't believe I've seen any similar incidents lead to business-ending outcomes. I personally point to Sony or, more recently, Equifax as examples of the disparity between actual business impact and technical abhorrence. In light of that, why is it worth trying to preemptively solve technical challenges instead of business needs? Every calorie spent on “what if” subtracts from “what's needed.”
In case anyone is interested: https://www.amazon.com/Show-Stopper-Breakneck-Generation-Mic...
I think I speak for everyone here if I say that, if that report is public and interesting, everyone on this thread will be happy to get you a drink.
Their success helped to pressure companies such as TD and Schwab to mostly get rid of commissions as well, which is great for the average trader
I think Robinhood has a lot of problems, but to say they're not pushing any boundaries ignores the huge changes they've brought to the industry.
The fact that Robinhood is telling people anything about the outage is only because they are the company they are, operating in the startup world/mentality.
To the people thinking they should be compensated in some way...If you are doing >$1m daily volume, maybe you can contact them to see what they can do but even then, I doubt it. The way this should be handled is to have multiple executing brokers. You can implement offsetting positions if needed and transfer positions when your main account becomes available, if you are using a broker that can clear. Right now it seems Robinhood is working to implement clearing but you could still go to neutral or put on your positions.
Yep. Intercontinental Exchange and Eurex, two huge capital markets exchanges, routinely have multi-hour outages and don't even acknowledge that they've happened, let alone explain them.
Anyone who has used RH regularly should be well aware of how inept it is. Any spikes in volume or volatility, even on a single stock, bring it to its knees pretty often. Like not just the last week, but even during calm periods. I've personally lost 20-30% on positions solely because RH was bugging out, thankfully I use RH just for "fun trades" usually <$100.
I cannot fathom having the balls to trade any real amount of money on the platform while being aware of these long term issues.
On the flipside I feel for new users and perhaps even generally inactive users who weren't aware of RH's incredible flakiness. I'd imagine (or hope to) the losses of most of those users were small, assuming they were new or casual and just testing the waters.
Even if one of my small plays hit it big on RH, the money would just go to my main account on TD (which has been smooth all week shy of a few hiccups Fri morning during record volume). It's been obvious for a long time that RH should not and cannot be trusted. If you're trading options with a $60K account on RH, well, I don't even have words for that level of ignorance.
Problems with my data I can tolerate up to a point. Problems with my money I absolutely can not tolerate. As you said, it's unfathomable how people can trade money on a platform that's flaky.
Complete outages are rare, and well-publicised, but things go wrong a lot more[1] than you might think without any communications to customers that anything is wrong, sometimes outright denying[2] that there's a problem.
Personally it doesn't pass the smell test for me. The load was much higher the previous week and load problems go away once the load disappears. They probably had a lot less load the rest of the day, so the fact they were down the entire day suggests it was something else. I would need a fully transparent post mortem before I believed anything they said.
Even for Cloudflare, I thought the company would get sued out of existence after the proxy data leak, but the finance industry/SEC etc. is a completely different ballgame.
Just look at the top questions in their email:
* Are the funds in my account safe? Yes, your funds are safe.
* Was my personal information affected? No, your personal information was not affected.
* Can I use my Robinhood debit card? Yes. If you have a debit card, you should have been—and should still be able to—use your card, but you may have had issues receiving notifications, viewing your balance, and seeing transactions in your app.
------------
The real question is: How is Robinhood compensating for the missed trades?
Stop asking yourself the easy questions, RH.
This blog post doesn't appear to say anything. It's not an apology, it's not an explanation, it doesn't say what they're going to do in response.
This is after the incident in which there were no status updates or support availability for multiple hours. Why can't they commit to updates every hour or every 30 minutes?
Actually, if anyone knows of another broker who _doesn't_ charge these, please let me know. If you're first for the broker I'll give you $20 for the tip.
Why? Well that tiny DNS server has certain capacity constraints and if you don’t cache DNS lookups by using a http/https agent for example (in NodeJS) you wind up looking up the same dns info over and over and churning sockets like it’s going out of style. If you run really really hot the poor thing falls over (rightly so).
The limits are high and DNS is fast so you usually don’t notice but when you are under load bugs like this come out of the woodwork. When it falls down you look up the AWS docs, lean back in your chair upon finding this isn’t an “elastic” part of AWS and say “FUUUUUUUUCK” so loud it can be heard from outer space.
If you are Robinhood though don’t you have some former Netflix SRE/DevOps beast on staff that knows this and so you run your own DNS and monitor it?
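To make the caching point above concrete, here is a minimal sketch of an in-process DNS cache. It is written in Python rather than NodeJS, and the class name `CachingResolver`, the fixed 30-second TTL, and the injectable `resolve`/`clock` parameters are all assumptions made for illustration. A real setup would more likely honor the record's own TTL or use a local caching resolver such as dnsmasq.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

Resolver = Callable[[str, int], List[tuple]]


class CachingResolver:
    """Tiny in-process DNS cache with a fixed TTL (illustrative only)."""

    def __init__(
        self,
        resolve: Resolver = socket.getaddrinfo,
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        # (host, port) -> (expiry timestamp, cached answer)
        self._cache: Dict[Tuple[str, int], Tuple[float, List[tuple]]] = {}
        self.upstream_lookups = 0  # how many times we actually hit DNS

    def lookup(self, host: str, port: int) -> List[tuple]:
        key = (host, port)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no packet leaves the machine
        self.upstream_lookups += 1
        answer = self._resolve(host, port)
        self._cache[key] = (now + self._ttl, answer)
        return answer
```

With something like this in front of the resolver, a hot loop that connects to the same host repeatedly sends one upstream query per TTL window instead of one per connection. Reusing connections (keep-alive) cuts the lookups further, since a reused socket needs no lookup at all.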
Apparently not on Linux! https://stackoverflow.com/questions/11020027/dns-caching-in-...
And now with the fed rate cut the interest on cash is only 1.3%, with more cuts expected later in the year, which was the last big differentiator. I don’t see how they don’t see massive net withdrawals going forward.
This isn't really an issue because the fed rate cut impacts everyone. Other institutions will cut their interest rates as well. I know of a few banks (Canadian) that have already lowered their GIC rates.
If anything, this is actually good for RH. Now instead of comparing 1.8% at RH and 1% at another Financial Institution, you're comparing 1.3% and 0.5% -- a much bigger multiple.
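A quick check of the arithmetic in the comment above, using only the rates quoted there (the function name `compare` is made up for this sketch). It shows that the ratio does grow, while the gap in percentage points stays the same.

```python
from typing import Tuple


def compare(rh_rate: float, other_rate: float) -> Tuple[float, float]:
    """Return (gap in percentage points, ratio) for two rates given in percent."""
    gap = round(rh_rate - other_rate, 2)
    ratio = round(rh_rate / other_rate, 2)
    return gap, ratio


before_cut = compare(1.8, 1.0)  # rates quoted in the comment, before the cut
after_cut = compare(1.3, 0.5)   # rates quoted in the comment, after the cut
print(before_cut, after_cut)
```

So the multiple goes from 1.8x to 2.6x, but in both cases the difference is 0.8 percentage points, which is what determines the extra interest earned on a given balance.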
Founders should be fired. CTO/CIO should be replaced.
Based on the information from Robinhood's careers site, their platform is largely based on the following technology stack:
- Python, Django, Django Rest Framework
- Go
- PostgreSQL
- Container and container orchestration technologies (Docker, Kubernetes)
- Microservice-oriented architectures and related OSS technologies (Kafka, Celery/RabbitMQ, nginx, Redis, Memcached, Airflow, Consul)
- Cloud-native infrastructure (AWS, GCP)
- Infrastructure as Code and configuration management (Terraform, SaltStack, Ansible, Chef, Puppet)
- CI/CD and test automation frameworks (Cypress.io, Jenkins, Appium, UIAutomation, Bazel)
Why would you use RH instead of a normal, mainstream brokerage like Vanguard, Fidelity, etc. that already has (1) an app and (2) commission-free trades?
As a secondary answer, normal, mainstream brokerages have pretty bad tech, tbh. I don't expect it to be worse than Robinhood in terms of things like security, and I expect UX to be worse. (Side note: I just discovered that Vanguard actually has a secret security key option hidden under Account maintenance, so I can finally switch from sms 2fa. +1 to Vanguard.)
It looks like you still need security codes setup:
"You'll need to register for both security codes and security keys, however. That's because keys and codes go hand in hand—if you lose your key or don't have it, we'll need to send you a code in order for you to log on. In addition, you'll always need a code to access your accounts from a mobile device."
If an attacker can skip the security key you might as well not use one.
The best do not go down like that.
What a sad press release, I am sure people at their corporate office were sweating over this. The long and short of it is that users trusted the service would work and had possibly a great deal invested only to get a comment when everything breaks down deflecting blame "OMG we weren't prepared for what our users did!"
We live in a sad state of software. I expect things like this and the Equifax scandal to continue if things like software security, reliability, and performance aren't taken into account.
There are no public details about the root cause.
I think RH is bad for people in general, but this pile-on is outrageous.
We wrote a bit about this here: https://landing.google.com/sre/sre-book/chapters/addressing-...
I would strongly caution anyone who thinks this subject is trivial, just add a bit of load shedding and you're done. I wrote a bit about my team's work (including a simplified view of some of the considerations that go into how we do retries) here: https://landing.google.com/sre/sre-book/chapters/handling-ov...
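As a small illustration of just one of the considerations mentioned above, here is a sketch of client-side retries with capped exponential backoff and jitter. This is a generic pattern and not the approach described in the linked chapters. The function name, defaults, and injectable `sleep`/`rng` parameters are assumptions for the example. It deliberately leaves out server-side load shedding, retry budgets, and deciding which errors are safe to retry, which is where much of the difficulty lies.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call op(), retrying on any exception with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: wait a random fraction of the cap so clients do not retry in lockstep.
            sleep(rng() * cap)
    raise AssertionError("unreachable")
```

Without the jitter and the attempt limit, every client that failed at the same moment retries at the same moment, which is one way a brief overload turns into a long one.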
Monday morning puts were down - it was obvious the market was recovering in a big way. Instead of cutting losses at ~20% in the morning they lost ~99% of their position. Some lost 100% since the options expired EOD.
Robinhood makes more money than any other known firm on Wall Street by getting paid specifically to leak users' trades to other traders.
SEC requires a periodic report on that which shows compensation.
Can't believe people are still buying Robinhood's pitch of misdirection.
Even if the trades were well-defined at the time the outage occurred, there would still be an asymmetry between people demanding compensation on their profitable trades while eschewing losses on their bad trades. It's doubtful any brokerage would be willing to eat that.
Execution risk is a risk.
> During periods of heavy trading and/or wide price fluctuations ("Fast Markets"), there may be delays in executing your order or providing trade status reports to you. […] Schwab is not liable to you for any losses, lost opportunities or increased commissions that may result from you being unable to place orders for these stocks through the Electronic Services.
The reason nobody will be compensated here is due to two things,
(1) There is no way to determine what a fair execution would have been, since clients couldn't submit orders in the first place.
(2) Clients will adversely select their losing trades for corrections and this would bankrupt Robinhood in about five minutes.
Source: work at a wholesaler.
It's no different than you breaking your phone or losing your network connection. Nothing is guaranteed to work all the time. RH might face fines for the extended nature of the outage though, especially since they've managed to avoid them for plenty of past mistakes so far.
It follows that Robinhood must never reimburse for outages.
Unless I have an SLA with a provider outlining penalties, they don't owe me anything if they go down. How is this any different?
They may not have a legal/contractual obligation here, but that doesn't mean that treating their customers poorly is without consequence.
While RH's ToS does theoretically absolve them of technical issues, they are obligated to comply with 'best execution' securities mandates, no? Separately, it'd be extremely bad for business if they refused compensation.
The point is moot anyway, since they're offering "case-by-case" compensation.
On the advice of any good lawyer.
Of course, no one complains when RH makes a mistake in the client's favor.
I think your point is that it's a very different mindset to the native internet world, and that is certainly true!
Every Unix system having a local caching DNS proxy was and is as much a norm as every Unix system having a local MTS. A quarter of a century ago, this would have been BIND and Sendmail. Things are more variable, now.
To illustrate that this was considered the norm, here is a random book from the 1990s. Smoot Carl-Mitchell's _Practical Internetworking with TCP/IP and UNIX_ says, quite unequivocally:
> You must run a DNS server if you have Internet connectivity. The most common UNIX DNS server is the Berkeley Internet Name Daemon (BIND), which is part of most UNIX systems.
People sometimes think that this is not the case nowadays, and the fact that a computer is a personal computer magically means that a Unix or Linux-based operating system should offload this task and not perform it locally. They are wrong, and that is DOS Think. Ironically, they don't even get to play the resource allocation card nowadays. The amount of memory and network bandwidth that needs to be devoted to caching proxy DNS service on a personal computer is dwarfed by the amounts nowadays consumed by WWW browsers and HTTP(S).
There's no similar argument for a node in a datacentre.
Ideally, not only should every machine have a (forwarding/resolving) caching proxy DNS server, every organization (or LAN, or even machine) should have a local root content DNS server. A lot of (quite valid) DNS lookups stop at the root with fixed or negative answers. Stopping that from leaving the site/LAN/machine is beneficial.
Ironically, putting a forwarding caching proxy DNS service on the local end of any congested, slow, expensive, or otherwise limited link is advice that I and others have been handing out for over 20 years. It's exactly what one should be doing with things like Amazon's non-local proxy DNS server limited to 1024 packets/second/interface.
* http://jdebp.uk./FGA/dns-server-roles.html#ChoosingProxy
So the question is not whether a local DNS cache mechanism exists. It's whether it's set up by the company dishing out the VMs, and if not, why not. Amazon provides instructions on how to add dnsmasq, and clearly labels this as how to reduce DNS outages. So it's not even the case that Amazon is wrongly discouraging having local caching proxy DNS servers.
* https://aws.amazon.com/premiumsupport/knowledge-center/dns-r...
Every DNS request for external domains turns into 10 if you don't explicitly configure FQDNs (dot at the end). This is because in the default configuration the resolver runs with ndots 5 to search all the possible internal Kubernetes and cloud-provider names. Then you have lookups for IPv4 and IPv6 in parallel. So for every external name you look up, you storm the upstream DNS with 10 requests for non existing domains.
Furthermore, the current default DNS service in Kubernetes doesn't have any kind of caching for these kinds of lookups (especially not NXDOMAIN) enabled.
But like I said, this is one of the first issues you hit running Kubernetes on Amazon. It is widely known and can easily be fixed by scaling up some more instances, changing ndots settings, using FQDNs or configuring caching. There is no way that this was the issue, it is plastered all over the internet, the logs are clear and the fixes can be implemented in minutes.
It also doesn't go down completely, the rate-limiter is packets/s on the interface.
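The amplification described above can be sketched as follows. This is a simplified model of a stub resolver's search-list behavior, not the actual glibc or Kubernetes code. The five-entry search list is hypothetical (a real pod's list depends on the namespace and cloud provider), and the model assumes one packet per query and that only the absolute name exists. The 1024 packets/second limit is the figure quoted earlier in the thread.

```python
from typing import List, Tuple

# Hypothetical search list, for illustration only.
SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ec2.internal",
    "us-east-1.compute.internal",
]


def queries_for(name: str, search: List[str], ndots: int = 5) -> List[Tuple[str, str]]:
    """Return the (qname, qtype) queries sent before the absolute name resolves."""
    if name.endswith("."):
        candidates = [name]  # trailing dot: already fully qualified, no search
    elif name.count(".") >= ndots:
        candidates = [name + "."]  # enough dots: tried as absolute first, which succeeds
    else:
        # Too few dots: every search domain is tried (and misses) first.
        candidates = [f"{name}.{domain}." for domain in search] + [name + "."]
    # Each candidate is looked up for both IPv4 (A) and IPv6 (AAAA).
    return [(qname, qtype) for qname in candidates for qtype in ("A", "AAAA")]


def lookups_per_second(packet_limit: int, packets_per_lookup: int) -> int:
    """How many name lookups fit under a packets-per-second limit."""
    return packet_limit // packets_per_lookup


slow = queries_for("api.example.com", SEARCH)   # 10 wasted queries + 2 real ones
fast = queries_for("api.example.com.", SEARCH)  # 2 queries
print(lookups_per_second(1024, len(slow)), lookups_per_second(1024, len(fast)))
```

Under these assumptions, the same 1024 packets/second budget covers roughly 85 external lookups per second with the default settings and 512 with the trailing dot, before any caching is added.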
https://cdn.robinhood.com/assets/robinhood/legal/RHS%20SEC%2...
https://www.google.com/url?sa=t&source=web&rct=j&url=http://...
Former market maker here.
Retail flow is low risk. If I buy $100mm of institutional flow, I could get a bunch of corporate hedging orders. Or I could make a single bet against George Soros. With retail, one tends to find lots of small orders. Even if there are some with high information, i.e. they're smart money and I'm going to lose money trading against them, they're small enough to be manageable.
Retail is also low information. At an old job, we bought a prominent retail broker's options flow. The number of in-the-money unexercised options that would come through that pipe was mind-blowing. (Today, whoever was buying Robinhood's flow likely got the same.)
E.g. Buying TD in Canada, and wanting to sell on NYSE for US$.
https://www.schwab.com/public/schwab/nn/agreements/schwab_br...
Maybe in some cases they go above and beyond their account agreement if they like you as a customer, but according to the agreement you sign with them it's not their problem if things go bad in this way.
On the flip side, clients have no guarantee that there would have been a counterparty for their order.
FoK/IoC means “do not queue this order”. It’s immediately filled (or not) (or, for IoC, partially) based on whatever orders are already in the book, and then you’re done.
Whereas a day order is queued until the end of the day or until it’s filled, whichever comes first.
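The difference between those order types can be sketched with a toy matcher for a buy limit order. It is a simplification for illustration only, with made-up names (`Ask`, `execute_buy`) and no relation to any real exchange's matching engine.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ask:
    price: float
    qty: int


def execute_buy(book: List[Ask], limit: float, qty: int, tif: str) -> Tuple[int, int]:
    """Match a buy limit order against resting asks.

    Returns (filled, resting): the quantity filled now and the quantity left
    queued in the book. tif is "FOK", "IOC" or "DAY".
    """
    if tif not in ("FOK", "IOC", "DAY"):
        raise ValueError(f"unknown time in force: {tif}")
    # Only asks at or below our limit are eligible, cheapest first.
    eligible = sorted((a for a in book if a.price <= limit), key=lambda a: a.price)
    if tif == "FOK" and sum(a.qty for a in eligible) < qty:
        return (0, 0)  # fill-or-kill: all of it right now, or nothing at all
    filled = 0
    for ask in eligible:
        take = min(ask.qty, qty - filled)
        ask.qty -= take
        filled += take
        if filled == qty:
            break
    remainder = qty - filled
    # IOC cancels whatever is left; a day order queues it until filled or the close.
    resting = remainder if tif == "DAY" else 0
    return (filled, resting)
```

For example, against a book offering 50 @ 10.0 and 30 @ 10.1, a buy of 100 with a 10.1 limit fills nothing as FOK, fills 80 and cancels 20 as IOC, and fills 80 with 20 left queued as a day order.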
Arbitration is forced, but Robinhood is on the hook for the fees for everyone who decides to arbitrate. Robinhood users might not get anything, but they can still cause pain.
If it's chump change you're trading, sure, use RH.
If it's serious money, the $0.65/contract or whatever pays for itself many times over. Even if it's just the ability to regularly get filled between the spread it pays for itself.
You get what you pay for.
Most people just want the high leverage and quick wins which usually ends badly.
In particular, there's plenty of opportunity to find a successful strategy that just doesn't scale. It won't make you rich but it will make you money.
Otherwise it's on the broker to sell it at close to someone who can afford to exercise. And who knows if RH pulled that off or not.
https://www.reddit.com/r/wallstreetbets/comments/fcqkmo/so_r...
There's also some threads where it looks like it did not go so well...
https://www.reddit.com/r/wallstreetbets/comments/fd2eko/upda...
Is it just me, or does it feel like the only people using Robinhood are college students gambling with their parents' money?
Given that many extremely smart people who have devoted their lives to the stock market cannot beat average returns, Robinhood users' lack of knowledge of "pin risk" seems to miss the greater point.
1) Somewhat pedantic: A big reason why performance is-what-it-is is that at any real $$$, liquidity/volume becomes an issue. Lots of option markets are just not that liquid. If you play with only a few $k and Robinhood pays for much of the market friction, then you can potentially outperform the market at risk parity.
2) More real: For most people active trading is not about investing, it is about easy and legal gambling. There is a thrill in throwing your money into high risk options or skyrocketing meme-stocks. Because markets are (relatively) efficient, the prices of these assets usually reflect their risk profile, so on average you should gain money (the flip side of it being hard to beat the market is that it is hard to severely underperform, on average, as long as you don't go all-in; normally friction costs make these kinds of strategies not work, but RH reduces that significantly). It ends up like going to a casino where on average you make a bit of money (but the high volatility means some people lose a lot and some people gain a lot).
True.
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.htm...
https://aws.amazon.com/premiumsupport/knowledge-center/vpc-e...
RH has constantly had issues at least since I started using it over a year ago. I didn't notice it really at first, but I also didn't know much about anything trading related back then. It didn't take long though for me to have my first "incident" where my market orders were seemingly vanishing into the abyss as the underlying moved. I'm not talking seconds, I'm talking minutes. For a market order on high liquidity options. Never mind trying to get filled at anything besides the ask (buying) or bid (selling).
RH has had serious underlying issues for a long time now. This incident didn't happen in vacuum. The writing has been in huge block letters on the wall for a long time.
1. Dealing with other people's money 2. Monitoring/managing other people's health
Generally true, but there are a couple of exceptions to this rule: if everyone knows that the company is brand new and does not have an established reputation, then using that app requires a general acceptance of risk.
Robinhood was brand new, and outages should have been expected. The problem with Robinhood isn't the outage, it's that it was marketed to college students gambling with their parents' money, who know just enough about the stock market to be dangerous, but not enough to invest properly.
Luckily for everyone, those industries are so old, they have accidental redundancy built in (paper records for old doctors who can't be arsed to use a computer, etc.).
It is confirmed they are worse than virtually any reputable brokerage. It might not be their fault directly, but it's 2020, not 1998.
So people who were going to continue to sell off got lucky that they couldn't make that trade, and people who were going to buy got unlucky?
Does anyone seriously expect compensation, or think that it's deserved, or is it group wishful thinking? How would it even work? Would they just take people's word for their supposed intent? Or are people wanting some sort of "here's a gift card" type deal?
This is not to defend RobinHood - I've personally kept my money with well-established companies cause conservative, old, proven systems seem like a good thing for a product in this space - but shit happens, no? There will be more good days, and more bad days, in the market, it's a long-run game anyway, and it's pretty easy to vote with your wallet in this space.
I suspect you're right though, that it's mostly sour grapes concerning the opposite case - inability to buy as the market rallied.
I had never heard of /r/wallstreetbets or Robin Hood (well, barely) until a couple weeks ago.
There is an entire generation that has never traded through a crisis.
Of course the problem with the "compensate me" arguments is that a lot of people were going to make decisions that would have turned out poorly yesterday (indeed, the market is balanced and every transaction has a counterparty), though of course with the amazing clarity of hindsight few would recognize or admit that. So if they need to compensate for illusory lost trades, do some people have to pay them for losses they would have incurred?
[I get that there are some complex options that can legitimately be all downside when trading isn't available, but that's a less common option]
Was it? Markets started up today but ended way lower.
So now you also have people making decisions based on the wrong data.
Honestly I don't see how this doesn't turn into a lawsuit.
Given that most crises seem to occur roughly every 7-15 years, there will always be such a generation.
A hypothesis: the reason why crises occur roughly 7-15 years is because that is approximately the length of society's collective memory concerning monetary issues.
And FWIW, they have down time every day and weekend, at least in a virtual sense; the load does drop off in a very real sense too. You are spiritually correct, they should pull together and sort it out, and they owe nobody money here (don’t use a discount broker if you want some sort of guarantee about trades), but as a general rule you should never feel too sorry for a banker under just about any circumstances. The harshest lesson here, for everybody, is that the only thing they would normally do for you is give you some commission-free trades, but that won’t work with this one, so a non-apology is what you get.
The post reads to me like all those examples were meant to be concrete examples to drive home a more general argument that complex systems are, well, complex, and that there's an element of hubris in taking potshots from the peanut gallery.
Personally I don't think Robinhood will ever release a full honest post-mortem and so we'll never know (and never be able to judge fairly).
If the system failed by virtue of being too complex, that is also malfeasance because any devops/SRE worth their salt (as might be expected at a 7 BILLION DOLLAR company) should smell unnecessary complexity from a mile away and slowly refactor it away over the course of several years - which, looking at Robinhood's downtime history, they never did.
The closest example to Robinhood's engineering woes is Reddit, which throughout its early history made fairly poor infrastructure and data modeling decisions but has since repaired and improved on them. We should hold Robinhood to higher expectations than Reddit for obvious reasons. Them having similar engineering capability to circa ~2012 start-up Reddit is INEXCUSABLE.
I think the issue here isn’t so much that the system went down but the blog post.
It’s very light on details and doesn’t go far enough in terms of re-establishing trust with the customers that were affected. Which by the looks of it is everyone attempting any trade most of the day on Monday.
Sure, don't burn people at the stake, but "hey, it's hard, don't blame them, they are doing their best" doesn't cut it for me. I'm sure they're expecting to be paid and not for someone to "do their best" to pay them.
Because the largest distributed system I have seen and worked on was at Apple (or maybe DFP at Google) - and even though they had some of the smartest people in the world and literally billions of dollars behind them, there was still an endless list of problems and downtime events.
Spoiler alert: It doesn't exist.
If you're running a HA system and you only need one nine to express your availability percentage, sure, sure, you have the smartest people etc and you're doing such a great job, and yeah, yeah, show me one system that has 100% uptime etc.
I mean, I'll bite. Assuming you only traded 6 hours a day (ie US) that'd be a 27bn dollar a year strategy, and the only way for returns to be linear and trading to be sub milli is market making/arbitrage.
That is a lot of half spreads...
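For anyone checking the 27bn figure: it follows from the $300,000 per minute quoted earlier in the thread and the 6 trading hours per day assumed above, if one also assumes roughly 250 trading days a year. The day count is an assumption of this sketch and is not stated in the comment.

```python
LOSS_PER_MINUTE_USD = 300_000  # figure quoted earlier in the thread
TRADING_HOURS_PER_DAY = 6      # assumption stated in the comment above
TRADING_DAYS_PER_YEAR = 250    # assumption of this sketch (roughly a US trading year)


def implied_annual_usd(per_minute: int, hours_per_day: int, days_per_year: int) -> int:
    """Annual revenue implied by a per-minute downtime cost, assuming linear returns."""
    return per_minute * 60 * hours_per_day * days_per_year


annual = implied_annual_usd(LOSS_PER_MINUTE_USD, TRADING_HOURS_PER_DAY, TRADING_DAYS_PER_YEAR)
print(f"${annual:,} per year")
```

That is $108 million per trading day, or $27 billion per year under these assumptions.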
I understand GP's tone wasn't exactly nice here. But here's the rub with RH's outage. RH is unfortunately in an industry (Finance, Healthcare, Aviation, Food, etc.) where people _need_ to trust them to be successful. The consequences of failure in these industries are catastrophic, not only for them but for their clients. Sure, failures happen, but the scale at which RH has failed and the lukewarm response they've put out have pissed people off. I don't recall any brokerage, old or new, that has failed so catastrophically and has responded to it so poorly. If you think you have a worse example, I am all ears.
I’m taking my account off their platform.
People lose money in trading all the time, for hundreds of reasons and some of those reasons are infrastructure downtime.
If your risk profile doesn't reflect that, maybe you should take your money out of trading altogether.
Blew. My. Mind. Not only because of the radio silence and then dropping back in out of the blue as if no time had passed, but also because they had a data loss issue.
So I checked out my previous branch again, upgraded Scylla versions, and sure enough the data differences we were noticing before appeared to be resolved. I couldn't believe the amount of time I had spent combing through my code to see if I had a hard-to-detect bug somewhere... but nope, it was ScyllaDB (although I am sure there were plenty of other bugs, they just weren't the cause of this specific symptom).
I am actually a fan of ScyllaDB and what it is trying to do. Performance was great (as advertised) and management was simple enough, but they are going to need to work pretty hard to convince me "instability" is just rumor after that experience not too many years ago.
We built an entire new DC and had Tottenham Court Road dug up in case the Thames flooded.
In fact any big telecom will have down times for a switch (central office) measured in generations
Spoiler alert: it doesn't exist
https://www.profit-loss.com/cme-hit-by-globex-outage/
I don’t remember them offering any apology or explanation at all.
That’s an exchange mind you where things like the global price of oil and s&p futures trade. Not a small boutique brokerage.
Further they have planned downtime every week & at that point still had planned daily downtime I think.
I think Robinhood screwed up. I think they should learn a hard lesson. But people thinking that trading is some high reliability industry haven’t spent any time in it.
The scary thing to me is: are healthcare, aviation & food the same?
Someone in one of these threads said there's a hidden DNS within VPCs that can fail and isn't scaled, so if that's true, they might just have to architect around that unless they can get AWS to change it. It's on RH for not knowing that but it's also kind of on AWS too.
But as far as what you can do, you can really only split your cash across brokerages if you want to engineer the same redundancy yourself. Otherwise, RH would need to route everything to another exchange to keep satisfying orders, and even that is just another system that could fail. Keeping all of your money in one brokerage doesn't seem ideal if you want to completely avoid downtime. Doing the same redundancy yourself with those industries isn't really practical.
Non-technical people don't want a technical apology, they just want an 'our bad, working on it', which is what was provided. The company will be fine. Whether they should be is another question altogether.
High trust systems require just that, high trust. And once broken it's hard to re-establish.
Crypto exchanges certainly have their fair share of downtime, but don't forget that crypto exchanges for a long time operated purely for early adopters, as crypto wasn't something that everyone traded. There was also less competition, because again the industry was newer and there were fewer choices.
And certainly Coinbase helped to popularize crypto trading and they had their fair share of issues, but I don't believe they had an outage of this exact magnitude, and again they were in an early adopter area where mistakes are seen as part of the process. If not expressly, then at least subconsciously.
No service guarantees 100% availability, it doesn’t exist.
Online consumption is different than in person. You go to a restaurant and the food is bad you probably don't go back. Online the bulk of consumers just keep going back because that's what they are used to. We love our favorites.
I remember all of AWS going down a couple years ago.
Boeing itself is fine even though one product killed hundreds. Robin Hood is going to be fine. This will be forgotten in a week.
What is annoying - only 50k per day is allowed.
My point was that failure is inevitable in any complex system. I was responding to the parent's point, where he immediately pointed the finger at management in an accusatory way, and I was saying that's not constructive.
Also your point "They expect to be paid" is actually implicitly "I expect management to do their best to pay me" - there could be a failure in the payroll system, there could be a failure in the banks, there could be many reasons outside management's control that mean I'm not getting paid. I can say "why don't you have redundant payroll systems" (which is a stupid waste of resources given the cost/benefit/low failure rate). But my point is again - complex systems have failures - and SOMETIMES, JUST SOMETIMES, YOU CAN CUT THEM SOME SLACK.
It sucks, I'm not defending it, but it's a fact.
Let’s say I’m RobinHood. Let’s pick an SLO. I think a three-nines monthly SLO is a good start; that budgets ~45 minutes of downtime per month. Maybe I can argue for a more aggressive SLO, but let’s pick this one - because I think it will keep users relatively happy as trades aren’t blocked for more than an hour at worst. I drive an agreement with stakeholders that if we needle out of this SLO, we drop all feature work and focus on hardening reliability.
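For anyone who wants to check the arithmetic, here is a rough sketch of how a "nines" target turns into a downtime budget. The function names and the 30-day month are my own assumptions for illustration, not anything RobinHood publishes. Three nines over 30 days comes out to about 43 minutes, which is in line with the ~45 minutes quoted above.

```python
def downtime_budget_minutes(nines: int, period_days: float = 30.0) -> float:
    """Allowed downtime in minutes for an availability target of `nines` nines.

    Assumption: the period is a flat `period_days` days (30 by default),
    and "three nines" means 99.9% availability.
    """
    unavailability = 10.0 ** -nines          # e.g. 3 nines -> 0.001
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * unavailability


def budget_consumed(outage_minutes: float, nines: int, period_days: float = 30.0) -> float:
    """How many periods' worth of error budget a single outage uses up."""
    return outage_minutes / downtime_budget_minutes(nines, period_days)


if __name__ == "__main__":
    print(f"3 nines, 30 days: {downtime_budget_minutes(3):.1f} min")
    print(f"5 nines, 365 days: {downtime_budget_minutes(5, 365):.2f} min")
    # Illustrative input only: a hypothetical 24-hour (1440-minute) outage.
    print(f"24h outage vs 3 nines monthly: {budget_consumed(1440, 3):.1f}x budget")
```

The second function is the useful one for the "drop all feature work" agreement: anything above 1.0 means the budget for the period is gone.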
RobinHood was out for a whole day. This is unacceptable. It points to a complete organizational fuck up - product and feature development have too much power and priority at the expense of reliability.
I’m not sure that RobinHood has ever heard of SLOs or reliability engineering. I really hope their leadership is smart enough to hire and empower the right people that will drive organizational change.
The users are not saying "We measured your 5 9's and I'm going to quit if you have 6 minutes more downtime"
Sure they lose some users who get annoyed, but they have a 5.6 billion dollar company, some users will go, a lot more are coming
Your reliability target is a product decision. Maybe with the right features the market will tolerate a shitty, unreliable financial service that falls over for an entire day. Or maybe RobinHood will go from a 5.6 billion dollar company to a zero dollar company because users hate them.
Point is, high reliability is a choice based on priorities, and it seems like one RobinHood does not care about. And I will certainly stay the fuck away from their platform.
This works in the acquisition phase, which I suspect Robinhood is nearing the end of.
Once their userbase turns into the retention or conversion (competitors have $0 trades now, too) phases, mistakes like this are much more costly in the long term.
Nobody is debating whether people will continue using RH and that was never the issue. RH has massively damaged its reputation and reputation _is_ everything.