Incident Report: Railway Blocked by Google Cloud [resolved](status.railway.com) Subsequent thread: Incident Report: May 19, 2026 – GCP Account Suspension - https://news.ycombinator.com/item?id=48204770
And I’m talking about having disparate failovers that don’t rely on a single hosting provider. At that point, who cares what Google does to your cloud account… work with the hot failover and spin up another hot failover somewhere else.
Honestly, I don’t know where the downvotes are coming from. Do people have no clue about service resiliency? I can understand if it’s a personal project or you haven’t yet scaled to paying customers, but anything at scale with serious money involved needs to be completely independent of the underlying hosting. It should remain up even if an entire provider goes titsup.
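To make the hot-failover idea concrete, here is a minimal sketch, not anyone's actual setup. It assumes each provider hosts a full copy of the service and exposes some health probe; the `Deployment` type, the provider labels, and both function names are hypothetical. A probe that raises (for example because the account was suspended) is treated as unhealthy.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Deployment:
    """One full copy of the service, hosted at a single provider (hypothetical type)."""
    provider: str                   # illustrative label, e.g. "gcp", "aws", "metal"
    is_healthy: Callable[[], bool]  # health probe supplied by the caller


def _probe(d: Deployment) -> bool:
    """Run the probe; any exception counts as unhealthy."""
    try:
        return bool(d.is_healthy())
    except Exception:
        # Timeouts, DNS failures, or a suspended account all look the same here.
        return False


def pick_active(deployments: Sequence[Deployment]) -> Optional[Deployment]:
    """Return the first healthy deployment in priority order, or None."""
    for d in deployments:
        if _probe(d):
            return d
    return None


def needs_new_standby(deployments: Sequence[Deployment]) -> bool:
    """True when fewer than two distinct providers are currently healthy."""
    healthy_providers = {d.provider for d in deployments if _probe(d)}
    return len(healthy_providers) < 2
```

If the primary provider disappears, `pick_active` moves to the next healthy copy, and `needs_new_standby` signals when it is time to spin up another failover somewhere else. Data replication, the hard part, is deliberately out of scope here.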
Looks like they were sold at the beginning of the year to a company without a Wikipedia page whose parent company doesn’t have one either https://en.wikipedia.org/wiki/Markmonitor
Acquired in November 2022 by Newfold Digital, it was later announced that the firm would be sold to Com Laude, a company owned by PX3 Partners.
-Edit-Private equity apparently https://px3partners.com
PX3 stands for purpose, passion, and performance. It is a pan-European private equity firm with headquarters in London. It invests behind transformative themes and targets companies operating within select segments of the business services, consumer and leisure, and industrials sectors with strong business fundamentals.
Who deleted it?
All these companies are fraud
"Absolutely. The Railway network is a mesh ring between AWS, GCP, and Metal
So: - High availability interconnects - High availability path routing between clouds - Database itself is high availability
However, Google's VPC itself is not. So we will add a shard to Metal and AWS"
Railway dot com
Has nothing to do with railways.
I wish software people would get their own words.
… on the Unix command line …
… to a cloud older than AWS…
… if only …
Thank God I'm not dealing with any public-facing sites! Would have been an expensive lesson for a newbie coder if my job depended on this.
I'm exaggerating but someone said they got "auto banned"
what if that happens to a small account which hosts some really important data/services there?
If that person turns it off you're screwed.
I am with you entirely and would not have taken that route today, but it is really easy to see why people go that route.
I know a startup run by acquaintances of mine that has been running on AWS for 5 years straight without paying a single dollar to AWS. Whenever the credits were about to run out, they migrated their data over to another account with credits. That has happened twice already.
It helps to have a portable, replicable IaC config. But also this is sustainable because they are a pretty small struggling shop. You will probably not be able to do this if you are trying to maintain more than 3 nines for an enterprise client.
Is Google's communication good? No, not particularly. The only way something like TFA happens is if the relationship is neglected (by one or both parties). I'm not saying Railway did something wrong, but there are usually many flags and opportunities to correct long before drastic actions.
I get the impression that Railway plays fast and loose with a lot of their limits and resources and that Google may not be a fan of that.
Edit: would also like to say that if you put all your resources in one GCP project you are going to have a bad time. If you organize stuff over many projects it is very unlikely that they will ever take account wide action. I've had issues with, for example, a particular tenant's behavior, but it never jeopardized the other tenants.
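A rough sketch of what spreading tenants over many projects could look like, assuming a fixed number of projects and a made-up `tenants-NNN` naming scheme (both are illustrative, not a GCP convention). It only shows how the mapping bounds the blast radius of a single-project action; it does nothing against an account-wide suspension.

```python
import hashlib
from typing import Iterable, List


def project_for(tenant_id: str, project_count: int) -> str:
    """Deterministically map a tenant to one of `project_count` projects."""
    if project_count < 1:
        raise ValueError("project_count must be at least 1")
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % project_count
    return f"tenants-{index:03d}"  # hypothetical project naming scheme


def blast_radius(tenant_ids: Iterable[str], project_count: int,
                 suspended_project: str) -> List[str]:
    """List the tenants that go down if one project is suspended."""
    return [t for t in tenant_ids
            if project_for(t, project_count) == suspended_project]
```

With a single project, every tenant is in the blast radius; with N projects, an action against one project only touches the tenants hashed into it.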
Pray to @dang that you will make the front page of HN?
Agreed. Railway are probably not far off a billion dollar company though!
I don't feel safe with any one single point of failure. "Your credit card bounced", "you thought it was dev", "you got hacked", etc. are all the same problem to me and no cloud provider solves those merely by setting up an account.
>https://blog.railway.com/p/incident-report-may-19-2026-gcp-a...
>May 20, 07:57 UTC
Incident Report: May 19, 2026 – GCP Account Suspension - https://news.ycombinator.com/item?id=48204770
This is an excellent closing statement.
In total, down for >11 hours on our side.
You see this at least once a year. Never heard of this from AWS or Azure.
In all seriousness, this is why we don't use them. They have the most ergonomic cloud of the big three, then absolutely murder it by having this kind of reputation.
https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...
They had a really bad global outage a year ago. At least with AWS outages are contained to a single region.
Your cloud provider blocking your business from running is far worse.
I can’t imagine AWS ever doing such a cascading delete. I mean, they have made deletion protection a difficult thing to ignore even for individual resources.
It had lasting effects for us for a little over 3 hours.
AWS does it more efficiently; it takes down many startups at a time when us-east-1 goes down.
Azure nerfed the front door of all Azure and O365 services last year.
All of these companies are great at what they do, and occasionally fuck up.
It's AWS and Azure that are the outliers and tend not to care too much what their customers do with their infrastructure. AWS is perfectly fine with allowing me to run copies of 15 year old vulnerable AMIs copied from AMIs they've long since deprecated and removed. Even for removed features like NAT AMIs.
The only anecdotal thing I've seen is we hired a vendor to do a pentest a few years ago, and they set up some stuff in an AWS account and that account got totally yeeted out of existence by AWS if memory serves.
Regardless of how it happened, for me, this is the straw that broke the camel's back.
https://www.unisuper.com.au/about-us/media-centre/2024/a-joi...
A joint statement from UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian
8 May 2024
UniSuper and Google Cloud understand the disruption to services experienced by members has been extremely frustrating and disappointing. We extend our sincere apologies to all members.
While supporting UniSuper to bring its systems back online, Google Cloud has been conducting a root cause analysis.
Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events, where an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services ultimately resulted in the deletion of UniSuper’s Private Cloud subscription.
This is described as an isolated, “one-of-a-kind occurrence” that has never before occurred with any Google Cloud client globally. This should not have happened. Google Cloud has identified the sequence of events and taken measures to ensure it does not happen again.
Why did the outage last so long?
UniSuper had duplication across two geographies as protection against outages and data loss. However, the deletion of the Private Cloud subscription triggered deletion across both geographies.
Restoring the Private Cloud required significant coordination and effort between UniSuper and Google Cloud, including recovery of hundreds of virtual machines, databases, and applications.
Implement anti-abuse measures and you will hit some loud false positives (this may be the case with GCP here).
I don't envy anybody running a hosting co - the internet is a really ugly place under the surface.
edit: to add - AWS are really good here. Must be the ~30 years of retail fraud and abuse experience.
Damned if you do, damned if you don't.
Or did they just mean that they’re not renting VPSs but only metal from the cloud provider?
In my mind I was so excited that there was another provider not just paying one of the hyperscalers but at a minimum colocating and owning more of their stack. https://blog.railway.com/p/heroku-walked-railway-run
The other notion that we have intuited is that you can’t build a cloud on another cloud. We have devoted years of practice running our own metal (and playing well with other clouds) to make sure that Railway’s business, which invariably becomes your customer’s business, is as rock solid as possible."
> The fact of the matter is, you simply cannot build a cloud on someone else’s cloud.
Indeed…
My guess is that many are abusing their free tier, causing them trouble with their service providers.
I take no joy in seeing Railway take a hit like this, even as a competitor, but free compute attracts all sorts of strange users. We've been there and decided early on to avoid free compute even if it costs us our top of the funnel.
Google really needs to improve their support team. It's strange such a big corp can't even afford to have a proper support team.
Railway say they are in touch with that support team.
They must’ve upgraded them to Gemini 3.5 by now.
This seems to be by design.
In the cloud space it seems like AWS does nothing and wins.
You should also read the story, as you're perpetuating a false version of it: https://x.com/lifeof_jer/status/2048103471019434248
Not sure too many VCs would be cool with deep redundancy when there's more features to build to bring in more customers instead.
The only reasonable explanation is Railway lost control of their estate and something was happening that warranted a group of humans to decide flipping the kill switch was the best of a set of bad alternatives.
i actually built a good plan out of those horror stories for my companies.
Then they send you very strongly worded messages that say trying to work around the ban will lead to something bad happening.
I've been worried my main email account provider would do this. The core issue is that even if you pay, even if you are a company, as shown here, these companies don't put careful enough limits on banning. I can only imagine they ban lots of scammy things every day so "they think it's working great".
I’m aware of some companies hosting their own metal and infra, but I’m not aware of large companies mitigating risk by hosting on separate cloud providers as a fallback mechanism. We might disagree with cloud provider choice, or think they should have been hosting their own metal, but that’s still an “all your eggs in one basket” choice, right?
Heck, they might even have multi-region fallback with GCP, but if GCP bans your account, that doesn’t matter.
Are there good examples of running a company of railway’s size so redundantly that their host could nuke one of their accounts and they’d just keep on trucking?
Common ways this happens? They are using a credit card to run their business with no backup payment method. Then the company's contact person is on vacation.
Sign up for terms. It will get you payment terms!
Railway hosts applications for customers. An uneducated guess for some possible reasons: 1) one of those customers hosted something they shouldn't have 2) railway had something spawn that took up too many resources 3) Or their account balance was too high 4) Or something...
But all of this probably culminates in someone needing to read an email that was missed.
Scaling a customer infrastructure setup like Railway is hard. This is one of the non-technical hard parts - how to make sure your account with your primary vendor is safe. But, I'm willing to wait to pass judgement here until more information is available. I'm sure the post-mortem will have lessons. I'd like to know more.
If it's anything like AWS, that may be just one of hundreds of emails they send every day, most of which are just noise.
I had a toy Free Tier account that managed to overstep a limit one month and rack up $0.0038 in charges.
AWS hounded me about it for an entire year before finally putting the account on hold. Then kept at it for months more before finally deleting it.
It’s like the paperboy from Better off Dead, if he were to continue delivering newspapers while hounding you for his two dollars.
But that's not what will happen. Google will offer an apology (perhaps even a public one), a giant pile of account credit, and a pinky promise not to do it again. Railway will accept it and hem and haw internally about whether to decrease their reliance on GCP, and then when they calculate the cost of going in on other clouds more heavily (or their own metal), they'll just think harder about weird failure modes.
> May 19, 22:22 UTC - P0 ticket filed with Google Cloud. Railway's GCP account manager engaged directly.
Google Cloud accidentally deletes $125 billion Australian pension fund - May 2024
https://www.business-standard.com/world-news/google-cloud-ac...
While it's possible to isolate the effects, judging by how many things stop working when there is an AWS failure, a lot of people fail to do that. I think the shift of responsibility to AWS removes the incentive to put effort into resilience against AWS failure.
The outage in the linked article appears to have been resolved in 4-5 hours.
I remember a Workspaces outage about 5 or 6 years ago, and the problem for us was that the redirect link in the console had US East 1 in it.
The workspaces themselves weren't in US East 1 and nothing relied on US East 1.
Emailing users who needed it an alternative link with a different region in the URL for the login redirect fixed it for us.
I know some workloads help to be colocated but all these places are connected by fiber and every cloud has a worldwide CDN it seems.
It's one of the oldest and largest regions. It hosts the most services, both low-level platform stuff and higher level managed services (which run on the low-level platform stuff), so services tend to be more performant.
Geographic location is also good.
Also, due to scale their pricing ends up being cheaper.
Let's say that it's the region people use by default, unless they have a compelling reason to have a presence in any other particular region.
Then there's the platform game theory. If you adopt you add friction which reduces signups, and there will always be a competitor who would risk the 10x fraud increase in order to capture 100x the market. Railway has seen hyper-growth because it's so easy to run from, and is recommended by, coding agents[1].
The solutions are here already just not well implemented or understood - probabilistic fraud detection, resource limits, service and automation limits, standard gov identity verification as a signal, enterprise sales channels with human relationships, etc.
There are tradeoffs with each platform choice that just aren't well understood. Most users shop on price and DX and don't see the abuse infra or problem until it hits them.
Google and GCP have a problem where they completely cook users who get flagged in their automated fraud net (this isn't news - or shouldn't be)
[0] https://www.coindesk.com/policy/2023/05/24/black-market-for-...
[1] and the problems that come with providing that simple interface, like sometimes dropping prod
AWS may have data centers[0] go[1] down[2], but that's within expected bounds of standard ops.
[0] https://hooks.slack.com/services/TJ7HQS7FC/B0B5S7UTBJ4/PUHIC...
[1] https://www.aljazeera.com/news/2025/10/21/what-caused-amazon...
[2] https://netflixtechblog.com/lessons-netflix-learned-from-the...
Obviously a fiasco but I’m not prepared to call them liars when it could be an honest mistake.
They even made fun of cloud providers being down when AWS was down.
1. We depend on X but could gracefully migrate to an alternate in a week if we really needed to.
2. All data is mirrored instantly so that we can do seamless fail-over in case X has its own outage.
This is why businesses should put in the effort and sign up for credit terms. Then it's an invoice, and you reduce this risk substantially.
Credit cards are _not_ reliable at this scale. Banks are offline all the time, cards are marked stolen, protocols change, all sorts of things that will cause flags indicating "the money can't move down that path".
Businesses that pay for AWS/Cloud/etc via credit cards are trying to buy reliability but put it behind a single point of failure.
Credit cards are not how you should be paying for business services with uptime requirements!
I'd be curious to know why Railway's account was suspended. Was it a similar payment issue or something else?
But most likely, it's just automations in place without an appropriate human override coupled with gross negligence.
This. It's very odd that in other threads we see a bunch of accounts heavily invested in criticizing a cloud provider, but what's conspicuously absent from this wave of indignation is any curiosity in the root cause, or even any interest in exploring what it might have been. Quite odd.
But TheRegister did reach out to Google and they have not replied yet: https://www.theregister.com/off-prem/2026/05/20/google-cloud...
So I will hold my judgement until this has been dissected a bit more
Sure but not even a warning before shutting down their account?
If it was actually suspended then yeah it’s weird not to get an email.
It's google, come on.
That's pretty amazing. Not due to a cascading failure from someone changing a config deep inside of a system that caused a bunch of unintended effects, just someone who messed up writing a shell script?
It's called single point of failure, and it's the nightmare of everyone who was ever in charge of safety.
I don't agree. What do you expect to happen when you explicitly delete your user account? Do you expect your systems to remain in operation for a week? That itself would be a major risk and liability, as your whole infrastructure would still be up even though you cut your access to it.
Also, isn't your whole infrastructure expected to be automatically deployed with IaC? The notable exception is data, which is already soft deleted and recoverable through customer support.
All in all, where do you expect the customer's responsibility to end and the cloud provider's to start? The shared responsibility model is covered by any intro course in no uncertain terms.
They all introduce themselves, beg me to setup a meeting w/them and some sort of engineering resource(s), and they come to a meeting with a canned slide deck that is so absurdly unrelated to us that I just laugh, and then the next time I hear from them it's because we have a new AE.
This is my most recent reply (right after Next '26):
> I really appreciate you reaching out; however, we have met with, I dunno at this point, more than a dozen GCP Account reps, execs, technical teams, etc over the years and there's little to no value for us or you, now or in the future. Please do feel free to invest your time on your other clients. We're good; truly.
I love GCP and its services; we have been very pleased with it over the years, but the human side of it? Fucking sucks and I just don't see why they even bother.
I said this in the other thread: we got access to our account back, but even with an Account Rep. and a CSM on our account, it still took them a while to figure out what was going on.
I'm sure it could have been worse if we didn't have a rep on our account.
I can provide an explanation about the GCP dependency. Yes, we have host workloads off GCP, and we have been able to build a good business by performing a cloud exit. However, we were worried that we would have a circular dependency on our own cloud. I don't think we expected to get auto-modded out of our own account, hence we left our DB on CloudSQL.
It was never our intent to deceive people that we didn't own our own destiny with our business. The last GCP issue, we were assured that this scenario wouldn't happen (when we got auto-ratelimited, which was bad, but survivable) - but it seems like we have further work to do. Apologies.
With that said, I would not say few companies rely on GCP. Search for "GCP" in this month's HN hiring thread. There are 23 hits, more than Azure's 21. AWS has 90 hits, which I guess shows its sheer dominance in the startup space. But these figures more or less agree with my intuition of the major clouds being AWS/GCP/Azure.
GCP is the world's third largest cloud provider, and has around half of AWS' market share. Claiming no one uses it reads like Yogi Berra's "no one goes there anymore, it's too crowded".
AWS also includes Amazon WorkSpaces. Moreover, AWS includes all of Amazon's cloud infrastructure for things like Amazon music, Ring, Amazon Prime Video, etc.
also, I can't help but imagine if instead of render, it was Apple's account which could've been auto-banned (Render is almost a billion dollar company or series-B, I am not sure)
I haven't read the articles and I admit that but can you please elaborate to me on why Apple uses GCP themselves for idrive, I would love to know the technical decisions behind it on a genuinely curious level.
From my (let's face it) limited understanding of GCP, it isn't particularly good or price performant and one of the wonders is that Google sells it directly with Google Photos too and a competitive lineup on Android.
So in some sense if Apple is using gcp's for icloud then aren't they just reselling google storage themselves and google can always beat them in pricing while also wanting to chew away at the percentage of iphones themselves too?
I mean, I can still try to understand the google search pays apple 10 billion dollars (right?) deal but I don't quite understand why apple would pick GCP when the hosting market is one of the more competitive ones with lots of companies.
I would love to get some explanations or theories as to why exactly that is the case
(Also given its HN, if anyone from apple is reading or knows the answer, I would love that too!)
I believe you mean Railway.
Render (a $1.5B company) has been hosting customers on GCP since 2018, and has never been banned.
Look up “buy or build” which is the industry term for this kind of evaluation: buy product and use it/resell it or build your own.
Apple has gone for different strategies in various areas:
Build own Apple silicon chips, do not buy off the shelf chips from intel or nvidia or amd.
Buy and resell google storage but don’t want to build their own distributed data store for end users.
It’s about what matters more for the company and the core products. Apple’s laptops, cell phones are considered core products. Icloud is a value add.
This is also why apple is making their own cell phone broadband chips. For most companies, this is a “buy from Qualcomm” but apple needs to build their own for independence for their number 1 core product: the iphone.
Apple uses Samsung displays and Sony camera sensors, iirc, both of which are flagship Android phone makers. That doesn't really seem to be a concern in their procurement thinking. iCloud and Google Photos are not that direct competitors because which one is native depends on which phone you already bought. Google Photos definitely does have some market share on iOS due to having 3x the free storage and a handy compression mode (which used to be entirely unmetered at launch but now still uses storage, just less of it). But it will never be a full competitor because it is a separate app you have to install and it can't magically fetch cloud-only photos from the camera roll and photo picker UI like iCloud can.
The pricing of Google One and Apple One/iCloud+ isn't really dictated by underlying storage costs. At the higher tiers like 2TB, many don't come close to using it all, while the laughable 5GB iCloud free tier clearly costs almost nothing in raw storage, even on NVMe SSD, if you compare it to S3/Backblaze or even raw disk pricing on the cloud.
Then there's Anthropic...huge user.
No it didn't impact us.
December 2021: https://www.cloudcomputing-news.net/news/aws-outage-takes-do...
June 2023: https://newsletter.pragmaticengineer.com/p/the-scoop-52
October 2025: https://www.cnbc.com/2025/10/20/amazon-web-services-outage-t...
Each of these were massive outages impacting very large services across the web.
Someone is just asking to get Google's side and explaining why they want that, which seems reasonable since we're in a post where Google is being punched/blamed for this, and it sounds like it isn't Railway's first questionable outage.
That is exactly what GCP should do: not comment on a customer's issues. Even when it's due to abuse from a customer, which might even be the case.
From what I've read in other comments, the root cause seems to have been automatic account suspension as an anti-abuse measure.
It's also telling that Railway describes the root cause simply as "Google Cloud Platform has suspended Railway's production account." It then mentions this
> At 22:20 UTC on May 19, Google Cloud placed Railway’s production account into a suspended status incorrectly, as part of an automated action. This action extended to many accounts within Google Cloud. As this was a platform-wide action, there was no proactive outreach to individual customers prior to the restriction.
The why is conspicuously absent, but this sort of sweep is indeed consistent with an anti-abuse measure.
If this is the case I would be cautious in accusing a cloud provider of wrongdoing. Many things need to go awfully wrong to trigger this sort of alarm, and I'm not talking about GCP's anti-abuse system. In fact, it's telling that no reputable, well established business is reporting any impact. The whole point of any anti-abuse system is to suspend accounts that are caught engaging in some sort of abuse.
Oof.
https://x.com/theo/status/2056946993407369300
https://xcancel.com/theo/status/2056946993407369300
Couldn't find it on yt.
Either way, I agree with blensor here, there's no new info on the railway incident itself but mostly about google's direction towards antigravity.
About the author of the video mentioning that he's scared, unfortunately, that has always been the case with Journalism/columnists etc, speaking ill of the platform which you use to sell your wares tends to backfire. Wish him all the luck
This was less "Oh look, a rare edge case that was easy to miss!" and more "We don't bother putting guardrails into critical systems. Oops!"
Weekends and public holidays are a thing, plus it’s quite common for companies to shut down for 2 weeks over Christmas.
There’s a lot of opportunity for mistakes or malicious actions to happen at times that won’t be discovered for a while.
It’s ok though, Claude helped us cut >45% of our monthly costs. I’m surprised they haven’t been beating down my door after we made that level-shift. Probably in AE transition. ¯\_(ツ)_/¯
> Around 22:20 UTC, our Google Cloud account was placed into a "restricted" status hence removing all of our cloud overflow VMs, our CloudSQL instance, and our API.
At $2m/mo spend, this kind of thing is insane. GCP has never been the most reliable of clouds but this is pretty awful. I would never have expected this.
Each division also cross-charges, so Samsung Mobile would be paying Samsung Displays for the screens, and possibly at a small, guaranteed and non-negotiable margin.
Without a global strategy not to do so, divisions within an enterprise optimize for their own bottom line and have internal discussions on build-vs-buy even if they have an internal factory.
Their primary concern seemed to have been to keep as much of our workload inside AWS as possible and to win workload from 3rd party services we used (e.g. CDNs). The actual revenue appeared secondary.
> Render (a $1.5B company) has been hosting customers on GCP since 2018, and has never been banned.
Speaking of which, I have a question for Render: how does Render prevent something like what has happened with Railway (i.e. the account getting banned)? I would love to know what the team at Render thinks about this, and I would also love to hear why Render is using GCP and what architectural decisions are behind it, as I am curious!
Once again, thanks for responding to me and waiting for your response and have a nice day anurag!
I mean, the pain we have caused our customer ultimately proves you correct. That said, we made our decisions with the information and constraints that we knew in that moment in time. Railway has hosts in AWS/GCP/and co-los, so coordinating those workloads in a fully distributed manner would be ideal but end of the day, we didn't foresee that we would just have our project get deleted just like that.
(Even if we did get assurances from them in 2024, that it wouldn't happen again, although we just got auto-rate limited the last time.)
Because in case of a compromise/unauthorized access that's exactly what you don't want to happen
No, not really. That's pretty basic stuff. You would do well in reading up on the shared responsibility model. Customers are responsible for setting up their own infrastructure, and platform/service providers are only responsible for the services they manage. Even then, stuff like persisted data is still recoverable by design.
But you are absolutely responsible for the service you put together. This has been a basic principle for around two decades. Infrastructure as code tools have been pervasive and ubiquitous for over a decade.
Try experiencing it in person
Again this is the naive view
Again if someone compromises your accounts and everything is deleted instantly you'll be the one looking like a fool
Cuz otherwise you look like a threat actor.
That’s assuming your vendor was pentesting AWS systems. If you meant you hired a vendor to pentest your own systems on AWS, that’s of course a totally different matter.
Sorry for being unclear, the vendor was attacking our organization only, and any other company was expressly forbidden in the contract. As I recall it was a fake SSO sign-in page to collect credentials that they would try and social engineer our employees with.