GitHub having issues today

GitHub having issues today (githubstatus.com)

137 points by mangoman 2 years ago | 87 comments

acedTrex 2 years ago |

The fact that Github has been so unstable for so long is absolutely insane to me. I know ops is hard, but this level of consistent outage points to an endemic problem. Is it the legacy rails/mysql stack that is the largest culprit or is there systemic rot in the engineering org?

AlchemistCamp 2 years ago | |

More likely, it's efforts to migrate away from the previously solid Rails stack to MS's preferred stack.

They've had a long history of this kind of stability issue when migrating or trying to migrate acquisitions from their previous stack to an MS one. This happened with Hotmail (Unix server -> Windows server), LinkedIn (custom cloud -> MS cloud) and others since.

mrweasel 2 years ago | | |

Is Github moving to .Net and/or SQLServer, or is it "just" moving everything to Azure?

gjsman-1000 2 years ago | | |

The LinkedIn to Azure migration was indefinitely postponed.

nurettin 2 years ago | | |

Never heard of linkedin problems before.

nijave 2 years ago | |

No need to speculate, Github posts fairly detailed information on availability and outage causes https://github.blog/tag/github-availability-report/

duxup 2 years ago | |

> so unstable for so long

Has it?

I’ve had hardly any problems. Occasional issues, but rarely have I been impacted to the extent I notice for more than say an hour…. maybe I notice it a couple times a year.

My internet access at home is more likely the issue when I hit GitHub issues.

nathan_douglas 2 years ago | | |

I've experienced a ton of issues, but it's likely because almost every aspect of our operation depends in some way on GitHub -- the repo itself and basic push/pull, or PRs, or webhooks, or actions, or some aspect of status updates, or random API tasks, etc. We use a lot of GitHub.

izietto 2 years ago | |

Why do people think it can be related to Rails when there are tons of companies out there using Rails that aren't affected by this degradation?

jasode 2 years ago | | |

> Why do people think it can be related to Rails

Probably because there have been more high-profile stories of companies migrating off of Ruby on Rails to something else (e.g. Java, Go, etc) rather than vice-versa of migrating into it.

E.g. the high-profile story of Twitter's previous "whale fail" scaling problems supposedly being partially solved by switching from Ruby to Java/Scala/JVM : https://www.google.com/search?q=twitter+whale+fail+ruby+rail...

Ruby may be unfairly blamed but nevertheless, the narrative is already out there even though other big sites like Shopify, etc still use it.

dtech 2 years ago | | |

One difference is that running Rails and MySQL at GitHub's scale is rare, even taking into account that GitHub's scale is itself rare.

sonicanatidae 2 years ago | |

Sometimes it's not Ops. Sometimes it's crap code. ;)

Source: <-- OPs

mrweasel 2 years ago | | |

Clients I worked with: Our service crashed, why?

Because you designed and implemented it poorly, that's why. Alternatively: How should I know, you wrote it.

If you're ever bored as a developer, switch to operations, you get to be the person developers turn to when they can't code, debug, do logging or security.

dcchambers 2 years ago | |

Every time there's a GitHub outage of any severity one of the top comments on HN is inevitably suggesting that it's probably due to Rails. It's getting pretty tiresome.

Calling it a "legacy rails" stack is incredibly disingenuous as well. It's not like they're running a 5 year old unsupported version of Rails/MySQL. GitHub runs from the Rails main branch - the latest stable version they possibly can - and they update several times per month.[^1] They're one of the largest known Rails code bases and contributors to the framework. Outside of maybe 37 Signals and Shopify they employ more experts in the framework and Ruby itself than any other company.

It's far more likely the issue is elsewhere in their stack. Despite running a rails monolith, GitHub is still a complex distributed system with many moving parts.

I feel like it's usually configuration changes and infra/platform issues, not code changes, that cause most outages these days. We're all using A/B testing, canary deployments, and feature flags to test actual code changes...

[^1]: https://github.blog/2023-04-06-building-github-with-ruby-and...

nijave 2 years ago | | |

It's easier to blame <piece of technology> than to admit running a service at Github's scale is highly complex and takes deep expertise.

acedTrex 2 years ago | | |

I wasn't really blaming rails per se; if anything, their main database mysql1 seems to pop up in their post mortems more than anything else

indymike 2 years ago | |

> Is it the legacy rails/mysql stack that is the largest culprit or is there systemic rot in the engineering org?

The culprit is change. Infra changes, config changes, new features, system state (os updates, building new images, rebooting, etc...), even fixing existing bugs all are larger changes to the system than most think. It's really remarkable at this point that Github is as stable as it is. It is a testament to the Github team they have been as stable as they are. It's not "rot" it's just a huge system.

ahmgeek 2 years ago | |

I don't think you understand ops :), there's no 100% availability anywhere, so issues and degradation will always happen no matter what. https://sre.google/sre-book/service-level-objectives/

It's not rails nor MySQL, both proven good for years.
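To put a rough number on the "no 100% availability" point: an availability target translates directly into a downtime allowance. The sketch below is only the arithmetic; the function name and the example targets are illustrative assumptions, not GitHub's actual SLOs.

```python
def downtime_budget_minutes(availability: float, window_days: float = 30.0) -> float:
    """Minutes of downtime an availability target permits over a window.

    `availability` is a fraction (0.999 means "three nines").
    The targets used below are illustrative, not any provider's published SLO.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - availability) * window_minutes


if __name__ == "__main__":
    # Print the allowance for a few common (hypothetical) targets.
    for target in (0.99, 0.999, 0.9999):
        budget = downtime_budget_minutes(target)
        print(f"{target:.2%} over 30 days allows {budget:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43 minutes of downtime, so an occasional hour-long incident is already outside that budget.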

dijit 2 years ago | | |

Please permit me to indulge in the most extreme example of what you just said.

"What do you mean the database is down after I loaded 500 TiB and indexed all columns? It's MySQL, Facebook has used MySQL at high scale for years without incident!"

hunkins 2 years ago |

Getting intermittent 500s browsing repositories right now.

Hugs to the GitHub ops team.

efrecon 2 years ago | |

Also actions won't be able to check out.

aaomidi 2 years ago |

We have a slack channel that monitors GitHub availability. There’s content there nearly 3-4 times a week. It’s amazing how awful this has become.

amelius 2 years ago |

In the old days we had mirrors for many online repositories.

sph 2 years ago | |

We've lost the technology for decentralised Internet services in the early 2000s.

We just hope SMTP keeps ticking along somehow or we're fcuked.

berkes 2 years ago | | |

Teams, Slack, Discord, WhatsApp, iMessage, etc. are trying hard to replace that too, though.

djbusby 2 years ago | |

Yep. Now I'm glad I vendor-mirror all my dependencies so I can keep doing tests and all that.

8organicbits 2 years ago | | |

I wonder if projects should proactively mirror to another git site (github as main and gitlab as mirror, for example). Collaboration on the project may stop when the main is down, but consumers could proceed using the mirror. I'm not sure how well various tooling supports fallback origins. This would reduce (but not eliminate) the need for users to vendor their dependencies.
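For consumers, the "fallback origin" idea can be approximated outside of git itself. Here is a minimal sketch, assuming the project publishes the same repository at several URLs: try each one in order and stop at the first clone that succeeds. The function name `clone_with_fallback` and the example URLs are hypothetical; the only real command used is `git clone --quiet`.

```python
import subprocess
from typing import Callable, Optional, Sequence


def clone_with_fallback(
    urls: Sequence[str],
    dest: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[str]:
    """Try each remote URL in order; return the one that cloned, or None.

    `run` is injectable so the logic can be exercised without network access.
    Assumes `git` is on PATH; a missing binary raises FileNotFoundError.
    """
    for url in urls:
        result = run(
            ["git", "clone", "--quiet", url, dest],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return url
    return None


# Hypothetical usage (URLs are placeholders, not real repositories):
# clone_with_fallback(
#     ["https://github.com/example/project.git",
#      "https://gitlab.com/example/project.git"],
#     "project",
# )
```

This only covers read access. It does not verify that the mirror is up to date with the main repository, which a real setup would probably want to check.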

richardwhiuk 2 years ago | |

Everyone with a git clone has a mirror....

amelius 2 years ago | | |

But how do I find someone with a clone of the repo I'm looking for?

robinhoodexe 2 years ago |

And just as we're about to migrate 4 kubernetes clusters with a total of ~4k pods. Terraform in github actions on selfhosted runners and argoCD is failing.

mrweasel 2 years ago | |

Oh that sucks, there's always going to be those who will say that it's the price you pay for using Github, but locally hosted VCS and CI/CD systems have issues as well.

External dependencies are always a problem, but do you have the capacity and resources required to manage those dependencies internally? Most don't and will still get a better product/service by using an external service.

marginalia_nu 2 years ago | | |

The rate of outages on GitHub over the last few years has been orders of magnitude higher than anything I've encountered on a locally hosted VCS.

Local also means you can orchestrate maintenance windows to avoid outages at critical phases.

darkwater 2 years ago | | |

> Most don't

Define "most". There is a surprisingly high number of small/mid-sized companies which have dedicated people for this kind of thing.

snarkyturtle 2 years ago | |

That's where I feel like it's actually pretty nice to not have CI tied to your source code. It's probably more expensive to use Travis/Circle but at least you don't have a single point of failure for deploys.

sleepychu 2 years ago | | |

Doesn't this give you 2x SPOF? Or can I use my local copy of the source to kick off Travis/Circle?

cupofjoakim 2 years ago |

Happy I'm not in a hurry with any specific work items today. Hope it's not too much of a mess to figure out for the github peeps. Much love to them.

imdsm 2 years ago |

Time to sword fight outside the offi... oh we all work remotely now.

pklack 2 years ago | |

Time to do the laundry and dishes then, I guess... Sometimes WFH has its boring sides.

inhumantsar 2 years ago | |

well, at least that's one good reason to buy into zuck's metaverse

jdthedisciple 2 years ago |

GitHub having issues?

Sounds like all in good order then ...

chaxor 2 years ago |

Wouldn't it be wonderful if the most popular version control system was decentralized? This is achievable, and is the correct solution.

This way your git repo could be located on:

- GitHub
- Your Closet (...)
- UCLA's supercomputer
- JBOD in Max Planck Institute (...)
- GitLab

Doing this with a simple file with "[ipfs, github, gitlab]" on it would be revolutionary, especially for data version control, like nn weights or databases that are too large for git and cost too much on other services, as they would be free on ipfs/torrent.

Then no one is fazed by the inevitable failure of various companies.

richardwhiuk 2 years ago | |

Git is already decentralized....

chuckadams 2 years ago | | |

I think what GP was getting at is that the topology is still point-to-point, which tends to lead to a hub-and-spoke system. Hell, it’s even in the name: GitHUB. And fact is, this works out for most people, but it leads to some undesirable failure modes. Maybe we need, say, a “git-discover-remote” command…

shrimp_emoji 2 years ago | | |

So is cryptocurrency, but people still lose their money when exchanges get hacked. :D It's a cry for help. "Make decentralization as convenient as centralization!"

belter 2 years ago | | |

But not GitHub... :-)

blauditore 2 years ago | |

Can't tell if this comment is sarcastic, but that's exactly what git is: Every clone of the repo is independent, and acts as a full backup. Likewise, a local repo can be pushed to various remotes, there is no inherent strong server-client coupling (even though it's often used in such a way).

chaxor 2 years ago | | |

Yes, but not having to specify every single user who pulls the git repo from GitHub by IP address as an additional remote is a huge win, no?

If "ipfs" can be added as a remote, and it automatically pulls from thousands of different devices without having to specify them, that's a pretty big win for redundancy right?

Bognar 2 years ago | |

Ah yes, because version control is only about file storage.

rvz 2 years ago |

GitHub is still having regular incidents in 2024, with something always going down, even on Microsoft's infrastructure 5 years after the acquisition.

Just expect GitHub to go down at least once every month as it is that unreliable.

This certainly has aged well: [0]

[0] https://news.ycombinator.com/item?id=22868406

dartos 2 years ago | |

The price teams will pay to offload their ops.

People really like avoiding ops

abhinavk 2 years ago |

It really feels like the frequency of this happening has increased lately. Are they facing high employee turnover?

_Maybe it’s time for rewriting it in Rust._

Edit: RIIR was said in jest. I forgot HN doesn’t support markdown.

sumtechguy 2 years ago | |

Full stack re-writes are not always the best way. Sometimes you end up with worse. Sometimes you end up with better. If you do go the 'full stack rewrite' route, you better have a decent plan in place, because you are about to get to support 2 code bases for a while.

edit: fair enough

vitiral 2 years ago | | |

I'm pretty sure (hope?) the comment was said _in jest_ (though I'm not familiar with _this internet standard_)

bombcar 2 years ago |

Whoops guess that was one too many PRs.

Sorry everyone!

alexnewman 2 years ago |

Phew, I thought it was because I am in Puerto Rico. Has GitHub or Microsoft done any big layoffs?

ftkftk 2 years ago |

Somebody is having a bad Tuesday.

lillecarl 2 years ago |

copilot is purring like a cat, best wishes to their infra team!

iddan 2 years ago |

It is back

Narciss 2 years ago |

Run for your liveeeees!