GitHub having issues today (githubstatus.com)
They've had a long history of this kind of stability issue when migrating or trying to migrate acquisitions from their previous stack to an MS one. This happened with Hotmail (Unix server -> Windows server), LinkedIn (custom cloud -> MS cloud) and others since.
Has it?
I’ve had hardly any problems. Occasional issues, but rarely have I been impacted to the extent I notice for more than say an hour…. maybe I notice it a couple times a year.
My internet access at home is more likely the issue when I hit GitHub issues.
Probably because there have been more high-profile stories of companies migrating off of Ruby on Rails to something else (e.g. Java, Go, etc) rather than vice-versa of migrating into it.
E.g. the high-profile story of Twitter's previous "whale fail" scaling problems supposedly being partially solved by switching from Ruby to Java/Scala/JVM : https://www.google.com/search?q=twitter+whale+fail+ruby+rail...
Ruby may be unfairly blamed but nevertheless, the narrative is already out there even though other big sites like Shopify, etc still use it.
Source: <-- OPs
Because you designed and implemented it poorly, that's why. Alternatively: How should I know, you wrote it.
If you're ever bored as a developer, switch to operations, you get to be the person developers turn to when they can't code, debug, do logging or security.
Calling it a "legacy rails" stack is incredibly disingenuous as well. It's not like they're running a 5 year old unsupported version of Rails/MySQL. GitHub runs from the Rails main branch - the latest stable version they possibly can - and they update several times per month.[^1] They're one of the largest known Rails code bases and contributors to the framework. Outside of maybe 37 Signals and Shopify they employ more experts in the framework and Ruby itself than any other company.
It's far more likely the issue is elsewhere in their stack. Despite running a rails monolith, GitHub is still a complex distributed system with many moving parts.
I feel like it's usually configuration changes and infra/platform issues, not code changes, that cause most outages these days. We're all A/B testing, canary deployments, and using feature flags to test actual code changes...
[^1]: https://github.blog/2023-04-06-building-github-with-ruby-and...
The culprit is change. Infra changes, config changes, new features, system state (OS updates, building new images, rebooting, etc...), even fixing existing bugs are all larger changes to the system than most think. It's really remarkable at this point that GitHub is as stable as it is. It is a testament to the GitHub team that they have been as stable as they are. It's not "rot"; it's just a huge system.
It's not Rails nor MySQL, both proven good for years.
"What do you mean the database is down after I loaded 500 TiB and indexed all columns? It's MySQL, Facebook has used MySQL at high scale for years without incident!"
Hugs to the GitHub ops team.
We just hope SMTP keeps ticking along somehow or we're fcuked.
External dependencies are always a problem, but do you have the capacity and resources required to manage those dependencies internally? Most don't and will still get a better product/service by using an external service.
Local also means you can orchestrate maintenance windows to avoid outages at critical phases.
Define "most". There is a surprisingly high number of small/mid-sized companies which have dedicated people for this kind of thing.
Sounds like all in good order then ...
This way your git repo could be located on: - GitHub - Your Closet (...) - UCLA's supercomputer - JBOD in Max Planck Institute (...) - GitLab
Doing this with a simple file with "[ipfs, github, gitlab]" on it would be revolutionary, especially for data version control, like nn weights or databases that are too large for git and cost too much on other services, as they would be free on IPFS/torrent.
Then no one is fazed by the inevitable failure of various companies.
If "ipfs" can be added as a remote, and it automatically pulls from thousands of different devices without having to specify them, that's a pretty big win for redundancy right?
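A rough sketch of that "[ipfs, github, gitlab]" fallback idea, in plain Ruby. Everything here is hypothetical: the mirror names come from the comment above, but the fetcher lambdas are stand-ins that only simulate success or failure, not a real git or IPFS API.

```ruby
# Hypothetical sketch: try each mirror in order until one returns the data.
# The fetchers are stand-ins; nothing here talks to git, IPFS or any network.

MirrorError = Class.new(StandardError)

# mirrors: array of [name, callable] pairs; each callable returns a payload or raises.
def fetch_from_mirrors(mirrors)
  failures = []
  mirrors.each do |name, fetcher|
    begin
      return { source: name, data: fetcher.call, failures: failures }
    rescue StandardError => e
      # Remember why this mirror failed and move on to the next one.
      failures << "#{name}: #{e.message}"
    end
  end
  raise MirrorError, "all mirrors failed (#{failures.join('; ')})"
end

MIRRORS = [
  ["ipfs",   -> { raise "no peers found" }],    # simulated outage
  ["github", -> { raise "503 from upstream" }], # simulated outage
  ["gitlab", -> { "pack-data" }]                # simulated success
]

RESULT = fetch_from_mirrors(MIRRORS)
puts "fetched from #{RESULT[:source]} after #{RESULT[:failures].size} failed mirror(s)"
```

The ordering of the list is the whole policy here: whichever mirror is listed first is preferred, and an outage at any single provider only costs a retry.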
Just expect GitHub to go down at least once every month as it is that unreliable.
This certainly has aged well: [0]
People really like avoiding ops
_Maybe it’s time for rewriting it in Rust._
Edit: RIIR was said in jest. I forgot HN doesn’t support markdown.
edit: fair enough
Sorry everyone!
My exposure is increasing, but it's still intermittent: nothing that runs all the time, and if something doesn't work ... oh well, it will run later when it does. So the impact is lower than for some.
Well, actually on second thought it does: If ipfs or torrent can be used as a data version control backend, people no longer have to pay $600 to get the very popular and basic dataset of arxiv.org from AWS - under torrent or IPFS it's free. So basically decentralization here means running away from money hungry scams like AWS.
You should always engineer things so you can fall back to something akin to:
./scripts/deploy_the_things
Ideally backed by a real build system and task engine ala Bazel, Gradle, whatever else floats your boat.
It also means you are free to move between different runners/CI providers and just have to work out how to re-plumb secrets/whatever.
GH actions/friends really provide minimal value, the fact they have convinced everyone that encoding task graphs into franken-YAML-bash is somehow good is one of the more egregious lies sold to the development community at large.
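For what it's worth, the task graph behind something like ./scripts/deploy_the_things doesn't need much machinery. Here is a minimal sketch in plain Ruby, assuming three made-up tasks (build, test, deploy); the TaskGraph class and task names are illustrative, not taken from Bazel, Gradle or any CI product.

```ruby
# Hypothetical deploy task graph; the class, task names and actions are illustrative only.
class TaskGraph
  CycleError = Class.new(StandardError)

  def initialize
    @deps = {}
    @actions = {}
  end

  # Register a task, its dependencies, and an optional block to run.
  def task(name, deps: [], &action)
    @deps[name] = deps
    @actions[name] = action
  end

  # Depth-first topological order: every task appears after its dependencies.
  def order
    done = []
    visiting = []
    visit = lambda do |name|
      next if done.include?(name)
      raise CycleError, "cycle through #{name}" if visiting.include?(name)
      visiting << name
      @deps.fetch(name).each { |dep| visit.call(dep) }
      visiting.delete(name)
      done << name
    end
    @deps.each_key { |name| visit.call(name) }
    done
  end

  # Run each task's action in dependency order; returns the order used.
  def run_all
    order.each { |name| @actions[name]&.call }
  end
end

LOG = []
GRAPH = TaskGraph.new
GRAPH.task(:deploy, deps: [:build, :test]) { LOG << "deploy" }
GRAPH.task(:test, deps: [:build]) { LOG << "test" }
GRAPH.task(:build) { LOG << "build" }

ORDER = GRAPH.run_all
puts ORDER.inspect
```

Because the graph lives in the repo as ordinary code, any runner only has to call the script; the CI config shrinks to checkout, secrets, and one command.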
Additionally, capabilities can be added in a Makefile form to use IPFS as the cache, to ensure that a script that takes 3 months on an HPC/supercomputer to make a 3MB file only has to grab the file in a few seconds, even if not yet computed locally (e.g. you just computed it at work and do `git pull && ipfsmake`).
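The caching half of that can be sketched without IPFS at all. Below, a plain Hash stands in for the shared content-addressed store, and the key is a SHA-256 over the recipe name and its inputs, so any machine derives the same key. The recipe name, input file name and cached_build helper are all made up for illustration; `ipfsmake` itself is not modelled.

```ruby
require "digest"

# Stand-in for a shared content-addressed store such as IPFS.
# A Hash keeps the sketch runnable; a real setup would fetch over the network.
SHARED_CACHE = {}
COMPUTE_CALLS = []

# Key the result by a hash of the recipe and its inputs,
# so every machine with the same recipe and inputs derives the same key.
def cache_key(recipe, inputs)
  Digest::SHA256.hexdigest([recipe, *inputs].join("\n"))
end

# Return the cached artifact if someone already built it, otherwise build and publish it.
def cached_build(recipe, inputs)
  key = cache_key(recipe, inputs)
  return SHARED_CACHE[key] if SHARED_CACHE.key?(key)
  COMPUTE_CALLS << key
  SHARED_CACHE[key] = yield
end

# First call "computes" the artifact (imagine the 3-month job here).
FIRST = cached_build("train-v1", ["data.csv"]) { "weights-3MB" }
# Second call with the same recipe and inputs never runs its block.
SECOND = cached_build("train-v1", ["data.csv"]) { raise "should not recompute" }

puts "computed #{COMPUTE_CALLS.size} time(s), got #{SECOND}"
```

A real version would hash the input file contents rather than their names, otherwise an edited data file would silently hit the old cache entry.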
It's fairly "easy" and relatively safe to let an AI loose on your Java code base and use it to add new features or find bugs. Very few people would let a similar AI roam around production servers and databases.
If you collect enough logs, exceptions/crash dumps, network traffic and so on, you could feed that to the AI and have it tell you why a service crashed. The majority of my job as an operations person is to figure out why something crashed with only a subset of that information and being able to read the code and reason about why current circumstances resulted in the crash or data corruption. Sometimes the job is even to implement the stuff the developers didn't, while not actually touching the code and relying on what the operating system, database, web server or network tells you.
If developers were better, or had more time and more resources, then yes, an AI could do the job faster and better. In the current environment, operations is pretty safe.
As you can see you don't need to give AI write access to your production environment.
Trust me, there will always be OPs people.
Source: 30+ years in Ops/SysAdmin
The only Rails projects I worked on that never had performance problems are the ones that never reached any scale. All the Rails projects I worked on that gained traction needed serious refactorings, partial rewrites, tuning and tweaking to keep 'em running. If only to tame the server bills, but most of the time just to keep the servers up. Good news is that it's very doable to tune, tweak and optimize a Rails stack. But the bad news is that every "premature optimization is the root of all evil" project made a lot of choices back in the day that make this necessary optimization hard or even impossible today.
What I'm trying to say is: Performance issues with Rails will sound very familiar to anyone who worked seriously with Rails. So it's not so strange that people reach for this conclusion if almost everyone in the community has some first-hand experience with this conclusion.
You'll be hard pressed to find any stack that doesn't require this.
A big problem with Rails, though, is how easy it makes it to "do the bad thing" (and in rare cases, how hard it makes it to do the "good" thing). A has_many/belongs_to that crosses bounded domains (adds tight coupling) is a mere one-liner: only discipline and experience prevent that. A quick call to the database from within a view is something that not even linters catch; it takes vigilance from reviewers to catch that. Reliance on some "external" (i.e. set by a module, concern, lib, hook or other method) instance-var in a controller can be caught by a tightly set linter, but that, too, is tough.
Code that introduces poor joins, filters or sorts on unindexed columns, N+1 queries and more often looks like a simple, clean setup.
`Organization.top(10).map(&:spending).sum` looks lean and neat, but hides all sorts of gnarly details in ~three~ four different layers of abstraction: Ruby-language because "spending" might be an attribute or a method, you won't know, Rails, because it overloads stuff like "sort", "sum" and whatnot to sometimes operate on data (and then first actually load ALL that data) and sometimes on the query/in-database. It might even be a database-column, but you won't know without looking at the database-model. And finally the app for how a scope like top(10) is really implemented. For all we know, it might even make 10 HTTP calls.
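To make the "sometimes on data, sometimes in-database" point concrete, here is a toy illustration in plain Ruby. FakeRelation is a made-up stand-in, not ActiveRecord; it only mimics the general behaviour that calling map on a relation loads the rows first, while a column-level sum can be pushed down into a single aggregate query. The table contents and SQL strings are invented.

```ruby
# Toy stand-in for an ORM relation. This is NOT ActiveRecord, only an illustration.
Row = Struct.new(:name, :spending)

class FakeRelation
  attr_reader :rows_loaded, :queries

  def initialize(table)
    @table = table
    @rows_loaded = 0
    @queries = []
  end

  # Materialises every row into Ruby objects, then hands them to the block.
  def map(&block)
    @queries << "SELECT * FROM organizations"
    @rows_loaded += @table.size
    @table.map(&block)
  end

  # Pushes the aggregation into the "database": one query, no rows loaded into Ruby.
  def sum(column)
    @queries << "SELECT SUM(#{column}) FROM organizations"
    @table.sum { |row| row[column] }
  end
end

TABLE = [Row.new("a", 10), Row.new("b", 20), Row.new("c", 30)]

IN_RUBY = FakeRelation.new(TABLE)
RUBY_TOTAL = IN_RUBY.map(&:spending).sum # loads all rows, sums in Ruby

IN_DB = FakeRelation.new(TABLE)
DB_TOTAL = IN_DB.sum(:spending)          # sums "in the database"

puts "in Ruby: #{RUBY_TOTAL}, rows loaded: #{IN_RUBY.rows_loaded}"
puts "in DB:   #{DB_TOTAL}, rows loaded: #{IN_DB.rows_loaded}"
```

Both calls return the same total and look almost identical at the call site; the only visible difference is in the counters, which is exactly what a reviewer can't see when reading the one-liner.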
Rails (and Ruby) lack quite a few common tools and safety nets that other frameworks do have. And yes, that's a trade-off, because many of these safety nets (like strong and static typing) come at a cost to certain use-cases, people or situations.
Edit: I realize there are four layers of abstraction if the all-important and dictating database is taken into account, which in Rails it always is.