GitHub degraded performance – resolved (githubstatus.com)

101 points by mot2ba 4 years ago | 57 comments

bob1029 4 years ago |

This one is pretty nasty. Getting tired of the disengagement this causes for the team. It's basically a lost day of productivity even if GH goes down for only 30 minutes. Yes, we can continue coding locally, but issues & PRs are a huge part of our daily process.

When I get back from vacation we are moving our shit to the enterprise plan. $21/user-month is really not that big of a deal when you are running basically your entire business through the product.

I do agree that it's ridiculous to assume that we can manage Github's software better than their own engineers, but at the same time our infrastructure has proven itself to be extremely reliable over the last 4-5 years. Even hosting GH Enterprise on public AWS/Azure is preferable in my eyes now, because I can control the physical region and tenancy. There is an Azure datacenter within 100 miles of many of our home offices and I can ensure that our Github stack spins up there. Minimizing the amount of internet you have to transit to get to your applications can sidestep a lot of this stormy public cloud/internet weather bullshit.

andrewstuart2 4 years ago | |

I think you will quickly find, if you're just deploying GH Enterprise on premises, that it is not at all what you get from the GH Cloud offering. GHE has its own product roadmap that is quite a bit behind the cloud product, and in many cases (IMO) unacceptably so. It still doesn't support caching for runners, last I checked, though I've since moved on from the org that required me to work with GHE. I'm back to my happy place with self-hosted GitLab, and a little bit of GitHub cloud.

bob1029 4 years ago | | |

> GHE has its own product roadmap that is quite a bit behind the cloud product

This is exactly what we want though. We don't need the new fancy shit on a regular cadence. Issues, Code, PRs and 1-line checkbuild scripts are all we care about. Everything else is built into our software.

onionisafruit 4 years ago | |

> I do agree that it's ridiculous to assume that we can manage Github's software better than their own engineers

That by itself would be ridiculous, but there's more to it than that. Your GHE server won't have new code deployed to it hundreds of times per day the way github.com does. You probably won't be the target of ddos attacks either.

Very few of github.com outages are the result of maintenance errors.

cmckn 4 years ago | |

> It's basically a lost day of productivity even if GH goes down for only 30 minutes. Yes, we can continue coding locally, but issues & PRs are a huge part of our daily process.

I know outages are frustrating, but how does 30 minutes before 10am ruin an entire day? Maybe you’re just being hyperbolic, but people take coffee breaks longer than that.

bob1029 4 years ago | | |

> how does 30 minutes before 10am ruin an entire day?

Not everyone is on Pacific time. This impacted us right in the middle of a standup call and disrupted our planning for the day.

Also, the problem is more that you don't know how long the outage is going to last at first, so you start finding other ways to occupy your time. Through the lens of hindsight, yes we are certainly being hyperbolic in those cases where it was only 30 minutes.

nonbirithm 4 years ago | |

> It's basically a lost day of productivity even if GH goes down for only 30 minutes.

Are we entering an era where, if we don't have hundreds of thousands of servers running 24/7 to host our services, with all the resource consumption and environmental implications that result, we will no longer be able to remain productive as a society? Is this gradually becoming a new baseline for humanity from which we cannot reasonably downsize?

ygjb 4 years ago | | |

Yes.

It's worth noting that you can take almost any still-available software from before the late 90s to early 2000s (depending on the vendor) and, with a layer of emulation, get it running in minutes.

The vast majority of software being built for end users today will simply stop functioning within a short time frame because of aggressively built-in dependencies on cloud-based services, often designed to encourage customer lock-in and prevent piracy by forcing users to have active accounts and by shifting core logic from endpoints to cloud services.

Even moving past licensing servers and account capabilities, tools like Grammarly ship much of their analysis to the cloud, same for most translation services. Many modern text-to-speech services are cloud-based as well (just look at how useless a modern cell phone becomes when you are without a data connection, for example).

I don't know what the statistics would look like, but I shudder to think how much of the world economy would grind to a halt if Amazon or another significant cloud provider had a sustained, multi-region outage (say 24-48 hours).

It's a god-damn mess, and we did it to ourselves.

bob1029 4 years ago | | |

> Are we entering an era

We've been in this era for about 4 decades now. There are mainframes which do payment processing that, if they were to fail, would cause substantial harm to the global economy almost instantly.

sudhirj 4 years ago | | |

Utility technologies become foundational very quickly. Despite being very new, they already leave humanity fundamentally reliant on global supply chains, oil, electricity, networks, satellites and many other technologies that we cannot downsize. We could collectively plan and execute decades-long exit plans, like we do for oil, but outages will bring daily life to a halt.

lmm 4 years ago | | |

This is like being shocked that so many companies now rely on having an electricity supply and can't work during power outages.

jrochkind1 4 years ago | | |

Yes. You're just noticing?

teh_klev 4 years ago | |

> There is an Azure datacenter within 100 miles of many of our home offices and I can ensure that our Github stack spins up there. Minimizing the amount of internet you have to transit to get to your applications can sidestep a lot of this stormy public cloud/internet weather bullshit.

You've no guarantees that your local'ish data centre is going to be hop-wise, route-wise and peering-wise any better than a DC 1500 miles away, in relation to your home or office ISP.

bob1029 4 years ago | | |

> You've no guarantees that your local'ish data centre is going to be hop-wise, route-wise and peering-wise any better than a DC 1500 miles away, in relation to your home or office ISP.

You're correct. In fact, as I type this reply my Cloudflare diagnostics are indicating I am talking to a datacenter 200 miles further away than would otherwise be ideal. That said, it's still within an extremely reasonable distance. This is a "risk" I am willing to take. It's certainly a better starting point than guaranteed 70ms minimums.
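For a rough sense of scale: assuming light in optical fibre covers about 200 km per millisecond (roughly two thirds of its speed in a vacuum), distance alone puts a floor under round-trip time. This Python sketch only computes that physical lower bound; real routes are longer than the straight-line distance and add queuing and switching delay, which is the hop/peering caveat raised above. The constant and function names are illustrative assumptions.

```python
# Assumption: signal speed in optical fibre is roughly 2/3 of c,
# i.e. about 200,000 km/s, or 200 km per millisecond.
FIBER_KM_PER_MS = 200.0
KM_PER_MILE = 1.609344


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a straight fibre run of this length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_distance_km(rtt_ms: float) -> float:
    """Farthest a server could be for a measured RTT, ignoring all overhead."""
    return rtt_ms * FIBER_KM_PER_MS / 2


if __name__ == "__main__":
    for miles in (100, 1500):
        km = miles * KM_PER_MILE
        print(f"{miles} miles ({km:.0f} km): >= {min_rtt_ms(km):.1f} ms RTT")
    print(f"70 ms RTT: server within {max_distance_km(70):.0f} km")
```

Under that assumption, 100 miles gives a floor of about 1.6 ms and 1500 miles about 24 ms, while a 70 ms floor corresponds to something on the order of 7000 km of fibre (or a much less direct path).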

saghm 4 years ago | |

After the first paragraph, I was somewhat expecting this to be about wanting to move off of github due to the issues, so I was kind of surprised to see that you instead decided to start paying them! I don't think you're wrong to decide that or anything; it's just interesting to see that a negative experience can drive free users to become paid users when naively I would expect the opposite.

bob1029 4 years ago | | |

It's even more nuanced than that. We already pay them money and are looking to pay more.

lima 4 years ago | |

> I do agree that it's ridiculous to assume that we can manage Github's software better than their own engineers

Why? It's totally reasonable that your own GHE instance will have better uptime.

Running GitHub.com is much, much harder than a private instance (DB scale-out, load, ...).

Our in-house Gerrit infra and CI has had a significantly better uptime than GitHub over the past year, but we have hundreds, not 60 million users and exabytes of storage :-)

cyberpunk 4 years ago | |

I would strongly recommend (having used both extensively) going all in on GitLab instead if you have to do a migration anyway.

scaryclam 4 years ago | |

I don't buy this at all. Even if GH is down for the entire day, how the heck does NO-ONE on your team know what they're doing? Was no-one working on something already? Did nobody pay any attention during planning? Do you not have anything you can do from memory from your backlog?!

If you can't work for a day just because GitHub is down, then there are bigger problems in your process than GitHub being down. I'm sorry if that sounds harsh, but you're either being hyperbolic or you have some real issues to fix in your team or organisation.

rightbyte 4 years ago | |

> I do agree that it's ridiculous to assume that we can manage Github's software better than their own engineers

Not really. You can mess with stuff when it suits you with a risk for downtime. Hosting yourself has the same advantage as disabling auto-updates - you are in control of when to break stuff.

wdb 4 years ago | |

Personally, I had more trouble with all the Google Cloud issues that happened this week. It's difficult to debug problems when Cloud Logging stops working. Most worrisome was that Google reported issues in US regions while we only have EU regions.

blitzar 4 years ago | |

On the plus side, you have a process and in theory it works, or you would be looking elsewhere.

You might have lost a day today, but how many days have you gained thanks to these tools the last month?

dabeeeenster 4 years ago | |

How does moving to the Enterprise plan help?

onionisafruit 4 years ago | | |

Presumably GP means they will deploy their own GitHub Enterprise Server instead of using github.com.

sudhirj 4 years ago | | |

GH Enterprise is run on your own servers, so you'd theoretically run it right in the office. It may not move the needle on actual downtime, but there's some control in the downtime - if the LAN is out no one can work anyway, and upgrades will only happen on your business's lean days, be tested out with the teams that have an appetite for experimentation, etc.

intunderflow 4 years ago |

When GitHub is down we legally don't have to do any work, it's in the constitution I swear

m_a_g 4 years ago | |

Best comment in this thread by far

funOtter 4 years ago |

I was just doing some development on my GitHub Actions ... can only assume it was my fault.

roland35 4 years ago | |

Should have probably left out that sudo rm -rf from your Action

mabbo 4 years ago |

In the middle of onboarding with a new company. We're at a critical point of training that requires GitHub.

Oh boy, this is going to be a fun day.

awestroke 4 years ago | |

The incident lasted less than an hour. How does this affect your whole day?

mabbo 4 years ago | | |

It didn't. But at the time, the trainer was very, very worried.

tyingq 4 years ago | |

Maybe you could sign up for a GitHub Enterprise trial? At least the first few screens seem to be working.

mot2ba 4 years ago |

This incident has been marked as resolved. You can check how often notable GitHub incidents really capture the audience's attention on HN [0] [1]:

[0]https://news.ycombinator.com/from?site=githubstatus.com

[1]https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=fal...

jrochkind1 4 years ago | |

Looks like about once every two months?

Which is actually more than I expected, and seems like kind of too much.

contingencies 4 years ago |

They were down for at least a solid hour last week and didn't even post to their status page. I put in a query and got no response. They then unilaterally closed the ticket and asked me how my support experience was. Time to move to GitLab. https://news.ycombinator.com/item?id=28874751

jrochkind1 4 years ago |

Hmm, how do I get GitHub Actions CI to run on all the already existing PRs for which it never ran? Anyone know?

mdaniel 4 years ago | |

Would this do it? https://docs.github.com/en/rest/reference/actions#create-a-w... or, depending on the error, perhaps this? https://docs.github.com/en/rest/reference/actions#re-run-a-w...
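A minimal standard-library Python sketch of the two REST calls those (truncated) links appear to refer to. It assumes GitHub's documented endpoints `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`, which only works if the workflow declares a `workflow_dispatch` trigger and runs against a ref rather than as a `pull_request` event, and `POST /repos/{owner}/{repo}/actions/runs/{run_id}/rerun`, which needs an existing run ID. The helper names, owner/repo values and token are placeholders.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def _headers(token: str) -> dict:
    # Placeholder token: e.g. a personal access token allowed to manage Actions.
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def workflow_dispatch_request(owner: str, repo: str, workflow_id: str,
                              ref: str, token: str) -> urllib.request.Request:
    """Build (not send) a request that triggers a workflow on the given branch or tag."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    headers = {**_headers(token), "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def rerun_request(owner: str, repo: str, run_id: int,
                  token: str) -> urllib.request.Request:
    """Build (not send) a request that re-runs an existing workflow run."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/rerun"
    return urllib.request.Request(url, data=b"", headers=_headers(token), method="POST")
```

Sending either request is `urllib.request.urlopen(req)`; building it separately makes it easy to inspect the URL and payload before hitting the real API.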

jrochkind1 4 years ago | | |

Not sure. There was no error reported, just a "waiting" message, with no "run" created.

I'm not sure I have any IDs to give to those API calls. Would have preferred something from the web UI.

I just went ahead and created new commits for them all. (Create an empty commit, or --amend to re-commit the last commit with a new timestamp).

Would prefer if there were an easier way to do it, but fighting with HTTP APIs I'm not familiar with, when they aren't immediately apparent and I'm not sure they'll work at all, was not that easier way.
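The empty-commit workaround can be scripted across several PR branches. A minimal Python sketch, assuming the branches already exist locally, the remote is called `origin`, and the workflow listens for `pull_request` events (pushing a new commit to a PR's head branch fires a `synchronize` event). The function name and commit message are hypothetical; `run` is injectable so the git commands can be inspected without touching a repository.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], object]


def _git(cmd: List[str]) -> None:
    # Fail loudly if any git command exits non-zero.
    subprocess.run(cmd, check=True)


def retrigger_ci(branches: Sequence[str], remote: str = "origin",
                 amend: bool = False, run: Runner = _git) -> None:
    """Push a fresh commit to each branch so CI sees a new head SHA."""
    for branch in branches:
        run(["git", "checkout", branch])
        if amend:
            # Re-create the last commit; its new committer timestamp normally gives
            # it a new SHA, so the rewritten branch has to be force-pushed.
            run(["git", "commit", "--amend", "--no-edit"])
            run(["git", "push", "--force-with-lease", remote, branch])
        else:
            # Add an empty commit on top; an ordinary push is enough.
            run(["git", "commit", "--allow-empty", "-m", "Retrigger CI"])
            run(["git", "push", remote, branch])
```

The empty commit avoids rewriting history; the `--amend` variant keeps the history tidy at the cost of a force-push.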

maherbeg 4 years ago |

I'm surprised they don't have a couple of separate clusters that they roll things out to and monitor. Seems like you could have a very stable "high paying customers" cluster at the very end of your deployment cycle, which a change only reaches after getting through a ton of canary checks on the way.
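That staged rollout can be sketched as a loop over deployment rings that halts as soon as one ring fails its health check, so the last ring never receives a bad build. This is a generic illustration, not a description of how GitHub actually deploys; the ring names and the `deploy`/`healthy` callbacks are hypothetical stand-ins for real deployment tooling and canary checks.

```python
from typing import Callable, List, Sequence


def staged_rollout(rings: Sequence[str],
                   deploy: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> List[str]:
    """Deploy ring by ring; return the rings that received the build and passed checks."""
    passed: List[str] = []
    for ring in rings:
        deploy(ring)
        if not healthy(ring):
            # Stop here: later rings (e.g. a paying-customer cluster) never get
            # this build. A real system would also roll this ring back.
            break
        passed.append(ring)
    return passed


if __name__ == "__main__":
    deployed = []
    result = staged_rollout(
        ["canary", "general", "high-paying-customers"],
        deploy=deployed.append,
        healthy=lambda ring: ring != "general",  # simulate a failed canary check
    )
    print(deployed)  # ['canary', 'general']
    print(result)    # ['canary']
```

Putting the most valuable ring last trades slower delivery for a longer soak time in the earlier rings, which is the trade-off being proposed here.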

intunderflow 4 years ago | |

With Actions they do this, if you're on GitHub Enterprise and run an action it picks machines out of a special pool set aside just for enterprise customers.

judge2020 4 years ago |

GitHub Universe is only a week away, could be related to a bad deploy for some feature updates to-be-revealed then.

blondin 4 years ago |

I wonder if GitHub ever considered a hybrid approach? An on-prem enterprise solution that syncs with GitHub cloud.

jacobrussell 4 years ago |

Good time for a lunch break I guess

encryptluks2 4 years ago |

Just noticed GitHub is down for me. Can't access repos :(