Has GitHub been down more since its acquisition by Microsoft? (nimbleindustries.io)

243 points by tdrnd 6 years ago | 141 comments

oliwarner 6 years ago |

GitHub is a lot more enticing now it has free private repos and the launch of Actions. I suspect they're a lot busier than they were before Microsoft. More features also naturally means more status tickets.

robertlagrant 6 years ago |

Has nimbleindustries.io been down more since Github's acquisition by Microsoft?

colinbartlett 6 years ago | |

Sorry... we probably should have put a CDN in front of our humble Linode box earlier. Here's a cache link:

https://web.archive.org/web/20200629103627/https://nimbleind...

Edit: Cloudflare seems to be working now. Wow, that was really easy to set up.

mattbillenstein 6 years ago | | |

How many req/s before you fell over?

robertlagrant 6 years ago | | |

:-) Cloudflare is amazing, yes. Everyone should use it by default, I say.

risyachka 6 years ago | | |

It's hard for me to trust your products if you didn't even have Cloudflare in front.

EDIT: I don't imply the product is bad, sorry

thescribbblr 6 years ago | |

Yes

bryanrasmussen 6 years ago | | |

I think it's pretty fair to say we've all been pretty down since GitHub's acquisition by Microsoft.

dilandau 6 years ago | |

This is the comment I wish I had made.

This is just clickbait. The "GitHub down" schadenfreude has really gotten tired.

srean 6 years ago |

Taking the numbers at face value, this is a good exercise in textbook statistical hypothesis testing.

Incidents: Before 89. After 126. What is the chance of this happening if the 'rate' of occurrence has not changed?

Assuming an unknown but constant Poisson rate, we get the probability of observing what has been observed to be 0.00225.

A fortuitous thing about this test is that one does not need to know what that unknown constant rate is.
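A minimal sketch of the calculation above, assuming the "before" and "after" windows are equally long (an assumption; the comment does not state it). Conditional on the total of 215 incidents, the "after" count is then Binomial(215, 1/2) under the null hypothesis, which is why the unknown rate drops out. The function names are illustrative only.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n Bernoulli(p) trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

before, after = 89, 126          # incident counts quoted in the comment
total = before + after

# Assumption: equal-length windows, so each incident falls "after" with p = 1/2.
point_prob = binom_pmf(after, total)         # exactly this 89/126 split
tail_prob = binom_upper_tail(after, total)   # a split at least this lopsided

print(f"P(exactly {after} of {total} after) = {point_prob:.5f}")
print(f"P({after} or more of {total} after) = {tail_prob:.5f}")
```

The point probability should land close to the 0.00225 quoted above; the one-sided tail is the number a conventional test would report instead.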

solidasparagus 6 years ago | |

> Assuming a Poisson rate

That feels like a mighty big assumption. Probably big enough that trying to calculate the probability is more misleading than enlightening.

srean 6 years ago | | |

As I mentioned, my comment is meant as an exercise. If we were to take the numbers more seriously, due diligence is necessary. That said, if we assume that one incident does not affect the other, then the Poisson nature falls out as a natural consequence of that independence and the assumption of a constant rate (our Null hypothesis).

As long as the incidents are spaced out enough, that the possibility of one incident affecting the other is low, Poisson can be surprisingly realistic. Quite remarkable, given how simple it is. All in all not that bad an assumption for a back of the envelope calculation in a meeting.

In practice, however, given more time, I would be looking at the statistics of inter-incident times more carefully. If those look sufficiently different from Exponentially distributed, a non-Poisson renewal process might be more appropriate than a Poisson process.
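One quick, informal check of the inter-incident times: for an Exponential distribution the coefficient of variation (standard deviation divided by mean) is 1, so a value far from 1 hints at a non-Poisson process. The sketch below runs on simulated gaps, not GitHub data; the rate of 0.2 incidents per day and the helper name are made up for illustration.

```python
import random
import statistics

def coefficient_of_variation(gaps):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(gaps) / statistics.mean(gaps)

# Hypothetical data: gaps (in days) drawn from an Exponential distribution,
# i.e. what a constant-rate Poisson process would produce.
rng = random.Random(0)
simulated_gaps = [rng.expovariate(0.2) for _ in range(2000)]

cv = coefficient_of_variation(simulated_gaps)
print(f"CV of simulated exponential gaps: {cv:.2f}")  # expected to be near 1
```

A CV well below 1 would suggest incidents arriving more regularly than Poisson, and a CV well above 1 would suggest bursts. This is only a rough screen, not a formal goodness-of-fit test.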

visarga 6 years ago | |

Maybe Github has grown significantly since then. More complexity, more down time.

dutchmartin 6 years ago | | |

Well, Github introduced way more new features since Microsoft took over.

paulintrognon 6 years ago | | |

Also, according to the article itself, "[it] could be all a part of coordinated effort to be more transparent about their service status".

jtdev 6 years ago | | |

Maybe Microsoft is less averse to complexity - which would still make uptime degradation land squarely on Microsoft’s shoulders. Microsoft seems to be an organization that actually introduces too much complexity to their products imo.

jcrubino 6 years ago | | |

Microsoft has every right to move fast and break things on their watch.

cj 6 years ago | |

This was exactly my thought.

More or less downtime (as reported by a status page) is probably affected more by changes to policies re: how / when incidents are posted publicly between "Github" and "Microsoft Github".

Subjectively speaking (using Github daily) I haven't noticed a difference. In general, Github has never been extremely reliable even pre-Microsoft.

wilde 6 years ago | |

What was the YoY change in incidents prior to the acquisition? I’d expect incidents to grow as a function of employee count at least.

battery423 6 years ago |

It could also mean that they are more active and creating more chaos due to more changes.

shameless and useless advertising.

danpalmer 6 years ago | |

On my team our downtime is very closely correlated with how much we're working and changing the product. I'd be surprised if GitHub is much different.

waheoo 6 years ago | | |

Or in other words, Facebook is more stable on the holidays.

virtue3 6 years ago | | |

exactly. Before M$ there wasn't much product movement at Github :/

bnegreve 6 years ago | |

That's pretty much their conclusion

> [..] GitHub has been down more since the acquisition by Microsoft. But that could be all a part of coordinated effort to be more transparent about their service status, an effort that should be applauded.

robertlagrant 6 years ago | | |

That's not a conclusion, it's a theory.

blueflow 6 years ago | |

Personally, I'd prefer some availability over sparkly features.

echelon 6 years ago | | |

Not counting the redesign, there haven't been "sparkly" features. Sparkly features are the kinds of things the old management used to let engineers have free rein to implement.

Github pre-Microsoft was rudderless. They were more prone to implementing silly 3d model diff tools and things instead of supporting enterprise features or building powerful CI/CD tools and automation.

New Github is on the right track.

tnolet 6 years ago | |

changes = downtime. Has been since the advent of computing

h91wka 6 years ago | | |

Not if you have a competent devops team and a deployment pipeline.

scribu 6 years ago |

It's easy to see from the uptime history [1] that there have been many more incidents in April-June than in January-April.

Don't know if that has anything to do with the Microsoft acquisition, but it is concerning.

[1] https://www.githubstatus.com/uptime?page=1

esperent 6 years ago | |

They're still averaging 99.81% up-time over the last three months. This means they have been down for a total of 4 hours out of 90 days. I don't think it's all that concerning.
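The arithmetic behind that figure can be sketched as follows. The function name is illustrative; the 99.999% line is included only as a common stricter target for comparison.

```python
def downtime_hours(availability_pct: float, days: float) -> float:
    """Hours of downtime implied by an availability percentage over `days` days."""
    return (1 - availability_pct / 100) * days * 24

observed_hours = downtime_hours(99.81, 90)
five_nines_minutes = downtime_hours(99.999, 90) * 60

print(f"99.81% over 90 days: {observed_hours:.2f} hours of downtime")
print(f"99.999% over 90 days: {five_nines_minutes:.2f} minutes of downtime")
```

So 99.81% over 90 days works out to roughly 4.1 hours, while 99.999% over the same period would allow only about 1.3 minutes.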

jillesvangurp 6 years ago | | |

The gold standard here is 5 9s, not 2.

They have paying customers that are being inconvenienced by this. We lost time over this multiple times in the last few weeks.

So, not good. IMHO they are having some major issues with their release process that they need to address. Standards have slipped there; they used to be better at this.

trm42 6 years ago | | |

This kind of depends on whether you have urgent work to handle through Github. I've been waiting for over an hour to continue my work, but Github's PR functionality hasn't shown the latest changes because of the ongoing outage, so I cannot. Really, really annoying, and it partially wastes my planned work for the day.

(Of course I'm working on other stuff in the meantime, but it splits my focus unnecessarily.)

rattray 6 years ago | |

Wow, that's stark. I count 6 in April-June (2 major outages) and only 1 in January-March (0 major outages).

I wonder if COVID has affected this somehow. Anecdotally I've heard of at least one other ~peer company with a large rise in incidents/outages since April.

Strange since both companies previously had a strong culture of remote work (maybe 1/4 to 1/3 of eng was remote) going into the pandemic, so I'd be quite surprised if all-WFH contributed somehow...

samwhiteUK 6 years ago |

This is a thinly veiled advert, and tells you literally nothing concrete, apart from the fact that Github's status page is updated more. The article even admits this!

vadasambar 6 years ago |

Mirror on wayback machine: https://web.archive.org/web/20200629103627/https://nimbleind...

Denvercoder9 6 years ago |

Ironically this link is down for me.

RMPR 6 years ago | |

Was about to say the same, but it's working now

colinbartlett 6 years ago | |

Author here, just put a CDN in front of the blog so maybe that will help.

shawabawa3 6 years ago | | |

CDN's not working I'm afraid (SSL error, 403 forbidden if you ignore it)

robertlagrant 6 years ago | | |

Not yet :)

Fiveplus 6 years ago |

Ironically, Firefox tells me to stay away from your website. Which, I will.

update: the website is back and well now from my end.

colinbartlett 6 years ago | |

I can't reproduce, would you mind emailing me a screenshot of this? (I'm the author) colin at any of the domains in my bio.

Fiveplus 6 years ago | | |

I would have happily complied. On a welcome note, however, the backend changes made after the initial hiccups brought the website back on its feet.

sbmthakur 6 years ago | |

I am also using Firefox. It blocked Google Analytics on the site, but I did not get a warning.

z9e 6 years ago |

Has anyone noticed their search has gotten worse? I sometimes will search for some code that I know exists in a repo, but it gives no results, and when searching a little later it shows them as I expect.

We have Github Enterprise, and Slack notifications when Github has any issues. Nearly every week there’s a problem; sometimes it’s resolved in a few minutes, other times it goes on for an hour or so. I’ve pondered the question of whether there have been more outages since the MS acquisition, and in my experience that’s a hard yes.

m12k 6 years ago |

What I don't get is how they can be down for hours on end. If I make a deploy that turns out to be buggy, I'll roll my site back to a stable previous version within minutes. Sure, that's not always possible if I've made incompatible database schema changes, but in my experience those are very, very rare (i.e. I almost always add db columns, and only rarely delete or rename columns - and when I do, I do so after those columns haven't been in use for a while).
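A minimal sketch of why add-only schema changes keep rollbacks cheap, using an in-memory SQLite database. The table, column, and function names are hypothetical and are not anyone's actual schema; the point is only that code from the previous release still runs after a migration that adds a nullable column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def old_version_insert(conn: sqlite3.Connection, name: str) -> None:
    """Query issued by the previous release; it only knows about `name`."""
    conn.execute("INSERT INTO repos (name) VALUES (?)", (name,))

old_version_insert(conn, "before-migration")

# The new release's migration only ADDS a nullable column.
conn.execute("ALTER TABLE repos ADD COLUMN description TEXT")

# After rolling the application back, the old code still works
# against the migrated schema, so no database rollback is needed.
old_version_insert(conn, "after-rollback")

rows = conn.execute("SELECT name, description FROM repos ORDER BY id").fetchall()
print(rows)
```

Dropping or renaming a column is the case where the old code would break, which is why deferring those changes until the column is unused keeps a fast rollback possible.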

samcrawford 6 years ago |

Independent of whether it's been down more frequently since the acquisition, I find Github's status page reporting to be lacking. It's also quite generous with definitions of outages and downtime.

For example, on June 22nd they had an issue whereby half of their nameservers were responding with an empty answer for queries for github.com. A very nice explanation is here: https://news.ycombinator.com/item?id=23605409. So for roughly half of users (including myself) this would have manifested itself as a complete outage. It also lasted for a good couple of hours. Yet on their status page it's listed as a 46 minute degradation only.

So relying on their status page reporting to draw conclusions about availability (as Statusgator seems to) will mean that an overly optimistic picture of availability is presented.

solidninja 6 years ago |

I really wish there was an option to decouple the core git functionality from the 'fancy' stuff (PRs, actions, issues) etc. When your CI pipeline runs directly from your git repository, not being able to commit/merge means you cannot release to production.

For binary artifacts, you can get around dependencies on a single provider by mirroring, but because git is mutable, you can't just mirror the repo and allow changes there, because you need the same permissions, ssh keys etc. for the repositories and because changes will need to be synced back to the source repository. You might as well not use Github in the first place and self-host (with all the problems that entails).

As far as I know, none of the major git providers offer this - I've experienced outages with GitLab, Bitbucket and GitHub that all affected the production environment (luckily, it's never been critical so far).

hashhar 6 years ago | |

That is exactly how GitHub is architected. There have been a lot of times I've been able to push/pull but GitHub Pages was down or their web backend was down.

Bitbucket is the worst in this regard.

majkinetor 6 years ago |

Definitely, in my personal experience.

Part of it might be due to covid, but it started happening before that.

On the positive side, github has had a ton of changes since then, all of which seem to be good ones, so it's understandable that it has more problems now as well.

jna_sh 6 years ago |

A number of the staff who left GitHub in protest of the ICE contract were senior SRE. I wonder if the data says anything about the dates of their departures...

tehjoker 6 years ago |

Looking at the chart, there were more warns but fewer downtimes so I'd say no. There was one giant downtime that happened, but I don't know why it happened so it's hard to say if there was a cause relevant to the acquisition. The other downtimes that happened seemed to be slightly longer in duration than before.

bovermyer 6 years ago |

I'm more inclined to believe this is due to the increased reporting and attention to the status rather than an actual increase in downtime.

However, I have no hard data to suggest that, only my experience as someone who's had to manage and maintain reports like this.

darkwater 6 years ago |

Anyone from GH willing to share - anonymously maybe - some insights? I really find the lack of post-mortems from GitHub outages a bit... weird given GH's audience. I think all of us devs could learn from a properly written public GH post-mortem.

xtracto 6 years ago | |

It's Microsoft, when has Microsoft shared a downtime postmortem?

oltdaniel 6 years ago |

Personally I think it is connected to adding more features to GitHub. Microsoft is rolling out more and more changes, both small and big, and as everyone knows, each update can bring a few minutes of downtime.

samth 6 years ago |

This analysis has a significant comparability problem because of added features. The core git-repository service is almost never down, but GitHub Actions has had a lot of reliability problems (you can see this clearly on the status page). But GitHub Actions is new, so it skews the recent availability problems up. Unfortunately, as the post notes, the detailed status information is _also_ new, so it's not really possible to analyze the data accurately.

lwheelock 6 years ago |

Irony aside, there is some degree of credibility lost when offering a critical opinion on availability and your cloud hosted site throws a 502

Yuioup 6 years ago |

I have zero evidence but I would not be surprised if they are busy absorbing Azure Devops into GitHub and that's causing some hiccups.

dang 6 years ago |

This was prompted by the latest "Github is down" (now transformed to "Github was down" by the arrow of time):

https://news.ycombinator.com/item?id=23675864

maypeacepreva1l 6 years ago |

Github is evolving more rapidly than I thought it would. I like their new changes. Deep down, though, its being with Microsoft just reminds me of the Oracle/Google debacle. It definitely discourages big projects from being hosted there.

nitinreddy88 6 years ago |

There's already one story on the front page: https://news.ycombinator.com/item?id=23675864

Can we merge them? Mods

st_goliath 6 years ago | |

> There's already one story on the front page

And the comment that is currently at/near the top and mostly discussed happens to link to exactly this site.

heinrich5991 6 years ago | |

Mail hn@ycombinator.com to talk to the mods (see footer).

julius_set 6 years ago |

I mean makes sense. They are probably migrating their infrastructure over to Microsoft’s and such.

Disclaimer: I don’t work at MS so no clue, but have been part of acquisitions

jtolds 6 years ago | |

I'm sure MS is super uncomfortable with GitHub's AWS bill. I've just assumed they're fast-tracking a migration to Azure

IceWreck 6 years ago |

Yeah really. I feel the same thing. Can't comment on anything right now and I've been trying for the past hour. It dies every three weeks.

pbnjay 6 years ago |

How have traffic/attacks changed since the acquisition? The Microsoft name surely gives them an even larger target on their back.

bizzleDawg 6 years ago |

The charts and diagrams in that post are really nice, how did you make them?

quickthrower2 6 years ago |

If it has, what is p?

mikece 6 years ago |

Any chance that prior to the acquisition there was more focus on stability (because it was assumed there would be a buyer) than on upgrading to the latest version of Rails and adding features?

oknoorap 6 years ago |

Maybe this downtime is caused by the cloud migration from AWS to Azure?

DrBazza 6 years ago |

tl;dr yes - and an exception to Betteridge's law!!! https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...

w0m 6 years ago | |

tl;dr maybe - is that an exception?

alt_f4 6 years ago |

Anecdotally, yes. I almost never saw internal error pages before, now I've been seeing them every week for a few weeks. Some people need to be fired.

zxcvbn4038 6 years ago |

What’s really wonderful is when Github goes down I get a flurry of people broadcasting on Slack channels that they can’t work and all their builds are failing, and I’m wondering what they want me to do about it. Should I get out and push? Call Bill Gates? Tweet Trump? When Github doesn’t work that’s your excuse to go get coffee, grab a pint, browse Hacker News, etc. - people should be happy when GitHub goes down. (Or self-host gitea and work from that if you're so determined to keep on.)

brylie 6 years ago |

They are porting things over to ASP.NET on Azure.

quickthrower2 6 years ago | |

If the implication is that asp.net or Azure are unreliable then think again

cdbattags 6 years ago | |

Such an underrated comment. I applaud.

rvz 6 years ago |

Yes.

Last time it was down was just 6 days ago: [0]

I think it's time to look for alternatives or, frankly, in the long term, follow what some open-source orgs are doing and self-host instead.

[0] https://news.ycombinator.com/item?id=23604944

tmpfs 6 years ago |

Both the link and github comments are down for me too; I am unsure whether it has been down more since the acquisition but this is unacceptable to me. I have been trying to post a comment to an issue I made for the last few hours and every time it says "You can't comment at this time", which I find very misleading. It should say "Sorry, our system has a problem, please try again later". The implication of the current message is that my permissions are wrong or something else has happened.

This has bumped self-hosting all my repositories much higher up my list.