HN was down

468 points by jontro 5 years ago | 246 comments

mikiem 5 years ago |

Founder and CEO of M5 Hosting here. We did have a network outage today that affected Hacker News. As with any outage, we will do an RCA and we will learn and improve as a result.

I'm a big fan of HN and YC in general, we host other YC alumni, and I have taken a few things through YC Startup School. During this incident, I spoke to YC personally when they called this morning.

nas 5 years ago | |

We have been using M5 Hosting for one of our servers since 2011. They have been extremely reliable up until today. Based on what was posted about the Hacker News server setup, we have something similar. We have a "warm spare" server in a different data center. We use Debian, not FreeBSD.

We are in the process of slowly moving to a distributed system (distributed DB) that is going to make failover easier. However, that kind of setup is orders of magnitude more complex than the current (manual failover) setup. I really wonder if the planned design is going to be more reliable in practice. Complexity is almost always a bad idea, in my experience. Distributed systems are just fundamentally very complicated.

mikiem 5 years ago | | |

Oh hi! Thank you for the kind words. I can't tell who you are by your name here, but if you've been with us since 2011, we have certainly spoken. Are you using our second San Diego data center for your failover location? If you and I aren't already talking directly, ask to speak with Mike in your ticket.

PaybackTony 5 years ago | |

I had used M5 some years ago to host an online rent payment / property management app. Have nothing but positive things to say about that experience. We once had an outage that was our own fault on our single server and they had someone go in, in the middle of the night, to reboot it for us and we weren't even on an SLA.

mikiem 5 years ago | | |

Thank you for sharing your positive experience! We can power cycle power outlets remotely and can connect a console (ip kvm)... and we are staffed 24x7.... in case you need another server. Thanks again!

drusepth 5 years ago |

HN is one of the few sites I always keep zoomed-in (around 200%), which led to me finding an interesting bug in Chrome while HN was down: Chrome's internal "This site can't be reached" page uses the zoom level of the site you would be visiting (if it were up), rather than Chrome's default zoom.

Screenshot: https://i.imgur.com/VwFtgQh.png

interestica 5 years ago | |

Chrome used to store 'zoom level' for URLs even if you were in incognito mode: and in plain text. Not sure if it still does.... (if you changed the zoom level for a site while in incognito from the default, it would save the value and the associated URL).

marshmallow_12 5 years ago | | |

not anymore. it does it the other way 'round though, which can be frustrating.

chirag64 5 years ago | |

I noticed the same behavior in Firefox as well. I wouldn't consider this a bug though.

jdoliner 5 years ago | |

It would be cool if zooming in / out on the T-Rex game caused it to switch your character to larger / smaller dinosaurs.

WalterSear 5 years ago | | |

The Trex game really needs a meteor animation when the connection is re-established.

losvedir 5 years ago | | |

what's the t-rex game?

lukec11 5 years ago | |

Firefox does the same, as I discovered - I don't know whether it's a bug or intended functionality.

(As an aside, I keep HN at 150% and old reddit at 120% - those are the only 2 sites I have permanently zoomed)

_Microft 5 years ago | | |

Either a bug or an over-eager member of the Mozilla UX team had actually filed a bug with a feature-parity Chrome tag on it in BMO.

redisman 5 years ago | | |

It's part of the charm. Unusable on retina without zooming (at least with my eyes).

exikyut 5 years ago | |

I've observed a related issue with much amusement for a few years now: when loading a new resource (specifically: spinner going anticlockwise, waiting for TTFB), Chrome will invisibly switch the renderer over to the font size settings of the to-be-loaded resource, then carefully inhibit repainting the view.

But, if said destination resource is very slow to hit TTFB, you switch to a different tab, then back to the loading tab, you'll see the current page at the destination page's zoom settings.

My guess is that the interstitial system that injects error pages, Safe Browsing warnings, etc, doesn't hit the code path that says "we loaded a new (regular) page, go find its zoom settings".

Demo/PoC:

1. Run $anything that will serve a webpage on an arbitrary port - even an error page or directory listing. eg, python3 -m http.server, php -S 0:8000, etc.

2. Open the resource you just set up in a new tab, zoom in or out as preferred (eg, to a crazy level), copy the URL (for convenience), then close the tab.

3. Stop the server in (1), then run `nc -lp 8000` (or netcat, ncat, or $anything that will listen but never respond).

4. Open a new tab, navigate to a valid website (eg here :), example.com, etc), then once it's loaded, paste the URL you copied. With the page spinning and waiting for netcat (et al), navigate away from the tab, then back to it again.

Think I noticed this for the first time a couple years ago. Seems harmless enough.
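
For anyone without netcat handy, step 3 of the PoC can be approximated in Python. This is only a sketch of a listener that accepts connections and then never sends a byte, which is the behavior the steps rely on; `start_silent_listener` and `held` are made-up names for illustration, not part of any tool mentioned above.

```python
import socket
import threading


def start_silent_listener(host="127.0.0.1", port=0):
    """Listen on host:port, accept every connection, and never reply.

    port=0 lets the OS pick a free port; pass 8000 to match the steps above.
    Returns the listening socket and the list of accepted connections.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    held = []  # keep accepted sockets referenced so they stay open

    def accept_loop():
        while True:
            try:
                conn, _addr = server.accept()
            except OSError:
                break  # listening socket was closed
            held.append(conn)  # hold the connection open, send nothing

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, held
```

To use it for the demo, call `start_silent_listener(port=8000)` from an interactive session and leave the session open while the tab spins; the browser should connect and then wait indefinitely for a first byte.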

gkoberger 5 years ago | |

Is that really a bug?

taeric 5 years ago | | |

Feels like it to me. I'd expect the zoom to be associated with the site.

Granted, I am probably importing old thoughts of it being a sort of user provided style sheet.

jxramos 5 years ago | | |

I think the zoom level for Chrome is global per window at the least, it's definitely not per tab.

jedberg 5 years ago | |

FWIW Safari doesn't have this bug (I too keep HN zoomed at 200% for some reason).

eins1234 5 years ago | |

Glad to hear I'm not alone in this. Currently at 133%, so not quite as extreme.

Judging from the responses, this is actually a lot more popular than I assumed.

Which begs the question: Does anyone feel the default font is just perfect and wouldn't want it to be bigger even by a tiny bit?

Xplune13 5 years ago | | |

I think, the font size around 105-110% would be perfect but the default one is fine as well. It definitely is the smallest default font I've seen on a popular website but it's workable for me.

account42 5 years ago | | |

> Which begs the question: Does anyone feel the default font is just perfect and wouldn't want it to be bigger even by a tiny bit?

I think it's perfect. What is your screen DPI (or rather angular pixel size from your normal viewing position) and is your browser set up to do any scaling based on that? Maybe it should be.

I really dislike the trend of giant fonts and whitespace.

sontek 5 years ago | |

Same, I don't understand why the font is so small by default. Just use the default browser font size I've defined as a user!

TeMPOraL 5 years ago | |

Same with Firefox. I have HN at 190%, and got startled by the error message being so. big. and. weird.

omilu 5 years ago | |

I had to check my own zoom, 200% as well.

abrowne 5 years ago | |

The "View page source" pages too.

RaketenStadt 5 years ago | |

The font-size on HN is barely readable, I'm working on an accessible skin for the HN frontend that addresses this.

I'm targeting WCAG 2.0. Keep an eye out for the "Show HN" coming soon!

p1necone 5 years ago | | |

Are you using a high dpi monitor but not using > 100% display scaling in your OS or something? It's roughly the same size as most other sites for me.

(And pretty much all browsers have a zoom function for exactly this, it feels like a totally separate frontend would be more hassle to use than just ctrl + scroll wheel once)

davchana 5 years ago | | |

While you are there, can I make a feature request: move the upvote icon/button to the end of the comment? Right now that triangle is at the beginning of the comment. Sometimes a comment is long & interesting, and when I want to upvote it because it's relevant, interesting & correct, I have to scroll back up.

dkersten 5 years ago | |

I consider that a feature, not a bug. I typically do all browsing zoomed in somewhat and I expect the "page can't load" to also be zoomed. Or am I misunderstanding what you're saying?

EDIT: People who disagree, care to explain? I zoomed in, so why would I expect it to zoom out just because it's a different page? What am I missing?

symisc_devel 5 years ago |

Hacker News is hosted at M5 and they are having a network outage:

http://status.m5hosting.com/pages/incident/5407b8e2b00244251...

edit: Unrelated to the Azure outage.

1vuio0pswjnm7 5 years ago | |

HN is also available through Cloudflare but that seems to depend on M5.

Don't take my word for it. Test it for yourself:

  printf 'GET / HTTP/1.1\r\nHost: news.ycombinator.com\r\nConnection: close\r\n\r\n' \
   |openssl s_client -connect cloudflare.com:443 -ign_eof -servername news.ycombinator.com
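
A rough Python equivalent of the command above, for anyone without `openssl` installed. It is a sketch under assumptions: the function names `build_request` and `fetch_via_edge` are invented here, and certificate verification is switched off on purpose, because `s_client` does not abort on verification failures by default and the goal is only to see what the edge returns.

```python
import socket
import ssl


def build_request(host: str) -> bytes:
    """Build the same minimal HTTP/1.1 request as the printf above."""
    return (
        f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")


def fetch_via_edge(edge: str, host: str, timeout: float = 10.0) -> bytes:
    """Connect to `edge` on port 443, send SNI and Host for `host`, return the raw reply."""
    context = ssl.create_default_context()
    # Diagnostic use only: skip certificate checks, as s_client effectively does.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((edge, 443), timeout=timeout) as raw:
        # server_hostname sets the SNI value, like -servername in s_client.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch_via_edge("cloudflare.com", "news.ycombinator.com")` mirrors the one-liner; what comes back depends on how the edge is configured at the time.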

dang 5 years ago | | |

We stopped using Cloudflare a few years ago.

https://news.ycombinator.com/item?id=18188832

https://news.ycombinator.com/item?id=21799045

slig 5 years ago | | |

Cloudflare only proxies dynamic websites.

fotta 5 years ago | |

I'm surprised that a site as big as HN is only hosted in one place.

Aperocky 5 years ago | | |

HN is probably very small. Curious as to the minimum size of the backend that will hold up the website.

There may need to be read replicas, but maybe not even that is needed.

jsty 5 years ago | | |

Until 2018 at least it was ... wait for it ... a single server!

https://news.ycombinator.com/item?id=18496344

(Anyone know if that's still the case?)

mwcampbell 5 years ago | | |

Running on a single server is cheaper, and nobody loses money if HN is down (as far as I know), so it makes sense.

alvatech 5 years ago |

I think I have tried to visit HN for more than 10 times in last 2 hours and failed. This made me realize how much I'm addicted to HN

jorl17 5 years ago | |

I keep 3 pinned tabs in my browser:

- Reddit (my main source of addiction)

- HackerNews (the second source of addiction)

- Cookie Clicker (a rather recent addition that I'm slightly embarrassed about)

At a point in time I also had facebook, but I've since stopped going there (maybe once a week).

willis936 5 years ago | | |

Just cheat. It'll break the spell quickly.

Also, check out universal paperclips if you haven't already. it has a definite end. You likely won't play more than maybe 10-20 hours.

ethbr0 5 years ago | | |

Just wait until you find out Reddit and Cookie Clicker have the same endgame...

wave100 5 years ago | | |

I'm not sure if this is still a thing, but at one point you could open up a JS console on cookie clicker and run game.ruinTheFun() to unlock everything. :)

ghgdynb1 5 years ago | | |

I used HN to quit Reddit and I must say it’s been a change for the better.

jart 5 years ago | | |

I kind of feel like we need a "Year Zero" clicker game where once you get up to 1.7m "clicks" you'll see Pol Pot start dancing in the corner. Then as you accumulate more clicks, you'll see Hitler, Stalin, and Mao make appearances as well. Then, finally, once you've overflowed the 32-bit integer, the year resets to 1970 and Dennis Ritchie and Brian Kernighan start dancing in the corner as well.

IndySun 5 years ago | |

>more than 10 times in last 2 hours

You could utilise the noprocrast option in your HN settings.

alvatech 5 years ago | | |

Didn't know that this feature existed. I enabled it.

app4soft 5 years ago | |

> I think I have tried to visit HN for more than 10 times in last 2 hours and failed.

Me too!

> This made me realize how much I'm addicted to HN

I thought that my IP was shadow-banned by HN...

marshmallow_12 5 years ago | |

I was scared Dang had blocked me

hnrodey 5 years ago |

Had me quite confused because I'm also having home internet issues. I was trying to get my laptop to switch to my mobile hotspot and HN is one of the sites I used to test connectivity because a) it's almost always available and b) loads very quick.

A bit of a mindfuck trying to assess my actual internet connectivity via a site that was also down : )

aasasd 5 years ago | |

The common method of testing connectivity is opening Bing. Because it's guaranteed to not be cached in the browser.

dylan604 5 years ago | | |

That reminds me of the old IE joke. IE, the most commonly used browser to download another browser.

beaconstudios 5 years ago | | |

Ha, I do this with yahoo.com. I never have a reason to visit it otherwise.

nhylated 5 years ago | | |

Found some use for Bing!

divbzero 5 years ago | |

Ditto. HN is so reliable and light on JavaScript that I typically use it to test my connection. I thought my connection was down earlier but guess this was the rare case where it was HN.

(Other comments suggest it was a network outage at M5 where HN is hosted.)

blakehaswell 5 years ago | | |

Me too. I was trying to browse HN on my phone earlier and my first instinct was that my WiFi was having a moment. It's a testament to how reliable HN is.

k__ 5 years ago | |

I was trying to read some news while training in the basement, where I don't have very good Wi-Fi. Usually HN is one of the pages that work better down there, haha.

koolba 5 years ago |

The title should be updated to "Productivity was up".

j_walter 5 years ago | |

Not sure that is true...trying to find other info as to why HN was down led to more productivity lost here

bombcar 5 years ago | | |

Can't sign into Azure Portal, let's check HN, oh that's down too, hmm is my internet up ...

Huge rabbit hole

MattGaiser 5 years ago | |

With Azure also going down, lots of people were probably scrambling to figure out what blew up.

yawnxyz 5 years ago |

my fingers automatically just start typing in "news.y" when I'm idle, I definitely didn't know what to do when greeted with a 404!

Is there any way to put the HN homepage on an edge cache so at least the homepage shows up? Or am I admitting that I'm addicted to checking HN too many times a day?

breckinloggins 5 years ago | |

It's gotten so bad for me that I'm down to just "n". I think I have a problem.

jrockway 5 years ago | | |

I used to use a web browser with Emacs keybindings, so visiting a URL was the same keystroke as opening a file. I'd type "C-x C-f news.ycombinator.com" quite regularly, and my fingers still go to that "n" when I visit a file in Emacs.

axaxs 5 years ago | | |

LOL, same. In fact, every site I visit often is one char + enter in the browser. With the exception of W, being east of the Mississippi every station starts with W.

That got me to thinking about 'first letter advantages.' If a site has a first letter not currently in use, I'm much more likely to visit it more often(mostly out of boredom, sure).

V and X are still available if anyone is wondering. Zillow got Z!

slater 5 years ago | | |

i was gonna say, check out that n00b that has to type all the way to "news.y" for the browser autocomplete! :D

madjam002 5 years ago | | |

Oh man this hits home so much

h2odragon 5 years ago | |

Admitting you have a problem is the first step to recovery, right? I'm sure I've heard that. Dunno how it's supposed to help.

dylan604 5 years ago | |

Open a 2nd tab. Turn your 2 404s into one 808. Then start making music

theshrike79 5 years ago | |

Just don't type "news." and hit enter, it'll redirect to some domain squatter crap and it'll be stuck in your autocomplete for a while =)

rossdavidh 5 years ago | |

Yes, that's what you're admitting. :) Not that you're alone in that...

coding123 5 years ago |

I never expect HN to be down... I asked my wife - hey is the internet down? She said - no, it's working for me. I clicked on another site and my mouth dropped.

ibraheemdev 5 years ago |

It says a lot that @hnstatus has not tweeted since 2018.

dangwu 5 years ago | |

@HNStatus tweeted about the outage 3 hours ago.

TheRealNGenius 5 years ago | | |

That's the point...

tiffanyh 5 years ago |

@dang

Would love an updated post on the current hardware / software stack that’s running HN.

It’s been years since I’ve seen a post/comment on this topic.

Are you still running FreeBSD, on a few high frequency cores (iirc)?

vulcan01 5 years ago | |

He commented on this a couple hours ago:

https://news.ycombinator.com/item?id=26469566

eternalban 5 years ago | | |

TIL HN uses "mirrored magnetic for logs (UFS)". Is there a privacy policy posted anywhere? What's in these logs? Magnetic is for long term storage. How far back does it go?

2bitencryption 5 years ago |

Azure AAD also had an outage at this time - perhaps linked in some domino effect, or perhaps a coincidence?

https://status.azure.com/en-us/status

bombcar 5 years ago | |

Looks like it was a coincidence - unless Azure auth going down shut off a rack in San Diego.

SigmundA 5 years ago | |

Wondering this too. Teams started going down, so I came to HN to get commentary and it was down too.

tartoran 5 years ago |

Seeing HN unresolved was a bit weird, as it is the best-performing site I ever visit on my low-bandwidth phone, so several times I thought the problem was on my end. But in the end, the downtime helped me realize how frequently I dial into HN. I have a bit of a problem and I think I need to turn on that no-procrastination flag.

tpowell 5 years ago |

I've been on this site 12+ years, and I don't ever remember it being down. I assumed we were under nuclear attack.

technick 5 years ago |

HN is my homepage and when it wouldn't load, I told my coworkers the internet was down and took a 3 hour lunch.

ibraheemdev 5 years ago |

> Back now. Got to write some code for a change... - 5:39

https://twitter.com/HNStatus/status/1371576822748487683

fabbari 5 years ago |

It seems an odd coincidence, this and the Azure AD outage -- I was trying to get to HN to see what people were saying about it!

PenguinCoder 5 years ago | |

Definitely. My thought was "HN is hosted on Azure"? So I went looking into their hosting provider, and lo, they were down too. M5 might be Azure hosted... couldn't confirm that.

fotta 5 years ago |

Something that I learned from this is that HN has a status Twitter. Rarely used though, which is a testament to the team.

https://twitter.com/HNStatus/status/1371525940656803848?s=20

EasyTiger_ 5 years ago | |

Only they never posted anything during the outage there

edub 5 years ago | | |

they did, if you click on the link in the post you replied to, you'll see it is a link to their post from today about the outage.

fotta 5 years ago | | |

Unsure what you mean there, as the linked tweet was from 4h ago.

JasonFruit 5 years ago |

I thought pg posted "Memphis".

spondyl 5 years ago |

I didn't notice unfortunately due to the Azure outage blowing everything up :(

bombcar 5 years ago |

So strange that this coincided with Azure authentication eating it.

mikiem 5 years ago | |

Unrelated issues, but I did hear from our other clients that O365 was having issues at the same time as our network outage affected HN and many others.

enobrev 5 years ago |

The one time I'm actually reading HN for actually relevant information for actual work, it's down for half a day. Made for a great excuse to take a nap.

PhilosAccnting 5 years ago |

I have a massive learning project[1], and I think 2/3 of my "to get through as soon as sensible" content is news.ycombinator links.

Needless to say, this site is my own personal StackOverflow, and I think there's something about ingratitude bouncing around in my mind somewhere.

[1]https://github.com/PhilosAccounting/ts-learning

bfostbfostbfost 5 years ago | |

Wow, looks like a wealth of knowledge. Forked it for myself, only for reference, hope that is ok. Just seems like a ton of great info that I’d love to comb through myself. Cheers.

PhilosAccnting 5 years ago | | |

Totally okay, though I added the PDFs/videos to gitignore. I'm mildly paranoid about IPs[1]!

[1]https://gainedin.site/ip/

rattray 5 years ago |

Is HN fully back? Looks like this was a little less than 3 hours total, is that right?

dang 5 years ago | |

Between 3 and 3.5 hours to judge by when PagerDuty stopped bugging me. I was working on code and someone had to tell me it was back up.

rattray 5 years ago | | |

Thanks! (And thanks for all your hard work!)

tomxor 5 years ago |

Funny, my first thought was "oh no they've blacklisted VPNs", can't remember when HN was ever down!

1vuio0pswjnm7 5 years ago |

It did not seem to affect the Firebase feed.

protomikron 5 years ago |

Curious, what is the uptime of HN - is there some data about that?

My guess is around 99.9% ... but maybe that's too optimistic?

simonebrunozzi 5 years ago | |

Why too optimistic?

Probably closer to 4 9s.

With this outage of ~2 hours, we are at ~99.97% for this year. (I am not aware of any other downtime during 2021)

Rule of thumb (I strongly prefer minutes/year instead of 9s, to get an immediate sense of how good the availability is):

99.9% : down for 525 minutes / year, or roughly 9 hours

99.99% : down for 52 minutes / year, or roughly ~1 hour

99.999% : down for 5 minutes / year
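
The rule of thumb above is simple arithmetic, and a short sketch makes it easy to check other figures. It assumes a 365-day year (525,600 minutes); the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year allowed at a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def availability_pct(downtime_minutes: float) -> float:
    """Availability percentage over a full year for a given total downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}%: {downtime_minutes_per_year(pct):.1f} minutes / year")

# A single 2-hour outage, measured against a full year.
print(f"2 h outage: {availability_pct(120):.3f}%")
```

The three tiers come out to about 525.6, 52.6 and 5.3 minutes, and a single 2-hour outage lands near 99.977% when spread over a full year, in line with the ~99.97% quoted above. Measured only over the part of the year that has elapsed, the figure would be lower.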

protomikron 5 years ago | | |

Yes, but it would be nice to read some "official" numbers backed by HN's monitoring (although I'm on HN quite/too often, I would not notice every downtime).

southerntofu 5 years ago |

Who needs so many 9's when there's actually interesting content we keep coming back for?

vincentmarle 5 years ago |

We also had issues with our YC application earlier today, was that related to this issue?

Koshkin 5 years ago |

It’s OK. The skies haven’t fallen. (And I was able to accomplish something today.)

aritmo 5 years ago |

There was a noticeable increase in productivity during the last hour or so.

kenm47 5 years ago |

It's 3pm.... do you know where your servers are?

peanut_worm 5 years ago |

And now it looks like there is an outage at reddit

tempestn 5 years ago |

Is this related to the big Microsoft outage?

deepsun 5 years ago |

If only they used Kubernetes! /s

0xbadcafebee 5 years ago |

Since I couldn't get to HN, I wrote up how to make the site resilient to outages: https://gist.github.com/peterwwillis/ce2bfaba7fc72e4af44c281...

tl;dr 1 server x 2 providers, different regions, replicate content

comprev 5 years ago | |

Can't decide if the Gist is tongue-in-cheek or actually serious...

nickthemagicman 5 years ago |

This site is so reliable, I thought my I.P. had gotten banned.