HN was down(twitter.com) |
My guess is around 99.9% ... but maybe that's too optimistic?
Probably closer to 4 9s.
With this outage of ~2 hours, we are at ~99.97% for this year. (I am not aware of any other downtime during 2021)
Rule of thumb (I strongly prefer minutes/year instead of 9s, to get an immediate sense of how good the availability is):
99.9% : down for 525 minutes / year, or roughly ~9 hours
99.99% : down for 52 minutes / year, or roughly ~1 hour
99.999% : down for 5 minutes / year
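For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion. The function names are made up for illustration, and it assumes a 365-day year (no leap days).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_minutes: float) -> float:
    """Availability percentage implied by total downtime in one year."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% : down for {minutes:.1f} minutes / year")
    # A single ~2 hour outage, with no other downtime that year
    print(f"120 minutes down : {availability_percent(120):.3f}%")
```

For 99.9% this gives 525.6 minutes, which is about 8.8 hours, and a single 2 hour outage works out to roughly 99.977%.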
tl;dr 1 server x 2 providers, different regions, replicate content
I'm a big fan of HN and YC in general; we host a number of other YC alumni, and I have taken a few things through YC Startup School. During this incident, I spoke to YC personally when they called this morning.
We are in the process of slowly moving to a distributed system (distributed DB) that is going to make failover easier. However, that kind of setup is orders of magnitude more complex than the current (manual failover) setup. I really wonder if the planned design is going to be more reliable in practice. Complexity is almost always a bad idea, in my experience. Distributed systems are just fundamentally very complicated.
Screenshot: https://i.imgur.com/VwFtgQh.png
(As an aside, I keep HN at 150% and old reddit at 120% - those are the only 2 sites I have permanently zoomed)
But, if said destination resource is very slow to hit TTFB, you switch to a different tab, then back to the loading tab, you'll see the current page at the destination page's zoom settings.
My guess is that the interstitial system that injects error pages, Safe Browsing warnings, etc, doesn't hit the code path that says "we loaded a new (regular) page, go find its zoom settings".
Demo/PoC:
1. Run $anything that will serve a webpage on an arbitrary port - even an error page or directory listing. eg, python3 -m http.server, php -S 0:8000, etc.
2. Open the resource you just set up in a new tab, zoom in or out as preferred (eg, to a crazy level), copy the URL (for convenience), then close the tab.
3. Stop the server in (1), then run `nc -lp 8000` (or netcat, ncat, or $anything that will listen but never respond).
4. Open a new tab, navigate to a valid website (eg here :), example.com, etc), then once it's loaded, paste the URL you copied. With the page spinning and waiting for netcat (et al), navigate away from the tab, then back to it again.
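If netcat isn't handy for step 3, a listener that accepts connections and never answers can be sketched in a few lines of Python. This is only a stand-in for `nc -lp 8000`: the function name is made up, and binding to 127.0.0.1 by default is an assumption (netcat would normally listen on all interfaces).

```python
import socket
import threading


def start_silent_listener(port: int = 8000, host: str = "127.0.0.1") -> socket.socket:
    """Listen on host:port, accept connections, and never send a byte back."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()

    held = []  # keep accepted sockets referenced so they are not closed

    def accept_forever() -> None:
        while True:
            try:
                conn, _addr = server.accept()
            except OSError:
                return  # the listening socket was closed
            held.append(conn)  # hold the connection open, reply with nothing

    threading.Thread(target=accept_forever, daemon=True).start()
    return server


# Usage (blocks until interrupted with Ctrl-C):
#     start_silent_listener(8000)
#     threading.Event().wait()
```

The browser connects, sends its request, and then sits waiting for a first byte that never arrives, which is the slow-TTFB state the steps above rely on.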
Think I noticed this for the first time a couple years ago. Seems harmless enough.
Granted, I am probably importing old thoughts of it being a sort of user provided style sheet.
Judging from the responses, this is actually a lot more popular than I assumed.
Which begs the question: Does anyone feel the default font is just perfect and wouldn't want it to be bigger even by a tiny bit?
I think it's perfect. What is your screen DPI (or rather angular pixel size from your normal viewing position) and is your browser set up to do any scaling based on that? Maybe it should be.
I really dislike the trend of giant fonts and whitespace.
I'm targeting WCAG 2.0. Keep an eye out for the "Show HN" coming soon!
(And pretty much all browsers have a zoom function for exactly this, it feels like a totally separate frontend would be more hassle to use than just ctrl + scroll wheel once)
EDIT: People who disagree, care to explain? I zoomed in, so why would I expect it to zoom out just because it's a different page? What am I missing?
http://status.m5hosting.com/pages/incident/5407b8e2b00244251...
edit: Unrelated to the Azure outage.
Don't take my word for it. Test it for yourself:
printf 'GET / HTTP/1.1\r\nHost: news.ycombinator.com\r\nConnection: close\r\n\r\n' \
|openssl s_client -connect cloudflare.com:443 -ign_eof -servername news.ycombinator.com
There may need to be read replicas, but maybe not even that is needed.
https://news.ycombinator.com/item?id=18496344
(Anyone know if that's still the case?)
- Reddit (my main source of addiction)
- HackerNews (the second source of addiction)
- Cookie Clicker (a rather recent addition that I'm slightly embarrassed about)
At a point in time I also had facebook, but I've since stopped going there (maybe once a week).
Also, check out Universal Paperclips if you haven't already. It has a definite end. You likely won't play more than maybe 10-20 hours.
You could utilise the noprocrast option in your HN settings.
Me too!
> This made me realize how much I'm addicted to HN
I thought that my IP was shadow-banned by HN...
A bit of a mindfuck trying to assess my actual internet connectivity via a site that was also down :)
(Other comments suggest it was a network outage at M5 where HN is hosted.)
Huge rabbit hole
Is there any way to put the HN homepage on an edge cache so at least the homepage shows up? Or am I admitting that I'm addicted to checking HN too many times a day?
That got me thinking about 'first letter advantages.' If a site has a first letter not currently in use, I'm much more likely to visit it more often (mostly out of boredom, sure).
V and X are still available if anyone is wondering. Zillow got Z!
/s
Would love an updated post on the current hardware / software stack that’s running HN.
It’s been years since I’ve seen a post/comment on this topic.
Are you still running FreeBSD, on a few high frequency cores (iirc)?
https://twitter.com/HNStatus/status/1371525940656803848?s=20
Needless to say, this site is my own personal StackOverflow, and I think there's something about ingratitude bouncing around in my mind somewhere.
This actually got me thinking. Do we really need a CDN? This is one of those things we take and use without actually thinking about whether we could do without it.
Interesting thought experiment.
Static websites will get the best speed boost from locally served assets (much reduced latency from the local POP) because the page itself can be cached (presuming headers on origin site are correctly set). Especially for page requests from international users.
https://meta.stackexchange.com/questions/10369/which-tools-a...
tl;dr StackOverflow's architecture is fairly simple and has done mostly vertical scaling (more powerful machines) and bare metal servers rather than virtual servers. They also realize their use patterns are read-heavy so there's a lot of caching and they take advantage of CDNs for static content which completely offloads that traffic off their main servers.
[0] https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...
Asking as someone who was impacted by the OVH fire last week, and I didn't have recent backups and therefore lost data.
Sorry to hear that, that sucks.
(I don't have any relationship to HN)
The section on ("HN Information") - does this include e.g. IP address under "any submissions or comments that you have publicly posted to the Hacker News site"? My naive reading of that would say "no". But is that correct?
"If you create a Hacker News profile, we may collect your username (please note that references to your username in this Privacy Policy include your Hacker News ID or another username that you are permitted to create in connection with the Site, depending on the circumstances), password, email address (only if you choose to provide it), the date you created your account, your karma (HN points accumulated by your account in response to submissions and comments you post), any information you choose to provide in the “about” field, and any submissions or comments that you have publicly posted to the Hacker News site (“HN Information”)."
"Log data: Information that your browser automatically sends whenever you visit the Site (“log data”). Log data includes your Internet Protocol address, browser type and settings, the date and time of your request, and how you interacted with the Site."
Then there's also this section:
"Online Tracking and Do Not Track Signals: We and our third party service providers may use cookies or other tracking technologies to collect information about your browsing activities over time and across different websites following your use of the Site."
I would assume that "other tracking technologies" includes IP addresses.
I always recommend Universal Paperclips to people who don't like cookie clicker games, because I fell in love with it the first time I tried it (heard of it from the Hello Internet podcast)
Once you've played a fair and truly exponential clicker through a few times you can't tolerate the forced linearity of a pay-to-win clicker app.
I have to say that UP was definitely a much better experience.
Universal Paperclips cures/inoculates against Cookie Clicker 2 easily. Cookie Clicker Classic looks too addictive for me to let myself try it.
Long covid sucks.
Two of my hypotheses are:
(1) some designers are working on huge screens themselves, and don't test enough in usual resolutions
(2) it's easier to achieve good visual composition by using a lot of whitespace (at the expense of hiding things below the fold or in triggerable containers)
HN is readable - just - but it's definitely on the small side.
The complete lack of some sort of horizontal constraint doesn't help either. 200 character lines are no bueno for reading.
It worked much better to just tell it to output 1080p and let my television scale it... less graphics memory too. I still need to scale HN up relative to other sites in order to read it though.
If I compare the text of your comment to the text of an article on npr.org it seems like about the same as the difference between 9pt and 12pt, and they are using a serif font that seems to be a lot easier to read.
It's a style choice I guess? It seems like it would work best on a large 1080p display, so maybe that's just what the person who designed the layout was using.
Zoom doesn't fix line lengths of 1500 characters and terrible color contrast.
The link to the site guidelines is 7pt with a contrast that fails WCAG 2.0. No wonder no one reads them.
That depends on your browser's zoom implementation. Firefox is able to zoom the text/element sizes while keeping the page width the same on HN.
https://www.onegraph.com/docs/subscriptions.html (it'll load in at the top of the page)
E.g. some periodic replication + external down detector + a break-before-make failover that brings up the cold standby, accepting that any unreplicated state will be trashed, and rendering the hot one inactive until manual reactivation
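To make that concrete, here is a rough sketch of the control logic as a toy in-memory model. Everything in it is hypothetical (the `Node` and `FailoverController` names, the `is_up` probe, the three-strikes threshold) and it says nothing about how HN is actually set up.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    name: str
    active: bool = False
    state: dict = field(default_factory=dict)


@dataclass
class FailoverController:
    hot: Node
    cold: Node
    is_up: Callable[[Node], bool]  # external down detector (hypothetical probe)
    failures_needed: int = 3       # consecutive failed probes before failing over
    failed_over: bool = False
    _failures: int = 0

    def replicate(self) -> None:
        """Periodic replication: snapshot the hot node's state onto the cold one."""
        if not self.failed_over:
            self.cold.state = dict(self.hot.state)

    def check(self) -> None:
        """One down-detector tick; fail over after enough consecutive failures."""
        if self.failed_over:
            return  # the hot node stays inactive until manual reactivation
        if self.is_up(self.hot):
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self.failures_needed:
            # Break before make: deactivate the hot node first, then bring up
            # the cold one. Anything written since the last replicate() is lost.
            self.hot.active = False
            self.cold.active = True
            self.failed_over = True

    def reactivate_hot(self) -> None:
        """Manual step: hand control back, again breaking before making."""
        self.cold.active = False
        self.hot.state = dict(self.cold.state)  # cold copy is authoritative now
        self.hot.active = True
        self.failed_over = False
        self._failures = 0
```

The point of the design is that at no moment are both nodes active, so there is no split-brain to reconcile. The price is the data written between the last replication and the failure.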
[0]: https://twitter.com/HNStatus
[1]: The last tweet before today's incident was 2 years ago, and the one before that was 4 years ago.
You could say that Chrome is designed to tie the zoom level to the viewport but I wouldn't count on this behavior springing up from an underlying design and implementation rather than it being a design choice for the user experience.
That is, consider your network is down. You try to go to an address. It doesn't load, so you try another address, the page changes; but it is the same content.
That's what the GP comment said happened: the zoom level was the one associated with what they previously had set on HN, and they expected it to be the opposite, the default zoom level for the browser.
It's easier to see as broken by thinking of "how could I set it so that my browser's error page has a default zoom?"
In both cases, those are the browser supplying a resource representation, while still technically being on the resource specified in the navigation bar. The thing you're seeing is an overridden representation of the server's response. (Which, in this case, just happened to be "no response.")
It's almost exactly the same as how the server sending a 304 gets the browser to load the document from cache. The server's actual response was a 304; but the browser's representation of that response is the cached HTML DOM it had lying around from the last 2xx resource-representation it received "about" the same resource.
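A toy model of that distinction between "what the server sent" and "what gets displayed", purely for illustration. The `Response` and `RepresentationCache` names are invented, and real browser caching (validators, freshness, Vary and so on) is far more involved than this.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Response:
    """What the server actually sent (hypothetical, heavily simplified)."""
    status: int
    body: Optional[str] = None


class RepresentationCache:
    """Remembers the last 2xx body seen for each URL."""

    def __init__(self) -> None:
        self._last_good: Dict[str, str] = {}

    def represent(self, url: str, response: Optional[Response]) -> str:
        """Return what is displayed for `url`; `response` is None if the server never answered."""
        if response is None:
            # No response at all: the browser supplies its own page, but the
            # resource in the address bar is still `url`.
            return "<browser error page>"
        if 200 <= response.status < 300 and response.body is not None:
            self._last_good[url] = response.body
            return response.body
        if response.status == 304 and url in self._last_good:
            # The actual response was a 304; what is shown is the cached
            # copy from the last 2xx response for the same resource.
            return self._last_good[url]
        return f"<browser page for status {response.status}>"
```

In every branch the key is the URL that was navigated to, which is why it is at least defensible for per-site settings such as zoom to follow that URL rather than whatever page ends up being rendered.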
And I can see the argument for either. If I increase my terminal's font and run a curl, the response is scaled up. That makes sense, I scaled up my terminal.
To that end, it is odd that scaling up is per document origin. I'm assuming that is configurable?
But we get around 6M requests a day now.
(Just so nobody misinterprets my question, nothing wrong with FreeBSD, I know other stuff also runs on it like Netflix’s CDN. Still always interested to hear why people choose the road less travelled)
FreeBSD is still an excellent choice for servers. You may prefer Linux for servers if you're more familiar with it from using it on your laptop. But if you use Mac laptops, FreeBSD sysadmin will seem at least as comfortable as Linux.
For failures that don't take down the datacenter, we already have a hot standby. For datacenter failures, we can migrate to a different host (at least, we believe we can—it's been a while since we verified this). But it would take at least a few hours, and probably the inevitable glitches would make it take the better part of a day. Let's say a day. The question is whether the considerable effort to build and maintain a cross-datacenter standby, in order to prevent outages of a few hours like today's, would be a good investment of resources.
And a status page would be nice.
Obviously the entire HN dataset could and should be in RAM, but the biggest performance improvements I ever made came from shrinking the working set as much as possible. Yes, we have long-term plans to fix this, but at present the only reliable strategy for getting to work on the code is for HN to go down hard, and we don't. want. that.
There's an open source fork at https://github.com/arclanguage/anarki, but it doesn't have any direct relationship with HN.
That's a significant distinction because if you swap the underlying implementation then the same application should magically become multithreaded, which is exactly the plan.
That being said, if modern websites were rated by utility to user divided by complexity of tech stack, I must say Hacker News would be one of the top ranked sites compared to something similar like Reddit or Twitter, which at times feel... like a juggling act on top of a unicycle just to read some comments. :)
It's interesting that they might still be on Lisp if they hadn't picked FreeBSD (a chiefly cited concern was that spez's local dev environment couldn't actually run reddit, which seems like it wouldn't have been a problem with Linux, since Linux & OS X both had OpenMCL (now known as CCL) as a choice for threaded Lisp implementations at the time).
I don't know how Reddit came to use FreeBSD, but if you asked which OS to use around university CS departments in 2005 you'd get that answer pretty often.
Thanks for answering! That's really interesting about clisp; I've always found it a more comfortable interactive environment than any other Common Lisp, but it definitely sacrifices portability for comfort in more ways than one (lots of symbols out of the box that aren't in the HyperSpec or any other implementation, too, for example). I'm now really thankful I've never been tempted to look to its source!
It might be a good idea to verify it; see the recent events at OVH (https://news.ycombinator.com/item?id=26407323).
Obviously does not apply to engineering effort outside of hacker news website, which the team might be working on.
But this forum has seen little change over the years and it's pretty awesome as is.
(Though I didn't use HN api too much so not sure what's going on that side).
I'm currently working on fixing a bug where collapsing comments in Firefox jumps you back to the top of the page. I'm taking it as an opportunity to refine my (deliberately) dead-simple implementation from 2016.
> But this forum has seen little change over the years and it's pretty awesome as is.
That's an illusion that we work hard to preserve, because users like it. People may not have seen much change over the years but that's not because change isn't happening, it's because we work mostly behind the scenes. Though I have to say, I really need more time to work on the code. I shouldn't have to wait for 3 hours of network outage to do that (but before anyone gets indignant, it's my own fault).
I do wonder how you are defining "real UNIX system" in that statement.
Massive systems miss the design intent and, to a great extent, nearly every benefit of using UNIX over VAX.
This excludes many of the operating systems licensed to use the trademark "UNIX." In this regard, even though Plan 9 is obviously not UNIX, it's a lot closer to it than (any) Linux and FreeBSD.
I take it you meant to say "VMS" here, not VAX.
I don't think the size of a system is essential to whether it counts as "UNIX" or not. The normal trajectory of any system which starts small is to progressively grow bigger, as demands and use cases and person-years invested all accumulate. UNIX has followed exactly that trajectory. I don't see why if a small system gradually grows bigger it at some point stops being itself.
I think there are three main senses of UNIX – "trademark UNIX" (passing the conformance test suite and licensing the trademark from the Open Group), "heritage/genealogical UNIX" (being descended from the original Bell Labs Unix code base), "Unix-like" (systems like Linux which don't descend from Bell Labs code and, with rare exception, don't formally pass the test suite and license the trademark, but which still aim at a very high degree of Unix compatibility). I think all three senses are valid, and I don't think size or scale is an essential component of any of them.
UNIX began life on small machines (PDP-7 then PDP-11), but was before long ported to some very large ones (for their day) – such as IBM mainframes – and the operating system tends to grow to match the scale of the environment it is running in. AT&T's early 1980s IBM mainframe port [0] was noticeably complicated, being written as a layer on top of the pre-existing (and obscure) IBM mainframe operating system TSS/370. If being small is essential to being UNIX, UNIX was only a little more than 10 years old before it was already starting to grow out of being itself.
[0] https://www.bell-labs.com/usr/dmr/www/otherports/ibm.pdf
Embarrassing slip in this context (I was just reading the CLE spec, too!), but yes.
> UNIX has followed exactly that trajectory. I don't see why if a small system gradually grows bigger it at some point stops being itself.
Adding onto something (and tearing down the principles it was created on, as Linux and most modern BSDs do) doesn't always preserve the initial thing; a well-built house is better as itself than reworked into a McMansion. Moissanite isn't diamond; it's actually quite different.
An operating system that has a kernel with more lines of code than the entirety of v7 (including user programs) is too much larger than UNIX, and too much of the structure has been changed, to count as UNIX in any meaningful sense of the word.
> If being small is essential to being UNIX, UNIX was only a little more than 10 years old before it was already starting to grow out of being itself.
Correct, which is why many of the initial UNIX contributors started work on Plan 9.
> the only real UNIX systems available these days are illumos and xv6
And then when I ask you what makes those "real UNIX systems" you say:
> I'd say it's easiest to define what isn't: massive systems.
But I don't see how illumos doesn't count as a "massive system". Think of all the features included in illumos and its various distributions: two networking APIs (STREAMS and sockets), DTrace, ZFS, SMF, Contracts, Doors, zones, KVM, projects, NFS, NIS, iSCSI, NSS, PAM, Crossbow, X11, Gnome, IPS (or pkgsrc on SmartOS), the list just goes on. illumos strictly speaking is just the kernel, and while much of the preceding is in the kernel, some of it is user space only; but, to really do an apples-to-apples comparison, we have to include the user space (OpenIndiana, SmartOS, whatever) as well. Solaris and its descendant illumos are just as massive systems as Linux or *BSD or AIX or macOS are.
I will grant you that xv6 is not a massive system. But xv6 was designed for use in operating systems education, not for production use (whether as a workstation or server). If you actually tried to use xv6 for production purposes, you'd soon enough add so much stuff to it, that it would turn into just as massive a system as any of these are.
Much of what you mention isn't actually necessary/isn't actually in every distribution! Including X11 and GNOME as a piece of it is a bit extreme, don't you think? I also think it's a bit extreme to put things that are obviously mistakes (Zones, doors, SMF, IPS) in with things that actually simplify the system (DTrace and ZFS, most importantly) as reasons for why illumos is overly-complex.
I mostly agree with the idea that we have to include user space; even then, it's still clear that illumos is much closer to sane, UNIX-ideals than Linux is. I'm not going to claim that the illumos libc is perfect (far from it!), but the difference in approach between it and glibc highlights how deep the divide runs here. illumos, including its userspace, is significantly smaller than most Linux, massively smaller than macOS, slightly smaller than FreeBSD (and much better designed). All of these, though, are of course much smaller and far more elegant than AIX, so in that way we all win.
I don't actually know how much more I would add to xv6. If anything, I'd start by removing things. Mainly, I hate fork. Of course, its userspace is relatively small, but v7's userspace is more or less enough for me (anecdotally, I spend much of my time within it via SIMH and it's pretty comfortable, although there are obviously limits to this), so it wouldn't take many more additions to make it a comfortable environment.
Again, I'm not claiming Linux is bad (I love Linux!), simply that it isn't UNIX and doesn't adhere to the UNIX philosophy.
I talked earlier about three different definitions of UNIX – "trademark/certified UNIX", "heritage/genealogical UNIX" and "UNIX-like/UNIX-compatible". Maybe we could add a fourth, "philosophical UNIX". I don't know why we should say that is the only valid definition and ignore the validity of the other three.
The fact is that opinions differ on exactly what the "UNIX philosophy" is, and on how well various systems comply with it. The other three definitions have the advantage of being more objective/clearcut and less subject to debate or differing personal opinions.
Some would argue that UNIX itself doesn't always follow the UNIX philosophy – or at least not as well as it could – which leads to the conclusion that maybe UNIX itself isn't UNIX, and that maybe a "real UNIX" system has never actually existed.
It is claimed that one part of the UNIX philosophy is that "everything is a file". And yet, UNIX started out not treating processes as files, which leads to various problems, like how do I wait on a subprocess to terminate and a file descriptor at the same time? Even if I have an API to wait on a set of file descriptors, I can't wait on a subprocess to terminate using that API since a subprocess isn't a file descriptor.
People often point to /proc in Linux as an answer to this, but it didn't really solve the problem, since Linux's /proc was mostly read-only and the file descriptor returned by open(/proc/PID) didn't let you control or wait on the process – this is no longer true with the introduction of pidfd, but that's a rather new feature, only since 2019; Plan 9's /proc is much closer, due to the ctl file; V8 Unix's is better than the traditional Linux /proc (you can manipulate the process using ioctl) but not as good as Plan 9's (its ioctls expose more limited functionality than Plan 9's ctl file); FreeBSD's pdfork/pdkill is a good approach but they've only been around since 2012.
For "trademark UNIX": very few of the systems within are small, comprehensible or elegant.
For "heritage/genealogical UNIX": Windows 10 may have the heritage of DOS, but I wouldn't call it "DOS with a GUI."
For "UNIX-like/UNIX-compatible": nothing is really UNIX-compatible or all that UNIX-like. Do you define it as "source compatibility?" Nothing from v7 or before will compile; it's before standardization of C. Do you define it as "script compatibility?" UNIX never consistently stuck to a shell, which is why POSIX requires POSIX sh which is in many ways more limited than the Bourne shell.
I personally take McIlroy's view on the UNIX philosophy:
A number of maxims have gained currency among the builders and users of the UNIX system to explain and promote its characteristic style:
* Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features."
* Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input.
* Design and build software, even operating systems, to be tried early, ideally within weeks. Don't hesitate to throw away the clumsy parts and rebuild them.
* Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them.
Throwing out things that don't work is a good idea, which is why the modern backwards-compatible-ish hell is far from UNIX (in this regard, I'll admit illumos doesn't qualify).
I fully agree with you that Plan 9 is closer to UNIX than Linux and FreeBSD!
Does AT&T's c. 1980 port of Unix to run on top of IBM's TSS/370 mainframe operating system [0] count as a real Unix? It appears that Ritchie did think it was a Unix, he linked to the paper from his page on Unix portability [1].
So is your definition of "Unix" broad enough to include that system? If not, you are defining the term differently from how Ritchie defined it; in which case I think we should prefer Ritchie's definition to yours. (McIlroy's maxims are explicating the Unix philosophy, but I don't read him as saying that systems which historically count as Unix aren't really Unix if they fall short in following his maxims.)
[0] https://www.bell-labs.com/usr/dmr/www/otherports/ibm.pdf
This is why I used the quote, not for this reason:
> but I don't read him as saying that systems which historically count as Unix aren't really Unix if they fall short in following his maxims.
I'd say yes, a port of v7 is fine, because it's not meaningfully more complex. It can still be comprehended by a single individual (unlike FreeBSD, Linux, everything currently called Certified Commercial UNIX trademark symbol, etcetera).
I think AT&T's port of V7 (or something close to V7, I guess it was probably actually a variant of PWB) to run on top of TSS/370 really is meaningfully more complex because in order to understand it you also have to understand IBM TSS/370 and the interactions between TSS/370 and Unix.