Leaving the Basement

Leaving the Basement (community.hachyderm.io)

80 points by timf 3 years ago | 50 comments

dang 3 years ago |

Recent and related:

Post mortem on Mastodon outage with 30k users - https://news.ycombinator.com/item?id=33855250 - Dec 2022 (101 comments)

(Offtopic meta note: Alert users will note that that thread was posted later than this one. This is because the second-chance process (https://news.ycombinator.com/item?id=26998308) has a race condition: the events "story makes front page" and "moderator puts story in second-chance pool" sometimes diverge and can happen in any order.)

Enderboi 3 years ago |

Ah yes.

I've also seen NFS/ZFS on Linux have very... bizarre... issues with locking, latency, and poor handling of errors bubbled up from the block layer taking down clients or even the host.

All of these went away when we redeployed everything into a Solaris-based distro (still exporting ZFS shares to Linux clients via NFS). It does seem something specific to the interaction of these two components under load on a Linux kernel.

Unfortunately, it also only happened under real-world production load, and it was impossible to create a reliable test case with simulated stress tests or benchmarking :(

sydbarrett74 3 years ago | |

Did you ever evaluate FreeBSD? My hesitation with Illumos and downstream distros is the small number of people maintaining the ecosystem (not that any of the BSDs have huge dev teams by comparison, but still).

That said, I think OpenSolaris is technically superior in most ways to any of the BSDs.

Enderboi 3 years ago | | |

Yeah, FreeBSD was an improvement with regards to ZFS/NFS integration not having any major issues.

Unfortunately we had some strange HBA issues with our disk shelves, which went away with an Illumos downstream. Since our use case for this was basically an isolated box that just supplied NFS shares, the limited ecosystem was less of a concern than stability :)

hossbeast 3 years ago | |

As someone storing important stuff on ZoL on my desktop, this is concerning to read. OTOH I'm 4+ years in, I've never had an issue, and I have backups (outside of the zfs volume itself).

lakomen 3 years ago | |

Yeah I've had nothing but bad experiences with ZoL

rglullis 3 years ago |

As someone running a commercial provider for Mastodon (and Matrix, and XMPP...), I am somewhat envious of these posts. "Wow, 30000 users! If I had that many users on my service paying the $0.50/month I am charging, it would be enough to pay myself a full salary!".

But then I realize that they are only getting this many people because they are not driven by commercial interests: even with donations, I can bet they are not collecting enough to keep things afloat and they only keep going because they don't mind spending all this time, money and resources of their own on this project. They can treat it as a (relatively expensive) hobby, and they can keep it running as long as it satisfies them.

The problem is that I think that this is harmful in the long run. Yes, people now are finally seeing the issue with ad-funded social media. But if we want to have a healthy alternative, we need to understand TANSTAAFL, we need to accept that we need to give real money to the people working on this and to have the servers available 24/7 to store and distribute the hot takes and stupid memes that we so bizarrely crave every day.

I worry that if we don't change the mindset quickly, the whole Twitter drama will be a wasted opportunity and Mastodon (and the Fediverse in general) will go back to the status quo, where surveillance capitalism is the norm and truly open systems are just a geeky curiosity.

I wish I could fund a tech-equivalent of the "buy local and organic" campaign. I wish I had more people thinking "ok, I will pay $5/month to this guy and I will bring 10 people to this instance" because it is the ethical thing to do.

buovjaga 3 years ago | |

For financials, see https://community.hachyderm.io/blog/2022/12/04/growth-and-su...

rglullis 3 years ago | | |

You lost me at "call for moderators and volunteers". Unless these people are actually paid for this taxing and stressful work, I don't believe it can be called "sustainable".

watchdogtimer 3 years ago |

Kris described hachyderm's infrastructure operating in her basement on the Oxide and Friends podcast in mid-November. Kudos to her for being able to keep it going there so long!

CharlesW 3 years ago | |

As someone who'd hoped to start an industry-focused instance, I found Kris' Medium articles about Hachyderm's growth really interesting: https://medium.com/@kris-nova

The latest post, "Yelping: Action Through Criticism", includes links to additional Hachyderm-related content near the top, then talks about how they handle an obnoxious online behavior they've experienced because of the recent popularity of Hachyderm.

cyberpunk 3 years ago |

> "We can then leverage Mastodon’s S3 feature to write the “hot” data directly back to Digital Ocean using a reverse Nginx proxy."

How does that work?

josteink 3 years ago | |

Digital Ocean offers S3 compatible storage.

I guess they use Nginx to reroute traffic which by default is targeting aws.amazon.com?

evankanderson 3 years ago | | |

I think the default is "files-off-disk" image serving (hence NFS, etc). This can be replaced by S3 storage, with the URLs remaining the same (or maybe redirected, if you don't want to pay the bandwidth bill twice).

Disclaimer: I have this setup using Humanmade/S3-uploads for WordPress, but am not currently running Mastodon.
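To make the two options above concrete (keep the media URLs the same and proxy to the bucket, or redirect the client to the bucket), here is a minimal Python sketch. It is not Mastodon's or Nginx's actual behaviour: `MEDIA_PREFIX`, `BUCKET_BASE`, `bucket_url` and `respond` are hypothetical names invented for illustration, and the real setup would live in web server and application configuration rather than in code like this.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Hypothetical values, for illustration only.
MEDIA_PREFIX = "/media/"                     # path the site serves media under
BUCKET_BASE = "https://bucket.example.com"   # S3-compatible bucket endpoint


def bucket_url(request_path: str) -> Optional[str]:
    """Map a public media path to the object URL in the bucket."""
    if not request_path.startswith(MEDIA_PREFIX):
        return None
    key = request_path[len(MEDIA_PREFIX):]
    # quote() percent-encodes unsafe characters but leaves "/" alone.
    return f"{BUCKET_BASE}/{quote(key)}"


def respond(request_path: str, mode: str) -> Tuple[int, Optional[str]]:
    """Return (status, upstream URL) for a media request.

    mode == "redirect": the client is sent to the bucket, so the site
    does not pay for the bandwidth a second time.
    Any other mode is treated as "proxy": the public URL stays the same
    and the server fetches the object from the bucket itself.
    """
    target = bucket_url(request_path)
    if target is None:
        return (404, None)
    if mode == "redirect":
        return (302, target)
    return (200, target)
```

In proxy mode the second element is the upstream the server would fetch from; in redirect mode it is the value that would go in the `Location` header.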

convolvatron 3 years ago |

"In other words, every ugly system is also a successful system. Every beautiful system, has never seen spontaneous adoption." - not only is this logically fallacious, it's pretty offensive about the general notion of software quality

coverup 3 years ago | |

Where's the fallacy? She's saying the only systems that are beautiful are the ones that haven't been forced by massive spontaneous adoption to scale faster than the developer(s) can come up with and implement a beautiful design to meet the new requirements.

Maybe you think if the original system were _really_ beautiful and of high quality, it would have scaled with the adoption on its own, with no need for ugly patches... but in that case the original system would have had the capacity to do a lot more than what was originally required. It would have been overengineered, in other words, and it would have been more beautiful if it had met its original requirements more cheaply.

The notion that a sudden change in requirements that must be dealt with quickly results in an uglier system seems fairly straightforward to me, and certainly not offensive.

bluedino 3 years ago |

Why weren't the disks just replaced?

uniqueuid 3 years ago | |

Wild speculation, but perhaps ZFS + postgres + potentially write-sensitive SSDs resulted in write amplification that would just occur again.

robga 3 years ago | |

Answer from the author https://news.ycombinator.com/item?id=33855250#33856184

lakomen 3 years ago |

If Mastodon is the new cool, where all the climate activists hang out, why does it use node and not Go, Rust or C++?

musk_micropenis 3 years ago |

I would like to understand why Mastodon requires such a huge amount of hardware for mediocre traffic volumes. Not just the lazy "it's Rails" answer - I know Rails is a resource hog, but that doesn't go far enough to explain the extreme requirements here.

As a point of reference, look at what Stack Overflow is run on. As a caveat, SO is probably more read-heavy than Mastodon, but it also serves several orders of magnitude more volume (on a normal day in 2016 they would serve 209,420,973 HTTP requests[0]). They did this on 4 DB servers and 11 web servers. And in fact, it can work (and has worked) serving this volume of traffic on only a single server.

With this setup SO was not even close to maxing out their hardware (servers were under 10% load, approximately). SO also listed their server hardware[1] in 2016. I don't know enough about server hardware to assess the difference, but to my eye they look similar on the web tier with similar amounts of memory, similar disk, etc.

I'm not saying Hachyderm is doing anything wrong, but it makes me wonder if there's a fundamental problem with the design of Mastodon. And to be clear I understand that this particular issue was caused by a disk failure, but that they even had this hardware in place running Hachyderm is surprising to me.

[0] https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

[1] https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...
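For a sense of scale, the daily figure quoted above can be turned into an average per-second rate. This is a rough back-of-the-envelope sketch: it assumes traffic is spread evenly across the day and across the 11 web servers mentioned in the comment, which real traffic is not, so peak rates would be higher.

```python
# Figure quoted in the comment: HTTP requests on a normal day in 2016.
REQUESTS_PER_DAY = 209_420_973
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
WEB_SERVERS = 11                # web tier size given in the comment


def average_requests_per_second(requests_per_day: int) -> float:
    """Average rate, assuming perfectly even traffic over 24 hours."""
    return requests_per_day / SECONDS_PER_DAY


avg_rps = average_requests_per_second(REQUESTS_PER_DAY)
per_server_rps = avg_rps / WEB_SERVERS

print(f"average: {avg_rps:.0f} requests/s")
print(f"per web server: {per_server_rps:.0f} requests/s")
```

That works out to roughly 2,400 requests per second on average, or about 220 per web server under the even-split assumption.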

imtringued 3 years ago |

That basement hardware didn't last long. If you don't know how big your userbase is going to be, it would be better to avoid committing money to specific hardware.

ocdtrekkie 3 years ago | |

Note that Nova didn't feel an actual hardware capacity level was hit here. However, the setup lacked the redundancy to handle hardware outages for something like drive replacements without a significant outage. And I believe one of the main considerations in moving to a cloud service was actually limited connectivity options, because only so much fiber capacity was even available.

> Our limiting factor in Hachyderm had almost nothing to do with the amount of users accessing the system as much as it did the amount of data we were federating. Our system would have flapped if we had 100 users, or if we had 1,000,000 users. We were nowhere close to hitting limits of DB size, storage size, or network capacity. We just had bad disks.

lantry 3 years ago | |

hacker news: you don't need the cloud! you can just run a couple machines in your basement!

also hacker news: why would you try to run something in your basement? Just use the cloud!

CleverLikeAnOx 3 years ago | | |

There are multiple people here and their opinions vary.

nix0n 3 years ago | |

> committing money to specific hardware

Note that Dell R620 and R630 servers have been discontinued for a couple of years now, were probably bought used, and can probably be re-sold.