k8s is probably a great excuse to think how to compose your infrastructure and software in a declarative way - I'm still fascinated by https://demo.kubevious.io/ - It just made "click" when playing with that demo - it's not goo it's a different operating system and a different mindset.
You can do 80% of that with docker-compose / swarm for small projects but:
If you read HN you are in a huge bubble - gruesome patched tomcat7 apps on Java8 with 20 properties/ini/xml config files are still popular - hosting things in docker or doing ci/cd is still not mainstream. At least in Europe in the public sector stuff where I was involved.
Sure you can mock it - but the declarative approach is powerful - if you can pull it off to have it across all your infrastructure and code with ci/cd and tests you are fast.
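To make "declarative" concrete, here is a minimal sketch of the idea: you state the desired result and a reconciler works out the steps. The service names, the `plan` function and the action tuples are all made up for illustration; this is not how any particular tool implements it.

```python
from __future__ import annotations


def plan(desired: dict[str, int], actual: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return the actions needed to move `actual` replica counts to `desired`."""
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    # Anything still running that is no longer declared gets stopped entirely.
    for name, have in sorted(actual.items()):
        if name not in desired and have > 0:
            actions.append(("stop", name, have))
    return actions


# Hypothetical state: what is declared vs. what is currently running.
desired = {"web": 3, "worker": 2}
actual = {"web": 1, "worker": 2, "legacy-cron": 1}

actions = plan(desired, actual)
print(actions)
```

The point is that the declaration is the only thing you edit; running `plan` again once the state matches produces no actions at all.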
This alone, correctly implemented, https://github.com/adobe/rules_gitops solves so many problems. I can't count the useless meetings we had over any of these bullet points; bazel alone would have solved most major pain points in that project. Just by being explicit and declarative.
Don't believe the hype but it's a powerful weapon.
Comparing a troubleshooting guide to running a site on a couple of servers is a bit too different for me. Compare it to a troubleshooting guide for those two servers, let's see how they stack up. No using any "ask {specific person}" either
Don't get me wrong, kubernetes is overkill for most side project level things. I don't disagree, I just like to see things knocked down a peg fairly!
Also as mentioned by this tweeter they use more than 2 servers anyway
https://twitter.com/shadowmanos/status/1434980544740306947
They could probably save on resources and maintenance effort if they switched to containers assuming this is still the same or more
> This is #1 in a very long series of posts on Stack Overflow’s architecture. Welcome.
https://stackexchange.com/performance
1.3 billion page views per month, 9 web servers, 4 sql servers
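For a sense of scale, a back-of-the-envelope conversion of those numbers into a per-second rate. It assumes a 30-day month and perfectly even load, which real traffic is not, and a page view is not the same thing as a single HTTP request, so treat it as an order-of-magnitude figure only.

```python
PAGE_VIEWS_PER_MONTH = 1_300_000_000   # figure quoted above
WEB_SERVERS = 9                        # figure quoted above
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month

views_per_second = PAGE_VIEWS_PER_MONTH / SECONDS_PER_MONTH
views_per_second_per_server = views_per_second / WEB_SERVERS

print(f"{views_per_second:.1f} page views/s overall")
print(f"{views_per_second_per_server:.1f} page views/s per web server")
```

That works out to roughly 500 page views per second on average across the whole site, before accounting for peaks.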
Stack Overflow is notable because they went down the C#/MVC/SQL Server route from the start, which meant much better performance per server. That's why they make an interesting counterexample to the usual way...
And also notable because everything is under an expensive license, so big performant servers are the cheaper option.
Edit: everything = Windows servers for their .NET app (apparently in the process of migrating to .NET Core) and SQL Server
It's worth mentioning that the diagram is explicitly incomplete. The yellow endpoints are fixes but except for "END" the other endpoints are all either "unknown state" (i.e. "I have no idea what's broken") or problems that aren't addressed in further detail like "The issue could be with Kube Proxy" or even "Consult StackOverflow".
I'm not sure what a complete diagram would even look like but I don't think there's any way to infer complexity by looking at them in comparison.
SO is relatively simple; it's basically customized forum software which is a solved problem that has been around for decades. A junior dev can build an alternative, and it can be built using tried and true solutions like MySQL + PHP, which are horizontally scalable with database sharding, read replicas, and maybe stuff like memcached to accumulate votes before updating the database or a CDN for caching static files.
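A rough sketch of the "accumulate votes before updating the database" idea. It uses an in-process `Counter` as a stand-in for memcached and an in-memory SQLite database as a stand-in for MySQL; the `posts` table, the `VoteBuffer` class and the flush threshold are all hypothetical.

```python
import sqlite3
from collections import Counter


class VoteBuffer:
    """Accumulate vote deltas in memory and write them to the database in batches."""

    def __init__(self, conn, flush_every=100):
        self.conn = conn
        self.flush_every = flush_every
        self.pending = Counter()  # post_id -> net score change not yet written
        self.buffered = 0         # number of votes received since the last flush

    def vote(self, post_id, delta):
        self.pending[post_id] += delta
        self.buffered += 1
        if self.buffered >= self.flush_every:
            self.flush()

    def flush(self):
        # One UPDATE per post instead of one per vote, in a single transaction.
        with self.conn:
            self.conn.executemany(
                "UPDATE posts SET score = score + ? WHERE id = ?",
                [(delta, post_id) for post_id, delta in self.pending.items()],
            )
        self.pending.clear()
        self.buffered = 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
conn.execute("INSERT INTO posts (id, score) VALUES (1, 0), (2, 0)")

buf = VoteBuffer(conn, flush_every=3)
buf.vote(1, +1)
buf.vote(1, +1)
buf.vote(2, -1)  # third vote reaches the threshold and triggers a flush

scores = dict(conn.execute("SELECT id, score FROM posts"))
print(scores)
```

The trade-off is that votes still sitting in the buffer are lost if the process dies, which is part of why doing this well at scale takes more than the sketch suggests.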
Google has different problems and different workloads, and they have hundreds of times more applications with thousands of times more load. Apples and oranges.
> Google has different problems and different workloads
Which of these do you think most organisations most closely resemble?
I don’t think anybody would disagree if you said that you should use Kubernetes for organisations that resemble Google. But most organisations don’t look anything like Google. They look a lot more like Stack Overflow. So the “You don’t need this…” statement holds true for almost everyone.
Of course a junior dev can do it, the same way a junior dev can make YouTube.
It'll work as long as there are fewer than 100 concurrent users on SO and fewer than 50 4K 20-minute videos on YouTube.
Except 99% of forums run terrible software that doesn't perform, is not easily usable and won't work right on phones. That tells me it's not a solved problem at all.
The features that seemed to argue for k8s were not server provisioning, but instead:
log management, easy setup of blue/green & canary deployment, not having to restart a VM upon new code deployment, etc.
How would you do those things as easily with other tech?
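For reference, the routing decision at the heart of a canary deployment is small; what an orchestrator packages is the rollout, health checking and rollback around it. A minimal sketch, assuming users are identified by a string ID and there are just two backends. The names `pick_backend`, `"canary"` and `"stable"` are made up for illustration.

```python
import hashlib


def pick_backend(user_id: str, canary_percent: int) -> str:
    """Route a stable `canary_percent` share of users to the canary release."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Hypothetical population of users, 10% of whom should see the canary.
users = [f"user-{i}" for i in range(10_000)]
share = sum(pick_backend(u, 10) == "canary" for u in users) / len(users)
print(f"canary share: {share:.1%}")
```

Hashing the user ID rather than picking at random means a given user keeps hitting the same version for as long as the percentage stays the same.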
I’m sure a troubleshooting map for bare linux server wouldn’t be less complicated than that.
Except your k8s runs on a Linux server, so this is just an addition. (Unless you're using a fully managed k8s cloud offering, but then you have an even bigger troubleshooting flowchart to navigate the provider's management interface: at least that's my experience with GKE, maybe Amazon and others are better)
Wouldn't it be more likely in this case that the server is built from configs? Ansible or whatever
The troubleshooting for the Linux server side is "spin up a new one and delete the old one"
https://stackoverflow.blog/2021/07/21/why-you-should-build-o...
Does it?
Pretty impressive I think.
• 9 web servers
• 4 SQL servers
• 2 Redis servers
• 3 tag engine servers
• 3 Elasticsearch servers
• 2 HAProxy servers
That comes to 23. I know “a couple” is sometimes used to mean more than two, but… not that much more than two.
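The tally, spelled out using the counts from the list above:

```python
# Server counts as listed above.
servers = {
    "web": 9,
    "SQL": 4,
    "Redis": 2,
    "tag engine": 3,
    "Elasticsearch": 3,
    "HAProxy": 2,
}

total = sum(servers.values())
non_web = total - servers["web"]
print(f"{total} servers in total, {non_web} of them not web servers")
```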
“A couple” is just flat-out wrong; I’d guess that he’s misinterpreting ancient figures, taking the figures from no later than about 2013 about how many web servers (ignoring other types, which are presently more than half) they needed to cope with the load (ignoring the lots more servers that they have for headroom, redundancy and future-readiness).
StackOverflow didn't use that either and instead chose to invent their own query builder/mapper known as Dapper.
K8s can as well.
The difference is a bunch of servers running k8s or a bunch of servers running custom code to duplicate parts of k8s.
Every other project has different constraints.
It is a choice. I have personally moved on from the "Kubernetes is never a good choice over running things yourselves" camp.
I've written about Nomad vs k8s on my blog if that might interest you:
https://atodorov.me/2021/02/27/why-you-should-take-a-look-at...
And I've also written about some common things, like Traefik for ingress, Loki for logs, etc. to supplement the pretty complete Hashicorp tutorials.
As long as your applications follow 12-factor principles it shouldn't be too hard to move between different orchestration tools and you can pick the one that best suits your needs.
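The 12-factor point that matters most here is keeping config in the environment, so the same build runs unchanged under any orchestrator. A minimal sketch; the variable names `DATABASE_URL`, `PORT` and `LOG_LEVEL`, the defaults and the example URL are assumptions for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    log_level: str


def load_settings(env=os.environ) -> Settings:
    """Build settings from environment variables only, with no config files baked in."""
    return Settings(
        database_url=env["DATABASE_URL"],  # required: fail fast if it is missing
        port=int(env.get("PORT", "8080")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


# In production this would be called as load_settings(); a plain dict is passed
# here so the example does not depend on the real environment.
settings = load_settings({"DATABASE_URL": "postgres://db.internal/app", "PORT": "9000"})
print(settings)
```

Whether the values come from a Nomad job file, a k8s ConfigMap or a systemd unit, the application code stays the same.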
You'll still get layer-upon-layer of abstraction - for example Consul for key-value and service discovery, Traefik for load balancing, Terraform to build up the service discovery rules, etc - but it feels somewhat more intentful, less boilerplate.
Everything a modern web app needs out of the box, with 1 day to learn instead of years... and the best part? You don't have to modify your code to work on it, i.e. environment and code are separate.
Nowadays, everybody insists on putting stuff on K8s regardless of how large or small it is.
An application is an application for the purpose of running it on a server. It doesn't really matter how much functionality it has.
It is the microservices "revolution" (quotes intentional) that caused larger applications to be split into a lot of small ones and complicated the execution environment to the point that a lot of people spend a lot of time just trying to figure out how to run their applications reliably.
That is not necessary.
If you can have multiple microservices, more likely than not you can have them as separate libraries used by a single application, or as separate modules of a single application. Just make sure to put the same thought into modularizing it as you would into designing microservice APIs, and you can have the same thing but much easier and with much better performance (no serialization/deserialization, no network hops, no HTTP stack, no auth, etc.)
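A small sketch of what that can look like: the module boundary is an explicit interface, exactly as it would be for a service, but the call is a plain in-process function call. The `Inventory` and `Orders` names and the stock data are invented for the example.

```python
from typing import Protocol


class Inventory(Protocol):
    """The contract the orders module depends on, like a service API but in-process."""

    def reserve(self, sku: str, quantity: int) -> bool: ...


class InMemoryInventory:
    def __init__(self, stock: dict):
        self._stock = dict(stock)

    def reserve(self, sku: str, quantity: int) -> bool:
        if self._stock.get(sku, 0) < quantity:
            return False
        self._stock[sku] -= quantity
        return True


class Orders:
    def __init__(self, inventory: Inventory):
        self._inventory = inventory

    def place(self, sku: str, quantity: int) -> str:
        # A direct method call: no serialization, network hop or HTTP stack involved.
        return "accepted" if self._inventory.reserve(sku, quantity) else "rejected"


orders = Orders(InMemoryInventory({"widget": 5}))
first = orders.place("widget", 3)
second = orders.place("widget", 3)  # only 2 left, so this one is refused
print(first, second)
```

If one module ever does need to become a separate service, only the `Inventory` implementation has to change; `Orders` keeps depending on the same interface.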
But if you already have the tooling, experience and support for k8s, why wouldn't you use it?
I can fire up a k8s cluster on any major cloud provider in minutes (or bare metal in slightly longer), and deploy almost any web app using a standard interface that has extremely wide adoption.
K8s supports a ton of things, but you don't have to use them. It can be complicated, but usually it's not.
It feels a bit like saying why use Linux for your simple app when it could run just fine on DOS or QNX. How many years of my life have I wasted debugging networking issues on Linux or Windows that turned out to be caused by a feature I wasn't even (intentionally) using...
At a previous employer, we had a k8s cluster with a bunch of machines that were designed to a) load a filesystem kernel module inside the container (yes inside, not outside), b) mount /dev from the host in the container with Docker, and c) mount hard drives from the host /dev inside the container using the "mount" command.
In a twist that should surprise no one, those containers don't work well. And they failed to work in crazy, confusing ways for which there is no documentation to troubleshoot, because who in their right mind would do something like that?
I've had better luck in places that have a Platform as a Service team that owns the k8s infra. They generally have a lot more pushback to say "no, you're not going to do that on our cluster" which helps to tamp down some of the crazier ideas.
But it does seem to have most of the features of something like Reddit.
There are too many developers and teams that don't understand the underlying principles and limitations of the technology they use. In a good case those developers get schooled by experienced engineers, but frequently they end up in an important position in some big enterprise software company and make life miserable for many people.
Sure, maybe a junior dev can make a simple forum, or you could make a web app that duplicates the functionality of SO at low loads in a weekend. But, when you're serving an average of 500 requests per second, you need to know what you're doing. And, IMO, this "knowing what you're doing" is the difference between knowing how to engineer a system and how to write an application.
EDIT: To expand on it - they had to manually build a lot of what could be much easier handled today, and thus had no incentive to change later. Interestingly enough, SO has a lot of moving parts distributed over multiple servers, even if all of those servers used to fit into 1-3 racks per DC.
Also these days you might not want to implement login functionality from scratch, or have better logging, monitoring, etc. and that might involve running more applications than just your LAMP stack.
Either way, it is quite baffling how common this kind of comment is. Almost a decade reading Hacker News and it pops up constantly.
If it's so trivial, go be rich!
I mean these politics are pushed by rich Americans and foreign interests, surely they have the means to start their own hosting platform. They're the same people who advocate in favor of businesses rejecting customers because of the free market.
It's not just based on particular politics. It can be any behavioral screw-up done by you or any employee, at any point in your life, real or perceived, that crosses the current cultural expectations. The surface area of risk is ridiculously huge.
No further comment on the specific political content - those sensibilities can change. Can your business adapt quickly enough?
The usual story. Everything works, until it doesn't.
If you are a huge corp with good engineering you can have people dedicated to understanding k8s and then it kinda makes sense. They can spend time to learn it really well so that they have necessary chops to deal with problems when they happen.
On the other hand, if you are a smaller company, more likely embracing this new idea of developers running everything including k8s, you are in for trouble.
They will know how to make it work but that's about it.
Because if you need to learn everything you actually learn nothing very well. And there certainly isn't enough time in the world to learn everything in development.
My philosophy is that applications must be built for when they break, and it is unacceptable to run an application with a team that will not be able to fix it if it breaks.
**
Couple of years ago I joined a small group of teams of developers (together about 40 devs) who together maintained a collection of 140 services, all with same or very similar stack (Java, Spring, REST, RabbitMQ).
They had trouble delivering anything to prod because of complex dependencies, requirements, complex networking, complex process to find out where stuff broke between 7 layers of services between the original user call and the actual execution of their action.
I rolled my sleeves and put everything in a single repo, single build, single set of dependencies, single version, single deployment process, single layer of static app servers.
I left after the team was reduced from 40 to 5. There was no problem delivering anything (we had 60 successful deployments in a row when I left) and the guys who were left admitted they are bored and underutilized because of how easy it is to make changes to the system.
These were still the same guys that were advocating for microservices. From what I heard they are not advocating for microservices anymore.
Can microservices be done well? Sure they can. But it takes additional effort and experience to do it well. Why make your life difficult when it is not needed?
But the same is true of all the other orchestration tools isn't it?
I've had similarly complicated problems with Terraform, Ansible, Chef and Puppet and just plain Linux as I have had with Kubernetes. Meanwhile K8S saves a lot of time when things do work properly - which is nearly always.
A while ago, we had an issue with dotnet where the JIT was sometimes emitting bad instructions and crashing the process. That was an absolute bloody nightmare to debug and reproduce, it took weeks - but nobody would say running a high level language is bad because the compiler might have a bug, right?
We are a small company (under 20 developers), we have one dedicated ops person and one devops, and have never had any issues with k8s that couldn't be resolved by one of them within a few hours. We run a monorepo with 6 app services as part of our core product, 10 beta/research services, then a handful more infrastructure services (redis etc.), and honestly it's been the smoothest environment I've ever worked with. All the infrastructure is defined in code, can be released onto a blank AWS account (or k3s instance) in minutes, all scales up and down dynamically with load, and most of the time something goes wrong it's a bug in our code.
Maybe the problem with your system was more about the excessive use of microservices and general system architecture rather than Kubernetes itself?
Of course. The difference being how complicated it is to deal with problems.
For example I find it is way easier to deal with problems with Ansible compared to Chef.
So, assuming that both get me what I need, I prefer Ansible because it is less of a drag when I have the least time available to babysit it (which usually happens at the least opportune moment).
What I am trying to say is that, just because it works for you now doesn't mean it will not end in a disaster at some time in the future. It is not my position to tell you if the risk is acceptable for you. But I personally try to avoid situations that I cannot easily back out of.
If I have a script that starts an application on a regular VM I KNOW I can fix it whatever may happen to it. Not that I advocate running your services with a script, I just mean there is a spectrum of possible solutions with tradeoffs and it is good to understand those tradeoffs.
Some of those tradeoffs are not easily visible because they may only show themselves in special situations or, opposite, be so spread over time and over your domain that you just don't perceive the little drag you get on everything you do.
I find that if there is any overarching principle to build better solutions it is simplicity.
Presented with two solutions to the problem, the simpler solution is almost always better (the issue being the exact definition of what it means to be simpler).
For example, I have joined many teams in the past that had huge problems with their applications. I met teams that were very advanced (they liked overcomplicating their code) and I met teams that could barely develop (they had trouble even writing code, did not know or use more advanced patterns).
I found that it is easier to help teams that had huge problems but wrote stupid code because it is easier to refactor stupid simple code that beginner developers write than it is to try to unpack extremely convoluted structures that "advanced" developers make.
I think similar applies to infrastructure.
For example, when faced with an outage I would usually prefer simpler infrastructure that I know I understand every element of.
If someone says SO runs on a couple servers this might be about the number actually necessary to run it with full traffic, not the number of servers they use in production. This is a more useful comparison if the question is only about performance, but not that useful if you're comparing operating the entire thing.
Then there's support services (iirc, all of elasticsearch was non-functional requirements stuff and technically could be run without?) and HA.
That is still doable with mid-90s era hand management of servers (all named after characters in lord of the rings).
Not that you should, but you could.
And the growth rate must be very low, so it is pretty easy to plan out your O/S upgrade and hardware upgrade tempo.
And it was actually possible to manage tens of thousands of servers before containers. The only thing you really need is what they now call a "cattle not pets" mentality.
What you lose is the flexibility of shoving around software programmatically to other bits of hardware to scale/failover and you'll need to overprovision some, but even if half of SO's infrastructure is "wasted" that isn't a lot of money.
And if they're running that hardware lean in racks in a datacenter that they lease and they're not writing large checks to VMware/EMC/NetApp for anything, then they'd probably spend 10x the money microservicing everything and shoving it all into someone's kubernetes cloud.
In most places though this will fail due to resume-driven design and you'll wind up with a lot of sprawl because managers don't say no to overengineering. So at SO there must be at least one person in management with a cheap vision of how to engineer software and hardware. Once they leave or that culture changes the footprint will eventually start to explode.
Also 23 is very much a couple for a company and application of that size. It's not uncommon to see several hundred or thousands of nodes deployed by similar sites.
They aren't magically more efficient than other sites. They just chose to scale vertically instead of horizontally.
It's certainly not magic but good architecture decisions and solid engineering. This includes choosing SQL Server over other databases (especially when they started), using ASP.NET server-side as a monolithic app with a focus on fast rendering, and yes, scaling vertically on their own colo hardware. The overall footprint for the scale they serve is very small.
It's the sum of all these factors together, and it absolutely makes them more efficient than many other sites.
* Maybe a bit more/less, because it's not clear to me if DB RAM is per server, or per cluster. Likely server, as on other servers. There is also no data on how big is their haproxy.
They have a typical vertically scaled infrastructure, most services have just two nodes, one active. The biggest ones are databases which in many companies are handled in "the classic way" anyway. Clearly it's not designed as microservices and doesn't need dynamic automation at all. Why on earth would they even bring k8s up in their plans?
As far as I know they have been using .NET Core (or just EF Core? would be weird to go EF Core + .NET Framework, I guess?) on prod in 2018:
> This is the query pattern that caused StackOverflow.com to go offline yesterday:
https://github.com/dotnet/efcore/issues/13524
but on the other hand their tech stack site says: C# + ASP.NET MVC
instead of ASP.NET Core, but I don't think it is proof of anything.
So maybe they used some things on Linux and some things on Windows - those that couldn't be ported to .NET Core back in the day
It's not about the travel specifically, it's that servers and licenses are the cheapest cost of any company when compared to salaries, especially for engineering and sales.
EF Core (like ASP.NET Core) ran on .NET Framework until version 2.1. Everything after that requires .NET (Core).
I've not really looked at hosting anything on windows before, do they have unusual licensing terms in such a way that it would be a significant cost?
Why does this make scaling out "never an easy option for them"?
How? Didn't they migrate to .NET Core?
https://www.infoq.com/news/2020/04/Stack-Overflow-New-Archit...
That doesn't mean they've moved away from Windows servers hosting it though.