Why I recommended ECS instead of Kubernetes to my latest customer

88 points by alien_ 3 years ago | 123 comments

> When looking at the cloud resources, we noticed many On-Demand EC2 instances with relatively low CPU utilization, which can be expected considering they don't have customers yet.

As a software consultant myself, I'd probably stop the conversation right there and ask why they are building such a robust distributed system — SQS, SNS, etc — without any customers. Still want to be deployed in AWS? Toss the damn app on a single EC2 instance...

ljm 3 years ago | |

I’ve been exploring this lately because, honestly, the cloud is total overkill for small startups and hobby projects.

Kubernetes has its value even for small scale workloads like that, but it’s still a few steps more than, say, running a Capistrano script to push your code to a small Linux box with a database on a second one.

You’ll get really far on minimal resources these days, especially with cheaper ARM boxes that offer far more bang for your buck. Paying 1k+ a month to AWS/GCP/Azure is total insanity when you’re not even averaging a single active user a day.

clvx 3 years ago | | |

At the beginning, just for the development experience I would just put an instance in some cloud provider and use microk8s or k3s to serve the app. It's very straightforward and then you can move to a managed service if needed. You will probably be using the same tooling and integrations at different steps. Context switching is low and you can reproduce locally. I'm down for serverless options when needed but I have a strong preference for local development.

CharlesW 3 years ago | | |

> …the cloud is total overkill for small startups and hobby projects.

It absolutely can be, sure. But solutions like Vercel, Cloudflare Workers, Supabase, etc. can be excellent and inexpensive for those use cases.

TheNewsIsHere 3 years ago | | |

It’s not the first time I’ve written about this. The hyperscalers are pretty much the most expensive way to build a business that isn’t presently hyperscale, and their ecosystems are increasingly optimized for sprawling stacks built on a virtually unlimited number of microservices.

That’s just not a realistic or necessary approach for everyone.

AWS is engineered for excruciatingly detailed billing right down to the moment you’re consuming or releasing capacity, and that’s how they built it. Managing that spend is exhausting.

My business runs on under $200/mo in Linode compute resources and the performance is significantly better than on similarly situated EC2 instances. We were spending that on databases alone with AWS and getting a fraction of the performance.

I make extensive use of “pure” Linode Kubernetes Engine k8s. It’s portable to any other Kubernetes cluster, and it lets me take my stack _anywhere_, even to a rack in the nearest data center willing to rent me space, if I really wanted.

ecshafer 3 years ago | |

With so many developers I feel that there is a complete lack of familiarity with what it takes to just run a website. So many came up in the land of cloud, k8s, etc. There are use cases for these more advanced production environments. But if more developers just learned how to make a website on linux, with a db, a webserver, and an application, they would know that a lot of more complex things just aren't needed... especially when you don't even have customers.

bamfly 3 years ago | | |

Truly, a very small number of real servers, just enough for blue/green deployments and so you can stay up if any one server goes offline, meets any plausible needs for a really, really high percentage of businesses & products. A ton of early-stage ones can get away with skipping most of that and just run on one or two servers, period, for quite a while.

If you're outsourcing operations to AWS or whomever, a couple largish instances and a couple supporting services can get you pretty much that same thing, for a bit more money and a bit less control over performance-consistency.

All that HA/scaling/clustering/cloud stuff is expensive, not just in monetary terms, but in performance terms. If you don't actually need it, a high percentage of your compute & (especially) your network traffic may be going to that, rather than actually serving the product. It also adds a hell of a lot of complexity, which comes at a significant time-cost for development, unless you want your defect rate to shoot up.

> But if more developers just learned how to make a website on linux, with a db, a webserver, and an application.

And hell, nothing's stopping you from writing 12-factor apps and deploying containers, and scripting your server set-up and config, even if you don't go straight for heavy, "scalable" architecture. Even if your server's a beige Linux box in a closet. Enough benefits that the effort's probably a wash at worst (hey, documentation you can execute is the best documentation!) even if you never need to switch architectures, and then you'll have a relatively easy time of it, if you do end up needing to.
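
To make the 12-factor point concrete, here is a minimal sketch of one of the factors: configuration read from the environment rather than baked into the code. The variable names (`DATABASE_URL`, `PORT`, `DEBUG`) and their defaults are assumptions for illustration, not anything from the article.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    debug: bool


def load_settings(env=None):
    """Build settings from environment variables (hypothetical names and defaults)."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///app.db"),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

The same code then runs unchanged on the beige box in the closet, in a container, or on a managed platform; only the environment differs.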

sigstoat 3 years ago | |

i had a client who was burning… $10k? maybe $20k per month largely on nodes for EKS when they had no paying customers and ~zero load. (they had fully “production” sized clusters in all of their environments, and they had a slew of weird not-quite-prod environments.)

they also had some rabbitmq-on-k8s system going that fell over during small tests because they couldn’t get k8s to actually scale it. (which then convinced them they needed k8s, and bigger nodes)

sigh

interroboink 3 years ago | | |

The promise of cloud infrastructure is that it can scale to fit demand — start small, and grow as needed. But sometimes the truth is that it just lets people spend money more easily (:

Back in the day, it would have required a whole procedure to buy that hardware, have it set up, etc. Now you can needlessly spend $10k per month with just a few clicks!

waffletower 3 years ago | |

And in that case, why ec2, why not a more affordable provider?

grogenaut 3 years ago | | |

Because I already have an AWS account that bills directly to my credit card along with some other stuff that I'm already paying for. Every time I go down the "let me save money" route, I spend hours reading through CD website reviews for hosting providers without any real understanding of their quality to save a few dollars and end up burning tens of hours of time. Or I could just fire the fucking thing up on AWS and then turn it off if I decide not to work on the project further.

jmholla 3 years ago | | |

Who would you recommend as a more affordable provider?

mywittyname 3 years ago | | |

There's a lot of expertise in AWS-land.

alien_ 3 years ago | |

The OP here, thanks for your comment.

To be honest I wasn't hired to challenge their entire setup, only to make it more cost effective.

So I chose the most straightforward way I could think of that would allow us to come up with a cost effective setup that will be scalable, fault tolerant and simple to maintain later on.

It all probably started with such a single instance running Docker compose, but then over time it evolved into this setup.

The ideal setup I mentioned would have been also cost effective, scalable and resilient.

politician 3 years ago | |

I recently spoke with some folks who declined to invest because our solution was too simple: specifically, the fact that we don't use Kubernetes was a negative signal.

That's baffling to me, but that perspective is out there too.

nemothekid 3 years ago | |

>ask why they are building such a robust distributed system — SQS, SNS, etc — without any customers

I think this is one of those things that really depends on the use case. If they are performing expensive inference, I think having any queue is better than no queue. Going from a synchronous system to an asynchronous one is not easy and it's not something you would want anyone to be paged for once it starts to matter. Getting SQS/SNS up and running now could be a couple hours of work today and is practically free if your traffic is low.

Similarly I have a number of side projects that run extremely cheaply just using ECS and Fargate. I don't even think about Kubernetes really, it's just a PaaS to me that I'm shipping ARM binaries to. As a result I don't think very hard about autoscaling, failover, load balancing or deployment. A github action just pushes master to ec2 and everything "just works".
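
A rough sketch of the sync-to-async point above, using Python's in-process `queue.Queue` as a stand-in for SQS (an assumption: a real setup would have a separate consumer polling the managed queue). `slow_inference` is a hypothetical placeholder for the expensive call.

```python
import queue
import threading


def slow_inference(payload):
    # Placeholder for an expensive model call (hypothetical).
    return payload.upper()


def worker(jobs, results):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        job_id, payload = job
        results[job_id] = slow_inference(payload)


def run_demo(payloads):
    jobs = queue.Queue()
    results = {}
    consumer = threading.Thread(target=worker, args=(jobs, results))
    consumer.start()
    for job_id, payload in enumerate(payloads):
        # The producer only enqueues; it never waits on the slow call itself.
        jobs.put((job_id, payload))
    jobs.put(None)  # tell the worker to stop
    consumer.join()
    return results
```

The shape is the part that is hard to retrofit later: the request path hands work to a queue and something else drains it.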

HatchedLake721 3 years ago | |

What does SQS have to do with EC2?

One is a queuing service, the other one is a VM.

So instead of using SQS that has $0 cost when there are no customers, you suggest I install, configure and run RabbitMQ on an EC2, to save $0 when there are no customers?

Or save $1 when I have 100 customers? SQS is dirt cheap.

The point of SQS or any other usage-based AWS _developer_ service compared to DIY is that you can be up and running in minutes at a minuscule cost.

I agree with you about over-engineering and building a distributed "microservices" architecture when you have no customers.

But I'll pick SQS any time of the day when I need queueing functionality to increase my developer velocity so I can focus on building value rather than wasting my life installing, configuring and running anything on EC2.
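
The cost argument above can be roughed out with a few lines of arithmetic. The default numbers here (a free tier of 1 million requests per month, then $0.40 per million) are assumptions based on standard-queue list pricing as I remember it; check the current pricing page before relying on them.

```python
def monthly_queue_cost(requests_per_month,
                       free_requests=1_000_000,
                       price_per_million=0.40):
    """Estimate a usage-based queue bill in dollars (assumed pricing parameters)."""
    billable = max(0, requests_per_month - free_requests)
    return billable / 1_000_000 * price_per_million
```

With no customers and no requests, the estimate is zero, which is the whole point of usage-based pricing.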

Nextgrid 3 years ago | | |

The AMQP protocol alone and its various, good client libraries (compared to terrible AWS SDK which is a very thin abstraction over just sending/parsing raw JSON off the wire) is by itself enough to justify RabbitMQ.

> when I need queueing functionality to increase my developer velocity so I can focus on building value rather than wasting my life installing, configuring and running anything on EC2.

SQS still requires configuration, which means you either need to use the (terrible) AWS console UI or spin up a whole Terraform/CloudFormation/CDK/etc stack, not to mention that merely connecting to it requires correctly setting up AWS IAM (so you don't use a key that gives access to your entire AWS account). Vim'ing the RabbitMQ config file in contrast doesn't seem so bad, and even just using a static hardcoded password means the worst an attacker can do is take down your queue instead of taking over your entire cloud infra.
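
For reference, a sketch of what scoping a credential to one queue might look like, built as a plain Python dict in the IAM policy document format. The queue ARN passed in is hypothetical, and the three actions listed are an assumed minimum for a simple producer/consumer.

```python
import json


def queue_only_policy(queue_arn):
    """Return an IAM policy document that only allows basic use of one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:SendMessage",
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                ],
                "Resource": queue_arn,
            }
        ],
    }


def policy_json(queue_arn):
    """Serialize the policy for pasting into whatever tooling manages IAM."""
    return json.dumps(queue_only_policy(queue_arn), indent=2)
```

A key carrying only this policy limits the blast radius to that queue, though it is still one more thing to set up compared to a config file.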

djbusby 3 years ago | | |

The question is: what are you queueing for zero customers?

RcouF1uZ4gsC 3 years ago | |

A single EC2 instance with SQLite as the database can get you pretty far.
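
A minimal sketch of how little code the SQLite route needs, using only the standard library. The `signups` table and the helper names are made up for illustration.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signups ("
        "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
    )
    return conn


def add_signup(conn, email):
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute("INSERT OR IGNORE INTO signups (email) VALUES (?)", (email,))


def count_signups(conn):
    return conn.execute("SELECT COUNT(*) FROM signups").fetchone()[0]
```

Pointing `open_db` at a file on the instance's disk instead of `:memory:` gives a persistent database with no extra service to run.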

renewiltord 3 years ago | |

Yeah that's a good starting point. Maybe just docker on those when you have two apps so they don't step on each other.

whatever1 3 years ago | |

Exactly. Worry about scaling when scaling is on the horizon.

x86x87 3 years ago | | |

No no no. We want to be like Google. Web Scale. Big big data. Huuge

taeric 3 years ago | |

It is rather amusing how over engineered most seed projects have a tendency to be.

I do think ddb and lambda hit a sweet spot for costs on ramping up. The rest, though, really struggle.

mlhpdx 3 years ago | | |

For me, setting up connections between SQS, SNS, DDB, Lambda, step functions, S3, Route53, API Gateway in CloudFormation is just muscle memory. I’m much faster at it at this point than I am at standing up an EC2. I agree it can be hard to learn, but it certainly isn’t hard to do.

Elsewhere in the comments, there’s a suggestion that this kind of thing isn’t appropriate for “hobby projects” and early stage but I disagree. Those are the times when you really want something you can step away from without doing a disservice to your customers (i.e. letting packages go out of date and get vulnerable) and cost you as little as possible in a steady state so you can focus on acquiring customers and not worrying about fuddling around with the guts.

0xEFF 3 years ago | |

A single EC2 instance is an equally bad trade-off on the opposite side of the spectrum from over architected SQS, SNS, etc…

The ideal trade-off is a single Kubernetes cluster with as much in the cluster as makes sense for the team and stage of the project. As you say, toss the app on a single node to start, but the control plane is tremendously valuable from the onset of most projects.

intelthrow6 3 years ago | | |

I don’t see the reasoning.

A startup that outgrows an EC2 server will be making enough money to hire more people to scale the system properly than what was initially designed: trading away everything for development velocity.

Kubernetes is not the right tool for this startup. Kubernetes is what large, old-school non-tech companies use to orchestrate resources, because it’s easier to find someone that “knows k8s” (no one knows k8s unless they’re consulting) than it is to find someone that can build properly distributed systems (in the eyes of whoever is in charge of hiring).

dangus 3 years ago | |

Why would you plan not to have customers? Don't you think the company is able to forecast demand for a new product launch?

Disney: We'd like to launch a new streaming service.

Consultant: Great! You have no customers right now so you can run it on a singleton EC2 instance until you outgrow that scale!

Disney: ...We expect 20 million people to sign up in the first week

Dylan16807 3 years ago | | |

> Don't you think the company is able to forecast demand for a new product launch?

I'm pretty sure "follow the forecast" is exactly what motivated that post.

In other words, the infrastructure is overkill for the initial forecast of customers.

They're not working for Disney.

justrealist 3 years ago | |

SQS and SNS are perfectly good primitives for building a robust distributed system that costs $0 when not in use, by triggering compute via Lambda or Batch.

Your comment is really pretty ignorant of how these tools interact. Using serverless primitives is the opposite of leaving nodes running for no reason.

boredumb 3 years ago | |

This is 100% the line of questioning to pursue.

danpalmer 3 years ago |

More accurately: "Given using AWS as a requirement, I recommended ECS instead of K8s".

It's not really surprising that AWS's K8S setup isn't great, and their own implementation ties in more closely with other services they offer. It's lock-in. AWS provides just enough K8S to tick the box on a spec sheet, but has little incentive to go beyond that.

waffletower 3 years ago |

The management of infrastructure via Terraform has a hidden engineering cost that should also be considered. Engineers can much more easily maintain, learn and introspect infrastructure via Kubernetes, despite its own complexity, given the immature, inconsistent and undeniably awkward qualities of the Terraform toolchain. Engineering time is expensive -- the morass of Terraform can easily quadruple engineering efforts.

honkycat 3 years ago |

I was not convinced by this article that ECS was the right choice. It felt more like a contrarian choice.

> ECS is also relatively simple and not so far from their Docker-compose setup, but much more flexible and scalable. It also enables us to convert their somewhat stateful pets to identically looking stateless cattle that could be converted to Spot instances later.

Have you ever built something in ECS? I have, and it is missing HUGE SWATHS of the convenient functionality that EKS provides. It lacks the network effect of being a widely-used product, so searching for issues is a constant issue. It breaks and nobody knows how to help.

"Not far from their docker-compose setup..." What are you even talking about? ECS is massively more complex than docker-compose and the main similarity I see between them is that they both run docker. It's similar to docker-compose if you ignore the fact that you need permissions, load balancers, networking, etc. Which is the hard part, NOT running some containers on EC2, by the way.

It has its own bizarre and verbose container deployment spec that is less portable, less flexible, less feature-ful, and less widely used than EKS.

> ECS will also offer ECS container logs and metrics out of the box, giving us better visibility into the application and enabling us to right-size each service based on its actual resource consumption, in the end allowing us to reduce the number of instances in the ECS cluster once everything is optimized.

Something you also get with EKS. So half of the reasons you have claimed ECS was the right choice are now in the garbage.

What you DON'T get with ECS is awesome working-out-of-the-box open source software like External Secrets, External DNS, LetsEncrypt, the Amazon Ingress Controller, argo rollouts, services, ingresses, cronjobs... I could go on and on.

They are going to try and hire DevOps engineers, and they will all have to ramp up on (and likely complain about) ECS instead of having people walk in already prepared and ready to start implementing high quality software on a system they already know.

rdsubhas 3 years ago |

What they didn't appear to have considered – was the Dev side of DevOps. Kubernetes runs on developer machines and single-node CI agents. In my company, all CI agents are single-node k3s clusters, all our engineers kubectl apply their services there for integration and e2e testing, same environment from dev to prod. We provide the same single-node VMs for development on the cloud, and Podman desktop for local kubernetes. It has hooks to inject stuff (injecting centralized secrets, configuration, sidecars, etc) in a single way, no need to implement centralized features separately for CI and separately for Prod. It has hooks to validate & reject stuff that doesn't comply with org policies (e.g. limit only core workloads, upper bounds on cpu/memory, volumes, validate everyone sticks to core workload specs and do not use any alpha/beta APIs, etc) so that SRE can allow decentralization while still being in control of what runs and how.

ECS is a deployment tool. Kubernetes is a dev-to-ci-to-prod tool, providing same environment for standard workload specs across the full development cycle, and a single way to inject common features into the standard workloads.
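
To illustrate the "validate & reject" hooks described above, here is a sketch of the kind of check such a hook might run. It works on a simplified dict, not a real Kubernetes manifest, and the allow-list, limits, and field names are all hypothetical.

```python
ALLOWED_KINDS = {"Deployment", "StatefulSet", "CronJob"}  # hypothetical org allow-list
MAX_CPU_MILLICORES = 2000  # hypothetical upper bound
MAX_MEMORY_MIB = 4096      # hypothetical upper bound


def validate_workload(spec):
    """Return a list of policy violations; an empty list means the spec is accepted."""
    errors = []
    if spec.get("kind") not in ALLOWED_KINDS:
        errors.append(f"kind {spec.get('kind')!r} is not a core workload")
    api_version = spec.get("apiVersion", "")
    if "alpha" in api_version or "beta" in api_version:
        errors.append(f"apiVersion {api_version!r} is not stable")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if container.get("cpu_millicores", 0) > MAX_CPU_MILLICORES:
            errors.append(f"container {name}: cpu above {MAX_CPU_MILLICORES}m")
        if container.get("memory_mib", 0) > MAX_MEMORY_MIB:
            errors.append(f"container {name}: memory above {MAX_MEMORY_MIB}Mi")
    return errors
```

Because the same check runs in CI and in prod, a spec that passes on a single-node cluster should pass everywhere, which is the decentralization-with-control idea in the comment.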

lapser 3 years ago |

I find it really wild that anyone would ever recommend ECS. A developer deploying a service involves:

- Setting up certs (managed as TF)
- Setting up ALBs (managed as TF)
- Setting up the actual service definition (often done as a JSON, that is passed into TF)

Possibly other things I'm forgetting.

Some other things. It requires a *developer* to know about certs and ALBs and whatever else.

With EKS, this can all be automated. The devops engineer can set it up so that deploying a service automatically sets up certs, LBs etc. Why are we removing such good abstractions for a proprietary system that is *supposed* to be less management overheads, when in reality, it causes devs to do so much more, and understand so much more?

topspin 3 years ago |

I'm tracking cloud-hypervisor and kata containers closely. I'm convinced there is a unicorn opportunity here for the SME/private-cloud world. An easily managed cluster of lightweight, live-migratable, hardware isolated VMs running containers (as opposed to just herding containers) solves problems people actually have, as opposed to the problems k8s solves. k8s is fine for the scale of enterprise for which it is actually intended and the problem space it was designed to address. It's not fine for everything else.

LispSporks22 3 years ago |

The last company I worked at was building a Kubernetes cluster. It was the usual story – "Heroku is way too expensive. How hard can it be to build our own Heroku?" Classic trap. Fell right into it. Company size: maybe 200. Tried to tell them it would be a huge time suck, and they were doing it on Azure, and then EKS IIRC. Tried to explain that massive companies have whole departments in charge of building and maintaining that and it's an entire hobby for some masochists. I think they're probably still building it.

e12e 3 years ago |

Interesting that the before and after figures are isomorphic as far as i can tell?

They introduced Terraform and dropped docker compose in favour of some Amazon proprietary container scheduler?

fwungy 3 years ago |

ECS is hardly perfect, but I'd use it before EKS for a client who wasn't ready for that.

bdcravens 3 years ago |

We have run ECS with great success for several years. It has always appeared to me to be 80% of K8S for 20% of the effort, but for us, that 80% contained 100% of our need.

pid-1 3 years ago |

IMO ECS is in a weird position right now, because:

1 - It's simpler than K8s, but not that much simpler than your avg managed K8s offering

2 - It really locks you in the AWS ecosystem

3 - It is way less used than K8s or just running things on servers, so there are way less help / learning resources

I really don't see how using ECS is much better than EC2 + compose for small setups and this post didn't provide many good arguments to convince me.

dabeeeenster 3 years ago | |

ECS is just docker instances. I don't really see how that locks you in.

moduspol 3 years ago | |

The IaC / knowledge you need to know to use ECS in a way that replaces EC2 + compose is minimal, bordering on negligible.

I'd use it on day 1 (over EC2 + compose) just to avoid managing an OS or deployment infrastructure.

pharmakom 3 years ago | |

EC2 runs AMIs but ECS runs Docker images. The development experience with Docker containers is a bit smoother.

sigstoat 3 years ago | |

> It really locks you in the AWS ecosystem

the bar for being "locked in" seems to drop further every day.

mediascreen 3 years ago |

I would almost always go with what the team or someone on it was most familiar with and can setup in less than a day. I think it should include an easy way to scale at least for a few months to come, a reasonable way to provision more capacity, a managed database, a CDN, backups, access and error logging and a simple but automatic deployment pipeline.

At work we use ECS Fargate, Aurora MySQL and Bitbucket pipelines to host a little over 100 client web applications. It takes about an hour to configure a new AWS account and staging/production environments for a new client using Cloudformation (and a number of manual steps) and the monthly AWS cost is around $100. There are cheaper ways and probably easier ways, but we feel like we have reached a good balance between stability, ease of use, cost and features. And we are not that worried about being tied to AWS.

what-the-grump 3 years ago |

Static host stuff on an S3 bucket / static web app. Blob storage account with a table, maybe an on-demand function app.

Sub $15/mo to run your thing until you get real demand, yeah. But it's not new, the K8S shtick is coming from investors not tech people. And if it's coming from the tech people throw them out of the door.

Why are you cooking for 8000 people when 6 are coming over? Why are you building a kitchen to cook for 8000 people? Why are you renting space to fit 8000 people?

You need a table and maybe 6 chairs; who knows, they might eat standing.

hdjjhhvvhga 3 years ago | |

> the K8S shtick is coming from investors not tech people.

Not necessarily. If you need to deal with many containerized apps that are updated and deployed regularly, k8s is a really great tool.

As a rule of thumb, I'd say < 5 - no, > 20 - yes, and everything in between - up to you.

zimpenfish 3 years ago | |

> Why are you cooking for 8000 people when 6 are coming over?

Place I worked at had a service running on K8s with, I think, 4 pods, and it got on average one hit every 2-3 seconds during office hours (and virtually none outside those.)
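
The load in that anecdote works out to very little per pod. A quick sketch of the arithmetic, assuming traffic is spread evenly across pods (the function name is made up):

```python
def per_pod_rps(seconds_between_hits, pods):
    """Average requests per second each pod sees, assuming an even spread."""
    total_rps = 1 / seconds_between_hits
    return total_rps / pods
```

For one hit every 2 to 3 seconds across 4 pods, that is roughly 0.08 to 0.125 requests per second per pod.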

hdjjhhvvhga 3 years ago | | |

So I'd say the number of pods was appropriate.

efnx 3 years ago |

I expected to read some wild story about Entity Component Systems but was disappointed to find it’s about picking the correct AWS services.

ddalcino 3 years ago |

Archive.org link: https://web.archive.org/web/20230608135300/https://leanerclo...

I think it got the HN hug of death

sharkbot 3 years ago |

Mostly a note to self: it is interesting to read this account and connect it to the financial planning case studies that show up in personal finance blogs and articles. It seems like there’s a lot of shared terminology and practice between the domains.

mixxit 3 years ago |

we have retargeted some of our infrastructure from kubernetes and onto ecs fargate for the last 12 months and it has massively reduced errors and support tickets and also the cost

unfortunately this is a deal with the devil for vendor lock-in

pmarreck 3 years ago |

The article never mentions what ECS stands for, that’s not yet a TLA I’m familiar with

Torwald 3 years ago | |

It stands for Enhanced Chip Set as an improvement on OCS, which stands for Original Chip Set. Note that even ECS found its successor in AGA, which stands for Advanced Graphics Architecture. AGA can display 256 colors even without using HAM (hold and modify).

lapser 3 years ago | | |

Assuming you're joking, ECS here stands for Elastic Container Service.