Why I recommended ECS instead of Kubernetes to my latest customer (leanercloud.beehiiv.com)
As a software consultant myself, I'd probably stop the conversation right there and ask why they are building such a robust distributed system — SQS, SNS, etc — without any customers. Still want to be deployed in AWS? Toss the damn app on a single EC2 instance...
Kubernetes has its value even for small scale workloads like that, but it’s still a few steps more than, say, running a Capistrano script to push your code to a small Linux box with a database on a second one.
You’ll get really far on minimal resources these days, especially with cheaper ARM boxes that offer far more bang for your buck. Paying 1k+ a month to AWS/GCP/Azure is total insanity when you’re not even averaging a single active user a day.
It absolutely can be, sure. But solutions like Vercel, Cloudflare Workers, Supabase, etc. can be excellent and inexpensive for those use cases.
That’s just not a realistic or necessary approach for everyone.
AWS is engineered for excruciatingly detailed billing right down to the moment you’re consuming or releasing capacity, and that’s how they built it. Managing that spend is exhausting.
My business runs on under $200/mo in Linode compute resources and the performance is significantly better than on similarly situated EC2 instances. We were spending that on databases alone with AWS and getting a fraction of the performance.
I make extensive use of “pure” Linode Kubernetes Engine k8s. It’s portable to any other Kubernetes cluster, and it lets me take my stack _anywhere_, even to a rack in the nearest data center willing to rent me space, if I really wanted.
If you're outsourcing operations to AWS or whomever, a couple largish instances and a couple supporting services can get you pretty much that same thing, for a bit more money and a bit less control over performance-consistency.
All that HA/scaling/clustering/cloud stuff is expensive, not just in monetary terms, but in performance terms. If you don't actually need it, a high percentage of your compute & (especially) your network traffic may be going to that, rather than actually serving the product. It also adds a hell of a lot of complexity, which comes at a significant time-cost for development, unless you want your defect rate to shoot up.
> But if more developers just learned how to make a website on linux, with a db, a webserver, and an application.
And hell, nothing's stopping you from writing 12-factor apps and deploying containers, and scripting your server set-up and config, even if you don't go straight for heavy, "scalable" architecture. Even if your server's a beige Linux box in a closet. Enough benefits that the effort's probably a wash at worst (hey, documentation you can execute is the best documentation!) even if you never need to switch architectures, and then you'll have a relatively easy time of it, if you do end up needing to.
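To make the 12-factor point concrete, here is a minimal sketch of the "config lives in the environment" rule, assuming a Python app. The variable names (`DATABASE_URL`, `QUEUE_URL`, `PORT`) and their defaults are hypothetical, picked only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    queue_url: str
    port: int


def load_settings(env=os.environ) -> Settings:
    """Read all deployment-specific config from environment variables."""
    return Settings(
        # Hypothetical defaults that suit a single local box.
        database_url=env.get("DATABASE_URL", "postgres://localhost:5432/app"),
        queue_url=env.get("QUEUE_URL", "amqp://localhost:5672"),
        port=int(env.get("PORT", "8080")),
    )


if __name__ == "__main__":
    print(load_settings())
```

The same code then runs unchanged on the beige box in the closet or in a container on someone's cluster; only the environment differs.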
they also had some rabbitmq-on-k8s system going that fell over during small tests because they couldn’t get k8s to actually scale it. (which then convinced them they needed k8s, and bigger nodes)
sigh
Back in the day, it would have required a whole procedure to buy that hardware, have it set up, etc. Now you can needlessly spend $10k per month with just a few clicks!
To be honest I wasn't hired to challenge their entire setup, only to make it more cost effective.
So I chose the most straightforward way I could think of that would allow us to come up with a cost effective setup that will be scalable, fault tolerant and simple to maintain later on.
It all probably started with such a single instance running Docker compose, but then over time it evolved into this setup.
The ideal setup I mentioned would have been also cost effective, scalable and resilient.
That's baffling to me, but that perspective is out there too.
I think this is one of those things that really depends on the use case. If they are performing expensive inference, I think having any queue is better than no queue. Going from a synchronous system to an asynchronous one is not easy and it's not something you would want anyone to be paged for once it starts to matter. Getting SQS/SNS up and running now could be a couple hours of work today and is practically free if your traffic is low.
Similarly I have a number of side projects that run extremely cheaply just using ECS and Fargate. I don't even think about Kubernetes really, it's just a PaaS to me that I'm shipping ARM binaries to. As a result I don't think very hard about autoscaling, failover, load balancing or deployment. A github action just pushes master to ec2 and everything "just works".
One is a queuing service, the other one is a VM.
So instead of using SQS that has $0 cost when there are no customers, you suggest I install, configure and run RabbitMQ on an EC2, to save $0 when there are no customers?
Or save $1 when I have 100 customers? SQS is dirt cheap.
The point of SQS or any other usage-based AWS _developer_ service compared to DIY is that you can be up and running in minutes at a minuscule cost.
I agree with you about over-engineering and building a distributed "microservices" architecture when you have no customers.
But I'll pick SQS any time of the day when I need queueing functionality to increase my developer velocity so I can focus on building value rather than wasting my life installing, configuring and running anything on EC2.
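A rough back-of-the-envelope sketch of that cost argument, in Python. The free allowance (1 million requests per month) and the price per million requests are assumptions that should be checked against the current SQS pricing page, and three requests per message (send, receive, delete) is a simplification.

```python
def monthly_sqs_cost(messages_per_month: int,
                     requests_per_message: int = 3,
                     free_requests: int = 1_000_000,
                     usd_per_million_requests: float = 0.40) -> float:
    """Estimate a monthly SQS bill under the assumed pricing above."""
    requests = messages_per_month * requests_per_message
    # Only requests beyond the assumed free allowance are billed.
    billable = max(0, requests - free_requests)
    return billable / 1_000_000 * usd_per_million_requests


if __name__ == "__main__":
    for messages in (0, 100_000, 10_000_000):
        print(messages, round(monthly_sqs_cost(messages), 2))
```

Under these assumptions, 100,000 messages a month stays inside the free allowance, so the estimate is 0.0.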
> when I need queueing functionality to increase my developer velocity so I can focus on building value rather than wasting my life installing, configuring and running anything on EC2.
SQS still requires configuration, which means you either need to use the (terrible) AWS console UI or spin up a whole Terraform/CloudFormation/CDK/etc stack, not to mention that merely connecting to it requires correctly setting up AWS IAM (so you don't use a key that gives access to your entire AWS account). Vim'ing the RabbitMQ config file in contrast doesn't seem so bad, and even just using a static hardcoded password means the worst an attacker can do is take down your queue instead of taking over your entire cloud infra.
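For reference, the IAM scoping mentioned above might look roughly like the following sketch: a policy document that only allows working with one queue. The queue ARN and account ID are hypothetical placeholders, and the exact action list is an assumption about what a typical producer/consumer needs.

```python
import json


def queue_only_policy(queue_arn: str) -> dict:
    """Build an IAM policy document limited to a single SQS queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:SendMessage",
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                # Scoped to one queue rather than "*".
                "Resource": queue_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:sqs:us-east-1:123456789012:jobs"
    print(json.dumps(queue_only_policy(arn), indent=2))
```

A key attached to only this policy can at worst disrupt that one queue, which is about the same blast radius as the hardcoded RabbitMQ password.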
I do think ddb and lambda hit a sweet spot for costs on ramping up. The rest, though, really struggle.
Elsewhere in the comments, there’s a suggestion that this kind of thing isn’t appropriate for “hobby projects” and early stage but I disagree. Those are the times when you really want something you can step away from without doing a disservice to your customers (i.e. letting packages go out of date and get vulnerable) and cost you as little as possible in a steady state so you can focus on acquiring customers and not worrying about fuddling around with the guts.
The ideal trade off is a single Kubernetes cluster with as much in the cluster as makes sense for the team and stage of the project. As you say, toss the app on a single node to start, but the control plane is tremendously valuable from the onset of most projects.
A startup that outgrows an EC2 server will be making enough money to hire more people to scale the system properly, beyond an initial design that traded away everything for development velocity.
Kubernetes is not the right tool for this startup. Kubernetes is what large, old-school non-tech companies use to orchestrate resources, because it’s easier to find someone that “knows k8s” (no one knows k8s unless they’re consulting) than it is to find someone that can build properly distributed systems (in the eyes of whoever is in charge of hiring).
Disney: We'd like to launch a new streaming service.
Consultant: Great! You have no customers right now so you can run it on a singleton EC2 instance until you outgrow that scale!
Disney: ...We expect 20 million people to sign up in the first week
I'm pretty sure "follow the forecast" is exactly what motivated that post.
In other words, the infrastructure is overkill for the initial forecast of customers.
They're not working for Disney.
Your comment is really pretty ignorant of how these tools interact. Using serverless primitives is the opposite of leaving nodes running for no reason.
It's not really surprising that AWS's K8S setup isn't great, and their own implementation ties in more closely with other services they offer. It's lock-in. AWS provides just enough K8S to tick the box on a spec sheet, but has little incentive to go beyond that.
> ECS is also relatively simple and not so far from their Docker-compose setup, but much more flexible and scalable. It also enables us to convert their somewhat stateful pets to identically looking stateless cattle that could be converted to Spot instances later.
Have you ever built something in ECS? I have, and it is missing HUGE SWATHS of the convenient functionality that EKS provides. It lacks the network effect of being a widely-used product, so searching for issues is a constant issue. It breaks and nobody knows how to help.
"Not far from their docker-compose setup..." What are you even talking about? ECS is massively more complex than docker-compose and the main similarity I see between them is that they both run docker. It's similar to docker-compose if you ignore the fact that you need permissions, load balancers, networking, etc. Which is the hard part, NOT running some containers on EC2, by the way.
It has its own bizarre and verbose container deployment spec that is less portable, less flexible, less feature-ful, and less widely used than EKS.
> ECS will also offer ECS container logs and metrics out of the box, giving us better visibility into the application and enabling us to right-size each service based on its actual resource consumption, in the end allowing us to reduce the number of instances in the ECS cluster once everything is optimized.
Something you also get with EKS. So half of the reasons you have claimed ECS was the right choice are now in the garbage.
What you DON'T get with ECS is awesome working-out-of-the-box open source software like External Secrets, External DNS, LetsEncrypt, the Amazon Ingress Controller, argo rollouts, services, ingresses, cronjobs... I could go on and on.
They are going to try to hire DevOps engineers, and they will all have to ramp up on (and likely complain about) ECS instead of having people walk in already prepared and ready to start implementing high quality software on a system they already know.
ECS is a deployment tool. Kubernetes is a dev-to-ci-to-prod tool, providing same environment for standard workload specs across the full development cycle, and a single way to inject common features into the standard workloads.
- Setting up certs (managed as TF)
- Setting up ALBs (managed as TF)
- Setting up the actual service definition (often done as a JSON, that is passed into TF)
Possibly other things I'm forgetting.
Some other things. It requires a *developer* to know about certs and ALBs and whatever else.
With EKS, this can all be automated. The devops engineer can set it up so that deploying a service automatically sets up certs, LBs etc. Why are we removing such good abstractions for a proprietary system that is *supposed* to be less management overhead, when in reality, it causes devs to do so much more, and understand so much more?
They introduced Terraform and dropped docker compose in favour of some Amazon proprietary container scheduler?
1 - It's simpler than K8s, but not that much simpler than your avg managed K8s offering
2 - It really locks you in the AWS ecosystem
3 - It is way less used than K8s or just running things on servers, so there are way less help / learning resources
I really don't see how using ECS is much better than EC2 + compose for small setups and this post didn't provide many good arguments to convince me.
I'd use it on day 1 (over EC2 + compose) just to avoid managing an OS or deployment infrastructure.
the bar for being "locked in" seems to drop further every day.
At work we use ECS Fargate, Aurora MySQL and Bitbucket pipelines to host a little over 100 client web applications. It takes about an hour to configure a new AWS account and staging/production environments for a new client using Cloudformation (and a number of manual steps) and the monthly AWS cost is around $100. There are cheaper ways and probably easier ways, but we feel like we have reached a good balance between stability, ease of use, cost and features. And we are not that worried about being tied to AWS.
Sub $15/mo to run your thing until you get real demand, yeah. But it's not new, the K8S shtick is coming from investors not tech people. And if it's coming from the tech people throw them out of the door.
Why are you cooking for 8000 people when 6 are coming over? Why are you building a kitchen to cook for 8000 people? Why are you renting space to fit 8000 people?
You need a table and maybe 6 chairs. Who knows, they might eat standing.
Not necessarily. If you need to deal with many containerized apps that are updated and deployed regularly, k8s is a really great tool.
As a rule of thumb, I'd say < 5 - no, > 20 - yes, and everything in between - up to you.
Place I worked at had a service running on K8s with, I think, 4 pods, and it got on average one hit every 2-3 seconds during office hours (and virtually none outside those.)
I think it got the HN hug of death
unfortunately this is a deal with the devil for vendor lock-in
You can do everything from the CLI with kubectl of course, but there are also a bunch of apps that will work with any K8S cluster:
https://medium.com/dictcp/kubernetes-gui-clients-in-2020-kub...
It's very nice to have a consistent interface across multiple cloud providers.
> The team didn't have much DevOps expertise in-house, so a Kubernetes setup, even using a managed service like EKS, would have been way too complex for them at this stage, not to mention the additional costs of running the control plane which they wanted to avoid.
The control plane cost makes sense, but I can't imagine learning Terraform to set up ECS is that much easier than learning YAML to configure k8s. Unless EKS is much harder to use than GKE.
Eventually EKS was built to satisfy customers that insisted these issues were just FUD from aws to lock customers into the aws infrastructure. However what I have seen since is a basic progression of: customer uses k8s on prem, is fanatical about its use. They try to use it in aws and it’s about as successful as on prem. Their peers squint at it and say “but wouldn’t this be easier with ECS/fargate?” K8s folks lose their influence and a migration happens to ECS. I’ve seen this happen inside aws working with customers and in three megacorps I’ve worked on cloud strategies for. I’ve yet to encounter a counter example, and this was sort of what Andy predicted at the time. I’m not saying there aren’t counter examples, or that this isn’t a conspiracy against k8s to get your dollars locked into aws.
On standards Andy always said that at some point cloud stuff would converge into a standards process but at the moment too little is known about patterns that work for standards to be practical. Any company launching into standards this early would get bogged down and open the door to their competitors innovating around them and setting the future standard once the time is right for it. Obviously not an unbiased viewpoint, but a view that’s fairly canonical at Amazon.
I mean..the customers are not wrong.
I do think that as organizations grow, the ability for components to be defined in smaller units without being enmeshed in a big-ass tf dependency graph is a big draw of the controller model. The flipside is this comes with accepting the operational overhead of k8s plus the attendant controllers/operators you're running and hiring/staffing accordingly. There are ways you can structure your terraform that avoid creating the tight coupling some folks don't like where you have to literally define the entire universe to change a machine image. Not to mention, there do exist tools that allow you to inspect and visualize tf state.
Right now, Terraform maximalism requires reproducible builds, which is not something most orgs can achieve.
Citation needed.
K8s has a whole bunch of footguns that people who don't want to manage infra can easily blunder into.
Terraform and ECS are not immature, and it's fairly simple to maintain, especially if they are just pushing updates without significant infra changes (i.e. bumping the container version).
> Engineering time is expensive
which is why ECS is probably better, because it's good enough for running a few containers that talk to a load balancer.
They will continue to make it more appealing to lock your software into their platform than to go with their thinner facilities for OSS, doing the minimum to keep up to date with trends in open source, just enough to lure you in and create “easier” paths until you can’t afford to leave.
We have this problem with Azure - sure it’s easier to get a knucklehead to push buttons and get an app running, but after years you’ll be scrambling to reduce costs. Good luck with that when all of your terraforms use Azure Resource Manager and all of your source code uses Azure Functions. Being stuck with microsoft/amazon and a team of engineers who spent their time learning vendor-specific skills instead of the open source tech that enables it, sounds awful.
Hahahaha
The AWS ecosystem has much of this baked-in. (Parameter Store, Certificate Manager, etc) Vendor lock-in is of course a concern, but for many, a theoretical one.
If you can choose an option that is going to be way less work even if it's "more complex" that is often the right choice as long as you understand what that complexity is and can pierce through the covers if necessary.
When I was at Rad AI we went with ECS. I made a terraform module that handled literally everything you're talking about, and developers were able to use that to launch to ECS without even having to think about it. Developers literally launched things in minutes after that, and they didn't have to think about any of those underlying resources.
A major benefit of k8s that is usually massively overlooked is its RBAC system and specifically how nice a namespace per team or per service model can be.
It's probably not something a lot of people think about until they need to handle compliance and controls for SOC 2 and friends, but as someone who has done many such audits, it's always been great to be able to simply show exactly who can do what on which service in which environment in a completely declarative way.
You can try to achieve the same things with AWS IAM, but the sheer complexity of it makes it hard to sell to auditors, who have come to associate "Terraform == god powers", and convincing them that you have locked it down enough to safely hand to app teams is... tiresome.
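As an illustration of the namespace-per-team model, here is a minimal sketch that builds a Kubernetes `Role` and `RoleBinding` as plain Python dicts. The role name `app-deployer`, the namespace and the group name are hypothetical, and the verbs chosen are just one plausible set for a team that deploys its own services.

```python
import json


def namespace_role(namespace: str, team_group: str) -> list:
    """Return a Role and RoleBinding granting a team access to one namespace."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "app-deployer", "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["apps"],
                "resources": ["deployments"],
                "verbs": ["get", "list", "watch", "update", "patch"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "app-deployer-binding", "namespace": namespace},
        "subjects": [
            {
                "kind": "Group",
                "name": team_group,
                "apiGroup": "rbac.authorization.k8s.io",
            }
        ],
        "roleRef": {
            "kind": "Role",
            "name": "app-deployer",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [role, binding]


if __name__ == "__main__":
    # Hypothetical namespace and group names.
    for manifest in namespace_role("payments-prod", "team-payments"):
        print(json.dumps(manifest, indent=2))
```

Because the whole grant is two small declarative objects scoped to one namespace, it is easy to put in front of an auditor.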
Why does the developer need to care about the certs and ALBs? The devops engineer you need to set up all those controllers could as well deploy those resources from Terraform.
As I showed in the diagrams from the article this application has a single ALB and a single cert per environment and the internal services only talk to each other through the rabbit MQ queue.
DNS, ALB and TLS certs could be easily handled from just a few lines of Terraform, and nobody needs to touch it ever again.
With EKS you would need multiple controllers and multiple annotations controlling them, and then each controller will end up setting up a single resource per environment.
The controllers make sense if you have a ton of distinct applications sharing the same clusters, but this is not the case here, and would be overkill.
Welcome to reality, where this is not the case.
I'm currently working at a company where we're using TF and ECS, and app specific infra is supposedly owned by the service developers.
In reality, what happens is devs write up some janky terraform, potentially using the modules we provide, and then when something goes wrong, they come to us cos they accidentally messed around with the state or whatever. DNS records change. ALB listener rules need to change.
Honestly, if they had said: "So instead we set up some bare-metal EC2 instances" I would be on-board.
It was definitely not about being contrarian but about offering first and foremost a more cost effective but still relatively simple, scalable and robust alternative to their current setup.
They have a single small team of less than a dozen people, all working on a single application, with a single frontend component.
Imagine instead this team managing a K8s setup with DNS, ALB and SSL controllers that each set up a single resource. I personally find that overkill.
Acme corp loves containers as much as everyone else. Containers provide great value. However, muddling around with docker/containerd/crio without some form of orchestration is just another path to a herd of fragile, neglected pet machines.
Acme corp is very different from the Big Tech world k8s came from. Acme corp doesn't have Linux kernel contributors and language developers and an IT payroll so large that the mundane devops people are lost in the noise. Acme corp must use what prevails and doesn't mystify. The "team" managing something is frequently one person, or less.
Acme corp ends up with a collection of pet VMs, all different. Lots of stuff is containerized. Some stuff isn't. Much of it is high-value: let one of those go down and an angry so-and-so will be on the horn right now, even if they haven't noticed for weeks. Most of it is low load: there will never ever be a world where these get reworked into scalable, stateless, distributed cloud apps.
How to get from a herd of pet VMs that happen to run containers (sometimes) to an orchestrated cluster of containers?
In my imagination the answer is something that looks like a mashup of Proxmox and docker-compose. It has the following features:
-- Orchestration: micro-VMs running containers scheduled across a cluster of nodes. The "micro-VM" term deserves some definition. I don't have a precise definition. I know Firecracker is too anemic and full featured VMs are too much. The micro-VMs of cloud-hypervisor are just about right. Above all "micro" just means simple, not necessarily small: a micro-VM that needs a lot of RAM and takes longer than 0.0003 us to start is fine.
-- Live migration: low-load, high-value applications need to stay up despite cluster node maintenance and despite never becoming candidates for re-engineering into cloud native applications. This feature is the #1 reason the VM part is necessary: live-migration is a native capability of KVM et al. that works well since forever, whereas containers (CRIU notwithstanding) can't be live-migrated.
-- Trivially simple support of network transparent block storage: iSCSI and other network block storage is rampant at Acme corp because it's cheap, reliable, easy and fast enough. Re-engineering everything for dynamodb or whatever isn't an option. Fortunately, because we're running a micro-VM with its own kernel that has native support for network block (The other #1 reason for the VM part) we get this for free.
-- Simple operation: if it imposes a bunch of concepts that one can't already find in docker-compose it's wrong. Acme corp doesn't have the depth to deal with more and can't find that depth even if it wanted to, which it doesn't. Grug Brained Devops: not stupid, just instinctually uninterested in unnecessary abstraction, opaque jargon terminology, overengineering and fads.
Anyhow, that's my sincere attempt to answer your question. Respectfully, if you think you know of a solution you're likely wrong: I've wormed into every corner of that which prevails and it doesn't exist at the moment. That's why I claim there is an opportunity. I'm happy to be proven wrong, but you'd have to go a long way.
(disclaimer: I’m part of the team)
famously, StackOverflow
AWS/GCP/Azure knowledge definitely helps when deploying there, but it's also not really necessary to get something running.
What do you suggest I do before I get my first customer?
- Blast 1000 emails in one go and pray upstream accepts it?
- Push these to a database and keep checking it with a CRON?
- Run RabbitMQ on an EC2 and push 1000 messages there?
- Implement SQS in "15 minutes" at $0 cost?
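Whichever option you pick, the shape is the same: put the messages on a queue and let a worker drain them at a rate upstream will accept. Here is a minimal in-process sketch using only the Python standard library. The `send` callable is a hypothetical stand-in for a real mail client, and an in-memory queue loses everything on a crash, which is exactly the gap a durable queue (SQS, RabbitMQ, a database table) is meant to close.

```python
import queue
import threading
import time


def run_worker(jobs: queue.Queue, send, per_second: float) -> None:
    """Drain the queue, pausing between sends to respect a rate limit."""
    interval = 1.0 / per_second
    while True:
        recipient = jobs.get()
        if recipient is None:  # sentinel: no more work
            return
        send(recipient)
        time.sleep(interval)


def enqueue_and_send(recipients, send, per_second: float = 50.0) -> None:
    """Producer side: enqueue everything at once, let the worker pace delivery."""
    jobs = queue.Queue()
    worker = threading.Thread(target=run_worker, args=(jobs, send, per_second))
    worker.start()
    for recipient in recipients:
        jobs.put(recipient)
    jobs.put(None)
    worker.join()


if __name__ == "__main__":
    delivered = []
    # delivered.append stands in for a real "send one email" call.
    enqueue_and_send([f"user{i}@example.com" for i in range(10)], delivered.append)
    print(len(delivered), "messages handed to the sender")
```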
At some point, it might make sense to move off for cost reductions, but tools like GCP Cloud Run (deploy dockerized app servers that scale dramatically better than k8s) can be really nice for a small team.
Remember, the comment I replied to said:
> As a software consultant myself, I'd probably stop the conversation right there and ask why they are building such a robust distributed system — SQS, SNS, etc — without any customers. Still want to be deployed in AWS? Toss the damn app on a single EC2 instance...
But in the article, it's pointed out that SQS and SNS would have been better choices at lower costs for low usage:
> When it comes to the application, if I had been involved from scratch, I would have recommended SQS and/or SNS for the message bus, which are free of charge at low utilization.
Basically, this company is in a pickle because they didn't have architecture experts from the beginning, and the development team started writing an application without much thought to areas where SRE and DevOps teams often get involved: scaling and cost optimization.
Which is another way to say that most startups seem to wait too long to hire DevOps/SRE teams because they are roles considered to be "cost centers:" work that is not directly contributing to the money-making business logic.
Sure, you might be lucky with the engineers you have, they might be savvy enough to wrangle a couple of hosts for you. But are they backed up? What's the disaster recovery procedure like? How do you rotate keys/passcodes, how do you audit who has access?
Unless you are doing something wrong, your biggest costs are engineering time.
> disadvantage of strongly coupling your infrastructure and often your code to AWS.
You can say the same thing about any infrastructure. Yes, you can migrate k8s from one physical host to another. But for that to be effective, you need to not be using any managed services. So that means you're on the hook for all the painful things like DB state and recovery, messaging systems, etc.
Then you think, as a business, what are you actually spending money to do? Maintain the code that makes you money, or maintain the system underneath it, just in case you might need to move to save opex that's almost certainly going to be less than one engineer?
98% of companies have no issue with scale. They have issues with availability, features, backups, speed. Exchanging 10% of an engineer's salary to never really have to deal with any of those issues is a good deal for most[1] companies.
[1] most, but not all. However unless you are getting close to spending 1 engineer in AWS fees, moving to self hosting is nothing but premature optimisation.
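The rule of thumb in the footnote can be written down as simple arithmetic. This is only a sketch: the bill and salary figures in the example are hypothetical, not numbers from the thread or the article.

```python
def cloud_spend_in_engineers(monthly_cloud_bill_usd: float,
                             annual_engineer_cost_usd: float) -> float:
    """Express the yearly cloud bill as a fraction of one engineer's yearly cost."""
    return monthly_cloud_bill_usd * 12 / annual_engineer_cost_usd


if __name__ == "__main__":
    # Hypothetical inputs: a $1,500/month bill and a $150,000/year engineer.
    ratio = cloud_spend_in_engineers(1_500, 150_000)
    print(f"Cloud spend is about {ratio:.0%} of one engineer")
```

By the footnote's reasoning, self-hosting only starts to look interesting as this ratio approaches 1.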
What you say may make sense for a large corporation with hundreds of developers from many teams, all sharing a single cluster, but remember this is a pre-revenue startup with a single dev team of less than a dozen people.
But then with a large cluster you will struggle with splitting the costs. In such scenarios I'd rather give each team its own AWS account and have some devops people set up everything from the landing zone.
In this particular case, every service is set up from less than 100 lines of Terraform, which includes Docker image build and push, as well as the task and service definition that deploys that docker image.
Yes, they need to handle Terraform, but it's really not so different from the previous Docker-compose YAML file, not to mention the way it would look if converted to K8s YAML.
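To show how close the two shapes are, here is a rough sketch that translates a compose-style service into an ECS container definition and wraps it in a task definition, using plain Python dicts rather than Terraform. This is not the Terraform from the article. The service contents, log group, region and the `memory_mib` key are hypothetical, and the EC2 launch type with bridge networking is an assumption.

```python
import json


def container_def_from_compose(name: str, service: dict,
                               log_group: str, region: str) -> dict:
    """Translate one compose-style service into an ECS container definition."""
    port_mappings = []
    for mapping in service.get("ports", []):
        # Compose short syntax is "HOST:CONTAINER".
        host_port, container_port = mapping.split(":")
        port_mappings.append({
            "hostPort": int(host_port),
            "containerPort": int(container_port),
            "protocol": "tcp",
        })
    return {
        "name": name,
        "image": service["image"],
        "essential": True,
        # "memory_mib" is a made-up key for this sketch, not a compose field.
        "memory": service.get("memory_mib", 256),
        "portMappings": port_mappings,
        "environment": [
            {"name": key, "value": str(value)}
            for key, value in sorted(service.get("environment", {}).items())
        ],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": log_group,
                "awslogs-region": region,
                "awslogs-stream-prefix": name,
            },
        },
    }


def task_definition(family: str, containers: list) -> dict:
    """Wrap container definitions in a minimal ECS task definition."""
    return {
        "family": family,
        "networkMode": "bridge",
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": containers,
    }


if __name__ == "__main__":
    # Hypothetical service, roughly what one compose entry might contain.
    worker = {
        "image": "example/worker:1.0",
        "ports": ["8080:8080"],
        "environment": {"QUEUE_HOST": "rabbitmq"},
    }
    container = container_def_from_compose("worker", worker, "/ecs/demo", "us-east-1")
    print(json.dumps(task_definition("demo-worker", [container]), indent=2))
```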
However IAM isn’t really for humans. It is just really hard to reason about roles programmatically. Some of the new minimal rights discovery from cloud trail analysis leads to an interesting pattern I’ve not seen a lot of : in lower environments permissions are wide open, but a capture of the required roles happens pre-prod and is used and tested against in preprod then promoted to production. This seems like a really useful pattern, and it exposes where your integration tests are incomplete.
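The capture-then-promote pattern described above could be sketched like this: collect the API calls a service actually made in a lower environment, then emit a policy that allows only those. The event records are hypothetical samples shaped like CloudTrail entries (`eventSource`, `eventName`). Mapping them straight to IAM action names is a simplification that does not hold for every service, and the `"*"` resource is a placeholder that should be narrowed before promotion.

```python
import json


def policy_from_events(events: list) -> dict:
    """Derive an allow-list policy from observed API calls."""
    actions = set()
    for event in events:
        # "sqs.amazonaws.com" -> "sqs"
        service = event["eventSource"].split(".")[0]
        actions.add(f"{service}:{event['eventName']}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # placeholder, narrow per resource before promoting
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical capture from an integration test run.
    observed = [
        {"eventSource": "sqs.amazonaws.com", "eventName": "SendMessage"},
        {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
        {"eventSource": "sqs.amazonaws.com", "eventName": "SendMessage"},
    ]
    print(json.dumps(policy_from_events(observed), indent=2))
```

Any call that fails in preprod under the generated policy points at a code path the integration tests never exercised.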
My thought in this space is to go with whatever is the least effort. There is no meaningful portability between cloud providers using anything right now. But if you don’t make your stuff baroque it’s also not hard to port between one provider and another from an infrastructure specification point of view. I think the “lock in” at the specification of infrastructure is a canard. Lock in happens at a much deeper level, in the integrations between dependencies inside the customer’s own infrastructure and the stored state. Having 1000 services across an enterprise integrated inside (aws|gcp|azure|oracle|on prem) makes it hard to switch anywhere else from a basic connectivity, rights, identity, etc. level - so hard that it degenerates into why “hybrid” cloud infrastructures basically fail. But that means switching is either all or nothing, which is impractical, or you bite off this integration problem, which is apparently impossible or at least absurdly hard. Then you’re also left with stored state, which is heavy and difficult to move, let alone expensive, but also the challenge of moving the state over with the state managing services without downtime or loss of data is also pretty hard. Hard enough that you can’t expect every team owning the 1000 services to be able to do it.
So, you can pick k8s and run an abstraction on an abstraction, or not, but when it comes time to break your lockin, k8s won’t buy you anything.
Where are you getting this from? If you use k8s as a base layer, lifting and shifting your infra or even running multi-cloud is not much harder than bringing up a new region on the same cloud.
While it's not impossible to do with a single instance, you can spend a lot of time shaving that yak. It's reasonable to pay a bit more to have that stuff handled for you in a robust way.
* automatic service account for each workload
* automatic service to service auth to 3rd party services
* the audit log
* role based access control
* well defined api
* the explain subcommand
* liveness and readiness probes
* custom resources
The list goes on, but the big ones for a small team just getting started are workload identity and security.
K8S is basically another answer to Conway’s Law. Every startup I’ve worked at switched to it because then the infrastructure could map more closely to the code. Not unlike microservices at a higher level.
The old-skool approach is depending on a team of SREs or sysadmins to provision hardware for you and basically handle the deployment, which K8S plus container images basically abstract away.
Not to say that dedicating resources to platform development (k8s style) isn’t a time sink when you’re trying to build product and find a fit in the market.
“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure — and which mirrors the skills of its key creators.”
K8s is designed to solve Google problems. Your startup will not have Google problems. Your startup will have Pinterest problems, or Gitlab problems, or Reddit problems, at which point you do not need K8s; you need someone who knows infra (which I expect devs working on distributed systems to understand).
Using K8s in a startup context is a sign of conformist thinking, detached from any critical aspect.
This assumes that K8s won't require a "team of SREs". My experience is you need the same amount of SREs to maintain Kubernetes, probably more, because now you have a complicated control plane, a networking nightmare, then you layer that on top of resource-contention issues, security issues, cloud provider compatibility issues, buggy controllers, the list goes on.
The only thing K8s is great for is the maintainers, the consultants, and highly experienced SREs that inevitably have to be hired to clean up the mess that was created. This is my experience working in two similar sized environments, one with >1M containers, and another with an equivalent scale of bare metal servers.
applications running on bare metal don't have resource contention issues? or security issues?
You're right that K8S is an answer to Conway's Law: our people don't get along or can't collaborate or we have too many of them, so we will split them into team per service and force them to collaborate over network interfaces. Likewise, the infrastructure people will communicate with the other teams using Dockerfiles.
If you tried to replicate them on the same cloud provider, end to end, it would cost far more than they charge.
For small projects the cloud is not needed, and a lot of effort that won't pay off. The only case where it'll pay off is if you "go viral" and rapidly need to increase capacity.
This is not free. While the cloud helps with scaling, your application still needs to support it. So there's a development cost to it, even when starting.
Then, if you scale, the cost makes it almost a necessity to rapidly get back off of the cloud ...