Cloud, Why So Difficult?

104 points by inssein 3 years ago | 151 comments

The complexity of the cloud exists because the cloud vendors allow a user to do advanced things if the user understands how. Using AWS, GCP, and Azure as Infrastructure-as-a-Service (IaaS) means that there's no easy mode.

If you want easy (or easier) mode, you'll have to use a Platform-as-a-Service (PaaS).

The major cloud vendors might have problems with quirky designs and poor documentation, but beyond that is necessary complexity.

You want a high-availability website that allows user-uploaded files and does asynchronous task processing? You're probably going to have to get familiar with servers, load balancers, queues, and object storage, at a minimum.

You want it all to be secure? You're going to have to configure network rules/firewalls and set up lots of access policies.

There's no free lunch.

cosmotic 3 years ago | |

How many totally-different-yet-common sets of requirements for user uploads or task processing do you think there are that necessitate the ultimate flexibility and complexity? I suspect vendor lock-in is a more likely cause of the complexity.

Spivak 3 years ago | | |

> I suspect vendor lock-in is a more likely cause of the complexity.

I think people give large organizations credit for being mustache twirlingly evil when the collective consciousness that makes up AWS is simply not smart enough to be this evil. If AWS had the coordination to do this the product would be better.

It's much more likely that the complexity is the result of a huge number of teams working independently and integration complexity being 2^n. Like AWS had one good transformative idea to make coordination easier which is to be API first but that only forces superficial consistency.

nightpool 3 years ago | |

"You want a high-availability website allows user-uploaded files and does asynchronous task processing? You're probably going to have to get familiar with servers, load balancers, queues, and object storage, at a minimum."

Really? I disagree. I could probably build that with Rails and Heroku in an afternoon, after creating a single S3 bucket and an access key for presigned POST. AWS has "necessary complexity" in the same way a giant hole in your head improves your brain's cooling potential. (i.e. maybe, in some very rare cases, but you almost certainly don't need it)
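For reference, the presigned POST part is roughly one call. A minimal sketch, assuming boto3's S3 client; the helper name, the size limit, and the expiry are illustrative choices, not anything prescribed here:

```python
def make_upload_form(s3_client, bucket: str, key: str,
                     max_bytes: int = 10 * 1024 * 1024,
                     expires_in: int = 300) -> dict:
    """Return the URL and form fields a browser needs to POST a file straight to S3.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        # Reject uploads larger than max_bytes on the S3 side.
        Conditions=[["content-length-range", 0, max_bytes]],
        ExpiresIn=expires_in,
    )


# Usage (needs boto3 and credentials; the bucket name is hypothetical):
#   import boto3
#   form = make_upload_form(boto3.client("s3"), "my-uploads", "avatars/123.png")
#   # form["url"] and form["fields"] go into the HTML upload form.
```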

esperent 3 years ago | | |

Now scale it to twenty million concurrent users.

0xedd 3 years ago | |

All of these are basic requirements. Yet they're available for use only after AWS specific proficiency. Why do I need an AWS setting to permit DB access when the DB had that feature already? I don't need the extra layer of complexity AWS puts on most things.

Tangent - Their docs are abysmal. Written like a novel which I'm meant to cross reference to their SDK.

throwbadubadu 3 years ago | |

Only partly agree; definitely not all the complexity there is necessary. Part is vendor lock-in, and another part is their own grown complexity from rewiring/wrapping their own stuff for reuse in different forms, plus grown complexity that maybe made sense at some point but grew too much, as almost everywhere.

fulafel 3 years ago | |

Are you saying that there is already only little incidental complexity and friction, and it's mostly essential to the problem domain?

What evidence or convincing arguments are there for your position?

For me it seems clear that this is not the case; the needless friction in, e.g., the CDK dev experience seems ridiculous.

re-thc 3 years ago | |

> The complexity of the cloud exists because the cloud vendors allow a user to do advanced things if the user understands how.

The complexity of the cloud exists because it wasn't designed very well and is all reactive.

You look at AWS and it feels like things are getting tacked on because there's "demand" instead of thinking of what the platform should look like and building it out. Every service is also built by a different team, and the teams don't talk to each other. There's no consistency anywhere.

> If you want easy (or easier) mode, you'll have to use a Platform-as-a-Service (PaaS).

It's been blurred for a long time, so how do you make this distinction? They all have PaaS features / services.

> There's no free lunch.

You had to pay to begin with so what's free?

quickthrower2 3 years ago | |

Yeah wait till you need to do SOC2 and friends! Azure has tools to help you with this.

sacnoradhq 3 years ago | |

The "free lunch" was sold in the form of the hybrid cloud orchestrator. These pretended to make all clouds look the same, but were more shallow layers of abstraction that didn't add much value.

nathants 3 years ago | | |

to be fair it was a brilliant play. my sympathies go out to all those who ended up holding the bag.

nathants 3 years ago | |

this is true, but doesn’t stop mistakes from being made or criticism from being useful.

Nihilartikel 3 years ago |

This thought hits home for me:

> When I started programming, I used Borland C++. It used to take about 100ms to compile and run a program on an IBM PC AT machine (TURBO ON). An average iteration cycle in the cloud takes minutes. Minutes! Sometimes dozens of minutes!

I'm a fast-feedback fan myself, and my weapons of choice in refuge from a dark decade of c++ are the Python notebook and Clojure REPL. With that as it is, the lurching tedium of cloud development (infrastructure especially) makes me want to pull my skin off.

What is so galling about it is that, for dev purposes, almost none of these SaaSes and cloud services are really so 'big' that they couldn't be run on a beefy local workstation for development. The galling reason that I have to wait N minutes for terraform or cdk or whatever to rebuild some junk and deploy it to a bunch of nigh un-remote-debuggerable-without-firewall-shenanigans lambdas and docker containers is commercial moat-keeping for the services.

At least Azure and GCP put some token effort into local emulators of their services. AWS work has to rely on the valiant but incomplete efforts of LocalStack if they want a fast and disposable way to test infra.

swader999 3 years ago | |

Yeah I miss the days when I used to run a web site for a niche crowd of about ten thousand users on my bedroom pc. Now I'm pushing buttons on heroku and getting grief for dyno overages.

nathants 3 years ago | |

stop using terraform et al, start using boto3 from your python notebook.

patching a lambda zip takes seconds. it’s done before you can alt tab and curl.
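a minimal sketch of that loop, assuming boto3's lambda client and a function that already exists. the helper names and "my-function" are made up for illustration, and note this replaces the whole deployment package rather than editing one file in place:

```python
import io
import zipfile


def build_zip(files: dict[str, str]) -> bytes:
    """Pack {filename: source} into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, source in files.items():
            zf.writestr(name, source)
    return buf.getvalue()


def patch_lambda(lambda_client, function_name: str, files: dict[str, str]) -> dict:
    """Upload a freshly built zip as the function's new code.

    `lambda_client` is expected to be boto3.client("lambda").
    """
    return lambda_client.update_function_code(
        FunctionName=function_name,
        ZipFile=build_zip(files),
    )


# usage from a notebook cell (needs boto3 and credentials):
#   import boto3
#   patch_lambda(boto3.client("lambda"), "my-function",
#                {"handler.py": "def handler(event, context):\n    return 'ok'\n"})
```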

lemarchr 3 years ago |

I don't understand why there's so much negativity here. From my cursory perusal of the docs this looks like a simplified, vendor agnostic re-imagining of something like CDK, with cool tooling including visualisations and out-of-the-box support for local dev. Where's the beef?

You think cloud is too expensive or unnecessary? Fair enough, this tool is not for you.

You think cloud infra is necessarily complex because you need to support <insert use case here>. You're right! This tool is not for you (yet?).

You don't need this because you already know <CDK / Terraform / whatever abstraction is already in your repertoire>? I agree, the juice is probably not worth the squeeze to learn yet another tool.

Are you approaching cloud for the first time or have been managing existing simple infra (buckets, queues, lambdas) via ClickOps and want to explore a feature constrained (hence easy to grok) Infrastructure as Code solution? Maybe give this a look.

While it's still early days, I suspect there will be many who will find this useful, and congratulate the authors for their efforts!

toiletduck 3 years ago | |

> simplified, vendor agnostic

I just don't see this being true. Being "cloud agnostic" likely means it's an incredibly leaky abstraction where at the first sign of trouble, you're going to have to understand winglang + the specific provider API. Any IaC product requires you to intimately understand what it's actually doing if you care about security and performance. Just because it's a managed service doesn't mean you get to ignore all its implementation details, right?

All the cloud providers give you a function as a service, or a nosql database, or a file bucket: ignoring all the nuance as an agnostic is at a minimum leaving optimisation on the table and more likely dangerous and expensive, surely?

shaiber 3 years ago | | |

Hey, I understand where you're coming from, but I still think there is room for abstraction when done right. I wrote about it here: https://docs.winglang.io/faq/why-cloud-abstraction

fulafel 3 years ago | |

To add, CDK is quite distant vs this - it's AWS proprietary, painfully slow testing cycle, lots of limitations & faults leak in due to being based on CloudFormation, has nothing like the compiler plugins, etc. Plus the stuff in https://docs.winglang.io/faq/why-a-language

zmmmmm 3 years ago |

The annoying thing about using cloud infra is finding that all your skills and knowledge have to be relearned N times over for N different cloud vendors for a huge array of their services, mostly to do basic things that you already know how to do anyway in traditional environments.

The fact they all offer similar but subtly different versions of every type of product and that cross platform tools like Terraform etc have some ability to paper over these only makes it worse. (Your google cloud bucket is just like your S3 bucket right? Until it's not). When I rant about platform independence people think I have a philosophical objection to lock-in, but it's really much more basic than that. I just don't have time to learn thousands of vendor specific APIs, bugs, constraints etc on top of the perfectly good built up knowledge I have from 25 years of working with software systems already. I am busy using all that time and brainspace trying to keep up with the fundamental knowledge that is actually important.

nathants 3 years ago | |

this.

don’t learn gcp if you know aws. don’t learn android if you know iphone. don’t learn ruby if you know python.

instead use that time to build interesting things. these tools are much more similar than different, and their differences are inconsequential.

wackget 3 years ago |

"Cloud is too difficult, you have to learn tons of stuff to use it!"

"BTW here's the new product I'm selling which requires you to learn a new cloud-oriented programming language and has its own CLI and has diagrams like [this](https://docs.winglang.io/assets/images/arch-f803472c761aa198...) on its introduction page!"

The cognitive dissonance is overwhelming...

magikstm 3 years ago |

The cloud is renting someone else's computer at a way higher price than it would cost you to use your own.

If it makes things difficult, you shouldn't be using it.

It is overhyped and it sucks for most use cases.

siliconc0w 3 years ago |

The difficulty with cloud is Joel's rule: "All non-trivial abstractions are leaky." They just abstract complexity, and eventually the abstraction breaks and you actually need to know some amount of Linux, networking, security, or distributed systems engineering to fix it.

The easiest way to not get bitten by this is to avoid the abstractions and keep it simple as long as possible. Most apps can probably do fine with a single beefy box and a local sqlite database - this can likely scale vertically indefinitely with moore's law and still probably have less downtime than if you relied on all the fancy cloud technology.

namaria 3 years ago | |

What I don't get in these discussions is, why not just target containers and after that do the least amount you need to have your container online somewhere? At most you'd need to do Kubernetes if it gets complex enough, but by then you have manifests that would work on any cluster anyway, no matter which cloud you use. Am I being too naive?

nathants 3 years ago | |

learning cloud is not necessarily using cloud for all things.

cloud adds capability to any engineer.

if i’m on coffeeshop wifi with my low power laptop, and i need to do something intense like compile linux, i’m sol.

unless i know aws. then i can open a new terminal, spin up a massive spot instance for 19.27 minutes, get that done, then self destruct[1].

being able to test lambda to s3 io, or ec2 to s3 io, with the same ease one uses grep and sed, is for great good. also it’s fun.

1. https://github.com/nathants/mighty-snitch/blob/master/kernel...

timtam33 3 years ago |

All big tech platforms have made their riches mostly this same way over the last ~40 years: solve 80-90% of the problem in a super simple, slick manner that is priced competitively. Then as customers build out the unique parts that match their needs (the last 10-20%), bleed them dry with features that are more expensive and establish lock-in.

timw4mail 3 years ago |

I'm sorry, what's the advantage to the cloud again?

This just reminds me why I just run (my personal) web apps on the server in my basement: it's actually simpler.

I really think the worst part of programming is dealing with the development environment.

tempfortwitt90 3 years ago |

I currently run a micro-saas product on a $4 a month namecheap server using the LAMP stack. It runs fast for all 34 companies, or 400+ employees, using it.

I've looked into moving it to Google cloud or AWS and it just seems daunting. Honestly, I use ftp, cpanel, and phpmyadmin.

Is there a way to get this product into the 'cloud' in case it grows, easily?

cpnz 3 years ago | |

You'll never match the price, but if you use some small RDS and EC2 instances you won't be paying much. Less if you use Lambda and API Gateway.

Moving lamp stacks is a piece of cake. If you don't want to do it you could find a freelancer pretty easily.

namaria 3 years ago | | |

>Less if you use lambda and api gateway.

There's the slippery slope into vendor lock-in via combinatorial explosion of complexity

matus_congrady 3 years ago |

(Please take my opinion with a grain of salt, as I might be biased - I'm a founder at a startup that solves a very similar problem).

The cloud (and by "cloud" I mostly mean AWS) in general is indeed insanely complex. Not only is it complex and hard to use for dedicated and trained DevOps/Cloud experts, it's even more overwhelming for developers wanting to just deploy their simple apps.

This statement is, in my opinion, almost universally accepted - during our market research, we've interviewed ~150 DevOps/Cloud experts and ~250 developers that have been using AWS. Only ~2.5% of them have said that the complexity of AWS is not an issue for them.

That being said, I understand that AWS has to be complex by design. Not only does it offer ~250 different services, but the flexible/configurable way it's designed simply requires a lot of expertise and configuration. For example, the granularity and capabilities of AWS IAM are unparalleled. But it comes at a cost - the configurational and architectural complexity is just beyond what an average AWS user is willing to accept.

An alternative to the cloud complexity are the PaaS platforms (such as Heroku or Render). But they also have their disadvantages - mostly significantly increased costs, lower flexibility and far less supported use-cases.

At https://stacktape.com, we're developing an abstraction over AWS that is simple enough for any developer to use, yet allows you to configure/extend anything you might need for complex applications. Stacktape is like a PaaS platform that deploys applications/infrastructure to your own AWS account.

We believe that Stacktape offers the perfect mix of ease-of-use, productivity, cost-efficiency and flexibility. It can be used to deploy anything from side projects to complex enterprise applications.

I'll be very happy to hear your thoughts or to hear any feedback.

nathants 3 years ago | |

those 150 cloud experts have a serious conflict of interest wrt cloud complexity.

those 250 developers are likely entrapped by the cloud experts.

this is fine, and is a rich market that should be served.

regardless, perceived complexity or generated complexity are not the same as actual complexity. all of these complexities are real, some are optional.

erulabs 3 years ago |

Applications developers often tell me about (to quote another post here) "the lurching tedium" of cloud infrastructure development.

Growing up in a datacenter, opening tickets and checking them weekly, hoping for the vendor to finally ship the right backplane; datacenter engineering used to take weeks, months, years. Waiting 30 seconds for terraform plan to check 120 resources which correspond to thousands of pounds of metal and enough wattage to blow up the city I live in... doesn't seem too bad. That said, I understand where you javascript folks are coming from with your iteration loops, but still, you've got to understand: it's so easy now.

Leaky abstraction, sure. But it's always great to see innovation in cloud infra.

mkl95 3 years ago |

> It doesn't make sense that every time I want to execute code inside an AWS Lambda function, I have to understand that it needs to be bundled with tree-shaken dependencies, uploaded as a zip file to S3 and deployed through Terraform. Or that in order to be able to publish a message to SNS, my IAM policy must have a statement that allows the sns:Publish action on the topic's ARN. And does every developer need to understand what ARNs are at all?

The Terraform AWS provider is a very thin abstraction. If your needs are not too specific, there are probably a few higher level abstractions out there that you can use. This is one of the main reasons PaaS are so popular.
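For reference, the IAM statement the quoted passage describes is small once written down. A minimal sketch of the policy document built as plain data; the helper name and the region, account ID, and topic name in the ARN are made up for illustration:

```python
import json


def sns_publish_policy(topic_arn: str) -> dict:
    """Build an IAM policy document allowing sns:Publish on a single topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


# Hypothetical ARN: arn:aws:sns:<region>:<account-id>:<topic-name>
topic_arn = "arn:aws:sns:us-east-1:123456789012:orders"
print(json.dumps(sns_publish_policy(topic_arn), indent=2))
```

Higher-level tools mostly exist to generate this kind of statement for you from the fact that some function publishes to some topic.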

acdha 3 years ago | |

It’s also weird not to know about terraform modules. There are a lot of things you _can_ configure because there are a lot of things people need to configure but if you’re using something like https://github.com/terraform-aws-modules/terraform-aws-lambd... it’s only a couple of lines of config for a Lambda.

CharlesW 3 years ago | |

Coincidentally, the author’s startup is solving the problem they complain about for most of the article. I think they’re trying to be Vercel?

bob1029 3 years ago |

I am finding "cloud" to be a pleasant experience at the moment.

We are building a B2B service in Azure using Az Functions & Az SQL Database as the primary components. That's about it. We figured out you can "abuse" Az Functions to serve all manner of MVC-style web app (in addition to API-style apps) by using simple PHP-style templating code. Sprinkle in AAD authentication and B2B collaboration and you have a really powerful, secure/MFA auth solution without much suffering. Things like role enforcement are as simple as taking a dep on ClaimsPrincipal in the various functions.

The compliance offerings are really nice too. Turns out if you use the compliant services without involving a bunch of complicated 3rd party bullshit, you wind up with something that is also approximately compliant. For those of us in finance, this is a really important factor. 10 person startups don't have much bandwidth for auditing in-house stacks every year. If you do everything "the Azure way", it is feasible for you to grant your partners (or their auditors) access to your tenant for inspection and expect that they could find their own way around. If you do it "my way" you better be prepared to get pulled into every goddamn meeting.

I am starting to wonder if not all clouds are made equal anymore. We also have some footprint in AWS (we used to be 100% AWS), but it's really only for domain registration and S3 buckets these days. GCP doesn't even fly on my radar. I've only ever seen one of our partners using it.

theonething 3 years ago |

If you want easy and simple, use Render, fly.io, etc. Those cover 90% of the cases K8s be damned.

tracker1 3 years ago | |

Similar for Deno deploy and Cloudflare Workers. In the end, the layers you get on AWS are not the same for all providers. It varies a lot from platform to platform in terms of "cloud".

I'm working on a relatively complex cloud solution centered around AWS Lambda, SQS/SNS and DynamoDB with many different lambda endpoints and isolated databases. It works, but it's incredibly hard to test. The fortunate thing is there's a system in place to stand up an environment at the PR level, but even that takes almost an hour to test/build/deploy/test/deploy for the PR and every commit after the PR is made. Local runs are sorely lacking. And I can only imagine the number of environments with similar issues.

I've been playing with Cloudflare Pages/Workers and CockroachLabs (CockroachDB Cloud) on a personal/side project and it's quite a bit different. Still in the getting groundwork done and experimenting phase, but wanting to avoid the complexity of the work project while still providing enough scale to not have to worry too much about falling over under load.

Not every application needs to scale to hundreds of millions of users, and it all comes at a cost that may not be worth the price of entry. The platform at work is at well over 1.6 million in terms of the story/bug/feature numbers at this point... it's a big, complex system. But working in/on it feels a few steps more complex than it could/should be. It'll absolutely scale horizontally and is unlikely to fall over under load in any way... but is it really worth it with the relatively slow turn around, in what could have still used dynamo, but the service layers in a more traditional monolith with simply more instances running?

I have to say, I'm somewhat mixed on it all.

topspin 3 years ago | |

> K8s be damned

I've noticed the frequency of K8s headlines has diminished. (Very) roughly two years ago you never saw a HN front page without one or more K8s headlines. I suspect it has saturated the market that it appeals to.

cramjabsyn 3 years ago |

Sorry but I’m struggling to read with the 20 characters per line formatting on mobile

shaiber 3 years ago | |

Hey, try this version: https://docs.winglang.io/blog/2022/11/23/manifesto

HL33tibCe7 3 years ago |

Website is unreadable on mobile

indigodaddy 3 years ago | |

I was going to comment “website, why so difficult (on mobile)” before I saw your comment

sshine 3 years ago | | |

I was thinking “Responsive CSS, why so difficult?”

shaiber 3 years ago | | |

Sorry about that, please use this link: https://docs.winglang.io/blog/2022/11/23/manifesto

iskander 3 years ago |

>To be honest, give me the developer experience of the 90s. I want to make a change, and I want to be able to test this change either interactively or through a unit test within milliseconds, and I want to do this while sitting in an airplane with no WiFi, okay? (we didn't have WiFi in the 90s).

I'm going to make an impossible request and ask that any readers ignore everything else they know about "crypto", but...this is one of the things that feels right in EVM development compared with normal cloud applications. Especially with frameworks like Foundry, unit tests for distributed applications run very quickly on local copies of the whole production environment. It's a lot more fun than anything which touches AWS.

Obviously, there are some major downsides (such as Ethereum being comparable to a late 70s microcomputer in computing power). But the model of a single well specified execution + storage protocol might be worth porting over to non-financialized cloud application development.

eropple 3 years ago | |

> But the model of a single well specified execution + storage protocol might be worth porting over to non-financialized cloud application development.

To a first approximation, it exists, and it's called Cloudflare Workers (and KV).

If I had to bet money, mine would be on the bet that Workers represents an early example of what will be in-the-main development in a decade.

beckford 3 years ago |

Was in agreement until I drilled down to the following statement on the Wing GitHub page and in the docs (https://docs.winglang.io/faq/why-a-language#very-cool-but-wh...):

“In existing languages, where there is no way to distinguish between multiple execution phases, it is impossible to naturally represent this idea that an object has methods that can only be executed from within a specific execution phase.”

This is not true. Several languages (Haskell, OCaml, F#, Scala, etc) allow you to define and use monads. Granted, monads are not something many developers know about … but it may make sense to learn about them before writing a new language.

Otherwise, this is a great read.

JoelMcCracken 3 years ago | |

Great point. This also reminds me of discussion I saw somewhere else about programming languages with execution capabilities support for security, the author saying that no languages supported it, and a commenter saying that this was basically the same as an effects system.

I'm personally a professional Haskell programmer and quite like it, but I think we are circling a core notion: There are many problems with having programming languages where any code can do literally anything at all, and being able to restrict it is extremely powerful.

Constraints liberate, liberties constrain. You can always loosen strictures, but once loosened, they are extremely hard to reintroduce.

shaiber 3 years ago | |

Thanks, we'll rephrase to say that it is impossible in most languages, not all. BTW, the company that is behind the project is called Monada (https://monada.co), so we've heard of monads :) Would be happy to hear what else you think about the language

carterschonwald 3 years ago |

I hate to say this, but this is straight up rediscovering monads.

They have a notion of phases of execution which are different execution contexts with an ordering. And yes most languages don’t have a decent facility for expressing this or staged computation. Let alone a notion of computation phases that map to distributed systems state.

KptMarchewa 3 years ago |

I would love something like networking-as-a-service. My ignorant ass does not understand the specifics related to it. I would love the option to specify that service A should be able to call service B regardless of IP schemes, peering, firewalls, service discovery and 1000 other layers.

ecliptik 3 years ago | |

Tailscale [1] can do this. I've always had a limit of how much networking I can grok, and Tailscale basically lets me run a 90s style LAN across clouds and my local network like they're all connected to the same switch and VLAN. No port forwarding, firewall rules, or subnet management.

It can be made even more secure with some relatively uncomplicated ACLs sprinkled in too.

1. https://tailscale.com/

JohnMakin 3 years ago | |

The problem with your request is there's a million ways to do that, some much less secure than others - I could satisfy your request by putting service B's security group wide open to 0.0.0.0/0, but then every other possible service could also reach it.
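To make the difference concrete, here is a small sketch using only Python's standard `ipaddress` module. It models a single ingress rule as a CIDR range, which is a simplification of a real security group, and the addresses are illustrative:

```python
import ipaddress


def rule_admits(cidr: str, source_ip: str) -> bool:
    """True if an ingress rule for `cidr` would let traffic from `source_ip` through."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)


def is_open_to_world(cidr: str) -> bool:
    """True if the rule covers every address (a /0 prefix such as 0.0.0.0/0)."""
    return ipaddress.ip_network(cidr).prefixlen == 0


service_a = "10.0.1.17"    # hypothetical private address of service A
stranger = "203.0.113.9"   # some unrelated host on the internet

# A scoped rule reaches service A and nobody else.
print(rule_admits("10.0.1.0/24", service_a), rule_admits("10.0.1.0/24", stranger))
# The wide-open rule "works" for service A too, but also for everyone else.
print(rule_admits("0.0.0.0/0", service_a), rule_admits("0.0.0.0/0", stranger))
```

Both rules satisfy "A can call B"; only one of them says anything about who else can.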

jsz0 3 years ago |

Utilizing an infinite amount of computing resources is probably always going to be an inherently difficult thing. It's a task made more difficult by all the snake oil salesmen promising perfect, easy solutions. The only workable solution is to find out what works for everyone else and, even if it doesn't fit your needs perfectly, figure out how to make it work for you. Never underestimate the power of strength in numbers. If the entire industry is utilizing a technology or paradigm you can either get on board or get left behind.

sisve 3 years ago |

I agree so much with the article about cloud being too complicated and fast feedback being important. I wish them the best of luck.

PaaS solutions and not IaaS is also a solution for many.

No code / low code being a solution for someone else.

It's not too many days since my solution was on HN. Windmill.dev is really something special.

And by mine I do not mean that I have any affiliation with Windmill, just that it solves my problem: quick iterations, including UI building.

But it's not a low code platform either. I would call it a code platform for developers.

Thaxll 3 years ago |

I don't understand what that solution brings when you have Terraform? Also linking your code with your infra code, who thought this was a good idea?

Last question: how does the simulator actually work? Is it one of those cases where it tries to emulate some high-level concept, but then your prod code breaks because the simulator was 40% accurate?

shaiber 3 years ago | |

I wrote about comparison to Terraform here: https://docs.winglang.io/faq/differences-from-other-solution...

About the simulator, it is a functional simulator for testing and interacting with the business logic of the application. There are other solutions, like LocalStack, to simulate the non-functional parts too.

Thaxll 3 years ago | | |

I mean, your comparison is weird: in one line you create a bucket, but what are the permissions exactly? Why is there no region?

Edit: after reading: "Generated automatically from intent", well it's a red flag for me.

brunkerhart 3 years ago |

What complexity are we talking about? If your app needs queues, databases or shared storage, it is because of how you designed it. Now just imagine you’re doing this yourself: install servers, os, software, configure integrations, do patches, upgrades, repair hardware. Would it be any simpler?

quickthrower2 3 years ago | |

The comparison should be this vs. aws rather than this vs. bare metal … but I get your point.

Apps need some kind of persistent state. That requires some thought and annoyances to deal with however you do it. There is no leakproof abstraction. Takes 5s to retrieve the data? Get digging!

nathants 3 years ago |

the core of aws is actually pretty good. it’s just wrapped in several layers of nutshells which aren’t needed except by the equally thick enterprise.

aws is also inescapably imperative. it’s how the api and everything behind the scenes work.

aws is gonna have a lot of nonsense best practices, both 1st and 3rd party, that you have to aggressively ignore.

if you can come to peace with these three truths, aws is great. i use it like this:

https://github.com/nathants/libaws

for those trying to improve, the best aws docs are currently the go sdk and gopls.

paulddraper 3 years ago |

The part I don't get is why in 2023 AWS still pushes me to VPC and NAT my servers like it's 1990.

It's just so unnecessary. Greybeards moved to the cloud but never changed.

nathants 3 years ago | |

a vpc is non-optional. a nat is very optional.

frogperson 3 years ago |

There is no way I'm trusting anything built with NPM or Javascript to run my cloud infra.

efxhoy 3 years ago |

We decided to use DynamoDB as a cache for our expensive-to-compute statistics from the data warehouse so we can show them to users without putting a big read load on the data warehouse. DynamoDB scales really well right? And big statistics tables don't belong in the main app database. I'll build the statistics in our postgres data warehouse and use a little ruby script I wrote to push them from postgres to Dynamo with batch_write_item, shouldn't take long.

After spending a couple of days in terraform (I'm no infra expert) creating roles to assume, cross account permissions, modifying my script to assume those roles, figuring out the ECS run-task api syntax and some other things I'd rather forget about I kicked off the jobs to copy the data and left for the weekend. Sunday cost alert email put that thought out of my head: I just spent 8k USD on writing 2 billion rows to two tables because I misunderstood how Dynamo charges for write units. I thought I was going to spend a few hundred (still a lot, but our bill is big anyway) because I'm doing batch write requests of 25 items per call. But the dynamodb pricing doesn't care about API calls, it cares about rows written, or write capacity units, or something. OK, so how do we backfill all the historical data into Dynamo without it costing like a new car for two tables?
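A rough sketch of the arithmetic behind that surprise, assuming standard (non-transactional) on-demand writes, where each item consumes one write request unit per 1 KB of item size no matter how many items share a BatchWriteItem call. The price per million units is a placeholder to replace with the current figure for your region, and the item sizes are made up; this is not an attempt to reconstruct the actual bill:

```python
import math


def write_request_units(item_bytes: int) -> int:
    """Write request units for one item: one unit per 1 KB, rounded up, minimum one."""
    return max(1, math.ceil(item_bytes / 1024))


def backfill_cost_usd(items: int, item_bytes: int, usd_per_million_wru: float) -> float:
    """Estimated on-demand cost of writing `items` items of `item_bytes` each.

    Batch size is deliberately not a parameter: batching saves round trips,
    not write units. Secondary indexes, which add their own writes, are ignored.
    """
    return items * write_request_units(item_bytes) * usd_per_million_wru / 1_000_000


ASSUMED_PRICE = 1.25  # USD per million write request units; placeholder, check current pricing

print(backfill_cost_usd(2_000_000_000, 500, ASSUMED_PRICE))    # items under 1 KB
print(backfill_cost_usd(2_000_000_000, 3_000, ASSUMED_PRICE))  # items just under 3 KB
```

The cost scales with item count and item size, so 25 items per call costs the same as 25 separate calls.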

Apparently you can create new dynamodb tables via imports from S3 (can't insert into existing tables though) for basically no money (the pricing is incomprehensible but numbers I can find look small). Now I just need to write my statistics to line-delimited dynamodb-flavored json in S3 (statements dreamed up by the utterly deranged). Every value has to be wrapped in an object whose key is the type you want the value to have. A little postgres view with some CTEs to create the dynamodb-json and use the aws_s3.query_export_to_s3 function in RDS Postgres and I had a few hundred GB of nice special-snowflake json in my S3 bucket. Neat!
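For anyone who hasn't seen the format, here is a minimal sketch of the type-as-key wrapping in Python rather than a postgres view. It covers only a handful of types, the field names are hypothetical, and the "Item" wrapper per line is my understanding of what the S3 import expects, so verify against the import docs before generating a few hundred GB of it.

```python
import json
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a Python value in a DynamoDB-JSON type descriptor."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # checked before numbers: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers are carried as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_import_line(row: dict) -> str:
    """Render one row as a single line of newline-delimited DynamoDB JSON."""
    item = {k: to_attribute_value(v) for k, v in row.items()}
    return json.dumps({"Item": item}, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical statistics row.
    print(to_import_line({"user_id": "u-1", "views": 42}))
```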

But the bucket is in the analytics account, and I needed the dynamo tables in the prod and staging accounts. More cross account permissions, more IAM. Now prod and staging can access the analytics bucket, cool! But they aren't allowed to read the actual data because they don't have access to the KMS keys used to encrypt the data in the analytics bucket.

OK, I'll create a user managed KMS key in analytics and more IAM policies to allow prod and staging accounts to use it to decrypt the data. But the data I'm writing from RDS is still using the AWS managed key, even after I set up my aws_s3_bucket_server_side_encryption_configuration in terraform to use my own managed key. Turns out writes from RDS to S3 always use the S3 managed key, no one cares about my aws_s3_bucket_server_side_encryption_configuration. "Currently, you can't export data (from RDS) to a bucket that's encrypted with a customer managed key." Great. So I need to manually (yes I could figure out the aws api call and script it, I know) change the encryption settings of the files in S3 after they've been written by RDS to my own custom key. And now, 4 hours of un-abortable dynamodb import jobs later, I finally have my tables in prod and staging in DynamoDB.
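For the record, the scripted version of that re-encryption step could look roughly like the sketch below: copy each object onto itself while requesting SSE-KMS with the custom key. It assumes a boto3 S3 client is passed in, that every object is small enough for a single copy_object call (larger objects need a multipart copy), and that the caller is allowed to use the key. The bucket, prefix and key names in the usage comment are hypothetical.

```python
def reencrypt_prefix(s3, bucket: str, prefix: str, kms_key_id: str) -> int:
    """Copy every object under `prefix` onto itself with SSE-KMS.

    `s3` is expected to behave like boto3.client("s3").
    Returns the number of objects rewritten.
    """
    count = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=bucket,
                Key=obj["Key"],
                CopySource={"Bucket": bucket, "Key": obj["Key"]},
                ServerSideEncryption="aws:kms",
                SSEKMSKeyId=kms_key_id,
            )
            count += 1
    return count


# Usage (hypothetical names, needs boto3 and credentials):
#   import boto3
#   reencrypt_prefix(boto3.client("s3"), "analytics-exports", "stats/", "alias/my-key")
```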

Now I just need to figure out the DynamoDB query language to actually read the data in the app. And how to mock that query language and the responses from dynamo.

At least I'm learning a lot...

nathants 3 years ago | |

lots of learning! also sounds like a bit of pre-existing aws state.

for round two, try:

- spin up a new subaccount to ensure you have total control of its state.

- data goes in s3 as jsonl/csv/parquet/etc.

- lambdas on cron manage ephemeral ec2 for when heavy lifting is needed for data ingress, egress, or aggregation.

- lambdas on http manage light lifting. grab an object from s3, do some stuff, return some subset or aggregate of the data.

- data granularity (size and layout in s3) depends on use case. think about the latency you want for light and heavy lifting, and test different lambda/ec2 sizes and their performance processing data in s3.
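a minimal sketch of the "light lifting" lambda from the list above, in python for brevity. the bucket name, field names and event shape are assumptions (an http-style event with queryStringParameters), not anything from the setup described here. boto3 is imported lazily so the aggregation can be tried locally without aws.

```python
import json
from collections import defaultdict

BUCKET = "example-stats-bucket"  # hypothetical bucket name


def fetch_object(bucket: str, key: str) -> bytes:
    """Read a whole object from S3 (boto3 imported lazily for local testing)."""
    import boto3
    return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()


def aggregate_jsonl(raw: bytes, group_field: str, value_field: str) -> dict:
    """Sum `value_field` per `group_field` over newline-delimited JSON."""
    totals = defaultdict(float)
    for line in raw.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def handler(event, context, fetch=fetch_object):
    """HTTP-triggered handler: grab one object, return an aggregate of it.

    `fetch` is injectable so the handler can be exercised without S3.
    The "country" and "views" fields are hypothetical.
    """
    params = event.get("queryStringParameters") or {}
    raw = fetch(BUCKET, params["key"])
    body = aggregate_jsonl(raw, "country", "views")
    return {"statusCode": 200, "body": json.dumps(body)}
```

whether this stays "light" depends on object size, which is the granularity point above: measure it with your own data.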

lambda is a supercomputer on demand billed by the millisecond.

ec2 spot is a cheaper supercomputer with better bandwidth on a 30 second delay billed by the second.

bandwidth to s3 is high and free within an AZ, for ec2 and lambda.

bandwidth is so high that you are almost always bottlenecked on [de]serialization, and then on data processing. switch to go, then maybe to c, for cpu work.

dynamodb is great, but unless you need compare-and-swap, it costs too much.

showdeddd 3 years ago |

Cloud is not difficult, they just charge too much per API request.

namaria 3 years ago | |

Putting a network in the middle of your system, and making everything go through API requests (or RPC, shudders), is almost never a good idea. If you have planet scale systems and the budget to hire enough bright engineers and still turn record breaking profits, sure. If not, you're just draining value from your bottom line to the vendor's.

    bring cloud;

    let queue = new cloud.Queue(timeout: 2m);
    let bucket = new cloud.Bucket();
    let counter = new cloud.Counter(initial: 100);

    queue.addConsumer(inflight (body: str): str => {
      let next = counter.inc();
      let key = "myfile-${next}.txt";
      bucket.put(key, body);
    });