A Decade of Dynamo (allthingsdistributed.com)
Last year I built out a recommendation engine for my company; it worked well, but we wanted to make it real-time (a user would get recommendations from actions they made seconds ago, instead of hours or days ago). I planned a 4-6 week project to implement this and put it into production. Long story short: I learned about DynamoDB and built it out in a day of dev time (start to finish, including learning the ins and outs of DynamoDB). The whole project was in stable production within a week. There has been zero downtime, the app has seamlessly scaled up ~10x with consistently low latency, and it all costs virtually (relatively) nothing.
The flip side: Dynamo gets expensive and it gets expensive quick, and being a custom API (and, indeed, a very different way to think about datastores) makes migration difficult.
It's great to use, if you understand the tradeoffs. Just make sure you understand them before you make the leap.
DynamoDB's pricing scales sublinearly with volume; if it starts getting expensive it was an initial misuse of DynamoDB that got obvious with scale. There are a lot of factors that go into whether you should use DynamoDB and how you implement it. I recommend anyone who is considering using it very carefully understand this page first: http://docs.aws.amazon.com/amazondynamodb/latest/developergu...
Of course, as you scale this can be less true, so it all depends on the application.
Lots of good memories of time spent with him, and one of the sad aspects for me of leaving Amazon.
His blog writings are really interesting. If you haven't already, I suggest you search the archives, there are several hidden gems there.
Can you even imagine? I want to know what his direct report structure looks like.
- [0] https://hackernoon.com/the-problems-with-dynamodb-auto-scali...
Also, if you need to scan a ton of items to assemble your desired result, you should re-think using DynamoDB as a whole.
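To make the scan point concrete, here is a minimal sketch using a toy in-memory table, not the DynamoDB API. The `ToyTable` class and its `items_read` counter are made up for illustration; the only assumption carried over from DynamoDB is that a query touches the items under one partition key while a scan touches every item and filters afterwards.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a partitioned key-value table (hypothetical, not DynamoDB)."""

    def __init__(self):
        self._partitions = defaultdict(list)
        self.items_read = 0  # counts every item the "storage layer" touches

    def put(self, partition_key, item):
        self._partitions[partition_key].append(item)

    def query(self, partition_key):
        # Touches only the items stored under one partition key.
        items = self._partitions.get(partition_key, [])
        self.items_read += len(items)
        return list(items)

    def scan(self, predicate):
        # Touches every item in the table, then filters.
        results = []
        for items in self._partitions.values():
            for item in items:
                self.items_read += 1
                if predicate(item):
                    results.append(item)
        return results


table = ToyTable()
for user_id in range(1000):
    for n in range(5):
        table.put(user_id, {"user_id": user_id, "event": n})

table.items_read = 0
by_query = table.query(42)
query_cost = table.items_read

table.items_read = 0
by_scan = table.scan(lambda item: item["user_id"] == 42)
scan_cost = table.items_read

print(query_cost, scan_cost)
```

Both calls return the same five items, but the scan reads the whole table to find them. If your access pattern looks like the second call, the key design probably doesn't fit the workload.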
The main problems faced are not the ability to scale or reach performance benchmarks or keep data safe. They are operational, and primarily problems of infrastructure complexity and management. Oh, and having developers architect and manage the operations of a really freaking huge service is a bad idea. (No offense intended - those developers don't want to be woken up in the middle of the night either)
The only downside is we do find ourselves sometimes implementing relational DB functionality at the application level to compensate for DynamoDB's "flexibility." Postgres is still the go-to for data that is relational in nature. But man, letting Amazon worry about hosting and scaling is also pretty awesome...
Yep, this is A-grade crazy, and exactly my point. I would question if it's "sometimes", or "actually almost all the time, now that we think about it, there's not much that we CAN do with DynamoDB without writing application level database functionality."
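For anyone wondering what "application level database functionality" tends to look like in practice, a rough sketch: a hand-written inner join over two key-value collections. The data, the field names, and the `join_orders_with_users` helper are all hypothetical, and plain dicts stand in for the tables.

```python
# Two "tables" modeled as plain Python structures (hypothetical data).
users = {"u1": {"name": "Ada"}, "u2": {"name": "Linus"}}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u1", "total": 12},
    {"order_id": "o3", "user_id": "u3", "total": 99},  # dangling reference
]


def join_orders_with_users(orders, users):
    """Inner join done by hand; in SQL this would be a JOIN on a foreign key."""
    joined = []
    for order in orders:
        user = users.get(order["user_id"])
        if user is None:
            # Nothing enforces referential integrity, so the app must decide
            # what to do with orphaned rows. Here they are silently dropped.
            continue
        joined.append({**order, "user_name": user["name"]})
    return joined


rows = join_orders_with_users(orders, users)
for row in rows:
    print(row["order_id"], row["user_name"], row["total"])
```

Every one of these helpers is code a relational database would have given you for free, which is roughly the complaint above.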
A production DB without backups is unthinkable. It takes just one human mistake to erase tons of data. Consistent, regular backups are a must-have for any production system.
No, sorry, it was Memcached and the Bigtable paper that popularized the term "NoSQL". Although many NoSQL databases trace back to the 60s [1], those were the ones that "served as a catalyst" for the term.
[1] http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.h...
The phrasing “served as a catalyst” seems right — it doesn’t imply the only catalyst.
Actually I'm not right: http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.h...
Other than a dirty js config or a prototype store this db is useless.
One of my prior gigs was pushing a billion data points a day through DynamoDB without it breaking a sweat. We were paying for it, too--but it was there and it worked.
I stand by my comment that DynamoDB is a joke wrapped in a thick layer of marketing crap.
You can read more about our ACID transactions here: https://fauna.com/blog/consistent-transactions-in-a-globally...
It is the 10th anniversary of Dynamo as a CS milestone.
It feels like using a relational model (with SQL) is fundamentally different in that you're designing how you're storing your data based on how it relates to other data. It's certainly a whole lot easier to design for, but it makes me wonder if it is possible to close that gap in terms of "easy to design / SQL / vertically scalable" and "serverless / cheap / horizontally scalable".
[1]: https://aws.amazon.com/blogs/aws/new-auto-scaling-for-amazon...
https://thedailywtf.com/articles/The-Query-of-Despair
You do this on your own server, slowness and bad performance are the result (but it may never, or very rarely get called). You do it on dynamo, a $10k bill may be the result.
The only way you could be "surprised" with a $10k bill is if you set up autoscaling for it with your upper limits (which it requires you to choose) high enough to reach $10k. And then you'd have to forget that you did that.
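The upper limit you pick for autoscaling effectively caps the bill, and the arithmetic is simple enough to sketch. This assumes provisioned capacity billed per unit-hour; the function name and the two prices below are placeholders for illustration, not real AWS rates, so plug in the current numbers for your region.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost_ceiling(max_read_units, max_write_units,
                         read_price_per_unit_hour, write_price_per_unit_hour,
                         hours=HOURS_PER_MONTH):
    """Worst-case monthly bill if the table sits at its autoscaling maximum all month."""
    hourly = (max_read_units * read_price_per_unit_hour
              + max_write_units * write_price_per_unit_hour)
    return hourly * hours


# Placeholder prices, NOT real AWS rates.
EXAMPLE_READ_PRICE = 0.0001
EXAMPLE_WRITE_PRICE = 0.0005

ceiling = monthly_cost_ceiling(1000, 1000, EXAMPLE_READ_PRICE, EXAMPLE_WRITE_PRICE)
print(f"worst case: ${ceiling:,.2f} per month")
```

Running the numbers before setting the maximums is the easy way to make sure a runaway query can only cost as much as you already agreed to.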
But I agree, I also don't like running databases billed by load. The risk of costly bugs is just too high.
Please think very carefully before architecting your app with S3 as a makeshift-database. S3 would be a valid option if you don't care about millisecond latency; don't require safe updates; never expect your application to scale past 100 requests per second; and don't have multiple query patterns for the same data (unless you're okay with several redundant copies of the same dataset). Consider just about any other database solution if this does not hold true.
Edit: as other commenters have noted, you can perform a read after write on a new key.