PlanetScale Insights: Advanced query monitoring (planetscale.com)
So far my usage has been minuscule in comparison to the PS limits. That said, I had been hoping for better tools to identify "problem queries" early instead of just when the billing cycle comes up and so I'm happy to see work in that direction.
Just a random anecdote about PlanetScale:
A few weeks ago I realized their databases did not have the timezone database loaded into them and it wasn't something I could do myself. I needed this so I could use `CONVERT_TZ` to convert from UTC to the user's TZ (for report aggregation). I reached out to support and in about a week they had added it to their roadmap, shipped it, and turned on the new feature for my DBs. They have been a joy to work with so far and I encourage you to give them a shot, especially if you are on Aurora Serverless (V1 or V2).
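For anyone wondering why the timezone tables matter for report aggregation, here is a minimal Python sketch of the same idea: shift UTC timestamps into the user's zone before bucketing by day. It is not the MySQL query itself; the function name `daily_counts` and the sample timestamps are made up for illustration, and it assumes the host has the IANA tz database available to `zoneinfo`.

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def daily_counts(utc_timestamps, user_tz):
    """Count events per calendar day as seen in the user's time zone."""
    tz = ZoneInfo(user_tz)
    counts = Counter()
    for ts in utc_timestamps:
        # Treat the naive timestamp as UTC, then convert to the user's zone.
        local = ts.replace(tzinfo=timezone.utc).astimezone(tz)
        counts[local.date().isoformat()] += 1
    return dict(counts)


# Hypothetical events stored as naive UTC datetimes.
events = [
    datetime(2022, 5, 25, 23, 30),
    datetime(2022, 5, 26, 3, 30),
    datetime(2022, 5, 26, 12, 0),
]
report = daily_counts(events, "America/New_York")
```

Grouping by the raw UTC date would put the second event on May 26, while for a New York user it still belongs to the evening of May 25.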
I see the documentation and graphs, but can't tell whether PlanetScale provides a query interface into this data.
Single user systems do okay with reports like the ones I see in the link, where you can actually go in and drill-down into specific details in there.
It would be much more awesome for the DBA-style user if the reporting data was actually just loaded into another database/table with join schemas to query the data exactly as the report did. That, of course, assumes the DBA is a consultant type who is dropped in to fix cost overruns rather than the app developer going over their queries again.
In my last job, I built something of this sort (Hive has a protobuf SerDe + a special table named sys.query_data), so that I could connect up a CDSW Jupyter notebook and narrow down queries with a python program + a loop.
Of course, the queries themselves were also customer-paid queries, but it was much more flexible + a bunch of canned reports did most of the work when moving it across customers.
But before it was baked into Hive/CDW, it was actually a syslog parser which fed into a sqlite db, which is almost exactly the same (but mostly aimed at solving txn locking/conflict checking across hundreds of queries touching the same informatica audit log table).
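A rough sketch of what that kind of parser-into-sqlite setup can look like, in case it helps. The log line format, the table name `query_log`, and the helper names are all hypothetical here and are not the original tool:

```python
import re
import sqlite3

# Hypothetical log line format (an assumption, not a real syslog schema):
#   <timestamp> query_id=<id> table=<name> duration_ms=<n>
LINE_RE = re.compile(r"^(\S+) query_id=(\S+) table=(\S+) duration_ms=(\d+)$")


def load_log(lines):
    """Parse log lines into an in-memory sqlite table, skipping malformed ones."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE query_log ("
        "ts TEXT, query_id TEXT, table_name TEXT, duration_ms INTEGER)"
    )
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            ts, query_id, table_name, duration = match.groups()
            rows.append((ts, query_id, table_name, int(duration)))
    conn.executemany("INSERT INTO query_log VALUES (?, ?, ?, ?)", rows)
    return conn


def hot_tables(conn, min_queries=2):
    """Return (table, query count, total ms) for tables hit by many queries."""
    cur = conn.execute(
        "SELECT table_name, COUNT(*), SUM(duration_ms) "
        "FROM query_log GROUP BY table_name "
        "HAVING COUNT(*) >= ? ORDER BY COUNT(*) DESC, table_name",
        (min_queries,),
    )
    return cur.fetchall()


sample = [
    "2022-05-26T10:00:00Z query_id=q1 table=audit_log duration_ms=120",
    "2022-05-26T10:00:01Z query_id=q2 table=audit_log duration_ms=80",
    "2022-05-26T10:00:02Z query_id=q3 table=orders duration_ms=15",
    "malformed line",
]
conn = load_log(sample)
contended = hot_tables(conn)
```

Once the data is in a table, the "python program + a loop" part is just running variations of that query from a notebook.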
this is the fourth day in a row of planetscale ads^H^H^H blog posts being on the hn front page. as i mentioned on yesterday's thread, innodb_rows_read is known to be buggy. regardless, by design it includes cached rows. terrible thing to base billing on. real cloud providers base it on i/o instead since this is a more reasonable metric of "use"
planetscale's fork of mysql-server adds only a single commit, which exposes rows_read in an extra place. this from a company that keeps talking about "building a database" https://github.com/planetscale/mysql-server
The trickier part is orchestrating the ongoing management of that across a large dynamic fleet. And in this case, it was much more than simply loading the tables but about using them to support importing databases into PlanetScale: https://github.com/vitessio/vitess/pull/10102
I'll link to my other comment on the billing issue: https://news.ycombinator.com/item?id=31509240
We've had to do some other changes to our MySQL fork as well that will show up there, but we'd love to not have any patches! We'd love to keep the patch set minimal (just as Amazon certainly does with RDS and Aurora). And I would certainly argue that Vitess, which is what we build PlanetScale around, is a meaningful piece of technology that pairs with MySQL to make a great database: https://vitess.io. You're of course free to disagree — and I wish you all the best as you work to build something great in the future.
https://aws.amazon.com/dynamodb/pricing/
Their team is awesome, I requested a couple features in the CLI and they were there within a few hours. Support is responsive and the sales team was super helpful getting everything running and migrated.
your first two examples are nosql. third example charges by data size processed, not by rows!
rows is a weird metric since some tables have tiny rows, some have huge
Those nosql options (probably the most popular in the world) also have the issue that row sizes are different, and if you're super cost conscious, you can change your architecture to take advantage of it. For example with Planetscale, you could store a lot more in JSON columns instead of other tables to reduce costs if that was your primary objective.
Is your frustration that you'd like to use Planetscale or a managed Vitess, but you are worried about locking yourself into a pricing model that you don't think will work for you?
This is a pretty major limitation. Part of the reason to use Vitess is to scale out. It is often very valuable to have a small number of root elements in a star schema and to have foreign keys which ladder up to them.
https://vitess.io/docs/13.0/user-guides/vschema-guide/advanc...
When you're choosing your sharding keys, you want to design it where the bulk of your operations happen on a single shard, often a tenant/customer id. That guarantees that all customer data lands on a single shard, with FKs across every table you want.
We're running on ~40 shards across 6 keyspaces, and there are very few cases where we can't use FKs.
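To make the tenant-as-sharding-key idea concrete, here is a toy Python sketch. It is not how Vitess vindexes actually hash keys; `shard_for`, `shards_touched`, the shard count, and the sample rows are all invented for illustration.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for(tenant_id, num_shards=NUM_SHARDS):
    """Map a tenant id to a shard with a stable hash (illustrative only)."""
    digest = hashlib.sha256(str(tenant_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched(rows):
    """Return the set of shards a batch of rows would land on."""
    return {shard_for(row["tenant_id"]) for row in rows}


# Rows from different tables that all belong to tenant 42.
tenant_rows = [
    {"table": "customers", "tenant_id": 42, "id": 1},
    {"table": "orders", "tenant_id": 42, "id": 10},
    {"table": "invoices", "tenant_id": 42, "id": 99},
]
single_tenant_shards = shards_touched(tenant_rows)
```

Because every table is routed on the same `tenant_id`, all of a tenant's rows share one shard, which is what lets the FKs between those tables stay shard-local.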
That being said, depending on the data you're working with, you may be fine with that trade-off.
This seems hard to do in Vitess or PlanetScale given the technically READ UNCOMMITTED isolation [1] when cross shard, and, in scaled deployments, it will still require experimental 2PC [2] cross-shard transactions. So, like, yeah, if you had serializable isolation, then transactions might save you so long as your code isn't buggy, but, literally the reason the system doesn't implement them is because it doesn't have isolated transactions.
[1]: https://news.ycombinator.com/item?id=22170416#22177783
[2]: https://vitess.io/docs/13.0/reference/features/two-phase-com...
and for the record, despite planetscale staffers repeatedly denigrating rds (your competitor) on hn, aurora’s patch set is not “minimal”
i do think vitess is cool for what it's worth. i just think your managed db product has bananas billing and also is horrendously overhyped, and your ceo’s responses to criticism are very reminiscent of theranos or wework’s responses to same
Pointing out that surely Amazon would like to keep their patch set to a minimum (there's a high cost in maintaining custom patches as you upgrade MySQL) is in no way implying that their patch set is small. Minimal means the minimum required for what you need, rather than being some point of pride.
I'm certainly not on here bashing any other offerings. Between the two of us, I only see one person trolling / bashing. :-) With that, I will leave you to your opinions which you are of course free to have. Best of luck.
anyway i gather the answer to my question is that no, there are no other examples of managed sql dbs that bill the way you do. my complaint is this is inherently not transparent because it violates user expectations. users try comparing to io based providers and fail to understand the pricing math comparison (on io pricing 1 read = many rows) or caching implications (on io pricing, cached rows don't count as io)
as for denigrating rds, look to your ceo's past hn comments. i would link to it, but last time i did that i got flagged, despite it being a recent thread that i was directly participating in
According to AWS you're paying for chunks of CPU and memory on a per second basis: https://aws.amazon.com/rds/aurora/faqs/
It's hard to imagine that the CPU capacity is measured in anything other than CPU cycles (time slices of physical capacity) — in the same way it's hard to imagine that the memory capacity is measured in anything but bytes. But whatever, I don't care. It's cool, good for them. The point was... you don't think you're paying for reads of records that are cached? I give up, I fail to see how this can really be a good faith discussion.
I don't know how all other serverless database offerings do pricing. What difference does it make? They're all different. As a user, you want it to be based on your usage and to be fairly and reasonably priced while also being easily audited and predictable. Those are the key properties I would care about.
I honestly cannot see how you could be missing the point by this much and still be operating in good faith so I'll for real, for real stop. :-)
i originally said pricing for other managed sql dbs, not specifically “serverless” ones. we both know that is just a marketing term anyway
with ACUs the point is you configure min and max, and your cluster scales up/down based on a cpu utilization threshold. so, sure reading from memory uses cpu cycles — but a large cached read is incredibly unlikely to bump you over a scaling threshold which affects your bill, unless you’re doing some huge heavy sort operations
another key point is aurora serverless v2 does not scale down to 0 acu. you are always paying a predictable small amount for your base cpu and ram. minor increases in cpu usage literally do not impact your bill at all, which is why i do not believe your argument makes sense regarding cached reads.
edit to add: the reason this matters for monetary cost of ELT/ETL is it often involves very large reads. if your jobs only extract recent/changed data, this will very likely be in buffer pool, and cost way less with io pricing than with your row based pricing. clear?
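a rough sketch of that math in python, counting billable units only. the numbers (rows per page, cache hit percentage) are made-up assumptions for illustration, not any provider's real figures or prices:

```python
def billed_rows(rows_read):
    """Row-based billing: every row read counts, cached or not."""
    return rows_read


def billed_ios(rows_read, rows_per_page, cache_hit_pct):
    """I/O-based billing: only pages that miss the buffer pool count.

    rows_per_page and cache_hit_pct are illustrative assumptions.
    """
    pages = -(-rows_read // rows_per_page)  # ceiling division
    miss_pct = 100 - cache_hit_pct
    return -(-(pages * miss_pct) // 100)  # ceiling of pages * miss ratio


# Hypothetical extract job: 1M recent rows, 100 rows per page, 90% cached.
rows = 1_000_000
row_units = billed_rows(rows)
io_units = billed_ios(rows, rows_per_page=100, cache_hit_pct=90)
```

with these assumed numbers the job is 1,000,000 billable rows under row pricing but only 1,000 billable reads under io pricing, and it drops to zero if everything is already in the buffer pool. the unit prices differ too, so this only shows why the two models are hard to compare directly.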