Faster MySQL with HTTP/3(planetscale.com) |
When I found it (you can thank Theo), I was shocked this isn't what AWS' serverless DB offering already was.
I agree with what the author mentioned in another comment: not dropping performance for non-serverless use cases is a decided win. I deeply appreciate the work being done to enable serverless applications, so thank you for the work and thank you for sharing your findings OP.
I'm also curious about the comparison to the MySQL Classic protocol - it would be interesting to have an "as-close-as-possible" benchmark between Aurora MySQL "Serverless V2" and Planetscale. Even if it was as naive as "Given $100 of credits, how many reads can you do at what average latency".
https://dev.mysql.com/doc/dev/mysql-server/latest/page_mysql...
Similarly, since support is so low, it didn't make a lot of sense to double down and support it when we could do what works for us.
The developer experience with PlanetScale has been my favorite so far, I use it with a few Next.js apps and the "scaling" part has been the easiest as I haven't had to think about a burst of traffic b/c PlanetScale handles it without me lifting a finger.
There's a ton of work to optimize TCP including hardware offloads that help push higher throughput. Basically we're talking library + kernel + hardware changes. It might be possible to get some of these into QUIC, but since QUIC is most compelling for WAN traffic, there's probably not much incentive for that.
In our case though, lots of our customers and lots of use cases do communicate over a WAN, and potentially large geographic distances. I think having this as an option is super interesting to see what we can do with it in the future.
I feel like I live under a rock because I just don't get what's so great.
Planetscale is doing for Vitess what Fastly does for Varnish, if that makes sense. Or maybe, what Datadog does for statsd? It's a hosted platform around an awesome and complex-to-maintain bit of open source software.
1) use persistent connections, let the OS handle them and tweak it to allow that (on both the connecting server and the mysql server). And never close the connection on the application side. (This could lead to potential deadlocks, but there are ways around it, like closing bad connections to clear thread info on mysql).
2) run the whole thing in a transaction, simply begin transaction or autocommit if allowed (same thing)
Doing so, when you are done rendering the content, flush it and send the correct signal to nginx or Apache to say it's done (like PHP's fastcgi_finish_request when working with FPM), and then run your commit. Obviously, use this only when you can safely disregard failed inserts.
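A minimal sketch of the "respond first, commit afterwards" ordering described above. It uses Python's sqlite3 as a stand-in for MySQL, and the `page_views` table, the `handle_request` function, and the `events` list are all hypothetical names chosen for illustration. It only shows the order of operations, not a real FPM setup.

```python
import sqlite3

def handle_request(conn, events):
    """Render a response, signal completion, then commit the pending write."""
    # sqlite3 opens a transaction implicitly before the INSERT.
    conn.execute("INSERT INTO page_views (path) VALUES (?)", ("/home",))
    body = "<h1>Hello</h1>"
    # Stand-in for flushing output and calling something like PHP's
    # fastcgi_finish_request(): the client has its response at this point.
    events.append("response_sent")
    # The commit now runs off the client's critical path. If it fails, the
    # client never hears about it, so this only suits writes you can lose.
    conn.commit()
    events.append("committed")
    return body

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (path TEXT)")
events = []
handle_request(conn, events)
```

The trade-off is the one already noted: the client sees a fast response, but a failed commit goes unreported.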
This is definitely ideal, but one thing that you can't entirely control is the server side or what's between. Sometimes your connections get interrupted, and it's not possible to maintain a connection forever. Yes tho, this is the ideal thing you should do with a connection pooler.
> 2) run the whole thing in a transaction, simply begin transaction or autocommit if allowed (same thing)
This shouldn't really help with latency. Being in a transaction doesn't reduce latency. If we're being pedantic, it would likely increase latency due to having to execute a BEGIN and COMMIT query, which is typically two more round trips, one per query.
I think what you're getting at is something like pipelining, where you can send multiple queries in one request, and get multiple results back. This is technically supported over the mysql protocol, but isn't very widely adopted in practice.
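To make the round-trip argument concrete, here is a rough model, assuming every statement costs exactly one network round trip and ignoring server execution time. The function names and the numbers in the usage note are illustrative assumptions, not measurements.

```python
def round_trips(num_queries, in_transaction=False, pipelined=False):
    """Count client/server round trips for a batch of queries (simplified)."""
    # A transaction adds two statements: BEGIN and COMMIT.
    statements = num_queries + (2 if in_transaction else 0)
    # Pipelining sends every statement in one request, so one round trip.
    return 1 if pipelined else statements

def total_latency_ms(trips, rtt_ms):
    """Network latency only: round trips multiplied by the round-trip time."""
    return trips * rtt_ms
```

Under this model, 3 queries at a 20 ms round-trip time cost 60 ms on their own, 100 ms when wrapped in BEGIN and COMMIT, and 20 ms if all five statements could be pipelined.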
Why?
If you're not running stuff on other people's computers you're very much in control.
What am I missing?
> If you're not running stuff on other people's computers you're very much in control.
The other tests are already measuring a warmed-up connection.
There's also a reason why I intentionally coupled "connect + select 1" as the test: I wanted to make it as close of a comparison as possible. If it was simply a "connect", our HTTP API would be even more favorable, since connecting doesn't do authentication or anything like that the way the mysql protocol does.
Which leaves serverless and scripts (your other example from the blog post). Which, let’s be honest, are both edge cases at this point in time. Maybe that’ll change, but today it’s true.
Twenty year SRE here backing up the person you’re dismissing: you’re optimizing an edge case. Literally step one of operationalizing every system in existence is burying your DB behind a pooler. 100ms off a connect call in a script is not useful. The serverless improvement has some potential, but one would be forgiven for asking why you’d use an environment which doesn’t let you speak network protocols you’d like to speak.
So of course HTTP/2 will outperform, that's what it's designed to do.
Now try again, but use one connection per thread, and connect it before you start benchmarking, i.e. use it the way it's meant to be used.
But either way, yes, that's fundamentally a benefit of being able to use HTTP. We can multiplex multiple sessions over one underlying connection.
The whole premise of HTTP/[23] is to do the same thing as you do with N TCP sessions, but paying for the session establishment latency only once instead of N times.
And most applications couldn't care less about that latency, because you only do it once.
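A back-of-the-envelope sketch of that premise, assuming each connection pays a fixed number of handshake round trips before it can carry a query. The `handshake_rtts` value is a parameter rather than a claim about any particular protocol, and the model sums the cost across sessions, so sessions opened in parallel would see less wall-clock delay than this.

```python
def setup_cost_ms(num_sessions, rtt_ms, handshake_rtts, multiplexed):
    """Total connection-establishment latency paid across all sessions."""
    # Multiplexing carries every session over a single underlying connection,
    # so the handshake is paid once instead of once per session.
    connections = 1 if multiplexed else num_sessions
    return connections * handshake_rtts * rtt_ms
```

With 10 sessions, a 50 ms round-trip time, and an assumed 2 round trips per handshake, separate connections pay 1000 ms in total while one multiplexed connection pays 100 ms. For a long-lived process that connects once at startup, both figures are a one-time cost, which is the point made above.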
Connection pooling doesn't solve all the things we can improve by using HTTP as a base. We can be faster in just data transfer through compression, for example.
Using HTTP/3 starts to help tail latency that we can't solve with TCP. Unreliable networks with packet loss suffer greatly with TCP and not as badly with QUIC.
However, no support for FOREIGN KEYs is a bit of a bummer. That said, they explain it very well.[0]
Thanks for the reply! Will definitely give PlanetScale a try over the weekend!
[0] https://planetscale.com/docs/learn/operating-without-foreign...
But in any case, even if your client did support this and the server supported it, we still need HTTP for other things. I don't think it's particularly a "gotcha". HTTP is also stateless, which has lots of benefits for us.
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-con...
And it's in the C API too...
I know there is the "Persistent Database Connections" section of the PHP manual and the mysqli extension within PHP supports connection pooling / persistent connections, but in my own experiences I've rarely seen them utilized, especially by the bigger open source projects out there such as WordPress, which has an 8 year old enhancement topic on the subject: https://core.trac.wordpress.org/ticket/31018. Putting your database behind a pooler, like ProxySQL let's say, is another option as the level of sophistication for a company/application increases, but most typical PHP setups I've used don't have that immediately available.
I've generally been under the impression that most projects/applications don't use the built-in pooling features for some of the reasons discussed in the link above, leading to those applications being more impacted by lengthier connection times due to a new connection being created at the beginning of each request and then closed at the end of the request.
Now I'm inclined to experiment a bit with the built-in mysqli pooling feature though since it would seem a worthwhile feature for developers to experiment with more if it would lessen the connection time impact for each PHP request, particularly for databases that are further away and require secure connections.
Shaving off 100ms for a connection would be significant for most PHP users if they are currently having to open fresh connections on each request, especially if they were previously used to connection times of < 1ms when connecting to a local MySQL database.
It may be advised, but I can assure you that it's not very common! I would guess that the vast, vast majority of PHP applications are _not_ pooling their connections. Especially when it comes to PHP hosted on Lambda, which is surprisingly a non-trivial amount of applications at this point.
So while it may be an edge case for you, it's not for others. It also doesn't discredit any of the other testing that doesn't focus on cold starts.
Edit since you edited yours after I posted:
I'm not going to argue the merits of what platforms people choose and it's not really our position as PlanetScale to do that. We serve our customers.
On what grounds? Why are you able to unilaterally decide what the situation is?
To me, the fact that it's not slower at all is the big win. I didn't anticipate that the results of this are going to say "this is 5x better". The stereotype is that if it's over HTTP, it must be slower.
And by every measure, it's not slower. Even in cases that may be edge cases to you, or that don't care about extra latency, things are still improved. Why would you not want something that's generically better?
There are many other things that are beneficial with using HTTP as a transport that haven't even been discussed here since this was entirely focused on performance. Without at least matching in performance, not many of the other things would matter.
This article IMO contributes to spreading the misinformation that HTTP/[23] is useful for many applications, when it is actually a very niche protocol only useful to web browsers, or other similar applications that continuously need to connect to endpoints they don't know in advance.
Web tech has already done sufficient damage by pushing HTTP/1.1 and SSL everywhere in IT, we don't need to force those protocols onto everything.
I personally can't think of a protocol I’d rather have go everywhere than QUIC/HTTP3.
Forced TLS is disappointing (there’s some discussion), but the upside is that this means SNI works and you can now very easily route UDP traffic via SNI.
Given the flexibility of HTTP/3 I wouldn't mind having it everywhere.
But you do you.
To me, HTTP sits in a similar boat as JSON. Is it perfect? Is it good? Not necessarily. But it's extremely extremely scrutinized and optimized due to its ubiquity in ways that other protocols and formats haven't been.
This was the entire point of this experiment and it's proved successful. The bias I wanted to challenge is exactly what you mentioned. Turns out, using HTTP as a transport and protobuf for encoding (which is basically gRPC) is comparable.
What you can use instead is simply TCP.
TCP is simple, mature, has all kinds of support in software and hardware.
The only advantage of HTTP/3 doesn't matter to most use cases and doesn't warrant its complexity and throwing away all the networking ecosystem built on TCP.
I'd go so far as to say that it's possible TCP should always have been another layer up the stack rather than an alternative to UDP.
But coming back to concrete use cases, here's one I've been kicking around in my head lately -- it would be great to build a video calling application that relied exclusively on SFUs (so just redistributing frames, no WebRTC stuff) but only needed one port for all the traffic. It seems that HTTP/3 brings benefits to this use case, two things in particular:
- Running all off of ~one port (443/tcp or 443/udp)
- The ability to easily take advantage of a bidirectional stream with UDP semantics (I guess HTTP/2 could have done this), something we rely on WebRTC to do now.
I say this because I've felt the pain of trying to set up complicated video conferencing software and while WebRTC is awesome, it is a bit heavy weight, if you're alright with an intermediary forwarding the frames.
[EDIT]: as far as perf goes, Cloudflare has a good blog post on this: https://blog.cloudflare.com/cubic-and-hystart-support-in-qui...
https://github.com/pion/webrtc/tree/master/examples/ice-sing... is one example of that.