Curl HTTP/3 Performance(github.com) |
Curl HTTP/3 Performance(github.com) |
TBQH, I'm actually really pleased with these performance figures - we haven't had time yet to do this kind of profiling or make any optimisations. So what we're seeing here is the performance prior to any kind of concerted measurement or optimisation effort on our part. In that context I'm actually very pleasantly surprised at how close things are to existing, more mature implementations in some of these benchmarks. Of course there's now plenty of tuning and optimisation work to be done to close this gap.
With something like QUIC "optimisation" breaks down into two areas: performance tuning in terms of algorithms, and tuning for throughput or latency in terms of how the protocol is used.
The first part is actually not the major issue, at least in our design everything is pretty efficient and designed to avoid unnecessary copying. Most of the optimisation I'm talking about above is not about things like CPU usage but things like tuning loss detection, congestion control and how to schedule different types of data into different packets. In other words, a question of tuning to make more optimal decisions in terms of how to use the network, as opposed to reducing the execution time of some algorithm. These aren't QUIC specific issues but largely intrinsic to the process of developing a transport protocol implementation.
It is true that QUIC is intrinsically less efficient than say, TCP+TLS in terms of CPU load. There are various reasons for this, but one is that QUIC performs encryption per packet, whereas TLS performs encryption per TLS record, where one record can be larger than one packet (which is limited by the MTU). I believe there's some discussion ongoing on possible ways to improve on this.
There are also features which can be added to enhance performance, like UDP GSO, or extensions like the currently in development ACK frequency proposal.
Also, hat tip for such competitive performance on an untuned implementation.
Something more agreeably not performance based is the security is better. E.g. more of the conversation is enclosed in encryption at the protocol layer. Whether that's a good reason depends on who you ask though.
Obligatory:
One of QUIC's weaknesses is that it's known to be much less CPU efficient, largely due to the lack of things like HW offload for TLS.
> Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server.
To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests. You'll see connection pooling of, uh, 6 (at least for Chrome and Firefox), so the problems of head-of-line blocking that HTTP/2 and HTTP/3 attempt to solve would have manifested in more realistic benchmarks.
Some questions I have:
- What kind of CPU is in use? How much actual hw parallelism do you have in practice?
- Are these requests actually going over the network (even a LAN)? What's the MTU?
- How many trials went into each of these graphs? What are the error bars on these?
https://github.com/curl/curl/blob/0f4c19b66ad5c646ebc3c4268a...
TLS offloads are very niche. There’s barely anyone using them in production, and the benchmarks are very likely without
They will. You just need to go bump that number in the settings. :-)
The biggest factor is the caddy version used for the benchmark. The quic-go library in caddy v2.6.2 lacks GSO support, which is crucial to avoid high syscall overhead.
The quic-go version in caddy v2.6.2 also doesn't adjust UDP buffer sizes.
The other thing that's not clear from the blog post is the network path used. Running the benchmark over loopback only would give TCP-based protocols an advantage if the QUIC library doesn't support MTU discovery.
Even more relevantly, HTTP/3 was first supported out of the box in 2.6.0 - released Sep 20, 2022. Even if 2.6.2 had been just 3 months old that it's from the first 22 days of having HTTP/3 support out of the box instead of the versions from the following 3 months would definitely be relevant criticism to note.
https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-S...
I'm one of the Debian maintainers of curl and we are close to enabling http3 on the gnutls libcurl we ship.
We have also started discussing the plan for enabling http3 on the curl CLI in time for the next stable release.
Right now the only option is to switch the CLI to use the gnutls libcurl, but looks like it might be possible to stay with openssl, depending on when non-experimental support lands and how good openssl's implementation is.
The whole lot just seems an order of magnitude slower than I was hoping to see.
No surprise here.
As is right now it is impossible to host a HTTP/3 server visitable by a random person you've never met without a corporate CA continually re-approving your ability to. HTTP/3 is great for corporate needs but it'll be the death of the human web.
Like the other poster said, it all makes sense for the corporate web. Not so much for the human web. For humans, self-signed certs with automatic TOFU makes sense, but browsers are controlled by and made for the corporate web.
Of course there are legacy reasons for maintaining existing codebases, but what is it going to take to shift away from using C for greenfield projects?
Right now, it's mostly a special purpose language for a narrow range of platforms.
This chart shows that HTTP2 is more than half as slow as HTTP1.1, and HTTP3 is half as slow as HTTP2. Jesus christ. If these get adopted across the whole web, the whole web's performance could get up to 75% slower . That's insane. There should be giant red flags on these protocols that say "warning: slows down the internet"
The benchmark meant to compare how resource efficient the new backend for curl is by using localhost connectivity. By using localhost connectivity any real world network considerations (such as throughput discovery, loss, latency, jitter, or buffering) are sidestepped to allow a direct measurement of how fast the backend alone is. You can't then assume those numbers have a meaningful direct extrapolation to the actual performance of the web because you don't know how the additional things the newer protocols do impact performance once you add a real network. Ingoring that, you still have to consider the notes like "Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server." before making claims about HTTP2 being more than half as slow as HTTP1.1.
Faster, more secure, and more reliable, yes. The numbers in this article looks terrible, but real-world testing¹ shows that real-world HTTP/3 performance is quite good, even though implementations are relatively young.
"…we saw substantially higher throughput on HTTP/3 compared to HTTP/2. For example, we saw about 69% of HTTP/3 connections reach a throughput of 5 Mbps or more […] compared to only 56% of HTTP/2 connections. In practice, this means that the video streams will be of a higher visual quality, and/or have fewer stalls over HTTP/3."
¹https://pulse.internetsociety.org/blog/measuring-http-3-real...
Improving upload throughput to S3 bucket would be great, right?
For non-interactively retrieving a single page of HTML, or some other resource, such as a video, or retrieving 100 pages, or 100 videos, in a single TCP connection, without any ads or tracking, HTTP/3 is overkill. It's more complex and it's slower than HTTP/1.0 and HTTP/1.1.
The best would be to have something like SMTP: the core is extremely simple and yet real-life-works everywhere, and via announced options/extensions it can _optionnaly_ grow in complexity.
Google Chrome shows that 6 requests require 2-3 seconds to complete simultaneously. 3 of those requests are tiny static files served by nginx, 3 of those requests are very simple DB queries. Each request completes in few milliseconds using curl, but few seconds in Google Chrome.
Long story short: I wasn't able to track down true source of this obviously wrong behaviour. But I switched ingress-nginx to disable HTTP 2 and with HTTP 1.1 it worked as expected, instantly serving all requests.
I don't know if it's Google Chrome bug or if it's nginx bug. But I learned my lesson: HTTP 1.1 is good enough and higher versions are not stable yet. HTTP 3 is not even supported in ingress-nginx.
To be fair, HTTP/2 and HTTP/3 weren't exactly aimed at maximizing bandwidth. They were focused on mitigating the performance constraints of having to spawn dozens of connections to perform the dozens of requests required to open a single webpage.
I'm not sure why you believe otherwise. Got any references?
Consider the case of requesting a webpage with hundreds of small images, one should embed all of the images into the single webpage! Requiring each image to be fetched in a different http request is ridiculous. It pains me to look at the network tab of modern websites.
- sendfile() cannot be done with QUIC, since the QUIC stack runs in userspace. That means that data must be read into kernel memory, copied to the webserver's memory, then copied back into the kernel, then sent down to the NIC. Worse, if crypto is not offloaded, userspace also needs to encrypt the data.
- LSO/LRO are (mostly) not implemented in hardware for QUIC, meaning that the NIC is sent 1500b packets, rather than being sent a 64K packet that it segments down to 1500b.
- The crypto is designed to prevent MiTM attacks, which also makes doing NIC crypto offload a lot harder. I'm not currently aware of any mainstream (eg, not an FPGA by a startup) that can do inline TLS offload for QUIC.
There is work ongoing by a lot of folks to make this better. But at least for now, on the server side, Quic is roughly an order of magnitude less efficient than TCP.
I did some experiments last year for a talk I gave which approximated loosing the optimizations above. https://people.freebsd.org/~gallatin/talks/euro2022.pdf For a video CDN type workload with static content, we'd go from being about to serve ~400Gb/s per single-core AMD "rome" based EPYC (with plenty of CPU idle) to less than 100Gb/s per server with the CPU maxed out.
Workloads where the content is not static and has to be touched already in userspace, things won't be so comparatively bad.
Huh? Surely what you're doing in the accelerated path is just AES encryption/ decryption with a parameterised key which can't be much different from TLS?
Then there’s the userspace work for assembling and encrypting all these tiny packets individually, and looking up the right datastructures (connections, streams).
And there’s challenges load balancing the load of multiple Quic connections or streams across CPU cores. If only one core dequeues UDP datagrams for all connections on an endpoint then those will be bottlenecked by that core - whereas for TCP the kernel and drivers can already do more work with multiple receive queues and threads. And while one can run multiple sockets and threads with port reuse, it poses other challenges if a packet for a certain connection gets routed to the wrong thread due to connection migration. Theres also solutions for that - eg in the form of sophisticated eBPF programs. But they require a lot of work and are hard to apply for regular users that just want to use QUIC as a library.
We have our own apt repo which always has the latest version: https://caddyserver.com/docs/install#debian-ubuntu-raspbian
The distribution you chose, also makes you make that choice. If you're using Debian Stable, it's because your prefer stable in favor of latest. If you use Debian Testing/Unstable, you favor latest versions before stable ones.
Can't really blame Debian as they even have two different versions, for the folks who want to make the explicit decision.
FWIW I don't know if that is what was claimed.
But you're talking in a very affirmative manner and apparently trying to say that one is "correct" and the other is not, while absolutely ignoring context and usage.
I recommend you either run yourself, or find on the web, a benchmark about not a single HTTP request, but an actual web page, requesting the html, the css, the js, and the images. Don't even need to go modern web, even any old pre 2010 design with no font or anything else fancy will do.
You will see that HTTP 1 and 1.1 are way, way worse at it that HTTP 2. Which is why HTTP 2 was created, and why it should be used. Also the sad joke that was domain rolling to trick simultaneous request per host configurations.
Overall, your point of view doesn't make sense because this is not a winner takes all game. Plenty of server should and do also run HTTP 1 for their usage, notably file servers and the likes. The question to ask is "how many request in parallel do the user need, and how important that they all finish as close to each other as possible instead of one after the other".
Similarily, HTTP3 is mostly about latency.
I interpreted his comment as saying “where’s the 20% speed up” which seems like a more reasonable interpretation in context. A 20% speed up is actually quite substantial because that’s aggregate - it must mean there’s situations where it’s more as well as situations where it’s unchanged or worse (although unlikely to be worse under internet conditions).
When people talk about "stableness" and distributions, we're usually referring to the stableness of interfaces offered by the distribution together with the package.
> There's really no reason at all to use such an old version
Sometimes though, there is. And for those people, they can use the distribution they wish. If you want to get the latest versions via a package repository, use a package repository that fits with what you want.
But you cannot dictate what others need or want. That's why there is a choice in the first place.
But you're right people are free to choose. Every version is still available on the Caddy GitHub releases page for example. What's being talked about here is the default behavior not aligning with the promise of being a maintained release, instead being full of security holes and known major bugs. It's unrelated to whether Debian is a stable or rolling distro rather about the lack of patches they carry for their version.
I'm sure the volunteers working on this stuff is doing the best they can, but stuff like this isn't usually "sexy" enough to attract a ton of attention and care, compared to other "fun" FOSS work.
For example, in December they noticed this CVE and determined it didn't apply to them because it was in one of the features they removed from Caddy: https://security-tracker.debian.org/tracker/CVE-2023-50463
The protocol does not have any such limitation.
The fact that anyone running wrk or hey can coerce a web server to produce hundreds of thousands of RPS and saturate 100Gb links with plain old HTTP/1.1 with connection reuse and parallel threads (assuming of course that your load tester, server, and network are powerful enough) ought to be enough to convince anyone that the protocol is more than capable.
But whether it’s the best one for the real world of thousands of different consumer device agents, flaky networks with huge throughput and latency and error/drop rates, etc. is a different question indeed, and these newer protocols may in fact provide better overall user experiences. Protocols that work well under perfect conditions may not be the right ones for imperfect conditions.
The benefit of DSR became mitigated a bit after CGNAT (in the IPv4 space anyway) began to be rolled out, since it can masquerade a large group of clients behind a single IP address. (CGNAT poses other, different problems related to fraud and abuse mitigation.)
My guess is it has more to do with resources you probably allocated to your app (especially cpu limit) than any networking overhead which should be negligible in such a trivial setup if done correctly
Why do you even need to have proxies or load balancers in between? Another invention of the web cloud people.
Front TCP load balancer allows to use a single IP address for multiple backend servers. It allows for server maintenance without service interruptions. Or even server crash.
Ingress controller is nginx which does reverse proxying. It handles HTTPS termination and it handles routing to the application which is supposed to serve the request. It's either another nginx instance configured to serve static resources or some kind of HTTP server which implements an actual server logic.
I agree that having two nginxes for static resources seems weird, but that's not a big deal, because nginx does not consume much resources.
If request N goes to server A, then request N+1 from the same session goes to server B, most likely that request will fail because the session does not exist within that server's context.
Envoy is significantly better in this department.
Could this have been solved a different way? Perhaps but QUIC is tackling a number of TCP limitations at once (eg protocol ossification due to middleware).
If anything the HTTP2 requires more state due to the nesting nature of the contexts.
HTTP/2 is indeed more complex to maintain in the application layer but that’s less to do with the memory implications of multiplexing requests /responses over a single TCP connection.
Maybe with SO_REUSEPORT, but not in general.
A TCP connection is identified by a 5-tuple that requires a unique port for both the client and server. TCP represents ports as uint16s, which means the max number of possible ports per address is 65536.
tl;dr: 1 server IP address = no more than 64k incoming connections
And, as I said before, with DSR, there's a broad diversity of client IPs, so a single server address doesn't typically cause concerns with 4-tuple exhaustion.
DSR is niche at best, not something that can be assumed.