Curl HTTP/3 Performance

157 points by BitPirate 2 years ago | 118 comments

hlandau 2 years ago |

Author of the OpenSSL QUIC stack here. Great writeup.

TBQH, I'm actually really pleased with these performance figures - we haven't had time yet to do this kind of profiling or make any optimisations. So what we're seeing here is the performance prior to any kind of concerted measurement or optimisation effort on our part. In that context I'm actually very pleasantly surprised at how close things are to existing, more mature implementations in some of these benchmarks. Of course there's now plenty of tuning and optimisation work to be done to close this gap.

apitman 2 years ago | |

I'm curious if you've architected it in such a way that it lends itself to optimization in the future? I'd love to hear more about how these sorts of things are planned, especially in large C projects.

hlandau 2 years ago | | |

As much as possible, yes.

With something like QUIC "optimisation" breaks down into two areas: performance tuning in terms of algorithms, and tuning for throughput or latency in terms of how the protocol is used.

The first part is actually not the major issue; at least in our design, everything is pretty efficient and designed to avoid unnecessary copying. Most of the optimisation I'm talking about above is not about things like CPU usage but things like tuning loss detection, congestion control and how to schedule different types of data into different packets. In other words, it's a question of tuning to make better decisions about how to use the network, as opposed to reducing the execution time of some algorithm. These aren't QUIC-specific issues but are largely intrinsic to the process of developing a transport protocol implementation.

It is true that QUIC is intrinsically less efficient than say, TCP+TLS in terms of CPU load. There are various reasons for this, but one is that QUIC performs encryption per packet, whereas TLS performs encryption per TLS record, where one record can be larger than one packet (which is limited by the MTU). I believe there's some discussion ongoing on possible ways to improve on this.
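
As a rough illustration of the per-packet versus per-record point, the sketch below counts how many separate encryption operations a 1 MiB transfer would need in each case. The 16 KiB figure is the TLS limit on record plaintext size; the 1200 bytes of payload per QUIC packet and the function name are assumptions chosen for the example, not measurements from any implementation.

```python
import math

TLS_MAX_RECORD_PLAINTEXT = 16 * 1024  # TLS caps record plaintext at 2^14 bytes
QUIC_PAYLOAD_PER_PACKET = 1200        # assumed usable payload per packet (hypothetical)


def aead_operations(total_bytes: int, unit_bytes: int) -> int:
    """Number of separate encryption operations needed to protect
    total_bytes when each operation covers at most unit_bytes."""
    return math.ceil(total_bytes / unit_bytes)


payload = 1024 * 1024  # 1 MiB transfer
tls_ops = aead_operations(payload, TLS_MAX_RECORD_PLAINTEXT)
quic_ops = aead_operations(payload, QUIC_PAYLOAD_PER_PACKET)
print(f"TLS records: {tls_ops}, QUIC packets: {quic_ops}")
```

Under these assumptions the QUIC side performs roughly an order of magnitude more operations for the same data, each with its own fixed setup cost.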

There are also features which can be added to enhance performance, like UDP GSO, or extensions like the currently in development ACK frequency proposal.

benreesman 2 years ago | |

Thank you kindly for your work. These protocols are critically important, and the more high-quality, open implementations exist, the more likely they are to be free and inclusive.

Also, hat tip for such competitive performance on an untuned implementation.

spullara 2 years ago | |

Are there good reasons to use HTTP3/QUIC that aren't based on performance?

zamadatix 2 years ago | | |

I suppose that depends on your definitions of "good" and what counts as being "based on performance". For instance QUIC and HTTP/3 support better reliability via things like FEC and connection migration. You can resume a session on a different network (think going from Wi-Fi to cellular or similar) instead of recreating the session and FEC can make the delivery of messages more reliable. At the same time you could argue both of these ultimately just impact performance depending on how you choose to measure them.

Something more agreeably not performance based is the security is better. E.g. more of the conversation is enclosed in encryption at the protocol layer. Whether that's a good reason depends on who you ask though.

Matthias247 2 years ago | | |

We need to distinguish between performance (throughput over a congested/lossy connection) and efficiency (CPU and memory usage). QUIC can achieve higher performance, but will always be less efficient. The linked benchmark actually just measures efficiency, since it’s about sending data over loopback on the same host.

o11c 2 years ago | | |

TCP has at least one unfixable security exploit: literally anybody on the network can reset your connection. Availability is 1/3 of security, remember.

foofie 2 years ago | |

How awesome is that? Thank you for all your hard work. It's thanks to people such as yourself that the whole world keeps on working.

Obligatory:

https://xkcd.com/2347/

vitus 2 years ago |

It is promising to see that openssl-quic serial throughput is within 10-20% of more mature implementations such as quiche. (Which quiche, though? Is this Google's quiche, written in C++, or Cloudflare's quiche, written in Rust? It turns out that's approximately the only word that starts with "quic" that isn't a derivative of "quick".)

One of QUIC's weaknesses is that it's known to be much less CPU efficient, largely due to the lack of things like HW offload for TLS.

> Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server.

To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests. You'll see connection pooling of, uh, 6 (at least for Chrome and Firefox), so the problems of head-of-line blocking that HTTP/2 and HTTP/3 attempt to solve would have manifested in more realistic benchmarks.
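
A small sketch of why the pool size matters: assuming each HTTP/1.1 connection carries one request at a time (no pipelining), the number of sequential "rounds" needed for 50 requests depends on how many connections are open. The function name and the equal-duration assumption are illustrative only.

```python
import math


def request_rounds(num_requests: int, max_connections: int) -> int:
    """Minimum number of sequential rounds when every connection
    handles one request at a time and all requests take equally long."""
    return math.ceil(num_requests / max_connections)


benchmark_rounds = request_rounds(50, 50)  # one connection per request
browser_rounds = request_rounds(50, 6)     # browser-like pool of 6
print(benchmark_rounds, browser_rounds)
```

With 50 connections every request starts immediately, while a pool of 6 forces later requests to queue behind earlier ones, which is where head-of-line blocking would start to show up.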

Some questions I have:

- What kind of CPU is in use? How much actual hw parallelism do you have in practice?

- Are these requests actually going over the network (even a LAN)? What's the MTU?

- How many trials went into each of these graphs? What are the error bars on these?

jsty 2 years ago | |

Looks like Cloudflare quiche:

https://github.com/curl/curl/blob/0f4c19b66ad5c646ebc3c4268a...

pclmulqdq 2 years ago | |

Hardware offload should be protocol-independent, but I suppose most network cards assume some stuff about TLS and aren't set up for QUIC?

Matthias247 2 years ago | | |

NICs assume stuff for TCP (segmentation offload) that they can’t do for UDP, or can only do in a very limited fashion (GSO).

TLS offloads are very niche. There’s barely anyone using them in production, and the benchmarks were very likely run without them.

ndriscoll 2 years ago | |

> To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests.

They will. You just need to go bump that number in the settings. :-)

secondcoming 2 years ago | |

Browsers aren't the only things that connect to servers that speak HTTP.

BitPirate 2 years ago |

The performance difference between H1/H2 and H3 in this test doesn't really surprise me. The obvious part is the highly optimised TCP stack. But I fear that the benchmark setup itself might be a bit flawed.

The biggest factor is the caddy version used for the benchmark. The quic-go library in caddy v2.6.2 lacks GSO support, which is crucial to avoid high syscall overhead.
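
To make the syscall-overhead point concrete, here is a back-of-the-envelope sketch. Without GSO each UDP datagram costs one send call; with GSO the sender hands the kernel one large buffer that is split into several datagrams. The batch size of 64 segments per call is an assumption for illustration; the real limit depends on the kernel and the datagram size.

```python
import math


def send_syscalls(num_packets: int, segments_per_call: int = 1) -> int:
    """Number of send calls needed when each call can carry up to
    segments_per_call datagrams."""
    return math.ceil(num_packets / segments_per_call)


packets = 10_000
without_gso = send_syscalls(packets)                     # one datagram per call
with_gso = send_syscalls(packets, segments_per_call=64)  # assumed batch size
print(without_gso, with_gso)
```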

The quic-go version in caddy v2.6.2 also doesn't adjust UDP buffer sizes.
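
For readers unfamiliar with what adjusting UDP buffer sizes means at the socket level, the sketch below requests a larger receive buffer on a UDP socket and reads back what the OS actually granted. This is a Python illustration of the standard `SO_RCVBUF` socket option, not quic-go's code; the 1 MiB target is a hypothetical value, and the granted size is OS-dependent (it may be capped or adjusted by system limits).

```python
import socket

REQUESTED_BYTES = 1024 * 1024  # hypothetical target size

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Size the OS assigns by default.
default_rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
# Ask for a larger buffer; the OS may grant a different value.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED_BYTES)
effective_rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()

print(f"default: {default_rcvbuf} bytes, after request: {effective_rcvbuf} bytes")
```

A receive buffer that is too small for the incoming rate leads to datagrams being dropped by the kernel before the QUIC stack ever sees them.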

The other thing that's not clear from the blog post is the network path used. Running the benchmark over loopback only would give TCP-based protocols an advantage if the QUIC library doesn't support MTU discovery.

Etheryte 2 years ago | |

I don't think taking shots at the Caddy version for not being the latest is a fair criticism, to be honest. Version 2.6.2 was released roughly three months ago, so it's not like we're talking about anything severely outdated; most servers you run into in the wild will be running something older than that.

zamadatix 2 years ago | | |

I think you mixed up what year we're in now :). Caddy 2.6.2 was released October 13, 2022, so it's been not 3 but 15 months since release.

Even more relevantly, HTTP/3 was first supported out of the box in 2.6.0, released Sep 20, 2022. Even if 2.6.2 had been just 3 months old, the fact that it comes from the first 22 days of out-of-the-box HTTP/3 support, rather than from the versions released over the following 3 months, would definitely be a relevant criticism to note.

https://github.com/caddyserver/caddy/releases?page=2

nezirus 2 years ago |

Maybe a shout-out to the HAProxy people: like many, they've observed performance problems with the OpenSSL 3.x series. But having good old OpenSSL with QUIC would be so convenient for distro packages etc.

https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-S...

samueloph 2 years ago |

Nice write-up.

I'm one of the Debian maintainers of curl and we are close to enabling http3 on the gnutls libcurl we ship.

We have also started discussing the plan for enabling http3 on the curl CLI in time for the next stable release.

Right now the only option is to switch the CLI to use the gnutls libcurl, but it looks like it might be possible to stay with openssl, depending on when non-experimental support lands and how good openssl's implementation is.

pabs3 2 years ago | |

Any chance of WebSocket being enabled too?

samueloph 2 years ago | | |

That's still an experimental feature on curl's side so I'm not sure. https://everything.curl.dev/helpers/ws/support

mistrial9 2 years ago | |

maybe the right time to clean up the unexpected and awkward set of libs that are currently installed, too ?

londons_explore 2 years ago |

Anyone else disappointed that the figures for localhost are in MB/s not GB/s?

The whole lot just seems an order of magnitude slower than I was hoping to see.

zamadatix 2 years ago | |

A core of the 4770 (curl is single threaded) can't even manage a full order of magnitude more plain AES encryption throughput, ignoring that it also has to be done in small packets and decrypted on the same machine.

mgaunard 2 years ago |

HTTP/1 remains the one with the highest bandwidth.

No surprise here.

apitman 2 years ago |

Very nice. I would love to see some numbers including simulated packet loss. That's theoretically an area h3 would have an advantage.
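
A toy model of why loss is where h3 could theoretically pull ahead: when several streams share one connection and a single packet is lost, a TCP-like transport must hold back everything sent after the gap, while a QUIC-like transport only holds back later data on the affected stream. The sketch below is a simplification under stated assumptions (round-robin interleaving, exactly one loss, no congestion control or timing), and the function name is illustrative.

```python
def deliverable_before_retransmit(packets, lost_index, per_stream_ordering):
    """packets: list of stream ids in send order; the packet at lost_index
    is lost. Returns how many of the other packets can be handed to the
    application before the retransmission arrives."""
    lost_stream = packets[lost_index]
    delivered = 0
    for i, stream in enumerate(packets):
        if i == lost_index:
            continue
        if per_stream_ordering:
            # QUIC-like: only later data on the same stream must wait.
            blocked = stream == lost_stream and i > lost_index
        else:
            # TCP-like: everything sent after the gap must wait.
            blocked = i > lost_index
        if not blocked:
            delivered += 1
    return delivered


streams = 4
packets = [i % streams for i in range(20)]  # round-robin across 4 streams
lost = 2                                    # third packet is lost
tcp_like = deliverable_before_retransmit(packets, lost, per_stream_ordering=False)
quic_like = deliverable_before_retransmit(packets, lost, per_stream_ordering=True)
print(tcp_like, quic_like)
```

A loopback benchmark never exercises this path, which is why numbers with simulated loss would tell a different part of the story.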

1vuio0pswjnm7 2 years ago |

Would it be worthwhile to test QUIC using some other TLS library besides OpenSSL, e.g., wolfSSL? I think I read that the cURL author is working with them, or for them. Apologies if this is incorrect.

jupp0r 2 years ago |

Great writeup, but the diagrams are downright awful. I'd separate the different facets visually to make it easier to see the difference vs those different colors.

superkuh 2 years ago |

Can cURL's HTTP/3 implementation work with self-signed certs? Pretty much every other HTTP/3 lib used by major browsers does not. And since HTTP/3 does not allow for a null cipher or TLS-less connections, this means that in order to establish an HTTP/3 connection a third-party CA must be involved.

As it stands right now, it is impossible to host an HTTP/3 server visitable by a random person you've never met without a corporate CA continually re-approving your ability to do so. HTTP/3 is great for corporate needs but it'll be the death of the human web.

adobrawy 2 years ago | |

Given that browsers discourage HTTP traffic (warning that the connection is insecure), given how easily free SSL certificates are available, and given that HTTPS is already the standard on small hobbyist sites, I don't expect the requirement for an SSL certificate to be a blocker to HTTP/3 adoption.

ndriscoll 2 years ago | | |

Do browsers warn for http (beyond the address bar icon)? I don't think they ever have for my personal site. I also don't think you can really say there's a "standard" for how hobbyists do things. I'm definitely in the bucket of people who use http because browsers throw up scary warnings if you use a self-signed cert, and scary warnings aren't grandma friendly when I want to send photos of the kids. The benefit of TLS isn't worth setting up publicly signed certs to me, and I don't want to invite the extra traffic by appearing on a CT log.

Like the other poster said, it all makes sense for the corporate web. Not so much for the human web. For humans, self-signed certs with automatic TOFU make sense, but browsers are controlled by and made for the corporate web.

jrpelkonen 2 years ago |

I really don’t want to criticize anyone or their hard work, and appreciate both curl and OpenSSL as a long time user. That said, I personally find it disappointing that in 2024 major new modules are being written in C. Especially so given that a) existing Quic modules written in Rust exist, and b) there’s a precedent for including Rust code in Curl.

Of course there are legacy reasons for maintaining existing codebases, but what is it going to take to shift away from using C for greenfield projects?

apitman 2 years ago | |

Not saying you're wrong, but it's worth noting that switching to Rust is not free. Binary sizes, language complexity, and compile times are all significantly larger.

zinekeller 2 years ago | |

For something like curl (which is also used in embedded systems), a legally verified Rust compiler (compliant with ISO and other standards, for better or worse) that targets common microarchitectures is a definite first step. Fortunately, the first half of it exists (Ferrocene, https://ferrous-systems.com/ferrocene/). The second half is harder: there are architectures even GCC does not target (these rely on other compilers like the Small Device C Compiler (or a verified variant) or even a proprietary compiler), and LLVM only targets a subset of what GCC does. Even if there's a GCC Rust (currently being developed, fortunately), you are still leaving out a lot of architectures.

jrpelkonen 2 years ago | | |

This is a good point: there are many niche architectures where Rust is not a viable option. But in this specific case, I don’t see these systems benefiting from h3/QUIC. HOL blocking etc. will rarely, if ever, be a limiting factor for the use cases involved.

secondcoming 2 years ago | |

I'm personally disappointed you're aware of this issue and have done nothing about it.

teunispeters 2 years ago | |

If Rust could support all of C's processors and platforms and produce equivalently sized binaries, especially for embedded, then it'd be interesting to switch to. (As a start, it also needs a stable and secure ecosystem of tools and libraries.)

Right now, it's mostly a special purpose language for a narrow range of platforms.

throwaway892238 2 years ago |

Lol, wait, HTTP2 and HTTP1.1 both trounce HTTP3? Talk about burying the lede. Wasn't performance the whole point behind HTTP3?

This chart shows that HTTP2 is more than half as slow as HTTP1.1, and HTTP3 is half as slow as HTTP2. Jesus christ. If these get adopted across the whole web, the whole web's performance could get up to 75% slower. That's insane. There should be giant red flags on these protocols that say "warning: slows down the internet"

zamadatix 2 years ago | |

If the last decade of web protocol development seems backwards to you after reading one benchmark then why immediately assume it's insane and deserves a warning label instead of asking why your understanding doesn't match your expectations?

The benchmark is meant to measure how resource-efficient the new backend for curl is by using localhost connectivity. By using localhost connectivity, any real-world network considerations (such as throughput discovery, loss, latency, jitter, or buffering) are sidestepped to allow a direct measurement of how fast the backend alone is. You can't then assume those numbers extrapolate directly to the actual performance of the web, because you don't know how the additional things the newer protocols do impact performance once you add a real network. Ignoring that, you still have to consider notes like "Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server." before making claims about HTTP2 being more than half as slow as HTTP1.1.

CharlesW 2 years ago | |

> Wasn't performance the whole point behind HTTP3?

Faster, more secure, and more reliable, yes. The numbers in this article look terrible, but real-world testing¹ shows that real-world HTTP/3 performance is quite good, even though implementations are relatively young.

"…we saw substantially higher throughput on HTTP/3 compared to HTTP/2. For example, we saw about 69% of HTTP/3 connections reach a throughput of 5 Mbps or more […] compared to only 56% of HTTP/2 connections. In practice, this means that the video streams will be of a higher visual quality, and/or have fewer stalls over HTTP/3."

¹https://pulse.internetsociety.org/blog/measuring-http-3-real...

jgalt212 2 years ago |

Does Curl performance really matter? i.e. if it's too performant, doesn't that increase the odds your spider is blocked? Of course, if you're sharding horizontally across targets, then any performance increase is appreciated.

j16sdiz 2 years ago | |

libcurl is the backend for many (RESTful) API libraries.

Improving upload throughput to S3 bucket would be great, right?

zamadatix 2 years ago | |

What if you're not using curl as a spider? Even if you are I'd recommend some other spider design which doesn't rely on the performance of curl to set the crawling rate.