The HTTP crash course nobody asked for (fasterthanli.me)
As someone who had to write a couple of proxy servers, I can't express how sadly accurate it is.
This is far more of a security problem than all of the bad HTTP 1.1 implementations put together. It is built in corporate control that cannot be bypassed except by not using HTTP/3. It is extremely important that we not let the mega-corp browsers drop HTTP 1.1 and continue to write our own projects for it.
Doesn't that imply that HTTP/1 is deceptively complex?
Both of these are only concerned with reducing the latency of doing lots of requests to the same server in parallel.
Which is only needed by web browsers and nothing else.
The initial problem is usually easy to solve for; it’s all the edge cases and other details that make something complex.
Chunked transfer/content encoding problems still give me nightmares...
LOL, yes same here. Can’t wait for Bluetooth's b̶a̶l̶l̶s̶ baggage to be chopped.
That is because HTTP pipelining was and is a mistake, and is responsible for a ton of HTTP request smuggling vulnerabilities because the HTTP/1.1 protocol has no framing.
No browser supports it anymore, thankfully.
Anyone that doesn't support this is broken. My own code definitely does not wait for responses before sending more requests, that's just basic usage of TCP.
The problem is that if Request number 1 leads to an error whereby the connection is closed, those latter two requests are discarded entirely. The client would have to retry request number two and three. If the server has already done work in parallel though, it can't send those last two responses because there is no way to specify that the response is for the second or third request.
The only way a server has to signal that it is in a bad state is to return 400 Bad Request and to close the connection because it can't keep parsing the original requests.
There is no support for HTTP pipelining in current browsers.
What you are thinking about is probably HTTP keep alive, where the same TCP/IP channel is used to send a follow-up request once a response to the original request has been received and processed. That is NOT HTTP pipelining.
> Anyone that doesn't support this is broken. My own code definitely does not wait for responses before sending more requests, that's just basic usage of TCP.
Yep.
There is some "support" a server could do, in the form of processing multiple requests in parallel¹, e.g., if it gets two GET requests back to back, it could queue up the second GET's data in memory, or so. The responses still have to be streamed out in the order they came in, of course. Given how complex I imagine such an implementation would be, I'd expect that to be implemented almost never, though; if you're just doing a simple "read request from socket, process request, write response" loop, then like you say, pipelined requests aren't a problem: they're just buffered on the socket or in the read portion's buffers.
¹this seems fraught with peril. I doubt you'd want to parallelize anything that wasn't GET/HEAD for risk of side-effects happening in unexpected orders.
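For what it's worth, the simple "read, process, write" loop described above can be sketched like this. It is only a rough illustration, assuming every pipelined request is a body-less GET/HEAD (so each one ends at the first blank line); `handle` is a made-up handler, not anything from the article.

```python
def split_pipelined_requests(buffer: bytes):
    """Split a buffer of back-to-back, body-less HTTP/1.1 requests.

    Assumption: no request has a body, so each one ends at CRLF CRLF.
    Returns (complete_requests, leftover_bytes).
    """
    requests = []
    while True:
        head, sep, rest = buffer.partition(b"\r\n\r\n")
        if not sep:  # no complete request left; keep the partial bytes
            return requests, buffer
        requests.append(head + sep)
        buffer = rest


def handle(request: bytes) -> bytes:
    """Hypothetical handler: echo the request target back as the body."""
    request_line = request.split(b"\r\n", 1)[0]
    _method, target, _version = request_line.split(b" ", 2)
    body = b"you asked for " + target
    return (b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
            b"\r\n" + body)


def serve_buffer(buffer: bytes):
    """Answer every complete request strictly in arrival order."""
    requests, leftover = split_pipelined_requests(buffer)
    responses = [handle(request) for request in requests]
    return b"".join(responses), leftover
```

Nothing here is parallel: pipelined requests simply wait in the buffer until the loop reaches them, and a partial request stays in `leftover` until more bytes arrive.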
> Host: neverssl.com
> This is actually a requirement for HTTP/1.1, and was one of its big selling points compared to, uh...
> AhAH! Drew yourself into a corner didn't you.
> ...Gopher? I guess?
I feel like the author must know this.. HTTP/1.0 supported but didn't require the Host header and thus HTTP/1.1 allowed consistent name-based virtual hosting on web servers.
I did appreciate the simple natures of the early protocols, although it is hard to argue against the many improvements in newer protocols. It was so easy to use nc to test SMTP and HTTP in particular.
I did enjoy the article's notes on the protocols; however, the huge sections of code snippets lost my attention midway.
The author does know this, it's a reference to a couple paragraphs above:
> [...] and the HTTP protocol version, which is a fixed string which is always set to HTTP/1.1 and nothing else.
> (cool bear) But what ab-
> IT'S SET TO HTTP/1.1 AND NOTHING ELSE.
Since playing with QUIC, I've lost all interest in learning HTTP/2, it feels like something already outdated that we're collectively going to skip over soon.
As far as learning goes, I do think HTTP/2 is interesting as a step towards understanding HTTP/3 better, because a lot of the concepts are refined: HPACK evolves into QPACK, and flow control still exists but is neatly separated into QUIC. I've only taken a cursory look at H3 so far, but it seems like a logical progression that I'm excited to dig into deeper, after I've gotten a lot more sleep.
Plus, you know, just an awesome dev who knows his stuff. Huge fan.
https://fasterthanli.me/series/reading-files-the-hard-way/pa...
For TLS, I recommend The Illustrated TLS 1.3 Connection (Every byte explained and reproduced): https://tls13.xargs.org/
Does it need to be pointed out that this is complete bullshit?
Theoretically yes, but in practice?
I've done my share of nc testing even simpler protocols than HTTP/1.1
For some reason the migration to HTTPS scared me despite the security assurances. I could not see anything useful in wireshark anymore. I now had to trust one more layer of abstraction.
> Theoretically yes, but in practice?
Yes, that's the whole point of encapsulation. The protocol is blissfully unaware of encryption and doesn't even have to be. It has no STARTTLS mechanism either.
Your HTTPS traffic consists of a TCP handshake that establishes a TCP connection, a TLS handshake across that TCP connection to exchange keys and establish a TLS session, and the exact same HTTP request/response traffic, inside the encrypted/authenticated TLS session.
The wonderful magic of solving a problem by layering/encapsulating.
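To make the layering concrete, here is a minimal Python sketch (standard library only). The function names are mine, and it assumes the server answers on the usual ports 80 and 443. The request bytes are built once and are identical in both cases; only the socket they travel over changes.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Build the HTTP/1.1 request bytes; identical with or without TLS."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")


def fetch(host: str, use_tls: bool, timeout: float = 10.0) -> bytes:
    """Send the same request over plain TCP or over TLS on top of TCP."""
    port = 443 if use_tls else 80
    sock = socket.create_connection((host, port), timeout=timeout)  # TCP handshake
    if use_tls:
        context = ssl.create_default_context()
        # TLS handshake runs over the existing TCP connection.
        sock = context.wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(build_request(host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Something like `fetch("neverssl.com", use_tls=False)` would go over cleartext, while `use_tls=True` against an HTTPS host wraps the very same bytes in a TLS session.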
> I could not see anything useful in wireshark anymore
Wireshark supports importing private keys for that, see: https://wiki.wireshark.org/TLS
With TLS+SNI, this is redundant to the name from SNI. But we had TLS long before we had SNI, and we had HTTP long before we had TLS, and both of those scenarios need the `Host` header.
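As a rough illustration of why the `Host` header matters for name-based virtual hosting, here is a small Python sketch. The site table and paths are invented for the example, and the port stripping is deliberately naive (it would not handle IPv6 literals).

```python
def host_from_request(request: bytes):
    """Return the lowercased Host header value, or None if it is absent."""
    head = request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii").lower()
    return None


# Hypothetical site table: several names served from one IP address.
SITES = {"blog.example": "/srv/blog", "shop.example": "/srv/shop"}
DEFAULT_ROOT = "/srv/default"


def document_root(request: bytes) -> str:
    """Pick a document root from the Host header, falling back to a default."""
    host = host_from_request(request)
    if host is not None:
        host = host.split(":", 1)[0]  # drop an optional port (naive)
    return SITES.get(host, DEFAULT_ROOT)
```

An HTTP/1.0 request without `Host` lands on the default site, which is exactly the ambiguity the header was meant to remove.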
GET / HTTP/1.0\r\n\r\n
Still works with many websites. I installed a web server on my phone and send files this way much faster (and Android -> Apple works):
https://f-droid.org/en/packages/net.basov.lws.fdroid/
I wish there were a standard for streaming (headphones could connect to your network via WPS, and stream some canonical URL with no configuration needed).
WiFi uses near 10x the power Bluetooth does when active (and that’s before factoring in BLE which cuts that down in half). WiFi also has access to the much less crowded 5GHz band.
IIRC WiFi is also a much simpler protocol, it’s just a data channel (its aim being to replace LAN cables).
Plus in order to support cheap and specialised devices Bluetooth supports all sorts of profiles and applications. This makes the devices simpler, and means all the configuration can be automated to pairing, but it makes the generic hosts a lot more complicated.
But mostly the problem is that too much of this complexity fell on hardware vendors and they suck at writing software. There are umpteen bajillion different bluetooth stacks out there and they're all buggy in new and exciting ways. Interoperability testing is hugely neglected by most vendors. The times where Bluetooth works well are typically where the same vendor controls both ends of the link, like Airpods on an iPhone.
In 2020 I tried buying some reputable brand Bluetooth headphones for my kids so they could do home-schooling without disturbing each other. It was a total failure. Every time their computer went to sleep the bluetooth stack would become out of sync and attempts to reconnect would result in just "error connecting" messages, requiring you to fully delete the bluetooth device on the Windows side and redo the entire discovery/association/connection from scratch. The bluetooth stack on Windows would crash halfway through the association process about half of the time forcing you to reboot the computer to start over. Absolutely unusable. I tried the same headphones on a Linux host and they worked slightly better, but were still prone to getting out of sync and requiring a full "forget this device" and add it again cycle every few days for no apparent reason.
CRLF was used very heavily and thus got baked into a lot of different places. Namely, it conveniently sidesteps the ambiguity of "some systems use CR, others use LF" by just putting both in, and since they are whitespace, there's not much downside other than the extra byte.
Beyond that, there are many other clear and obvious connections between Hypertext Transfer Protocol and teletype machines. Many early web browsers were expected to be teletype machines [0]. So while it might be a bit of a stretch, I'd say this is far from "complete bullshit".
[0] - http://info.cern.ch/hypertext/WWW/Proposal.html#:~:text=it%2...
I agree the two are similar, but the space shuttle story is also bullshit. See e.g. Snopes: https://www.snopes.com/fact-check/railroad-gauge-chariots/
People are suckers for plausible-sounding and amusing stories, that one's classic bait for people's lack of critical thinking skills.
> CRLF was used very heavily and thus got baked into a lot of different places.
Well, exactly. Which is precisely why it's bullshit to claim that HTTP was "based on teletypes". It was based on technical standards at the time, that originally derived from teletypes, but there was no consideration of teletypes in the development of HTTP that I'm aware of:
> Many early web browsers were expected to be teletype machines [0].
Could you quote a relevant part of your reference? Because I don't see it. Perhaps you're confusing "dumb terminal" with "teletype"? Or confusing the Unix concept of tty, a teletype abstraction, with the electromechanical device known as a teletype - the "remote typewriters" mentioned in the original comment?
By the time that WWW spec was written in 1990, teletypes were decades out of date and not commonly used at all. PCs had existed for over a decade, and video display terminals for mainframes and minicomputers had been around for nearly three decades. No-one was using actual teletypes any more.
> So while it might be a bit of a stretch, I'd say this is far from "complete bullshit".
This conclusion would work if any of your claims had survived scrutiny.
I seriously think you underestimate the complexity in Wi-Fi networks. The 802.11 2020 standard is 4379 pages long. And I'm not even counting the amendments ( https://www.ieee802.org/11/Reports/802.11_Timelines.htm ) that are in development.
- netcat-traditional http://www.stearns.org/nc/
- netcat-openbsd : https://github.com/openbsd/src/blob/master/usr.bin/nc/netcat... (also packaged in Debian)
- ncat https://nmap.org/ncat/
- netcat GNU: https://netcat.sourceforge.net/ (quite rare)
To prevent any confusion, I like to recommend socat: http://www.dest-unreach.org/socat/
And a decent HTTP-proxy must handle all of that stuff or at least fail gracefully without affecting other clients.
I'm not sure what you mean, but Wi-Fi covers the PHY layer and the MAC layers. It's not « only » a data channel. Modern Wi-Fi uses OFDMA, which is arguably more complex than what Bluetooth uses (without even talking about the MAC).
This is what led to the many request smuggling vulnerabilities: the front-end proxy and the backend proxy parse the same HTTP text stream differently, so they disagree about where one request ends and the next begins.
Since there is no framing there is no one valid way to say "this is where a request starts, and this is where a request ends and it is safe to continue parsing past the end of this request for the next request".
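Here is a toy Python sketch of that disagreement, using the classic case where a message carries both `Content-Length` and `Transfer-Encoding: chunked`. The two functions stand in for a hypothetical front end that trusts `Content-Length` and a hypothetical back end that trusts chunked encoding; real parsers are far more involved, and the chunked decoder below assumes there are no trailer fields.

```python
# One byte stream that two parsers will frame differently.
SAMPLE = (b"POST / HTTP/1.1\r\n"
          b"Host: example\r\n"
          b"Content-Length: 6\r\n"
          b"Transfer-Encoding: chunked\r\n"
          b"\r\n"
          b"0\r\n"
          b"\r\n"
          b"G")


def split_head(stream: bytes):
    """Return (lowercased header dict, bytes after the blank line)."""
    head, _, rest = stream.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip().lower()
    return headers, rest


def body_by_content_length(stream: bytes):
    """Frame the body using Content-Length only; returns (body, leftover)."""
    headers, rest = split_head(stream)
    length = int(headers.get(b"content-length", b"0"))
    return rest[:length], rest[length:]


def body_by_chunked(stream: bytes):
    """Frame the body using chunked encoding only; returns (body, leftover).

    Assumption: no trailer fields follow the terminating zero-size chunk.
    """
    headers, rest = split_head(stream)
    if headers.get(b"transfer-encoding") != b"chunked":
        raise ValueError("not a chunked message")
    body = b""
    while True:
        size_line, _, rest = rest.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            _, _, rest = rest.partition(b"\r\n")  # consume the final CRLF
            return body, rest
        body += rest[:size]
        rest = rest[size + 2:]  # skip the chunk data and its trailing CRLF
```

With `SAMPLE`, the `Content-Length` reading swallows all six bytes as the body, while the chunked reading sees an empty body and leaves a stray `G` that would be treated as the start of the next request on that connection. As far as I know the RFCs say `Transfer-Encoding` wins when both are present, but the trouble starts when one hop in the chain doesn't follow that.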
Servers are also allowed to close the connection at will. So let's say I pipeline Request 1, 2, and 3.
The server can respond to Request 1 with Connection: close, and now request 2 and 3 are lost.
That's the reason HTTP pipelining is not supported by browsers/most clients.
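A tiny Python sketch of the client-side bookkeeping this forces, under the simplifying assumption that responses are already parsed into header dicts with lowercase keys (the function name is made up):

```python
def requests_to_retry(sent, responses):
    """Return the pipelined requests that never got an answer.

    sent:      request labels in the order they were written to the socket.
    responses: parsed header dicts (lowercase keys) in the order received.
    Responses arrive in request order, so anything sent after the response
    carrying "Connection: close" is lost and has to be retried.
    """
    answered = 0
    for headers in responses:
        answered += 1
        if headers.get("connection", "").lower() == "close":
            break
    return sent[answered:]
```

So `requests_to_retry(["req1", "req2", "req3"], [{"connection": "close"}])` gives back `["req2", "req3"]`. Blindly resending those is only safe if they are idempotent, which is part of why clients gave up on pipelining.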
Curl removed it and there's a blog post about it: https://daniel.haxx.se/blog/2019/04/06/curl-says-bye-bye-to-...
https://portswigger.net/research/http-desync-attacks-request...
There's a reason why web servers will slam the door shut even when the client requests HTTP Keep Alive because they are unable to properly parse a request in a way that makes it safe to parse a follow-up request on the same TCP/IP connection.
That doesn't change the fact the protocol itself is quite well-defined.
There is no serious HTTP server that wouldn't support keep-alive, this is just FUD.
Keep alive and http pipelining are supported by major servers, some with bugs or issues, but no clients pipeline requests (at least not the major browsers, curl and other popular tooling).
It’s not FUD, pipelining and reuse of an existing connection is broken in the face of trying to parse text protocols that don’t have well defined semantics and where implementations reading the same documentation provide different results because it’s not black and white, it’s fuzzy around the edges.
Pipelining is just normal usage of TCP, which is a mechanism to establish two queues of bytes between two endpoints on a network.
There is no difference between sending data before or after having received data from the other party. The two directions are logically independent, even if at the transport level data from one direction contains acks of the other direction.
Now, some servers will start processing requests on a given connection in parallel, and will not correctly synchronize the multiple threads trying to write back their respective response to the client. This is just a bug on the server doing multithreading incorrectly, and has nothing to do with any framing problems in the protocol.
I suppose HTTP/2 supports that use case better, since it can multiplex the concurrent responses, but the correct thing to do is to simply treat each request synchronously one after the other, and not parallelize the processing of multiple requests on a given TCP connection.
Look, cleartext internet protocols are on the way out, because their model is fundamentally broken. For security reasons, I will note, and privacy. There, we joust security against security. Cleartext HTTP/1 is strictly a legacy matter, retained only because there’s still too much content stuck on it. But browsers will be more aggressively phasing it out sooner or later, first with the likes of scary address bar “insecure” badges, and probably within a decade by disabling http: by default in a way similar to Firefox’s HTTPS-Only Mode (puts up a network error page with the ability to temporarily enable HTTP for the site), though I doubt it’ll be removed for decades. And HTTP/1 at least over TLS will remain the baseline for decades to come—HTTP/2 could conceivably be dropped at some point, but HTTP/3 is very unlikely to ever become the baseline because it requires more setup effort.
You can still use cleartext HTTP/1 at least for now if you want, but this functionality was rightly more or less removed in HTTP/2, and fully removed in HTTP/3. Pervasive monitoring is an attack (https://www.rfc-editor.org/rfc/rfc7258.html), and HTTP/2 and HTTP/3 are appropriately designed to mitigate it.
Look, be real: the entire web is now built heavily on the CA model. If free issuance of certificates falters, the internet as we know it is in serious trouble. Deal with it. Social factors. This might conceivably happen, and if it does, HTTP/1 will not save you. In fact, cleartext HTTP/1 will be just about the first thing to die (be blocked) in the most likely relevant sequence of events.
Suppose I have an app within an intranet that's secured with, say, Wireguard or an application-layer tunnel (eg, SSH or Openziti).
Bringing HTTP/3 into the picture means dealing with CAs and certs on top of the provisioning I've already done for my lower layers, possibly leaking information via Certificate Transparency logs. Then the cost of double-encryption, etc.
I have a suspicion that this will prove to be a better abstraction than application-level encryption for everything. If I'm right, I would expect things to naturally start migrating in that direction over time. We'll see!
This (HTTP/1 won't save us) doesn't seem entirely accurate to me.
I can run free, untrusted HTTPS easily using self-issued certificates. It's relatively simple to think of mechanisms where trust can be layered on top of that outside the traditional CA mechanisms (think Keybase derivatives like DID-systems). It's a small patch to allow that alternative trust framework to be used for HTTPS.
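As a sketch of what such an alternative trust layer could look like at its simplest, here is trust-on-first-use style pinning in Python with only the standard library. The function names are mine, and this deliberately skips CA validation, so it is only as good as the out-of-band channel the pinned fingerprint came from.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """Compare a certificate against a fingerprint obtained out of band."""
    return hmac.compare_digest(fingerprint(der_cert), pinned_hex.lower())


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 10.0) -> bytes:
    """Fetch the server certificate WITHOUT CA validation (pinning replaces it)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False      # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

On first contact you would store `fingerprint(fetch_peer_cert("self-signed.badssl.com"))`, and on later connections refuse to talk unless `matches_pin` still holds.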
I don't know HTTP/3 at all, but if it is more tightly tied to CA infrastructure that is a problem.
At present you can. But think about what conditions might lead to free issuance faltering: it will almost certainly boil down to pressure from governments. And do you think that such governments will lightly allow you to bypass their measures? No; once the dust settles, no technical measures will be effective: the end result will be mandatory interception of all traffic, with TLS proxying and similar, and any other traffic blocked. Countries have even done this at times, requiring anyone who wants to access the internet to install their root certificate.
The internet is designed to be comparatively robust against sociopolitical attack, but if a sufficiently powerful government decides to concertedly attack the internet as we know it, the internet will not win the conflict.
> I don't know HTTP/3 at all, but if it is more tightly tied to CA infrastructure that is a problem.
As clarified elsewhere in this thread, HTTP/3 changes absolutely nothing about certificate verification; superkuh appears to have misunderstood the meaning of the text in the spec.
No. You've just got your commercial blinders on. The entire *commercial web* is built on the CA model. But the commercial web is hardly all there is. There is a giant web of actual websites run by human persons out there that do not depend on CA TLS and whose use cases do not require dropping clear text connections. That's only a need for for-profit businesses and institutions.
I agree that the mega-corp browsers will drop support for any protocol that does not generate them profit. The consequences of this action will be dire for everyone. But you can't convince people of this. You just have to let it happen and let people learn from the pain. Just like with the social networks.
This sounds like a red herring to me.
edit: Yeah I've more or less confirmed that self signed certs are perfectly fine in HTTP3. This is a big ball of nothing.
And that's without worrying about HTTP/3.
Or barring that, I wish that browsers would ease up a bit and make TOFU-style self-signed certs acceptable.
I really don't like how there is an expiry time built into TLS sites. Have you ever found someone's old site, usually hosted by a university, that just lives on year after year like a time capsule? Well, that's not gonna happen with TLS.
And on the subject of CAs, I don't think I trust them any more than a TOFU model. Have you looked at and verified every authority in your CA file? Do you really trust the Turkish government to be able to sign for any website?
Aha! you say, this is why we have cert pinning.
To which my reply is: cert pinning is the TOFU model where you have removed all user agency. It is better than the CA model but really sucks from an end-user perspective. When things go wrong, there is no easy way to fix it.
8 keys? f + i + r + e + f + o + x + Enter?
https://developer.chrome.com/blog/chrome-58-deprecations/#re...
I would be extremely relieved if I am wrong and someone could explain how I am wrong. Like... maybe there's some mechanism to self-sign without CA and use a null cypher? So even if most users would be scared away geeks could click through (like today's status quo with self-signed ssl certs).
This is a gross misrepresentation of the situation. Yes, Google played a significant role in the development of HTTP/2, QUIC and HTTP/3, providing the starting point for the development work in each case, but there was no open-washing: there was a collaborative process with the involvement of many interested parties, and the end result was significantly different from what was first proposed, and significantly better. This is how IETF works. Google did not control matters in any way, nor Microsoft.
> The "https" scheme associates authority with possession of a certificate that the client considers to be trustworthy for the host identified by the authority component of the URI. Upon receiving a server certificate in the TLS handshake, the client MUST verify that the certificate is an acceptable match for the URI's origin server using the process described in Section 4.3.4 of [HTTP]. If the certificate cannot be verified with respect to the URI's origin server, the client MUST NOT consider the server authoritative for that origin.
This boils down to “this is HTTPS, so the same rules as ever apply for matching the certificate and origin”. I suspect you’ve misunderstood what authoritativity conveys. The last sentence is saying “… and if verification fails, don’t trust the connection”—and it’s up to each app to decide what to do about that; browsers put up a scary warning error page that you can normally click through (depending on server configuration). Note that it doesn’t even hardcode the CA model; I like the way RFC 9110 §4.3.3 ¶1 puts it: “The client usually relies upon a chain of trust, conveyed from some prearranged or configured trust anchor, to deem a certificate trustworthy.”
You can read more about the rules of HTTPS in https://www.rfc-editor.org/rfc/rfc9110#section-4.3.3 (sections 4.3.3 and 4.3.4). Certificate verification is the same as ever, and the only difference between HTTP/1 and HTTP/2 and HTTP/3 is that HTTP/1 has connection-per-origin, where 2 and 3 can use a connection for multiple origins (§4.3.3 ¶2–3 spells it out).
the zealous "you must obey the law" tone of SOME comments here reinforces the worst stereotypes of corporate apparats.. individuals doing the bidding of institutions based on the letter of their "laws"
Human history has shown again and again that this ends badly .. HTTP is OK with ME
I'm on my phone so I can't confirm this is http3, but how about https://self-signed.badssl.com/
$ curl -v https://self-signed.badssl.com/
* Trying 104.154.89.105:443...
* Connected to self-signed.badssl.com (104.154.89.105) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS header, Unknown (21):
* TLSv1.2 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: self-signed certificate
* Closing connection 0
curl: (60) SSL certificate problem: self-signed certificate
More details here:
https://curl.se/docs/sslcerts.html
"curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above."
$ curl --version
curl 7.81.0 (x86_64-pc-linux-gnu) libcurl/7.81.0 OpenSSL/3.0.2 zlib/1.2.11 brotli/1.0.9 zstd/1.4.8 libidn2/2.3.2 libpsl/0.21.0 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.43.0 librtmp/2.3 OpenLDAP/2.5.13
Release-Date: 2022-01-05
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets zstd
How do you propose to secure user sessions and prevent MITM or tracking otherwise?
curl -kv https://self-signed.badssl.com/
* Trying 104.154.89.105:443...
* TCP_NODELAY set
* Connected to self-signed.badssl.com (104.154.89.105) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /opt/local/share/curl/curl-ca-bundle.crt
CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=California; L=San Francisco; O=BadSSL; CN=*.badssl.com
* start date: Aug 12 15:59:10 2022 GMT
* expire date: Aug 11 15:59:10 2024 GMT
* issuer: C=US; ST=California; L=San Francisco; O=BadSSL; CN=*.badssl.com
* SSL certificate verify result: self signed certificate (18), continuing anyway.
> GET / HTTP/1.1
> Host: self-signed.badssl.com
> User-Agent: curl/7.65.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
< Date: Fri, 21 Oct 2022 18:41:58 GMT
< Content-Type: text/html
< Content-Length: 502
< Last-Modified: Fri, 12 Aug 2022 15:59:21 GMT
< Connection: keep-alive
< ETag: "62f678d9-1f6"
< Cache-Control: no-store
< Accept-Ranges: bytes
<
If you decide you trust that certificate (which can be a legitimate thing to do - the cert signature could be communicated to you via out-of-band trusted mechanisms) then https://curl.se/docs/sslcerts.html explains how to trust it.