The HTTP crash course nobody asked for

The HTTP crash course nobody asked for (fasterthanli.me)

902 points by g0xA52A2A 3 years ago | 141 comments

Joker_vD 3 years ago |

> HTTP/1.1 is a delightfully simple protocol, if you ignore most of it.

As someone who had to write a couple of proxy servers, I can't express how so sadly accurate it is.

chrismorgan 3 years ago | |

And this is why I expect HTTP/2 and HTTP/3 to be much more robust in the long term: the implementations are harder to write, and you won’t get anywhere without reading at least some of the spec, whereas HTTP/1 is deceptively simple with therefore a lot of badly incorrect implementations, often with corresponding security problems.

superkuh 3 years ago | | |

HTTP/3 is written for the use case of large corporations and does not even allow for human persons to use it alone. It requires CA based TLS to set up a connection. So if you want to host a website visitable by a random person you've never communicated with before you have to get continued permission from an incorporated entity running a CA to do so.

This is far more of a security problem than all of the bad HTTP 1.1 implementations put together. It is built in corporate control that cannot be bypassed except by not using HTTP/3. It is extremely important that we not let the mega-corp browsers drop HTTP 1.1 and continue to write our own projects for it.

bmitc 3 years ago | | |

> whereas HTTP/1 is deceptively simple with therefore a lot of badly incorrect implementations

Doesn't that imply that HTTP/1 is deceptively complex?

chestervonwinch 3 years ago | | |

I get what you're saying, but robustness through complexity feels like an odd argument nonetheless.

arjvik 3 years ago | | |

As someone who has not read the HTTP/1.1 spec, what are some pitfalls that could actually become security issues?

dbttdft 3 years ago | | |

I don't think I could implement a correct HTTP 1 agent even if I read the specs.

gumby 3 years ago | | |

But for backward compatibility, implementors will still have to support HTTP/1, which will likely take more than 50% of the total effort.

mgaunard 3 years ago | | |

HTTP/2 makes no sense at all. HTTP/3 is just a fix to HTTP/2 so that it makes some sort of sense.

Both of these are only concerned with reducing the latency of doing lots of requests to the same server in parallel.

Which is only needed by web browsers and nothing else.

SamuelAdams 3 years ago | |

I feel like this applies to many technologies. Made me think of the bootstrapping, “I-can-build-that-in-a-weekend” crowd.

The initial problem is usually easy to solve; it’s all the edge cases and other details that make something complex.

cookiengineer 3 years ago | |

> As someone who had to write a couple of proxy servers, I can't express how so sadly accurate it is.

Chunked transfer/content encoding problems still give me nightmares...

Donckele 3 years ago |

“By contrast, I think about Bluetooth a lot. I wish I didn't.”

LOL, yes same here. Can’t wait for Bluetooth’s b̶a̶l̶l̶s̶ baggage to be chopped.

leinadho 3 years ago |

The humorous style is very refreshing, if only my networking lecturers had been more witty I might remember more of this

X-Istence 3 years ago |

> This is not the same as HTTP pipelining, which I will not discuss, out of spite.

That's because HTTP pipelining was and is a mistake, and is responsible for a ton of HTTP request smuggling vulnerabilities, because the HTTP/1.1 protocol has no framing.

No browser supports it anymore, thankfully.

mgaunard 3 years ago | |

Isn't "HTTP pipelining" just normal usage of HTTP/1.1?

Anyone that doesn't support this is broken. My own code definitely does not wait for responses before sending more requests, that's just basic usage of TCP.

X-Istence 3 years ago | | |

HTTP Pipelining has the client sending multiple requests before receiving a response. It turns it into Request, Request, Request, Response, Response, Response.

The problem is that if Request number 1 leads to an error whereby the connection is closed, those latter two requests are discarded entirely. The client would have to retry request number two and three. If the server has already done work in parallel though, it can't send those last two responses because there is no way to specify that the response is for the second or third request.

The only way a server has to signal that it is in a bad state is to return 400 Bad Request and to close the connection because it can't keep parsing the original requests.

There is no support for HTTP pipelining in current browsers.

What you are thinking about is probably HTTP keep alive, where the same TCP/IP channel is used to send a follow-up request once a response to the original request has been received and processed. That is NOT HTTP pipelining.
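To make the ordering constraint concrete, here is a minimal sketch in Python (chosen for brevity, not the article's Rust). `pipeline` and `split_responses` are hypothetical helper names, and the sketch assumes every response carries a Content-Length header and never uses chunked encoding. The point it illustrates: responses carry no request identifier, so the only way to match them up is by position.

```python
def pipeline(host, paths):
    """Concatenate several GET requests so they can be written in one go."""
    return b"".join(
        f"GET {p} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii") for p in paths
    )


def split_responses(raw):
    """Split a byte stream of back-to-back responses, in arrival order.

    Assumption: each response has a Content-Length header (no chunked encoding).
    Returns (list of (status_line, body), leftover bytes).
    """
    responses = []
    while True:
        head, sep, rest = raw.partition(b"\r\n\r\n")
        if not sep:
            break  # headers of the next response are not complete yet
        lines = head.split(b"\r\n")
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if len(rest) < length:
            break  # body of this response is not complete yet
        responses.append((lines[0].decode("ascii"), rest[:length]))
        raw = rest[length:]
    return responses, raw
```

Pairing is then just `zip(paths, responses)`. If the server closes the connection after the first response, the zip comes up short and the remaining requests have to be retried, which is the failure mode described above.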

deathanatos 3 years ago | | |

> Isn't "HTTP pipelining" just normal usage of HTTP/1.1?

> Anyone that doesn't support this is broken. My own code definitely does not wait for responses before sending more requests, that's just basic usage of TCP.

Yep.

There is some "support" a server could do, in the form of processing multiple requests in parallel¹, e.g., if it gets two GET requests back to back, it could queue up the second GET's data in memory, or so. The responses still have to be streamed out in the order they came in, of course. Given how complex I imagine such an implementation would be, I'd expect that to be implemented almost never, though; if you're just doing a simple "read request from socket, process request, write response" loop, then like you say, pipelined requests aren't a problem: they're just buffered on the socket or in the read portion's buffers.

¹this seems fraught with peril. I doubt you'd want to parallelize anything that wasn't GET/HEAD for risk of side-effects happening in unexpected orders.
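A rough sketch of that simple loop in Python, to show why pipelined requests need no special support on the server side. `pop_request` and `handle_buffered` are made-up names, and the sketch assumes body-less requests (GET/HEAD style) only; anything with a request body would need Content-Length handling on top.

```python
def pop_request(buf):
    """Remove one complete, body-less request from buf; return its path or None."""
    end = buf.find(b"\r\n\r\n")
    if end == -1:
        return None  # incomplete request: wait for more bytes from the socket
    head = bytes(buf[:end])
    del buf[: end + 4]
    _method, path, _version = head.split(b"\r\n")[0].split(b" ")
    return path.decode("ascii")


def handle_buffered(buf):
    """Answer every complete request sitting in buf, strictly in arrival order."""
    out = bytearray()
    while (path := pop_request(buf)) is not None:
        body = f"you asked for {path}\n".encode("ascii")
        out += b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body
    return bytes(out)
```

Whether the client sent one request or three before reading, the loop behaves the same: extra requests just sit in `buf` until their turn.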

yfiapo 3 years ago |

> We're not done with our request payload yet! We sent:

> Host: neverssl.com

> This is actually a requirement for HTTP/1.1, and was one of its big selling points compared to, uh...

> AhAH! Drew yourself into a corner didn't you.

> ...Gopher? I guess?

I feel like the author must know this.. HTTP/1.0 supported but didn't require the Host header and thus HTTP/1.1 allowed consistent name-based virtual hosting on web servers.

I did appreciate the simple nature of the early protocols, although it is hard to argue against the many improvements in newer protocols. It was so easy to use nc to test SMTP and HTTP in particular.

I did enjoy the article's notes on the protocols; however, the huge sections of code snippets lost my attention midway.

proto_lambda 3 years ago | |

> I feel like the author must know this

The author does know this, it's a reference to a couple paragraphs above:

> [...] and the HTTP protocol version, which is a fixed string which is always set to HTTP/1.1 and nothing else.

> (cool bear) But what ab-

> IT'S SET TO HTTP/1.1 AND NOTHING ELSE.

yfiapo 3 years ago | | |

Thanks, missed that.

I_complete_me 3 years ago |

That was an excellent, well-written, well-thought out, well presented, interesting, humorous, enjoyable read. Coincidentally I recently did a Rust crash course so it all made perfect sense - I am not an IT pro. Anyhows, thanks.

pohuing 3 years ago | |

I highly recommend taking a look at the other writeups on fasterthanli.me; they're almost all excellent.

mihneawalker 3 years ago | |

I'd like to ask which Rust crash course you took, as there are quite a few out there, and it would help to have a specific recommendation.

atfzl 3 years ago | | |

Try https://fasterthanli.me/articles/a-half-hour-to-learn-rust which is also written by the same author.

I_complete_me 3 years ago | | |

YouTube: Let's Get Rusty - ULTIMATE Rust Lang Tutorial! - Getting Started

becquerel 3 years ago | |

After the string of positive adjectives, I was expecting the second half of your comment to take a sharp turn into cynicism. Thank you for subverting my expectations by not subverting my expectations!

q-base 3 years ago | |

I will piggyback on your comment as I totally agree. I am amazed at the amount of work that must go into not just writing the article itself but all the implementations along the way. Really amazing job!

Andys 3 years ago |

I learned HTTP1 pretty well but not much of 2.

Since playing with QUIC, I've lost all interest in learning HTTP/2, it feels like something already outdated that we're collectively going to skip over soon.

fasterthanlime 3 years ago | |

I tend to agree with you there, however the thing I'm replacing does HTTP/2, and HTTP/3 is yet another can of worms as far as "production multitenant deployment" goes, so, that's what my life is right now.

As far as learning goes, I do think HTTP/2 is interesting as a step towards understanding HTTP/3 better, because a lot of the concepts are refined: HPACK evolves into QPACK, flow control still exists but is neatly separated into QUIC. I've only taken a cursory look at H3 so far, but it seems like a logical progression that I'm excited to dig into deeper, after I've gotten a lot more sleep.

masklinn 3 years ago | |

FWIW HTTP/3 very much builds upon / reframes HTTP/2’s semantics, so it might be useful to get a handle on /2, as I’m not sure all the /3 documentation will frame it in /1.1 terms.

pcthrowaway 3 years ago | |

HTTP1 is definitely outdated (it was expeditiously replaced by HTTP 1.1), but I'd argue ignoring HTTP/2 might be more like ignoring IPv4 because we have IPv6 now

Joker_vD 3 years ago | |

It's pretty much a transport-level protocol, just like QUIC.

Icathian 3 years ago |

Amos' writing style is just so incredibly good. I don't know anyone else doing these very long-form, conversational style articles.

Plus, you know, just an awesome dev who knows his stuff. Huge fan.

mcspiff 3 years ago | |

https://xeiaso.net/ is equally great content in a similar style in my opinion. Different area of topics a bit, but I enjoy both very much

Icathian 3 years ago | | |

Oh, this looks very promising. Thanks for the recommendation!

juped 3 years ago |

If you're using OpenBSD nc already, just use nc -c for TLS.

stevewatson301 3 years ago | |

Depending on your version of nc, -c is for sending CRLFs or executing sent data as commands. You might be looking for ncat instead.

Denvercoder9 3 years ago | | |

In OpenBSD nc (as GP mentioned), -c is for a TLS connection: https://man.openbsd.org/nc.1

silon42 3 years ago | | |

My nc has that as -C, no -c option.

photochemsyn 3 years ago |

What a great overall site. Hopping down the links I found the section on files with code examples in JS, Rust and C, plus strace, really the best short explanation I've ever found online.

https://fasterthanli.me/series/reading-files-the-hard-way/pa...

rpigab 3 years ago |

This is awesome, didn't read all of it yet, but I will for sure, I use HTTP way too much and too often to ignore some of these underlying concepts, and when I try to look it up, there's always way too much abstraction and the claims aren't proven to me with a simple example, and this article is full of simple examples. Thanks Amos!

est 3 years ago |

I hope there's a h2 or TLS crash course.

fasterthanlime 3 years ago | |

Against my better judgement, the article /does/ go over H2 (although H3 is all the rage right now).

For TLS, I recommend The Illustrated TLS 1.3 Connection (Every byte explained and reproduced): https://tls13.xargs.org/

tehmillhouse 3 years ago | | |

I'd like to thank you for the time and effort it must take to research, write and edit these articles. The tone you strike with these articles is a delight to read, and I find myself gobbling these things up even for topics about which I (falsely, it usually turns out) consider myself fairly knowledgeable.

keewee7 3 years ago | | |

Thanks for the link! Are there other good crash courses on various protocols and standards? Directly jumping into the dry official specs is just too overwhelming sometimes.

antonvs 3 years ago |

> Where every line ends with \r\n, also known as CRLF, for Carriage Return + Line Feed, that's right, HTTP is based on teletypes, which are just remote typewriters

Does it need to be pointed out that this is complete bullshit?

sireat 3 years ago |

Is HTTP always the same protocol as HTTPS - given the same version - and ignoring the encryption from TLS?

Theoretically yes, but in practice?

I've done my share of nc testing even simpler protocols than HTTP/1.1

For some reason the migration to HTTPS scared me despite the security assurances. I could not see anything useful in wireshark anymore. I now had to trust one more layer of abstraction.

st_goliath 3 years ago | |

> Is HTTP always the same protocol as HTTPS - given the same version - and ignoring the encryption from TLS?

> Theoretically yes, but in practice?

Yes, that's the whole point of encapsulation. The protocol is blissfully unaware of encryption and doesn't even have to be. It has no STARTTLS mechanism either.

Your HTTPS traffic consists of a TCP handshake that establishes a TCP connection, a TLS handshake across that TCP connection to exchange keys and establish a TLS session, and the exact same HTTP request/response traffic, inside the encrypted/authenticated TLS session.

The wonderful magic of solving a problem by layering/encapsulating.
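The layering can be sketched in a few lines of Python using only the standard `socket` and `ssl` modules. `build_request` and `fetch` are illustrative names, and the port choice (80 vs 443) is the usual default rather than anything the protocol mandates. Note that the request bytes are built once and do not change depending on whether TLS is in the picture.

```python
import socket
import ssl


def build_request(host, path="/"):
    """The HTTP/1.1 bytes are identical whether or not TLS is underneath."""
    return (
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")


def fetch(host, use_tls):
    """Open TCP, optionally layer TLS on top, then speak plain HTTP/1.1."""
    sock = socket.create_connection((host, 443 if use_tls else 80), timeout=10)
    if use_tls:
        # The TLS handshake happens here; server_hostname is also sent as SNI.
        sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(build_request(host))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch(some_host, use_tls=False)` and `fetch(some_host, use_tls=True)` needs network access, so treat it as something to try by hand rather than a guaranteed result.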

> I could not see anything useful in wireshark anymore

Wireshark supports importing private keys for that, see: https://wiki.wireshark.org/TLS

fasterthanlime 3 years ago | | |

The article covers using Wireshark to decrypt TLS traffic using Pre-Shared Master Secrets!
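For anyone wanting to try the key-log approach from their own client code, here is a small Python sketch. It is not what the article does (the article is in Rust); it assumes Python 3.8+ built against an OpenSSL that supports key logging, and the file location is an arbitrary temporary path chosen for illustration.

```python
import os
import ssl
import tempfile

# Hypothetical location for the key log; point Wireshark's TLS key log
# preference at the same file to decrypt captures of this client's traffic.
keylog_path = os.path.join(tempfile.mkdtemp(), "tls-keys.log")

ctx = ssl.create_default_context()
# Session secrets for connections made with ctx get appended to this file.
ctx.keylog_filename = keylog_path
```

Any socket wrapped with `ctx.wrap_socket(...)` afterwards should have its secrets logged, which is what lets Wireshark see the HTTP inside the TLS session.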

ok123456 3 years ago | | |

The encapsulation isn't complete because of SNI.

dochtman 3 years ago | |

For 1.1 and 2, the byte stream is the same for TCP vs TLS over TCP. For 3, it uses one stream per request over a QUIC connection which is always encrypted.

Too 3 years ago | |

The protocol is the same, but semantics in the applications can differ. Secure cookies only working on https to give one example.
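A tiny Python illustration of that cookie example, using the standard `http.cookies` module. The cookie name and value are placeholders, and `would_send` is a deliberately simplified stand-in for what a browser does (real browsers also consider domain, path, SameSite and so on).

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "abc123"        # hypothetical cookie name and value
cookie["session"]["secure"] = True  # ask clients to send it over HTTPS only
header = cookie.output()            # the Set-Cookie header line as a string


def would_send(secure_flag, scheme):
    """Simplified client-side rule: Secure cookies go out only over https."""
    return scheme == "https" or not secure_flag
```

The header bytes on the wire look the same over HTTP and HTTPS; it is the client's behaviour that differs by scheme.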

mannyv 3 years ago |

As far as I can tell, the Host header is pointless, because if it's SSL/TLS you won't be able to read it and route it. That's what SNI is for. If you aren't using TLS then you don't need it, unless you hit the server by IP. But then why would you do that?

LukeShu 3 years ago | |

It's for one server/IP serving multiple hostnames. For instance, the same physical server at 45.76.26.79 serves both www.lukeshu.com and git.lukeshu.com with the same instance of Nginx. Once Nginx decrypts the request, it needs to know which `server { … }` block to use to generate the reply.

With TLS+SNI, this is redundant to the name from SNI. But we had TLS long before we had SNI, and we had HTTP long before we had TLS, and both of those scenarios need the `Host` header.
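As a rough Python sketch of what that selection step amounts to: `parse_host`, `route` and the `SITES` table are all hypothetical, standing in for Nginx's `server { … }` matching, which is considerably more involved in reality. The port stripping here is naive and would mishandle IPv6 literals.

```python
def parse_host(request_head):
    """Return the Host header value (lowercased, port stripped), or None."""
    for line in request_head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            # Naive port stripping; does not handle IPv6 literals like [::1]:80.
            return value.strip().decode("ascii").lower().split(":")[0]
    return None


# Hypothetical site table, standing in for Nginx `server { ... }` blocks.
SITES = {
    "www.lukeshu.com": "main site",
    "git.lukeshu.com": "git frontend",
}


def route(request_head):
    """Pick a site by Host; fall back to a default when absent or unknown."""
    return SITES.get(parse_host(request_head), "default site")
```

An HTTP/1.0 request with no Host header lands on the default site, which is exactly the ambiguity the header was introduced to remove.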

Too 3 years ago | |

Proxies doing TLS termination, with multiple servers behind.

mahdi7d1 3 years ago |

I didn't ask but I needed it.

mannyv 3 years ago |

Also, never trust the content length. It's been that way since before http was finalized. Use it as guidance, but don't treat it as canonical.
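One way to read "guidance, not canonical" in code, as a hedged Python sketch: `read_body` is a made-up helper, the 1,000,000-byte cap is an arbitrary choice, and `stream` is assumed to be any file-like object with a `read(n)` method.

```python
import io


def read_body(stream, declared_length, max_length=1_000_000):
    """Read a body while treating Content-Length as a hint, not a promise.

    Returns (body, complete); complete is False if the peer sent fewer
    bytes than it declared.
    """
    if declared_length < 0 or declared_length > max_length:
        raise ValueError(f"refusing Content-Length {declared_length}")
    body = bytearray()
    while len(body) < declared_length:
        chunk = stream.read(declared_length - len(body))
        if not chunk:
            break  # peer closed early: the header overstated the body
        body += chunk
    return bytes(body), len(body) == declared_length


# Usage: the sender declared 5 bytes but only delivered 3.
body, complete = read_body(io.BytesIO(b"hel"), declared_length=5)
```

The cap guards against absurd declared sizes, and the `complete` flag lets the caller decide what to do with a short body instead of blocking forever or silently accepting it.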

mannyv 3 years ago |

When doing http by hand, it's better to do http/1.0 because that tells the server you (and it) can't do anything exciting.

mustak_im 3 years ago |

Yay! this is going to be a great read for the weekend!

danesparza 3 years ago |

More articles should be written in the style of this article. Thank you for this.

stefs 3 years ago | |

most of his articles are written in this style. they're great!

tinglymintyfrsh 3 years ago |

    GET / HTTP/1.0\r\n\r\n

Still works with many websites.
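For the other half of that exchange, a minimal Python sketch of picking apart the reply. It assumes the full response has already been read until the server closed the connection (the usual way an HTTP/1.0 exchange ends); `parse_response` is an illustrative helper, not a complete parser.

```python
REQUEST = b"GET / HTTP/1.0\r\n\r\n"  # the whole request: one line, then a blank line


def parse_response(raw):
    """Split a complete response (read until EOF) into its parts."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body
```

Repeated headers overwrite each other in this sketch, which is fine for poking at a server by hand but not for anything serious.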

mlindner 3 years ago |

Is there a way to get this guide without the annoying side-commentary?

fasterthanlime 3 years ago | |

The RFCs themselves are pretty dry, if that's your thing — https://httpwg.org/ has the freshest ones.

tomcam 3 years ago |

Funny and very helpful. Thank you.

cph123 3 years ago |

For a crash course would the code examples have been better in something like Python rather than Rust?

fasterthanlime 3 years ago | |

My whole thing is that I'm teaching Rust /while/ solving interesting, real-world problems (instead of looking at artificial code samples), so, if someone wants to write the equivalent article with Python, they should! I won't.

rk06 3 years ago | |

Nope, that’s the author’s favourite language. A regular reader would expect rust to be used like in previous articles

tmountain 3 years ago |

This is gold.

    $ curl -v https://self-signed.badssl.com/
    * Trying 104.154.89.105:443...
    * Connected to self-signed.badssl.com (104.154.89.105) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * CAfile: /etc/ssl/certs/ca-certificates.crt
    * CApath: /etc/ssl/certs
    * TLSv1.0 (OUT), TLS header, Certificate Status (22):
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.2 (IN), TLS header, Certificate Status (22):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.2 (IN), TLS header, Certificate Status (22):
    * TLSv1.2 (IN), TLS handshake, Certificate (11):
    * TLSv1.2 (OUT), TLS header, Unknown (21):
    * TLSv1.2 (OUT), TLS alert, unknown CA (560):
    * SSL certificate problem: self-signed certificate
    * Closing connection 0
    curl: (60) SSL certificate problem: self-signed certificate
    More details here: https://curl.se/docs/sslcerts.html

    $ curl --version
    curl 7.81.0 (x86_64-pc-linux-gnu) libcurl/7.81.0 OpenSSL/3.0.2 zlib/1.2.11 brotli/1.0.9 zstd/1.4.8 libidn2/2.3.2 libpsl/0.21.0 (+libidn2/2.3.2) libssh/0.9.6/openssl/zlib nghttp2/1.43.0 librtmp/2.3 OpenLDAP/2.5.13
    Release-Date: 2022-01-05
    Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
    Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets zstd