Understanding IP, TCP, and HTTP

Understanding IP, TCP, and HTTP (objc.io)

237 points by danieleggert 12 years ago | 61 comments

How I love it when people without deep knowledge of some subject write authoritative sounding articles.

Without guarantee of completeness, to avoid the spread of misinformation:

- IPv6 fragmentation has nothing to do with some "minimum payload size" (whatever that is) - there simply is no fragmentation being done by routers, the sender still can fragment however it pleases, and presumably will do so whenever it has to send a packet that doesn't fit through the path MTU.

- The end points use ICMPv6 Packet Too Big messages to determine the _path_ MTU, which is different from just "the MTU".

- With IPv4, the sender chooses whether a router will fragment when the packet exceeds the next-hop MTU or whether the router should drop the packet and send a Fragmentation Needed ICMP message - where the latter again is used for path MTU discovery.

- Path MTU discovery is useful because it allows the sending IP implementation to push the chunking higher up the stack when the sending higher-level protocol has the capability (as is the case with TCP, but not with UDP, for example), which tends to produce lower overhead. Unfortunately, some clueless firewall administrators, such as those responsible for AWS EC2, do filter all ICMP because they for unknown reasons consider it to be bad, thus breaking PMTUD, which can lead to hanging TCP connections.

- TCP sequence numbers are for bytes, always, with the special case of SYN and FIN also counting as "bytes" in the sequence, but never for segments.
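To make the last point concrete, here is a minimal sketch of how sequence numbers advance. The function name `next_seq` and the initial sequence number 1000 are made up for illustration; real stacks pick a randomized initial value.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def next_seq(seq, payload_len, syn=False, fin=False):
    """Return the sequence number that follows a segment.

    The number advances by the payload length in bytes, never by a
    per-segment counter. SYN and FIN each consume one sequence number.
    """
    return (seq + payload_len + int(syn) + int(fin)) % SEQ_SPACE


isn = 1000                                        # hypothetical initial sequence number
after_syn = next_seq(isn, 0, syn=True)            # 1001: SYN counts as one "byte"
after_data = next_seq(after_syn, 500)             # 1501: 500 payload bytes
after_more = next_seq(after_data, 1460)           # 2961: 1460 payload bytes
after_fin = next_seq(after_more, 0, fin=True)     # 2962: FIN counts as one "byte"
```

Two data segments were sent, yet the sequence number moved by 1960, the number of payload bytes, plus one each for SYN and FIN.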

unethical_ban 12 years ago | |

I love it when people without deep knowledge of a subject try to learn about it and explain themselves to others.

stusmall 12 years ago | | |

This is an important part of the way I learn. I will read something and then explain it to someone else. It makes me think deeper about the issue as I form the words and it gives me a great chance to get corrected when I am making unfair assumptions. I always preface these conversations with "as I understand it" or "from what I read" or some other disclaimer. I used to have a coworker who would give me soooo much guff about these disclaimers since I'd drop so many of them in one of these conversations. I just felt it was important to make it clear I wasn't coming from a place of authority and more from the perspective of a guy who is bumbling through it and trying to figure out what the hell is going on.

smikhanov 12 years ago | | |

When it comes to complicated subjects like the internals of TCP/IP, such explanations may do more harm than good. Volume 1 of "TCP/IP Illustrated" is, I think, 700 pages long, and not without reason. When one writes condensed articles like this it makes sense to stay high-level, because the moment you get into discussing the SYN/ACK handshake, you are in danger of leaving large gaps in your explanation or making unreasonable stretches to complete the picture.

On a side note, it's funny to see that most of the paragraphs in the original article end with a link to Wikipedia. As a reader, I can go there myself, so what good do those links do me?

shadesandcolour 12 years ago | |

From the editorial page of this issue:

"We’ve created a new public repository on GitHub that contains all current and past objc.io articles. If you find any mistakes or have suggestions for improvements, please don’t hesitate to file issues, or even better: submit a pull request!"

Make a pull request so that people like me can learn about networking too.

mhurron 12 years ago | |

> "minimum payload size" (whatever that is)

I'll give them the benefit of the doubt and say he got his terms wrong. The IPv6 RFC states that IPv6 requires a minimum MTU of 1280 bytes. I guess that's what he meant.

https://www.ietf.org/rfc/rfc2460.txt

Packet Size Issues

IPv6 requires that every link in the internet have an MTU of 1280 octets or greater. On any link that cannot convey a 1280-octet packet in one piece, link-specific fragmentation and reassembly must be provided at a layer below IPv6.

zAy0LfpBZLC8mAC 12 years ago | | |

Which wouldn't really make me any more confident in the reliability of the whole thing?! Confusing lower-level fragmentation and reassembly with IPv6 fragmentation is not exactly a mistake you'd be likely to make when you understand what that actually means, I would think.

jakejake 12 years ago | |

Anybody writing routing code would be foolish to use this, or any other "simplified" article as a protocol reference. But, I'll agree that it is presented in such a way (and with enough technical detail) that any technical errors should be corrected.

If I had deep knowledge in this area, I'd probably applaud the effort and send corrections, rather than criticize.

danieleggert 12 years ago | |

Thanks for taking the time to comment on the article. I'll update the article later today to correct these. Your help is appreciated.

the_cat_kittles 12 years ago | |

Just out of curiosity, what do you do? Is this knowledge germane to where you work? I've just recently become interested in this stuff, so I'm curious to get a lay of the land.

zAy0LfpBZLC8mAC 12 years ago | | |

I do ... software development? ;-) There isn't really any particular category for what I do, though I tend to do more of the lower-level/backend stuff of projects, and knowing how the stuff that you build on works internally certainly is useful in optimizing and debugging.

As for getting an understanding of how TCP/IP works, I think Stevens' classic TCP/IP Illustrated still is a good book to get started, even if somewhat dated in some details (no IPv6, in particular), but the general principles still apply. Though maybe there are newer equally good books around that I just don't know about?

cbsmith 12 years ago | |

I'm also going to cut him some slack with the bit about giving each segment a unique number. While formally the sequence number identifies each byte of data, it really is about providing heuristics to identify and correct for out of order, fragmented, missing & duplicate segments. It is important that it be about bytes, particularly for things like SACK, but if you are trying to simplify things you might describe it as being about the segments.

zAy0LfpBZLC8mAC 12 years ago | | |

Sure, nothing wrong with simplifying things, but "Both ends are sending sackOK. This will enable Selective Acknowledgement. It switches the sequence numbers and acknowledgment number to use byte range instead of TCP segment numbers." is just flat-out wrong, and in particular suggests that "numbered segments" is not a simplification but an actual fact about how the thing works.
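As a rough illustration of why both the cumulative ACK and the SACK blocks are byte-based, here is a simplified receiver-side sketch. The function name `ack_and_sack` is hypothetical, ranges use an exclusive right edge, and real stacks also limit the number of blocks and order them by recency, which is ignored here.

```python
def ack_and_sack(next_expected, received):
    """Compute a cumulative ACK and SACK blocks from received byte ranges.

    received: list of (left, right) sequence-number ranges, right edge
    exclusive. Both results are expressed in byte sequence numbers.
    """
    # Merge overlapping or adjacent ranges first.
    merged = []
    for left, right in sorted(received):
        if merged and left <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))

    ack = next_expected
    sack_blocks = []
    for left, right in merged:
        if left <= ack:
            ack = max(ack, right)              # contiguous data advances the ACK
        else:
            sack_blocks.append((left, right))  # data received beyond a gap
    return ack, sack_blocks


# Bytes 1..1000 arrived, 1001..2000 are missing, 2001..4000 arrived.
ack, blocks = ack_and_sack(1, [(1, 1001), (2001, 3001), (3001, 4001)])
```

Here `ack` is 1001 and `blocks` is `[(2001, 4001)]`: the plain acknowledgment number already refers to a byte position, and SACK only adds byte ranges beyond the gap.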

cbsmith 12 years ago | |

And regarding dropping the ICMP message about fragmentation: good firewall implementations have the firewall discover the MTU behind it and express THAT. Even better, they might hide the hops behind it.

zAy0LfpBZLC8mAC 12 years ago | | |

There is no such thing as an "MTU behind it", there is a separate path MTU for each and every ordered address pair, more or less (and that's not even static, obviously).

And obviously I was talking about packet filters, not about some kind of application firewall, which obviously doesn't have anything to do with filtering of packets anyhow.
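A toy simulation may help show why the path MTU belongs to a path rather than to a box. It assumes each path is just a list of link MTUs in forward order and that every router reports its next-hop MTU when a packet is too big; the function name and the MTU values are invented for illustration.

```python
def discover_path_mtu(link_mtus):
    """Simulate path MTU discovery along one path.

    The sender starts at its own link MTU and shrinks the packet every
    time a hop reports that it is too big (Packet Too Big in ICMPv6,
    Fragmentation Needed in IPv4). Returns (path_mtu, probes_sent).
    """
    size = link_mtus[0]
    probes = 0
    while True:
        probes += 1
        # First link on the path that cannot carry a packet of this size.
        too_small = next((mtu for mtu in link_mtus if mtu < size), None)
        if too_small is None:
            return size, probes
        size = too_small  # the error message carries the next-hop MTU


forward = [1500, 1492, 1400, 1500]   # hypothetical path from A to B
reverse = [1500, 1280, 1500]         # hypothetical, different path from B to A

forward_result = discover_path_mtu(forward)   # (1400, 3)
reverse_result = discover_path_mtu(reverse)   # (1280, 2)
```

If the error messages are filtered, the loop above never learns the smaller size, which corresponds to the hanging connections described earlier in the thread.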

Myrmornis 12 years ago | |

What an unpleasant attitude. The author clearly made a lot of effort and as far as I know it's all accurate. If you know better then say so pleasantly.

tshaddox 12 years ago |

I've heard that a good way to gauge a person's general technological literacy is to simply ask "what happens when I type a URL in a browser and hit Enter?" Obviously, the question is deliberately open-ended, and any step in the process can be broken down into more detailed steps (up to a point). I'd like to see an article that initially shows high-level steps (e.g. DNS request, HTTP request, server processing, HTTP response, parsing and rendering), but allows each step to be expanded progressively with increasing detail.

pbhjpbhj 12 years ago | |

[Deeper]

The Pauli exclusion principle prevents electrons with the same quantum characteristics from entering the same space, this interaction occurs across the amassed copper atoms which form the majority of the metallic wires that approach one another in the internal structure of the keyboard ...

norswap 12 years ago |

If you want to further your understanding of network protocols, there's an excellent open textbook available here: http://cnp3book.info.ucl.ac.be/

maaaats 12 years ago | |

At Uni we had a book called "Computer Networking - A top down approach". One of the best teaching books I've ever read. The amount of detail is very nicely balanced, and as the title says it's a top down approach where one layer at a time is discussed. Very interesting.

greyskull 12 years ago | | |

We're using that same book, sixth edition, in my networks course right now. It's overly verbose like every other textbook, but the content is solid.

yeukhon 12 years ago | |

Also https://www.coursera.org/course/comnetworks is helpful. I use this to prepare for interviews, just so I don't have to dig through my textbook.

dd3141 12 years ago | |

Wow...what a great resource! I just downloaded it and will start reading it tonight. Thanks!

mrtbld 12 years ago |

> There’s a misconception that restarting the (HTTP) request will fix the problem. That is not the case. Again, TCP will resend those packets that need resending on its own.

But that's not true if the connection is interrupted at the socket level, right?

For example, if the device switches from 3G to Wi-Fi, or from Wi-Fi to wire, then, I believe, its hardware address changes, its IP address changes, and the socket becomes stale. But would the TCP connection be closed right away, or would it hang until some timeout? (And does it depend on the OS?)

laichzeit0 12 years ago |

If you want to learn about IP, TCP, UDP and some of the protocols below this, I would highly recommend reading Richard Stevens' book TCP/IP Illustrated, Volume 1: The Protocols.

For two reasons: It's probably one of the best introductions to the subject that has ever been written, and it's a model example of how a technical book should be written.

I'd be hard pressed to find a reason not to go this route at least once in your life. I know the material pretty well but I still re-read Stevens' books every few years just because they are so good.

roel_v 12 years ago | |

"I'd be hard pressed to find a reason not to go this route at least once in your life. I know the material pretty well but I still re-read Stevens' books every few years just because they are so good."

Then again, that's a lot of effort to spend on something that the vast majority of us don't need to know in much detail. The main reasons for knowing all the details are

- to write a new networking stack, or work on an existing one;

- to write or maintain server software or routers or caches or other software directly involved in networking;

- to break or exploit existing software.

(obviously 'because it's interesting' is a valid, but not practical reason to know)

mje__ 12 years ago | | |

If you write anything that communicates over a network (e.g. anything using HTTP), you need to know at least some of this stuff, otherwise you're not going to be able to explain why (for example) your service call latencies have a big spike around 200ms.
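A spike near 200ms is often attributed to Nagle's algorithm interacting with delayed ACKs. That diagnosis is an assumption here, since the comment does not name a cause. If it does apply, a minimal sketch of the usual mitigation is to disable Nagle's algorithm with the standard `TCP_NODELAY` socket option:

```python
import socket

# Create a TCP socket; no connection is needed to set the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so small writes are sent without waiting
# for outstanding data to be acknowledged.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it took effect (nonzero means enabled).
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

sock.close()
```

Whether this is the right fix depends on the workload; batching small writes into one send can work as well, and measuring before and after is the only way to know.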

beggi 12 years ago | |

TCP/IP Illustrated books are super detailed. If you don't want to dive quite that deep I recommend "Computer Networking: A Top-Down Approach Featuring the Internet".

rubiquity 12 years ago |

It's nice to see this recent increased emphasis on Web/mobile developers understanding the technologies that link it all together. The next thing I would add is a high level overview of the sockets API. While these topics aren't critical to most day-to-day lives of developers, they are certainly useful to understand.

marai2 12 years ago | |

This is a very readable online book on networking and sockets: http://beej.us/guide/bgnet/output/html/multipage/index.html

Talk about understanding the sockets API ;-) here's the content section for chapter 5:

  5.1. getaddrinfo() — Prepare to launch!
  5.2. socket() — Get the File Descriptor!
  5.3. bind() — What port am I on?
  5.4. connect() — Hey, you!
  5.5. listen() — Will somebody please call me?
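The guide itself is written in C, but the same calls exist in Python's `socket` module, so here is a hedged sketch of the sequence above on the loopback interface. It also uses `accept()`, which is not in the list above, and the address, port 0 (let the OS pick a free port), and the payload are arbitrary choices for the example.

```python
import socket

# getaddrinfo(): turn a host/port pair into parameters for socket().
family, socktype, proto, _, addr = socket.getaddrinfo(
    "127.0.0.1", 0, socket.AF_INET, socket.SOCK_STREAM
)[0]

# socket(): get the descriptor. bind(): claim an address and port.
server = socket.socket(family, socktype, proto)
server.bind(addr)                      # port 0 asks the OS for any free port
server.listen(1)                       # listen(): start queueing incoming connections
server_addr = server.getsockname()     # the address and port actually assigned

# connect(): the client side opens the TCP connection.
client = socket.socket(family, socktype, proto)
client.connect(server_addr)

# accept(): the server takes the queued connection off the listen queue.
conn, peer = server.accept()

client.sendall(b"hello")
received = conn.recv(1024)

for s in (conn, client, server):
    s.close()
```

Everything runs in one thread because the kernel completes the handshake for a listening socket before `accept()` is called; a real server would normally accept in a loop.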

stusmall 12 years ago | | |

I love beej's guide! It's where I learned socket programming. It is an art to make such a dry subject as entertaining as he does.

jnbiche 12 years ago | | |

Beej is where I learned socket programming. A great guide.

dnc 12 years ago | | |

Beej's guide is great! Thanks!

nasalgoat 12 years ago | |

I had one developer with over 6 years of server-side experience who made a server/client setup where the server would open a connection to the client, pass connection info to the client, close the connection, and then have the client open a connection back to the server to return results.

When I explained how TCP worked, in that the client could connect to the server and keep a socket open to pass info continuously, he was blown away. He had no idea this was possible. Explaining UDP was a lot harder.

So, I welcome any education on basic TCP/IP functionality!

jokoon 12 years ago |

I'm still curious about why we have both TCP and UDP.

For example if you do peer to peer, you need low latency, and UDP is best for that.

I think it's because TCP is hardware optimized, but it's designed to transmit a file in a stream, so if a packet is corrupt or lost, everything behind it waits until that packet has been resent. Because of that, TCP tends to have higher latency, but on average it's more efficient for single files or webpages.

You don't have good granularity with TCP, but if you want to work with UDP, you need to add redundancy and other mechanisms to make sure all is good.

ENet is an example of using UDP for gaming, so the goal is to have the lowest latency possible.
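To give a feel for the extra mechanisms an application has to add on top of UDP, here is a deliberately simplified stop-and-wait sketch over a simulated lossy channel. No real sockets are involved; the function name `send_reliably` and the `drop_attempts` parameter are invented for the example, and acknowledgments are assumed never to get lost.

```python
def send_reliably(messages, drop_attempts):
    """Deliver messages in order over a simulated lossy datagram channel.

    drop_attempts: set of transmission attempt indexes (0-based, counted
    across all messages) that the hypothetical network drops.
    Returns (delivered, total_attempts).
    """
    delivered = []
    attempt = 0
    for seq, payload in enumerate(messages):
        while True:
            lost = attempt in drop_attempts
            attempt += 1
            if not lost:
                # Receiver gets the datagram and acknowledges this sequence number.
                delivered.append((seq, payload))
                break
            # No acknowledgment before the timeout: resend the same datagram.
    return delivered, attempt


delivered, attempts = send_reliably(["a", "b", "c"], drop_attempts={1, 2})
```

With attempts 1 and 2 dropped, `"b"` is sent three times, so `attempts` is 5 and `delivered` is `[(0, "a"), (1, "b"), (2, "c")]`. A game might instead skip the retransmission for stale state updates, which is the kind of per-message choice TCP does not offer.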

sajal83 12 years ago |

> The improvements of using HTTP pipelining can be quite dramatic over high-latency connections – which is what you have when your iPhone is not on Wi-Fi. In fact, there’s been some research that suggests that there’s no additional performance benefit to using SPDY over HTTP pipelining on mobile networks

Excellent summary, but I think pipelining has been oversimplified. HTTP pipelining is a FIFO queue. The responses have to be delivered in the same order as the requests. So if the first (or an early) response took longer to generate, all other requests in the pipeline have to wait. Something that SPDY is not susceptible to.
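A small model of the FIFO constraint, under stated assumptions: the server works on all requests concurrently, each response takes a given number of milliseconds to generate, and transfer time is ignored. The function names and the durations are made up.

```python
def pipelined_finish_times(durations):
    """Responses must go out in request order, so each one is held back
    until every earlier response is ready."""
    finish = []
    latest = 0
    for d in durations:
        latest = max(latest, d)   # cannot be delivered before its predecessors
        finish.append(latest)
    return finish


def multiplexed_finish_times(durations):
    """With independent streams, each response is delivered as soon as it
    is ready."""
    return list(durations)


durations = [500, 20, 30]   # hypothetical generation times in ms
pipelined = pipelined_finish_times(durations)      # [500, 500, 500]
multiplexed = multiplexed_finish_times(durations)  # [500, 20, 30]
```

The slow first response delays the two fast ones by almost half a second in the pipelined case, while the multiplexed case is unaffected.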

teddyh 12 years ago |

I prefer The Unix and Internet Fundamentals HOWTO:

http://en.tldp.org/HOWTO/Unix-and-Internet-Fundamentals-HOWT...

brudgers 12 years ago |

David Wetherall teaches this course @ Coursera.

https://www.coursera.org/course/comnetworks

He pretty much wrote the book.

notfoss 12 years ago |

There's a minor typo below the HTTPS section. It should be TLS not TSL ;)

Edit: By the way, it was a nice article. I especially liked the tcpdump explanation.