Understanding IP, TCP, and HTTP (objc.io)
Without guarantee of completeness, to avoid the spread of misinformation:
- IPv6 fragmentation has nothing to do with some "minimum payload size" (whatever that is): routers simply don't fragment at all. The sender can still fragment however it pleases, and presumably will do so whenever it has to send a packet that doesn't fit through the path MTU.
- The end points use ICMPv6 Packet Too Big messages to determine the _path_ MTU, which is different from just "the MTU".
- With IPv4, the sender chooses (via the Don't Fragment bit) whether a router will fragment a packet that exceeds the next-hop MTU, or whether the router should drop the packet and send a Fragmentation Needed ICMP message; the latter again is used for path MTU discovery.
- Path MTU discovery is useful because it lets the sending IP implementation push the chunking further up the stack when the higher-level protocol is capable of it (as TCP is, but UDP, for example, is not), which tends to produce lower overhead. Unfortunately, some clueless firewall administrators, such as those responsible for AWS EC2, filter all ICMP because, for unknown reasons, they consider it bad. That breaks PMTUD, which can lead to hanging TCP connections.
- TCP sequence numbers always count bytes, never segments, with the special case that SYN and FIN also count as "bytes" in the sequence.
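To make the path MTU discovery loop from the points above concrete, here is a toy Python model. The function and parameter names are hypothetical and this is not any real stack's API. Each router that cannot forward a too-big packet reports its next-hop MTU back to the sender, unless that ICMP error is filtered. In that case the sender never learns anything and just keeps retrying the same oversized packet, which is the "hanging connection" case.

```python
def discover_path_mtu(link_mtus, icmp_filtered=frozenset(), max_attempts=10):
    """Toy path MTU discovery over a fixed path (hypothetical model, not a real stack).

    link_mtus: MTU of each link along the path, starting with the sender's own link.
    icmp_filtered: indices of links whose upstream router's ICMP errors never reach
        the sender (e.g. dropped by a firewall that filters all ICMP).
    Returns (path_mtu_estimate, delivered).
    """
    pmtu = link_mtus[0]  # the sender starts with its own interface MTU
    for _ in range(max_attempts):
        for hop, mtu in enumerate(link_mtus):
            if pmtu > mtu:
                # The router in front of this link can't forward the packet (DF set,
                # or IPv6), so it drops it and sends Packet Too Big / Fragmentation Needed.
                if hop in icmp_filtered:
                    # The error never arrives: the sender would keep retransmitting
                    # the same oversized packet forever, so report a black hole.
                    return pmtu, False
                pmtu = mtu  # the ICMP error carries the next-hop MTU
                break
        else:
            return pmtu, True  # the packet fit through every link
    return pmtu, False


print(discover_path_mtu([1500, 1500, 1400, 1280]))
print(discover_path_mtu([1500, 1500, 1400, 1280], icmp_filtered={2}))
```

The first call converges on 1280 after two ICMP errors; the second never gets past the 1400-byte link because its router's error is filtered.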
On a side note, it's funny to see that most of the paragraphs in the original article end with a link to Wikipedia. As a reader, I can go there myself; what good do those links do me?
"We’ve created a new public repository on GitHub that contains all current and past objc.io articles. If you find any mistakes or have suggestions for improvements, please don’t hesitate to file issues, or even better: submit a pull request!"
Make a pull request so that people like me can learn about networking too.
I'll give the author the benefit of the doubt and say he got his terms wrong. The IPv6 RFC states that IPv6 requires a minimum MTU of 1280 bytes. I guess that's what he meant.
https://www.ietf.org/rfc/rfc2460.txt
Packet Size Issues
IPv6 requires that every link in the internet have an MTU of 1280 octets or greater. On any link that cannot convey a 1280-octet packet in one piece, link-specific fragmentation and reassembly must be provided at a layer below IPv6.
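Since every IPv6 link must carry at least 1280 octets, a sender can compute a payload size that fits in one packet on any path without needing PMTUD. A small sketch, assuming the fixed 40-byte IPv6 header, optional extension headers, and the minimum UDP/TCP header sizes (the helper name is made up for illustration):

```python
IPV6_MIN_MTU = 1280   # minimum link MTU required by RFC 2460, section 5
IPV6_HEADER = 40      # fixed IPv6 header size
UDP_HEADER = 8
TCP_HEADER_MIN = 20   # TCP header without options


def max_payload_without_fragmentation(transport_header, extension_headers=0,
                                      mtu=IPV6_MIN_MTU):
    """Largest payload that fits in a single packet of the given MTU."""
    return mtu - IPV6_HEADER - extension_headers - transport_header


print(max_payload_without_fragmentation(UDP_HEADER))      # UDP payload
print(max_payload_without_fragmentation(TCP_HEADER_MIN))  # TCP payload, no options
```

Any extension headers or TCP options reduce these numbers further.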
If I had deep knowledge in this area, I'd probably applaud the effort and send corrections, rather than criticize.
As for getting an understanding of how TCP/IP works, I think Stevens' classic TCP/IP Illustrated is still a good book to get started with. It is somewhat dated in some details (no IPv6, in particular), but the general principles still apply. Though maybe there are newer, equally good books around that I just don't know about?
And I was obviously talking about packet filters, not about some kind of application firewall, which doesn't have anything to do with filtering packets anyhow.
The Pauli exclusion principle prevents electrons with the same quantum characteristics from occupying the same space; this interaction occurs across the amassed copper atoms which form the majority of the metallic wires that approach one another in the internal structure of the keyboard ...
But that's not true if the connection is interrupted at the socket level, right?
For example, if the device switches from 3G to Wi-Fi, or from Wi-Fi to a wired connection, then, I believe, its hardware address changes, its IP address changes, and the socket becomes stale. But would the TCP connection be closed right away, or would it hang until some timeout? (And does it depend on the OS?)
For two reasons: It's probably one of the best introductions to the subject that has ever been written, and it's a model example of how a technical book should be written.
I'd be hard pressed to find a reason not to go this route at least once in your life. I know the material pretty well but I still re-read Stevens books every few years just because it is so good.
Then again, that's a lot of effort to spend on something that the vast majority of us don't need to know in much detail. The main reasons for knowing all the details are:
- to write a new networking stack, or work on an existing one;
- to write or maintain server software or routers or caches or other software directly involved in networking;
- to break or exploit existing software.
(obviously 'because it's interesting' is a valid but not a practical reason to know)
Talk about understanding the sockets API ;-) Here's the table of contents for chapter 5:
5.1. getaddrinfo() — Prepare to launch!
5.2. socket() — Get the File Descriptor!
5.3. bind() — What port am I on?
5.4. connect() — Hey, you!
5.5. listen() — Will somebody please call me?
When I explained how TCP worked, in that the client could connect to the server and maintain an open socket to pass info continuously, he was blown away. He had no idea this was possible. Explaining UDP was a lot harder.
So, I welcome any education on basic TCP/IP functionality!
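For a hands-on feel of the chapter 5 calls listed above (getaddrinfo, socket, bind, listen, connect), here is a minimal sketch using Python's standard `socket` module, which wraps the same BSD sockets API. It assumes a loopback TCP connection and a single thread, which only works for small payloads; a real server would accept and read in a separate thread or event loop. `recv_exact` and `echo_once` are hypothetical helper names.

```python
import socket


def recv_exact(sock, n):
    """TCP is a byte stream, so one recv() may return fewer bytes than were sent."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:  # the peer closed the connection early
            break
        data += chunk
    return data


def echo_once(payload):
    # getaddrinfo(): resolve a numeric loopback address; port 0 lets the OS pick one
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        "127.0.0.1", 0, socket.AF_INET, socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as server:   # socket()
        server.bind(sockaddr)                                # bind()
        server.listen(1)                                     # listen()
        server_addr = server.getsockname()                   # port the OS assigned
        with socket.socket(family, socktype, proto) as client:
            # connect(): on loopback the handshake completes into the listen
            # backlog, so one thread is enough for this small demo
            client.connect(server_addr)
            conn, _ = server.accept()                        # accept()
            with conn:
                client.sendall(payload)
                conn.sendall(recv_exact(conn, len(payload)))  # echo it back
            return recv_exact(client, len(payload))


print(echo_once(b"hello over TCP"))
```

The open connection in the middle is exactly the "maintain an open socket to pass info continuously" part: both sides could keep calling `sendall`/`recv` for as long as they like.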
For example, if you do peer-to-peer, you need low latency, and UDP is best for that.
I think it's because TCP is hardware optimized, but it's designed to transmit a file in a stream, so if a packet is corrupt, it just waits for that packet to be sent again. In that fashion, TCP tends to be slower, but on average it's more efficient for single files or webpages.
You don't have good granularity with TCP, but if you want to work with UDP, you need to add redundancy and other mechanisms to make sure all is good.
ENet is an example of using UDP for gaming, so the goal is to have the lowest latency possible.
Excellent summary, but I think pipelining has been oversimplified. HTTP pipelining is a FIFO queue: the responses have to be delivered in the same order as the requests. So if the first (or an early) response takes longer to generate, all other requests in the pipeline have to wait. SPDY is not susceptible to this.
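The FIFO effect is easy to model. This toy sketch assumes each response becomes ready on the server at a given time (in ms) and ignores transfer time and bandwidth sharing; the function names are made up for illustration. With pipelining, response i can't go out before every earlier response has; with SPDY-style multiplexing, each response is sent as soon as it is ready.

```python
def pipelined_delivery(ready_times):
    """HTTP/1.1 pipelining: responses must be delivered in request order."""
    delivered, latest = [], 0
    for t in ready_times:
        latest = max(latest, t)  # must wait for every earlier response
        delivered.append(latest)
    return delivered


def multiplexed_delivery(ready_times):
    """SPDY-style multiplexing: each response goes out as soon as it is ready."""
    return list(ready_times)


slow_first = [500, 10, 20]  # the first request is slow to generate
print(pipelined_delivery(slow_first))
print(multiplexed_delivery(slow_first))
```

With a slow first response, the two fast responses behind it are held back until 500 ms under pipelining, but delivered at 10 and 20 ms when multiplexed.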
http://en.tldp.org/HOWTO/Unix-and-Internet-Fundamentals-HOWT...
https://www.coursera.org/course/comnetworks
He pretty much wrote the book.
Edit: By the way, it was a nice article. I especially liked the tcpdump explanation.
The TCP socket doesn't know anything about "interfaces" or "links" or anything like that; it only knows its own and the remote IP address (and port). The IP stack will deliver to it any packets it receives that are addressed to that port on that address and come from the corresponding remote address and port, no matter which link they were received through (possibly subject to reverse path filtering on end hosts as a security measure). Similarly, each outbound packet is routed independently, so if the routing table changes halfway through a TCP connection, packets will simply be routed via a different link. The end host really just does the same as any other router does, and the fundamental idea of packet-switched networks is that routers do not know about connections: they simply forward each packet independently, potentially switching links as needed at any time.
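A toy illustration of that lookup, with hypothetical names and documentation-range addresses (not how any particular kernel structures its tables): inbound segments are matched to a connection purely by the address/port 4-tuple, and the interface a segment arrived on is carried along but never consulted.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    interface: str  # link the segment arrived on; deliberately NOT part of the lookup


def demux(connections: dict, seg: Segment) -> Optional[str]:
    """Find the connection for an inbound segment by its 4-tuple only.

    connections maps (local_ip, local_port, remote_ip, remote_port) to a connection.
    """
    key = (seg.dst_ip, seg.dst_port, seg.src_ip, seg.src_port)
    return connections.get(key)


connections = {("192.0.2.10", 50000, "198.51.100.7", 443): "conn-A"}
via_wifi = Segment("198.51.100.7", 443, "192.0.2.10", 50000, interface="wlan0")
via_cable = via_wifi._replace(interface="eth0")

print(demux(connections, via_wifi), demux(connections, via_cable))
```

Both segments land on the same connection. Only a change of address or port (for example, a new IP after switching networks) makes the lookup fail.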
It's perfectly possible, for example, to bridge WiFi and wired Ethernet and have a gateway that routes some IP network onto that Ethernet/WiFi. You could then establish a TCP connection while connected to the WiFi, disconnect from the WiFi, and connect to the Ethernet via cable, using the same IP address on the Ethernet interface as you previously used on the WiFi interface, and the TCP connection will survive that just fine. (It might take a moment for the router to time out its neighbour cache entry and re-resolve your IP address to the new hardware address, but that's just a matter of a few seconds.) You could even connect to both and configure things such that the kernel only replies to ARP/ICMPv6 ND on the WiFi, say, and routes outbound packets through the cable; then the outbound packets of the TCP connection would go through the cable while the inbound packets would come through the air.
The only thing that actually breaks a connection is when packets addressed to the address your TCP connection is using can no longer reach you, or when packets you send using that address can no longer reach the other side, for example because you send them through a link that does not allow you to use that address. The latter is really what mostly kills TCP connections on mobile phones: the default route gets changed from WiFi to 3G, say, and your mobile provider won't let you keep sending packets through their network using the address you were assigned by the WiFi. So the connection hangs even though the WiFi interface might actually still be up and able to receive packets addressed to that address.
One important thing to notice in this: there isn't really any way a TCP implementation could detect right away that any of this has happened, as it cannot know what the filtering policies of your provider(s)/network(s) are, or whether you disconnected only temporarily, or whether you will reconnect to a different access point on the same network ... So when some mobile platform kills TCP sockets when you "change from 3G to WiFi", that really is a dirty hack that makes a load of assumptions about typical setups that don't necessarily hold true.
TCP has head-of-line blocking: as it guarantees complete and in-order delivery, when a packet gets lost in transit it has to wait for a retransmit of the missing packet. UDP, by contrast, delivers packets to the application as they arrive, including duplicates, and without any guarantee that a packet arrives at all or in which order packets arrive (it really is essentially IP with port numbers and a checksum, optional over IPv4, added). But that is fine for telephony, for example, where it usually simply doesn't matter when a few milliseconds of audio are missing, but delay is very annoying. So you don't bother with retransmits: you just drop any duplicates, sort reordered packets into the right order within a jitter buffer of a few hundred milliseconds, and if packets don't show up in time or at all, they are simply skipped, possibly interpolated where the codec supports it.
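The jitter-buffer behaviour described above (drop duplicates, reorder, skip what's late or missing) can be sketched in a few lines. This is a toy model under stated assumptions: packets carry sequence numbers starting at 0, the buffer has a fixed depth counted in packets rather than milliseconds, and `playout` is a hypothetical name. Real jitter buffers adapt their depth to measured network jitter.

```python
import heapq


def playout(arrivals, depth):
    """Play packets of a UDP stream in sequence order, tolerating loss and reordering.

    arrivals: sequence numbers in the order the datagrams arrived.
    depth: how many packets to hold before playing (how much reordering we absorb).
    Returns one entry per playback slot: the sequence number played, or None for a gap.
    """
    heap, queued = [], set()
    next_seq, out = 0, []

    def play_one():
        nonlocal next_seq
        if heap and heap[0] == next_seq:
            out.append(heapq.heappop(heap))
            queued.discard(next_seq)
        else:
            out.append(None)  # missing or late: skip the slot (a codec might interpolate)
        next_seq += 1

    for seq in arrivals:
        if seq < next_seq or seq in queued:
            continue  # duplicate, or too late: its slot has already been played
        heapq.heappush(heap, seq)
        queued.add(seq)
        while len(heap) > depth:
            play_one()
    while heap:  # end of stream: drain what is left
        play_one()
    return out


print(playout([0, 2, 1, 1, 4], depth=2))  # reordered, duplicated, 3 lost
print(playout([1, 2, 3, 0], depth=1))     # 0 arrives too late and is skipped
```

Note the trade-off: a deeper buffer absorbs more reordering but adds delay to every packet, which is exactly what telephony wants to avoid.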
Also, a major part of TCP is flow and congestion control, which make sure you get as much throughput as possible without overloading the network (which is kinda redundant, as an overloaded network will drop your packets, which means you'd have to do retransmits, which hurts throughput). UDP doesn't have any of that, which makes sense for applications like telephony: a call with a given codec needs a certain amount of bandwidth, you cannot "slow it down", and additional bandwidth doesn't make the call go faster either.
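To give a feel for the "as much throughput as possible without overloading the network" part, here is a deliberately simplified, Reno-style model of TCP's congestion window. It is an assumption-laden toy, not what any real stack does in detail: the window is counted in segments per round trip, it doubles during slow start up to a threshold, then grows by one segment per round trip, and is halved whenever a loss is detected. `cwnd_trace` is a hypothetical name.

```python
def cwnd_trace(rtts, loss_at, ssthresh=16):
    """Toy Reno-style congestion window, in segments per round trip.

    loss_at: set of round-trip indices in which a loss is detected.
    Returns the window used in each round trip.
    """
    cwnd, trace = 1, []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            # multiplicative decrease: halve the window (never below one segment)
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # slow start: double each round trip, capped at ssthresh
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # congestion avoidance: additive increase of one segment per round trip
            cwnd += 1
    return trace


print(cwnd_trace(8, loss_at={6}, ssthresh=8))
```

The sawtooth (probe upward slowly, back off sharply on loss) is what lets many TCP flows share a link; a constant-bitrate voice call over UDP simply sends at its codec's rate regardless.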
In addition to realtime/low-latency applications, UDP makes sense for really small transactions, such as DNS lookups, simply because it doesn't have TCP's connection establishment and teardown overhead, both in terms of latency and in terms of bandwidth use. If your request is smaller than a typical MTU and the response probably is, too, you can be done in one round trip, with no need to keep any state at the server, and flow control, ordering and all that probably aren't particularly useful for such uses either.
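To see how small such a transaction is, here is a sketch that encodes a DNS query in the RFC 1035 wire format using only the standard library. The query ID and defaults are arbitrary choices for illustration, and nothing is sent. A query for an A record of example.com comes to 29 bytes, so even with the 8-byte UDP header and a 20-byte IPv4 header it fits comfortably in a single datagram.

```python
import struct


def build_dns_query(name, query_id=0x1234, qtype=1):
    """Encode a minimal DNS query message (RFC 1035 wire format) for `name`."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!6H", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte (the root label) ends it
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"
    # QTYPE (1 = A record) and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


query = build_dns_query("example.com")
print(len(query), query.hex())
```

Sending it would be a single `sendto()` on a UDP socket to a resolver's port 53, followed by one `recvfrom()` for the answer: no handshake, and no connection state kept on either side.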
And then, you can of course use UDP to build your own TCP replacements, but that's probably not a good idea without a deep understanding of network dynamics; modern TCP algorithms are pretty sophisticated.
Also, I guess it should be mentioned that there are more protocols than UDP and TCP, such as SCTP and DCCP. The only problem currently is that the (IPv4) internet is full of NAT gateways, which make it practically impossible to use protocols other than UDP and TCP in end-user applications.
On the flip side, I like to see more people helping critique these articles so newbies like me can get the most out of them (though I already took a computer networks course...).
In any case, my recommendation referred to the old edition by Stevens alone. I have no clue about the new one, though at least the newly covered material seems appropriate to me.
(For anyone who might not be aware: W. Richard Stevens died in 1999, so the new edition is by a different author, though apparently based on the old material.)
I'm a total noob, yet the first few paragraphs made me cringe because I felt there were some odd things in them. I had a weird feeling about it. It wouldn't have bothered me if it weren't for the tagline "A periodical about best practices and advanced techniques in Objective-C".
Or the use of words like "great contributors", etc. I mean, one has to be humble: unless one really knows one's stuff, one shouldn't talk that way.
If the writing style had been more along the lines of "I'm learning and journaling my progress", it would've been more than okay, and knowledgeable people wouldn't have a problem with it.
I was on forums learning to design my PCBs. I'd post my design and ask for feedback, and people who'd spent 30+ years doing this would comment on it and find a thousand flaws in what I thought was nice. And I'd get back to work, iteration after iteration... until these really great guys who do this for a living would say "Beautiful work".
Had I posted something like "advanced PCB design" in the "this is how it's done" way, they'd have ignored me and I would've stayed more ignorant than I still am.
There was a question on the Python mailing list asking how long it takes to say that one knows how to program. People with 40+ years actively programming said: I'll let you know when I'm there.
Humility goes a long way. Heck, even when I read things on the nmap mailing list, I don't sense a tone of them thinking they know more than you do, even though they really, really know their stuff.
And of course you can publish it (might be good for feedback), just state that it's a learning journal, not "best practices".
This is a project I did a couple of years ago
http://www.electro-tech-online.com/threads/pcb-etching-tank-...
This is another project:
http://www.electro-tech-online.com/threads/first-pcb-stepper...
There was an update on the site, so the images are no longer available there. Here they are:
http://www.mediafire.com/view/uu4vsqq8e1yq8/PICTURES#6qj25tt...
Bear in mind that this is what my first attempt at PCB design looked like [PDF] (in French, but you can see how ugly it was):
I was on the forum chat, and I'd send pictures, and they'd help me see: they'd open my eyes and explain things, like why 90° tracks are a no-no, etc.
People are tremendously generous with their time as long as your attitude is okay. I learned orders of magnitude more on the internet than in college, and still do every minute I'm online.
That's why I find the whole "blind leading the blind" way of teaching counter-productive. Not that it's really serious or anything; nobody's going to lose a leg if you screw up your TCP connection. But when extended to more serious topics, it can be dangerous to teach people things if you're not sure about the subject matter. For example, something as simple as jump-starting a car actually isn't simple at all (when done correctly).
And I also think that, to a degree, it actually is the responsibility of the reader to judge what to use that supposed new knowledge for. Trying to make sense of tcpdump output when debugging some application software bug? Why not? Writing an IP stack? I hope any sensible person would pick up a book and some RFCs first. Unreliable knowledge can still be useful and harmless in figuring things out; you just shouldn't use it to try and build things.
Then again, in practice we can observe that people do build systems without ever having looked into the relevant standards, and I would actually argue the effects are worse than one might superficially think. I mean, look at how ridiculously insecure the web/web browsers are, for example: how did that come to be? I would think one major factor is exactly that people didn't (and often still don't) read the relevant standards, even ones as apparently uncritical as the HTML spec or the HTTP spec, but instead just wrote what they thought was HTML, and wrote books about what they thought was HTML, and so on. The result is that browser vendors need to accommodate all this crap out there that isn't HTML or HTTP but that people still expect their browser to render in some way or another. So, due to end-user market share pressure, we now have security vulnerabilities in browsers that are there only because fixing them would break stuff that no one ever really had any right to expect to work. Those people thought they were just creating some totally uncritical website using the "knowledge" they learned from some other clueless person, and those security problems can have quite serious consequences.
I wrote an IP stack, of sorts, and used Wikipedia to do it. I'm aware that it's probably crappy, but only because it was basically designed to be. If I had tried to design it well, I might have led myself to believe I had done it correctly, for example because I found no problems with it in my testing. But as you're aware, there are plenty of problems with TCP/IP stacks that only come up in edge cases. So even if I were being sensible, I might end up with shitty code and push it into a product, and then we're screwed. But if I had learned the stack correctly, I wouldn't be in that mess.
A kind of solution lies in forums like HN, though. Sure, the posts are fallible and are often upvoted merely because they are perceived as authoritative. But we have the comments section, and knowledgeable persons who can speak up and educate. So it may not matter at all who's teaching, as long as somebody picks up the slack.
The reason I'm saying that is that human beings have selective memory. They tend to remember things they did the right way, they remember their good ideas, times they were right, etc.
I used to note down my ideas that seemed genius... And then I'd look at them a couple of months later, and it was humbling. How stupid could I be?
But there is a good thing about this: it taught me a valuable lesson... It taught me to focus on real needs, and not on some fancy thoughts I have at 3AM. Like real needs.
And I know that at an early stage, one needs to let go of being critical, be open, and not dismiss ideas, etc. But some ideas are just plain stupid, and I had plenty of those.
I write them down, then cross things out: not a real need, not a problem. Now I'm thinking about an idea that I'd use if it were available. And I'm not the only one.
That might not be quite enough for a really good implementation, but overall software quality would be a hell of a lot better if everyone did that. It's just amazing, when you look at websites and also emails, how many people just make up how they think things work rather than reading the standards that are only a Google search away.