A fifteen year old TCP bug?

A fifteen year old TCP bug? (blogmal.42.org)

546 points by Sec 15 years ago | 59 comments

pilom 15 years ago |

This is true hacking news! Discusses a possible new bug in TCP, teaches how TCP works, has links to useful and relevant books on the subject, AND includes remarks about how difficult it is for a newbie to actually make changes to open source software and not get yelled at. I love it!

mrspeaker 15 years ago | |

I agree - but it's funny that there's only a handful of comments. You just can't have an uninformed rant with such technical correctness (the best kind of correctness)!

rwmj 15 years ago | |

.. and comes from a cool hackerish domain (42.org)

feintruled 15 years ago |

The response to the bug report looks depressingly typical: it rejects the working fix with a wall of text speculating about numerous other, possibly better fixes (without deigning to actually choose one). Nirvana fallacy in action!

stcredzero 15 years ago | |

I know of one commercial Smalltalk UI bug that persisted 12 years -- being reported all the while. To be fair, it was a very tricky low-level race condition, very hard to reproduce, though very serious. (Unhandled exception in the bowels of the UI library. Boom! Application goes down.) Still, the attitude of the vendor was just unbelievable from the POV of the customer. After dozens of reports, hundreds of messages, numerous pieces of documentation, it still took 12 years for engineers to even start thinking it was something besides user error -- even though multiple customers were reporting it. (I know because I worked for 3 of them!) There is a huge perceptual wall there. I know because I used to work for the vendor. I know how apparent this bug is at a production shop and how opaque it appears from inside the vendor's camp. (And despite my being from inside, I still got the "user error" chant!)

EDIT: Oh, and I know of another UI bug that's been in their system for about 8 years. It's a Smalltalk newbie classic -- shoving non-identity keys into an IdentityDictionary. I could describe what it is to a Smalltalker in 2 sentences, and they could then find it and fix it. This vendor seems to have the same attitude about this bug, so I've already learned my lesson. They can keep their damn bug!

cletus 15 years ago | | |

1-2 years ago I had the exact same thing with a PHP bug (I know, PHP bugs... shocking!), specifically with mysqli. It would crash on LONGTEXT columns, but not reliably. Different people had reported it in different forms over the previous 2-3 years, all of them getting automated responses ("Please provide...") followed by ("Closed due to no activity for 7 days...") with the odd dismissive comment by a committer.

It's an incredibly frustrating experience.

cube13 15 years ago | |

It's a working fix, but isn't the proper fix. The real issue is the size difference of the long type for 32 and 64-bit architectures. Bruce Evans doesn't explicitly mention this fact, but it's the core of his reasoning.

On FreeBSD, long is 64 bits wide on 64-bit architectures and 32 bits wide on 32-bit ones. This is the reason that 0xffffffff shows up as a non-negative number on the 64-bit machine but as a negative one on the 32-bit machine.

Changing the type to int (which is 32 bits wide on both x86 and x64), while it does break 16-bit systems (which don't exist anymore), fixes this completely by removing the x86 and x64 behavior differences with the long type.
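A minimal C sketch of that width difference, not the FreeBSD code itself. It uses int32_t and int64_t as stand-ins for a 32-bit and a 64-bit long so the result does not depend on the machine compiling it; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a (long) cast on a platform where long is 32 bits wide.
 * Converting an out-of-range value to a signed type is
 * implementation-defined before C23; mainstream compilers wrap
 * modulo 2^32, so 0xffffffff becomes -1. */
static int64_t as_long32(uint32_t v)
{
    return (int32_t)v;
}

/* Stand-in for a (long) cast where long is 64 bits wide: every
 * uint32_t value fits, so 0xffffffff stays 4294967295. */
static int64_t as_long64(uint32_t v)
{
    return (int64_t)v;
}

/* Same bit pattern, two different signs depending on the width. */
static void demo_long_width(void)
{
    uint32_t raw = 0xffffffffu;

    assert(as_long32(raw) == -1);
    assert(as_long64(raw) == 4294967295LL);
}
```

Calling demo_long_width() from a test program exercises both cases. Any check of the form "is this value negative?" therefore gives different answers on the two platforms.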

The two style issues that he mentions are easily fixed by moving the variable declaration to the top of the function and initializing it there. However, these may be forced by the function structure due to the gotos present in the code...

peterbotond 15 years ago | | |

A better fix is to use the lmin macro from libkern.h.

http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/src/sys/sys...

btmorex 15 years ago | |

You have to do this if you want your codebase to get better over time rather than worse over time.

jongraehl 15 years ago | |

What he's doing seems useful to the project. There's no better time to get it right. I'm just surprised he's willing to expend so much effort communicating instead of just fixing the patch.

I've noticed that C programmers tend to use macros for things where (possibly non-exported) inline functions would make more sense. Why is that? Are they in the habit of building the OS with all optimizations off? Or is it that they're being used as a poor man's generic function?

cube13 15 years ago | | |

The inline keyword is best thought of as a hint to the compiler, not a command. The compiler is free to ignore the meatbag telling it to inline functions if it chooses to.

Macros are substituted in before the compiler, so they are always inlined.

EDIT: Hint, not suggestion.
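To make the trade-off concrete, here is a small C sketch. The names MIN_MACRO, min_long and next_value are invented for illustration and are not taken from FreeBSD. The macro is pasted in by the preprocessor and works for any arithmetic type, but it evaluates one of its arguments twice. The static inline function evaluates each argument once and is type-checked, but it is tied to one type and the compiler may or may not inline it.

```c
#include <assert.h>

/* Textual substitution: always expanded in place and type-agnostic, but
 * whichever argument turns out to be smaller is evaluated a second time. */
#define MIN_MACRO(a, b) ((a) < (b) ? (a) : (b))

/* Type-checked alternative: each argument is evaluated exactly once.
 * "inline" is only a hint; the compiler decides whether to inline it. */
static inline long min_long(long a, long b)
{
    return a < b ? a : b;
}

static int calls;                              /* evaluations of next_value() */
static long next_value(void) { return ++calls; }

static void demo_min(void)
{
    calls = 0;
    long m = MIN_MACRO(next_value(), 10L);
    assert(calls == 2);   /* once in the comparison, once for the result */
    assert(m == 2);       /* the second evaluation is what gets returned */

    calls = 0;
    long f = min_long(next_value(), 10L);
    assert(calls == 1);   /* argument evaluated a single time */
    assert(f == 1);
}
```

The double evaluation is harmless for plain variables and only bites when an argument has side effects, which is one reason code bases that lean on such macros tend to keep their arguments simple.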

wulczer 15 years ago | | |

You can't always rely on the compiler inlining your function, and AFAIK there's no portable way of forcing inlining.

abecedarius 15 years ago | | |

I think it's mostly inertia and culture. Inline functions weren't in the standard till C99.

runjake 15 years ago |

The response to the bug report was by Bruce Evans, who is listed as the "Style Police-Meister" for FreeBSD. Apparently his job is to enforce standards & code style. Seems like he was doing his job.

http://www.freebsd.org/doc/en_US.ISO8859-1/articles/committe...

Edit: edited for clarity. Thanks, pinko!

pinko 15 years ago | |

I believe the comment above was meant as a response to the "The response to the bug report looks depressingly typical" comment elsewhere in this thread.

It took me a minute to sort that out ("hmm, why is he referencing Bruce Evans?"), so I thought I'd mention it for anyone else trying to follow.

direxorg 15 years ago |

In 2002 we did a custom patch for an energy company that had hundreds of outdated remote RS232 terminals hooked up via wireless links to the central station for control and monitoring. Their goal was to encrypt the transmitted messages so that they could not be intercepted and tampered with during wireless transmission. The solution was Linux boxes on both sides that encrypt the communication using OpenSSL... The problem was that the terminals did not want to talk to Linux over crossover Ethernet because of.... you guessed it... a bug in TCP... To solve that we had to make a patch for the Linux kernel, and let me tell you, the code in the 2.4 kernel was very ugly, with extremely funny comments :-)

My colleague has been developing drivers ever since, and "he feels that he is doing something important rather than boring UI".

But mostly he works on his own projects and drivers, since updating open source IS a pain in the neck.

I guess the problems with collaborative work are the reason why people do open source rather than something that has to be supported. What do you think?

rboyd 15 years ago |

> The only reply this PR got, was from Bruce Evans who critiqued my use of a simple (long) cast, which appears to have derailed this PR, sticking it in the usual never getting fixed limbo where unfortunately most of my PR's appear to end up.

Looks to me like Bruce gave you some valuable advice. You spent more time complaining about the handling of your PR and documenting the issue on your blog than it would have taken you to fix your patch.

HenryR 15 years ago |

Is Stevens vol. 2 in the public domain now? If not, that's pretty poor form, linking to a scanned pdf of the book.

uxp 15 years ago | |

No, actually it's not public domain. The 19th edition was printed and released in 2005. Pearson looks like it actively tries to protect its copyrights to the series as well: http://www.foo.be/docs/TCPIP-Illustrated-1/

derrickpetzold 15 years ago | |

It's not, because if you are a FreeBSD developer it is already assumed that you own every copy of Stevens's books, next to the shrine to him in your basement. So linking to a digital scan is not infringement, it's just convenience.

doki_pen 15 years ago | |

Wouldn't that be fair use? Do you have an issue with the content being posted, or with the fact that it's scanned?

estel 15 years ago | | |

Quoting parts of the book to support some points made in the article would fall under fair use, but linking to a scan of the entire book would not.

pavel_lishin 15 years ago |

> As I had virtually no understanding of the TCP code, I liberally sprinkled it with printf()s

And people say it's a stupid way to debug!

SwellJoe 15 years ago | |

What people say that?

pavel_lishin 15 years ago | | |

Pick any "how do you debug?" submission anywhere, and you'll see a lot of people claiming that using printf, etc, is retarded in the age of good debuggers.

Maybe they're just a noisy minority.

barrkel 15 years ago |

So much of this is caused by unsigned types. They are evil; avoid them wherever you can.

1amzave 15 years ago | |

Care to elaborate? Unlike signed ones, unsigned integral types at least have well-defined behavior on shifting and overflow. (I'm speaking in terms of C specifically here, of course.)

cpeterso 15 years ago | | |

Signed ints are easier to range check at runtime. Given an unsigned int, it's difficult to detect an invalid result from combining or comparing signed and unsigned ints.
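A rough C sketch of that pitfall, with hypothetical helper names. Once a negative value has been converted to unsigned it is indistinguishable from a very large one, so the range check has to happen while the value is still signed.

```c
#include <assert.h>
#include <limits.h>

/* Same result as writing `s < u`: C converts s to unsigned int before
 * comparing. The cast only spells that conversion out (and avoids a
 * -Wsign-compare warning). */
static int naive_less(int s, unsigned int u)
{
    return (unsigned int)s < u;
}

/* Range-checked version: handle the negative case while s is still
 * signed, and only then compare in unsigned arithmetic. */
static int checked_less(int s, unsigned int u)
{
    return s < 0 || (unsigned int)s < u;
}

static void demo_mixed_compare(void)
{
    assert((unsigned int)-1 == UINT_MAX);  /* -1 turns into the largest value */
    assert(naive_less(-1, 1u) == 0);       /* so "-1 < 1" comes out false */
    assert(checked_less(-1, 1u) == 1);     /* the signed check catches it */
    assert(naive_less(0, 1u) == 1);        /* non-negative values are fine */
}
```

Both functions agree for every non-negative s; they differ only when s is negative, which is exactly the case that silently goes wrong in the naive version.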

Google's C++ Style Guide discourages using unsigned ints to represent nonnegative numbers (like sizes or counts). It recommends using runtime checks or assertions instead.

http://google-styleguide.googlecode.com/svn/trunk/cppguide.x...

Unsigned ints make sense for bit twiddling, but you should probably use a fixed-size uint32_t or uint64_t to ensure the results are consistent across various architectures.

barrkel 15 years ago | | |

I care less about C's specific behaviour on shifting and overflow (both of which are pretty rare), and more about the fact that unsigned integers use a different arithmetic to the signed integers most people are familiar with. In particular, subtraction doesn't mean what you think it does. At 0 in unsigned arithmetic, there's a gaping cliff you can fall off of where you wrap around the other side, while at 0 in signed arithmetic, you're well away from that cliff and are highly unlikely to get anywhere near to it. Writing a program using many unsigned numbers means playing on the edge of a cliff.
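A small C sketch of that cliff, using made-up function names. The same subtraction is done once in unsigned and once in signed arithmetic, with inputs far away from INT_MIN and INT_MAX.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned subtraction is arithmetic modulo UINT_MAX + 1, so going
 * below zero wraps around to the top of the range. */
static unsigned int remaining_unsigned(unsigned int total, unsigned int used)
{
    return total - used;
}

/* The same computation in signed arithmetic yields a small negative
 * number (for inputs this far from the limits of int), which a simple
 * "< 0" test can catch. */
static int remaining_signed(int total, int used)
{
    return total - used;
}

static void demo_cliff(void)
{
    assert(remaining_unsigned(3u, 5u) == UINT_MAX - 1u);  /* wrapped around */
    assert(remaining_signed(3, 5) == -2);                 /* plainly negative */
    assert(remaining_unsigned(5u, 3u) == 2u);             /* no surprise here */
}
```

With 32-bit unsigned int, the wrapped result is 4294967294, which passes any "is there room left?" check that only looks for a positive value.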

__david__ 15 years ago | |

I wouldn't say they are evil. In fact, both signed and unsigned are the same--the only difference is the "pain point" (the place where you subtract 1 and your world breaks) is in a different spot. 0 for unsigned, INT_MIN for signed. Both are perfectly fine as long as you stay in their good range.

ig1 15 years ago |

I only had a quick skim through the article (need to be off to the London HN meetup shortly!), but couldn't this be used to mount a DoS attack that sucks up the available sockets on a server?

pmjordan 15 years ago | |

Maybe, if you could trick the server (64-bit FreeBSD) into connecting to sockets open on 32-bit FreeBSD machines. I can't think of any common services that would be susceptible to this (they would normally be susceptible to being tricked into opening other kinds of long-standing connections, too, which is just as good for DoS).

zwp 15 years ago | | |

> if you could trick the server (64-bit FreeBSD) into connecting to sockets

Proxies, SMTP gateways, FTP servers (active mode), ...