Debugging file corruption on iOS (code.facebook.com)
Vernor Vinge came up with the idea of software archeologists: people who could sift through layers of code, all the way back to the beginning of time (Jan 1, 1970 and Unix, of course), to understand how systems work and to modify them. We aren't that far from that point; already, there are people who seem to specialize in digital spelunking to find and fix bugs in these underlying layers.
While I love building new code, some of the most satisfying moments in my career have been when I've gone back through somebody else's code, untangled it (oh, you built a polymorphic type system including class initializers and destructors in a decidedly non-object-oriented language? Cool.) and fixed it. There's something interesting about getting into somebody else's head, seeing how they approached the problem, and then finding out where they went wrong.
I've gone on to work at other places where I had to do more of this archeology... and I have to say it actually felt similar.
In summary, I think a new, combinatorially more complex kind of archeology is happening now. The Microsoft, Apple, NetApp, Linux, EMC, VxWorks, etc. OS people have been dealing with some of this for a while, but within one OS; people who rely on services, on many processes, on an internet of things or whatever, now face it across all of those at once.
It feels like we'll never have a POSIX of the internet. HTTP is as close as we've gotten, and it's too narrow an interface to play the role of read/write/exec. We'll never have anything you can "trust", so more and more developers will need the patience to wade through everyone else's code as well.
Why on earth is Facebook writing their own SSL layer for iOS?
> // setup a honeypot file
> int trap_fd = open(…);
> // Create new function to detect writes to the honeypot
> static WRITE_FUNC_T original_write = dlsym(RTLD_DEFAULT, "write");;
> ssize_t corruption_write(int fd, const void *buf, size_t size) {
> FBFatal(fd != trap_fd, @"Writing to the honeypot file");
> }
> return original_write(fd, buf, size);
> }
> // Replace the system write with our “checked version”
> rebind_symbols((struct rebinding[1]){{(char *)"write", (void *)corruption_write}}, 1);
Does this code snippet look fishy to anyone else? First, the mismatched braces are messing with my head; I'm thinking the brace before the return is a typo. Also, the call to the macro looks wrong. Shouldn't they be checking for fd == trap_fd?
> Shouldn't they be checking for fd == trap_fd?
FBFatal has the same semantics as assert(), so that's correct: it fires when its condition is false, i.e. exactly when fd == trap_fd.
Ah, this makes sense to me now. Thanks!
If you can still edit your post, try block-indenting the entire code listing with four spaces along the left margin; this puts the code on separate lines as intended and preserves the original indentation.
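For anyone who wants to play with the honeypot idea outside iOS, here is a minimal plain-C sketch of the same check with the stray brace and double semicolon gone. It is an assumption-laden illustration, not the article's code: instead of rebinding the `write` symbol with fishhook's `rebind_symbols()`, it routes writes through `checked_write` by hand via a function pointer; `abort()` stands in for `FBFatal`; and `honeypot_install`, `honeypot_hits`, and `honeypot_fatal` are hypothetical names. The `if` tests `fd == honeypot_fd` directly rather than using the assert-style inverted condition.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical globals: the honeypot fd, a hit counter, and a switch that
   chooses between crashing (FBFatal-like) and just counting (for testing). */
static int honeypot_fd = -1;
static int honeypot_hits = 0;
static int honeypot_fatal = 1;

/* The "real" write. The article resolves it with dlsym(RTLD_DEFAULT, "write");
   this sketch simply takes the address of the libc function. */
static ssize_t (*original_write)(int, const void *, size_t) = write;

/* Hypothetical helper: open the file that nothing should ever write to.
   The article elides the open() arguments; these flags are placeholders. */
int honeypot_install(const char *path) {
    assert(path != NULL);
    honeypot_fd = open(path, O_RDWR | O_CREAT, 0600);
    return honeypot_fd;
}

/* Checked replacement for write(): callers go through here instead. */
ssize_t checked_write(int fd, const void *buf, size_t size) {
    if (honeypot_fd >= 0 && fd == honeypot_fd) {
        honeypot_hits++;
        if (honeypot_fatal) {
            /* Crash at the offending call so the stack trace names the writer. */
            fprintf(stderr, "Writing to the honeypot file (fd %d)\n", fd);
            abort();
        }
        errno = EBADF; /* non-fatal mode: refuse the write and report it */
        return -1;
    }
    return original_write(fd, buf, size);
}
```

With `honeypot_fatal` left at 1, a stray write to the honeypot crashes immediately, so the crash report points at the code that wrote to the wrong descriptor rather than at whoever later notices the damage.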
> It turns out that abandoning manual code analysis was a good strategy.
Wait a minute, wasn't this manual code analysis? They were certainly digging around the codebase and a particular slice of commits to figure out why the crash kept occurring.
To fix the SSL library, we first used dup() to properly refcount the FD, and later did more long-term restructuring to properly couple the lifetimes of the FD and the SSL object.
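A minimal sketch of why a dup() helps, using a hypothetical `struct tls_conn` as a stand-in for the SSL object (it does no encryption, and Facebook's actual code isn't shown in the thread). The usual hazard is that the object holds a bare fd number the caller later closes; since POSIX hands out the lowest free descriptor number, the next file opened can receive that number, and the SSL layer's writes land in the wrong file. Taking its own reference with dup() means the caller's close() no longer affects the object.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the SSL object; it does no encryption. */
struct tls_conn {
    int fd; /* this object's own descriptor, obtained with dup() */
};

/* Create a connection object holding its own reference to the open file
   behind caller_fd. Returns NULL on failure. */
struct tls_conn *tls_conn_new(int caller_fd) {
    struct tls_conn *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    /* dup() returns a new fd number referring to the same open file, so the
       caller may close caller_fd without invalidating c->fd. */
    c->fd = dup(caller_fd);
    if (c->fd < 0) {
        free(c);
        return NULL;
    }
    return c;
}

/* Write through the object's own fd (a real SSL layer would encrypt first). */
ssize_t tls_conn_write(struct tls_conn *c, const void *buf, size_t len) {
    assert(c != NULL);
    return write(c->fd, buf, len);
}

/* Release only this object's reference; the caller manages its own fd. */
void tls_conn_free(struct tls_conn *c) {
    if (c == NULL)
        return;
    close(c->fd);
    free(c);
}
```

Because both descriptors refer to the same open file description, the kernel keeps the underlying socket or file open until the last of them is closed; that is the refcounting the dup() buys.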