The Nitty Gritty of “Hello World” on OS X (reinterpretcast.com)
I wonder how many people these days (including developers) would think "that's not so big for a Hello World program", and then change their mind after watching some 4k demos... 8KB might not sound like much in this era of gigabytes and terabytes, but eight thousand bytes is still, in absolute terms, quite a bit of data, and enough to do plenty more interesting things. Executable formats have become more complex with their headers, which are mostly unavoidable, but seeing empty space in the majority of the file is somewhat sad.
Here are some smaller Hello World programs to examine...
http://seriot.ch/hello_macho.php - OS X
http://timelessname.com/elfbin/ - Linux
...but they're still somewhat larger than the 20 bytes of the DOS version (95 ba 07 01 cd 21 c3 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 24.)
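For anyone who wants to check those 20 bytes, here is a rough Python sketch that splits them into code and data. It assumes the usual .COM convention of loading the image at offset 0x100, and the instruction names in the comments are my own reading of the opcodes, not something stated above. The first byte (`xchg ax, bp`) presumably works because of the value DOS happens to leave in BP at startup; treat that as an assumption.

```python
# The 20 bytes quoted above, treated as a DOS .COM image.
COM = bytes.fromhex(
    "95 ba 07 01 cd 21 c3 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 24"
)

LOAD_OFFSET = 0x100  # assumption: .COM images are loaded at offset 0x100

# Assumed decoding of the first 7 bytes:
#   95        xchg ax, bp     (relies on BP's startup value to set AH)
#   ba 07 01  mov  dx, 0x0107 (address of the string)
#   cd 21     int  0x21       (DOS call; AH=09h prints a '$'-terminated string)
#   c3        ret
code, data = COM[:7], COM[7:]

# The operand of "mov dx, imm16" is a little-endian 16-bit value.
dx = int.from_bytes(COM[2:4], "little")
string_offset = dx - LOAD_OFFSET

# The string runs up to, but not including, the '$' terminator.
end = COM.index(b"$", string_offset)
message = COM[string_offset:end].decode("ascii")

print(len(COM), hex(dx), repr(message))
```

So 7 bytes of code plus the 13-byte `$`-terminated string account for all 20 bytes; there is no header at all.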
Dynamic linking makes it possible for executables to share a single copy of system frameworks; this saves physical memory, improves performance (due to reduced cache traffic), reduces the size of downloads, etc.
The UUID allows the debugger to reliably associate an executable with the dSYM bundle containing debug information.
Other features not discussed in the article allow a single program to run on multiple architectures (PPC, I386, X86_64, etc), allow the system to reliably determine whether a program has changed, to decide what capabilities the program has been granted, etc.
It's worth putting these size numbers into perspective; yes, EIGHT THOUSAND BYTES sounds big and scary next to twenty, but the overhead scales linearly with the number of executables and their functional complexity, not geometrically. Also, you have on the order of ONE TRILLION BYTES of storage for these headers, rather than three hundred and sixty thousand...
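To put rough numbers on that comparison, a back-of-the-envelope sketch in Python. The figures are the round ones from the comment above (8 KB taken as 8192 bytes, 360,000 bytes, one trillion bytes); the count of 100,000 executables is purely a hypothetical for illustration.

```python
EXECUTABLE_SIZE = 8192          # "eight thousand bytes", taken as 8 KiB
OLD_STORAGE = 360_000           # "three hundred and sixty thousand" bytes
NEW_STORAGE = 1_000_000_000_000 # "one trillion bytes"

# How many 8 KiB executables fit on each medium.
fit_old = OLD_STORAGE // EXECUTABLE_SIZE
fit_new = NEW_STORAGE // EXECUTABLE_SIZE

# Linear scaling: total size grows in proportion to the number of files.
# 100_000 is a made-up count, not a measured one.
hypothetical_count = 100_000
total = hypothetical_count * EXECUTABLE_SIZE
share = total / NEW_STORAGE

print(fit_old, fit_new, total, f"{share:.4%}")
```

Under those assumptions, 43 such files fit in the smaller medium and about 122 million in the larger one, and 100,000 of them take up under a tenth of a percent of a trillion bytes.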
In this case the entire executable, headers and all, could fit in one page, so instead of having that one page read into memory and executed immediately, the system has to read a second one, for a total of two pages. For a small executable, this overhead can mean up to twice as many pages being read in. For larger executables that need tens of pages or more, the overhead shrinks proportionally, but it still makes no sense to add otherwise useless bytes that have to be read in. Disks (and SSDs) are several orders of magnitude slower than memory, so if the block containing the header is going to be read anyway, it might as well include some data that would have to be read later in any case, such as the program's code.
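The page arithmetic can be sketched like this in Python. The 4096-byte page size and the single extra page of padding are assumptions for illustration; `pages_read` and `read_overhead` are hypothetical helper names, not anything from the article.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_read(file_size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // page_size)


def read_overhead(useful_pages: int, padding_pages: int = 1) -> float:
    """Ratio of pages read with padding to pages read without it."""
    return (useful_pages + padding_pages) / useful_pages


# A program that would fit in one page but is padded out to 8 KiB.
print(pages_read(20), pages_read(8192))

# The relative cost of one extra page shrinks as the program grows.
for n in (1, 10, 50):
    print(n, read_overhead(n))
```

With one useful page the ratio is 2.0 (twice as many pages read); with 50 useful pages it drops to 1.02, which matches the point that the overhead decreases proportionally but never reaches zero.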