The issue is with the processor's cache memory, which is small, typically 64 kiB or so. (The TLB cache is also important.) Even a few hundred bytes savings can give an important improvement in speed amd power consumption.
You can't just decide that people aren't supposed to say "megabyte" anymore
It's becoming more common. By the way, if you use OS X, your utilities output megabytes (1000 bytes), not mebibytes. GNU utilities switched to KiB, MiB, etc. (1024 bytes).
A kilobyte here, a kilobyte there, and pretty soon you're talking real memory!
If you don't believe me, hop on any kernel development mailing list and try to convince them they could solve world hunger by only adding 1 KB overhead to the TCP socket structure that tracks actve connection states.
The argument about streamlined vpn or ssl devices is not relevant here: such a device has to calculate hashes, but it doesn't have to keep hash states open. Hash a message, make/check the signature, forward or discard, O(k) memory use where k is the number of cores.
I agree that a per connection overhead of this size for every TCP connection is a terrible thing, but I think marshray forgot we were talking about places where you use hashes, and keep them open.
For TLS records, there's a MAC at the end of each record. The MAC is based on HMAC, the most efficient implemenation involves priming two hash states with the MAC secret in advance. So with send and receive, every TLS connection already has four open hash states.
You're right though that most implementations seem to buffer a full record's data too. One could probably avoid this overhead when sending TLS (if you knew a minimum length in advance).