Poisonous MD5 – Wolves Among the Sheep(blog.silentsignal.eu) |
Poisonous MD5 – Wolves Among the Sheep(blog.silentsignal.eu) |
Wikipedia has this to say, which seems to solve that puzzle:
"Flame was signed with a fraudulent certificate purportedly from the Microsoft Enforced Licensing Intermediate PCA certificate authority. The malware authors identified a Microsoft Terminal Server Licensing Service certificate that inadvertently was enabled for code signing and that still used the weak MD5 hashing algorithm, then produced a counterfeit copy of the certificate that they used to sign some components of the malware to make them appear to have originated from Microsoft. A successful collision attack against a certificate was previously demonstrated in 2008, but Flame implemented a new variation of the chosen-prefix collision attack."
Here's the 'real' link: http://en.wikipedia.org/wiki/Flame_%28malware%29
In some systems I've built in the past I employ MD5 as a hashing mechanism to verify firmware integrity after flashing it in the memory. I don't use MD5 for anything security related (this is treated in other ways, depending on the system), just to check transmission and memory integrity.
Is MD5 still considered fine for this, or is there a real risk that random or systematic (but unintentional) noise could generate a collision between corrupted and original data? I do believe it should suffice, but hearing all the badmouth makes me wonder...
Not saying that MD5 is a good choice in this case, just that we may be blaming the wrong thing.
The MD5 algorithm is known to lack collision resistance, but whether it has preimage resistance is less certain; mathematical advances have weakened its preimage resistance, but not yet to the point of demonstrating a practical preimage attack.
False negatives would be more of an issue if the anti-virus has white lists and one can manufacture a Microsoft Excel MD5 signature with a malware. But that's not what the article refers to.
MD5 is only broken if you want to use it as a non-reversible hashing algorithm or if you want to use it as a an unforgeable signature. But it's perfectly fine for many other usage.
As you can see, binaries submitted for analysis are
identified by their MD5 sums and no sandboxed execution is
recorded if there is a duplicate (thus the shorter time
delay). This means that if I can create two files with the
same MD5 sum – one that behaves in a malicious way while the
other doesn’t – I can “poison” the database of the product
so that it won’t even try to analyze the malicious sample!
So it's a technique to get the scanner to ignore a malicious binary by constructing a non-malicious one with the same MD5 sum. This would be much harder if the scanner used a SHA-1 hash or similar.0. http://en.wikipedia.org/wiki/Preimage_attack
1. 2^123.4 complexity is not practical
I don't know which hashing algorithm they use but just as example of a situation where whitelist is not used.
The approach may work with traditional AV software too as
many of these also use fingerprinting (not necessarily MD5)
to avoid wasting resources on scanning the same files over
and over (although the RC4 encryption results in VT 0/57
anyway…).This is the case with all instances of seeking a collision, due to the birthday paradox [0]
To put some numbers on it, a good cryptographic hash should produce random-looking output with no way to predictably influence the output, and no visible correlation between the input and the output. Put in A, get out "random number" B, except that every time you put in the same A, you get out the same B.
If you have a hash that matches that description, then your only hope is brute force, so you look at the size of the hash. If you're breaking SHA-256, for example, that's 256 "random" bits.
If you have a specific hash that you want to generate, it will take you on average 2^255 attempts. (On average you have to try half the possibilities before you find a match.) If your computer can do a billion hashes per nanosecond then that means you'll spend on average 2^255 / (10^9 * 10^9) seconds, or about 10^41 times the current age of the universe.
If you just want to find any collision, then the birthday paradox comes into play, and on average you'll need to try roughly the square root of the number of possibilities before you find a collision. At a billion per nanosecond that's 2^128 / (10^9 * 10^9) second, or a mere 780 times the current age of the universe. Plus you have to figure out how to store all those intermediate results.
If MD5 fit this property it would still be good. Not as good as SHA-256, because it's only 128 bits, but plenty sufficient. 128 bits with the hypothetical billion hashes per nanosecond gives you 390 times the current age of the universe to find a specific hash. A collision is easier, at 18 seconds, but a billion hashes per nanosecond is also orders of magnitude faster than you'll realistically be able to do, and you'll need 2^64 * 128 bits = 300 exabytes of storage.
The problem with MD5 is that it does not fit this "random number" property. You can manipulate the input with somewhat predictable results in the output if you're clever, and that means you can generate a collision much easier than it would require for brute force.
My understanding is that more modern hashes are believed/hoped to have this property, but it's unknown. And not only that, but it's unknown whether any hash could exist with that property, or whether it's a theoretically impossible goal.
This is from 1998 but the relevant parts - https://www.schneier.com/essays/archives/1998/05/the_crypto_...
>Cryptographic algorithms have a way of degrading over time. It's a situation that most techies aren't used to: Compression algorithms don't compress less as the years go by, and sorting algorithms don't sort slower. But encryption algorithms get easier to break; something that sufficed three years ago might not today.
>Cryptographic algorithms are all vulnerable to brute force--trying every possible encryption key, systematically searching for hash-function collisions, factoring the large composite number, and so forth--and brute force gets easier with time. A 56-bit key was long enough in the mid-1970s; today that can be pitifully small. In 1977, Martin Gardner wrote that 129-digit numbers would never be factored; in 1994, one was.
>Aside from brute force, cryptographic algorithms can be attacked with more subtle (and more powerful) techniques. In the early 1990s, the academic community discovered differential and linear cryptanalysis, and many symmetric encryption algorithms were broken. Similarly, the factoring community discovered the number-field sieve, which affected the security of public-key cryptosystems.
DES was used in the 70s, now it can be brute forced in a few days (with the right hardware).
They argue you would be better with hardcoding your system to known secure best practice. When the time comes to change it, you specify a new protocol version, as there will be new and better practices not only in algorithms but also in how they are used (mac-before-encryption being the canonical example of this which took far too long to change).
This is a uniquely severe problem for hashing. MD5 is by far the fastest of the common hashes, at least 20% faster than SHA1 for example.
The Venn diagram is you've got popular fast hashes, of which md5 is arguably the best, and in fast hashes, you've got cryptographically secure fast hashes, of which md5 most certainly is no longer a member. Its awesome for everything non-crypto non-secure non-over the internet.
Last weekend I had to burn a legacy DVD and I compared the md5sum of the image to the md5sum of the burned DVD, thankfully they matched. As much as I distrust legacy media, there probably isn't a sentient opponent in the burner, although I've occasionally sworn otherwise, and the most likely failure mode would have been simple truncation or buffer dropouts, so sheer speed was the priority. The hashes obviously matched, it was a good burn.
I've also used them as checksums for engineering data files. Everything on the planet from AS400s to MySQL databases can calculate md5, which is convenient for cross platform interchange type stuff. Heres a data blob and its md5, does your hash calculation match? Oh it does, how nice to know you have perfect data integrity, here have the next one. On the internet I wouldn't trust the opfor with md5, but I can trust my own coworkers and my own engineering machinery not to try and attack me (well, probably).
That means until the end of time you'll unfortunately have people discover md5 is available for non-crypto use, then try to use it to hash passwords or DRM executables or something, which is not so wise. Or for a downgrade attack, they misdesign their crypto system to allow any hash installed on the machine as the hash type and not blacklist md5 and sha1 and maybe more.
What will die out is hashes like SHA1. Why use something almost as dangerous as md5 for crypto, thats slower than md5, and not as widespread as md5? Bye bye sha1!
I'm sure that you trust them, but do you trust their computers? Do you trust all of them that they can keep their personal computers and their accounts secure?
Maybe it doesn't match your case perfectly, but in general trusting people and trusting people's accounts is not the same thing.