-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12.1.2014 19:29, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
1. The official copy of TrueCrypt.
2. A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
1. 2^64 files that are different, but behave exactly like the non-malicious official copy of TrueCrypt.
2. 2^64 files that are different, but behave exactly like the malicious copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Good stuff. Thanks for your time. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (MingW32) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS0uaoAAoJEB1x37R9kdwnoDwQAMGT20QaQBwbT5V14F+aGTwk lQT8t0ggUXQxyJ1Lhvvqo9fG4CDBFwSl/h/VncVQE3IW3Aaz/EfJ72fuggTbLfp6 UXpDWgGe5kFSyDD9Sk6NEZc+FHZRPTWjpOu/gV4dg6qGqxOrg4RZkgizJkWt5Lfm PGf4EYSHwC5BR/o/WLHdhJzQHEIYTLmBIIPIUq2InBOQTLcA+gyVcoMY/jMdgN8D ytiN9gE5jefozjDkMNp1W+EVabYpIG3rQJdgGmahBXmWFtV765gsU1fqwU7SVS9d nJQRFEIraJDeaplhxEdrCa0205EsTKlAXgqZRLqvYOq/v2afVagvNwpwJvu5A68f TIVWNcWMcZOdn+1Dk4nuBMF8BJAtmppD+CNiaDMm+hMS7Hy9Q9R1chByij5x0mVB brAgVRELa5gvR5m0tSxJxJwyLxy9J3vQTtDFT6JVw8z3AQH/HZihgsxp9ANzn6j+ A3T5B8bqOXc2lkgGh84tuqnl/Gjm9+uaPIKnxf34Q78WMmX+XoPHiM+O8yTdl8/f k61UWSl0KfRsZ9+lnbDbFKitP/ssUo4eB4uqGJIHrql+eKVuVn8602c8HyrssDG1 kvRWIEgp5vRgJ/s+dFKO7RhHordMbhJFwkftfxY0dRmlfeRnkdQKMaUyEB4GVVKV o1MN29AEKsfCTPlEbjlb =4jUR -----END PGP SIGNATURE-----