[arch-general] Packages Verified with MD5

Sun Jan 12 13:29:22 EST 2014

On 01/12/2014 10:11 AM, Mark Lee wrote:
> Perhaps I'm not strong enough in mathematics but I'd like to know how
> possible md5 collisions can be weaponized. From what I see, the idea
> would be to modify a binary such that it contains malicious code
> (without changing the md5sum). Since most security packages contain a
> number of compilation tests and md5 hashes vary significantly with
> slight modifications, I'd like to know how these collisions can be used
> to hijack a system. If one has to build a binary that doesn't even
> encompass the functionality of the binary it's trying to mimic, wouldn't
> that severely decrease the effectiveness of a hash collision?

The problem with MD5 is that it's possible to find two different
messages A and B such that MD5(A) = MD5(B). The birthday attack [1]
makes it possible to find collisions for 128-bit hashes (like MD5) in
about 2^64 operations, but there are MD5-specific attacks that make it
even easier for MD5. So... it takes a bit of time to find a collision,
but not too much time. It can be done in about a day with PS3's [2].

For my explanation, I'll ignore the MD5-specific attacks and just use
the birthday attack since it's easier to understand.

The attacker starts out with two files:

1. The official copy of TrueCrypt.

2. A malicious copy of TrueCrypt.

The malicious change would be something simple. For example, they could
find where the master encryption key is copied into a buffer:

...
MOV	[master_key + ECX], EAX     // EAX holds master key bits
...

And change it so that it's always zero:

...
MOV	[master_key + ECX], 0       // Now the master key is set to zero
...

Only one instruction is different, but now TrueCrypt encrypts everything
with a null key.

>From these two copies copies of TrueCrypt, they generate 2^64 variants
of each. They can do this by re-ordering instructions, changing which
registers are used for which variables, re-ordering ELF segments, or
just appending random strings to the file (you can append stuff to an
ELF binary and it will still run fine).

Now the attacker has:

1. 2^64 files that are different, but behave exactly like the
non-malicious official copy of TrueCrypt.

2. 2^64 files that are different, but behave exactly like the malicious
copy of TrueCrypt (that uses zero keys or whatever).

If MD5 was a good hash function, each of the files would hash to a
random 128-bit value. You should now be able to see that at least one
file from the "non-malicious" set of files probably has the same MD5
hash as one of the files from the "malicious" set of files.

There are 2^64 * 2^64 = 2^128 ways of pairing a file from the
non-malicious pile with one from the malicious pile. The chance of any
one pair colliding is 1 in 2^128, so there's a good chance that at least
one of the pairs collide (it's not exactly 100% since the pairs are not
mutually independent).

To find the collision quickly, the attacker can just sort one of the
sets of files by hash value, then for each file in the other set, use a
binary search to see if any file has the same hash. Ignoring log(n)
factors, that would take on the order of 2^64 operations.

So... Now the attacker has two files with the same MD5 hash. One that
behaves like the official copy of TrueCrypt, and another that uses null
keys.

So, they give the *good* copy to the package maintainer, who laboriously
checks that it's not backdoored (as if that ever happens). The package
maintainer thinks everything is fine, since it behaves exactly like
TrueCrypt and there's nothing fishy about the file (they probably won't
notice some instructions being reordered). The file might even be the
same as the one from the TrueCrypt website, if the attacker is
collaborating with the TrueCrypt developers.

The package maintainer thinks everything is good, so they hard-code the
MD5 hash of their good copy into the PKGBUILD.

Now, when most users install TrueCrypt, they will download the *good*
copy, check the MD5 hash, it will match, and everything will be good.
However, when the adversary sees that his target (e.g. a terrorist, or
activist/journalist living in a non-free country) is downloading
TrueCrypt, they will silently replace the file with with the *bad* copy.
When the MD5 gets checked, it will match, and the user will be
encrypting their files with null keys.

In total, it took the attacker on the order of 2^66 operations to find
the colliding files. You'd need a good budget (like the NSA's, or a
large company's) to do that, but if you take advantage of the
MD5-specific attacks too, you can do it for much cheaper.

Of course, this doesn't just apply to TrueCrypt, it could be applied to
any other software that's checked only with an MD5 hash. To backdoor a
web server, for example, you could change a constant used in array index
bounds checking to introduce a buffer overflow vulnerability.

[1] https://en.wikipedia.org/wiki/Birthday_attack

[2] http://www.win.tue.nl/hashclash/rogue-ca/

-- 
Taylor Hornby