[arch-general] Packages Verified with MD5

Sun Jan 12 17:06:27 EST 2014

On Sun, 2014-01-12 at 16:37 -0500, Mark Lee wrote:
> On Sun, 2014-01-12 at 16:29 -0500, Mark Lee wrote:
> > On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote:
> > > On 01/12/2014 10:11 AM, Mark Lee wrote:
> > > > Perhaps I'm not strong enough in mathematics but I'd like to know how
> > > > possible md5 collisions can be weaponized. From what I see, the idea
> > > > would be to modify a binary such that it contains malicious code
> > > > (without changing the md5sum). Since most security packages contain a
> > > > number of compilation tests and md5 hashes vary significantly with
> > > > slight modifications, I'd like to know how these collisions can be used
> > > > to hijack a system. If one has to build a binary that doesn't even
> > > > encompass the functionality of the binary it's trying to mimic, wouldn't
> > > > that severely decrease the effectiveness of a hash collision?
> > > 
> > > The problem with MD5 is that it's possible to find two different
> > > messages A and B such that MD5(A) = MD5(B). The birthday attack [1]
> > > makes it possible to find collisions for 128-bit hashes (like MD5) in
> > > about 2^64 operations, but there are MD5-specific attacks that make it
> > > even easier for MD5. So... it takes a bit of time to find a collision,
> > > but not too much time. It can be done in about a day with PS3's [2].
> > > 
> > > For my explanation, I'll ignore the MD5-specific attacks and just use
> > > the birthday attack since it's easier to understand.
> > > 
> > > The attacker starts out with two files:
> > > 
> > > 1. The official copy of TrueCrypt.
> > > 
> > > 2. A malicious copy of TrueCrypt.
> > > 
> > > The malicious change would be something simple. For example, they could
> > > find where the master encryption key is copied into a buffer:
> > > 
> > > ...
> > > MOV	[master_key + ECX], EAX     // EAX holds master key bits
> > > ...
> > > 
> > > And change it so that it's always zero:
> > > 
> > > ...
> > > MOV	[master_key + ECX], 0       // Now the master key is set to zero
> > > ...
> > > 
> > > Only one instruction is different, but now TrueCrypt encrypts everything
> > > with a null key.
> > > 
> > > From these two copies copies of TrueCrypt, they generate 2^64 variants
> > > of each. They can do this by re-ordering instructions, changing which
> > > registers are used for which variables, re-ordering ELF segments, or
> > > just appending random strings to the file (you can append stuff to an
> > > ELF binary and it will still run fine).
> > > 
> > > Now the attacker has:
> > > 
> > > 1. 2^64 files that are different, but behave exactly like the
> > > non-malicious official copy of TrueCrypt.
> > > 
> > > 2. 2^64 files that are different, but behave exactly like the malicious
> > > copy of TrueCrypt (that uses zero keys or whatever).
> > > 
> > > If MD5 was a good hash function, each of the files would hash to a
> > > random 128-bit value. You should now be able to see that at least one
> > > file from the "non-malicious" set of files probably has the same MD5
> > > hash as one of the files from the "malicious" set of files.
> > > 
> > > There are 2^64 * 2^64 = 2^128 ways of pairing a file from the
> > > non-malicious pile with one from the malicious pile. The chance of any
> > > one pair colliding is 1 in 2^128, so there's a good chance that at least
> > > one of the pairs collide (it's not exactly 100% since the pairs are not
> > > mutually independent).
> > > 
> > > To find the collision quickly, the attacker can just sort one of the
> > > sets of files by hash value, then for each file in the other set, use a
> > > binary search to see if any file has the same hash. Ignoring log(n)
> > > factors, that would take on the order of 2^64 operations.
> > > 
> > > So... Now the attacker has two files with the same MD5 hash. One that
> > > behaves like the official copy of TrueCrypt, and another that uses null
> > > keys.
> > > 
> > > So, they give the *good* copy to the package maintainer, who laboriously
> > > checks that it's not backdoored (as if that ever happens). The package
> > > maintainer thinks everything is fine, since it behaves exactly like
> > > TrueCrypt and there's nothing fishy about the file (they probably won't
> > > notice some instructions being reordered). The file might even be the
> > > same as the one from the TrueCrypt website, if the attacker is
> > > collaborating with the TrueCrypt developers.
> > > 
> > > The package maintainer thinks everything is good, so they hard-code the
> > > MD5 hash of their good copy into the PKGBUILD.
> > > 
> > > Now, when most users install TrueCrypt, they will download the *good*
> > > copy, check the MD5 hash, it will match, and everything will be good.
> > > However, when the adversary sees that his target (e.g. a terrorist, or
> > > activist/journalist living in a non-free country) is downloading
> > > TrueCrypt, they will silently replace the file with with the *bad* copy.
> > > When the MD5 gets checked, it will match, and the user will be
> > > encrypting their files with null keys.
> > > 
> > > In total, it took the attacker on the order of 2^66 operations to find
> > > the colliding files. You'd need a good budget (like the NSA's, or a
> > > large company's) to do that, but if you take advantage of the
> > > MD5-specific attacks too, you can do it for much cheaper.
> > > 
> > > Of course, this doesn't just apply to TrueCrypt, it could be applied to
> > > any other software that's checked only with an MD5 hash. To backdoor a
> > > web server, for example, you could change a constant used in array index
> > > bounds checking to introduce a buffer overflow vulnerability.
> > > 
> > > [1] https://en.wikipedia.org/wiki/Birthday_attack
> > > 
> > > [2] http://www.win.tue.nl/hashclash/rogue-ca/
> > > 
> > 
> > Salutations!
> > 
> > Excellent explanation but I still don't see the guarantee that md5 hash
> > can be cracked. I don't see any application of the pidgeon hole
> > principle which would show two files (one malicious with the null key
> > and one normal) would have to share the same md5 hash. Can you elaborate
> > any further?
> > 
> > Regarding, switching out the md5 hashes, I agree with the switch to
> > sha256 for PKGBUILDS.
> > 
> > Regards,
> > Mark
> 
> Salutations!
> 
> I'd just like the point out that with the birthday attack, we'd have
> 100% confidence when we have 2^128+1 different hashes (using the pidgeon
> hole theorem). Anyone care to calculate a # pair vs. probability of
> collision for md5 hash?
> 
> Regards,
> Mark

Salutations!

In addition, this birthday attack assumes coercion from source code
developers. It seems that the method you demonstrated of reducing the
very high number needed to be 100% sure (from 2^128+1 to 129) one has a
collision is dependent on source code developers being complacent (they
are letting the code be refactored) to coercion.

Regards,
Mark

-- 
Mark Lee <mark at markelee.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: This is a digitally signed message part
URL: <http://mailman.archlinux.org/pipermail/arch-general/attachments/20140112/bcd16324/attachment.asc>