I noticed that the TrueCrypt package is downloaded over an insecure FTP connection and then only verified using MD5 hashes.
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
There are practical collision attacks against MD5. This means an adversary (e.g. the NSA) can construct two versions of the truecrypt binaries, one malicious and one not, which have the same MD5 hash. They can silently replace the file being downloaded with the malicious version and the change will not be detected.
This should be fixed to use SHA256 hashes, like the Firefox package:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
How can I help make it use SHA256 instead of MD5? I'm relatively new to arch, so I'm not familiar with what it takes to change something in the repos. Any advice would be appreciated.
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Thanks,
On 01/11/14 at 11:09pm, Taylor Hornby wrote:
I noticed that the TrueCrypt package is downloaded over an insecure FTP connection and then only verified using MD5 hashes.
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
There are practical collision attacks against MD5. This means an adversary (e.g. the NSA) can construct two versions of the truecrypt binaries, one malicious and one not, which have the same MD5 hash. They can silently replace the file being downloaded with the malicious version and the change will not be detected.
This should be fixed to use SHA256 hashes, like the Firefox package:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
How can I help make it use SHA256 instead of MD5? I'm relatively new to arch, so I'm not familiar with what it takes to change something in the repos. Any advice would be appreciated.
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Thanks,
Taylor Hornby
p.s. This might be better suited to arch-dev-public, but I think users should be informed of the vulnerability, so I decided on arch-general.
SHA256 hashes won't fix anything, since hashes are only integritiy checks telling you the downloaded file isn't corrupt.
Signatures however are made to verify that the content isn't modified on the server, which as you can see is used in the PKGBUILD. [1]
The maintainer also says in his PKGBUILD that the download method used by truecrypt isn't compatible with makepkg [2]
[1] http://www.truecrypt.org/docs/digital-signatures [2] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
Am 12.01.2014 10:21, schrieb Jelle van der Waa:
On 01/11/14 at 11:09pm, Taylor Hornby wrote:
...
SHA256 hashes won't fix anything, since hashes are only integritiy checks telling you the downloaded file isn't corrupt.
Signatures however are made to verify that the content isn't modified on the server, which as you can see is used in the PKGBUILD. [1]
Signatures are encrypted hashes: [1] "PGP uses a cryptographically strong hash function on the plaintext the user is signing. This generates a fixed-length data item known as a /message digest. /(Again, any change to the information results in a totally different digest.)
Then PGP uses the digest and the private key to create the "signature." PGP transmits the signature and the plaintext together. Upon receipt of the message, the recipient uses PGP to recompute the digest, thus verifying the signature. PGP can encrypt the plaintext or not; signing plaintext is useful if some of the recipients are not interested in or capable of verifying the signature."
The maintainer also says in his PKGBUILD that the download method used by truecrypt isn't compatible with makepkg [2]
[1] http://www.truecrypt.org/docs/digital-signatures [2] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 02:21 AM, Jelle van der Waa wrote:
SHA256 hashes won't fix anything, since hashes are only integritiy checks telling you the downloaded file isn't corrupt.
Right. I assumed it was the PKGBUILD that was signed and verified, then it was trusted to download and verify the files it needs. If that's not the case, I'll have to do some more reading.
Signatures however are made to verify that the content isn't modified on the server, which as you can see is used in the PKGBUILD. [1]
The .sig file on the FTP server is the same one you can download from the TrueCrypt website. If it's used to verify the packages, the client needs a secure way to get the TrueCrypt Foundation's public key. Where is that done?
I don't see a .sig file used in the Firefox PKGBUILD, so I assume it's relying on the SHA256?
Thanks,
- -- Taylor Hornby
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 09:30 AM, Taylor Hornby wrote:
The .sig file on the FTP server is the same one you can download from the TrueCrypt website. If it's used to verify the packages, the client needs a secure way to get the TrueCrypt Foundation's public key. Where is that done?
I figured it out:
"If a signature file in the form of .sig is part of the PKGBUILD source array, makepkg validates the authenticity of source files. For example, the signature pkgname-pkgver.tar.gz.sig is used to check the integrity of the file pkgname-pkgver.tar.gz with the gpg program."
https://wiki.archlinux.org/index.php/makepkg
However, I'm still not sure how and when the client gets the public key, and the pkcs-2.20.tar.gz file is not covered by the .sig. So I think it's still relying on the MD5s.
- -- Taylor Hornby
On Sun, 12 Jan 2014 09:30:04 -0700 Taylor Hornby havoc@defuse.ca wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 02:21 AM, Jelle van der Waa wrote:
SHA256 hashes won't fix anything, since hashes are only integritiy checks telling you the downloaded file isn't corrupt.
Right. I assumed it was the PKGBUILD that was signed and verified, then it was trusted to download and verify the files it needs. If that's not the case, I'll have to do some more reading.
PKGBUILDS are not signed, binary packages are.
Signatures however are made to verify that the content isn't modified on the server, which as you can see is used in the PKGBUILD. [1]
The .sig file on the FTP server is the same one you can download from the TrueCrypt website. If it's used to verify the packages, the client needs a secure way to get the TrueCrypt Foundation's public key. Where is that done?
In general, a packager has to have the public key in his/her keyring on a host which is used to build the package. Of course, it is implicit that you trust that packager's practices...
I don't see a .sig file used in the Firefox PKGBUILD, so I assume it's relying on the SHA256?
Not every project signs the released tarballs. Heck, some do not even release the hashes.
Best, L.
Thanks,
Taylor Hornby -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBAgAGBQJS0sMMAAoJEP5tMebkC3RuBeUP/i5LP/moujGECT5VDlQWpWLa 78nOlLV6BM99ZpJJicwcBAg2RLTzG1KngrpmKOmxQVon0h7OCImRU0SakK0eoFVl Kdp+cHK429Io1cDIHfmy2Nkzr0y7Wy6c8AOjO1D2JAkW8lXqOW+8FvVx6p8Vkg4b DT/dEMibe6/Wq3CLIvaV/86avWQ/+4LxpPy4Lh/uvqB4HT3GtiJI3SdzLOyCjl93 f8TAVPg7ALkVOtuVkEKfdVB4i2U3JTtN2wr4w2m7Xf5/m7tJWTlpITm/V9/4d5N7 KDyO3OcGpuNV9YE9PzhB5LaU2qnf28Yw4yCs0ntobBXIKocifR3lGxw4HG5lSJv8 1fwRQ2OXzLK4+QcNz/h/+H/HSTJNjSS19+Rss72SY7GIf5JY0ZVxftL02bjFbBA3 1mmlsFSLCAvD15iILoPN1t/WiKBF/3NVqYZXmMsHoaUG1Zf+eg1MwM9ECMTaf62w TysJ1Eh9KUt7sgiXQLggxCGaS0Mxw/eMfo9uPHxneuiuAj68FCpVjA/88W1aTztW zKrNUegPfW6ff5Amr7M4bLp308dJtkDEal0syLqomLCWJ9yo+A8ecEodSKLrdfww NfuOeVOZbm8lhwN02nPFxpo564Qg8YuUjaW6hLiD8nWX7UmfcT9LDWxvStw7q/S0 upEkeuHsI2oAdOGpC9dL =do7B -----END PGP SIGNATURE-----
On 12 January 2014 14:09, Taylor Hornby havoc@defuse.ca wrote:
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Of the 4890 base packages shown by ABS, 2988 are MD5-only. That is 61%, or more than half.
-- GPG/PGP ID: C0711BF1
On 01/12/2014 02:58 AM, Rashif Ray Rahman wrote:
On 12 January 2014 14:09, Taylor Hornby havoc@defuse.ca wrote:
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Of the 4890 base packages shown by ABS, 2988 are MD5-only. That is 61%, or more than half.
Wow, that's quite a lot.
Do I understand correctly that the hashes are relied on for security? In other words, is it the package (containing the PKGBUILD) that's signed, and once it's verified, it's the PKGBUILD's responsibility to check the integrity of the files it needs?
If so, this should be fixed as soon as possible. How feasible would it be? Could it be as simple as making a script that:
1. Finds the 'source' and 'md5sums' lines. 2. Downloads the packages and checks the md5sums. 3. Computes the SHA256sums, and adds them to the file.
If there's anything I can do to help, let me know.
On 01/12/14 at 09:58am, Taylor Hornby wrote:
On 01/12/2014 02:58 AM, Rashif Ray Rahman wrote:
On 12 January 2014 14:09, Taylor Hornby havoc@defuse.ca wrote:
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Of the 4890 base packages shown by ABS, 2988 are MD5-only. That is 61%, or more than half.
Wow, that's quite a lot.
Do I understand correctly that the hashes are relied on for security? In other words, is it the package (containing the PKGBUILD) that's signed, and once it's verified, it's the PKGBUILD's responsibility to check the integrity of the files it needs?
If so, this should be fixed as soon as possible. How feasible would it be? Could it be as simple as making a script that:
- Finds the 'source' and 'md5sums' lines.
- Downloads the packages and checks the md5sums.
- Computes the SHA256sums, and adds them to the file.
If there's anything I can do to help, let me know.
-- Taylor Hornby
No, you don't rely on hashes for security, hashes are for integrity checks. Signatures are for the verification of a file or message, since anyone can replace the hash on the server and upload a new tarball.
Signatures can only be created by the developers private key, it hashes a file or messages, then encrypts this hash with his private key. Then the developer puts the signature and tarball on a server.
Everyone who has somehow obtained the developers public key, can verify that the tarball hasn't been tampered with by creating a hash from the tarball and comparing it with the decrypted signature (original hash).
If a hacker uploaded a malicious tarball, he would be able to create a new hash, but wouldn't be able to create a new valid signature.
PS: the explanation of signing isn't exactly correct, since I didn't explain that there hash is actually not encrypted with the private key. A nice explanation of PGP can be found here: http://www.pgpi.org/doc/pgpintro/
PS2: You may raise more concerns about the truecrypts code. http://istruecryptauditedyet.com/
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 10:27 AM, Jelle van der Waa wrote:
No, you don't rely on hashes for security, hashes are for integrity checks. Signatures are for the verification of a file or message, since anyone can replace the hash on the server and upload a new tarball.
I agree, and I understand how signatures work. But what am I missing? It looks like in e.g. the Firefox package...
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
...the only thing preventing a man in the middle from tampering with the binaries as an Arch user installs Firefox are those SHA256 hashes.
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD?
I have been assuming the former, that when I do pacman -S firefox or pacman -S truecrypt, it runs the PKGBUILD on *my* system. Is that not the case?
Thanks for your time, - -- Taylor Hornby
On Sun, Jan 12, 2014 at 9:40 PM, Taylor Hornby havoc@defuse.ca wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 10:27 AM, Jelle van der Waa wrote:
No, you don't rely on hashes for security, hashes are for integrity checks. Signatures are for the verification of a file or message, since anyone can replace the hash on the server and upload a new tarball.
I agree, and I understand how signatures work. But what am I missing? It looks like in e.g. the Firefox package...
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
...the only thing preventing a man in the middle from tampering with the binaries as an Arch user installs Firefox are those SHA256 hashes.
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD?
I have been assuming the former, that when I do pacman -S firefox or pacman -S truecrypt, it runs the PKGBUILD on *my* system. Is that not the case?
Thanks for your time,
Taylor Hornby
Which part of the man page or the wiki isn't clear about what 'pacman -S foo' does?
Hi,
I believe the topic stater has concerns about weakness of the MD5 hash algorithm. He suggests to deprecate md5sums=() and use cryptographic hash algorithm like SHA256. Personally I avoid MD5 in my packages because of its bad reputation. But I am not an crypto expert though.
I have been assuming the former, that when I do pacman -S firefox or pacman -S truecrypt, it runs the PKGBUILD on *my* system. Is that not the case?
No. Both firefox and truecrypt are distributed as binary packages. PKGBUILD is used by maintainer only at the build time. From other side AUR packages are always built on your machine. md5sums=() checks that the *source* files downloaded from internet are correct. MITM attack is still possible here.
On 01/12/2014 12:40 PM, Taylor Hornby wrote:
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD?
"pacman -S firefox" installs a pre-compiled binary maintained by an Arch Dev. On the other hand, PKGBUILDs are for building packages.
And the official firefox package is cryptographically signed by the package maintainer (not Mozilla).
Hopefully, that clears things up.
If you really want to build a firefox package yourself, you can set up ABS. If you build a package from ABS (using makepkg), you will run the PKGBUILD. https://wiki.archlinux.org/index.php/Abs
Kyle Terrien
PS: Great discussion on exploiting MD5.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 01:56 PM, Kyle Terrien wrote:
On 01/12/2014 12:40 PM, Taylor Hornby wrote:
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD?
"pacman -S firefox" installs a pre-compiled binary maintained by an Arch Dev. On the other hand, PKGBUILDs are for building packages.
And the official firefox package is cryptographically signed by the package maintainer (not Mozilla).
Hopefully, that clears things up.
Thank you, that makes so much more sense!
So, really, the vulnerability only exists while the Arch dev (or package maintainer or whatever they're called) is building the package. Once they do, and sign it, all Arch users will verify their signature to make sure they get the same file the Arch dev created.
That's not so bad, then, since you can't really do any better unless the upstream source (Mozilla) signs their files, and the package maintainer has their public key.
- -- Taylor Hornby
On 01/12/2014 01:13 PM, Taylor Hornby wrote:
Thank you, that makes so much more sense!
So, really, the vulnerability only exists while the Arch dev (or package maintainer or whatever they're called) is building the package. Once they do, and sign it, all Arch users will verify their signature to make sure they get the same file the Arch dev created.
That's correct! See these pages for more info on how pacman's signature checking works:
https://wiki.archlinux.org/index.php/Pacman#Package_security https://wiki.archlinux.org/index.php/Pacman-key
That's not so bad, then, since you can't really do any better unless the upstream source (Mozilla) signs their files, and the package maintainer has their public key.
To be honest, I'm a little surprised that Mozilla doesn't sign their Firefox source code.
Kyle
2014/1/12 Taylor Hornby havoc@defuse.ca
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 01:56 PM, Kyle Terrien wrote:
On 01/12/2014 12:40 PM, Taylor Hornby wrote:
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD?
"pacman -S firefox" installs a pre-compiled binary maintained by an Arch Dev. On the other hand, PKGBUILDs are for building packages.
And the official firefox package is cryptographically signed by the package maintainer (not Mozilla).
Hopefully, that clears things up.
Thank you, that makes so much more sense!
So, really, the vulnerability only exists while the Arch dev (or package maintainer or whatever they're called) is building the package. Once they do, and sign it, all Arch users will verify their signature to make sure they get the same file the Arch dev created.
That's not so bad, then, since you can't really do any better unless the upstream source (Mozilla) signs their files, and the package maintainer has their public key.
I think this could yet be a problem if the sys admin wants to build all of it's system. Then he will fall into the same problem with the AUR PKGBUILDs, or am i wrong?
On 13 January 2014 00:58, Taylor Hornby havoc@defuse.ca wrote:
If so, this should be fixed as soon as possible. How feasible would it be? Could it be as simple as making a script that:
- Finds the 'source' and 'md5sums' lines.
- Downloads the packages and checks the md5sums.
- Computes the SHA256sums, and adds them to the file.
If there's anything I can do to help, let me know.
Makepkg supports MD5 and the SHAs. A PKGBUILD can have multiple checksums, but it depends on the maintainer which of them they'd prefer to use. You can get them to deprecate the practice of using MD5-only PKGBUILDs.
You're actually concerned about a part of the packaging process that requires human discretion. It is up to the packager to verify that the sources are good. They can proactively search for authentic checksums and signatures.
-- GPG/PGP ID: C0711BF1
On 01/13/2014 02:49 AM, Rashif Ray Rahman wrote:
On 13 January 2014 00:58, Taylor Hornby havoc@defuse.ca wrote:
If so, this should be fixed as soon as possible. How feasible would it be? Could it be as simple as making a script that:
- Finds the 'source' and 'md5sums' lines.
- Downloads the packages and checks the md5sums.
- Computes the SHA256sums, and adds them to the file.
If there's anything I can do to help, let me know.
Makepkg supports MD5 and the SHAs. A PKGBUILD can have multiple checksums, but it depends on the maintainer which of them they'd prefer to use. You can get them to deprecate the practice of using MD5-only PKGBUILDs.
You're actually concerned about a part of the packaging process that requires human discretion. It is up to the packager to verify that the sources are good. They can proactively search for authentic checksums and signatures.
Yep, I misunderstood how it works. I thought the PKGBUILD was used on users' systems when they run "pacman -S truecrypt", when in fact the PKGBUILD is only used by the package maintainer to generate the binary packages, which they then sign.
So it's not as bad as I thought, and moving to SHA256 doesn't fix the problem. The only solution is to convince the software sources (Mozilla, etc.) to sign the files they release.
On Sat, 2014-01-11 at 23:09 -0700, Taylor Hornby wrote:
I noticed that the TrueCrypt package is downloaded over an insecure FTP connection and then only verified using MD5 hashes.
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
There are practical collision attacks against MD5. This means an adversary (e.g. the NSA) can construct two versions of the truecrypt binaries, one malicious and one not, which have the same MD5 hash. They can silently replace the file being downloaded with the malicious version and the change will not be detected.
This should be fixed to use SHA256 hashes, like the Firefox package:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
How can I help make it use SHA256 instead of MD5? I'm relatively new to arch, so I'm not familiar with what it takes to change something in the repos. Any advice would be appreciated.
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Thanks,
Salutations!
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
Regards, Mark
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
1. The official copy of TrueCrypt.
2. A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants
of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
1. 2^64 files that are different, but behave exactly like the non-malicious official copy of TrueCrypt.
2. 2^64 files that are different, but behave exactly like the malicious copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
[1] https://en.wikipedia.org/wiki/Birthday_attack
[2] http://www.win.tue.nl/hashclash/rogue-ca/
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 12.1.2014 19:29, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
The official copy of TrueCrypt.
A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
- 2^64 files that are different, but behave exactly like the
non-malicious official copy of TrueCrypt.
- 2^64 files that are different, but behave exactly like the
malicious copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Good stuff. Thanks for your time.
On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
The official copy of TrueCrypt.
A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
- 2^64 files that are different, but behave exactly like the
non-malicious official copy of TrueCrypt.
- 2^64 files that are different, but behave exactly like the malicious
copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Salutations!
Excellent explanation but I still don't see the guarantee that md5 hash can be cracked. I don't see any application of the pidgeon hole principle which would show two files (one malicious with the null key and one normal) would have to share the same md5 hash. Can you elaborate any further?
Regarding, switching out the md5 hashes, I agree with the switch to sha256 for PKGBUILDS.
Regards, Mark
On Sun, 2014-01-12 at 16:29 -0500, Mark Lee wrote:
On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
The official copy of TrueCrypt.
A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
- 2^64 files that are different, but behave exactly like the
non-malicious official copy of TrueCrypt.
- 2^64 files that are different, but behave exactly like the malicious
copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Salutations!
Excellent explanation but I still don't see the guarantee that md5 hash can be cracked. I don't see any application of the pidgeon hole principle which would show two files (one malicious with the null key and one normal) would have to share the same md5 hash. Can you elaborate any further?
Regarding, switching out the md5 hashes, I agree with the switch to sha256 for PKGBUILDS.
Regards, Mark
Salutations!
I'd just like the point out that with the birthday attack, we'd have 100% confidence when we have 2^128+1 different hashes (using the pidgeon hole theorem). Anyone care to calculate a # pair vs. probability of collision for md5 hash?
Regards, Mark
On Sun, 2014-01-12 at 16:37 -0500, Mark Lee wrote:
On Sun, 2014-01-12 at 16:29 -0500, Mark Lee wrote:
On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
The official copy of TrueCrypt.
A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
- 2^64 files that are different, but behave exactly like the
non-malicious official copy of TrueCrypt.
- 2^64 files that are different, but behave exactly like the malicious
copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Salutations!
Excellent explanation but I still don't see the guarantee that md5 hash can be cracked. I don't see any application of the pidgeon hole principle which would show two files (one malicious with the null key and one normal) would have to share the same md5 hash. Can you elaborate any further?
Regarding, switching out the md5 hashes, I agree with the switch to sha256 for PKGBUILDS.
Regards, Mark
Salutations!
I'd just like the point out that with the birthday attack, we'd have 100% confidence when we have 2^128+1 different hashes (using the pidgeon hole theorem). Anyone care to calculate a # pair vs. probability of collision for md5 hash?
Regards, Mark
Salutations!
In addition, this birthday attack assumes coercion from source code developers. It seems that the method you demonstrated of reducing the very high number needed to be 100% sure (from 2^128+1 to 129) one has a collision is dependent on source code developers being complacent (they are letting the code be refactored) to coercion.
Regards, Mark
arch-general@lists.archlinux.org