[arch-general] Packages Verified with MD5
I noticed that the TrueCrypt package is downloaded over an insecure FTP connection and then only verified using MD5 hashes. https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p... There are practical collision attacks against MD5. This means an adversary (e.g. the NSA) can construct two versions of the truecrypt binaries, one malicious and one not, which have the same MD5 hash. They can silently replace the file being downloaded with the malicious version and the change will not be detected. This should be fixed to use SHA256 hashes, like the Firefox package: https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p... How can I help make it use SHA256 instead of MD5? I'm relatively new to arch, so I'm not familiar with what it takes to change something in the repos. Any advice would be appreciated. Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated. Thanks, -- Taylor Hornby p.s. This might be better suited to arch-dev-public, but I think users should be informed of the vulnerability, so I decided on arch-general.
On 01/11/14 at 11:09pm, Taylor Hornby wrote:
I noticed that the TrueCrypt package is downloaded over an insecure FTP connection and then only verified using MD5 hashes.
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
There are practical collision attacks against MD5. This means an adversary (e.g. the NSA) can construct two versions of the truecrypt binaries, one malicious and one not, which have the same MD5 hash. They can silently replace the file being downloaded with the malicious version and the change will not be detected.
This should be fixed to use SHA256 hashes, like the Firefox package:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
How can I help make it use SHA256 instead of MD5? I'm relatively new to arch, so I'm not familiar with what it takes to change something in the repos. Any advice would be appreciated.
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Thanks, -- Taylor Hornby
p.s. This might be better suited to arch-dev-public, but I think users should be informed of the vulnerability, so I decided on arch-general.
SHA256 hashes won't fix anything, since hashes are only integritiy checks telling you the downloaded file isn't corrupt. Signatures however are made to verify that the content isn't modified on the server, which as you can see is used in the PKGBUILD. [1] The maintainer also says in his PKGBUILD that the download method used by truecrypt isn't compatible with makepkg [2] [1] http://www.truecrypt.org/docs/digital-signatures [2] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p... -- Jelle van der Waa
Am 12.01.2014 10:21, schrieb Jelle van der Waa:
On 01/11/14 at 11:09pm, Taylor Hornby wrote:
... SHA256 hashes won't fix anything, since hashes are only integritiy checks telling you the downloaded file isn't corrupt.
Signatures however are made to verify that the content isn't modified on the server, which as you can see is used in the PKGBUILD. [1]
Signatures are encrypted hashes: [1] "PGP uses a cryptographically strong hash function on the plaintext the user is signing. This generates a fixed-length data item known as a /message digest. /(Again, any change to the information results in a totally different digest.) Then PGP uses the digest and the private key to create the "signature." PGP transmits the signature and the plaintext together. Upon receipt of the message, the recipient uses PGP to recompute the digest, thus verifying the signature. PGP can encrypt the plaintext or not; signing plaintext is useful if some of the recipients are not interested in or capable of verifying the signature."
The maintainer also says in his PKGBUILD that the download method used by truecrypt isn't compatible with makepkg [2]
[1] http://www.truecrypt.org/docs/digital-signatures [2] https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/12/2014 02:21 AM, Jelle van der Waa wrote:
SHA256 hashes won't fix anything, since hashes are only integritiy checks telling you the downloaded file isn't corrupt.
Right. I assumed it was the PKGBUILD that was signed and verified, then it was trusted to download and verify the files it needs. If that's not the case, I'll have to do some more reading.
Signatures however are made to verify that the content isn't modified on the server, which as you can see is used in the PKGBUILD. [1]
The .sig file on the FTP server is the same one you can download from the TrueCrypt website. If it's used to verify the packages, the client needs a secure way to get the TrueCrypt Foundation's public key. Where is that done? I don't see a .sig file used in the Firefox PKGBUILD, so I assume it's relying on the SHA256? Thanks, - -- Taylor Hornby -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS0sMMAAoJEP5tMebkC3RuBeUP/i5LP/moujGECT5VDlQWpWLa 78nOlLV6BM99ZpJJicwcBAg2RLTzG1KngrpmKOmxQVon0h7OCImRU0SakK0eoFVl Kdp+cHK429Io1cDIHfmy2Nkzr0y7Wy6c8AOjO1D2JAkW8lXqOW+8FvVx6p8Vkg4b DT/dEMibe6/Wq3CLIvaV/86avWQ/+4LxpPy4Lh/uvqB4HT3GtiJI3SdzLOyCjl93 f8TAVPg7ALkVOtuVkEKfdVB4i2U3JTtN2wr4w2m7Xf5/m7tJWTlpITm/V9/4d5N7 KDyO3OcGpuNV9YE9PzhB5LaU2qnf28Yw4yCs0ntobBXIKocifR3lGxw4HG5lSJv8 1fwRQ2OXzLK4+QcNz/h/+H/HSTJNjSS19+Rss72SY7GIf5JY0ZVxftL02bjFbBA3 1mmlsFSLCAvD15iILoPN1t/WiKBF/3NVqYZXmMsHoaUG1Zf+eg1MwM9ECMTaf62w TysJ1Eh9KUt7sgiXQLggxCGaS0Mxw/eMfo9uPHxneuiuAj68FCpVjA/88W1aTztW zKrNUegPfW6ff5Amr7M4bLp308dJtkDEal0syLqomLCWJ9yo+A8ecEodSKLrdfww NfuOeVOZbm8lhwN02nPFxpo564Qg8YuUjaW6hLiD8nWX7UmfcT9LDWxvStw7q/S0 upEkeuHsI2oAdOGpC9dL =do7B -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/12/2014 09:30 AM, Taylor Hornby wrote:
The .sig file on the FTP server is the same one you can download from the TrueCrypt website. If it's used to verify the packages, the client needs a secure way to get the TrueCrypt Foundation's public key. Where is that done?
I figured it out: "If a signature file in the form of .sig is part of the PKGBUILD source array, makepkg validates the authenticity of source files. For example, the signature pkgname-pkgver.tar.gz.sig is used to check the integrity of the file pkgname-pkgver.tar.gz with the gpg program." https://wiki.archlinux.org/index.php/makepkg However, I'm still not sure how and when the client gets the public key, and the pkcs-2.20.tar.gz file is not covered by the .sig. So I think it's still relying on the MD5s. - -- Taylor Hornby -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS0so/AAoJEP5tMebkC3Ru1mwP/i1wPOZbEGTc9FuD6RP6zcZt IOoRdCIZWoV/8tunTguGiGJrpCuEuxyKi6uL7wS6YqmkR6GHUzKVwtZwfv+lyh/w 95r4gDHNMrktrnxUNiCQ6HmmGkE1Hn9XO4fHbF4xc2NbV70IoLlHHqyGIk6FVKac IJ3xhTyUUNEkEh5FxDkqqBOrLpkcA9fE0lXvAuyR6rxMqrGjMdK0PiaTcZwmASSl 4BfyQ+N8wuZM738lDPkDMlF6pRqtkJkue+thM5OWzKi8uEHn2u6uLqM4Wlg4pTNM rlYeeMjlLSvCFlD9QUm2EytIjtTSXZIH2titYtcogg5lIBjhlWuapMIoQSdi4hG7 Ubphk9n7D8JF3IZUXTRvmyXv9uElH9K2TBHTTJwqDispXaEZre+6PwyfasDZ9fxf B/qa+OGoU8xeoqUs0Rp+Rde70Ly49HUWNKSXzuCa0HaQoGYlf/xFEkNk21nhQy4Z KwKYFPfViuTL3f8uwLsBhbh7yWEGSaHL5qi4GnxsiBNwqf7wogy9YqjEGY4iOfGM xli9dCvNCjuqmJWKVPhQd8cQ5p/GlYjNr/ICjdB3ksIZO7R1rqrpXETeiUkueoqN vDNM009uhDwYyqly2GTdi3HxnMKGo3grmGxOlX1/mE9ssNyBT6t5yyrKLr/pK7Cc 9SbDYpQMogk/Y6mq159O =7RGz -----END PGP SIGNATURE-----
On Sun, 12 Jan 2014 09:30:04 -0700 Taylor Hornby <havoc@defuse.ca> wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 02:21 AM, Jelle van der Waa wrote:
SHA256 hashes won't fix anything, since hashes are only integritiy checks telling you the downloaded file isn't corrupt.
Right. I assumed it was the PKGBUILD that was signed and verified, then it was trusted to download and verify the files it needs. If that's not the case, I'll have to do some more reading.
PKGBUILDS are not signed, binary packages are.
Signatures however are made to verify that the content isn't modified on the server, which as you can see is used in the PKGBUILD. [1]
The .sig file on the FTP server is the same one you can download from the TrueCrypt website. If it's used to verify the packages, the client needs a secure way to get the TrueCrypt Foundation's public key. Where is that done?
In general, a packager has to have the public key in his/her keyring on a host which is used to build the package. Of course, it is implicit that you trust that packager's practices...
I don't see a .sig file used in the Firefox PKGBUILD, so I assume it's relying on the SHA256?
Not every project signs the released tarballs. Heck, some do not even release the hashes. Best, L.
Thanks,
- -- Taylor Hornby -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBAgAGBQJS0sMMAAoJEP5tMebkC3RuBeUP/i5LP/moujGECT5VDlQWpWLa 78nOlLV6BM99ZpJJicwcBAg2RLTzG1KngrpmKOmxQVon0h7OCImRU0SakK0eoFVl Kdp+cHK429Io1cDIHfmy2Nkzr0y7Wy6c8AOjO1D2JAkW8lXqOW+8FvVx6p8Vkg4b DT/dEMibe6/Wq3CLIvaV/86avWQ/+4LxpPy4Lh/uvqB4HT3GtiJI3SdzLOyCjl93 f8TAVPg7ALkVOtuVkEKfdVB4i2U3JTtN2wr4w2m7Xf5/m7tJWTlpITm/V9/4d5N7 KDyO3OcGpuNV9YE9PzhB5LaU2qnf28Yw4yCs0ntobBXIKocifR3lGxw4HG5lSJv8 1fwRQ2OXzLK4+QcNz/h/+H/HSTJNjSS19+Rss72SY7GIf5JY0ZVxftL02bjFbBA3 1mmlsFSLCAvD15iILoPN1t/WiKBF/3NVqYZXmMsHoaUG1Zf+eg1MwM9ECMTaf62w TysJ1Eh9KUt7sgiXQLggxCGaS0Mxw/eMfo9uPHxneuiuAj68FCpVjA/88W1aTztW zKrNUegPfW6ff5Amr7M4bLp308dJtkDEal0syLqomLCWJ9yo+A8ecEodSKLrdfww NfuOeVOZbm8lhwN02nPFxpo564Qg8YuUjaW6hLiD8nWX7UmfcT9LDWxvStw7q/S0 upEkeuHsI2oAdOGpC9dL =do7B -----END PGP SIGNATURE-----
-- Leonid Isaev GnuPG key: 0x164B5A6D Fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
On 12 January 2014 14:09, Taylor Hornby <havoc@defuse.ca> wrote:
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Of the 4890 base packages shown by ABS, 2988 are MD5-only. That is 61%, or more than half. -- GPG/PGP ID: C0711BF1
On 01/12/2014 02:58 AM, Rashif Ray Rahman wrote:
On 12 January 2014 14:09, Taylor Hornby <havoc@defuse.ca> wrote:
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Of the 4890 base packages shown by ABS, 2988 are MD5-only. That is 61%, or more than half.
Wow, that's quite a lot. Do I understand correctly that the hashes are relied on for security? In other words, is it the package (containing the PKGBUILD) that's signed, and once it's verified, it's the PKGBUILD's responsibility to check the integrity of the files it needs? If so, this should be fixed as soon as possible. How feasible would it be? Could it be as simple as making a script that: 1. Finds the 'source' and 'md5sums' lines. 2. Downloads the packages and checks the md5sums. 3. Computes the SHA256sums, and adds them to the file. If there's anything I can do to help, let me know. -- Taylor Hornby
On 01/12/14 at 09:58am, Taylor Hornby wrote:
On 01/12/2014 02:58 AM, Rashif Ray Rahman wrote:
On 12 January 2014 14:09, Taylor Hornby <havoc@defuse.ca> wrote:
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Of the 4890 base packages shown by ABS, 2988 are MD5-only. That is 61%, or more than half.
Wow, that's quite a lot.
Do I understand correctly that the hashes are relied on for security? In other words, is it the package (containing the PKGBUILD) that's signed, and once it's verified, it's the PKGBUILD's responsibility to check the integrity of the files it needs?
If so, this should be fixed as soon as possible. How feasible would it be? Could it be as simple as making a script that:
1. Finds the 'source' and 'md5sums' lines. 2. Downloads the packages and checks the md5sums. 3. Computes the SHA256sums, and adds them to the file.
If there's anything I can do to help, let me know.
-- Taylor Hornby
No, you don't rely on hashes for security, hashes are for integrity checks. Signatures are for the verification of a file or message, since anyone can replace the hash on the server and upload a new tarball. Signatures can only be created by the developers private key, it hashes a file or messages, then encrypts this hash with his private key. Then the developer puts the signature and tarball on a server. Everyone who has somehow obtained the developers public key, can verify that the tarball hasn't been tampered with by creating a hash from the tarball and comparing it with the decrypted signature (original hash). If a hacker uploaded a malicious tarball, he would be able to create a new hash, but wouldn't be able to create a new valid signature. PS: the explanation of signing isn't exactly correct, since I didn't explain that there hash is actually not encrypted with the private key. A nice explanation of PGP can be found here: http://www.pgpi.org/doc/pgpintro/ PS2: You may raise more concerns about the truecrypts code. http://istruecryptauditedyet.com/ -- Jelle van der Waa
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/12/2014 10:27 AM, Jelle van der Waa wrote:
No, you don't rely on hashes for security, hashes are for integrity checks. Signatures are for the verification of a file or message, since anyone can replace the hash on the server and upload a new tarball.
I agree, and I understand how signatures work. But what am I missing? It looks like in e.g. the Firefox package... https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p... ...the only thing preventing a man in the middle from tampering with the binaries as an Arch user installs Firefox are those SHA256 hashes. I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD? I have been assuming the former, that when I do pacman -S firefox or pacman -S truecrypt, it runs the PKGBUILD on *my* system. Is that not the case? Thanks for your time, - -- Taylor Hornby -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS0v23AAoJEP5tMebkC3Ruz8UP/0asU+Xfx1QL/y8E++XfQf1d s2440S4Y8A1YlPB+tKcOdDZ2PfxfaK5T1DboXqEKs+49yOb0+vdsI/w/aHvqMDDa Qp1ulu+Ci2QOEa6C3d9b7emlYxm4Hv2JbV1gQCtlqr2v2I64F98Db2bfj8wBmoOf pz8Z7Au1uBOdrIBWUqQPt/VH5A6H5hT1/e9mYZ36LU7Cw9rptyd8RSd2NPU9uEoQ vdvOtDAh6Wz8WIt8m8oMbplmNq1Uxd/TrwdvPnwJ5z9C6lnF3y+A+qrDn83J1DVm OgfHWw4KIiJGpfck91daTPBH51FZ3E5AgRifzDXgxwRBp0os/g6Y/2wR02vr+Fmd LNHRullzR4lA7k+uUtTLdDpbOIUw+uLYF7j7ARHXQZkmOwXwfBGmykB55/oUZJel sOew4+sQofVzZVziYAi/aCmrYVJr8yeDv4SIRNrLIi7ensYRfsCXe8JfUTehgVGK LDxzvvFkYuey8bkcTykjYR2DkSKHJ82gwA1v9PUOrDjNWBG8EkDRGJ2Q7y1bCRoh 3fXIynME7QCyshWPolyCEphVZScUPFKEHr02/dyRXtDAYocoZF6zCApZE5GOXfDz UVF6xKIMJDZg2+em8k0pOpOEEX+/NuWbBBwQIf9RmHEQkVMGrLWUIKBt8qYQfSjZ TLUcZm69MZ89ojHgy515 =WeYA -----END PGP SIGNATURE-----
On Sun, Jan 12, 2014 at 9:40 PM, Taylor Hornby <havoc@defuse.ca> wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 10:27 AM, Jelle van der Waa wrote:
No, you don't rely on hashes for security, hashes are for integrity checks. Signatures are for the verification of a file or message, since anyone can replace the hash on the server and upload a new tarball.
I agree, and I understand how signatures work. But what am I missing? It looks like in e.g. the Firefox package...
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
...the only thing preventing a man in the middle from tampering with the binaries as an Arch user installs Firefox are those SHA256 hashes.
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD?
I have been assuming the former, that when I do pacman -S firefox or pacman -S truecrypt, it runs the PKGBUILD on *my* system. Is that not the case?
Thanks for your time, - -- Taylor Hornby
Which part of the man page or the wiki isn't clear about what 'pacman -S foo' does?
Hi, I believe the topic stater has concerns about weakness of the MD5 hash algorithm. He suggests to deprecate md5sums=() and use cryptographic hash algorithm like SHA256. Personally I avoid MD5 in my packages because of its bad reputation. But I am not an crypto expert though.
I have been assuming the former, that when I do pacman -S firefox or pacman -S truecrypt, it runs the PKGBUILD on *my* system. Is that not the case? No. Both firefox and truecrypt are distributed as binary packages. PKGBUILD is used by maintainer only at the build time. From other side AUR packages are always built on your machine. md5sums=() checks that the *source* files downloaded from internet are correct. MITM attack is still possible here.
On 01/12/2014 12:40 PM, Taylor Hornby wrote:
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD?
"pacman -S firefox" installs a pre-compiled binary maintained by an Arch Dev. On the other hand, PKGBUILDs are for building packages. And the official firefox package is cryptographically signed by the package maintainer (not Mozilla). Hopefully, that clears things up. If you really want to build a firefox package yourself, you can set up ABS. If you build a package from ABS (using makepkg), you will run the PKGBUILD. <https://wiki.archlinux.org/index.php/Abs> Kyle Terrien PS: Great discussion on exploiting MD5.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/12/2014 01:56 PM, Kyle Terrien wrote:
On 01/12/2014 12:40 PM, Taylor Hornby wrote:
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD? "pacman -S firefox" installs a pre-compiled binary maintained by an Arch Dev. On the other hand, PKGBUILDs are for building packages.
And the official firefox package is cryptographically signed by the package maintainer (not Mozilla).
Hopefully, that clears things up.
Thank you, that makes so much more sense! So, really, the vulnerability only exists while the Arch dev (or package maintainer or whatever they're called) is building the package. Once they do, and sign it, all Arch users will verify their signature to make sure they get the same file the Arch dev created. That's not so bad, then, since you can't really do any better unless the upstream source (Mozilla) signs their files, and the package maintainer has their public key. - -- Taylor Hornby -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS0wWRAAoJEP5tMebkC3RuYxQP/R5j9AE29+bcWzx9cAYlCkYa iTEinD8SQ9DGWw+ludTHQTrA8yCzvbXftawtSVy49djYawS5y61HICDm5bpnGRqj KFsHyqD4ovJO50uVRAXmuhbpYARpttF8zyQi8RklxFYWfp9P4illNqf/oOy5R7wr MjWuY1Ljb98beblQK6o9Ps4DsMDD2b6rFeDR/HB2hmLDjnqTWlDnZxx1lwj15IHf dSjTomt3LOcDRFHg2zCcGnoFXBlThmb9Dud139C/9XHd0XSLLFE7E4LeiJVJ8Tso pyAIoRWwNaETeECYpqf8QrI69vQEVkFatAmvc2jnZO/3IwpHbfo9Ld+M/qMuP4pQ uNWcOO6448Awr+PtC2cTtcJxbidN+Wpp/cn9q3invpX6JKKbsiSxRQ8pTig09LQN +KnAwf7uDyqmYGmpoS+2qw5fAAPywtatgiC+AEkU6oM8V2LWTdWniCjBtLcdnaZo Fg4RWFw9kwjc7lz316SYTEnbw3iS5i/vUwdHxdfu89gMAnAW+EJB9M9JaIiCokjJ 9i10OAsZpiHVhl7SDATr9tkz9VVH+nW5tTm7ckPQ0UCrmN111isfYOpnqDrvcTiO 3te76MHLvZe1Pp5PwdKvoRZVBLnbtG6uPx2gYRoL6Y4sDAninLJeYKk0LWloE71n hbNgXvicMQ6udoBnrnZH =L+zO -----END PGP SIGNATURE-----
On 01/12/2014 01:13 PM, Taylor Hornby wrote:
Thank you, that makes so much more sense!
So, really, the vulnerability only exists while the Arch dev (or package maintainer or whatever they're called) is building the package. Once they do, and sign it, all Arch users will verify their signature to make sure they get the same file the Arch dev created.
That's correct! See these pages for more info on how pacman's signature checking works: <https://wiki.archlinux.org/index.php/Pacman#Package_security> <https://wiki.archlinux.org/index.php/Pacman-key>
That's not so bad, then, since you can't really do any better unless the upstream source (Mozilla) signs their files, and the package maintainer has their public key.
To be honest, I'm a little surprised that Mozilla doesn't sign their Firefox source code. Kyle
2014/1/12 Taylor Hornby <havoc@defuse.ca>
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 01/12/2014 01:56 PM, Kyle Terrien wrote:
On 01/12/2014 12:40 PM, Taylor Hornby wrote:
I guess I just don't understand what happens when I type "pacman -S firefox." Does that run the PKGBUILD on my system, or does it download and install pre-compiled (and signed) Firefox binaries that were created by one of the Arch developers using the PKGBUILD? "pacman -S firefox" installs a pre-compiled binary maintained by an Arch Dev. On the other hand, PKGBUILDs are for building packages.
And the official firefox package is cryptographically signed by the package maintainer (not Mozilla).
Hopefully, that clears things up.
Thank you, that makes so much more sense!
So, really, the vulnerability only exists while the Arch dev (or package maintainer or whatever they're called) is building the package. Once they do, and sign it, all Arch users will verify their signature to make sure they get the same file the Arch dev created.
That's not so bad, then, since you can't really do any better unless the upstream source (Mozilla) signs their files, and the package maintainer has their public key.
I think this could yet be a problem if the sys admin wants to build all of it's system. Then he will fall into the same problem with the AUR PKGBUILDs, or am i wrong?
On 13 January 2014 00:58, Taylor Hornby <havoc@defuse.ca> wrote:
If so, this should be fixed as soon as possible. How feasible would it be? Could it be as simple as making a script that:
1. Finds the 'source' and 'md5sums' lines. 2. Downloads the packages and checks the md5sums. 3. Computes the SHA256sums, and adds them to the file.
If there's anything I can do to help, let me know.
Makepkg supports MD5 and the SHAs. A PKGBUILD can have multiple checksums, but it depends on the maintainer which of them they'd prefer to use. You can get them to deprecate the practice of using MD5-only PKGBUILDs. You're actually concerned about a part of the packaging process that requires human discretion. It is up to the packager to verify that the sources are good. They can proactively search for authentic checksums and signatures. -- GPG/PGP ID: C0711BF1
On 01/13/2014 02:49 AM, Rashif Ray Rahman wrote:
If so, this should be fixed as soon as possible. How feasible would it be? Could it be as simple as making a script that:
1. Finds the 'source' and 'md5sums' lines. 2. Downloads the packages and checks the md5sums. 3. Computes the SHA256sums, and adds them to the file.
If there's anything I can do to help, let me know. Makepkg supports MD5 and the SHAs. A PKGBUILD can have multiple checksums, but it depends on the maintainer which of them they'd
On 13 January 2014 00:58, Taylor Hornby <havoc@defuse.ca> wrote: prefer to use. You can get them to deprecate the practice of using MD5-only PKGBUILDs.
You're actually concerned about a part of the packaging process that requires human discretion. It is up to the packager to verify that the sources are good. They can proactively search for authentic checksums and signatures.
Yep, I misunderstood how it works. I thought the PKGBUILD was used on users' systems when they run "pacman -S truecrypt", when in fact the PKGBUILD is only used by the package maintainer to generate the binary packages, which they then sign. So it's not as bad as I thought, and moving to SHA256 doesn't fix the problem. The only solution is to convince the software sources (Mozilla, etc.) to sign the files they release. -- Taylor Hornby
On Sat, 2014-01-11 at 23:09 -0700, Taylor Hornby wrote:
I noticed that the TrueCrypt package is downloaded over an insecure FTP connection and then only verified using MD5 hashes.
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
There are practical collision attacks against MD5. This means an adversary (e.g. the NSA) can construct two versions of the truecrypt binaries, one malicious and one not, which have the same MD5 hash. They can silently replace the file being downloaded with the malicious version and the change will not be detected.
This should be fixed to use SHA256 hashes, like the Firefox package:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=p...
How can I help make it use SHA256 instead of MD5? I'm relatively new to arch, so I'm not familiar with what it takes to change something in the repos. Any advice would be appreciated.
Are there other packages still being verified with MD5? Can we fix them too? I'll gladly donate my time if it's not something that can be automated.
Thanks,
Salutations! Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision? Regards, Mark -- Mark Lee <mark@markelee.com>
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2]. For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand. The attacker starts out with two files: 1. The official copy of TrueCrypt. 2. A malicious copy of TrueCrypt. The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer: ... MOV [master_key + ECX], EAX // EAX holds master key bits ... And change it so that it's always zero: ... MOV [master_key + ECX], 0 // Now the master key is set to zero ... Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has: 1. 2^64 files that are different, but behave exactly like the non-malicious official copy of TrueCrypt. 2. 2^64 files that are different, but behave exactly like the malicious copy of TrueCrypt (that uses zero keys or whatever). If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files. There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent). To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations. So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys. So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers. The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD. Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys. In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper. Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability. [1] https://en.wikipedia.org/wiki/Birthday_attack [2] http://www.win.tue.nl/hashclash/rogue-ca/ -- Taylor Hornby
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12.1.2014 19:29, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
1. The official copy of TrueCrypt.
2. A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
1. 2^64 files that are different, but behave exactly like the non-malicious official copy of TrueCrypt.
2. 2^64 files that are different, but behave exactly like the malicious copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Good stuff. Thanks for your time. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (MingW32) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS0uaoAAoJEB1x37R9kdwnoDwQAMGT20QaQBwbT5V14F+aGTwk lQT8t0ggUXQxyJ1Lhvvqo9fG4CDBFwSl/h/VncVQE3IW3Aaz/EfJ72fuggTbLfp6 UXpDWgGe5kFSyDD9Sk6NEZc+FHZRPTWjpOu/gV4dg6qGqxOrg4RZkgizJkWt5Lfm PGf4EYSHwC5BR/o/WLHdhJzQHEIYTLmBIIPIUq2InBOQTLcA+gyVcoMY/jMdgN8D ytiN9gE5jefozjDkMNp1W+EVabYpIG3rQJdgGmahBXmWFtV765gsU1fqwU7SVS9d nJQRFEIraJDeaplhxEdrCa0205EsTKlAXgqZRLqvYOq/v2afVagvNwpwJvu5A68f TIVWNcWMcZOdn+1Dk4nuBMF8BJAtmppD+CNiaDMm+hMS7Hy9Q9R1chByij5x0mVB brAgVRELa5gvR5m0tSxJxJwyLxy9J3vQTtDFT6JVw8z3AQH/HZihgsxp9ANzn6j+ A3T5B8bqOXc2lkgGh84tuqnl/Gjm9+uaPIKnxf34Q78WMmX+XoPHiM+O8yTdl8/f k61UWSl0KfRsZ9+lnbDbFKitP/ssUo4eB4uqGJIHrql+eKVuVn8602c8HyrssDG1 kvRWIEgp5vRgJ/s+dFKO7RhHordMbhJFwkftfxY0dRmlfeRnkdQKMaUyEB4GVVKV o1MN29AEKsfCTPlEbjlb =4jUR -----END PGP SIGNATURE-----
On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
1. The official copy of TrueCrypt.
2. A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
1. 2^64 files that are different, but behave exactly like the non-malicious official copy of TrueCrypt.
2. 2^64 files that are different, but behave exactly like the malicious copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Salutations! Excellent explanation but I still don't see the guarantee that md5 hash can be cracked. I don't see any application of the pidgeon hole principle which would show two files (one malicious with the null key and one normal) would have to share the same md5 hash. Can you elaborate any further? Regarding, switching out the md5 hashes, I agree with the switch to sha256 for PKGBUILDS. Regards, Mark -- Mark Lee <mark@markelee.com>
On Sun, 2014-01-12 at 16:29 -0500, Mark Lee wrote:
On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
1. The official copy of TrueCrypt.
2. A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
1. 2^64 files that are different, but behave exactly like the non-malicious official copy of TrueCrypt.
2. 2^64 files that are different, but behave exactly like the malicious copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Salutations!
Excellent explanation but I still don't see the guarantee that md5 hash can be cracked. I don't see any application of the pidgeon hole principle which would show two files (one malicious with the null key and one normal) would have to share the same md5 hash. Can you elaborate any further?
Regarding, switching out the md5 hashes, I agree with the switch to sha256 for PKGBUILDS.
Regards, Mark
Salutations! I'd just like the point out that with the birthday attack, we'd have 100% confidence when we have 2^128+1 different hashes (using the pidgeon hole theorem). Anyone care to calculate a # pair vs. probability of collision for md5 hash? Regards, Mark -- Mark Lee <mark@markelee.com>
On Sun, 2014-01-12 at 16:37 -0500, Mark Lee wrote:
On Sun, 2014-01-12 at 16:29 -0500, Mark Lee wrote:
On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote:
On 01/12/2014 10:11 AM, Mark Lee wrote:
Perhaps I'm not strong enough in mathematics but I'd like to know how possible md5 collisions can be weaponized. From what I see, the idea would be to modify a binary such that it contains malicious code (without changing the md5sum). Since most security packages contain a number of compilation tests and md5 hashes vary significantly with slight modifications, I'd like to know how these collisions can be used to hijack a system. If one has to build a binary that doesn't even encompass the functionality of the binary it's trying to mimic, wouldn't that severely decrease the effectiveness of a hash collision?
The problem with MD5 is that it's possible to find two different messages A and B such that MD5(A) = MD5(B). The birthday attack [1] makes it possible to find collisions for 128-bit hashes (like MD5) in about 2^64 operations, but there are MD5-specific attacks that make it even easier for MD5. So... it takes a bit of time to find a collision, but not too much time. It can be done in about a day with PS3's [2].
For my explanation, I'll ignore the MD5-specific attacks and just use the birthday attack since it's easier to understand.
The attacker starts out with two files:
1. The official copy of TrueCrypt.
2. A malicious copy of TrueCrypt.
The malicious change would be something simple. For example, they could find where the master encryption key is copied into a buffer:
... MOV [master_key + ECX], EAX // EAX holds master key bits ...
And change it so that it's always zero:
... MOV [master_key + ECX], 0 // Now the master key is set to zero ...
Only one instruction is different, but now TrueCrypt encrypts everything with a null key.
From these two copies copies of TrueCrypt, they generate 2^64 variants of each. They can do this by re-ordering instructions, changing which registers are used for which variables, re-ordering ELF segments, or just appending random strings to the file (you can append stuff to an ELF binary and it will still run fine).
Now the attacker has:
1. 2^64 files that are different, but behave exactly like the non-malicious official copy of TrueCrypt.
2. 2^64 files that are different, but behave exactly like the malicious copy of TrueCrypt (that uses zero keys or whatever).
If MD5 was a good hash function, each of the files would hash to a random 128-bit value. You should now be able to see that at least one file from the "non-malicious" set of files probably has the same MD5 hash as one of the files from the "malicious" set of files.
There are 2^64 * 2^64 = 2^128 ways of pairing a file from the non-malicious pile with one from the malicious pile. The chance of any one pair colliding is 1 in 2^128, so there's a good chance that at least one of the pairs collide (it's not exactly 100% since the pairs are not mutually independent).
To find the collision quickly, the attacker can just sort one of the sets of files by hash value, then for each file in the other set, use a binary search to see if any file has the same hash. Ignoring log(n) factors, that would take on the order of 2^64 operations.
So... Now the attacker has two files with the same MD5 hash. One that behaves like the official copy of TrueCrypt, and another that uses null keys.
So, they give the *good* copy to the package maintainer, who laboriously checks that it's not backdoored (as if that ever happens). The package maintainer thinks everything is fine, since it behaves exactly like TrueCrypt and there's nothing fishy about the file (they probably won't notice some instructions being reordered). The file might even be the same as the one from the TrueCrypt website, if the attacker is collaborating with the TrueCrypt developers.
The package maintainer thinks everything is good, so they hard-code the MD5 hash of their good copy into the PKGBUILD.
Now, when most users install TrueCrypt, they will download the *good* copy, check the MD5 hash, it will match, and everything will be good. However, when the adversary sees that his target (e.g. a terrorist, or activist/journalist living in a non-free country) is downloading TrueCrypt, they will silently replace the file with with the *bad* copy. When the MD5 gets checked, it will match, and the user will be encrypting their files with null keys.
In total, it took the attacker on the order of 2^66 operations to find the colliding files. You'd need a good budget (like the NSA's, or a large company's) to do that, but if you take advantage of the MD5-specific attacks too, you can do it for much cheaper.
Of course, this doesn't just apply to TrueCrypt, it could be applied to any other software that's checked only with an MD5 hash. To backdoor a web server, for example, you could change a constant used in array index bounds checking to introduce a buffer overflow vulnerability.
Salutations!
Excellent explanation but I still don't see the guarantee that md5 hash can be cracked. I don't see any application of the pidgeon hole principle which would show two files (one malicious with the null key and one normal) would have to share the same md5 hash. Can you elaborate any further?
Regarding, switching out the md5 hashes, I agree with the switch to sha256 for PKGBUILDS.
Regards, Mark
Salutations!
I'd just like the point out that with the birthday attack, we'd have 100% confidence when we have 2^128+1 different hashes (using the pidgeon hole theorem). Anyone care to calculate a # pair vs. probability of collision for md5 hash?
Regards, Mark
Salutations! In addition, this birthday attack assumes coercion from source code developers. It seems that the method you demonstrated of reducing the very high number needed to be 100% sure (from 2^128+1 to 129) one has a collision is dependent on source code developers being complacent (they are letting the code be refactored) to coercion. Regards, Mark -- Mark Lee <mark@markelee.com>
participants (11)
-
Anatol Pomozov
-
Eduardo Machado
-
Jelle van der Waa
-
Karol Blazewicz
-
Kyle Terrien
-
Leonid Isaev
-
Mark Lee
-
Rashif Ray Rahman
-
sehraf
-
Taylor Hornby
-
Никола Вукосављевић