On Thu, Jul 05, 2007 at 06:45:00PM -0400, Dan McGee wrote:
<offtopic> Every possible hashing algorithm has flaws. However, they need to be exploitable for them to be of any use. Just because a flaw hasn't been found doesn't mean there isn't one. And I think the very important point was missed above in all these emails: creating a useful 'flaw' is not easy at all.
What do you mean? A hashing algorithm is supposed to bring a certain level of security. When flaws are found, it means the algorithm no longer brings even that initial level, but a lower one: for example, finding collisions requires less computing power than was initially expected. As far as I know, flaws have been found in MD5, SHA-0 and SHA-1, but not in the SHA-* successors. Obviously, that doesn't mean none will be found in the future, but they are still probably better. By the way, Jeff already mentioned that exploiting this would be very hard: http://www.archlinux.org/pipermail/pacman-dev/2007-July/008675.html
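As a rough back-of-the-envelope illustration of those "initial" security levels (my own sketch, not something from the thread): for an ideal n-bit hash, the generic attack costs are well known, and a published flaw means an attack cheaper than these numbers.

```python
# Generic attack costs for an *ideal* n-bit hash, i.e. the security level
# the algorithm is supposed to bring before any flaw is found.

def collision_cost(n_bits: int) -> int:
    """Birthday bound: ~2^(n/2) evaluations to find *any* colliding pair."""
    return 2 ** (n_bits // 2)

def preimage_cost(n_bits: int) -> int:
    """Brute force: ~2^n evaluations to match a fixed, preexisting hash."""
    return 2 ** n_bits

# MD5 is 128 bits, SHA-1 is 160 bits; a flaw lowers the relevant exponent.
for name, n in [("MD5", 128), ("SHA-1", 160)]:
    print("%s: collision ~2^%d, preimage ~2^%d" % (name, n // 2, n))
```

The asymmetry between the two costs is exactly why "flaws that only concern collisions" matter less for package verification.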
Let's first use the example of one hashing function, such as MD5. First, you have the original file's hash. In all the flaw-finding exercises, this is not the case: all they look for is two hashes that match, not a match against one preexisting hash.
Jason already mentioned that in an old mail (that the usual case is looking for any two hashes that match), and I gave a link to it in my answer to Jeff's mail here: http://www.archlinux.org/pipermail/pacman-dev/2007-July/008691.html And if the flaws that have been found in MD5 or SHA-1 only concern collisions, then they shouldn't even affect our use case.
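For reference, our use case is the second-preimage setting: a fixed digest is published, and the client recomputes it over the downloaded file. A minimal sketch of that check (function name and chunk size are my own, not pacman's actual code):

```python
import hashlib

def verify_md5(path: str, expected_hex: str) -> bool:
    """Check a downloaded package against its published MD5 sum.

    An attacker here needs a *second preimage*: a malicious file whose
    MD5 equals this fixed, preexisting digest -- a much harder problem
    than finding an arbitrary collision between two files of the
    attacker's own choosing.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex
```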
So we are already off to a hard challenge here. Say you are able to find some other junk data that hashes to the same value. Sorry, that is worthless. You need valid data.
See above for the link to Jeff's mail. I would agree too, but I'm ultimately missing some technical knowledge here: for example, what makes a valid tarball, and whether there is a way to somehow hide additional junk data in one. But anyway, I doubt this is the weakest point of Arch, or the first thing a potential attacker would try to exploit.
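On the "hide additional junk data" question, one small illustration (my own sketch, not a claim about how an attack on pacman would actually work): the tar format itself gives an attacker plenty of freedom, since an extra archive member can carry arbitrary bytes while the tarball stays perfectly valid.

```python
import io
import tarfile

def tar_with_junk(payload: bytes, junk: bytes) -> bytes:
    """Build a valid tarball carrying arbitrary junk alongside the payload.

    Varying `junk` changes the tarball's hash freely without touching the
    real content -- exactly the degree of freedom a collision search needs.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as t:
        info = tarfile.TarInfo("package.txt")
        info.size = len(payload)
        t.addfile(info, io.BytesIO(payload))
        pad = tarfile.TarInfo(".junk")  # hidden member holding the junk bits
        pad.size = len(junk)
        t.addfile(pad, io.BytesIO(junk))
    return buf.getvalue()
```

So "valid tarball" is a much weaker constraint than it might sound; the harder part remains the hash itself.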
Now add a second hash. You will need a *double* collision of data: one where *both* hashes are the same for the valid data and the malicious data. I dare say that is impossible. </offtopic>
So did we just find a new, magically secure hash algorithm, better than all the existing ones, that no one has ever thought about before? Of course it will be more secure than using just one hash; the question is how it compares to another hash using the same number of bits (128 + 160), or even fewer, like SHA-256. I just did some searching, which seems to suggest that combining SHA-1 and MD5 is indeed a poor way to build a hash algorithm. But that is still without the constraint of producing a valid tarball etc., and for finding collisions, not a second preimage. And again, since we have all these additional constraints, md5 or sha1 alone may very well already be highly secure; as I said above, our use case might be totally unaffected by the flaws that have been found (which apparently only concern collisions).
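To be concrete about what "combining" the two hashes means (a sketch, with a function name of my own invention): the proposal amounts to concatenating the digests, 288 bits / 72 hex characters in total, yet the results I found suggest its collision resistance is not much better than that of the stronger component alone.

```python
import hashlib

def md5_plus_sha1(data: bytes) -> str:
    """Concatenated MD5 (128-bit) and SHA-1 (160-bit) digests.

    288 bits of output, but likely far fewer bits of effective collision
    resistance than an honest 288-bit hash would give.
    """
    return hashlib.md5(data).hexdigest() + hashlib.sha1(data).hexdigest()
```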
One hash being more secure? Doubtful. Maybe about the same.
I personally have no doubt :) They don't even have the same size. Suppose you use a hash of only 1 bit: the result for the original package is either 0 or 1. Then for any compromised package you pick, there is a 1/2 chance it will match :) So the bigger the hash, the more secure it is (as long as it has no flaws).
One hash being less complicated? Do you like dealing with 80 character strings?
As much as I like dealing with two hashes of 32 and 40 characters, not to mention that using one would make the code a bit cleaner in some places, because there is only one check to make instead of two. Anyway, I'm not suggesting moving to a stronger hash; it's actually the opposite: I'm suggesting keeping either md5 or sha1, unless someone can prove that using only one isn't secure enough :)
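To make the "one check instead of two" point concrete (a sketch with made-up function names, not pacman's actual code), the double-hash proposal means two stored values and two comparisons at every verification site, versus one of each:

```python
import hashlib

def file_digest(path: str, algo: str) -> str:
    """Hex digest of a file using any algorithm hashlib knows ('md5', 'sha1')."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Two hashes: two stored values, two comparisons everywhere they are checked.
def verify_both(path: str, md5_hex: str, sha1_hex: str) -> bool:
    return (file_digest(path, "md5") == md5_hex
            and file_digest(path, "sha1") == sha1_hex)

# One hash: one stored value, one comparison.
def verify_one(path: str, sha1_hex: str) -> bool:
    return file_digest(path, "sha1") == sha1_hex
```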