[pacman-dev] MD5/SHA* why?
I asked this question a while ago about makepkg; now I'm asking about pacman... why do we need support for multiple checksum types? What's wrong with md5? Andrew
On Tuesday 03 July 2007 21:40:17, Andrew Fyfe wrote:
I asked this question a while ago about makepkg; now I'm asking about pacman... why do we need support for multiple checksum types? What's wrong with md5?
Andrew
_______________________________________________ pacman-dev mailing list pacman-dev@archlinux.org http://archlinux.org/mailman/listinfo/pacman-dev
It's broken ;) Maybe that's not so relevant for package corruption checks, but it has certainly already been proven crackable.
And also, I guess it wouldn't hurt to use both. Most modern CPUs are up to it. And, when all else fails, there's the ground statement - everyone else is doing it! ;-)
And by everyone else I mean ports/pkgsrc as they are the only other package
management systems I use.
Cheers,
//m.
--
Mateusz Jędrasik
On 7/3/07, Andrew Fyfe
I asked this question a while ago about makepkg; now I'm asking about pacman... why do we need support for multiple checksum types? What's wrong with md5?
Frugalware switched to using sha1sums, so that is a big reason. If you can find a good reason to pull support for it, let us know, but as of right now I don't see a reason to remove it, even if we don't use it. Any other distribution using pacman may want to use it, so the option is there. -Dan
On 7/3/07, Mateusz Jedrasik
On Tuesday 03 July 2007 21:40:17, Andrew Fyfe wrote:
I asked this question a while ago about makepkg; now I'm asking about pacman... why do we need support for multiple checksum types? What's wrong with md5?
The problem with MD5 (and recently SHA1) is that you can find collisions relatively quickly on a powerful machine (under a day in some cases). Thus if you found the correct collision that actually was a valid tarball, that had valid files in it, and one of those files had something malicious in it, you would be in trouble. I mean, the chances are close to zero, but md5 has gotten a lot of press on how "crackable" it is. SHA1 is crackable as well, though not as easily.

Now put BOTH sums in your PKGBUILD. Now some third party would have to find all the collisions for MD5 and SHA1, make sure they create the same sums as those in the package, and then they would have to see if that was even any data that could be used for something malicious.

I suggest using both MD5 and SHA1. I seriously doubt there is a single situation where this would not be enough for validating the package. Though I think we should move to signing our packages, so we actually have security along with validation...

// codemac
--
. : [ + carpe diem totus tuus + ] : .
Mateusz Jedrasik wrote:
And, when all else fails, there's the ground statement - everyone else is doing it! ;-)
The worst excuse ever :p
Dan McGee wrote:
On 7/3/07, Andrew Fyfe
wrote: I asked this question a while ago about makepkg; now I'm asking about pacman... why do we need support for multiple checksum types? What's wrong with md5?
Frugalware switched to using sha1sums, so that is a big reason. If you can find a good reason to pull support for it, let us know, but as of right now I don't see a reason to remove it, even if we don't use it. Any other distribution using pacman may want to use it, so the option is there.
-Dan
My problem is more with the fact that we have 5 functions and 1 field in pmpkg_t for each checksum and we have to do if checksum = x then x_foo(); else if checksum = y then y_foo(); else if checksum = z then z_foo(); fi every time we want to do something with a checksum. For now I'll just treat it the same as makepkg... cause we can :D Andrew
Oh no, when reading the archives I forgot to bookmark several important mails; it took me a while to find this one again: http://www.archlinux.org/pipermail/pacman-dev/2006-October/006029.html

So that's Judd's opinion on the matter: "I never pretended that md5 was for anything security-related. If we were trying for security, we would've gone straight to signed packages. The md5sum was added to make sure downloaded files weren't corrupt. I don't see the point of SHA1 if we're still using it/them for download validation. If we want security, then we might as well do it right."

As for my opinion on this, it's exactly the same as Andrew's: it complicates the code for 0 benefit...
On 7/3/07, Andrew Fyfe
wrote: I asked this question a while ago about makepkg; now I'm asking about pacman... why do we need support for multiple checksum types? What's wrong with md5?
My problem is more with the fact that we have 5 functions and 1 field in pmpkg_t for each checksum and we have to do
Hey there. Maybe you are using an "outdated" piece of code in the pacman3 Arch version. When the Frugalware team contributed back into pacman there was some discussion about the md5->sha1 change. We used multiple checksums (md5 and sha1) for backward compatibility, so when we switched from md5 to sha1 users didn't notice anything, because pacman supported both md5 and sha1. Now in pacman-g2 this md5 code isn't necessary anymore, and I think you (pacman3) don't need it either if you don't want to switch to sha1. So, after all: multiple checksum routines (md5 and sha1 at the same time) were in the code for backward compatibility. (We did it this way too when we switched from gzipped packages to bzip2ed packages.) Regards -krix-
2007/7/3, Jeff Mickey
On 7/3/07, Mateusz Jedrasik
wrote: On Tuesday 03 July 2007 21:40:17, Andrew Fyfe wrote:
I asked this question a while ago about makepkg; now I'm asking about pacman... why do we need support for multiple checksum types? What's wrong with md5?
The problem with MD5 (and recently SHA1) is that you can find collisions relatively quickly on a powerful machine (under a day in some cases). Thus if you found the correct collision that actually was a valid tarball, that had valid files in it, and one of those files had something malicious in it, you would be in trouble. I mean, the chances are close to zero, but md5 has gotten a lot of press on how "crackable" it is. SHA1 is crackable as well, though not as easily.
Note what Jason said there: http://www.archlinux.org/pipermail/pacman-dev/2006-October/005990.html "Most of the ones I've seen talked about creating md5 collisions between two files, not creating a file with the same md5 as another file (there's a distinction)." Which case are the numbers you gave for? But even without talking about that, like you already said, it looks indeed very unlikely this could be exploitable...
Now put BOTH sums in your PKGBUILD. Now some third party would have to find all the collisions for MD5 and SHA1, make sure they create the same sums as those in the package, and then they would have to see if that was even any data that could be used for something malicious.
I suggest using both MD5 and SHA1. I seriously doubt there is a single situation where this would not be enough for validating the package.
Heh, we already seriously doubt there is a single situation where MD5 wouldn't be enough, so what does this add exactly? If we are going to be completely paranoid, then why not use ONE algorithm that hasn't been cracked yet?
Though I think we should move to signing our packages, so we actually have security along with validation...
Now that's probably a better suggestion, and there is at least already a FR for it :) http://bugs.archlinux.org/task/5331
Xavier wrote:
Oh no, when reading the archives I forgot to bookmark several important mails; it took me a while to find this one again: http://www.archlinux.org/pipermail/pacman-dev/2006-October/006029.html So that's Judd's opinion on the matter: "I never pretended that md5 was for anything security-related. If we were trying for security, we would've gone straight to signed packages. The md5sum was added to make sure downloaded files weren't corrupt.
I don't see the point of SHA1 if we're still using it/them for download validation. If we want security, then we might as well do it right."
As for my opinion on this, it's exactly the same as Andrew's: it complicates the code for 0 benefit...
I fully agree with Judd's comment; using MD5 or SHA1 for security is plain stupid. All we want a checksum for is a basic check that the package we've downloaded isn't corrupt. What are the odds you could download a corrupt package with the same checksum as the valid package?

My preference would be to stick with 1 checksum (preferably MD5, as that's what's mainly used in Arch at the moment) and remove the other to simplify the code.... K.I.S.S.

Andrew
On 7/4/07, Andrew Fyfe
Xavier wrote:
Oh no, when reading the archives I forgot to bookmark several important mails; it took me a while to find this one again: http://www.archlinux.org/pipermail/pacman-dev/2006-October/006029.html So that's Judd's opinion on the matter: "I never pretended that md5 was for anything security-related. If we were trying for security, we would've gone straight to signed packages. The md5sum was added to make sure downloaded files weren't corrupt.
I don't see the point of SHA1 if we're still using it/them for download validation. If we want security, then we might as well do it right."
As for my opinion on this, it's exactly the same as Andrew's: it complicates the code for 0 benefit...
I fully agree with Judd's comment; using MD5 or SHA1 for security is plain stupid. All we want a checksum for is a basic check that the package we've downloaded isn't corrupt. What are the odds you could download a corrupt package with the same checksum as the valid package?
My preference would be to stick with 1 checksum (preferably MD5 as that's what's mainly used in Arch at the moment), and remove the other to simplify the code.... K.I.S.S.
Patches welcome for this. If anyone wants to start looking into package signing as well, then more power to you. I also dislike the fact that we have 3 different files for the md5 stuff- md5driver.c, md5.c, and md5.h. We should be able to move this code all into a C file and header, md5.c and md5.h, without difficulties. Make this a separate patch though. -Dan
On Tue, Jul 03, 2007 at 03:55:06PM -0400, Dan McGee
Frugalware switched to using sha1sums, so that is a big reason. If you can find a good reason to pull support for it, let us know, but as of right now I don't see a reason to remove it, even if we don't use it. Any other distribution using pacman may want to use it, so the option is there.
I think if you generate those arrays using makepkg -g then it doesn't make sense. It's useful when upstream provides sha1sums on the website; then you can copy&paste them into the buildscript. thanks, VMiklos -- developer of Frugalware Linux - http://frugalware.org
On Wed, Jul 04, 2007 at 11:46:49PM -0400, Dan McGee wrote:
On 7/4/07, Andrew Fyfe
wrote: Xavier wrote:
Oh no, when reading the archives I forgot to bookmark several important mails; it took me a while to find this one again: http://www.archlinux.org/pipermail/pacman-dev/2006-October/006029.html So that's Judd's opinion on the matter: "I never pretended that md5 was for anything security-related. If we were trying for security, we would've gone straight to signed packages. The md5sum was added to make sure downloaded files weren't corrupt.
I don't see the point of SHA1 if we're still using it/them for download validation. If we want security, then we might as well do it right."
As for my opinion on this, it's exactly the same as Andrew's: it complicates the code for 0 benefit...
I fully agree with Judd's comment; using MD5 or SHA1 for security is plain stupid. All we want a checksum for is a basic check that the package we've downloaded isn't corrupt. What are the odds you could download a corrupt package with the same checksum as the valid package?
My preference would be to stick with 1 checksum (preferably MD5 as that's what's mainly used in Arch at the moment), and remove the other to simplify the code.... K.I.S.S.
Patches welcome for this. If anyone wants to start looking into package signing as well, then more power to you.
I also dislike the fact that we have 3 different files for the md5 stuff- md5driver.c, md5.c, and md5.h. We should be able to move this code all into a C file and header, md5.c and md5.h, without difficulties. Make this a separate patch though.
I was the main person pushing for this and it was mostly for the malicious downloads. It's not the package downloading that I was worried about as much as the source tarballs. We use md5sums to make sure that the tarball we downloaded building the package is the same as the tarball that the developer used when they built the package. If someone gets access to the upstream's server, we're using the md5sum to trust files over time.

I had long discussions with Aaron about this. He still wasn't convinced but added it because it didn't hurt. Eventually we decided that the best bet was to store source packages on the arch servers, because then we could trust those. That just hasn't happened yet.

Obviously there are way more people who think this is dumb than I do. I wrote the original patch for this (makepkg only) during LinuxTag 2005 after JGC mentioned that BSD uses two hashes. I thought their reasoning was sound so I wanted to do it too.

Jason
On Thu, Jul 05, 2007 at 02:06:09PM -0700, Jason Chu wrote:
I was the main person pushing for this and it was mostly for the malicious downloads.
It's not the package downloading that I was worried about as much as the source tarballs. We use md5sums to make sure that the tarball we downloaded building the package is the same as the tarball that the developer used when they built the package. If someone gets access to the upstream's server, we're using the md5sum to trust files over time.
Oh, I see. But what I am really wondering is: why combine two existing algorithms that have flaws instead of using one for which no flaw has been found yet? Isn't it both less secure and more complicated?
On Fri, Jul 06, 2007 at 12:20:00AM +0200, Xavier wrote:
On Thu, Jul 05, 2007 at 02:06:09PM -0700, Jason Chu wrote:
I was the main person pushing for this and it was mostly for the malicious downloads.
It's not the package downloading that I was worried about as much as the source tarballs. We use md5sums to make sure that the tarball we downloaded building the package is the same as the tarball that the developer used when they built the package. If someone gets access to the upstream's server, we're using the md5sum to trust files over time.
Oh, I see. But what I am really wondering is: why combine two existing algorithms that have flaws instead of using one for which no flaw has been found yet? Isn't it both less secure and more complicated?
We are at a crossroads in hashing algorithm theory. All the current hashing algorithms have flaws. It's also likely that any new hash algorithms will have flaws as well.

If we just trusted md5s or sha1s, then it would be less secure and more complicated, but because we look at both md5s and sha1s *together*, things improve. An analogy: think of two sheets with holes in them. You can look through each sheet and see the light on the other side, but if you lay the two sheets on top of each other a lot less light is visible. Because we're considering both hashing algorithms, they cover some of each other's failings.

I'm all for making less complication though... maybe a more abstract hash API?

Jason
On 7/5/07, Xavier
On Thu, Jul 05, 2007 at 02:06:09PM -0700, Jason Chu wrote:
I was the main person pushing for this and it was mostly for the malicious downloads.
It's not the package downloading that I was worried about as much as the source tarballs. We use md5sums to make sure that the tarball we downloaded building the package is the same as the tarball that the developer used when they built the package. If someone gets access to the upstream's server, we're using the md5sum to trust files over time.
Oh, I see. But what I am really wondering is: why combine two existing algorithms that have flaws instead of using one for which no flaw has been found yet? Isn't it both less secure and more complicated?
<offtopic> Every possible hashing algorithm has flaws. However, they need to be exploitable for them to be of any use. Just because a flaw hasn't been found doesn't mean there isn't one. And I think the very important point was missed above in all these emails- creating a useful 'flaw' is not easy at all.

Let's first use the example of one hashing function, such as MD5. First, you have the original file's hash. In all flaw-finding exercises, this is not the case- all they look for is for two hashes to match, not trying to match one to a preexisting hash. So we are already off to a hard challenge here. Say you are able to find some other junk data that hashes to the same value. Sorry- that is worthless. You need valid data.

Now add a second hash. You will need a *double* collision of data- one where *both* hashes are the same for the valid data and the malicious data. I dare to say impossible. </offtopic>

One hash being more secure? Doubtful. Maybe about the same. One hash being less complicated? Do you like dealing with 80 character strings?

-Dan
On Thu, Jul 05, 2007 at 06:45:00PM -0400, Dan McGee wrote:
<offtopic> Every possible hashing algorithm has flaws. However, they need to be exploitable for them to be of any use. Just because a flaw hasn't been found doesn't mean there isn't one. And I think the very important point was missed above in all these emails- creating a useful 'flaw' is not easy at all.
What do you mean? A hashing algorithm is supposed to bring a certain level of security. When flaws are found, it means it doesn't even bring that initial level anymore, but a lower one; for example, finding collisions will require less computing power than what was initially expected. Afaik, flaws were found for MD5, SHA-0 and SHA-1, but not for all the SHA-* successors. Obviously, it doesn't mean none will be found in the future, but they are still probably better. Btw, Jeff already mentioned that exploiting it would be very hard: http://www.archlinux.org/pipermail/pacman-dev/2007-July/008675.html
Let's first use the example of one hashing function, such as MD5. First, you have the original file's hash. In all flaw finding exercises, this is not the case- all they look for is for two hashes to match, not trying to match one to a preexisting hash.
Jason already mentioned that in an old mail (that the usual case is looking for two hashes to match), and I gave a link to it in my answer to Jeff's mail there: http://www.archlinux.org/pipermail/pacman-dev/2007-July/008691.html And if the flaws that have been found for MD5 or SHA1 only concern collisions, then they shouldn't even affect our use case.
So we are already off to a hard challenge here. Say you are able to find some other junk data that hashes to the same value. Sorry- that is worthless. You need valid data.
See above for the link to Jeff's mail. I would agree too, but in the end I'm missing some technical knowledge, about, for example, what makes a valid tarball, and whether there isn't a way to somehow hide additional junk data or something. But anyway, I doubt this is the weakest point of Arch, or the first thing a potential attacker would try to exploit.
Now add a second hash. You will need a *double* collision of data- one where *both* hashes are the same for the valid data and the malicious data. I dare to say impossible. </offtopic>
So did we just find a new magic, perfectly secure hash algorithm, better than all the existing ones, that no one has ever thought about before? Of course it'll be more secure than using just one hash; the question is how it compares to another hash using the same number of bits (128 + 160), or even fewer, like SHA-256. I just did some searching, which seems to suggest combining SHA1 and MD5 is indeed a poor idea for a hash algorithm. But that's still without the constraint of having a valid tarball etc., and for finding collisions, not a second preimage. But again, since we have all these additional constraints, md5 or sha1 alone may very well already be highly secure, and as I said above, with our use cases, it might be totally unaffected by the flaws that have been found (which apparently only concern collisions).
One hash being more secure? Doubtful. Maybe about the same.
I personally have no doubt :) They don't even have the same size. Suppose you use a hash of only 1 bit: the result for the original package is either 0 or 1. Then you pick any compromised package, and you have a 1/2 chance it'll match :) So the bigger, the more secure (as long as it doesn't have flaws).
One hash being less complicated? Do you like dealing with 80 character strings?
About as much as I like dealing with two hashes of 40 characters each - not to mention that one hash would make the code a bit cleaner in some places, because there is only one check to make instead of two. Anyway, I'm not suggesting moving to a stronger hash; it's actually the opposite: I'm suggesting keeping either md5 or sha1, unless someone can prove that using only one isn't secure enough :)
On Thu, Jul 05, 2007 at 03:42:42PM -0700, Jason Chu wrote:
We are at a crossroads in hashing algorithm theory. All the current hashing algorithms have flaws. It's also likely that any new hash algorithms will have flaws as well.
Maybe the information I had is already outdated, since all this stuff moves pretty quickly :) What are the flaws of all the SHA-224/256/384/512 hashes? See this for example: http://en.wikipedia.org/wiki/SHA-1#SHA_sizes Or are these the new algorithms? They could indeed have flaws as well, but still be, say, more secure than the current ones, even after flaws are found.
If we just trusted md5s or sha1s, then it would be less secure and more complicated, but because we look at both md5s and sha1s *together*, things improve.
I'm not convinced that: 1) md5 or sha1 alone aren't secure enough (for our use case); 2) combining md5 and sha1 is better than e.g. SHA-256.
An analogy: think of two sheets with holes in them. You can look through each sheet and see the light on the other side, but if you lay the two sheets on top of each other a lot less light is visible. Because we're considering both hashing algorithms, they cover some of each other's failings.
In that case, you move both holes so that they match (with padding) :) But yes, that's still the general case, not the pacman one.
I'm all for making less complication though... maybe a more abstract hash API?
If we need to keep several hashing algorithms, I think this would be great.
There's no need for a second hashing algorithm. MD5 serves the purpose
of verifying that a package file hasn't been corrupted during download.
Signed-off-by: Andrew Fyfe
* Move alpm md5 functions to lib/libalpm/util.c
* Remove unneeded includes for md5.h
* Replace md5 implementation with one from http://www.xyssl.org
Signed-off-by: Andrew Fyfe
On 7/25/07, Andrew Fyfe
There's no need for a second hashing algorithm. MD5 serves the purpose of verifying that a package file hasn't been corrupted during download.
Signed-off-by: Andrew Fyfe
So I've been thinking this one over for a while. On one hand, I agree with the thought. For sure, I think we don't need more than one hashing algorithm. The only real question is whether we should switch to sha1 or not. If no, then this sequence of two patches should be applied.

What I really want to hear are thoughts on this issue. We are using md5sums for two main reasons- verification of package downloads, and determining whether a backup file has changed. With this in mind, I think md5 is sufficient to serve our needs.

Please chime in on this.

-Dan
On 8/15/07, Dan McGee
What I really want to hear are thoughts on this issue. We are using md5sums for two main reasons- verification of package downloads, and determining whether a backup file has changed. With this in mind, I think md5 is sufficient to serve our needs.
Please chime in on this.
There is some history on this somewhere in these list archives. I'll summarize my views because I don't want to figure out what thread that was.

a) The "md5 is insecure" argument doesn't hold water with archive formats. Reproducing an md5sum with a malicious file requires that the original file format supports null padding. All of the examples I've seen used ps files, as you can embed null padding to fluff the md5sum. In our case, if you add some padding, it suddenly becomes a corrupt archive. Corrupt archives are already checked for before extraction, so if the md5sum matches AND it's corrupt, it's either a packager's error, or malicious.

b) We are not using md5 for security. We are using it for integrity. These are two totally different things. Instead of saying "I don't trust you Mr Mirror", we're saying "I trust the DB file is correct; did this download ok?". Now, there's a subtle problem with this point: if we want to implicitly trust the DB files, then we need to ensure where they come from. DB files on mirrors might not be "trustable". /me shrugs

But my opinion is thus: md5 is faster than sha1, and we're just ensuring that we downloaded the file exactly as the server told us to. We are not guaranteeing that it is super-duper secure. If we wanted that, we'd sign packages. I vote md5
Aaron Griffin wrote:
On 8/15/07, Dan McGee
wrote: What I really want to hear are thoughts on this issue. We are using md5sums for two main reasons- verification of package downloads, and determining whether a backup file has changed. With this in mind, I think md5 is sufficient to serve our needs.
Please chime in on this.
There is some history on this somewhere in these list archives. I'll summarize my views because I don't want to figure out what thread that was.
a) The "md5 is insecure" argument doesn't hold water with archive formats. Reproducing an md5sum with a malicious file requires that the original file format supports null padding. All of the examples I've seen used ps files as you can embed null padding to fluff the md5sum. In our case, if you add some padding, it suddenly becomes a corrupt archive. Corrupt archives are already checked for before extraction, so if the md5sum matches AND it's corrupt, it's either a packager's error, or malicious.

b) We are not using md5 for security. We are using it for integrity. These are two totally different things. Instead of saying "I don't trust you Mr Mirror", we're saying "I trust the DB file is correct, did this download ok". See now there's a subtle problem with this point. If we want to implicitly trust the DB files, then we need to ensure where they come from. DB files on mirrors might not be "trustable". /me shrugs
But my opinion is thus: md5 is faster than sha1, and we're just ensuring that we downloaded the file exactly as the server told us to. We are not guaranteeing that it is super-duper secure. If we wanted that, we'd sign packages. I vote md5
+1 here
I've made a few tweaks to the patch... http://neptune-one.homeip.net/git?p=pacman;a=shortlog;h=ready_to_pull Andrew
On 8/16/07, Andrew Fyfe
Aaron Griffin wrote:
On 8/15/07, Dan McGee
wrote: What I really want to hear are thoughts on this issue. We are using md5sums for two main reasons- verification of package downloads, and determining whether a backup file has changed. With this in mind, I think md5 is sufficient to serve our needs.
Please chime in on this.
There is some history on this somewhere in these list archives. I'll summarize my views because I don't want to figure out what thread that was.
a) The "md5 is insecure" argument doesn't hold water with archive formats. Reproducing an md5sum with a malicious file requires that the original file format supports null padding. All of the examples I've seen used ps files as you can embed null padding to fluff the md5sum. In our case, if you add some padding, it suddenly becomes a corrupt archive. Corrupt archives are already checked for before extraction, so if the md5sum matches AND it's corrupt, it's either a packager's error, or malicious.

b) We are not using md5 for security. We are using it for integrity. These are two totally different things. Instead of saying "I don't trust you Mr Mirror", we're saying "I trust the DB file is correct, did this download ok". See now there's a subtle problem with this point. If we want to implicitly trust the DB files, then we need to ensure where they come from. DB files on mirrors might not be "trustable". /me shrugs
But my opinion is thus: md5 is faster than sha1, and we're just ensuring that we downloaded the file exactly as the server told us to. We are not guaranteeing that it is super-duper secure. If we wanted that, we'd sign packages. I vote md5
+1 here
I've made a few tweaks to the patch... http://neptune-one.homeip.net/git?p=pacman;a=shortlog;h=ready_to_pull
The diffstat on that patch is exactly the same as the one that was in this email. Has it really changed? I'm just referring to the "Remove SHA1" patch, not the "cleanup MD5sum" one. -Dan
On 7/25/07, Andrew Fyfe
* Move alpm md5 functions to lib/libalpm/util.c * Remove unneeded includes for md5.h * Replace md5 implementation with one from http://www.xyssl.org
Signed-off-by: Andrew Fyfe
---
 lib/libalpm/Makefile.am |    1 -
 lib/libalpm/add.c       |    7 +-
 lib/libalpm/md5.c       |  661 +++++++++++++++++++++++++++++------------------
 lib/libalpm/md5.h       |  144 ++++++++---
 lib/libalpm/md5driver.c |   93 -------
 lib/libalpm/remove.c    |    1 -
 lib/libalpm/sync.c      |    1 -
 lib/libalpm/util.c      |   31 +++
 8 files changed, 547 insertions(+), 392 deletions(-)
I've now pulled both the SHA1 removal patch and this one into my working branch. However, this one needed a few fixes, which should be reflected in the diff I hacked up below. Two major things to point out in the diff:

1. Even on one-line if/for/looping statements, use {}. This is pacman coding style; it helps keep us consistent, and it cuts out stupid bugs.

2. Watch your mallocs, and use calloc when possible. You didn't allocate space for the null byte, so you were overrunning your buffers when you filled them, and the free() failed when using mtrace(). I switched to calloc usage, and now use sprintf because this is a case where we can do that- it is faster and we aren't worried about running out of room. We then need to take care of the null byte ourselves, however.

I'll give you a break on some of this because you are venturing into C code where few have gone before, and you probably weren't aware of the rules. I think this is the most recent version of them: http://www.archlinux.org/~aaron/pacman-coding.html

Finally, I cleaned up the imported md5.c/md5.h from XySSL a bit. I removed the HMAC and SELF_CHECK stuff we won't use, as well as threw the LGPL header at the top of md5.h and put instructions for upgrading the md5 routines in md5.c.
-Dan

diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c
index 0f47e90..5f43117 100644
--- a/lib/libalpm/util.c
+++ b/lib/libalpm/util.c
@@ -550,26 +551,32 @@ char SYMEXPORT *alpm_get_md5sum(char *filename)
 {
 	unsigned char output[16];
 	char *md5sum;
+	int ret, i;

 	ALPM_LOG_FUNC;

 	ASSERT(filename != NULL, return(NULL));

-	md5sum = (char*)malloc(32);
-	int ret = md5_file(filename, output);
+	/* allocate 32 chars plus 1 for null */
+	md5sum = calloc(33, sizeof(char));
+	ret = md5_file(filename, output);
 	if (ret > 0) {
-		if (ret == 1)
+		if (ret == 1) {
 			_alpm_log(PM_LOG_ERROR, _("md5: %s can't be opened\n"), filename);
-		else if (ret == 2)
+		} else if (ret == 2) {
 			_alpm_log(PM_LOG_ERROR, _("md5: %s can't be read\n"), filename);
+		}
 		return(NULL);
 	}

 	/* Convert the result to something readable */
-	for (unsigned int i = 0; i < 16; i++)
-		snprintf(md5sum + i * 2, 33, "%02x", output[i]);
+	for (i = 0; i < 16; i++) {
+		/* sprintf is acceptable here because we know our output */
+		sprintf(md5sum +(i * 2), "%02x", output[i]);
+	}
+	md5sum[32] = '\0';

 	_alpm_log(PM_LOG_DEBUG, "md5(%s) = %s", filename, md5sum);

 	return(md5sum);
On Thu, Aug 16, 2007 at 01:36:31PM -0400, Dan McGee wrote:
2. Watch your mallocs, and use calloc when possible. You didn't allocate space for the null byte, so you were overrunning your buffers when you filled them and the free() failed when using mtrace(). I switched to calloc usage, and now use sprintf because this is a case where we can do that- it is faster and we aren't worried about running out of room. We then need to take care of the null byte ourselves, however.
Why is it better to use calloc? And when using calloc, is it still necessary to set the null byte a second time?
On 8/16/07, Xavier
On Thu, Aug 16, 2007 at 01:36:31PM -0400, Dan McGee wrote:
2. Watch your mallocs, and use calloc when possible. You didn't allocate space for the null byte, so you were overrunning your buffers when you filled them and the free() failed when using mtrace(). I switched to calloc usage, and now use sprintf because this is a case where we can do that- it is faster and we aren't worried about running out of room. We then need to take care of the null byte ourselves, however.
Why is it better to use calloc? And when using calloc, is it still needed to set the null byte a second time?
Calloc initializes the memory to zero, which can make debugging a lot easier when something goes wrong. In addition, it is very clear how much memory you need, and of what element size. Regarding the null byte thing- calloc zeros the memory, and while I would assume that is the same as the null byte, I don't want to take it for granted, so I figured I'd set it explicitly anyway. -Dan
On Thu, 16 Aug 2007 14:45:40 -0400
"Dan McGee"
Regarding the null byte thing- calloc zeros the memory, and while I would assume that is the same as the null byte,
It is. Setting the end of the string to '\0' (which == 0) is unnecessary - but it won't waste much CPU time and it prevents issues in the future (if it's ever changed from calloc to malloc again), so I don't see a problem with it being there. -- Travis
On Thu, Aug 16, 2007 at 06:09:58PM -0400, Travis Willard wrote:
On Thu, 16 Aug 2007 14:45:40 -0400 "Dan McGee"
wrote: Regarding the null byte thing- calloc zeros the memory, and while I would assume that is the same as the null byte,
It is. Setting the end of the string to '\0' (which == 0) is unnecessary - but it won't waste much CPU time and it prevents issues in the future (if it's ever changed from calloc to malloc again), so I don't see a problem with it being there.
Right, fair enough ;)
Hello,
On Wed, Aug 15, 2007 at 10:55:20PM -0400, Dan McGee
with the thought. For sure, I think we don't need more than one hashing algorithm.
FreeBSD uses 3.. :) - VMiklos
participants (10)
- Aaron Griffin
- Andrew Fyfe
- Christian Hamar [krix]
- Dan McGee
- Jason Chu
- Jeff Mickey
- Mateusz Jedrasik
- Travis Willard
- VMiklos
- Xavier