[pacman-dev] package loading scheme
On 9/26/07, Xavier <shiningxc@gmail.com> wrote:
Last commit on Dan's pkgname_check branch: http://code.toofishes.net/gitweb.cgi?p=pacman.git;a=commitdiff;h=b3d5764134b...
Partial cache cleaning was eliminated in a previous commit because it relied on package naming conventions. Re-add it the correct way: we actually open up each package in the cache and get a name and version out of it. If the name and version match those of an installed package, keep it. If the package is not installed, or its version does not match the locally installed version, get rid of it.
This can easily be modified if some other heuristic for keeping and removing packages is desired, or if we should clean out the cache dir of any files that are not packages, etc.
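The keep-or-remove rule described above could be sketched roughly as follows. This is only an illustration: `struct pkg_id` and `cache_should_keep` are hypothetical names, not pacman's actual types or functions, and the sketch assumes each package name appears at most once in the local DB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal package record; pacman's real structure is richer. */
struct pkg_id {
    const char *name;
    const char *version;
};

/* Return true when the cached package matches an installed package in both
 * name and version, i.e. when the rule above says to keep it. */
bool cache_should_keep(const struct pkg_id *cached,
                       const struct pkg_id *installed, size_t n_installed)
{
    assert(cached != NULL && cached->name != NULL && cached->version != NULL);
    for (size_t i = 0; i < n_installed; i++) {
        if (strcmp(cached->name, installed[i].name) == 0) {
            /* Same package name: keep only if the version also matches. */
            return strcmp(cached->version, installed[i].version) == 0;
        }
    }
    /* Not installed at all: remove it from the cache. */
    return false;
}
```

Swapping in a different heuristic (for example, keeping the newest cached version as well) would only change the body of this one function.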
The biggest current problem with this new approach is speed. Here is one run on my local machine, going from 1643 to 729 packages in the cache (753 in the local DB):
    real 4m25.829s
    user 3m22.527s
    sys 0m6.713s
This is likely best addressed by the package loading scheme, which may be loading the entirety of each package archive, which is a waste when we only need the .PKGINFO file read.
That's a difference from Frugalware that I noticed earlier: http://www.archlinux.org/pipermail/pacman-dev/2007-June/008524.html
Back then, I thought fw had added this feature. Actually, it's the other way around: the feature was removed in Arch, in this commit: http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=b2da4b42344444dc2...
The removal was caused by this bug report: http://bugs.archlinux.org/task/5120
I have to ask the same thing Aaron asked in one of his comments there: "Ok, after re-looking at this, I'm fairly confused. If a gzip file is corrupt, shouldn't the gunzipping algorithm/whatever KNOW that? How does it even begin to extract itself, if it's corrupt? I don't know compression algorithms that well, but I thought there was some sort of check for this?"
I started downloading a package with wget, cancelled it, then tried to install it with pacman:
    loading package data... error: error while reading package: Premature end of gzip compressed data: Input/output error
I then restored the old code that prevented reading the whole archive:
    if(config && filelist && scriptcheck) {
        /* we have everything we need */
        break;
    }
After that, I got:
    loading package data... error: error while reading package: (Empty error message)
So it still fails, which is good. However, the empty error message suggests that it does not fail at the same point. I haven't figured out yet where exactly it fails, or whether it will fail for every corrupted archive. At least it does not accept just any corrupted archive, which might be good enough for restoring that feature.
The times shown by Dan are quite crazy :)
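To make the effect of the restored early `break` concrete, here is a minimal sketch that counts how many archive entries get visited with and without it. The archive is simulated as a plain list of entry names, so no real archive library is involved. `entries_visited` is a hypothetical helper, and the assumption that the three flags correspond to the `.PKGINFO`, `.FILELIST` and `.INSTALL` entries is mine; only `.PKGINFO` is named in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count how many entry names are visited before the scan stops.
 * The names stand in for the headers of a package archive. */
size_t entries_visited(const char *const *names, size_t n, bool early_exit)
{
    bool config = false, filelist = false, scriptcheck = false;
    size_t visited = 0;

    assert(names != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        visited++;
        if (strcmp(names[i], ".PKGINFO") == 0) {
            config = true;       /* assumed meaning of "config" */
        } else if (strcmp(names[i], ".FILELIST") == 0) {
            filelist = true;     /* assumed meaning of "filelist" */
        } else if (strcmp(names[i], ".INSTALL") == 0) {
            scriptcheck = true;  /* assumed meaning of "scriptcheck" */
        }
        if (early_exit && config && filelist && scriptcheck) {
            /* we have everything we need */
            break;
        }
    }
    return visited;
}
```

With the metadata entries at the front of the archive, the early exit stops after a handful of entries regardless of how many files follow. Note that in this sketch a package without an install script never satisfies the condition and is still read in full.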
On 9/27/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
Well, see here's the thing. The *reason* we run through once is just for verification that the archive isn't corrupt near the tail end. For instance, if pacman starts pulling out files, and fails on the 101st file, well, everything goes to hell. It's a valid check, in my opinion, BUT I think we should make it optional, for cases like Dan's cache cleaning. That is, it might be worth it to throw some flag at it that will tell us if we need an integrity check or not. Packages in the cache are assumed to be functional, so we should be able to skip it at that point, right?
On 9/27/07, Dan McGee <dpmcgee@gmail.com> wrote:
We could implement a pkg_load_meta or pkg_load_info quite easily, I'm sure.
Well... do we really care if it is valid? If it isn't installed, it's getting deleted anyway. If it is installed and invalid... well, let's hope that isn't the case. Regardless, if we can get a name and version out of it we should be good to go.
-Dan
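Getting a name and version out of the metadata could look roughly like the sketch below, which pulls a value out of `key = value` lines such as `pkgname = ...` and `pkgver = ...` in .PKGINFO-style text. `pkginfo_get` is a hypothetical helper written for illustration, not an existing pacman function, and it works on text already held in memory rather than on an archive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy the value of the first "key = value" line into out.
 * Returns true if the key was found and the value fit in the buffer. */
bool pkginfo_get(const char *text, const char *key, char *out, size_t out_len)
{
    size_t key_len;
    const char *line = text;

    assert(text != NULL && key != NULL && out != NULL && out_len > 0);
    key_len = strlen(key);

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        /* Match "<key> = " at the start of the line, with a non-empty value. */
        if (line_len > key_len + 3 &&
            strncmp(line, key, key_len) == 0 &&
            strncmp(line + key_len, " = ", 3) == 0) {
            size_t val_len = line_len - key_len - 3;
            if (val_len >= out_len) {
                return false; /* value would not fit */
            }
            memcpy(out, line + key_len + 3, val_len);
            out[val_len] = '\0';
            return true;
        }
        /* Advance to the next line, or to the terminating NUL. */
        line = eol ? eol + 1 : line + line_len;
    }
    return false;
}
```

If both `pkgname` and `pkgver` can be read this way, the cache cleaner has what it needs to decide, whether or not the rest of the archive is intact.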
Aaron Griffin <aaronmgriffin@gmail.com> wrote:
Right, all I'm saying is that the run-through-once thing isn't bad when we're talking about the -A and -U options... Here's an example...
    wget http://really_big_package &
    pacman -U really_big_package
This will most likely have the metadata very quickly (it's at the beginning of the archive), but it will fail mid-install until the wget is complete.
BUT in the case where we're scanning the cache, why not do some sort of:
    pkg_load(foo, integrity_check=false)
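One way the proposed flag might behave is sketched below, with the archive simulated so that reading fails at a chosen entry (as it would for a download that wget has not finished). Everything here is hypothetical: `struct fake_archive` and `pkg_load_sketch` are illustration-only names, and the real loader would of course read from an actual archive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an archive: entry names plus the index at which
 * reading fails (n_entries or more means the archive is intact). */
struct fake_archive {
    const char *const *names;
    size_t n_entries;
    size_t corrupt_at;
};

/* Returns 0 on success, -1 on failure. With integrity_check set, every entry
 * is read; without it, loading stops as soon as .PKGINFO has been seen. */
int pkg_load_sketch(const struct fake_archive *ar, bool integrity_check)
{
    bool have_info = false;

    assert(ar != NULL);
    for (size_t i = 0; i < ar->n_entries; i++) {
        if (i >= ar->corrupt_at) {
            return -1; /* read error, e.g. a truncated download */
        }
        if (strcmp(ar->names[i], ".PKGINFO") == 0) {
            have_info = true;
            if (!integrity_check) {
                break; /* metadata only: skip the rest of the archive */
            }
        }
    }
    return have_info ? 0 : -1;
}
```

For -A and -U the caller would pass `true` and keep the full run-through, so a half-downloaded package is rejected before any file is extracted. The cache scan would pass `false` and only pay for the entries up to the metadata.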
participants (3): Aaron Griffin, Dan McGee, Xavier