[pacman-dev] package loading scheme

Aaron Griffin aaronmgriffin at gmail.com
Thu Sep 27 18:27:20 EDT 2007


On 9/27/07, Dan McGee <dpmcgee at gmail.com> wrote:
> On 9/27/07, Aaron Griffin <aaronmgriffin at gmail.com> wrote:
> > On 9/26/07, Xavier <shiningxc at gmail.com> wrote:
> > > Last commit on Dan's pkgname_check branch:
> > > http://code.toofishes.net/gitweb.cgi?p=pacman.git;a=commitdiff;h=b3d5764134bbbdadca9abe439f12aab174eae107
> > >
> > > > Partial cache cleaning was eliminated in a previous commit because it relied
> > > > on package naming conventions. Re-add it the correct way: we actually open
> > > > up each package in the cache and get a name and version out of it. If the
> > > > name and version match that of an installed package, keep it. If the package
> > > > is not installed or the version does not match the locally-installed version,
> > > > get rid of it.
> > > >
> > > > This can easily be modified if some other heuristic of keeping and removing
> > > > packages is desired, or if we should clean out the cache dir of any files
> > > > that are not packages, etc.
> > > >
> > > > The biggest current problem with this new approach: speed. Here is one run
> > > > on my local machine, going from 1643 to 729 packages in the cache (753 in
> > > > the local DB):
> > > > real    4m25.829s
> > > > user    3m22.527s
> > > > sys     0m6.713s
> > > >
> > > > This is likely best addressed in the package loading scheme, which may be
> > > > loading the entirety of each package archive, which is a waste when we only
> > > > need the .PKGINFO file read.
> > >
> > > That's a difference from Frugalware that I noticed earlier:
> > > http://www.archlinux.org/pipermail/pacman-dev/2007-June/008524.html
> > >
> > > Back then, I thought Frugalware had added this feature. But actually, it's the
> > > other way around: this feature was removed in Arch, in this commit:
> > > http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=b2da4b42344444dc22f1e5b01fb4cd09033adc1d
> > >
> > > And this was caused by this bug report : http://bugs.archlinux.org/task/5120
> > >
> > > I have to ask the same question Aaron asked in one of his comments there:
> > > "Ok, after re-looking at this, I'm fairly confused. If a gzip file is
> > > corrupt, shouldn't the gunzipping algorithm/whatever KNOW that? How does it
> > > even begin to extract itself, if it's corrupt? I don't know compression
> > > algorithms that well, but I thought there was some sort of check for this?"
> > >
> > > I just started downloading a package with wget, cancelled it, then tried to
> > > install it with pacman, and got:
> > > loading package data... error: error while reading package: Premature end of
> > > gzip compressed data: Input/output error
> > >
> > > I restored the old code that prevented reading the whole archive:
> > >     if(config && filelist && scriptcheck) {
> > >       /* we have everything we need */
> > >       break;
> > >     }
> > >
> > > And after that, I got:
> > > loading package data... error: error while reading package: (Empty error
> > > message)
> > >
> > > So apparently, it still fails, which is good. However, it apparently doesn't
> > > fail at the same point, judging by this empty error message.
> > > I haven't figured out yet exactly where it fails, or whether it will fail
> > > with all corrupted archives.
> > > At least it still rejects this corrupted archive, which might be good
> > > enough for restoring that feature.
> > > The times shown by Dan are quite crazy :)
> >
> > Well, here's the thing. The *reason* we run through the whole archive
> > once is just to verify that it isn't corrupt near the tail end.
> >
> > For instance, if pacman starts pulling out files and fails on the
> > 101st file, well, everything goes to hell.
> >
> > It's a valid check, in my opinion, BUT I think we should make it
> > optional, for cases like Dan's cache cleaning.
>
> We could implement a pkg_load_meta or pkg_load_info quite easily, I'm sure.
>
> > That is, it might be worth it to throw some flag at it that will tell
> > us if we need an integrity check or not. Packages in the cache are
> > assumed to be functional, so we should be able to skip it at that
> > point, right?
>
> Well... do we really care if it is valid? If it isn't installed, it's
> getting deleted anyway. If it is installed and invalid...well, let's
> hope that isn't the case. Regardless, if we can get a name and version
> out of it we should be good to go.
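
Dan's point is that the cache cleaner only needs a name and a version. A rough sketch in C of pulling those two values out of .PKGINFO and applying the keep/remove rule from the commit message, assuming .PKGINFO is made of `key = value` lines with `pkgname` and `pkgver` fields (treat that format as an assumption). `struct pkg_meta`, `struct local_pkg`, `pkginfo_parse` and `cache_should_keep` are hypothetical names, not libalpm API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for the two fields the cache cleaner needs. */
struct pkg_meta {
	char name[256];
	char version[256];
};

/* Hypothetical view of one installed package from the local DB. */
struct local_pkg {
	const char *name;
	const char *version;
};

/* Copy [start, end) into dst as a NUL-terminated string, truncating
 * to dstlen - 1 bytes if needed. */
static void copy_value(char *dst, size_t dstlen, const char *start, const char *end)
{
	size_t n = (size_t)(end - start);
	if(n >= dstlen) {
		n = dstlen - 1;
	}
	memcpy(dst, start, n);
	dst[n] = '\0';
}

/* Parse .PKGINFO text, assumed to be "key = value" lines (anything else,
 * such as '#' comments, is ignored). Returns 0 if both pkgname and
 * pkgver were found, -1 otherwise. */
static int pkginfo_parse(const char *text, struct pkg_meta *meta)
{
	int have_name = 0, have_ver = 0;
	const char *line = text;

	assert(text != NULL && meta != NULL);
	while(*line) {
		const char *eol = strchr(line, '\n');
		if(!eol) {
			eol = line + strlen(line);
		}
		if(strncmp(line, "pkgname = ", 10) == 0) {
			copy_value(meta->name, sizeof(meta->name), line + 10, eol);
			have_name = 1;
		} else if(strncmp(line, "pkgver = ", 9) == 0) {
			copy_value(meta->version, sizeof(meta->version), line + 9, eol);
			have_ver = 1;
		}
		line = *eol ? eol + 1 : eol;
	}
	return (have_name && have_ver) ? 0 : -1;
}

/* Rule from the commit message: keep a cached package only if a package
 * with the same name AND the same version is installed. */
static int cache_should_keep(const struct pkg_meta *cached,
		const struct local_pkg *installed, size_t ninstalled)
{
	size_t i;
	for(i = 0; i < ninstalled; i++) {
		if(strcmp(cached->name, installed[i].name) == 0) {
			/* installed: keep only the matching version */
			return strcmp(cached->version, installed[i].version) == 0;
		}
	}
	return 0; /* not installed: remove */
}
```

Nothing here checks that the rest of the archive is intact, which matches Dan's argument: an uninstalled package gets deleted either way.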

Right, all I'm saying is that the run-through-once check isn't a bad
idea for the -A and -U options. Here's an example:

    wget http://really_big_package &
    pacman -U really_big_package

This will most likely read the metadata very quickly (it's at the
beginning of the archive), but it will fail mid-install, since the wget
isn't complete yet.

BUT in the case where we're scanning the cache, why not do some sort of:
    pkg_load(foo, integrity_check=false)
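
A minimal sketch of what such a flag could do, assuming an uncompressed tar archive already in memory (real packages are gzip-compressed and read through libarchive, which this does not model). `pkg_load_sketch`, its `integrity_check` parameter and the `tar_put_entry` helper are hypothetical names. With the flag off, it stops as soon as `.PKGINFO` has been seen, like the old `if(config && filelist && scriptcheck) break;` shortcut; with it on, it walks every header to the end, which is what catches a download that was cut off.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512

enum load_result { LOAD_OK = 0, LOAD_NO_PKGINFO = -1, LOAD_TRUNCATED = -2 };

/* Octal size field of a ustar header: offset 124, 12 bytes. */
static size_t tar_entry_size(const unsigned char *hdr)
{
	char field[13];
	memcpy(field, hdr + 124, 12);
	field[12] = '\0';
	return (size_t)strtoul(field, NULL, 8);
}

/* Walk the archive's headers. integrity_check == 0: return as soon as
 * .PKGINFO is found. Otherwise: keep going to the end marker, so a
 * truncated tail is reported as LOAD_TRUNCATED. */
static enum load_result pkg_load_sketch(const unsigned char *buf, size_t len,
		int integrity_check)
{
	size_t off = 0;
	int have_pkginfo = 0;

	assert(buf != NULL || len == 0);
	for(;;) {
		const unsigned char *hdr;
		size_t size;

		if(off + TAR_BLOCK > len) {
			return LOAD_TRUNCATED; /* data ends before the end marker */
		}
		hdr = buf + off;
		if(hdr[0] == '\0') {
			break; /* empty name: treated as end of archive (simplified) */
		}
		size = tar_entry_size(hdr);
		if(off + TAR_BLOCK + size > len) {
			return LOAD_TRUNCATED; /* this entry's data was cut short */
		}
		if(strncmp((const char *)hdr, ".PKGINFO", 100) == 0) {
			have_pkginfo = 1; /* a real loader would parse the data here */
		}
		if(have_pkginfo && !integrity_check) {
			return LOAD_OK; /* metadata is all we need: skip the rest */
		}
		/* skip the data, padded to a whole number of blocks */
		off += TAR_BLOCK + (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
	}
	return have_pkginfo ? LOAD_OK : LOAD_NO_PKGINFO;
}

/* Test helper: write just the name and size fields this reader looks at
 * (no checksum, mode or ustar magic, so not a valid tar for other tools).
 * buf must be zeroed and large enough; name must be under 100 bytes.
 * Returns the offset of the next header. */
static size_t tar_put_entry(unsigned char *buf, size_t off, const char *name,
		const char *data, size_t n)
{
	memcpy(buf + off, name, strlen(name));
	snprintf((char *)buf + off + 124, 12, "%011lo", (unsigned long)n);
	memcpy(buf + off + TAR_BLOCK, data, n);
	return off + TAR_BLOCK + (n + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}
```

Cutting the archive off inside the second file's data makes the full check return LOAD_TRUNCATED while the fast path still returns LOAD_OK. That is the trade-off above: harmless when cleaning the cache, not acceptable for -U on a half-downloaded file.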
