[pacman-dev] Parallelising integrity checks

Tavian Barnes tavianator at tavianator.com
Sat Feb 26 18:56:00 EST 2011


On 25 February 2011 11:13, Dan McGee <dpmcgee at gmail.com> wrote:
> On Sat, Feb 19, 2011 at 6:11 PM, Tavian Barnes
> <tavianator at tavianator.com> wrote:
>> On a related note, I just tried running the test suite after entirely
>> patching out integrity checks, and there weren't any regressions.
>> Maybe the test suite should test the handling of corrupt packages?  I
>> can add a test case myself if you want, once I've figured out how the
>> test suite works.
>
> Tests for this would definitely be nice. You will probably have to add
> the ability to pactest to create a broken package and/or database
> entry.

I'll have a look at that.

> Steps I know of and notes about them:
> * "Checking integrity" is really two things- md5sum iterations on the
> file, and then an alpm_pkg_load() call to build the package object and
> create the filelist. Not sure how you incorporated this but at least
> something to think about.

The current patchset runs both those steps in parallel; everything
else in that loop is protected by a mutex.  alpm_pkg_load() takes
significantly longer than the md5sum check, by the way.
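
As a rough illustration of that structure (not the actual patchset), here is a
minimal sketch assuming POSIX threads.  verify_and_load() is a hypothetical
stand-in for the md5sum check followed by alpm_pkg_load(), and struct pkg_job,
struct check_state and check_packages() are likewise made-up names.  Only the
shared bookkeeping is done under the mutex; the expensive per-package work
runs unlocked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical description of one package to check. */
struct pkg_job {
    const char *path; /* package file to verify */
    int corrupt;      /* demo hook: pretend the check fails */
};

/* State shared by all workers; next and failures are guarded by lock. */
struct check_state {
    struct pkg_job *jobs;
    size_t njobs;
    size_t next;      /* index of the next unclaimed job */
    size_t failures;  /* packages that failed the check */
    pthread_mutex_t lock;
};

/* Stand-in for "md5sum the file, then load the package" (assumption). */
static int verify_and_load(const struct pkg_job *job)
{
    return job->corrupt ? -1 : 0;
}

static void *worker(void *arg)
{
    struct check_state *st = arg;

    for (;;) {
        const struct pkg_job *job;

        /* Claim the next job under the mutex. */
        pthread_mutex_lock(&st->lock);
        if (st->next >= st->njobs) {
            pthread_mutex_unlock(&st->lock);
            break;
        }
        job = &st->jobs[st->next++];
        pthread_mutex_unlock(&st->lock);

        /* The expensive part runs in parallel, without the lock. */
        if (verify_and_load(job) != 0) {
            pthread_mutex_lock(&st->lock);
            st->failures++;
            pthread_mutex_unlock(&st->lock);
        }
    }
    return NULL;
}

/* Check all jobs with up to nthreads workers; returns the failure count. */
size_t check_packages(struct pkg_job *jobs, size_t njobs, int nthreads)
{
    struct check_state st = { jobs, njobs, 0, 0 };
    pthread_t tids[MAX_THREADS];
    int started = 0, i;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&st.lock, NULL);

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, &st) == 0)
            started++;
    }
    if (started == 0)
        worker(&st); /* no thread could be created: run sequentially */

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&st.lock);
    return st.failures;
}
```

Calling check_packages(jobs, njobs, 4) returns the number of packages that
failed; build with -pthread.  Claiming jobs from a shared index keeps the
threads busy even when some packages take much longer to load than others.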

> * We do yet another iteration of all of the package contents if
> diskspace checking is enabled and read through the archive. This could
> be eliminated if we grabbed the necessary data in pkg_load, which I
> believe is simply some parts of the stat buffer and the type of the
> entry. This would also be hugely helpful in conflict checking, where
> we don't have this info available, and you will see some comments
> alluding to the "12 checks we do in add.c" or something.

I'll look into this.

> * Downloads. I see a call to do this in parallel a lot and I will
> continue to think this is stupid, but maybe that is just me. If you
> can't find a mirror that saturates your connection, look around- we
> have a lot.

I agree that parallel downloads seem like a bad idea.  But while we're
downloading we could probably be doing a bunch of work in the
background, including integrity checks.  The next version of the
patchset I post will do this.
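
One way to picture the overlap, sketched under assumptions rather than taken
from any posted patch: the main thread downloads sequentially and hands each
finished file to a single background thread through a small queue guarded by
a mutex and condition variable.  verify_file(), struct pipeline and
download_and_verify() are hypothetical names, and the download itself is only
marked by a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_PKGS 64

/* Queue of downloaded files awaiting verification. */
struct pipeline {
    const char *ready[MAX_PKGS];
    size_t head, tail; /* consumer and producer positions */
    int done;          /* set once no more downloads will arrive */
    size_t verified;   /* written only by the checker thread */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

/* Stand-in for the real integrity check (assumption): empty path fails. */
static int verify_file(const char *path)
{
    return path[0] != '\0' ? 0 : -1;
}

static void *checker(void *arg)
{
    struct pipeline *p = arg;

    for (;;) {
        const char *path;

        pthread_mutex_lock(&p->lock);
        while (p->head == p->tail && !p->done)
            pthread_cond_wait(&p->cond, &p->lock);
        if (p->head == p->tail) { /* queue drained and downloads done */
            pthread_mutex_unlock(&p->lock);
            break;
        }
        path = p->ready[p->head++];
        pthread_mutex_unlock(&p->lock);

        /* Verification overlaps with the next download. */
        if (verify_file(path) == 0)
            p->verified++;
    }
    return NULL;
}

/* Returns how many of the n files passed verification. */
size_t download_and_verify(const char **files, size_t n)
{
    struct pipeline p;
    pthread_t tid;
    int threaded;
    size_t i;

    assert(n <= MAX_PKGS);
    memset(&p, 0, sizeof(p));
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cond, NULL);
    threaded = (pthread_create(&tid, NULL, checker, &p) == 0);

    for (i = 0; i < n; i++) {
        /* A real client would download files[i] here, one at a time. */
        pthread_mutex_lock(&p.lock);
        p.ready[p.tail++] = files[i];
        pthread_cond_signal(&p.cond);
        pthread_mutex_unlock(&p.lock);
    }

    pthread_mutex_lock(&p.lock);
    p.done = 1;
    pthread_cond_signal(&p.cond);
    pthread_mutex_unlock(&p.lock);

    if (threaded)
        pthread_join(tid, NULL);
    else
        checker(&p); /* fallback: verify everything sequentially */

    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.lock);
    return p.verified;
}
```

Downloads stay strictly sequential in this sketch; only the verification is
moved off the main thread, so a slow check never stalls the next transfer.
Build with -pthread.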

> * File conflicts- we've made this one pretty damn fast already, so
> probably not worth parallelizing.

Agreed, I've never seen this take up a significant portion of the time
it takes to -Syu.

-- 
Tavian Barnes

