[pacman-dev] Parallelising integrity checks
Integrity checking in pacman seems to be a CPU-bound, embarrassingly parallel task, so I'd like to spread it out over every available core to speed it up. To me it looks like both the delta and regular package integrity checking loops in lib/libalpm/sync.c could be parallelised. But I've never messed with the pacman code before, so I have a couple of questions about your best practices:
- There doesn't seem to be any threading code in pacman/libalpm currently; would using pthreads directly be okay, or should a threading abstraction layer be created?
- There's no portable way to get the number of available cores. Where does platform-specific code go in libalpm? The way to do it on Linux is with sched_getaffinity(); sysconf(_SC_NPROCESSORS_ONLN) is almost as good and works on BSD too I believe.
Also, I guess a major question would be: would this be likely to be merged? It's not a major issue, but it's also not a very large change. Every time there's a KDE update or something similarly large, I find myself wishing that all 2/4/24 of my cores were being used for the integrity checks.
-- Tavian Barnes
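For illustration, a minimal sketch of the core-counting approach described above - sched_getaffinity() where available, sysconf(_SC_NPROCESSORS_ONLN) as the fallback. This is not code from any posted patch:

#define _GNU_SOURCE
#include <unistd.h>
#ifdef __linux__
#include <sched.h>
#endif

/* Count the CPUs this process may actually run on; fall back to the number
 * of online processors, and finally to 1 if neither query works. */
static long count_available_cpus(void)
{
#ifdef __linux__
    cpu_set_t set;
    if(sched_getaffinity(0, sizeof(set), &set) == 0) {
        return CPU_COUNT(&set);
    }
#endif
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n < 1) ? 1 : n;
}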
On Sat, Feb 19, 2011 at 04:26:56AM -0500, Tavian Barnes wrote:
snip
- There's no portable way to get the number of available cores. Where does platform-specific code go in libalpm? The way to do it on Linux is with sched_getaffinity(); sysconf(_SC_NPROCESSORS_ONLN) is almost as good and works on BSD too I believe.
No, I don't think it works on BSDs. You can look at how x264 guys implemented this. Check out x264_cpu_num_processors() in common/cpu.c in their source tree. I'm not sure hard-coding supported platforms through ifdefs would be accepted in pacman/libalpm.
Also, I guess a major question would be: would this be likely to be merged? It's not a major issue, but it's also not a very large change. Every time there's a KDE update or something similarly large, I find myself wishing that all 2/4/24 of my cores were being used for the integrity checks.
-- Tavian Barnes
On Sat, Feb 19, 2011 at 01:13:22PM +0200, Nezmer wrote:
On Sat, Feb 19, 2011 at 04:26:56AM -0500, Tavian Barnes wrote:
snip
- There's no portable way to get the number of available cores. Where does platform-specific code go in libalpm? The way to do it on Linux is with sched_getaffinity(); sysconf(_SC_NPROCESSORS_ONLN) is almost as good and works on BSD too I believe.
No, I don't think it works on BSDs.
Actually, the sysconf() method works at least on FreeBSD, and the man page says the sysconf interface is defined by POSIX.1
You can look at how x264 guys implemented this. Check out x264_cpu_num_processors() in common/cpu.c in their source tree.
I'm not sure hard-coding supported platforms through ifdefs would be accepted in pacman/libalpm.
Also, I guess a major question would be: would this be likely to be merged? It's not a major issue, but it's also not a very large change. Every time there's a KDE update or something similarly large, I find myself wishing that all 2/4/24 of my cores were being used for the integrity checks.
-- Tavian Barnes
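For reference, the kind of platform-ifdef fallback being discussed might look roughly like this - a sketch of the general pattern, not the actual x264 code:

#include <unistd.h>
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Prefer sysctl(HW_NCPU) on the BSDs/OS X, then the (non-standard but
 * widespread) _SC_NPROCESSORS_ONLN, then give up and assume one core. */
static long num_processors(void)
{
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__APPLE__)
    int mib[2] = { CTL_HW, HW_NCPU };
    int ncpu = 0;
    size_t len = sizeof(ncpu);
    if(sysctl(mib, 2, &ncpu, &len, NULL, 0) == 0 && ncpu > 0) {
        return ncpu;
    }
#endif
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if(n > 0) {
        return n;
    }
#endif
    return 1;
}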
On 19 February 2011 06:28, Nezmer <git@nezmer.info> wrote:
Actually, the sysconf() method works at least on FreeBSD, and the man page says the sysconf interface is defined by POSIX.1
The sysconf() interface is specified by POSIX.1, but _SC_NPROCESSORS_ONLN is a non-standard extension.
You can look at how x264 guys implemented this. Check out x264_cpu_num_processors() in common/cpu.c in their source tree.
Seems to be basically a more platform-complete version of what I'm doing here [1]. But yeah, we'd need something like that if pacman is supposed to run on everything unix-based. I don't imagine we care much about Windows compatibility :)
I'm not sure hard-coding supported platforms through ifdefs would be accepted in pacman/libalpm.
There's code that does this already in lib/libalpm/diskspace.c, for example. It's not like this would limit the supported platforms either; we can just pretend there's only 1 core when we don't know how to count them.

Anyway, I've made a quick and dirty patch as a proof of concept. It drops the time spent doing integrity checks from 9 seconds to 3 seconds on an 88MB update. The improvements I plan to make are to hide some of the threading behind an _alpm_for_each_cpu() API, use it for delta integrity checks too, and divide the packages between the cores evenly in terms of package size rather than package count. And of course add the missing error checks.

diff --git a/lib/libalpm/sync.c b/lib/libalpm/sync.c
index 859b8c9..595abd9 100644
--- a/lib/libalpm/sync.c
+++ b/lib/libalpm/sync.c
@@ -32,6 +32,7 @@
 #include <unistd.h>
 #include <time.h>
 #include <limits.h>
+#include <pthread.h>

 /* libalpm */
 #include "sync.h"
@@ -687,6 +688,79 @@ static int test_md5sum(pmtrans_t *trans, const char *filename,
     return(ret);
 }

+/* Thread payload for checking package integrity */
+typedef struct {
+    size_t thread, numcpus, numtargs;
+    volatile size_t *current;
+    pmtrans_t *trans;
+    int *errors;
+    alpm_list_t **data;
+
+    pthread_mutex_t *mutex;
+} _alpm_integrity_payload;
+
+static void *_alpm_integrity_check_thread(void *ptr)
+{
+    _alpm_integrity_payload *payload = ptr;
+    pmtrans_t *trans = payload->trans;
+
+    alpm_list_t *i;
+    size_t count = 0, range = (payload->numtargs * payload->thread) / payload->numcpus;
+    for(i = trans->add; count < range; i = i->next, count++);
+
+    range = (payload->numtargs * (payload->thread + 1)) / payload->numcpus;
+    pthread_mutex_lock(payload->mutex);
+    for(; count < range; i = i->next, count++) {
+        pmpkg_t *spkg = i->data;
+        if(spkg->origin == PKG_FROM_FILE) {
+            continue; /* pkg_load() has been already called, this package is valid */
+        }
+
+        const char *filename = alpm_pkg_get_filename(spkg);
+        const char *md5sum = alpm_pkg_get_md5sum(spkg);
+
+        pthread_mutex_unlock(payload->mutex);
+        int test = test_md5sum(trans, filename, md5sum);
+        pthread_mutex_lock(payload->mutex);
+
+        if (test != 0) {
+            (*payload->errors)++;
+            *payload->data = alpm_list_add(*payload->data, strdup(filename));
+            goto next;
+        }
+        /* load the package file and replace pkgcache entry with it in the target list */
+        /* TODO: alpm_pkg_get_db() will not work on this target anymore */
+        _alpm_log(PM_LOG_DEBUG, "replacing pkgcache entry with package file for target %s\n", spkg->name);
+        char *filepath = _alpm_filecache_find(filename);
+        pmpkg_t *pkgfile;
+
+        pthread_mutex_unlock(payload->mutex);
+        int loaded = alpm_pkg_load(filepath, 1, &pkgfile);
+        pthread_mutex_lock(payload->mutex);
+
+        if(loaded != 0) {
+            _alpm_pkg_free(pkgfile);
+            (*payload->errors)++;
+            *payload->data = alpm_list_add(*payload->data, strdup(filename));
+            FREE(filepath);
+            goto next;
+        }
+        FREE(filepath);
+        pkgfile->reason = spkg->reason; /* copy over install reason */
+        i->data = pkgfile;
+        _alpm_pkg_free_trans(spkg); /* spkg has been removed from the target list */
+
+    next:
+        (*payload->current)++;
+        int percent = (*payload->current * 100) / payload->numtargs;
+        PROGRESS(trans, PM_TRANS_PROGRESS_INTEGRITY_START, "", percent,
+                payload->numtargs, *payload->current);
+    }
+    pthread_mutex_unlock(payload->mutex);
+
+    return(NULL);
+}
+
 int _alpm_sync_commit(pmtrans_t *trans, pmdb_t *db_local, alpm_list_t **data)
 {
     alpm_list_t *i, *j, *files = NULL;
@@ -821,41 +895,36 @@ int _alpm_sync_commit(pmtrans_t *trans, pmdb_t *db_local, alpm_list_t **data)
     numtargs = alpm_list_count(trans->add);
     EVENT(trans, PM_TRANS_EVT_INTEGRITY_START, NULL, NULL);

-    errors = 0;
-    for(i = trans->add; i; i = i->next, current++) {
-        pmpkg_t *spkg = i->data;
-        int percent = (current * 100) / numtargs;
-        if(spkg->origin == PKG_FROM_FILE) {
-            continue; /* pkg_load() has been already called, this package is valid */
-        }
-        PROGRESS(trans, PM_TRANS_PROGRESS_INTEGRITY_START, "", percent,
-                numtargs, current);
+    long numcpus = sysconf(_SC_NPROCESSORS_ONLN);
+    if(numcpus < 1) {
+        numcpus = 1;
+    }

-        const char *filename = alpm_pkg_get_filename(spkg);
-        const char *md5sum = alpm_pkg_get_md5sum(spkg);
+    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+    _alpm_integrity_payload payload = {
+        .numcpus = numcpus,
+        .numtargs = numtargs,
+        .current = &current,
+        .trans = trans,
+        .errors = &errors,
+        .data = data,
+        .mutex = &mutex
+    };

-        if(test_md5sum(trans, filename, md5sum) != 0) {
-            errors++;
-            *data = alpm_list_add(*data, strdup(filename));
-            continue;
+    errors = 0;
+    {
+        pthread_t threads[numcpus];
+        _alpm_integrity_payload payloads[numcpus];
+        for(int i = 0; i < numcpus; i++) {
+            payloads[i] = payload;
+            payloads[i].thread = i;
+            pthread_create(&threads[i], NULL, _alpm_integrity_check_thread, &payloads[i]);
         }
-        /* load the package file and replace pkgcache entry with it in the target list */
-        /* TODO: alpm_pkg_get_db() will not work on this target anymore */
-        _alpm_log(PM_LOG_DEBUG, "replacing pkgcache entry with package file for target %s\n", spkg->name);
-        char *filepath = _alpm_filecache_find(filename);
-        pmpkg_t *pkgfile;
-        if(alpm_pkg_load(filepath, 1, &pkgfile) != 0) {
-            _alpm_pkg_free(pkgfile);
-            errors++;
-            *data = alpm_list_add(*data, strdup(filename));
-            FREE(filepath);
-            continue;
+        for(int i = 0; i < numcpus; i++) {
+            pthread_join(threads[i], NULL);
         }
-        FREE(filepath);
-        pkgfile->reason = spkg->reason; /* copy over install reason */
-        i->data = pkgfile;
-        _alpm_pkg_free_trans(spkg); /* spkg has been removed from the target list */
     }
+    PROGRESS(trans, PM_TRANS_PROGRESS_INTEGRITY_START, "", 100, numtargs, current);
     EVENT(trans, PM_TRANS_EVT_INTEGRITY_DONE, NULL, NULL);

[1]: http://gitorious.org/dimension/dimension/blobs/master/libdimension/platform....
-- Tavian Barnes
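As a side note on the "divide by package size" improvement mentioned above, a size-balanced split could be as simple as greedily assigning each target to the least-loaded thread. A sketch only - not part of the posted patch, and the names are illustrative:

#include <stddef.h>

/* sizes[i] is the package size of target i; on return, thread_of[i] holds
 * the thread index that should check target i. Greedy: always pick the
 * thread with the fewest bytes assigned so far. */
static void partition_by_size(const size_t *sizes, size_t numtargs,
        size_t *thread_of, size_t numcpus)
{
    size_t assigned[numcpus];
    for(size_t t = 0; t < numcpus; t++) {
        assigned[t] = 0;
    }
    for(size_t i = 0; i < numtargs; i++) {
        size_t best = 0;
        for(size_t t = 1; t < numcpus; t++) {
            if(assigned[t] < assigned[best]) {
                best = t;
            }
        }
        thread_of[i] = best;
        assigned[best] += sizes[i];
    }
}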
On Sat, Feb 19, 2011 at 3:24 PM, Tavian Barnes <tavianator@tavianator.com> wrote:
On 19 February 2011 06:28, Nezmer <git@nezmer.info> wrote:
Actually, the sysconf() method works at least on FreeBSD, and the man page says the sysconf interface is defined by POSIX.1
The sysconf() interface is specified by POSIX.1, but _SC_NPROCESSORS_ONLN is a non-standard extension.
You can look at how x264 guys implemented this. Check out x264_cpu_num_processors() in common/cpu.c in their source tree.
Seems to be basically a more platform-complete version of what I'm doing here [1]. But yeah, we'd need something like that if pacman is supposed to run on everything unix-based. I don't imagine we care much about Windows compatibility :)
Take a look at git; it has plenty of code that has been tested for doing this and it works just fine on Windows. -Dan
On 19 February 2011 18:08, Dan McGee <dpmcgee@gmail.com> wrote:
On Sat, Feb 19, 2011 at 3:24 PM, Tavian Barnes <tavianator@tavianator.com> wrote:
On 19 February 2011 06:28, Nezmer <git@nezmer.info> wrote:
You can look at how x264 guys implemented this. Check out x264_cpu_num_processors() in common/cpu.c in their source tree.
Seems to be basically a more platform-complete version of what I'm doing here [1]. But yeah, we'd need something like that if pacman is supposed to run on everything unix-based. I don't imagine we care much about Windows compatibility :)
Take a look at git; it has plenty of code that has been tested for doing this and it works just fine on Windows.
-Dan
You mean git x264? That's what I was looking at, I just meant that we wouldn't have to include that part in pacman, considering the rest of the code doesn't look like it would build on Windows anyway. -- Tavian Barnes
On 20/02/11 09:22, Tavian Barnes wrote:
On 19 February 2011 18:08, Dan McGee<dpmcgee@gmail.com> wrote:
On Sat, Feb 19, 2011 at 3:24 PM, Tavian Barnes <tavianator@tavianator.com> wrote:
On 19 February 2011 06:28, Nezmer<git@nezmer.info> wrote:
You can look at how x264 guys implemented this. Check out x264_cpu_num_processors() in common/cpu.c in their source tree.
Seems to be basically a more platform-complete version of what I'm doing here [1]. But yeah, we'd need something like that if pacman is supposed to run on everything unix-based. I don't imagine we care much about Windows compatibility :)
Take a look at git; it has plenty of code that has been tested for doing this and it works just fine on Windows.
-Dan
You mean git x264? That's what I was looking at, I just meant that we wouldn't have to include that part in pacman, considering the rest of the code doesn't look like it would build on Windows anyway.
I'm fairly sure he meant git itself. Allan
On 19 February 2011 18:26, Allan McRae <allan@archlinux.org> wrote:
On 20/02/11 09:22, Tavian Barnes wrote:
On 19 February 2011 18:08, Dan McGee<dpmcgee@gmail.com> wrote:
On Sat, Feb 19, 2011 at 3:24 PM, Tavian Barnes <tavianator@tavianator.com> wrote:
On 19 February 2011 06:28, Nezmer<git@nezmer.info> wrote:
You can look at how x264 guys implemented this. Check out x264_cpu_num_processors() in common/cpu.c in their source tree.
Seems to be basically a more platform-complete version of what I'm doing here [1]. But yeah, we'd need something like that if pacman is supposed to run on everything unix-based. I don't imagine we care much about Windows compatibility :)
Take a look at git; it has plenty of code that has been tested for doing this and it works just fine on Windows.
-Dan
You mean git x264? That's what I was looking at, I just meant that we wouldn't have to include that part in pacman, considering the rest of the code doesn't look like it would build on Windows anyway.
I'm fairly sure he meant git itself.
Allan
Oh, I get it. Well, the code in both is pretty similar, but git seems to support HP-UX while x264 doesn't. Also, git just uses the number of online processors, so "taskset 0x1 git gc" runs N threads at once on 1 core, while x264 uses the process's CPU affinity on Linux and Windows to behave better in that case.
Anyway, that's not the main point. Are you guys interested in this change? I'm almost done with a better version of the patch that adds an _alpm_for_each_cpu() function (to util.h) which takes a callback function to call in N threads.
On a related note, I just tried running the test suite after entirely patching out integrity checks, and there weren't any regressions. Maybe the test suite should test the handling of corrupt packages? I can add a test case myself if you want, once I've figured out how the test suite works.
-- Tavian Barnes
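A minimal sketch of what an _alpm_for_each_cpu() helper along those lines could look like; the callback signature and error handling here are guesses, not the actual patch:

#include <pthread.h>
#include <unistd.h>

typedef void (*alpm_cpu_cb)(size_t thread, size_t numthreads, void *data);

typedef struct {
    alpm_cpu_cb cb;
    size_t thread, numthreads;
    void *data;
} cpu_thread_arg;

static void *cpu_thread_run(void *ptr)
{
    cpu_thread_arg *arg = ptr;
    arg->cb(arg->thread, arg->numthreads, arg->data);
    return NULL;
}

/* Invoke cb once per online CPU, in parallel, passing each call its thread
 * index so the callback can pick its own slice of the work. */
static int _alpm_for_each_cpu(alpm_cpu_cb cb, void *data)
{
    long numcpus = sysconf(_SC_NPROCESSORS_ONLN);
    if(numcpus < 1) {
        numcpus = 1;
    }

    pthread_t threads[numcpus];
    cpu_thread_arg args[numcpus];
    long started = 0;
    for(long i = 0; i < numcpus; i++) {
        args[i].cb = cb;
        args[i].thread = (size_t)i;
        args[i].numthreads = (size_t)numcpus;
        args[i].data = data;
        if(pthread_create(&threads[i], NULL, cpu_thread_run, &args[i]) != 0) {
            break;
        }
        started++;
    }
    for(long i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return (started == numcpus) ? 0 : -1;
}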
On Sat, Feb 19, 2011 at 6:11 PM, Tavian Barnes <tavianator@tavianator.com> wrote:
Oh, I get it. Well, the code in both is pretty similar, but git seems to support HP-UX while x264 doesn't. Also, git just uses the number of online processors, so "taskset 0x1 git gc" runs N threads at once on 1 core, while x264 uses the process's CPU affinity on Linux and Windows to behave better in that case.
Yeah, that seems nice but also overkill- git has a config var to limit thread count; we could add that to the config file and just fall back to online_cpus() if not provided.
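That fallback might look something like this (a sketch; the "configured threads" value stands in for a hypothetical pacman.conf option, not an existing one):

#include <unistd.h>

/* configured_threads would come from a hypothetical config option,
 * with 0 meaning "not set"; fall back to the number of online CPUs. */
static long integrity_thread_count(long configured_threads)
{
    if(configured_threads > 0) {
        return configured_threads;
    }
    long n = sysconf(_SC_NPROCESSORS_ONLN); /* i.e. online_cpus() */
    return (n < 1) ? 1 : n;
}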
Anyway, that's not the main point. Are you guys interested in this change? I'm almost done with a better version of the patch that adds an _alpm_for_each_cpu() function (to util.h) which takes a callback function to call in N threads.
On a related note, I just tried running the test suite after entirely patching out integrity checks, and there weren't any regressions. Maybe the test suite should test the handling of corrupt packages? I can add a test case myself if you want, once I've figured out how the test suite works.
Tests for this would definitely be nice. You will probably have to add the ability to pactest to create a broken package and/or database entry.
I don't want to review these just yet, as I want to focus my time on the 3.5.0 release. I will add this and maybe you can take it into account in the patchset- we do a lot of things we could parallelize and/or combine. Steps I know of and notes about them:
* "Checking integrity" is really two things- md5sum iterations on the file, and then an alpm_pkg_load() call to build the package object and create the filelist. Not sure how you incorporated this but at least something to think about.
* We do yet another iteration of all of the package contents if diskspace checking is enabled and read through the archive. This could be eliminated if we grabbed the necessary data in pkg_load, which I believe is simply some parts of the stat buffer and the type of the entry. This would also be hugely helpful in conflict checking, where we don't have this info available, and you will see some comments alluding to the "12 checks we do in add.c" or something.
* Downloads. I see a call to do this in parallel a lot and I will continue to think this is stupid, but maybe that is just me. If you can't find a mirror that saturates your connection, look around- we have a lot.
* File conflicts- we've made this one pretty damn fast already, so probably not worth parallelizing.
-Dan
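Regarding the second point above (grabbing the stat data during pkg_load), the per-entry information is already exposed by libarchive while the package is being read. A rough sketch of capturing it, using illustrative struct and field names rather than real libalpm types:

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <archive.h>
#include <archive_entry.h>

/* What diskspace/conflict checking would need per file, recorded while
 * pkg_load is already iterating over the archive entries. */
typedef struct {
    char *name;     /* path inside the package */
    int64_t size;   /* uncompressed size */
    mode_t type;    /* AE_IFREG, AE_IFDIR, AE_IFLNK, ... */
} file_meta;

static file_meta read_entry_meta(struct archive_entry *entry)
{
    file_meta meta;
    meta.name = strdup(archive_entry_pathname(entry));
    meta.size = archive_entry_size(entry);
    meta.type = archive_entry_filetype(entry);
    return meta;
}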
On 25 February 2011 11:13, Dan McGee <dpmcgee@gmail.com> wrote:
On Sat, Feb 19, 2011 at 6:11 PM, Tavian Barnes <tavianator@tavianator.com> wrote:
On a related note, I just tried running the test suite after entirely patching out integrity checks, and there weren't any regressions. Maybe the test suite should test the handling of corrupt packages? I can add a test case myself if you want, once I've figured out how the test suite works.
Tests for this would definitely be nice. You will probably have to add the ability to pactest to create a broken package and/or database entry.
I'll have a look at that.
Steps I know of and notes about them:
* "Checking integrity" is really two things- md5sum iterations on the file, and then an alpm_pkg_load() call to build the package object and create the filelist. Not sure how you incorporated this but at least something to think about.
The current patchset runs both those steps in parallel; everything else in that loop is protected by a mutex. alpm_pkg_load() takes significantly longer than the md5sum check, by the way.
* We do yet another iteration of all of the package contents if diskspace checking is enabled and read through the archive. This could be eliminated if we grabbed the necessary data in pkg_load, which I believe is simply some parts of the stat buffer and the type of the entry. This would also be hugely helpful in conflict checking, where we don't have this info available, and you will see some comments alluding to the "12 checks we do in add.c" or something.
I'll look into this.
* Downloads. I see a call to do this in parallel a lot and I will continue to think this is stupid, but maybe that is just me. If you can't find a mirror that saturates your connection, look around- we have a lot.
That seems like a bad idea, I agree with you. But while we're downloading we could probably be doing a bunch of work in the background, including integrity checks. The next version of the patchset I post will do this.
* File conflicts- we've made this one pretty damn fast already, so probably not worth parallelizing.
Agreed, I've never seen this take up a significant portion of the time it takes to -Syu. -- Tavian Barnes
On 26 February 2011 18:56, Tavian Barnes <tavianator@tavianator.com> wrote:
On 25 February 2011 11:13, Dan McGee <dpmcgee@gmail.com> wrote:
* Downloads. I see a call to do this in parallel a lot and I will continue to think this is stupid, but maybe that is just me. If you can't find a mirror that saturates your connection, look around- we have a lot.
That seems like a bad idea, I agree with you. But while we're downloading we could probably be doing a bunch of work in the background, including integrity checks. The next version of the patchset I post will do this.
Or possibly not; I didn't see the NACK from you on the other email. -- Tavian Barnes
participants (4)
- Allan McRae
- Dan McGee
- Nezmer
- Tavian Barnes