[pacman-dev] pacman cold caches performance, too much stat()ing
Dimitrios Apostolou
jimis at gmx.net
Sat Dec 12 17:13:19 EST 2009
On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
> Hello list,
>
> I have been investigating the slow performance of pacman regarding the cold
> caches scenario and I'm trying to write some proof of concept code that
> improves things a lot for some cases. However I need your help regarding some
> facts I might have misunderstood, and any pointers to the source code you
> also give me would also help a lot. I wouldn't like to lose time changing
> stuff that would break current functionality. So here are some first
> questions that come to mind, just by using strace:
>
> When doing "pacman -Q blah" I can see that besides the getdents() syscalls in
> /var/lib/pacman/local (probably caused by readdir()), there are also stat()
> and access() calls for every single subdirectory. Why are the last ones
> necessary? Isn't readdir enough?
>
> The same goes when doing "pacman -S blah". But in that case it stat()'s both
> 'local' and 'sync' directories, so worst case is really bad, it will stat()
> all contents of local, core, extra and community...
Regarding the stat() and access() operations I finally found out why they
happen exactly:
In case of corrupted db the sync, for example, directory might contain
files, not subdirectories. So in that case _alpm_db_populate() just makes
sure it's a directory. However stat()ing thousands of files is too much of
a price to pay. Similarly, access() checks it is accessible by the user.
In the attached patch I have just removed the relevant lines, with the
following rationale: In the rare case of corrupted db, even if we do
open("sync/not_a_dir/depends") it will still fail and we'll catch the
failure there, no need to investigate the cause further, just write a
message like "couldn't access sync/not_a_dir/depends".
By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before running, I
measure a nice performance boost on my old laptop: "pacman -Q gdb" time is
reduced from about 7s to 2.5s.
What do you think? Is it possible to remove those checks?
Dimitris
P.S. Now all that remains is the depends/conflicts/requiredby stuff which
is by far the hardest... I'm still trying to decipher the patch
implementing REQUIREDBY that was posted earlier.
-------------- next part --------------
diff --git a/lib/libalpm/be_files.c b/lib/libalpm/be_files.c
index 90e97a5..7d80ea7 100644
--- a/lib/libalpm/be_files.c
+++ b/lib/libalpm/be_files.c
@@ -222,8 +222,6 @@ int _alpm_db_populate(pmdb_t *db)
{
int count = 0;
struct dirent *ent = NULL;
- struct stat sbuf;
- char path[PATH_MAX];
const char *dbpath;
DIR *dbdir;
@@ -243,12 +241,6 @@ int _alpm_db_populate(pmdb_t *db)
if(strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
continue;
}
- /* stat the entry, make sure it's a directory */
- snprintf(path, PATH_MAX, "%s%s", dbpath, name);
- if(stat(path, &sbuf) != 0 || !S_ISDIR(sbuf.st_mode)) {
- continue;
- }
-
pkg = _alpm_pkg_new();
if(pkg == NULL) {
closedir(dbdir);
@@ -337,13 +329,6 @@ int _alpm_db_read(pmdb_t *db, pmpkg_t *info, pmdbinfrq_t inforeq)
pkgpath = get_pkgpath(db, info);
- if(access(pkgpath, F_OK)) {
- /* directory doesn't exist or can't be opened */
- _alpm_log(PM_LOG_DEBUG, "cannot find '%s-%s' in db '%s'\n",
- info->name, info->version, db->treename);
- goto error;
- }
-
/* DESC */
if(inforeq & INFRQ_DESC) {
snprintf(path, PATH_MAX, "%sdesc", pkgpath);
More information about the pacman-dev
mailing list