[pacman-dev] pacman cold caches performance, too much stat()ing

Dimitrios Apostolou jimis at gmx.net
Sat Dec 12 17:13:19 EST 2009

On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
> Hello list,
> I have been investigating the slow performance of pacman regarding the cold 
> caches scenario and I'm trying to write some proof of concept code that 
> improves things a lot for some cases. However I need your help regarding some 
> facts I might have misunderstood, and any pointers to the source code you 
> can give me would also help a lot. I wouldn't like to lose time changing 
> stuff that would break current functionality. So here are some first 
> questions that come to mind, just by using strace:
> When doing "pacman -Q blah" I can see that besides the getdents() syscalls in 
> /var/lib/pacman/local (probably caused by readdir()), there are also stat() 
> and access() calls for every single subdirectory. Why are the last ones 
> necessary? Isn't readdir enough?
> The same goes when doing "pacman -S blah". But in that case it stat()'s both 
> 'local' and 'sync' directories, so worst case is really bad, it will stat() 
> all contents of local, core, extra and community...

Regarding the stat() and access() calls, I finally found out exactly why 
they happen:

If the db is corrupted, a directory such as sync might contain plain files 
instead of subdirectories, so _alpm_db_populate() stat()s every entry just 
to make sure it is a directory. However, stat()ing thousands of entries is 
too high a price to pay for that. Similarly, the access() call in 
_alpm_db_read() checks that the package directory exists before its files 
are read.

In the attached patch I have simply removed the relevant lines, with the 
following rationale: in the rare case of a corrupted db, a later 
open("sync/not_a_dir/depends") will fail anyway and we can catch the 
failure there. There is no need to investigate the cause further; it is 
enough to write a message like "couldn't access sync/not_a_dir/depends".

After dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before each 
run, I measure a nice performance boost on my old laptop: the time for 
"pacman -Q gdb" drops from about 7s to 2.5s.

What do you think? Is it possible to remove those checks?

P.S. Now all that remains is the depends/conflicts/requiredby stuff which 
is by far the hardest... I'm still trying to decipher the patch 
implementing REQUIREDBY that was posted earlier.
-------------- next part --------------
diff --git a/lib/libalpm/be_files.c b/lib/libalpm/be_files.c
index 90e97a5..7d80ea7 100644
--- a/lib/libalpm/be_files.c
+++ b/lib/libalpm/be_files.c
@@ -222,8 +222,6 @@ int _alpm_db_populate(pmdb_t *db)
 	int count = 0;
 	struct dirent *ent = NULL;
-	struct stat sbuf;
-	char path[PATH_MAX];
 	const char *dbpath;
 	DIR *dbdir;
@@ -243,12 +241,6 @@ int _alpm_db_populate(pmdb_t *db)
 		if(strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
-		/* stat the entry, make sure it's a directory */
-		snprintf(path, PATH_MAX, "%s%s", dbpath, name);
-		if(stat(path, &sbuf) != 0 || !S_ISDIR(sbuf.st_mode)) {
-			continue;
-		}
 		pkg = _alpm_pkg_new();
 		if(pkg == NULL) {
@@ -337,13 +329,6 @@ int _alpm_db_read(pmdb_t *db, pmpkg_t *info, pmdbinfrq_t inforeq)
 	pkgpath = get_pkgpath(db, info);
-	if(access(pkgpath, F_OK)) {
-		/* directory doesn't exist or can't be opened */
-		_alpm_log(PM_LOG_DEBUG, "cannot find '%s-%s' in db '%s'\n",
-				info->name, info->version, db->treename);
-		goto error;
-	}
 	/* DESC */
 	if(inforeq & INFRQ_DESC) {
 		snprintf(path, PATH_MAX, "%sdesc", pkgpath);
