On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
Hello list,
I have been investigating the slow performance of pacman regarding the cold caches scenario and I'm trying to write some proof of concept code that improves things a lot for some cases. However I need your help regarding some facts I might have misunderstood, and any pointers to the source code you also give me would also help a lot. I wouldn't like to lose time changing stuff that would break current functionality. So here are some first questions that come to mind, just by using strace:
When doing "pacman -Q blah" I can see that besides the getdents() syscalls in /var/lib/pacman/local (probably caused by readdir()), there are also stat() and access() calls for every single subdirectory. Why are the last ones necessary? Isn't readdir enough?
The same goes when doing "pacman -S blah". But in that case it stat()'s both 'local' and 'sync' directories, so worst case is really bad, it will stat() all contents of local, core, extra and community...
Regarding the stat() and access() operations I finally found out why they happen exactly: In case of corrupted db the sync, for example, directory might contain files, not subdirectories. So in that case _alpm_db_populate() just makes sure it's a directory. However stat()ing thousands of files is too much of a price to pay. Similarly, access() checks it is accessible by the user. In the attached patch I have just removed the relevant lines, with the following rationale: In the rare case of corrupted db, even if we do open("sync/not_a_dir/depends") it will still fail and we'll catch the failure there, no need to investigate the cause further, just write a message like "couldn't access sync/not_a_dir/depends". By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before running, I measure a nice performance boost on my old laptop: "pacman -Q gdb" time is reduced from about 7s to 2.5s. What do you think? Is it possible to remove those checks? Dimitris P.S. Now all that remains is the depends/conflicts/requiredby stuff which is by far the hardest... I'm still trying to decipher the patch implementing REQUIREDBY that was posted earlier.