On 13/12/2009, at 2:31 AM, Nagy Gabor wrote:
On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
Hello list,
I have been investigating the slow performance of pacman regarding the cold caches scenario and I'm trying to write some proof of concept code that improves things a lot for some cases. However I need your help regarding some facts I might have misunderstood, and any pointers to the source code you also give me would also help a lot. I wouldn't like to lose time changing stuff that would break current functionality. So here are some first questions that come to mind, just by using strace:
When doing "pacman -Q blah" I can see that besides the getdents() syscalls in /var/lib/pacman/local (probably caused by readdir()), there are also stat() and access() calls for every single subdirectory. Why are the last ones necessary? Isn't readdir enough?
The same goes when doing "pacman -S blah". But in that case it stat()'s both 'local' and 'sync' directories, so worst case is really bad, it will stat() all contents of local, core, extra and community...
Regarding the stat() and access() operations I finally found out why they happen exactly:
In case of corrupted db the sync, for example, directory might contain files, not subdirectories. So in that case _alpm_db_populate() just makes sure it's a directory. However stat()ing thousands of files is too much of a price to pay. Similarly, access() checks it is accessible by the user.
In the attached patch I have just removed the relevant lines, with the following rationale: In the rare case of corrupted db, even if we do open("sync/not_a_dir/depends") it will still fail and we'll catch the failure there, no need to investigate the cause further, just write a message like "couldn't access sync/not_a_dir/depends".
By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before running, I measure a nice performance boost on my old laptop: "pacman -Q gdb" time is reduced from about 7s to 2.5s.
Hm. This is a nice time boost... Did you test this with other operations, too?
What do you think? Is it possible to remove those checks? Dimitris
The best solution would be to rewrite our whole database crap as Dan said. I am pretty sure that this patch would not cause any harm irl, but our code would become a little bit more dangerous: As I see, db_read(INFRQ_BASE) would become a ~NOP function and db_populate would become a simple "ls" function (the only remaining sanity check is splitname).
It occurs to me time and time again, that it would be a good idea to try and abstract the database functions, to allow different database backends to be "plugged in." This would make experimentation with backends a lot easier, since you just compile a different file in (the interface remains the same). A library called libpkg[1] does something along these lines, by leveraging function pointers. Unfortunately I don't have a lot of time to look into it further, but it's an interesting idea. [1]: http://libpkg.berlios.de/doc/trunk/html/pkg__db_8h_source.html