[pacman-dev] pacman cold caches performance, too much stat()ing

Dimitrios Apostolou jimis at gmx.net
Sat Dec 12 18:51:08 EST 2009

On Sat, 12 Dec 2009, Nagy Gabor wrote:
>> On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
>> Regarding the stat() and access() operations I finally found out why
>> they happen exactly:
>> In case of corrupted db the sync, for example, directory might
>> contain files, not subdirectories. So in that case
>> _alpm_db_populate() just makes sure it's a directory. However
>> stat()ing thousands of files is too much of a price to pay.
>> Similarly, access() checks it is accessible by the user.
>> In the attached patch I have just removed the relevant lines, with
>> the following rationale: In the rare case of corrupted db, even if we
>> do open("sync/not_a_dir/depends") it will still fail and we'll catch
>> the failure there, no need to investigate the cause further, just
>> write a message like "couldn't access sync/not_a_dir/depends".
>> By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before
>> running, I measure a nice performance boost on my old laptop: "pacman
>> -Q gdb" time is reduced from about 7s to 2.5s.
> Hm. This is a nice time boost... Did you test this with other
> operations, too?

I didn't time it, but strace shows this improvement applies to -Qi, -Si, 
-Su as well. It doesn't show that much however because all these 
operations actually read() thousands of files (depends, desc) which is 
much worse than stat(). :-)

>> What do you think? Is it possible to remove those checks?
>> Dimitris
> The best solution would be to rewrite our whole database crap as Dan 
> said. I am pretty sure that this patch would not cause any harm irl, but

Because I really like the ease of use of the current format, I'll try 
improving things with minimum changes to it. If we can avoid a complete 
backend rewrite with minor changes, that is a good think, isn't it?

> our code would become a little bit more dangerous: As I see, 
> db_read(INFRQ_BASE) would become a ~NOP function and db_populate would 
> become a simple "ls" function (the only remaining sanity check is 
> splitname).

Exactly! Just a simple ls should be necessary, that was my initial 
motivation. And I have thought of a way to even avoid that readdir(), but 
I should get some measurements first.


More information about the pacman-dev mailing list