[pacman-dev] pacman cold caches performance, too much stat()ing
Dimitrios Apostolou
jimis at gmx.net
Sat Dec 12 18:51:08 EST 2009
On Sat, 12 Dec 2009, Nagy Gabor wrote:
>> On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
>> Regarding the stat() and access() operations I finally found out why
>> they happen exactly:
>>
>> In case of corrupted db the sync, for example, directory might
>> contain files, not subdirectories. So in that case
>> _alpm_db_populate() just makes sure it's a directory. However
>> stat()ing thousands of files is too much of a price to pay.
>> Similarly, access() checks it is accessible by the user.
>>
>> In the attached patch I have just removed the relevant lines, with
>> the following rationale: In the rare case of corrupted db, even if we
>> do open("sync/not_a_dir/depends") it will still fail and we'll catch
>> the failure there, no need to investigate the cause further, just
>> write a message like "couldn't access sync/not_a_dir/depends".
>>
>> By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before
>> running, I measure a nice performance boost on my old laptop: "pacman
>> -Q gdb" time is reduced from about 7s to 2.5s.
>
> Hm. This is a nice time boost... Did you test this with other
> operations, too?
I didn't time it, but strace shows this improvement applies to -Qi, -Si,
-Su as well. It doesn't show that much however because all these
operations actually read() thousands of files (depends, desc) which is
much worse than stat(). :-)
>
>> What do you think? Is it possible to remove those checks?
>> Dimitris
>
> The best solution would be to rewrite our whole database crap as Dan
> said. I am pretty sure that this patch would not cause any harm irl, but
Because I really like the ease of use of the current format, I'll try
improving things with minimum changes to it. If we can avoid a complete
backend rewrite with minor changes, that is a good think, isn't it?
> our code would become a little bit more dangerous: As I see,
> db_read(INFRQ_BASE) would become a ~NOP function and db_populate would
> become a simple "ls" function (the only remaining sanity check is
> splitname).
Exactly! Just a simple ls should be necessary, that was my initial
motivation. And I have thought of a way to even avoid that readdir(), but
I should get some measurements first.
Dimitris
More information about the pacman-dev
mailing list