On Sat, 12 Dec 2009, Nagy Gabor wrote:
On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
Regarding the stat() and access() operations, I finally found out exactly why they happen:
In case of a corrupted db, the sync directory, for example, might contain plain files instead of subdirectories. So _alpm_db_populate() calls stat() on each entry just to make sure it is a directory. However, stat()ing thousands of entries is too high a price to pay for that. Similarly, access() checks that the entry is accessible by the user.
In the attached patch I have just removed the relevant lines, with the following rationale: in the rare case of a corrupted db, even if we do open("sync/not_a_dir/depends"), it will still fail and we'll catch the failure there. There is no need to investigate the cause further; we just write a message like "couldn't access sync/not_a_dir/depends".
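To illustrate the rationale above, here is a minimal sketch, not the code from the patch. The helper name open_entry_file() and the fixed path buffer size are assumptions made for this example only. It opens the file directly with no prior stat() or access(), and reports the failure at that point:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not libalpm code: open <dbpath>/<entry>/<file>
 * directly, without a prior stat() or access() on <entry>.
 * Returns the open stream, or NULL after printing a message. */
FILE *open_entry_file(const char *dbpath, const char *entry,
                      const char *file)
{
    char path[4096]; /* assumed upper bound for this sketch */
    int n = snprintf(path, sizeof(path), "%s/%s/%s", dbpath, entry, file);

    if (n < 0 || (size_t)n >= sizeof(path)) {
        fprintf(stderr, "path too long: %s/%s/%s\n", dbpath, entry, file);
        return NULL;
    }

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        /* If <entry> is a regular file rather than a directory, the
         * open fails here, so the corrupted db is still detected. */
        fprintf(stderr, "couldn't access %s: %s\n", path, strerror(errno));
    }
    return fp;
}
```

With this shape, a healthy db costs one open() per file that is actually read, and a corrupted entry costs the same single failing call.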
By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before running, I measure a nice performance boost on my old laptop: "pacman -Q gdb" time is reduced from about 7s to 2.5s.
Hm. This is a nice time boost... Did you test this with other operations, too?
I didn't time it, but strace shows this improvement applies to -Qi, -Si and -Su as well. The gain is less visible there, however, because all these operations actually read() thousands of files (depends, desc), which is much worse than stat(). :-)
What do you think? Is it possible to remove those checks?
Dimitris
The best solution would be to rewrite our whole database crap as Dan said. I am pretty sure that this patch would not cause any harm irl, but
Because I really like the ease of use of the current format, I'll try improving things with minimal changes to it. If we can avoid a complete backend rewrite with minor changes, that is a good thing, isn't it?
our code would become a little bit more dangerous: as I see it, db_read(INFRQ_BASE) would become a ~NOP function and db_populate would become a simple "ls" function (the only remaining sanity check is splitname).
Exactly! Just a simple ls should be necessary, that was my initial motivation. And I have thought of a way to even avoid that readdir(), but I should get some measurements first. Dimitris
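For reference, a populate step reduced to "ls" plus a name check could look roughly like the sketch below. It is an illustration under assumptions, not libalpm code: split_entry() is a hypothetical stand-in for splitname, it assumes entries are named "name-version-release", and list_db() is likewise a made-up name.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a splitname-style check: split
 * "name-version-release" at the second-to-last hyphen.
 * Returns 0 on success, -1 if the entry does not have that shape. */
int split_entry(const char *entry, char *name, size_t nlen,
                char *version, size_t vlen)
{
    const char *rel = strrchr(entry, '-');
    if (rel == NULL || rel == entry) {
        return -1;
    }
    const char *ver = rel - 1;
    while (ver > entry && *ver != '-') {
        ver--;
    }
    /* Reject a missing second hyphen or an empty name, version or release. */
    if (ver == entry || ver + 1 == rel || rel[1] == '\0') {
        return -1;
    }
    size_t n = (size_t)(ver - entry);
    if (n >= nlen || strlen(ver + 1) >= vlen) {
        return -1;
    }
    memcpy(name, entry, n);
    name[n] = '\0';
    strcpy(version, ver + 1);
    return 0;
}

/* List the entries under dbpath using readdir() only, with no stat()
 * or access() per entry. Returns the number of valid entries, or -1
 * if the directory cannot be opened. */
int list_db(const char *dbpath)
{
    DIR *dir = opendir(dbpath);
    if (dir == NULL) {
        return -1;
    }
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char name[256], version[256];
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
            continue;
        }
        if (split_entry(ent->d_name, name, sizeof(name),
                        version, sizeof(version)) != 0) {
            fprintf(stderr, "skipping invalid entry: %s\n", ent->d_name);
            continue;
        }
        printf("%s %s\n", name, version);
        count++;
    }
    closedir(dir);
    return count;
}
```

Whether an entry really is a directory is then only discovered later, when one of its files is opened.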