[pacman-dev] filesystem performance

Mon Jan 28 08:04:45 EST 2008

> Hey.
> I got interested in this last week, and started breaking libalpm
> apart and trying to fit in an sqlite-ish implementation. The code was new
> to me and I didn't have any other consideration but to get something
> working as fast as possible, so the result is nasty.
> Basically, I first commented out treename from struct __pmdb_t so the
> compiler would tell me all (or most of) the places where the old db is
> used, and either disabled those functions or reimplemented them with
> sqlite. Mainly the additions are in be_sqlite.c (renamed from be_files.c),
> where _alpm_db_open opens the sqlite db, and db.c: _alpm_db_search,
> which executes a simple SELECT * FROM packages WHERE name LIKE
> "%foo%" and populates the return list.
> 
> So, I implemented about 40% of pacman -Ss. If someone cares about
> timings (and you probably shouldn't, since my version doesn't do
> quite the same thing), here they are:

Huh, this "sqlite backend idea" is quite popular nowadays. IMHO more
people are working on its implementation, so I suggest co-operation ;-)
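
For anyone comparing notes, the search described in the quoted text can
be sketched in a few lines. This is only an illustration in Python's
stdlib sqlite3 module, not the code from the patch: the packages table
layout, its column names and the search_packages helper are assumptions
made up for the example.

```python
import sqlite3


def search_packages(conn, term):
    """Return (name, version) rows whose name contains `term`."""
    # Bind the pattern as a parameter instead of pasting it into the SQL.
    pattern = f"%{term}%"
    cur = conn.execute(
        "SELECT name, version FROM packages WHERE name LIKE ? ORDER BY name",
        (pattern,),
    )
    return cur.fetchall()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT)")
conn.executemany(
    "INSERT INTO packages (name, version) VALUES (?, ?)",
    [("foo", "1.0-1"), ("libfoo", "2.3-1"), ("bar", "0.9-2")],
)
conn.commit()

matches = search_packages(conn, "foo")
for name, version in matches:
    print(name, version)
```

Note that this only matches on the package name, so, as the quoted text
says, it is not the full pacman -Ss behaviour.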

> (running pacman -Ss g three times after a reboot)
> 
> pacman-3.1: 41.866s, 0.765s, 0.762s
> mutilated-pacman-with-sqlite: 1.036s, 0.131s, 0.133s
> 
> pacman-3.1 shows probably rather worse performance in the worst run
> than it usually would, since my /var/ was 99% full at the time :)
> 
> 
> Anyway. The timing is not the most important issue, I think. libalpm
> has a lot of code that is merely there because C sucks for things
> like string and directory manipulation. And we need to do a lot of
> that. My humble guess is that a proper implementation of libalpm done
> with sqlite could be at least 50% smaller with a more understandable
> codebase.
> 
> If we want to do this, then how? Some options off the top of my head:
> 
> 1) for the parts that deal with the db, start from scratch. With the
> talent you guys have, shouldn't be a problem? Libalpm isn't very
> large...
> 
> 2) for the development phase, consider sqlite to be a cache for the
> filedb, and gradually move each piece of code to the other side. This
> way, the legacy code would weigh us down a bit, but the change might
> be more sustainable.
> 
> 3) Just hack in the functionality somehow, anyhow.
> 
> 4) Refactor alpm to support different backends and implement whatever
> backend du jour.
> 
> 
> Ideas, praise, flames welcome. Code available by request.
> 

First of all, I appreciate your work/attempt to make pacman better.

Well, I'm pretty sure that
1. sqlite is faster, and
2. it reduces the codebase (finding replacements, checking for provisions,
groups etc. can each be reduced to a simple sql query),
but we haven't got reassuring answers to our "database corruption" fear.
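
To make point 2 a bit more concrete, here is a rough sketch of what
such lookups might look like. It is written in Python with the stdlib
sqlite3 module purely for brevity. The table names (pkg_provides,
pkg_replaces, pkg_groups), their columns, the helper functions and the
package names are all hypothetical; nothing here reflects an actual
libalpm schema.

```python
import sqlite3

# Hypothetical schema: one row per (package, provision/replaced/group) pair.
SCHEMA = """
CREATE TABLE packages     (name TEXT PRIMARY KEY, version TEXT);
CREATE TABLE pkg_provides (package TEXT, provision TEXT);
CREATE TABLE pkg_replaces (package TEXT, replaced TEXT);
CREATE TABLE pkg_groups   (package TEXT, grp TEXT);
"""


def _column(conn, sql, arg):
    """Run a one-parameter query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql, (arg,))]


def providers_of(conn, provision):
    return _column(
        conn,
        "SELECT package FROM pkg_provides WHERE provision = ? ORDER BY package",
        provision,
    )


def replacements_for(conn, old_name):
    return _column(
        conn,
        "SELECT package FROM pkg_replaces WHERE replaced = ? ORDER BY package",
        old_name,
    )


def group_members(conn, group):
    return _column(
        conn,
        "SELECT package FROM pkg_groups WHERE grp = ? ORDER BY package",
        group,
    )


# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO pkg_provides VALUES (?, ?)",
                 [("mta-b", "mail-server"), ("mta-a", "mail-server")])
conn.executemany("INSERT INTO pkg_replaces VALUES (?, ?)",
                 [("newpkg", "oldpkg")])
conn.executemany("INSERT INTO pkg_groups VALUES (?, ?)",
                 [("tool-b", "devtools"), ("tool-a", "devtools")])
conn.commit()

print(providers_of(conn, "mail-server"))
print(replacements_for(conn, "oldpkg"))
print(group_members(conn, "devtools"))
```

Each lookup is a single indexed-or-scannable query here, where the
file-based db needs to walk directories and parse entries by hand.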

So I would like to ask you to convince us that sqlite is safe (well,
personally I have very limited sql[ite] knowledge now). Please try to
understand that for most of the pacman devels/contributors "stability"
is more important than speed [obviously corrupted localdb == unusable
system]. That's why I guess that this idea won't be accepted until
we can see proof that the new db back-end is as safe
as the old one (or even safer ;-P).
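
As a starting point for that discussion, here is a small sketch of the
kind of behaviour I would want demonstrated: a multi-row update that
fails half-way should leave the db exactly as it was. It uses Python's
stdlib sqlite3 module, and the packages table and install_packages
helper are hypothetical. It only shows rollback on an error inside a
transaction plus sqlite's own PRAGMA integrity_check; it says nothing
by itself about power loss or disk failure, which would need separate
testing.

```python
import sqlite3


def install_packages(conn, packages):
    """Insert all packages in one transaction; on error none are kept."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.executemany(
                "INSERT INTO packages (name, version) VALUES (?, ?)",
                packages,
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical local db, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT NOT NULL)"
)
conn.commit()

ok = install_packages(conn, [("foo", "1.0-1"), ("bar", "2.0-1")])

# The second batch fails on the duplicate "foo"; "baz" must not survive.
failed = install_packages(conn, [("baz", "0.1-1"), ("foo", "1.1-1")])

rows = conn.execute(
    "SELECT name, version FROM packages ORDER BY name"
).fetchall()
integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
print(ok, failed, rows, integrity)
```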

Bye