[pacman-dev] proof of concept code with bsd db4

Sun Nov 8 22:50:25 EST 2009

On Sat, Oct 31, 2009 at 11:37 AM, solsTiCe d'Hiver
<solstice.dhiver at gmail.com> wrote:
> hi.
> i wanted to make a new bsd db4 back-end for alpm. but i never reached my
> goal. and will not
> all i have is a proof of concept code that use bsd db4 api to store
> pmpkg_t and wanted to share it with anyone (interested ?)
>
> i have coded 3 utilities:
> - one that converts pacman's db into a bsd db4 file for each repo
> - one that reads that new db format to perform query as pacman does
> - one that converts directly a tarball db (taken from a sync mirror)
> into a bsd db4 file
>
> if this proves useful for someone, great.
> More info at http://pagesperso-orange.fr/solstice.dhiver/alpmdb4.html
> and in the README of
> http://pagesperso-orange.fr/solstice.dhiver/data/readdb.tar.gz

Nice work on actually doing something here and sharing the code!
Thanks, as it might just make some wheels turn for some other people
here on the list.

I grabbed your code and took it for a spin. I liked the fact that you
had a README and all; I didn't have much trouble at all getting it
running. I even found a real hotspot in readdb: add_sorted is a killer
in a tight loop, and it makes a lot more sense to do all your adds
first, followed by a single alpm_list_msort().
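
To illustrate the add_sorted point, here is a minimal sketch in plain C.
It does not use libalpm: node_t, list_add_sorted(), list_push(),
list_merge() and list_msort() are hypothetical stand-ins written for this
example, not the real alpm_list_t API and not the code from readdb.
Inserting in sorted position walks the list on every call, so n inserts
cost O(n^2); pushing everything first and sorting once costs O(n log n).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal list node; a stand-in for alpm_list_t. */
typedef struct node {
    const char *name;
    struct node *next;
} node_t;

/* Insert while keeping the list sorted: O(n) per call,
 * so O(n^2) when called for every package in a loop. */
node_t *list_add_sorted(node_t *head, const char *name)
{
    node_t *n = malloc(sizeof *n);
    if (!n)
        return head;
    n->name = name;
    node_t **pp = &head;
    while (*pp && strcmp((*pp)->name, name) <= 0)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
    return head;
}

/* Prepend in O(1); ordering is fixed up by one sort at the end. */
node_t *list_push(node_t *head, const char *name)
{
    node_t *n = malloc(sizeof *n);
    if (!n)
        return head;
    n->name = name;
    n->next = head;
    return n;
}

/* Merge two already-sorted lists into one sorted list. */
node_t *list_merge(node_t *a, node_t *b)
{
    node_t *head = NULL, **tail = &head;
    while (a && b) {
        if (strcmp(a->name, b->name) <= 0) {
            *tail = a;
            a = a->next;
        } else {
            *tail = b;
            b = b->next;
        }
        tail = &(*tail)->next;
    }
    *tail = a ? a : b;
    return head;
}

/* Top-down merge sort over the whole list: O(n log n). */
node_t *list_msort(node_t *head)
{
    if (!head || !head->next)
        return head;
    /* Find the midpoint with slow/fast pointers and split there. */
    node_t *slow = head, *fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    node_t *right = slow->next;
    slow->next = NULL;
    return list_merge(list_msort(head), list_msort(right));
}
```

Both approaches end with the same ordering; the difference only shows up
in the time spent once the list holds a few thousand packages.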

For others on the list who haven't looked at it yet:
* On raw speed alone, this wins. Of course, pacman does a lot more (this
isn't parsing conf files, reading mirrorlists, etc.), but a "-Ss pacman"
search took 0.083 seconds with readdb vs. 0.282 seconds with pacman (in
the hot cache case, of course).
* BDB uses key/value pairs, for those who aren't familiar. The database
layout could probably be simplified a bit: we could pack many
attributes into one key/value pair, namely those we don't use all that
often, or that we never search by and only look up.
* It didn't take all that much code to do this. That is encouraging.
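
As a rough sketch of the "pack many attributes into one key/value pair"
idea, the plain C below concatenates several rarely-used fields into one
NUL-separated buffer and reads a single field back out. It does not call
the BDB API; pack_fields() and unpack_field() are hypothetical helpers
made up for this example, and the field choice is an assumption. In a
real backend the buffer and its length would presumably become the data
and size of one stored value.

```c
#include <stddef.h>
#include <string.h>

/* Pack `count` strings into `buf` as consecutive NUL-terminated fields.
 * Returns the number of bytes written, or 0 if `buf` is too small
 * (0 is also returned when count is 0). */
size_t pack_fields(char *buf, size_t buflen,
                   const char *const *fields, size_t count)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(fields[i]) + 1; /* include the NUL */
        if (len > buflen - used)
            return 0;
        memcpy(buf + used, fields[i], len);
        used += len;
    }
    return used;
}

/* Return a pointer to field number `index` inside a packed buffer of
 * `len` bytes, or NULL if the index is out of range or the record is
 * truncated. */
const char *unpack_field(const char *buf, size_t len, size_t index)
{
    size_t off = 0;
    while (off < len) {
        const char *field = buf + off;
        const char *end = memchr(field, '\0', len - off);
        if (!end)
            return NULL; /* no terminator: truncated record */
        if (index == 0)
            return field;
        index--;
        off = (size_t)(end - buf) + 1;
    }
    return NULL;
}
```

The trade-off is that packed fields can only be fetched by the record's
key, not searched individually, which is why this only suits attributes
that are looked up rather than queried.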

What do people think about non-file-system-based backends? There are
several options we could think about:
* BSD DB4, similar to what was done here (fast and pretty simple)
* SQLite, which might give us a bit more flexibility for querying/lookup
* Direct tarfile parsing each time, no conversion needed but likely
rather inefficient
* ???

The biggest reason always raised in the past against non-file backends
was corruption. If you get a corrupted localdb or something you can't
recover from, you are in a bad place. With files, you have the lowest
barrier to recovery. With a more binary format, it is a lot trickier.
Thoughts?

-Dan