On Mon, Nov 9, 2009 at 8:54 AM, Ciprian Dorin, Craciun <ciprian.craciun@gmail.com> wrote:
On Mon, Nov 9, 2009 at 5:50 AM, Dan McGee <dpmcgee@gmail.com> wrote:
On Sat, Oct 31, 2009 at 11:37 AM, solsTiCe d'Hiver <solstice.dhiver@gmail.com> wrote:
hi. i wanted to make a new bsd db4 back-end for alpm, but i never reached my goal, and will not. all i have is proof-of-concept code that uses the bsd db4 api to store pmpkg_t, and i wanted to share it with anyone interested.
i have coded 3 utilities:
- one that converts pacman's db into a bsd db4 file for each repo
- one that reads that new db format to perform queries as pacman does
- one that converts a tarball db (taken from a sync mirror) directly into a bsd db4 file
if this proves useful for someone, great. More info at http://pagesperso-orange.fr/solstice.dhiver/alpmdb4.html and in the README of http://pagesperso-orange.fr/solstice.dhiver/data/readdb.tar.gz
Nice work on actually doing something here and sharing the code! Thanks, as it might just make some wheels turn for some other people here on the list.
I grabbed your code and took it for a spin. I liked the fact that you had a README and all; I didn't have much trouble at all getting it running. I even found a real hotspot in readdb: add_sorted is a killer in a tight loop, and it makes a lot more sense to do all your adds followed by a single alpm_list_msort().
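Concretely, something like this; a minimal sketch assuming the list holds plain strings, with a comparator and loop that are illustrative rather than readdb's actual code:

#include <string.h>
#include <alpm_list.h>

/* illustrative comparator; readdb would compare whatever it stores */
static int cmp_name(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

alpm_list_t *collect_names(char **names, size_t count)
{
    alpm_list_t *list = NULL;
    size_t i;
    for(i = 0; i < count; i++) {
        /* cheap append instead of alpm_list_add_sorted() per item */
        list = alpm_list_add(list, names[i]);
    }
    /* one O(n log n) merge sort after all the adds */
    return alpm_list_msort(list, alpm_list_count(list), cmp_name);
}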
For others on the list who haven't looked at it yet:
* Raw speed alone, this wins. Of course, pacman does a lot more (this isn't parsing config files, reading mirrorlists, etc.), but a "-Ss pacman" search yielded times of 0.083 seconds vs. 0.282 seconds (in the hot-cache case, of course).
* BDB stores key/value pairs, for those who aren't familiar with it. The database layout could probably be simplified a bit: we could pack many attributes into one key/value pair for fields we don't use all that often, or that we never search by but only look up (see the sketch after this list).
* It didn't take all that much code to do this. That is encouraging.
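On the packing idea, here is a rough sketch with the DB4 C API of storing a bundle of rarely-queried attributes under a single key. The "pkgname/misc" key layout and the opaque blob are assumptions for illustration, not the layout readdb actually uses:

#include <string.h>
#include <stdio.h>
#include <db.h>

int store_misc_attrs(DB *dbp, const char *pkgname,
                     const void *blob, size_t bloblen)
{
    DBT key, data;
    char keybuf[256];
    int ret;

    /* one key per package for the bundled "misc" attributes */
    snprintf(keybuf, sizeof(keybuf), "%s/misc", pkgname);

    memset(&key, 0, sizeof(DBT));
    memset(&data, 0, sizeof(DBT));
    key.data = keybuf;
    key.size = (u_int32_t)(strlen(keybuf) + 1);
    data.data = (void *)blob;
    data.size = (u_int32_t)bloblen;

    if((ret = dbp->put(dbp, NULL, &key, &data, 0)) != 0) {
        dbp->err(dbp, ret, "DB->put");
    }
    return ret;
}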
What do people think about non-filesystem-based backends? There are several options we could think about:
* BSD DB4, similar to what was done here (fast and pretty simple)
* SQLite, which might give us a bit more flexibility for querying/lookup
* Direct tarfile parsing each time, no conversion needed but likely rather inefficient
* ???
The biggest reason always raised in the past against non-file backends was corruption. If you get a corrupted localdb or something you can't recover from, you are in a bad place. With files, you have the lowest barrier to recovery. With a more binary format, it is a lot trickier. Thoughts?
-Dan
Interesting. A quicker pacman should be a positive thing, right? :)
I vote for BerkeleyDB, because I've used it in previous projects, and besides performance it also brings data integrity and recoverability. (For example, what happens if a power outage hits during an upgrade, just when pacman is writing to the file system? With BerkeleyDB we get atomic operations without a problem.)
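To make the atomicity point concrete, a hedged sketch using DB4's transaction API; the environment and database handles are assumed to be opened elsewhere with transactions enabled (DB_INIT_TXN):

#include <db.h>

int atomic_put(DB_ENV *env, DB *dbp, DBT *key, DBT *data)
{
    DB_TXN *txn = NULL;
    int ret;

    if((ret = env->txn_begin(env, NULL, &txn, 0)) != 0) {
        return ret;
    }
    if((ret = dbp->put(dbp, txn, key, data, 0)) != 0) {
        /* roll back; a power outage here leaves the db untouched */
        txn->abort(txn);
        return ret;
    }
    /* the write is durable once commit returns 0 */
    return txn->commit(txn, 0);
}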
Another note: BerkeleyDB also supports indices, allowing us to search more efficiently for keys based on values (i.e., searching packages by fields). Newer versions of BerkeleyDB also have a kind of SQL-like language for defining structures. [1]
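For the record, a minimal sketch of such a secondary index via DB->associate(); the record layout (a field stored as the first NUL-terminated string in the value blob) is an assumption for illustration:

#include <string.h>
#include <db.h>

/* derive the secondary key (e.g. a package field) from the record */
static int get_field(DB *sdbp, const DBT *pkey, const DBT *pdata, DBT *skey)
{
    memset(skey, 0, sizeof(DBT));
    skey->data = pdata->data;   /* assumed: field at start of the blob */
    skey->size = (u_int32_t)(strlen((const char *)pdata->data) + 1);
    return 0;
}

int index_by_field(DB *primary, DB *secondary)
{
    /* after this, DB->get() on `secondary` finds packages by that field */
    return primary->associate(primary, NULL, secondary, get_field, 0);
}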
About backups: BerkeleyDB ships db_dump and db_load tools to dump a database to a flat-text format and reload it, so backups should be very easy.
So if someone wants to implement this feature, I could also help.
Ciprian.
[1] http://www.oracle.com/technology/pub/articles/seltzer-berkeleydb-sql.html
Sorry for the wrong link (I searched for it in a hurry on Google). It's the following one: http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/...