[pacman-dev] proof of concept code with bsd db4

Mon Nov 9 01:57:15 EST 2009

On Mon, Nov 9, 2009 at 8:54 AM, Ciprian Dorin, Craciun
<ciprian.craciun at gmail.com> wrote:
> On Mon, Nov 9, 2009 at 5:50 AM, Dan McGee <dpmcgee at gmail.com> wrote:
>> On Sat, Oct 31, 2009 at 11:37 AM, solsTiCe d'Hiver
>> <solstice.dhiver at gmail.com> wrote:
>>> hi.
>>> i wanted to make a new bsd db4 back-end for alpm. but i never reached my
>>> goal, and will not.
>>> all i have is proof of concept code that uses the bsd db4 api to store
>>> pmpkg_t, and i wanted to share it with anyone (interested ?)
>>>
>>> i have coded 3 utilities:
>>> - one that converts pacman's db into a bsd db4 file for each repo
>>> - one that reads that new db format to perform queries as pacman does
>>> - one that converts directly a tarball db (taken from a sync mirror)
>>> into a bsd db4 file
>>>
>>> if this proves useful for someone, great.
>>> More info at http://pagesperso-orange.fr/solstice.dhiver/alpmdb4.html
>>> and in the README of
>>> http://pagesperso-orange.fr/solstice.dhiver/data/readdb.tar.gz
>>
>> Nice work on actually doing something here and sharing the code!
>> Thanks, as it might just make some wheels turn for some other people
>> here on the list.
>>
>> I grabbed your code and took it for a spin. I liked the fact that you
>> had a README and all, I didn't have much trouble at all getting it
>> running. I even found a real hotspot in readdb (add_sorted is a killer
>> in a tight loop; it makes a lot more sense to do all your adds
>> followed by an alpm_list_msort()).
>>
>> For others on the list who haven't looked at it yet:
>> * Raw speed alone, this wins. Of course, pacman does a lot more (this
>> isn't parsing conf files, reading mirrorlists, etc) but a "-Ss pacman"
>> search yielded times of 0.083 seconds vs 0.282 seconds (in the hot
>> cache case, of course).
>> * BDB uses key/value pairs for those who aren't familiar. The database
>> layout could probably be simplified a bit- we could pack many
>> attributes into one key/value pair for those we don't use all that
>> often, or never search by but only do lookups.
>> * It didn't take all that much code to do this. That is encouraging.
>>
>> What do people think about non-file-system-based backends? There are
>> several options we could think about:
>> * BSD DB4, similar to what was done here (fast and pretty simple)
>> * SQLite, which might give us a bit more flexibility for querying/lookup
>> * Direct tarfile parsing each time, no conversion needed but likely
>> rather inefficient
>> * ???
>>
>> The biggest reason always raised in the past against non-file backends
>> was corruption. If you get a corrupted localdb or something you can't
>> recover from, you are in a bad place. With files, you have the lowest
>> barrier to recovery. With a more binary format, it is a lot trickier.
>> Thoughts?
>>
>> -Dan
>
>
>    Interesting. A quicker pacman should be a positive thing, right? :)
>
>    I vote for BerkeleyDB, because I've used it in previous projects,
> and besides performance it also brings data integrity and
> recoverability. (For example what happens if a power outage happens
> during pacman upgrading, just when pacman is writing its file system?
> In the case of BerkeleyDB we have atomic operations without a
> problem.)
>
>    Another note: BerkeleyDB also supports indices, thus allowing us
> to more efficiently search for keys based on values (searching
> packages by fields). Also newer versions of BerkeleyDB have a kind of
> SQL-like language for defining structures. [1]
>
>    About backups, there is a tool to dump and load a database, thus
> backups should be very easy.
>
>    So if someone needs some help with implementing this feature I
> could also help.
>
>    Ciprian.
>
>    [1] http://www.oracle.com/technology/pub/articles/seltzer-berkeleydb-sql.html

    Sorry for the wrong link (I searched for it in a hurry on Google).
It's the following one:
    http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/db_sql.html
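
    On the add_sorted hotspot mentioned in the quoted review: a rough,
self-contained sketch of why "add everything, then sort once" tends to
beat a sorted insert in a tight loop. Note that node_t, list_add_sorted,
list_push and list_msort below are hypothetical stand-ins written for
this illustration only; they are not libalpm's alpm_list_t API and not
the code from readdb. A sorted insert walks the list on every add
(roughly O(n^2) for n adds), while n cheap adds followed by one merge
sort is roughly O(n log n).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a package list: a singly linked list of names. */
typedef struct node {
    const char *name;
    struct node *next;
} node_t;

static node_t *node_new(const char *name)
{
    node_t *n = malloc(sizeof(*n));
    if (n == NULL)
        abort();
    n->name = name;
    n->next = NULL;
    return n;
}

/* Sorted insert: walks the list on every call, so n adds cost O(n^2). */
static node_t *list_add_sorted(node_t *head, const char *name)
{
    assert(name != NULL);
    node_t *n = node_new(name);
    node_t **link = &head;
    while (*link != NULL && strcmp((*link)->name, name) <= 0)
        link = &(*link)->next;
    n->next = *link;
    *link = n;
    return head;
}

/* Unsorted add: prepend in O(1); ordering is fixed up later by the sort. */
static node_t *list_push(node_t *head, const char *name)
{
    assert(name != NULL);
    node_t *n = node_new(name);
    n->next = head;
    return n;
}

/* Merge two already-sorted lists into one sorted list. */
static node_t *merge(node_t *a, node_t *b)
{
    node_t dummy = { NULL, NULL };
    node_t *tail = &dummy;
    while (a != NULL && b != NULL) {
        if (strcmp(a->name, b->name) <= 0) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;
    return dummy.next;
}

/* Merge sort over the whole list: O(n log n), done once after all adds. */
static node_t *list_msort(node_t *head)
{
    if (head == NULL || head->next == NULL)
        return head;
    /* Split the list in half with slow/fast pointers. */
    node_t *slow = head, *fast = head->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    node_t *right = slow->next;
    slow->next = NULL;
    return merge(list_msort(head), list_msort(right));
}
```

    Both approaches end with the same ordering; the difference is only
in how much list walking happens along the way. In the loop that reads
records out of the database one would call list_push() for each record
and list_msort() once at the end, instead of list_add_sorted() per
record.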