[pacman-dev] Re: Binary Database

Mon Mar 10 03:54:15 EDT 2008

Hi Xavier,
Sorry for not getting back to you right away, but I wanted to read every
discussion and make up my mind about the various proposals.

So, let's go:

> I think we are more interested in trying to improve the current text based
> one rather than switching to a binary database, at least as a first step.
> For example, for the sync db, we could directly read from the archive :
> http://bugs.archlinux.org/task/8586
>
>
Glad to hear it. About the first one: I'm not a tar guru, and I don't know
how fast things would be if we read the database from the archive into
memory. I've seen there is already some code for it, but benchmarks on
real use cases would be better. My fear is that reading from a tar'ed
archive too many times could be counterproductive (again, I could be
completely wrong on that). It may well be doable if it's really worth it,
but I don't know enough about the subject to say whether it would be
useful. I'd be glad to hear other opinions.
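
To make the "archive to memory" idea concrete, here is a minimal sketch in
Python rather than alpm's C. It is not the code from the bug report: the
function names are made up, and the layout (one "<name>-<version>/desc"
entry per package) is an assumption for illustration. The point is that the
archive is walked once, sequentially, instead of being reopened for every
package.

```python
import io
import tarfile


def read_sync_db(archive_bytes):
    """Return {package_dir: desc_text} from a tar'ed sync db held in memory.

    Hypothetical helper: assumes entries are named "<pkgdir>/desc".
    """
    entries = {}
    # One sequential pass over the archive; no per-package files on disk.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith("/desc"):
                pkg_dir = member.name.rsplit("/", 1)[0]
                data = tar.extractfile(member).read()
                entries[pkg_dir] = data.decode("utf-8")
    return entries


def build_demo_archive(packages):
    """Build a small gzip'ed tar in memory (made-up data, demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for pkg_dir, desc in packages.items():
            data = desc.encode("utf-8")
            info = tarfile.TarInfo(name=pkg_dir + "/desc")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Timing read_sync_db() against a walk over the equivalent directory tree
would be one way to get the real-use-case numbers asked for above.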

> A second point of interest is unifying the format of the meta-info files
> inside the package (.PKGINFO) and the ones inside the database (depends /
> desc). Having just one format would simplify the code.
> We could also reduce the number of files in the sync and local database by
> merging them, which would reduce the impact of fragmentation and slow
> filesystems.
> http://www.archlinux.org/pipermail/pacman-dev/2007-June/008601.html

This would surely be a plus. Right now the package dirs are a real mess,
and having different formats is useless, code-consuming and confusing.
I also propose keeping the scriptlets in the database. I often clear my
cache to save disk space, and then whenever I try to remove a package, the
remove scriptlet is not found. That could be harmful. It wouldn't take a
lot of space. Just a random thought anyway.

> Otherwise, if we want to experiment with different backends, it would
> probably be a good idea to try abstracting the backend code, so that we would
> have all specific backend code in the same place, and have the possibility to
> easily switch between several ones. That was also discussed ages ago, in the
> same link as above :
> http://www.archlinux.org/pipermail/pacman-dev/2006-March/005702.html
>
>
This is *the* point. Everything we're saying is useless if we don't abstract
things first. If we keep a "dirty" implementation, where backend code is
mixed with the parsing of the database files, porting improvements is a
royal pain. If we created a set of functions that hand strings or files to
the current db functions in alpm for parsing, then changing the current
backend, supporting multiple backends, or even offering a CHOICE of backend
(imagine that! the end of the mother of all discussions in Arch Linux)
would surely be easier. Why don't we simply start from this? Let's make
alpm a real library, let's abstract the code, and let's create a dbbackend
set of functions. If we study the code closely and are willing to do this
(a bit boring) part, I can only imagine how many benefits we would get
from it.
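
A rough sketch of what I mean by a dbbackend set of functions, written in
Python for brevity rather than in alpm's C. Every name here (DbBackend,
DirBackend, MemoryBackend, parse_desc, load_all) is hypothetical and none of
it is existing alpm API. The backend only hands out raw text; the parser
never knows where that text came from.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class DbBackend(ABC):
    """Hypothetical interface: storage only, no parsing."""

    @abstractmethod
    def list_packages(self):
        """Return the names of all package entries."""

    @abstractmethod
    def read_desc(self, name):
        """Return the raw desc text for one package entry."""


class DirBackend(DbBackend):
    """One directory per package, each holding a 'desc' file."""

    def __init__(self, root):
        self.root = Path(root)

    def list_packages(self):
        return sorted(p.name for p in self.root.iterdir() if p.is_dir())

    def read_desc(self, name):
        return (self.root / name / "desc").read_text(encoding="utf-8")


class MemoryBackend(DbBackend):
    """Entries kept in a dict; stands in for any other storage."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def list_packages(self):
        return sorted(self.entries)

    def read_desc(self, name):
        return self.entries[name]


def parse_desc(text):
    """Parse '%KEY%' headers followed by value lines into {key: [values]}."""
    fields = {}
    key = None
    for line in text.splitlines():
        if len(line) > 2 and line.startswith("%") and line.endswith("%"):
            key = line.strip("%")
            fields[key] = []
        elif line and key is not None:
            fields[key].append(line)
    return fields


def load_all(backend):
    """Backend-agnostic: works with any DbBackend implementation."""
    return {n: parse_desc(backend.read_desc(n)) for n in backend.list_packages()}
```

Swapping backends then means changing only the object passed to load_all();
the parsing code stays untouched.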

I have nothing against the text-based database, but if we can "split" the
backend part, it could have a lot of great benefits too. What do you think
about that?

> More recently, there was an attempt of a sqlite backend :
> http://www.archlinux.org/pipermail/pacman-dev/2008-January/011011.html
> As you can see, this raises several problems of migrating the current code
> base. And then there is also the problem of migrating the databases.
>
>
Yeah, and that demonstrates what I said in the last paragraph. Anyway, even
if it's a bit OT, I think there could be a better solution than SQLite.
Amarok with a large (3000+) collection, anyone? It takes ages to build the
collection, and switching to MySQL makes things a lot faster. Again, I'm no
SQLite guru, but I have heard many people say that it's not as fast as it
seems, especially with a lot of entries (and that would be pacman's case).
I'd like to hear from someone competent (clarification: competent doesn't
mean "OMG text based sucks go SQLite because I was told it is faster"
trolls; I mean arguments, benchmarks, and maybe a couple of lines of code
that show how SQLite can handle a large DB).
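
As a starting point for that kind of measurement, a minimal sketch using
Python's standard sqlite3 module. The table layout, the package names and
the helper names are all made up for the test; this is not a proposed
schema and it proves nothing by itself. It only shows how a timing could be
taken on a database of any chosen size.

```python
import sqlite3
import time


def build_db(n_packages):
    """Create an in-memory db with n_packages dummy rows (made-up schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT, descr TEXT)"
    )
    rows = (
        ("pkg%05d" % i, "1.0-%d" % (i % 9 + 1), "dummy package %d" % i)
        for i in range(n_packages)
    )
    with conn:  # all inserts in a single transaction
        conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", rows)
    return conn


def time_lookups(conn, names):
    """Look up each name via the primary key; return (hits, seconds)."""
    start = time.perf_counter()
    found = 0
    for name in names:
        row = conn.execute(
            "SELECT version FROM packages WHERE name = ?", (name,)
        ).fetchone()
        if row is not None:
            found += 1
    return found, time.perf_counter() - start
```

Running build_db() with increasing sizes and comparing time_lookups()
against the same lookups on the text database would give actual numbers to
argue about instead of rumors.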

> Well, I hope that throwing out some random ideas about the backend that were
> discussed here wasn't too confusing.
> But I think there is definitively an interest for improving the backend in
> one way or another.
>
>
No, that wasn't confusing at all :) I'm glad to hear that the pacman
developers know the current situation and want to do something about it.
Rumors don't always tell the truth; I expected a completely different answer
here. I would be glad if we could come up with a plan and make things better.
On our side (me and 2-3 other people) there is real interest in making alpm
even better.

Sorry for the long mail
Cheers,
Dario