Hi Xavier,
Sorry I didn't get back to you right away, but I wanted to read every discussion and make up my mind about the various proposals.

So, let's go:

I think we are more interested in trying to improve the current text based
one rather than switching to a binary database, at least as a first step.
For example, for the sync db, we could directly read from the archive:
http://bugs.archlinux.org/task/8586

Glad to hear that. About the first one: I'm not a tar guru, and I don't know how fast things would be if we read entries from the archive into memory. I've seen there is already some code for it, but benchmarks on real use cases would be better. What I fear is that reading from a tarball many times could be counterproductive (again, I could be completely wrong on that). I think it could be done, if it's really worth it, but I don't know enough about this topic to say whether it would be useful. I'd be glad to hear other voices.

A second point of interest is unifying the format of the meta-info files
inside the package (.PKGINFO) and the ones inside the database (depends /
desc). Having just one format would simplify the code.
We could also reduce the number of files in the sync and local database by
merging them, which would reduce the impact of fragmentation and slow
filesystems.
http://www.archlinux.org/pipermail/pacman-dev/2007-June/008601.html


This would surely be a plus: right now the package directories are a real mess, and having different formats is useless, code-consuming and confusing. I also propose keeping the scriptlets in the database. I clear my cache often to save disk space, and whenever I then try to remove a package, the remove scriptlet is not found. That could be harmful, and it wouldn't take much space. Just a random thought anyway.
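To make the "just one format" idea a bit more concrete, here is a minimal sketch in C (the language alpm is written in) of a parser for a single "key = value" line format, assuming the unified format would look like the one .PKGINFO already uses (pkgname = ..., depend = ..., # for comments). The pkginfo_entry struct and the trim() and parse_pkginfo_line() functions are hypothetical names made up for illustration, not existing alpm API.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: one parsed "key = value" line of the assumed unified format. */
typedef struct {
    char key[64];
    char value[256];
} pkginfo_entry;

/* Strip leading and trailing whitespace in place; returns the trimmed start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s)) {
        s++;
    }
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) {
        end--;
    }
    *end = '\0';
    return s;
}

/* Parse one line (hypothetical helper, not alpm API).
 * Returns 1 and fills *out for a "key = value" line,
 * 0 for blank lines and '#' comments, -1 for malformed lines.
 * Only the first '=' splits the line, so values like "glibc>=2.7" survive. */
static int parse_pkginfo_line(const char *line, pkginfo_entry *out)
{
    char buf[512];
    char *p, *eq, *key, *value;

    snprintf(buf, sizeof(buf), "%s", line);
    p = trim(buf);
    if (*p == '\0' || *p == '#') {
        return 0;
    }
    eq = strchr(p, '=');
    if (eq == NULL) {
        return -1;
    }
    *eq = '\0';
    key = trim(p);
    value = trim(eq + 1);
    if (*key == '\0') {
        return -1;
    }
    snprintf(out->key, sizeof(out->key), "%s", key);
    snprintf(out->value, sizeof(out->value), "%s", value);
    return 1;
}
```

With one parser like this, the same code could read the .PKGINFO inside a package and the entries stored in the database, and multi-valued fields (several depend lines, for example) would simply be repeated keys.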

Otherwise, if we want to experiment with different backends, it would
probably be a good idea to try abstracting the backend code, so that we would
have all specific backend code in the same place, and have the possibility to
easily switch between several ones. That was also discussed ages ago, in the
same link as above:
http://www.archlinux.org/pipermail/pacman-dev/2006-March/005702.html

This is *the* point. Everything we're discussing is useless if we don't abstract things first. If we keep a "dirty" implementation, where backend code is mixed with database file parsing, porting improvements is a royal pain in that place. If we could create a set of functions that take care of handing strings or files to the current db parsing functions in alpm, then changing the current backend, using multiple backends, or having a CHOICE of backend (imagine that! the end of the mother of all discussions in Arch Linux) would surely be easier. Why don't we simply start from this? Let's make alpm a real library, let's abstract the code, and let's create a set of dbbackend functions. If we study the code closely and are willing to do this (a bit boring) part, I can only imagine how many benefits we could get from it.

I have nothing against the text-based database, but if we can "split" the backend part, that could bring a lot of great benefits too. What do you think about that?
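Here is roughly what I mean by a "dbbackend set of functions", as a minimal C sketch: a struct of function pointers that each backend fills in, so the rest of alpm only asks for the raw meta-info text and parses it in one place. Everything here (db_backend, get_pkginfo, the toy in-memory mem_store backend, db_has_package) is hypothetical and only meant to show the shape of the idea; a real interface would need many more operations (listing packages, writing and removing entries, and so on).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend interface: a backend only knows how to hand back
 * the raw meta-info text for a package; parsing stays in one place. */
typedef struct db_backend {
    const char *name;
    /* Return the meta-info text for pkgname, or NULL if not found. */
    const char *(*get_pkginfo)(const struct db_backend *self, const char *pkgname);
    void *data; /* backend-private state */
} db_backend;

/* Toy in-memory backend, standing in for a real one
 * (files on disk, a tar archive, SQLite, ...). */
typedef struct {
    const char *pkgname;
    const char *pkginfo;
} mem_record;

typedef struct {
    const mem_record *records;
    size_t count;
} mem_store;

/* Linear lookup over the in-memory records. */
static const char *mem_get_pkginfo(const db_backend *self, const char *pkgname)
{
    const mem_store *store = self->data;
    size_t i;
    for (i = 0; i < store->count; i++) {
        if (strcmp(store->records[i].pkgname, pkgname) == 0) {
            return store->records[i].pkginfo;
        }
    }
    return NULL;
}

/* Front-end code talks only to the interface, never to a concrete backend. */
static int db_has_package(const db_backend *db, const char *pkgname)
{
    return db->get_pkginfo(db, pkgname) != NULL;
}
```

A files-on-disk backend, a read-from-tarball backend or an SQLite one would then just be different implementations of get_pkginfo(), and switching between them would mean pointing alpm at a different db_backend struct.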

More recently, there was an attempt at an SQLite backend:
http://www.archlinux.org/pipermail/pacman-dev/2008-January/011011.html
As you can see, this raises several problems with migrating the current code
base. And then there is also the problem of migrating the databases.

Yeah, and that demonstrates what I said in the last paragraph. Anyway, even if a bit OT, I think there could be a better solution than SQLite. Amarok with a large (3000+) collection, anyone? It takes ages to build the collection, and switching to MySQL makes things a lot faster. Again, I'm not an SQLite guru, but I have heard many voices saying that it's not as fast as it seems, especially when it holds a lot of entries (and that would be pacman's case). I'd like to hear from someone competent (clarification: competent doesn't mean "OMG text-based sucks, go SQLite because I was told it's faster" trolls; I mean arguments, benchmarks, and maybe a couple of lines of code that show how SQLite handles a large DB).

Well, I hope that throwing out some random ideas about the backend that were
discussed here wasn't too confusing.
But I think there is definitely an interest in improving the backend in
one way or another.

No, that wasn't confusing at all :) I'm glad to hear that pacman developers know the current situation and want to do something about it. Well, rumors aren't always true: I expected a completely different answer here. I'd be glad if we could come up with a plan and make things better; on our side (me and 2-3 other people), there is real interest in making alpm even better.

Sorry for the long mail.
Cheers,
Dario