[pacman-dev] libalpm and backends

Wed Mar 8 17:40:26 EST 2006

Hello,

As a first step toward backend support, I slightly reworked the libalpm 
database code in that direction.

Basically, I've moved all backend-related functions out of db.c and 
into a new file, "be_files.c".  The idea is to have several distinct 
"be_XXX.c" files, one for each backend, each implementing the hooks 
used to access database fields.
The functions that need to be implemented in order to write a new 
backend are:
   _alpm_db_open
   _alpm_db_close
   _alpm_db_rewind
   _alpm_db_scan
   _alpm_db_read
   _alpm_db_write
   _alpm_db_remove
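
To make the shape of such a backend concrete, here is a minimal sketch 
of a toy in-memory backend exposing the same seven hooks. Everything in 
it is hypothetical: the toy_ names, the toy_pkg_t and toy_db_t types, 
and all signatures are invented for illustration and are not the ones 
used in libalpm or in the patch.

```c
#include <stdio.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical record and handle types; the real libalpm types differ. */
typedef struct { char name[32]; char version[16]; int used; } toy_pkg_t;
typedef struct { toy_pkg_t slots[TOY_MAX]; int cursor; int is_open; } toy_db_t;

/* Hook contract: these prototypes would live in a shared header, and each
 * be_XXX.c would provide exactly one definition of every hook. */
int        toy_db_open(toy_db_t *db);
void       toy_db_close(toy_db_t *db);
void       toy_db_rewind(toy_db_t *db);
toy_pkg_t *toy_db_scan(toy_db_t *db);
int        toy_db_read(toy_db_t *db, const char *name, toy_pkg_t *out);
int        toy_db_write(toy_db_t *db, const toy_pkg_t *pkg);
int        toy_db_remove(toy_db_t *db, const char *name);

/* Internal helper: locate a used slot by package name, or return NULL. */
static toy_pkg_t *toy_find(toy_db_t *db, const char *name)
{
    for (int i = 0; i < TOY_MAX; i++) {
        if (db->slots[i].used && strcmp(db->slots[i].name, name) == 0)
            return &db->slots[i];
    }
    return NULL;
}

/* Start with an empty database. */
int toy_db_open(toy_db_t *db)
{
    memset(db, 0, sizeof(*db));
    db->is_open = 1;
    return 0;
}

void toy_db_close(toy_db_t *db) { db->is_open = 0; }

/* Reset the iteration cursor used by toy_db_scan. */
void toy_db_rewind(toy_db_t *db) { db->cursor = 0; }

/* Return the next used entry, or NULL once the end is reached. */
toy_pkg_t *toy_db_scan(toy_db_t *db)
{
    while (db->cursor < TOY_MAX) {
        toy_pkg_t *p = &db->slots[db->cursor++];
        if (p->used)
            return p;
    }
    return NULL;
}

/* Copy the named entry into *out; return -1 if it does not exist. */
int toy_db_read(toy_db_t *db, const char *name, toy_pkg_t *out)
{
    toy_pkg_t *p = toy_find(db, name);
    if (p == NULL)
        return -1;
    *out = *p;
    return 0;
}

/* Overwrite an existing entry or take the first free slot; -1 if full. */
int toy_db_write(toy_db_t *db, const toy_pkg_t *pkg)
{
    toy_pkg_t *slot = toy_find(db, pkg->name);
    for (int i = 0; slot == NULL && i < TOY_MAX; i++) {
        if (!db->slots[i].used)
            slot = &db->slots[i];
    }
    if (slot == NULL)
        return -1;
    *slot = *pkg;
    slot->used = 1;
    return 0;
}

/* Mark the named entry as free; return -1 if it does not exist. */
int toy_db_remove(toy_db_t *db, const char *name)
{
    toy_pkg_t *p = toy_find(db, name);
    if (p == NULL)
        return -1;
    p->used = 0;
    return 0;
}
```

A caller would open the database, write or read entries by name, and 
walk all entries with a rewind followed by repeated scan calls until 
NULL is returned. A second backend would keep the same prototypes and 
replace only the function bodies.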

So, the next step should be to allow users to choose a backend with the
configure script as follows ("files" is the default behavior):
./configure --backend=files
or
./configure --backend=gdbm

This option should determine which backend is built into the libalpm 
library: if gdbm is selected, "be_gdbm.c" should be the only backend 
file compiled and linked into the library.

What is still missing is the implementation of this selection in the 
configure.ac script and in the lib/libalpm/Makefile.am file.
Currently, the Makefile.am is as follows:
   TARGETS = package.o db.o cache.o list.o ...
   TARGETS += be_files.o
This last line should be added conditionally depending on configure options.

The final big step is to revamp the handling of sync db archives.

To date, database synchronization could not be easier: the library does 
not even need to read the archive, it simply unpacks it in 
/var/lib/pacman/<treename>.
This works well because gensync archives map directly onto the current 
"flat files" database format.

Now, let's imagine we're using a different backend... that can't work 
anymore: the library must read the archive, parse each entry and 
db_write it to the database.

The following patch implements the whole thing:

http://aurelien.foret.free.fr/archlinux/pacman/patches/pacman-lib-backend.patch

Although the modifications are quite small and straightforward, I'm 
submitting this patch for review to the pacman-dev ml because it has an 
obvious external impact on sync db archives.

Here are some explanations.

To ease the pain, the idea is to rework gensync to simplify the sync db 
archive content.
I reused the format from .PKGINFO files as used by makepkg and 
implemented it in gensync.
For instance, with this change, an entry from the sync db archive is 
a file named "dummy-1.0-1" with the following content:
   pkgname = dummy
   pkgver = 1.0-1
   depend = foobar1
   replaces = foobar2
   csize = 123456789
   ...
instead of a subdirectory "dummy-1.0-1" containing two files:
dummy-1.0-1/desc:
   %NAME%
   dummy

   %VERSION%
   1.0-1
   ...
and dummy-1.0-1/depends:
   %DEPENDS%
   foobar1
   ...

This way, it is possible to reuse the function parsedesc from package.c 
to parse both .PKGINFO files from package archives _and_ entries from 
database archives.
IMO, this is quite efficient and simple: makepkg and gensync are using 
the same format, and gensync database archives are simpler to parse and 
analyze.
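
As an illustration of how little parsing the "key = value" format 
needs, here is a minimal sketch of a parser for one such entry. It is 
not the parsedesc function from package.c: the entry_t type, the 
parse_entry name, and the fixed buffer sizes are all hypothetical, and 
only the pkgname, pkgver, depend and csize keys from the example above 
are handled (other keys are silently skipped).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one sync db entry. */
typedef struct {
    char name[64];
    char version[32];
    char depends[8][64];
    int  ndepends;
    long csize;
} entry_t;

/* Strip leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Parse "key = value" lines from text into *e.
 * Returns 0 on success, -1 if pkgname or pkgver is missing. */
int parse_entry(const char *text, entry_t *e)
{
    memset(e, 0, sizeof(*e));
    const char *p = text;

    while (*p != '\0') {
        /* Copy one line into a scratch buffer (overlong lines are cut). */
        char line[256];
        size_t len = strcspn(p, "\n");
        size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
        memcpy(line, p, n);
        line[n] = '\0';
        p += len;
        if (*p == '\n')
            p++;

        /* Split at the first '='; lines without one are ignored. */
        char *eq = strchr(line, '=');
        if (eq == NULL)
            continue;
        *eq = '\0';
        char *key = trim(line);
        char *val = trim(eq + 1);

        if (strcmp(key, "pkgname") == 0) {
            snprintf(e->name, sizeof(e->name), "%s", val);
        } else if (strcmp(key, "pkgver") == 0) {
            snprintf(e->version, sizeof(e->version), "%s", val);
        } else if (strcmp(key, "depend") == 0) {
            /* "depend" may repeat; keep as many as the array can hold. */
            if (e->ndepends < 8) {
                snprintf(e->depends[e->ndepends], sizeof(e->depends[0]),
                         "%s", val);
                e->ndepends++;
            }
        } else if (strcmp(key, "csize") == 0) {
            e->csize = strtol(val, NULL, 10);
        }
    }
    return (e->name[0] != '\0' && e->version[0] != '\0') ? 0 : -1;
}
```

With a non-"files" backend, the sync code would run something like this 
on each archive entry and then hand the resulting record to the 
backend's write hook.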

Finally, I think pacman 3.0 is the right moment to implement it.
We already agreed to break compatibility for sync databases by moving 
the "replaces" and "force" fields from the "desc" to the "depends" db 
files. As a consequence, the pacman 3 release will already require 
users to resynchronize their sync databases before they can use them.
The gensync rework I'm suggesting will not add any hassle to the 
upgrade from pacman 2.9 to pacman 3.0.

I'd like to get feedback on the general idea and the implementation 
before possibly committing it.
-- 
Aurelien