[pacman-dev] delta support in libalpm

Henning Garus henning.garus at googlemail.com
Fri Nov 7 18:47:46 EST 2008


I have been looking through the current delta implementation in
libalpm and have put some thought into changing makepkg/repo-add to
support delta creation. However, I'm running into some problems,
mostly due to md5sums and gzip.

The current implementation works as follows. On a sync operation,
libalpm checks whether a valid delta path exists and whether the
summed file size of the deltas is smaller than the file size of the
whole download. If so, the deltas are downloaded and applied to the
old file. After that, the patched file is treated as if it had been
downloaded normally, which includes a check of the md5sum. Gzip files
have a header that contains a timestamp, which will screw with this
md5sum. When a patch is applied to a gzipped file by xdelta, xdelta
will unzip the file, apply the patch and then rezip the file. The
author of xdelta was obviously aware of the problems with the
timestamp, because he decided to leave it empty. The same can be
achieved with the -n option of gzip. But this leads to the next
problem: xdelta uses zlib for compression, while gzip implements
compression itself, and files created by gzip can differ from files
created by zlib. Bsdtar uses zlib as well, but writes the timestamp,
and there is no option to prevent this (at least none that I can see).
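
To make the timestamp problem concrete, here is a small Python sketch.
It is only an illustration, not code from libalpm, makepkg or xdelta:
it uses Python's gzip module (which sits on top of zlib) as a stand-in,
and the payload, the timestamps and the helper names gzip_bytes and
md5_hex are made up for the example. It shows that the same content
compressed at two different times gets two different md5sums, while an
empty timestamp (what gzip -n does) makes the result repeatable. It
does not show the second problem, gzip and zlib producing different
compressed data.

```python
import gzip
import hashlib
import io


def gzip_bytes(payload: bytes, mtime: int) -> bytes:
    """Compress payload into a gzip stream whose header carries mtime."""
    buf = io.BytesIO()
    # BytesIO has no name, so no file name ends up in the header either.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def md5_hex(data: bytes) -> str:
    """Return the md5sum of data as a hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for the tar archive inside a package.
payload = b"pretend this is a package tarball\n" * 100

# Same content, compressed at two different (made-up) times.
first_build = gzip_bytes(payload, mtime=1226000000)
second_build = gzip_bytes(payload, mtime=1226086400)

# Same content, timestamp left empty, as with "gzip -n".
no_stamp_a = gzip_bytes(payload, mtime=0)
no_stamp_b = gzip_bytes(payload, mtime=0)

print("md5sums differ with timestamps:",
      md5_hex(first_build) != md5_hex(second_build))
print("md5sums match without timestamp:",
      md5_hex(no_stamp_a) == md5_hex(no_stamp_b))
```

The timestamp lives in bytes 4 to 7 of the gzip header, so in this
sketch the two stamped files differ only in those four bytes, which is
already enough to change the md5sum.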

There are four ways around this that I can think of:

1. create the package, then create the delta, apply the delta to the
old version, remove the original new package and present the patched
package as output

I think this sucks: it ties delta creation to makepkg (more about
that later) and has an incredibly huge and useless overhead (countless
unzips and rezips, plus applying the patch).

2. create the package, but don't compress it with bsdtar, use gzip -n
instead. This means we have to use gzip again, in libalpm, when we
apply the delta.

Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
sure if this is a good thing, especially for libalpm.

3. save the md5sums of the unzipped tars in the sync db and change
libalpm to check those

Seems reasonable, but I don't see a way to do this with libarchive, so
this would require using zlib directly, and pacman would lose the
ability to handle tar.bz2.

4. Skip checking the md5sum for deltas

OK during the initial sync, as long as we trust xdelta to do its job
(the md5sums of both the old and the new file are in the delta file).
But the created package will have the wrong md5sum and can't be used
to reinstall, etc., which makes this look like a bad idea.
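
The idea behind option 3 can be sketched in a few lines of Python. This
is a rough illustration under assumptions, not a proposal for the
actual libalpm code: it uses Python's gzip module (zlib underneath)
rather than libarchive, it only handles gzip, and the helper names
compress and md5_of_uncompressed as well as the payload are invented
for the example. The point is that the md5sum of the uncompressed
content stays the same no matter which timestamp or compression level
ended up in the gzip file.

```python
import gzip
import hashlib
import io


def compress(payload: bytes, mtime: int, level: int) -> bytes:
    """Gzip payload with the given header timestamp and compression level."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb",
                       compresslevel=level, mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def md5_of_uncompressed(package: bytes, chunk_size: int = 64 * 1024) -> str:
    """Return the md5sum of the data inside a gzip stream, read in chunks."""
    digest = hashlib.md5()
    with gzip.GzipFile(fileobj=io.BytesIO(package), mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical stand-in for the tar archive inside a package.
payload = b"pretend this is a package tarball\n" * 1000

# Two compressed files that differ in timestamp and compression level.
as_built = compress(payload, mtime=1226000000, level=9)
as_patched = compress(payload, mtime=0, level=1)

print("compressed files identical:", as_built == as_patched)
print("uncompressed md5sums match:",
      md5_of_uncompressed(as_built) == md5_of_uncompressed(as_patched))
```

The sum stored in the sync db would then be the one of the tar itself,
and the check would have to decompress the package once to verify it.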

In a previous mail Xavier toyed with the idea of putting delta
creation into repo-add. I have given this some thought, as it seems
nice in principle, but there are drawbacks. For Arch this would mean
creating deltas on Gerolde, which seems to be fairly strained already,
according to the dev list. Furthermore, this introduces some new
variables to repo-add (at least a repo location and an output
location). This would be manageable, but doesn't look very nice.

Delta creation in makepkg seems somehow ok (it's already in there
after all). But what I would really like is a separate tool for delta
creation, which would allow separating package building from delta
creation and setting up a separate delta server. This leaves us with
options 2 and 3, and I am not really sure which way to go.

Looking forward to your comments.

