[pacman-dev] delta support in libalpm

Xavier shiningxc at gmail.com
Mon Nov 10 04:30:44 EST 2008


On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus
<henning.garus at googlemail.com> wrote:
> Hi,
>
> I have been looking through the current delta implementation in
> libalpm and have put some thought into changing makepkg/repo-add to
> support delta creation. However, I'm running into some problems,
> mostly due to md5sums and gzip.
>
> The current implementation works as follows. On a sync operation,
> libalpm checks whether a valid delta path exists and whether the summed
> file size of the deltas is smaller than the file size of the whole
> download. If so, the deltas are downloaded and applied to the old
> file. After that, the patched file is treated as if it had been
> downloaded normally, which includes a check of the md5sum. Gzip files
> have a header containing a timestamp, which changes this md5sum. When
> a patch is applied to a gzipped file by xdelta, xdelta will unzip the
> file, apply the patch and then rezip the file. The author of xdelta
> was obviously aware of the problems with the timestamp, because he
> decided to leave it empty. The same can be achieved with the -n option
> of gzip. But then comes the next problem: xdelta uses zlib for
> compression, while gzip implements compression itself, and files
> created by gzip can differ from files created by zlib. Bsdtar uses zlib as well,
> but writes the timestamp and there is no option to prevent this (at
> least none that I can see).
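
To make the timestamp problem concrete, here is a minimal sketch. It
assumes Python's gzip module as a stand-in for whatever tool writes the
package; the payload and the two timestamps are made up for illustration.
It only shows that the same data compressed at two different times gets
two different md5sums, because the MTIME field sits in bytes 4 to 7 of
the gzip header.

```python
import gzip
import hashlib

# Hypothetical stand-in for the uncompressed package tarball.
payload = b"contents of a package tarball\n" * 100


def md5_hex(data: bytes) -> str:
    """Return the hex md5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Same payload, compressed with two different header timestamps
# (arbitrary example values, in seconds since the epoch).
first = gzip.compress(payload, mtime=1226136420)
second = gzip.compress(payload, mtime=1226309444)

# Bytes 4..7 of a gzip member hold the little-endian MTIME field.
stamps_differ = first[4:8] != second[4:8]

# The compressed files differ, so their md5sums differ too,
# even though the data inside is identical.
sums_differ = md5_hex(first) != md5_hex(second)
same_content = gzip.decompress(first) == gzip.decompress(second)

print(stamps_differ, sums_differ, same_content)
```

This says nothing about the second problem (gzip and zlib producing
different compressed bytes); it only isolates the header timestamp.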
>
> There are four ways around this, that I can think of:
>
> 1. create the package, then create the delta, apply the delta to the
> old version, remove the original new package and present the patched
> package as output
>
> I think this sucks: it ties delta creation to makepkg (more about
> that later) and has an incredibly huge and useless overhead (countless
> unzips and rezips, plus applying the patch).
>
> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.
>
> Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
> sure if this is a good thing, especially for libalpm.
>
> 3. save the md5sums of the unzipped tars in the sync db and change
> libalpm to check those
>
> Seems reasonable, but I don't see a way to do this with libarchive, so
> this would require using zlib directly and pacman would lose the
> ability to handle tar.bz2
>
> 4. Skip checking the md5sum for deltas
>
> OK during the initial sync, as long as we trust xdelta to do its job
> (the md5sums of both the old and the new file are in the delta file).
> But the created package will have the wrong md5sum and can't be used
> to reinstall, etc. which makes this look like a bad idea.
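
For comparison, option 3 can be sketched as follows. This again assumes
Python's gzip module purely for illustration (it says nothing about
whether libarchive can do the same), and payload_md5 is a hypothetical
helper name. The point is only that a checksum taken over the
uncompressed tar does not care about the gzip header.

```python
import gzip
import hashlib

# Hypothetical stand-in for the uncompressed package tarball.
tar_bytes = b"file data inside the tarball\n" * 100


def payload_md5(package: bytes) -> str:
    """md5 of the decompressed contents, ignoring the gzip wrapper."""
    return hashlib.md5(gzip.decompress(package)).hexdigest()


# Two compressed packages that differ only in the header timestamp.
pkg_a = gzip.compress(tar_bytes, mtime=1)
pkg_b = gzip.compress(tar_bytes, mtime=2)

# The compressed files are not byte-identical...
files_differ = pkg_a != pkg_b

# ...but the checksum of the unzipped tar is the same for both.
payload_sums_match = payload_md5(pkg_a) == payload_md5(pkg_b)

print(files_differ, payload_sums_match)
```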
>
>
> In a previous mail Xavier toyed with the idea to put delta creation
> into repo-add, I have given this some thought, as it seems nice in
> principle, but there are drawbacks. For Arch this would mean creating
> deltas on Gerolde, which seems to be fairly strained already,
> according to the dev list. Furthermore, this introduces some new
> variables to repo-add (at least repo location and an output location).
> This would be manageable, but doesn't look very nice.
>
> Delta creation in makepkg seems somehow ok (it's already in there after
> all). But what I would really like is a separate tool for delta
> creation, which would allow the separation of building packages and
> creating deltas and setting up a separate delta server. This leaves
> us with options 2 and 3 and I am not really sure which way to go.
>
>
> looking forward to your comments

I am very glad you looked into this. You seem to have a very good
understanding of the situation, possibly better than mine, so it would
be great if you could fix and maintain this part.

I would just go with option 2. When deltas are used, libalpm already
relies on xdelta, so why not on gzip as well?
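
A rough sketch of what option 2 amounts to, under some assumptions:
Python's gzip module stands in for "gzip -n" here (it is zlib-based, so
it is not the gzip binary itself), and compress_without_timestamp is a
made-up helper name. The idea is that the build side and the
delta-applying side both compress the same tar with the same compressor
and an empty timestamp, so the md5sum of the result matches.

```python
import gzip
import hashlib


def compress_without_timestamp(tar_bytes: bytes) -> bytes:
    """Compress with an empty header timestamp, in the spirit of `gzip -n`."""
    return gzip.compress(tar_bytes, compresslevel=9, mtime=0)


def md5_hex(data: bytes) -> str:
    """Return the hex md5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical uncompressed tar of the new package version.
new_tar = b"new package contents\n" * 200

# Build side: compress the package and record its md5sum.
published = compress_without_timestamp(new_tar)
published_md5 = md5_hex(published)

# Client side: after the delta is applied, the client holds the same
# uncompressed tar and recompresses it the same way.
patched_tar = bytes(new_tar)
rebuilt = compress_without_timestamp(patched_tar)

# The usual md5sum check then passes on the rebuilt package.
checksum_ok = md5_hex(rebuilt) == published_md5

print(checksum_ok)
```

This only works if both sides really use the same compressor with the
same settings, which is exactly why option 2 makes makepkg and libalpm
both depend on gzip.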


