[pacman-dev] delta support in libalpm

Xavier shiningxc at gmail.com
Tue Feb 10 09:41:42 EST 2009


On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus
<henning.garus at googlemail.com> wrote:
> Hi,
>
> I have been looking through the current delta implementation in
> libalpm and have put some thought into changing makepkg/repo-add to
> support delta creation. However, I'm running into some problems,
> mostly due to md5sums and gzip.
>
> The current implementation works as follows. On a sync operation it is
> checked, whether a valid delta path exists and if the summed filesize
> of the deltas is smaller than the filesize of the whole download. When
> this is the case the deltas are downloaded and applied to the old
> file. After that the patched file is treated as if it was downloaded
> normally, this includes a check of the md5sum. Gzip files have a
> header, that has a timestamp, which will screw with this md5sum. When
> a patch is applied to a gzipped file by xdelta, xdelta will unzip the
> file, apply the patch and then rezip the file. The author of xdelta
> was obviously aware of the problems with the timestamp, because he
> decided to leave it empty. The same can be achieved by the -n option
> of gzip. But there comes the next problem, xdelta uses zlib for
> compression, gzip implements compression itself. And files created by
> gzip can differ from files created by zlib. Bsdtar uses zlib as well,
> but writes the timestamp and there is no option to prevent this (at
> least none that I can see).
>
> There are four ways around this, that I can think of:
>
> 1. create the package, then create the delta, apply the delta to the
> old version, remove the original new package and present the patched
> package as output
>
> I think this sucks, this ties delta creation to makepkg (more about
> that later) and has an incredibly huge and useless overhead (countless
> unzips and rezips and applying the patch).
>
> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.
>
> Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
> sure if this is a good thing, especially for libalpm.
>
> 3. save the md5sums of the unzipped tars in the synchdb and change
> libalpm to check those
>
> Seems reasonable, but I don't see a way to do this with libarchive, so
> this would require using zlib directly and pacman would lose the
> ability to handle tar.bz2
>
> 4. Skip checking the md5sum for deltas
>
> OK during the initial synch, as long as we trust xdelta to do its job
> (the md5sums of both the old and the new file are in the delta file).
> But the created package will have the wrong md5sum and can't be used
> to reinstall, etc. which makes this look like a bad idea.
>
>
> In a previous mail Xavier toyed with the idea to put delta creation
> into repo-add, I have given this some thought, as it seems nice in
> principle, but there are drawbacks. For Arch this would mean creating
> deltas on Gerolde, which seems to be fairly strained already,
> according to the dev list. Furthermore this introduces some new
> variables to repo-add (at least repo location and an output location)
> this would be manageable, but doesn't look very nice.
>
> Delta creation in makepkg seems somehow ok (it's already in there after
> all). But what I would really like is a separate tool for delta
> creation, which would allow the separation of building packages and
> creating deltas and setting up a separated delta server. This leaves
> us with options 2 and 3 and I am not really sure, which way to go.
>
>
> looking forward to your comments

A very small bump on this :)

1) gzip -n usage

But first, in the last discussion we had which started with the above
mail, it seems we were more in favor of option 2) :
> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.

In fact, Nathan already made a patch for that. I think this patch looks fine :
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6427986
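
To make the timestamp problem concrete, here is a minimal sketch in
Python (it is not part of pacman or of Nathan's patch). It compresses
the same payload through zlib with two different header timestamps and
then with a zeroed timestamp, which is what gzip -n and xdelta do. The
payload and the helper names gzip_bytes and md5 are made up for
illustration.

```python
import gzip
import hashlib
import io


def gzip_bytes(payload: bytes, mtime: int) -> bytes:
    """Compress payload into a gzip stream with a chosen header timestamp."""
    buf = io.BytesIO()
    # filename="" leaves the optional name field out of the header;
    # mtime=0 leaves the timestamp field zeroed, like `gzip -n`.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Stand-in for an uncompressed package tarball (illustrative data only).
payload = b"stand-in for an uncompressed package tarball\n" * 64

stamped_a = gzip_bytes(payload, mtime=1234000000)
stamped_b = gzip_bytes(payload, mtime=1234086400)
no_stamp_a = gzip_bytes(payload, mtime=0)
no_stamp_b = gzip_bytes(payload, mtime=0)

# Same content, different timestamps: the compressed md5sums differ.
print("stamped archives match: ", md5(stamped_a) == md5(stamped_b))
# Same content, zeroed timestamps: the compressed md5sums are identical.
print("unstamped archives match:", md5(no_stamp_a) == md5(no_stamp_b))
# The uncompressed data (what option 3 would hash) is unaffected.
print("payloads match:          ",
      gzip.decompress(stamped_a) == gzip.decompress(stamped_b))
```

Only the stamped pair disagrees. Note that this goes through zlib on
both sides, so it does not show the separate gzip-vs-zlib difference
Henning mentions.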

2) repo-add vs makepkg support

Nathan even made one to add support to repo-add too, but this patch
looked a bit more scary :
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6427987
It was more complex than I hoped. But the simpler way I was thinking
about was to get delta support only in repo-add, instead of both
makepkg and repo-add :
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6601225
Dan seemed to think it was better in repo-add, and Henning seems to
think it is better in makepkg. We need more discussion on this so that
we can finally make a decision :)

2.1) About Nathan's patch to support both
If we do want to have the functionality in both makepkg and repo-add,
it would be cool to try to clean up the code a bit, for example this:
+# create_xdelta_file - will create a delta for the package filename given.
+#
+# params:
+#  $1 - the filename of the package
+#  $2 - the arch of the package
+#  $3 - the version and release of the package
+#  $4 - the directory where the package is located
+#  $5 - the extension of packages
+#  $6 - 0 if an existing delta file should not be overwritten
+#  $7 - the filename of the previous package (blank if not known)
+#  $8 - the version of the previous package (blank if not known)

That's a lot of params :)
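
One possible way to trim that list, sketched here in Python purely for
illustration (the real function is bash, and this is not what Nathan's
patch does): the arch, version, directory and extension could all be
derived from the package path. This assumes the filename layout
name-pkgver-pkgrel-arch followed by the package extension; PkgInfo and
parse_pkg_path are hypothetical names.

```python
import os
from typing import NamedTuple


class PkgInfo(NamedTuple):
    directory: str
    name: str
    version: str  # "pkgver-pkgrel"
    arch: str
    ext: str


def parse_pkg_path(path: str, ext: str = ".pkg.tar.gz") -> PkgInfo:
    """Split a package path into the pieces create_xdelta_file takes as params.

    Assumes the layout <name>-<pkgver>-<pkgrel>-<arch><ext>.
    """
    directory, filename = os.path.split(path)
    if not filename.endswith(ext):
        raise ValueError("unexpected package extension: %s" % filename)
    stem = filename[: -len(ext)]
    # Split from the right, because the package name itself may contain '-'.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError("unexpected package filename: %s" % filename)
    name, pkgver, pkgrel, arch = parts
    return PkgInfo(directory, name, "%s-%s" % (pkgver, pkgrel), arch, ext)


info = parse_pkg_path("/tmp/repo/lib-foo-1.2-3-i686.pkg.tar.gz")
print(info)
```

With something like this, the function would only need the new package
path, the previous package path (or blank) and the overwrite flag.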

3) format of delta in the database

However I don't think there is any repo-add / makepkg patch to support
the new format. Henning also made a comment about the format :
http://bugs.archlinux.org/task/12000#comment34162
"So basically the current delta implementation is working. Only the
support in makepkg/repo-add is wrong. I am not exactly sure though,
why libalpm expects the md5sums of the old and the new package. I am
not sure if these are even used anywhere. I would feel safe enough
with xdelta checking those and then libalpm checking the md5sum of the
final patched package."

I guess Dan added these two md5sums for safety, but yes, they might not
be needed. I would also be fine with dropping them, even though they
don't hurt.
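
As a rough sketch of what that leaves (Python for illustration only,
libalpm itself is C and this is not its code): xdelta is trusted for
the old and new file checks, and the only md5sum compared afterwards is
the one of the final patched package. The names file_md5 and
patched_package_ok and the demo data are made up.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5sum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def patched_package_ok(path, expected_md5):
    """Compare the patched package against the md5sum from the sync db."""
    return file_md5(path) == expected_md5.strip().lower()


# Demo with a throwaway file standing in for the patched package.
content = b"pretend this is the package rebuilt from old file + delta\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    demo_path = tmp.name

expected = hashlib.md5(content).hexdigest()
good = patched_package_ok(demo_path, expected)
bad = patched_package_ok(demo_path, "0" * 32)
os.remove(demo_path)
print(good, bad)
```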

