Re: [pacman-dev] delta support in libalpm

23 Feb 2009

      On Mon, Feb 23, 2009 at 9:57 AM, Brendan Hide <brendan@swiftspirit.co.za> wrote:
Xavier wrote:
There has never been any real official interest in deltas. This seems
to make a separate delta server a requirement, which in turn seems to
require a separate delta database. That implies a new level of
complexity and code bloat in pacman. Maybe it is worth it, I don't
know, but it still makes me wonder why we put all this delta stuff in
pacman to begin with. What was the problem with XferCommand? It seemed
like a great idea. The current wget-xdelta.sh script is just a toy,
but a much more powerful Python script could be written that has
basically the same logic as pacman currently has, plus the ability to
fetch and parse a separate delta database.

Unless the server is out of disk space, I'm not sure why a separate server
would be required. If pacman is distributed with the delta option turned on
by default, the server doing the actual "serving" of the updates is probably
going to have 60 to 85% less work to do.
I will grant that there would be a new level of complexity involved. For
example, if I've missed 4 updates, we'd have to chain the tar.gz in my cache
through 4 delta patches to get the current tar.gz.
I believe that the following would be the simplest implementation both in
terms of how much implementation work is needed and the probable
effectiveness:
Put delta files into a separate folder (which also keeps the deltas out of
any snapshot):
http://archlinux.mirror.ac.za/delta/core/os/x86_64/kernel26-2.6.28.4-1-x86_6...
Thus, I could do the following (bash pseudocode):
base=http://archlinux.mirror.ac.za/delta/core/os/x86_64
curfile=$pkgname-$currentpkgversion-$pkgarch.tar.gz   # package already in my cache
target=$pkgname-$newpkgversion-$pkgarch.tar.gz
started=false
# Assumes the listing is one delta filename per line, oldest first.
curl -s "$base/" | grep -- "$pkgname" > listing
while read -r delta
do
    # Skip deltas until we reach the one that starts from the cached version.
    case $delta in
        *"$pkgname-$currentpkgversion-$pkgarch"*) started=true ;;
    esac
    $started || continue
    wget -q "$base/$delta" || break
    applydelta "$delta" "$curfile" || break   # applydelta is a placeholder
    curfile=$(ls -rt | tail -n 1)             # newest file is the patched package
    [ "$curfile" = "$target" ] && break
done < listing
[ "$curfile" = "$target" ] || echo "could not rebuild $target from deltas" >&2
The above requires no db implementation at all and can work well even using
the above very simple logic.
And yes, by my own standards, the above is very bad bash pseudo-code. :P
Of the above, what is already implemented in pacman?

Everything is already implemented in pacman, with more complex logic
(which might be totally useless after all).
For each package in a sync db, there is a deltas file beside the
depends and desc files, which basically contains the list of deltas
for that package and their sizes. With this information and the
contents of the file cache, pacman computes the shortest path (in
terms of download size) to the final package.
That logic applied to an example:
if you have file v1 in your cache, you want to upgrade to v3, and
there are three deltas for this package: v1tov2, v2tov3 and v1tov3.
If v1tov2 + v2tov3 is smaller than v1tov3, it will download the first
two deltas and apply them to get v3. Otherwise it will download the
third one.
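
To make the v1/v2/v3 example concrete, here is a rough sketch of that
"cheapest chain of deltas" choice as a shell function wrapping awk. It is
not pacman's actual code. The function name cheapest_delta_path, the
"<from> <to> <size>" input format and the byte sizes are all made up for
illustration, and it uses a simple repeated-relaxation (Bellman-Ford style)
search rather than whatever libalpm really does.

```shell
# cheapest_delta_path START GOAL   (hypothetical helper, not part of pacman)
# Reads deltas from stdin, one per line: "<from> <to> <size-in-bytes>".
# Prints "<total size> <delta> <delta> ..." for the cheapest chain,
# or returns 1 if no chain of deltas leads from START to GOAL.
cheapest_delta_path() {
    awk -v start="$1" -v goal="$2" '
        NF == 3 { from[NR] = $1; to[NR] = $2; size[NR] = $3; n = NR }
        END {
            dist[start] = 0
            # Relax every delta repeatedly until no total gets smaller.
            do {
                changed = 0
                for (i = 1; i <= n; i++) {
                    if (!(i in from) || !(from[i] in dist)) continue
                    cand = dist[from[i]] + size[i]
                    if (!(to[i] in dist) || cand < dist[to[i]]) {
                        dist[to[i]] = cand
                        via[to[i]] = i
                        changed = 1
                    }
                }
            } while (changed)
            if (!(goal in dist)) exit 1
            # Walk back from the goal to list the deltas in apply order.
            chain = ""
            for (v = goal; v != start; v = from[via[v]])
                chain = " " from[via[v]] "to" v chain
            print dist[goal] chain
        }'
}

# Made-up sizes: the two small deltas (10 + 20) beat the direct one (50).
printf '%s\n' 'v1 v2 10' 'v2 v3 20' 'v1 v3 50' | cheapest_delta_path v1 v3
```

With those sizes it prints "30 v1tov2 v2tov3". If the direct delta were 25
bytes instead, it would print "25 v1tov3". On an exact tie this sketch keeps
whichever route it found first, which may not match what pacman does.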

The problem with this implementation (besides probably being overkill)
is that it requires information in the sync databases. So either it
takes a big official effort to integrate this stuff and add deltas to
all the official databases, or, I don't know, you need to fully mirror
the repository you want to add deltas to, generate the deltas (maybe
during mirror sync), add them to your database, and then host
everything somewhere (the packages + the deltas + the database with
delta info).