On Mon, Feb 23, 2009 at 9:57 AM, Brendan Hide <brendan@swiftspirit.co.za> wrote:
Xavier wrote:
There has never been any real official interest in deltas. This seems to make the ability to run a separate delta server a requirement, which in turn seems to require a separate delta database. That implies a new level of complexity and code bloat in pacman. Maybe it is worth it, I don't know, but it still makes me wonder why we put all this delta stuff in pacman to begin with. What was the problem with XferCommand? It seemed like a great idea. Right now that wget-xdelta.sh script is just a toy, but a much more powerful python script could be written that has basically the same logic as pacman currently has, plus the ability to fetch and parse a separate delta database.
Unless the server is out of disk space, I'm not too sure exactly why there's a requirement for a separate server. If pacman is distributed with the delta option turned on by default, the server doing the actual "serving" of the updates is probably going to have 60 to 85% less work to do.
I will grant that there would be a new level of complexity involved, for example, if I've missed 4 updates, we'd have to "chain link" the tar.gz in my cache via 4 delta patches to get the current tar.gz.
I believe that the following would be the simplest implementation, both in terms of how much implementation work is needed and the probable effectiveness. Put delta files into a separate folder (which also keeps the deltas out of snapshots):

    http://archlinux.mirror.ac.za/delta/core/os/x86_64/kernel26-2.6.28.4-1-x86_6...

Thus, I could do the following (bash pseudocode):

    curl http://archlinux.mirror.ac.za/delta/core/os/x86_64/ > tmpfile
    grep $pkgname < tmpfile > listing
    failed=false
    cat listing | while read delta
    do
        # skip ahead to the delta that matches the package in my cache
        [ $pkgname-$currentpkgversion-$pkgarch.xd3.tar.gz *within* $delta ] && start=true
        if [ "$start" = true ]
        then
            # apply each following delta until we reach the new version
            while read delta
            do
                wget http://archlinux.mirror.ac.za/delta/core/os/x86_64/$delta && applydelta $delta $curfile
                [ "$output" = "$pkgname-$newpkgversion-$pkgarch.tar.gz" ] && break
                curfile=`ls -rt | tail -n 1`
            done
        fi
        [ "$output" = "$pkgname-$newpkgversion-$pkgarch.tar.gz" ] && break
    done
The above requires no db implementation at all and can work well even using the above very simple logic. And yes, by my own standards, the above is very bad bash pseudo-code. :P
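For comparison, here is a minimal Python sketch of the same "no database" idea: given only a directory listing, work out which deltas to fetch and in what order. It is an illustration, not anything pacman or the mirror provides. The function name delta_chain and the file naming scheme pkgname-oldver_to_newver-arch.delta are assumptions made for the example, and it only handles the simple linear case where each version has at most one delta leading away from it.

```python
def delta_chain(listing, pkgname, current, target, arch):
    """Return the delta file names leading from `current` to `target`.

    Assumes (hypothetically) that deltas are named
    <pkgname>-<oldver>_to_<newver>-<arch>.delta.
    Returns None when the chain is broken.
    """
    prefix = pkgname + "-"
    suffix = "-" + arch + ".delta"

    # Map each source version to (destination version, file name).
    step = {}
    for name in listing:
        if name.startswith(prefix) and name.endswith(suffix):
            old, sep, new = name[len(prefix):-len(suffix)].partition("_to_")
            if sep:
                step[old] = (new, name)

    # Follow the links one version at a time until we reach the target.
    chain, version, seen = [], current, set()
    while version != target:
        if version not in step or version in seen:
            return None  # missing delta (or a loop): fall back to a full download
        seen.add(version)
        version, name = step[version]
        chain.append(name)
    return chain


listing = [
    "foo-1.0-1_to_1.1-1-x86_64.delta",
    "foo-1.1-1_to_1.2-1-x86_64.delta",
    "bar-1.0-1_to_2.0-1-x86_64.delta",
]
print(delta_chain(listing, "foo", "1.0-1", "1.2-1", "x86_64"))
```

The returned names would then be downloaded and applied in order. When the function returns None, the sensible fallback is to download the full package.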
Of the above, what is already implemented in pacman?
Everything is already implemented in pacman, with more complex logic (which might be totally useless after all).

For each package in a sync db, there is a deltas file beside the depends and desc ones, which basically contains the list of deltas for that package and their sizes. With this information and the contents of the file cache, pacman computes the shortest path (in terms of download size) to the final package.

That logic applied to an example: if you have file v1 in your cache, you want to upgrade to v3, and there are three deltas for this package: v1tov2, v2tov3 and v1tov3. If v1tov2 + v2tov3 is smaller than v1tov3, it will download the first two deltas and apply them to get v3. Otherwise it will download the third one.

The problem with this implementation (besides probably being overkill) is that it requires information in the sync databases. So either it requires a big official effort to integrate this stuff and add deltas to all the official databases, or, otherwise, I don't know: you need to fully mirror the repository you want to add deltas to, generate the deltas (maybe during mirror sync), add them to your database, and then host everything somewhere (the packages + the deltas + the database with delta info).
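The v1/v2/v3 example can be sketched as a shortest-path search in which versions are nodes and each delta is an edge weighted by its download size. This is not pacman's actual code, only a small illustration using Dijkstra's algorithm. The function name cheapest_delta_path and the sizes are made up for the example.

```python
import heapq


def cheapest_delta_path(deltas, cached, target):
    """Find the delta sequence with the smallest total download size.

    deltas: list of (old_version, new_version, size) tuples.
    Returns (total_size, [(old, new), ...]) or None if no path exists.
    """
    # Build an adjacency list: version -> [(next_version, delta_size), ...]
    graph = {}
    for old, new, size in deltas:
        graph.setdefault(old, []).append((new, size))

    # Dijkstra: always expand the cheapest partial path first.
    queue = [(0, cached, [])]
    done = set()
    while queue:
        cost, version, path = heapq.heappop(queue)
        if version == target:
            return cost, path
        if version in done:
            continue
        done.add(version)
        for new, size in graph.get(version, []):
            if new not in done:
                heapq.heappush(queue, (cost + size, new, path + [(version, new)]))
    return None


# Hypothetical sizes: the two small deltas together beat the direct one.
deltas = [("v1", "v2", 300), ("v2", "v3", 200), ("v1", "v3", 600)]
print(cheapest_delta_path(deltas, "v1", "v3"))
# (500, [('v1', 'v2'), ('v2', 'v3')])
```

If the direct v1tov3 delta were 400 instead of 600, the same call would return the single-step path. A caller would presumably also compare the result with the size of the full package before deciding to use deltas at all.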