[arch-general] Xz, space savings and delta ... my findings
Here's one for ya. Decompress a gzip'ed package, say pacman. Now recompress every file in that directory with `xz -1`, and smile :) Result: the re-compression is faster than tar+gz and, in each case so far (kernel26, gcc, perl, pacman, kdelibs), the result is a little smaller as well. It doesn't approach the (much smaller) size of tar+xz, but it does take less time to compress and is smaller than what we had before.

I know, you're thinking: why do that? I wanted to implement an idea for the A.R.M, which is basically to allow downgrades on a file-by-file basis. My theory is that most (or at least many) of the files in two versions of a package, especially minor-version or pkgrel bumps, are exactly the same.

My test involved kdelibs-4.3.98-1 and kdelibs-4.4.0-3. I first decompressed both and removed the duplicates from kdelibs-4.4.0-3, which resulted in sizes of 59.5MB for the original kdelibs-4.3.98-1 and 35.5MB for the de-duped kdelibs-4.4.0-3. Recompressing with `xz -1` brings kdelibs-4.4.0-3 down to 10.7MB. Compared with the delta (between the original packages), which is 5.3MB, the difference is not a whole lot, while allowing more flexibility (IMHO).

Some more numbers for comparison: the tar+xz (level -6) for kdelibs-4.4.0-3 (de-duped) was 8.4MB, compared with 12.6MB for tar+gz. The original tar'd sizes were 19.8MB for gz and 13.9MB for xz.

I know I didn't test based on pkgrel, or try gz only (as opposed to `xz -1`), nor did I do it very scientifically, but I thought it was very interesting nonetheless. I also suspect that similar results would be seen for a test with xdelta on each file, but that's not very useful to me so I'll leave that one for another day.
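For anyone who wants to repeat the experiment, here is a rough Python sketch of the two steps described above: dropping files that are unchanged between two extracted package trees, then compressing each remaining file on its own. It is not the script used for the numbers in this mail. The function names are made up for illustration, "duplicate" is assumed to mean "same relative path and same content", and `lzma.compress(..., preset=1)` is used as a stand-in for running `xz -1` on each file.

```python
import hashlib
import lzma
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(old_root: Path, new_root: Path) -> list[Path]:
    """Relative paths in new_root that are missing from, or differ in, old_root.

    Assumption: a "duplicate" is a file with the same relative path and the
    same content in both extracted package trees.
    """
    changed = []
    for new_file in sorted(p for p in new_root.rglob("*") if p.is_file()):
        rel = new_file.relative_to(new_root)
        old_file = old_root / rel
        if not old_file.is_file() or file_digest(old_file) != file_digest(new_file):
            changed.append(rel)
    return changed


def compress_each(root: Path, rel_paths: list[Path], out_root: Path) -> int:
    """Write one .xz file per input file (preset 1); return total compressed bytes."""
    total = 0
    for rel in rel_paths:
        dest = out_root / (str(rel) + ".xz")
        dest.parent.mkdir(parents=True, exist_ok=True)
        data = lzma.compress((root / rel).read_bytes(), preset=1)
        dest.write_bytes(data)
        total += len(data)
    return total
```

Calling `compress_each(new_root, changed_files(old_root, new_root), out_root)` on two extracted trees gives a per-file total that can be compared against the tar+gz and tar+xz sizes. Exact byte counts will likely differ a little from the `xz` command-line tool.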
On Thu, 01 Apr 2010 13:54:56 +0100 Nathan Wayde <kumyco@konnichi.com> wrote:
Comparing with the delta (between the original packages), which is 5.3MB, it's not a whole lot while allowing more flexibility (IMHO).
I don't see the point. With binary deltas you get smaller packages, and getting the non-changed files "again" doesn't really cause harm, does it? In other words: what useful things does this approach allow you to do that we cannot do right now? Interesting little experiment though. It's always cool to compare numbers ;) Dieter
On 01/04/10 14:03, Dieter Plaetinck wrote:
On Thu, 01 Apr 2010 13:54:56 +0100 Nathan Wayde <kumyco@konnichi.com> wrote:
Comparing with the delta (between the original packages), which is 5.3MB, it's not a whole lot while allowing more flexibility (IMHO).
I don't see the point. With binary deltas you get smaller packages, and getting the non-changed files "again" doesn't really cause harm, does it? In other words: what useful things does this approach allow you to do that we cannot do right now?
My use-case is the downgrade scenario: you have upgraded, then later you realize there is some problem, and you no longer have any old packages. Now, in theory I could simply rebuild the pkg with local files, thus saving the large download. But what would be the point if I can simply get the files I need?
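To make "simply get the files I need" concrete, here is a small Python sketch under a big assumption: that some per-version manifest mapping each file path to a content digest is available for both the installed and the target version. Neither pacman nor the A.R.M is known to provide one, and the `downgrade_plan` name and the example paths are hypothetical.

```python
def downgrade_plan(installed: dict[str, str], target: dict[str, str]):
    """Compare two manifests that map file path -> content digest.

    Returns (fetch, remove): paths to download for the target version, and
    paths present only in the installed version that should be deleted.
    The manifest format is an assumption made for this sketch.
    """
    fetch = sorted(p for p, digest in target.items() if installed.get(p) != digest)
    remove = sorted(p for p in installed if p not in target)
    return fetch, remove


# Hypothetical manifests for two versions of the same package.
installed = {"usr/lib/libfoo.so": "d2", "usr/bin/foo": "a1", "usr/share/new.txt": "n1"}
target = {"usr/lib/libfoo.so": "d1", "usr/bin/foo": "a1"}

fetch, remove = downgrade_plan(installed, target)
print("fetch:", fetch)
print("remove:", remove)
```

Only `usr/lib/libfoo.so` would need to be fetched here, since `usr/bin/foo` is identical in both versions.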
In theory, from what I've understood of Arch Linux, you can't downgrade by downloading a package. You must fetch it from pacman's package cache. So size doesn't matter at all. But I may be wrong. -- Regards, Coues Ludovic
participants (3)
- Dieter Plaetinck
- ludovic coues
- Nathan Wayde