[arch-general] Xz, space savings and delta ... my findings
kumyco at konnichi.com
Thu Apr 1 14:54:56 CEST 2010
Here's one for ya.
Decompress a gzip'ed package, say pacman. Now, recompress every file in
that directory with `xz -1`, smile :)
Result: the re-compression is faster than tar+gz, and in each case so
far (kernel26, gcc, perl, pacman, kdelibs) the result is a little smaller
as well. It doesn't approach the (much smaller) size of tar+xz, but it
does take less time to compress and is smaller than what we had before.
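For anyone who wants to repeat the per-file measurement without shelling out, here is a rough Python sketch. It assumes the package is already unpacked into a directory, and that preset 1 of Python's `lzma` module is a fair stand-in for `xz -1`; the function name `xz_per_file_total` is made up for illustration. It only adds up compressed sizes and does not time anything.

```python
import lzma
import os


def xz_per_file_total(root, preset=1):
    """Total size in bytes if every regular file under root is
    xz-compressed on its own (hypothetical helper, not part of any tool)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Symlinks and special files are skipped in this sketch.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                # lzma.compress writes the .xz container format by default.
                total += len(lzma.compress(fh.read(), preset=preset))
    return total


if __name__ == "__main__":
    import sys
    print(xz_per_file_total(sys.argv[1]))
```

The total can then be compared against the size of the original .pkg.tar.gz on disk.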
I know, you're wondering why anyone would do that. I wanted to implement
an idea for the A.R.M, which is basically to allow downgrades on a
file-by-file basis. My theory is that most (or at least many) of the
files in two versions of a package (esp. when they are minor versions or
pkgrel bumps) are exactly the same.
My test involved kdelibs-4.3.98-1 and kdelibs-4.4.0-3.
I first decompressed both and removed the duplicates from
kdelibs-4.4.0-3, which results in sizes of 59.5MB for the original
kdelibs-4.3.98-1 and 35.5MB for the de-duped kdelibs-4.4.0-3.
Recompressing with `xz -1` brings kdelibs-4.4.0-3 down to 10.7MB.
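The de-dupe step could be sketched roughly like this in Python. It assumes both versions are unpacked side by side and that a "duplicate" means the same relative path with identical contents (compared by SHA-256); that matching rule and the names `file_digest` and `changed_files` are assumptions for illustration, not necessarily how the numbers above were produced.

```python
import hashlib
import os


def file_digest(path):
    """SHA-256 of a file, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(old_root, new_root):
    """Relative paths under new_root that are missing from old_root
    or whose contents differ (hypothetical helper)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(new_root):
        for name in filenames:
            new_path = os.path.join(dirpath, name)
            # Symlinks and special files are skipped in this sketch.
            if os.path.islink(new_path) or not os.path.isfile(new_path):
                continue
            rel = os.path.relpath(new_path, new_root)
            old_path = os.path.join(old_root, rel)
            if (not os.path.isfile(old_path)
                    or file_digest(old_path) != file_digest(new_path)):
                changed.append(rel)
    return sorted(changed)
```

Only the files returned by `changed_files` would need to be stored for the newer version; everything else can be taken from the older one.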
Compared with the delta (between the original packages), which is 5.3MB,
that's not a whole lot more, while allowing more flexibility (IMHO).
Some more numbers for comparison: the tar+xz (level -6) for
kdelibs-4.4.0-3 (de-duped) was 8.4MB, compared with 12.6MB for tar+gz.
The original tar'd sizes were 19.8MB for gz and 13.9MB for xz.
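To put those figures side by side, a few lines of Python work out the relative savings. The sizes are the ones quoted in this mail; the helper name `saving_percent` is just for illustration.

```python
def saving_percent(before_mb, after_mb):
    """Percentage saved going from before_mb to after_mb."""
    return (1 - after_mb / before_mb) * 100


if __name__ == "__main__":
    # (full package, de-duped package) in MB, as reported above
    sizes = {"tar+gz": (19.8, 12.6), "tar+xz -6": (13.9, 8.4)}
    for name, (full, deduped) in sizes.items():
        print(f"{name}: de-duping saves {saving_percent(full, deduped):.1f}%")
    # Per-file xz -1 (10.7MB) relative to the delta (5.3MB)
    print(f"per-file xz -1 vs delta: {10.7 / 5.3:.1f}x the size")
```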
I know I didn't test based on pkgrel or try gz only (as opposed to xz
-1), nor did I do it very scientifically, but I thought it was very
interesting.
I also suspect that similar results would be seen for a test with
xdelta on each file, but that's not very useful to me so I'll leave that
one for another day.