[arch-general] Xz, space savings and delta ... my findings

Nathan Wayde kumyco at konnichi.com
Thu Apr 1 14:54:56 CEST 2010

Here's one for ya.

Decompress a gzip'ed package, say pacman. Now, recompress every file in 
that directory with `xz -1`, smile :)

Result: the re-compression is faster than a tar+gz, and in each case so 
far (kernel26, gcc, perl, pacman, kdelibs) the output is a little smaller 
as well. It doesn't approach the (much smaller) size of tar+xz, but it 
does take less time to compress and is smaller than what we had before.
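
For anyone who wants to repeat this without a shell loop, here is a rough 
Python sketch of the per-file recompression. It assumes that lzma preset 1 
is a fair stand-in for `xz -1`; the function name recompress_tree is made 
up for illustration, and symlinks are simply skipped.

```python
import lzma
import shutil
from pathlib import Path


def recompress_tree(root, preset=1):
    """Compress every regular file under root to <name>.xz.

    Returns (original_bytes, compressed_bytes). preset=1 is assumed to
    correspond to `xz -1`.
    """
    original = compressed = 0
    # sorted() materialises the listing first, so the .xz files created
    # below are never picked up by the walk itself.
    for path in sorted(Path(root).rglob("*")):
        # Skip directories, symlinks, and files that are already .xz.
        if path.is_symlink() or not path.is_file() or path.suffix == ".xz":
            continue
        target = path.with_name(path.name + ".xz")
        with path.open("rb") as src, lzma.open(target, "wb", preset=preset) as dst:
            shutil.copyfileobj(src, dst)
        original += path.stat().st_size
        compressed += target.stat().st_size
        path.unlink()  # like xz, drop the uncompressed input
    return original, compressed
```

Calling recompress_tree("pkg/") on an extracted package returns the byte 
totals before and after, which is enough to compare against the size of 
the original tarball.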

I know, you're thinking: why do that? I wanted to implement an idea for 
the A.R.M, which is basically to allow downgrades on a file-by-file 
basis. My theory is that most (or at least many) of the files in 2 
versions of a package (esp. when they are minor versions or pkgrel 
bumps) are exactly the same.

My test involved kdelibs-4.3.98-1 and kdelibs-4.4.0-3.
I first decompressed both and removed the duplicates from kdelibs-4.4.0-3, 
which resulted in sizes of 59.5MB for the original kdelibs-4.3.98-1 and 
35.5MB for the de-duped kdelibs-4.4.0-3. Recompressing with `xz -1` 
then brings kdelibs-4.4.0-3 down to 10.7MB.
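
The de-duplication step could be sketched roughly as below. The post does 
not say how duplicates were detected, so this assumes a file counts as a 
duplicate when the same relative path exists in the older package with 
identical contents (compared by size, then SHA-256). The names file_digest 
and remove_duplicates are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def remove_duplicates(old_root, new_root):
    """Delete files in new_root that are byte-identical to the file at
    the same relative path in old_root.

    Returns the removed relative paths (POSIX style), in sorted order.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    removed = []
    for new_file in sorted(new_root.rglob("*")):
        if new_file.is_symlink() or not new_file.is_file():
            continue
        rel = new_file.relative_to(new_root)
        old_file = old_root / rel
        if old_file.is_symlink() or not old_file.is_file():
            continue
        # Cheap size check first; hash only when the sizes match.
        if (old_file.stat().st_size == new_file.stat().st_size
                and file_digest(old_file) == file_digest(new_file)):
            new_file.unlink()
            removed.append(rel.as_posix())
    return removed
```

A real downgrade tool would also need to keep the returned list around, 
since those files have to be taken from the older package when the newer 
one is reassembled.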

Compared with the delta (between the original packages), which is 5.3MB, 
it's not a whole lot bigger, while allowing more flexibility (IMHO).
Some more numbers for comparison: the tar+xz (level -6) for 
kdelibs-4.4.0-3 (de-duped) was 8.4MB, compared with 12.6MB for tar+gz. 
The original tar'd sizes were 19.8MB for gz and 13.9MB for xz.
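
To put those figures side by side, the small snippet below works out how 
much each variant saves relative to the original tar+gz. The sizes are 
copied from this post; the labels and the helper name saving are only 
illustrative, and the 19.8MB and 13.9MB figures are assumed to refer to 
the full kdelibs-4.4.0-3 package.

```python
# Sizes in MB, taken from the numbers quoted in this post.
sizes = {
    "original tar+gz": 19.8,
    "original tar+xz": 13.9,
    "de-duped tar+gz": 12.6,
    "de-duped per-file xz -1": 10.7,
    "de-duped tar+xz -6": 8.4,
    "delta between original packages": 5.3,
}


def saving(size, baseline):
    """Percentage by which size is smaller than baseline, to 1 decimal."""
    return round(100 * (1 - size / baseline), 1)


baseline = sizes["original tar+gz"]
for name, size in sizes.items():
    pct = saving(size, baseline)
    print(f"{name:<34} {size:5.1f} MB  {pct:5.1f}% smaller than original tar+gz")
```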

I know I didn't test based on pkgrel or try gz only (as opposed to xz -1), 
nor did I do it very scientifically, but I thought it was very interesting.
I also suspect that similar results would be seen for a test with 
xdelta on each file, but that's not very useful to me so I'll leave that 
one for another day.

