[pacman-dev] [PATCH] pkgdelta: use highest compression ratio when creating deltas with xdelta3
Allan McRae
allan at archlinux.org
Thu Mar 20 23:02:23 EDT 2014
On 21/03/14 12:28, Matthias Krüger wrote:
> On 03/16/2014 04:34 AM, Allan McRae wrote:
>> On 15/03/14 13:12, Matthias Krüger wrote:
>>> On 03/13/2014 07:11 AM, Allan McRae wrote:
>>>> On 06/03/14 09:25, Matthias Krüger wrote:
>>>>> Side note: it might be even more advantageous to use bsdiff instead of
>>>>> xdelta3. Comparing the /usr/bin/blender binaries of the above versions
>>>>> (12:2.69.c7ac0e-1 and 13:2.69.13290d-1):
>>>>>
>>>>> xdelta3 10.4M
>>>>> xdelta3 -9 9.9M
>>>>> bsdiff 4.7M
>>>> I took a look, and changing from xdelta3 to bsdiff would be very
>>>> simple.
>>>> It looks like it is a five minute patch...
>>>>
>>>> But what I need is for someone to generate deltas (with and without -9
>>>> maybe) for a whole bunch of packages. Then generate diffs using bsdiff
>>>> and compare the results. The comparison will need to include:
>>>>
>>>> 1) size of deltas/diffs
>>>> 2) memory used when reconstructing package
>>>> 3) time taken to reconstruct package.
>>>>
>>>> Once we have that information, we can make an informed decision.
>>>>
>>>> Allan
>>>>
>>>>
>>> I got some numbers for xdelta (-9), see attached file.
>>> If someone provides some script or tool to run bsdiff on a package
>>> properly (so that it does not diff the archives themselves), I can offer
>>> to compute the numbers for bsdiff as well.
>> The numbers for -9 show no significant change. A quick look here also
>> showed no change from adding -S djw. I'll accept a patch adding
>> both -S djw and -9 to the diff creation.
>>
>> I ran some tests on my system. bsdiff uses masses of memory when
>> reconstructing the file (needs ~16x the size of the file). It can use
>> less, but the performance penalty is massive. And it uses even more
>> when creating the diff. Coupled with its lack of transparent
>> decompression, I don't think we should consider that further.
>>
>> Allan
> Uhmm, according to the bsdiff website:
>> bsdiff is quite memory-hungry. It requires max(17*n,9*n+m)+O(1) bytes
>> of memory, where n is the size of the old file and m is the size of the
>> new file.
>> bspatch requires n+m+O(1) bytes.
> I did a quick test: generating a 40 MB new file (binary executable)
> from a 40 MB old file and a 4 kB patch was very quick. ps_mem said
> bspatch needed around 98.2 MB the one time I managed to actually
> measure it.
Looks like I mixed up the numbers for creating and applying the diff.
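For reference, a minimal sketch of what the formulas quoted from the bsdiff
website imply for the 40 MB test above. The function names are made up for
illustration and are not part of bsdiff, and the O(1) term is ignored.

```python
MIB = 1024 * 1024


def bsdiff_memory_bytes(old_size: int, new_size: int) -> int:
    """Estimated memory to create a patch: max(17*n, 9*n + m), O(1) term ignored."""
    return max(17 * old_size, 9 * old_size + new_size)


def bspatch_memory_bytes(old_size: int, new_size: int) -> int:
    """Estimated memory to apply a patch: n + m, O(1) term ignored."""
    return old_size + new_size


if __name__ == "__main__":
    # n = size of the old file, m = size of the new file (both 40 MiB here)
    n = m = 40 * MIB
    print(f"bsdiff : ~{bsdiff_memory_bytes(n, m) // MIB} MiB")   # ~680 MiB
    print(f"bspatch: ~{bspatch_memory_bytes(n, m) // MIB} MiB")  # ~80 MiB
```

By these estimates, creating the diff is the expensive step (about 17x the
old file), while applying it needs roughly the two files in memory, which is
in the same ballpark as the 98.2 MB measured with ps_mem.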
> Fedora seems to have some deltarpm tool https://gitorious.org/deltarpm
> which seems to make use of bsdiff, maybe this can be tweaked to be used
> for arch packages?
Sure. If someone comes up with a tool that takes two of our package
files (compressed tarballs), creates the diff, can reconstruct the new
package from the old one plus that diff, and uses bsdiff as its backend,
then I will consider it. Looks like some of that could be guided by
deltarpm. But as bsdiff stands, we need to manually decompress the
packages before making the diff, and decompress before and recompress
after reconstructing the package. bsdiff is not worth further
consideration without someone putting in that work.
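To make the manual steps concrete, here is a rough sketch of what such a
wrapper would have to do. It assumes xz-compressed packages and that the
bsdiff and bspatch binaries are on PATH; all function names are hypothetical
and this is not an existing pacman tool.

```python
import lzma
import shutil
import subprocess
from pathlib import Path


def decompress_xz(package: Path, tarball: Path) -> None:
    """Write the uncompressed tar stream of an xz-compressed package."""
    with lzma.open(package, "rb") as src, open(tarball, "wb") as dst:
        shutil.copyfileobj(src, dst)


def compress_xz(tarball: Path, package: Path) -> None:
    """Recompress a tarball with xz (default settings, an assumption)."""
    with open(tarball, "rb") as src, lzma.open(package, "wb") as dst:
        shutil.copyfileobj(src, dst)


def create_diff(old_pkg: Path, new_pkg: Path, patch: Path, workdir: Path) -> None:
    """Decompress both packages, then diff the raw tarballs."""
    old_tar, new_tar = workdir / "old.tar", workdir / "new.tar"
    decompress_xz(old_pkg, old_tar)
    decompress_xz(new_pkg, new_tar)
    # bsdiff usage: bsdiff <oldfile> <newfile> <patchfile>
    subprocess.run(["bsdiff", str(old_tar), str(new_tar), str(patch)], check=True)


def apply_diff(old_pkg: Path, patch: Path, new_pkg: Path, workdir: Path) -> None:
    """Decompress the old package, patch the tarball, recompress the result."""
    old_tar, new_tar = workdir / "old.tar", workdir / "new.tar"
    decompress_xz(old_pkg, old_tar)
    # bspatch usage: bspatch <oldfile> <newfile> <patchfile>
    subprocess.run(["bspatch", str(old_tar), str(new_tar), str(patch)], check=True)
    compress_xz(new_tar, new_pkg)
```

The extra decompress and recompress passes are exactly the work that would
have to be added around bsdiff.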
For the moment, I will take a patch adding "-S djw -9" to our delta
creation script.
Allan