[pacman-dev] delta support in libalpm

Aaron Griffin aaronmgriffin at gmail.com
Thu Feb 26 17:34:11 EST 2009


On Thu, Feb 26, 2009 at 4:19 PM, Xavier <shiningxc at gmail.com> wrote:
> On Mon, Feb 23, 2009 at 6:48 PM, Aaron Griffin <aaronmgriffin at gmail.com> wrote:
>>
>> Questions which make the implementation complex:
>> * When do we generate deltas? As part of the db scripts?
>
> Well I think that would be practical. When a new package is being
> added, grab the old one, generate a delta, and add it to the database.
> This could be doable.
>
>> * How long do we keep them? 10 previous versions? 5?
>
> I would think 5 is more than enough. Allan suggested more complicated
> ways of cleaning deltas, but we could indeed just use a simple limit
> like that. There is still the problem of finding which are the 5
> newest deltas to be kept.
>
>> * How much additional space is this going to take? How do we set it up
>> so that space-constrained mirrors can opt-out of the deltas?
>>
>
> That's a very good question I didn't consider. But well, I didn't
> expect to figure out and answer all the problems alone. I know nothing
> about mirror setup.
> It seems quite a few users are interested in deltas, though,
> so maybe some of them could help by measuring how much space
> they would take.
>
>> I'm sure there's more, but that's just "off the cuff". In my eyes,
>> this is a complex change that doesn't really seem to benefit too many
>> people. If you download 3megs instead of 7, it's not that big of a
>> deal and has so many more points of failure to contend with.
>
> The benefit can be much greater than that. I just wrote a quick hack
> that generates a delta for each package upgrade on my box and
> stores them in a database. The first package that came in:
> 2,8M openjdk6-1.4-2_to_1.4.1-1-x86_64.delta
> 67M openjdk6-1.4.1-1-x86_64.pkg.tar.gz
>
> On a decent 1MB/s line, that's a 1 minute difference for a single package.
>
> But yes, it is clearly more complex and there are clearly many more
> points of failure.

So, ok, from a db-scripts point of view, we're going to have to do the
following:

when a new package is added:
   copy old package file from ftp to build dir
   generate delta from old file -> new file (in staging)
   add new pkg and delta to DB
   ? add new delta info _somewhere_?
   copy new pkg and delta to ftp

Is this correct? If so, it's not all THAT complex. Even less so if
repo-add could simply spit out the deltas on its own. If it can, we
just add logic to copy old packages to the build dir before calling
repo-add; repo-add then notices there's another package there and
uses it to generate the delta.
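
For illustration only, the steps listed above could be sketched roughly
like this. It is not the actual db-scripts code: make_delta and register
are hypothetical callables standing in for whatever delta tool and DB
update end up being used, and the ".delta" file naming is an assumption.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def add_package(
    new_pkg: Path,
    old_pkg: Optional[Path],
    ftp_dir: Path,
    build_dir: Path,
    make_delta: Callable[[Path, Path, Path], None],    # hypothetical delta tool
    register: Callable[[Path, Optional[Path]], None],  # hypothetical DB update
) -> Optional[Path]:
    """Publish new_pkg, plus a delta against old_pkg when one exists.

    Returns the path of the staged delta, or None if no delta was made.
    """
    build_dir.mkdir(parents=True, exist_ok=True)
    ftp_dir.mkdir(parents=True, exist_ok=True)

    delta = None
    if old_pkg is not None and old_pkg.exists():
        # copy old package file from ftp to build dir
        staged_old = Path(shutil.copy2(old_pkg, build_dir))
        # generate delta from old file -> new file (naming is an assumption)
        delta = build_dir / (new_pkg.name + ".delta")
        make_delta(staged_old, new_pkg, delta)

    # add new pkg and delta to DB
    register(new_pkg, delta)

    # copy new pkg and delta to ftp
    shutil.copy2(new_pkg, ftp_dir)
    if delta is not None:
        shutil.copy2(delta, ftp_dir)
    return delta
```

The open "add new delta info _somewhere_?" step is deliberately left to
register, since where that information lives is still undecided.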

Additionally, we run a cleanup script every few hours to remove old
and/or unused packages. This logic would simply need to be changed to
scan deltas and leave $RETAINED_DELTAS for each package.
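
A minimal sketch of that retention rule, assuming each delta can be tied
to a package name and that some timestamp (file mtime, for instance) is a
good enough proxy for "newest". The Delta record and deltas_to_remove
are made-up names for illustration; the real cleanup script may look
nothing like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Delta(NamedTuple):
    pkgname: str     # package the delta belongs to
    created: float   # assumed age marker, e.g. the file's mtime
    filename: str


def deltas_to_remove(deltas: Iterable[Delta], retained: int) -> List[Delta]:
    """Return every delta beyond the `retained` newest ones per package."""
    if retained < 0:
        raise ValueError("retained must be >= 0")

    # Group deltas by the package they apply to.
    by_pkg: Dict[str, List[Delta]] = defaultdict(list)
    for delta in deltas:
        by_pkg[delta.pkgname].append(delta)

    doomed: List[Delta] = []
    for group in by_pkg.values():
        # Newest first; everything past the first `retained` entries goes.
        group.sort(key=lambda d: d.created, reverse=True)
        doomed.extend(group[retained:])
    return doomed
```

Sorting on a timestamp sidesteps version comparison entirely. If deltas
can be regenerated out of order, comparing package versions would be the
safer way to decide which ones are newest.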

I haven't been following the delta stuff too closely. Can we put the
deltas in a totally unrelated directory? Is there delta information
stored in the pacman DB? If so, the cleanup gets far more complicated.

