[pacman-dev] delta support in libalpm

Brendan Hide brendan at swiftspirit.co.za
Mon Feb 23 06:27:46 EST 2009


Xavier wrote:
> Everything is already implemented in pacman, with a more complex logic
> (which might be totally useless after all)
> For each package in a sync db, there is a deltas file besides the
> depends and desc one which basically contains the list of deltas for
> that package and their size. With this information, and the contents
> of the filecache, it computes the shortest path (in term of download
> size) to the final package.
> That logic applied to an example :
> if you have file v1 in your cache, you want to upgrade to v3, and
> there are three deltas for this package : v1tov2 , v2tov3 and v1tov3
> If v1tov2 + v2tov3 is smaller than v1tov3, it will download the first
> two deltas and apply them to get v3. Otherwise it will download the
> third one.
>
> The problem of this implementation (besides being probably overkill)
> is that it requires information in the sync databases. So either it
> requires a big official effort to integrate this stuff and add deltas
> to all the official databases. Otherwise, I don't know. You need to
> fully mirror the repository you want to add deltas to, then you need
> to generate deltas (maybe during mirror sync) and to add the deltas to
> your database, and then host everything somewhere (the packages + the
> deltas + the database with delta info).
>   
This makes a lot more sense to me now. Thank you for the clarification, 
Xavier. It is the most efficient way, end-user-wise, despite the 
possibly-excessive metadata. It isn't necessarily efficient for the 
server. :/
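
To check my understanding of the "shortest path" logic, here is a rough
sketch in Python. It is not pacman's actual code: the function name, the
(from, to, size) tuple layout and the sizes are made up for illustration,
and it ignores the option of simply downloading the full package.

```python
import heapq

def cheapest_delta_path(deltas, cached, target):
    """Pick the delta chain with the smallest total download size.

    deltas: list of (from_version, to_version, size) tuples (hypothetical layout).
    Returns (total_size, [delta names]) or None if no chain reaches target.
    """
    graph = {}
    for src, dst, size in deltas:
        graph.setdefault(src, []).append((dst, size))

    # Dijkstra's algorithm, starting from the version already in the cache.
    heap = [(0, cached, [])]
    done = set()
    while heap:
        cost, version, path = heapq.heappop(heap)
        if version == target:
            return cost, path
        if version in done:
            continue
        done.add(version)
        for nxt, size in graph.get(version, []):
            heapq.heappush(heap, (cost + size, nxt, path + [f"{version}to{nxt}"]))
    return None

# Xavier's example: v1 is cached, we want v3 (sizes are invented).
example = [("v1", "v2", 100), ("v2", "v3", 150), ("v1", "v3", 300)]
print(cheapest_delta_path(example, "v1", "v3"))
```

With these invented sizes the two small deltas win over the direct one;
make v1tov3 smaller than their sum and the direct delta is chosen instead.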

Looking at the logistics, the best time to make the delta is after the 
new .pkg.tar.(gz|bz2) is uploaded to the repo. I assume this is also 
about the time the db is updated. This could be implemented repo-wide as 
packages are updated and delta'd without any individual package maker's 
direct involvement in the delta process - a "passive" change that won't 
need to change anyone's habits.

If you really want to be able to make lots of delta versions, i.e. 
v1tov2, v1tov3, v1tov4, v2tov3, v2tov4, v3tov4, then you'd probably have 
to keep at least 4 older (full) versions, which will take up a lot of 
disk space - or you'd need to regenerate all the other versions, which 
would take up a *lot* of IO / RAM / CPU during the generation of the new 
deltas.

If you only take v1tov2, v2tov3, v3tov4, you only need to keep v4 and 
the 3 deltas. When v5 gets uploaded, you create v4tov5 and delete v4 
from the server thus saving disk space. This is much simpler and more 
implementable than the current "brief".
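
To put numbers on the difference, a small sketch in Python. The function
names and the dict used for the server state are hypothetical, just for
illustration; no real delta tool is invoked here.

```python
def all_pairs_deltas(versions):
    """Every older-to-newer delta: n*(n-1)/2 files for n versions."""
    return [f"{a}to{b}" for i, a in enumerate(versions) for b in versions[i + 1:]]

def chain_deltas(versions):
    """Only consecutive deltas: n-1 files for n versions."""
    return [f"{a}to{b}" for a, b in zip(versions, versions[1:])]

def upload(state, new_version):
    """Simulate a new package upload under the chain scheme.

    state: {"full": latest full version or None, "deltas": [delta names]}.
    The delta old->new is made while the old full package still exists,
    then only the new full package is kept.
    """
    deltas = list(state["deltas"])
    if state["full"] is not None:
        deltas.append(f"{state['full']}to{new_version}")
    return {"full": new_version, "deltas": deltas}

versions = ["v1", "v2", "v3", "v4"]
print(len(all_pairs_deltas(versions)), len(chain_deltas(versions)))

state = {"full": None, "deltas": []}
for v in versions + ["v5"]:
    state = upload(state, v)
print(state)
```

For four versions that is 6 deltas against 3, and the gap grows
quadratically; the chain scheme only ever needs the previous full package
at the moment the new one is uploaded.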

Mirror servers can mirror the old way - inefficiently - but they 
should mirror the deltas across too. I'd guess that the mirror servers 
use a lot less bandwidth from the official repository than the end users 
do.

The net result, I believe, is a much simpler implementation that still 
achieves 99% of the original brief's goal.

Your thoughts?

__________
Brendan Hide


