On Thu, Feb 26, 2009 at 4:19 PM, Xavier <shiningxc@gmail.com> wrote:
On Mon, Feb 23, 2009 at 6:48 PM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
Questions which make the implementation complex:
* When do we generate deltas? As part of the db scripts?
Well I think that would be practical. When a new package is being added, grab the old one, generate a delta, and add it to the database. This could be doable.
* How long do we keep them? 10 previous versions? 5?
I would think 5 is more than enough. Allan suggested more complicated ways of cleaning deltas, but we could indeed just use a simple limit like that. There is still the problem of working out which 5 deltas are the newest ones to keep.
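One way the "which 5 are newest" selection might look, as a rough sketch only. It assumes deltas are named like the single example in this thread (`<name>-<oldver>-<oldrel>_to_<newver>-<newrel>-<arch>.delta`) and that file modification time is a good enough stand-in for "newest"; a real script would probably want pacman's own version comparison instead. `split_deltas` and `DELTA_RE` are made-up names, not part of any existing tool.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme, inferred from one example filename in this thread:
#   <pkgname>-<oldver>-<oldrel>_to_<newver>-<newrel>-<arch>.delta
DELTA_RE = re.compile(
    r"^(?P<name>.+)-(?P<old>[^-]+-[^-]+)_to_(?P<new>[^-]+-[^-]+)-(?P<arch>[^-]+)\.delta$"
)

def split_deltas(delta_dir, keep=5):
    """Return (kept, stale) delta paths, keeping the newest `keep` per package and arch."""
    groups = defaultdict(list)
    for path in Path(delta_dir).glob("*.delta"):
        m = DELTA_RE.match(path.name)
        if m is None:
            continue  # unknown naming; leave the file alone
        groups[(m["name"], m["arch"])].append(path)

    kept, stale = [], []
    for paths in groups.values():
        # Newest first, ordered by mtime (an assumption, not a version compare).
        paths.sort(key=lambda p: p.stat().st_mtime, reverse=True)
        kept.extend(paths[:keep])
        stale.extend(paths[keep:])
    return kept, stale
```

The function only reports which files it would drop; actually deleting the `stale` ones is left to the caller so it can be dry-run first.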
* How much additional space is this going to take? How do we set it up so that space-constrained mirrors can opt-out of the deltas?
That's a very good question that I hadn't considered. But well, I didn't expect to figure out and answer all the problems alone. I know nothing about mirror setup. It seems there are quite a few users interested in deltas though, so maybe some of them could help by providing numbers on how much space they would take.
I'm sure there's more, but that's just "off the cuff". In my eyes, this is a complex change that doesn't really seem to benefit too many people. If you download 3megs instead of 7, it's not that big of a deal and has so many more points of failure to contend with.
The benefit can be much greater than that. I just wrote a quick hack that generates a delta for each package upgrade on my box and stores them in a database. The first package that came in:
2,8M openjdk6-1.4-2_to_1.4.1-1-x86_64.delta
67M openjdk6-1.4.1-1-x86_64.pkg.tar.gz
On a decent 1MB/s line, that's a 1 minute difference for a single package.
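For anyone who wants to plug in their own numbers, the arithmetic behind that figure is roughly the following. It reads the "M" sizes as megabytes and ignores the time needed to apply the delta locally, so it is only a back-of-the-envelope estimate.

```python
# Sizes from the openjdk6 example above, read as megabytes (an assumption).
full_mb = 67.0    # openjdk6-1.4.1-1-x86_64.pkg.tar.gz
delta_mb = 2.8    # openjdk6-1.4-2_to_1.4.1-1-x86_64.delta
rate_mb_s = 1.0   # the "decent 1MB/s line"

def download_seconds(size_mb, rate_mb_per_s):
    """Transfer time in seconds, ignoring latency and delta-apply time."""
    return size_mb / rate_mb_per_s

saved = download_seconds(full_mb, rate_mb_s) - download_seconds(delta_mb, rate_mb_s)
ratio = delta_mb / full_mb

print(f"time saved: {saved:.1f} s")
print(f"delta is {ratio:.1%} of the full package")
```

That comes out at a bit over a minute saved, with the delta at roughly 4% of the full download.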
But yes, it is clearly more complex and there are clearly many more points of failure.
So, ok, from a db-scripts point of view, we're going to have to do the following when a new package is added:

* copy old package file from ftp to build dir
* generate delta from old file -> new file (in staging)
* add new pkg and delta to DB
* ? add new delta info _somewhere_?
* copy new pkg and delta to ftp

Is this correct? If so, it's not all THAT complex. Less so if repo-add could simply spit out the deltas on its own. If it can, we can simply add the logic to copy old packages to the build dir before calling repo-add; repo-add realizes there's another package there and uses it for deltas.

Additionally, we run a cleanup script every few hours to remove old and/or unused packages. This logic would simply need to be changed to scan deltas and leave $RETAINED_DELTAS for each package.

I haven't been following the delta stuff too much. Can we put the deltas in a totally unrelated directory? Is there delta information stored in the pacman DB? If so, the cleanup gets far more complicated.
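To make the add-package steps above a bit more concrete, here is a rough sketch of the flow. It is not the actual db-scripts code: `add_package_with_delta`, `make_delta` and `register` are hypothetical names, and the two hooks stand in for whatever delta tool and repo-add invocation end up being used, since those details are still open in this thread.

```python
import shutil
from pathlib import Path

def add_package_with_delta(new_pkg, old_pkg_on_ftp, build_dir, ftp_dir,
                           make_delta, register):
    """Sketch of the add-package flow; make_delta and register are hypothetical hooks."""
    new_pkg = Path(new_pkg)
    build_dir, ftp_dir = Path(build_dir), Path(ftp_dir)
    outputs = [new_pkg]

    if old_pkg_on_ftp is not None:
        # 1. copy the old package file from ftp to the build dir
        old_local = Path(shutil.copy2(old_pkg_on_ftp, build_dir))
        # 2. generate the delta old -> new (the tool is left to the hook)
        outputs.append(Path(make_delta(old_local, new_pkg, build_dir)))

    # 3. add the new package (and the delta, if any) to the DB
    register(outputs)

    # 4. copy the new package and the delta to ftp
    return [Path(shutil.copy2(p, ftp_dir)) for p in outputs]
```

Passing the delta generator and the DB step in as callables keeps the open questions (where delta info lives, whether repo-add produces the delta itself) out of the file-shuffling logic.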