new Delta Update support
Hey guys,

pacman previously had delta update functionality, which was (to my understanding) removed because of safety concerns and poor performance.

Since the old delta support was written (and removed), zstd has gained the ability to create delta patches. I ran some tests on that, and it looks very promising: creating and applying the patches is fast, and the deduplication ratio is phenomenal.

I created the patches on an Arch VM running on an AMD EPYC 7702P with 10 unshared cores and 32 GB memory; the storage is an SSD RAID10.

# Example: Widelands

Version 1.0-1 is 731M, and it received 4 successive updates within ~1 year with only the pkgrel increasing. The 4 additional versions total 2.9G.

Creating all 4 delta patches:

  1->5
  2->5
  3->5
  4->5

takes a total of

  real 0m24.383s
  user 0m19.814s
  sys  0m5.859s

and all patches combined are 39M.

I tested applying them on a very old netbook, the slowest device I have on hand: 2 GB memory, Intel Atom x5-Z8300, and Arch on an MMC-SSD. The 4->5 patch, for example, takes

  real 1m32,850s
  user 0m2,325s
  sys  0m16,562s

On a more modern system (Intel Core i5-1135G7 / 16 GB memory / NVMe SSD) this takes just

  real 0m1.208s
  user 0m0.255s
  sys  0m0.748s

Downloading the full package over LTE, on the other hand, takes 9m22s; downloading just the delta patch takes 0m5s.

# Other examples

I chose a fairly big package with (obviously) few changes between the pkgrel versions, so here are some other examples:

  libreoffice-fresh-7.2.5-5 to 7.3.0-1    saving 40%
  libreoffice-fresh-7.3.0-1 to 7.3.0-2    saving 45%
  0ad-data-a25-1 to a25-2                 saving 99.96% *
  0ad-data-a25-2 to a25.b-1               saving 99.84% *
  glibc-2.35-3 to 2.35-4                  saving 70%
  glibc-2.32-5 to 2.35-1                  saving 51.8%
  opencv-4.5.5-4 to opencv-4.5.5-5        saving 93.3%
  opencv-4.5.4-9 to 4.5.5-1               saving 37%

* I had to split the tar archive after 1.6 GB and make patches for each part, since zstd can only handle 2 GB files.

# What would need to be done, to get this going?
Well, the packages in the repo need to get a second signature, for the uncompressed tar package. Pacman could first try to fetch this tar signature. If it's on the server, the server supports delta updates; if not, the full update would be downloaded.

The database files would need to be extended to include the checksum and signature of the uncompressed tar archive alongside those of the compressed package.

Pacman can then fetch the patch file that leads from its stored version to the latest version, decompress the locally stored package, and apply the patch file to it. Then pacman would check the signature/checksum of the resulting tar archive, read it, and discard the uncompressed files afterwards.

# Caveats

This would obviously result in the pkg cache containing one full package file followed by a growing chain of deltas over time: 1 (full), then 1->2 delta, 2->3 delta, 3->4 delta.

This could be cleaned up by determining the last version that should be kept, compressing the resulting file locally, and storing it with a dedicated extension so it does not clash with the regular packages (and their signatures).

# Database files

This could also work for database updates, obviously. But it would need a bit more work, as the database files would need to be versioned (or maybe the timestamp is enough?).

On a daily update of the community db, we could for example save 88.3% (2022-05-06 to 2022-05-07).

For updates within a single day, this would be down to just a couple of K: the last update of the community repo was just 40K as a patch file. That means saving 99.4%, while applying it takes only 0m0.075s.

---

Hope that's interesting and something you could look into :)

Best regards,

Ruben
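The rebuild-and-verify step described in the mail above can be sketched roughly as follows. This is only an illustration under stated assumptions, not pacman code: the helper names (`create_patch_cmd`, `apply_patch_cmd`, `sha256_of`, `rebuild_tar`) are hypothetical, SHA-256 stands in for whatever checksum or signature the database would actually carry, and `--long=31` is assumed to be needed so zstd accepts the large window that big tar archives require.

```python
import hashlib
import subprocess
from pathlib import Path


def create_patch_cmd(old_tar, new_tar, patch):
    """zstd command that encodes new_tar as a patch against old_tar."""
    return ["zstd", "--long=31", f"--patch-from={old_tar}", str(new_tar), "-o", str(patch)]


def apply_patch_cmd(old_tar, patch, out_tar):
    """zstd command that rebuilds the new tar from old_tar plus the patch."""
    return ["zstd", "-d", "--long=31", f"--patch-from={old_tar}", str(patch), "-o", str(out_tar)]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tar archives are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rebuild_tar(old_tar, patch, out_tar, expected_sha256):
    """Apply the patch, then reject the result unless the checksum matches.

    Assumption: expected_sha256 comes from the (extended) repo database.
    """
    subprocess.run(apply_patch_cmd(old_tar, patch, out_tar), check=True)
    if sha256_of(out_tar) != expected_sha256:
        Path(out_tar).unlink()
        raise ValueError("rebuilt tar does not match the published checksum")
    return Path(out_tar)
```

If `rebuild_tar` raises, or the patch cannot be fetched at all, the caller would simply fall back to downloading the full package, as in the proposal.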
On 8/5/22 09:10, Ruben Kelevra wrote:
Hey guys,
previously pacman have had a delta update functionality which was (to my understanding) removed because of safety concerns and poor performance.
Since the old delta version was written (and removed) zstd got the ability to create delta patches.
Thanks. However, I have little interest in tying delta support to a specific compression algorithm.

Allan
Hey Allan,

well, this works with all compression algorithms. Only the delta patch would be compressed with zstd.

Since pacman would decompress the regular package to a tar, and the result of applying the patch is also a tar, it doesn't matter which compression algorithm the packages use.

Apart from that, the delta patch could be made with any program; zstd is just currently the best option IMHO.

With the default options, xdelta3 takes around 10 times longer to create a patch than zstd, although the resulting xdelta3 patches are a bit smaller. But xdelta3 uses zlib, so decoding is much slower as well. That's why I used zstd in my experiments instead.

Zstd can, however, be asked to invest more time in compression, with something like '-15'. The result then usually beats xdelta3 patch files in size while still being much faster to decode. I just don't see the point of investing that much CPU time.

The major change for pacman would be to support a tar-based checksum and signature along with the compressed file's signature/checksum. It would additionally need to understand which files to fetch to get from version x to y, and which program to call to create the tar it needs.

Best regards,

Ruben

On Sun, 8 May 2022 at 09:48, Allan McRae <allan@archlinux.org> wrote:
On 8/5/22 09:10, Ruben Kelevra wrote:
Hey guys,
previously pacman have had a delta update functionality which was (to my understanding) removed because of safety concerns and poor performance.
Since the old delta version was written (and removed) zstd got the ability to create delta patches.
Thanks. However I have little interest tying delta support to a specific compression algorithm.
Allan
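Working out "which files to fetch to get from version x to y", as mentioned in the reply above, amounts to finding a path through the patches a server offers. A minimal sketch, assuming the available patches are known as (from_version, to_version) pairs; the function name `plan_patches` and the pair representation are hypothetical and not part of pacman:

```python
from collections import deque


def plan_patches(installed, target, available):
    """Return the shortest list of (from, to) patches leading from
    installed to target, or None if no chain of patches exists."""
    if installed == target:
        return []
    edges = {}
    for src, dst in available:
        edges.setdefault(src, []).append(dst)
    came_from = {installed: None}
    queue = deque([installed])
    while queue:  # breadth-first search: fewest patches first
        version = queue.popleft()
        for nxt in edges.get(version, []):
            if nxt in came_from:
                continue
            came_from[nxt] = version
            if nxt == target:
                # Walk back from the target to rebuild the chain.
                path = []
                while came_from[nxt] is not None:
                    path.append((came_from[nxt], nxt))
                    nxt = came_from[nxt]
                return path[::-1]
            queue.append(nxt)
    return None
```

A result of `None` would mean falling back to the full package download. With direct patches such as 1->5 on the server the plan is a single file; with only successive patches it becomes the 1->2, 2->3, ... chain.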