[arch-dev-public] RFC: (devtools) Changing default compression method to zstd
Sven-Hendrik Haase
svenstaro at gmail.com
Sun Mar 24 18:42:48 UTC 2019
On Sun, 24 Mar 2019 at 19:35, Robin Broda via arch-dev-public <arch-dev-public at archlinux.org> wrote:
> Hello all,
>
> In the past few weeks, some TUs and Developers have compared different
> compression algorithms as potential replacements for the default
> compression method used in devtools.
> The current method is `xz -c -z -`, which is single-threaded and rather
> slow, so we are looking to replace it with something faster.
>
> Multithreaded xz has come up in the past and was quickly dismissed due to
> edge cases that would make packages unreproducible across different
> machines. Namely, `xz -T0` (which automatically determines the number of
> threads) produces different output when only a single core is available:
> $ taskset -c 1 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
> fe95a1af78304ae4be508e071f6697296e52b625fba95fca5622757779633d90 test.xz
> $ taskset -c 1,2 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
> 3b2c520eda654de19c5fc02ea1d850e142ae24e1246edcce82e90bd690d18f99 test.xz
> $ taskset -c 1,2,3 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
> 3b2c520eda654de19c5fc02ea1d850e142ae24e1246edcce82e90bd690d18f99 test.xz
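A minimal sketch of how the check above could be scripted for any compressor. It assumes Linux with util-linux `taskset` and coreutils `sha256sum`; the `COMPRESSOR` array and the `hash_on_cpus` and `is_reproducible` helpers are hypothetical names chosen for this illustration, and the CPU numbers passed in must exist on the machine.

```shell
#!/usr/bin/env bash
# Sketch: does a compressor give byte-identical output when restricted to
# different CPU sets? Assumes Linux, util-linux taskset, coreutils sha256sum.
# COMPRESSOR, hash_on_cpus and is_reproducible are hypothetical names.

# Compressor command as an array: reads stdin, writes to stdout.
COMPRESSOR=(xz -c -z - -T0)

# Print the SHA-256 of FILE compressed while pinned to CPUSET
# (taskset -c syntax, e.g. "1" or "1,2,3").
hash_on_cpus() {
    local cpuset=$1 file=$2
    taskset -c "$cpuset" "${COMPRESSOR[@]}" < "$file" | sha256sum | cut -d' ' -f1
}

# Print one hash per CPU set; return 0 only if all hashes are identical.
is_reproducible() {
    local file=$1 first="" h cpuset status=0
    shift
    for cpuset in "$@"; do
        h=$(hash_on_cpus "$cpuset" "$file")
        printf '%-8s %s\n' "$cpuset" "$h"
        if [[ -z $first ]]; then
            first=$h
        elif [[ $h != "$first" ]]; then
            status=1
        fi
    done
    return "$status"
}
```

For example, `is_reproducible test 1 1,2 1,2,3` mirrors the three commands above; setting `COMPRESSOR=(zstd -c -T0 -18 -)` runs the same check against zstd.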
>
> With this mail, I propose switching to `zstd` instead
> (https://github.com/facebook/zstd).
> zstd does *not* exhibit this issue. Just in case, anthraxx asked for
> clarification on the zstd issue tracker
> (https://github.com/facebook/zstd/issues/999#issuecomment-474114799),
> and the response is that zstd is generally friendly to reproducible builds.
>
> After some testing with heftig, I ran some additional benchmarks on our
> new build host 'dragon' to determine the appropriate compression level.
> Here are the results: (sorry for the wide mail :b)
>
> Compressor         Package                          Size     Comp. size  Ratio   Comp. time  Comp. RSS  Decomp. time  Decomp. RSS
> xz -c -z -         cuda                             3038,58  1316,93     43,34%  19:03.44    95,32      1:19.74       10,18
> zstd -c -T0 -18 -  cuda                             3038,58  1375,41     45,26%  01:12.50    2648,93    0:04.46       10,70
> zstd -c -T0 -19 -  cuda                             3038,58  1371,94     45,15%  01:34.13    3401,67    0:04.47       10,73
> zstd -c -T0 -20 -  cuda                             3038,58  1371,94     45,15%  01:34.34    3416,90    0:04.46       10,79
> zstd -c -T0 -21 -  cuda                             3038,58  1371,94     45,15%  01:31.60    3414,14    0:04.46       10,79
>
> xz -c -z -         gcc                              135,54   33,11       24,43%  00:54.54    95,34      0:02.59       10,11
> zstd -c -T0 -18 -  gcc                              135,54   35,87       26,47%  00:12.37    419,23     0:00.23       10,77
> zstd -c -T0 -19 -  gcc                              135,54   35,66       26,31%  00:15.76    578,99     0:00.24       10,66
> zstd -c -T0 -20 -  gcc                              135,54   35,66       26,31%  00:16.36    579,11     0:00.25       10,75
> zstd -c -T0 -21 -  gcc                              135,54   35,66       26,31%  00:16.18    579,01     0:00.25       10,46
>
> xz -c -z -         go                               484,10   122,10      25,22%  03:19.11    95,35      0:08.78       10,16
> zstd -c -T0 -18 -  go                               484,10   132,69      27,41%  00:15.40    1402,99    0:00.80       10,80
> zstd -c -T0 -19 -  go                               484,10   131,84      27,23%  00:19.74    1914,07    0:00.79       10,78
> zstd -c -T0 -20 -  go                               484,10   131,84      27,23%  00:20.19    1914,11    0:00.77       10,72
> zstd -c -T0 -21 -  go                               484,10   131,84      27,23%  00:20.08    1914,09    0:00.79       10,78
>
> xz -c -z -         intellij-idea-community-edition  772,46   384,37      49,76%  04:53.01    95,31      0:28.69       10,18
> zstd -c -T0 -18 -  intellij-idea-community-edition  772,46   392,44      50,80%  00:27.10    2341,02    0:00.91       10,63
> zstd -c -T0 -19 -  intellij-idea-community-edition  772,46   391,04      50,62%  00:37.09    3107,97    0:00.93       10,47
> zstd -c -T0 -20 -  intellij-idea-community-edition  772,46   391,04      50,62%  00:34.43    3107,87    0:00.93       10,70
> zstd -c -T0 -21 -  intellij-idea-community-edition  772,46   391,04      50,62%  00:35.45    3104,94    0:00.94       10,64
>
> xz -c -z -         linux                            80,15    70,66       88,17%  00:31.27    95,35      0:03.85       10,11
> zstd -c -T0 -18 -  linux                            80,15    70,22       87,62%  00:07.48    299,30     0:00.05       10,64
> zstd -c -T0 -19 -  linux                            80,15    70,18       87,56%  00:09.32    395,32     0:00.05       10,72
> zstd -c -T0 -20 -  linux                            80,15    70,18       87,56%  00:08.88    395,23     0:00.06       10,57
> zstd -c -T0 -21 -  linux                            80,15    70,18       87,56%  00:08.91    395,28     0:00.05       10,71
>
> xz -c -z -         linux-headers                    103,85   17,02       16,39%  00:42.24    95,35      0:01.45       10,15
> zstd -c -T0 -18 -  linux-headers                    103,85   18,92       18,22%  00:12.68    320,98     0:00.16       10,74
> zstd -c -T0 -19 -  linux-headers                    103,85   18,88       18,18%  00:16.36    448,98     0:00.17       10,63
> zstd -c -T0 -20 -  linux-headers                    103,85   18,88       18,18%  00:16.26    448,99     0:00.16       10,77
> zstd -c -T0 -21 -  linux-headers                    103,85   18,88       18,18%  00:16.39    448,97     0:00.16       10,72
>
> xz -c -z -         tensorflow                       303,10   55,58       18,34%  01:59.56    95,40      0:04.78       10,27
> zstd -c -T0 -18 -  tensorflow                       303,10   61,83       20,40%  00:15.99    856,98     0:00.47       10,64
> zstd -c -T0 -19 -  tensorflow                       303,10   61,49       20,29%  00:21.01    1176,74    0:00.50       10,68
> zstd -c -T0 -20 -  tensorflow                       303,10   61,49       20,29%  00:21.11    1176,88    0:00.49       10,64
> zstd -c -T0 -21 -  tensorflow                       303,10   61,49       20,29%  00:21.16    1176,89    0:00.50       10,67
>
> (Sizes and RSS in MiB, times in mm:ss; Comp. = compression, Decomp. = decompression.)
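A rough sketch of how a single row of the table above could be measured: compressed size, ratio and wall-clock time. This is not the script used for these benchmarks; `bench_one` is a hypothetical helper, it relies on GNU `date +%s.%N`, and it does not measure RSS (GNU time's `%M` format, which reports the maximum RSS in KiB, could be used for that).

```shell
#!/usr/bin/env bash
# Sketch: compress FILE with the given command and report sizes (MiB),
# ratio and wall-clock time. Hypothetical helper; RSS is not measured.
# Usage: bench_one FILE COMMAND [ARGS...]
bench_one() {
    local file=$1 out start end orig comp
    shift
    out=$(mktemp)
    start=$(date +%s.%N)            # GNU date: seconds with nanoseconds
    "$@" < "$file" > "$out"
    end=$(date +%s.%N)
    orig=$(wc -c < "$file")
    comp=$(wc -c < "$out")
    rm -f -- "$out"
    awk -v o="$orig" -v c="$comp" -v s="$start" -v e="$end" 'BEGIN {
        ratio = (o > 0) ? 100 * c / o : 0
        printf "%.2f MiB -> %.2f MiB (%.2f%%) in %.2f s\n",
               o / 1048576, c / 1048576, ratio, e - s
    }'
}
```

For example, `bench_one example.pkg.tar zstd -c -T0 -18 -` would time one level against an uncompressed package tarball (the file name is hypothetical).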
>
>
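> These results suggest that the ideal zstd level is `-18`: every level
> above it needs considerably more memory during compression for negligible
> gains. (Levels above 19 only take effect with `--ultra`; without it, zstd
> caps the level at 19, which is why the -19, -20 and -21 results are
> practically identical.)
> We are, however, looking at a small increase in package size: at -18 the
> compressed packages are between about 2% and 11% larger than with xz (the
> linux package even comes out slightly smaller). I consider that a
> trade-off we can make, given the incredibly fast decompression.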
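The size difference can be checked directly against the table. A small sketch, assuming bash and awk; `to_dot` and `size_increase` are hypothetical helpers, and the inputs are the table's decimal-comma sizes in MiB.

```shell
#!/usr/bin/env bash
# Sketch: relative change in compressed size, computed from the table's
# decimal-comma values (e.g. "1316,93"). Hypothetical helper names.

# Turn "1316,93" into "1316.93" so awk parses it as a number.
to_dot() { printf '%s\n' "${1/,/.}"; }

# Print by how many percent NEW differs from OLD, e.g. "+4.44%".
size_increase() {
    awk -v old="$(to_dot "$1")" -v new="$(to_dot "$2")" \
        'BEGIN { printf "%+.2f%%\n", 100 * (new - old) / old }'
}
```

For cuda (xz vs. zstd -18), `size_increase 1316,93 1375,41` prints `+4.44%`; for linux, `size_increase 70,66 70,22` prints `-0.62%`.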
>
> So, TL;DR, the benefits of `zstd -c -T0 -18 -` over `xz -c -z -` are:
> - Massive speed gain in compression
> - Massive speed gain in decompression
> - Stable, reproducible multithreading
> The faster decompression substantially speeds up package installation
> with pacman.
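The speed gains can be quantified from the table's timings. A small sketch, assuming bash and awk; `to_seconds` and `speedup` are hypothetical helpers that take times in the table's mm:ss format.

```shell
#!/usr/bin/env bash
# Sketch: speed-up factor between two timings from the table, given as
# "mm:ss.xx" (minutes may have one or two digits). Hypothetical helpers.

# Convert "mm:ss.xx" to seconds, e.g. "19:03.44" -> "1143.44".
to_seconds() {
    awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }' <<< "$1"
}

# Print how many times faster NEW is than OLD, e.g. "15.8x".
speedup() {
    awk -v old="$(to_seconds "$1")" -v new="$(to_seconds "$2")" \
        'BEGIN { printf "%.1fx\n", old / new }'
}
```

For cuda at -18, `speedup 19:03.44 01:12.50` prints `15.8x` for compression and `speedup 1:19.74 0:04.46` prints `17.9x` for decompression.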
>
> The trade-offs would be:
> - Minimal increase in compressed package size
> - Increase in memory usage during compression
>
> The required changeset is, I think:
> PKGEXT='.pkg.tar.zst'
> COMPRESSZST=(zstd -c -T0 -18 -)
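Both settings are needed because PKGEXT selects the package format, while the matching COMPRESS* array supplies the command used for that format. The following is a hypothetical sketch of that lookup, not makepkg's actual implementation; `compressor_for` is an invented name, and it only prints the configured command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick the configured compressor for a package
# extension, in the spirit of makepkg.conf's COMPRESS* arrays.
# Not makepkg's actual code.
PKGEXT='.pkg.tar.zst'
COMPRESSXZ=(xz -c -z -)
COMPRESSZST=(zstd -c -T0 -18 -)

# Print the compressor command configured for the given package extension.
compressor_for() {
    case $1 in
        *.tar.xz)  printf '%s\n' "${COMPRESSXZ[*]}" ;;
        *.tar.zst) printf '%s\n' "${COMPRESSZST[*]}" ;;
        *.tar)     printf '%s\n' cat ;;
        *) echo "unsupported extension: $1" >&2; return 1 ;;
    esac
}
```

With the settings above, `compressor_for "$PKGEXT"` prints `zstd -c -T0 -18 -`.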
>
> This change requires a new pacman release: at the time of writing, zstd
> support is in pacman's master branch but has not landed in a release yet.
>
> Judging by recent IRC chats in -tu and -devops, many TUs and Devs already
> consider this a good move.
> This mail is a general proposal to gather opinions on this change and,
> hopefully, to clear up any misunderstandings or questions before the
> actual patch is sent.
>
>
> Regards,
> Rob (coderobe)
>
These are super impressive results! Thanks a lot for running those
elaborate benchmarks. For me, that makes it really clear that we'd gain a
ton by going for zstd while really only losing very little. Clear net
benefit for me.