[arch-dev-public] RFC: (devtools) Changing default compression method to zstd

Sven-Hendrik Haase svenstaro at gmail.com
Sun Mar 24 18:42:48 UTC 2019


On Sun, 24 Mar 2019 at 19:35, Robin Broda via arch-dev-public <
arch-dev-public at archlinux.org> wrote:

> Hello all,
>
> in the past few weeks, some TUs and Developers have compared different
> compression algorithms to potentially replace the default compression
> method used in devtools.
> The current method is `xz -c -z -` which is single-threaded and rather
> slow, so we are looking to replace it with something faster.
>
> Multithreaded xz has come up in the past, and was quickly dismissed due to
> edge cases that would make packages unreproducible across different
> machines. Namely, xz -T0 (the mode that automatically determines the
> number of threads) produces different results when only one core is
> available:
> $ taskset -c 1 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
> fe95a1af78304ae4be508e071f6697296e52b625fba95fca5622757779633d90  test.xz
> $ taskset -c 1,2 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
> 3b2c520eda654de19c5fc02ea1d850e142ae24e1246edcce82e90bd690d18f99  test.xz
> $ taskset -c 1,2,3 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
> 3b2c520eda654de19c5fc02ea1d850e142ae24e1246edcce82e90bd690d18f99  test.xz
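The check quoted above can also be scripted. Below is a minimal Python sketch, not part of devtools: the helper names `output_digest` and `is_reproducible` are made up for illustration, and it assumes `taskset` and `xz` are installed when run with the listed commands. It pipes the same input through several command lines and compares the SHA-256 of what each one writes to stdout.

```python
import hashlib
import subprocess

def output_digest(cmd, data):
    """Feed `data` to `cmd` on stdin and return the SHA-256 hex digest of its stdout."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()

def is_reproducible(cmds, data):
    """Return True if every command produces byte-identical output for `data`."""
    return len({output_digest(cmd, data) for cmd in cmds}) == 1

# The three invocations from the mail, pinned to 1, 2 and 3 cores.
# They are only listed here, not executed.
XZ_VARIANTS = [
    ["taskset", "-c", "1", "xz", "-c", "-z", "-", "-T0"],
    ["taskset", "-c", "1,2", "xz", "-c", "-z", "-", "-T0"],
    ["taskset", "-c", "1,2,3", "xz", "-c", "-z", "-", "-T0"],
]
```

Calling `is_reproducible(XZ_VARIANTS, open("test", "rb").read())` would be expected to return False for the case described in the mail, since the single-core run hashes differently from the other two.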
>
> With this mail, I propose to switch to `zstd` instead (
> https://github.com/facebook/zstd).
> zstd does *not* exhibit this issue, and anthraxx has asked upstream for
> clarification (
> https://github.com/facebook/zstd/issues/999#issuecomment-474114799),
> just in case.
> The response is that zstd is generally friendly to reproducible builds.
>
> After some testing with heftig, I ran some additional benchmarks on our
> new build host 'dragon' to determine the appropriate compression level.
> Here are the results: (sorry for the wide mail :b)
>
> Compressor         Package Name                     Size (MiB)  Comp. Size (MiB)  Ratio   Time (mm:ss)  Max. RSS in MiB  Decomp. Time (mm:ss)  Decomp. RSS in MiB
> xz -c -z -         cuda                             3038,58     1316,93           43,34%  19:03.44      95,32            1:19.74               10,18
> zstd -c -T0 -18 -  cuda                             3038,58     1375,41           45,26%  01:12.50      2648,93          0:04.46               10,70
> zstd -c -T0 -19 -  cuda                             3038,58     1371,94           45,15%  01:34.13      3401,67          0:04.47               10,73
> zstd -c -T0 -20 -  cuda                             3038,58     1371,94           45,15%  01:34.34      3416,90          0:04.46               10,79
> zstd -c -T0 -21 -  cuda                             3038,58     1371,94           45,15%  01:31.60      3414,14          0:04.46               10,79
> xz -c -z -         gcc                              135,54      33,11             24,43%  00:54.54      95,34            0:02.59               10,11
> zstd -c -T0 -18 -  gcc                              135,54      35,87             26,47%  00:12.37      419,23           0:00.23               10,77
> zstd -c -T0 -19 -  gcc                              135,54      35,66             26,31%  00:15.76      578,99           0:00.24               10,66
> zstd -c -T0 -20 -  gcc                              135,54      35,66             26,31%  00:16.36      579,11           0:00.25               10,75
> zstd -c -T0 -21 -  gcc                              135,54      35,66             26,31%  00:16.18      579,01           0:00.25               10,46
> xz -c -z -         go                               484,10      122,10            25,22%  03:19.11      95,35            0:08.78               10,16
> zstd -c -T0 -18 -  go                               484,10      132,69            27,41%  00:15.40      1402,99          0:00.80               10,80
> zstd -c -T0 -19 -  go                               484,10      131,84            27,23%  00:19.74      1914,07          0:00.79               10,78
> zstd -c -T0 -20 -  go                               484,10      131,84            27,23%  00:20.19      1914,11          0:00.77               10,72
> zstd -c -T0 -21 -  go                               484,10      131,84            27,23%  00:20.08      1914,09          0:00.79               10,78
> xz -c -z -         intellij-idea-community-edition  772,46      384,37            49,76%  04:53.01      95,31            0:28.69               10,18
> zstd -c -T0 -18 -  intellij-idea-community-edition  772,46      392,44            50,80%  00:27.10      2341,02          0:00.91               10,63
> zstd -c -T0 -19 -  intellij-idea-community-edition  772,46      391,04            50,62%  00:37.09      3107,97          0:00.93               10,47
> zstd -c -T0 -20 -  intellij-idea-community-edition  772,46      391,04            50,62%  00:34.43      3107,87          0:00.93               10,70
> zstd -c -T0 -21 -  intellij-idea-community-edition  772,46      391,04            50,62%  00:35.45      3104,94          0:00.94               10,64
> xz -c -z -         linux                            80,15       70,66             88,17%  00:31.27      95,35            0:03.85               10,11
> zstd -c -T0 -18 -  linux                            80,15       70,22             87,62%  00:07.48      299,30           0:00.05               10,64
> zstd -c -T0 -19 -  linux                            80,15       70,18             87,56%  00:09.32      395,32           0:00.05               10,72
> zstd -c -T0 -20 -  linux                            80,15       70,18             87,56%  00:08.88      395,23           0:00.06               10,57
> zstd -c -T0 -21 -  linux                            80,15       70,18             87,56%  00:08.91      395,28           0:00.05               10,71
> xz -c -z -         linux-headers                    103,85      17,02             16,39%  00:42.24      95,35            0:01.45               10,15
> zstd -c -T0 -18 -  linux-headers                    103,85      18,92             18,22%  00:12.68      320,98           0:00.16               10,74
> zstd -c -T0 -19 -  linux-headers                    103,85      18,88             18,18%  00:16.36      448,98           0:00.17               10,63
> zstd -c -T0 -20 -  linux-headers                    103,85      18,88             18,18%  00:16.26      448,99           0:00.16               10,77
> zstd -c -T0 -21 -  linux-headers                    103,85      18,88             18,18%  00:16.39      448,97           0:00.16               10,72
> xz -c -z -         tensorflow                       303,10      55,58             18,34%  01:59.56      95,40            0:04.78               10,27
> zstd -c -T0 -18 -  tensorflow                       303,10      61,83             20,40%  00:15.99      856,98           0:00.47               10,64
> zstd -c -T0 -19 -  tensorflow                       303,10      61,49             20,29%  00:21.01      1176,74          0:00.50               10,68
> zstd -c -T0 -20 -  tensorflow                       303,10      61,49             20,29%  00:21.11      1176,88          0:00.49               10,64
> zstd -c -T0 -21 -  tensorflow                       303,10      61,49             20,29%  00:21.16      1176,89          0:00.50               10,67
>
>
> These results suggest that the ideal zstd level is `-18`: anything higher
> sharply increases memory usage during compression for negligible gains
> in compressed size.
> Compared to xz, we are looking at a small increase in package size most
> of the time. I consider that an acceptable trade-off given the
> incredibly fast decompression.
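To put numbers on that trade-off, here is a small Python sketch that recomputes the size change and the compression speedup of `zstd -c -T0 -18 -` relative to `xz -c -z -`. The figures are copied from the benchmark table in the quoted mail, with decimal commas written as points; the names `RESULTS`, `to_seconds` and `compare` are made up for this illustration.

```python
def to_seconds(mmss: str) -> float:
    """Convert a 'mm:ss.ss' timing from the table into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)

# package: (xz size MiB, zstd -18 size MiB, xz time, zstd -18 time)
RESULTS = {
    "cuda": (1316.93, 1375.41, "19:03.44", "01:12.50"),
    "gcc": (33.11, 35.87, "00:54.54", "00:12.37"),
    "go": (122.10, 132.69, "03:19.11", "00:15.40"),
    "intellij-idea-community-edition": (384.37, 392.44, "04:53.01", "00:27.10"),
    "linux": (70.66, 70.22, "00:31.27", "00:07.48"),
    "linux-headers": (17.02, 18.92, "00:42.24", "00:12.68"),
    "tensorflow": (55.58, 61.83, "01:59.56", "00:15.99"),
}

def compare(xz_size, zstd_size, xz_time, zstd_time):
    """Return (size change in percent, compression speedup factor) of zstd vs. xz."""
    size_change_pct = (zstd_size / xz_size - 1) * 100
    speedup = to_seconds(xz_time) / to_seconds(zstd_time)
    return size_change_pct, speedup

for name, row in RESULTS.items():
    size_change_pct, speedup = compare(*row)
    print(f"{name:32s} size {size_change_pct:+6.2f}%  compression {speedup:5.1f}x faster")
```

A negative size change means the zstd package came out smaller than the xz one, which in this data set happens only for `linux`.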
>
> So, TL;DR, the benefits of `zstd -c -T0 -18 -` over `xz -c -z -` are:
> - Massive speed gain in compression
> - Massive speed gain in decompression
> - Stable, reproducible multithreading
> The speed gain in decompression substantially increases pacman's package
> installation speed.
>
> While the trade-offs would be:
> - Minimal increase in compressed package size
> - Increase in memory usage during compression
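The memory side of this can be quantified from the same table. The Python sketch below uses the "Max. RSS" column (decimal commas written as points) to show how much more memory `-18` needs than xz, and how much `-19` adds on top of `-18`; `MAX_RSS` and `rss_growth` are names chosen for this illustration only.

```python
# package: Max. RSS during compression in MiB for (xz, zstd -18, zstd -19)
MAX_RSS = {
    "cuda": (95.32, 2648.93, 3401.67),
    "gcc": (95.34, 419.23, 578.99),
    "go": (95.35, 1402.99, 1914.07),
    "intellij-idea-community-edition": (95.31, 2341.02, 3107.97),
    "linux": (95.35, 299.30, 395.32),
    "linux-headers": (95.35, 320.98, 448.98),
    "tensorflow": (95.40, 856.98, 1176.74),
}

def rss_growth(rss_18: float, rss_19: float) -> float:
    """Percentage increase in peak memory when going from level -18 to -19."""
    return (rss_19 / rss_18 - 1) * 100

for name, (xz, z18, z19) in MAX_RSS.items():
    print(f"{name:32s} -18 needs {z18 / xz:5.1f}x the memory of xz; "
          f"-19 adds {rss_growth(z18, z19):4.1f}% on top")
```

Set against the compressed sizes in the table, which shrink by well under one percent between `-18` and `-19`, this is the basis for stopping at `-18`.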
>
> The required changeset is, I think:
> PKGEXT='.pkg.tar.zst'
> COMPRESSZST=(zstd -c -T0 -18 -)
>
> This change requires a new pacman release: as of this writing, zstd
> support is in master but hasn't landed in a release yet.
>
> Judging by recent IRC chats in -tu and -devops, I think that many TUs and
> Devs already think this is a good move.
> This mail is a general proposal to gather opinions on this change and
> hopefully clear up any misunderstandings or questions before the actual
> patch is sent.
>
>
> Regards,
> Rob (coderobe)
>
>
>
These are super impressive results! Thanks a lot for running those
elaborate benchmarks. For me, that makes it really clear that we'd gain a
ton by going for zstd while really only losing very little. Clear net
benefit for me.

