[arch-dev-public] RFC: (devtools) Changing default compression method to zstd
Hello all,

in the past few weeks, some TUs and Developers have compared different compression algorithms to potentially replace the default compression method used in devtools. The current method is `xz -c -z -`, which is single-threaded and rather slow, so we are looking to replace it with something faster.

Multithreaded xz has come up in the past, and was quickly dismissed due to edge cases that would end up with packages being unreproducible on different machines - namely, xz -T0 -- the method that automatically determines the number of threads -- produces different results when the number of cores in a system is == 1:

$ taskset -c 1 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
fe95a1af78304ae4be508e071f6697296e52b625fba95fca5622757779633d90 test.xz
$ taskset -c 1,2 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
3b2c520eda654de19c5fc02ea1d850e142ae24e1246edcce82e90bd690d18f99 test.xz
$ taskset -c 1,2,3 xz -c -z - -T0 < test > test.xz && sha256sum test.xz
3b2c520eda654de19c5fc02ea1d850e142ae24e1246edcce82e90bd690d18f99 test.xz

With this mail, i propose to switch to `zstd` instead (https://github.com/facebook/zstd). zstd does *not* exhibit this issue, and anthraxx has asked for some clarifications (https://github.com/facebook/zstd/issues/999#issuecomment-474114799) - just in case. The response is that zstd is generally friendly to reproducible builds.

After some testing with heftig, I ran some additional benchmarks on our new build host 'dragon' to determine the appropriate compression level.

Here are the results: (sorry for the wide mail :b)
(sizes and RSS in MiB, times in mm:ss)

Compressor         Package                          Size     Comp. size  Ratio   Time      Max. RSS  Decomp. time  Decomp. RSS
xz -c -z -         cuda                             3038,58  1316,93     43,34%  19:03.44  95,32     1:19.74       10,18
zstd -c -T0 -18 -  cuda                             3038,58  1375,41     45,26%  01:12.50  2648,93   0:04.46       10,70
zstd -c -T0 -19 -  cuda                             3038,58  1371,94     45,15%  01:34.13  3401,67   0:04.47       10,73
zstd -c -T0 -20 -  cuda                             3038,58  1371,94     45,15%  01:34.34  3416,90   0:04.46       10,79
zstd -c -T0 -21 -  cuda                             3038,58  1371,94     45,15%  01:31.60  3414,14   0:04.46       10,79

xz -c -z -         gcc                              135,54   33,11       24,43%  00:54.54  95,34     0:02.59       10,11
zstd -c -T0 -18 -  gcc                              135,54   35,87       26,47%  00:12.37  419,23    0:00.23       10,77
zstd -c -T0 -19 -  gcc                              135,54   35,66       26,31%  00:15.76  578,99    0:00.24       10,66
zstd -c -T0 -20 -  gcc                              135,54   35,66       26,31%  00:16.36  579,11    0:00.25       10,75
zstd -c -T0 -21 -  gcc                              135,54   35,66       26,31%  00:16.18  579,01    0:00.25       10,46

xz -c -z -         go                               484,10   122,10      25,22%  03:19.11  95,35     0:08.78       10,16
zstd -c -T0 -18 -  go                               484,10   132,69      27,41%  00:15.40  1402,99   0:00.80       10,80
zstd -c -T0 -19 -  go                               484,10   131,84      27,23%  00:19.74  1914,07   0:00.79       10,78
zstd -c -T0 -20 -  go                               484,10   131,84      27,23%  00:20.19  1914,11   0:00.77       10,72
zstd -c -T0 -21 -  go                               484,10   131,84      27,23%  00:20.08  1914,09   0:00.79       10,78

xz -c -z -         intellij-idea-community-edition  772,46   384,37      49,76%  04:53.01  95,31     0:28.69       10,18
zstd -c -T0 -18 -  intellij-idea-community-edition  772,46   392,44      50,80%  00:27.10  2341,02   0:00.91       10,63
zstd -c -T0 -19 -  intellij-idea-community-edition  772,46   391,04      50,62%  00:37.09  3107,97   0:00.93       10,47
zstd -c -T0 -20 -  intellij-idea-community-edition  772,46   391,04      50,62%  00:34.43  3107,87   0:00.93       10,70
zstd -c -T0 -21 -  intellij-idea-community-edition  772,46   391,04      50,62%  00:35.45  3104,94   0:00.94       10,64

xz -c -z -         linux                            80,15    70,66       88,17%  00:31.27  95,35     0:03.85       10,11
zstd -c -T0 -18 -  linux                            80,15    70,22       87,62%  00:07.48  299,30    0:00.05       10,64
zstd -c -T0 -19 -  linux                            80,15    70,18       87,56%  00:09.32  395,32    0:00.05       10,72
zstd -c -T0 -20 -  linux                            80,15    70,18       87,56%  00:08.88  395,23    0:00.06       10,57
zstd -c -T0 -21 -  linux                            80,15    70,18       87,56%  00:08.91  395,28    0:00.05       10,71

xz -c -z -         linux-headers                    103,85   17,02       16,39%  00:42.24  95,35     0:01.45       10,15
zstd -c -T0 -18 -  linux-headers                    103,85   18,92       18,22%  00:12.68  320,98    0:00.16       10,74
zstd -c -T0 -19 -  linux-headers                    103,85   18,88       18,18%  00:16.36  448,98    0:00.17       10,63
zstd -c -T0 -20 -  linux-headers                    103,85   18,88       18,18%  00:16.26  448,99    0:00.16       10,77
zstd -c -T0 -21 -  linux-headers                    103,85   18,88       18,18%  00:16.39  448,97    0:00.16       10,72

xz -c -z -         tensorflow                       303,10   55,58       18,34%  01:59.56  95,40     0:04.78       10,27
zstd -c -T0 -18 -  tensorflow                       303,10   61,83       20,40%  00:15.99  856,98    0:00.47       10,64
zstd -c -T0 -19 -  tensorflow                       303,10   61,49       20,29%  00:21.01  1176,74   0:00.50       10,68
zstd -c -T0 -20 -  tensorflow                       303,10   61,49       20,29%  00:21.11  1176,88   0:00.49       10,64
zstd -c -T0 -21 -  tensorflow                       303,10   61,49       20,29%  00:21.16  1176,89   0:00.50       10,67

This suggests that the ideal zstd level would be `-18`, as anything higher than that has a steep increase in memory usage during compression for negligible gains. We are, however, looking at a small increase in package size most of the time (up to about 2.2 percentage points of ratio in the table above). I would consider that a tradeoff we can make - given the incredibly fast decompression.

So, TL;DR, the benefits of `zstd -c -T0 -18 -` over `xz -c -z -` are:

- Massive speed gain in compression
- Massive speed gain in decompression
- Stable, reproducible multithreading

The speed gain in decompression substantially increases pacman's package installation speed.

While the trade-offs would be:

- Minimal increase in compressed package size
- Increase in memory usage during compression

The required changeset is, i think:

PKGEXT='.pkg.tar.zst'
COMPRESSZST=(zstd -c -T0 -18 -)

This change requires a new pacman release: as of writing this, zstd support is in master but hasn't landed in a release yet.

Judging by recent IRC chats in -tu and -devops, I think that many TUs and Devs already think this is a good move. This mail is a general proposal to gather opinions on this change and hopefully clear up any misunderstandings or questions regarding this change, before the actual patch is sent.

Regards,
Rob (coderobe)
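To put rough numbers on the "massive speed gain" and "minimal increase in size" claims, the following is a minimal Python sketch that recomputes them from the xz and `zstd -18` rows of the benchmark table. The figures are copied from the table (with decimal commas written as points); the helper names `seconds` and `compare` are made up for this illustration and are not part of devtools or the benchmark script.

```python
def seconds(mmss):
    """Convert a 'mm:ss.cc' string from the table into seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + float(secs)


# package: (xz size MiB, xz time, zstd -18 size MiB, zstd -18 time)
ROWS = {
    "cuda": (1316.93, "19:03.44", 1375.41, "01:12.50"),
    "gcc": (33.11, "00:54.54", 35.87, "00:12.37"),
    "go": (122.10, "03:19.11", 132.69, "00:15.40"),
    "intellij-idea-community-edition": (384.37, "04:53.01", 392.44, "00:27.10"),
    "linux": (70.66, "00:31.27", 70.22, "00:07.48"),
    "linux-headers": (17.02, "00:42.24", 18.92, "00:12.68"),
    "tensorflow": (55.58, "01:59.56", 61.83, "00:15.99"),
}


def compare(xz_mib, xz_time, zstd_mib, zstd_time):
    """Return (compression speedup factor, package size growth in percent)."""
    speedup = seconds(xz_time) / seconds(zstd_time)
    growth_pct = (zstd_mib / xz_mib - 1) * 100
    return speedup, growth_pct


if __name__ == "__main__":
    for name, row in ROWS.items():
        speedup, growth = compare(*row)
        print(f"{name:32} {speedup:5.1f}x faster, {growth:+5.1f}% size")
```

Note that the size change is expressed relative to the xz package size, which is a different measure from the "Ratio" column (compressed size relative to uncompressed size).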
On Sun, 24 Mar 2019 at 19:35, Robin Broda via arch-dev-public < arch-dev-public@archlinux.org> wrote:
These are super impressive results! Thanks a lot for running those elaborate benchmarks. For me, that makes it really clear that we'd gain a ton by going for zstd while really only losing very little. Clear net benefit for me.
Attached here is the script i wrote to make most of these measurements, for anyone interested in reproducing these results - and the raw results of the benchmark.

Read it before running it. You may need to make adjustments.

Rob
Hi, On 24-03-19, Robin Broda via arch-dev-public wrote:
So, TL;DR, the benefits of `zstd -c -T0 -18 -` over `xz -c -z -` are: - Massive speed gain in compression - Massive speed gain in decompression - Stable, reproducible multithreading The speed gain in decompression substantially increases pacman's package installation speed.
Interesting results, thanks!

Just one detail: your results for -19, -20 and -21 are identical because apparently zstd needs an additional flag (--ultra) to "unlock" the higher compression levels:

zstd -c -T0 -20 -
Warning : compression level higher than max, reduced to 19

Also, I see you did not test zstd with a small number of cores: can you add e.g. -T1, -T2 and -T4 to the comparison? It would give a more realistic idea of what to expect when building on a typical machine, as opposed to dragon ;)

In my tests, using fewer threads also decreased memory usage when compressing (35% less memory when switching from -T2 to -T1).

For decompression, it seems that both xz and zstd run single-threaded, so there's not much to think about (zstd is just incredibly fast).

In any case, I support this change!

Baptiste
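To make the --ultra point concrete, here is a small Python sketch of a helper that assembles a zstd command line and adds `--ultra` only when the requested level is above 19, so that levels 20 to 22 are not silently reduced. The function name `zstd_argv` and its defaults are assumptions made for illustration; nothing like it exists in devtools or makepkg.

```python
def zstd_argv(level, threads=0):
    """Build a zstd argument list that compresses stdin to stdout.

    Levels 1-19 are accepted as-is; levels 20-22 only take effect
    when --ultra is passed as well.
    """
    if not 1 <= level <= 22:
        raise ValueError("zstd level must be between 1 and 22")
    argv = ["zstd", "-c", f"-T{threads}"]
    if level > 19:
        argv.append("--ultra")
    argv += [f"-{level}", "-"]
    return argv


if __name__ == "__main__":
    for level in (18, 19, 20, 21):
        print(" ".join(zstd_argv(level)))
```

Re-running the benchmark with command lines built this way would show whether levels 20 and 21 actually differ from 19.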
On 3/24/19 9:18 PM, Baptiste Jonglez wrote:
Just one detail: your results for -19, -20 and -21 are identical because apparently zstd needs an additional flag (--ultra) to "unlock" the higher compression levels:
zstd -c -T0 -20 -
Warning : compression level higher than max, reduced to 19
Also, I see you did not test zstd with a small number of cores: can you add e.g. -T1, -T2 and -T4 to the comparison? It would give a more realistic idea of what to expect when building on a typical machine, as opposed to dragon ;) In my tests, using fewer threads also decreased memory usage when compressing (35% less memory when switching from -T2 to -T1).
Damn, i knew i must've missed something - archange had already mentioned on IRC that these results look weird, but i shrugged it off. Should've double-checked. I'll get you a new table with the higher levels fixed and a second set with -T2 for comparison later. Regardless, IIRC preliminary testing showed that these gains are not worth it, as they were quite small in the tests we ran a while ago.
For decompression, it seems that both xz and zstd run single-threaded, so there's not much to think about (zstd is just incredibly fast).
Correct.

Rob
On 25/3/19 4:34 am, Robin Broda via arch-dev-public wrote:
This change requires a new pacman release: as of writing this, zstd support is in master but hasn't landed in a release yet.
Which is a complete blocker for quite a period of time.

We need to assume every system has a copy of pacman-5.2+ before we can switch packages to zstd. Otherwise updates can break systems (pacman and all dependencies will need to stay as .xz for a year at least, and we have to hope that a partial update does not break anything until a full update is run).

Experience switching from .gz to .xz showed system breakage was a fairly regular occurrence. I would not do this until at least one year, maybe two after pacman-5.2 is released.

Allan
On Sun, 24 Mar 2019 at 23:45, Allan McRae via arch-dev-public <arch-dev-public@archlinux.org> wrote:
Which is a complete blocker for quite a period of time.
We need to assume every system has a copy of pacman-5.2+ before we can switch packages to zstd.
Why is pacman support needed here? I can already install .zstd packages using pacman 5.1.3. The crucial part seems to be libarchive support, which was added in v3.3.3 (~ September 2018).
On 3/24/19 11:20 PM, Evangelos Foutras via arch-dev-public wrote:
Why is pacman support needed here? I can already install .zstd packages using pacman 5.1.3.
The crucial part seems to be libarchive support, which was added in v3.3.3 (~ September 2018).
Yes, installing zstd packages works - the pacman release is merely required for makepkg. Unless that has already landed too, which would be news to me :)

Thus i don't think we need a hold-off period like this, Allan.

Rob
On 03/25/19 at 12:15am, Robin Broda via arch-dev-public wrote:
Yes, installing zstd packages works - the pacman release is merely required for makepkg. Unless that has already landed too, which would be news to me :)
Thus i don't think we need a hold-off period like this, Allan.
We still need a hold-off period, we're just waiting to make sure people have libarchive v3.3.3 instead of pacman v5.2.0.
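Since the question has become "does this system already have libarchive 3.3.3?", here is a rough Python sketch of such a check. It assumes the version banner has the shape printed by `bsdtar --version` (for example `bsdtar 3.3.3 - libarchive 3.3.3 ...`); the sample strings below are illustrative, and the function names are made up. The version number alone is only a heuristic, since it says nothing about how a given libarchive was built.

```python
import re

# First libarchive release with zstd support, as stated in this thread.
MIN_LIBARCHIVE = (3, 3, 3)


def libarchive_version(banner):
    """Extract the libarchive version from a bsdtar-style banner as a tuple."""
    match = re.search(r"libarchive (\d+(?:\.\d+)+)", banner)
    if match is None:
        raise ValueError("no libarchive version found in banner")
    return tuple(int(part) for part in match.group(1).split("."))


def new_enough(banner):
    """True if the reported libarchive is at least MIN_LIBARCHIVE."""
    return libarchive_version(banner) >= MIN_LIBARCHIVE


if __name__ == "__main__":
    # Illustrative banners, not captured from a real system.
    for sample in (
        "bsdtar 3.3.2 - libarchive 3.3.2 zlib/1.2.11",
        "bsdtar 3.3.3 - libarchive 3.3.3 zlib/1.2.11",
    ):
        print(sample, "->", "ok" if new_enough(sample) else "too old")
```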
On 3/25/19 12:22 AM, Andrew Gregory wrote:
We still need a hold-off period, we're just waiting to make sure people have libarchive v3.3.3 instead of pacman v5.2.0.
That update happened half a year ago, i'm sure that most people with an installation that old will already have to fetch other packages, like the keyring, separately for it to go through.

Plus, with libarchive's release cycle, i don't think that libarchive itself is gonna be rebuilt immediately after the change is implemented - providing extra time to upgrade libarchive without having to download a release packed as xz separately.

Rob
On 25/3/19 9:28 am, Robin Broda via arch-dev-public wrote:
That update happened half a year ago, i'm sure that most people with an installation that old will already have to fetch other packages, like the keyring, separately for it to go through.
Fetching a keyring does not potentially bump sonames.
Plus, with libarchive's release cycle, i don't think that libarchive itself is gonna be rebuilt immediately after the change is implemented - providing extra time to upgrade libarchive without having to download a release packed as xz separately.
And if openssl gets a soname bump?

A
On 03/25/19 at 12:28am, Robin Broda via arch-dev-public wrote:
That update happened half a year ago, i'm sure that most people with an installation that old will already have to fetch other packages, like the keyring, separately for it to go through.
If we go ahead with the switch those people won't be able to install new packages like the keyring, that's the whole point.
Plus, with libarchive's release cycle, i don't think that libarchive itself is gonna be rebuilt immediately after the change is implemented - providing extra time to upgrade libarchive without having to download a release packed as xz separately.
I don't consider hoping that libarchive doesn't need a rebuild in the near future a great strategy. That being said, this is really a question of how long of a period we need between libarchive v3.3.3 and us making the switch. I'm not a packager, so I don't have much of an opinion on that.
On Sun, Mar 24, 2019 at 04:39:54PM -0700, Andrew Gregory via arch-dev-public wrote:
I don't consider hoping that libarchive doesn't need a rebuild in the near future a great strategy. That being said, this is really a question of how long of a period we need between libarchive v3.3.3 and us making the switch. I'm not a packager, so I don't have much of an opinion on that.
Well, we pride ourselves on having competent users. I think waiting a year is conservative and safe. However, personally I think we can wait for the next pacman release and write an announcement. Then we give everyone a month to update and we can have a smooth transition. Assuming of course that everyone is on-board with this change.

I would like to get some opinions from packaging devs with experience.

-- 
Morten Linderud
PGP: 9C02FF419FECBE16
On Mon, Mar 25, 2019 at 12:46:15AM +0100, Public mailing list for Arch Linux development wrote:
On Sun, Mar 24, 2019 at 04:39:54PM -0700, Andrew Gregory via arch-dev-public wrote:
I don't consider hoping that libarchive doesn't need a rebuild in the near future a great strategy. That being said, this is really a question of how long of a period we need between libarchive v3.3.3 and us making the switch. I'm not a packager, so I don't have much of an opinion on that.
Well, we pride ourselves on having competent users. I think waiting a year is conservative and safe. However, personally I think we can wait for the next pacman release and write an announcement. Then we give everyone a month to update and we can have a smooth transition. Assuming of course that everyone is on-board with this change.
I would like to get some opinions from packaging devs with experience.
I agree with this. Arch Linux's unique 'selling' point is that we treat our users as competent people who don't need special care. They are able to read announcements, react to problems and get help in one of our many community channels.

I think we should keep this transition as short as possible. Every time I tried to propose 'enterprise' features, I was told that Arch Linux is just a 'hobby project' and we don't need so much reliability. So why should we decide differently here?

chris
[2019-03-25 00:46:15 +0100] Morten Linderud via arch-dev-public:
On Sun, Mar 24, 2019 at 04:39:54PM -0700, Andrew Gregory via arch-dev-public wrote:
I don't consider hoping that libarchive doesn't need a rebuild in the near future a great strategy. That being said, this is really a question of how long of a period we need between libarchive v3.3.3 and us making the switch. I'm not a packager, so I don't have much of an opinion on that.
Well, we pride ourselves on having competent users. I think waiting a year is conservative and safe. However, personally I think we can wait for the next pacman release and write an announcement. Then we give everyone a month to update and we can have a smooth transition. Assuming of course that everyone is on-board with this change.
So far we all seem to agree it's a change for the better.

However your timeline is confused: we only need to wait for a new pacman release to start building zstd-compressed packages; we can then push them to the repos straight away, assuming users have had enough time to update to libarchive-3.3.3. It's already been in [core] for more than six months.

Traditionally we wait a year before pushing changes that break backward compatibility. That always seemed a bit extreme to me so I'd personally be fine with doing the switch in a month or two with certain precautions:

- Post an announcement warning users they'll need libarchive-3.3.3 or higher a month from now and telling them to update if they haven't done so in the last six months.

- Prepare a static build of libarchive-3.3.3 compressed with xz and write a wiki page with detailed instructions on how to manually switch from an old system (for users who might want to switch even later).

Cheers.

-- 
Gaetan
On 3/25/19 1:33 AM, Gaetan Bisson via arch-dev-public wrote:
- Prepare a static build of libarchive-3.3.3 compressed with xz and write a wiki page with detailed instructions on how to manually switch from an old system (for users who might want to switch even later).
Users that may wish to upgrade at an even later point can always do an upgrade using the archive, without requiring any sort of static builds or other hacks.

Rob
Morten Linderud via arch-dev-public <arch-dev-public@archlinux.org> on Mon, 2019/03/25 00:46:
On Sun, Mar 24, 2019 at 04:39:54PM -0700, Andrew Gregory via arch-dev-public wrote:
I don't consider hoping that libarchive doesn't need a rebuild in the near future a great strategy. That being said, this is really a question of how long of a period we need between libarchive v3.3.3 and us making the switch. I'm not a packager, so I don't have much of an opinion on that.
Well, we pride ourselves on having competent users. I think waiting a year is conservative and safe. However, personally I think we can wait for the next pacman release and write an announcement. Then we give everyone a month to update and we can have a smooth transition. Assuming of course that everyone is on-board with this change.
I am on board with this. Whoever runs a rolling release distribution and does not update within half a year made a bad decision. So let's go for it.

-- 
main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH"
"CX:;",b;for(a/* Best regards my address: */=0;b=c[a++];)
putchar(b-1/(/* Chris cc -ox -xc - && ./x */b/42*2-3)*42);}
On Sun, Mar 24, 2019 at 04:22:55PM -0700, Andrew Gregory via arch-dev-public wrote:
On 03/25/19 at 12:15am, Robin Broda via arch-dev-public wrote:
Thus i don't think we need a hold-off period like this, Allan.
We still need a hold-off period, we're just waiting to make sure people have libarchive v3.3.3 instead of pacman v5.2.0.
libarchive 3.3.3 landed in the repos around the 7th of September, so waiting on that is at least a shorter timeframe than waiting for the next pacman release plus a full year.

Wouldn't it be feasible to issue an announcement in early July and do the transition in September?

-- 
Morten Linderud
PGP: 9C02FF419FECBE16
On Sun, Mar 24, 2019 at 7:35 PM Robin Broda via arch-dev-public <arch-dev-public@archlinux.org> wrote:
The required changeset is, i think: PKGEXT='.pkg.tar.zst' COMPRESSZST=(zstd -c -T0 -18 -)
When we implement this, I would say we go with "zstd -c -T0 -" in pacman's makepkg.conf and "zstd -C -T0 -18 -" in the configs shipped with devtools. I think users that build their own local packages are more likely to benefit from fast compression. Anyone building with makechrootpkg for distribution gets the high compression level.
On Mon, 25 Mar 2019 at 01:22, Jan Alexander Steffens via arch-dev-public <arch-dev-public@archlinux.org> wrote:
On Sun, Mar 24, 2019 at 7:35 PM Robin Broda via arch-dev-public <arch-dev-public@archlinux.org> wrote:
The required changeset is, i think: PKGEXT='.pkg.tar.zst' COMPRESSZST=(zstd -c -T0 -18 -)
When we implement this, I would say we go with "zstd -c -T0 -" in pacman's makepkg.conf and "zstd -C -T0 -18 -" in the configs shipped with devtools.
I think users that build their own local packages are more likely to benefit from fast compression. Anyone building with makechrootpkg for distribution gets the high compression level.
That's actually really smart! We can also leave out "-T0" since the default compression level is very fast anyway. Plus, it's already implemented in this way in pacman git so we don't have to touch anything there. (But, if it were to be multithreaded there as well, I would replace zstd with zstdmt; same for devtools.)

As far as compression level goes, I believe we should select the highest one that doesn't have increased memory requirements during decompression. So that would be -19.

Robin makes a good case about -18; looking at upstream's "Compression Speed vs Ratio" graph [1] I would say -18 is preferable to -19 if we are concerned about compression speed and memory usage (my totally unscientific measurements show a 25% memory increase and 20% speed decrease going from -18 to -19 when using -T4).

That said, I might still opt for -19 due to the slightly higher compression ratio; memory usage isn't too big of an issue and the slower speed is mitigated by multithreading (i.e.: it will still be much faster than xz).

Assuming .zst packages have been installable as far back as September 2018 when libarchive 3.3.3 was released, it seems to me that the following steps can be taken:

1) Check if repo-add and dbscripts recognize .zst packages.

2) Add "COMPRESSZST=(zstdmt -19 -c -z -q -)" to devtools' makepkg-x86_64.conf.

3) Release a test package and confirm that it's installable. (Possibly also test with an old installation from September 2018.)

4) Announce the transition to the new compression algorithm and provide a date-stamped mirror URL [2] for really old installations without libarchive 3.3.3.

[1] https://facebook.github.io/zstd/
[2] https://archive.archlinux.org/repos/2019/xx/xx/$repo/os/$arch
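For step 4, the date-stamped mirror URL follows the pattern given in [2]. A minimal Python sketch of filling in that pattern for a chosen snapshot date is shown below; the function name `archive_server` is made up, and the date in the example is purely hypothetical, since the thread does not fix one.

```python
from datetime import date

ARCHIVE_ROOT = "https://archive.archlinux.org/repos"


def archive_server(snapshot):
    """Return a mirrorlist 'Server' URL for the given snapshot date.

    $repo and $arch are left literal because pacman substitutes them itself.
    """
    return f"{ARCHIVE_ROOT}/{snapshot:%Y/%m/%d}/$repo/os/$arch"


if __name__ == "__main__":
    # Hypothetical snapshot date, chosen only to show the URL shape.
    print("Server =", archive_server(date(2019, 1, 2)))
```

A user on a very old installation would point their mirrorlist at such a URL, upgrade to that snapshot, and then switch back to a regular mirror.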
On 3/25/19 1:38 AM, Evangelos Foutras via arch-dev-public wrote:
On Mon, 25 Mar 2019 at 01:22, Jan Alexander Steffens via arch-dev-public <arch-dev-public@archlinux.org> wrote:
When we implement this, I would say we go with "zstd -c -T0 -" in pacman's makepkg.conf and "zstd -C -T0 -18 -" in the configs shipped with devtools.
I think users that build their own local packages are more likely to benefit from fast compression. Anyone building with makechrootpkg for distribution gets the high compression level.
That's actually really smart! We can also leave out "-T0" since the default compression level is very fast anyway. Plus, it's already implemented in this way in pacman git so we don't have to touch anything there. (But, if it were to be multithreaded there as well, I would replace zstd with zstdmt; same for devtools.)
That's why the subject line is prefixed with '(devtools)' :) What's unclear to me is the difference between zstd -T0 and zstdmt, however.
As far as compression level goes, I believe we should select the highest one that doesn't have increased memory requirements during decompression. So that would be -19. Robin makes a good case about -18; looking at upstream's "Compression Speed vs Ratio" graph [1] I would say -18 is preferable to -19 if we are concerned about compression speed and memory usage (my totally unscientific measurements show a 25% memory increase and 20% speed decrease going from -18 to -19 when using -T4). That said, I might still opt for -19 due to the slightly higher compression ratio; memory usage isn't too big of an issue and the slower speed is mitigated by multithreading (i.e.: it will still be much faster than xz).
I do think that at -19+, memory usage becomes a bigger issue. The difference between -18 and -19 on cuda is almost a gigabyte! While not really a problem for our beefy build boxes, some 4- or even 8-GB developer machines could really suffer from such an increase in memory usage. Thus I stand by -18 being the more sensible choice.
Rob
On Mon, 25 Mar 2019 at 02:47, Robin Broda via arch-dev-public <arch-dev-public@archlinux.org> wrote:
What's unclear to me is the difference between zstd -T0 and zstdmt, however.
zstdmt is an alias/shortcut for "zstd -T0".
I do think that at -19+, memory usage becomes a bigger issue. The difference between -18 and -19 on cuda is almost a gigabyte!
The large memory usage comes from running it with 24 cores. It is fair to assume that boxes with high core counts can spare an extra 1G of RAM. Compressing cuda using 4 cores shows:
-18: 613 MiB RSS / 4:18 elapsed time
-19: 746 MiB RSS / 5:24 elapsed time
(With 8 threads, memory usage is 1014 MiB with -18 and 1270 MiB with -19.)
Memory usage is fine in my opinion. Compression time is a bigger concern (increased by 25%) for what seems to be 0.25% smaller package size for cuda (and about 0.5% for chromium). But for most packages the time difference would be a few seconds and the space savings, however tiny, would apply to the whole archive. So I still believe -19 is preferable since decompression speed is the same for both levels.
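The percentages quoted in this mail can be re-derived from the figures it gives. A minimal sketch in bash, using integer arithmetic (results are truncated to whole numbers); the helper name pct_change is made up for illustration, and the cuda sizes are taken from the benchmark table earlier in the thread:

```bash
#!/usr/bin/env bash
# Percent change from OLD to NEW, truncated to a whole percent.
pct_change() {
  local old=$1 new=$2
  echo $(( (new - old) * 100 / old ))
}

# Elapsed times with 4 threads, converted to seconds: 4:18 and 5:24.
t18=$(( 4 * 60 + 18 ))
t19=$(( 5 * 60 + 24 ))
echo "time   -18 -> -19: +$(pct_change "$t18" "$t19")%"   # +25%

# Peak RSS in MiB with 4 threads.
echo "memory -18 -> -19: +$(pct_change 613 746)%"          # +21%

# cuda package size in hundredths of a MiB: 1375,41 (-18) vs 1371,94 (-19).
# The result is in hundredths of a percent, so 25 means 0.25% smaller.
echo "size saving: $(( (137541 - 137194) * 10000 / 137541 ))"
```

The output agrees with the "increased by 25%" and "0.25% smaller" figures in the mail.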
On 3/24/19 8:38 PM, Evangelos Foutras via arch-dev-public wrote:
On Mon, 25 Mar 2019 at 01:22, Jan Alexander Steffens via arch-dev-public <arch-dev-public@archlinux.org> wrote:
On Sun, Mar 24, 2019 at 7:35 PM Robin Broda via arch-dev-public <arch-dev-public@archlinux.org> wrote:
The required changeset is, i think: PKGEXT='.pkg.tar.zst' COMPRESSZST=(zstd -c -T0 -18 -)
When we implement this, I would say we go with "zstd -c -T0 -" in pacman's makepkg.conf and "zstd -c -T0 -18 -" in the configs shipped with devtools.
I think users that build their own local packages are more likely to benefit from fast compression. Anyone building with makechrootpkg for distribution gets the high compression level.
That's actually really smart! We can also leave out "-T0" since the default compression level is very fast anyway. Plus, it's already implemented in this way in pacman git so we don't have to touch anything there. (But, if it were to be multithreaded there as well, I would replace zstd with zstdmt; same for devtools.)
As far as compression level goes, I believe we should select the highest one that doesn't have increased memory requirements during decompression. So that would be -19. Robin makes a good case about -18; looking at upstream's "Compression Speed vs Ratio" graph [1] I would say -18 is preferable to -19 if we are concerned about compression speed and memory usage (my totally unscientific measurements show a 25% memory increase and 20% speed decrease going from -18 to -19 when using -T4). That said, I might still opt for -19 due to the slightly higher compression ratio; memory usage isn't too big of an issue and the slower speed is mitigated by multithreading (i.e.: it will still be much faster than xz).
Assuming .zst packages have been installable as far back as September 2018 when libarchive 3.3.3 was released, it seems to me that the following steps can be taken:
1) Check if repo-add and dbscripts recognize .zst packages.
repo-add checks to see if bsdtar/libarchive recognizes the file as a compressed archive containing a .PKGINFO file, and therefore, repo-add will work whenever pacman works. dbscripts whitelists the known package extensions, and I will be adding new extensions to dbscripts in tandem with a stable pacman release that contains makepkg support, so that should not be a problem either.
2) Add "COMPRESSZST=(zstdmt -19 -c -z -q -)" to devtools' makepkg-x86_64.conf.
Or just sync it with the pacman package. :p
3) Release a test package and confirm that it's installable. (Possibly also test with an old installation from September 2018.)
You can confirm that today if you build using makepkg from pacman-git. Alternatively, try this test package I built: https://pkgbuild.com/~eschwartz/repo/x86_64/testpkg-foobar-1-1-any.pkg.tar.z...
4) Announce the transition to the new compression algorithm and provide a date-stamped mirror URL [2] for really old installations without libarchive 3.3.3.
We've had such transitions before, e.g. when adding hook support. IMO the transition does not need to be longer than a month, as we are substantially saying "your libarchive must be from the last six months". Arch Linux does not, AFAIK, have a general policy to support systems that have failed to update in 6 months.
If anyone manages to break their system by having a very old libarchive version after a warning period provided as a news entry, we do not need to officially support anything... but I provide fully static recovery binaries for pacman, here: https://aur.archlinux.org/packages/pacman-static
This should be sufficient without recourse to legacy compat packages for libarchive, or requiring users to reset their system to any given ALA date. (Notwithstanding this, many things might break in that time frame, and general advice usually seems to be to upgrade in stages using the ALA anyway.)
They are available both as the (prebuilt custom repo) package "pacman-static", and as extracted binaries that can be downloaded and run directly without the need to run pacman or even decompress the file. They are verified with my PGP key.
These binaries are based on a libarchive.a which is compiled with libzstd.a support, so that works too.
--
Eli Schwartz
Bug Wrangler and Trusted User
Given the suggestion of using -18, I decided to calculate how much bigger our packages would be with the numbers given:

Package                                         Increase  Size vs. xz
cuda-10.0.130-2-x86_64.pkg.tar                  58.5M     104.40%
gcc-8.2.1+20181127-1-x86_64.pkg.tar             2.8M      108.30%
go-2:1.12.1-1-x86_64.pkg.tar                    10.6M     108.70%
linux-5.0.3.arch1-1-x86_64.pkg.tar              -0.4M     99.40%
linux-headers-5.0.3.arch1-1-x86_64.pkg.tar      1.9M      111.20%
tensorflow-1.13.1-2-x86_64.pkg.tar              6.2M      111.20%
intellij-idea-community-edition-2:2018.3.5-1    8.1M      102.10%

That is a decent increase.
Are the times given for decompress just a decompress, not install by pacman? Were they done on an SSD or pointed at /dev/null? (I assume not a spinning disk given the times)
Allan
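The two figures per package above appear to be the absolute size difference and the size relative to xz, both computed from the compressed sizes in the first benchmark. A minimal sketch of that calculation, assuming awk is available; the function name size_delta is hypothetical, and the decimal commas of the benchmark are written as decimal points:

```bash
#!/usr/bin/env bash
# size_delta XZ_MIB ZSTD_MIB
# Prints the size increase in MiB and the zstd size as a percentage of xz.
size_delta() {
  LC_ALL=C awk -v xz="$1" -v zst="$2" \
    'BEGIN { printf "%.1f %.1f\n", zst - xz, zst / xz * 100 }'
}

size_delta 1316.93 1375.41   # cuda,  xz vs. zstd -18
size_delta 70.66 70.22       # linux, xz vs. zstd -18
```

For cuda this gives 58.5 and 104.4, and for linux -0.4 and 99.4, matching the rows above.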
Hello again,
after archange and Baptiste mentioned that the numbers look a little odd, I took some more time and re-ran the tests with additional parameters. Most notably, this includes -T2 - to show behavior on lower-spec machines, and it fixes the higher compression levels by appending --ultra.
Here are the new results:

Compressor                 Package Name   Size (MiB)  Comp. Size (MiB)  Ratio   Time (mm:ss)  Max. RSS in MiB  Decomp. Time (mm:ss)  Decomp. RSS in MiB
xz -c -z -                 cuda           3038,58     1316,93           43,34%  19:03.44      95,32            1:19.74               10,18
zstd -c -T2 -18 -          cuda           3038,58     1375,41           45,26%  7:10.76       373,53           0:04.49               10,70
zstd -c -T0 -18 -          cuda           3038,58     1375,41           45,26%  1:17.23       2646,19          0:04.41               10,76
zstd -c -T2 -19 -          cuda           3038,58     1371,94           45,15%  9:08.09       420,43           0:04.43               10,68
zstd -c -T0 -19 -          cuda           3038,58     1371,94           45,15%  1:34.74       3415,77          0:04.51               10,75
zstd -c -T2 --ultra -20 -  cuda           3038,58     1286,91           42,35%  10:05.19      1255,64          0:04.46               34,78
zstd -c -T0 --ultra -20 -  cuda           3038,58     1286,91           42,35%  1:57.94       8192,42          0:04.43               34,76
zstd -c -T2 --ultra -21 -  cuda           3038,58     1141,94           37,58%  10:37.84      2404,56          0:04.11               66,73
zstd -c -T0 --ultra -21 -  cuda           3038,58     1141,94           37,58%  2:58.45       8035,52          0:04.08               66,77
xz -c -z -                 gcc            135,54      33,11             24,43%  0:54.54       95,34            0:02.59               10,11
zstd -c -T2 -18 -          gcc            135,54      35,87             26,47%  0:23.39       255,13           0:00.27               10,76
zstd -c -T0 -18 -          gcc            135,54      35,87             26,47%  0:12.42       419,35           0:00.24               10,81
zstd -c -T2 -19 -          gcc            135,54      35,66             26,31%  0:30.34       319,03           0:00.24               10,45
zstd -c -T0 -19 -          gcc            135,54      35,66             26,31%  0:16.07       579,00           0:00.24               10,73
zstd -c -T2 --ultra -20 -  gcc            135,54      24,38             17,99%  0:51.69       484,32           0:00.20               34,63
zstd -c -T0 --ultra -20 -  gcc            135,54      24,38             17,99%  0:51.79       484,66           0:00.20               34,73
zstd -c -T2 --ultra -21 -  gcc            135,54      22,89             16,89%  1:10.22       481,77           0:00.22               66,71
zstd -c -T0 --ultra -21 -  gcc            135,54      22,89             16,89%  1:10.39       482,17           0:00.21               66,65
xz -c -z -                 go             484,10      122,10            25,22%  3:19.11       95,35            0:08.78               10,16
zstd -c -T2 -18 -          go             484,10      132,69            27,41%  1:20.42       292,36           0:00.78               10,75
zstd -c -T0 -18 -          go             484,10      132,69            27,41%  0:15.42       1402,87          0:00.78               10,79
zstd -c -T2 -19 -          go             484,10      131,84            27,23%  1:46.85       352,77           0:00.79               10,75
zstd -c -T0 -19 -          go             484,10      131,84            27,23%  0:20.13       1914,13          0:00.80               10,75
zstd -c -T2 --ultra -20 -  go             484,10      121,87            25,17%  1:58.00       879,29           0:00.83               34,68
zstd -c -T0 --ultra -20 -  go             484,10      121,87            25,17%  1:07.37       1252,75          0:00.84               34,71
zstd -c -T2 --ultra -21 -  go             484,10      112,18            23,17%  2:09.79       1240,84          0:00.82               66,73
zstd -c -T0 --ultra -21 -  go             484,10      112,18            23,17%  2:09.70       1241,08          0:00.81               66,80
xz -c -z -                 intellij-*     772,46      384,37            49,76%  4:53.01       95,31            0:28.69               10,18
zstd -c -T2 -18 -          intellij-*     772,46      392,44            50,80%  1:51.91       342,91           0:00.94               10,71
zstd -c -T0 -18 -          intellij-*     772,46      392,44            50,80%  0:20.50       2341,05          0:00.93               10,70
zstd -c -T2 -19 -          intellij-*     772,46      391,04            50,62%  2:40.44       407,06           0:00.94               10,82
zstd -c -T0 -19 -          intellij-*     772,46      391,04            50,62%  0:28.37       3107,88          0:00.95               10,76
zstd -c -T2 --ultra -20 -  intellij-*     772,46      380,38            49,24%  3:19.46       1182,29          0:01.04               34,73
zstd -c -T0 --ultra -20 -  intellij-*     772,46      380,38            49,24%  1:28.10       2282,25          0:01.03               34,72
zstd -c -T2 --ultra -21 -  intellij-*     772,46      374,19            48,44%  4:07.18       1788,19          0:01.06               66,81
zstd -c -T0 --ultra -21 -  intellij-*     772,46      374,19            48,44%  2:31.77       2433,28          0:01.04               66,73
xz -c -z -                 linux          80,15       70,66             88,17%  0:31.27       95,35            0:03.85               10,11
zstd -c -T2 -18 -          linux          80,15       70,22             87,62%  0:09.94       250,32           0:00.06               10,56
zstd -c -T0 -18 -          linux          80,15       70,22             87,62%  0:07.52       299,25           0:00.06               10,72
zstd -c -T2 -19 -          linux          80,15       70,18             87,56%  0:12.90       314,25           0:00.05               10,57
zstd -c -T0 -19 -          linux          80,15       70,18             87,56%  0:09.32       395,17           0:00.06               10,71
zstd -c -T2 --ultra -20 -  linux          80,15       70,18             87,56%  0:22.64       313,47           0:00.08               34,61
zstd -c -T0 --ultra -20 -  linux          80,15       70,18             87,56%  0:22.69       313,82           0:00.08               34,68
zstd -c -T2 --ultra -21 -  linux          80,15       70,17             87,55%  0:27.01       473,56           0:00.09               66,64
zstd -c -T0 --ultra -21 -  linux          80,15       70,17             87,55%  0:26.97       473,89           0:00.09               66,71
xz -c -z -                 linux-headers  103,85      17,02             16,39%  0:42.24       95,35            0:01.45               10,15
zstd -c -T2 -18 -          linux-headers  103,85      18,92             18,22%  0:19.51       218,35           0:00.17               10,48
zstd -c -T0 -18 -          linux-headers  103,85      18,92             18,22%  0:12.74       320,92           0:00.17               10,47
zstd -c -T2 -19 -          linux-headers  103,85      18,88             18,18%  0:24.42       282,43           0:00.16               10,67
zstd -c -T0 -19 -          linux-headers  103,85      18,88             18,18%  0:16.28       448,92           0:00.17               10,59
zstd -c -T2 --ultra -20 -  linux-headers  103,85      18,77             18,08%  0:43.86       286,13           0:00.19               34,74
zstd -c -T0 --ultra -20 -  linux-headers  103,85      18,77             18,08%  0:44.00       286,32           0:00.19               34,79
zstd -c -T2 --ultra -21 -  linux-headers  103,85      18,70             18,00%  1:03.41       445,96           0:00.20               66,61
zstd -c -T0 --ultra -21 -  linux-headers  103,85      18,70             18,00%  1:03.29       446,32           0:00.20               66,66
xz -c -z -                 tensorflow     303,10      55,58             18,34%  1:59.56       95,40            0:04.78               10,27
zstd -c -T2 -18 -          tensorflow     303,10      61,83             20,40%  0:54.04       277,06           0:00.48               10,72
zstd -c -T0 -18 -          tensorflow     303,10      61,83             20,40%  0:15.64       856,86           0:00.47               10,61
zstd -c -T2 -19 -          tensorflow     303,10      61,49             20,29%  1:15.56       340,99           0:00.48               10,75
zstd -c -T0 -19 -          tensorflow     303,10      61,49             20,29%  0:20.82       1176,75          0:00.49               10,68
zstd -c -T2 --ultra -20 -  tensorflow     303,10      60,63             20,00%  1:30.19       678,34           0:00.53               34,67
zstd -c -T0 --ultra -20 -  tensorflow     303,10      60,63             20,00%  1:11.32       849,60           0:00.54               34,68
zstd -c -T2 --ultra -21 -  tensorflow     303,10      59,98             19,79%  2:42.81       1007,56          0:00.54               66,47
zstd -c -T0 --ultra -21 -  tensorflow     303,10      59,98             19,79%  2:43.03       1007,95          0:00.55               66,70

The new results show that -20 is actually more beneficial for our goals, as it:
- Actually reduces the size compared to xz more often
- While not as fast, still beats xz in compression time
- Only increases the decompressor memory usage negligibly
- Maintains similar decompression speed to the other levels
TL;DR:
Benefits:
- Faster
- Often smaller or similar to xz in size, an improvement over -18 either way
- Still reproducible :)
Trade-offs:
- Minimal increase in decompressor memory usage, but we're talking 50 MiB here.
- Increase in memory usage during compression, however the important part is that memory usage scales with the amount of threads used. Given that low-end systems can simply change the thread allocation to 1 or 2 to slash the compressor memory usage as a trade-off on speed, i don't think that is a problem.
New changeset:
PKGEXT='.pkg.tar.zst'
COMPRESSZST=(zstd -c -T0 -20 -)
This would hopefully address the concerns over the filesize increase, while still maintaining most of the benefits.
Regards,
Rob (coderobe)
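As an illustration of the thread-count trade-off described in this mail, a low-memory machine could use a fragment like the following. This is a hypothetical local override, not a setting proposed in the thread; it simply reuses the -T2 command line from the benchmark rows. zstd only applies levels above 19 when --ultra is given, so the sketch keeps that flag, as the benchmark commands do.

```bash
# Hypothetical makepkg.conf fragment for a memory-constrained machine.
# Only the thread count differs from the -T0 benchmark rows; in the benchmark
# this lowered compressor RSS at the cost of a longer compression time.
PKGEXT='.pkg.tar.zst'
COMPRESSZST=(zstd -c -T2 --ultra -20 -)
```

In the benchmark, the -T2 and -T0 rows for a given level have identical compressed sizes, so the thread count does not affect the size of the resulting package.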
On 3/25/19 12:13 AM, Robin Broda via arch-dev-public wrote:
Given that low-end systems can simply change the thread allocation to 1 or 2 to slash the compressor memory usage as a trade-off on speed, i don't think that is a problem.
New changeset: PKGEXT='.pkg.tar.zst' COMPRESSZST=(zstd -c -T0 -20 -)
Wouldn't this require allowing COMPRESSZST to be set in /etc/makepkg.conf and read by makechrootpkg? Currently makechrootpkg accepts MAKEFLAGS for the same general purpose, except for the build stage. Given that zstd output is reproducible for any -T thread count, but changes with the compression level, it would need to check for and reject settings that contain an invalid compression level.
Alternatively, I had proposed a patch to pacman-dev which would save more information like this, and could, I suppose, be extended to cover the compression flags as well. This would allow packagers to set whatever they wanted and still be fully reproducible in the reproducible-builds.org sense, which seems like it would solve everyone's problems.
--
Eli Schwartz
Bug Wrangler and Trusted User
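The check described here (accept any thread count, reject a different compression level) could look roughly like the sketch below. It is an assumption-laden illustration, not code from makechrootpkg: the function name check_zstd_level is made up, and it only recognises a level written as a standalone flag such as -19 or -20.

```bash
#!/usr/bin/env bash
# check_zstd_level EXPECTED_LEVEL CMD [ARGS...]
# Returns 0 only if the command line carries exactly the expected level.
check_zstd_level() {
  local expected=$1; shift
  local arg level=""
  for arg in "$@"; do
    # A level flag is a dash followed only by digits, e.g. -18 or -20.
    # Thread flags such as -T0 or -T2 do not match and are ignored.
    if [[ $arg =~ ^-([0-9]+)$ ]]; then
      level=${BASH_REMATCH[1]}
    fi
  done
  [[ $level == "$expected" ]]
}

COMPRESSZST=(zstd -c -T4 --ultra -20 -)
if check_zstd_level 20 "${COMPRESSZST[@]}"; then
  echo "accepted"
else
  echo "rejected: compression level differs from the distribution default"
fi
```

A command line with no explicit level is rejected as well, since zstd would then fall back to its own default level.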
Hello everyone,
Now that Zstd 1.4.4 has been out, and released into our repos as well, i think it's time for a new status report on this.
I re-ran the benchmarks with the new zstd, and we are hitting marginally better times in compression & decompression in all scenarios.
In the past few weeks, other team members and I have taken a look at our own projects, and what would be needed to finally transition to zstd.
Notable progress:
- A news post was made by eworm, indicating that users should upgrade if they haven't done so since September 2018 (https://www.archlinux.org/news/required-update-to-recent-libarchive/)
- dbscripts has been updated by eschwartz to support zstd
What still needs to be done:
- infrastructure:
  - nginx rewrite rules hardcode xz PKGEXT
    https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166...
  - archive config(?) hardcodes PKGEXT
    https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166...
  - potentially more things, someone from devops should take a look
- archivetools: https://github.com/archlinux/archivetools/issues/6
- namcap? some files refer to a hardcoded .xz, though jelle concluded that this is irrelevant
  - someone with namcap knowledge should look into this
- srcpac (is this still used?)
  - hardcodes 'pkg.tar.?z'
- kde-build hardcodes pkg.tar.xz
  https://git.archlinux.org/kde-build.git/tree/build-packages?id=cda04698f4064...
- devtools:
  - hardcodes 'pkg.tar?(.?z)'
    https://git.archlinux.org/devtools.git/tree/lib/common.sh?id=2c611d20bdd04fe...
  - more than one occurrence of this iirc
There might be things I've missed. I encourage every Arch Linux project maintainer to check their own code for hardcoded xz extensions, you probably know your code best.
As soon as these things are out of the way, we can proceed with the proposal.
The changeset proposal remains on these settings:
PKGEXT='.pkg.tar.zst'
COMPRESSZST=(zstd -c -T0 --ultra -20 -)
--
Rob (coderobe)
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
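To see why a pattern like 'pkg.tar?(.?z)' is a problem for the transition, the following bash sketch tests it against an .xz and a .zst file name. The file names and the replacement pattern are hypothetical and are not the patch that was applied to devtools.

```bash
#!/usr/bin/env bash
shopt -s extglob

# Pattern quoted in the mail: an optional suffix made of a dot, any single
# character, and a literal "z" (matches .gz, .xz, ... but not .zst).
old_pat='*.pkg.tar?(.?z)'
# Hypothetical replacement that lists the accepted suffixes explicitly.
new_pat='*.pkg.tar?(.@(gz|bz2|xz|zst))'

for f in foo-1-1-any.pkg.tar.xz foo-1-1-any.pkg.tar.zst; do
  for p in "$old_pat" "$new_pat"; do
    # An unquoted right-hand side of == is treated as a glob pattern.
    if [[ $f == $p ]]; then
      echo "$f matches $p"
    else
      echo "$f does NOT match $p"
    fi
  done
done
```

The old pattern accepts the .xz name and rejects the .zst one, while the explicit list accepts both and still rejects detached signatures such as foo-1-1-any.pkg.tar.zst.sig.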
On 12/8/19 7:39 AM, Robin Broda via arch-dev-public wrote:
Hello everyone,
Now that Zstd 1.4.4 has been out, and released into our repos as well, i think it's time for a new status report on this.
I re-ran the benchmarks with the new zstd, and we are hitting marginally better times in compression & decompression in all scenarios.
In the past few weeks, other team members and I have taken a look at our own projects, and what would be needed to finally transition to zstd. Notable progress: - A news post was made by eworm, indicating that users should upgrade if they haven't done so since September 2018 (https://www.archlinux.org/news/required-update-to-recent-libarchive/) - dbscripts has been updated by eschwartz to support zstd
What still needs to be done: - infrastructure: - nginx rewrite rules hardcode xz PKGEXT https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166... - archive config(?) hardcodes PKGEXT https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166... - potentially more things, someone from devops should take a look - archivetools: https://github.com/archlinux/archivetools/issues/6 - namcap? some files refer to a hardcoded .xz, though jelle concluded that this is irrelevant - someone with namcap knowledge should look into this
namcap is indeed irrelevant. It hardcodes 'xz' in three places: - the README - the makepkg.conf used for building packages within the namcap self-test suite - the case statement in the main script, which detects files ending in .xz, .zst, and other valid extensions, in order to choose the right decompressor before finally handing the uncompressed .tar to namcap itself
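The kind of case statement described in the last item can be sketched as follows. This is not namcap's actual code; the function name and the exact set of extensions are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# decompressor_for FILE
# Prints the name of the tool that would decompress FILE, based only on its
# extension, or returns 1 if the extension is not recognised.
decompressor_for() {
  case $1 in
    *.tar.gz)  echo gzip ;;
    *.tar.bz2) echo bzip2 ;;
    *.tar.xz)  echo xz ;;
    *.tar.zst) echo zstd ;;
    *)         return 1 ;;
  esac
}

decompressor_for testpkg-1-1-any.pkg.tar.zst   # prints: zstd
decompressor_for testpkg-1-1-any.pkg.tar.xz    # prints: xz
```

Because the extension only selects a decompressor, supporting a new format in such a script means adding one more branch.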
- srcpac (is this still used?) - hardcodes 'pkg.tar.?z'
Definitely not still used. :p It hasn't been developed in 5 years. It also still hardcodes the `abs` program.
- kde-build hardcodes pkg.tar.xz https://git.archlinux.org/kde-build.git/tree/build-packages?id=cda04698f4064... - devtools: - hardcodes 'pkg.tar?(.?z)' https://git.archlinux.org/devtools.git/tree/lib/common.sh?id=2c611d20bdd04fe... - more than one occurrence of this iirc
There might be things I've missed. I encourage every Arch Linux project maintainer to check their own code for hardcoded xz extensions, you probably know your code best.
As soon as these things are out of the way, we can proceed with the proposal.
The changeset proposal remains on these settings: PKGEXT='.pkg.tar.zst' COMPRESSZST=(zstd -c -T0 --ultra -20 -)
-- Eli Schwartz Bug Wrangler and Trusted User
Hello again,
We are on our way to getting zstd merged.
The planned merge window for this is **2019/12/27** at around 20:00 Europe/Berlin time. Several team members will be meeting at the 36c3, so we figured this is a good opportunity to do this, as it makes sure that several people are aware, ready, and together for this - should anything immediately explode.
This has been ACK'd by anthraxx. Foxboron is also aware. I have been mentioning this on IRC, and discussed it with anthraxx via e-mail. This mail serves to formalize the plans.
**There is one hard requirement on the merge happening:**
All critical Arch Linux production infrastructure needs to be ready. The known-unsolved problems are labelled below, please see to them if any of you can. I will also take a look myself, but I do not have much involvement with these parts of the puzzle, so I would really appreciate someone with more in-depth knowledge taking a look.
If the known issues are solved and nothing major comes up, the deployment will happen as planned.
On 12/8/19 1:39 PM, Robin Broda via arch-dev-public wrote:
What still needs to be done: - infrastructure: - nginx rewrite rules hardcode xz PKGEXT https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166... - archive config(?) hardcodes PKGEXT https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166...
Unsolved: This needs fixing. I would *really* like someone from devtools to take a look at this, but if unsolved i will also attempt to fix it.
- archivetools: https://github.com/archlinux/archivetools/issues/6
Unsolved: This needs fixing. If nobody else gets to it, I'll see to it myself.
- namcap? some files refer to a hardcoded .xz, though jelle concluded that this is irrelevant
Solved: This is indeed irrelevant (see Eli's reply)
- srcpac (is this still used?)
Solved: This is indeed a dead project (see Eli's reply)
- kde-build hardcodes pkg.tar.xz https://git.archlinux.org/kde-build.git/tree/build-packages?id=cda04698f4064...
Solved: arojas said this will be fixed after zstd is deployed. This is also non-critical.
- devtools:
Solved: Patches ready, will be merged with the switcharoo.
I encourage every Arch Linux project maintainer to check their own code for hardcoded xz extensions, you probably know your code best.
Please
--
Rob (coderobe)
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
On 12/14/19 1:58 PM, Robin Broda via arch-dev-public wrote:
Hello again,
We are on our way to getting zstd merged.
The planned merge window for this is **2019/12/27** at around 20:00 Europe/Berlin time. Several team members will be meeting at the 36c3, so we figured this is a good opportunity to do this, as it makes sure that several people are aware, ready, and together for this - should anything immediately explode.
This has been ACK'd by anthraxx. Foxboron is also aware. I have been mentioning this on IRC, and discussed it with anthraxx via e-mail. This mail serves to formalize the plans.
**There is one hard requirement on the merge happening:**
All critical Arch Linux production infrastructure needs to be ready. The known-unsolved problems are labelled below, please see to them if any of you can. I will also take a look myself, but I do not have much involvement with these parts of the puzzle, so I would really appreciate someone with more in-depth knowledge taking a look.
If the known issues are solved and nothing major comes up, the deployment will happen as planned.
On 12/8/19 1:39 PM, Robin Broda via arch-dev-public wrote:
What still needs to be done: - infrastructure: - nginx rewrite rules hardcode xz PKGEXT https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166... - archive config(?) hardcodes PKGEXT https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166...
Unsolved: This needs fixing. I would *really* like someone from devtools to take a look at this, but if unsolved i will also attempt to fix it.
- archivetools: https://github.com/archlinux/archivetools/issues/6
Unsolved: This needs fixing. If nobody else gets to it, I'll see to it myself.
Oof, this looks a bit confusing. It's using the $PKGEXT variable as both a /usr/bin/find glob pattern and a sed injection (counting the number of characters and trimming them off using .{number} !) See https://github.com/archlinux/dbscripts/pull/4 for inspiration in working around this misuse. -- Eli Schwartz Bug Wrangler and Trusted User
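Trimming a fixed number of characters breaks as soon as the extension length changes ('.pkg.tar.xz' and '.pkg.tar.zst' differ by one character). One way to avoid counting is shell parameter expansion, sketched below; the helper name and the example path are hypothetical, and this is not the change made in the dbscripts pull request or in archivetools.

```bash
#!/usr/bin/env bash
# strip_pkgext FILE
# Prints the file name without directory and without the package extension,
# cutting at ".pkg.tar" regardless of which compression suffix follows.
strip_pkgext() {
  local file=${1##*/}                  # drop any leading directory
  printf '%s\n' "${file%.pkg.tar*}"    # drop ".pkg.tar" and what follows
}

strip_pkgext /srv/archive/foo-1.0-1-x86_64.pkg.tar.xz
strip_pkgext /srv/archive/foo-1.0-1-x86_64.pkg.tar.zst
# both print: foo-1.0-1-x86_64
```

The same expansion also handles uncompressed .pkg.tar files and detached .sig files, since everything from ".pkg.tar" onwards is removed.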
On 12/14/19 7:58 PM, Robin Broda via arch-dev-public wrote:
On 12/8/19 1:39 PM, Robin Broda via arch-dev-public wrote:
What still needs to be done: - infrastructure: - nginx rewrite rules hardcode xz PKGEXT https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166... - archive config(?) hardcodes PKGEXT https://github.com/archlinux/infrastructure/blob/7d0ad69030982875f862bc49166...
Unsolved: This needs fixing. I would *really* like someone from devtools to take a look at this, but if unsolved i will also attempt to fix it.
Gah, s/devtools/devops/ -- Rob (coderobe) O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
On 12/14/19 7:58 PM, Robin Broda via arch-dev-public wrote:
Hello again,
We are on our way to getting zstd merged.
The planned merge window for this is **2019/12/27** at around 20:00 Europe/Berlin time. Several team members will be meeting at the 36c3, so we figured this is a good opportunity to do this, as it makes sure that several people are aware, ready, and together for this - should anything immediately explode.
This has been ACK'd by anthraxx. Foxboron is also aware. I have been mentioning this on IRC, and discussed it with anthraxx via e-mail. This mail serves to formalize the plans.
**There is one hard requirement on the merge happening:**
All critical Arch Linux production infrastructure needs to be ready. The known-unsolved problems are labelled below, please see to them if any of you can. I will also take a look myself, but I do not have much involvement with these parts of the puzzle, so I would really appreciate someone with more in-depth knowledge taking a look.
If the known issues are solved and nothing major comes up, the deployment will happen as planned.
All known issues have been dealt with.
We are waiting on the patches to land, thus:
- https://www.archlinux.org/packages/extra/any/namcap/ being upgraded to 3.2.9
- https://github.com/archlinux/archivetools/pull/8 being merged
- https://github.com/coderobe/infrastructure/commit/7155521ed6f823ac6c6d06d1b9... being picked into infrastructure.git
  - this *depends on* the archivetools patch
- archive-cleaner is not mission-critical, it does not need to be patched before the scheduled deployment
  - Foxboron is working on a patch
These can all be dealt with during the deployment if they do not happen beforehand anyways.
This means the deployment will happen as planned.
--
Rob (coderobe)
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
On December 15, 2019 15:22, Robin Broda via arch-dev-public wrote:
All known issues have been dealt with. We are waiting on the patches to land, thus: - https://www.archlinux.org/packages/extra/any/namcap/ being upgraded to 3.2.9 - https://github.com/archlinux/archivetools/pull/8 being merged - https://github.com/coderobe/infrastructure/commit/7155521ed6f823ac6c6d06d1b9... being picked into infrastructure.git - this *depends on* the archivetools patch - archive-cleaner is not mission-critical, it does not need to be patched before the scheduled deployment - Foxboron is working on a patch
These can all be dealt with during the deployment if they do not happen beforehand anyways. This means the deployment will happen as planned.
Hi All,
We seem to be getting .zst packages just fine. The following packages are on zstd format already:
git kdevelop remind ipython chromium exim libva perl-xml-sax clang llvm llvm lib32-llvm nodejs ldc youtube-viewer cuda xfce4-terminal qt5-tools llvm lib32-llvm mpv qtcreator jenkins texstudio python-keyring ldc lldb you-get openmp python-keyrings-alt log4cplus acorn libfm-qt min matrix-synapse lld udftools v2ray compiler-rt trojan netcdf-openmpi shiboken2 shiboken2 python-parameterized parity-ethereum cbindgen ibm-sw-tpm2 golang-gopkg-check.v1 golang-golang-x-sys golang-golang-x-crypto golang-golang-x-image python-cfn-lint python-dephell mill python-gdspy tpm2-totp python-poetry ruby-public_suffix ruby-ffi devtools python-zstandard terraform-provider-keycloak wishbone-utils s-nail python-rdflib-jsonld libarchive grub openvpn
Also, this is the order they were added to the repositories, so, kudos Christian for sending our very first .zst package. We haven't had any complaints either, so I think everything is good.
Regards,
Giancarlo Razzolini
On 12/31/19 10:04 AM, Giancarlo Razzolini via arch-dev-public wrote:
Awesome!! Congrats to everyone involved, love it when things come together smoothly :) Regards, Andrew
On December 31, 2019 12:08, Andrew Crerar wrote:
On 12/31/19 10:04 AM, Giancarlo Razzolini via arch-dev-public wrote:
Just an addendum, the ordering on my previous email was wrong. The first zst package we got (after devtools, ofc) was texstudio. So, it's Sven that should get the kudos. Regards, Giancarlo Razzolini
participants (14)
- Allan McRae
- Andrew Crerar
- Andrew Gregory
- Baptiste Jonglez
- Christian Hesse
- Christian Rebischke
- Eli Schwartz
- Evangelos Foutras
- Gaetan Bisson
- Giancarlo Razzolini
- Jan Alexander Steffens
- Morten Linderud
- Robin Broda
- Sven-Hendrik Haase