[arch-projects] [PATCH] [devtools] makechrootpkg: load COMPRESS* vars from makepkg.conf
This allows using faster multicore compression tools like pixz Signed-off-by: Maxim Baz <archlinux@maximbaz.com> --- archbuild.in | 2 +- makechrootpkg.in | 18 +++++++++++++----- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/archbuild.in b/archbuild.in index bd5706d..e3d53ec 100644 --- a/archbuild.in +++ b/archbuild.in @@ -39,7 +39,7 @@ while getopts 'hcr:' arg; do esac done -check_root SOURCE_DATE_EPOCH,SRCDEST,SRCPKGDEST,PKGDEST,LOGDEST,MAKEFLAGS,PACKAGER,GNUPGHOME +check_root SOURCE_DATE_EPOCH,SRCDEST,SRCPKGDEST,PKGDEST,LOGDEST,MAKEFLAGS,PACKAGER,GNUPGHOME,COMPRESSGZ,COMPRESSXZ,COMPRESSLRZ,COMPRESSZ,COMPRESSBZ2,COMPRESSLZO # Pass all arguments after -- right to makepkg makechrootpkg_args+=("${@:$OPTIND}") diff --git a/makechrootpkg.in b/makechrootpkg.in index 4b72a36..e4b7ca3 100644 --- a/makechrootpkg.in +++ b/makechrootpkg.in @@ -29,9 +29,9 @@ usage() { echo 'command:' echo ' mkarchroot <chrootdir>/root base-devel' echo '' - echo 'This script reads {SRC,SRCPKG,PKG,LOG}DEST, MAKEFLAGS and PACKAGER' - echo 'from makepkg.conf(5), if those variables are not part of the' - echo 'environment.' + echo 'This script reads {SRC,SRCPKG,PKG,LOG}DEST, MAKEFLAGS, PACKAGER' + echo 'and COMPRESS{{G,X,LR,}Z,BZ2,LZO} from makepkg.conf(5),' + echo 'if those variables are not part of the environment.' echo '' echo "Default makepkg args: ${default_makepkg_args[*]}" echo '' @@ -63,12 +63,18 @@ usage() { # - LOGDEST # - MAKEFLAGS # - PACKAGER +# - COMPRESSGZ +# - COMPRESSXZ +# - COMPRESSLRZ +# - COMPRESSZ +# - COMPRESSBZ2 +# - COMPRESSLZO load_vars() { local makepkg_conf="$1" var [[ -f $makepkg_conf ]] || return 1 - for var in {SRC,SRCPKG,PKG,LOG}DEST MAKEFLAGS PACKAGER; do + for var in {SRC,SRCPKG,PKG,LOG}DEST MAKEFLAGS PACKAGER COMPRESS{{G,X,LR,}Z,BZ2,LZO}; do [[ -z ${!var:-} ]] && eval "$(grep -a "^${var}=" "$makepkg_conf")" done @@ -185,7 +191,9 @@ prepare_chroot() { sed -e '/^MAKEFLAGS=/d' -e '/^PACKAGER=/d' -i "$copydir/etc/makepkg.conf" for x in BUILDDIR=/build PKGDEST=/pkgdest SRCPKGDEST=/srcpkgdest SRCDEST=/srcdest LOGDEST=/logdest \ - "MAKEFLAGS='${MAKEFLAGS:-}'" "PACKAGER='${PACKAGER:-}'" + "MAKEFLAGS='${MAKEFLAGS:-}'" "PACKAGER='${PACKAGER:-}'" "COMPRESSGZ=(${COMPRESSGZ[*]:-})" \ + "COMPRESSXZ=(${COMPRESSXZ[*]:-})" "COMPRESSLRZ=(${COMPRESSLRZ[*]:-})" "COMPRESSZ=(${COMPRESSZ[*]:-})" \ + "COMPRESSBZ2=(${COMPRESSBZ2[*]:-})" "COMPRESSLZO=(${COMPRESSLZO[*]:-})" do grep -q "^$x" "$copydir/etc/makepkg.conf" && continue echo "$x" >>"$copydir/etc/makepkg.conf" -- 2.21.0
I did see an earlier patch proposing the same, which was rejected because it might break delta packages [1]. However, someone mentioned that pacman is dropping support for delta packages altogether [2] referring to a removal commit in pacman repo [3], so I wanted to bring this up again :) I'm creating a new thread because according to my tests the earlier patch [1] is incomplete. 1: https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html 2: https://github.com/AladW/aurutils/issues/547 3: https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e854... -- Maxim Baz
On March 18, 2019 10:56:34 AM GMT+01:00, Maxim Baz via arch-projects <arch-projects@archlinux.org> wrote:
I did see an earlier patch proposing the same, which was rejected because it might break delta packages [1]. However, someone mentioned that pacman is dropping support for delta packages altogether [2] referring to a removal commit in pacman repo [3], so I wanted to bring this up again :)
I'm creating a new thread because according to my tests the earlier patch [1] is incomplete.
1: https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html 2: https://github.com/AladW/aurutils/issues/547 3: https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e854...
--
Messing with this in devtools used for building official repo packages kills reproducible builds when not agreed on very specific flags and tools to compress. This is not a good idea to do and we won't kill reproducibility by going wild west.
On Mon, Mar 18, 2019 at 10:58:57AM +0100, Levente Polyak via arch-projects wrote:
Messing with this in devtools used for building official repo packages kills reproducible builds when not agreed on very specific flags and tools to compress. This is not a good idea to do and we won't kill reproducibility by going wild west.
True, sad but valid point. I've just confirmed that pixz produces different results when run twice on the same data, so I can't even propose to just switch the default to pixz without breaking reproducible builds :( Thanks for the quick feedback! -- Maxim Baz
On 3/18/19 5:56 AM, Maxim Baz via arch-projects wrote:
I did see an earlier patch proposing the same, which was rejected because it might break delta packages [1]. However, someone mentioned that pacman is dropping support for delta packages altogether [2] referring to a removal commit in pacman repo [3], so I wanted to bring this up again :)
I'm creating a new thread because according to my tests the earlier patch [1] is incomplete.
1: https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html 2: https://github.com/AladW/aurutils/issues/547 3: https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e854...
Well, I would ideally like to add delta packages back into pacman, whether that is via xdelta3 or something else, and it is also worth noting it got removed at least in large part because archlinux.org itself has never really used it, and therefore: - it doesn't exactly get well tested - for our use cases we don't ultimately notice the difference (I would like to one day add delta updates to dbscripts. :( But before that happens, I would need to convince Allan with a tested, safe delta implementation that someone actually understands and is willing to maintain.) Supporting packages in the official repos which are not byte-identical to the default COMPRESSXZ makes it quite difficult to ever add delta support back. ... The fundamental problem behind delta updates remains the same, though, which is that you cannot actually generate the same package twice using the same data (even from a byte-identical compressed tar, which is basically what xdelta3 did). As Levente pointed out, this completely nukes reproducible builds. And it is not a pixz issue either. For example xz -T $num will compress with up to $num cores, as opposed to the default 1, but basic testing shows that it produces the same output with 2, 3, 4 etc. cores but different output from what you get when using 1 core. You cannot even hardcode -T0, since (in addition to requiring packagers to use more cores than they want to) that may still get interpreted as 1, which is then unreproducible. You can verify this using taskset -c 1... (This has been discussed already in #archlinux-pacman.) Without a guarantee that a given compressor will always be reproducible no matter how many cores are used, we cannot take advantage of multithreaded compression on any level. More generally, without a variation on https://patchwork.archlinux.org/patch/692/ we cannot take advantage of any sort of configurability. But my patch was rejected. Specific sub-issues with unknown configurability that isn't recorded in the .BUILDINFO: User-configurable implementations. Without a guarantee that any implementation of a format retains byte-level compatibility with the standard format (in this case xz), we cannot support this as we do not know what was being used. User-configurable options. These are inherently unreproducible as long as they include the compression level. ... For the topic of configurable implementations, there is an additional hurdle. Since compression happens inside the chroot, even if we did support it, it would need some way to look up the required binary and install it in the chroot, but we have no tooling for this and in fact makepkg does not even know how to check that the binary exists, so you will probably just get: -> Compressing package... /usr/share/makepkg/util/compress.sh: line 39: pixz: command not found bsdtar: Write error ==> ERROR: Failed to create package file. The solution would have to require either a known, embedded list of "good" ones (if pixz is not reproducible when running the same command line twice then we'd need to be able to blacklist that), or else assume the user installed it when adding it to their makepkg.conf and using pacman -Qo. It is probably overkill to use pkgfile and support compressors that aren't installed on the host either... -- Eli Schwartz Bug Wrangler and Trusted User
Thanks for the detailed reply Eli :) Before I go further, I should mention that this thread has sparked some interest among our teammates, and folks on IRC are researching now a possibility of switching from xz to zstd entirely. So I think we will return to this topic soon with some updates :)
And it is not a pixz issue either. For example xz -T $num will compress with up to $num cores, as opposed to the default 1, but basic testing shows that it produces the same output with 2, 3, 4 etc. cores but different output from what you get when using 1 core. You cannot even hardcode -T0, since (in addition to requiring packagers to use more cores than they want to) that may still get interpreted as 1, which is then unreproducible. You can verify this using taskset -c 1...
(This has been discussed already in #archlinux-pacman.)
Turns out it is possible to make xz use more threads and produce reproducible results, check out this message: https://lists.debian.org/debian-dpkg/2016/10/msg00011.html I've confirmed (with real VMs) that -T2, -T3, -T8, all produce the same result, on both single-core and multi-core CPUs, and -T0 produces the same result on multi-core CPU, but -T0 produces a different result on a single-core CPU. Therefore, the simplest workaround that produces reproducible results on all types of CPU is this: -T$([ $(nproc) -lt 2 ] && echo '2' || echo '0') Or maybe even something like this: -T$(( $(nproc) + 1 ))
For the topic of configurable implementations, there is an additional hurdle. Since compression happens inside the chroot, even if we did support it, it would need some way to look up the required binary and install it in the chroot, but we have no tooling for this and in fact makepkg does not even know how to check that the binary exists, so you will probably just get:
-> Compressing package... /usr/share/makepkg/util/compress.sh: line 39: pixz: command not found bsdtar: Write error ==> ERROR: Failed to create package file.
The solution would have to require either a known, embedded list of "good" ones (if pixz is not reproducible when running the same command line twice then we'd need to be able to blacklist that), or else assume the user installed it when adding it to their makepkg.conf and using pacman -Qo. It is probably overkill to use pkgfile and support compressors that aren't installed on the host either...
That is correct. One final remark, pixz does produce the same result when is run twice, I made a mistake when I was testing it. But it doesn't matter, since we'll probably settle for zstd or xz -T. -- Maxim Baz
participants (3)
-
Eli Schwartz
-
Levente Polyak
-
Maxim Baz