Arch Linux minimal container userland 100% reproducible - now what?
hello,

in last week's email to the reproducible-builds email list[1] about reproducible Arch Linux I mentioned there's only one unreproducible package left in docker.io/library/archlinux.

[1]: https://lists.reproducible-builds.org/pipermail/rb-general/2024-March/003291...

Due to amazing work by dvzrv and Foxboron this package is now also reproducible!

    INFO arch_repro_status > All packages are reproducible!
    INFO arch_repro_status > Your system is 100.00% reproducible.

To try for yourself use:

    podman run --rm -t archlinux sh -c 'pacman -Suy arch-repro-status --noconfirm && arch-repro-status'

However: Where do we go from here? It would be cool if the OCI container image itself could also be reproduced (bit-for-bit), but I'm not sure if there's any prior work (specifically for images listed as 'official' on Docker Hub)?

Specifically what I mean - given a line like this:

    FROM archlinux@sha256:2dbd72d1e5510e047db7f441bf9069e9c53391b87e04e5bee3f379cd03cec060

I want to reproduce the artifact(s) that are pulled in by this, with the packages our Arch Linux rebuilders have reproduced from source code. From what I understand this hash points to a json manifest that is not contained in the container image itself and was generated by the registry (should we archive them?), and this manifest then points to the sha256 of the tar containing the filesystem (I'm possibly missing an indirection here).

Hopefully one of the many SBOM formats can help with this. :P

I know the container image is built from these two repositories but I don't have any in-depth knowledge:

- https://github.com/docker-library/official-images/blob/master/library/archli...
- https://gitlab.archlinux.org/archlinux/archlinux-docker

The only work towards reproducible container images I'm aware of is by Akihiro Suda:

https://github.com/reproducible-containers/repro-get#are-container-images-bi...
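To make the digest chain described above more concrete, here is a minimal Python sketch. It assumes the OCI image format, where the digest in a `FROM name@sha256:...` line is the sha256 of the exact manifest bytes, the manifest references a config blob and the (usually compressed) layer blobs by digest, and the config lists the digests of the uncompressed layer tars. A multi-platform image adds one more level, an image index that lists per-platform manifests. The helper names `sha256_digest`, `build_manifest` and `image_digest` are hypothetical, and the compact, key-sorted JSON serialization is purely an assumption for the demo: real build tools pick their own byte layout.

```python
import gzip
import hashlib
import json


def sha256_digest(data: bytes) -> str:
    """Return an OCI-style digest string for a blob."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(layer_blob: bytes, layer_diff_id: str) -> tuple[bytes, bytes]:
    """Build a toy image config and manifest for a single layer.

    layer_blob is the compressed tar as it would be stored in a registry,
    layer_diff_id is the digest of the uncompressed tar.
    """
    config = {
        "architecture": "amd64",
        "os": "linux",
        "rootfs": {"type": "layers", "diff_ids": [layer_diff_id]},
    }
    # Assumption: compact, key-sorted JSON. Other tools serialize differently.
    config_bytes = json.dumps(config, sort_keys=True, separators=(",", ":")).encode()
    manifest = {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": {
            "mediaType": "application/vnd.oci.image.config.v1+json",
            "digest": sha256_digest(config_bytes),
            "size": len(config_bytes),
        },
        "layers": [
            {
                "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
                "digest": sha256_digest(layer_blob),
                "size": len(layer_blob),
            }
        ],
    }
    manifest_bytes = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return config_bytes, manifest_bytes


def image_digest(uncompressed_tar: bytes) -> str:
    """Digest a single-platform image reference would pin, for this toy image."""
    # mtime=0 keeps the gzip header free of the current timestamp.
    layer_blob = gzip.compress(uncompressed_tar, mtime=0)
    _, manifest_bytes = build_manifest(layer_blob, sha256_digest(uncompressed_tar))
    return sha256_digest(manifest_bytes)


if __name__ == "__main__":
    rootfs_tar = b"stand-in bytes for the root filesystem tar"
    print(image_digest(rootfs_tar))
```

In this model, a different compression setting, JSON key order or whitespace changes the final digest even when the filesystem contents are identical, so reproducing the pinned digest means reproducing the manifest and config bytes as well as the tar.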
I suspect the current scripts used by Arch Linux would still be prone to mirror changes[2] though, meaning new package uploads would end up in our reproduced artifacts (causing mismatches) and the container image could only be reproduced for a short amount of time.

[2]: https://gitlab.archlinux.org/archlinux/archlinux-docker/-/blob/98cd79111dd53...

I'm also not sure if there's a missing puzzle piece with reproducible containers in regards to this manifest json that is generated by the registry. The image digest being unpredictable has also been mentioned in a cosign GitHub issue[3].

[3]: https://github.com/sigstore/cosign/issues/2516

Input much appreciated!

## Caveats

Probably worth mentioning: at the time of writing there's no consensus across multiple orgs yet. The https://reproducible.archlinux.org instance reports this status, but two other rebuilders don't report the full 100% yet.

    $ arch-repro-status -r https://reproducible.crypto-lab.ch
    [...]
    INFO arch_repro_status > 3/118 packages are not reproducible.
    INFO arch_repro_status > Your system is 97.46% reproducible.

    $ arch-repro-status -r https://wolfpit.net/rebuild
    [...]
    INFO arch_repro_status > 3/118 packages are not reproducible.
    INFO arch_repro_status > Your system is 97.46% reproducible.

The packages in question are part of this rebuild todo (specifically gcc-libs, glibc, ncurses):

https://archlinux.org/todo/rebuild-core-with-reproducible-pacman/

Meaning there's currently some luck involved for these 3 packages, e.g. using btrfs currently increases your chances to get an exact match (after a few tries). We're obviously trying to get rid of this caveat though.

---

If you appreciate this flavor of supply-chain security you may be interested in repro-env[4], which I'm currently trying to land[5] in Ubuntu 24.04 LTS, but which is blocked by Debian's libnettle[6].

[4]: https://github.com/kpcyrd/repro-env
[5]: https://tracker.debian.org/pkg/rust-repro-env
[6]: https://tracker.debian.org/pkg/nettle

cheers,
kpcyrd
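On the mirror-change concern raised in this message: one possible way to keep the package set stable is to point pacman at a dated snapshot instead of a live mirror. The sketch below assumes the Arch Linux Archive URL layout (`archive.archlinux.org/repos/YYYY/MM/DD/`). The helper name `pinned_mirrorlist` is hypothetical, and nothing in this thread says the archlinux-docker scripts work this way.

```python
from datetime import date

# Assumption: dated repository snapshots as served by the Arch Linux Archive.
# $repo and $arch are left literal; pacman expands them itself.
ARCHIVE_URL = "https://archive.archlinux.org/repos/{d:%Y/%m/%d}/$repo/os/$arch"


def pinned_mirrorlist(snapshot: date) -> str:
    """Return mirrorlist content that pins pacman to a single dated snapshot."""
    return "Server = " + ARCHIVE_URL.format(d=snapshot) + "\n"


if __name__ == "__main__":
    # Example: the date of this email.
    print(pinned_mirrorlist(date(2024, 3, 20)), end="")
```

With a mirrorlist like this, a rebuild at a later date would resolve the same package versions as the original build, as long as the snapshot stays available.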
Hey!

On 20/03/2024 13:42, kpcyrd wrote:
> INFO arch_repro_status > All packages are reproducible!
> INFO arch_repro_status > Your system is 100.00% reproducible.
\o/
> Where do we go from here? It would be cool if the OCI container image itself could also be reproduced (bit-for-bit), but I'm not sure if there's any prior work (specifically for images listed as 'official' on Docker Hub)?
What would interest me a lot is if we gate the creation of the image on this, but of course that will not be a very trivial process. Basically, we only publish an image if it is reproducible.
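A minimal sketch of such a gate, in Python, assuming "reproducible" means that independent builds of the image yield byte-identical artifacts. The names `publish_if_reproducible`, `build` and `publish` are hypothetical placeholders for whatever the real pipeline would use, and a real gate would more likely compare against artifacts from independent rebuilders than rebuild in the same process.

```python
import hashlib
from typing import Callable


def digest(blob: bytes) -> str:
    """Hex sha256 of an artifact."""
    return hashlib.sha256(blob).hexdigest()


def publish_if_reproducible(
    build: Callable[[], bytes],
    publish: Callable[[bytes], None],
    rebuilds: int = 2,
) -> bool:
    """Build the artifact several times and publish only if all builds match."""
    artifacts = [build() for _ in range(rebuilds)]
    digests = {digest(a) for a in artifacts}
    if len(digests) != 1:
        # Builds disagree: refuse to publish.
        return False
    publish(artifacts[0])
    return True


if __name__ == "__main__":
    published = []
    ok = publish_if_reproducible(lambda: b"image bytes", published.append)
    print("published" if ok else "rejected")
```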
> Input much appreciated!

Don't know much about reproducing OCI images, but another interesting goal would be reproducing the other images we produce:

https://mirror.pkgbuild.com/images/latest/

Likely needs a way to store the embedded packages.

Greetings,

Jelle
On 20.03.2024 13.42, kpcyrd wrote:
> Where do we go from here? It would be cool if the OCI container image itself could also be reproduced (bit-for-bit), but I'm not sure if there's any prior work (specifically for images listed as 'official' on Docker Hub)?
>
> Specifically what I mean - given a line like this:
>
> FROM archlinux@sha256:2dbd72d1e5510e047db7f441bf9069e9c53391b87e04e5bee3f379cd03cec060
>
> I want to reproduce the artifact(s) that are pulled in by this, with the packages our Arch Linux rebuilders have reproduced from source code. From what I understand this hash points to a json manifest that is not contained in the container image itself and was generated by the registry (should we archive them?), and this manifest then points to the sha256 of the tar containing the filesystem (I'm possibly missing an indirection here).
>
> Hopefully one of the many SBOM formats can help with this. :P
>
> I know the container image is built from these two repositories but I don't have any in-depth knowledge:
>
> - https://github.com/docker-library/official-images/blob/master/library/archli...
> - https://gitlab.archlinux.org/archlinux/archlinux-docker
We do not control the full build pipeline for the Docker Hub official Arch Linux image. The workflow is basically (links for the 20240101.0.204074 tag provided):

1. Every week[1] new tarballs[2] and Dockerfiles[3] are built
2. A PR[4] is opened for the official-images repo on GitHub
3. The PR is merged
4. The Dockerfiles are built by official-images's build infrastructure[5] and pushed to Docker Hub

I'm not sure to what extent the official-images's build infrastructure supports "reproducible builds", but it was discussed[6] and implemented in some capacity for the golang image[7] in January. There is also an existing issue for reproducibility in archlinux-docker[8].

For our images published at docker.io/archlinux/archlinux, Quay.io and ghcr.io, we do control the whole build pipeline and we can more easily tweak the pipeline to support "reproducible builds" if desired. They are also already signed with cosign FWIW.

[1] https://gitlab.archlinux.org/archlinux/archlinux-docker/-/pipelines/87673
[2] https://gitlab.archlinux.org/archlinux/archlinux-docker/-/packages/1277
[3] https://gitlab.archlinux.org/archlinux/archlinux-docker/-/tree/v20240101.0.2...
[4] https://github.com/docker-library/official-images/pull/15984
[5] https://doi-janky.infosiftr.net/
[6] https://github.com/docker-library/official-images/issues/16044
[7] https://github.com/docker-library/golang/pull/505
[8] https://gitlab.archlinux.org/archlinux/archlinux-docker/-/issues/44

Cheers,
Kristian
participants (3)

- Jelle van der Waa
- kpcyrd
- Kristian Klausen