Re: [arch-devops] [arch-projects] [dbscripts] [PATCH 2/4] Add reproducible archive of packages.
On 12/4/18 1:09 PM, Eli Schwartz wrote:
The question becomes, where can I store these? As-is, this will burden the mirror network as well. Unsure how to handle this. Could this be configurable by the mirror, as ISOs are now? Should we exclusively self-host this, and if so, where? archive.archlinux.org is managed by another service with its own exclusively writable location.
--
Eli Schwartz
Bug Wrangler and Trusted User
On Tue, Dec 04, 2018 at 01:15:20PM -0500, Eli Schwartz via arch-devops <arch-devops@lists.archlinux.org> wrote:
I'm not a fan of adding this pool to the mirror root for multiple reasons:

- Most mirrors would likely want to avoid mirroring it because it can become quite large and we told them that we only need around 100GB. If everyone wants to exclude it that requires action by every admin. Not ideal.
- I'm not sure if all of our mirrors have hardlink support. We don't currently ask for it even though we suggest the -H rsync option. Also the current repos use symlinks for the packages instead of hardlinks. That said, I'm not even sure if rsync can detect hardlinks across directories. It can't even detect renames/moves across directories...
- I don't expect that we need to mirror it because we don't even get that many requests to our current archive. If we ever need to mirror it, we can worry about that later I'd say since moving it to the mirror root should be rather simple.

I'd suggest to make the base path of the repro pool configurable so that we can keep it out of the mirror root. For now I'd suggest something like this:

    REPRO_BASE="/srv/reproducible-archive/"
    pkgname="foo"
    pkgfile="foo-1.0-1.pkg.tar.xz"
    dest="$REPRO_BASE/packages/${pkgname:0:1}/$pkgname/$pkgfile"
    ln .. "$dest"

Also note that this does intentionally not include $PKGPOOL any more even though you include it in your patch. The archive doesn't have it and I don't think it really helps anyone. It will just cause confusion if packages are moved between repos and it makes using the archive more difficult because the user would have to check all possible pool names or know which one to check.

Ideally I'd like to later extend this to also include the current archive's features and from the looks of it, storing the packages like this is the first step. Then we just need to copy the repos (dbs and pkg symlinks) once a day and archive the ISOs.

Also thinking about this, it would be great if we could skip the pkg symlinks for each day's archive and only copy the db itself. All we'd need is to have a dedicated PackageServer= setting (like Server=, but only for packages, not for the database) for pacman to find the packages, but I'm not sure if Allan would like that. That setting would also have to support the pkgname substring and the pkgname obviously.

Comments/thoughts/patches/... welcome.

Florian
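To make the proposed layout concrete, here is a minimal sketch of the path computation alone. The helper name `repro_dest` is hypothetical and not part of dbscripts. Stripping the trailing slash from the base path is an added assumption, made to avoid a double slash in the result. The hardlink step is left out because its source argument is elided above.

```bash
# Hypothetical helper (not from dbscripts): print the pool path for one
# package file, following the packages/<first letter>/<pkgname>/<pkgfile>
# layout suggested in the mail above.
repro_dest() (
    base="${1%/}"      # assumption: drop one trailing slash from the base
    pkgname="$2"
    pkgfile="$3"
    # First character of pkgname; same result as ${pkgname:0:1} in bash,
    # written this way so it also works in plain POSIX sh.
    first="${pkgname%"${pkgname#?}"}"
    printf '%s/packages/%s/%s/%s\n' "$base" "$first" "$pkgname" "$pkgfile"
)

repro_dest "/srv/reproducible-archive/" "foo" "foo-1.0-1.pkg.tar.xz"
```

Because the pool name does not appear in the path, a package keeps the same location when it moves between repos.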
On Wed, Dec 05, 2018 at 10:49:44AM +0100, Florian Pritz via arch-devops <arch-devops@lists.archlinux.org> wrote:
As discussed on IRC, we don't actually need support in pacman here. We can just set up a rewrite in nginx so that when pacman tries to download a package file, nginx maps the path correctly. So nginx would rewrite /$repo/os/$arch/package-....tar.xz to /packages/p/package/package-....tar.xz.

Pacman would still work then and the only difference would be that we don't have directory listings with the packages of each day, but who needs those anyways. You can get all that info from the dbs themselves.

Florian
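The mapping such a rewrite would have to perform can be illustrated in shell. This is only a sketch of the path translation, not an nginx rule. The function name `map_request` is hypothetical, and it assumes file names follow the usual pkgname-pkgver-pkgrel-arch layout, so that the package name is whatever remains after dropping the last three hyphen-separated fields.

```bash
# Hypothetical sketch: translate a repo-style request path into the
# pool path /packages/<first letter>/<pkgname>/<pkgfile>.
map_request() (
    request="$1"
    pkgfile="${request##*/}"        # keep only the file name
    # Assumption: names look like pkgname-pkgver-pkgrel-arch.pkg.tar.xz,
    # so removing the last three "-" separated fields leaves pkgname.
    pkgname="${pkgfile%-*-*-*}"
    first="${pkgname%"${pkgname#?}"}"   # first character of pkgname
    printf '/packages/%s/%s/%s\n' "$first" "$pkgname" "$pkgfile"
)

map_request "/extra/os/x86_64/foo-bar-1.0-1-x86_64.pkg.tar.xz"
```

The repo and architecture components of the request are discarded, which matches the idea that the pool is independent of the repo a package currently lives in.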
On 12/04/18 at 01:15pm, Eli Schwartz via arch-devops wrote:
Does this also clean up the archive? As in remove packages which are not required for reproducible builds? Since now our archive server is almost running out of space again.
-- Jelle van der Waa
On 12/12/18 3:55 AM, Jelle van der Waa wrote:
Patch 4/4 will do so, but I only cc'ed patch 2/4 to the devops list.

--
Eli Schwartz
Bug Wrangler and Trusted User
participants (3)
- Eli Schwartz
- Florian Pritz
- Jelle van der Waa