I like the idea of having a metadata-based system in json/yaml, but our
CI/CD is not based on asynchronous actions and probably won't be. Our
pipeline also performs the add/remove/move repo operations, calculated
from the commit sha status.
However, I think that repod should not do any data validation. Instead,
makepkg and namcap could do it, for example via yaml schema validation,
assuming a yaml PKGBUILD and yaml meta files inside the pkg.tar.
The ideal scenario would be that the repo tool could pass the PKGINFO
meta directly to the meta db and, if necessary, append to it, e.g. a
signature or checksum, so that you would only deal with more or less one
file format that is schema validated.
The consuming repo tool would not have to, and should not have to,
bother with such a task at all in my opinion, but it could do another
schema validation if necessary.
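To make the schema validation idea a bit more concrete, here is a minimal
sketch in Python using only the standard library. The schema, the set of
required fields and the `validate` helper are hypothetical illustrations,
not anything that pacman, makepkg, namcap or repod ship; a real setup
would more likely use a proper schema language and a yaml parser.

```python
# Hypothetical schema: field name -> expected Python type after parsing.
SCHEMA = {
    "pkgname": str,
    "pkgver": str,
    "arch": str,
    "depends": list,
}
# Hypothetical set of fields that must always be present.
REQUIRED = {"pkgname", "pkgver", "arch"}


def validate(meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in sorted(REQUIRED - set(meta)):
        errors.append(f"missing required field: {key}")
    for key, value in meta.items():
        expected = SCHEMA.get(key)
        if expected is None:
            errors.append(f"unknown field: {key}")
        elif not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```

The producer (makepkg in this idea) and the consumer (the repo tool)
could both run the same check, so a format change would mean updating
the schema rather than a parser.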
What I am saying is that the repo tool should, in my view, rely on the
reference meta yaml implementation of pacman/makepkg rather than adding
another layer of complexity, and thus potential errors, in the
transformation between these formats, including the archiving or
decompression step. If pacman changes the meta files, your tool has to
implement those changes, instead of possibly only updating a schema
validation.
The last thing I'd want on a server is a mismatch between db.tar.gz and
the meta db for whatever reason. That would be my greatest concern with
the current implementation. It should move into pacman, I think, and the
repod tool should just handle the meta files directly in whatever
fashion is suitable.
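The kind of mismatch I mean could be detected with a check along these
lines. This is only a sketch: it relies on a sync database being a tar
archive with one `name-version/desc` entry per package, and it assumes
(hypothetically) that the meta db side has already been reduced to a set
of `name-version` strings. The function names are made up for
illustration.

```python
import io
import posixpath
import tarfile


def entries_in_sync_db(fileobj) -> set[str]:
    """Return the 'name-version' directory names found in a sync database."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return {
            posixpath.basename(posixpath.dirname(member.name))
            for member in tar.getmembers()
            if posixpath.basename(member.name) == "desc"
        }


def mismatch(db_entries: set[str], meta_entries: set[str]) -> dict[str, set[str]]:
    """Report entries that exist on only one side."""
    return {
        "only_in_db": db_entries - meta_entries,
        "only_in_meta": meta_entries - db_entries,
    }
```

If both sets in the result are empty, the two views of the repository
agree on which package versions exist.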
In short, I would decouple the nice idea of metadata files in json (I'd
prefer yaml) from repod and move it into pacman upstream, possibly
tied to schema validation so that the yaml is guaranteed to be valid.
As an idea, perhaps consider different pacman meta file backends: the
classic one and maybe a new json/yaml-based one that could be tested and
developed this way?
I would imagine pyalpm would also profit greatly from a yaml-based
approach, as well as namcap and maybe even archweb.
On 24.06.22 at 23:42, David Runge wrote:
> Hi artoo,
>
> thanks for your input!
>
> There seem to be a few misconceptions about repod and I'll try to
> untangle them below.
>
> On 2022-06-24 20:33:15 (+0200), artoo(a)artixlinux.org wrote:
>> Hi arch team,
>>
>> after receiving your email on repod, here is an idea.
>>
>> What I have been asking myself since the python db scripts arrived at
>> the arch gitlab instance.
>>
>> "Why doesn't arch consider writing yaml files with makepkg instead of
>> the various formats in pkg.tar, db.tar, files.tar and links.tar?"
> The scope of repod is to eventually create an alternative to dbscripts,
> which currently handles the binary package repository state, while being
> tied to our svn mono repos that contain our package build sources.
>
> Rewriting pacman or makepkg is out-of-scope for the repod project, as it
> is meant to consume package files (which have sort of
> well-established/defined metadata) and their potential signatures while
> outputting machine readable state files which allow to reproducibly
> create repository sync databases from them.
> This type of setup allows us to recreate the entire set of repository
> sync databases from existing packages in their package pools and the
> state data (in this scenario e.g. the repository sync databases had been
> damaged or needed to be reset), or even completely rebuilding all
> packages from their respective package build sources and recreating
> everything from scratch (in this scenario we lost all or some package
> files and their potential signatures and/ or the repository sync
> databases).
> All actions in the management state are meant to be tracked in a git
> repository (plus additional caching), to allow for maximum transparency.
>
> As such repod does replace/ supersede parts of the functionality shipped
> with pacman (e.g. repo-add/ repo-remove) and implements a very basic -
> yet powerful - approach to managing binary package repositories, which
> can replace our use of dbscripts (if we move to package build sources in
> git in the future).
>
> You can find a few thoughts on this in this article [1] and in the
> current repod documentation [2].
>
> This all being said: The project's usefulness is currently still quite
> limited and in the beginning it will only expose a few CLI tools for
> conversion and validation of packages and repository sync databases.
>
> Going forward, the idea is to expose functionality via an API, that can
> be integrated into different authentication schemes, so that we can move
> away from a scenario in which we "call a script as some user on some
> host" to one where we make an authenticated call to an endpoint to
> trigger an asynchronous action.
>
> As you can imagine, data validation alone takes quite some time to
> figure out (this part is reaching a first milestone with 0.1.0 though),
> as many things in pacman/ makepkg serve as reference implementation and
> might not offer a versioned approach to adding or deprecating data
> fields.
>
>> In theory, build jobs could use PKGBUILD dependencies and do queue
>> checks for example before building a given package.
> The automation of simple version bumps to mass rebuilds in CI is
> something that we are looking at as well and there are many different
> approaches by now.
>
> In the future we can imagine a workflow in which packages can be built on
> guarded build machines and be signed by a signing enclave, after which
> the build process e.g. hands the files to repod for consumption.
>
>> This is the PKGBUILD side of things, thus better implemented upstream
>> pacman.
> Implementing data handling in a completely new format while being able
> to maintain compatibility or a migration path from old formats is a
> huge undertaking. There is no "clean split" scenario for these things.
> Doing something like that in the context of a set of thousands of
> existing packages and an already existing code base, that uses an
> established structured data format is complicated.
>
> Starting to define versioned approaches, as we do with repod is the
> first step in the direction of being able to more easily introduce
> change and react to change (e.g. new fields in .PKGINFO) while allowing
> to search through the use of certain keywords (e.g. "how many/ which
> packages still use that deprecated keyword?").
>
>> In my humble opinion, addressing the repod first is the wrong end to
>> start,
>> if you want a CI based build system.
>> If everything was in yaml, repod could simply directly handle these yaml
>> files and tar them as db, very streamlined approach.
> Unfortunately, everything is rather complicated/ entangled and
> implementing this in the various pieces of software at the same time as
> suggested by you would require a lot of time.
>
> Hence starting with the binary package repository management system is a
> reasonable approach, as it allows us to decouple our package build
> sources from the existing system and manage a secure standalone
> solution separate from the package build and signing process.
>
> I hope I could answer some of your questions and dissolve some of the
> misconceptions.
>
> Best,
> David
>
>
> [1] https://sleepmap.de/2022/packaging-for-arch-linux/
> [2] https://repod.readthedocs.io
>
Hi arch team,
after receiving your email on repod, here is an idea.
What I have been asking myself since the python db scripts arrived at
the arch gitlab instance.
"Why doesn't arch consider writing yaml files with makepkg instead of
the various formats in pkg.tar, db.tar, files.tar and links.tar?"
For the sake of argument, I assume yaml to be the file format, but it
could be json too.
So here is the idea, which would certainly not materialize overnight if
it were implemented. Maybe for pacman 7+?
This is not about implementation details atm, just about a general idea
to cure some shortcomings in the build process.
What if:
1. PKGBUILD was in yaml
2. makepkg/pacman read and wrote yaml files, i.e. the file formats
   were standardized to yaml
   * pkg.tar: PKGINFO yaml (would contain the MTREE as a node), BUILDINFO
     yaml; maybe refactor these if there is redundant data
     * it might even be possible to reuse parts of the PKGBUILD yaml in
       code for the above; once read, it can be queried and/or reused
   * db.tar: a unified db consisting of one yaml for each package
3. *repod directly handled* these yaml files and tar'ed them into a new db
   instead of writing the json from the various formats and files, as is
   the case currently
   * there is a lot of redundancy in the current build process, up to
     the repo operations, in terms of writing and reading files in
     different formats
The idea would be to *base everything on yaml*, which eventually raises
the question of the viability of makepkg/repo-add in bash. Maybe python?
However, yaml writing could even be done in bash, replacing the current
formats.
Afaik, arch's plan has been to have one git repo per package, with these
connected to a CI?
It is much easier to feed a structured data format to a CI in my view,
so that's the main thought behind the idea.
We have had a CI/CD running for some years now, and one shortcoming is
that it's not easy to hook into the makepkg process to have, say, a
dedicated CI test stage for the check() function in the PKGBUILD.
A yaml-based PKGBUILD would make these makepkg stages easily accessible
to the CI, which could run each of them in its own stage with the proper
makepkg flags.
To give a little glimpse of what a PKGBUILD in yaml might look like,
here it is:
```
$ pkg2yaml artixlinux/main/udev/trunk
---
pkgbase:
  name: udev
  pkgver: 251.2
  pkgrel: 2
  url: https://www.github.com/systemd/systemd
  arch:
    - x86_64
  license:
    - GPL2
    - LGPL2.1
  makedepends:
    - acl
    - libacl.so
    - kmod
    - libkmod.so
    - util-linux
    - libblkid.so
    - hwdata
    - libcap
    - libcap.so
    - kbd
    - gperf
    - intltool
    - git
    - meson
    - docbook-xsl
    - rsync
    - python-jinja
packages:
  - pkgname: udev
    depends:
      - acl
      - libacl.so
      - kmod
      - libkmod.so
      - util-linux
      - libblkid.so
      - libudev
      - hwdata
      - kbd
    provides:
      - udev=251.2
  - pkgname: libudev
    depends:
      - gcc-libs
    provides:
      - libudev.so
  - pkgname: esysusers
    groups:
      - base-devel
    depends:
      - gcc-libs
      - libxcrypt
  - pkgname: etmpfiles
    groups:
      - base-devel
    depends:
      - acl
      - libacl.so
      - libcap
      - libcap.so
version: 251.2-2
files:
  - udev-251.2-2-x86_64.pkg.tar.zst
  - libudev-251.2-2-x86_64.pkg.tar.zst
  - esysusers-251.2-2-x86_64.pkg.tar.zst
  - etmpfiles-251.2-2-x86_64.pkg.tar.zst
debug:
```
The yaml representation doesn't contain any functions so far, and thus
there is no separate makepkg implementation, but I have been considering
it for our CI for quite a while. Our CI gets all necessary information
from the yaml.
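As a rough illustration of how a CI job could consume such a document,
the sketch below derives the expected package file names from the
metadata. It assumes the yaml has already been parsed into a Python dict
(the standard library has no yaml parser, so the dict is written out by
hand here), that `version`, `pkgbase` and `packages` are top-level keys,
and that `expected_files` is a hypothetical helper, not part of any
existing tool.

```python
# Hand-written stand-in for the parsed yaml; only the keys used below are kept.
UDEV = {
    "pkgbase": {"name": "udev", "arch": ["x86_64"]},
    "packages": [
        {"pkgname": "udev"},
        {"pkgname": "libudev"},
        {"pkgname": "esysusers"},
        {"pkgname": "etmpfiles"},
    ],
    "version": "251.2-2",
}


def expected_files(doc: dict, suffix: str = ".pkg.tar.zst") -> list[str]:
    """Build '<pkgname>-<version>-<arch><suffix>' for every split package."""
    version = doc["version"]
    return [
        f"{pkg['pkgname']}-{version}-{arch}{suffix}"
        for arch in doc["pkgbase"]["arch"]
        for pkg in doc["packages"]
    ]
```

A CI stage could compare this list with the files actually produced by
the build before handing them to the repo stage.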
In theory, build jobs could use PKGBUILD dependencies and do queue
checks for example before building a given package.
This is the PKGBUILD side of things, thus better implemented upstream
pacman.
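A queue check of that kind could look roughly like the following sketch,
which orders a set of pkgbases so that dependencies are built first. It
uses `graphlib.TopologicalSorter` from the Python standard library
(Python 3.9+). The `build_order` helper and the contents of `QUEUE` are
hypothetical; the dict layout follows the yaml example above, and only
pkgnames (not `provides` entries such as `libkmod.so`) are resolved.

```python
from graphlib import TopologicalSorter

# Hypothetical build queue: pkgbase name -> parsed yaml document.
QUEUE = {
    "udev": {
        "pkgbase": {"makedepends": ["kmod", "hwdata", "meson"]},
        "packages": [{"pkgname": "udev", "depends": ["kmod", "hwdata"]}],
    },
    "kmod": {
        "pkgbase": {"makedepends": []},
        "packages": [{"pkgname": "kmod"}],
    },
    "hwdata": {
        "pkgbase": {"makedepends": []},
        "packages": [{"pkgname": "hwdata"}],
    },
}


def build_order(queue: dict[str, dict]) -> list[str]:
    """Return the pkgbases in an order where dependencies come first."""
    # Map every split package name to the pkgbase that produces it.
    provider = {
        pkg["pkgname"]: base
        for base, doc in queue.items()
        for pkg in doc["packages"]
    }
    graph = {}
    for base, doc in queue.items():
        deps = set(doc["pkgbase"].get("makedepends", []))
        for pkg in doc["packages"]:
            deps.update(pkg.get("depends", []))
        # Keep only dependencies that are themselves part of the queue.
        graph[base] = {
            provider[dep]
            for dep in deps
            if dep in provider and provider[dep] != base
        }
    return list(TopologicalSorter(graph).static_order())
```

Dependencies that are not in the queue (here `meson`) are ignored, and
`static_order()` raises `graphlib.CycleError` if the queue contains a
dependency cycle.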
In my humble opinion, addressing the repod first is the wrong end to
start, if you want a CI based build system.
If everything was in yaml, repod could simply directly handle these yaml
files and tar them as db, very streamlined approach.
In its current state, repod would only be of limited use for us, since
we don't add packages asynchronously to the repo.
They are added on build success by our CI via a repo-add/links-add
wrapper called in a stage.
I hope it doesn't sound all too crazy. It's a pretty radical and
work-intensive idea, but that would be my vision for a complete set of
build and repo management tools that are easy to connect to a CI.
An async repo management tool could be built on top of such a structure.
Kind regards
Artoo
Highlights:
* A Parabola GNU/Linux-libre member joined to discuss collaboration
* A list of relevant Arch derivatives has been made; they will be contacted soon
* Work towards the 0.1.0 milestone is underway
Rendered minutes:
https://md.archlinux.org/s/T8plrhNkI
Minutes in raw markdown:
# 2022-06-22 repod meeting
Date: 2022-06-22T17:00:00Z - 18:21:00Z
Location: Jitsi
Scribe: Brett (ainola/brett)
## Attendees
* artafinde
* brett (ainola)
* dvzrv (David)
* heftig (Jan)
* oaken-source (Andreas, from Parabola GNU/Linux-libre)
## Agenda
### Contacting Arch derivatives for collaboration on repod
Derivatives that host their own package repository:
- [ ] ArchBang
- [X] [Artix Linux](https://wiki.artixlinux.org/Main/Repositories#Stable)
- [X] [Arch Linux 32](https://archlinux32.org/packages/)
- [X] [Arch Strike](https://archstrike.org/wiki/repositories)
- [X] [ArchLabs](https://bitbucket.org/archlabslinux/archlabs_repo/src/master/x86_…
- [X] [BlackArch](https://www.blackarch.org/downloads.html#install-repo)
- [ ] [ChimeraOS](https://chimeraos.org/faq#is-chimeraos-arch)
- [ ] EndeavourOS ([They just use Arch packages](https://endeavouros-team.github.io/EndeavourOS-Development/))
- [X] Frugalware Linux ([example package](https://www.frugalware.org/packages/3269))
- [X] [Garuda Linux](https://gitlab.com/garuda-linux/packages)
- [X] [Hyperbola GNU/Linux-libre](https://www.hyperbola.info/packages/)
- [X] [InstantOS](https://instantos.io/faq#does-it-use-its-own-repos)
- [X] [KaOS](https://kaosx.us/about/based/)
- [X] [LinHES](http://linhes.org/projects/linhes/wiki/Build_a_LinHES_Package)
- [X] Manjaro
- [X] [Parabola GNU/Linux-libre](https://www.parabola.nu/packages/?sort=&repo=Libre)
  - oaken-source: All packages are currently built by hand and
    releasing is all manual. For a small team, it's been usable. More
    architectures are now needed, which makes the process more painful.
    Hopefully repod can make this work better. Keeping up-to-date is
    not something they can feasibly do at the moment.
  - Most packages come from Arch but they do build a good number of
    packages themselves.
- [X] [SteamOS](https://steamdeck-packages.steamos.cloud/archlinux-mirror/)
Distributions that host their own packages tend to have a small
repository that they host alongside the main Arch repositories (Arch
Linux ARM/Arch Linux 32 seem to be the only exceptions, since they build
the entire repo for a specific architecture).
Brett will reach out to each of the marked distributions.
### Milestones
[0.1.0 milestone in progress](https://gitlab.archlinux.org/archlinux/repod/-/milestones/6#tab-i…,
which includes the work thus far, such as:
* Package file parsing
* JSON-based management/parsing tools
* [Web-based documentation](https://repod.readthedocs.io/)
In the near-term, we need to solve [#67](https://gitlab.archlinux.org/archlinux/repod/-/issues/67)
and get routine tests running against new versions of all Poetry
dependencies.
### Parabola GNU/Linux-libre representation
oaken-source saw the previous meeting minutes and thought to join.
Parabola is in need of fixing up their build pipelines, but that's not in
the scope of repod currently (Arch's priorities are repod first, build
pipelines after).
Parabola uses dbscripts to manage their repositories and has even
diverged a little from upstream.
Parabola and Arch potentially duplicate some efforts:
[parabola-repolint](https://github.com/oaken-source/parabola-repolint)
(also in Python) may have some overlap with current efforts and may
provide some opportunity for collaboration.
Formatted minutes can be viewed here:
https://md.archlinux.org/s/HDtd8_C0z
Raw markdown (Still waiting for an angel to come down and tell me about
markdown conversion to plain text with inline url markers!):
# 2022-06-08 repod meeting
Date: 2022-06-08T17:00:00Z - 19:20:00Z
Location: Jitsi
Scribe: Brett (ainola/brett)
## Attendees
* artafinde
* brett
* dvzrv
## Agenda
### macOS build environment
* pyalpm needed but cannot install on macOS
* makepkg fails to build in brew due to fakeroot failing to build
### Refactoring models
Huge refactoring for models and some helper functions
([#39](https://gitlab.archlinux.org/archlinux/repod/-/merge_requests/39))
* Lots of regexes added to models for validation. Accompanying tests
added.
* Static analysis isn't necessary because of fixtures
* Documentation of testing in CONTRIBUTION.md
* Defaults are set for models
* PKGINFO schema/parser is the last remaining item
([#53](https://gitlab.archlinux.org/archlinux/repod/-/merge_requests/39))
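As an illustration of what regex-based field validation can look like,
here is a minimal sketch for package names. The pattern approximates the
rule that a pkgname may only contain alphanumerics and `@ . _ + -` and
must not start with a hyphen or a dot; it is an assumption for
illustration and not the regex used in repod's models. `is_valid_pkgname`
is a hypothetical helper.

```python
import re

# ASCII-only approximation: allowed characters, no leading hyphen or dot.
PKGNAME = re.compile(r"(?![-.])[A-Za-z0-9@._+-]+")


def is_valid_pkgname(name: str) -> bool:
    """Return True if the whole string matches the package name pattern."""
    return PKGNAME.fullmatch(name) is not None
```

Using `fullmatch` rather than `match` ensures that trailing characters,
including a trailing newline, cause the check to fail.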
### Python pinning
* This morning saw master branches fail building without any changes due
to Python 3.10.5 making it to `[core]`.
* Python version pinning not desirable/easy, issues will be handled as
they come.
### python-magic
New dep added for python-magic; the version of python-magic in the Arch
repos does not exist on PyPI, making installation confusing. The Arch
package even points to [file(1)'s homepage](https://darwinsys.com/file/).
### Distro agnosticism
A few members were under the mistaken impression that `repod` was
intending to become a general-purpose project that is distro-agnostic,
i.e. `apt`, `dnf`, `pacman`, et al could be serviced with `repod`. This
is not the case!
* `repod`'s Python-based runtime should be able to run on any
distribution
* `repod` is only designed to service pacman-based packages
* All Arch Linux-based derivatives (i.e. any distro that utilizes
pacman) will be able to use this software.
It's likely that the confusion came from an off-hand comment some weeks
ago regarding distro-agnosticism. The project README is pretty clear:
>This project contains tooling to maintain binary package repositories
>for Linux distributions using the pacman package manager.
It's *possible* for other package managers to add their support here but
it's not a goal of the project.
### Funding
Prototype Fund sadly did not work out.
However, two applications for [NLnet](https://nlnet.nl/) have also been
sent in.
### Announcing repod to forks
Support for multiple architectures is an important future goal for Arch
Linux, and we should strive to e.g. incorporate external ARM efforts
into Arch Linux.
Other [Arch based distributions](https://wiki.archlinux.org/title/Arch-based_distributions)
might also be interested in collaborating with repod since they may end
up deploying their own copy of it.
### Action items
* Figure out how [python-magic](https://pypi.org/project/python-magic/)
  is to be packaged so it doesn't conflict with the current package
  and/or make repod compatible with both file's python-magic and PyPI's
  python-magic
* Research all of the notable Arch-based distributions (distrowatch
  seems a decent-enough overview of popularity) and contact them about
  collaboration with repod:
  * Which ones have their own build automation vs how many directly
    use Arch's repositories? (i.e. which distributions would be
    interested in collaborating with us on repod to make all of our
    lives easier?)
  * Draft up communications for these distributions: Our goal is to
    introduce repod/our packaging vision and to solicit their
    feedback/help on the project itself.