[arch-dev-public] arch-repo-management walkthrough 2022-02-02 19:00 CET (UTC+01:00)
Hi all,
given recent topics for build automation and work on internal projects I would like to announce a code walkthrough for arch-repo-management [1].
I would like to give an overview of the scope of the project, its current features and which features I would like to see implemented (some of which are still in a discussion phase). I will try to add a few more tickets beforehand to track open features and to outline them better. This way I hope that it becomes easier for interested people to follow up on them.
The meeting will take place on Jitsi https://meet.jit.si/20220202-arch-repo-management on 2022-02-02 starting around 19:00 CET (UTC+01:00).
Any changes to the date and/or location will be announced in this mail thread.
Best, David
[1] https://gitlab.archlinux.org/archlinux/arch-repo-management/
-- https://sleepmap.de
On 31/1/22 23:23, David Runge via arch-dev-public wrote:
Any chance this can be recorded? It will be at 4am in my timezone.
I am mainly interested in what problem this is solving. From what I can tell, our current workflow is package->db, and this goes package->json->db. What is the advantage of the extra step? Will this be covered by your talk?
Also a couple of quick comments:
1) might as well drop putting the signature into the package database - pacman will not add these by default from the next release, as the signatures are downloaded alongside the package. This reduced db size substantially.
2) I see databases hard coded as gz. I think we should investigate switching to zstd - we did not switch to xz due to its performance compared to gz, but I think zstd does not have that issue.
Allan
On Mon, Jan 31, 2022 at 11:55:07PM +1000, Allan McRae via arch-dev-public wrote:
Any chance this can be recorded? It will be at 4am in my timezone?
I am interested in mainly what problem this is solving. From what I can tell, our current workflow is package->db, and this goes package->json->db. What is the advantage of the extra step? Will this be covered by your talk?
It would be in python, which is more maintainable than the juggle of bash we currently have. JSON would also provide us with a machine-readable format, which is useful for extending tooling in the future.
The JSON document would also replace the repository state that is kept in our svn mono-repository, so it solves a few more issues with the git migration.
https://wiki.archlinux.org/title/User:Foxboron/GitMigration
You can see a plaintext, space-separated version of this in this mock-up, which we decided on a couple of years ago when we discussed git.
-- Morten Linderud PGP: 9C02FF419FECBE16
On 2022-01-31 23:55:07 (+1000), Allan McRae via arch-dev-public wrote:
Any chance this can be recorded? It will be at 4am in my timezone?
I think that can certainly be arranged!
I am interested in mainly what problem this is solving. From what I can tell, our current workflow is package->db, and this goes package->json->db. What is the advantage of the extra step? Will this be covered by your talk?
Without going into too much detail: It allows us to import the current package repository databases, retain their entire state in a decomposed directory structure (e.g. in a git repository), and reproduce the package repository databases from that state.
This is somewhat similar to our current "package sources and binary package location" state approach in svn. The difference is that arch-repo-management would allow the *entire state* of a binary package repository (default database and files database) to be described in a unified decomposed directory structure, and would provide transparent, validated builds or rebuilds of binary package databases from that state.
When comparing the svn and git approaches, the fundamental difference is that with svn we track both the package sources *and* their "location" state in the repositories, while repo-add/repo-remove is used to add or remove things on the fly in the package repository databases. With a future git-based setup we would have one package source repository per pkgbase, plus a management repository for arch-repo-management that tracks the state of the repositories transparently and should allow for atomic operations on the package repository databases (e.g. dbscripts may fail halfway through and leave repositories in a somewhat undefined state, for instance when "moving" package files from a to b).
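To illustrate the package->json->db idea described above, here is a minimal sketch in Python. It is not taken from arch-repo-management: the JSON layout, the field names and the helper functions are all hypothetical, and a real `desc` entry carries many more fields (sizes, checksums, dependencies and so on). It only shows how a database could be regenerated from a decomposed, version-controlled state.

```python
import io
import json
import tarfile

# Hypothetical per-pkgbase JSON state; the field names are illustrative only
# and do not reflect the actual arch-repo-management schema.
STATE = json.loads("""
{
  "base": "example",
  "version": "1.0-1",
  "packages": [
    {"name": "example", "filename": "example-1.0-1-any.pkg.tar.zst", "arch": "any"},
    {"name": "example-docs", "filename": "example-docs-1.0-1-any.pkg.tar.zst", "arch": "any"}
  ]
}
""")


def render_desc(base: str, version: str, package: dict) -> str:
    """Render a pacman-style desc entry: %KEY% header, value, blank line."""
    fields = {
        "FILENAME": package["filename"],
        "NAME": package["name"],
        "BASE": base,
        "VERSION": version,
        "ARCH": package["arch"],
    }
    return "".join(f"%{key}%\n{value}\n\n" for key, value in fields.items())


def build_database(state: dict) -> bytes:
    """Build a gzip-compressed tar database from the decomposed state."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as database:
        for package in state["packages"]:
            data = render_desc(state["base"], state["version"], package).encode()
            info = tarfile.TarInfo(f"{package['name']}-{state['version']}/desc")
            info.size = len(data)
            database.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_entries(database: bytes) -> list[str]:
    """Return the member names of a database built by build_database()."""
    with tarfile.open(fileobj=io.BytesIO(database), mode="r:gz") as archive:
        return archive.getnames()
```

Because the database is produced in one pass from the tracked state, a failed run leaves the previous database untouched, which is the kind of atomicity the paragraph above is after.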
Also a couple of quick comments:
1) might as well drop putting the signature into the package database - pacman will not add these by default from next release as the signatures are downloaded alongside the package. This reduced db size substantially.
Yes, that is an open topic in the implementation (this was decided after I implemented it; I only got to know of that change after I implemented this attribute).
For me this removal raises the following question, which has been bothering me a bit, and maybe you have an idea how to solve it: How would you allow for filtering packages in a repository by a particular PGP key? We have had quite a few rebuilds due to invalid packager keys or re-signed packager keys. It would be great to keep this in mind, as I believe that e.g. querying all PGP signature files of a repository to do so is not very feasible. Maybe this can still live on in the proposed management repository as unused "metadata" (e.g. PGP ID) of a given pkgbase, which is populated upon import of a given package or set of packages.
2) I see databases hard coded as gz. I think we should investigate switching to zstd - we did not switch to xz due to performance compared to gz, but I think zstd does not have that issue.
That is an implementation detail and can be changed/extended (it is just not exposed to the outside currently). At the time of writing we are using .gz which is why I used it that way to be able to test against live databases.
Best, David
-- https://sleepmap.de
On 1/2/22 00:36, David Runge wrote: <snip>
When looking at svn vs. git approaches the fundamental difference is, that with svn we track both the package sources *and* their "location" state in the repositories while repo-add/repo-remove is used to add/remove things on the fly to the package repository databases. While with a future git based setup we would have a package source repository per pkgbase and a management repository for arch-repo-management which tracks the state of the repositories transparently and should allow for atomic operations towards the package repository databases (e.g. dbscripts may fail halfway through and leave repositories in a bit of an undefined state when e.g. "moving" package files from a to b).
Thanks - I finally understand the point of this!
Also a couple of quick comments:
1) might as well drop putting the signature into the package database - pacman will not add these by default from next release as the signatures are downloaded alongside the package. This reduced db size substantially.
Yes, that is an open topic in the implementation (this was decided after I implemented it/ I only got to know of that change after I implemented this attribute).
For me this removal raises the following question which has been bothering me a bit and maybe you have an idea how to solve it: How would you allow for filtering packages in a repository for a particular PGP key? We have had quite a few rebuilds due to invalid packager keys or resigning packager keys. It would be great to have this in mind, as I believe that e.g. querying all PGP signature files of a repository to do so is not very feasible, but maybe this can still live on in the proposed management repository as unused "metadata" (e.g. PGP ID) of a given pkgbase which is populated upon import of a given package/ set of packages.
I assumed we were just grepping packager, because I forgot pacman can output the signing keyid from a package signature!
I guess you can store the signature in the json files that are stored in VCS. Maybe you want to do the keyid extraction from the signature when adding it to the json file to facilitate easy querying? There is proto code in RFC 4880 for doing this (this is what I used for pacman).
This also fits with the package state repository being the source of truth and not the pacman database.
Allan
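A rough sketch of the keyid extraction suggested above, written in Python against the packet layout in RFC 4880. It is neither the pacman code nor anything from arch-repo-management. It assumes a binary (non-armored) detached signature holding a version 4 signature packet with an Issuer subpacket (type 16), and the function names are made up for this example.

```python
def _signature_body(raw: bytes) -> bytes:
    """Return the body of the leading packet, which must be a signature (tag 2)."""
    header = raw[0]
    if not header & 0x80:
        raise ValueError("not an OpenPGP packet")
    if header & 0x40:  # new packet format
        tag = header & 0x3F
        first = raw[1]
        if first < 192:
            length, start = first, 2
        elif first < 224:
            length, start = ((first - 192) << 8) + raw[2] + 192, 3
        elif first == 255:
            length, start = int.from_bytes(raw[2:6], "big"), 6
        else:
            raise ValueError("partial body lengths are not handled")
    else:  # old packet format
        tag = (header >> 2) & 0x0F
        length_type = header & 0x03
        if length_type == 3:
            raise ValueError("indeterminate lengths are not handled")
        size = 1 << length_type  # 1, 2 or 4 length octets
        length, start = int.from_bytes(raw[1:1 + size], "big"), 1 + size
    if tag != 2:
        raise ValueError("not a signature packet")
    return raw[start:start + length]


def _subpackets(area: bytes):
    """Yield (type, body) pairs from a signature subpacket area."""
    pos = 0
    while pos < len(area):
        first = area[pos]
        if first < 192:
            length, pos = first, pos + 1
        elif first < 255:
            length, pos = ((first - 192) << 8) + area[pos + 1] + 192, pos + 2
        else:
            length, pos = int.from_bytes(area[pos + 1:pos + 5], "big"), pos + 5
        if length == 0:
            raise ValueError("malformed subpacket")
        # The length covers the type octet plus the subpacket body.
        yield area[pos] & 0x7F, area[pos + 1:pos + length]
        pos += length


def issuer_key_id(raw: bytes) -> str:
    """Extract the 8-octet issuer key ID from a v4 signature as upper-case hex."""
    body = _signature_body(raw)
    if body[0] != 4:
        raise ValueError("only version 4 signatures are handled")
    # Octets 1-3 are signature type, public-key algorithm and hash algorithm.
    hashed_len = int.from_bytes(body[4:6], "big")
    hashed = body[6:6 + hashed_len]
    offset = 6 + hashed_len
    unhashed_len = int.from_bytes(body[offset:offset + 2], "big")
    unhashed = body[offset + 2:offset + 2 + unhashed_len]
    for area in (hashed, unhashed):
        for subpacket_type, data in _subpackets(area):
            if subpacket_type == 16:  # Issuer
                return data.hex().upper()
    raise ValueError("no issuer subpacket found")
```

The extracted ID could then be written into the JSON state at import time, so that filtering packages by signing key becomes a plain lookup in the management repository rather than a scan over all signature files.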
On 2022-01-31 23:55, Allan McRae via arch-dev-public wrote:
Any chance this can be recorded? It will be at 4am in my timezone?
Also happy to take minutes - they could be converted into documentation afterwards.
On 2022-01-31 14:23:10 (+0100), David Runge via arch-dev-public wrote:
The meeting will take place on Jitsi https://meet.jit.si/20220202-arch-repo-management
on 2022-02-02 starting around 19:00 CET (UTC+01:00).
Any changes to the date and/or location will be announced in this mail thread.
For simplicity I will attach an .ics file which can be consumed by calendar software.
Best, David
-- https://sleepmap.de
On 2022-01-31 14:23:10 (+0100), David Runge via arch-dev-public wrote:
given recent topics for build automation and work on internal projects I would like to announce a code walkthrough for arch-repo-management [1].
I would like to give an overview of the scope of the project, its current features and which features I would like to see implemented (some of which are still in a discussion phase). I will try to add a few more tickets beforehand to track open features and to outline them better. This way I hope that it becomes easier for interested people to follow up on them.
Thanks to everyone who attended and participated!
We do have a set of meeting notes for anyone who was not able to attend and wants to read up on things (video link included): https://md.archlinux.org/NIPYO4ZNSkaTneEajF9G1g?view
I will send a follow-up mail for the next meeting to the arch-projects mailing list [1], as it is less restricted in who can interact there. We will try to do bi-weekly meetings now! :)
Best, David
[1] https://lists.archlinux.org/listinfo/arch-projects
-- https://sleepmap.de
participants (4): Allan McRae, Brett Cornwall, David Runge, Morten Linderud