On 2022-01-31 23:55:07 (+1000), Allan McRae via arch-dev-public wrote:
Any chance this can be recorded? It will be at 4am in my timezone.
I think that can certainly be arranged!
I am mainly interested in what problem this is solving. From what I can tell, our current workflow is package->db, and this goes package->json->db. What is the advantage of the extra step? Will this be covered by your talk?
Without going into too much detail: It allows us to import current package repository databases, retain their entire state in a decomposed directory structure (e.g. in a git repository), and reproduce the package repository databases from this state as well.

This is somewhat similar to our current "package sources and binary package location" state approach in svn, with the difference that arch-repo-management would allow the *entire state* of a binary package repository (default database and files database) to be described in a unified decomposed directory structure, and would provide transparent, validated builds or rebuilds of binary package databases from that state.

When comparing the svn and git approaches, the fundamental difference is this: with svn we track both the package sources *and* their "location" state in the repositories, while repo-add/repo-remove is used to add/remove things on the fly to the package repository databases. With a future git-based setup we would have a package source repository per pkgbase and a management repository for arch-repo-management, which tracks the state of the repositories transparently and should allow for atomic operations on the package repository databases (dbscripts, for example, may fail halfway through and leave repositories in a somewhat undefined state when "moving" package files from a to b).
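To make the package->json->db round trip a bit more concrete, here is a minimal sketch of the idea. It is not the arch-repo-management code: the one-JSON-file-per-pkgbase layout, the field names (`base`, `name`, `version`), the function names and the example entries are all assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Made-up example entries; field names and versions are illustrative only.
EXAMPLE_PACKAGES = [
    {"base": "linux", "name": "linux-headers", "version": "1.0-1"},
    {"base": "pacman", "name": "pacman", "version": "2.0-1"},
    {"base": "linux", "name": "linux", "version": "1.0-1"},
]


def decompose(packages: list[dict], state_dir: Path) -> None:
    """Write one JSON file per pkgbase (hypothetical layout)."""
    by_base: dict[str, list[dict]] = {}
    for pkg in packages:
        by_base.setdefault(pkg["base"], []).append(pkg)
    for base, pkgs in by_base.items():
        # Sorted, stable output keeps diffs in a git repository small.
        ordered = sorted(pkgs, key=lambda p: p["name"])
        text = json.dumps(ordered, indent=2, sort_keys=True) + "\n"
        (state_dir / f"{base}.json").write_text(text)


def rebuild(state_dir: Path) -> list[dict]:
    """Read the decomposed state back into a flat, name-sorted package list."""
    packages: list[dict] = []
    for path in sorted(state_dir.glob("*.json")):
        packages.extend(json.loads(path.read_text()))
    return sorted(packages, key=lambda p: p["name"])


with tempfile.TemporaryDirectory() as tmp:
    decompose(EXAMPLE_PACKAGES, Path(tmp))
    state_files = sorted(p.name for p in Path(tmp).iterdir())
    restored = rebuild(Path(tmp))
```

In this sketch the directory of JSON files is the tracked state, and the flat list stands in for the database that would be generated from it; a real implementation would validate the data and write the actual database format.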
Also a couple of quick comments:
1) might as well drop putting the signature into the package database - pacman will not add these by default from next release as the signatures are downloaded alongside the package. This reduced db size substantially.
Yes, that is an open topic in the implementation (this was decided after I implemented it, or rather I only learned of the change after I had implemented this attribute). For me this removal raises the following question, which has been bothering me a bit, and maybe you have an idea how to solve it: How would you allow for filtering packages in a repository by a particular PGP key? We have had quite a few rebuilds due to invalid packager keys or re-signed packager keys. It would be great to keep this in mind, as I believe that e.g. querying all PGP signature files of a repository to do so is not very feasible. Maybe this can still live on in the proposed management repository as otherwise unused "metadata" (e.g. PGP ID) of a given pkgbase, which is populated upon import of a given package or set of packages.
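As a rough illustration of what I mean by keeping the PGP ID as metadata: assuming the management repository stored a key ID per pkgbase at import time, the filtering would become a simple lookup. The `pgp_key_id` field, the function name and the key IDs below are hypothetical placeholders, not something the current implementation provides.

```python
# Hypothetical per-pkgbase metadata as it might be recorded on import.
# The key IDs are made-up placeholders, not real packager keys.
EXAMPLE_METADATA = {
    "linux": {"pgp_key_id": "AAAAAAAAAAAAAAAA"},
    "pacman": {"pgp_key_id": "BBBBBBBBBBBBBBBB"},
    "zstd": {"pgp_key_id": "aaaaaaaaaaaaaaaa"},
    "legacy-pkg": {},  # imported without any recorded key ID
}


def pkgbases_signed_by(metadata: dict[str, dict], key_id: str) -> list[str]:
    """Return the sorted pkgbases whose recorded key ID matches key_id.

    The comparison ignores case; entries without a key ID never match.
    """
    wanted = key_id.lower()
    return sorted(
        base
        for base, meta in metadata.items()
        if meta.get("pgp_key_id", "").lower() == wanted
    )


needs_rebuild = pkgbases_signed_by(EXAMPLE_METADATA, "AAAAAAAAAAAAAAAA")
```

With something like this, finding everything that needs a rebuild after a packager key becomes invalid would not require touching any signature files.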
2) I see databases hard coded as gz. I think we should investigate switching to zstd - we did not switch to xz due to performance compared to gz, but I think zstd does not have that issue.
That is an implementation detail and can be changed/extended (it is just not exposed to the outside currently). At the time of writing we are using .gz which is why I used it that way to be able to test against live databases.

Best,
David

--
https://sleepmap.de