On 2017-05-09 22:54 +1000 Allan McRae wrote:
I am looking for ideas here. Please brainstorm to your heart's content.
Ok :)
So two points up for discussion:
1) Sync repository layout? I don't see any point in leaving the tar-based format, as reading of sync databases is not a bottleneck. (The local db format can be a bottleneck, but that is a separate discussion...)
Do we split the information in .db out of .files and add a .full db with complete information? Then any .src db could follow suit and just have source package information. How do we get around the out-of-sync issue (e.g., a package is removed from .db, but we have an old .files database with it)? Do we add timestamps and print a warning on -F operations when the two are out of sync?
Add a timestamp inside each database (*.db, *.files, *.src). When pacman downloads a database, instead of saving it as <repo>.<ext> and squashing the previous database, save it as <repo>-<timestamp>.<ext>. Each refresh operation (pacman -Sy, pacman -Fy) is associated with a particular database (*.db and *.files, respectively). Create an untimestamped symlink to that database, e.g.

  $ pacman -Sy
  ...
  # retrieve <repo>.db and save as <repo>-<timestamp_1>.db
  # ln -s <repo>-<timestamp_1>.db <repo>.db

  $ pacman -Fy
  ...
  # retrieve <repo>.db and save as <repo>-<timestamp_2>.db
  # retrieve <repo>.files and save as <repo>-<timestamp_2>.files
  # ln -s <repo>-<timestamp_2>.files <repo>.files

  # something similar for *.src files

For operations that only involve the current <repo>.db files, no change is needed for loading the database. For loading <repo>.files, you will need to dereference <repo>.files first, grab <timestamp_2> from <repo>-<timestamp_2>.files in the example above, and then use it to load <repo>-<timestamp_2>.db instead of <repo>.db. Same method for *.src files.

For cleanup of the timestamped files, collect the valid timestamps from the untimestamped symlinks and then remove anything that doesn't match them. This should probably be done with each database refresh. Maybe you can use the same function that you use to clean up the package cache with -Sc while leaving installed packages.

Obviously there will be some redundancy in the up to 3 copies of <repo>-<timestamp>.db but I think that's better than e.g. breaking pkgfile searches after an upgrade.

With this approach you could also download the latest version of the sync databases as <repo>-<timestamp>.db without symlinking <repo>.db to it, and then use that to query upgradable packages and other info from the mirror.

For propagating the database to the servers, nothing changes. Whenever the database is updated, generate <repo>.db, <repo>.files, <repo>.src and whatever else at the same time with the same internal timestamp and then just push them out as usual.
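
To make the symlink bookkeeping concrete, here is a rough Python sketch of the scheme described above. It is only an illustration under assumptions, not pacman or libalpm code: the function names (store, activate, matching_db, cleanup) are made up, the timestamp is assumed to be a plain integer such as a Unix epoch, and the files are assumed to sit in a single sync directory.

```python
import os
import re
from pathlib import Path

# Assumed naming scheme from the proposal: <repo>-<timestamp>.<ext>
# (an integer timestamp is an assumption made for this sketch).
STAMPED = re.compile(r"^(?P<repo>.+)-(?P<ts>\d+)\.(?P<ext>db|files|src)$")


def store(sync_dir: Path, repo: str, ext: str, timestamp: int, data: bytes) -> Path:
    """Save a downloaded database as <repo>-<timestamp>.<ext>; no symlink is touched."""
    path = sync_dir / f"{repo}-{timestamp}.{ext}"
    path.write_bytes(data)
    return path


def activate(sync_dir: Path, repo: str, ext: str, timestamp: int) -> None:
    """Point the untimestamped symlink <repo>.<ext> at the timestamped file."""
    link = sync_dir / f"{repo}.{ext}"
    tmp = sync_dir / f".{repo}.{ext}.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(f"{repo}-{timestamp}.{ext}")  # relative target
    os.replace(tmp, link)  # rename over the old link in one step


def matching_db(sync_dir: Path, repo: str, ext: str) -> Path:
    """Dereference <repo>.<ext> and return the <repo>-<timestamp>.db of the same refresh."""
    target = os.readlink(sync_dir / f"{repo}.{ext}")
    match = STAMPED.match(os.path.basename(target))
    if match is None:
        raise ValueError(f"{repo}.{ext} does not point at a timestamped database")
    return sync_dir / f"{repo}-{match['ts']}.db"


def cleanup(sync_dir: Path) -> list:
    """Remove timestamped files whose (repo, timestamp) no symlink refers to."""
    keep = set()
    for entry in sync_dir.iterdir():
        if entry.is_symlink():
            match = STAMPED.match(os.path.basename(os.readlink(entry)))
            if match:
                keep.add((match["repo"], match["ts"]))
    removed = []
    for entry in sorted(sync_dir.iterdir()):
        if entry.is_symlink():
            continue
        match = STAMPED.match(entry.name)
        if match and (match["repo"], match["ts"]) not in keep:
            entry.unlink()
            removed.append(entry.name)
    return removed
```

In this sketch a -Fy refresh would call store() for both the .db and the .files download with the same timestamp and then activate() only the .files link, so the .db link from the last -Sy stays where it was. Calling store() without activate() corresponds to downloading the latest database just to query the mirror.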
2) Do we need a better (read "more easily maintainable") tool for handling database generation and updates? libalpm already can read in information from package files, so we could add libalpm/db_write.c with the database creation functions. Should we unify our repo format with our local database format which we already write?
Yes for unification, preferably in a standardized format (e.g. YAML). Having the functionality to read and write the files in libalpm would be useful for third-party tool developers.

On 2017-05-10 12:54 -0400 Dave Reisner wrote:
WRT replacing repo-add, I'd suggest we come up with the use cases we want to support, design an interface to meet them, and then come up with the implementation. Might be nice to start with the Arch Linux repository layout as an example that we'd want to support (pooled packages with symlinks into repo dirs).
What about using a relative subpath instead of a filename in the database? That would enable transparent freeform repo layouts (e.g. pooled packages without symlinks, package groups in different subdirs, etc.). You could also avoid the need for subdirectories by adding the architecture to the database filename, e.g. <repo>.<arch>.<ext>.

To simplify repo-add, you could include .SRCINFO directly to avoid parsing and reformatting/rewriting that metadata. Keep it as a separate file then add a new one (call it PKGINFO?) for information about the *.pkg.* file itself (build date, packager, signature, checksum, size, relative filepath, etc.). Add other files to contain related information (e.g. INSTALLINFO with install time, file list, install origin?). That way, each step copies existing files and adds a new one with the new info (repo-add: collect SRCINFO, add PKGINFO; install a package: copy SRCINFO and PKGINFO to local db, create INSTALLINFO; etc.).

A repo metadata file would also be required in the root directory with the repo timestamp for the timestamped databases described above. The file could also collect other metadata such as package providers and maybe replacements to speed up some operations.

Regards,
Xyne