On 07/08/2018 09:14 PM, Luke Shumaker wrote:
From: Luke Shumaker <lukeshu@parabola.nu>
In a patchset that I recently submitted, Eli was concerned that I was parsing .db files with bsdtar+awk, when the format of .db files isn't "public"; the only guarantees made about it are that libalpm can parse it.
https://lists.archlinux.org/pipermail/arch-projects/2018-June/004932.html
I wasn't too concerned, because `ftpdir-cleanup` and `sourceballs` already parse the .db files in the same way. Nonetheless, I think Eli is right: we shouldn't be parsing these files ourselves.
So, add a `dbquery` function that uses pyalpm to parse the .db files:
- It takes two Python 3 expressions as arguments: (1) one that returns a bool deciding whether we want to print information on a package, and (2) one that returns the string to print for that package.
Currently, all callers use "True" for the decider expression, as ftpdir-cleanup and sourceballs operate on *every* package. However, I'm including a way to filter packages because I want to parse .db files in other places too.
- libalpm doesn't offer an easy way to say "parse this DB file for me"; instead, we must construct a configuration that has a syncdb pointing to that file, and then have libalpm sync it into a temporary directory.
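
For illustration only, the two points above might be sketched roughly as follows. This is not the code from the patch: the names `syncdb_location`, `load_packages`, `Pkg`, and the `pkg` variable exposed to the expressions are all hypothetical, the signature level of 0 is an assumption, and the pyalpm part is untested against any particular pyalpm version. It also assumes the file is literally named `<repo>.db`, since that is the name libalpm requests from a server.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

# Plain stand-in for the handful of package fields used here (hypothetical).
Pkg = namedtuple("Pkg", "name version filename licenses")


def syncdb_location(db_file):
    """Return (repo name, file:// URL of the directory holding db_file)."""
    path = Path(db_file).absolute()  # absolute, but symlinks are not resolved
    return path.name.split(".")[0], path.parent.as_uri()


def load_packages(db_file):
    """Have libalpm sync db_file into a throwaway dbpath and list its packages."""
    import pyalpm  # lazy import: only this function needs pyalpm

    name, server = syncdb_location(db_file)
    with tempfile.TemporaryDirectory() as dbpath:
        handle = pyalpm.Handle("/", dbpath)
        db = handle.register_syncdb(name, 0)  # 0: no signature checks (assumption)
        db.servers = [server]
        db.update(False)
        # Copy plain values out before the temporary dbpath is removed.
        return [Pkg(p.name, p.version, p.filename, list(p.licenses))
                for p in db.pkgcache]


def dbquery(packages, decider, formatter):
    """Yield eval(formatter) for every package where eval(decider) is truthy.

    Both arguments are Python expression strings that may refer to `pkg`.
    eval() runs arbitrary code, so only trusted callers should supply them.
    """
    for pkg in packages:
        if eval(decider, {"pkg": pkg}):
            yield str(eval(formatter, {"pkg": pkg}))


if __name__ == "__main__":
    # Made-up sample data, so the query part can be tried without pyalpm.
    sample = [Pkg("foo", "1.0-1", "foo-1.0-1-any.pkg.tar.xz", ["GPL"])]
    for line in dbquery(sample, "True", "pkg.filename"):
        print(line)
```

With a real database, something like `dbquery(load_packages("/path/to/repo.db"), "True", "pkg.filename")` would stand in for the sample list; the path is a placeholder.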
As a final note, when rewriting the bit of sourceballs to use dbquery instead of AWK, I realized that it does not correctly handle licenses that have a space in them (as of 2018-07-07, there are 67 packages in the Arch repos that have a license containing a space). I did not fix this bug; I merely translated it from AWK to Python, as the program would also need to be adjusted elsewhere.

Keeping in mind that the ones we're looking for are a whitelist of strictly-defined license types... I think those are all ad-hoc custom licenses, none of which we're interested in for the primary sourceballs deployment.

What's wrong with expac?

    expac --config ${dbscripts_root}/pacman-community.conf -S '%f'

expac is not only super elegant, there are pending patches to provide it in pacman 6 as part of the core project. This is what I'm waiting for, actually.

I see no reason to add an external dependency on both python and pyalpm in order to run a small python program that evals its arguments to inject database queries, when a tool with a simple API can do the same and will eventually be guaranteed to be everywhere pacman itself is. (Let's ignore, for a moment, the defunct integrity checks service, which is written in python but not pyalpm. pyalpm is not currently installed on the dbscripts server.)

-- 
Eli Schwartz
Bug Wrangler and Trusted User