[arch-projects] [dbscripts] [PATCH] Don't parse .db files ourselves; use pyalpm instead

Luke Shumaker lukeshu at lukeshu.com
Mon Jul 9 17:32:21 UTC 2018

On Sun, 08 Jul 2018 22:38:06 -0400,
Eli Schwartz wrote:
> On 07/08/2018 09:14 PM, Luke Shumaker wrote:
> > From: Luke Shumaker <lukeshu at parabola.nu>
> > 
> > In a patchset that I recently submitted, Eli was concerned that I was
> > parsing .db files with bsdtar+awk, when the format of .db files isn't
> > "public"; the only guarantees made about it are that libalpm can parse it.
> > 
> > https://lists.archlinux.org/pipermail/arch-projects/2018-June/004932.html
> > 
> > I wasn't too concerned, because `ftpdir-cleanup` and `sourceballs` already
> > parse the .db files in the same way.  Nonetheless, I think Eli is right: we
> > shouldn't be parsing these files ourselves.
> > 
> > So, add a `dbquery` function that uses pyalpm to parse the .db files:
> What's wrong with expac?
> expac --config ${dbscripts_root}/pacman-community.conf -S '%f'
> expac is not only super elegant, there's pending patches to provide it
> in pacman 6 as part of the core project. This is what I'm waiting for,
> actually.
> I see no reason to add an external dependency on both python and pyalpm,
> in order to run a small python program which evals its arguments in
> order to inject database queries, when a tool with a simple API can do
> the same and will eventually be guaranteed to be everywhere pacman
> itself is.

With the "True" filter that ftpdir-cleanup and sourceballs both use,
you're right; this could be done with expac.  But, with the context
that this patch exists to enable me to address the concern you had
with the other patchset:

AFAICT, with expac there's no way to do a query like:

    dbquery core x86_64 \
    	    "(pkg.base or pkg.name) == '$pkgbase'" \

Which is what most (all?) of the queries in the other patchset would
be.

(Drat, it seems that discussing this separately from the other
patchset won't work after all.)
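
To make the kind of query above concrete, here is a rough sketch of how a
filter expression could be evaluated against package records. This is not
the code from the patch: the `dbquery` signature, the output-expression
argument, and the stand-in package records are all assumptions made for
illustration. In the real tool the records would come from pyalpm rather
than being built by hand.

```python
from types import SimpleNamespace


def dbquery(pkgs, filter_expr, output_expr):
    """Yield eval(output_expr) for each package where eval(filter_expr) is truthy.

    Hypothetical sketch: both expressions are evaluated with the current
    record bound to the name `pkg`, mirroring the eval-based approach
    discussed in this thread.
    """
    for pkg in pkgs:
        env = {"pkg": pkg}
        if eval(filter_expr, {}, env):
            yield eval(output_expr, {}, env)


# Stand-in records (made-up data); a split package shares one base, while a
# non-split package is modelled here with base set to None.
pkgs = [
    SimpleNamespace(name="gcc", base="gcc", version="8.1.1-1"),
    SimpleNamespace(name="gcc-libs", base="gcc", version="8.1.1-1"),
    SimpleNamespace(name="bash", base=None, version="4.4.023-1"),
]

pkgbase = "gcc"
matches = list(dbquery(pkgs, f"(pkg.base or pkg.name) == '{pkgbase}'", "pkg.name"))
print(matches)
```

The `(pkg.base or pkg.name)` fallback is the part that a fixed format
string cannot express: it selects on the base when one is present and on
the name otherwise.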

> (Let's ignore for a moment, the defunct integrity checks service which
> is written in python, but not pyalpm. pyalpm is not currently installed
> on the dbscripts server ATM.)

Good call; check_packages.py is python2, so the pyalpm dep does add a new
dependency on python3.

> > As a final note, when re-writing the bit of sourceballs to use dbquery
> > instead of AWK, I realized that it does not correctly handle licenses that
> > have a space in them (as of 2018-07-07 there are 67 packages in the Arch
> > repos that have a license containing a space).  I did not fix this bug; I
> > merely translated it from AWK to Python, as the program would also need to
> > be adjusted elsewhere.
> Keeping in mind the ones we're looking for are a whitelist of
> strictly-defined license types... I think those are all ad-hoc custom
> licenses, none of which we're interested in in the primary sourceballs
> deployment.

Indeed; if I thought it were a serious problem, I would have written a
patch for it :)

Happy hacking,
~ Luke Shumaker

