[arch-projects] [dbscripts] [PATCH] Don't parse .db files ourselves; use pyalpm instead

Eli Schwartz eschwartz at archlinux.org
Mon Jul 9 02:38:06 UTC 2018


On 07/08/2018 09:14 PM, Luke Shumaker wrote:
> From: Luke Shumaker <lukeshu at parabola.nu>
> 
> In a patchset that I recently submitted, Eli was concerned that I was
> parsing .db files with bsdtar+awk, when the format of .db files isn't
> "public"; the only guarantees made about it are that libalpm can parse it.
> 
> https://lists.archlinux.org/pipermail/arch-projects/2018-June/004932.html
> 
> I wasn't too concerned, because `ftpdir-cleanup` and `sourceballs` already
> parse the .db files in the same way.  Nonetheless, I think Eli is right: we
> shouldn't be parsing these files ourselves.
> 
> So, add a `dbquery` function that uses pyalpm to parse the .db files:

What's wrong with expac?

expac --config ${dbscripts_root}/pacman-community.conf -S '%f'

expac is not only super elegant, there are pending patches to provide it
in pacman 6 as part of the core project. This is what I'm waiting for,
actually.
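
For anyone who would rather call this from a script than from a shell, here is a rough sketch of wrapping the expac invocation above. The function names (`expac_argv`, `sync_filenames`) are made up for illustration and are not part of dbscripts; the flags are exactly the ones in the command line above, and the config path is left as a parameter.

```python
import subprocess


def expac_argv(config, fmt="%f"):
    """Build the argv for the expac call quoted above (hypothetical helper)."""
    return ["expac", "--config", config, "-S", fmt]


def sync_filenames(config):
    """Run expac and return one output line per package.

    Requires expac to be installed; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        expac_argv(config),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

The point is only that the whole query is one format string; nothing has to be evaluated on the caller's side.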

I see no reason to add an external dependency on both python and pyalpm,
in order to run a small python program which evals its arguments in
order to inject database queries, when a tool with a simple API can do
the same and will eventually be guaranteed to be everywhere pacman
itself is.

(Let's ignore for a moment the defunct integrity checks service, which
is written in python but does not use pyalpm. pyalpm is not currently
installed on the dbscripts server.)

>  - It takes as arguments Python 3 expressions;
>    1. one that returns a bool deciding whether we want to print
>       information on a package, and
>    2. another that returns the string to print for a package.
> 
>    Currently, all callers use "True" for the decider expression, as
>    ftpdir-cleanup and sourceballs operate on *every* package.  However, I'm
>    including a way to filter packages because I also want to parse .db
>    files in other places.
> 
>  - libalpm doesn't offer an easy way to say "parse this DB file for me";
>    instead, we must construct a configuration that has a syncdb pointing to
>    that file, which we then have it sync in to a temporary directory.
> 
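
To make the two-expression interface described above concrete, here is a minimal sketch. It is not the code from the patch: the name `dbquery` is borrowed from the description, the packages are plain stand-in objects rather than pyalpm packages, and the attribute names (`name`, `filename`, `licenses`) and sample values are assumptions chosen for illustration.

```python
from types import SimpleNamespace


def dbquery(packages, decider, formatter):
    """Yield one string per package selected by the decider expression.

    `decider` and `formatter` are Python expression strings, evaluated with
    the name `pkg` bound to the current package. Because they go through
    eval(), they must only ever come from trusted callers.
    """
    decide = compile(decider, "<decider>", "eval")
    fmt = compile(formatter, "<formatter>", "eval")
    for pkg in packages:
        env = {"pkg": pkg}
        if eval(decide, {}, env):
            yield str(eval(fmt, {}, env))


# Stand-in package records (hypothetical data, not read from a real .db file).
SAMPLE = [
    SimpleNamespace(name="foo", filename="foo-1.0-1-any.pkg.tar.xz",
                    licenses=["GPL"]),
    SimpleNamespace(name="bar", filename="bar-2.0-1-any.pkg.tar.xz",
                    licenses=["custom"]),
]
```

With the decider set to `"True"`, as all current callers reportedly do, `list(dbquery(SAMPLE, "True", "pkg.filename"))` returns every filename; a decider such as `"'GPL' in pkg.licenses"` narrows the output to matching packages.
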
> As a final note, when re-writing the bit of sourceballs to use dbquery
> instead of AWK, I realized that it does not correctly handle licenses that
> have a space in them (as of 2018-07-07 there are 67 packages in the Arch
> repos that have a license containing a space).  I did not fix this bug; I
> merely translated it from AWK to Python, as the program would also need to
> be adjusted elsewhere.

Keeping in mind the ones we're looking for are a whitelist of
strictly-defined license types... I think those are all ad-hoc custom
licenses, none of which we're interested in in the primary sourceballs
deployment.
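
A small sketch of the mechanism, under stated assumptions: the record format (fields joined by single spaces), the whitelist contents, and the license string are all made up for illustration and are not taken from sourceballs or its configuration.

```python
# Hypothetical whitelist; the real one lives in the dbscripts configuration.
ALLOWED = {"GPL", "GPL2", "GPL3", "LGPL"}


def serialize(pkgbase, licenses):
    """Join a package name and its licenses with spaces, one record per line."""
    return " ".join([pkgbase, *licenses])


def parse(line):
    """Split a record on whitespace; licenses containing spaces get torn apart."""
    pkgbase, *licenses = line.split()
    return pkgbase, licenses


def wanted(licenses):
    """True if any license token is on the whitelist."""
    return any(lic in ALLOWED for lic in licenses)


record = serialize("foo", ["custom:Example License"])
pkgbase, licenses = parse(record)
# The single license comes back as two tokens, neither of them whitelisted.
```

In this sketch the round trip is lossy, but the fragments of an ad-hoc custom license do not match the whitelist, so the package is skipped either way.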

-- 
Eli Schwartz
Bug Wrangler and Trusted User
