[arch-projects] [dbscripts] [PATCH] Don't parse .db files ourselves; use pyalpm instead

Luke Shumaker lukeshu at lukeshu.com
Mon Jul 9 01:14:00 UTC 2018


From: Luke Shumaker <lukeshu at parabola.nu>

When reviewing a patchset that I recently submitted, Eli was concerned that
I was parsing .db files with bsdtar+awk, when the format of .db files isn't
"public"; the only guarantee made about it is that libalpm can parse it.

https://lists.archlinux.org/pipermail/arch-projects/2018-June/004932.html

I wasn't too concerned, because `ftpdir-cleanup` and `sourceballs` already
parse the .db files in the same way.  Nonetheless, I think Eli is right: we
shouldn't be parsing these files ourselves.

So, add a `dbquery` function that uses pyalpm to parse the .db files:

 - It takes two Python 3 expressions as arguments:
   1. one that returns a bool deciding whether we want to print
      information on a package, and
   2. another that returns the string to print for a package.

   Currently, all callers use "True" for the decider expression, as
   ftpdir-cleanup and sourceballs operate on *every* package.  However, I'm
   including a way to filter packages because I want to parse .db files in
   other places too.

 - libalpm doesn't offer an easy way to say "parse this DB file for me";
   instead, we must construct a configuration that has a syncdb pointing to
   that file, and then have libalpm sync it into a temporary directory.
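
As an illustration of the two-expression interface described above, here is
a minimal sketch that does not need pyalpm.  The `query` helper and the
`SimpleNamespace` stand-in packages are hypothetical and exist only for this
example; the real dbquery gets its packages from the synced database.

```python
from types import SimpleNamespace


def query(pkgs, filter_expr, output_expr):
    """Evaluate output_expr for every package that filter_expr accepts.

    Both expressions are Python source strings that see a single
    variable named `pkg`, mirroring the two eval() calls in dbquery.
    """
    results = []
    for pkg in pkgs:
        scope = {"pkg": pkg}
        if eval(filter_expr, {}, scope):
            results.append(eval(output_expr, {}, scope))
    return results


# Hypothetical stand-ins for pyalpm package objects.
pkgs = [
    SimpleNamespace(name="foo", base=None, version="1.0-1", arch="x86_64",
                    filename="foo-1.0-1-x86_64.pkg.tar.xz", licenses=["GPL"]),
    SimpleNamespace(name="bar", base="barbase", version="2.0-3", arch="any",
                    filename="bar-2.0-3-any.pkg.tar.xz", licenses=["MIT"]),
]

# Every package, as ftpdir-cleanup does with "True" and pkg.filename.
all_files = query(pkgs, "True", "pkg.filename")

# A filtered query: only architecture-independent packages.
any_names = query(pkgs, "pkg.arch == 'any'", "pkg.name")

# An f-string output expression, as sourceballs uses.
lines = query(pkgs, "True", 'f"{pkg.base or pkg.name} {pkg.version}"')
```

Because the expressions are passed through eval(), they must only ever
come from trusted callers inside dbscripts, never from package data.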

As a final note, when rewriting the bit of sourceballs to use dbquery
instead of AWK, I realized that it does not correctly handle licenses that
have a space in them (as of 2018-07-07 there are 67 packages in the Arch
repos that have a license containing a space).  I did not fix this bug; I
merely translated it from AWK to Python, as fixing it would also require
adjusting the program elsewhere.
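
To make the license problem concrete, here is a small sketch.  It assumes
that whatever reads these lines later splits them on whitespace; that
consumer is not part of this patch, so `parse_line` is a hypothetical
stand-in, as is the example license string.

```python
def format_line(base, version, arch, licenses):
    """Build one output line the same way the sourceballs expression does."""
    return f"{base} {version} {arch} {' '.join(licenses)}"


def parse_line(line):
    """Hypothetical consumer: split on whitespace, licenses are the rest."""
    base, version, arch, *licenses = line.split()
    return base, version, arch, licenses


original = ["custom:Public Domain", "GPL"]
line = format_line("foo", "1.0-1", "x86_64", original)
recovered = parse_line(line)[3]

# The two licenses come back as three tokens, so the boundary between
# "custom:Public Domain" and "GPL" can no longer be reconstructed.
print(original)
print(recovered)
```

Any real fix would need a separator that cannot occur inside a license
name, on both the producing and the consuming side.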
---
 cron-jobs/ftpdir-cleanup |  2 +-
 cron-jobs/sourceballs    | 14 ++------------
 db-functions             | 25 +++++++++++++++++++++++++
 test/Dockerfile          |  2 +-
 4 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/cron-jobs/ftpdir-cleanup b/cron-jobs/ftpdir-cleanup
index 9df5f99..77e49c8 100755
--- a/cron-jobs/ftpdir-cleanup
+++ b/cron-jobs/ftpdir-cleanup
@@ -44,7 +44,7 @@ for repo in "${PKGREPOS[@]}"; do
 			fi
 		done | sort > "${WORKDIR}/repo-${repo}-${arch}"
 		# get a list of package files defined in the repo db
-		bsdtar -xOf "${FTP_BASE}/${repo}/os/${arch}/${repo}${DBEXT}" | awk '/^%FILENAME%/{getline;print}' | sort > "${WORKDIR}/db-${repo}-${arch}"
+		dbquery "$repo" "$arch" True pkg.filename | sort > "${WORKDIR}/db-${repo}-${arch}"
 
 		missing_pkgs=($(comm -13 "${WORKDIR}/repo-${repo}-${arch}" "${WORKDIR}/db-${repo}-${arch}"))
 		if (( ${#missing_pkgs[@]} >= 1 )); then
diff --git a/cron-jobs/sourceballs b/cron-jobs/sourceballs
index 6be28ab..784b48b 100755
--- a/cron-jobs/sourceballs
+++ b/cron-jobs/sourceballs
@@ -24,18 +24,8 @@ for repo in "${PKGREPOS[@]}"; do
 		if [[ ! -f ${FTP_BASE}/${repo}/os/${arch}/${repo}${DBEXT} ]]; then
 			continue
 		fi
-		bsdtar -xOf "${FTP_BASE}/${repo}/os/${arch}/${repo}${DBEXT}" \
-			| awk '/^%NAME%/ { getline b };
-				/^%BASE%/ { getline b };
-				/^%VERSION%/ { getline v };
-				/^%LICENSE%/,/^$/ {
-					if ( !/^%LICENSE%/ ) { l=l" "$0 }
-					};
-				/^%ARCH%/ {
-					getline a;
-					printf "%s %s %s %s\n", b, v, a, l;
-					l="";
-				}'
+		dbquery "$repo" "$arch" True \
+			'f"{pkg.base or pkg.name} {pkg.version} {pkg.arch} {'\'' '\''.join(pkg.licenses)}"'
 	done | sort -u > "${WORKDIR}/db-${repo}"
 done
 
diff --git a/db-functions b/db-functions
index 0491c22..f1d821a 100644
--- a/db-functions
+++ b/db-functions
@@ -294,6 +294,31 @@ getpkgfiles() {
 	echo "${files[@]}"
 }
 
+# usage: dbquery repo arch filter_expr output_expr
+dbquery() {
+	local repo=$1
+	local arch=$2
+	local filter=$3
+	local output=$4
+	local dbfile="${FTP_BASE}/${repo}/os/${arch}/${repo}.db"
+
+	python3 - "$dbfile" "$filter" "$output" <<-'EOT'
+		import os.path
+		import sys
+		import tempfile
+		import pyalpm
+		db_dir, db_file = os.path.split(os.path.abspath(sys.argv[1]))
+		with tempfile.TemporaryDirectory() as tmpdirname:
+		    handle = pyalpm.Handle(tmpdirname, tmpdirname)
+		    db = handle.register_syncdb(db_file[:-3], 0)
+		    db.servers = ["file://{}".format(db_dir)]
+		    db.update(False)
+		    for pkg in db.search(".*"):
+		        if eval(sys.argv[2], {}, {"pkg": pkg}):
+		            print(eval(sys.argv[3], {}, {"pkg": pkg}))
+		EOT
+}
+
 check_pkgfile() {
 	local pkgfile=$1
 
diff --git a/test/Dockerfile b/test/Dockerfile
index 83c8449..0d01a75 100644
--- a/test/Dockerfile
+++ b/test/Dockerfile
@@ -1,5 +1,5 @@
 FROM archlinux/base
-RUN pacman -Syu --noconfirm --needed sudo fakeroot awk subversion make kcov bash-bats gettext grep
+RUN pacman -Syu --noconfirm --needed sudo fakeroot awk subversion make kcov bash-bats gettext grep pyalpm
 RUN pacman-key --init
 RUN echo '%wheel ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/wheel
 RUN useradd -N -g users -G wheel -d /build -m tester
-- 
2.17.1

