On 10/2/19 11:32 AM, Greg Minshall wrote:
i'm not sure i can explain why i find having the complete list, with descriptions, local on my machine useful. but, i do. "search locally, build globally" somehow works well for me. (one rationalization might be that searching is inherently more interactive than building, so random network latencies, etc., during building are less annoying than during searching.)
anyway, grant me the desire to maintain, offline, a complete list of AUR packages, version numbers, descriptions.
Could be, I dunno. All I know is what I would consider personally useful -- your use cases remain your own, regardless of my opinions or expressed doubts. :)
let's say that i've managed, over a period of a week or so, to download the entire database (or, at least, the "rows" in which i am interested: package name, version, description) into my own local database.
then, a week later, i'd like to *update* my local database with what's changed in the AUR repository. how would i proceed? as things currently stand, iiuc (always a dubious proposition), i'd need to again download the entire database.
on the other hand, if there were a packages-vers.gz (*), i could download that, then compare the package names and versions in it with those in my local database. i would then schedule downloads of the database entries for those packages whose version numbers had changed, as well as for packages in packages-vers.gz that are new; at the same time i would delete from my local database those packages that no longer appear in packages-vers.gz. one can visualize this code.
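something like the following sketch, say. note the "name version" one-pair-per-line format of packages-vers.gz is an assumption here, since no such dump exists yet:

```python
def plan_update(local, remote):
    """local/remote map package name -> version string.

    returns (to_fetch, to_delete): names whose database entries must be
    (re-)downloaded, and names to drop from the local database.
    """
    to_fetch = [name for name, ver in remote.items()
                if local.get(name) != ver]          # new or changed version
    to_delete = [name for name in local if name not in remote]
    return sorted(to_fetch), sorted(to_delete)

# local database vs. freshly downloaded packages-vers.gz contents
local = {"foo": "1.0-1", "bar": "2.3-1", "gone": "0.1-1"}
remote = {"foo": "1.0-2", "bar": "2.3-1", "new": "5-1"}
print(plan_update(local, remote))  # (['foo', 'new'], ['gone'])
```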
my presumption is that this would be much lighter on server resources than downloading the entire database each week. depending on the "churn" in the repository (which you'll know better than i), it might even be very light.
and, i think this could be useful for general use. i may only care about descriptions, but if someone cares about dependencies, maintainers, etc., they would still use the version-number mechanism (again, see (*) below) to determine which packages have changed, and only download the information from those changed packages.
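the "download only the changed packages" step could then batch the changed names into one query. the AUR's RPC interface, for instance, accepts multiple arg[] parameters on a single "info" request (the package names here are just examples):

```python
from urllib.parse import urlencode

AUR_RPC = "https://aur.archlinux.org/rpc/"

def info_url(names):
    # the v5 "info" call batches several packages via repeated arg[] params
    query = [("v", 5), ("type", "info")] + [("arg[]", n) for n in names]
    return AUR_RPC + "?" + urlencode(query)

print(info_url(["expac", "aurutils"]))
```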
Well, I guess I could hear the argument you make for providing a way to invalidate offline assumptions about a package. Even if providing a dump of names-versions is not strictly useful itself.
ps -- thanks for the pointer to expac. i'll look at converting to that. no one ever accused me of writing overly-efficient code... :)
(*) NB:
note that, for "true consistency", using "version" depends on the assumption (likely to be invalid at least occasionally, maybe often) that if the *metadata* for a package in the database changes, then the *version* of the package itself also changes.
This is "supposed to be true", as in, it's generally considered pretty bad if people update a PKGBUILD so that it creates a different package but don't update the pkgrel for metadata or package content changes. It isn't guaranteed, sure, but I guess there are worse things than simply failing to detect a cache invalidation for that package. On the other hand...
if "last modified" time in the database is updated when any of the metadata changes, that would be better to use than package version number.
if "last modified" time isn't updated when (some) metadata is updated, one could also run an md5sum(1) over (a textual representation of) each package's database entry, and provide packages-md5sums.gz, say. i'll note that a simple test shows that adding an md5sum to each line inflates the size of the file considerably : % ls -skh packages*.gz : 1.5M packages-md5sums.gz 344K packages.gz
the inflation for version numbers and/or "last modified" time (as seconds since the epoch) would probably be less, maybe double the size of packages.gz?
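one can sanity-check that estimate with synthetic data; the names, versions and digests below are random stand-ins, not real AUR data, so only the relative ordering is meaningful:

```python
import gzip
import random
import string

random.seed(0)
names = ["".join(random.choices(string.ascii_lowercase, k=10))
         for _ in range(10000)]

variants = {
    "names only": "\n".join(names),
    "names+vers": "\n".join(f"{n} 1.{random.randrange(100)}-1"
                            for n in names),
    # random hex digests barely compress, so this variant inflates most
    "names+md5": "\n".join(
        f"{n} {''.join(random.choices('0123456789abcdef', k=32))}"
        for n in names),
}
sizes = {k: len(gzip.compress(v.encode())) for k, v in variants.items()}
for k, v in sizes.items():
    print(k, v)
```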
The package details key "last modified" is indeed updated to the time of the latest push to the package's git repository; see https://git.archlinux.org/aurweb.git/tree/aurweb/git/update.py#n92

So this would be a valid method for guaranteeing cache invalidation.

-- 
Eli Schwartz
Bug Wrangler and Trusted User