On 10/2/19 11:32 AM, Greg Minshall wrote:
i'm not sure i can explain why i find having the complete list, with descriptions, local on my machine useful. but, i do. "search locally, build globally" somehow works well for me. (one rationalization might be that searching is inherently more interactive than building, so random network latencies, etc., during building are less annoying than during searching.)
anyway, grant me the desire to maintain, offline, a complete list of AUR packages, version numbers, descriptions.
Could be, I dunno. All I know is what I would consider personally useful -- your use cases remain your own, regardless of my opinions or expressed doubts. :)
let's say that i've managed, over a period of a week or so, to download the entire database (or, at least, the "rows" in which i am interested: package name, version, description) into my own local database.
then, a week later, i'd like to *update* my local database with what's changed in the AUR repository. how would i proceed? as things currently stand, iiuc (always a dubious proposition), i'd need to again download the entire database.
on the other hand, if there were a packages-vers.gz (*), i could download that, then compare the package names and versions in it with those in my local database. i would then schedule downloads of the database entries for those packages whose version numbers had changed, as well as for packages in packages-vers.gz that are new; at the same time i would delete from my local database those packages that no longer appear in packages-vers.gz. one can visualize this code.
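something like the following sketch, say. note the "name version" one-pair-per-line format of packages-vers.gz is an assumption here, since no such dump exists yet:

```python
def plan_update(local, remote):
    """local/remote map package name -> version string.

    returns (to_fetch, to_delete): names whose database entries must be
    (re-)downloaded, and names to drop from the local database.
    """
    to_fetch = [name for name, ver in remote.items()
                if local.get(name) != ver]          # new or changed version
    to_delete = [name for name in local if name not in remote]
    return sorted(to_fetch), sorted(to_delete)

# local database vs. freshly downloaded packages-vers.gz contents
local = {"foo": "1.0-1", "bar": "2.3-1", "gone": "0.1-1"}
remote = {"foo": "1.0-2", "bar": "2.3-1", "new": "5-1"}
print(plan_update(local, remote))  # (['foo', 'new'], ['gone'])
```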
my presumption is that this would be much lighter on server resources than downloading the entire database each week. depending on the "churn" in the repository (which you'll know better than i), it might even be very light.
and, i think this could be useful for general use. i may only care about descriptions, but if someone cares about dependencies, maintainers, etc., they would still use the version-number mechanism (again, see (*) below) to determine which packages have changed, and only download the information from those changed packages.
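the "download only the changed packages" step could then batch the changed names into one query. the AUR's RPC interface, for instance, accepts multiple arg[] parameters on a single "info" request (the package names here are just examples):

```python
from urllib.parse import urlencode

AUR_RPC = "https://aur.archlinux.org/rpc/"

def info_url(names):
    # the v5 "info" call batches several packages via repeated arg[] params
    query = [("v", 5), ("type", "info")] + [("arg[]", n) for n in names]
    return AUR_RPC + "?" + urlencode(query)

print(info_url(["expac", "aurutils"]))
```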
Well, I guess I could hear the argument you make for providing a way to invalidate offline assumptions about a package. Even if providing a dump of names-versions is not strictly useful itself.
ps -- thanks for the pointer to expac. i'll look at converting to that. no one ever accused me of writing overly-efficient code... :)
(*) NB:
note that, for "true consistency", using "version" depends on the assumption (likely to be invalid at least occasionally, maybe often) that if the *metadata* for a package in the database changes, then the *version* of the package itself also changes.
This is "supposed to be true", as in, it's generally considered pretty bad if people update a PKGBUILD so that it creates a different package but don't update the pkgrel for metadata or package content changes. It isn't guaranteed, sure, but I guess there are worse things than simply failing to detect a cache invalidation for that package. On the other hand...
if "last modified" time in the database is updated when any of the metadata changes, that would be better to use than package version number.
if "last modified" time isn't updated when (some) metadata is updated, one could also run an md5sum(1) over (a textual representation of) each package's database entry, and provide packages-md5sums.gz, say. i'll note that a simple test shows that adding an md5sum to each line inflates the size of the file considerably : % ls -skh packages*.gz : 1.5M packages-md5sums.gz 344K packages.gz
the inflation for version numbers and/or "last modified" time (as seconds since the epoch) would probably be less, maybe double the size of packages.gz?
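one can sanity-check that estimate with synthetic data; the names, versions and digests below are random stand-ins, not real AUR data, so only the relative ordering is meaningful:

```python
import gzip
import random
import string

random.seed(0)
names = ["".join(random.choices(string.ascii_lowercase, k=10))
         for _ in range(10000)]

variants = {
    "names only": "\n".join(names),
    "names+vers": "\n".join(f"{n} 1.{random.randrange(100)}-1"
                            for n in names),
    # random hex digests barely compress, so this variant inflates most
    "names+md5": "\n".join(
        f"{n} {''.join(random.choices('0123456789abcdef', k=32))}"
        for n in names),
}
sizes = {k: len(gzip.compress(v.encode())) for k, v in variants.items()}
for k, v in sizes.items():
    print(k, v)
```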
The package details key "last modified" is indeed updated to the time of the latest push to the package's git repository; see https://git.archlinux.org/aurweb.git/tree/aurweb/git/update.py#n92

So this would be a valid method for guaranteeing cache invalidation.

-- 
Eli Schwartz
Bug Wrangler and Trusted User