[pacman-dev] Mark outdated packages automatically (aka Livecheck)

Anatol Pomozov anatol.pomozov at gmail.com
Thu Apr 18 10:38:10 EDT 2013


On Tue, Mar 26, 2013 at 8:38 AM, Anatol Pomozov
<anatol.pomozov at gmail.com> wrote:
> Hi
> On Sun, Mar 24, 2013 at 11:33 AM, William Giokas <1007380 at gmail.com> wrote:
>> On Sun, Mar 24, 2013 at 11:29:32AM -0700, Anatol Pomozov wrote:
>>> Hi
>>> On Sun, Mar 24, 2013 at 11:23 AM, Lukas Fleischer
>>> <archlinux at cryptocrack.de> wrote:
>>> > On Sun, Mar 24, 2013 at 11:12:05AM -0700, Anatol Pomozov wrote:
>>> >> Hi,
>>> >>
>>> >> I believe in automatization. Any routine work that can be done
>>> >> automatically should be done this way.
>>> >>
>>> >> One such thing that can be improved in Arch project is discovering
>>> >> out-of-date packages. Currently it is done by users who go to
>>> >> https://www.archlinux.org/packages/ find the package and then click
>>> >> "Flag Package Out-of-Date" link. Why to bother users? Why not to let
>>> >> some bot to visit websites and check for new versions?
>>> >>
>>> >> There are examples of package managers that have such functionality -
>>> >> macports http://guide.macports.org/chunked/reference.livecheck.html
>>> >> Their Portfiles can have information about how to find released files
>>> >> (using regexp). Then periodically (e.g. daily) a bot visits webpages,
>>> >> parses html and checks if new files are present.
>>> >>
>>> >> Is it possible to have such functionality in pacman? It would save
>>> >> users time and make package update time lower.
>>> >
>>> > Some developers and Trusted Users already use tools to check websites
>>> > for updates. I agree that it might be better to do this in a central
>>> > location but this is certainly not a pacman issue. Maybe we could add
>>> > something to archweb (or just use a bot, as you already mentioned).
>>> Sure, I can file a ticket against archweb.
>>> But I believe PKGBUILD file should have a field that describes how to
>>> find a new version for the package.
>> There is already the url= field.
> In theory the bot can use "source" field of PKGBUILD. e.g. source looks like
> source=("http://download.savannah.gnu.org/releases/$pkgname/$pkgname-$pkgver.src.tar.gz"{,.sig})
> It can extract the archive url
> http://download.savannah.gnu.org/releases/$pkgname/$pkgname-$pkgver.src.tar.gz
> and then try to probe newer $pkgver. Current version is XX.YY.ZZ so
> bot can try  XX.YY.ZZ+1, XX.YY+1.0, XX+1.0.0 This should cover most
> projects updates. There are some cases when version is not numeric one
> e.g. includes -alpha, -beta -rc, or some other versioning.

I have some good news. I implemented a script that checks for the "next"
version using the algorithm above. Here it is:
https://github.com/anatol/pkglivecheck and here is its output

The script works as follows: it iterates through all PKGBUILD files
in /var/abs, parses them and reads the "source" field. Then it tries
to find the "pkgver" value in the URL and replaces that part with the
"next" version. The next version is one of "X.Y.Z+1", "X.Y+1.0",
"X+1.0.0". It uses curl to check whether the file exists on the server.
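
For illustration, the candidate generation and URL substitution
described above could look roughly like the Python sketch below. This
is not the code from pkglivecheck; the function names next_versions and
candidate_urls are made up for this example, and it assumes a purely
numeric dotted pkgver that appears literally in an already-expanded
source URL.

```python
import re


def next_versions(pkgver):
    """Return candidate "next" versions for a numeric dotted version.

    For X.Y.Z this yields X.Y.Z+1, X.Y+1.0 and X+1.0.0. Versions that
    are not purely numeric (e.g. with -rc1 or -beta) yield no candidates.
    """
    if not re.fullmatch(r"[0-9]+(\.[0-9]+)*", pkgver):
        return []
    nums = [int(p) for p in pkgver.split(".")]
    candidates = []
    # Bump each component, last to first, zeroing everything after it.
    for i in reversed(range(len(nums))):
        bumped = nums[:i] + [nums[i] + 1] + [0] * (len(nums) - i - 1)
        candidates.append(".".join(str(n) for n in bumped))
    return candidates


def candidate_urls(source_url, pkgver):
    """Substitute each candidate version for pkgver in the source URL."""
    if pkgver not in source_url:
        return []
    return [source_url.replace(pkgver, v) for v in next_versions(pkgver)]


print(next_versions("1.2.3"))  # ['1.2.4', '1.3.0', '2.0.0']
```

Each URL returned by candidate_urls would then be probed on the server;
a hit on any of them suggests the package is out of date.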

It works pretty well. It found tons of outdated packages. I already
marked ~40 packages as out-of-date. But in the future packages should
be marked automatically, without human intervention.

There are some false positives/negatives though:

- Not all URLs are parsable, e.g. they have suffixes like "-rc1" or
"-beta" or something like this. Some packages change the version format
in URLs, e.g. 2.0.3 -> http://../foo-203.zip. My script does not handle it.

- Some of the download files were removed, or the project was closed
and its website is dead.

- There is no reliable way to find out whether a file exists on the
server. I use "curl --head -L" and get the "content_type" of the
response. This works for ~97% of all packages. There are websites that
e.g. return zip files with "content-type: text/plain". It is quite
difficult to tell whether it is a "404 not found" page or a zip file.

- Some websites try to "help" users and return the latest stable version
of the sources if you request a nonexistent zip file, e.g. you ask
for version 3.2.1 and the server returns a "200" response with the
latest stable file instead.

- Some projects have weird release version conventions. e.g. 422 and
424 are marked as beta and 423 is stable. Arch needs 423, but not 424.
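
Several of the checks from the list above could be combined into one
heuristic, sketched here in Python. This is only a sketch under
assumptions: the function name looks_like_release_file and the set of
"suspicious" content types are hypothetical choices for illustration,
and the inputs are assumed to be the status code, Content-Type header
and final URL obtained from a HEAD request that follows redirects.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: content types that usually mean an error or landing page.
# text/plain is deliberately not listed, since some servers label
# archives that way.
SUSPICIOUS_TYPES = {"text/html", "application/xhtml+xml"}


def looks_like_release_file(status, content_type, requested_url, final_url):
    """Guess whether a HEAD response describes the requested file."""
    if status != 200:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime in SUSPICIOUS_TYPES:
        return False
    # A server that redirects to the latest stable release ends up at a
    # URL whose file name differs from the one that was requested.
    requested_name = posixpath.basename(urlsplit(requested_url).path)
    final_name = posixpath.basename(urlsplit(final_url).path)
    return requested_name == final_name
```

This does not catch a server that returns a different file under the
requested URL without redirecting, and it cannot know about beta/stable
numbering conventions such as the 422/423/424 case.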

That said, the script works pretty well, and I am trying to address the
remaining issues. It is worth looking at its output and checking the
out-of-date packages. I also think that packages need a "livecheck"
field that describes how to find released versions. The algorithm above
could be used as a default when there is no such "livecheck" field.
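
As a rough idea of what a regexp-based "livecheck" could do, in the
spirit of the MacPorts approach mentioned earlier in the thread, here
is a Python sketch. Everything in it is hypothetical: no such PKGBUILD
field exists, and the function name latest_listed_version, the example
regexp and the assumption that the regexp has exactly one capture group
holding the version are all invented for illustration.

```python
import re


def latest_listed_version(page_html, livecheck_regex):
    """Return the highest numeric version captured by the regexp, or None.

    Assumes livecheck_regex has exactly one capture group that matches
    the version string.
    """
    found = [m.group(1) for m in re.finditer(livecheck_regex, page_html)]
    numeric = [v for v in found if v and re.fullmatch(r"[0-9]+(\.[0-9]+)*", v)]
    if not numeric:
        return None
    # Compare component-wise as integers, so that 1.10.0 > 1.2.3.
    return max(numeric, key=lambda v: tuple(int(p) for p in v.split(".")))


page = '<a href="foo-1.2.3.tar.gz">old</a> <a href="foo-1.10.0.tar.gz">new</a>'
print(latest_listed_version(page, r"foo-([0-9.]+)\.tar\.gz"))  # 1.10.0
```

If the returned version differs from the packaged pkgver, the package
could be flagged; if the field is missing, the probing algorithm above
would remain the fallback.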
