[pacman-dev] Version comparison algorithm.

Maarten de Vries maarten at de-vri.es
Wed Mar 20 13:13:47 UTC 2019


On Wed, 20 Mar 2019 at 12:19, Richard Dodd <richard.o.dodd at gmail.com> wrote:
>
> I'm just trying to get a mathematically sound implementation of all the
> properties (traits in rust parlance) that the version exhibits. If there is
> an equivalence relation and a hash function as above, then I can store
> versions in a hash set/map. If there is a total ordering, then I can store
> versions in a binary tree set/map. I'm making a library, so I want to allow
> as much as possible without allowing anything unsound. Rust's type system
> allows me to make these promises in code, rather than just documenting them.
>
> You can see my implementation at
> https://github.com/derekdreery/alpm-experimental/blob/master/src/version.rs,
> if you're interested.
>
>
> On Sat, Mar 16, 2019 at 2:02 PM Richard Dodd <richard.o.dodd at gmail.com>
> wrote:
>
> > I'm writing to ask about the version comparison function `rpmvercmp` in
> > `libalpm/version.cL83`. The algorithm is complex and I'm trying to
> > understand its behavior. I want to hash using the package name and version
> > as a key, and so I need a hash function where `i1 == i2 => hash(i1) ==
> > hash(i2)` according to the version comparison operation. I'll describe how
> > the algorithm behaves and then ask my questions.
> >
> > The algorithm works on a byte string and uses ascii comparison rules (no
> > unicode).
> >
> >  - First, split the input up into blocks of *alpha*, *digit* or
> > *non-alphanum*.
> >  - For each pair of blocks
> >     - If the types are different, then *non-alphanum* is newer than
> > *numeric*, which is newer than *alpha*
> >     - If the types are the same, then the rules are
> >       - For *non-alphanum*, compare lengths, longer is newer, equal
> > lengths are equal segments (so *--* and *::* are the same)
> >       - For *alpha* just do a lexicographic comparison (so *b* is newer
> > than *a* etc.)
> >       - For *numeric*, do a numeric comparison. (this can be done by
> > skipping leading zeros, then comparing length, then comparing
> > lexicographically, to avoid overflow on integer conversion)
> >   - If one input is longer than the other, and all sections so far have
> > been equal, then if the next section of the longer is *alpha*, it is older,
> > and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is
> > newer than "a").
> >   - If the inputs have the same number of sections that are all equal,
> > then they are equal.
> >
> > Questions:
> >
> >  1. Is the algorithm correctly described here.
> >  2. This should mean that if I hash length for *non-numeric*, the string
> > with stripped zeros for *numeric*, and just the string for *alpha*, the has
> > law should be upheld. Is this correct?
> >
> > Thanks
> > Rich
> >
> >
> >

I have a somewhat abandoned rust implementation of the version
comparison algorithm too:
https://github.com/de-vri-es/pacman-repo-tools/blob/master/src/version/compare.rs

Assuming my implementation is correct, it's does define a total ordering.

Maybe most interesting for you there is the test cases. I'm pretty
sure I verified all of them against `/usr/bin/vercmp` (the binary
executable shipped with pacman). You could even write the test cases
to actually run vercmp (at the cost of portability). Or you could use
vercmp to do some fuzz testing against randomly generated inputs. I
imagine you would quickly find most discrepancies, so you could
at-least use it to verify your algorithm against the reference
implementation.

-- Maarten


More information about the pacman-dev mailing list