[pacman-dev] Version comparison algorithm.
Allan McRae
allan at archlinux.org
Wed Mar 20 13:24:12 UTC 2019
On 20/3/19 11:13 pm, Maarten de Vries wrote:
> On Wed, 20 Mar 2019 at 12:19, Richard Dodd <richard.o.dodd at gmail.com> wrote:
>>
>> I'm just trying to get a mathematically sound implementation of all the
>> properties (traits in rust parlance) that the version exhibits. If there is
>> an equivalence relation and a hash function as above, then I can store
>> versions in a hash set/map. If there is a total ordering, then I can store
>> versions in a binary tree set/map. I'm making a library, so I want to allow
>> as much as possible without allowing anything unsound. Rust's type system
>> allows me to make these promises in code, rather than just documenting them.
>>
>> You can see my implementation at
>> https://github.com/derekdreery/alpm-experimental/blob/master/src/version.rs,
>> if you're interested.
>>
>>
>> On Sat, Mar 16, 2019 at 2:02 PM Richard Dodd <richard.o.dodd at gmail.com>
>> wrote:
>>
>>> I'm writing to ask about the version comparison function `rpmvercmp` in
>>> `libalpm/version.cL83`. The algorithm is complex and I'm trying to
>>> understand its behavior. I want to hash using the package name and version
>>> as a key, and so I need a hash function where `i1 == i2 => hash(i1) ==
>>> hash(i2)` according to the version comparison operation. I'll describe how
>>> the algorithm behaves and then ask my questions.
>>>
>>> The algorithm works on a byte string and uses ascii comparison rules (no
>>> unicode).
>>>
>>> - First, split the input up into blocks of *alpha*, *digit* or
>>> *non-alphanum*.
>>> - For each pair of blocks
>>> - If the types are different, then *non-alphanum* is newer than
>>> *numeric*, which is newer than *alpha*
>>> - If the types are the same, then the rules are
>>> - For *non-alphanum*, compare lengths, longer is newer, equal
>>> lengths are equal segments (so *--* and *::* are the same)
>>> - For *alpha* just do a lexicographic comparison (so *b* is newer
>>> than *a* etc.)
>>> - For *numeric*, do a numeric comparison. (this can be done by
>>> skipping leading zeros, then comparing length, then comparing
>>> lexicographically, to avoid overflow on integer conversion)
>>> - If one input is longer than the other, and all sections so far have
>>> been equal, then if the next section of the longer is *alpha*, it is older,
>>> and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is
>>> newer than "a").
>>> - If the inputs have the same number of sections that are all equal,
>>> then they are equal.
>>>
>>> Questions:
>>>
>>> 1. Is the algorithm correctly described here.
>>> 2. This should mean that if I hash length for *non-numeric*, the string
>>> with stripped zeros for *numeric*, and just the string for *alpha*, the has
>>> law should be upheld. Is this correct?
>>>
>>> Thanks
>>> Rich
>>>
>>>
>>>
>
> I have a somewhat abandoned rust implementation of the version
> comparison algorithm too:
> https://github.com/de-vri-es/pacman-repo-tools/blob/master/src/version/compare.rs
>
> Assuming my implementation is correct, it's does define a total ordering.
>
> Maybe most interesting for you there is the test cases. I'm pretty
> sure I verified all of them against `/usr/bin/vercmp` (the binary
> executable shipped with pacman). You could even write the test cases
> to actually run vercmp (at the cost of portability). Or you could use
> vercmp to do some fuzz testing against randomly generated inputs. I
> imagine you would quickly find most discrepancies, so you could
> at-least use it to verify your algorithm against the reference
> implementation.
>
Note pacman has a test suite for version comparison.
A
More information about the pacman-dev
mailing list