[pacman-dev] Version comparison algorithm.
I'm writing to ask about the version comparison function `rpmvercmp` in `libalpm/version.cL83`. The algorithm is complex and I'm trying to understand its behavior. I want to hash using the package name and version as a key, and so I need a hash function where `i1 == i2 => hash(i1) == hash(i2)` according to the version comparison operation. I'll describe how the algorithm behaves and then ask my questions. The algorithm works on a byte string and uses ascii comparison rules (no unicode). - First, split the input up into blocks of *alpha*, *digit* or *non-alphanum*. - For each pair of blocks - If the types are different, then *non-alphanum* is newer than *numeric*, which is newer than *alpha* - If the types are the same, then the rules are - For *non-alphanum*, compare lengths, longer is newer, equal lengths are equal segments (so *--* and *::* are the same) - For *alpha* just do a lexicographic comparison (so *b* is newer than *a* etc.) - For *numeric*, do a numeric comparison. (this can be done by skipping leading zeros, then comparing length, then comparing lexicographically, to avoid overflow on integer conversion) - If one input is longer than the other, and all sections so far have been equal, then if the next section of the longer is *alpha*, it is older, and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is newer than "a"). - If the inputs have the same number of sections that are all equal, then they are equal. Questions: 1. Is the algorithm correctly described here. 2. This should mean that if I hash length for *non-numeric*, the string with stripped zeros for *numeric*, and just the string for *alpha*, the has law should be upheld. Is this correct? Thanks Rich
On Sat, Mar 16, 2019, 10:02 Richard Dodd <richard.o.dodd@gmail.com> wrote:
I'm writing to ask about the version comparison function `rpmvercmp` in `libalpm/version.cL83`. The algorithm is complex and I'm trying to understand its behavior. I want to hash using the package name and version as a key, and so I need a hash function where `i1 == i2 => hash(i1) == hash(i2)` according to the version comparison operation. I'll describe how the algorithm behaves and then ask my questions.
Could you explain what your actual goal is? Having ordering in a hash function sounds extremely odd. Generally, you care about associativity *or* ordering, not both. The algorithm works on a byte string and uses ascii comparison rules (no
unicode).
- First, split the input up into blocks of *alpha*, *digit* or *non-alphanum*. - For each pair of blocks - If the types are different, then *non-alphanum* is newer than *numeric*, which is newer than *alpha* - If the types are the same, then the rules are - For *non-alphanum*, compare lengths, longer is newer, equal lengths are equal segments (so *--* and *::* are the same) - For *alpha* just do a lexicographic comparison (so *b* is newer than *a* etc.) - For *numeric*, do a numeric comparison. (this can be done by skipping leading zeros, then comparing length, then comparing lexicographically, to avoid overflow on integer conversion) - If one input is longer than the other, and all sections so far have been equal, then if the next section of the longer is *alpha*, it is older, and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is newer than "a"). - If the inputs have the same number of sections that are all equal, then they are equal.
Questions:
1. Is the algorithm correctly described here. 2. This should mean that if I hash length for *non-numeric*, the string with stripped zeros for *numeric*, and just the string for *alpha*, the has law should be upheld. Is this correct?
Thanks Rich
I'm just trying to get a mathematically sound implementation of all the properties (traits in rust parlance) that the version exhibits. If there is an equivalence relation and a hash function as above, then I can store versions in a hash set/map. If there is a total ordering, then I can store versions in a binary tree set/map. I'm making a library, so I want to allow as much as possible without allowing anything unsound. Rust's type system allows me to make these promises in code, rather than just documenting them. You can see my implementation at https://github.com/derekdreery/alpm-experimental/blob/master/src/version.rs, if you're interested. On Sat, Mar 16, 2019 at 2:02 PM Richard Dodd <richard.o.dodd@gmail.com> wrote:
I'm writing to ask about the version comparison function `rpmvercmp` in `libalpm/version.cL83`. The algorithm is complex and I'm trying to understand its behavior. I want to hash using the package name and version as a key, and so I need a hash function where `i1 == i2 => hash(i1) == hash(i2)` according to the version comparison operation. I'll describe how the algorithm behaves and then ask my questions.
The algorithm works on a byte string and uses ascii comparison rules (no unicode).
- First, split the input up into blocks of *alpha*, *digit* or *non-alphanum*. - For each pair of blocks - If the types are different, then *non-alphanum* is newer than *numeric*, which is newer than *alpha* - If the types are the same, then the rules are - For *non-alphanum*, compare lengths, longer is newer, equal lengths are equal segments (so *--* and *::* are the same) - For *alpha* just do a lexicographic comparison (so *b* is newer than *a* etc.) - For *numeric*, do a numeric comparison. (this can be done by skipping leading zeros, then comparing length, then comparing lexicographically, to avoid overflow on integer conversion) - If one input is longer than the other, and all sections so far have been equal, then if the next section of the longer is *alpha*, it is older, and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is newer than "a"). - If the inputs have the same number of sections that are all equal, then they are equal.
Questions:
1. Is the algorithm correctly described here. 2. This should mean that if I hash length for *non-numeric*, the string with stripped zeros for *numeric*, and just the string for *alpha*, the has law should be upheld. Is this correct?
Thanks Rich
On Wed, 20 Mar 2019 at 12:19, Richard Dodd <richard.o.dodd@gmail.com> wrote:
I'm just trying to get a mathematically sound implementation of all the properties (traits in rust parlance) that the version exhibits. If there is an equivalence relation and a hash function as above, then I can store versions in a hash set/map. If there is a total ordering, then I can store versions in a binary tree set/map. I'm making a library, so I want to allow as much as possible without allowing anything unsound. Rust's type system allows me to make these promises in code, rather than just documenting them.
You can see my implementation at https://github.com/derekdreery/alpm-experimental/blob/master/src/version.rs, if you're interested.
On Sat, Mar 16, 2019 at 2:02 PM Richard Dodd <richard.o.dodd@gmail.com> wrote:
I'm writing to ask about the version comparison function `rpmvercmp` in `libalpm/version.cL83`. The algorithm is complex and I'm trying to understand its behavior. I want to hash using the package name and version as a key, and so I need a hash function where `i1 == i2 => hash(i1) == hash(i2)` according to the version comparison operation. I'll describe how the algorithm behaves and then ask my questions.
The algorithm works on a byte string and uses ascii comparison rules (no unicode).
- First, split the input up into blocks of *alpha*, *digit* or *non-alphanum*. - For each pair of blocks - If the types are different, then *non-alphanum* is newer than *numeric*, which is newer than *alpha* - If the types are the same, then the rules are - For *non-alphanum*, compare lengths, longer is newer, equal lengths are equal segments (so *--* and *::* are the same) - For *alpha* just do a lexicographic comparison (so *b* is newer than *a* etc.) - For *numeric*, do a numeric comparison. (this can be done by skipping leading zeros, then comparing length, then comparing lexicographically, to avoid overflow on integer conversion) - If one input is longer than the other, and all sections so far have been equal, then if the next section of the longer is *alpha*, it is older, and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is newer than "a"). - If the inputs have the same number of sections that are all equal, then they are equal.
Questions:
1. Is the algorithm correctly described here. 2. This should mean that if I hash length for *non-numeric*, the string with stripped zeros for *numeric*, and just the string for *alpha*, the has law should be upheld. Is this correct?
Thanks Rich
I have a somewhat abandoned rust implementation of the version comparison algorithm too: https://github.com/de-vri-es/pacman-repo-tools/blob/master/src/version/compa... Assuming my implementation is correct, it's does define a total ordering. Maybe most interesting for you there is the test cases. I'm pretty sure I verified all of them against `/usr/bin/vercmp` (the binary executable shipped with pacman). You could even write the test cases to actually run vercmp (at the cost of portability). Or you could use vercmp to do some fuzz testing against randomly generated inputs. I imagine you would quickly find most discrepancies, so you could at-least use it to verify your algorithm against the reference implementation. -- Maarten
On 20/3/19 11:13 pm, Maarten de Vries wrote:
On Wed, 20 Mar 2019 at 12:19, Richard Dodd <richard.o.dodd@gmail.com> wrote:
I'm just trying to get a mathematically sound implementation of all the properties (traits in rust parlance) that the version exhibits. If there is an equivalence relation and a hash function as above, then I can store versions in a hash set/map. If there is a total ordering, then I can store versions in a binary tree set/map. I'm making a library, so I want to allow as much as possible without allowing anything unsound. Rust's type system allows me to make these promises in code, rather than just documenting them.
You can see my implementation at https://github.com/derekdreery/alpm-experimental/blob/master/src/version.rs, if you're interested.
On Sat, Mar 16, 2019 at 2:02 PM Richard Dodd <richard.o.dodd@gmail.com> wrote:
I'm writing to ask about the version comparison function `rpmvercmp` in `libalpm/version.cL83`. The algorithm is complex and I'm trying to understand its behavior. I want to hash using the package name and version as a key, and so I need a hash function where `i1 == i2 => hash(i1) == hash(i2)` according to the version comparison operation. I'll describe how the algorithm behaves and then ask my questions.
The algorithm works on a byte string and uses ascii comparison rules (no unicode).
- First, split the input up into blocks of *alpha*, *digit* or *non-alphanum*. - For each pair of blocks - If the types are different, then *non-alphanum* is newer than *numeric*, which is newer than *alpha* - If the types are the same, then the rules are - For *non-alphanum*, compare lengths, longer is newer, equal lengths are equal segments (so *--* and *::* are the same) - For *alpha* just do a lexicographic comparison (so *b* is newer than *a* etc.) - For *numeric*, do a numeric comparison. (this can be done by skipping leading zeros, then comparing length, then comparing lexicographically, to avoid overflow on integer conversion) - If one input is longer than the other, and all sections so far have been equal, then if the next section of the longer is *alpha*, it is older, and if it is *numeric* it is newer. (so "1a" is older than "1", "a1" is newer than "a"). - If the inputs have the same number of sections that are all equal, then they are equal.
Questions:
1. Is the algorithm correctly described here. 2. This should mean that if I hash length for *non-numeric*, the string with stripped zeros for *numeric*, and just the string for *alpha*, the has law should be upheld. Is this correct?
Thanks Rich
I have a somewhat abandoned rust implementation of the version comparison algorithm too: https://github.com/de-vri-es/pacman-repo-tools/blob/master/src/version/compa...
Assuming my implementation is correct, it's does define a total ordering.
Maybe most interesting for you there is the test cases. I'm pretty sure I verified all of them against `/usr/bin/vercmp` (the binary executable shipped with pacman). You could even write the test cases to actually run vercmp (at the cost of portability). Or you could use vercmp to do some fuzz testing against randomly generated inputs. I imagine you would quickly find most discrepancies, so you could at-least use it to verify your algorithm against the reference implementation.
Note pacman has a test suite for version comparison. A
participants (4)
-
Allan McRae
-
Dave Reisner
-
Maarten de Vries
-
Richard Dodd