[pacman-dev] [PATCH 0/4] Package list find performance improvements
This series of patches makes finding a package in our linked list implementation a whole lot faster, if that search is using the standard _alpm_pkg_find, which nearly all are (after the first patch).
It does this by adding a hash function to util.c which is nothing too complicated and named after a publicly available algorithm. When packages are created, we fill in this hash value as soon as the pkgname is read. Finally, the _alpm_pkg_find function is rewritten to take advantage of this field, avoiding repeated strcmp() calls and only falling back to that if a hash is not available and to verify the hash value was not some sort of collision.
Performance figures and numbers are available in the last patch. This actually speeds up operations by nearly 33%, so this is not a total waste of time to consider. :) Review and questions/comments/concerns welcome!
-Dan
Dan McGee (4):
  Use _alpm_pkg_find in deps search
  Add hash_sdbm function
  When setting package name, set hash value as well
  Used hashed package name in _alpm_pkg_find
lib/libalpm/be_package.c |    1 +
lib/libalpm/deps.c       |    4 ++--
lib/libalpm/package.c    |   16 ++++++++++++++--
lib/libalpm/package.h    |    1 +
lib/libalpm/util.c       |   22 ++++++++++++++++++++++
lib/libalpm/util.h       |    1 +
6 files changed, 41 insertions(+), 4 deletions(-)
--
1.7.3.3
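[pacman-dev] [PATCH 1/4] Use _alpm_pkg_find in deps search
Signed-off-by: Dan McGee
[pacman-dev] [PATCH 2/4] Add hash_sdbm function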
This is prepping for the addition of a hash field to each package to greatly
speed up the string comparisons we frequently do on package name in
_alpm_pkg_find.
Signed-off-by: Dan McGee
[pacman-dev] [PATCH 3/4] When setting package name, set hash value as well
Signed-off-by: Dan McGee
[pacman-dev] [PATCH 4/4] Used hashed package name in _alpm_pkg_find
This results in huge gains to a lot of our codepaths since this is the most
frequent method of random access to packages in a list. The gains are seen
in both profiling and real life.
$ pacman -Sii zvbi
real: 0.41 sec -> 0.32 sec
strcmp: 16,669,760 calls -> 473,942 calls
_alpm_pkg_find: 52.73% -> 26.31% of time
$ pacman -Su (no upgrades found)
real: 0.40 sec -> 0.30 sec
strcmp: 19,497,226 calls -> 524,097 calls
_alpm_pkg_find: 52.36% -> 26.15% of time
There is some minor risk with this patch, but most of it should be avoided
by falling back to strcmp() if we encounter a package with a '0' hash value
(which should not happen via any existing code path). We also do a strcmp()
once hash values match to guard against hash collisions. The remaining risk is
that a package name is modified after it was originally set while the hash
value is left alone. That would probably result in a lot of other problems anyway.
Signed-off-by: Dan McGee
On Tue, Dec 14, 2010 at 7:46 PM, Dan McGee wrote:
This series of patches makes finding a package in our linked list implementation a whole lot faster, if that search is using the standard _alpm_pkg_find, which nearly all are (after the first patch).
It does this by adding a hash function to util.c which is nothing too complicated and named after a publicly available algorithm. When packages are created, we fill in this hash value as soon as the pkgname is read. Finally, the _alpm_pkg_find function is rewritten to take advantage of this field, avoiding repeated strcmp() calls and only falling back to that if a hash is not available and to verify the hash value was not some sort of collision.
Performance figures and numbers are available in the last patch. This actually speeds up operations by nearly 33%, so this is not a total waste of time to consider. :) Review and questions/comments/concerns welcome!
That's nice and short :)
On 15/12/10 04:46, Dan McGee wrote:
My only comment is really a question: would it be better to have an _alpm_pkg_set_name(pmpkg_t *) function that automatically updates the hash? It is a tradeoff between always having to remember to update the hash after adjusting pmpkg_t->name (which seems likely to get missed at some stage) and the added complexity, and I am undecided on it.
Allan
On Tue, Dec 14, 2010 at 5:00 PM, Allan McRae wrote:
My only comment is really a question: would it be better to have an _alpm_pkg_set_name(pmpkg_t *) function that automatically updates the hash? It is a tradeoff between always having to remember to update the hash after adjusting pmpkg_t->name (which seems likely to get missed at some stage) and the added complexity, and I am undecided on it.
I thought about that as well; I realized I only had to update two places and "forgot" about it. If we did this, we would want to do something like we did for db and path: rename the field to _pkgname so people realize it shouldn't be mucked with directly, and all of a sudden the patch is huge. So I think it can be missed either way, but here's to automated testing hopefully catching the fallout when someone forgets it.
-Dan
On Tue, Dec 14, 2010 at 12:46:15PM -0600, Dan McGee wrote:
Well, nothing's broken so far. Installed a few new packages with -U and ran an -Syu with no trouble. Speed improvement is there, but nothing significant on my smallish database of 500 packages. These patches are boring -- they just work. I thought you had to be brave to use pacman-git. Stop boring me, Dan. (very nice work though)
dave
participants (5)
- Allan McRae
- Dan McGee
- Dan McGee
- Dave Reisner
- Xavier Chantry