[pacman-dev] A WIP fix for removing unneeded dependency cycles
This is a work in progress patchset. DO NOT MERGE! I've been looking at a fix for FS#41031 [1]. What I've attempted to do in this prototype is to fix the issue without needing to calculate the strongly connected components of the dependency graph to see if I could improve on run time performance or memory consumption. The way this works is I first perform depth-first cycle detection on the graph. When a cycle is detected the list of packages in the cycle is added to the removes list of each package in the cycle. This marks a package as "in a cycle". After this preprocessing step a package is determined as removable by pretending that every other package in the cycle is being removed, and then determining what or not the package itself is removable as normal. My thinking from this prototype though is that it isn't as great a solution as calculating the strongly connected components of the dependency graph. Doing cycle detection this way means that dependency graphs with packages belonging to more than one cycle aren't detected. Although this is a crazy edge case, it's theoretically a potential graph, and implementing Tarjan's strongly connected components algorithm instead of the cycle detection used here shouldn't be much more work. It can still use the removes array and the `_can_remove_cycle` method for determining whether a cycle is removable; only the preprocessing step would need to change. The only way that I think calculating the strongly connected components could be worse than this prototype is that it requires a list of every vertex in the dependency graph. This is easy enough to calculate with a breadth first search but it will use more memory than simple cycle detection. So I wanted to see what people's thoughts were on these two solutions and if there was a preference to one or the other. Ashley [1] https://bugs.archlinux.org/task/41031
--- lib/libalpm/deps.c | 224 ++++++++++++++++++++++++--- test/pacman/tests/TESTS | 1 + test/pacman/tests/remove-dependency-cycle.py | 25 +++ 3 files changed, 231 insertions(+), 19 deletions(-) create mode 100644 test/pacman/tests/remove-dependency-cycle.py diff --git a/lib/libalpm/deps.c b/lib/libalpm/deps.c index c35a275..5ee5137 100644 --- a/lib/libalpm/deps.c +++ b/lib/libalpm/deps.c @@ -591,6 +591,191 @@ static int can_remove_package(alpm_db_t *db, alpm_pkg_t *pkg, return 1; } +// TODO: Write test for multi entry graph (ie two targets depend on a package +// each which depend on a cycle. +// TODO: Test the removal of a package belonging to a cycle isn't removed +int _alpm_find_cycles(alpm_graph_t *v, alpm_list_t **path) +{ + v->state = -1; + *path = alpm_list_add(*path, v->data); + + for(alpm_list_t *child_i= v->children; child_i; child_i = child_i->next) { + alpm_graph_t *child = child_i->data; + if(child->state == -1) { + alpm_pkg_t *child_pkg = child->data; + if (child_pkg->removes) { + /* A cycle has already been identified on this package. + * Note that a single package that's part of two cycles is unsupported. + * Not just because of this check but because this method of cycle + * detection does not support figure of 8 dependency cycles due to only + * adding the currently detected cycle to each packages removes list, + * not every cycle that every package in the cycle belongs to. + */ + continue; + } + + alpm_list_t *cycle_pkgs; + + for(cycle_pkgs = *path; cycle_pkgs; cycle_pkgs = cycle_pkgs->next) { + alpm_pkg_t *cycle_pkg = cycle_pkgs->data; + if(cycle_pkg->name_hash == child_pkg->name_hash) { + break; + } + } + + if (cycle_pkgs == NULL) { + /* False alarm. We've seen this node before but on a different path. + * Therefore the node has already been processed. */ + continue; + } + + /* Store the cycle in the removes of each package */ + for(alpm_list_t *i = cycle_pkgs; i; i = i->next) { + alpm_pkg_t *cycle_pkg = i->data; + + cycle_pkgs = alpm_list_remove_item(cycle_pkgs, i); + + alpm_list_t *removes = alpm_list_copy(cycle_pkgs); + + cycle_pkg->removes = alpm_list_join(cycle_pkg->removes, removes); + + /* Reinsert cycle_pkg into the list */ + if (i->next == NULL) { + /* i was the last item in the list */ + i->prev->next = i; + cycle_pkgs->prev = i; + } + else if (i->next == cycle_pkgs) { + /* i was the first item in the list */ + cycle_pkgs->prev = i; + cycle_pkgs = i; + } + else { + /* i was in the middle of the list somewhere */ + i->next->prev = i; + i->prev->next = i; + } + } + } + else { + if(_alpm_find_cycles(child, path)) { + return 1; + } + } + } + + *path = alpm_list_remove_item(*path, alpm_list_last(*path)); + + return 0; +} + +void _alpm_graph_unvisit_all(alpm_graph_t *v) +{ + if (v->state == 0) { + return; + } + + v->state = 0; + for(alpm_list_t *child_i = v->children; child_i; child_i = child_i->next) { + alpm_graph_t *child = child_i->data; + if(child->state) { + _alpm_graph_unvisit_all(child); + } + } +} + +/** + * @brief Checks that a cycle can be removed. + * In its `removes` list, + * a cycle package should have a list of the other packages in the cycle. + * + * A cycle can be removed if no packages outside of the cycle depend on any + * packages in the cycle. + * + * @param TODO + * @returns true if the cycle can be removed. false otherwise. + */ +int _can_remove_cycle(alpm_db_t *db, alpm_pkg_t *cycle_pkg, alpm_list_t *targs, + int include_explicit) +{ + alpm_list_t *to_check = alpm_list_copy(cycle_pkg->removes); + to_check = alpm_list_add(to_check, cycle_pkg); + int retval = 1; + + _alpm_log(db->handle, ALPM_LOG_DEBUG, "checks\n"); + for(alpm_list_t *i = to_check; i; i = i->next) { + alpm_pkg_t *pkg = i->data; + /*_alpm_log(db->handle, ALPM_LOG_DEBUG, "checking if '%s' can be removed\n", + pkg->name);*/ + + /* Pretend we are removing the rest of the cycle */ + alpm_list_t *targs_with_cycle = alpm_list_copy(targs); + alpm_list_t *cycle_pkgs = alpm_list_copy(pkg->removes); + targs_with_cycle = alpm_list_join(targs_with_cycle, cycle_pkgs); + + int can_rm = can_remove_package(db, pkg, targs_with_cycle, include_explicit); + + alpm_list_free(targs_with_cycle); + + if (!can_rm) { + //_alpm_log(db->handle, ALPM_LOG_DEBUG, "It can't\n"); + retval = 0; + break; + } + //_alpm_log(db->handle, ALPM_LOG_DEBUG, "It can!\n"); + } + + /* Cleanup the creation of to_check */ + alpm_list_free(to_check); + + return retval; +} + +int _alpm_find_removables(alpm_db_t *db, alpm_graph_t *v, alpm_list_t **targs, int include_explicit) +{ + if (v->state) { + return 0; + } + + v->state = -1; + alpm_pkg_t *pkg = v->data; + + alpm_list_t *to_remove = NULL; + if(pkg->removes != NULL) { + if(_can_remove_cycle(db, pkg, *targs, include_explicit)) { + to_remove = alpm_list_add(to_remove, pkg); + to_remove = alpm_list_join(to_remove, alpm_list_copy(pkg->removes)); + } + } + else if(can_remove_package(db, pkg, *targs, include_explicit)) { + to_remove = alpm_list_add(to_remove, pkg); + } + + for(alpm_list_t *k = to_remove; k; k = k->next) { + alpm_pkg_t *rm_pkg = k->data; + alpm_pkg_t *copy = NULL; + _alpm_log(db->handle, ALPM_LOG_DEBUG, "adding '%s' to the targets\n", + rm_pkg->name); + /* add it to the target list */ + if(_alpm_pkg_dup(rm_pkg, ©)) { + /* we return memory on "non-fatal" error in _alpm_pkg_dup */ + _alpm_pkg_free(copy); + + alpm_list_free(to_remove); + return -1; + } + *targs = alpm_list_add(*targs, copy); + } + + alpm_list_free(to_remove); + + for(alpm_list_t *child = v->children; child; child = child->next) { + _alpm_find_removables(db, child->data, targs, include_explicit); + } + + return 0; +} + /** * @brief Adds unneeded dependencies to an existing list of packages. * By unneeded, we mean dependencies that are only required by packages in the @@ -604,32 +789,33 @@ static int can_remove_package(alpm_db_t *db, alpm_pkg_t *pkg, */ int _alpm_recursedeps(alpm_db_t *db, alpm_list_t **targs, int include_explicit) { - alpm_list_t *i, *j; + alpm_list_t *vertices; + int retval = 0; if(db == NULL || targs == NULL) { return -1; } - for(i = *targs; i; i = i->next) { - alpm_pkg_t *pkg = i->data; - for(j = _alpm_db_get_pkgcache(db); j; j = j->next) { - alpm_pkg_t *deppkg = j->data; - if(_alpm_pkg_depends_on(pkg, deppkg) - && can_remove_package(db, deppkg, *targs, include_explicit)) { - alpm_pkg_t *copy = NULL; - _alpm_log(db->handle, ALPM_LOG_DEBUG, "adding '%s' to the targets\n", - deppkg->name); - /* add it to the target list */ - if(_alpm_pkg_dup(deppkg, ©)) { - /* we return memory on "non-fatal" error in _alpm_pkg_dup */ - _alpm_pkg_free(copy); - return -1; - } - *targs = alpm_list_add(*targs, copy); - } + vertices = dep_graph_init(db->handle, *targs, NULL); + for(alpm_list_t *v_i = vertices; v_i; v_i = v_i->next) { + alpm_graph_t *v = v_i->data; + alpm_list_t *path = NULL; + + if (_alpm_find_cycles(v, &path)) { + retval = -2; + goto cleanup; } + + _alpm_graph_unvisit_all(v); + _alpm_find_removables(db, v, targs, include_explicit); + _alpm_graph_unvisit_all(v); } - return 0; + +cleanup: + alpm_list_free_inner(vertices, _alpm_graph_free); + alpm_list_free(vertices); + + return retval; } /** diff --git a/test/pacman/tests/TESTS b/test/pacman/tests/TESTS index 62d1f2a..eee845f 100644 --- a/test/pacman/tests/TESTS +++ b/test/pacman/tests/TESTS @@ -109,6 +109,7 @@ TESTS += test/pacman/tests/querycheck002.py TESTS += test/pacman/tests/querycheck_fast_file_type.py TESTS += test/pacman/tests/reason001.py TESTS += test/pacman/tests/remove-assumeinstalled.py +TESTS += test/pacman/tests/remove-dependency-cycle.py TESTS += test/pacman/tests/remove001.py TESTS += test/pacman/tests/remove002.py TESTS += test/pacman/tests/remove010.py diff --git a/test/pacman/tests/remove-dependency-cycle.py b/test/pacman/tests/remove-dependency-cycle.py new file mode 100644 index 0000000..a4769de --- /dev/null +++ b/test/pacman/tests/remove-dependency-cycle.py @@ -0,0 +1,25 @@ +self.description = "Recursively removing a package depending on a cycle removes the cycle" + +cycle_dep1 = pmpkg("cycle_dep1") +self.addpkg2db("local", cycle_dep1) + +cycle_pkg1 = pmpkg("cycle_pkg1") +cycle_pkg2 = pmpkg("cycle_pkg2") + +cycle_pkg1.depends = ["cycle_dep1", "cycle_pkg2"] +cycle_pkg2.depends= ["cycle_pkg1"] + +self.addpkg2db("local", cycle_pkg1) +self.addpkg2db("local", cycle_pkg2) + +removal_target = pmpkg("removal_target") +removal_target.depends = ["cycle_pkg1"] +self.addpkg2db("local", removal_target) + +self.args = "-Rss removal_target" + +self.addrule("PACMAN_RETCODE=0") +self.addrule("!PKG_EXIST=cycle_dep1") +self.addrule("!PKG_EXIST=cycle_pkg1") +self.addrule("!PKG_EXIST=cycle_pkg2") +self.addrule("!PKG_EXIST=removal_target") -- 2.8.0
On 09/05/16 01:08, Ashley Whetter wrote:
--- lib/libalpm/deps.c | 224 ++++++++++++++++++++++++--- test/pacman/tests/TESTS | 1 + test/pacman/tests/remove-dependency-cycle.py | 25 +++ 3 files changed, 231 insertions(+), 19 deletions(-) create mode 100644 test/pacman/tests/remove-dependency-cycle.py
Have you been given any comments on this yet? I have not looked in the patch itself as from memory you were discussing this with Andrew. Is that right? Ihe inclusion of a test case is great. Even better that it passes! Did you check it did not pass before applying your patch too? Please test your build using "--enable-git-version --enable-debug --enable-warningflags" deps.c:597:5: error: no previous prototype for ‘_alpm_find_cycles’ [-Werror=missing-prototypes] int _alpm_find_cycles(alpm_graph_t *v, alpm_list_t **path) ^~~~~~~~~~~~~~~~~ deps.c:672:6: error: no previous prototype for ‘_alpm_graph_unvisit_all’ [-Werror=missing-prototypes] void _alpm_graph_unvisit_all(alpm_graph_t *v) ^~~~~~~~~~~~~~~~~~~~~~~ deps.c:698:5: error: no previous prototype for ‘_can_remove_cycle’ [-Werror=missing-prototypes] int _can_remove_cycle(alpm_db_t *db, alpm_pkg_t *cycle_pkg, alpm_list_t *targs, ^~~~~~~~~~~~~~~~~ deps.c:734:5: error: no previous prototype for ‘_alpm_find_removables’ [-Werror=missing-prototypes] int _alpm_find_removables(alpm_db_t *db, alpm_graph_t *v, alpm_list_t **targs, int include_explicit) ^~~~~~~~~~~~~~~~~~~~~
On Wed, 18 May 2016 15:14:08 +1000, Allan McRae <allan@archlinux.org> wrote:
On 09/05/16 01:08, Ashley Whetter wrote:
--- lib/libalpm/deps.c | 224 ++++++++++++++++++++++++--- test/pacman/tests/TESTS | 1 + test/pacman/tests/remove-dependency-cycle.py | 25 +++ 3 files changed, 231 insertions(+), 19 deletions(-) create mode 100644 test/pacman/tests/remove-dependency-cycle.py
Have you been given any comments on this yet? I have not looked in the patch itself as from memory you were discussing this with Andrew. Is that right?
I've not had any comments as of yet. I did have a brief discussion with Andrew very early on about his suggestion on the flyspray issue. The outcome was that it was too difficult to filter the list of all dependencies, which I started looking towards graph methods instead.
Ihe inclusion of a test case is great. Even better that it passes! Did you check it did not pass before applying your patch too?
I did check. Took me some time to get it to not fail ;) I need to add a few more test cases before I submit the final version. For example for when two packages are being removed that have the same dependency cycle, when an explicitly installed package exists in a cycle, and when a removal target exists in a cycle.
Please test your build using "--enable-git-version --enable-debug --enable-warningflags"
deps.c:597:5: error: no previous prototype for ‘_alpm_find_cycles’ [-Werror=missing-prototypes] int _alpm_find_cycles(alpm_graph_t *v, alpm_list_t **path) ^~~~~~~~~~~~~~~~~ deps.c:672:6: error: no previous prototype for ‘_alpm_graph_unvisit_all’ [-Werror=missing-prototypes] void _alpm_graph_unvisit_all(alpm_graph_t *v) ^~~~~~~~~~~~~~~~~~~~~~~ deps.c:698:5: error: no previous prototype for ‘_can_remove_cycle’ [-Werror=missing-prototypes] int _can_remove_cycle(alpm_db_t *db, alpm_pkg_t *cycle_pkg, alpm_list_t *targs, ^~~~~~~~~~~~~~~~~ deps.c:734:5: error: no previous prototype for ‘_alpm_find_removables’ [-Werror=missing-prototypes] int _alpm_find_removables(alpm_db_t *db, alpm_graph_t *v, alpm_list_t **targs, int include_explicit) ^~~~~~~~~~~~~~~~~~~~~
I will do this for the final version. I may (separately) add this suggestion to HACKING as well. Ashley
On 05/08/16 at 04:08pm, Ashley Whetter wrote:
--- lib/libalpm/deps.c | 224 ++++++++++++++++++++++++--- test/pacman/tests/TESTS | 1 + test/pacman/tests/remove-dependency-cycle.py | 25 +++ 3 files changed, 231 insertions(+), 19 deletions(-) create mode 100644 test/pacman/tests/remove-dependency-cycle.py
This is way more complicated than necessary. For larger, more complex, dependency chains the performance degrades significantly and it gives incorrect results. I still think the best solution is to just select the entire dependency chain then filter out packages required by packages we won't be uninstalling. I've pushed an initial draft to my recursedeps branch: https://github.com/andrewgregory/pacman/commits/recursedeps I modified your test to be more strenuous in order to compare performance and got the following results: origin/master: 0.11s FAIL awhetter/recursedeps: 22.71s FAIL (only passes with limit < 4) apg/recursedeps: 0.20s PASS Unless there's something critically wrong with my patch, I think using a graph to detect cycles is a dead-end. The modified test: self.description = "Recursively removing a package depending on a cycle removes the cycle" cycle_dep1 = pmpkg("cycle_dep1") self.addpkg2db("local", cycle_dep1) cycle_pkg1 = pmpkg("cycle_pkg1") cycle_pkg2 = pmpkg("cycle_pkg2") cycle_pkg1.depends = ["cycle_dep1", "cycle_pkg2"] cycle_pkg2.depends= ["cycle_pkg1"] self.addpkg2db("local", cycle_pkg1) self.addpkg2db("local", cycle_pkg2) removal_target = pmpkg("removal_target") removal_target.depends = ["cycle_pkg1", "pkg1"] self.addpkg2db("local", removal_target) limit = 1000 self.addpkg2db("local", pmpkg("pkg%d" % 0)) self.addpkg2db("local", pmpkg("pkg%d" % limit)) for i in xrange(1, limit): pkg = pmpkg("pkg%d" % i) pkg.depends = [ "pkg%d" % (i - 1), "pkg%d" % (i + 1) ] self.addpkg2db("local", pkg) outsider = pmpkg("outsider") outsider.depends = [ "pkg%d" % (limit - 0) ] self.addpkg2db("local", outsider) self.args = "-Rss removal_target" self.addrule("PACMAN_RETCODE=0") self.addrule("!PKG_EXIST=cycle_dep1") self.addrule("!PKG_EXIST=cycle_pkg1") self.addrule("!PKG_EXIST=cycle_pkg2") self.addrule("!PKG_EXIST=pkg0") self.addrule("!PKG_EXIST=pkg1") self.addrule("!PKG_EXIST=pkg%d" % (limit - 1)) self.addrule("!PKG_EXIST=removal_target") self.addrule("PKG_EXIST=pkg%d" % limit) self.addrule("PKG_EXIST=outsider")
participants (3)
-
Allan McRae
-
Andrew Gregory
-
Ashley Whetter