[pacman-dev] [patch] patch IV rewritten for git
Hi! Some comments: 1. This is a simple topological sort algorithm, based on the "depth first search" algorithm, for more info plz visit: http://en.wikipedia.org/wiki/Topological_sorting (An alternative algorithm for...) Usually this is done by a recursive way but I not prefer that, because this version is more expressive to me (however this looks more difficult, because we need some extra variables to precisely store the current state). 2. As I noted in the patch (TODO), the most CPU-cost part of the algorithm is to build the graph, so this should somehow combined with depcheck (because that function does this too...) So, here is the patch: ---------------------- diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am index 29a2f02..afbfed0 100644 --- a/lib/libalpm/Makefile.am +++ b/lib/libalpm/Makefile.am @@ -43,7 +43,7 @@ libalpm_la_SOURCES = \ versioncmp.h versioncmp.c libalpm_la_LDFLAGS = -no-undefined -version-info $(LIB_VERSION_INFO) -libalpm_la_LIBADD = -larchive -ldownload -lm +libalpm_la_LIBADD = -larchive -ldownload if HAS_DOXYGEN all: doxygen.in diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h index d4c584f..e6c905f 100644 --- a/lib/libalpm/alpm.h +++ b/lib/libalpm/alpm.h @@ -50,6 +50,7 @@ typedef struct __pmsyncpkg_t pmsyncpkg_t; typedef struct __pmdepend_t pmdepend_t; typedef struct __pmdepmissing_t pmdepmissing_t; typedef struct __pmconflict_t pmconflict_t; +typedef struct __pmgraph_t pmgraph_t; /* * Library diff --git a/lib/libalpm/deps.c b/lib/libalpm/deps.c index b6f8cc8..ceee5c1 100644 --- a/lib/libalpm/deps.c +++ b/lib/libalpm/deps.c @@ -29,7 +29,6 @@ #ifdef __sun__ #include <strings.h> #endif -#include <math.h> /* libalpm */ #include "deps.h" @@ -46,6 +45,27 @@ extern pmhandle_t *handle; +pmgraph_t *_alpm_graph_new(void) +{ + pmgraph_t *graph = NULL; + + graph = (pmgraph_t *)malloc(sizeof(pmgraph_t)); + if(graph) { + graph->state = 0; + graph->data = NULL; + graph->father = NULL; + graph->sons = NULL; + graph->sonptr = NULL; + } + return(graph); +} + +void _alpm_graph_free(pmgraph_t *graph) +{ + alpm_list_free(graph->sons); + free(graph); +} + pmdepmissing_t *_alpm_depmiss_new(const char *target, pmdeptype_t type, pmdepmod_t depmod, const char *depname, const char *depversion) @@ -108,11 +128,10 @@ int _alpm_depmiss_isin(pmdepmissing_t *needle, alpm_list_t *haystack) alpm_list_t *_alpm_sortbydeps(alpm_list_t *targets, pmtranstype_t mode) { alpm_list_t *newtargs = NULL; - alpm_list_t *i, *j, *k, *l; - int change = 1; - int numscans = 0; - int numtargs = 0; - int maxscans; + alpm_list_t *i, *j, *k; + alpm_list_t *vertices = NULL; + alpm_list_t *vptr; + pmgraph_t *vertex; ALPM_LOG_FUNC; @@ -120,65 +139,64 @@ alpm_list_t *_alpm_sortbydeps(alpm_list_t *targets, pmtranstype_t mode) return(NULL); } + _alpm_log(PM_LOG_DEBUG, _("started sorting dependencies")); + + /* We create the vertices */ for(i = targets; i; i = i->next) { - newtargs = alpm_list_add(newtargs, i->data); - numtargs++; + pmgraph_t *vertex = _alpm_graph_new(); + vertex->data = (void *)i->data; + vertices = alpm_list_add(vertices, vertex); } - maxscans = (int)sqrt(numtargs); + /* We compute the edges */ + for(i = vertices; i; i = i->next) { + pmgraph_t *vertex_i = i->data; + pmpkg_t *p_i = vertex_i->data; + //TODO: this should be somehow combined with _alpm_checkdeps + for(j = vertices; j; j = j->next) { + pmgraph_t *vertex_j = j->data; + pmpkg_t *p_j = vertex_j->data; + int son=0; + for(k = alpm_pkg_get_depends(p_i); k && !son; k = k->next) { + pmdepend_t *depend = alpm_splitdep(k->data); + son = alpm_depcmp(p_j, depend); + free(depend); + } + if(son) vertex_i->sons=alpm_list_add(vertex_i->sons, vertex_j); + } + vertex_i->sonptr=vertex_i->sons; + } - _alpm_log(PM_LOG_DEBUG, _("started sorting dependencies")); - while(change) { - alpm_list_t *tmptargs = NULL; - change = 0; - if(numscans > maxscans) { - _alpm_log(PM_LOG_DEBUG, _("possible dependency cycle detected")); - continue; + vptr = vertices; + vertex = vertices->data; + while(vptr) { + vertex->state = -1; // we touched this vertex + int found = 0; + while(vertex->sonptr && !found) { + pmgraph_t *nextson = (vertex->sonptr)->data; + vertex->sonptr = (vertex->sonptr)->next; + if (nextson->state == 0) { + found = 1; + nextson->father = vertex; + vertex = nextson; + } + else if(nextson->state == -1) { + _alpm_log(PM_LOG_WARNING, _("dependency cycle detected\n")); + } } - numscans++; - /* run thru targets, moving up packages as necessary */ - for(i = newtargs; i; i = i->next) { - pmpkg_t *p = i->data; - _alpm_log(PM_LOG_DEBUG, " sorting %s", alpm_pkg_get_name(p)); - for(j = alpm_pkg_get_depends(p); j; j = j->next) { - pmdepend_t *depend = alpm_splitdep(j->data); - pmpkg_t *q = NULL; - if(depend == NULL) { - continue; - } - /* look for depend->name -- if it's farther down in the list, then - * move it up above p - */ - for(k = i->next; k; k = k->next) { - q = k->data; - const char *qname = alpm_pkg_get_name(q); - if(!strcmp(depend->name, qname)) { - if(!_alpm_pkg_find(qname, tmptargs)) { - change = 1; - tmptargs = alpm_list_add(tmptargs, q); - } - break; - } - for(l = alpm_pkg_get_provides(q); l; l = l->next) { - const char *provname = l->data; - if(!strcmp(depend->name, provname)) { - if(!_alpm_pkg_find(qname, tmptargs)) { - change = 1; - tmptargs = alpm_list_add(tmptargs, q); - } - break; - } - } + if(!found) { + newtargs = alpm_list_add(newtargs, vertex->data); + vertex->state = 1; //we leave this vertex + vertex = vertex->father; + if(!vertex) { + while(vptr = vptr->next) { + vertex = vptr->data; + if (vertex->state == 0) break; } - free(depend); - } - if(!_alpm_pkg_find(alpm_pkg_get_name(p), tmptargs)) { - tmptargs = alpm_list_add(tmptargs, p); } } - alpm_list_free(newtargs); - newtargs = tmptargs; } + _alpm_log(PM_LOG_DEBUG, _("sorting dependencies finished")); if(mode == PM_TRANS_TYPE_REMOVE) { @@ -189,6 +207,8 @@ alpm_list_t *_alpm_sortbydeps(alpm_list_t *targets, pmtranstype_t mode) newtargs = tmptargs; } + alpm_list_free_inner(vertices, _alpm_graph_free); + return(newtargs); } diff --git a/lib/libalpm/deps.h b/lib/libalpm/deps.h index 8f3a9b9..56028a4 100644 --- a/lib/libalpm/deps.h +++ b/lib/libalpm/deps.h @@ -42,6 +42,18 @@ struct __pmdepmissing_t { pmdepend_t depend; }; +/* Graphs */ +struct __pmgraph_t { + int state; //0: untouched, -1: entered, other: leaving time + void *data; + struct __pmgraph_t *father; //where did we come from? + alpm_list_t *sons; + alpm_list_t *sonptr; // points to a son in the sons list +}; + +pmgraph_t *_alpm_graph_new(void); +void _alpm_graph_free(pmgraph_t *graph); + pmdepmissing_t *_alpm_depmiss_new(const char *target, pmdeptype_t type, pmdepmod_t depmod, const char *depname, const char *depversion); diff --git a/src/pacman/Makefile.am b/src/pacman/Makefile.am index 6a7f4e4..1ab5916 100644 --- a/src/pacman/Makefile.am +++ b/src/pacman/Makefile.am @@ -26,11 +26,11 @@ pacman_SOURCES = \ util.h util.c pacman_LDADD = $(top_builddir)/lib/libalpm/.libs/libalpm.la \ - -ldownload + -ldownload -lm pacman_static_SOURCES = $(pacman_SOURCES) pacman_static_LDFLAGS = $(LDFLAGS) -all-static pacman_static_LDADD = $(top_builddir)/lib/libalpm/.libs/libalpm.la \ - -ldownload + -ldownload -lm # vim:set ts=2 sw=2 noet:
2007/6/10, Nagy Gabor <ngaba@petra.hos.u-szeged.hu>:
Hi! Some comments: 1. This is a simple topological sort algorithm, based on the "depth first search" algorithm, for more info plz visit: http://en.wikipedia.org/wiki/Topological_sorting (An alternative algorithm for...) Usually this is done by a recursive way but I not prefer that, because this version is more expressive to me (however this looks more difficult, because we need some extra variables to precisely store the current state). 2. As I noted in the patch (TODO), the most CPU-cost part of the algorithm is to build the graph, so this should somehow combined with depcheck (because that function does this too...) So, here is the patch:
Thanks for submitting this again, I love it :) I really didn't like the sqrt(n) speed hack which produced totally wrong result. btw look at this : http://bugs.archlinux.org/task/7229 Anyway, also applied it to my tree : http://chantry.homelinux.org/~xav/git/gitweb.cgi?p=pacman.git;a=commit;h=9bf... Thanks
On 6/10/07, Nagy Gabor <ngaba@petra.hos.u-szeged.hu> wrote:
diff --git a/lib/libalpm/deps.h b/lib/libalpm/deps.h index 8f3a9b9..56028a4 100644 --- a/lib/libalpm/deps.h +++ b/lib/libalpm/deps.h @@ -42,6 +42,18 @@ struct __pmdepmissing_t { pmdepend_t depend; };
+/* Graphs */ +struct __pmgraph_t { + int state; //0: untouched, -1: entered, other: leaving time + void *data; + struct __pmgraph_t *father; //where did we come from? + alpm_list_t *sons; + alpm_list_t *sonptr; // points to a son in the sons list
For consistancy with the rest of the libalpm/pacman code, please make sure you are using only the /* */ comment style and not //. I'm fixing it now, but just a FYI. I'm also renaming sons and father to child and parent, respectively, to be a bit more gender-friendly. Don't want to upset the female packages, now do we? -Dan
Cool. From my point of view I could argue that the recursive algorithm would be more expressive and obvious because I find tree/DAG recursive functions more natural. However from an efficiency standpoint a topological sort is a simple (breadth first) postfix tree traversal O(n) whereas your algorithm is at least O(N^2). But it's better than the previous algo. :) k On Sun, 2007-06-10 at 23:08 +0200, Nagy Gabor wrote:
Hi! Some comments: 1. This is a simple topological sort algorithm, based on the "depth first search" algorithm, for more info plz visit: http://en.wikipedia.org/wiki/Topological_sorting (An alternative algorithm for...) Usually this is done by a recursive way but I not prefer that, because this version is more expressive to me (however this looks more difficult, because we need some extra variables to precisely store the current state). 2. As I noted in the patch (TODO), the most CPU-cost part of the algorithm is to build the graph, so this should somehow combined with depcheck (because that function does this too...) So, here is the patch: ---------------------- diff --git a/lib/libalpm/Makefile.am b/lib/libalpm/Makefile.am index 29a2f02..afbfed0 100644 --- a/lib/libalpm/Makefile.am +++ b/lib/libalpm/Makefile.am @@ -43,7 +43,7 @@ libalpm_la_SOURCES = \ versioncmp.h versioncmp.c
libalpm_la_LDFLAGS = -no-undefined -version-info $(LIB_VERSION_INFO) -libalpm_la_LIBADD = -larchive -ldownload -lm +libalpm_la_LIBADD = -larchive -ldownload
if HAS_DOXYGEN all: doxygen.in diff --git a/lib/libalpm/alpm.h b/lib/libalpm/alpm.h index d4c584f..e6c905f 100644 --- a/lib/libalpm/alpm.h +++ b/lib/libalpm/alpm.h @@ -50,6 +50,7 @@ typedef struct __pmsyncpkg_t pmsyncpkg_t; typedef struct __pmdepend_t pmdepend_t; typedef struct __pmdepmissing_t pmdepmissing_t; typedef struct __pmconflict_t pmconflict_t; +typedef struct __pmgraph_t pmgraph_t;
/* * Library diff --git a/lib/libalpm/deps.c b/lib/libalpm/deps.c index b6f8cc8..ceee5c1 100644 --- a/lib/libalpm/deps.c +++ b/lib/libalpm/deps.c @@ -29,7 +29,6 @@ #ifdef __sun__ #include <strings.h> #endif -#include <math.h>
/* libalpm */ #include "deps.h" @@ -46,6 +45,27 @@
extern pmhandle_t *handle;
+pmgraph_t *_alpm_graph_new(void) +{ + pmgraph_t *graph = NULL; + + graph = (pmgraph_t *)malloc(sizeof(pmgraph_t)); + if(graph) { + graph->state = 0; + graph->data = NULL; + graph->father = NULL; + graph->sons = NULL; + graph->sonptr = NULL; + } + return(graph); +} + +void _alpm_graph_free(pmgraph_t *graph) +{ + alpm_list_free(graph->sons); + free(graph); +} + pmdepmissing_t *_alpm_depmiss_new(const char *target, pmdeptype_t type, pmdepmod_t depmod, const char *depname, const char *depversion) @@ -108,11 +128,10 @@ int _alpm_depmiss_isin(pmdepmissing_t *needle, alpm_list_t *haystack) alpm_list_t *_alpm_sortbydeps(alpm_list_t *targets, pmtranstype_t mode) { alpm_list_t *newtargs = NULL; - alpm_list_t *i, *j, *k, *l; - int change = 1; - int numscans = 0; - int numtargs = 0; - int maxscans; + alpm_list_t *i, *j, *k; + alpm_list_t *vertices = NULL; + alpm_list_t *vptr; + pmgraph_t *vertex;
ALPM_LOG_FUNC;
@@ -120,65 +139,64 @@ alpm_list_t *_alpm_sortbydeps(alpm_list_t *targets, pmtranstype_t mode) return(NULL); }
+ _alpm_log(PM_LOG_DEBUG, _("started sorting dependencies")); + + /* We create the vertices */ for(i = targets; i; i = i->next) { - newtargs = alpm_list_add(newtargs, i->data); - numtargs++; + pmgraph_t *vertex = _alpm_graph_new(); + vertex->data = (void *)i->data; + vertices = alpm_list_add(vertices, vertex); }
- maxscans = (int)sqrt(numtargs); + /* We compute the edges */ + for(i = vertices; i; i = i->next) { + pmgraph_t *vertex_i = i->data; + pmpkg_t *p_i = vertex_i->data; + //TODO: this should be somehow combined with _alpm_checkdeps + for(j = vertices; j; j = j->next) { + pmgraph_t *vertex_j = j->data; + pmpkg_t *p_j = vertex_j->data; + int son=0; + for(k = alpm_pkg_get_depends(p_i); k && !son; k = k->next) { + pmdepend_t *depend = alpm_splitdep(k->data); + son = alpm_depcmp(p_j, depend); + free(depend); + } + if(son) vertex_i->sons=alpm_list_add(vertex_i->sons, vertex_j); + } + vertex_i->sonptr=vertex_i->sons; + }
- _alpm_log(PM_LOG_DEBUG, _("started sorting dependencies")); - while(change) { - alpm_list_t *tmptargs = NULL; - change = 0; - if(numscans > maxscans) { - _alpm_log(PM_LOG_DEBUG, _("possible dependency cycle detected")); - continue; + vptr = vertices; + vertex = vertices->data; + while(vptr) { + vertex->state = -1; // we touched this vertex + int found = 0; + while(vertex->sonptr && !found) { + pmgraph_t *nextson = (vertex->sonptr)->data; + vertex->sonptr = (vertex->sonptr)->next; + if (nextson->state == 0) { + found = 1; + nextson->father = vertex; + vertex = nextson; + } + else if(nextson->state == -1) { + _alpm_log(PM_LOG_WARNING, _("dependency cycle detected\n")); + } } - numscans++; - /* run thru targets, moving up packages as necessary */ - for(i = newtargs; i; i = i->next) { - pmpkg_t *p = i->data; - _alpm_log(PM_LOG_DEBUG, " sorting %s", alpm_pkg_get_name(p)); - for(j = alpm_pkg_get_depends(p); j; j = j->next) { - pmdepend_t *depend = alpm_splitdep(j->data); - pmpkg_t *q = NULL; - if(depend == NULL) { - continue; - } - /* look for depend->name -- if it's farther down in the list, then - * move it up above p - */ - for(k = i->next; k; k = k->next) { - q = k->data; - const char *qname = alpm_pkg_get_name(q); - if(!strcmp(depend->name, qname)) { - if(!_alpm_pkg_find(qname, tmptargs)) { - change = 1; - tmptargs = alpm_list_add(tmptargs, q); - } - break; - } - for(l = alpm_pkg_get_provides(q); l; l = l->next) { - const char *provname = l->data; - if(!strcmp(depend->name, provname)) { - if(!_alpm_pkg_find(qname, tmptargs)) { - change = 1; - tmptargs = alpm_list_add(tmptargs, q); - } - break; - } - } + if(!found) { + newtargs = alpm_list_add(newtargs, vertex->data); + vertex->state = 1; //we leave this vertex + vertex = vertex->father; + if(!vertex) { + while(vptr = vptr->next) { + vertex = vptr->data; + if (vertex->state == 0) break; } - free(depend); - } - if(!_alpm_pkg_find(alpm_pkg_get_name(p), tmptargs)) { - tmptargs = alpm_list_add(tmptargs, p); } } - alpm_list_free(newtargs); - newtargs = tmptargs; } + _alpm_log(PM_LOG_DEBUG, _("sorting dependencies finished"));
if(mode == PM_TRANS_TYPE_REMOVE) { @@ -189,6 +207,8 @@ alpm_list_t *_alpm_sortbydeps(alpm_list_t *targets, pmtranstype_t mode) newtargs = tmptargs; }
+ alpm_list_free_inner(vertices, _alpm_graph_free); + return(newtargs); }
diff --git a/lib/libalpm/deps.h b/lib/libalpm/deps.h index 8f3a9b9..56028a4 100644 --- a/lib/libalpm/deps.h +++ b/lib/libalpm/deps.h @@ -42,6 +42,18 @@ struct __pmdepmissing_t { pmdepend_t depend; };
+/* Graphs */ +struct __pmgraph_t { + int state; //0: untouched, -1: entered, other: leaving time + void *data; + struct __pmgraph_t *father; //where did we come from? + alpm_list_t *sons; + alpm_list_t *sonptr; // points to a son in the sons list +}; + +pmgraph_t *_alpm_graph_new(void); +void _alpm_graph_free(pmgraph_t *graph); + pmdepmissing_t *_alpm_depmiss_new(const char *target, pmdeptype_t type, pmdepmod_t depmod, const char *depname, const char *depversion); diff --git a/src/pacman/Makefile.am b/src/pacman/Makefile.am index 6a7f4e4..1ab5916 100644 --- a/src/pacman/Makefile.am +++ b/src/pacman/Makefile.am @@ -26,11 +26,11 @@ pacman_SOURCES = \ util.h util.c
pacman_LDADD = $(top_builddir)/lib/libalpm/.libs/libalpm.la \ - -ldownload + -ldownload -lm
pacman_static_SOURCES = $(pacman_SOURCES) pacman_static_LDFLAGS = $(LDFLAGS) -all-static pacman_static_LDADD = $(top_builddir)/lib/libalpm/.libs/libalpm.la \ - -ldownload + -ldownload -lm
# vim:set ts=2 sw=2 noet:
_______________________________________________ pacman-dev mailing list pacman-dev@archlinux.org http://archlinux.org/mailman/listinfo/pacman-dev
On 6/12/07, Kevin Piche <kevin.piche@cgi.com> wrote:
Cool. From my point of view I could argue that the recursive algorithm would be more expressive and obvious because I find tree/DAG recursive functions more natural. However from an efficiency standpoint a topological sort is a simple (breadth first) postfix tree traversal O(n) whereas your algorithm is at least O(N^2). But it's better than the previous algo. :)
k
By the way, this landed in the main tree here: http://projects.archlinux.org/git/?p=pacman.git;a=commitdiff;h=544bcbe6641bb... We'll always take rewrites to make it more efficient. :P -Dan
I also find recursive algorithm more natural for recursive structure. Maybe Nagy is the only one who doesn't :) But as far as I can tell, this is a correct implementation of the topo sort, so O(n+e) for the sort part. Did you also include the computation of the edges, which is O(n^2), or is there something else that I overlooked ? This was already observed by Nagy here : http://archlinux.org/pipermail/pacman-dev/2007-April/008135.html Though when I first looked at this part, I find it odd, because O(n^2) shouldn't be needed there either, it's just a matter of : for t in targets do for d in deps(t) do add(d, t.children) But the problem is we don't have constant time access to the vertices.
I got the O(N^2) from the edge computation. If the package hierarchy could be stored as a forest of trees or DAGs then the topo sort would be O(N). A straight tree/DAG implementation would be an interesting project if I had the time. k On Tue, 2007-06-12 at 19:53 +0200, Xavier wrote:
I also find recursive algorithm more natural for recursive structure. Maybe Nagy is the only one who doesn't :) But as far as I can tell, this is a correct implementation of the topo sort, so O(n+e) for the sort part. Did you also include the computation of the edges, which is O(n^2), or is there something else that I overlooked ? This was already observed by Nagy here : http://archlinux.org/pipermail/pacman-dev/2007-April/008135.html
Though when I first looked at this part, I find it odd, because O(n^2) shouldn't be needed there either, it's just a matter of : for t in targets do for d in deps(t) do add(d, t.children) But the problem is we don't have constant time access to the vertices.
_______________________________________________ pacman-dev mailing list pacman-dev@archlinux.org http://archlinux.org/mailman/listinfo/pacman-dev
participants (4)
-
Dan McGee
-
Kevin Piche
-
Nagy Gabor
-
Xavier