[pacman-dev] alpm_list: missing quick access to last element
Hi, I'm back to pacman-dev, and just just found this out profiling pacman-cvs (pacman -Ss test): ... ----------------------------------------------- 0.12 0.00 13785/13785 alpm_list_add [7] [8] 42.9 0.12 0.00 13785 alpm_list_last [8] ----------------------------------------------- ... alpm_list_last is eating up 42.9% CPU-time on my slow epia system. This is due the missing "last element hint" which speeded up the add operation before aaron introduced the public alpm_list type two months ago. I consider adding the "last element" member again. Any objections? Jürgen
On 3/4/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
Hi,
I'm back to pacman-dev, and just just found this out profiling pacman-cvs (pacman -Ss test):
... ----------------------------------------------- 0.12 0.00 13785/13785 alpm_list_add [7] [8] 42.9 0.12 0.00 13785 alpm_list_last [8] ----------------------------------------------- ...
alpm_list_last is eating up 42.9% CPU-time on my slow epia system. This is due the missing "last element hint" which speeded up the add operation before aaron introduced the public alpm_list type two months ago. I consider adding the "last element" member again. Any objections?
Yeah, something like this definitely needs to be done. HOWEVER, let's hold off until post-3.0.0 release. We are in bugfix-only mode for the time being, and this is more than a bugfix. If you want to patch it locally, then we can probably get it into 3.0.1, but I really want to stick with this bugfix-only mentality otherwise we will just keep finding things we want to fix and never stay quite stable enough to release. By the way, we are thinking of changing some of the data structures used, and we have at least glanced at these things: kernel-style linked lists <http://isis.poly.edu/kulesh/stuff/src/klist/>, a hash-table package cache, and something along this line- <http://tech.inhelsinki.nl/d_store/>. -Dan
On Sun, Mar 04, 2007 at 08:40:38PM -0500, Dan McGee wrote:
On 3/4/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
Hi,
I'm back to pacman-dev, and just just found this out profiling pacman-cvs (pacman -Ss test):
... ----------------------------------------------- 0.12 0.00 13785/13785 alpm_list_add [7] [8] 42.9 0.12 0.00 13785 alpm_list_last [8] ----------------------------------------------- ...
alpm_list_last is eating up 42.9% CPU-time on my slow epia system. This is due the missing "last element hint" which speeded up the add operation before aaron introduced the public alpm_list type two months ago. I consider adding the "last element" member again. Any objections?
Yeah, something like this definitely needs to be done. HOWEVER, let's hold off until post-3.0.0 release. We are in bugfix-only mode for the time being, and this is more than a bugfix. If you want to patch it locally, then we can probably get it into 3.0.1, but I really want to stick with this bugfix-only mentality otherwise we will just keep finding things we want to fix and never stay quite stable enough to release.
By the way, we are thinking of changing some of the data structures used, and we have at least glanced at these things: kernel-style linked lists <http://isis.poly.edu/kulesh/stuff/src/klist/> hash-table package cache, and something along this line-
I have looked into kernel-style linked lists. I like the implementation but it has the same shortcomings like the current implementation (it doesn't allow easy access to the last element, required by frequent append operations in pacmans code). Also Judds previous implementation (using the last element in every member) is a little bit error-prone, because it doesn't distinguish between the list and list elements (think of empty lists). I prefer an implementation using sentinel nodes (storing pointer to first and last element). Jürgen
On 3/4/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
due the missing "last element hint" which speeded up the add operation before aaron introduced the public alpm_list type two months ago. I consider adding the "last element" member again. Any objections?
Actually, that's probably a bad idea. It's almost trivial to convert the list to a circular one. Then alpm_list_last(foo) would just be foo->prev; There should never be a case when this would cause a problem (i.e. some sort of "scan backward until ->prev is null" function would only be useful in the case of bad design).
When i understood the current pacman code, there is a cycle check, therefore i don't see a reason why ->last should not be provided. Though, i don't completely understand when you build a linear depend list why you should want to step backwards anyway, especially not in Ss mode. On 3/5/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On 3/4/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
due the missing "last element hint" which speeded up the add operation before aaron introduced the public alpm_list type two months ago. I consider adding the "last element" member again. Any objections?
Actually, that's probably a bad idea. It's almost trivial to convert the list to a circular one. Then alpm_list_last(foo) would just be foo->prev;
There should never be a case when this would cause a problem (i.e. some sort of "scan backward until ->prev is null" function would only be useful in the case of bad design). _______________________________________________ pacman-dev mailing list pacman-dev@archlinux.org http://www.archlinux.org/mailman/listinfo/pacman-dev
On Sun, Mar 04, 2007 at 10:15:44PM -0600, Aaron Griffin wrote:
On 3/4/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
due the missing "last element hint" which speeded up the add operation before aaron introduced the public alpm_list type two months ago. I consider adding the "last element" member again. Any objections?
Actually, that's probably a bad idea. It's almost trivial to convert the list to a circular one. Then alpm_list_last(foo) would just be foo->prev;
This is the way Judd implemented it: He needed fast access to the last element and also the "prev" element to allow fast inserts. So he used non-circular double linked lists and the "last element hint".
There should never be a case when this would cause a problem (i.e. some sort of "scan backward until ->prev is null" function would only be useful in the case of bad design).
Circular lists are different. There is NO end of the list. consider alpm_list_find: it will loop forever ;) pacmans implementation expects "non-circular" lists everywhere. Just an example: for(lp = info->files; lp; lp = lp->next) { fprintf(fp, "%s\n", (char *)lp->data); } Jürgen
On 3/5/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
Circular lists are different. There is NO end of the list. consider alpm_list_find: it will loop forever ;) pacmans implementation expects "non-circular" lists everywhere. Just an example:
for(lp = info->files; lp; lp = lp->next) { fprintf(fp, "%s\n", (char *)lp->data); }
True. Using circular lists would cause a bit of a problem here. The only problem with the ->last implementation is that it's a mostly NULL pointer. Generally, if something is NULL for an entire list, except for one node, there's probably a better way to do it. For instance, if we go with the fairly typical: struct list_node { struct list_node *next, *prev; void *data; }; struct list { struct list_node *head, *tail; }; It's easy to maintain the list head and tail in the main list struct, and not pay the price for hundreds of NULL pointers. Alternatively, adding entries at the beginning of the list would be nearly as good, as there is no requirement that says a standard list needs to be in any given order. (unless add_sorted is used, in which case the ->last optimization is moot as well). I guess my point is that there are hundreds of options here. I think we should think it through rather than putting the ->last pointer back in there.
On 3/5/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On 3/5/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
Circular lists are different. There is NO end of the list. consider alpm_list_find: it will loop forever ;) pacmans implementation expects "non-circular" lists everywhere. Just an example:
I guess my point is that there are hundreds of options here. I think we should think it through rather than putting the ->last pointer back in there.
Can someone point me to where this ->last pointer exists? I couldn't find it in CVS (but only looked briefly). Is it in pacman 2.9.8? -Dan
On 3/5/07, Dan McGee <dpmcgee@gmail.com> wrote:
On 3/5/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On 3/5/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
Circular lists are different. There is NO end of the list. consider alpm_list_find: it will loop forever ;) pacmans implementation expects "non-circular" lists everywhere. Just an example:
I guess my point is that there are hundreds of options here. I think we should think it through rather than putting the ->last pointer back in there.
Can someone point me to where this ->last pointer exists? I couldn't find it in CVS (but only looked briefly). Is it in pacman 2.9.8?
It's in the CVS Attic: http://cvs.archlinux.org/cgi-bin/viewcvs.cgi/lib/libalpm/Attic/list.c?cvsroo...
On Mon, Mar 05, 2007 at 09:56:43AM -0600, Aaron Griffin wrote:
On 3/5/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
Circular lists are different. There is NO end of the list. consider alpm_list_find: it will loop forever ;) pacmans implementation expects "non-circular" lists everywhere. Just an example:
for(lp = info->files; lp; lp = lp->next) { fprintf(fp, "%s\n", (char *)lp->data); }
True. Using circular lists would cause a bit of a problem here. The only problem with the ->last implementation is that it's a mostly NULL pointer. Generally, if something is NULL for an entire list, except for one node, there's probably a better way to do it. For instance, if we go with the fairly typical: struct list_node { struct list_node *next, *prev; void *data; }; struct list { struct list_node *head, *tail; };
Yes, "struct list" is like the sentinel node i proposed in my previous post. I'm quite impressed by the simple and elegant "Linux Kernel Linked Lists" (http://isis.poly.edu/kulesh/stuff/src/klist/ proposed by Dan) and currently merge them with ALPM's code into a new implementation using sentinel nodes. I will post details soon. Jürgen
On 3/5/07, Jürgen Hötzel <juergen@hoetzel.info> wrote:
I'm quite impressed by the simple and elegant "Linux Kernel Linked Lists" (http://isis.poly.edu/kulesh/stuff/src/klist/ proposed by Dan) and currently merge them with ALPM's code into a new implementation using sentinel nodes. I will post details soon.
Yes the kernel implementation is REALLY nice. I always loved it. I like that this guy actually took the extra step to remove the hardware prefetching from the actual kernel header. I also like the type safety it provides. That is a list of "pmpkg_t" structs enforces the type. It does not require void* aliasing. While it's unimportant really, the kernel implementation only uses 2 pointers per node, whereas a typical implementation of the same thing (like ours) requires two. I find it neat that they were able to do that. Go go kernel efficiency. This would, however, require some hefty changes but would make the code better in the long run (we will ALWAYS know the type of the lists).
On 3/5/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
whereas a typical implementation of the same thing (like ours) requires two.
requires THREE*
On Mon, Mar 05, 2007 at 01:14:27PM -0600, Aaron Griffin wrote:
On 3/5/07, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
whereas a typical implementation of the same thing (like ours) requires two.
requires THREE*
Well, this is because the list_header is embedded in the data structures. This requires us to change every pacman struct to include the list_header (if it used by the list functions). List callback function like alpm_list_fn_cmp will still require *void casting. There are other areas where libpacman is quite inefficient (in respect of memory usage). Consider the fixed size preallocated strings in pmpkg_t (sizeof(pmpkg_t) == 1948). Memory footprint of pacman reading current/extra/community requires currently 10M (desc files on disc allocate 0.8M). Jürgen
participants (4)
-
Aaron Griffin
-
Dan McGee
-
Georg Grabler
-
Jürgen Hötzel