[pacman-dev] [patch] alpm_removedeps bug+fix && alpm_depcmp-discussion
Hi! I attached the pactest file and my patch. I also made some modifications in alpm_removedeps' args, because the old one was a bit messy for me. And I think that my version is faster. can_remove_package is a bit stupid (compared to alpm_checkdeps), but I leave it untouched, because speed is more important here than "perfection". As you see, this patch contains my usual "for (pkg in pkgcache) if(alpm_depcmp(pkg, depend))..." stuff. We should discuss something now: This method has a drawback sometimes in speed: This method implicitly reads the pkgcache with full pkginfo, because alpm_depcmp needs provisions. If we've already done this (or we need to do this later) or we must collect ALL dependency satisfier in pkgcache this is not a problem. However, for example if we just want to know if there is ANY package in pkgcache which satisfies a dependency (something similar to can_remove_package without using requiredby fields), then FIRST check for package names THEN for providers is probably faster, if the pkgcache is loaded with pkgname-pkgver info only: because we have great chance to find a satisfier pkgname-pkgver pair and we can stop, we can answer the question without reading the ugly PKGINFO files for providers. Bye, ngaba ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program.
2007/6/18, ngaba@petra.hos.u-szeged.hu <ngaba@petra.hos.u-szeged.hu>:
As you see, this patch contains my usual "for (pkg in pkgcache) if(alpm_depcmp(pkg, depend))..." stuff. We should discuss something now: This method has a drawback sometimes in speed: This method implicitly reads the pkgcache with full pkginfo, because alpm_depcmp needs provisions. If we've already done this (or we need to do this later) or we must collect ALL dependency satisfier in pkgcache this is not a problem. However, for example if we just want to know if there is ANY package in pkgcache which satisfies a dependency (something similar to can_remove_package without using requiredby fields), then FIRST check for package names THEN for providers is probably faster, if the pkgcache is loaded with pkgname-pkgver info only: because we have great chance to find a satisfier pkgname-pkgver pair and we can stop, we can answer the question without reading the ugly PKGINFO files for providers.
Hmm, maybe this is an interesting idea, I don't know, but can't we decide about this later ? Once all this repetitive code is moved to one function, then it'll probably be much easier to see the benefit of only looking at providers in a second time. Sorry if I didn't understand or overlooked something :) Thanks
On 6/18/07, ngaba@petra.hos.u-szeged.hu <ngaba@petra.hos.u-szeged.hu> wrote:
Hi!
I attached the pactest file and my patch. I also made some modifications in alpm_removedeps' args, because the old one was a bit messy for me. And I think that my version is faster.
can_remove_package is a bit stupid (compared to alpm_checkdeps), but I leave it untouched, because speed is more important here than "perfection".
IMO, doing things right is what we want to do in 99% of the cases this speed vs. perfection issue comes up, so I'm going to disagree with you here. In addition, to assume anything only makes an ass out of u (you) and me. I don't want to introduce any functions that depend on a particular quirk of our list implementation, especially since this is one of the things we have considered changing. I feel that if we look at the bigger picture and stop thinking of add, remove, and sync as isolated actions, then we could get some HUGE refactoring and code reduction going on with cases like this. However, this will be one large undertaking. -Dan
2007/6/18, Dan McGee <dpmcgee@gmail.com>:
On 6/18/07, ngaba@petra.hos.u-szeged.hu <ngaba@petra.hos.u-szeged.hu> wrote:
Hi!
I attached the pactest file and my patch. I also made some modifications in alpm_removedeps' args, because the old one was a bit messy for me. And I think that my version is faster.
can_remove_package is a bit stupid (compared to alpm_checkdeps), but I leave it untouched, because speed is more important here than "perfection".
IMO, doing things right is what we want to do in 99% of the cases this speed vs. perfection issue comes up, so I'm going to disagree with you here.
It looks like checkdeps could indeed be used here. There is also the install reason which comes into play here though. This problem was mentioned here : http://bugs.archlinux.org/task/3241 Aaron didn't see any reason for checking that the packages removed by -Rs are only dependencies, but I believe there is one, as I tried to explain in my comment.
In addition, to assume anything only makes an ass out of u (you) and me. I don't want to introduce any functions that depend on a particular quirk of our list implementation, especially since this is one of the things we have considered changing.
What about just renaming alpm_list_add to alpm_list_add_last ? That way, there isn't any assumption, it's explicit :)
I feel that if we look at the bigger picture and stop thinking of add, remove, and sync as isolated actions, then we could get some HUGE refactoring and code reduction going on with cases like this. However, this will be one large undertaking.
There are often several ways to do things right, and we probably don't all see the same, so it would help if it was a bit more concrete :) You are probably the only one who can get things exactly like you want, unless maybe you explain it in details.
2007/6/18, Dan McGee <dpmcgee@gmail.com>:
In addition, to assume anything only makes an ass out of u (you) and me. I don't want to introduce any functions that depend on a particular quirk of our list implementation, especially since this is one of the things we have considered changing.
To Nagy : I don't even understand why this assumption is needed : * assumptions: alpm_list_add adds new member to the end of the list, so we can reuse the list pointer As far as I can tell, this isn't needed, so please explain more in detail :)
To Nagy : I don't even understand why this assumption is needed : * assumptions: alpm_list_add adds new member to the end of the list, so we can reuse the list pointer
As far as I can tell, this isn't needed, so please explain more in detail :)
OK, you are right. To be perfect, this is not needed, but this cause a big performance boost: If this assumption is true, you can simply leave the "ready" variable and the "while(!ready)" stuff, because every dependencies will be found in the first loop (ready is just a safety variable now...). This wonderful fact proves the speed-up IMHO. However, if we want to keep the nice feature that this function keeps topo sort, this is needed. However, the algorithm (expected to) works fine when input list is not topo sorted (I put it in the comment just as a suggestion, because topo sort is slow, and small list's topo-sort is faster). Anyway, this assumption is _used_ in alpm_resolvedeps, so I put that comment to the patch as a reminder. Bye, ngaba PS: It would be nice if someone created "-Rs" pactests to test this patch. PS2: Xavier, sry for replying to your mail address earlier ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program.
2007/6/19, ngaba@petra.hos.u-szeged.hu <ngaba@petra.hos.u-szeged.hu>:
To Nagy : I don't even understand why this assumption is needed : * assumptions: alpm_list_add adds new member to the end of the list, so we can reuse the list pointer
As far as I can tell, this isn't needed, so please explain more in detail :)
OK, you are right. To be perfect, this is not needed, but this cause a big performance boost: If this assumption is true, you can simply leave the "ready" variable and the "while(!ready)" stuff, because every dependencies will be found in the first loop (ready is just a safety variable now...). This wonderful fact proves the speed-up IMHO. However, if we want to keep the nice feature that this function keeps topo sort, this is needed. However, the algorithm (expected to) works fine when input list is not topo sorted (I put it in the comment just as a suggestion, because topo sort is slow, and small list's topo-sort is faster).
Ok that's why I thought, that assumption would be needed only if the while (!ready) wasn't there. That wasn't clear at all when reading your comments in the patch :) Anyway, it took me a while to figure out how it could still be correct when removing this while loop, but as you said, this would rely on the list being topo sorted. I believe that if the list was sorted, it would be correct indeed.
Anyway, this assumption is _used_ in alpm_resolvedeps, so I put that comment to the patch as a reminder.
As I already said, I didn't understand this part yet, but just because it's done here doesn't mean it's good. (and Dan said it was wrong) I thought you already knew that all pacman code wasn't 100% perfect, otherwise, you wouldn't want to change it, would you? ;)
I don't want to introduce any functions that depend on a particular quirk of our list implementation, especially since this is one of the things we have considered changing.
Give a look at alpm_resolvedeps;-) PS: Dan, sry for reply to your mail address earlier. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program.
2007/6/19, ngaba@petra.hos.u-szeged.hu <ngaba@petra.hos.u-szeged.hu>:
Give a look at alpm_resolvedeps;-)
I still don't understand this function :D Like the role of the arguments for beginning. So I wouldn't know if this relied on list_add adding to the end :) If it really does, I'm not sure that's a good idea. If one part of the code relies on where the item is inserted in the list, I think it would be much better if the function used for inserting was explicit (with the function name for example, but there are probably other ways). And if it's not a good idea, it shouldn't be done elsewhere, just because it's done in this function. This resolvedeps function should be fixed instead.
IMO, doing things right is what we want to do in 99% of the cases this speed vs. perfection issue comes up, so I'm going to disagree with you here. I have the same opinion usually (see my earlier sortbydeps complaints .-). But now we just search for removable dependencies, if we miss some pathological cases this is not a big problem, the database won't be corrupted, just we don't remove a dependency which we could etc. Let me explain: I meant in my previous mail that we say that we cannot remove a package bacause it is needed by an other one without checking that it cannot be "replaced" by an other installed one (see alpm_checkdeps && multiple provision). But this "fault" is _very_ rare and this extra check would cause _notable_ slowdown, because can_remove_package is (/was?) called "often". Bye, ngaba
---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program.
2007/6/19, ngaba@petra.hos.u-szeged.hu <ngaba@petra.hos.u-szeged.hu>:
I have the same opinion usually (see my earlier sortbydeps complaints .-). But now we just search for removable dependencies, if we miss some pathological cases this is not a big problem, the database won't be corrupted, just we don't remove a dependency which we could etc. Let me explain: I meant in my previous mail that we say that we cannot remove a package bacause it is needed by an other one without checking that it cannot be "replaced" by an other installed one (see alpm_checkdeps && multiple provision). But this "fault" is _very_ rare and this extra check would cause _notable_ slowdown, because can_remove_package is (/was?) called "often".
I agree that this function is much less critical, and that multiple provisions are pretty rare (is there any real case?). This is not a reason for not getting it right though. About notable slowdown, I don't know, are you sure about this ?
participants (3)
-
Dan McGee
-
ngaba@petra.hos.u-szeged.hu
-
Xavier