[pacman-dev] resolvedeps is negligent: resolvedeps.py
Hi! This is a not-to-commit pactest, I just refer to this old thread...: http://www.archlinux.org/pipermail/pacman-dev/2007-August/009072.html I don't know whether we should fix this or not... This is not a critical bug, just resolvedeps is a bit negligent, but checkdeps will report its negligence (however, pacman doesn't seem to be very informative...). By fixing this the code would be cleaner, more clever, but _notably_ slower, so I still don't know what to do... What do you think? ---resolvedeps.py--- self.description = "resolvedeps test" sp1 = pmpkg("pkg1") sp1.depends = [ "pkg2>=2.0-1" , "provision" ] self.addpkg2db("sync", sp1) sp2 = pmpkg("pkg2", "2.0-1") self.addpkg2db("sync", sp2) sp3 = pmpkg("provision") self.addpkg2db("sync", sp3) lp = pmpkg("pkg2", "1.0-1") lp.provides = [ "provision" ] self.addpkg2db("local", lp) self.args = "-S pkg1 pkg2" self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg1") self.addrule("PKG_EXIST=pkg2") self.addrule("PKG_VERSION=pkg2|2.0-1") self.addrule("PKG_EXIST=provision") ---------------------------------------------------- SZTE Egyetemi Könyvtár - http://www.bibl.u-szeged.hu This mail sent through IMP: http://horde.org/imp/
On Mon, Nov 26, 2007 at 03:56:28PM +0100, Nagy Gabor wrote:
Hi! This is a not-to-commit pactest, I just refer to this old thread...: http://www.archlinux.org/pipermail/pacman-dev/2007-August/009072.html I don't know whether we should fix this or not... This is not a critical bug, just resolvedeps is a bit negligent, but checkdeps will report its negligence (however, pacman doesn't seem to be very informative...).
Yes I noticed, I think this little bug was in your original checkdeps patch. At the end of sync_prepare, you should do s/FREELIST(deps);/*data = deps;/ Then the missing dependencies message will magically appear again in the frontend :)
By fixing this the code would be cleaner, more clever, but _notably_ slower, so I still don't know what to do... What do you think?
---resolvedeps.py--- self.description = "resolvedeps test"
sp1 = pmpkg("pkg1") sp1.depends = [ "pkg2>=2.0-1" , "provision" ] self.addpkg2db("sync", sp1)
sp2 = pmpkg("pkg2", "2.0-1") self.addpkg2db("sync", sp2)
sp3 = pmpkg("provision") self.addpkg2db("sync", sp3)
lp = pmpkg("pkg2", "1.0-1") lp.provides = [ "provision" ] self.addpkg2db("local", lp)
self.args = "-S pkg1 pkg2"
self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg1") self.addrule("PKG_EXIST=pkg2") self.addrule("PKG_VERSION=pkg2|2.0-1") self.addrule("PKG_EXIST=provision")
I think it would be interesting to see the patch you propose for fixing it, and a benchmark for measuring how slower it is :)
Idézés Xavier <shiningxc@gmail.com>:
On Mon, Nov 26, 2007 at 03:56:28PM +0100, Nagy Gabor wrote:
Hi! This is a not-to-commit pactest, I just refer to this old thread...: http://www.archlinux.org/pipermail/pacman-dev/2007-August/009072.html I don't know whether we should fix this or not... This is not a critical bug, just resolvedeps is a bit negligent, but checkdeps will report its negligence (however, pacman doesn't seem to be very informative...).
Yes I noticed, I think this little bug was in your original checkdeps patch. At the end of sync_prepare, you should do s/FREELIST(deps);/*data = deps;/ Then the missing dependencies message will magically appear again in the frontend :)
By fixing this the code would be cleaner, more clever, but _notably_ slower, so I still don't know what to do... What do you think?
---resolvedeps.py--- self.description = "resolvedeps test"
sp1 = pmpkg("pkg1") sp1.depends = [ "pkg2>=2.0-1" , "provision" ] self.addpkg2db("sync", sp1)
sp2 = pmpkg("pkg2", "2.0-1") self.addpkg2db("sync", sp2)
sp3 = pmpkg("provision") self.addpkg2db("sync", sp3)
lp = pmpkg("pkg2", "1.0-1") lp.provides = [ "provision" ] self.addpkg2db("local", lp)
self.args = "-S pkg1 pkg2"
self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg1") self.addrule("PKG_EXIST=pkg2") self.addrule("PKG_VERSION=pkg2|2.0-1") self.addrule("PKG_EXIST=provision")
I think it would be interesting to see the patch you propose for fixing it, and a benchmark for measuring how slower it is :)
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes... Bye ---------------------------------------------------- SZTE Egyetemi Könyvtár - http://www.bibl.u-szeged.hu This mail sent through IMP: http://horde.org/imp/
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
Bye
Well, resolvedeps is also quite hard to read (but seems OK). Imho we should share some code between alpm_sync_addtarget and resolvedeps: I mean they could use common search_for_literal(), search_for_satisfier() functions. By doing this alpm_resolvedeps would be much clearer and '-S provision' would be more general, user could do -S 'depend>=2.0' if he wants to do so (he got an unsatisfied dependency error after -A) and we can slowly kill whatprovides... (foo provides bar is a special case of foo satisfies bar) Bye ---------------------------------------------------- SZTE Egyetemi Könyvtár - http://www.bibl.u-szeged.hu This mail sent through IMP: http://horde.org/imp/
On Fri, Nov 30, 2007 at 08:39:22PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
Bye
Well, resolvedeps is also quite hard to read (but seems OK). Imho we should share some code between alpm_sync_addtarget and resolvedeps: I mean they could use common search_for_literal(), search_for_satisfier() functions. By doing this alpm_resolvedeps would be much clearer and '-S provision' would be more general, user could do -S 'depend>=2.0' if he wants to do so (he got an unsatisfied dependency error after -A) and we can slowly kill whatprovides... (foo provides bar is a special case of foo satisfies bar)
On this subject, see http://bugs.archlinux.org/task/8763 .
On Fri, Nov 30, 2007 at 08:39:22PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
Bye
Well, resolvedeps is also quite hard to read (but seems OK). Imho we should share some code between alpm_sync_addtarget and resolvedeps: I mean they could use common search_for_literal(), search_for_satisfier() functions. By doing this alpm_resolvedeps would be much clearer and '-S provision' would be more general, user could do -S 'depend>=2.0' if he wants to do so (he got an unsatisfied dependency error after -A) and we can slowly kill whatprovides... (foo provides bar is a special case of foo satisfies bar)
On this subject, see http://bugs.archlinux.org/task/8763 .
Well, I wanted to ask this earlier: is it possible (and acceptable) to "forward" pacman FS activity to here automatically? (/me never reads FS) Back to 8763: Imho this is a bit off here. But my opinion hasn't changed since -Sg: user should know and define what he wants to do. So my vote: "-S literal" only, if this wasn't successful, pacman can ask the user whether he wants to search for providers etc... And a special option to install provider (or satisfier). Bye
On Thu, Nov 29, 2007 at 10:13:42PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
I couldn't measure any difference. pacman always need ~3.5 seconds for running smoke001.
From 3b8f21423035f7bf683494a8e369b6501be8e379 Mon Sep 17 00:00:00 2001 From: Nagy Gabor <ngaba@bibl.u-szeged.hu> Date: Thu, 29 Nov 2007 21:55:55 +0100 Subject: [PATCH] Reduce the negligence of _alpm_resolvedeps
This is a not-to-commit patch NOTE: -Se is temporarily removed, because now resolvedeps doesn't have spkg param
What is really bad about this patch besides breaking -Se? Do you see a clean way of reimplementing it?
On Thu, Nov 29, 2007 at 10:13:42PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
I couldn't measure any difference. pacman always need ~3.5 seconds for running smoke001.
Because it is an add transaction;-) I just said that it contains a long dependency path, which can cause notable slowdown in case of -S first_element_of_path. (Unfortunately I cannot write such tricky pactests in python).
From 3b8f21423035f7bf683494a8e369b6501be8e379 Mon Sep 17 00:00:00 2001 From: Nagy Gabor <ngaba@bibl.u-szeged.hu> Date: Thu, 29 Nov 2007 21:55:55 +0100 Subject: [PATCH] Reduce the negligence of _alpm_resolvedeps
This is a not-to-commit patch NOTE: -Se is temporarily removed, because now resolvedeps doesn't have spkg param
What is really bad about this patch besides breaking -Se? Do you see a clean way of reimplementing it? Well, the way you didn't like earlier: http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=b0c064d59b8786a1e...
dependency-list == populate(populatedlist - oldlist) Bye
On Sat, Dec 01, 2007 at 12:24:16AM +0100, Nagy Gabor wrote:
On Thu, Nov 29, 2007 at 10:13:42PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
I couldn't measure any difference. pacman always need ~3.5 seconds for running smoke001.
Because it is an add transaction;-) I just said that it contains a long dependency path, which can cause notable slowdown in case of -S first_element_of_path. (Unfortunately I cannot write such tricky pactests in python).
Oops, my bad. In my defense, it was a bit late :) But now, I'm slightly confused. How hard is it to edit smoke001 to do what you want? It looks like a totally trivial change to me. So I'm not sure I understood what you wanted.
From 3b8f21423035f7bf683494a8e369b6501be8e379 Mon Sep 17 00:00:00 2001 From: Nagy Gabor <ngaba@bibl.u-szeged.hu> Date: Thu, 29 Nov 2007 21:55:55 +0100 Subject: [PATCH] Reduce the negligence of _alpm_resolvedeps
This is a not-to-commit patch NOTE: -Se is temporarily removed, because now resolvedeps doesn't have spkg param
What is really bad about this patch besides breaking -Se? Do you see a clean way of reimplementing it? Well, the way you didn't like earlier: http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=b0c064d59b8786a1e...
dependency-list == populate(populatedlist - oldlist)
The previous code did that, and you wanted to call recursively sync_prepare on dependency-list, right? I think that's why I didn't like it much. In my opinion, there is no need to worry much about -Se. Where is this option used anyway? If there are use cases for it, it probably only involes running -Se with one target. Not several targets with inter-dependencies. So this extreme corner case doesn't need to be supported. That is, sync1001 should still pass, but sync1002 doesn't need to.
On Sat, Dec 01, 2007 at 12:06:59PM +0100, Xavier wrote:
On Sat, Dec 01, 2007 at 12:24:16AM +0100, Nagy Gabor wrote:
On Thu, Nov 29, 2007 at 10:13:42PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
I couldn't measure any difference. pacman always need ~3.5 seconds for running smoke001.
Because it is an add transaction;-) I just said that it contains a long dependency path, which can cause notable slowdown in case of -S first_element_of_path. (Unfortunately I cannot write such tricky pactests in python).
Oops, my bad. In my defense, it was a bit late :) But now, I'm slightly confused. How hard is it to edit smoke001 to do what you want? It looks like a totally trivial change to me. So I'm not sure I understood what you wanted.
So, I made a simple change to smoke001 : --------------------------------------------------------------------------- self.description = "Install a thousand packages in a single transaction" p = pmpkg("pkg1000") self.addpkg2db("local", p) for i in range(1000): p = pmpkg("pkg%03d" % i) p.depends = ["pkg%03d" % (i+1)] p.files = ["usr/share/pkg%03d" % i] self.addpkg2db("sync", p) self.args = "-S pkg001" self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg050") self.addrule("PKG_EXIST=pkg674") self.addrule("PKG_EXIST=pkg999") --------------------------------------------------------------------------- Your resolvedeps patch indeed made its complexity explode :) I couldn't even wait long enough for it to finish. So I reduced the number to 100 packages, so that it could handle it in a few seconds: --------------------------------------------------------------------------- self.description = "Install a thousand packages in a single transaction" p = pmpkg("pkg100") self.addpkg2db("local", p) for i in range(100): p = pmpkg("pkg%02d" % i) p.depends = ["pkg%02d" % (i+1)] p.files = ["usr/share/pkg%02d" % i] self.addpkg2db("sync", p) self.args = "-S pkg01" self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg50") self.addrule("PKG_EXIST=pkg74") self.addrule("PKG_EXIST=pkg99") --------------------------------------------------------------------------- I think I see the problem now, the checkdeps calls done in resolvedeps are something like: checkdeps(pkg1) checkdeps(pkg1, pkg2) checkdeps(pkg1, pkg2, pkg3) ... checkdeps(pkg1, .......... , pkg99) While previously, it was: checkdeps(pkg1) checkdeps(pkg2) ... checkdeps(pkg99) So, I had the following idea, do a first pass with the old algorithm : checkdeps(pkg1) checkdeps(pkg2) ... checkdeps(pkg99) And once it's done, do a checkdeps call on the whole list : checkdeps(pkg1, .......... , pkg99) So there would be a first pass with the old algorithm, and a second pass with your new one. But I don't really see how to implement this cleanly.
Sun, 2 Dec 2007 11:56:34 +0100 -n Xavier <shiningxc@gmail.com> írta:
On Sat, Dec 01, 2007 at 12:06:59PM +0100, Xavier wrote:
On Sat, Dec 01, 2007 at 12:24:16AM +0100, Nagy Gabor wrote:
On Thu, Nov 29, 2007 at 10:13:42PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
I couldn't measure any difference. pacman always need ~3.5 seconds for running smoke001.
Because it is an add transaction;-) I just said that it contains a long dependency path, which can cause notable slowdown in case of -S first_element_of_path. (Unfortunately I cannot write such tricky pactests in python).
Oops, my bad. In my defense, it was a bit late :) But now, I'm slightly confused. How hard is it to edit smoke001 to do what you want? It looks like a totally trivial change to me. So I'm not sure I understood what you wanted.
So, I made a simple change to smoke001 :
--------------------------------------------------------------------------- self.description = "Install a thousand packages in a single transaction"
p = pmpkg("pkg1000")
self.addpkg2db("local", p)
for i in range(1000): p = pmpkg("pkg%03d" % i) p.depends = ["pkg%03d" % (i+1)] p.files = ["usr/share/pkg%03d" % i] self.addpkg2db("sync", p)
self.args = "-S pkg001"
self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg050") self.addrule("PKG_EXIST=pkg674") self.addrule("PKG_EXIST=pkg999") ---------------------------------------------------------------------------
Your resolvedeps patch indeed made its complexity explode :) I couldn't even wait long enough for it to finish. So I reduced the number to 100 packages, so that it could handle it in a few seconds:
--------------------------------------------------------------------------- self.description = "Install a thousand packages in a single transaction"
p = pmpkg("pkg100")
self.addpkg2db("local", p)
for i in range(100): p = pmpkg("pkg%02d" % i) p.depends = ["pkg%02d" % (i+1)] p.files = ["usr/share/pkg%02d" % i] self.addpkg2db("sync", p)
self.args = "-S pkg01"
self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg50") self.addrule("PKG_EXIST=pkg74") self.addrule("PKG_EXIST=pkg99") ---------------------------------------------------------------------------
I think I see the problem now, the checkdeps calls done in resolvedeps are something like: checkdeps(pkg1) checkdeps(pkg1, pkg2) checkdeps(pkg1, pkg2, pkg3) ... checkdeps(pkg1, .......... , pkg99)
While previously, it was: checkdeps(pkg1) checkdeps(pkg2) ... checkdeps(pkg99)
So, I had the following idea, do a first pass with the old algorithm : checkdeps(pkg1) checkdeps(pkg2) ... checkdeps(pkg99)
And once it's done, do a checkdeps call on the whole list : checkdeps(pkg1, .......... , pkg99)
So there would be a first pass with the old algorithm, and a second pass with your new one. But I don't really see how to implement this cleanly.
Hm. This would be a bit hackish imho. The last step [checkdeps(pkg1, .......... , pkg99)] can be interpreted as a new first step, because nobody can guarantee that adding pkg100 won't break the dependency of pkg60, and we must repeat the checkdeps process. Thus we reached my patch again, or we use old-algorithm, new-algorithm, old-algorithm... no. Of course in usual cases this would help, but I can bet that we have no so long (>100) dependency paths neither. Note: Probably huge localdb is also cause an extra slowdown, but we could eliminate this issue (passing alpm_list_t pkgcache to checkdeps instead of pmdb_t). Summary: This patch has a higher degree polinomial time-complexity than the old one (we knew this before your speed test, but I could not predict if this acceptable or not). Your test showed this is not an acceptable slowdown, so patch is reverted. Bye
On Sun, Dec 02, 2007 at 01:29:42PM +0100, Nagy Gabor wrote:
Hm. This would be a bit hackish imho. The last step [checkdeps(pkg1, .......... , pkg99)] can be interpreted as a new first step, because nobody can guarantee that adding pkg100 won't break the dependency of pkg60, and we must repeat the checkdeps process. Thus we reached my patch again, or we use old-algorithm, new-algorithm, old-algorithm... no.
I am not sure if I didn't explain my proposal clearly enough, or if I don't understand your objection, but just in case: What I had in mind is to first run the old resolvedeps : this causes 99 checkdeps call, with a one element list. This gives us the following list : pkg1, .......... , pkg99 Then we run your new resolveps on it, which will call checkdeps as many times as necessary, starting with a list of 99 elements. But in the case of smoke002, it'll call it just once, and see all dependencies are satisfied. So this should give a huge speedup. And your resolvedeps.py test case should still pass with that method.
Sun, 2 Dec 2007 13:44:28 +0100 -n Xavier <shiningxc@gmail.com> írta:
On Sun, Dec 02, 2007 at 01:29:42PM +0100, Nagy Gabor wrote:
Hm. This would be a bit hackish imho. The last step [checkdeps(pkg1, .......... , pkg99)] can be interpreted as a new first step, because nobody can guarantee that adding pkg100 won't break the dependency of pkg60, and we must repeat the checkdeps process. Thus we reached my patch again, or we use old-algorithm, new-algorithm, old-algorithm... no.
I am not sure if I didn't explain my proposal clearly enough, or if I don't understand your objection, but just in case: What I had in mind is to first run the old resolvedeps : this causes 99 checkdeps call, with a one element list.
This gives us the following list : pkg1, .......... , pkg99 Then we run your new resolveps on it, which will call checkdeps as many times as necessary, starting with a list of 99 elements. But in the case of smoke002, it'll call it just once, and see all dependencies are satisfied. So this should give a huge speedup.
And your resolvedeps.py test case should still pass with that method.
OK. I understood this, but what if we run into this 1000-length dependency path in the second round?: I mean the old-and-fast method elect some packages (a, b, c), and the new method realises that we should add pkg1 too. Bye
On Sun, Dec 02, 2007 at 02:05:17PM +0100, Nagy Gabor wrote:
OK. I understood this, but what if we run into this 1000-length dependency path in the second round?: I mean the old-and-fast method elect some packages (a, b, c), and the new method realises that we should add pkg1 too.
Ok, I indeed just misunderstood you. Now it's very clear, thanks.
Sun, 2 Dec 2007 13:29:42 +0100 -n Nagy Gabor <ngaba@bibl.u-szeged.hu> írta:
Sun, 2 Dec 2007 11:56:34 +0100 -n Xavier <shiningxc@gmail.com> írta:
On Sat, Dec 01, 2007 at 12:06:59PM +0100, Xavier wrote:
On Sat, Dec 01, 2007 at 12:24:16AM +0100, Nagy Gabor wrote:
On Thu, Nov 29, 2007 at 10:13:42PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
I couldn't measure any difference. pacman always need ~3.5 seconds for running smoke001.
Because it is an add transaction;-) I just said that it contains a long dependency path, which can cause notable slowdown in case of -S first_element_of_path. (Unfortunately I cannot write such tricky pactests in python).
Oops, my bad. In my defense, it was a bit late :) But now, I'm slightly confused. How hard is it to edit smoke001 to do what you want? It looks like a totally trivial change to me. So I'm not sure I understood what you wanted.
So, I made a simple change to smoke001 :
--------------------------------------------------------------------------- self.description = "Install a thousand packages in a single transaction"
p = pmpkg("pkg1000")
self.addpkg2db("local", p)
for i in range(1000): p = pmpkg("pkg%03d" % i) p.depends = ["pkg%03d" % (i+1)] p.files = ["usr/share/pkg%03d" % i] self.addpkg2db("sync", p)
self.args = "-S pkg001"
self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg050") self.addrule("PKG_EXIST=pkg674") self.addrule("PKG_EXIST=pkg999") ---------------------------------------------------------------------------
Your resolvedeps patch indeed made its complexity explode :) I couldn't even wait long enough for it to finish. So I reduced the number to 100 packages, so that it could handle it in a few seconds:
--------------------------------------------------------------------------- self.description = "Install a thousand packages in a single transaction"
p = pmpkg("pkg100")
self.addpkg2db("local", p)
for i in range(100): p = pmpkg("pkg%02d" % i) p.depends = ["pkg%02d" % (i+1)] p.files = ["usr/share/pkg%02d" % i] self.addpkg2db("sync", p)
self.args = "-S pkg01"
self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg50") self.addrule("PKG_EXIST=pkg74") self.addrule("PKG_EXIST=pkg99") ---------------------------------------------------------------------------
I think I see the problem now, the checkdeps calls done in resolvedeps are something like: checkdeps(pkg1) checkdeps(pkg1, pkg2) checkdeps(pkg1, pkg2, pkg3) ... checkdeps(pkg1, .......... , pkg99)
While previously, it was: checkdeps(pkg1) checkdeps(pkg2) ... checkdeps(pkg99)
So, I had the following idea, do a first pass with the old algorithm : checkdeps(pkg1) checkdeps(pkg2) ... checkdeps(pkg99)
And once it's done, do a checkdeps call on the whole list : checkdeps(pkg1, .......... , pkg99)
So there would be a first pass with the old algorithm, and a second pass with your new one. But I don't really see how to implement this cleanly.
Hm. This would be a bit hackish imho. The last step [checkdeps(pkg1, .......... , pkg99)] can be interpreted as a new first step, because nobody can guarantee that adding pkg100 won't break the dependency of pkg60, and we must repeat the checkdeps process. Thus we reached my patch again, or we use old-algorithm, new-algorithm, old-algorithm... no. Of course in usual cases this would help, but I can bet that we have no so long (>100) dependency paths neither. Note: Probably huge localdb is also cause an extra slowdown, but we could eliminate this issue (passing alpm_list_t pkgcache to checkdeps instead of pmdb_t).
Summary: This patch has a higher degree polinomial time-complexity than the old one (we knew this before your speed test, but I could not predict if this acceptable or not). Your test showed this is not an acceptable slowdown, so patch is reverted.
Bye
Note: this would be a trivial "problem" if we used more efficient data structures in libalpm (I mean graph structure for example now). Or in other words, the problem is, that we don't keep the "internal" results of checkdeps (see also the TODO in sortbydeps): in the example above we compute (and find!) ~1000 times that pkg2 satisfies a dependency of pkg1. If we kept the internal results, we could easily check, whether adding a new target breaks an already satisfied dependency (check the to-be-overwritten "vertex": is there any "in-edge" from the target list?...) Bye
Idézés Xavier <shiningxc@gmail.com>:
On Sat, Dec 01, 2007 at 12:06:59PM +0100, Xavier wrote:
On Sat, Dec 01, 2007 at 12:24:16AM +0100, Nagy Gabor wrote:
On Thu, Nov 29, 2007 at 10:13:42PM +0100, Nagy Gabor wrote:
OK, but I leave the benchmark for you;-) Note: Long "dependency paths" like in smoke001.py may cause notable slowdown. I made a quick "hack", so probably I made some mistakes...
I couldn't measure any difference. pacman always need ~3.5 seconds for running smoke001.
Because it is an add transaction;-) I just said that it contains a long dependency path, which can cause notable slowdown in case of -S first_element_of_path. (Unfortunately I cannot write such tricky pactests in python).
Oops, my bad. In my defense, it was a bit late :) But now, I'm slightly confused. How hard is it to edit smoke001 to do what you want? It looks like a totally trivial change to me. So I'm not sure I understood what you wanted.
So, I made a simple change to smoke001 :
--------------------------------------------------------------------------- self.description = "Install a thousand packages in a single transaction"
p = pmpkg("pkg1000")
self.addpkg2db("local", p)
for i in range(1000): p = pmpkg("pkg%03d" % i) p.depends = ["pkg%03d" % (i+1)] p.files = ["usr/share/pkg%03d" % i] self.addpkg2db("sync", p)
self.args = "-S pkg001"
self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg050") self.addrule("PKG_EXIST=pkg674") self.addrule("PKG_EXIST=pkg999") ---------------------------------------------------------------------------
Your resolvedeps patch indeed made its complexity explode :) I couldn't even wait long enough for it to finish. So I reduced the number to 100 packages, so that it could handle it in a few seconds:
--------------------------------------------------------------------------- self.description = "Install a thousand packages in a single transaction"
p = pmpkg("pkg100")
self.addpkg2db("local", p)
for i in range(100): p = pmpkg("pkg%02d" % i) p.depends = ["pkg%02d" % (i+1)] p.files = ["usr/share/pkg%02d" % i] self.addpkg2db("sync", p)
self.args = "-S pkg01"
self.addrule("PACMAN_RETCODE=0") self.addrule("PKG_EXIST=pkg50") self.addrule("PKG_EXIST=pkg74") self.addrule("PKG_EXIST=pkg99") ---------------------------------------------------------------------------
Well. Other bad news: I'm pretty sure now, that -Rc will also explode with such long dependency paths (all pkg* is installed, and you do a -Rc pkg1000), what's more, probably that would be much slower, because: 0. this is the "inverse" of my patched resolvedeps, just we use requiredbys here <- we should move -Rc handling to deps.c and rewrite to resolvedeps-style. 1. compute_requiredby (explicit or implicit alpm_checkdeps) !! 2. the ugly list_diff(pkgcache, dblist) <- this is about O(n^2) ~ 1.000.000 for each (about 1000) checkdeps call And we've just found a bad thing with compute_requiredby usage: you shouldn't compute it too many times for the same package, because you will get an extreme slowdown! [Well, originally I would have kept pkg_getrequiredby and compute-and-store it on demand, like in pkg_getdepends]. ---------------------------------------------------- SZTE Egyetemi Könyvtár - http://www.bibl.u-szeged.hu This mail sent through IMP: http://horde.org/imp/
participants (2)
-
Nagy Gabor
-
Xavier