[pacman-dev] [PATCH] Reduce the negligence of _alpm_resolvedeps

Nagy Gabor ngaba at bibl.u-szeged.hu
Sun Dec 2 07:53:02 EST 2007


Sun, 2 Dec 2007 13:29:42 +0100,
Nagy Gabor <ngaba at bibl.u-szeged.hu> wrote:

> Sun, 2 Dec 2007 11:56:34 +0100,
> Xavier <shiningxc at gmail.com> wrote:
> 
> > On Sat, Dec 01, 2007 at 12:06:59PM +0100, Xavier wrote:
> > > On Sat, Dec 01, 2007 at 12:24:16AM +0100, Nagy Gabor wrote:
> > > > > On Thu, Nov 29, 2007 at 10:13:42PM +0100, Nagy Gabor wrote:
> > > > > > OK, but I leave the benchmark for you;-)
> > > > > > Note: Long "dependency paths" like the one in smoke001.py may
> > > > > > cause a notable slowdown. I made a quick "hack", so I probably
> > > > > > made some mistakes...
> > > > > > 
> > > > > 
> > > > > I couldn't measure any difference. pacman always needs ~3.5
> > > > > seconds for running smoke001.
> > > > > 
> > > > Because it is an add transaction ;-) I just said that it contains
> > > > a long dependency path, which can cause a notable slowdown in the
> > > > case of -S first_element_of_path. (Unfortunately I cannot write such
> > > > tricky pactests in python.)
> > > > 
> > > 
> > > Oops, my bad. In my defense, it was a bit late :)
> > > But now, I'm slightly confused. How hard is it to edit smoke001 to
> > > do what you want? It looks like a totally trivial change to me. So
> > > I'm not sure I understood what you wanted.
> > > 
> > 
> > So, I made a simple change to smoke001:
> > 
> > ---------------------------------------------------------------------------
> > self.description = "Install a thousand packages in a single transaction"
> > 
> > p = pmpkg("pkg1000")
> > 
> > self.addpkg2db("local", p)
> > 
> > for i in range(1000):
> > 	p = pmpkg("pkg%03d" % i)
> > 	p.depends = ["pkg%03d" % (i+1)]
> > 	p.files = ["usr/share/pkg%03d" % i]
> > 	self.addpkg2db("sync", p)
> > 
> > self.args = "-S pkg001"
> > 
> > self.addrule("PACMAN_RETCODE=0")
> > self.addrule("PKG_EXIST=pkg050")
> > self.addrule("PKG_EXIST=pkg674")
> > self.addrule("PKG_EXIST=pkg999")
> > ---------------------------------------------------------------------------
> > 
> > Your resolvedeps patch indeed made its complexity explode :) I
> > couldn't even wait long enough for it to finish.
> > So I reduced the number to 100 packages, so that it could handle it
> > in a few seconds:
> > 
> > ---------------------------------------------------------------------------
> > self.description = "Install a hundred packages in a single transaction"
> > 
> > p = pmpkg("pkg100")
> > 
> > self.addpkg2db("local", p)
> > 
> > for i in range(100):
> > 	p = pmpkg("pkg%02d" % i)
> > 	p.depends = ["pkg%02d" % (i+1)]
> > 	p.files = ["usr/share/pkg%02d" % i]
> > 	self.addpkg2db("sync", p)
> > 
> > self.args = "-S pkg01"
> > 
> > self.addrule("PACMAN_RETCODE=0")
> > self.addrule("PKG_EXIST=pkg50")
> > self.addrule("PKG_EXIST=pkg74")
> > self.addrule("PKG_EXIST=pkg99")
> > ---------------------------------------------------------------------------
> > 
> > 
> > I think I see the problem now: the checkdeps calls done in
> > resolvedeps are something like:
> > checkdeps(pkg1)
> > checkdeps(pkg1, pkg2)
> > checkdeps(pkg1, pkg2, pkg3)
> > ...
> > checkdeps(pkg1, .......... , pkg99)
> > 
> > While previously, it was:
> > checkdeps(pkg1)
> > checkdeps(pkg2)
> > ...
> > checkdeps(pkg99)
> > 
> > So, I had the following idea: do a first pass with the old
> > algorithm:
> > checkdeps(pkg1)
> > checkdeps(pkg2)
> > ...
> > checkdeps(pkg99)
> > 
> > And once it's done, do a checkdeps call on the whole list:
> > checkdeps(pkg1, .......... , pkg99)
> > 
> > So there would be a first pass with the old algorithm, and a second
> > pass with your new one. But I don't really see how to implement this
> > cleanly.
> > 
> 
> Hm. This would be a bit hackish imho.
> The last step [checkdeps(pkg1, .......... , pkg99)] can be interpreted
> as a new first step, because nobody can guarantee that adding pkg100
> won't break a dependency of pkg60, so we must repeat the checkdeps
> process. Thus we arrive back at my patch, or we alternate old
> algorithm, new algorithm, old algorithm... no.
> Of course, in usual cases this would help, but I bet we don't have
> such long (>100) dependency paths anyway.
> Note: A huge localdb probably also causes an extra slowdown, but we
> could eliminate this issue (by passing an alpm_list_t pkgcache to
> checkdeps instead of a pmdb_t).
> 
> Summary: This patch has a higher-degree polynomial time complexity
> than the old one (we knew this before your speed test, but I could not
> predict whether it would be acceptable). Your test showed the slowdown
> is not acceptable, so the patch is reverted.
> 
> Bye

Note: this would be a trivial "problem" if we used more efficient
data structures in libalpm (I mean a graph structure, for example).
In other words, the problem is that we don't keep the "internal"
results of checkdeps (see also the TODO in sortbydeps): in the example
above we compute (and find!) ~1000 times that pkg2 satisfies a
dependency of pkg1.
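
A rough way to quantify that (a toy count in plain Python, not pactest
or libalpm code; the function names are made up): the sketch below only
counts how many target-list entries each strategy hands to checkdeps
for a dependency chain of n packages.

---------------------------------------------------------------------------
# Toy model of the two resolvedeps strategies on a chain
# pkg1 -> pkg2 -> ... -> pkgN.  It only counts how many targets each
# checkdeps call is asked to verify; it is not the libalpm function.

def old_strategy(n):
    # old resolvedeps: each newly pulled-in target is checked on its own
    return sum(1 for _ in range(n))            # checkdeps(pkg_i), n times

def new_strategy(n):
    # patched resolvedeps: the whole partial target list is re-checked
    # every time a target is added
    return sum(k for k in range(1, n + 1))     # checkdeps(pkg_1..pkg_k)

if __name__ == "__main__":
    for n in (100, 1000):
        print(n, old_strategy(n), new_strategy(n))
    # n=100  ->  100 vs   5050 entries
    # n=1000 -> 1000 vs 500500 entries
---------------------------------------------------------------------------

And every one of those entries still triggers its own search of the
databases, so the quadratic growth is consistent with your 1000-package
test not finishing in any reasonable time.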
If we kept the internal results, we could easily check whether adding
a new target breaks an already satisfied dependency (check the
to-be-overwritten "vertex": is there any "in-edge" from the target
list?...)
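
A minimal sketch of that idea (again plain Python with invented names,
nothing like the real libalpm structures): store each satisfied
dependency as an edge dependent -> provider, so that overwriting a
provider only forces a re-check of the packages on its in-edges.

---------------------------------------------------------------------------
# Keep the "internal" results of checkdeps as a tiny dependency graph.
# satisfy() records that `provider` currently satisfies a dependency of
# `dependent`; affected_by_replacing() returns the in-edges of a vertex,
# i.e. the only packages whose deps need re-checking if that provider
# is overwritten by a new target.

class DepGraph:
    def __init__(self):
        self.providers = {}    # dependent name -> set of provider names
        self.dependents = {}   # provider name  -> set of dependent names

    def satisfy(self, dependent, provider):
        self.providers.setdefault(dependent, set()).add(provider)
        self.dependents.setdefault(provider, set()).add(dependent)

    def affected_by_replacing(self, provider):
        return self.dependents.get(provider, set())

if __name__ == "__main__":
    g = DepGraph()
    # the smoke001-style chain: pkg1 depends on pkg2, pkg2 on pkg3, ...
    for i in range(1, 5):
        g.satisfy("pkg%d" % i, "pkg%d" % (i + 1))

    # a new target overwriting pkg3 only forces a re-check of pkg2,
    # not another checkdeps run over the whole target list
    print(g.affected_by_replacing("pkg3"))     # {'pkg2'}
---------------------------------------------------------------------------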

Bye
