[pacman-dev] Pacman speed improvement

23 Feb 2007

Hi!

As is well known, pacman is very slow in some cases (mainly when the -S
switch is used). I looked into the source (of pacman2, but the following
problems still exist in pacman-cvs) and I'd like to make some
suggestions:

1. I think the main problem is the dbcache, which is built in most
cases (e.g. it is the first step when you do a 'pacman -S foo' (?!)).
In the current state of pacman, the whole package info is always loaded
into the dbcache for all packages, which means that all
/var/lib/pacman/*/desc files (hundreds of files) need to be read (on my
computer this takes 20 sec). I think loading desc for all packages (in
a remote repo) is needless, because usually it is needed only for the
packages we are working with (-Ss is an exception, for example). The
package name and version can easily be obtained from the repo directory
(~ls), so this can serve as the dbcache initialization (this is fast).
To keep the current structure of the source, I think the best solution
is to replace 'dbcache_entry->info' (and similar) with a function
package_info(dbcache_entry) which checks whether the info field has
already been loaded into dbcache_entry, loads it if needed, and returns
it. In the 'pacman -S foo' example you need to read the repo directory
(~ls => fast) to initialize the dbcache, search for package 'foo', read
its dependencies (from desc), then the dependencies' dependencies... so
you need to read the desc files only for the packages to be installed.

2. Sometimes (e.g. -Ss) reading desc for all packages is needed (I
think only for remote repos, see 3.). This would be much faster if we
used one file instead of many small ones. For remote repos the fdb is
needed anyway, so we already have this (tar) file; we should keep it
(after sync) and use it. (Because we work with special files (text
files), we may find/write a more efficient "tar-format".)

3. Working with the local repo is slow when you install a new package,
because you must read all local package info to check for conflicts.
This is done by reading many small depends files. If we assume that
only a few packages have the CONFLICTS and PROVIDES fields filled
(which is true, I think), this process could be sped up radically if
you "cache" these packages with a "double store". For example, there
would be two special directories (.provides and .conflict) in
/var/lib/pacman/local; when you install a package foo you check whether
its conflict field is filled: if yes, after installing the package you
create a symlink in .conflict to the foo package directory (this is ~no
overhead), and the same for provides. So when you need to resolve
conflicts you just read the (few) packages that can be found in
.conflict.
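
A minimal sketch of the "double store" from point 3, in C with POSIX
calls. The directory names .conflict and .provides come from the
proposal; the function names (index_add, index_remove, index_foreach)
and the relative-symlink layout are hypothetical, chosen for
illustration, and are not existing pacman code.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical layout (not existing pacman code):
 *   <localdb>/<pkgdir>/              the normal package entry
 *   <localdb>/.conflict/<pkgdir> -> ../<pkgdir>
 *   <localdb>/.provides/<pkgdir> -> ../<pkgdir>
 */

/* After installing pkgdir, call this with ".conflict" if its CONFLICTS
 * field is non-empty, and with ".provides" if PROVIDES is non-empty. */
int index_add(const char *localdb, const char *idxname, const char *pkgdir)
{
    char dir[1024], linkpath[2048], target[1024];

    snprintf(dir, sizeof dir, "%s/%s", localdb, idxname);
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;
    snprintf(linkpath, sizeof linkpath, "%s/%s", dir, pkgdir);
    /* Relative target, resolved from inside the index directory. */
    snprintf(target, sizeof target, "../%s", pkgdir);
    if (symlink(target, linkpath) != 0 && errno != EEXIST)
        return -1;
    return 0;
}

/* On package removal, drop its entry from the index (if it has one). */
int index_remove(const char *localdb, const char *idxname, const char *pkgdir)
{
    char linkpath[2048];

    snprintf(linkpath, sizeof linkpath, "%s/%s/%s", localdb, idxname, pkgdir);
    if (unlink(linkpath) != 0 && errno != ENOENT)
        return -1;
    return 0;
}

/* Visit only the indexed packages: under the assumption above, these are
 * the only entries a conflict check needs to read. Returns the number of
 * entries visited, or -1 on error; visit may be NULL to just count. */
int index_foreach(const char *localdb, const char *idxname,
                  void (*visit)(const char *pkgpath, void *ctx), void *ctx)
{
    char dir[1024], path[2048];
    struct dirent *ent;
    int count = 0;

    snprintf(dir, sizeof dir, "%s/%s", localdb, idxname);
    DIR *d = opendir(dir);
    if (d == NULL)
        return errno == ENOENT ? 0 : -1; /* no index yet: nothing to check */
    while ((ent = readdir(d)) != NULL) {
        if (ent->d_name[0] == '.')
            continue; /* skip "." and ".." */
        snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (visit != NULL)
            visit(path, ctx);
        count++;
    }
    closedir(d);
    return count;
}
```

Relative targets (../foo-1.0-1) keep the links valid if the whole
database directory is moved. Removal has to clean up as well, otherwise
a stale link would keep pointing at a package that is no longer
installed.
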

4. (Small thing, similar to 3.) In remote repos the groups may be
collected into a .groups directory (which contains groups & symlinks),
so the whole repo need not be read to find the members of a group.
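
To make the lazy loading from point 1 concrete, here is a minimal C
sketch. The struct layout, dbcache_entry_init(), dbcache_entry_free()
and the "name-pkgver-pkgrel" directory naming are hypothetical, chosen
for illustration; package_info() follows the proposal: it reads the
desc file on first access and returns the cached copy afterwards.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: name and version come from the directory
 * name alone; the desc contents are loaded only on first request. */
typedef struct {
    char name[256];
    char version[128];
    char path[1024];   /* <repodir>/<dirname> */
    char *desc;        /* NULL until package_info() loads it */
} dbcache_entry;

/* Cheap initialization from a directory listing entry.
 * Assumes "name-pkgver-pkgrel", where pkgver and pkgrel contain no '-'. */
int dbcache_entry_init(dbcache_entry *e, const char *repodir,
                       const char *dirname)
{
    const char *last = strrchr(dirname, '-');
    if (last == NULL || last == dirname)
        return -1;
    const char *sep = last - 1;          /* find the second-to-last '-' */
    while (sep > dirname && *sep != '-')
        sep--;
    if (sep == dirname)
        return -1;
    size_t namelen = (size_t)(sep - dirname);
    if (namelen >= sizeof e->name || strlen(sep + 1) >= sizeof e->version)
        return -1;
    memcpy(e->name, dirname, namelen);
    e->name[namelen] = '\0';
    strcpy(e->version, sep + 1);
    if (snprintf(e->path, sizeof e->path, "%s/%s", repodir, dirname)
            >= (int)sizeof e->path)
        return -1;
    e->desc = NULL;                      /* nothing read from disk yet */
    return 0;
}

/* Return the desc contents, reading <path>/desc only on first use. */
const char *package_info(dbcache_entry *e)
{
    if (e->desc != NULL)
        return e->desc;                  /* already cached */

    char file[1100];
    snprintf(file, sizeof file, "%s/desc", e->path);
    FILE *fp = fopen(file, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }
    size_t n = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    buf[n] = '\0';
    e->desc = buf;
    return e->desc;
}

void dbcache_entry_free(dbcache_entry *e)
{
    free(e->desc);
    e->desc = NULL;
}
```

With this accessor, 'pacman -S foo' only opens desc for foo and for the
packages reached by its dependency walk; every other entry keeps
desc == NULL.
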

Bye, Nagy Gabor
