[pacman-dev] Pacman speed improvement
Hi!

As is well known, pacman is very slow in some cases (mainly when the -S switch is used). I looked into the source (of pacman2, but the following problems still exist in pacman-cvs) and I'd like to make some suggestions:

1. I think the main problem is the dbcache, which is built in most cases (e.g. it is the first step when you do a 'pacman -S foo' (?!)). In the current state of pacman, the whole package info is always loaded into the dbcache for every package, which means that all /var/lib/pacman/*/desc files (hundreds of files) need to be read (on my computer this takes 20 sec). I think loading desc for every package in a remote repo is needless, because usually it is needed only for the packages we are working with (-Ss is an exception, for example). The package name and version can easily be obtained from the repo directory listing (~ls), so that listing can serve as the dbcache initialization (this is fast).

To keep the current structure of the source, I think the best solution is to replace 'dbcache_entry->info' (and similar) with a function package_info(dbcache_entry) that checks whether the info field has already been loaded into the dbcache_entry, loads it if needed, and returns it. In the 'pacman -S foo' example you need to read the repo directory (~ls => fast) to initialize the dbcache, search for the package 'foo', read its dependencies (from desc), then the dependencies' dependencies, and so on. So you need to read the desc files only for the packages to be installed.

2. Sometimes (e.g. -Ss) reading desc for every package is needed (I think this is needed for remote repos only, see 3.). This would be much faster if we used one file instead of many small ones. For remote repos the fdb is needed, so we already have this (tar) file; we should keep it (after sync) and use it. (Because we work with special files (text files), we may find or write a more efficient "tar format".)

3. Working with the local repo is slow when you install a new package, because you must read all local package info to check for conflicts. This is done by reading many small depends files. If we assume that only a few packages have the CONFLICTS and PROVIDES fields filled (which is true, I think), this process could be radically sped up if you "cache" these packages with a "double store". For example, there would be two special directories (.provides and .conflict) in /var/lib/pacman/local. When you install a package foo, you check whether its conflict field is filled; if yes, after installing the package you create a symlink in .conflict to foo's package directory (this is ~no overhead), and the same for provides. So when you need to resolve conflicts, you just read the (few) packages that can be found in .conflict.

4. (Small thing, similar to 3.) In a remote repo the groups could be collected into a .groups directory (which contains groups & symlinks), so the whole repo would not have to be read to find the members of a group.

Bye,
Nagy Gabor
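The ".conflict" symlink index from point 3 could be sketched roughly as below. This is only an illustration under assumptions, not pacman code: the function names index_add_conflict() and index_count_conflicts() are hypothetical, db_root stands for a directory such as /var/lib/pacman/local, and the sketch only counts the indexed entries where a real resolver would go on to read each linked package's depends file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: record that the package directory `pkgdir`
 * (a name directly under db_root) declares conflicts, by creating
 * db_root/.conflict/<pkgdir> -> ../<pkgdir>.
 * Returns 0 on success (also if the link already exists), -1 on error. */
int index_add_conflict(const char *db_root, const char *pkgdir)
{
    char dir[1024], link_path[1024], target[1024];

    assert(db_root != NULL && pkgdir != NULL);

    snprintf(dir, sizeof dir, "%s/.conflict", db_root);
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;

    snprintf(link_path, sizeof link_path, "%s/%s", dir, pkgdir);
    snprintf(target, sizeof target, "../%s", pkgdir);
    if (symlink(target, link_path) != 0 && errno != EEXIST)
        return -1;

    return 0;
}

/* Hypothetical helper: count the packages listed in db_root/.conflict.
 * Only this one directory is scanned, not every package directory.
 * Returns the count, 0 if the index does not exist yet, -1 on error. */
int index_count_conflicts(const char *db_root)
{
    char dir[1024];
    struct dirent *entry;
    DIR *d;
    int count = 0;

    assert(db_root != NULL);

    snprintf(dir, sizeof dir, "%s/.conflict", db_root);
    d = opendir(dir);
    if (d == NULL)
        return errno == ENOENT ? 0 : -1;

    while ((entry = readdir(d)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(d);
    return count;
}
```

The same pattern would apply to .provides and to the .groups idea in point 4. One thing the sketch leaves out is removing the symlink again when a package is uninstalled or upgraded, which the index would need to stay consistent.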
The sluggishness due to file reading is well known. I've heard complaints about it for the past... 4 years. Everyone seems to have their own solution.

Here's the thing: we're trying. Soon, I want to replace all package access with calls to the pkg_get_* accessor functions, which will read the files on demand. This will help minimize the number of files read, as each one will only be read just before it is needed.

As for changes to the actual backend, you are free to submit patches. It should be as simple as modifying be_files.c to some be_<whatever>.c (be == 'backend').

Thanks for the input,
Aaron

PS I found your mail hard to follow due to the lack of newlines... were they stripped somehow, or is it gmail doing that?
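For illustration, the read-on-demand idea (the same one as the package_info(dbcache_entry) proposal in the first mail) might look like the minimal sketch below. The struct layout and the name pkg_entry_get_desc() are hypothetical and are not taken from the pacman source; the `reads` counter exists only to make the caching visible.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: the path is known from the directory
 * listing, the desc contents are loaded only when first requested. */
typedef struct {
    const char *desc_path; /* location of the package's desc file */
    char *desc;            /* NULL until the file has been read */
    int reads;             /* how many times the file was read */
} pkg_entry;

/* Hypothetical accessor: return the desc text, reading the file on the
 * first call and reusing the cached copy afterwards.
 * Returns NULL if the file cannot be read. */
const char *pkg_entry_get_desc(pkg_entry *pkg)
{
    size_t cap = 4096, len = 0, n;
    char *buf, *tmp;
    FILE *fp;

    assert(pkg != NULL);
    if (pkg->desc != NULL)
        return pkg->desc; /* already loaded, no file access */

    fp = fopen(pkg->desc_path, "rb");
    if (fp == NULL)
        return NULL;

    buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* Read the whole file, growing the buffer as needed and always
     * leaving room for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len + 1 >= cap) {
            cap *= 2;
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
        }
    }
    fclose(fp);

    buf[len] = '\0';
    pkg->desc = buf;
    pkg->reads++;
    return pkg->desc;
}
```

With an accessor like this, callers never touch the field directly, so a 'pacman -S foo' style operation would only pay for the desc files of the packages it actually inspects. The caller owns pkg->desc and would free it when the cache entry is destroyed.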
participants (2)
- Aaron Griffin
- Nagy Gabor