On Fri, Oct 27, 2006 at 12:03:40AM +0300, Roman Kyrylych wrote:
2006/10/26, Aaron Griffin <aaronmgriffin@gmail.com>:
What I'm saying is that you have the overhead of a full-scale database with little gain. Indexing by package name, yes, that's great and all, but it doesn't help with the slowest of the slow, the -Ss operation. -Ss searches package names AND descriptions, and allows regex matching. SQLite (and most DBs) do not support regex matching. Indexing by package name means little, because -Ss "foo" still SHOULD match a package named "barfoo" and a package with "foo" in its description. This means a sequential search. Not only that, but because the DB does not support regex, one must iterate over each and every entry, get the values at the C level, apply the regex pattern, and note whether it matches. The only speed you gain would be in the initial opening of the files. To me, this does not mean a DB backend is the solution. It may be better than the files backend, yes, but it is not the best, and outperforming the files backend is not hard... I was able to improve performance approximately sixfold simply by using the db.tar.gz files in place of separate text files.
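To make the cost of -Ss concrete, here is a minimal Python sketch of the sequential search described above. The `Package` record, the `search` function and the sample data are hypothetical illustrations, not pacman's code; the point is that a pattern has to be tried against both the name and the description of every entry, so an index on names cannot narrow the scan.

```python
import re
from typing import NamedTuple


class Package(NamedTuple):
    """Hypothetical in-memory record for one sync-database entry."""
    name: str
    desc: str


def search(packages: list[Package], pattern: str) -> list[str]:
    """Return the names of packages whose name OR description matches pattern.

    Every entry is visited and the regex is tried on both fields, so the
    cost grows linearly with the number of packages; an index on names
    cannot skip anything because a match may occur anywhere in either field.
    """
    rx = re.compile(pattern)
    return [p.name for p in packages if rx.search(p.name) or rx.search(p.desc)]


# Hypothetical sample data, not real repository contents.
PACKAGES = [
    Package("foo", "The foo library"),
    Package("barfoo", "Bar bindings"),
    Package("baz", "A tool that wraps foo"),
    Package("qux", "Unrelated package"),
]

print(search(PACKAGES, "foo"))  # ['foo', 'barfoo', 'baz']
```

A search for "foo" returns "barfoo" (a substring of the name) and "baz" (a match only in the description), which is why a lookup by package name alone is not enough.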
Yes, the main problem with the files backend is the huge number of files in /var/lib/pacman, which leads to slow performance on many filesystems. gdbm/sqlite/whatever has a big "+": everything can be in one file.
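A rough sketch of the "everything in one file" idea, using Python's standard `tarfile` module in the spirit of the db.tar.gz approach mentioned above: per-package text records are packed into a single gzip-compressed archive and all of them are read back through one open archive instead of many separate files. The directory names and the `%NAME%`/`%DESC%` layout are illustrative assumptions, not a guaranteed copy of pacman's on-disk format.

```python
import io
import tarfile


def build_archive(entries: dict[str, str]) -> bytes:
    """Pack one text record per package into a single .tar.gz blob."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for pkgdir, text in entries.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=f"{pkgdir}/desc")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_all(archive: bytes) -> dict[str, str]:
    """Read every record back from the one archive, keyed by package dir."""
    records: dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                f = tar.extractfile(member)
                if f is not None:
                    records[member.name.split("/")[0]] = f.read().decode("utf-8")
    return records


# Hypothetical entries; the %NAME%/%DESC% layout is for illustration only.
ENTRIES = {
    "foo-1.0-1": "%NAME%\nfoo\n\n%DESC%\nThe foo library\n",
    "bar-2.3-1": "%NAME%\nbar\n\n%DESC%\nBar bindings\n",
}

db = build_archive(ENTRIES)
print(sorted(read_all(db)))  # ['bar-2.3-1', 'foo-1.0-1']
```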
This is a big "-", because Unix lovers prefer multiple text files. They make it easy to use powerful text/file-processing tools like "sed", "awk", the shell and "find" to build package-management-related scripts.
Plus, as I said, with SQLite you can easily use an in-memory database, and you get easy ACID transaction support. This will be faster than seeking through /var/lib/pacman/ anyway. And there will be no need for pacman-optimize-like scripts with database backends. The overhead is not big. (A quote from sqlite.org: "Small code footprint: less than 250KiB fully configured or less than 150KiB with optional features omitted")
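The in-memory and ACID points can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `pkg` table and its columns are a hypothetical schema, not a proposed pacman format. Using the connection as a context manager commits on success and rolls back on an exception, so a batch that fails partway leaves the database unchanged.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pkg ("
        "name TEXT PRIMARY KEY, version TEXT NOT NULL, description TEXT)"
    )
    return conn


def install(conn: sqlite3.Connection, pkgs: list[tuple[str, str, str]]) -> bool:
    """Insert a batch of packages atomically: either every row or none."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.executemany(
                "INSERT INTO pkg (name, version, description) VALUES (?, ?, ?)",
                pkgs,
            )
    except sqlite3.IntegrityError:
        return False  # e.g. duplicate name; the partial batch was rolled back
    return True


conn = open_db()
install(conn, [("foo", "1.0-1", "The foo library")])
# The second row clashes with "foo", so "bar" is rolled back as well.
install(conn, [("bar", "2.3-1", "Bar bindings"), ("foo", "1.1-1", "dup")])
print(conn.execute("SELECT name, version FROM pkg").fetchall())  # [('foo', '1.0-1')]
```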
You will gain little performance (a few seconds on a slow system?) at the expense of complexity.

Jürgen