2006/10/26, Aaron Griffin <aaronmgriffin@gmail.com>:
> What I'm saying is that you have the overhead of a full-scale database with little gain. Indexing by package name, yes, that's great and all, but it doesn't help with the slowest-of-the-slow -Ss operation. -Ss searches package names AND descriptions, allowing for regex matching. SQLite (and most DBs) do not support regex matching. Indexing the search by package name means little, because -Ss "foo" still SHOULD match a package named "barfoo" and a package with "foo" in its description. This means a sequential search.
>
> Not only that: because the DB does not support regex, one must iterate over each and every entry, get the values at the C level, apply the regex pattern, and note whether it matches. The only speed gain would be in the initial opening of the files.
>
> To me, this does not mean a DB backend is the solution. It may be better than the files backend, yes, but not the best, and outperforming the files backend is not hard: I was able to improve performance roughly sixfold simply by using the db.tar.gz files in place of disparate text files.
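To make the -Ss argument concrete, here is a minimal Python sketch of a sequential search over package names and descriptions. It is not libalpm code: the `Package` record, the `search` helper, and the sample entries are hypothetical. The point it shows is that an unanchored pattern such as "foo" can match anywhere in a name or a description, so every entry has to be tested and an index on package names cannot narrow the scan.

```python
import re
from dataclasses import dataclass


@dataclass
class Package:
    # Hypothetical stand-in for one sync database entry.
    name: str
    description: str


def search(packages, pattern):
    """Return names of packages whose name or description matches pattern.

    Every entry is visited: an unanchored regex such as "foo" can match
    inside a name ("barfoo") or a description, so a name index cannot be
    used to skip entries.
    """
    regex = re.compile(pattern)
    matches = []
    for pkg in packages:  # sequential scan over all entries
        if regex.search(pkg.name) or regex.search(pkg.description):
            matches.append(pkg.name)
    return matches


if __name__ == "__main__":
    db = [
        Package("foo", "the foo tool"),
        Package("barfoo", "bar with extras"),
        Package("baz", "a library used by foo"),
        Package("qux", "unrelated"),
    ]
    print(search(db, "foo"))  # ['foo', 'barfoo', 'baz']
```

"baz" is returned only because of its description, which is exactly the case a name index cannot serve.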
Yes, the main problem with the files backend is the huge number of files in /var/lib/pacman, which leads to slow performance on many filesystems. gdbm/SQLite/whatever has a big "+": everything can be in one file. Plus, as I said, with SQLite you can easily use an in-memory database, and you get easy ACID transaction support. This will be faster than seeking through /var/lib/pacman/ anyway. And there will be no need for pacman-optimize-like scripts with database backends.

The overhead is not big. (A quote from sqlite.org: "Small code footprint: less than 250KiB fully configured or less than 150KiB with optional features omitted")

Anyway, nobody is imposing a database backend as the default, let alone as the only one. But having an alternative is good. Users can test performance themselves and choose the solution that best fits their needs. Otherwise, what's the point of ALPM's ability to have different backends? ;-)

As for regexps: yes, SQLite doesn't support them, but adding them on top of it will be no slower than in the files backend anyway. I don't see a problem here.

I'm not "advertising" SQLite or other databases because "everybody uses databases for everything". I just think this will offer some new abilities for libalpm and will simplify (and speed up, yes, speed up) some things. That's IMHO, of course, but I think gdbm and SQLite are worth a try as alternatives to the files backend.

--
Roman Kyrylych (Роман Кирилич)
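The idea of layering regex matching on top of SQLite can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not libalpm code: the `pkg` table, its columns, and the helper names are hypothetical. It shows an in-memory database, inserts that commit or roll back as one transaction, and a user-registered `REGEXP` function (SQLite parses the `REGEXP` operator but ships no implementation by default). The query still visits every row, as argued in the quoted message; the sketch only moves that scan out of the directory tree and into a single database.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X), so the pattern
    # arrives first. NULL column values arrive as None.
    if value is None:
        return False
    return re.search(pattern, value) is not None


def open_db():
    # ":memory:" keeps the whole database in RAM; passing a file path
    # instead would keep everything in one file on disk.
    conn = sqlite3.connect(":memory:")
    # Register the regex matcher that the REGEXP operator will call.
    conn.create_function("REGEXP", 2, regexp)
    conn.execute("CREATE TABLE pkg (name TEXT PRIMARY KEY, description TEXT)")
    return conn


def add_packages(conn, rows):
    # "with conn" makes the inserts one transaction: committed on
    # success, rolled back entirely if any insert raises.
    with conn:
        conn.executemany("INSERT INTO pkg VALUES (?, ?)", rows)


def search(conn, pattern):
    # REGEXP cannot use the primary-key index, so every row is checked.
    cur = conn.execute(
        "SELECT name FROM pkg WHERE name REGEXP ? OR description REGEXP ? "
        "ORDER BY name",
        (pattern, pattern),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = open_db()
    add_packages(conn, [
        ("foo", "the foo tool"),
        ("barfoo", "bar with extras"),
        ("baz", "a library used by foo"),
        ("qux", "unrelated"),
    ])
    print(search(conn, "foo"))  # ['barfoo', 'baz', 'foo']
```

Whether this ends up faster than the files backend or the db.tar.gz approach is something to measure rather than assume; the sketch only shows that the features mentioned (single store, in-memory mode, transactions, regex via a registered function) are available.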