roman.kyrylych at gmail.com
Thu Oct 26 17:03:40 EDT 2006
2006/10/26, Aaron Griffin <aaronmgriffin at gmail.com>:
> What I'm saying is that you have the overhead of a full-scale database
> with little gain. Indexing by package names, yes that's great and
> all, but that doesn't help with the slowest-of-the-slow -Ss operation.
> -Ss searches package names AND descriptions, allowing for regex
> matching. Sqlite (and most DBs) do not support regex matching.
> Indexing the search by package name means little, because a -Ss "foo"
> still SHOULD match a package named "barfoo" and a package with "foo"
> in the description. This means a sequential search. Not only that,
> but because the DB will not support regex, that means that one must
> iterate over each and every entry, get the values at the C level,
> apply a regex pattern, and note if it is a match or not. The only
> speed you gain would be in the initial opening of the files. To me,
> this does not mean a DB backend is the solution. It may be better
> than the files backend, yes, but not the best, and outperforming the
> files backend is not hard... I was able to improve performance
> approximately 6 times by simply using the db.tar.gz files in place of
> disparate text files.
Yes, the main problem with the files backend is the huge number of files
in /var/lib/pacman, which leads to slow performance on many filesystems.
gdbm/sqlite/whatever has the big "+" that everything can be in one file.
Plus, as I said, with SQLite you can easily use an
in-memory database, and you get easy ACID transaction support. This
will be faster than seeking through /var/lib/pacman/ anyway. And
there will be no need for pacman-optimize-like scripts for the database.
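
To illustrate the in-memory database and transaction points, here is a minimal sketch using Python's standard sqlite3 module (Python only for brevity; libalpm itself is C). The table layout and the names open_package_db and add_packages are hypothetical, made up for this example, and are not pacman's actual schema or API.

```python
import sqlite3


def open_package_db():
    """Open an in-memory SQLite database with a hypothetical package table."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema for illustration only; not pacman's real layout.
    conn.execute(
        "CREATE TABLE packages ("
        "name TEXT PRIMARY KEY, version TEXT, description TEXT)"
    )
    return conn


def add_packages(conn, rows):
    """Insert several packages atomically.

    Using the connection as a context manager commits if the block
    succeeds and rolls back if it raises, so a failed batch leaves
    no partial entries behind.
    """
    with conn:
        conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", rows)
```

If a batch fails halfway (for example on a duplicate package name), the rows already inserted from that batch are rolled back, which is the kind of consistency the files backend has to handle by hand.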
The overhead is not big. (A quote from sqlite.org: "Small code
footprint: less than 250KiB fully configured or less than 150KiB with
optional features omitted")
Anyway, nobody is imposing a database backend as the default and the
only one. But having an alternative is good. Users can test performance
themselves and choose the solution that best fits their needs.
Otherwise, what's the point of ALPM's ability to have
different backends? ;-)
About regexps - yes, SQLite doesn't support them, but adding this on
top of it will not be slower than in the files backend anyway. I don't see
a problem here.
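
As a rough sketch of what "adding this on top" could look like: SQLite parses the REGEXP operator but ships no implementation for it, so the application has to register a two-argument function named regexp. The example below does that with Python's sqlite3 module; from C the same thing goes through sqlite3_create_function. The table, the sample rows and the names open_db and search are assumptions made up for this example.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def open_db(rows):
    """Build an in-memory database with a hypothetical package table."""
    conn = sqlite3.connect(":memory:")
    # Register the user function that backs the REGEXP operator.
    conn.create_function("REGEXP", 2, regexp)
    conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, description TEXT)")
    with conn:
        conn.executemany("INSERT INTO packages VALUES (?, ?)", rows)
    return conn


def search(conn, pattern):
    """Return names of packages whose name OR description matches pattern."""
    cur = conn.execute(
        "SELECT name FROM packages "
        "WHERE name REGEXP ? OR description REGEXP ? ORDER BY name",
        (pattern, pattern),
    )
    return [row[0] for row in cur]
```

This is still a sequential scan over every row, as Aaron says, since no index can help with an arbitrary regex. The matching just happens per row inside the query instead of in a hand-written loop over files.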
I'm not "advertising" SQLite or other databases because "everybody uses
databases for everything". I just think that this will offer some new
abilities for libalpm and will simplify (and speed up, yes - speed up)
some things. That's IMHO, of course, but I think it's worth trying gdbm
and sqlite as alternatives to the files backend.
Roman Kyrylych (Роман Кирилич)