[pacman-dev] backends

26 Oct 2006


On 10/26/06, Roman Kyrylych <roman.kyrylych@gmail.com> wrote:
...
2006/10/26, Aaron Griffin <aaronmgriffin@gmail.com>:
...
On 10/26/06, Roman Kyrylych <roman.kyrylych@gmail.com> wrote:
...
[offtopic on]
BTW, about complex patches - I have a patch from Martin Devera that
adds support for a gdbm database backend to pacman 2.9.8. I suggested
he contact VMiklos for help getting it included in pacman, but I don't
know if he did that. If I adapt it to the pacman 3 API, is there a
chance it will be included, at least in CVS/DARCS?
[/offtopic off]
There's no chance that would get included in pacman 2.9.8 - it will die soon.
Converting it to pacman3 is entirely doable and it's already set up to
do so.  You should be able to copy be_files.c to be_gdbm.c and consume
the interface provided there.
FTR 'be_' means 'backend'.  I still need to add compile-time flags for
selecting a backend, but as we only provide _files at the moment,
there's no need.
I worked on a 'compressed' backend that required no changes, but read
directly from the downloaded db files... it caused problems on the
local db, however, so I haven't completed it (yet).
As for backend schemes, I think we need to discuss avenues of attack
here.  The files backend is painfully slow on ext filesystems (11
seconds to parse [community]).  There are many users touting a
database backend as a good idea (I highly disagree), and if we do
this, we NEED a generic layer to support any sane database format, not
just one specific thing.  Using a flat-file database is a better
option, but here people seem to think sqlite is a good idea (it's not)
- there are many flat-file schemes that can drastically outperform
sqlite.
There are many many other options, and I think it's worth at least
_supplying_ multiple backends, even if they're not used.  So gdbm is
cool.
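
To make the "generic layer" idea concrete, here is a minimal sketch in
Python (not pacman's C code, and not the actual be_files.c interface).
Every name in it (Backend, FilesBackend, DbmBackend, demo) is
hypothetical, the one-directory-per-package layout with a "desc" file is
an assumption, and the standard library's dbm.dumb stands in for gdbm.
The point is only that callers talk to one interface and the storage
behind it can be swapped.

```python
import dbm.dumb
import os
import tempfile
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical generic layer: callers only ever see this interface."""

    @abstractmethod
    def write(self, pkg, desc):
        """Store the description text for one package."""

    @abstractmethod
    def read(self, pkg):
        """Return the description text for one package."""

    def close(self):
        """Release resources; nothing to do by default."""


class FilesBackend(Backend):
    """Assumed layout: one directory per package with a 'desc' file in it."""

    def __init__(self, root):
        self.root = root

    def write(self, pkg, desc):
        os.makedirs(os.path.join(self.root, pkg), exist_ok=True)
        with open(os.path.join(self.root, pkg, "desc"), "w", encoding="utf-8") as f:
            f.write(desc)

    def read(self, pkg):
        with open(os.path.join(self.root, pkg, "desc"), encoding="utf-8") as f:
            return f.read()


class DbmBackend(Backend):
    """Same interface, stored in a single key/value database."""

    def __init__(self, path):
        # dbm.dumb is a portable stand-in for gdbm in this sketch.
        self.db = dbm.dumb.open(path, "c")

    def write(self, pkg, desc):
        self.db[pkg.encode("utf-8")] = desc.encode("utf-8")

    def read(self, pkg):
        return self.db[pkg.encode("utf-8")].decode("utf-8")

    def close(self):
        self.db.close()


def demo():
    """Run the same write/read round trip against both backends."""
    with tempfile.TemporaryDirectory() as tmp:
        results = []
        for backend in (FilesBackend(os.path.join(tmp, "files")),
                        DbmBackend(os.path.join(tmp, "kv"))):
            backend.write("foo-1.0-1", "A test package")
            results.append(backend.read("foo-1.0-1"))
            backend.close()
        return results
```

The code calling write() and read() does not change when the backend
does, which is roughly what a compile-time backend switch would rely on.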
Maybe we should start a separate thread for this?
Personally, I think sqlite will be the best choice because:
1) it's small
2) it's fast
3) it's embeddable
Ok, sqlite is "fast and etc etc", right? So if I do [pacman -Ss foo],
what does sqlite do?  It opens the file (exactly as plaintext
searching would), initializes some structures and things like that
describing table data, then proceeds to sequentially search (FULL
TABLE SCAN) through all rows.  It is impossible, even in a good
database, to index substrings.  Yes, you can index entire strings,
that's easy.  The search operations check internal substrings, not
whole strings.
You could make this faster by converting each entry into a suffix tree
and storing that somewhere, but that gets irritating and gets into
that crappy Computer Science stuff everyone hates.
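
The full-table-scan point can be checked with sqlite's own EXPLAIN QUERY
PLAN.  What follows is a minimal sketch using Python's sqlite3 module;
the table, column, and index names (pkg, name, descr, pkg_name) are made
up for illustration and have nothing to do with any real pacman schema.
It covers only an ordinary index on a text column, and the exact wording
of the plan varies between sqlite versions, so the code looks only at
the first word.

```python
import sqlite3


def query_plans():
    """Return the query plans for an exact match and a substring match."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pkg (name TEXT, descr TEXT)")
    con.execute("CREATE INDEX pkg_name ON pkg (name)")
    con.executemany(
        "INSERT INTO pkg VALUES (?, ?)",
        [("foo", "A foo thing"), ("libfoo", "Library for foo"), ("bar", "Bar")],
    )

    def plan(sql):
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        return " | ".join(row[-1] for row in rows)

    # Whole-string comparison: the index on name can be used.
    exact = plan("SELECT name FROM pkg WHERE name = 'foo'")
    # Substring search, roughly what -Ss needs: every row must be visited.
    substring = plan(
        "SELECT name FROM pkg WHERE name LIKE '%foo%' OR descr LIKE '%foo%'"
    )
    con.close()
    return exact, substring


if __name__ == "__main__":
    for label, detail in zip(("exact", "substring"), query_plans()):
        print(label + ":", detail)
```

The exact-match plan begins with SEARCH (an index lookup), while the
substring plan begins with SCAN, i.e. it walks every row.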

Fact of the matter is, there are two main "slow" operations - searching,
and loading the whole DB (in the case of an -Su).  Loading the whole
database can easily be sped up by using lazy package evaluation -
that is, a simple readdir() of the db directory gives us package name,
version, and release - all we need for an upgrade check.  The
additional data can be read when/if required.  Substring searching
will never be uber fast, but we can maintain a small index file with
all text that's searched... something like:

extra/package-name-1.0-1 : This is the description for the package
community/package-two-1.1.1-1 : This is the description for this thing
extra/another-99-1 : This is the description for some stuff

-Ss can check this file, and use the first entry before the colon to
construct the path to open (though that may not be needed as all the
information is right there).
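
Both ideas above fit in a few lines.  This is a minimal sketch in Python
rather than pacman's C, and the function names (split_pkgname,
list_packages, search_index) are hypothetical.  It assumes the db
directory holds one subdirectory per package named
name-version-release, and that index lines look exactly like the sample
above, with " : " as the separator.

```python
import os

SAMPLE_INDEX = [
    "extra/package-name-1.0-1 : This is the description for the package",
    "community/package-two-1.1.1-1 : This is the description for this thing",
    "extra/another-99-1 : This is the description for some stuff",
]


def split_pkgname(entry):
    """Split 'name-version-release' from the right; names may contain '-'."""
    name, version, release = entry.rsplit("-", 2)
    return name, version, release


def list_packages(db_dir):
    """Lazy listing: only directory names are read, no package file is opened."""
    with os.scandir(db_dir) as entries:
        return sorted(split_pkgname(e.name) for e in entries if e.is_dir())


def search_index(lines, term):
    """Case-insensitive substring match on package entry and description."""
    needle = term.lower()
    hits = []
    for line in lines:
        key, sep, desc = line.partition(" : ")
        if not sep:
            continue  # skip lines that do not follow the assumed format
        repo, _, entry = key.partition("/")
        if needle in entry.lower() or needle in desc.lower():
            hits.append((repo,) + split_pkgname(entry) + (desc.strip(),))
    return hits
```

For example, search_index(SAMPLE_INDEX, "stuff") returns a single hit
for extra/another, taken entirely from the index line without opening
any package directory.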

Ok, I'm gonna stop myself before I rant too much.  There are cries of
"use a database" from many people.  Frankly, I don't think that these
people have evaluated all the options and are stuck in the "everyone
uses databases for their websites" mentality.

Take this for instance: The Google engineers here in Chicago explained
to me something they use to store whatever the hell Google stores
there: A custom filesystem tooled to their spec.

They don't use mysql or oracle or sqlite or any of that - the king of
searching uses the simplest of all options.
...
4) it offers full power of SQL99 (with ACID transactions!!!)
Nope. Not at all.  It doesn't even support SQL92 - a 14-year-old standard.
http://www.sqlite.org/omitted.html
http://www.sqlite.org/cvstrac/wiki?p=UnsupportedSql
They have said publicly they do not plan on adding support.  Standards
compliance is HUGE in my book.