Split. On 10/26/06, Roman Kyrylych <roman.kyrylych@gmail.com> wrote:
2006/10/26, Aaron Griffin <aaronmgriffin@gmail.com>:
On 10/26/06, Roman Kyrylych <roman.kyrylych@gmail.com> wrote:
[offtopic on] BTW, about complex patches: I have a patch from Martin Devera that adds support for a gdbm database backend to pacman 2.9.8. I suggested he contact VMiklos for help getting it into pacman, but I don't know if he did. If I adapt it to the Pacman 3 API, is there a chance it will be included, at least in CVS/DARCS? [offtopic off]
There's no chance that would get included in pacman 2.9.8 - it will die soon.
Converting it to pacman3 is entirely doable, and pacman3 is already set up for it. You should be able to copy be_files.c to be_gdbm.c and implement the same interface it provides.
FTR 'be_' means 'backend'. I still need to add compile-time flags for selecting a backend, but as we only provide _files at the moment, there's no need.
I worked on a 'compressed' backend that required no changes and read directly from the downloaded db files... it caused problems with the local db, however, so I haven't completed it (yet).
As for backend schemes, I think we need to discuss avenues of attack here. The files backend is painfully slow on ext filesystems (11 seconds to parse [community]). Many users tout a database backend as a good idea (I strongly disagree), and if we do this, we NEED a generic layer that supports any sane database format, not just one specific thing. A flat-file database is a better option, but here people seem to think sqlite is a good idea (it's not): there are many flat-file schemes that can drastically outperform sqlite.
There are many, many other options, and I think it's worth at least _supplying_ multiple backends, even if they're not used. So gdbm is cool.
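For illustration only, here is one way a generic backend layer with compile-time selection might look in C. Everything in it is hypothetical: `struct db_backend`, `db_find_backend()`, the `BACKEND_GDBM` macro and the stub function bodies are made-up names, not the actual interface in be_files.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generic backend interface: each be_*.c file provides one. */
struct db_backend {
    const char *name;
    int  (*open)(const char *dbpath);
    void (*close)(void);
};

/* be_files: placeholder bodies; a real backend would parse the db tree. */
static int  be_files_open(const char *dbpath) { return dbpath ? 0 : -1; }
static void be_files_close(void) { }

#ifdef BACKEND_GDBM
/* be_gdbm: would wrap gdbm_open()/gdbm_close(); only built on request. */
static int  be_gdbm_open(const char *dbpath) { return dbpath ? 0 : -1; }
static void be_gdbm_close(void) { }
#endif

/* The compile-time flags decide which entries exist in this table. */
static const struct db_backend backends[] = {
    { "files", be_files_open, be_files_close },
#ifdef BACKEND_GDBM
    { "gdbm",  be_gdbm_open,  be_gdbm_close  },
#endif
};

/* Look a backend up by name; returns NULL if it wasn't compiled in. */
const struct db_backend *db_find_backend(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof backends / sizeof backends[0]; i++)
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    return NULL;
}
```

Building with `-DBACKEND_GDBM` would add the gdbm row to the table. Callers only go through the struct, so supporting another format would mean one more be_*.c file and one more table entry.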
Maybe we should start a separate thread for this? Personally, I think sqlite would be the best choice because:
1) it's small
2) it's fast
3) it's embeddable
Ok, sqlite is "fast and etc etc", right? So if I do [pacman -Ss foo], what does sqlite do? It opens the file (exactly as plaintext searching would), initializes some structures describing the table data, then proceeds to search sequentially (a FULL TABLE SCAN) through all rows. Even in a good database, a normal index can't do substring matching. Yes, you can index entire strings, that's easy, but the search operations check internal substrings, not whole strings. You could make this faster by converting each entry into a suffix tree and storing that somewhere, but that gets irritating and gets into that crappy Computer Science stuff everyone hates.

Fact of the matter is, there are two main "slow" operations: searching, and loading the whole DB (in the case of an -Su). Loading the whole database can easily be sped up by using lazy package evaluation. That is, a simple readdir() of the db directory gives us the package name, version, and release, which is all we need for an upgrade check. The additional data can be read when/if required.

Substring searching will never be uber fast, but we can maintain a small index file with all the text that's searched... something like:

  extra/package-name-1.0-1 : This is the description for the package
  community/package-two-1.1.1-1 : This is the description for this thing
  extra/another-99-1 : This is the description for some stuff

-Ss can check this file and use the first entry before the colon to construct the path to open (though that may not be needed, as all the information is right there).

Ok, I'm gonna stop myself before I rant too much. There are cries of "use a database" from many people. Frankly, I don't think these people have evaluated all the options; they're stuck in the "everyone uses databases for their websites" mentality. Take this for instance: the Google engineers here in Chicago explained to me something they use to store whatever the hell Google stores there: a custom filesystem tooled to their spec.
They don't use mysql or oracle or sqlite or any of that; the king of searching uses the simplest of all options.
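The lazy-loading idea for -Su could be sketched roughly like this, assuming each package in the db directory is an entry named `name-version-release`, as the message describes. `struct pkg_stub`, `parse_db_entry()` and `load_db_stubs()` are hypothetical names, not pacman code. Version and release are split off at the last two `-` characters because package names themselves may contain dashes.

```c
#include <dirent.h>
#include <string.h>

/* Just enough for an upgrade check; everything else stays on disk. */
struct pkg_stub {
    char name[256];
    char version[64];
    char release[32];
};

/* Split "name-version-release" at its last two '-' characters.
 * Returns 0 on success, -1 if the entry doesn't have that shape
 * or a field is too long. */
int parse_db_entry(const char *entry, struct pkg_stub *pkg)
{
    char buf[512];
    char *rel_dash, *ver_dash;

    if (strlen(entry) >= sizeof buf)
        return -1;
    strcpy(buf, entry);

    rel_dash = strrchr(buf, '-');
    if (rel_dash == NULL || rel_dash[1] == '\0')
        return -1;
    *rel_dash = '\0';                 /* buf is now "name-version" */

    ver_dash = strrchr(buf, '-');
    if (ver_dash == NULL || ver_dash == buf || ver_dash[1] == '\0')
        return -1;
    *ver_dash = '\0';                 /* buf is now "name" */

    if (strlen(buf) >= sizeof pkg->name ||
        strlen(ver_dash + 1) >= sizeof pkg->version ||
        strlen(rel_dash + 1) >= sizeof pkg->release)
        return -1;

    strcpy(pkg->name, buf);
    strcpy(pkg->version, ver_dash + 1);
    strcpy(pkg->release, rel_dash + 1);
    return 0;
}

/* Lazy load: only readdir() the db directory; descriptions, deps and
 * file lists are not read here. Returns the number of stubs filled in
 * (at most max), or -1 if the directory can't be opened. */
int load_db_stubs(const char *dbpath, struct pkg_stub *pkgs, int max)
{
    DIR *dir = opendir(dbpath);
    struct dirent *ent;
    int n = 0;

    if (dir == NULL)
        return -1;
    while (n < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                 /* skip ".", ".." and hidden entries */
        if (parse_db_entry(ent->d_name, &pkgs[n]) == 0)
            n++;
    }
    closedir(dir);
    return n;
}
```

Only the directory listing is touched here; a package's description or file list would be read from its own entry later, and only if an operation actually needs it.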
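Likewise, here is a rough sketch of scanning the proposed search index, assuming the `repo/name-version-release : description` line format from the example in the message and a plain case-sensitive substring match via `strstr()` (how pacman -Ss itself matches may differ). `search_index()` is a hypothetical helper, not an existing pacman function.

```c
#include <stddef.h>
#include <string.h>

#define INDEX_LINE_MAX 1024

/* Scan an in-memory copy of the (hypothetical) search index, one package
 * per line: "repo/name-version-release : description". Each line that
 * contains `needle` is counted, and its "repo/name-version-release" part
 * is appended to `out`, one per line, as long as it fits.
 * Returns the number of matching lines. */
size_t search_index(const char *text, const char *needle,
                    char *out, size_t outsz)
{
    char line[INDEX_LINE_MAX];
    size_t matches = 0, used = 0;

    if (outsz > 0)
        out[0] = '\0';

    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);

        if (len >= sizeof line)
            len = sizeof line - 1;            /* truncate overlong lines */
        memcpy(line, text, len);
        line[len] = '\0';

        if (strstr(line, needle) != NULL) {
            /* The part before " : " is the path; no separator -> whole line. */
            const char *sep = strstr(line, " : ");
            size_t pathlen = sep ? (size_t)(sep - line) : len;

            if (used + pathlen + 1 < outsz) { /* room for path, '\n', '\0' */
                memcpy(out + used, line, pathlen);
                used += pathlen;
                out[used++] = '\n';
                out[used] = '\0';
            }
            matches++;
        }
        text = nl ? nl + 1 : text + strlen(text);
    }
    return matches;
}
```

This is still a linear scan, but each -Ss reads one file instead of opening a file for every package.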
4) it offers the full power of SQL99 (with ACID transactions!!!)
Nope. Not at all. It doesn't even fully support SQL92, a 14-year-old standard:
http://www.sqlite.org/omitted.html
http://www.sqlite.org/cvstrac/wiki?p=UnsupportedSql
They have said publicly that they do not plan on adding support. Standards compliance is HUGE in my book.