On Nov 9, 2007 4:10 PM, Miklos Vajna <vmiklos@frugalware.org> wrote:
good question. i actually like the current backend as it's easy to repair. if you have a corrupted db and bdb's magic repair says it can't fix it, say goodbye to your data (i managed to do this with both my bogofilter db and my old rpm database, too)
Yeah, I said this to someone offlist - 1 misplaced byte will blow a binary blob DB out of the water.
if you look at git, they also had this problem: git operates with many, many small files, and they introduced packs (with indexes) to speed up reading from the database
an idea i have: it would be interesting to see if this works for pacman's db, too. like: the sync dbs are never modified, so the sync dbs could be one single file: just files concatenated after each other, and having an index to mark where an entry (which is a file atm) starts, its length and maybe pkgname-pkgver (so by reading the index you would get what you currently get by reading the dir only but no files)
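To make the idea above concrete, here is a minimal sketch of such a pack plus index, in Python for brevity. Everything in it is an assumption for illustration: the function names, the tab-separated index layout (name, offset, length) and the sample entry contents are hypothetical, and this is neither pacman's nor git's actual format.

```python
import io


def build_pack(entries):
    """Concatenate entries into one blob and build a text index.

    entries maps "pkgname-pkgver" to the raw bytes of that entry
    (hypothetical layout). Returns (pack_bytes, index_text), where each
    index line is "name<TAB>offset<TAB>length".
    """
    pack = io.BytesIO()
    index_lines = []
    for name in sorted(entries):
        data = entries[name]
        # Record where this entry starts before appending it.
        index_lines.append(f"{name}\t{pack.tell()}\t{len(data)}")
        pack.write(data)
    return pack.getvalue(), "\n".join(index_lines) + "\n"


def load_index(index_text):
    """Parse the index into {name: (offset, length)} without touching the pack."""
    index = {}
    for line in index_text.splitlines():
        name, offset, length = line.split("\t")
        index[name] = (int(offset), int(length))
    return index


def read_entry(pack_file, index, name):
    """Seek straight to one entry and read only its bytes."""
    offset, length = index[name]
    pack_file.seek(offset)
    return pack_file.read(length)
```

Listing the packages only needs `load_index()`, which matches "reading the dir only but no files"; `read_entry()` is called only when the contents of one entry are actually needed.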
Yeah. I had proof-of-concept code somewhere that never untarred the sync DBs, and simply parsed them in their tar.gz form (libarchive). It actually worked very very well, but it complicates things as we now need a different scheme for sync and local DBs. Which is acceptable, just a little ugly.
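For illustration only, reading a sync DB in its tar.gz form without unpacking it could look roughly like the sketch below. The proof-of-concept mentioned above used libarchive; this swaps in Python's standard `tarfile` module instead, and the function names and the `foo-1.0-1/desc` member layout are assumptions, not taken from that code.

```python
import io
import tarfile


def iter_sync_db(fileobj):
    """Yield (member_name, content_bytes) for each regular file in a tar.gz stream.

    Nothing is written to disk; members are read straight from the archive.
    """
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


def make_demo_db(entries):
    """Build a small in-memory tar.gz so the reader can be tried without a real DB."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

A call such as `dict(iter_sync_db(make_demo_db({"foo-1.0-1/desc": b"%NAME%\nfoo\n"})))` returns the entries keyed by member name. The local DB would still need its own read/write path, which is the "different scheme" concern.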
and finally one more argument for the current implementation: several times i had an hdd crash where parts of the pacman db were affected during fsck, and it was easy to solve. pacman -Qt, it printed 3 packages, reinstall them, and you're ready. afaik none of the single-file implementations have this advantage
I know. I really like the plain text approach. It's very elegant. Maybe using something similar to git's packs may help us here. I will look into it.