On 01/11/12 15:12, Allan McRae wrote:
I was thinking about the local database backend in pacman and how we could improve it.
The tar-based backend for sync dbs has been quite a success. Ever since we did that, I have been wanting to do the same with the local database. But there were two issues:
1) tar is not designed for removing and updating files in place
2) with a directory/files per package structure, we are quite robust to corruption, as a single corrupt file affects only a single package.
Well... I have a cunning plan... How about we do both!
Have the local database in a tarball but also extracted. All reading is done from the tarball, so -Q operations would be fast because they do not require reading lots of small files. With an operation that adds/removes/updates a package, the local database is still read from the tarball, but the modifications are made to the files backend and then the tarball is recreated at the end. We could even be efficient during the recreation and read all the old files from the old tarball and only read the new files from the filesystem (which will be in the kernel cache...).
This would also give another use for "pacman -D" - an option could be added to recreate the local db tarball - in case it became corrupt or the files were manually edited.
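To make the idea concrete, here is a rough sketch in Python (not pacman code, and not a proposed implementation). It assumes one directory per package under the extracted database; the function names, the "local.db" file name and the "desc" file used in the usage note are purely illustrative. Reads come from the tarball, and the rebuild copies unchanged entries straight from the old tarball, touching the filesystem only for packages that changed.

```python
import os
import tarfile


def read_entries(tar_path):
    """Return {member name: bytes} for every regular file in the tarball."""
    entries = {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if member.isfile():
                entries[member.name] = tar.extractfile(member).read()
    return entries


def rebuild_tarball(tar_path, db_dir, changed_pkgs):
    """Recreate tar_path, reusing unchanged entries from the old tarball.

    changed_pkgs (hypothetical parameter): names of the package
    directories that were added, updated or removed under db_dir.
    """
    changed = set(changed_pkgs)
    tmp_path = tar_path + ".tmp"
    old = tarfile.open(tar_path, "r") if os.path.exists(tar_path) else None
    try:
        with tarfile.open(tmp_path, "w") as new:
            if old is not None:
                # Copy every entry of an unchanged package from the old tarball.
                for member in old:
                    pkg = member.name.split("/", 1)[0]
                    if pkg in changed:
                        continue
                    fileobj = old.extractfile(member) if member.isfile() else None
                    new.addfile(member, fileobj)
            # Only changed packages are read from the filesystem;
            # removed ones no longer exist there and are simply dropped.
            for pkg in sorted(changed):
                pkg_dir = os.path.join(db_dir, pkg)
                if os.path.isdir(pkg_dir):
                    new.add(pkg_dir, arcname=pkg)
    finally:
        if old is not None:
            old.close()
    # Swap the new tarball into place only once it is fully written.
    os.replace(tmp_path, tar_path)
```

A full rebuild, as the "pacman -D" option would need, is the same call with every package directory listed as changed, e.g. rebuild_tarball("local.db", "local", os.listdir("local")).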
What do people think?
Just to be clear, I wanted comments on whether the dual local database (one "binary", one filesystem based) would be a good solution to increase read speed (due to not having many small files) while also keeping the robustness of the non-binary format to corruption. I.e., would the extra 10-20MB be an acceptable trade-off? Given I have absolutely no interest in using sqlite, bdb, etc... and I can almost guarantee that I will be the one to provide a patchset that changes the local backend, comments about choosing a relational database are not needed.
Allan