[pacman-dev] Fast binary database
Hello,

I have been playing with the idea of speeding up pacman for a little while. This is my first try. It's a binary database that uses a ternary search tree to speed up file lookup and package lookup. The result can be found at http://owwsnap.com/sarcina-0.0.1.tar.gz.

To compile it you need to run the normal ./configure && make, but you will probably want to edit src/sarcina.c first and change the hardcoded paths. If you're running a new version of pacman (3.1 or something) you probably need to change libsarc/sarc.c:505 from

    db_local = alpm_db_register("local");

into

    alpm_option_set_dbpath("/var/lib/pacman/");
    db_local = alpm_db_register_local();

I see you have discussed the database question several times. I hope this code could give you some answers (is a binary database really faster etc).

To run this program you first need to build the database using sarcina -u. Then you can:
- find who owns a file (sarcina -o file)
- list files that are outdated (sarcina -s)
- "install" a package (sarcina -i package)

This is pretty much just a proof-of-concept, so don't expect it to do much, but I think it might be a start.

Sivert
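For readers who have not met the data structure named above, here is a minimal sketch of a ternary search tree that maps a file path to the name of the package owning it. It is not taken from the sarcina sources: the names tst_node, tst_insert, tst_lookup and tst_free are made up for illustration, and the tree lives in memory only, whereas sarcina presumably stores its tree in the binary database file.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout: one node per character of a key.
 * `owner` is non-NULL only on the node where some key ends. */
struct tst_node {
    char ch;
    struct tst_node *lo, *eq, *hi;
    const char *owner; /* not copied; the caller keeps it alive */
};

/* Insert a non-empty key. Returns 0 on success, -1 if out of memory. */
int tst_insert(struct tst_node **root, const char *key, const char *owner)
{
    assert(key != NULL && *key != '\0');
    struct tst_node **link = root;
    for (;;) {
        if (*link == NULL) {
            *link = calloc(1, sizeof **link);
            if (*link == NULL)
                return -1;
            (*link)->ch = *key;
        }
        struct tst_node *n = *link;
        if (*key < n->ch) {
            link = &n->lo;          /* smaller character: go left */
        } else if (*key > n->ch) {
            link = &n->hi;          /* larger character: go right */
        } else if (key[1] != '\0') {
            link = &n->eq;          /* match: advance to next character */
            key++;
        } else {
            n->owner = owner;       /* last character: store the value */
            return 0;
        }
    }
}

/* Return the owner stored for `key`, or NULL if the key is absent. */
const char *tst_lookup(const struct tst_node *n, const char *key)
{
    if (key == NULL || *key == '\0')
        return NULL;
    while (n != NULL) {
        if (*key < n->ch) {
            n = n->lo;
        } else if (*key > n->ch) {
            n = n->hi;
        } else if (key[1] != '\0') {
            n = n->eq;
            key++;
        } else {
            return n->owner;
        }
    }
    return NULL;
}

/* Free every node of the tree (the owner strings are not freed). */
void tst_free(struct tst_node *n)
{
    if (n == NULL)
        return;
    tst_free(n->lo);
    tst_free(n->eq);
    tst_free(n->hi);
    free(n);
}
```

A lookup walks the tree one node at a time and advances through the key only on a character match, so paths sharing a prefix such as /usr/bin/ share the nodes for that prefix. That is one plausible reason such a structure suits a file-ownership query, though it says nothing about how sarcina lays the tree out on disk.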
On Fri, 23 Nov 2007 21:44:54 +0100 Sivert Berg <siveberg@online.no> wrote:
Hello,
[cut]
I see you have discussed the database question several times. I hope this code could give you some answers (is a binary database really faster etc).
It's clear that binary data is faster than text data. Some people don't want to hear this: their way is the best {clean, useful, etc.} and all the others suck.
This is pretty much just a proof-of-concept, so don't expect it to do much, but I think it might be a start.
[danimoth@jane src]$ time ./sarcina -o `which vim`
/usr/bin/vim is owned by vim

real 0m0.127s
user 0m0.040s
sys 0m0.037s

[danimoth@jane src]$ time pacman -Qo `which vim`
/usr/bin/vim è contenuto in vim 7.1.156-1

real 1m58.891s
user 0m42.174s
sys 0m7.100s

OK, the pacman in [core] is buggy; with the git version it takes 40 sec (without cache).

40 sec vs 0.1 sec... It's a GOOD starting point.

I'm trying to set up (despite git, which is really unusable) a copy of the git repo with an sqlite backend coded in. I want to try to merge your changes into it.

--
JJDaNiMoTh - ArchLinux Trusted User
On Sun, Nov 25, 2007 at 12:32:23PM +0100, JJDaNiMoTh wrote:
[danimoth@jane src]$ time ./sarcina -o `which vim`
/usr/bin/vim is owned by vim
real 0m0.127s user 0m0.040s sys 0m0.037s
[danimoth@jane src]$ time pacman -Qo `which vim`
/usr/bin/vim è contenuto in vim 7.1.156-1
real 1m58.891s user 0m42.174s sys 0m7.100s
OK, the pacman in [core] is buggy; with the git version it takes 40 sec (without cache).
40 sec vs 0.1 sec... It's a GOOD starting point.
I made this comparison at first too, which is why I tried to find the hot spot in the current -Qo code. See http://www.archlinux.org/pipermail/pacman-dev/2007-November/010277.html

So the above comparison isn't fair, and the difference isn't explained by the backends alone. Still, the sarcina backend does seem to be an improvement, but is it worth it? The advantages of the text backend were explained earlier this month, following your SQL structure proposal: http://www.archlinux.org/pipermail/pacman-dev/2007-November/009938.html

However, sarcina can probably serve as a reference, to see whether it's possible to come close to its speed while keeping a text backend. Firstly, the text backend itself can be changed. Secondly, the backend isn't everything, as my -Qo optimization shows.

In any case, it's an interesting experiment.
Sivert Berg <siveberg@online.no> wrote:
Hello,
[cut]
I see you have discussed the database question several times. I hope this code could give you some answers (is a binary database really faster etc).
It's clear that binary data is faster than text data. Some people don't want to hear this: their way is the best {clean, useful, etc.} and all the others suck.
Well, I could accept pluggable db back-ends in libalpm [and maintain an officially supported one here]. The problem is that you must get your distro to provide special db formats for the sync repos. [Converting databases is odd imho.]

But we agreed earlier that optimizing the sync-repo db is much easier, because that is a read-only db, so its "inefficiency" [from your point of view] might disappear in the future.

Bye
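To make the "pluggable db back-ends" idea concrete, here is a rough sketch of what such an interface could look like: a table of function pointers, with a back-end picked by name at run time. Nothing here is libalpm API. The names db_backend, file_owner and backend_by_name are assumptions made up for this example, and the "table" back-end is a toy with two hard-coded rows standing in for a real text, binary or sqlite implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface: every back-end answers the same queries. */
struct db_backend {
    const char *name;
    /* Returns the owning package name, or NULL if nothing owns `path`. */
    const char *(*file_owner)(const char *path);
};

/* Toy back-end: a fixed in-memory table with illustrative rows only. */
static const char *table_file_owner(const char *path)
{
    static const char *const rows[][2] = {
        { "/usr/bin/vim", "vim" },
        { "/usr/bin/pacman", "pacman" },
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        if (strcmp(rows[i][0], path) == 0)
            return rows[i][1];
    }
    return NULL;
}

/* Registry of compiled-in back-ends; a second format would add a row. */
static const struct db_backend backends[] = {
    { "table", table_file_owner },
};

/* Look up a back-end by name; returns NULL if it is not registered. */
const struct db_backend *backend_by_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    }
    return NULL;
}
```

With this shape, the calling code only ever sees struct db_backend, so swapping formats would not touch the query code. It does not solve the distribution problem raised above: the sync repos would still have to ship a format that the chosen back-end can read.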
Participants (4):
- JJDaNiMoTh
- Nagy Gabor
- Sivert Berg
- Xavier