[pacman-dev] Backend DB [was: Hypotetic pacman's db structure]
On Nov 9, 2007 3:22 PM, Miklos Vajna <vmiklos@frugalware.org> wrote:
On Fri, Nov 09, 2007 at 05:26:01PM +0100, JJDaNiMoTh <jjdanimoth@gmail.com> wrote:
I'm writing an example database structure for the pacman db (it may be useful if someone decides to use SQLite as the backend instead of simple text files). [1]
last year i wrote such a structure, but it was much more complex:
http://git.frugalware.org/gitweb/gitweb.cgi?p=vmexam.git;a=blob;f=sql/pacman...
and to be honest i must say i agree with Aaron that using an sql-based db for such a low-level purpose (pkgdb) would be something like using curl in libpacman
Hah, this is the second time you agreed with me! The sky is falling! To probe your thoughts a bit further, if the world were ideal, what kind of backend would YOU go for?
On Fri, Nov 09, 2007 at 03:28:33PM -0600, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
To probe your thoughts a bit further, if the world were ideal, what kind of backend would YOU go for?
:) good question. i actually like the current backend as it's easy to repair. if you have a corrupted db and bdb's magic repair says it can't fix it.. say goodbye to your data (i managed to do this with both my bogofilter db and my old rpm database, too)

if you look at git, they also had this problem: git operates with many, many small files and they introduced packs (with indexes) to speed up reading from the database

an idea i have: it would be interesting to see if this works for pacman's db, too. like: the sync dbs are never modified, so each sync db could be one single file: just the files concatenated after each other, plus an index that marks where an entry (which is a file atm) starts, its length and maybe pkgname-pkgver (so by reading the index you would get what you currently get by reading the dir only but no files)

for the local db, we could have a 'marked as deleted' bit for removed entries, then pacman-optimize could really remove those entries if it hurts someone

ok, enough. i think it's not worth a big discussion; it would be interesting after i have implemented it and have benchmarks to see how fast it is compared to the current implementation

pacman-1.0 proved that there is a single-file db implementation that doesn't work - it would be nice if i could provide one that does :)

ah, and my additional problem with bdb is its size: 14M (compressed)

and finally one more argument for the current implementation: several times i had a hdd crash where parts of the pacman db were affected during fsck, and it was easy to solve: pacman -Qt, it printed 3 packages, reinstall, and you're ready. afaik none of the single-file implementations have this advantage

thanks,
- VMiklos
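The pack-plus-index idea described above could look roughly like the following. This is only a minimal sketch in Python, not anything from pacman or Miklos's code: the names `IndexEntry`, `build_pack`, `read_entry`, `mark_deleted` and `optimize` are hypothetical, the index is kept in memory rather than in an on-disk format, and the `%NAME%` bodies in the usage note are just illustrative entry contents.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    name: str              # e.g. "pkgname-pkgver"
    offset: int            # where the entry body starts in the pack
    length: int            # size of the entry body in bytes
    deleted: bool = False  # "marked as deleted" bit for the local db


def build_pack(entries):
    """Concatenate entry bodies; return (pack_bytes, index)."""
    pack = bytearray()
    index = []
    for name, body in entries.items():
        index.append(IndexEntry(name, len(pack), len(body)))
        pack += body
    return bytes(pack), index


def list_names(index):
    """Listing needs only the index, never the entry bodies."""
    return [e.name for e in index if not e.deleted]


def read_entry(pack, index, name):
    """Return the body of one live entry by slicing the pack."""
    for e in index:
        if e.name == name and not e.deleted:
            return pack[e.offset:e.offset + e.length]
    raise KeyError(name)


def mark_deleted(index, name):
    """Removal only flips the bit; the bytes stay in the pack."""
    for e in index:
        if e.name == name:
            e.deleted = True


def optimize(pack, index):
    """Rewrite the pack without deleted entries (a pacman-optimize-like pass)."""
    live = {e.name: pack[e.offset:e.offset + e.length]
            for e in index if not e.deleted}
    return build_pack(live)
```

For example, `build_pack({"foo-1.0-1": b"%NAME%\nfoo\n"})` returns the concatenated bytes plus a one-entry index; after `mark_deleted(index, "foo-1.0-1")`, the entry disappears from `list_names(index)` but its bytes are only reclaimed by `optimize`. Whether such a layout would keep the easy-repair property of the current backend is exactly the open question in this thread.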
Hi! When I started to work with pacman, I also hated this db structure because of its inefficiency. But now I'm not sure that it is inefficient. I don't have much experience with "dbms"s, but once pkgcache is loaded, I cannot imagine a radically faster solution than the current one[*]. So we should measure the full load time of the pkgcache; my guess: 3-4 sec. That is not too much (in the case of -Su, for example), and I'm sure it will be less than running pkgconfig. So we are talking about seconds, and I dunno if the db change would be worth the effort. I'm not sure it would even be faster (see [*]) (keeping pkgcache is needless with a dbms imho). I remember the results of my "time pacman -R foo" tests (after emptying the disk cache): "running ldconfig" caused at least 60% (!) of the running time. Then I said: ok, we can speed this up a bit, but we won't see any difference (because of ldconfig). However, we can speed up the "pkgcache loading time" of sync repos, as VMiklos said. Bye, ngaba
Quoting Nagy Gabor <ngaba@bibl.u-szeged.hu>:
Hi! When I started to work with pacman, I also hated this db structure because of its inefficiency. But now I'm not sure that it is inefficient. I don't have much experience with "dbms"s, but once pkgcache is loaded, I cannot imagine a radically faster solution than the current one[*]. So we should measure the full load time of the pkgcache; my guess: 3-4 sec. That is not too much (in the case of -Su, for example), and I'm sure it will be less than running pkgconfig. So we are talking about seconds, and I dunno if the db change would be worth the effort. I'm not sure it would even be faster (see [*]) (keeping pkgcache is needless with a dbms imho). I remember the results of my "time pacman -R foo" tests (after emptying the disk cache): "running ldconfig" caused at least 60% (!) of the running time. Then I said: ok, we can speed this up a bit, but we won't see any difference (because of ldconfig). However, we can speed up the "pkgcache loading time" of sync repos, as VMiklos said. Bye, ngaba
Hi! I attached some speed test results; I hope this is not off-topic here. As I see it, this is clearly what I predicted (except that ldconfig is much faster than I remembered): loading the pkgcache takes about 10 sec; once it is loaded, working with the pkgcache is very fast (see the last two tests, for example). So IMHO the _constant_ ~10 sec we could win with a new db backend is not noticeable in large (-Su) transactions. So personally I can live with the current backend. Note: my system is quite lightweight... Bye

PS: As I wrote in the subject, analyzing pacman's speed would be much easier if debug printed timestamps to the log (on request?). What do you think about this? If you like it, I will create a patch (~0 effort work.)

----------------------------------------------------
SZTE Egyetemi Könyvtár - http://www.bibl.u-szeged.hu
This mail sent through IMP: http://horde.org/imp/
On Mon, Nov 12, 2007 at 12:56:26PM +0100, Nagy Gabor wrote:
I attached some speedtest results; I hope that this is not offtopic here. As I see, this is clearly what I predicted (except: ldconfig is much faster than I remembered):
Hm, did you try when installing or removing a package with shared libs?
PS: As I wrote in the subject, analyzing pacman's speed would be much easier if debug printed timestamps to log (on request?). What do you think about this? If you like it, I will create a patch (~0 effort work.)
Does that mean replacing the hour by the time elapsed since pacman start? That is, just relative instead of absolute values?
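The "relative instead of absolute" variant asked about here can be illustrated with a short sketch. It is written in Python purely for illustration and says nothing about how pacman's debug output is implemented; the class name `ElapsedLogger`, its `clock` parameter and the line format are all assumptions.

```python
import sys
import time


class ElapsedLogger:
    """Prefix each debug line with seconds elapsed since the logger was created."""

    def __init__(self, clock=time.monotonic, stream=sys.stderr):
        self._clock = clock      # injectable so the timing can be tested
        self._stream = stream
        self._start = clock()    # reference point: "program start"

    def debug(self, msg):
        elapsed = self._clock() - self._start
        line = f"[{elapsed:9.3f}s] debug: {msg}"
        print(line, file=self._stream)
        return line
```

With relative values, the difference between two consecutive lines directly gives the cost of a step (for instance loading the pkgcache), which is what the speed analysis in this thread needs; absolute wall-clock stamps carry the same information but need subtracting by hand.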
[root@Arch sync]# find -type d | wc -l
4139
pacman -Sl ?
---testdb (without Aaron's alpm_list speed-up) with empty disk cache---
[root@Arch ~]# sync; echo 3 > /proc/sys/vm/drop_caches
[root@Arch ~]# time testdb
wrong requiredby for esd : sdl_net
wrong requiredby for python-numeric : pycairo
wrong requiredby for xorg : xfce-utils
(grr, what's this?)
Maybe you forgot what your UPGRADERM patch was for? ;)
On Nov 12, 2007 5:56 AM, Nagy Gabor <ngaba@bibl.u-szeged.hu> wrote:
PS: As I wrote in the subject, analyzing pacman's speed would be much easier if debug printed timestamps to log (on request?). What do you think about this? If you like it, I will create a patch (~0 effort work.)
You know compiling with PACMAN_DEBUG and using --debug will give you timestamps, yes?
On Nov 12, 2007 3:21 PM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
You know compiling with PACMAN_DEBUG and using --debug will give you timestamps, yes?
Just run ./configure --enable-debug ...
On Nov 12, 2007 3:30 PM, Dan McGee <dpmcgee@gmail.com> wrote:
Just run ./configure --enable-debug ...
That's what I meant. I was just using the #define, because it's right there
You know compiling with PACMAN_DEBUG and using --debug will give you timestamps, yes?
Well, I didn't know this. Thx for the info (to Dan, too). Bye, ngaba
On Nov 9, 2007 4:10 PM, Miklos Vajna <vmiklos@frugalware.org> wrote:
good question. i actually like the current backend as it's easy to repair. if you have a corrupted db and bdb's magic repair says it can't fix.. say your data goodbye (i managed to do this with both my bogofilter db and with my old rpm database, too)
Yeah, I said this to someone offlist - 1 misplaced byte will blow a binary blob DB out of the water.
if you look at git, they also had this problem, git operates with many-many small files and they introduced packs (with indexes) to speed up reading from the database
an idea i have: it would be interesting to see if this works for pacman's db, too. like: the sync dbs are never modified, so the sync dbs could be one single file: just files concatenated after each other, and having an index to mark where an entry (which is a file atm) starts, its length and maybe pkgname-pkgver (so by reading the index you would get what you currently get by reading the dir only but no files)
Yeah. I had proof-of-concept code somewhere that never untarred the sync DBs, and simply parsed them in their tar.gz form (libarchive). It actually worked very very well, but it complicates things as we now need a different scheme for sync and local DBs. Which is acceptable, just a little ugly.
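Reading a sync DB straight from its tar.gz form, without unpacking it to disk, can be sketched as follows. This is not Aaron's proof-of-concept: it uses Python's standard `tarfile` module as a stand-in for libarchive, and the function names `read_sync_db` and `make_sync_db`, as well as the `pkgdir/filename` layout assumed for archive members, are illustrative assumptions.

```python
import io
import tarfile


def read_sync_db(fileobj):
    """Return {pkgdir: {filename: text}} parsed directly from a tar.gz stream."""
    packages = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directory entries
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            pkgdir, _, filename = name.partition("/")
            if not filename:
                continue  # not inside a package directory
            data = tar.extractfile(member).read()
            packages.setdefault(pkgdir, {})[filename] = data.decode("utf-8")
    return packages


def make_sync_db(files):
    """Build a small in-memory tar.gz for trying out read_sync_db."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

A call such as `read_sync_db(make_sync_db({"foo-1.0-1/desc": "%NAME%\nfoo\n"}))` yields the entry keyed by its package directory. As noted above, this only suits the read-only sync DBs; the local DB, which is modified on every install and removal, would still need its own scheme.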
and finally one more argument for the current implementation: several times i had a hdd crash where parts of the pacman db were affected during fsck, and it was easy to solve: pacman -Qt, it printed 3 packages, reinstall, and you're ready. afaik none of the single-file implementations have this advantage
I know. I really like the plain text approach. It's very elegant. Maybe using something similar to git's packs may help us here. I will look into it.
2007/11/10, Aaron Griffin <aaronmgriffin@gmail.com>:
I know. I really like the plain text approach. It's very elegant. Maybe using something similar to git's packs may help us here. I will look into it.
I'd like to resurrect this thread: http://archlinux.org/pipermail/pacman-dev/2007-April/008163.html (even older is here: http://www.archlinux.org/pipermail/pacman-dev/2006-March/005702.html) ;-) I think it would be a nice change for 3.2. -- Roman Kyrylych (Роман Кирилич)
On Nov 10, 2007 6:33 AM, Roman Kyrylych <roman.kyrylych@gmail.com> wrote:
I'd like to resurrect this thread: http://archlinux.org/pipermail/pacman-dev/2007-April/008163.html (even older is here: http://www.archlinux.org/pipermail/pacman-dev/2006-March/005702.html)
Those are totally unrelated.... I mean, they're important, sure, but they're not what we're talking about here.
Quoting Aaron Griffin <aaronmgriffin@gmail.com>:
On Nov 10, 2007 6:33 AM, Roman Kyrylych <roman.kyrylych@gmail.com> wrote:
I'd like to resurrect this thread: http://archlinux.org/pipermail/pacman-dev/2007-April/008163.html (even older is here: http://www.archlinux.org/pipermail/pacman-dev/2006-March/005702.html)
Those are totally unrelated.... I mean, they're important, sure, but they're not what we're talking about here.
Not totally ;-P If you change the db backend, then the first question may simply disappear... Bye
participants (6)
- Aaron Griffin
- Dan McGee
- Miklos Vajna
- Nagy Gabor
- Roman Kyrylych
- Xavier