[pacman-dev] [PATCH] pacman/query.c : -Qo optimization.
I didn't understand why realpath was called on every files of every filelist in query_fileowner : ppath = resolve_path(path); It turns out this is needed for the diverted files. For example, cddb_get installs /usr/lib/perl5/site_perl/5.8.8/CDDB_get.pm which actually ends in /usr/lib/perl5/site_perl/current/CDDB_get.pm . And for making pacman -Qo /usr/lib/perl5/site_perl/current/CDDB_get.pm , realpath has to be called on both the target, and the file in the filelist. However, realpath is costly, and calling it on every single file resulted in a poor -Qo performance. Worst case : pacman -Qo /lib/libz.so.1 0.35s user 1.51s system 99% cpu 1.864 total So I did a little optimization to avoid calling realpath as much as possible: first compare the basename of each file. Result: src/pacman/pacman -Qo /lib/libz.so.1 0.24s user 0.05s system 99% cpu 0.298 total Obviously, the difference will be even bigger at the first run (no fs cache), though it's quite scary on my system : 1.7s vs 40s previously. Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- src/pacman/query.c | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/src/pacman/query.c b/src/pacman/query.c index bb69527..7989756 100644 --- a/src/pacman/query.c +++ b/src/pacman/query.c @@ -71,6 +71,7 @@ static int query_fileowner(alpm_list_t *targets) for(t = targets; t; t = alpm_list_next(t)) { int found = 0; char *filename = alpm_list_getdata(t); + char *bname; char *rpath; struct stat buf; alpm_list_t *i, *j; @@ -88,6 +89,8 @@ static int query_fileowner(alpm_list_t *targets) continue; } + bname = basename(filename); + if(!(rpath = resolve_path(filename))) { fprintf(stderr, _("error: cannot determine real path for '%s': %s\n"), filename, strerror(errno)); @@ -103,6 +106,11 @@ static int query_fileowner(alpm_list_t *targets) snprintf(path, PATH_MAX, "%s%s", alpm_option_get_root(), (const char *)alpm_list_getdata(j)); + /* avoid the costly resolve_path usage if the basenames don't match */ + if(strcmp(basename(path), bname) != 0) { + continue; + } + ppath = resolve_path(path); if(ppath && strcmp(ppath, rpath) == 0) { -- 1.5.3.6
I didn't understand why realpath was called on every files of every filelist in query_fileowner : ppath = resolve_path(path);
It turns out this is needed for the diverted files. For example, cddb_get installs /usr/lib/perl5/site_perl/5.8.8/CDDB_get.pm which actually ends in /usr/lib/perl5/site_perl/current/CDDB_get.pm .
A real life example ;-): /opt directory can be a symlink for example. And an offtopic note here: iirc (<- cannot check now) you cannot search for owner of a directory symlink [/usr/var], because dir-symlinks are treated as dirs in files <- this may(?) lead to an other symlink puzzle: conflicting "symlinks" in target list. Xavier, could you check this please?
And for making pacman -Qo /usr/lib/perl5/site_perl/current/CDDB_get.pm , realpath has to be called on both the target, and the file in the filelist.
However, realpath is costly, and calling it on every single file resulted in a poor -Qo performance. Worst case : pacman -Qo /lib/libz.so.1 0.35s user 1.51s system 99% cpu 1.864 total
So I did a little optimization to avoid calling realpath as much as possible: first compare the basename of each file.
Result: src/pacman/pacman -Qo /lib/libz.so.1 0.24s user 0.05s system 99% cpu 0.298 total
Obviously, the difference will be even bigger at the first run (no fs cache), though it's quite scary on my system : 1.7s vs 40s previously.
---------------------------------------------------- SZTE Egyetemi Könyvtár - http://www.bibl.u-szeged.hu This mail sent through IMP: http://horde.org/imp/
On Sat, Nov 24, 2007 at 02:24:02PM +0100, Nagy Gabor wrote:
I didn't understand why realpath was called on every files of every filelist in query_fileowner : ppath = resolve_path(path);
It turns out this is needed for the diverted files. For example, cddb_get installs /usr/lib/perl5/site_perl/5.8.8/CDDB_get.pm which actually ends in /usr/lib/perl5/site_perl/current/CDDB_get.pm .
A real life example ;-): /opt directory can be a symlink for example.
And an offtopic note here: iirc (<- cannot check now) you cannot search for owner of a directory symlink [/usr/var], because dir-symlinks are treated as dirs in files <- this may(?) lead to an other symlink puzzle: conflicting "symlinks" in target list. Xavier, could you check this please?
Not sure what you mean, but usr/var is owned by filesystem and show up as usr/var in the filelist. So it's treated as a file and not a dir, right? pacman -Ql filesystem | grep "usr/var" filesystem /usr/var
Not sure what you mean, but usr/var is owned by filesystem and show up as usr/var in the filelist. So it's treated as a file and not a dir, right? pacman -Ql filesystem | grep "usr/var" filesystem /usr/var
[*] Fine. First I meant "pacman -Qo /usr/var", when I tried last time, "I got cannot determine ownership of a directory" <- if this still exists, we can fix with the help of [*]. Bye ---------------------------------------------------- SZTE Egyetemi Könyvtár - http://www.bibl.u-szeged.hu This mail sent through IMP: http://horde.org/imp/
On Sat, Nov 24, 2007 at 02:59:05PM +0100, Nagy Gabor wrote:
Not sure what you mean, but usr/var is owned by filesystem and show up as usr/var in the filelist. So it's treated as a file and not a dir, right? pacman -Ql filesystem | grep "usr/var" filesystem /usr/var
[*] Fine. First I meant "pacman -Qo /usr/var", when I tried last time, "I got cannot determine ownership of a directory" <- if this still exists, we can fix with the help of [*].
Hm, I don't understand your [*] thing. But pacman -Qo simply stats the file (so if it's a symlink, it resolves it), and if it is a directory, it simply stops. That's because directories are usually owned by multiple packages, and -Qo is maybe supposed to return just one package.
Idézés Xavier <shiningxc@gmail.com>:
On Sat, Nov 24, 2007 at 02:59:05PM +0100, Nagy Gabor wrote:
Not sure what you mean, but usr/var is owned by filesystem and show up as usr/var in the filelist. So it's treated as a file and not a dir, right? pacman -Ql filesystem | grep "usr/var" filesystem /usr/var
[*] Fine. First I meant "pacman -Qo /usr/var", when I tried last time, "I got cannot determine ownership of a directory" <- if this still exists, we can fix with the help of [*].
Hm, I don't understand your [*] thing. But pacman -Qo simply stats the file (so if it's a symlink, it resolves it), and if it is a directory, it simply stops. That's because directories are usually owned by multiple packages, and -Qo is maybe supposed to return just one package.
Directory symlink is not a directory [that is a symlink ;-], that should be owned by one package imho. This is why I said [*], and because alpm_fileconflicts can detect two conflicting directory symlinks. Bye ---------------------------------------------------- SZTE Egyetemi Könyvtár - http://www.bibl.u-szeged.hu This mail sent through IMP: http://horde.org/imp/
A few thoughts of mine below. On Nov 23, 2007 5:20 PM, Chantry Xavier <shiningxc@gmail.com> wrote:
I didn't understand why realpath was called on every files of every filelist in query_fileowner : ppath = resolve_path(path);
It turns out this is needed for the diverted files. For example, cddb_get installs /usr/lib/perl5/site_perl/5.8.8/CDDB_get.pm which actually ends in /usr/lib/perl5/site_perl/current/CDDB_get.pm .
And for making pacman -Qo /usr/lib/perl5/site_perl/current/CDDB_get.pm , realpath has to be called on both the target, and the file in the filelist.
However, realpath is costly, and calling it on every single file resulted in a poor -Qo performance. Worst case : pacman -Qo /lib/libz.so.1 0.35s user 1.51s system 99% cpu 1.864 total
So I did a little optimization to avoid calling realpath as much as possible: first compare the basename of each file.
Result: src/pacman/pacman -Qo /lib/libz.so.1 0.24s user 0.05s system 99% cpu 0.298 total
Obviously, the difference will be even bigger at the first run (no fs cache), though it's quite scary on my system : 1.7s vs 40s previously.
Signed-off-by: Chantry Xavier <shiningxc@gmail.com> --- src/pacman/query.c | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/src/pacman/query.c b/src/pacman/query.c index bb69527..7989756 100644 --- a/src/pacman/query.c +++ b/src/pacman/query.c @@ -71,6 +71,7 @@ static int query_fileowner(alpm_list_t *targets) for(t = targets; t; t = alpm_list_next(t)) { int found = 0; char *filename = alpm_list_getdata(t); + char *bname; char *rpath; struct stat buf; alpm_list_t *i, *j; @@ -88,6 +89,8 @@ static int query_fileowner(alpm_list_t *targets) continue; }
+ bname = basename(filename);
The problem with basename is...it is too inconsistent across platforms. See what I did with the mbasename() function in pacman.c. (commit dea9b3bc0f6ba49aec8452958f5373fbb20e7df2) basename() may modify its arguments- this is bad news for us, where we definitely don't want the frontend destroying filelists. So I think you will have to move mbasename to util.c and use that instead if you wish to do it this way.
+ if(!(rpath = resolve_path(filename))) { fprintf(stderr, _("error: cannot determine real path for '%s': %s\n"), filename, strerror(errno)); @@ -103,6 +106,11 @@ static int query_fileowner(alpm_list_t *targets) snprintf(path, PATH_MAX, "%s%s", alpm_option_get_root(), (const char *)alpm_list_getdata(j));
+ /* avoid the costly resolve_path usage if the basenames don't match */ + if(strcmp(basename(path), bname) != 0) { + continue; + } + ppath = resolve_path(path);
if(ppath && strcmp(ppath, rpath) == 0) { -- 1.5.3.6
I feel like their could be an edge case where something breaks and this shortcut bites us. However, I can't figure out out. Anyone else? Otherwise this makes a lot of sense. -Dan
On Nov 25, 2007 2:24 PM, Dan McGee <dpmcgee@gmail.com> wrote:
I feel like their could be an edge case where something breaks and this shortcut bites us. However, I can't figure out out. Anyone else? Otherwise this makes a lot of sense.
I don't think so. operation 1 : check basename if same operation 2 : check full path if same tada! The only edge cases are when basename matches, but the full path does not, which is covered anyway. Personally, I think this is a brilliant idea on Xavier's part, so keep up the good work. Also, make sure to note what Dan mentioned about basename (we want to try and be cross-platform here whenever possible)
On Nov 26, 2007 4:09 PM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On Nov 25, 2007 2:24 PM, Dan McGee <dpmcgee@gmail.com> wrote:
I feel like their could be an edge case where something breaks and this shortcut bites us. However, I can't figure out out. Anyone else? Otherwise this makes a lot of sense.
I don't think so.
operation 1 : check basename if same operation 2 : check full path if same tada!
The only edge cases are when basename matches, but the full path does not, which is covered anyway.
Personally, I think this is a brilliant idea on Xavier's part, so keep up the good work.
Also, make sure to note what Dan mentioned about basename (we want to try and be cross-platform here whenever possible)
You're a day behind. :P <http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=11133da587ebc1c78478cfcd05d5e8298bd61b84> <http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=d683033d3ea79956faf8786f784ce2e271179892> -Dan
On Nov 26, 2007 3:13 PM, Dan McGee <dpmcgee@gmail.com> wrote:
On Nov 26, 2007 4:09 PM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On Nov 25, 2007 2:24 PM, Dan McGee <dpmcgee@gmail.com> wrote:
I feel like their could be an edge case where something breaks and this shortcut bites us. However, I can't figure out out. Anyone else? Otherwise this makes a lot of sense.
I don't think so.
operation 1 : check basename if same operation 2 : check full path if same tada!
The only edge cases are when basename matches, but the full path does not, which is covered anyway.
Personally, I think this is a brilliant idea on Xavier's part, so keep up the good work.
Also, make sure to note what Dan mentioned about basename (we want to try and be cross-platform here whenever possible)
You're a day behind. :P
<http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=11133da587ebc1c78478cfcd05d5e8298bd61b84> <http://projects.archlinux.org/git/?p=pacman.git;a=commit;h=d683033d3ea79956faf8786f784ce2e271179892>
Damn you people and your merging behind my back! 8)
On Mon, Nov 26, 2007 at 03:09:44PM -0600, Aaron Griffin wrote:
On Nov 25, 2007 2:24 PM, Dan McGee <dpmcgee@gmail.com> wrote:
I feel like their could be an edge case where something breaks and this shortcut bites us. However, I can't figure out out. Anyone else? Otherwise this makes a lot of sense.
I don't think so.
operation 1 : check basename if same operation 2 : check full path if same tada!
The only edge cases are when basename matches, but the full path does not, which is covered anyway.
Personally, I think this is a brilliant idea on Xavier's part, so keep up the good work.
Also, make sure to note what Dan mentioned about basename (we want to try and be cross-platform here whenever possible)
Oh, sorry, we discussed this with Dan on IRC, and then I was willing to answer the mail so that others can follow, and totally forgot. When Dan asked me about these edge cases, I wasn't sure what to answer. But when thinking back about it later, I thought about something similar to what you just did for proving it was correct, so thanks :) About the basename, I did what Dan asked, move mbasename from pacman.c to util.c , and then use that instead. That change and the Qo patch are both already in master.
participants (5)
-
Aaron Griffin
-
Chantry Xavier
-
Dan McGee
-
Nagy Gabor
-
Xavier