[pacman-dev] [PATCH] pacman/query.c : -Qo optimization.

Dan McGee dpmcgee at gmail.com
Sun Nov 25 15:24:26 EST 2007


A few thoughts of mine below.

On Nov 23, 2007 5:20 PM, Chantry Xavier <shiningxc at gmail.com> wrote:
> I didn't understand why realpath was called on every files of every filelist
> in query_fileowner :
> ppath = resolve_path(path);
>
> It turns out this is needed for the diverted files. For example, cddb_get
> installs /usr/lib/perl5/site_perl/5.8.8/CDDB_get.pm which actually ends in
> /usr/lib/perl5/site_perl/current/CDDB_get.pm .
>
> And for making pacman -Qo /usr/lib/perl5/site_perl/current/CDDB_get.pm ,
> realpath has to be called on both the target, and the file in the filelist.
>
> However, realpath is costly, and calling it on every single file resulted
> in a poor -Qo performance. Worst case :
> pacman -Qo /lib/libz.so.1  0.35s user 1.51s system 99% cpu 1.864 total
>
> So I did a little optimization to avoid calling realpath as much as
> possible: first compare the basename of each file.
>
> Result:
> src/pacman/pacman -Qo /lib/libz.so.1  0.24s user 0.05s system 99% cpu 0.298 total
>
> Obviously, the difference will be even bigger at the first run (no fs
> cache), though it's quite scary on my system : 1.7s vs 40s previously.
>
> Signed-off-by: Chantry Xavier <shiningxc at gmail.com>
> ---
>  src/pacman/query.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/src/pacman/query.c b/src/pacman/query.c
> index bb69527..7989756 100644
> --- a/src/pacman/query.c
> +++ b/src/pacman/query.c
> @@ -71,6 +71,7 @@ static int query_fileowner(alpm_list_t *targets)
>         for(t = targets; t; t = alpm_list_next(t)) {
>                 int found = 0;
>                 char *filename = alpm_list_getdata(t);
> +               char *bname;
>                 char *rpath;
>                 struct stat buf;
>                 alpm_list_t *i, *j;
> @@ -88,6 +89,8 @@ static int query_fileowner(alpm_list_t *targets)
>                         continue;
>                 }
>
> +               bname = basename(filename);

The problem with basename is...it is too inconsistent across
platforms. See what I did with the mbasename() function in pacman.c.
(commit dea9b3bc0f6ba49aec8452958f5373fbb20e7df2)

basename() may modify its arguments- this is bad news for us, where we
definitely don't want the frontend destroying filelists. So I think
you will have to move mbasename to util.c and use that instead if you
wish to do it this way.

> +
>                 if(!(rpath = resolve_path(filename))) {
>                         fprintf(stderr, _("error: cannot determine real path for '%s': %s\n"),
>                                         filename, strerror(errno));
> @@ -103,6 +106,11 @@ static int query_fileowner(alpm_list_t *targets)
>                                 snprintf(path, PATH_MAX, "%s%s",
>                                          alpm_option_get_root(), (const char *)alpm_list_getdata(j));
>
> +                               /* avoid the costly resolve_path usage if the basenames don't match */
> +                               if(strcmp(basename(path), bname) != 0) {
> +                                       continue;
> +                               }
> +
>                                 ppath = resolve_path(path);
>
>                                 if(ppath && strcmp(ppath, rpath) == 0) {
> --
> 1.5.3.6

I feel like their could be an edge case where something breaks and
this shortcut bites us. However, I can't figure out out. Anyone else?
Otherwise this makes a lot of sense.

-Dan




More information about the pacman-dev mailing list