[arch-dev-public] [PATCH 0/7] namcap cleanup and depends module speedup
In the spirit of Eli making a bunch of patches for the AUR, I finally decided
to sit down tonight and figure out why the heck namcap was sucking it up, and
did a little code cleanup along the way. namcap.py is now a bit cleaner and
separated into functions, and the real treat is namcap is a hell of a lot
faster now that I found the bottleneck, which was the depends hook. The first
6 patches in this series lead up to the 7th, which is where the speed increase
is found.

Let me know what you see, otherwise it would be cool to get this in and a new
namcap release made, as it has rather dramatic effects with regards to speed.

If you don't like patches, you can get all this from my git tree as well:
http://code.toofishes.net/cgit/dan/namcap.git/log/?h=working

-Dan

Dan McGee (7):
  Rename 'tags' to 'namcap-tags'
  Only process tags if necessary
  Move extracted variable to the correct scope
  Only do active_modules check once
  Move PKGBUILD processing to a function
  Move real package processing to a function
  Make the depends module not suck

 Namcap/depends.py |  104 +++++++++++++++++++------------
 README            |   10 ++--
 namcap-tags       |   65 ++++++++++++++++++++
 namcap.py         |  173 +++++++++++++++++++++++++++--------------------------
 setup.py          |    2 +-
 tags              |   65 --------------------
 tests/tags-check  |    4 +-
 7 files changed, 224 insertions(+), 199 deletions(-)
 create mode 100644 namcap-tags
 delete mode 100644 tags
[PATCH 1/7] Rename 'tags' to 'namcap-tags'

It was a rather generic name for a file. More importantly, it prevented the
use of ctags in this project, since ctags creates a file named 'tags' by
default for storing its information.

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 README           |   10 ++++----
 namcap-tags      |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 namcap.py        |    2 +-
 setup.py         |    2 +-
 tags             |   65 ------------------------------------------------------
 tests/tags-check |    4 +-
 6 files changed, 74 insertions(+), 74 deletions(-)
 create mode 100644 namcap-tags
 delete mode 100644 tags

diff --git a/README b/README
index c41eaa0..9586752 100644
--- a/README
+++ b/README
@@ -126,17 +126,17 @@ that each namcap module must have:
    Each member of these tag lists should be a tuple consisting of two
    components: the short, hyphenated form of the tag with the appropriate
    format specifiers (%s, etc.) The first word of this string must be the tag
-   name. The human readable form of the tag should be put in the tags file.
-   The format of the tags file is described below; and the parameters which
-   should replace the format specifier tokens in the final output.
+   name. The human readable form of the tag should be put in the namcap-tags
+   file. The format of the tags file is described below; and the parameters
+   which should replace the format specifier tokens in the final output.
 
 * type(self)
 	"pkgbuild" for a module processing PKGBUILDs
 	"tarball" for a module processing a binary package file.
 
-The tags file consists of lines specifying the human readable form of the
-hyphenated tags used in the namcap code. A line beginning with a '#' is
+The namcap-tags file consists of lines specifying the human readable form of
+the hyphenated tags used in the namcap code. A line beginning with a '#' is
 treated as a comment. Otherwise the format of the file is:
 
 machine-parseable-tag %s :: This is machine parseable tag %s
 
diff --git a/namcap-tags b/namcap-tags
new file mode 100644
index 0000000..c6f170e
--- /dev/null
+++ b/namcap-tags
@@ -0,0 +1,65 @@
+# namcap tags file
+# The tags file consists of lines specifying the human readable form of the
+# hyphenated tags used in the namcap code. A line beginning with a '#' is
+# treated as a comment. Otherwise the format of the file is:
+#
+# machine-parseable-tag %s :: This is machine parseable tag %s
+#
+# Note that a double colon (::) is used to separate the hyphenated tag from the
+# human readable description.
+
+backups-preceding-slashes :: Backup entries should not have preceding slashes
+dangling-hardlink %s points to %s :: Hard link (%s) points to non-existing %s
+dangling-symlink %s points to %s :: Symlink (%s) points to non-existing %s
+dependency-already-satisfied %s :: Dependency included but already satisfied ('%s')
+dependency-covered-by-link-dependence %s :: Dependency covered by dependencies from link dependence (%s)
+dependency-detected-not-included %s :: Dependency detected and not included ('%s')
+dependency-detected-not-included %s from files %s :: Dependency detected and not included (%s) from files %s
+dependency-is-testing-release %s :: Dependency '%s' on your system is a testing release
+dependency-not-needed %s :: Dependency included and not needed ('%s')
+depends-by-namcap-sight depends=(%s) :: Depends as namcap sees them: depends=(%s)
+directory-not-world-executable %s :: Directory (%s) does not have the world executable bit set.
+elffile-not-in-allowed-dirs %s :: ELF file (%s) outside of a valid path.
+empty-directory %s :: Directory (%s) is empty
+extra-var-begins-without-underscore %s :: Non standard variable '%s' doesn't start with an underscore
+file-in-non-standard-dir %s :: File (%s) exists in a non-standard directory.
+file-not-world-readable %s :: File (%s) does not have the world readable bit set.
+file-referred-in-startdir :: File referenced in $startdir outside of $startdir/src or $startdir/pkg
+file-world-writable %s :: File (%s) has the world writable bit set.
+gnome-mime-file %s :: File (%s) is an auto-generated GNOME mime file
+hardlink-found %s points to %s :: Hard link (%s) found that points to %s
+hicolor-icon-cache-not-updated :: The hicolor icon cache has not been updated even though there are files in /usr/share/icons/hicolor. Use gtk-update-icon-theme or xdg-icon-resource to update the icon cache.
+improper-md5sum %s :: Improper md5sum: '%s'
+incorrect-library-permissions %s :: Library (%s) does not have permission set to 644.
+incorrect-permissions %s (%s/%s) :: File (%s) has %s/%s permissions.
+info-dir-file-present %s :: Info directory file (%s) needs removed.
+insecure-rpath %s :: Insecure RPATH (%s). If present, RPATH should be only /usr/lib.
+libtool-file-present %s :: File (%s) is a libtool file.
+link-level-dependence %s on %s :: File '%s' link-level dependence on %s
+mime-cache-not-updated :: Mime-file found. Add "update-mime-database usr/share/mime" to the install file
+missing-contributor :: Missing Contributor tag
+missing-custom-license-dir usr/share/licenses/%s :: Missing custom license directory (usr/share/licenses/%s)
+missing-custom-license-file usr/share/licenses/%s/* :: Missing custom license file in package (usr/share/licenses/%s/*)
+missing-license :: Missing license
+missing-maintainer :: Missing Maintainer tag
+missing-md5sums :: Missing md5sums
+missing-url :: Missing url
+non-fhs-info-page %s :: Non-FHS info page (%s) found. Use /usr/share/info instead
+non-fhs-man-page %s :: Non-FHS man page (%s) found. Use /usr/share/man instead
+not-a-common-license %s :: %s is not a common license (it's not in /usr/share/licenses/common/)
+not-enough-md5sums %i needed :: Not enough md5sums: %i needed
+package-name-in-uppercase :: No upper case letters in package names
+perllocal-pod-present %s :: perllocal.pod found in %s.
+pkgname-in-description :: Description should not contain the package name.
+potential-non-fhs-info-page %s :: Potential non-FHS info page (%s) found.
+potential-non-fhs-man-page %s :: Potential non-FHS man page (%s) found.
+recommend-use-pkgdir :: Recommend use of $pkgdir instead of $startdir/pkg
+recommend-use-srcdir :: Recommend use of $srcdir instead of $startdir/src
+script-link-detected %s in %s :: Script link detected (%s) in file %s
+scrollkeeper-dir-exists %s :: Scrollkeeper directory exists (%s). Remember to not run scrollkeeper till post_{install,upgrade,remove}.
+specific-host-type-used %s :: Reference to one of %s should be changed to $CARCH
+specific-sourceforge-mirror :: Attempting to use specific sourceforge mirror, use downloads.sourceforge.net instead
+symlink-found %s points to %s :: Symlink (%s) found that points to %s
+too-many-md5sums %i needed :: Too Many md5sums: %i needed
+using-dl-sourceforge :: Attempting to use dl sourceforge domain, use downloads.sourceforge.net instead
+variable-not-array %s :: Variable %s is not an array.
diff --git a/namcap.py b/namcap.py
index 10cdc14..7a26c2d 100755
--- a/namcap.py
+++ b/namcap.py
@@ -66,7 +66,7 @@ def verify_package(filename):
 		return 0
 	return tar
 
-def process_tags(filename="/usr/share/namcap/tags"):
+def process_tags(filename="/usr/share/namcap/namcap-tags"):
 	tags = {}
 	f = open(filename)
 	for i in f.readlines():
diff --git a/setup.py b/setup.py
index fdf6109..0280e04 100755
--- a/setup.py
+++ b/setup.py
@@ -2,7 +2,7 @@
 from distutils.core import setup
 
 DATAFILES = [('/usr/share/man/man1', ['namcap.1']),
-             ('/usr/share/namcap', ['tags']),
+             ('/usr/share/namcap', ['namcap-tags']),
              ('/usr/share/doc/namcap',['README','AUTHORS','TODO'])]
 
 setup(name="namcap",
diff --git a/tags b/tags
deleted file mode 100644
index c6f170e..0000000
--- a/tags
+++ /dev/null
@@ -1,65 +0,0 @@
-# namcap tags file
-# The tags file consists of lines specifying the human readable form of the
-# hyphenated tags used in the namcap code. A line beginning with a '#' is
-# treated as a comment. Otherwise the format of the file is:
-#
-# machine-parseable-tag %s :: This is machine parseable tag %s
-#
-# Note that a double colon (::) is used to separate the hyphenated tag from the
-# human readable description.
-
-backups-preceding-slashes :: Backup entries should not have preceding slashes
-dangling-hardlink %s points to %s :: Hard link (%s) points to non-existing %s
-dangling-symlink %s points to %s :: Symlink (%s) points to non-existing %s
-dependency-already-satisfied %s :: Dependency included but already satisfied ('%s')
-dependency-covered-by-link-dependence %s :: Dependency covered by dependencies from link dependence (%s)
-dependency-detected-not-included %s :: Dependency detected and not included ('%s')
-dependency-detected-not-included %s from files %s :: Dependency detected and not included (%s) from files %s
-dependency-is-testing-release %s :: Dependency '%s' on your system is a testing release
-dependency-not-needed %s :: Dependency included and not needed ('%s')
-depends-by-namcap-sight depends=(%s) :: Depends as namcap sees them: depends=(%s)
-directory-not-world-executable %s :: Directory (%s) does not have the world executable bit set.
-elffile-not-in-allowed-dirs %s :: ELF file (%s) outside of a valid path.
-empty-directory %s :: Directory (%s) is empty
-extra-var-begins-without-underscore %s :: Non standard variable '%s' doesn't start with an underscore
-file-in-non-standard-dir %s :: File (%s) exists in a non-standard directory.
-file-not-world-readable %s :: File (%s) does not have the world readable bit set.
-file-referred-in-startdir :: File referenced in $startdir outside of $startdir/src or $startdir/pkg
-file-world-writable %s :: File (%s) has the world writable bit set.
-gnome-mime-file %s :: File (%s) is an auto-generated GNOME mime file
-hardlink-found %s points to %s :: Hard link (%s) found that points to %s
-hicolor-icon-cache-not-updated :: The hicolor icon cache has not been updated even though there are files in /usr/share/icons/hicolor. Use gtk-update-icon-theme or xdg-icon-resource to update the icon cache.
-improper-md5sum %s :: Improper md5sum: '%s'
-incorrect-library-permissions %s :: Library (%s) does not have permission set to 644.
-incorrect-permissions %s (%s/%s) :: File (%s) has %s/%s permissions.
-info-dir-file-present %s :: Info directory file (%s) needs removed.
-insecure-rpath %s :: Insecure RPATH (%s). If present, RPATH should be only /usr/lib.
-libtool-file-present %s :: File (%s) is a libtool file.
-link-level-dependence %s on %s :: File '%s' link-level dependence on %s
-mime-cache-not-updated :: Mime-file found. Add "update-mime-database usr/share/mime" to the install file
-missing-contributor :: Missing Contributor tag
-missing-custom-license-dir usr/share/licenses/%s :: Missing custom license directory (usr/share/licenses/%s)
-missing-custom-license-file usr/share/licenses/%s/* :: Missing custom license file in package (usr/share/licenses/%s/*)
-missing-license :: Missing license
-missing-maintainer :: Missing Maintainer tag
-missing-md5sums :: Missing md5sums
-missing-url :: Missing url
-non-fhs-info-page %s :: Non-FHS info page (%s) found. Use /usr/share/info instead
-non-fhs-man-page %s :: Non-FHS man page (%s) found. Use /usr/share/man instead
-not-a-common-license %s :: %s is not a common license (it's not in /usr/share/licenses/common/)
-not-enough-md5sums %i needed :: Not enough md5sums: %i needed
-package-name-in-uppercase :: No upper case letters in package names
-perllocal-pod-present %s :: perllocal.pod found in %s.
-pkgname-in-description :: Description should not contain the package name.
-potential-non-fhs-info-page %s :: Potential non-FHS info page (%s) found.
-potential-non-fhs-man-page %s :: Potential non-FHS man page (%s) found.
-recommend-use-pkgdir :: Recommend use of $pkgdir instead of $startdir/pkg
-recommend-use-srcdir :: Recommend use of $srcdir instead of $startdir/src
-script-link-detected %s in %s :: Script link detected (%s) in file %s
-scrollkeeper-dir-exists %s :: Scrollkeeper directory exists (%s). Remember to not run scrollkeeper till post_{install,upgrade,remove}.
-specific-host-type-used %s :: Reference to one of %s should be changed to $CARCH
-specific-sourceforge-mirror :: Attempting to use specific sourceforge mirror, use downloads.sourceforge.net instead
-symlink-found %s points to %s :: Symlink (%s) found that points to %s
-too-many-md5sums %i needed :: Too Many md5sums: %i needed
-using-dl-sourceforge :: Attempting to use dl sourceforge domain, use downloads.sourceforge.net instead
-variable-not-array %s :: Variable %s is not an array.
diff --git a/tests/tags-check b/tests/tags-check
index 31619e8..8392c2c 100755
--- a/tests/tags-check
+++ b/tests/tags-check
@@ -18,7 +18,7 @@ def process_tags(file):
 			tags.add(i.split('::')[0].strip())
 	return tags
 
-tagpath = '../tags'
+tagpath = '../namcap-tags'
 basepath = '../Namcap'
 modules = filter(lambda s: s.endswith('.py'), os.listdir(basepath))
 tags = process_tags(tagpath)
@@ -38,7 +38,7 @@ for m in modules:
 
 if tags_in_modules - tags != set([]):
 	print "Some tags are defined in the modules"
-	print "but not in the 'tags' file"
+	print "but not in the 'namcap-tags' file"
 	print "\n".join(map(lambda s: " %s (in %s)" % (s, tags_by_file[s]),
 		list(tags_in_modules - tags)))
 	sys.exit(1)
--
1.6.4.4
[PATCH 2/7] Only process tags if necessary

We don't even need to look at the tags file if we are outputting machine
readable tags, so don't bother. Make process_tags just return the lambda
directly and even find a valid case for using a closure!

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 namcap.py |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/namcap.py b/namcap.py
index 7a26c2d..509ca3f 100755
--- a/namcap.py
+++ b/namcap.py
@@ -66,7 +66,9 @@ def verify_package(filename):
 		return 0
 	return tar
 
-def process_tags(filename="/usr/share/namcap/namcap-tags"):
+def process_tags(filename="/usr/share/namcap/namcap-tags", machine=False):
+	if machine:
+		return lambda s: s
 	tags = {}
 	f = open(filename)
 	for i in f.readlines():
@@ -74,7 +76,7 @@
 		tagdata = i[:-1].split("::")
 		tags[tagdata[0].strip()] = tagdata[1].strip()
 
-	return tags
+	return lambda s: tags[s]
 
 def check_rules_exclude(optlist):
 	'''Check if the -r (--rules) and the -r (--exclude) options
@@ -89,8 +91,8 @@ def check_rules_exclude(optlist):
 
 # Main
 modules = get_modules()
-tags = process_tags()
 info_reporting = 0
+machine_readable = False
 
 # get our options and process them
 try:
@@ -99,7 +101,6 @@ except getopt.GetoptError:
 	usage()
 
 active_modules = []
-m = lambda s: tags[s]
 
 # Verifying if we are using the -r and -r options at same time
 if check_rules_exclude(optlist) > 1:
@@ -137,8 +138,7 @@ for i, k in optlist:
 		info_reporting = 1
 
 	if i in ('-m', '--machine-readable'):
-		machine_readable = 1
-		m = lambda s: s
+		machine_readable = True
 
 	if i in ('-h', '--help'):
 		usage()
@@ -147,6 +147,7 @@ for i, k in optlist:
 if (args == []):
 	usage()
 
+m = process_tags(machine=machine_readable)
 packages = args
 
 # Go through each package, get the info, and apply the rules
@@ -237,4 +238,5 @@ for package in packages:
 			if ret[2] != [] and info_reporting:
 				for j in ret[2]:
 					print string.ljust("PKGBUILD (" + pkginfo.name + ")", 20) + " I: " + m(j[0]) % j[1]
+
 # vim: set ts=4 sw=4 noet:
--
1.6.4.4
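For anyone following along, here is the contract this patch sets up, as a
minimal sketch only (not code from the patch): every rule reports a
(tag, params) pair, and namcap prints m(tag) % params, where m is whatever
process_tags() handed back. The two-entry tag table below is copied by hand
from the namcap-tags file purely for illustration; the real dict is built by
parsing that file.

    # Sketch (Python 2 style, matching namcap.py) of how the closure
    # returned by process_tags() gets consumed by the print statements.
    tags = {
        "missing-license": "Missing license",
        "dangling-symlink %s points to %s": "Symlink (%s) points to non-existing %s",
    }

    human = lambda s: tags[s]    # what process_tags() returns by default
    machine = lambda s: s        # what it returns when machine=True

    # A rule reports (tag, params); namcap prints m(tag) % params.
    tag = "dangling-symlink %s points to %s"
    params = ("/usr/lib/foo.so", "/usr/lib/bar.so")
    print human(tag) % params    # Symlink (/usr/lib/foo.so) points to non-existing /usr/lib/bar.so
    print machine(tag) % params  # dangling-symlink /usr/lib/foo.so points to /usr/lib/bar.so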
[PATCH 3/7] Move extracted variable to the correct scope

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 namcap.py |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/namcap.py b/namcap.py
index 509ca3f..55c5240 100755
--- a/namcap.py
+++ b/namcap.py
@@ -152,12 +152,12 @@ packages = args
 
 # Go through each package, get the info, and apply the rules
 for package in packages:
-	extracted = 0
 	if not os.access(package, os.R_OK):
 		print "Error: Problem reading " + package
 		usage()
 
 	if package[-7:] == '.tar.gz':
+		extracted = 0
 		pkgtar = verify_package(package)
 
 		if not pkgtar:
@@ -166,7 +166,6 @@ for package in packages:
 				continue
 			else:
 				sys.exit(2)
-
 		pkginfo = pacman.load(package)
 
 
@@ -206,7 +205,6 @@ for package in packages:
 			for j in ret[2]:
 				print string.ljust(pkginfo.name, 10) + " I: " + m(j[0]) % j[1]
 
-
 		# Clean up if we extracted anything
 		if extracted:
 			shutil.rmtree(sandbox_directory)
--
1.6.4.4
[PATCH 4/7] Only do active_modules check once

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 namcap.py |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/namcap.py b/namcap.py
index 55c5240..e36bb7a 100755
--- a/namcap.py
+++ b/namcap.py
@@ -150,6 +150,10 @@ if (args == []):
 m = process_tags(machine=machine_readable)
 packages = args
 
+# No rules selected? Then select them all!
+if active_modules == []:
+	active_modules = modules
+
 # Go through each package, get the info, and apply the rules
 for package in packages:
 	if not os.access(package, os.R_OK):
@@ -169,10 +173,6 @@ for package in packages:
 
 		pkginfo = pacman.load(package)
 
-		# No rules selected? Then select them all!
-		if active_modules == []:
-			active_modules = modules
-
 		# Loop through each one, load them apply if possible
 		for i in active_modules:
 			cur_class = __import__('Namcap.' + i, globals(), locals(), [Namcap])
@@ -216,9 +216,6 @@ for package in packages:
 			print "Error: " + package + " is not a valid PKGBUILD"
 			continue
 
-		if active_modules == []:
-			active_modules = modules
-
 		for i in active_modules:
 			cur_class = __import__('Namcap.' + i, globals(), locals(), [Namcap])
 			pkg = cur_class.package()
--
1.6.4.4
[PATCH 5/7] Move PKGBUILD processing to a function

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 namcap.py |   52 ++++++++++++++++++++++++++++------------------------
 1 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/namcap.py b/namcap.py
index e36bb7a..06c79ea 100755
--- a/namcap.py
+++ b/namcap.py
@@ -89,6 +89,33 @@ def check_rules_exclude(optlist):
 			args_used += 1
 	return args_used
 
+def process_pkgbuild(package, modules):
+	# We might want to do some verifying in here... but really... isn't that what pacman.load is for?
+	pkginfo = pacman.load(package)
+
+	if pkginfo == None:
+		print "Error: " + package + " is not a valid PKGBUILD"
+		return 1
+
+	for i in modules:
+		cur_class = __import__('Namcap.' + i, globals(), locals(), [Namcap])
+		pkg = cur_class.package()
+		ret = [[],[],[]]
+		if pkg.type() == "pkgbuild":
+			ret = pkg.analyze(pkginfo, package)
+
+		# Output the PKGBUILD messages
+		if ret[0] != []:
+			for j in ret[0]:
+				print string.ljust("PKGBUILD (" + pkginfo.name + ")", 20) + " E: " + m(j[0]) % j[1]
+		if ret[1] != []:
+			for j in ret[1]:
+				print string.ljust("PKGBUILD (" + pkginfo.name + ")", 20) + " W: " + m(j[0]) % j[1]
+		if ret[2] != [] and info_reporting:
+			for j in ret[2]:
+				print string.ljust("PKGBUILD (" + pkginfo.name + ")", 20) + " I: " + m(j[0]) % j[1]
+
+
 # Main
 modules = get_modules()
 info_reporting = 0
@@ -209,29 +236,6 @@ for package in packages:
 		if extracted:
 			shutil.rmtree(sandbox_directory)
 	elif package[-8:] == 'PKGBUILD':
-		# We might want to do some verifying in here... but really... isn't that what pacman.load is for?
-		pkginfo = pacman.load(package)
-
-		if pkginfo == None:
-			print "Error: " + package + " is not a valid PKGBUILD"
-			continue
-
-		for i in active_modules:
-			cur_class = __import__('Namcap.' + i, globals(), locals(), [Namcap])
-			pkg = cur_class.package()
-			ret = [[],[],[]]
-			if pkg.type() == "pkgbuild":
-				ret = pkg.analyze(pkginfo, package)
-
-			# Output the PKGBUILD messages
-			if ret[0] != []:
-				for j in ret[0]:
-					print string.ljust("PKGBUILD (" + pkginfo.name + ")", 20) + " E: " + m(j[0]) % j[1]
-			if ret[1] != []:
-				for j in ret[1]:
-					print string.ljust("PKGBUILD (" + pkginfo.name + ")", 20) + " W: " + m(j[0]) % j[1]
-			if ret[2] != [] and info_reporting:
-				for j in ret[2]:
-					print string.ljust("PKGBUILD (" + pkginfo.name + ")", 20) + " I: " + m(j[0]) % j[1]
+		process_pkgbuild(package, active_modules)
 
 # vim: set ts=4 sw=4 noet:
--
1.6.4.4
[PATCH 6/7] Move real package processing to a function

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 namcap.py |   96 +++++++++++++++++++++++++++++++------------------------------
 1 files changed, 49 insertions(+), 47 deletions(-)

diff --git a/namcap.py b/namcap.py
index 06c79ea..d9fccee 100755
--- a/namcap.py
+++ b/namcap.py
@@ -89,6 +89,52 @@ def check_rules_exclude(optlist):
 			args_used += 1
 	return args_used
 
+def process_realpackage(package, modules):
+	extracted = 0
+	pkgtar = verify_package(package)
+
+	if not pkgtar:
+		print "Error: " + package + " is empty or is not a valid package"
+		return 1
+
+	pkginfo = pacman.load(package)
+
+	# Loop through each one, load them apply if possible
+	for i in modules:
+		cur_class = __import__('Namcap.' + i, globals(), locals(), [Namcap])
+		pkg = cur_class.package()
+		ret = [[],[],[]]
+		if pkg.type() == "tarball":
+			if pkg.prereq() == "extract":
+				# If it's not extracted, then extract it and then analyze the package
+				if not extracted:
+					os.mkdir(sandbox_directory)
+					for j in pkgtar.getmembers():
+						pkgtar.extract(j, sandbox_directory)
+					extracted = 1
+				ret = pkg.analyze(pkginfo, sandbox_directory)
+			elif pkg.prereq() == "pkg":
+				ret = pkg.analyze(pkginfo, None)
+			elif pkg.prereq() == "tar":
+				ret = pkg.analyze(pkginfo, pkgtar)
+			else:
+				ret = [['Error running rule (' + i + ')'],[],[]]
+
+		# Output the three types of messages
+		if ret[0] != []:
+			for j in ret[0]:
+				print string.ljust(pkginfo.name, 10) + " E: " + m(j[0]) % j[1]
+		if ret[1] != []:
+			for j in ret[1]:
+				print string.ljust(pkginfo.name, 10) + " W: " + m(j[0]) % j[1]
+		if ret[2] != [] and info_reporting:
+			for j in ret[2]:
+				print string.ljust(pkginfo.name, 10) + " I: " + m(j[0]) % j[1]
+
+	# Clean up if we extracted anything
+	if extracted:
+		shutil.rmtree(sandbox_directory)
+
 def process_pkgbuild(package, modules):
 	# We might want to do some verifying in here... but really... isn't that what pacman.load is for?
 	pkginfo = pacman.load(package)
@@ -188,54 +234,10 @@ for package in packages:
 		usage()
 
 	if package[-7:] == '.tar.gz':
-		extracted = 0
-		pkgtar = verify_package(package)
-
-		if not pkgtar:
-			print "Error: " + package + " is empty or is not a valid package"
-			if len(packages) > 1:
-				continue
-			else:
-				sys.exit(2)
-
-		pkginfo = pacman.load(package)
-
-		# Loop through each one, load them apply if possible
-		for i in active_modules:
-			cur_class = __import__('Namcap.' + i, globals(), locals(), [Namcap])
-			pkg = cur_class.package()
-			ret = [[],[],[]]
-			if pkg.type() == "tarball":
-				if pkg.prereq() == "extract":
-					# If it's not extracted, then extract it and then analyze the package
-					if not extracted:
-						os.mkdir(sandbox_directory)
-						for j in pkgtar.getmembers():
-							pkgtar.extract(j, sandbox_directory)
-						extracted = 1
-					ret = pkg.analyze(pkginfo, sandbox_directory)
-				elif pkg.prereq() == "pkg":
-					ret = pkg.analyze(pkginfo, None)
-				elif pkg.prereq() == "tar":
-					ret = pkg.analyze(pkginfo, pkgtar)
-				else:
-					ret = [['Error running rule (' + i + ')'],[],[]]
-
-			# Output the three types of messages
-			if ret[0] != []:
-				for j in ret[0]:
-					print string.ljust(pkginfo.name, 10) + " E: " + m(j[0]) % j[1]
-			if ret[1] != []:
-				for j in ret[1]:
-					print string.ljust(pkginfo.name, 10) + " W: " + m(j[0]) % j[1]
-			if ret[2] != [] and info_reporting:
-				for j in ret[2]:
-					print string.ljust(pkginfo.name, 10) + " I: " + m(j[0]) % j[1]
-
-		# Clean up if we extracted anything
-		if extracted:
-			shutil.rmtree(sandbox_directory)
+		process_realpackage(package, active_modules)
 	elif package[-8:] == 'PKGBUILD':
 		process_pkgbuild(package, active_modules)
+	else:
+		print "Error: Cannot process %s" % package
 
 # vim: set ts=4 sw=4 noet:
--
1.6.4.4
[PATCH 7/7] Make the depends module not suck

Namcap being slow pissed me off enough that I wanted to figure out what could
be taking so long. As is usual with software, one module was causing about
90% of the slowdown. Compare the before and after with this patch:

$ time namcap -m -r depends {4 packages} >/dev/null
real	0m18.001s
user	0m9.313s
sys	0m7.966s

$ time ./namcap.py -m -r depends {4 packages} >/dev/null
real	0m4.512s
user	0m3.770s
sys	0m0.740s

This patch addresses a few issues noticed when crawling and profiling the
code using cProfile. First, we were doing O(n*m) string compares of files in
the local database with libraries in the package. We can use a few tricks to
cut down the number of checks: namely, once a library has been found we don't
need to look for it anymore, and we can also ignore any non-'.so' files.

Another part where we were really shooting ourselves in the foot was invoking
readelf on *every single file* in the package. Cut that down so we only call
readelf on files with the ELF magic bytes, which gives us a dramatic savings
as we no longer fork for files in /usr/share, for instance.

Bonus material: script checking left out perl, and script dependencies were
formerly listed as link-level dependencies as well. Both of these have been
fixed.

The sad part is there is still room to improve here, but I decided enough was
enough for now. :)

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 Namcap/depends.py |  104 ++++++++++++++++++++++++++++++++---------------------
 namcap-tags       |    2 +-
 2 files changed, 64 insertions(+), 42 deletions(-)

diff --git a/Namcap/depends.py b/Namcap/depends.py
index e851e5c..4a3f733 100644
--- a/Namcap/depends.py
+++ b/Namcap/depends.py
@@ -19,7 +19,7 @@
 
 import re, os, os.path, pacman, subprocess
 
-supported_scripts = ['python', 'ruby', 'wish', 'expect',
+supported_scripts = ['python', 'perl', 'ruby', 'wish', 'expect',
 		'tk', 'bash', 'sh', 'dash', 'tcsh', 'pdksh' ]
 
 pkgcache = {}
@@ -50,7 +50,8 @@ def getcovered(current, dependlist, covereddepend):
 
 def figurebitsize(line):
 	"""
-	Given a line of output from readelf (usually Shared library:) return 'i686' or 'x86-64' if the binary is a 32bit or 64bit binary
+	Given a line of output from readelf (usually Shared library:) return
+	'i686' or 'x86-64' if the binary is a 32bit or 64bit binary
 	"""
 	address = line.split()[0]
@@ -59,6 +60,22 @@
 	else:
 		return 'i686'
 
+def is_elf(path):
+	"""
+	Given a file path, ensure it exists and peek at the first few bytes
+	to determine if it is an ELF file.
+	"""
+	if not os.path.isfile(path):
+		return False
+	fd = open(path)
+	bytes = fd.read(4)
+	fd.close()
+	# magic elf header, present in binaries and libraries
+	if bytes == "\x7FELF":
+		return True
+	else:
+		return False
+
 def scanlibs(data, dirname, names):
 	"""
 	Walk over all the files in the package and run "readelf -d" on them.
@@ -66,41 +83,39 @@
 	"""
 	sharedlibs, scripts = data
 	global supported_scripts
+	shared = re.compile('Shared library: \[(.*)\]')
+	script = re.compile('#!.*/(.*)')
 	for i in names:
-		if os.path.isfile(dirname+'/'+i):
-			var = subprocess.Popen('readelf -d ' + dirname+'/'+i,
+		path = dirname + '/' + i
+		if is_elf(path):
+			var = subprocess.Popen('readelf -d ' + path,
 					shell=True,
 					stdout=subprocess.PIPE,
 					stderr=subprocess.PIPE).communicate()
 			for j in var[0].split('\n'):
-				n = re.search('Shared library: \[(.*)\]', j)
+				n = shared.search(j)
 				# Is this a Shared library: line?
 				if n != None:
 					# Find out its architecture
 					architecture = figurebitsize(j)
 					try:
 						libpath = os.path.abspath(libcache[architecture][n.group(1)])[1:]
-						sharedlibs.setdefault(libpath, {})[dirname+'/'+i] = 1
+						sharedlibs.setdefault(libpath, {})[path] = 1
 					except KeyError:
 						# Ignore that library if we can't find it
 						# TODO: review it
 						pass
-			# But we can check to see if it's a script we know about
-			else:
-				try:
-					fd = open(dirname+'/'+i)
-				except:
-					continue
-				firstline = fd.readline()
-				firstlinere = re.match('#!.*/(.*)', firstline)
-				if firstlinere != None:
-					cmdname = firstlinere.group(1).split()[0]
-					if cmdname in supported_scripts:
-						scripts.setdefault(cmdname, {})[dirname+'/'+i] = 1
-
-				fd.close()
-	return
-
+		# Maybe it is a script file
+		elif os.path.isfile(path):
+			fd = open(path)
+			firstline = fd.readline()
+			fd.close()
+			firstlinere = script.match(firstline)
+			if firstlinere != None:
+				cmdname = firstlinere.group(1).split()[0]
+				if cmdname in supported_scripts:
+					scripts.setdefault(cmdname, {})[path] = 1
+
 def finddepends(liblist):
 	dependlist = {}
 	foundlist = []
@@ -108,24 +123,28 @@
 	somatches = {}
 	actualpath = {}
 
-	for j in liblist.keys():
+	knownlibs = liblist.keys()
+
+	for j in knownlibs:
 		actualpath[j] = os.path.realpath('/'+j)[1:]
 
 	# Sometimes packages don't include all so .so, .so.1, .so.1.13, .so.1.13.19 files
 	# They rely on ldconfig to create all the symlinks
 	# So we will strip off the matching part of the files and use this regexp to match the rest
 	so_end = re.compile('(\.\d+)*')
+	# Whether we should even look at a particular file
+	is_so = re.compile('\.so')
 
 	pacmandb = '/var/lib/pacman/local'
 	for i in os.listdir(pacmandb):
 		if os.path.isfile(pacmandb+'/'+i+'/files'):
 			file = open(pacmandb+'/'+i+'/files')
 			for j in file.readlines():
-				if j[len(j)-1:]=='\n':
-					j = j[:len(j)-1]
+				if not is_so.search(j):
+					continue
 
-				for k in liblist.keys():
-					# If the file is an exact match, so it's a match up to a point and everything after that point matches a the regexp
+				for k in knownlibs:
+					# File must be an exact match or have the right .so ending numbers
 					# i.e. gpm includes libgpm.so and libgpm.so.1.19.0, but everything links to libgpm.so.1
 					# We compare find libgpm.so.1.19.0 startswith libgpm.so.1 and .19.0 matches the regexp
 					if j == actualpath[k] or (j.startswith(actualpath[k]) and so_end.match(j[len(actualpath[k]):])):
@@ -134,13 +153,14 @@
 						dependlist[n.group(1)] = {}
 					for x in liblist[k]:
 						dependlist[n.group(1)][x] = 1
+					knownlibs.remove(k)
 					foundlist.append(k)
 			file.close()
 
 	ret = []
-	for i in liblist.keys():
-		if i not in foundlist:
-			ret.append('Library ' + i + ' has no package associated')
+	# knownlibs only contains what hasn't been found at this point
+	for i in knownlibs:
+		ret.append('Library ' + i + ' has no package associated')
 	return dependlist, ret
 
 def getprovides(depends, provides):
@@ -155,8 +175,9 @@ def filllibcache():
 			shell=True,
 			stdout=subprocess.PIPE,
 			stderr=subprocess.PIPE).communicate()
+	libline = re.compile('\s*(.*) \((.*)\) => (.*)')
 	for j in var[0].split('\n'):
-		g = re.match('\s*(.*) \((.*)\) => (.*)',j)
+		g = libline.match(j)
 		if g != None:
 			if g.group(2).startswith('libc6,x86-64'):
 				libcache['x86-64'][g.group(1)] = g.group(3)
@@ -190,6 +211,16 @@
 			for i in tmpret:
 				ret[1].append(i)
 
+		# Remove the package name from that list, we can't depend on ourselves.
+		if dependlist.has_key(pkginfo.name):
+			del dependlist[pkginfo.name]
+
+		# Print link-level deps
+		for i, v in dependlist.iteritems():
+			if type(v) == dict:
+				files = [x[len(data)+1:] for x in v.keys()]
+				ret[2].append(("link-level-dependence %s in %s", (i, str(files))))
+
 		# Do the script handling stuff
 		for i, v in liblist[1].iteritems():
 			if not dependlist.has_key(i):
@@ -199,16 +230,6 @@
 				files = [x[len(data)+1:] for x in v.keys()]
 				ret[2].append(("script-link-detected %s in %s", (i, str(files))))
 
-		# Remove the package name from that list, we can't depend on ourselves.
-		if dependlist.has_key(pkginfo.name):
-			del dependlist[pkginfo.name]
-
-		# Do the info stuff
-		for i, v in dependlist.iteritems():
-			if type(v) == dict:
-				files = [x[len(data)+1:] for x in v.keys()]
-				ret[2].append(("link-level-dependence %s on %s", (str(files), i)))
-
 		# Check for packages in testing
 		if os.path.isdir('/var/lib/pacman/sync/testing'):
 			for i in dependlist.keys():
@@ -267,4 +288,5 @@
 		return ret
 	def type(self):
 		return "tarball"
+
 # vim: set ts=4 sw=4 noet:
diff --git a/namcap-tags b/namcap-tags
index c6f170e..710077c 100644
--- a/namcap-tags
+++ b/namcap-tags
@@ -35,7 +35,7 @@ incorrect-permissions %s (%s/%s) :: File (%s) has %s/%s permissions.
 info-dir-file-present %s :: Info directory file (%s) needs removed.
 insecure-rpath %s :: Insecure RPATH (%s). If present, RPATH should be only /usr/lib.
 libtool-file-present %s :: File (%s) is a libtool file.
-link-level-dependence %s on %s :: File '%s' link-level dependence on %s
+link-level-dependence %s in %s :: Link-level dependence (%s) in file %s
 mime-cache-not-updated :: Mime-file found. Add "update-mime-database usr/share/mime" to the install file
 missing-contributor :: Missing Contributor tag
 missing-custom-license-dir usr/share/licenses/%s :: Missing custom license directory (usr/share/licenses/%s)
--
1.6.4.4
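For anyone who wants to chase the remaining slowness Dan mentions, a generic
way to get this kind of before/after data is to run namcap under cProfile and
sort by cumulative time. The sketch below is only illustrative; the package
name and output file are placeholders, not the exact invocation used for the
numbers above.

    # First profile a run, e.g.:
    #   python -m cProfile -o namcap.prof ./namcap.py -m -r depends somepkg.pkg.tar.gz
    # then inspect where the time actually went:
    import pstats

    stats = pstats.Stats('namcap.prof')
    stats.sort_stats('cumulative').print_stats(20)   # top 20 entries by cumulative time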
On Mon, Sep 28, 2009 at 6:16 AM, Dan McGee <dan@archlinux.org> wrote:
> In the spirit of Eli making a bunch of patches for the AUR, I finally
> decided to sit down tonight and figure out why the heck namcap was sucking
> it up, and did a little code cleanup along the way. namcap.py is now a bit
> cleaner and separated into functions, and the real treat is namcap is a
> hell of a lot faster now that I found the bottleneck, which was the depends
> hook. The first 6 patches in this series lead up to the 7th, which is where
> the speed increase is found.
>
> Let me know what you see, otherwise it would be cool to get this in and a
> new namcap release made, as it has rather dramatic effects with regards to
> speed.
>
> If you don't like patches, you can get all this from my git tree as well:
> http://code.toofishes.net/cgit/dan/namcap.git/log/?h=working
That sounds pretty cool. Just yesterday, I was complaining to myself that namcap was too slow. I will give this a try :)