[pacman-dev] [PATCH] repo-add: use bsdtar optimization for better performance
When unzipping packages and the database archives, we don't need to look
through the entire archive to do what we need to do. For packages, .PKGINFO
should only be found once and should be the first file in the package. For
the database check, we only really need to look for one desc file.

The bsdtar -q option is very similar to the GNU tar --occurrence=1 option.

Example of speedup:

$ time repo-add junkdb.db.tar.gz *.pkg.tar.gz >/dev/null

real 0m16.159s
user 0m14.836s
sys 0m2.277s

$ time ./scripts/repo-add junkdb.db.tar.gz *.pkg.tar.gz >/dev/null

real 0m4.949s
user 0m3.730s
sys 0m2.093s

Signed-off-by: Dan McGee <dan@archlinux.org>
---
 scripts/repo-add.sh.in | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in
index 7c12aaf..5454fb0 100644
--- a/scripts/repo-add.sh.in
+++ b/scripts/repo-add.sh.in
@@ -193,7 +193,7 @@ db_write_entry()
 	# read info from the zipped package
 	local line var val
-	for line in $(bsdtar -xOf "$pkgfile" .PKGINFO |
+	for line in $(bsdtar -xOqf "$pkgfile" .PKGINFO |
 			grep -v '^#' | sed 's|\(\w*\)\s*=\s*\(.*\)|\1 \2|'); do
 		# bash awesomeness here- var is always one word, val is everything else
 		var=${line%% *}
@@ -305,7 +305,7 @@ check_repo_db()
 	fi
 	if [ -f "$REPO_DB_FILE" ]; then
-		if ! (bsdtar -tf "$REPO_DB_FILE" | grep -q "/desc"); then
+		if ! bsdtar -tqf "$REPO_DB_FILE" '*/desc' 2>&1 >/dev/null; then
 			error "$(gettext "Repository file '%s' is not a proper pacman database.")" "$REPO_DB_FILE"
 			exit 1
 		fi
@@ -351,7 +351,7 @@ add()
 	fi
 	pkgfile=$1
-	if ! bsdtar -tf "$pkgfile" .PKGINFO 2>&1 >/dev/null; then
+	if ! bsdtar -tqf "$pkgfile" .PKGINFO 2>&1 >/dev/null; then
 		error "$(gettext "'%s' is not a package file, skipping")" "$pkgfile"
 		return 1
 	fi
--
1.6.3.2
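
To try the fast-read behaviour outside of repo-add, here is a minimal sketch. It assumes bsdtar is installed; the helper name read_pkginfo, the demo package name, and the 1 MiB dummy payload are made up for illustration and are not part of the patch.

```shell
#!/bin/bash
# Hypothetical helper (not part of repo-add): print .PKGINFO from a package.
read_pkginfo() {
	local pkgfile=$1
	# -x extract, -O write to stdout, -q stop after the first match, -f archive
	bsdtar -xOqf "$pkgfile" .PKGINFO 2>/dev/null
}

# Build a throwaway package so the sketch is self-contained.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'pkgname = demo\npkgver = 1.0-1\n' > "$workdir/.PKGINFO"
head -c 1048576 /dev/zero > "$workdir/payload"

# Naming .PKGINFO first on the command line makes it the first archive entry,
# which is the layout the patch relies on.
tar -C "$workdir" -czf "$workdir/demo.pkg.tar.gz" .PKGINFO payload

# Only run the extraction when bsdtar is actually available.
if command -v bsdtar >/dev/null 2>&1; then
	read_pkginfo "$workdir/demo.pkg.tar.gz"
fi
```

Without -q, bsdtar keeps scanning the rest of the archive after the first match, which is where the extra time in the "before" measurement goes.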
On Wednesday 17 June 2009 03:10:27 Dan McGee wrote:
> The bsdtar -q option is very similar to the GNU tar --occurrence=1 option.

Good catch. I tested this myself yesterday and wondered why there was still a
delay even after the .PKGINFO file had already been read and printed. Does
this mean that extracting .PKGINFO can now be done in constant time, no
matter how big the package is?

--
Pierre Schmitz
On Wed, Jun 17, 2009 at 12:03 AM, Pierre Schmitz <pierre@archlinux.de> wrote:
> On Wednesday 17 June 2009 03:10:27 Dan McGee wrote:
>> The bsdtar -q option is very similar to the GNU tar --occurrence=1 option.
>
> Good catch. I tested this myself yesterday and wondered why there was still a
> delay even after the .PKGINFO file had already been read and printed. Does
> this mean that extracting .PKGINFO can now be done in constant time, no
> matter how big the package is?
This should be the case, as long as .PKGINFO is always the first file in the
archive. makepkg ensures this when it creates the package, but anyone playing
tricks could break that ordering; that would only mean those packages take
longer to process.

-Dan
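
The ordering assumption above can be checked by hand. The following is only a sketch, assuming bsdtar is installed and that entries are stored without a leading "./"; pkginfo_is_first is a hypothetical helper name, and the two demo archives are built here just to show both outcomes.

```shell
#!/bin/bash
# Hypothetical helper (not part of makepkg or repo-add): succeed only when
# .PKGINFO is the first entry in the archive.
pkginfo_is_first() {
	local pkgfile=$1 first
	# List the entries and keep only the first name.
	first=$(bsdtar -tf "$pkgfile" 2>/dev/null | head -n 1)
	[ "$first" = ".PKGINFO" ]
}

# Build two throwaway archives: one in the expected order, one reversed.
demodir=$(mktemp -d)
trap 'rm -rf "$demodir"' EXIT
printf 'pkgname = demo\n' > "$demodir/.PKGINFO"
printf 'data\n' > "$demodir/payload"
tar -C "$demodir" -czf "$demodir/good.pkg.tar.gz" .PKGINFO payload
tar -C "$demodir" -czf "$demodir/odd.pkg.tar.gz" payload .PKGINFO

# Only run the check when bsdtar is actually available.
if command -v bsdtar >/dev/null 2>&1; then
	for pkg in "$demodir"/good.pkg.tar.gz "$demodir"/odd.pkg.tar.gz; do
		if pkginfo_is_first "$pkg"; then
			echo "${pkg##*/}: .PKGINFO is first, fast path applies"
		else
			echo "${pkg##*/}: .PKGINFO is not first, bsdtar -q has to read further"
		fi
	done
fi
```

A package that fails this check still works with the patched repo-add; it just loses some of the speedup.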