[pacman-dev] [PATCH] repo-add: use bsdtar optimization for better performance
When extracting files from packages and database archives, we don't need to
read through the entire archive to do what we need to do. For packages,
.PKGINFO should only be found once and should be the first file in the
package. For the database check, we only need to find one desc file.
The bsdtar -q option is very similar to the GNU tar --occurrence=1 option.
Example of speedup:
$ time repo-add junkdb.db.tar.gz *.pkg.tar.gz >/dev/null
real 0m16.159s
user 0m14.836s
sys 0m2.277s
$ time ./scripts/repo-add junkdb.db.tar.gz *.pkg.tar.gz >/dev/null
real 0m4.949s
user 0m3.730s
sys 0m2.093s
Signed-off-by: Dan McGee
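A minimal sketch of the early-exit extraction described above, not the code from the patch itself. The function name `read_pkginfo` is hypothetical, and the GNU tar fallback is an assumption added for systems without bsdtar; it also assumes the member is stored under the exact name `.PKGINFO`.

```shell
# Hypothetical helper: print a package's .PKGINFO to stdout, stopping at the
# first match instead of scanning the rest of the archive.
read_pkginfo() {
    pkgfile=$1
    if command -v bsdtar >/dev/null 2>&1; then
        # -x extract, -O write to stdout, -q stop after the first match
        bsdtar -xOqf "$pkgfile" .PKGINFO
    else
        # Assumed fallback: GNU tar's closest equivalent to bsdtar -q
        tar -xOf "$pkgfile" --occurrence=1 .PKGINFO
    fi
}
```

It could be called as `read_pkginfo some.pkg.tar.gz`, where the package name is a placeholder. The saving comes from the tool exiting at the first match, so it is largest when .PKGINFO sits at the front of a big archive.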
On Wednesday 17 June 2009 03:10:27 Dan McGee wrote:
> The bsdtar -q option is very similar to the GNU tar --occurrence=1 option.

Good catch. I tested this myself yesterday and wondered why there was a delay even after the PKGINFO file had already been read and printed. Does this mean that extracting PKGINFO can now be done in constant time no matter how big the package is?

-- 
Pierre Schmitz
Clemens-August-Straße 76
53115 Bonn
Telefon 0228 9716608
Mobil 0160 95269831
Jabber pierre@jabber.archlinux.de
WWW http://www.archlinux.de
On Wed, Jun 17, 2009 at 12:03 AM, Pierre Schmitz wrote:
> On Wednesday 17 June 2009 03:10:27 Dan McGee wrote:
>> The bsdtar -q option is very similar to the GNU tar --occurrence=1 option.
>
> Good catch. I tested this myself yesterday and wondered why there was a delay even after the PKGINFO file had already been read and printed. Does this mean that extracting PKGINFO can now be done in constant time no matter how big the package is?

This should be the case, as long as .PKGINFO is always the first file in the archive. makepkg ensures this when it creates the package, but anyone playing tricks could break that ordering. That would only mean those packages take longer to process.

-Dan
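The ordering condition mentioned above can be checked with a short sketch like the following. The function name `pkginfo_is_first` is hypothetical and is not part of repo-add or makepkg; it assumes a tar that can list the archive format in use.

```shell
# Hypothetical check: succeed only if .PKGINFO is the first archive member.
pkginfo_is_first() {
    # List member names and keep only the first one
    first=$(tar -tf "$1" 2>/dev/null | head -n 1)
    [ "$first" = ".PKGINFO" ]
}
```

A package that fails this check would still work with the early-exit extraction; it would just lose some or all of the speedup.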