[pacman-dev] [PATCH 1/3] libmakepkg: use extraction commands instead of file to find archive type
Previously, to determine which command we should use to extract an archive, we would run file and match the output against our list of possible extraction commands Instead, run the archive through each extraction command's -t (--test) flag, if this succeeds then we know that the command is able to extract the file and is the one to use Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com> --- scripts/libmakepkg/source/file.sh.in | 39 ++++++++-------------------- 1 file changed, 11 insertions(+), 28 deletions(-) diff --git a/scripts/libmakepkg/source/file.sh.in b/scripts/libmakepkg/source/file.sh.in index 7297a1c6..faace79b 100644 --- a/scripts/libmakepkg/source/file.sh.in +++ b/scripts/libmakepkg/source/file.sh.in @@ -96,35 +96,18 @@ extract_file() { fi # do not rely on extension for file type - local file_type=$(@FILECMD@ -bizL -- "$file") - local ext=${file##*.} local cmd='' - case "$file_type" in - *application/x-tar*|*application/zip*|*application/x-zip*|*application/x-cpio*) - cmd="bsdtar" ;; - *application/x-gzip*|*application/gzip*) - case "$ext" in - gz|z|Z) cmd="gzip" ;; - *) return;; - esac ;; - *application/x-bzip*) - case "$ext" in - bz2|bz) cmd="bzip2" ;; - *) return;; - esac ;; - *application/x-xz*) - case "$ext" in - xz) cmd="xz" ;; - *) return;; - esac ;; - *) - # See if bsdtar can recognize the file - if bsdtar -tf "$file" -q '*' &>/dev/null; then - cmd="bsdtar" - else - return 0 - fi ;; - esac + if bsdtar -tf "$file" -q '*'; then + cmd='bsdtar' + elif gzip -t "$file"; then + cmd='gzip' + elif bzip2 -t "$file"; then + cmd='bzip2' + elif xz -t "$file"; then + cmd='xz' + else + return 0 + fi &>/dev/null local ret=0 msg2 "$(gettext "Extracting %s with %s")" "$file" "$cmd" -- 2.23.0
Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com> --- scripts/libmakepkg/tidy/strip.sh.in | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in index 1bd810f0..301d1989 100644 --- a/scripts/libmakepkg/tidy/strip.sh.in +++ b/scripts/libmakepkg/tidy/strip.sh.in @@ -111,22 +111,20 @@ tidy_strip() { local binary strip_flags find . -type f -perm -u+w -print0 2>/dev/null | while IFS= read -rd '' binary ; do - case "$(file -bi "$binary")" in - *application/x-sharedlib*) # Libraries (.so) + case "$(LC_ALL=C readelf -h "$binary" 2>/dev/null)" in + *Type:*'DYN (Shared object file)'*) # Libraries (.so) or Relocatable binaries strip_flags="$STRIP_SHARED";; - *application/x-archive*) # Libraries (.a) - strip_flags="$STRIP_STATIC";; - *application/x-object*) - case "$binary" in - *.ko) # Kernel module - strip_flags="$STRIP_SHARED";; - *) - continue;; - esac;; - *application/x-executable*) # Binaries + *Type:*'EXEC (Executable file)'*) # Binaries strip_flags="$STRIP_BINARIES";; - *application/x-pie-executable*) # Relocatable binaries - strip_flags="$STRIP_SHARED";; + *Type:*'REL (Relocatable file)'*) # Libraries (.a) or objects + if ar t "$binary" &>/dev/null; then # Libraries (.a) + strip_flags="$STRIP_STATIC" + elif [[ $binary = *'.ko' ]]; then # Kernel module + strip_flags="$STRIP_SHARED" + else + continue + fi + ;; *) continue ;; esac -- 2.23.0
On 11/26/19 4:29 PM, Ethan Sommer wrote:
Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com> --- scripts/libmakepkg/tidy/strip.sh.in | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in index 1bd810f0..301d1989 100644 --- a/scripts/libmakepkg/tidy/strip.sh.in +++ b/scripts/libmakepkg/tidy/strip.sh.in @@ -111,22 +111,20 @@ tidy_strip() {
local binary strip_flags find . -type f -perm -u+w -print0 2>/dev/null | while IFS= read -rd '' binary ; do - case "$(file -bi "$binary")" in - *application/x-sharedlib*) # Libraries (.so) + case "$(LC_ALL=C readelf -h "$binary" 2>/dev/null)" in
More squelching of errors. Incidentally, is the output of this, documented, or can it arbitrarily change in the future? file is guaranteed to emit output in the RFC-documented IANA media type format.
+ *Type:*'DYN (Shared object file)'*) # Libraries (.so) or Relocatable binaries strip_flags="$STRIP_SHARED";; - *application/x-archive*) # Libraries (.a) - strip_flags="$STRIP_STATIC";; - *application/x-object*) - case "$binary" in - *.ko) # Kernel module - strip_flags="$STRIP_SHARED";; - *) - continue;; - esac;; - *application/x-executable*) # Binaries + *Type:*'EXEC (Executable file)'*) # Binaries strip_flags="$STRIP_BINARIES";; - *application/x-pie-executable*) # Relocatable binaries - strip_flags="$STRIP_SHARED";; + *Type:*'REL (Relocatable file)'*) # Libraries (.a) or objects + if ar t "$binary" &>/dev/null; then # Libraries (.a) + strip_flags="$STRIP_STATIC" + elif [[ $binary = *'.ko' ]]; then # Kernel module + strip_flags="$STRIP_SHARED" + else + continue + fi + ;; *) continue ;; esac
-- Eli Schwartz Bug Wrangler and Trusted User
--- a/scripts/libmakepkg/tidy/strip.sh.in +++ b/scripts/libmakepkg/tidy/strip.sh.in @@ -111,22 +111,20 @@ tidy_strip() {
local binary strip_flags find . -type f -perm -u+w -print0 2>/dev/null | while IFS= read -rd '' binary ; do - case "$(file -bi "$binary")" in - *application/x-sharedlib*) # Libraries (.so) + case "$(LC_ALL=C readelf -h "$binary" 2>/dev/null)" in
More squelching of errors. The error redirection to null is because readelf outputs an error when the file it is run on isn't an ELF file at all, and obviously that error is unwanted.
Incidentally, is the output of this, documented, or can it arbitrarily change in the future? file is guaranteed to emit output in the RFC-documented IANA media type format. These file types and their specification in the file header are part of the ELF spec, and the readelf output has remained consistent, at least the part that matters to us, and the relevant part is the same across 4 different implementations, and I doubt there will be some change of the word 'Type:' being used, given that the ELF specification refers to those file types as 'types'.
On 11/27/19 12:32 PM, Ethan Sommer wrote:
--- a/scripts/libmakepkg/tidy/strip.sh.in +++ b/scripts/libmakepkg/tidy/strip.sh.in @@ -111,22 +111,20 @@ tidy_strip() {
local binary strip_flags find . -type f -perm -u+w -print0 2>/dev/null | while IFS= read -rd '' binary ; do - case "$(file -bi "$binary")" in - *application/x-sharedlib*) # Libraries (.so) + case "$(LC_ALL=C readelf -h "$binary" 2>/dev/null)" in
More squelching of errors. The error redirection to null is because readelf outputs an error when the file it is run on isn't an ELF file at all, and obviously that error is unwanted.
Right, we just want all the other errors.
Incidentally, is the output of this, documented, or can it arbitrarily change in the future? file is guaranteed to emit output in the RFC-documented IANA media type format. These file types and their specification in the file header are part of the ELF spec
Let's pretend I stated from the beginning that I trust the readelf program to be able to also interpret future revisions of the file format.
, and the readelf output has remained consistent, at least the part that matters to us, and the relevant part is the same across 4 different implementations, and I doubt there will be some change of the word 'Type:' being used, given that the ELF specification refers to those file types as 'types'.
I'll take that as a "no, but it's worked for long enough they probably won't change it now". Which I suppose is not so unreasonable... -- Eli Schwartz Bug Wrangler and Trusted User
On 27/11/19 7:29 am, Ethan Sommer wrote:
Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com> --- scripts/libmakepkg/tidy/strip.sh.in | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-)
This diff is an absolute pain to read! I went through all files on my system, and confirmed this change does what it is supposed to. Files are all stripped the same way. Onto the issues Eli brought up: 1) Hiding errors from readelf. If readelf errors in any way, we are not going to be stripping the file anyway. So I am fine with that. 2) Format of output. I am OK with this given multiple implementations use the same format, and we already rely on parsing the readelf output when making debug packages. In summmary. Ack. A
diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in index 1bd810f0..301d1989 100644 --- a/scripts/libmakepkg/tidy/strip.sh.in +++ b/scripts/libmakepkg/tidy/strip.sh.in @@ -111,22 +111,20 @@ tidy_strip() {
local binary strip_flags find . -type f -perm -u+w -print0 2>/dev/null | while IFS= read -rd '' binary ; do - case "$(file -bi "$binary")" in - *application/x-sharedlib*) # Libraries (.so) + case "$(LC_ALL=C readelf -h "$binary" 2>/dev/null)" in + *Type:*'DYN (Shared object file)'*) # Libraries (.so) or Relocatable binaries strip_flags="$STRIP_SHARED";; - *application/x-archive*) # Libraries (.a) - strip_flags="$STRIP_STATIC";; - *application/x-object*) - case "$binary" in - *.ko) # Kernel module - strip_flags="$STRIP_SHARED";; - *) - continue;; - esac;; - *application/x-executable*) # Binaries + *Type:*'EXEC (Executable file)'*) # Binaries strip_flags="$STRIP_BINARIES";; - *application/x-pie-executable*) # Relocatable binaries - strip_flags="$STRIP_SHARED";; + *Type:*'REL (Relocatable file)'*) # Libraries (.a) or objects + if ar t "$binary" &>/dev/null; then # Libraries (.a) + strip_flags="$STRIP_STATIC" + elif [[ $binary = *'.ko' ]]; then # Kernel module + strip_flags="$STRIP_SHARED" + else + continue + fi + ;; *) continue ;; esac
Signed-off-by: Ethan Sommer <e5ten.arch@gmail.com> --- build-aux/edit-script.sh.in | 1 - configure.ac | 12 ------------ meson.build | 9 --------- scripts/makepkg.sh.in | 4 +--- 4 files changed, 1 insertion(+), 25 deletions(-) diff --git a/build-aux/edit-script.sh.in b/build-aux/edit-script.sh.in index 661c22d5..6ed563be 100644 --- a/build-aux/edit-script.sh.in +++ b/build-aux/edit-script.sh.in @@ -19,7 +19,6 @@ mode=$3 -e "s|@TEMPLATE_DIR[@]|@TEMPLATE_DIR@|g" \ -e "s|@DEBUGSUFFIX[@]|@DEBUGSUFFIX@|g" \ -e "s|@INODECMD[@]|@INODECMD@|g" \ - -e "s|@FILECMD[@]|@FILECMD@|g" \ "$input" >"$output" if [[ $mode ]]; then diff --git a/configure.ac b/configure.ac index e59f82e9..137f30e4 100644 --- a/configure.ac +++ b/configure.ac @@ -228,18 +228,6 @@ PKG_CHECK_VAR(bashcompdir, [bash-completion], [completionsdir], , PKG_CHECK_MODULES(LIBARCHIVE, [libarchive >= 3.0.0], , AC_MSG_ERROR([*** libarchive >= 3.0.0 is needed to compile pacman!])) -# Check file for seccomp -if test "x$with_file_seccomp" = "xauto"; then - file_version="$(file --version| sed -n 's/^file-\(.*\)/\1/p')" - AX_COMPARE_VERSION([$file_version], [ge], [5.38], [with_file_seccomp=yes]) -fi -if test "x$with_file_seccomp" = "xyes"; then - FILECMD="file -S" -else - FILECMD="file" -fi -AC_SUBST(FILECMD) - # Check for OpenSSL have_openssl=no have_nettle=no diff --git a/meson.build b/meson.build index 2c9185a6..f149548d 100644 --- a/meson.build +++ b/meson.build @@ -219,19 +219,11 @@ config_h = configure_file( configuration : conf) add_project_arguments('-include', 'config.h', language : 'c') -filecmd = 'file' inodecmd = 'stat -c \'%i %n\'' strip_binaries = '--strip-all' strip_shared = '--strip-unneeded' strip_static = '--strip-debug' -file_seccomp = get_option('file-seccomp') -# meson-git has find_program('file', required: false, version: '>=5.38') -filever = run_command('sh', '-c', 'file --version | sed -n "s/^file-\(.*\)/\\1/p"').stdout() -if file_seccomp.enabled() or ( file_seccomp.auto() and filever.version_compare('>= 5.38') ) - filecmd = 'file -S' -endif - os = host_machine.system() if os.startswith('darwin') inodecmd = '/usr/bin/stat -f \'%i %N\'' @@ -266,7 +258,6 @@ substs.set('BUILDSCRIPT', BUILDSCRIPT) substs.set('TEMPLATE_DIR', get_option('makepkg-template-dir')) substs.set('DEBUGSUFFIX', get_option('debug-suffix')) substs.set('INODECMD', inodecmd) -substs.set('FILECMD', filecmd) substs.set('LIBMAKEPKGDIR', LIBMAKEPKGDIR) substs.set('STRIP_BINARIES', strip_binaries) substs.set('STRIP_SHARED', strip_shared) diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in index 2deb61da..06d36f6b 100644 --- a/scripts/makepkg.sh.in +++ b/scripts/makepkg.sh.in @@ -26,15 +26,13 @@ # makepkg uses quite a few external programs during its execution. You # need to have at least the following installed for makepkg to function: -# awk, bsdtar (libarchive), bzip2, coreutils, fakeroot, file, find (findutils), +# awk, bsdtar (libarchive), bzip2, coreutils, fakeroot, find (findutils), # gettext, gpg, grep, gzip, sed, tput (ncurses), xz # gettext initialization export TEXTDOMAIN='pacman-scripts' export TEXTDOMAINDIR='@localedir@' -# file -i does not work on Mac OSX unless legacy mode is set -export COMMAND_MODE='legacy' # Ensure CDPATH doesn't screw with our cd calls unset CDPATH # Ensure GREP_OPTIONS doesn't screw with our grep calls -- 2.23.0
On 11/26/19 4:29 PM, Ethan Sommer wrote:
Previously, to determine which command we should use to extract an archive, we would run file and match the output against our list of possible extraction commands
Instead, run the archive through each extraction command's -t (--test) flag, if this succeeds then we know that the command is able to extract the file and is the one to use
Missing rationale why we care.
scripts/libmakepkg/source/file.sh.in | 39 ++++++++-------------------- 1 file changed, 11 insertions(+), 28 deletions(-)
diff --git a/scripts/libmakepkg/source/file.sh.in b/scripts/libmakepkg/source/file.sh.in index 7297a1c6..faace79b 100644 --- a/scripts/libmakepkg/source/file.sh.in +++ b/scripts/libmakepkg/source/file.sh.in @@ -96,35 +96,18 @@ extract_file() { fi
# do not rely on extension for file type - local file_type=$(@FILECMD@ -bizL -- "$file") - local ext=${file##*.} local cmd='' - case "$file_type" in - *application/x-tar*|*application/zip*|*application/x-zip*|*application/x-cpio*) - cmd="bsdtar" ;; - *application/x-gzip*|*application/gzip*) - case "$ext" in - gz|z|Z) cmd="gzip" ;; - *) return;; - esac ;; - *application/x-bzip*) - case "$ext" in - bz2|bz) cmd="bzip2" ;; - *) return;; - esac ;; - *application/x-xz*) - case "$ext" in - xz) cmd="xz" ;; - *) return;; - esac ;; - *) - # See if bsdtar can recognize the file - if bsdtar -tf "$file" -q '*' &>/dev/null; then - cmd="bsdtar" - else - return 0 - fi ;; - esac + if bsdtar -tf "$file" -q '*'; then + cmd='bsdtar' + elif gzip -t "$file"; then + cmd='gzip' + elif bzip2 -t "$file"; then + cmd='bzip2' + elif xz -t "$file"; then + cmd='xz' + else + return 0 + fi &>/dev/null
I had to look at this three times before I realized you were redirecting both stdout and stderr of the entire if/elif block. Because these commands check to see if the file is "uncorrupted", not to see if they are "formatted in this compression format", I imagine this would falsely fail files if they have some class of recoverable error, but I'm not sure if we care. What we do care about, I think, is that you're silencing stderr due to the fact that it will noisily print diagnostic messages when the file is not compressed or is in a different compression format, and bsdtar will print listed filenames to stdout. So now, if these commands fail for *any* reason, we treat this as a successful analysis that the file is in a different format. One reason for it failing might be that xz is not installed... this is hardly a successful analysis, and printing an "error: command not found" message during the decompression, then dying with "Failed to extract %s" is a desirable feature.
local ret=0 msg2 "$(gettext "Extracting %s with %s")" "$file" "$cmd"
-- Eli Schwartz Bug Wrangler and Trusted User
On 27/11/19 7:29 am, Ethan Sommer wrote:
Previously, to determine which command we should use to extract an archive, we would run file and match the output against our list of possible extraction commands
Instead, run the archive through each extraction command's -t (--test) flag, if this succeeds then we know that the command is able to extract the file and is the one to use
Apart from the issues Eli brought up, there is a large performance regression on large source files. $ time file -bizL libreoffice-6.3.4.1.tar.xz application/x-tar; charset=binary compressed-encoding=application/x-xz; charset=binary real 0m0.034s user 0m0.031s sys 0m0.004s $ time xz -t libreoffice-6.3.4.1.tar.xz real 0m11.970s user 0m11.898s sys 0m0.057s Allan
participants (3)
-
Allan McRae
-
Eli Schwartz
-
Ethan Sommer