[pacman-dev] [PATCH 0/2] Correct package size calculation in makepkg
From: Ronan Pigott <rpigott@berkeley.edu> This patchset is a fix for FS#64252 and is mostly the result of some discussion there. The goal is to fix the double counting of hard linked files and also to correct some errors counting the size of certain file names that coincide with valid cat flag options. To make supporting whitespace in filenames easier, I changed INODECMD to output only the inode and instead rely on find's -print0 to reliably parse complete filenames. This fixes some errors for files that begin or end in whitespace. It was noted in discussion that -print0 is not POSIX, but it seems portable regardless. Note it is possible to get a null-delimited list of file names in pure bash using a combination of bash's printf and globbing, if it is necessary. Ronan Pigott (2): makepkg: use null-delimited file lists when tracking inodes makepkg: do not count hard linked file sizes multiple times configure.ac | 6 +++--- meson.build | 6 +++--- scripts/libmakepkg/tidy/zipman.sh.in | 9 +++++---- scripts/makepkg.sh.in | 10 +++++++++- 4 files changed, 20 insertions(+), 11 deletions(-) -- 2.23.0
From: Ronan Pigott <rpigott@berkeley.edu> --- configure.ac | 6 +++--- meson.build | 6 +++--- scripts/libmakepkg/tidy/zipman.sh.in | 9 +++++---- 3 files changed, 11 insertions(+), 10 deletions(-) diff --git a/configure.ac b/configure.ac index d68b3591..fbb3b83b 100644 --- a/configure.ac +++ b/configure.ac @@ -370,18 +370,18 @@ GCC_VISIBILITY_CC # Host-dependant definitions DEFAULT_SEDINPLACEFLAGS=" --follow-symlinks -i" -INODECMD="stat -c '%i %n'" +INODECMD="stat -c '%i'" STRIP_BINARIES="--strip-all" STRIP_SHARED="--strip-unneeded" STRIP_STATIC="--strip-debug" case "${host_os}" in *bsd*) - INODECMD="stat -f '%i %N'" + INODECMD="stat -f '%i'" DEFAULT_SEDINPLACEFLAGS=" -i \"\"" ;; darwin*) host_os_darwin=yes - INODECMD="/usr/bin/stat -f '%i %N'" + INODECMD="/usr/bin/stat -f '%i'" DEFAULT_SEDINPLACEFLAGS=" -i ''" STRIP_BINARIES="" STRIP_SHARED="-S" diff --git a/meson.build b/meson.build index c5704efe..648dc80e 100644 --- a/meson.build +++ b/meson.build @@ -222,7 +222,7 @@ add_project_arguments('-include', 'config.h', language : 'c') filecmd = 'file' default_sedinplaceflags = ' --follow-symlinks -i' -inodecmd = 'stat -c \'%i %n\'' +inodecmd = 'stat -c \'%i\'' strip_binaries = '--strip-all' strip_shared = '--strip-unneeded' strip_static = '--strip-debug' @@ -236,13 +236,13 @@ endif os = host_machine.system() if os.startswith('darwin') - inodecmd = '/usr/bin/stat -f \'%i %n\'' + inodecmd = '/usr/bin/stat -f \'%i\'' default_sedinplaceflags = ' -i \'\'' strip_binaries = '' strip_shared = '-s' strip_static = '-s' elif os.contains('bsd') or os == 'dragonfly' - inodecmd = 'stat -f \'%i %n\'' + inodecmd = 'stat -f \'%i\'' default_sedinplaceflags = ' -i \'\'' endif diff --git a/scripts/libmakepkg/tidy/zipman.sh.in b/scripts/libmakepkg/tidy/zipman.sh.in index 3c2e261e..dd36ebef 100644 --- a/scripts/libmakepkg/tidy/zipman.sh.in +++ b/scripts/libmakepkg/tidy/zipman.sh.in @@ -33,9 +33,10 @@ tidy_modify+=('tidy_zipman') tidy_zipman() { if check_option "zipman" "y" && [[ -n ${MAN_DIRS[*]} ]]; then msg2 "$(gettext "Compressing man and info pages...")" - local file files inode link - while read -rd ' ' inode; do - read file + local file inode link + declare -A files + while read -rd $'\0' file; do + inode=$( @INODECMD@ "$file" ) find ${MAN_DIRS[@]} -type l 2>/dev/null | while read -r link ; do if [[ "${file}" -ef "${link}" ]] ; then @@ -56,6 +57,6 @@ tidy_zipman() { chmod 644 "${file}.gz" fi done < <(find ${MAN_DIRS[@]} -type f \! -name "*.gz" \! -name "*.bz2" \ - -exec @INODECMD@ '{}' + 2>/dev/null) + -print0) fi } -- 2.23.0
On 10/26/19 11:11 PM, Ronan Pigott wrote:
From: Ronan Pigott <rpigott@berkeley.edu>
--- configure.ac | 6 +++--- meson.build | 6 +++--- scripts/libmakepkg/tidy/zipman.sh.in | 9 +++++---- 3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/configure.ac b/configure.ac index d68b3591..fbb3b83b 100644 --- a/configure.ac +++ b/configure.ac @@ -370,18 +370,18 @@ GCC_VISIBILITY_CC
# Host-dependant definitions DEFAULT_SEDINPLACEFLAGS=" --follow-symlinks -i" -INODECMD="stat -c '%i %n'" +INODECMD="stat -c '%i'" STRIP_BINARIES="--strip-all" STRIP_SHARED="--strip-unneeded" STRIP_STATIC="--strip-debug" case "${host_os}" in *bsd*) - INODECMD="stat -f '%i %N'" + INODECMD="stat -f '%i'" DEFAULT_SEDINPLACEFLAGS=" -i \"\"" ;; darwin*) host_os_darwin=yes - INODECMD="/usr/bin/stat -f '%i %N'" + INODECMD="/usr/bin/stat -f '%i'" DEFAULT_SEDINPLACEFLAGS=" -i ''" STRIP_BINARIES="" STRIP_SHARED="-S" diff --git a/meson.build b/meson.build index c5704efe..648dc80e 100644 --- a/meson.build +++ b/meson.build @@ -222,7 +222,7 @@ add_project_arguments('-include', 'config.h', language : 'c')
filecmd = 'file' default_sedinplaceflags = ' --follow-symlinks -i' -inodecmd = 'stat -c \'%i %n\'' +inodecmd = 'stat -c \'%i\'' strip_binaries = '--strip-all' strip_shared = '--strip-unneeded' strip_static = '--strip-debug' @@ -236,13 +236,13 @@ endif
os = host_machine.system() if os.startswith('darwin') - inodecmd = '/usr/bin/stat -f \'%i %n\'' + inodecmd = '/usr/bin/stat -f \'%i\'' default_sedinplaceflags = ' -i \'\'' strip_binaries = '' strip_shared = '-s' strip_static = '-s' elif os.contains('bsd') or os == 'dragonfly' - inodecmd = 'stat -f \'%i %n\'' + inodecmd = 'stat -f \'%i\'' default_sedinplaceflags = ' -i \'\'' endif
diff --git a/scripts/libmakepkg/tidy/zipman.sh.in b/scripts/libmakepkg/tidy/zipman.sh.in index 3c2e261e..dd36ebef 100644 --- a/scripts/libmakepkg/tidy/zipman.sh.in +++ b/scripts/libmakepkg/tidy/zipman.sh.in @@ -33,9 +33,10 @@ tidy_modify+=('tidy_zipman') tidy_zipman() { if check_option "zipman" "y" && [[ -n ${MAN_DIRS[*]} ]]; then msg2 "$(gettext "Compressing man and info pages...")" - local file files inode link - while read -rd ' ' inode; do - read file + local file inode link + declare -A files + while read -rd $'\0' file; do + inode=$( @INODECMD@ "$file" )
There's really no need for this change, the stat command works fine as is. You mention:
To make supporting whitespace in filenames easier, I changed INODECMD to output only the inode and instead rely on find's -print0 to reliably parse complete filenames. This fixes some errors for files that begin or end in whitespace. It was noted in discussion that -print0 is not
But your replacement has the same issue. Whitespace is trimmed not by the -d ' ' in use (which reads in the first line of input until it hits the first space character, deletes the space character, and then declares the line to be over and assigns the line to $inode), but by the IFS in the second read (since IFS is used to split the line, even if there is only one variable to assign it will still strip leading and trailing $IFS). -- Eli Schwartz Bug Wrangler and Trusted User
On Sun Oct 27, 2019 at 2:42 AM Eli Schwartz wrote:
You mention:
To make supporting whitespace in filenames easier, I changed INODECMD to output only the inode and instead rely on find's -print0 to reliably parse complete filenames. This fixes some errors for files that begin or end in whitespace. It was noted in discussion that -print0 is not
But your replacement has the same issue. Whitespace is trimmed not by the -d ' ' in use (which reads in the first line of input until it hits the first space character, deletes the space character, and then declares the line to be over and assigns the line to $inode), but by the IFS in the second read (since IFS is used to split the line, even if there is only one variable to assign it will still strip leading and trailing $IFS).
Ah that does seem to be the case. Good catch. I'll remove this change and use the existing INODECMD format.
From: Ronan Pigott <rpigott@berkeley.edu> --- scripts/makepkg.sh.in | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in index 997c8668..0725f582 100644 --- a/scripts/makepkg.sh.in +++ b/scripts/makepkg.sh.in @@ -584,7 +584,15 @@ write_kv_pair() { } write_pkginfo() { - local size="$(find . -type f -exec cat {} + 2>/dev/null | wc -c)" + local inode size=0 + declare -A files + while read -rd $'\0' file; do + inode=$( @INODECMD@ "$file" ) + if [[ -z "${files[$inode]}" ]]; then + files[$inode]=$(wc -c < "$file") + size=$((size + ${files[$inode]})) + fi + done < <(find . -type f -print0) merge_arch_attrs -- 2.23.0
On 27/10/19 1:11 pm, Ronan Pigott wrote:
From: Ronan Pigott <rpigott@berkeley.edu>
--- scripts/makepkg.sh.in | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in index 997c8668..0725f582 100644 --- a/scripts/makepkg.sh.in +++ b/scripts/makepkg.sh.in @@ -584,7 +584,15 @@ write_kv_pair() { }
write_pkginfo() { - local size="$(find . -type f -exec cat {} + 2>/dev/null | wc -c)" + local inode size=0 + declare -A files + while read -rd $'\0' file; do + inode=$( @INODECMD@ "$file" ) + if [[ -z "${files[$inode]}" ]]; then + files[$inode]=$(wc -c < "$file") + size=$((size + ${files[$inode]})) + fi + done < <(find . -type f -print0)
I'm going to request a couple of changes... 1) can you put this function in a separate file like in the patch I submitted (just use that patch and adjust the function). Not that I expect this to be reused, but it will be a bit long to sit in write_pkginfo after... 2) we have some packages approaching 100,000 files! 67220 texlive-fontsextra-2019.50876-1/files 76595 papirus-icon-theme-20191009-1/files 80166 rocksndiamonds-data-4.1.3.0-1/files 82228 ceph-mgr-14.2.1-2/files 97821 nodejs-material-design-icons-3.0.1-2/files Most of those have no hard links, so a two pass approach has been discussed: find . -type f -links 1 ... with no requesting and storing inodes and then find . -type f -links +1 ... Having just checked out a couple of very large packages, this appears to be worth the effort. Thanks, Allan
participants (3)
-
Allan McRae
-
Eli Schwartz
-
Ronan Pigott