[pacman-dev] [PATCH 2/2] makepkg: do not count hard linked file sizes multiple times

Allan McRae allan at archlinux.org
Sun Oct 27 04:32:10 UTC 2019


On 27/10/19 1:11 pm, Ronan Pigott wrote:
> From: Ronan Pigott <rpigott at berkeley.edu>
> 
> ---
>  scripts/makepkg.sh.in | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in
> index 997c8668..0725f582 100644
> --- a/scripts/makepkg.sh.in
> +++ b/scripts/makepkg.sh.in
> @@ -584,7 +584,15 @@ write_kv_pair() {
>  }
>  
>  write_pkginfo() {
> -	local size="$(find . -type f -exec cat {} + 2>/dev/null | wc -c)"
> +	local inode size=0
> +	declare -A files
> +	while read -rd $'\0' file; do
> +		inode=$( @INODECMD@ "$file" )
> +		if [[ -z "${files[$inode]}" ]]; then
> +			files[$inode]=$(wc -c < "$file")
> +			size=$((size + ${files[$inode]}))
> +		fi
> +	done < <(find . -type f -print0)
>  

I'm going to request a couple of changes...

1) Can you put this function in a separate file, as in the patch I
submitted? (Just use that patch and adjust the function.) I don't
expect it to be reused, but it will be a bit long to sit in
write_pkginfo after...

2) We have some packages approaching 100,000 files!

    67220 texlive-fontsextra-2019.50876-1/files
    76595 papirus-icon-theme-20191009-1/files
    80166 rocksndiamonds-data-4.1.3.0-1/files
    82228 ceph-mgr-14.2.1-2/files
    97821 nodejs-material-design-icons-3.0.1-2/files

Most of those have no hard links, so a two-pass approach has been
discussed. First,

find . -type f -links 1 ...

sums the files with a single link, without requesting or storing any
inodes, and then

find . -type f -links +1 ...

handles only the hard linked files, which still need the inode check.

I just checked a couple of very large packages, and this appears to be
worth the effort.
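
A minimal sketch of the two-pass idea, for illustration only. The
function name compute_pkg_size is hypothetical, and `stat -c %i` is
the GNU form of the inode lookup, assumed here in place of the
@INODECMD@ substitution used in the patch. It also assumes the whole
tree lives on one filesystem, so that inode numbers are unique.

```shell
#!/bin/bash
# Hypothetical helper, not part of makepkg: prints the total size in bytes
# of the regular files under a directory, counting hard linked data once.
compute_pkg_size() {
	local dir=$1 file inode size
	local -A seen=()

	# Pass 1: files with exactly one link cannot be double counted,
	# so sum them in bulk without looking up any inodes.
	size=$(( $(find "$dir" -type f -links 1 -exec cat {} + 2>/dev/null | wc -c) ))

	# Pass 2: only files with more than one link need the inode check.
	# `stat -c %i` is GNU-specific (assumption; other platforms differ).
	while IFS= read -rd '' file; do
		inode=$(stat -c %i "$file")
		if [[ -z ${seen[$inode]} ]]; then
			seen[$inode]=1
			size=$(( size + $(wc -c < "$file") ))
		fi
	done < <(find "$dir" -type f -links +1 -print0)

	printf '%d\n' "$size"
}
```

With this split, the per-file inode lookup is only paid for files that
actually have more than one link, which for most packages is none.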

Thanks,
Allan

