[PATCH] paccache: Support cleaning many thousands of candidates

Eli Schwartz eschwartz at archlinux.org
Tue Jun 2 04:17:35 UTC 2020


On 6/2/20 12:05 AM, Daniel M. Capella wrote:
> On June 1, 2020 10:47:21 PM EDT, Leonid Bloch <lb.workbox at gmail.com> wrote:
>> In situations when there are many thousands of candidates for
>> deletion,
>> paccache returns an "Argument list too long" error, because the
>> expansion of the ${candidates[@]} array becomes too long for Bash.
>>
>> This commit fixes that problem, by calculating the freed size and
>> (re)moving the candidates one by one. The cost is much longer
>> operation,
>> but on a laptop with 38143 candidates (no cache cleaning since 2013)
>> it
>> still takes ~18 minutes, which seems to be better than throwing an
>> error.
>>
>> A more efficient way to achieve this would be to break the
>> ${candidates[@]}
>> array into several sub-arrays and operating on them, but it will
>> introduce more complexity and thus might be more error-prone.
>>
>> Signed-off-by: Leonid Bloch <lb.workbox at gmail.com>
>> ---
>>  src/paccache.sh.in | 24 ++++++++++++++++--------
>>  1 file changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/paccache.sh.in b/src/paccache.sh.in
>> index af901f4..212e20a 100644
>> --- a/src/paccache.sh.in
>> +++ b/src/paccache.sh.in
>> @@ -29,7 +29,7 @@ LIBRARY=${LIBRARY:-'@libmakepkgdir@'}
>>  
>> declare -a cachedirs=() candidates=() cmdopts=() whitelist=()
>> blacklist=()
>> declare -i delete=0 dryrun=0 filecount=0 move=0 needsroot=0
>> totalsaved=0 verbose=0
>> -declare -i min_atime=0 min_mtime=0
>> +declare -i min_atime=0 min_mtime=0 first_escalation=1
>>  declare    delim=$'\n' keep=3 movedir= scanarch=
>>  
>>  QUIET=0
>> @@ -138,7 +138,10 @@ m4_include(../lib/size_to_human.sh)
>>  
>>  runcmd() {
>>  	if (( needsroot && EUID != 0 )); then
>> -		msg "Privilege escalation required"
>> +		(( first_escalation == 0 )) || {
>> +			msg "Privilege escalation required"
>> +			first_escalation=0
>> +		}
>>  		if sudo -v &>/dev/null && sudo -l &>/dev/null; then
>>  			sudo "$@"
>>  		else
>> @@ -385,23 +388,28 @@ pkgcount=${#candidates[*]}
>>  # copy the list, merging in any found sigs
>>  for cand in "${candidates[@]}"; do
>>  	candtemp+=("$cand")
>> -	[[ -f $cand.sig ]] && candtemp+=("$cand.sig")
>> +	totalsaved="$(( totalsaved + "$(stat -c %s "${cand}")" ))"
>> +	if [[ -f "${cand}.sig" ]]; then
>> +		candtemp+=("${cand}.sig")
>> +		totalsaved="$(( totalsaved + "$(stat -c %s "${cand}.sig")" ))"
>> +	fi
>>  done
>>  candidates=("${candtemp[@]}")
>>  unset candtemp
>>  
>> -# do this before we destroy anything
>> -totalsaved=$(@SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END {
>> print sum }')
>> -
>>  # Exit immediately if a pipeline returns non-zero.
>>  set -o errexit
>>  
>>  # crush. kill. destroy.
>>  (( verbose )) && cmdopts+=(-v)
>>  if (( delete )); then
>> -	printf '%s\0' "${candidates[@]}" | runcmd xargs -0 rm
>> "${cmdopts[@]}"
>> +	for cand in "${candidates[@]}"; do
>> +		runcmd rm "${cmdopts[@]}" "${cand}"
>> +	done
>>  elif (( move )); then
>> -	printf '%s\0' "${candidates[@]}" | runcmd xargs -0 mv
>> "${cmdopts[@]}" -t "$movedir"
>> +	for cand in "${candidates[@]}"; do
>> +		runcmd mv "${cmdopts[@]}" -t "${movedir}" "${cand}"
>> +	done
>>  fi
>>  
>>  summarize "$pkgcount" "${candidates[@]}"
> 
> Thanks for the patch. Cleaning that many packages seems fairly niche. I'd prefer a more efficient solution than to slow this down for everybody.

This patch doesn't seem to be on the mailing list, but here's my $0.02

This introduces a number of unrelated changes in addition to
circumventing an argument list too long issue, something which is
*already* circumvented through the use of xargs.

However it is circumvented inconsistently. For the final action paccache
uses:

printf '%s\0' "${candidates[@]}" | runcmd xargs -0 rm "${cmdopts[@]}"

And for summarizing the space saved, paccache does NOT use xargs,
instead it directly uses:

totalsaved=$(stat -c %s "${candidates[@]}" | awk '{ sum += $1 } END {
print sum }')

The solution is simple: make stat (or rather @SIZECMD@) consume
arguments via printf | xargs -0

Leonid, why do you think the xargs usage needs to be removed?

-- 
Eli Schwartz
Bug Wrangler and Trusted User

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 1601 bytes
Desc: OpenPGP digital signature
URL: <https://lists.archlinux.org/pipermail/pacman-contrib/attachments/20200602/65b268fd/attachment.sig>


More information about the pacman-contrib mailing list