[PATCH] paccache: Support cleaning many thousands of candidates
Eli Schwartz
eschwartz at archlinux.org
Tue Jun 2 04:17:35 UTC 2020
On 6/2/20 12:05 AM, Daniel M. Capella wrote:
> On June 1, 2020 10:47:21 PM EDT, Leonid Bloch <lb.workbox at gmail.com> wrote:
>> In situations when there are many thousands of candidates for
>> deletion,
>> paccache returns an "Argument list too long" error, because the
>> expansion of the ${candidates[@]} array becomes too long for Bash.
>>
>> This commit fixes that problem, by calculating the freed size and
>> (re)moving the candidates one by one. The cost is much longer
>> operation,
>> but on a laptop with 38143 candidates (no cache cleaning since 2013)
>> it
>> still takes ~18 minutes, which seems to be better than throwing an
>> error.
>>
>> A more efficient way to achieve this would be to break the
>> ${candidates[@]}
>> array into several sub-arrays and operating on them, but it will
>> introduce more complexity and thus might be more error-prone.
>>
>> Signed-off-by: Leonid Bloch <lb.workbox at gmail.com>
>> ---
>> src/paccache.sh.in | 24 ++++++++++++++++--------
>> 1 file changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/paccache.sh.in b/src/paccache.sh.in
>> index af901f4..212e20a 100644
>> --- a/src/paccache.sh.in
>> +++ b/src/paccache.sh.in
>> @@ -29,7 +29,7 @@ LIBRARY=${LIBRARY:-'@libmakepkgdir@'}
>>
>> declare -a cachedirs=() candidates=() cmdopts=() whitelist=()
>> blacklist=()
>> declare -i delete=0 dryrun=0 filecount=0 move=0 needsroot=0
>> totalsaved=0 verbose=0
>> -declare -i min_atime=0 min_mtime=0
>> +declare -i min_atime=0 min_mtime=0 first_escalation=1
>> declare delim=$'\n' keep=3 movedir= scanarch=
>>
>> QUIET=0
>> @@ -138,7 +138,10 @@ m4_include(../lib/size_to_human.sh)
>>
>> runcmd() {
>> if (( needsroot && EUID != 0 )); then
>> - msg "Privilege escalation required"
>> + (( first_escalation == 0 )) || {
>> + msg "Privilege escalation required"
>> + first_escalation=0
>> + }
>> if sudo -v &>/dev/null && sudo -l &>/dev/null; then
>> sudo "$@"
>> else
>> @@ -385,23 +388,28 @@ pkgcount=${#candidates[*]}
>> # copy the list, merging in any found sigs
>> for cand in "${candidates[@]}"; do
>> candtemp+=("$cand")
>> - [[ -f $cand.sig ]] && candtemp+=("$cand.sig")
>> + totalsaved="$(( totalsaved + "$(stat -c %s "${cand}")" ))"
>> + if [[ -f "${cand}.sig" ]]; then
>> + candtemp+=("${cand}.sig")
>> + totalsaved="$(( totalsaved + "$(stat -c %s "${cand}.sig")" ))"
>> + fi
>> done
>> candidates=("${candtemp[@]}")
>> unset candtemp
>>
>> -# do this before we destroy anything
>> -totalsaved=$(@SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END {
>> print sum }')
>> -
>> # Exit immediately if a pipeline returns non-zero.
>> set -o errexit
>>
>> # crush. kill. destroy.
>> (( verbose )) && cmdopts+=(-v)
>> if (( delete )); then
>> - printf '%s\0' "${candidates[@]}" | runcmd xargs -0 rm
>> "${cmdopts[@]}"
>> + for cand in "${candidates[@]}"; do
>> + runcmd rm "${cmdopts[@]}" "${cand}"
>> + done
>> elif (( move )); then
>> - printf '%s\0' "${candidates[@]}" | runcmd xargs -0 mv
>> "${cmdopts[@]}" -t "$movedir"
>> + for cand in "${candidates[@]}"; do
>> + runcmd mv "${cmdopts[@]}" -t "${movedir}" "${cand}"
>> + done
>> fi
>>
>> summarize "$pkgcount" "${candidates[@]}"
>
> Thanks for the patch. Cleaning that many packages seems fairly niche. I'd prefer a more efficient solution than to slow this down for everybody.
This patch doesn't seem to be on the mailing list, but here's my $0.02
This introduces a number of unrelated changes in addition to
circumventing an argument list too long issue, something which is
*already* circumvented through the use of xargs.
However it is circumvented inconsistently. For the final action paccache
uses:
printf '%s\0' "${candidates[@]}" | runcmd xargs -0 rm "${cmdopts[@]}"
And for summarizing the space saved, paccache does NOT use xargs,
instead it directly uses:
totalsaved=$(stat -c %s "${candidates[@]}" | awk '{ sum += $1 } END {
print sum }')
The solution is simple: make stat (or rather @SIZECMD@) consume
arguments via printf | xargs -0
Leonid, why do you think the xargs usage needs to be removed?
--
Eli Schwartz
Bug Wrangler and Trusted User
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 1601 bytes
Desc: OpenPGP digital signature
URL: <https://lists.archlinux.org/pipermail/pacman-contrib/attachments/20200602/65b268fd/attachment.sig>
More information about the pacman-contrib
mailing list