[PATCH] paccache: Support cleaning many thousands of candidates

Leonid Bloch lb.workbox at gmail.com
Tue Jun 2 04:34:35 UTC 2020


On Tue, Jun 2, 2020 at 7:17 AM Eli Schwartz <eschwartz at archlinux.org> wrote:
>
> On 6/2/20 12:05 AM, Daniel M. Capella wrote:
> > On June 1, 2020 10:47:21 PM EDT, Leonid Bloch <lb.workbox at gmail.com> wrote:
> >> In situations when there are many thousands of candidates for
> >> deletion,
> >> paccache returns an "Argument list too long" error, because the
> >> expansion of the ${candidates[@]} array becomes too long for Bash.
> >>
> >> This commit fixes that problem, by calculating the freed size and
> >> (re)moving the candidates one by one. The cost is much longer
> >> operation,
> >> but on a laptop with 38143 candidates (no cache cleaning since 2013)
> >> it
> >> still takes ~18 minutes, which seems to be better than throwing an
> >> error.
> >>
> >> A more efficient way to achieve this would be to break the
> >> ${candidates[@]}
> >> array into several sub-arrays and operating on them, but it will
> >> introduce more complexity and thus might be more error-prone.
> >>
> >> Signed-off-by: Leonid Bloch <lb.workbox at gmail.com>
> >> ---
> >>  src/paccache.sh.in | 24 ++++++++++++++++--------
> >>  1 file changed, 16 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/src/paccache.sh.in b/src/paccache.sh.in
> >> index af901f4..212e20a 100644
> >> --- a/src/paccache.sh.in
> >> +++ b/src/paccache.sh.in
> >> @@ -29,7 +29,7 @@ LIBRARY=${LIBRARY:-'@libmakepkgdir@'}
> >>
> >> declare -a cachedirs=() candidates=() cmdopts=() whitelist=()
> >> blacklist=()
> >> declare -i delete=0 dryrun=0 filecount=0 move=0 needsroot=0
> >> totalsaved=0 verbose=0
> >> -declare -i min_atime=0 min_mtime=0
> >> +declare -i min_atime=0 min_mtime=0 first_escalation=1
> >>  declare    delim=$'\n' keep=3 movedir= scanarch=
> >>
> >>  QUIET=0
> >> @@ -138,7 +138,10 @@ m4_include(../lib/size_to_human.sh)
> >>
> >>  runcmd() {
> >>      if (( needsroot && EUID != 0 )); then
> >> -            msg "Privilege escalation required"
> >> +            (( first_escalation == 0 )) || {
> >> +                    msg "Privilege escalation required"
> >> +                    first_escalation=0
> >> +            }
> >>              if sudo -v &>/dev/null && sudo -l &>/dev/null; then
> >>                      sudo "$@"
> >>              else
> >> @@ -385,23 +388,28 @@ pkgcount=${#candidates[*]}
> >>  # copy the list, merging in any found sigs
> >>  for cand in "${candidates[@]}"; do
> >>      candtemp+=("$cand")
> >> -    [[ -f $cand.sig ]] && candtemp+=("$cand.sig")
> >> +    totalsaved="$(( totalsaved + "$(stat -c %s "${cand}")" ))"
> >> +    if [[ -f "${cand}.sig" ]]; then
> >> +            candtemp+=("${cand}.sig")
> >> +            totalsaved="$(( totalsaved + "$(stat -c %s "${cand}.sig")" ))"
> >> +    fi
> >>  done
> >>  candidates=("${candtemp[@]}")
> >>  unset candtemp
> >>
> >> -# do this before we destroy anything
> >> -totalsaved=$(@SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END {
> >> print sum }')
> >> -
> >>  # Exit immediately if a pipeline returns non-zero.
> >>  set -o errexit
> >>
> >>  # crush. kill. destroy.
> >>  (( verbose )) && cmdopts+=(-v)
> >>  if (( delete )); then
> >> -    printf '%s\0' "${candidates[@]}" | runcmd xargs -0 rm
> >> "${cmdopts[@]}"
> >> +    for cand in "${candidates[@]}"; do
> >> +            runcmd rm "${cmdopts[@]}" "${cand}"
> >> +    done
> >>  elif (( move )); then
> >> -    printf '%s\0' "${candidates[@]}" | runcmd xargs -0 mv
> >> "${cmdopts[@]}" -t "$movedir"
> >> +    for cand in "${candidates[@]}"; do
> >> +            runcmd mv "${cmdopts[@]}" -t "${movedir}" "${cand}"
> >> +    done
> >>  fi
> >>
> >>  summarize "$pkgcount" "${candidates[@]}"
> >
> > Thanks for the patch. Cleaning that many packages seems fairly niche. I'd prefer a more efficient solution than to slow this down for everybody.
>
> This patch doesn't seem to be on the mailing list, but here's my $0.02

You're right - it's "awaiting moderation," as I was not subscribed to
the list. Subscribed now. :)

>
> This introduces a number of unrelated changes in addition to
> circumventing an argument list too long issue, something which is
> *already* circumvented through the use of xargs.
>
> However it is circumvented inconsistently. For the final action paccache
> uses:
>
> printf '%s\0' "${candidates[@]}" | runcmd xargs -0 rm "${cmdopts[@]}"
>
> And for summarizing the space saved, paccache does NOT use xargs,
> instead it directly uses:
>
> totalsaved=$(stat -c %s "${candidates[@]}" | awk '{ sum += $1 } END {
> print sum }')
>
> The solution is simple: make stat (or rather @SIZECMD@) consume
> arguments via printf | xargs -0
>
> Leonid, why do you think the xargs usage needs to be removed?

Eli, you are absolutely right. I experienced the error only at the
summarizing part, and extrapolated to the (re)move part erroneously.
Thanks for your review! Will fix.

Leonid.
___

>
> --
> Eli Schwartz
> Bug Wrangler and Trusted User
>


More information about the pacman-contrib mailing list