[PATCH] paccache: Support cleaning many thousands of candidates
Leonid Bloch
lb.workbox at gmail.com
Tue Jun 2 04:34:35 UTC 2020
On Tue, Jun 2, 2020 at 7:17 AM Eli Schwartz <eschwartz at archlinux.org> wrote:
>
> On 6/2/20 12:05 AM, Daniel M. Capella wrote:
> > On June 1, 2020 10:47:21 PM EDT, Leonid Bloch <lb.workbox at gmail.com> wrote:
> >> In situations when there are many thousands of candidates for
> >> deletion,
> >> paccache returns an "Argument list too long" error, because the
> >> expansion of the ${candidates[@]} array becomes too long for Bash.
> >>
> >> This commit fixes that problem, by calculating the freed size and
> >> (re)moving the candidates one by one. The cost is much longer
> >> operation,
> >> but on a laptop with 38143 candidates (no cache cleaning since 2013)
> >> it
> >> still takes ~18 minutes, which seems to be better than throwing an
> >> error.
> >>
> >> A more efficient way to achieve this would be to break the
> >> ${candidates[@]}
> >> array into several sub-arrays and operating on them, but it will
> >> introduce more complexity and thus might be more error-prone.
> >>
> >> Signed-off-by: Leonid Bloch <lb.workbox at gmail.com>
> >> ---
> >> src/paccache.sh.in | 24 ++++++++++++++++--------
> >> 1 file changed, 16 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/src/paccache.sh.in b/src/paccache.sh.in
> >> index af901f4..212e20a 100644
> >> --- a/src/paccache.sh.in
> >> +++ b/src/paccache.sh.in
> >> @@ -29,7 +29,7 @@ LIBRARY=${LIBRARY:-'@libmakepkgdir@'}
> >>
> >> declare -a cachedirs=() candidates=() cmdopts=() whitelist=()
> >> blacklist=()
> >> declare -i delete=0 dryrun=0 filecount=0 move=0 needsroot=0
> >> totalsaved=0 verbose=0
> >> -declare -i min_atime=0 min_mtime=0
> >> +declare -i min_atime=0 min_mtime=0 first_escalation=1
> >> declare delim=$'\n' keep=3 movedir= scanarch=
> >>
> >> QUIET=0
> >> @@ -138,7 +138,10 @@ m4_include(../lib/size_to_human.sh)
> >>
> >> runcmd() {
> >> if (( needsroot && EUID != 0 )); then
> >> - msg "Privilege escalation required"
> >> + (( first_escalation == 0 )) || {
> >> + msg "Privilege escalation required"
> >> + first_escalation=0
> >> + }
> >> if sudo -v &>/dev/null && sudo -l &>/dev/null; then
> >> sudo "$@"
> >> else
> >> @@ -385,23 +388,28 @@ pkgcount=${#candidates[*]}
> >> # copy the list, merging in any found sigs
> >> for cand in "${candidates[@]}"; do
> >> candtemp+=("$cand")
> >> - [[ -f $cand.sig ]] && candtemp+=("$cand.sig")
> >> + totalsaved="$(( totalsaved + "$(stat -c %s "${cand}")" ))"
> >> + if [[ -f "${cand}.sig" ]]; then
> >> + candtemp+=("${cand}.sig")
> >> + totalsaved="$(( totalsaved + "$(stat -c %s "${cand}.sig")" ))"
> >> + fi
> >> done
> >> candidates=("${candtemp[@]}")
> >> unset candtemp
> >>
> >> -# do this before we destroy anything
> >> -totalsaved=$(@SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END {
> >> print sum }')
> >> -
> >> # Exit immediately if a pipeline returns non-zero.
> >> set -o errexit
> >>
> >> # crush. kill. destroy.
> >> (( verbose )) && cmdopts+=(-v)
> >> if (( delete )); then
> >> - printf '%s\0' "${candidates[@]}" | runcmd xargs -0 rm
> >> "${cmdopts[@]}"
> >> + for cand in "${candidates[@]}"; do
> >> + runcmd rm "${cmdopts[@]}" "${cand}"
> >> + done
> >> elif (( move )); then
> >> - printf '%s\0' "${candidates[@]}" | runcmd xargs -0 mv
> >> "${cmdopts[@]}" -t "$movedir"
> >> + for cand in "${candidates[@]}"; do
> >> + runcmd mv "${cmdopts[@]}" -t "${movedir}" "${cand}"
> >> + done
> >> fi
> >>
> >> summarize "$pkgcount" "${candidates[@]}"
> >
> > Thanks for the patch. Cleaning that many packages seems fairly niche. I'd prefer a more efficient solution than to slow this down for everybody.
>
> This patch doesn't seem to be on the mailing list, but here's my $0.02
You're right - it's "awaiting moderation," as I was not subscribed to
the list. Subscribed now. :)
>
> This introduces a number of unrelated changes in addition to
> circumventing an argument list too long issue, something which is
> *already* circumvented through the use of xargs.
>
> However it is circumvented inconsistently. For the final action paccache
> uses:
>
> printf '%s\0' "${candidates[@]}" | runcmd xargs -0 rm "${cmdopts[@]}"
>
> And for summarizing the space saved, paccache does NOT use xargs,
> instead it directly uses:
>
> totalsaved=$(stat -c %s "${candidates[@]}" | awk '{ sum += $1 } END {
> print sum }')
>
> The solution is simple: make stat (or rather @SIZECMD@) consume
> arguments via printf | xargs -0
>
> Leonid, why do you think the xargs usage needs to be removed?
Eli, you are absolutely right. I experienced the error only at the
summarizing part, and extrapolated to the (re)move part erroneously.
Thanks for your review! Will fix.
Leonid.
___
>
> --
> Eli Schwartz
> Bug Wrangler and Trusted User
>
More information about the pacman-contrib
mailing list