[pacman-dev] [PATCHv2 1/2] contrib: adding pacsize
Printing package size is useful for maintenance. Indeed, the first entry on the
wiki is focused on this topic:

  https://wiki.archlinux.org/index.php/Pacman_Tips#Maintenance

None of the proposed solutions will allow you to:
- select packages;
- work on the output of other commands yielding a list of packages;
- change the sorting;
- be locale independent;
- print a grand total;
- be fast (most solutions are wasting a lot of time -- only expac is faster);
- not rely on third-party tools.

Pacsize is a POSIX shell script that is generic enough to enclose all these
features (and more).

Adding a 'pacsize' script eliminates the unneeded abundance of workarounds for
this simple matter.

Signed-off-by: Pierre Neidhardt <ambrevar@gmail.com>
---
 contrib/.gitignore    |   1 +
 contrib/Makefile.am   |   3 +
 contrib/README        |   4 ++
 contrib/pacsize.sh.in | 168 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 176 insertions(+)
 create mode 100644 contrib/pacsize.sh.in

diff --git a/contrib/.gitignore b/contrib/.gitignore
index a181813..9cecd5e 100644
--- a/contrib/.gitignore
+++ b/contrib/.gitignore
@@ -7,6 +7,7 @@ paclist
 paclog-pkglist
 pacscripts
 pacsearch
+pacsize
 pacsysclean
 rankmirrors
 updpkgsums
diff --git a/contrib/Makefile.am b/contrib/Makefile.am
index f6ca3f1..8c5c6da 100644
--- a/contrib/Makefile.am
+++ b/contrib/Makefile.am
@@ -12,6 +12,7 @@ BASHSCRIPTS = \
 	paclist \
 	paclog-pkglist \
 	pacscripts \
+	pacsize \
 	pacsysclean \
 	rankmirrors \
 	updpkgsums
@@ -38,6 +39,7 @@ EXTRA_DIST = \
 	paclist.sh.in \
 	pacscripts.sh.in \
 	pacsearch.in \
+	pacsize.sh.in \
 	pacsysclean.sh.in \
 	rankmirrors.sh.in \
 	updpkgsums.sh.in \
@@ -102,6 +104,7 @@ paclist: $(srcdir)/paclist.sh.in
 paclog-pkglist: $(srcdir)/paclog-pkglist.sh.in
 pacscripts: $(srcdir)/pacscripts.sh.in
 pacsearch: $(srcdir)/pacsearch.in
+pacsize: $(srcdir)/pacsize.sh.in
 pacsysclean: $(srcdir)/pacsysclean.sh.in
 rankmirrors: $(srcdir)/rankmirrors.sh.in
 updpkgsums: $(srcdir)/updpkgsums.sh.in
diff --git a/contrib/README b/contrib/README
index ae33bb2..4f5c17f 100644
--- a/contrib/README
+++ b/contrib/README
@@ -31,6 +31,10 @@ pacsearch - a colorized search combining both -Ss and -Qs output.
 Installed packages are easily identified with a *** and local-only packages are
 also listed.
 
+pacsize - display the size of packages. Duplicates are removed if any. The local
+database is queried first; if the package is not found, the sync database is
+then used for lookup.
+
 pacsysclean - lists installed packages sorted by size.
 
 rankmirrors - ranks pacman mirrors by their connection and opening speed.
diff --git a/contrib/pacsize.sh.in b/contrib/pacsize.sh.in
new file mode 100644
index 0000000..9be800d
--- /dev/null
+++ b/contrib/pacsize.sh.in
@@ -0,0 +1,168 @@
+#!/bin/sh
+# pacsize -- display package sizes
+#
+# Copyright (C) 2014 Pierre Neidhardt <ambrevar@gmail.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+readonly myname='pacsize'
+readonly myver='@PACKAGE_VERSION@'
+
+calc_total () {
+	awk '{
+	total += $1
+	print
+}
+END {
+	printf ("%7s KIB TOTAL\n", total)
+}'
+}
+
+error () {
+	echo "$@" >&2
+}
+
+## Print size and name. We strip the arguably useless decimals. This makes
+## output lighter.
+filter () {
+	awk -F ": " \
+		'$0 ~ "^Name" {
+	pkg = $2
+}
+$0 ~ "^Installed Size" {
+	gsub (/[\.,][^ ]*/, "")
+	split($2, a, " ")
+	printf ("%4d%s %s\n", a[1], a[2], pkg)
+}'
+}
+
+remove_duplicates () {
+	awk '! table[$0]++'
+}
+
+usage () {
+	cat <<EOF
+Usage: ${1##*/} [OPTIONS] PACKAGES
+       ${1##*/} -a [OPTIONS]
+
+Display the size of PACKAGES. Duplicates are removed if any. The local database
+is queried first; if the package is not found, the sync database is then used
+for lookup.
+
+Options:
+
+  -a: Process all installed packages.
+  -h: Show this help.
+  -n: Sort output by name.
+  -s: Sort output by size.
+  -t: Print total.
+
+Examples:
+
+  $ ${1##*/} -ast
+    Convenient way to keep track of big packages.
+
+  $ ${1##*/} \$(pactree -ld1 linux)
+    Print the size of linux and all its direct dependencies.
+
+  $ ${1##*/} -st \$(pacman -Qdtq)
+    Print a grand total of orphan packages, and sort by size.
+EOF
+}
+
+version () {
+	echo "$myname $myver"
+	echo 'Copyright (C) 2014 Pierre Neidhardt <ambrevar@gmail.com>'
+}
+
+opt_sort=false
+opt_all=false
+opt_total=false
+
+while getopts ":ahnstv" opt; do
+	case $opt in
+		a)
+			opt_all=true ;;
+		h)
+			usage "$0"
+			exit ;;
+		n)
+			opt_sort="sort -uk3" ;;
+		s)
+			opt_sort="sort -uh" ;;
+		t)
+			opt_total="calc_total" ;;
+		v)
+			version "$0"
+			exit ;;
+		?)
+			usage "$0"
+			exit 1 ;;
+	esac
+done
+
+shift $(($OPTIND - 1))
+
+## All-packages mode.
+## We use a dedicated algorithm which is much faster than per-package mode.
+## Unfortunately there is no easy way to select packages with this method.
+if $opt_all; then
+	DBPath="$(awk -F = '/^ *DBPath/{print $2}' @sysconfdir@/pacman.conf 2>/dev/null)"
+	[ ! -d "$DBPath" ] && DBPath="@localstatedir@/lib/pacman"
+
+	if [ ! -d "$DBPath/local/" ]; then
+		error "Could not find local database in $DBPath/local/."
+		exit 1
+	fi
+
+	awk 'BEGIN {
+	split("B KiB MiB GiB TiB PiB EiB ZiB YiB", unit)
+}
+/^%NAME%/ {
+	getline
+	pkg=$0
+}
+/^%SIZE%/ {
+	getline
+	size = $0
+	i = 1
+	while (size > 2048) {
+		size /= 1024
+		i++
+	}
+	printf ("%4d%s %s\n", size, unit[i], pkg)
+}' "$DBPath"/local/*/desc | ($opt_sort || cat) | ($opt_total || cat)
+	exit
+fi
+
+## Per-package mode.
+if [ $# -eq 0 ]; then
+	error "Missing argument."
+	usage "$0"
+	exit 1
+fi
+
+if ! command -v pacman >/dev/null 2>&1; then
+	error "'pacman' not found."
+	exit 1
+fi
+
+{
+	## If package is not found locally (-Q), we use the sync database (-S). We
+	## use LC_ALL=C to make sure pacman output is not localized.
+	buffer=$(LC_ALL=C pacman -Qi "$@" 2>&1 1>&3 3>&- | cut -f2 -d "'")
+	[ -n "$buffer" ] && LC_ALL=C pacman -Si $buffer
+} 3>&1 | filter | ($opt_sort || remove_duplicates) | ($opt_total || cat)
+
+# vim: set noet:
-- 
1.9.0
On Wed, Mar 05, 2014 at 09:55:45PM +0100, Pierre Neidhardt wrote:
+pacsize - display the size of packages. Duplicates are removed if any. The local +database is queried first; if the package is not found, the sync database is +then used for lookup. + pacsysclean - lists installed packages sorted by size.
rankmirrors - ranks pacman mirrors by their connection and opening speed. diff --git a/contrib/pacsize.sh.in b/contrib/pacsize.sh.in new file mode 100644 index 0000000..9be800d --- /dev/null +++ b/contrib/pacsize.sh.in @@ -0,0 +1,168 @@ +#!/bin/sh +# pacsize -- display package sizes +# +# Copyright (C) 2014 Pierre Neidhardt <ambrevar@gmail.com> +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +readonly myname='pacsize' +readonly myver='@PACKAGE_VERSION@' + +calc_total () { + awk '{ + total += $1 + print +} +END { + printf ("%7s KIB TOTAL\n", total) +}' +} + +error () { + echo "$@" >&2 +} + +## Print size and name. We strip the arguably useless decimals. This makes +## output lighter. +filter () { + awk -F ": " \ + '$0 ~ "^Name" { + pkg = $2 +} +$0 ~ "^Installed Size" { + gsub (/[\.,][^ ]*/, "") + split($2, a, " ") + printf ("%4d%s %s\n", a[1], a[2], pkg) +}' +} + +remove_duplicates () { + awk '! table[$0]++' +} + +usage () { + cat <<EOF +Usage: ${1##*/} [OPTIONS] PACKAGES + ${1##*/} -a [OPTIONS] + +Display the size of PACKAGES. Duplicates are removed if any. The local database +is queried first; if the package is not found, the sync database is then used +for lookup.
Duplicates seem rather unexpected given the explanation that follows. You're querying either the local DB *or* the sync DB as a fallback. If there's duplicates, it's an implementation bug, no?
+ +Options: + + -a: Process all installed packages. + -h: Show this help. + -n: Sort output by name. + -s: Sort output by size. + -t: Print total. + +Examples: + + $ ${1##*/} -ast + Convenient way to keep track of big packages. + + $ ${1##*/} \$(pactree -ld1 linux) + Print the size of linux and all its direct dependencies. + + $ ${1##*/} -st \$(pacman -Qdtq) + Print a grand total of orphan packages, and sort by size. +EOF +} + +version () { + echo "$myname $myver" + echo 'Copyright (C) 2014 Pierre Neidhardt <ambrevar@gmail.com>' +} + +opt_sort=false +opt_all=false +opt_total=false + +while getopts ":ahnstv" opt; do + case $opt in + a) + opt_all=true ;; + h) + usage "$0" + exit ;; + n) + opt_sort="sort -uk3" ;; + s) + opt_sort="sort -uh" ;; + t) + opt_total="calc_total" ;; + v) + version "$0" + exit ;;
We seem to use -V more than -v to mean version.
+ ?) + usage "$0" + exit 1 ;; + esac +done + +shift $(($OPTIND - 1)) + +## All-packages mode. +## We use a dedicated algorithm which is much faster than per-package mode. +## Unfortunately there is no easy way to select packages with this method. +if $opt_all; then + DBPath="$(awk -F = '/^ *DBPath/{print $2}' @sysconfdir@/pacman.conf 2>/dev/null)"
What about leading tabs? What about trailing space and tabs? What about whitespace between the '=' and the actual value? I'm fairly sure that the -d test which follows this fails in pretty much all cases.
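To make the concern concrete, here is a minimal sketch of a lookup that tolerates leading blanks, blanks around the '=', and trailing blanks. It is not part of the patch: `get_dbpath` is a hypothetical helper name, and the sketch assumes a single `DBPath = value` line with no inline comment after the value.

```shell
# Hypothetical helper (not in the patch): print the DBPath value found in the
# pacman.conf-style file given as $1. Prints nothing if no DBPath line exists.
get_dbpath () {
	awk '
	# Match "DBPath" with optional leading blanks and blanks before "=".
	# A commented-out "#DBPath = ..." line does not match.
	/^[ \t]*DBPath[ \t]*=/ {
		sub(/^[^=]*=[ \t]*/, "")   # drop the key, the "=" and blanks after it
		sub(/[ \t]+$/, "")         # drop trailing blanks
		print
		exit                       # only the first matching line is used
	}' "$1"
}
```

It could be called as `DBPath=$(get_dbpath @sysconfdir@/pacman.conf)`, with the existing `-d` test deciding afterwards whether the result is usable.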
+ [ ! -d "$DBPath" ] && DBPath="@localstatedir@/lib/pacman" + + if [ ! -d "$DBPath/local/" ]; then + error "Could not find local database in $DBPath/local/."
If pacman.conf contains a DBPath which doesn't exist, the error message here will be rather odd, as it'll show the compile time default and not the path from pacman.conf.
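One possible way to avoid the misleading message is to fall back to the compile-time default only when pacman.conf sets no DBPath at all, and to report whichever path was actually chosen. The sketch below assumes that behaviour is what is wanted; `pick_dbpath` and its two parameters are hypothetical names, not part of the patch.

```shell
# Hypothetical helper (not in the patch).
#   $1: DBPath read from pacman.conf, possibly empty
#   $2: compile-time default, e.g. @localstatedir@/lib/pacman
# Prints the chosen path, or reports the path that was really tried.
pick_dbpath () {
	if [ -n "$1" ]; then
		dbpath=$1      # a configured path wins, even if it is missing
	else
		dbpath=$2      # nothing configured: use the default
	fi

	if [ ! -d "$dbpath/local" ]; then
		echo "Could not find local database in $dbpath/local/." >&2
		return 1
	fi
	printf '%s\n' "$dbpath"
}
```

With this shape, a DBPath that is configured but does not exist shows up in the error message instead of being silently replaced by the default.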
+ exit 1 + fi + + awk 'BEGIN { + split("B KiB MiB GiB TiB PiB EiB ZiB YiB", unit) +} +/^%NAME%/ {
The whole field is %NAME%, there's no need to use a regex here.
+ getline + pkg=$0
getline pkg
+} +/^%SIZE%/ { + getline + size = $0
getline size
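Taken together, the three remarks above (exact match instead of a regex, `getline pkg`, `getline size`) could look like the sketch below. The function name `name_and_size` is made up for the illustration, and the unit conversion from the patch is left out to keep the example short.

```shell
# Sketch only: print "<size> <name>" for each desc file given as argument.
name_and_size () {
	awk '
	# The marker is the whole line, so a string comparison is enough.
	$0 == "%NAME%" { getline pkg }    # read the next line straight into pkg
	$0 == "%SIZE%" {
		getline size                  # read the next line straight into size
		print size, pkg
	}' "$@"
}
```

`getline var` stores the next input line in `var` and leaves `$0` untouched, so the separate assignment from `$0` is no longer needed.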
+ i = 1 + while (size > 2048) { + size /= 1024 + i++ + } + printf ("%4d%s %s\n", size, unit[i], pkg) +}' "$DBPath"/local/*/desc | ($opt_sort || cat) | ($opt_total || cat)
These subshells aren't wanted. You should be using command grouping instead.
+ exit +fi + +## Per-package mode. +if [ $# -eq 0 ]; then + error "Missing argument." + usage "$0" + exit 1 +fi + +if ! command -v pacman >/dev/null 2>&1; then
I find it very strange that you check for pacman -- the project that might distribute this script, but you never check for awk or sort.
+ error "'pacman' not found." + exit 1 +fi + +{ + ## If package is not found locally (-Q), we use the sync database (-S). We + ## use LC_ALL=C to make sure pacman output is not localized. + buffer=$(LC_ALL=C pacman -Qi "$@" 2>&1 1>&3 3>&- | cut -f2 -d "'") + [ -n "$buffer" ] && LC_ALL=C pacman -Si $buffer
Not only are you parsing the output of pacman and the internal format of the ALPM db, you're also parsing *error* output from pacman? So much groan...
+} 3>&1 | filter | ($opt_sort || remove_duplicates) | ($opt_total || cat)
More unnecessary subshells.
+ +# vim: set noet: -- 1.9.0
On 14-03-05 18:05:45, Dave Reisner wrote:
On Wed, Mar 05, 2014 at 09:55:45PM +0100, Pierre Neidhardt wrote:
+Display the size of PACKAGES. Duplicates are removed if any. The local database +is queried first; if the package is not found, the sync database is then used +for lookup.
Duplicates seem rather unexpected given the explanation that follows. You're querying either the local DB *or* the sync DB as a fallback. If there's duplicates, it's an implementation bug, no?
Left-over from an old version. Indeed, there cannot be any duplicates.
+## All-packages mode. +## We use a dedicated algorithm which is much faster than per-package mode. +## Unfortunately there is no easy way to select packages with this method. +if $opt_all; then + DBPath="$(awk -F = '/^ *DBPath/{print $2}' @sysconfdir@/pacman.conf 2>/dev/null)"
What about leading tabs? What about trailing space and tabs? What about whitespace between the '=' and the actual value? I'm fairly sure that the -d test which follows this fails in pretty much all cases.
Sorry, that was a little quick & dirty. I'll fix that.
+ [ ! -d "$DBPath" ] && DBPath="@localstatedir@/lib/pacman" + + if [ ! -d "$DBPath/local/" ]; then + error "Could not find local database in $DBPath/local/."
If pacman.conf contains a DBPath which doesn't exist, the error message here will be rather odd, as it'll show the compile time default and not the path from pacman.conf.
+ exit 1 + fi + + awk 'BEGIN { + split("B KiB MiB GiB TiB PiB EiB ZiB YiB", unit) +} +/^%NAME%/ {
The whole field is %NAME%, there's no need to use a regex here.
+ getline + pkg=$0
getline pkg
+} +/^%SIZE%/ { + getline + size = $0
getline size
+ i = 1 + while (size > 2048) { + size /= 1024 + i++ + } + printf ("%4d%s %s\n", size, unit[i], pkg) +}' "$DBPath"/local/*/desc | ($opt_sort || cat) | ($opt_total || cat)
These subshells aren't wanted. You should be using command grouping instead.
I thought command grouping did not work in that case (for a reason I couldn't grasp). Well, I was tricked by my shell. This works in zsh:

  $ ... | {$opt_sort || cat}

while it doesn't in dash or bash. Two errors:
- a space is needed on the inner side of the curly brackets;
- since there is no line break, we need to terminate the statement with a semicolon.

Damn zsh.
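For reference, the form that dash and bash accept, with both corrections applied, might be sketched as follows. The function `sort_or_pass` and its fixed three-line input exist only for the demonstration; `opt_sort` plays the same role as in the patch.

```shell
# Sketch: brace grouping in a pipeline, POSIX sh syntax.
# Note the space after "{" and the ";" before "}".
sort_or_pass () {
	# $opt_sort is left unquoted on purpose so that a value such as
	# "sort -r" is split into a command and its arguments.
	printf '%s\n' b c a | { $opt_sort || cat; }
}

opt_sort=false
sort_or_pass    # "false" fails without reading input, so cat passes it through

opt_sort="sort"
sort_or_pass    # sort succeeds, so cat is never run
```

The braces group the two commands without explicitly requesting a subshell the way parentheses do.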
+ exit +fi + +## Per-package mode. +if [ $# -eq 0 ]; then + error "Missing argument." + usage "$0" + exit 1 +fi + +if ! command -v pacman >/dev/null 2>&1; then
I find it very strange that you check for pacman -- the project that might distribute this script, but you never check for awk or sort.
Again, a left-over since this was a personal script before.
+ error "'pacman' not found." + exit 1 +fi + +{ + ## If package is not found locally (-Q), we use the sync database (-S). We + ## use LC_ALL=C to make sure pacman output is not localized. + buffer=$(LC_ALL=C pacman -Qi "$@" 2>&1 1>&3 3>&- | cut -f2 -d "'") + [ -n "$buffer" ] && LC_ALL=C pacman -Si $buffer
Not only are you parsing the output of pacman and the internal format of the ALPM db, you're also parsing *error* output from pacman? So much groan...
This is the only way I've found to fall back to the sync db without calling pacman more than necessary. It is much faster this way. Of course it's ugly; this is shell scripting. After all, this is the core of Unix scripting: you pipe and process the output. We definitely need expac.
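For comparison, a fallback that relies on pacman's exit status rather than on its error text could be sketched as below. This is not what the patch does: it runs pacman once or twice per package, so it trades the speed discussed above for simplicity. `query_info` is a hypothetical name, and the sketch assumes that `pacman -Qi` exits non-zero when a package is not in the local database.

```shell
# Sketch only: query the local db first, then the sync db, one package at a
# time. Slower than a single batched call, but no error output is parsed.
query_info () {
	for pkg in "$@"; do
		# Assumption: -Qi fails (non-zero status) for a package that is
		# not installed; its error message is discarded.
		LC_ALL=C pacman -Qi "$pkg" 2>/dev/null ||
			LC_ALL=C pacman -Si "$pkg"
	done
}
```

Its output could be piped into the patch's `filter` function unchanged, since both `-Qi` and `-Si` print `Name` and `Installed Size` fields.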
+} 3>&1 | filter | ($opt_sort || remove_duplicates) | ($opt_total || cat)
More unnecessary subshells.
+ +# vim: set noet: -- 1.9.0
-- 
Pierre Neidhardt

Where there are visible vapors, having their prevenance in ignited carbonaceous materials, there is conflagration.
On 14-03-05 18:05:45, Dave Reisner wrote:
+Display the size of PACKAGES. Duplicates are removed if any. The local database +is queried first; if the package is not found, the sync database is then used +for lookup.
Duplicates seem rather unexpected given the explanation that follows. You're querying either the local DB *or* the sync DB as a fallback. If there's duplicates, it's an implementation bug, no?
Forget my previous answer, I wasn't following... ;P

Duplicates have nothing to do with the db; the point is to handle duplicates in ARGV. This can happen if you use the output of another command; for instance:

  $ pacsize $(pactree -l linux)

Indeed,

  $ pactree -l linux | sort -u | wc -l
  59
  $ pactree -l linux | wc -l
  131

-- 
Pierre Neidhardt

Yesterday upon the stair I met a man who wasn't there. He wasn't there again today -- I think he's from the CIA.
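The awk idiom behind the patch's `remove_duplicates` can be tried on an argument list directly. In the sketch below, `dedupe_args` is a made-up wrapper and the package names are arbitrary sample input.

```shell
# Sketch: drop repeated arguments while keeping first-seen order.
dedupe_args () {
	# seen[$0]++ is 0 (false) the first time a line appears, so "!" makes
	# the pattern true exactly once per distinct line.
	printf '%s\n' "$@" | awk '!seen[$0]++'
}

dedupe_args linux glibc linux bash glibc
# prints:
#   linux
#   glibc
#   bash
```

Unlike `sort -u`, this keeps the input order, which is why the patch only uses it when no sort option is given.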