[pacman-dev] [PATCH] [RFC] makepkg: calculate exact total file size
The current calculation of the total file size for a package using "du" suffers from issues in portability and correctness. Especially on btrfs, this can result in clearly wrong package information such as: Download Size : 14684.29 KiB Installed Size : 7628.00 KiB Use a slower but more accurate method involving "cat" and "wc" to calculate total file size. Signed-off-by: Allan McRae <allan@archlinux.org> --- configure.ac | 1 - scripts/Makefile.am | 1 - scripts/makepkg.sh.in | 3 +-- 3 files changed, 1 insertions(+), 4 deletions(-) diff --git a/configure.ac b/configure.ac index fb1e0b3..6941054 100644 --- a/configure.ac +++ b/configure.ac @@ -245,7 +245,6 @@ esac AM_CONDITIONAL([CYGWIN], test "x$host_os_cygwin" = "xyes") AM_CONDITIONAL([DARWIN], test "x$host_os_darwin" = "xyes") -AC_PATH_PROGS([DUPATH], [du], [du], [/usr/bin$PATH_SEPARATOR/bin] ) AC_SUBST(SIZECMD) AC_SUBST(SEDINPLACE) AC_SUBST(STRIP_BINARIES) diff --git a/scripts/Makefile.am b/scripts/Makefile.am index 328fbff..727de25 100644 --- a/scripts/Makefile.am +++ b/scripts/Makefile.am @@ -53,7 +53,6 @@ edit = sed \ -e 's|@BUILDSCRIPT[@]|$(BUILDSCRIPT)|g' \ -e 's|@SIZECMD[@]|$(SIZECMD)|g' \ -e 's|@SEDINPLACE[@]|$(SEDINPLACE)|g' \ - -e 's|@DUPATH[@]|$(DUPATH)|g' \ -e 's|@SCRIPTNAME[@]|$@|g' \ -e 's|@configure_input[@]|Generated from $@.sh.in; do not edit by hand.|g' diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in index 8c6984d..c78db86 100644 --- a/scripts/makepkg.sh.in +++ b/scripts/makepkg.sh.in @@ -1132,8 +1132,7 @@ write_pkginfo() { else local packager="Unknown Packager" fi - local size="$(@DUPATH@ -sk)" - size="$(( ${size%%[^0-9]*} * 1024 ))" + local size="$(find . | xargs cat 2>/dev/null | wc -c)" msg2 "$(gettext "Generating %s file...")" ".PKGINFO" echo "# Generated by makepkg $myver" -- 1.7.8.1
On Sun, Dec 25, 2011 at 08:37:24PM +1000, Allan McRae wrote:
The current calculation of the total file size for a package using "du" suffers from issues in portability and correctness. Especially on btrfs, this can result in clearly wrong package information such as:
Download Size : 14684.29 KiB Installed Size : 7628.00 KiB
Use a slower but more accurate method involving "cat" and "wc" to calculate total file size.
Signed-off-by: Allan McRae <allan@archlinux.org> --- configure.ac | 1 - scripts/Makefile.am | 1 - scripts/makepkg.sh.in | 3 +-- 3 files changed, 1 insertions(+), 4 deletions(-)
diff --git a/configure.ac b/configure.ac index fb1e0b3..6941054 100644 --- a/configure.ac +++ b/configure.ac @@ -245,7 +245,6 @@ esac
AM_CONDITIONAL([CYGWIN], test "x$host_os_cygwin" = "xyes") AM_CONDITIONAL([DARWIN], test "x$host_os_darwin" = "xyes") -AC_PATH_PROGS([DUPATH], [du], [du], [/usr/bin$PATH_SEPARATOR/bin] ) AC_SUBST(SIZECMD) AC_SUBST(SEDINPLACE) AC_SUBST(STRIP_BINARIES) diff --git a/scripts/Makefile.am b/scripts/Makefile.am index 328fbff..727de25 100644 --- a/scripts/Makefile.am +++ b/scripts/Makefile.am @@ -53,7 +53,6 @@ edit = sed \ -e 's|@BUILDSCRIPT[@]|$(BUILDSCRIPT)|g' \ -e 's|@SIZECMD[@]|$(SIZECMD)|g' \ -e 's|@SEDINPLACE[@]|$(SEDINPLACE)|g' \ - -e 's|@DUPATH[@]|$(DUPATH)|g' \ -e 's|@SCRIPTNAME[@]|$@|g' \ -e 's|@configure_input[@]|Generated from $@.sh.in; do not edit by hand.|g'
diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in index 8c6984d..c78db86 100644 --- a/scripts/makepkg.sh.in +++ b/scripts/makepkg.sh.in @@ -1132,8 +1132,7 @@ write_pkginfo() { else local packager="Unknown Packager" fi - local size="$(@DUPATH@ -sk)" - size="$(( ${size%%[^0-9]*} * 1024 ))" + local size="$(find . | xargs cat 2>/dev/null | wc -c)"
Unsafe xargs usage. find . -print0 | xargs -0 2>/dev/null | wc -c Why can't we use @SIZECMD@ here? Same issues as du?
msg2 "$(gettext "Generating %s file...")" ".PKGINFO" echo "# Generated by makepkg $myver" -- 1.7.8.1
On 25.12.2011 16:06, Dave Reisner wrote:
On Sun, Dec 25, 2011 at 08:37:24PM +1000, Allan McRae wrote:
The current calculation of the total file size for a package using "du" suffers from issues in portability and correctness. Especially on btrfs, this can result in clearly wrong package information such as:
Download Size : 14684.29 KiB Installed Size : 7628.00 KiB
Use a slower but more accurate method involving "cat" and "wc" to calculate total file size.
Signed-off-by: Allan McRae <allan@archlinux.org> --- diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in index 8c6984d..c78db86 100644 --- a/scripts/makepkg.sh.in +++ b/scripts/makepkg.sh.in @@ -1132,8 +1132,7 @@ write_pkginfo() { else local packager="Unknown Packager" fi - local size="$(@DUPATH@ -sk)" - size="$(( ${size%%[^0-9]*} * 1024 ))" + local size="$(find . | xargs cat 2>/dev/null | wc -c)"
Unsafe xargs usage.
find . -print0 | xargs -0 2>/dev/null | wc -c
You forgot the cat. find . -print0 | xargs -0 cat 2>/dev/null | wc -c
Why can't we use @SIZECMD@ here? Same issues as du?
SIZECMD returns one file size per line so we'd also have to add them up. Small test (1003 bytes PKGBUILD on btrfs with default mount options): SIZECMD (stat -L -c %s) 1003 du -skh 512 du -sb 1003 wc -c 1003 If du -sb is portable that might be the easiest way. -- Florian Pritz
On Sun, Dec 25, 2011 at 06:20:27PM +0100, Florian Pritz wrote:
On 25.12.2011 16:06, Dave Reisner wrote:
On Sun, Dec 25, 2011 at 08:37:24PM +1000, Allan McRae wrote:
The current calculation of the total file size for a package using "du" suffers from issues in portability and correctness. Especially on btrfs, this can result in clearly wrong package information such as:
Download Size : 14684.29 KiB Installed Size : 7628.00 KiB
Use a slower but more accurate method involving "cat" and "wc" to calculate total file size.
Signed-off-by: Allan McRae <allan@archlinux.org> --- diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in index 8c6984d..c78db86 100644 --- a/scripts/makepkg.sh.in +++ b/scripts/makepkg.sh.in @@ -1132,8 +1132,7 @@ write_pkginfo() { else local packager="Unknown Packager" fi - local size="$(@DUPATH@ -sk)" - size="$(( ${size%%[^0-9]*} * 1024 ))" + local size="$(find . | xargs cat 2>/dev/null | wc -c)"
Unsafe xargs usage.
find . -print0 | xargs -0 2>/dev/null | wc -c
You forgot the cat.
find . -print0 | xargs -0 cat 2>/dev/null | wc -c
meow. Sorry, mittens.
Why can't we use @SIZECMD@ here? Same issues as du?
SIZECMD returns one file size per line so we'd also have to add them up.
Yup. I do this in paccache: @SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END { print sum }'
Small test (1003 bytes PKGBUILD on btrfs with default mount options): SIZECMD (stat -L -c %s) 1003 du -skh 512 du -sb 1003 wc -c 1003
If du -sb is portable that might be the easiest way.
-- Florian Pritz
On 26/12/11 03:27, Dave Reisner wrote:
On Sun, Dec 25, 2011 at 06:20:27PM +0100, Florian Pritz wrote:
On 25.12.2011 16:06, Dave Reisner wrote:
On Sun, Dec 25, 2011 at 08:37:24PM +1000, Allan McRae wrote:
The current calculation of the total file size for a package using "du" suffers from issues in portability and correctness. Especially on btrfs, this can result in clearly wrong package information such as:
Download Size : 14684.29 KiB Installed Size : 7628.00 KiB
Use a slower but more accurate method involving "cat" and "wc" to calculate total file size.
Signed-off-by: Allan McRae <allan@archlinux.org> --- diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in index 8c6984d..c78db86 100644 --- a/scripts/makepkg.sh.in +++ b/scripts/makepkg.sh.in @@ -1132,8 +1132,7 @@ write_pkginfo() { else local packager="Unknown Packager" fi - local size="$(@DUPATH@ -sk)" - size="$(( ${size%%[^0-9]*} * 1024 ))" + local size="$(find . | xargs cat 2>/dev/null | wc -c)"
Unsafe xargs usage.
find . -print0 | xargs -0 2>/dev/null | wc -c
You forgot the cat.
find . -print0 | xargs -0 cat 2>/dev/null | wc -c
meow. Sorry, mittens.
Why can't we use @SIZECMD@ here? Same issues as du?
SIZECMD returns one file size per line so we'd also have to add them up.
Yup. I do this in paccache:
@SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END { print sum }'
find . -print0 | xargs -0 stat -L -c %s | awk '{sum += $1 } END {
I'm happy using @SIZECMD@: allan@mugen ~/tmp/libreoffice print sum }' 196464312 allan@mugen ~/tmp/libreoffice
find . -print0 | xargs -0 cat 2>/dev/null | wc -c 195276472
allan@mugen ~/tmp/libreoffice
du -sb 196444140 .
Of course the numbers between the stat and wc approach are different because stat adds a "block size" amount for each directory of which there is 290 in the libreoffice package: (196464312 - 195276472) / 4096 = 290 So the SIZECMD approach is filesystem dependent, but in a way that is creates minimal difference, unlike the current approach which can wildly vary. It is also about the same speed as the current du based approach.
Small test (1003 bytes PKGBUILD on btrfs with default mount options): SIZECMD (stat -L -c %s) 1003 du -skh 512 du -sb 1003 wc -c 1003
If du -sb is portable that might be the easiest way.
It is not... which is why currently -k is used and then we multiply by 1024. There are many messages on pacman-dev about this back when that change was made to makepkg.
On 25.12.2011 23:09, Allan McRae wrote:
On 26/12/11 03:27, Dave Reisner wrote:
On Sun, Dec 25, 2011 at 06:20:27PM +0100, Florian Pritz wrote:
On 25.12.2011 16:06, Dave Reisner wrote:
Why can't we use @SIZECMD@ here? Same issues as du?
SIZECMD returns one file size per line so we'd also have to add them up.
Yup. I do this in paccache:
@SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END { print sum }'
I'm happy using @SIZECMD@:
find . -print0 | xargs -0 stat -L -c %s | awk '{sum += $1 } END {
allan@mugen ~/tmp/libreoffice print sum }' 196464312
allan@mugen ~/tmp/libreoffice
find . -print0 | xargs -0 cat 2>/dev/null | wc -c 195276472
allan@mugen ~/tmp/libreoffice
du -sb 196444140 .
Of course the numbers between the stat and wc approach are different because stat adds a "block size" amount for each directory of which there is 290 in the libreoffice package:
(196464312 - 195276472) / 4096 = 290
So the SIZECMD approach is filesystem dependent, but in a way that is creates minimal difference, unlike the current approach which can wildly vary. It is also about the same speed as the current du based approach.
find . \! -type -d -print0 -- Florian Pritz
On Sun, Dec 25, 2011 at 11:50:57PM +0100, Florian Pritz wrote:
On 25.12.2011 23:09, Allan McRae wrote:
On 26/12/11 03:27, Dave Reisner wrote:
On Sun, Dec 25, 2011 at 06:20:27PM +0100, Florian Pritz wrote:
On 25.12.2011 16:06, Dave Reisner wrote:
Why can't we use @SIZECMD@ here? Same issues as du?
SIZECMD returns one file size per line so we'd also have to add them up.
Yup. I do this in paccache:
@SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END { print sum }'
I'm happy using @SIZECMD@:
find . -print0 | xargs -0 stat -L -c %s | awk '{sum += $1 } END {
allan@mugen ~/tmp/libreoffice print sum }' 196464312
allan@mugen ~/tmp/libreoffice
find . -print0 | xargs -0 cat 2>/dev/null | wc -c 195276472
allan@mugen ~/tmp/libreoffice
du -sb 196444140 .
Of course the numbers between the stat and wc approach are different because stat adds a "block size" amount for each directory of which there is 290 in the libreoffice package:
(196464312 - 195276472) / 4096 = 290
So the SIZECMD approach is filesystem dependent, but in a way that is creates minimal difference, unlike the current approach which can wildly vary. It is also about the same speed as the current du based approach.
find . \! -type -d -print0
But directories add to the size of the package. IMO, it's correct to just sum up everything. d P.S. history expansion doesn't happen in a non-interactive shell. there's no need to escape the !.
participants (3)
-
Allan McRae
-
Dave Reisner
-
Florian Pritz