[pacman-dev] [PATCH] [RFC] makepkg: calculate exact total file size

Allan McRae allan at archlinux.org
Sun Dec 25 17:09:43 EST 2011


On 26/12/11 03:27, Dave Reisner wrote:
> On Sun, Dec 25, 2011 at 06:20:27PM +0100, Florian Pritz wrote:
>> On 25.12.2011 16:06, Dave Reisner wrote:
>>> On Sun, Dec 25, 2011 at 08:37:24PM +1000, Allan McRae wrote:
>>>> The current calculation of the total file size for a package using "du"
>>>> suffers from issues in portability and correctness.  Especially on btrfs,
>>>> this can result in clearly wrong package information such as:
>>>>
>>>> Download Size  : 14684.29 KiB
>>>> Installed Size : 7628.00 KiB
>>>>
>>>> Use a slower but more accurate method involving "cat" and "wc" to
>>>> calculate total file size.
>>>>
>>>> Signed-off-by: Allan McRae <allan at archlinux.org>
>>>> ---
>>>> diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in
>>>> index 8c6984d..c78db86 100644
>>>> --- a/scripts/makepkg.sh.in
>>>> +++ b/scripts/makepkg.sh.in
>>>> @@ -1132,8 +1132,7 @@ write_pkginfo() {
>>>>  	else
>>>>  		local packager="Unknown Packager"
>>>>  	fi
>>>> -	local size="$(@DUPATH@ -sk)"
>>>> -	size="$(( ${size%%[^0-9]*} * 1024 ))"
>>>> +	local size="$(find . | xargs cat 2>/dev/null | wc -c)"
>>>
>>> Unsafe xargs usage.
>>>
>>> find . -print0 | xargs -0 2>/dev/null | wc -c
>>
>> You forgot the cat.
>>
>> find . -print0 | xargs -0 cat 2>/dev/null | wc -c
>>
> 
> meow. Sorry, mittens.
> 
>>>
>>> Why can't we use @SIZECMD@ here? Same issues as du?
>>>
>>
>> SIZECMD returns one file size per line so we'd also have to add them up.
> 
> Yup. I do this in paccache:
> 
> @SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END { print sum }'

I'm happy using @SIZECMD@:

allan at mugen ~/tmp/libreoffice
> find . -print0 | xargs -0 stat -L -c %s | awk '{ sum += $1 } END { print sum }'
196464312

allan at mugen ~/tmp/libreoffice
> find . -print0 | xargs -0 cat 2>/dev/null | wc -c
195276472

allan at mugen ~/tmp/libreoffice
> du -sb
196444140	.

Of course, the numbers from the stat and wc approaches differ because
stat adds a "block size" amount for each directory, of which there are
290 in the libreoffice package:

(196464312 - 195276472) / 4096 = 290

So the SIZECMD approach is filesystem dependent, but in a way that
creates minimal difference, unlike the current approach, which can vary
wildly.  It is also about the same speed as the current du-based approach.
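
For illustration only, a minimal sketch of the stat-and-sum pipeline
wrapped in a function.  The name total_file_size is hypothetical and is
not part of makepkg.  It assumes GNU stat, and it restricts find to
regular files (an assumption not proposed in this thread) so that the
per-directory amount described above drops out of the total.

```shell
#!/bin/bash
# Hypothetical helper, not part of makepkg. "stat -L -c %s" is the GNU
# stat form used in the commands above in place of @SIZECMD@.
total_file_size() {
    # Print one size in bytes per regular file below $1, then add them up.
    # printf "%.0f" avoids exponent notation for large totals in some awks.
    find "$1" -type f -print0 \
        | xargs -0 stat -L -c %s 2>/dev/null \
        | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}
```

Called as "total_file_size ." from the package directory, this prints a
single byte count.  Symlinks are skipped by -type f, so the result is
not guaranteed to match the cat and wc total when a package contains
them.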

>>
>> Small test (1003 bytes PKGBUILD on btrfs with default mount options):
>> SIZECMD (stat -L -c %s) 1003
>> du -skh                  512
>> du -sb                  1003
>> wc -c                   1003
>>
>> If du -sb is portable that might be the easiest way.
>>

It is not...  which is why currently -k is used and then we multiply by
1024.  There are many messages on pacman-dev about this back when that
change was made to makepkg.



