[pacman-dev] [PATCH] [RFC] makepkg: calculate exact total file size
Allan McRae
allan at archlinux.org
Sun Dec 25 17:09:43 EST 2011
On 26/12/11 03:27, Dave Reisner wrote:
> On Sun, Dec 25, 2011 at 06:20:27PM +0100, Florian Pritz wrote:
>> On 25.12.2011 16:06, Dave Reisner wrote:
>>> On Sun, Dec 25, 2011 at 08:37:24PM +1000, Allan McRae wrote:
>>>> The current calculation of the total file size for a package using "du"
>>>> suffers from issues in portability and correctness. Especially on btrfs,
>>>> this can result in clearly wrong package information such as:
>>>>
>>>> Download Size : 14684.29 KiB
>>>> Installed Size : 7628.00 KiB
>>>>
>>>> Use a slower but more accurate method involving "cat" and "wc" to
>>>> calculate total file size.
>>>>
>>>> Signed-off-by: Allan McRae <allan at archlinux.org>
>>>> ---
>>>> diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in
>>>> index 8c6984d..c78db86 100644
>>>> --- a/scripts/makepkg.sh.in
>>>> +++ b/scripts/makepkg.sh.in
>>>> @@ -1132,8 +1132,7 @@ write_pkginfo() {
>>>> else
>>>> local packager="Unknown Packager"
>>>> fi
>>>> - local size="$(@DUPATH@ -sk)"
>>>> - size="$(( ${size%%[^0-9]*} * 1024 ))"
>>>> + local size="$(find . | xargs cat 2>/dev/null | wc -c)"
>>>
>>> Unsafe xargs usage.
>>>
>>> find . -print0 | xargs -0 2>/dev/null | wc -c
>>
>> You forgot the cat.
>>
>> find . -print0 | xargs -0 cat 2>/dev/null | wc -c
>>
>
> meow. Sorry, mittens.
>
>>>
>>> Why can't we use @SIZECMD@ here? Same issues as du?
>>>
>>
>> SIZECMD returns one file size per line so we'd also have to add them up.
>
> Yup. I do this in paccache:
>
> @SIZECMD@ "${candidates[@]}" | awk '{ sum += $1 } END { print sum }'
I'm happy using @SIZECMD@:
allan at mugen ~/tmp/libreoffice
> find . -print0 | xargs -0 stat -L -c %s | awk '{sum += $1 } END {
print sum }'
196464312
allan at mugen ~/tmp/libreoffice
> find . -print0 | xargs -0 cat 2>/dev/null | wc -c
195276472
allan at mugen ~/tmp/libreoffice
> du -sb
196444140 .
Of course the numbers between the stat and wc approach are different
because stat adds a "block size" amount for each directory of which
there is 290 in the libreoffice package:
(196464312 - 195276472) / 4096 = 290
So the SIZECMD approach is filesystem dependent, but in a way that is
creates minimal difference, unlike the current approach which can wildly
vary. It is also about the same speed as the current du based approach.
>>
>> Small test (1003 bytes PKGBUILD on btrfs with default mount options):
>> SIZECMD (stat -L -c %s) 1003
>> du -skh 512
>> du -sb 1003
>> wc -c 1003
>>
>> If du -sb is portable that might be the easiest way.
>>
It is not... which is why currently -k is used and then we multiply by
1024. There are many messages on pacman-dev about this back when that
change was made to makepkg.
More information about the pacman-dev
mailing list