[arch-dev-public] doc size
Hello,

I like this GNOME feature that warns me when I am running out of space and offers to run a disk usage analyzer (baobab). Looking at the results, I found that /usr/share/doc was taking a non-negligible amount of space: 140M. It's actually the 3rd biggest directory in my filesystem after two games (openarena and flightgear). Well, maybe 4th if I also count warsow in /opt :)

It's especially gtkmm that made me go wtf. It takes 48MB. I checked the package size: 60MB. The next entries are far behind, at 10MB, but still: compared to the package size, they can be pretty big (more than half).

That made me want to check the proportion of docs in each package, and I quickly hacked a script together. I am pretty sure this subject has come up here before, but I don't remember seeing any results, so I will post them here. That might help to establish some reasonable limits for when a package should be split. And with makepkg supporting that now, it's much better than before.

--------------------------------------------------------
#!/bin/bash

# Directories inside the package that count as documentation.
DOC_DIRS=(usr/{,local/}{,share/}{doc,gtk-doc} opt/*/{doc,gtk-doc})

filename=$1

# Installed size in bytes, as recorded in .PKGINFO ("size = N").
pkgsize=$(bsdtar qxOf "$filename" .PKGINFO 2>/dev/null | grep size | awk '{ print $3 }')
# Sum of the size column (field 5) over all entries below the doc directories.
docsize=$(bsdtar tvf "$filename" "${DOC_DIRS[@]}" 2>/dev/null | awk '{ SUM += $5 } END { print SUM }')

# Skip packages without docs, or with less than 1 MB of docs.
[ -z "$docsize" ] && exit 0
docsizemb=$(( $docsize / 1024 / 1024 ))
[ "$docsizemb" -eq 0 ] && exit 0
pkgsizemb=$(( $pkgsize / 1024 / 1024 ))

echo "$(( 100 * $docsize / $pkgsize )) $docsizemb/$pkgsizemb $(basename "$filename")"
--------------------------------------------------------

$ (for i in /home/pkg/*; do ./doc-ratio $i; done) | sort -rn | column -t

ratio(%)  docsize/pkgsize(MB)  filename
75        8/11                 libsigc++2.0-2.2.4.2-1-x86_64.pkg.tar.gz
75        45/60                gtkmm-2.18.2-1-x86_64.pkg.tar.gz
71        2/3                  eggdbus-0.6-1-x86_64.pkg.tar.gz
68        10/15                glibmm-2.22.1-1-x86_64.pkg.tar.gz
67        1/2                  libsoup-2.28.1-1-x86_64.pkg.tar.gz
66        2/3                  pangomm-2.26.0-1-x86_64.pkg.tar.gz
63        2/3                  libgdata-0.4.0-1-x86_64.pkg.tar.gz
62        7/12                 telepathy-glib-0.9.2-1-x86_64.pkg.tar.gz
61        4/7                  flac-1.2.1-2-x86_64.pkg.tar.gz
58        1/1                  raptor-1.4.19-1-x86_64.pkg.tar.gz
58        10/17                pygtk-2.16.0-2-x86_64.pkg.tar.gz
56        1/2                  cairo-1.8.8-1-x86_64.pkg.tar.gz
55        10/19                groff-1.20.1-3-x86_64.pkg.tar.gz
54        1/1                  redland-1.0.9-4-x86_64.pkg.tar.gz
53        3/7                  clutter-1.0.8-1-x86_64.pkg.tar.gz
52        1/1                  policykit-0.9-9-x86_64.pkg.tar.gz
51        2/3                  pango-1.26.2-1-x86_64.pkg.tar.gz
49        1/3                  libxslt-1.1.26-1-x86_64.pkg.tar.gz
49        1/3                  fontconfig-2.8.0-1-x86_64.pkg.tar.gz
47        5/11                 libxml2-2.7.6-1-x86_64.pkg.tar.gz
47        1/2                  pcre-8.00-1-x86_64.pkg.tar.gz
46        2/5                  openexr-1.6.1-1-x86_64.pkg.tar.gz
44        1/2                  polkit-0.95-1-x86_64.pkg.tar.gz
42        3/8                  gstreamer0.10-base-0.10.25-1-x86_64.pkg.tar.gz
40        1/3                  gmime-2.4.10-1-x86_64.pkg.tar.gz
38        4/12                 gstreamer0.10-0.10.25-1-x86_64.pkg.tar.gz
36        1/3                  pygobject-2.20.0-1-x86_64.pkg.tar.gz
35        2/5                  at-spi-1.28.1-1-x86_64.pkg.tar.gz
34        2/7                  libgnomeui-2.24.2-1-x86_64.pkg.tar.gz
34        1/3                  libtiff-3.9.2-1-x86_64.pkg.tar.gz
29        6/22                 evolution-data-server-2.28.2-1-x86_64.pkg.tar.gz
28        1/5                  libbonobo-2.24.2-1-x86_64.pkg.tar.gz
26        1/6                  gnome-vfs-2.24.2-2-x86_64.pkg.tar.gz
25        7/30                 valgrind-3.5.0-3-x86_64.pkg.tar.gz
22        5/24                 cmake-2.8.0-1-x86_64.pkg.tar.gz
20        3/15                 gettext-0.17-3-x86_64.pkg.tar.gz
16        2/15                 empathy-2.28.2-1-x86_64.pkg.tar.gz
15        1/7                  evince-2.28.2-1-x86_64.pkg.tar.gz
14        1/7                  gnome-keyring-2.28.2-1-x86_64.pkg.tar.gz
13        3/23                 kdebase-runtime-4.3.4-1-x86_64.pkg.tar.gz
8         1/13                 gok-2.28.1-1-x86_64.pkg.tar.gz
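For anyone who wants to try the two calculations without a package file at hand, here is a minimal sketch of the same idea split into two small functions. The names sum_doc_bytes and doc_percent are made up for illustration, and the sketch assumes the listing has the usual `bsdtar tvf` layout, with the size in field 5 and the path in field 9 (so paths containing spaces are not handled).

```shell
#!/bin/bash

# Hypothetical helper: read a `bsdtar tvf`-style listing on stdin and print
# the total size in bytes of the entries below the documentation directories.
# Assumes field 5 is the size and field 9 is the path.
sum_doc_bytes() {
    awk '$9 ~ "^(usr/(local/)?(share/)?(doc|gtk-doc)/|opt/[^/]+/(doc|gtk-doc)/)" { sum += $5 }
         END { print sum + 0 }'
}

# Hypothetical helper: integer percentage of docs in a package,
# using the same arithmetic as the script above.
doc_percent() {
    local docsize=$1 pkgsize=$2
    echo $(( 100 * docsize / pkgsize ))
}

# Possible usage with a real package (not run here):
#   bsdtar tvf foo.pkg.tar.gz 2>/dev/null | sum_doc_bytes
```

Filtering inside awk instead of passing the patterns to bsdtar is a design choice for the sketch only; it keeps the path matching in one place and makes the function easy to feed with a saved listing.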
Didn't Dan post a patch for namcap to check the relative proportion of docs at some stage?

Allan
On Wed, Dec 16, 2009 at 8:05 PM, Allan McRae <allan@archlinux.org> wrote:
Didn't Dan post a patch for namcap to check the relative proportion of docs at some stage?
Indeed, that's awesome. Seems I missed or forgot it. It's actually the only patch that came up after namcap 2.4, so there hasn't been a new release yet.
http://projects.archlinux.org/namcap.git/

I will try it to compare with my results. I don't think that makes my results worthless though; I see both tools as complementary. Mine lets you quickly see which packages in your cache are the worst (either by ratio or by docsize), so that they can be treated first.
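To illustrate the "either by ratio or by docsize" point: the first script's output is sorted by ratio with `sort -rn`, and the same lines can be re-sorted by docsize. This is only a sketch; sort_by_docsize is a made-up name, and it relies on sort's numeric comparison reading just the leading number of a field like "45/60".

```shell
#!/bin/bash

# Hypothetical helper: re-sort doc-ratio output lines of the form
# "ratio docsize/pkgsize filename" by docsize (MB), biggest first.
# Numeric sort only looks at the digits before the "/" in field 2.
sort_by_docsize() {
    sort -k2,2nr
}

# Possible usage (not run here):
#   (for i in /home/pkg/*; do ./doc-ratio $i; done) | sort_by_docsize | column -t
```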
On Wed, Dec 16, 2009 at 1:18 PM, Xavier <shiningxc@gmail.com> wrote:
Indeed, that's awesome. Seems I missed or forgot it. It's actually the only patch that came up after namcap 2.4 so there hasn't been a new release yet. http://projects.archlinux.org/namcap.git/
I will try it to compare with my results. I don't think that makes my results worthless though; I see both tools as complementary. Mine lets you quickly see which packages in your cache are the worst (either by ratio or by docsize), so that they can be treated first.
Yeah, I can't remember which package it was that I noticed this on, but it might have been gtkmm and it made me go "WTF" as well, thus the reason for the patch.

-Dan
On Wed, Dec 16, 2009 at 10:46 PM, Dan McGee <dpmcgee@gmail.com> wrote:
Yeah, I can't remember which package it was that I noticed this on, but it might have been gtkmm and it made me go "WTF" as well, thus the reason for the patch.
It was indeed gtkmm, you posted it on the list too :)

So I played a bit with namcap and tweaked it to include more information, so that I could reconstruct a list similar to my first one. See attached patch.

The results are slightly different, mostly because makepkg overestimates the uncompressed size by not using -a with du, while namcap lotsofdocs computes the real size.

Now that I think about it, maybe the trick/hack I used in my first script would actually be a portable way to get the real uncompressed size:

bsdtar tvf foo.pkg.tar.gz 2>/dev/null | awk '{ SUM += $5 } END { print SUM }'

But this is off-topic. I should dig up the makepkg bug report we had about this, which we probably rejected :P

$ ./namcap.py -r lotsofdocs /home/pkg/* > result
$ sort -k 3 -r result | cut -d';' -f1 | column -t -s'W'
libsigc++2.0    : Package was 86% docs by size (9146/10574K)
mcpp            : Package was 83% docs by size (761/915K)
pangomm         : Package was 81% docs by size (2689/3295K)
gtkmm           : Package was 80% docs by size (45/56M)
eggdbus         : Package was 78% docs by size (2634/3366K)
glibmm          : Package was 78% docs by size (10/13M)
libogg          : Package was 77% docs by size (232/299K)
libsoup         : Package was 75% docs by size (1467/1948K)
randrproto      : Package was 74% docs by size (77/103K)
libsoup         : Package was 74% docs by size (1448/1939K)
libgdata        : Package was 73% docs by size (2126/2898K)
flac            : Package was 69% docs by size (4534/6485K)
fontconfig      : Package was 69% docs by size (1917/2754K)
clutter-gtk     : Package was 69% docs by size (164/237K)
policykit       : Package was 66% docs by size (1048/1583K)
telepathy-glib  : Package was 64% docs by size (7/11M)
raptor          : Package was 64% docs by size (1144/1771K)
pygtk           : Package was 64% docs by size (10/15M)
libdatrie       : Package was 63% docs by size (93/147K)
redland         : Package was 63% docs by size (1040/1639K)
libepc          : Package was 62% docs by size (494/786K)
renderproto     : Package was 62% docs by size (36/58K)
libunique       : Package was 62% docs by size (139/221K)
cairo           : Package was 62% docs by size (1177/1889K)
groff           : Package was 60% docs by size (10/17M)
rasqal          : Package was 59% docs by size (567/948K)
libnotify       : Package was 57% docs by size (87/153K)
libgtop         : Package was 57% docs by size (607/1051K)
clutter         : Package was 57% docs by size (3/6M)
libbeagle       : Package was 56% docs by size (430/760K)
poppler-glib    : Package was 56% docs by size (394/702K)
pango           : Package was 56% docs by size (2092/3679K)
libxslt         : Package was 56% docs by size (1660/2955K)
libtheora       : Package was 56% docs by size (1020/1819K)
compositeproto  : Package was 55% docs by size (13/23K)
pcre            : Package was 55% docs by size (1060/1903K)
polkit          : Package was 54% docs by size (1093/2010K)
libxml2         : Package was 51% docs by size (5/10M)
libxklavier     : Package was 51% docs by size (185/359K)
damageproto     : Package was 50% docs by size (7/14K)
libgsf          : Package was 50% docs by size (658/1291K)
at-spi          : Package was 50% docs by size (2/4M)
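On the "real size" point, the same byte-counting idea can be sketched for an unpacked directory tree instead of a package archive. This is not what makepkg or namcap do; apparent_size_bytes is a made-up name, and the approach (reading every regular file and counting the bytes) is slow but avoids depending on any particular du option. Symlinks and directories are not counted, and hard-linked files are counted once per link.

```shell
#!/bin/bash

# Hypothetical helper: print the total number of bytes contained in the
# regular files below a directory, independent of filesystem block size.
apparent_size_bytes() {
    find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Possible usage (not run here), to compare with a block-based figure:
#   apparent_size_bytes pkg/
#   du -sk pkg/
```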
participants (3)
- Allan McRae
- Dan McGee
- Xavier