Re: [arch-general] [arch-dev-public] pkgstats: first results
Pierre Schmitz wrote:
Hi all,
I think you are all interested in the results of Allan's crazy idea to get some stats about package usage. I spent some (a lot of) time this weekend preparing some stats to present to you.
First: I played a bit with gettext, and some useful pages are now available in German and English (depending on your browser's configuration):
* http://www.archlinux.de/?page=ArchitectureDifferences
* http://www.archlinux.de/?page=MirrorStatus
* http://www.archlinux.de/?page=PackageStatistics (That's the one; be warned: at the moment it loads more than 2 MB of pure HTML!)
For those who want to play with some SQL queries, I have uploaded a (reduced) database snapshot: http://users.archlinux.de/~pierre/tmp/pkgdb-stripped.sql.gz
Before announcing this we should discuss the results and talk about what we can learn from them.
I'll make a start (top-down):
* extra and community have similar size
* more than 1200 submissions since Friday. Thanks! :-)
* installation size varies from 126 to an amazing 2800
* 1/4 use x86_64
* Nearly 70% of packages are from extra. Nice.
* Only 7% are installed from community, and a similar amount is in no official repo (might be a sign that there is something wrong with priorities in [community])
* About 2% from extra and 3% from community aren't used by anybody! The unused kde-l10n packages are no problem; I create them automatically
* Nearly 20% of all users (that includes 3/4 i686) use lib32 packages.
* There are lots of rarely used packages in all repos
* kdemod-kdelibs is installed by 14.26% while kdelibs from [extra] is installed by 34.05%. Maybe splitting support in makepkg and devtools should get a higher priority
...that should do it for a start.
Just a warning, generalizing based on these numbers (and on any numbers in fact) is very dangerous.
Glenn
RedShift wrote:
Just a warning, generalizing based on these numbers (and on any numbers in fact) is very dangerous.
Glenn
Yes, but it is better than drawing conclusions based on no numbers.
Allan
On Monday, 10 November 2008 08:28 Allan McRae wrote:
Yes, but it is better than drawing conclusions based on no numbers.
That is right.-) What I'm wondering about is that there is no release number in "pkgstats -s". I don't want to make it more difficult than necessary, and I don't want to criticize anything, but from my point of view this is the only chance to recognize whether people stay with an older version or with their own version of a package. If this is not relevant for the stats, or is something for pkgstatsver=1011.0, ignore my lines.-) Finally: nice idea to have these web pages, and thanks to everybody for their work.
See you, Attila
On Monday 10 November 2008 08:21:38 RedShift wrote:
Just a warning, generalizing based on these numbers (and on any numbers in fact) is very dangerous.
I agree. The fact that the data comes from users who voluntarily installed the pkgstats package makes the results biased. The best 'statistical' way would be to include this in pacman, without asking the user for permission to submit. The best ethical way would be the same, but asking the user for permission before submitting anything. Anyway, those results are really interesting!
Charly
Charly Ghislain wrote:
On Monday 10 November 2008 08:21:38 RedShift wrote:
Just a warning, generalizing based on these numbers (and on any numbers in fact) is very dangerous.
I agree. The fact that the data comes from users who voluntarily installed the pkgstats package makes the results biased. The best 'statistical' way would be to include this in pacman, without asking the user for permission to submit. The best ethical way would be the same, but asking the user for permission before submitting anything.
Anyway, those results are really interesting!
Charly
a pacman which sends information home - unasked?! are you serious? that would be a data privacy horror! i don't want to have to watch pacman's traffic the whole time, fearing leaks. a feature/bug like that would raise a huge scandal rather than appreciation. i have no doubt about this. an effort to collect helpful data from users (or at least from those who want to share it) is pkgstats, which is currently heavily discussed, as you have surely noticed.
regards
Hubert Grzeskowiak
Hubert Grzeskowiak wrote:
a pacman which sends information home - unasked?! are you serious? that would be a data privacy horror! i don't want to have to watch pacman's traffic the whole time, fearing leaks. a feature/bug like that would raise a huge scandal rather than appreciation. i have no doubt about this.
an effort to collect helpful data from users (or at least from those who want to share it) is pkgstats, which is currently heavily discussed, as you have surely noticed.
regards Hubert Grzeskowiak
Uhm... he said not asking for permission would be _purely statistically_ most valuable, but he explicitly mentioned it would be more ethically correct to ask for permission.
Dieter
Dieter Plaetinck wrote:
Uhm... he said not asking for permission would be _purely statistically_ most valuable, but he explicitly mentioned it would be more ethically correct to ask for permission.
Dieter
the very idea of statistics built on users' unawareness is unreasonable. and i think the best way is not to ask people, but to offer it to interested users, so that they can contribute if they wish to, not because they don't care but feel forced. criticism and/or ideas are always welcome.
regards
Hubert Grzeskowiak
On Monday 10 November 2008 23:12:49 Hubert Grzeskowiak wrote:
a pacman which sends information home - unasked?! are you serious? that would be a data privacy horror! i don't want to have to watch pacman's traffic the whole time, fearing leaks. a feature/bug like that would raise a huge scandal rather than appreciation. i have no doubt about this.
Of course I don't want this to happen either. I'm just saying it would be the best sample population, 'statistically speaking'. As for pkgstats, yes, it is heavily discussed, but I had never installed it before I started reading this thread.
Regards,
Charly
Charly Ghislain wrote:
Of course I don't want this to happen either. I'm just saying it would be the best sample population, 'statistically speaking'.
As for pkgstats, yes, it is heavily discussed, but I had never installed it before I started reading this thread.
Regards,
Charly
I agree with the statistical issue. Statistics is a funny thing. Maybe about 1500 users isn't a very significant part of the community. And as pointed out by Charly, it could be a biased sample. Not to mention the problems with using the IP address to determine the uniqueness of submissions (many machines behind the same IP, dynamic IPs). I'm not saying pkgstats is invalid, only that the statistical results from it must be taken with a grain of salt.
Armando
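To put a rough number on the sample-size concern, here is a minimal sketch of the normal-approximation margin of error for a reported share. It pairs the roughly 1500 submissions mentioned above with the 34.05% kdelibs figure from earlier in the thread purely for illustration, and it assumes a simple random sample. That assumption is exactly what the self-selection and IP-uniqueness concerns call into question, so the result covers sampling noise only and says nothing about bias. The function name `proportion_margin` is made up for this example.

```python
import math


def proportion_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval
    for a proportion p observed in n independent submissions.

    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Figures quoted in the thread; combining them is an assumption
# made only for illustration.
n = 1500      # approximate number of submissions
p = 0.3405    # share of submissions reporting kdelibs

margin = proportion_margin(p, n)
print(f"kdelibs share: {p:.2%} +/- {margin:.2%} (random-sampling error only)")
```

Under these assumptions the random-sampling error comes out at a few percentage points, so with this many submissions the larger uncertainty is likely to be who chose to submit, not how many.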
Cheers folks,
I suggest that pacman's install scripts should suggest installing pkgstats, to get more people to use it. 1500 submissions :) How many Archers are there? (I have been wondering about this for some time now, lol)
Anyway, I think a better way would be for the mirrors to produce statistics on downloaded packages, so it would be less intrusive and less risky. But that would require a lot of work on the mirrors, and the statistics would be biased because they wouldn't account for packages that are installed and then uninstalled right away. pkgstats looks like a nice solution for the current state of the community (devs + AUR TUs + ordinary users). I'm happy that someone made it work :)
Another thing that would be nice to know is whether the packages are really used. I, like (I think) most users, just install all packages in core and then whatever is missing, even if I no longer use it later. In my case I try to keep few unused packages on the Arch partition because I don't have much space to waste, but I don't think I would care much if I had a bigger partition for Arch. Still, I have some packages that I even forgot I had installed, for example the ones that don't have a .desktop file. (The list of packages is too big to look through.) This is another problem, but I just wanted to warn about a little bias that might exist in the numbers :)
Greetings, raca
On Tue, 2008-11-11 at 00:44 -0200, Armando M. Baratti wrote:
I agree with the statistical issue.
Statistics is a funny thing. Maybe about 1500 users isn't a very significant part of the community.
And as pointed out by Charly, it could be a biased sample. Not to mention the problems with using the IP address to determine the uniqueness of submissions (many machines behind the same IP, dynamic IPs).
I'm not saying pkgstats is invalid, only that the statistical results from it must be taken with a grain of salt.
Armando
hi raca,
I suggest that pacman's install scripts should suggest installing pkgstats, to get more people to use it.
i'm against. it is already easy enough to install & use it. and btw it takes time to set up one's arch installation - statistics of newly installed systems wouldn't be that meaningful, don't you think?
Anyway, I think a better way would be for the mirrors to produce statistics on downloaded packages, so it would be less intrusive and less risky. But that would require a lot of work on the mirrors, and the statistics would be biased because they wouldn't account for packages that are installed and then uninstalled right away.
to the cons you already mentioned, these should be added:
- many people have their own mirrors
- some people reinstall packages from the mirrors (those not using the cached pkgs)
Another thing that would be nice to know is whether the packages are really used.
call me paranoid, but i don't want others to know exactly what i'm doing with my computer!
regards
Hubert Grzeskowiak
On Tue, 2008-11-11 at 14:07 +0100, Hubert Grzeskowiak wrote:
I suggest that pacman's install scripts should suggest installing pkgstats, to get more people to use it.
i'm against. it is already easy enough to install & use it. and btw it takes time to set up one's arch installation - statistics of newly installed systems wouldn't be that meaningful, don't you think?
Yeah, I agree.
Another thing that would be nice to know is whether the packages are really used.
call me paranoid, but i don't want others to know exactly what i'm doing with my computer!
Me neither, most of the time :)
Greetings, raca
On Monday, 10 November 2008 08:21 RedShift wrote:
Just a warning, generalizing based on these numbers (and on any numbers in fact) is very dangerous.
I agree, and I think that the stats about the repositories at http://www.archlinux.de/?page=PackageStatistics "forget" the possibility that a package could have the same name as the official one from core or extra but come from a local or another repository. I suggest that pkgstats include the name of the repository of each package, as for example in the output of "pacman -Qs" without the description lines. Sorry, but I think the name of a package alone is too little to base decisions on. But once again, it is very nice to have these stats.
See you, Attila
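As a rough illustration of the suggestion above, the following sketch parses a listing in an assumed `repo/name version` format, with indented description lines dropped, and groups repositories by package name so that same-named packages from different sources stay distinguishable. The input format, the sample repositories, package names and versions are all hypothetical; nothing here reflects what pkgstats actually submits.

```python
from collections import defaultdict


def parse_listing(text: str) -> list[tuple[str, str, str]]:
    """Return (repo, name, version) tuples from 'repo/name version' lines.

    Blank lines and indented lines (treated as descriptions) are skipped.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip() or line[0].isspace():
            continue  # blank line or description line
        target, _, rest = line.partition(" ")
        repo, sep, name = target.partition("/")
        if not sep:
            continue  # line is not in repo/name form
        version = rest.split()[0] if rest.strip() else ""
        entries.append((repo, name, version))
    return entries


# Hypothetical sample data: repository labels and versions are invented.
SAMPLE = """\
extra/kdelibs 1.0-1
    KDE core libraries
myrepo/kdelibs 1.0-1custom
    Locally rebuilt kdelibs
community/somepkg 2.3-1
    Some community package
"""

entries = parse_listing(SAMPLE)

# Group repositories by package name to spot name collisions.
repos_by_name: dict[str, list[str]] = defaultdict(list)
for repo, name, _version in entries:
    repos_by_name[name].append(repo)

for name, repos in repos_by_name.items():
    print(name, "->", ", ".join(repos))
```

With only the package name submitted, the two `kdelibs` entries in this sample would be counted as the same package; keeping the repository (and version) next to the name is what lets a statistics page tell them apart.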
participants (8)
- Allan McRae
- Armando M. Baratti
- Attila
- Charly Ghislain
- Dieter Plaetinck
- Hubert Grzeskowiak
- raca
- RedShift