On Mon, Jan 26, 2009 at 4:17 PM, Grigorios Bouzakis <grbzks@gmail.com> wrote:
On Mon, Jan 26, 2009 at 02:39:38PM -0600, kludge wrote:
if there is no voting on packages in [community], then what mechanism exists for users to suggest/cheer on a package's promotion to [extra]?
-kludge
One user's answer:
Maybe the same mechanism that would move packages out of extra to unsupported or community? It's called pkgstats. Run pacman -S pkgstats and then run pkgstats as root, IIRC.
Greg
Holy smokes, the current direction things are going in is scaring me. I don't understand why so much reliance is being placed on pkgstats when there are a lot of fundamental problems with it in its current form. These are the EXACT same concerns that Bob Finch and I brought up with the recent vote on community package guidelines, and it seems everyone has been ignoring those issues and is ready to put all sorts of trust into pkgstats without examining its pitfalls.

For starters, pkgstats has only been around since the beginning of November of last year (a little less than three months now). Although its implementation is simpler by design, we did have ArchStats before that, and people had NOWHERE near as much faith in that tool's statistics as they apparently do in pkgstats. Yes, ArchStats was more complicated (and unmaintained), but it also had other features and protections that pkgstats does not. I agree that simplicity is nice from a development standpoint, but I'd argue that it also has the potential to lower the tool's accuracy.

As far as implementation issues go, right now pkgstats does not account for multiple machines sharing the same IP address (extremely common in home environments running NAT). It also cannot track changes over time or expire old data, which I'd think would be an absolute necessity if you want to rely on it for an accurate snapshot of current installation numbers or to see trends (which seems to be what people want the tool to be used for).

Which brings me to the next set of problems. How do you know the data is an accurate reflection of what the community wants or uses? Right now, people are using the terms "popularity", "usage", and "installed" interchangeably and assuming that is what pkgstats shows them. The script only reflects what is installed, not what people are using, not what people are interested in...just installed! What about people who download things to try them out, but do not remove them?
What about packages that don't have old dependencies removed when they're updated? There are a bunch of scenarios where long-running systems might have unnecessary or stale packages installed that the user is unaware of (see liblbxutil at 67.74%, csup at 42.53%, and xorg-xsm at 34.74% as reported by pkgstats).

Also, pkgstats may be the best thing we currently have, but how do we know the stats it generates are an accurate representation of Arch users? Do we know how many Arch users there are in total? We have just over 20,000 registered forum members (the US forum only), just under 13,000 registered AUR users, and just about 3,000 unique IPs that have contributed to pkgstats. The thing is, who knows how many Arch users are out there who haven't registered for any of our sites? Or what about users who have registered for our sites but no longer use Arch? Or users with multiple machines, and the question of whether those machines should each count separately for pkgstats? Who knows how many of those unique IP addresses are unique machines and not just updates from the same machine where an ISP has handed out a different IP address?

If there are users out there who are unaware of pkgstats, or aren't that involved in the community (but still use Arch), then are the current reports skewed one way or another? Meaning, do the packages installed by developers and people active in the community reflect the packages used by everyone?

Also, when pkgstats was introduced, it was said that "In an ideal world one would run the script only once per installation or if really lot of things have changed (not the version of packages).", but if you want these numbers to be up-to-date, then it SHOULD be run on a regular basis and the old data should be discarded or weighted differently. Right now, there have been 3,757 pkgstats submissions, meaning about 700 people have run it multiple times.
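
To make the "discarded or weighted differently" idea concrete, here is a rough sketch of one way it could work: keep only the newest submission per machine, then let each submission's weight decay with age. Everything in it is an assumption for illustration only. The machine identifier, the 30-day half-life, the function names, and the sample data are all hypothetical, and nothing here describes how pkgstats actually stores or processes its data.

```python
from datetime import date

# Hypothetical submissions: (machine_id, submission_date, installed packages).
# The machine_id field is an assumption; it is not something pkgstats is
# described as recording.
SUBMISSIONS = [
    ("a", date(2008, 11, 5), {"pacman", "xorg-xsm"}),
    ("a", date(2009, 1, 20), {"pacman"}),           # newer report from machine "a"
    ("b", date(2008, 11, 10), {"pacman", "csup"}),  # old, never refreshed
    ("c", date(2009, 1, 25), {"pacman", "csup"}),
]


def latest_per_machine(submissions):
    """Discard superseded data: keep only the newest submission per machine."""
    latest = {}
    for machine, when, pkgs in submissions:
        if machine not in latest or when > latest[machine][0]:
            latest[machine] = (when, pkgs)
    return latest


def weighted_share(submissions, package, today, half_life_days=30):
    """Installed share of `package`, with older submissions counting for less.

    A submission's weight halves every `half_life_days` days (an arbitrary,
    assumed value).
    """
    total = 0.0
    hits = 0.0
    for when, pkgs in latest_per_machine(submissions).values():
        age_days = (today - when).days
        weight = 0.5 ** (age_days / half_life_days)
        total += weight
        if package in pkgs:
            hits += weight
    return hits / total if total else 0.0


if __name__ == "__main__":
    today = date(2009, 1, 26)
    for name in ("pacman", "csup", "xorg-xsm"):
        print(name, round(weighted_share(SUBMISSIONS, name, today), 3))
```

In this made-up data, xorg-xsm drops out entirely because machine "a" later reported without it, and csup counts for less than a plain two-out-of-three because one of its two reports is months old.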
If we want people to keep resubmitting and maintain current info, then perhaps it should come with a script that could be enabled to run it as a cron job...even if it does not get enabled by default.

Not only that, but pkgstats reports that no package has a 100% installation percentage...even pkgstats itself only has 98.88%, which means people have stuff installed on their systems that has not been reported. In this particular case, it's probably people who have downloaded the script directly rather than installed it via pacman, but it's still an issue to consider with other packages as well. Because pkgstats is not an "official" tool that is distributed with the core and turned on by default (which I don't think it should be), that alone means it has some amount of bias built into it.

Not knowing the answers to a lot of these questions means that it's incredibly hard to tell statistically if the numbers reported by pkgstats are accurate or should be relied upon, and yet that's what people are doing. My whole point in bringing all this up is that it seems like people are treating pkgstats as a panacea, when I don't think it should be the sole basis of all of these recent decisions. I absolutely support the idea of pkgstats, but think that it's still in its infancy and should not be so blindly trusted. People have been suggesting arbitrary limits and restrictions based on its numbers, but the numbers themselves could be pretty far off base...and I think that's the bigger problem.

-- Aaron "ElasticDog" Schaefer