[aur-general] Proposed rules for packages entering [community]

Ondřej Kučera ondrej.kucera at centrum.cz
Thu Dec 4 19:37:17 EST 2008


> We have mirrors. Almost 100 of them. Feel free to contact them all,
> have them write code to count downloads which then sends the stats to
> us, and then we can implement this.
> What you suggest is absolutely not feasible at all.

That's too bad, I wanted to suggest counting downloads too (because I
believe that the number of downloads of a particular version of a
package would after a while correlate quite well with the number of
users who actually use, i.e. upgrade, this package - it should more or
less solve the problem mentioned earlier of people trying a package and
removing it soon afterwards).

Anyway, I've been meaning to contribute some ideas on the topic for
at least four days (since I read the first IRC log on Sunday), but
unfortunately my job hasn't allowed it this week. I just wanted to do
some thinking out loud about both methods (voting/pkgstats), for both
packages already in community and those that might get there in the
future, from a regular user's point of view (also with regard to
privacy/paranoia matters).

(1) pkgstats
The obvious problem with accuracy is that not everybody will use it (or 
use it even from time to time to update their "contribution" to the 
statistics). Some people don't know about it, some people won't be 
bothered, some might be concerned about privacy. Even though an IP
address is not necessarily an identifier of a person, it is still "good
enough" information. I more or less trust the Arch devs that only a
hash of the IP is really stored together with the package list, but I
can hardly be sure, and there are much more paranoid users out there
than myself.
(Their problem doesn't have to be only with privacy itself - when 
someone knows the packages you use and even the exact versions, it makes 
it so much easier to target some kind of attack on the system.)

On the other hand it can be nicely used to promote a package that is in 
unsupported. "Do you use this package? Do you want to see it in 
community? Have you run pkgstats on your system then?" It would be nice
to see the statistics in the AUR frontend, so one could see how far the
package is from the magic number that makes it a good candidate for
community (whatever that number will be).

As for pruning community as it is now (if that is still an issue, I'm
not quite sure anymore), how about this: pick a reasonable usage
percentage (it doesn't have to be the same number as the one for new
packages entering community, it can be lower) by whatever criteria
(number of packages to prune, number of MB to save, ...), create a list
of all the packages with usage below this number and split it into
lists grouped by maintainer. Then send each maintainer their list
with a note that they should consider whether or not these particular
packages are really good material for community. At
the same time put the list of all those packages on the web, announce 
its existence in the latest news and tell people that if they see a 
package/packages they use and haven't yet run pkgstats, they should 
probably do it now, otherwise the package might be removed from 
community. Then wait for some time and look at the change in statistics 
(maybe there will be some, maybe there won't).
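
The pruning procedure above can be sketched in a few lines of Python.
This is only an illustration under assumptions: the package names,
maintainers, usage figures and the 1% threshold are made up, and
`prune_candidates` is a hypothetical helper, not part of pkgstats or
the AUR.

```python
from collections import defaultdict


def prune_candidates(packages, threshold):
    """Group packages whose usage share is below `threshold` by maintainer.

    `packages` is an iterable of (name, maintainer, usage_share) tuples,
    where usage_share is the fraction of pkgstats submissions that
    include the package (a hypothetical input format).
    """
    by_maintainer = defaultdict(list)
    for name, maintainer, usage_share in packages:
        if usage_share < threshold:
            by_maintainer[maintainer].append(name)
    # Sort maintainers and package names for stable, readable lists.
    return {m: sorted(names) for m, names in sorted(by_maintainer.items())}


# Made-up sample data, for illustration only.
sample = [
    ("foo", "alice", 0.004),
    ("bar", "alice", 0.150),
    ("baz", "bob", 0.001),
    ("qux", "alice", 0.009),
]

for maintainer, names in prune_candidates(sample, threshold=0.01).items():
    print(f"{maintainer}: {', '.join(names)}")
```

Each per-maintainer list could then be mailed out, and the same
function re-run after the waiting period to see whether the published
list changed anything.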

(2) votes
Again, not everybody uses it. Especially since voting means that you 
have to have an AUR account. Today everybody has tons of accounts at
different internet services, ideally each with its own password, and
people don't like to create yet another account (I know I don't).
Frankly, if I hadn't needed those 15 or so packages I now
maintain in unsupported (because I hadn't found them there), I wouldn't 
have created an AUR account either.

There's another problem with accuracy. Even users who have an account 
and vote don't vote for every single package they use. Especially many 
people (myself included) probably never voted for packages already in 
community. This makes the system usable for dealing with the transition 
unsupported -> community but not for the other way round. That, too,
could be helped by a similar approach to the one above - find the
packages with the fewest votes, publish them as a list (or lists) and
urge people to vote for packages on this list if they use them and want
to still see them in community in the future.
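
Picking the least-voted packages is equally simple. Again a sketch
only: the vote counts are invented, and `least_voted` is a hypothetical
helper that has nothing to do with how the AUR actually stores votes.

```python
import heapq


def least_voted(votes, n):
    """Return the n (package, votes) pairs with the fewest votes.

    Ties are broken alphabetically by package name so the result is
    deterministic.
    """
    return heapq.nsmallest(n, votes.items(), key=lambda item: (item[1], item[0]))


# Made-up vote counts, for illustration only.
community_votes = {"foo": 12, "bar": 340, "baz": 3, "qux": 12}

for name, count in least_voted(community_votes, 2):
    print(f"{name}: {count} votes")
```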

The problem is that this way the privacy concerns will be even bigger. 
Right now if someone looked up which packages I voted for, it wouldn't 
give them much of an idea which packages I actually use (because I only 
voted for packages in unsupported and only for those that I had a reason 
to believe that my vote might help push them to community). After 
applying the above suggestion, anyone who gained access to AUR data 
knows more or less about all community packages that a certain nickname 
uses (which is much worse that knowing that this list of packages is 
used by someone with this hash of IP address - which is the information 
pkgstats provides). Moreover, each nickname is associated with an e-mail 
which is then more or less associated with a particular person. Of 
course, the e-mail can be fake (or completely or almost unused), on the 
other hand if you also want to maintain some packages in unsupported, 
you want to have a valid e-mail, so, if you're paranoid, you'd probably 
have to have two AUR accounts - one connected to you for maintaining 
packages and the other one as "anonymous" as possible just for voting.

Unfortunately, I don't have a solution. Both systems can be made more 
accurate (and useful for pointing fingers at packages that really aren't 
all that much used) but at the price of some amount of privacy or even 
security. I still think that the best solution would be counting 
downloads, because it would be quite accurate and also quite anonymous 
(definitely more than pkgstats or voting) but sadly it's not an option.

I hope I haven't wasted too much of the time of those who have read it
all. If so, then I apologize :-), but since I spent some time thinking
about these matters on my way to work and back this week, I felt I
should share the thoughts.


Ondřej Kučera
