[aur-general] Fighting spam on the AUR

Wed Mar 13 20:44:26 EDT 2013

Lukas Fleischer wrote:

>Status quo:
>
>    06:54 < gtmanfred> ok, it really is time for something else
>    06:54 < gtmanfred> the spammer is now creating a new account for
>    every comment and flag out of date
>
>The account suspension feature does not help here.
>
>Options:
>
>* Allow package maintainers to block the "Flag package out-of-date"
>  feature for a certain amount of time. Note that this might eventually
>  cripple the "out-of-date" function. Also, this does not work for
>  comments.
>
>* Use CAPTCHAs during account registration. We could either use MAPTCHAs
>  ("What is 1 + 1?") or something like reCAPTCHA [1].
>
>* Moderate new accounts. Might be a lot of work. We need some TUs that
>  review and unlock accounts. Also, it might be hard to distinguish a
>  spam bot from a regular user. If we require a short application text,
>  this might result in less users joining the AUR.
>
>* Block IP addresses. Bye-bye, Tor users!
>
>Comments and suggestions welcome! We need to find a proper solution as
>soon as possible!
>
>[1] http://www.google.com/recaptcha

How hard would it be to create an action queue for comments and flagging?

The idea would be to add a new field to the user accounts table (e.g. a boolean
named "supervise"). The default value would be true for new accounts. The value
could be changed by TUs and/or automatically changed after a fixed interval
and/or certain actions (depending on how far you want to go with the logic).
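
To make the flag a bit more concrete, here is a minimal sketch in Python
(purely for illustration, not tied to the actual AUR code). The names
Account, SUPERVISION_PERIOD, ACCEPTED_ACTIONS_NEEDED, lift_by_tu and
update_supervision are all made up, the threshold values are arbitrary, and
treating either condition as sufficient is just one possible reading of
"and/or":

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; the proposal does not fix any values.
SUPERVISION_PERIOD = timedelta(days=14)
ACCEPTED_ACTIONS_NEEDED = 3


@dataclass
class Account:
    username: str
    created: datetime
    supervise: bool = True     # new accounts start out supervised
    accepted_actions: int = 0  # actions a TU has approved so far


def lift_by_tu(account):
    """A TU manually removes supervision."""
    account.supervise = False


def update_supervision(account, now):
    """Lift supervision automatically once either condition is met."""
    if now - account.created >= SUPERVISION_PERIOD:
        account.supervise = False
    if account.accepted_actions >= ACCEPTED_ACTIONS_NEEDED:
        account.supervise = False
    return account.supervise
```

Requiring both conditions instead of either one would be stricter against
spammers who simply wait out the interval.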

While an account is supervised, its comments and out-of-date flags would be
submitted to a queue instead of taking effect immediately. The queue would be
accessible to TUs via a web page with accept/reject buttons for each action.
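
Roughly, the queue could behave like the following Python sketch. This is
only an illustration under my own assumptions: ActionQueue and its methods
are hypothetical names, actions are plain tuples, and a real version would
live in the database rather than in memory.

```python
class ActionQueue:
    """Holds actions from supervised accounts until a TU decides on them."""

    def __init__(self):
        self.pending = []  # waiting for a TU
        self.applied = []  # visible on the AUR

    def submit(self, action, supervised):
        # Supervised accounts go through the queue, everyone else
        # takes effect immediately, as today.
        if supervised:
            self.pending.append(action)
            return "queued"
        self.applied.append(action)
        return "applied"

    def accept(self, index):
        # TU pressed "accept": the action now takes effect.
        action = self.pending.pop(index)
        self.applied.append(action)
        return action

    def reject(self, index):
        # TU pressed "reject": the action is dropped and never applied.
        return self.pending.pop(index)


queue = ActionQueue()
queue.submit(("olduser", "foo-git", "flag"), supervised=False)
queue.submit(("newuser", "foo-git", "comment"), supervised=True)
queue.submit(("spammer", "foo-git", "comment"), supervised=True)
queue.accept(0)  # newuser's comment goes live
queue.reject(0)  # spammer's comment is discarded
```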

This avoids the annoyance and data collection of CAPTCHAs, and it also avoids
the risk of blacklisting legitimate users who share IP ranges (or some proxy)
with trolls. Bonus: AUR automation tools will not be broken.

This will introduce a variable delay before actions are executed, but in most
cases it will probably be no more than a couple of hours given the current
number of TUs.

Rejections should require a reason, and rejected actions should be logged for
a few days just to make sure no one abuses the reject button. The reason
should also be sent back to the user so that a disputed rejection can be
brought up on this list. (If that's done, then logging might be unnecessary.)
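
As a sketch of the rejection side (again Python, again only my assumptions):
reject_action, prune_log and LOG_RETENTION are hypothetical names, and seven
days is just a stand-in for "a few days".

```python
from datetime import datetime, timedelta

# Assumed retention period; the proposal only says "a few days".
LOG_RETENTION = timedelta(days=7)


def reject_action(log, tu, user, action, reason, now):
    """Record a rejection and return the notice to send to the user."""
    if not reason.strip():
        raise ValueError("a rejection requires a reason")
    log.append({"when": now, "tu": tu, "user": user,
                "action": action, "reason": reason})
    return f"Your {action} was rejected by {tu}: {reason}"


def prune_log(log, now):
    """Keep only the entries that are still inside the retention period."""
    return [entry for entry in log if now - entry["when"] <= LOG_RETENTION]
```

The returned notice is what would be mailed to the user, and prune_log would
be run periodically to drop old entries.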

The accept/reject page would need the following per action:
* package ID -> page link
* action
* content of comment if applicable
* user (+email? +IP?)
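
In code, one row of that page might carry something like the record below
(a Python sketch; ReviewEntry, its field names and BASE_URL are made up, and
the URL is a placeholder rather than the real AUR address scheme):

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder base URL; the real package page scheme may differ.
BASE_URL = "https://aur.example.org/packages"


@dataclass
class ReviewEntry:
    package_id: int
    action: str                    # e.g. "comment" or "flag"
    user: str
    comment: Optional[str] = None  # only set for comments
    email: Optional[str] = None    # optional, see the question marks above
    ip: Optional[str] = None       # optional, see the question marks above

    def package_link(self):
        """Turn the package ID into a link to the package page."""
        return f"{BASE_URL}/{self.package_id}"

    def summary(self):
        """One line of text for the accept/reject page."""
        parts = [self.package_link(), self.action, self.user]
        if self.comment is not None:
            parts.append(repr(self.comment))
        return " | ".join(parts)
```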

Of course, I have no idea how difficult this would be technically.