Re: [arch-dev-public] AUR ToS (aka making AUR user names public)

5 Mar 2017

On Sun, 05 Mar 2017 14:35:05 +0100
Lukas Fleischer <lfleischer@archlinux.org> wrote:
...
Hi,
I was recently contacted by a Polish researcher asking for a list of
AUR account names. I did not expect this to be controversial but a
couple of Trusted Users raised concerns on IRC, so I decided to move
this to the public mailing list and discuss the whole topic in
generality. I would like to hear more opinions, but please read the
whole email and give it a second thought before simply bringing up
the usual privacy arguments mentioned below.
My original question was: Are we fine with sharing the list of AUR
account names (only user names, no real names or email addresses)
with a researcher that seems trustworthy and agrees to not share the
data in any form other than the resulting anonymized statistics?
In this particular case, we are talking about Dorota Celinska [1] from
the University of Warsaw, Faculty of Economic Sciences [2], see [3]
for a list of her publications and [4] for a summary of her research
project funded recently by the Polish National Science Centre. She
needs the list of user names to perform a segmentation analysis,
including users who were active on the older AUR releases but do
not show any activity on AUR 4. She would also like to use the user
names as identifiers to establish connections with other platforms,
such as GitHub.
The next question is: Would it make sense to even make this data
publicly available? Would it make sense to extend our RPC interface
such that one can search for user names? GitHub, for example, already
provides such an interface [5]. Let me quickly summarize some
arguments for this idea which came up on IRC:
* User names are mostly identifiers. It is questionable whether they
  can/should be considered personal/private information. Maybe this
  can only be answered by a lawyer, though.
* The user names of all accounts with any kind of public activity,
  like uploading a package, filing a request or writing a comment, are
  public already.
* After logging into the aurweb interface, you can already check
  whether an account with a given user name exists because the account
  details page URIs have the form
  https://aur.archlinux.org/account/$username. This means that for any
  platform providing a list of user names (such as GitHub), you can
  "establish connections" with the AUR already.
Now the arguments against:
* Principle of data economy: We should not share any kind of
  information we do not need to share.
* Sharing user names lowers the threshold for sharing other
  information which is considered more confidential.
* Users can (and should) already use crawlers to fetch the user names.
  For example, the user names of all package maintainers and comment
  authors appear on the package details pages. The names of all users
  filing package requests appear in the mailing list archives etc.
* We do not have a ToS, so we had better not share anything.
I, personally, find the second-to-last argument a very weak one. Telling
users to build crawlers scraping and brute-forcing our HTML pages makes
life difficult for both them and us. What do you think?
On the other hand, the last argument is a very good one
and it brings me to my last point. Independently of the outcome of
this discussion, I think we should add some ToS that users need to
agree to when registering. It should contain information on
liability and on privacy. Is anybody willing to write a draft? Do we
need the support of a lawyer here?
Thank you for your time and have a nice Sunday!
Regards,
Lukas
Hello,

As stated on IRC, I'm against handing out user data (including
nicknames) to a third party, partly due to the privacy concerns
mentioned, but also because of the legal problems we may run into as
we don't have a ToS. Under these circumstances I have a bad feeling
about making this information available to someone else, even if the
person makes a good impression.

Regarding the crawler I suggested as a workaround for the researcher
to collect the names that are already public, I don't understand why
you extend this to brute-forcing the account pages or going through
the mailing list archives. My suggestion was that it's simple to
collect a list of all packages stored in the AUR and then, for each
package, get the common fields: original submitter, maintainers and
people who made comments. This can be done either with a plain GET
request for the package's HTML page or via the available interfaces
(I'm not familiar with those and what they provide). This does not
involve any brute-force attacks, as the package names are available.
Also, no login is necessary for scripts doing this.

The names gathered this way are already public and can be found via
any major search engine. Sure, this will create some load, but I
assume any reasonable person would put a short sleep between the
requests.
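A minimal sketch of the crawler described above, in Python. It assumes the AUR RPC v5 `info` endpoint at `https://aur.archlinux.org/rpc/?v=5&type=info&arg[]=<pkgname>` and that its results carry `Maintainer`, `Submitter` and `CoMaintainers` fields; those field names are assumptions, and the helper names `fetch_info_rpc` and `collect_public_names` are hypothetical. Comment authors only appear on the HTML package pages and are not covered here. The list of package names is taken as input, and a short pause is inserted between requests.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, Set

# Assumption: AUR RPC v5 "info" endpoint; response layout is assumed.
RPC_INFO_URL = "https://aur.archlinux.org/rpc/?v=5&type=info&arg[]={}"


def fetch_info_rpc(pkgname: str) -> Dict:
    """Hypothetical helper: fetch metadata for one package via the AUR RPC."""
    url = RPC_INFO_URL.format(urllib.parse.quote(pkgname))
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    results = data.get("results", [])
    return results[0] if results else {}


def collect_public_names(
    package_names: Iterable[str],
    fetch_info: Callable[[str], Dict] = fetch_info_rpc,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Hypothetical helper: gather user names already shown publicly per package."""
    names: Set[str] = set()
    for i, pkg in enumerate(package_names):
        if i > 0:
            sleep(delay)  # short pause between requests to limit server load
        info = fetch_info(pkg)
        # Assumed field names; missing or null values (e.g. orphans) are skipped.
        for field in ("Maintainer", "Submitter"):
            value = info.get(field)
            if value:
                names.add(value)
        for co in info.get("CoMaintainers") or []:
            names.add(co)
    return names
```

Injecting `fetch_info` and `sleep` keeps the loop testable offline; with the defaults it issues one request per package and waits `delay` seconds between them.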

I agree that we should get a ToS for the AUR.

Best Regards,
Thorsten

Thorsten Töpper