On Sun, 05 Mar 2017 at 18:40:36, Thorsten Töpper wrote:
As stated on IRC, I'm against handing out user data (including nicknames) to a third party: personally because of the privacy concerns already mentioned, but also because of the legal problems we may run into as we don't have a ToS. Under these circumstances I have a bad feeling about making this information available to someone else, even if the person makes a decent impression.
While I agree that not having ToS might be a problem, I don't see why publicly advertising the user name list is an issue, compared to the situation we are currently in. We already make the user name public in so many contexts (pretty much every action that requires an account). Given that, it should be pretty clear that you implicitly agree to share your user name by registering (IANAL, so this might be wrong from a legal standpoint, though).
Regarding the crawler I suggested as a workaround for the researchers to collect the names that are already public: I don't understand why you extend this to brute-forcing the account pages or going through the mailing list archives. My suggestion was that it is simple to collect a list of all packages stored in the AUR and then, for each package, get the common fields: original submitter, maintainers, and people who commented. This can be done either with a plain GET request for the package's HTML page or through the available interfaces (I'm not familiar with those and what they provide). It does not involve any brute-force attack, as the package names are available. Also, the scripts doing this need no login.
I only mentioned various ways in which a list of user names can already be obtained. Theoretically, the complete list is already available online, but brute-forcing all account details pages is not feasible in practice. Parsing the package details pages is a first, naive approach that works in practice. Parsing even more pages is the next logical step. Scanning the account details pages for a list of known user names gives you even more information and is still practically feasible.
The names gathered this way are already public and can be found with every large search engine. Sure, this will create some load, but I assume any reasonable person would put a short sleep between the requests.
All true, but if we are fine with sharing this information I still do not see why we should not provide a sane interface.
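For illustration, the polite crawl discussed above (walk a known list of package names, collect the public names, pause between requests) might look roughly like the following. This is only a sketch under assumptions not stated in this thread: the RPC URL pattern and the "results"/"Maintainer" JSON fields are assumed, the helper names are made up, and it only covers maintainers, not submitters or commenters.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Set

# Assumed URL pattern for a per-package info request; adjust to the real interface.
RPC_URL = "https://aur.archlinux.org/rpc/?v=5&type=info&arg[]={name}"


def http_get(url: str) -> str:
    """Plain unauthenticated GET; no login is involved."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def maintainers_from_response(body: str) -> Set[str]:
    """Pull maintainer names out of one JSON response body.

    Assumes a top-level "results" list whose entries carry a
    "Maintainer" field that may be null for orphaned packages.
    """
    names = set()
    for result in json.loads(body).get("results", []):
        maintainer = result.get("Maintainer")
        if maintainer:
            names.add(maintainer)
    return names


def collect_names(
    packages: Iterable[str],
    fetch: Callable[[str], str] = http_get,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Query each package in turn, sleeping between requests to limit load."""
    names: Set[str] = set()
    for index, package in enumerate(packages):
        if index:
            sleep(delay)  # short pause before every request after the first
        url = RPC_URL.format(name=urllib.parse.quote(package))
        names |= maintainers_from_response(fetch(url))
    return names
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without touching the network; a dedicated interface on the server side would make this kind of script unnecessary.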
I agree that we should get a ToS for the AUR.
Volunteers? :)

Regards,
Lukas