On 06-03-2017 12:45, Henrik Danielsson via arch-general wrote:
2017-03-06 12:53 GMT+01:00 Mauro Santos via arch-general <arch-general@archlinux.org>:
On 06-03-2017 11:20, Henrik Danielsson via arch-general wrote:
2017-03-06 11:18 GMT+01:00 Ralf Mardorf <silver.bullet@zoho.com>:
Privacy is a principle. You seem not to understand the difference between giving somebody data with the formal permission to use this data and data that simply is available for everybody, but not explicitly handed over to somebody. Paranoia isn't involved in my concern.
My standpoint is that privacy does not apply to this kind of public information, simply because it's not private and by no means sensitive (people freely chose the username and other visible info they posted, no?). Thus, no, I see no difference and really no point in even considering trying to keep such information private.
What anyone does with the freely available information posted in the AUR is up to them ("mining" it or handing it over to someone else included). We could not do anything about it anyway, nor would I even care whether I was in that list or not, since there seems to be no ToS between the one submitting that information and the one publishing it. Since it was freely submitted without any terms, I simply cannot find any restrictions on its usage.
Yes, we should have a ToS to at least keep the principle of privacy alive. But let's face it, real privacy online has long been dead, if it ever existed.
If there was a ToS, the situation would perhaps have been different, at least legally. I'm no legal expert of course, but to me it makes perfect sense that if you posted something on the internet, in a very public space, you can have no expectation of keeping any of that information private in any way, nor any information easily associated with it. No, I don't see that as a problem, at least not if you never explicitly agreed that information would not be shared. What I really want to keep private I don't post anywhere.
I think the point here is not so much privacy, as I believe everyone recognizes that the information that was asked for (the full list of usernames) is public and can be scraped.
The point here is handing over the full list of usernames on request. Do note that in their research proposal[1] they specifically mention scraping information from github. That information is public and github does have an API to query it, but they still have to scrape it. I suppose that implies github does not hand it over wholesale on request, so why should we? This might be due to their ToS, or they know something we don't.
It would be rather interesting to see what they could come up with from that correlation.
Probably nothing meaningful. As I've said before, you have no way of knowing if user foo on github is the same as user foo on the AUR.
I think, perhaps a bit cynically, the reason github may not hand over that data directly is likely that they don't want to do some of the work of the researchers for them. As you said, the data is there, the format matters less if they're going to massage it into something else later anyway, so why bother with the effort of compiling it on their [github] own time?
We could simply deny the AUR username request for the same reason, or for no reason at all. Since some people seem uncomfortable about what could be derived from a potential correlation of publicly available data, that's most likely the safest way to go.
-- Mauro Santos