[aur-dev] fwd: AUR PKGBUILD files and web searches
I'm forwarding this to aur-dev, as it's come up before and I want to gather current opinions on it. A couple thoughts that come to my mind are that spam harvesters won't honor robots.txt anyway, and in the past people sort of agreed that by uploading a pkgbuild, you probably should have noticed the fact that it's google searchable before you did it, and taken countermeasures yourself.
Opinions? -S
----- Forwarded message from Tom Wizetek <mailadmin@wizetek.com> -----
To: simo@archlinux.org
Subject: AUR PKGBUILD files and web searches
Date: Sat, 4 Oct 2008 02:56:53 -0400
From: Tom Wizetek <mailadmin@wizetek.com>
Hello.
I must admit that I'm a bit concerned about how certain information contained in PKGBUILD files hosted on archlinux.org is returned by web searches. Specifically, I mean email addresses.
For example, a simple search for my name:
http://www.google.ca/search?q=tom+wizetek
produces:
Contributor: Tom Wizetek <tom@wizetek.com> pkgname=tleds
linking to http://aur.archlinux.org/packages/tleds/tleds/PKGBUILD
This could be easily used for email address harvesting. Any chance you can add to the site's robots.txt (seems like the file exists but is empty) the appropriate entries to prevent web bots from listing the contents of PKGBUILDs?
Thanks for hearing me out.
--
Tom Wizetek (MajorTom)
----- End forwarded message -----
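For reference, here is a minimal sketch of what the requested robots.txt entry might look like and how a well-behaved crawler would interpret it. The `Disallow: /packages/` rule is an assumption based only on the PKGBUILD URL quoted above, not the AUR's actual configuration, and `is_allowed` is a hypothetical helper built on Python's standard `urllib.robotparser`. It only models crawlers that choose to obey robots.txt, so it says nothing about harvesters that ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the real file was reported as empty.
ROBOTS_TXT = """\
User-agent: *
Disallow: /packages/
"""

PKGBUILD_URL = "http://aur.archlinux.org/packages/tleds/tleds/PKGBUILD"


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # With the hypothetical rule, compliant crawlers skip the PKGBUILD.
    print(is_allowed(ROBOTS_TXT, PKGBUILD_URL))
    # With an empty robots.txt, nothing is excluded.
    print(is_allowed("", PKGBUILD_URL))
```

A rule like this would also remove PKGBUILDs from ordinary search results, which is the trade-off raised in the replies.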
On Mon, Oct 6, 2008 at 8:27 AM, Simo Leone <simo@archlinux.org> wrote:
I'm forwarding this to aur-dev, as it's come up before and I want to gather current opinions on it. A couple thoughts that come to my mind are that spam harvesters won't honor robots.txt anyway, and in the past people sort of agreed that by uploading a pkgbuild, you probably should have noticed the fact that it's google searchable before you did it, and taken countermeasures yourself.
Opinions? -S
There's simply no getting around it. Plain text files on a public server *will* get indexed by someone, somewhere. I also think it's important that the PKGBUILD contain correct and easily identifiable contact information. I suppose we could look at doing a mask of some sort, but I am highly sceptical of its effectiveness too. I think the best option is to simply use spam filters or a free email account that provides a decent spam filter, such as Gmail. This is just my two cents of course, but there you have it.
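As an illustration of the kind of mask mentioned here, the following is a rough sketch that rewrites an address into an "at"/"dot" form. The function name `mask_email` and the output format are assumptions made for illustration, not anything the AUR implements. A harvester can reverse such a mask with little effort, which is consistent with the scepticism expressed above.

```python
def mask_email(address: str) -> str:
    """Rewrite 'user@example.com' as 'user at example dot com'.

    Strings without an '@' are returned unchanged.
    """
    local, sep, domain = address.partition("@")
    if not sep:
        return address
    return f"{local} at {domain.replace('.', ' dot ')}"


if __name__ == "__main__":
    print(mask_email("tom@wizetek.com"))  # tom at wizetek dot com
```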
I'm still of the opinion that if you don't want spammers getting your email address you shouldn't have put it on the AUR in the first place, there's not a lot we can actually do and contributor/maintainer tags are pretty lax in this respect. We're not telling people they need to be perfectly formatted.
Just like you said I seriously doubt spammers honour robots.txt and the only thing we'll be losing is the ability to search PKGBUILDs ourselves with something like google.
--
Callan Barrett
On Tue, Oct 07, 2008 at 12:07:39AM +0800, Callan Barrett wrote:
I'm still of the opinion that if you don't want spammers getting your email address you shouldn't have put it on the AUR in the first place, there's not a lot we can actually do and contributor/maintainer tags are pretty lax in this respect. We're not telling people they need to be perfectly formatted.
Just like you said I seriously doubt spammers honour robots.txt and the only thing we'll be losing is the ability to search PKGBUILDs ourselves with something like google.
Aye. If one doesn't want spam they should not publish their email or at least not in an easily readable form. This is something that everyone needs to learn eventually. I wouldn't spend time worrying about those who don't take certain measures. I do sympathise with victims of spam though, because I also have been a victim.
On Mon, Oct 6, 2008 at 11:07 AM, Callan Barrett <wizzomafizzo@gmail.com> wrote:
I'm still of the opinion that if you don't want spammers getting your email address you shouldn't have put it on the AUR in the first place, there's not a lot we can actually do and contributor/maintainer tags are pretty lax in this respect. We're not telling people they need to be perfectly formatted.
Agreed. If you have a decent spam filter, then all should be fine (let's face it, who DOESN'T get 3-4 spam mails slipping through the filters every so often). If not, I don't see an issue with aaronmgriffin+spam@gmail.com in the tag, or something of the sort.
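The tagged address suggested above relies on plus addressing, which Gmail supports: mail sent to `user+tag@gmail.com` is delivered to `user@gmail.com`, so the tag can drive a filter. A small sketch of building and stripping such a tag follows; the helper names `tagged_address` and `strip_tag` are hypothetical, and other mail providers may not treat `+` this way.

```python
def tagged_address(address: str, tag: str) -> str:
    """Insert '+tag' before the '@' of an email address."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}+{tag}@{domain}"


def strip_tag(address: str) -> str:
    """Remove a '+tag' suffix from the local part, if present."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address
    return f"{local.split('+', 1)[0]}@{domain}"


if __name__ == "__main__":
    tagged = tagged_address("aaronmgriffin@gmail.com", "spam")
    print(tagged)             # aaronmgriffin+spam@gmail.com
    print(strip_tag(tagged))  # aaronmgriffin@gmail.com
```

Because the tag is trivially stripped, this helps with sorting incoming mail more than with hiding the underlying address.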
participants (5)
- Aaron Griffin
- Callan Barrett
- Loui
- Simo Leone
- Thayer Williams