[arch-mirrors] adding a robots.txt
Hey!

Looking at the logs from my mirror (https://arch.jensgutermuth.de/) reveals that at least Google and Ahrefs are crawling it, which is obviously a waste of resources for both sides. I'm thinking about blocking them (and all other crawlers) with a robots.txt served directly by nginx, like so (nginx config snippet):

    location = /robots.txt {
        return 200 "User-agent: *\nDisallow: /\n";
        allow all;
        access_log off;
    }

Doing it this way prevents robots.txt from showing up in directory listings and avoids any issues with the sync script.

I know modifying mirror contents is a very touchy subject, and rightfully so. I therefore wanted to ask whether there is some kind of policy and, if there is, whether this would be allowed or could be a possible exception.

Best regards,
Jens Gutermuth
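[Editor's note: for anyone adapting the snippet above, here is a slightly expanded sketch of the same location block. The default_type directive and the comments are illustrative additions, not part of the original mail; they only make the intent and the returned Content-Type explicit.]

    # Sketch: serve a crawler-blocking robots.txt straight from nginx,
    # without ever writing the file into the mirror tree.
    location = /robots.txt {
        default_type text/plain;                    # assumed addition: make the Content-Type explicit
        return 200 "User-agent: *\nDisallow: /\n";  # disallow everything for all user agents
        access_log off;                             # keep crawler fetches of robots.txt out of the logs
    }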
On Tue 17.07.18 - 01:38, Jens Gutermuth via arch-mirrors wrote:
> I know modifying mirror contents is a very touchy subject and rightfully so. I therefore wanted to ask if there is some kind of policy and if there is, if this would be allowed or a possible exception.
Just returning a robots.txt from nginx sounds fine. I don't see a reason why we should ever have that file in our repo, and I don't expect any problems if you add it. Thanks for asking though!

Florian
participants (2)
- Florian Pritz
- Jens Gutermuth