[arch-mirrors] adding a robots.txt

Jens Gutermuth arch at jensgutermuth.de
Mon Jul 16 23:38:06 UTC 2018


Hey!

Looking at the logs of my mirror (https://arch.jensgutermuth.de/) reveals
that at least Google and Ahrefs are crawling it, which is obviously a
waste of resources for both sides. I'm thinking about blocking them (and
all other crawlers) with a robots.txt file like so (nginx config snippet):

location = /robots.txt {
    # serve the reply as text/plain (crawlers expect robots.txt as plain text)
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
    allow all;
    access_log off;
}
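
For reference, fetching the file afterwards should return exactly the
string from the return directive:

$ curl -s https://arch.jensgutermuth.de/robots.txt
User-agent: *
Disallow: /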

Doing it this way keeps robots.txt out of the directory listings and
avoids any issues with the sync script.
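
(For comparison: a static robots.txt placed on disk would be deleted
again on the next sync by a plain rsync-based script unless explicitly
excluded, e.g. roughly:

rsync -rtlvH --delete-after --exclude=robots.txt rsync://mirror.example/archlinux/ /srv/archlinux/

The exact invocation depends on the sync script, of course.)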

I know modifying mirror contents is a very touchy subject, and rightfully
so. I therefore wanted to ask whether there is a policy on this and, if
so, whether this would be allowed or could be granted as an exception.

Best regards
Jens Gutermuth
