[arch-mirrors] adding a robots.txt
Hey!

Looking at the logs from my mirror (https://arch.jensgutermuth.de/) reveals that at least Google and Ahrefs are crawling it, which is obviously a waste of resources for both sides. I'm thinking about blocking them (and all other crawlers) with a robots.txt served directly by nginx, like so (nginx config snippet):

    location = /robots.txt {
        return 200 "User-agent: *\nDisallow: /\n";
        allow all;
        access_log off;
    }

Doing it this way prevents robots.txt from showing up in directory listings and avoids any issues with the sync script.

I know modifying mirror contents is a very touchy subject, and rightfully so. I therefore wanted to ask whether there is some kind of policy and, if there is, whether this would be allowed or could be a possible exception.

Best regards,
Jens Gutermuth
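[Editor's note: for anyone adapting the snippet above, here is a slightly expanded sketch of the same location block. The default_type directive and the comments are illustrative additions, not part of the original mail; they only make the intent and the returned Content-Type explicit.]

    # Sketch: serve a crawler-blocking robots.txt straight from nginx,
    # without ever writing the file into the mirror tree.
    location = /robots.txt {
        default_type text/plain;                    # assumed addition: make the Content-Type explicit
        return 200 "User-agent: *\nDisallow: /\n";  # disallow everything for all user agents
        access_log off;                             # keep crawler fetches of robots.txt out of the logs
    }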
On Tue 17.07.18 - 01:38, Jens Gutermuth via arch-mirrors wrote:
> I know modifying mirror contents is a very touchy subject and rightfully so. I therefore wanted to ask if there is some kind of policy and if there is, if this would be allowed or a possible exception.
Just returning a robots.txt from nginx sounds fine. I don't see a reason why we should ever have that file in our repo, and I don't expect any problems if you add it. Thanks for asking though!

Florian
participants (2)
- Florian Pritz
- Jens Gutermuth