Hey!

Looking at logs from my mirror (https://arch.jensgutermuth.de/) reveals that at least Google and Ahrefs are crawling my mirror, which is obviously a waste of resources for both sides. I'm thinking about blocking them (and all other crawlers) using a robots.txt file like so (nginx config snippet):

location = /robots.txt {
    allow all;
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
    access_log off;
}

Doing it this way prevents robots.txt from showing up in directory listings and sidesteps any issues with the sync script, since no actual file is added to the mirror tree.

I know modifying mirror contents is a very touchy subject, and rightfully so. I therefore wanted to ask whether there is a policy on this and, if so, whether this approach would be allowed or could be granted as an exception.

Best regards
Jens Gutermuth