<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Hey!<br>

      <br>

      Looking at the logs from my mirror (<a class="moz-txt-link-freetext" href="https://arch.jensgutermuth.de/">https://arch.jensgutermuth.de/</a>)

      shows that at least Google and Ahrefs are crawling it, which

      is a waste of resources for both sides. I'm thinking

      about blocking them (and all other crawlers) with a robots.txt

      file served like so (nginx config snippet):</p>

    <div style="color: #000000;background-color: #ffffff;font-family: DejaVu Sans Mono;font-weight: normal;font-size: 14px;line-height: 19px;white-space: pre;"><div><span style="color: #000000;">    </span><span style="color: #0000ff;">location</span><span style="color: #000000;"> = /robots.txt {</span></div><div><span style="color: #000000;">        return 200 </span><span style="color: #a31515;">"User-agent: *\nDisallow: /\n"</span><span style="color: #008000;">;</span></div><div><span style="color: #000000;">        allow all</span><span style="color: #008000;">;</span></div><div><span style="color: #000000;">        access_log off</span><span style="color: #008000;">;</span></div><div><span style="color: #000000;">    }</span>

</div></div>

    <p>Because nginx generates the response itself instead of reading a

      file from disk, robots.txt does not show up in directory listings

      and the sync script never sees it.<br>

      <br>

      I know modifying mirror contents is a very touchy subject, and

      rightfully so. I therefore wanted to ask whether there is a

      policy on this and, if so, whether this would be allowed or could

      be granted as an exception.</p>
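    <p>As a sanity check on the rule body in the snippet above, here is a
      minimal sketch (Python, standard library only) of how a compliant
      crawler should interpret those two lines. The user-agent names and
      the <code>/some/path/</code> URL are only examples I picked for
      illustration, and it assumes nginx expands <code>\n</code> in the
      quoted string to real newlines. It says nothing about crawlers that
      ignore robots.txt.</p>

<pre>
```python
from urllib.robotparser import RobotFileParser

# Body returned by the nginx location block, with "\n" as real newlines
# (assumption: nginx expands the escape inside the quoted string).
ROBOTS_BODY = "User-agent: *\nDisallow: /\n"

# Mirror URL from this mail; the sub-path used below is hypothetical.
MIRROR = "https://arch.jensgutermuth.de"


def is_allowed(user_agent, url):
    """Return True if the rules in ROBOTS_BODY let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_BODY.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example agent names only; any name falls under the "*" rule.
    for agent in ("Googlebot", "AhrefsBot", "SomeOtherCrawler"):
        for path in ("/", "/some/path/"):
            print(agent, path, is_allowed(agent, MIRROR + path))
```
</pre>

    <p>Every combination should come back as not allowed, since the
      single <code>Disallow: /</code> rule under <code>User-agent: *</code>
      covers all paths for all agents.</p>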

    <p>Best regards<br>

      Jens Gutermuth<br>

    </p>

  </body>

</html>