<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hey!<br>
<br>
The logs from my mirror (<a class="moz-txt-link-freetext" href="https://arch.jensgutermuth.de/">https://arch.jensgutermuth.de/</a>)
show that at least Google and Ahrefs are crawling it, which is a
waste of resources on both sides. I'm thinking about telling them
(and all other crawlers) to stay away with a robots.txt that is
served straight from the nginx config, like so:</p>
<div style="color: #000000;background-color: #ffffff;font-family: DejaVu Sans Mono;font-weight: normal;font-size: 14px;line-height: 19px;white-space: pre;"><div><span style="color: #000000;"> </span><span style="color: #0000ff;">location</span><span style="color: #000000;"> = /robots.txt {</span></div><div><span style="color: #000000;"> return 200 </span><span style="color: #a31515;">"User-agent: *\nDisallow: /\n"</span><span style="color: #008000;">;</span></div><div><span style="color: #000000;"> allow all</span><span style="color: #008000;">;</span></div><div><span style="color: #000000;"> access_log off</span><span style="color: #008000;">;</span></div><div><span style="color: #000000;"> }</span>
</div></div>
<p>Serving robots.txt from the nginx config instead of placing a
      file in the mirror tree keeps it out of the directory listings
      and avoids any interference with the sync script.<br>
<br>
      I know modifying mirror contents is a very touchy subject, and
      rightly so. I therefore wanted to ask whether there is a policy
      on this and, if so, whether this would be allowed or could be
      granted as an exception.</p>
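<p>For reference, here is a minimal sketch of how that location would
      sit next to the normal directory listing. The server_name, the
      root path and the autoindex block are placeholders made up for
      illustration, not my actual config, and default_type is an extra
      line added on the assumption that the response should be
      labelled as plain text even if no types mapping applies:</p>
<pre style="font-family: DejaVu Sans Mono;font-size: 14px;line-height: 19px;">
server {
    server_name mirror.example.org;   # placeholder hostname
    root /srv/mirror;                 # placeholder: directory the sync script writes to

    # Regular mirror content, served from disk with directory listings.
    location / {
        autoindex on;
    }

    # Exact-match location: answered from the config itself, so no
    # robots.txt file has to exist under the root.
    location = /robots.txt {
        default_type text/plain;      # assumption: force a plain-text Content-Type
        return 200 "User-agent: *\nDisallow: /\n";
        allow all;
        access_log off;
    }
}
</pre>
<p>Since the exact-match location takes precedence over the prefix
      location, requests for /robots.txt never touch the disk, and the
      listing only shows what the sync script put there.</p>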
<p>Best regards<br>
Jens Gutermuth<br>
</p>
</body>
</html>