[arch-mirrors] CDN based/caching mirror?

Kristian Klausen kristian at klausen.dk
Mon Jan 27 08:50:29 UTC 2020


On 26.01.2020 21.52, David Precious wrote:
> On Sun, 26 Jan 2020 17:19:10 +0100
> Kristian Klausen via arch-mirrors <arch-mirrors at archlinux.org> wrote:
>> So instead of mirroring the whole thing, the idea is to mirror only
>> the database files (core.db etc) and download the packages on demand
>> from a Tier 1 mirror (and let nginx cache them). By doing it that
>> way, I only download requested packages from the Tier 1 mirrors,
>> instead of downloading the whole thing (saving Tier 1 bandwidth).
> I'm not quite sure what problem you're trying to solve - tier 1 servers
> have plenty of bandwidth, otherwise they shouldn't be running such a
> mirror, and I'd wager that downstream mirrors syncing occasionally
> pales in comparison to end user traffic, so I don't think you need to
> really worry about the upstream.
>
> If your concern is *your* bandwidth or disk space, then you probably
> shouldn't be setting up a public mirror at all - assuming, of course,
> that it is a public mirror you're talking about here, and not just an
> internal network cache to point your boxes at so that you only
> download each package once, not once for every machine.
>
>> To provide even better performance a CDN (e.g. Cloudflare) could be
>> used to provide more caching.
> Others have already addressed that this may break Cloudflare's terms,
> as they're designed to optimise websites by hosting HTML/JS.

Valid point.

>> Am I missing something? Is this a bad idea?
> Immediate thought is that the first request for each package could seem
> unacceptably slow, as your mirror would have to fetch it first before
> it could serve it to the client, and for larger packages, that could
> begin to make it feel slow (especially if also doing that for ISOs,
> etc).

Valid point. That could in theory be mitigated by downloading each
package from multiple servers in parallel, at the cost of a more
complex setup.
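
A rough sketch of what I mean by a parallel download, assuming the
upstream mirrors support HTTP Range requests and the file size is known
up front (e.g. from a HEAD request). The function names and the chunk
size are made up for illustration, and the whole file is buffered in
memory, which a real setup would avoid by streaming to the client:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def plan_ranges(size, mirrors, chunk_size):
    """Split [0, size) into chunks and assign them to mirrors round-robin."""
    plan = []
    for i, start in enumerate(range(0, size, chunk_size)):
        end = min(start + chunk_size, size) - 1  # HTTP byte ranges are inclusive
        plan.append((mirrors[i % len(mirrors)], start, end))
    return plan


def http_fetch_range(base_url, path, start, end):
    """Fetch bytes start..end of path from one mirror via a Range header."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    req = Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urlopen(req, timeout=30) as resp:
        if resp.status != 206:  # server ignored the Range header
            raise OSError(f"{url}: expected 206, got {resp.status}")
        return resp.read()


def parallel_download(path, size, mirrors, chunk_size=4 * 1024 * 1024,
                      fetch_range=http_fetch_range):
    """Download path in chunks spread over several mirrors, then reassemble."""
    plan = plan_ranges(size, mirrors, chunk_size)
    with ThreadPoolExecutor(max_workers=len(mirrors)) as pool:
        # pool.map yields results in plan order, so the chunks join correctly
        parts = pool.map(lambda job: fetch_range(job[0], path, job[1], job[2]),
                         plan)
        return b"".join(parts)
```

Whether this actually beats a single fast Tier 1 connection would
depend on the upstreams, so treat it as an illustration only.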

>    It also means that if your upstream is temporarily down, you
> have an incomplete mirror which appears reachable but fails to serve
> some files, which is probably not ideal.

The idea was to fall back to another mirror on errors or a 404.
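
Roughly like this, as a minimal sketch: try the upstreams in order and
move on to the next one on any HTTP or network error. The function
names and the mirror URLs in the usage note are placeholders, not real
Tier 1 mirrors:

```python
from urllib.request import urlopen


def http_get(url):
    """Plain GET; urlopen raises HTTPError (an OSError) on 404 and 5xx."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_with_fallback(path, mirrors, get=http_get):
    """Try each mirror in order; return (mirror, body) from the first success."""
    errors = []
    for base_url in mirrors:
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        try:
            return base_url, get(url)
        except OSError as exc:  # covers HTTPError, URLError and timeouts
            errors.append(f"{base_url}: {exc}")
    raise OSError(f"all mirrors failed for {path}: " + "; ".join(errors))
```

For example, fetch_with_fallback("core/os/x86_64/core.db",
["https://mirror-a.example.org/archlinux",
"https://mirror-b.example.org/archlinux"]) would only contact the
second mirror if the first one failed.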

> To me, it feels rather like you're trying to solve a problem which
> doesn't really exist.

Roger that. It was just a "crazy" idea to run a mirror without mirroring
everything (requiring less storage) and to put a CDN in front of it
(like deb.debian.org). But as the Arch project seems to have more than
enough mirrors, the idea doesn't make sense.
Thanks for your time everyone!

>
> Cheers
>
> Dave P

