[arch-mirrors] [sysadmin] CDN based/caching mirror?

Kristian Klausen kristian at klausen.dk
Mon Feb 3 02:34:48 UTC 2020


On 30.01.2020 17.04, Konstantin Ryabitsev wrote:
> On Sun, 26 Jan 2020 at 11:19, Kristian Klausen via arch-mirrors
> <arch-mirrors at archlinux.org> wrote:
>> I'm considering setting up an Arch Linux mirror and I'm considering a
>> different design.
>>
>> So instead of mirroring the whole thing, the idea is to mirror only the
>> database files (core.db etc) and download the packages on demand from a
>> Tier 1 mirror (and let nginx cache them). By doing it that way, I only
>> download requested packages from the Tier 1 mirrors, instead of
>> downloading the whole thing (saving Tier 1 bandwidth).
>>
>> To provide even better performance a CDN (ex: Cloudflare) could be used
>> to provide more caching. So we end up with a setup like this:
>> Cloudflare -> Nginx cache -> Tier1 mirrors (nginx with multiple upstream)
>>
>> Am I missing something? Is this a bad idea?
> If you are trying to save Tier1 some bandwidth, you'll probably
> actually end up causing them more problems due to increased random
> seek waits. Tier1 mirrors may not necessarily have fast storage -- for
> example, all kernel.org mirror nodes have terabytes of spinning rust
> and about half-a-TB of ssd used via lvm-cache. It works great for
> Tier1 setups because most Tier2 mirrors want the same set of recent
> updates that are served out of ssd cache. If a new mirror comes along
> and wants to slurp an entire distro, that is fine too, because even
> if there's higher iowait latency, the Tier2 mirror isn't working
> against any HTTP timeouts or impatient clients and doesn't care if the
> data arrives at a slower rate due to higher iowait. Tier1 can also
> tell Tier2 mirror "I'm overloaded right now, please try again later"
> and it'll be fine as most Tier2 mirrors can wait an hour or two before
> receiving updates.
>
> Making Tier1 mirrors a "cold cache" for your setup will likely cause
> more disk thrash for them, but will also result in poorer service for
> people using your mirror due to the reasons I listed above.

Tier 1 mirrors are also used directly by end users (correct me if I'm
wrong), so in the worst case (cache miss) my SSD-backed shared cache
won't be noticeably slower than pulling directly from the Tier 1 mirror.
In the best case (cache hit) I'm saving the Tier 1 mirror some bandwidth
and disk usage.
My idea is basically tiered caching (CDN -> Nginx SSD-backed shared
cache -> Tier 1 mirror(s)). Is that worse than the status quo? :)
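
To make the layering concrete, here is a toy Python model of the lookup
path. It is only a sketch, not an nginx or Cloudflare configuration: the
Tier class, the tier1_origin function and the package path are all made
up for illustration, and it ignores expiry, eviction and concurrency.

```python
class Tier:
    """One cache layer that falls through to the next layer on a miss."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream  # another Tier, or a callable origin
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.store:
            self.hits += 1
            return self.store[path]
        self.misses += 1
        # Fall through to the next layer; an origin is any callable.
        if isinstance(self.upstream, Tier):
            data = self.upstream.get(path)
        else:
            data = self.upstream(path)
        self.store[path] = data
        return data


origin_requests = []


def tier1_origin(path):
    """Stand-in for a Tier 1 mirror; records every request it receives."""
    origin_requests.append(path)
    return b"package bytes for " + path.encode()


# CDN -> nginx shared cache -> Tier 1 (hypothetical names throughout)
nginx_cache = Tier("nginx", tier1_origin)
cdn = Tier("cdn", nginx_cache)

pkg = "core/os/x86_64/example-1.0-1-x86_64.pkg.tar.zst"  # hypothetical path
cdn.get(pkg)  # cold: falls through both layers to the Tier 1 stand-in
cdn.get(pkg)  # warm: answered by the CDN layer alone
print(len(origin_requests), cdn.hits, nginx_cache.misses)
```

In this model the Tier 1 stand-in only sees the first request for a
given package; every repeat is absorbed by one of the layers in front.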

> If someone
> tries to install a package and watches their download bar sit at 0 for
> half a minute due to backend proxies fetching data from Tier1 origin,
> that's going to result in frustrated people.

Nginx streams the data to the client as it is received from the upstream
server, so even in the worst case (cache miss) the data can be delivered
as fast as it arrives from the upstream server.
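
As a rough illustration of that behaviour (a sketch of the idea only,
not of how nginx implements it), a proxy can hand each chunk to the
client as soon as it arrives and store the complete body afterwards.
The stream_and_cache function, the dict used as a cache and the "pkg"
key are hypothetical:

```python
import io


def stream_and_cache(upstream, cache, key, chunk_size=64 * 1024):
    """Yield chunks as they arrive; cache the body once it is complete."""
    received = []
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:
            break
        received.append(chunk)
        yield chunk  # the client gets this chunk before the next one is read
    # Only a fully received body is stored in the cache.
    cache[key] = b"".join(received)


cache = {}
upstream = io.BytesIO(b"x" * 200_000)  # stand-in for the Tier 1 response body
chunks = list(stream_and_cache(upstream, cache, "pkg"))
```

The client sees the first chunk after a single upstream read, so the
cache layer adds no "wait for the whole file" delay on a miss.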

> TL;DR: If you can afford CDN-fronting your mirror, that should be
> mostly fine, but I would recommend against using Tier1 as your
> cache-miss backend. Storage is cheap and most Tier1 mirrors have
> unlimited bandwidth, so just run a Tier2 mirror (with slow/fast
> storage caching) and keep local copies of everything.
>
> -K
> (mirrors.kernel.org administrator)

