[arch-dev-public] [RFC] Mirror load distribution
Hi,

I've just received a report from a mirror admin about some very heavy traffic. After some investigation, it appears that the traffic to his mirror started to rise around the beginning of the new year, when we disabled the mirror checker on gerolde. Since we now only have a mirror checker running in Germany, and his server is actually in the same data centre as ours, the mirror checks against it complete very quickly.

Archweb uses this data to calculate a "mirror score", which can be seen here[1]. This score can also be used to sort the mirror list that archweb generates, like this[2].

[1] https://www.archlinux.org/mirrors/status/
[2] https://www.archlinux.org/mirrorlist/?use_mirror_status=on&protocol=https

Apparently there is a script in the AUR[3] which uses [2] to fetch a mirrorlist. That script runs once a day, so its users likely all end up with the same best-scoring mirror, currently his, at the top of their list.

[3] https://aur.archlinux.org/packages/update-pacman-mirrorlist/

I'm thinking about removing the mirror score from archweb's output and, more importantly, not sorting mirrors by this score but instead randomizing the list returned by [2]. It could still take the score into account by limiting the returned set to mirrors that are not totally out of date, but I'd remove the sorting. The score doesn't carry much meaning anyway, since it is measured only from our point of view.

Does anyone have hard feelings about this? If not, I'll prepare a patch in the next few days.

Florian
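The load problem described above can be illustrated with a small simulation. This is a sketch under explicit assumptions. The mirror names and scores are hypothetical. As on archweb's status page, a lower score is better. Each simulated client is assumed to send its traffic to the first server in its list, which roughly approximates pacman trying servers in the order they are listed.

```python
import random
from collections import Counter

# Hypothetical mirrors and scores (lower is better); the values are invented
# for illustration and are not taken from archweb.
MIRRORS = {
    "mirror-a.example.org": 0.4,  # e.g. in the same data centre as the checker
    "mirror-b.example.org": 1.9,
    "mirror-c.example.org": 2.7,
    "mirror-d.example.org": 5.3,
}


def sorted_list(mirrors):
    """Every client gets the same order: best (lowest) score first."""
    return sorted(mirrors, key=mirrors.get)


def shuffled_list(mirrors, rng):
    """Every client gets an independently shuffled order."""
    order = list(mirrors)
    rng.shuffle(order)
    return order


def first_choice_counts(make_list, clients):
    """Count which mirror ends up first in each simulated client's list.

    Assumption: the first entry receives most of a client's traffic,
    because servers are tried in listed order.
    """
    return Counter(make_list()[0] for _ in range(clients))


if __name__ == "__main__":
    rng = random.Random(42)
    clients = 10_000
    print("sorted:  ", first_choice_counts(lambda: sorted_list(MIRRORS), clients))
    print("shuffled:", first_choice_counts(lambda: shuffled_list(MIRRORS, rng), clients))
```

With the sorted list, all 10,000 simulated clients put mirror-a.example.org first. With per-client shuffling, each of the four mirrors is picked first by roughly a quarter of the clients.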
On 01/30/17 at 08:39pm, Florian Pritz via arch-dev-public wrote:
> Hi,
> I've just received a report from a mirror admin about some very heavy traffic. After some investigation it appears that the traffic towards his mirror started to rise around the beginning of the new year when we disabled the mirror checker on gerolde. Since we now only have a mirror checker running in Germany and his server is actually in the same data centre as ours, the mirror checks completed very quickly.
I can't think of an elegant solution for this issue.
> I'm thinking about removing the mirror score from archweb's output and more importantly, not sorting mirrors based on this score but rather randomizing the list returned in [2]. It could still take the score into account by limiting the returned set to mirrors that are not totally out of date, but I'd remove the sorting. The score doesn't really have a lot of meaning anyway since it's just from our point of view.
> Does anyone have hard feelings about this? If not I'll prepare a patch in the next few days.
Idea sounds good to me. Don't forget that you can also 'generate a mirrorlist' here[1], so you might want to remove the 'use mirror status' option there too (or was that not part of the plan?).

[1] https://www.archlinux.org/mirrorlist/

-- 
Jelle van der Waa
On 30.01.2017 21:57, Jelle van der Waa wrote:
> Idea sounds good to me. Don't forget that you can also 'generate a mirrorlist' here, so you might want to remove the 'use mirror status' option there too (or was that not part of the plan?) [1]
I plan to keep this, because I'm only removing the sorting by score. The score will still be used to filter out mirrors that have a very high score (currently >100). Such mirrors are either very, very slow or out of date. Using the score (mirror status) just to filter the list should return a list of usable mirrors.

Without this option, archweb would return a list that contains all mirrors, even those that are out of date.

Florian
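The filter-then-shuffle approach described above could look roughly like the following Python sketch. `MirrorUrl`, `build_mirrorlist` and `format_mirrorlist` are hypothetical names, not archweb's actual models or views. The cut-off of 100 comes from the thread. Treating mirrors that have no score at all as unusable is an assumption.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Cut-off from the thread: mirrors scoring above this are dropped.
SCORE_CUTOFF = 100.0


@dataclass
class MirrorUrl:
    # Hypothetical, simplified stand-in for a mirror URL record.
    url: str
    protocol: str
    score: Optional[float]  # None = no checker data (assumption)


def build_mirrorlist(urls: List[MirrorUrl], protocol: str = "https",
                     use_mirror_status: bool = True,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Return mirror URLs in random order, optionally dropping bad scores."""
    rng = rng or random.Random()
    candidates = [u for u in urls if u.protocol == protocol]
    if use_mirror_status:
        # The score only filters; it no longer decides the order.
        candidates = [u for u in candidates
                      if u.score is not None and u.score <= SCORE_CUTOFF]
    rng.shuffle(candidates)
    return [u.url for u in candidates]


def format_mirrorlist(urls: List[str]) -> str:
    # pacman mirrorlist syntax; pacman itself expands $repo and $arch.
    return "\n".join(f"Server = {url}$repo/os/$arch" for url in urls)
```

Because the ordering now comes from the shuffle, two clients fetching the list at the same moment will usually get different first entries. Passing `use_mirror_status=False` corresponds to the unfiltered list that includes out-of-date mirrors.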
On January 30, 2017 19:13, Florian Pritz via arch-dev-public wrote:
> My plan is to keep this because I just remove the sorting by score. The score will still be used to filter mirrors which have a very high one (currently >100). Such mirrors are either very, very slow or out of date. Using the score (mirror status) just to filter the list should return a list of usable mirrors.
> Without this option, archweb would return a list that contains all mirrors, even if they are out of date.
Florian,

Django has support for GeoIP. We could use this information to return different mirror lists based on the IP address of the client accessing the mirror list page. Most likely (but not always), the servers in the user's country are faster than the others.

I was thinking the same regarding the score: keep it for our own metrics, but don't use it for sorting. If we can combine it with GeoIP, that would be great. Otherwise, a random list is OK with me, even though it might not be great for users.

Cheers,
Giancarlo Razzolini
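The GeoIP suggestion can be combined with random ordering. Mirrors in the client's country go first, and each group is shuffled separately so that load still spreads across the mirrors within a country. This is a hypothetical sketch. It assumes the client's country code has already been looked up (for example with Django's GeoIP2 wrapper) and is `None` when the lookup fails. It also assumes the mirrors passed in have already survived the score filter.

```python
import random
from typing import List, Optional, Tuple


def geo_mirrorlist(mirrors: List[Tuple[str, str]],
                   client_country: Optional[str],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Order (url, country_code) pairs so the client's country comes first.

    Mirrors elsewhere are kept as a fallback, since local servers are
    only "most likely (but not always)" faster.
    """
    rng = rng or random.Random()

    def is_local(country_code: str) -> bool:
        return client_country is not None and country_code == client_country

    local = [url for url, cc in mirrors if is_local(cc)]
    other = [url for url, cc in mirrors if not is_local(cc)]
    # Shuffle within each group so no single mirror always ends up first.
    rng.shuffle(local)
    rng.shuffle(other)
    return local + other
```

If the lookup fails, every mirror falls into the "other" group, and the result degrades to the plain random list.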
No hard feelings from this side of the room.

Have we ever considered using software like mirrorbrain, so users could just hit ${country_code}.mirrors.archlinux.org to get the packages?

Bartłomiej
On 01.02.2017 20:08, Bartłomiej Piotrowski wrote:
> No hard feelings from this side of the room. Have we ever considered using software like mirrorbrain so users could just hit ${country_code}.mirrors.archlinux.org to get the packages?
I don't think adding a single point of failure is a good idea. Also, we have quite a lot of packages, so there would be a lot of requests per second.

Florian
participants (4)
- Bartłomiej Piotrowski
- Florian Pritz
- Giancarlo Razzolini
- Jelle van der Waa