[arch-general] Syncing the mirrors
We have a big update today, and it shows that the syncing process is not really good. So I suggest changing the way mirror syncs are done: we should have primary and secondary mirrors. When al.org is updated, the sync process on the primary mirrors should be started via ssh (or something similar). The primary mirrors then start the sync process on the secondary mirrors via ssh. That way the al.org server isn't overloaded, and because of the push model the mirrors are more up to date. -- Regards, Benedikt
We have a big update today, and it shows that the syncing process is not really good. So I suggest changing the way mirror syncs are done: we should have primary and secondary mirrors. When al.org is updated, the sync process on the primary mirrors should be started via ssh (or something similar). The primary mirrors then start the sync process on the secondary mirrors via ssh. That way the al.org server isn't overloaded, and because of the push model the mirrors are more up to date.
Thanks for signing that message, I wasn't sure it was from you. The problem here is that we haven't had anyone step up and *finish* a two-tier mirror system. The situation has improved a bit, but without a developer actively working on it, we aren't going to have it fully implemented. As far as pushing goes, that is a bad idea for a number of reasons, the primary one being that a single compromised root server gains you ssh access to X more servers. -Dan
2010/1/31 Dan McGee <dpmcgee@gmail.com>:
As far as pushing goes, that is a bad idea for a number of reasons, the primary one being that a single compromised root server gains you ssh access to X more servers.
-Dan
I didn't say that it must be root. A user whose only permission is to run rsync would be right for this task. -- Regards, Benedikt
On 01/31/2010 04:30 PM, Benedikt Müller wrote:
2010/1/31 Dan McGee<dpmcgee@gmail.com>:
As far as pushing goes, that is a bad idea for a number of reasons, the primary one being that a single compromised root server gains you ssh access to X more servers.
-Dan
I didn't say that it must be root. A user whose only permission is to run rsync would be right for this task.
Heh. "Compromised" might make even more sense then: that user can still modify packages and the db without any restrictions. -- Ionut
On 01/31/2010 03:27 PM, Dan McGee wrote:
As far as pushing goes, that is a bad idea for a number of reasons, the primary one being that a single compromised root server gains you ssh access to X more servers.
This can be solved easily by using forced commands: http://oreilly.com/catalog/sshtdg/chapter/ch08.html#22858
-- Florian Pritz -- {flo,bluewind}@server-speed.net
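Florian's forced-command suggestion can be illustrated with a small Python sketch that assembles one line of a mirror's `~/.ssh/authorized_keys`. The option names (`command=`, `from=`, `no-port-forwarding`, `no-X11-forwarding`, `no-agent-forwarding`, `no-pty`) are standard OpenSSH ones; the key, the script path `/usr/local/bin/sync-mirror` and the upstream address are hypothetical placeholders. With such an entry, whoever holds the upstream's key can only start the mirror's own sync script, which then pulls from upstream; they get no shell, no forwarding and no way to upload files. It does not address Ionut's point that a compromised upstream can still serve bad packages.

```python
"""Build a restricted authorized_keys entry for triggering a mirror sync over ssh."""

# Standard OpenSSH authorized_keys options that disable everything except
# running the forced command.
RESTRICTIONS = (
    "no-port-forwarding",
    "no-X11-forwarding",
    "no-agent-forwarding",
    "no-pty",
)


def forced_command_entry(pubkey: str, command: str, from_host: str) -> str:
    """Return one authorized_keys line that only runs `command`, only from `from_host`."""
    if '"' in command or '"' in from_host:
        # The option values are double-quoted, so embedded quotes would break parsing.
        raise ValueError("command and from_host must not contain double quotes")
    options = [f'command="{command}"', f'from="{from_host}"', *RESTRICTIONS]
    return ",".join(options) + " " + pubkey


if __name__ == "__main__":
    # Hypothetical values: the key, script path and upstream address are placeholders.
    print(forced_command_entry(
        pubkey="ssh-rsa AAAAB3Nza...example upstream-trigger",
        command="/usr/local/bin/sync-mirror",
        from_host="192.0.2.10",
    ))
```

OpenSSH also passes whatever command the client asked for in the `SSH_ORIGINAL_COMMAND` environment variable, so the forced script can ignore it or validate it.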
On Sunday, 31 January 2010 15:27:03, Dan McGee wrote:
Thanks for signing that message, I wasn't sure it was from you.
OT: Can't we strip GPG signatures from the mailing list? They're of no use. Use S/MIME instead ;-)
The problem here is that we haven't had anyone step up and *finish* a two-tier mirror system. The situation has improved a bit, but without a developer actively working on it, we aren't going to have it fully implemented.
There are several ways to improve the situation:
* Multi-tier mirroring. Roman started working on this but might need some help here. It's mostly an organizational task.
* Add support for both gz- and xz-compressed packages to dbscripts. This way we could migrate to the much better xz compression and reduce traffic: http://bugs.archlinux.org/task/17280
* Implement a common package pool and link to those packages from every repo. This would have reduced the amount of transferred data from several GB to a few KB in the current case. (Also a dbscripts issue.)
dbscripts can be found at http://projects.archlinux.org/dbscripts.git/. Everybody can help implement this and submit patches for us to review.
Pierre -- Pierre Schmitz, https://users.archlinux.de/~pierre
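Pierre's third suggestion (a common package pool) can be sketched in Python as follows. This is not how dbscripts does it; the `pool/` directory and the `<repo>/os/i686/` layout are assumptions for illustration. Each package file is stored once, and every repo directory only holds a relative symlink to it, so moving a package from [testing] to [extra] replaces two tiny links and leaves the package data untouched. If mirrors copy symlinks as symlinks (rsync's `-l`), such a move should cost them almost nothing.

```python
"""Sketch of a shared package pool: files live once in pool/, repos hold symlinks."""
import os
import tempfile
from pathlib import Path


def link_into_repo(pool: Path, repo_dir: Path, filename: str) -> Path:
    """Make repo_dir/filename a relative symlink to pool/filename."""
    target = pool / filename
    if not target.is_file():
        raise FileNotFoundError(target)
    repo_dir.mkdir(parents=True, exist_ok=True)
    link = repo_dir / filename
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative target keeps the link valid after the tree is copied to a mirror.
    link.symlink_to(os.path.relpath(target, repo_dir))
    return link


def move_package(pool: Path, src_repo: Path, dst_repo: Path, filename: str) -> None:
    """'Move' a package between repos by swapping symlinks; pool data is untouched."""
    link_into_repo(pool, dst_repo, filename)
    (src_repo / filename).unlink()


def demo() -> dict:
    """Move one fake package from testing to extra inside a temporary tree."""
    name = "foo-1.0-1-i686.pkg.tar.gz"
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        pool = root / "pool"
        pool.mkdir()
        (pool / name).write_bytes(b"package payload")
        testing = root / "testing" / "os" / "i686"
        extra = root / "extra" / "os" / "i686"
        link_into_repo(pool, testing, name)
        move_package(pool, testing, extra, name)
        return {
            "extra_is_link": (extra / name).is_symlink(),
            "extra_content": (extra / name).read_bytes(),
            "testing_has_it": os.path.lexists(testing / name),
            "link_target": os.readlink(extra / name),
        }


if __name__ == "__main__":
    print(demo())
```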
Hi,
There are several ways to improve the situation: * Multi-tier mirroring. Roman started working on this but might need some help here. It's mostly an organizational task.
I strongly second that. Having a geographically organized hierarchy would be nice, so that there is a tier-1 mirror in every country (except where a country has only, say, two mirrors; those could sync from the geographically closest tier-1 mirror). The tier-1 machines would be ones with a reasonably good connection; only they (mirror.us, mirror.eu, mirror.de and the like) would sync from rsync.archlinux.org, and all other mirrors would sync from them. As Pierre pointed out, this is mostly an organizational problem: selecting the tier-1 mirrors and asking their owners whether they will support this plan. For example, for the tier-1 mirrors we could ask the kernel.org people for the main US and EU mirrors, and for Germany the owner of ftp-stud.hs-esslingen.de or maybe Hosteurope. I don't know the situation in countries other than Germany, but IIRC we have a lot of mirrors here.
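The upstream-selection rule Hannes describes could look like this in Python. The names `mirror.de` and `mirror.us` come from his examples; the coordinates, country codes and the exact rule (a tier-1 mirror in the same country first, otherwise the nearest tier-1 by great-circle distance) are assumptions, and geographic distance is only a rough proxy for network distance.

```python
"""Pick an upstream for each mirror in a geographic two-tier layout (sketch)."""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    name: str
    country: str
    lat: float
    lon: float
    tier1: bool = False


def distance_km(a: Mirror, b: Mirror) -> float:
    """Great-circle distance between two mirrors (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def choose_upstream(mirror: Mirror, tier1s: list, master: str = "rsync.archlinux.org") -> str:
    """Tier-1 mirrors sync from the master; others from a tier-1 mirror."""
    if mirror.tier1:
        return master
    same_country = [t for t in tier1s if t.country == mirror.country]
    candidates = same_country or tier1s
    return min(candidates, key=lambda t: distance_km(mirror, t)).name


if __name__ == "__main__":
    # Hypothetical tier-1 set and coordinates.
    tiers = [Mirror("mirror.de", "DE", 48.78, 9.18, True), Mirror("mirror.us", "US", 37.4, -122.0, True)]
    print(choose_upstream(Mirror("ftp.example.at", "AT", 48.2, 16.37), tiers))
```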
* Add support for both gz- and xz-compressed packages to dbscripts. This way we could migrate to the much better xz compression and reduce traffic: http://bugs.archlinux.org/task/17280
I don't know much about the package system, but if there's a 25% decrease in package size, that sounds like something we'd want anyway. Just my 2 cents on this topic. regards, Hannes 'hrist' Rist
-- 
Hannes Rist
+----------------------------------------------------+
| Crew Selfnet e.V.            NOC: admin@selfnet.de |
| Allmandring 8A               http://www.selfnet.de |
| 70569 Stuttgart              Fax: +49 711 620 4796 |
+----------------------------------------------------+
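Whether xz really gives the roughly 25% saving mentioned here depends on the package contents. A quick way to measure it for a given file is to compress the same bytes both ways with Python's standard `gzip` and `lzma` modules; gzip level 9 and xz preset 6 are chosen here as assumptions, and this is not what makepkg or dbscripts run.

```python
"""Compare gzip and xz output sizes for the same input (stdlib only)."""
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return raw, gzip -9 and xz -6 sizes in bytes, after checking both round-trip."""
    gz = gzip.compress(data, compresslevel=9)
    xz = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)
    if gzip.decompress(gz) != data or lzma.decompress(xz) != data:
        raise RuntimeError("round-trip mismatch")
    return {"raw": len(data), "gz": len(gz), "xz": len(xz)}


def saving_percent(sizes: dict) -> float:
    """How much smaller the xz output is than the gzip output, in percent."""
    return 100.0 * (sizes["gz"] - sizes["xz"]) / sizes["gz"]


if __name__ == "__main__":
    # Synthetic, file-list-like payload; read a real uncompressed package
    # tarball instead to get a meaningful number.
    sample = "".join(f"usr/lib/libexample.so.{i}\n" for i in range(5000)).encode()
    sizes = compressed_sizes(sample)
    print(sizes, f"xz saves {saving_percent(sizes):.1f}% vs gzip")
```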
On 31 January 2010 17:05, Hannes Rist <hrist@selfnet.de> wrote:
Hi,
There are several ways to improve the situation: * Multi-tier mirroring. Roman started working on this but might need some help here. It's mostly an organizational task.
I strongly second that. Having a geographically organized hierarchy would be nice, so that there is a tier-1 mirror in every country (except where a country has only, say, two mirrors; those could sync from the geographically closest tier-1 mirror). The tier-1 machines would be ones with a reasonably good connection; only they (mirror.us, mirror.eu, mirror.de and the like) would sync from rsync.archlinux.org, and all other mirrors would sync from them. As Pierre pointed out, this is mostly an organizational problem: selecting the tier-1 mirrors and asking their owners whether they will support this plan. For example, for the tier-1 mirrors we could ask the kernel.org people for the main US and EU mirrors, and for Germany the owner of ftp-stud.hs-esslingen.de or maybe Hosteurope. I don't know the situation in countries other than Germany, but IIRC we have a lot of mirrors here.
* Add support for both gz- and xz-compressed packages to dbscripts. This way we could migrate to the much better xz compression and reduce traffic: http://bugs.archlinux.org/task/17280
I don't know much about the package system, but if there's a 25% decrease in package size, that sounds like something we'd want anyway.
Just my 2 cents on this topic. regards, Hannes 'hrist' Rist
Hi all, I think the syncing would be much less painful if there were some way to tell mirrors that package foo has been moved from [testing] to [extra]. Then these rebuilds would only be a matter of distributing the information about which packages should be moved out of [testing] (which could be done with a single text file). Lukas
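Lukáš's "one text file" idea can be sketched as a manifest that each mirror applies locally before its normal sync. The manifest format (one `filename from-repo to-repo` per line, `#` for comments) and the flat `<repo>/<file>` layout are hypothetical. Because a rename keeps a file's size and modification time, a following rsync run that preserves times (`-t`) should find the file already in place at its new path and skip it.

```python
"""Apply a hypothetical 'moves' manifest on a mirror instead of re-downloading files."""
import tempfile
from pathlib import Path


def parse_manifest(text: str) -> list:
    """Parse lines of the form '<filename> <from-repo> <to-repo>'; '#' starts a comment."""
    moves = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {raw!r}")
        moves.append((parts[0], parts[1], parts[2]))
    return moves


def apply_moves(root: Path, moves) -> int:
    """Rename files between repo directories; return how many were moved."""
    moved = 0
    for filename, src, dst in moves:
        src_path, dst_path = root / src / filename, root / dst / filename
        if dst_path.exists() or not src_path.exists():
            continue  # already applied, or nothing local to move: the normal sync handles it
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        src_path.rename(dst_path)  # same file system: cheap, keeps size and mtime
        moved += 1
    return moved


def demo() -> tuple:
    manifest = "# file from to\nfoo-1.0-1-i686.pkg.tar.gz testing extra\n"
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "testing").mkdir()
        (root / "testing" / "foo-1.0-1-i686.pkg.tar.gz").write_bytes(b"x")
        moves = parse_manifest(manifest)
        first, second = apply_moves(root, moves), apply_moves(root, moves)
        in_extra = (root / "extra" / "foo-1.0-1-i686.pkg.tar.gz").exists()
        return moves, first, second, in_extra


if __name__ == "__main__":
    print(demo())
```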
On Sunday, 31 January 2010 17:14:05, Lukáš Jirkovský wrote:
I think the syncing would be much less painful if there were some way to tell mirrors that package foo has been moved from [testing] to [extra]. Then these rebuilds would only be a matter of distributing the information about which packages should be moved out of [testing] (which could be done with a single text file).
See my third suggestion. -- Pierre Schmitz, https://users.archlinux.de/~pierre
On 31 January 2010 17:15, Pierre Schmitz <pierre@archlinux.de> wrote:
On Sunday, 31 January 2010 17:14:05, Lukáš Jirkovský wrote:
I think the syncing would be much less painful if there were some way to tell mirrors that package foo has been moved from [testing] to [extra]. Then these rebuilds would only be a matter of distributing the information about which packages should be moved out of [testing] (which could be done with a single text file).
See my third suggestion.
I didn't understand what you meant the first time, but I think I get it now. If I understand correctly, you mean having all packages in one directory on the server, with the repos differentiated by some text files or symlinks. The difference is really small (keeping all packages in one place and linking to them vs. keeping the current repository layout and moving files between directories on the server).
On Sun, 31 Jan 2010 17:24:22 +0100 Lukáš Jirkovský <l.jirkovsky@gmail.com> wrote:
I didn't understand what you meant the first time, but I think I get it now. If I understand correctly, you mean having all packages in one directory on the server, with the repos differentiated by some text files or symlinks. The difference is really small (keeping all packages in one place and linking to them vs. keeping the current repository layout and moving files between directories on the server).
The difference is big, because rsync (which mirrors use to sync with us) doesn't/cannot know that a file has moved. It deletes the old file and downloads it again under the new name/path. Dieter
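Dieter's point can be shown with a toy Python model of path-based mirroring. It matches files by path only and counts any changed file in full (it ignores rsync's delta transfer, which would not help here anyway, since nothing exists yet at the new path). A package moved from testing/ to extra/ then shows up as one full download plus one deletion. The paths are illustrative.

```python
"""Toy model of path-based mirroring: a moved file becomes a delete plus a full download."""


def plan_sync(upstream: dict, mirror: dict):
    """Both dicts map a relative path to the file's bytes.

    Returns (paths to fetch, paths to delete, bytes fetched). Files are matched
    by path only; changed files are counted in full (delta transfer is ignored).
    """
    to_fetch = sorted(p for p, data in upstream.items() if mirror.get(p) != data)
    to_delete = sorted(p for p in mirror if p not in upstream)
    fetched = sum(len(upstream[p]) for p in to_fetch)
    return to_fetch, to_delete, fetched


if __name__ == "__main__":
    pkg = b"\0" * 1_000_000  # stand-in for a 1 MB package
    before = {"testing/os/i686/foo-1.0-1-i686.pkg.tar.gz": pkg}
    after = {"extra/os/i686/foo-1.0-1-i686.pkg.tar.gz": pkg}
    print(plan_sync(after, before))
```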
We have a big update today, and it shows that the syncing process is not really good.
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages. So "pacman -Syu" errors out because it can't download some packages. So I either have to switch to the throttled ftp.archlinux.org or wait several days for the mirror to finally sync up. -- damjan
I'm also having major problems upgrading my system. Pacman errors out and tells me I have 77 packages to update, and I was current two days ago. I believe it is 77 and counting; last night it was 66. I'm using the kernel.org site for my packages now. I used to use easynews, but it got out of date for a while, so I switched. kernel.org isn't going bad, is it?
On Mon, Feb 1, 2010 at 9:01 PM, Steve Holmes <steve.holmes88@gmail.com> wrote:
I'm also having major problems upgrading my system. Pacman errors out and tells me I have 77 packages to update, and I was current two days ago. I believe it is 77 and counting; last night it was 66.
I'm using the kernel.org site for my packages now. I used to use easynews, but it got out of date for a while, so I switched. kernel.org isn't going bad, is it?
kernel.org has done me well during this update. There's more than one server answering to that name, and not all of them are equally up-to-date. Keep at it and you'll get one of the leading ones.
On 02/02/2010 04:29 AM, Ray Kohler wrote:
On Mon, Feb 1, 2010 at 9:01 PM, Steve Holmes<steve.holmes88@gmail.com> wrote:
I'm also having major problems upgrading my system. Pacman errors out and tells me I have 77 packages to update, and I was current two days ago. I believe it is 77 and counting; last night it was 66.
I'm using the kernel.org site for my packages now. I used to use easynews, but it got out of date for a while, so I switched. kernel.org isn't going bad, is it?
kernel.org has done me well during this update. There's more than one server answering to that name, and not all of them are equally up-to-date. Keep at it and you'll get one of the leading ones.
mirrors.kernel.org is in fact not a single mirror. It is an alias for a geolocation subdomain, which (in theory) serves you from a geographically closer server. Maybe you hit an up-to-date server. -- Ionut
On Tue, Feb 02, 2010 at 04:42:32AM +0200, Ionut Biru wrote:
mirrors.kernel.org is in fact not a single mirror. It is an alias for a geolocation subdomain, which (in theory) serves you from a geographically closer server. Maybe you hit an up-to-date server.
I don't know, but I've been running 'pacman -Syuw' multiple times, and each time I seem to be able to download one package, and then all the rest fail and the response says the file was not found. I'll tell you, with 83 files to get and only getting one each time, this is the worst update experience I've ever had. It usually goes so smoothly, but this time around it is terrible!
On 02/02/10 13:01, Steve Holmes wrote:
On Tue, Feb 02, 2010 at 04:42:32AM +0200, Ionut Biru wrote:
mirrors.kernel.org is in fact not a single mirror. It is an alias for a geolocation subdomain, which (in theory) serves you from a geographically closer server. Maybe you hit an up-to-date server.
I don't know, but I've been running 'pacman -Syuw' multiple times, and each time I seem to be able to download one package, and then all the rest fail and the response says the file was not found. I'll tell you, with 83 files to get and only getting one each time, this is the worst update experience I've ever had. It usually goes so smoothly, but this time around it is terrible!
Well, there was a news item saying it would be best to wait a couple of days to do an update... but no-one ever listens to us. Allan
Allan McRae wrote:
Well, there was a news item saying it would be best to wait a couple of days to do an update... but no-one ever listens to us.
And yet they're still using Archlinux. True love! regards, Hannes
Hi Allan, I have held off on the upgrade until now on your advice.
Total Download Size:    679.66 MB
Total Installed Size:   1811.77 MB
Is it safe to upgrade now? Regards, Gaurish Sharma www.gaurishsharma.com
On Tue, Feb 2, 2010 at 8:34 AM, Allan McRae <allan@archlinux.org> wrote:
On 02/02/10 13:01, Steve Holmes wrote:
On Tue, Feb 02, 2010 at 04:42:32AM +0200, Ionut Biru wrote:
mirrors.kernel.org is in fact not a single mirror. It is an alias for a geolocation subdomain, which (in theory) serves you from a geographically closer server. Maybe you hit an up-to-date server.
I don't know, but I've been running 'pacman -Syuw' multiple times, and each time I seem to be able to download one package, and then all the rest fail and the response says the file was not found. I'll tell you, with 83 files to get and only getting one each time, this is the worst update experience I've ever had. It usually goes so smoothly, but this time around it is terrible!
Well, there was a news item saying it would be best to wait a couple of days to do an update... but no-one ever listens to us.
Allan
On Tue, Feb 2, 2010 at 12:02 AM, Damjan Georgievski <gdamjan@gmail.com> wrote:
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages.
Actually, syncing the db last is not going to improve things: if some packages get deleted, they won't be found when updating against the old db. The only way to have mirrors that always work is either to make the operation atomic, for instance by switching to a directory with the new files (but what if the switch happens while you are downloading files according to the old db?), or to have an incremental layout where nothing gets deleted. I don't see a way of keeping mirrors consistent while keeping the whole procedure simple. In any case, there should be more communication towards users about what's really going on. Benoit.
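The "switch to a directory with the new files" option Benoit mentions is commonly done with a symlink: sync into a fresh snapshot directory, then repoint a `current` link in one `rename(2)`. A minimal Python sketch, assuming the server publishes `current/` and that the directory names are hypothetical. It does not remove Benoit's caveat: a client that fetched the old db just before the switch will look for old file names under the new tree.

```python
"""Publish a fully synced snapshot by atomically repointing a 'current' symlink."""
import os
import tempfile
from pathlib import Path


def publish(snapshot: Path, current: Path) -> None:
    """Point `current` at `snapshot` in a single rename, so readers never see a mix."""
    tmp = current.with_name(current.name + ".new")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(snapshot.resolve(), target_is_directory=True)
    # os.replace is rename(2): on POSIX it swaps the old link for the new one atomically.
    os.replace(tmp, current)


def demo() -> list:
    """Publish two snapshots in turn and read the db through `current` each time."""
    seen = []
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        for n, payload in ((1, b"old db"), (2, b"new db")):
            snap = root / f"snapshot-{n}"
            snap.mkdir()
            (snap / "core.db.tar.gz").write_bytes(payload)
            publish(snap, root / "current")
            seen.append((root / "current" / "core.db.tar.gz").read_bytes())
    return seen


if __name__ == "__main__":
    print(demo())
```

The price is disk space for two full trees while a new snapshot is being filled, similar to the trade-off mentioned later for rsync's `--delay-updates`.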
On 02/02/10 18:09, Benoit Favre wrote:
In any case, there should be more communication towards users about what's really going on.
Like posting a message saying not to update on the front page? That would have been nice... Allan
Excerpts from Allan McRae's message of 2010-02-02 09:24:47 +0100:
On 02/02/10 18:09, Benoit Favre wrote:
In any case, there should be more communication towards users about what's really going on.
Like posting a message saying not to update on the front page? That would have been nice...
Allan
With all due respect, Allan, it doesn't say that. It says: "It is advisable to check the state of your mirror before updating (https://www.archlinux.de/?page=MirrorStatus)." But that page doesn't tell us much. It doesn't tell us which mirrors are in sync and which ones are still syncing, and that is the information we need. Last night I tried one whose sync date was less than two hours in the past, and yet I got lots of corrupted files. The only useful thing the page can tell you is which mirrors are probably from before the push from testing and which ones are probably from after it. The only way to actually find a synced one is by trial and error, or by someone telling you a good one. Regards, Philipp
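What Philipp asks for ("in sync" vs. "syncing") needs a consistency check rather than a timestamp. A minimal Python sketch of the comparison: both inputs are assumptions, since extracting the package filenames from a repo db and listing what a mirror actually serves are left out. Detecting corrupted files, as Philipp saw, would additionally require comparing checksums.

```python
"""Classify a mirror by comparing what its repo db lists with the files it serves."""
from typing import Iterable


def missing_packages(db_filenames: Iterable[str], served_files: Iterable[str]) -> list:
    """Package files the db references but the mirror does not (yet) have."""
    served = set(served_files)
    return sorted(f for f in set(db_filenames) if f not in served)


def mirror_state(db_filenames, served_files) -> str:
    """'consistent' if every db entry is present, otherwise a short summary."""
    missing = missing_packages(db_filenames, served_files)
    if not missing:
        return "consistent"
    return f"inconsistent: {len(missing)} package file(s) missing"


if __name__ == "__main__":
    # Hypothetical inputs for one repo on one mirror.
    db = ["zlib-1.2.4-1-i686.pkg.tar.gz", "acl-2.2.49-1-i686.pkg.tar.gz"]
    served = ["core.db.tar.gz", "acl-2.2.49-1-i686.pkg.tar.gz"]
    print(mirror_state(db, served), missing_packages(db, served))
```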
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages.
Actually, syncing the db last is not going to improve things: if some packages get deleted, they won't be found when updating against the old db.
- download new packages
- update db
- delete old packages
-- damjan
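The three-step order above can be turned into an explicit action list for one repository directory. A minimal Python sketch; the file names are hypothetical examples, and it assumes an updated package always gets a new filename (Arch package filenames embed the version and release).

```python
"""Damjan's ordering as an explicit action list for one repository directory."""


def staged_plan(old_files: set, new_files: set, db_name: str) -> list:
    """Return (action, filename) pairs: new packages, then the db, then deletions."""
    added = sorted(new_files - old_files - {db_name})
    removed = sorted(old_files - new_files - {db_name})
    plan = [("download", name) for name in added]
    # Every package the new db refers to is present before the db itself arrives.
    plan.append(("download", db_name))
    # Old packages go only once the db no longer refers to them.
    plan += [("delete", name) for name in removed]
    return plan


if __name__ == "__main__":
    old = {"core.db.tar.gz", "zlib-1.2.3-1-i686.pkg.tar.gz"}
    new = {"core.db.tar.gz", "zlib-1.2.4-1-i686.pkg.tar.gz"}
    for step in staged_plan(old, new, "core.db.tar.gz"):
        print(*step)
```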
On 02/02/2010 07:53 PM, Damjan Georgievski wrote:
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages.
Actually, syncing the db last is not going to improve things: if some packages get deleted, they won't be found when updating against the old db.
- download new packages - update db - delete old packages
Now tell us how to do that ordering with rsync. -- Ionut
Ionut Biru wrote:
On 02/02/2010 07:53 PM, Damjan Georgievski wrote:
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages.
Actually, syncing the db last is not going to improve things: if some packages get deleted, they won't be found when updating against the old db.
- download new packages - update db - delete old packages
Now tell us how to do that ordering with rsync.
The Debian mirror scripts have such a staged setup with two rsync runs; you might want to have a look at them, somewhere on mirror.debian.org. The 'how to mirror' section at http://www.debian.org/mirror/ftpmirror explains it. regards, Hannes
On Tue, Feb 2, 2010 at 1:53 PM, Hannes Rist <hrist@selfnet.de> wrote:
Ionut Biru wrote:
On 02/02/2010 07:53 PM, Damjan Georgievski wrote:
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages.
Actually, syncing the db last is not going to improve things: if some packages get deleted, they won't be found when updating against the old db.
- download new packages - update db - delete old packages
Now tell us how to do that ordering with rsync.
The Debian mirror scripts have such a staged setup with two rsync runs; you might want to have a look at them, somewhere on mirror.debian.org. The 'how to mirror' section at http://www.debian.org/mirror/ftpmirror explains it.
from http://www.debian.org/mirror/ftpmirror#how ... * MUST perform a 2-stage sync ... Rationale: if archive mirroring is done in a single stage, there will be periods of time during which the index files will reference files not yet mirrored. ... Sounds pretty good, Hannes. -- Andrew Antle <andrew dot antle at gmail dot com>
On Tue, 2 Feb 2010 14:06:35 -0500 Andrew Antle <andrew.antle@gmail.com> wrote:
On Tue, Feb 2, 2010 at 1:53 PM, Hannes Rist <hrist@selfnet.de> wrote:
Ionut Biru wrote:
On 02/02/2010 07:53 PM, Damjan Georgievski wrote:
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages.
Actually, syncing the db last is not going to improve things: if some packages get deleted, they won't be found when updating against the old db.
- download new packages - update db - delete old packages
Now tell us how to do that ordering with rsync.
The Debian mirror scripts have such a staged setup with two rsync runs; you might want to have a look at them, somewhere on mirror.debian.org. The 'how to mirror' section at http://www.debian.org/mirror/ftpmirror explains it.
from http://www.debian.org/mirror/ftpmirror#how ... * MUST perform a 2-stage sync ... Rationale: if archive mirroring is done in a single stage, there will be periods of time during which the index files will reference files not yet mirrored. ... Sounds pretty good, Hannes.
I must be missing something.. isn't --delete-after good enough? Dieter
On Tue, 2 Feb 2010 20:32:20 +0100 Dieter Plaetinck <dieter@plaetinck.be> wrote:
On Tue, 2 Feb 2010 14:06:35 -0500 Andrew Antle <andrew.antle@gmail.com> wrote:
On Tue, Feb 2, 2010 at 1:53 PM, Hannes Rist <hrist@selfnet.de> wrote:
Ionut Biru wrote:
On 02/02/2010 07:53 PM, Damjan Georgievski wrote:
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages.
Actually, syncing the db last is not going to improve things: if some packages get deleted, they won't be found when updating against the old db.
- download new packages - update db - delete old packages
Now tell us how to do that ordering with rsync.
The Debian mirror scripts have such a staged setup with two rsync runs; you might want to have a look at them, somewhere on mirror.debian.org. The 'how to mirror' section at http://www.debian.org/mirror/ftpmirror explains it.
from http://www.debian.org/mirror/ftpmirror#how ... * MUST perform a 2-stage sync ... Rationale: if archive mirroring is done in a single stage, there will be periods of time during which the index files will reference files not yet mirrored. ... Sounds pretty good, Hannes.
I must be missing something.. isn't --delete-after good enough?
Dieter
On mir.archlinux.fr, we use --delay-updates. It uses more disk space, and if it fails it has to restart from scratch, but the db is normally coherent with the packages.
- download new packages - update db - delete old packages
from http://www.debian.org/mirror/ftpmirror#how ... * MUST perform a 2-stage sync ... Rationale: if archive mirroring is done in a single stage, there will be periods of time during which the index files will reference files not yet mirrored.
I must be missing something.. isn't --delete-after good enough?
You are missing the fact that it will download the database file before it downloads the package files. So if I run "-Syu" at that moment, pacman will want to upgrade to packages that are not on the mirror yet. Which is why mirrors must sync the database AFTER syncing the packages. -- damjan
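Damjan's objection to a single `--delete-after` pass can be checked with a small Python replay. It is an abstraction, not rsync itself: it assumes the single pass transfers files in name order, so `core.db.tar.gz` happens to arrive before the updated `zlib` package (the file names are hypothetical), and it reports every step after which the db clients would see refers to a package that is not there.

```python
"""Replay sync steps and report when the visible db points at missing packages."""


def broken_steps(start_files, old_refs, new_refs, db_name, actions):
    """Return indices of steps after which some db-referenced package is absent.

    Until `db_name` is downloaded, clients see the old db (old_refs);
    afterwards they see the new one (new_refs).
    """
    files = set(start_files)
    refs = set(old_refs)
    broken = []
    for i, (op, name) in enumerate(actions):
        if op == "download":
            files.add(name)
            if name == db_name:
                refs = set(new_refs)
        elif op == "delete":
            files.discard(name)
        else:
            raise ValueError(f"unknown action {op!r}")
        if not refs <= files:
            broken.append(i)
    return broken


if __name__ == "__main__":
    db, old, new = "core.db.tar.gz", "zlib-1.2.3-1-i686.pkg.tar.gz", "zlib-1.2.4-1-i686.pkg.tar.gz"
    start = {db, old}
    # Single pass in name order with --delete-after: the db sorts before zlib.
    single = [("download", db), ("download", new), ("delete", old)]
    staged = [("download", new), ("download", db), ("delete", old)]
    print(broken_steps(start, {old}, {new}, db, single))  # [0]
    print(broken_steps(start, {old}, {new}, db, staged))  # []
```

Even the staged order leaves Benoit's earlier caveat open: a client that read the old db just before step 2 may still ask for the deleted file.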
On Tue, Feb 2, 2010 at 7:41 PM, Damjan Georgievski <gdamjan@gmail.com> wrote:
- download new packages - update db - delete old packages ... I must be missing something.. isn't --delete-after good enough?
you are missing the fact that it will download the database file before it downloads the package files. So if I "-Syu" at that time pacman will want to upgrade packages that are not on the mirror.
Which is why mirrors must sync the database AFTER syncing the packages.
I thought that was exactly what Damjan said:
- download all new packages
- sync the database (or "update db", as he said)
- delete old packages
I don't know how the mirrors sync, but if it is with rsync, I guess it would be a matter of calling it once without the delete parameter, updating the db, and then calling rsync with the delete parameter set. Would this work? -- Guilherme M. Nogueira "Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
Damjan Georgievski <gdamjan@gmail.com> writes:
- download new packages - update db - delete old packages
from http://www.debian.org/mirror/ftpmirror#how ... * MUST perform a 2-stage sync ... Rationale: if archive mirroring is done in a single stage, there will be periods of time during which the index files will reference files not yet mirrored.
I must be missing something.. isn't --delete-after good enough?
you are missing the fact that it will download the database file before it downloads the package files. So if I "-Syu" at that time pacman will want to upgrade packages that are not on the mirror.
Which is why mirrors must sync the database AFTER syncing the packages.
I don't get it. Why is it such a big problem? If pacman tries to download a package that isn't on the mirror, quit it and try it later...
This order can be accomplished by first running rsync without the delete flag, then rsyncing over the DB, and then re-running the original rsync with --delete or --delete-after. You could also google for 'atomic rsync'; the first hit is http://www.opensource.apple.com/source/rsync/rsync-35.2/rsync/support/atomic... As for push mirroring, http://www.debian.org/mirror/push_server is a decent example. An identity file with no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/path/to/mirror/script",from="IPADDRESS" &" is fairly decent. --- Lee Burton lburton@mrow.org 301 910 0246
On Tue, Feb 2, 2010 at 12:53, Damjan Georgievski <gdamjan@gmail.com> wrote:
There's also the problem that some mirrors (most of the ones I've tried) sync the package database before syncing all the packages.
Actually, syncing the db last is not going to improve things: if some packages get deleted, they won't be found when updating against the old db.
- download new packages - update db - delete old packages
-- damjan
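Lee's three rsync passes can be written down as concrete command lines. The Python sketch below only builds and prints them unless `dry_run=False` is passed. The flags (`-r`, `-t`, `-l`, `--safe-links`, `--exclude`, `--include`, `--delete-after`) are real rsync options; the upstream URL, the local path and the `*.db.tar.gz` pattern for repo databases are assumptions. In pass 2, rsync applies the first matching filter rule, so `--include=*/` lets it descend into directories, the db pattern is included, and everything else is excluded.

```python
"""Build (and optionally run) three rsync passes: packages, databases, cleanup."""
import shlex
import subprocess

DB_PATTERN = "*.db.tar.gz"  # assumption: how the repo databases are named
BASE = ["rsync", "-rtl", "--safe-links"]


def three_pass_commands(src: str, dst: str) -> list:
    # Pass 1: everything except the databases, deleting nothing.
    packages = BASE + [f"--exclude={DB_PATTERN}", src, dst]
    # Pass 2: only the databases ("*/" lets rsync descend into directories).
    databases = BASE + ["--include=*/", f"--include={DB_PATTERN}", "--exclude=*", src, dst]
    # Pass 3: full sync that now also removes files that vanished upstream.
    cleanup = BASE + ["--delete-after", src, dst]
    return [packages, databases, cleanup]


def run_all(src: str, dst: str, dry_run: bool = True) -> None:
    for cmd in three_pass_commands(src, dst):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing pass


if __name__ == "__main__":
    # Hypothetical upstream and local path; prints the commands only.
    run_all("rsync://rsync.example.org/archlinux/", "/srv/mirror/archlinux/")
```

If upstream changes between passes, pass 3 can still pull in new files after the db, so a short inconsistency window remains. Repeating the sequence on the next poll should close it.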
On 02/03/2010 03:12 PM, Lee Burton wrote:
As for push mirroring, http://www.debian.org/mirror/push_server is a decent example. An identity file with no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/path/to/mirror/script",from="IPADDRESS" &" is fairly decent.
I've talked with someone working on the data distribution system (a big web cluster) at a big company, and he said they haven't had good experiences with pushing. Frequent polling has been the best solution so far. Actually, the patch I posted here is quite similar to their system.
PS: Please don't toppost. -- Florian Pritz -- {flo,bluewind}@server-speed.net
On Wed, Feb 3, 2010 at 10:53, Florian Pritz <bluewind@server-speed.net> wrote:
On 02/03/2010 03:12 PM, Lee Burton wrote:
As for push mirroring, http://www.debian.org/mirror/push_server is a decent example. An identity file with no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/path/to/mirror/script",from="IPADDRESS" &" is fairly decent.
I've talked with someone working on the data distribution system (a big web cluster) at a big company, and he said they haven't had good experiences with pushing. Frequent polling has been the best solution so far. Actually, the patch I posted here is quite similar to their system.
PS: Please don't toppost. -- Florian Pritz -- {flo,bluewind}@server-speed.net
In fact, Debian does use forced commands, just as you suggested earlier. I agree that polling is probably a better solution. To make it "multi-tiered" and to reduce load on the primary mirror, we could have slightly more intelligent polling than just checking one upstream machine. In this example, let:
Primary = the Arch primary mirror(s) (updated directly by the dbscripts)
Tier-1 = large high-bandwidth/high-traffic mirrors that other mirrors mirror from
Tier-2 = smaller mirrors
It would then go something like this: a tier-1 mirror would check against the primaries once a minute (for the md5sum). A tier-2 mirror would check against two tier-1 mirrors and see whether they agree; if they don't, it would ask a primary for a tie-break. It could then notify the owner of the out-of-date mirror (via an automated email, perhaps at most once per 24-hour period, if it has been out of date for XX hours?). Also, I forget: does archlinux/pacman do any sort of GPG checking/signing of packages? Apologies for top-posting; I hadn't responded to much list traffic with this mail client, and I have now changed the client's behavior. -- Lee Burton
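Lee's tier-2 polling rule and the notification throttle can be expressed as pure Python functions. The state strings stand for whatever hash the mirrors would compare (Lee mentions an md5sum, but not what is hashed), and the 12-hour threshold stands in for his "XX hours"; both are hypothetical.

```python
"""Lee's proposed tier-2 polling rule and notification throttle, as pure functions."""
from datetime import datetime, timedelta
from typing import Callable, Optional, Tuple


def target_state(tier1_a: Optional[str], tier1_b: Optional[str],
                 ask_primary: Callable[[], str]) -> str:
    """Trust two tier-1 mirrors when they agree; otherwise let a primary break the tie."""
    if tier1_a is not None and tier1_a == tier1_b:
        return tier1_a
    return ask_primary()


def should_sync(local: str, tier1_a: Optional[str], tier1_b: Optional[str],
                ask_primary: Callable[[], str]) -> Tuple[bool, str]:
    """Return (whether to sync, the state this mirror should end up in)."""
    target = target_state(tier1_a, tier1_b, ask_primary)
    return local != target, target


def should_notify(stale_since: Optional[datetime], last_mail: Optional[datetime],
                  now: datetime, threshold: timedelta = timedelta(hours=12),
                  cooldown: timedelta = timedelta(hours=24)) -> bool:
    """Mail the mirror owner once stale longer than `threshold`, at most once per `cooldown`."""
    if stale_since is None or now - stale_since < threshold:
        return False
    return last_mail is None or now - last_mail >= cooldown


if __name__ == "__main__":
    print(should_sync("h1", "h2", "h2", ask_primary=lambda: "h2"))
```

The primary is only contacted when the two tier-1 answers disagree, which is what keeps its load low in this scheme.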
On Thu, 4 Feb 2010 14:27:14 -0500 Lee Burton <lburton@mrow.org> wrote:
To make it "multi-tiered" and to reduce load on the primary mirror, we could have slightly more intelligent polling than just checking one upstream machine. In this example, let:
Primary = the Arch primary mirror(s) (updated directly by the dbscripts)
Tier-1 = large high-bandwidth/high-traffic mirrors that other mirrors mirror from
Tier-2 = smaller mirrors
It would then go something like this: a tier-1 mirror would check against the primaries once a minute (for the md5sum). A tier-2 mirror would check against two tier-1 mirrors and see whether they agree; if they don't, it would ask a primary for a tie-break. It could then notify the owner of the out-of-date mirror (via an automated email, perhaps at most once per 24-hour period, if it has been out of date for XX hours?).
seems needlessly complex to me. Dieter
2010/2/4 Dieter Plaetinck <dieter@plaetinck.be>:
seems needlessly complex to me. Dieter
But the current way is not the best we can do. We need another (better) solution!
-- Regards, Benedikt
On Thu, Feb 4, 2010 at 14:40, Dieter Plaetinck <dieter@plaetinck.be> wrote:
On Thu, 4 Feb 2010 14:27:14 -0500 Lee Burton <lburton@mrow.org> wrote:
To make it "multi-tiered" and to reduce load on the primary mirror, we could have slightly more intelligent polling than just checking one upstream machine. In this example, let:
Primary = the Arch primary mirror(s) (updated directly by the dbscripts)
Tier-1 = large high-bandwidth/high-traffic mirrors that other mirrors mirror from
Tier-2 = smaller mirrors
It would then go something like this: a tier-1 mirror would check against the primaries once a minute (for the md5sum). A tier-2 mirror would check against two tier-1 mirrors and see whether they agree; if they don't, it would ask a primary for a tie-break. It could then notify the owner of the out-of-date mirror (via an automated email, perhaps at most once per 24-hour period, if it has been out of date for XX hours?).
seems needlessly complex to me. Dieter
It probably is. Perhaps a push-primary solution (much simpler...) combined with a default twice-a-day sync (just to make sure?) for tier-1 mirrors might work. The point here, I believe, is to get ideas out there. -- Lee Burton
On 02/04/2010 09:42 PM, Lee Burton wrote:
It probably is. Perhaps a push-primary solution (much simpler...) combined with a default twice-a-day sync (just to make sure?) for tier-1 mirrors might work... I believe the point here is to get ideas out there.
I'd go for arch master -> mirror with more bandwidth -> rest, with all of them running the script I already posted (no pushing) every 1-5 minutes or so. Load should be near 0, and we could, for example, have main.mirrors.archlinux.org point to the fast mirror. If I'm not mistaken, the script should always try to resync when something doesn't seem to be OK (rsync should exit > 0 in that case), but I might have forgotten some edge cases. PS: A mailing list for mirror stuff (like this discussion) with all mirror admins would also be quite nice. -- Florian Pritz -- {flo,bluewind}@server-speed.net
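The "always try to resync when rsync exits > 0" behaviour can be made concrete by storing the new checksum only after a successful sync. A minimal sketch in Python, assuming checksums are kept in a dict keyed by repository name; `sync_repo` and `update_state` are hypothetical names, not functions from Florian's script.

```python
import subprocess
from typing import Dict, List


def sync_repo(cmd: List[str]) -> bool:
    """Run the sync command; rsync signals failure with a non-zero exit status."""
    return subprocess.run(cmd).returncode == 0


def update_state(state: Dict[str, str], repo: str, remote_sum: str,
                 cmd: List[str]) -> bool:
    """Record the new checksum only if the sync succeeded.

    On failure the stored checksum stays stale, so the next scheduled run
    (every 1-5 minutes in the proposal above) sees a mismatch and retries.
    """
    if sync_repo(cmd):
        state[repo] = remote_sum
        return True
    return False
```

Here `cmd` would be an rsync invocation such as `["rsync", "-rtlH", "--delete-after", "<upstream>/<repo>/", "<local dir>/"]`; the placeholders and options are illustrative only, not the ones used in the posted script.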
On Thu, Feb 4, 2010 at 15:57, Florian Pritz <bluewind@server-speed.net> wrote:
PS: A mailinglist for mirror stuff (like this discussion) with all mirror admins would also be quite nice.
As a mirror admin (mirrors.rit.edu), I second that request, although perhaps with one low-traffic list (mandatory script updates, bulletins, etc.) and one for discussions? Or perhaps some other scheme entirely. -- Lee Burton lburton@mrow.org 301 910 0246
Hi,
PS: A mailinglist for mirror stuff (like this discussion) with all mirror admins would also be quite nice.
As a mirror admin (mirrors.rit.edu) , I second that request, although perhaps one low traffic list (mandatory script updates, bulletins, etc) and one for discussions? Or perhaps some other scheme entirely.
+1 from me. mirror-announce and mirror-discussion? Also, a mailing list would be a lot faster than the bug tracker for, say, letting archlinux.org know if you plan to change your hostname/IP/directory structure etc., as it looks like the mailing list gets read more frequently than the bug tracker. Regards, Hannes (mirror.selfnet.de/ftp.wh-stuttgart.net) -- Hannes Rist
+-------------------------------------------+
| Crew Selfnet e.V.   NOC: admin@selfnet.de |
| Allmandring 8A      http://www.selfnet.de |
| 70569 Stuttgart     Fax: +49 711 620 4796 |
+-------------------------------------------+
Hi, I suggest using a download redirector and metalink generator like MirrorBrain (http://mirrorbrain.org/) to reduce the workload on the main Arch Linux server. Also, we should have a few other official mirrors apart from al.org from which the mirrors in the rest of the world would sync, essentially spreading the work over a couple of servers and hence increasing the overall updating speed. We have to keep in mind that Arch is a rolling-release distro, so mirrors should update more frequently (like once a day). Regards, Gaurish Sharma
On Sat, Feb 6, 2010 at 21:20, Gaurish Sharma <contact@gaurishsharma.com> wrote:
Hi, I suggest using A Download Redirector and Metalink Generator like mirror brain(http://mirrorbrain.org/) to reduce work load on main ArchLinux Server.
I wonder if something like http://www.coralcdn.org/ could be used for a package repository. -- damjan
On Sun, Feb 7, 2010 at 3:28 AM, Damjan Georgievski <gdamjan@gmail.com> wrote:
I wonder if something like http://www.coralcdn.org/ could be used for a package repository.
Nope. CoralCDN is intended to be used as a distributed web cache. It doesn't even serve large files; it redirects you to the original source:
% wget http://ftp.archlinux.org.nyud.net/iso/latest/archlinux-2009.08-netinstall-x8...
--2010-02-09 19:39:51-- http://ftp.archlinux.org.nyud.net/iso/latest/archlinux-2009.08-netinstall-x8...
Resolving ftp.archlinux.org.nyud.net... 150.189.2.102, 141.161.20.32, 141.213.4.201
Connecting to ftp.archlinux.org.nyud.net|150.189.2.102|:80... connected.
HTTP request sent, awaiting response... 302
Location: http://ftp.archlinux.org/iso/latest/archlinux-2009.08-netinstall-x86_64.iso?... [following]
--2010-02-09 19:39:58-- http://ftp.archlinux.org/iso/latest/archlinux-2009.08-netinstall-x86_64.iso?...
Resolving ftp.archlinux.org... 209.85.41.143, 209.85.41.144
[...]
Regards, Marti
I wonder if something like http://www.coralcdn.org/ could be used for a package repository.
Nope. The CoralCDN is intended to be used as a distributed web cache. It doesn't even serve large files, it redirects you to the original source:
I was thinking about using the technology, not the service (which is probably limited). -- damjan
I'm working on a mirror script that can be run as often as you want, even every minute. In short: the script fetches an md5sum of the databases, and if one database has changed it starts rsync to resync that particular repo. The md5 file it fetches is small and static and will cause nearly no load, but mirrors will be far more up to date than they are now. If you run it every minute, which should not cause any problems, the delay would be nearly unnoticeable. I'm still waiting for Roman to discuss the mirror setup itself (multi-tier mirroring?), and maybe I'll also patch dbscripts to generate the md5sums I need. For those who are interested, I've attached the current sync script. It might change in the future, though. -- Florian Pritz -- {flo,bluewind}@server-speed.net
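The change-detection step described above (fetch a small md5 file, resync only the databases whose checksum changed) can be sketched in Python. Assumptions, all hypothetical: the published file uses md5sum(1)-style lines (`<digest>  <name>`), local digests are either computed from the mirror's own database files or cached from the previous run (the message doesn't say which), and `md5_of_file`, `parse_md5sums` and `repos_to_resync` are illustrative names, not parts of the attached script.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path) -> str:
    """md5 hex digest of a file, read in chunks so large databases are fine."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text: str) -> Dict[str, str]:
    """Parse md5sum(1)-style lines ("<digest>  <name>") into {name: digest}.

    A leading "*" (md5sum's binary-mode marker) is stripped from the name.
    """
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        sums[name.lstrip("*")] = digest
    return sums


def repos_to_resync(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Databases whose upstream checksum differs from, or is missing, locally."""
    return sorted(name for name, digest in remote.items() if local.get(name) != digest)
```

Each name returned by `repos_to_resync` would then be mapped to its repository directory and handed to rsync; everything else is skipped, which is why a once-a-minute poll stays cheap.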
On 02/03/2010 01:16 AM, Florian Pritz wrote:
For those who are interested I've attached the current sync script. It might change in future though. Seems it got lost :( http://karif.server-speed.net/~flo/tmp/mirrorsync.sh.txt
-- Florian Pritz -- {flo,bluewind}@server-speed.net
Looks really nice, Florian ;] I wouldn't dare to analyse the syntax, as I only know the basics, but the idea is pretty neat. -- Guilherme M. Nogueira "Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
participants (21)
- Allan McRae
- Andre Ramaciotti
- Andrew Antle
- Benedikt Müller
- Benoit Favre
- Damjan Georgievski
- Dan McGee
- Dieter Plaetinck
- Florian Pritz
- Gaurish Sharma
- Guilherme M. Nogueira
- Hannes Rist
- hollunder
- Ionut Biru
- Lee Burton
- Lukáš Jirkovský
- Marti Raudsepp
- Pierre Schmitz
- Ray Kohler
- Steve Holmes
- tuxce