[arch-general] Building local repo - eliminating dups - why some new x86_64?
Listmates,

I'm building a local repo for boxes to update via the lan instead of redownloading. I have my repo on my local server as:

  arch/
    x86/
    x86_64/

I have moved all files for my two x86_64 boxes to the arch/x86_64 dir and I am filtering with a script to eliminate dups by moving the lesser numbered packages to arch/x86_64/oldpkgs. Testing the script before the actual move of packages, I ran across this anomaly in some filenames: (script output) [1]

<snip>
ttf-isabella-1.003-3.pkg.tar.gz -> oldpkgs
ttf-isabella-1.003-4-x86_64.pkg.tar.gz

tunepimp-0.5.3-5.pkg.tar.gz -> oldpkgs
tunepimp-0.5.3-6-x86_64.pkg.tar.gz

tzdata-2009f-1-x86_64.pkg.tar.gz -> oldpkgs
tzdata-2009g-1-x86_64.pkg.tar.gz

unrar-3.8.5-2-x86_64.pkg.tar.gz -> oldpkgs
unrar-3.9.1-1-x86_64.pkg.tar.gz
<snip>

As shown above, there are multiple packages where the earlier version did *not* contain the x86_64 designation while the current package does. Are these the same packages? If so, why did the earlier packages not have the x86_64, and when was the architecture added? Are all packages now going to have the architecture specified?

Footnotes:

[1] Filtering is done by a simple look-ahead and the output is created as follows:

    PKGFILES=( $(ls -1 /home/backup/archlinux/x86_64) )
    for ((i=0;i<${#PKGFILES[@]}-1;i++)); do
        if [[ ${PKGFILES[$i]%%-[[:digit:]]*} == ${PKGFILES[$i+1]%%-[[:digit:]]*} ]]; then
            echo -e "${PKGFILES[$i]} -> oldpkgs\n${PKGFILES[$i+1]}\n"
        fi
    done

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
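As an aside on the filter itself: deciding which copy is "lesser" from plain ls ordering works until a pkgver or pkgrel sorts lexically rather than numerically (e.g. -10 sorts before -9). A minimal version-aware sketch, assuming pacman's vercmp utility is installed and the usual name-pkgver-pkgrel[-arch].pkg.tar.gz naming:

    # extract "pkgver-pkgrel" from a package filename, with or without the arch suffix
    verrel() {
        local f=${1##*/}                 # strip any leading path
        f=${f%.pkg.tar.gz}               # strip the extension
        f=${f%-x86_64}; f=${f%-i686}     # strip the arch suffix if present
        local rel=${f##*-}               # last dash-separated field = pkgrel
        f=${f%-*}
        local ver=${f##*-}               # next field = pkgver
        echo "$ver-$rel"
    }

    # print whichever of two filenames carries the older version (candidate for oldpkgs/)
    older() {
        if (( $(vercmp "$(verrel "$1")" "$(verrel "$2")") < 0 )); then
            echo "$1"
        else
            echo "$2"
        fi
    }

    older ttf-isabella-1.003-3.pkg.tar.gz ttf-isabella-1.003-4-x86_64.pkg.tar.gz
    # -> ttf-isabella-1.003-3.pkg.tar.gz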
David C. Rankin, J.D.,P.E. wrote:
Listmates,
I'm building a local repo for boxes to update via the lan instead of redownloading. I have my repo on my local server as:
You might want to look into this: http://xyne.archlinux.ca/info/pkgd
arch/ x86/ x86_64/
I have moved all files for my two x86_64 boxes to the arch/x86_64 dir and I am filtering with a script to eliminate dups by moving the lesser numbered packages to arch/x86_64/oldpkgs. Testing the script before the actual move of packages, I ran across this anomaly in some filenames: (script output) [1]
<snip>
ttf-isabella-1.003-3.pkg.tar.gz -> oldpkgs
ttf-isabella-1.003-4-x86_64.pkg.tar.gz

tunepimp-0.5.3-5.pkg.tar.gz -> oldpkgs
tunepimp-0.5.3-6-x86_64.pkg.tar.gz

tzdata-2009f-1-x86_64.pkg.tar.gz -> oldpkgs
tzdata-2009g-1-x86_64.pkg.tar.gz

unrar-3.8.5-2-x86_64.pkg.tar.gz -> oldpkgs
unrar-3.9.1-1-x86_64.pkg.tar.gz
<snip>
As shown above, there are multiple packages where the earlier version did *not* contain the x86_64 designation while the current package does. Are these the same packages? If so, why did the earlier packages not have the x86_64, and when was the architecture added? Are all packages now going to have the architecture specified?
Pre pacman-3.0 (I think), the architecture was not included in the file name. Anything in the [community] repo still will not have the architecture name as the [community] repo scripts do not handle it yet.

Allan
On or about Tuesday 19 May 2009 at approximately 01:31:46 Allan McRae composed:
<snip>
You might want to look into this: http://xyne.archlinux.ca/info/pkgd
<snip>
Pre pacman-3.0 (I think), the architecture was not included in the file name. Anything in the [community] repo still will not have the architecture name as the [community] repo scripts do not handle it yet.
Allan
Allan,

Thanks, at least I know I didn't mix apples and oranges. Thanks for the link as well.

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
Hey,

We see people trying to make an Arch repo mirror to save themselves bandwidth. I think that doesn't really make too much sense. Instead, it seems much better to implement a download proxy. The way this works is that all traffic is routed through a computer which backs up stuff that passes through it. When a computer on the network asks for a file that's been downloaded previously, there is no need to go into the Internet.

That seems like a great thing to use for Arch packages, as well as a lot of stuff really. Think about how much faster some websites and stuff can load if you already have all the common images downloaded to your LAN.

Here's a link: http://www.squid-cache.org/

-AT

(Man.. I really want to set up a Linux router box...)
Hello, On Tuesday 19 May 2009 18:44:13 Andrei Thorp wrote:
We see people trying to make an Arch repo mirror to save themselves bandwidth. I think that doesn't really make too much sense. Instead, it seems much better to implement a download proxy. The way this works is that all traffic is routed through a computer which backs up stuff that passes through it. When a computer on the network asks for a file that's been downloaded previously, there is no need to go into the Internet.
Yes and no.

Arch packages are not exactly small. I run a squid cache and a cache object size of 128KB serves me pretty well. To accommodate all Arch packages, this setting has to go up to maybe 150MB (for openoffice). If the cache starts caching every object of size up to 150MB, it won't be as effective or it will balloon dramatically. Not to mention the memory requirement will go up too. But no doubt http access will be dramatically fast :)

Not to mention, squid is only an http caching proxy, not ftp.
That seems like a great thing to use for Arch packages, as well as a lot of stuff really. Think about how much faster some websites and stuff can load if you already have all the common images downloaded to your LAN.
squid is great but I doubt it can help with multiple computers with Arch. It can handle only download caching, but that's not enough.

E.g. I have two Arch boxes. The installations are not exactly identical. I am using a shared network cache on nfs as detailed on the Arch wiki. How and when should I run "pacman -Sc" to clean the cache?

There should be one way of supporting multiple computers that handles all the package scenarios, including download, upgrade, deinstallation, clean-up and rotation. +1 if it could handle identical and/or shared installations. (Yes, I read about pkgdd but I need something that's in core/extra as it gives extra confidence in terms of testing.)

Arch is growing and it needs more meat :)

--
Shridhar
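For what it's worth, squid's object-size limit can be raised and package URLs given their own refresh rule, so the large-object concern above is at least tunable. A minimal squid.conf sketch (the directives are standard squid ones; the sizes and cache path are only illustrative):

    # allow individual cached objects up to ~150 MB (openoffice-sized packages)
    maximum_object_size 150 MB

    # give the on-disk cache enough room for a reasonable set of packages (~20 GB)
    cache_dir ufs /var/spool/squid 20000 16 256

    # released packages never change, so keep .pkg.tar.gz hits around for a long time
    refresh_pattern \.pkg\.tar\.gz$  10080 100% 43200

That still leaves the clean-up and rotation questions above unanswered, of course; squid only sees downloads.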
On Tue, May 19, 2009 at 11:11, Shridhar Daithankar <ghodechhap@ghodechhap.net> wrote:
(Yes, I read about pkgdd but I need something that's in core/extra as it gives extra confidence in terms of testing.)
You can definitely trust Xyne; he's a great author. You really don't need to be so paranoid. :P Plus you can just look at the code to see if it's trustworthy or not...
On Tuesday 19 May 2009 22:29:05 Daenyth Blank wrote:
On Tue, May 19, 2009 at 11:11, Shridhar Daithankar
<ghodechhap@ghodechhap.net> wrote:
(Yes, I read about pkgdd but I need something that's in core/extra as it gives extra confidence in terms of testing.)
You can definitely trust Xyne; he's a great author. You really don't need to be so paranoid. :P Plus you can just look at the code to see if it's trustworthy or not...
:) As I said earlier, this is not about the author or the package. It's just that such a solution needs more attention and an effort to make it mainstream. I am going to try it out and upvote it for inclusion in core.

--
Shridhar
(...) When a computer on the network asks for a file that's been downloaded previously, there is no need to go into the Internet.
Yes and no.
Arch packages are not exactly small. I run a squid cache and a cache object size of 128KB serves me pretty well. To accommodate all Arch packages, this setting has to go up to maybe 150MB (for openoffice). If the cache starts caching every object of size up to 150MB, it won't be as effective or it will balloon dramatically. Not to mention the memory requirement will go up too.
I'm under the impression that you can configure it in other ways and not just space, therefore letting it work for Arch packages (say, from your favourite mirrors) and not from everywhere. Yeah, it does increase the requirements, but I'm sure it's handleable.
But no doubt http access will be dramatically fast :)
Not to mention, squid is only an http caching proxy, not ftp.
"Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more." -- their website.
squid is great but I doubt it can help with multiple computers with Arch. It can handle only download caching, but that's not enough.
(snip)
Yeah, some decent ideas there. -AT
On Tue, May 19, 2009 at 2:03 PM, Andrei Thorp <garoth@gmail.com> wrote:
(...) When a computer on the network asks for a file that's been downloaded previously, there is no need to go into the Internet.
Yes and no.
Arch packages are not exactly small. I run a squid cache and a cache object size of 128KB serves me pretty well. To accommodate all Arch packages, this setting has to go up to maybe 150MB (for openoffice). If the cache starts caching every object of size up to 150MB, it won't be as effective or it will balloon dramatically. Not to mention the memory requirement will go up too.
I'm under the impression that you can configure it in other ways and not just space, therefore letting it work for Arch packages (say, from your favourite mirrors) and not from everywhere. Yeah, it does increase the requirements, but I'm sure it's handleable.
But no doubt http access will be dramatically fast :)
Not to mention, squid is only an http caching proxy, not ftp.
"Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more." -- their website.
squid is great but I doubt it can help with multiple computers with Arch. It can handle only download caching, but that's not enough.
(snip)
Yeah, some decent ideas there.
-AT
Another solution is to have all computers using only one pacman cache located on a single computer via nfs. So once a computer has downloaded a package, all the other ones can grab it directly from the local network.

If you have i686 and x86_64 computers, pacman can't differentiate between the two arches if the package name doesn't contain the arch (old pkgs and community pkgs). It reports an md5sum mismatch. You just need to say 'yes' to redownload the package when that happens. If you want to get rid of that problem, set up two caches: one for each arch.
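A rough sketch of that shared-cache setup, assuming the cache lives on a machine called "server" and the LAN is 192.168.1.0/24 (both purely examples):

    # on the server, export the package cache (/etc/exports)
    /var/cache/pacman/pkg  192.168.1.0/24(rw,sync,no_subtree_check)

    # on each client, mount it over the local cache directory (/etc/fstab)
    server:/var/cache/pacman/pkg  /var/cache/pacman/pkg  nfs  defaults  0 0

    # pacman already looks there by default; in /etc/pacman.conf it is
    # CacheDir = /var/cache/pacman/pkg/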
On Tue, May 19, 2009 at 3:46 PM, Eric Bélanger <snowmaniscool@gmail.com> wrote:
Another solution is to have all computers using only one pacman cache located on a single computer via nfs. So once a computer has downloaded a package, all the other ones can grab it directly from the local network.
I might be wrong but I think pacman is downloading the packages directly to the cache directory. This might cause problems if the download starts on one host and is started on another host before the download is done. pacman will then report the package as corrupted but doesn't try to redownload it automatically. It won't do any harm but it can get annoying if you have a large number of systems.

Sébastien
2009/5/19 Sébastien Duquette <ekse.0x@gmail.com>:
On Tue, May 19, 2009 at 3:46 PM, Eric Bélanger <snowmaniscool@gmail.com> wrote:
Another solution is to have all computers using only one pacman cache located on a single computer via nfs. So once a computer has downloaded a package, all the other ones can grab it directly from the local network.
I might be wrong but I think pacman is downloading the packages directly to the cache directory. This might cause problems if the download starts on one host and is started on another host before the download is done. pacman will then report the package as corrupted but doesn't try to redownload it automatically. It won't do any harm but it can get annoying if you have a large number of systems.
Sébastien
I believe you are correct. I only used that setup to share the source cache when building packages. However, I was the only user and was building on one machine at a time (I only have two), so I didn't encounter this multiple-download problem. Perhaps the method I proposed is more suitable for cases when there is only one or very few users.

Eric
Eric Bélanger wrote:
2009/5/19 Sébastien Duquette <ekse.0x@gmail.com>:
On Tue, May 19, 2009 at 3:46 PM, Eric Bélanger <snowmaniscool@gmail.com> wrote:
Another solution is to have all computers using only one pacman cache located on a single computer via nfs. So once a computer has downloaded a package, all the other ones can grab it directly from the local network.
I might be wrong but I think pacman is downloading the packages directly to the cache directory. This might cause problems if the download starts on one host and is started on another host before the download is done. pacman will then report the package as corrupted but doesn't try to redownload it automatically. It won't do any harm but it can get annoying if you have a large number of systems.
Sébastien
I believe you are correct. I only used that setup to share the source cache when building packages. However, I was the only user and was building on one machine at a time (I only have two), so I didn't encounter this multiple-download problem. Perhaps the method I proposed is more suitable for cases when there is only one or very few users.
That is definitely correct - I have had issues in the past where I was using the same cache for my chroot and actual system and tried updating both at once. Adding a lock file to the cache is on my TODO list, although it is a simple patch if someone wants to beat me to it...

Allan
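Until a cache lock lands in pacman itself, updates can be serialized from the calling side. A crude sketch of a wrapper for hosts sharing the cache over NFS (the lock directory name is arbitrary; mkdir is used because it is atomic even on NFS, where flock is often unreliable):

    #!/bin/bash
    # run pacman only while holding a lock directory inside the shared cache
    LOCKDIR=/var/cache/pacman/pkg/.update-lock

    until mkdir "$LOCKDIR" 2>/dev/null; do
        echo "cache locked by another host, waiting..."
        sleep 10
    done
    trap 'rmdir "$LOCKDIR"' EXIT    # release the lock even if pacman fails

    pacman -Syu "$@"

A stale lock left behind by a crashed host would still have to be removed by hand.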
On or about Tuesday 19 May 2009 at approximately 08:14:13 Andrei Thorp composed:
Hey,
We see people trying to make an Arch repo mirror to save themselves bandwidth. I think that doesn't really make too much sense. Instead, it seems much better to implement a download proxy. The way this works is that all traffic is routed through a computer which backs up stuff that passes through it. When a computer on the network asks for a file that's been downloaded previously, there is no need to go into the Internet.
That seems like a great thing to use for Arch packages, as well as a lot of stuff really. Think about how much faster some websites and stuff can load if you already have all the common images downloaded to your LAN.
Here's a link.
-AT
(Man.. I really want to set up a Linux router box...)
Now that looks like a clever solution. Which is what I am essentially trying to do, but without the "proxy mechanism" serving the files.

I have several servers; one is my 1st Arch box (x86_64), and my laptop is also x86_64. The present main server is an older openSuSE box. What I was wanting to do was to update the Arch server, then move the cache to /home/backup/archlinux and share that dir either by http, ftp, or both. Then it seemed like a simple thing to just point my laptop to the /home/backup/archlinux dir on the server, edit pacman.conf to put the Server line for it at the top, and then pull updates to my laptop from the other Arch box.

The only issue I encountered in the thought process was handling multiple versions on the server, which was solved with a bit of bash scripting. If squid was involved, then wouldn't the "which duplicate" problem still apply from the proxy?

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
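One way to make that shared directory usable as a proper repo, rather than relying on mirror fallback, is to generate a database for it with repo-add and give it its own section on the clients. A sketch, with the repo name, hostname, and paths purely as examples:

    # on the server, after pruning duplicates: build/refresh the database
    cd /home/backup/archlinux/x86_64
    repo-add ./custom.db.tar.gz ./*.pkg.tar.gz

    # on the laptop, in /etc/pacman.conf, above [core] so it is preferred:
    # [custom]
    # Server = http://server/archlinux/x86_64

The database has to be regenerated whenever packages are added or rotated out to oldpkgs.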
participants (7)
- Allan McRae
- Andrei Thorp
- Daenyth Blank
- David C. Rankin, J.D.,P.E.
- Eric Bélanger
- Shridhar Daithankar
- Sébastien Duquette