On Tue, Nov 3, 2009 at 7:23 AM, Thomas Bächler <thomas@archlinux.org> wrote:
> Dan McGee wrote:
>> Realize that this has drawbacks; someone who is fetching (not cloning)
>> over HTTP will have to redownload the whole pack rather than just the
>> incremental changeset. You may want something more like the included
>> script, as it gives you the benefits of compressing objects without
>> creating one huge pack.
>>
>> -Dan
>>
>> $ cat bin/prunerepos
>> #!/bin/sh
>>
>> cwd=$(pwd)
>> for dir in $(ls | grep -F '.git'); do
>>     cd $cwd/$dir
>>     echo "pruning and packing $cwd/$dir..."
>>     git prune
>>     git repack -d
>> done
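For context on that drawback: a client fetching over plain HTTP (the "dumb" protocol) discovers packs by reading objects/info/packs, which git update-server-info regenerates (usually from a post-update hook). After a full gc the only pack listed is the single big one, so the client has to fetch it whole; after an incremental repack the old packs stay listed and only the new, small one needs downloading. Roughly, using the pack names from the AUR listing below:

$ cat objects/info/packs
P pack-2def16dc5d8361b8a7c11e60e10c503ba9874fdb.pack
P pack-c7bd96b6fc392799991ad88824f935c09d470efa.pack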
> I realize that, but is it something we should really be concerned about?
> With our small repositories, the overhead of downloading a bunch of small
> files might even outweigh the size of a big pack.
That is the whole point: repack doesn't create small files, it bundles them up for you. Downloading 3 packs is still quicker than downloading 1 big one if we do it once a week.

The AUR pack is quite huge and that repository is under active development, so I would feel bad gc-ing it when a simple repack (I just did one) will do, creating only a 230K pack:

$ ll objects/pack/
total 8.7M
-r--r--r-- 1 simo aur-git  22K 2009-11-03 08:28 pack-2def16dc5d8361b8a7c11e60e10c503ba9874fdb.idx
-r--r--r-- 1 simo aur-git 230K 2009-11-03 08:28 pack-2def16dc5d8361b8a7c11e60e10c503ba9874fdb.pack
-r--r--r-- 1 simo aur-git 139K 2009-01-22 21:38 pack-c7bd96b6fc392799991ad88824f935c09d470efa.idx
-r--r--r-- 1 simo aur-git 8.3M 2009-01-22 21:38 pack-c7bd96b6fc392799991ad88824f935c09d470efa.pack

And if it is still a problem we can always just switch to git-gc later; we don't need to skip this intermediate step.
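To spell out the difference, as I understand the flags (see git-repack(1)):

# incremental: pack loose objects into a new pack, leave existing packs alone
$ git prune && git repack -d

# full: -a folds everything, existing packs included, into one pack,
# which dumb-HTTP fetchers would then have to redownload wholesale
$ git repack -a -d

# gc is roughly the full repack plus pruning, reflog expiry, and friends
$ git gc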
> pacman.git is our biggest and currently has a 5.4MB pack when you gc it.
Note that this is an incredibly compacted initial pack; the repository will weigh in at around 9 MB if you pack it locally. I had to pull some tricks to get it that small.
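The tricks aren't spelled out here, but an aggressive repack along these lines is the usual way to squeeze a pack down, at the cost of CPU time (just an illustration, not necessarily what was done for pacman.git):

# -f throws away existing deltas and recomputes them; a large window and
# depth let git search harder for good deltas, trading CPU for pack size
$ git repack -a -d -f --window=250 --depth=250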
> Or maybe we should prune && repack them weekly, but gc them monthly or
> every 2 months?
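A rough sketch of that schedule as crontab entries (the /srv/git path and the times are made up):

# weekly: cheap incremental prune + repack, keeps dumb-HTTP fetches small
0 3 * * 0  cd /srv/git && $HOME/bin/prunerepos
# monthly: full gc, collapsing each repository into one well-packed pack
0 4 1 * *  for d in /srv/git/*.git; do git --git-dir="$d" gc --quiet; done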
> Last week, we had HTTP access to http://projects.archlinux.org/git/ (not
> counting 403s and 404s) from 12 different IPs; 66 the week before that,
> then 63 and 84. I hope most people use git://.
I also hope most people use git://, but I don't want to leave in the dust those who can't. They are also likely the ones with the worst internet connections, so watching out for them might be the nice thing to do.