[arch-dev-public] Cronjob for regular git garbage collection

Dan McGee dpmcgee at gmail.com
Tue Nov 3 08:30:56 EST 2009


On Tue, Nov 3, 2009 at 7:23 AM, Thomas Bächler <thomas at archlinux.org> wrote:
> Dan McGee wrote:
>>
>> Realize that this has drawbacks; someone that is fetching (not
>> cloning) over HTTP will have to redownload the whole pack again and
>> not just the incremental changeset. You may want something more like
>> the included script as it gives you the benefits of compressing
>> objects but not creating one huge pack.
>>
>> -Dan
>>
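>> $ cat bin/prunerepos
>> #!/bin/sh
>>
>> # Prune unreachable objects and incrementally repack every *.git
>> # repository in the current directory.
>> cwd=$(pwd)
>>
>> for dir in "$cwd"/*.git; do
>>        [ -d "$dir" ] || continue
>>        cd "$dir" || continue
>>        echo "pruning and packing $dir..."
>>        git prune
>>        git repack -d
>> done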
>
> I realize that, is it something we should be really concerned about? With
> our small repositories, the overhead of downloading a bunch of small files
> might even outweigh the size of a big pack.

That is the whole point: repack doesn't create small files; it bundles
them into a pack for you. Downloading 3 small packs is still quicker
than downloading 1 big one if we do it once a week. The AUR pack is
quite large and that repository is under active development, so I would
feel bad gc-ing it when a simple repack (I just did one) will do,
creating only a 230K pack:
$ ll objects/pack/
total 8.7M
-r--r--r-- 1 simo aur-git  22K 2009-11-03 08:28 pack-2def16dc5d8361b8a7c11e60e10c503ba9874fdb.idx
-r--r--r-- 1 simo aur-git 230K 2009-11-03 08:28 pack-2def16dc5d8361b8a7c11e60e10c503ba9874fdb.pack
-r--r--r-- 1 simo aur-git 139K 2009-01-22 21:38 pack-c7bd96b6fc392799991ad88824f935c09d470efa.idx
-r--r--r-- 1 simo aur-git 8.3M 2009-01-22 21:38 pack-c7bd96b6fc392799991ad88824f935c09d470efa.pack
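
For checking the same thing on other repositories, a rough sketch of a
helper follows. The function name pack_summary is made up for
illustration, and sizes come from du, so they are rounded to disk
blocks; git's own "git count-objects -v" reports similar numbers in its
"packs" and "size-pack" fields.

```shell
#!/bin/sh
# Hypothetical helper: count the packs in a bare repository and add up
# their on-disk size. Usage: pack_summary /path/to/repo.git
pack_summary() {
    pack_dir="$1/objects/pack"
    count=0
    total_kb=0
    for pack in "$pack_dir"/*.pack; do
        [ -f "$pack" ] || continue      # glob matched nothing
        # du -k prints the size in 1 KiB units as the first field
        size_kb=$(du -k "$pack" | awk '{print $1}')
        count=$((count + 1))
        total_kb=$((total_kb + size_kb))
    done
    echo "$count packs, ${total_kb}K total"
}
```

A small, recent pack sitting next to one large, old pack is what an
incremental "git repack -d" should leave behind.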

And if it is still a problem we can always just switch to git-gc
later; we don't need to skip this intermediate step.

> pacman.git is our biggest and currently has a 5.4MB pack when you gc it.

Note that this is an incredibly compacted initial pack; the repository
will weigh in at around 9 MB if you pack it locally. I had to pull some
tricks to get it that small.

> Or maybe we should prune && repack them weekly, but gc them monthly or every
> 2 months?
>
> Last week, we had http access to http://projects.archlinux.org/git/ (not
> counting 403s and 404s) from 12 different IPs, 66 the week before that, then
> 63 and 84. I hope most people use git://.

I also hope most people use git://, but I don't want to leave those
who can't in the dust. They are also likely the ones with the worst
internet connections, so watching out for them might be the nice thing
to do.
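
The schedule suggested above (prune and repack weekly, gc only about
once a month) could be sketched roughly as below. This is only an
illustration under assumptions: the function names pick_action and
maintain_repos are made up, and the rule "gc when the run falls in the
first seven days of the month" is just one simple way to get a monthly
gc out of a weekly job.

```shell
#!/bin/sh
# Sketch of a weekly maintenance job: incremental repack most weeks,
# full gc on the first run of each month. All names are hypothetical.

# Print "gc" if the given day of the month falls in the first week,
# otherwise "repack".
pick_action() {
    day=${1#0}                  # strip a leading zero ("07" -> "7")
    if [ "$day" -le 7 ]; then
        echo gc
    else
        echo repack
    fi
}

# Run the chosen action on every *.git directory directly below $1.
maintain_repos() {
    root=$1
    action=$(pick_action "$(date +%d)")
    for dir in "$root"/*.git; do
        [ -d "$dir" ] || continue
        echo "$action: $dir"
        if [ "$action" = gc ]; then
            git --git-dir="$dir" gc --quiet
        else
            git --git-dir="$dir" prune &&
                git --git-dir="$dir" repack -d -q
        fi
    done
}
```

Called once a week from cron (for example with a "0 4 * * 0" schedule
and whatever directory holds the repositories as the argument), exactly
one run per month lands on days 1 to 7, so HTTP fetchers would only
have to redownload a full pack about once a month.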

