[pacman-dev] [PATCH] makepkg: use "shared" git clones when checking out sources

Eli Schwartz eschwartz at archlinux.org
Fri Mar 8 04:18:23 UTC 2019


In order to cache sources offline, makpekg creates *two* copies of every
git repo. This is a useful tradeoff for network time, but comes at the
cost of increased disk space.

Normally, git can smooth this over automagically. Whenever possible, git
objects are hardlinked to save space, but this does not work when
SRCDEST and BUILDDIR are on separate filesystems.

When the repo in question is both very large (linux.git for example is
2.2 GB) and crosses filesystem boundaries, this results in a lot of
extra disk space being used; the most likely scenario is where BUILDDIR
is a tmpfs for bonus ouch.

git(1) has a builtin feature which serves this case handily: the
--shared flag will create the info/alternates file instructing git to
not copy or hardlink or create objects/packs at all, but merely look for
them in an external location (that being the source of the clone).

The downside of using shared clones, is that if you modify and drop
commits from the original repo, or simply delete the whole repo
altogether, you break the copy. But we don't care about that here,
because

1) the BUILDDIR copy is meant to be a temporary copy strictly derived
   via PKGBUILD syntax from the SRCDEST, and must be able to be
   recreated at any time,
2) if the SRCDEST disappears, makepkg will redownload it, thus restoring
   the objects needed by the BUILDDIR clone,
3) if the user does non-default things like hacking on the BUILDDIR copy
   then deleting and re-cloning the SRCDEST may result in momentary
   breakage, but ultimately should be fine -- the unique objects they
   created will be stored in the BUILDDIR copy.

While it's theoretically possible that upstream will force-push to
overwrite the base tree from which makepkg is building (which they
should not do), *and* the user deleted their SRCDEST which they should
not do, *and* they saved work in makepkg's working directory which they
should not do either...
... this is an unlikely chain of events for which we should not care.

Using --shared is therefore helpful in immediately useful ways and IMHO
has no actual downsides; we should use it.

An alternative implementation would be to use worktrees. I've rejected
this since it is essentially the same as shared clones, except adding
additional restrictions on the branch namespace, and could potentially
break existing use cases such as manually handling the SRCDEST in order
to share repositories with normal working copies.

Signed-off-by: Eli Schwartz <eschwartz at archlinux.org>
---
 scripts/libmakepkg/source/git.sh.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/libmakepkg/source/git.sh.in b/scripts/libmakepkg/source/git.sh.in
index 497a668c..96d79623 100644
--- a/scripts/libmakepkg/source/git.sh.in
+++ b/scripts/libmakepkg/source/git.sh.in
@@ -91,7 +91,7 @@ extract_git() {
 			exit 1
 		fi
 		cd_safe "$srcdir"
-	elif ! git clone "$dir" "${dir##*/}"; then
+	elif ! git clone -s "$dir" "${dir##*/}"; then
 		error "$(gettext "Failure while creating working copy of %s %s repo")" "${repo}" "git"
 		plain "$(gettext "Aborting...")"
 		exit 1
-- 
2.21.0


More information about the pacman-dev mailing list