[pacman-dev] [PATCH] makepkg: use "shared" git clones when checking out sources
In order to cache sources offline, makpekg creates *two* copies of every git repo. This is a useful tradeoff for network time, but comes at the cost of increased disk space. Normally, git can smooth this over automagically. Whenever possible, git objects are hardlinked to save space, but this does not work when SRCDEST and BUILDDIR are on separate filesystems. When the repo in question is both very large (linux.git for example is 2.2 GB) and crosses filesystem boundaries, this results in a lot of extra disk space being used; the most likely scenario is where BUILDDIR is a tmpfs for bonus ouch. git(1) has a builtin feature which serves this case handily: the --shared flag will create the info/alternates file instructing git to not copy or hardlink or create objects/packs at all, but merely look for them in an external location (that being the source of the clone). The downside of using shared clones, is that if you modify and drop commits from the original repo, or simply delete the whole repo altogether, you break the copy. But we don't care about that here, because 1) the BUILDDIR copy is meant to be a temporary copy strictly derived via PKGBUILD syntax from the SRCDEST, and must be able to be recreated at any time, 2) if the SRCDEST disappears, makepkg will redownload it, thus restoring the objects needed by the BUILDDIR clone, 3) if the user does non-default things like hacking on the BUILDDIR copy then deleting and re-cloning the SRCDEST may result in momentary breakage, but ultimately should be fine -- the unique objects they created will be stored in the BUILDDIR copy. While it's theoretically possible that upstream will force-push to overwrite the base tree from which makepkg is building (which they should not do), *and* the user deleted their SRCDEST which they should not do, *and* they saved work in makepkg's working directory which they should not do either... ... this is an unlikely chain of events for which we should not care. Using --shared is therefore helpful in immediately useful ways and IMHO has no actual downsides; we should use it. An alternative implementation would be to use worktrees. I've rejected this since it is essentially the same as shared clones, except adding additional restrictions on the branch namespace, and could potentially break existing use cases such as manually handling the SRCDEST in order to share repositories with normal working copies. Signed-off-by: Eli Schwartz <eschwartz@archlinux.org> --- scripts/libmakepkg/source/git.sh.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/libmakepkg/source/git.sh.in b/scripts/libmakepkg/source/git.sh.in index 497a668c..96d79623 100644 --- a/scripts/libmakepkg/source/git.sh.in +++ b/scripts/libmakepkg/source/git.sh.in @@ -91,7 +91,7 @@ extract_git() { exit 1 fi cd_safe "$srcdir" - elif ! git clone "$dir" "${dir##*/}"; then + elif ! git clone -s "$dir" "${dir##*/}"; then error "$(gettext "Failure while creating working copy of %s %s repo")" "${repo}" "git" plain "$(gettext "Aborting...")" exit 1 -- 2.21.0
On 8/3/19 2:18 pm, Eli Schwartz wrote:
In order to cache sources offline, makpekg creates *two* copies of every
Fixed this typo.
git repo. This is a useful tradeoff for network time, but comes at the cost of increased disk space.
Normally, git can smooth this over automagically. Whenever possible, git objects are hardlinked to save space, but this does not work when SRCDEST and BUILDDIR are on separate filesystems.
When the repo in question is both very large (linux.git for example is 2.2 GB) and crosses filesystem boundaries, this results in a lot of extra disk space being used; the most likely scenario is where BUILDDIR is a tmpfs for bonus ouch.
git(1) has a builtin feature which serves this case handily: the --shared flag will create the info/alternates file instructing git to not copy or hardlink or create objects/packs at all, but merely look for them in an external location (that being the source of the clone).
The downside of using shared clones, is that if you modify and drop commits from the original repo, or simply delete the whole repo altogether, you break the copy. But we don't care about that here, because
1) the BUILDDIR copy is meant to be a temporary copy strictly derived via PKGBUILD syntax from the SRCDEST, and must be able to be recreated at any time, 2) if the SRCDEST disappears, makepkg will redownload it, thus restoring the objects needed by the BUILDDIR clone, 3) if the user does non-default things like hacking on the BUILDDIR copy then deleting and re-cloning the SRCDEST may result in momentary breakage, but ultimately should be fine -- the unique objects they created will be stored in the BUILDDIR copy.
While it's theoretically possible that upstream will force-push to overwrite the base tree from which makepkg is building (which they should not do), *and* the user deleted their SRCDEST which they should not do, *and* they saved work in makepkg's working directory which they should not do either... ... this is an unlikely chain of events for which we should not care.
Using --shared is therefore helpful in immediately useful ways and IMHO has no actual downsides; we should use it.
An alternative implementation would be to use worktrees. I've rejected this since it is essentially the same as shared clones, except adding additional restrictions on the branch namespace, and could potentially break existing use cases such as manually handling the SRCDEST in order to share repositories with normal working copies.
Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
I don't think the commit message is long enough relative to the size of the change. But I will accept anyway. A
--- scripts/libmakepkg/source/git.sh.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/libmakepkg/source/git.sh.in b/scripts/libmakepkg/source/git.sh.in index 497a668c..96d79623 100644 --- a/scripts/libmakepkg/source/git.sh.in +++ b/scripts/libmakepkg/source/git.sh.in @@ -91,7 +91,7 @@ extract_git() { exit 1 fi cd_safe "$srcdir" - elif ! git clone "$dir" "${dir##*/}"; then + elif ! git clone -s "$dir" "${dir##*/}"; then error "$(gettext "Failure while creating working copy of %s %s repo")" "${repo}" "git" plain "$(gettext "Aborting...")" exit 1
On 3/19/19 12:09 AM, Allan McRae wrote:
On 8/3/19 2:18 pm, Eli Schwartz wrote:
In order to cache sources offline, makpekg creates *two* copies of every
Fixed this typo.
git repo. This is a useful tradeoff for network time, but comes at the cost of increased disk space.
Normally, git can smooth this over automagically. Whenever possible, git objects are hardlinked to save space, but this does not work when SRCDEST and BUILDDIR are on separate filesystems.
When the repo in question is both very large (linux.git for example is 2.2 GB) and crosses filesystem boundaries, this results in a lot of extra disk space being used; the most likely scenario is where BUILDDIR is a tmpfs for bonus ouch.
git(1) has a builtin feature which serves this case handily: the --shared flag will create the info/alternates file instructing git to not copy or hardlink or create objects/packs at all, but merely look for them in an external location (that being the source of the clone).
The downside of using shared clones, is that if you modify and drop commits from the original repo, or simply delete the whole repo altogether, you break the copy. But we don't care about that here, because
1) the BUILDDIR copy is meant to be a temporary copy strictly derived via PKGBUILD syntax from the SRCDEST, and must be able to be recreated at any time, 2) if the SRCDEST disappears, makepkg will redownload it, thus restoring the objects needed by the BUILDDIR clone, 3) if the user does non-default things like hacking on the BUILDDIR copy then deleting and re-cloning the SRCDEST may result in momentary breakage, but ultimately should be fine -- the unique objects they created will be stored in the BUILDDIR copy.
While it's theoretically possible that upstream will force-push to overwrite the base tree from which makepkg is building (which they should not do), *and* the user deleted their SRCDEST which they should not do, *and* they saved work in makepkg's working directory which they should not do either... ... this is an unlikely chain of events for which we should not care.
Using --shared is therefore helpful in immediately useful ways and IMHO has no actual downsides; we should use it.
An alternative implementation would be to use worktrees. I've rejected this since it is essentially the same as shared clones, except adding additional restrictions on the branch namespace, and could potentially break existing use cases such as manually handling the SRCDEST in order to share repositories with normal working copies.
Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
I don't think the commit message is long enough relative to the size of the change. But I will accept anyway.
Sorry! I will try harder next time. -- Eli Schwartz Bug Wrangler and Trusted User
participants (2)
-
Allan McRae
-
Eli Schwartz