[pacman-dev] [PATCH] libmakepkg/source/git: Use --bare with a refspec instead of --mirror
This pulls in all of the branches in the same way as --mirror, but won't also pull in all of the non-branch references.

For example the refs/pull/*/{head,merge} references that GitHub creates for every PR that has ever been opened against the repo can pull in a very large amount of objects that aren't useful, and which can massively inflate a repository.

Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
---
 scripts/libmakepkg/source/git.sh.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/libmakepkg/source/git.sh.in b/scripts/libmakepkg/source/git.sh.in
index 130c11e1..6a75c1c2 100644
--- a/scripts/libmakepkg/source/git.sh.in
+++ b/scripts/libmakepkg/source/git.sh.in
@@ -43,7 +43,7 @@ download_git() {
 	if [[ ! -d "$dir" ]] || dir_is_empty "$dir" ; then
 		msg2 "$(gettext "Cloning %s %s repo...")" "${repo}" "git"
-		if ! git clone --mirror "$url" "$dir"; then
+		if ! git clone --bare --config=remote.origin.fetch=+refs/heads/*:refs/heads/* "$url" "$dir"; then
 			error "$(gettext "Failure while downloading %s %s repo")" "${repo}" "git"
 			plain "$(gettext "Aborting...")"
 			exit 1
-- 
2.20.1
On 1/20/19 9:13 AM, Johannes Löthberg wrote:
> This pulls in all of the branches in the same way as --mirror, but won't also pull in all of the non-branch references.
>
> For example the refs/pull/*/{head,merge} references that GitHub creates for every PR that has ever been opened against the repo can pull in a very large amount of objects that aren't useful, and which can massively inflate a repository.

It's entirely possible that people use this to cherry-pick a patch from a PR branch. That being said, I do consider it reasonable to not fetch this by default and pull in the patchfile via source=() if you do need it... but I wonder how often people might be relying on this behavior.

-- 
Eli Schwartz
Bug Wrangler and Trusted User
On 01/20/2019 07:30:21 PM, Eli Schwartz wrote:
> It's entirely possible that people use this to cherry-pick a patch from a PR branch. That being said, I do consider it reasonable to not fetch this by default and pull in the patchfile via source=() if you do need it... but I wonder how often people might be relying on this behavior.

Is it possible to add something like GIT_OPTIONS or VCS_OPTIONS to pass custom options for cloning? For end users who wish to install e.g. qgis from git but have no plans to hack on it, a clone with the '--depth 1' option would be _much_ faster.
On Sun, Jan 20, 2019 at 08:40:40PM +0300, Versus via pacman-dev wrote:
> e.g. qgis from git but have no plans to hack it, clone with '--depth 1'
How would the pkgver() work?
On 1/20/19 12:40 PM, Versus via pacman-dev wrote:
> Is it possible to add something like GIT_OPTIONS or VCS_OPTIONS to pass custom options for cloning? For end users who wish to install e.g. qgis from git but have no plans to hack on it, a clone with the '--depth 1' option would be _much_ faster.

Rejected on numerous occasions, and it's an unrelated topic.

The first major issue you will have is that the pkgver() function for qgis-git uses:

    printf "%s.r%s" "$_pkgver" "$(git rev-list --count HEAD).$(git rev-parse --short HEAD)"

And this will break if you delete all the rev-list history.

-- 
Eli Schwartz
Bug Wrangler and Trusted User
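To make the pkgver() concern concrete, here is a small sketch that compares the `git rev-list --count HEAD` result in a full clone and in a '--depth 1' clone of the same throwaway repository. The repository, its three empty commits and the demo identity are all invented for the illustration; it does not touch qgis.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway upstream with three commits of history.
git -c init.defaultBranch=main init -q "$work/upstream"
for i in 1 2 3; do
    git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "commit $i"
done

# A file:// URL is used because git ignores --depth for plain local paths.
git clone -q --bare "file://$work/upstream" "$work/full.git"
git clone -q --bare --depth 1 "file://$work/upstream" "$work/shallow.git"

# This is the revision-count part of the pkgver() quoted above.
full_count=$(git -C "$work/full.git" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow.git" rev-list --count HEAD)

echo "full clone:    r$full_count"
echo "shallow clone: r$shallow_count"
```

The shallow clone only sees the single commit it fetched, so the "r" component of the version no longer reflects the real history length.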
On 21/1/19 2:30 am, Eli Schwartz wrote:
> It's entirely possible that people use this to cherry-pick a patch from a PR branch. That being said, I do consider it reasonable to not fetch this by default and pull in the patchfile via source=() if you do need it... but I wonder how often people might be relying on this behavior.

I have done that in the past - I found it to be good documentation of where the patch came from rather than using a local copy of the patch.

I'd like to see an example of what is meant by "massively inflate"? What percentage are we talking?

A
Excerpts from Allan McRae's message of January 20, 2019 22:30:
> I have done that in the past - I found it to be good documentation of where the patch came from rather than using a local copy of the patch.

For that I much prefer just using GitHub .patch URLs in the sources array, which will get you a file that you can just throw at git-am. That not only tells you where it's from, you can also copy the URL and remove the .patch at the end to see the code review of it. Depending on non-branch or tag refs that by default aren't cloned feels rather icky to me overall, but maybe that's just me.
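A hypothetical PKGBUILD fragment sketching that approach. OWNER, REPO and the PR number 123 are placeholders and not taken from this thread, and `git apply` is used in place of git-am so that no commits are created; that substitution is an assumption of this sketch, not something proposed in the message.

```shell
# Hypothetical PKGBUILD fragment; OWNER, REPO and PR number 123 are placeholders.
source=("git+https://github.com/OWNER/REPO.git"
        "pr-123.patch::https://github.com/OWNER/REPO/pull/123.patch")
# Replace the second entry with the real checksum of the downloaded patch.
sha256sums=('SKIP'
            'SKIP')

prepare() {
  cd "$srcdir/REPO"
  # Apply the PR patch fetched through source=() instead of relying on refs/pull/*.
  git apply "$srcdir/pr-123.patch"
}
```

Dropping the .patch suffix from the second URL gives the review page for the same PR, which keeps the origin of the patch visible in the PKGBUILD itself.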
> I'd like to see an example of what is meant by "massively inflate"? What percentage are we talking?

I distinctly remember having cases where there were old PRs from before a rebase that ended up pulling in multiple gigabytes of data, but I cannot seem to figure out which repos they were anymore.

Of the larger repos I've looked at quickly it seems the difference for most of them is roughly 100-500MB. For Linus' linux repo the difference is 446MB, 113MB for rust.

Since I can't really find the really nasty cases anymore, I guess you can feel free to reject it, though I still think that the change is more morally correct, and even a 500MB difference can affect some users.

-- 
Sincerely,
  Johannes Löthberg :: SA0DEM
On 23/1/19 3:39 am, Johannes Löthberg wrote:
> I distinctly remember having cases where there were old PRs from before a rebase that ended up pulling in multiple gigabytes of data, but I cannot seem to figure out which repos they were anymore.
>
> Of the larger repos I've looked at quickly it seems the difference for most of them is roughly 100-500MB. For Linus' linux repo the difference is 446MB, 113MB for rust.
>
> Since I can't really find the really nasty cases anymore, I guess you can feel free to reject it, though I still think that the change is more morally correct, and even a 500MB difference can affect some users.

Just because I wanted to look at the numbers (percentage increase is more interesting than absolute value), here is what I get for rust:

git clone --mirror:
Download: 506.98 MiB

git clone --bare:
Download: 406.00 MiB

So that is a decent overhead.
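For reference, the percentage behind those two download sizes can be worked out as follows. This is only arithmetic on the figures quoted above; the variable names are made up for the sketch.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # keep "." as the decimal separator for awk

mirror_mib=506.98   # git clone --mirror, figure from the message above
bare_mib=406.00     # git clone --bare, figure from the message above

# Extra data downloaded by --mirror, absolute and relative to the --bare size.
extra_mib=$(awk -v m="$mirror_mib" -v b="$bare_mib" 'BEGIN { printf "%.2f", m - b }')
increase_pct=$(awk -v m="$mirror_mib" -v b="$bare_mib" 'BEGIN { printf "%.1f", (m - b) / b * 100 }')

echo "extra download: ${extra_mib} MiB"
echo "increase over --bare: ${increase_pct}%"
```

That works out to 100.98 MiB extra, or roughly a 24.9% larger download for the mirror clone of rust.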
participants (5)
- Allan McRae
- Earnestly
- Eli Schwartz
- Johannes Löthberg
- Versus