[arch-general] Why no git --depth=1 option for makepkg?
Hi

I recently came across this closed feature request for a way to perform a shallow clone with makepkg.

https://bugs.archlinux.org/task/52957

The closing comment was:

Closed by Andrew Gregory (andrewgregory)
Monday, 13 February 2017, 17:29 GMT-9
Reason for closing: Won't implement
Additional comments about closing: This has been rejected numerous times: https://wiki.archlinux.org/index.php/Use r:Apg#makepkg:_shallow_git_clones

Which provides a now dead link.

Would anyone in the know be willing to explain why this feature is considered outside the scope of makepkg?

Clearly if the developers aren't interested in the feature, well then they just won't want to support it. But if there is some rationale behind this I am curious about it. On one hand this is pure curiosity, but on the other hand, I use makepkg and the Arch packaging system multiple times per week so I like to understand the design and the intent.

Thank you!
Adam Levy (alaskanarcher)
It would be extremely nice to have shallow clone support for some packages. The Unreal git repo requires pulling down 20 gigabytes for a build, taking maybe a half hour each time.
On 03/03/18 08:48, mike lojkovic via arch-general wrote:
It would be extremely nice to have shallow clone support for some packages. The Unreal git repo requires pulling down 20 gigabytes for a build, taking maybe a half hour each time.
An effective workaround is to create a shallow clone prior to running makepkg,

$ cd $SRCDEST
$ git clone --bare --depth=1 https://github.com/cisco/ChezScheme.git ChezScheme
$ cd ChezScheme
$ git config remote.origin.fetch "+refs/*:refs/*"

and away you go.

However.

You can't just use --depth=1 on everything without running into "weird" problems. For example, any VCS package that relies on tags for its pkgver will fail to find the last tagged commit, and so the fetch depth must be increased to extend to the tagged commit.
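The commands above can be wrapped into a small helper. This is only a sketch under some assumptions: the function name `seed_shallow_clone` and its `depth` argument are made up for illustration, and `SRCDEST` is taken to be the directory where makepkg keeps its bare VCS clones. Raising `depth` is the knob to turn when pkgver() has to reach an older tag.

```shell
#!/bin/sh
# Hypothetical helper (not part of makepkg): pre-seed $SRCDEST with a shallow
# bare clone so that a later makepkg run finds it and only fetches updates.
# usage: seed_shallow_clone <url> <name> [depth]
seed_shallow_clone() {
    url=$1
    name=$2
    depth=${3:-1}   # increase this if pkgver() must reach an older tag

    git clone --quiet --bare --depth="$depth" "$url" "${SRCDEST:-.}/$name" || return 1
    # Same refspec as in the commands above, so later fetches update all refs.
    git -C "${SRCDEST:-.}/$name" config remote.origin.fetch '+refs/*:refs/*'
}
```

Something like `seed_shallow_clone https://github.com/cisco/ChezScheme.git ChezScheme 50` would then replace the manual steps, with 50 being a guess that has to be adjusted per package.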
On 03/03/2018 12:50 PM, Jonathon Fernyhough wrote:
On 03/03/18 08:48, mike lojkovic via arch-general wrote:
It would be extremely nice to have shallow clone support for some packages. The Unreal git repo requires pulling down 20 gigabytes for a build, taking maybe a half hour each time.
An effective workaround is to create a shallow clone prior to running makepkg,
$ cd $SRCDEST
$ git clone --bare --depth=1 https://github.com/cisco/ChezScheme.git ChezScheme
$ cd ChezScheme
$ git config remote.origin.fetch "+refs/*:refs/*"
and away you go.
However.
You can't just use --depth=1 on everything without running into "weird" problems. For example, any VCS package that relies on tags for its pkgver will fail to find the last tagged commit, and so the fetch depth must be increased to extend to the tagged commit.
Yep -- more or less this. There is no way for git to fetch "all commits since a given tag", and obviously `git describe`, which is used in the standard pkgver() function, cannot describe the remote repository... not to mention what happens when the repository has *no* tags, and `git rev-list --count HEAD` depends on all commits since the repository was initialized.

Then there is the fact that --depth, or even --single-branch (not that this usually saves much space or time), will break PKGBUILDs that use `git cherry-pick` to backport fixes (more commonly seen in non-VCS packages, obviously).

All in all, there is simply no way to support shallow clones generically. The best you can do is take a given PKGBUILD, predict what it needs, and perform the clone manually according to handpicked criteria; makepkg will detect that clone and then simply fetch new changes, which respects a previous shallow clone designation.

--
Eli Schwartz
Bug Wrangler and Trusted User
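To make the pkgver() problem concrete, here is a small self-contained sketch. It builds a throwaway local repository (the paths, the tag name v1 and the five empty commits are all invented for the demonstration) and compares what the usual pkgver() helpers report in a full clone and in a --depth=1 clone.

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo does not depend on the user's git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q "$tmp/upstream"
for i in 1 2 3 4 5; do
    git -C "$tmp/upstream" commit -q --allow-empty -m "commit $i"
    # Tag the second commit; three more commits follow it.
    if [ "$i" -eq 2 ]; then git -C "$tmp/upstream" tag v1; fi
done

# file:// forces the normal transport; a plain local path would ignore --depth.
git clone -q "file://$tmp/upstream" "$tmp/full"
git clone -q --depth=1 "file://$tmp/upstream" "$tmp/shallow"

full_count=$(git -C "$tmp/full" rev-list --count HEAD)
shallow_count=$(git -C "$tmp/shallow" rev-list --count HEAD)
full_desc=$(git -C "$tmp/full" describe --long --tags)

# In the shallow clone the tagged commit was never fetched, so describe fails.
if git -C "$tmp/shallow" describe --long --tags >/dev/null 2>&1; then
    shallow_desc=ok
else
    shallow_desc=failed
fi

echo "full:    count=$full_count describe=$full_desc"
echo "shallow: count=$shallow_count describe=$shallow_desc"
```

The full clone counts all five commits and describes HEAD relative to v1, while the shallow clone sees a single commit and has no tag to describe against, so both common pkgver() styles produce a different or no result.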
It provides a now dead link because there is a rogue space character ("Use r"). The following link works: https://wiki.archlinux.org/index.php/User:Apg#makepkg:_shallow_git_clones
Thank you Tinu Weber. Silly oversight on my part.

After reading the discussions in previous feature requests the answer is pretty clear. It could break some packages if used incorrectly and the same functionality can be achieved by manually cloning the repo.

Thanks all for the responses.
Am 04.03.2018 um 01:08 schrieb Eli Schwartz via arch-general:
Yep -- more or less this. There is no way for git to fetch "all commits since a given tag", and obviously `git describe` which is used in the standard pkgver() function cannot describe the remote repository... not to mention what happens when the repository has *no* tags, and git rev-list --count HEAD depends on all commits since the repository was initialized.
Then there is the fact that --depth, or even --single-branch (not that this usually saves much space or time), will break on PKGBUILDs that use `git cherry-pick` to backport fixes (more commonly seen in non-VCS packages obviously).
All in all, there is simply no way to support shallow clones generically. The best you can do is take a given PKGBUILD, predict what it needs, and perform the clone manually according to handpicked criteria; makepkg will detect that clone and then simply fetch new changes, which respects a previous shallow clone designation.
Maybe a working option would be to implement fragment variables for some git options like depth, shallow-exclude and shallow-since, but that is likely not trivial.

source=("one::git+https://repo.git#branch=master:shallow-exclude=v4.14"
        "two::git+https://repo.git#branch=master:shallow-since=2017-12-30")

-- Andy
On 03/04/2018 10:37 AM, ProgAndy wrote:
Maybe a working option would be to implement fragment variables for some git options like depth, shallow-exclude and shallow-since, but that is likely not trivial.
source=("one::git+https://repo.git#branch=master:shallow-exclude=v4.14"
        "two::git+https://repo.git#branch=master:shallow-since=2017-12-30")
That would require opt-in support for every package to describe which commits it needs, something which the vast majority of maintainers are uninterested in, and it requires successively more query strings for each branch you want to cherry-pick from.

Also, shallow-exclude would exclude the tag itself; you cannot specify "v${pkgver}~1" to shallow-exclude.

As you say, not trivial. ;) I've thought about it too...

--
Eli Schwartz
Bug Wrangler and Trusted User
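The point about --shallow-exclude cutting off the tag itself can be checked with a throwaway repository. This is only a sketch: the repository layout and the tag name v1 are invented, and it assumes a git new enough to have --shallow-exclude (2.11 or later, as far as I know).

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo does not depend on the user's git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q "$tmp/upstream"
for i in 1 2 3 4 5; do
    git -C "$tmp/upstream" commit -q --allow-empty -m "commit $i"
    if [ "$i" -eq 2 ]; then git -C "$tmp/upstream" tag v1; fi
done

# Everything reachable from v1 is excluded, including the tagged commit.
git clone -q --shallow-exclude=v1 "file://$tmp/upstream" "$tmp/excl"

# Only the commits made after v1 remain in the clone.
excl_count=$(git -C "$tmp/excl" rev-list --count HEAD)
oldest=$(git -C "$tmp/excl" log --format=%s | tail -n 1)

echo "commits in clone: $excl_count, oldest: $oldest"
```

The clone stops one commit short of the tag, so a pkgver() built on `git describe` would still have nothing to anchor to.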
At least for GitHub remotes, don't they still support checking out with SVN? If they do, this would be faster and use less space, too, when we just need a certain revision and no history at all.

Other than that, I'm "pretty sure" that a git depth of 10 commits will work for most repositories when you clone normally, not shallow. Should also work for tags. However, it's true that git's limited depth clone isn't implemented fully. There are many unhandled cases and surprises.

All that being said, I can report that in CI of personal and company projects, I haven't yet run into problems with depth=5. It speeds up checking out the tree, even when it's a fast local network remote.
On 03/04/2018 10:58 AM, Carsten Mattner wrote:
At least for GitHub remotes, don't they still support checking out with SVN? If they do, this would be faster and use less space, too, when we just need a certain revision and no history at all.
Other than that, I'm "pretty sure" that a git depth of 10 commits will work for most repositories when you clone normally, not shallow. Should also work for tags. However, it's true that git's limited depth clone isn't implemented fully. There are many unhandled cases and surprises.
All that being said, I can report that in CI of personal and company projects, I haven't yet run into problems with depth=5. It speeds up checking out the tree, even when it's a fast local network remote.
depth=1 is perfectly okay for most travis cases, as you don't need any history at all unless your build system looks for it... this is a bizarre comparison.

The point is that PKGBUILDs do look for history, and make use of it -- figuring out clever ways to avoid pulling history is completely missing the point that we, well, want history.

depth=10 will only work for tags that are present in the last ten commits, which unsurprisingly is exactly the opposite of most projects (which don't have tags at all and therefore require all history without exception in order to implement the pkgver() function) or even most projects with tags (which don't release stable releases on basically every other commit).

--
Eli Schwartz
Bug Wrangler and Trusted User
Am 04.03.2018 um 17:05 schrieb Eli Schwartz via arch-general:
The point, is that PKGBUILDs do look for history, and make use of it -- figuring out clever ways to avoid pulling history is completely missing the point that we, well, want history.
But the history is only needed for the default functions, isn't it? And shallow clones are only needed for special repositories where a full clone is not feasible. So in this case it's a far better approach to provide your own functions that don't need the whole git history, because this keeps all needed changes for this special repository inside the recipe and is not something a user has to do.

On the other hand, it certainly doesn't make sense to use shallow copies in general, because they raise the discussed problems for functions that should be generally usable.

Uwe
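As an illustration of what such a history-independent function might look like, here is a hedged sketch of a pkgver() that only inspects the tip commit, so it gives the same answer in a full and in a depth=1 clone. The version format (commit date plus short hash) and the package name `mypkg-git` are assumptions made for the example, not an Arch convention.

```shell
# Hypothetical PKGBUILD fragment; "mypkg-git" is a placeholder name.
pkgname=mypkg-git

pkgver() {
    cd "$srcdir/$pkgname"
    # Commit date (YYYYMMDD) and abbreviated hash of HEAD only;
    # no tags and no commit count, so no history is required.
    printf '%s.%s\n' \
        "$(git log -1 --format=%cd --date=format:%Y%m%d)" \
        "$(git rev-parse --short=7 HEAD)"
}
```

The trade-off is that two commits on the same day are ordered only by their hashes, which is one reason the default functions count commits instead.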
On 3/4/18, Eli Schwartz <eschwartz@archlinux.org> wrote:
On 03/04/2018 10:58 AM, Carsten Mattner wrote:
At least for GitHub remotes, don't they still support checking out with SVN? If they do, this would be faster and use less space, too, when we just need a certain revision and no history at all.
Other than that, I'm "pretty sure" that a git depth of 10 commits will work for most repositories when you clone normally, not shallow. Should also work for tags. However, it's true that git's limited depth clone isn't implemented fully. There are many unhandled cases and surprises.
All that being said, I can report that in CI of personal and company projects, I haven't yet run into problems with depth=5. It speeds up checking out the tree, even when it's a fast local network remote.
depth=1 is perfectly okay for most travis cases, as you don't need any history at all unless your build system looks for it... this is a bizarre comparison.
The point, is that PKGBUILDs do look for history, and make use of it -- figuring out clever ways to avoid pulling history is completely missing the point that we, well, want history.
Interesting. What does PKGBUILD do with history of more than 10 revisions? If we checkout a tag or specific commit (e.g. xf86-video-intel), what does PKGBUILD need prior revisions for? I'm sure you're correct, I'd like to know what it is, if you don't mind explaining.
depth=10 will only work for tags that are present in the last ten commits, which unsurprisingly is exactly the opposite of most projects (which don't have tags at all and therefore require all history without exception in order to implement the pkgver() function) or even most projects with tags (which don't release stable releases on basically every other commit).
Eli, you certainly have more experience, so I'm trusting your word here. However, I don't understand how depth=10 can fail when trying to checkout a specific git tag. Wouldn't the tag be the HEAD in that case?

Checking out with SVN is a speedup trick, and I still think it can make sense if depth limiting git clone is not possible. svn checkout is basically just copying the tree of that revision (or branch/tag path) specified.
On 03/04/2018 03:27 PM, Carsten Mattner wrote:
Interesting. What does PKGBUILD do with history of more than 10 revisions? If we checkout a tag or specific commit (e.g. xf86-video-intel), what does PKGBUILD need prior revisions for? I'm sure you're correct, I'd like to know what it is, if you don't mind explaining.
You cannot clone a tag or commit, you can only clone a branch and check out the tag or commit. So you need enough revisions on that branch to reach said tag... and you cannot use shallow-exclude as I mentioned in a previous email.

This means that PKGBUILDs which checkout a specific revision are actually worse than the rest, as you cannot even get the source without knowing how many commits you need (rather than failing afterwards in pkgver() or something).
depth=10 will only work for tags that are present in the last ten commits, which unsurprisingly is exactly the opposite of most projects (which don't have tags at all and therefore require all history without exception in order to implement the pkgver() function) or even most projects with tags (which don't release stable releases on basically every other commit).
Eli, you certainly have more experience, so I'm trusting your word here. However, I don't understand how depth=10 can fail when trying to checkout a specific git tag. Wouldn't the tag be the HEAD in that case?
If that were true, then depth=1 would work. But tags are usually not the upstream HEAD commit, because development continues afterwards...

So first you clone a branch, and then you try to checkout a tag (and fail, if you used depth=10 and the tag is not attached to one of those ten commits).
Checking out with SVN is a speedup trick, and I still think it can make sense if depth limiting git clone is not possible. svn checkout is basically just copying the tree of that revision (or branch/tag path) specified.
I know how SVN works. :p

I also know how svn doesn't work -- you cannot get tag information, for example, and svn revision numbers do not necessarily cleanly translate to git revision numbers, let alone commit hashes.

Giving users a mysterious svn revision number they don't know how to trace is confusing UI. So I wouldn't recommend this even for projects without tags at all.

--
Eli Schwartz
Bug Wrangler and Trusted User
On 3/4/18, Eli Schwartz <eschwartz@archlinux.org> wrote:
On 03/04/2018 03:27 PM, Carsten Mattner wrote:
Interesting. What does PKGBUILD do with history of more than 10 revisions? If we checkout a tag or specific commit (e.g. xf86-video-intel), what does PKGBUILD need prior revisions for? I'm sure you're correct, I'd like to know what it is, if you don't mind explaining.
You cannot clone a tag or commit, you can only clone a branch and check out the tag or commit. So you need enough revisions on that branch to reach said tag... and you cannot use shallow-exclude as I mentioned in a previous email.
This means that PKGBUILDs which checkout a specific revision are actually worse than the rest, as you cannot even get the source without knowing how many commits you need (rather than failing afterwards in pkgver() or something).
Right. I had assumed that git clone -b/--branch did also exist for tags.

Git is like Linux and very evolutionary, with many warts, only some parts designed before implementation. This means some features are only implemented partially. I like and use git, but sometimes it feels like it's a car where there are five doors, but you're only supposed to use 2.5 of them.
depth=10 will only work for tags that are present in the last ten commits, which unsurprisingly is exactly the opposite of most projects (which don't have tags at all and therefore require all history without exception in order to implement the pkgver() function) or even most projects with tags (which don't release stable releases on basically every other commit).
Eli, you certainly have more experience, so I'm trusting your word here. However, I don't understand how depth=10 can fail when trying to checkout a specific git tag. Wouldn't the tag be the HEAD in that case?
If that were true, then depth=1 would work. But tags are usually not the upstream HEAD commit, because development continues afterwards...
So first you clone a branch, and then you try to checkout a tag (and fail, if you used depth=10 and the tag is not attached to one of those ten commits).
See above.
Checking out with SVN is a speedup trick, and I still think it can make sense if depth limiting git clone is not possible. svn checkout is basically just copying the tree of that revision (or branch/tag path) specified.
I know how SVN works. :p
I also know how svn doesn't work -- you cannot get tag information, for example, and svn revision numbers do not necessarily cleanly translate to git revisions numbers let alone commit hashes.
svn works differently, whereas git is all about the DAG. But let's not discuss svn's design. The idea was that when you have the ability to svn checkout a github project or maybe an Apache svn repository, and those have proper tags and branches, then this will be very quick in comparison. But as you say, this is bound to be problematic for other reasons.

I believe git devs are working on checking out tags with shallow depth, not sure how many years it will take.
Giving users a mysterious svn revision number they don't know how to trace, is confusing UI. So I wouldn't recommend this even for projects without tags at all.
Let's ignore the possibility of svn, but tracking a revision number is the same for those projects without tags as it is for git. As in the xf86-video-intel project.
This means that PKGBUILDs which checkout a specific revision are actually worse than the rest, as you cannot even get the source without knowing how many commits you need (rather than failing afterwards in pkgver() or something).
Right. I had assumed that git clone -b/--branch did also exist for tags.
https://www.kernel.org/pub/software/scm/git/docs/git-clone.html

--branch can also take tags and detaches the HEAD at that commit in the resulting repository.
On 03/04/2018 07:13 PM, Damjan Georgievski via arch-general wrote:
This means that PKGBUILDs which checkout a specific revision are actually worse than the rest, as you cannot even get the source without knowing how many commits you need (rather than failing afterwards in pkgver() or something).
Right. I had assumed that git clone -b/--branch did also exist for tags.
https://www.kernel.org/pub/software/scm/git/docs/git-clone.html
--branch can also take tags and detaches the HEAD at that commit in the resulting repository.
... huh, I stand corrected. :D I did not realize this was possible -- I've looked at clone depth fairly often but never noticed this... well, you live and learn!

This actually makes it pretty easy to clone what you need in a stable PKGBUILD that checks out a tag (but not one that checks out a commit). It makes it no easier, though, to also grab commits that are cherry-picked in prepare(), or to get the output of `git describe` for an unpredictable number of commits since and including a tag, which are also significant blockers. And these cannot be syntactically parsed from the source=(), which means they would require PKGBUILD metadata either to indicate whether it is safe to shallow clone or to manually specify e.g. a date or tag-1 to fetch commits since.

Probably still too much effort to implement... This would in theory be totally feasible if makepkg had a builtin feature to apply patches (which I think would be considered a "this is doing too much" feature) in addition to some way to reverse the pkgver() function to acquire the tag used in pkgver= and then specify git clone --shallow-since=${tag}~1, but at this point it becomes understandable why no one has any interest in implementing it. :)

--
Eli Schwartz
Bug Wrangler and Trusted User
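For the stable-PKGBUILD case (a source that checks out a tag), the combination of --branch and --depth can be tried out as follows. Again this is a sketch against a throwaway repository with an invented tag v1; a real PKGBUILD would still need more than this if prepare() cherry-picks other commits.

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo does not depend on the user's git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q "$tmp/upstream"
for i in 1 2 3 4 5; do
    git -C "$tmp/upstream" commit -q --allow-empty -m "commit $i"
    if [ "$i" -eq 2 ]; then git -C "$tmp/upstream" tag v1; fi
done

# --branch accepts a tag; with --depth=1 only the tagged commit is fetched
# and HEAD ends up detached at it.
git clone -q --depth=1 --branch v1 "file://$tmp/upstream" "$tmp/tagged"

tag_commit=$(git -C "$tmp/upstream" rev-parse 'v1^{commit}')
head_commit=$(git -C "$tmp/tagged" rev-parse HEAD)
tagged_count=$(git -C "$tmp/tagged" rev-list --count HEAD)

echo "HEAD=$head_commit tag=$tag_commit commits=$tagged_count"
```

HEAD in the clone matches the tagged commit even though three later commits exist upstream, and only that one commit is present locally.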
On Fri, Mar 02, 2018 at 22:52:47 -0900, Adam Levy via arch-general wrote:
Additional comments about closing: This has been rejected numerous times: https://wiki.archlinux.org/index.php/Use r:Apg#makepkg:_shallow_git_clones
Which provides a now dead link.
It provides a now dead link because there is a rogue space character ("Use r"). The following link works: https://wiki.archlinux.org/index.php/User:Apg#makepkg:_shallow_git_clones
participants (9)
- Adam Levy
- Carsten Mattner
- Damjan Georgievski
- Eli Schwartz
- Jonathon Fernyhough
- mike lojkovic
- ProgAndy
- Tinu Weber
- Uwe Koloska