[arch-projects] [ABS] [PATCH v3 0/7] vcs prototype cleanups and some git-specific changes
Hello all,
So this is the third iteration of my initial patch series. Sorry about the weird, non-uniform mailings before (I learned about git send-email today).
The biggest change from the earlier series is that for the git prototype, I actually threw out the temporary build directory altogether. Jesse Young's suggestion of just using the SCM builtins to get a pristine working directory started making more sense to me as I thought about the problem. My initial resistance to the idea doesn't make sense --- sorry Jesse! Unfortunately, I am really only familiar with git, so I couldn't make a uniform change across all VCS prototypes.
I also used rsync to efficiently copy the temporary directory for cvs/svn prototypes (instead of doing a "cp -r" followed with a find command for recursive deletion of unwanted directories).
The other notable change is that I dropped the idea of deleting the temporary build directories inside package() altogether (PATCH v2 1/7), as Lukas Fleischer suggested.
Linus Arver (7):
  vcs prototypes: consistent $PWD after checkout
  vcs prototypes: typo/stylistic fixes
  git prototype: on initial clones, perform a shallow clone
  git prototype: remove temp build directory
  vcs prototypes: simplify code
  vcs prototypes: consistent coding style
  vcs prototypes: more efficient temp build directories
 prototypes/PKGBUILD-bzr.proto   | 14 ++++++++------
 prototypes/PKGBUILD-cvs.proto   | 16 ++++++++--------
 prototypes/PKGBUILD-darcs.proto | 20 ++++++++++----------
 prototypes/PKGBUILD-git.proto   | 22 +++++++++++-----------
 prototypes/PKGBUILD-hg.proto    | 17 +++++++++--------
 prototypes/PKGBUILD-svn.proto   | 14 ++++++++------
 6 files changed, 54 insertions(+), 49 deletions(-)
--
1.7.7.2
Before, some vcs prototypes (bzr, git, hg, svn) cd'ed into the repo
directory after a checkout, but not on an initial clone. The cvs/darcs
prototypes made it uniform by always going into the repo after the
checkout, whether on an initial clone or an update.
With this change, no vcs prototype remains in the repo directory after a
checkout (whether an initial clone or an update).
Signed-off-by: Linus Arver
Make messages uniform.
Also, stop calling repositories "server", and display the repository for
more transparency.
Signed-off-by: Linus Arver
Shallow git clones are just like regular clones, but do not contain any
of the past commit history. It is virtually the same thing as doing a
regular clone, then doing a rebase to squash all commits into a single
commit. Many people who do not understand git dismiss shallow clones
because they wrongly believe that shallow clones are incapable of
pulling in changes going forward from the remote. This is not the case!
You can still do pulls from the master remote repo in the future to
update the shallow clone, just like a regular clone!
On an initial clone, we should *always* encourage PKGBUILD authors to do
a shallow clone. This will save time (less downloading) and disk space
(e.g., for yaourt users). The savings can be hundreds of MiB for large git
repos. This will also help AUR packagers out there who do not understand
git at all to make this change themselves.
Git PKGBUILD authors who need to pull in an older version of the remote
repo (an extremely rare case) will already have the knowledge to remove
the "--depth 1" to suit their needs.
Signed-off-by: Linus Arver
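The claim that a shallow clone can still pull new upstream changes can be checked with a minimal sketch like the one below. It assumes git is installed and uses a throwaway local repository as a stand-in for the real upstream; the `demo` identity and the temporary paths are hypothetical. A `file://` URL is used because git ignores `--depth` when cloning from a plain local path.

```shell
#!/bin/bash
# Sketch: a --depth 1 clone holds only the newest commit, yet can still pull.
# Assumes git is installed; "upstream" is a throwaway stand-in for a remote.
set -e

# Hypothetical identity so the demo commits work on an unconfigured system.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Create an upstream repository with three commits.
git init -q "$work/upstream"
for i in 1 2 3; do
  echo "$i" > "$work/upstream/file"
  git -C "$work/upstream" add file
  git -C "$work/upstream" commit -q -m "commit $i"
done

# Shallow clone: only the tip commit is transferred.
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"
commits_after_clone=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits in shallow clone: $commits_after_clone"

# Upstream gains a new commit; the shallow clone fast-forwards to it.
echo 4 > "$work/upstream/file"
git -C "$work/upstream" commit -q -am "commit 4"
git -C "$work/shallow" pull -q --ff-only
file_after_pull=$(cat "$work/shallow/file")
echo "file after pull: $file_after_pull"

rm -rf "$work"
```

Run as a script, it should report one commit right after the clone and the updated content `4` after the pull.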
On Tue, Nov 8, 2011 at 11:56 PM, Linus Arver wrote:
> Shallow git clones are just like regular clones, but do not contain any of the past commit history. It is virtually the same thing as doing a regular clone, then doing a rebase to squash all commits into a single commit. Many people who do not understand git dismiss shallow clones because they wrongly believe that shallow clones are incapable of pulling in changes going forward from the remote. This is not the case! You can still do pulls from the master remote repo in the future to update the shallow clone, just like a regular clone!
i had a few other improvements that may be of interest, outlined here:
http://mailman.archlinux.org/pipermail/arch-general/2011-July/021078.html
... [sort of] condensed:
) `_gitname` is not used consistently ... though i now forget the various uses i've seen :-( will have to follow up on that
) allow _git* variables to be set by the environment
) introduce `_gitspec` variable which supersedes `_gitname` at *checkout* stage (fallback to `_gitname`)
) use a targeted fetch command instead of a clone -- this can achieve even greater savings than shallow clone, even though the fetch is "deep". the idea is to pull only $_gitname and _nothing_ else. this can however be combined with shallow if done correctly for even greater savings (this method results in 50%+ reduction to kernel pull [don't know shallow variant offhand], and i've seen savings as high as 90%+)
) store repositories in a known list of locations ... people WILL blow the repo away if it's in the build dir. my PKGBUILDs search these (in order of precedence):
/var/abs/PKGBUILD.devel/${pkgname}.git
/var/abs/local/PKGBUILD.devel/${pkgname}.git
${SRCDEST}/PKGBUILD.devel/${pkgname}.git
${startdir}/PKGBUILD.devel/${pkgname}.git
~/PKGBUILD.devel/${pkgname}.git
) use a proxy mechanism in the event a repository is found but it is read-only. this lets you read-only bind mount a repo (think makechrootpkg), and it will simply create a new repository, copy the refs, and set up the object directory as an alternate for the proxy repo. thus the proxy has the ability to download new objects as needed, but starts from the same spot as the bind-mounted repo.
... these techniques are all in use, and primarily derived from experience developing this PKGBUILD:
http://aur.archlinux.org/packages/pyjamas-engine-pythonwebkit/PKGBUILD
... which is a massive 1GB+ download and lengthy compile. these modifications also make it very simple to rapidly build git packages within a chroot (one of the primary motivations) *without* any copying/etc.
probably a little out of scope from what you've done here, and possibly in need of further discussion, but your message sparked memory and i still believe they are all good changes -- it saved me oodles of time and prevents constant removal of humongous repos (esp. when in chroot).
--
C Anthony
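One possible reading of the "targeted fetch" idea, sketched under stated assumptions: git is installed, the branch names `wanted` and `unwanted` are hypothetical, and this is not the code from the linked PKGBUILD. Instead of cloning, it initializes an empty repository and fetches a single branch, combined with `--depth 1` as the message suggests, so a commit that exists only on another branch is never downloaded.

```shell
#!/bin/bash
# Sketch: fetch one branch into an empty repository instead of cloning.
# Assumes git is installed; branch names "wanted"/"unwanted" are hypothetical.
set -e

# Hypothetical identity so the demo commits work on an unconfigured system.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
up="$work/upstream"

# Upstream: branch "wanted" at the first commit, "unwanted" one commit ahead.
git init -q "$up"
echo wanted > "$up/file"
git -C "$up" add file
git -C "$up" commit -q -m "wanted"
git -C "$up" branch wanted
git -C "$up" checkout -q -b unwanted
echo unwanted > "$up/file"
git -C "$up" commit -q -am "unwanted"
unwanted_sha=$(git -C "$up" rev-parse HEAD)

# Targeted (and shallow) fetch of only "wanted", then check it out.
git init -q "$work/local"
git -C "$work/local" fetch -q --depth 1 "file://$up" wanted
git -C "$work/local" checkout -q FETCH_HEAD

content=$(cat "$work/local/file")
# cat-file -e succeeds only if the object exists in the local repository.
if git -C "$work/local" cat-file -e "$unwanted_sha" 2>/dev/null; then
  has_unwanted=yes
else
  has_unwanted=no
fi
echo "checked-out content: $content"
echo "unwanted commit present locally: $has_unwanted"

rm -rf "$work"
```

The checkout ends up with the `wanted` content, while the commit that lives only on `unwanted` is absent from the local object store.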
On Wed, Nov 09, 2011 at 12:28:23AM -0600, C Anthony Risinger wrote:
> i had a few other improvements that may be of interest, outlined here:
> http://mailman.archlinux.org/pipermail/arch-general/2011-July/021078.html
> ... [sort of] condensed:
> ) `_gitname` is not used consistently ... though i now forget the various uses i've seen :-( will have to follow up on that
> ) allow _git* variables to be set by the environment
> ) introduce `_gitspec` variable which supersedes `_gitname` at *checkout* stage (fallback to `_gitname`)
> ) use a targeted fetch command instead of a clone -- this can achieve even greater savings than shallow clone, even though the fetch is "deep". the idea is to pull only $_gitname and _nothing_ else. this can however be combined with shallow if done correctly for even greater savings (this method results in 50%+ reduction to kernel pull [don't know shallow variant offhand], and i've seen savings as high as 90%+)
> ) store repositories in a known list of locations ... people WILL blow the repo away if it's in the build dir. my PKGBUILDs search these (in order of precedence):
> /var/abs/PKGBUILD.devel/${pkgname}.git
> /var/abs/local/PKGBUILD.devel/${pkgname}.git
> ${SRCDEST}/PKGBUILD.devel/${pkgname}.git
> ${startdir}/PKGBUILD.devel/${pkgname}.git
> ~/PKGBUILD.devel/${pkgname}.git
> ) use a proxy mechanism in the event a repository is found but it is read-only. this lets you read-only bind mount a repo (think makechrootpkg), and it will simply create a new repository, copy the refs, and set up the object directory as an alternate for the proxy repo. thus the proxy has the ability to download new objects as needed, but starts from the same spot as the bind-mounted repo.
> ... these techniques are all in use, and primarily derived from experience developing this PKGBUILD:
> http://aur.archlinux.org/packages/pyjamas-engine-pythonwebkit/PKGBUILD
> ... which is a massive 1GB+ download and lengthy compile. these modifications also make it very simple to rapidly build git packages within a chroot (one of the primary motivations) *without* any copying/etc.
> probably a little out of scope from what you've done here, and possibly in need of further discussion, but your message sparked memory and i still believe they are all good changes -- it saved me oodles of time and prevents constant removal of humongous repos (esp. when in chroot).
> --
> C Anthony
The problem I have with your sample PKGBUILD is that it is extremely complicated. Anything extremely complicated goes entirely against the KISS philosophy that we Arch devs/contributors cherish. See https://wiki.archlinux.org/index.php/The_Arch_Way
But of course, you are free to write up a separate patch series for git. At this time, however, I am unwilling to delay this patch series to incorporate such extensive changes.
-Linus
On Wed, Nov 9, 2011 at 9:46 PM, Linus Arver wrote:
> The problem I have with your sample PKGBUILD is that it is extremely complicated. Anything extremely complicated goes entirely against the KISS philosophy that we Arch devs/contributors cherish. See https://wiki.archlinux.org/index.php/The_Arch_Way
well, the message i linked had the bits i was referring to factored out, and they amount to about 15-20 lines -- the PKGBUILD i linked *is* complex, but not complicated ... there is a lot going on, and it's a less than trivial build. the bits relating to git are pretty clear, imo at least. "the arch way" is a great guideline -- i believe i've made the process as simple as it *can* be made ;-)
> But of course, you are free to write up a separate patch series for git. At this time, however, I am unwilling to delay this patch series to incorporate such extensive changes.
that is fine, i was not suggesting any alteration or amendments to your series. it simply reminded me of my prior work; work i believe is still more than valid. perhaps i should have created a new thread from it, but i'm still not 100% sure of the expectations of this list.
no worries, i may be able to spin some patches, but i would recommend maybe reading my original linked proposal, as i think it labels the goals pretty well.
--
C Anthony
On Thu, Nov 10, 2011 at 12:56:35PM -0600, C Anthony Risinger wrote:
> On Wed, Nov 9, 2011 at 9:46 PM, Linus Arver wrote:
>> The problem I have with your sample PKGBUILD is that it is extremely complicated. Anything extremely complicated goes entirely against the KISS philosophy that we Arch devs/contributors cherish. See https://wiki.archlinux.org/index.php/The_Arch_Way
> well, the message i linked had the bits i was referring to factored out, and they amount to about 15-20 lines -- the PKGBUILD i linked *is* complex, but not complicated ... there is a lot going on, and it's a less than trivial build.
> the bits relating to git are pretty clear, imo at least. "the arch way" is a great guideline -- i believe i've made the process as simple as it *can* be made ;-)
I think your 10-20-line excerpts outlined in http://mailman.archlinux.org/pipermail/arch-general/2011-July/021078.html are still extremely complicated. But maybe I'm the only one who thinks this way (your technical competence with git is certainly above my own).
But I think that the PKGBUILD prototypes are meant to be a very simple, sane starting point for devs/contributors to create their own. If we end up introducing too many concepts into these prototypes, they may not be helpful in the end. Perhaps there should be two sets of prototypes --- a "beginner" and an "advanced" version. Or maybe the ideas you introduced in your email belong in the Arch Wiki rather than in the prototypes (under a heading like "Advanced Git PKGBUILD Techniques").
>> But of course, you are free to write up a separate patch series for git. At this time, however, I am unwilling to delay this patch series to incorporate such extensive changes.
> that is fine, i was not suggesting any alteration or amendments to your series. it simply reminded me of my prior work; work i believe is still more than valid. perhaps i should have created a new thread from it, but i'm still not 100% sure of the expectations of this list.
> no worries, i may be able to spin some patches, but i would recommend maybe reading my original linked proposal, as i think it labels the goals pretty well.
Yes, please do create a new thread/patch series. Hopefully my series will be merged soonish.
I think you should make very small, incremental changes at a time. That way, you won't scare off all the people on the list who are not as technically competent in git as yourself. ;)
-Linus
P.S. When writing a list of paragraphs, use '*' or '-', not ')'.
On Thu, Nov 10, 2011 at 11:08 PM, Linus Arver wrote:
> But I think that the PKGBUILD prototypes are meant to be a very simple, sane starting point for devs/contributors to create their own. If we end up introducing too many concepts into these prototypes, they may not be helpful in the end. Perhaps there should be two sets of prototypes --- a "beginner" and an "advanced" version. Or maybe the ideas you introduced in your email belong in the Arch Wiki rather than in the prototypes (under a heading like "Advanced Git PKGBUILD Techniques").
i think the best route is some kind of lower level integration, at makepkg (or ?) level ... i know someone brought up a bug report regarding just that. i'll mull that a bit and probably make noise there instead :-)
> I think you should make very small, incremental changes at a time. That way, you won't scare off all the people on the list who are not as technically competent in git as yourself. ;)
indeed :-) i'll seek the appropriate channel for these changes.
> P.S. When writing a list of paragraphs, use '*' or '-', not ')'.
bah! i've done that for quite some time :-) i use `)` specifically because: 1) a) ... looks nice and clear to me, but i tend to skip backfilling the letter/numbers. maybe `*)` or `x)` is better? i'll consider `*`, but `-` is ... nay :-)
aaaanyways, i'll look into lower level integration so users can simply set a variable or something, and not even call git directly.
--
C Anthony
Git comes with commands that are specifically designed to return the
working directory to a "pristine" state. We make use of these commands
to avoid the expensive operation of creating a temporary build
directory.
Having a single directory to pull in upstream code and build from
greatly simplifies the code, and as a bonus, saves disk space/time!
Signed-off-by: Linus Arver
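The commit message does not name the git commands it relies on. A minimal sketch, assuming they are `git reset --hard` (discard edits to tracked files) and `git clean -dfx` (remove untracked files and directories, including ignored ones); the repository, `source.c` and the build artifacts below are hypothetical.

```shell
#!/bin/bash
# Sketch: return a git working tree to a pristine state in place,
# instead of copying it to a separate build directory.
# Assumes git is installed; the repository is a throwaway stand-in.
set -e

# Hypothetical identity so the demo commit works on an unconfigured system.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
echo "original" > "$repo/source.c"
git -C "$repo" add source.c
git -C "$repo" commit -q -m "initial"

# Simulate a previous build: a patched tracked file plus leftover artifacts.
echo "patched" > "$repo/source.c"
mkdir "$repo/obj"
touch "$repo/obj/source.o" "$repo/build.log"

# Discard changes to tracked files, then delete untracked files and
# directories (-d), including ones .gitignore would normally protect (-x).
git -C "$repo" reset -q --hard
git -C "$repo" clean -q -dfx

tracked_content=$(cat "$repo/source.c")
leftovers=$(git -C "$repo" status --porcelain)   # empty when pristine
echo "source.c: $tracked_content"
echo "modified/untracked entries: ${leftovers:-none}"

rm -rf "$repo"
```

Note that `-x` also deletes files matched by `.gitignore`, so anything deliberately left inside the checkout is removed as well.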
Since the $PWD after the update/initial clone is always "$srcdir", we
don't need to reference it later on.
Signed-off-by: Linus Arver <linusarver at gmail.com>
Signed-off-by: Linus Arver
Some vcs prototypes do
cd repo && update
while others do it like
cd repo
update
to update an existing repo. It makes sense to have them all do it the
first way (there's nothing wrong with it, and it has better form).
We also check for the (hidden) version control directory in the
if-statement for consistency.
Signed-off-by: Linus Arver
On 2011-11-08 at 21:56 -0800, Linus Arver wrote:
> Some vcs prototypes do
> cd repo && update
> while others do it like
> cd repo
> update
> to update an existing repo. It makes sense to have them all do it the first way (there's nothing wrong with it, and it has better form).
I agree with the consistency issue, but not with using the first way. makepkg is executed with `/bin/bash -e`[1]. Whenever a command returns a non-zero exit status makepkg exits immediately. See bash(1) or set(1p). Thus conditional checks on single commands with && and || are seldom required[2]. Even in the second case the "update" step is never reached.
I personally find the second form easier to read, as one does not have to think about why the && was mixed in, and if this makes sense.
[1]: http://projects.archlinux.org/pacman.git/commit/?id=b69edc1c3532816576198995...
[2]: http://projects.archlinux.org/pacman.git/commit/?id=2710b256cc260db6a0805c83...
On Wed, Nov 09, 2011 at 03:09:09PM +0100, Sebastian Schwarz wrote:
> On 2011-11-08 at 21:56 -0800, Linus Arver wrote:
>> Some vcs prototypes do
>> cd repo && update
>> while others do it like
>> cd repo
>> update
>> to update an existing repo. It makes sense to have them all do it the first way (there's nothing wrong with it, and it has better form).
> I agree with the consistency issue, but not with using the first way. makepkg is executed with `/bin/bash -e`[1]. Whenever a command returns a non-zero exit status makepkg exits immediately. See bash(1) or set(1p). Thus conditional checks on single commands with && and || are seldom required[2]. Even in the second case the "update" step is never reached.
> I personally find the second form easier to read, as one does not have to think about why the && was mixed in, and if this makes sense.
> [1]: http://projects.archlinux.org/pacman.git/commit/?id=b69edc1c3532816576198995...
> [2]: http://projects.archlinux.org/pacman.git/commit/?id=2710b256cc260db6a0805c83...
Hmm, I did not realize that makepkg behaved in that manner. This changes the whole underlying assumption about what is "good form". I've verified the behavior with a quick test in a sample PKGBUILD of mine; indeed, if "cd repo" itself fails, makepkg will abort the build() function. I will revert to the second form, as in the first version of the patch series.
-Linus
On Wed, Nov 9, 2011 at 10:09 PM, Sebastian Schwarz wrote:
> On 2011-11-08 at 21:56 -0800, Linus Arver wrote:
>> Some vcs prototypes do
>> cd repo && update
>> while others do it like
>> cd repo
>> update
>> to update an existing repo. It makes sense to have them all do it the first way (there's nothing wrong with it, and it has better form).
> I agree with the consistency issue, but not with using the first way. makepkg is executed with `/bin/bash -e`[1]. Whenever a command returns a non-zero exit status makepkg exits immediately. See bash(1) or set(1p). Thus conditional checks on single commands with && and || are seldom required[2].
Actually, with `set -e', the second form is safer. In the first form, if `cd repo' fails, bash will NOT abort. This is the relevant part in bash(1) on `set -e':
    The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !.
So yes, the `&&' form is totally busted.
> Even in the second case the "update" step is never reached.
> I personally find the second form easier to read, as one does not have to think about why the && was mixed in, and if this makes sense.
> [1]: http://projects.archlinux.org/pacman.git/commit/?id=b69edc1c3532816576198995...
> [2]: http://projects.archlinux.org/pacman.git/commit/?id=2710b256cc260db6a0805c83...
The bzr, darcs, git, and hg version control systems use a single
internal folder at the root to store all VCS-related data (commits,
history, etc.). This folder can get quite large (hundreds of MiB) for
big projects and continues to grow as the project lives on.
We exclude this folder when creating a temporary build directory to save
time and space. Since pacman commit 0e79802c0ac8453376d8c0f99629f5a3b499f571,
pacman/scripts/makepkg.sh.in includes "shopt -s extglob", so we can use
the simple "!(foo)" syntax.
CVS and SVN pollute the source repo with "CVS" and ".svn" directories
recursively for every single directory, so there is no simple one-liner
solution to exclude the VCS data for these systems that I am aware of.
Signed-off-by: Linus Arver
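A minimal sketch of the `!(foo)` exclusion described in this commit. The directory layout and the `repo` name are hypothetical; darcs is used because its `_darcs` metadata directory has no leading dot, which keeps the example independent of bash's rules for matching dotfiles. `shopt -s extglob` must be in effect before bash parses the line containing the pattern; inside makepkg that is already the case, per the pacman commit cited above.

```shell
#!/bin/bash
# Sketch: copy a checkout into a temporary build directory while skipping
# the VCS metadata directory, using an extglob negation pattern.
# extglob must be enabled before bash parses the line that uses !( ).
shopt -s extglob
set -e

srcdir=$(mktemp -d)   # stand-in for makepkg's $srcdir
repo=myproject        # hypothetical checkout name

# Fake darcs checkout: sources plus the _darcs metadata directory.
mkdir -p "$srcdir/$repo/_darcs/patches" "$srcdir/$repo/src"
echo 'int main(void) { return 0; }' > "$srcdir/$repo/src/main.c"
echo 'all:' > "$srcdir/$repo/Makefile"
touch "$srcdir/$repo/_darcs/patches/huge-history"

# Copy every top-level entry except _darcs into the build directory.
mkdir "$srcdir/$repo-build"
cp -r "$srcdir/$repo"/!(_darcs) "$srcdir/$repo-build"

ls "$srcdir/$repo-build"
```

The build directory receives `Makefile` and `src/`, while the potentially large `_darcs` history is never copied.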
Hello all,
This is the fourth (and final?) iteration of my patch series.
Changes since v3:
* The "vcs prototypes: consistent coding style" patch has been changed. See http://mailman.archlinux.org/pipermail/arch-projects/2011-November/002098.ht....
* The "vcs prototypes: more efficient temp build directories" patch had an incorrect commit message re: cvs/svn prototypes. This has been fixed.
Linus Arver (7):
  vcs prototypes: consistent $PWD after checkout
  vcs prototypes: typo/stylistic fixes
  git prototype: on initial clones, perform a shallow clone
  git prototype: remove temp build directory
  vcs prototypes: simplify code
  vcs prototypes: consistent coding style
  vcs prototypes: more efficient temp build directories
 prototypes/PKGBUILD-bzr.proto   | 17 ++++++++++-------
 prototypes/PKGBUILD-cvs.proto   | 13 +++++++------
 prototypes/PKGBUILD-darcs.proto | 17 +++++++++--------
 prototypes/PKGBUILD-git.proto   | 22 +++++++++++-----------
 prototypes/PKGBUILD-hg.proto    | 14 ++++++++------
 prototypes/PKGBUILD-svn.proto   | 15 +++++++++------
 6 files changed, 54 insertions(+), 44 deletions(-)
--
1.7.7.3
Before, some vcs prototypes (bzr, git, hg, svn) cd'ed into the repo
directory after a checkout, but not on an initial clone. The cvs/darcs
prototypes made it uniform by always going into the repo after the
checkout, whether on an initial clone or an update.
With this change, no vcs prototype remains in the repo directory after a
checkout (whether an initial clone or an update).
Signed-off-by: Linus Arver
Make messages uniform.
Also, stop calling repositories "server", and display the repository URL
for more transparency.
Signed-off-by: Linus Arver
Shallow git clones are just like regular clones, but do not contain any
of the past commit history. It is virtually the same thing as doing a
regular clone, then doing a rebase to squash all commits into a single
commit. Many people who do not understand git dismiss shallow clones
because they wrongly believe that shallow clones are incapable of
pulling in changes going forward from the remote. This is not the case!
You can still do pulls from the master remote repo in the future to
update the shallow clone, just like a regular clone!
On an initial clone, we should *always* encourage PKGBUILD authors to do
a shallow clone. This will save time (less downloading) and disk space
(e.g., for yaourt users). The savings can be hundreds of MiB for large git
repos. This will also help AUR packagers out there who do not understand
git at all to make this change themselves.
Git PKGBUILD authors who need to pull in an older version of the remote
repo (an extremely rare case) will already have the knowledge to remove
the "--depth 1" to suit their needs.
Signed-off-by: Linus Arver
Git comes with commands that are specifically designed to return the
working directory to a "pristine" state. We make use of these commands
to avoid the expensive operation of creating a temporary build
directory.
Having a single directory to pull in upstream code and build from
greatly simplifies the code, and as a bonus, saves disk space/time!
Thanks to Jesse Young for suggesting this idea. [1]
[1]: http://mailman.archlinux.org/pipermail/arch-projects/2011-November/002052.ht...
Signed-off-by: Linus Arver
Since the $PWD after the update/initial clone is always "$srcdir", we
don't need to reference it later on.
Signed-off-by: Linus Arver <linusarver at gmail.com>
Signed-off-by: Linus Arver
On Fri, Nov 11, 2011 at 05:50:21PM -0800, Linus Arver wrote:
> Since the $PWD after the update/initial clone is always "$srcdir", we don't need to reference it later on.
> Signed-off-by: Linus Arver <linusarver at gmail.com>
> Signed-off-by: Linus Arver
> ---
>  prototypes/PKGBUILD-bzr.proto   | 6 +++---
>  prototypes/PKGBUILD-cvs.proto   | 6 +++---
>  prototypes/PKGBUILD-darcs.proto | 6 +++---
Embarrassing --- please fix this duplicate Signed-off-by line in the commit message after merging with a rebase. I don't think posting an entirely new patch series just for this typo is a sane thing to do...
-Linus
Some vcs prototypes do
cd repo && update
while others do it like
cd repo
update
to update an existing repo. It makes sense to have them all do it the
second way, because makepkg runs with `/bin/bash -e` [1]. The manpage
for bash(1) states regarding the "-e" option:
"The shell does not exit if the command that fails is ... part of any
command executed in a && or || list except the command following the
final && or || ..."
I.e., if "cd repo" fails in "cd repo && update", the shell will not
exit! The second form avoids this pitfall and is slightly easier to
read, especially for the longer "update" commands.
Thanks to Sebastian Schwarz [2] and lolilolicon [3] for the pointers.
Lastly, we also check for the (hidden) version control directory in the
if-statement for consistency.
[1]: http://projects.archlinux.org/pacman.git/commit/?id=b69edc1c3532816576198995...
[2]: http://mailman.archlinux.org/pipermail/arch-projects/2011-November/002096.ht...
[3]: http://mailman.archlinux.org/pipermail/arch-projects/2011-November/002099.ht...
Signed-off-by: Linus Arver
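The pitfall quoted from bash(1) in this commit message can be reproduced with a short sketch. Each form runs in a child `bash -e`, mimicking how makepkg runs the PKGBUILD, and `missing-repo` is a hypothetical directory that is never created, so `cd` always fails.

```shell
#!/bin/bash
# Sketch: under `bash -e`, a failing command on the left of && does not
# stop the shell, while the same command on its own line does.
missing="$(mktemp -d)/missing-repo"   # hypothetical path, never created

# Form 1: cd repo && update -> execution continues past the failed cd.
form1=$(bash -e -c '
  cd "$1" 2>/dev/null && echo "updating"
  echo "still running"
' _ "$missing") || true

# Form 2: cd repo, then update on its own line -> the shell exits at cd.
form2=$(bash -e -c '
  cd "$1" 2>/dev/null
  echo "updating"
  echo "still running"
' _ "$missing") || true

echo "form 1 output: $form1"
echo "form 2 output: ${form2:-<none, shell exited at cd>}"
```

Form 1 prints "still running": the update step is skipped, but the script carries on as if nothing went wrong. Form 2 prints nothing, because the shell exits at the failing `cd`.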
The bzr, darcs, git, and hg version control systems use a single
internal folder at the root to store all VCS-related data (commits,
history, etc.). This folder can get quite large (hundreds of MiB) for
big projects and continues to grow as the project lives on.
We exclude this folder when creating a temporary build directory to save
time and space. Since pacman commit 0e79802c0ac8453376d8c0f99629f5a3b499f571,
pacman/scripts/makepkg.sh.in includes "shopt -s extglob", so we can use
the simple "!(foo)" syntax. Thanks to Dave Reisner for pointing
this out. [1]
CVS and SVN pollute the source repo with "CVS" and ".svn" directories
recursively for every single directory, so we use rsync. Most, if not
all, Arch systems should have rsync installed, so this should be safe.
[1]: http://mailman.archlinux.org/pipermail/arch-projects/2011-November/002033.ht...
Signed-off-by: Linus Arver
participants (4)
- C Anthony Risinger
- Linus Arver
- lolilolicon
- Sebastian Schwarz