[arch-general] PKGBUILD.proto improvements
i wanted to share a couple of loose thoughts on the standard practices for PKGBUILDs ... specifically development ones ... with an immediate focus on git ones (for this instance, but it applies to any)

awhile back i tried to make a super cheap way of doing git checkouts for builds:

https://bbs.archlinux.org/viewtopic.php?id=86366

(and this would have worked if i had thought to make git use itself as an alternate object pool ...)

... the motivation of this message is to encourage adding some routines to the makepkg library, similar to msg/msg2/warning/etc, to handle development checkouts. basically i see people using _gitroot and _gitname for different stuff ... namely _gitname. i'd like to see a routine that pulls the repo to a known location so it doesn't get blown away every update, and so the user doesn't have to manage this manually. everyone does it differently, puts the repository in all sorts of different locations/etc, and it just isn't very pretty IMO.

... so, i suggest a permanent setup based off my own setup and experiences, in reference to this PKGBUILD in particular:

http://aur.archlinux.org/packages/pyjamas-engine-pythonwebkit/PKGBUILD

) allow _git* variables to be altered by the environment
) add a _gitspec variable which supersedes _gitname for the checkout stage (so you can do relative checkouts or SHA1-based checkouts)

... in the routine ...

) use a targeted fetch command instead of a blind clone. not only is this more flexible, but it can significantly reduce download size (for the kernel it's a ~50% reduction IIRC). the method from my builds is very simple, greatly resembling --mirror mode in git, but ONLY for the exact branch you are building
) store repositories in a known list of locations. my PKGBUILD searches these:

    /var/abs/PKGBUILD.devel/${pkgname}.git
    /var/abs/local/PKGBUILD.devel/${pkgname}.git
    ${SRCDEST}/PKGBUILD.devel/${pkgname}.git
    ${startdir}/PKGBUILD.devel/${pkgname}.git
    ~/PKGBUILD.devel/${pkgname}.git

  ... and chooses the first one that already exists (PKGBUILD.devel location only, BEFORE appending the repo name), OR the first one with write access to the parent dir (it will create PKGBUILD.devel for you if it can). i personally used the first one (/var/abs/PKGBUILD.devel), with `chmod 1777` to sticky bit it like /tmp. this not only allows for reuse, but also allows for safely+easily+consistently bind-mounting into a chroot for use with makechrootpkg ...
) use a proxy mechanism in the event a repository is found but it is read-only. this lets you read-only bind mount a repo (think makechrootpkg), and it will simply create a new repository, copy the refs, and set up the object directory as an alternate for the proxy repo. thus the proxy has the ability to download new objects as needed, but starts from the same spot as the bind-mounted repo.

these techniques save me a great deal of time and pain. that package in particular is a custom webkit build, requiring over 1GB to clone ... makechrootpkg tried to blow it away once, but it was read-only :-) ... i intend to possibly improve on this process by reducing a targeted fetch to a *shallow* targeted fetch, ie. the minimum amount of objects required to build. i have seen "clones" go down to 50%, sometimes even 10% or less, by using this over a naive `git clone $xyz`.

i'd like to see *something* adopted so we can end the madness :-) ... below are a couple key excerpts from the noted PKGBUILD, for your reference.

C Anthony

-------------------------------------------------
[locate repo]
-------------------------------------------------

# Devel directory fragment
: ${_dir_devel:=PKGBUILD.devel}

# Local/custom repo?
if [[ -z ${_gitrepo} ]]; then
    search=(/var/abs{,/local}/"${_dir_devel}"
            "${SRCDEST%%/}/${_dir_devel}"
            "${startdir%%/}/${_dir_devel}"
            ~/"${_dir_devel}")
    for d in "${search[@]%%/}"; do
        # take the first dir that exists, or that can be created
        if [[ -e ${d} ]] || [[ ! -e ${d} && -w ${d%/*} ]]; then
            mkdir -p "${d}" 2>&- &&
                _gitrepo_proxy="${srcdir}/${pkgname}.git" &&
                _gitrepo="${d}/${pkgname}.git" &&
                break
        fi
    done
fi

-------------------------------------------------
[create new repo, or new proxy, if needed]
-------------------------------------------------

# g is used as the git dir and w as the work tree (both set elsewhere in the PKGBUILD)
mkdir -p "${w}"
if [[ ! -e ${g}/objects ]]; then
    msg "[git] Creating NEW repository ... "
    git --git-dir="${g}" --work-tree="${w}" init
elif [[ ! -w ${g}/objects ]]; then
    warning "[git] Repository read-only, setting up proxy ... "
    git --git-dir="${_gitrepo_proxy}" --work-tree="${w}" init
    # borrow objects from the read-only repo instead of copying them
    echo "${g}/objects" > "${_gitrepo_proxy}/objects/info/alternates"
    cp -r "${g}/refs" "${_gitrepo_proxy}"
    g="${_gitrepo_proxy}"
fi

-------------------------------------------------
[perform targeted fetch]
-------------------------------------------------

msg "[git] Syncing ... ${_gitroot} -> ${g}"
git --git-dir="${g}" --work-tree="${w}" fetch -fu "${_gitroot}" "+${_gitname}:${_gitname}"
msg "[git] Reading ... ${_gitspec:-${_gitname}} -> ${w}"
git --git-dir="${g}" --work-tree="${w}" read-tree --reset -u "${_gitspec:-${_gitname}}"

-------------------------------------------------
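The first two proposals (environment-overridable `_git*` variables and a `_gitspec` fallback) and the planned shallow variant of the targeted fetch could be sketched roughly as follows. This is an illustration only, not code from the referenced PKGBUILD: the default URL, the `_gitdepth` variable, and the `git_fetch_args` helper are hypothetical names introduced here. A shallow fetch only makes sense when `_gitspec` points at a commit inside the fetched depth.

```bash
#!/bin/bash
# Defaults that the calling environment may override (values are hypothetical).
: "${_gitroot:=git://example.org/project.git}"
: "${_gitname:=master}"
: "${_gitspec:=${_gitname}}"   # checkout target; falls back to the branch name
: "${_gitdepth:=}"             # hypothetical knob: empty means full history

# Print the arguments for a targeted (optionally shallow) fetch, one per line.
git_fetch_args() {
    local args=(fetch -fu)
    # --depth limits the fetch to the most recent commits of the branch
    [[ -n ${_gitdepth} ]] && args+=(--depth "${_gitdepth}")
    # fetch only the branch being built, forcing the local ref to match
    args+=("${_gitroot}" "+${_gitname}:${_gitname}")
    printf '%s\n' "${args[@]}"
}
```

With `g` and `w` set as in the excerpts above, the result might be used as `mapfile -t a < <(git_fetch_args); git --git-dir="${g}" --work-tree="${w}" "${a[@]}"`, and a user could request a shallow build with `_gitdepth=1 makepkg`.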
C Anthony Risinger (2011-07-21 16:42):
... the motivation of this message is to encourage adding some routines to the makepkg library, similar to msg/msg2/warning/etc, to handle development checkouts. basically i see people using _gitroot and _gitname for different stuff ... namely _gitname. i'd like to see a routine that pulls the repo to a known location so it doesn't get blown away every update, and so the user doesn't have to manage this manually. everyone does it differently, puts the repository in all sorts of different locations/etc, and it just isn't very pretty IMO. <snip>

I am sure you didn't ask, but I use $SRCDEST/scm for this, do not define source=() and run this in build():

    cp -r "$SRCDEST/scm/$pkgname" "$srcdir/"

--
Rogutės Sparnuotos
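As a rough sketch of the approach described above, a PKGBUILD fragment might look like this. The package name and the existence check are assumptions added for illustration; makepkg normally provides `SRCDEST` and `srcdir`, and the fallbacks are only there so the fragment runs standalone.

```bash
#!/bin/bash
pkgname=example-git          # hypothetical package name
: "${SRCDEST:=$PWD}"         # normally set by makepkg
: "${srcdir:=$PWD/src}"      # normally set by makepkg

build() {
    # The checkout is maintained by hand under $SRCDEST/scm/$pkgname.
    if [[ ! -d "$SRCDEST/scm/$pkgname" ]]; then
        echo "no checkout at $SRCDEST/scm/$pkgname" >&2
        return 1
    fi
    mkdir -p "$srcdir"
    # Build from a copy so the kept checkout is never modified.
    cp -r "$SRCDEST/scm/$pkgname" "$srcdir/"
    cd "$srcdir/$pkgname" || return 1
    # ... the actual configure/make steps would follow here ...
}
```

The copy keeps the stored checkout pristine, at the cost of duplicating the whole tree on every build.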
On Thu, Jul 21, 2011 at 5:08 PM, Rogutės Sparnuotos <rogutes@googlemail.com> wrote:
C Anthony Risinger (2011-07-21 16:42):
... the motivation of this message is to encourage adding some routines to the makepkg library, similar to msg/msg2/warning/etc, to handle development checkouts. basically i see people using _gitroot and _gitname for different stuff ... namely _gitname. i'd like to see a routine that pulls the repo to a known location so it doesn't get blown away every update, and so the user doesn't have to manage this manually. everyone does it differently, puts the repository in all sorts of different locations/etc, and it just isn't very pretty IMO. <snip>
I am sure you didn't ask, but I use $SRCDEST/scm for this, do not define source=() and run this in build(): cp -r "$SRCDEST/scm/$pkgname" "$srcdir/"
nah i didn't ask, but hey no one needs my approval anyway :-)

i started off doing it just like that too ... but there are some problems:

) makechrootpkg sets that variable itself, ie. it's not stable
) ... *anyone* can set it, ie. it's very not stable
) more of a runtime/per-invocation/cache setting vs. a stable/known location
) not overridable from env in makechrootpkg without a patch (use of sudo kills env)

also, $SRCDEST is still a "cache" location, like the pacman cache. it's safe to delete. in fact that's what `--cleancache` in makepkg does ... i want to establish a known STABLE location because it's NOT safe to delete repos unless you really really REALLY mean to.

additionally, your routine, while it would work for most simple/small packages, would cause a 1GB+ copy and then a massive checkout for my reference PKGBUILD *every time* i ran makepkg. considering this package takes several hours to build on a fairly powerful machine, up to 8+ on a normal machine, i'd prefer if it didn't wipe it out when something failed :-) my routines are specifically designed for maximum reuse and minimum downtime. if everyone used the same routines, we could have *automagic* sharing between competing PKGBUILDs, AND i could reuse those repositories for my own development tinkerings.

... imo development packages should be able to mirror the options of regular packages as closely as possible, eg. `--noextract` should build without a sync/checkout, `--nobuild` should sync/checkout without a build, maybe even `--allsource` does a checkout + bundling.

C Anthony
C Anthony Risinger (2011-07-22 11:25):
On Thu, Jul 21, 2011 at 5:08 PM, Rogutės Sparnuotos <rogutes@googlemail.com> wrote:
C Anthony Risinger (2011-07-21 16:42):
... the motivation of this message is to encourage adding some routines to the makepkg library, similar to msg/msg2/warning/etc, to handle development checkouts. basically i see people using _gitroot and _gitname for different stuff ... namely _gitname. i'd like to see a routine that pulls the repo to a known location so it doesn't get blown away every update, and so the user doesn't have to manage this manually. everyone does it differently, puts the repository in all sorts of different locations/etc, and it just isn't very pretty IMO. <snip>
I am sure you didn't ask, but I use $SRCDEST/scm for this, do not define source=() and run this in build(): cp -r "$SRCDEST/scm/$pkgname" "$srcdir/"
nah i didn't ask, but hey no one needs my approval anyway :-)
i started off doing it just like that too ... but there are some problems:
) makechrootpkg sets that variable itself, ie. it's not stable
) ... *anyone* can set it, ie. it's very not stable
) more of a runtime/per-invocation/cache setting vs. a stable/known location
) not overridable from env in makechrootpkg without a patch (use of sudo kills env)
mkarchroot/makechrootpkg should learn to bind mount $SRCDEST to /srcdest. And $SRCDEST/scm could be

    chmod 0775 $SRCDEST/scm && chgrp builder $SRCDEST/scm

And the 'builder' group id could be synced between chroot and the real system.
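The permission setup suggested above could be wrapped in a small helper along these lines. This is a sketch only: `setup_scm_dir` is a hypothetical name, and the 'builder' group is assumed to exist already with the same id inside and outside the chroot.

```bash
#!/bin/bash
# Prepare a group-writable checkout directory.
# $1: directory to prepare, $2: group that should own it
setup_scm_dir() {
    local dir=$1 group=$2
    mkdir -p "$dir" &&
        chgrp "$group" "$dir" &&
        chmod 0775 "$dir"    # rwx for owner and group, r-x for others
}
```

It would typically be run once, as root, for example `setup_scm_dir "$SRCDEST/scm" builder`.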
also, $SRCDEST is still a "cache" location, like the pacman cache. it's safe to delete. in fact that's what `--cleancache` in makepkg does ... i want to establish a known STABLE location because it's NOT safe to delete repos unless you really really REALLY mean to.
--cleancache doesn't delete directories, so $SRCDEST/scm stays in place :) And you should know what you are doing when saying --cleancache.
additionally, your routine, while it would work for most simple/small packages, would cause a 1GB+ copy and then a massive checkout for my reference PKGBUILD *every time* i ran makepkg. considering this package takes several hours to build on a fairly powerful machine, up to 8+ on a normal machine, i'd prefer if it didn't wipe it out when something failed :-) my routines are specifically designed for maximum reuse and minimum downtime. if everyone used the same routines, we could have *automagic* sharing between competing PKGBUILDs, AND i could reuse those repositories for my own development tinkerings.
Your reference PKGBUILD is a very bad example. What happened to the good old simple PKGBUILDs?
... imo development packages should be able to mirror the options of regular packages as closely as possible, eg. `--noextract` should build without a sync/checkout, `--nobuild` should sync/checkout without a build, maybe even `--allsource` does a checkout + bundling.
All this needs support in makepkg. Your mail subject says "PKGBUILD.proto improvements" and your text talks about some custom, agreed upon routine for SCM handling, but what you really want is SCM support in makepkg. Well, you are not alone:

https://bugs.archlinux.org/task/16384#comment50310
https://bugs.archlinux.org/task/16872

Anyway, not sure why I am arguing, because I am not intending to write the code and haven't done enough homework :)

Btw., makepkg currently runs this on files in the source=() array:

    ln -s "$SRCDEST/$file" "$srcdir/"

It could learn to parse something like this (but I am sure there exists a case where one needs to pull more than 1 repo):

    source=(xxx.patch [git]="git://git.sv.gnu.org/pythonwebkit.git")

When encountering a [git] subscript, makepkg could call some internal function and download to $SRCDEST/scm/, or call fetch_git() if it is defined in the PKGBUILD (cd'ing to $SRCDEST/scm before the call, executing `ln -s "$SRCDEST/scm/$_gitname" "$srcdir/"` afterwards)...

--
Rogutės Sparnuotos
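One practical wrinkle with the `[git]=` idea is worth illustrating: in a plain indexed bash array the subscript is evaluated arithmetically, so `git` is read as an unset variable (0) and the URL silently replaces the first entry. The sketch below shows that behaviour and one possible workaround. The `scm_source` array, `default_fetch_git`, and `fetch_scm_sources` are hypothetical names introduced here, not part of makepkg; only `fetch_git` comes from the message above. Associative arrays need bash 4 or later.

```bash
#!/bin/bash
# 1) Indexed array: [git] evaluates to index 0, overwriting xxx.patch.
unset git
source=(xxx.patch [git]="git://git.sv.gnu.org/pythonwebkit.git")
echo "entries: ${#source[@]}, first: ${source[0]}"

# 2) Hypothetical alternative: a separate associative array keyed by a
#    local checkout name, which also allows more than one repository.
declare -A scm_source=(
    [pythonwebkit]="git://git.sv.gnu.org/pythonwebkit.git"
)

# Fallback handler; it only reports what it would do.
default_fetch_git() {
    echo "would fetch $2 into ${SRCDEST:-.}/scm/$1"
}

fetch_scm_sources() {
    local name handler=default_fetch_git
    # Prefer a fetch_git() supplied by the PKGBUILD when one is defined.
    declare -F fetch_git >/dev/null && handler=fetch_git
    for name in "${!scm_source[@]}"; do
        "$handler" "$name" "${scm_source[$name]}" || return 1
    done
}
```

Keeping SCM entries out of `source` avoids the subscript pitfall and leaves ordinary files such as patches to the existing symlink logic.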
Excerpts from C Anthony Risinger's message of 2011-07-21 23:42:21 +0200:
<snip>
I'm not quite sure what problem you are trying to solve, but what I see is that this is an order of magnitude more complex than the current PKGBUILD-git.proto. As it stands here I'd have no idea how to use it, what it does or how it works. I wouldn't want to use it.

Just 2c from someone who is neither an experienced git user nor fond of bash.
participants (3)
- C Anthony Risinger
- Philipp Überbacher
- Rogutės Sparnuotos