[pacman-dev] [PATCH 0/3] makepkg: Alternate implementation of VCS URLs in sources array.
A while ago I started working on a derivative of makepkg to support having
'git://...' type urls in the sources=() array. When preparing to file this
patch, I did a `git rebase`, and noticed that Allan McRae began working on a
similar feature. Our implementations are in many ways similar. Hopefully mine
will be useful.

My implementation makes minimal changes to makepkg itself (only adding glob
expansion to DLAGENTS, allowing for things like "git+*::""). Instead I added
a `vcsget` tool which generates a tarball from the VCS repo, in a very
similar manner to the way Allan's implementation does so within makepkg.

It looks as if Allan's download_*() functions are more verbose than mine
about what failed when there is an error. His svn and hg handlers are likely
more robust--though my git is pretty solid. I also have a half-written
handler for bzr.

An advantage of my design is that it does allow for integrity checks of VCS
packages, rather than inserting 'SKIP' into the md5sums array. This is very
important to the derivative distribution Parabola. (However, the 'SKIP'
option is still valuable for URLs that track a branch.)

Happy hacking,
~ Luke Shumaker

Luke Shumaker (3):
  Add a `vcsget` tool to download source from VCS repositories.
  makepkg: do glob expansion in DLAGENTS maps
  makepkg.conf: add vcsget DLAGENTS

 etc/makepkg.conf.in   |   8 +-
 scripts/.gitignore    |   1 +
 scripts/Makefile.am   |   4 +-
 scripts/makepkg.sh.in |  13 ++-
 scripts/vcsget.sh.in  | 294 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 316 insertions(+), 4 deletions(-)
 create mode 100644 scripts/vcsget.sh.in

-- 
1.7.12
It would support
 * cvs
 * git
 * svn
 * bzr
 * hg
But cvs support is yet to be written, and bzr support is half-way written.
Git is the only mode that I'm thoroughly confident in; the others need
testing.

Signed-off-by: Luke Shumaker <LukeShu@sbcglobal.net>
---
 scripts/.gitignore   |   1 +
 scripts/Makefile.am  |   4 +-
 scripts/vcsget.sh.in | 294 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 298 insertions(+), 1 deletion(-)
 create mode 100644 scripts/vcsget.sh.in

diff --git a/scripts/.gitignore b/scripts/.gitignore
index 9e403bf..e59551e 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -6,3 +6,4 @@ pkgdelta
 repo-add
 repo-elephant
 repo-remove
+vcsget
diff --git a/scripts/Makefile.am b/scripts/Makefile.am
index 29c81aa..8e6d037 100644
--- a/scripts/Makefile.am
+++ b/scripts/Makefile.am
@@ -14,7 +14,8 @@ OURSCRIPTS = \
 	pacman-key \
 	pacman-optimize \
 	pkgdelta \
-	repo-add
+	repo-add \
+	vcsget
 
 EXTRA_DIST = \
 	makepkg.sh.in \
@@ -23,6 +24,7 @@ EXTRA_DIST = \
 	pacman-optimize.sh.in \
 	pkgdelta.sh.in \
 	repo-add.sh.in \
+	vcsget.sh.in \
 	$(LIBRARY)
 
 LIBRARY = \
diff --git a/scripts/vcsget.sh.in b/scripts/vcsget.sh.in
new file mode 100644
index 0000000..f8f6e4b
--- /dev/null
+++ b/scripts/vcsget.sh.in
@@ -0,0 +1,294 @@
+#!/bin/bash
+#
+# vcsget - downloader agent that generates tarballs from VCS repos
+# @configure_input@
+#
+# Copyright (c) 2012 Luke Shumaker <lukeshu@sbcglobal.net>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+# gettext initialization
+export TEXTDOMAIN='pacman-scripts'
+export TEXTDOMAINDIR='@localedir@'
+
+unset CDPATH
+
+m4_include(library/output_format.sh)
+
+##
+# usage : in_array( $needle, $haystack )
+# return : 0 - found
+#          1 - not found
+##
+in_array() {
+	local needle=$1; shift
+	local item
+	for item in "$@"; do
+		[[ $item = "$needle" ]] && return 0 # Found
+	done
+	return 1 # Not Found
+}
+
+usage() {
+	echo "Usage: $0 <URL> <OUTPUT_TARBALL>"
+	echo ""
+	echo "Talks with multiple version control systems (VCSs) to create a"
+	echo "tarball of a specific commit."
+	echo ""
+	echo "<OUTPUT_TARBALL> is a plain, uncompressed tarball. Given the same"
+	echo "commit, the tarball will always have the same checksum."
+	echo ""
+	echo "<URL> is the repository, and possibly commit to download and tar."
+	echo "The following schemes are recognized:"
+	echo " * cvs://"
+	echo " * git://"
+	echo " * git+PROTO://"
+	echo " * svn+PROTO://"
+	echo " * bzr://"
+	echo " * hg+PROTO://"
+	echo ""
+	echo "URLs to be consumed by $0 are not always in the format of the"
+	echo "relevant VCS program, but normalized to proper URLs. Simply, this"
+	echo "means:"
+	echo ""
+	echo " scheme://[authinfo@]host[:port]/path[#fragment]"
+	echo ""
+	echo "Where <authinfo> is in the format \"username[:password]\" and"
+	echo "<fragment> is the commit ID, branch, or tag to be returned."
+}
+
+##
+# Get the column "$2" from the space-delimited string "$1".
+# This has the advantage over `set -- $1` and `arr=($1)` that it separates on
+# a single space, instead of '\s+', which allows it to have empty columns.
+##
+get_field() {
+	local str=$1
+	local i=$2
+	echo "$str"|cut -d' ' -f $i
+}
+
+##
+# Parse a URL into parts, according to RFC 3986, "URI Generic Syntax".
+# Sets the variables `scheme`, `authority`, `path`, `query` and `fragment`.
+##
+parse_url() {
+	local url=$1
+
+	# This regex is copied from RFC 3986
+	local regex='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
+	local parts=$(echo "$url"|sed -r "s@$regex@\2 \4 \5 \7 \9@")
+
+	scheme=$(   get_field "$parts" 1)
+	authority=$(get_field "$parts" 2)
+	path=$(     get_field "$parts" 3)
+	query=$(    get_field "$parts" 4)
+	fragment=$( get_field "$parts" 5)
+}
+
+##
+# Parse an authority from a URL.
+# Sets the variables `user`, `pass`, `sock`, `host` and `port`.
+# `sock` is socket, which is defined as `hostport` in the RFC.
+##
+parse_authority() {
+	local authority=$1
+
+	local regex='^(([^:]*)(:([^@]*))?@)?(([^:]+)(:([0-9]*))?)'
+	#             12      3 4          56      7 8
+	# 1 = userinfo@
+	# 2 = user
+	# 3 = :pass
+	# 4 = pass
+	# 5 = hostport
+	# 6 = host
+	# 7 = :port
+	# 8 = port
+	local parts=$(echo "$authority"|sed -r "s%$regex%\2 \4 \5 \6 \8%")
+	user=$(get_field "$parts" 1)
+	pass=$(get_field "$parts" 2)
+	sock=$(get_field "$parts" 3)
+	host=$(get_field "$parts" 4)
+	port=$(get_field "$parts" 5)
+}
+
+download_cvs() {
+	# TODO
+	error "$(gettext 'CVS not implemented')"
+	exit 1
+}
+
+download_git() {
+	msg "$(gettext 'Getting source from git')"
+	: ${fragment:=master} # default to 'master' if not set
+	send_url="${scheme#git+}://${authority}${path}"
+
+	# Where to checkout the code to
+	dir="$basedir/git/$send_url"
+	if [[ "${dir##*.}" != git ]]; then
+		dir="${dir}/.git"
+	fi
+
+	# Download
+	if [ ! -d "$dir" ]; then
+		msg2 "$(gettext 'doing initial clone')"
+		mkdir -p "$dir"
+		git clone --bare "$send_url" "$dir"
+	fi
+
+	cd "$dir"
+	refs=($(git show-ref|sed 's@.*/@@'|sort -u))
+	if in_array "$fragment" "${refs[@]}"; then
+		# The ref is symbolic, so it may have changed upstream
+		msg2 "$(gettext 'fetching updates')"
+		git fetch --all
+	fi
+
+	# Generate tarball
+	archive=(git archive --format=tar --prefix="$base/"
+	         "$fragment" -o "$tarball")
+	msg2 "$(gettext 'trying to generate tarball')"
+	if "${archive[@]}" 2>/dev/null; then
+		# success
+		msg2 "$(gettext 'success')"
+		exit 0
+	else
+		# the relevant commit is probably newer than we have
+		msg2 "$(gettext 'failure, forcing an update')"
+		git fetch --all
+		msg2 "$(gettext 'generating tarball')"
+		"${archive[@]}"
+		local s=$?
+		exit $?
+	fi
+}
+
+download_svn() {
+	msg "$(gettext 'Getting source from Subversion')"
+	parse_authority "$authority"
+	send_url="${scheme#svn+}://${sock}${path}"
+	args=('--config-dir' "$basedir/svn/.config")
+	if [ -n "$user" ]; then
+		args+=('--username' "$user")
+	fi
+	if [ -n "$pass" ]; then
+		args+=('--password' "$pass")
+	fi
+	if [ -n "$fragment" ]; then
+		args+=('-r' "$fragment")
+	fi
+
+	# Where to checkout the code to
+	dir="$basedir/svn/$send_url#$fragment"
+
+	# Download
+	if [ ! -d "$dir" ]; then
+		mkdir -p "$dir"
+		svn co "${args[@]}" "$send_url" "$dir"
+	else
+		cd "$dir"
+		svn up "${args[@]}"
+	fi
+
+	# Generate tarball
+	mkdir "$dir.tmp$$"
+	cp -r "$dir" "$dir.tmp$$/$base"
+	find "$dir.tmp$$/$base" -name .svn -exec rm -rf '{}' +
+	cd "$dir.tmp$$"
+	bsdtar cf "$tarball" "$base"
+	rm -rf "$dir.tmp$$"
+}
+
+download_bzr() {
+	# TODO finish bzr support
+	error "$(gettext 'Bazaar not implemented')"
+	exit 1
+
+	send_url="${url%%#*}"
+	if [ "$scheme" != 'bzr+ssh' ]; then
+		send_url="${send_url#bzr+}"
+	fi
+
+	dir="$basedir/bzr/$send_url"
+
+	# Download
+	if [ ! -d "$dir" ]; then
+		msg2 "$(gettext 'doing initial checkout')"
+		mkdir -p "$dir"
+		bzr checkout "$send_url" "$dir"
+	fi
+
+	cd "$dir"
+	bzr update
+
+	# Generate tarball
+}
+
+download_hg() {
+	msg "$(gettext 'Getting source from Mercurial')"
+	send_url="${url#hg+}"
+	hg clone -U "$send_url"
+	dir="$basedir/hg/${send_url%%#*}"
+
+	# Download
+	if [ ! -d "$dir" ]; then
+		mkdir -p "$dir"
+		hg clone "$send_url" "$dir"
+	else
+		cd "$dir"
+		hg update -r "$fragment"
+	fi
+
+	# Generate tarball
+	mkdir "$dir.tmp$$"
+	cp -r "$dir" "$dir.tmp$$/$base"
+	rm -rf "$dir.tmp$$/$base/.hg"
+	cd "$dir.tmp$$"
+	bsdtar cf "$tarball" "$base"
+	rm -rf "$dir.tmp$$"
+}
+
+main() {
+	url=$1
+	tarball=$(readlink -m "$2")
+
+	base="$(echo "$tarball"|sed -r 's@.*/(.*)\.tar(\..*)?$@\1@')"
+	basedir="${tarball%/*}/vcs-cache"
+
+	if ( in_array '-h' "$@" || in_array '--help' "$@" ); then
+		usage
+		exit 0
+	fi
+	if [ "$#" -ne 2 ]; then
+		usage
+		exit 1
+	fi
+
+	msg "$(printf "$(gettext "Saving '%s' to '%s'")" "$url" "$tarball")"
+
+	parse_url "$url"
+	case "$scheme" in
+		cvs)       download_cvs;;
+		git+*|git) download_git;;
+		svn+*)     download_svn;;
+		bzr)       download_bzr;;
+		hg+*)      download_hg;;
+		*)
+			error "$(gettext 'Unable to match a VCS with scheme'): $scheme"
+			exit 1;;
+	esac
+}
+
+main "$@" >&2
-- 
1.7.12
On Sat, Aug 25, 2012 at 01:36:42PM -0400, Luke Shumaker wrote:
It would support
 * cvs
 * git
 * svn
 * bzr
 * hg
But cvs support is yet to be written, and bzr support is half-way written.
Git is the only mode that I'm thoroughly confident in, the others need testing.
Signed-off-by: Luke Shumaker <LukeShu@sbcglobal.net>
---
Code review, as promised.

There's a general lack of error checking on important commands. Things that
mutate the filesystem or change directory should _always_ be checked for
error. I haven't pointed it all out, but you'll get the idea.

I still don't like the idea of creating a tarball out of everything, just to
unpack it after it's completed. From a distribution packager perspective,
building from version control shouldn't be a regular thing. Yes, I do
understand your decision based on Parabola, but I don't have to agree with
it =P.
 scripts/.gitignore   |   1 +
 scripts/Makefile.am  |   4 +-
 scripts/vcsget.sh.in | 294 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 298 insertions(+), 1 deletion(-)
 create mode 100644 scripts/vcsget.sh.in
diff --git a/scripts/.gitignore b/scripts/.gitignore
index 9e403bf..e59551e 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -6,3 +6,4 @@ pkgdelta
 repo-add
 repo-elephant
 repo-remove
+vcsget
diff --git a/scripts/Makefile.am b/scripts/Makefile.am
index 29c81aa..8e6d037 100644
--- a/scripts/Makefile.am
+++ b/scripts/Makefile.am
@@ -14,7 +14,8 @@ OURSCRIPTS = \
 	pacman-key \
 	pacman-optimize \
 	pkgdelta \
-	repo-add
+	repo-add \
+	vcsget
 EXTRA_DIST = \
 	makepkg.sh.in \
@@ -23,6 +24,7 @@ EXTRA_DIST = \
 	pacman-optimize.sh.in \
 	pkgdelta.sh.in \
 	repo-add.sh.in \
+	vcsget.sh.in \
 	$(LIBRARY)
 LIBRARY = \
diff --git a/scripts/vcsget.sh.in b/scripts/vcsget.sh.in
new file mode 100644
index 0000000..f8f6e4b
--- /dev/null
+++ b/scripts/vcsget.sh.in
@@ -0,0 +1,294 @@
+#!/bin/bash
+#
+# vcsget - downloader agent that generates tarballs from VCS repos
+# @configure_input@
+#
+# Copyright (c) 2012 Luke Shumaker <lukeshu@sbcglobal.net>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+# gettext initialization
+export TEXTDOMAIN='pacman-scripts'
+export TEXTDOMAINDIR='@localedir@'
+
+unset CDPATH
+
+m4_include(library/output_format.sh)
+
+##
+# usage : in_array( $needle, $haystack )
+# return : 0 - found
+#          1 - not found
+##
+in_array() {
+	local needle=$1; shift
+	local item
+	for item in "$@"; do
+		[[ $item = "$needle" ]] && return 0 # Found
+	done
+	return 1 # Not Found
+}
+
+usage() {
+	echo "Usage: $0 <URL> <OUTPUT_TARBALL>"
You might consider stuffing $0 into a var sooner as "${0##*/}", similar to program_invocation_short_name in C. Invoking your script as ../some/other/dir/vcsget won't horribly throw off your line wrapping and it looks prettier in the trimmed version, damnit.
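For reference, a minimal sketch of the trimming suggested here, with a made-up invocation path:

```shell
#!/bin/bash
# ${0##*/} strips everything through the last '/', like basename(1) but
# without forking; argv0 below is a hypothetical invocation path.
argv0='../some/other/dir/vcsget'
progname=${argv0##*/}
echo "$progname"
```

Stuffed into a variable once at the top of the script, `$progname` then keeps the usage text's line wrapping stable no matter how the script was invoked.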
+ echo "" + echo "Talks with multiple version control systems (VCSs) to create a" + echo "tarball of a specific commit." + echo "" + echo "<OUTPUT_TARBALL> is a plain, uncompressed tarball. Given the same" + echo "commit, the tarball will always have the same checksum." + echo "" + echo "<URL> is the repository, and possibly commit to download and tar." + echo "The following schemes are recognized:" + echo " * cvs://" + echo " * git://" + echo " * git+PROTO://" + echo " * svn+PROTO://" + echo " * bzr://" + echo " * hg+PROTO://" + echo "" + echo "URLs to be consumed by $0 are not always in the format of the" + echo "relevent VCS program, but normalized to proper URLs. Simply, this" + echo "means:" + echo "" + echo " scheme://[authinfo@]host[:port]/path[#fragment]" + echo "" + echo "Where <authinfo> is in the format \"username[:password]\" and" + echo "<fragment> is the commit ID, branch, or tag to be returned." +} + +## +# Get the column "$2" from the space-delimited string "$1". +# This has the advantage over `set -- $1` and `arr=($1)` that it separates on +# a single space, instead of '\s+', which allows it to have empty columns. +## +get_field() { + local str=$1 + local i=$2 + echo "$str"|cut -d' ' -f $i +} + +## +# Parse a URL into parts, according to RFC 3986, "URI Generic Syntax". +# Sets the variables `scheme`, `authority`, `path`, `query` and `fragment`. +## +parse_url() { + local url=$1 + + # This regex is copied from RFC 3986 + local regex='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?' + local parts=$(echo "$url"|sed -r "s@$regex@\2 \4 \5 \7 \9@") + + scheme=$( get_field "$parts" 1) + authority=$(get_field "$parts" 2) + path=$( get_field "$parts" 3) + query=$( get_field "$parts" 4) + fragment=$( get_field "$parts" 5) +}
As mentioned, this should all be replaced with bash regex to avoid the constant forking to sed.
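A sketch of what that could look like: bash's `[[ =~ ]]` accepts the same ERE the script already feeds to sed, and the capture groups land in `BASH_REMATCH` with no subprocesses. (The sample URL below is invented for the demo.)

```shell
#!/bin/bash
# parse_url reimplemented with bash's builtin ERE matching; sets the same
# five variables as the sed/cut version, without forking.
parse_url() {
	local url=$1
	# RFC 3986 regex, unchanged from the patch
	local regex='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
	[[ $url =~ $regex ]] || return 1
	scheme=${BASH_REMATCH[2]}
	authority=${BASH_REMATCH[4]}
	path=${BASH_REMATCH[5]}
	query=${BASH_REMATCH[7]}
	fragment=${BASH_REMATCH[9]}
}

parse_url 'git+https://user@example.com/repo.git#v1.0'
echo "$scheme | $authority | $path | $fragment"
```

Empty groups come back as empty strings, so the single-space/empty-column concern that motivated get_field disappears entirely.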
+
+##
+# Parse an authority from a URL.
+# Sets the variables `user`, `pass`, `sock`, `host` and `port`.
+# `sock` is socket, which is defined as `hostport` in the RFC.
+##
+parse_authority() {
+	local authority=$1
+
+	local regex='^(([^:]*)(:([^@]*))?@)?(([^:]+)(:([0-9]*))?)'
+	#             12      3 4          56      7 8
+	# 1 = userinfo@
+	# 2 = user
+	# 3 = :pass
+	# 4 = pass
+	# 5 = hostport
+	# 6 = host
+	# 7 = :port
+	# 8 = port
+	local parts=$(echo "$authority"|sed -r "s%$regex%\2 \4 \5 \6 \8%")
+	user=$(get_field "$parts" 1)
+	pass=$(get_field "$parts" 2)
+	sock=$(get_field "$parts" 3)
+	host=$(get_field "$parts" 4)
+	port=$(get_field "$parts" 5)
+}
More regex. In general, stuffing code into a sed expression is frowned upon (for fear of separator clashes). In this case, you know the value of $regex ahead of time, so it's okay.
+
+download_cvs() {
+	# TODO
+	error "$(gettext 'CVS not implemented')"
+	exit 1
+}
+
+download_git() {
+	msg "$(gettext 'Getting source from git')"
+	: ${fragment:=master} # default to 'master' if not set
+	send_url="${scheme#git+}://${authority}${path}"
+
+	# Where to checkout the code to
+	dir="$basedir/git/$send_url"
always localize your variables when possible.
+ if [[ "${dir##*.}" != git ]]; then
Random proper bash test with quoting on the wrong side ;)
+ dir="${dir}/.git" + fi + + # Download + if [ ! -d "$dir" ]; then + msg2 "$(gettext 'doing initial clone')" + mkdir -p "$dir" + git clone --bare "$send_url" "$dir" + fi + + cd "$dir" + refs=($(git show-ref|sed 's@.*/@@'|sort -u))
This is the wrong way to read a command into an array. It'll break on legit
whitespace in a ref, and it's subject to glob expansion. Additionally, it
breaks refs with slashes in them. Right now, using this with Allan's
working-split/binary-path branch will break. I'd use awk and read for this,
if you'd like to keep this avenue...

    IFS=$'\n' read -rd '' -a refs < \
        <(git show-ref | awk '{ sub(/refs\/[^/]+\/?/, "", $2); print $2 }')

I'll point out that Allan's implementation has the advantage here of forcing
that the user provide the _type_ of ref as the URI fragment. In this case, we
know what kind of ref that translates to and git-show-ref can do the
verification for us via the --verify -q flags.
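A runnable illustration of the slashed-ref problem and one mapfile-based alternative; printf fakes the `git show-ref` output, and the ref names are invented:

```shell
#!/bin/bash
# Read ref names line-by-line so whitespace, globs, and slashes survive;
# printf stands in for the real `git show-ref` pipeline here.
mapfile -t refs < <(printf '%s\n' \
	'refs/heads/master' \
	'refs/heads/working-split/binary-path' \
	'refs/tags/v1.0')

# Strip only the leading refs/<namespace>/ component (shortest match),
# keeping any further slashes inside the ref name.
refs=("${refs[@]#refs/*/}")
printf '%s\n' "${refs[@]}"
```

The unquoted `$(...)` expansion in the patch would have split `working-split/binary-path` at the final slash, as the review notes.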
+ if in_array "$fragment" "${refs[@]}"; then + # The ref is symbolic, so it may have changed upstream + msg2 "$(gettext 'fetching updates')" + git fetch --all
that --all flag seems like you're doing more work than you need to. A repo like linux.git is an extreme case, but could add a fair bit of weight to the call. No error checking?
+	fi
+
+	# Generate tarball
+	archive=(git archive --format=tar --prefix="$base/"
+	         "$fragment" -o "$tarball")
localize this.
+ msg2 "$(gettext 'trying to generate tarball')" + if "${archive[@]}" 2>/dev/null; then + # success + msg2 "$(gettext 'success')" + exit 0
exit 0 here is redundant, unless you expect that msg2 will fail.
+	else
+		# the relevant commit is probably newer than we have
+		msg2 "$(gettext 'failure, forcing an update')"
+		git fetch --all
+		msg2 "$(gettext 'generating tarball')"
+		"${archive[@]}"
Don't hide the command being executed. You've picked an appropriately
descriptive name, but I think it'd be better to file only the args into an
array. It's easier to tell what's being executed at a glance, and it looks
more "natural" in terms of command execution, e.g.

    bsdtar "${archive_args[@]}"

In addition, this is the most important part of the function, but you're not
doing any error checking.
+		local s=$?
+		exit $?
I can only assume there's a typo somewhere in here. You're throwing away your return value from the archive command and exiting (invariably) 0.
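A tiny sketch of the status-capture point, with a stand-in function that fails with a known status:

```shell
#!/bin/bash
# `archive` simulates the failing "${archive[@]}" call with a known status.
archive() { return 3; }

archive || s=$?   # capture immediately; any later command resets $?
echo "archive exited with $s"
# A subsequent `exit $?` would exit 0: by then $? reflects the successful
# echo, not the archive command -- hence the review's point about `exit $s`.
```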
+	fi
+}
+
+download_svn() {
+	msg "$(gettext 'Getting source from Subversion')"
+	parse_authority "$authority"
+	send_url="${scheme#svn+}://${sock}${path}"
+	args=('--config-dir' "$basedir/svn/.config")
localize this.
+ if [ -n "$user" ]; then + args+=('--username' "$user")
I just think it's weird seeing a combination of POSIX (single brace) and non-POSIX (arrays) syntax right next to each other =P.
+	fi
+	if [ -n "$pass" ]; then
+		args+=('--password' "$pass")
+	fi
+	if [ -n "$fragment" ]; then
+		args+=('-r' "$fragment")
+	fi
+
+	# Where to checkout the code to
+	dir="$basedir/svn/$send_url#$fragment"
And localize this... There's more below, I won't point them out.
+
+	# Download
+	if [ ! -d "$dir" ]; then
+		mkdir -p "$dir"
+		svn co "${args[@]}" "$send_url" "$dir"
+	else
+		cd "$dir"
+		svn up "${args[@]}"
+	fi
+
+	# Generate tarball
+	mkdir "$dir.tmp$$"
If you really need to make a temporary directory, I suggest using mktemp, which can be made to honor the user's TMPDIR var. They'll thank you. but... see below.
+	cp -r "$dir" "$dir.tmp$$/$base"
+	find "$dir.tmp$$/$base" -name .svn -exec rm -rf '{}' +
+	cd "$dir.tmp$$"
+	bsdtar cf "$tarball" "$base"
bsdtar's --exclude flag makes this all unnecessary.
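A sketch of both suggestions together, with GNU tar standing in for bsdtar (both support `--exclude`) and mktemp providing the TMPDIR-aware scratch space; the file names are invented for the demo:

```shell
#!/bin/bash
# Build a toy checkout containing a .svn directory.
workdir=$(mktemp -d)              # mktemp respects the user's TMPDIR
mkdir -p "$workdir/proj/src" "$workdir/proj/.svn"
echo 'hello' > "$workdir/proj/src/file.txt"
echo 'admin' > "$workdir/proj/.svn/entries"

# One tar invocation replaces the cp/find/rm dance entirely.
tar --exclude='.svn' -cf "$workdir/proj.tar" -C "$workdir" proj

listing=$(tar -tf "$workdir/proj.tar")
echo "$listing"
rm -rf "$workdir"
```

The `.svn` entries never make it into the archive, so no temporary copy of the checkout is needed at all.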
+	rm -rf "$dir.tmp$$"
+}
+
+download_bzr() {
+	# TODO finish bzr support
+	error "$(gettext 'Bazaar not implemented')"
+	exit 1
+
+	send_url="${url%%#*}"
+	if [ "$scheme" != 'bzr+ssh' ]; then
+		send_url="${send_url#bzr+}"
+	fi
+
+	dir="$basedir/bzr/$send_url"
+
+	# Download
+	if [ ! -d "$dir" ]; then
+		msg2 "$(gettext 'doing initial checkout')"
+		mkdir -p "$dir"
+		bzr checkout "$send_url" "$dir"
+	fi
+
+	cd "$dir"
+	bzr update
+
+	# Generate tarball
+}
+
+download_hg() {
+	msg "$(gettext 'Getting source from Mercurial')"
+	send_url="${url#hg+}"
+	hg clone -U "$send_url"
+	dir="$basedir/hg/${send_url%%#*}"
+
+	# Download
+	if [ ! -d "$dir" ]; then
+		mkdir -p "$dir"
+		hg clone "$send_url" "$dir"
+	else
+		cd "$dir"
+		hg update -r "$fragment"
+	fi
+
+	# Generate tarball
+	mkdir "$dir.tmp$$"
+	cp -r "$dir" "$dir.tmp$$/$base"
+	rm -rf "$dir.tmp$$/$base/.hg"
+	cd "$dir.tmp$$"
+	bsdtar cf "$tarball" "$base"
+	rm -rf "$dir.tmp$$"
Needs more error checking... and again with the --exclude.
+}
+
+main() {
+	url=$1
+	tarball=$(readlink -m "$2")
As mentioned, readlink with flags is bad.
+ + base="$(echo "$tarball"|sed -r 's@.*/(.*)\.tar(\..*)?$@\1@')"
Augh... Who can read that crap? The PE version is so much nicer.
+ basedir="${tarball%/*}/vcs-cache" + + if ( in_array '-h' "$@" || in_array '--help' "$@" ); then
The subshell isn't at all useful here except for increasing your PID count. Just get rid of it.
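For comparison, the same condition without the grouping subshell; `in_array` is copied from the patch and the argv is hypothetical:

```shell
#!/bin/bash
in_array() {
	local needle=$1; shift
	local item
	for item in "$@"; do
		[[ $item = "$needle" ]] && return 0
	done
	return 1
}

set -- 'git://example.com/repo' '--help'    # hypothetical argv
# plain `||` needs no ( ) grouping inside an if condition
if in_array '-h' "$@" || in_array '--help' "$@"; then
	echo 'help requested'
fi
```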
+		usage
+		exit 0
+	fi
+	if [ "$#" -ne 2 ]; then
+		usage
+		exit 1
+	fi
+
+	msg "$(printf "$(gettext "Saving '%s' to '%s'")" "$url" "$tarball")"
I don't think you need that printf there since msg() expects a format string.
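A sketch of that point, using a simplified stand-in for the real msg() from output_format.sh (and with gettext omitted so the snippet stays self-contained):

```shell
#!/bin/bash
# Simplified msg(): the first argument is a printf format string, mirroring
# the convention the review describes for the real helper.
msg() {
	local mesg=$1; shift
	printf "==> ${mesg}\n" "$@"
}

# no extra printf wrapper needed; msg formats its own arguments
out=$(msg "Saving '%s' to '%s'" 'git://example.com/repo' '/tmp/repo.tar')
echo "$out"
```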
+ + parse_url "$url" + case "$scheme" in + cvs) download_cvs;; + git+*|git) download_git;; + svn+*) download_svn;; + bzr) download_bzr;; + hg+*) download_hg;; + *) + error "$(gettext 'Unable to match a VCS with scheme'): $scheme" + exit 1;; + esac +} + +main "$@" >&2
This dumps _everything_ in the script to stderr, putting you on par with gnupg. Don't be like that.
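One alternative sketch: route diagnostics to stderr inside the message helper and leave stdout alone, instead of redirecting all of main:

```shell
#!/bin/bash
msg() { printf '==> %s\n' "$*" >&2; }   # diagnostics always go to stderr

main() {
	msg 'Saving source...'   # lands on stderr
	echo 'repo.tar'          # real output stays on stdout
}

main
```

Callers can then pipe or capture the tool's actual output without also swallowing its progress messages.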
-- 
1.7.12
This allows vcs-* schemes to be used for vcsget.

Signed-off-by: Luke Shumaker <LukeShu@sbcglobal.net>
---
 scripts/makepkg.sh.in | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in
index b30e9d0..f2a5008 100644
--- a/scripts/makepkg.sh.in
+++ b/scripts/makepkg.sh.in
@@ -394,16 +394,25 @@ source_has_signatures() {
 	return 1
 }
 
+string_matches_glob() {
+	local string=$1
+	local glob=$2
+	case "$string" in
+		$glob) return 0;;
+	esac
+	return 1
+}
+
 get_downloadclient() {
 	# $1 = URL with valid protocol prefix
 	local url=$1
 	local proto="${url%%://*}"
 
-	# loop through DOWNLOAD_AGENTS variable looking for protocol
+	# loop through DLAGENTS (download agents) variable looking for protocol
 	local i
 	for i in "${DLAGENTS[@]}"; do
 		local handler="${i%%::*}"
-		if [[ $proto = "$handler" ]]; then
+		if string_matches_glob "$proto" "$handler"; then
 			local agent="${i##*::}"
 			break
 		fi
-- 
1.7.12
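The case-statement glob match this patch relies on can be exercised in isolation (the protocol strings below are just examples):

```shell
#!/bin/bash
# Unquoted case patterns are matched as globs, which is what lets a
# DLAGENTS handler like "git+*" cover git+https, git+ssh, and so on.
string_matches_glob() {
	local string=$1
	local glob=$2
	case "$string" in
		$glob) return 0;;
	esac
	return 1
}

string_matches_glob 'git+https' 'git+*' && echo 'git+https matches git+*'
string_matches_glob 'scp'       'git+*' || echo 'scp does not'
```

Plain (glob-free) handlers like `ftp` still match themselves exactly, so existing DLAGENTS entries behave as before.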
On Sat, Aug 25, 2012 at 01:36:41PM -0400, Luke Shumaker wrote:
A while ago I started working on a derivative of makepkg to support having 'git://...' type urls in the sources=() array. When preparing to file this patch, I did a `git rebase`, and noticed that Allan McRae began working on a similar feature. Our implementations are in many ways similar. Hopefully mine will be useful.
An interesting approach. As you've noticed, I think we're fairly committed to the implementation that Allan has provided. If you have specific concerns, maybe we can work to fix those.
My implementation makes minimal changes to makepkg itself (only adding glob expansion to DLAGENTS, allowing for things like "git+*::""). Instead I added a `vcsget` tool which generates a tarball from the VCS repo, in a very similar manner to the way Allan's implementation does so within makepkg.
I'm not thrilled with the shell I saw in that patch -- there's extensive use of external commands when simple bash features would have sufficed.
It looks as if Allan's download_*() functions are more verbose than mine about what failed when there is an error. His svn and hg handlers are likely more robust--though my git is pretty solid. I also have a half-written handler for bzr.
An advantage of my design is that it does allow for integrity checks of VCS packages, rather than inserting 'SKIP' into the md5sums array. This is very important to the derivative distribution Parabola. (However, the 'SKIP' option is still valuable for URLs that track a branch)
I don't see this as an advantage so much as a duplication. The backing VCS takes care of integrity checks. They're only necessary with tarballs because there is no "builtin" checksum to reference.
Happy hacking,
~ Luke Shumaker

Luke Shumaker (3):
  Add a `vcsget` tool to download source from VCS repositories.
  makepkg: do glob expansion in DLAGENTS maps
  makepkg.conf: add vcsget DLAGENTS

 etc/makepkg.conf.in   |   8 +-
 scripts/.gitignore    |   1 +
 scripts/Makefile.am   |   4 +-
 scripts/makepkg.sh.in |  13 ++-
 scripts/vcsget.sh.in  | 294 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 316 insertions(+), 4 deletions(-)
 create mode 100644 scripts/vcsget.sh.in

-- 
1.7.12
At Sat, 25 Aug 2012 14:43:32 -0400, Dave Reisner wrote:
On Sat, Aug 25, 2012 at 01:36:41PM -0400, Luke Shumaker wrote:
A while ago I started working on a derivative of makepkg to support having 'git://...' type urls in the sources=() array. When preparing to file this patch, I did a `git rebase`, and noticed that Allan McRae began working on a similar feature. Our implementations are in many ways similar. Hopefully mine will be useful.
An interesting approach. As you've noticed, I think we're fairly committed to the implementation that Allan has provided. If you have specific concerns, maybe we can work to fix those.
* Allan's URL parsing has some issues, see point 2 below. This is an easy
  fix.

* Composability. It's not really a "problem", but it seems to me that it's
  better to create a new general-purpose tool, instead of shoving a bunch of
  "special case" behavior into makepkg. Allan's implementation could even
  fairly easily be pulled out of makepkg.

  And really, does it make sense to have these URL schemes hardcoded into
  makepkg? Why have DLAGENTS at all, if we're going to hard-code schemes into
  makepkg? Again, I'm not against Allan's implementation, but I am for moving
  it out of makepkg.
My implementation makes minimal changes to makepkg itself (only adding blob expansion to DLAGENTS, allowing for things like "git+*::""). Instead I added a `vcsget` tool which generates a tarball from the VCS repo, in a very similar manner to the way Allan's implementation does so within makepkg.
I'm not thrilled with the shell I saw in that patch -- there's extensive use of external commands when simple bash features would have sufficed.
I assume you're speaking of my
 1. use of cut in get_field
 2. use of sed to parse URLs
 3. use of sed to parse URL authorities
 4. use of readlink to establish the tarball name
 5. use of sed to create a basedir name from the tarball name
 6. use of '[' with if statements

My defense of each:

1. `cut` in get_field
---------------------
This was simply the simplest solution that was obviously correct. Another
solution would have been to manipulate IFS, which has all kinds of funny side
effects and fringe behaviors.

That was a weak argument, and as a coder, I would be thrilled to find out if
there is a better solution.

2. `sed` to parse URLs
----------------------
I wanted full URL parsing, for a few cases when the translation is less than
straight-forward. For example, including authentication info in svn URLs.

Further, I wanted to make sure that *any* URL parsing done would be robust.
Given that the regex used was taken straight from the relevant RFC, I knew I
could count on it to work, even in fringe cases. For example, a "fragment"
*should* be able to contain a '#', but Allan's implementation discards
anything before the last '#' in the fragment. This is a problem for (at
least) git, as tags and branches may contain the symbol.

3. `sed` to parse URL authorities
---------------------------------
I believe that a robust implementation in bash (one not having the same
problems as the fragment handling for URLs) would be non-trivial.

4. `readlink` to establish the tarball name
-------------------------------------------
I used `readlink -m` to turn the tarball name into an absolute path. This
was not an absolute must, but it allowed me to avoid worrying about keeping
track of the current directory. It would be fairly easy to remove this by
using pushd/popd instead of cd. If you would like this changed, I can do
that.

5. `sed` to create a basedir name from the tarball name
-------------------------------------------------------
I'll admit, this was me getting a little lazy. A pure bash implementation is:

    base=${tarball##*/}
    base=${base%%.tar.*}
    base=${base%.tar}

6. `[` with if statements
-------------------------
Can I call stylistic choice on this one? It is trivial to replace this with
bash's '[['.
It looks as if Allan's download_*() functions are more verbose than mine about what failed when there is an error. His svn and hg handlers are likely more robust--though my git is pretty solid. I also have a half-written handler for bzr.
An advantage of my design is that it does allow for integrity checks of VCS packages, rather than inserting 'SKIP' into the md5sums array. This is very important to the derivative distribution Parabola. (However, the 'SKIP' option is still valuable for URLs that track a branch)
I don't see this as an advantage so much as a duplication. The backing VCS takes care of integrity checks. They're only necessary with tarballs because there is no "builtin" checksum to reference.
This is only partially true of the non-distributed VCSs. There was concern
from other Parabola developers about being able to use the checksum to
unambiguously confirm that a source tree is the correct version, but Allan's
implementation seems acceptable on that front.

Happy hacking,
~ Luke Shumaker
On Mon, Aug 27, 2012 at 09:55:57AM -0400, Luke T.Shumaker wrote:
At Sat, 25 Aug 2012 14:43:32 -0400, Dave Reisner wrote:
On Sat, Aug 25, 2012 at 01:36:41PM -0400, Luke Shumaker wrote:
A while ago I started working on a derivative of makepkg to support having 'git://...' type urls in the sources=() array. When preparing to file this patch, I did a `git rebase`, and noticed that Allan McRae began working on a similar feature. Our implementations are in many ways similar. Hopefully mine will be useful.
An interesting approach. As you've noticed, I think we're fairly committed to the implementation that Allan has provided. If you have specific concerns, maybe we can work to fix those.
* Allan's URL parsing has some issues, see point 2 below. This is an easy fix.
Great, let's fix those.
* Composability. It's not really a "problem", but it seems to me that it's better to create a new general-purpose tool, instead of shoving a bunch of "special case" behavior into makepkg. Allan's implementation could even fairly easily be pulled out of makepkg.
-1 for moving it out of makepkg. Decoupling it means we would need to do a lot of sharing so that the tool is sure to understand makepkg's absurdly complicated environment. Not extensible, and there's a lot of code that would be "duplicated".
And really, does it make sense to have these URL schemes hardcoded into makepkg? Why have DLAGENTS at all, if we're going to hard-code schemes into makepkg? Again, I'm not against Allan's implementation, but I am for moving it out of makepkg.
Let's see what's already established and available for http downloading:

- curl
- wget
- aria2
- dog
- fetch
- bash tcp sockets

(I'm sure there's more)

Show me a popular reimplementation of the reference git/svn/bzr/cvs/darcs/hg client, and then we'll talk about a way to make it possible for the user to configure that in place of our "hardcoded default".

The nerd in me wants to point out that there is no backing RFC for any of the VCS protocols so it doesn't belong in DLAGENTS, but we've violated that puritanism already by adding a DLAGENT for scp.
My implementation makes minimal changes to makepkg itself (only adding glob expansion to DLAGENTS, allowing for things like "git+*::"). Instead I added a `vcsget` tool which generates a tarball from the VCS repo, in a very similar manner to the way Allan's implementation does so within makepkg.
I'm not thrilled with the shell I saw in that patch -- there's extensive use of external commands when simple bash features would have sufficed.
I assume you're speaking of my

1. use of cut in get_field
2. use of sed to parse URLs
3. use of sed to parse URL authorities
4. use of readlink to establish the tarball name
5. use of sed to create a basedir name from the tarball name
6. use of '[' with if statements
Probably a good laundry list to start with. I'm happy to give a proper review of vcsget if you'd like, but I'll respond here in general terms.
My defense of each:
1. `cut` in get_field ---------------------
This was simply the simplest solution that was obviously correct. Another solution would have been to manipulate IFS, which has all kinds of funny side effects and fringe behaviors.
That was a weak argument, and as a coder, I would be thrilled to find out if there is a better solution.
Cut is useless. IFS manipulation with read is the preferred solution here. I'm curious what you think are the "fringe behaviors" in altering IFS (I suspect you're merely misusing it -- it's a common point of confusion). I've yet to come across any that make me want to use cut. I'd sooner offload a larger portion of work to awk than use cut, for several reasons, anyways.
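For what it's worth, the read/IFS idiom Dave is recommending can be sketched like this (the record format and variable names are illustrative, not vcsget's actual code):

```shell
# Split a '::'-style record with read instead of cut.  The assignment
# prefix scopes IFS to the read command only, leaving the shell's
# global IFS untouched.
record='git+https::/usr/bin/vcsget'
IFS=':' read -r scheme empty agent <<<"$record"
echo "scheme=$scheme agent=$agent"
```

Note that with a non-whitespace delimiter like ':', read preserves the empty middle field rather than collapsing it.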
2. `sed` to parse URLs ----------------------
I wanted full URL parsing, for a few cases when the translation is less than straightforward. For example, including authentication info in svn URLs.
Further, I wanted to make sure that *any* URL parsing done would be robust. Given that the regex used was taken straight from the relevant RFC, I knew I could count on it to work, even in fringe cases. For example a "fragment" *should* be able to contain a '#', but Allan's implementation discards anything before the last '#' in the fragment. This is a problem for (at least) git, as tags and branches may contain the symbol.
bash has regex matching which supports ERE with capture groups -- this could easily be adapted to split the URL all at once, simply saving the relevant pieces of BASH_REMATCH after the fact. See the =~ operator in bash(1).
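In fact, the regex from RFC 3986 appendix B drops straight into the =~ operator; a sketch (the URL here is made up for illustration):

```shell
# Split a URL in one shot with bash's =~ operator, using the regex
# from RFC 3986 appendix B.  BASH_REMATCH holds the captured pieces,
# including a fragment that itself contains '#'.
url='git://example.com/repo.git#branch=my#branch'
re='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
if [[ $url =~ $re ]]; then
    scheme=${BASH_REMATCH[2]}
    authority=${BASH_REMATCH[4]}
    path=${BASH_REMATCH[5]}
    fragment=${BASH_REMATCH[9]}
fi
echo "scheme=$scheme fragment=$fragment"
```

Note that the fragment keeps its embedded '#', which is exactly the case a last-'#' split would mangle.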
3. `sed` to parse URL authorities ---------------------------------
I believe that a robust (not having the same problems as fragments for URLs) implementation in bash would be non-trivial.
See above.
4. `readlink` to establish the tarball name -------------------------------------------
I used `readlink -m` to turn the tarball name into an absolute path. This was not an absolute must, but it allowed me to avoid worrying about keeping track of the current directory. It would be fairly easy to remove this by using pushd/popd instead of cd. If you would like this changed, I can do that.
This is _not_ portable. We've refrained from using readlink at all in our codebase except in the untainted form. The only thing you can portably rely on readlink to do is to wrap the readlink syscall. Better yet, figure out why a potential symlink isn't good enough and fix that. You shouldn't need to know the actual target that a symlink points to, since 90% of the time a tool will dereference it for you.
5. `sed` to create a basedir name from the tarball name -------------------------------------------------------
I'll admit, this was me getting a little lazy. A pure bash implementation is:
base=${tarball##*/}
base=${base%%.tar.*}
base=${base%.tar}
And with an extglob, you can more precisely cut that down to:

base=${tarball##*/}
base=${base%.tar?(.*)}

The unfortunately named foo.tar.bar.tar.gz wouldn't be "broken" in this way.
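A sketch of that extglob form in action (the filename is a made-up example):

```shell
# ?(.*) optionally matches one dot-plus-suffix after ".tar", and %
# removes the shortest matching suffix, so only the final ".tar.gz"
# is stripped from the unfortunately-named file.
shopt -s extglob
tarball='/srv/sources/foo.tar.bar.tar.gz'
base=${tarball##*/}      # strip the directory part
base=${base%.tar?(.*)}   # strip the trailing .tar[.*]
echo "$base"
```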
6. `[` with if statements -------------------------
Can I call stylistic choice on this one? It is trivial to replace this with bash's '[['.
Apologies for the incoming rant.

No, this isn't merely an issue of "style". [[ is far superior to [ in several ways. It's a bash keyword, making it far faster than the [ builtin, it supports pattern matching with globs and regex, and it has defined behavior when you pass more than 3 arguments. Quoting semantics are far simpler (and safer), too. Consider a contrived example which, actually, you see far too often:

[ -n $foo ]

This is always true, regardless of whether or not 'foo' is defined, since '[' is a builtin, which means it behaves no differently than any other standard command. In reality, it's the same as:

[ -n ]

This is _true_, because you've really only passed a single argument to [. In the single argument case, the existence test is performed. Since '-n' always has a length, it's true.

By comparison, the preferred bash construction:

[[ -n $foo ]]

This always does the right thing because the lexer passes over it and sees the variable before it's expanded. It understands that even though foo might not be defined, you've passed an argument, and the test correctly fails when that argument is the empty string. The only time you need to quote arguments to tests is when dealing with equations:

[[ $foo = $bar ]]

is not the same as

[[ $foo = "$bar" ]]

Generally, you want the latter since the RHS (right hand side) is subject to glob expansion. Quoting it ensures that any glob characters are not expanded. The LHS is _never_ subject to glob expansion, but will require quoting if it contains whitespace.
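The contrived example above, in a runnable form:

```shell
# With unset foo, `[ -n $foo ]` expands to the single-argument test
# `[ -n ]`, which is true; `[[ -n $foo ]]` sees the empty string and
# is correctly false.
unset foo
if [ -n $foo ]; then sgl=true; else sgl=false; fi    # expands to `[ -n ]`
if [[ -n $foo ]]; then dbl=true; else dbl=false; fi  # sees the empty string
echo "single-bracket says non-empty: $sgl"
echo "double-bracket says non-empty: $dbl"
```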
It looks as if Allan's download_*() functions are more verbose than mine about what failed when there is an error. His svn and hg handlers are likely more robust--though my git is pretty solid. I also have a half-written handler for bzr.
An advantage of my design is that it does allow for integrity checks of VCS packages, rather than inserting 'SKIP' into the md5sums array. This is very important to the derivative distribution Parabola. (However, the 'SKIP' option is still valuable for URLs that track a branch)
I don't see this as an advantage so much as a duplication. The backing VCS takes care of integrity checks. They're only necessary with tarballs because there is no "builtin" checksum to reference.
This is only partially true of the non-distributed VCSs.
There was concern from other Parabola developers about being able to use the checksum to unambiguously be able to confirm that a source tree is the correct version, but Allan's implementation seems acceptable on that front.
Happy hacking, ~ Luke Shumaker
At Mon, 27 Aug 2012 10:53:32 -0400, Dave Reisner wrote:
On Mon, Aug 27, 2012 at 09:55:57AM -0400, Luke T.Shumaker wrote:
At Sat, 25 Aug 2012 14:43:32 -0400, Dave Reisner wrote:
On Sat, Aug 25, 2012 at 01:36:41PM -0400, Luke Shumaker wrote:
A while ago I started working on a derivative of makepkg to support having 'git://...' type urls in the sources=() array. When preparing to file this patch, I did a `git rebase`, and noticed that Allan McRae began working on a similar feature. Our implementations are in many ways similar. Hopefully mine will be useful.
An interesting approach. As you've noticed, I think we're fairly committed to the implementation that Allan has provided. If you have specific concerns, maybe we can work to fix those.
* Allan's URL parsing has some issues, see point 2 below. This is an easy fix.
Great, let's fix those.
* Composability. It's not really a "problem", but it seems to me that it's better to create a new general-purpose tool, instead of shoving a bunch of "special case" behavior into makepkg. Allan's implementation could even fairly easily be pulled out of makepkg.
-1 for moving it out of makepkg. Decoupling it means we would need to do a lot of sharing so that the tool is sure to understand makepkg's absurdly complicated environment. Not extensible, and there's a lot of code that would be "duplicated".
And really, does it make sense to have these URL schemes hardcoded into makepkg? Why have DLAGENTS at all, if we're going to hard-code schemes into makepkg? Again, I'm not against Allan's implementation, but I am for moving it out of makepkg.
Let's see what's already established and available for http downloading:
- curl
- wget
- aria2
- dog
- fetch
- bash tcp sockets

(I'm sure there's more)
Show me a popular reimplementation of the reference git/svn/bzr/cvs/darcs/hg client, and then we'll talk about a way to make it possible for the user to configure that in place of our "hardcoded default".
I can't speak for the others, but there are multiple implementations of git (though I am only personally familiar with the original).
The nerd in me wants to point out that there is no backing RFC for any of the VCS protocols so it doesn't belong in DLAGENTS, but we've violated that puritanism already by adding a DLAGENT for scp.
Regardless, we *are* mapping the VCS tools to URLs. There are plenty of URL schemes in common use that aren't registered with IANA (per RFC 4395/BCP 35).

Another effect of Allan's current implementation is that adding another scheme requires modification to makepkg, even if it is just to pass it to DLAGENTS. I'd argue that even if we have the special cases, the correct behavior would be to have download_file (DLAGENTS) as a fallback, instead of only tripping on 'ftp|http|https|rsync|scp'.

Further, a use case I see is a user putting some funky stuff in DLAGENTS to either use an offline cache, or do some network stuff to deal with a contrived network setup.

Also, the nerd in me wants to point out that the makepkg source always refers to it as a 'protocol' (or 'proto') when RFC 3986 says that it should be 'scheme'.
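A sketch of the glob matching the DLAGENTS change would allow (the 'git+*' key and vcsget path are from Luke's series; the matching loop here is illustrative, not makepkg's actual code):

```shell
# Match a URL scheme against DLAGENTS keys.  Leaving $key unquoted in
# [[ ... = $key ]] makes it a glob pattern, so the 'git+*' entry
# matches git+https, git+ssh, etc.
DLAGENTS=('ftp::/usr/bin/curl -fC - -o %o %u'
          'git+*::/usr/bin/vcsget %u %o')
scheme='git+https'
agent=''
for entry in "${DLAGENTS[@]}"; do
    key=${entry%%::*}
    if [[ $scheme = $key ]]; then
        agent=${entry#*::}
    fi
done
echo "$agent"
```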
My implementation makes minimal changes to makepkg itself (only adding glob expansion to DLAGENTS, allowing for things like "git+*::"). Instead I added a `vcsget` tool which generates a tarball from the VCS repo, in a very similar manner to the way Allan's implementation does so within makepkg.
I'm not thrilled with the shell I saw in that patch -- there's extensive use of external commands when simple bash features would have sufficed.
I assume you're speaking of my

1. use of cut in get_field
2. use of sed to parse URLs
3. use of sed to parse URL authorities
4. use of readlink to establish the tarball name
5. use of sed to create a basedir name from the tarball name
6. use of '[' with if statements
Probably a good laundry list to start with. I'm happy to give a proper review of vcsget if you'd like, but I'll respond here in general terms.
That would be great!
My defense of each:
1. `cut` in get_field ---------------------
This was simply the simplest solution that was obviously correct. Another solution would have been to manipulate IFS, which has all kinds of funny side effects and fringe behaviors.
That was a weak argument, and as a coder, I would be thrilled to find out if there is a better solution.
Cut is useless. IFS manipulation with read is the preferred solution here. I'm curious what you think are the "fringe behaviors" in altering IFS (I suspect you're merely misusing it -- it's a common point of confusion). I've yet to come across any that make me want to use cut. I'd sooner offload a larger portion of work to awk than use cut, for several reasons, anyways.
I'll give you that awk might be better--but I was never comfortable with awk. The problem with IFS that stopped me from using it is that it always* will collapse multiple whitespace characters, so it is impossible to have an empty column in the middle if using a ' ' delimiter. For some reason I was thinking that space was the only safe (printable ASCII) character to use--I now realize that both '<' and '>' would work fine.

* perhaps there is a shopt option, but I am unaware of it
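The collapsing behavior in question, with toy data: whitespace IFS merges runs of spaces (the empty middle column vanishes), while a non-whitespace delimiter like '>' preserves it.

```shell
# With IFS=' ', the run of spaces is treated as one separator, so the
# empty middle column is lost.  With IFS='>', every '>' is a field
# boundary and the empty column survives.
IFS=' ' read -r a b c <<<'one  three'
ws_middle=$b
IFS='>' read -r a b c <<<'one>>three'
gt_middle=$b
echo "whitespace middle: '$ws_middle'  '>' middle: '$gt_middle'"
```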
2. `sed` to parse URLs ----------------------
I wanted full URL parsing, for a few cases when the translation is less than straightforward. For example, including authentication info in svn URLs.
Further, I wanted to make sure that *any* URL parsing done would be robust. Given that the regex used was taken straight from the relevant RFC, I knew I could count on it to work, even in fringe cases. For example a "fragment" *should* be able to contain a '#', but Allan's implementation discards anything before the last '#' in the fragment. This is a problem for (at least) git, as tags and branches may contain the symbol.
bash has regex matching which supports ERE with capture groups -- this could easily be adapted to split the URL all at once, simply saving the relevant pieces of BASH_REMATCH after the fact. See the =~ operator in bash(1).
I was unaware of that feature. It would save me a lot of effort.
3. `sed` to parse URL authorities ---------------------------------
I believe that a robust (not having the same problems as fragments for URLs) implementation in bash would be non-trivial.
See above.
4. `readlink` to establish the tarball name -------------------------------------------
I used `readlink -m` to turn the tarball name into an absolute path. This was not an absolute must, but it allowed me to avoid worrying about keeping track of the current directory. It would be fairly easy to remove this by using pushd/popd instead of cd. If you would like this changed, I can do that.
This is _not_ portable. We've refrained from using readlink at all in our codebase except in the untainted form. The only thing you can portably rely on readlink to do is to wrap the readlink syscall. Better yet, figure out why a potential symlink isn't good enough and fix that. You shouldn't need to know the actual target that a symlink points to, since 90% of the time a tool will dereference it for you.
My use of `readlink` had nothing to do with symlinks; it is simply to turn a relative path into an absolute path. An 'unintended' use of readlink, but the UNIX authors would say that using `cat` to display a single file is one too. I've seen this in several codebases, and it is the most reliable method of doing this in shell, assuming GNU coreutils. As I said, I could have avoided this by being more careful with `cd` and doing a little more directory hopping.
5. `sed` to create a basedir name from the tarball name -------------------------------------------------------
I'll admit, this was me getting a little lazy. A pure bash implementation is:
base=${tarball##*/}
base=${base%%.tar.*}
base=${base%.tar}
And with an extglob, you can more precisely cut that down to:
base=${tarball##*/} base=${base%.tar?(.*)}
The unfortunately named foo.tar.bar.tar.gz wouldn't be "broken" in this way.
I was unaware of extglob offering that. Dang these shopt features I have to turn on! However, given that input, I would have thought that 'foo' is the correct output, so that's what I wrote.
6. `[` with if statements -------------------------
Can I call stylistic choice on this one? It is trivial to replace this with bash's '[['.
Apologies for the incoming rant.
No, this isn't merely an issue of "style". [[ is far superior to [ in several ways. It's a bash keyword, making it far faster than the [ builtin, it supports pattern matching with globs and regex, and it has defined behavior when you pass more than 3 arguments. Quoting semantics are far simpler (and safer), too. Consider a contrived example which, actually, you see far too often:
[ -n $foo ]
This is always true, regardless of whether or not 'foo' is defined, since '[' is a builtin, which means it behaves no differently than any other standard command. In reality, it's the same as:
[ -n ]
This is _true_, because you've really only passed a single argument to [. In the single argument case, the existence test is performed. Since '-n' always has a length, it's true.
By comparison, the preferred bash construction:
[[ -n $foo ]]
This always does the right thing because the lexer passes over it and sees the variable before it's expanded. It understands that even though foo might not be defined, you've passed an argument, and the test correctly fails when that argument is the empty string. The only time you need to quote arguments to tests is when dealing with equations:
[[ $foo = $bar ]]
is not the same as
[[ $foo = "$bar" ]]
Generally, you want the latter since the RHS (right hand side) is subject to glob expansion. Quoting it ensures that any glob characters are not expanded. The LHS is _never_ subject to glob expansion, but will require quoting if it contains whitespace.
OK, 'style' was the wrong word. I am in the habit of using '[' from having to code for Almquist-family shells, including dash, busybox, and BSD's sh. I realize the performance advantage of '[[' for Bash. For every way I used it (and I know this isn't always the case), the two are identical in result.

As for quoting rules--I always disliked the 'simpler' rules for '[['; they seem like they were trying to compensate for bad programmers; (in the first example) I'd have to quote the variable if I were to do a similar task with any other command.

----

Well, I have been taught a few features of Bash, and been made to feel like a noob. Thank you, this is learning!

Happy hacking,
~ Luke Shumaker
On Mon, Aug 27, 2012 at 08:17:22PM -0400, Luke T.Shumaker wrote:

Ermagerd, snerping...
I'll give you that awk might be better--but I was never comfortable with awk. The problem with IFS that stopped me from using it is that it always* will collapse multiple whitespace characters, so it is impossible to have an empty column in the middle if using a ' ' delimiter. For some reason I was thinking that space was the only safe (printable ASCII) character to use--I now realize that both '<' and '>' would work fine.
* perhaps there is a shopt option, but I am unaware of it
Yes, this is correct about collapsing empty columns. You can preserve those empty columns with some hackery, but you cannot split on them. And yes, this is one case where I would move to awk. It's a lovely little language that I find to be extremely flexible, and oh gosh readable -- especially when matched up against the likes of sed.
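For example, with a toy record, awk's explicit field separator keeps the empty column that whitespace IFS would drop:

```shell
# awk -F: makes every ':' a field boundary, so the empty middle
# column is a real (empty) field rather than being collapsed away.
middle=$(printf 'one::three\n' | awk -F: '{print $2}')
last=$(printf 'one::three\n' | awk -F: '{print $3}')
echo "middle='$middle' last='$last'"
```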
My use of `readlink` had nothing to do with symlinks; it is simply to turn a relative path into an absolute path. An 'unintended' use of readlink, but the UNIX authors would say that using `cat` to display a single file is one too.
And this is just a result of me not really reading vcsget too closely when I crafted this reply. If you need an absolute path, just look at the first character. If it's a slash, you're done. If it's not, then prepend $PWD/.
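Dave's suggestion as a tiny helper (the function name is mine, not from any patch):

```shell
# Turn a path into an absolute path in pure bash: no filesystem
# access, no readlink, no GNUisms.  Paths already starting with '/'
# pass through unchanged; everything else gets $PWD prepended.
abspath() {
    if [[ $1 = /* ]]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$PWD/$1"
    fi
}
rel=$(cd /tmp && abspath foo.tar.gz)
abs=$(abspath /srv/x.tar)
echo "$rel"
echo "$abs"
```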
I've seen this in several codebases, and it is the most reliable method of doing this in shell, assuming GNU coreutils.
Yup, but we don't assume that.
As for quoting rules--I always disliked the 'simpler' rules for '[['; they seem like they were trying to compensate for bad programmers; (in the first example) I'd have to quote the variable if I were to do a similar task with any other command.
Sure... so many bugs in shell boil down to quoting problems, so I'm actually happy to have one respite here. I've written a _lot_ of shell over the past 2-3 years.
----
Well, I have been taught a few features of Bash, and been made to feel like a noob. Thank you, this is learning!
It's not clear that anyone ever _really_ knows shell. Over time, you just make fewer grievous errors.
On 28/08/12 10:17, Luke T.Shumaker wrote:
Another effect of Allan's current implementation is that adding another scheme requires modification to makepkg, even if it is just to pass it to DLAGENTS. I'd argue that even if we have the special cases, the correct behavior would be to have download_file (DLAGENTS) as a fallback, instead of only tripping on 'ftp|http|https|rsync|scp'.
Fix on its way.
Further, a use case I see is a user putting some funky stuff in DLAGENTS to either use an offline cache, or do some network stuff to deal with a contrived network setup.
This is the only thing I think needs to be dealt with... maybe... Can you give a concrete example?

Thanks,
Allan
On 27/08/12 23:55, Luke T.Shumaker wrote:
For example a "fragment" *should* be able to contain a '#', but Allan's implementation discards anything before the last '#' in the fragment. This is a problem for (at least) git, as tags and branches may contain the symbol.
Fix on its way. Allan
On 26/08/12 03:36, Luke Shumaker wrote:
An advantage of my design is that it does allow for integrity checks of VCS packages, rather than inserting 'SKIP' into the md5sums array. This is very important to the derivative distribution Parabola. (However, the 'SKIP' option is still valuable for URLs that track a branch)
Can you explain why this is important? That would help me understand what you are trying to achieve that can not be done with the current system.

The only reason I can see to create a tarball is to distribute the source on its own. Using "makepkg --allsource" creates a full source tarball including the VCS sources. If you are worried about integrity of those VCS sources in the source tarball, adding a checksum to the PKGBUILD does nothing as the PKGBUILD can be edited too. You are best to use "makepkg --allsource" and PGP sign the resulting tarball. But perhaps I entirely missed the issue...

A comment that I need to make is about the need for a separate tool to download the VCS sources. We used to have a script called "versionpkg" that dealt with VCS packages. That got merged into makepkg, and my recent work was to fully integrate VCS packaging into makepkg. So going back to using a separate script to deal with VCS sources is really a step or two backwards.

Allan
At Mon, 27 Aug 2012 14:12:21 +1000, Allan McRae wrote:
On 26/08/12 03:36, Luke Shumaker wrote:
An advantage of my design is that it does allow for integrity checks of VCS packages, rather than inserting 'SKIP' into the md5sums array. This is very important to the derivative distribution Parabola. (However, the 'SKIP' option is still valuable for URLs that track a branch)
Can you explain why this is important? That would help me understand what you are trying to achieve that can not be done with the current system.
The only reason I can see to create a tarball is to distribute the source on its own. Using "makepkg --allsource" creates a full source tarball including the VCS sources. If you are worried about integrity of those VCS sources in the source tarball, adding a checksum to the PKGBUILD does nothing as the PKGBUILD can be edited too. You are best to use "makepkg --allsource" and PGP sign the resulting tarball.
Some of the Parabola devs have a weird fixation with checksums. However, after discussing it with them, your implementation addresses all of their major concerns. My only concern--and this is a minor one--is that if a user uses the fragment to check out a specific commit, with (for example) svn, and that commit changed upstream (which requires wizardry, but does happen), then you have no way to know at download time.
But perhaps I entirely missed the issue...
A comment that I need to make is about the need for a separate tool to download the VCS sources. We used to have a script called "versionpkg" that dealt with VCS packages. That got merged into makepkg, and my recent work was to fully integrate VCS packaging into makepkg. So going back to using a separate script to deal with VCS sources is really a step or two backwards.
The difference is that versionpkg was invoked directly by the user, distracting them from makepkg. My `vcsget` tool is simply a downloader agent. The idea was to reinforce makepkg's paradigm: extract the scheme from the URL, look for an entry in DLAGENTS, use that. The way you've done it, it checks if the scheme is in a list, then checks DLAGENTS if it is, or does a number of special things based on what scheme it is.

Given its simplicity, if we're shoving specialized logic for each scheme into makepkg, why have DLAGENTS at all? Why not just move that collection of one-liners into makepkg? If that's the direction you want to go, that's fine, but I'd wanted to stick to the current paradigm of using DLAGENTS, and having the URLs be basically opaque to makepkg.

Thanks for your reply!

Happy hacking,
~ Luke Shumaker
participants (4)
- Allan McRae
- Dave Reisner
- Luke Shumaker
- Luke T.Shumaker