[arch-projects] [dbscripts] [PATCH 1/8] Fix quoting around variables, especially arrays.

Luke Shumaker lukeshu at lukeshu.com
Wed Mar 14 04:53:21 UTC 2018


On Wed, 14 Mar 2018 00:11:05 -0400,
Eli Schwartz via arch-projects wrote:
> >  	while read line; do
> > -		pkginfo=(${line})
> > +		pkginfo=("${line}")
> 
> That's completely wrong, just look at the next five lines.
> 
> >  		pkgbase=${pkginfo[0]}
> >  		pkgver=${pkginfo[1]}
> >  		pkgarch=${pkginfo[2]}
> > -		pkglicense=(${pkginfo[@]:3})
> > +		pkglicense=("${pkginfo[@]:3}")

You're absolutely right.  I realized I screwed up right after I sent
it.

> I don't really see the need to quote every variable just because it is a
> variable, at least in cases where the variable is fairly statically
> defined... I'd like to see path components that could have spaces
> quoted, and arrays I guess because semantically that's how you iterate
> over an array. Also you introduced some bugs, in cases where we actually
> want whitespace splitting...

Part of it is to have a common style.  Trying to rectify two codebases
that diverged 7 years ago is rough.  When trying to come up with clean
diffs, having to guess "did the other one quote this variable?" makes
it harder.  If you can say "always quote (except for the LHS of [[
]])" or something, that makes it a bit easier.

> >  backup_package_variables() {
> > -	for var in ${splitpkg_overrides[@]}; do
> > +	for var in "${splitpkg_overrides[@]}"; do
> >  		indirect="${var}_backup"
> > -		eval "${indirect}=(\${$var[@]})"
> > +		eval "${indirect}=(\"\${$var[@]}\")"
> >  	done
> >  }
> >  
> >  restore_package_variables() {
> > -	for var in ${splitpkg_overrides[@]}; do
> > +	for var in "${splitpkg_overrides[@]}"; do
> >  		indirect="${var}_backup"
> >  		if [ -n "${!indirect}" ]; then
> > -			eval "${var}=(\${$indirect[@]})"
> > +			eval "${var}=(\"\${$indirect[@]}\")"
> >  		else
> > -			unset ${var}
> > +			unset "${var}"
> >  		fi
> >  	done
> 
> This is too much escaping and metaprogramming, there are better ways of
> backing up a variable to begin with. :/
> 
> We do it in makepkg, I will have us do it here as well. Advantage: using
> declare -p means the shell auto-escapes things where needed.

I haven't been keeping my thumb on makepkg git, but the eval lines as
I wrote them exactly match the eval lines in makepkg 5.0.2's version
of {backup,restore}_package_variables (makepkg's versions don't quote
the for loops, or the unset command).

> > -	if ! ${CLEANUP_DRYRUN}; then
> > +	if ! "${CLEANUP_DRYRUN}"; then
> 
> This is a variable being run as a command, so if there were to be spaces
> in it we'd end up trying to run a command with a space in it. Arguably
> we should not be running this as a command (even though they are set to
> true/false which is a shell builtin blah blah blah) but since we are it
> would be illogical to indicate that if there are spaces they should be
> interpreted as string literals in an executable filename.

For the true/false idiom, quoting it is just a style rule.  I figure
accepting the true/false idiom doesn't imply allowing the boolean
variable to have any value.  Having the quotes would help catch the
variable being erroneously set to a different value.

> > -		if ! ([[ -z ${ALLOWED_LICENSES[@]} ]] || chk_license ${pkglicense[@]} || grep -Fqx "${pkgbase}" "${dirname}/sourceballs.force"); then
> > +		if ! ([[ -z ${ALLOWED_LICENSES[*]} ]] || chk_license "${pkglicense[@]}" || grep -Fqx "${pkgbase}" "${dirname}/sourceballs.force"); then
> 
> What's the check here anyways? This considers ALLOWED_LICENSES=('') to
> be non-empty, so we might as well check the length of
> ${#ALLOWED_LICENSES[@]} which is a clearer read.

Agree.

> > -	if [[ ! -z "$(echo ${@%\.*} | sed "s/ /\n/g" | sort | uniq -D)" ]]; then
> > +	if [[ ! -z "$(printf '%s\n' "${@%\.*}" | sort | uniq -D)" ]]; then
> 
> Thanks for noticing this! printf is a lot nicer. I'm also wondering if
> it makes more sense to use a single awk 'a[$0]++{exit 1}' rather than
> chaining sort/uniq and reading its length in a bash test... the problem
> here was always that uniq -D returns successfully even if it prints
> nothing. With awk we can actually have an exit code stating whether
> duplicates were found.

Yeah.  Awk is probably the better solution here.  Even if it's not,
that !-z could be -n.  This commit was mostly a dumb grep for unquoted
variables (also, awk exiting early is safe, because Bash builtins are
silent when the recieve SIGPIPE).

> > -		arch_pkgs=($(getpkgfiles "${STAGING}/${repo}/"*-${pkgarch}${PKGEXTS} 2>/dev/null))
> > -		for pkg in ${arch_pkgs[@]} ${any_pkgs[@]}; do
> > +		arch_pkgs=($(getpkgfiles "${STAGING}/${repo}/"*-"${pkgarch}"${PKGEXTS} 2>/dev/null))
> > +		for pkg in "${arch_pkgs[@]}" "${any_pkgs[@]}"; do
> 
> Dropping in and out of quotes here a number of times sort of hammers
> home the point to me that maybe we really don't need to do this.
> Configurable directories need to be quoted, yes, but then dropping out
> to expand a wildcard, failing to quote the - for good measure, and
> quoting pkgarch only to resort to expanding PKGEXTS as a glob pattern...
> 
> It is only possible anyways for $STAGING to have spaces.

At some point, I'd like to have `make lint` run shellcheck over
dbscripts.  That's a long way off, both because of a whole bunch of
changes needed in dbscripts to make it come back clean, and a few
features needed in shellcheck to avoid having to drop entirely too
many shellcheck directives in to the dbscripts source.

Anyway, I know linters should be taken with a grain of salt, but when
there's something simple like this, that you know just about any
linter would complain about... why not?

-- 
Happy hacking,
~ Luke Shumaker


More information about the arch-projects mailing list