[arch-projects] [dbscripts] [PATCH 0/8] Backports from Parabola
From: Luke Shumaker
On 03/13/2018 09:51 PM, Luke Shumaker wrote:
From: Luke Shumaker
Other than pure quoting, this involved:
 - swapping */@ for array access in a few places
 - fiddling with printf in a pipeline
 - replacing `$(echo ${array[@]})` with `${array[*]}`
 - replacing `echo $(...)` with `...`
When searching for these things, I used the command:
grep -Prn --exclude-dir=.git '(?
and ignored a bunch of false positives.
I don't really see the need to quote every variable just because it is a variable, at least in cases where the variable is fairly statically defined... I'd like to see path components that could have spaces quoted, and arrays I guess because semantically that's how you iterate over an array. Also you introduced some bugs, in cases where we actually want whitespace splitting...
 backup_package_variables() {
-    for var in ${splitpkg_overrides[@]}; do
+    for var in "${splitpkg_overrides[@]}"; do
         indirect="${var}_backup"
-        eval "${indirect}=(\${$var[@]})"
+        eval "${indirect}=(\"\${$var[@]}\")"
     done
 }

 restore_package_variables() {
-    for var in ${splitpkg_overrides[@]}; do
+    for var in "${splitpkg_overrides[@]}"; do
         indirect="${var}_backup"
         if [ -n "${!indirect}" ]; then
-            eval "${var}=(\${$indirect[@]})"
+            eval "${var}=(\"\${$indirect[@]}\")"
         else
-            unset ${var}
+            unset "${var}"
         fi
     done
This is too much escaping and metaprogramming, there are better ways of backing up a variable to begin with. :/ We do it in makepkg, I will have us do it here as well. Advantage: using declare -p means the shell auto-escapes things where needed.
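As an illustration of the `declare -p` idea, here is a minimal sketch. The variable name `depends` and its contents are made-up sample data, and this is not the makepkg or dbscripts implementation; it only shows that the shell produces a correctly quoted, re-readable definition on its own.

```bash
#!/bin/bash
# Illustrative only: `depends` and its values are made-up sample data.
depends=('glibc' 'name with spaces' '')

# `declare -p` prints a definition that the shell can re-read,
# with every element quoted as needed.
saved_depends="$(declare -p depends)"

# Clobber the variable, as a split-package override might.
depends=('something-else')

# Restore. Done at top level on purpose: inside a function, a plain
# `declare` would create a local variable instead of the global one.
eval "$saved_depends"

printf 'restored %d elements\n' "${#depends[@]}"
```

The empty third element and the element containing spaces both survive the round trip, without any hand-written escaping in the `eval`.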
- if ! ${CLEANUP_DRYRUN}; then + if ! "${CLEANUP_DRYRUN}"; then
This is a variable being run as a command, so if there were to be spaces in it we'd end up trying to run a command with a space in it. Arguably we should not be running this as a command (even though they are set to true/false which is a shell builtin blah blah blah) but since we are it would be illogical to indicate that if there are spaces they should be interpreted as string literals in an executable filename.
-${CLEANUP_DRYRUN} && warning 'dry run mode is active' +"${CLEANUP_DRYRUN}" && warning 'dry run mode is active'
Same.
- if ! ${CLEANUP_DRYRUN}; then + if ! "${CLEANUP_DRYRUN}"; then
Same.
# Create a readable file for each repo with the following format
# <pkgbase> <pkgver>-<pkgrel> <arch> <license>[ <license>]
When we consume this file...
 while read line; do
-    pkginfo=(${line})
+    pkginfo=("${line}")
That's completely wrong, just look at the next five lines.
     pkgbase=${pkginfo[0]}
     pkgver=${pkginfo[1]}
     pkgarch=${pkginfo[2]}
-    pkglicense=(${pkginfo[@]:3})
+    pkglicense=("${pkginfo[@]:3}")
How will we extract elements of this array, if your quoting squashes everything down into one array element? ${pkginfo[0]} will have too much, and the other elements simply won't exist at all.

We go from having:
declare -a pkginfo=([0]="foo" [1]="1.0-1" [2]="any" [3]="GPL")
to having:
declare -a pkginfo=([0]="foo 1.0-1 any GPL")
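The difference can be reproduced with a small sketch. The sample line is made up, and the `read -r -a` variant is only one possible alternative, not something proposed in the patch.

```bash
#!/bin/bash
# A made-up sample line with four whitespace-separated fields.
line='foo 1.0-1 any GPL'

# Unquoted: the shell splits on whitespace, one element per field.
# (The result is also subject to glob expansion.)
split=(${line})

# Quoted: the whole line becomes a single element.
quoted=("${line}")

# Alternative sketch: `read -r -a` splits on IFS without globbing.
read -r -a fields <<< "${line}"

printf 'split=%d quoted=%d fields=%d\n' "${#split[@]}" "${#quoted[@]}" "${#fields[@]}"
```

With the quoted form, `${quoted[0]}` holds the entire line and indexes 1 to 3 do not exist, which is exactly the breakage described above.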
- if ! ([[ -z ${ALLOWED_LICENSES[@]} ]] || chk_license ${pkglicense[@]} || grep -Fqx "${pkgbase}" "${dirname}/sourceballs.force"); then + if ! ([[ -z ${ALLOWED_LICENSES[*]} ]] || chk_license "${pkglicense[@]}" || grep -Fqx "${pkgbase}" "${dirname}/sourceballs.force"); then
What's the check here anyways? This considers ALLOWED_LICENSES=('') to be non-empty, so we might as well check the length of ${#ALLOWED_LICENSES[@]} which is a clearer read.
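A short sketch of how the string test and the length test treat arrays; the array names are hypothetical. The two checks only disagree for an array holding a single empty string.

```bash
#!/bin/bash
# Made-up sample arrays.
none=()
blank=('')

# String test: both arrays expand to an empty string, so both look "empty".
[[ -z ${none[*]} ]]  && none_str=empty  || none_str=nonempty
[[ -z ${blank[*]} ]] && blank_str=empty || blank_str=nonempty

# Length test: counts elements, so ('') has length 1.
(( ${#none[@]} == 0 ))  && none_len=empty  || none_len=nonempty
(( ${#blank[@]} == 0 )) && blank_len=empty || blank_len=nonempty

printf 'none:  string=%s length=%s\n' "$none_str" "$none_len"
printf 'blank: string=%s length=%s\n' "$blank_str" "$blank_len"
```

Whichever behaviour is wanted, the length test states the intent ("are there any elements?") directly.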
- ${SOURCE_CLEANUP_DRYRUN} && warning 'dry run mode is active' - for old_pkg in ${old_pkgs[@]}; do + "${SOURCE_CLEANUP_DRYRUN}" && warning 'dry run mode is active' + for old_pkg in "${old_pkgs[@]}"; do msg2 "${old_pkg}" - if ! ${SOURCE_CLEANUP_DRYRUN}; then + if ! "${SOURCE_CLEANUP_DRYRUN}"; then mv_acl "$FTP_BASE/${SRCPOOL}/${old_pkg}" "${SOURCE_CLEANUP_DESTDIR}/${old_pkg}" touch "${SOURCE_CLEANUP_DESTDIR}/${old_pkg}" fi @@ -147,9 +147,9 @@ done
if (( ${#old_pkgs[@]} >= 1 )); then msg "Removing old source packages from the cleanup directory..." - for old_pkg in ${old_pkgs[@]}; do + for old_pkg in "${old_pkgs[@]}"; do msg2 "${old_pkg}" - ${SOURCE_CLEANUP_DRYRUN} || rm -f "${SOURCE_CLEANUP_DESTDIR}/${old_pkg}" + "${SOURCE_CLEANUP_DRYRUN}" || rm -f "${SOURCE_CLEANUP_DESTDIR}/${old_pkg}" done
More commands-as-variables where whitespace if it existed would actually be significant and wanted.
- if [[ ! -z "$(echo ${@%\.*} | sed "s/ /\n/g" | sort | uniq -D)" ]]; then + if [[ ! -z "$(printf '%s\n' "${@%\.*}" | sort | uniq -D)" ]]; then
Thanks for noticing this! printf is a lot nicer. I'm also wondering if it makes more sense to use a single awk 'a[$0]++{exit 1}' rather than chaining sort/uniq and reading its length in a bash test... the problem here was always that uniq -D returns successfully even if it prints nothing. With awk we can actually have an exit code stating whether duplicates were found.
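A sketch of the awk variant, wrapped in a hypothetical helper named `all_unique` (the name and the sample arguments are assumptions for illustration):

```bash
#!/bin/bash
# Hypothetical helper: succeeds only if all arguments are distinct.
all_unique() {
	# a[$0]++ is 0 (false) the first time a line is seen and non-zero
	# afterwards, so awk exits 1 on the first repeated line.
	printf '%s\n' "$@" | awk 'a[$0]++ { exit 1 }'
}

if all_unique foo bar baz; then r1='unique'; else r1='duplicates'; fi
if all_unique foo bar foo; then r2='unique'; else r2='duplicates'; fi
printf '%s %s\n' "$r1" "$r2"
```

The exit status of the pipeline carries the answer, so no command substitution or string-length test is needed.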
- local svnnames=($(. "${WORKDIR}/pkgbuilds/${repo}-${_pkgarch}/${_pkgbase}"; echo ${pkgname[@]})) - for svnname in ${svnnames[@]}; do - echo "${svnname}" >> "${repo}/${_pkgarch}/${_pkgbase}/svn" - done + local svnnames=($(. "${WORKDIR}/pkgbuilds/${repo}-${_pkgarch}/${_pkgbase}"; echo "${pkgname[@]}")) + printf '%s\n' "${svnnames[@]}" >> "${repo}/${_pkgarch}/${_pkgbase}/svn"
Again this does actually look a lot nicer, thanks for spotting this.
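For reference, a minimal sketch of the `printf` form writing one array element per line. The names and the temporary file are made up for the example.

```bash
#!/bin/bash
# Made-up names; the second contains a space to show that elements stay intact.
names=('pkg-a' 'pkg b' 'pkg-c')
out="$(mktemp)"

# printf reuses its format for every argument, so one call writes one
# line per array element; no explicit loop is needed.
printf '%s\n' "${names[@]}" >> "$out"

line_count=$(( $(wc -l < "$out") ))
second_line="$(sed -n '2p' "$out")"
rm -f "$out"
printf 'lines=%d second=%s\n' "$line_count" "$second_line"
```

One caveat: with an empty array, `printf '%s\n'` still prints a single empty line, whereas the loop it replaces would write nothing.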
- arch_pkgs=($(getpkgfiles "${STAGING}/${repo}/"*-${pkgarch}${PKGEXTS} 2>/dev/null)) - for pkg in ${arch_pkgs[@]} ${any_pkgs[@]}; do + arch_pkgs=($(getpkgfiles "${STAGING}/${repo}/"*-"${pkgarch}"${PKGEXTS} 2>/dev/null)) + for pkg in "${arch_pkgs[@]}" "${any_pkgs[@]}"; do
Dropping in and out of quotes here a number of times sort of hammers home the point to me that maybe we really don't need to do this. Configurable directories need to be quoted, yes, but then dropping out to expand a wildcard, failing to quote the - for good measure, and quoting pkgarch only to resort to expanding PKGEXTS as a glob pattern... It is only possible anyways for $STAGING to have spaces.
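To make the quoting trade-off concrete, here is a sketch with a made-up staging directory whose name contains a space. The extension pattern is written inline here, which is an assumption for the example; in the patch it comes from the PKGEXTS variable, which must stay unquoted to act as a glob.

```bash
#!/bin/bash
# Made-up layout: a staging directory whose name contains a space.
staging="$(mktemp -d)/staging dir"
mkdir -p "$staging"
touch "$staging/foo-1.0-1-any.pkg.tar.xz" "$staging/bar-2.0-1-x86_64.pkg.tar.xz"
pkgarch='any'

# Quote the variable parts; leave only the wildcards unquoted so they still glob.
matches=("$staging"/*-"$pkgarch".pkg.tar.*)

printf '%d match: %s\n' "${#matches[@]}" "${matches[0]##*/}"
rm -rf "${staging%/*}"
```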
- ${found_source} || die "%s not found in [%s]" "$pkgbase" "$TESTING_REPO" + "${found_source}" || die "%s not found in [%s]" "$pkgbase" "$TESTING_REPO"
command as variable
- ${found_target} || die "%s not found in any of these repos: %s" "$pkgbase" "${STABLE_REPOS[*]}" + "${found_target}" || die "%s not found in any of these repos: %s" "$pkgbase" "${STABLE_REPOS[*]}"
This too. -- Eli Schwartz Bug Wrangler and Trusted User
On Wed, 14 Mar 2018 00:11:05 -0400, Eli Schwartz via arch-projects wrote:
 while read line; do
-    pkginfo=(${line})
+    pkginfo=("${line}")
That's completely wrong, just look at the next five lines.
     pkgbase=${pkginfo[0]}
     pkgver=${pkginfo[1]}
     pkgarch=${pkginfo[2]}
-    pkglicense=(${pkginfo[@]:3})
+    pkglicense=("${pkginfo[@]:3}")
You're absolutely right. I realized I screwed up right after I sent it.
I don't really see the need to quote every variable just because it is a variable, at least in cases where the variable is fairly statically defined... I'd like to see path components that could have spaces quoted, and arrays I guess because semantically that's how you iterate over an array. Also you introduced some bugs, in cases where we actually want whitespace splitting...
Part of it is to have a common style. Trying to rectify two codebases that diverged 7 years ago is rough. When trying to come up with clean diffs, having to guess "did the other one quote this variable?" makes it harder. If you can say "always quote (except for the LHS of [[ ]])" or something, that makes it a bit easier.
 backup_package_variables() {
-    for var in ${splitpkg_overrides[@]}; do
+    for var in "${splitpkg_overrides[@]}"; do
         indirect="${var}_backup"
-        eval "${indirect}=(\${$var[@]})"
+        eval "${indirect}=(\"\${$var[@]}\")"
     done
 }

 restore_package_variables() {
-    for var in ${splitpkg_overrides[@]}; do
+    for var in "${splitpkg_overrides[@]}"; do
         indirect="${var}_backup"
         if [ -n "${!indirect}" ]; then
-            eval "${var}=(\${$indirect[@]})"
+            eval "${var}=(\"\${$indirect[@]}\")"
         else
-            unset ${var}
+            unset "${var}"
         fi
     done
This is too much escaping and metaprogramming, there are better ways of backing up a variable to begin with. :/
We do it in makepkg, I will have us do it here as well. Advantage: using declare -p means the shell auto-escapes things where needed.
I haven't been keeping my thumb on makepkg git, but the eval lines as I wrote them exactly match the eval lines in makepkg 5.0.2's version of {backup,restore}_package_variables (makepkg's versions don't quote the for loops, or the unset command).
- if ! ${CLEANUP_DRYRUN}; then + if ! "${CLEANUP_DRYRUN}"; then
This is a variable being run as a command, so if there were to be spaces in it we'd end up trying to run a command with a space in it. Arguably we should not be running this as a command (even though they are set to true/false which is a shell builtin blah blah blah) but since we are it would be illogical to indicate that if there are spaces they should be interpreted as string literals in an executable filename.
For the true/false idiom, quoting it is just a style rule. I figure accepting the true/false idiom doesn't imply allowing the boolean variable to have any value. Having the quotes would help catch the variable being erroneously set to a different value.
- if ! ([[ -z ${ALLOWED_LICENSES[@]} ]] || chk_license ${pkglicense[@]} || grep -Fqx "${pkgbase}" "${dirname}/sourceballs.force"); then + if ! ([[ -z ${ALLOWED_LICENSES[*]} ]] || chk_license "${pkglicense[@]}" || grep -Fqx "${pkgbase}" "${dirname}/sourceballs.force"); then
What's the check here anyways? This considers ALLOWED_LICENSES=('') to be non-empty, so we might as well check the length of ${#ALLOWED_LICENSES[@]} which is a clearer read.
Agree.
- if [[ ! -z "$(echo ${@%\.*} | sed "s/ /\n/g" | sort | uniq -D)" ]]; then + if [[ ! -z "$(printf '%s\n' "${@%\.*}" | sort | uniq -D)" ]]; then
Thanks for noticing this! printf is a lot nicer. I'm also wondering if it makes more sense to use a single awk 'a[$0]++{exit 1}' rather than chaining sort/uniq and reading its length in a bash test... the problem here was always that uniq -D returns successfully even if it prints nothing. With awk we can actually have an exit code stating whether duplicates were found.
Yeah. Awk is probably the better solution here. Even if it's not, that `! -z` could be `-n`. This commit was mostly a dumb grep for unquoted variables. (Also, awk exiting early is safe, because Bash builtins are silent when they receive SIGPIPE.)
- arch_pkgs=($(getpkgfiles "${STAGING}/${repo}/"*-${pkgarch}${PKGEXTS} 2>/dev/null)) - for pkg in ${arch_pkgs[@]} ${any_pkgs[@]}; do + arch_pkgs=($(getpkgfiles "${STAGING}/${repo}/"*-"${pkgarch}"${PKGEXTS} 2>/dev/null)) + for pkg in "${arch_pkgs[@]}" "${any_pkgs[@]}"; do
Dropping in and out of quotes here a number of times sort of hammers home the point to me that maybe we really don't need to do this. Configurable directories need to be quoted, yes, but then dropping out to expand a wildcard, failing to quote the - for good measure, and quoting pkgarch only to resort to expanding PKGEXTS as a glob pattern...
It is only possible anyways for $STAGING to have spaces.
At some point, I'd like to have `make lint` run shellcheck over dbscripts. That's a long way off, both because of a whole bunch of changes needed in dbscripts to make it come back clean, and a few features needed in shellcheck to avoid having to drop entirely too many shellcheck directives in to the dbscripts source. Anyway, I know linters should be taken with a grain of salt, but when there's something simple like this, that you know just about any linter would complain about... why not? -- Happy hacking, ~ Luke Shumaker
On 03/14/2018 12:53 AM, Luke Shumaker wrote:
Part of it is to have a common style. Trying to rectify two codebases that diverged 7 years ago is rough. When trying to come up with clean diffs, having to guess "did the other one quote this variable?" makes it harder. If you can say "always quote (except for the LHS of [[ ]])" or something, that makes it a bit easier.
I'm not sure that "specifically for the sole sake of diffs against our fork" is a valid justification on its own for modifying a coding style.
 backup_package_variables() {
-    for var in ${splitpkg_overrides[@]}; do
+    for var in "${splitpkg_overrides[@]}"; do
         indirect="${var}_backup"
-        eval "${indirect}=(\${$var[@]})"
+        eval "${indirect}=(\"\${$var[@]}\")"
     done
 }

 restore_package_variables() {
-    for var in ${splitpkg_overrides[@]}; do
+    for var in "${splitpkg_overrides[@]}"; do
         indirect="${var}_backup"
         if [ -n "${!indirect}" ]; then
-            eval "${var}=(\${$indirect[@]})"
+            eval "${var}=(\"\${$indirect[@]}\")"
         else
-            unset ${var}
+            unset "${var}"
         fi
     done
This is too much escaping and metaprogramming, there are better ways of backing up a variable to begin with. :/
We do it in makepkg, I will have us do it here as well. Advantage: using declare -p means the shell auto-escapes things where needed.
I haven't been keeping my thumb on makepkg git, but the eval lines as I wrote them exactly match the eval lines in makepkg 5.0.2's version of {backup,restore}_package_variables (makepkg's versions don't quote the for loops, or the unset command).
Hmm, I was thinking of:

eval "$restoretrap"
eval "$restoreset"
eval "$restoreshopt"
eval "$restore_envvars"

and similar. Maybe I should fix the backups as well, but that is a slightly more complicated case there.
- if ! ${CLEANUP_DRYRUN}; then + if ! "${CLEANUP_DRYRUN}"; then
This is a variable being run as a command, so if there were to be spaces in it we'd end up trying to run a command with a space in it. Arguably we should not be running this as a command (even though they are set to true/false which is a shell builtin blah blah blah) but since we are it would be illogical to indicate that if there are spaces they should be interpreted as string literals in an executable filename.
For the true/false idiom, quoting it is just a style rule. I figure accepting the true/false idiom doesn't imply allowing the boolean variable to have any value. Having the quotes would help catch the variable being erroneously set to a different value.
So would doing a bash test.
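The three ways of evaluating such a flag can be compared side by side. This is a sketch; the bad value `echo oops` is a made-up stand-in for a flag that was set to something other than true/false.

```bash
#!/bin/bash
# A flag that was erroneously set to a different value (made-up example).
CLEANUP_DRYRUN='echo oops'

# Unquoted: word-split and run as `echo oops`, which succeeds.
if ${CLEANUP_DRYRUN} >/dev/null 2>&1; then unquoted='true'; else unquoted='false'; fi

# Quoted: bash looks for a command literally named "echo oops" and fails.
if "${CLEANUP_DRYRUN}" >/dev/null 2>&1; then quoted='true'; else quoted='false'; fi

# String test: nothing is executed; anything other than "true" is false.
if [[ ${CLEANUP_DRYRUN} == true ]]; then tested='true'; else tested='false'; fi

printf 'unquoted=%s quoted=%s tested=%s\n' "$unquoted" "$quoted" "$tested"
```

Both the quoted form and the string test refuse the bad value; only the string test does so without attempting to run anything.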
At some point, I'd like to have `make lint` run shellcheck over dbscripts. That's a long way off, both because of a whole bunch of changes needed in dbscripts to make it come back clean, and a few features needed in shellcheck to avoid having to drop entirely too many shellcheck directives in to the dbscripts source.
Anyway, I know linters should be taken with a grain of salt, but when there's something simple like this, that you know just about any linter would complain about... why not?
That would imply one of my long-term goals is being able to run a linter. If I did, this rule would be the first thing I disabled -- it is far, far too prone to both false positives and false negatives. -- Eli Schwartz Bug Wrangler and Trusted User
From: Luke Shumaker
On 03/13/2018 09:52 PM, Luke Shumaker wrote:
From: Luke Shumaker
diff --git a/test/cases/db-update.bats b/test/cases/db-update.bats index e7e4489..2e44b91 100644 --- a/test/cases/db-update.bats +++ b/test/cases/db-update.bats @@ -222,7 +222,7 @@ load ../lib/common
@test "package has to be aregular file" { local p - local target=$(mktemp -d) + local target=$(mktemp -dt) local arches=('i686' 'x86_64')
releasePackage extra 'pkg-simple-a' diff --git a/test/lib/common.bash b/test/lib/common.bash index 568a541..45e4800 100644 --- a/test/lib/common.bash +++ b/test/lib/common.bash @@ -83,7 +83,7 @@ setup() { local a PKGEXT=".pkg.tar.xz"
- TMP="$(mktemp -d)" + TMP="$(mktemp -dt)"
export DBSCRIPTS_CONFIG=${TMP}/config.local cat <<eot > "${DBSCRIPTS_CONFIG}"
These two have no TEMPLATE given anyways, so this change is extraneous. -- Eli Schwartz Bug Wrangler and Trusted User
From: Luke Shumaker
On 03/13/2018 09:52 PM, Luke Shumaker wrote:
From: Luke Shumaker
`grep -q` may exit as soon as it finds a match; this is a good optimization for when the input is a file. However, if the input is the output of another program, then that other program will receive SIGPIPE, and further writes will fail. When this happens, it might (bsdtar does) print a message about a "write error" to stderr. Which is going to confuse and alarm the user.
In one of the cases, this had already been mitigated by wrapping bsdtar in "echo "$(bsdtar ...)", as Bash builtin echo doesn't complain if it gets SIGPIPE. However, that means we're storing the entire output of bsdtar in memory, which is silly. --- db-functions | 2 +- test/lib/common.bash | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/db-functions b/db-functions index 58b753a..ee390ff 100644 --- a/db-functions +++ b/db-functions @@ -303,7 +303,7 @@ check_pkgfile() {
in_array "${pkgarch}" "${ARCHES[@]}" 'any' || return 1
- if echo "${pkgfile##*/}" | grep -q "^${pkgname}-${pkgver}-${pkgarch}"; then + if echo "${pkgfile##*/}" | grep "^${pkgname}-${pkgver}-${pkgarch}" &>/dev/null; then
But echo should be fine anyway? Regardless this could be so much more elegant.

if [[ $pkgfile = $pkgname-$pkgver-$pkgrel-$arch* ]]; then
return 0 else return 1 diff --git a/test/lib/common.bash b/test/lib/common.bash index 45e4800..ab805dd 100644 --- a/test/lib/common.bash +++ b/test/lib/common.bash @@ -215,7 +215,7 @@ checkPackageDB() {
for db in ${DBEXT} ${FILESEXT}; do [ -r "${FTP_BASE}/${repo}/os/${repoarch}/${repo}${db%.tar.*}" ] - bsdtar -xf "${FTP_BASE}/${repo}/os/${repoarch}/${repo}${db%.tar.*}" -O | grep -q "${pkgfile%${PKGEXT}}" + bsdtar -xf "${FTP_BASE}/${repo}/os/${repoarch}/${repo}${db%.tar.*}" -O | grep "${pkgfile%${PKGEXT}}" &>/dev/null done done done @@ -269,7 +269,7 @@ checkRemovedPackageDB() { for tarch in ${tarches[@]}; do if [ -r "${FTP_BASE}/${repo}/os/${tarch}/${repo}${db%.tar.*}" ]; then for pkgname in ${pkgnames[@]}; do - echo "$(bsdtar -xf "${FTP_BASE}/${repo}/os/${tarch}/${repo}${db%.tar.*}" -O)" | grep -qv ${pkgname} + bsdtar -xf "${FTP_BASE}/${repo}/os/${tarch}/${repo}${db%.tar.*}" -O | grep -v ${pkgname} &>/dev/null done fi done
-- Eli Schwartz Bug Wrangler and Trusted User
On Wed, 14 Mar 2018 00:11:12 -0400, Eli Schwartz wrote:
On 03/13/2018 09:52 PM, Luke Shumaker wrote:
From: Luke Shumaker
`grep -q` may exit as soon as it finds a match; this is a good optimization for when the input is a file. However, if the input is the output of another program, then that other program will receive SIGPIPE, and further writes will fail. When this happens, it might (bsdtar does) print a message about a "write error" to stderr. Which is going to confuse and alarm the user.
In one of the cases, this had already been mitigated by wrapping bsdtar in "echo "$(bsdtar ...)", as Bash builtin echo doesn't complain if it gets SIGPIPE. However, that means we're storing the entire output of bsdtar in memory, which is silly. --- db-functions | 2 +- test/lib/common.bash | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/db-functions b/db-functions index 58b753a..ee390ff 100644 --- a/db-functions +++ b/db-functions @@ -303,7 +303,7 @@ check_pkgfile() {
in_array "${pkgarch}" "${ARCHES[@]}" 'any' || return 1
- if echo "${pkgfile##*/}" | grep -q "^${pkgname}-${pkgver}-${pkgarch}"; then + if echo "${pkgfile##*/}" | grep "^${pkgname}-${pkgver}-${pkgarch}" &>/dev/null; then
But echo should be fine anyway?
Yeah, in this case it's for consistency with the others. It's easier to remember "don't use `grep -q` on other commands' stdout" than it is to work out when it's ok and when it isn't.
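The effect on the producer can be observed through PIPESTATUS. In this sketch `seq` is an assumed stand-in for a chatty producer such as bsdtar; the exact status of the interrupted producer depends on the platform (on Linux it is typically 141, i.e. 128 plus SIGPIPE), so no fixed output is claimed.

```bash
#!/bin/bash
# `seq` stands in for a chatty producer such as bsdtar.

# grep -q may exit at the first match, while the producer is still writing.
seq 1 1000000 2>/dev/null | grep -q '^5$'
with_q=("${PIPESTATUS[@]}")

# Without -q, grep reads to end of input, so the producer finishes normally.
seq 1 1000000 | grep '^5$' >/dev/null
without_q=("${PIPESTATUS[@]}")

printf 'grep -q:   producer=%s grep=%s\n' "${with_q[0]}" "${with_q[1]}"
printf 'grep only: producer=%s grep=%s\n' "${without_q[0]}" "${without_q[1]}"
```

In both pipelines grep itself reports a match; the difference is only in whether the producer was cut off mid-write.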
Regardless this could be so much more elegant.
if [[ $pkgfile = $pkgname-$pkgver-$pkgrel-$arch* ]]; then
You're right, that is better. -- Happy hacking, ~ Luke Shumaker
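A sketch of the `[[ ]]` pattern match, with made-up values. Two details here are assumptions taken from the grep version in the diff rather than from the one-liner above: the comparison uses the basename of the file, and `pkgver` is taken to already include the pkgrel.

```bash
#!/bin/bash
# Made-up values for illustration.
pkgfile='/srv/staging/foo-1.0-1-any.pkg.tar.xz'
pkgname='foo' pkgver='1.0-1' pkgarch='any'

# Inside [[ ]], an unquoted right-hand side of = is a glob pattern.
# Quoting the variables keeps their contents literal; only the
# trailing * acts as a wildcard. No echo, pipe, or grep is involved.
if [[ ${pkgfile##*/} = "${pkgname}-${pkgver}-${pkgarch}"* ]]; then
	result='match'
else
	result='no match'
fi
printf '%s\n' "$result"
```

Unlike the grep version, the quoted variables are not interpreted as a regular expression, so a `.` in the version only matches a literal dot.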
On Tue, 13 Mar 2018 21:52:01 -0400, Luke Shumaker wrote:
From: Luke Shumaker
`grep -q` may exit as soon as it finds a match; this is a good optimization for when the input is a file. However, if the input is the output of another program, then that other program will receive SIGPIPE, and further writes will fail. When this happens, it might (bsdtar does) print a message about a "write error" to stderr. Which is going to confuse and alarm the user.
In one of the cases, this had already been mitigated by wrapping bsdtar in "echo "$(bsdtar ...)", as Bash builtin echo doesn't complain if it gets SIGPIPE. However, that means we're storing the entire output of bsdtar in memory, which is silly. ---
- echo "$(bsdtar -xf "${FTP_BASE}/${repo}/os/${tarch}/${repo}${db%.tar.*}" -O)" | grep -qv ${pkgname} + bsdtar -xf "${FTP_BASE}/${repo}/os/${tarch}/${repo}${db%.tar.*}" -O | grep -v ${pkgname} &>/dev/null
This is broken. As the commit message said, the subshell soaks up the full output to avoid SIGPIPE. But it also has another subtle purpose: to ensure that at least one "\n" is written to grep's stdin (as echo appends "\n"). Otherwise, if the db is now empty, grep will fail because it didn't receive anything.

The correct command is

! bsdtar -xf "${FTP_BASE}/${repo}/os/${tarch}/${repo}${db%.tar.*}" -O | grep ${pkgname} &>/dev/null

-- Happy hacking, ~ Luke Shumaker
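The empty-input case can be reproduced without bsdtar. In this sketch an empty string stands in for the output of extracting an empty db, and `foo` is a made-up package name.

```bash
#!/bin/bash
# An empty string stands in for the output of bsdtar on an empty db.
empty=''
pkgname='foo'

# grep -v selects nothing from empty input, so it exits non-zero.
printf '%s' "$empty" | grep -v "$pkgname" >/dev/null; bare=$?

# echo appends a newline, giving grep -v one (empty) line to select.
echo "$empty" | grep -v "$pkgname" >/dev/null; wrapped=$?

# Negating a plain grep succeeds whenever the name is absent.
! printf '%s' "$empty" | grep "$pkgname" >/dev/null; negated=$?

printf 'bare=%d wrapped=%d negated=%d\n' "$bare" "$wrapped" "$negated"
```

The negated form also expresses the intended check more directly: it fails if any line mentions the package, whereas `grep -v` succeeds as soon as any single line does not.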
From: Luke Shumaker
On 03/13/2018 09:52 PM, Luke Shumaker wrote:
From: Luke Shumaker
TBH we don't even send out integrity check email anymore, do you? -- Eli Schwartz Bug Wrangler and Trusted User
On Wed, 14 Mar 2018 00:11:40 -0400, Eli Schwartz wrote:
On 03/13/2018 09:52 PM, Luke Shumaker wrote:
From: Luke Shumaker
TBH we don't even send out integrity check email anymore, do you?
No, but we have a different cron-job that calls devlist-mailer -- Happy hacking, ~ Luke Shumaker
On 03/13/2018 09:51 PM, Luke Shumaker wrote:
BTW, now that dbscripts is on GitHub, is that the preferred way of submitting these? Or is this mailing list still best?
It's been on github for quite some time, but I am okay with looking at things in either location. -- Eli Schwartz Bug Wrangler and Trusted User
participants (2): Eli Schwartz, Luke Shumaker