[arch-projects] [dbscripts] [PATCH 0/3] Fix ambiguous uses of
This was sort of cobbled together and not really tested, so I'm not 100% sure it will work, but it looks okay, so I am posting it to get more eyes on it.

I think I've actually gotten this to work properly, which is yay, and to support multiple extensions, which is meh, but as Luke said we may need this if we ever decide to switch over. Which means it is probably "more proper" to do so...

As a bonus, we get to micro-optimize a few external calls away, which saves us a handful of forked processes and should be just as fast, not counting the fraction of a second gained by skipping all those forks.

Eli Schwartz (3):
  db-update: replace external find command with bash globbing
  ftpdir-cleanup,sourceballs: replace external find command with bash globbing
  Globally set $PKGEXT to a bash extended glob representing valid choices.

 config                   |  3 ++-
 cron-jobs/ftpdir-cleanup | 24 ++++++++++++++++++------
 cron-jobs/sourceballs    | 21 ++++++++++++++++-----
 db-functions             | 13 +++++++++++--
 db-update                |  9 ++++++---
 5 files changed, 53 insertions(+), 17 deletions(-)

--
2.16.1
Don't bother emitting errors. bash doesn't show globbing errors if it
cannot read a directory to try globbing there. And the former code never
aborted on errors anyway, as without `set -o pipefail` the sort command
swallowed the return code.
Signed-off-by: Eli Schwartz
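The point about the swallowed return code can be checked in isolation. This is a minimal sketch, not the dbscripts code: `false` stands in for a failing `find` (an assumption made purely for illustration), and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# `false` stands in for a failing `find`; `sort` succeeds on empty input.

set +o pipefail
status_default=0
false | sort -u || status_default=$?     # pipeline status is that of `sort`

set -o pipefail
status_pipefail=0
false | sort -u || status_pipefail=$?    # now the failing command is reported

set +o pipefail
printf 'default: %s, pipefail: %s\n' "$status_default" "$status_pipefail"
```

Without `pipefail`, an `if ! ...; then die` wrapper around such a pipeline would only ever fire if `sort` itself failed.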
This fully removes the use of find from the codebase, leads to a
micro-optimization in a couple cases, and ensures that $PKGEXT is
consistently treated as a shell glob pattern (which is important
because it is used as one).
Of the eight instances in these files:
- One was unnecessary as `cat` can natively consume all files passed to
it and no directory traversal was in use.
- Two were unnecessary as they were hardcoded to read a single file....
- Another four were only being used to strip leading directory paths,
and can be replaced by globstar and ${filepath##*/}
- The final two were checking the modification time of the files, and
can be replaced with touch(1) and [[ -nt ]]. Although this introduces
an additional temporary file, this is not such a big deal.
Signed-off-by: Eli Schwartz
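The two replacement techniques named in the list above can be sketched as follows. This is only an illustration under stated assumptions: the directory layout, file names and the `cutoff` reference file are hypothetical, and `touch -d '... days ago'` relies on GNU touch. The actual patch may use touch(1) differently.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob

# Hypothetical layout, created only for this demonstration.
root=$(mktemp -d)
mkdir -p "$root/pool/a" "$root/pool/b"
touch "$root/pool/a/foo-1-1-any.pkg.tar.xz" "$root/pool/b/bar-2-1-any.pkg.tar.xz"

# Instead of find with -printf '%f\n': glob recursively with globstar,
# then strip the leading directories with ${path##*/}.
names=()
for path in "$root"/**/*.pkg.tar.xz; do
    names+=("${path##*/}")
done
printf '%s\n' "${names[@]}"

# Instead of find with -mtime: stamp a reference file with the cutoff time
# and compare modification times with [[ -nt ]].
cutoff="$root/cutoff"
touch -d '2 days ago' "$cutoff"                          # GNU touch syntax
touch -d '5 days ago' "$root/pool/a/foo-1-1-any.pkg.tar.xz"

old=()
for path in "$root"/**/*.pkg.tar.xz; do
    if [[ $cutoff -nt $path ]]; then
        old+=("${path##*/}")                             # older than the cutoff
    fi
done
printf 'older than cutoff: %s\n' "${old[@]}"

rm -rf "$root"
```

The reference file is the "additional temporary file" the commit message mentions: `[[ -nt ]]` can only compare two files, so the cutoff has to exist on disk.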
This can be anything makepkg.conf accepts, therefore it needs to be able
to match all that. Document the fact that this has *always* been some
sort of glob, and update the two cases where this was (not!) being
evaluated by bash [[ ... ]], to use a proxy function is_globfile()
Signed-off-by: Eli Schwartz
On Thu, 15 Feb 2018 22:45:03 -0500, Eli Schwartz via arch-projects wrote:
diff --git a/cron-jobs/ftpdir-cleanup b/cron-jobs/ftpdir-cleanup
index 2f3d5aa..2d33047 100755
--- a/cron-jobs/ftpdir-cleanup
+++ b/cron-jobs/ftpdir-cleanup
...
-if [ ${#old_pkgs[@]} -ge 1 ]; then
+if (( ${#old_pkgs[@]} > 1 )); then
That should either be >= 1 or > 0.
diff --git a/cron-jobs/sourceballs b/cron-jobs/sourceballs
index 9ab4e98..5844817 100755
--- a/cron-jobs/sourceballs
+++ b/cron-jobs/sourceballs
...
-if [ ${#old_pkgs[@]} -ge 1 ]; then
+if (( ${#old_pkgs[@]} > 1 )); then

Likewise.

--
Happy hacking,
~ Luke Shumaker
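The off-by-one flagged in this review only shows up when the array holds exactly one element. A small sketch, with a hypothetical single-entry `old_pkgs` array:

```shell
#!/usr/bin/env bash
old_pkgs=(only-one.pkg.tar.xz)   # hypothetical: exactly one stale package

# Original test: true for one or more elements.
if [ ${#old_pkgs[@]} -ge 1 ]; then original=yes; else original=no; fi

# Test as posted in the patch: needs at least two elements, so it skips this case.
if (( ${#old_pkgs[@]} > 1 )); then patched=yes; else patched=no; fi

# Either suggested spelling (>= 1 or > 0) restores the original behaviour.
if (( ${#old_pkgs[@]} >= 1 )); then fixed=yes; else fixed=no; fi

printf 'original=%s patched=%s fixed=%s\n' "$original" "$patched" "$fixed"
```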
On 02/17/2018 02:29 PM, Luke Shumaker wrote:
On Thu, 15 Feb 2018 22:45:03 -0500, Eli Schwartz via arch-projects wrote:
diff --git a/cron-jobs/ftpdir-cleanup b/cron-jobs/ftpdir-cleanup
index 2f3d5aa..2d33047 100755
--- a/cron-jobs/ftpdir-cleanup
+++ b/cron-jobs/ftpdir-cleanup
...
-if [ ${#old_pkgs[@]} -ge 1 ]; then
+if (( ${#old_pkgs[@]} > 1 )); then
That should either be >= 1 or > 0.
diff --git a/cron-jobs/sourceballs b/cron-jobs/sourceballs
index 9ab4e98..5844817 100755
--- a/cron-jobs/sourceballs
+++ b/cron-jobs/sourceballs
...
-if [ ${#old_pkgs[@]} -ge 1 ]; then
+if (( ${#old_pkgs[@]} > 1 )); then
Likewise.
Thanks. I've also noticed this whole patchset terribly breaks the testsuite. Which is sort of expected. We are overloading PKGEXT to mean something dbscripts specific, but that then (I think?) gets imported into makepkg during the testsuite builds.

I'm going to rename it to PKGEXTS, as that serves a number of purposes: it avoids clashing with makepkg, it is more descriptive of its actual purpose, and it provides a free semantic warning to future readers of the code that this variable is meant to be more than one thing, and extra care *must* be taken when using it.

But the real issue is that we then use this variable to complete ${pkgnames[@]/%/${PKGEXT}}, which works, sort of, as it coincidentally globs okay with ? but is technically quite wrong for the above mentioned reasons. Really, once pacman 5.1 is released containing my fix that makes --packagelist finally useful for the first time ever, this will automatically be fixed, as the use of print_all_package_names will simply return full filename paths and there will be no need to glob something that matches both filenames from your patch "Update tests to check for glob regression".

The testsuite keeps looking for files that match some random dbscripts glob which has nothing to do with the hardcoded .pkg.tar.xz in a stock makepkg.conf, and the testsuite seems to be subtly buggy.

Looks like PKGEXT is also used for complicated things in checkPackageDB, with more unquoted [ ] paths as well as grep -q "${pkgfile%${PKGEXT}}", which actually *breaks* with extglob, because extended globs don't fall back on being a string literal (which is behavior I approve of). So this needs to use ${pkgname}-${pkgver}-${pkgarch}.

We can either have $pkgfile not include $PKGEXT, use more is_globfile (which is not actually available in the testsuite, as db-functions is not sourced and will break everything if you try, since it runs mktemp with the bats TMPDIR or something), or rename PKGEXT and have the testsuite use the PKGEXT from makepkg.conf, since that is what it will use anyway when running makepkg...

--
Eli Schwartz
Bug Wrangler and Trusted User
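One general property worth keeping in mind here: a pattern stored in a variable is only interpreted as an extended glob while the `extglob` option is on. The sketch below is an illustration of that behaviour, not a reproduction of the testsuite failure. The file name is hypothetical and the extension list is the one proposed later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical values for illustration.
PKGEXTS='.pkg.tar.@(gz|bz2|xz|lzo|lrz|Z)'
pkgfile='foo-1.0-1-any.pkg.tar.xz'

shopt -u extglob
without_extglob=${pkgfile%${PKGEXTS}}   # "@(...)" is matched literally, nothing is stripped

shopt -s extglob
with_extglob=${pkgfile%${PKGEXTS}}      # suffix matches the extended glob and is removed

printf 'extglob off: %s\nextglob on:  %s\n' "$without_extglob" "$with_extglob"
```

Any script that consumes such a variable without sourcing the file that enables `extglob` would therefore see different results from the same expansion.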
Comes with fancy checkmarks from travis saying that the testsuite passed:
https://github.com/archlinux/dbscripts/commits/pkgext-real-wildcards

Eli Schwartz (5):
  Use even more bashisms.
  Fix overloading PKGEXT to mean two things.
  db-update: replace external find command with bash globbing
  ftpdir-cleanup,sourceballs: replace external find command with bash globbing
  Globally set $PKGEXT to a bash extended glob representing valid choices.

 config                         |  3 ++-
 cron-jobs/devlist-mailer       |  6 +++---
 cron-jobs/ftpdir-cleanup       | 36 ++++++++++++++++++++++++------------
 cron-jobs/integrity-check      |  2 +-
 cron-jobs/sourceballs          | 31 +++++++++++++++++++++----------
 cron-jobs/update-web-db        |  6 +++---
 db-functions                   | 13 +++++++++++--
 db-move                        |  4 ++--
 db-update                      | 17 +++++++++++------
 test/cases/db-repo-add.bats    |  6 +++---
 test/cases/db-update.bats      |  3 +--
 test/cases/ftpdir-cleanup.bats |  6 +++---
 test/lib/common.bash           |  6 ++++--
 13 files changed, 89 insertions(+), 50 deletions(-)

--
2.16.2
Catch some cases that were missed in the previous run.
Signed-off-by: Eli Schwartz
PKGEXT is a makepkg variable referring to a fixed filename suffix, but
we were also using it to mean a bash glob referring to candidate
filenames. This is wrong, so rename it to PKGEXTS which is more
descriptive of its purpose.
Exclude the testsuite from this change, as the testsuite actually uses
PKGEXT for its intended purpose. Fix the testsuite to consistently use
PKGEXT, as it hardcoded the file extension in several cases, and extract
its value from the makepkg.conf we ship.
Signed-off-by: Eli Schwartz
Don't bother emitting errors. bash doesn't show globbing errors if it
cannot read a directory to try globbing there. And the former code never
aborted on errors anyway, as without `set -o pipefail` the sort command
swallowed the return code.
Signed-off-by: Eli Schwartz
This fully removes the use of find from the codebase, leads to a
micro-optimization in a couple cases, and ensures that $PKGEXT is
consistently treated as a shell glob pattern (which is important
because it is used as one).
Of the eight instances in these files:
- One was unnecessary as `cat` can natively consume all files passed to
it and no directory traversal was in use.
- Two were unnecessary as they were hardcoded to read a single file....
- Another four were only being used to strip leading directory paths,
and can be replaced by globstar and ${filepath##*/}
- The final two were checking the modification time of the files, and
can be replaced with touch(1) and [[ -nt ]]. Although this introduces
an additional temporary file, this is not such a big deal.
Signed-off-by: Eli Schwartz
The current glob `*.pkg.tar.?z` is both less restrictive and more
restrictive than makepkg, as it accepts any valid unicode character.
To be more exact, it's almost completely orthogonal to the one in makepkg.
makepkg only accepts .tar.gz, .tar.bz2, .tar.xz, .tar.lzo, .tar.lrz, and
.tar.Z and most of those fail to match against a two-char compression type.
dbscripts accepts .pkg.tar.💩z which incidentally is what I think of
cherry-picking xz and gz as supported methods.
Since this can be anything makepkg.conf accepts, it needs to be able to
match all that, unless we decide to perform additional restrictions in
which case we should still explicitly list each allowed extension. Using
bash extended globbing allows us to do this relatively painlessly.
Document the fact that this has *always* been some sort of glob, and
update the two cases where this was (not!) being evaluated by bash
[[ ... ]], to use a not-elegant-at-all proxy function is_globfile() to
evaluate globs *before* testing if they exist.
Signed-off-by: Eli Schwartz
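To make the comparison between the two patterns concrete, the following sketch matches a handful of candidate names against both. The file names are hypothetical (`foo.pkg.tar.qz` is a deliberately bogus extension), and the extension list is the one from the config change discussed in this thread.

```shell
#!/usr/bin/env bash
shopt -s extglob   # required for @(...) to be treated as a pattern

old_glob='.pkg.tar.?z'
new_glob='.pkg.tar.@(gz|bz2|xz|lzo|lrz|Z)'

accepted_old=()
accepted_new=()
for name in foo.pkg.tar.gz foo.pkg.tar.xz foo.pkg.tar.bz2 foo.pkg.tar.lzo \
            foo.pkg.tar.lrz foo.pkg.tar.Z foo.pkg.tar.qz; do
    # The unquoted right-hand side of == is evaluated as a pattern by [[ ]].
    if [[ $name == *${old_glob} ]]; then accepted_old+=("$name"); fi
    if [[ $name == *${new_glob} ]]; then accepted_new+=("$name"); fi
done

printf 'old glob accepts: %s\n' "${accepted_old[*]}"
printf 'new glob accepts: %s\n' "${accepted_new[*]}"
```

The old pattern lets the bogus `.qz` through while rejecting four of the listed compression suffixes. The explicit list accepts exactly the six named extensions.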
On Mon, 19 Feb 2018 15:11:43 -0500, Eli Schwartz via arch-projects wrote:
--- a/db-update
+++ b/db-update
@@ -9,9 +9,14 @@ if (( $# >= 1 )); then
 fi

 # Find repos with packages to release
-if ! staging_repos=($(find "${STAGING}" -mindepth 1 -type f -name "*${PKGEXTS}" -printf '%h\n' | sort -u)); then
-	die "Could not read %s" "$STAGING"
-fi
+mapfile -t -d '' staging_repos < <(
+	for f in "${STAGING}"/**/*${PKGEXTS}; do
+		f="${f%/*}"
+		if [[ -d $f ]]; then
+			printf '%s\0' "$f"
+		fi
+	done | sort -uz
+)

 repos=()
 for staging_repo in ${staging_repos[@]##*/}; do
Isn't [[ -d ]] there redundant? If globbing gave us $dir/file, of course $dir is a directory!

Meanwhile, this dropped the `-type f` check, though I'm not sure how important that was. Shouldn't this be written as:

mapfile -t -d '' staging_repos < <(
	for f in "${STAGING}"/**/*${PKGEXTS}; do
		if [[ -f $f && ! -h $f ]]; then
			printf '%s\0' "${f%/*}"
		fi
	done | sort -uz
)

The original `find` command rejected symlinks; I don't know if that's an important property, but that's what the `&& ! -h $f` bit is for.

--
Happy hacking,
~ Luke Shumaker
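A runnable sketch of the loop suggested above, against a throwaway staging tree. The directory names, package names and the temporary `STAGING` path are assumptions for the demonstration. `mapfile -d ''` needs bash 4.4 or newer.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob extglob

# Hypothetical staging tree, created only for this demonstration.
STAGING=$(mktemp -d)
PKGEXTS='.pkg.tar.@(gz|bz2|xz|lzo|lrz|Z)'
mkdir -p "$STAGING/core" "$STAGING/extra" "$STAGING/testing"
touch "$STAGING/core/a-1-1-any.pkg.tar.xz" "$STAGING/core/b-1-1-any.pkg.tar.gz"
touch "$STAGING/extra/c-1-1-any.pkg.tar.xz"
touch "$STAGING/testing/README"            # no package here, so not collected

# Collect the unique parent directories of all matching regular files.
# NUL delimiters keep unusual path names intact through sort and mapfile.
mapfile -t -d '' staging_repos < <(
    for f in "${STAGING}"/**/*${PKGEXTS}; do
        if [[ -f $f && ! -h $f ]]; then
            printf '%s\0' "${f%/*}"
        fi
    done | sort -uz
)

printf '%s\n' "${staging_repos[@]##*/}"

rm -rf "$STAGING"
```

With `nullglob` set, a staging tree without any packages leaves the loop body unexecuted and `staging_repos` empty, which is why no explicit error path is needed at this step.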
On Mon, 19 Feb 2018 15:11:45 -0500, Eli Schwartz via arch-projects wrote:
Document the fact that this has *always* been some sort of glob, and update the two cases where this was (not!) being evaluated by bash [[ ... ]], to use a not-elegant-at-all proxy function is_globfile() to evaluate globs *before* testing if they exist.

...

@@ -378,8 +383,8 @@ check_pkgrepos() {
 	local pkgver="$(getpkgver ${pkgfile})" || return 1
 	local pkgarch="$(getpkgarch ${pkgfile})" || return 1

-	[[ -f ${FTP_BASE}/${PKGPOOL}/${pkgname}-${pkgver}-${pkgarch}${PKGEXTS} ]] && return 1
-	[[ -f ${FTP_BASE}/${PKGPOOL}/${pkgname}-${pkgver}-${pkgarch}${PKGEXTS}.sig ]] && return 1
+	is_globfile "${FTP_BASE}/${PKGPOOL}/${pkgname}-${pkgver}-${pkgarch}"${PKGEXTS} && return 1
+	is_globfile "${FTP_BASE}/${PKGPOOL}/${pkgname}-${pkgver}-${pkgarch}"${PKGEXTS}.sig && return 1
 	[[ -f ${FTP_BASE}/${PKGPOOL}/${pkgfile##*/} ]] && return 1
 	[[ -f ${FTP_BASE}/${PKGPOOL}/${pkgfile##*/}.sig ]] && return 1
It's not a big deal, but I'd rather that be a separate commit, as it's fixing breakage that's unrelated to switching it from a plain glob to an extglob.
diff --git a/config b/config
index 5bb3b16..0d33de0 100644
--- a/config
+++ b/config
@@ -25,7 +25,8 @@ TMPDIR="/var/tmp"
 ARCHES=(x86_64)
 DBEXT=".db.tar.gz"
 FILESEXT=".files.tar.gz"
-PKGEXTS=".pkg.tar.?z"
+# bash glob listing allowed extensions. Note that db-functions turns on extglob.
+PKGEXTS=".pkg.tar.@(gz|bz2|xz|lzo|lrz|Z)"
 SRCEXT=".src.tar.gz"
Is there a reason you reject '.pkg.tar' (no compression, which makepkg accepts)?

(I also found it curious that you swapped lzo and lrz from the order the extensions are in in the makepkg source.)

(Also, I'd move it down a line, so that it's more obvious that the comment doesn't apply to SRCEXT.)

--
Happy hacking,
~ Luke Shumaker
On 02/19/2018 04:53 PM, Luke Shumaker wrote:
Isn't [[ -d ]] there redundant? If globbing gave us $dir/file, of course $dir is a directory!
True. I think I still had that in from some point where I hadn't enabled nullglob yet.
Meanwhile, this dropped the `-type f` check, though I'm not sure how important that was.
Shouldn't this be written as:
mapfile -t -d '' staging_repos < <(
	for f in "${STAGING}"/**/*${PKGEXTS}; do
		if [[ -f $f && ! -h $f ]]; then
			printf '%s\0' "${f%/*}"
		fi
	done | sort -uz
)
The original `find` command rejected symlinks; I don't know if that's an important property; but that's what the `&& ! -h $f` bit is for.
It is not important. The find command only checked whether the file itself was a symlink, but if there is another package file in the same directory then we still add those staging repos. Meanwhile, we check later on for `die "Package %s is a symbolic link"`.

So I guess technically it would make more sense to stage the package and then utilize the explicit error message, rather than silently dropping the package altogether (but only sometimes) simply because we didn't think to use -xtype.

At this stage in the game, we're just trying to assemble a list of the packages that the uploader is asserting they want to db-update. We perform all actual validation later on.

--
Eli Schwartz
Bug Wrangler and Trusted User
On 02/19/2018 04:59 PM, Luke Shumaker wrote:
Is there a reason you reject '.pkg.tar' (no compression, which makepkg accepts)?
I don't think there is any utility in supporting uncompressed packages in dbscripts. Anyone who wants to customize this in a non-Arch Linux deployment is free to do so... If someone wants to use some deviant compression type because they're positive it works better on those packaged files, I cannot think of a compelling reason to say "no you're wrong", which is why I listed everything else.
(I also found it curious that you swapped lzo and lrz from the order the extensions are in in the makepkg source.)
makepkg is inconsistent here, I pulled that from the makepkg.conf(5) source. :D

--
Eli Schwartz
Bug Wrangler and Trusted User
On Mon, 19 Feb 2018 15:11:45 -0500, Eli Schwartz via arch-projects wrote:
+# Check if a file exists, even if the file uses wildcards
+is_globfile() {
+	[[ -f $1 ]]
+}
+
Dave's comment on my version of this patchset applies equally to this version:
Frankly, this function name and comment suck, because they say nothing about quoting. As I read the comment, I'm led to believe that given a file "foobar" existing, I can call: __isGlobfile "foo*", and this will succeed. To the naive reader, you might even believe this claim based on the unquotedness of $1 within the -f test.
(I had the same function, with the same comment, as is_globfile in db-functions and as __isGlobfile in common.bash)

--
Happy hacking,
~ Luke Shumaker
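The quoting concern is easy to demonstrate. In the sketch below the function body is copied from the patch, while the scratch directory and the `foobar` file are hypothetical.

```shell
#!/usr/bin/env bash
# Same body as the function under review.
is_globfile() {
    [[ -f $1 ]]
}

dir=$(mktemp -d)      # scratch directory for the demonstration
touch "$dir/foobar"

# Quoted: the literal string ".../foo*" reaches the function, and [[ -f ]]
# performs no pathname expansion, so the test fails.
if is_globfile "$dir/foo*"; then quoted=found; else quoted=missing; fi

# Unquoted: the caller's shell expands the glob first, so $1 is ".../foobar".
if is_globfile "$dir"/foo*; then unquoted=found; else unquoted=missing; fi

printf 'quoted=%s unquoted=%s\n' "$quoted" "$unquoted"
rm -rf "$dir"
```

So the globbing happens entirely at the call site. The function itself only tests its first argument, which also means that when several files match, only the first match is checked.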
Don't bother emitting errors. bash doesn't show globbing errors if it
cannot read a directory to try globbing there. And the former code never
aborted on errors anyway, as without `set -o pipefail` the sort command
swallowed the return code.
Signed-off-by: Eli Schwartz
The current glob `*.pkg.tar.?z` is both less restrictive and more
restrictive than makepkg, as it accepts any valid unicode character.
To be more exact, it's almost completely orthogonal to the one in makepkg.
makepkg only accepts .tar.gz, .tar.bz2, .tar.xz, .tar.lzo, .tar.lrz, and
.tar.Z and most of those fail to match against a two-char compression type.
dbscripts accepts .pkg.tar.💩z which incidentally is what I think of
cherry-picking xz and gz as supported methods.
Since this can be anything makepkg.conf accepts, it needs to be able to
match all that, unless we decide to perform additional restrictions in
which case we should still explicitly list each allowed extension. Using
bash extended globbing allows us to do this relatively painlessly.
Document the fact that this has *always* been some sort of glob, and
update the two cases where this was (not!) being evaluated by bash
[[ ... ]], to use a not-elegant-at-all proxy function is_globfile() to
evaluate globs *before* testing if they exist.
Signed-off-by: Eli Schwartz
Hi Eli,
Disclaimer: the following is a somewhat subtle topic, so I hope it doesn't
spur a lot of off-topic discussion.
On 19 February 2018 at 20:11, Eli Schwartz via arch-projects wrote:
Catch some cases that were missed in the previous run.
Signed-off-by: Eli Schwartz
---
This patch is new + refactor some changes from:
ftpdir-cleanup,sourceballs: replace external find command with bash globbing

 cron-jobs/devlist-mailer  |  6 +++---
 cron-jobs/ftpdir-cleanup  | 14 +++++++-------
 cron-jobs/integrity-check |  2 +-
 cron-jobs/sourceballs     | 12 ++++++------
 cron-jobs/update-web-db   |  6 +++---
 5 files changed, 20 insertions(+), 20 deletions(-)

Is there any performance or other technical benefit to using more bashisms?

Reason being, that I am slowly going through different parts of Arch making it zsh friendly. While keeping the code brief and legible, of course. Guessing that I've picked the wrong hobby?

Thanks
Emil
On Tue, Feb 20, 2018 at 11:59:49AM +0000, Emil Velikov via arch-projects wrote:
Hi Eli,
Disclaimer: the following is a somewhat subtle topic, so I hope it doesn't spur a lot of off-topic discussion.
On 19 February 2018 at 20:11, Eli Schwartz via arch-projects wrote:
Catch some cases that were missed in the previous run.
Signed-off-by: Eli Schwartz
---
This patch is new + refactor some changes from:
ftpdir-cleanup,sourceballs: replace external find command with bash globbing

 cron-jobs/devlist-mailer  |  6 +++---
 cron-jobs/ftpdir-cleanup  | 14 +++++++-------
 cron-jobs/integrity-check |  2 +-
 cron-jobs/sourceballs     | 12 ++++++------
 cron-jobs/update-web-db   |  6 +++---
 5 files changed, 20 insertions(+), 20 deletions(-)
Is there any performance or other technical benefit to using more bashisms?
The scripts run under bash, so why not take advantage of bash features? For example, bash's [[ and (( are less error prone and more featureful than the POSIX [, and builtins like mapfile and read (POSIX read has an extremely limited featureset) make I/O far simpler tasks. There's plenty more to like... Please don't try to talk about performance and shell in the same sentence. These are not performance-sensitive scripts, and shell is not a language to use when performance (of almost any kind) is relevant.
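A few of the differences mentioned above, sketched side by side. The values are hypothetical and chosen only to make the behaviour visible.

```shell
#!/usr/bin/env bash
value='two words'     # hypothetical input containing whitespace

# POSIX [ sees the unquoted expansion split into several arguments and errors out.
if [ $value = 'two words' ] 2>/dev/null; then posix_test=match; else posix_test=error; fi

# [[ ]] does not word-split its operands, so the comparison just works.
if [[ $value = 'two words' ]]; then bash_test=match; else bash_test=nomatch; fi

# (( )) gives C-like arithmetic operators.
count=3
arithmetic=other
if (( count > 0 && count % 2 == 1 )); then arithmetic=odd-positive; fi

# mapfile reads lines into an array without a hand-written read loop.
mapfile -t lines < <(printf '%s\n' alpha 'beta gamma' delta)

printf '%s %s %s %s\n' "$posix_test" "$bash_test" "$arithmetic" "${#lines[@]}"
```

The `[` failure disappears once the expansion is quoted, which is exactly the kind of mistake `[[` makes impossible.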
Reason being, that I am slowly going through different parts of Arch making it zsh friendly. While keeping the code brief and legible, of course.
Feel free to exemplify how conversion from bash to zsh has aided your goals while retaining portability to a supermajority of Arch systems.

$ pacman -Q zsh
error: package 'zsh' was not found
Guessing that I've picked the wrong hobby?
Almost certainly.
Thanks Emil
On 02/20/2018 06:59 AM, Emil Velikov wrote:
Disclaimer: the following is a somewhat subtle topic, so I hope it doesn't spur a lot of off-topic discussion.
Eh, I don't mind.
Is there any performance or other technical benefit to using more bashisms?
Reason being, that I am slowly going through different parts of Arch making it zsh friendly. While keeping the code brief and legible, of course. Guessing that I've picked the wrong hobby?
I think you'll probably find that few people write zsh scripts for non-interactive use. I'm not really sure what the point would be, considering it has a nonstandard syntax (bash is ubiquitous, zsh is not), and many people who would know bash would not know zsh (like me for example).

AFAIK zsh should more or less run either bash or POSIX sh scripts just fine if you invoke it via a symlink named `sh` or `bash`, because zsh has a bash compatibility mode. I have no idea whether that bash compatibility mode fixes subtle things like the fact that zsh arrays are 1-indexed while bash arrays are 0-indexed, but if I had to guess, probably not.

...

I can see some compelling reasons to write scripts targeting POSIX sh as a baseline, which is being *sh* friendly, not zsh friendly. But, for projects that make heavy use of bashisms anyways, I dislike using POSIX because it implies that sh will be supported in any way when it really won't be. Essentially, I prefer to go "all in".

As for why you'd want them, bashisms generally look cleaner IMHO, and they add a great deal of power and flexibility to the shell. Things like [[ ... ]] are just a lot more sane in basically every way, shell arithmetic uses proper operators, etc.

--
Eli Schwartz
Bug Wrangler and Trusted User
On 20 February 2018 at 14:23, Eli Schwartz wrote:
On 02/20/2018 06:59 AM, Emil Velikov wrote:
Disclaimer: the following is a somewhat subtle topic, so I hope it doesn't spur a lot of off-topic discussion.
Eh, I don't mind.
Is there any performance or other technical benefit to using more bashisms?
Reason being, that I am slowly going through different parts of Arch making it zsh friendly. While keeping the code brief and legible, of course. Guessing that I've picked the wrong hobby?
I think you'll probably find that few people write zsh scripts for non-interactive use. I'm not really sure what the point would be, considering it has a nonstandard syntax (bash is ubiquitous, zsh is not), and many people who would know bash would not know zsh (like me for example).
AFAIK zsh should more or less run either bash or POSIX sh scripts just fine if you invoke it via a symlink named `sh` or `bash`, because zsh has a bash compatibility mode. I have no idea whether that bash compatibility mode fixes subtle things like the fact that zsh arrays are 1-indexed while bash arrays are 0-indexed, but if I had to guess, probably not.
...
I can see some compelling reasons to write scripts targeting POSIX sh as a baseline, which is being *sh* friendly, not zsh friendly. But, for projects that make heavy use of bashisms anyways, I dislike using POSIX because it implies that sh will be supported in any way when it really won't be. Essentially, I prefer to go "all in".
As for why you'd want them, bashisms generally look cleaner IMHO, and they add a great deal of power and flexibility to the shell. Things like [[ ... ]] are just a lot more sane in basically every way, shell arithmetic uses proper operators, etc.
Seems like I wasn't clear enough: the goal is not to appease zsh, but to take a step closer to being POSIX sh friendly.

I've been staring at and writing bash (closer to POSIX sh really) scripts for over a decade, and haven't seen what makes X cleaner than Y. Yet that's subjective, unlike the original argument - consistency rules ;-)

Thanks
Emil
On 02/20/2018 12:24 PM, Emil Velikov wrote:
Seems like I wasn't clear enough: the goal is not to appease zsh, but to take a step closer to being POSIX sh friendly.
I've been staring at and writing bash (closer to POSIX sh really) scripts for over a decade, and haven't seen what makes X cleaner than Y. Yet that's subjective, unlike the original argument - consistency rules ;-)
If you're working for "POSIX sh friendly", why are you mentioning zsh in the first place?

As for targeting POSIX sh, if you can do that then sure. I have personal scripts written for sh when it makes sense (and my /bin/sh is symlinked to dash, so I actually use sh).

But yeah, consistency rules.

--
Eli Schwartz
Bug Wrangler and Trusted User
participants (4)
- Dave Reisner
- Eli Schwartz
- Emil Velikov
- Luke Shumaker