[pacman-dev] [PATCH] repo-add: Remove lock on SIG{TERM, HUP, QUIT, INT, ERR}
There's no need to keep the lock file if we interrupt repo-{add,remove} using ^C (same applies to other signals). In contrast to a clean exit, we do not `rm -rf` the temporary directory here, so that it's still possible to analyze what went wrong. Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de> --- Hrm. Second time I had to manually remove "/srv/ftp/community/os/$arch/community.db.tar.gz.lck" on sigurd in order not to lock the whole database after interrupting `/arch/db-update`. I hope I didn't miss anything here but this should hopefully not cause any other unforeseen issues... scripts/repo-add.sh.in | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index 29b150c..753200b 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -522,6 +522,9 @@ remove() { trap_exit() { echo error "$@" + + (( CLEAN_LOCK )) && [[ -f $LOCKFILE ]] && rm -f "$LOCKFILE" + exit 1 } -- 1.7.7
On Tue, Oct 11, 2011 at 11:13 AM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
There's no need to keep the lock file if we interrupt repo-{add,remove} using ^C (same applies to other signals). Definite +1 here.
In contrast to a clean exit, we do not `rm -rf` the temporary directory here, so that it's still possible to analyze what went wrong. I'm not sure what the right answer is here- I see where you're coming from, but given that we give the user no indication we're leaving a mess behind, I think we should either clean it up or tell the user we did not. Also noticed that we print an error message with TERM even when HUP and QUIT are caught.
Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de> --- Hrm. Second time I had to manually remove "/srv/ftp/community/os/$arch/community.db.tar.gz.lck" on sigurd in order not to lock the whole database after interrupting `/arch/db-update`.
I hope I didn't miss anything here but this should hopefully not cause any other unforeseen issues...
scripts/repo-add.sh.in | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index 29b150c..753200b 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -522,6 +522,9 @@ remove() { trap_exit() { echo error "$@" + + (( CLEAN_LOCK )) && [[ -f $LOCKFILE ]] && rm -f "$LOCKFILE" + exit 1 }
-- 1.7.7
On Tue, Oct 11, 2011 at 11:29:31AM -0500, Dan McGee wrote:
On Tue, Oct 11, 2011 at 11:13 AM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
There's no need to keep the lock file if we interrupt repo-{add,remove} using ^C (same applies to other signals). Definite +1 here.
In contrast to a clean exit, we do not `rm -rf` the temporary directory here, so that it's still possible to analyze what went wrong. I'm not sure what the right answer is here- I see where you're coming from, but given that we give the user no indication we're leaving a mess behind, I think we should either clean it up or tell the user we did not. Also noticed that we print an error message with TERM even when HUP and QUIT are caught.
I'm not really sure either, but I'd say that showing a message that includes "$tmpdir" might be the best thing to do here. I also submitted a separate patch that fixes the SIGTERM message issue.
Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de> --- Hrm. Second time I had to manually remove "/srv/ftp/community/os/$arch/community.db.tar.gz.lck" on sigurd in order not to lock the whole database after interrupting `/arch/db-update`.
I hope I didn't miss anything here but this should hopefully not cause any other unforeseen issues...
scripts/repo-add.sh.in | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index 29b150c..753200b 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -522,6 +522,9 @@ remove() { trap_exit() { echo error "$@" + + (( CLEAN_LOCK )) && [[ -f $LOCKFILE ]] && rm -f "$LOCKFILE" + exit 1 }
-- 1.7.7
On Tue, Oct 11, 2011 at 07:03:13PM +0200, Lukas Fleischer wrote:
On Tue, Oct 11, 2011 at 11:29:31AM -0500, Dan McGee wrote:
On Tue, Oct 11, 2011 at 11:13 AM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
There's no need to keep the lock file if we interrupt repo-{add,remove} using ^C (same applies to other signals). Definite +1 here.
In contrast to a clean exit, we do not `rm -rf` the temporary directory here, so that it's still possible to analyze what went wrong. I'm not sure what the right answer is here- I see where you're coming from, but given that we give the user no indication we're leaving a mess behind, I think we should either clean it up or tell the user we did not. Also noticed that we print an error message with TERM even when HUP and QUIT are caught.
I'm not really sure either, but I'd say that showing a message that includes "$tmpdir" might be the best thing to do here.
I also submitted a separate patch that fixes the SIGTERM message issue.
Ugh. It's a bit more complicated, even. Given that we use exit at the very end of trap_exit(), this should already trigger clean_up() automatically. The only case where this doesn't work is if we send a signal when clean_up() is already being executed (it's not too hard to reproduce that, actually). So just copying the two lines that unlink the lock file and remove the directory from clean_up() might be the easiest way to go?
Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de> --- Hrm. Second time I had to manually remove "/srv/ftp/community/os/$arch/community.db.tar.gz.lck" on sigurd in order not to lock the whole database after interrupting `/arch/db-update`.
I hope I didn't miss anything here but this should hopefully not cause any other unforeseen issues...
scripts/repo-add.sh.in | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index 29b150c..753200b 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -522,6 +522,9 @@ remove() { trap_exit() { echo error "$@" + + (( CLEAN_LOCK )) && [[ -f $LOCKFILE ]] && rm -f "$LOCKFILE" + exit 1 }
-- 1.7.7
On Tue, Oct 11, 2011 at 1:33 PM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
On Tue, Oct 11, 2011 at 07:03:13PM +0200, Lukas Fleischer wrote:
On Tue, Oct 11, 2011 at 11:29:31AM -0500, Dan McGee wrote:
On Tue, Oct 11, 2011 at 11:13 AM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
There's no need to keep the lock file if we interrupt repo-{add,remove} using ^C (same applies to other signals). Definite +1 here.
In contrast to a clean exit, we do not `rm -rf` the temporary directory here, so that it's still possible to analyze what went wrong. I'm not sure what the right answer is here- I see where you're coming from, but given that we give the user no indication we're leaving a mess behind, I think we should either clean it up or tell the user we did not. Also noticed that we print an error message with TERM even when HUP and QUIT are caught.
I'm not really sure either, but I'd say that showing a message that includes "$tmpdir" might be the best thing to do here.
I also submitted a separate patch that fixes the SIGTERM message issue.
Ugh. It's a bit more complicated, even. Given that we use exit at the very end of trap_exit(), this should already trigger clean_up() automatically. Ahh, true. Odd I didn't think about this before. So this only happens when? If someone hits Ctrl-C in quick succession just right?
The only case where this doesn't work is if we send a signal when clean_up() is already being executed (it's not too hard to reproduce that, actually). So just copying the two lines that unlink the lock file and remove the directory from clean_up() might be the easiest way to go? I'm not sure we can totally eliminate all race conditions. Can we get closer by unhooking all traps at the start of both trap_exit() and clean_up(), and then explicitly calling clean_up() from trap_exit()? The bash manpage is woefully non-descriptive about signal handlers and their reentry or recall procedure- I'm assuming you could theoretically keep sending signals fast enough that signal handlers pile up on the stack and none of them ever complete, so attempting to unhook as soon as possible and not depend on the signal handlers being in place anymore would at least prevent that stack growth.
-Dan
On Wed, Oct 12, 2011 at 08:27:53AM -0500, Dan McGee wrote:
On Tue, Oct 11, 2011 at 1:33 PM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
On Tue, Oct 11, 2011 at 07:03:13PM +0200, Lukas Fleischer wrote:
On Tue, Oct 11, 2011 at 11:29:31AM -0500, Dan McGee wrote:
On Tue, Oct 11, 2011 at 11:13 AM, Lukas Fleischer <archlinux@cryptocrack.de> wrote:
There's no need to keep the lock file if we interrupt repo-{add,remove} using ^C (same applies to other signals). Definite +1 here.
In contrast to a clean exit, we do not `rm -rf` the temporary directory here, so that it's still possible to analyze what went wrong. I'm not sure what the right answer is here- I see where you're coming from, but given that we give the user no indication we're leaving a mess behind, I think we should either clean it up or tell the user we did not. Also noticed that we print an error message with TERM even when HUP and QUIT are caught.
I'm not really sure either, but I'd say that showing a message that includes "$tmpdir" might be the best thing to do here.
I also submitted a separate patch that fixes the SIGTERM message issue.
Ugh. It's a bit more complicated, even. Given that we use exit at the very end of trap_exit(), this should already trigger clean_up() automatically. Ahh, true. Odd I didn't think about this before. So this only happens when? If someone hits Ctrl-C in quick succession just right?
Well, I was able to reproduce that a couple of times pressing ^C once. It only works as of a certain moment, tough (probably after we entered clean_up() - that's by best guess anyway): ---- $ ./repo-add extra.db.tar.gz foobar ==> Extracting database to a temporary location... ^C ==> ERROR: Aborted by user! Exiting... $ ./repo-add extra.db.tar.gz foobar ==> Extracting database to a temporary location... ==> ERROR: File 'foobar' not found. ==> No packages modified, nothing to do. ^C ==> ERROR: Aborted by user! Exiting... $ ./repo-add extra.db.tar.gz foobar ==> ERROR: Failed to acquire lockfile: extra.db.tar.gz.lck. ==> ERROR: Held by process 18522 ----
The only case where this doesn't work is if we send a signal when clean_up() is already being executed (it's not too hard to reproduce that, actually). So just copying the two lines that unlink the lock file and remove the directory from clean_up() might be the easiest way to go? I'm not sure we can totally eliminate all race conditions. Can we get closer by unhooking all traps at the start of both trap_exit() and clean_up(), and then explicitly calling clean_up() from trap_exit()?
I thought of unhooking traps as well but I'm not too enthusiastic about that. We `rm -r` a directory in clean_up() and if this ever happens to hang for some reason, there will be no way to cancel, except for sending SIGKILL.
The bash manpage is woefully non-descriptive about signal handlers and their reentry or recall procedure- I'm assuming you could theoretically keep sending signals fast enough that signal handlers pile up on the stack and none of them ever complete, so attempting to unhook as soon as possible and not depend on the signal handlers being in place anymore would at least prevent that stack growth.
Doing some more tests, it seems like signals are blocked if the signal handler for a particular signal is already running. So, basically just call clean_up() from trap_exit() (or copy-paste the two lines, like I suggested before)?
On Wed, Oct 12, 2011 at 04:20:00PM +0200, Lukas Fleischer wrote:
[...]
The only case where this doesn't work is if we send a signal when clean_up() is already being executed (it's not too hard to reproduce that, actually). So just copying the two lines that unlink the lock file and remove the directory from clean_up() might be the easiest way to go? I'm not sure we can totally eliminate all race conditions. Can we get closer by unhooking all traps at the start of both trap_exit() and clean_up(), and then explicitly calling clean_up() from trap_exit()?
I thought of unhooking traps as well but I'm not too enthusiastic about that. We `rm -r` a directory in clean_up() and if this ever happens to hang for some reason, there will be no way to cancel, except for sending SIGKILL.
Forget what I said. +1 to Dan's suggestion. After reconsideration, I noticed that even if we do not unhook traps, we will get stuck since we will enter the very same `rm -r` (that might have frozen) again when sending any signal. The only difference to the first call of clean_up() is that we will not be able to send the signal again (it's blocked by the handler invocation) which is exactly the same situation we would face when unhooking all traps. Keeping that in mind, there's only a few possible solutions: * Reduce the instructions in our trap handler to a bare minimum. This is what we have in master right now. * Unhook all traps and hope that nothing will go wrong in our trap handler. If we fail here, SIGKILL will be the only way out. Still sounds like the most sane alternative to me. * Register a fallback trap handler when our trap handler is entered, so that we can ^C^C if everything goes wrong. Doesn't sound very KISS, though. If there are no objections, I'll take option 2 and implement it exactly the way Dan suggested (unhook all traps at the start of both trap_exit() and clean_up(), and then explicitly call clean_up() from trap_exit()).
The bash manpage is woefully non-descriptive about signal handlers and their reentry or recall procedure- I'm assuming you could theoretically keep sending signals fast enough that signal handlers pile up on the stack and none of them ever complete, so attempting to unhook as soon as possible and not depend on the signal handlers being in place anymore would at least prevent that stack growth.
Doing some more tests, it seems like signals are blocked if the signal handler for a particular signal is already running.
So, basically just call clean_up() from trap_exit() (or copy-paste the two lines, like I suggested before)?
Instead of always showing "==> ERROR: TERM signal caught. Exiting...", replace "TERM" with whatever signal is actually caught. Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de> --- scripts/repo-add.sh.in | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index 753200b..7fc8637 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -569,7 +569,9 @@ tmpdir=$(mktemp -d /tmp/repo-tools.XXXXXXXXXX) || (\ mkdir $tmpdir/tree trap 'clean_up' EXIT -trap 'trap_exit "$(gettext "TERM signal caught. Exiting...")"' TERM HUP QUIT +for signal in TERM HUP QUIT; do + trap 'trap_exit "$(gettext "'"$signal"' signal caught. Exiting...")"' "$signal" +done trap 'trap_exit "$(gettext "Aborted by user! Exiting...")"' INT trap 'trap_exit "$(gettext "An unknown error has occured. Exiting...")"' ERR -- 1.7.7
Instead of always showing "==> ERROR: TERM signal caught. Exiting...", replace "TERM" with whatever signal is actually caught.
Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de> --- scripts/repo-add.sh.in | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index 753200b..7fc8637 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -569,7 +569,9 @@ tmpdir=$(mktemp -d /tmp/repo-tools.XXXXXXXXXX) || (\ mkdir $tmpdir/tree
trap 'clean_up' EXIT -trap 'trap_exit "$(gettext "TERM signal caught. Exiting...")"' TERM HUP QUIT +for signal in TERM HUP QUIT; do + trap 'trap_exit "$(gettext "'"$signal"' signal caught. Exiting...")"' "$signal" You need to use '%s' in the gettext string, and then pass $signal to
On Tue, Oct 11, 2011 at 12:00 PM, Lukas Fleischer <archlinux@cryptocrack.de> wrote: the trap_exit invocation; e.g. trap 'trap_exit "$(gettext "'%s' signal caught. Exiting...")" $signal' $signal
+done trap 'trap_exit "$(gettext "Aborted by user! Exiting...")"' INT trap 'trap_exit "$(gettext "An unknown error has occured. Exiting...")"' ERR We should also correct this spelling mistake- ouch. s/occured/occurred/ here.
-- 1.7.7
On Tue, Oct 11, 2011 at 12:19:42PM -0500, Dan McGee wrote:
Instead of always showing "==> ERROR: TERM signal caught. Exiting...", replace "TERM" with whatever signal is actually caught.
Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de> --- scripts/repo-add.sh.in | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/scripts/repo-add.sh.in b/scripts/repo-add.sh.in index 753200b..7fc8637 100644 --- a/scripts/repo-add.sh.in +++ b/scripts/repo-add.sh.in @@ -569,7 +569,9 @@ tmpdir=$(mktemp -d /tmp/repo-tools.XXXXXXXXXX) || (\ mkdir $tmpdir/tree
trap 'clean_up' EXIT -trap 'trap_exit "$(gettext "TERM signal caught. Exiting...")"' TERM HUP QUIT +for signal in TERM HUP QUIT; do + trap 'trap_exit "$(gettext "'"$signal"' signal caught. Exiting...")"' "$signal" You need to use '%s' in the gettext string, and then pass $signal to
On Tue, Oct 11, 2011 at 12:00 PM, Lukas Fleischer <archlinux@cryptocrack.de> wrote: the trap_exit invocation; e.g. trap 'trap_exit "$(gettext "'%s' signal caught. Exiting...")" $signal' $signal
Right, I missed the gettext part... :/ Note that we use the same code in makepkg. I'll fix both repo-add and makepkg when resubmitting the patch.
+done trap 'trap_exit "$(gettext "Aborted by user! Exiting...")"' INT trap 'trap_exit "$(gettext "An unknown error has occured. Exiting...")"' ERR We should also correct this spelling mistake- ouch. s/occured/occurred/ here.
Will do.
-- 1.7.7
participants (2)
-
Dan McGee
-
Lukas Fleischer