[pacman-dev] [PATCH] repo-add: Remove lock on SIG{TERM, HUP, QUIT, INT, ERR}

Wed Oct 12 10:20:00 EDT 2011

On Wed, Oct 12, 2011 at 08:27:53AM -0500, Dan McGee wrote:
> On Tue, Oct 11, 2011 at 1:33 PM, Lukas Fleischer
> <archlinux at cryptocrack.de> wrote:
> > On Tue, Oct 11, 2011 at 07:03:13PM +0200, Lukas Fleischer wrote:
> >> On Tue, Oct 11, 2011 at 11:29:31AM -0500, Dan McGee wrote:
> >> > On Tue, Oct 11, 2011 at 11:13 AM, Lukas Fleischer
> >> > <archlinux at cryptocrack.de> wrote:
> >> > > There's no need to keep the lock file if we interrupt repo-{add,remove}
> >> > > using ^C (same applies to other signals).
> >> > Definite +1 here.
> >> >
> >> > > In contrast to a clean exit, we do not `rm -rf` the temporary directory
> >> > > here, so that it's still possible to analyze what went wrong.
> >> > I'm not sure what the right answer is here- I see where you're coming
> >> > from, but given that we give the user no indication we're leaving a
> >> > mess behind, I think we should either clean it up or tell the user we
> >> > did not. Also noticed that we print an error message with TERM even
> >> > when HUP and QUIT are caught.
> >>
> >> I'm not really sure either, but I'd say that showing a message that
> >> includes "$tmpdir" might be the best thing to do here.
> >>
> >> I also submitted a separate patch that fixes the SIGTERM message issue.
> >
> > Ugh. It's a bit more complicated, even. Given that we use exit at the
> > very end of trap_exit(), this should already trigger clean_up()
> > automatically.
> Ahh, true. Odd I didn't think about this before. So this only happens
> when? If someone hits Ctrl-C in quick succession just right?

Well, I was able to reproduce that a couple of times pressing ^C once.
It only works as of a certain moment, tough (probably after we entered
clean_up() - that's by best guess anyway):

----
$ ./repo-add extra.db.tar.gz foobar
==> Extracting database to a temporary location...
^C
==> ERROR: Aborted by user! Exiting...
$ ./repo-add extra.db.tar.gz foobar
==> Extracting database to a temporary location...
==> ERROR: File 'foobar' not found.
==> No packages modified, nothing to do.
^C
==> ERROR: Aborted by user! Exiting...
$ ./repo-add extra.db.tar.gz foobar
==> ERROR: Failed to acquire lockfile: extra.db.tar.gz.lck.
==> ERROR: Held by process 18522
----

> 
> > The only case where this doesn't work is if we send a
> > signal when clean_up() is already being executed (it's not too hard to
> > reproduce that, actually). So just copying the two lines that unlink the
> > lock file and remove the directory from clean_up() might be the easiest
> > way to go?
> I'm not sure we can totally eliminate all race conditions. Can we get
> closer by unhooking all traps at the start of both trap_exit() and
> clean_up(), and then explicitly calling clean_up() from trap_exit()?

I thought of unhooking traps as well but I'm not too enthusiastic about
that. We `rm -r` a directory in clean_up() and if this ever happens to
hang for some reason, there will be no way to cancel, except for sending
SIGKILL.

> The bash manpage is woefully non-descriptive about signal handlers and
> their reentry or recall procedure- I'm assuming you could
> theoretically keep sending signals fast enough that signal handlers
> pile up on the stack and none of them ever complete, so attempting to
> unhook as soon as possible and not depend on the signal handlers being
> in place anymore would at least prevent that stack growth.

Doing some more tests, it seems like signals are blocked if the signal
handler for a particular signal is already running.

So, basically just call clean_up() from trap_exit() (or copy-paste the
two lines, like I suggested before)?