[PATCH v3] alpm_run_chroot: Fix the execution of pacman hooks
Running "pacman -Syyu" to perform a full system update resulted in a rather conspicuous error message in the output of pacman: ( 3/18) Creating temporary files... Assertion 'fd' failed at src/tmpfiles/tmpfiles.c:843, function fd_set_perms(). Aborting. /usr/share/libalpm/scripts/systemd-hook: line 28: 1735 Aborted (core dumped) /usr/bin/systemd-tmpfiles --create error: command failed to execute correctly Expectedly, running "systemd-tmpfiles --create" manually afterwards went just fine and resulted in no errors, leading to a conclusion that executing /usr/share/libalpm/hooks/30-systemd-tmpfiles.hook failed, but only when it was run from within pacman. After a detailed and rather lengthy investigation, it turned out the code in lib/libalpm/util.c that executes the hooks by forking a child has some bugs that allow the error to occur under certain circumstances. In particular, function _alpm_run_chroot() that executes hooks in a fork()ed child does not employ dup2() properly, but instead executes close() followed by dup2(). The man page for dup2() clearly states in the quotation below that attempts to re-implement the equivalent functionality, which is the case in function _alpm_run_chroot(), must be avoided: The dup2() system call performs the same task as dup(), but instead of using the lowest-numbered unused file descriptor, it uses the file descriptor number specified in newfd. In other words, the file descriptor newfd is adjusted so that it now refers to the same open file description as oldfd. If the file descriptor newfd was previously open, it is closed before being reused; the close is performed silently (i.e., any errors during the close are not reported by dup2()). The steps of closing and reusing the file descriptor newfd are performed atomically. This is important, because trying to implement equivalent functionality using close(2) and dup() would be subject to race conditions, whereby newfd might be reused between the two steps. Such reuse could happen because the main program is interrupted by a signal handler that allocates a file descriptor, or because a parallel thread allocates a file descriptor. As a result, a condition can occur in which the file descriptor 0 is closed by calling close(0), and left closed after the while loop that fails to execute dup2() because of receiving EBUSY, resulting in the described issues. Also, failed attempts to execute dup2() should be treated as fatal errors instead of being silently ignored. Let's improve the code to prevent the described issues. While there, perform a minor cleanup as well, to make the formatting of the code a tiny bit more consistent. Signed-off-by: Dragan Simic <dsimic@manjaro.org> --- lib/libalpm/util.c | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c index dffa3b51..0897c5ea 100644 --- a/lib/libalpm/util.c +++ b/lib/libalpm/util.c @@ -639,28 +639,40 @@ int _alpm_run_chroot(alpm_handle_t *handle, const char *cmd, char *const argv[], if(pid == 0) { /* this code runs for the child only (the actual chroot/exec) */ - close(0); - close(1); - close(2); - while(dup2(child2parent_pipefd[HEAD], 1) == -1 && errno == EINTR); - while(dup2(child2parent_pipefd[HEAD], 2) == -1 && errno == EINTR); - while(dup2(parent2child_pipefd[TAIL], 0) == -1 && errno == EINTR); - close(parent2child_pipefd[TAIL]); close(parent2child_pipefd[HEAD]); close(child2parent_pipefd[TAIL]); + while(dup2(child2parent_pipefd[HEAD], STDERR_FILENO) == -1) { + if(errno != EINTR) { + /* the child cannot talk through the parent at this point, but try yelling anyway */ + fprintf(stderr, _("could not redirect standard error (%s)\n"), strerror(errno)); + exit(1); + } + } + while(dup2(parent2child_pipefd[TAIL], STDIN_FILENO) == -1) { + if(errno != EINTR) { + /* use fprintf() instead of _alpm_log() to send output through the parent */ + fprintf(stderr, _("could not redirect standard input (%s)\n"), strerror(errno)); + exit(1); + } + } + close(parent2child_pipefd[TAIL]); + while(dup2(child2parent_pipefd[HEAD], STDOUT_FILENO) == -1) { + if(errno != EINTR) { + fprintf(stderr, _("could not redirect standard output (%s)\n"), strerror(errno)); + exit(1); + } + } close(child2parent_pipefd[HEAD]); if(cwdfd >= 0) { close(cwdfd); } - /* use fprintf instead of _alpm_log to send output through the parent */ if(chroot(handle->root) != 0) { fprintf(stderr, _("could not change the root directory (%s)\n"), strerror(errno)); exit(1); } if(chdir("/") != 0) { - fprintf(stderr, _("could not change directory to %s (%s)\n"), - "/", strerror(errno)); + fprintf(stderr, _("could not change directory to %s (%s)\n"), "/", strerror(errno)); exit(1); } /* bash assumes it's being run under rsh/ssh if stdin is a socket and
On 08/21/23 at 04:42am, Dragan Simic wrote:
Running "pacman -Syyu" to perform a full system update resulted in a rather conspicuous error message in the output of pacman:
( 3/18) Creating temporary files... Assertion 'fd' failed at src/tmpfiles/tmpfiles.c:843, function fd_set_perms(). Aborting. /usr/share/libalpm/scripts/systemd-hook: line 28: 1735 Aborted (core dumped) /usr/bin/systemd-tmpfiles --create error: command failed to execute correctly
Expectedly, running "systemd-tmpfiles --create" manually afterwards went just fine and resulted in no errors, leading to a conclusion that executing /usr/share/libalpm/hooks/30-systemd-tmpfiles.hook failed, but only when it was run from within pacman.
After a detailed and rather lengthy investigation, it turned out the code in lib/libalpm/util.c that executes the hooks by forking a child has some bugs that allow the error to occur under certain circumstances. In particular, function _alpm_run_chroot() that executes hooks in a fork()ed child does not employ dup2() properly, but instead executes close() followed by dup2().
The man page for dup2() clearly states in the quotation below that attempts to re-implement the equivalent functionality, which is the case in function _alpm_run_chroot(), must be avoided:
The dup2() system call performs the same task as dup(), but instead of using the lowest-numbered unused file descriptor, it uses the file descriptor number specified in newfd. In other words, the file descriptor newfd is adjusted so that it now refers to the same open file description as oldfd.
If the file descriptor newfd was previously open, it is closed before being reused; the close is performed silently (i.e., any errors during the close are not reported by dup2()).
The steps of closing and reusing the file descriptor newfd are performed atomically. This is important, because trying to implement equivalent functionality using close(2) and dup() would be subject to race conditions, whereby newfd might be reused between the two steps. Such reuse could happen because the main program is interrupted by a signal handler that allocates a file descriptor, or because a parallel thread allocates a file descriptor.
As a result, a condition can occur in which the file descriptor 0 is closed by calling close(0), and left closed after the while loop that fails to execute dup2() because of receiving EBUSY, resulting in the described issues. Also, failed attempts to execute dup2() should be treated as fatal errors instead of being silently ignored.
I am trying to reproduce this to wrap my head around what's actually happening and verify that this actually fixes it, but I am unable to get pacman to execute a hook with fd 0 closed with or without the ARM patches. Can you provide a reproducible case? apg
On 2023-08-21 18:19, Andrew Gregory wrote:
On 08/21/23 at 04:42am, Dragan Simic wrote:
Running "pacman -Syyu" to perform a full system update resulted in a rather conspicuous error message in the output of pacman:
( 3/18) Creating temporary files... Assertion 'fd' failed at src/tmpfiles/tmpfiles.c:843, function fd_set_perms(). Aborting. /usr/share/libalpm/scripts/systemd-hook: line 28: 1735 Aborted (core dumped) /usr/bin/systemd-tmpfiles --create error: command failed to execute correctly
Expectedly, running "systemd-tmpfiles --create" manually afterwards went just fine and resulted in no errors, leading to a conclusion that executing /usr/share/libalpm/hooks/30-systemd-tmpfiles.hook failed, but only when it was run from within pacman.
After a detailed and rather lengthy investigation, it turned out the code in lib/libalpm/util.c that executes the hooks by forking a child has some bugs that allow the error to occur under certain circumstances. In particular, function _alpm_run_chroot() that executes hooks in a fork()ed child does not employ dup2() properly, but instead executes close() followed by dup2().
The man page for dup2() clearly states in the quotation below that attempts to re-implement the equivalent functionality, which is the case in function _alpm_run_chroot(), must be avoided:
The dup2() system call performs the same task as dup(), but instead of using the lowest-numbered unused file descriptor, it uses the file descriptor number specified in newfd. In other words, the file descriptor newfd is adjusted so that it now refers to the same open file description as oldfd.
If the file descriptor newfd was previously open, it is closed before being reused; the close is performed silently (i.e., any errors during the close are not reported by dup2()).
The steps of closing and reusing the file descriptor newfd are performed atomically. This is important, because trying to implement equivalent functionality using close(2) and dup() would be subject to race conditions, whereby newfd might be reused between the two steps. Such reuse could happen because the main program is interrupted by a signal handler that allocates a file descriptor, or because a parallel thread allocates a file descriptor.
As a result, a condition can occur in which the file descriptor 0 is closed by calling close(0), and left closed after the while loop that fails to execute dup2() because of receiving EBUSY, resulting in the described issues. Also, failed attempts to execute dup2() should be treated as fatal errors instead of being silently ignored.
I am trying to reproduce this to wrap my head around what's actually happening and verify that this actually fixes it, but I am unable to get pacman to execute a hook with fd 0 closed with or without the ARM patches. Can you provide a reproducible case?
Ah, I know very well how it feels, I spent at least a couple of days hunting the issue down and fixing it. One of the issues with reproducing the error is that you must have an old systemd version installed, which still contains assert(fd) checks in src/tmpfiles/tmpfiles.c that actually trigger the error. IIRC, systemd removed those checks about a couple of years ago, presumably to fix similar complaints. You may also want to see my original report of the issue, which also contains the fix: https://gitlab.manjaro.org/manjaro-arm/packages/core/pacman/-/issues/1
On 2023-08-21 18:32, Dragan Simic wrote:
On 2023-08-21 18:19, Andrew Gregory wrote:
On 08/21/23 at 04:42am, Dragan Simic wrote:
Running "pacman -Syyu" to perform a full system update resulted in a rather conspicuous error message in the output of pacman:
( 3/18) Creating temporary files... Assertion 'fd' failed at src/tmpfiles/tmpfiles.c:843, function fd_set_perms(). Aborting. /usr/share/libalpm/scripts/systemd-hook: line 28: 1735 Aborted (core dumped) /usr/bin/systemd-tmpfiles --create error: command failed to execute correctly
Expectedly, running "systemd-tmpfiles --create" manually afterwards went just fine and resulted in no errors, leading to a conclusion that executing /usr/share/libalpm/hooks/30-systemd-tmpfiles.hook failed, but only when it was run from within pacman.
After a detailed and rather lengthy investigation, it turned out the code in lib/libalpm/util.c that executes the hooks by forking a child has some bugs that allow the error to occur under certain circumstances. In particular, function _alpm_run_chroot() that executes hooks in a fork()ed child does not employ dup2() properly, but instead executes close() followed by dup2().
The man page for dup2() clearly states in the quotation below that attempts to re-implement the equivalent functionality, which is the case in function _alpm_run_chroot(), must be avoided:
The dup2() system call performs the same task as dup(), but instead of using the lowest-numbered unused file descriptor, it uses the file descriptor number specified in newfd. In other words, the file descriptor newfd is adjusted so that it now refers to the same open file description as oldfd.
If the file descriptor newfd was previously open, it is closed before being reused; the close is performed silently (i.e., any errors during the close are not reported by dup2()).
The steps of closing and reusing the file descriptor newfd are performed atomically. This is important, because trying to implement equivalent functionality using close(2) and dup() would be subject to race conditions, whereby newfd might be reused between the two steps. Such reuse could happen because the main program is interrupted by a signal handler that allocates a file descriptor, or because a parallel thread allocates a file descriptor.
As a result, a condition can occur in which the file descriptor 0 is closed by calling close(0), and left closed after the while loop that fails to execute dup2() because of receiving EBUSY, resulting in the described issues. Also, failed attempts to execute dup2() should be treated as fatal errors instead of being silently ignored.
I am trying to reproduce this to wrap my head around what's actually happening and verify that this actually fixes it, but I am unable to get pacman to execute a hook with fd 0 closed with or without the ARM patches. Can you provide a reproducible case?
Ah, I know very well how it feels, I spent at least a couple of days hunting the issue down and fixing it.
One of the issues with reproducing the error is that you must have an old systemd version installed, which still contains assert(fd) checks in src/tmpfiles/tmpfiles.c that actually trigger the error. IIRC, systemd removed those checks about a couple of years ago, presumably to fix similar complaints.
You may also want to see my original report of the issue, which also contains the fix: https://gitlab.manjaro.org/manjaro-arm/packages/core/pacman/-/issues/1
Just checking, have you managed to reproduce the issue? If not, please let me know, and I'll put together some kind of instructions for creating an appropriate environment and a test case that reproduces the error.
On 08/25/23 at 07:36pm, Dragan Simic wrote:
On 2023-08-21 18:32, Dragan Simic wrote:
On 2023-08-21 18:19, Andrew Gregory wrote: ...
I am trying to reproduce this to wrap my head around what's actually happening and verify that this actually fixes it, but I am unable to get pacman to execute a hook with fd 0 closed with or without the ARM patches. Can you provide a reproducible case?
Ah, I know very well how it feels, I spent at least a couple of days hunting the issue down and fixing it.
One of the issues with reproducing the error is that you must have an old systemd version installed, which still contains assert(fd) checks in src/tmpfiles/tmpfiles.c that actually trigger the error. IIRC, systemd removed those checks about a couple of years ago, presumably to fix similar complaints.
You may also want to see my original report of the issue, which also contains the fix: https://gitlab.manjaro.org/manjaro-arm/packages/core/pacman/-/issues/1
Just checking, have you managed to reproduce the issue? If not, please let me know, and I'll put together some kind of instructions for creating an appropriate environment and a test case that reproduces the error.
I still haven't been able to reproduce pacman running a hook with fd 0 closed. I've put together a hook/script to test the hook execution environment: https://github.com/andrewgregory/pachooks/commit/b91e5de6f203b9be9fbf2f4fec5...
On 2023-09-04 04:52, Andrew Gregory wrote:
On 08/25/23 at 07:36pm, Dragan Simic wrote:
On 2023-08-21 18:32, Dragan Simic wrote:
On 2023-08-21 18:19, Andrew Gregory wrote: ...
I am trying to reproduce this to wrap my head around what's actually happening and verify that this actually fixes it, but I am unable to get pacman to execute a hook with fd 0 closed with or without the ARM patches. Can you provide a reproducible case?
Ah, I know very well how it feels, I spent at least a couple of days hunting the issue down and fixing it.
One of the issues with reproducing the error is that you must have an old systemd version installed, which still contains assert(fd) checks in src/tmpfiles/tmpfiles.c that actually trigger the error. IIRC, systemd removed those checks about a couple of years ago, presumably to fix similar complaints.
You may also want to see my original report of the issue, which also contains the fix: https://gitlab.manjaro.org/manjaro-arm/packages/core/pacman/-/issues/1
Just checking, have you managed to reproduce the issue? If not, please let me know, and I'll put together some kind of instructions for creating an appropriate environment and a test case that reproduces the error.
I still haven't been able to reproduce pacman running a hook with fd 0 closed. I've put together a hook/script to test the hook execution environment: https://github.com/andrewgregory/pachooks/commit/b91e5de6f203b9be9fbf2f4fec5...
I apologize for my delayed response. I've had some health issues, unfortunately, and I still haven't recovered fully. Thank you very much for putting together this test hook for pacman, which is really neat. I'll get back to you with the further steps to reproduce the issue as soon as possible.
participants (2)
-
Andrew Gregory
-
Dragan Simic