During a full system update of a Manjaro ARM installation, I spotted
a conspicuous error message in the output of pacman:
( 3/18) Creating temporary files...
Assertion 'fd' failed at src/tmpfiles/tmpfiles.c:843, function fd_set_perms(). Aborting.
/usr/share/libalpm/scripts/systemd-hook: line 28: 1735 Aborted (core dumped) /usr/bin/systemd-tmpfiles --create
error: command failed to execute correctly
Here's also an excerpt from /var/log/pacman.log (the stack trace
was rather unusable, so I'll skip it for the sake of brevity):
[ALPM] running '30-systemd-tmpfiles.hook'...
[ALPM-SCRIPTLET] Assertion 'fd' failed at src/tmpfiles/tmpfiles.c:843, function fd_set_perms(). Aborting.
[ALPM-SCRIPTLET] /usr/share/libalpm/scripts/systemd-hook: line 28: 1735 Aborted (core dumped) /usr/bin/systemd-tmpfiles --create
Running "systemd-tmpfiles --create" manually afterwards produced no
errors, which made the error message even more conspicuous. Thus,
/usr/share/libalpm/hooks/30-systemd-tmpfiles.hook failed only when
it was run from within pacman. I also saw a few people reporting the
same error message on the Manjaro and Arch Linux forums, with one
report dating back more than a few years, but nobody had come up
with a fix or workaround.
After a detailed and rather lengthy investigation, it turned out
that the root cause was twofold, as described below:
1) For its pacman package, Arch Linux ARM applies a patch named
0003-Revert-alpm_run_chroot-always-connect-parent2child-p.patch
that reverts the rather old pacman commit 1d6583a5, for an unknown
reason. This patch triggers the error reliably.
2) The code in pacman's lib/libalpm/util.c that executes the hooks
by forking a child has some rather subtle bugs that allow the
error to occur under certain circumstances.
Regarding the first point, the patch from Arch Linux ARM creates
a condition in which file descriptor 0 is closed by calling close(0)
and left closed when the executed hook does not specify the
"NeedsTargets" option, which is the case for the hook mentioned
above, /usr/share/libalpm/hooks/30-systemd-tmpfiles.hook. As a
result, the first call to open() during the execution of the hook
returns 0 as the file descriptor, because 0 is the lowest currently
available value. This would all go unnoticed, but systemd-tmpfiles
performed assert(fd) checks in its file src/tmpfiles/tmpfiles.c,
which failed because fd equaled 0. These checks seem to have been
removed in the meantime, which made the error message go away, but
the underlying issue still remains.
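To illustrate the first point, here's a minimal sketch that is not
part of this patch. The helper name fd_after_closing_stdin() is made
up for this example, and it assumes a single-threaded process on a
system that provides /dev/null. It closes file descriptor 0 and
reports which descriptor the next open() returns:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo helper, not part of pacman or systemd. */
int fd_after_closing_stdin(void)
{
	/* keep a copy of stdin so that it can be restored afterwards */
	int saved = dup(STDIN_FILENO);
	int fd;

	close(STDIN_FILENO);
	/* open() picks the lowest unused descriptor, which is now 0 */
	fd = open("/dev/null", O_RDONLY);
	if(fd != -1) {
		close(fd);
	}
	if(saved != -1) {
		dup2(saved, STDIN_FILENO);
		close(saved);
	}
	return fd;
}
```

Under those assumptions the helper returns 0, which is a valid file
descriptor. A check written as assert(fd) treats it as a failure,
while a check such as assert(fd >= 0) would accept it.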
Regarding the second point, function _alpm_run_chroot() that executes
hooks in a fork()ed child does not execute dup2() properly, but
instead executes close() followed by dup2(). The man page for dup2()
clearly states that such attempts to re-implement the equivalent
functionality must be avoided, as visible in this quotation:
The dup2() system call performs the same task as dup(), but
instead of using the lowest-numbered unused file descriptor,
it uses the file descriptor number specified in newfd. In other
words, the file descriptor newfd is adjusted so that it now
refers to the same open file description as oldfd.
If the file descriptor newfd was previously open, it is closed
before being reused; the close is performed silently (i.e., any
errors during the close are not reported by dup2()).
The steps of closing and reusing the file descriptor newfd are
performed atomically. This is important, because trying to
implement equivalent functionality using close(2) and dup()
would be subject to race conditions, whereby newfd might be
reused between the two steps. Such reuse could happen because
the main program is interrupted by a signal handler that
allocates a file descriptor, or because a parallel thread
allocates a file descriptor.
As a result, file descriptor 0 can be closed by calling close(0) and
then left closed, because the while loop retries dup2() only on EINTR
and gives up silently when dup2() fails with EBUSY, which results in
the original issue. On top of that, failed attempts to execute dup2()
should be treated as fatal errors instead of being silently ignored.
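The approach described above can be sketched in isolation as follows.
This is only an illustration under stated assumptions, not the code
from this patch: the helper name redirect_fd() is hypothetical, and
it leaves the decision about how to report the failure to the caller.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not part of pacman: make newfd refer to the
 * same open file description as oldfd, without calling close(newfd)
 * first, so that newfd is never left unused in between. */
int redirect_fd(int oldfd, int newfd)
{
	while(dup2(oldfd, newfd) == -1) {
		if(errno != EINTR) {
			/* any other error is final, errno tells the caller why */
			return -1;
		}
		/* interrupted by a signal, try again */
	}
	return 0;
}
```

A caller in a fork()ed child would treat a return value of -1 as
fatal and exit, instead of continuing with a closed or stale
descriptor.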
Let's improve the code to prevent the issues described in the second
point. Not applying the above-mentioned Arch Linux ARM package patch
fixes the issues described in the first point.
While there, perform a minor cleanup as well, to make the formatting
of the code a tiny bit more consistent.
Signed-off-by: Dragan Simic <dsimic(a)manjaro.org>
---
lib/libalpm/util.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/lib/libalpm/util.c b/lib/libalpm/util.c
index dffa3b51..97e87c6c 100644
--- a/lib/libalpm/util.c
+++ b/lib/libalpm/util.c
@@ -639,28 +639,39 @@ int _alpm_run_chroot(alpm_handle_t *handle, const char *cmd, char *const argv[],
if(pid == 0) {
/* this code runs for the child only (the actual chroot/exec) */
- close(0);
- close(1);
- close(2);
- while(dup2(child2parent_pipefd[HEAD], 1) == -1 && errno == EINTR);
- while(dup2(child2parent_pipefd[HEAD], 2) == -1 && errno == EINTR);
- while(dup2(parent2child_pipefd[TAIL], 0) == -1 && errno == EINTR);
- close(parent2child_pipefd[TAIL]);
close(parent2child_pipefd[HEAD]);
close(child2parent_pipefd[TAIL]);
+ while(dup2(child2parent_pipefd[HEAD], STDERR_FILENO) == -1) {
+ if(errno != EINTR) {
+ /* at this point, the child cannot talk through the parent */
+ exit(1);
+ }
+ }
+ while(dup2(parent2child_pipefd[TAIL], STDIN_FILENO) == -1) {
+ if(errno != EINTR) {
+ /* use fprintf() instead of _alpm_log() to send output through the parent */
+ fprintf(stderr, _("could not redirect standard input (%s)\n"), strerror(errno));
+ exit(1);
+ }
+ }
+ close(parent2child_pipefd[TAIL]);
+ while(dup2(child2parent_pipefd[HEAD], STDOUT_FILENO) == -1) {
+ if(errno != EINTR) {
+ fprintf(stderr, _("could not redirect standard output (%s)\n"), strerror(errno));
+ exit(1);
+ }
+ }
close(child2parent_pipefd[HEAD]);
if(cwdfd >= 0) {
close(cwdfd);
}
- /* use fprintf instead of _alpm_log to send output through the parent */
if(chroot(handle->root) != 0) {
fprintf(stderr, _("could not change the root directory (%s)\n"), strerror(errno));
exit(1);
}
if(chdir("/") != 0) {
- fprintf(stderr, _("could not change directory to %s (%s)\n"),
- "/", strerror(errno));
+ fprintf(stderr, _("could not change directory to %s (%s)\n"), "/", strerror(errno));
exit(1);
}
/* bash assumes it's being run under rsh/ssh if stdin is a socket and
--
2.33.1