On 2023-08-21 18:19, Andrew Gregory wrote:
On 08/21/23 at 04:42am, Dragan Simic wrote:
Running "pacman -Syyu" to perform a full system update resulted in a rather conspicuous error message in the output of pacman:
( 3/18) Creating temporary files... Assertion 'fd' failed at src/tmpfiles/tmpfiles.c:843, function fd_set_perms(). Aborting. /usr/share/libalpm/scripts/systemd-hook: line 28: 1735 Aborted (core dumped) /usr/bin/systemd-tmpfiles --create error: command failed to execute correctly
Expectedly, running "systemd-tmpfiles --create" manually afterwards went just fine and resulted in no errors, leading to a conclusion that executing /usr/share/libalpm/hooks/30-systemd-tmpfiles.hook failed, but only when it was run from within pacman.
After a detailed and rather lengthy investigation, it turned out the code in lib/libalpm/util.c that executes the hooks by forking a child has some bugs that allow the error to occur under certain circumstances. In particular, function _alpm_run_chroot() that executes hooks in a fork()ed child does not employ dup2() properly, but instead executes close() followed by dup2().
The man page for dup2() clearly states in the quotation below that attempts to re-implement the equivalent functionality, which is the case in function _alpm_run_chroot(), must be avoided:
The dup2() system call performs the same task as dup(), but instead of using the lowest-numbered unused file descriptor, it uses the file descriptor number specified in newfd. In other words, the file descriptor newfd is adjusted so that it now refers to the same open file description as oldfd.
If the file descriptor newfd was previously open, it is closed before being reused; the close is performed silently (i.e., any errors during the close are not reported by dup2()).
The steps of closing and reusing the file descriptor newfd are performed atomically. This is important, because trying to implement equivalent functionality using close(2) and dup() would be subject to race conditions, whereby newfd might be reused between the two steps. Such reuse could happen because the main program is interrupted by a signal handler that allocates a file descriptor, or because a parallel thread allocates a file descriptor.
As a result, a condition can occur in which the file descriptor 0 is closed by calling close(0), and left closed after the while loop that fails to execute dup2() because of receiving EBUSY, resulting in the described issues. Also, failed attempts to execute dup2() should be treated as fatal errors instead of being silently ignored.
I am trying to reproduce this to wrap my head around what's actually happening and verify that this actually fixes it, but I am unable to get pacman to execute a hook with fd 0 closed with or without the ARM patches. Can you provide a reproducible case?
Ah, I know very well how it feels, I spent at least a couple of days hunting the issue down and fixing it. One of the issues with reproducing the error is that you must have an old systemd version installed, which still contains assert(fd) checks in src/tmpfiles/tmpfiles.c that actually trigger the error. IIRC, systemd removed those checks about a couple of years ago, presumably to fix similar complaints. You may also want to see my original report of the issue, which also contains the fix: https://gitlab.manjaro.org/manjaro-arm/packages/core/pacman/-/issues/1