Problem generating the telegraf package in a chroot
Hello folks,
Let me tell you, I've been maintaining the Telegraf[1] package in AUR for some time now, and whenever I update it (or generate it), I have to do it in a container because it's not possible with the `extra-x86_64-build` command.
If you look at the PKGBUILD, there is a point where the configuration is generated with the newly created binary itself. It is the last line of `build()`, `./build/telegraf config > telegraf.conf`. It is at this point that the execution stops completely if I build the package with `extra-x86_64-build`.
Can you think of any reason why this might happen or how it could be fixed? This step is necessary because there is no other way to generate the base configuration.
Best regards.
[1]: https://aur.archlinux.org/packages/telegraf
-- 
Óscar García Amor | ogarcia at moire.org | http://ogarcia.me
On 9/30/25 9:28 AM, Óscar García Amor wrote:
Hello folks,
Hey
Let me tell you, I've been maintaining the Telegraf[1] package in AUR for some time now, and whenever I update it (or generate it), I have to do it in a container because it's not possible with the `extra-x86_64-build` command.
If you look at the PKGBUILD, there is a point where the configuration is generated with the newly created binary itself. It is the last line of `build()`, `./build/telegraf config > telegraf.conf`. It is at this point that the execution stops completely if I build the package with `extra-x86_64-build`.
Can you think of any reason why this might happen or how it could be fixed? This step is necessary because there is no other way to generate the base configuration.
For some reason it seems like, despite the "config" argument (which you would expect to solely generate the config), the telegraf binary runs as a daemon and therefore never stops unless you kill it. The `--once` flag, which according to the help message should prevent this from happening, does not help here.
A sketchy workaround is to run `timeout 2s ./build/telegraf config > telegraf.conf || true`
The `timeout` command will kill the process after 2s (to ensure it doesn't run endlessly), and the `|| true` part ensures the build doesn't fail because of the non-zero exit code produced when `timeout` kills the process. As I said, it's a bit of a sketchy workaround, but it should allow the build to succeed in a clean chroot.
I guess this should probably be reported and properly fixed upstream :)
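To see why this workaround still leaves a usable `telegraf.conf` behind, here is a small simulation. It assumes (as Dennis's analysis further down the thread also observes) that the binary writes the complete config to stdout before it hangs. The hanging command is a hypothetical stand-in built from a `bash -c` one-liner, not telegraf itself, and the limit is shortened to 1s for the demo.

```shell
#!/usr/bin/env bash
# Demo of `timeout ... > file || true` using a HYPOTHETICAL stand-in for
# `./build/telegraf config`: it prints a config snippet, then never exits.
set -u

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
conf="$workdir/telegraf.conf"

# Stand-in command: write output first, then hang far longer than the limit.
hanging_cmd=(bash -c 'printf "[agent]\n  interval = \"10s\"\n"; sleep 60')

# The workaround: cap the runtime, and ignore the non-zero status (124)
# that `timeout` returns when it has to kill the command.
timeout 1s "${hanging_cmd[@]}" > "$conf" || true

# Whatever was written before the hang is already in the file.
cat "$conf"
```

The trade-off is that `|| true` also swallows genuine failures, so an empty or partial config would go unnoticed; the stricter variant suggested later in the thread addresses that.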
Best regards.
-- Regards, Robin Candau / Antiz
On Tue, Sep 30, 2025 at 10:43:19AM +0200, Robin Candau wrote:
For some reason it seems like, despite the "config" argument (which you would expect to solely generate the config), the telegraf binary runs as a daemon and therefore never stops unless you kill it. The `--once` flag, which according to the help message should prevent this from happening, does not help here.
A sketchy workaround is to run `timeout 2s ./build/telegraf config > telegraf.conf || true`
The timeout command will kill the process after 2s (to ensure it doesn't run endlessly) and the `|| true` part will ensure the build doesn't fail due to the error exit code produced by the process being killed by `timeout`. As I said, it's a bit of a sketchy workaround but should allow the build to succeed in a clean chroot.
In similar cases, I've found it useful to do something like:
timeout 2s ... || [ "$?" -eq 124 ]
...or whatever code the previous command will exit with in the timeout case. This stops the execution of the whole thing if the program exits with a different code, i.e. if it encounters an error other than the timeout.
(Of course, if this is certain to be Bash, [[ $? -eq 124 ]] is valid too.)
G'luck,
Peter
-- 
Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
PGP key: https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
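Peter's variant can be wrapped in a small function so the intent is explicit. This is only a sketch: `run_allowing_timeout` is a hypothetical helper name (nothing makepkg provides), and it relies on GNU coreutils `timeout` exiting with status 124 when the time limit is reached (which holds as long as `--preserve-status` is not used).

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: run a command under `timeout`, treat "killed because
# the time limit was reached" (exit status 124) as success, and return a
# failure status for any other non-zero exit.
run_allowing_timeout() {
  local limit=$1
  shift
  # In `a || b`, "$?" inside b is the exit status of a (here: timeout).
  timeout "$limit" "$@" || [ "$?" -eq 124 ]
}

# Hypothetical use inside a PKGBUILD's build(), mirroring the thread:
#   run_allowing_timeout 2s ./build/telegraf config > telegraf.conf

# Self-check with harmless commands:
run_allowing_timeout 1s sleep 5 && echo "timeout tolerated"
run_allowing_timeout 1s false || echo "real failure reported"
```

One design consequence: on a real failure the helper returns 1 (the status of `[`), not the command's original exit code; if the exact code matters, save `$?` into a variable before testing it.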
On 2025-09-30 09:28, Óscar García Amor wrote:
It is the last line of `build()`, `./build/telegraf config > telegraf.conf`. It is at this point that the execution stops completely if I build the package with `extra-x86_64-build`.
Can you think of any reason why this might happen or how it could be fixed?
I've reproduced the issue, and boiled it down to a likely interaction of systemd-nspawn and go's threading/process handling. systemd-nspawn is used by extra-x86_64-build implicitly to containerize the build.
The call of `telegraf config` does output, but hangs on termination in an epoll_pwait/futex wait loop as described in [issue 55120](https://github.com/golang/go/issues/55120).
I boiled the reproduction down to: run the build once until it hangs, press Ctrl-C to abort it, then run `telegraf config` manually in a minimal systemd-nspawn container. It always hangs. I also installed strace in the container root to see where exactly the process hangs, which is how I found the unresolved Go issue 55120:
```
sudo systemd-nspawn -D /var/lib/archbuild/extra-x86_64/gyroplast /build/telegraf/src/telegraf-1.36.2/build/telegraf config
sudo systemd-nspawn -D /var/lib/archbuild/extra-x86_64/gyroplast strace -x -y -v -ff /build/telegraf/src/telegraf-1.36.2/build/telegraf config
```
Unfortunately I'm running out of time to look into this further, but as this is easily reproducible, someone else with better knowledge of Go and/or systemd-nspawn peculiarities may pick it up from here. My gut says this may be a systemd-nspawn configuration issue affecting Go threading or signal handling (there's a SIGURG passed between processes), hindering Go's process cleanup on termination.
Using `taskset -a -c 1 ./build/telegraf config` does NOT work around the issue; the binary still hangs in the same way.
Here is my `strace -xyffv` excerpt of the periodically repeated hang loop, one period:
```
<unfinished ...>
[pid 5] <... epoll_pwait resumed>, [], 128, 999, NULL, 0) = 0
[pid 4] <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 5] futex(0x5f2c52e08130, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 4] sched_yield( <unfinished ...>
[pid 5] <... futex resumed>) = 0
[pid 4] <... sched_yield resumed>) = 0
[pid 5] openat(AT_FDCWD</>, "/proc/3/stat", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 4] sched_getaffinity(0, 8192 <unfinished ...>
[pid 5] <... openat resumed>) = 7</proc/3/stat>
[pid 4] <... sched_getaffinity resumed>, [1]) = 8
[pid 5] fcntl(7</proc/3/stat>, F_GETFL <unfinished ...>
[pid 4] pread64(3</sys/fs/cgroup/cpu.max> <unfinished ...>
[pid 5] <... fcntl resumed>) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
[pid 4] <... pread64 resumed>, "max 100000\n", 64, 0) = 11
[pid 5] fcntl(7</proc/3/stat>, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE <unfinished ...>
[pid 4] futex(0x5f2c52dfdc50, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 5] <... fcntl resumed>) = 0
[pid 3] <... futex resumed>) = 0
[pid 4] <... futex resumed>) = 1
[pid 3] epoll_ctl(5<anon_inode:[eventpoll]>, EPOLL_CTL_ADD, 7</proc/3/stat>, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data=0x7ddb66bb6c000015} <unfinished ...>
[pid 5] futex(0xc0000e3158, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 3] <... epoll_ctl resumed>) = -1 EPERM (Operation not permitted)
[pid 4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid 3] fcntl(7</proc/3/stat>, F_GETFL) = 0x8800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE)
[pid 4] <... nanosleep resumed>, NULL) = 0
[pid 3] fcntl(7</proc/3/stat>, F_SETFL, O_RDONLY|O_LARGEFILE <unfinished ...>
[pid 4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid 3] <... fcntl resumed>) = 0
[pid 3] fstat(7</proc/3/stat> <unfinished ...>
[pid 4] <... nanosleep resumed>, NULL) = 0
[pid 3] <... fstat resumed>, {st_dev=makedev(0, 0x6c), st_ino=267663, st_mode=S_IFREG|0444, st_nlink=1, st_uid=0, st_gid=0, st_blksize=1024, st_blocks=0, st_size=0, st_atime=1759226607 /* 2025-09-30T12:03:27.583355914+0200 */, st_atime_nsec=583355914, st_mtime=1759226607 /* 2025-09-30T12:03:27.583355914+0200 */, st_mtime_nsec=583355914, st_ctime=1759226607 /* 2025-09-30T12:03:27.583355914+0200 */, st_ctime_nsec=583355914}) = 0
[pid 4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid 3] read(7</proc/3/stat>, "3 (telegraf) R 1 1 1 34816 1 419"..., 512) = 317
[pid 4] <... nanosleep resumed>, NULL) = 0
[pid 3] read(7</proc/3/stat> <unfinished ...>
[pid 4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid 3] <... read resumed>, "", 579) = 0
[pid 3] close(7</proc/3/stat> <unfinished ...>
[pid 4] <... nanosleep resumed>, NULL) = 0
[pid 3] <... close resumed>) = 0
[pid 4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid 3] sysinfo({uptime=8492, loads=[26912, 310112, 222848], totalram=33252589568, freeram=13363355648, sharedram=1062043648, bufferram=482197504, totalswap=17178816512, freeswap=17178816512, procs=1240, totalhigh=0, freehigh=0, mem_unit=1}) = 0
[pid 4] <... nanosleep resumed>, NULL) = 0
[pid 3] epoll_pwait(5<anon_inode:[eventpoll]> <unfinished ...>
[pid 4] nanosleep({tv_sec=0, tv_nsec=20000} <unfinished ...>
[pid 3] <... epoll_pwait resumed>, [], 128, 0, NULL, 0) = 0
[pid 3] epoll_pwait(5<anon_inode:[eventpoll]> <unfinished ...>
[pid 4] <... nanosleep resumed>, NULL) = 0
[pid 4] futex(0x5f2c52e08130, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=999750431}
```
Good luck, anyone else!
— Dennis
On 9/30/25 12:08 PM, Dennis Herbrich wrote:
I've reproduced the issue, and boiled it down to a likely interaction of systemd-nspawn and go's threading/process handling. systemd-nspawn is used by extra-x86_64-build implicitly to containerize the build.
The call of `telegraf config` does output, but hangs on termination in an epoll_pwait/futex wait loop as described in [issue 55120](https://github.com/golang/go/issues/55120).
I only took a quick look at the problem and wrongly guessed in my initial response [1] that the telegraf binary was, for some reason, launching itself in "daemon" mode, but the above analysis feels way more accurate than my wild guess! 😄 Thanks for the research and the detailed answer!
The (somewhat sketchy) workaround I gave in my initial mail still stands though (while waiting for this systemd-nspawn + go process handling issue to be solved).
Good luck, anyone else!
— Dennis
[1] https://lists.archlinux.org/archives/list/aur-general@lists.archlinux.org/me... -- Regards, Robin Candau / Antiz
On Tue, 30-09-2025 at 12:17 +0200, Robin Candau wrote:
The (somewhat sketchy) workaround I gave in my initial mail still stands though (while waiting for this systemd-nspawn + go process handling issue to be solved).
I have tested the workaround and it works correctly. At least we have a temporary solution until the problem is fixed upstream.
Thank you very much again!
Best regards.
-- 
Óscar García Amor | ogarcia at moire.org | http://ogarcia.me
On 9/30/25 3:17 PM, Óscar García Amor wrote:
I have tested the workaround and it works correctly. At least we have a temporary solution until the problem is fixed upstream.
Cool! This is a bit of a hack, but at least it allows the package to be built in a clean chroot, which is great :)
Thank you very much again!
You're welcome! Glad I could help :D
Best regards.
-- Regards, Robin Candau / Antiz
On Tue, 30-09-2025 at 12:08 +0200, Dennis Herbrich wrote:
I've reproduced the issue [..]
Thank you both very much for your analysis. It is clear that the problem arises from the interaction of systemd-nspawn and Go, but as you rightly point out, it is not clear which of the two should fix it. Let's see if someone with more knowledge can give us a hand.
The (somewhat sketchy) workaround I gave in my initial mail still stands though (while waiting for this systemd-nspawn + go process handling issue to be solved).
I'll have to check it out, but it could be a good workaround in the meantime.
Good luck, anyone else!
Thanks again! -- Óscar García Amor | ogarcia at moire.org | http://ogarcia.me
participants (4)
- Dennis Herbrich
- Peter Pentchev
- Robin Candau
- Óscar García Amor