[arch-general] pacman Segfault on full system update
I have an Arch guest in VirtualBox running on an Arch server. I installed virtualbox-bin 6.1.34-2 from the AUR with the 5.18 patch, and the install on the host went fine. I access the guest headless using rdesktop over the LAN (I have done so for years). I was doing a full system update on the guest, from 5.17.7 to 5.18.6 (about 25 days of updates). Pacman segfaulted, leaving the post-transaction hooks unrun. I reinstalled the kernel and systemd to force the initrd to be regenerated so the guest is bootable. However, I'm not sure what additional hooks were missed.

The guest window when X is started is 1440 x 864, but only a 1024 x 768 portion of the desktop is shown (the rest is black, the taskbar cannot be seen, etc.). After the pacman segfault, the console is now also missing the last 7-8 lines; I have to type clear and start over at the top to see what I'm typing. How can I recover this system?

I was able to rsync the last 1000 lines of pacman.log from the guest to the host, but the shared folders no longer work. The pacman.log summary of how the issue occurred is:

[2022-06-24T02:06:26-0500] [PACMAN] Running 'pacman -Sy --needed archlinux-keyring'
[2022-06-24T02:06:26-0500] [PACMAN] synchronizing package lists
[2022-06-24T02:07:03-0500] [PACMAN] Running 'pacman -Su'
[2022-06-24T02:07:03-0500] [PACMAN] starting full system upgrade
[2022-06-24T02:07:30-0500] [ALPM] running '60-mkinitcpio-remove.hook'...
[2022-06-24T02:07:30-0500] [ALPM] transaction started
[2022-06-24T02:07:30-0500] [ALPM] upgraded iana-etc (20220427-1 -> 20220603-1)
[2022-06-24T02:07:30-0500] [ALPM] warning: /etc/locale.gen installed as /etc/locale.gen.pacnew
[2022-06-24T02:07:31-0500] [ALPM] upgraded glibc (2.35-5 -> 2.35-6)
[2022-06-24T02:07:31-0500] [ALPM-SCRIPTLET] Generating locales...
[2022-06-24T02:07:32-0500] [ALPM-SCRIPTLET] en_US.UTF-8... done
[2022-06-24T02:07:32-0500] [ALPM-SCRIPTLET] Generation complete.
<snipped lots of packages installed>
...
[2022-06-24T02:09:04-0500] [ALPM] upgraded linux-firmware-whence (20220509.b19cbdc-1 -> 20220610.7b71b75-1)

Segfault. Remove db.lock and continue:

[2022-06-24T02:12:57-0500] [PACMAN] Running 'pacman -Syu'
[2022-06-24T02:12:57-0500] [PACMAN] synchronizing package lists
[2022-06-24T02:12:59-0500] [PACMAN] starting full system upgrade
[2022-06-24T02:13:16-0500] [ALPM] transaction started
[2022-06-24T02:13:17-0500] [ALPM] error: could not extract /usr/share/licenses/linux-firmware/LICENCE.nvidia (Zstd decompression failed: Restored data doesn't match checksum)
[2022-06-24T02:13:17-0500] [ALPM] error: problem occurred while upgrading linux-firmware
[2022-06-24T02:13:17-0500] [ALPM] upgraded linux-firmware (20220509.b19cbdc-1 -> 20220610.7b71b75-1)
[2022-06-24T02:13:17-0500] [ALPM] transaction failed

So I delete the firmware package from /var/cache/pacman/pkg and run the upgrade again. To my surprise, it doesn't download the firmware package again; it just considers it installed and proceeds to other packages, e.g.:

[2022-06-24T02:15:49-0500] [PACMAN] Running 'pacman -Syu'
[2022-06-24T02:15:49-0500] [PACMAN] synchronizing package lists
[2022-06-24T02:15:51-0500] [PACMAN] starting full system upgrade
[2022-06-24T02:16:03-0500] [ALPM] transaction started
[2022-06-24T02:16:03-0500] [ALPM] upgraded python (3.10.4-1 -> 3.10.5-1)
[2022-06-24T02:16:09-0500] [ALPM] upgraded linux-headers (5.17.7.arch1-1 -> 5.18.6.arch1-1)
[2022-06-24T02:16:11-0500] [ALPM] error: could not extract /usr/lib/mysql/plugin/client_ed25519.so (Zstd decompression failed: Restored data doesn't match checksum)
[2022-06-24T02:16:11-0500] [ALPM] error: problem occurred while upgrading mariadb-libs
[2022-06-24T02:16:11-0500] [ALPM] upgraded mariadb-libs (10.7.3-1 -> 10.8.3-1)
[2022-06-24T02:16:11-0500] [ALPM] transaction failed

Crash again; delete mariadb-libs from /var/cache... and try again:

[2022-06-24T02:19:16-0500] [PACMAN] Running 'pacman -Syu'
[2022-06-24T02:19:16-0500] [PACMAN] synchronizing package lists
[2022-06-24T02:19:17-0500] [PACMAN] starting full system upgrade
[2022-06-24T02:19:26-0500] [ALPM] transaction started
[2022-06-24T02:19:26-0500] [ALPM] upgraded mariadb-clients (10.7.3-1 -> 10.8.3-1)
[2022-06-24T02:19:27-0500] [ALPM] warning: directory permissions differ on /usr/lib/mysql/plugin/auth_pam_tool_dir/ filesystem: 700 package: 755
[2022-06-24T02:19:27-0500] [ALPM] upgraded mariadb (10.7.3-1 -> 10.8.3-1)
[2022-06-24T02:19:27-0500] [ALPM-SCRIPTLET] :: MariaDB was updated to a new feature release. To update the data run:
[2022-06-24T02:19:27-0500] [ALPM-SCRIPTLET] systemctl restart mariadb.service && mariadb-upgrade -u root -p
[2022-06-24T02:19:27-0500] [ALPM] warning: /etc/pacman.d/mirrorlist installed as /etc/pacman.d/mirrorlist.pacnew
[2022-06-24T02:19:27-0500] [ALPM] upgraded pacman-mirrorlist (20220501-1 -> 20220605-1)
<snip lots more packages installed>
...
[2022-06-24T02:19:34-0500] [ALPM] upgraded tmux (3.2_a-1 -> 3.3_a-2)
[2022-06-24T02:19:35-0500] [ALPM] error: could not extract /usr/share/doc/valgrind/valgrind_manual.ps (Zstd decompression failed: Restored data doesn't match checksum)
[2022-06-24T02:19:35-0500] [ALPM] error: problem occurred while upgrading valgrind
[2022-06-24T02:19:35-0500] [ALPM] upgraded valgrind (3.19.0-3 -> 3.19.0-4)
[2022-06-24T02:19:35-0500] [ALPM] transaction failed

Crash again.
Rinse, repeat and try again:

[2022-06-24T02:20:14-0500] [PACMAN] Running 'pacman -Syu'
[2022-06-24T02:20:14-0500] [PACMAN] synchronizing package lists
[2022-06-24T02:20:15-0500] [PACMAN] starting full system upgrade
[2022-06-24T02:20:20-0500] [ALPM] transaction started
[2022-06-24T02:20:21-0500] [ALPM] upgraded vim-runtime (8.2.4827-1 -> 8.2.5046-2)
[2022-06-24T02:20:21-0500] [ALPM] upgraded vim (8.2.4827-1 -> 8.2.5046-2)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xbitmaps (1.1.2-2 -> 1.1.2-3)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xcb-util-cursor (0.1.3-3 -> 0.1.3-4)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xcursor-themes (1.0.6-2 -> 1.0.6-3)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xorg-bdftopcf (1.1-2 -> 1.1-3)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xorg-font-util (1.3.2-2 -> 1.3.2-3)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xorg-server-common (21.1.3-6 -> 21.1.3-7)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xorg-server (21.1.3-6 -> 21.1.3-7)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xorg-xcursorgen (1.0.7-2 -> 1.0.7-3)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xorg-xmessage (1.0.5-2 -> 1.0.5-3)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xorg-xmodmap (1.0.10-2 -> 1.0.10-3)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xorg-xsetroot (1.1.2-2 -> 1.1.2-3)
[2022-06-24T02:20:21-0500] [ALPM] upgraded xterm (372-1 -> 372-2)
[2022-06-24T02:20:21-0500] [ALPM] upgraded zsh (5.8.1-2 -> 5.9-1)
[2022-06-24T02:20:24-0500] [ALPM] transaction completed
[2022-06-24T02:20:24-0500] [ALPM] running '30-systemd-update.hook'...
[2022-06-24T02:20:24-0500] [ALPM] running 'fontconfig.hook'...
[2022-06-24T02:20:25-0500] [ALPM] running 'gtk-update-icon-cache.hook'...
[2022-06-24T02:20:25-0500] [ALPM] running 'update-desktop-database.hook'...
[2022-06-24T02:20:26-0500] [ALPM] running 'xorg-mkfontscale.hook'...
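In the excerpts above, every extraction failure is still followed by an "upgraded <pkg>" entry, which appears to be why pacman treated linux-firmware as installed after its cached package was deleted: the local database had already recorded the new version. A minimal sketch for collecting every package affected this way from pacman.log, assuming the log line format shown above; the function name `failed_upgrades` is hypothetical and not part of pacman:

```shell
# failed_upgrades [LOGFILE]
# Print, once each, the packages named in lines of the form
#   [...] [ALPM] error: problem occurred while upgrading <pkg>
# LOGFILE defaults to /var/log/pacman.log. Source this file, then call the function.
failed_upgrades() {
    log=${1:-/var/log/pacman.log}
    # sed -n ... /p prints only lines where the substitution matched,
    # replacing each such line with the captured package name.
    sed -n 's/.*\[ALPM] error: problem occurred while upgrading \([^ ]*\).*/\1/p' "$log" |
        sort -u   # a package can fail more than once; list it only once
}
```

The function scans the whole log, so older failures that were fixed long ago will show up too; passing a file that holds only the relevant date range avoids that. Assuming the listed packages are still in the official repositories, running `pacman -S $(failed_upgrades)` as root should reinstall them, and, as the later reinstall of linux in this log shows, a reinstall also runs the hooks that match the package's files.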
Now the system has many hooks that were not run. At this point I reinstall the kernel to force an image to be created, and it succeeds:

[2022-06-24T02:24:31-0500] [PACMAN] Running 'pacman -U /var/cache/pacman/pkg/linux-5.18.6.arch1-1-x86_64.pkg.tar.zst'
[2022-06-24T02:24:37-0500] [ALPM] transaction started
[2022-06-24T02:24:38-0500] [ALPM] reinstalled linux (5.18.6.arch1-1)
[2022-06-24T02:24:40-0500] [ALPM] transaction completed
[2022-06-24T02:24:41-0500] [ALPM] running '30-systemd-update.hook'...
[2022-06-24T02:24:41-0500] [ALPM] running '60-depmod.hook'...
[2022-06-24T02:24:46-0500] [ALPM] running '90-mkinitcpio-install.hook'...
<snip -- succeeds without issue>
[2022-06-24T02:25:05-0500] [ALPM-SCRIPTLET] ==> Creating zstd-compressed initcpio image: /boot/initramfs-linux-fallback.img
[2022-06-24T02:25:06-0500] [ALPM-SCRIPTLET] ==> Image generation successful

Reinstall systemd to force it to run its hooks, and vim to see if I can fix "can't allocate for color Orange", etc.:

[2022-06-24T02:46:41-0500] [PACMAN] Running 'pacman -S systemd systemd-libs vim vim-runtime'
[2022-06-24T02:46:53-0500] [ALPM] transaction started
[2022-06-24T02:46:53-0500] [ALPM] reinstalled systemd-libs (251.2-1)
[2022-06-24T02:46:54-0500] [ALPM] reinstalled systemd (251.2-1)
[2022-06-24T02:46:55-0500] [ALPM] reinstalled vim-runtime (8.2.5046-2)
[2022-06-24T02:46:55-0500] [ALPM] reinstalled vim (8.2.5046-2)
[2022-06-24T02:46:56-0500] [ALPM] transaction completed
[2022-06-24T02:46:56-0500] [ALPM] running '20-systemd-sysusers.hook'...
[2022-06-24T02:46:56-0500] [ALPM] running '30-systemd-catalog.hook'...
[2022-06-24T02:46:56-0500] [ALPM] running '30-systemd-daemon-reload.hook'...
[2022-06-24T02:46:56-0500] [ALPM] running '30-systemd-hwdb.hook'...
[2022-06-24T02:46:57-0500] [ALPM] running '30-systemd-sysctl.hook'...
[2022-06-24T02:46:57-0500] [ALPM] running '30-systemd-tmpfiles.hook'...
[2022-06-24T02:46:57-0500] [ALPM] running '30-systemd-udev-reload.hook'...
[2022-06-24T02:46:57-0500] [ALPM] running '30-systemd-update.hook'...
[2022-06-24T02:46:57-0500] [ALPM] running '90-mkinitcpio-install.hook'...

So after all of this manual work, I can start the VM and log in as a user or as root, but I'm not sure what got missed. Is there any way to know which hooks remain unrun, or which packages the system thinks were installed but were not actually installed? Why do X and the console only show 1024 x 768 of the guest even though the window (which is normally filled by the VM) is 1440 x 864? What do I need to look at or test to determine what shape my install is in?

Sorry for the long post, but without the logs it would have been impossible to explain what happened or where in the upgrade process it happened. Let me know if there is anything else I can post that will help.

--
David C. Rankin, J.D.,P.E.
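On the question of which packages pacman considers installed but whose files are not actually on disk, pacman's own query check can help: `pacman -Qk` verifies that every file owned by each installed package exists, `pacman -Qkk` additionally compares permissions, sizes and modification times for packages that ship mtree data, and adding `-q` makes the check print only "package path" pairs for missing files. A minimal sketch that reduces the `pacman -Qkq` output to a list of packages to reinstall; the helper name `pkgs_with_missing_files` is hypothetical:

```shell
# pkgs_with_missing_files
# Reads `pacman -Qkq` output on standard input (one "<package> <missing-file>"
# pair per line) and prints each affected package name once.
pkgs_with_missing_files() {
    # Only field 1 (the package name) is used, so file paths that contain
    # spaces do not matter; blank lines are skipped.
    awk 'NF > 0 { print $1 }' | sort -u
}
```

Usage would be `pacman -Qkq | pkgs_with_missing_files`, and after reviewing the list, `pacman -S` on those names as root. This is only a sketch: it will not catch a file that exists but has corrupted content, it does not say which hooks were skipped (reinstalling a package re-runs the hooks that match it), and foreign packages such as those from the AUR (listed by `pacman -Qm`) cannot be reinstalled with `pacman -S` and would need to be rebuilt.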
On Fri, 24 Jun 2022 at 10:54, David C. Rankin via arch-general <arch-general@lists.archlinux.org> wrote:
I have an Arch guest in VirtualBox running on an Arch server. I installed virtualbox-bin 6.1.34-2 from AUR with 5.18 patch and the install on the host went fine.
I access the guest headless using rdesktop over the LAN. (have done so for years). Doing a full system update from 5.17.7 to 5.18.6 (about 25 days of updates) on guest.
Pacman segfaulted leaving post transaction hooks unrun. I reinstalled the kernel and systemd to force the initrd to be made so the guest is bootable.
Perhaps you already did so, but did you check the health of the host (filesystem and RAM)? You mentioned a couple of errors, like segfaults and decompression errors. Especially the last one sounds like corruption somewhere... As these errors happened in a VM, is it possible to copy the virtual hard disk to another host (hardware) and try again? To be honest, the RAM is my first suspect, but it could very well be an I/O error on the underlying host. I really doubt that it's caused by the installed software.

Kind regards,
Guus Snijders
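One way to narrow down where such corruption happens is to test the cached package archives directly inside the guest. A minimal sketch, assuming the zstd command-line tool is installed (its `-t` option tests integrity without writing output and `-q` keeps it quiet); the function name `check_pkg_cache` is hypothetical:

```shell
# check_pkg_cache [DIR]
# Test every *.pkg.tar.zst in DIR (default /var/cache/pacman/pkg) and print
# the path of each archive that fails. An archive is also reported if the
# zstd command itself cannot be run.
check_pkg_cache() {
    dir=${1:-/var/cache/pacman/pkg}
    for f in "$dir"/*.pkg.tar.zst; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # zstd -t decompresses in memory only and exits non-zero on error.
        if ! zstd -tq "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

If every archive passes while pacman still reports "Zstd decompression failed" on files from them, the data on disk is probably intact and the problem more likely lies in the guest's memory or the hypervisor; an archive that fails can simply be deleted and downloaded again.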
On 6/24/22 04:11, Guus Snijders via arch-general wrote:
Perhaps you already did so, but did you check the health of the host (filesystem & RAM)?
The host filesystem and RAM are 100% good. The RAID arrays are scrubbed and mismatch_cnt is zero. The filesystem checks out fine. There are no RAM errors or MCEs of any type on the host.
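mismatch_cnt is the counter that Linux software RAID (md) exposes in sysfs after a scrub. A minimal sketch for reading it for every array, assuming md RAID and the standard /sys/block/mdX/md/ layout; the function name `md_mismatch_counts` is hypothetical:

```shell
# md_mismatch_counts [SYSFS_BLOCK_DIR]
# Print "<array> <mismatch_cnt>" for every md array under SYSFS_BLOCK_DIR
# (default /sys/block). The counter is updated by a "check" scrub, which is
# started per array (as root, mdX being a placeholder) with:
#   echo check > /sys/block/mdX/md/sync_action
md_mismatch_counts() {
    root=${1:-/sys/block}
    for f in "$root"/md*/md/mismatch_cnt; do
        # If no arrays exist, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # Path is <root>/<array>/md/mismatch_cnt; strip two levels to get <array>.
        array=$(basename "$(dirname "$(dirname "$f")")")
        printf '%s %s\n' "$array" "$(cat "$f")"
    done
}
```

The count is only meaningful once a check has finished; `cat /proc/mdstat` shows whether one is still running.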
You mentioned a couple of errors, like segfaults and decompression errors. Especially that last one sounds like a corruption somewhere....
As these errors happened in a VM, is it possible to copy the virtual hard disk to another host (hardware) and try again?
I haven't done that before, but there is no reason I wouldn't be able to move the install to hardware. If I recall correctly, VirtualBox even has a specific procedure for that.
To be honest: the RAM is my first suspect, but it could very well be an io error on the underlying host. I really doubt that it's caused by the installed software.
I suspect it is a VM guest issue. This guest has been running since VirtualBox 4.x, so it is quite possible there is a guest issue. But given the issues that have cropped up since 5.17, with the display no longer showing the full extent of the desktop, there is something going on with this VM. That said, other guests show the same behavior, so it may be a VirtualBox issue itself.

--
David C. Rankin, J.D.,P.E.
On Fri, 24 Jun 2022 at 14:34, David C. Rankin via arch-general <arch-general@lists.archlinux.org> wrote:
You mentioned a couple of errors, like segfaults and decompression errors. Especially that last one sounds like a corruption somewhere....
There are several recent threads in the forums about segfaults and corruption issues in VirtualBox. This is the thread with the most posts: https://bbs.archlinux.org/viewtopic.php?id=276883
Hi David, curious whether you've tried, or would be willing to try, KVM. If it works, it might be a better path forward, though clearly it takes some work to migrate.

gene
On Sat, 25 Jun 2022 07:54:24 -0400, Genes Lists via arch-general wrote:
Hi David - curious if you've tried or willing to try using kvm - if it works might be a better path forward; though clearly some work to migrate.
Hi,

I run a Debian Edu 11 guest in QEMU/KVM via Virtual Machine Manager on an Arch Linux host, but for good reasons I stay with VirtualBox for my Windows 11 and older Windows guests. One of the reasons to stay with VirtualBox is file sharing between guest and host.

However, I'm using aur/virtualbox-bin with an Arch host running 4.19 rt-patched LTS kernels [1]. At least my Windows 7, 10 and 11 guests don't suffer from issues with either virtualbox-bin 6.1.34-1 or 6.1.34-2.

I don't know if it is fixed yet, but the reasons I build my own 4.19 LTS kernels are:

1. The host doesn't work with new kernels and the "intel" driver. Migrating to the "modesetting" driver is a PITA; no help for me. However, older 5+ kernels should still work with the "intel" driver.
2. Those "good" 5+ kernels with the rt patch applied don't work with DKMS to build the VirtualBox modules.

Disclaimer: It might be fixed; I just didn't test 5.18.6.arch1-1 and a lot of earlier Arch kernel packages.

My point is that temporarily building current 4.19 kernels for the host and/or the guest might solve the issue for other users, too. Note that 4.19, with or without the rt patch, is still LTS according to https://www.kernel.org/ , and 4.19 is also SuperLTS until 2029 according to https://wiki.linuxfoundation.org/civilinfrastructureplatform/start .

Regards,
Ralf

[1]
[rocketmouse@archlinux ~]$ grep virtualbox-bin\ \( /var/log/pacman.log | grep up | tail -2
[2022-04-21T05:54:09+0200] [ALPM] upgraded virtualbox-bin (6.1.32-1 -> 6.1.34-1)
[2022-06-18T03:24:03+0200] [ALPM] upgraded virtualbox-bin (6.1.34-1 -> 6.1.34-2)
[rocketmouse@archlinux ~]$ pacman -Q linux{,-rt{,-cornflower,-pussytoes,-securityink}} | cut -d' ' -f2
5.18.6.arch1-1
4.19.246_rt110-0.1000
4.19.245_rt109-0.300
4.19.240_rt108-0.300
4.19.237_rt107-0.300
On Fri, 24 Jun 2022 at 15:34, David C. Rankin via arch-general <arch-general@lists.archlinux.org> wrote:
On 6/24/22 04:11, Guus Snijders via arch-general wrote:
Perhaps you already did so, but did you check the health of the host (filesystem & RAM)?
The host filesystem and RAM are 100% good. The RAID arrays are scrubbed and mismatch_cnt is zero. The filesystem checks out fine. There are no RAM errors or MCEs of any type on the host.
Good work! I'll admit that my experience with VMs is limited; hypervisor bugs didn't cross my mind at first.
You mentioned a couple of errors, like segfaults and decompression errors.
Especially that last one sounds like a corruption somewhere....
As these errors happened in a VM, is it possible to copy the virtual hard disk to another host (hardware) and try again?
I haven't done that before, but no reason I wouldn't be able to move the install to hardware. If I recall VirtualBox even has a specific procedure to do that.
Apologies, I meant moving the guest to a (VirtualBox) hypervisor on another (physical) machine, still running as a VM. After reading the forum thread on more issues with VirtualBox, I doubt that this will make a difference, though.
To be honest: the RAM is my first suspect, but it could very well be an I/O error on the underlying host. I really doubt that it's caused by the installed software.
I suspect it is a VM guest issue. [...]there is something going on with this VM. Though other guests show the same behavior, so it may be a VirtualBox issue itself.
Indeed, this could very well be a software issue after all. I guess the logs on the host (outside the VM) don't give any clues (errors, warnings)?

Kind regards,
Guus Snijders
participants (5)
- David C. Rankin
- Genes Lists
- Guus Snijders
- Piscium
- Ralf Mardorf