[arch-general] Update to 4.15.8 on dual quad-core box locked on ( 3/16) Install DKMS modules, need help resurrecting
All,

I experienced a hard lockup during a kernel update to 4.15.8 on a Supermicro Dual Opteron Quad-core box. I've updated this box 50 times without issue, but something caused a hard lock. The filesystems are on mdadm Linux RAID 1 partitions. The hard lock occurred after the packages were installed, during post-processing at:

( 3/16) Install DKMS modules

Now on boot I receive:

Warning: /lib/modules/4.15.8-1-ARCH/modules.devname not found - ignoring
starting version 237
ERROR: device `UUID=c7492ac0-e805...` not found. Skipping fsck.
mount: /new_root: can't find UUID UUID=c7492ac0-e805...
You are being dropped into an emergency shell.
sh: can't access tty; job control turned off
[rootfs ]#

(and the box hardlocks)

So I downloaded the 201803 iso to try and fix the box. I have to boot from CD, since this box does not boot from USB. So I burned the .iso to CD (making sure the CD label is `ARCH_201803`) and booted the box again in an attempt to fix it. All goes well until...

:: Mounting '/dev/disk/by-label/ARCH_201803' to '/run/archiso/bootmnt'
Waiting 30 seconds for device /dev/disk/by-label/ARCH_201803 ...
ERROR: '/dev/disk/by-label/ARCH_201803' device did not show up after 30 seconds ...
   Falling back to interactive prompt
   You can try to fix the problem manually, log out when you are finished
sh: can't access tty; job control turned off
[rootfs ]#

(thankfully this prompt is not hardlocked)

This is bizarre. I've created the iso, the sha1sums are correct, and the CD label is 'ARCH_201803', but the iso won't boot. I've researched, but these solutions don't solve the problem:

https://bbs.archlinux.org/viewtopic.php?id=195671
https://superuser.com/questions/519784/error-installing-arch-linux
https://bugs.launchpad.net/bugs/1318400

Checking /dev/disk from the recovery prompt, there is no "by-label" directory under /dev/disk to begin with. Attempting to create 'by-label' and symlinking /dev/sr0 to /dev/disk/by-label/ARCH_201803 just produces a series of additional I/O errors concluding with:

mount: /run/archiso/bootmnt: wrong fs type, bad option, bad superblock on /dev/sr0, missing codepage or helper program ...

So I'm snakebit and need help. I've never had the system lock during a kernel update before, and it has left part of the system thinking it has 4.15.7 and the rest thinking it is 4.15.8 (but the 4.15.8 update never finished).

(1) How do I go about recovering? 4.15.7 was A-OK. I'm not sure what part of the install is still 4.15.7 and what's 4.15.8. 59 packages were updated, including the kernel and the LTS kernel, but the initramfs was never regenerated due to the failure at the 'Install DKMS modules' phase. If I can get the ARCH_201803 install media to boot properly -- what next?

(2) How do I get around the ERROR: '/dev/disk/by-label/ARCH_201803' device did not show up after 30 seconds problem? The disk label is correct; it's just not being seen and mounted by the installer to /run/archiso/bootmnt.

Any help greatly appreciated.

-- David C. Rankin, J.D.,P.E.
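From the fallback prompt, the usual first checks are along these lines (a sketch only; /dev/sr0 is assumed to be the optical drive, and as noted above the manual mount produced I/O errors in this particular case):

  ls /dev/disk/by-label/                              # is the label visible at all?
  blkid /dev/sr0                                      # does the kernel see the medium and its LABEL?
  mount -r -t iso9660 /dev/sr0 /run/archiso/bootmnt   # try mounting the CD by device node instead
  exit                                                # logging out lets the archiso boot continue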
On Sun, 11 Mar 2018 21:29, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
All,
I experienced a hard lockup during kernel update to 4.15.8 on a Supermicro Dual Opteron Quad-core box.
[cd problem]

Just to make sure: can you run a memtest on this machine? It's a bit of a long shot, but hard lockups are suspicious, especially since the CD also acts strangely. Though an overheating CPU could also cause these symptoms.

Regards,
Guus Snijders
On 03/11/2018 04:08 PM, Guus Snijders via arch-general wrote:
On Sun, 11 Mar 2018 21:29, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
All,
I experienced a hard lockup during kernel update to 4.15.8 on a Supermicro Dual Opteron Quad-core box.
[cd problem]
Just to make sure: can you run a memtest on this machine? It's a bit of a long shot, but hard lockups are suspicious, especially since the CD also acts strangely. Though an overheating CPU could also cause these symptoms.
Regards, Guus Snijders
This was a nightmare. It's not a CD problem; it's a problem with the system seeing the CD label and/or creating the /dev/disk/by-label directory in time for the link to be created.

I burned 3 different CDs from the .iso (validating the sha1sum). I burned 2 of them from the Arch server next to this box, running the 4.15.8 kernel, whose update went fine. I burned per:

https://wiki.archlinux.org/index.php/Optical_disc_drive#Burning_an_ISO_image...

cdrecord -v -sao dev=/dev/sr0 archlinux-2018.03.01-x86_64.iso

and I burned from K3b as well. No change. Same failure.

So even though this box cannot boot from USB, I created a USB install media and plugged it into a USB port so that maybe its ARCH_201803 drive label would be seen. (I think the problem is that the .iso CD's lsblk LABEL isn't updated during boot for some reason.)

Lo and behold... it worked! I was able to boot to the Arch install prompt. mdadm ran and assembled my arrays. I arch-chrooted to /mnt, reinstalled the kernel and the LTS kernel, and then had to reinstall the other 57 packages.

I don't know what the hiccup was, but for this box it was a death sentence. No linker modules updated, only 2 out of 16 post-install processes run. That really leaves you in a bad way... Fixed now.

So to recap, the key to solving the 30-second CD-label-not-seen bug was to put a USB install media in a USB port before boot, so the drive would be activated and its LABEL available when the installer boot got to the find-disk/by-label step. (I hope I recall this trick 2 years from now when something like this happens again...)

-- David C. Rankin, J.D.,P.E.
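For reference, the in-chroot part of that recovery boils down to roughly the following (a sketch, not a transcript: the stock and LTS kernel packages are named linux and linux-lts, /mnt is where the assembled RAID root was mounted, and the pacman -U glob is a placeholder for the remaining packages from the interrupted transaction):

  arch-chroot /mnt
  # reinstalling the kernels reruns their install steps, including
  # initramfs generation, so the missing 4.15.8 images get built
  pacman -S linux linux-lts
  # then reinstall the rest of the interrupted update, e.g. from the local cache
  pacman -U /var/cache/pacman/pkg/<remaining-57-packages>*.pkg.tar.xz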
On 3/11/18, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
This was a nightmare. It's not a CD problem, it's a problem with the system seeing the CD Label and/or creating the /dev/disk/by-label directory in time for the link to be created.
Hi David,

so in the end you were able to boot off USB, right?

Also, the nightmare you had to work through can be avoided on servers where you run illumos or FreeBSD, by way of ZFS boot environments (BE). Basically, it's like Windows-style snapshots of the core files that you can boot, in case stuff goes south.

I didn't post this to the list, since it mentions ZFS, and that alone might get some people pissed off.
Or I actually did post it to the list by accident. Please don't flame me for mentioning ZFS boot environments as a technique available for FOSS servers.

On 3/12/18, Carsten Mattner <carstenmattner@gmail.com> wrote:
On 3/11/18, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
This was a nightmare. It's not a CD problem, it's a problem with the system seeing the CD Label and/or creating the /dev/disk/by-label directory in time for the link to be created.
Hi David,
so in the end you were able to boot off usb, right?
Also, the nightmare you had to work through can be avoided on servers where you run illumos or FreeBSD by way of ZFS boot environments (BE). Basically, it's like Windows style snapshots of core files you can boot, in case stuff goes south.
I didn't post this to the list, since it mentions ZFS, and that alone might get some people pissed off.
On Mon, 12 Mar 2018 01:04:14 +0000 Carsten Mattner via arch-general <arch-general@archlinux.org> wrote:
Or I actually did post it to the list by accident.
Please don't flame me for mentioning ZFS boot environments as a technique available for FOSS servers.
On 3/12/18, Carsten Mattner <carstenmattner@gmail.com> wrote:
On 3/11/18, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
This was a nightmare. It's not a CD problem, it's a problem with the system seeing the CD Label and/or creating the /dev/disk/by-label directory in time for the link to be created.
Hi David,
so in the end you were able to boot off usb, right?
Also, the nightmare you had to work through can be avoided on servers where you run illumos or FreeBSD by way of ZFS boot environments (BE). Basically, it's like Windows style snapshots of core files you can boot, in case stuff goes south.
I didn't post this to the list, since it mentions ZFS, and that alone might get some people pissed off.
I don't see why anyone should get pissed off. I mean, ArchZFS[1] is definitely a thing that works reasonably well, and the wiki page[2] specifically mentions boot environments and beadm.

~Celti

[1]: https://github.com/archzfs/archzfs
[2]: https://wiki.archlinux.org/index.php/Installing_Arch_Linux_on_ZFS
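For illustration, the boot-environment workflow looks roughly like this with beadm (a sketch, assuming a working ZFS-on-root setup; the environment name is made up):

  beadm create pre-4.15.8     # clone the current boot environment before the update
  pacman -Syu                 # run the risky update in the active environment
  # if the new kernel misbehaves, reactivate the old environment and reboot into it
  beadm activate pre-4.15.8
  reboot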
On 3/12/18, Celti Burroughs via arch-general <arch-general@archlinux.org> wrote:
On Mon, 12 Mar 2018 01:04:14 +0000 Carsten Mattner via arch-general <arch-general@archlinux.org> wrote:
Or I actually did post it to the list by accident.
Please don't flame me for mentioning ZFS boot environments as a technique available for FOSS servers.
On 3/12/18, Carsten Mattner <carstenmattner@gmail.com> wrote:
On 3/11/18, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
This was a nightmare. It's not a CD problem, it's a problem with the system seeing the CD Label and/or creating the /dev/disk/by-label directory in time for the link to be created.
Hi David,
so in the end you were able to boot off usb, right?
Also, the nightmare you had to work through can be avoided on servers where you run illumos or FreeBSD by way of ZFS boot environments (BE). Basically, it's like Windows style snapshots of core files you can boot, in case stuff goes south.
I didn't post this to the list, since it mentions ZFS, and that alone might get some people pissed off.
I don't see why anyone should get pissed off. I mean, ArchZFS[1] is definitely a thing that works reasonably well, and the wiki page[2] specifically mentions boot environments and beadm.
I'm happy to hear that. My rationale is based on past observations of needlessly heated arguments: ZFS, with its license splitting the Linux community in half, seemed like perfect fuel for such a thread.

Thanks for the wiki links. I've never used ZFS on Linux because I avoid out-of-kernel patches. Maybe I will give it a try on Linux as well.
On 03/11/2018 10:00 PM, Carsten Mattner via arch-general wrote:
I'm happy to hear that. My rationale is based on past observations of needlessly heated arguments and ZFS, due to its license splitting the Linux community in half, appearing to be perfect fuel for such a thread.
Thanks for the wiki links. Never used ZFS on Linux because I avoid out of kernel patches. Maybe I will give it a try on Linux as well.
Well yes, I think the main reason people get heated about it is that it is an out-of-tree kernel module and as such is less reliably stable, or some such.

Based on how well archzfs keeps their binary repos up to date, I'm not 100% convinced on the stability. Moreover, consider that it's difficult to bootstrap a system without zfs available, and if their binary repo does not match the current archiso...

-- Eli Schwartz
Bug Wrangler and Trusted User
On 3/12/18, Eli Schwartz via arch-general <arch-general@archlinux.org> wrote:
On 03/11/2018 10:00 PM, Carsten Mattner via arch-general wrote:
I'm happy to hear that. My rationale is based on past observations of needlessly heated arguments and ZFS, due to its license splitting the Linux community in half, appearing to be perfect fuel for such a thread.
Thanks for the wiki links. Never used ZFS on Linux because I avoid out of kernel patches. Maybe I will give it a try on Linux as well.
Well yes, I think the main reason people get heated about it is that it is an out-of-tree kernel module and as such is less reliably stable, or some such.

Based on how well archzfs keeps their binary repos up to date, I'm not 100% convinced on the stability. Moreover, consider that it's difficult to bootstrap a system without zfs available, and if their binary repo does not match the current archiso...
I'll stay away from it, thanks.

I saw that Alpine Linux has good ZFS support, but I didn't do anything serious with it. When it comes to filesystems, I'm conservative: EXT4 and XFS on Linux. It's a pity there's no modern filesystem to share volumes between FOSS kernels. It's all some compromise that you might or might not accept.

My current recommendation if one is looking for ZFS:

(1) FreeBSD good enough, no Linux binaries needed? Go with that.

(2) An illumos derivative works for you in terms of drivers _and_ you need Linux binaries to run seamlessly? illumos lx branded zones are your solution then. You can even dtrace a Linux zone from the illumos outer environment. It's like FreeBSD jails on steroids without the immaturity and chaos of Linux containers. Crossbow is nice, too.

Keep in mind both 1 and 2 start off with a desire to use first-class native ZFS support.

illumos' #1 problem is the unneeded distro fragmentation when the community is so small anyway. But they're collaborating on the base and core system very well. The main issue is porting or writing drivers. To this day I wonder why Google, for all of its Java language reliance, didn't buy Sun and liberate it fully. Past fights over the language and Apache Java might have led Sun to block any Google talks.
On Mon, Mar 12, 2018 at 10:24:37PM +0000, Carsten Mattner via arch-general wrote:
On 3/12/18, Eli Schwartz via arch-general <arch-general@archlinux.org> wrote:
On 03/11/2018 10:00 PM, Carsten Mattner via arch-general wrote:
I'm happy to hear that. My rationale is based on past observations of needlessly heated arguments and ZFS, due to its license splitting the Linux community in half, appearing to be perfect fuel for such a thread.
Thanks for the wiki links. Never used ZFS on Linux because I avoid out of kernel patches. Maybe I will give it a try on Linux as well.
Well yes, I think the main reason people get heated about it is that it is an out-of-tree kernel module and as such is less reliably stable, or some such.

Based on how well archzfs keeps their binary repos up to date, I'm not 100% convinced on the stability. Moreover, consider that it's difficult to bootstrap a system without zfs available, and if their binary repo does not match the current archiso...
I'll stay away from it, thanks. I saw that Alpine Linux has good ZFS support, but I didn't do anything serious with it. When it comes to filesystems, I'm conservative, EXT4 and XFS on Linux. It's a pity there's no modern filesystem to share volumes between FOSS kernels. It's all some compromise that you might or might not accept.
What's wrong with btrfs? Yeah, I know it is not marked "stable", but this is just a label. And people shying away from it doesn't help in advancing its stability either.

Cheers,
-- Leonid Isaev
On 03/12/2018 06:57 PM, Leonid Isaev via arch-general wrote:
I'll stay away from it, thanks. I saw that Alpine Linux has good ZFS support, but I didn't do anything serious with it. When it comes to filesystems, I'm conservative, EXT4 and XFS on Linux. It's a pity there's no modern filesystem to share volumes between FOSS kernels. It's all some compromise that you might or might not accept.
What's wrong with btrfs? Yeah, I know it is not marked "stable", but this is just a label. And people shying away from it doesn't help in advancing its stability either.
Well, I think the only outstanding issue really with btrfs is raid5/6 support. Maybe this scares people away?

-- Eli Schwartz
Bug Wrangler and Trusted User
On 3/12/18, Leonid Isaev via arch-general <arch-general@archlinux.org> wrote:
On Mon, Mar 12, 2018 at 10:24:37PM +0000, Carsten Mattner via arch-general wrote:
On 3/12/18, Eli Schwartz via arch-general <arch-general@archlinux.org> wrote:
On 03/11/2018 10:00 PM, Carsten Mattner via arch-general wrote:
I'm happy to hear that. My rationale is based on past observations of needlessly heated arguments and ZFS, due to its license splitting the Linux community in half, appearing to be perfect fuel for such a thread.
Thanks for the wiki links. Never used ZFS on Linux because I avoid out of kernel patches. Maybe I will give it a try on Linux as well.
Well yes, I think the main reason people get heated about it is that it is an out-of-tree kernel module and as such is less reliably stable, or some such.

Based on how well archzfs keeps their binary repos up to date, I'm not 100% convinced on the stability. Moreover, consider that it's difficult to bootstrap a system without zfs available, and if their binary repo does not match the current archiso...
I'll stay away from it, thanks. I saw that Alpine Linux has good ZFS support, but I didn't do anything serious with it. When it comes to filesystems, I'm conservative, EXT4 and XFS on Linux. It's a pity there's no modern filesystem to share volumes between FOSS kernels. It's all some compromise that you might or might not accept.
What's wrong with btrfs? Yeah, I know it is not marked "stable", but this is just a label. And people shying away from it doesn't help in advancing its stability either.
btrfs never got on my radar because it's Linux-only and its instability is a blocker. If I have to be careful how I use a filesystem even when I didn't explicitly enable beta features, I'm too scared to put my files on it. If I were a SUSE Enterprise customer, I might use it, but Red Hat isn't behind it anymore, so it's like Reiser3 back in the day: only SUSE was putting their weight behind it. Well, Facebook has developers on it, but Facebook isn't a distro developer and can't be trusted with continued maintenance, since they might switch on a weekend to some Facebook-FS. Facebook has too many engineers and is reinventing stuff in-house a lot.

btrfs and zfs suffer from design limitations, but zfs has been stable and in petabyte production for a long time across many organizations. btrfs is one of many future Linux filesystems with no clear winner so far. It looks like XFS will gain full checksums and scrubbing before btrfs gets reliable, and Red Hat's XFS++ work will provide snapshots. It's like git replacing BitKeeper in 2005. Seems like XFS++ will do the same, with btrfs left to the history of experiments.

All I want is a modern filesystem whose volume I can share without exposing it via a network protocol.
It almost looks like filesystem development doesn't fit the Linux kernel development style of iterating constantly and evolving with time. btrfs has had the same time as zfs had in-house at Oracle before it was declared publicly stable, and there are still buggy/unfinished corners. If you look at past successful Linux filesystems, each is either an existing design that was generally amenable to certain extensions and has evolved in-tree, or came designed and implemented from a different platform (JFS, XFS, quite a few more).

EXT4 is the reliable workhorse if inodes aren't a problem and you don't mind allocation time and its upper bounds. It evolved step by step from EXT2 to EXT3 to EXT4, all the while having stable core features and experimental features. btrfs is busy implementing features promised 10 years ago, and there are bugs in regular use if you're not careful. Developing a filesystem is hard and there's no room for mistakes.

The most productive filesystem development is happening in the XFS and EXT4 teams, with the former in a nice stable maintenance mode. If you haven't tried XFS in the last 3 years, give it a test run. The old issues of being optimized for certain workloads were fixed years ago. It's a good replacement for EXT4 if you need its features and tools.
Sorry for letting gmail butcher wrapping/breaks. Someone at Google needs to be demoted for that anti-feature. I should remember to never edit in gmail's text box but use my normal editor as usual.
On Mon, Mar 12, 2018 at 11:17:21PM +0000, Carsten Mattner wrote:
On 3/12/18, Leonid Isaev via arch-general <arch-general@archlinux.org> wrote:
What's wrong with btrfs? Yeah, I know it is not marked "stable", but this is just a label. And people shying away from it doesn't help in advancing its stability either.
btrfs never got on my radar because it's Linux only and its instability is a blocker. If I have to be careful how I use a filesystem even when I didn't explicitly enable beta features, I'm too scared to put my files on it. If I were a Suse Enterprise customer, I might use it, but Red Hat isn't behind it anymore, so it's like Reiser3 back in the day. Only Suse was putting their weight behind it. Well Facebook has developers on it, but Facebook isn't a distro developer and can't be trusted with continued maintenance, since they might switch on a weekend to some Facebook-FS. Facebook has too many engineers and is reinventing stuff in-house a lot.
This is all corporate politics, but see the first comment here [1]. And you still haven't explained what instability you mean. I use btrfs on all my machines, including its subvolume/snapshot features to protect against failed updates (essentially, I reimplemented some features of snapper in bash :) because I don't like dbus). Of course, you need to do scrubbing regularly, but it's trivial to write a cron job/systemd timer for that task...
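As an example of the kind of trivial automation mentioned above, a weekly scrub can be as small as this (a sketch; the script path and mount point are illustrative):

  #!/bin/sh
  # /etc/cron.weekly/btrfs-scrub: scrub the root filesystem once a week
  # -B runs in the foreground so cron can report errors; -d prints per-device stats
  exec /usr/bin/btrfs scrub start -Bd /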
btrfs and zfs suffer from design limitations, but zfs has been stable and in petabyte production for a long time across many organizations. btrfs is one of many future Linux filesystems with no clear winner so far.
If no one uses it, then sure, btrfs will remain an underdog of filesystems. Also, if you care about petabyte production, you should know better than to ask on this list...
All I want is a modern filesystem whose volume I can share without exposing it via a network protocol.
Hmm, btrfs-send(1)?

[1] https://news.ycombinator.com/item?id=14907771

Cheers,
-- Leonid Isaev
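To flesh out the btrfs-send(1) suggestion: replicating a subvolume to another machine is roughly a matter of the following (hostnames and paths are made up; the receiving side must also be btrfs):

  # take a read-only snapshot, then stream it over ssh to the other box
  btrfs subvolume snapshot -r /data /data/.snap-today
  btrfs send /data/.snap-today | ssh backuphost btrfs receive /backup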
On 03/11/2018 08:03 PM, Carsten Mattner via arch-general wrote:
Hi David,
so in the end you were able to boot off usb, right?
Also, the nightmare you had to work through can be avoided on servers where you run illumos or FreeBSD by way of ZFS boot environments (BE). Basically, it's like Windows style snapshots of core files you can boot, in case stuff goes south.
I didn't post this to the list, since it mentions ZFS, and that alone might get some people pissed off.
Partially. I had to boot from the CD, because the BIOS (AMI, circa 2005) does not have the capability to boot from USB. But by using both the CD and the USB (having plugged the USB in before boot), when the boot from CD got to the point where it looked for the disk label ARCH_201803, it found the USB stick and was able to mount it at /run/archiso/bootmnt, where the boot from CD alone always fails.

No worries about the filesystem. I don't care whether it is xfs, zfs, or ext, but it won't be btrfs; I just dance with the one that brung me. I did do a filesystem comparison several months ago, and since I never reach the limitations of ext inodes anyway, zfs didn't have anything else that really moved me to switch. I don't max my disks anyway. If I'm running 1T drives and hit 850G, I'll go 3T and so on...

-- David C. Rankin, J.D.,P.E.
Hi Carsten,

I'm glad you ended up posting this to the list. Very useful info, even if I never end up using it. The rest of this thread has some great content too. Thanks all!

On Mar 11, 2018 21:03, "Carsten Mattner via arch-general" <arch-general@archlinux.org> wrote:
On 3/11/18, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
This was a nightmare. It's not a CD problem, it's a problem with the system seeing the CD Label and/or creating the /dev/disk/by-label directory in time for the link to be created.
Hi David,
so in the end you were able to boot off usb, right?
Also, the nightmare you had to work through can be avoided on servers where you run illumos or FreeBSD by way of ZFS boot environments (BE). Basically, it's like Windows style snapshots of core files you can boot, in case stuff goes south.
I didn't post this to the list, since it mentions ZFS, and that alone might get some people pissed off.
On 03/11/2018 05:32 PM, David C. Rankin wrote:
I don't know what the hiccup was, but for this box it was a death sentence. No linker modules updated, only 2 out of 16 post install processes run. That really leaves you in a bad way...
Well, it seems it was a real issue and not just a hiccup. I posted the issue to the linux-raid list on kernel.org (title: "4.15.8 Kernel - Strange linux-raid behavior, not sure where to send it.") and it seems others have been hit with it as well:

https://bugzilla.kernel.org/show_bug.cgi?id=198861
https://bugzilla.redhat.com/show_bug.cgi?id=1552124

Thankfully, since I installed the 4.15.9 kernel, it seems to be fixed (I still have more testing to do, but it looks good). So if anyone else is hit with a RAID 1 issue on multi-CPU boxes, there is more information at the links above.

(p.s. - not that my opinion matters on the bug tracker choice, but for the number of bugs Arch has (compared to Mozilla, etc.) I don't think there would be much of a delay issue regardless of which package is used - ease of maintenance and an easy migration path probably win the day)

-- David C. Rankin, J.D.,P.E.
participants (7):
- Carsten Mattner
- Celti Burroughs
- David C. Rankin
- Eli Schwartz
- Guus Snijders
- Kyle Bassett
- Leonid Isaev