[arch-general] kernel26-2.6.34.1 - won't boot - stuck at "Setting up UTF-8 mode" [Downgrade to kernel26-2.6.34-2 OK]
Guys,
I have managed to find a work-around for the latest kernel failing to boot on my Toshiba laptop. Before downgrading, I tried chrooting into the system and rebuilding the initramfs, but no luck: it stopped at the same place on boot -- "Setting up UTF-8 mode".
So I booted the dual-install media, chrooted into the system and downgraded to kernel26-2.6.34-2. This kernel boots fine.
There is a problem with kernel26-2.6.34.1-1-x86_64. Since I can't boot it, I'm not sure what testing I can do, but if you can think of a test, I'm happy to run it. This laptop is a Toshiba 205d. It hasn't ever had a problem with any other Arch or SuSE kernel, so I'm not sure what to tell you.
I've posted the dmidecode information in case that will help. It is here:
http://www.3111skyline.com/dl/Archlinux/bugs/toshiba205d-dmidecode.txt
Let me know what you think.
-- 
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
On 07/11/2010 05:06 PM, David C. Rankin wrote:
Guys,
I have managed to find a work-around for the latest kernel failing to boot on my Toshiba laptop. Before downgrading, I tried chrooting into the system and rebuilding the initramfs, but no luck: it stopped at the same place on boot -- "Setting up UTF-8 mode".
So I booted to the dual-install media, chrooted the system and downgraded to kernel26-2.6.34-2. This kernel boots fine.
There is a problem with kernel26-2.6.34.1-1-x86_64. Since I can't boot it, I'm not sure what testing I can do, but if you can think of one, I'm happy to run it. This laptop is a Toshiba 205d. It hasn't ever had a problem with any other Arch or SuSE kernel, so I'm not sure what to tell you.
I've posted the dmidecode information in case that will help. It is here:
http://www.3111skyline.com/dl/Archlinux/bugs/toshiba205d-dmidecode.txt
Let me know what you think.
CORRECTION: 2.6.34-2 is not OK. It has the same issue: it boots once, then goes bad and will not boot again. Same with the 2.6.34.1 kernel...
Allan, all,
This kernel problem I'm having with the 2.6.34 kernels seems to be getting worse. I'm getting 'kernel null pointer' errors and then spaghetti spewed all over the screen. This happens just about every time I install a 2.6.34 Arch kernel on this laptop (either 2.6.34-2 or 2.6.34.1). The 2.6.32-LTS kernel is working fine (perfect, actually).
I've tried rebuilding the initramfs or whatever you call it with:
Normal Kernel:
/sbin/mkinitcpio -k 2.6.34-ARCH -c /etc/mkinitcpio.conf -g /boot/kernel26.img
Fallback Kernel:
/sbin/mkinitcpio -k 2.6.34-ARCH -c /etc/mkinitcpio.conf -g /boot/kernel26-fallback.img -S autodetect
and each time it completes successfully. But then it will either boot once OK and fail to boot the very next time I try to boot the box -- or -- it will never boot. It looks like it is blowing up on the "loading modules" line, or it just gets stuck on the "Setting up UTF-8 mode" line. No doubt this bug is also what is causing compiz to white-screen when I do get one of these new kernels to boot, but with no logging during the boot process when it blows up, I'm not sure what to do.
What say the gurus?
-- 
David C. Rankin, J.D.,P.E.
On 07/12/2010 02:57 AM, David C. Rankin wrote:
I've tried rebuilding the initramfs or whatever you call it with:
Normal Kernel:
/sbin/mkinitcpio -k 2.6.34-ARCH -c /etc/mkinitcpio.conf -g /boot/kernel26.img
Fallback Kernel:
/sbin/mkinitcpio -k 2.6.34-ARCH -c /etc/mkinitcpio.conf -g /boot/kernel26-fallback.img -S autodetect
and each time it completes successfully. But then it will either boot once OK, then fail to boot the very next time I try to boot the box -- or -- it will never boot. It looks like it is blowing up on the loading modules line or it just gets stuck on the setting up UTF-8 mode line. No doubt this bug is also what is causing compiz to white-screen when I do get one of these new kernels to boot, but with no logging on during the boot process when it blows up, I'm not sure what to do.
What say the gurus?
Can anyone think of a possible mechanism that would cause a kernel to boot once after rebuilding the initramfs, but then be corrupt for every boot thereafter?? As mentioned in the title, on the 2nd boot attempt (and all subsequent attempts) the boot process either hard-locks when the "Setting up UTF-8 mode" message is displayed --or-- a kernel NULL Pointer message is displayed and then I get 3 screens of garbage before the box either locks or a ctrl+c kills that part of the boot process and booting proceeds until it craters 4-10 steps later.
(Memory can be ruled OUT as a problem: it memtests fine, I'm working from the same box right now, and it will boot the LTS kernel and the openSUSE kernels fine each and every time.)
Moreover, I have 8-10 Arch boxes running 2.6.34.1 happily, but this laptop exhibits the "boot once then fail" behavior every time.
I would like to help find out what is causing this problem, but I have exhausted my shallow pool of Arch boot sequence knowledge, so I'm looking for some help. Even something as simple as: When the boot fails try this .... What does file XYZ contain?, etc...
I have posted the dmidecode information for the box in case the problem is related to some weird hardware, or hardware where a regression has occurred between 2.6.33 and 2.6.34.
I don't know what else to do except wait until the next kernel release and pray that one will work. All Arch and SuSE and gparted kernels have worked fine on the box until the past two 2.6.34 kernels. Somehow just doing nothing and waiting seems less than scientific and an approach that is unlikely to help Arch or my present situation.
I don't know if you guys would rather I open a ticket on this one, or just sit tight and see if we can get some better information here before doing so? Dunno -- that's why I'm asking... I'll even take your best swag at this point :p
Let me know what the best way to pursue this one is. Thanks.
-- 
David C. Rankin, J.D.,P.E.
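One hedged way to act on the "What does file XYZ contain?" request above: boot the working 2.6.32-LTS kernel and search the saved logs for traces left by the one 2.6.34 boot that did come up. This only helps if syslog was running during that boot; a hang before syslog starts leaves nothing to find. The function name `find_oops` and the log paths in the usage line are illustrative assumptions, not something taken from the thread.

```shell
#!/bin/sh
# Sketch: search saved logs for traces of a kernel oops from an earlier boot.
# The log paths passed in are assumptions; adjust them to the local syslog setup.

# find_oops FILE...
# Print matching lines (prefixed with file name and line number) from every
# readable log file given. Returns 0 if anything matched, 1 otherwise.
find_oops() {
    found=1
    for log in "$@"; do
        # Skip files that do not exist or cannot be read.
        [ -r "$log" ] || continue
        if grep -n -H -E 'NULL pointer dereference|Oops|Call Trace|\*ERROR\*' "$log"; then
            found=0
        fi
    done
    return "$found"
}
```

Possible usage, with assumed log locations: `find_oops /var/log/kernel.log /var/log/everything.log`. Any `[drm:...] *ERROR*` lines near a "NULL pointer dereference" would be worth pasting into a bug report.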
On 07/13/2010 09:26 AM, David C. Rankin wrote:
Can anyone think of the possible mechanism that would cause a kernel to boot once after rebuilding the initramfs, but then be corrupt for every boot thereafter?? As mentioned in the title on the 2nd boot attempt (and all subsequent attempts), the boot process either hard-locks when the "Setting up UTF-8 mode" message is displayed --or-- a kernel NULL Pointer message is displayed and then I get 3 screens of garbage before the box either locks or a ctrl+c kills that part of the boot process and booting proceeds until it craters 4-10 steps later.
Could the Null Pointer blow-up be due to incorrect gpu handling by the Arch kernel, causing the blow-up when the modules are loaded (about the same time the KMS magic is taking place)?
I say this because I have one of ATI's less common gpus in this Toshiba laptop. The video card is:
Radeon X1250 Graphics(690G Chipset), RS690M, RV410 Graphics Core. This uses the onboard PCIe bus interface and has API support for DirectX 9.0b and OpenGL 2.0. For some reason the kernel crashes 'smell' like a mishandling of the gpu subsystem in the 2.6.34 kernels (Note: this is just a 'gut feel', and I can't point to anything in particular). Of all things that could have changed for the past 2 kernels, the KMS magic and a possible bug slipping in for this card seems like one of the likely areas to start looking.
The lspci -vv data for the card are as follows (I have opensuse running at the moment - thus the fglrx driver is shown):
01:05.0 VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series] (prog-if 00 [VGA controller])
    Subsystem: Toshiba America Info Systems Device ff00
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 64, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 18
    Region 0: Memory at f0000000 (64-bit, prefetchable) [size=128M]
    Region 2: Memory at f8100000 (64-bit, non-prefetchable) [size=64K]
    Region 4: I/O ports at 9000 [size=256]
    Region 5: Memory at f8000000 (32-bit, non-prefetchable) [size=1M]
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [80] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
        Address: 0000000000000000 Data: 0000
    Kernel driver in use: fglrx_pci
    Kernel modules: fglrx
I don't know why the card is reporting as an X1200 in lspci. The Core Clock for this gpu is 400 MHz and according to AMD, that means this is the 1250 and not the 1200, because the Core Clock on the 1200 is 350 MHz.
I don't know what, if any, changes took place in KMS or in gpu initialization for the 2.6.34 kernel, but this card always sucked when using the ATI driver, which prevented me from moving to Arch sooner on this box. Then with the 2.6.32 & 2.6.33 kernels, it was like somebody turned on a light-switch in the kernel and I was getting Blazing fast performance out of the xf86-video-ati driver on Arch, compiz was working great, and the gpu subsystem was working better than ever before in Arch with just the 'radeon' driver.
When I updated to 2.6.34-2 I ran into the problem with compiz "whitescreening", and video performance 'tanked' on the one 'first boot' that would come up. Then on every attempt to boot thereafter, the boot would fail and either hang or blow up with the kernel NULL Pointer error.
That has me thinking that this problem has to be related to some module rearrangement/updating that takes place after you boot the box for the first time -- thus preventing the next boot from working. I don't know how to verify or check this out, but this is what my gut tells me is going on.
Arch gurus -- any way to test this hypothesis??
-- 
David C. Rankin, J.D.,P.E.
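One way the "something gets rearranged after the first boot" hypothesis could be checked, sketched under stated assumptions: take a checksum snapshot of the kernel images and the module tree right after rebuilding the initramfs (from the chroot or the LTS kernel), let the 2.6.34 kernel do its one good boot, then take a second snapshot from the LTS kernel and compare. The function names `snapshot` and `changed` and the output paths in the usage note are hypothetical; `find`, `sha256sum`, `sort` and `diff` are the standard tools.

```shell
#!/bin/sh
# Sketch: record checksums of files so that two snapshots, taken before and
# after the first boot of the suspect kernel, can be compared.

# snapshot OUTFILE PATH...
# Write one "checksum  path" line per regular file under the given paths,
# sorted by path so that two snapshots line up for diff.
# OUTFILE should live outside the paths being scanned.
snapshot() {
    out=$1
    shift
    find "$@" -type f -exec sha256sum {} + | sort -k 2 > "$out"
}

# changed BEFORE AFTER
# Print the lines that differ between two snapshots.
# Exit status 0 means identical, 1 means something changed.
changed() {
    diff "$1" "$2"
}
```

For example, `snapshot /root/before.txt /boot /lib/modules/2.6.34-ARCH` before the first boot, then `snapshot /root/after.txt /boot /lib/modules/2.6.34-ARCH` and `changed /root/before.txt /root/after.txt` afterwards. If nothing differs, files changing on disk can be ruled out, which would point toward hardware state rather than the installed kernel or modules.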
On 07/13/2010 04:24 PM, David C. Rankin wrote:
Could the Null Pointer blow-up be due to incorrect gpu handling by the Arch kernel, causing the blow-up when the modules are loaded (about the same time the KMS magic is taking place)?
I say this because I have one of ATI's less common gpu's in this Toshiba laptop. The video card is:
Radeon X1250 Graphics(690G Chipset), RS690M, RV410 Graphics Core. This uses the onboard PCIe bus interface and has API support for DirectX 9.0b and OpenGL 2.0. For some reason the kernel crashes 'smell' like a mishandling of the gpu subsystem in the 2.6.34 kernels (Note: this is just a 'gut feel', and I can't point to anything in particular). Of all things that could have changed for the past 2 kernels, the KMS magic and a possible bug slipping in for this card seems like one of the likely areas to start looking.
That could be it; try disabling KMS, or use early KMS.
I see this happen once in a while when I boot my desktop PC from a USB drive with Arch. Most of the time it boots just fine, but every once in a while it will hang at that exact place, "Setting console font ....", though I've never seen a kernel panic or had to clean spaghetti off my screen. I could never figure out exactly what was wrong: starting from a cold boot it sometimes hangs, but most of the time it works.
-- 
Mauro Santos
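A rough sketch of what the two suggestions could look like on this setup. The kernel parameters `radeon.modeset=0` and `nomodeset` and the `MODULES` line in /etc/mkinitcpio.conf are the usual knobs; the GRUB menu entry and root device shown in the comments are placeholders, and the helper `kms_setting` is a hypothetical name used only to confirm which setting a given kernel command line carries.

```shell
#!/bin/sh
# Option 1, disable KMS: append a parameter to the kernel line in the boot
# loader (root device below is a placeholder, not from the thread):
#   kernel /vmlinuz26 root=/dev/sdXN ro radeon.modeset=0
# "nomodeset" instead of "radeon.modeset=0" turns KMS off for all drivers.
#
# Option 2, early KMS: load the radeon module from the initramfs by setting
#   MODULES="radeon"
# in /etc/mkinitcpio.conf, then rebuild the image with the command already
# used in this thread:
#   /sbin/mkinitcpio -k 2.6.34-ARCH -c /etc/mkinitcpio.conf -g /boot/kernel26.img

# kms_setting CMDLINE
# Report how a kernel command line sets radeon KMS.
kms_setting() {
    # Pad with spaces so each parameter can be matched as a whole word.
    case " $1 " in
        *" nomodeset "*|*" radeon.modeset=0 "*) echo "KMS disabled" ;;
        *" radeon.modeset=1 "*) echo "KMS forced on" ;;
        *) echo "KMS default" ;;
    esac
}
```

On a booted system, `kms_setting "$(cat /proc/cmdline)"` shows whether the parameter actually reached the kernel. If the hang disappears with KMS disabled, that would support the gpu theory above.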
On 07/13/10 10:26, David C. Rankin wrote:
Can anyone think of the possible mechanism that would cause a kernel to boot once after rebuilding the initramfs, but then be corrupt for every boot thereafter??
Do you rebuild the initramfs on 2.6.32? Do you let the machine sit for a minute, shut down, between each boot?
Yes, I can think of a mechanism. I'll tell it by example: My machine has a built-in webcam that the OS has to upload firmware to on every boot. Sometimes when I boot it ends up in a screwed-up state somehow (so the webcam doesn't work), and sometimes rebooting doesn't help. Shutting down and waiting a few minutes sometimes helps; booting into MacOSX then shutting down also can change things a bit, often for the better (after all, this hardware and OSX were made for each other). I believe it has some sort of volatile memory that decays randomly and slowly when not powered (like RAM does). I guess that when it boots with its memory containing partly corrupted firmware, it causes some kind of trouble, depending on the exact state of the memory, that interferes with just fixing it by uploading new firmware.
That's an example of how something could possibly persist across reboots. Maybe if you build on 2.6.32, the actual effect is that you had just booted into a good kernel that initialized some piece of hardware into some reasonable state, and this state is likely to persist across a reboot, but 2.6.34 screws up the state such that the next boot of 2.6.34 doesn't like it, while 2.6.32 is a good enough kernel to nevertheless re-initialize it properly. (It could be non-volatile memory too, and the randomness could come from the linux boot process being nondeterministic as it is.)
Or...maybe the explanation is entirely different.
-Isaac
participants (3): David C. Rankin, Isaac Dupree, Mauro Santos