[arch-general] mce after linux-3.11.5-1 on NP900X3C

Rasmus Liland jensrasmus at gmail.com
Wed Nov 19 21:52:49 UTC 2014


On 2014-11-19 21:41, Mark Lee wrote:
> On 11/19/2014 12:15 PM, Rasmus Liland wrote:
> > On 2014-11-17 00:19, Rasmus Liland wrote:
> >> On 2014-11-15 18:28, Mark Lee wrote:
> >>> On 11/15/2014 12:20 PM, Rasmus Liland wrote:
> >>>> On 2014-11-15 15:21, LoneVVolf wrote:
> >>>>> On 15-11-14 06:57, Rasmus Liland wrote:
> >>>>>> On 2014-11-15 06:10, Mark Lee wrote:
> >>>>>>> On 11/14/2014 10:29 PM, Rasmus Liland wrote:
> >>>>>>>> On 2014-11-15 04:01, Mark Lee wrote:
> >>>>>>>>> Are you booting with the new intel u-code?
> >>>>>>>> Are you fairly sure this is an Intel microcode issue?
> >>>>>>> I'm not completely certain; but it would make sense.
> >>>>>>> I'd test it out.
> >>>>>> Thank you for your help thus far. I'll examine this
> >>>>>> further tomorrow, g'night.
> >>>>> From rasmus first post:
> >>>>>> I'm experiencing machine check exceptions since every
> >>>>>> kernel after package linux-3.11.5-1 (Oct 14 2013)
> >>>>> New intel microcode was only introduced with kernel 3.17
> >>>>> ... It's unlikely to have to do with this issue.
> >>>>> 
> >>>>> install mcelog, run it as the log tells you and post the
> >>>>> result.
> >>>> [ ... output, see previous messages ... ] I have never used the
> >>>> mcelog tool before, but to me this does not look like much of an
> >>>> analysis; perhaps I'm doing it wrong.
> >>> Looks like a microcode error, please try to add the intel-ucode
> >>> to your kernel cmdline.
> >> Bah, just as I had finished enabling syslinux using
> >> syslinux-install_update and rebooted, the system did not respond:
> >> just a blank screen, the lighting shutting off, and then another
> >> reboot.
> >> 
> >> Thus, this system needs an overhaul -- apparently some difficulty
> >> with the bootcode or the MBR, though I am able to mount the old
> >> partitions and chroot into them using arch-chroot.
> >> 
> >> I tried installing grub using the standard method grub-install
> >> according to the wiki, with little success -- some good news at
> >> least relevant to previous topic in this thread is that grub
> >> recognized and added the intel-ucode file I had copied to the
> >> /boot directory, when running grub-mkconfig.
> >> 
> >> The plan going forward is to forget about generating a new MBR using
> >> gpart and install Debian at the end of the disk to, hopefully,
> >> restore some boot-related stuff that might have come crashing
> >> down after meddling with syslinux.
> > 
> > A breakthrough in this thread has happened.
> > 
> > I ended up taking a backup of the disk to an external hdd using
> > 
> >> # dd if=/dev/sda of=/mnt/angrist-sda-18nov14.img
> > 
> > then I booted FreeBSD 10.1 memstick, entered shell and entered some
> > commands:
> > 
> >> # gpart delete -i 1 ada0
> >> # gpart delete -i 2 ada0
> >> # gpart delete -i 3 ada0
> >> # gpart destroy ada0
> >> # gpart create -s mbr ada0
> >> # gpart add -s 20g -t linux-data ada0
> >> # gpart add -t linux-data ada0
> > 
> > Then I rebooted into ArchLinux iso memstick to install Arch on the
> > 20G partition and using the other one as /home. So now Syslinux
> > works, unfortunately I don't know why. And I was able to install
> > all new packages including linux 3.17.3-1 and intel-ucode
> > 20140913-1, loading it in Syslinux according to the wiki.
> > 
> > I got a new mce after exactly three hours:
> > 
> >> [10827.051523] mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 4: b200000000100402
> >> [10827.051632] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
> >> [10827.055440] mce: [Hardware Error]: TSC 2238c73db17
> >> [10827.059291] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 1 microcode 1b
> >> [10827.063192] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> >> [10827.067078] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 4: b200000000100402
> >> [10827.070986] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
> >> [10827.074899] mce: [Hardware Error]: TSC 2238c73db43
> >> [10827.078769] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 3 microcode 1b
> >> [10827.082673] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> >> [10827.086569] mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 4: b200000000100402
> >> [10827.090503] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff812ab186> {intel_sqrt+0x36/0x50}
> >> [10827.094415] mce: [Hardware Error]: TSC 2238c73db28
> >> [10827.098299] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 2 microcode 1b
> >> [10827.102242] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> >> [10827.106182] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: b200000000100402
> >> [10827.110177] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
> >> [10827.114143] mce: [Hardware Error]: TSC 2238c73db06
> >> [10827.118038] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 0 microcode 1b
> >> [10827.122028] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> >> [10827.126037] mce: [Hardware Error]: Machine check: Processor context corrupt
> >> [10827.130076] Kernel panic - not syncing: Fatal Machine check
> >> [10827.134149] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> >> [10827.136647] drm_kms_helper: panic occured, switching back to text console
> >> [10827.163009] Rebooting in 30 seconds..
> >> [10857.234707] ACPI MEMORY or I/O RESET_REG.
> > 
> > I am also attaching this output. There is a lot more information
> > in this new mce than in the other one I sent.
> > 
> > Perhaps some of you have some new suggestions.
> > 
> > Meanwhile, I am downgrading back to 3.11.5-1.
> > 
> 
> To Rasmus,
> 
> Can you run the parts where it says "run the above through mcelog
> --ascii" and post the contents?
> 
> Regards,
> Mark
> 

I'm attaching the output of mcelog to this message. However, I'm unsure of
the usefulness of the output.
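
In case it helps anyone read the raw numbers by hand, below is a minimal
Python sketch that splits the "Bank 4" status word and the "5" printed after
"Machine Check Exception:" into named bits. It assumes the usual
architectural layout of the IA32_MCi_STATUS and IA32_MCG_STATUS registers,
and that the "5" is the MCG status value; the helper names (set_flags,
decode_status) are made up for this example and are not part of mcelog or
the kernel.

```python
# Assumed bit layout of IA32_MCi_STATUS (upper flag bits only).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds an address
    57: "PCC",    # processor context corrupt
}

# Assumed bit layout of IA32_MCG_STATUS.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def set_flags(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_status(status):
    """Split a 64-bit status word into flags and the two error-code fields."""
    return {
        "flags": set_flags(status, STATUS_FLAGS),
        "mscod": (status >> 16) & 0xFFFF,  # model-specific error code
        "mcacod": status & 0xFFFF,         # MCA error code
    }


# Values taken from the log lines above.
status = decode_status(0xb200000000100402)
mcg = set_flags(5, MCG_FLAGS)
print(status["flags"], hex(status["mscod"]), hex(status["mcacod"]), mcg)
```

Under those assumptions the status word has VAL, UC, EN and PCC set, which
would agree with the "Processor context corrupt" line in the panic, and the
two code fields come out as 0x10 and 0x402. What those codes mean for this
particular CPU model I cannot say from this alone.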

-- 
Rasmus Liland, jrl at jrl.dyndns.dk, jens.rasmus.liland at nmbu.no 
-------------- next part --------------
mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 4: b200000000100402
mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
mce: [Hardware Error]: TSC 2238c73db17
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 1 microcode 1b
mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 4: b200000000100402
mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
mce: [Hardware Error]: TSC 2238c73db43
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 3 microcode 1b
mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 4: b200000000100402
mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff812ab186> {intel_sqrt+0x36/0x50}
mce: [Hardware Error]: TSC 2238c73db28
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 2 microcode 1b
mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: b200000000100402
mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
mce: [Hardware Error]: TSC 2238c73db06
mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 0 microcode 1b

