On 2014-11-17 00:19, Rasmus Liland wrote:
On 2014-11-15 18:28, Mark Lee wrote:
On 11/15/2014 12:20 PM, Rasmus Liland wrote:
On 2014-11-15 15:21, LoneVVolf wrote:
On 15-11-14 06:57, Rasmus Liland wrote:
On 2014-11-15 06:10, Mark Lee wrote:
On 11/14/2014 10:29 PM, Rasmus Liland wrote:
> On 2014-11-15 04:01, Mark Lee wrote:
>> Are you booting with the new intel u-code?
> Are you fairly sure this is an Intel microcode issue?

I'm not completely certain; but it would make sense. I'd test it out.

Thank you for your help thus far. I'll examine this further tomorrow, g'night.

From Rasmus' first post:
I'm experiencing machine check exceptions since every kernel after package linux-3.11.5-1 (Oct 14 2013)

New intel microcode was only introduced with kernel 3.17 ... It's unlikely to have to do with this issue.
install mcelog, run it as the log tells you and post the result.

[ ... output, see previous messages ... ]

I have never used the mcelog tool before, but to me it does not look like much of an analysis; perhaps I'm doing it wrong.

Looks like a microcode error, please try to add the intel-ucode to your kernel cmdline.

Bah, just as I had finished enabling syslinux using syslinux-install_update and rebooted, the system did not respond: just a blank screen and the lighting shutting off, then rebooting again.
Thus, this system needs an overhaul -- apparently some difficulty with the bootcode or the MBR, though I am able to mount the old partitions and chroot into them using arch-chroot.
I tried installing GRUB with grub-install, the standard method according to the wiki, with little success. The good news, at least for the earlier topic in this thread, is that grub-mkconfig recognized and added the intel-ucode file I had copied to the /boot directory.
The plan forward is to forget about generating a new MBR using gpart and install Debian at the end of the disk to, hopefully, restore some boot-related stuff that might have come crashing down after meddling with syslinux.
A breakthrough in this thread has happened. I ended up taking a backup of the disk to an external hdd using
# dd if=/dev/sda of=/mnt/angrist-sda-18nov14.img
then I booted the FreeBSD 10.1 memstick, dropped to a shell and ran these commands:
# gpart delete -i 1 ada0
# gpart delete -i 2 ada0
# gpart delete -i 3 ada0
# gpart destroy ada0
# gpart create -s mbr ada0
# gpart add -s 20g -t linux-data ada0
# gpart add -t linux-data ada0
Then I rebooted into the Arch Linux ISO memstick to install Arch on the 20G partition, using the other one as /home. Syslinux now works, though unfortunately I don't know why. I was able to install all new packages, including linux 3.17.3-1 and intel-ucode 20140913-1, loading the microcode in Syslinux according to the wiki. I got a new MCE after exactly three hours:
[10827.051523] mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 4: b200000000100402 Increasing limit for this warning to that value arg
[10827.051632] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
[10827.055440] mce: [Hardware Error]: TSC 2238c73db17
[10827.059291] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 1 microcode 1b
[10827.063192] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[10827.067078] mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 4: b200000000100402
[10827.070986] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
[10827.074899] mce: [Hardware Error]: TSC 2238c73db43
[10827.078769] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 3 microcode 1b
[10827.082673] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[10827.086569] mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 4: b200000000100402
[10827.090503] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff812ab186> {intel_sqrt+0x36/0x50}
[10827.094415] mce: [Hardware Error]: TSC 2238c73db28
[10827.098299] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 2 microcode 1b
[10827.102242] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[10827.106182] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: b200000000100402
[10827.110177] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81321387> {intel_idle+0xe7/0x180}
[10827.114143] mce: [Hardware Error]: TSC 2238c73db06
[10827.118038] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1416411506 SOCKET 0 APIC 0 microcode 1b
[10827.122028] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[10827.126037] mce: [Hardware Error]: Machine check: Processor context corrupt
[10827.130076] Kernel panic - not syncing: Fatal Machine check
[10827.134149] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[10827.136647] drm_kms_helper: panic occured, switching back to text console
[10827.163009] Rebooting in 30 seconds..
[10857.234707] ACPI MEMORY or I/O RESET_REG.
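For anyone who wants to read the status word by hand before running mcelog, here is a rough Python sketch that pulls the architectural flag bits and the two 16-bit code fields out of the Bank 4 value b200000000100402 from the log above. The bit positions are the ones I understand the Intel SDM to define for IA32_MCi_STATUS; the function name decode_mci_status and the dictionary layout are my own invention for illustration, not part of mcelog or the kernel. It only splits the fields and does not try to interpret the error code.

```python
# Bit positions of the architectural flags in IA32_MCi_STATUS
# (assumed from the Intel SDM machine-check chapter).
FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds an address
    "PCC": 57,    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and code fields.

    Hypothetical helper for illustration only.
    """
    flags = [name for name, bit in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    decoded = decode_mci_status(0xB200000000100402)
    print("flags:", decoded["flags"])
    print("MCA error code:", hex(decoded["mca_error_code"]))
    print("model-specific code:", hex(decoded["model_specific_code"]))
```

The PCC flag comes out set for this value, which agrees with the kernel's own "Processor context corrupt" line. What the remaining code fields mean for this particular CPU is still something mcelog --ascii or the vendor documentation would have to answer.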
I am also attaching this output. There is a lot more information in this new MCE than in the other one I sent. Perhaps some of you have new suggestions. Meanwhile, I am downgrading back to 3.11.5-1.

--
Rasmus Liland, jrl@jrl.dyndns.dk, jens.rasmus.liland@nmbu.no