[arch-general] mce after linux-3.11.5-1 on NP900X3C

Rasmus Liland jensrasmus at gmail.com
Fri Nov 21 01:24:02 UTC 2014


On 2014-11-19 22:53, Rasmus Liland wrote:
> On 2014-11-19 21:41, Mark Lee wrote:
> > 
> > To Rasmus,
> > 
> > Can you run the parts where it says "run the above through mcelog
> > --ascii" and post the contents?
> > 
> > Regards,
> > Mark
> > 
> 
> I'm attaching the output of mcelog to this message. However, I'm unsure of
> the usefulness of the output.
> 

I checked dmesg just now, after an uptime of ...
> rasmus@angrist ~ % uptime
>  02:04:01 up 1 day,  7:35,  1 user,  load average: 0.04, 0.15, 0.40
> rasmus@angrist ~ % uname -a
> Linux angrist 3.11.5-1-ARCH #1 SMP PREEMPT Mon Oct 14 08:31:43 CEST 2013 x86_64 GNU/Linux

... about 31.5 hours. It seems that after about 19.5 hours, some (possibly)
temperature-related events were causing MCE hardware errors over a ten-minute
interval:
> [70133.209654] mce: [Hardware Error]: Machine check events logged
> [70376.833053] CPU2: Core temperature above threshold, cpu clock throttled (total events = 30628)
> [70376.833056] CPU3: Core temperature above threshold, cpu clock throttled (total events = 30628)
> [70376.833061] CPU3: Package temperature above threshold, cpu clock throttled (total events = 174126)
> [70376.833070] CPU2: Package temperature above threshold, cpu clock throttled (total events = 174126)
> [70376.833074] CPU1: Package temperature above threshold, cpu clock throttled (total events = 174126)
> [70376.833077] CPU0: Package temperature above threshold, cpu clock throttled (total events = 174124)
> [70376.835060] CPU3: Core temperature/speed normal
> [70376.835064] CPU2: Core temperature/speed normal
> [70376.835070] CPU2: Package temperature/speed normal
> [70376.835074] CPU3: Package temperature/speed normal
> [70376.835087] CPU1: Package temperature/speed normal
> [70376.835090] CPU0: Package temperature/speed normal
> [70433.353800] mce: [Hardware Error]: Machine check events logged
> [70676.969501] CPU2: Core temperature/speed normal
> [70676.969505] CPU3: Core temperature/speed normal
> [70676.969511] CPU0: Package temperature above threshold, cpu clock throttled (total events = 198545)
> [70676.969516] CPU1: Package temperature above threshold, cpu clock throttled (total events = 198547)
> [70676.969522] CPU3: Package temperature above threshold, cpu clock throttled (total events = 198547)
> [70676.969545] CPU2: Package temperature above threshold, cpu clock throttled (total events = 198547)
> [70676.970519] CPU0: Package temperature/speed normal
> [70676.970522] CPU2: Package temperature/speed normal
> [70676.970524] CPU3: Package temperature/speed normal
> [70676.970526] CPU1: Package temperature/speed normal
> [70733.497978] mce: [Hardware Error]: Machine check events logged
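
To double-check the timing, here is a minimal Python sketch that reads the dmesg timestamps (seconds since boot) of the three "Machine check events logged" lines quoted above. The helper name mce_times and the regular expression are only illustrative, and the sketch assumes the default dmesg timestamp format.

```python
import re

# The three "Machine check events logged" lines from the dmesg excerpt.
LOG = """\
[70133.209654] mce: [Hardware Error]: Machine check events logged
[70433.353800] mce: [Hardware Error]: Machine check events logged
[70733.497978] mce: [Hardware Error]: Machine check events logged
"""

# dmesg prefixes each line with the seconds since boot in square brackets.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+mce: \[Hardware Error\]")


def mce_times(text):
    """Return the seconds-since-boot timestamps of the MCE lines in text."""
    times = []
    for line in text.splitlines():
        match = STAMP.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


times = mce_times(LOG)
first_hours = times[0] / 3600                  # uptime at the first MCE line
span_minutes = (times[-1] - times[0]) / 60     # first to last MCE line
gaps = [round(b - a) for a, b in zip(times, times[1:])]

print(f"first MCE line after {first_hours:.1f} h of uptime")
print(f"MCE lines span {span_minutes:.1f} min, gaps of {gaps} s")
```

The MCE lines turn out to be almost exactly 300 seconds apart. That even spacing might simply reflect periodic polling for logged events rather than the moments the errors actually happened, but that is a guess on my part.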

As the system did not reboot, it was apparently able to recover on its own.

-- 
Rasmus Liland, jrl at jrl.dyndns.dk, jens.rasmus.liland at nmbu.no 
