[arch-general] mce after linux-3.11.5-1 on NP900X3C

Mark Lee mark at markelee.com
Fri Nov 21 01:40:47 UTC 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 11/20/2014 08:24 PM, Rasmus Liland wrote:
> On 2014-11-19 22:53, Rasmus Liland wrote:
>> On 2014-11-19 21:41, Mark Lee wrote:
>>> 
>>> To Rasmus,
>>> 
>>> Can you run the parts where it says "run the above through
>>> mcelog --ascii" and post the output?
>>> 
>>> Regards, Mark
>>> 
>> 
>> I'm attaching the output of mcelog to this message. However, I'm
>> unsure of the usefulness of the output.
>> 
> 
> I checked dmesg just now, after an uptime of ...
>> rasmus at angrist ~ % uptime
>>  02:04:01 up 1 day,  7:35,  1 user,  load average: 0.04, 0.15, 0.40
>> rasmus at angrist ~ % uname -a
>> Linux angrist 3.11.5-1-ARCH #1 SMP PREEMPT Mon Oct 14 08:31:43 CEST 2013 x86_64 GNU/Linux
> 
> ... about 26 hours. It seems that after about 19 hours, some (possibly)
> temperature-related events were causing MCE hardware errors over a
> ten-minute interval:
>> [70133.209654] mce: [Hardware Error]: Machine check events logged
>> [70376.833053] CPU2: Core temperature above threshold, cpu clock throttled (total events = 30628)
>> [70376.833056] CPU3: Core temperature above threshold, cpu clock throttled (total events = 30628)
>> [70376.833061] CPU3: Package temperature above threshold, cpu clock throttled (total events = 174126)
>> [70376.833070] CPU2: Package temperature above threshold, cpu clock throttled (total events = 174126)
>> [70376.833074] CPU1: Package temperature above threshold, cpu clock throttled (total events = 174126)
>> [70376.833077] CPU0: Package temperature above threshold, cpu clock throttled (total events = 174124)
>> [70376.835060] CPU3: Core temperature/speed normal
>> [70376.835064] CPU2: Core temperature/speed normal
>> [70376.835070] CPU2: Package temperature/speed normal
>> [70376.835074] CPU3: Package temperature/speed normal
>> [70376.835087] CPU1: Package temperature/speed normal
>> [70376.835090] CPU0: Package temperature/speed normal
>> [70433.353800] mce: [Hardware Error]: Machine check events logged
>> [70676.969501] CPU2: Core temperature/speed normal
>> [70676.969505] CPU3: Core temperature/speed normal
>> [70676.969511] CPU0: Package temperature above threshold, cpu clock throttled (total events = 198545)
>> [70676.969516] CPU1: Package temperature above threshold, cpu clock throttled (total events = 198547)
>> [70676.969522] CPU3: Package temperature above threshold, cpu clock throttled (total events = 198547)
>> [70676.969545] CPU2: Package temperature above threshold, cpu clock throttled (total events = 198547)
>> [70676.970519] CPU0: Package temperature/speed normal
>> [70676.970522] CPU2: Package temperature/speed normal
>> [70676.970524] CPU3: Package temperature/speed normal
>> [70676.970526] CPU1: Package temperature/speed normal
>> [70733.497978] mce: [Hardware Error]: Machine check events logged
> 
> As the system did not reboot, it was able to self-heal.
> 
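The timestamps in the quoted log bear out the ten-minute figure: the three machine-check notices sit at roughly 70133, 70433 and 70733 seconds after boot. A minimal sketch for tallying such entries from a saved dmesg dump might look like the following, assuming Python 3. The regular expressions are modelled only on the lines quoted above (kernel 3.11), so other kernel versions may word these messages differently. The script name summarize_throttle.py is hypothetical.

```python
import re
import sys
from collections import Counter

# Patterns modelled on the dmesg lines quoted above (kernel 3.11); the exact
# wording may differ on other kernel versions.
THROTTLE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+CPU(?P<cpu>\d+): (?P<scope>Core|Package) "
    r"temperature above threshold"
)
MCE_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s+mce: \[Hardware Error\]")


def summarize(lines):
    """Count throttle messages per (cpu, scope) and collect MCE timestamps."""
    throttles = Counter()
    mce_times = []
    for line in lines:
        m = THROTTLE_RE.search(line)
        if m:
            throttles[(int(m["cpu"]), m["scope"])] += 1
            continue
        m = MCE_RE.search(line)
        if m:
            mce_times.append(float(m["ts"]))
    return throttles, mce_times


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        throttles, mce_times = summarize(fh)
    for (cpu, scope), count in sorted(throttles.items()):
        print(f"CPU{cpu} {scope}: {count} throttle message(s)")
    if mce_times:
        # dmesg is chronological, so first and last give the overall span.
        span = mce_times[-1] - mce_times[0]
        print(f"{len(mce_times)} MCE notice(s) over {span:.0f} s")
```

Usage would be along the lines of dmesg > dmesg.txt followed by python3 summarize_throttle.py dmesg.txt. Note that it counts log lines, not the kernel's own "total events" counters.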

To Rasmus,

Can you run a logger to find out which programs are causing your CPU
temperature to rise?
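Such a logger could be sketched in Python 3 on Linux. Every few seconds it samples per-process CPU time from /proc/<pid>/stat and prints the busiest processes next to a temperature reading. Treating /sys/class/thermal/thermal_zone0/temp as the CPU sensor is an assumption. On some laptops the coretemp device under /sys/class/hwmon is the better source. The file name templog.py and its interval argument are hypothetical.

```python
import os
import sys
import time
from pathlib import Path

# Assumption: thermal_zone0 reflects CPU temperature on this machine. Check
# /sys/class/thermal/thermal_zone*/type (or the coretemp hwmon device) first.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_stat(text):
    """Return (comm, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    lparen = text.index("(")
    rparen = text.rindex(")")  # comm may itself contain ')' or spaces
    comm = text[lparen + 1 : rparen]
    fields = text[rparen + 2 :].split()
    # fields[0] is stat field 3 (state), so utime (14) and stime (15) are 11, 12.
    return comm, int(fields[11]) + int(fields[12])


def snapshot():
    """Map pid -> (comm, cpu ticks) for every process we can read."""
    procs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            procs[int(entry)] = parse_stat(Path("/proc", entry, "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited between listdir and read
    return procs


def top_consumers(before, after, n=5):
    """Return (tick delta, pid, comm) for the n busiest processes between snapshots."""
    deltas = [
        (ticks - before[pid][1], pid, comm)
        for pid, (comm, ticks) in after.items()
        if pid in before and before[pid][0] == comm  # skip reused pids
    ]
    deltas.sort(reverse=True)
    return deltas[:n]


def read_temp_c():
    # sysfs thermal zones report millidegrees Celsius.
    return int(TEMP_PATH.read_text()) / 1000


def log_loop(interval):
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    before = snapshot()
    while True:
        time.sleep(interval)
        after = snapshot()
        busy = ", ".join(
            f"{comm}[{pid}] {delta / hz:.1f}s"
            for delta, pid, comm in top_consumers(before, after)
        )
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {read_temp_c():.1f}C {busy}", flush=True)
        before = after


if __name__ == "__main__" and len(sys.argv) > 1:
    log_loop(float(sys.argv[1]))
```

Run it as python3 templog.py 10 >> temp.log during a period when the throttling usually happens. The hottest readings can then be matched against the processes listed on the same lines.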

Regards,
Mark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iF4EAREIAAYFAlRumB8ACgkQZ/Z80n6+J/YI8gD/bN3dHoENwzLxK33lS0GCF2zs
cn+8X3TDDqIMWSe8lEQBAJLcUwazQrJS7R4qTOZo8gbk2NE9wSoAo1t1jaeoolCB
=mirr
-----END PGP SIGNATURE-----

