On 11/20/2014 08:24 PM, Rasmus Liland wrote:
On 2014-11-19 22:53, Rasmus Liland wrote:
On 2014-11-19 21:41, Mark Lee wrote:
To Rasmus,
Can you run the parts where it says "run the above through mcelog --ascii" and post the output?
Regards, Mark
I'm attaching the output of mcelog to this message, although I'm not sure how useful it is.
I have now checked dmesg after an uptime of ...
rasmus@angrist ~ % uptime
 02:04:01 up 1 day, 7:35, 1 user, load average: 0.04, 0.15, 0.40
rasmus@angrist ~ % uname -a
Linux angrist 3.11.5-1-ARCH #1 SMP PREEMPT Mon Oct 14 08:31:43 CEST 2013 x86_64 GNU/Linux
... about 26 hours. It seems that after about 19 hours, some (possibly temperature-related) events caused MCE hardware errors over a ten-minute interval:
[70133.209654] mce: [Hardware Error]: Machine check events logged
[70376.833053] CPU2: Core temperature above threshold, cpu clock throttled (total events = 30628)
[70376.833056] CPU3: Core temperature above threshold, cpu clock throttled (total events = 30628)
[70376.833061] CPU3: Package temperature above threshold, cpu clock throttled (total events = 174126)
[70376.833070] CPU2: Package temperature above threshold, cpu clock throttled (total events = 174126)
[70376.833074] CPU1: Package temperature above threshold, cpu clock throttled (total events = 174126)
[70376.833077] CPU0: Package temperature above threshold, cpu clock throttled (total events = 174124)
[70376.835060] CPU3: Core temperature/speed normal
[70376.835064] CPU2: Core temperature/speed normal
[70376.835070] CPU2: Package temperature/speed normal
[70376.835074] CPU3: Package temperature/speed normal
[70376.835087] CPU1: Package temperature/speed normal
[70376.835090] CPU0: Package temperature/speed normal
[70433.353800] mce: [Hardware Error]: Machine check events logged
[70676.969501] CPU2: Core temperature/speed normal
[70676.969505] CPU3: Core temperature/speed normal
[70676.969511] CPU0: Package temperature above threshold, cpu clock throttled (total events = 198545)
[70676.969516] CPU1: Package temperature above threshold, cpu clock throttled (total events = 198547)
[70676.969522] CPU3: Package temperature above threshold, cpu clock throttled (total events = 198547)
[70676.969545] CPU2: Package temperature above threshold, cpu clock throttled (total events = 198547)
[70676.970519] CPU0: Package temperature/speed normal
[70676.970522] CPU2: Package temperature/speed normal
[70676.970524] CPU3: Package temperature/speed normal
[70676.970526] CPU1: Package temperature/speed normal
[70733.497978] mce: [Hardware Error]: Machine check events logged
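The timing estimate above can be checked mechanically. Below is a minimal Python sketch (not part of mcelog or any existing tool; the function names and the script name are hypothetical) that pulls the "Machine check events logged" lines and the "temperature above threshold" lines out of saved dmesg output and summarizes them. Note that the dmesg line only reports that machine check records were logged; their contents still have to be read and decoded separately, e.g. with mcelog.

```python
import re
import sys
from collections import Counter

# A dmesg line: "[<seconds since boot>] <message>"; dmesg may pad the number with spaces.
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")
# Throttle start messages, e.g. "CPU2: Core temperature above threshold, ..."
THROTTLE_RE = re.compile(r"^CPU(?P<cpu>\d+): (?P<kind>Core|Package) temperature above threshold")
MCE_MSG = "mce: [Hardware Error]: Machine check events logged"


def summarize(lines):
    """Collect MCE-logged timestamps and count throttle messages per (cpu, kind)."""
    mce_times = []
    throttles = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a timestamped dmesg line
        ts, msg = float(m.group("ts")), m.group("msg")
        if msg == MCE_MSG:
            mce_times.append(ts)
            continue
        t = THROTTLE_RE.match(msg)
        if t:
            throttles[(int(t.group("cpu")), t.group("kind"))] += 1
    return {"mce_times": mce_times, "throttles": throttles}


def report(summary):
    """Format a summary as human-readable lines."""
    out = []
    times = summary["mce_times"]
    if times:
        out.append(f"first MCE line at {times[0] / 3600:.1f} h after boot")
        out.append(f"MCE lines span {(times[-1] - times[0]) / 60:.1f} min ({len(times)} lines)")
    for (cpu, kind), n in sorted(summary["throttles"].items()):
        out.append(f"CPU{cpu} {kind}: {n} throttle message(s)")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: dmesg > dmesg.txt; python3 mce_summary.py dmesg.txt
    with open(sys.argv[1]) as f:
        for line in report(summarize(f)):
            print(line)
```

For the excerpt above, the first MCE line is at about 19.5 h after boot (70133 s / 3600) and the three MCE lines span about 10.0 minutes, which agrees with the estimate in the text.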
As the system did not reboot, it was able to self-heal.
To Rasmus,
Can you run a logger to find out which programs are causing your CPU temperature to rise?
Regards, Mark
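One way to do that is a small polling script. The sketch below is an illustration under stated assumptions, not an existing tool: it assumes a Linux system exposing /sys/class/thermal and /proc/<pid>/stat, and every few seconds it prints the thermal-zone temperatures next to the processes that used the most CPU time (utime + stime, in clock ticks) during that interval. Which thermal zones exist, and whether per-core temperatures show up there at all, depends on the hardware and drivers; the script name and its interval argument are hypothetical.

```python
import glob
import os
import sys
import time


def read_temps():
    """Return {"thermal_zoneN:type": degrees C} from /sys/class/thermal (Linux only)."""
    temps = {}
    for zone in sorted(glob.glob("/sys/class/thermal/thermal_zone*")):
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())  # sysfs reports millidegrees Celsius
        except (OSError, ValueError):
            continue  # zone missing or unreadable
        temps[f"{os.path.basename(zone)}:{kind}"] = millideg / 1000.0
    return temps


def parse_stat(text):
    """Parse /proc/<pid>/stat text; return (comm, utime + stime) in clock ticks."""
    # comm sits in parentheses and may contain spaces, so split on the last ')'.
    lpar, rpar = text.index("("), text.rindex(")")
    comm = text[lpar + 1:rpar]
    fields = text[rpar + 2:].split()  # fields[0] is field 3 (state)
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return comm, utime + stime


def cpu_ticks():
    """Snapshot {pid: (comm, ticks)} for all readable processes."""
    result = {}
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                result[int(path.split("/")[2])] = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or is unreadable
    return result


def top_consumers(before, after, n=5):
    """Rank processes by CPU ticks used between two snapshots, largest first."""
    deltas = [(ticks - before[pid][1], pid, comm)
              for pid, (comm, ticks) in after.items() if pid in before]
    deltas.sort(reverse=True)
    return deltas[:n]


def log_loop(interval=5.0, n=5):
    """Print a timestamp, temperatures and the top n CPU users every interval seconds."""
    prev = cpu_ticks()
    while True:
        time.sleep(interval)
        cur = cpu_ticks()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        temps = " ".join(f"{k}={v:.1f}C" for k, v in read_temps().items())
        top = ", ".join(f"{comm}[{pid}]:{d}" for d, pid, comm in top_consumers(prev, cur, n))
        print(f"{stamp} {temps} | {top}", flush=True)
        prev = cur


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 templog.py 5 >> templog.txt
    log_loop(interval=float(sys.argv[1]))
```

The log uses wall-clock times while plain dmesg uses seconds since boot; `dmesg -T` prints human-readable timestamps that make it easier to line the two up.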