[arch-general] linux 3.1-4 - two i686 lockups after ~ 5 hours of operations. two x86_64 seem OK
David C. Rankin
drankinatty at suddenlinkmail.com
Thu Nov 10 14:16:16 EST 2011
On 11/10/2011 12:56 PM, David J. Haines wrote:
> On Thu, Nov 10, 2011 at 1:44 PM, Richard Schütz<r.schtz at t-online.de> wrote:
>> On 10.11.2011 18:47, David C. Rankin wrote:
>>> Upgraded 5 i686 boxes and 2 x86_64 boxes to linux 3.1-4 yesterday night.
>>> This morning, one i686 server is dead, other i686 box responded to xterm
>>> (return input) and then locked (ssh connection was left up after login
>>> to confirm reboot). Two other i686 boxes (under no load) still running.
>>> The boxes are remote. I'll pull the logs when I get to the site and
>>> send. Anybody else seeing this with linux 3.1-4?
>> I had lockups on my notebook and netbook during normal usage. Both
>> have an Intel processor. The AMD-based desktop machine has had no problems so
>> far. All systems are running linux 3.1-4 x86_64.
>>  http://pastebin.com/VAnTLKtP
>>  http://pastebin.com/64QKSJTN
>> Richard Schütz
> I'm getting lockups on an i5 box with Intel graphics running x86_64
> while I'm using it. This has been happening while I've been using the
> computer and has been happening since 3.0.7-1. 3.0.6-2, however,
> seemed perfectly fine.
> David J. Haines
> dhaines at gmail.com
Hmm.. Absolutely no help from the logs on the box that locked:
Nov 10 03:20:04 phoenix -- MARK --
Nov 10 03:25:34 phoenix dhcpd: DHCPREQUEST for 192.168.7.124 from 00:11:43:22:50:08 via eth0
Nov 10 03:25:34 phoenix dhcpd: DHCPACK on 192.168.7.124 to 00:11:43:22:50:08 via
Nov 10 12:44:33 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpuset
Nov 10 12:44:33 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpu
Obviously something occurred after 03:25:34, but there is no indication of what.
The second box, which I thought had locked, had not locked at all. I just had the
uncanny coincidence of trying it during one of its spontaneous reboots due to
hwclock drift (I'll create a cron job to correct this). The boxes are on the same
LAN subnet. The only SWAG I have is that once the box with the drifting clock got
far enough out of time, any net communication with the box that locked may have
caused it to panic over the time sync issue.
(But that is wrong, because once running, the sysclock is the only clock that
matters, right? Yet it can't be all wrong, otherwise there is no explanation
for the spontaneous reboot due to clock drift. A digital paradox, so to speak :)
Richard, David - check your hardware clock with "# hwclock -r" and compare that
to the time returned by "# date". If they are hours apart, make sure your
sysclock is correct, then set the hardware clock from the sysclock with
"# hwclock -w". Worth checking regardless. I know this used to be done on boot
or shutdown, and I don't know why it isn't anymore. I'll do some more digging.
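
For anyone who would rather script the comparison than eyeball two timestamps,
here is a minimal Python sketch. It is only an illustration under stated
assumptions, not something taken from the boxes in this thread: it assumes the
kernel exposes the hardware clock at /sys/class/rtc/rtc0/since_epoch and that
the hardware clock is kept in UTC (if it is kept in local time, the reported
drift will include the timezone offset). The function names and the 60-second
threshold are made up for the example.

```python
import time
from pathlib import Path

# Assumption: the RTC is exposed through the Linux rtc class driver at this path.
RTC_EPOCH_PATH = Path("/sys/class/rtc/rtc0/since_epoch")


def clock_drift_seconds(rtc_epoch, sys_epoch):
    """Return hardware clock minus system clock, in seconds."""
    return rtc_epoch - sys_epoch


def needs_sync(drift, threshold=60.0):
    """True if the clocks differ by more than `threshold` seconds (arbitrary default)."""
    return abs(drift) > threshold


def read_rtc_epoch(path=RTC_EPOCH_PATH):
    """Read the hardware clock as seconds since the epoch (assumes the RTC is in UTC)."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    drift = clock_drift_seconds(read_rtc_epoch(), time.time())
    print(f"hardware clock is {drift:+.0f} s relative to the system clock")
    if needs_sync(drift):
        # Only a hint; verify the system clock first, then run as root.
        print("clocks are far apart; consider 'hwclock -w' once the sysclock is correct")
```

Run as a script, it prints the offset and a hint when the two clocks are more
than a minute apart. It never writes to the hardware clock itself; that step is
still the manual "# hwclock -w" described above.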
David C. Rankin, J.D.,P.E.