[arch-general] apcupsd or rtkit - Caused time to go haywire on shutdown/reboot??
Guys, Here is one for the brain-trust to consider. Something totally screwed up the time on my computer during an apcupsd commanded shutdown and the subsequent reboot resulting in a 479 Minute time disparity. I don't think this was handled on the first reboot after the power outage because after reboot, I worked for 20-30 minutes before the OS imploded rebooted automatically. I suspect the time discrepancy somehow carried through the boot and went along until some kernel mechanism totally went bonkers and shut the system down. In 12 years of Linux use, this is the first time I've seen something like this happen. (note the time below) The log is: Oct 18 09:21:07 providence -- MARK -- Oct 18 09:24:18 providence apcupsd[949]: Power failure. Oct 18 09:24:24 providence apcupsd[949]: Running on UPS batteries. Oct 18 09:24:24 providence wall[11985]: wall: user root broadcasted 1 lines (56 chars) Oct 18 09:24:24 providence postfix/pickup[11962]: 3468078891: uid=0 from=<root> Oct 18 09:24:24 providence postfix/cleanup[11988]: 3468078891: message-id=<20111018142424.3468078891@providence.rlfpllc.com> Oct 18 09:24:24 providence postfix/qmgr[1338]: 3468078891: from=<root@rlfpllc.com>, size=1525, nrcpt=1 (queue active) Oct 18 09:24:24 providence postfix/smtp[11990]: 3468078891: to=<root@rlfpllc.com>, orig_to=<root>, relay=phoenix.rlfpllc.com[192.168.7.15]:25, delay=0.13, delays=0.07/0.01/0.02/0.04, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4699114AB47) Oct 18 09:24:24 providence postfix/qmgr[1338]: 3468078891: removed Oct 18 09:25:16 providence shutdown[13133]: shutting down for system halt Oct 18 09:25:17 providence init: Switching to runlevel: 0 Oct 18 14:25:18 providence rtkit-daemon[2768]: Exiting cleanly. Oct 18 14:25:18 providence rtkit-daemon[2768]: Demoting known real-time threads. Oct 18 14:25:18 providence rtkit-daemon[2768]: Successfully demoted thread 2766 of process 2766 (/usr/bin/pulseaudio (deleted)). Oct 18 14:25:18 providence rtkit-daemon[2768]: Demoted 1 threads. Oct 18 14:25:18 providence rtkit-daemon[2768]: Exiting watchdog thread. Oct 18 14:25:18 providence rtkit-daemon[2768]: Exiting canary thread. Oct 18 09:25:18 providence apcupsd[949]: apcupsd exiting, signal 15 Oct 18 09:25:19 providence apcupsd[949]: apcupsd shutdown succeeded Oct 18 09:25:19 providence nmbd[1339]: [2011/10/18 09:25:19.139254, 0] nmbd/nmbd.c:66(terminate) Oct 18 09:25:19 providence nmbd[1339]: Got SIGTERM: going down... Oct 18 09:25:19 providence postfix/postfix-script[13294]: stopping the Postfix mail system Oct 18 09:25:19 providence postfix/postfix-script[13297]: waiting for the Postfix mail system to terminate Oct 18 09:25:19 providence postfix/master[1330]: terminating on signal 15 Oct 18 09:25:22 providence ntpd[944]: ntpd exiting on signal 15 Oct 18 09:25:22 providence dhcpcd[13471]: sending signal 1 to pid 762 Oct 18 09:25:22 providence dhcpcd[762]: received SIGHUP, releasing Oct 18 09:25:22 providence dhcpcd[762]: eth0: releasing lease of 192.168.7.124 Oct 18 09:25:22 providence dhcpcd[13471]: waiting for pid 762 to exit Oct 18 09:25:23 providence dhcpcd[762]: eth0: removing interface Oct 18 09:25:23 providence kernel: [518721.438558] tg3 0000:02:00.0: PME# enabled Oct 18 09:25:23 providence kernel: [518721.438578] pcieport 0000:00:1c.0: wake-up capability enabled by ACPI Oct 18 09:25:25 providence ntfs-3g[524]: Unmounting /dev/sdb2 () Oct 18 09:25:25 providence syslog-ng[707]: Termination requested via signal, terminating; Oct 18 09:25:25 providence syslog-ng[707]: syslog-ng shutting down; version='3.2.4' Oct 18 17:28:06 providence kernel: [ 0.000000] Initializing cgroup subsys cpuset Oct 18 17:28:06 providence kernel: [ 0.000000] Initializing cgroup subsys cpu Oct 18 17:28:06 providence kernel: [ 0.000000] Linux version 3.0-ARCH (tobias@T-POWA-LX) (gcc version 4.6.1 20110819 (prerelease) (GCC) ) #1 SMP PREEMPT Fri Oct 7 10:11:05 UTC 2011 Oct 18 17:28:06 providence kernel: [ 0.000000] BIOS-provided physical RAM map: Oct 18 17:28:06 providence kernel: [ 0.000000] BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) I understand the jump from 09:25:17 to 14:25:18, that just reflects the tzoffset for CDT. What I can't explain is the 09:25:25 to 17:28:06 jump. The power was only out for 5 minutes and the box booted as soon as power was restored. The log entry after reboot that caught my eye was: Oct 18 11:19:01 providence crond[978]: time disparity of -479 minutes detected I don't have a clue what could have caused this time shift unless it was some spurious voltage issue. If not, then something is definitely funny with the time handling in either apcupsd or rtkit. What say the experts, problem or just weird voltage induced time skew? -- David C. Rankin, J.D.,P.E.
On (10/18/11 11:55), David C. Rankin wrote: -~> Guys, -~> -~> Here is one for the brain-trust to consider. Something totally -~> screwed up the time on my computer during an apcupsd commanded -~> shutdown and the subsequent reboot resulting in a 479 Minute time -~> disparity. I don't think this was handled on the first reboot -~> after the power outage because after reboot, I worked for 20-30 -~> minutes before the OS imploded rebooted automatically. I suspect -~> the time discrepancy somehow carried through the boot and went -~> along until some kernel mechanism totally went bonkers and shut -~> the system down. In 12 years of Linux use, this is the first time -~> I've seen something like this happen. (note the time below) The -~> log is: -~> -~> Oct 18 09:21:07 providence -- MARK -- -~> Oct 18 09:24:18 providence apcupsd[949]: Power failure. -~> Oct 18 09:24:24 providence apcupsd[949]: Running on UPS batteries. -~> Oct 18 09:24:24 providence wall[11985]: wall: user root -~> broadcasted 1 lines (56 chars) -~> Oct 18 09:24:24 providence postfix/pickup[11962]: 3468078891: uid=0 from=<root> -~> Oct 18 09:24:24 providence postfix/cleanup[11988]: 3468078891: -~> message-id=<20111018142424.3468078891@providence.rlfpllc.com> -~> Oct 18 09:24:24 providence postfix/qmgr[1338]: 3468078891: -~> from=<root@rlfpllc.com>, size=1525, nrcpt=1 (queue active) -~> Oct 18 09:24:24 providence postfix/smtp[11990]: 3468078891: -~> to=<root@rlfpllc.com>, orig_to=<root>, -~> relay=phoenix.rlfpllc.com[192.168.7.15]:25, delay=0.13, -~> delays=0.07/0.01/0.02/0.04, dsn=2.0.0, status=sent (250 2.0.0 Ok: -~> queued as 4699114AB47) -~> Oct 18 09:24:24 providence postfix/qmgr[1338]: 3468078891: removed -~> Oct 18 09:25:16 providence shutdown[13133]: shutting down for system halt -~> Oct 18 09:25:17 providence init: Switching to runlevel: 0 -~> Oct 18 14:25:18 providence rtkit-daemon[2768]: Exiting cleanly. -~> Oct 18 14:25:18 providence rtkit-daemon[2768]: Demoting known real-time threads. -~> Oct 18 14:25:18 providence rtkit-daemon[2768]: Successfully -~> demoted thread 2766 of process 2766 (/usr/bin/pulseaudio -~> (deleted)). -~> Oct 18 14:25:18 providence rtkit-daemon[2768]: Demoted 1 threads. -~> Oct 18 14:25:18 providence rtkit-daemon[2768]: Exiting watchdog thread. -~> Oct 18 14:25:18 providence rtkit-daemon[2768]: Exiting canary thread. -~> Oct 18 09:25:18 providence apcupsd[949]: apcupsd exiting, signal 15 -~> Oct 18 09:25:19 providence apcupsd[949]: apcupsd shutdown succeeded -~> Oct 18 09:25:19 providence nmbd[1339]: [2011/10/18 -~> 09:25:19.139254, 0] nmbd/nmbd.c:66(terminate) -~> Oct 18 09:25:19 providence nmbd[1339]: Got SIGTERM: going down... -~> Oct 18 09:25:19 providence postfix/postfix-script[13294]: stopping -~> the Postfix mail system -~> Oct 18 09:25:19 providence postfix/postfix-script[13297]: waiting -~> for the Postfix mail system to terminate -~> Oct 18 09:25:19 providence postfix/master[1330]: terminating on signal 15 -~> Oct 18 09:25:22 providence ntpd[944]: ntpd exiting on signal 15 -~> Oct 18 09:25:22 providence dhcpcd[13471]: sending signal 1 to pid 762 -~> Oct 18 09:25:22 providence dhcpcd[762]: received SIGHUP, releasing -~> Oct 18 09:25:22 providence dhcpcd[762]: eth0: releasing lease of 192.168.7.124 -~> Oct 18 09:25:22 providence dhcpcd[13471]: waiting for pid 762 to exit -~> Oct 18 09:25:23 providence dhcpcd[762]: eth0: removing interface -~> Oct 18 09:25:23 providence kernel: [518721.438558] tg3 0000:02:00.0: PME# enabled -~> Oct 18 09:25:23 providence kernel: [518721.438578] pcieport -~> 0000:00:1c.0: wake-up capability enabled by ACPI -~> Oct 18 09:25:25 providence ntfs-3g[524]: Unmounting /dev/sdb2 () -~> Oct 18 09:25:25 providence syslog-ng[707]: Termination requested -~> via signal, terminating; -~> Oct 18 09:25:25 providence syslog-ng[707]: syslog-ng shutting -~> down; version='3.2.4' -~> Oct 18 17:28:06 providence kernel: [ 0.000000] Initializing -~> cgroup subsys cpuset -~> Oct 18 17:28:06 providence kernel: [ 0.000000] Initializing cgroup subsys cpu -~> Oct 18 17:28:06 providence kernel: [ 0.000000] Linux version -~> 3.0-ARCH (tobias@T-POWA-LX) (gcc version 4.6.1 20110819 -~> (prerelease) (GCC) ) #1 SMP PREEMPT Fri Oct 7 10:11:05 UTC 2011 -~> Oct 18 17:28:06 providence kernel: [ 0.000000] BIOS-provided physical RAM map: -~> Oct 18 17:28:06 providence kernel: [ 0.000000] BIOS-e820: -~> 0000000000000000 - 00000000000a0000 (usable) -~> -~> I understand the jump from 09:25:17 to 14:25:18, that just -~> reflects the tzoffset for CDT. What I can't explain is the -~> 09:25:25 to 17:28:06 jump. The power was only out for 5 minutes -~> and the box booted as soon as power was restored. The log entry -~> after reboot that caught my eye was: -~> -~> Oct 18 11:19:01 providence crond[978]: time disparity of -479 minutes detected -~> -~> I don't have a clue what could have caused this time shift -~> unless it was some spurious voltage issue. If not, then something -~> is definitely funny with the time handling in either apcupsd or -~> rtkit. -~> -~> What say the experts, problem or just weird voltage induced time skew? -~> -~> -~> -- -~> David C. Rankin, J.D.,P.E. Well, a simple search shows that rtkit has an open bug report, https://bugzilla.redhat.com/show_bug.cgi?id=698040, which explains why you have its logs in UTC. Regarding the rest, are you saying the system spontaneously rebooted after a successfull recovery from power outage? Also, what does grep "system clock" /var/log/messages.log say? -- Leonid Isaev GnuPG key ID: 164B5A6D Key fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
On 10/18/2011 03:05 PM, Leonid Isaev wrote:
Well, a simple search shows that rtkit has an open bug report, https://bugzilla.redhat.com/show_bug.cgi?id=698040, which explains why you have its logs in UTC. Regarding the rest, are you saying the system spontaneously rebooted after a successfull recovery from power outage? Also, what does grep "system clock" /var/log/messages.log say?
Yes, after looking further, the rtc issue didn't bother me. I knew that was just timezone offset stuff. The big issue was the reboot and the 8 hour disparity instead of 5. The system clock information in the logs is puzzling as well: 15:45 providence:~> sudo grep 'system clock' /var/log/everything.log Oct 18 17:28:06 providence kernel: [ 1.676407] rtc_cmos 00:05: setting system clock to 2011-10-18 22:27:16 UTC (1318976836) Oct 18 19:15:10 providence kernel: [ 1.676460] rtc_cmos 00:05: setting system clock to 2011-10-19 00:14:30 UTC (1318983270) Oct 18 20:26:06 providence kernel: [ 1.676356] rtc_cmos 00:05: setting system clock to 2011-10-19 01:25:40 UTC (1318987540) Oct 18 20:28:10 providence kernel: [ 1.676507] rtc_cmos 00:05: setting system clock to 2011-10-19 01:27:56 UTC (1318987676) Oct 18 20:37:40 providence kernel: [ 1.676403] rtc_cmos 00:05: setting system clock to 2011-10-19 01:37:26 UTC (1318988246) That's 3 spontaneous reboots in 11 minutes. Something is really off. -- David C. Rankin, J.D.,P.E.
On (10/18/11 15:49), David C. Rankin wrote: -~> On 10/18/2011 03:05 PM, Leonid Isaev wrote: -~> >Well, a simple search shows that rtkit has an open bug report, -~> >https://bugzilla.redhat.com/show_bug.cgi?id=698040, which explains why you have -~> >its logs in UTC. Regarding the rest, are you saying the system spontaneously -~> >rebooted after a successfull recovery from power outage? Also, what does grep -~> >"system clock" /var/log/messages.log say? -~> -~> Yes, after looking further, the rtc issue didn't bother me. I knew -~> that was just timezone offset stuff. The big issue was the reboot -~> and the 8 hour disparity instead of 5. The system clock -~> information in the logs is puzzling as well: -~> -~> 15:45 providence:~> sudo grep 'system clock' /var/log/everything.log -~> Oct 18 17:28:06 providence kernel: [ 1.676407] rtc_cmos 00:05: -~> setting system clock to 2011-10-18 22:27:16 UTC (1318976836) -~> Oct 18 19:15:10 providence kernel: [ 1.676460] rtc_cmos 00:05: -~> setting system clock to 2011-10-19 00:14:30 UTC (1318983270) -~> Oct 18 20:26:06 providence kernel: [ 1.676356] rtc_cmos 00:05: -~> setting system clock to 2011-10-19 01:25:40 UTC (1318987540) -~> Oct 18 20:28:10 providence kernel: [ 1.676507] rtc_cmos 00:05: -~> setting system clock to 2011-10-19 01:27:56 UTC (1318987676) -~> Oct 18 20:37:40 providence kernel: [ 1.676403] rtc_cmos 00:05: -~> setting system clock to 2011-10-19 01:37:26 UTC (1318988246) -~> -~> That's 3 spontaneous reboots in 11 minutes. Something is really off. -~> -~> -- -~> David C. Rankin, J.D.,P.E. Nothing puzzling here: the kernel is working correctly but your RTC is wrong (no idea how this could happen though). If by "spontaneous" you mean "no errors in the logs", then it's a hardware problem. As a first step, I would kill apcupsd and get rid of UPS. -- Leonid Isaev GnuPG key ID: 164B5A6D Key fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
On 10/18/2011 04:33 PM, Leonid Isaev wrote:
-~> That's 3 spontaneous reboots in 11 minutes. Something is really off. -~> -~> -- -~> David C. Rankin, J.D.,P.E.
Nothing puzzling here: the kernel is working correctly but your RTC is wrong (no idea how this could happen though). If by "spontaneous" you mean "no errors in the logs", then it's a hardware problem. As a first step, I would kill apcupsd and get rid of UPS.
No errors in the logs, to the RTC looks like the culprit. After fixing the RTC, I haven't experienced any more of the reboots. Bizarre... This box has run fine for years, then following the power outage, the RTC went nuts. I think that was simply the issue. However, I don't think we will ever explain why the RTC became an issue that caused the reboots. Looking at all the logs, the RTC has been all over the place this past month: Oct 19 13:15:02 providence crond[955]: time disparity of 301 minutes detected Oct 18 12:42:00 providence crond[958]: time disparity of -479 minutes detected Oct 18 12:32:00 providence crond[1180]: time disparity of -479 minutes detected Oct 18 11:19:01 providence crond[978]: time disparity of -479 minutes detected Oct 18 09:44:01 providence crond[953]: time disparity of -478 minutes detected Oct 12 09:24:01 providence crond[942]: time disparity of -79 minutes detected Oct 7 10:55:59 providence crond[964]: time disparity of 253 minutes detected Oct 5 07:55:57 providence crond[921]: time disparity of 383 minutes detected Oct 3 16:23:01 providence crond[954]: time disparity of 301 minutes detected 300 is correct for the 5 hour offset for CDT, I can't explain the wild swings from Oct 3 to Oct 18. That is a 862 minute difference between Oct 5 and Oct 18. init scripts? kernels? The other box that experienced this problem had the following: Oct 5 08:27:32 localhost crond[789]: time disparity of -546 minutes detected Sep 27 22:18:01 localhost crond[794]: time disparity of -475 minutes detected both those dates must have been kernel updates, because that is the only time that box gets restarted. Strange, thanks for your help. -- David C. Rankin, J.D.,P.E.
On (10/19/11 14:19), David C. Rankin wrote: -~> On 10/18/2011 04:33 PM, Leonid Isaev wrote: -~> >-~> That's 3 spontaneous reboots in 11 minutes. Something is really off. -~> >-~> -~> >-~> -- -~> >-~> David C. Rankin, J.D.,P.E. -~> > -~> >Nothing puzzling here: the kernel is working correctly but your RTC is wrong -~> >(no idea how this could happen though). If by "spontaneous" you mean "no errors -~> >in the logs", then it's a hardware problem. As a first step, I would kill -~> >apcupsd and get rid of UPS. -~> -~> No errors in the logs, to the RTC looks like the culprit. After -~> fixing the RTC, I haven't experienced any more of the reboots. -~> Bizarre... This box has run fine for years, then following the -~> power outage, the RTC went nuts. I think that was simply the -~> issue. However, I don't think we will ever explain why the RTC -~> became an issue that caused the reboots. Looking at all the logs, -~> the RTC has been all over the place this past month: -~> -~> Oct 19 13:15:02 providence crond[955]: time disparity of 301 minutes detected -~> Oct 18 12:42:00 providence crond[958]: time disparity of -479 minutes detected -~> Oct 18 12:32:00 providence crond[1180]: time disparity of -479 minutes detected -~> Oct 18 11:19:01 providence crond[978]: time disparity of -479 minutes detected -~> Oct 18 09:44:01 providence crond[953]: time disparity of -478 minutes detected -~> Oct 12 09:24:01 providence crond[942]: time disparity of -79 minutes detected -~> Oct 7 10:55:59 providence crond[964]: time disparity of 253 minutes detected -~> Oct 5 07:55:57 providence crond[921]: time disparity of 383 minutes detected -~> Oct 3 16:23:01 providence crond[954]: time disparity of 301 minutes detected -~> -~> 300 is correct for the 5 hour offset for CDT, I can't explain the -~> wild swings from Oct 3 to Oct 18. That is a 862 minute difference -~> between Oct 5 and Oct 18. init scripts? kernels? -~> -~> The other box that experienced this problem had the following: -~> -~> Oct 5 08:27:32 localhost crond[789]: time disparity of -546 minutes detected -~> Sep 27 22:18:01 localhost crond[794]: time disparity of -475 minutes detected -~> -~> both those dates must have been kernel updates, because that is -~> the only time that box gets restarted. Strange, thanks for your -~> help. -~> -~> -- -~> David C. Rankin, J.D.,P.E. OK. But few remarks: (1) Crond or any other program including syslog must use the current timezone unless you specifically instructed it to use UTC, so 5h offsets are not fine. The rtkit stuff IS A BUG related to chroot being unable to set a correct TZ. (2) Either your motherboard is hosed or the RTC battery is discharged. Do you have a large time drift (like an hour per day) reported by the hwclock? If you have time offsets only when rebooted, I think the second option is more likely. -- Leonid Isaev GnuPG key ID: 164B5A6D Key fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
On Wed, Oct 19, 2011 at 14:46, Leonid Isaev <lisaev@umail.iu.edu> wrote:
On (10/19/11 14:19), David C. Rankin wrote: -~> On 10/18/2011 04:33 PM, Leonid Isaev wrote: -~> >-~> That's 3 spontaneous reboots in 11 minutes. Something is really off. -~> >-~> -~> >-~> -- -~> >-~> David C. Rankin, J.D.,P.E. -~> > -~> >Nothing puzzling here: the kernel is working correctly but your RTC is wrong -~> >(no idea how this could happen though). If by "spontaneous" you mean "no errors -~> >in the logs", then it's a hardware problem. As a first step, I would kill -~> >apcupsd and get rid of UPS. -~> -~> No errors in the logs, to the RTC looks like the culprit. After -~> fixing the RTC, I haven't experienced any more of the reboots. -~> Bizarre... This box has run fine for years, then following the -~> power outage, the RTC went nuts. I think that was simply the -~> issue. However, I don't think we will ever explain why the RTC -~> became an issue that caused the reboots. Looking at all the logs, -~> the RTC has been all over the place this past month: -~> -~> Oct 19 13:15:02 providence crond[955]: time disparity of 301 minutes detected -~> Oct 18 12:42:00 providence crond[958]: time disparity of -479 minutes detected -~> Oct 18 12:32:00 providence crond[1180]: time disparity of -479 minutes detected -~> Oct 18 11:19:01 providence crond[978]: time disparity of -479 minutes detected -~> Oct 18 09:44:01 providence crond[953]: time disparity of -478 minutes detected -~> Oct 12 09:24:01 providence crond[942]: time disparity of -79 minutes detected -~> Oct 7 10:55:59 providence crond[964]: time disparity of 253 minutes detected -~> Oct 5 07:55:57 providence crond[921]: time disparity of 383 minutes detected -~> Oct 3 16:23:01 providence crond[954]: time disparity of 301 minutes detected -~> -~> 300 is correct for the 5 hour offset for CDT, I can't explain the -~> wild swings from Oct 3 to Oct 18. That is a 862 minute difference -~> between Oct 5 and Oct 18. init scripts? kernels? -~> -~> The other box that experienced this problem had the following: -~> -~> Oct 5 08:27:32 localhost crond[789]: time disparity of -546 minutes detected -~> Sep 27 22:18:01 localhost crond[794]: time disparity of -475 minutes detected -~> -~> both those dates must have been kernel updates, because that is -~> the only time that box gets restarted. Strange, thanks for your -~> help. -~> -~> -- -~> David C. Rankin, J.D.,P.E.
OK. But few remarks: (1) Crond or any other program including syslog must use the current timezone unless you specifically instructed it to use UTC, so 5h offsets are not fine. The rtkit stuff IS A BUG related to chroot being unable to set a correct TZ.
(2) Either your motherboard is hosed or the RTC battery is discharged. Do you have a large time drift (like an hour per day) reported by the hwclock? If you have time offsets only when rebooted, I think the second option is more likely.
-- Leonid Isaev GnuPG key ID: 164B5A6D Key fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
Late to the party, but I just had a box jump off by 360 minutes. It was used this morning then sat all day. Bios is set to utc, rc.conf is set for utc, and time zone is correct. HARDWARECLOCK="UTC" TIMEZONE="America/Chicago" Local time was 23:04 Nov 20, and the box showed 05:04 Nov 21. I updated the box yesterday. Turns out there was an error in /etc/ntp.conf. I didn't check file and line 37 [ Allow localhost ] caused ntpd to through a shoe. Proof that laziness is not always the best policy. This may be totally unrelated to your problem but thought I would pass this on. Myra -- Life's fun when your sick and psychotic!
On 10/18/2011 11:55 AM, David C. Rankin wrote:
Guys,
Here is one for the brain-trust to consider. Something totally screwed up the time on my computer during an apcupsd commanded shutdown and the subsequent reboot resulting in a 479 Minute time disparity. I don't think this was handled on the first reboot after the power outage because after reboot, I worked for 20-30 minutes before the OS imploded rebooted automatically. I suspect the time discrepancy somehow carried through the boot and went along until some kernel mechanism totally went bonkers and shut the system down. <snip>
I don't have a clue what could have caused this time shift unless it was some spurious voltage issue. If not, then something is definitely funny with the time handling in either apcupsd or rtkit.
What say the experts, problem or just weird voltage induced time skew?
I think I have found part of the problem, but I cannot explain why the hwclock wasn't in sync with the sysclock. The box has now rebooted 4 more times, each time there is a time discrepancy of ~479 minutes. [15:06 providence:/home/david] # grep disparity /var/log/everything.log Oct 18 09:44:01 providence crond[953]: time disparity of -478 minutes detected Oct 18 11:19:01 providence crond[978]: time disparity of -479 minutes detected Oct 18 12:32:00 providence crond[1180]: time disparity of -479 minutes detected Oct 18 12:42:00 providence crond[958]: time disparity of -479 minutes detected hwclock was way out: [14:44 providence:/home/david] # hwclock -r Tue 18 Oct 2011 10:44:19 PM CDT -0.436795 seconds huh??? [14:44 providence:/home/david] # hwclock -w [14:44 providence:/home/david] # hwclock -r Tue 18 Oct 2011 02:44:52 PM CDT -0.578618 seconds I don't recall the specifics, but I recall a discussion about a change in the way the OS keeps the hwclock in sync. I could be way off, but doesn't Arch maintain the hwclock with --systohc? With ntp running the sysclock was always correct after it synced, but for some reason, the hwclock had drifted out of sync by this much? Granted, this box usually runs from kernel update to kernel update without a reboot, but still is it normal to accumulate that much hwclock drift? I have checked 5 other Arch boxes and 4 were fine, but I have found another with a hwclock issue: date && sudo hwclock -r Tue Oct 18 14:58:24 CDT 2011 Tue Oct 18 23:58:00 2011 -0.438598 seconds Of the 6 boxes checked, 3 were Dell's and 2 out of 3 Dells had the hardware clock skew. I don't know what to make of it. Bad hardware? Anybody else seeing anything like this? Sorry for the noise, but if this is something other than an "idiot, check your hwclock in the BIOS issue.." I would welcome any other thoughts or proposed checks. Thanks. -- David C. Rankin, J.D.,P.E.
participants (3)
-
David C. Rankin
-
Leonid Isaev
-
Myra Nelson