On 3/18/13, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
Guys,
I have a server that hard-locks every week or two. The log entries always look the same: there is a postfix/smtp transaction in progress when the lock occurs. After the lockup, the next reboot drops to maintenance mode, and there are always 4 inodes in an orphaned linked list that fsck fixes, after which the machine reboots normally. The log entries just prior to the lockup look like this:
Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max connection rate 1/60s for (smtp:213.199.243.30) at Mar 17 16:01:52
Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max connection count 1 for (smtp:213.199.243.30) at Mar 17 16:01:52
Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max cache size 1 at Mar 17 16:01:52
Mar 17 16:14:52 phoenix postfix/qmgr[1019]: 81963E9720: from=<inconsiderableka04@gil.com.au>, size=7485, nrcpt=1 (queue active)
Mar 17 16:14:52 phoenix postfix/smtp[26899]: 81963E9720: to=<**snipped**@3111skyline.com>, relay=3111skyline.com[66.76.63.120]:25, delay=1118, delays=1118/0.02/0.16/0.17, dsn=4.7.1, status=deferred (host 3111skyline.com[66.76.63.120] said: 450 4.7.1 Client host rejected: cannot find your hostname, [66.76.63.60] (in reply to RCPT TO command))
Mar 18 07:34:19 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpuset
Mar 18 07:34:19 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpu
Mar 18 07:34:19 phoenix kernel: [ 0.000000] Linux version 3.4.7-1-ARCH (tobias@T-POWA-LX) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1
I cannot find any connection between postfix/smtp and the lockup by searching the web. So I'm asking here: has anyone else seen a lockup where the last log entry is a postfix/smtp entry, followed by an error about 4 orphaned inodes on reboot? This has occurred multiple times over the past year or so. memtest completes without error, and the drives show no other errors or issues. Drive temps are stable at:
/dev/sda: ST3250410AS: 35°C
/dev/sdb: ST3250410AS: 39°C
Any feedback welcomed. Otherwise, it looks like this has to be hardware.
What about df/tmpfs overflows etc., to cover the obvious sources of error?

Do you have that email 81963E9720 somewhere in lost+found, or could you otherwise make sure it survives the crash? I would be surprised if that email is making things crash, but who knows.

One of the things that caught my eye was the 450 error, for which a quick google pointed me to [1]. As this is something my boss was also fighting with this week, I thought I'd read it quickly. It doesn't look that hard if you compute English, which the people I work with don't...

To examine this more thoroughly, we'd need your postfix config; the main.cf file is the most likely to be revealing.

cheers!
mar77i

[1] http://www.postfix.org/ADDRESS_VERIFICATION_README.html
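PS: for the df/tmpfs check, here is a minimal sketch of what I mean, in Python. The paths and the 90% threshold are only assumptions for illustration (/var/spool/postfix is the usual default queue directory, but check your own setup), and the helper names are made up. It reports only the current state, so it would have to run periodically (e.g. from cron) to catch a filesystem filling up before a lockup. The percentage is plain used/total, which can differ slightly from the Use% column that df prints.

```python
import shutil


def percent_used(total, used):
    """Return used space as a percentage of total (0.0 for an empty filesystem)."""
    return 100.0 * used / total if total else 0.0


def nearly_full(paths, threshold=90.0, usage=shutil.disk_usage):
    """Return (path, percent) pairs for filesystems at or above the threshold.

    `usage` defaults to shutil.disk_usage and is only a parameter so the
    function can be exercised with fake numbers.
    """
    flagged = []
    for path in paths:
        try:
            info = usage(path)
        except OSError:
            # Path does not exist or is not accessible on this machine: skip it.
            continue
        pct = percent_used(info.total, info.used)
        if pct >= threshold:
            flagged.append((path, pct))
    return flagged


if __name__ == "__main__":
    # Hypothetical list of mount points worth watching; adjust as needed.
    for path, pct in nearly_full(["/", "/tmp", "/var/spool/postfix"]):
        print(f"{path}: {pct:.1f}% used")
```

If nothing is printed, none of the listed paths is at or above the threshold right now.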