[arch-general] Process 13696 (systemctl) of user 0 dumped core ??
All,

After bandaiding my server back together by putting a 4-port pci-sata controller in it to work around the failed onboard disk controller, the system is up and running fine. In the BIOS, currently the onboard sata controller is 'Enabled', but each of the sata ports is 'Disabled'.

When I check the status of something with systemctl, I get an odd error at the end of each command, eg:

    [15:47 phoinix:~/.ssh] # sc status smbd
    ● smbd.service - Samba SMB/CIFS server
       Loaded: loaded (/usr/lib/systemd/system/smbd.service; enabled; vendor preset: disabled)
       Active: active (running) since Sat 2015-08-22 22:57:26 CDT; 1 day 16h ago
     Main PID: 542 (smbd)
       CGroup: /system.slice/smbd.service
               ├─542 /usr/bin/smbd -D
               └─559 /usr/bin/smbd -D
    Bus error (core dumped)

Looking at the journal and looking at the core dumps, the only other process that is implicated is:

    Cannot add dependency job for unit cups.socket, ignoring: Unit cups.socket failed to load: No such file or directory.

Nothing else is generating a core dump. But each time I check the status of a process, it ends with:

    Bus error (core dumped)

The only other thing I see in the journal that may or may not be related is:

    Aug 24 14:21:58 phoinix systemd[13187]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
    Aug 24 14:21:58 phoinix systemd[13187]: Unit type .busname is not supported on this system.

I don't know if that's related, but it was the only other thing tangentially related to 'bus'.
Looking at the core dump list with 'coredumpctl list' shows a handful of files:

    [17:46 phoinix:~/.ssh] # coredumpctl list
    TIME                            PID   UID   GID SIG PRESENT EXE
    Mon 2015-04-06 19:00:15 CDT     342     0     0  11         /usr/bin/cupsd
    Tue 2015-05-26 13:15:01 CDT   23265     0     0  11         /usr/bin/crond
    Tue 2015-05-26 14:01:01 CDT   23563     0     0  11         /usr/bin/crond
    Tue 2015-05-26 14:05:01 CDT   23593     0     0  11         /usr/bin/crond
    Sun 2015-08-23 05:51:43 CDT    3151     0     0   7 *       /usr/bin/systemctl
    Sun 2015-08-23 05:52:16 CDT    3179     0     0   7 *       /usr/bin/systemctl
    Sun 2015-08-23 07:11:33 CDT    3639     0     0   7 *       /usr/bin/systemctl
    Sun 2015-08-23 07:12:31 CDT    3652     0     0   7 *       /usr/bin/systemctl
    Mon 2015-08-24 15:30:11 CDT   13565     0     0   7 *       /usr/bin/systemctl
    Mon 2015-08-24 15:32:05 CDT   13580     0     0   7 *       /usr/bin/systemctl
    Mon 2015-08-24 15:53:37 CDT   13696     0     0   7 *       /usr/bin/systemctl

Looking at the dumps in gdb shows:

    [17:47 phoinix:~/.ssh] # coredumpctl gdb 13696
               PID: 13696 (systemctl)
               UID: 0 (root)
               GID: 0 (root)
            Signal: 7 (BUS)
         Timestamp: Mon 2015-08-24 15:53:37 CDT (1h 54min ago)
      Command Line: systemctl status smbd
        Executable: /usr/bin/systemctl
     Control Group: /user.slice/user-1000.slice/session-c2.scope
              Unit: session-c2.scope
             Slice: user-1000.slice
           Session: c2
         Owner UID: 1000 (david)
           Boot ID: aeecdf7479ea4b43aae7f1b9b83b2502
        Machine ID: 8d32bcc3152b4a1f87c4d71f948f93fb
          Hostname: phoinix
          Coredump: /var/lib/systemd/coredump/core.systemctl.0.aeecdf7479ea4b43aae7f1b9b83b2502.13696.1440449617000000.lz4
           Message: Process 13696 (systemctl) of user 0 dumped core.
    <snip>
    (gdb) bt
    #0  0x00007f353981becf in ?? ()
    #1  0x00007f3539801c09 in ?? ()
    #2  0x00007f3539801d38 in ?? ()
    #3  0x00007f3539801b64 in ?? ()
    #4  0x00007f3539801d38 in ?? ()
    #5  0x00007f3539801b64 in ?? ()
    #6  0x00007f353980310e in ?? ()
    #7  0x00007f35397f4080 in ?? ()
    #8  0x00007f353983340b in ?? ()
    #9  0x00007f35397ed1d1 in ?? ()
    #10 0x00007f35397e2414 in ?? ()
    #11 0x00007f35386f5790 in __libc_start_main () from /usr/lib/libc.so.6
    #12 0x00007f35397e3049 in ?? ()
    (gdb) frame 0
    #0  0x00007f353981becf in ?? ()
    (gdb) info frame
    Stack level 0, frame at 0x7ffed3907080:
     rip = 0x7f353981becf; saved rip = 0x7f3539801c09
     called by frame at 0x7ffed3907160
     Arglist at 0x7ffed3906fd8, args:
     Locals at 0x7ffed3906fd8, Previous frame's sp is 0x7ffed3907080
     Saved registers:
      rbx at 0x7ffed3907048, rbp at 0x7ffed3907050, r12 at 0x7ffed3907058,
      r13 at 0x7ffed3907060, r14 at 0x7ffed3907068, r15 at 0x7ffed3907070,
      rip at 0x7ffed3907078
    (gdb) quit

I haven't seen or noticed this happening before, but obviously the first core dump was back in April related to cups. The question is "What should I check?" and "Does any of this look related to BIOS settings and the new disk controller?" (that looks more doubtful after looking over all the information)

Anybody have experience with this type of thing?

--
David C. Rankin, J.D.,P.E.
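A quick way to see the pattern in the 'coredumpctl list' output above is to tally the dumps by executable and signal. The following is only a minimal sketch: the listing is pasted from the post, the `tally` helper and `SIGNAL_NAMES` table are hypothetical names made up for illustration, and the signal names assume Linux x86-64 numbering (where 7 is SIGBUS and 11 is SIGSEGV).

```python
from collections import Counter

# Assumption: Linux x86-64 signal numbering (7 = SIGBUS, 11 = SIGSEGV).
SIGNAL_NAMES = {7: "SIGBUS", 11: "SIGSEGV"}

# Listing copied from the 'coredumpctl list' output in the post.
LISTING = """\
TIME PID UID GID SIG PRESENT EXE
Mon 2015-04-06 19:00:15 CDT 342 0 0 11 /usr/bin/cupsd
Tue 2015-05-26 13:15:01 CDT 23265 0 0 11 /usr/bin/crond
Tue 2015-05-26 14:01:01 CDT 23563 0 0 11 /usr/bin/crond
Tue 2015-05-26 14:05:01 CDT 23593 0 0 11 /usr/bin/crond
Sun 2015-08-23 05:51:43 CDT 3151 0 0 7 * /usr/bin/systemctl
Sun 2015-08-23 05:52:16 CDT 3179 0 0 7 * /usr/bin/systemctl
Sun 2015-08-23 07:11:33 CDT 3639 0 0 7 * /usr/bin/systemctl
Sun 2015-08-23 07:12:31 CDT 3652 0 0 7 * /usr/bin/systemctl
Mon 2015-08-24 15:30:11 CDT 13565 0 0 7 * /usr/bin/systemctl
Mon 2015-08-24 15:32:05 CDT 13580 0 0 7 * /usr/bin/systemctl
Mon 2015-08-24 15:53:37 CDT 13696 0 0 7 * /usr/bin/systemctl
"""


def tally(listing):
    """Count core dumps per (executable, signal name) pair."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have 9 fields (no PRESENT mark) or 10 (with '*');
        # the header row and blank lines are skipped.
        if len(fields) < 9 or not fields[4].isdigit():
            continue
        sig = int(fields[7])       # SIG column
        exe = fields[-1]           # EXE is always the last column
        counts[(exe, SIGNAL_NAMES.get(sig, str(sig)))] += 1
    return counts


if __name__ == "__main__":
    for (exe, sig), n in sorted(tally(LISTING).items()):
        print(f"{exe:22} {sig:8} {n}")
```

Run against the listing from the post, this groups the seven recent systemctl dumps under signal 7 and the older cupsd and crond dumps under signal 11, which matches the observation that only systemctl is hitting the bus error.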
Are you running everything Arch up-to-date vanilla, or do you have some custom stuff? If you're vanilla, run memtest on the machine.

-- damjan
On 25 August 2015 at 01:17, Damjan Georgievski <gdamjan@gmail.com> wrote:
> are you running everything Arch up-to-date vanilla or do you have some custom stuff? if you're vanilla, run memtest on the machine.
Also, make sure to update the BIOS. And do you have intel-ucode installed and configured (this is very important for certain CPUs)? https://wiki.archlinux.org/index.php/Microcode

-- damjan
On 08/24/2015 06:17 PM, Damjan Georgievski wrote:
> are you running everything Arch up-to-date vanilla or do you have some custom stuff? if you're vanilla, run memtest on the machine.
All vanilla, I'll double-check with memtest. After putting the pci-sata controller in, I have noticed an IRQ 13 error on boot related to the failed onboard disk controller. I suspect disabling the onboard controller completely will eliminate that error. (next snip)
> are you running everything Arch up-to-date vanilla or do you have some custom stuff? if you're vanilla, run memtest on the machine.
> also, make sure to: update the bios and do you have the intel-ucode installed and configured (this is very important for certain cpus)? https://wiki.archlinux.org/index.php/Microcode
Thankfully, this is a situation where I have an older AMD Phenom-9850 Black in the box, so linux-firmware should catch it. Thanks for your reply. -- David C. Rankin, J.D.,P.E.
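For anyone unsure which case applies to their box, the vendor can be read from /proc/cpuinfo. This is a rough sketch only: the vendor-to-package mapping simply restates what is said in this thread (intel-ucode for Intel, linux-firmware expected to cover AMD), and the `cpu_vendor` and `microcode_package` helpers are hypothetical names, not part of any Arch tool.

```python
import re

# Mapping as stated in this thread, not an authoritative package list:
# intel-ucode for Intel CPUs; linux-firmware is expected to cover AMD.
PACKAGE_BY_VENDOR = {
    "GenuineIntel": "intel-ucode",
    "AuthenticAMD": "linux-firmware",
}


def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id value found in /proc/cpuinfo-style text."""
    match = re.search(r"^vendor_id\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE)
    return match.group(1) if match else None


def microcode_package(cpuinfo_text):
    """Return the package named in the thread for this vendor, or None."""
    return PACKAGE_BY_VENDOR.get(cpu_vendor(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print(cpu_vendor(text), "->", microcode_package(text))
```

On an AMD system such as the Phenom mentioned above, the vendor string would be AuthenticAMD, so the sketch points at linux-firmware instead of intel-ucode.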
participants (2)
- Damjan Georgievski
- David C. Rankin