[arch-general] System crash troubleshooting help?
Good evening all,

I was just watching Linux Sucks (https://www.youtube.com/watch?v=5pOxlazS3zs) on YouTube when my system froze. The whole system was affected; for example, I could not switch to a TTY. The audio kept looping a one-second clip.

My troubleshooting skills are nothing to be proud of, so I thought I would just ask you for help. These are the last entries in the journal (in reverse order). Did my system just crash because of a DHCP request? Or should I focus on the kwin error, which is the very last one?

I will probably not be available to answer anything during the next 20 hours.

--
Kind regards,
Aron Widforss

----
Jan 06 22:31:19 hostname kwin_x11[899]: QXcbConnection: XCB error: 3 (BadWindow), sequence: 4113, resource id: 2097154, major code: 18 (ChangeProperty), minor code: 0
Jan 06 22:31:19 hostname systemd[1]: Started Network Manager Script Dispatcher Service.
Jan 06 22:31:19 hostname dbus[635]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jan 06 22:31:19 hostname kdeinit5[864]: networkmanager-qt: virtual void NetworkManager::DevicePrivate::propertyChanged(const QString&, const QVariant&) Unhandled property "Metered"
Jan 06 22:31:19 hostname systemd[1]: Starting Network Manager Script Dispatcher Service...
Jan 06 22:31:19 hostname dbus[635]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jan 06 22:31:19 hostname NetworkManager[637]: <info> (wlp2s0): Activation: successful, device activated.
Jan 06 22:31:19 hostname NetworkManager[637]: <info> Policy set 'ssid' (wlp2s0) as default for IPv4 routing and DNS.
Jan 06 22:31:19 hostname NetworkManager[637]: <info> NetworkManager state is now CONNECTED_GLOBAL
Jan 06 22:31:19 hostname dhclient[1074]: bound to 192.168.1.10 -- renewal in 37620 seconds.
Jan 06 22:31:19 hostname NetworkManager[637]: <info> NetworkManager state is now CONNECTED_LOCAL
Jan 06 22:31:19 hostname NetworkManager[637]: <info> (wlp2s0): device state change: secondaries -> activated (reason 'none') [90 100 0]
Jan 06 22:31:19 hostname NetworkManager[637]: <info> (wlp2s0): device state change: ip-check -> secondaries (reason 'none') [80 90 0]
Jan 06 22:31:19 hostname NetworkManager[637]: <info> (wlp2s0): device state change: ip-config -> ip-check (reason 'none') [70 80 0]
Jan 06 22:31:19 hostname NetworkManager[637]: <info> (wlp2s0): DHCPv4 state changed unknown -> bound
Jan 06 22:31:19 hostname NetworkManager[637]: <info> nameserver '192.168.1.1'
Jan 06 22:31:19 hostname NetworkManager[637]: <info> lease time 86400
Jan 06 22:31:19 hostname NetworkManager[637]: <info> server identifier 192.168.1.1
Jan 06 22:31:19 hostname NetworkManager[637]: <info> gateway 192.168.1.1
Jan 06 22:31:19 hostname NetworkManager[637]: <info> plen 24 (255.255.255.0)
Jan 06 22:31:19 hostname NetworkManager[637]: <info> address 192.168.1.10
Jan 06 22:31:19 hostname dhclient[1074]: DHCPACK from 192.168.1.1
Jan 06 22:31:19 hostname dhclient[1074]: DHCPOFFER from 192.168.1.1
Jan 06 22:31:19 hostname dhclient[1074]: DHCPREQUEST on wlp2s0 to 255.255.255.255 port 67
Jan 06 22:31:18 hostname dhclient[1074]: DHCPDISCOVER on wlp2s0 to 255.255.255.255 port 67 interval 3
Jan 06 22:31:18 hostname NetworkManager[637]: <info> (wlp2s0): DHCPv4 state changed expire -> unknown
Jan 06 22:31:18 hostname NetworkManager[637]: <info> (wlp2s0): DHCPv4 state changed unknown -> expire
Jan 06 22:31:18 hostname kernel: ll header: 00000000: ff ff ff ff ff ff e0 91 f5 7a fe 9a 08 00 .........z....
Jan 06 22:31:18 hostname kernel: IPv4: martian source 255.255.255.255 from 192.168.1.1, on dev wlp2s0
Jan 06 22:31:18 hostname dhclient[1074]: DHCPNAK from 192.168.1.1
Jan 06 22:31:18 hostname dhclient[1074]: DHCPREQUEST on wlp2s0 to 255.255.255.255 port 67
That's really strange. There are lots of reports of that kwin error online, but none reporting a system lockup like that. I don't see anything there that would cause this. Unfortunately you might need more than the last two seconds of logs to find what's wrong. Can you try to reproduce it?
On 06-01-2016 21:54, Aron Widforss wrote:
[...]
You'll have to post the full output of dmesg or journalctl so that we can see the bigger picture. Do put it on pastebin or something similar, and please don't post it in reverse order; that makes it harder to follow the events that led to the crash.
--
Mauro Santos
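For reference, a minimal sketch of the export Mauro asks for, assuming a systemd journal that still holds the boot in question. `journalctl` already prints entries oldest-first (only `-r`/`--reverse` flips that), so the helper simply leaves that flag out. The function name `export_boot_log` and the default file name are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one boot's journal to a text file, oldest
# entry first, ready to upload to a paste service.
#   $1 = boot offset (0 = current boot, -1 = previous boot; default -1)
#   $2 = output file (default journal.txt)
export_boot_log() {
    local boot="${1:--1}"
    local out="${2:-journal.txt}"
    # No -r/--reverse here: the default order is chronological, which
    # makes it easier to follow the events leading up to a crash.
    journalctl -b "$boot" --no-pager > "$out"
}

# Note: dmesg reads the kernel ring buffer of the running kernel only, so
# after a forced reboot the previous boot's kernel messages have to come
# from the journal instead, e.g. with: journalctl -k -b -1
```

Usage: `export_boot_log -1 journal.txt`, then upload `journal.txt`.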
In general, a lockup like this means a kernel-level lockup or panic, although a panic would probably be displayed on the console. What hardware do you have? Occasionally something like that happens to me, and it always happens while sound is playing, so it may be the same problem.

On 06.01.2016 at 23:21, Mauro Santos wrote:
[...]
On Wed, Jan 06, 2016 at 10:54:07PM +0100, Aron Widforss wrote:
[...]
This seems like a very strange error. Unfortunately, the log you provided is not enough to troubleshoot this. If you haven't turned off or reset your system, I suggest that you run 'journalctl -xb', put the output in a pastebin and send it. If you have, then you could try 'journalctl -xb -1', which shows the logs for the previous boot.

Hope this helps,
Jonathan
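A sketch of Jonathan's suggestion, assuming the journal is kept on disk: `-b -1` selects the previous boot (positive numbers count from the oldest stored boot instead), and earlier boots only survive a reboot when journald stores logs persistently (`Storage=persistent`, or the default `Storage=auto` with an existing `/var/log/journal`). The `JOURNAL_DIR` override and the function name are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the boots the journal knows about, then show
# the previous boot (-b -1) with catalog explanations (-x).
previous_boot_log() {
    local dir="${JOURNAL_DIR:-/var/log/journal}"
    # Without an on-disk journal, logs from before the reboot are gone.
    if [ ! -d "$dir" ]; then
        echo "no $dir: earlier boots are probably not stored" >&2
        return 1
    fi
    journalctl --list-boots --no-pager   # offset 0 = current boot
    journalctl -x -b -1 --no-pager
}
```

Checking `--list-boots` first makes it obvious whether the boot that ended in the freeze is still available before digging through it.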
Hi,

I will not even try to explain how my incompetence made me think that those were the last entries; the system had been rebooted much, much earlier. So please disregard the logs I just gave you.

I uploaded the full log to http://adelie.antarkt.is/trouble/journal.txt (apparently I could not upload a log that large to pastebin).

Kind regards,
Aron Widforss
On 06-01-2016 23:08, Aron Widforss wrote:
[...]
It would be very helpful if you told us around what time the machine crashed, as that is a lot of log to go through.
--
Mauro Santos
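Narrowing the log to the window around a freeze keeps a paste small. A minimal sketch, assuming the freeze ended the previous boot: `-n` combined with `-b` prints the newest entries of that boot (still oldest-first), and `--since`/`--until` accept `"YYYY-MM-DD HH:MM:SS"` timestamps. The 300-line default and the helper names are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for pulling out just the part of the journal that
# matters around a freeze.

# Last N entries of a given boot (default: previous boot, 300 entries).
tail_of_boot() {
    local boot="${1:--1}" lines="${2:-300}"
    journalctl -b "$boot" -n "$lines" --no-pager
}

# Everything between two timestamps, e.g. "2016-01-06 22:00:00".
time_window() {
    if [ "$#" -ne 2 ]; then
        echo "usage: time_window SINCE UNTIL" >&2
        return 2
    fi
    journalctl --since "$1" --until "$2" --no-pager
}
```

When the freeze is the last thing that happened in a boot, `tail_of_boot` alone is usually enough; `time_window` helps when the approximate time is known but the boot is long.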
On Wednesday, 6 January 2016 10:54:07 PM IST, Aron Widforss wrote:
[...]
Most importantly, can you reproduce the lockup? That would be needed to try various fixes and workarounds.

You have i915 graphics. Are you using UXA? That could avoid some lockups.

$ cat /etc/X11/xorg.conf.d/20-intel.conf
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "AccelMethod" "uxa"
EndSection

And are you using any window decoration other than the default? I found that Plastik can cause kwin to crash rather regularly, but it was not a lockup.

Furthermore, in the kwin settings, which OpenGL backend is configured? Does changing to a lower version help? Newer OpenGL standards could expose some bugs in the stack.

HTH.
--
Regards
Shridhar
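One way to put Shridhar's snippet in place, as a sketch: it assumes the `intel` DDX driver (the xf86-video-intel package) is installed, writing under `/etc/X11/xorg.conf.d` requires root, and the option only takes effect once X is restarted. The function name is hypothetical, and the directory argument exists only so the function can be tried against a scratch directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the 20-intel.conf snippet that switches the
# intel driver's acceleration method to UXA.
#   $1 = target directory (default /etc/X11/xorg.conf.d, needs root)
write_uxa_conf() {
    local dir="${1:-/etc/X11/xorg.conf.d}"
    mkdir -p "$dir"
    cat > "$dir/20-intel.conf" <<'EOF'
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "AccelMethod" "uxa"
EndSection
EOF
}
```

Called without arguments from a root shell it writes the real file; deleting that file and restarting X reverts to the driver's default acceleration method.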
On 01/06/2016 11:25 PM, Michał Zegan wrote:
In general, such a lockup means a kernel level lockup or panic, but panics are probably displayed on the console. What hardware do you have? Occasionally something like that happens to me and it always happens when sound is playing, so it may be the same.
It's an XPS 13 9343.

----
On 01/07/2016 12:50 AM, Mauro Santos wrote:
It would be very helpful if you said around which time the machine crashed as that is a lot of log to go through.
That would be at the end of the log. The log covers a single boot.

----
On 01/07/2016 02:42 AM, Shridhar Daithankar wrote:
most importantly, can you reproduce the lockup? That would be needed to try various fixes/workarounds
Unfortunately, I cannot. It has happened maybe once a week for a month now. I _think_ that it mostly happens after heavy load and memory usage (mostly Windows 10 VMs), but CPU time and memory allocation were back to normal, at least during the last crash.

I think I should apply the fixes you suggest, run some stress tests and, if I cannot provoke the crash, wait a couple of weeks.
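A possible starting point for those stress tests, assuming the stress-ng tool is installed; the worker counts, the 75% memory figure and the 30-minute default are arbitrary. Running it while a video with sound is playing would also recreate the situation Michał mentions. `run_stress` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep CPU and memory busy for a while to try to
# provoke the freeze on purpose.
#   $1 = duration in stress-ng syntax (e.g. 30m, 2h; default 30m)
run_stress() {
    local duration="${1:-30m}"
    if ! command -v stress-ng >/dev/null 2>&1; then
        echo "stress-ng not found; install it first" >&2
        return 1
    fi
    # --cpu 0 starts one CPU worker per processor; a single --vm worker
    # exercises 75% of the available memory.
    stress-ng --cpu 0 --vm 1 --vm-bytes 75% --timeout "$duration" --metrics-brief
}
```

Using one memory worker keeps the 75% figure easy to reason about; more workers can be added once the system survives this level of load.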
Furthermore, in kwin settings, what opengl backend is configured? Does changing to a lower version help? Newer standards of opengl could expose some bugs in the stack.
I'm not completely sure what to look for here, but KInfoCenter says that the OpenGL version is 3.0 Mesa 11.1.0. I guess I could try downgrading it.
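Shridhar's question concerns the compositor's rendering backend (chosen in the Compositor section of System Settings), which is separate from the Mesa version KInfoCenter reports. A sketch for printing both side by side, assuming `glxinfo` (from mesa-demos) and `kreadconfig5` are installed. The `Backend` and `GLCore` keys in the `[Compositing]` group of `kwinrc` are an assumption about how KWin 5 stores that choice, and may print nothing if the setting was never changed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the OpenGL implementation in use and KWin's
# configured compositing backend side by side.
gl_report() {
    # Driver-level view: the same numbers KInfoCenter shows.
    glxinfo | grep -E 'OpenGL (renderer|version|core profile version) string'
    # Compositor-level view (assumed kwinrc keys; empty if unset).
    echo "kwin Backend: $(kreadconfig5 --file kwinrc --group Compositing --key Backend)"
    echo "kwin GLCore:  $(kreadconfig5 --file kwinrc --group Compositing --key GLCore)"
}
```

If the backend turns out to be a newer OpenGL profile, switching to an older one or to XRender in the compositor settings is a cheaper experiment than downgrading Mesa.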
You have i915 graphics. Are you using uxa? That could avoid some lockups.
$ cat /etc/X11/xorg.conf.d/20-intel.conf
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "AccelMethod" "uxa"
EndSection
Added now.
and are you using any window decoration other than default? I found that plastic can cause kwin crash rather regularly but it was not a lockup.
Nope, standard Plasma, straight out of the box.

Kind regards,
Aron Widforss
I am specifically asking about the exact model of your graphics card and the model of your sound card. As I said, I had something similar a few times: sound was playing, suddenly started to loop indefinitely, and both the screen and keyboard/mouse froze. Even the magic SysRq keys did not work, which means a kernel-level lockup, but no messages were displayed on the screen, so it was (probably) not a panic. If your problem is the same, it would be really nice to catch it somehow, especially since in my case it is impossible to reproduce at all: it doesn't happen for a few months, and then it starts happening once a day for a week.

On 07.01.2016 at 23:57, Aron Widforss wrote:
[...]
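The conclusion that unresponsive SysRq keys point to a kernel-level lockup assumes the keys were enabled in the first place: `/proc/sys/kernel/sysrq` holds 0 (disabled), 1 (all functions) or a bitmask of allowed functions, and not every distribution defaults to 1. A sketch that decodes the value, with bit meanings taken from the kernel's SysRq documentation; `describe_sysrq` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the kernel.sysrq value into the Alt+SysRq
# functions the keyboard is allowed to trigger.
#   $1 = value to decode (default: current /proc/sys/kernel/sysrq)
describe_sysrq() {
    local v="${1:-$(cat /proc/sys/kernel/sysrq)}"
    if [ "$v" -eq 0 ]; then echo "sysrq disabled"; return; fi
    if [ "$v" -eq 1 ]; then echo "all sysrq functions enabled"; return; fi
    local bit name
    for bit in 2 4 8 16 32 64 128 256; do
        case $bit in
            2)   name="console loglevel" ;;
            4)   name="keyboard control (SAK, unraw)" ;;
            8)   name="debugging dumps of processes" ;;
            16)  name="sync" ;;
            32)  name="remount read-only" ;;
            64)  name="signal processes (term, kill, oom-kill)" ;;
            128) name="reboot/poweroff" ;;
            256) name="nice all RT tasks" ;;
        esac
        # Print the function only if its bit is set in the mask.
        if (( v & bit )); then
            echo "$name"
        fi
    done
}
```

To allow all functions, run `sysctl kernel.sysrq=1` as root for the running system, or put `kernel.sysrq = 1` in a file under `/etc/sysctl.d/` to keep it across reboots. The mask only restricts the keyboard combination; writing to `/proc/sysrq-trigger` as root is always allowed.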
participants (6)
- Aron Widforss
- Christopher Mullins
- Jonathan Horacio Villatoro Córdoba
- Mauro Santos
- Michał Zegan
- Shridhar Daithankar