[arch-general] khugepaged hangs and filesystem unresponsive
Good evening,

I just experienced a major problem with my system while listening to a music file in mpd from an xfs filesystem over a mdadm raid6. A kernel error was thrown, with the following error.log entry:

output: /var/log/error.log
Aug 26 23:34:50 localhost kernel: [283781.061258] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Aug 26 23:34:50 localhost kernel: [283781.062268] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Aug 26 23:34:50 localhost kernel: [283781.063273] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Aug 26 23:34:50 localhost kernel: [283781.064245] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
Aug 26 23:34:51 localhost kernel: [283782.058901] timeout: still 1 active urbs..
Aug 26 23:38:48 localhost kernel: [284019.080666] INFO: task mpd:707 blocked for more than 120 seconds.
Aug 26 23:38:48 localhost kernel: [284019.080696] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:40:48 localhost kernel: [284139.071419] INFO: task khugepaged:32 blocked for more than 120 seconds.
Aug 26 23:40:48 localhost kernel: [284139.071451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:40:48 localhost kernel: [284139.071589] INFO: task mpd:525 blocked for more than 120 seconds.
Aug 26 23:40:48 localhost kernel: [284139.071613] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:40:48 localhost kernel: [284139.071721] INFO: task mpd:707 blocked for more than 120 seconds.
Aug 26 23:40:48 localhost kernel: [284139.071744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:40:48 localhost kernel: [284139.071943] INFO: task mplayer:28316 blocked for more than 120 seconds.
Aug 26 23:40:48 localhost kernel: [284139.071968] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:42:48 localhost kernel: [284259.062189] INFO: task khugepaged:32 blocked for more than 120 seconds.
Aug 26 23:42:48 localhost kernel: [284259.062220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:42:48 localhost kernel: [284259.062358] INFO: task mpd:525 blocked for more than 120 seconds.
Aug 26 23:42:48 localhost kernel: [284259.062382] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:42:48 localhost kernel: [284259.062489] INFO: task mpd:702 blocked for more than 120 seconds.
Aug 26 23:42:48 localhost kernel: [284259.062512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:42:48 localhost kernel: [284259.062688] INFO: task mpd:703 blocked for more than 120 seconds.
Aug 26 23:42:48 localhost kernel: [284259.062712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 23:42:48 localhost kernel: [284259.062829] INFO: task mpd:704 blocked for more than 120 seconds.
Aug 26 23:42:48 localhost kernel: [284259.062852] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Attempts to access other files on the same filesystem after the incident caused those applications to go into uninterruptible sleep as well (see the mplayer processes that appear later in the log). I was forced to kill and unmount what I could, then force the system down. Afterwards, I could replicate the error by attempting to read the file in question at the same point.

Even if you have no solution, pointing me towards the relevant kernel mailing list would be very helpful.

Thanks,

pants.
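For anyone triaging a similar log, the hung-task reports can be tallied per process before filing a report. The following is only a minimal sketch in Python: the function name `summarize_hung_tasks` and the `SAMPLE` excerpt are illustrative, and the regular expression assumes the "INFO: task NAME:PID blocked for more than N seconds" wording seen in the log above.

```python
import re
from collections import Counter

# Assumed message format, taken from the log in this thread:
#   "... INFO: task mpd:707 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(log_text):
    """Count hung-task reports per (task name, PID) in kernel log text."""
    counts = Counter()
    for match in HUNG_TASK.finditer(log_text):
        counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


# Four lines copied from the log above, used only as sample input.
SAMPLE = """\
Aug 26 23:38:48 localhost kernel: [284019.080666] INFO: task mpd:707 blocked for more than 120 seconds.
Aug 26 23:40:48 localhost kernel: [284139.071419] INFO: task khugepaged:32 blocked for more than 120 seconds.
Aug 26 23:40:48 localhost kernel: [284139.071721] INFO: task mpd:707 blocked for more than 120 seconds.
Aug 26 23:40:48 localhost kernel: [284139.071943] INFO: task mplayer:28316 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for (name, pid), n in sorted(summarize_hung_tasks(SAMPLE).items()):
        print(f"{name} (pid {pid}): {n} report(s)")
```

Because the pattern is applied with `finditer`, it should also work on log text whose line breaks were lost, as long as the message wording is unchanged.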
On 27 August 2012 09:10, pants <pants@cs.hmc.edu> wrote:
> [...]
> Even if you have no solution, pointing me towards the relevant kernel mailing list would be very helpful.
It is difficult to say where the problem lies. I'd go for the LKML mailing list [1] or the Kernel Bugzilla [2], as stated in [3]. You may try the XFS mailing list if you think it's an XFS-only issue (i.e. it doesn't happen with other filesystems).

Lukas

[1] https://lkml.org/ (the email address is linux-kernel@vger.kernel.org)
[2] https://bugzilla.kernel.org/
[3] http://www.kernel.org/doc/man-pages/reporting_code_bugs.html
On Tue, Aug 28, 2012 at 11:07:10AM +0200, Lukas Jirkovsky wrote:
> It is difficult to say where the problem lies. I'd go for the LKML mailing list [1] or the Kernel Bugzilla [2], as stated in [3]. You may try the XFS mailing list if you think it's an XFS-only issue (i.e. it doesn't happen with other filesystems).
I don't know how much I can bring to them: I ran xfs_repair on the filesystem in question, deleted and replaced the problematic file, and have had no further problems with it.

Thank you for your input regardless,

pants.
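One way to check that nothing is still stuck after a repair like this is to look for processes in uninterruptible sleep, which is the state the hung-task detector reports on. The sketch below reads `/proc/<pid>/stat` on Linux and lists processes whose state field is `D`. The helper names `parse_stat` and `uninterruptible_tasks` are illustrative and were not part of this thread.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to report
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(f"{comm} (pid {pid}) is in uninterruptible sleep")
```

A process that shows up here briefly during normal I/O is not a concern; it is processes that stay in `D` state for minutes, as mpd and khugepaged did in the log, that indicate a problem.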