[arch-general] khugepaged hangs and filesystem unresponsive
Lukas Jirkovsky
l.jirkovsky at gmail.com
Tue Aug 28 05:07:10 EDT 2012
On 27 August 2012 09:10, pants <pants at cs.hmc.edu> wrote:
> Good evening,
>
> I just experienced a major problem with my system while listening to a
> music file in mpd from an xfs filesystem on top of an mdadm raid6. A
> kernel error was thrown, with the following error.log entries:
>
> output: /var/log/error.log
>> Aug 26 23:34:50 localhost kernel: [283781.061258] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
>> Aug 26 23:34:50 localhost kernel: [283781.062268] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
>> Aug 26 23:34:50 localhost kernel: [283781.063273] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
>> Aug 26 23:34:50 localhost kernel: [283781.064245] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
>> Aug 26 23:34:51 localhost kernel: [283782.058901] timeout: still 1 active urbs..
>> Aug 26 23:38:48 localhost kernel: [284019.080666] INFO: task mpd:707 blocked for more than 120 seconds.
>> Aug 26 23:38:48 localhost kernel: [284019.080696] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:40:48 localhost kernel: [284139.071419] INFO: task khugepaged:32 blocked for more than 120 seconds.
>> Aug 26 23:40:48 localhost kernel: [284139.071451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:40:48 localhost kernel: [284139.071589] INFO: task mpd:525 blocked for more than 120 seconds.
>> Aug 26 23:40:48 localhost kernel: [284139.071613] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:40:48 localhost kernel: [284139.071721] INFO: task mpd:707 blocked for more than 120 seconds.
>> Aug 26 23:40:48 localhost kernel: [284139.071744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:40:48 localhost kernel: [284139.071943] INFO: task mplayer:28316 blocked for more than 120 seconds.
>> Aug 26 23:40:48 localhost kernel: [284139.071968] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:42:48 localhost kernel: [284259.062189] INFO: task khugepaged:32 blocked for more than 120 seconds.
>> Aug 26 23:42:48 localhost kernel: [284259.062220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:42:48 localhost kernel: [284259.062358] INFO: task mpd:525 blocked for more than 120 seconds.
>> Aug 26 23:42:48 localhost kernel: [284259.062382] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:42:48 localhost kernel: [284259.062489] INFO: task mpd:702 blocked for more than 120 seconds.
>> Aug 26 23:42:48 localhost kernel: [284259.062512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:42:48 localhost kernel: [284259.062688] INFO: task mpd:703 blocked for more than 120 seconds.
>> Aug 26 23:42:48 localhost kernel: [284259.062712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 26 23:42:48 localhost kernel: [284259.062829] INFO: task mpd:704 blocked for more than 120 seconds.
>> Aug 26 23:42:48 localhost kernel: [284259.062852] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> Attempts to access other files on the same filesystem after the incident
> caused those applications to go into uninterruptible sleep as well (see
> the mplayer processes that appear later in the log). I was forced to
> kill and unmount what I could, then force the system down. Afterwards,
> I could reproduce the error by attempting to read the file in question
> at the same point.
>
> Even if you have no solution, pointing me towards the relevant kernel
> mailing list would be very helpful.
>
> Thanks,
>
> pants.
It is difficult to say where the problem lies. I'd go for the LKML
mailing list [1] or the Kernel Bugzilla [2], as stated in [3]. You
may try the XFS mailing list if you think it's an XFS-only issue (i.e. it
doesn't happen with other filesystems).
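
When writing the report, it may help to state which tasks were blocked
and how often. The snippet below is only a rough sketch in Python: the
function name summarize_hung_tasks is made up for illustration, and the
pattern assumes the "INFO: task NAME:PID blocked for more than N seconds"
format seen in the quoted log, which may differ on other kernel versions.

```python
import re
from collections import Counter

# Matches the hung-task report lines as they appear in the quoted log.
# Assumption: the message format is "INFO: task <name>:<pid> blocked for
# more than <n> seconds"; other kernels may word it differently.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Count hung-task reports per (task name, pid) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts
```

It could be used roughly like this, assuming the log is at
/var/log/error.log as in the original message: open the file, pass the
file object to summarize_hung_tasks(), and print the entries returned by
most_common() on the result.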
Lukas
[1] https://lkml.org/ (the email address is linux-kernel at vger.kernel.org)
[2] https://bugzilla.kernel.org/
[3] http://www.kernel.org/doc/man-pages/reporting_code_bugs.html