[arch-general] Steam hard-locks my PC (amdgpu fault)

Marc Ranolfi marc.2377 at gmail.com
Mon May 13 01:07:32 UTC 2019


Hi. Could you check whether this is related to the dynamic power management
bug reported in these two trackers?

https://bugzilla.redhat.com/show_bug.cgi?id=1478219
https://bugs.freedesktop.org/show_bug.cgi?id=101976

To test it, you can try my scripts at https://gitlab.com/ranolfi/rforcedpm.
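
For a quick manual check, here is a minimal sketch of pinning the GPU's DPM performance level through sysfs. This is not the code from the repository above, only an illustration of the general idea. It assumes the card is exposed as card0 (the index may differ on your system), that the driver provides the power_dpm_force_performance_level file, and that you run it as root. The function names and the short list of levels are my own choices for the example.

```python
from pathlib import Path

# Assumption: the GPU is card0; check /sys/class/drm/ for the right index.
DPM_LEVEL_FILE = Path(
    "/sys/class/drm/card0/device/power_dpm_force_performance_level"
)

# A subset of the levels amdgpu accepts; the driver knows a few more.
KNOWN_LEVELS = {"auto", "low", "high", "manual"}


def read_dpm_level(path=DPM_LEVEL_FILE):
    """Return the currently forced DPM performance level as a string."""
    return Path(path).read_text().strip()


def force_dpm_level(level, path=DPM_LEVEL_FILE):
    """Write a new DPM level and return the previous one.

    Writing to the real sysfs file requires root privileges.
    """
    if level not in KNOWN_LEVELS:
        raise ValueError(f"unknown DPM level: {level!r}")
    previous = read_dpm_level(path)
    Path(path).write_text(level + "\n")
    return previous
```

Calling force_dpm_level("high") before starting Steam and force_dpm_level("auto") afterwards would be one way to see whether the lock-ups depend on dynamic power management. If pinning the level makes no difference, the bug reports above are probably not the same issue.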

On Sat, Apr 13, 2019 at 12:11 PM David Runge <dave at sleepmap.de> wrote:

> Hi!
>
> On 2019-04-13 23:40:23 (+1000), Stephen Gregoratto via arch-general wrote:
> > I've been having this problem for a while (since late 4.??) and it's
> > been driving me up the wall. Basically, opening Steam 9/10 times hard
> > locks my PC. I can still ssh into it, but the display is frozen and it
> > hangs on shutdown, requiring a manual reset. Here's what comes up when
> > viewing the dmesg:
> We have a bug tracker. Please use it (for searches and reporting):
> https://bugs.archlinux.org/
>
> > [ 5191.955414] amdgpu 0000:01:00.0: GPU fault detected: 147 0x0ef1c801 for process vulkandriverque pid 11510 thread vulkandriverque pid 11510
> > [ 5191.955416] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  0x0FDEFDDE
> > [ 5191.955417] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x021C8001
> > [ 5191.955419] amdgpu 0000:01:00.0: VM fault (0x01, vmid 1, pasid 32776) at page 266272222, read from 'TC6' (0x54433600) (456)
> > [ 5202.015490] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=196445, emitted seq=196447
> > [ 5202.015588] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vulkandriverque pid 11510 thread vulkandriverque pid 11510
> > [ 5202.015610] amdgpu 0000:01:00.0: GPU reset begin!
> > [ 5209.913537] audit: type=1006 audit(1555161631.600:68): pid=11659 uid=0 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=3 res=1
> > [ 5212.032315] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:49:crtc-1] hw_done or flip_done timed out
> > [ 5406.595049] INFO: task kworker/u16:3:16913 blocked for more than 120 seconds.
> > [ 5406.595052]       Not tainted 5.0.4-arch1-1-ARCH #1
> > [ 5406.595053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 5406.595055] kworker/u16:3   D    0 16913      2 0x80000080
> > [ 5406.595074] Workqueue: events_unbound commit_work [drm_kms_helper]
> > [ 5406.595075] Call Trace:
> > [ 5406.595085]  ? __schedule+0x30b/0x8b0
> > [ 5406.595089]  schedule+0x32/0x80
> > [ 5406.595093]  schedule_timeout+0x311/0x4a0
> > [ 5406.595205]  ? dce110_timing_generator_get_crtc_scanoutpos+0x88/0x130 [amdgpu]
> > [ 5406.595210]  dma_fence_default_wait+0x204/0x280
> > [ 5406.595213]  ? dma_fence_wait_timeout+0x120/0x120
> > [ 5406.595215]  dma_fence_wait_timeout+0x105/0x120
> > [ 5406.595218]  reservation_object_wait_timeout_rcu+0x1f2/0x370
> > [ 5406.595224]  ? preempt_count_add+0x79/0xb0
> > [ 5406.595331]  amdgpu_dm_do_flip+0x14a/0x4a0 [amdgpu]
> > [ 5406.595337]  ? _raw_spin_unlock_irqrestore+0x20/0x40
> > [ 5406.595445]  ? amdgpu_dm_atomic_commit_tail+0x5f9/0xbc0 [amdgpu]
> > [ 5406.595547]  amdgpu_dm_atomic_commit_tail+0x5f9/0xbc0 [amdgpu]
> > [ 5406.595561]  commit_tail+0x3d/0x70 [drm_kms_helper]
> > [ 5406.595566]  process_one_work+0x1eb/0x410
> > [ 5406.595570]  worker_thread+0x2d/0x3d0
> > [ 5406.595573]  ? process_one_work+0x410/0x410
> > [ 5406.595576]  kthread+0x112/0x130
> > [ 5406.595578]  ? kthread_park+0x80/0x80
> > [ 5406.595581]  ret_from_fork+0x1f/0x40
> Seems like you hit a bug in the driver or firmware for your AMD GPU.
>
> Make sure to look into microcode updates for your CPU:
> https://wiki.archlinux.org/index.php/Microcode
>
> Also, look into any pitfalls regarding your GPU:
> https://wiki.archlinux.org/index.php/AMDGPU
>
> You're likely better off searching for similar issues of users with your
> graphics card and/or reporting this to (your card's) upstream directly
> though.
>
> > Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] driver: amdgpu v: kernel
> >            Display: server: X.org 1.20.4 driver: amdgpu tty: 228x62
> This is the relevant data.
>
> > And here's all my packages:
> There's no reason to post them.
>
> Best,
> David
>
> P.S.: Please refrain from sending extensive output.
>
> --
> https://sleepmap.de
>

