Hi. Can you check if this is related to the dynamic power management bug reported at https://bugzilla.redhat.com/show_bug.cgi?id=1478219 and https://bugs.freedesktop.org/show_bug.cgi?id=101976? In order to test it, you can take a look at my scripts from https://gitlab.com/ranolfi/rforcedpm.

On Sat, Apr 13, 2019 at 12:11 PM David Runge <dave@sleepmap.de> wrote:
Hi!
On 2019-04-13 23:40:23 (+1000), Stephen Gregoratto via arch-general wrote:
I've been having this problem for a while (since late 4.??) and it's been driving me up the wall. Basically, opening Steam 9/10 times hard locks my PC. I can still ssh into it, but the display is frozen and it hangs on shutdown, requiring a manual reset. Here's what comes up when viewing the dmesg:

We have a bug tracker. Please use it (for searches and reporting): https://bugs.archlinux.org/
[ 5191.955414] amdgpu 0000:01:00.0: GPU fault detected: 147 0x0ef1c801 for process vulkandriverque pid 11510 thread vulkandriverque pid 11510
[ 5191.955416] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0FDEFDDE
[ 5191.955417] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x021C8001
[ 5191.955419] amdgpu 0000:01:00.0: VM fault (0x01, vmid 1, pasid 32776) at page 266272222, read from 'TC6' (0x54433600) (456)
[ 5202.015490] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=196445, emitted seq=196447
[ 5202.015588] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vulkandriverque pid 11510 thread vulkandriverque pid 11510
[ 5202.015610] amdgpu 0000:01:00.0: GPU reset begin!
[ 5209.913537] audit: type=1006 audit(1555161631.600:68): pid=11659 uid=0 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=3 res=1
[ 5212.032315] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:49:crtc-1] hw_done or flip_done timed out
[ 5406.595049] INFO: task kworker/u16:3:16913 blocked for more than 120 seconds.
[ 5406.595052] Not tainted 5.0.4-arch1-1-ARCH #1
[ 5406.595053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5406.595055] kworker/u16:3 D 0 16913 2 0x80000080
[ 5406.595074] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 5406.595075] Call Trace:
[ 5406.595085] ? __schedule+0x30b/0x8b0
[ 5406.595089] schedule+0x32/0x80
[ 5406.595093] schedule_timeout+0x311/0x4a0
[ 5406.595205] ? dce110_timing_generator_get_crtc_scanoutpos+0x88/0x130 [amdgpu]
[ 5406.595210] dma_fence_default_wait+0x204/0x280
[ 5406.595213] ? dma_fence_wait_timeout+0x120/0x120
[ 5406.595215] dma_fence_wait_timeout+0x105/0x120
[ 5406.595218] reservation_object_wait_timeout_rcu+0x1f2/0x370
[ 5406.595224] ? preempt_count_add+0x79/0xb0
[ 5406.595331] amdgpu_dm_do_flip+0x14a/0x4a0 [amdgpu]
[ 5406.595337] ? _raw_spin_unlock_irqrestore+0x20/0x40
[ 5406.595445] ? amdgpu_dm_atomic_commit_tail+0x5f9/0xbc0 [amdgpu]
[ 5406.595547] amdgpu_dm_atomic_commit_tail+0x5f9/0xbc0 [amdgpu]
[ 5406.595561] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 5406.595566] process_one_work+0x1eb/0x410
[ 5406.595570] worker_thread+0x2d/0x3d0
[ 5406.595573] ? process_one_work+0x410/0x410
[ 5406.595576] kthread+0x112/0x130
[ 5406.595578] ? kthread_park+0x80/0x80
[ 5406.595581] ret_from_fork+0x1f/0x40

Seems like you hit a bug in the driver or firmware for your AMD GPU.
Make sure to look into microcode updates for your CPU: https://wiki.archlinux.org/index.php/Microcode
Also, look into any pitfalls regarding your GPU: https://wiki.archlinux.org/index.php/AMDGPU
You're likely better off searching for similar issues of users with your graphics card and/or reporting this to (your card's) upstream directly though.
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] driver: amdgpu v: kernel
  Display: server: X.org 1.20.4 driver: amdgpu tty: 228x62

This is the relevant data.
And here's all my packages:

There's no reason to post them.
Best, David
P.S.: Please refrain from sending extensive output.
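
One possible way to trim a kernel log before posting it, as a rough sketch only: the keyword list and the function name relevant_lines are assumptions made for illustration, not part of any existing tool, and the timestamp pattern simply matches the "[ 5191.955414]" style seen in the log above.

```python
import re

# Substrings that mark the GPU-related entries in a log like the one above.
# This selection is an assumption; adjust it to the problem at hand.
KEYWORDS = ("amdgpu", "drm:", "blocked for more than")

# A kernel log entry starts with a timestamp such as "[ 5191.955414]".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]")


def relevant_lines(log_text, keywords=KEYWORDS):
    """Return the timestamped log lines that contain any of the keywords."""
    kept = []
    for line in log_text.splitlines():
        line = line.strip()
        if TIMESTAMP.match(line) and any(k in line for k in keywords):
            kept.append(line)
    return kept
```

Passing the saved dmesg text to relevant_lines() would keep the GPU fault, timeout and hung-task entries and drop unrelated ones such as the audit line. Stack-trace frames that do not mention amdgpu are dropped as well, so loosen the keywords if the full call trace is needed.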