kernel 6.1 : suspend/resume problem - s2idle interference
Hi,

I am requesting help with a suspend/resume problem on a laptop since kernel 6.1. The laptop randomly enters s2idle mode instead of deep. Resuming from s2idle is the problem: it most often results in a black screen, the magic keys don't work, and there is no choice other than a hard reboot.

The cycles below are grepped from everything.log (using syslog-ng).

Jan 2 09:04:02 asus0 kernel: [ 272.442844] PM: suspend entry (s2idle)
Jan 2 09:07:24 asus0 kernel: [ 81.496530] PM: suspend entry (s2idle)
Jan 2 09:07:36 asus0 kernel: [ 94.219950] PM: suspend exit
Jan 2 09:13:43 asus0 kernel: [ 74.020096] PM: suspend entry (deep)
Jan 2 09:13:53 asus0 kernel: [ 78.968818] PM: suspend exit
Jan 2 09:27:03 asus0 kernel: [ 831.165661] PM: suspend entry (deep)
Jan 2 09:27:12 asus0 kernel: [ 835.967426] PM: suspend exit
Jan 2 09:29:54 asus0 kernel: [ 103.750713] PM: suspend entry (deep)
Jan 2 09:30:12 asus0 kernel: [ 122.122386] PM: suspend exit
Jan 2 09:30:12 asus0 kernel: [ 122.122473] PM: suspend entry (s2idle)
Jan 2 09:35:27 asus0 kernel: [ 66.427487] PM: suspend entry (deep)
Jan 2 09:35:35 asus0 kernel: [ 71.262239] PM: suspend exit
Jan 2 09:49:10 asus0 kernel: [ 101.216846] PM: suspend entry (deep)
Jan 2 09:49:32 asus0 kernel: [ 122.833061] PM: suspend exit
Jan 2 09:49:32 asus0 kernel: [ 122.833242] PM: suspend entry (s2idle)
Jan 2 09:52:30 asus0 kernel: [ 96.227978] PM: suspend entry (deep)
Jan 2 09:54:33 asus0 kernel: [ 101.035466] PM: suspend exit
Jan 2 10:00:29 asus0 kernel: [ 456.496279] PM: suspend entry (deep)
Jan 2 10:00:50 asus0 kernel: [ 476.827893] PM: suspend exit
Jan 2 10:00:50 asus0 kernel: [ 476.828015] PM: suspend entry (s2idle)
Jan 2 10:06:29 asus0 kernel: [ 149.561486] PM: suspend entry (deep)
Jan 2 11:46:21 asus0 kernel: [ 155.932575] PM: suspend exit

Using 'mem_sleep_default=deep' in '/etc/default/grub' (followed by grub-mkconfig) does not alter this behaviour.

After removing the above parameter, i.e. with the default settings, /sys/power/mem_sleep contains 's2idle [deep]'. So 'deep' is indeed the default suspend mode, yet the system randomly enters 's2idle' on its own. /sys/power/state contains 'freeze mem disk'.

After reverting to kernel 6.0.12-arch1, suspend/resume works normally, as it always did.

I'm out of ideas for a fix, and would appreciate any clues.

Some hardware specs:
AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)

Thank you.
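The log excerpt in the message above can be checked mechanically. The following is a minimal Python sketch, assuming the syslog-ng line format shown in the excerpt; the function name `unmatched_entries` is made up for illustration. It pairs each "PM: suspend entry" with the next "PM: suspend exit" and reports the mode of every entry that never got an exit, which is what a hard reboot after a failed resume would leave behind in the log.

```python
import re

# Matches "PM: suspend entry (<mode>)" and "PM: suspend exit" lines
# in the format of the excerpt above.
PM_LINE = re.compile(r"PM: suspend (entry \((?P<mode>\w+)\)|exit)")


def unmatched_entries(lines):
    """Return the modes of suspend entries that were never followed by an exit."""
    failed = []
    pending = None  # mode of the entry still waiting for its exit
    for line in lines:
        match = PM_LINE.search(line)
        if match is None:
            continue
        mode = match.group("mode")
        if mode is not None:  # a "suspend entry" line
            if pending is not None:
                # A new entry with no exit in between: the previous resume failed.
                failed.append(pending)
            pending = mode
        else:  # a "suspend exit" line
            pending = None
    if pending is not None:
        # The log ends on an entry without an exit.
        failed.append(pending)
    return failed


# A few lines copied from the excerpt above.
sample = """\
Jan 2 09:29:54 asus0 kernel: [ 103.750713] PM: suspend entry (deep)
Jan 2 09:30:12 asus0 kernel: [ 122.122386] PM: suspend exit
Jan 2 09:30:12 asus0 kernel: [ 122.122473] PM: suspend entry (s2idle)
Jan 2 09:35:27 asus0 kernel: [ 66.427487] PM: suspend entry (deep)
Jan 2 09:35:35 asus0 kernel: [ 71.262239] PM: suspend exit
""".splitlines()

print(unmatched_entries(sample))  # ['s2idle']
```

Run over the full excerpt, this flags four entries, all of them s2idle. Three of those follow a deep resume within the same second.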
I'd begin by checking if your BIOS/UEFI is up to date.

On 02.01.23 at 12:16, SET wrote:
Hello,

Just as a disclaimer: it can be very difficult to update your BIOS/UEFI, as the majority of the proprietary tools required only run on Windows. I am not sure if they can be run through Wine, so without explicit Linux support an update can be hard to do. Normally, keeping a Windows disk somewhere and booting into Windows to install the update could work. If you use Lenovo, they do everything through their proprietary Windows application (Lenovo Vantage), which makes it even harder to update. Thankfully, SOME Lenovo laptops can be updated natively on Linux.

As for the suspend/resume issue, I assume this is an issue with the Linux kernel build; GRUB is just a bootloader, so I doubt configuring it would fix the problem. I would check the kernel configuration: the kernel controls how the system is suspended and resumed, and the relevant modules must be compiled in for this to work properly!

Hope this helps,
Polarian
On Mon, 2023-01-02 at 13:05 +0000, Polarian wrote:
> it can be very difficult to update your BIOS/UEFI
Hi,

nowadays it's usually very easy to do, by saving a downloaded file either to a FAT partition on the internal drive or to a FAT-formatted USB stick or drive. After that, the BIOS can be updated via a BIOS option that loads this file.

However, I'm allergic to the inflationary trend of recommending BIOS updates when something goes wrong, without even knowing whether it's BIOS related. I strongly recommend reading the changelog of available BIOS updates to see if something related to the issue is fixed; if not, consider staying away from the update. Apart from my recommendation, vendors usually recommend updating the BIOS only if something is fishy, because they claim that a BIOS update is risky.

From the OP's description I've got the impression that it could be a kernel regression.

Muscle tension is also said to be a cause of tinnitus, but initially it makes more sense to see an ENT doctor than an orthopaedist. IOW, if the problem occurs after a kernel update but goes away after a kernel downgrade, it's not that unlikely that the kernel is to blame. So a BIOS update doesn't sound like a particularly promising fix to me, and it could cause additional problems.

Regards,
Ralf
Hi,

On Mon, 02 Jan 2023 15:12:32 +0100, Ralf Mardorf wrote:

It's even easier (on supported platforms) via fwupd and LVFS:
https://wiki.archlinux.org/title/Fwupd

--
Merlin Büge
Yes, I am aware of the existence of LVFS and fwupd, but the issue is that the number of devices they support is still heavily outweighed by the number of devices they do not.

Thanks,
Polarian
Hello,

BIOS updates can be risky; most of the time I do not want to risk it. Recently my family tried to update a Lenovo laptop through Vantage, which is meant to give the end consumer a seamless BIOS update without the hassle. Well, Vantage bricked the device; it turns out it failed to install the BIOS. When the laptop was sent to Lenovo, they of course did not take responsibility for it and wanted £400 for a replacement motherboard, even though they were the ones who bricked the damn laptop. It is now on my todo list: I need to flash the ROM on the motherboard, but whether I can actually do that is hit or miss because of anti-right-to-repair features, and Lenovo don't give you the tools you need, such as PCB diagrams.

TL;DR: if you do not need to update your BIOS and there are currently no security issues, DO NOT UPDATE IT. Each time you update you risk bricking your motherboard, and these proprietary developers couldn't care less, because it actually earns them money if their tools fail and you then pay to have it repaired.

Unfortunately, some motherboards do not have built-in flashing tools. You would think that in the modern age every single motherboard would have a flash utility, but some still depend on proprietary Chinese flashing utilities, which, first off, are written in Chinese, so how are you meant to know what you are clicking? And secondly, they only run on Windows.

This is why PC builders who recommend cheaping out on the motherboard, because it doesn't affect performance as much as cheaping out on the graphics card or CPU, are full of sh*t: the motherboard is the most important part, and without a reliable motherboard, good luck getting anything to work!

Always do your research into whether your motherboard has built-in flashing utilities, whether it supports Linux flashing tools, etc. It saves you a headache!

Thanks,
Polarian
On Monday, 2 January 2023 13:35:12 CET, Uwe Sauter wrote:
> I'd begin by checking if your BIOS/UEFI is up to date.

Polarian:
> it can be very difficult to update your BIOS/UEFI
I updated the BIOS/UEFI to the latest version (dated 25-Feb-2020) using the integrated Asus EZ Flash Utility (the process went smoothly).

Unfortunately, nothing changed after reinstalling linux 6.1 (Arch package, no custom kernel). s2idle still gets triggered, with a black screen on resume and an unresponsive keyboard.

Anyway, I can keep the 6.0.12 kernel forever, since the hardware won't change; it's just somewhat annoying.

Thanks for your replies.
Hello, Can you confirm which linux kernel package from the arch repository you currently have installed? Thanks, Polarian
On Monday, 2 January 2023 14:48:52 CET, Polarian wrote:
> Can you confirm which linux kernel package from the arch repository you currently have installed?
The misbehaviour started with linux-6.1.1.arch1-1. I am currently using 6.0.12-arch1-1 to avoid any resume issue, but will switch to linux-6.1.1.arch1-1, or the latest repository package, to try any suggestion.
Hi,
take a look here: https://gitlab.freedesktop.org/drm/amd/-/issues/ If the issue does not exist, create one. Best Regards Bjoern
Hello,

This could be due to a new bug introduced into the kernel, in which case you should file a bug against the kernel. The other possibility is that the package maintainer who built the latest version of the kernel changed the kernel configuration, so that the features you need for suspending to work are no longer compiled in.

Jan Alexander Steffens was the last packager for the kernel. If any of the Linux package maintainers are subscribed to this mailing list, your support would be useful to figure out whether it is a kernel configuration issue or an issue with the kernel codebase.

Looking at the commit (on GitHub) which updated the kernel:
https://github.com/archlinux/svntogit-packages/commit/fdde8b0bbded69507f5e76...
The only thing that changed is the package version, so the configuration was not changed.

Going back one more commit:
https://github.com/archlinux/svntogit-packages/commit/94647cd1eefbdb81665c9b...
The config was changed, so it might be worth looking through the configuration diff to see if any of the features you need have been disabled. You stated that 6.0.12-arch1-1 works for you, so if this is a configuration issue, the configuration changes above may show the solution to your problem.

If it is a configuration issue, one of the package maintainers will need to be contacted to request a change to the configuration file so that the mainline kernel package supports your hardware (the mainline linux package is designed to be monolithic so that it supports as much hardware as possible). If they do not want to solve this, you will need to clone, configure and compile the Linux kernel from source with the features you need enabled. You can then use:

    make install

which moves the kernel to the boot directory. Then you need to edit your GRUB configuration or EFI entries to boot the new kernel. Be aware that this means that from now on you will need to handle every kernel update manually yourself.

The second solution, if the first one is not the issue, is to report this to the kernel devs, as they might have changed some of the logic within this part of the kernel and thereby broken it for your hardware.

I would say the first solution is most likely the correct one for your problem, and it is the easier one :)

You may want to try linux-lts, so that you still get security patches while your hardware should be fully supported. The issues you are experiencing are the downside of using a rolling-release kernel: you will run into issues like this. The LTS version is designed to be reliable, so if you do not want the hassle and can do without the latest kernel features and optimisations, switch over to LTS; it still gets all the security patches, just not the unstable features that are being added.

Hope this helps,
Polarian
Hello,

I do recommend either emailing one of the packagers of the Arch package directly, or HOPEFULLY they will see this thread and respond.

Another option is to file a bug on the Arch Linux bug tracker against this package, so that the packagers know what issue you have and will hopefully be able to resolve it for you.

Good luck,
Polarian
Hi,
How should they solve it? They don't have a crystal ball which tells them which was the bad commit. Regards Bjoern
On Mon, 2023-01-02 at 14:35 +0000, Polarian wrote:
> switch over to lts, it still gets all the security patches just not the unstable features they are adding
[off-topic]

Hi,

just a hint for those who need or want to stay with a kernel over a very long time. Arch Linux provides the latest longterm kernel. Those who really want to stay on one longterm kernel permanently need to build the packages themselves.

IOW "LTS" is a relative term, since https://www.kernel.org/ supports

longterm: 5.15.86
longterm: 5.10.161
longterm: 5.4.228
longterm: 4.19.269
longterm: 4.14.302
longterm: 4.9.336

and "SLTS (Super Long Term Support)" is provided by the Civil Infrastructure Platform https://wiki.linuxfoundation.org/civilinfrastructureplatform/start . Supported are

Version          EOL
SLTS v5.10       2031-01
SLTS v5.10-rt    2031-01
SLTS v4.19       2029-01
SLTS v4.19-rt    2029-01
SLTS v4.4        2027-01
SLTS v4.4-rt     2027-01

As long as possible I will stay with my current hardware and build 4.19-rt kernels. I always keep 4 versions and replace the oldest with an available update. To do that, each gets an individual affix, such as "pussytoes". I misuse pkgrel and pkgdesc to provide information about the config, e.g. "0.300" is for CONFIG_HZ=300, since CONFIG_HZ might matter and change for my usage.

$ uname -r
4.19.269-rt119-0.300-pussytoes

Regards,
Ralf
On 02.01.23 14:35, Polarian wrote:
This was probably the result of something like `make oldconfig` while going from 6.0 to 6.1, so just seeing "config has changed" does not imply it was a deliberate change of any particular setting by the maintainer.

To me this whole issue sounds more like a kernel regression and not something directly caused by our Arch Linux kernel package, as we're not really making any downstream changes in our vanilla/default "linux" package. (E.g. in contrast to linux-hardened or linux-zen, which have patch sets applied and deliberately modified build configs.)

As already mentioned a couple of times in this thread, the best way forward would probably be:

- try building the current kernel git HEAD (i.e. aur/linux-git) and check if your issue is still present there
- check for recently opened kernel bugs similar to your issue; if you find one that seems to closely match, comment with your findings and add yourself to the CC list to get emails on subsequent comments/changes
- if there isn't a bug report matching your issue yet and your issue is still present in git HEAD, go ahead and submit a new kernel bug following their bug submission guidelines, with as many details as possible
- if you are able and have the time: bisect the kernel commits to possibly identify the "bad" commit that introduced the regression in 6.1, and comment that on the corresponding kernel bug ticket
- downgrade to linux-lts for the time being as a workaround to still get kernel updates while the issue is hopefully being addressed and fixed upstream and ideally backported (where needed)

Cheers

--
Thore "foxxx0" Bödecker
GPG ID: 0xEB763B4E9DB887A6
GPG FP: 051E AD6A 6155 389D 69DA 02E5 EB76 3B4E 9DB8 87A6
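For readers unfamiliar with the bisecting step in the list above, the idea is a binary search over the commit history between a known-good and a known-bad kernel. The Python sketch below only illustrates that idea on a made-up list of commits; it is not how `git bisect` is implemented, and the names `first_bad` and `is_bad` are hypothetical. In practice `is_bad` stands for "build this commit, boot it, and test suspend/resume by hand".

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    steps = 0  # number of commits that had to be built and tested
    while bad - good > 1:
        middle = (good + bad) // 2
        steps += 1
        if is_bad(commits[middle]):
            bad = middle
        else:
            good = middle
    return commits[bad], steps


# Hypothetical history: 1000 commits, regression introduced by commit 637.
history = list(range(1000))
culprit, builds = first_bad(history, lambda commit: commit >= 637)
print(culprit, builds)
```

The point of the sketch is the cost: with 1000 candidate commits, about ten build-and-test rounds are enough. With the real tool, the equivalent starting point would be `git bisect start`, then marking one revision with `git bisect good` and another with `git bisect bad`.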
I’m running into the same problem on an ASUS ROG Zephyrus G14, which has an AMD 4800HS CPU and is running the latest UEFI firmware.
On Mon, 2 Jan 2023 at 11:16, SET <set@nmset.info> wrote:
Have you tried configuring the kernel to use deep sleep instead?
https://wiki.archlinux.org/title/Dell_Inspiron_15_(7590)#S3_Suspend

Paul
On Monday, 2 January 2023 14:41:17 CET, Paul Dann wrote:
> Have you tried configuring the kernel to use deep sleep instead?
Yes, I have already tried 'mem_sleep_default=deep' in /etc/default/grub, followed by grub-mkconfig. It did not help at all.
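Whether the selected mode is really 'deep' can be confirmed at runtime from /sys/power/mem_sleep, where the kernel marks the currently selected mode with square brackets (as in the 's2idle [deep]' value quoted earlier in the thread). A small Python sketch of reading that file; the function names are made up for illustration.

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Split the contents of /sys/power/mem_sleep into (active, available)."""
    available = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # the bracketed entry is the selected mode
            active = token
        available.append(token)
    return active, available


def current_mem_sleep(path="/sys/power/mem_sleep"):
    """Read the sysfs file; returns None if it does not exist on this system."""
    sysfs = Path(path)
    if not sysfs.exists():
        return None
    return parse_mem_sleep(sysfs.read_text())


print(parse_mem_sleep("s2idle [deep]"))  # ('deep', ['s2idle', 'deep'])
```

Note that this only shows which mode is selected for a 'mem' suspend. It does not explain the s2idle entries in the log, which is the open question of this thread.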
Not sure if it's related, but I also ran into suspend issues recently, namely that my desktop machine wakes up directly after reaching suspend.target: it turns off the displays for a short time (about two seconds) and then turns them on again. I tried network WoL settings and BIOS settings. Downgrading the kernel to 6.0.12 or linux-lts, and even trying linux-git, didn't help either. I have to admit, though, that I don't use suspend very frequently, so it may have gone unnoticed for a while.

What solved it for me was adding specific udev rules, e.g. to /etc/udev/rules.d/10-wakeup.rules:

ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x1022", ATTR{subsystem_device}=="0x1484", ATTR{subsystem_vendor}=="0x1022", ATTR{power/wakeup}="disabled"

In my case, two PCI devices caused it. I hunted them down by changing all devices to disabled in /proc/acpi/wakeup and enabling them again one by one.

I'm still not sure if that should happen at all, and whether it's wrong behavior of the hardware or of the kernel.

Cheers

On 02.01.23 15:08, SET wrote:
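The hunt through /proc/acpi/wakeup that Varakh describes starts with knowing which devices are currently allowed to wake the machine. A minimal Python sketch, assuming the usual column layout of that file (device name, S-state, status prefixed with an asterisk, optional sysfs node); the sample text and the function names are made up for illustration and are not taken from Varakh's machine.

```python
from pathlib import Path


def enabled_wakeup_devices(text):
    """Return names of devices marked *enabled in /proc/acpi/wakeup-style text."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Device rows look like: NAME  S-state  *enabled|*disabled  [sysfs node]
        # The header row and continuation rows do not match and are skipped.
        if len(fields) >= 3 and fields[2] == "*enabled":
            names.append(fields[0])
    return names


def toggle_wakeup(name, path="/proc/acpi/wakeup"):
    """Flip a device between enabled and disabled (needs root; not called here)."""
    Path(path).write_text(name)


# Hypothetical sample, for illustration only.
sample = """\
Device  S-state   Status   Sysfs node
GPP0      S4    *enabled   pci:0000:00:01.1
GPP1      S4    *disabled
XHC0      S4    *enabled   pci:0000:03:00.3
"""

print(enabled_wakeup_devices(sample))  # ['GPP0', 'XHC0']
```

Changes made through /proc/acpi/wakeup do not survive a reboot, which is presumably why Varakh made the final setting persistent with a udev rule.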
On Wed, Jan 4, 2023 at 9:02 PM Varakh <varakh@varakh.de> wrote:
I have seen suggestions elsewhere that adding the kernel parameter "amd_iommu=off" to the boot line brings back the ability to suspend in recent kernels. I have not tried it, but it might be worth testing to see if it helps with the latest kernel.

Also, this may be relevant:
https://www.phoronix.com/news/AMD-s2idle-Check-FW

--
mike c
On Wednesday, 4 January 2023 22:56:00 CET, Mike Cloaked wrote:
> amd_iommu=off
This kernel parameter does not change anything with linux-6.1.2: s2idle is still chosen, and resume ends in a black screen.

I'm using linux-lts now; deep suspend is the default and works as usual.

Regards.
On Monday, 2 January 2023 12:16:30 CET, SET wrote:
A significant update.

After considering these messages in /var/log/error.log:

Jan 1 10:00:04 asus0 kernel: [ 2095.395568] xhci_hcd 0000:03:00.4: PCI post-resume error -19!
Jan 1 10:00:04 asus0 kernel: [ 2095.395570] xhci_hcd 0000:03:00.4: HC died; cleaning up
Jan 1 10:00:04 asus0 kernel: [ 2095.395601] xhci_hcd 0000:03:00.4: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -19
Jan 1 10:00:04 asus0 kernel: [ 2095.395618] xhci_hcd 0000:03:00.4: PM: failed to resume async: error -19
Jan 1 10:00:12 asus0 kernel: [ 2107.896558] xhci_hcd 0000:03:00.4: xHCI host controller not responding, assume dead
Jan 1 10:00:12 asus0 kernel: [ 2107.896572] xhci_hcd 0000:03:00.4: HC died; cleaning up

a web search led me to unload the xhci_* modules on suspend and load them again on resume. I did this with a bash script in /usr/lib/systemd/system-sleep/.

At least 25 suspend/resume cycles in a row have succeeded. I tested both s2idle and deep by tweaking SuspendState={freeze,mem} in sleep.conf. This is with linux-6.1.3, which misbehaves in the same way as 6.1.2 and 6.1.1.

The drawback is that all USB connections get lost. Also, opening the lid does not trigger anything when using s2idle, while closing the lid does trigger suspend in both modes. I can live with these limitations.

I'll know in the coming days whether cycles with longer delays are handled well.

Regards.
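SET did not post the script itself. The following is only a sketch of what such a hook might look like, written here in Python rather than bash (systemd runs any executable placed in /usr/lib/systemd/system-sleep/ and passes it "pre" or "post" as the first argument and the sleep action as the second). The module names `xhci_pci` and `xhci_hcd` are an assumption, since the message only says "xhci_*", and the function name `commands_for` is made up for illustration.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Assumption: the message only mentions "xhci_*"; these two module names are
# a guess at what that covers. Listed in unload order (dependent module first).
MODULES = ("xhci_pci", "xhci_hcd")


def commands_for(phase, modules=MODULES):
    """Build the modprobe commands for a systemd-sleep phase ("pre" or "post")."""
    if phase == "pre":
        # Before suspending: unload the modules.
        return [["modprobe", "-r", name] for name in modules]
    if phase == "post":
        # After resuming: load them again, in reverse order.
        return [["modprobe", name] for name in reversed(modules)]
    return []  # any other argument: do nothing


def main(argv):
    # systemd calls the hook as: <script> pre|post <sleep action>
    phase = argv[1] if len(argv) > 1 else ""
    for command in commands_for(phase):
        subprocess.run(command, check=False)


if __name__ == "__main__":
    main(sys.argv)
```

The file would have to be executable for systemd to run it. As SET notes, unloading the host controller driver drops every USB connection, so this is a workaround and not a fix.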
participants (11)
- Bjoern Franke
- Merlin Büge
- Mike Cloaked
- Mike Yuan
- Paul Dann
- Polarian
- Ralf Mardorf
- SET
- Thore Bödecker
- Uwe Sauter
- Varakh