Linux kernel: delay before power-off
Hi all, for a long time now I have been seeing unclean shutdowns of SATA drives on different machines running Arch. There is a nice write-up about the problem here: https://lore.kernel.org/lkml/20170410232118.GA4816@khazad-dum.debian.net/ If you google with the right keywords, you will find a lot of similar reports all over the interwebs. I remember a configuration setting for the Linux kernel, which existed in ancient times, to delay the final step of powering off the machine for some time. Is there anything like that in the distributed kernels right now? Any knob I could turn? BR
Hello, You would need to compile the Linux kernel yourself: find the option in the kernel config and compile. Good luck, -- Polarian GPG signature: 0770E5312238C760 Website: https://polarian.dev JID/XMPP: polarian@polarian.dev
On 16.01.23 at 17:16, Polarian wrote:
You would need to compile the Linux kernel yourself: find the option in the kernel config and compile.
Thanks for sharing, but if I were happy to compile kernels, I could just apply the patch from the posting I linked to, couldn't I? BR
I think that may be the best option. Building the kernel manually isn't as hard as it is made out to be. With the right config, compiling usually takes less than 5 minutes. On January 16, 2023 10:40:51 AM CST, Markus Schaaf <markuschaaf@gmail.com> wrote:
On 16.01.23 at 17:16, Polarian wrote:
You would need to compile the Linux kernel yourself: find the option in the kernel config and compile.
Thanks for sharing, but if I were happy to compile kernels, I could just apply the patch from the posting I linked to, couldn't I?
BR
On Mon, 16 Jan 2023 at 15:45, Markus Schaaf <markuschaaf@gmail.com> wrote:
for a long time now I have been seeing unclean shutdowns of SATA drives on different machines running Arch. There is a nice write-up about the problem here: https://lore.kernel.org/lkml/20170410232118.GA4816@khazad-dum.debian.net/ If you google with the right keywords, you will find a lot of similar reports all over the interwebs. I remember a configuration setting for the Linux kernel, which existed in ancient times, to delay the final step of powering off the machine for some time. Is there anything like that in the distributed kernels right now? Any knob I could turn?
You can probably create a shutdown service that delays the shutdown, or even invokes some commands to make sure the HDDs are fine. Another way is to make a custom mkinitcpio hook that adds a shutdown script (alas, I can't find the proper documentation about it); see the sketch below. -- damjan
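For illustration, a minimal (untested) sketch of the shutdown-script idea, assuming systemd: systemd-shutdown(8) runs every executable in /usr/lib/systemd/system-shutdown/ very late in the shutdown sequence, passing "halt", "poweroff", "reboot" or "kexec" as the first argument. A hypothetical delay script (the 2-second value is an arbitrary example):

#!/bin/sh
# /usr/lib/systemd/system-shutdown/delay.sh (must be executable)
# pause briefly before the final shutdown step
case "$1" in
    poweroff|halt) sleep 2 ;;
esac

Caveat: this hook still runs before the kernel's own device shutdown, i.e. before the drives receive their stop command, so it may not close the exact window described in this thread.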
On 16.01.23 at 15:45, Markus Schaaf wrote:
for a long time now I have been seeing unclean shutdowns of SATA drives on different machines running Arch. [...] Is there anything like that in the distributed kernels right now? Any knob I could turn?
After a cursory reading of some kernel sources, I could not find any knob. But while looking through the code, I realized that a simple module could do the trick. So here it is: https://github.com/markuschaaf/linux-delay_power_off BR
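For anyone wanting to try it: an out-of-tree module like this is typically built against the running kernel's headers roughly as follows (assuming the matching linux-headers package is installed; the .ko name is guessed from the repository name, not verified against its sources):

$ git clone https://github.com/markuschaaf/linux-delay_power_off
$ cd linux-delay_power_off
$ make -C /lib/modules/$(uname -r)/build M=$PWD modules
$ sudo insmod delay_power_off.ko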
Hi, On Sun, Jan 22, 2023 at 07:20, Markus Schaaf <markuschaaf@gmail.com> wrote:
After a cursory reading of some kernel sources, I could not find any knob. But while looking through the code, I realized that a simple module could do the trick. So here it is: https://github.com/markuschaaf/linux-delay_power_off
Before attempting to install your kernel module, I'd like to verify that it is actually needed for me. To find out, I am attaching the current ``# smartctl -A /dev/nvme0n1`` output. There are 127 "Unsafe Shutdowns" counted. I have been working with this machine for 5-6 years now. It is very rare that I shut it down ungracefully by pressing the power button. Can you maybe detail the actual symptoms of these unsafe shutdowns a little? My box is dual-booting Arch Linux and Windows 10 Pro. To move data from one system to the other, there is an ExFAT partition on the SSD. Sometimes I observe somewhat "strange" phenomena when using this partition. I guessed that it might be shutdown-related and therefore am mounting/unmounting the volume when my user 'friedrich' logs in and out, but the "strangeness" was, if I am not mistaken, not entirely remedied by this step. I remember my symptoms rather vaguely, but it has to do with files missing on Windows, while a boot back into Linux, logging in and out, and rebooting into Windows might have sorted the issue. I am sorry that I cannot give more reliable details. I never asked for help on mailing lists because my recollection is imprecise and I cannot pinpoint the issue. But with your ideas, I'd like to give it a try, to maybe solve (and understand!) the problem. Best, Friedrich
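A simple way to check whether graceful shutdowns are to blame: record the counter, perform one clean shutdown/boot cycle, and compare (assuming the nvme-cli and smartmontools packages are installed; /dev/nvme0 and /dev/nvme0n1 are placeholders for your device):

$ sudo nvme smart-log /dev/nvme0 | grep -i unsafe_shutdowns
$ sudo smartctl -A /dev/nvme0n1 | grep -i unsafe

If the counter increases across a clean shutdown, the problem discussed in this thread applies.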
On 1/23/23 11:40, Friedrich Romstedt wrote:
Hi, ...
I haven't been following this thread - but I'm curious. Does this very old commit (2014) not do what's needed? I just checked smartctl on a couple of machines and indeed I also see unsafe shutdowns being logged for nvme disks (not for other SSDs or spinners). 2 seconds seems to be the default, and it looks like there is already a parameter to extend this.

gene

commit 2484f40780b97df1b5eb09e78ce4efaa78b21875
Author: Dan McLeran <daniel.mcleran@intel.com>
Date:   Tue Jul 1 09:33:32 2014 -0600

    NVMe: Add shutdown timeout as module parameter.

    The current implementation hard-codes the shutdown timeout to 2
    seconds. Some devices take longer than this to complete a normal
    shutdown. Changing the shutdown timeout to a module parameter with
    a default timeout of 5 seconds.
I am beginning to wonder if this 'unsafe shutdown' from SMART is misleading or just plain wrong, and whether an nvme firmware update might fix it. Given that the kernel is already giving it 2 seconds by default, I am quite doubtful this is something I need to act on kernel-wise, though it is possible that setting the module parameter to something longer could help for some models. Anyway - I'd be interested to learn more from those in the know :) gene
Hi! On Mon, Jan 23, 2023 at 18:08, Genes Lists <lists@sapience.com> wrote:
I haven't been following this thread - but I'm curious. Does this very old commit (2014) not do what's needed? I just checked smartctl on a couple of machines and indeed I also see unsafe shutdowns being logged for nvme disks (not for other SSDs or spinners).
2 seconds seems to be the default, and it looks like there is already a parameter to extend this.
If there is some kind of parameter already present I would prefer using this.
commit 2484f40780b97df1b5eb09e78ce4efaa78b21875
Author: Dan McLeran <daniel.mcleran@intel.com>
Date:   Tue Jul 1 09:33:32 2014 -0600

    NVMe: Add shutdown timeout as module parameter.

    The current implementation hard-codes the shutdown timeout to 2
    seconds. Some devices take longer than this to complete a normal
    shutdown. Changing the shutdown timeout to a module parameter with
    a default timeout of 5 seconds.
What commit are you referring to here? From googling the hash I guess it is some kind of kernel patch, but maybe you can clarify a little? Best, Friedrich
What commit are you referring to here? From googling the hash I guess it is some kind of kernel patch, but maybe you can clarify a little?
I just searched the kernel git source - this commit was accepted into the kernel in 2014. A quick look at the diff shows it touches a file, drivers/block/nvme-core.c, which no longer exists; the code was moved, with plenty of changes since. In drivers/nvme I find:

host/core.c:static unsigned char shutdown_timeout = 5;
host/core.c:MODULE_PARM_DESC(shutdown_timeout, "timeout in seconds for controller shutdown");

which suggests the current default is 5 seconds to wait for the h/w to ack - maybe some h/w needs more than 5 secs ... As I said, I'm not an expert, but I'm definitely curious, though not concerned. I have no actual evidence of cache flushes not being on disk. Not saying there are none - just that I've not been aware of any. About all I know at this point :) gene
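For reference, when nvme_core is loaded as a module the effective value can be read through sysfs, and since the parameter is declared writable it can also be changed at runtime (hedged example; 10 seconds is an arbitrary value, taking effect on the next shutdown):

$ cat /sys/module/nvme_core/parameters/shutdown_timeout
5
$ echo 10 | sudo tee /sys/module/nvme_core/parameters/shutdown_timeout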
For what it's worth - on a machine with nvme I don't see shutdown_timeout in either

$ ls /sys/module/nvme/parameters/

or

$ modinfo nvme

So that's mildly interesting :) Not that it matters here, but I build nvme into my kernels rather than as a dynamically loaded module like the Arch kernels do. gene
1) Further thoughts - in my case, with the nvme module built into the kernel, parameters can be set using the kernel boot command line.

2) Also, this may be useful: I found this tidbit from 2017, in case your hardware supports it:

commit 07fbd32a6b215d8b2fc01ccc89622207b9b782fd
Author: Martin K. Petersen <martin.petersen@oracle.com>
Date:   Fri Aug 25 19:14:50 2017 -0400

    nvme: honor RTD3 Entry Latency for shutdowns

    If an NVMe controller reports RTD3 Entry Latency larger than
    shutdown_timeout, up to a maximum of 60 seconds, use that value to
    set the shutdown timer. Otherwise fall back to the module parameter
    which defaults to 5 seconds.
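On a stock Arch setup with GRUB that would look roughly like this (hedged sketch; other boot loaders have equivalent settings, and 10 seconds is just an example value):

# /etc/default/grub -- append to the existing options
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet nvme_core.shutdown_timeout=10"

$ sudo grub-mkconfig -o /boot/grub/grub.cfg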
On Mon, Jan 23, 2023 at 6:20 PM Genes Lists <lists@sapience.com> wrote:
1) Further thoughts - in my case, with the nvme module built into the kernel, parameters can be set using the kernel boot command line.
2) Also, this may be useful: I found this tidbit from 2017, in case your hardware supports it:
commit 07fbd32a6b215d8b2fc01ccc89622207b9b782fd
Author: Martin K. Petersen <martin.petersen@oracle.com>
Date:   Fri Aug 25 19:14:50 2017 -0400

    nvme: honor RTD3 Entry Latency for shutdowns

    If an NVMe controller reports RTD3 Entry Latency larger than
    shutdown_timeout, up to a maximum of 60 seconds, use that value to
    set the shutdown timer. Otherwise fall back to the module parameter
    which defaults to 5 seconds.
There is also a long-standing bugzilla thread at https://bugzilla.kernel.org/show_bug.cgi?id=195039 -- mike c
On Mon, Jan 23, 2023 at 6:05 PM Genes Lists <lists@sapience.com> wrote:
For what it's worth - on a machine with nvme I don't see shutdown_timeout in either
$ ls /sys/module/nvme/parameters/

or

$ modinfo nvme
So that's mildly interesting :) Not that it matters here, but I build nvme into my kernels rather than as a dynamically loaded module like the Arch kernels do.
Interesting on the loaded modules: since there are both nvme and nvme_core, if you do

$ modinfo nvme | grep timeout
$

you get nothing - but if you do

$ modinfo nvme_core | grep timeout
parm:           admin_timeout:timeout in seconds for admin commands (uint)
parm:           io_timeout:timeout in seconds for I/O (uint)
parm:           shutdown_timeout:timeout in seconds for controller shutdown (byte)
parm:           apst_primary_timeout_ms:primary APST timeout in ms (ulong)
parm:           apst_secondary_timeout_ms:secondary APST timeout in ms (ulong)

it does have the shutdown timeout parameter. So presumably nvme_core.shutdown_timeout is the right way to set a value, either as a command line option (needed when nvme is built in) or via a file in /etc/modprobe.d/ if dynamically loaded. mike -- mike c
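For the dynamically loaded case, the modprobe.d file would look like this (hedged sketch; the filename is arbitrary and 10 seconds is just an example):

# /etc/modprobe.d/nvme.conf
options nvme_core shutdown_timeout=10

Since nvme_core is typically loaded from the initramfs on Arch, and the stock mkinitcpio modconf hook copies /etc/modprobe.d into the image, regenerate the initramfs afterwards with 'mkinitcpio -P'.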
On 1/23/23 12:24, Friedrich Romstedt wrote:
Hi! .. If there is some kind of parameter already present I would prefer using this.
You could try adding nvme_core.shutdown_timeout=10 to the kernel boot line (per the modinfo output, the parameter lives in nvme_core). gene
On 23.01.23 at 18:08, Genes Lists wrote:
I haven't been following this thread - but I'm curious. Does this very old commit (2014) not do what's needed?
commit 2484f40780b97df1b5eb09e78ce4efaa78b21875
Author: Dan McLeran <daniel.mcleran@intel.com>
Date:   Tue Jul 1 09:33:32 2014 -0600
NVMe: Add shutdown timeout as module parameter.
This is the timeout between the stop message and its acknowledgement. It does not help if the firmware acknowledges without really being ready for power-off, which seems to apply to most (if not all) SATA disks, at least. Someone somewhere mentioned that this behavior is according to the specification. I am seeing the respective SMART counters increase on many manufacturers' (Samsung, WD, Crucial, Intel) standard consumer drives and top-notch enterprise drives. BR
On 23.01.23 at 17:40, Friedrich Romstedt wrote:
Can you maybe detail a little the actual symptoms of these unsafe shutdowns?
The post I linked to explains the problem rather well. Devices are sent a stop/shutdown message shortly before the system loses power. The devices acknowledge these messages and the system is halted, rebooted or powered off. What I can see, despite everything working as written, is that filesystems that care to report it (f2fs) are sometimes left in a state like after a sudden power loss, while the system reported a proper shutdown. Other filesystems are less noisy about this, simply replaying their (incomplete) journal, and rarely losing some files. I have seen the latter after a heavy `pacman -Syu` followed by an immediate shutdown.

One might wonder why this happens after a disk has acknowledged the shutdown message. I can only speculate. Of course the message/acknowledgement cycle comes with a timeout. Maybe it is sometimes too short for disks with large RAM caches or complicated data management schemes. This is only a problem when write caches are filled before power-off, so developers may opt to send the acknowledgement in time and hope for the best, instead of triggering error messages that may make the drive look bad.

As this is a race condition depending on a lot of random timings, and cheap SSDs have no RAM nowadays, this is probably not a real problem for most users. BR
On 1/23/23 13:35, Markus Schaaf wrote:
Very interesting, thanks for clarifying, Markus. So the hardware just lies - perhaps they should just add a capacitor on board with enough juice to finish the cache flush or something. Have you observed any ill effects? gene
On 23.01.23 at 19:47, Genes Lists wrote:
So the hardware just lies -
What a surprise. Disk firmware has been lying for ages - acknowledging write barriers or cache flushes early to look good in benchmarks, for instance.
perhaps they should just add a capacitor on board with enough juice to finish the cache flush or something.
This has been done already - look for disks with power-loss protection. But these emergency flushes seem to come with downsides too, because the firmware cares to count them and reports the counter as a SMART attribute of type "Old_age".
Have you observed any ill effects?
Of what? Waiting two more seconds for power-off? No. I am running some large arrays of rotating disks. Power-loss head retract is certainly a factor for wear, and totally avoidable. And SSDs? I do not know. Some people have reported SSDs being bricked by sudden power loss. I believe this happened to me some years ago, but you never know why some hardware starts acting up. Why take the risk? I could not care less about a machine needing a few seconds more to power off or reboot, compared to the headache of changing drives and restoring from a backup. Which is never current, BTW. :-) BR
On Mon, Jan 23, 2023 at 20:32, Genes Lists <lists@sapience.com> wrote:
On 1/23/23 14:22, Markus Schaaf wrote:
Have you observed any ill effects?
Of what? Waiting two more seconds for power-off? No.
No, the opposite - ill effects of not waiting.
While not directly related: I think I've seen the same on Windows PCs, actually, with Ghost (and related tools) complaining about unclean filesystems after proper shutdowns. I blamed it on hibernation/fast startup, but this scenario seems very believable too. Kind regards, Guus
Markus Schaaf <markuschaaf@gmail.com> wrote:
On 23.01.23 at 17:40, Friedrich Romstedt wrote:
Can you maybe detail a little the actual symptoms of these unsafe shutdowns?
The post I linked to explains the problem rather well. Devices are sent a stop/shutdown message shortly before the system loses power. The devices acknowledge these messages
Can it be the acknowledgement means `I got your message', but not `I read it, and am ready for a shutdown'? Just a guess. I haven't looked at it in depth. -- u34
and the system is halted, rebooted or powered off. What I can see, despite everything working as written, is that filesystems that care to report it (f2fs) are sometimes left in a state like after a sudden power loss, while the system reported a proper shutdown. Other filesystems are less noisy about this, simply replaying their (incomplete) journal, and rarely losing some files. I have seen the latter after a heavy `pacman -Syu` followed by an immediate shutdown.
One might wonder why this happens after a disk has acknowledged the shutdown message. I can only speculate. Of course the message/acknowledgement cycle comes with a timeout. Maybe it is sometimes too short for disks with large RAM caches or complicated data management schemes. This is only a problem when write caches are filled before power-off, so developers may opt to send the acknowledgement in time and hope for the best, instead of triggering error messages that may make the drive look bad.
As this is a race condition depending on a lot of random timings, and cheap SSDs have no RAM nowadays, this is probably not a real problem for most users.
BR
participants (9):
- Damjan Georgievski
- Friedrich Romstedt
- Genes Lists
- Guus Snijders
- Isaac R.
- Markus Schaaf
- Mike Cloaked
- Polarian
- u34@net9.ga