Re: [arch-projects] [mkinitcpio] [PATCH 2/2] resume: generate a udev rule for specifying device

26 Nov 2013

On Tue, Nov 26, 2013 at 10:44:48AM +0100, Thomas Bächler wrote:
...
On 24.11.2013 03:01, Dave Reisner wrote:
...
This solves the problem of a device's major:minor numbers changing across
hibernation and resume. It also makes initialization lazy, as we no longer need to
explicitly wait on the device to show up (this will be taken care of by
udevadm settle).
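A minimal Python sketch of what generating such a rule could look like. It is not the rule the patch actually writes: the resume-spec forms handled (UUID=, LABEL=, /dev/ paths), the function names, and the exact rule text are assumptions for illustration. It relies on udev's $major/$minor substitutions and assumes ID_FS_UUID/ID_FS_LABEL have already been set by blkid earlier in the rule set.

```python
# Hypothetical sketch, not the patch's actual rule: build one udev rule line
# that writes the resume device's major:minor into /sys/power/resume.


def match_clause(resume_spec: str) -> str:
    """Translate a resume= style device spec into a udev match key."""
    if resume_spec.startswith("UUID="):
        # Assumes ID_FS_UUID was imported by blkid earlier in the rule set.
        return 'ENV{ID_FS_UUID}=="%s"' % resume_spec[len("UUID="):]
    if resume_spec.startswith("LABEL="):
        return 'ENV{ID_FS_LABEL}=="%s"' % resume_spec[len("LABEL="):]
    if resume_spec.startswith("/dev/disk/"):
        # Persistent names are symlinks; SYMLINK matches names relative to /dev.
        return 'SYMLINK=="%s"' % resume_spec[len("/dev/"):]
    if resume_spec.startswith("/dev/"):
        # Kernel names such as /dev/sda2 match on KERNEL.
        return 'KERNEL=="%s"' % resume_spec[len("/dev/"):]
    raise ValueError("unsupported resume spec: %r" % resume_spec)


def resume_rule(resume_spec: str) -> str:
    """Return a single udev rule line for the given resume device spec."""
    # $major and $minor are expanded by udev when the event fires, so the
    # value written reflects the numbers assigned in the current boot.
    action = "RUN+=\"/bin/sh -c 'echo $major:$minor > /sys/power/resume'\""
    return ", ".join(
        ['SUBSYSTEM=="block"', 'ACTION=="add"', match_clause(resume_spec), action]
    )


if __name__ == "__main__":
    print(resume_rule("UUID=0a1b2c3d-0000-4000-8000-123456789abc"))
```

Matching on a stable identifier and letting udev expand $major:$minor at event time removes the dependency on device numbers staying fixed, and because the rule simply fires whenever the device appears, nothing has to wait on it explicitly.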
Note that this patch drops support for tux-on-ice. As upstream
development of TOI seems stalled and forever relegated to being
maintained out of tree, I'm okay with this.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
---
So, I tested this as far as ensuring that the udev rule is generated correctly
and triggers, writing the correct values into /sys/power/resume, but I can't
actually hibernate/resume my desktop to test this (mobo firmware bug).
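For reference, /sys/power/resume takes the device number as a "major:minor" string. A small Python sketch of turning a block device node into that string; the helper names are hypothetical, while os.stat, os.major, os.minor and stat.S_ISBLK are standard-library calls on Unix.

```python
import os
import stat


def sysfs_resume_value(rdev: int) -> str:
    """Format a device number as the "major:minor" string /sys/power/resume accepts."""
    return f"{os.major(rdev)}:{os.minor(rdev)}"


def resume_value_for(path: str) -> str:
    """Resolve a block device node (symlinks followed) to its major:minor string."""
    st = os.stat(path)  # follows /dev/disk/by-* symlinks to the real node
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    return sysfs_resume_value(st.st_rdev)


if __name__ == "__main__":
    print(sysfs_resume_value(os.makedev(8, 2)))  # prints 8:2
```

Doing this lookup at the moment the device appears, rather than once up front, is what keeps the written value correct when numbering differs between boots.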
Your firmware should not be involved at all in hibernation. As far as
your firmware is concerned, you shut down and reboot.
Whatever it is, the shutdown never completes, and the reboot isn't a
resume.
...
...
If anyone who uses this hook wants to test this, please do and report back.
I feel this approach is extremely dangerous due to the fragile state of
file systems during hibernation.
Note that the file system must not change at all while the system is
hibernated. The kernel's internal state is saved in the hibernation
image - if the file system changed while the system was hibernated, the
on-disk and in-memory state will be inconsistent, almost guaranteeing
corruption (I have experienced this first-hand).
Now, your new approach does not preserve the ordering: Trying to resume
from a hibernation image MUST happen BEFORE any of the hibernated
system's mounted file systems are touched.
Now imagine the following situation: You have two hard drives, one
holding your root file system, the other holding your swap (and thus
hibernation image). Now imagine the following order of events:
* Linux loads, starts /init
* Udev is started
* Hard drive A is detected.
* fsck is started, repairs the "dirty" root file system (and changes
on-disk structures, clears the journal, ...)
Who/what triggered fsck here? Why is fsck being run before the udev
event queue is flushed?
...
* Hard drive B is detected -> udev starts the resume procedure.
...
...
Now you have casually corrupted your file system. This should easily be
reproducible by putting your swap on a hard drive and the root file
system on an SSD. To reproduce this more reliably, put the swap on USB instead.
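The ordering requirement can be stated as an invariant over the sequence of early-userspace events: no write to the hibernated system's file systems may precede the resume attempt. A toy Python model of the two orderings discussed in this thread; the event kinds and tuple format are hypothetical labels, and it does not model real udev behaviour.

```python
from typing import Iterable, Tuple

# Toy model of early-userspace event ordering. The event kinds below are
# hypothetical labels for illustration, not real udev or mkinitcpio events.
Event = Tuple[str, str]


def touches_fs_before_resume(events: Iterable[Event]) -> bool:
    """Return True if a file system is modified before resume is attempted."""
    for kind, _detail in events:
        if kind == "resume":
            return False  # resume attempted first: ordering is safe
        if kind == "fs_write":
            return True  # on-disk state changed under the hibernated kernel
    return False


# The order of events described above: fsck runs before disk B appears.
racy = [
    ("exec", "/init"),
    ("exec", "udevd"),
    ("device_added", "disk A (root)"),
    ("fs_write", "fsck repairs the dirty root file system"),
    ("device_added", "disk B (swap)"),
    ("resume", "udev rule writes /sys/power/resume"),
]

# Same devices, but the event queue is drained (udevadm settle) before fsck.
settled = [
    ("exec", "/init"),
    ("exec", "udevd"),
    ("device_added", "disk A (root)"),
    ("device_added", "disk B (swap)"),
    ("resume", "udev rule writes /sys/power/resume"),
    ("fs_write", "fsck repairs the dirty root file system"),
]

if __name__ == "__main__":
    print(touches_fs_before_resume(racy))     # True
    print(touches_fs_before_resume(settled))  # False
```

The model only encodes the invariant; whether the racy order can occur in practice depends on whether the resume device has been detected by the time the event queue is settled, which is exactly the point in dispute (a slow USB device being the plausible exception).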
This patch gets as many -1's from me as I can find. Hibernation is
dangerous for your data as it is; this patch plays Russian roulette with it.
I can understand the USB case and how it might be bad news if you
plugged in a matching resume device at some arbitrary point during
normal operation, but I don't really think your earlier posted order of
events paints a realistic picture of what happens in early userspace.

I'll ask Harald about this -- Dracut uses a very similar mechanism (but
a bit more complex/convoluted).