On Tue, Nov 26, 2013 at 04:35:13PM +0100, Thomas Bächler wrote:
On 26.11.2013 16:22, Dave Reisner wrote:
* Linux loads, starts /init
* udev is started
* Hard drive A is detected
* fsck is started, repairs the "dirty" root file system (and changes on-disk structures, clears the journal, ...)
Who/what triggered fsck here? Why is fsck being run before the udev event queue is flushed?
* Hard drive B is detected -> udev starts the resume procedure
...
So, you're still using udevadm settle, which mitigates the problem most of the time - but it is no guarantee. An empty udev queue does not mean that all hardware has been enumerated (look into people's inboxes for mails from Kay Sievers or Greg K-H for the full speech).
I'm the last person this needs to be explained to ;)
Your patch does not keep udev from initiating the resume even after mkinitcpio has already started fsck'ing/mounting. This is incorrect and dangerous behaviour.
But I don't really think the order of events you posted earlier paints a realistic picture of what happens in early userspace.
Orly? Sure, thanks to the udevadm settle, only the USB examples will be easily reproducible. But release these changes, and within a day some guy will turn up saying he has a resume image on USB and now his data is corrupt.
In other words, never underestimate the stupidity of users.
I'll ask Harald about this -- Dracut uses a very similar mechanism (though a bit more complex/convoluted).
Nobody said this was easy to solve.
Consider the patch shelved for now. It's not something I care about -- just thought I'd take a stab at fixing it for others.