[arch-projects] [mkinitcpio] [PATCH 1/2] init_functions: escape slashes in tag values
This allows booting from devices which have labels like LABEL=/.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
---
 init_functions | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/init_functions b/init_functions
index 474d6fa..11ce3ca 100644
--- a/init_functions
+++ b/init_functions
@@ -252,6 +252,16 @@ fsck_root() {
     fi
 }
 
+# TODO: this really needs to follow the logic of systemd's encode_devnode_name
+# function more closely.
+tag_to_udev_path() {
+    awk -v "tag=$1" -v "value=$2" '
+    BEGIN {
+        gsub(/\//, "\\x2f", value)
+        printf "/dev/disk/by-%s/%s\n", tolower(tag), value
+    }'
+}
+
 resolve_device() {
     local major minor dev tag device=$1
 
@@ -264,8 +274,7 @@ resolve_device() {
         UUID=*|LABEL=*|PARTUUID=*|PARTLABEL=*)
             dev=$(blkid -lt "$device" -o device)
             if [ -z "$dev" -a "$udevd_running" -eq 1 ]; then
-                tag=$(awk -v t="${device%%=*}" 'BEGIN { print tolower(t) }')
-                dev=/dev/disk/by-$tag/${device#*=}
+                dev=$(tag_to_udev_path "${device%%=*}" "${device#*=}")
             fi
     esac
 
-- 
1.8.4.2
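To make the effect of the escaping concrete, here is a minimal sketch of the same mapping written with sed and tr instead of awk. The function name tag_to_udev_path_sketch is hypothetical and is not part of the patch; like the patch, it only escapes slashes and does not attempt the full encode_devnode_name logic.

```shell
#!/bin/sh
# Sketch only: tag_to_udev_path_sketch is a hypothetical stand-in for the
# awk-based tag_to_udev_path in the patch. It escapes "/" and nothing else.
tag_to_udev_path_sketch() {
    # Lowercase the tag name (LABEL -> label, PARTUUID -> partuuid).
    tag=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Replace every "/" in the value with the literal four characters \x2f.
    value=$(printf '%s' "$2" | sed 's|/|\\x2f|g')
    printf '/dev/disk/by-%s/%s\n' "$tag" "$value"
}

tag_to_udev_path_sketch LABEL /
# prints: /dev/disk/by-label/\x2f
```

Without the escaping, LABEL=/ would resolve to /dev/disk/by-label//, which is not the name of the by-label symlink for that label.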
This solves the problem of the major:minor of devices changing between hibernation. It also makes initialization lazy, as we no longer need to explicitly wait on the device to show up (this will be taken care of by udevadm settle).

Note that this patch drops support for tux-on-ice. As upstream development of TOI seems stalled and forever relegated to being maintained out of tree, I'm okay with this.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
---
So, I tested this as far as ensuring that the udev rule is generated correctly and triggers, writing the correct values into /sys/power/resume, but I can't actually hibernate/resume my desktop to test this (mobo firmware bug). If anyone who uses this hook wants to test this, please do and report back.

 hooks/resume | 71 +++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/hooks/resume b/hooks/resume
index 04ca1aa..ce02dfa 100644
--- a/hooks/resume
+++ b/hooks/resume
@@ -1,41 +1,62 @@
 #!/usr/bin/ash
 
-run_hook () {
-    local resumedev
+generate_resume_udev_rule() {
+    local resume=$1
 
     case $resume in
         '')
-            err "resume: no device specified for hibernation"
-            return 1;
+            # no resume parameter specified
+            return 0;
             ;;
-        swap:*|file:*)
-            # tux-on-ice syntax: swap:/dev/sda2 or file:/dev/sda2:0xdeadbeef
-            if [ -d /sys/power/tuxonice ]; then
-                echo "$resume" >/sys/power/tuxonice/resume
-                echo >/sys/power/tuxonice/do_resume
-                return 0
-            else
-                err "resume: tux-on-ice syntax detected, but no support found"
-                return 1
-            fi
+        PARTUUID=*)
+            attrmatch="ENV{ID_PART_ENTRY_UUID}==\"${resume#*=}\""
+            ;;
+        PARTLABEL=*)
+            attrmatch="ENV{ID_PART_ENTRY_NAME}==\"${resume#*=}\""
+            ;;
+        UUID=*|LABEL=*)
+            attrmatch="ENV{ID_FS_${resume%%=*}}==\"${resume#*=}\""
+            ;;
+        /dev/*)
+            attrmatch="KERNEL==\"${resume#/dev/}\""
             ;;
-
         *)
-            # standard hibernation
-            if resumedev=$(resolve_device "$resume" "$rootdelay"); then
-                if [ -e /sys/power/resume ]; then
-                    printf "%d:%d" $(stat -Lc "0x%t 0x%T" "$resumedev") >/sys/power/resume
-                    return 0
-                else
-                    err "resume: no hibernation support found"
-                    return 1
-                fi
-            fi
+            err "resume: hibernation device '$resume' not found"
+            return 1
             ;;
     esac
 
+    mkdir -p /run/udev/rules.d
+    printf >/run/udev/rules.d/99-initramfs-resume.rules \
+        "ACTION==\"add|change\", %s, RUN+=\"/bin/sh -c 'echo %%M:%%m >/sys/power/resume'\"\n" \
+        "$attrmatch"
+}
+
+set_resume_device() {
+    local resumedev resume=$1
+
+    if resumedev=$(resolve_device "$resume" "$rootdelay"); then
+        printf "%d:%d" $(stat -Lc "0x%t 0x%T" "$resumedev") >/sys/power/resume
+        return 0
+    fi
+
     err "resume: hibernation device '$resume' not found"
     return 1
 }
 
+run_earlyhook () {
+    [ -z "$resume" ] && return 0;
+
+    if [ ! -e /sys/power/resume ]; then
+        err "resume: no hibernation support found"
+        return 1
+    fi
+
+    if [ -n "$udev_running" ]; then
+        generate_resume_udev_rule "$resume"
+    else
+        set_resume_device "$resume"
+    fi
+}
+
 # vim: set ft=sh ts=4 sw=4 et:
-- 
1.8.4.2
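For readers who want to see what the generated rule looks like, here is a minimal sketch that prints the rule to stdout instead of writing it under /run/udev/rules.d. The function name resume_rule_sketch is hypothetical. The sketch assumes the usual udev property names, where ID_PART_ENTRY_UUID carries the partition UUID and ID_PART_ENTRY_NAME the partition label, and the UUID value in the example call is made up.

```shell
#!/bin/sh
# Sketch only: resume_rule_sketch is a hypothetical name. It builds the same
# kind of match expression as the hook, but prints the rule rather than
# installing it.
resume_rule_sketch() {
    resume=$1
    case $resume in
        PARTUUID=*)     attrmatch="ENV{ID_PART_ENTRY_UUID}==\"${resume#*=}\"" ;;
        PARTLABEL=*)    attrmatch="ENV{ID_PART_ENTRY_NAME}==\"${resume#*=}\"" ;;
        UUID=*|LABEL=*) attrmatch="ENV{ID_FS_${resume%%=*}}==\"${resume#*=}\"" ;;
        /dev/*)         attrmatch="KERNEL==\"${resume#/dev/}\"" ;;
        *)              return 1 ;;
    esac
    # %%M and %%m survive printf as %M and %m, which udev expands to the
    # major and minor number of the matching device when the rule fires.
    printf "ACTION==\"add|change\", %s, RUN+=\"/bin/sh -c 'echo %%M:%%m >/sys/power/resume'\"\n" \
        "$attrmatch"
}

resume_rule_sketch UUID=0a1b2c3d
# prints: ACTION=="add|change", ENV{ID_FS_UUID}=="0a1b2c3d", RUN+="/bin/sh -c 'echo %M:%m >/sys/power/resume'"
```

Because the rule matches on a device property rather than a fixed major:minor pair, it keeps working when the device numbers differ between the hibernating boot and the resuming boot.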
On 24.11.2013 03:01, Dave Reisner wrote:
This solves the problem of the major:minor of devices changing between hibernation. It also makes initialization lazy, as we no longer need to explicitly wait on the device to show up (this will be taken care of by udevadm settle).
Note that this patch drops support for tux-on-ice. As upstream development of TOI seems stalled and forever relegated to being maintained out of tree, I'm okay with this.
Signed-off-by: Dave Reisner <dreisner@archlinux.org> --- So, I tested this as far as ensuring that the udev rule is generated correctly and triggers, writing the correct values into /sys/power/resume, but I can't actually hibernate/resume my desktop to test this (mobo firmware bug).
Your firmware should not be involved at all in hibernation. As far as your firmware is concerned, you shut down and reboot.
If anyone who uses this hook wants to test this, please do and report back.
I feel this approach is extremely dangerous due to the fragile state of file systems during hibernation.

Note that the file system must not change at all while the system is hibernated. The kernel's internal state is saved in the hibernation image - if the file system changed while the system was hibernated, the on-disk and in-memory state will be inconsistent, almost guaranteeing corruption (I have experienced this first-hand).

Now, your new approach does not preserve the ordering: Trying to resume from a hibernation image MUST happen BEFORE any of the hibernated system's mounted file systems are touched.

Now imagine the following situation: You have two hard drives, one holding your root file system, the other holding your swap (and thus hibernation image). Now imagine the following order of events:

* Linux loads, starts /init
* Udev is started
* Hard drive A is detected.
* fsck is started, repairs the "dirty" root file system (and changes on-disk structures, clears the journal, ...)
* Hard drive B is detected -> udev starts the resume procedure.

Now, you have a casual file system corruption. This should easily be reproducible by putting your swap on a hard drive and the root file system on SSD. To reproduce this more reliably, put the swap on USB instead.

This patch gets as many -1's from me as I can find. Hibernation is dangerous for your data as it is, this patch plays russian data roulette.
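The ordering constraint described above can be sketched as a synchronous step that runs before anything touches a file system. This is only an illustration under stated assumptions: try_resume_first and its three arguments are hypothetical names, the timeout value is arbitrary, and fsck_root/mount_root in the trailing comment stand for whatever the init script does next.

```shell
#!/bin/sh
# Sketch only: try_resume_first is a hypothetical helper, not mkinitcpio code.
# Wait up to $3 seconds for the resume device node $1 to appear, then write
# its major:minor to the file $2 (normally /sys/power/resume).
# Returns non-zero if the device never shows up.
try_resume_first() {
    dev=$1
    target=$2
    timeout=${3:-10}
    while [ ! -e "$dev" ]; do
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    # stat prints major and minor in hex; printf converts them to decimal.
    # The command substitution is left unquoted on purpose so that it
    # splits into two printf arguments.
    printf '%d:%d' $(stat -Lc '0x%t 0x%T' "$dev") >"$target"
}

# Intended call order in an init script (not executed here):
#   try_resume_first /dev/sdb1 /sys/power/resume 10
#   fsck_root     # only reached if the kernel did not resume
#   mount_root
```

The point of the design is that the resume attempt either happens before fsck or not at all. A resume device that shows up after the timeout is ignored rather than triggering a late resume.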
On Tue, Nov 26, 2013 at 10:44:48AM +0100, Thomas Bächler wrote:
On 24.11.2013 03:01, Dave Reisner wrote:
This solves the problem of the major:minor of devices changing between hibernation. It also makes initialization lazy, as we no longer need to explicitly wait on the device to show up (this will be taken care of by udevadm settle).
Note that this patch drops support for tux-on-ice. As upstream development of TOI seems stalled and forever relegated to being maintained out of tree, I'm okay with this.
Signed-off-by: Dave Reisner <dreisner@archlinux.org> --- So, I tested this as far as ensuring that the udev rule is generated correctly and triggers, writing the correct values into /sys/power/resume, but I can't actually hibernate/resume my desktop to test this (mobo firmware bug).
Your firmware should not be involved at all in hibernation. As far as your firmware is concerned, you shut down and reboot.
Whatever it is, the shutdown never completes, and the reboot isn't a resume.
If anyone who uses this hook wants to test this, please do and report back.
I feel this approach is extremely dangerous due to the fragile state of file systems during hibernation.
Note that the file system must not change at all while the system is hibernated. The kernel's internal state is saved in the hibernation image - if the file system changed while the system was hibernated, the on-disk and in-memory state will be inconsistent, almost guaranteeing corruption (I have experienced this first-hand).
Now, your new approach does not preserve the ordering: Trying to resume from a hibernation image MUST happen BEFORE any of the hibernated system's mounted file systems are touched.
Now imagine the following situation: You have two hard drives, one holding your root file system, the other holding your swap (and thus hibernation image). Now imagine the following order of events:
* Linux loads, starts /init
* Udev is started
* Hard drive A is detected.
* fsck is started, repairs the "dirty" root file system (and changes on-disk structures, clears the journal, ...)
Who/what triggered fsck here? Why is fsck being run before the udev event queue is flushed?
* Hard drive B is detected -> udev starts the resume procedure.
...
Now, you have a casual file system corruption. This should easily be reproducible by putting your swap on a hard drive and the root file system on SSD. To reproduce this more reliably, put the swap on USB instead.
This patch gets as many -1's from me as I can find. Hibernation is dangerous for your data as it is, this patch plays russian data roulette.
I can understand the USB case and how it might be bad news if you plugged in a matching resume device at some arbitrary point during normal operation, but I don't really think your earlier posted order of events paints a realistic picture of what happens in early userspace. I'll ask Harald about this -- Dracut uses a very similar mechanism (but a bit more complex/convoluted).
On 26.11.2013 16:22, Dave Reisner wrote:
* Linux loads, starts /init
* Udev is started
* Hard drive A is detected.
* fsck is started, repairs the "dirty" root file system (and changes on-disk structures, clears the journal, ...)
Who/what triggered fsck here? Why is fsck being run before the udev event queue is flushed?
* Hard drive B is detected -> udev starts the resume procedure.
...
So, you're still using udevadm settle, which mitigates the situation most of the time - but it is no guarantee. Just because the udev queue is empty does not mean that all hardware is enumerated (look into people's inboxes for mails from Kay Sievers or Greg K-H for the full speech).

Your patch does not keep udev from initiating the resume even when mkinitcpio started fsck'ing/mounting already. This is incorrect and dangerous behaviour.
but I don't really think your earlier posted order of events paints a realistic picture of what happens in early userspace.
Orly? Sure, only the USB examples will be easily reproducible due to the udevadm settle. But release these changes and wait for a day until some guy turns up and says he has a resume image on USB and now his data is corrupt.
I'll ask Harald about this -- Dracut uses a very similar mechanism (but a bit more complex/convoluted).
Nobody said this was easy to solve.
On Tue, Nov 26, 2013 at 04:35:13PM +0100, Thomas Bächler wrote:
On 26.11.2013 16:22, Dave Reisner wrote:
* Linux loads, starts /init
* Udev is started
* Hard drive A is detected.
* fsck is started, repairs the "dirty" root file system (and changes on-disk structures, clears the journal, ...)
Who/what triggered fsck here? Why is fsck being run before the udev event queue is flushed?
* Hard drive B is detected -> udev starts the resume procedure.
...
So, you're still using udevadm settle, which mitigates the situation most of the time - but it is no guarantee. Just because the udev queue is empty does not mean that all hardware is enumerated (look into people's inboxes for mails from Kay Sievers or Greg K-H for the full speech).
I'm the last person this needs to be explained to ;)
Your patch does not keep udev from initiating the resume even when mkinitcpio started fsck'ing/mounting already. This is incorrect and dangerous behaviour.
but I don't really think your earlier posted order of events paints a realistic picture of what happens in early userspace.
Orly? Sure, only the USB examples will be easily reproducible due to the udevadm settle. But release these changes and wait for a day until some guy turns up and says he has a resume image on USB and now his data is corrupt.
In other words, never underestimate the stupidity of users.
I'll ask Harald about this -- Dracut uses a very similar mechanism (but a bit more complex/convoluted).
Nobody said this was easy to solve.
Consider the patch shelved for now. It's not something I care about -- just thought I'd take a stab at fixing it for others.
On Tue, Nov 26, 2013 at 5:09 PM, Dave Reisner <d@falconindy.com> wrote:
On Tue, Nov 26, 2013 at 04:35:13PM +0100, Thomas Bächler wrote:
On 26.11.2013 16:22, Dave Reisner wrote:
I'll ask Harald about this -- Dracut uses a very similar mechanism (but a bit more complex/convoluted).
Nobody said this was easy to solve.
Consider the patch shelved for now. It's not something I care about -- just thought I'd take a stab at fixing it for others.
Well now I got interested ;-)

Could this be solved (in a systemd world) by a systemd generator (running in the initramfs of course) generating a one-shot service which does the RUN from your udev rule, but which is ordered between the <hibernation>.device and local-fs-pre.target?

Cheers,

Tom

PS I never used hibernation, and the only things I know about it, I learnt by reading this thread...
On 26.11.2013 19:00, Tom Gundersen wrote:
On Tue, Nov 26, 2013 at 5:09 PM, Dave Reisner <d@falconindy.com> wrote:
On Tue, Nov 26, 2013 at 04:35:13PM +0100, Thomas Bächler wrote:
On 26.11.2013 16:22, Dave Reisner wrote:
I'll ask Harald about this -- Dracut uses a very similar mechanism (but a bit more complex/convoluted).
Nobody said this was easy to solve.
Consider the patch shelved for now. It's not something I care about -- just thought I'd take a stab at fixing it for others.
Well now I got interested ;-)
Could this be solved (in a systemd world) by a systemd generator (running in the initramfs of course) generating a one-shot service which does the RUN from your udev rule, but which is ordered between the <hibernation>.device and local-fs-pre.target?
This should work, but we have to be careful that everything that touches file systems is ordered After=local-fs-pre.target.

I would not put this into udev, but use a service file.

Simply have a service ordered Before=local-fs-pre.target, which binds to the hibernation device (generated from the resume= option). I have no idea what happens if you add another job to systemd in the middle of booting up - and doing everything with a service file seems cleaner to me.

FYI, systemd-fsck@.service currently has no ordering to local-fs-pre.target, this may lead to races between resuming and fsck.
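As a rough illustration of the service-file idea, here is a sketch of a shell function that a generator could use to write such a unit. Everything named here is an assumption made for illustration: write_resume_unit, the unit file name initrd-resume.service and the helper path /usr/lib/resume-helper are hypothetical, and a real generator would derive the device unit name from the resume= option. The sketch also leaves out how the unit gets pulled into the boot transaction.

```shell
#!/bin/sh
# Sketch only: write_resume_unit, initrd-resume.service and
# /usr/lib/resume-helper are hypothetical names.
# $1 = output directory, $2 = device unit name, $3 = device node path.
write_resume_unit() {
    outdir=$1
    devunit=$2
    devnode=$3
    mkdir -p "$outdir"
    # One-shot service bound to the hibernation device and ordered before
    # local-fs-pre.target, as suggested in the message above.
    cat >"$outdir/initrd-resume.service" <<EOF
[Unit]
Description=Resume from hibernation device $devnode (sketch)
DefaultDependencies=no
BindsTo=$devunit
After=$devunit
Wants=local-fs-pre.target
Before=local-fs-pre.target

[Service]
Type=oneshot
ExecStart=/usr/lib/resume-helper $devnode
EOF
}

# Example call (not executed here):
#   write_resume_unit /run/systemd/generator dev-sda2.device /dev/sda2
```

This ordering only protects against the race if every unit that checks or mounts a file system is itself ordered After=local-fs-pre.target, which is the caveat raised about systemd-fsck@.service.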
On Tue, Nov 26, 2013 at 11:25 PM, Thomas Bächler <thomas@archlinux.org> wrote:
On 26.11.2013 19:00, Tom Gundersen wrote:
On Tue, Nov 26, 2013 at 5:09 PM, Dave Reisner <d@falconindy.com> wrote:
On Tue, Nov 26, 2013 at 04:35:13PM +0100, Thomas Bächler wrote:
On 26.11.2013 16:22, Dave Reisner wrote:
I'll ask Harald about this -- Dracut uses a very similar mechanism (but a bit more complex/convoluted).
Nobody said this was easy to solve.
Consider the patch shelved for now. It's not something I care about -- just thought I'd take a stab at fixing it for others.
Well now I got interested ;-)
Could this be solved (in a systemd world) by a systemd generator (running in the initramfs of course) generating a one-shot service which does the RUN from your udev rule, but which is ordered between the <hibernation>.device and local-fs-pre.target?
This should work, but we have to be careful that everything that touches file systems is ordered After=local-fs-pre.target.
I would not put this into udev, but use a service file.
Sure, that's what I meant.
Simply have a service ordered Before=local-fs-pre.target, which binds to the hibernation device (generated from the resume= option). I have no idea what happens if you add another job to systemd in the middle of booting up - and doing everything with a service file seems cleaner to me.
FYI, systemd-fsck@.service currently has no ordering to local-fs-pre.target, this may lead to races between resuming and fsck.
Ok. Something to keep in mind.

-t