[arch-projects] [initscripts][PATCH 0/3] Minor cleanup to rc and disabling udevadm settle
This is mostly for the first patch, which allows setting UDEV_TIMEOUT=0 to prevent 'udevadm settle' from being called. Looking back historically, this seems to cause a lot of grief with a lot of hardware. Googling turns up things such as... https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/387086 https://bugzilla.redhat.com/show_bug.cgi?id=585527 http://lkml.org/lkml/2011/4/12/538 https://bbs.archlinux.org/viewtopic.php?id=117481 So, we're looking at both recent and less recent. Moreover, most people won't miss this if its disabled, as it's really just acting as a barrier for coldplug events to complete processing. As long as it's optional and documented, I think this is a reasonable thing to do. d diffstat: rc | 15 +++++++++------ rc.conf | 3 ++- rc.sysinit | 4 ++-- 3 files changed, 13 insertions(+), 9 deletions(-)
This seems to cause a lot of grief for people lately. Moreover, most people don't need it and won't miss it if its disabled. Allow a setting of 0 to disable this barrier and document it. Side effect: remove the -o %PPID flag to pidof which won't have any effect. This is used in /etc/rc.d scripts because they often share the same name as the service they're controlling. In this case, we're querying /sbin/udevd from /etc/rc.sysinit, so there's no name clash. --- rc.conf | 3 ++- rc.sysinit | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/rc.conf b/rc.conf index f09b413..d41c3e4 100644 --- a/rc.conf +++ b/rc.conf @@ -39,7 +39,8 @@ VERBOSE="3" MOD_AUTOLOAD="yes" MODULES=() -# Udev settle timeout (default to 30) +# Udev settle timeout (defaults to 30). A setting of 0 disables waiting for udev +# to finish processing before continuing with the bootstrap. UDEV_TIMEOUT=30 # Scan for FakeRAID (dmraid) Volumes at startup diff --git a/rc.sysinit b/rc.sysinit index 070d7cf..76ed3d1 100755 --- a/rc.sysinit +++ b/rc.sysinit @@ -93,8 +93,8 @@ if [[ $load_modules != off && -f /proc/modules && $mods ]]; then fi unset mods -# Wait for udev uevents -if /bin/pidof -o %PPID /sbin/udevd >/dev/null; then +# Wait for udev uevents. Skip this if the user defines UDEV_TIMEOUT=0 +if { [[ -z $UDEV_TIMEOUT ]] || (( UDEV_TIMEOUT )); } && /bin/pidof /sbin/udevd >/dev/null; then status "Waiting for UDev uevents to be processed" \ /sbin/udevadm settle --quiet --timeout=${UDEV_TIMEOUT:-30} fi -- 1.7.4.4
--- rc | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/rc b/rc index 4bb9730..9abeb52 100755 --- a/rc +++ b/rc @@ -25,15 +25,15 @@ case $1 in have_daemon "$d" || continue # print running / stopped satus if ! ck_daemon "$d"; then - printf "${C_OTHER}[${C_DONE}STARTED${C_OTHER}]" + printf "${C_OTHER}[${C_DONE}STARTED${C_OTHER}]" else - printf "${C_OTHER}[${C_FAIL}STOPPED${C_OTHER}]" + printf "${C_OTHER}[${C_FAIL}STOPPED${C_OTHER}]" fi # print auto / manual status if ! ck_autostart "$d"; then - printf "${C_OTHER}[${C_DONE}AUTO${C_OTHER}]" + printf "${C_OTHER}[${C_DONE}AUTO${C_OTHER}]" else - printf "${C_OTHER}[${C_FAIL} ${C_OTHER}]" + printf "${C_OTHER}[${C_FAIL} ${C_OTHER}]" fi printf " ${C_MAIN}$d${C_CLEAR}\n" done -- 1.7.4.4
--- rc | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/rc b/rc index 9abeb52..1e64119 100755 --- a/rc +++ b/rc @@ -11,10 +11,12 @@ e.g: rc list rc help rc start sshd gpm EOF + exit 1 } -(( $# < 1 )) && usage && exit 1 +(( $# < 1 )) && usage +declare -i ret=0 case $1 in help) usage @@ -43,9 +45,10 @@ case $1 in shift for i; do [[ -x "/etc/rc.d/$i" ]] && "/etc/rc.d/$i" $action + (( ret += $? )) done esac -true +exit $ret # vim: set ts=2 sw=2 noet: -- 1.7.4.4
On Sun, Apr 24, 2011 at 3:01 AM, Dave Reisner <d@falconindy.com> wrote:
--- rc | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/rc b/rc index 9abeb52..1e64119 100755 --- a/rc +++ b/rc @@ -11,10 +11,12 @@ e.g: rc list rc help rc start sshd gpm EOF + exit 1 }
-(( $# < 1 )) && usage && exit 1 +(( $# < 1 )) && usage
+declare -i ret=0 case $1 in help) usage @@ -43,9 +45,10 @@ case $1 in shift for i; do [[ -x "/etc/rc.d/$i" ]] && "/etc/rc.d/$i" $action + (( ret += $? )) done esac
-true +exit $ret
Why return a value which is the sum of error value rather than 1 if something fail? The sum is meaningless. -- Sébastien Luttringer www.seblu.net
On Sun, Apr 24, 2011 at 09:28:00PM +0200, Seblu wrote:
On Sun, Apr 24, 2011 at 3:01 AM, Dave Reisner <d@falconindy.com> wrote:
--- rc | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/rc b/rc index 9abeb52..1e64119 100755 --- a/rc +++ b/rc @@ -11,10 +11,12 @@ e.g: rc list rc help rc start sshd gpm EOF + exit 1 }
-(( $# < 1 )) && usage && exit 1 +(( $# < 1 )) && usage
+declare -i ret=0 case $1 in help) usage @@ -43,9 +45,10 @@ case $1 in shift for i; do [[ -x "/etc/rc.d/$i" ]] && "/etc/rc.d/$i" $action + (( ret += $? )) done esac
-true +exit $ret
Why return a value which is the sum of error value rather than 1 if something fail? The sum is meaningless.
-- Sébastien Luttringer www.seblu.net
Well, it's more meaningful than always exiting 0. Perhaps I should have just done (( ++ret )) here. d
On Sun, Apr 24, 2011 at 9:31 PM, Dave Reisner <d@falconindy.com> wrote:
On Sun, Apr 24, 2011 at 09:28:00PM +0200, Seblu wrote:
On Sun, Apr 24, 2011 at 3:01 AM, Dave Reisner <d@falconindy.com> wrote:
--- rc | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/rc b/rc index 9abeb52..1e64119 100755 --- a/rc +++ b/rc @@ -11,10 +11,12 @@ e.g: rc list rc help rc start sshd gpm EOF + exit 1 }
-(( $# < 1 )) && usage && exit 1 +(( $# < 1 )) && usage
+declare -i ret=0 case $1 in help) usage @@ -43,9 +45,10 @@ case $1 in shift for i; do [[ -x "/etc/rc.d/$i" ]] && "/etc/rc.d/$i" $action + (( ret += $? )) done esac
-true +exit $ret
Why return a value which is the sum of error value rather than 1 if something fail? The sum is meaningless.
-- Sébastien Luttringer www.seblu.net
Well, it's more meaningful than always exiting 0. Perhaps I should have just done (( ++ret )) here.
you'r right and your patch is meaningfull! My point was just about the sum. ++ret looks more sexy. You patch? -- Sébastien Luttringer www.seblu.net
On Sun, Apr 24, 2011 at 09:44:48PM +0200, Seblu wrote:
On Sun, Apr 24, 2011 at 9:31 PM, Dave Reisner <d@falconindy.com> wrote:
On Sun, Apr 24, 2011 at 09:28:00PM +0200, Seblu wrote:
On Sun, Apr 24, 2011 at 3:01 AM, Dave Reisner <d@falconindy.com> wrote:
--- rc | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/rc b/rc index 9abeb52..1e64119 100755 --- a/rc +++ b/rc @@ -11,10 +11,12 @@ e.g: rc list rc help rc start sshd gpm EOF + exit 1 }
-(( $# < 1 )) && usage && exit 1 +(( $# < 1 )) && usage
+declare -i ret=0 case $1 in help) usage @@ -43,9 +45,10 @@ case $1 in shift for i; do [[ -x "/etc/rc.d/$i" ]] && "/etc/rc.d/$i" $action + (( ret += $? )) done esac
-true +exit $ret
Why return a value which is the sum of error value rather than 1 if something fail? The sum is meaningless.
-- Sébastien Luttringer www.seblu.net
Well, it's more meaningful than always exiting 0. Perhaps I should have just done (( ++ret )) here.
you'r right and your patch is meaningfull! My point was just about the sum. ++ret looks more sexy. You patch?
Uggh, ridiculous. /etc/rc.d scripts often have exit 0 at the end, so the "fix" here still doesn't guarantee anything useful. I'll send it anyways in hopes that someday we have some massive /etc/rc.d cleanup party. d
On Sun, Apr 24, 2011 at 3:01 AM, Dave Reisner <d@falconindy.com> wrote:
This is mostly for the first patch, which allows setting UDEV_TIMEOUT=0 to prevent 'udevadm settle' from being called. Looking back historically, this seems to cause a lot of grief with a lot of hardware. Googling turns up things such as...
https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/387086 https://bugzilla.redhat.com/show_bug.cgi?id=585527 http://lkml.org/lkml/2011/4/12/538 https://bbs.archlinux.org/viewtopic.php?id=117481
Why not just pass --timeout=0 to udev? Does that not work (from the discussions I got the impression it should)? It should check the queue and return immediately, so basically the same as not calling udevadm at all. It seems like a reasonable workaround for those who need it.
So, we're looking at both recent and less recent. Moreover, most people won't miss this if its disabled, as it's really just acting as a barrier for coldplug events to complete processing. As long as it's optional and documented, I think this is a reasonable thing to do.
I think it is wrong to say that most people will not miss this. We are not able to deal with devices that appear after udevsettle, so if the timeout is too short some drives might not get mounted. I think we could support this as a hack for people who are hit by the bugs you describe (if just passing --timeout=0 does not work), but we should not give the impression that this is a proper fix (if ever the timeout is reached it means something is broken...). Cheers, Tom
On Sun, Apr 24, 2011 at 02:24:08PM +0200, Tom Gundersen wrote:
On Sun, Apr 24, 2011 at 3:01 AM, Dave Reisner <d@falconindy.com> wrote:
This is mostly for the first patch, which allows setting UDEV_TIMEOUT=0 to prevent 'udevadm settle' from being called. Looking back historically, this seems to cause a lot of grief with a lot of hardware. Googling turns up things such as...
https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/387086 https://bugzilla.redhat.com/show_bug.cgi?id=585527 http://lkml.org/lkml/2011/4/12/538 https://bbs.archlinux.org/viewtopic.php?id=117481
Why not just pass --timeout=0 to udev? Does that not work (from the discussions I got the impression it should)? It should check the queue and return immediately, so basically the same as not calling udevadm at all. It seems like a reasonable workaround for those who need it.
The end result is the same, but explicitly passing 0 will dump the contents of the event queue if it's not empty.
So, we're looking at both recent and less recent. Moreover, most people won't miss this if its disabled, as it's really just acting as a barrier for coldplug events to complete processing. As long as it's optional and documented, I think this is a reasonable thing to do.
I think it is wrong to say that most people will not miss this. We are not able to deal with devices that appear after udevsettle, so if the timeout is too short some drives might not get mounted.
I think we could support this as a hack for people who are hit by the bugs you describe (if just passing --timeout=0 does not work), but we should not give the impression that this is a proper fix (if ever the timeout is reached it means something is broken...).
This doesn't prevent uevents from being handled, it just removes an explicit barrier where we wait for X seconds for the queue to be emptied. Also, there's really 2 groups of people who benefit: 1) Those with problematic hardware who get stuck processing this queue to timeout. 2) Others with non-complicated setups who don't need to wait for udev to fully complete processing. They'll lose a second off their boot time, and no harm will come of this. My intention was never to advertise this as a fix for anything, which is why I didn't document it as such in rc.conf. d
On Sun, Apr 24, 2011 at 3:56 PM, Dave Reisner <d@falconindy.com> wrote:
Why not just pass --timeout=0 to udev?
The end result is the same, but explicitly passing 0 will dump the contents of the event queue if it's not empty.
That sounds fair enough to me. Then we will at least know that something is not ready yet.
This doesn't prevent uevents from being handled, it just removes an explicit barrier where we wait for X seconds for the queue to be emptied.
My concern is that uevents for block devices are handled after we try to mount (or decrypt/assemble) them. In which case it will be as if the uevents were not handled at all, and for the user it will not be clear why.
1) Those with problematic hardware who get stuck processing this queue to timeout.
In case udev gets stuck waiting for something that we know it should not wait for, they should lower the timeout. However, they should never set it to 0, as we do need udev to wait for the things mentioned above. Even if things boot fine today with a 0 timeout, if other things speeds up or slow down in the future, your boot might break in non-obvious ways.
2) Others with non-complicated setups who don't need to wait for udev to fully complete processing. They'll lose a second off their boot time, and no harm will come of this.
As outlined above, this would lead to a fragile system. Waiting a second or two during boot is the cost of having a static initsystem... Unless there are specific bugs I'm not aware of, I think that setting the timeout to less than 5 is never the right thing to do (and less than the default is only very rarely the right thing to do), so I think we would mislead people if we explicitly optimize for timeout=0. Or am I missing something? -t PS To the people with broken hardware who wants to set a very low timeout, I suggest one of the following: 1) set a reasonable timeout (5-10 seconds) and just accept a slow boot, 2) fix the underlying problem (change hardware?), or 3) change to a more advanced initsystem (systemd/upstart) that can deal with hardware appearing and disappearing at any time (but they are not perfect in this regard yet either, as far as I know).
On Sun, Apr 24, 2011 at 3:01 AM, Dave Reisner <d@falconindy.com> wrote:
This is mostly for the first patch, which allows setting UDEV_TIMEOUT=0 to prevent 'udevadm settle' from being called. Looking back historically, this seems to cause a lot of grief with a lot of hardware. Googling turns up things such as...
https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/387086 https://bugzilla.redhat.com/show_bug.cgi?id=585527 http://lkml.org/lkml/2011/4/12/538 https://bbs.archlinux.org/viewtopic.php?id=117481
So, we're looking at both recent and less recent. Moreover, most people won't miss this if its disabled, as it's really just acting as a barrier for coldplug events to complete processing. As long as it's optional and documented, I think this is a reasonable thing to do.
d
diffstat: rc | 15 +++++++++------ rc.conf | 3 ++- rc.sysinit | 4 ++-- 3 files changed, 13 insertions(+), 9 deletions(-)
I pushed 2 and 3, and the cleanup part of 1. I'm not sold on the need for the rest of 1, but if I'm wrong I'll push it out later. Thanks for your work! -t
On Sun, Apr 24, 2011 at 06:53:19PM +0200, Tom Gundersen wrote:
On Sun, Apr 24, 2011 at 3:01 AM, Dave Reisner <d@falconindy.com> wrote:
This is mostly for the first patch, which allows setting UDEV_TIMEOUT=0 to prevent 'udevadm settle' from being called. Looking back historically, this seems to cause a lot of grief with a lot of hardware. Googling turns up things such as...
https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/387086 https://bugzilla.redhat.com/show_bug.cgi?id=585527 http://lkml.org/lkml/2011/4/12/538 https://bbs.archlinux.org/viewtopic.php?id=117481
So, we're looking at both recent and less recent. Moreover, most people won't miss this if its disabled, as it's really just acting as a barrier for coldplug events to complete processing. As long as it's optional and documented, I think this is a reasonable thing to do.
d
diffstat: rc | 15 +++++++++------ rc.conf | 3 ++- rc.sysinit | 4 ++-- 3 files changed, 13 insertions(+), 9 deletions(-)
I pushed 2 and 3, and the cleanup part of 1. I'm not sold on the need for the rest of 1, but if I'm wrong I'll push it out later.
Thanks for your work!
-t
Nah I'm not sold either. I think the behavior of --timeout=0 is sufficient, since it's informative (albeit with a level of vaguery) when it might be a bad idea to be skipping this barrier. Thanks for merging the others. d
participants (3)
-
Dave Reisner
-
Seblu
-
Tom Gundersen