[arch-releng] regression with race condition on pxe boot with nbd
Hello everybody,

with my latest builds I see a regression on pxe boot with nbd. About 50% of boots fail. The nbd module is loaded, nbd-client attaches the device, but mount fails:

mount: you must specify the filesystem type
ERROR; Failed to mount '/dev/nbd0'
Falling back to interactive prompt
You can try to fix the problem manually, log out when you are finished

A simple mount allows the boot to continue:

mount /dev/nbd0 /run/archiso/bootmnt/
<Ctrl>-d

My guess is that linux 4.6 introduced a race condition. Any idea how to fix or handle this?

--
main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH"
"CX:;",b;for(a/* Best regards my address: */=0;b=c[a++];)
putchar(b-1/(/* Chris cc -ox -xc - && ./x */b/42*2-3)*42);}
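Since a manual mount a moment later succeeds, one generic way to handle a race like this is to retry a probe before mounting. The following is only a sketch under that assumption, not something proposed in this thread: `retry_until` is a hypothetical helper, and the commented usage assumes `blkid` is available in the initramfs and exits non-zero while it cannot identify a filesystem on the device.

```shell
# Hypothetical helper, not part of archiso: run a probe command until it
# succeeds or the given number of attempts is used up.
retry_until() {
    local attempts="$1" delay="$2" n=0
    shift 2
    while [ "${n}" -lt "${attempts}" ]; do
        "$@" && return 0
        n=$((n + 1))
        sleep "${delay}"
    done
    return 1
}

# Assumed usage in the initramfs shell: wait until blkid can identify a
# filesystem on the device, then mount it.
# retry_until 10 1 blkid /dev/nbd0 && mount /dev/nbd0 /run/archiso/bootmnt/
```

A retry loop only hides the timing problem, so it is a fallback rather than a fix for whatever changed in the kernel.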
Christian Hesse <list@eworm.de> on Tue, 2016/05/24 14:35:
Hello everybody,
with my latest builds I see a regression on pxe boot with nbd. About 50% of boots fail. The nbd module is loaded, nbd-client attaches the device, but mount fails:
mount: you must specify the filesystem type ERROR; Failed to mount '/dev/nbd0' Falling back to interactive prompt You can try to fix the problem manually, log out when you are finished
A simple mount allows to continue boot:
mount /dev/nbd0 /run/archiso/bootmnt/ <Ctrl>-d
My guess is that linux 4.6 introduced a race condition. Any idea how to fix or handle this?
Looks like adding a boot parameter

nbd.nbds_max=2

fixes this (or makes it a lot less likely to happen). However this is still more of a workaround...
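To check whether the boot parameter actually reached the module, the value can be read back from sysfs. This is a small sketch that assumes the nbd driver exposes `nbds_max` under `/sys/module/nbd/parameters/`; the function name and the optional sysfs-root argument are made up for illustration.

```shell
# Print the nbds_max value the nbd module is using, or a note if the
# parameter file is not there (module not loaded, or not exposed).
# The sysfs root is an argument only so the function can be pointed at a
# fake tree; by default it reads the real /sys.
show_nbds_max() {
    local sysfs="${1:-/sys}"
    local param="${sysfs}/module/nbd/parameters/nbds_max"
    if [ -r "${param}" ]; then
        echo "nbds_max=$(cat "${param}")"
    else
        echo "nbds_max not available"
    fi
}

show_nbds_max
```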
On 05/24/16 09:54, Christian Hesse wrote:
Christian Hesse <list@eworm.de> on Tue, 2016/05/24 14:35:
Hello everybody,
with my latest builds I see a regression on pxe boot with nbd. About 50% of boots fail. The nbd module is loaded, nbd-client attaches the device, but mount fails:
mount: you must specify the filesystem type ERROR; Failed to mount '/dev/nbd0' Falling back to interactive prompt You can try to fix the problem manually, log out when you are finished
A simple mount allows to continue boot:
mount /dev/nbd0 /run/archiso/bootmnt/ <Ctrl>-d
My guess is that linux 4.6 introduced a race condition. Any idea how to fix or handle this?
Looks like adding a boot parameter
nbd.nbds_max=2
fixes this (or makes it a lot less likely to happen). However this is still more of a workaround...
Hi Christian

Did you try booting with earlymodules=nbd to see if it goes better?

Thanks for doing a good job here.
Gerardo Exequiel Pozzi <vmlinuz386@gmail.com> on Tue, 2016/05/24 21:27:
On 05/24/16 09:54, Christian Hesse wrote:
Christian Hesse <list@eworm.de> on Tue, 2016/05/24 14:35:
Hello everybody,
with my latest builds I see a regression on pxe boot with nbd. About 50% of boots fail. The nbd module is loaded, nbd-client attaches the device, but mount fails:
mount: you must specify the filesystem type ERROR; Failed to mount '/dev/nbd0' Falling back to interactive prompt You can try to fix the problem manually, log out when you are finished
A simple mount allows to continue boot:
mount /dev/nbd0 /run/archiso/bootmnt/ <Ctrl>-d
My guess is that linux 4.6 introduced a race condition. Any idea how to fix or handle this?
Looks like adding a boot parameter
nbd.nbds_max=2
fixes this (or makes it a lot less likely to happen). However this is still more of a workaround...
Hi Christian
Did you try booting with earlymodules=nbd to see if it goes better?
Thanks for doing a good job here.
Yes, looks like earlymodules=nbd works as well. What's the best way to get this into the scripts? Or should we just move modprobe to run_earlyhook()?
From: Christian Hesse <mail@eworm.de>

Signed-off-by: Christian Hesse <mail@eworm.de>
---
 archiso/initcpio/hooks/archiso_pxe_nbd | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/archiso/initcpio/hooks/archiso_pxe_nbd b/archiso/initcpio/hooks/archiso_pxe_nbd
index fdb2c2b..b61cb1b 100644
--- a/archiso/initcpio/hooks/archiso_pxe_nbd
+++ b/archiso/initcpio/hooks/archiso_pxe_nbd
@@ -1,5 +1,10 @@
 # vim: set ft=sh:

+run_earlyhook() {
+    # Module autoloading like with loop devices does not work, doing manually...
+    modprobe nbd 2> /dev/null
+}
+
 run_hook() {
     if [[ -n "${ip}" && -n "${archiso_nbd_srv}" ]]; then

@@ -13,9 +18,6 @@ run_hook() {
 archiso_pxe_nbd_mount_handler () {
     newroot="${1}"

-    # Module autoloading like with loop devices does not work, doing manually...
-    modprobe nbd 2> /dev/null
-
     msg ":: Waiting for boot device..."
     while ! poll_device /dev/nbd0 30; do
         echo "ERROR: boot device didn't show up after 30 seconds..."
--
2.8.3
From: Christian Hesse <mail@eworm.de>

Signed-off-by: Christian Hesse <mail@eworm.de>
---
 archiso/initcpio/hooks/archiso_pxe_nbd | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/archiso/initcpio/hooks/archiso_pxe_nbd b/archiso/initcpio/hooks/archiso_pxe_nbd
index fdb2c2b..532a7e1 100644
--- a/archiso/initcpio/hooks/archiso_pxe_nbd
+++ b/archiso/initcpio/hooks/archiso_pxe_nbd
@@ -1,5 +1,12 @@
 # vim: set ft=sh:

+run_earlyhook() {
+    if [[ -n "${ip}" && -n "${archiso_nbd_srv}" ]]; then
+        # Module autoloading like with loop devices does not work, doing manually...
+        modprobe nbd 2> /dev/null
+    fi
+}
+
 run_hook() {
     if [[ -n "${ip}" && -n "${archiso_nbd_srv}" ]]; then

@@ -13,9 +20,6 @@ run_hook() {
 archiso_pxe_nbd_mount_handler () {
     newroot="${1}"

-    # Module autoloading like with loop devices does not work, doing manually...
-    modprobe nbd 2> /dev/null
-
     msg ":: Waiting for boot device..."
     while ! poll_device /dev/nbd0 30; do
         echo "ERROR: boot device didn't show up after 30 seconds..."
--
2.8.3
On 05/25/16 16:35, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_nbd | 10 +++++++---
Hola!

I am thinking of releasing another archiso version with these changes before the next ISO. Do you have more patches pending?

Thanks.
From: Christian Hesse <mail@eworm.de>

According to ip-address(8) flushing an interface requires the keyword 'dev'. Also add proper quoting.

Signed-off-by: Christian Hesse <mail@eworm.de>
---
 archiso/initcpio/hooks/archiso_pxe_common | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common
index e97324f..66eecfa 100644
--- a/archiso/initcpio/hooks/archiso_pxe_common
+++ b/archiso/initcpio/hooks/archiso_pxe_common
@@ -51,8 +51,8 @@ run_latehook () {
     [[ -z "${copy_resolvconf}" ]] && copy_resolvconf="y"

     if [[ "${copytoram}" == "y" ]]; then
-        ip addr flush ${bootif_dev}
-        ip link set ${bootif_dev} down
+        ip addr flush dev "${bootif_dev}"
+        ip link set "${bootif_dev}" down
     elif [[ "${copy_resolvconf}" != "n" && -f /etc/resolv.conf ]]; then
         cp /etc/resolv.conf /new_root/etc/resolv.conf
     fi
--
2.8.3
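Why the quoting matters can be shown without touching any interface. In this sketch `count_args` is a made-up helper that stands in for `ip` and only reports how many arguments it receives; the empty `bootif_dev` is an assumed scenario.

```shell
# Count the arguments a command would actually receive.
count_args() { echo "$#"; }

bootif_dev=""    # assumed scenario: the variable was never set

# Unquoted: the empty expansion disappears, leaving only "flush" and "dev".
count_args flush dev ${bootif_dev}      # prints 2

# Quoted: the empty string is still passed as a third argument.
count_args flush dev "${bootif_dev}"    # prints 3
```

With quoting, the command keeps its intended shape even when the variable is empty, so the called program sees an explicit (empty) device name rather than a shortened command line.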
From: Christian Hesse <mail@eworm.de>

Booting from iPXE we can set bootif_mac without having BOOTIF around.

Signed-off-by: Christian Hesse <mail@eworm.de>
---
 archiso/initcpio/hooks/archiso_pxe_common | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common
index 66eecfa..cedf585 100644
--- a/archiso/initcpio/hooks/archiso_pxe_common
+++ b/archiso/initcpio/hooks/archiso_pxe_common
@@ -10,9 +10,12 @@ run_hook () {
     # /tmp/net-*.conf

     if [[ -n "${ip}" ]]; then
-        if [[ -n "${BOOTIF}" ]]; then
+        if [[ -z "${bootif_mac}" && -n "${BOOTIF}" ]]; then
             bootif_mac=${BOOTIF#01-}
             bootif_mac=${bootif_mac//-/:}
+        fi
+
+        if [[ -n "${bootif_mac}" ]]; then
             for i in /sys/class/net/*/address; do
                 read net_mac < ${i}
                 if [[ "${bootif_mac}" == "${net_mac}" ]]; then
--
2.8.3
On 05/26/16 18:53, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
Booting from iPXE we can set bootif_mac without having BOOTIF around.
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_common | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common index 66eecfa..cedf585 100644 --- a/archiso/initcpio/hooks/archiso_pxe_common +++ b/archiso/initcpio/hooks/archiso_pxe_common @@ -10,9 +10,12 @@ run_hook () { # /tmp/net-*.conf
if [[ -n "${ip}" ]]; then - if [[ -n "${BOOTIF}" ]]; then + if [[ -z "${bootif_mac}" && -n "${BOOTIF}" ]]; then bootif_mac=${BOOTIF#01-} bootif_mac=${bootif_mac//-/:} + fi + + if [[ -n "${bootif_mac}" ]]; then for i in /sys/class/net/*/address; do read net_mac < ${i} if [[ "${bootif_mac}" == "${net_mac}" ]]; then
If bootif_mac becomes a new cmdline parameter, please add it to the docs ;)

Isn't it a bit redundant? The user can set BOOTIF= at the syslinux prompt. What is the advantage here?
Gerardo Exequiel Pozzi <vmlinuz386@gmail.com> on Thu, 2016/05/26 21:09:
On 05/26/16 18:53, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
Booting from iPXE we can set bootif_mac without having BOOTIF around.
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_common | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common index 66eecfa..cedf585 100644 --- a/archiso/initcpio/hooks/archiso_pxe_common +++ b/archiso/initcpio/hooks/archiso_pxe_common @@ -10,9 +10,12 @@ run_hook () { # /tmp/net-*.conf
if [[ -n "${ip}" ]]; then - if [[ -n "${BOOTIF}" ]]; then + if [[ -z "${bootif_mac}" && -n "${BOOTIF}" ]]; then bootif_mac=${BOOTIF#01-} bootif_mac=${bootif_mac//-/:} + fi + + if [[ -n "${bootif_mac}" ]]; then for i in /sys/class/net/*/address; do read net_mac < ${i} if [[ "${bootif_mac}" == "${net_mac}" ]]; then
If bootif_mac becomes a new cmdline parameter, please add it to the docs ;)
Isn't it a bit redundant? The user can set BOOTIF= at the syslinux prompt. What is the advantage here?
Thinking about this... Just drop the patch.

It does not matter what format I give to BOOTIF. So I can use the pxelinux version with hardware type prefix and mac address including dashes:

BOOTIF=01-88-99-aa-bb-cc-dd

Or give the mac address directly:

BOOTIF=88:99:aa:bb:cc:dd

Right? So I will adjust my boot parameters to always use BOOTIF=.
On 05/27/16 03:44, Christian Hesse wrote:
Gerardo Exequiel Pozzi <vmlinuz386@gmail.com> on Thu, 2016/05/26 21:09:
On 05/26/16 18:53, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
Booting from iPXE we can set bootif_mac without having BOOTIF around.
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_common | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common index 66eecfa..cedf585 100644 --- a/archiso/initcpio/hooks/archiso_pxe_common +++ b/archiso/initcpio/hooks/archiso_pxe_common @@ -10,9 +10,12 @@ run_hook () { # /tmp/net-*.conf
if [[ -n "${ip}" ]]; then - if [[ -n "${BOOTIF}" ]]; then + if [[ -z "${bootif_mac}" && -n "${BOOTIF}" ]]; then bootif_mac=${BOOTIF#01-} bootif_mac=${bootif_mac//-/:} + fi + + if [[ -n "${bootif_mac}" ]]; then for i in /sys/class/net/*/address; do read net_mac < ${i} if [[ "${bootif_mac}" == "${net_mac}" ]]; then
If bootif_mac becomes a new cmdline parameter, please add it to the docs ;)
Isn't it a bit redundant? The user can set BOOTIF= at the syslinux prompt. What is the advantage here?
Thinking about this... Just drop the patch. It does not matter what format I give to BOOTIF. So I can use the pxelinux version with hardware type prefix and mac address including dashes:
BOOTIF=01-88-99-aa-bb-cc-dd
Or give the mac address directly:
BOOTIF=88:99:aa:bb:cc:dd
Right? So I will adjust my boot parameters to always use BOOTIF=.
Yes, both forms are valid :)

The only warning here is that when using the dash form, always prepend 01- if your MAC starts with 01, otherwise that first octet will be treated as the hardware type [HTYPE] (01 for Ethernet).

This is also valid, but not recommended:

BOOTIF=88-99-aa-bb-cc-dd
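The behaviour described here follows from the two parameter expansions the hook applies to BOOTIF. As a small illustration, the function below wraps those same two expansions; the name `bootif_to_mac` is made up for this example, and the MAC addresses are arbitrary.

```shell
# Same two expansions as in the hook: drop a leading "01-", then turn
# dashes into colons.
bootif_to_mac() {
    local mac="${1#01-}"
    mac="${mac//-/:}"
    echo "${mac}"
}

bootif_to_mac "01-88-99-aa-bb-cc-dd"   # prints 88:99:aa:bb:cc:dd
bootif_to_mac "88:99:aa:bb:cc:dd"      # prints 88:99:aa:bb:cc:dd
bootif_to_mac "88-99-aa-bb-cc-dd"      # prints 88:99:aa:bb:cc:dd

# Dash form without the prefix, MAC starting with 01: the first octet is
# taken for the hardware type and lost.
bootif_to_mac "01-23-45-67-89-ab"      # prints 23:45:67:89:ab

# With the prefix, the same MAC survives intact.
bootif_to_mac "01-01-23-45-67-89-ab"   # prints 01:23:45:67:89:ab
```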
From: Christian Hesse <mail@eworm.de>

Signed-off-by: Christian Hesse <mail@eworm.de>
---
 archiso/initcpio/hooks/archiso_pxe_common | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common
index cedf585..adadefc 100644
--- a/archiso/initcpio/hooks/archiso_pxe_common
+++ b/archiso/initcpio/hooks/archiso_pxe_common
@@ -39,6 +39,12 @@ run_hook () {

         pxeserver=${ROOTSERVER}

+        # If neither BOOTIF nor bootif_mac have been set from bootloader we do
+        # not know the boot interface, yet. Get it from ipconfig output now.
+        if [[ -z "${bootif_dev}" ]]; then
+            bootif_dev="${DEVICE}"
+        fi
+
         # setup DNS resolver
         if [[ "${IPV4DNS0}" != "0.0.0.0" ]]; then
             echo "nameserver ${IPV4DNS0}" > /etc/resolv.conf
--
2.8.3
On 05/26/16 18:53, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_common | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common index cedf585..adadefc 100644 --- a/archiso/initcpio/hooks/archiso_pxe_common +++ b/archiso/initcpio/hooks/archiso_pxe_common @@ -39,6 +39,12 @@ run_hook () {
pxeserver=${ROOTSERVER}
+ # If neither BOOTIF nor bootif_mac have been set from bootloader we do + # not know the boot interface, yet. Get it from ipconfig output now. + if [[ -z "${bootif_dev}" ]]; then + bootif_dev="${DEVICE}" + fi + # setup DNS resolver if [[ "${IPV4DNS0}" != "0.0.0.0" ]]; then echo "nameserver ${IPV4DNS0}" > /etc/resolv.conf
I guess this is not needed (now that you know about BOOTIF=), right?
Gerardo Exequiel Pozzi <vmlinuz386@gmail.com> on Fri, 2016/05/27 13:53:
On 05/26/16 18:53, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_common | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common index cedf585..adadefc 100644 --- a/archiso/initcpio/hooks/archiso_pxe_common +++ b/archiso/initcpio/hooks/archiso_pxe_common @@ -39,6 +39,12 @@ run_hook () {
pxeserver=${ROOTSERVER}
+ # If neither BOOTIF nor bootif_mac have been set from bootloader we do + # not know the boot interface, yet. Get it from ipconfig output now. + if [[ -z "${bootif_dev}" ]]; then + bootif_dev="${DEVICE}" + fi + # setup DNS resolver if [[ "${IPV4DNS0}" != "0.0.0.0" ]]; then echo "nameserver ${IPV4DNS0}" > /etc/resolv.conf
I guess this is not needed (now that you know about BOOTIF=), right?
My setup works without it now, and users of pxelinux and iPXE are fine. Are there any other pxe boot loaders that do not support giving the mac address via boot parameter?
From: Christian Hesse <mail@eworm.de>

Signed-off-by: Christian Hesse <mail@eworm.de>
---
 archiso/initcpio/hooks/archiso_pxe_common | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common
index adadefc..1a9fe9d 100644
--- a/archiso/initcpio/hooks/archiso_pxe_common
+++ b/archiso/initcpio/hooks/archiso_pxe_common
@@ -1,7 +1,8 @@
 # vim: set ft=sh:

 run_hook () {
-    local i net_mac bootif_mac bootif_dev
+    # Do *not* declare 'bootif_dev' local! We need it in run_latehook().
+    local i net_mac bootif_mac
     # These variables will be parsed from /tmp/net-*.conf generated by ipconfig
     local DEVICE
     local IPV4ADDR IPV4BROADCAST IPV4NETMASK IPV4GATEWAY IPV4DNS0 IPV4DNS1
--
2.8.3
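The scoping effect this patch removes can be reproduced in a few lines. The sketch assumes, as the patch comment implies, that both hook functions run in the same shell process; the function names are stand-ins, not the real hooks, and "eth0" is an arbitrary value.

```shell
# Stand-in for run_hook() before the patch: the assignment stays local.
hook_with_local() {
    local bootif_dev
    bootif_dev="eth0"
}

# Stand-in for run_hook() after the patch: the assignment is global.
hook_without_local() {
    bootif_dev="eth0"
}

# Stand-in for run_latehook(), which needs the value later.
late_hook() {
    echo "bootif_dev='${bootif_dev}'"
}

unset bootif_dev
hook_with_local
late_hook       # prints bootif_dev=''

hook_without_local
late_hook       # prints bootif_dev='eth0'
```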
Christian Hesse <list@eworm.de> on Thu, 2016/05/26 23:53:
From: Christian Hesse <mail@eworm.de>
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_common | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common index adadefc..1a9fe9d 100644 --- a/archiso/initcpio/hooks/archiso_pxe_common +++ b/archiso/initcpio/hooks/archiso_pxe_common @@ -1,7 +1,8 @@ # vim: set ft=sh:
run_hook () { - local i net_mac bootif_mac bootif_dev + # Do *not* declare 'bootif_dev' local! We need it in run_latehook(). + local i net_mac bootif_mac # These variables will be parsed from /tmp/net-*.conf generated by ipconfig local DEVICE local IPV4ADDR IPV4BROADCAST IPV4NETMASK IPV4GATEWAY IPV4DNS0 IPV4DNS1
This one is most important. :D Did you miss it?
On 05/27/16 15:01, Christian Hesse wrote:
Christian Hesse <list@eworm.de> on Thu, 2016/05/26 23:53:
From: Christian Hesse <mail@eworm.de>
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_common | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/archiso/initcpio/hooks/archiso_pxe_common b/archiso/initcpio/hooks/archiso_pxe_common index adadefc..1a9fe9d 100644 --- a/archiso/initcpio/hooks/archiso_pxe_common +++ b/archiso/initcpio/hooks/archiso_pxe_common @@ -1,7 +1,8 @@ # vim: set ft=sh:
run_hook () { - local i net_mac bootif_mac bootif_dev + # Do *not* declare 'bootif_dev' local! We need it in run_latehook(). + local i net_mac bootif_mac # These variables will be parsed from /tmp/net-*.conf generated by ipconfig local DEVICE local IPV4ADDR IPV4BROADCAST IPV4NETMASK IPV4GATEWAY IPV4DNS0 IPV4DNS1
This one is most important. :D Did you miss it?
woops, confused with bootif_mac! pushing... ¡Gracias!
Gerardo Exequiel Pozzi <vmlinuz386@gmail.com> on Thu, 2016/05/26 08:30:
On 05/25/16 16:35, Christian Hesse wrote:
From: Christian Hesse <mail@eworm.de>
Signed-off-by: Christian Hesse <mail@eworm.de> --- archiso/initcpio/hooks/archiso_pxe_nbd | 10 +++++++---
Hola!
I am thinking of releasing another archiso version with these changes before the next ISO. Do you have more patches pending?
Bringing down a network interface in copy-to-ram mode has been broken since... ever. (And flushing broke with e018653a.) I investigated and prepared four more patches. That's it for now I think.
participants (2)
- Christian Hesse
- Gerardo Exequiel Pozzi