Re: [arch-releng] 2010.04.05 snapshots ready for testing
On Sat, 10 Apr 2010 15:13:18 -0500 Chris Brannon <cmbrannon79@gmail.com> wrote:
if you can reproduce it consistently, file a bug report. however i cannot make guarantees as to when it will be fixed, due to time constraints. I can assist you if you want to debug this issue yourself, though.
I added some debugging statements to the AIF code on the CD image, and what I found is even more baffling. When process_filesystem /dev/mapper/cryptoroot ext4 is called, the test [ ! -b $1 ] returns 0 at the top of the function.
okay. so you are sure now that process_filesystem is called with args '/dev/mapper/cryptoroot' and 'ext4'. can you maybe do an ls -alh, stat, and file on $1 as the first thing in process_filesystem?
However, the test [ -b /dev/mapper/$fs_label ] returns 0 right after cryptsetup format and cryptsetup open have been called.
that's really odd. so basically you are saying when the dm_crypt is created -b on it returns true, but a bit later -b on the same thing does not. please doublecheck that $fs_label is cryptoroot here. do a ls -alh, stat and file here also. so we can compare.
The bizarre part is that everything works when I use the -d option for debugging output. I'm beginning to wonder if qemu isn't the source of the trouble, but I don't have a spare machine for testing.
I think it probably has something to do with open fd's such as stdin/stderr etc. notice that we do some magic in process_filesystem like

    cryptsetup $fs_params $opts luksFormat -q $part >$LOG 2>&1 < /dev/tty ; ret=$? #hack to give cryptsetup the appropriate stdin. keep in mind we're in a loop (see process_filesystems where something else is on stdin)
    infofy "Please enter your passphrase to unlock the device"
    cryptsetup luksOpen $part $fs_label >$LOG 2>&1 < /dev/tty; ret=$? || ( show_warning 'cryptsetup' "Error luksOpening $part on /dev/mapper/$fs_label" )
    ;;

maybe the debugging somehow affects this.

Dieter
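The remark about being "in a loop where something else is on stdin" can be illustrated with a small sketch. It is not AIF code: the function name `count_iterations` and its two modes are made up for the demo, and `/dev/null` stands in for `/dev/tty` so that it also runs without a terminal. A command inside a `while read` loop inherits the loop's stdin and may consume it unless it is given its own.

```shell
#!/bin/bash
# count_iterations MODE: feed three lines to a while-read loop and
# print how many iterations actually ran.
#   shared      the inner command inherits the loop's stdin
#   redirected  the inner command gets its own stdin
count_iterations() {
    local mode=$1 n=0 line
    while read -r line; do
        n=$((n + 1))
        if [ "$mode" = "shared" ]; then
            cat >/dev/null              # drains the rest of the loop's input
        else
            cat >/dev/null </dev/null   # own stdin; loop input stays intact
        fi
    done <<'EOF'
one
two
three
EOF
    echo "$n"
}

count_iterations shared
count_iterations redirected
```

In the shared case the loop ends after the first line because `cat` has already read the remaining input, which is the situation the explicit `< /dev/tty` redirect avoids.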
Dieter Plaetinck wrote:
can you maybe do an ls -alh, stat, and file on $1 as the first thing in process_filesystem? *SNIP* that's really odd. so basically you are saying when the dm_crypt is created -b on it returns true, but a bit later -b on the same thing does not. please doublecheck that $fs_label is cryptoroot here. do a ls -alh, stat and file here also. so we can compare.
Ok, I added file, ls -al, and stat at both points. It worked like a charm, even when aif is invoked as aif -p interactive. That log is the first attachment. Next, I removed the file and ls -al calls, so we were left with only the two stats. I wanted to make it fail again, but it didn't oblige. That log is the second attachment, partlog2.txt. The stat following cryptsetup is different than the stat at the start of process_filesystem. In the first stat, you'll see that /dev/mapper/cryptoroot is a block device. It is at inode 7872. The second stat shows that /dev/mapper/cryptoroot is a symlink, and it is at inode 7949. -- Chris
On Sat, 10 Apr 2010 17:21:19 -0500 Chris Brannon <cmbrannon79@gmail.com> wrote:
Dieter Plaetinck wrote:
can you maybe do an ls -alh, stat, and file on $1 as the first thing in process_filesystem? *SNIP* that's really odd. so basically you are saying when the dm_crypt is created -b on it returns true, but a bit later -b on the same thing does not. please doublecheck that $fs_label is cryptoroot here. do a ls -alh, stat and file here also. so we can compare.
Ok, I added file, ls -al, and stat at both points. It worked like a charm, even when aif is invoked as aif -p interactive. That log is the first attachment.
Next, I removed the file and ls -al calls, so we were left with only the two stats. I wanted to make it fail again, but it didn't oblige.
odd. byebye "it works with -d, but always fails without it" theory. this makes me think again that simple things like writing to a file do something with file descriptors that affects the breakage behavior.
That log is the second attachment, partlog2.txt. The stat following cryptsetup is different than the stat at the start of process_filesystem. In the first stat, you'll see that /dev/mapper/cryptoroot is a block device. It is at inode 7872. The second stat shows that /dev/mapper/cryptoroot is a symlink, and it is at inode 7949.
-- Chris
so, those two process_filesystem invocations follow each other directly right? it's not that it did some other filesystems in between but you left those out? and your stat calls are just after cryptsetup luksOpen and just at the beginning of process_filesystem? because really nothing should happen between those two that would affect what /dev/mapper/cryptoroot is. There is some labelling, mounting, adding to fstab logic but it should all be skipped for /dev/mapper/cryptoroot.

You could put stat calls all over process_filesystem and process_filesystems, to find out where exactly it changes. but then again, as you said adding those calls makes it not break...

maybe you should start with a clean aif again, make it fail and then manually do file, ls etc on /dev/mapper/cryptoroot. because the initial problem was that aif breaks on a simple `[ -z "$1" -o ! -b "$1" ]` check.

Dieter
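The suggestion to put stat calls at several points could be sketched as a small logging helper. This is only an illustration: `trace_node`, its tag argument and the default log path are hypothetical names, not part of AIF, and the `-c` format option assumes GNU stat. As noted in the thread, adding such calls may itself change the timing enough to hide the failure.

```shell
#!/bin/bash
# trace_node TAG PATH [LOGFILE]
# Append one line describing PATH without following symlinks:
# tag, file type, inode and quoted name (including a link target).
trace_node() {
    local tag=$1 path=$2 log=${3:-/tmp/node-trace.log}
    if [ -e "$path" ] || [ -L "$path" ]; then
        # stat without -L reports on the link itself, not on its target
        printf '%s: %s\n' "$tag" "$(stat -c '%F inode=%i %N' "$path")" >>"$log"
    else
        printf '%s: %s missing\n' "$tag" "$path" >>"$log"
    fi
}
```

A call such as `trace_node after-luksOpen "/dev/mapper/$fs_label"` right after cryptsetup, and another at the top of process_filesystem, would make the two states directly comparable in one log file.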
Dieter Plaetinck wrote:
so, those two process_filesystem invocations follow each other directly right? it's not that it did some other filesystems in between but you left those out? and your stat calls are just after cryptsetup luksOpen and just at the beginning of process_filesystem? because really nothing should happen between those two that would affect what /dev/mapper/cryptoroot is.
That's where I called stat, and there were no other filesystems processed in between those two. Could udev be doing something in between those two calls? I'm grasping at straws again.
maybe you should start with a clean aif again, make it fail and then manually do file, ls etc on /dev/mapper/cryptoroot. because the initial problem was that aif breaks on a simple `[ -z "$1" -o ! -b "$1" ]` check.
Ok, here's what I get from file, ls -al, and stat on /dev/mapper/cryptoroot. I called them manually right after the failure.

/dev/mapper/cryptoroot: symbolic link to `../dm-0'
lrwxrwxrwx 1 root root 7 Apr 11 22:09 /dev/mapper/cryptoroot -> ../dm-0
  File: `/dev/mapper/cryptoroot' -> `../dm-0'
  Size: 7  Blocks: 0  IO Block: 4096  symbolic link
Device: eh/14d  Inode: 5239  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2010-04-11 22:09:25.043276999 +0000
Modify: 2010-04-11 22:09:13.903282148 +0000
Change: 2010-04-11 22:09:13.903282148 +0000

-- Chris
On Sun, 11 Apr 2010 17:17:25 -0500 Chris Brannon <cmbrannon79@gmail.com> wrote:
Dieter Plaetinck wrote:
so, those two process_filesystem invocations follow each other directly right? it's not that it did some other filesystems in between but you left those out? and your stat calls are just after cryptsetup luksOpen and just at the beginning of process_filesystem? because really nothing should happen between those two that would affect what /dev/mapper/cryptoroot is.
That's where I called stat, and there were no other filesystems processed in between those two. Could udev be doing something in between those two calls? I'm grasping at straws again.
maybe you should start with a clean aif again, make it fail and then manually do file, ls etc on /dev/mapper/cryptoroot. because the initial problem was that aif breaks on a simple `[ -z "$1" -o ! -b "$1" ]` check.
Ok, here's what I get from file, ls -al, and stat on /dev/mapper/cryptoroot. I called them manually right after the failure.
/dev/mapper/cryptoroot: symbolic link to `../dm-0'
lrwxrwxrwx 1 root root 7 Apr 11 22:09 /dev/mapper/cryptoroot -> ../dm-0
  File: `/dev/mapper/cryptoroot' -> `../dm-0'
  Size: 7  Blocks: 0  IO Block: 4096  symbolic link
Device: eh/14d  Inode: 5239  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2010-04-11 22:09:25.043276999 +0000
Modify: 2010-04-11 22:09:13.903282148 +0000
Change: 2010-04-11 22:09:13.903282148 +0000
-- Chris
I'm stumped. this looks fine. maybe there's a timing thing? maybe the device files change just after aif dies. how about this:

    aif -p interactive ; ls -alh /dev/mapper; file /dev/mapper/*; stat /dev/mapper/*; \
    sleep 10; ls -alh /dev/mapper; file /dev/mapper/*; stat /dev/mapper/*;

Dieter
On 11.04.2010 00:21, Chris Brannon wrote:
The stat following cryptsetup is different than the stat at the start of process_filesystem. In the first stat, you'll see that /dev/mapper/cryptoroot is a block device. It is at inode 7872. The second stat shows that /dev/mapper/cryptoroot is a symlink, and it is at inode 7949.
/dev/mapper/ devices should NEVER be block devices. They should ONLY be created by udev and should ALWAYS be symbolic links to /dev/dm-* block devices. This is true for all type of device-mapper devices, be it cryptsetup, lvm or dmraid:

$ ls -lhF /dev/mapper/ /dev/dm-*
brw-rw---- 1 root disk 254, 0 12. Apr 10:40 /dev/dm-0
brw-rw---- 1 root disk 254, 1 12. Apr 10:40 /dev/dm-1
brw-rw---- 1 root disk 254, 2 12. Apr 10:40 /dev/dm-2
brw-rw---- 1 root disk 254, 3 12. Apr 10:40 /dev/dm-3
brw-rw---- 1 root disk 254, 4 12. Apr 10:40 /dev/dm-4

/dev/mapper/:
insgesamt 0
crw-rw---- 1 root root 10, 59 12. Apr 10:39 control
lrwxrwxrwx 1 root root 7 12. Apr 10:40 evey-chroot32 -> ../dm-4
lrwxrwxrwx 1 root root 7 12. Apr 10:40 evey-home -> ../dm-3
lrwxrwxrwx 1 root root 7 12. Apr 10:40 evey-root -> ../dm-1
lrwxrwxrwx 1 root root 7 12. Apr 10:40 evey-swap -> ../dm-2
lrwxrwxrwx 1 root root 7 12. Apr 10:40 pv -> ../dm-0
On Mon, 12 Apr 2010 13:50:21 +0200 Thomas Bächler <thomas@archlinux.org> wrote:
On 11.04.2010 00:21, Chris Brannon wrote:
The stat following cryptsetup is different than the stat at the start of process_filesystem. In the first stat, you'll see that /dev/mapper/cryptoroot is a block device. It is at inode 7872. The second stat shows that /dev/mapper/cryptoroot is a symlink, and it is at inode 7949.
/dev/mapper/ devices should NEVER be block devices. They should ONLY be created by udev and should ALWAYS be symbolic links to /dev/dm-* block devices. This is true for all type of device-mapper devices, be it cryptsetup, lvm or dmraid:
that's true, but test -b should exit(0) on them. aif has a [ -b /dev/mapper/.. ] check which fails, causing aif to abort. and we don't know why. we only see that the file looks fine after aif has crashed (okay. a bit later, see my previous mail), and when we put debugging into aif the error does not happen. Would be great if you could help us out because we're both clueless :) Dieter
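As a side note on why `test -b` normally succeeds on a /dev/mapper symlink: the file-type operators of `test`, apart from `-L`/`-h`, follow symbolic links, so the result depends on the link target existing at that moment. The sketch below shows this with `-c` and `/dev/null` as stand-ins, since creating block devices needs root. The `classify` function and the link names are invented for the demo, and it does not claim to reproduce the exact state AIF saw.

```shell
#!/bin/sh
# classify PATH: report how test(1) sees a node. The -c check follows
# symlinks, the -L check looks at the link itself.
classify() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo "dangling symlink"
    elif [ -c "$1" ]; then
        echo "character device"
    else
        echo "other"
    fi
}

demo_dir=$(mktemp -d)
ln -s /dev/null "$demo_dir/good"            # link to an existing device node
ln -s ../not-there-yet "$demo_dir/early"    # link whose target does not exist

classify "$demo_dir/good"
classify "$demo_dir/early"
```

A check that runs while a node is being replaced, or before a link target exists, can therefore fail even though the same path looks fine a moment later.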
On 12.04.2010 13:57, Dieter Plaetinck wrote:
On Mon, 12 Apr 2010 13:50:21 +0200 Thomas Bächler <thomas@archlinux.org> wrote:
On 11.04.2010 00:21, Chris Brannon wrote:
The stat following cryptsetup is different than the stat at the start of process_filesystem. In the first stat, you'll see that /dev/mapper/cryptoroot is a block device. It is at inode 7872. The second stat shows that /dev/mapper/cryptoroot is a symlink, and it is at inode 7949.
/dev/mapper/ devices should NEVER be block devices. They should ONLY be created by udev and should ALWAYS be symbolic links to /dev/dm-* block devices. This is true for all type of device-mapper devices, be it cryptsetup, lvm or dmraid:
that's true, but test -b should exit(0) on them. aif has a [ -b /dev/mapper/.. ] check which fails, causing aif to abort. and we don't know why. we only see that the file looks fine after aif has crashed (okay. a bit later, see my previous mail), and when we put debugging into aif the error does not happen.
Would be great if you could help us out because we're both clueless :)
There were a number of races w.r.t. device creating and linking in the past. You could try udevadm settle (maybe with a --subsystem= option) after cryptsetup as a workaround and see if the bug disappears. Other than that, I wouldn't know how to solve it. Also, please make sure that all files (especially udev rules) from device-mapper are installed into /lib/udev/rules.d/ (they should be, as archiso uses pacman, but it never hurts to be absolutely sure).
On Mon, 12 Apr 2010 14:23:48 +0200 Thomas Bächler <thomas@archlinux.org> wrote:
On 12.04.2010 13:57, Dieter Plaetinck wrote:
On Mon, 12 Apr 2010 13:50:21 +0200 Thomas Bächler <thomas@archlinux.org> wrote:
On 11.04.2010 00:21, Chris Brannon wrote:
The stat following cryptsetup is different than the stat at the start of process_filesystem. In the first stat, you'll see that /dev/mapper/cryptoroot is a block device. It is at inode 7872. The second stat shows that /dev/mapper/cryptoroot is a symlink, and it is at inode 7949.
/dev/mapper/ devices should NEVER be block devices. They should ONLY be created by udev and should ALWAYS be symbolic links to /dev/dm-* block devices. This is true for all type of device-mapper devices, be it cryptsetup, lvm or dmraid:
that's true, but test -b should exit(0) on them. aif has a [ -b /dev/mapper/.. ] check which fails, causing aif to abort. and we don't know why. we only see that the file looks fine after aif has crashed (okay. a bit later, see my previous mail), and when we put debugging into aif the error does not happen.
Would be great if you could help us out because we're both clueless :)
There were a number of races w.r.t. device creating and linking in the past. You could try udevadm settle (maybe with a --subsystem= option) after cryptsetup as a workaround and see if the bug disappears. Other than that, I wouldn't know how to solve it.
or hell. sleep 10. Chris, feeling lucky? :)
Also, please make sure that all files (especially udev rules) from device-mapper are installed into /lib/udev/rules.d/ (they should be, as archiso uses pacman, but it never hurts to be absolutely sure).
okay. btw on my laptop (which uses a lvm-on-top-of-dm_crypt setup):

dieter@dieter-dellD620-arch ~ ls /lib/udev/rules.d/*crypt* /lib/udev/rules.d/*mapp*
ls: cannot access /lib/udev/rules.d/*crypt*: No such file or directory
ls: cannot access /lib/udev/rules.d/*mapp*: No such file or directory

?

Dieter
On 12.04.2010 14:51, Dieter Plaetinck wrote:
There were a number of races w.r.t. device creating and linking in the past. You could try udevadm settle (maybe with a --subsystem= option) after cryptsetup as a workaround and see if the bug disappears. Other than that, I wouldn't know how to solve it.
or hell. sleep 10. Chris, feeling lucky? :)
Or that, at least for a short test.
Also, please make sure that all files (especially udev rules) from device-mapper are installed into /lib/udev/rules.d/ (they should be, as archiso uses pacman, but it never hurts to be absolutely sure).
okay. btw on my laptop (which uses a lvm-on-top-of-dm_crypt setup):

dieter@dieter-dellD620-arch ~ ls /lib/udev/rules.d/*crypt* /lib/udev/rules.d/*mapp*
ls: cannot access /lib/udev/rules.d/*crypt*: No such file or directory
ls: cannot access /lib/udev/rules.d/*mapp*: No such file or directory

?

Dieter
$ ls -lhF /lib/udev/rules.d/*dm* /sbin/dmsetup
-rw-r--r-- 1 root root 4,4K 6. Apr 05:55 /lib/udev/rules.d/10-dm.rules
-rw-r--r-- 1 root root 1,3K 6. Apr 05:55 /lib/udev/rules.d/11-dm-lvm.rules
-rw-r--r-- 1 root root 1011 6. Apr 05:55 /lib/udev/rules.d/13-dm-disk.rules
-rw-r--r-- 1 root root 492 6. Apr 05:55 /lib/udev/rules.d/95-dm-notify.rules
-r-xr-xr-x 1 root root 53K 6. Apr 05:55 /sbin/dmsetup*
There were a number of races w.r.t. device creating and linking in the past. You could try udevadm settle (maybe with a --subsystem= option) after cryptsetup as a workaround and see if the bug disappears. Other than that, I wouldn't know how to solve it.
Also, please make sure that all files (especially udev rules) from device-mapper are installed into /lib/udev/rules.d/ (they should be, as archiso uses pacman, but it never hurts to be absolutely sure).
I added udevadm settle right after cryptsetup luksOpen, and that solved my problem! Thank you. I also verified that dmsetup and the udev rules from device-mapper are available on the CD. Now I can report a successful install using dm-crypt under qemu. The installed system boots and runs as it should. -- Chris
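The workaround reported here could be wrapped in a helper along the following lines. This is a sketch under assumptions, not the change that went into AIF: `wait_for_node` and its arguments are hypothetical, the 5 second settle timeout is arbitrary, and the polling loop is an extra safety net that the thread does not mention.

```shell
#!/bin/bash
# wait_for_node OP PATH [TRIES]
# OP is a test(1) file operator such as -b or -c. First lets udev finish
# its pending events (if udevadm is installed), then polls until
# `test OP PATH` succeeds or TRIES attempts have been used up.
wait_for_node() {
    local op=$1 path=$2 tries=${3:-5}
    if command -v udevadm >/dev/null 2>&1; then
        # settle returns once the udev event queue is empty or on timeout
        udevadm settle --timeout=5 >/dev/null 2>&1 || true
    fi
    while [ "$tries" -gt 0 ]; do
        test "$op" "$path" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}
```

After `cryptsetup luksOpen $part $fs_label`, a call such as `wait_for_node -b "/dev/mapper/$fs_label" || show_warning ...` would then replace the bare `-b` check.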
On 12.04.2010 18:15, Chris Brannon wrote:
I added udevadm settle right after cryptsetup luksOpen, and that solved my problem! Thank you.
This shouldn't be necessary. However, I don't know if cryptsetup uses the udev-synchronization feature of device-mapper.
I also verified that dmsetup and the udev rules from device-mapper are available on the CD.
Good.
On Mon, 12 Apr 2010 18:50:50 +0200 Thomas Bächler <thomas@archlinux.org> wrote:
On 12.04.2010 18:15, Chris Brannon wrote:
I added udevadm settle right after cryptsetup luksOpen, and that solved my problem! Thank you.
This shouldn't be necessary. However, I don't know if cryptsetup uses the udev-synchronization feature of device-mapper.
I also verified that dmsetup and the udev rules from device-mapper are available on the CD.
Good.
1) Chris, thanks a lot for your assistance in helping out.
2) Thomas, do you think it would be useful to add a 'sleep 1' "just in case" ? or if Chris reports the udevadm settle worked, that should be enough?
3) Thomas, do you think this fix is also needed after lvcreate/vgcreate/<other devicemapper thingie> ? or: it wouldn't cause harm right, so maybe i should put it after any time a [devicemapper] filesystem is created.

Dieter
On 12.04.2010 18:58, Dieter Plaetinck wrote:
2) Thomas, do you think it would be useful to add a 'sleep 1' "just in case" ? or if Chris reports the udevadm settle worked, that should be enough?
3) Thomas, do you think this fix is also needed after lvcreate/vgcreate/<other devicemapper thingie> ? or: it wouldn't cause harm right, so maybe i should put it after any time a [devicemapper] filesystem is created.
As I said before, none of this should be needed at all. However, settling uevents after creation of devices is always safe, and usually only results in a sleep of less than a second.
participants (3): Chris Brannon, Dieter Plaetinck, Thomas Bächler