[arch-general] Btrfs RAID1 corrupted after crash
2014-04-13 17:23 GMT-03:00 Maximilian Bräutigam <m@xbra.de>:
Dear all,
unfortunately, I am very, very desperate and I highly appreciate any help. One week ago, I moved my entire system to btrfs to set up a RAID1. I created the RAID across the devices /dev/sdb and /dev/sdc, with no partition table, on ordinary HDDs. Everything was working smoothly until my computer crashed; on reboot I was no longer able to mount the device (my home dir) and got the following messages:
[ 125.834802] BTRFS info (device sdc): disk space caching is enabled
[ 130.600101] BTRFS error (device sdc): block group 1268688879616 has wrong amount of free space
[ 130.600113] BTRFS error (device sdc): failed to load free space cache for block group 1268688879616
[ 130.751274] BTRFS critical (device sdc): corrupt leaf, slot offset bad: block=1268477591552,root=1, slot=137
[ 130.751659] BTRFS critical (device sdc): corrupt leaf, slot offset bad: block=1268477591552,root=1, slot=137
So I tried clearing the cache with the clear_cache mount option, but the problem persisted and I was still unable to mount it:
[ 368.159594] BTRFS: error (device sdc) in __btrfs_free_extent:5755: errno=-5 IO failure
[ 368.159602] BTRFS: error (device sdc) in btrfs_run_delayed_refs:2713: errno=-5 IO failure
[ 368.165584] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[ 368.165589] BTRFS: error (device sdc) in cleanup_transaction:1545: errno=-5 IO failure
[ 368.165787] BTRFS: error (device sdc) in open_ctree:2839: errno=-5 IO failure (Failed to recover log tree)
[ 368.227161] BTRFS: open_ctree failed
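For reference, the clear_cache attempt described above would have been invoked roughly as follows (a sketch, not the exact command used; the mount point is taken from later in the thread):

```shell
# Discard and rebuild the free space cache on the next mount
# (clear_cache is a one-shot option; the cache is regenerated afterwards)
mount -t btrfs -o clear_cache /dev/sdb /mnt/sonst
```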
Now, if I tried to mount it manually with degraded option enabled:
# mount -t btrfs -o degraded /dev/sdb /mnt/sonst/
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so.
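A gentler alternative to btrfsck --repair that was available on kernel 3.14 is mounting read-only with the recovery option, which falls back to an older tree root. A hedged sketch (the option was later renamed usebackuproot in newer kernels):

```shell
# Read-only mount using a backup tree root; makes no changes to the disks
mount -t btrfs -o ro,recovery /dev/sdb /mnt/sonst
```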
I then ran btrfsck with the repair option enabled, but I still cannot mount it. Here you can find the dmesg and btrfsck outputs:
dmesg: http://pastebin.com/zsaKQ0h1
btrfsck: http://pastebin.com/xva6uJwT
Please, help me! ;( Are there other options to investigate my RAID or to even temporarily mount it to get some data? What went wrong here? What can I do? Why is a simple crash making my RAID unusable? Can I use other tools for a recovery?
Again, every help is highly appreciated. Best wishes, Max PS: Archlinux, linux-3.14-5, btrfs-progs-3.14-1
I don't really know a lot about RAID volumes, but shouldn't there be a way to mount only one half of the array (only one of the devices in the RAID)? Also, if you recently updated anything that may be the cause of the problem (I'd guess the kernel), you could try creating a bootable USB with the previous versions of those packages to access or restore it. Best regards, -- Leonardo Dagnino
On 13-04-2014 21:23, Maximilian Bräutigam wrote:
Please, help me! ;( Are there other options to investigate my RAID or to even temporarily mount it to get some data? What went wrong here? What can I do? Why is a simple crash making my RAID unusable? Can I use other tools for a recovery?
Again, every help is highly appreciated. Best wishes, Max PS: Archlinux, linux-3.14-5, btrfs-progs-3.14-1
First, leave both disks alone; don't do anything else with or to them: no fsck, no trying to mount them with non-default options. Second, go to the BTRFS IRC channel [1] and ask for help there; I'd say that's where you are most likely to find qualified help. Explain your problem from the start, say which kernel and tool versions you are using, and provide any other information you think may be useful, such as what you have already tried. Keep a cool head, be patient and polite. [1] https://btrfs.wiki.kernel.org/index.php/Main_Page#Project_information.2FCont... -- Mauro Santos
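"Leave the disks alone" can be taken one step further: before any repair attempt, image each disk and experiment on the copies. A sketch, assuming a third, healthy disk with enough free space mounted at a hypothetical /mnt/backup:

```shell
# Take raw images of both RAID members before experimenting further.
# Each image needs as much space as the source disk; the .map files let
# ddrescue resume and retry unreadable sectors.
ddrescue /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map
ddrescue /dev/sdc /mnt/backup/sdc.img /mnt/backup/sdc.map
```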
On Sun, Apr 13, 2014 at 10:23 PM, Maximilian Bräutigam <m@xbra.de> wrote:
Please, help me! ;( Are there other options to investigate my RAID or to even temporarily mount it to get some data? What went wrong here? What can I do? Why is a simple crash making my RAID unusable? Can I use other tools for a recovery?
I've had (a lot of) luck in the past with "btrfs restore" [1]. It will not fix your volumes, but it will copy whatever it can read from the disk to a safe place. If it is a RAID1 both disks should hold the same data, so you can try btrfs-restoring first one and then the other... HTH [1]: https://btrfs.wiki.kernel.org/index.php/Restore
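A minimal invocation of the suggested btrfs restore could look like this (the target directories are hypothetical and must live on a different, healthy filesystem; -v is verbose, -i ignores errors and keeps going):

```shell
# Copy whatever is still readable off one mirror into a safe location
btrfs restore -v -i /dev/sdb /mnt/rescue/
# If that recovers little, try the other mirror
btrfs restore -v -i /dev/sdc /mnt/rescue2/
```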
On 14.04.2014 00:15, Rodrigo Rivas wrote:
On Sun, Apr 13, 2014 at 10:23 PM, Maximilian Bräutigam <m@xbra.de> wrote:
Please, help me! ;( Are there other options to investigate my RAID or to even temporarily mount it to get some data? What went wrong here? What can I do? Why is a simple crash making my RAID unusable? Can I use other tools for a recovery?
I've had (a lot of) luck in the past with "btrfs restore" [1]. It will not fix your volumes, but it will copy whatever it can read from the disk to a safe place. If it is a RAID1 both disks should hold the same data, so you can try btrfs-restoring first one and then the other...
HTH
Hi all,
thanks for your replies. I tried several things according to [1].
1) btrfs restore: this did not really work; it recovered only a few GB of my data.
2) Then I noticed some "transid verify failed" messages, so I ran btrfs-zero-log DEVICE.
3) After that I was able to mount my volume again, so I could save my latest photos.
When I mount my volume with autodefrag,compress=lzo,subvolid=0, I end up with a "rw" mounted device. Then I copy some data with e.g. rsync, and at some point it flips to "ro". I noticed this when I wanted to scrub the devices, which naturally only works on writable mounts. And it is still, I don't know why, not possible to boot from the device again.
Things to do next: try again with the recovery option. If that does not work: roll back to ext4. But I really like the idea behind COW, subvolumes, no partitioning, RAID and everything in one fs. Snapshots against user mistakes, RAID against disk failure: perfectly safe, if it were not for the filesystem itself. So far, so good. The problem is that even if I can get back to a fully working device or RAID, the workload (that I have to put in just because my computer crashed) is much too high for something as fundamental as a home dir. Unfortunately, the only thing I have learned so far is to give btrfs a few more decades to mature.
More ideas are of course welcome. Best wishes and thanks again, Max
[1] https://unix.stackexchange.com/questions/32440/how-do-i-fix-btrfs
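The sequence Max describes could be sketched as follows (device and mount point names are taken from the thread; note that btrfs-zero-log discards the log tree, losing roughly the last 30 seconds of writes before the crash):

```shell
# 1) Discard the (apparently corrupt) log tree so the volume mounts again
btrfs-zero-log /dev/sdb

# 2) Mount and copy data out while the volume holds up
mount -t btrfs -o autodefrag,compress=lzo,subvolid=0 /dev/sdb /mnt/sonst
rsync -a /mnt/sonst/ /mnt/backup/

# 3) Scrub requires a read-write mount, which is why it stopped working
#    once an error flipped the filesystem to read-only
btrfs scrub start /mnt/sonst
```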
participants (4)
- Leonardo Dagnino
- Mauro Santos
- Maximilian Bräutigam
- Rodrigo Rivas