[arch-general] btrfs kernel incompatibility?
I have a btrfs raid1 on raspbian (kernel 4.19.75) which overheated. To fix the btrfs filesystem I attached the raid1 to my workstation with Arch Linux (kernel 5.5.1). I ran scrub to identify broken files and fixed them. Furthermore I ran a balance with --full-balance and defrag -r. All fine so far.

Now I can't mount the btrfs on raspbian any more (bad superblock). I checked with a blank raspbian and different boards, same result. On my Arch Linux workstation there are no problems.

The btrfs changelog [1] only mentions more checksums (xxhash, sha256...) as possible incompatible features between 4.19 and 5.5. However the raid1 is still crc32c (see below).

I'm afraid the balance/scrub/defrag operations on my Arch Linux made the btrfs raid1 incompatible with older kernels. Is that plausible? How could I convert it back?

4.19.75 dmesg:

[ 17.707873] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 17.707889] GPT:7814037167 != 253879390758629
[ 17.707895] GPT:Alternate GPT header not at the end of the disk.
[ 17.707902] GPT:7814037167 != 253879390758629
[ 17.707907] GPT: Use GNU Parted to correct GPT errors.
[ 17.707977] sdb: sdb1
[ 17.709682] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 17.709697] GPT:7814037167 != 253879390758629
[ 17.709703] GPT:Alternate GPT header not at the end of the disk.
[ 17.709710] GPT:7814037167 != 253879390758629
[ 17.709715] GPT: Use GNU Parted to correct GPT errors.
[ 17.709787] sda: sda1
[ 17.710776] sd 0:0:0:1: [sdb] Attached SCSI disk
[ 17.721324] sd 0:0:0:0: [sda] Attached SCSI disk
[ 18.301910] raid6: int32x1 gen() 203 MB/s
[ 18.471765] raid6: int32x1 xor() 178 MB/s
[ 18.641705] raid6: int32x2 gen() 278 MB/s
[ 18.811703] raid6: int32x2 xor() 207 MB/s
[ 18.981816] raid6: int32x4 gen() 307 MB/s
[ 19.151686] raid6: int32x4 xor() 228 MB/s
[ 19.321816] raid6: int32x8 gen() 315 MB/s
[ 19.491762] raid6: int32x8 xor() 219 MB/s
[ 19.661699] raid6: neonx1 gen() 711 MB/s
[ 19.831664] raid6: neonx1 xor() 811 MB/s
[ 20.001711] raid6: neonx2 gen() 1175 MB/s
[ 20.171661] raid6: neonx2 xor() 1187 MB/s
[ 20.341685] raid6: neonx4 gen() 1550 MB/s
[ 20.511663] raid6: neonx4 xor() 1344 MB/s
[ 20.681678] raid6: neonx8 gen() 1371 MB/s
[ 20.851667] raid6: neonx8 xor() 1125 MB/s
[ 20.851673] raid6: using algorithm neonx4 gen() 1550 MB/s
[ 20.851676] raid6: .... xor() 1344 MB/s, rmw enabled
[ 20.851680] raid6: using neon recovery algorithm
[ 20.881255] xor: measuring software checksum speed
[ 20.971664] arm4regs : 2020.800 MB/sec
[ 21.071659] 8regs : 1357.600 MB/sec
[ 21.171657] 32regs : 1262.800 MB/sec
[ 21.271658] neon : 2156.800 MB/sec
[ 21.271664] xor: using function: neon (2156.800 MB/sec)
[ 21.417002] Btrfs loaded, crc32c=crc32c-generic
[ 21.418650] BTRFS: device label URAID devid 8 transid 1254272 /dev/sdb1
[ 21.423593] BTRFS: device label URAID devid 7 transid 1254272 /dev/sda1
[ 313.823521] BTRFS info (device sda1): disk space caching is enabled
[ 313.823537] BTRFS info (device sda1): has skinny extents
[ 313.826597] BTRFS critical (device sda1): unable to find logical 3746684731392 length 4096
[ 313.839009] BTRFS critical (device sda1): unable to find logical 3746684731392 length 4096
[ 313.851766] BTRFS critical (device sda1): unable to find logical 3746684731392 length 4096
[ 313.864676] BTRFS critical (device sda1): unable to find logical 3746684731392 length 4096
[ 313.878039] BTRFS critical (device sda1): unable to find logical 3746684731392 length 4096
[ 313.891649] BTRFS critical (device sda1): unable to find logical 3746684731392 length 4096
[ 313.905782] BTRFS error (device sda1): failed to read chunk root
[ 313.942529] BTRFS error (device sda1): open_ctree failed

btrfs inspect-internal dump-super /dev/disk/by-label/URAID

superblock: bytenr=65536, device=/dev/disk/by-label/URAID
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x33567328 [match]
bytenr                  65536
flags                   0x1 ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    da28e00e-6ae7-4a28-9bf4-6826157c7e43
metadata_uuid           da28e00e-6ae7-4a28-9bf4-6826157c7e43
label                   URAID
generation              1254278
root                    14868497367040
sys_array_size          129
chunk_root_generation   1254275
root_level              1
chunk_root              21338870824960
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             8001571065856
bytes_used              2903387471872
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
cache_generation        1254278
uuid_tree_generation    1254278
dev_item.uuid           fd1a07a5-4cdc-41eb-9776-d4519f2d086b
dev_item.fsid           da28e00e-6ae7-4a28-9bf4-6826157c7e43 [match]
dev_item.type           0
dev_item.total_bytes    4000785104896
dev_item.bytes_used     2931348733952
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          8
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0

[1] https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
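One way to test the incompatibility theory against the dump-super output above is to decode incompat_flags bit by bit. This is a minimal sketch: the bit positions are an assumption taken from the kernel's BTRFS_FEATURE_INCOMPAT_* definitions (they do not appear in this thread), the table lists only the four flags named in the output, and decode_incompat is a made-up helper name.

```python
# Bit positions are assumed from the kernel's BTRFS_FEATURE_INCOMPAT_* flags;
# only the four names printed by dump-super in the post are listed.
INCOMPAT_BITS = {
    0: "MIXED_BACKREF",
    5: "BIG_METADATA",
    6: "EXTENDED_IREF",
    8: "SKINNY_METADATA",
}


def decode_incompat(flags):
    """Return the known flag names and any leftover bits not in the table."""
    names = [name for bit, name in sorted(INCOMPAT_BITS.items())
             if flags & (1 << bit)]
    known_mask = sum(1 << bit for bit in INCOMPAT_BITS)
    return names, flags & ~known_mask


names, unknown = decode_incompat(0x161)
print(" | ".join(names))
print(hex(unknown))
```

If the leftover value were non-zero, that would point at a feature bit outside this small table, which is the kind of thing an older kernel could refuse to mount. Here the four listed flags account for the whole value.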
There's not enough information to know what's going on yet. My strong advice is to make no further changes (no writes) to either block device until you understand exactly what's going on. Every write increases the chance of permanently losing the file system.

On Thu, Feb 6, 2020 at 2:01 PM Simeon Felis <arch-general@sfelis.de> wrote:
4.19.75 dmesg:
[ 17.707873] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 17.707889] GPT:7814037167 != 253879390758629
[ 17.707895] GPT:Alternate GPT header not at the end of the disk.
[ 17.707902] GPT:7814037167 != 253879390758629
[ 17.707907] GPT: Use GNU Parted to correct GPT errors.
[ 17.707977] sdb: sdb1
[ 17.709682] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 17.709697] GPT:7814037167 != 253879390758629
[ 17.709703] GPT:Alternate GPT header not at the end of the disk.
[ 17.709710] GPT:7814037167 != 253879390758629
This is terrible error reporting (by the kernel) in that it's not clearly stating whether the primary GPT is reporting 7814037167 or 253879390758629. No shit, they aren't the same. Usually this error means the backup GPT at the end of the drive has been stepped on by something; but LBA 253879390758629 is plainly bogus, that's ~115PiB. What do you get for either 'fdisk -l' or 'gdisk -l' or 'parted /dev/sda u s p' for each device?
btrfs inspect-internal dump-super /dev/disk/by-label/URAID superblock: bytenr=65536, device=/dev/disk/by-label/URAID total_bytes 8001571065856 ... dev_item.total_bytes 4000785104896
4000785104896 bytes is 7814033408 sectors, which approximates LBA 7814037167. And fortunately the latter number is bigger.

There is a gotcha moving Btrfs between different archs. "Btrfs sector size", which is an internal Btrfs thing, not a reference to either logical or physical sector size of the device, must be the same as page size. Page size is 4KiB on x86 and I'm pretty sure it's 16KiB on ARM.

So I wonder if you've run into an arch mixing problem (or bug).

-- Chris Murphy
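The two conversions used in this message can be re-checked with a few lines of arithmetic. A small sketch, assuming 512-byte logical sectors (the usual value for these drives, but an assumption at this point in the thread); the variable names are only illustrative.

```python
SECTOR = 512  # assumed logical sector size in bytes

# The implausible LBA from the kernel's GPT message, expressed in PiB.
bogus_lba = 253879390758629
bogus_pib = bogus_lba * SECTOR / 2**50
print(f"{bogus_pib:.1f} PiB")

# dev_item.total_bytes from the superblock, expressed in sectors.
dev_total_bytes = 4000785104896
sectors, remainder = divmod(dev_total_bytes, SECTOR)
print(sectors, remainder)

# The Btrfs device size ends below the last LBA the GPT message mentions.
print(sectors < 7814037167)
```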
On Thu, Feb 6, 2020 at 3:03 PM Chris Murphy <lists@colorremedies.com> wrote:
There is a gotcha moving Btrfs between different archs. "Btrfs sector size", which is an internal Btrfs thing, not a reference to either logical or physical sector size of the device, must be the same as page size. Page size is 4KiB on x86 and I'm pretty sure it's 16KiB on ARM.
So I wonder if you've run into an arch mixing problem (or bug).
OK I pulled my Pi Zero out, which also uses Btrfs, and 4.19.75 kernel. The Btrfs sector size is 4096. Same as on x86_64. So now I'm back to something stepping on the end of the drives, that munged the backup GPT, and possibly munged Btrfs too. Has either device been mounted degraded since the successful scrub? -- Chris Murphy
On 07.02.20 at 05:02, Chris Murphy wrote:
Has either device been mounted degraded since the successful scrub?
No, a degraded mount was not performed since the successful scrub. This is very kind of you. I'm probably not fit enough to do these steps.
On 06.02.20 at 23:03, Chris Murphy wrote:
What do you get for either 'fdisk -l' or 'gdisk -l' or 'parted /dev/sda u s p' for each device?
root@omv:~# fdisk -l /dev/sda
Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: USB3.0 DISK03
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783

Device     Start        End    Sectors  Size Type
/dev/sda1   2048 7814037134 7814035087  3.7T Linux filesystem

root@omv:~# fdisk -l /dev/sdb
Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: USB3.0 DISK04
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9904ABA2-B9F8-4544-9699-9935CE8A7B1F

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7814035455 7814033408  3.7T Linux filesystem

root@omv:~# gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 7814037168 sectors, 3.6 TiB
Model: USB3.0 DISK03
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      7814037134   3.6 TiB     8300

root@omv:~# gdisk -l /dev/sdb
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdb: 7814037168 sectors, 3.6 TiB
Model: USB3.0 DISK04
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 3693 sectors (1.8 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      7814035455   3.6 TiB     8300
On Fri, Feb 7, 2020 at 1:45 PM Simeon Felis <arch-general@sfelis.de> wrote:
root@omv:~# fdisk -l /dev/sda Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors Disk model: USB3.0 DISK03 Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
Device Start End Sectors Size Type /dev/sda1 2048 7814037134 7814035087 3.7T Linux filesystem root@omv:~# fdisk -l /dev/sdb Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors Disk model: USB3.0 DISK04 Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
Device Start End Sectors Size Type /dev/sdb1 2048 7814035455 7814033408 3.7T Linux filesystem
root@omv:~# gdisk -l /dev/sda GPT fdisk (gdisk) version 1.0.3
Partition table scan: MBR: protective BSD: not present APM: not present GPT: present
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 7814037168 sectors, 3.6 TiB Model: USB3.0 DISK03 Sector size (logical/physical): 512/4096 bytes Disk identifier (GUID): 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783 Partition table holds up to 128 entries Main partition table begins at sector 2 and ends at sector 33 First usable sector is 2048, last usable sector is 7814037134 Partitions will be aligned on 2048-sector boundaries Total free space is 0 sectors (0 bytes)
Number Start (sector) End (sector) Size Code Name 1 2048 7814037134 3.6 TiB 8300 root@omv:~# gdisk -l /dev/sdb GPT fdisk (gdisk) version 1.0.3
Partition table scan: MBR: protective BSD: not present APM: not present GPT: present
Found valid GPT with protective MBR; using GPT. Disk /dev/sdb: 7814037168 sectors, 3.6 TiB Model: USB3.0 DISK04 Sector size (logical/physical): 512/4096 bytes Disk identifier (GUID): 9904ABA2-B9F8-4544-9699-9935CE8A7B1F Partition table holds up to 128 entries Main partition table begins at sector 2 and ends at sector 33 First usable sector is 34, last usable sector is 7814037134 Partitions will be aligned on 2048-sector boundaries Total free space is 3693 sectors (1.8 MiB)
Number Start (sector) End (sector) Size Code Name 1 2048 7814035455 3.6 TiB 8300
OK so neither fdisk nor gdisk has any complaints about the GPT. And yet the kernel is complaining. That's wrong and weird.

Have the drives always been in these USB enclosures, for the life of this Btrfs file system? They've always been connected to x86 and ARM while in these USB enclosures?

The two drives have identical sectors, 7814037168. Why don't they have identical partition maps? The /dev/sdb drive says partition 1 starts at 2048, and yet "first usable sector is 34" and that drive has 3693 sectors free.

/dev/sda 7814037134-2048=7814035086, 7814035086*512=4000785964032
/dev/sdb 7814035455-2048=7814033407, 7814033407*512=4000785104384

The Btrfs super provided for one of them (I can't tell which it's for)

dev_item.total_bytes 4000785104896

If that's the value for /dev/sda, it's wrong but safe. i.e. the partition is bigger than what Btrfs says it should be. And btrfs will live inside its own dev size constraints. However, if it's the value for /dev/sdb, it's wrong and not safe, because the partition is exactly 512 bytes smaller than Btrfs thinks it should be.

But we need clarification. Can you provide the super block for both /dev/sda and /dev/sdb unambiguously with the above reported gdisk/fdisk outputs? Note the /dev/sda /dev/sdb designations can change if the drives have been disconnected/reconnected or the system rebooted since those commands were issued. So it's important to make certain which partition map goes with which super, because no matter what something is not exactly correct and probably should be fixed.

And also as a sanity test:

sudo btrfs rescue super -v /dev/sda

It only needs to be run on one of the devices; but both need to be present and unmounted. This is a read only command to verify all six supers are valid.

Last, I wonder if there's some weird bug. Any chance you can update the Pi to kernel 4.19.97-1-ARCH and see if the same kernel GPT error messages appear? You won't need to mount the volume (and I don't recommend trying yet anyway), just connect the devices individually and report back what the kernel says about each drive as you connect them. I do find it a bit hard to believe a GPT related bug could exist in 4.19.75...but worth a shot. For what it's worth, I did an upgrade to 4.19.97 on my Pi and it's fine.

Anyway, I wanna be extremely deliberate. It's a bit tedious. But having monitored linux-raid@, LVM, linux-xfs@, linux-btrfs@ upstream lists, it's extremely common to get user induced data loss by getting panicky and doing things too fast. And also I'm intentionally verbose so as to invite anyone reading to call b.s. - it's really easy to make simple stupid mistakes.

-- Chris Murphy
Sorry, Valentine's preparation.

On 07.02.20 at 23:10, Chris Murphy wrote:
OK so neither fdisk nor gdisk have any complaints about the GPT. And yet the kernel is complaining. That's wrong and weird.
Have the drives always been in these USB enclosures, for the life of this Btrfs file system? They've always been connected to x86 and ARM while in these USB enclosures?
The two disks were brand new and added to the raid. Other drives were removed due to their old age and failing SMART. They were solely used in this btrfs raid and only on x86_64 and ARM. I can't say if I provisioned both disks on the same platform. So one disk was maybe provisioned while still hooked up to Arch Linux 5.x, the second one might have been provisioned with raspbian 4.19.
The two drives have identical sectors, 7814037168. Why don't they have identical partition maps? The /dev/sdb drive says partition 1 starts at 2048, and yet "first usable sector is 34" and that drive has 3693 sectors free.
/dev/sda 7814037134-2048=7814035086, 7814035086*512=4000785964032
/dev/sdb 7814035455-2048=7814033407, 7814033407*512=4000785104384
The Btrfs super provided for one of them (I can't tell which it's for)
dev_item.total_bytes 4000785104896
If that's the value for /dev/sda, it's wrong but safe. i.e. the partition is bigger than what Btrfs says it should be. And btrfs will live inside its own dev size constraints. However, if it's the value for /dev/sdb, it's wrong and not safe, because the partition is exactly 512 bytes smaller than Btrfs thinks it should be. But we need clarification. Can you provide the super block for both /dev/sda and /dev/sdb unambiguously with the above reported gdisk/fdisk outputs? Note the /dev/sda /dev/sdb designations can change if the drives have been disconnected/reconnected or the system rebooted since those commands were issued. So it's important to make certain which partition map goes with which super, because no matter what something is not exactly correct and probably should be fixed.
I'm gonna reference their UUIDs from fdisk. Also I'm gonna use Arch Linux 5.5 x86_64 instead of raspbian 4.19 unless otherwise instructed.

btrfs raid1 disks UUIDs:
63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
9904ABA2-B9F8-4544-9699-9935CE8A7B1F

Oh, by the way it's a Seagate and Western Digital. Damn I thought I bought identical ones.

63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
====================================

# LANG=C fdisk -l /dev/sdb
Disk /dev/sdb: 3,65 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: USB3.0 DISK03
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7814037134 7814035087  3,7T Linux filesystem

--> (7814037134-2048)*512 = 4000785964032 of available bytes

# btrfs inspect-internal dump-super /dev/sdb1 | grep total_bytes
total_bytes             8001571065856
dev_item.total_bytes    4000785960960

--> 4000785960960-4000785964032 = -3072 --> broken?

9904ABA2-B9F8-4544-9699-9935CE8A7B1F
====================================

# LANG=C fdisk -l /dev/sdc
Disk /dev/sdc: 3,65 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: USB3.0 DISK04
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9904ABA2-B9F8-4544-9699-9935CE8A7B1F

Device     Start        End    Sectors  Size Type
/dev/sdc1   2048 7814035455 7814033408  3,7T Linux filesystem

--> (7814035455-2048)*512 = 4000785104384 of available bytes

# btrfs inspect-internal dump-super /dev/sdc1 | grep total_bytes
total_bytes             8001571065856
dev_item.total_bytes    4000785104896

--> 4000785104896-4000785104384 = 512 --> safe?

So here one could create the backup GPT...
And also as a sanity test:
sudo btrfs rescue super -v /dev/sda
# btrfs rescue super -v /dev/sdb1
All Devices:
        Device: id = 8, name = /dev/sdc1
        Device: id = 7, name = /dev/sdb1

Before Recovering:
        [All good supers]:
                device name = /dev/sdc1
                superblock bytenr = 65536

                device name = /dev/sdc1
                superblock bytenr = 67108864

                device name = /dev/sdc1
                superblock bytenr = 274877906944

                device name = /dev/sdb1
                superblock bytenr = 65536

                device name = /dev/sdb1
                superblock bytenr = 67108864

                device name = /dev/sdb1
                superblock bytenr = 274877906944

        [All bad supers]:

All supers are valid, no need to recover
It only needs to be run on one of the devices; but both need to be present and unmounted. This is a read only command to verify all six supers are valid.
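The three bytenr values per device in the output above follow a fixed pattern, which is where "six supers" for two devices comes from. A minimal sketch, assuming the offset rule used by the btrfs code (primary copy at 64 KiB, mirror n at 16 KiB shifted left by 12*n bits); that rule is quoted from memory and is not stated in this thread, and the function name is only illustrative.

```python
def btrfs_sb_offset(mirror):
    """Byte offset of superblock copy `mirror` (0 is the primary copy)."""
    if mirror == 0:
        return 64 * 1024
    # Assumed rule: mirrors sit at 16 KiB << (12 * mirror).
    return (16 * 1024) << (12 * mirror)


offsets = [btrfs_sb_offset(m) for m in range(3)]
print(offsets)

# Each ~4 TB member is large enough to hold all three copies.
dev_sizes = [4000785960960, 4000785104896]  # dev_item.total_bytes values
copies = sum(1 for size in dev_sizes for off in offsets if off < size)
print(copies)
```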
Last, I wonder if there's some weird bug. Any chance you can update the Pi to kernel 4.19.97-1-ARCH and see if this same kernel GPT error messages happen? You won't need to mount the volume (and I don't recommend trying yet anyway), just connect the devices individually and report back what the kernel says about each drive as you connect them. I do find it a bit hard to believe a GPT related bug could exist in 4.19.75...but worth a shot. For what it's worth, I did an upgrade to 4.19.97 on my Pi and it's fine.
The raspbian on my pi4 now has 4.19.97:

uname -a
Linux omv 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

The GPT error messages are gone...
Anyway, I wanna be extremely deliberate. It's a bit tedious. But having monitored linux-raid@, LVM, linux-xfs@, linux-btrfs@ upstream lists, it's extremely common to get user induced data loss by getting panicky and doing things too fast. And also I'm intentionally verbose so as to invite anyone reading to call b.s. - it's really easy to make simple stupid mistakes.
Yeah, it would be a bit sad, since I don't have 4TB to spare yet.

Since the outputs of dump-* are pretty large and did not make it to the list, here is a link for downloading them: https://nextcloud.sfelis.de/s/SJEGp2gCGKenDsj

The diffs showed that the dump-s outputs were pretty much the same, but for dump-t there are a few differences.
On Sun, Feb 9, 2020 at 8:15 AM Simeon Felis <arch-general@sfelis.de> wrote:
btrfs raid1 disks UUIDs: 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783 9904ABA2-B9F8-4544-9699-9935CE8A7B1F Oh, by the way it's a Seagate and Western Digital. Damn I thought I bought identical ones.
This is better. Chance of simultaneous failures is much reduced with mixed make/model.

What do you get for the following?

# btrfs insp dump-t -b 3746684731392 /dev/

That's a read-only command, the volume doesn't need to be mounted, and /dev/ is either device. I actually don't need to see the output, but I want to know if the command complains or provides whatever is at that location. It could be a node or leaf, and it may contain filenames, so if you do post it, you'll want to sanitize them.
63E9CA8E-F1B4-8A41-9C21-F058C1AC0783 ====================================
# LANG=C fdisk -l /dev/sdb Disk /dev/sdb: 3,65 TiB, 4000787030016 bytes, 7814037168 sectors Disk model: USB3.0 DISK03 Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
Device Start End Sectors Size Type /dev/sdb1 2048 7814037134 7814035087 3,7T Linux filesystem
--> (7814037134-2048)*512 = 4000785964032 of available bytes
# btrfs inspect-internal dump-super /dev/sdb1 | grep total_bytes total_bytes 8001571065856 dev_item.total_bytes 4000785960960
--> 4000785960960-4000785964032 = -3072 --> broken?
Other way around. The partition is larger than the file system, by exactly 3072 bytes which is spot on correct, because Btrfs sector size is 4KiB and 7814035087 sectors is exactly 512 bytes short of whole 4KiB. So the last 3072 bytes on this partition aren't used by Btrfs. This device is Btrfs devid 7. And it is OK.
9904ABA2-B9F8-4544-9699-9935CE8A7B1F ====================================
LANG=C fdisk -l /dev/sdc Disk /dev/sdc: 3,65 TiB, 4000787030016 bytes, 7814037168 sectors Disk model: USB3.0 DISK04 Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
Device Start End Sectors Size Type /dev/sdc1 2048 7814035455 7814033408 3,7T Linux filesystem
--> (7814035455-2048)*512 = 4000785104384 of available bytes
# btrfs inspect-internal dump-super /dev/sdc1 | grep total_bytes total_bytes 8001571065856 dev_item.total_bytes 4000785104896
--> 4000785104896-4000785104384 = 512 --> safe? So here one could create the backup GPT...
Again, other way around. The partition is smaller than the file system, by 512 bytes, 1 sector.

Simplest and safest fix?

0. Boilerplate disclosure, mount -o ro, and freshen backups. Just in case.
1. Unmount
2. fdisk or gdisk on /dev/sdc (9904ABA2...): delete partition 1 and recreate it with default values. This will change it from

/dev/sdc1 2048 7814035455 7814033408 3,7T Linux filesystem

to

/dev/sdc1 2048 7814037134 7814035087 3,7T Linux filesystem

Which matches /dev/sdb1 and will be bigger than it is now. Double check the values and write it out. Since the device isn't in use the kernel should be updated automatically; you can check dmesg before and after, or just use partprobe if available.

3. Mount the volume normally (this time rw)
4. Resize sdc1 to account for the slightly larger partition and match its mirror

# btrfs fi resize 8:max /mnt

You can use 'btrfs insp dump-s' on both drives to confirm they both have

dev_item.total_bytes 4000785960960

That's it. Unmount and go try it on the Pi.

-- Chris Murphy
On Sun, Feb 9, 2020 at 5:12 PM Chris Murphy <lists@colorremedies.com> wrote:
Disk identifier: 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
Device Start End Sectors Size Type /dev/sdc1 2048 7814035455 7814033408 3,7T Linux filesystem
--> (7814035455-2048)*512 = 4000785104384 of available bytes
# btrfs inspect-internal dump-super /dev/sdc1 | grep total_bytes total_bytes 8001571065856 dev_item.total_bytes 4000785104896
--> 4000785104896-4000785104384 = 512 --> safe? So here one could create the backup GPT...
Again, other way around. The partition is smaller than the file system, by 512 bytes, 1 sector.
Actually I've made a very common mistake. LBA 2048 should be *included* in the total byte count. So it's really

(7814035455-2048+1)*512=4000785104896

which is identical to dev_item.total_bytes. Therefore *both* of these devices are completely safe and fine, and should not give either ARM or x86_64 grief. And you don't have to make the two mirrors identically sized. Btrfs doesn't care if they're identical.

You might just go straight to ARM, and try to mount -o ro and see if it mounts it OK. I think the error messages you got from Btrfs previously had to do with the bogus GPT error messages - which we don't know why that happened.

Anyway, the below is still valid and safe, but entirely optional. For what it's worth, fdisk will ask if it should wipe the Btrfs signature, don't do that. Writing out a new GPT writes to the first 34 and last 34 sectors of the drive, and this Btrfs file system is nowhere near those areas.
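The corrected, inclusive sector count can be applied to both partitions at once. A small sketch using only numbers posted in this thread; it assumes Btrfs rounds a device down to a whole multiple of its 4096-byte sector size, which is what the figures suggest but is not confirmed here, and the helper names are made up for illustration.

```python
SECTOR = 512             # logical sector size reported by fdisk
BTRFS_SECTORSIZE = 4096  # "sectorsize" from dump-super


def partition_bytes(start_lba, end_lba):
    """Partition size in bytes, counting both the start and end LBA."""
    return (end_lba - start_lba + 1) * SECTOR


def usable_bytes(part_bytes):
    """Assumed rule: round down to a whole number of Btrfs sectors."""
    return part_bytes - part_bytes % BTRFS_SECTORSIZE


# disk identifier prefix: (start LBA, end LBA, dev_item.total_bytes)
disks = {
    "63E9CA8E (devid 7)": (2048, 7814037134, 4000785960960),
    "9904ABA2 (devid 8)": (2048, 7814035455, 4000785104896),
}

results = {}
for name, (start, end, dev_total) in disks.items():
    size = partition_bytes(start, end)
    slack = size - dev_total  # bytes of the partition Btrfs does not use
    results[name] = (size, usable_bytes(size) == dev_total, slack)
    print(name, size, usable_bytes(size) == dev_total, slack)
```

With the inclusive count, the slack on the first disk comes out as 3584 bytes (seven 512-byte sectors) rather than the 3072 bytes of the earlier off-by-one calculation, and the second disk has none. In both cases the file system fits inside its partition.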
Simplest and safest fix?

0. Boilerplate disclosure, mount -o ro, and freshen backups. Just in case.
1. Unmount
2. fdisk or gdisk on /dev/sdc (9904ABA2...): delete partition 1 and recreate it with default values. This will change it from
/dev/sdc1 2048 7814035455 7814033408 3,7T Linux filesystem
to
/dev/sdc1 2048 7814037134 7814035087 3,7T Linux filesystem
Which matches /dev/sdb1 and will be bigger than it is now. Double check the values and write it out. Since the device isn't used the kernel should be updated automatically, you can check dmesg before and after, or just use partprobe if available.
3. Mount the volume normally (this time rw)
4. Resize sdc1 to account for the slightly larger partition and match its mirror
# btrfs fi resize 8:max /mnt
You can use 'btrfs insp dump-s' on both drives to confirm they both have
dev_item.total_bytes 4000785960960
-- Chris Murphy
On 10.02.20 at 01:22, Chris Murphy wrote:
You might just go straight to ARM, and try to mount -o ro and see if it mounts it OK. I think the error messages you got from Btrfs previously had to do with the bogus GPT error messages - which we don't know why that happened.
Unfortunately it still does not mount:
mount -o ro /dev/disk/by-label/URAID /mnt/URAID/
mount: /mnt/URAID: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error.
[ 182.039688] usb 2-2: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 182.071047] usb 2-2: New USB device found, idVendor=152d, idProduct=0567, bcdDevice=52.03
[ 182.071063] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 182.071076] usb 2-2: Product: External USB 3.0
[ 182.071089] usb 2-2: Manufacturer: JMicron
[ 182.071101] usb 2-2: SerialNumber: 20170331000C3
[ 182.074585] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 182.079212] usb-storage 2-2:1.0: Quirks match for vid 152d pid 0567: 5000000
[ 182.079424] scsi host0: usb-storage 2-2:1.0
[ 183.130129] scsi 0:0:0:0: Direct-Access External USB3.0 DISK03 5203 PQ: 0 ANSI: 6
[ 183.131024] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 183.131252] sd 0:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[ 183.131267] sd 0:0:0:0: [sda] 4096-byte physical blocks
[ 183.131904] sd 0:0:0:0: [sda] Write Protect is off
[ 183.131919] sd 0:0:0:0: [sda] Mode Sense: 2b 00 00 00
[ 183.132258] scsi 0:0:0:1: Direct-Access External USB3.0 DISK04 5203 PQ: 0 ANSI: 6
[ 183.139512] sd 0:0:0:0: [sda] No Caching mode page found
[ 183.139528] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 183.140186] sd 0:0:0:1: [sdb] Very big device. Trying to use READ CAPACITY(16).
[ 183.140467] sd 0:0:0:1: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[ 183.140483] sd 0:0:0:1: [sdb] 4096-byte physical blocks
[ 183.141135] sd 0:0:0:1: [sdb] Write Protect is off
[ 183.141151] sd 0:0:0:1: [sdb] Mode Sense: 2b 00 00 00
[ 183.142195] sd 0:0:0:1: [sdb] No Caching mode page found
[ 183.142211] sd 0:0:0:1: [sdb] Assuming drive cache: write through
[ 183.151939] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 183.152161] sd 0:0:0:1: Attached scsi generic sg1 type 0
[ 183.308231] sdb: sdb1
[ 183.308429] sda: sda1
[ 183.310527] sd 0:0:0:1: [sdb] Attached SCSI disk
[ 183.311834] sd 0:0:0:0: [sda] Attached SCSI disk
[ 183.572564] BTRFS: device label URAID devid 8 transid 1254643 /dev/sdb1
[ 183.575573] BTRFS: device label URAID devid 7 transid 1254643 /dev/sda1
[ 228.067813] BTRFS info (device sda1): disk space caching is enabled
[ 228.067827] BTRFS info (device sda1): has skinny extents
[ 228.072861] BTRFS critical (device sda1): unable to find logical 4306137776128 length 4096
[ 228.081639] BTRFS critical (device sda1): unable to find logical 4306137776128 length 4096
[ 228.090173] BTRFS critical (device sda1): unable to find logical 4306137776128 length 4096
[ 228.098571] BTRFS critical (device sda1): unable to find logical 4306137776128 length 4096
[ 228.107030] BTRFS critical (device sda1): unable to find logical 4306137776128 length 4096
[ 228.115469] BTRFS critical (device sda1): unable to find logical 4306137776128 length 4096
[ 228.123928] BTRFS error (device sda1): failed to read chunk root
[ 228.160012] BTRFS error (device sda1): open_ctree failed
apt-cache show btrfs-progs | grep Version
Version: 4.20.1-2
uname -a
Linux omv 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
I'm going to get some large external drives, copy the data and will go back to an old-school software raid.
btrfs is not portable in my eyes. When this is happening I would strongly recommend to wait another 5 years before using it anywhere.
I'm pretty sure this is not an Arch Linux issue, it might be a raspbian issue.
On Tue, Feb 11, 2020 at 2:02 PM Simeon Felis <arch-general@sfelis.de> wrote:
On 10.02.20 at 01:22, Chris Murphy wrote:
You might just go straight to ARM, and try to mount -o ro and see if it mounts it OK. I think the error messages you got from Btrfs previously had to do with the bogus GPT error messages - which we don't know why that happened.
Unfortunately it still does not mount:
mount -o ro /dev/disk/by-label/URAID /mnt/URAID/
mount: /mnt/URAID: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error.
apt-cache show btrfs-progs | grep Version
Version: 4.20.1-2
uname -a
Linux omv 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
My suggestion is to take this upstream, it smells like a bug. And there's a good chance a bug fix can be backported since 4.19 is a long term kernel. The only way bugs ever get fixed is if they get reported to the proper upstream.
I'm going to get some large external drives, copy the data and will go back to an old-school software raid.
btrfs is not portable in my eyes. When this is happening I would strongly recommend to wait another 5 years before using it anywhere.
Good luck with that. I monitor the upstream mdadm and LVM lists, and bugs and regressions are a fact of life. File systems are hard. The older any file system gets, the more non-deterministic it becomes. The more you mix kernel versions, the more that non-determinism explodes. And it's even greater when moving it across archs. That doesn't mean the problem isn't a bug, it just increases the chance of bug exposures. And this isn't going to get more reliable unless there are reliable bug reports where people persevere through the tedious task of making the software better.
The reality is, your data is intact. This is not a data loss scenario. And you have metadata and data checksumming to verify your data. Giving that up, you will have to completely trust the limited error detection abilities of the drives. Any corruption there will be propagated to user space and replicated in backups, silently.
I'm pretty sure this is not an Arch Linux issue, it might be a raspbian issue.
I think it's a straight up Btrfs bug. But it should be reported upstream to find out what's going on. Off hand I don't see a relevant patch between 4.19.95 and 4.19.103. If you write up the email, put me in the cc and I can fill in some of the gaps and hopefully get the proper attention.
-- Chris Murphy
On 11.02.20 at 23:13, Chris Murphy wrote:
I think it's a straight up Btrfs bug. But it should be reported upstream to find out what's going on. Off hand I don't see a relevant patch between 4.19.95 and 4.19.103. If you write up the email, put me in the cc and I can fill in some of the gaps and hopefully get the proper attention.
Follow-up: https://lore.kernel.org/linux-btrfs/8fb8442b-dbf9-4d4b-42bb-ce460048f891@sfe...
Sorry, I forgot to put you in CC.
On Thu, Feb 6, 2020 at 2:01 PM Simeon Felis <arch-general@sfelis.de> wrote:
[ 21.417002] Btrfs loaded, crc32c=crc32c-generic
[ 21.418650] BTRFS: device label URAID devid 8 transid 1254272 /dev/sdb1
[ 21.423593] BTRFS: device label URAID devid 7 transid 1254272 /dev/sda1
[ 313.823521] BTRFS info (device sda1): disk space caching is enabled
[ 313.823537] BTRFS info (device sda1): has skinny extents
[ 313.826597] BTRFS critical (device sda1): unable to find logical 3746684731392 length 4096
For what it's worth, Btrfs uses logical addresses internally. These are in bytes. They do not correlate to a physical location. So while 3746684731392 translates to ~3.4TiB and might suggest that this is a location near the end of the drive, it's not necessarily true - but would be consistent with something stepping on the end of the drive, wiping out both the alternate GPT and the chunk tree.
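As a small worked example of that arithmetic (a sketch, not part of the original message), the snippet below converts the logical address from the dmesg line to TiB and compares it with the raw device size. The sector count 7814037168 is what the kernel reported for sda and sdb elsewhere in this thread, and 4306137776128 is the address from the later mount attempt. The helper name `to_tib` is hypothetical.

```python
TIB = 2**40      # bytes per TiB
SECTOR = 512     # logical block size reported by the kernel in this thread


def to_tib(n_bytes: int) -> float:
    """Convert a byte count to TiB."""
    return n_bytes / TIB


first_logical = 3746684731392        # from the first dmesg (4.19.75)
later_logical = 4306137776128        # from the later mount attempt (4.19.97)
device_bytes = 7814037168 * SECTOR   # whole-disk size of sda / sdb

print(f"first logical : {to_tib(first_logical):.2f} TiB")
print(f"later logical : {to_tib(later_logical):.2f} TiB")
print(f"device size   : {to_tib(device_bytes):.2f} TiB")
print("later logical beyond device end:", later_logical > device_bytes)
```

The later address is larger than the whole device, which fits the point that logical addresses are not physical offsets.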
From the super you provided:
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
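The raw value 0x161 can be decoded by hand. The sketch below does that in Python; the bit table reflects the BTRFS_FEATURE_INCOMPAT_* bit positions as I understand them from the kernel headers, so treat it as an assumption and check it against your kernel source. The function name `decode_incompat` is made up for this example.

```python
# Assumed bit positions of the btrfs incompat feature flags
# (verify against the kernel headers before relying on them).
INCOMPAT_FLAGS = {
    1 << 0: "MIXED_BACKREF",
    1 << 1: "DEFAULT_SUBVOL",
    1 << 2: "MIXED_GROUPS",
    1 << 3: "COMPRESS_LZO",
    1 << 4: "COMPRESS_ZSTD",
    1 << 5: "BIG_METADATA",
    1 << 6: "EXTENDED_IREF",
    1 << 7: "RAID56",
    1 << 8: "SKINNY_METADATA",
    1 << 9: "NO_HOLES",
}


def decode_incompat(flags: int) -> tuple[list[str], int]:
    """Return the names of the set flags and any bits not in the table."""
    names = [name for bit, name in INCOMPAT_FLAGS.items() if flags & bit]
    unknown = flags & ~sum(INCOMPAT_FLAGS)  # sum of the keys = mask of known bits
    return names, unknown


names, unknown = decode_incompat(0x161)
print(" | ".join(names), hex(unknown))
```

For 0x161 no unknown bits remain, and the decoded names are the same four that dump-super printed: MIXED_BACKREF, BIG_METADATA, EXTENDED_IREF and SKINNY_METADATA.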
These are supported since forever, going back to 3.x kernels.
Can you confirm the architecture of the workstation you did the scrub and balance on? I'm guessing it's x86_64.
The problem isn't with the superblock. It's with bootstrapping the chunk tree. It's not clear why. The chunk tree is what's responsible for translating a logical address into a device + sector lookup; for raid1 that's two devices and two different physical sectors. If the chunk tree cannot be read, it's not possible to find anything, including other metadata.
It should be true that the partition this Btrfs is on is exactly the same number of bytes as dev_item.total_bytes.
I think there's a bug somewhere, you've got two devices experiencing the exact same complaint and problem. The GPT thing might be a distraction, I'm not really sure yet.
What do you get for:
btrfs insp dump-s -fa /dev/
btrfs insp dump-t -t chunk /dev
These are all read only commands.
-- Chris Murphy
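The role of the chunk tree described in the message above can be illustrated with a toy model. This is a deliberately simplified sketch, not the Btrfs on-disk format or kernel code: the `Chunk` class, the `lookup` function and all offsets are invented, and only the devids 7 and 8 come from this thread. Each chunk maps a range of logical addresses to one physical location per copy, so a raid1 chunk has two entries.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    logical_start: int               # first logical byte covered by this chunk
    length: int                      # chunk size in bytes
    stripes: List[Tuple[int, int]]   # one (devid, physical_start) per copy


def lookup(chunks: List[Chunk], logical: int) -> Optional[List[Tuple[int, int]]]:
    """Translate a logical address into (devid, physical offset) pairs."""
    for chunk in chunks:
        offset = logical - chunk.logical_start
        if 0 <= offset < chunk.length:
            return [(devid, start + offset) for devid, start in chunk.stripes]
    return None  # no chunk covers this logical address


GIB = 2**30
# One hypothetical 1 GiB raid1 chunk mirrored on devid 7 and devid 8.
toy_chunks = [Chunk(logical_start=100 * GIB, length=GIB,
                    stripes=[(7, 20 * GIB), (8, 35 * GIB)])]

print(lookup(toy_chunks, 100 * GIB + 4096))  # covered: two physical locations
print(lookup(toy_chunks, 500 * GIB))         # not covered by any chunk
```

In this model a lookup that returns nothing is roughly the situation the "unable to find logical" messages point at: without a usable mapping, nothing else on the filesystem can be located.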
participants (2)
- Chris Murphy
- Simeon Felis