On Fri, Feb 7, 2020 at 1:45 PM Simeon Felis <arch-general@sfelis.de> wrote:
On 06.02.20 at 23:03, Chris Murphy wrote:
There's not enough information to know what's going on yet. My strong advice is to make no further changes (no writes) to either block device until you understand exactly what's going on. Every write increases the chance of permanently losing the file system.
On Thu, Feb 6, 2020 at 2:01 PM Simeon Felis <arch-general@sfelis.de> wrote:
4.19.75 dmesg:
[ 17.707873] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 17.707889] GPT:7814037167 != 253879390758629
[ 17.707895] GPT:Alternate GPT header not at the end of the disk.
[ 17.707902] GPT:7814037167 != 253879390758629
[ 17.707907] GPT: Use GNU Parted to correct GPT errors.
[ 17.707977] sdb: sdb1
[ 17.709682] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 17.709697] GPT:7814037167 != 253879390758629
[ 17.709703] GPT:Alternate GPT header not at the end of the disk.
[ 17.709710] GPT:7814037167 != 253879390758629
This is terrible error reporting (by the kernel) in that it's not clearly stating whether the primary GPT is reporting 7814037167 or 253879390758629. No shit, they aren't the same. Usually this error means the backup GPT at the end of the drive has been stepped on by something; but LBA 253879390758629 is plainly bogus, that's ~115PiB.
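To put a number on how bogus that LBA is, here is a minimal Python sketch. It assumes 512-byte logical sectors (which is what fdisk reports for these drives); the function and variable names are made up for illustration.

```python
# Assumption: 512-byte logical sectors, as fdisk reports for these drives.
SECTOR_SIZE = 512
TIB = 2 ** 40
PIB = 2 ** 50


def lba_to_bytes(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Return the byte offset at which the given LBA starts."""
    return lba * sector_size


# The two values printed in the kernel's "GPT:... != ..." lines.
plausible_lba = 7814037167
bogus_lba = 253879390758629

plausible_tib = lba_to_bytes(plausible_lba) / TIB
bogus_pib = lba_to_bytes(bogus_lba) / PIB

print(f"LBA {plausible_lba} sits at about {plausible_tib:.2f} TiB")
print(f"LBA {bogus_lba} sits at about {bogus_pib:.1f} PiB")
```

The first value lands where the end of a 4 TB drive should be; the second is several orders of magnitude beyond it.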
What do you get for either 'fdisk -l' or 'gdisk -l' or 'parted /dev/sda u s p' for each device?
root@omv:~# fdisk -l /dev/sda
Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: USB3.0 DISK03
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783

Device     Start        End    Sectors  Size Type
/dev/sda1   2048 7814037134 7814035087  3.7T Linux filesystem

root@omv:~# fdisk -l /dev/sdb
Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: USB3.0 DISK04
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9904ABA2-B9F8-4544-9699-9935CE8A7B1F

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7814035455 7814033408  3.7T Linux filesystem

root@omv:~# gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 7814037168 sectors, 3.6 TiB
Model: USB3.0 DISK03
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      7814037134   3.6 TiB     8300

root@omv:~# gdisk -l /dev/sdb
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdb: 7814037168 sectors, 3.6 TiB
Model: USB3.0 DISK04
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 3693 sectors (1.8 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      7814035455   3.6 TiB     8300
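
The numbers in the fdisk/gdisk output above can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: 512-byte logical sectors, inclusive start/end sectors (so a partition spans end - start + 1 sectors, which is what fdisk's Sectors column shows), and the dev_item.total_bytes value 4000785104896 quoted elsewhere in this thread, for which it is not yet known which device it belongs to. All names are made up for illustration.

```python
# Assumption: 512-byte logical sectors, as reported by fdisk for both drives.
SECTOR_SIZE = 512

# Whole-disk sector count reported for both /dev/sda and /dev/sdb.
DISK_SECTORS = 7814037168

# (start, end) sectors of partition 1 on each drive; both ends are inclusive.
PARTITIONS = {
    "sda1": (2048, 7814037134),
    "sdb1": (2048, 7814035455),
}

# dev_item.total_bytes from the one Btrfs superblock posted so far.
# Which device it describes is still unknown at this point in the thread.
BTRFS_TOTAL_BYTES = 4000785104896


def partition_sectors(start: int, end: int) -> int:
    """Number of sectors in a partition whose start and end are inclusive."""
    return end - start + 1


def partition_bytes(start: int, end: int, sector_size: int = SECTOR_SIZE) -> int:
    """Partition size in bytes."""
    return partition_sectors(start, end) * sector_size


# The backup GPT header belongs in the last sector of the disk.
expected_backup_lba = DISK_SECTORS - 1
print("expected backup GPT header LBA:", expected_backup_lba)

for name, (start, end) in PARTITIONS.items():
    size = partition_bytes(start, end)
    delta = size - BTRFS_TOTAL_BYTES
    print(f"{name}: {partition_sectors(start, end)} sectors, {size} bytes, "
          f"partition minus Btrfs total_bytes = {delta} bytes")
```

A non-negative difference means the file system fits inside the partition; a negative one would mean Btrfs believes the device is larger than the partition holding it.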

OK, so neither fdisk nor gdisk has any complaints about the GPT. And yet the kernel is complaining. That's wrong and weird.

Have the drives always been in these USB enclosures, for the life of this Btrfs file system? Have they always been in these USB enclosures when connected to both x86 and ARM?

The two drives have identical sector counts, 7814037168. Why don't they have identical partition maps? The /dev/sdb drive says partition 1 starts at 2048, and yet "first usable sector is 34", and that drive has 3693 sectors free.

Partition sizes (start and end sectors are inclusive, hence the +1; the results match the Sectors column in the fdisk output):

/dev/sda1: 7814037134 - 2048 + 1 = 7814035087 sectors, 7814035087 * 512 = 4000785964544 bytes
/dev/sdb1: 7814035455 - 2048 + 1 = 7814033408 sectors, 7814033408 * 512 = 4000785104896 bytes

The Btrfs super provided for one of them (I can't tell which it's for) says:

dev_item.total_bytes 4000785104896

If that's the value for /dev/sda, it's wrong but safe, i.e. the partition is bigger than what Btrfs says it should be, and Btrfs will live inside its own dev size constraints. If it's the value for /dev/sdb, it matches that partition's size exactly. But we need clarification.

Can you provide the super block for both /dev/sda and /dev/sdb, unambiguously matched with the gdisk/fdisk outputs reported above? Note that the /dev/sda and /dev/sdb designations can change if the drives have been disconnected/reconnected or the system has been rebooted since those commands were issued. So it's important to make certain which partition map goes with which super before deciding whether anything needs to be fixed.

And also as a sanity test:

sudo btrfs rescue super-recover -v /dev/sda

It only needs to be run on one of the devices, but both need to be present and unmounted. This is a read-only command to verify all six supers are valid.

Last, I wonder if there's some weird bug. Any chance you can update the Pi to kernel 4.19.97-1-ARCH and see if these same kernel GPT error messages happen?
You won't need to mount the volume (and I don't recommend trying yet anyway); just connect the devices individually and report back what the kernel says about each drive as you connect them. I do find it a bit hard to believe a GPT-related bug could exist in 4.19.75... but it's worth a shot. For what it's worth, I did an upgrade to 4.19.97 on my Pi and it's fine.

Anyway, I wanna be extremely deliberate. It's a bit tedious. But having monitored the linux-raid@, LVM, linux-xfs@, and linux-btrfs@ upstream lists, I've seen that it's extremely common to get user-induced data loss by getting panicky and doing things too fast. And also I'm intentionally verbose so as to invite anyone reading to call b.s. - it's really easy to make simple stupid mistakes.

--
Chris Murphy