Re: [arch-general] btrfs kernel incompatibility?

7 Feb 2020

On Fri, Feb 7, 2020 at 1:45 PM Simeon Felis <arch-general@sfelis.de> wrote:
...
On 06.02.20 at 23:03, Chris Murphy wrote:
...
There's not enough information to know what's going on yet. My strong
advice is to make no further changes (no writes) to either block
device until you understand exactly what's going on. Every write
increases the chance of permanently losing the file system.
On Thu, Feb 6, 2020 at 2:01 PM Simeon Felis <arch-general@sfelis.de> wrote:
...
4.19.75 dmesg:
[   17.707873] GPT:Primary header thinks Alt. header is not at the end of the disk.
[   17.707889] GPT:7814037167 != 253879390758629
[   17.707895] GPT:Alternate GPT header not at the end of the disk.
[   17.707902] GPT:7814037167 != 253879390758629
[   17.707907] GPT: Use GNU Parted to correct GPT errors.
[   17.707977]  sdb: sdb1
[   17.709682] GPT:Primary header thinks Alt. header is not at the end of the disk.
[   17.709697] GPT:7814037167 != 253879390758629
[   17.709703] GPT:Alternate GPT header not at the end of the disk.
[   17.709710] GPT:7814037167 != 253879390758629
This is terrible error reporting (by the kernel) in that it doesn't
clearly state whether the primary GPT is reporting 7814037167 or
253879390758629. No shit, they aren't the same. Usually this error
means the backup GPT at the end of the drive has been stepped on by
something; but LBA 253879390758629 is plainly bogus: at 512 bytes per
sector that's ~115 PiB.
What do you get for either 'fdisk -l' or 'gdisk -l' or 'parted
/dev/sda u s p' for each device?
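A quick sanity check of both numbers, assuming 512-byte logical sectors (which is what fdisk -l reports for these drives). GPT puts the backup header in the last LBA of the disk, so a 7814037168-sector disk should have it at 7814037167; the other number, read as a last LBA, implies a device of roughly 115 PiB. This is only arithmetic on the figures quoted here, not a statement about which number the kernel prints first.

```python
SECTOR_SIZE = 512   # logical sector size, as reported by fdisk for both drives
PIB = 2 ** 50       # bytes per pebibyte

DISK_SECTORS = 7814037168                       # "7814037168 sectors" from fdisk -l
KERNEL_NUMBERS = (7814037167, 253879390758629)  # the pair from "GPT:... != ..."

# GPT keeps the backup (alternate) header in the disk's last LBA; LBAs start at 0.
expected_alt_lba = DISK_SECTORS - 1

def device_bytes_for_last_lba(lba: int) -> int:
    """Size in bytes of a device whose last addressable sector is `lba`."""
    return (lba + 1) * SECTOR_SIZE

for n in KERNEL_NUMBERS:
    size = device_bytes_for_last_lba(n)
    print(f"LBA {n}: implies {size} bytes (~{size / PIB:.2f} PiB); "
          f"is the real last LBA: {n == expected_alt_lba}")
```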
root@omv:~# fdisk -l /dev/sda
Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: USB3.0 DISK03
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
Device     Start        End    Sectors  Size Type
/dev/sda1   2048 7814037134 7814035087  3.7T Linux filesystem
root@omv:~# fdisk -l /dev/sdb
Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: USB3.0 DISK04
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7814035455 7814033408  3.7T Linux filesystem
root@omv:~# gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.3
Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 7814037168 sectors, 3.6 TiB
Model: USB3.0 DISK03
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 0 sectors (0 bytes)
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      7814037134   3.6 TiB     8300
root@omv:~# gdisk -l /dev/sdb
GPT fdisk (gdisk) version 1.0.3
Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sdb: 7814037168 sectors, 3.6 TiB
Model: USB3.0 DISK04
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 3693 sectors (1.8 MiB)
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      7814035455   3.6 TiB     8300
OK, so neither fdisk nor gdisk has any complaints about the GPT. And
yet the kernel is complaining. That's wrong and weird.
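Since the tools and the kernel disagree, one way to see exactly what is on disk is to read the primary GPT header directly and compare its alternate LBA with the device's actual last LBA. This is a minimal read-only sketch, assuming 512-byte logical sectors and the UEFI GPT header layout (signature "EFI PART" in LBA 1, the current, alternate, first-usable and last-usable LBAs as little-endian 64-bit fields starting at byte 24, and the disk GUID at byte 56). read_gpt_header is a hypothetical helper, not part of any existing tool; it only opens the device with mode "rb".

```python
import os
import struct
import sys
import uuid

SECTOR_SIZE = 512  # assumed logical sector size (fdisk reports 512 for both drives)

def read_gpt_header(path: str) -> dict:
    """Read the primary GPT header from LBA 1 without writing anything."""
    with open(path, "rb") as f:                 # read-only open
        device_bytes = f.seek(0, os.SEEK_END)   # works for block devices and image files
        f.seek(1 * SECTOR_SIZE)                 # primary header lives in LBA 1
        hdr = f.read(92)                        # the defined header fields span 92 bytes
    if len(hdr) < 92 or hdr[:8] != b"EFI PART":
        raise ValueError(f"{path}: no GPT signature at LBA 1")
    my_lba, alt_lba, first_usable, last_usable = struct.unpack_from("<4Q", hdr, 24)
    return {
        # GPT stores GUIDs mixed-endian, which is what uuid's bytes_le expects
        "disk_guid": str(uuid.UUID(bytes_le=hdr[56:72])).upper(),
        "my_lba": my_lba,
        "alternate_lba": alt_lba,
        "first_usable": first_usable,
        "last_usable": last_usable,
        "device_last_lba": device_bytes // SECTOR_SIZE - 1,
    }

if __name__ == "__main__":
    for dev in sys.argv[1:]:
        print(dev, read_gpt_header(dev))
```

The disk_guid field should match fdisk's "Disk identifier", which also ties each /dev/sdX node to a specific physical drive.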

Have the drives always been in these USB enclosures, for the life of
this Btrfs file system? They've always been connected to x86 and ARM
while in these USB enclosures?

The two drives have identical sector counts, 7814037168. Why don't they
have identical partition maps? On /dev/sdb, partition 1 starts at 2048,
and yet "first usable sector is 34" and that drive has 3693 sectors
free.
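The 3693 free sectors gdisk reports for /dev/sdb can be accounted for from its own output: the gap between the first usable sector and the partition start, plus the gap between the partition end and the last usable sector. A minimal sketch; free_sectors is just an illustrative name, and all bounds are treated as inclusive LBAs.

```python
def free_sectors(first_usable: int, last_usable: int,
                 part_start: int, part_end: int) -> tuple:
    """Unallocated sectors around a single partition; all bounds are inclusive LBAs."""
    before = part_start - first_usable  # sectors first_usable .. part_start - 1
    after = last_usable - part_end      # sectors part_end + 1 .. last_usable
    return before, after, before + after

# Values from the 'gdisk -l' output above.
print("/dev/sda", free_sectors(2048, 7814037134, 2048, 7814037134))
print("/dev/sdb", free_sectors(34, 7814037134, 2048, 7814035455))
```

The 1679-sector gap at the end of /dev/sdb is also exactly how many sectors shorter sdb1 is than sda1 (7814035087 - 7814033408).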

/dev/sda
7814037134-2048+1=7814035087 sectors, 7814035087*512=4000785964544 bytes

/dev/sdb
7814035455-2048+1=7814033408 sectors, 7814033408*512=4000785104896 bytes

(The End sector is inclusive, hence the +1; these sector counts match
the Sectors column from fdisk.)

The Btrfs super provided for one of them (I can't tell which it's for):

dev_item.total_bytes    4000785104896

If that's the value for /dev/sda, it's wrong but safe, i.e. the
partition is 859648 bytes (1679 sectors) bigger than what Btrfs says it
should be, and Btrfs will live inside its own dev size constraints. If
it's the value for /dev/sdb, it matches that partition's size exactly.
But we need clarification. Can you provide the super block for both
/dev/sda and /dev/sdb unambiguously with the above reported gdisk/fdisk
outputs?
Note that the /dev/sda and /dev/sdb designations can change if the
drives have been disconnected/reconnected or the system rebooted since
those commands were issued. So it's important to make certain which
partition map goes with which super, because no matter what, something
is not exactly correct and probably should be fixed.
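The size comparison can be written out as a small sketch. It assumes 512-byte sectors and counts the End sector inclusively, the same way fdisk's Sectors column does (7814037134 - 2048 + 1 = 7814035087 for /dev/sda1). partition_bytes and classify are illustrative names; the check only compares sizes and can't tell you which drive the super came from.

```python
SECTOR_SIZE = 512
TOTAL_BYTES = 4000785104896  # dev_item.total_bytes from the super quoted above

# (start, end) sectors from fdisk -l; end is inclusive
PARTITIONS = {
    "/dev/sda1": (2048, 7814037134),
    "/dev/sdb1": (2048, 7814035455),
}

def partition_bytes(start: int, end: int) -> int:
    """Byte size of a partition given inclusive start/end sectors."""
    return (end - start + 1) * SECTOR_SIZE

def classify(part_bytes: int, btrfs_bytes: int) -> str:
    """Compare the partition size with the size Btrfs believes the device has."""
    slack = part_bytes - btrfs_bytes
    if slack == 0:
        return "exact match"
    if slack > 0:
        return f"partition is {slack} bytes larger than Btrfs expects (safe)"
    return f"partition is {-slack} bytes SMALLER than Btrfs expects (unsafe)"

for name, (start, end) in PARTITIONS.items():
    size = partition_bytes(start, end)
    print(f"{name}: {size} bytes -> {classify(size, TOTAL_BYTES)}")
```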

And also as a sanity test:

sudo btrfs rescue super -v /dev/sda1

It only needs to be run on one of the devices, but both need to be
present and unmounted. This is a read-only command to verify all six
supers are valid.
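For context on "all six supers": Btrfs keeps up to three superblock copies per device, at 64 KiB, 64 MiB and 256 GiB from the start of the device (a copy only exists if the device is large enough), so two 3.7 TiB devices carry six. The sketch below only reads, checking each slot for the Btrfs magic "_BHRfS_M" at byte 64 and for the bytenr field at byte 48 pointing back at the slot itself. It does not verify checksums or generations the way btrfs rescue super does, so treat it as an illustration of the layout, not a replacement; scan_supers is a hypothetical name.

```python
import os
import struct

BTRFS_MAGIC = b"_BHRfS_M"
# Superblock slots: primary at 64 KiB, mirrors at 64 MiB and 256 GiB.
SUPER_OFFSETS = (64 * 1024, 64 * 1024 ** 2, 256 * 1024 ** 3)
SUPER_SIZE = 4096  # a Btrfs superblock occupies 4 KiB

def scan_supers(path: str) -> list:
    """Return (offset, magic_found, bytenr_ok) for each slot that fits on the device."""
    results = []
    with open(path, "rb") as f:              # read-only; never write to the suspect device
        size = f.seek(0, os.SEEK_END)
        for off in SUPER_OFFSETS:
            if off + SUPER_SIZE > size:      # this copy only exists on big enough devices
                continue
            f.seek(off)
            sb = f.read(SUPER_SIZE)
            found = sb[64:72] == BTRFS_MAGIC
            # bytenr records where this copy lives; it should equal the slot offset
            bytenr = struct.unpack_from("<Q", sb, 48)[0]
            results.append((off, found, found and bytenr == off))
    return results
```

Point it at the device that actually holds the file system (the partition, e.g. /dev/sda1), as root, with the file system unmounted.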

Last, I wonder if there's some weird bug. Any chance you can update
the Pi to kernel 4.19.97-1-ARCH and see whether the same kernel GPT
error messages appear? You won't need to mount the volume (and I don't
recommend trying yet anyway); just connect the devices individually
and report back what the kernel says about each drive as you connect
them. I do find it a bit hard to believe a GPT-related bug could exist
in 4.19.75, but it's worth a shot. For what it's worth, I upgraded my
Pi to 4.19.97 and it's fine.

Anyway, I wanna be extremely deliberate. It's a bit tedious. But
having monitored the linux-raid@, LVM, linux-xfs@ and linux-btrfs@
upstream lists, I've seen that user-induced data loss from getting
panicky and doing things too fast is extremely common. I'm also
intentionally verbose so as to invite anyone reading to call b.s.;
it's really easy to make simple stupid mistakes.

-- 
Chris Murphy