[arch-general] btrfs kernel incompatibility?

Chris Murphy lists at colorremedies.com
Fri Feb 7 22:10:13 UTC 2020


On Fri, Feb 7, 2020 at 1:45 PM Simeon Felis <arch-general at sfelis.de> wrote:
>
>
>
> On 06.02.20 at 23:03, Chris Murphy wrote:
> > There's not enough information to know what's going on yet. My strong
> > advice is to make no further changes (no writes) to either block
> > device until you understand exactly what's going on. Every write
> > increases the chance of permanently losing the file system.
> >
> >
> > On Thu, Feb 6, 2020 at 2:01 PM Simeon Felis <arch-general at sfelis.de> wrote:
> >>
> >> 4.19.75 dmesg:
> >>
> >> [   17.707873] GPT:Primary header thinks Alt. header is not at the end of the disk.
> >> [   17.707889] GPT:7814037167 != 253879390758629
> >> [   17.707895] GPT:Alternate GPT header not at the end of the disk.
> >> [   17.707902] GPT:7814037167 != 253879390758629
> >> [   17.707907] GPT: Use GNU Parted to correct GPT errors.
> >> [   17.707977]  sdb: sdb1
> >> [   17.709682] GPT:Primary header thinks Alt. header is not at the end of the disk.
> >> [   17.709697] GPT:7814037167 != 253879390758629
> >> [   17.709703] GPT:Alternate GPT header not at the end of the disk.
> >> [   17.709710] GPT:7814037167 != 253879390758629
> >
> > This is terrible error reporting (by the kernel) in that it's not
> > clearly stating whether the primary GPT is reporting 7814037167 or
> > 253879390758629. No shit, they aren't the same. Usually this error
> > means the backup GPT at the end of the drive has been stepped on by
> > something; but LBA 253879390758629 is plainly bogus, that's ~115PiB.
> >
> > What do you get for either 'fdisk -l' or 'gdisk -l' or 'parted
> > /dev/sda u s p' for each device?
>
> root@omv:~# fdisk -l /dev/sda
> Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
> Disk model: USB3.0 DISK03
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: gpt
> Disk identifier: 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
>
> Device     Start        End    Sectors  Size Type
> /dev/sda1   2048 7814037134 7814035087  3.7T Linux filesystem
> root@omv:~# fdisk -l /dev/sdb
> Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
> Disk model: USB3.0 DISK04
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: gpt
> Disk identifier: 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
>
> Device     Start        End    Sectors  Size Type
> /dev/sdb1   2048 7814035455 7814033408  3.7T Linux filesystem
>
> root@omv:~# gdisk -l /dev/sda
> GPT fdisk (gdisk) version 1.0.3
>
> Partition table scan:
>   MBR: protective
>   BSD: not present
>   APM: not present
>   GPT: present
>
> Found valid GPT with protective MBR; using GPT.
> Disk /dev/sda: 7814037168 sectors, 3.6 TiB
> Model: USB3.0 DISK03
> Sector size (logical/physical): 512/4096 bytes
> Disk identifier (GUID): 63E9CA8E-F1B4-8A41-9C21-F058C1AC0783
> Partition table holds up to 128 entries
> Main partition table begins at sector 2 and ends at sector 33
> First usable sector is 2048, last usable sector is 7814037134
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 0 sectors (0 bytes)
>
> Number  Start (sector)    End (sector)  Size       Code  Name
>    1            2048      7814037134   3.6 TiB     8300
> root@omv:~# gdisk -l /dev/sdb
> GPT fdisk (gdisk) version 1.0.3
>
> Partition table scan:
>   MBR: protective
>   BSD: not present
>   APM: not present
>   GPT: present
>
> Found valid GPT with protective MBR; using GPT.
> Disk /dev/sdb: 7814037168 sectors, 3.6 TiB
> Model: USB3.0 DISK04
> Sector size (logical/physical): 512/4096 bytes
> Disk identifier (GUID): 9904ABA2-B9F8-4544-9699-9935CE8A7B1F
> Partition table holds up to 128 entries
> Main partition table begins at sector 2 and ends at sector 33
> First usable sector is 34, last usable sector is 7814037134
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 3693 sectors (1.8 MiB)
>
> Number  Start (sector)    End (sector)  Size       Code  Name
>    1            2048      7814035455   3.6 TiB     8300

OK so neither fdisk nor gdisk has any complaints about the GPT. And
yet the kernel is complaining. That's wrong and weird.
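
To make the two numbers in the kernel message concrete, here is a
minimal sketch using only values quoted in this thread. The helper
names are made up for illustration. It checks which of the two LBAs
agrees with the sector count fdisk reports and how large a disk the
other one would imply.

```python
import re

# dmesg lines copied from the report quoted above.
DMESG = """\
[   17.707889] GPT:7814037167 != 253879390758629
[   17.707902] GPT:7814037167 != 253879390758629
"""

TOTAL_SECTORS = 7814037168  # from 'fdisk -l', same for both drives
SECTOR_SIZE = 512           # logical sector size from 'fdisk -l'


def gpt_mismatches(text):
    """Return the distinct (left, right) LBA pairs from 'GPT:a != b' lines."""
    pairs = re.findall(r"GPT:(\d+) != (\d+)", text)
    return sorted({(int(a), int(b)) for a, b in pairs})


def lba_to_pib(lba, sector_size=SECTOR_SIZE):
    """Size in PiB of a device whose last LBA is `lba`."""
    return (lba + 1) * sector_size / 2**50


for left, right in gpt_mismatches(DMESG):
    # The last LBA of a disk with N sectors is N - 1.
    print("left is the real last LBA:", left == TOTAL_SECTORS - 1)
    print("right implies a disk of about %.1f PiB" % lba_to_pib(right))
```

So 7814037167 is exactly where the backup header belongs on a
7814037168-sector drive, and the other value is the bogus one.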

Have the drives always been in these USB enclosures, for the life of
this Btrfs file system? And were they in these same enclosures
whenever they were connected to the x86 and ARM systems?

The two drives have the same sector count, 7814037168. Why don't they
have identical partition maps? The /dev/sdb drive says partition 1
starts at 2048, and yet "first usable sector is 34" and that drive has
3693 sectors free.
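
A small sketch of where those 3693 free sectors on /dev/sdb sit, again
using only the gdisk values quoted above (the function name is made up
for illustration). It splits the usable range into the gap before the
partition and the gap after it.

```python
# Values from 'gdisk -l /dev/sdb' as quoted above.
FIRST_USABLE = 34
LAST_USABLE = 7814037134
PART_START = 2048
PART_END = 7814035455   # inclusive
SECTOR_SIZE = 512


def free_sectors(first_usable, last_usable, start, end):
    """Usable sectors before and after a single partition [start, end]."""
    before = start - first_usable   # sectors first_usable .. start-1
    after = last_usable - end       # sectors end+1 .. last_usable
    return before, after


before, after = free_sectors(FIRST_USABLE, LAST_USABLE, PART_START, PART_END)
total = before + after
print("free before partition:", before)
print("free after partition: ", after)
print("total free: %d sectors (%.1f MiB)" % (total, total * SECTOR_SIZE / 2**20))
```

Most of the difference between the two maps is therefore at the end of
the drive: the /dev/sdb partition stops short of the last usable
sector, while the /dev/sda partition runs right up to it.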

/dev/sda
7814037134-2048+1=7814035087, 7814035087*512=4000785964544

/dev/sdb
7814035455-2048+1=7814033408, 7814033408*512=4000785104896

(The end sector is inclusive, so the count is end - start + 1; these
agree with the Sectors column fdisk prints.)

The Btrfs super provided for one of them (I can't tell which it's for)

dev_item.total_bytes    4000785104896

If that's the value for /dev/sda, it doesn't match the partition but
it's safe, i.e. the partition is bigger than what Btrfs says it should
be, and Btrfs will live inside its own dev size constraints. If it's
the value for /dev/sdb, it matches that partition exactly. But we need
clarification. Can you provide the super block for both /dev/sda and
/dev/sdb unambiguously with the above reported gdisk/fdisk outputs?
Note the /dev/sda /dev/sdb designations can change if the drives have
been disconnected/reconnected or the system rebooted since those
commands were issued. So it's important to make certain which
partition map goes with which super, because no matter what something
is not exactly correct and probably should be fixed.
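
As a cross-check of the sizes, here is a minimal sketch that counts
sectors the way fdisk's Sectors column does (end - start + 1) and
compares each partition against the one dev_item.total_bytes value
that was posted. All numbers come from the outputs quoted above; the
function name is made up for illustration.

```python
SECTOR_SIZE = 512
DEV_ITEM_TOTAL_BYTES = 4000785104896  # from the one super that was posted

# (start, end) of partition 1 from 'fdisk -l'; the end sector is inclusive.
PARTITIONS = {
    "/dev/sda1": (2048, 7814037134),
    "/dev/sdb1": (2048, 7814035455),
}


def partition_bytes(start, end, sector_size=SECTOR_SIZE):
    """Partition size in bytes, counting both the start and end sectors."""
    return (end - start + 1) * sector_size


for name, (start, end) in PARTITIONS.items():
    size = partition_bytes(start, end)
    diff = size - DEV_ITEM_TOTAL_BYTES
    # diff >= 0 means the partition is at least as large as Btrfs expects.
    print(name, size, "partition minus total_bytes:", diff)
```

A negative difference for whichever device the super really belongs to
would be the unsafe case, which is why pairing each super with the
right partition map matters.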

And also as a sanity test:

sudo btrfs rescue super -v /dev/sda

It only needs to be run on one of the devices; but both need to be
present and unmounted. This is a read-only command to verify all six
supers (three copies on each device) are valid.
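
For anyone curious where those copies live, here is a rough sketch
that opens a device or image read-only and looks for the Btrfs magic
at the three standard superblock offsets (64KiB, 64MiB, 256GiB). It is
not a substitute for the btrfs command above: it checks only the magic
string, not the checksum or contents, and the function name and the
path in the usage line are made up for illustration.

```python
import os

BTRFS_MAGIC = b"_BHRfS_M"
MAGIC_OFFSET = 0x40                                  # magic field inside a superblock
SUPER_OFFSETS = (0x10000, 0x4000000, 0x4000000000)   # 64KiB, 64MiB, 256GiB
SUPER_SIZE = 4096


def super_magic_present(path):
    """Return {offset: bool} for each superblock copy that fits on `path`.

    The file is opened read-only and nothing is written.
    """
    results = {}
    with open(path, "rb") as dev:
        size = dev.seek(0, os.SEEK_END)
        for off in SUPER_OFFSETS:
            if off + SUPER_SIZE > size:
                continue  # this copy would lie beyond the end of the device
            dev.seek(off + MAGIC_OFFSET)
            results[off] = dev.read(len(BTRFS_MAGIC)) == BTRFS_MAGIC
    return results
```

Usage would be something like super_magic_present("/dev/sdX1") as
root, with sdX1 standing in for whichever partition holds the file
system.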

Last, I wonder if there's some weird bug. Any chance you can update
the Pi to kernel 4.19.97-1-ARCH and see if the same kernel GPT error
messages appear? You won't need to mount the volume (and I don't
recommend trying yet anyway), just connect the devices individually
and report back what the kernel says about each drive as you connect
them. I do find it a bit hard to believe a GPT related bug could exist
in 4.19.75...but worth a shot. For what it's worth, I did an upgrade
to 4.19.97 on my Pi and it's fine.

Anyway, I wanna be extremely deliberate. It's a bit tedious. But
having monitored the linux-raid@, LVM, linux-xfs@, and linux-btrfs@
upstream lists, I've seen that it's extremely common to get
user-induced data loss from getting panicky and doing things too fast.
And also I'm intentionally verbose so as to invite anyone reading to
call b.s. - it's really easy to make simple stupid mistakes.

-- 
Chris Murphy

