[arch-releng] btrfs support in aif

Thu Dec 16 23:03:39 EST 2010

On Tue, Dec 14, 2010 at 3:18 AM, Dieter Plaetinck <dieter at plaetinck.be> wrote:
> On Mon, 13 Dec 2010 19:02:17 -0600
> C Anthony Risinger <anthony at extof.me> wrote:
>> On Sun, Dec 12, 2010 at 9:29 AM, Dieter Plaetinck
>> <dieter at plaetinck.be> wrote:
>>
>> /subvol
>>
>> ... works, but not ...
>>
>> /nested/subvol
>
> is this the only limitation with named subvols?
> i'm probably missing something, but since the real btrfs is basically
> used as a container which will contain subvolumes which are the
> ones you will mount in your filesystem (in arbitrary places), why does
> it matter where in the btrfs tree the subvolumes are defined? what's
> wrong with putting subvolumes in the btrfs root?

yeah it doesn't really matter i guess, i just don't like the restriction :-)

there was a small patch to allow a mount option like
"subvol=path/to/subvol", but it never made it in; no one is interested
enough i guess... it's easy to use the id, and then i can make a
structure that makes sense to me, instead of a huge list of
subvols_that_are_namespaced_like_this.

i do think the naming can be useful, for the same reason i used
__active in the current hook... change it to a new subvol and
everything still works.  but now, i just move symlinks, and it
accomplishes the same, but more flexible.

> it's too bad separate subvolumes don't get their own devicefiles (and
> hence, associated /dev/by-uuid/ and /dev/by-label/ symlinks) that
> would make my life easier.

indeed.  i don't know what's on the map for this... the UUID seems to
reference the whole FS; when you run blkid against it, it does have
SUB_UUID entries IIRC.  i don't know how it all connects.

> are the id's "stable"? i.e, suppose i initialise variable
> last_id=0
> for every subvolume the user wants to create, can i just do
> id=last_id+1 and assume that id will stay the same for this volume?
> (at least for the duration of the installation, i.e. not worrying about
> snapshot rollbacks etc) (and apparently, skip id 5 because that's
> already taken)

that *should* work, but only from my experiences.  0 is a special
case, i think it gets remapped to 5 internally... the btrfs root is
"special 5" for whatever reason, i know that from a message Chris sent
on the list.  subvols however, start at 256, then N+1 from there.
when you remove a subvol, i've noticed that the id can sometimes be
near immediately recycled (next snapshot has that id), but i don't
know the consistency of that.
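
since ids may get recycled, a safer route than counting from last_id is to read the ids back after creating each subvol.  a rough python sketch of that idea follows; the function name and the sample listing are made up for illustration, and the line shape ("ID <n> ... path <p>") is an assumption about what `btrfs subvolume list` prints, so check it against your btrfs-progs before relying on it:

```python
import re

# assumed line shape: "ID <id> ... path <path>" (verify against your btrfs-progs)
_LINE = re.compile(r"^ID\s+(\d+)\s+.*?\bpath\s+(.+)$")


def parse_subvol_list(text):
    """Return a {path: id} mapping parsed from subvolume listing text."""
    ids = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            ids[match.group(2)] = int(match.group(1))
    return ids


# hypothetical listing, only here to exercise the parser
sample_listing = """\
ID 256 top level 5 path vols/256/fs
ID 257 top level 5 path vols/257/fs
"""

subvol_ids = parse_subvol_list(sample_listing)
print(subvol_ids["vols/256/fs"])
```

looking the id up by path this way avoids assuming anything about how the next id is allocated.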

>> the hook i'm soon to release doesn't support names; it's just too
>> inflexible.  btw, for clarity to anyone else, the default subvol is
>> not the same as the btrfs root (though initially they are the same).
>> default subvol is any subvol marked as the _mount_ default (and later
>> mountable via `subvol=.` or none at all)... the real root will always
>> be subvolid = 0 or 5.
>
> subvolid 5 ?

yeah ^^^^^, 0 -> 5 remap, but i'd have to check the code to be sure...
i know they both work, and they both mount the btrfs root.
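
to make the 0 -> 5 point concrete, here is a small python sketch that only builds (never runs) a mount command for the btrfs root.  the helper names and the /dev/sdX1 device are placeholders for illustration, and the remap itself is taken from the description above rather than checked against the kernel source:

```python
BTRFS_ROOT_ID = 5  # the id that 0 is said to be remapped to


def normalize_subvolid(subvolid):
    """Treat 0 as an alias for the btrfs root id (assumption from this thread)."""
    return BTRFS_ROOT_ID if subvolid == 0 else subvolid


def mount_argv(device, mountpoint, subvolid=0):
    """Build an argv list that mounts the given subvolume id; nothing is executed."""
    option = "subvolid={}".format(normalize_subvolid(subvolid))
    return ["mount", "-t", "btrfs", "-o", option, device, mountpoint]


# placeholder device; /var/lib/btrfsadm is the mount point used in this thread
root_cmd = mount_argv("/dev/sdX1", "/var/lib/btrfsadm")
print(" ".join(root_cmd))
```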

>> the hook (and other impls i'd assume) use the btrfs root for volume
>> management, the "sub-root".  the actual "system root" is just one of
>> many subvols in the pool, and may change between reboots.  at the very
>> least, if AIF created a subvol, marked as default, and installed into
>> that subvol, my hook could then safely "rotate" the user into a more
>> advanced configuration...
>
> should i give it a specific name? or just a subvol marked as default?
> what kind of advanced config do you mean? any stuff that makes more
> sense to be done during the installation step? or does it become too
> specific to your hook?

we could choose a name, but it wouldn't be that critical really;
marking it default would be enough.  when the user decides to enable
my hook they will be in a running system... all i have to do is mount
the btrfs root, and snapshot / wherever i need, i don't even need to
know/care the name/id of /.

by advanced config i mean the full layout i described previously; if
the hook's not enabled, you don't really need such an extensive
structure... so i just mean moving stuff around in the subroot...
which is possible since the system is in a subvol to begin with :-)

although, i don't think it really hurts to do any/all of it either.  i
mean, ATM this hook is archlinux specific, and i hope to get some
officialness to it in time.  everything in the subroot is irrelevant
except to power users anyway... if you built the full layout in AIF
right away, it would give the user a good foundation to manage
snapshots, even if they didn't use the hook.  also, once it's released
i intend to suggest it on the btrfs list, both for review and also to
propose as a cross distro layout to handle these use cases.

if you don't want to do the full layout, something bare bones like
this would be good too:

--------------------------------------------------------
(btrfs root)
|
|-- HEAD -> refs/rw/PRI
|-- refs
|   `-- rw
|       `-- PRI -> ../../vols/256
`-- vols
    `-- 256
        `-- fs (AIF INSTALLS HERE, AND SETS AS DEFAULT)
--------------------------------------------------------

if you did that, then i could just fill it out.  note, since we are
installing into a subvol, /boot would of course need to be a separate
partition/device.  AIF might want to optionally create the ORIG
snapshot too, giving the user a clean install image for whatever they
want.
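
the bare-bones layout above can be mocked up with plain directories and relative symlinks.  this python sketch is only a model: it runs in a temp dir, `build_layout` is a made-up name, and in a real install `fs` would be a btrfs subvolume (created with `btrfs subvolume create`) rather than an ordinary directory:

```python
import os
import tempfile


def build_layout(root, vol_id=256):
    """Create the bare-bones tree; plain directories stand in for subvolumes."""
    vol = str(vol_id)
    os.makedirs(os.path.join(root, "vols", vol, "fs"))
    os.makedirs(os.path.join(root, "refs", "rw"))
    # relative links keep working wherever the btrfs root gets mounted
    os.symlink(os.path.join("..", "..", "vols", vol),
               os.path.join(root, "refs", "rw", "PRI"))
    os.symlink(os.path.join("refs", "rw", "PRI"), os.path.join(root, "HEAD"))


demo_root = tempfile.mkdtemp()
build_layout(demo_root)

# HEAD -> refs/rw/PRI -> ../../vols/256
head_target = os.path.realpath(os.path.join(demo_root, "HEAD"))
print(os.path.relpath(head_target, os.path.realpath(demo_root)))
```

switching the system to another volume is then just a matter of repointing PRI (or HEAD), without touching the volumes themselves.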

>> but really, under no cases do i think the system should be installed
>> into the btrfs root, i wouldn't even offer it at install time.  if the
>> user wants that they can do it themselves... they will be happy it's in a
>> subvol.
>
> okay, fair enough. i will make it so that you can't choose a mountpoint
> for the actual btrfs, only for subvolumes.  if the btrfs guys ever make
> things more flexible, it's fairly trivial for us to adapt aif/your hook.

yeah, and i hope they do.  i don't think it's a problem if the user
wants to mount the btrfs root (that's what "/var/lib/btrfsadm" is in
my structure) because they need access to that in order to maintain
their snapshots.  tbh, it might be worth encouraging this (or
explaining in AIF); i'm really trying to group everyone toward a
single, good way of doing things.

>> yup, i think that's everything for now!  ssd should enable
>> automatically when btrfs detects non rotating media.  and ssd_spread
>> is for cheaper flash i believe... i forget what the reason was.
>> compress we should be sure to note the CPU overhead of zlib (though
>> LZO patches will be in next kernel i believe, exciting), though for
>> many systems it may not matter.
>
> okay, but as per your advice in the previous paragraph, users won't be
> able to select a mountpoint for the btrfs itself, only the subvolumes.
> (or maybe I'll just "discourage" them with a warning message)

yeah i'm not too strong either direction; maybe not a discourage, just
an option with a _very_ clear message as to what it actually is, why
you might need it, and what its purpose is (ie. subvol management,
not data)

> o_O O_o

heh, i know;  wanted to get it on paper so i can more-or-less copy
paste to the forum thread/etc. :-)

> so, to paraphrase: in your hook, you build this kind of tree structure
> based on the btrfs devices you find (/pool) and subvolumes (/vols), and
> create some symlinks to organize everything (/refs, /HEAD); the idea
> being that this will make things more simple during the hook processing.

pretty much, but there are a couple specific benefits to the structure:

1) to clean up, all i have to do is compare a sorted list of pointers in
/refs to a sorted list of the /vols directory.  anything extra in
/vols is dangling (like git), and can safely be removed.  anything
extra in /refs is broken, and should maybe also be removed.
2) extending on ^^^^, the structure works very well for temporary
subvols (rollback mode/etc.) too.  i simply create a snapshot of my
rollback target (the snapshot user chose as a rollback base, keeping
the original clean) in the same way as normal snapshots (/vols),
except i DON'T add an entry in /refs... i point HEAD directly to it.
this is identical to a "detached head" in git.  while system is
running, if user decides to keep the snapshot, i simply make a symlink
entry in /refs for it, and update PRI/HEAD if necessary.  if user
doesn't want it, i don't do anything -- normal cleanup at boot time
will see it dangling, and remove it automatically.  very clean.
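
the cleanup comparison described above can be sketched in python roughly like this.  it is a model on a temp dir, not the hook's code: `scan` and the volume/ref names are invented for the example, and HEAD is deliberately ignored, so a detached-HEAD snapshot shows up as dangling just like in the description:

```python
import os
import tempfile


def scan(root):
    """Return (dangling volume names, broken ref paths) for a /vols + /refs tree."""
    vols_dir = os.path.join(root, "vols")
    refs_dir = os.path.join(root, "refs")
    vols = {os.path.realpath(os.path.join(vols_dir, name))
            for name in os.listdir(vols_dir)}
    referenced, broken = set(), []
    # os.walk does not follow symlinks by default, so links are only inspected
    for dirpath, dirnames, filenames in os.walk(refs_dir):
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.path.realpath(link)
            if target in vols:
                referenced.add(target)
            else:
                broken.append(os.path.relpath(link, root))
    dangling = sorted(os.path.basename(v) for v in vols - referenced)
    return dangling, sorted(broken)


# demo tree: 256 is referenced, 257 is not, and OLD points at a missing volume
tree = tempfile.mkdtemp()
for vol in ("256", "257"):
    os.makedirs(os.path.join(tree, "vols", vol, "fs"))
os.makedirs(os.path.join(tree, "refs", "rw"))
os.symlink(os.path.join("..", "..", "vols", "256"),
           os.path.join(tree, "refs", "rw", "PRI"))
os.symlink(os.path.join("..", "..", "vols", "999"),
           os.path.join(tree, "refs", "rw", "OLD"))

dangling, broken = scan(tree)
print(dangling, broken)
```

a real cleanup pass would then delete the dangling volumes and decide separately what to do with the broken refs.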

i know there was one more, but i write too damn much already :-)

> is this structure in memory only during execution of your hook, or does
> it all get written to disk (the btrfs root?) so that the real booted
> system will see it also?

no, it's on disk and persistent; this is critical.  however, the
booted system _does not_ see it, because it's "underneath" it.  the
fully booted system can only see whatever is in the "fs"
directory/subvol, because that's where its / starts.

> Well, the /var/lib/btrfsadm tree you described seems fairly
> non-standard,

yeah, that is sort of specific to my tool, and /var/lib seemed like an
appropriate place.  but like i said before, i kind of like the idea of
mounting the btrfs root... i want users to know what is going on and
why; i'm certainly open to suggestions for a more uniform location.

> but you seem to know what you're doing.
> If I understood correctly I don't need to worry about
> the /var/lib/btrfsadm tree, right?

not too much, /var/lib/btrfsadm is just an arbitrary mount point i
chose... but note: it's actually the btrfs root (subvol 0), my tool
will use it to manage subvols from within a running system.  like i
said, if you have some ideas for a more appropriate place to mount the
btrfs root, i'm all for it, and will have the hook look there by
default.

> So you can do your thing and I'll do mine, making sure to strongly
> recommend users to put all btrfs mountpoints in separate subvolumes.

yup, let's just make it very clear that _subvolumes_ are for data/etc.
and the btrfs root is for management of those subvolumes.

all this weekend and next week my fiance and son are going to the
grandparents early for holidays... so i plan on h@ckz0r1ng nonstop and
only sleeping when my feeble human body demands it...  this hook is
the first thing on my agenda, and the fun starts saturday afternoon
:-)

C Anthony