I know there's lots of info available about dual boot, but not much I could find on Dual Root.

What is Dual Root? It's a machine with 2 "root" disks where the second one is a hot standby: in the event of a root disk failure, the second disk can be booted very quickly. This makes recovery very fast, and replacing the bad drive and rebuilding afterwards is straightforward. And no new install needed!

I have now set this up on a few machines and it's working well, so I wrote up some notes and am sharing them in case others interested in doing something similar find them useful.

My notes explaining how to do this are available here: https://github.com/gene-git/blog

best

gene
Hi Gene,

out of curiosity: where do you see the advantages of such a setup compared to having your root filesystem on a RAID1?

Regards,

Uwe

On 04.03.23 at 18:56, Genes Lists wrote:
I know there's lots of info available about dual boot - but not much I could find on Dual Root.
[..]
On 3/4/23 13:00, Uwe Sauter wrote:
Hi Gene,
out of curiosity: where do you see the advantages of such a setup compared to having your root filesystem on a RAID1?
I could be wrong, but I don't believe the <esp> is on the RAID1, is it? Dual root is dual everything - esp, root, boot, the whole lot.
On 3/4/23 13:05, Genes Lists wrote:
Could be wrong, but I don't believe the <esp> is on RAID1 is it? Dual root is dual everything - esp, root, boot, the whole lot.
You can clearly have an esp on each RAID disk, so this could work as well - recovery might be a little different from what I did, but it seems feasible.

Dual root works to enhance any existing system and only requires setting up the new disk - just provide the disk and it will work. So that's one advantage, it seems - and the disks can be different sizes as well.

gene
On 04.03.23 at 19:05, Genes Lists wrote:
On 3/4/23 13:00, Uwe Sauter wrote:
Hi Gene,
out of curiosity: where do you see the advantages of such a setup compared to having your root filesystem on a RAID1?
Could be wrong, but I don't believe the <esp> is on RAID1 is it? Dual root is dual everything - esp, root, boot, the whole lot.
The usual Linux MD-RAID can have its metadata placed at different positions in the partition (see man (8) mdadm, option "-e, --metadata").

Knowing this, it is no problem to create a partition on each disk of type EF00, create a RAID1 with metadata version 1.0 (at the end of the partition) using those partitions, and format that MD device with VFAT. The mountpoint should be /boot/efi. Thus EFI will see two VFAT partitions with the correct type, but Linux will keep the content synchronized.

There is at least one more thing to configure: /etc/mdadm.conf should include a line for this MD device. Best would be to reference the MD device by UUID.

It might be required to also configure the kernel cmdline to include options to assemble the device. But I might be confusing this with RHEL (dracut) based distributions. I think Arch's mkinitcpio will use /etc/mdadm.conf when properly configured…
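For concreteness, here is a minimal sketch of the commands such a setup would involve. It is an illustration under assumptions, not a tested recipe: the device names (/dev/nvme0n1, /dev/nvme1n1) and the 512M size are placeholders, and the mount point follows the /boot/efi suggestion above.

# Create an EFI System (EF00) partition on each disk (sizes/devices are examples).
sgdisk -n 1:0:+512M -t 1:EF00 /dev/nvme0n1
sgdisk -n 1:0:+512M -t 1:EF00 /dev/nvme1n1

# RAID1 with metadata 1.0 puts the md superblock at the END of the partition,
# so the firmware still sees a plain VFAT filesystem at the start.
mdadm --create /dev/md/esp --level=1 --raid-devices=2 --metadata=1.0 \
      /dev/nvme0n1p1 /dev/nvme1n1p1

mkfs.vfat -F 32 /dev/md/esp
mkdir -p /boot/efi
mount /dev/md/esp /boot/efi

# Record the array (by UUID) in mdadm.conf so it is assembled at boot.
mdadm --detail --scan >> /etc/mdadm.conf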
On 3/4/23 13:21, Uwe Sauter wrote:
The usual Linux MD-RAID can have its metadata placed on different positions in the partition (see man (8) mdadm, option "-e, --metadata").
[..]
This could be a nicer way to go on new installs, but it might be a bit tedious to do on existing systems. Dual root is at least simple to add to any existing machine just by adding a little space. I just added a small 500 GB M.2 NVMe drive to one server as its alternate root - it was painless and simple, and the system remained up aside from the few minutes needed to install the SSD :)

gene
On 3/4/23 19:37, Genes Lists wrote:
On 3/4/23 13:21, Uwe Sauter wrote:
The usual Linux MD-RAID can have its metadata placed on different positions in the partition (see man (8) mdadm, option "-e, --metadata").
[..]
This could be nicer way to go on new installs but might be a bit tedious to do on existing systems though.
I have this setup on all servers that do not have battery-backed HW RAID cards, and I use mdadm there. I use systemd-boot as the bootloader. It works well and can be done on an existing system with just a single reboot.

It is not easy - you have to create a degraded RAID1 on the new drive, rsync all data, boot from USB, rsync the changes that were made during the initial sync, boot from the degraded RAID, and convert the original drive into the second RAID1 member. You have to use efibootmgr to manually set up both boot entries; "bootctl update" will not work.

In this setup there is a risk that the UEFI firmware will write something to one of the partitions and the RAID1 will degrade, but on all four of my machines I have never experienced anything like this. Even if it happens, the system should boot without problems.

As for dual root, I do not think it is safe to rsync a running system. For example, postfix uses inode numbers for queue files [1], so you need to use postsuper [2] to fix the queue after the copy. All databases will sooner or later break, because they are well protected from sudden power loss but not from the situation where files are simultaneously copied and written by the database process. Other software that uses multiple file databases (like samba) will probably break too. It is just a matter of luck.

Your copy script is also missing flags for hard links, ACLs and extended attributes; you should use -axHAX --delete to create a proper mirror.

It is much better to use LVM: create snapshots for all mounted filesystems, mount them, copy, and delete them right after. Then, after booting from the second root, you will be more or less in the same situation as after an unexpected power loss - more or less, because it is still impossible to create all snapshots at exactly the same point in time.

Regards,
Łukasz

[1] https://marc.info/?l=postfix-users&m=105009113626092&w=2
[2] https://man.archlinux.org/man/postsuper.1.en
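As a concrete illustration of the snapshot-then-copy idea, here is a minimal sketch. It assumes an LVM layout with a volume group named vg0 and a logical volume named root, and a standby root mounted at /mnt/alt-root; all of these names are placeholders.

# Snapshot the live root so the copy source is frozen.
lvcreate --snapshot --size 5G --name root-snap /dev/vg0/root

mkdir -p /mnt/root-snap
mount -o ro /dev/vg0/root-snap /mnt/root-snap

# Mirror the snapshot (not the live filesystem) onto the standby root,
# preserving hard links (-H), ACLs (-A) and extended attributes (-X).
rsync -axHAX --delete /mnt/root-snap/ /mnt/alt-root/

umount /mnt/root-snap
lvremove -y /dev/vg0/root-snap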
On 3/4/23 18:08, Łukasz Michalski wrote:
I have this setup on all servers that do not have battery backed HW raid cards and use mdadm there. I use systemd-boot as bootloader. Works well and can be done on existing system with just a single reboot. It is not easy - you have to create degraded raid1 on new drive, rsync all data, boot from usb, rsync changes that were made during initial resync, boot from degraded raid and convert original drive to second raid1 member. You have to use efibootmgr to manually setup both boot entries, "bootctl update" will not work.
Thanks for sharing - indeed it does sound scary and brittle! Thank you also for the reminder on rsync - I will update the notes.

I agree very much with you on dynamic data (mail, databases, etc.); on my systems all such data lives on a RAID-6 separate from the root disk itself, which is only used for booting. Those filesystems are bind mounted as needed and ignored by the sync script, so they shouldn't be a problem. But your cautionary comment is definitely something to keep an eye on.

best,

gene
On 3/4/23 18:22, Genes Lists wrote:
But your cautionary comment is definitely something to keep an eye on.
I already have these concerns noted at the bottom of the notes - since you pointed it out, it would be better for me to highlight them and move them earlier in the notes.

thanks again,

gene
On 3/4/23 13:21, Uwe Sauter wrote:
The usual Linux MD-RAID can have its metadata placed on different positions in the partition (see man (8) mdadm, option "-e, --metadata").
This is intriguing for sure, but to be honest it has a bit of a brittle, hacky feel to it. My own preference is to have a clean duplicate esp partition on one or more of the RAID disks and not depend on the EFI content not getting overwritten by virtue of sitting at the start of a metadata 1.0 partition. Keep in mind, too, that the default metadata version is 1.2, not 1.0.

I understand that this doesn't provide RAID support for the esp itself, so for resilience I would use the dual /efi/boot approach and leave the rest of root on RAID. This just feels cleaner to me, and certainly more visible to any system admin.

Gene
Hello,

I believe this would only be possible with a hardware RAID 1; with software RAID 1 the Linux kernel has to boot before the software RAID is initialised. Dual root, as you have pointed out, should be fully redundant as long as you put the second esp behind the first in the boot priority as a fallback boot device.

--
Polarian
GPG signature: 0770E5312238C760
Website: https://polarian.dev
JID/XMPP: polarian@polarian.dev
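For readers unfamiliar with managing boot priority, a minimal sketch with efibootmgr follows. It assumes systemd-boot on the standby esp; the disk, partition and entry numbers are examples, so check your own efibootmgr output first.

efibootmgr -v    # list existing entries and the current BootOrder

# Add an entry for the standby esp (second disk, partition 1 in this example).
efibootmgr --create --disk /dev/nvme1n1 --part 1 \
           --label "Linux (standby esp)" \
           --loader '\EFI\systemd\systemd-bootx64.efi'

# Put the primary entry first and the standby entry right behind it as fallback.
efibootmgr --bootorder 0001,0002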
Hello,
My notes explaining how to do this are available here: https://github.com/gene-git/blog
If you could explain the benefits, we could write an ArchWiki page on this using the notes you have provided. I do believe that hardware RAID 1 would be better, but if you do not have the ability to do hardware RAID 1, then this does seem like a very good idea.

The issue is that you need to keep both disks synchronised, which I see you have done using a script, whereas RAID synchronises during writes, just as if you were writing directly to a single disk instead of two.

Another benefit of RAID 1 is that, because the data is contained on both disks, you theoretically double your read speeds while the write speeds remain the same (as it is not striped).

--
Polarian
GPG signature: 0770E5312238C760
Website: https://polarian.dev
JID/XMPP: polarian@polarian.dev
On 3/4/23 15:56, Polarian wrote:

Thanks. I mentioned a few benefits in my earlier replies; here's a quick summary:

Pros:
- clear and simple
- any system admin can understand what it is (no hidden gimmicks)
- can easily be added to any existing system with an available disk
- no requirements on the disk other than adequate size
- the 2 (or more) root disks do not need to be the same size
- the only downtime is being offline to add the new disk; configuration is all done on the live system doing its normal job
- it's tested and works - tested on a few different systems, some with RAID, some without

Cons:
- syncing is currently simple/imperfect. At the moment I use a simple script. This can definitely be improved using inotify.

Inotify options:
- inotify-tools package plus scripting
- custom daemon (C or Python both look quite doable)

Thanks.
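As a sketch of the inotify-tools option mentioned in the list above, something along these lines would watch the booted esp and mirror changes to the standby one. The mount points /efi0 and /efi1 are placeholders, and this is illustrative only, not the script from the notes.

#!/bin/bash
# Watch the booted esp and mirror any change to the standby esp.
BOOTED_ESP=/efi0   # placeholder: mount point of the booted esp
ALT_ESP=/efi1      # placeholder: mount point of the standby esp

while inotifywait -r -e create,modify,delete,move "$BOOTED_ESP"; do
    rsync -a --delete "$BOOTED_ESP"/ "$ALT_ESP"/
done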
Hello,

Would you like to see this reflected within the ArchWiki? I will speak to Erus about this tomorrow and ask where we could put your guide so other Arch users can use it.

--
Polarian
GPG signature: 0770E5312238C760
Website: https://polarian.dev
JID/XMPP: polarian@polarian.dev
On 3/4/23 17:48, Polarian wrote:
Hello,
Would you like to see this reflected within the ArchWiki, I will speak
If you all think it's worthwhile, sure, that would be great. My top to-do item on this one is to work on an inotify sync tool.

thanks :)

gene
On 3/4/23 17:48, Polarian wrote:
Hello,
Would you like to see this reflected within the ArchWiki, I will speak to Erus about this tomorrow and ask where we could stick your guide so other arch users could use it.
If you're still interested in setting up an ArchWiki page, I am comfortable now - I have all the features coded, tested, and working for me. I'd still appreciate wider testing, but it is working :)

thank you!

gene
On Sat, 04-03-2023 at 12:56 -0500, Genes Lists wrote:
I know there's lots of info available about dual boot [..]
For me there is an infinitely simpler solution for having a system in mirror mode with btrfs (obviously assuming that you cannot set up a hardware RAID1, because if you can, I completely discourage any other method).

The method is simple: you only need two partitions on each of the two disks. The first one on each disk is the ESP and the second one is the one you are going to use for the btrfs RAID. Then you simply mount the RAID1 across both btrfs partitions [1], and the ESP partition is also used as boot.

The only thing left to do at the end is to keep the ESP partitions synchronized, but you can do that with a pacman hook that syncs when you update the kernel or bootloader (to avoid a systemd timer).

[1]: https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices

--
Óscar García Amor | ogarcia at moire.org | http://ogarcia.me
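A minimal sketch of the layout described above, assuming two disks /dev/sda and /dev/sdb (device names are illustrative) with the ESP as partition 1 and btrfs as partition 2 on each:

# Format both ESPs (partition type EF00, created beforehand with your partitioner).
mkfs.vfat -F 32 /dev/sda1
mkfs.vfat -F 32 /dev/sdb1

# One btrfs filesystem spanning both second partitions, RAID1 for data and metadata.
mkfs.btrfs -d raid1 -m raid1 /dev/sda2 /dev/sdb2

# Either device can be mounted - they share the same filesystem UUID.
mount /dev/sda2 /mnt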
On 3/5/23 05:13, Óscar García Amor wrote: ...
The method is simple as you simply need two partitions on the two disks. The first one on each disk is the ESP and the second one is the one you are going to use for the btrfs raid. Then you simply mount the raid1 between both partitions btrfs[1] and the ESP partition you also use it as boot.
Hi Óscar,

This is a good approach, as there are systemd drivers for btrfs. It's clean, simple, and transparent.

I would keep separate boot partition(s) - using XBOOTLDR - and these can also be mirrored using btrfs. As you said, you then only need to sync the <esp>, and with kernels and initrds on the separate boot partition, the <esp> will rarely change.

Summary:
- 2 x <esp> - kept in sync
- boot - btrfs raid-1 (data and metadata)
- root - btrfs raid-1 (data and metadata)

I like it. The key to every approach with dual-disk boot capability is having a separate <esp> on each disk. Other than that, it's only a question of how much can safely be on mirrored disk and what's left to sync.

I would definitely consider this for a fresh install, or perhaps where the 2 disks can be set up separately from the existing boot disk - at least until those 2 are working - to minimize downtime.

Thanks!

gene
On Sun, 05-03-2023 at 06:04 -0500, Genes Lists wrote:
[..] I would keep separate boot partition(s) - using XBOOTLDR - these can also be mirrored using btrfs. As you said, you now only need to sync <esp> and with kernels and initrds on separate boot, the <esp> will rarely change.
Personally I prefer to use the ESP partition as /boot to avoid having a separate partition and to simplify the scheme. In the end, keep in mind that in /boot you are only going to store the kernel and the initrd, and if you have them on the ESP partition you avoid having to point the loader elsewhere; you can put this directly in the loader conf:

linux /vmlinuz-linux
initrd /initramfs-linux.img

In fact, since the boot line is exactly the same (you have a single UUID for the btrfs RAID filesystem), you can use exactly the same loader entry on both disks:

$ sudo btrfs filesystem show    # To get the UUID
Label: 'disk1'  uuid: 1b962b21-3130-498b-9543-e84c90f12fce
        Total devices 2 FS bytes used 3.86TiB
        devid    1 size 7.28TiB used 3.87TiB path /dev/sdb
        devid    2 size 7.28TiB used 2.56TiB path /dev/sdc

And the conf:

title   Arch Linux
linux   /vmlinuz-linux
initrd  /initramfs-linux.img
options root=UUID=1b962b21-3130-498b-9543-e84c90f12fce rootfstype=btrfs rootflags=subvol=root add_efi_memmap rw
Summary:
- 2 x <esp> - kept in sync
- boot - btrfs raid-1 (data and metadata)
- root - btrfs raid-1 (data and metadata)
I like it.
Summary:
- 2 x ESP in sync -> you can use a pacman hook
- btrfs raid-1 (data and metadata), remaining "partitions" as subvolumes

In fact, at the hook level you can add one like the example in the manual page [1] that runs after any installation and does an rsync. Since the rsync returns immediately if there are no changes, this ensures that the ESP partitions always stay in sync and that you will not miss any package that writes to the ESP without you noticing.

[1]: https://archlinux.org/pacman/alpm-hooks.5.html

--
Óscar García Amor | ogarcia at moire.org | http://ogarcia.me
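As a sketch of what such a hook could look like (the file name, trigger choice and the esp paths /efi0 and /efi1 are placeholders, not a tested configuration):

# /etc/pacman.d/hooks/95-sync-esp.hook

[Trigger]
Operation = Install
Operation = Upgrade
Operation = Remove
Type = Path
Target = *

[Action]
Description = Syncing alternate ESP...
When = PostTransaction
Exec = /usr/bin/rsync -a --delete /efi0/ /efi1/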
On 3/5/23 07:11, Óscar García Amor wrote:

Thanks Oscar - I edited my notes to show this as the preferred approach. It still needs more write-up, but I thought it best to get it up sooner rather than later.

Do you know if it would work to use separate /boot partitions, as I mention above (each XBOOTLDR), but raid-1 them together with btrfs? I imagine this would be fine but have not tested to confirm. If so, this would seem like a nice variation. These would also use the same loader configs.
On 3/5/23 07:11, Óscar García Amor wrote:
In fact at hook level you can put one like in the example of the manual [..]
Yes, I agree that hooks are useful, but they only catch things on package updates as far as I know. If you want to catch manual changes, like an edit to a loader file, then an inotify-based daemon might be a better approach.

best

gene
I have updated the notes, which now show the original way but also the approach suggested by Oscar (thank you) - this is a superior method, but a bit more painful for existing installs. This way has an <esp> on each disk along with btrfs raid1 for the rest, basically.

I have a working example doing this and it's pretty nice! Yes, I had to repartition and reformat 2 disks for my tests, but without actually doing it, it's all just theory :)

This had one little puzzling wrinkle that needed to be solved to make it viable, and it turned out to be quite tricky. The challenge is to identify which of the 2 <esp> partitions was actually booted, so the correct one can be bind mounted onto /boot. Now that I've solved that little puzzle, the rest is pretty straightforward.

I have a little more coding to do and will make the code available soon. I still have some work to do on syncing the <esp>, but now that we know which esp is the 'other' one and which is the 'current' one, that work can proceed nicely.

The notes are updated - thanks for the ideas and feedback, much appreciated (esp. Oscar).

The current version is now up on github: https://github.com/gene-git/blog

best

gene
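For the curious, one way to detect the booted esp is via the LoaderDevicePartUUID EFI variable that systemd-boot sets, then matching it against partition PARTUUIDs. The snippet below is only a sketch of that general idea under the systemd-boot assumption; it is not necessarily how dual-root-tool does it.

# Read the PARTUUID of the esp the firmware booted from (set by systemd-boot).
var=/sys/firmware/efi/efivars/LoaderDevicePartUUID-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f

# Skip the 4-byte attribute header, strip the UTF-16 NUL bytes, lowercase it.
booted_partuuid=$(tail -c +5 "$var" | tr -d '\0' | tr 'A-Z' 'a-z')

# Find the block device whose PARTUUID matches.
booted_dev=$(lsblk -rno NAME,PARTUUID | awk -v u="$booted_partuuid" '$2 == u {print "/dev/" $1}')

echo "Booted ESP: $booted_dev"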
On Sun, 05-03-2023 at 18:11 -0500, Genes Lists wrote:
Do you know if it would work to use separate /boot partitions, as I mention above, (each XBOOTLDR) but raid-1 them together with btrfs? I imagine this would be fine, but have not tested to confirm.
To tell you the truth, ever since ESP partitions have existed I have never used another partition as /boot, so I can't really contribute my own experience here. However, in theory it should work.
I have updated the notes which now shows the original way but also the approach suggested by Oscar (thank you) - this is a superior method but bit more painful for existing installs.
Yes, of course, because you have to repartition if you don't already have the system set up this way. The good thing is that with btrfs you can do all the work on one disk and then just add the second one as RAID1.

You have one btrfs partition/disk:

$ sudo btrfs filesystem show
Label: none  uuid: 14736aed-faa3-4f03-819e-24369e9bb34f
        Total devices 1 FS bytes used 384.00KiB
        devid    1 size 20.00GiB used 2.02GiB path /dev/sdb

Add the second partition/disk:

$ sudo btrfs device add -f /dev/sdc /mnt/data
$ sudo btrfs filesystem show
Label: none  uuid: 14736aed-faa3-4f03-819e-24369e9bb34f
        Total devices 2 FS bytes used 384.00KiB
        devid    1 size 20.00GiB used 2.02GiB path /dev/sdb
        devid    2 size 20.00GiB used 0.00B path /dev/sdc

Convert it into RAID1:

$ sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data
Done, had to relocate 4 out of 4 chunks
This way has <esp> on each disk along with btrfs raid1 for the rest basically.
I have a working example doing this and its pretty nice! Yes I had to repartition and reformat 2 disks for my tests but without actually doing it, its all just theory :)
This had one little puzzling wrinkle that needed to be solved to make this viable. And it turned out to be quite tricky. The challenge is to identify which of the 2 <esp> was actually booted, so the correct one can be bind mounted on to /boot.
Now that I solved that little puzzle, the rest is pretty straightforward.
I have a little more coding to do and will make the code available soon. I still have some work to do on syncing the <esp> but now we know which esp is the 'other' one and which is the 'current' one - that work can proceed nicely.
Interesting, I'll take a look at it when you upload the code.
The notes are updated -thanks for the ideas and feedback - much appreciated (esp. Oscar)
You are welcome!

--
Óscar García Amor | ogarcia at moire.org | http://ogarcia.me
On 3/6/23 02:50, Óscar García Amor wrote:
Interesting, I'll take a look at it when you upload the code.
I'd appreciate wider testing of the code - we all know that just because it works for me doesn't mean it will work everywhere with certainty. It would be super helpful if others could test it.

A safe and simple test is just to run the dual-root-tool script with no arguments. This can be run as a non-root user; it simply prints some information about the currently booted <esp>. It should work whether there is 1 <esp> or more.

I plan to upload the code today after some more local testing here. I will also make an AUR package which will provide both the tool and a systemd service to bind mount the currently booted esp onto /boot. Details are in the notes [1].

Thanks again for sharing ideas and suggestions - it is definitely making things a lot better!

best

gene

[1] https://github.com/gene-git/blog
On 3/4/23 12:56, Genes Lists wrote:
I know there's lots of info available about dual boot - but not much I could find on Dual Root.
[..]
I created a new repo for the code and provided an AUR package for it as well:

https://github.com/gene-git/dual-root
https://aur.archlinux.org/packages/dual-root

I'd appreciate wider testing of the currently booted ESP detection. Please run the script as non-root with no arguments (or -h for help):

dual-root-tool

It should identify the esp used to boot the current system and where it is mounted, and print them out. It should work regardless of the number of esp partitions on the system (1 or more).

There is a companion, bind-mount-efi, which will bind mount the current esp onto /boot if it is not already bound. It uses the "-b" option of the tool to do this.

See the README for more info on setting up dual root. There is some more coding still to do, but at this point I'm happy to share more widely.

Thanks again to all.

gene
You really can't replace incremental backups...

On 3/4/23 12:56, Genes Lists wrote:
I know there's lots of info available about dual boot - but not much I could find on Dual Root.
[..]
On Mon, 06-03-2023 at 21:30 -0500, Jonathan Whitlock wrote:
You really can't replace incremental backups...
I have always understood having a RAID1 as an immediate disaster-recovery system in case of catastrophic disk failure. It is obvious that this will never replace backups, as the purpose is radically different. The same goes for btrfs snapshots: they are a tool that lets you instantly go back to a previous point in time, but they are not a substitute for backups either. I guess we are all aligned on this.

Speaking of backups, I personally recommend restic [1] for this purpose.

Greetings.

[1]: https://restic.net/

--
Óscar García Amor | ogarcia at moire.org | http://ogarcia.me
On 3/7/23 03:50, Óscar García Amor wrote:
On Mon, 06-03-2023 at 21:30 -0500, Jonathan Whitlock wrote:
This is about having a computer that is resilient to root drive failure. It is in addition to doing backups, certainly not a replacement :)

gene
All,

I have updated the code and now provide an inotify-based daemon to sync alternate <esp>s, plus a systemd service unit to run it. I would very much appreciate it if others ran this - with the test option it does nothing but print what would happen, and it can be run as a non-root user.

It is working here on a test machine set up with 2 disks, 2 esps, and btrfs raid1 for the root filesystem across the 2 disks (thanks Oscar). The esps are mounted on /efi0 and /efi1, and the currently booted esp is bind mounted on /boot. One daemon bind mounts /boot automatically from the currently booted esp, and the other monitors for changes and updates any alternate esp(s).

I consider this feature complete now :)

Again, thanks for the feedback and ideas. All code, docs, and unit files are available via:

the AUR: https://aur.archlinux.org/packages/dual-root
github: https://github.com/gene-git/dual-root

best

gene
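For readers who want to see the general shape of wiring such a daemon into systemd, here is an illustrative unit. The unit name and the esp-sync program it starts are hypothetical placeholders; the dual-root package ships its own service units, so refer to those rather than this sketch.

# /etc/systemd/system/esp-sync.service  (illustrative only)

[Unit]
Description=Mirror the booted ESP onto the alternate ESP(s)
After=local-fs.target

[Service]
Type=simple
# /usr/local/bin/esp-sync is a hypothetical placeholder for an inotify-based sync daemon
ExecStart=/usr/local/bin/esp-sync
Restart=on-failure

[Install]
WantedBy=multi-user.target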
participants (6)
- Genes Lists
- Jonathan Whitlock
- Polarian
- Uwe Sauter
- Óscar García Amor
- Łukasz Michalski