[arch-general] dmraid Partitions Lost - Recovered -> Howto

David C. Rankin drankinatty at suddenlinkmail.com
Thu Jun 25 23:49:31 EDT 2009


List,

	I thought I would pass this along should anyone else experience a loss of all partitions on a drive or array. May help somebody out someday:

dmraid Partition Loss with dmraid-1.0.0rc15

	While testing dmraid-1.0.0rc15 on a box with two separate dmraid arrays, I lost all partitions on the second array. The first array held an openSuSE install running dmraid-1.0.0rc14, while the second held Archlinux with dmraid-1.0.0rc15, where the testing was being done. All testing of dmraid-1.0.0rc15 on Archlinux went fine; the problem occurred when the machine was booted back into openSuSE. Whether you are using a raid setup or not, partition loss is serious business.

dmraid Partition Recovery

	Recovery of dmraid partitions proceeds in the same manner as recovering partitions from a single drive. If you haven't destroyed the information on the array, you should be able to put the pieces of the puzzle back together again. The basic outline is to locate and restore the partitions on the array and then reinstall the boot loader so your box is functional again. (Note: if you were smart enough to save the "fdisk -l" information for your drives, you can simply recreate the partitions on your array with fdisk from those values and be done.)

Tools Required

	Partition location and recovery software (I used testdisk)
		http://www.cgsecurity.org/
		http://www.cgsecurity.org/wiki/TestDisk_Download
		http://www.cgsecurity.org/testdisk-6.11.linux26.tar.bz2
		
	Rescue CD for your OS (generally your install CD/DVD, or knoppix, etc.)

Using testdisk

	testdisk is a great piece of GPL code written by Christophe Grenier. testdisk can be used with most operating systems; it will scan your disk or array, locate partition boundaries, and give you the opportunity to recover them. I had 4 partitions dedicated to my Archlinux install totaling roughly 70G on a 750G raid array. To start testdisk with the Linux26 build, untar the bzip2 archive and then cd into the linux subdirectory. The prebuilt binary is:
	
	./testdisk_static
	
	The first thing you will need to do is set the correct disk geometry. In my case the disk reported 254 heads and needed to be changed to 255 heads to work properly. (This is recommended if the first Quick Scan doesn't find your partitions).
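
	To see why a single head makes a difference, here is a rough sketch of the arithmetic. Under CHS geometry a cylinder spans heads x sectors-per-track sectors, and old-style partition tables usually place partition boundaries on cylinder boundaries. The sketch assumes 63 sectors per track, which is a common value but is not taken from the setup described here, and cyl_sectors is only a made-up helper name:

```shell
# Sectors per cylinder under CHS geometry: heads * sectors-per-track.
# cyl_sectors is a hypothetical helper; 63 sectors/track is an assumption.
cyl_sectors() {
    heads=$1
    spt=${2:-63}
    echo $(( heads * spt ))
}

# Example calls:
#   cyl_sectors 254   # 16002
#   cyl_sectors 255   # 16065
```

	Under that assumption, every cylinder computed with 254 heads is 63 sectors shorter than one computed with 255 heads, so the expected boundaries drift further from the real ones the deeper into the array you go.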
	
	After setting the geometry, just choose "Analyze" and "Quick Scan" and go get a coffee or something. In my case, since the 70G I was using was at the front of the 750G array, it found my partitions within 5 minutes or so. Once all of your partitions are found, you can "Stop" the scan by hitting the return key.
	
	You are then presented with the list of found partitions. They will initially be labeled "D" for deleted, and you simply toggle on the partitions you need to recover by selecting a type ("P" Primary, "*" Primary Boot, "L" Logical, or leave as "D" for Deleted). testdisk will check your selections for partition overlap and give you confirmation in green if your partition layout is OK. Just hit return to continue. Don't worry about the extended partition boundary; it will be provided. Review the partitions to be recovered, choose "Write", and you are done. (A reboot is required to activate the partitions.)
	
	If no partitions were found during the "Quick Scan", then (1) check your drive geometry setting; and (2) you will be given the option to do an "In Depth Scan" (go get 4 cups of coffee, walk the dog, etc...)

Have Your Rescue CD Handy

	Once the partition information has been changed, there is a near 100% chance your boot loader configuration will be messed up. Don't worry, everything is still there, you just have to reinstall grub or lilo into the boot record to recover from the situation.

Reinstalling Grub

	Here you will be booting from your CD or DVD into rescue mode, using dmraid to activate the arrays, and then using the information about the dm nodes in /dev/mapper and the partition information from "cat /proc/partitions" to create a chroot of your install to repair the boot loader:

(1) boot from the install DVD

(2) choose "Rescue System", login as "root" (no password needed)

(3) activate the dmraid arrays with "dmraid -ay"

(4) check which device nodes to use to create the chroot with "ls -al /dev/dm*" or "ls -al /dev/mapper". I was dealing with 2 separate arrays and 9 partitions (duplicated by having both dmraid-1.0.0rc14 and dmraid-1.0.0rc15 metadata), which left me with dm-0 to dm-20 to deal with. Compare the size shown for dm-X, /dev/mapper/raiddevice_name and the size shown from "cat /proc/partitions" to determine your "/", "/home", and "/boot" and any other partitions you need to set up in your chroot.
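
	With many dm-X nodes, comparing sizes by eye gets tedious. The following is a minimal sketch that pulls just the dm-X entries out of /proc/partitions and prints their sizes in MiB. The helper name list_dm_sizes is hypothetical, and the sketch relies on the #blocks column being in 1 KiB units:

```shell
# List dm-N entries from a /proc/partitions-style file with sizes in MiB.
# list_dm_sizes is a hypothetical helper; the file defaults to /proc/partitions.
list_dm_sizes() {
    file=${1:-/proc/partitions}
    # Columns: major minor #blocks name; #blocks is in 1 KiB units.
    awk '$4 ~ /^dm-[0-9]+$/ { printf "%s %d MiB\n", $4, $3 / 1024 }' "$file"
}
```

	Running "list_dm_sizes" with no argument reads the live /proc/partitions. The sizes can then be matched against what you remember of your "/", "/home" and "/boot" partitions.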

(5) mount all dm-X devices or /dev/mapper devices under /mnt to create your actual filesystem, and then bind dev/, proc/ and sys/ to their respective mount points under /mnt and chroot.

    **Note, you need to mount the device containing the / (root) filesystem first before mounting /boot and /home. Otherwise, the /boot and /home mount points will not exist:

	Example:

	mount /dev/dm-5 /mnt
	mount /dev/dm-7 /mnt/boot
	mount /dev/dm-6 /mnt/home
	mount -o bind /dev /mnt/dev
	mount -o bind /proc /mnt/proc
	mount -o bind /sys /mnt/sys
	cd /mnt
	chroot /mnt

(6) Reinstall grub to fix the mbr on your raid discs (mine were hd0 and hd1). See http://wiki.archlinux.org/index.php/Installing_with_Fake-RAID#Install_GRUB for my notes on getting the (hdX,Y) numbers right. When you start grub, you get a small ">" prompt. Just use the following as a guide. If you only have a single array, you will only need to worry about setting up hd0:

	grub
	>root (hd0,4)
	>setup (hd0)
	>*** few lines of grub output ***
	>root (hd1,5)
	>setup (hd1)
	>*** more lines of grub output ***
	>quit
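
	The (hdX,Y) numbers are easy to get wrong because GRUB legacy counts partitions from zero, while Linux counts them from one (so the first logical partition, number 5 in Linux, is Y=4). As a rough aid, the sketch below converts a Linux partition number and prints the matching grub commands. Both helper names are hypothetical, and the disk index still has to be worked out by hand:

```shell
# Convert a Linux-style partition number (1-based) to GRUB legacy notation
# (0-based). grub_part and grub_setup_script are hypothetical helper names.
grub_part() {
    disk=$1   # GRUB disk index, e.g. 0 for hd0
    part=$2   # Linux partition number, e.g. 5 for the first logical partition
    echo "(hd${disk},$(( part - 1 )))"
}

# Print the commands typed at the grub prompt for one disk.
grub_setup_script() {
    disk=$1
    part=$2
    printf 'root %s\nsetup (hd%s)\nquit\n' "$(grub_part "$disk" "$part")" "$disk"
}
```

	For example, "grub_setup_script 0 5" prints the same root and setup commands used for hd0 above. The output is meant as a checklist for typing at the prompt; feeding it to grub non-interactively is untested here.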

(7) check your /etc/grub.conf to make sure it agrees with the way you have just configured grub. For the example above, it should look like this for hd0 (I boot to hd0 and then chainload to get to hd1 and the second array):

	setup --stage2=/boot/grub/stage2 (hd0) (hd0,4)
	quit

(8) exit (to exit chroot) and reboot, and if you were successful (or just damn lucky), your system will be 100% again. Now immediately do "fdisk -l" on each of your arrays and drives and save that information remotely so if this happens again, you have a shortcut;-)
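
	Saving the "fdisk -l" output can be scripted so it actually gets done. This is a minimal sketch, not a tested backup tool: save_layouts and the LIST_CMD override are hypothetical names, and the device paths you pass in are whatever your own arrays and drives are called:

```shell
# Save "fdisk -l" output for each device into one text file per device.
# save_layouts is a hypothetical helper. LIST_CMD is a hypothetical override
# (useful for a dry run); by default the command run is "fdisk -l".
save_layouts() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        # Turn /dev/mapper/foo into dev_mapper_foo for the file name.
        name=$(printf '%s' "$dev" | sed 's|^/||; s|/|_|g')
        ${LIST_CMD:-fdisk -l} "$dev" > "$outdir/$name.fdisk.txt" || return 1
    done
}
```

	Run it as root with an output directory followed by each array and drive, then copy that directory to another machine so it survives the next partition loss.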


-- 
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com

