[arch-general] dmraid disk failure - howto rebuild new disk - gparted hates Me :-(
Listmates,

My Seagate drives are dropping like flies with less than 1400 hours of run time (that's less than 58 days of service!). The latest casualty is a 750G drive (ST3750330AS) that was part of the second raid set on one of my boxes running Arch. Seagate sent a new one and now I am trying to figure out how to rebuild the array. The fakeraid setup uses an nvidia bios raid chip with software dmraid. The array is composed of /dev/sdb and /dev/sdd, which are both part of /dev/mapper/nvidia_fffadgic:

[00:59 ecstasy:/home/david] # dmraid -r
/dev/sdd: nvidia, "nvidia_fffadgic", mirror, ok, 1465149166 sectors, data@ 0
/dev/sdc: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0
/dev/sdb: nvidia, "nvidia_fffadgic", mirror, ok, 1465149166 sectors, data@ 0
/dev/sda: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0

/dev/sdd failed and has been replaced. The partition information on /dev/sdb is:

[00:43 ecstasy:/home/david] # fdisk -l /dev/sdb

Disk /dev/sdb: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        7553    60669441    5  Extended
/dev/sdb5   *           1        2432    19534977   83  Linux
/dev/sdb6            2433        2447      120456   83  Linux
/dev/sdb7            2448        7310    39062016   83  Linux
/dev/sdb8            7311        7553     1951866   82  Linux swap / Solaris

/dev/sdd is blank. My first thought was simply to use fdisk to create the extended partition /dev/sdd1 and then use the gparted-live CD to copy sdb[5 6 7 8] over to sdd and be done. The setup in gparted went fine, but when gparted went to format the partitions as ext3, it gave an error and would not go any farther. The error was simply a generic "operation could not be completed..." with nothing to say why. I don't know, but maybe gparted choked because both drives are combined under the device mapper heading of nvidia_fffadgic.

That's pretty much where I am now. My next thought is to just use dd to copy the partitions over. I have opensuse on the sda/sdc array (mapper nvidia_fdaacfde), so the drives I am working with are not mounted anywhere and should be easy to work with.

What says the brain trust? Can you think of any way I was screwing up gparted so it wouldn't even format the copied partitions? What about the dd method? Any hints or gotchas? Any help would be appreciated. Thanks.

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
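Before touching either disk, a quick status check along these lines confirms which member dropped and which set it belonged to. Only dmraid options documented in dmraid(8) and a plain smartctl query are used; the device name is the one from the listing above, and smartctl assumes the smartmontools package is installed:

    # Show the raid sets and their state (a degraded mirror shows up here)
    dmraid -s

    # List member disks and the set each belongs to (the output quoted above)
    dmraid -r

    # Activate the sets so the /dev/mapper/nvidia_* nodes appear
    dmraid -ay

    # Match the failing disk to a physical drive by serial number before pulling it
    smartctl -i /dev/sdd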
David C. Rankin wrote:
Listmates,
My Seagate drives are dropping like flies with less than 1400 hours of run time. (that's less than 58 days of service!) <snip>
That's pretty much where I am now. My next thought is to just use dd to copy the partitions over. I have opensuse on the sda/sdc array (mapper nvidia_fdaacfde), so the drives I am working with are not mounted anywhere and should be easy to work with.
What says the brain trust? Can you think of any way I was screwing up gparted so it wouldn't even format the copy partitions? What about the dd method? Any hints or gotchas? Any help would be appreciated. Thanks.
Ok, I decided on:

    dd bs=100M conv=notrunc if=/dev/sdb of=/dev/sdd

I'll let you know how it comes out ;-)

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
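A copy like that runs silently for hours; here is a sketch of one way to watch it and spot-check the result afterwards. The SIGUSR1 report is a GNU dd feature; the verification commands are illustrative additions, not part of the original plan:

    # Start the whole-disk copy in the background
    dd bs=100M conv=notrunc if=/dev/sdb of=/dev/sdd &

    # From time to time, ask dd to print how far it has gotten
    kill -USR1 $(pidof dd)

    # Afterwards the new disk should show the same partition table as the source
    fdisk -l /dev/sdd

    # Optional spot check: compare the first MiB of both disks
    cmp -n 1048576 /dev/sdb /dev/sdd && echo "first MiB matches"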
For the situation you are in (failing drives), dd, ddrescue, or dd_rescue (I prefer dd_rescue myself) is definitely the best method for cloning a failing drive. I usually just boot into RIPLinux or the System Rescue CD so nothing gets mounted... and away you go! With ddrescue, if you have set up networking in your live environment, you can use sshfs and create an error log as well (a sketch follows below the quoted message). Good luck!

----- Original Message ----
From: David C. Rankin <drankinatty@suddenlinkmail.com>
To: arch-general@archlinux.org
Sent: Saturday, June 13, 2009 2:32:46 AM
Subject: Re: [arch-general] dmraid disk failure - howto rebuild new disk - gparted hates Me :-(

David C. Rankin wrote:
Listmates,
My Seagate drives are dropping like flies with less than 1400 hours of run time. (that's less than 58 days of service!) <snip>
That's pretty much where I am now. My next thought is to just use dd to copy the partitions over. I have opensuse on the sda/sdc array (mapper nvidia_fdaacfde), so the drives I am working with are not mounted anywhere and should be easy to work with.
What says the brain trust? Can you think of any way I was screwing up gparted so it wouldn't even format the copy partitions? What about the dd method? Any hints or gotchas? Any help would be appreciated. Thanks.
Ok, I decided on: dd bs=100M conv=notrunc if=/dev/sdb of=/dev/sdd I'll let you know how it comes out ;-) -- David C. Rankin, J.D.,P.E. Rankin Law Firm, PLLC 510 Ochiltree Street Nacogdoches, Texas 75961 Telephone: (936) 715-9333 Facsimile: (936) 715-9339 www.rankinlawfirm.com
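A minimal sketch of the GNU ddrescue route suggested above, run from a live environment with both disks unmounted. The log (map) file name and the sshfs mount point are made up for illustration; only the source and destination disks come from this thread:

    # First pass: grab everything readable, skip retrying bad areas for now
    ddrescue -f -n /dev/sdb /dev/sdd /root/sdb-rescue.log

    # Second pass: retry the bad areas recorded in the log a few times
    ddrescue -f -r3 /dev/sdb /dev/sdd /root/sdb-rescue.log

    # Alternative: with networking up in the live environment, keep the log on another box
    mkdir -p /mnt/rescue
    sshfs user@otherbox:/srv/rescue /mnt/rescue
    ddrescue -f -n /dev/sdb /dev/sdd /mnt/rescue/sdb-rescue.log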
David C. Rankin wrote:
David C. Rankin wrote:
Listmates,
My Seagate drives are dropping like flies with less than 1400 hours of run time. (that's less than 58 days of service!) <snip> That's pretty much where I am now. My next thought is to just use dd to copy the partitions over. I have opensuse on the sda/sdc array (mapper nvidia_fdaacfde), so the drives I am working with are not mounted anywhere and should be easy to work with.
What says the brain trust? Can you think of any way I was screwing up gparted so it wouldn't even format the copy partitions? What about the dd method? Any hints or gotchas? Any help would be appreciated. Thanks.
Ok, I decided on:
dd bs=100M conv=notrunc if=/dev/sdb of=/dev/sdd
I'll let you know how it comes out ;-)
H E L P !!!

Here is the situation. The dd rebuild of the new drive worked perfectly. However, when I went to re-add the drive to the dmraid bios setup, the device mapper reference to the drives changed from nvidia_fffadgic to nvidia_ecaejfdi. (The nvidia bios controller wouldn't allow a formatted drive to be 'added', so we had to remove the existing drive and create a new device mapper array.)

What happens during boot is, immediately after the "hooks" for "dmraid" are called, the boot process dies looking for the old nvidia_fffadgic dmraid array. When this occurs you are left in the God-awful repair environment where the only things you can do are "echo *", cd and exit. (I'm sure somebody could do more in this shell, but with no vi, I was dead in the water.)

I have hit the obvious places (fstab, /boot/grub/device.map) and made the needed changes, but when I reboot, the same thing occurs. My thought is that I have to run mkinitrd or something similar to update the kernel image for the new device mapper name. If that is the case, how do I do it? ... and ... from where do I do it? (Please, not that awful "echo *" environment -- don't throw me in that brier patch!) If I must, I must, but I'll need a wiki link (I'm off to search).

If anyone has dealt with this issue before and can point me in the right direction here -- please do. Right now I'm "guessing" it is the kernel image, but if there is another dm-related file I need to fix, let me know. Thanks.

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
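If it does come down to rebuilding the image, a rough sketch of doing it from an Arch live CD rather than the recovery shell follows. The /dev/mapper partition suffixes and the kernel26 preset are assumptions to check against ls /dev/mapper and /etc/mkinitcpio.d/ on the box itself, not values taken from this thread:

    # Activate the bios raid sets so the mapper nodes appear
    dmraid -ay
    ls /dev/mapper

    # Mount the Arch root (and /boot, if separate) from the renamed set
    mount /dev/mapper/nvidia_ecaejfdi5 /mnt        # root partition -- adjust suffix
    mount /dev/mapper/nvidia_ecaejfdi6 /mnt/boot   # /boot partition -- adjust suffix

    # Bind the pseudo-filesystems and chroot in
    mount -o bind /dev /mnt/dev
    mount -t proc none /mnt/proc
    mount -t sysfs none /mnt/sys
    chroot /mnt /bin/bash

    # Inside the chroot: fix anything still naming the old label, then rebuild
    grep -rl fffadgic /etc /boot
    mkinitcpio -p kernel26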
David C. Rankin wrote:
David C. Rankin wrote:
David C. Rankin wrote:
Listmates,
My Seagate drives are dropping like flies with less than 1400 hours of run time. (that's less than 58 days of service!) <snip> That's pretty much where I am now. My next thought is to just use dd to copy the partitions over. I have opensuse on the sda/sdc array (mapper nvidia_fdaacfde), so the drives I am working with are not mounted anywhere and should be easy to work with.
What says the brain trust? Can you think of any way I was screwing up gparted so it wouldn't even format the copy partitions? What about the dd method? Any hints or gotchas? Any help would be appreciated. Thanks.
Ok, I decided on:
dd bs=100M conv=notrunc if=/dev/sdb of=/dev/sdd
I'll let you know how it comes out ;-)
H E L P !!!
Here is the situation. the dd rebuild of the new drive worked perfectly. However, when I went to re-add the drive to the dmraid bios setup, the dmmapper reference to the drives changed from nvidia_fffadgic to nvidia_ecaejfdi. (the nvidia bios controller wouldn't allow a formatted drive to be 'added' so we had to remove the existing drive and create a new dmmapper array).
What happens during boot is, immediately after the "hooks" for "dmraid" are called, the boot process dies looking for the old nvidia_fffadgic dmraid array. When this occurs you are left in the God awful repair environment where the only thing you can do is "echo *", cd and exit
S O L V E D !!

First off, if you are going to make something hard on yourself, go ahead and really screw it up so you can take pride in stumbling through a really long diagnosis just to find out the answer was simple all along -- you know the type, right?

When I originally installed Arch on this particular box, I installed it in a raid1 dmraid setup. However, this was the second dmraid array in the box, the first set being two 500G drives, also raid1 and also under the dmraid convention. SuSE is spinning on the 500G set (sda/sdc) and I had installed Arch on the 750G pair (sdb/sdd), which is where the Seagate drive failed (sdd).

Unlike pure software "md" raid, dmraid doesn't have the basic functionality to let you simply add a new drive as a replacement and then rebuild on the fly. Some bios raid implementations allow for the rebuild at the bios configuration stage, but -- strike 3 -- the nvidia raid bios on my k9n2 doesn't. Prior to the failure, the device mapper name for the 1st 500G array was /dev/mapper/nvidia_fdaacfde and the 750G array with Arch was on /dev/mapper/nvidia_fffadgic (this, unfortunately, would soon change to /dev/mapper/nvidia_ecaejfdi). Why? For starters, the nvidia raid controller wouldn't allow an already formatted disk to be placed in my second array under the same device mapper title. In its mind it would only allow a blank disk to be added, and then it would rely on the Windows XP raid utility to do the rebuild on the fly. Currently there isn't one for Linux with dmraid (however, the next release of dmraid should include the -R --rebuild option).

With gparted's refusal to copy any partition from sdb to sdd, I just resorted to the dd approach and it worked. The manual sync of the data from the good disk to the new disk was done with dd and worked fine. It just took longer than I wanted because, for some reason I still have yet to grasp, gparted wouldn't let me copy the partitions from /dev/sdb to /dev/sdd.

The true sticky wicket in this whole conundrum was dealing with the newly renamed device mapper label when the whole system was using the old one. The system boots off Array 1 (suse) and then passes boot control to grub on Array 2 for Arch. I had updated the /boot/grub/device.map, menu.lst and fstab files to accommodate the new mapper label, but every time the boot hit the hooks for dmraid, it would puke and complain about wanting the old device mapper label to be able to find /boot, /root, etc.

The extent of my stupidity would soon be revealed. It seems that when I updated the files to reflect the new label, I only updated the device.map, menu.lst and fstab entries for the first array and overlooked the fact that when control is passed from Array 1 to Array 2, the same device.map and menu.lst changes were needed there -- and (hello) that information doesn't somehow get passed along in the chain load to the second array. So after going through the mkinitcpio page at the wiki, which eliminated an image issue, it all came down to finding the guilty dogs. Grepping for fffadgic in /boot soon showed the problem files. I updated device.map and menu.lst for the second array and life came back to normal again.

So that's the missing half of the configuration I simply didn't think about at the time. C'est la vie ... live and learn. Hopefully this will save some other poor soul from the same surprise when the device mapper label on his (or her ... but I haven't seen any on the list yet) second array changes and he is scratching his head thinking "but I know I already updated the boot files..." ;-)

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
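A command-form sketch of that "guilty dog" hunt for the next person who hits a renamed set. The labels are the ones from this thread; the sed one-liner is only an illustration -- back each file up and eyeball it before and after:

    # Find every boot/config file that still names the old set
    grep -rl fffadgic /boot /etc

    # Typical offenders here: /boot/grub/device.map, /boot/grub/menu.lst and
    # /etc/fstab -- on both arrays when grub chainloads from one to the other

    # Back up, then swap the old label for the new one
    cp /boot/grub/menu.lst /boot/grub/menu.lst.bak
    sed -i 's/nvidia_fffadgic/nvidia_ecaejfdi/g' /boot/grub/menu.lst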
Nice write-up.. I've never used hardware raid, always just software raid1, but I was never actually aware of all the inherent advantages software (md) raid has over the (dm) hardware raid that you spoke of. Very good information as always, sir.

----- Original Message ----
From: David C. Rankin <drankinatty@suddenlinkmail.com>
To: General Discussion about Arch Linux <arch-general@archlinux.org>
Sent: Sunday, June 14, 2009 4:06:39 AM
Subject: Re: [arch-general] dmraid disk failure - howto rebuild new disk - gparted hates Me :-(

David C. Rankin wrote:
David C. Rankin wrote:
David C. Rankin wrote:
Listmates,
My Seagate drives are dropping like flies with less than 1400 hours of run time. (that's less than 58 days of service!) <snip> That's pretty much where I am now. My next thought is to just use dd to copy the partitions over. I have opensuse on the sda/sdc array (mapper nvidia_fdaacfde), so the drives I am working with are not mounted anywhere and should be easy to work with.
What says the brain trust? Can you think of any way I was screwing up gparted so it wouldn't even format the copy partitions? What about the dd method? Any hints or gotchas? Any help would be appreciated. Thanks.
Ok, I decided on:
dd bs=100M conv=notrunc if=/dev/sdb of=/dev/sdd
I'll let you know how it comes out ;-)
H E L P !!!
Here is the situation. the dd rebuild of the new drive worked perfectly. However, when I went to re-add the drive to the dmraid bios setup, the dmmapper reference to the drives changed from nvidia_fffadgic to nvidia_ecaejfdi. (the nvidia bios controller wouldn't allow a formatted drive to be 'added' so we had to remove the existing drive and create a new dmmapper array).
What happens during boot is, immediately after the "hooks" for "dmraid" are called, the boot process dies looking for the old nvidia_fffadgic dmraid array. When this occurs you are left in the God awful repair environment where the only thing you can do is "echo *", cd and exit
S O L V E D !! <snip>
Jonathan Brown wrote:
Nice write-up.. I've never used hardware raid.. always just software raid1, but was never actually aware of all the inherent advantages software (md) raid has over the (dm) hardware raid that you spoke of. Very good information as always sir.
Thanks -- aside from all my weary typos in it ;-)

dmraid isn't hardware raid. It, like md raid, is software raid. There are not really any disadvantages to using it over md raid. I use both. If I had looked further into my raid bios's ability to rebuild before I set this box up, I would have used mdraid instead. With this drive failure, the only issue was tracking down what needed to be done to accommodate the device mapper label change. The fix itself was a 2-minute fix, and like I said, with some hardware you are given the opportunity to rebuild from the good disk before you even boot the machine, which is an advantage a pure mdraid setup doesn't have. (I know my Gigabyte and Tyan boards will let you rebuild, and I think my older MSI boards will do it as well.)

Both dmraid and mdraid are fantastic raid solutions. You hear people bad-mouth software raid all the time, and the criticisms are unfounded. The read/write performance penalty is virtually "0" on any machine faster than a 486, and the benefit from a raid setup is definitely worth the effort. Sure, if you want to drop $250 - $500 on a hardware controller, there is nothing wrong with that, but if you just want the protection offered by a mirrored raid setup and the box you are setting up is serving less than a few dozen workstation clients, software raid is fine.

Now, no raid setup is a substitute for backups (including an off-site copy), but it does save a whole lot of time when a disk goes bad. In my case S.M.A.R.T. alerted me to the drive failing and I was able to replace the disk before any data was lost. The dm label change was just another 'learning experience' that will not present any hassle should another disk fail in the future. The real learning here was how "poor" the raid bios features have gotten on MSI boards, even on what was one of the high-end boards.

Oh well, back up and rocking with that box. Spinning 2 500G drives in raid1 on the first array and 2 750G drives in raid1 on the second. The amazing part is that in today's world we are able to pack 1.25 terabytes of raid1 storage in a PC for less than $250 in drive costs. Ten years ago, that would have cost a small fortune.

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
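For contrast, the add-a-replacement-and-rebuild-on-the-fly workflow that md raid offers looks roughly like this. The md device and partition names are hypothetical, not from this box, and assume a mirror built on matching partitions:

    # Mark the dead member failed and pull it from the mirror
    mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1

    # Clone the survivor's partition table onto the replacement, then add it back
    sfdisk -d /dev/sdb | sfdisk /dev/sdd
    mdadm /dev/md0 --add /dev/sdd1

    # Watch the mirror resync in the background while the system stays up
    cat /proc/mdstat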
Thanks for the explanation/clarification. Keep up the good work!

----- Original Message ----
From: David C. Rankin <drankinatty@suddenlinkmail.com>
To: General Discussion about Arch Linux <arch-general@archlinux.org>
Sent: Monday, June 15, 2009 12:48:47 AM
Subject: Re: [arch-general] dmraid disk failure - howto rebuild new disk - gparted hates Me :-(

Jonathan Brown wrote:
Nice write-up.. I've never used hardware raid.. always just software raid1, but was never actually aware of all the inherent advantages software (md) raid has over the (dm) hardware raid that you spoke of. Very good information as always sir.
Thanks -- aside from all my weary typos in it ;-) <snip>