http://www.jms1.net/grub-raid-1.shtml

Installing grub on both drives of a software RAID-1

Whenever I build a server, there is almost always some kind of RAID involved. Even for a machine without a hardware RAID controller (which is something I recommend very highly, if you can afford it) I normally set up two hard drives with identical partitions, and a software RAID-1 array for each pair of partitions (i.e. /dev/sda1 and /dev/sdb1 become one array, /dev/sda2 and /dev/sdb2 become one array, and so forth).
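For reference, building that kind of layout by hand would look something like this. This is only a sketch: the array names (/dev/md0, /dev/md1) are examples, and on most systems the installer creates the arrays for you.

(root@server) # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
(root@server) # mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2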

The problem is, the grub bootloader doesn't know about the software RAID, so it only installs the bootloader on the first drive. Which is good enough, unless the first drive is the one which fails after a few years...

The solution, obviously, is to allow the installer to install grub to the first drive, and then manually install it to the second drive. I've tried this several times, and in every case, when it came time to boot from the second drive, it just plain didn't work. The first part of grub came up, but it immediately threw an error message. I've been able to manually enter the commands which are in the relevant menu entry and get it to boot, but it wouldn't do it by itself.
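For the record, the "obvious" manual install on the second drive was presumably something like this, and this is the version which stopped working once the first drive was gone:

grub> root (hd1,0)
grub> setup (hd1)
grub> quit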

Today I found this message in the archives of the linux-poweredge mailing list which explains, in a rather verbose manner, how to make it work. It involves temporarily changing the device mappings.

The problem I was running into before was that, when I installed grub to the second drive, the boot code knew that it was the second drive, and tried to load the second-stage boot loader from the second drive. Which is fine, unless the original first drive had failed and the second drive is now the first and only drive...

The fix, as I said above, involves temporarily changing the device mappings within grub, so that the code it writes to the boot sector looks for the rest of grub on the first drive. This makes sense, because the only time that code will ever execute is when that drive is the first (or only) drive in the system.

This is an example of what the process looks like. This machine has two SATA hard drives, /dev/sda and /dev/sdb. The first partitions on the two drives (i.e. /dev/sda1 and /dev/sdb1) make up a software RAID-1 array which holds the machine's /boot filesystem.
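If you want to double-check which partitions are in the /boot array before touching any boot sectors, something like this will show it (the /dev/md0 name is just an example; use whatever your /boot array is called):

(root@server) # cat /proc/mdstat
(root@server) # mdadm --detail /dev/md0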

First I'm going to show a "normal" grub install, where the bootloader is being installed on the real first hard drive in the system.

(root@server) # grub

    GNU GRUB version 0.97 (640K lower / 3072K upper memory)

  [ Minimal BASH-like line editing is supported. For the first word, TAB
    lists possible command completions. Anywhere else TAB lists the possible
    completions of a device/filename.]

grub> find /grub/stage1     # Find the partitions which contain the stage1 boot loader file.
 (hd0,0)
 (hd1,0)

grub> root (hd0,0)          # Specify the partition whose filesystem contains the "/grub" directory.
grub> setup (hd0)           # Install the boot loader code.
grub> quit

If you've ever had to install grub by hand, you will recognize the commands shown above. As you can see, both /dev/sda1 and /dev/sdb1 contain the grub files.

Below is an example of installing grub on the second hard drive, with boot code which will work when it happens to be the first hard drive in the system. The one extra command is the "device" command, explained after the example.

(root@server) # grub

    GNU GRUB version 0.97 (640K lower / 3072K upper memory)

  [ Minimal BASH-like line editing is supported. For the first word, TAB
    lists possible command completions. Anywhere else TAB lists the possible
    completions of a device/filename.]

grub> find /grub/stage1
 (hd0,0)
 (hd1,0)

grub> device (hd0) /dev/sdb # Tell grub to assume that "(hd0)" will be "/dev/sdb" at the time the machine boots from the image it's installing.
grub> root (hd0,0)
grub> setup (hd0)
grub> quit

The "device (hd0) /dev/sdb" command makes grub behave as if (hd0) actually referred to /dev/sda, until it quits. The subsequent "root (hd0,0)" command actually refers to /dev/sdb1, and of course "setup (hd0)" installs the boot code on /dev/sdb, as if it were "(hd0)", the first hard drive in the system.