I’ve never got on well with software RAID systems on FreeBSD. I’ve tried gvinum (previously I used vinum), gmirror, and ataraid, all with varying degrees of success. The latest machine I built is using gmirror, and so far I’m happy.
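For comparison, getting gmirror going on that machine was refreshingly simple. From memory (so the details may be off, and gm0 plus the ad0/ad2 device names are just examples):

# gmirror label -v -b round-robin gm0 /dev/ad0
# echo 'geom_mirror_load="YES"' >> /boot/loader.conf

then point /etc/fstab at /dev/mirror/gm0*, reboot, and add the second disk:

# gmirror insert gm0 /dev/ad2
# gmirror status

It rebuilds quietly in the background, which is all I ever wanted from a mirror.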
However, over the past few days I’ve been having problems with a system I built a couple of years ago. It originally used vinum on FreeBSD 5.2.1, but I recently upgraded it to 5.5 and switched to gvinum. A week or so ago I noticed that the second disk in the mirror was marked stale – I guessed it was an artifact of the upgrade to 5.5. So on Tuesday I decided to resync it.
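The resync itself is simple enough in theory. A sketch, using my volume names (each volume has one plex, p1, holding the stale subdisk):

# gvinum list
# gvinum start root.p1
# gvinum start var.p1

and so on for the rest; start revives any stale subdisks in the object you name.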
It went fine to start with, until syncing one partition produced a disk read error. That marked the whole original disk as bad, with the second disk only half synced. Thinking back, I knew this disk had an error on it, and I’d fully intended to replace it. Shame I didn’t do it at the time. Next I rebooted the machine to recover the disk from dead to stale, so I could force it back online. This is where the problems started.
GEOM_VINUM: subdisk swap.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk root.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk var.p1.s0 state change: down -> stale
GEOM_VINUM: subdisk usr.p1.s0 state change: down -> stale
That’s what welcomed me during bootup. Not too bad, I hear you say? Well, that’s all I saw after that: it didn’t boot any further. I tried various things, such as unloading the geom_vinum module, booting single user, booting from the other disk, and pulling one disk, but nothing worked.
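For the record, the “force it back online” step I never got as far as would, I believe, be gvinum’s setstate; something like:

# gvinum setstate -f up root.p1.s0

and likewise for the other subdisks. The man page describes setstate as a diagnostic-only hammer, so I’d only reach for it with the other plex known good.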
In desperation I booted an older kernel. It worked! Well, when I say worked, I mean it booted past this point and asked me for a root partition – but at least I could work with that. It wasn’t immediately obvious why it had worked; my theory is that it wasn’t the age of the kernel that mattered, but that its version no longer matched the modules on the disk, so it refused to load geom_vinum at all.
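For anyone who hasn’t had to do this, booting the old kernel means escaping to the loader prompt and doing roughly:

OK unload
OK load /boot/kernel.old/kernel
OK boot

(unload discards the kernel and modules the loader had already staged from loader.conf, which is also the trick for booting without geom_vinum at all.)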
So after getting things running again I decided to update to 6.1. I figured help would be thin on the ground for 5.5, and I could see changes had gone into gvinum in 6.1. After a few hours this was done, but the result was the same: I booted to single user, typed “gvinum start”, and got the same messages. Oddly, this time the machine wasn’t entirely dead – I could still reboot it. Maybe that was because I’d started gvinum manually rather than at boot.
Regardless of the cause, I’m now stuck. Everything is running fine off one disk, but I can’t get the RAID going. The only possibility I can see is redoing the RAID configuration from scratch, but to do that I’ll need to blast the existing config off the disks, and I’m nervous about that.
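If I do work up the nerve, the sequence would presumably be something like this (untested, and the config below is a simplified example rather than my real layout):

# gvinum printconfig > /root/vinum.backup
# gvinum resetconfig
# gvinum create /root/vinum.new

where /root/vinum.new describes the mirror again, along these lines:

drive d1 device /dev/ad0s1h
drive d2 device /dev/ad2s1h
volume root
  plex org concat
    sd length 512m drive d1
  plex org concat
    sd length 512m drive d2

(sizes and device names invented for illustration; repeat the volume stanza for swap, var and usr.)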
The other option I’m considering is replacing the machine and starting again (it’s getting old now anyway). Maybe this time I’ll go for a hardware RAID solution, though 🙂
I have to be honest: were I to need a server machine, I would probably run Solaris on it, if only because I know that DiskSuite (or whatever it’s called now) works, and works well. Other things in Solaris may be a pain in the bum, but at least it does RAID very well indeed.