We have now revised our High Level Design to incorporate additional fault-tolerance features.
Under our original design, we were concerned that even a minor failure within a disk shelf would cause service to fail over to the affected store’s passive copy. That in itself isn’t a problem, since users still shouldn’t notice anything, but it seemed sensible to revise the plan so that a localised hardware failure has the smallest possible impact on the system.
So we now intend to attach three D2700 disk enclosures to each mailbox server, rather than two. The sizing exercise showed that we need fewer servers than anticipated, so this change actually reduces cost as well: more fault tolerance at a lower cost!
Each enclosure holds twenty-five 300GB 2½” 10k rpm SAS disks, giving each mailbox server 75 disks in total. Sixty-six of these (22 × 3) will form 22 database LUNs, each a three-disk RAID5 array that takes one disk from each enclosure. Because no LUN has more than one member in any single enclosure, every LUN can tolerate the failure of either a single disk or a whole enclosure.
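To make the fault-tolerance argument concrete, here is a minimal sketch that models the database LUN layout and checks it against a whole-enclosure failure. It assumes, purely for illustration, that LUN *n* uses slot *n* in every enclosure (the real slot mapping isn’t specified), and it uses nominal disk capacities, ignoring formatting overhead.

```python
# Hypothetical model of the database LUN layout: 3 enclosures x 25 disks,
# 22 three-disk RAID5 LUNs with one member disk in each enclosure.
ENCLOSURES = 3
DISKS_PER_ENCLOSURE = 25
DISK_GB = 300  # nominal capacity per disk
DB_LUNS = 22


def build_db_luns():
    # Assumption: LUN n uses slot n in every enclosure (illustrative only).
    # Each disk is identified as an (enclosure, slot) pair.
    return [[(enc, slot) for enc in range(ENCLOSURES)] for slot in range(DB_LUNS)]


def raid5_usable_gb(members):
    # RAID5 gives up one disk's worth of capacity to parity.
    return (len(members) - 1) * DISK_GB


def survives(lun, failed):
    # A RAID5 array stays online as long as at most one member disk is lost.
    lost = sum(1 for disk in lun if disk in failed)
    return lost <= 1


luns = build_db_luns()
disks_used = sum(len(lun) for lun in luns)
print(disks_used)                # 66
print(raid5_usable_gb(luns[0]))  # 600 (nominal GB per database LUN)

# Lose an entire enclosure: every LUN loses exactly one member and stays online.
for enc in range(ENCLOSURES):
    failed = {(enc, slot) for slot in range(DISKS_PER_ENCLOSURE)}
    assert all(survives(lun, failed) for lun in luns)

# Two failed disks in the same LUN (in different enclosures) would take it offline.
print(survives(luns[0], {(0, 0), (1, 0)}))  # False
```

The check works because each LUN has exactly one member per enclosure, so losing an enclosure costs every LUN one disk, which is exactly what RAID5 can absorb.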
Transaction logs will be allocated two disks per enclosure, six disks in total across the three shelves, provisioned as a RAID1+0 array. The remaining three disks will form a two-disk recovery LUN (where the inevitable ‘I’ve deleted half of my email’ restore tasks can take place) and a single hot spare that can stand in for a failed disk in any of the three enclosures.
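As a quick sanity check on the arithmetic, the sketch below tallies the per-server disk budget using the figures above and confirms that all 75 disks are accounted for. The role names are just illustrative labels, and the log capacity figure is nominal (it ignores formatting overhead). The recovery LUN’s RAID level isn’t stated, so its capacity isn’t computed.

```python
# Per-server disk budget, using the figures from the design.
DISK_GB = 300            # nominal capacity per disk
TOTAL_DISKS = 3 * 25     # three D2700 enclosures of 25 disks each

allocation = {
    "database": 22 * 3,  # 22 three-disk RAID5 LUNs, one disk per enclosure
    "logs": 2 * 3,       # RAID1+0, two disks in each enclosure
    "recovery": 2,       # two-disk recovery LUN
    "spare": 1,          # single hot spare shared by all three enclosures
}


def raid10_usable_gb(disks):
    # RAID1+0 mirrors every disk, so half of the raw capacity is usable.
    return (disks // 2) * DISK_GB


allocated = sum(allocation.values())
for role, count in allocation.items():
    print(f"{role:10}{count:3}")
print(f"{'total':10}{allocated:3} of {TOTAL_DISKS}")
assert allocated == TOTAL_DISKS, "every disk should be accounted for"

print(raid10_usable_gb(allocation["logs"]))  # 900 (nominal GB for logs)
```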