Microsoft Windows Multi-Site Failover Cluster Best Practices

Windows Server 2012 Multi-Site Failover Cluster is a high availability and disaster recovery solution. Installing it is straightforward and similar to installing a single-site Failover Cluster, but there is no consolidated public documentation that describes Multi-Site Failover Cluster in detail with best practices or implementation recommendations. In this post I will cover some Microsoft Windows Multi-Site Failover Cluster basics, along with design and implementation best practices drawn from my practical experience.

Multi-Site Failover Cluster Basics:

A Microsoft Multi-Site Failover Cluster is a group of Cluster Nodes distributed across multiple sites. The Cluster Nodes in each site are connected to local SAN Storage in the same site, and replication between the SAN Storage in each site is handled by a SAN-to-SAN replication technology, which can be hardware-based or software-based.

Multi-Site Failover Cluster can be used for a SQL Server Multi-Site Cluster or a Hyper-V Multi-Site Cluster.


Multi-Site Failover Cluster Challenges:

There are some challenges with Multi-Site Failover Cluster as follows:

+ Multi-Site Failover Cluster fully depends on SAN-to-SAN replication, which in most cases is owned by the Storage team. If SAN-to-SAN replication is not configured properly, the Multi-Site Failover Cluster will not function, especially during failover to the Disaster Recovery site.

+ Multi-Site Failover Cluster depends on multiple Hardware components such as Server Hardware, Host Bus Adapters (HBA), Multi-Path IO (MPIO), etc. If driver versions or Firmware levels do not follow the Hardware vendor recommendations for Multi-Site Failover Cluster, the cluster may show unexpected behavior.

When to Select Multi-Site Failover Cluster?

Despite these challenges, some scenarios still require Multi-Site Failover Cluster. They can be consolidated into the two main scenarios below:

+ Automatic Failover is required (you need to plan how storage failover will be automated)

o To reduce downtime as much as possible.

o To provide faster disaster recovery.

+ Protect against loss of entire location

o When the application does not have a native replication technology such as the one Exchange 2010 or later provides (Database Availability Groups, the successor to Cluster Continuous Replication).

o When the application does not support SQL Server AlwaysOn for its backend database, as with SharePoint 2010 or System Center Configuration Manager 2012.

Multi-Site Failover Cluster Best Practices:

Best Practices for Multi-Site Failover Cluster can be consolidated into five different areas as follows:

Design & Implementation Best Practices:

+ Be sure that the customer already has a SAN-to-SAN replication technology in place, because a new investment in this area can be very expensive.

+ Involve the Storage team in both the design and the implementation of the Multi-Site Cluster solution, to mitigate risks related to the supportability of the existing Hardware/Storage with Windows Failover Cluster and to confirm readiness for the Multi-Site Failover Cluster Storage requirements.

+ Share all implementation Storage requirements early with the customer storage team, as it always takes time for the storage team to prepare the storage requirements for Multi-Site Failover Cluster.

+ As in a normal Failover Cluster implementation, run the cluster validation and be sure that no errors are reported before continuing with the Cluster installation.

Hardware & Storage Best Practices:

+ Be sure that the existing Hardware (SAN, Servers, HBAs, Network, etc.) supports Windows Server 2012 Clustering, not only from the Microsoft side but from the Hardware and Storage vendor side as well.

+ Follow the Hardware vendor recommendations regarding the driver versions and Firmware levels required for Windows Failover Cluster.

+ The SAN storage vendor (or customer storage team) should own and be fully responsible for the SAN-to-SAN replication, which is a core component of Multi-Site Failover Cluster.

+ It is very important to verify SAN-to-SAN replication and to run a Failover Simulation while testing the implementation.
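
One way to keep track of the driver and Firmware recommendation above is to compare each node's inventory against the vendor baseline before validation. The Python sketch below is only an illustration: the function name, node names, component names, and version strings are all hypothetical, and in practice the inventory would come from your own tooling rather than a hard-coded dictionary.

```python
from typing import Dict, List, Tuple


def find_version_mismatches(
    baseline: Dict[str, str],
    inventory: Dict[str, Dict[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (node, component, installed_version) for every deviation.

    baseline maps a component name to the vendor-recommended version.
    inventory maps a node name to that node's component versions.
    A component absent from a node is reported as "missing".
    """
    mismatches = []
    for node in sorted(inventory):
        for component in sorted(baseline):
            installed = inventory[node].get(component, "missing")
            if installed != baseline[component]:
                mismatches.append((node, component, installed))
    return mismatches


# Hypothetical values, for illustration only.
vendor_baseline = {"hba_driver": "9.1.2", "hba_firmware": "2.10", "mpio_dsm": "4.0"}
node_inventory = {
    "NODE-A1": {"hba_driver": "9.1.2", "hba_firmware": "2.10", "mpio_dsm": "4.0"},
    "NODE-B1": {"hba_driver": "9.0.0", "hba_firmware": "2.10"},
}

for node, component, installed in find_version_mismatches(vendor_baseline, node_inventory):
    print(f"{node}: {component} is {installed}, expected {vendor_baseline[component]}")
```

An empty result means every node matches the baseline; anything else is worth resolving with the Hardware vendor before the cluster goes into testing.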

Network Best Practices:

+ Discuss the network architecture with the customer and ask whether stretched VLANs across sites can be provided. Compared with using different VLANs per site, this can reduce the Multi-Site Failover Cluster complexity.

+ Share all implementation networking requirements early with the customer network team, especially if you are going to change the network design for the required VLANs.

+ Consider encryption over WAN.

Quorum Best Practices:

+ Use Node and File Share Majority quorum (with a File Share Witness, FSW), especially for an even number of Cluster Nodes.

+ Host the FSW in a third site that has a direct connection to both Cluster sites.

+ Avoid hosting the FSW on a Cluster node or on a Virtual Machine in the same Cluster.
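
To see why a witness in a third site matters for an even number of nodes, the vote arithmetic can be sketched in a few lines of Python. This is a deliberately simplified static majority model with hypothetical function names: it gives one vote per node plus one for the witness, and it ignores the dynamic quorum adjustments that Windows Server 2012 can make at runtime.

```python
def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    """A partition keeps quorum only with a strict majority of all votes."""
    return reachable_votes > total_votes // 2


def surviving_site_keeps_quorum(
    nodes_site_a: int,
    nodes_site_b: int,
    witness_in_third_site: bool,
    failed_site: str,
) -> bool:
    """Check whether the remaining site stays online after a full site loss.

    Assumes every node has one vote and that a witness hosted in a third
    site is still reachable from the surviving site.
    """
    witness_vote = 1 if witness_in_third_site else 0
    total = nodes_site_a + nodes_site_b + witness_vote
    surviving_nodes = nodes_site_b if failed_site == "A" else nodes_site_a
    return has_quorum(total, surviving_nodes + witness_vote)


# Two nodes per site: compare the outcome with and without a witness.
print(surviving_site_keeps_quorum(2, 2, False, "A"))
print(surviving_site_keeps_quorum(2, 2, True, "A"))
```

With two nodes per site and no witness, the surviving site holds 2 of 4 votes, which is not a majority. With the witness in a third site it holds 3 of 5. The same arithmetic shows why the FSW should not sit in either cluster site: it would be lost together with that site's nodes.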

Hyper-V VM Configuration Best Practices:

+ In a Hyper-V Multi-Site Failover Cluster, configure the failover sequence for each Virtual Machine so that it fails over to Hyper-V hosts in the same site first, and only then to Cluster nodes in the secondary site.

+ Be sure that all Multi-Site Failover Cluster nodes are configured as possible owners for each highly available Virtual Machine.
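
The two points above can be planned on paper before touching the cluster. The Python sketch below is only a planning aid under stated assumptions: the function names, node names, and site labels are hypothetical, and the resulting order still has to be applied to each Virtual Machine through your usual cluster management tools.

```python
from typing import Dict, List, Set


def preferred_owner_order(node_sites: Dict[str, str], home_site: str) -> List[str]:
    """List nodes in the VM's home site first, then nodes in other sites."""
    local = sorted(n for n, site in node_sites.items() if site == home_site)
    remote = sorted(n for n, site in node_sites.items() if site != home_site)
    return local + remote


def missing_possible_owners(node_sites: Dict[str, str], possible_owners: Set[str]) -> Set[str]:
    """Return cluster nodes that are not yet possible owners of the VM."""
    return set(node_sites) - possible_owners


# Hypothetical four-node cluster stretched over two sites.
cluster_nodes = {"HV-A1": "SiteA", "HV-A2": "SiteA", "HV-B1": "SiteB", "HV-B2": "SiteB"}

print(preferred_owner_order(cluster_nodes, "SiteA"))
print(missing_possible_owners(cluster_nodes, {"HV-A1", "HV-A2", "HV-B1"}))
```

A non-empty result from the second function points to a node that the Virtual Machine could never fail over to, which defeats the purpose of the secondary site.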

References:

The references below contain most of the valuable Microsoft documentation and videos related to Multi-Site Failover Cluster.

+ Designing for a Clustered Service or Application in a Multi-Site Failover Cluster: http://technet.microsoft.com/en-us/library/dd197430.aspx

+ Setting up a Clustered Service or Application in a Multi-Site Failover Cluster – Checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx

+ Requirements and Recommendations for a Multi-Site Failover Cluster: http://technet.microsoft.com/en-us/library/dd197575(v=ws.10).aspx

+ Hyper-V Multi-Site Failover Cluster Video: http://technet.microsoft.com/en-us/video/tdbe11-failover-clustering-amp-hyper-v-multi-site-disaster-recovery.aspx

Conclusion:

In conclusion, Windows Server Multi-Site Failover Cluster can provide powerful high availability and disaster recovery in a single solution. To design and implement a functional Multi-Site Failover Cluster, it is very important to consider the Multi-Site Cluster challenges, fulfill all Windows Server Multi-Site Cluster requirements, and follow the best practices and recommendations above.

If you have best practices or recommendations from your own experience that could be added to the list above, please share them so I can evaluate them and add them to the post (your name will be credited beside them).
