
Failover clustering

jasper-zanjani edited this page Aug 3, 2020 · 4 revisions

Failover clusters are composed of computers called nodes, which typically possess a secondary network adapter used for cluster communications. A cluster can be created using New-Cluster.
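A minimal sketch of cluster creation, assuming two hypothetical nodes NODE1 and NODE2 and an unused static address:

```powershell
# Install the feature, then form the cluster (node names and IP are placeholders).
Install-WindowsFeature Failover-Clustering -IncludeManagementTools
New-Cluster -Name CLUSTER1 -Node NODE1, NODE2 -StaticAddress 192.168.1.100
```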

Before Windows Server 2016, all cluster nodes had to belong to the same domain; that configuration is now just one of several possible cluster types, called a single-domain cluster. A failover cluster can also be multi-domain or workgroup, depending on whether and how the servers are joined to domains. A cluster can also be detached from AD, even though its nodes are domain-joined.

A cluster whose servers are joined to a single domain is typically associated with a cluster name object in Active Directory, which serves as its administrative access point. A workgroup cluster or a detached cluster needs to have the cluster's network name registered in DNS as its administrative access point, which can be specified in PowerShell with the AdministrativeAccessPoint named parameter. Additionally, on a workgroup cluster the same local administrator account must be created on every node, preferably the built-in Administrator account, although a different account can be configured if a particular registry key is [created][New-ItemProperty] on each node.
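For illustration, a workgroup or detached cluster can be created with a DNS administrative access point. The registry value involved is LocalAccountTokenFilterPolicy, which permits a non-built-in local administrator account to be used remotely; cluster and node names here are placeholders:

```powershell
# Create a cluster registered in DNS rather than in Active Directory.
New-Cluster -Name CLUSTER1 -Node NODE1, NODE2 -AdministrativeAccessPoint Dns

# On each node, when using a local administrator account other than
# the built-in Administrator:
New-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System `
    -Name LocalAccountTokenFilterPolicy -Value 1
```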

Nodes that are domain-joined support CredSSP or Kerberos authentication, but workgroup nodes support NTLM authentication only.

Three types of witness resources can help a cluster maintain quorum. Quorum is necessary to prevent a split-brain situation, where communication failures between nodes cause separate segments of the cluster to continue operating independently of each other. A witness is typically configured when a cluster has an even number of nodes, and only one witness can be configured. [pwsh][Set-ClusterQuorum]

  • Disk witness: dedicated disk in shared storage that contains a copy of the cluster database
  • File share witness: SMB file share containing a Witness.log file with information about the cluster
  • Cloud witness: blob stored in Azure
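Each witness type can be configured with Set-ClusterQuorum; the disk resource name, share path, and storage account below are placeholders:

```powershell
# Disk witness: a small clustered disk in shared storage
Set-ClusterQuorum -DiskWitness "Cluster Disk 2"

# File share witness: an SMB share, ideally hosted outside the cluster
Set-ClusterQuorum -FileShareWitness \\FS1\Witness

# Cloud witness: a blob in an Azure storage account
Set-ClusterQuorum -CloudWitness -AccountName mystorageacct -AccessKey $key
```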

Scale-out File Server (SoFS) is a clustered role providing highly available storage to cluster nodes. SoFS ensures continuous availability in the case of a node failure. Using SoFS, multiple nodes can also access the same block of storage at the same time; for this reason it is an active/active or dual-active system, as opposed to an active/passive system, where only one node provides accessible shares.

SoFS is specifically recommended for use on Hyper-V and SQL Server clusters and can be installed with Add-ClusterScaleOutFileServer.

SoFS shares are created with the New-SmbShare PowerShell cmdlet. SoFS shares are located on Cluster Shared Volumes (CSV), a shared disk containing an NTFS or ReFS volume that is made accessible for read and write operations by all nodes within a failover cluster.
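A sketch of configuring the role and a share, assuming a CSV is already mounted at C:\ClusterStorage\Volume1 (the role name, share name, and account are placeholders):

```powershell
# Add the SoFS role to the cluster.
Add-ClusterScaleOutFileServer -Name SOFS1

# Create a continuously available share on a CSV path.
New-SmbShare -Name VMs -Path C:\ClusterStorage\Volume1\VMs `
    -FullAccess CONTOSO\HyperVHosts -ContinuouslyAvailable $true
```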

CSVs solved a historical problem with using NTFS volumes with VMs in previous versions of Windows Server. NTFS is designed to be accessed by only one operating system instance at a time. In Windows Server 2008 and earlier, this meant that only one node could access a disk at a time, which had to be mounted and dismounted for every VM.

The solution was to create a pseudo-file system called CSVFS, sitting on top of NTFS, that enables multiple nodes to modify a disk's content at the same time while restricting access to the metadata to the owner, or coordinator. The coordinator node is the cluster node where NTFS for the clustered CSV disk is mounted; any other node is called a Data Server (DS).
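Adding a clustered disk as a CSV and identifying its coordinator can be sketched as follows (the disk resource name is a placeholder):

```powershell
# Convert an available clustered disk into a CSV; it then appears
# under C:\ClusterStorage\ on every node.
Add-ClusterSharedVolume -Name "Cluster Disk 1"

# The OwnerNode of each CSV is its current coordinator node.
Get-ClusterSharedVolume | Select-Object Name, OwnerNode
```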

VM resiliency can be configured by adjusting settings that govern how the cluster responds to changes in node and VM state:

  • Unmonitored: the VM owning a role is not being monitored by the Cluster Service
  • Isolated: the node is not currently an active member of the cluster, but still possesses the role
  • Quarantine: the node has been drained of its roles and removed from the cluster for a specified length of time
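These behaviors are governed by common properties on the cluster object; the defaults noted in the comments are typical but should be verified in your environment:

```powershell
(Get-Cluster).ResiliencyLevel           # when nodes may enter the Isolated state
(Get-Cluster).ResiliencyDefaultPeriod   # seconds a node may remain Isolated (default 240)
(Get-Cluster).QuarantineDuration        # seconds a node remains quarantined (default 7200)
```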

Cluster Operating System Rolling Upgrade is a new feature that reduces downtime by making it possible for a cluster to have nodes running both Windows Server 2012 R2 and Windows Server 2016. Using this feature, nodes can be removed, upgraded, and rejoined one at a time while the cluster continues running its workloads.
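During a rolling upgrade the cluster runs in a mixed mode; once every node has been upgraded, the change is committed by raising the functional level. A sketch:

```powershell
# Check the functional level while nodes are being upgraded.
Get-Cluster | Select-Object ClusterFunctionalLevel

# After all nodes run Windows Server 2016, commit (irreversible).
Update-ClusterFunctionalLevel
```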

When [Storage Spaces][Storage Spaces] is combined with a failover cluster, the solution is known as Clustered Storage Spaces.

Cluster management

VM Monitoring allows specific services to be restarted or failed-over when a problem occurs. To use VM Monitoring:

  • The guest must be joined to the same domain as the host
  • The host administrator must be a member of the guest's local Administrators group
  • Windows Firewall rules in the Virtual Machine Monitoring group must be enabled

The service can then be monitored using Add-ClusterVMMonitoredItem.
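For example, monitoring the Print Spooler service inside a hypothetical clustered VM named VM1:

```powershell
# Restart or fail over the VM role if the monitored service fails repeatedly.
Add-ClusterVMMonitoredItem -VirtualMachine VM1 -Service spooler
```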

Migration

VMs can be moved from node to node of a cluster using [live][Live Migration], [storage][Storage Migration], or quick migrations.
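A clustered VM role can be moved between nodes with Move-ClusterVirtualMachineRole; the VM and node names below are placeholders:

```powershell
# Live migration: the VM keeps running during the move.
Move-ClusterVirtualMachineRole -Name VM1 -Node NODE2 -MigrationType Live

# Quick migration: the VM's state is saved, moved, and restored.
Move-ClusterVirtualMachineRole -Name VM1 -Node NODE2 -MigrationType Quick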

VM network health protection is a feature (enabled by default) that detects whether a VM on a cluster node has a functional connection to a designated network. If not, the cluster live-migrates the VM role to another node that does have such a connection. This setting can be controlled in Hyper-V Manager > VM Settings > Advanced Features > Protected network.

Resources

  • PowerShell cmdlets in the FailoverClusters module.