In
this section, you consider active and passive nodes, the shared disk
array, the quorum, public and private networks, and the cluster server.
Then, you learn how a failover works.
Active Nodes Versus Passive Nodes
A Windows Failover Cluster can support up to sixteen
nodes; however, most clustering deployments use only two nodes. A single
SQL Server 2012 instance can run on only a single node at a time;
should a failover occur, the failed instance can fail over to another
node. Consider clusters of three or more physical nodes when
you need to cluster many SQL Server instances.
In a two-node Windows Failover Cluster with SQL Server,
one of the physical nodes is considered the active node, and the second
one is the passive node for that single SQL Server instance. It doesn’t
matter which of the physical servers in the cluster is designated as
active or passive, but you should specifically assign one node as the
active and the other as the passive. This way, there is no confusion
about which physical server is performing which role at the current
time.
The active node is the node currently running a SQL Server instance
and accessing that instance's databases, which are located on a shared
disk array.
The passive node is the node not currently running that SQL Server
instance. When a node is passive, it is not running the production
databases, but it is in a state of readiness. If the active node fails
and a failover occurs, the passive node automatically takes over the
production databases and begins serving user requests. In this case,
the passive node becomes the active node, and the formerly active node
becomes the passive node (or the failed node, if a failure occurs that
prevents it from operating).
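To make the role swap concrete, the following is a minimal Python sketch of the active/passive exchange during a failover. It is purely illustrative; the class and function names (Node, Cluster, fail_over) are hypothetical and are not part of any Windows or SQL Server API.

```python
# Toy model of active/passive roles in a two-node cluster.
# Illustrative only; names are hypothetical, not a real clustering API.

class Node:
    def __init__(self, name):
        self.name = name
        self.healthy = True

class Cluster:
    def __init__(self, active, passive):
        self.active = active    # node currently running the SQL Server instance
        self.passive = passive  # node in a state of readiness

    def fail_over(self):
        """Swap roles when the active node fails."""
        if not self.active.healthy:
            # The passive node takes ownership of the instance and its
            # databases on the shared disk array; the failed node becomes
            # the passive (failed) node until it is repaired.
            self.active, self.passive = self.passive, self.active

cluster = Cluster(Node("NODE1"), Node("NODE2"))
cluster.active.healthy = False   # simulate a failure on the active node
cluster.fail_over()
print(cluster.active.name)       # NODE2 now serves user requests
```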
Shared Disk Array
Standalone SQL Server instances usually store their
databases on local disk storage or nonshared disk storage; clustered SQL
Server instances store data on a shared disk array. Shared means that
all nodes of the Windows Failover Cluster are physically connected to
the shared disk array, but only the active node can access that
instance’s databases. To ensure the integrity of the databases, both
nodes of a cluster never access the shared disk at the same time.
Generally speaking, a shared disk array can be an iSCSI, Fibre Channel,
or SAS-connected RAID 1, RAID 5, or RAID 10 disk array housed in a
standalone unit, or a SAN. This shared disk array must
have at least two logical disk partitions. One partition is used for
storing the clustered instance’s SQL Server databases, and the other is
used for the quorum drive, if a quorum drive is used. Additionally, you
need a third logical partition if you choose to cluster MSDTC.
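As a quick illustration of that minimum layout, the following sketch lists the logical partitions a clustered instance might carve out of the shared array. The drive letters and descriptions are hypothetical examples, not requirements.

```python
# Hypothetical layout of logical partitions on the shared disk array;
# drive letters are illustrative only.
shared_array_partitions = {
    "S:": "Clustered instance's SQL Server databases",
    "Q:": "Quorum drive (if a quorum drive is used)",
    "M:": "MSDTC (only if you choose to cluster MSDTC)",
}

for drive, purpose in shared_array_partitions.items():
    print(f"{drive} -> {purpose}")
```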
The Quorum
When both cluster nodes are up and running and
participating in their respective active and passive roles, they
communicate with each other over the network. For example, if you change
a configuration setting on the active node, this configuration is
propagated automatically, and quickly, to the passive node, thereby
ensuring synchronization.
As you might imagine, though, the active node can fail after a change
is made on it but before that change is sent over the network and
applied to the passive node. In this scenario, the change is never
applied to the passive node. Depending on the nature of the change,
this could cause problems, even causing both nodes of the cluster to
fail.
To prevent this from happening, a Windows Failover Cluster employs a
quorum. A quorum is essentially a log file, similar in concept to
database logs. Its purpose is to record any change made on the active
node. This way, if a recorded change never reaches the passive node
because the active node fails before it can send the change over the
network, the passive node can read the quorum log file when it takes
over, find out what the change was, and apply it before becoming the
new active node. If the state of the quorum is compromised, your
cluster may become inoperable.
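The following Python sketch illustrates the idea of the passive node replaying changes it missed before it takes over. It is conceptual only; the real quorum is maintained internally by the cluster service, so the log structure and the catch_up function here are assumptions made for illustration.

```python
# Conceptual sketch of replaying a quorum log; illustrative names only.

quorum_log = [
    {"seq": 1, "change": "add disk resource"},
    {"seq": 2, "change": "update instance IP address"},
]

passive_node_applied_seq = 1   # the passive node never received change #2

def catch_up(applied_seq, log):
    """Apply any recorded changes the passive node missed
    before it becomes the new active node."""
    for entry in log:
        if entry["seq"] > applied_seq:
            print(f"Applying missed change: {entry['change']}")
            applied_seq = entry["seq"]
    return applied_seq

passive_node_applied_seq = catch_up(passive_node_applied_seq, quorum_log)
```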
In effect, each voting element in the cluster can cast one “vote,”
and the majority of the total votes (based on the number of voting
elements that are online) determines whether the cluster continues
running on a given cluster node. This prevents more than one cluster
node from attempting to take ownership of the same SQL Server instance.
The voting elements are cluster nodes or, in some cases, a disk witness
or file share witness. Each voting element (with the exception of a
file share witness) contains a copy of the cluster configuration. The
cluster service works to keep all copies synchronized at all times.
Following are the four supported Windows Failover Cluster quorum modes (a short sketch of the majority calculation appears after the list):
- Node Majority: Each node that is available and in communication can vote. The cluster functions only with a majority of the votes.
- Node and Disk Majority: Each node plus a designated disk in the cluster storage (the “disk witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes.
- Node and File Share Majority: Each node plus a designated file share created by the administrator (the “file share witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes.
- No Majority: Disk Only: The cluster has a quorum if one node is available and in communication with a specific disk in the cluster storage. Only the nodes that are also in communication with that disk can join the cluster. The disk is the single point of failure, so use highly reliable storage.
A quorum drive is a logical drive on the shared disk array dedicated to storing the quorum; as a best practice, it should be around 1GB of fault-tolerant disk storage.
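To make the “majority of votes” rule concrete, here is a small Python sketch that counts online voting elements and checks for a majority. It is illustrative only (the cluster service performs this calculation internally); the example numbers assume a two-node cluster using Node and Disk Majority.

```python
# Illustrative only: the majority check used by the majority-based quorum modes.

def has_quorum(online_votes, total_votes):
    """A majority means strictly more than half of the total votes."""
    return online_votes > total_votes // 2

# Example: Node and Disk Majority in a two-node cluster with a disk witness.
total_votes = 2 + 1    # two nodes + one disk witness = 3 votes
online_votes = 1 + 1   # one node is down; the other node and the disk
                       # witness are still online and in communication
print(has_quorum(online_votes, total_votes))   # True: 2 of 3 votes, so the cluster keeps running
```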