A frequent cluster network connection issue we see happens when the cluster cannot use WMI. WMI (Windows Management Instrumentation) is an interface through which Windows components can provide information and notifications to each other, often between remote computers. Failover Clustering and System Center Virtual Machine Manager (SCVMM) often use WMI to communicate between cluster nodes, so if there is a problem contacting a cluster node, WMI may be the culprit. We use WMI in most of our wizards, such as ‘Create Cluster Wizard’, ‘Validate a Configuration Wizard’, and ‘Add Node Wizard’, so any of the following messages and warnings could be due to WMI issues:
· "RPC Server Unavailable" error.
· Access is Denied.
· The computer ‘Node1’ could not be reached.
· Failed to retrieve the maximum number of nodes for ‘{0}’.
· The computer ‘Node1.contoso.com’ does not have the Failover Clustering feature installed. Use Server Manager to install the feature on this computer.
o Note: first confirm that the Failover Clustering feature really is installed on this node.
Troubleshooting Steps
Follow this series of troubleshooting steps so that you can continue connecting to your cluster.
1) Ensure it is not a DNS Issue
It is possible that the reason you cannot contact the other servers is due to a DNS issue. Before troubleshooting WMI, try connecting to that cluster, node or server using these methods when prompted by the cluster:
a) Network Name for the cluster or node
a. Example: MyNode
b) FQDN for the cluster or node
a. Example: MyNode.contoso.com
c) IP Address for the cluster or node
a. Example: 10.10.10.123
d) Some wizard pages have a ‘browse’ button which allows you to find other clusters in the domain through Active Directory
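If you would rather test name resolution before going back to the wizard, the three forms above can be checked from PowerShell. This is only a minimal sketch: Test-NameResolution is a hypothetical helper name (not a built-in cmdlet), and it relies on the .NET method System.Net.Dns.GetHostAddresses, which resolves a host name or returns an IP address literal unchanged.

```powershell
# Test-NameResolution is a hypothetical helper name, not a built-in cmdlet.
function Test-NameResolution {
    param([string]$Target)
    try {
        # Resolves a host name through DNS; an IP address literal is returned as-is.
        $addresses = [System.Net.Dns]::GetHostAddresses($Target)
        New-Object PSObject -Property @{
            Target   = $Target
            Resolved = $true
            Detail   = ($addresses | ForEach-Object { $_.ToString() }) -join ', '
        }
    }
    catch {
        # Resolution failed; keep the error text so it can be compared across name forms.
        New-Object PSObject -Property @{
            Target   = $Target
            Resolved = $false
            Detail   = $_.Exception.Message
        }
    }
}
```

Calling it once each for MyNode, MyNode.contoso.com and 10.10.10.123 (the example values above) shows which forms resolve. If the IP address works but the names do not, DNS is the more likely culprit than WMI.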
2) Check that WMI is Running on the Node
Windows Server Failover Clustering supports PowerShell (2008 R2 or higher), and earlier versions also come with a lightweight WMI client (WBEMTest). Using either PowerShell or WBEMTest you can confirm that WMI is up and running. Although you can use WMI remotely, it is better to test this directly on the server to ensure there are no other networking or firewall issues affecting the connection.
WMI Service
First check that the ‘Windows Management Instrumentation’ Service has started on each node by opening the Services console on that node. Also check that its Startup Type is set to Automatic.
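The same check can be made from an elevated PowerShell prompt instead of the Services console. This is a sketch that assumes the default service name Winmgmt, which is the name behind the ‘Windows Management Instrumentation’ display name.

```powershell
# Shows whether the WMI service is currently running on this node.
Get-Service -Name Winmgmt | Format-List Name, DisplayName, Status

# Prints the service configuration; START_TYPE should read AUTO_START.
sc.exe qc Winmgmt
```

Run this on each node in turn. Any node where the status is not Running, or the start type is not automatic, is worth fixing before moving on.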
Next we will check that Failover Clustering WMI (MSCluster) is running. These tests would be applicable after the cluster has already been created since we are checking for cluster-specific WMI functionality.
WBEMTest, run directly on the server
· Launch CMD
· CMD > WBEMTest
· The Windows Management Instrumentation Tester will launch
· Select Connect
· Namespace: Root\MSCluster
· Select Connect
o If you see more options available, it means you are connected and WMI is working
§ Feel free to try a query to confirm, such as selecting ‘Query’ and entering: SELECT * FROM MSCluster_Resource
o If you see an error, there is a WMI issue
PowerShell, run locally or remotely from another node within the same cluster (2008 R2 or higher only)
· Launch Elevated PowerShell
· PS > Get-WmiObject MSCluster_ResourceGroup -ComputerName MyNode -Namespace "ROOT\MSCluster"
o If you see a lot of information displayed, WMI is running
o If you see an error, there is a WMI or firewall issue
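To run the PowerShell check against several nodes and see which ones fail, the query can be wrapped in a small function. Treat this as a sketch under assumptions: Test-ClusterWmi is a hypothetical helper name, and -Authentication PacketPrivacy is our addition to the command above, on the assumption that remote queries against the cluster namespace need an encrypted connection. Drop that entry if your environment does not need it.

```powershell
# Test-ClusterWmi is a hypothetical helper name; pass your own node names to it.
function Test-ClusterWmi {
    param([string[]]$Node)
    foreach ($name in $Node) {
        # Parameters for Get-WmiObject, passed by splatting below.
        $query = @{
            Class          = 'MSCluster_ResourceGroup'
            Namespace      = 'ROOT\MSCluster'
            ComputerName   = $name
            Authentication = 'PacketPrivacy'   # assumption, see the note above
            ErrorAction    = 'Stop'            # make failures catchable
        }
        try {
            $groups = @(Get-WmiObject @query)
            New-Object PSObject -Property @{
                Node   = $name
                WmiOk  = $true
                Detail = "$($groups.Count) resource groups returned"
            }
        }
        catch {
            # Typical causes are the WMI service, the firewall, or permissions.
            New-Object PSObject -Property @{
                Node   = $name
                WmiOk  = $false
                Detail = $_.Exception.Message
            }
        }
    }
}
```

For example, Test-ClusterWmi -Node MyNode, MyNode2 returns one row per node (MyNode2 being a placeholder name). The Detail column carries the error text for any node that could not be queried, which helps tell an ‘RPC Server Unavailable’ case apart from an ‘Access is Denied’ case.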
3) Check your Firewall Settings
When a cluster is created, we automatically open up all the firewall settings you need. However, enterprise security policies can make changes over time, so it is worth checking that the firewall on each server is allowing cluster communication. WMI requires a DCOM connection to be made between the nodes, so you need to ensure that the ‘Remote Administration’ setting is enabled on every cluster node. This can be done through the Windows Firewall GUI or by running the elevated command: CMD > netsh firewall set service RemoteAdmin enable. You will see a variety of errors or warnings if your firewall is not properly configured. For more information about how WMI uses the firewall and troubleshooting firewall issues, visit: http://msdn.microsoft.com/en-us/library/aa389286(VS.85).aspx.
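On newer servers you can also list the WMI firewall rules from PowerShell before changing anything. This is a read-only sketch with two assumptions that are not from the steps above: the Get-NetFirewallRule cmdlet is available (it ships with Windows Server 2012 and later, so it will not exist on 2008 R2), and the rule group carries its English display name.

```powershell
# Assumes Windows Server 2012 or later and an English rule group name.
# Lists the WMI firewall rules and whether each one is enabled.
Get-NetFirewallRule -DisplayGroup 'Windows Management Instrumentation (WMI)' |
    Select-Object DisplayName, Enabled, Direction, Action |
    Format-Table -AutoSize
```

If the inbound rules show as disabled on any node, that node is a likely source of the connection errors.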
4) Reboot the Node
This can often fix intermittent issues. Follow best practices when rebooting the server, such as live migrating VMs and gracefully failing over other services and applications to reduce downtime. Only do this if the other troubleshooting attempts described above have failed.
5) Rebuild a Corrupt WMI Repository
If you continue to see errors after checking that WMI is running, the firewall is properly configured and rebooting, it is possible that your WMI repository has become corrupt so the cluster can no longer read from it. The following steps will enable you to rebuild your repository so that the other nodes can read from it again. Rebuilding the repository should be your last troubleshooting step, not your first.
· In the Services console, manually stop the WMI service to ensure that dependent services are stopped
· Start WMI service again
· Launch an elevated CMD or PowerShell
· CMD/PS > winmgmt /ResetRepository
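Because a reset is the last resort, it may be worth checking the repository first. The sketch below uses two further winmgmt switches, /verifyrepository and /salvagerepository, which are not part of the steps above. Whether a salvage is enough for your node is an assumption you will need to confirm.

```powershell
# Run from an elevated prompt on the affected node, one command at a time.
# Stop as soon as the repository is reported as consistent.

# Read-only check: reports whether the repository is consistent.
winmgmt /verifyrepository

# If an inconsistency is reported, try a salvage before a full reset.
winmgmt /salvagerepository

# Last resort, as described above: reset the repository to its initial state.
winmgmt /resetrepository
```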
6) Patch WMI for Performance Improvements
Your initial connection problems should now be fixed. If you continue to experience intermittent connection issues caused by WMI, it could be due to the performance of your servers. We have released a hotfix for 2008 R2 which improves the speed at which we return WMI queries, and this is optimized for the most common WMI calls which SCVMM makes. Get it here: http://support.microsoft.com/kb/974930.
Good luck in resolving your cluster connection issues with WMI!