Resolving Cluster Shared Volumes put in redirected access on Hyper-V Cluster

Recently I stumbled on an issue where the cluster shared volumes on a Hyper-V cluster were put in redirected access. This doesn’t happen by itself, so here is how the issue appeared and how it was resolved.

I support a 5-node Hyper-V cluster which has 5 cluster shared volumes and a quorum disk. It all began with a minor interruption in the SAN storage service. Even though the Storage team didn’t detect any issues, the interruption was noticed by all servers that had LUNs connected to that storage, so the Hyper-V cluster was not the only system affected. After the storage issue was fixed we noticed several alerts in SCOM related to the cluster’s nodes. The description of the alerts was the following:

Cluster Shared Volume ‘Volume1’ (‘Volume1’) is no longer directly accessible from this cluster node. I/O access will be redirected to the storage device over the network through the node that owns the volume. This may result in degraded performance. If redirected access is turned on for this volume, please turn it off. If redirected access is turned off, please troubleshoot this node’s connectivity to the storage device and I/O will resume to a healthy state once connectivity to the storage device is reestablished.

There was a separate alert for every CSV on the cluster. At first I tried to return one of the CSVs to a normal state in the Failover Cluster Manager console: Cluster Shared Volumes -> right-click one of the volumes -> More Actions -> Turn off redirected access for this Cluster shared volume.

[Screenshot: the More Actions menu in Failover Cluster Manager]

This didn’t work. The command started executing, but it eventually timed out and I cancelled it. So I searched Bing for more information about the problem and found the following article:

http://blogs.technet.com/b/askcore/archive/2010/12/16/troubleshooting-redirected-access-on-a-cluster-shared-volume-csv.aspx

The article stated clearly that this issue is caused by a storage connectivity problem. After a closer investigation I noticed that on one of the nodes in the cluster the LUNs were not present in the Disk Management console. Because that node was missing its disk configuration, the cluster was not fully healthy and, to preserve its integrity, it forced itself to work in redirected access mode. Thanks to that, all the virtual machines on the cluster were still up and running.
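
The check that exposed the problem, comparing which LUNs each node can actually see, can be sketched in a few lines of Python. This is only an illustration and not something that was run during the incident: the node names, the LUN labels and the function find_nodes_missing_luns are all hypothetical, and the inventory is assumed to have been collected beforehand (for example by looking at Disk Management on each node).

```python
from typing import Dict, List, Set


def find_nodes_missing_luns(
    expected_luns: Set[str],
    visible_luns_by_node: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return {node: sorted missing LUNs} for every node lacking an expected LUN."""
    missing: Dict[str, List[str]] = {}
    for node, visible in visible_luns_by_node.items():
        absent = expected_luns - visible  # LUNs this node should see but does not
        if absent:
            missing[node] = sorted(absent)
    return missing


if __name__ == "__main__":
    # Hypothetical inventory: 5 CSV LUNs plus the quorum disk, seen from 5 nodes.
    expected = {"CSV1", "CSV2", "CSV3", "CSV4", "CSV5", "Quorum"}
    inventory = {f"NODE{i}": set(expected) for i in range(1, 6)}
    inventory["NODE3"] = set()  # hypothetical node that lost its disk configuration

    for node, luns in find_nodes_missing_luns(expected, inventory).items():
        print(f"{node} is missing: {', '.join(luns)}")
```

A node that appears in the result is the one whose storage connectivity should be troubleshot first.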

In this situation I had two choices to resolve the issue:

  • Restart the server and see if the disk configuration returns
  • Add the LUNs to the server again

I decided to go with the first option because it was easier to carry out and I could always fall back on option 2 if option 1 was unsuccessful. I put the faulty node in maintenance mode in Virtual Machine Manager and in Operations Manager. All virtual machines were migrated off it and I restarted the server. After the server was up and running again, the configuration in Disk Management was back and none of the CSVs were in redirected access mode any longer. I stopped maintenance mode in VMM and the node was back in the cluster.
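
The approach taken here, trying the cheaper fix first and keeping the second one in reserve, follows a simple pattern that can be sketched in Python. Everything named in it is hypothetical: the step functions are placeholders for manual work (restarting the node, adding the LUNs again), and the health check stands in for confirming that the disks are back in Disk Management.

```python
from typing import Callable, List, Optional, Tuple


def remediate(
    steps: List[Tuple[str, Callable[[], None]]],
    is_healthy: Callable[[], bool],
) -> Optional[str]:
    """Run steps in order and stop at the first one after which is_healthy() is true.

    Returns the name of that step, or None if no step fixed the problem.
    """
    for name, action in steps:
        action()
        if is_healthy():
            return name
    return None


if __name__ == "__main__":
    # Hypothetical state standing in for "LUNs are visible on the node".
    state = {"luns_visible": False}

    def restart_node() -> None:
        # Placeholder: in this incident the restart brought the disks back.
        state["luns_visible"] = True

    def readd_luns() -> None:
        # Placeholder for the fallback option; not reached in this example.
        state["luns_visible"] = True

    fixed_by = remediate(
        [("restart node", restart_node), ("re-add LUNs", readd_luns)],
        lambda: state["luns_visible"],
    )
    print(f"Resolved by: {fixed_by}")
```

The ordering of the list encodes the decision: the least disruptive step goes first, and later steps run only if the check still fails.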

I have a suspicion about why exactly this node lost its disk configuration during the storage service interruption: of all 5 nodes in the cluster, it was the only one with different HBA cards from the other four. But of course this would never have happened if the storage service hadn’t had issues that day.

 

P.S. The screenshot was copied from the mentioned article.

How SCVMM 2012 calculates overcommitment for clusters

The algorithm for calculating overcommitment on clusters in VMM 2012 has changed from the previous version, VMM 2008 R2. The result is a significantly more accurate calculation. Hilton Lange from the VMM team explains it perfectly in an article. Click on the link and read.

http://blogs.technet.com/b/scvmm/archive/2012/03/27/system-center-2012-vmm-cluster-reserve-calculations.aspx

Automating Patching of Hyper-V clusters with SCVMM, SCOM and WUInstall

Automating the patching process is one of the requirements for achieving a Private Cloud. Of course, because of the many dependencies in different environments, that is not completely possible. But this automation can start with the Hyper-V clusters. At the link below you will find a script created by Jason T. Ruiz that automates the patching process using SCVMM, SCOM and WUInstall. Click on the link and patch.

http://gallery.technet.microsoft.com/Update-Hyper-V-Hosts-via-71b86c1a