I’m unable to view hardware health status data while a host is in maintenance mode in my vSphere 4.0 Update 1 environment.
A failed memory module was replaced on a host, but I’m reluctant to take it out of maintenance mode until I am sure it is healthy. There is enough load on this cluster that removing the host from maintenance mode will result in DRS moving VM workloads onto it within five minutes. For obvious reasons, I don’t want VMs running on an unhealthy host.
So… I need to disable DRS at the cluster level, take the host out of maintenance mode, verify the hardware health on the Hardware Status tab, then re-enable DRS. It’s a roundabout process, particularly in a production environment that requires a Change Request (CR), with associated approvals and lead time, to toggle the DRS configuration.
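The sequence above can be sketched as a toy Python model. Nothing here talks to vCenter: the `Host` and `Cluster` classes, the `hardware_healthy` flag, and both function names are hypothetical stand-ins, not part of any VMware API. The sketch also assumes DRS is held at the manual automation level rather than disabled outright, and that an unhealthy host should go straight back into maintenance mode.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for an ESX host."""
    name: str
    in_maintenance: bool = True
    # Stand-in for what the Hardware Status tab would report.
    hardware_healthy: bool = True


@dataclass
class Cluster:
    """Hypothetical stand-in for a DRS cluster."""
    drs_automation: str = "fullyAutomated"
    hosts: list = field(default_factory=list)


@contextmanager
def drs_held_at_manual(cluster):
    """Hold DRS at 'manual', then restore the previous level on exit."""
    previous = cluster.drs_automation
    cluster.drs_automation = "manual"
    try:
        yield cluster
    finally:
        # Runs even if the health check raises or returns early.
        cluster.drs_automation = previous


def verify_after_repair(cluster, host):
    """Return True if the host can stay in service, False otherwise."""
    with drs_held_at_manual(cluster):
        # Exit maintenance mode so the health data becomes visible.
        host.in_maintenance = False
        if not host.hardware_healthy:
            # Assumption: an unhealthy host goes back into maintenance mode.
            host.in_maintenance = True
            return False
    return True
```

The point of the context manager is that the DRS setting is always put back the way it was found, whatever the health check turns up. In a real environment each of those assignments is a manual step or an API call, and each may sit behind a Change Request.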
Taking a look at KB 1011284, VMware acknowledges the steps above and considers the following a resolution to the problem:
Resolution
By design, the host monitoring agents (IPMI) are not supported while the ESX host is in maintenance mode. You must exit maintenance mode to view the information on the Hardware Status tab. To take the ESX host out of maintenance mode:
1. Right click ESX host within vSphere Client.
2. Click on Exit Maintenance Mode.
Fortunately, this design specification has been improved by VMware in vSphere 4.1 where I have the ability to view hardware health while a host is in maintenance mode.
Jason, I’m sure you’re aware, but I wouldn’t disable DRS; I’d change it to partially automated instead, because when you disable DRS any per-VM settings are removed and don’t come back when you re-enable it. This bit us with Exchange 2010 DAG servers that need to have DRS disabled: we turned off DRS for similar reasons, and when we re-enabled it we lost a server that DRS decided to vMotion despite it previously being marked as disabled.
@AFidel: I agree. When I say disable DRS, I mean disabling its functionality, preferably by setting the automation level to manual, because actually turning DRS off would also remove cluster-level Resource Pools (that’s bad).
Why not just remove the host from the cluster? Then exit maintenance, check everything out, and move it back into the cluster?
Jason
I may be missing something here, but rather than reconfiguring DRS, why wouldn’t you just move the host out of the DRS cluster whilst it is in maintenance mode, do your checks, then move it back into the cluster?
Removing the host from the cluster involves additional steps but if your environment requires those steps, they will indeed satisfy the requirement.
You’re missing the point of the blog post, which is that prior to vSphere 4.1, extra steps are required to view the hardware health of a host that is in maintenance mode. VMware recognized this inefficiency and corrected it in vSphere 4.1.