This post shares one of the incidents that happened in a previous deployment. Two types of network adapters were running on the physical host: NetXen and Broadcom. The Broadcom adapters come with TOE (TCP Offload Engine), so we had configured the VMkernel port on those adapters to handle the IP-based datastore for the ESX host.

In one case the ESX host became isolated and unresponsive on the network because of a NetXen driver failure, although we were still able to type commands at the server console. Because the isolation response option was set to "Leave powered on", VMware HA did not kick in to fail the virtual machines over to another host.

There are pros and cons to this setting. According to my local VMware friend, the system detects that the locks on the affected virtual machines are still held in the datastore, so the HA cluster will not allow the surviving hosts to take over those virtual machines. This explanation matches our incident exactly: the VMkernel traffic on the Broadcom adapters was still alive while the NetXen NICs were down, so the isolated host kept its virtual machines running and kept their locks.
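
To make the reasoning above concrete, here is a minimal sketch in Python of the decision as I understand it. It is a toy model under stated assumptions, not VMware's actual HA implementation: the `HostState` class, its field names, and the `leave_powered_on` / `power_off` labels are all hypothetical and only mirror what is described in this post.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical, simplified view of one ESX host in an HA cluster."""
    management_network_up: bool  # the NetXen NICs in our incident
    storage_network_up: bool     # the Broadcom VMkernel path to the IP-based datastore
    isolation_response: str      # "leave_powered_on" or "power_off" (illustrative labels)


def vm_lock_held(host: HostState) -> bool:
    """Assumption: a running VM keeps its lock as long as its host can reach the datastore."""
    if host.management_network_up:
        # Host is not isolated, so its VMs run normally and keep their locks.
        return True
    # Isolated host: the VMs stay up (and locked) only if they were left powered on
    # and the storage path is still alive.
    return host.isolation_response == "leave_powered_on" and host.storage_network_up


def ha_restarts_vm(host: HostState) -> bool:
    """Assumption: surviving hosts can only restart a VM once its lock is no longer held."""
    isolated = not host.management_network_up
    return isolated and not vm_lock_held(host)


# Our incident: NetXen down, Broadcom VMkernel still alive, VMs left powered on.
incident = HostState(management_network_up=False,
                     storage_network_up=True,
                     isolation_response="leave_powered_on")
print(ha_restarts_vm(incident))  # False: the lock is still held, so no failover
```

In this simplified model, changing the isolation response to `power_off` for the same outage makes `ha_restarts_vm` return `True`, which is the trade-off behind the setting: the virtual machines are interrupted on the isolated host, but the surviving hosts are then free to restart them.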