Recently I have read a lot of posts on the internet and had discussions with people I met, and some of them are concerned about the increasing number of virtual machines on a single physical host, which amounts to putting too many eggs in one basket. They could be right to a certain extent, but I would not say they are absolutely correct. There are a few things they have forgotten about how IT was run before virtualization came to the market, with all the capabilities it has demonstrated versus traditional physical systems.

You may have 30 to 50 VMs on a single host today, thanks to high-density servers with more CPU cores and more memory per system. The next moment, one of the hosts may suffer a hardware failure: around 50 VMs go down at once, and it takes another 20 minutes before all the virtual machines are successfully restarted on the surviving hosts. Some may consider this high impact, so you decide to restrict the number of virtual machines to around 10 to 20 per ESX host. What happens next is that TCO goes up and ROI suffers. It is a tough choice for most administrators in this scenario. I urge you to step back a little and look at the scenario again. Before virtualization, business systems that ran on standalone servers without physical clustering had no HA at all, and their owners were aware of that. If they wanted HA on physical systems, they had to invest extra CAPEX and OPEX to maintain an identical set of hardware and operating systems just for failover purposes. And even operating system clustering does not provide 100% uptime.
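To make the trade-off above concrete, here is a rough sketch of the arithmetic. The estate of 200 VMs and the single spare host kept for failover are my own assumptions for illustration, and the function name is hypothetical; only the 10, 20 and 50 VMs-per-host ratios come from the scenario above.

```python
import math


def consolidation_tradeoff(total_vms, vms_per_host, spare_hosts=1):
    """Estimate host count and failure impact for a given consolidation ratio.

    spare_hosts is an assumed amount of failover headroom, not a VMware rule.
    """
    # Hosts needed to carry the load, plus the assumed failover headroom.
    hosts = math.ceil(total_vms / vms_per_host) + spare_hosts
    # Worst case for a single failure: a fully loaded host goes down.
    vms_down = min(vms_per_host, total_vms)
    return {
        "hosts": hosts,
        "vms_down_on_failure": vms_down,
        "share_down": vms_down / total_vms,
    }


if __name__ == "__main__":
    # 200 VMs is an assumed estate size, used only for illustration.
    for ratio in (10, 20, 50):
        result = consolidation_tradeoff(total_vms=200, vms_per_host=ratio)
        print(ratio, result)
```

A lower ratio shrinks the number of VMs hit by one host failure, but the host count (and with it the hardware, licensing and power bill) grows quickly, which is the TCO pressure described above.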

Now that they have adopted virtualization, they do know how HA works, as I have personally demonstrated it to the business and explained how HA works on ESX servers. They pay less for the system than they would for standalone servers, yet they gain HA. They are happy with, and accept, the 20-minute recovery time if a host goes down due to a hardware failure on the ESX server. And if 20 minutes is not acceptable to them today, what about the users who still refuse to host their systems on the virtual infrastructure? Many systems are still running on standalone hosts without physical clustering. VMware has done a great job by providing ESX clusters and virtual machine heartbeat monitoring. Users will not get this if they stay on standalone hosts.

If you would like to minimize downtime caused by an ESX host failure, you can always build a Microsoft cluster or a Linux cluster on top of the virtual machines. Of course, this requires extra effort to manage and maintain, just like a cluster of two physical systems. Another choice you may consider is the Fault Tolerance feature from VMware. In either case, you should always configure rules to control DRS activity so that systems serving the same function are always spread across multiple ESX hosts in the same cluster.
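The idea of keeping same-function systems on different hosts can be checked with a few lines of code. The sketch below is plain Python and does not call any VMware API; the VM names, host names and the function name are hypothetical, and in a real cluster the placement data would have to come from your own inventory export.

```python
from collections import defaultdict


def anti_affinity_violations(placement, groups):
    """Find hosts that run more than one VM of the same functional group.

    placement: dict mapping VM name -> host name
    groups:    dict mapping group name -> list of VM names
    Returns {group: {host: [vms]}} for every host that breaks the rule.
    """
    violations = {}
    for group, vms in groups.items():
        by_host = defaultdict(list)
        for vm in vms:
            by_host[placement[vm]].append(vm)
        # A host holding two or more VMs of one group is a single point of failure.
        clashes = {host: names for host, names in by_host.items() if len(names) > 1}
        if clashes:
            violations[group] = clashes
    return violations


if __name__ == "__main__":
    # Hypothetical inventory: both web servers ended up on the same host.
    placement = {"web1": "esx1", "web2": "esx1", "db1": "esx1", "db2": "esx2"}
    groups = {"web": ["web1", "web2"], "db": ["db1", "db2"]}
    print(anti_affinity_violations(placement, groups))
```

A check like this only reports the problem. Keeping the VMs apart over time is still the job of the rules you configure in the cluster.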

From my experience deploying multi-tier application systems on the x86 platform, I would say application design is even more critical to improving uptime. Intelligent applications today are able to scale dynamically across tiers, with automatic failover and load balancing enabled. If any application server goes down, users are automatically redirected to an available application server. Still, even with all of this planned and in place, there is much more to address to prevent single points of failure, such as SAN storage and networking.
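As a minimal sketch of the redirect behaviour described above, the class below does round-robin load balancing and skips servers that are marked down. It is an illustration only: the class name, the server names and the manual mark_down/mark_up calls are my assumptions, and a real load balancer would drive them from health checks.

```python
class FailoverBalancer:
    """Round-robin balancer that skips application servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server, or raise if none is available."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy application server available")


if __name__ == "__main__":
    lb = FailoverBalancer(["app1", "app2", "app3"])  # hypothetical server names
    print(lb.route(), lb.route())
    lb.mark_down("app3")  # simulate a failed application server
    print(lb.route())     # users are sent to a surviving server instead
```

With this kind of design in the application tier, a host failure that takes one application server down is absorbed by the others while HA restarts the failed VM.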

In my opinion, a high consolidation ratio in virtualization today does not mean the risk is increasing. It depends heavily on the architecture planning, consideration, design and implementation by the team.