What is the most likely cause of the failure?

You have 50 virtual machines (VMs) running across five ESXi hosts in a DRS Cluster in a production environment for a number of weeks. Several VMs running on one of the ESXi hosts suddenly blue-screen. What is the most likely cause of the failure?

A. A patch recently applied to all virtual machines.

B. A driver recently applied to all virtual machines.

C. A VMkernel panic on the ESXi host running the failed virtual machines.

D. A hardware problem on the ESXi host running the failed virtual machines.

4 Comments on “What is the most likely cause of the failure?”

  1. Daniel says:

    I agree.
    A host HW problem would probably show itself as a Purple Screen of Death or just a drop.
    A and B can’t be it, as the failure affects only ONE ESXi host.
    C – never heard of a VMkernel panic…

  2. Mark Dean says:

    I agree that this is another vague and nonsensical question. The only time I’ve ever experienced bunches of VMs blue-screening or kernel panicking was when there were LUN issues where the LUNs simply disappeared out from under the VMs. Not a fun day, for sure. I have also experienced issues with ESXi drivers that caused ALL VMs on a host to simply drop off the network; this was the Broadcom bnet2 10Gb NIC driver in HP 495C blades. Poke around the driver section on vmware.com, where there are updated drivers fixing issues, and you’ll see some scary stuff that happens to hosts.

    Before I continue, how many of you folks were using ESX 2.x? Do you remember the infamous VMware Tools memory leak that occurred on Windows 2003 OSes? I believe it was the actual tools service that had the issue. There’s an example of “A” or “B” being a correct answer. Remember, there are drivers and services in the VMware Tools package.

    Having said that, as with all these tests, it all comes down to wording. I may be stretching it a bit, but the question didn’t say ALL VMs on a host (which is what would happen with a PSOD), just “Several”, and they were all on the same host. It is definitely in the realm of possibility that one of the physical CPUs experienced some sort of failure, or that a multibit memory error that ECC could not correct corrupted the memory locations where _some_ VMs were running, causing BSODs while leaving the VMkernel and the _other_ VMs OK. Other hardware failures could create a similar symptom.

    So as others have pointed out, D makes the most sense in the context of the question.
