Wednesday, January 21, 2009

Another ESX bug

I was planning to upgrade my VirtualCenter from version 2.5 Update 1 to version 2.5 Update 3 when I received an email from VMware. It states that there is an issue with ESX 3.5 U3 that could adversely affect my environment by causing I/O failures on SAN LUNs. I guess ESX 3.5 U3 can't go into my environment, so I won't be getting any benefits from VC 2.5 U3. I told my boss to hold off on the upgrade to VC 2.5 U3.

Here are the details of the email:

ISSUE DETAILS:

VMware ESX and ESXi 3.5 U3: I/O failure on SAN LUN(s), with the LUN queue blocked indefinitely. This occurs when VMFS3 metadata updates are being made at the same time that a failover to an alternate path occurs for the LUN on which the VMFS3 volume resides. The affected releases are ESX 3.5 Update 3 and ESXi 3.5 U3 Embedded and Installable, with both Active/Active and Active/Passive SAN arrays (Fibre Channel and iSCSI).

PROBLEM STATEMENT AND SYMPTOMS:

ESX or ESXi host may get disconnected from VirtualCenter

All paths to the LUNs are in standby state

esxcfg-rescan might take a long time to complete, or never complete (hangs)

VMkernel logs show entries similar to the following:

Queue for device vml.02001600006006016086741d00c6a0bc934902dd115241 49442035 has been blocked for 6399 seconds.

Please refer to KB 1008130.

SOLUTION:

A reboot is required to clear this condition.

VMware is working on a patch to address this issue. The knowledge base article for this issue will be updated after the patch is available.

NEXT STEPS:

If you encounter this condition, please collect the following information and open an SR with VMware Support:

1. Before rebooting, collect a vsi dump using /usr/lib/vmware/bin/vsi_traverse.

2. Reboot the server and collect the vm-support dump.

3. Note the activity around the time the first “blocked for xxxx seconds” message appears in the VMkernel log.
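Step 3 boils down to finding the first “blocked for xxxx seconds” line in the VMkernel log. Below is a minimal Python sketch of that search, assuming the log has been copied off the host as a plain-text file and that the message text matches the sample quoted in the email. The script, its function names and the regex are hypothetical helpers of my own, not a VMware tool, and the timestamp/hostname prefix of real log lines is deliberately not assumed.

```python
import re
import sys
from typing import Iterable, NamedTuple, Optional

# Hypothetical pattern modeled on the sample message in the VMware email.
# It anchors only on the message text, not on any timestamp/host prefix.
BLOCKED_RE = re.compile(
    r"Queue for device (?P<device>.+?) has been blocked for (?P<seconds>\d+) seconds"
)


class BlockedEvent(NamedTuple):
    line_no: int  # 1-based line number in the log file
    device: str   # device identifier exactly as printed in the message
    seconds: int  # how long the queue had already been blocked
    text: str     # the full log line, kept for context


def find_first_blocked(lines: Iterable[str]) -> Optional[BlockedEvent]:
    """Return the first 'blocked for N seconds' message, or None if absent."""
    for line_no, line in enumerate(lines, start=1):
        match = BLOCKED_RE.search(line)
        if match:
            return BlockedEvent(
                line_no=line_no,
                device=match.group("device"),
                seconds=int(match.group("seconds")),
                text=line.rstrip("\n"),
            )
    return None


def main() -> int:
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} <vmkernel-log-file>", file=sys.stderr)
        return 2
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        event = find_first_blocked(log)
    if event is None:
        print("no 'blocked for N seconds' messages found")
        return 0
    # The message says the queue had *already* been blocked for N seconds,
    # so the activity worth noting may start about N seconds before this line.
    print(f"first blocked-queue message at line {event.line_no}")
    print(f"  device:  {event.device}")
    print(f"  blocked: {event.seconds} s")
    print(f"  {event.text}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the first message already reports an accumulated blocking time, it is worth reading the log from a bit before that line, not just at it.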

1 comment:

Byron Zhao said...

VMware has finally released a patch for this bug; see KB 1006651.