Wednesday, November 02, 2011

How to make DRACT run in a 64-bit system

Dell™ Remote Access Configuration Tool (DRACT) provides a central console to discover and configure the Dell Remote Access Controllers (DRACs) on all the servers in your network. It automates some common repetitive tasks, such as firmware upgrades and Active Directory (AD) authentication configuration.

As of Nov 2011, the latest version of DRACT is 1.0, which runs only on 32-bit Windows systems. Nowadays 64-bit systems are used almost everywhere, and setting up a separate 32-bit system just for DRACT is a waste of resources. After some research, I found out that we CAN make it run in a 64-bit environment.

DRACT installs on a 64-bit system without problems. The symptom that it is not working is that it cannot discover any DRAC in your network. Here is what you need to do to make it work.
1) You need corflags.exe from the Microsoft SDK. You can install the SDK, or you can just copy corflags.exe from a system that already has the SDK installed. CorFlags.exe (the CorFlags Conversion Tool) lets you configure the CorFlags section of the header of a portable executable (PE) image.
2) Run the following command against the RACT.exe executable (adjust the path to match your DRACT installation folder):
corflags.exe /32bit+ "D:\Program Files (x86)\Dell\RACT\RACT.exe"
The /32bit+ option sets the 32-bit flag, so the executable always runs under WOW64 on a 64-bit system. (WOW64 is the compatibility environment, included in 64-bit Windows, that enables 32-bit applications to run on a 64-bit system.)

That's it. DRACT will now be able to discover the DRACs at the IP addresses you provide.

Wednesday, November 17, 2010

How to Cancel a Stuck VMware Tools Install

A co-worker could not VMotion a VM off a vSphere host. The error was "The virtual machine is installing VMware Tools and cannot initiate a migration operation." Right-clicking the VM and choosing "End VMware Tools Install" didn't help. After some digging, we found two ways to resolve it.

1) Select the VM, then on the toolbar at the top of the vSphere Client click the CD/DVD icon -> CD/DVD Drive 1 -> Disconnect. This should resolve the issue, and you should then be able to VMotion the VM. This mostly happens to Linux VMs after a VMware Tools upgrade.

2) I found this on Bob Plankers' blog, The Lone Sysadmin. It works great too.
First, you need the ID of the VM (all on one line if it wraps):
/usr/bin/vmware-cmd /vmfs/volumes/datastore-name/vm-folder/vmx-file.vmx getid
Then cancel the install using that ID:
/usr/bin/vmware-vim-cmd vmsvc/tools.cancelinstall idnumber
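The two commands above can be wrapped into one small helper. This is a minimal sketch, not from Bob's post: it assumes vmware-cmd prints the ID as a line like "getid() = 1234", and the function names parse_vm_id and cancel_tools_install are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: cancel a stuck VMware Tools install for one VM.
# ASSUMPTION: "vmware-cmd <vmx> getid" prints a line like "getid() = 1234".

# Read vmware-cmd output on stdin and print just the numeric VM ID.
parse_vm_id() {
  sed -n 's/^getid() = \([0-9][0-9]*\).*/\1/p'
}

# $1 = full path to the VM's .vmx file on the datastore.
cancel_tools_install() {
  local vmx="$1" id
  id=$(/usr/bin/vmware-cmd "$vmx" getid | parse_vm_id)
  if [ -z "$id" ]; then
    echo "could not determine the VM ID for $vmx" >&2
    return 1
  fi
  /usr/bin/vmware-vim-cmd vmsvc/tools.cancelinstall "$id"
}
```

Usage: cancel_tools_install /vmfs/volumes/datastore-name/vm-folder/vmx-file.vmx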

Wednesday, September 08, 2010

P2V fails at 94% after finished cloning disks

Issue:
The physical server rebooted itself when VMware Converter reached 94% or 98%. By the time it crashed, it had already finished cloning its disks; the failure happened while Converter was trying to connect to or reconfigure the VM. The resulting VM always blue-screens (BSOD) when it boots.

Cause:
The cause of the physical server's reboot is unknown. It could involve the physical hardware, the OS, or the application, and none of them could be pinpointed as the source of the failure.
The cause of the BSOD in the VM is that the driver for the virtual disk controller isn't in place and the VM hasn't been reconfigured to run in a virtual environment.

Solution:
In VMware Converter, run "Configure Machine" and use the wizard to configure the VM. When it finishes, the VM should power on and boot into Windows.

Thursday, November 19, 2009

VMware Update Manager 4.0

Can't download patches via VMware Update Manager 4.0

My environment is known for its complex firewalls and proxies, and it often takes a lot of time and effort to make an application work. VMware Update Manager 4.0 was no exception.

Installing VUM 4 was very straightforward. Then came the hard part: it couldn't download any patches, for either ESX hosts or VMs. After lots of testing, and some help from the VMware Communities forum, I finally made it work. Here is what I did.

1) Have the proxy server allow HTTP/HTTPS to the following 3 domains: *.vmware.com, *.shavlik.com, and *.microsoft.com. The last domain is not for downloading patches. It is for Test Connection, since that is one of the websites VUM uses to test connectivity.

2) Set up the proxy settings. Do not put http:// in front of your proxy server name in the box. If the proxy requires authentication, make sure the credentials you enter are correct.
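Before (or after) changing the VUM settings, it can help to check from a machine behind the same proxy that the proxy actually lets these domains through. This is a rough sketch using curl, which is my own choice and not part of VUM. The proxy name, the port, and the function names are placeholders. normalize_proxy_host mirrors rule 2 by stripping any http:// or https:// prefix from the proxy name.

```shell
#!/bin/bash
# Hypothetical helpers for checking the proxy path VUM needs.

# Strip a scheme prefix and trailing slash: VUM wants a bare host name.
normalize_proxy_host() {
  local host="$1"
  host="${host#http://}"
  host="${host#https://}"
  printf '%s\n' "${host%/}"
}

# $1 = proxy host, $2 = proxy port, $3 = URL to fetch through the proxy.
# Optional $4 = "user:password" if the proxy requires authentication.
# Prints the HTTP status code; 000 means no HTTP response came back.
check_through_proxy() {
  local proxy
  proxy="http://$(normalize_proxy_host "$1"):$2"
  if [ -n "$4" ]; then
    curl -sS -o /dev/null -w '%{http_code}\n' --max-time 20 \
         -x "$proxy" --proxy-user "$4" "$3"
  else
    curl -sS -o /dev/null -w '%{http_code}\n' --max-time 20 -x "$proxy" "$3"
  fi
}
```

Usage (proxy.example.com and 8080 are placeholders): check_through_proxy proxy.example.com 8080 https://hostupdate.vmware.com/software/VUM/PRODUCTION/index.xml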


If you are seeing errors like these, they could be caused by your proxy settings:

httpDownload, 730 Error 12007 from WinHttpSendRequest for url https://hostupdate.vmware.com/software/VUM/PRODUCTION/index.xml

or

'httpDownload' 4284 ERROR [httpDownload, 726] Error 12029 from WinHttpSendRequest for url https://hostupdate.vmware.com/software/VUM/PRODUCTION/index.xml
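The number after "Error" in those log lines is a WinHTTP error code. As a small aid, here is a sketch that pulls the code out of a VUM log line and names it. The mapping covers only the two codes above plus the timeout code 12002, and the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: name the WinHTTP error in a VUM httpDownload log line.
# Only a few well-known WinHTTP codes are mapped here.
explain_vum_error() {
  local code
  # Keep the first "Error NNNNN from WinHttpSendRequest" code found on stdin.
  code=$(sed -n 's/.*Error \([0-9][0-9]*\) from WinHttpSendRequest.*/\1/p' | head -n 1)
  case "$code" in
    12002) echo "12002 ERROR_WINHTTP_TIMEOUT: the request timed out" ;;
    12007) echo "12007 ERROR_WINHTTP_NAME_NOT_RESOLVED: the server name could not be resolved" ;;
    12029) echo "12029 ERROR_WINHTTP_CANNOT_CONNECT: the connection to the server failed" ;;
    "")    echo "no WinHttpSendRequest error found" ;;
    *)     echo "$code: unmapped WinHTTP error code" ;;
  esac
}
```

Usage: grep WinHttpSendRequest <your VUM log file> | explain_vum_error. Both codes here point at name resolution or connectivity, which fits a proxy problem.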


Please check out the following KB for more information.
http://kb.vmware.com/kb/1012926

Friday, June 05, 2009

How to replace SSL cert in VirtualCenter

Replacing an SSL cert can cause encryption problems between ESX and VC. Doing it improperly can lose permission settings, and the environment can take a long time to stabilize.

To generate a new SSL cert, we can follow the instructions in KB 1009092 or in Leo's blog. The installation procedure they describe may work for some people, but it didn't work for many others. Here are the basic steps needed for a smooth change.

1) Disable HA

2) Disconnect ESX from VC

3) Stop VC service

4) Replace SSL cert

5) Start VC service

6) Reconnect ESX to VC with root/pw
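A common cause of trouble at step 4 is a certificate that does not belong to the private key it is installed with. A quick openssl check compares the RSA modulus of both files. This check is my addition and not part of the KB procedure. The file names rui.crt and rui.key in the usage line are assumptions about your setup, and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-check for step 4: does the new certificate match its key?

# $1 = certificate (PEM), $2 = unencrypted RSA private key (PEM).
# Succeeds only if both files carry the same RSA modulus.
cert_matches_key() {
  local c k
  c=$(openssl x509 -noout -modulus -in "$1") || return 2
  k=$(openssl rsa -noout -modulus -in "$2" 2>/dev/null) || return 2
  [ -n "$c" ] && [ "$c" = "$k" ]
}

# Show who the certificate was issued to and when it expires.
cert_summary() {
  openssl x509 -noout -subject -enddate -in "$1"
}
```

Usage: cert_matches_key rui.crt rui.key && cert_summary rui.crt. The moduli are compared as captured strings instead of piping them into a hash, so a failed openssl call is reported rather than silently matching.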


If there are still issues, such as a vim.faultlogin error, we can remove the VC agent and delete vpxuser from the service console (SC). Reconnecting ESX to VC will recreate the user and reinstall the VC agent.

Here are a couple of KBs and articles that might be helpful when changing the SSL cert:

Enabling Server-Certificate Verification for Virtual Infrastructure Clients

Configuring Custom SSL Certificates in VirtualCenter 2.5

Wednesday, April 29, 2009

ESX WRITE10 error

Recently a WRITE10 error on one of my ESX hosts caught my attention. It occurs more than 10 times every second.

Apr 29 12:01:10 cla1011 vmkernel: 11:22:45:34.946 cpu4:1077)WARNING: VSCSI: 5291: WRITE10 past end of virtual device with 29365, length 128

After searching Google and the VMware Communities without finding detailed information or a solution, I turned to VMware technical support. The support engineer sent me their internal KB:



Symptoms

Repeatedly logging messages similar to either of the following in /var/log/vmkernel (or /var/log/messages on ESXi):

Feb 5 15:44:46 USPLVS02 vmkernel: 63:05:31:58.181 cpu3:1129)WARNING: VSCSI: 5292: WRITE10 past end of virtual device with 33554432 numBlocks, offset 33554351, length 128

Feb 12 17:03:04 pa-tse-h02 vmkernel: 156:05:50:47.889 cpu0:1174)WARNING: VSCSI: 3430: READ10 past end of virtual device with 20971520 numBlocks, offset 20980737, length 16

These messages indicate that I/O is being attempted that is outside the boundaries of the virtual device (virtual disk). In layman's terms, the VM has a list of ten items, and the guest OS is asking for the 12th item on the list.
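To put numbers on the "list of ten items" analogy, the three values in each message can be compared directly. This sketch (function name hypothetical) parses numBlocks, offset and length from a message and prints how far past the last block the request ends. For the two sample messages above it gives 47 and 9233 blocks; the second request starts beyond the end of the disk entirely. It prints nothing for lines that lack the numBlocks field.

```shell
#!/bin/bash
# Hypothetical helper: how far past the end of the virtual disk a logged
# READ10/WRITE10 request reaches, in blocks (offset + length - numBlocks).
overrun_blocks() {
  sed -n 's/.*with \([0-9][0-9]*\) numBlocks, offset \([0-9][0-9]*\), length \([0-9][0-9]*\).*/\1 \2 \3/p' |
  while read -r num_blocks offset length; do
    echo $(( offset + length - num_blocks ))
  done
}
```

Usage: grep 'past end of virtual device' /var/log/vmkernel | overrun_blocks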

Resolution


To find out which VM is responsible for these, the World ID (WID) must be determined from the log messages. The WID is after the cpu specifier, and before the WARNING in the above messages. In the case of the WRITE10 message above, the WID is 1129; for the READ10 message the WID is 1174.
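Pulling the WID out by hand is easy for one line. For many lines, a sed expression can do it. This is a sketch that assumes the messages keep the cpuN:WID) layout shown above; extract_wid is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the world ID (WID) from vmkernel log lines,
# i.e. the number between "cpuN:" and ")" in each line read on stdin.
extract_wid() {
  sed -n 's/.*cpu[0-9][0-9]*:\([0-9][0-9]*\)).*/\1/p'
}
```

Usage, counting warnings per WID: grep -E '(READ|WRITE)10 past end' /var/log/vmkernel | extract_wid | sort | uniq -c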

If the VM is still running, look in /proc/vmware/sched/cpu: the vcpu column (the first one) will list the number identified in the logs.

To determine the VM responsible if it is not running:

cat `ls -rt vmkern*` | less

Find the first instance of the log message (searching with "/WRITE10" or "/READ10" will likely find it for you). Then search backwards in the logs for the WID value. In less, this is done with "?", e.g. ?1129. The search begins just before the top line on screen, and pressing 'n' finds the next match. Keep searching earlier in the logs until you find something similar to:

Feb 12 16:51:55 pa-tse-h02 vmkernel: 156:05:39:38.873 cpu2:1173)Sched: vm 1174: 4836: adding 'vmm0:ProblemVMName': group 'host/user/pool0': cpu: shares=2911 min=0 max=-1

The text will show you the name of the problematic VM after the vmm entry. In this case, "adding 'vmm0:ProblemVMName'" shows that the VM causing the issue is named ProblemVMName.
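The backwards search in less can also be scripted. This sketch makes the same assumptions as the KB text: the log is fed in chronological order (for example with the cat `ls -rt vmkern*` command above), and the VM was registered with an "adding 'vmm0:<name>'" line for the same WID. It remembers the latest such name and prints it when it reaches the first READ10/WRITE10 warning from that WID. find_vm_for_wid is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: find the VM behind a WID by scanning a vmkernel log
# (read on stdin, oldest line first), like the manual search in less.
# $1 = the WID taken from the READ10/WRITE10 warning.
find_vm_for_wid() {
  awk -v wid="$1" -v q="'" '
    # Remember the latest "adding vmm0:<name>" entry logged for this WID.
    index($0, "Sched: vm " wid ": ") {
      marker = "adding " q "vmm0:"
      start = index($0, marker)
      if (start > 0) {
        rest = substr($0, start + length(marker))
        name = substr(rest, 1, index(rest, q) - 1)
      }
    }
    # At the first READ10/WRITE10 warning from this WID, report the name.
    /(READ|WRITE)10 past end of virtual device/ && index($0, ":" wid ")WARNING") {
      if (name != "") print name
      exit
    }
  '
}
```

Usage (in /var/log): cat `ls -rt vmkern*` | find_vm_for_wid 1174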

If you look at the contents of the descriptor file for the offending Virtual Machine's disks, you will find an entry listing the number of cylinders for the virtual disk. As an example:

ddb.geometry.cylinders = "2088"

In this case, the virtual disk has 2088 cylinders. Running "fdisk -l" against the flat file of the virtual disk will return information similar to:

You must set cylinders.
You can do this from the extra functions menu.

Disk ANSGOOD-flat.vmdk: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

             Device Boot      Start         End      Blocks   Id  System
ANSGOOD-flat.vmdk1   *           1        2089    16779861    7  HPFS/NTFS
Partition 1 has different physical/logical endings:
     phys=(1023, 254, 63) logical=(2088, 254, 63)


Note that in this case, the End value for the partition in the partition table is 2089, exceeding the number of cylinders set in the descriptor file. If the partition table were correct, it would show 2088 as the End value instead of 2089. As a result of this incorrect partition table, the operating system attempts I/O past the end of the virtual disk.
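The comparison in this note can be automated. Here is a sketch, assuming the descriptor line looks like the ddb.geometry.cylinders example above and that you read the End value off "fdisk -l" yourself. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for the cylinder check described in the KB note.

# $1 = the small .vmdk descriptor file (not the -flat.vmdk data file).
# Prints the value of ddb.geometry.cylinders.
descriptor_cylinders() {
  sed -n 's/^ddb\.geometry\.cylinders[[:space:]]*=[[:space:]]*"\([0-9][0-9]*\)".*/\1/p' "$1"
}

# $1 = descriptor file, $2 = partition End cylinder from "fdisk -l".
# Succeeds only if the partition ends within the disk's cylinder count.
partition_fits() {
  local cyl
  cyl=$(descriptor_cylinders "$1")
  [ -n "$cyl" ] && [ "$2" -le "$cyl" ]
}
```

Usage, with ANSGOOD.vmdk as an example descriptor name: partition_fits ANSGOOD.vmdk 2089 || echo "partition runs past the end of the disk"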

Extending the VMDK just enough to contain the partition might fix this. Because it is an invalid combination of settings, though, it cannot safely be assumed that this will work. Another possible fix is to make the partition table fit within the disk by correcting its ending value. Moving the data to a new, properly configured disk/partition table/file system is the best bet. The state of the file system after modifying the VMDK or partition table isn't known: it may be damaged by the changes, or it may already be damaged. Give the customer the options and let them choose how to handle the changes to their system, as they can best judge how they want to protect their data.



The fix is either to extend the VMDK file or to shrink the partition. Extending seems safer than shrinking. I chose to use VMware Converter. By the way, VMware Converter 4 offers some nice features over the previous version, 3.0.3.

Friday, April 17, 2009

How to upgrade HBA firmware

In my case, I needed to upgrade the firmware on an Emulex HBA in my ESX 3.5 Update 1 host. Here is my hardware:

Server: Dell PowerEdge 6950

HBA: Emulex LPe1150-E

  1. Download the HBAnyware Library and Utilities Kit from the Emulex download section for Dell.

  2. Download the latest Emulex LPe1150-E firmware (as of this writing, it is 2.80A4)

  3. Upload both the HBAnyware kit and the firmware to the ESX host

  4. Verify the firmware version before the upgrade
    • cat /proc/scsi/lpfc/*

  5. Install HBAnyware by running
    • rpm -ivh elxvmwarecorekit-2.1a40-1.i386.rpm

    • The binary executable files will be stored in /usr/sbin/hbanyware/

  6. List the HBAs that HBAnyware can manage
    • /usr/sbin/hbanyware/hbacmd ListHBAs

  7. Upgrade firmware in HBA
    • /usr/sbin/hbanyware/hbacmd Download 10:00:00:00:c9:63:f4:19 /path/to/firmware (e.g. /root/wf280a4.all), using the WWPN of your HBA as shown by ListHBAs

  8. Now we can verify the upgraded firmware version
    • cat /proc/scsi/lpfc/*
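Steps 7 and 8 can be wrapped with a couple of sanity checks so that a typo in the WWPN or the firmware path is caught before hbacmd runs. This sketch assumes the /usr/sbin/hbanyware/hbacmd location from step 5 and the Download syntax from step 7. upgrade_hba_firmware and is_wwpn are hypothetical names.

```shell
#!/bin/bash
# Hypothetical wrapper around steps 7 and 8; HBACMD path as installed in step 5.
HBACMD=/usr/sbin/hbanyware/hbacmd

# True if $1 looks like a WWPN: eight hex bytes separated by colons.
is_wwpn() {
  printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){7}[0-9A-Fa-f]{2}$'
}

# $1 = WWPN of the HBA (from "hbacmd ListHBAs"), $2 = firmware image file.
upgrade_hba_firmware() {
  local wwpn="$1" image="$2"
  if ! is_wwpn "$wwpn"; then
    echo "not a WWPN: $wwpn" >&2
    return 1
  fi
  if [ ! -r "$image" ]; then
    echo "cannot read firmware image: $image" >&2
    return 1
  fi
  # Step 7: flash the firmware; step 8: show the driver/firmware info again.
  "$HBACMD" Download "$wwpn" "$image" && cat /proc/scsi/lpfc/*
}
```

Usage: upgrade_hba_firmware 10:00:00:00:c9:63:f4:19 /root/wf280a4.all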