Workaround for vSphere 5.1 Guest Unable to collect IPv4 routing table

Folks, this vSphere 5.1 error has been the subject of an interesting discussion on a VMware Community Forum for a few weeks now. In fact, I ran into the same issue myself on a new build in the vCloud Service Evaluation cloud a couple of weeks ago. Being a good citizen and VMware employee, I decided to start an internal discussion to see where it could lead. Below is a screenshot of the error you MIGHT see; however, what I can tell you now is that this error actually has NOTHING to do with the root cause, so let me explain.

[Screenshot: vSphere guest console showing the error]

The vSphere Setup To Reproduce:

  1. vSphere 5.1 and/or vCloud Director
  2. CentOS 6.3 Guest (although other versions also seem to hang)
  3. Tools installed post OS installation
  4. Wait about 10-15 minutes
  5. Reboot the Guest Virtual Machine

This is rather easy to reproduce and is something anyone can do. What I can say for sure is that this is not related to the tools version or even the virtual hardware version; it is something between the tools and vSphere 5.1 specifically, regardless of versions. I tested the following:

  • HW8 and Tools Version 8 running on vSphere 5.1
  • HW9 and Tools Version 9 running on vSphere 5.1
  • Uninstalled the tools and things work fine

The error itself is incidental; it just happens to be displayed while the Virtual Machine is trying to finish booting. I know this because I was asked to rename certain libraries, including libguestinfo.so, which removed the error from the screen, yet the Virtual Machine still stalled on boot. The error is completely unrelated to the underlying problem, which is that the Virtual Machine hangs on boot for about 10-15 minutes.

The Real Problem

So, in the spirit of troubleshooting, engineering used my test Virtual Machines to work through the other libraries until we isolated libtimeSync.so. Now, if you recall, way back in the early days of vCloud Director I ran into a time sync issue that was ultimately caused by NTP not being enabled on a cell. What I pointed out in that article was that EVERY virtual machine with the tools installed will ALWAYS time sync once on boot, regardless of whether the tools are set to sync with the host. Only once the tools are running is the sync-with-host setting enforced.

What appears to be happening here, and we don’t yet have the “why”, is that the time sync library is causing the hang on the initial boot. Now that we know which library is causing the hang, we can apply a quick workaround that should serve most people until we get a final root cause for why the library takes so long.

The vSphere Workaround

This is so simple you are going to wonder why I made you read the rest of the article. Well, frankly, I think it’s important to understand the troubleshooting we did to isolate the issue, so you know why it’s happening and not just the “fix”. The answer right now seems to be simple, assuming you only have access to the vSphere guest itself, as I do in a fully hosted environment:

  • Rename libtimeSync.so to libtimeSync.so.bak (or something else you choose)
  • There is a KB Article that you might want to read to see if you can try that as well, if you have access to the full back end

The File locations are as follows:

/usr/lib/vmware-tools/plugins32/vmsvc/libtimeSync.so
/usr/lib/vmware-tools/plugins64/vmsvc/libtimeSync.so
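
A minimal sketch of the rename, run as root inside the guest, could look like the following. The function name rename_timesync and its optional path prefix are hypothetical helpers of mine for illustration, not part of VMware Tools; the only details taken from above are the two file locations and the .bak suffix.

```shell
#!/bin/sh
# Sketch only: rename libtimeSync.so in both plugin directories.
# rename_timesync and its optional prefix argument are hypothetical
# (the prefix just lets you try this against a scratch directory first).
rename_timesync() {
    prefix="${1:-}"   # empty prefix means the real filesystem root
    for arch in plugins32 plugins64; do
        lib="${prefix}/usr/lib/vmware-tools/${arch}/vmsvc/libtimeSync.so"
        if [ -f "$lib" ]; then
            # Keep the original under a new name so it is easy to restore.
            mv "$lib" "${lib}.bak" && echo "renamed: $lib"
        else
            echo "not found, skipped: $lib"
        fi
    done
}
```

Calling rename_timesync with no argument acts on the real paths; to undo the change, rename each .bak file back to libtimeSync.so.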

Since most of you using Linux are most likely running NTP, this shouldn’t interfere with the overall time sync of your guest: NTP should start up soon after boot and do its sync, and you should be good to go until a longer-term explanation and fix is put forth. This should get you by and allow you to keep the tools installed without the warm reboot hang.

I hope this helps; as always, troubleshooting is an ongoing effort of trial and error. Thanks to the vCloud Service Evaluation setup for providing a fast, easy way to reproduce the problem AND give others access to help troubleshoot it. If this had only been in my lab, it would have been much harder to provide that access.

About Chris Colotti

Chris is currently a Field CTO for Tintri. In his role he spends the majority of his time talking to customers and partners alike, helping develop use case architectures for the Tintri platform and software. He also acts as an active interface between the field and engineering/product management. Chris is active on the VMUG and event speaking circuit and is available for many events if you want to reach out and ask. Prior to this he spent close to a decade working for VMware as a Principal Architect. Before his nine-plus years at VMware, Chris was a System Administrator who evolved his career into that of a data center architect. Chris spends a lot of time mentoring co-workers and friends on the benefits of personal growth and professional development. Chris is also among the first VMware Certified Design Experts (VCDX#37) and the author of multiple white papers. In his spare time he helps his wife Julie run her promotional products business as the accountant, bookkeeper, and IT support. Chris also believes in both a healthy body and a healthy mind, and has become heavily involved with fitness as a Diamond Team Beachbody Coach using P90X and other Beachbody programs. Although technology is his day job, Chris is passionate about fitness after losing 60 pounds himself in the last few years. Now he spreads the word of both technology and fitness, along with the Team Beachbody business, through his blogs.

8 comments

  1. Great tip!

    “…when the tools running will ALWAYS time sync initially on boot if the tools are installed”

    This is tedious and would require downtime for running VMs, but couldn’t you fully disable time sync by editing the vmx file and adding:

    tools.syncTime = "FALSE"
    time.synchronize.continue = "FALSE"
    time.synchronize.restore = "FALSE"
    time.synchronize.resume.disk = "FALSE"
    time.synchronize.shrink = "FALSE"
    time.synchronize.tools.startup = "FALSE"

    I have seen this done in environments where time must be in sync at all times.

    Sources:
    * http://kb.vmware.com/kb/1189
    * http://pubs.vmware.com/vsphere-50/topic/com.vmware.vmtools.install.doc/GUID-678DF43E-5B20-41A6-B252-F2E13D1C1C49.html

    • These were in the KB article I referenced as well; however, in a vCloud setup the consumer has no access to the VMX file, and it would be extremely hard for the Provider to do this for every consumer VM spun up. I am still working internally to make sure we figure out the long-term fix beyond these workarounds.

  2. The root cause of this problem is a race condition between the network starting and VMware Tools starting.

    Before RHEL 6.3, chkconfig (traditional Unix startup management) was used for VMware Tools. VMware moved to initctl (introduced in RHEL 6.3) to start/stop VMware Tools, which causes the race condition. If you check the old VMware Tools startup script, you can see it starts VMware Tools before the network (S03 vs S10). VMware Tools doesn’t keep this dependency in 6.3 (there are two different ways to manage network and VMware Tools startup). Here is a better workaround for this issue (basically reverting to the previous release’s boot sequence):

    mv /etc/init/vmware-tools.conf /etc/vmware-tools/
    ln -s /etc/vmware-tools/services.sh /etc/rc2.d/S03vmware-tools
    ln -s /etc/vmware-tools/services.sh /etc/rc3.d/S03vmware-tools
    ln -s /etc/vmware-tools/services.sh /etc/rc5.d/S03vmware-tools
    ln -s /etc/vmware-tools/services.sh /etc/rc0.d/K99vmware-tools
    ln -s /etc/vmware-tools/services.sh /etc/rc1.d/K99vmware-tools
    ln -s /etc/vmware-tools/services.sh /etc/rc6.d/K99vmware-tools

  3. give this man a Bells….
