This is a quick post to help folks troubleshoot the startup of their vCloud Director cell services. I have seen only a few things that will prevent a cell from starting up all the way. In my experience the following are the most common, and they may appear after you had the cells running but then made changes.
- Cannot bind to IP Addresses or Certificates
- DB Connection Issues
- DNS Lookup both Forward and Reverse
- Cannot verify the transfer space
Of the four above, the two most common are database connection issues and the transfer space. Database connection issues can occur if the tablespaces fill up and the DBAs have not allowed growth, or if the database server is simply down. The transfer space issue is usually related to the mount point in /etc/fstab, which you may have updated to support multiple cells. The first place to look is $VCLOUD_HOME/logs/cell.log.
NOTE: $VCLOUD_HOME is usually /opt/vmware/cloud-director for 1.0 and /opt/vmware/vcloud-director for 1.5
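Since the transfer space problem usually traces back to the fstab entry, a quick way to confirm that an entry exists for the transfer directory is sketched below. This is only a sketch: the function name `fstab_entry_for` is my own, and it assumes the mount point is written in fstab exactly as you pass it in (no trailing slash differences).

```shell
# Print the fstab line whose mount point (second field) matches the given path.
# Returns 0 if an entry was found, 1 otherwise. Comment lines are ignored.
# The second argument is optional and defaults to /etc/fstab.
fstab_entry_for() {
    mount_point=$1
    fstab=${2:-/etc/fstab}
    awk -v mp="$mount_point" '
        $1 !~ /^#/ && $2 == mp { print; found = 1 }
        END { exit !found }
    ' "$fstab"
}
```

On a 1.0 cell you might call it as `fstab_entry_for /opt/vmware/cloud-director/data/transfer`. If nothing is printed, the transfer directory is not mounted from fstab on that cell.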
Either a database problem or a transfer space problem can prevent a cell from fully starting up. Below is an example cell.log from a fully started cell so you can compare against it. This log is rewritten every time the application restarts, so unless you kept a copy from the first time the cell started up, you can use this one for reference.
Application startup begins: 8/21/11 7:30 AM
Successfully bound network port: 80 on host address: 192.168.110.xxx
Successfully bound network port: 443 on host address: 192.168.110.xxx
Application Initialization: 9% complete. Subsystem 'com.vmware.vcloud.common.core' started
Successfully connected to database: jdbc:oracle:thin:@Oracle01.test.local:1521/orcl
Successfully bound network port: 443 on host address: 192.168.110.yyy
Successfully bound network port: 61616 on host address: 192.168.110.xxx
Successfully bound network port: 61613 on host address: 192.168.110.xxx
Application Initialization: 18% complete. Subsystem 'com.vmware.vcloud.common-util' started
Application Initialization: 27% complete. Subsystem 'com.vmware.vcloud.consoleproxy' started
Application Initialization: 36% complete. Subsystem 'com.vmware.vcloud.vlsi-core' started
Application Initialization: 45% complete. Subsystem 'com.vmware.vcloud.vim-proxy' started
Successfully verified transfer spooling area: /opt/vmware/cloud-director/data/transfer
Application Initialization: 54% complete. Subsystem 'com.vmware.vcloud.backend-core' started
Application Initialization: 63% complete. Subsystem 'com.vmware.vcloud.imagetransfer-server' started
Application Initialization: 72% complete. Subsystem 'com.vmware.vcloud.rest-api-handlers' started
Application Initialization: 81% complete. Subsystem 'com.vmware.vcloud.ui.configuration' started
Application Initialization: 90% complete. Subsystem 'com.vmware.vcloud.jax-rs-servlet' started
Application Initialization: 100% complete. Subsystem 'com.vmware.vcloud.ui-vcloud-webapp' started
Application Initialization: Complete. Server is ready in 0:46 (minutes:seconds)
Successfully initialized ConfigurationService session factory
Successfully started scheduler
Successfully started remote JMX connector on port 8999
Some key things to note in the log above. If the cell does not get past 9%, check with the DBAs, since the next step is the database connection. If the startup fails to verify the transfer space, you will get an error here; the most common reason is that the transfer space is not writable. If the issue is binding to the IP addresses or certificates, you should also see that here. Per Timo’s comment I added DNS to the list above. I have always had DNS configured in my lab, so I have not run into this particular issue. Thanks to @Timo for pointing that one out!
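The milestones discussed above can be checked mechanically. Here is a minimal sketch that greps a cell.log for the key success messages taken from the example log and reports which ones are missing. The function name `check_cell_log` and the choice of these four messages are my own; the messages themselves are copied from the sample log, so other versions may word them differently.

```shell
# Report which startup milestones appear in a cell.log.
# Prints one OK/MISSING line per milestone; returns 1 if any are missing.
check_cell_log() {
    log=$1
    status=0
    for pattern in \
        "Successfully bound network port" \
        "Successfully connected to database" \
        "Successfully verified transfer spooling area" \
        "Application Initialization: Complete"
    do
        # -F treats the pattern as a fixed string, -q suppresses output
        if grep -qF "$pattern" "$log"; then
            echo "OK       $pattern"
        else
            echo "MISSING  $pattern"
            status=1
        fi
    done
    return $status
}
```

Run it as `check_cell_log /opt/vmware/cloud-director/logs/cell.log` (adjust the path for 1.5). The first MISSING line points at the stage where the startup stalled.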
Transfer Space Permissions
In case I did not document it elsewhere, I will do so here: the permissions of $VCLOUD_HOME/data/transfer. vCloud Director creates a “vcloud” user and a “vcloud” group. When you remount the transfer directory over NFS, BOTH the UID and GID for “vcloud” must be set on the mount point. If only one is set, the cell will not be able to validate the transfer space. Note the group and user ownership below. Ownership must also be applied RECURSIVELY to the subfolders. Depending on the storage device, you may need to look up the UID and GID and give them to the storage folks so they can set the permissions. If you have multiple cells that were NOT created from a template, they may have DIFFERENT UIDs and GIDs for “vcloud”, and you will want to edit them to be the same.
[[email protected] data]# ls -l
total 16
drwx------ 3 vcloud vcloud 4096 Aug 21 07:30 activemq
drwx------ 2 vcloud vcloud 4096 Sep 13 2010 generated-bundles
drwxr-x---+ 4 vcloud vcloud 33 Aug 23 09:17 transfer
drwx------ 2 vcloud vcloud 4096 Aug 21 07:30 txlog
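To verify the recursive ownership described above without eyeballing `ls -l` output, something like the following sketch can help. The function name `find_wrong_owner` is hypothetical; it simply wraps `find` and accepts either names or numeric IDs for the user and group.

```shell
# List every path under a directory that is NOT owned by the expected
# user AND group. Prints nothing when ownership is consistent.
# User and group may be given as names or as numeric IDs.
find_wrong_owner() {
    dir=$1
    user=$2
    group=$3
    find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}
```

For example, `find_wrong_owner /opt/vmware/cloud-director/data/transfer vcloud vcloud` should print nothing on a healthy cell. Running `id vcloud` on each cell shows the numeric UID and GID, which is what you would compare across cells and hand to the storage team.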
If all of these things are working, the cell should start up and you will see something similar to the above example in cell.log.