Load Balancing Considerations for vCloud (Updated 7/18/11)

In my previous post about setting up a vCloud in a Box for your lab, I mentioned that I had installed the Zeus Load Balancer to examine some of the options for load balancing your vCloud Director Cells.  Based on my lab testing, I wanted to share a few findings that some folks may find interesting.  Generally speaking, a Load Balancer is easy to manage, provides some high availability across multiple cells, and gives users a layer of abstraction, as with any other stateless web service.  Bear in mind there are specific configuration requirements when using multiple cells, which are covered only at a basic level below.

Configuration Requirements For Multiple Cells

  • Create an fstab entry on each Cell that mounts an NFS share on the transfer folder, similar to this:

192.168.120.2:/nfs/vCloudTransfer /opt/vmware/cloud-director/data/transfer/ nfs     intr    0 0

  • Ensure the transfer mount permissions are properly set.  The vcloud user AND group must have access to the share.  Some NAS devices may require running chmod and chown first, or the Cell may not start.  You will know because the log will state that the transfer folder is not writable.  In my lab, when this was mounted the owner and group were root and needed to be modified first.

drwxr-x---+ 4 vcloud vcloud   33 Nov 20 18:14 transfer

  • Configure the additional Cells per the instructions found in the Installation Guide.  If you got the first cell to start after the transfer folder changes the second one should also start with the same modifications.
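As a rough way to sanity-check the fstab entry on each Cell, here is a minimal Python sketch that parses one fstab line and confirms it mounts an NFS export on the transfer folder. The function and class names are my own for illustration, and the transfer path is simply the one from the example above.

```python
from dataclasses import dataclass

# Transfer directory path taken from the fstab example above.
TRANSFER_DIR = "/opt/vmware/cloud-director/data/transfer"


@dataclass
class FstabEntry:
    device: str
    mount_point: str
    fs_type: str
    options: list
    dump: int
    fsck_pass: int


def parse_fstab_line(line):
    """Parse one fstab line; return None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    # The last two fields are optional in fstab and default to 0.
    dump = int(fields[4]) if len(fields) > 4 else 0
    fsck_pass = int(fields[5]) if len(fields) > 5 else 0
    # Normalize a trailing slash so "/x/transfer/" equals "/x/transfer".
    mount_point = fields[1].rstrip("/") or "/"
    return FstabEntry(fields[0], mount_point, fields[2],
                      fields[3].split(","), dump, fsck_pass)


def is_transfer_mount(entry):
    """True when the entry mounts an NFS export on the transfer directory."""
    return entry.fs_type.startswith("nfs") and entry.mount_point == TRANSFER_DIR
```

You could feed it each line of /etc/fstab and flag a Cell where no line passes is_transfer_mount. It only checks the entry itself, not the ownership or permissions on the mounted folder.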

Handling Of SSL Certificates For vCloud Cells

Q.  What is the best way to handle SSL Certificates for multi-cell deployments—self-signed or through trusted CAs when DNS entries are used to point to the virtual IPs of the load balancer?

A.  It depends. There are a couple of ways to handle this and the deciding factor is the load balancer itself. In most cases we need to understand that the provider will also create a DNS record for the virtual IP.  For this example, let’s use vcloud.company.com. If the CA is not trusted, the user will always see an “Untrusted” error on self-signed certificates.  The self-signed certificates in vCD typically expire in a short timeframe, so most customers are generating trusted certificates.  Based on this basic understanding, there are three real options I found that may work for you.  Additionally, today there is no way to connect a Load Balancer to the Cells using HTTP, only HTTPS, which can cause a problem with some load balancers.

  1. If the load balancer does NOT support SSL OR if it cannot do BOTH SSL offload as well as HTTPS to the pools, which is the case with F5 and Zeus, then each cell should generate an individual CSR or self-signed certificate.  When creating the CSR or self-signed certificate on each cell, use the same FQDN in the request that matches the DNS entry of the virtual IP (vcloud.company.com).  This means the user will not get a DNS/hostname mismatch when connecting to the load balancer, because the user will be directly hitting each server with the HTTPS request.  For F5 and other devices this is referred to as SSL passthrough mode.  If you do happen to use the cell hostname in the certificate generation, and the users are connecting through a DNS name to the virtual IP, their browsers will show an SSL error for name mismatch.  By the same token, if you generate the certificate on each host with the shared FQDN and attempt to connect to the individual cells, you will also see a hostname mismatch.  This situation cannot be avoided, and the lesser of two evils is to make sure the load balanced connection is the correctly resolved one.  This seems to be the most common Load Balancer configuration option.
  2. If the load balancer supports SSL offload as well as HTTPS to the Cell pools, and direct creation of CSRs, then generate the CSR for vcloud.company.com on the load balancer. The cells can then create hostname-based CSRs or self-signed certificates to match the hosts cell01.company.com and cell02.company.com. This will allow a user to hit the DNS of the virtual IP or the hostname in a browser, both without error. This assumes that the load balancer supports SSL from the device to the nodes in the pool, because in the current release of vCloud SSL is still required from the Load Balancer to the Cells.  To date I have not seen a Load Balancer support BOTH SSL offload and SSL to the hosts in the backend pool, so this option will most likely not work for a while.  As mentioned earlier, most load balancers appear to require the load balancer to node connection to be HTTP instead of HTTPS if you want SSL offload.  Unfortunately, today we do not have an option to disable HTTPS and connect to the Cells through HTTP only.  For this case, stick with scenario #1 and do not use the SSL offload feature of the load balancer.
  3. Additionally, you may want to set up a VIP for the console proxy, because without one you will be directed to the Cell you were load balanced to for the console connection.  You can in fact also load balance this by creating a VIP and DNS name for the Console proxy and redirecting all connections back to that address by configuring the administration options in vCloud Director.  IMPORTANT NOTE ABOUT THE CONSOLE CERTIFICATE: If you are using subject alternative names in your certificates, it seems the Console Connection ONLY resolves the first one in the list.  Therefore, to avoid a certificate mismatch error on your client, make sure the first name in the list is the FQDN that matches your Console Public Address defined in the external URLs section of vCloud Director.
  4. UPDATE 7/18/11:  It has also been said that trying to do SSL offload on the Console IPs does not work and you HAVE to do SSL pass-through.  I have not been able to verify this myself yet but I may try in the coming weeks.  This may be due to the fact the Console connection is NOT an HTTPS connection but rather a pure socket connection.  This fact also made me realize that an HTTPS Health monitor for this IP will NOT work.  See additional information below.
  5. Lastly, if you have installed the cells behind the load balancer, be sure to configure the Administration option for external URLs, as in the example screen shot.
[Screen shot: vCD External URLs. The IP used for the API address is the Load Balancer VIP.]
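To make the two mismatch cases above concrete, here is a small Python sketch of the name checks. It is a simplification under my own assumptions: it compares names exactly (case-insensitive) and ignores wildcard certificates, the helper names are made up for illustration, and console.company.com is a hypothetical Console Public Address.

```python
def normalize(name):
    """Lower-case a DNS name and drop any trailing dot."""
    return name.strip().lower().rstrip(".")


def matches_any(requested_host, cert_names):
    """True when the host the user typed appears among the certificate names.

    Simplified on purpose: exact match only, no wildcard handling.
    """
    wanted = normalize(requested_host)
    return any(normalize(name) == wanted for name in cert_names)


def console_san_ok(san_names, console_public_address):
    """True when the FIRST subject alternative name is the console address.

    Mirrors the observation above that the console connection seems to
    look only at the first name in the list.
    """
    if not san_names:
        return False
    return normalize(san_names[0]) == normalize(console_public_address)
```

For example, a cell certificate issued for cell01.company.com fails matches_any("vcloud.company.com", ...), which is the browser mismatch described in option 1, and a console certificate whose short name is listed before the FQDN fails console_san_ok.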

The procedures for updating and creating SSL certificates on the vCloud cells are documented in this VMware KB, as well as in the administrator guides.  Based on your configuration you will need to decide the best method of user access to the load balancer and cells.  Generally the Cells are stateless, so you can use the Load Balancer VIP/DNS for any access to the environment, whether through the built-in portal pages or the supplied vCloud APIs.

Handling Health Check Rules For The Cells

Q.  What is the best way to handle a service health monitor for the vCloud Cells?

A.  You can point your Load Balancer to https://<Cell-Hostname>/cloud/server_status (HTTPS, since the Cells do not accept plain HTTP).  UPDATE 7/18/11:  This check ONLY monitors the HTTPS IP and not the Console IP directly.  If the services are stopped, the Cell should be taken out of service, but this check does not independently monitor the Console Proxy IP for the load balanced Console connections.
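If you want to try the same check by hand before configuring the monitor, a minimal Python sketch could look like the following. It uses HTTPS because, as noted earlier, the Cells do not accept plain HTTP. The expected body text "Service is up." is my assumption about what a healthy Cell returns, so confirm it against your own Cell; the function names are illustrative only.

```python
import ssl
import urllib.request

# Assumption: a healthy Cell answers with this text. Verify on your own Cell.
EXPECTED_BODY = "Service is up."


def is_healthy(status_code, body, expected=EXPECTED_BODY):
    """Decide health from an HTTP status code and response body."""
    return status_code == 200 and expected in body


def check_cell(cell_hostname, timeout=5.0, verify_tls=True):
    """Fetch /cloud/server_status from one Cell over HTTPS."""
    url = f"https://{cell_hostname}/cloud/server_status"
    context = ssl.create_default_context()
    if not verify_tls:
        # Lab use only: accept self-signed or mismatched certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return is_healthy(resp.status, body)
    except OSError:
        # Covers refused connections, timeouts, TLS and HTTP errors.
        return False
```

Calling check_cell("cell01.company.com", verify_tls=False) against a lab Cell with a self-signed certificate should return True only when the page is reachable and contains the expected text.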

UPDATE 7/18/11:

Q.  What is the best way to monitor the Console Proxy IP’s independently?

A.  You will have to configure the Load Balancer for a pure socket connection check on port 443 to the Console Proxy IPs.  This port will not accept HTTP calls like the HTTP port does.  If you configure two separate Health Monitors and have the Console Proxy and HTTP ports in separate pools, then you can ensure independent health status for HTTP and Console connections.  This means that if your Cell HTTP services are up but the console proxy is down for some reason, the Load Balancer will prevent console connections but still allow portal connections.  Below is a screen shot of a Zeus Load Balancer set of pools, with a different Health Monitor for each pool.
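A pure socket check like the one described above only tests whether a TCP connection can be opened, with no HTTP request sent. A minimal Python equivalent, with a function name of my own choosing, is:

```python
import socket


def tcp_port_open(host, port=443, timeout=3.0):
    """Return True when a TCP connection to host:port can be established."""
    try:
        # Open the connection and close it again right away; nothing is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False
```

Pointing tcp_port_open at each Console Proxy IP gives the same up or down answer a connect-style monitor would, independent of the HTTP health page.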

Update 8/5/11:

I got word from the engineer that wrote the code that you can monitor the Console Proxy Pool with
http://<Console_ProxyIP>/sdk/vimServiceVersions.xml and not a TCP/IP Connect String.

Preventing Access To A Cell by Users

Q.  Can I use my Load Balancer to disable user access to the Cells in the load balanced pool?

A.  The easy answer is of course yes.  In most Load Balancers you can disable a node in the pool and the users will no longer have access to that Cell.  HOWEVER, it is important to note that the Cells will still be talking to each other on the back end for task scheduling.  This means a user may only access Cell 2, but their tasks and commands could be run on Cell 1.  The Cells themselves load balance tasks through an internal scheduler.  Really, the best way to ensure a Cell is FULLY offline is to simply stop the vCloud Service on the cell.  If you configured the health check above, the Load Balancer should automatically remove the Cell from the pool.  This also ensures that the Cell will not accept new tasks in the Cell Cluster.

IMPORTANT NOTE: If you stop a Cell’s service abruptly while tasks are running, those tasks will fail and will need to be restarted manually.

Q.  Is there a tool available to gracefully shut down a Cell then?

A.  YES!!  There is now a tool called the vCloud Cell Management Tool you can download and install.  It is a set of commands rather than a GUI, but it does help you “Drain” a cell properly for upgrades and other Operating System patches.  Provided the health check pages are configured, this should also take care of the load balancing.
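To illustrate what “draining” means in general, here is a conceptual Python sketch of the wait-then-stop pattern. It is NOT how the Cell Management Tool is implemented and it does not call that tool. The two callables are hypothetical hooks you would supply yourself, and the sketch assumes the Cell has already stopped accepting new tasks.

```python
import time


def drain_then_stop(count_active_tasks, stop_service,
                    poll_seconds=10.0, max_wait_seconds=600.0,
                    sleep=time.sleep):
    """Wait for running tasks to finish, then stop the service.

    count_active_tasks and stop_service are hypothetical caller-supplied
    hooks. Returns True if the service was stopped, False if the wait
    timed out with tasks still running (the service is left running).
    """
    waited = 0.0
    while count_active_tasks() > 0:
        if waited >= max_wait_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    stop_service()
    return True
```

The point of the pattern is the ordering: stopping only after the task count reaches zero avoids the failed tasks described in the note above.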

About Chris Colotti

Chris is active on the VMUG and event speaking circuit and is available for many events if you want to reach out and ask. Before this he spent close to a decade working for VMware as a Principal Architect. Prior to his nine-plus years at VMware, Chris was a System Administrator who evolved his career into that of a data center architect. Chris spends a lot of time mentoring co-workers and friends on the benefits of personal growth and professional development. Chris is also amongst the first VMware Certified Design Experts (VCDX#37), and the author of multiple white papers. In his spare time he helps his wife Julie run her promotional products business as the accountant, bookkeeper, and IT Support. Chris also believes in both a healthy body and healthy mind, and has become heavily involved with fitness as a Diamond Team Beachbody Coach using P90X and other Beachbody Programs. Although Technology is his day job, Chris is passionate about fitness after losing 60 pounds himself in the last few years.

20 comments

  1. fantastic guide. you rock dude.

  2. Great post Chris. Just what I was looking for. So we plan on having multi-node configuration with load balancer. We want to use signed certificates. If I understand right I just need two certificates one for the Public VIP and one for the Console Proxy VIP. I assume we would then use the same keystore file on all of my cell nodes, is that right? And there is no keystore values specific to the real cell node hostnames?

    • Ken, you may also be able to have hostname-based certs on the cells if the LB can do SSL to the server pool. That way any direct connection to the cells is properly encrypted with an individual hostname-based cert. I am pretty sure the Cisco and F5 units can do this. So you have your VIP certs for console and HTTP on the LB and you still have real hostname certs on your cells.

  3. We are using a Cisco ACE appliance and to my knowledge the only way we could do SSL certs on the ACE would be if vCloud could take inbound traffic over HTTP natively.

    • Interesting, I was told by someone that ACE would support SSL on the front and to the pools. We have a request to allow HTTP to cells but we are not there yet due to this very issue. If that is the case then you may not be able to do SSL on the VIP. You pass SSL through the LB to each cell but use the same cert on the cells as you originally indicated.

  4. Hi Chris,

    Just about any modern load balancer can perform both SSL offload /and/ SSL encryption to the back-end nodes – Zeus and F5 certainly can, and it looks like ACE can as well.

    Before explaining the SSL decrypt/reencrypt deployment, you can instead just run in SSL Passthrough mode. Here the load balancer treats the connection as a raw TCP connection; it cannot apply any inspection of the payload, smart load balancing etc. In Zeus’ case, there are a couple of very minor optimizations the load balancer can make (http://www.zeus.com/community/answer/whats-difference-between-ssl-virtual-server-and-simple-client-first-virtual-server-) but these are little consolation…

    For SSL decrypt/encrypt, it’s simple to set up with Zeus, and probably with other load balancers too.

    For decrypt: Create an HTTP virtual server, then enable SSL decryption, associate a certificate with it, and change the port from :80 to :443.

    To re-encrypt: Edit the pool and enable the ‘SSL Encryption’ option; make sure that the nodes in the pool correspond to the HTTPS listeners (not the HTTP ones).

    Then, all the traffic is decrypted in the Zeus device, so you can use the full inspection and rewriting capabilities, and encrypted again when sent to the nodes.

    You can use a self-signed certificate on the load balancer (in which case, users will receive a warning in their browsers), you can use a publicly signed one (for a fee), or if you run your own certificate authority, you can sign and distribute certs yourself.

    • Owen,

      This is a great addition! Thanks for the detail and I will personally try and set this up in my lab. Knowing this will allow me to test a few other ideas out. I want to verify the possible limitation on the Console Proxy that it must be pass-through only.

  5. Hi,

    We are starting a vCloud deployment, and the budget doesn't allow me to buy an F5 or a Zeus load balancer. Can you please advise me on some open source projects that can do this job? We are thinking of putting the LB on a VM in the management cluster. Is this a correct decision?

    Thank you and sorry for my bad English

  6. Hi Chris

    Thank you for your response, but both Zeus and F5 in the OVF edition are not free, so I will need to continue to look for a solution to this issue.

    Thank you.

    PT

  7. Chris, I was wondering if you can share the details on how you have the Monitor Pool script setup for the two nodes under VCD_Pool_A called “VCD_Monitor”?

    Thanks in advance!

  8. Well, even in this post you’ve indicated what to monitor for in terms of individual Cells and Console parameters to expect. I’m more interested in specifics on what you’ve leveraged on the Zeus side to build the custom monitor for each one. There is a built-in Full HTTP monitor which can parse the HTTP response body vs. a custom external program monitor to read server_status.

    Based on the screenshot you have, the console is simply using the CONNECT monitor or like you indicated you can leverage XML API call. For the individual cells, you have it listed as “VCD_Monitor”. What hides under that?

    Thanks!
    Mike

    • I used the FULL HTTPS monitor and added the server status to that monitor for VCD_Monitor. I was using the CONNECT monitor at first for the Console, but later I learned that it fills up the logs with connection logging, and I found out about the XML string. I could go in tomorrow and grab the screen shot if you want to send me an E-Mail.

      I admit on the Zeus I could not get it to work with the XML call. However my device was a bit outdated, and of course I had no support on it as it was a developer license. If you get that particular monitor working I’d love to know how on the Zeus. 🙂 I still need a new Developer license though; I changed the IP and now it has stopped working.
