How To Handle Some vCloud Director Challenges

The other day I was sent a link to a 9-slide deck titled “Life before and after vCloud Director”.  It was put together by someone I do not know, and it takes time to point out some specific challenges with vCloud Director, mostly with networking and vShield Edge.  From what I have learned, this deck was previously circulated and has recently resurfaced.  It tries to explain that datacenters after vCloud Director are “Extremely Fragile”, mainly because they use vShield Edge.  As a vCloud person myself, I felt a bit obligated to address some of these points in a more structured way.  Some of the notable points presented as “facts” in the slides are as follows:

  • “The Entire Networking Functions of vCloud Director relies on a single VM, and the Entire Datacenter performance and capabilities are then as powerful as this device…”
This is not entirely true.  The vShield Edge appliances are an optional component, deployed only if your chosen network configuration calls for them.  It should be noted that the vShield Manager appliance, however, is a required component to complete the vCloud Director configuration.  These are two different things within vCloud Director and should not be confused.  If you choose not to use vShield Edge when setting up networks, then all your networking is backed by standard vSphere switch port groups, and networking is unchanged.  The slides make some other claims about the vShield appliance:

  • “One vShield is needed for every network”
  • “It Can Fail”
  • “It has no redundancy capability”
  • “It is the firewall, router, DHCP, and Load Balancer to the vCD system”
  • “vCloud does not support other 3rd party alternatives”
  • “It creates very complex network connectivity”

A vShield appliance is only needed if you choose to NAT route the Organization networks or the vApp networks.  These NAT routed networks are not technically required, but are used if the design considerations call for them.  Of course, using them within vCloud Director is a preferred means to achieve easy multi-tenancy.  Yes, vShield Edge devices and vShield Manager could fail.  Let’s be honest…ANYTHING can fail, so that statement is pretty broad and without much merit.  However, each of these is a VM, most likely protected by VMware HA like so many other production Virtual Machines today.  There are also multiple blog posts about how VMware Fault Tolerance can be used to protect the vShield Manager.  Unfortunately, at this time FT does not work properly on the edge devices themselves, but we should see that change in the future.

The appliance is the firewall, router, DHCP, and load balancer for selected networks and Organizations, but not for the “vCD System”.  You can always use direct connected networks and external firewalls, as well as load balancers and VPN devices.  Again, vShield is NOT a requirement; it is simply a tool to assist in the design of a multi-tenant vCloud Director deployment.  We have also had folks deploy other Virtual Machines in the cloud itself to handle some of these functions, including virtual load balancers.
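To make the distinction concrete, here is a rough sketch of how you might inventory a design and see which networks actually depend on a vShield Edge appliance.  This is not vCloud Director API code.  The `NetworkMode`, `Network`, and `networks_needing_edge` names are hypothetical and exist only for this illustration, and the sketch assumes only the two connection styles discussed above: direct connected and NAT routed.

```python
from dataclasses import dataclass
from enum import Enum


class NetworkMode(Enum):
    """Hypothetical labels for how a network is connected (not a vCD API)."""
    DIRECT = "direct"          # backed by a standard vSphere port group
    NAT_ROUTED = "nat_routed"  # routed through a vShield Edge appliance


@dataclass(frozen=True)
class Network:
    """A network in the design; names and fields are illustrative only."""
    name: str
    mode: NetworkMode


def networks_needing_edge(networks):
    """Return the names of networks that depend on a vShield Edge appliance.

    Assumption for this sketch: only NAT routed networks need one.
    """
    return [n.name for n in networks if n.mode is NetworkMode.NAT_ROUTED]


if __name__ == "__main__":
    design = [
        Network("org-a-direct", NetworkMode.DIRECT),
        Network("org-b-routed", NetworkMode.NAT_ROUTED),
        Network("vapp-test-routed", NetworkMode.NAT_ROUTED),
    ]
    # prints ['org-b-routed', 'vapp-test-routed']
    print(networks_needing_edge(design))
```

In a design made up entirely of direct connected networks the list comes back empty, which is the point: the Edge appliance is a per-network design choice, not something the whole datacenter hangs on.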

I have always said in public forums that the networking is complex and is something that people need to start understanding.  This is no different than when VMware administrators needed to start to understand and learn about VLANs and trunking back in the early days.  As things evolve they inherently become more complex.  That is the nature of the beast and the new learning curve we all have to deal with.  Has storage become less complex over time?  What about networking in general with VXLAN, or other new technologies?  People in general are afraid of new complexity because it is hard, and most people fear change and learning something new.  Yes, it’s complex, life is complex.  Learn it and move on to the next thing to learn that is more complex.

Let’s be honest here.  Yes, there are some challenges with vCloud Director, in some cases beyond the networking alone, and I don’t think anybody will deny that.  The difference is that many good architects have designed around them with what I call “Creative Critical Thinking”.  The points above are narrowly focused on a few aspects and don’t tell the whole story in 9 slides.  I would submit that anyone can address many of the concerns, and many have, including some large service providers.  It’s about architecting around the challenges, some of which may even be addressed in future releases of vCloud Director.  Talk to a couple of vCloud Director customers and community experts to understand how these things can be addressed.

 

About Chris Colotti

Chris is active on the VMUG and event speaking circuit and is available for many events if you want to reach out and ask. Prior to this he spent close to a decade working for VMware as a Principal Architect. Before his nine plus years at VMware, Chris was a System Administrator who evolved his career into a data center architect. Chris spends a lot of time mentoring co-workers and friends on the benefits of personal growth and professional development. Chris is also amongst the first VMware Certified Design Experts (VCDX#37), and author of multiple white papers. In his spare time he helps his wife Julie run her promotional products as the accountant, bookkeeper, and IT Support. Chris also believes in both a healthy body and healthy mind, and has become heavily involved with fitness as a Diamond Team Beachbody Coach using P90X and other Beachbody Programs. Although Technology is his day job, Chris is passionate about fitness after losing 60 pounds himself in the last few years.

8 comments

  1. Nice post Chris.

    We identified these challenges during testing of vCD and have options for our customers if the vShield VM doesn’t meet their needs.  As you mentioned, other external network connections can be used to connect Organizations to other network devices.

  2. all VMs on routed/NATed networks deployed right now with vCD around the world are going to fail once a single server running vshield-edge as their GW fails. this means a single server failure affects many, many other servers, and this is the new feature introduced with vCD. you also need to care about SLA (BW, latency) and the double usage of vlans for every vshield-edge in the system (and there are a lot of them in systems right now, funny enough).

    solutions (to be deployed as fast as possible to avoid total crash of vCD systems) :

    1. First and foremost the gateway needs to be a robust, scalable device (one with common capabilities like FHRP, QOS and much more …)
    2. Affinity rules need to apply a logic of placing the ‘Gateway VM’ on a specific server, and this server should host only ‘Gateway VM’ devices.
    3. Affinity rules need to apply a logic of placing the ‘Redundant Gateway VM’ (if available with FHRP etc) on a totally different, dedicated server.

    This is the only way to allow SLA parameters like latency and throughput and REAL HA (not one that is provided in a few minutes and crashes an entire data center that relies on vshield edge).

    BTW : The above solution  is called ‘Network Appliance’ (pre vCD that is ….)

  3. vshield edge is also needed if you just choose to have DHCP. it is deployed all over, with customers not knowing that those caveats affect their entire data center

  4. if this is all bullshit why not let everybody download the deck and test for themselves if those things are true or not (adding notes from vmware etc ….)? aren’t we all technical consultants caring about customers’ data centers….let them figure out the truth before it’s too late even for vmware …
