There is a reason why a Validation Plan is part of the VCDX documentation. The funny thing is that many people don’t seem to take it seriously, either for the certification itself or for actual use in a production environment. When I was a young IT boy starting my career, I was taught that NOTHING goes into production without proper validation. Every application or server build had various levels of testing that had to be performed and signed off on before it was ever released to production. So why am I on a bit of a rant? Mostly because I can be; this is my blog. But I digress…
Pre-solve Network Problems in vSphere
In recent weeks I’ve had the pleasure of helping some folks with designs and with troubleshooting networking issues. What I have found is that a vast number of those issues simply stem from misconfigured VLANs, either in vSphere or upstream. What makes this disturbing is that the issues are not found until the host is put in the cluster and the first VM is migrated to it. When I asked, “Was a validation plan run to ensure all the networking was in working order prior to putting the host in production?”, the answer most of the time was dead silence.
When I was on a long-term residency, the customer was building and deploying ESX hosts almost weekly, and each host had in excess of 30 VLAN port groups. I established a simple testing process: build a VM or two and systematically put reserved testing IP addresses on each and every VLAN. We would then ensure that every port group on the new host in fact worked on ALL physical interfaces by testing through the various VLANs on one interface, then repeating the test on every other interface. This even involved pulling cables to ensure the port group teaming worked properly. You can bet that we sometimes found a trunk that was missing or incorrect on an interface here and there.
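To make the "every port group on every interface" idea concrete, here is a minimal Python sketch of that test matrix. It is not what we ran back then (that process was entirely manual), and it does not talk to vSphere at all. The function names, the port group and vmnic names, and the `fake_check` stand-in are all hypothetical; in a real plan the check would be whatever connectivity test you trust, such as pinging the VLAN gateway from the test VM while only one uplink is active.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (uplink, port_group)


def build_test_matrix(port_groups: Iterable[str], uplinks: Iterable[str]) -> List[Pair]:
    """Return every (uplink, port_group) pair, grouped by uplink.

    Grouping by uplink mirrors the manual process: walk all VLANs on one
    physical interface, then repeat on the next interface.
    """
    return list(product(uplinks, port_groups))


def run_plan(matrix: Iterable[Pair], check: Callable[[str, str], bool]) -> List[Pair]:
    """Run check(uplink, port_group) for each pair and return the pairs that failed."""
    return [(uplink, pg) for uplink, pg in matrix if not check(uplink, pg)]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    port_groups = ["VLAN10", "VLAN20", "VLAN30"]
    uplinks = ["vmnic0", "vmnic1"]

    # Stand-in for a real connectivity test; pretends one trunk is missing a VLAN.
    missing_trunks = {("vmnic1", "VLAN20")}

    def fake_check(uplink: str, port_group: str) -> bool:
        return (uplink, port_group) not in missing_trunks

    matrix = build_test_matrix(port_groups, uplinks)
    failures = run_plan(matrix, fake_check)
    print(f"{len(matrix)} checks, {len(failures)} failed: {failures}")
```

The point of generating the full matrix up front is that nothing gets skipped: 30 port groups on 4 uplinks is 120 checks, and the one you skip is the one with the missing trunk.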
Validation Testing Is Tedious
You are darn right it is. Those hosts we were building took a few hours each to test back in the day. We also did not use any automation; we just deployed templates, manually updated IPs, changed settings, etc. We did it based on a written validation plan that today could be somewhat, if not heavily, automated. If the hosts were being added to an existing cluster, this was VERY important. If it was a new cluster, we would simply use vMotion to test through each of the new hosts, since the entire cluster was not in production yet.
Testing sucks. It’s slow, boring, and tedious, BUT it is paramount to reducing the headaches you might otherwise cause later when troubleshooting something. It’s also referred to as preventative maintenance. You should not wait until something breaks to find out there is a problem; you should be proactive about it. In the end it will save you, and countless other people, a lot of wasted time. Don’t be lazy and just write a validation plan. USE it each and every time, and make sure others use it too.