I was working with a customer recently and ended up spending a lot of time talking about how to get better performance while migrating virtual machines from a source data center to a target data center using VMware HCX. Here are some of the factors (the list is not exhaustive) to consider for better HCX performance. Keep in mind that HCX is not just for VMware Cloud on AWS; it can also be used with your existing vSphere environments. With that said, some of these suggestions may not apply to the VMware Cloud on AWS version.
- Check whether jumbo frames (MTU 9000) are enabled end to end, at both the source and the target data center. One way to test is to ping from a virtual machine in the source DC with the DF (Don't Fragment) bit set. An ICMP payload of 1472 bytes exactly fills a standard 1500-byte MTU (1472 + 8-byte ICMP header + 20-byte IP header); for a 9000-byte MTU the matching payload is 8972 bytes. Then raise the payload by one byte and confirm that the ping fails with the expected “Would Fragment but DF bit set” style of error. The largest payload that passes tells you the effective MTU of the network path.
- Ensure that the HCX-IX appliance's CPU and memory are reserved and not constrained. Sharing a resource pool with other workloads in the cluster may reduce performance; a dedicated resource pool helps.
- Place the HCX-IX and WANOPT appliances on the fastest storage available in the environment (more IOPS is always better). If the environment has an all-flash SSD array, using it for the HCX-IX/WANOPT appliances is beneficial. Remember that a lot depends on how fast data can be read from storage to the host before it is transferred across.
- In my opinion, Bulk Migration is slower but far more resilient than vMotion migration. Bulk Migration performance depends on several factors, including source datastore latency, the number of hosts in the host cluster, CPU wait times, latency on the path between the source and target hosts, and target datastore latency.
- Try running vMotion on a test VM from the source DC to the target DC without WANOPT, and check the receive/transmit bandwidth on the Cloud Gateway (CGW) appliance as well as the vCenter performance stats during the migration. The requirements for long-distance, cross-vCenter vMotion need to be met.
- The HCX-IX (CGW) appliance used for migrations establishes 3 tunnels, and each can provide up to 1.2 Gbps. With WAN Optimization there is a higher chance that all 3 tunnels will be used, provided all paths are in a good state. To achieve parallelism, try to deploy multiple HCX-IX-CGW, HCX-IX-NetworkExtension, and HCX-WAN-Optimization appliances, one set per host cluster or per virtual distributed switch. Move groups of virtual machines can then be designed to spread across several host clusters, so that all of the deployed HCX-IX appliances are used.
- Cold migration is for powered-off VMs. It can’t be scheduled and supports only 8 parallel migrations (vs 100 with Bulk Migration). Bulk Migration is low downtime: the VM keeps running right up to the cutover time or scheduled window. It is worth measuring network bandwidth with a couple of cold migrations too, to get a better picture.
- Generally speaking, Bulk Migration uses a lot of I/O compared to cold migration. With Bulk Migration, your cutover moment (and downtime) is relatively short and can be scheduled. With cold migration you can get faster network speeds and less I/O, but your VM is down for the complete migration process. Hence it helps to start a Bulk Migration a few days or a week before the scheduled cutover time, so that the data is synced in time and the available bandwidth is put to use.
- Bulk Migration uses the vSphere Replication protocol to sync data (cold migration is the one that uses the VMware NFC protocol). The initial sync is the hardest hit on your storage array. After the initial sync, only the deltas are synced, which is usually faster.
- How much bandwidth is actually available on the path between the source and target environments? You may want to run a simple iperf test outside of the HCX appliances, between the same networks. In other words, start an iperf server on a test Linux virtual machine at the target data center and run the iperf client from a test Linux virtual machine at the source data center. Note the speed and compare it with the HCX speeds. Thanks to the optimization and compression built into HCX, you may get better performance with HCX.
- Reducing the number of non-HCX vMotion tasks running on vCenter during the HCX migration window can give you better HCX performance.
- Using a dedicated VMkernel port for vSphere Replication traffic at both the source and target vCenter gives much better speed than co-mingling vSphere Replication traffic with management traffic.
- Consider a dedicated Uplink network for the HCX-IX appliances between the source and target data centers, so that they have direct adjacency.
- HCX Network Extension appliances are I/O intensive, so try to vMotion any other powered-on VMs off the hosts where the HCX appliances run. Try consolidating all the HCX Network Extension appliances onto a separate, dedicated host with no other workload VMs, and then test the performance.
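
The payload sizes used in the jumbo frame check above come from simple header arithmetic: the ICMP payload is the MTU minus a 20-byte IPv4 header and an 8-byte ICMP header. Here is a minimal Python sketch that works out the payload for a given MTU and builds the matching ping command. The function names and the target address `192.0.2.10` are my own placeholders, and the flags assume the stock Linux (iputils) and Windows `ping` utilities.

```python
import platform

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP payload that fits in one unfragmented packet of this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def ping_command(target, mtu, system=None):
    """Build a ping command with the DF bit set, sized to fill the given MTU.

    Only Linux (iputils) and Windows flags are covered here; other
    platforms use different options.
    """
    system = system or platform.system()
    size = str(icmp_payload_for_mtu(mtu))
    if system == "Windows":
        # -f sets Don't Fragment, -l sets the payload size
        return ["ping", "-f", "-l", size, target]
    # -M do prohibits fragmentation, -s sets the payload size, -c 3 sends 3 probes
    return ["ping", "-M", "do", "-s", size, "-c", "3", target]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; use a VM in the target DC instead.
    for mtu in (1500, 9000):
        print(mtu, " ".join(ping_command("192.0.2.10", mtu, "Linux")))
```

If the 9000 MTU command fails while the 1500 one passes, some hop on the path is not passing jumbo frames.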
What are the steps to bypass WAN Optimization when doing a vMotion of a VM from source to target?
Edit your Service Mesh and uncheck WAN Optimization. Save, and the WAN Optimization appliance will be removed.
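
To put the tunnel numbers from the list above in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the figures quoted earlier (3 tunnels per HCX-IX appliance, up to 1.2 Gbps each) and estimates a best-case transfer time. The function names, the `efficiency` knob, and the assumption that throughput scales linearly with the number of appliances are mine; treat the result as a ceiling to compare against your iperf and HCX measurements, not a prediction.

```python
TUNNELS_PER_IX = 3     # tunnels per HCX-IX appliance, as quoted in the post
GBPS_PER_TUNNEL = 1.2  # per-tunnel maximum, as quoted in the post


def aggregate_gbps(appliances=1):
    """Theoretical ceiling, assuming every tunnel is used and scaling is linear."""
    return appliances * TUNNELS_PER_IX * GBPS_PER_TUNNEL


def transfer_hours(data_gib, gbps, efficiency=1.0):
    """Hours needed to move data_gib GiB at gbps gigabits per second.

    efficiency (0 < efficiency <= 1) is a hypothetical fudge factor for
    protocol overhead and contention; 1.0 means the full line rate.
    """
    bits = data_gib * 1024**3 * 8
    seconds = bits / (gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    ceiling = aggregate_gbps(1)
    # Example: 10 TiB of VM data through a single HCX-IX appliance
    hours = transfer_hours(10 * 1024, ceiling)
    print(f"ceiling: {ceiling:.1f} Gbps, best case: {hours:.1f} hours")
```

If your measured HCX throughput is far below this ceiling, the storage, CPU, and network factors listed above are the first places to look.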