Server Cluster High Availability: DNS and Failover

  

High availability (HA) remains one of the most difficult parts of a server cluster to get right, even as virtualization matures. HA is a hypervisor feature that a server cluster can use to limit the downtime virtual machines suffer when their host fails. VMware vSphere, Microsoft Hyper-V, and Citrix XenServer all offer high-availability capabilities that reduce the disaster recovery work in a virtual infrastructure.

Too many people roll out virtualization projects without understanding high availability. Worse, administrators who ignore HA while building the server cluster often discover later that what was meant to be a solution has become a problem in its own right.

In fact, high availability solves only part of the problem. Whichever hypervisor you use, it is a simple service that restarts virtual machines after their host fails. Continuous availability is the ideal, but with HA the virtual machines still experience some downtime while they restart.
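A minimal sketch of the restart idea, in Python, assuming a simple heartbeat model: a host whose last heartbeat is older than a timeout is treated as failed, and each of its virtual machines is assigned to a surviving host for a cold restart. The function `plan_restarts`, the 15-second timeout, and the round-robin placement are hypothetical illustrations, not how vSphere HA, Hyper-V, or XenServer actually choose targets.

```python
from itertools import cycle

# Hypothetical timeout: a host silent for longer than this is treated as failed.
HEARTBEAT_TIMEOUT_S = 15.0


def plan_restarts(last_heartbeat, vms_by_host, now, timeout=HEARTBEAT_TIMEOUT_S):
    """Return {vm: new_host} for every VM on a host whose heartbeat timed out.

    last_heartbeat: {host: timestamp of the last heartbeat seen}
    vms_by_host:    {host: [vm, ...]} registered on each host
    """
    failed = {h for h, ts in last_heartbeat.items() if now - ts > timeout}
    survivors = sorted(h for h in last_heartbeat if h not in failed)
    if not survivors:
        return {}  # nowhere to restart anything: every host has timed out
    targets = cycle(survivors)  # naive round-robin placement (hypothetical)
    plan = {}
    for host in sorted(failed):
        for vm in vms_by_host.get(host, []):
            # Each entry means a cold restart on the new host, not a live
            # migration, so the VM still goes through a reboot.
            plan[vm] = next(targets)
    return plan
```

With heartbeats of `{"esx1": 100.0, "esx2": 98.0, "esx3": 70.0}` at `now=100.0`, only `esx3` has timed out, so its VMs `web01` and `db01` are planned onto `esx1` and `esx2`. The reboot each restart implies is the downtime HA cannot remove.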

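High availability is often confused with live migration features such as XenMotion and vMotion, but they are not the same thing. Live migration moves a running virtual machine between healthy hosts; HA restarts virtual machines after their host has already failed. I have seen many server clusters run into trouble after their first host failure because administrators confused these two concepts.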

High availability technology is getting smarter, but be aware that the following issues can still bring down your server cluster.

How DNS affects high availability

With VMware HA, Domain Name System (DNS) resolution can become a serious problem. VMware HA relies on DNS resolution so that server cluster nodes can communicate with each other. Usually this is not an issue. But today many IT staff are used to treating DNS as a service that needs no management.

Part of the reason for this hands-off attitude is the dynamic DNS capability of Windows. Many administrators no longer take DNS as seriously as they once did, because dynamic DNS now automates most record management. But VMware hosts don't use dynamic DNS.

If you use VMware HA in a server cluster, make sure the management network IP addresses and their associated hostnames are all registered in DNS. Whenever you change or add hosts in the virtual environment, these records must be maintained by hand. If DNS is not configured properly, VMware displays fairly obvious warnings, but they are easy to overlook until it is too late.
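The records VMware HA depends on can be checked with a short script. The sketch below, assuming you keep an inventory mapping each host's management hostname to its IP address, verifies forward and reverse resolution with Python's standard `socket.gethostbyname` and `socket.gethostbyaddr`. The function name `check_dns`, the inventory format, and any example hostnames are hypothetical.

```python
import socket


def check_dns(hosts, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that each management hostname resolves to its expected IP and back.

    hosts: {hostname: expected_management_ip} (hypothetical inventory format)
    Returns a list of human-readable problems; empty if everything matches.
    """
    problems = []
    for name, expected_ip in hosts.items():
        try:
            ip = forward(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{name}: forward lookup failed ({exc})")
            continue
        if ip != expected_ip:
            problems.append(f"{name}: resolves to {ip}, expected {expected_ip}")
        try:
            rname = reverse(expected_ip)[0]  # (hostname, aliases, addresses)
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems.append(f"{expected_ip}: reverse lookup failed ({exc})")
            continue
        # Compare short names so 'esx1' and 'esx1.lab.local' count as equal.
        if rname.split(".")[0].lower() != name.split(".")[0].lower():
            problems.append(f"{expected_ip}: reverse record is {rname}, expected {name}")
    return problems
```

Running it after every host addition or IP change, for example `check_dns({"esx1.example.local": "10.0.0.11"})`, catches a missing or stale record before HA depends on it. The resolvers are parameters so the check can be exercised without a live DNS server.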

DNS resolution in a multi-site server cluster

DNS resolution issues can also affect multi-site Hyper-V clusters. Windows Failover Clustering, which Hyper-V uses, can now span subnets. In some ways this architecture is good, because you no longer need complex networking to stretch a cluster across locations. On the other hand, virtual machines that fail over to the second site usually land on a different subnet, with a different IP address.

This is not a big problem on the server side, but it causes problems for clients. Clients cache DNS records for as long as each record's time-to-live (TTL) value allows, and after a failover those cached records are out of date. In physical disaster recovery this is usually a minor concern, because you have bigger issues to deal with, such as "The data center is down!" But in a virtual infrastructure, problems appear when a virtual machine unexpectedly fails over to the alternate site.
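A toy model makes the stale-cache window concrete. The Python sketch below, assuming a client that simply caches an answer until its TTL expires, shows a record changing at failover while the client keeps using the old address. The class names, hostname, addresses, and the 1200-second TTL are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    ip: str
    expires_at: float  # absolute time (seconds) when this cached answer expires


class ClientDnsCache:
    """Toy model of a client-side DNS cache that honors record TTLs."""

    def __init__(self, authoritative):
        # {name: (ip, ttl_seconds)}; stands in for the authoritative DNS server.
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name, now):
        rec = self.cache.get(name)
        if rec is not None and now < rec.expires_at:
            return rec.ip  # served from cache; may be stale after a failover
        ip, ttl = self.authoritative[name]  # cache miss or expired: ask the server
        self.cache[name] = CachedRecord(ip, now + ttl)
        return ip
```

In this model, a client that resolved `app.example.local` to `10.1.0.20` at time 0 keeps getting `10.1.0.20` at time 300 even after the record has been changed to `10.2.0.20`, and only sees the new address once 1200 seconds have passed. Lowering the TTL on records for services that can fail over between sites shortens that window, at the cost of more DNS queries.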

This issue is not unique to Hyper-V clusters. Any server cluster that recovers virtual machines onto a different subnet runs into similar problems.

Importance of Failover Order

The DNS issue highlights the fact that failover order matters in server cluster management. Some server clusters handle failover order better than others. For example, VMware HA lets the cluster decide on its own where to restart virtual machines. Others, such as Hyper-V, let administrators specify where a virtual machine should go after a failure.

What you don't want is a virtual machine landing on an inappropriate cluster node, such as one at the far end of a multi-site cluster or one that is already overloaded. Pay close attention to your failover order settings to keep the cluster load balanced.
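One way to express "avoid the far site and overloaded nodes" is a placement rule like the Python sketch below. It assumes each node reports its site, a normalized load between 0 and 1, and whether it is up, and that the administrator supplies a preferred order of nodes. The function name, the 0.8 load threshold, and the data layout are hypothetical; this is not the logic VMware HA or Failover Clustering actually uses.

```python
def choose_failover_node(vm_site, nodes, preferred_order, max_load=0.8):
    """Pick a failover target for a VM (hypothetical placement rule).

    nodes: {node: {"site": str, "load": float in [0, 1], "up": bool}}
    preferred_order: node names in the administrator's preferred order
    Returns the first preferred node that is up, in the VM's own site, and
    below max_load; otherwise the least-loaded up node in that site;
    None if the VM's site has no usable node.
    """
    def usable(n, limit):
        info = nodes[n]
        return info["up"] and info["site"] == vm_site and info["load"] < limit

    # 1. Honor the administrator's order, skipping remote or overloaded nodes.
    for n in preferred_order:
        if n in nodes and usable(n, max_load):
            return n
    # 2. Fall back to the least-loaded node that is up in the same site.
    same_site = [n for n in nodes if usable(n, float("inf"))]
    if same_site:
        return min(same_site, key=lambda n: nodes[n]["load"])
    # 3. Only the remote site is left: leave that decision to an administrator.
    return None
```

Returning `None` rather than a node at the other site is a deliberate choice in this sketch: a cross-site failover then becomes an explicit decision instead of an accident.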

What should you do about host isolation?

Host isolation occurs when a server cluster host is still running but can no longer communicate with the other nodes. The problem is that the isolated host keeps running its virtual machines. In a VMware HA isolation event, these virtual machines are typically connected to different virtual switches and are unaffected by the isolation. The cluster may want to restart these virtual machines elsewhere, but it cannot do so while the isolated host still holds the locks on the virtual machines' disk files.

There are several ways to fix this problem. The obvious best option is to bring the isolated host back into communication with the cluster. If you can't, you need to shut down its virtual machines so that the surviving cluster nodes can fail them over. Review the isolation response settings of your HA solution to determine which setting meets your needs. Many products let you choose whether to keep virtual machines running or shut them down when their host is isolated.
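The isolation decision can be sketched as follows, assuming a host probes its cluster peers and one or more isolation addresses (for example, its management gateway) and considers itself isolated only when none of them answer. The response values mirror the choices described above (keep running or shut down), plus a hard power-off; the enum, the function name, and the probe format are hypothetical and not any vendor's API.

```python
from enum import Enum


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "leave_powered_on"
    SHUT_DOWN = "shut_down"  # clean guest shutdown, then survivors restart the VM
    POWER_OFF = "power_off"  # hard power-off: faster, but in-flight guest data is at risk


def isolation_actions(reachable, vms, response):
    """Decide what a possibly isolated host should do with its VMs.

    reachable: {peer_or_isolation_address: bool} results of this host's probes
    vms:       names of the VMs running on this host
    Returns a list of (vm, action) pairs; empty if the host is not isolated.
    """
    if any(reachable.values()):
        return []  # at least one peer or address answered: not isolated
    if response is IsolationResponse.LEAVE_POWERED_ON:
        return [(vm, "keep running") for vm in vms]
    action = "shut down" if response is IsolationResponse.SHUT_DOWN else "power off"
    # Stopping the VM releases its disk files so a surviving node can restart it.
    return [(vm, action) for vm in vms]
```

Keeping VMs running suits the case where they sit on separate virtual switches and continue serving users; shutting them down suits cases where you would rather have the surviving nodes take them over.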

High availability is a useful component of a virtual infrastructure, but it is no substitute for configuring the server cluster's key settings and managing its load balance. Neglect them and you will face many tricky problems.

Copyright © Windows knowledge All Rights Reserved