High availability of Linux operating system clusters


System availability can be greatly improved through hardware redundancy or software. With hardware redundancy, the system maintains multiple redundant components, such as hard disks and network links, so that when one component fails a redundant one can continue to provide service. With the software approach, software monitors the operational status of the machines in the cluster and, when a machine fails, starts a backup machine to take over the failed machine's work so that service continues.

In general, you need to ensure both high availability of the cluster manager and high availability of the nodes. Eddie, Linux Virtual Server, Turbolinux, Piranha, and Ultramonkey all use high availability solutions similar to the one shown in Figure 1.

Figure 1. A schematic diagram of high availability solutions

High Availability Cluster Manager

To shield failures of the cluster manager, a backup manager must be established. A heartbeat program runs on both the main manager and the backup manager, and each monitors the health of the other by sending messages such as "I am alive". When the backup manager fails to receive such messages within a certain period of time, it activates the fake program, which lets the backup manager take over the main manager's IP address and continue to provide services. When the backup manager again receives "I am alive" messages from the main manager, it deactivates fake, releasing the IP address, and the main manager resumes cluster management.
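The backup manager's decision logic described above can be sketched as a small state machine. This is a minimal illustrative sketch only: the real Heartbeat and fake tools are separate programs, and the class and method names here are assumptions, not their actual interfaces.

```python
import time

class BackupManager:
    """Sketch of the backup manager's takeover logic (illustrative names)."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout                  # seconds of silence before takeover
        self.last_heartbeat = time.monotonic()  # when we last heard "I am alive"
        self.active = False                     # True once we hold the service IP

    def on_heartbeat(self):
        """Called whenever an 'I am alive' message arrives from the main manager."""
        self.last_heartbeat = time.monotonic()
        if self.active:
            # Main manager is back: deactivate fake, releasing the IP address
            # so the main manager can resume cluster management.
            self.active = False

    def tick(self):
        """Called periodically; activates fake if the main manager has gone silent."""
        if not self.active and time.monotonic() - self.last_heartbeat > self.timeout:
            # The real fake program would claim the main manager's IP here.
            self.active = True
        return self.active
```

In a real deployment the heartbeat messages travel over a dedicated serial or network link, and "activating fake" means configuring an IP alias and announcing it with gratuitous ARP; the sketch models only the timeout decision.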

High Availability for Nodes

High availability of the nodes can be achieved by constantly monitoring the state of each node and of the applications running on it; when a node is found to have failed, the system is reconfigured and its workload is handed over to the nodes that are functioning properly. As shown in Figure 1, the system monitors the health of the service programs on the real servers in the cluster by running the mon daemon on the cluster manager. For example, fping.monitor checks at regular intervals whether each real server is still running, http.monitor monitors the HTTP service, ftp.monitor monitors the FTP service, and so on. If a real server is found to be down, or a service on it has failed, all rules for that server are removed from the cluster manager. Conversely, once the server is found to be able to provide service again, all the corresponding rules are added back. In this way, the cluster manager automatically masks failures of the servers and of the service programs running on them, and lets a real server rejoin the cluster system once it is up and running again.
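The rule add/remove decision above can be sketched as follows. This is an assumption-laden stand-in: real deployments run mon with monitors like fping.monitor and adjust forwarding rules with a tool such as ipvsadm, whereas here the rule set is modeled as a plain set of server addresses and the probe is an injectable function.

```python
def update_rules(active_servers, server, probe):
    """Probe one real server and add or remove its forwarding rules.

    active_servers: set of servers the cluster manager currently routes to
    server:         the real server being checked
    probe:          health-check function (stand-in for fping.monitor,
                    http.monitor, ftp.monitor, etc.) returning True if healthy
    """
    if probe(server):
        # Server answers again: add back all of its rules.
        active_servers.add(server)
    else:
        # Server or its service is down: remove its rules so the
        # manager stops directing clients to it.
        active_servers.discard(server)
    return active_servers
```

Running this check periodically for every real server reproduces the behavior described above: failed servers are masked automatically, and recovered servers rejoin the cluster without manual intervention.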
