Analysis of the main points of Linux cluster technology

Nowadays, many enterprises and websites run the Linux operating system, and its advantages have led many of them away from Microsoft Windows. This article discusses Linux cluster technology, so that you can get to know Linux better and understand the powerful features its clustering offers, and use it as a reference for your own systems.

One reason Linux is so competitive is that it runs on inexpensive, widely available PCs, with no need to purchase costly dedicated hardware. Add the appropriate cluster software to several PCs running Linux, and you get a Linux cluster with excellent reliability, load capacity, and computing power. Each server in the cluster is called a node.

Depending on their focus, Linux clusters can be divided into three categories.

The first is the high-availability cluster, which runs on two or more nodes so that services continue in the event of a system failure. The design philosophy of a high-availability cluster is to minimize service downtime. Well-known examples include Turbolinux TurboHA, Heartbeat, and Kimberlite.

The second is the load-balancing cluster, which aims to provide load capacity proportional to the number of nodes and is well suited to high-traffic Web services. Load-balancing clusters often have some high-availability features as well. Turbolinux Cluster Server and Linux Virtual Server are both load-balancing clusters.

The third is the supercomputing cluster, which can itself be divided into two kinds according to how tightly the computation is coupled. One is the task-slice mode: the computing job is divided into task slices, the slices are assigned to the nodes and computed independently on each, and the partial results are then combined into the final result. The other is the parallel-computing mode, in which the nodes exchange large amounts of data during the computation, so tightly coupled computations can be performed. The two kinds suit different types of data processing work. With supercomputing cluster software, an enterprise can use a handful of PCs to complete computing tasks that would normally require a supercomputer. Such software includes Turbolinux EnFusion and SCore.
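The task-slice mode can be illustrated with a short Python sketch. This is only a conceptual model, not the API of any cluster product named above; worker threads stand in for cluster nodes, and the `square` work function and slice layout are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    """Work performed on a single element."""
    return n * n

def process_slice(task_slice):
    """Computation done independently on one node; in task-slice mode
    the slices need no data exchange with one another."""
    return sum(square(x) for x in task_slice)

def run_on_cluster(data, num_nodes):
    """Divide the job into task slices, compute each slice separately
    (worker threads stand in for cluster nodes here), then combine the
    partial results into the final answer."""
    data = list(data)
    slices = [data[i::num_nodes] for i in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_sums = pool.map(process_slice, slices)
    return sum(partial_sums)

print(run_on_cluster(range(1, 101), 4))  # prints 338350 (sum of squares 1..100)
```

The key property of the task-slice mode is visible here: each slice is computed with no communication between workers, so the work scales with the number of nodes.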

High-availability clusters and load-balancing clusters work differently and suit different kinds of services. In general, load-balancing clusters are suitable for services that provide relatively static data, such as HTTP services. High-availability clusters are suitable both for static services such as HTTP and for services that provide dynamic data, such as databases. High-availability clusters can host dynamic-data services because their nodes share the same storage medium, such as a RAID box. In other words, in a high-availability cluster each service's user data is stored as a single copy on the shared storage device, and at any given moment only one node can read and write that data.

Take Turbolinux TurboHA as an example, with two nodes A and B in the cluster. The cluster provides only an Oracle service, and the user data resides on partition /dev/sdb3 of the shared storage device. In the normal state, node A provides the Oracle database service, with /dev/sdb3 mounted by node A at /mnt/oracle. When a system failure occurs and is detected by the TurboHA software, TurboHA stops the Oracle service and unmounts /dev/sdb3 on node A. The TurboHA software on node B then mounts the partition on node B and starts the Oracle service there. Because the Oracle service has a virtual IP address, when the service switches from node A to node B the virtual IP address is also bound to node B, so users can still reach the service at the same address.
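The failover sequence just described can be sketched as a toy state machine. This is a simplified simulation of the steps (stop the service, unmount, remount, rebind the virtual IP), not TurboHA's actual implementation; all names are illustrative.

```python
class Cluster:
    """Toy model of a two-node high-availability cluster."""

    def __init__(self):
        self.active_node = "A"    # node currently running Oracle
        self.mounted_on = "A"     # node that has /dev/sdb3 mounted
        self.vip_bound_to = "A"   # current holder of the virtual IP

    def failover(self, to_node):
        """Switch the Oracle service to the surviving node."""
        # 1. Stop the service and unmount the shared partition on the
        #    failed node (a real cluster may need an I/O barrier here).
        self.mounted_on = None
        # 2. Mount the shared partition on the surviving node.
        self.mounted_on = to_node
        # 3. Rebind the virtual IP so clients keep using the same address.
        self.vip_bound_to = to_node
        # 4. Start the service on the surviving node.
        self.active_node = to_node

cluster = Cluster()
cluster.failover("B")
print(cluster.active_node, cluster.vip_bound_to)  # prints: B B
```

Note that at every moment at most one node has the partition mounted, which is exactly the single-writer property discussed above.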

As the analysis above shows, a high-availability cluster provides no load balancing for a single service: it improves the reliability of the system as a whole but does not increase its load capacity. Of course, a high-availability cluster can run multiple services distributed sensibly across different nodes; for example, node A provides the Oracle service while node B provides a Sybase service. This can be seen as load balancing in a certain sense, but it is balancing across multiple services, not within a single service.

A load-balancing cluster suits services that provide relatively static data, such as HTTP services. Because there is usually no shared storage medium between the nodes of a load-balancing cluster, the user data is copied onto every node that provides the service. The following briefly introduces the working mechanism of a load-balancing cluster, using Turbolinux Cluster Server as an example. The cluster has a master node called the Advanced Traffic Manager (ATM). Suppose the cluster is used only to provide an HTTP service and the remaining nodes are all configured as HTTP service nodes. A user's page request is sent to the ATM, because the service's external IP address is bound to the ATM. The ATM forwards the requests it receives evenly to the service nodes, and each service node sends the corresponding Web page directly back to the user. Thus, if 1,000 HTTP page requests arrive in one second and the cluster has 10 service nodes, each node handles 100 requests. Seen from the outside, the cluster behaves like a single high-speed computer ten times as fast as one node. This is load balancing in the true sense.
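The ATM's even distribution of requests can be illustrated with a simple round-robin dispatcher. This sketch shows the idea only; it is not Cluster Server's actual scheduling algorithm, and the node names are made up.

```python
from itertools import cycle
from collections import Counter

def dispatch(requests, nodes):
    """Round-robin dispatch: the traffic manager forwards each incoming
    request to the next service node in turn, so load spreads evenly."""
    assignment = Counter()
    ring = cycle(nodes)
    for _ in requests:
        assignment[next(ring)] += 1
    return assignment

nodes = [f"node{i}" for i in range(1, 11)]   # 10 HTTP service nodes
load = dispatch(range(1000), nodes)          # 1,000 requests in one second
print(load["node1"])  # prints 100: each node handles 100 requests
```

With 1,000 requests and 10 nodes, every node ends up with exactly 100 requests, matching the example in the text.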

But the ATM must handle all 1,000 page requests itself; won't it become the bottleneck of the cluster? Because a page request carries relatively little data while the returned page content is comparatively large, this approach remains quite efficient. Nor does a failure of the ATM bring down the whole system: Turbolinux Cluster Server lets one or more machines be configured as backup ATM nodes, and when the primary ATM fails, one of the backups becomes the new primary and takes over its work. The load-balancing cluster thus also has a degree of high availability.

HTTP pages are relatively static, but sometimes they need to change. Turbolinux Cluster Server provides a data synchronization tool that makes it easy to propagate page changes to all the nodes that provide the service.

Now consider combining a high-availability cluster with a load-balancing cluster. If a user has a minimum cluster of two nodes, can it deliver the benefits of both at once? The answer is yes. Since high-availability clusters suit services with dynamic data and load-balancing clusters suit services with static data, suppose the cluster is to provide both an Oracle service and an HTTP service. The user installs both the Turbolinux TurboHA and Turbolinux Cluster Server software on nodes A and B. For the TurboHA software, node A is the node on which Oracle normally runs and node B is the Oracle service's backup node. For the Cluster Server software, node B is the primary ATM node, node A is the backup ATM node, and both nodes are HTTP service nodes.
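The combined two-node configuration can be summarized as a small role table. The sketch below simply records the role assignment described above (the role names are illustrative, not any product's configuration syntax).

```python
# Role assignment for the combined two-node cluster: each node is the
# active side of one service and the backup side of the other.
roles = {
    "A": {"oracle": "active", "atm": "backup",  "http": "service node"},
    "B": {"oracle": "backup", "atm": "primary", "http": "service node"},
}

def surviving_services(failed_node):
    """If one node fails, the other holds a role for every service,
    so neither the Oracle nor the HTTP service is interrupted."""
    survivor = "B" if failed_node == "A" else "A"
    return sorted(roles[survivor])

print(surviving_services("A"))  # prints ['atm', 'http', 'oracle']
```

Whichever node fails, the survivor still covers all three roles, which is why no service is lost.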

In this way each node plays both roles, and the user gets a highly available Oracle service as well as a load-balanced HTTP service. Even if one node fails, neither the Oracle service nor the HTTP service is interrupted.

For a single service, however, you cannot have high availability and load balancing at the same time. Either the service keeps a single copy of its data on a shared storage device, accessed by one node at a time, which gives high availability; or the data is copied onto each node's local hard disk and user requests are spread across the nodes simultaneously, which gives load balancing.

Because a high-availability cluster is designed to minimize service downtime, much attention goes to service failover: when a service on one node fails, the failure must be detected quickly and the service switched to another node. At the same time, protecting data integrity during the switch must not be neglected.

Under what circumstances can data integrity be destroyed? At least two nodes in a high-availability cluster are connected to the shared storage device, and for a partition holding a file system (as opposed to a raw partition), the file system will be corrupted if both nodes read and write it at the same time. An I/O barrier is therefore needed to prevent this from ever happening.

The purpose of the I/O barrier is to ensure that a failed node can no longer read or write the shared partitions of a service. There are several ways to implement it. Kimberlite uses a hardware power switch: when one node fails and the other node detects it, the healthy node sends a command through a serial port to the power switch connected to the failed node, cutting its power briefly and then restoring it, so that the failed node is forcibly restarted.

I/O barriers come in many forms. For storage devices that support the SCSI Reserve/Release commands, commands issued through the SCSI generic (sg) interface can also implement an I/O barrier: the healthy node uses the SCSI Reserve command to lock the shared storage device, ensuring the failed node cannot read or write it. If the cluster software on the failed node is still running and finds that the shared storage device has been locked by its peer, it restarts its own node to return to normal operation.
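The Reserve/Release idea amounts to a mutual-exclusion lock on the shared device, which can be sketched as follows. This is a conceptual model only; a real implementation issues SCSI commands through the sg interface, which this sketch does not attempt.

```python
class SharedDevice:
    """Toy model of a storage device supporting Reserve/Release."""

    def __init__(self):
        self.reserved_by = None  # node currently holding the reservation

    def reserve(self, node):
        """SCSI Reserve: succeeds only if the device is free or already
        held by the same node; any other node is locked out."""
        if self.reserved_by in (None, node):
            self.reserved_by = node
            return True
        return False

    def release(self, node):
        """SCSI Release: only the holder may drop the reservation."""
        if self.reserved_by == node:
            self.reserved_by = None

disk = SharedDevice()
disk.reserve("B")         # the surviving node locks the shared device
print(disk.reserve("A"))  # the failed node is refused: prints False
```

Once the healthy node holds the reservation, the failed node's reserve attempt fails, which is its cue to restart itself as described above.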

The above has introduced the basic principles of Linux cluster technology along with several well-known software packages. In short, Linux cluster technology makes the most of the advantages of PCs and networks, can deliver considerable performance, and is a promising technology. I hope this article helps you learn more about Linux clustering.
