Introduction to Linux Cluster Technology

1 Cluster Definition

A cluster is a set of service entities that work together to provide a more stable, efficient, and scalable service platform than a single service entity can. From the outside, a cluster appears as one independent service entity; inside, two or more service entities coordinate and cooperate to complete a series of complex tasks.

A cluster generally consists of two or more servers, each called a cluster node. The nodes can communicate with each other in two ways: heartbeat monitoring over an RS-232 serial line, or a heartbeat run over a separate, dedicated network card. A cluster therefore monitors service status between nodes. It must also be able to extend its set of service entities, so that a service entity can be added or removed flexibly.

In a cluster, the same service can be provided by multiple service entities. When a node fails, another node can automatically take over the failed node's resources, which keeps the service running without interruption. This is the cluster's automatic failover function.

A cluster system must have shared data storage, because every node has to provide the same service. Whichever node runs an application, the application data is stored centrally in the storage space shared by the nodes; each node's own system only runs the application's service and stores the application's program files.

In summary, building a cluster system requires at least two servers, plus serial lines, cluster software, and a shared storage device (such as a disk array).

Linux-based clusters stand out in a variety of enterprise applications for their computing power, scalability, availability, and price/performance ratio. They have become a focus of attention among Linux users: with a working knowledge of Linux clusters, high-performance applications can be built at low cost, saving money for businesses and individuals. Large Chinese websites such as Sina and NetEase use Linux clusters to build high-performance web applications, and the search engine Google uses tens of thousands of Linux servers to form one large cluster. These examples illustrate the status and importance of clusters in Linux applications.

2 Cluster Features and Functions

2.1 High Availability and Scalability

1. High availability

Some real-time applications must run continuously, 24 hours a day. Because of software, hardware, network, and human factors, a single-server environment can hardly meet this requirement, and building a cluster is a good alternative. One of the greatest advantages of a cluster is its high availability: when a service fails, the cluster can automatically switch the service from the failed node to a standby node, so the service continues and the business keeps running.

2. Scalability

As business volume grows and the existing service entities can no longer meet demand, one or more service nodes can be added to the cluster dynamically to meet the application's needs and raise the cluster's overall performance. This is the scalability of the cluster.

2.2 Load Balancing and Error Recovery

1. Load balancing
The biggest feature of a load-balancing cluster is that it shares the system load flexibly and effectively. A load-sharing policy defined for the cluster distributes client requests among the back-end service nodes. For example, a round-robin (polling) policy distributes requests evenly across the service nodes in turn. A least-load policy can be defined instead: when a request comes in, the cluster determines which service node is relatively idle and sends the request to that node.

2. Error recovery

When a task fails on one node before it completes, another service node should be able to finish it. This is the error recovery function of the cluster: a failed task is redirected to another node so that every task can be completed.

2.3 Heartbeat Monitoring and Drift IP

1. Heartbeat monitoring

To achieve load balancing, provide highly available services, and perform error recovery, a cluster uses heartbeat monitoring. Heartbeats are carried over heartbeat links. A heartbeat link can be an RS-232 serial cable, a dedicated network card, or a shared disk array. The number of heartbeat links should equal the number of cluster nodes minus one.

If network cards carry the heartbeat, each node needs two: one is directly connected over a private network to the corresponding card on the peer machine and monitors the peer's heartbeat; the other is connected to the public network and provides external services. The IP addresses of the heartbeat card and the service card must not be in the same network segment. The efficiency of heartbeat monitoring directly affects how long a failover takes.
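
As an illustration of the heartbeat monitoring described above, here is a minimal Python sketch of timeout-based failure detection. It is not taken from any particular cluster package: the names `HeartbeatMonitor` and `heartbeat_links_needed`, the node names, and the 3-second timeout are all assumptions made for the example, and timestamps are passed in explicitly instead of being read from a real serial line or network card.

```python
from dataclasses import dataclass, field


def heartbeat_links_needed(node_count: int) -> int:
    """Number of heartbeat links suggested in the text: cluster nodes minus one."""
    if node_count < 2:
        raise ValueError("a cluster needs at least two nodes")
    return node_count - 1


@dataclass
class HeartbeatMonitor:
    """Remembers when each peer was last heard from (hypothetical helper)."""
    timeout: float  # seconds of silence before a peer counts as failed (assumed value)
    last_seen: dict = field(default_factory=dict)

    def record(self, peer: str, now: float) -> None:
        # Called whenever a heartbeat arrives from `peer`.
        self.last_seen[peer] = now

    def failed_peers(self, now: float) -> list:
        # A peer is considered failed once it has been silent longer than the timeout.
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


print(heartbeat_links_needed(2))      # 1

monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("node-a", now=0.0)
monitor.record("node-b", now=0.0)
monitor.record("node-a", now=2.0)     # node-a keeps beating, node-b goes silent
print(monitor.failed_peers(now=4.0))  # ['node-b']
```

In this sketch the timeout is the knob that links monitoring to failover time: a shorter timeout detects a dead peer sooner, at the price of a higher chance of declaring a slow but healthy peer failed.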
The cluster maintains effective internal communication between nodes through heartbeat technology.

2. Drift IP address

In a cluster, besides the real IP address of each service node, there is also a drift IP address (commonly also called a floating or virtual IP). Why "drift"? Because the address is not fixed to one node. For example, in a two-node hot standby setup, the drift IP normally sits on the primary node; when the primary node fails, the drift IP is automatically switched to the standby node. To keep the service uninterrupted, the IP address through which the cluster provides external services must therefore be this drift IP. A node's own IP address can also provide external services, but when that node fails and the service switches to another node, clients are still addressing the IP of the failed node, and the service is interrupted.
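
The difference between addressing the drift IP and addressing a node's own IP can be shown with a small Python model. This is only a sketch under stated assumptions: the `Cluster` class, the node names, and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and no real network interface is configured.

```python
class Cluster:
    """Toy hot-standby cluster with one drift (floating) IP; illustrative only."""

    def __init__(self, drift_ip, nodes):
        # `nodes` maps node name -> that node's own real IP; the first entry is the primary.
        self.drift_ip = drift_ip
        self.nodes = dict(nodes)
        self.healthy = set(self.nodes)
        self.drift_owner = next(iter(self.nodes))

    def fail(self, name):
        """Mark a node as failed and move the drift IP if that node held it."""
        self.healthy.discard(name)
        if self.drift_owner == name:
            survivors = [n for n in self.nodes if n in self.healthy]
            self.drift_owner = survivors[0] if survivors else None

    def connect(self, ip):
        """Return the node that answers a request sent to `ip`, or None."""
        if ip == self.drift_ip:
            return self.drift_owner
        for name, real_ip in self.nodes.items():
            if real_ip == ip and name in self.healthy:
                return name
        return None


cluster = Cluster("192.0.2.100", {"primary": "192.0.2.11", "standby": "192.0.2.12"})
cluster.fail("primary")
print(cluster.connect("192.0.2.100"))  # standby (the drift IP followed the service)
print(cluster.connect("192.0.2.11"))   # None (the failed node's own IP no longer answers)
```

Clients that use the drift IP keep being served after the failover, while clients that use the primary node's own address lose the service, which is the case the paragraph above warns about.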

3 Classification of Clusters

3.1 High-Availability Clusters

1. The concept of high availability

The full name is High Availability Cluster (HA Cluster). High availability means keeping a service usable to the greatest possible extent. As the name suggests, such a cluster provides long-lasting, uninterrupted service for the user's applications. When an application fails, or the system hardware or network fails, the application can switch automatically and quickly from one node to another, so external service continues without interruption. This is the function of a high-availability cluster.

2. Common HA clusters

Dual-machine hot standby, dual-machine mutual standby, and multi-machine mutual standby all fall into the category of high-availability clusters. These clusters generally have two or more nodes.

Figure: Typical dual-machine hot standby structure

Dual-machine hot standby is the simplest application mode, often called active/standby mode. It uses two servers. One is the host (active): it runs the application and provides external services. The other is the standby: it has the same application installed as the host but does not start the service, and waits in reserve. The host and the standby monitor each other through heartbeat technology. The monitored resources can be the network, the operating system, or a service, and users choose which resources to monitor according to their needs. When the standby detects that a resource on the host has failed, it follows a preset policy: it first takes over the IP address, then takes over the application service, and from then on provides external services itself. Because the switchover takes very little time, users do not notice that a failure and a switchover have occurred, and the application keeps providing lasting, uninterrupted service.

Dual-machine mutual standby builds on dual-machine hot standby. Two independent applications run at the same time on the two machines, each machine acting as standby for the other; that is, each server is both a host and a standby. When either application fails, the other server can take over the failed machine's application within a short time, keeping the service running. The advantage is that hardware is saved: putting two applications in hot standby requires at least four servers, whereas mutual standby provides the same high-availability function with only two. The disadvantage is that after a failover both applications run on the same node, and its load may become too high.
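
The load concern in mutual standby can be made concrete with a short Python sketch. Everything here is assumed for illustration: the function names, the server and application names, and the load figures (percent of one server's capacity). With more than two nodes the sketch hands each orphaned application to the least-loaded survivor, which is one possible policy and not something the text prescribes.

```python
def placement_after_failure(placement, failed_node):
    """Move every application off the failed node onto the surviving nodes.

    `placement` maps node name -> list of (application, load_percent) pairs.
    """
    survivors = {n: list(apps) for n, apps in placement.items() if n != failed_node}
    if not survivors:
        raise RuntimeError("no surviving node to take over")
    for app in placement.get(failed_node, []):
        # Assumed policy: pick the survivor with the lowest current load.
        target = min(survivors, key=lambda n: node_load(survivors[n]))
        survivors[target].append(app)
    return survivors


def node_load(apps):
    """Total load of the applications running on one node, in percent."""
    return sum(load for _, load in apps)


# Mutual standby: each server runs one application and backs up the other.
mutual = {"server1": [("web", 60)], "server2": [("db", 70)]}
after = placement_after_failure(mutual, "server1")
print(after)                             # {'server2': [('db', 70), ('web', 60)]}
print(node_load(after["server2"]) > 100)  # True: the survivor is overloaded
```

With the same figures in a plain hot standby pair, the idle standby would take over only the 60 percent load of the failed host, which is why mutual standby trades spare capacity for fewer machines.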