Design and Implementation of Linux-based Cluster Management System


With the continuous development of high-speed networks, network users and network applications are growing rapidly, making the load capacity of network servers a bottleneck and weak link in high-speed networks. Meeting such high load requirements with a single server demands expensive hardware, and is sometimes not feasible at all. Therefore, relatively inexpensive and stable high-availability cluster systems are increasingly widely used, and various commercial and non-commercial cluster systems are developing rapidly. Among them, the Linux-based LVS (Linux Virtual Server) load-balancing cluster system is widely used for its openness, high availability, and high scalability. However, LVS lacks a comprehensive management system to monitor and manage the cluster and ensure its stable operation; when the cluster is large, managing a single node or the cluster system as a whole is very cumbersome. This paper designs a more complete management system for the LVS cluster, so as to expand its cluster size and improve the availability and versatility of the LVS cluster, and describes a partial implementation of the system on a realserver cluster.
1 Overall design

The design of this system draws on the design ideas of several mature cluster management systems, without changing the original LVS cluster system. The design is modular: different functions are separated into independent modules that do not affect each other and that exchange information through a unified management interface, which makes the system easy to modify and extend. The system is divided into four parts: monitoring of cluster node status, rapid installation and recovery of cluster nodes, dynamic scheduling of the cluster, and the management interface. Cluster node status monitoring watches the hardware and software status of each node in the cluster and raises alarms according to configured danger thresholds. Rapid installation and recovery of nodes means installing a cluster node's operating system and software quickly and conveniently, and quickly returning a node to its initial state when it fails; this requires the hardware configuration of each node in the cluster to be consistent. Dynamic scheduling of the cluster requires fast and smooth switching when the number of nodes in the cluster increases or decreases, without affecting the operation of the cluster. The friendly management interface is the channel through which the cluster administrator interacts with the management system, and it combines the various parts of the system into an organic whole. The implementation of the whole system does not start entirely from scratch: there are many open source network management projects for reference, as well as some simple open source management tools for LVS clusters. The system can therefore be built by modifying such open code and adding self-developed code on top of it to form a complete management system. The bottom layer of the system mainly uses the international standard SNMP (Simple Network Management Protocol) to manage the cluster and facilitate expansion.

2 Specific design of each part
2.1 Monitoring of cluster nodes

The monitored objects mainly include node memory, CPU usage, node load, and the health of service processes. When any of these objects becomes abnormal, the system issues an alarm and the fault is handled manually or automatically. The monitoring part is built on the open source projects MRTG (multi-router traffic grapher) and MON. MRTG is a network traffic monitoring tool; it can also monitor hardware resources such as CPU, memory, and I/O as well as specific services, and display the results graphically through the web. MRTG uses the SNMP protocol for network monitoring, and it provides an interface for drawing display graphics with third-party tools; this system uses RRDtool to draw the graphics required by the management interface. MON is a service availability monitoring tool that can raise alarms when a service fails. The MON monitoring process can be divided into two separate parts: the monitored conditions, and the actions triggered when a condition fails. MON watches a monitored process or device with a monitor and triggers the corresponding alarm program (alert); the alarm program can handle the fault automatically according to its settings and notify the administrator, for example by mail. The two parts can be configured independently and very flexibly. In this system, MON is responsible for monitoring the availability of node services, raising alarms when an abnormality occurs, and performing the corresponding handling, while MRTG is responsible for collecting and displaying the running status of the nodes, providing intuitive and detailed data for analyzing cluster performance and diagnosing the cause of failures.

2.2 Quick installation and recovery of cluster nodes

As the number of nodes in the cluster grows, installing the operating system and software on each node becomes a very cumbersome task. Under load balancing, each node in the cluster implements the same functions.
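The monitor/alert split that MON uses in section 2.1 can be illustrated with a minimal Python sketch. The metric names and threshold values below are hypothetical examples, not from the paper, and real MON monitors are external scripts configured in MON's configuration file rather than Python functions:

```python
import os

def read_metrics():
    """Collect monitored values: 1-minute load average and memory use."""
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":")
            meminfo[key] = int(value.split()[0])  # values are in kB
    mem_used_pct = 100.0 * (1 - meminfo["MemAvailable"] / meminfo["MemTotal"])
    return {"load1": load1, "mem_used_pct": mem_used_pct}

def check_thresholds(metrics, limits):
    """Monitor part: return the metrics that exceed their danger value."""
    return [name for name, value in metrics.items()
            if name in limits and value > limits[name]]

def alert(failed):
    """Alert part: here just print; MON would run an alert script or mail."""
    for name in failed:
        print(f"ALERT: {name} over threshold")

# Hypothetical danger values; run the check only where /proc exists.
LIMITS = {"load1": 4.0, "mem_used_pct": 90.0}
if os.path.exists("/proc/loadavg"):
    alert(check_thresholds(read_metrics(), LIMITS))
```

The point of keeping `check_thresholds` and `alert` separate, as MON does, is that the two halves can be configured and replaced independently.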
The installed operating system and software on each node are therefore also the same. Under the premise of identical hardware configuration, the software can be installed automatically and quickly, and a faulty node can be restored quickly. The system uses the PXE (preboot execution environment) remote boot standard defined by Intel together with the SystemImager system image tool. PXE is a replacement for RPL (remote program load) and can remotely boot a variety of operating systems, including the Windows series and Linux. SystemImager can take an image of a sample machine's system; an image server then installs that software system over the network onto other machines fully compatible with the sample machine, and can also perform simple configuration of each client's IP address, host name, and so on. The image server also acts as the PXE server. First the sample node is installed, then its image is generated on the image server; a node to be installed is booted by PXE and installs the image of the sample node, which enables fast and automatic installation. When the cluster configuration changes, it is only necessary to change the settings of the sample machine and update the image in order to update the entire cluster. When a node in the cluster suffers a major failure, its system can be reinstalled in the same way to recover it.

2.3 Dynamic scheduling of the cluster

Adding or removing nodes in an LVS cluster is relatively easy: it is only necessary to configure the load-balancing node with ipvsadm, and the transition is very smooth. This part mainly provides interfaces to the other parts, such as the management interface and monitoring, and prepares for the extension to content-based load-balancing dynamic scheduling.

2.4 Management interface

The entire management system forms an organic whole through the management interface. The administrator interacts with the management system through the management interface to manage the cluster system.
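The node add/remove step in section 2.3 amounts to issuing ipvsadm commands on the load-balancing node. The sketch below only builds the command lines; the virtual-service and real-server addresses are hypothetical examples, and a management daemon would actually execute the commands (e.g. with subprocess.run) as root on the balancer:

```python
def add_realserver(vip, rip, weight=1, method="-g"):
    """Attach a real server to virtual service vip.

    method is the ipvsadm forwarding mode: -g direct routing (default),
    -m masquerading/NAT, -i tunneling.
    """
    return ["ipvsadm", "-a", "-t", vip, "-r", rip, method, "-w", str(weight)]

def remove_realserver(vip, rip):
    """Detach a real server from the virtual service."""
    return ["ipvsadm", "-d", "-t", vip, "-r", rip]

# Hypothetical addresses: add node 192.168.0.11 to the web service.
cmd = add_realserver("192.168.0.1:80", "192.168.0.11:80", weight=2)
print(" ".join(cmd))
```

Because ipvsadm applies such changes immediately, the management system can grow or shrink the cluster without interrupting established connections to the remaining nodes.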
The management interface adopts the C/S (client/server) mode; for security reasons, it will be developed independently. Some simple graphical LVS management tools, such as lvs-gui and lvsm, can be used for reference.

3 Future expansion

The LVS cluster system is still developing and will support content-based load balancing. The current LVS only balances load at the network layer (Layer 3), so a cluster can only provide a single service; content-based load balancing will enable the same cluster to provide multiple services. The management system is designed with this expansion in mind, mainly in the node installation and dynamic scheduling parts. The node installation system should be able, through pre-configuration, to install different software on nodes that provide different services; this can be done by modifying a script-based installation tool, or by preparing multiple images to choose from. The dynamic scheduling part is required to dynamically adjust the number of nodes providing each service according to the load of that service: when the load of one service is too heavy, a lightly loaded node is converted into a node providing that service. A distributed shell or a tool such as Cfengine can be used to implement this scheduling.

4 System implementation

The system has been partially implemented in the realserver cluster of the VOD (video on demand) system of Dalian University of Technology. Monitoring and automatic installation have been implemented, and the nodes in the cluster can be conveniently scheduled, working stably 7 × 24 h. After the unified management interface is implemented, management will be more convenient and the operational efficiency of the system will be further improved.
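The service-aware rescheduling outlined in section 3 can be sketched as a simple policy: when one service's load exceeds a limit, take a node away from the least-loaded service and convert it to the overloaded one. The service names, node names, and the load limit below are illustrative assumptions, not values from the paper:

```python
def rebalance(service_nodes, service_load, limit=0.8):
    """Return (node, from_service, to_service), or None if nothing to move.

    service_nodes maps service -> list of node names;
    service_load maps service -> a normalized load figure in [0, 1].
    """
    hot = max(service_load, key=service_load.get)
    cold = min(service_load, key=service_load.get)
    if service_load[hot] <= limit or hot == cold:
        return None                      # no service is overloaded
    if len(service_nodes[cold]) <= 1:
        return None                      # keep every service alive
    node = service_nodes[cold].pop()     # node to reinstall and repoint
    service_nodes[hot].append(node)
    return (node, cold, hot)

# Example: the web service is overloaded, the ftp service is nearly idle.
nodes = {"web": ["n1", "n2"], "ftp": ["n3", "n4"]}
load = {"web": 0.95, "ftp": 0.10}
print(rebalance(nodes, load))  # one ftp node is converted to a web node
```

In the real system the returned decision would drive the installation part (reinstall the node from the other service's image) and the scheduling part (the corresponding ipvsadm changes), with a distributed shell or Cfengine carrying out the steps on the nodes.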
