Operation and maintenance experience sharing: server cost optimization strategy

  

In the current data center TCO structure, servers and cabinets are the largest cost item, accounting for about 54% of the total. Power supply and cooling come second at about 21%. Server cost optimization is therefore critical to controlling overall operating costs.
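To make this leverage concrete, the following minimal sketch estimates how a saving in one cost category translates into a saving on total TCO. The 54% and 21% shares come from the text; the function name, the "other" bucket, and the 10% saving are illustrative assumptions.

```python
# Cost shares taken from the text; the remainder is lumped into "other" (an assumption).
TCO_SHARES = {
    "server_and_cabinet": 0.54,
    "power_and_cooling": 0.21,
}
TCO_SHARES["other"] = 1.0 - sum(TCO_SHARES.values())


def tco_saving(category: str, category_saving: float) -> float:
    """Return the fraction of total TCO saved when one category gets cheaper."""
    if not 0.0 <= category_saving <= 1.0:
        raise ValueError("category_saving must be between 0 and 1")
    return TCO_SHARES[category] * category_saving


if __name__ == "__main__":
    # Hypothetical scenario: the same 10% saving applied to each category in turn.
    for name in TCO_SHARES:
        print(f"{name}: {tco_saving(name, 0.10):.1%} of total TCO")
```

Under these assumptions, a 10% saving on servers and cabinets is worth about 5.4% of total TCO, against about 2.1% for the same saving on power and cooling.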

The current trends in server development are scale (high density, customization), lower power consumption, and higher performance. Large companies mainly work with OEMs to customize servers for their own business types. The customization process sticks to multiple brands, differentiates configurations, and hides brand information and packaging. Hardware customization combined with server classification, layering, and grading achieves cost optimization as follows:

Server Classification

Internet companies have many product lines. Tencent and Baidu, for example, each have nearly one hundred. Each product line is divided into products, and each product corresponds to different hardware and network resources. These products differ in importance and scale, so the matching type of server hardware has to be selected for each, as shown in Figure 11-1. To avoid idle and wasted resources, servers are usually classified as follows.

1. Access servers, mainly used for web access. These workloads are I/O-intensive with little CPU processing, so this class uses low-cost servers: consider a single power module and expansion slot, no hot swap, no RAID, and so on. The class can be split into 2 to 3 subtypes by CPU, memory, SSD, hard disk, etc. Alternatively, use high-density servers, such as two nodes in 1U or four nodes in 2U, which increase density by 50%, reduce power consumption by 15%, and halve rack rental costs.

2. Balanced servers, mainly used for application services. These can be understood as general-purpose servers, as distinct from dedicated servers, and they run logic services or middle-tier services. The class can be split into 2 to 3 subtypes by CPU, memory, SSD, hard disk, etc.

3. Storage servers, mainly used for online and offline storage services, with large hard disks and large storage capacity. The class can be split into 2 to 3 subtypes by disk capacity and disk type. These servers are also the mainstay of the storage clouds run by large Internet companies.
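The three server classes above can be sketched as a simple selection rule. This is only an illustration: the `WorkloadProfile` fields and every threshold below are hypothetical and would have to be tuned against real measurements.

```python
from dataclasses import dataclass
from enum import Enum


class ServerClass(Enum):
    ACCESS = "access"      # I/O-intensive, light CPU, low-cost hardware
    BALANCED = "balanced"  # general-purpose application and middle-tier services
    STORAGE = "storage"    # large disks for online and offline storage


@dataclass
class WorkloadProfile:
    """Hypothetical description of a module's resource needs."""
    name: str
    cpu_share: float   # typical CPU demand as a fraction of one server (0..1)
    io_share: float    # typical I/O demand as a fraction of one server (0..1)
    storage_tb: float  # data volume the module must hold, in TB


def choose_server_class(w: WorkloadProfile,
                        storage_threshold_tb: float = 10.0,
                        light_cpu: float = 0.3,
                        heavy_io: float = 0.6) -> ServerClass:
    """Pick a server class; all thresholds are illustrative assumptions."""
    if w.storage_tb >= storage_threshold_tb:
        return ServerClass.STORAGE
    if w.io_share >= heavy_io and w.cpu_share <= light_cpu:
        return ServerClass.ACCESS
    return ServerClass.BALANCED


if __name__ == "__main__":
    modules = [
        WorkloadProfile("web-frontend", cpu_share=0.2, io_share=0.8, storage_tb=0.1),
        WorkloadProfile("order-logic", cpu_share=0.6, io_share=0.3, storage_tb=0.5),
        WorkloadProfile("image-store", cpu_share=0.1, io_share=0.5, storage_tb=40.0),
    ]
    for m in modules:
        print(m.name, "->", choose_server_class(m).value)
```

Each class could then be subdivided into the 2 to 3 subtypes mentioned above by adding further thresholds on CPU, memory, or disk.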




Table 11-1 Module Classification and Server Classification

Server Layering

Servers are layered according to the product architecture, and each layer of services uses one type of server. This balances performance, makes maximum use of server resources, and simplifies management, since servers can be racked, expanded, and decommissioned in batches. A good operations planner can allocate server resources to each application layer in the most reasonable way and so avoid idle and wasted resources.

Server Grading

Servers can be graded according to product importance, revenue, time online, and so on:

1. Excellent products, such as high-revenue products and star new products. Every module layer uses new, high-spec servers, runs at low capacity utilization, and has enough budget set aside for quarterly expansion.

2. Stable products, meaning products that have been stable for 2 to 3 years. Their servers run at high capacity utilization, servers of a matching age are used for replacement and expansion, and servers may even be consolidated and taken offline according to capacity.

3. Historical products, meaning products more than 4 years old. This kind of business has gone through many years of research and development and has been used by multiple product lines. Its servers will eventually age, so plan ahead: merge such business modules or move them onto a platform, or even take them offline.
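The grading rules above can be expressed as a small lookup. This is a sketch under stated assumptions: the `Product` record is hypothetical, "historical" is assumed to mean roughly four years online or more, and products the text does not cover (for example young, ordinary products) are treated as stable.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical product record used only for this illustration."""
    name: str
    years_online: float
    high_revenue: bool = False
    star_product: bool = False


# Policy text paraphrases the three grades described in the list above.
POLICIES = {
    "excellent": "new high-spec servers, low utilization, quarterly expansion budget",
    "stable": "same-age servers for replacement and expansion, high utilization, consolidate spares",
    "historical": "plan ahead for server aging: merge modules, platformize, or take offline",
}


def grade_product(p: Product, historical_after_years: float = 4.0) -> str:
    """Return the grade for a product; the order of the checks is an assumption."""
    if p.high_revenue or p.star_product:
        return "excellent"
    if p.years_online >= historical_after_years:
        return "historical"
    # Everything else is treated as stable, including cases the text leaves open.
    return "stable"


if __name__ == "__main__":
    for product in (
        Product("new-star-app", years_online=0.5, star_product=True),
        Product("steady-forum", years_online=2.5),
        Product("legacy-portal", years_online=6.0),
    ):
        grade = grade_product(product)
        print(f"{product.name}: {grade} -> {POLICIES[grade]}")
```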

Customization through server classification, layering, and grading suits companies and products that have not moved to the cloud. A good operations planner, like a housekeeper, balances online service quality against cost and puts every resource to use without waste.

Business Classification Platforms Reduce Operating Costs

Operational resources (servers, bandwidth, dedicated lines, and QoS for each region and IDC), the existing architecture of each product line, and actual incremental demand (including future architectural changes, expansion, and optimization) all feed into capacity management and cost management. These in turn drive the budget and the budget model, and ultimately show up in operating costs.

Large companies have many products across many divisions, and each product needs multiple operational resources, all of which require capacity management, cost management, and budget management. Expecting every team to plan well for every product is unrealistic. Extracting modules from products by category and merging them into platforms allows unified planning and management and keeps operational resources under effective control. Experience with platformizing services by category is shared below:

Platformizing General Applications by Category

Mention Taobao and most people think of its CDN platform. During Double 11 in 2012 its peak traffic reached 2000G, making it the largest static application platform among Internet companies. It is not hard to see why: 80% to 90% of the Taobao website's traffic comes from static images. The same reasoning applies to e-commerce, community, and portal websites in general. Platformization usually takes the following three directions.

1. Static platforms: large images, small images, text, JS, downloads, video, etc.

2. Dynamic platforms: logic, queues, messaging, recommendation, accounts, relationships, PHP, Java, etc.

3. Data platforms: logs, computation, storage, databases, etc.

The ultimate in platform applications

Taobao's CDN pushes scale, architecture, hardware, content, speed, and cost to the limit, and it is a typical application platform success story. A platform like this can become a department-level or company-level platform, or even the best platform in the industry.

Platforms are not built overnight. They grow from small to large and from coarse to fine, continually absorbing similar applications from both legacy and new products. Launching a new business under traditional operations is like preparing a pile of raw materials and then processing them. With platforms, you only need to assemble components, and you do not have to maintain those components yourself.

Mixed distribution to maximize resource utilization

As applications are platformized and consolidated, the application platform has gradually replaced the product line as the object of operations. An application platform is a set of application clusters, so the cluster has become the basic unit of operations. As businesses grow, clusters are expanding several-fold every year, and large application platforms have reached tens of thousands of servers.

These platforms differ in function and role. Overall they fall into three categories: CPU-intensive, I/O-intensive, and storage-intensive. Averaged across resource types, the overall utilization of their servers is not high, because each platform uses its resources unevenly. The more platforms and clusters there are, the greater the waste, and historical expansions and inconsistent server hardware add to the idleness. Because this idle capacity follows a regular pattern, it can be put to use through mixed distribution, that is, running different types of services on the same servers, to raise resource utilization.
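The effect of mixing complementary workloads can be illustrated with a rough calculation. The sketch below assumes two equally sized clusters on identical hardware whose per-resource utilizations simply add up when they share servers. The utilization figures and the 85% ceiling are made-up values for illustration.

```python
RESOURCES = ("cpu", "io", "storage")


def mixed_utilization(a: dict, b: dict) -> dict:
    """Per-resource utilization if both workloads share the same servers."""
    return {r: a[r] + b[r] for r in RESOURCES}


def can_mix(a: dict, b: dict, ceiling: float = 0.85) -> bool:
    """True if no resource would exceed the (assumed) safe ceiling after mixing."""
    return all(u <= ceiling for u in mixed_utilization(a, b).values())


def average_utilization(u: dict) -> float:
    """Mean utilization across the three resource types."""
    return sum(u[r] for r in RESOURCES) / len(RESOURCES)


if __name__ == "__main__":
    # Hypothetical utilization profiles, each a fraction of one server's capacity.
    cpu_heavy = {"cpu": 0.7, "io": 0.1, "storage": 0.1}
    io_heavy = {"cpu": 0.1, "io": 0.6, "storage": 0.2}

    print("separate:", average_utilization(cpu_heavy), average_utilization(io_heavy))
    if can_mix(cpu_heavy, io_heavy):
        mixed = mixed_utilization(cpu_heavy, io_heavy)
        print("mixed:", average_utilization(mixed))
    # Two CPU-heavy workloads compete for the same resource and do not fit.
    print("cpu + cpu fits:", can_mix(cpu_heavy, cpu_heavy))
```

In this toy case each cluster alone averages about 30% utilization, while the mixed cluster averages about 60% on half the servers. Real workloads rarely add up this cleanly, so measured peaks should replace the fixed figures.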

Prerequisites for Mixed Service Distribution

1. Platform-based services. Platform services have scale, and they have already accumulated the necessary multi-region and multi-IDC distribution, data distribution, backup, and so on, so they can be integrated and consolidated quickly.

2. Non-burst services of different types. A burst-type service can see its load multiply several times over during a hot event, so it is not suitable for mixed distribution. Services of the same type compete for the same resources, so they are not suitable either, although same-type services with low resource usage can be considered case by case according to their resource usage trends.

3. Similar hardware configuration and network distribution. Over the life cycle of multiple products, servers and their IDCs are purchased and brought online in batches, because products do not reach scale in a short time and hardware changes considerably every year. Similar hardware keeps the performance of modules balanced across regions. Large hardware differences lead to uneven performance, in which case hardware can be upgraded or replaced to make mixed distribution possible.
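The three prerequisites above can be turned into a simple pre-check that lists the reasons two services should not be mixed. The `Service` fields are hypothetical: `hardware_generation` stands in for "similar hardware" and a shared `idc` stands in for "close network distribution". The case-by-case exception for lightly used same-type services is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    """Hypothetical service description used only for this illustration."""
    name: str
    kind: str                 # "cpu", "io", or "storage"
    on_platform: bool         # already run as a platform service
    bursty: bool              # load can multiply during hot events
    hardware_generation: int  # assumed label for the purchase batch
    idc: str                  # data center hosting the service


def mixing_blockers(a: Service, b: Service, max_generation_gap: int = 1) -> List[str]:
    """Return the reasons two services should not share servers (empty if none)."""
    reasons = []
    if not (a.on_platform and b.on_platform):
        reasons.append("both services must be platform-based")
    if a.bursty or b.bursty:
        reasons.append("burst-type services are not suitable")
    if a.kind == b.kind:
        reasons.append("same-type services compete for the same resource")
    if abs(a.hardware_generation - b.hardware_generation) > max_generation_gap:
        reasons.append("hardware differs too much")
    if a.idc != b.idc:
        reasons.append("services are not in the same IDC")
    return reasons


if __name__ == "__main__":
    image_cdn = Service("image-cdn", "io", True, False, 3, "idc-east")
    log_compute = Service("log-compute", "cpu", True, False, 3, "idc-east")
    news_feed = Service("news-feed", "cpu", True, True, 1, "idc-west")

    print(mixing_blockers(image_cdn, log_compute))  # no blockers expected
    print(mixing_blockers(log_compute, news_feed))
```

Returning the list of reasons rather than a bare yes or no makes it easier to see which prerequisite to work on, for example a hardware upgrade.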

Problems with Mixed Service Distribution

1. Cross-impact between clusters. Mixed clusters affect one another because they differ in user scale, user resolution strategy, and resource usage trend. This impact appears under overload, so it can be effectively avoided as long as capacity management is in place.

2. Cross-impact from hardware. Hardware failures are inevitable and unpredictable. Hardware here means all hardware in the online production environment, including network hardware, servers, and racks. A hardware failure can directly make a mixed cluster unavailable, so mixed distribution relies on platform health monitoring and automatic recovery.
