Case Sharing: Fantasy Westward Journey Server Optimization

  

Patching up a legacy system is a hassle.

These past two days I have been working on optimizing the Fantasy Westward Journey server. I am staying in Guangzhou for a while and plan to spend a week focused on this. In the past I only discussed it over online chat; only by sitting down with the team can I really understand the problem.

At present, Fantasy Westward Journey runs each game server on a single machine with at most 8 CPUs and 8 GB of memory. Even the busiest servers cannot use up these resources (roughly 3 CPUs and half of the memory). The core program was written almost 10 years ago, carried over from the earlier Westward Journey title, and for the past couple of years it has been enjoying a free lunch from hardware upgrades: a single server can now hold 12,000 players online. Charting the server's response time reveals the current problem: at regular intervals the response slows down, and the response time periodically exceeds 1000 ms. The cause is that disk IO becomes very congested at those moments. The service that periodically saves player data, together with the scripts the system administrators run for regular backups, eats up a lot of IO time and eventually overloads the machine.

I will not go into an exhaustive discussion of how IO overload ultimately affects the performance of the game service. Over the past two days I have mainly analyzed the system structure and thought about an improvement plan.

In fact, the old system is not complicated, and the amount of code is quite small: the relevant service code is just a few thousand lines of clean C. Nobody has touched it because the stakes are high, involving millions of users and the billing process. Good design or bad, the performance has been adequate, so everything gives way to stability. All the "historical reasons" can only be grumbled about over chat; if you redesigned it, you certainly would not write it this way. Over the past two years I have grown more and more indifferent to this kind of refactoring. Why do it, and why not? More often than not it only serves the programmers themselves. Every system is full of regrets the moment it is written; if it works, it will most likely keep running as is, and all the new ideas get saved for next time.

For an old system that has been running stably for many years, finding a better way to redesign it is of little value. What matters is how to add something to the existing system with minimal impact and improve performance. A clear division between modules is quite important, as is keeping services independent. It was a big mistake to put the runtime data service, billing, and user authentication into a single service program; it makes stripping out the data service very difficult.

The data service uses a client/server (C/S) structure, but instead of a database it uses the local file system directly. The overall design is fine; it is the mechanism of the data service itself that is very bad. Shared memory is used to exchange data between C and S in order to improve IPC performance. There is only one C, the main game process, while there can be more than one S so that requests can be served concurrently. The multiple Ss and the single C exchange commands over pipes and exchange data through the shared memory. The intention is good, but the protocol design is problematic: because C manipulates the data area directly and is the only client, the block management of the data area ended up being placed on C instead of S.

For example, when the game process (C) needs to load a user's data, it first finds free space in the data area itself, then tells S to load the user's data into that location. Cleaning up the data area is also done by C. As a result, S cannot cache anything directly in the data area. If it wants to cache temporarily unused data (such as an offline player's), it has to do so on its own, or an extra cache service is needed, which costs twice the memory plus extra memory copies. I can see that the implementation originally intended to let multiple Ss serve one C concurrently, but I can only call the design poor.
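To make the problem concrete, here is a hypothetical sketch of that protocol. The message layout and names (struct request, alloc_slot_in_shm, send_request) are my own illustration, not the actual code; the point is only that the client picks the slot, so the server never owns the data area and cannot cache inside it.

```c
#include <stdint.h>

enum req_type { REQ_LOAD, REQ_SAVE };

/* Command sent from the game process (C) over a pipe to one of the S
 * processes.  Note that C, not S, decides where in shared memory the
 * data goes: the slot offset is chosen by the client. */
struct request {
    enum req_type type;
    uint32_t player_id;
    uint32_t shm_offset;   /* slot in the shared data area, picked by C */
    uint32_t size;         /* size of the slot C reserved */
};

/* Stand-ins for the real shared-memory allocator and pipe code, only so
 * that the sketch compiles. */
static uint32_t alloc_slot_in_shm(uint32_t size) { (void)size; return 0; }
static void send_request(const struct request *r) { (void)r; }

/* Loading a player: C finds the free space first, then asks S to fill it.
 * Because C also frees the slot later, S cannot leave an offline player's
 * data cached inside the shared data area. */
static void load_player(uint32_t id, uint32_t size)
{
    struct request r = { REQ_LOAD, id, alloc_slot_in_shm(size), size };
    send_request(&r);
}

int main(void)
{
    load_player(42, 8192);
    return 0;
}
```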

The result is that the entire data service, whether reading or writing, has no cache of its own; caching relies entirely on the OS. This was fine when the load was an order of magnitude lower, but once the number of online users reached 10,000, the problem showed itself. After all, the more you customize for your actual needs, the more you can squeeze out of the hardware.

What follows is a record of the design of the in-memory key/value database I implemented.

To implement the differential-save strategy mentioned a few days ago (saving only the changed data, which in measurements cuts IO operations by 90%), the data read/write service must first be unified in one place; we cannot keep relying on the local file system for data exchange. I had previously looked at several in-memory databases, such as Redis, and finally decided to implement one myself. Because I already know the requirements very well, I can customize the algorithm heavily to get the most out of the hardware, and the amount of code will not be large (I planned to keep it within 500 lines of C; the finished version came to about 300 lines).

Our requirements are these: the service program is stopped once a week; the total player data touched per week is about 100,000 sets; each set is between 4K and 32K of pure text; the whole thing can be viewed as a key/value storage service mapping an id to a data string; the total data is expected to fit in memory; and the data is updated frequently, with its length changing after each update.
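Seen as an interface, those requirements boil down to something like the following sketch. The names are mine, not those of the actual 300-line implementation, which is not shown in this post.

```c
/* kv.h -- a minimal interface sketch matching the requirements above. */
#include <stdint.h>
#include <stddef.h>

struct kvcache;                       /* opaque handle to the in-memory store */

/* Create a store backed by one pre-allocated region of `bytes` bytes. */
struct kvcache *kv_open(size_t bytes);
void kv_close(struct kvcache *kv);

/* Store or replace the text blob for a player id; values are 4K-32K
 * and may change length on every update, so the store must reallocate. */
int kv_put(struct kvcache *kv, uint32_t id, const char *data, size_t len);

/* Look up a player id; returns NULL on a miss and sets *len on a hit. */
const char *kv_get(struct kvcache *kv, uint32_t id, size_t *len);
```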

I spent a day implementing this key/value in-memory data service. To make the most of memory while keeping it efficient and the code simple, I pre-allocate one whole block of memory and cut it into 1K blocks, strung together with a singly linked list. To improve CPU cache hit rates, the linked-list pointers are kept separate from the data storage area (most of the time we only need to walk the list pointers without touching the actual data).

The linked-list pointers are block indices, not memory addresses, so a 4-byte index is enough (it can manage up to 4 TB of data, even on a 64-bit system). Compared with a doubly linked list, a singly linked list halves the pointer operations and saves a little memory; the price is that the code is a bit trickier to write.
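Here is a minimal sketch of what that layout might look like, assuming the 1K block size and 4-byte indices described above. The struct and field names are my own; the separate next[] array is what keeps list traversal from dragging the data blocks into the CPU cache.

```c
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 1024
#define NIL ((uint32_t)-1)          /* end-of-chain marker */

struct pool {
    uint32_t nblock;                /* total number of 1K blocks            */
    uint32_t *next;                 /* next[i] = index of the block after i */
    char (*data)[BLOCK_SIZE];       /* the 1K data blocks, stored apart     */
};

static int pool_init(struct pool *p, size_t bytes)
{
    p->nblock = (uint32_t)(bytes / BLOCK_SIZE);
    p->next = malloc(p->nblock * sizeof(uint32_t));
    p->data = malloc((size_t)p->nblock * BLOCK_SIZE);
    if (p->next == NULL || p->data == NULL)
        return -1;
    /* at start, chain every block into one big free list: 0 -> 1 -> 2 ... */
    for (uint32_t i = 0; i < p->nblock; i++)
        p->next[i] = (i + 1 < p->nblock) ? i + 1 : NIL;
    return 0;
}
```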

All memory blocks are divided into two lists: free blocks and used blocks. At the start everything is free. When a piece of data is stored, enough blocks are taken from the free list and appended to the tail of the used list. When the cache is full, blocks are reclaimed from the head of the used list and returned to the free list (that data has not been accessed for the longest time). Each time a piece of data is read, its blocks are moved to the tail of the used list, so it is the last to be cleaned out.
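Continuing the pool sketch above, the free/used bookkeeping is effectively LRU eviction and might look roughly like this. In the real service a whole entry's chain of blocks is moved at once, and "read data goes back to the tail" follows the same append pattern as lru_alloc; this sketch works one block at a time for brevity.

```c
/* Bookkeeping on top of struct pool: one free list plus one used list
 * ordered from least to most recently read. */
struct lru {
    struct pool *p;
    uint32_t free_head;              /* singly linked free list          */
    uint32_t used_head, used_tail;   /* oldest ... newest used blocks    */
};

/* Take one block off the free list and append it to the used tail. */
static uint32_t lru_alloc(struct lru *l)
{
    uint32_t b = l->free_head;
    if (b == NIL)
        return NIL;                      /* full: caller must evict first */
    l->free_head = l->p->next[b];
    l->p->next[b] = NIL;
    if (l->used_tail == NIL)
        l->used_head = b;
    else
        l->p->next[l->used_tail] = b;
    l->used_tail = b;
    return b;
}

/* Reclaim the block at the head of the used list (least recently read)
 * back onto the free list. */
static void lru_evict_one(struct lru *l)
{
    uint32_t b = l->used_head;
    if (b == NIL)
        return;
    l->used_head = l->p->next[b];
    if (l->used_head == NIL)
        l->used_tail = NIL;
    l->p->next[b] = l->free_head;
    l->free_head = b;
}
```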

On top of this, a hash table maps each id to the head of its block chain in the cache (since the list is singly linked, the implementation actually stores the preceding node). This way, the data area for a given id can be found in O(1) time.
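A sketch of that index, again with names, bucket count, and hash of my own choosing, building on the NIL marker defined earlier. Storing the block just before an entry's first block is what lets a singly linked chain be unlinked and re-appended to the tail in constant time.

```c
#define HASH_SIZE 0x10000                /* assumed power-of-two bucket count */

struct entry {
    uint32_t id;                         /* player id                           */
    uint32_t prev_block;                 /* block just before this entry's data */
    uint32_t next_entry;                 /* next entry in the same bucket       */
};

struct index {
    uint32_t bucket[HASH_SIZE];          /* head entry per bucket, NIL = empty  */
    struct entry *entries;               /* one slot per cached data set        */
};

/* O(1) expected lookup: hash the id, then walk the (short) bucket chain. */
static uint32_t index_lookup(const struct index *idx, uint32_t id)
{
    uint32_t e = idx->bucket[id & (HASH_SIZE - 1)];
    while (e != NIL && idx->entries[e].id != id)
        e = idx->entries[e].next_entry;
    return e;                            /* NIL on a miss */
}
```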

The data stored in the cache does not have to be contiguous in memory, much like cluster management on a disk. Unlike a disk, though, memory shows little difference between random and sequential access performance, so this scheme improves memory utilization without hurting speed.
