Ten rules for killing IIS server performance


Each of the following rules, if followed, will effectively wreck the performance and scalability of your code. In other words: break these commandments whenever you can! Below, I explain why each one is harmful and how avoiding it improves performance and scalability.

1. You should allocate and free lots of objects

You should avoid over-allocating memory, because memory allocation can be costly. Freeing memory blocks can be even more expensive, because most allocators always try to coalesce adjacent freed blocks into larger blocks. Until Windows NT® 4.0 Service Pack 4, the system heap performed poorly under multithreading: the heap was protected by a single global lock, which does not scale on multiprocessor systems.

2. You should not think about processor caches

Most people know that hard page faults caused by the virtual memory subsystem are expensive and best avoided, but many believe that all other memory accesses cost the same. That has not been true since the 80486. Modern CPUs have become so much faster than RAM that at least two levels of memory cache are needed: a fast L1 cache that holds 8KB of data and 8KB of instructions, and a slower L2 cache that holds several hundred kilobytes of mixed code and data. A reference to a memory location in the L1 cache costs one clock cycle, a reference to the L2 cache costs 4 to 7 clock cycles, and a reference to main memory costs many processor clock cycles; the latter number will soon exceed 100 cycles. In many ways, the caches are like a small, high-speed virtual memory system.

The basic unit of memory associated with caching is not the byte but the cache line. A Pentium cache line is 32 bytes wide; an Alpha cache line is 64 bytes wide. This means that the L1 cache has only 512 line-sized slots for code and data. If data that is used together (temporal locality) is not stored together (spatial locality), performance suffers. Arrays have excellent spatial locality, while linked lists and other pointer-based data structures tend to have poor locality.

Packing data into the same cache line usually improves performance, but on multiprocessor systems it can degrade it, because the memory subsystem must keep the caches coherent between processors. If a read-only datum used by all processors shares a cache line with a datum that one processor updates frequently, the caches spend a long time shuttling copies of that line back and forth. This high-speed game of Ping-Pong is often called "cache sloshing". If the read-only data lives in a different cache line, the sloshing is avoided.

Optimizing code for space is more effective than optimizing it for speed. Less code occupies fewer pages, which means a smaller working set, fewer page faults, and fewer cache lines. However, some core functions should be optimized for speed; use a profiler to identify them.

3. You should never cache frequently used data

Software caching is useful in all kinds of applications: when a computation is expensive, save a copy of the result. This is the classic space-time trade-off, sacrificing some storage to save some time. Done well, it can be extremely effective.

You must cache the right things. If you cache the wrong data, you waste storage. If you cache too much, too little memory remains for everything else. If you cache too little, efficiency drops because you keep recomputing data that missed the cache. If you cache time-sensitive data for too long, it goes stale. Servers generally care more about speed than space, so they cache more aggressively than desktop systems do. Be sure to flush unused cache entries periodically, or you will have working-set problems.

4. You should create lots of threads; the more the better

It is important to tune the number of worker threads in a server. If a thread is I/O-bound, it spends much of its time waiting for I/O to complete, and a blocked thread does no useful work. Adding threads can therefore increase throughput, but adding too many degrades server performance, because context switching becomes significant overhead. Context switches are bad for three reasons: they are pure overhead and contribute nothing to the application's work; they consume valuable clock cycles; and, worst of all, they fill the processor's caches with useless data, which is costly to replace.

Much depends on your threading structure. One thread per client is absolutely inappropriate, because it does not scale to large numbers of clients: context switching becomes unbearable and Windows NT runs out of resources. A thread-pool model works better, with a pool of worker threads servicing a request queue; Windows 2000 provides APIs for exactly this, such as QueueUserWorkItem.

5. You should use a global lock on your data structures

The easiest way to make data thread-safe is to put one big lock around it all. For simplicity, everything uses the same lock. There is a problem with this approach: serialization. Every thread that wants to touch the data must queue up for the lock, and a thread blocked on a lock is doing nothing useful. This is rarely a problem when the server is lightly loaded, because only one thread at a time wants the lock. Under heavy load, fierce contention for the lock can become a serious problem.

Imagine an accident on a multi-lane highway that diverts all vehicles into one narrow lane. If traffic is light, the effect on the flow rate is negligible. If traffic is heavy, the jam can stretch for miles as cars slowly merge into that single lane.

There are several techniques for reducing lock contention.

· Don't over-protect; lock data only when it really needs locking. Hold a lock only as long as you need it, and no longer. Don't hold locks around large blocks of code or in frequently executed code.
· Split the data so that separate pieces can be protected by separate locks. For example, a symbol table can be split by the first letter of the identifier, so that modifying the value of a symbol whose name begins with Q does not block reading the value of a symbol whose name begins with H.
· Use the Interlocked family of APIs (InterlockedIncrement, InterlockedCompareExchangePointer, and so on) to modify data atomically without taking a lock at all.
· Use multi-reader/single-writer locks when the data is modified infrequently. You get better concurrency, although the lock operations themselves cost more and you risk starving the writers.
· Use spin counts in critical sections. See the SetCriticalSectionSpinCount API in Windows NT 4.0 Service Pack 3.
· If you can't get the lock, use TryEnterCriticalSection and do some other useful work in the meantime.

High contention leads to serialization, serialization leads to low CPU utilization, low CPU utilization prompts people to add still more threads, and things get even worse.

6. Don't pay attention to multiprocessor machines

It can be quite disturbing to find that your code runs worse on a multiprocessor system than on a uniprocessor system. The natural expectation is that an N-way system will run N times better. The reason for the poor performance is contention: lock contention, bus contention, and/or cache-line contention. The processors fight over ownership of shared resources instead of getting more work done.

If you must write multithreaded applications, stress-test and performance-test them on multiprocessor boxes. A uniprocessor system provides only an illusion of concurrency by executing threads in time slices. Multiprocessor boxes have true concurrency, so race conditions and contention are far more likely to show up.

7. You should always use blocking calls; they're fun

Synchronous blocking calls that perform I/O are appropriate for most desktop applications, but they are a poor way to use the CPU(s) on a server. An I/O operation takes millions of clock cycles to complete, and those cycles could have been put to better use. With asynchronous I/O you can achieve significantly higher user-request rates and I/O throughput, at the cost of additional complexity.

If you have blocking calls or I/O operations that take a long time, think about how many resources to dedicate to them. Do you want to use all your threads, or set a limit? In general, a limited number of threads is better: build a small thread pool and a queue, and use the queue to schedule the work that completes the blocking calls. Other threads then remain free to pick up and process your non-blocking requests.

8. Don't measure

When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science.

- Lord Kelvin (William Thomson)

If you don't measure, you don't understand your application's behavior. You are groping in the dark, half guessing. If you don't identify your performance problems, you can't make any improvements or do capacity planning.

Measurement includes black-box measurement and profiling. Black-box measurement means collecting the data shown by performance counters (memory usage, context switches, CPU utilization, and so on) and by external probing tools (throughput, response time, and so on). To profile your code, you build an instrumented version of it, run it under various conditions, and collect statistics on execution times and function call frequencies.

Measurement is of little use without analysis. Measurement will tell you that you have a problem, and may even help you find where it is, but it cannot tell you why you have it. Analyze the problem so that you can fix it correctly: treat the root cause, not just the symptoms.

When you make changes, measure again. You need to know whether your changes actually helped. A change may also expose other performance problems, and the measure-analyze-fix-measure cycle starts over. You should also measure regularly to catch performance regressions.

9. You should use a single-user, single-request testing method

A common problem with ASP and ISAPI applications is that they are tested with only one browser. When their authors deploy them on the Internet, they discover that the applications cannot handle high loads and that throughput and response times are pitiful.

Testing with a browser is necessary but not sufficient. If the browser responds slowly, you know you are in trouble; but even if it responds quickly, you don't know how well the application handles load. What happens when a dozen users issue requests at the same time? A hundred? What throughput can your application sustain? What response time does it deliver? What do those numbers look like under light load? Medium load? Overload? What happens on a multiprocessor machine? Stress-testing your application is fundamental to flushing out bugs and finding performance problems.

Similar load-testing considerations apply to all server applications.

10. You shouldn't use real-world scenarios

People tend to tune their applications for a few specific, artificial scenarios (such as benchmarks). It is important instead to pick a range of scenarios that correspond to real-world usage and to optimize for a broad set of operations. If you don't, your users and reviewers certainly will, and they will judge the quality of your application accordingly.
