Linux process scheduling

To support multitasking, process scheduling is essential. How does scheduling fit in? First, be clear: the scheduler only chooses among processes in the TASK_RUNNING (runnable) state; a process that is not runnable (sleeping or otherwise) is of little concern to it. The Linux kernel divides processes into two classes: normal processes and real-time processes. Real-time processes always take priority over normal processes, and the two classes use different scheduling policies.
1.1 Real-time process scheduling
The original meaning of "real time" for a real-time process is that a given operation must complete within a bounded time. The point is not that the operation must be handled quickly, but that the time must be controllable (in the worst case, it must not exceed the given bound). Such "real time" is called "hard real time" and is needed in very demanding systems (rockets, missiles, and the like). In general, hard real-time systems are quite specialized.
A general-purpose operating system like Linux obviously cannot meet such requirements: interrupt handling, virtual memory, and other mechanisms introduce great uncertainty into processing times, as do hardware caches, disk seeks, and bus contention. When Linux claims to be a "real-time" general-purpose operating system, it really means "soft real time": it meets the real-time needs of a process as far as it can.
If a process has real-time requirements (it is a real-time process), then as long as it is runnable, the kernel keeps letting it run, satisfying its need for the CPU as far as possible, until it sleeps or exits (becomes non-runnable). If several real-time processes are runnable, the kernel first satisfies the CPU needs of the highest-priority one until it becomes non-runnable. So as long as a high-priority real-time process stays runnable, lower-priority real-time processes get no CPU; and as long as any real-time process stays runnable, normal processes get no CPU. (The kernel later added the /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us parameters, which cap real-time execution: within every period of sched_rt_period_us microseconds, real-time processes may run for at most sched_rt_runtime_us microseconds. This gives normal processes a small chance to run even when some real-time process is always runnable.) If multiple real-time processes of the same priority are runnable, two scheduling policies are available:
- SCHED_FIFO
First in, first out. Later processes are scheduled only after the first-running process becomes non-runnable. Under this policy, the first process can call sched_yield to give up the CPU voluntarily and hand it to the processes behind it;
- SCHED_RR
Round-robin scheduling. The kernel allocates time slices to the real-time processes; when a process's time slice runs out, the next process gets the CPU. The difference from FIFO is precisely the use of time slices.

Under Linux, a user program can set a process's scheduling policy and the related scheduling parameters with the sched_setscheduler system call; sched_setparam sets only the scheduling parameters. Both calls require the caller to have the capability to set process priorities (CAP_SYS_NICE, which generally means root privileges). Setting a process's policy to SCHED_FIFO or SCHED_RR makes it a real-time process, and its priority is given by the scheduling parameters passed to those calls.

For real-time processes, the kernel makes no attempt to adjust their priority. How real-time does a process need to be? That depends on the application scenario; only the user can answer, the kernel cannot guess. In short, real-time scheduling is very simple: the priority and policy are fixed by the user, and the kernel just always selects the highest-priority runnable real-time process to execute. The only complication is the two policies to consider when choosing among real-time processes of equal priority.
1.2 Scheduling of normal processes
The central idea of real-time scheduling is to let the highest-priority runnable real-time process occupy the CPU as much as possible, because it has real-time requirements. Normal processes, by contrast, are considered to have no real-time requirements, so the scheduler tries to let all runnable processes share the CPU in peaceful coexistence, giving users the impression that these processes are running at the same time. Scheduling normal processes is much more complicated than scheduling real-time ones; the kernel has to deal with two difficulties.

(1) Dynamically adjusting process priority. By behavior, processes can be divided into "interactive processes" and "batch processes". The main task of an interactive process (a desktop program, a server, and so on) is to interact with the outside world. Such processes should have higher priority: they spend most of their time sleeping, waiting for external input, and when input arrives and the kernel wakes them, they should be scheduled quickly in order to respond. For example, if a desktop program takes half a second to react to a mouse click, the user feels the system is laggy. The main task of a batch process (such as a compiler) is continuous computation, so it stays runnable. Such processes generally need no high priority; if a compile takes a few seconds longer, the user will not care much. If users know exactly what priority a process should have, they can set it with the nice and setpriority system calls. But applications are not always as clear-cut as desktop programs and compilers: a program's behavior can vary, acting like an interactive process one moment and a batch process the next, so it is hard for the user to pick a suitable priority for it.
Thus, in the end, the task of telling interactive processes from batch processes falls to the kernel's scheduler. The scheduler watches a process's behavior over a period of time (mainly its sleep time and run time), judges from empirical formulas whether it is interactive or batch and to what degree, and then adjusts its priority accordingly. After dynamic adjustment, a process has two priorities:

a. The priority set by the user program (or a default value if none was set), called the static priority. This is the baseline of the process's priority and usually does not change while the process runs;

b. The priority after dynamic adjustment, called the dynamic priority. This value may change all the time.

(2) Fairness of scheduling. In a multi-process system, each process should ideally occupy the CPU fairly according to its priority, without uncontrollable "whoever is lucky gets more" situations. Linux has basically used two approaches to fair scheduling:

a. Allocate time slices to runnable processes (according to priority); a process that uses up its slice goes into an "expired queue", and when all runnable processes have expired, time slices are reallocated;

b. Dynamically adjust process priority: as a process runs on the CPU, its priority is continuously lowered so that other, lower-priority processes get a chance to run.

The latter approach has finer scheduling granularity, and by folding "fairness" and "dynamic priority adjustment" into one mechanism it greatly simplifies the kernel scheduler's code, so it has become the kernel scheduler's favorite. Note that both points above apply only to normal processes.
For real-time processes, the kernel neither adjusts priority dynamically nor enforces fairness.
1.3 The efficiency of the scheduler
"Priority" makes it clear which process should be scheduled, but the scheduler must also care about efficiency. Like many kernel paths, the scheduler is executed very frequently; if it is inefficient, it wastes a great deal of CPU time and drags down system performance.

In Linux 2.4, runnable processes hung on a single linked list; on each schedule, the scheduler scanned the whole list to find the best process to run, with complexity O(n). In early Linux 2.6, runnable processes hung on N (N=140) linked lists, one list per priority, as many lists as priorities supported by the system; on each schedule, the scheduler simply took the process at the head of the first non-empty list. This greatly improved the scheduler's efficiency, with complexity O(1). In recent Linux 2.6 versions, runnable processes are kept in a red-black tree (think of it as a balanced binary tree) ordered by priority; on each schedule, the scheduler finds the highest-priority process in the tree, with complexity O(log N).

So why, from early 2.6 to recent 2.6, did the complexity of picking a process increase? Because at the same time the scheduler's approach to fairness changed from the first idea mentioned above to the second (dynamic priority adjustment). The O(1) algorithm rests on a small, fixed set of lists, which, as I understand it, keeps the range of priority values small (poor discrimination) and cannot meet the fairness requirements. The red-black tree places no limit on priority values (32 bits, 64 bits, or more can represent a priority), and O(log N) is still quite efficient.
1.4 Timing of Scheduling Trigger
Scheduling is triggered mainly in the following situations:

(1) The state of the current process (the one running on the CPU) becomes non-runnable. This may happen because the process makes a system call that voluntarily makes it non-runnable, for example going to sleep via nanosleep or exiting via exit; because a resource the process requested is unavailable and it is forced to sleep, for example when a read system call finds the needed data missing from the disk cache and the process sleeps waiting for disk I/O; or because the process becomes non-runnable in response to a signal, for example pausing on SIGSTOP or exiting on SIGKILL.

(2) Preemption. The running process is deprived of the CPU against its intention. This happens in two cases: the process's time slice runs out, or a higher-priority process appears.
1.5 Other problems
- Load balancing on multiprocessors
Under Linux, each CPU has its own run queue, and a runnable process can be on only one run queue at a time.
Hence the trouble of "multiprocessor load balancing". The kernel must watch the number of processes on each CPU's run queue and make appropriate adjustments when the numbers are out of balance. When to adjust, and how many processes to move, are the kernel's concern; of course, the less adjustment the better, since migrating processes costs CPU time and requires locking the run queues. The kernel must also care about the relationship between CPUs: two CPUs may be independent of each other, may share a cache, or may even be virtualized from the same physical CPU by hyper-threading. The closer the relationship, the cheaper it is to migrate a process between them, and this relationship is an important input to load balancing.
- Priority inheritance
Because of mutual exclusion, a process (call it A) may sleep waiting to enter a critical section; A is woken only when the process occupying the corresponding resource (call it B) leaves the critical section. A situation may arise where A has a very high priority and B a very low one: B has entered the critical section but is preempted by other, higher-priority processes (call one C) and cannot get out of it, so A cannot be woken. A has a high priority, yet it is now effectively dragged down to B's level, delayed by a process C whose priority is not even that high. This phenomenon is called priority inversion, and it is clearly unreasonable. A better countermeasure: when A starts waiting for B to leave the critical section, B temporarily inherits A's priority (on the assumption that A's priority is higher than B's) so that it can finish its work and leave the critical section promptly, after which B's priority is restored. This is priority inheritance, and implementing it requires considerable work in the kernel.
- Threaded interrupt handling

Under Linux, interrupt handlers run in a non-schedulable context: from the moment the CPU responds to a hardware interrupt and jumps to the kernel's interrupt handler until the handler exits, the whole path cannot be preempted. When a process is preempted, it can be resumed later from the information saved in its process control block (task_struct); an interrupt context has no task_struct, so if it were preempted it could not be resumed. That interrupt handlers cannot be preempted means their "priority" is higher than that of any process (a process can run only after the handler finishes). But in real application scenarios, some real-time processes may deserve higher priority than some interrupt handlers. So some systems with stronger real-time requirements give interrupt handlers a task_struct and a priority, so that they can be preempted by high-priority processes when necessary. This obviously adds some overhead; it is another concession of performance in pursuit of "real time".
