Understanding the Linux Process

  
In general, a process is a program that is being executed. A program itself is just a collection of code stored on disk, static and inanimate. Only once its code is loaded into memory does it come to life, because only then can the CPU execute it.

Modern operating systems can run many programs concurrently, which means the code of several programs is held in memory at the same time. To manage them, the operating system attaches some metadata to each running program. This metadata is the PCB (process control block, also called the task control block).

A program's code can be divided into two parts: instructions and data. Instructions are the operations the program specifies; data is what those operations act on. One program can be loaded into memory several times as several processes, for example when two instances of vim are opened to edit different files. Do multiple copies of the program's instructions then have to be stored in memory at the same time? No. The instruction part is read-only and can be shared by two or more processes running in the system. The data part is private to each process and cannot be shared: each vim edits only its own document.

So what is in the PCB?

+ The process id. Each process in the system has a unique id, represented in C by the type pid_t, which is a signed integer type; the ids of actual processes are positive.
+ The state of the process, such as running, suspended, stopped, or zombie.
+ The CPU registers that must be saved and restored on a process switch.
+ Information describing the virtual address space.
+ Information describing the controlling terminal.
+ The current working directory.
+ The umask.
+ The file descriptor table, which contains pointers to file structures.
+ Signal-related information.
+ The user id and group id.
+ The session and process group the process belongs to.
+ The resource limits that apply to the process.

Clearly, the PCB has to be a fairly complex structure in order to control a process.

Creating a process

We often hear the phrase "create a process". What actually happens? The first thing to realize is that a process is not the Monkey King, who burst out of a rock on his own. Something else has to give birth to it. In Linux, a process is created by its parent process. To be precise, the instructions of the parent process call the function fork(), and a child process is produced.

How does fork work? Since every process has a PCB, the kernel must first allocate a PCB for the new process (the number of PCBs is limited), then allocate memory for it, and then copy the parent process into it. fork is lazy in the sense that it simply duplicates the parent (on Linux the memory pages are shared copy-on-write, so they are physically copied only when one side writes to them). During the fork call there are therefore two almost identical processes in memory, differing in little more than the process id, which is unique.

Once the copy is done, each of the two processes has a fork call waiting to return. It really is only the return that remains: fork is itself a piece of code whose first part performs the copy, and by the time the child exists only the part that returns is left to run. The two return values differ, because the operating system controls them: in the parent, fork returns the pid of the child; in the child, fork returns 0; if fork fails, it returns -1 in the parent and no child is created.

fork only produces two almost identical processes that run the same code. That seems at odds with the usual reason for creating a process, which is to run new code.
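The three return values of fork described above can be seen in a minimal C sketch. The helper name `fork_demo` and the exit code 42 are made up for illustration; the parent also calls waitpid so that the child is cleaned up before the function returns.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once, prints which branch ran, and returns
 * the child's exit code as seen by the parent (-1 on any error). */
int fork_demo(void)
{
    /* Flush first so buffered output is not duplicated into the child. */
    fflush(stdout);

    pid_t pid = fork();
    if (pid < 0) {
        /* fork failed: no child was created. */
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child branch: fork returned 0 here. */
        printf("child:  pid=%d parent=%d\n", (int)getpid(), (int)getppid());
        fflush(stdout);
        _exit(42); /* arbitrary exit code chosen for this sketch */
    }

    /* Parent branch: fork returned the child's pid here. */
    printf("parent: pid=%d child=%d\n", (int)getpid(), (int)pid);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `fork_demo()` from a main function should print one line from each process, in either order, and return 42 in the parent.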
This is where the exec family of functions comes in. When a process calls one of the exec functions, its program code is completely replaced by the new program, and execution starts from the new program's startup routine. Calling exec does not create a new process, so the process id is the same before and after the call. If the call succeeds, the new program starts running from its startup code and exec never returns. If the call fails, it returns -1. An exec function therefore has only an error return value and no success return value.

When it executes the new program, exec passes the command-line arguments and the environment variable table to the program's main function. The environment variable table describes the system environment the process runs in: code needs various system resources in order to run properly, and the environment variable table is an abstraction of them.

Note, however, that exec has to be called explicitly; a child process does not load new program code on its own. The usual pattern is therefore for the parent's code to branch on the return value of fork, and for the child's branch to call exec explicitly.

Terminating a process

When a process terminates, it closes all its file descriptors and frees the memory allocated in user space, but its PCB remains, and the operating system keeps some information in it: if the process terminated normally, the PCB holds its exit status; if it terminated abnormally, the PCB holds the signal that caused the termination. The parent of the process can call wait or waitpid to obtain this information and then clean the process up completely. For example, the exit status of a process can be viewed in the shell through the special variable $?, because the shell is the parent of that process: when the process terminates, the shell calls wait or waitpid to get its exit status and cleans it up completely.
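The fork, exec, and wait pattern described above can be sketched in C roughly as follows. The helper name `run_program` is hypothetical, execvp is just one member of the exec family, and the return codes 127 (exec failed) and 128 plus the signal number (killed by a signal) are conventions borrowed from common shells, assumed here for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with arguments argv in a child process
 * and returns its exit code, 128 + signal number if a signal killed it,
 * or -1 if fork or waitpid failed. argv must end with a NULL pointer. */
int run_program(char *const argv[])
{
    fflush(NULL); /* avoid duplicating buffered output in the child */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child branch: replace this process image with the new program. */
        execvp(argv[0], argv);
        /* Reached only if exec failed, since a successful exec never returns. */
        perror("execvp");
        _exit(127);
    }

    /* Parent branch: wait for the child and decode its termination info. */
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);    /* normal termination */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status); /* terminated by a signal */
    return -1;
}
```

For instance, passing the argument vector `{"sh", "-c", "exit 3", NULL}` should make `run_program` return 3, which is the same value a shell would report in $? after running that command.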
If a process has terminated but its parent has not yet called wait or waitpid to clean it up, the process is called a zombie process. Every process is a zombie for a moment right after it terminates. Under normal circumstances, the parent cleans the zombie up almost immediately.

If a parent process terminates while its child processes still exist (whether they are still running or are already zombies), the parent of those children is changed to the init process. init is a special process in the system. Its program file is usually /sbin/init and its process id is 1. It is responsible for starting the various system services at boot; after that, it is responsible for cleaning up child processes: whenever one of its children terminates, init calls wait to clean it up.

A zombie process cannot be cleaned up with the kill command, because kill is used to terminate a process and a zombie has already terminated. A workable approach is to kill its parent process, after which init becomes the zombie's parent and cleans it up.

The prototypes of the wait and waitpid functions are:
    #include <sys/types.h>
    #include <sys/wait.h>

    pid_t wait(int *status);
    pid_t waitpid(pid_t pid, int *status, int options);

If the call succeeds, it returns the id of the child process that was cleaned up; if the call fails, it returns -1. When the parent process calls wait or waitpid, it may:

+ Block, if all of its child processes are still running.
+ Return immediately with the termination information of a child, if a child has already terminated and is waiting for the parent to read its termination information.
+ Return immediately with an error, if it has no child processes at all.
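The outcomes listed above can be observed with a small C sketch. Both helper names are hypothetical, and the pipe is only there to keep the child alive until the parent chooses to release it. The sketch also passes the WNOHANG option, which the text does not discuss: it makes waitpid return 0 instead of blocking when the child is still running.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: calls wait in a process that is assumed to have no
 * children and returns the errno value set by the failing call. */
int wait_with_no_children(void)
{
    errno = 0;
    if (wait(NULL) == -1)
        return errno; /* ECHILD when there is nothing to wait for */
    return 0;
}

/* Hypothetical helper: starts a child that stays alive until released,
 * probes it with WNOHANG, then blocks until it exits.
 * Returns the child's exit code (7 in this sketch) or -1 on error. */
int wait_for_running_child(int *still_running)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: block until the parent closes the write end of the pipe. */
        char c;
        close(fds[1]);
        ssize_t n = read(fds[0], &c, 1);
        _exit(n == 0 ? 7 : 1); /* 7 is an arbitrary code for this sketch */
    }

    close(fds[0]);
    int status;
    /* The child is stuck in read(), so a plain waitpid would block here;
     * with WNOHANG it returns 0 to report that no child has terminated. */
    *still_running = (waitpid(pid, &status, WNOHANG) == 0);

    close(fds[1]); /* end of file on the pipe lets the child exit */

    /* Blocks until the child has terminated, then cleans it up. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a process with no other children, `wait_with_no_children()` should return ECHILD, while `wait_for_running_child()` should set its flag to 1 and return 7.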
