File descriptors in server programming

  
All files in Linux are accessed through the virtual file system (VFS) layer, which lets user programs operate on different drivers and file systems through one unified interface. Each open file needs a reference that identifies it: the file descriptor, which plays a role similar to a file handle on Windows. Most file operations, such as read and write, go through this descriptor. The kernel manages each file descriptor with three data structures.

(1) Each process has an entry in the process table, and each entry holds a table of open file descriptors. The table can be viewed as a vector with one slot per descriptor. Associated with each file descriptor are:

(a) The file descriptor flags (currently only one, FD_CLOEXEC, is defined).

(b) A pointer to a file table entry.

(2) The kernel maintains a file table for all open files. Each file table entry contains:

(a) The file status flags (read, write, append, sync, non-blocking, etc.).

(b) The current file offset (the value that the lseek function manipulates).

(c) A pointer to the v-node entry of the file.

(3) Each open file (or device) has a v-node structure. The v-node contains the file type and pointers to the functions that perform the various operations on the file. For most files, the v-node also contains the file's i-node (inode). This information is read from disk into memory when the file is opened, so everything about the file is quickly available. For example, the i-node records the owner of the file, the file length, the device on which the file resides, pointers to the data blocks the file occupies on disk, and so on. (Linux itself has no separate v-node; it uses a generic i-node structure that plays the same role.)

Each of these three layers has a distinct responsibility. From top to bottom, the first layer identifies files within a process, the second manages process-independent data about each open (status flags and current offset), and the third manages file system metadata that is tied directly to the file itself. One advantage of this layering is that an upper layer can share structures in the layer below: several file descriptor entries may point to the same file table entry, and several file table entries may point to the same v-node.
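To make the split between the first two layers concrete, here is a minimal sketch in C. It assumes a POSIX system; flags_by_layer is a hypothetical helper written for this illustration, not a system API. It sets a status flag (O_NONBLOCK) and a descriptor flag (FD_CLOEXEC) through one descriptor, then checks which of the two a dup()ed copy of that descriptor can see.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo helper. Sets one status flag and one descriptor flag
 * through the original descriptor and reports whether a dup()ed copy
 * observes each of them. Returns 0 on success, -1 on error. */
int flags_by_layer(int *copy_sees_nonblock, int *copy_sees_cloexec)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    int rc = -1;
    int copy = dup(p[0]);   /* second descriptor, same file table entry */
    if (copy >= 0) {
        int fl = fcntl(p[0], F_GETFL);
        if (fl != -1 &&
            /* status flag: stored in the shared file table entry */
            fcntl(p[0], F_SETFL, fl | O_NONBLOCK) != -1 &&
            /* descriptor flag: stored per descriptor */
            fcntl(p[0], F_SETFD, FD_CLOEXEC) != -1) {
            int copy_fl = fcntl(copy, F_GETFL);
            int copy_fd = fcntl(copy, F_GETFD);
            if (copy_fl != -1 && copy_fd != -1) {
                *copy_sees_nonblock = (copy_fl & O_NONBLOCK) != 0;
                *copy_sees_cloexec = (copy_fd & FD_CLOEXEC) != 0;
                rc = 0;
            }
        }
        close(copy);
    }
    close(p[0]);
    close(p[1]);
    return rc;
}
```

On a conforming system the copy should see O_NONBLOCK, because it lives in the shared file table entry, but not FD_CLOEXEC, because that flag belongs to the individual descriptor.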

If two independent processes open the same file, each open gets its own file table entry, but the v-node pointers of both entries point to the same v-node. This arrangement gives each process its own current offset into the file and allows different open modes (O_RDONLY, O_WRONLY, O_RDWR).
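The same effect can be observed inside a single process, since every call to open creates a new file table entry. The following sketch assumes a POSIX system with a writable /tmp; independent_offsets is a hypothetical helper name. It opens one temporary file twice, reads through the first descriptor only, and reports both offsets.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo helper. Opens the same file twice, reads 5 bytes
 * through the first descriptor, and stores the resulting offset of each
 * descriptor. Returns 0 on success, -1 on error. */
int independent_offsets(off_t *off_a, off_t *off_b)
{
    char path[] = "/tmp/fd_open_XXXXXX";
    int a = mkstemp(path);          /* first open: its own file table entry */
    if (a < 0)
        return -1;

    int rc = -1;
    int b = -1;
    char buf[5];
    if (write(a, "0123456789", 10) == 10 &&
        lseek(a, 0, SEEK_SET) == 0 &&
        (b = open(path, O_RDONLY)) >= 0 &&   /* second, separate entry */
        read(a, buf, sizeof buf) == (ssize_t)sizeof buf) {
        *off_a = lseek(a, 0, SEEK_CUR);      /* moved by the read */
        *off_b = lseek(b, 0, SEEK_CUR);      /* untouched */
        rc = 0;
    }
    if (b >= 0)
        close(b);
    close(a);
    unlink(path);
    return rc;
}
```

Reading through the first descriptor advances only its own offset; the second descriptor still starts at the beginning of the file.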

When a process creates a child with fork, the file descriptors in the parent and the child share the same file table entries; that is, corresponding descriptors in the two processes point to the same entry. We usually close the descriptors that are not needed after the fork. For example, when parent and child communicate through a pipe or socketpair, each side typically closes the end it does not read (or write). The close operation destroys the file table entry only once no file descriptor refers to it any longer, which is essentially reference counting. This is also the difference between close and shutdown in network programming: close tears down the connection only when the last descriptor referring to the socket has been closed, whereas shutdown shuts down one or both directions of the connection immediately, regardless of how many descriptors still refer to the socket.

In a multi-threaded program, however, all threads share one address space and one file descriptor table, so there is only one copy of each descriptor. A thread therefore cannot close a descriptor just because it does not need it itself; doing so would break the other threads that still use it.

Because descriptors opened in the parent share file table entries with the child, a server that uses the preforking model (the server forks several children in advance, and each child calls accept on the same listenfd) can suffer from the thundering herd problem. Each child calls accept and is put to sleep. When the first client connection arrives, all of them are woken up even though only one obtains the connection, which hurts performance. See UNP, p. 657.
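The sharing of a file table entry across fork can be checked with a small experiment. This is a sketch under the assumption of a POSIX system with a writable /tmp; offset_after_child_read is a hypothetical helper name. The child reads four bytes from an inherited descriptor, and the parent then inspects its own offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper. Returns the parent's file offset after a
 * forked child has read 4 bytes through the inherited descriptor,
 * or -1 on error. */
off_t offset_after_child_read(void)
{
    char path[] = "/tmp/fd_fork_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the file lives on while the descriptor is open */

    if (write(fd, "abcdefgh", 8) != 8 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: the read moves the offset in the shared file table entry. */
        char buf[4];
        ssize_t n = read(fd, buf, sizeof buf);
        _exit(n == 4 ? 0 : 1);
    }

    int status;
    off_t off = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        off = lseek(fd, 0, SEEK_CUR);   /* parent never read anything */
    close(fd);
    return off;
}
```

The parent performs no read of its own, yet its offset has advanced, because both descriptors refer to one file table entry.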

If exec is called after fork, all file descriptors remain open across the exec unless the FD_CLOEXEC flag is set on them. This can be used to pass particular file descriptors to the program started by exec.
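The following sketch illustrates descriptor inheritance across exec. It assumes a POSIX system where /bin/sh exists; bytes_from_exec_child is a hypothetical helper name and descriptor number 5 is an arbitrary choice for the demonstration. The child places the write end of a pipe on descriptor 5, optionally marks it FD_CLOEXEC, and then execs a shell that tries to write to descriptor 5.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper. Runs a shell child that tries to write
 * "hi\n" to descriptor 5, which is the write end of a pipe before exec.
 * Returns the number of bytes the parent receives, or -1 on error. */
ssize_t bytes_from_exec_child(int set_cloexec)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        if (p[1] != 5) {            /* move the write end to descriptor 5 */
            dup2(p[1], 5);
            close(p[1]);
        }
        if (set_cloexec)
            fcntl(5, F_SETFD, FD_CLOEXEC);   /* closed automatically on exec */
        execl("/bin/sh", "sh", "-c",
              "exec 2>/dev/null; echo hi >&5", (char *)NULL);
        _exit(127);                 /* only reached if exec failed */
    }

    close(p[1]);                    /* parent keeps only the read end */
    char buf[16];
    ssize_t n = read(p[0], buf, sizeof buf);   /* 0 means end of file */
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

Without the flag the shell inherits descriptor 5 and its output reaches the parent; with FD_CLOEXEC set, the descriptor is gone after exec and the parent only sees end of file.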

You can also duplicate a file descriptor explicitly with dup or fcntl; the new descriptor points to the same file table entry as the original. dup2 duplicates a descriptor onto a specified descriptor number.
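A short sketch of both calls, assuming a POSIX system with a writable /tmp; dup_shares_offset is a hypothetical helper name. Note that dup2 silently closes the target descriptor if it is already open, so the caller should pick a number it knows to be unused.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo helper. Duplicates a descriptor with dup() and with
 * dup2() onto `target`, reads 3 bytes through the original, and stores
 * the offset seen through each copy. Returns 0 on success, -1 on error. */
int dup_shares_offset(int target, off_t *via_dup, off_t *via_dup2)
{
    char path[] = "/tmp/fd_dup_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int rc = -1;
    int copy = -1, fixed = -1;
    char buf[3];
    if (write(fd, "abcdef", 6) == 6 &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        (copy = dup(fd)) >= 0 &&                 /* lowest free number */
        (fixed = dup2(fd, target)) == target &&  /* caller-chosen number */
        read(fd, buf, sizeof buf) == (ssize_t)sizeof buf) {
        *via_dup = lseek(copy, 0, SEEK_CUR);
        *via_dup2 = lseek(fixed, 0, SEEK_CUR);
        rc = 0;
    }
    if (fixed >= 0 && fixed != fd)
        close(fixed);
    if (copy >= 0)
        close(copy);
    close(fd);
    return rc;
}
```

Both copies report the offset produced by the read on the original descriptor, since all three refer to a single file table entry.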

Each process has its own file descriptor table, independent of other processes, so the descriptors of two processes are not directly related. A descriptor can be passed around freely within a process, but its numeric value is meaningless once it crosses a process boundary. Unix does provide a special mechanism for passing a descriptor to another process via sendmsg/recvmsg (see UNP, section 15.7). The first three file descriptors of each process correspond to standard input, standard output, and standard error.

The number of file descriptors a process can have open is limited. If too many are open, operations fail with "Too many open files". In a network server this shows up as an EMFILE error when accept is called on listenfd. File descriptors are an important system resource, so the system caps the number per process; the default limit is typically 1024 and can be viewed with the ulimit -n command. You can raise the per-process limit, but that treats the symptom rather than the cause: a server handling highly concurrent traffic has finite resources, and at some point they will still run out.
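The limit can be observed without opening a thousand files by lowering the soft limit first. This is a minimal sketch assuming a POSIX system; errno_at_fd_limit is a hypothetical helper name, and the cap of 64 is only there to keep the bookkeeping array small.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical demo helper. Temporarily lowers the soft RLIMIT_NOFILE to
 * `limit` (at most 64), opens /dev/null until open() fails, and returns
 * the errno of that failure. Returns -1 if the setup itself fails. */
int errno_at_fd_limit(rlim_t limit)
{
    struct rlimit saved, lowered;
    if (limit > 64 || getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;

    lowered = saved;
    lowered.rlim_cur = limit;       /* the soft limit is what ulimit -n shows */
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return -1;

    int fds[64];
    int n = 0;
    int err = 0;
    while (n < 64) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            err = errno;            /* expected: EMFILE */
            break;
        }
        fds[n++] = fd;
    }

    while (n > 0)
        close(fds[--n]);
    setrlimit(RLIMIT_NOFILE, &saved);   /* restore the original limit */
    return err;
}
```

Calling errno_at_fd_limit(16) should return EMFILE, the same error a busy server sees from accept once its descriptor table is full.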

When listenfd is monitored with epoll in level-triggered mode and a large number of connections arrive, accept may fail with EMFILE. If the pending connections are not dealt with, they stay in the TCP accept queue, listenfd keeps reporting readable events, and the server spins in a busy loop. The approach taken by Chen Shuo, author of the open source C++ network library Muduo, is to reserve an idle file descriptor in advance. When EMFILE occurs, the server first closes the idle descriptor to free one slot, then accepts the connection to obtain a descriptor for the socket, and immediately closes it, which disconnects the client gracefully. Finally it reopens the idle file to fill the "pit" again in case the situation recurs.

// At the start of the program, first "occupy" a file descriptor
int idlefd = open("/dev/null", O_RDONLY);
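The remaining steps of the technique might look like the sketch below. It is written in plain C and is an illustration of the idea described above, not Muduo's actual code; reserve_idle_fd and accept_or_shed are hypothetical names, and the O_CLOEXEC flag is an addition assumed here so that the reserve is not leaked to exec'ed programs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

static int idlefd = -1;   /* the reserved "spare" descriptor */

/* Hypothetical helper: reserve one descriptor at program start.
 * Returns 0 on success, -1 on error. */
int reserve_idle_fd(void)
{
    idlefd = open("/dev/null", O_RDONLY | O_CLOEXEC);
    return idlefd >= 0 ? 0 : -1;
}

/* Hypothetical helper: accept one connection from listenfd.
 * Returns the connected descriptor, or -1 on failure. On EMFILE the
 * pending connection is accepted with the reserve and closed at once,
 * so the listening socket stops reporting it as readable. */
int accept_or_shed(int listenfd)
{
    int connfd = accept(listenfd, NULL, NULL);
    if (connfd >= 0)
        return connfd;

    if (errno == EMFILE && idlefd >= 0) {
        close(idlefd);                          /* free one slot */
        connfd = accept(listenfd, NULL, NULL);  /* take the pending client */
        if (connfd >= 0)
            close(connfd);                      /* and drop it right away */
        idlefd = open("/dev/null", O_RDONLY | O_CLOEXEC);  /* refill */
        errno = EMFILE;                         /* report the original error */
    }
    return -1;
}
```

A server would call reserve_idle_fd once at startup and accept_or_shed from its readable-event handler. In a multi-threaded program another thread could grab the freed slot between the close and the second accept, so this sketch is best suited to a design where a single thread performs all accepts.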
						