UNIX/LINUX Platform Executable File Format Analysis

  

This article discusses the three main executable file formats on the UNIX/LINUX platform: a.out (assembler and link editor output), COFF (Common Object File Format), and ELF (Executable and Linking Format). It begins with an overview of executable file formats and uses the ELF loading process to describe the relationship between the contents of an executable file and the load-and-run operation. It then discusses the three file formats in turn, covers the dynamic linking mechanism of ELF files, and intersperses an evaluation of the advantages and disadvantages of each format. Finally, it briefly summarizes the three executable file formats and offers the author's own assessment of them.

The technique described in Reference 17 is a practical example of breaking this protection.

2: The kernel reads the name of the dynamic linker from the segment of the ELF file marked PT_INTERP and loads that dynamic linker. On modern LINUX systems the dynamic linker is usually /lib/ld-linux.so.2; the details are described later.

3: The kernel places some tag-value pairs on the stack of the new process to guide the operation of the dynamic linker.

4: The kernel passes control to the dynamic linker.

5: The dynamic linker checks the program's dependencies on external files (shared libraries) and loads them as needed.

6: The dynamic linker relocates the program's external references. In layman's terms, it tells the program the addresses of the external variables and functions it references; these addresses lie in the range where the shared library is loaded in memory. Dynamic linking also supports lazy relocation (lazy binding), which relocates a symbol only when it is actually referenced, so the program does not pay for resolving symbols it never uses.

7: The dynamic linker executes the code in the section marked .init in the ELF file to initialize the program. On earlier systems the initialization code is the function _init(void) (the function name is fixed). On modern systems the corresponding form is

    void __attribute__((constructor)) init_function(void) { ... }

where the function name is arbitrary.

8: The dynamic linker passes control to the program, which starts executing at the program entry point defined in the ELF file header. In the a.out and ELF formats the entry point value is stored explicitly; in the COFF format it is defined implicitly by the specification.
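Steps 3 and 7 can be observed from inside a running process. The following is a minimal sketch, assuming Linux with glibc 2.16 or later (for getauxval()) and a GCC- or Clang-compatible compiler (for the constructor attribute). The helper names constructor_has_run and print_startup_info are made up for this illustration and do not come from the article.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval() and the AT_* tags (glibc 2.16+) */

/* Set by the constructor; stays 0 if the constructor never ran. */
static int init_ran = 0;

/* Step 7: initialization code that runs before main().
 * The function name is arbitrary. */
__attribute__((constructor))
static void init_function(void)
{
    init_ran = 1;
}

/* Hypothetical helper: returns 1 once the constructor has run. */
int constructor_has_run(void)
{
    return init_ran;
}

/* Hypothetical helper for step 3: print a few of the tag-value pairs
 * (the auxiliary vector) that the kernel placed on the new process's stack. */
void print_startup_info(void)
{
    unsigned long pagesz = getauxval(AT_PAGESZ);

    assert(pagesz != 0);  /* the kernel always supplies the page size */
    printf("AT_PAGESZ = %lu\n", pagesz);
    printf("AT_PHDR   = 0x%lx\n", getauxval(AT_PHDR));   /* program header table */
    printf("AT_BASE   = 0x%lx\n", getauxval(AT_BASE));   /* dynamic linker base, 0 if static */
    printf("AT_ENTRY  = 0x%lx\n", getauxval(AT_ENTRY));  /* program entry point (step 8) */
}
```

Calling print_startup_info() from main() should show a non-zero AT_BASE for a dynamically linked program, and constructor_has_run() should already return 1 by the time main() starts.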

As the description above shows, loading a file comes down to two things: loading the program (text) segment and the data segment into memory, and relocating externally defined symbols. Relocation is an important concept in program linking. An executable program usually consists of a main program file containing main(), several object files, and several shared libraries. (Note: with some special tricks you can also write a program without a main function; see Reference 2.) A C program may reference variables or functions defined in shared libraries; in other words, the addresses of these variables and functions must be known at run time. With static linking, all the external definitions the program uses are contained in the executable itself, whereas dynamic linking only records some reference information about the external definitions in the executable file, and the actual relocation happens when the program runs. Static linking has two big problems: if a variable or function in the library changes, the program must be recompiled and relinked; and if multiple programs reference the same variable or function, it appears multiple times on disk and in memory, wasting space in both. Comparing the sizes of the executables produced by the two linking methods shows a clear difference.
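The idea of relocating an external reference only when it is first used can be illustrated with a toy model. This is a sketch in plain C, not the real mechanism used by the dynamic linker: the "library" is just a table of names and function pointers, and all identifiers (library_exports, import_slots, call_import, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*func_t)(int);

/* Stand-ins for functions that a shared library would export. */
static int lib_double(int x) { return 2 * x; }
static int lib_square(int x) { return x * x; }

/* The "library": a symbol table mapping names to addresses. */
struct export_entry { const char *name; func_t addr; };
static const struct export_entry library_exports[] = {
    { "lib_double", lib_double },
    { "lib_square", lib_square },
};

/* The "program": one slot per external reference, unresolved (NULL) at start. */
struct import_slot { const char *name; func_t addr; };
static struct import_slot import_slots[] = {
    { "lib_double", NULL },
    { "lib_square", NULL },
};

static int resolve_count = 0;  /* how many symbol lookups have been done */

/* Look a symbol up by name, as a linker would at relocation time. */
static func_t resolve(const char *name)
{
    size_t n = sizeof library_exports / sizeof library_exports[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(library_exports[i].name, name) == 0) {
            resolve_count++;
            return library_exports[i].addr;
        }
    }
    return NULL;
}

/* Call through a slot; the slot is relocated only on its first use. */
int call_import(size_t slot, int arg)
{
    assert(slot < sizeof import_slots / sizeof import_slots[0]);
    if (import_slots[slot].addr == NULL) {
        import_slots[slot].addr = resolve(import_slots[slot].name);
        assert(import_slots[slot].addr != NULL);
    }
    return import_slots[slot].addr(arg);
}

int lookups_performed(void)
{
    return resolve_count;
}
```

In this model, calling call_import(0, 21) twice performs only one lookup, and a slot that is never called is never resolved at all, which is the saving that lazy relocation aims for.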






Reference 16, together with reading the source code in Reference 15, can deepen your understanding of the a.out format. Reference 12 discusses how to run a.out format files on a "modern" Red Hat Linux system.






(See Reference 18.) Some UNIX systems have also extended the COFF format, for example XCOFF (extended common object file format), which supports dynamic linking; see Reference 5.

The file header is followed by the optional header. The COFF file format specification allows the length of the optional header to be 0, but on LINUX the optional header must be present. The following is the data structure of the optional header under LINUX:

    typedef struct {
        char magic[2];       /* magic number */
        char vstamp[2];      /* version number */
        char tsize[4];       /* length of text segment */
        char dsize[4];       /* length of initialized data segment */
        char bsize[4];       /* length of uninitialized data segment */
        char entry[4];       /* program entry point */
        char text_start[4];  /* text segment base address */
        char data_start[4];  /* data segment base address */
    } COFF_AOUTHDR;

When the magic field is 0413, the COFF file is executable. Note that the optional header explicitly defines the program entry point. A standard COFF file does not explicitly define the entry point value; execution usually starts at the beginning of the .text section, but this design is not good.
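Because every field of the optional header is declared as a char array, the values have to be assembled from bytes before they can be tested. The following is a minimal sketch, assuming the fields are stored in little-endian byte order (as on i386); the helper names coff_get16, coff_get32, coff_is_executable, and coff_entry_point are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The optional header layout as given in the article. */
typedef struct {
    char magic[2];       /* magic number */
    char vstamp[2];      /* version number */
    char tsize[4];       /* length of text segment */
    char dsize[4];       /* length of initialized data segment */
    char bsize[4];       /* length of uninitialized data segment */
    char entry[4];       /* program entry point */
    char text_start[4];  /* text segment base address */
    char data_start[4];  /* data segment base address */
} COFF_AOUTHDR;

#define COFF_EXEC_MAGIC 0413  /* octal, as quoted in the text */

/* Assemble a 16-bit value from two bytes (little-endian is an assumption). */
static uint16_t coff_get16(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;
    return (uint16_t)(u[0] | (u[1] << 8));
}

/* Assemble a 32-bit value from four bytes (little-endian is an assumption). */
static uint32_t coff_get32(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;
    return (uint32_t)u[0] | ((uint32_t)u[1] << 8) |
           ((uint32_t)u[2] << 16) | ((uint32_t)u[3] << 24);
}

/* Returns non-zero if the magic field marks the file as executable. */
int coff_is_executable(const COFF_AOUTHDR *h)
{
    assert(h != NULL);
    return coff_get16(h->magic) == COFF_EXEC_MAGIC;
}

/* Returns the explicitly stored program entry point. */
uint32_t coff_entry_point(const COFF_AOUTHDR *h)
{
    assert(h != NULL);
    return coff_get32(h->entry);
}
```

Going through unsigned char avoids sign-extension problems on platforms where plain char is signed.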

As mentioned earlier, the COFF format has one more table than the a.out format: the section table. Each section header entry describes the details of one section's data, so the COFF format can contain more sections, and specific sections can be added as actually needed; this shows both in the definition of the COFF format itself and in the COFF extensions mentioned earlier. I personally think the introduction of the section table may be the biggest improvement of the COFF format over the a.out format. Below we briefly describe the data structure of a section header in a COFF file; because sections matter mostly during compilation and linking, this article does not describe them further. In addition, the ELF and COFF formats define sections very similarly, so we will omit that discussion in the later ELF format analysis.

    struct COFF_scnhdr {
        char s_name[8];     /* section name */
        char s_paddr[4];    /* physical address */
        char s_vaddr[4];    /* virtual address */
        char s_size[4];     /* section length */
        char s_scnptr[4];   /* file offset of the section data */
        char s_relptr[4];   /* file offset of the section's relocation information */
        char s_lnnoptr[4];  /* file offset of the section's line number information */
        char s_nreloc[2];   /* number of relocation entries */
        char s_nlnno[2];    /* number of line number entries */
        char s_flags[4];    /* section flags */
    };

One point to note: the comment on the field s_paddr in the header file coff.h on Linux is "physical address", but it seems better understood as "the length of the space the section occupies when loaded into memory". The field s_flags marks the type of the section, such as a text segment, a data segment, or a BSS segment. Line number information also appears in COFF sections; it describes the mapping between the binary code and the line numbers of the source code, which is useful during debugging.
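To show how a section header entry might be interpreted, here is a minimal sketch that extracts the section name and classifies the section by s_flags. It assumes little-endian field storage and the conventional COFF flag values STYP_TEXT (0x20), STYP_DATA (0x40), and STYP_BSS (0x80); the helper names scn_get32, coff_section_name, and coff_section_kind are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The section header layout as given in the article. */
struct COFF_scnhdr {
    char s_name[8];     /* section name */
    char s_paddr[4];    /* physical address */
    char s_vaddr[4];    /* virtual address */
    char s_size[4];     /* section length */
    char s_scnptr[4];   /* file offset of the section data */
    char s_relptr[4];   /* file offset of relocation information */
    char s_lnnoptr[4];  /* file offset of line number information */
    char s_nreloc[2];   /* number of relocation entries */
    char s_nlnno[2];    /* number of line number entries */
    char s_flags[4];    /* section flags */
};

/* Conventional COFF section type flags (assumed values, see lead-in). */
#define STYP_TEXT 0x20u
#define STYP_DATA 0x40u
#define STYP_BSS  0x80u

/* Assemble a 32-bit value from four bytes (little-endian is an assumption). */
static uint32_t scn_get32(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;
    return (uint32_t)u[0] | ((uint32_t)u[1] << 8) |
           ((uint32_t)u[2] << 16) | ((uint32_t)u[3] << 24);
}

/* Copy the section name into out, adding the terminating NUL that an
 * 8-character name does not have room for in s_name. */
void coff_section_name(const struct COFF_scnhdr *s, char out[9])
{
    assert(s != NULL && out != NULL);
    memcpy(out, s->s_name, 8);
    out[8] = '\0';
}

/* Classify a section by its flags. */
const char *coff_section_kind(const struct COFF_scnhdr *s)
{
    uint32_t flags;

    assert(s != NULL);
    flags = scn_get32(s->s_flags);
    if (flags & STYP_TEXT) return "text";
    if (flags & STYP_DATA) return "data";
    if (flags & STYP_BSS)  return "bss";
    return "other";
}
```

A loader or dump tool would apply these helpers to each entry of the section table in turn.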

Reference 19 is a detailed Chinese-language description of the COFF format. For more details, please refer to Reference 20.






The author of Reference 1 notes that even when all the data in the section header table is set to 0, the program still runs correctly! The ELF header is a road map of the file that describes its overall structure. The following is the data structure of the ELF header:
