Compiling the Linux Kernel and Adding System Calls

  

System calls are the interface that the Unix/Linux operating system provides to user programs. Through this interface an application requests a service from the operating system and hands control over to it. When the service completes, control and the result are returned to the user program.

The main purpose of system calls is to let users use the functions the operating system provides for device management, input/output, the file system, process control, communication, and storage management without having to know the internal structure of the system programs or the related hardware details. This reduces the burden on the user, protects the system, and improves resource utilization.

When a system call is made, the processor switches from user mode (also called problem state) to kernel mode (also called supervisor state). Yesterday a friend from Xinan asked me what the difference between the two is, and I think it is a big one: in kernel mode all system resources are available and the system's privileged functions can be invoked, whereas in user mode they cannot. A system call must therefore execute in kernel mode.

So if I want to add a system call to my Linux system, I first have to understand how the system invokes these functions and how it switches from user mode to kernel mode. In Linux, kernel mode is entered through an interrupt. This kind of interrupt is called a supervisor-call (trap) interrupt.

How System Calls Are Handled in the Kernel

In Linux, a system call is implemented as a kind of exception: the program executes a specific machine instruction that raises it. An important effect of raising an interrupt or exception is that the processor automatically switches from user mode to kernel mode to handle it.

A user-mode program can have privileged kernel functions executed on its behalf only by trapping into the kernel (executing an int instruction). When the system call completes, the kernel executes another special instruction (iret) to return to user mode, and control goes back to the process.

On i386, the instruction Linux uses to raise the system-call exception is:

int $0x80

This instruction uses interrupt/exception vector 128 (hexadecimal 0x80) to transfer control to the kernel, which is where the mode switch takes place.

So that programmers do not have to write machine instructions themselves when making a system call, the standard C library provides a short wrapper routine for each system call that contains the necessary machine code.

This machine code is very short. All it has to do is load the system call's parameters into CPU registers and then execute the int $0x80 instruction, after which the kernel runs the system call.

The return value of the system call is placed in a CPU register (eax on i386). The library routine reads this value and passes it back to the user program.
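As a rough illustration of what such a wrapper does, the sketch below goes through glibc's generic syscall() function instead of hand-written assembly. It assumes a Linux system with glibc or a compatible C library. The helper names raw_getpid and raw_close are made up for this example; syscall(), SYS_getpid and SYS_close come from <unistd.h> and <sys/syscall.h>.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for the process ID through the
 * generic syscall() entry point rather than the getpid() wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: invoke the close system call directly.
 * On failure the kernel leaves a negative error code in the return
 * register; the library turns that into a return value of -1 and
 * stores the positive error code in errno. */
long raw_close(int fd)
{
    return syscall(SYS_close, fd);
}
```

Calling raw_getpid() should give the same number as getpid(), and raw_close(-1) is a simple way to see the error path, since -1 is never a valid file descriptor.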

Let's take the call to the getuid() system call as an example:

The kernel's entry code relies on several macros. Their definitions can be found in arch/i386/kernel/entry.S.

………

#define SAVE_ALL \
    cld; \
    pushl %es; \
    pushl %ds; \
    pushl %eax; \
    pushl %ebp; \
    pushl %edi; \
    pushl %esi; \
    pushl %edx; \
    pushl %ecx; \
    pushl %ebx; \
    movl $(__USER_DS), %edx; \
    movl %edx, %ds; \
    movl %edx, %es;

SAVE_ALL mainly saves the register contents, that is, it preserves the context of the interrupted program. The instructions starting at movl $(__USER_DS), %edx then reload the DS and ES segment registers.
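The push order in SAVE_ALL fixes how the saved registers are laid out on the kernel stack, which is what offsets such as EAX(%esp) in the entry code rely on. The following is a user-space model of that frame, written under the assumption of 4-byte slots as on i386. The type name saved_frame and the helper eax_slot_offset are made up for this illustration and are not the kernel's own definitions. The slots above orig_eax are the ones the CPU pushes itself when the trap comes from user mode.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space model of the i386 kernel stack frame after SAVE_ALL.
 * The stack grows downward, so the register pushed last (ebx) sits at
 * the lowest address and the fields appear in reverse push order. */
struct saved_frame {
    uint32_t ebx;       /* pushed last by SAVE_ALL, offset 0x00 */
    uint32_t ecx;
    uint32_t edx;
    uint32_t esi;
    uint32_t edi;
    uint32_t ebp;
    uint32_t eax;       /* system call number on entry, result on exit */
    uint32_t ds;
    uint32_t es;        /* pushed first by SAVE_ALL */
    uint32_t orig_eax;  /* pushed by system_call before SAVE_ALL */
    uint32_t eip;       /* the remaining slots are pushed by the CPU */
    uint32_t cs;
    uint32_t eflags;
    uint32_t oldesp;
    uint32_t oldss;
};

/* Hypothetical helper: byte offset of the slot that receives the
 * system call's return value. */
size_t eax_slot_offset(void)
{
    return offsetof(struct saved_frame, eax);
}
```

With this layout the eax slot lands at offset 0x18 and oldss at 0x38, so a store to EAX(%esp) overwrites the saved eax that is later popped back into the register on return.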

#define RESTORE_INT_REGS \
    popl %ebx; \
    popl %ecx; \
    popl %edx; \
    popl %esi; \
    popl %edi; \
    popl %ebp; \
    popl %eax

#define RESTORE_REGS \
    RESTORE_INT_REGS; \
1:  popl %ds; \
2:  popl %es; \
.section .fixup,"ax"; \
3:  movl $0,(%esp); \
    jmp 1b; \
4:  movl $0,(%esp); \
    jmp 2b; \
.previous; \
.section __ex_table,"a"; \
    .align 4; \
    .long 1b,3b; \
    .long 2b,4b; \
.previous

ENTRY(ret_from_fork)
    pushl %eax
    call schedule_tail
    GET_THREAD_INFO(%ebp)
    popl %eax
    jmp syscall_exit

This code mainly restores the saved context and returns.

ENTRY(system_call)
    pushl %eax                      # save orig_eax
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
                                    # system call tracing in operation
    /* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
    testw $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP), TI_flags(%ebp)
    jnz syscall_trace_entry
    cmpl $(nr_syscalls), %eax
    jae syscall_badsys
syscall_call:
    call *sys_call_table(,%eax,4)
    movl %eax,EAX(%esp)             # store the return value
syscall_exit:
    cli                             # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    movl TI_flags(%ebp), %ecx
    testw $_TIF_ALLWORK_MASK, %cx   # current->work
    jne syscall_exit_work
restore_all:
    movl EFLAGS(%esp), %eax         # mix EFLAGS, SS and CS
    # Warning: OLDSS(%esp) contains the wrong/random values if we
    # are returning to the kernel.
    # See comments in process.c:copy_thread() for details.
    movb OLDSS(%esp), %ah
    movb CS(%esp), %al
    andl $(VM_MASK | (4 << 8) | 3), %eax
    cmpl $((4 << 8) | 3), %eax
    je ldt_ss                       # returning to user-space with LDT SS
restore_nocheck:
    RESTORE_REGS
    addl $4, %esp
1:  iret

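This section carries out the system call itself. eax holds the system call number. Because eax is reused later for the return value, its original value is saved first (pushl %eax). The instruction call *sys_call_table(,%eax,4) computes the address of the matching table entry (sys_call_table + eax * 4, since each entry is 4 bytes) and calls the handler stored there.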

Here sys_call_table is the Linux system call table, which is defined in arch/i386/kernel/syscall_table.S.

.data
ENTRY(sys_call_table)
    .long sys_restart_syscall   /* 0 - old "setup()" system call, used for restarting */
    .long sys_exit
    .long sys_fork
    .long sys_read
    .long sys_write
    .long sys_open              /* 5 */
    ……
    ……
    .long sys_mq_timedreceive   /* 280 */
    .long sys_mq_notify
    .long sys_mq_getsetattr
    .long sys_ni_syscall        /* reserved for kexec */
    .long sys_waitid
    .long sys_ni_syscall        /* 285 */ /* available */
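The way the entry code uses this table (a range check against nr_syscalls followed by an indirect call through the entry selected by eax) can be modelled in user space with an array of function pointers. The sketch below is only an analogy under simplifying assumptions: every name starting with demo_ or DEMO_ is hypothetical, the handlers take a single argument, and the out-of-range case returns -ENOSYS, which is the error the kernel reports for an invalid system call number.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type for the model: one argument, one result. */
typedef long (*demo_syscall_fn)(long arg);

static long demo_double(long arg) { return arg * 2; }
static long demo_negate(long arg) { return -arg; }

/* Model of sys_call_table: the array index is the system call number. */
static const demo_syscall_fn demo_table[] = {
    demo_double,    /* 0 */
    demo_negate,    /* 1 */
};

#define DEMO_NR_SYSCALLS (sizeof(demo_table) / sizeof(demo_table[0]))

/* Mirrors the cmpl/jae range check followed by the indirect call.
 * The comparison is unsigned, as with jae, so a negative number
 * converted to unsigned is rejected as well. */
long demo_dispatch(unsigned long nr, long arg)
{
    if (nr >= DEMO_NR_SYSCALLS)
        return -ENOSYS;
    return demo_table[nr](arg);
}
```

Adding a system call in this model means appending a handler to demo_table, which automatically raises DEMO_NR_SYSCALLS. In the real table, slots filled with sys_ni_syscall play the role of reserved or unused numbers.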
