How to deal with an infinite loop in a user-mode process on Linux

When operating Linux systems, you sometimes run into a user-mode process stuck in an infinite loop: the system feels slow, the process appears hung, and so on. How do you track the problem down? This article walks through one such case of a user-mode process spinning in an infinite loop.

1. Problem symptoms

A business process (a user-mode multithreaded program) hung and the OS became unresponsive, yet there was nothing unusual in the system logs. Judging from the process's kernel-mode stacks, all of its threads appeared to be stuck on the following path in kernel mode:

[root@vmc116 ~]# cat /proc/27007/task/11825/stack
[<ffffffff8100baf6>] retint_careful+0x14/0x32
[<ffffffffffffffff>] 0xffffffffffffffff

2. Problem analysis

1) Kernel stack analysis

From the kernel stacks, all threads are blocked at retint_careful, which is part of the interrupt-return path.

The code (assembly), from entry_64.S, is as follows:

ret_from_intr:
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    decl PER_CPU_VAR(irq_count)

    /* Restore saved previous stack */
    popq %rsi
    CFI_DEF_CFA rsi,SS+8-RBP    /* reg/off reset after def_cfa_expr */
    leaq ARGOFFSET-RBP(%rsi), %rsp
    CFI_DEF_CFA_REGISTER rsp
    CFI_ADJUST_CFA_OFFSET RBP-ARGOFFSET

    ...

retint_careful:
    CFI_RESTORE_STATE
    bt $TIF_NEED_RESCHED,%edx
    jnc retint_signal
    TRACE_IRQS_ON
    ENABLE_INTERRUPTS(CLBR_NONE)
    pushq_cfi %rdi
    SCHEDULE_USER
    popq_cfi %rdi
    GET_THREAD_INFO(%rcx)
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    jmp retint_check

This is the path a user-mode process takes when an interrupt arrives while it is running in user mode and the kernel then returns from that interrupt. Combining the offset retint_careful+0x14/0x32 with the disassembly confirms that the blocking point is in fact:

SCHEDULE_USER

This is essentially a call to schedule(): on its way back from the interrupt, the thread finds TIF_NEED_RESCHED set, so it gets rescheduled right here.

This raises a question: why is there no schedule() stack frame visible in the stack?

Because schedule is called directly from assembly, so there is no corresponding stack-frame push or context-save operation for it.
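To double-check where retint_careful+0x14 points, you can disassemble the symbol from a kernel image with debug info. A sketch, assuming the distro debuginfo package is installed (the vmlinux path below is typical for RHEL/CentOS and is an assumption; adjust it for your distribution):

# Map the offset retint_careful+0x14 back to an instruction.
gdb -batch -ex 'disassemble retint_careful' \
    /usr/lib/debug/lib/modules/$(uname -r)/vmlinux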

2) Running-state analysis

The output of the top command shows that the relevant threads are in fact in the R (running) state, the CPUs are almost completely consumed, and most of the time is spent in user mode:

[root@vmc116 ~]# top
top - 09:42:23 up 16 days, 2:21, 23 users, load average: 84.08, 84.30, 83.62
Tasks: 1037 total, 85 running, 952 sleeping, 0 stopped, 0 zombie
Cpu(s): 97.6%us, 2.2%sy, 0.2%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32878852k total, 32315464k used, 563388k free, 374152k buffers
Swap: 35110904k total, 38644k used, 35072260k free, 28852536k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27074 root 20 0 5316m 163m 14m R 10.2 0.5 321:06.17 z_itask_templat
27084 root 20 0 5316m 163m 14m R 10.2 0.5 296:23.37 z_itask_templat
27085 root 20 0 5316m 163m 14m R 10.2 0.5 337:57.26 z_itask_templat
27095 root 20 0 5316m 163m 14m R 10.2 0.5 327:31.93 z_itask_templat
27102 root 20 0 5316m 163m 14m R 10.2 0.5 306:49.44 z_itask_templat
27113 root 20 0 5316m 163m 14m R 10.2 0.5 310:47.41 z_itask_templat
25730 root 20 0 5316m 163m 14m R 10.2 0.5 283:03.37 z_itask_templat
30069 root 20 0 5316m 163m 14m R 10.2 0.5 283:49.67 z_itask_templat
13938 root 20 0 5316m 163m 14m R 10.2 0.5 261:24.46 z_itask_templat
16326 root 20 0 5316m 163m 14m R 10.2 0.5 150:24.53 z_itask_templat
 6795 root 20 0 5316m 163m 14m R 10.2 0.5 100:26.77 z_itask_templat
27063 root 20 0 5316m 163m 14m R 9.9 0.5 337:18.77 z_itask_templat
27065 root 20 0 5316m 163m 14m R 9.9 0.5 314:24.17 z_itask_templat
27068 root 20 0 5316m 163m 14m R 9.9 0.5 336:32.78 z_itask_templat
27069 root 20 0 5316m 163m 14m R 9.9 0.5 338:55.08 z_itask_templat
27072 root 20 0 5316m 163m 14m R 9.9 0.5 306:46.08 z_itask_templat
27075 root 20 0 5316m 163m 14m R 9.9 0.5 316:49.51 z_itask_templat
...
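For a per-thread breakdown of CPU usage, top's thread mode or a one-shot ps thread listing can be used (the PID is the one from this case):

# Per-thread view of the suspect process ('H' toggles the same inside top).
top -H -p 27007

# Or a one-shot list of every thread's state and CPU share:
ps -p 27007 -Lo tid,stat,pcpu,comm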

3) Process scheduling information

Looking at the scheduling statistics of one of the affected threads:

[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15681811525768 129628804592612 3557465
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15682016493013 129630684625241 3557509
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15682843570331 129638127548315 3557686
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15683323640217 129642447477861 3557793
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15683698477621 129645817640726 3557875

The thread's scheduling statistics keep increasing, which means the thread is continuously being scheduled to run, and its state stays R the whole time. This strongly suggests an infinite loop (or a non-sleeping deadlock) in user mode.
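For reference, the three fields of /proc/<pid>/schedstat are the time spent on the CPU (nanoseconds), the time spent waiting on a runqueue (nanoseconds), and the number of timeslices run. A small sketch to watch the on-CPU time grow for the thread above:

# Print the on-CPU time (field 1, in ns) of thread 11825 once per
# second; a steadily growing value means it keeps getting CPU time.
while true; do
    awk '{print "on-cpu ns:", $1}' /proc/27007/task/11825/schedstat
    sleep 1
done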

There is another question here: why does top show each thread using only about 10% of a CPU, rather than the 100% we normally see for a process stuck in an infinite loop?

Because there are many such threads, all at the same priority, the CFS scheduler divides CPU time evenly among them instead of letting any one thread monopolize a CPU. The net effect is round-robin scheduling across the threads, which together consume all of the CPUs: with on the order of ten runnable threads per CPU here, each one ends up with roughly 100% / 10 ≈ 10%.

Another question: why doesn't the kernel detect a soft lockup in this case?

Because the business process runs at a normal (non-real-time) priority, it cannot starve the watchdog kernel threads, which run at the highest real-time priority. The watchdog threads still get scheduled, so no soft lockup is reported.
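This can be sanity-checked by looking at the scheduling class of the per-CPU watchdog threads; on kernels of that era they run as SCHED_FIFO real-time threads (class FF), which ordinary CFS threads cannot starve:

# Show scheduling class and real-time priority of the watchdog threads.
ps -eo pid,cls,rtprio,comm | grep watchdog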

One more question: why does the thread stack show the block at retint_careful every time we look, and never anywhere else?

Because this point, the return from an interrupt, is where preemption happens; a thread that never sleeps cannot be switched out anywhere else (leaving other cases aside). Moreover, observing a thread's stack itself depends on scheduling: the cat command can only run once the spinning thread has been switched out, and that switch always happens on the interrupt-return path. So every time we look at the stack, the blocking point we catch is retint_careful.

4) User-mode analysis

The analysis above suggests an infinite loop in user mode.

How to confirm it in user mode: deploy the debug symbols, attach gdb to the affected process, dump the thread stacks, and analyze them against the code logic.
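A minimal sketch of that confirmation step (the PID is the one from this case; the binary's debug symbols must be available for the backtraces to be readable):

# Attach, dump every thread's user-mode backtrace, then detach.
gdb -batch -p 27007 -ex 'set pagination off' -ex 'thread apply all bt'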

This finally confirmed that the problem was indeed an infinite loop in the user-mode process.

The above is how to handle an infinite loop in a user-mode process on Linux: first analyze the cause of the problem, then deal with it accordingly. Have you learned it?
