A beginner's guide: optimizing Linux system startup time


(1) First, trace and analyze the Linux boot process to produce a detailed startup time report.


The simplest and most practical approach is to add timestamps to all kernel messages printed during boot via the PrintkTime feature, which makes them easy to collect and analyze. PrintkTime began as a kernel patch provided by CELF and was merged into the mainline kernel in version 2.6.11, so on a recent kernel you may be able to enable this feature directly. If your Linux kernel cannot be upgraded to 2.6.11 for some reason, you can port the change yourself or download the patch provided by CELF: http://tree.celinuxforum.org/CelfPubWiki/PrintkTimes


Enabling PrintkTime is very simple: just add "time" to the kernel boot parameters. Alternatively, you can enable "Show timing information on printks" under "Kernel hacking" when configuring the kernel, which forces timestamps on kernel messages for every boot. This approach has an extra benefit: you also get timestamps for the kernel messages printed before the boot parameters are parsed. For that reason, I chose the latter method.
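As a sketch, the two ways of enabling the timestamps look like this; the GRUB entry, kernel image path, and root device below are hypothetical examples, not taken from the original setup:

```shell
# Method 1: append "time" to the kernel command line, e.g. in a GRUB entry
# (image path and root device are illustrative):
#
#   kernel /boot/vmlinuz-2.6.14 root=/dev/hda1 ro time
#
# Method 2: enable it permanently when configuring the kernel, via
# "Kernel hacking" -> "Show timing information on printks", which sets
# the following in .config:
#
#   CONFIG_PRINTK_TIME=y
```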


Once the configuration above is complete, reboot Linux, then dump the kernel boot messages to a file with the following command:


dmesg -s 131072 > ktime


Then run the "show_delta" script (found in the scripts directory of the Linux source tree) to convert the output file into a time-increment format:


/usr/src/Linux-x.xx.xx/scripts/show_delta ktime > dtime


This way, you get a detailed report on Linux boot time consumption.
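The conversion show_delta performs is straightforward: it annotates each line with the time elapsed since the previous printk. The same idea can be sketched in a few lines of awk; the two dmesg lines below are fabricated for illustration, and the real script's output format differs slightly:

```shell
# Two fabricated lines in the PrintkTime format "[ seconds] message"
printf '[    0.000000] Linux version 2.6.x\n[    0.524000] Calibrating delay loop\n' > ktime

# Print each message with its increment over the previous timestamp
awk -F'[][]' '{ t = $2 + 0; printf "<%.6f>%s\n", t - prev, $3; prev = t }' ktime > dtime
cat dtime
```

The second line of dtime then shows `<0.524000>`, i.e. the 0.524s spent between the two messages.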


(2) Then, use this report to find the relatively time-consuming steps in the startup.


Be aware that the time increment on a line of the report does not necessarily correspond to the kernel message on that line; the real cost of each step must be determined from the kernel source.


This is not difficult for anyone with a little programming experience, because each time increment is simply the difference between two consecutive printk calls. Generally, during kernel startup, a result is printed via printk after a time-consuming task completes, such as building a hash index or probing a hardware device; in that case, the increment on a line does reflect the cost of the step it describes. Sometimes, however, the kernel prints a message first and then starts the corresponding work, so the cost of that step shows up as the increment on the following line. And sometimes the time is spent at some indeterminate point between two messages, so the increment is not reflected by any kernel message at all.


Therefore, to determine the true time consumption accurately, we need to analyze the kernel source and, when necessary (as in the third case above), insert additional printk calls into it to pin down the actual time-consuming code.


The following is the startup analysis of my most recently trimmed-down Linux kernel:


Total kernel startup time: 6.188s


Key time-consuming parts:

1) 0.652s - Initialization of core components such as the timer, IRQs, caches, and memory pages

2) 0.611s - Synchronization of the kernel clock with the RTC

3) 0.328s - Calibrating delay loop (total across 4 CPU cores)

4) 0.144s - Calibrate APIC clock

5) 0.312s - Calibrate Migration Cost

6) 3.520s - Intel E1000 NIC Initialization


Below, each part will be analyzed and resolved one by one.


(3) Next, apply specific optimizations item by item.


CELF has proposed a set of startup optimizations for embedded Linux in consumer electronics, but because it targets different applications, we can only partially borrow from their experience and must analyze and experiment on our own specific problems.


For the initialization of the kernel's core components (timer, IRQs, caches, memory pages...), there is currently no reliable and feasible optimization, so it is not considered further.


For items 2 and 3 of the analysis above, CELF offers dedicated optimizations: "RTCNoSync" and "PresetLPJ".


The former removes the RTC clock synchronization during startup, or defers it until after boot (depending on how much clock accuracy the application needs), and requires patching the kernel. It appears that CELF's current work simply removes the step and does not implement the deferred ("snoozed") RTC synchronization it mentions. For that reason I have not adopted this optimization for now (after all, the clock drift it would introduce reaches the level of seconds); I will keep watching its progress.


The latter skips the actual calibration by forcing an LPJ value in the boot parameters, relying on the fact that the LPJ value does not change on the same hardware. So, after recording the "Calibrating Delay" value from the kernel messages of a normal boot, you can preset LPJ in the boot parameters in the following form:


lpj=9600700
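The value can also be pulled out of the boot log automatically. A small sketch, assuming the usual "Calibrating delay loop" message format printed by later 2.6 kernels (the sample line below is fabricated):

```shell
# Fabricated kernel message from a normal boot; later 2.6 kernels print
# the lpj value directly in this line
line='Calibrating delay loop... 4800.35 BogoMIPS (lpj=9600700)'

# Extract the number after "lpj=" so it can be appended to the boot parameters
lpj=$(printf '%s\n' "$line" | sed -n 's/.*lpj=\([0-9]*\).*/\1/p')
echo "lpj=$lpj"
```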


Items 4 and 5 of the analysis above are both part of SMP initialization, so they fall outside the scope of CELF's work (perhaps there will be multi-core MP4 players someday?...), and I had to rely on myself. After studying the SMP initialization code, I found that "Migration Cost", like "Calibrating Delay", can skip its calibration when a preset value is supplied. The method is similar; in the end I added this to the kernel boot parameters:


migration_cost=4000,4000
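Putting the two presets together, the boot loader entry would look something like this (the image path, root device, and measured values are from my setup; your hardware will yield different numbers):

```shell
# Example GRUB entry with both calibration steps preset
#
#   kernel /boot/vmlinuz root=/dev/hda1 ro lpj=9600700 migration_cost=4000,4000
```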


Optimizing the initialization of Intel's NIC driver is more troublesome. Although it is open source, reading a hardware driver is much harder than reading ordinary C code, and an "optimization" based on such a superficial understanding is hard to guarantee. For reliability reasons, I gave up on that road after two failed attempts. Approaching it from a different angle, we can borrow the "parallel initialization" idea from CELF's "ParallelRCScripts" scheme: compile the NIC driver as a standalone module and load it from the init script in parallel with the other modules and applications, so the blocking probe no longer affects the startup time. Since application initialization may also need the network, and in our actual hardware environment only eth0 is used for provisioning, the roughly 0.3s initialization time of the first network port still has to be counted.
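The parallel-load idea can be sketched as a shell fragment. Here a `sleep` stands in for the slow `modprobe e1000` probe, since the real command needs the actual hardware; the structure is what matters:

```shell
#!/bin/sh
# Start the slow NIC probe in the background (a real rcS would run
# "modprobe e1000 &" here; the sleep merely simulates the ~3.5s probe)
( sleep 1; echo "nic: probe done" ) &
nic_pid=$!

# The rest of the boot script continues immediately, in parallel
echo "boot: starting other services"

# Rejoin only at the point where the network is actually needed
wait "$nic_pid"
echo "boot: network available"
```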


Beyond the optimization points encountered in my own work, CELF also proposes other specific optimizations that may interest you, such as:


ShortIDEDelays - Shorten IDE probe delays (my application has no hard disk, so I could not use it)

KernelXIP - Run the kernel directly from ROM or Flash (not used, for compatibility reasons)

IDENoProbe - Skip IDE ports with no device attached

OptimizeRCScripts - Optimize the linuxrc script in the initrd (I used BusyBox for a more concise linuxrc)


There are also other optimizations still at the proposal stage; interested readers can visit the CELF Developer Wiki for more information.


(4) Optimization results


After the special optimizations above, plus trimming redundancy from the inittab and rcS scripts, the total Linux startup time dropped from 6.188s before optimization to a final 2.016s, or only 1.708s if the eth0 initialization is excluded (eth0 initialization can run in parallel with the system middleware and part of the application loading), which basically meets the goal. Combined with Kexec, the reset time caused by software failures can be greatly reduced, effectively improving the reliability of the product.
