The sort command under Linux in detail

  
        

1 How sort works

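sort treats each line of a file as a unit and compares the lines with one another. Lines are compared character by character, starting from the first character, according to their ASCII code values (strictly speaking, according to the collation order of the current locale; LC_ALL=C gives plain byte order), and the result is output in ascending order.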

[zookeeper@master rh]$ cat seq.txt
banana
apple
pear
orange
pear
[zookeeper@master rh]$ sort seq.txt
apple
banana
orange
pear
pear
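The effect of comparing by character code can be checked with a small experiment. The sketch below assumes GNU sort and uses a hypothetical file mixed.txt that is not part of the article; LC_ALL=C is set so that lines are compared by plain byte value rather than by the locale's collation rules.

```shell
# Hypothetical sample data in a throwaway directory (not from the article).
tmpdir=$(mktemp -d)
printf 'banana\nApple\npear\n10\n9\n' > "$tmpdir/mixed.txt"

# In the C locale, digits sort before uppercase letters,
# and uppercase letters sort before lowercase letters.
ascii_sorted=$(LC_ALL=C sort "$tmpdir/mixed.txt")
printf '%s\n' "$ascii_sorted"

rm -rf "$tmpdir"
```

Here 10 comes out ahead of 9 because the character 1 is compared with 9 first, and Apple comes out ahead of banana because uppercase letters have smaller codes than lowercase ones.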

2 sort -u option

Its role is very simple: it removes duplicate lines from the output.

[zookeeper@master rh]$ sort -u seq.txt
apple
banana
orange
pear

The duplicate pear is ruthlessly removed by the -u option.
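As a quick cross-check, the following sketch recreates seq.txt in a temporary directory and compares sort -u with the familiar sort | uniq pipeline. The temporary path and variable names are assumptions made for illustration only.

```shell
# Recreate the article's seq.txt in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'banana\napple\npear\norange\npear\n' > "$tmpdir/seq.txt"

# -u keeps one copy of each line that compares equal.
unique_sorted=$(LC_ALL=C sort -u "$tmpdir/seq.txt")

# For whole-line comparison this gives the same lines as sort piped into uniq.
via_uniq=$(LC_ALL=C sort "$tmpdir/seq.txt" | uniq)

printf '%s\n' "$unique_sorted"
rm -rf "$tmpdir"
```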

3 sort -n and -r options

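[zookeeper@master rh]$ cat number.txt
1
3
5
7
11
2
4
6
10
8
9
[zookeeper@master rh]$ sort number.txt
1
10
11
2
3
4
5
6
7
8
9

By default sort outputs in ascending order, but it compares these numbers character by character rather than by value. It compares the leading 1 of 10 with 2, finds that 1 is obviously smaller, and so puts 10 in front of 2. The -n option makes sort compare by numeric value instead:

[zookeeper@master rh]$ sort -n number.txt
1
2
3
4
5
6
7
8
9
10
11

The -r option means descending order, and -n means sorting by number, so together they sort numerically from largest to smallest:

[zookeeper@master rh]$ sort -n -r number.txt
11
10
9
8
7
6
5
4
3
2
1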

4 sort -o option

Since sort writes its result to standard output by default, you need redirection to write the result to a file, as in sort filename > newfile.

However, if you want to write the sorted result back to the original file, redirection will not work, because the shell truncates the target file before sort gets a chance to read it.

[zookeeper@master rh]$ sort -n -r number.txt > number.txt
[zookeeper@master rh]$ cat number.txt
[zookeeper@master rh]$

number.txt has been emptied. This is where the -o option comes in: it solves the problem and lets you safely write the result back to the original file. This may also be the only advantage of -o over redirection.

[zookeeper@master rh]$ sort -n -r number.txt -o number.txt
[zookeeper@master rh]$ cat number.txt
11
10
9
8
7
6
5
4
3
2
1
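The difference between the two approaches can be reproduced safely on copies in a temporary directory. The file names a.txt and b.txt below are hypothetical. In this sketch -o is written before the input file, which should also work with sort implementations that do not accept options after operands.

```shell
# Two identical throwaway files (hypothetical names).
tmpdir=$(mktemp -d)
printf '3\n1\n2\n' > "$tmpdir/a.txt"
cp "$tmpdir/a.txt" "$tmpdir/b.txt"

# Redirection: the shell truncates a.txt before sort reads it.
sort -n "$tmpdir/a.txt" > "$tmpdir/a.txt"
if [ -s "$tmpdir/a.txt" ]; then
  redirect_result=kept
else
  redirect_result=emptied
fi

# -o: sort reads all of its input before it opens the output file.
sort -n -o "$tmpdir/b.txt" "$tmpdir/b.txt"
in_place=$(cat "$tmpdir/b.txt")

printf 'redirect: %s\n' "$redirect_result"
printf '%s\n' "$in_place"
rm -rf "$tmpdir"
```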

5 sort -t and -k options

[zookeeper@master rh]$ cat facebook.txt
banana:30:5.5
apple:10:2.5
pear:90:2.3
orange:20:3.4
[zookeeper@master rh]$ sort -n -k 2 -t : facebook.txt
apple:10:2.5
orange:20:3.4
banana:30:5.5
pear:90:2.3

This file has three columns separated by colons. The first column is the type of fruit, the second column is the quantity of fruit, and the third column is the price of the fruit. Suppose I want to sort by quantity, that is, by the second column. How can sort do that? Fortunately, sort provides the -t option, which is followed by the field delimiter. (Reminiscent of the -d option of cut and paste, isn't it?)

After specifying the delimiter, you can use -k to specify the column number. We use the colon as the delimiter and sort numerically in ascending order on the second column, and the result is satisfactory.
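The same pattern should work for the other columns. As a sketch, the following recreates facebook.txt in a temporary directory (an assumption made so the example is self-contained) and sorts by the third column, the price. It relies on -n handling decimal fractions, and LC_ALL=C is set so that the decimal point is a period.

```shell
# Recreate the article's facebook.txt in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'banana:30:5.5\napple:10:2.5\npear:90:2.3\norange:20:3.4\n' > "$tmpdir/facebook.txt"

# Colon as the delimiter, third column as the key, numeric comparison.
by_price=$(LC_ALL=C sort -n -t : -k 3 "$tmpdir/facebook.txt")
printf '%s\n' "$by_price"

rm -rf "$tmpdir"
```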

6 Other commonly used sort options

-f folds lowercase letters to uppercase for comparison, that is, it ignores case.

-c checks whether the file is already sorted; if it is not, it reports the first out-of-order line and returns exit status 1.

-C also checks whether the file is sorted, but if it is not, it prints nothing and only returns exit status 1.

-M sorts by month name, so that JAN is less than FEB, and so on.

-b ignores the leading blanks of each line and starts comparing from the first visible character.
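A few of these options can be tried out as follows. This is a minimal sketch assuming GNU sort; the sample data and variable names are invented for illustration.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'b\na\nc\n' > "$tmpdir/unsorted.txt"

# -C checks silently; the exit status says whether the file is sorted.
if LC_ALL=C sort -C "$tmpdir/unsorted.txt"; then
  check_status=0
else
  check_status=$?
fi

# -f folds case: without it, the C locale would put C ahead of b.
folded=$(printf 'b\nA\nC\n' | LC_ALL=C sort -f)

# -M orders month abbreviations by calendar position.
months=$(printf 'MAR\nJAN\nFEB\n' | LC_ALL=C sort -M)

printf 'status=%s\n%s\n%s\n' "$check_status" "$folded" "$months"
rm -rf "$tmpdir"
```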



The sort command under Linux (2)


When reading scripts, you will sometimes find the sort command followed by a bunch of things like -k1,2 or -k1.2 -k3.4, which can look baffling. Today, let's get to grips with the -k option!

1 Prepare the material

[root@FDMdevBI opt]# cat testsort.txt
google 110 5000
baidu 100 5000
guge 50 3000
sohu 100 4500

The first field is the company name, the second field is the number of employees at the company, and the third field is the average salary of the employees. (Apart from the company names, the other values are just sample data ^_^)

2 I want this file sorted alphabetically by company name, that is, by the first field: (testsort.txt has three fields)

[root@FDMdevBI opt]# sort -t ' ' -k 1 testsort.txt
baidu 100 5000
google 110 5000
guge 50 3000
sohu 100 4500

As you can see, simply setting -k 1 is enough. (Strictly speaking this is not quite rigorous, as you will see later.)

3 I want testsort.txt sorted by the number of employees

[root@FDMdevBI opt]# sort -n -t ' ' -k 2 testsort.txt
guge 50 3000
baidu 100 5000
sohu 100 4500
google 110 5000

However, there is a problem here: baidu and sohu have the same number of employees, 100 each. What happens then? When the sort keys compare equal, sort falls back to comparing the whole line, starting from the first field, so baidu is ranked ahead of sohu.
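The tie-breaking rule is easier to see when the input order disagrees with it. The sketch below uses a hypothetical two-line file tie.txt in which sohu comes first, and contrasts the default behaviour with the -s (stable) option of GNU sort, which keeps tied lines in their input order. The -s option is not mentioned in the article and is brought in here only for comparison.

```shell
# Hypothetical input in which sohu precedes baidu.
tmpdir=$(mktemp -d)
printf 'sohu 100 4500\nbaidu 100 5000\n' > "$tmpdir/tie.txt"

# Default: the keys tie at 100, so the whole lines are compared and baidu wins.
tie_default=$(LC_ALL=C sort -n -t ' ' -k 2 "$tmpdir/tie.txt")

# Stable sort: the last-resort comparison is skipped and input order is kept.
tie_stable=$(LC_ALL=C sort -s -n -t ' ' -k 2 "$tmpdir/tie.txt")

printf '%s\n--\n%s\n' "$tie_default" "$tie_stable"
rm -rf "$tmpdir"
```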

4 I want testsort.txt sorted by the number of employees, with companies that have the same number of employees sorted in ascending order of average salary:

[root@FDMdevBI opt]# sort -n -t ' ' -k2 -k3 testsort.txt
guge 50 3000
sohu 100 4500
baidu 100 5000
google 110 5000

See, adding -k2 -k3 solved the problem. That's right: sort supports setting priorities among fields in this way. It sorts by the second field first and, where those are equal, by the third field. (If you like, you can keep going and set many levels of sorting priority.)

5 I want testsort.txt sorted in descending order of employee salary; where the salary is the same, sort by the number of employees in ascending order: (this is a bit harder)

[root@FDMdevBI opt]# sort -n -t ' ' -k3r -k2 testsort.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

A little trick is used here. Look closely: a lowercase letter r has been quietly added after -k 3. Thinking back to the previous article, can you work out the answer? The reveal: the r modifier does the same job as the -r option, that is, it reverses the order. Because sort is ascending by default, r is needed here to say that the third field (average employee salary) is sorted in descending order. You can also add n here, which means that this field is sorted by numeric value, for example:

[root@FDMdevBI opt]# sort -t ' ' -k3rn -k2n testsort.txt
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

Look, we removed the leading -n option and added the n modifier to each -k option instead.
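Writing n on each key is more than a matter of taste. As far as I understand GNU sort (and the POSIX description), a key that carries any modifier of its own, such as r, no longer inherits global ordering options such as -n. The hypothetical two-line file below is chosen so that the difference shows up; with the article's data both forms happen to give the same result because all salaries have four digits.

```shell
# Hypothetical data: salaries with different numbers of digits.
tmpdir=$(mktemp -d)
printf 'a 1 900\nb 2 5000\n' > "$tmpdir/pay.txt"

# -k3r has its own modifier, so the global -n is not applied to it:
# the third field is compared as text, and "900" sorts above "5000".
string_desc=$(LC_ALL=C sort -n -t ' ' -k3r "$tmpdir/pay.txt")

# -k3rn asks for a reversed numeric comparison on the key itself.
numeric_desc=$(LC_ALL=C sort -t ' ' -k3rn "$tmpdir/pay.txt")

printf '%s\n--\n%s\n' "$string_desc" "$numeric_desc"
rm -rf "$tmpdir"
```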

6 The specific syntax of the -k option

To go any deeper, some theory is needed. The syntax of the -k option is as follows:

[ FStart [ .CStart ] ] [ Modifier ] [ , [ FEnd [ .CEnd ] ][ Modifier ] ]

This syntax can be divided by the comma (",") into two parts, the Start part and the End part.

First, let me instill one idea: "if you do not set the End part, End is taken to be the end of the line". This concept is important, but people often overlook it.

The Start part itself consists of three parts. The Modifier part is made up of option-like letters such as the n and r mentioned earlier. We focus on FStart and CStart in the Start part.

CStart can be omitted. If it is omitted, the key starts at the beginning of the field. The -k 2 and -k 3 in the previous examples both omit CStart.

In FStart.CStart, FStart is the field to use, and CStart is the position within the FStart field of the character that counts as the "first character of the sort key".

Similarly, in the End part you can set FEnd.CEnd. If you omit .CEnd, the key ends at the "field end", that is, at the last character of that field. Setting CEnd to 0 (zero) also means ending at the "field end".
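The rule that an omitted End part means "up to the end of the line" has a visible consequence when several keys are given. The sketch below uses a hypothetical two-line file ends.txt: with -k 1 the first key already covers the whole line, so the second key is never consulted, whereas -k 1,1 limits the first key to field 1 and lets the numeric second key decide.

```shell
# Hypothetical data: same first field, different second field.
tmpdir=$(mktemp -d)
printf 'x 2\nx 10\n' > "$tmpdir/ends.txt"

# Key 1 runs from field 1 to the end of the line: "x 10" < "x 2" as text.
open_ended=$(LC_ALL=C sort -t ' ' -k 1 -k 2n "$tmpdir/ends.txt")

# Key 1 is only field 1 (a tie), so key 2 compares 2 and 10 numerically.
bounded=$(LC_ALL=C sort -t ' ' -k 1,1 -k 2,2n "$tmpdir/ends.txt")

printf '%s\n--\n%s\n' "$open_ended" "$bounded"
rm -rf "$tmpdir"
```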

7 A whimsical idea: sort starting from the second letter of the company's English name:
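One way this might be done, going by the FStart.CStart syntax described in the previous section, is -k 1.2. The following is a sketch rather than the article's own command; it recreates testsort.txt in a temporary directory so that it is self-contained.

```shell
# Recreate the article's testsort.txt in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'google 110 5000\nbaidu 100 5000\nguge 50 3000\nsohu 100 4500\n' > "$tmpdir/testsort.txt"

# The key starts at character 2 of field 1 and, with no End part,
# runs to the end of the line.
from_second_letter=$(LC_ALL=C sort -t ' ' -k 1.2 "$tmpdir/testsort.txt")
printf '%s\n' "$from_second_letter"

rm -rf "$tmpdir"
```

The keys compared are therefore "aidu ...", "ohu ...", "oogle ..." and "uge ...", which puts baidu first and guge last.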
