Linux text processing commands (sort and uniq)

  

sort command

The sort command sorts the lines of a file. It has a number of very useful options that were originally designed for sorting files laid out like database tables. In fact, sort can be thought of as a powerful data management tool for files whose contents resemble database records.


The sort command compares the contents of the file line by line. If the first characters of two lines are the same, it goes on to compare the next character of each line, and continues in this way until it finds a difference.


Syntax:


sort [options] file


Description: The sort command sorts all the lines of the specified file and writes the result to standard output. If no input file is specified, or if "-" is used, the input is read from standard input.


Sorting is based on comparing one or more sort keys extracted from each input line. A sort key defines the smallest sequence of characters used for sorting. By default, the whole line is the sort key and lines are sorted in ASCII character order (more precisely, in the collating order of the current locale, which is ASCII order in the C locale).


The options for changing the default settings are:


-m  Merge the given files, which must already be sorted, without sorting them again.


-c  Check whether the given file is sorted. If it is not in order, print an error message and exit with a status value of 1.


-u  After sorting, keep only one of each set of lines that compare equal.


-o outfile  Write the sorted output to outfile instead of standard output. If outfile is one of the input files, sort first copies the contents of that file to a temporary file, then sorts and writes the output.
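
The four options above can be tried on a few throwaway files. The following is only a sketch: the file names (unsorted, sorted_a, sorted_b) and their contents are made up for illustration, and LC_ALL=C is set as an assumption so that the ordering is predictable.

```shell
export LC_ALL=C                       # assumption: plain byte-order collation
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample files
printf 'pear\napple\npear\n' > unsorted
printf 'apple\ncherry\n'     > sorted_a
printf 'banana\ndate\n'      > sorted_b

# -c: succeeds only if the file is already in order
sort -c sorted_a 2>/dev/null && check_sorted="sorted"   || check_sorted="not sorted"
sort -c unsorted 2>/dev/null && check_unsorted="sorted" || check_unsorted="not sorted"

# -u: keep a single copy of lines that compare equal
unique_lines=$(sort -u unsorted)

# -m: merge two files that are already sorted
merged=$(sort -m sorted_a sorted_b)

# -o: the output file may also be one of the input files
sort -o unsorted unsorted
in_place=$(cat unsorted)

printf '%s\n' "$check_sorted" "$check_unsorted" "$unique_lines" "$merged" "$in_place"

cd / && rm -rf "$tmpdir"
```

Writing back to an input file with -o is safe, whereas "sort unsorted > unsorted" would truncate the file before sort reads it.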


The options for changing the default collation are:


-d  Sort in dictionary order: only letters, digits, spaces, and tabs are significant in comparisons.


-f  Treat lowercase letters as their uppercase equivalents.


-i  Ignore non-printing characters.


-M  Compare as months: "JAN" < "FEB", and so on.

-r  Output the sort results in reverse order.


+pos1 -pos2  Specify one or more fields as the sort key. The key starts at pos1 and ends at pos2 (including pos1, excluding pos2). If pos2 is not specified, the key runs from pos1 to the end of the line. Field and character positions start at 0. This is the traditional notation; POSIX and current GNU sort express the same thing as -k pos1[,pos2], where positions start at 1 and pos2 is inclusive.


-b  Ignore leading whitespace (spaces and tabs) when locating the sort key in each line.


-t separator  Use the character separator as the field separator.
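
As a rough illustration of the collation options, the sketch below feeds a few made-up lines to sort through a pipe. It uses the -k notation for the sort key, since many current systems no longer accept the +pos form, and it assumes LC_ALL=C so that ordering and month names are predictable. The -M option is an extension that some implementations may lack.

```shell
export LC_ALL=C    # assumption: byte-order collation, English month names

# -t and -k: colon-separated records (made-up data), sorted by the second field
by_city=$(printf 'carol:paris\nalice:rome\nbob:berlin\n' | sort -t : -k 2,2)

# -r: reverse the order
reversed=$(printf 'alice\ncarol\nbob\n' | sort -r)

# -f: fold case, so "apple" sorts before "Banana"
folded=$(printf 'cherry\nBanana\napple\n' | sort -f)

# -M: month names sort in calendar order
months=$(printf 'MAR\nJAN\nFEB\n' | sort -M)

printf '%s\n\n' "$by_city" "$reversed" "$folded" "$months"
```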


Here are a few examples to illustrate the use of sort.


Use the sort command to sort the lines of a text file and output the result. Note that the second and third lines of the original file begin with the same word, so the command goes on to compare their second words, vegetables and fruit, starting with the first character of each.


$ cat text


vegetable soup


fresh vegetables


fresh fruit


lowfat milk



$ sort text


fresh fruit


fresh vegetables


lowfat milk


vegetable soup


The user can save the sorted contents to a file or send them to a printer. In the following example, the sorted contents are saved to a file named result.


$ sort text > result


Sort the contents of the file example with the second field as the sort key.


$ sort +1 -2 example
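
On systems whose sort no longer accepts the +pos notation, the same sort can be written with -k. The sketch below assumes a small, made-up example file and sets LC_ALL=C for predictable ordering.

```shell
export LC_ALL=C                               # assumption: byte-order collation
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'x pear\ny apple\nz fig\n' > example   # hypothetical contents

# -k 2,2 is the counterpart of "+1 -2": the key is the second field only
by_second_field=$(sort -k 2,2 example)
printf '%s\n' "$by_second_field"
# y apple
# z fig
# x pear

cd / && rm -rf "$tmpdir"
```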


Reverse-sort the contents of file1 and file2 and place the result in outfile, using the first character of the second field as the sort key.


$ sort -r -o outfile +1.0 -1.1 file1 file2
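
A possible -k version of this command is sketched below with two made-up input files. The -b option is added as an assumption about what is intended: without it, the blank that precedes the second field would count as the first character of the key.

```shell
export LC_ALL=C                        # assumption: byte-order collation
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'a pear\nb apple\n' > file1     # hypothetical contents
printf 'c fig\n'           > file2

# Key: first character of the second field (p, a, f), in reverse order
sort -b -r -k 2.1,2.1 -o outfile file1 file2
result=$(cat outfile)
printf '%s\n' "$result"
# a pear
# c fig
# b apple

cd / && rm -rf "$tmpdir"
```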


The sort command is often combined with other commands in a pipeline to build more complex functions. For example, the following pipes a listing of the files in the current working directory to sort; the sort key runs from the sixth field up to, but not including, the eighth field.


$ ls -l | sort +5 -7


The sort command can also operate on standard input. For example, to merge the lines of several files and sort the result, you can first concatenate the files with the cat command and then pipe the combined lines into sort, which outputs the merged and sorted lines. In the following example, the lines of the files veglist and fruitlist are merged, sorted, and saved to the file clist.


$ cat veglist fruitlist | sort > clist
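
The pipeline can be tried end to end as follows. The contents of veglist and fruitlist are invented for this sketch, and LC_ALL=C is an assumption made for predictable ordering.

```shell
export LC_ALL=C                          # assumption: byte-order collation
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'tomato\ncarrot\n' > veglist      # hypothetical contents
printf 'banana\napple\n'  > fruitlist

# Concatenate both files, sort the combined lines, save the result
cat veglist fruitlist | sort > clist
piped=$(cat clist)

# sort also accepts several input files directly
direct=$(sort veglist fruitlist)

printf '%s\n' "$piped"
# apple
# banana
# carrot
# tomato

cd / && rm -rf "$tmpdir"
```

Both forms give the same result here; the pipeline form is mainly useful when the input comes from another command rather than from files.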



uniq command


After processing, a file's output may contain duplicate lines. For example, if you use the cat command to merge two files and then use the sort command to sort them, duplicate lines may appear. You can use the uniq command to remove these duplicate lines from the output, leaving only a single copy of each record.


Syntax:


uniq [options] file


Description: This command reads the input file and compares adjacent lines. Normally, the second and subsequent copies of a repeated line are deleted; lines are compared according to the collating sequence of the character set in use. The result is written to the output file, which must be different from the input file. If the input file is given as "-", input is read from standard input.
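
Because only adjacent lines are compared, uniq is normally run on sorted input. The following sketch, using a made-up example file, shows the difference.

```shell
export LC_ALL=C                       # assumption: byte-order collation
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'b\na\nb\na\n' > example       # hypothetical contents

# No two equal lines are adjacent, so uniq removes nothing
adjacent_only=$(uniq example)

# After sorting, equal lines are adjacent and the repeats are dropped
sorted_unique=$(sort example | uniq)

printf '%s\n---\n%s\n' "$adjacent_only" "$sorted_unique"

cd / && rm -rf "$tmpdir"
```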


The meaning of each option of this command is as follows:


-c  Precede each output line with the number of times the line occurs in the file. This option can take the place of the -u and -d options.


-d  Show only duplicated lines.


-u  Show only the lines that are not duplicated in the file.


-n  Ignore the first n fields, together with any blanks before each field. A field is a string of non-space, non-tab characters, separated from its neighbors by tabs and spaces (fields are numbered from 0).


+n  Ignore the first n characters. Fields are skipped before characters (characters are numbered from 0).


-f n  Same as -n, where n is the number of fields.


-s n  Same as +n, where n is the number of characters.
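
The options above can be compared side by side on a small sample. This is only a sketch: the data is made up, and because the width of the count column printed by -c differs between implementations, awk is used here to normalize it.

```shell
export LC_ALL=C                       # assumption: byte-order collation
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical, already sorted sample
printf 'apple\napple\nbanana\ncherry\ncherry\ncherry\n' > example

# -c: count occurrences (awk strips the implementation-specific padding)
counts=$(uniq -c example | awk '{print $1, $2}')

# -d: one copy of each line that is repeated
dups=$(uniq -d example)

# -u: only the lines that are never repeated
singles=$(uniq -u example)

# -f 1: ignore the first field when comparing
skip_field=$(printf '1 apple\n2 apple\n3 pear\n' | uniq -f 1)

# -s 1: ignore the first character when comparing
skip_char=$(printf 'xapple\nyapple\nzpear\n' | uniq -s 1)

printf '%s\n\n' "$counts" "$dups" "$singles" "$skip_field" "$skip_char"

cd / && rm -rf "$tmpdir"
```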


For example:


1. Display the lines that are not duplicated in the file example.


uniq -u example


2. Display the lines that are not repeated in the file example, skipping the first field and one further character before comparing.


uniq -u -1 +1 example
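
Where the numeric -n and +n forms are not accepted, the same comparison can be requested with -f and -s. The sketch below uses a made-up example file.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf '1 apple\n2 apple\n3 pear\n' > example   # hypothetical contents

# -f 1 -s 1 is the counterpart of "-1 +1": skip one field, then one character
not_repeated=$(uniq -u -f 1 -s 1 example)
printf '%s\n' "$not_repeated"
# 3 pear

cd / && rm -rf "$tmpdir"
```

The first two lines differ only in the skipped field, so they count as repeats of each other and only the third line is shown.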
