
How to use the uniq command to delete duplicate lines of text in Linux

When working on a Linux system, text files inevitably end up containing duplicate lines. Deleting them by hand becomes tedious once there are many of them. So is there a way to delete duplicate lines quickly? This article shows how to use the uniq command to delete duplicate lines in Linux.

I, what uniq is used for

Repeated lines of text are usually not what we want, so we should get rid of them. Linux has other commands that can remove duplicate lines, but I find uniq a convenient choice. When using uniq, keep the following two points in mind:

1. When working with text, uniq is generally combined with the sort command, because uniq only detects duplicate lines when they are adjacent. Sorting the input first brings identical lines together; sort -u sorts and removes duplicates in one step.

2. When comparing by field, a field is a run of blank characters (usually spaces and tabs) followed by non-blank characters. If fields and characters are both skipped (-f together with -s), the fields are skipped before the characters.
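The first point can be checked with a short sketch. The three sample lines and the variable names are made up for illustration and are not part of the test file used later; only printf, sort and uniq are involved.

```shell
# Hypothetical sample input: "apple" appears twice, but not on adjacent lines.
sample=$(printf 'apple\nbanana\napple\n')

# uniq alone only collapses adjacent duplicates, so all three lines survive.
unsorted=$(printf '%s\n' "$sample" | uniq)

# Sorting first makes the duplicates adjacent, so uniq can collapse them.
sorted_then_uniq=$(printf '%s\n' "$sample" | sort | uniq)

# sort -u sorts and removes duplicates in a single step.
sort_u=$(printf '%s\n' "$sample" | sort -u)

printf '%s\n' "--- uniq only ---" "$unsorted"
printf '%s\n' "--- sort | uniq ---" "$sorted_then_uniq"
printf '%s\n' "--- sort -u ---" "$sort_u"
```

The price of sorting is that the original line order is lost, so this approach suits cases where the order of the lines does not matter.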

II, uniq options

The code is as follows:

[zhangy@BlackGhost ~]$ uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
With no options, matching lines are merged to the first occurrence.
Mandatory arguments to long options are mandatory for short options too.

-c, --count //prefix each line with the number of times it occurs
-d, --repeated //print only duplicated lines, one line per group
-D, --all-repeated //print only duplicated lines, and print every copy of each
-f, --skip-fields=N //skip the first N fields when comparing; -f 1 ignores the first field
-i, --ignore-case //ignore differences in case when comparing
-s, --skip-chars=N //like -f, but skips characters instead of fields; -s 5 ignores the first 5 characters
-u, --unique //print only lines that are not repeated; unlike MySQL's DISTINCT, repeated lines are dropped entirely rather than reduced to one copy
-z, --zero-terminated //end lines with a 0 byte (NUL), not a newline
-w, --check-chars=N //compare no more than the first N characters of each line
--help //show this help and exit
--version //show the version information and exit

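I have not needed -z myself; it is meant for input whose lines end in a NUL byte instead of a newline, such as the output of find -print0.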
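To make a few of these options concrete, here is a minimal sketch on a made-up five-line input. The sample data and variable names are hypothetical, and the input is already grouped so that duplicates are adjacent.

```shell
# Hypothetical sample input: two identical lines, two lines that differ
# only in case, and one line whose second field differs.
sample=$(printf 'a x\na x\nb x\nB x\nc y\n')

# -d: print one copy of each line that is repeated.
only_repeated=$(printf '%s\n' "$sample" | uniq -d)

# -u: print only the lines that are not repeated.
only_unique=$(printf '%s\n' "$sample" | uniq -u)

# -i: ignore case, so "b x" and "B x" count as the same line.
ignore_case=$(printf '%s\n' "$sample" | uniq -i)

# -f 1: skip the first field, so only the text after it is compared.
skip_field=$(printf '%s\n' "$sample" | uniq -f 1)

printf '%s\n' "--- uniq -d ---" "$only_repeated"
printf '%s\n' "--- uniq -u ---" "$only_unique"
printf '%s\n' "--- uniq -i ---" "$ignore_case"
printf '%s\n' "--- uniq -f 1 ---" "$skip_field"
```

In every case uniq keeps the first line of a matching group, which is why the -i run shows "b x" rather than "B x".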

III, the test file uniqtest

The code is as follows:

this is a test
this is a test
this is a test
i am tank
i love tank
i love tank
this is a test
whom have a try
WhoM have a try
you have a try
i want to abroad
those are good men
we are good men
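One way to create such a file is with printf, sketched below. The repeat counts follow what the uniq -c output in section IV implies. The temporary directory and the variable names are my own assumptions, chosen so that an existing uniqtest in the working directory is not overwritten.

```shell
# Work in a throwaway directory (an assumption, not part of the original
# article) so that no existing file is overwritten.
workdir=$(mktemp -d)

# Write one line per argument; there are no blank lines between them.
printf '%s\n' \
  'this is a test' \
  'this is a test' \
  'this is a test' \
  'i am tank' \
  'i love tank' \
  'i love tank' \
  'this is a test' \
  'whom have a try' \
  'WhoM have a try' \
  'you have a try' \
  'i want to abroad' \
  'those are good men' \
  'we are good men' > "$workdir/uniqtest"

# Count the lines to confirm the file was written as expected.
line_count=$(wc -l < "$workdir/uniqtest")
echo "uniqtest has $line_count lines"
```

The lines must be written without blank lines in between, otherwise no two identical lines would be adjacent and uniq would have nothing to merge.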

IV, examples

The code is as follows:

[zhangy@BlackGhost mytest]$ uniq -c uniqtest
      3 this is a test
      1 i am tank
      2 i love tank
      1 this is a test //same text as the first line, but counted separately because it is not adjacent
      1 whom have a try
      1 WhoM have a try
      1 you have a try
      1 i want to abroad
      1 those are good men
      1 we are good men

From the example above we can see a key feature of uniq: when checking for duplicate lines, it only compares adjacent lines. In real data, many of the duplicates will not be adjacent.

The code is as follows:

[zhangy@BlackGhost mytest]$ sort uniqtest