The uniq command reports or removes repeated lines in a text file. Because it only compares adjacent lines, it is usually combined with the sort command.
Command syntax:
uniq [-c/d/D/u/i] [-f Fields] [-s N] [-w N] [InFile] [OutFile]
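Options:
-c: prefix each line with the number of times it occurred.
-d: print only duplicated lines, one line per group.
-D: print all duplicated lines, as many lines as there are copies.
-u: print only lines that occur once.
-i: ignore differences in case when comparing.
-f Fields: skip the first Fields fields when comparing.
-s N: skip the first N characters when comparing.
-w N: compare no more than the first N characters of each line.
[InFile]: the input text file, normally already sorted. If omitted, data is read from standard input.
[OutFile]: the output file. If omitted, the result is written to standard output (the terminal).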
Examples
# uniq.txt
My name is Delav
My name is Delav
My name is Delav
I'm learning Java
I'm learning Java
I'm learning Java
who am i
Who am i
Python is so simple
My name is Delav
That's good
That's good
And studying Golang
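To follow along, the sample file can be recreated with a here-document. This is a minimal sketch that assumes a POSIX shell and writes uniq.txt into the current directory; the `line_count` variable is only a sanity check and is not part of the original examples.

```shell
# Recreate the sample input, one sentence per line.
cat > uniq.txt <<'EOF'
My name is Delav
My name is Delav
My name is Delav
I'm learning Java
I'm learning Java
I'm learning Java
who am i
Who am i
Python is so simple
My name is Delav
That's good
That's good
And studying Golang
EOF

# Sanity check: count the lines that were written (13 for this sample).
line_count=$(wc -l < uniq.txt)
echo "$line_count"
```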
1. Remove duplicates directly
uniq uniq.txt
Result:
My name is Delav
I'm learning Java
who am i
Who am i
Python is so simple
My name is Delav
That's good
And studying Golang
2. Show the number of occurrences
uniq -c uniq.txt
Result:
3 My name is Delav
3 I'm learning Java
1 who am i
1 Who am i
1 Python is so simple
1 My name is Delav
2 That's good
1 And studying Golang
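Notice that "My name is Delav" appears twice in the output above. uniq only collapses duplicate lines that are adjacent; duplicates that are not next to each other are treated as separate groups. This is why it is so often used together with the sort command.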
sort uniq.txt | uniq -c
Result:
1 And studying Golang
3 I'm learning Java
4 My name is Delav
1 Python is so simple
2 That's good
1 who am i
1 Who am i
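The adjacency rule is easy to check on a tiny input. The sketch below uses a made-up three-line input (not from the sample file) in which the two "a" lines are separated by a "b" line, and compares uniq alone with sort piped into uniq.

```shell
# Hypothetical input: the two "a" lines are not adjacent.
input='a
b
a'

# uniq alone sees no adjacent duplicates, so nothing is removed.
without_sort=$(printf '%s\n' "$input" | uniq)

# Sorting first brings the duplicates together, so uniq can collapse them.
with_sort=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq alone:\n%s\n' "$without_sort"
printf 'sort | uniq:\n%s\n' "$with_sort"
```

With this input, uniq alone prints all three lines, while the sorted pipeline prints only "a" and "b".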
3. Show only duplicated lines, with their counts
uniq -cd uniq.txt
Result:
3 My name is Delav
3 I'm learning Java
2 That's good
To show every copy of each duplicated line, use -D. It cannot be combined with -c.
uniq -D uniq.txt
Result:
My name is Delav
My name is Delav
My name is Delav
I'm learning Java
I'm learning Java
I'm learning Java
That's good
That's good
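The -u option is listed among the options but has no example here. As a small sketch on a made-up five-line input (not the sample file), the following contrasts -u with -d; the variable names are illustrative only.

```shell
# Hypothetical input: "x" and "z" are repeated, "y" occurs once.
input='x
x
y
z
z'

# -u keeps only the lines that are not repeated.
only_unique=$(printf '%s\n' "$input" | uniq -u)

# -d keeps one copy of each line that is repeated.
only_dupes=$(printf '%s\n' "$input" | uniq -d)

printf 'uniq -u:\n%s\n' "$only_unique"
printf 'uniq -d:\n%s\n' "$only_dupes"
```

Here -u prints only "y", and -d prints "x" and "z".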
4. Skip leading fields
Here -f 1 skips the first field (fields are separated by blanks), so "who am i" and "Who am i" are treated as duplicates.
uniq -c -f 1 uniq.txt
Result:
3 My name is Delav
3 I'm learning Java
2 who am i
1 Python is so simple
1 My name is Delav
2 That's good
1 And studying Golang
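A common reason to skip a field is a leading timestamp or ID that differs on every line. The sketch below assumes a made-up log format (time, then message) and collapses consecutive lines whose message is the same.

```shell
# Hypothetical log lines: the first field is a timestamp.
log='10:01 disk full
10:02 disk full
10:03 disk ok'

# -f 1 ignores the timestamp, so the first two lines compare equal
# and only the first line of that group is kept.
collapsed=$(printf '%s\n' "$log" | uniq -f 1)

printf '%s\n' "$collapsed"
```

The output keeps "10:01 disk full" and "10:03 disk ok"; uniq prints the first line of each group it collapses.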
5. Ignore case
Here -i ignores case, so "who am i" and "Who am i" are treated as duplicates.
uniq -c -i uniq.txt
Result:
3 My name is Delav
3 I'm learning Java
2 who am i
1 Python is so simple
1 My name is Delav
2 That's good
1 And studying Golang
6. Skip the first N characters
Here -s 4 skips the first four characters of each line, so "who am i" and "Who am i" are treated as duplicates.
uniq -c -s 4 uniq.txt
Result:
3 My name is Delav
3 I'm learning Java
2 who am i
1 Python is so simple
1 My name is Delav
2 That's good
1 And studying Golang
7. Compare only the first N characters
uniq -c -w 2 uniq.txt
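As a small illustration of -w on its own, the sketch below uses a made-up three-line input rather than the sample file. It assumes GNU uniq, since -w is an extension and is not part of the POSIX option set.

```shell
# Hypothetical input: the first two lines share the 5-character prefix "apple".
input='apple pie
apple tart
banana split'

# -w 5 compares only the first five characters, so the two "apple"
# lines count as duplicates and the first one is kept.
by_prefix=$(printf '%s\n' "$input" | uniq -w 5)

printf '%s\n' "$by_prefix"
```

This prints "apple pie" and "banana split". Everything after the fifth character is ignored for comparison but still printed.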