Taking Off the Shirt

String Matching

Let's look at a few more string-matching examples:

    $ awk '$6 ~ /FIN/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
    1 Local-Address Foreign-Address State
    6 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
    9 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
    13 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
    18 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
    $ awk '$6 ~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
    1 Local-Address Foreign-Address State
    5 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
    6 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
    9 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
    11 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
    13 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
    15 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
    18 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2

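The first example matches states containing FIN, and the second matches states containing WAIT. Here ~ is the regular-expression match operator, and the text between the two slashes is the pattern, so $6 ~ /FIN/ is true when the sixth field matches the regular expression FIN.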
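To make the behaviour of the ~ operator concrete, here is a minimal sketch that runs it on a tiny inline table. The two-column sample data and the variable names are made up for illustration and are not the netstat.txt used in this article.

```shell
# Hypothetical two-column sample: protocol and connection state.
sample='Proto State
tcp FIN_WAIT2
tcp LISTEN
tcp TIME_WAIT'

# $2 ~ /WAIT/ is true when field 2 contains a match for the regex WAIT.
# The header line is skipped here because "State" does not match.
field_match=$(printf '%s\n' "$sample" | awk '$2 ~ /WAIT/ {print NR, $2}')
echo "$field_match"
# prints:
# 2 FIN_WAIT2
# 4 TIME_WAIT
```

Because the pattern is a regular expression, anchors work as well: $2 ~ /^FIN/ would select only states that start with FIN.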

awk can also match against the whole line, just like grep:

    $ awk '/LISTEN/' netstat.txt
    tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
    tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
    tcp 0 0 :::22 :::* LISTEN
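A bare /regex/ pattern is shorthand for testing the whole record, $0, and a pattern with no action prints the matching line. The sketch below checks that equivalence on a small made-up sample; the data and variable names are assumptions for illustration only.

```shell
# Hypothetical sample lines, loosely shaped like netstat output.
sample='tcp 0.0.0.0:80 LISTEN
tcp coolshell.cn:80 ESTABLISHED
tcp :::22 LISTEN'

# Bare pattern: matched against the whole line, default action is print.
bare=$(printf '%s\n' "$sample" | awk '/LISTEN/')

# The same thing spelled out explicitly.
explicit=$(printf '%s\n' "$sample" | awk '$0 ~ /LISTEN/ {print $0}')

[ "$bare" = "$explicit" ] && echo "same output"
# prints: same output
```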

We can use "/FIN|TIME/" to match either FIN or TIME:

    $ awk '$6 ~ /FIN|TIME/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
    1 Local-Address Foreign-Address State
    5 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
    6 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
    9 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
    11 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
    13 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
    15 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
    18 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2

Now an example of negating a pattern:

    $ awk '$6 !~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
    1 Local-Address Foreign-Address State
    2 0.0.0.0:3306 0.0.0.0:* LISTEN
    3 0.0.0.0:80 0.0.0.0:* LISTEN
    4 127.0.0.1:9000 0.0.0.0:* LISTEN
    7 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
    8 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
    10 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
    12 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
    14 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
    16 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
    17 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
    19 :::22 :::* LISTEN

Or, negating a match against the whole line:

    $ awk '!/WAIT/' netstat.txt
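The two negated forms are not always interchangeable: $6 !~ /WAIT/ looks only at one field, while !/WAIT/ rejects a line if WAIT appears anywhere in it. A small sketch of the difference, using invented sample data in which the first field happens to contain WAIT:

```shell
# Hypothetical sample: host name and state. "WAITROOM" is a made-up host.
sample='hostA LISTEN
WAITROOM ESTABLISHED
hostB TIME_WAIT'

# Field test: only the state column is checked.
by_field=$(printf '%s\n' "$sample" | awk '$2 !~ /WAIT/ {print $1}')

# Record test: any occurrence of WAIT on the line excludes it.
by_record=$(printf '%s\n' "$sample" | awk '!/WAIT/ {print $1}')

echo "$by_field"    # prints hostA and WAITROOM, one per line
echo "$by_record"   # prints hostA
```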

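Splitting Files

Splitting a file with awk is simple: just use redirection. The following example splits the file by column 6, one output file per state (NR!=1 skips the header line).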

    $ awk 'NR!=1{print > $6}' netstat.txt
    $ ls
    ESTABLISHED FIN_WAIT1 FIN_WAIT2 LAST_ACK LISTEN netstat.txt TIME_WAIT
    $ cat ESTABLISHED
    tcp 0 0 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
    tcp 0 0 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
    tcp 0 0 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
    tcp 0 4166 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
    tcp 0 0 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
    tcp 0 0 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
    $ cat FIN_WAIT1
    tcp 0 1 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
    $ cat FIN_WAIT2
    tcp 0 0 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
    tcp 0 0 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
    tcp 0 0 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
    $ cat LAST_ACK
    tcp 0 1 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
    $ cat LISTEN
    tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
    tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
    tcp 0 0 :::22 :::* LISTEN
    $ cat TIME_WAIT
    tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
    tcp 0 0 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
    tcp 0 0 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT

You can also write only selected columns to the files:

    $ awk 'NR!=1{print $4,$5 > $6}' netstat.txt
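If the output file name is built by concatenation (for example, adding a directory or an extension), it is safer to parenthesize the expression after >, since an unparenthesized concatenation there is ambiguous across awk implementations. The following sketch splits a made-up file into a temporary directory; the file name conns.txt, the directory variable, and the .txt suffix are all assumptions for illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
printf '%s\n' 'Proto State' 'tcp LISTEN' 'tcp ESTABLISHED' 'tcp LISTEN' \
    > "$workdir/conns.txt"

# Inside one awk run, ">" truncates a file only when it is first opened;
# later prints to the same name are appended.
awk -v dir="$workdir" 'NR != 1 { print > (dir "/" $2 ".txt") }' \
    "$workdir/conns.txt"

listen_rows=$(cat "$workdir/LISTEN.txt")
established_rows=$(cat "$workdir/ESTABLISHED.txt")
echo "$listen_rows"
# prints:
# tcp LISTEN
# tcp LISTEN
```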

A slightly more complex example (note the if / else if statements, which show that awk is really a script interpreter):

    $ awk 'NR!=1{if($6 ~ /TIME|ESTABLISHED/) print > "1.txt";
    else if($6 ~ /LISTEN/) print > "2.txt";
    else print > "3.txt" }' netstat.txt
    $ ls ?.txt
    1.txt 2.txt 3.txt
    $ cat 1.txt
    tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
    tcp 0 0 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
    tcp 0 0 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
    tcp 0 0 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
    tcp 0 0 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
    tcp 0 4166 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
    tcp 0 0 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
    tcp 0 0 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
    tcp 0 0 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
    $ cat 2.txt
    tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
    tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
    tcp 0 0 :::22 :::* LISTEN
    $ cat 3.txt
    tcp 0 0 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
    tcp 0 0 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
    tcp 0 1 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
    tcp 0 1 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
    tcp 0 0 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
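One possible variation on the if / else if chain above is to pick the output file name first and keep a single print statement. This is only a sketch on invented two-column data; the variable out, the scratch directory, and the file conns.txt are assumptions, not part of the original example.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Proto State' 'tcp TIME_WAIT' 'tcp LISTEN' \
    'tcp FIN_WAIT1' 'tcp ESTABLISHED' > "$workdir/conns.txt"

# Choose the target file with if / else if, then print once.
awk -v dir="$workdir" 'NR != 1 {
    if ($2 ~ /TIME|ESTABLISHED/) out = "1.txt"
    else if ($2 ~ /LISTEN/)      out = "2.txt"
    else                         out = "3.txt"
    print > (dir "/" out)
}' "$workdir/conns.txt"

first=$(cat "$workdir/1.txt")
second=$(cat "$workdir/2.txt")
third=$(cat "$workdir/3.txt")
echo "$first"
# prints:
# tcp TIME_WAIT
# tcp ESTABLISHED
```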

Statistics

The following command computes the total size of all C, CPP, and H files.

    $ ls -l *.cpp *.c *.h | awk '{sum+=$5} END {print sum}'
    2511401
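The summing idiom can be tried without any source files by feeding awk a listing by hand. The sketch below uses three invented lines in ls -l layout (size in column 5); the file names and sizes are made up. Printing sum + 0 is a small defensive choice so that an empty input yields 0 instead of an empty line.

```shell
# Hypothetical "ls -l" style listing; column 5 is the size in bytes.
listing='-rw-r--r-- 1 user group 120 Jan  1 00:00 a.c
-rw-r--r-- 1 user group 300 Jan  1 00:00 b.cpp
-rw-r--r-- 1 user group 80 Jan  1 00:00 c.h'

# Accumulate column 5 on every line, print the total once at the end.
total=$(printf '%s\n' "$listing" | awk '{ sum += $5 } END { print sum + 0 }')
echo "$total"
# prints: 500
```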

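Next, a way to count connections by state. (You can already see hints of real programming here; since we are all programmers, I won't explain further. Note how the array is used.)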

    $ awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt
    TIME_WAIT, 3
    FIN_WAIT1, 1
    ESTABLISHED, 6
    FIN_WAIT2, 3
    LAST_ACK, 1
    LISTEN, 4
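awk arrays are associative (keyed by string), and the order in which for (i in a) visits the keys is not specified, which is why the output above is not alphabetical. If a stable order matters, one simple option is to pipe the result through sort. A minimal sketch on made-up one-column data:

```shell
# Hypothetical one-column sample: a header followed by connection states.
states='State
LISTEN
ESTABLISHED
LISTEN
TIME_WAIT
ESTABLISHED
LISTEN'

# count[$1]++ creates the key on first use and increments it afterwards.
# LC_ALL=C sort makes the output order independent of awk and of the locale.
counts=$(printf '%s\n' "$states" |
    awk 'NR != 1 { count[$1]++ } END { for (s in count) print s ", " count[s] }' |
    LC_ALL=C sort)
echo "$counts"
# prints:
# ESTABLISHED, 2
# LISTEN, 3
# TIME_WAIT, 1
```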

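Now let's compute how much memory each user's processes take up (note: this sums the RSS column, which is column 6 of the ps aux output):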

    $ ps aux | awk 'NR!=1{a[$1]+=$6;} END { for(i in a) print i ", " a[i]"KB";}'
    dbus, 540KB
    mysql, 99928KB
    www, 3264924KB
    root, 63644KB
    hchen, 6020KB
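Because real ps output changes from run to run, the same aggregation is easier to check against a fixed sample. The sketch below uses three invented process lines in ps aux layout and additionally sorts the users by memory, largest first; the sample values and the sort step are assumptions added for illustration.

```shell
# Hypothetical "ps aux" style sample; column 1 is USER, column 6 is RSS in KB.
ps_sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 1000 2048 ? Ss 10:00 0:01 init
www 20 0.0 0.5 9000 4096 ? S 10:00 0:02 httpd
www 21 0.0 0.5 9000 1024 ? S 10:00 0:02 httpd'

# Sum RSS per user, then sort on the number after the comma, descending.
usage=$(printf '%s\n' "$ps_sample" |
    awk 'NR != 1 { rss[$1] += $6 } END { for (u in rss) print u ", " rss[u] "KB" }' |
    sort -t, -k2,2nr)
echo "$usage"
# prints:
# www, 5120KB
# root, 2048KB
```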