Analyzing Nginx Logs with Shell Scripts

铁匠 · Nginx · 2017-02-15 13:00:24

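This section shows how to analyze Nginx load balancer logs with a Shell script, which quickly yields the top-ranked sites, client IPs, and so on. The scripts here are the ones I run in production, and I recommend them. They cover two cases. In the first case, Nginx is the front-most load balancer in an Nginx+Keepalived cluster, and the script is as follows: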

[root@tiejiang ~]# vim log-nginx.sh
#!/bin/bash
# Usage: sh log-nginx.sh <nginx-access-log>
if [ $# -eq 0 ]; then
    echo "Error: please specify logfile."
    exit 1
else
    LOG=$1
fi
if [ ! -f "$1" ]; then
    echo "Sorry, sir, I can't find this nginx log file, pls try again!"
    exit 1
fi
####################################################
# Top 10 client IPs (field 1 of each log line)
echo "Most of the ip:"
echo "-------------------------------------------"
awk '{ print $1 }' "$LOG" | sort | uniq -c | sort -nr | head -10
echo
echo
####################################################
# Top 10 busiest minutes: characters 14-18 of field 4 are HH:MM
echo "Most of the time:"
echo "--------------------------------------------"
awk '{ print $4 }' "$LOG" | cut -c 14-18 | sort | uniq -c | sort -nr | head -10
echo
echo
####################################################
# Top 10 values of field 11 (the referer in the combined format).
# The sed pattern is specific to the author's .cn site; adjust it for your domain.
echo "Most of the page:"
echo "--------------------------------------------"
awk '{ print $11 }' "$LOG" | sed 's/^.*\(.cn*\)\"/\1/g' | sort | uniq -c | sort -rn | head -10
echo
echo
####################################################
# For each of the 10 busiest minutes, list the top 10 client IPs
echo "Most of the time / Most of the ip:"
echo "--------------------------------------------"
awk '{ print $4 }' "$LOG" | cut -c 14-18 | sort -n | uniq -c | sort -nr | head -10 > timelog
for i in $(awk '{ print $2 }' timelog)
do
    num=$(grep "$i" timelog | awk '{ print $1 }')
    echo "$i $num"
    ip=$(grep "$i" "$LOG" | awk '{ print $1 }' | sort -n | uniq -c | sort -nr | head -10)
    echo "$ip"
    echo
done
rm -f timelog
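
To make the field positions concrete, here is a minimal sketch that runs the same `awk` and `cut` steps on a single log line. The sample line is hypothetical and assumes the default Nginx "combined" log format; with a custom `log_format`, the field numbers will differ.

```shell
#!/bin/bash
# Hypothetical sample line, assuming the default "combined" log format.
line='117.34.91.54 - - [31/Mar/2011:15:31:02 +0800] "GET /index.html HTTP/1.1" 200 512 "http://www.example.cn/" "Mozilla/5.0"'

# Field 1 is the client IP.
ip=$(echo "$line" | awk '{ print $1 }')

# Field 4 is "[31/Mar/2011:15:31:02"; characters 14-18 are the HH:MM part.
minute=$(echo "$line" | awk '{ print $4 }' | cut -c 14-18)

# Field 11 is the referer, still wrapped in double quotes.
referer=$(echo "$line" | awk '{ print $11 }')

echo "ip=$ip minute=$minute referer=$referer"
# prints: ip=117.34.91.54 minute=15:31 referer="http://www.example.cn/"
```

Because `cut -c 14-18` counts characters, it only works while the day is two digits and the year four, which holds for the standard timestamp.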

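In the second case, Nginx serves as the web tier behind LVS. Requests coming from the LVS servers themselves must then be excluded, for example those from the LVS public IP addresses (such as 203.93.236.141 and 203.93.236.145). The script from the first case needs only a small adjustment: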

#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Error: please specify logfile."
    exit 1
fi
if [ ! -f "$1" ]; then
    echo "Sorry, sir, I can't find this nginx log file, pls try again!"
    exit 1
fi
# Drop requests whose client IP is one of the LVS servers.
# The filtered copy is written to ./LOG and analyzed below.
grep -Ev '^203\.93\.236\.(141|145) ' "$1" > LOG
####################################################
# Top 10 client IPs (field 1 of each log line)
echo "Most of the ip:"
echo "-------------------------------------------"
awk '{ print $1 }' LOG | sort | uniq -c | sort -nr | head -10
echo
echo
####################################################
# Top 10 busiest minutes: characters 14-18 of field 4 are HH:MM
echo "Most of the time:"
echo "--------------------------------------------"
awk '{ print $4 }' LOG | cut -c 14-18 | sort | uniq -c | sort -nr | head -10
echo
echo
####################################################
# Top 10 values of field 11; the sed pattern is specific to the author's .cn site
echo "Most of the page:"
echo "--------------------------------------------"
awk '{ print $11 }' LOG | sed 's/^.*\(.cn*\)\"/\1/g' | sort | uniq -c | sort -rn | head -10
echo
echo
####################################################
# For each of the 10 busiest minutes, list the top 10 client IPs
echo "Most of the time / Most of the ip:"
echo "--------------------------------------------"
awk '{ print $4 }' LOG | cut -c 14-18 | sort -n | uniq -c | sort -nr | head -10 > timelog
for i in $(awk '{ print $2 }' timelog)
do
    num=$(grep "$i" timelog | awk '{ print $1 }')
    echo "$i $num"
    ip=$(grep "$i" LOG | awk '{ print $1 }' | sort -n | uniq -c | sort -nr | head -10)
    echo "$ip"
    echo
done
rm -f timelog LOG
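
One detail worth checking when excluding the LVS addresses is how tightly the pattern is anchored. A loose alternation of the form `'203.93.236.141|145'` also discards any line that merely contains "145" somewhere, such as in a URL or a byte count. The sketch below compares the two on a few hypothetical sample lines; it assumes the client IP is the first field and is followed by a space.

```shell
#!/bin/bash
# Hypothetical sample: two LVS lines and two real visitors.
sample=$(mktemp)
cat > "$sample" <<'EOF'
203.93.236.141 - - [31/Mar/2011:15:31:02 +0800] "GET / HTTP/1.1" 200 512 "-" "-"
203.93.236.145 - - [31/Mar/2011:15:31:03 +0800] "GET / HTTP/1.1" 200 512 "-" "-"
117.34.91.54 - - [31/Mar/2011:15:31:04 +0800] "GET /item/145 HTTP/1.1" 200 512 "-" "-"
119.97.226.226 - - [31/Mar/2011:15:31:05 +0800] "GET / HTTP/1.1" 200 512 "-" "-"
EOF

# Loose pattern: also drops the visitor whose URL happens to contain "145".
loose=$(grep -Evc '203.93.236.141|145' "$sample")

# Anchored pattern: drops only lines whose first field is an LVS address.
strict=$(grep -Evc '^203\.93\.236\.(141|145) ' "$sample")

echo "loose=$loose strict=$strict"
# prints: loose=1 strict=2
rm -f "$sample"
```

Here `-c` combined with `-v` counts the lines that survive the filter, so the anchored pattern keeps both real visitors while the loose one keeps only one.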

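We can run the analysis script against a log file named www_tomcat_20110331.log:

[root@localhost 03]# sh counter_nginx.sh www_tomcat_20110331.log

Like me, you probably care most about the first two results: which IPs hit the site most often, and at which times the traffic peaks. They look like this:

Most of the ip: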
-------------------------------------------  
   5440 117.34.91.54
      9 119.97.226.226
      4 210.164.156.66
      4 173.19.0.240
      4 109.230.251.35
      2 96.247.52.15
      2 85.91.140.124
      2 74.168.71.253
      2 71.98.41.114
      2 70.61.253.194

Most of the time:
--------------------------------------------
     12 15:31
     11 09:45
     10 23:55
     10 21:45
     10 21:37
     10 20:29
     10 19:54
     10 19:44
     10 19:32
     10 19:13

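If your log-analysis needs are modest, you can analyze Linux logs directly with Awk and Sed (or with Perl, if you are fluent in it). For detailed analysis there is Awstats, which suits web servers and mail servers particularly well. And if you have special requirements, you can set up a dedicated log server to collect the logs from your Linux servers. In short: it all depends on your needs.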
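
As an illustration of the plain-Awk route, the sketch below counts requests per client IP and per HTTP status code in a single pass each, using Awk associative arrays. The function names `top_ips` and `status_counts` are made up for this example, and the field numbers assume the default "combined" log format.

```shell
#!/bin/bash
# Count requests per client IP in one awk pass, then print the top N.
# $1: log file, $2: number of rows to show (default 10).
top_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2 | head -n "${2:-10}"
}

# Count responses per HTTP status code (field 9 in the combined format).
status_counts() {
    awk '{ codes[$9]++ } END { for (c in codes) print c, codes[c] }' "$1" | sort
}
```

A call such as `top_ips access.log 5` would list the five busiest client IPs, and `status_counts access.log` gives a quick view of how many 200, 404, or 5xx responses the server returned (`access.log` stands in for your own log path).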
