Creating a requests-per-time graph from an nginx or Apache access log

2012-10-20 Bertas Linux

Writing the script that calculates the values was the easy part; creating the graph was a bit trickier. But let's start from the beginning…

The log file this script analyzes (the standard "combined" format used by both nginx and Apache) looks like this:

xx.xxx.xxx.xxx - - [20/Oct/2012:06:25:22 +0300] "GET ... HTTP/1.1" 200 80638 "..." "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0"
xx.xx.xxx.xxx - - [20/Oct/2012:06:25:24 +0300] "GET ... HTTP/1.1" 200 80638 "..." "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C)"
xxx.xx.x.xx - - [20/Oct/2012:06:25:25 +0300] "GET ... HTTP/1.1" 200 81302 "..." "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E; InfoPath.1)"
xx.xx.xx.xx - - [20/Oct/2012:06:25:25 +0300] "GET ... HTTP/1.1" 200 102001 "..." "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0"

To count requests per time interval, we first need the intervals themselves. We take the first timestamp in the log file, convert it to seconds since 1970-01-01 00:00:00 UTC (Unix time), and add 300 seconds (5 minutes) to get the end of the first interval.

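# [20/Oct/2012:06:25:22  ->  20-Oct-2012 06:25:22 (a format GNU date understands)
FDATE=$(head -n 1 "$FILE" | awk '{print $4}' | sed -e 's/\[//' -e 's/\//-/g' -e 's/:/ /')
FDATE_S=$(date -d "$FDATE" '+%s')
FDATE_T=$((FDATE_S + 300))   # end of the first 5-minute interval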
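The sed chain drops field 5 of the log line (the "+0300]" offset), so date interprets the timestamp in the server's local time zone. A minimal sketch of a helper that keeps the offset, assuming GNU date and the combined log format; the name log_ts_to_epoch is hypothetical, and it uses bash parameter expansion instead of awk and sed:

```shell
#!/bin/bash
# Hypothetical helper: convert the timestamp of one combined-format log line
# to Unix epoch seconds, keeping the time zone offset from field 5.
# Assumes GNU date, which parses strings like "20-Oct-2012 06:25:22 +0300".
log_ts_to_epoch() {
    local ip ident user ts tz rest
    # Fields 4 and 5 look like "[20/Oct/2012:06:25:22" and "+0300]"
    read -r ip ident user ts tz rest <<<"$1"
    ts=${ts#\[}        # drop the leading "["
    tz=${tz%\]}        # drop the trailing "]"
    ts=${ts//\//-}     # 20/Oct/2012:06:25:22 -> 20-Oct-2012:06:25:22
    ts=${ts/:/ }       # first ":" -> " "    -> 20-Oct-2012 06:25:22
    date -d "$ts $tz" '+%s'
}
```

For example, FDATE_S=$(log_ts_to_epoch "$(head -n 1 "$FILE")") gives the same kind of value as the pipeline above, but independent of the machine's time zone setting.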

Next, we read the log line by line and check whether each line falls inside the current time interval. If it does, we add 1 to the counter. If it does not, we write out the count for the current interval and move on to the next interval.

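COUNT=0
while read -r line
do
    LDATE=$(echo "$line" | awk '{print $4}' | sed -e 's/\[//' -e 's/\//-/g' -e 's/:/ /')
    LDATE_S=$(date -d "$LDATE" '+%s')
    # Close every interval that ended before this request (empty ones get 0)
    while (( LDATE_S >= FDATE_T )); do
        echo "$(date -d @"$((FDATE_T - 300))" '+%Y-%m-%d %H:%M:%S') $COUNT" >>"$DATAFILE"
        FDATE_T=$((FDATE_T + 300))
        COUNT=0
    done
    COUNT=$((COUNT + 1))
done <"$FILE"
# Write the last interval, which is still open when the log ends
echo "$(date -d @"$((FDATE_T - 300))" '+%Y-%m-%d %H:%M:%S') $COUNT" >>"$DATAFILE"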
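The interval bookkeeping can be tried out on its own, without date or a real log. This sketch (the function name bucket_counts and its step argument are hypothetical) reads ascending epoch timestamps, one per line, and prints each interval start with its request count. Intervals begin at the first timestamp, as in the script; an interval with no requests is printed with 0, and the last interval is written when input ends:

```shell
#!/bin/bash
# Hypothetical helper: group ascending epoch timestamps (one per line on
# stdin) into fixed-size intervals that start at the first timestamp.
# Prints "<interval start> <count>" per interval, including empty ones.
bucket_counts() {
    local step=${1:-300} start="" count=0 ts
    while read -r ts; do
        if [[ -z $start ]]; then
            start=$ts            # first interval begins at the first request
        fi
        # Close every interval that ends at or before this timestamp
        while (( ts >= start + step )); do
            echo "$start $count"
            start=$((start + step))
            count=0
        done
        count=$((count + 1))
    done
    # Flush the final, still-open interval (nothing if input was empty)
    if [[ -n $start ]]; then
        echo "$start $count"
    fi
}
```

For instance, printf '%s\n' 0 10 299 300 950 | bucket_counts 300 prints "0 3", "300 1", "600 0" and "900 1", one per line.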

When the script has finished reading the log, we have a data file that looks like this:

2012-10-19 10:50:27 14693
2012-10-19 10:55:27 12019
2012-10-19 11:00:27 11409
2012-10-19 11:05:27 12984
2012-10-19 11:10:27 12087
2012-10-19 11:15:27 11161
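Every log line should land in exactly one interval, so the counts in the third column should add up to the number of lines in the log, provided the last interval is written out too. A minimal sketch of that sanity check; check_totals is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical check: the per-interval counts (column 3 of the data file)
# should sum to the number of lines in the access log.
check_totals() {
    local log=$1 data=$2 lines total
    lines=$(wc -l <"$log")
    total=$(awk '{ s += $3 } END { print s + 0 }' "$data")
    if (( lines == total )); then
        echo "OK: $total requests"
    else
        echo "MISMATCH: log has $lines lines, data file sums to $total" >&2
        return 1
    fi
}
```

Usage: check_totals access.log "$DATAFILE", run before the data file is deleted.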

Now we need to create a histogram from this data.
The hard part is working out how to write the plot command for gnuplot.

plot "$DATAFILE" using 1:3

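The data file has three whitespace-separated columns: date, time and number of requests, yet we tell gnuplot to use the first and the third. The reason is the time format (set timefmt "%Y-%m-%d %H:%M:%S" in the complete script): because it contains a space, gnuplot reads the time value that starts in column 1 across both the date and the time fields. The columns are still numbered by whitespace, so the request count stays in column 3 (this was hard to find and understand).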
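To check the columns quickly on a server without an image viewer, the same plot can be drawn as text with gnuplot's dumb terminal. A minimal sketch; the function name plot_commands is hypothetical, and it only prints the gnuplot commands so they can be inspected or piped into gnuplot:

```shell
#!/bin/bash
# Hypothetical helper: print gnuplot commands that draw the data file
# (date, time, count per line) as boxes on gnuplot's text-only "dumb" terminal.
plot_commands() {
    local datafile=$1
    cat <<EOF
set terminal dumb
set xdata time
# The time format contains a space, so the time value starting in column 1
# is read across the date and time fields; the count stays in column 3.
set timefmt "%Y-%m-%d %H:%M:%S"
set format x "%H:%M"
set ylabel "Requests per 5 min"
plot "$datafile" using 1:3 with boxes notitle
EOF
}
```

Usage: plot_commands "$DATAFILE" | gnuplot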

Complete script:

#!/bin/bash

FILE=$1
if [ ! -r "$FILE" ]; then
    echo "Usage: $0 ACCESS_LOG" >&2
    exit 1
fi

# First timestamp: [20/Oct/2012:06:25:22  ->  20-Oct-2012 06:25:22
FDATE=$(head -n 1 "$FILE" | awk '{print $4}' | sed -e 's/\[//' -e 's/\//-/g' -e 's/:/ /')
FDATE_S=$(date -d "$FDATE" '+%s')
FDATE_T=$((FDATE_S + 300))   # end of the first 5-minute interval
COUNT=0

DATAFILE=$(mktemp)
RESULTFILE="result-$(date -d "$FDATE" '+%Y-%m-%d').png"

while read -r line
do
    LDATE=$(echo "$line" | awk '{print $4}' | sed -e 's/\[//' -e 's/\//-/g' -e 's/:/ /')
    LDATE_S=$(date -d "$LDATE" '+%s')
    # Close every interval that ended before this request (empty ones get 0)
    while (( LDATE_S >= FDATE_T )); do
        echo "$(date -d @"$((FDATE_T - 300))" '+%Y-%m-%d %H:%M:%S') $COUNT" >>"$DATAFILE"
        FDATE_T=$((FDATE_T + 300))
        COUNT=0
    done
    COUNT=$((COUNT + 1))
done <"$FILE"
# Write the last interval, which is still open when the log ends
echo "$(date -d @"$((FDATE_T - 300))" '+%Y-%m-%d %H:%M:%S') $COUNT" >>"$DATAFILE"

gnuplot << EOF
reset
set xdata time
set timefmt "%Y-%m-%d %H:%M:%S"
set format x "%H:%M"
set autoscale
set ytics
set grid y
set auto y
set term png truecolor
set output "$RESULTFILE"
set xlabel "Time"
set ylabel "Requests per 5 min"
set grid
set boxwidth 0.95 relative
set style fill transparent solid 0.5 noborder
plot "$DATAFILE" using 1:3 w boxes lc rgb "green" notitle
EOF

rm -f "$DATAFILE"

And here is the graph it creates:


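Calling awk, sed and date for every log line makes the script slow on large logs. A possible alternative, sketched below with a hypothetical function name count_per_5min, does the counting in a single awk pass. It is a different technique from the script above: buckets are aligned to the clock (06:25, 06:30, ...) instead of starting at the first request, intervals with no requests are left out, and the time zone offset is ignored.

```shell
#!/bin/bash
# Hypothetical alternative: count requests per clock-aligned 5-minute bucket
# in one awk pass instead of calling date for every line.
# Buckets start at :00, :05, :10, ...; intervals with no requests are omitted.
# Output lines look like "20/Oct/2012 06:25 4" (date, bucket start, count).
count_per_5min() {
    awk '{
        ts = substr($4, 2)                     # 20/Oct/2012:06:25:22
        day = substr(ts, 1, 11)                # 20/Oct/2012
        hour = substr(ts, 13, 2)               # 06
        minute = int(substr(ts, 16, 2) / 5) * 5
        key = sprintf("%s %s:%02d", day, hour, minute)
        if (!(key in count)) order[++n] = key  # remember first-seen order
        count[key]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }' "$@"
}
```

Usage: count_per_5min access.log >"$DATAFILE". The plot command can stay "using 1:3", but gnuplot then needs set timefmt "%d/%b/%Y %H:%M" to read the new date format.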