
Bash Functions I Use for Access Logs

The command line is my IDE. Vim is my editor, and all the functions and programs in bash help me be a better developer. As much time as I spend writing code, though, I often spend just as much looking through logs to see what is going on. Over the last five to ten years I’ve collected a number of bash functions to make working with access log files easier. Some are stolen from my old coworker Drew, others from various places online, and others I’ve cobbled together.

# Print field N (whitespace-separated) from each line of stdin, e.g. fawk 7
function fawk {
    first="awk '{print "
    last="}'"
    cmd="${first}\$${1}${last}"
    eval $cmd
}

This simple function lets me pull out a single whitespace-delimited field. Imagine an access log filled with lines like:

172.16.194.2 - - [13/Jan/2017:14:55:31 -0500] "GET /infidelity-husband-had-affair/ HTTP/1.1" 200 20685 "https://www.google.com/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36" "1.1.1.1, 2.2.2.2, 127.0.0.1, 127.0.0.1"

I can run cat access.log | fawk 7 to pull out the request URLs. I can further pipe that to sort | uniq -c | sort -nr | head to pull out the most popular ones. I also have a function for visualizing these results, shown after the pipeline example below.
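
Spelled out as a single pipeline, assuming the log file is access.log in the current directory:

cat access.log | fawk 7 | sort | uniq -c | sort -nr | head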

# Turn lines on stdin into a text histogram: sort | uniq -c, then draw one "#"
# per UNIT occurrences. UNIT is the optional first argument (default 1).
function histogram {
    UNIT=$1
    if [ -z "$UNIT" ]; then
        UNIT="1";
    fi

    first="sort|uniq -c|awk '{printf(\"\n%s \", \$0); for (i =0; i<\$1; i=i+"
    last=") {printf(\"#\")};}'; echo \"\""
    cmd="${first}${UNIT}${last}"
    eval $cmd
}

For example, if I want to see the response codes for the last 500 requests, I can do something like:

tail -n 500 nginx/access.log | fawk 9 | histogram 10

 466 200 ###############################################
   8 301 #
   5 302 #
   1 304 #
  18 404 ##
   2 429 #

I often want to look at more than one access log at a time, but they are gzipped to save space after being rotated. I have a function to cat or zcat all of them.

# cat or zcat all the access logs in a folder
# Pass in the folder to search as the only param
# You will likely want to redirect (>) into another file for further use
access_concat(){
	find "$1" -name "acc*" -not -name "*.gz" -exec cat '{}' \;
	find "$1" -name "acc*" -name "*.gz" -exec zcat '{}' \;
}
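
For example, to see the status-code spread across the current and rotated logs in one shot, combining this with the functions above (/var/log/nginx is just a stand-in for wherever your logs live):

access_concat /var/log/nginx | fawk 9 | histogram 10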

When it comes to working across many servers, I still rely on Dancer’s Shell (dsh) in concert with these functions. The occasional access log spelunking is much easier with these tools.


Aggregate Multiple Log Files From Multiple Servers

The majority of the time, when I need to analyze logs across multiple servers, I use Logstash. Sometimes, though, I want to aggregate the actual files from the servers and go through them myself with awk and grep. For that, I use two tools.

  1. In my bash config, I have a function called access_concat that reads out regular and gzipped access logs.
    access_concat(){
        find "$1" -name "acc*" -not -name "*.gz" -exec cat '{}' \;
        find "$1" -name "acc*" -name "*.gz" -exec zcat '{}' \;
    }

    I can pass in the path where the log files are stored, and it will search it to find the files I actually want.

  2. Dancer’s Shell (or DSH) makes it easy for me to run a command across multiple servers.

Combining these two, I can run dsh -M -c -g prd-wp 'access_concat /logs >> ~/oct22.logs' to concatenate all of the log files that exist on each server today. I then just need to scp oct22.logs down from each server and I can easily run my analysis locally.
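
Once the file is local, the functions from the earlier post work on it directly. For instance, assuming web01 is one of the hosts in the prd-wp group:

scp web01:oct22.logs .
cat oct22.logs | fawk 9 | histogram 10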

Note that to do this, you need to configure dsh so that the servers you want to access are in the prd-wp group (or, better yet, a group named for whatever you are working on).
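
dsh group files are plain machine lists, one host per line. A minimal sketch, assuming the group file lives at ~/.dsh/group/prd-wp and using placeholder hostnames:

web01.example.com
web02.example.com
web03.example.com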