
An adventure with a super useless one-liner to find the most common words in WordPress commit messages

I read a post with some insight into Drupal committing that included a chart of the most common words in Drupal commit messages. I thought it would be interesting to do something like that with WordPress Core, so I threw together a bash one-liner to find out. It’s not the most elegant solution, but it answers the question that I had. Here is what I initially came up with.

svn log http://develop.svn.wordpress.org/trunk -rHEAD:1 -v --xml | xq '.log.logentry | .[].msg' | sed 's/.$//' | sed 's/^.//' | sed 's/\\n/ /g' |  tr ' \t' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr | head -n 25

Let’s walk through this, since there is enough piping going on that it may not be the easiest to follow.

svn log http://develop.svn.wordpress.org/trunk -rHEAD:1 -v --xml

I start by getting an XML version of the SVN history, covering everything from the first changeset through the current HEAD.

xq '.log.logentry | .[].msg'

Next, I use xq, which takes XML and allows me to run jq commands on it. It’s a handy tool if you ever need to work with XML data on the command line. In this case, I am taking what is inside <log><logentry> and, for each sub-element, extracting the msg. At this point, each message is on a single line, wrapped in quotation marks, with \n signifying the newlines. So I run three seds to fix that up.

 sed 's/.$//' | sed 's/^.//' | sed 's/\\n/ /g'

I’m sure there is a better way to do this, but the first sed removes the last character (the trailing quotation mark), the second removes the first character (the leading quotation mark), and the last one converts the escaped newlines into spaces.
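
In fact, if xq passes jq’s -r (raw output) flag straight through, as the Python yq-based xq does, the quotation marks and escaped newlines never appear in the first place, and the seds could likely be dropped entirely; I haven’t verified this end to end:

xq -r '.log.logentry | .[].msg'

Either way, since words are what we are aiming to look at, we need to get all the words onto their own lines.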

 tr ' \t' '\n'

tr is a powerful program for doing transforms on text. In this case, I am taking spaces and tabs and turning them into actual newlines (rather than the literal \n sequences we already dealt with). There is likely a more elegant way to have solved this, but my goal isn’t the best solution, it’s the working one.
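
To make that concrete, this turns one line into three, with fix, the, and bug each on their own line:

printf 'fix the\tbug' | tr ' \t' '\n'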

tr '[:upper:]' '[:lower:]'

Word and word are not equal, so we need to make everything a single case. I am again using tr, this time transforming uppercase characters to lowercase.

sort | uniq -c | sort -nr | head -n 25

Counting things on the command line is something I have done so many times that I have an alias for a version of this. sort puts everything in alphabetical order; uniq -c then counts runs of identical values and outputs each value along with its count. uniq requires identical values to be on adjacent lines, hence the initial sort. Next up, we sort again, numerically and with the high numbers first. Finally, we output the top 25.
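
That alias, for the curious, is more or less the following (the name is just my own shorthand):

alias count='sort | uniq -c | sort -nr'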

 28997 the
 27463 
 20429 fixes
 17844 to
 17818 props
 15251 for
 15189 in
 14441 see
 10856 and
 10272 a
 7549 of
 5594 is
 5227 when
 5133 add
 4444 from
 4143 fix
 3847 *
 3821 on
 3489 use
 3320 that
 3267 this
 3064 with
 3043 remove
 2983 be
 2766 as 

That’s not super helpful. “The” isn’t my idea of interesting, so I guess I need to remove useless words. Since I have groff on this machine, I can use its eign file, a list of common English words, together with fgrep: -f reads the patterns from that file, -w matches whole words only, and -v inverts the match so those words are filtered out.

 fgrep -v -w -f /usr/share/groff/1.19.2/eign

I also noticed that the second most common word is whitespace. Remember when we used to put two spaces between sentences? WordPress Core commit messages remember. So let’s add another sed command to the chain:

sed '/^$/d'

And now the final command to see the 25 most used words in WordPress Core commit messages:

svn log http://develop.svn.wordpress.org/trunk -rHEAD:1 -v --xml | xq '.log.logentry | .[].msg' | sed 's/.$//' | sed 's/^.//' | sed 's/\\n/ /g' | tr '[:upper:]' '[:lower:]' | tr ' \t' '\n' | fgrep -v -w -f /usr/share/groff/1.19.2/eign | sed '/^$/d' | sort | uniq -c | sort -nr | head -n 25

And since you’ve made it this far, here is the list:

 20429 fixes
 17818 props
 15189 in
 5594 is
 5133 add
 4143 fix
 3847 *
 3320 that
 3267 this
 3064 with
 3043 remove
 2766 as
 2435 an
 2432 it
 2109 post
 2103 if
 2080 are
 1889 don't
 1793 update
 1735 -
 1688 twenty
 1523 more
 1500 make
 1471 docs:
 1416 some 

Have an idea for another way to do this with the command line? I would love to hear it.


New BASH Prompt

For the last few years, I’ve been using impromptu to set my bash prompt. However, it felt like it was in perpetual beta, and I wanted to try out something new, so today I installed bash-git-prompt and am giving it a try.

It was super easy to get started with it by following the instructions:

  • Run brew update
  • Run brew install bash-git-prompt for the last stable release or brew install --HEAD bash-git-prompt for the latest version directly from the repository
  • Now you can source the file in your ~/.bash_profile as follows:
if [ -f "$(brew --prefix)/opt/bash-git-prompt/share/gitprompt.sh" ]; then
  __GIT_PROMPT_DIR=$(brew --prefix)/opt/bash-git-prompt/share
  source "$(brew --prefix)/opt/bash-git-prompt/share/gitprompt.sh"
fi
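
One tweak from the project’s README that I may make is only showing the git portion of the prompt when I’m actually inside a repository, by setting this before the source line:

GIT_PROMPT_ONLY_IN_REPO=1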

So far, it seems like something that should do the job. If not, I’ll look for a new solution. Let me know in the comments how you set up and manage your prompt.


Bash Functions I use for access logs

The command line is my IDE. Vim is my editor, and all the functions and programs in bash help me be a better developer. As much time as I spend writing code, though, I often spend just as much looking through logs to see what is going on. Over the last five to ten years, I’ve collected a number of bash functions to help make working with access log files easier. Some are stolen from my old coworker Drew, others from various places online, and others I’ve cobbled together.

function fawk {
    # build and eval the command: awk '{print $N}', where N is the first argument
    first="awk '{print "
    last="}'"
    cmd="${first}\$${1}${last}"
    eval $cmd
}

This simple function allows me to pull out a single whitespace-delimited field. Imagine an access log filled with lines like:

172.16.194.2 - - [13/Jan/2017:14:55:31 -0500] "GET /infidelity-husband-had-affair/ HTTP/1.1" 200 20685 "https://www.google.com/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36" "1.1.1.1, 2.2.2.2, 127.0.0.1, 127.0.0.1"

I can run cat access.log | fawk 7 to pull out the URLs. I can further pipe that to sort | uniq -c | sort -nr | head to pull out the most popular URLs.
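
As an aside, the eval could probably be avoided with careful quoting. Here is a sketch I haven’t battle-tested (fawk2 is a made-up name; it defaults to the first field):

function fawk2 {
    # double quotes let ${1:-1} expand now, while \$ stays literal for awk
    awk "{print \$${1:-1}}"
}

I also have a function for visualizing these results.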

function histogram {
    # optional first argument: how many occurrences each # represents
    UNIT=$1
    if [ -z "$UNIT" ]; then
        UNIT="1";
    fi

    # count identical lines, then draw one # per $UNIT occurrences
    first="sort|uniq -c|awk '{printf(\"\n%s \", \$0); for (i =0; i<\$1; i=i+"
    last=") {printf(\"#\")};}'; echo \"\""
    cmd="${first}${UNIT}${last}"
    eval $cmd
}

For example, if I want to see the response codes for the last 500 requests, I can do something like:

tail -n 500 nginx/access.log | fawk 9 | histogram 10

    466 200 ###############################################
      8 301 #
      5 302 #
      1 304 #
     18 404 ##
      2 429 #

I often want to look at more than one access log at a time, but they are gzipped to save space after rotating. I have a function to cat or zcat all of them.

# cat or zcat all the access logs in a folder
# Pass in the folder to search as the only param
# You'll likely want to redirect (>) into another file for further use
access_concat(){
	find "$1" -name "acc*" -not -name "*.gz" -exec cat '{}' \;
	find "$1" -name "acc*" -name "*.gz" -exec zcat '{}' \;
}
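
Pulling every access log in a folder, rotated or not, into a single file then looks like this (the paths are just examples):

access_concat /var/log/nginx > all_access.log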

When it comes to working across many servers, I still rely on Dancer’s Shell in concert with these functions. The occasional access log spelunking is much easier with these tools.


Aggregate Multiple Log Files From Multiple Servers

The majority of the time when I need to analyze logs across multiple servers, I use logstash. Sometimes, though, I want to aggregate the actual files from the servers and go through them myself with awk and grep. For that, I use two tools.

  1. In my bash config, I have a function called access_concat that reads out regular and gzipped access logs.
    access_concat(){
        find "$1" -name "acc*" -not -name "*.gz" -exec cat '{}' \;
        find "$1" -name "acc*" -name "*.gz" -exec zcat '{}' \;
    }

    I can pass in the path where the log files are stored and it will search it to find the files I actually want.

  2. Dancer’s Shell (or DSH) makes it easy for me to run a command across multiple servers.

Combining these two, I can run dsh -M -c -g prd-wp 'access_concat /logs >> ~/oct22.logs' to concatenate all of the log files that exist on each server today. I then just need to scp down oct22.logs from each one and I can easily run my analysis locally.

Note that to do this, you need to configure dsh so that the servers you want to access are in the prd-wp group (or, better yet, a group named logically for whatever you are working on).
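
If I remember right, dsh reads group definitions from files in ~/.dsh/group/ (or /etc/dsh/group/), one host per line. A prd-wp group file would look something like this, with made-up hostnames:

# ~/.dsh/group/prd-wp
web1.example.com
web2.example.com
web3.example.com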


Always check your diffs

One habit that I’ve gotten into that keeps me from looking like an idiot nearly as often is to always look at a diff before I commit. I don’t do this as much in Mercurial, but with SVN, when you do automatic deployments to testing servers, a stray alert in your JavaScript or var_dump in your PHP can screw up other people’s work. I wrote a small bash script to make it easier for me to check my diffs. Feel free to make it your own. If you have any suggestions for improving it, I’m always looking for ways to improve my dev process. I call it difff for:

Differentiate
Improvements
From
F***ing
Failures

[bash]
difff() {
    # dump the working copy's changes to a file, then open it for review
    svn diff > ~/diff.diff
    vim ~/diff.diff
}
[/bash]
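
If I revisit it, one small improvement would be writing to a unique file so two reviews running at once don’t clobber each other. A sketch, untested in anger:

[bash]
difff() {
    # $$ is the shell's PID, so each session gets its own scratch file
    local tmp="$HOME/diff.$$.diff"
    svn diff > "$tmp"
    vim "$tmp"
    rm -f "$tmp"
}
[/bash]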

EDIT: Make sure to check out the comments below to see Jon Cave’s take on this.


PHPXref for WordPress, BuddyPress, bbPress and Thematic: Local and Updated

I often work on a variety of WordPress, BuddyPress, and bbPress projects, and I enjoy working from a lot of non-internet-connected locations such as trains, buses, parks, and the occasional rooftop. As such, it’s very handy for me to have a local version of the PHPXref documents for the current trunk of each of these projects. I’ve written a handy bash script to handle this for me.

[bash]
#!/bin/bash

# define locations
PHPXREFLOCATION='/full/path/to/phpxref/folder/with/no/trailing/slash'

# check our internet connection; only proceed if the fetch brought something back
wget -q --tries=10 --timeout=5 http://core.svn.wordpress.org/trunk/ -O /tmp/wordpress.svn &> /dev/null
if [ -s /tmp/wordpress.svn ]; then

# Remove old version
rm -r "$PHPXREFLOCATION"/output/*

# SVN up each piece
cd "$PHPXREFLOCATION/source/thematic"
svn up
cd "$PHPXREFLOCATION/source/wordpress"
svn up
cd "$PHPXREFLOCATION/source/bbpress"
svn up
cd "$PHPXREFLOCATION/source/buddypress"
svn up
cd "$PHPXREFLOCATION/"

# Build our new phpxref
perl phpxref.pl

# remove the tmp file so it's not there for next time
rm /tmp/wordpress.svn

fi
[/bash]

I have a cron set up to run this at a few intervals that are least likely to bother me (currently 5am and 4pm).
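
The crontab entry for that looks roughly like this (the script path is wherever you saved it):

0 5,16 * * * /full/path/to/phpxref-update.sh

I hope this script helps you out. If you have a similar or better one, please let me know.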


A Basic Database and Content Backup Script

I presented an unconference session at the great WordCamp Portland and have cleaned up my backup script for WordPress (and really for any program with a database and a single uploads/plugins/themes folder). Comment here if you have any questions about how it works or suggestions for how to improve it. Like WordPress, it’s available under the GPL v2.
[bash]#! /bin/sh

# config
DBPASSWORD='password'
DBTABLE='DB'                        # the database name
DBUSER='user'
CONTENTPATH='path/to/wp-content'    # relative to / (see the tar -C / below)
DESTINATION='/destination/for/backups/'
# end config

DATE=`date +%Y.%m.%d.%k.%M | tr " " _`
echo
echo Licensed under the GPL version 2 http://www.gnu.org/licenses/gpl-2.0.html
echo Copyright 2009 Aaron Jorbin http://aaron.jorb.in
echo
echo $DATE start backup with mysqldump
mysqldump --add-drop-table --single-transaction -h localhost -u $DBUSER --password=$DBPASSWORD $DBTABLE > db.$DATE
echo mysqldump done, gzipping it
tar cfz db.$DATE.tar.gz db.$DATE
rm db.$DATE
echo Database backup gzipped, tarring up our WordPress folder
tar cfz mu.$DATE.tar.gz -C / $CONTENTPATH
echo copying files to backup folder
mv db.$DATE.tar.gz $DESTINATION/
mv mu.$DATE.tar.gz $DESTINATION/
echo Cleaning Up
# clean up backups older than two weeks
find $DESTINATION -type f -mtime +14 -exec rm {} \;
echo Thank you for backing up WordPress.
echo ---
echo The Poetic Code
echo Options, Comments, Posts and more
echo Safer than before[/bash]
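
If you want it to run unattended, a nightly crontab entry along these lines would do the trick (the path is hypothetical):

0 3 * * * /full/path/to/backup.sh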