
An adventure with a super useless one-liner to find the most common words in WordPress commit messages

I recently read some insights into Drupal commits, including a chart of the most common words in Drupal commit messages. I thought it would be interesting to do something similar with WordPress Core, so I threw together a bash one-liner to find out. It’s not the most elegant solution, but it answers the question I had. Here is what I initially came up with.

svn log -rHEAD:1 -v --xml | xq '.log.logentry | .[].msg' | sed 's/.$//' | sed 's/^.//' | sed 's/\\n/ /g' |  tr ' \t' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr | head -n 25

Let’s walk through it, since there is enough piping going on that it may not be easy to follow.

svn log -rHEAD:1 -v --xml

I start by getting an XML version of the SVN history, beginning at the current HEAD and going all the way back to the first changeset.

xq '.log.logentry | .[].msg'

Next, I use xq, which takes XML and allows me to run jq filters on it. It’s a handy tool if you ever need to work with XML data on the command line. In this case, I take the list of <logentry> elements inside <log> and extract the <msg> from each one. At this point, each message is on a single line, wrapped in quotation marks, with a literal \n marking each newline. So I run three sed commands to fix that up.

 sed 's/.$//' | sed 's/^.//' | sed 's/\\n/ /g'

I’m sure there is a better way to do this, but the first one removes the last character (the closing quote), the next one removes the first character (the opening quote), and the last one converts each \n into a space. Since words are what we are aiming to look at, we need to get each word onto its own line.
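For comparison, here is a sketch that folds the three seds into a single sed call with several -e expressions. It anchors on the quote characters themselves, so a line that happens to lack quotes keeps its first and last letters. The function name clean_msg and the sample message are hypothetical. If your xq is the jq wrapper from the Python yq package, jq’s -r (raw output) flag may make this cleanup unnecessary, since it prints strings without the surrounding quotes and escapes.

```shell
# clean_msg is a hypothetical helper name. It does the work of the three
# chained seds in one sed call:
#   s/^"//       drop the opening quote
#   s/"$//       drop the closing quote
#   s/\\n/ /g    turn each literal backslash-n into a space
clean_msg() {
  sed -e 's/^"//' -e 's/"$//' -e 's/\\n/ /g'
}

# A made-up line shaped like xq's output for one commit message.
printf '%s\n' '"Docs: Fix a typo.\nProps someone."' | clean_msg
# prints: Docs: Fix a typo. Props someone.
```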

 tr ' \t' '\n'

tr is a handy program for translating characters. In this case, I am turning each space and tab into an actual newline (rather than the literal \n characters we started with). There is likely a more elegant way to solve this, but my goal isn’t the best solution; it’s a working one.
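A small sketch of this step on a made-up message containing a tab and a double space. Each space or tab becomes its own newline, which puts every word on its own line, but it also means two spaces in a row leave an empty line behind.

```shell
# A made-up commit message: a tab after "Fix" and two spaces after "typo.".
msg=$(printf 'Fix\ttypo.  Props someone.')

# Translate every space and every tab into a newline.
printf '%s\n' "$msg" | tr ' \t' '\n'
# prints five lines: "Fix", "typo.", an empty line, "Props", "someone."
```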

tr '[:upper:]' '[:lower:]'

Word and word are not equal, so we need to put everything in a single case. Here I am again using tr, but this time to transform uppercase characters to lowercase.
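A quick way to see why this step matters: sort -u (which keeps one copy of each distinct line) treats the three made-up spellings below as different words until they are folded to lowercase.

```shell
# Three spellings of the same word.
words=$(printf 'Word\nword\nWORD')

# Before folding: sort -u keeps all three spellings, so this counts 3.
printf '%s\n' "$words" | sort -u | wc -l

# After folding to lowercase: only "word" remains, so this counts 1.
printf '%s\n' "$words" | tr '[:upper:]' '[:lower:]' | sort -u | wc -l
```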

sort | uniq -c | sort -nr | head -n 25

I have counted things on the command line so many times that I have an alias for a version of this. sort puts everything in alphabetical order, then uniq -c collapses each run of identical lines into one and prefixes it with how many times it appeared. uniq only compares adjacent lines, hence the initial sort. Next, sort -nr sorts numerically by that count, in reverse so the highest numbers come first. Finally, head -n 25 outputs the top 25.
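A sketch of that counting idiom as a reusable shell function. The name count_top and its optional argument (how many lines to keep, defaulting to 25) are hypothetical, not the alias mentioned above. The first example shows why the initial sort is needed.

```shell
# count_top is a hypothetical name: count identical lines on stdin and
# print the N most frequent (default 25), most frequent first.
count_top() {
  sort | uniq -c | sort -nr | head -n "${1:-25}"
}

# Without the first sort, uniq -c only merges *adjacent* duplicates,
# so the two separate "b" lines here are counted separately.
printf 'b\na\nb\n' | uniq -c

# With the full idiom, "b" appears once, with a count of 2.
printf 'b\na\nb\n' | count_top
```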

 28997 the
 20429 fixes
 17844 to
 17818 props
 15251 for
 15189 in
 14441 see
 10856 and
 10272 a
 7549 of
 5594 is
 5227 when
 5133 add
 4444 from
 4143 fix
 3847 *
 3821 on
 3489 use
 3320 that
 3267 this
 3064 with
 3043 remove
 2983 be
 2766 as 

That’s not super helpful. “The” isn’t my idea of interesting, so I need to remove the uninteresting words. Since I have groff on this machine, I can use its eign file (a list of common English words) with fgrep:

 fgrep -v -w -f /usr/share/groff/1.19.2/eign
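fgrep is the same as grep -F, and recent GNU grep releases warn that fgrep is obsolescent, so here is a sketch of the same filter written with grep -F. It uses a tiny made-up stop list in a temporary file instead of groff’s eign, whose path includes the groff version (something like find /usr/share/groff -name eign should locate it on your machine). The sample shows what -w buys you: “theme” survives even though it starts with the stop word “the”.

```shell
# A tiny stand-in for groff's eign file: one stop word per line.
stopwords=$(mktemp)
printf 'the\nto\nfor\n' > "$stopwords"

# -F: patterns are fixed strings   -w: match whole words only
# -v: print lines that do NOT match -f: read patterns from a file
printf 'the\nfixes\nto\nprops\ntheme\n' | grep -F -v -w -f "$stopwords"
# prints: fixes, props, theme (one per line)

rm -f "$stopwords"
```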

I also noticed that the second most common “word” was an empty line: every double space leaves one behind after the tr step. Remember when we used to put two spaces between sentences? WordPress Core commit messages remember. So let’s add another sed command to the chain:

sed '/^$/d'
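A sketch comparing that sed with two alternatives on a made-up word list. grep -v '^$' does the same filtering, and tr’s -s flag squeezes each run of the newline it produces into one, so a double space never becomes an empty line in the first place. A leading space would still produce one empty line with tr -s, so the sed remains a reasonable safety net.

```shell
# A made-up word list, as tr leaves it after a double space.
words=$(printf 'fix\ntypo.\n\nprops')

# The sed used above: delete every empty line.
printf '%s\n' "$words" | sed '/^$/d'

# Same result with grep: keep only lines that are not empty.
printf '%s\n' "$words" | grep -v '^$'

# Or avoid the empty line at the split step: -s squeezes each run of
# repeated newlines in tr's output down to one.
printf 'fix typo.  props\n' | tr -s ' \t' '\n'
```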

And now the final command to see the 25 most used words in WordPress Core commit messages:

svn log -rHEAD:1 -v --xml | xq '.log.logentry | .[].msg' | sed 's/.$//' | sed 's/^.//' | sed 's/\\n/ /g' | tr '[:upper:]' '[:lower:]' | tr ' \t' '\n' | fgrep -v -w -f /usr/share/groff/1.19.2/eign | sed '/^$/d' | sort | uniq -c | sort -nr | head -n 25
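To experiment without pulling the whole SVN history, everything after the xq step can be wrapped in a shell function and fed sample text. This is a sketch: the function name top_words, its arguments (a stop-word file and an optional count), the two sample messages, and the tiny stop list are all hypothetical, and grep -F stands in for fgrep.

```shell
# top_words is a hypothetical name. It runs the text-processing half of the
# final command on xq-style lines from stdin.
# Usage: ... | top_words STOPWORD_FILE [N]
top_words() {
  sed 's/.$//' | sed 's/^.//' | sed 's/\\n/ /g' |
    tr '[:upper:]' '[:lower:]' |
    tr ' \t' '\n' |
    grep -F -v -w -f "$1" |
    sed '/^$/d' |
    sort | uniq -c | sort -nr | head -n "${2:-25}"
}

# Try it on two made-up messages and a three-word stop list.
stopwords=$(mktemp)
printf 'the\nto\nfor\n' > "$stopwords"
printf '%s\n' '"Fix the docs.\nProps alice."' '"Fix  typo.\nProps bob."' |
  top_words "$stopwords" 3
# "fix" and "props" come first, with a count of 2 each
rm -f "$stopwords"
```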

And since you’ve made it this far, here is the list:

 20429 fixes
 17818 props
 15189 in
 5594 is
 5133 add
 4143 fix
 3847 *
 3320 that
 3267 this
 3064 with
 3043 remove
 2766 as
 2435 an
 2432 it
 2109 post
 2103 if
 2080 are
 1889 don't
 1793 update
 1735 -
 1688 twenty
 1523 more
 1500 make
 1471 docs:
 1416 some 

Have an idea for another way to do this with the command line? I would love to hear it.


Four Short Things – 23 February 2019

Inspired by O’Reilly’s Four Short Links, here are some of the things I’ve seen, read, or watched recently.

Leukemia has won

WordPress has allowed me the opportunity to meet hundreds of people first online and then offline, but Alex “Viper007Bond” was the first. When I first started getting involved in WordPress, I spent many late nights in the IRC #wordpress channel on freenode, at first seeking help but then providing it. Viper was commonly there helping others and likely answered more than a few questions of mine as well. He’s been publicly battling leukemia for 2.5 years. His blog is a great tale of the ups and downs of cancer. Alex and those who care about him are in my thoughts right now.

Kevin Beasley: A view of a Landscape

On view at the Whitney until 10 March, this top-floor exhibit combines sound and visuals. Featuring the motor from a cotton gin and giant sculptures with cotton, it explores race, history, and the evolution of America.

Writing CSS Algorithms

Lara has done more to change my opinions on CSS than anyone else. This post is a companion piece to a talk she gave at WordCamp US, and one that every web developer should read.

Pento hits 1000 Commits

Thirteen people have made more than 1,000 commits to WordPress core over the past nearly 16 years. Gary Pendergast joined the club during the 5.1 release. Overall, there have been 44,767 commits, so Gary’s count represents only about 2.2% of the total activity.
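A quick check of that percentage, treating Gary’s total as exactly 1,000 commits (the post only says he passed that mark, so his real share is at least this):

```latex
\frac{1000}{44767} \approx 0.0223 \approx 2.2\%
```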

Four Short Things is a series where I post a small collection of links to art, news, articles, videos and other things that are me. Follow my RSS feed to see Four Short Things whenever it comes out.


Five Years of Contributing to WordPress Core

Five years ago today, my first patch was accepted to WordPress core. Oh how the time has flown.

This last year has been one of my most exciting years as a part of the WordPress contributor community. At the end of September, I was given commit access to WordPress Core. I was excited to join a group that includes some of the smartest people I know, while also being terrified at the responsibility being handed to me. It’s been fun so far.

One of the coolest things in WordPress core that I worked on this year is the “Log Out of Other Sessions” button at the bottom of users’ profile screens. This seems like a simple button, but adding this iteration (which was part of the 4.1 release) was the result of live user testing I organized as part of WordCamp San Francisco 2014.

In celebration, I made two commits to core today. One of them was to start user testing WordPress with PHP 7. I’m excited to see how we perform vs. the nightly builds there. The other introduced a new version of grunt-patch-wordpress, which is one of my favorite parts of WordPress that I’ve been able to spearhead.

I’m lucky to share my committiversary with my partner who got her first props one year ago. I’m even more lucky that WordPress helped me meet her.

Five years of contributing is a long time. I’m especially happy that five years in, I’m more excited than ever to help build the software that powers so much of the web. Here is to another five years!


Commit: The Story of Writing a WordPress Patch

If you hang out in the #wordpress IRC channel or on the wp-hackers mailing list, one question comes up from time to time: “How do I get a bug patched?” I recently had a patch committed, so I thought I would detail the process from start to finish to give others an idea of how it works. I can’t guarantee that others will have the same experience, or even that I will have the same experience next time, but this was how I had my first substantial patch committed to WordPress.

The process for me started when I saw a post by Jane Wells about a few UX enhancements she wanted to see handled during the recent patch sprint. One that I noticed hadn’t received any attention was showing the status when an admin attempts an e-mail change under the new multisite configuration. I took a quick look at the relevant code and figured this was something I could patch.

Lesson 1: Make it easy for coders and non-coders to see the change

After I wrote my first iteration of a patch, I hopped into #wordpress-dev, and Andrew Nacin recommended that I add a screenshot to make it easier for Jane to see the change. I couldn’t agree more that this was a great idea. After all, why should you need to apply my patch and test it, or understand the code behind it, just to comment on it?

Lesson 2: Just because you write the patch, doesn’t mean others don’t have good ideas for it

I then waited until I was in #wordpress-dev at the same time as Jane and brought up the ticket. This led to a conversation between Jeremy Clarke, Jane, and me about the best way to let users know. During this time I went through a few other iterations and shared those on the ticket. We didn’t come to a firm conclusion, and the next morning Nacin commented on the ticket with a suggestion.

Lesson 3: Sometimes one small change leads to another

The next day I once again headed to #wordpress-dev where Jane, Nacin, and I took another stab at it and decided that an inline warning box would give the proper notification without being distracting. While writing the final version of the patch I noticed that all warning boxes automatically moved to the top of the screen. Nacin took ownership of fixing this, made a few minor changes to my patch and committed changeset #13446.  Now not only will users be able to see pending admin e-mail changes, but developers can use the existing UI warnings inline.

Overall, the process to fix this UX bug took four people and multiple iterations. I want to thank the other three for their help. While errors sometimes make it into released versions of WordPress, I hope this story gives you an idea of the effort the core development team puts in to make sure users get only the highest quality code and user experience.