# Unix text tools to the rescue

Yesterday at work, I wrote a report summarizing a bunch of tests I’ve been running for the past month or so. There were several tables of results, including one that listed, among other things, all of the parts and the temperatures at which they were tested. The table had a few dozen rows and five or six columns, but the pertinent portion of it looked something like this, with the rows sorted by part number:

| Part no. | Temperature |
|:--------:|:-----------:|
|   1105   |     150     |
|   1308   |     150     |
|   1847   |      RT     |
|   2505   |      RT     |
|   2876   |     150     |
|   3289   |      RT     |
|   3527   |     150     |
|   4010   |     200     |
|   4087   |      RT     |
|   4689   |      RT     |
|   4871   |     200     |
|   5067   |     150     |
|   5886   |      RT     |
|   6544   |     150     |
|   6957   |     150     |
|   7174   |      RT     |
|   8035   |     200     |
|   8294   |     150     |
|   8322   |      RT     |
|   8434   |     150     |


As I’ve mentioned before, I write my reports in (Multi)Markdown, which I then convert to a PDF that I can print or, more commonly nowadays, email to the client.

I didn’t send the report to the client yesterday, because I wanted to let it “rest” overnight and reread it with fresh eyes this morning.1 Sure enough, when I looked at it this morning, I decided it needed a new paragraph in which I listed all the parts that were tested at room temperature, all those tested at 150°, and all those tested at 200°.

There are lots of ways to pull those part numbers out of the table. For example:

1. I could go through the table line by line and copy all the “RT” part numbers and paste them into a list, and then repeat that for “150” and “200.” This is both tedious and subject to error.
2. I could transform the table into tab-separated values, paste that into a spreadsheet, sort by temperature, and then copy out the rows of part numbers corresponding to each temperature. The hardest part of this is the transformation into tab-separated values, which I’d probably do with a regular expression.
3. I could do it pretty much all at once by copying the table to the clipboard and piping it through an awk one-liner.

I chose door number 3.

Here’s one-liner:

pbpaste | awk 'BEGIN {ORS=", "} / RT / {print \$2}'


which gives an output of

1847, 2505, 3289, 4087, 4689, 5886, 7174, 8322,


Now it’s true that I needed to delete the trailing comma and stick an “and” before the last item, but I’d have to do that with the other two methods, too. Once I had the room temperature part numbers, I repeated the command with “150” in place of “RT” and then with “200” in place of “RT.”

The pbpaste command is a OS X-specific utility that sends the contents of the clipboard to standard input output.2 The awk script, between the single quotes, has two parts:

• The BEGIN clause sets the output record separator (ORS), which is a newline by default, to a comma-space pair. This clause runs before any input is read.
• The second clause looks for lines that contain RT surrounded by spaces and prints the second field of those lines. Awk’s default is to consider whitespace as the field delimiter, so the part numbers are in the second field.

Honestly, even if the spreadsheet method were faster, I’d still prefer to do this on the command line. It’s just more fun.

1. It should be clear from the typos and missing/extra words that I don’t do this with blog posts.

2. Thanks to Vitor Galvão for catching this. I got ahead of myself—it’s the pipe that turns it into standard input.