Automation via embarrassment

This morning I was adding an entry to my homemade wiki when I became acutely embarrassed by how convoluted the entry was becoming. I stopped writing, created an automation that made the process I was describing much simpler, and rewrote the entry feeling much better about myself.

My homemade wiki—or personal knowledge management system, as they are often called nowadays—is just a set of Markdown files that gets turned into a static set of web pages. It’s called Hints, and I’ve described it this way:

First, you should know that I have a very specific use for this wiki. It tells me how to do certain things on my computer. These are typically programming hints—techniques and code excerpts that have helped me solve problems that I expect to run into again—but some are little configuration hacks that make using my computer easier. They’re mostly the culmination of long stretches of research and trial-and-error testing, put into my own words with links back to the original sources. Despite my calling it a wiki, there are very few internal links.

The entry I was writing today described how I take a table of items in LaTeX format and turn them into a one-item-per-line list that I can process in a variety of ways that we won’t get into here. Here’s a made-up example, a list of randomly chosen apartment numbers in a 20×3 table:

\begin{table}[htbp]
\begin{center}
\begin{tabular}{
    @{\hspace*{5pt}}
    ccc
    @{\hspace*{5pt}}
}
\toprule
Apts 1--20 & Apts 21--40 & Apts 41--60 \\
\midrule
3603 & 303 & 1704 \\
1701 & 3205 & 3206 \\
203 & 603 & 4001 \\
1107 & 2708 & 1308 \\
601 & 2705 & 808 \\
\addlinespace
701 & 4407 & 902 \\
2001 & 4207 & 1008 \\
1104 & 3303 & 3501 \\
2001 & 4401 & 303 \\
2702 & 405 & 2707 \\
\addlinespace
2403 & 3107 & 4301 \\
3203 & 2407 & 404 \\
903 & 3708 & 3708 \\
4008 & 703 & 4205 \\
3301 & 4402 & 1107 \\
\addlinespace
3805 & 3305 & 3405 \\
4108 & 4102 & 2304 \\
1406 & 4306 & 502 \\
4002 & 3502 & 305 \\
607 & 2607 & 4101 \\
\bottomrule
\end{tabular}
\caption{Random selection of apartments.}
\label{apt-list}
\end{center}
\end{table}

In a report, the table would look like this:

[Image: Sample table output]

(I use the booktabs package to get this formatting.)

As you can see from the header line, the list is laid out column-by-column, and there’s an order to it that needs to be preserved. My old way to get a one-item-per-line list from this would be to copy the LaTeX, paste it into a new BBEdit document, and do a few obvious edits to turn it into a tab-separated-values (TSV) file. I’d then open the TSV file in Numbers, copy/paste the second and third columns below the first, and save it back out, still as TSV.

This was not an especially onerous process, but writing out the details of each step sure made it seem like one. Also, it made me acutely aware of how often I went through these steps and how often I caught myself making mistakes by doing things in the wrong order. Even though my Hints wiki is only for my own use, I couldn’t bring myself to memorialize such a clumsy procedure.[1]

So I built a simple Keyboard Maestro macro that does the work for me. Here’s what it looks like:

[Image: KM LaTeX Table to TSV macro]

I select the data rows in the LaTeX table, run the macro, and my clipboard then has the data in a single column that I can paste into a new BBEdit document. The key step is the shell script, which is a pipeline of three commands:

sed -E '/^\\/d;s/ +\& +/\t/g;s/ *\\\\$//' |\
rs -c -T |\
rs 0 1

The sed command deletes all lines that start with a backslash (the \addlinespace lines in the selection), changes the space-ampersand-space between each column into a tab, and deletes the trailing space and double backslash. For the example table above, it would output:

3603    303     1704
1701    3205    3206
203     603     4001
1107    2708    1308
601     2705    808
701     4407    902
2001    4207    1008
1104    3303    3501
2001    4401    303
2702    405     2707
2403    3107    4301
3203    2407    404
903     3708    3708
4008    703     4205
3301    4402    1107
3805    3305    3405
4108    4102    2304
1406    4306    502
4002    3502    305
607     2607    4101

(I’m showing it here with spaces between the columns to make it look nice. In the macro, it puts tabs between the columns.)
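To see the sed step in isolation, here is a minimal sketch that runs the same three edits on a made-up three-line selection. The function name `latex_rows_to_tsv` and the sample rows are assumptions for illustration, not part of the macro. The sketch also inserts a literal tab instead of writing `\t`, on the assumption that not every sed expands `\t` in a replacement.

```shell
# Hypothetical helper: the same three edits as the sed step described above.
latex_rows_to_tsv() {
  # Literal tab character, so the sketch does not depend on sed expanding \t.
  tab=$(printf '\t')
  # 1. delete lines that start with a backslash
  # 2. turn " & " between columns into a tab
  # 3. strip the trailing spaces and double backslash
  sed -E '/^\\/d;s/ +& +/'"$tab"'/g;s/ *\\\\$//'
}

# Made-up selection: two data rows around one \addlinespace line.
sample='3603 & 303 & 1704 \\
\addlinespace
1701 & 3205 & 3206 \\'

# Prints the two data rows with tabs between the columns.
printf '%s\n' "$sample" | latex_rows_to_tsv
```

Wrapping the command in a function is only for convenience here; in the macro the sed command is simply the first stage of the pipeline.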

The first rs command transposes (-T) the data from column-by-column to row-by-row. The -c option tells rs to use tabs as column separators in the input. This turns the 20×3 input into 3×20 output. The second rs command takes that and reformats it into a single column.[2] Keyboard Maestro then puts the single-column output onto the clipboard.
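On systems without rs, the combined effect of the two rs commands can be approximated with awk. This is a sketch of the same column-by-column flattening, not what the macro uses. The function name `flatten_by_column` is hypothetical, and the input is assumed to be tab-separated, as the sed step produces.

```shell
# Hypothetical helper: read tab-separated rows on stdin and print each cell
# on its own line, walking down the first column, then the second, and so on.
flatten_by_column() {
  awk -F '\t' '
    {
      # Store every cell under its (row, column) position.
      for (c = 1; c <= NF; c++) cell[NR, c] = $c
      if (NF > ncols) ncols = NF
    }
    END {
      # Emit column by column, skipping positions a short row never filled.
      for (c = 1; c <= ncols; c++)
        for (r = 1; r <= NR; r++)
          if ((r, c) in cell) print cell[r, c]
    }
  '
}

printf '3603\t303\t1704\n1701\t3205\t3206\n' | flatten_by_column
# prints 3603, 1701, 303, 3205, 1704, 3206, one per line
```

Because it holds the whole table in memory before printing, this only suits small tables like the one here, which is also true of transposing in general.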

I once read something by one of the early Unix programmers (Dennis Ritchie, I think) who said that an advantage of programmers writing their own man pages was that a full description of how the code worked (and, in the BUGS section, how it didn’t) was a strong incentive to go back and make the code better. No one wanted others to see their ungraceful work. I don’t even want to see it myself.


  1. You might be wondering why I didn’t create a one-entry-per-line file right off the bat, at the same time I was creating the LaTeX table. It’s because I didn’t know when I was making these tables that I’d ever have a need for a single-column version. 

  2. I feel certain there’s a way to do this with a single rs command, but I haven’t found it. I keep thinking rs -c -t 0 1 should work, but it doesn’t.