Automation via embarrassment
March 6, 2023 at 1:30 PM by Dr. Drang
This morning I was adding an entry to my homemade wiki when I became acutely embarrassed by how convoluted the entry was becoming. I stopped writing, created an automation that made the process I was describing much simpler, and rewrote the entry feeling much better about myself.
My homemade wiki—or personal knowledge management system, as they are often called nowadays—is just a set of Markdown files that gets turned into a static set of web pages. It’s called Hints, and I’ve described it this way:
First, you should know that I have a very specific use for this wiki. It tells me how to do certain things on my computer. These are typically programming hints—techniques and code excerpts that have helped me solve problems that I expect to run into again—but some are little configuration hacks that make using my computer easier. They’re mostly the culmination of long stretches of research and trial-and-error testing, put into my own words with links back to the original sources. Despite my calling it a wiki, there are very few internal links.
The entry I was writing today described how I take a table of items in LaTeX format and turn them into a one-item-per-line list that I can process in a variety of ways that we won’t get into here. Here’s a made-up example, a list of randomly chosen apartment numbers in a 20×3 table:
\begin{table}[htbp]
\begin{center}
\begin{tabular}{
@{\hspace*{5pt}}
ccc
@{\hspace*{5pt}}
}
\toprule
Apts 1--20 & Apts 21--40 & Apts 41--60 \\
\midrule
3603 & 303 & 1704 \\
1701 & 3205 & 3206 \\
203 & 603 & 4001 \\
1107 & 2708 & 1308 \\
601 & 2705 & 808 \\
\addlinespace
701 & 4407 & 902 \\
2001 & 4207 & 1008 \\
1104 & 3303 & 3501 \\
2001 & 4401 & 303 \\
2702 & 405 & 2707 \\
\addlinespace
2403 & 3107 & 4301 \\
3203 & 2407 & 404 \\
903 & 3708 & 3708 \\
4008 & 703 & 4205 \\
3301 & 4402 & 1107 \\
\addlinespace
3805 & 3305 & 3405 \\
4108 & 4102 & 2304 \\
1406 & 4306 & 502 \\
4002 & 3502 & 305 \\
607 & 2607 & 4101 \\
\bottomrule
\end{tabular}
\caption{Random selection of apartments.}
\label{apt-list}
\end{center}
\end{table}
In a report, the table would look like this:
(I use the booktabs package to get this formatting.)
As you can see from the header line, the list is laid out column-by-column, and there’s an order to it that needs to be preserved. My old way to get a one-item-per-line list from this would be to copy the LaTeX, paste it into a new BBEdit document, and do a few obvious edits to turn it into a tab-separated-values (TSV) file. I’d then open the TSV file in Numbers, copy/paste the second and third columns below the first, and save it back out, still as TSV.
This was not an especially onerous process, but writing out the details of each step sure made it seem like one. Also, it made me acutely aware of how often I went through these steps and how often I caught myself making mistakes by doing things in the wrong order. Even though my Hints wiki is only for my own use, I couldn’t bring myself to memorialize such a clumsy procedure.1
So I built a simple Keyboard Maestro macro that does the work for me. Here’s what it looks like:
I select the data in the LaTeX table, run the macro, and my clipboard then has the data in a single column that I can paste into a new BBEdit document. The key step is the shell script, which is a pipeline composed of three steps:
sed -E '/^\\/d;s/ +\& +/\t/g;s/ *\\\\$//' |\
rs -c -T |\
rs 0 1
The sed command deletes all lines that start with a backslash, changes the space-ampersand-space between each column into a tab, and deletes the trailing space and double backslash. For the example table above, it would output
3603 303 1704
1701 3205 3206
203 603 4001
1107 2708 1308
601 2705 808
701 4407 902
2001 4207 1008
1104 3303 3501
2001 4401 303
2702 405 2707
2403 3107 4301
3203 2407 404
903 3708 3708
4008 703 4205
3301 4402 1107
3805 3305 3405
4108 4102 2304
1406 4306 502
4002 3502 305
607 2607 4101
(I’m showing it here with spaces between the columns to make it look nice. In the macro, it puts tabs between the columns.)
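To see the three edits separately, here is a minimal sketch that wraps the same sed step in a shell function and splits it into one -e expression per edit. The function name latex_rows_to_tsv is made up for illustration, and the ampersand is left unescaped in the pattern because it is only special in the replacement part. It assumes a sed that accepts \t in the replacement, as the macro's does.

```shell
# Hypothetical helper: same three edits as the sed step in the pipeline.
#   1. delete lines that start with a backslash (\addlinespace, \midrule, ...)
#   2. turn each space-ampersand-space column separator into a tab
#   3. strip the trailing spaces and double backslash
latex_rows_to_tsv() {
  sed -E \
    -e '/^\\/d' \
    -e 's/ +& +/\t/g' \
    -e 's/ *\\\\$//'
}

# Try it on two data rows with a spacing command between them.
printf '%s\n' \
  '3603 & 303 & 1704 \\' \
  '\addlinespace' \
  '701 & 4407 & 902 \\' | latex_rows_to_tsv
```

The output should be the two data rows with tabs between the fields and the \addlinespace line gone.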
The first rs command transposes (-T) the data from column-by-column to row-by-row. The -c option tells rs to use tabs as column separators in the input. This turns the 20×3 input into 3×20 output. The second rs command takes that and reformats it into a single column.2 Keyboard Maestro then puts the single-column output onto the clipboard.
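On a machine without rs, the transpose-and-flatten part could probably be done with awk instead. This is a sketch of an alternative, not what the macro uses: it reads tab-separated rows, stores every cell, and prints them column by column. The function name columns_to_list is hypothetical.

```shell
# Hypothetical helper: read tab-separated rows on stdin and print the
# cells one per line in column-major order (all of column 1, then all
# of column 2, and so on).
columns_to_list() {
  awk -F '\t' '
    {
      for (c = 1; c <= NF; c++) cell[NR, c] = $c   # store each cell by (row, column)
      if (NF > ncols) ncols = NF                   # remember the widest row
    }
    END {
      for (c = 1; c <= ncols; c++)
        for (r = 1; r <= NR; r++)
          if ((r, c) in cell) print cell[r, c]
    }'
}

# A 3x2 input comes out as 1 through 6, one number per line.
printf '1\t4\n2\t5\n3\t6\n' | columns_to_list
```

Because it holds the whole table in memory before printing, this only suits small tables like the one above, which is all the macro needs to handle.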
I once read something by one of the early Unix programmers (Dennis Ritchie, I think) who said that an advantage of programmers writing their own man pages was that having to give a full description of how the code worked (and, in the BUGS section, how it didn’t) was a strong incentive to go back and make the code better. No one wanted others to see their ungraceful work. I don’t even want to see it myself.
1. You might be wondering why I didn’t create a one-entry-per-line file right off the bat, at the same time I was creating the LaTeX table. It’s because I didn’t know when I was making these tables that I’d ever have a need for a single-column version.
2. I feel certain there’s a way to do this with a single rs command, but I haven’t found it. I keep thinking rs -c -t 0 1 should work, but it doesn’t.