Reshaping text

It’s not often that I run into a new (to me) Unix command. But when I had a problem yesterday, I came across a simple solution using the rs command, which I’d never heard of before. Maybe you’ll find it useful, too.

As I mentioned in a post last year, I often have to deal with lists of alphanumeric strings. Usually, these are serial numbers, but the lists I was dealing with yesterday were lists of apartment numbers in a high-rise building. After various manipulations and comparisons, I had two lists that I wanted to send to a client in an email so we could discuss them. I could have put the lists into a spreadsheet, but that would mean the client would have to open the email and then open the attachment. If you knew the level of computer literacy my clients have, you would understand why I wanted to avoid that and just have the lists in the body of the email.

The problem was that the lists were long. One had 84 items and the other had 30. Pasting them into the message in the form I had them—one item per line—would take up too much space. Pasting them in as a comma-separated series—which I could convert them to with a simple search-and-replace—would be very hard to read. What I wanted was a nicely formatted set of rows and columns that would be easy to read without taking up too much space. But I couldn’t think of a quick, automated way to put the list into that form.1

After some Googling, I came across this Ask Ubuntu question, which led me to rs, an old BSD command that comes installed on macOS. The name stands for “reshape,” and it will take any row/column set of data and rearrange it into a different number of rows and columns. Because my lists were a special case—just one column each—they fit the input requirements.

With my list of 84 items on the clipboard, I tried the simplest form of the command,

pbpaste | rs 14

and got this output:

203   204   207   215   302   306
1502  1507  1509  1510  1512  1515
1610  1700  1701  1704  1705  1706
1710  1801  1803  1806  1809  1815
1903  1905  2000  2001  2002  2004
2009  2101  2102  2103  2105  2106
2108  2112  2201  2203  2208  2209
2302  2306  2307  2309  2406  2407
2503  2504  2506  2509  2515  2606
3003  3007  3008  3009  3100  3201
3202  3205  3208  3300  3301  3415
3500  3505  3506  3600  3606  3700
3706  3715  3807  3815  3900  3907
4009  4111  4115  4203  4306  4308

The first argument to rs is the number of rows in the reshaped output. Since 84 is the product of 6 and 14, I got 14 rows of 6 items each. This was fine, but I wanted a little more room between the columns. In rs parlance, the intercolumn space is called the gutter, and you set the size of the gutter with the -g option. So

pbpaste | rs -g5 14

gave me

203      204      207      215      302      306
1502     1507     1509     1510     1512     1515
1610     1700     1701     1704     1705     1706
1710     1801     1803     1806     1809     1815
1903     1905     2000     2001     2002     2004
2009     2101     2102     2103     2105     2106
2108     2112     2201     2203     2208     2209
2302     2306     2307     2309     2406     2407
2503     2504     2506     2509     2515     2606
3003     3007     3008     3009     3100     3201
3202     3205     3208     3300     3301     3415
3500     3505     3506     3600     3606     3700
3706     3715     3807     3815     3900     3907
4009     4111     4115     4203     4306     4308

Note that the number used for the gutter size has to come immediately after the -g. You can’t leave a space between the option and its argument as you can with many other commands.

Again, this was nice, but I thought it would be better if the list were laid out column-by-column instead of row-by-row. That called for the -t (transpose) option,

pbpaste | rs -g5 -t 14

which gave me

203      1701     2002     2302     3008     3606
204      1704     2004     2306     3009     3700
207      1705     2009     2307     3100     3706
215      1706     2101     2309     3201     3715
302      1710     2102     2406     3202     3807
306      1801     2103     2407     3205     3815
1502     1803     2105     2503     3208     3900
1507     1806     2106     2504     3300     3907
1509     1809     2108     2506     3301     4009
1510     1815     2112     2509     3415     4111
1512     1903     2201     2515     3500     4115
1515     1905     2203     2606     3505     4203
1610     2000     2208     3003     3506     4306
1700     2001     2209     3007     3600     4308

Finally, I wanted the three-digit numbers properly aligned. You right-justify the items within their columns using the -j option,

pbpaste | rs -g5 -tj 14

which prints out

 203     1701     2002     2302     3008     3606
 204     1704     2004     2306     3009     3700
 207     1705     2009     2307     3100     3706
 215     1706     2101     2309     3201     3715
 302     1710     2102     2406     3202     3807
 306     1801     2103     2407     3205     3815
1502     1803     2105     2503     3208     3900
1507     1806     2106     2504     3300     3907
1509     1809     2108     2506     3301     4009
1510     1815     2112     2509     3415     4111
1512     1903     2201     2515     3500     4115
1515     1905     2203     2606     3505     4203
1610     2000     2208     3003     3506     4306
1700     2001     2209     3007     3600     4308

I did a similar thing with the list of 30 and then sent off a nice, compact email.

I should mention that rs is reasonably smart about leaving blanks at the ends of rows or columns. If there were 79 items in my list instead of 84,

pbpaste | rs -g5 -tj 14

would give

 203     1701     2002     2302     3008     3606
 204     1704     2004     2306     3009     3700
 207     1705     2009     2307     3100     3706
 215     1706     2101     2309     3201     3715
 302     1710     2102     2406     3202     3807
 306     1801     2103     2407     3205     3815
1502     1803     2105     2503     3208     3900
1507     1806     2106     2504     3300     3907
1509     1809     2108     2506     3301     4009
1510     1815     2112     2509     3415
1512     1903     2201     2515     3500
1515     1905     2203     2606     3505
1610     2000     2208     3003     3506
1700     2001     2209     3007     3600

with the empty “cells” at the end of the last column. Without the -t option,

pbpaste | rs -g5 -j 14

gives

 203      204      207      215      302      306
1502     1507     1509     1510     1512     1515
1610     1700     1701     1704     1705     1706
1710     1801     1803     1806     1809     1815
1903     1905     2000     2001     2002     2004
2009     2101     2102     2103     2105     2106
2108     2112     2201     2203     2208     2209
2302     2306     2307     2309     2406     2407
2503     2504     2506     2509     2515     2606
3003     3007     3008     3009     3100     3201
3202     3205     3208     3300     3301     3415
3500     3505     3506     3600     3606     3700
3706     3715     3807     3815     3900     3907
4009

with the empty cells at the end of the last row.

There are lots of options for dealing with different types of input and generating different types of output. I don’t see any value in trying to remember them; I can look them up in the man page as needed. The most important thing is to know that rs exists.


  1. Yes, I could use column selection in BBEdit to edit my lists into rows and columns by hand, but if you think I’d do that, you haven’t been reading this blog very long. 


No reason to get excited

Last week’s Apple Event didn’t give me the push to get a new watch that I was expecting. Yes, my Series 3 won’t accept watchOS 9, but I’ve known that since WWDC. And while both the SE and the Series 8 are clear improvements over the Series 3, I’m not sure what I will get out of them. As I’ve mentioned before, my 3’s response to taps and swipes has become sluggish since the upgrade to watchOS 8, but it still tracks my daily movements and handles my notifications well. Will I get $250 or $400 of value out of a new model? If battery life were a problem, the answer would be yes, but I still get a full day’s use on a charge.

One thing that might have put me on the upgrade path happened last Tuesday morning as I was preparing for a business trip. I wanted to switch out my colorful orange and blue bands for a more staid black set. But I couldn’t slide the blue one out of the watch—the release button wouldn’t push in.

When I got back from the trip, I went to my local Apple Store and was handed off to a technician. She told me she’d have no trouble removing the band, but it was unlikely to survive the process. I said that was fine, as I’d already started a small tear on the stubborn band when I tried to remove it on Tuesday.

Also, this is not a special orange-and-blue band—it’s one half of an orange band and the other half of a blue band. I bought the two cheap third-party bands shortly after getting the watch, and I still have the other two halves. I hadn’t expected them to last this long.

After 5–10 minutes, the tech came out from the back room with my pieces. As expected, the release button on the watch was fine; it was the lugs on the band that had crapped out. Common in third-party bands, she said. Which I’m sure is true, but I spent a total of $18 on two bands over four years ago, and I still have, in practical terms, one band left. Hard to argue with that kind of value.

Watch with broken band

I came home, dug out the other halves, and put them on the watch. Smooth as silk. I have to say, though, after four-plus years of having the blue part coming off the top of the watch and the orange part coming off the bottom, having the colors reversed looks a little weird.

After putting on the new/old straps, I noticed that I had the orientation of the broken piece wrong in the photo above. I was too lazy to remove the new straps and reshoot the photo, but I did reorient the broken piece and push it together with its mate for a closeup photo. You can see how the spring-loaded button in the center came apart.

Closeup of broken band

A Series 3 that was stuck with one (slightly torn) band might have induced me to get a new watch. Now that it’s back to full functionality, I’ll have to think harder. I’m not sure what any of it is worth.


My contribution to Markdown

I was listening to the most recent episode of The Talk Show this morning, and my extremely important contribution to Markdown came up. Sort of.

Rosemary Orchard is the guest, and she does a great job with the single most important task of a Talk Show guest: staying engaged and making relevant contributions while Gruber is on a long digression. Lots of guests find it hard to balance letting Gruber go while also making sure the audience knows they’re still there. Rosemary does this perfectly.

Anyway, it was during a digression—actually a digression within a digression—that Gruber talked about code blocks in Markdown and how one of his favorite features is that you don’t have to escape anything in a code block. You can paste source code directly into your Markdown document without any changes, and it will appear as expected in the rendered HTML.1 That’s my doing.

In 2004, any code that contained backslashes (\) was likely to need some editing when placed in a Markdown code block. Backslashes weren’t treated literally within a code block; they acted as escape characters and made the output look different from the input. Very un-Markdown-ish.

I pointed this out in the Markdown mailing list, and Gruber agreed that it should be changed. In the next Markdown release—which was, I believe, his last—he made the change, and all the text in code blocks has been treated literally ever since.

Undoubtedly, as Markdown became more popular, someone else would have pointed out this problem. Gruber himself would have been annoyed by it if he ever needed to write a code block with backslashes in it. But I was there first. And you’re welcome.


  1. The code block has to be indented, of course, or—in many implementations, but not Gruber’s—surrounded by fences.


Filtering my RSS reading

A couple of weeks ago, I decided to cut back on my RSS feed reading.1 Not by reducing the number of feeds I’m subscribed to, but by filtering articles to eliminate those that would just be a waste of my time. The change was inspired by a particularly stupid post by Erik Loomis at Lawyers, Guns & Money. I realized that in all the years I’ve been reading LGM, I’ve liked very few Loomis articles. I start out thinking “maybe this one will be different,” but it seldom is. I just needed to cut him out.

My feedreader is NetNewsWire, which has been working well for me since I started using it about a year ago. Although there’s been some talk of adding filtering to NNW, it hasn’t happened yet. So what I need to do is set up filtered feeds and subscribe to them.

In olden times, I might have used Yahoo Pipes to do the filtering. Today’s equivalents are Zapier and IFTTT. After a bit of reading, it seemed like the parts of Zapier I’d need would require a $20/month subscription. And while I feel certain IFTTT could do what I wanted, I’m not interested in learning to write IFTTT applets—if I’m going to write filtering code, I’d rather do it in a more general-purpose way.

I could subscribe to Feedbin or a similar service and point NetNewsWire to my subscription. This would be the right choice if, in addition to filtering, I wanted to fold a bunch of other things Feedbin does—like email newsletters, for example—into my RSS reading, but I’m not interested in that. If I’m going to spend $5/month, I’ll get a lot more out of a low-end virtual machine at Linode or Digital Ocean, which could host both my RSS filtering and other cloud-related services I build. And since I already have such a subscription…

My approach is very Web 1.0. For each feed I want to filter, I create a CGI script on my server. The script reads the original feed, filters out the articles I don’t want, and returns the rest. The URL of that script is what I subscribe to in NetNewsWire.

So what should the script be? My first thought was to use Python. It has the feedparser library, which I’ve used before. It parses the feed—from almost any format—and builds a dictionary from it. At that point, it’s easy to filter the articles using standard dictionary methods. Unfortunately, the filtered dictionary then has to be converted back out into a feed, which feedparser can’t do. I got around this by printing out the filtered dictionary as a JSON Feed. Since Brent Simmons is the driving force behind both NetNewsWire and the JSON Feed standard, I knew NNW would be able to parse the output of my filtering script.
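
For what it’s worth, here’s a bare-bones sketch of that first approach. It isn’t my actual script, just the general idea, with the JSON Feed fields trimmed to the essentials, and it assumes feedparser is installed on the server:

python:
#!/usr/bin/env python3

# Hypothetical sketch of the feedparser/JSON Feed filter described above.
# Field handling is simplified and dates are omitted.

import json
import feedparser

FEED_URL = 'https://www.lawyersgunsmoneyblog.com/feed'

# Parse the original feed into feedparser's dictionary-like structure.
d = feedparser.parse(FEED_URL)

# Keep every entry whose author isn't Loomis. feedparser normalizes
# <dc:creator> into each entry's author field.
items = []
for e in d.entries:
    if 'Loomis' in e.get('author', ''):
        continue
    items.append({
        'id': e.get('id', e.get('link', '')),
        'url': e.get('link', ''),
        'title': e.get('title', ''),
        'content_html': e.get('summary', ''),
    })

# Assemble a minimal JSON Feed and print it behind a CGI header.
feed = {
    'version': 'https://jsonfeed.org/version/1',
    'title': d.feed.get('title', ''),
    'home_page_url': d.feed.get('link', ''),
    'items': items,
}

print('Content-Type: application/json')
print()
print(json.dumps(feed))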

This worked fine, and I used it for a couple of days, but it felt wrong. RSS and Atom feeds are XML files, and XML is supposed to be filtered using XSLT. The thing is, I haven’t used XSLT in ages, and I didn’t much care for it then. It was invented back when clever people thought everything was going to be put in XML format, so they built a programming language in XML. I’m sure they thought this was great—just like Lisp programs being written as Lisp lists—but it wasn’t. I’m sure there are many reasons XML hasn’t turned out to be as revolutionary as was thought 20 years ago, but one of them has to be the shitty language used for XML transformations.

Still, all I wanted to do was search for certain text in a certain node and prevent those records from appearing in the output. Everything else would be passed through as-is. Sal Mangano’s XSLT Cookbook has an example of a simple pass-through XSLT file (also known as the identity transform), which I used as the basis for my script:2

xml:
1:  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
2:   
3:  <xsl:template match="node() | @*">
4:    <xsl:copy>
5:      <xsl:apply-templates select="@* | node()"/>
6:    </xsl:copy>
7:  </xsl:template>
8:   
9:  </xsl:stylesheet>

XSLT is a rule-based language. The rules define how the various elements of the incoming XML document are to be treated. In the pass-through example, the match in the template rule on Line 3 matches all the elements (node()) and all the attributes (@*). The copy command then copies whatever was matched, which was everything.

With the pass-through rule in place, the script can be expanded to add additional rules that are more specific matches to particular elements or attributes. The Lawyers, Guns & Money feed identifies the author of each post this way:

xml:
<item>
  [other tags]
  <dc:creator><![CDATA[Erik Loomis]]></dc:creator>
  [more tags]
</item>

So I needed to add a rule to the pass-through script that matches <item>s with “Loomis” in the <dc:creator> element and does nothing with them. Here’s what I came up with:

xml:
 1:  <xsl:stylesheet version="1.0"
 2:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 3:    xmlns:dc="http://purl.org/dc/elements/1.1/">
 4:  
 5:    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
 6:  
 7:    <xsl:template match="node() | @*">
 8:      <xsl:copy>
 9:         <xsl:apply-templates select="node() | @*"/>
10:      </xsl:copy>
11:    </xsl:template>
12:  
13:    <xsl:template match="item[contains(dc:creator, 'Loomis')]"/>
14:  </xsl:stylesheet>

You can see the namespace addition in Line 3 and the new rule in Line 13, which matches any <item> with “Loomis” in its <dc:creator> element. Because there’s no action within this rule, nothing is done with the matching items. And by “nothing,” I really mean nothing—there’s no output associated with this rule, which means Loomis’s posts are omitted.

With this XSLT file in place, I just needed a shell script to download the original feed and process it through the filter.

bash:
 1:  #!/bin/bash
 2:  
 3:  echo "Content-Type: application/rss+xml"
 4:  echo
 5:  
 6:  curl -s  https://www.lawyersgunsmoneyblog.com/feed \
 7:  | xsltproc loomis-filter.xslt -

Lines 3–4 provide the header and blank separator line. Lines 6–7 contain the pipeline that downloads the LGM feed via curl and passes it to xsltproc for filtering with the above XSLT file. xsltproc is part of the GNOME XML/XSLT project. It’s not the most capable XSLT processor around (it’s limited to XSLT 1.0, which is missing a lot of nice features), but it’s perfectly fine for this simple application, and it’s quite fast.

Assuming the CGI shell script is named filtered-lgm-feed and it’s on a server called mycheapserver.com, the URL I use for the subscription is

https://mycheapserver.com/cgi-bin/filtered-lgm-feed

Once I had this filtered feed working, I thought of other parts of my regular reading that could use some pruning. Here’s the filter I wrote for the Mac Power Users forum:

xml:
 1:  <xsl:stylesheet version="1.0"
 2:    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 3:    xmlns:dc="http://purl.org/dc/elements/1.1/">
 4:    
 5:    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
 6:    
 7:    <xsl:template match="node() | @*">
 8:      <xsl:copy>
 9:         <xsl:apply-templates select="node() | @*"/>
10:      </xsl:copy>
11:    </xsl:template>
12:    
13:    <xsl:template match="item[contains(translate(title, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'obsidian')]"/>
14:    <xsl:template match="item[contains(translate(title, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'devon')]"/>
15:  </xsl:stylesheet>

I wish all of you who use Obsidian and DEVONthink the best, but I don’t want to read about them anymore.

The translate function in Lines 13 and 14 converts all uppercase letters to lowercase before passing the result on to the contains function. Unlike the previous filter, which expects “Loomis” to have consistent capitalization (it does), this one doesn’t trust the forum users to capitalize the trigger words in any standardized way. This is especially important for the various products from DEVONtechnologies, which get almost every possible permutation of capitalization: DevonThink, devonTHINK, DevonTHINK, etc.

Using translate is a verbose way of making the change, but unfortunately XSLT 1.0 doesn’t have a lower-case function. XSLT 2.0 does, but xsltproc doesn’t support XSLT 2.0. Saxon, a Java-based XSLT processor, does, and for a while I had an XSLT 2.0 version of the MPU filter running through it. But it was way slower than using xsltproc, so I returned to the clumsier filter you see above.

The script that runs the filter and returns the Obsidian- and DEVON-less MPU posts looks pretty much like the LGM script:

bash:
#!/bin/bash

echo "Content-Type: application/rss+xml"
echo

curl -s  https://talk.macpowerusers.com/latest.rss \
| xsltproc topic-filter.xslt -

Although this post is kind of long-winded, building the filters didn’t take much time. It’s easy to download an RSS feed and look through it to see which nodes and attributes to use for the filter. Now that I have a couple of examples to build on, I expect to be adding more filters soon.


  1. Throughout this post, I’ll be using “RSS” as a catch-all term for any kind of feed, regardless of format. My apologies to all the Atom fans out there. 

  2. No, I don’t own the XSLT Cookbook, but my local library provides its patrons with a subscription to O’Reilly’s ebooks and courseware. It’s a good service, and you should look into whether your library does the same.