Erm

I did kind of a van Hœtty thing on Twitter today, and I’d like to explain myself. In doing so, I will become even more van Hœtty.

Jean MacDonald (@macgenie) of Smile Software tweeted a short shell script that could be used as a TextExpander snippet to strip out dashes from phone numbers. The idea is that you have a phone number (with dashes) on your clipboard, and you want to paste it into a web form, but the idiot designers of the site don’t know how to do regular expressions in JavaScript and insist that you enter the number without dashes.

(shell script snippet to remove dashes, thx to @macgreg)

#! /bin/tcsh

echo -n `pbpaste` | perl -e 'while (<>) { s/-//g; print $_; }'

1:43 PM Thu Dec 1, 2011

The @macgreg in the tweet is Greg Scown, also of Smile.

Here’s the thing: I’m no software developer. I have a lot of respect for good software developers, and Smile makes great applications. I can’t, for example, imagine working without TextExpander. So I really have no business telling folks at Smile how to write a snippet for their own product.

But that’s not a good shell script. Let’s rewrite it so the line breaks don’t get messed up and see why.

#!/bin/tcsh

echo -n `pbpaste` | perl -e 'while (<>) { s/-//g; print $_; }'

We’ll start with the first line, which invokes the tcsh shell. Tcsh is an enhanced version of the old Berkeley C-shell, csh. Once upon a time, it was the default shell in OS X’s Terminal, but that hasn’t been the case since Panther, when bash replaced it. For a simple two-command pipeline like this, it doesn’t really matter which shell you use, but I think seeing tcsh instead of bash in the shebang line would throw off most new shell scripters.

Also, many experienced scripters dislike the C-shell intensely. I haven’t used the C-shell more than a few times, but if Tom Christiansen says it sucks, I’m guessing it sucks.

Moving on, we come to the first command in the pipeline:

echo -n `pbpaste`

The backticks run the pbpaste command and hand its output, the contents of the clipboard, to echo as arguments; echo then sends them to standard output. But pbpaste already sends the clipboard to standard output, so the echo is redundant. Better to just use pbpaste by itself and call only one command instead of two.

(I suppose you could argue that the echo really does serve a purpose: the backticks strip any trailing newlines from the pbpaste output, and the -n option keeps echo from adding one back. I would say that using echo to strip newlines is misuse. If you really need to strip newlines, there are better ways.)
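Here’s a rough sketch of what’s going on with the newlines. Since pbpaste exists only on a Mac, I’m using a made-up function, fake_pbpaste, as a stand-in; the phone number is invented too. The byte counts show that the backtick-and-echo route drops the trailing newlines, and that tr can do the same job while saying what it means.

```shell
#!/bin/bash
# fake_pbpaste is a hypothetical stand-in for pbpaste:
# it emits a 12-character number followed by two newlines.
fake_pbpaste() { printf '555-867-5309\n\n'; }

# Piped directly, all 14 bytes arrive, newlines included.
direct_bytes=$(fake_pbpaste | wc -c | tr -d ' ')

# Via backticks and echo -n: the substitution drops the trailing
# newlines, and -n keeps echo from adding one back.
echoed_bytes=$(echo -n `fake_pbpaste` | wc -c | tr -d ' ')

# Deleting the newlines explicitly with tr gives the same count.
tr_bytes=$(fake_pbpaste | tr -d '\n' | wc -c | tr -d ' ')

echo "$direct_bytes $echoed_bytes $tr_bytes"   # 14 12 12
```

The extra tr -d ' ' after wc is only there because some versions of wc pad their output with spaces.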

The second command in the pipeline takes the output of the first and deletes all the hyphens:

perl -e 'while (<>) { s/-//g; print $_; }'

First, when you see a while (<>) {} construct in a Perl one-liner, it’s a good bet that the programmer’s a Perl neophyte. And when you see print $_, it’s a sure bet. Here’s an idiomatic Perl one-liner that does the same thing:

perl -pe 's/-//g'

The -p option tells Perl to go through the file line-by-line, do whatever command is called for by the -e option (in this case, the substitution), and print the result. It’s basically the while (<>) and the print $_ boiled down to a single letter.

But frankly, Perl isn’t the best tool for this job. There’s an older Unix command specifically designed to substitute (or translate) characters: tr.

In its basic form, tr takes two strings as arguments. It goes through the standard input fed to it, translating each character from the first string into the corresponding character of the second string. With the -d option, it takes only one string as its argument and deletes every instance of every character in that string. So a shorter, more focused replacement for the Perl one-liner would be

tr -d '-'
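As a small illustration of the two modes, here’s a sketch with a pair of wrapper functions. The function names and the sample numbers are mine, invented for the example; only tr itself comes from the discussion above.

```shell
#!/bin/bash
# Two-string form: each character in the first string is
# translated to the corresponding character in the second.
dots_to_dashes() { tr '.' '-'; }

# One-string form with -d: every character in the string is deleted.
strip_dashes() { tr -d '-'; }

printf '555.867.5309\n' | dots_to_dashes   # 555-867-5309
printf '555-867-5309\n' | strip_dashes     # 5558675309
```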

Thus, the TextExpander shell snippet as a whole would be

#!/bin/bash
pbpaste | tr -d '-'

Boom.

Phone numbers can, of course, have more “junk” characters than just hyphens. To get rid of parentheses, spaces, periods, slashes, and any stray newlines that might be at the end, we could do this. (The hyphen goes last in the set so tr treats it as a literal character and not as the middle of a range.)

#!/bin/bash
pbpaste | tr -d '() ./\n-'

That does more than the original script, and it’s still shorter and easier to read, and it uses fewer command calls.
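One thing to watch when building a set for tr: a hyphen sitting between two characters means a range, so ' -/' is every character from space through slash, not just those three. Here’s a sketch of the difference, using an invented number with a leading plus sign and printf in place of pbpaste.

```shell
#!/bin/bash
# Invented sample with a country-code prefix.
sample='+1 (555) 867-5309'

# ' -/' is a range from space through slash, which also
# covers '(', ')', '+' and '-', so the plus sign is lost.
range_result=$(printf '%s\n' "$sample" | tr -d ' -/')

# With the hyphen last it is a literal character, so only
# the characters actually listed are deleted.
literal_result=$(printf '%s\n' "$sample" | tr -d '() ./-')

echo "$range_result"     # 15558675309
echo "$literal_result"   # +15558675309
```

Whether you want to keep the plus sign depends on the form you’re pasting into, but it’s better to decide that on purpose than by accident.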

We could generalize our snippet even more by using another tr option: -c. This option tells tr to use the complement of the set of characters in the string argument. So instead of giving tr a list of characters we want to delete, we can give it a list of characters we want to preserve.

#!/bin/bash
pbpaste | tr -dc '0123456789'

Or, since tr understands the same POSIX character classes that grep does:

#!/bin/bash
pbpaste | tr -dc '[:digit:]'

This, finally, is a pretty readable little script. It takes what’s on the clipboard,[1] deletes everything that isn’t a digit, and sends what’s left to standard output.
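To check that the two spellings of the digits-only filter behave alike, something like the following sketch should work. The function names keep_listed and keep_class are hypothetical, the sample is made up, and printf again stands in for pbpaste.

```shell
#!/bin/bash
# Hypothetical wrappers around the two forms of the filter.
keep_listed() { tr -dc '0123456789'; }
keep_class()  { tr -dc '[:digit:]'; }

# Invented sample with assorted junk and an extension.
sample='+1 (555) 867-5309 x42'

# The bare echo supplies a newline for display, because
# tr -dc deletes the input's newline along with everything else.
printf '%s\n' "$sample" | keep_listed; echo   # 1555867530942
printf '%s\n' "$sample" | keep_class;  echo   # 1555867530942
```

Note that the digits of the extension survive too. Keeping only digits means keeping all of them, which may or may not be what a given web form wants.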

I’ve tweeted bits and pieces of this since this afternoon, starting with the workalike pbpaste | tr -d '-' and then extending it to get rid of all non-digits. (When I first saw her tweet, I didn’t really know what Jean intended to use the script for. A bit of back-and-forth with J.F. Brissette [@ProfMac] got me straightened out, which led to the generalization.)

Although I generally prefer scripting in a “real language,” the old Unix tools really shine when you’re doing text transformations like this. They were written by very smart people and have been improved over decades of experience. And as I’ve said before, Apple’s addition of pbcopy and pbpaste fits nicely into that ecosystem.

So.


  1. Which is sometimes known in developer documents as the pasteboard, hence the pb prefix.