Erm
December 1, 2011 at 10:38 PM by Dr. Drang
I did a kind of van Hœtty thing on Twitter today, and I’d like to explain myself. In doing so, I will become even more van Hœtty.
Jean MacDonald (@macgenie) of Smile Software tweeted a short shell script that could be used as a TextExpander snippet to strip out dashes from phone numbers. The idea is that you have a phone number (with dashes) on your clipboard, and you want to paste it into a web form, but the idiot designers of the site don’t know how to do regular expressions in JavaScript and insist that you enter the number without dashes.
(shell script snippet to remove dashes, thx to @macgreg)
#! /bin/tcsh
echo -n `pbpaste` | perl -e 'while (<>) { s/-//g; print $_; }'
The @macgreg in the tweet is Greg Scown, also of Smile.
Here’s the thing: I’m no software developer. I have a lot of respect for good software developers, and Smile makes great applications. I can’t, for example, imagine working without TextExpander. So I really have no business telling folks at Smile how to write a snippet for their own product.
But that’s not a good shell script. Let’s rewrite it so the line breaks don’t get messed up and see why.
#!/bin/tcsh
echo -n `pbpaste` | perl -e 'while (<>) { s/-//g; print $_; }'
We’ll start with the first line, which invokes the tcsh shell. Tcsh is an enhanced version of the old Berkeley C-shell, csh. Once upon a time, it was the default shell in OS X’s Terminal, but that hasn’t been the case since Panther (10.3), when bash replaced it. For a simple two-command pipeline like this, it doesn’t really matter which shell you use, but I think seeing tcsh instead of bash in the shebang line would throw off most new shell scripters.
Also, many experienced scripters dislike the C-shell intensely. I haven’t used the C-shell more than a few times, but if Tom Christiansen says it sucks, I’m guessing it sucks.
Moving on, we come to the first command in the pipeline:
echo -n `pbpaste`
The backticks run the pbpaste command, which returns the contents of the clipboard and feeds the output to echo, which sends it to standard output. But pbpaste already sends the clipboard to standard output, so the echo is redundant. Better to just use pbpaste by itself and call only one command instead of two.
(I suppose you could argue that since the backticks strip any trailing newlines from the pbpaste output and the -n option keeps echo from adding one of its own, the echo really does serve a purpose. I would say that using echo to strip newlines is misuse. If you really need to strip newlines, there are better ways.)
The second command in the pipeline takes the output of the first and deletes all the hyphens:
perl -e 'while (<>) { s/-//g; print $_; }'
First, when you see a while (<>) {} construct in a Perl one-liner, it’s a good bet that the programmer’s a Perl neophyte. And when you see print $_, it’s a sure bet. Here’s an idiomatic Perl one-liner that does the same thing:
perl -pe 's/-//g'
The -p option tells Perl to go through the file line-by-line, do whatever command is called for by the -e option (in this case, the substitution), and print the result. It’s basically the while (<>) and the print $_ boiled down to a single letter.
But frankly, Perl isn’t the best tool for this job. There’s an older Unix command specifically designed to substitute (or translate) characters: tr.
In its basic form, tr takes two strings as arguments. It goes through the standard input fed to it, translating each character from the first string into the corresponding character of the second string. With the -d option, it takes only one string as its argument and deletes every instance of every character in that string. So a shorter, more focused replacement for the Perl one-liner would be
tr -d '-'
Thus, the TextExpander shell snippet as a whole would be
#!/bin/bash
pbpaste | tr -d '-'
Boom.
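For anyone who hasn't used tr before, here are its two modes side by side. This is just an illustrative sketch: the sample strings are mine, and printf stands in for pbpaste so it will run anywhere.

```shell
#!/bin/bash
# Translate mode: each character in the first string is replaced by the
# character at the same position in the second (e -> i, l -> p).
translated=$(printf 'hello\n' | tr 'el' 'ip')

# Delete mode: with -d, every character in the single string is removed.
deleted=$(printf '555-123-4567\n' | tr -d '-')

echo "$translated"   # hippo
echo "$deleted"      # 5551234567
```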
Phone numbers can, of course, have more “junk” characters than just hyphens. To get rid of parentheses, spaces, periods, slashes, and any stray newlines that might be at the end, we could do this:
#!/bin/bash
pbpaste | tr -d '() ./\n-'
That does more than the original script, and is still shorter, easier to read, and uses fewer command calls.
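One subtlety worth knowing when you build a set like this: tr reads a hyphen between two characters as a range, so a hyphen you mean literally is safest at the very end of the set. Here's a sketch that spells out each junk character explicitly. The strip_junk name and the sample numbers are hypothetical, and printf stands in for pbpaste.

```shell
#!/bin/bash
# strip_junk is a hypothetical helper name. The hyphen comes last in the
# set so tr treats it as a literal character rather than a range operator.
strip_junk() { tr -d '() ./\n-'; }

for number in '(555) 123-4567' '555.123.4567' '555/123/4567'; do
    printf '%s\n' "$number" | strip_junk
    echo    # strip_junk removed the newline, so add one back for display
done
```

All three formats should come out as the same ten digits.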
We could generalize our snippet even more by using another tr option: -c. This option tells tr to use the complement of the set of characters in the string argument. So instead of giving tr a list of characters we want to delete, we can give it a list of characters we want to preserve.
#!/bin/bash
pbpaste | tr -dc '0123456789'
Or, since tr understands character classes the same way grep does:
#!/bin/bash
pbpaste | tr -dc '[:digit:]'
This, finally, is a pretty readable little script. It takes what’s on the clipboard,[1] deletes everything that isn’t a digit, and sends what’s left to standard output.
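To see the complement trick on something messier than a dashed phone number, here's a small sketch. The digits_only name and the sample text are hypothetical, and printf again stands in for pbpaste.

```shell
#!/bin/bash
# digits_only is a hypothetical helper name wrapping the tr call.
digits_only() { tr -dc '[:digit:]'; }

# A made-up clipboard string with all sorts of non-digit junk.
messy='Call +1 (555) 123-4567, ext. 89'

# The character class and the spelled-out list should agree.
by_class=$(printf '%s\n' "$messy" | digits_only)
by_list=$(printf '%s\n' "$messy" | tr -dc '0123456789')

echo "$by_class"   # 1555123456789
echo "$by_list"    # 1555123456789
```

Because -c flips the set, the letters, punctuation, spaces, and the trailing newline all go away without being listed.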
I’ve tweeted bits and pieces of this since this afternoon, starting with the workalike pbpaste | tr -d '-' and then extending it to get rid of all non-digits. (When I first saw her tweet, I didn’t really know what Jean intended to use the script for. A bit of back-and-forth with J.F. Brissette [@ProfMac] got me straightened out, which led to the generalization.)
Although I generally prefer scripting in a “real language,” the old Unix tools really shine when you’re doing text transformations like this. They were written by very smart people and have been improved over decades of experience. And as I’ve said before, Apple’s additions, pbcopy and pbpaste, fit nicely into that ecosystem.
So.
[1] The clipboard is sometimes known in developer documents as the pasteboard, hence the pb prefix.