A stolen word count Quick Action

I mentioned yesterday that installing NetNewsWire brought back some old RSS feeds that I had mistakenly dropped somewhere along the road from the old NetNewsWire to Google Reader to my homemade RSS reader. One of them is Erica Sadun’s blog. A lot of what she writes is too deep into Xcode and real programming for me to understand, but her more elementary stuff is at my level.[1]

A few months ago, she wrote about a simple, two-step word-counting Quick Action she built in Automator. It takes whatever text you have selected—in any app—and pops up a window with the word and character counts.[2]

[Image: Sadun Word count]

The first step is this shell script:

echo `echo $1 | wc -w` words. `echo $1 | wc -c` characters.

And the second step is this AppleScript:

on run {input, parameters}
  display dialog input as string buttons {"OK"}
end run

After copying these steps outright, I decided to make a few changes. First, I noticed that the results window didn’t have a title, and the OK button wasn’t set to be the default—tapping the Return key wouldn’t dismiss it. So I made a couple of additions to the AppleScript:

on run {input, parameters}
  display dialog input as text buttons {"OK"} default button 1 with title "Word Count"
end run

As for the shell script, I felt a little nervous about passing a long stretch of text in as an argument, so I decided to change the script to this:

wc -wc | awk '{printf "%d words and %d characters", $1, $2}'

and have the selected text come in as standard input instead of as an argument. The output of wc -wc is a single line with a pair of numbers separated by whitespace. Awk is perfect for handling text like this because it reads stdin automatically, one line at a time, splits each line on whitespace, and assigns the resulting fields to the variables $1, $2, $3, etc.
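To make the field splitting concrete, here is a rough Python sketch of what awk does with the one line that wc -wc produces. The sample string is an assumption (the exact padding varies between platforms), and `fields` is just an illustrative name, not anything from the Quick Action itself.

```python
# Sample of what `wc -wc` prints for stdin input: two right-aligned numbers.
# The exact padding is an assumption; it differs between platforms.
sample = "       7      42\n"


def fields(line):
    """Split a line on runs of whitespace, as awk does by default."""
    return line.split()


f = fields(sample)
# awk numbers its fields from 1, so $1 corresponds to f[0] and $2 to f[1].
print(f"{int(f[0])} words and {int(f[1])} characters")
```

The leading spaces and the trailing newline disappear in the split, which is why the awk command can refer to $1 and $2 without any trimming.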

This worked well, and if I were smart I would’ve stopped there. But I thought about using this Quick Action on longer stretches of text and how it wouldn’t format the numbers with commas at the hundreds/thousands boundary. Large counts would be easier to read with commas.

As it happens, awk’s printf command inherits a formatting code from the system printf that puts commas at the appropriate places. The code, unfortunately, is %'d, and the single straight quotation mark is a pain in the ass when constructing a shell command. I’d like to be able to use this in the pipeline:

awk '{printf "%'d words and %'d characters", $1, $2}'

but that won’t work because the shell interprets all the single quotation marks as string delimiters—the ones I want to use as formatting codes never get to awk.

As is often the case in shell scripting, there’s a way around this, but it’s incredibly confusing and ugly:

awk '{printf "%'"'"'d words and %'"'"'d characters", $1, $2}'

I found this solution here, and it took me a while to figure out how it works. Basically, it’s concatenating five separate strings, which you can probably see better if I put each one on its own line (in the real command they have to be jammed together with no spaces between them):

'{printf "%'
"'"
'd words and %'
"'"
'd characters", $1, $2}'

Two of the strings (the first and the last) are delimited by single quotes and contain a double quote. Two others (the second and the fourth) are delimited by double quotes and consist entirely of a single quote. Also, it’s important that the variables $1 and $2 are in single quotes to keep the shell from interpreting them before awk gets a chance to. When all of these are put together, this is the command string that awk sees:

{printf "%'d words and %'d characters", $1, $2}

Whew!
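If you'd rather not take the quoting on faith, one way to check it is to let Python's shlex module tokenize the command with POSIX shell quoting rules. This is only a sketch: shlex mimics the shell's quote handling but isn't the shell itself, and `COMMAND` and `awk_program` are names made up for the illustration.

```python
import shlex

# The awk command exactly as it would be typed at the shell prompt.
COMMAND = r"""awk '{printf "%'"'"'d words and %'"'"'d characters", $1, $2}'"""


def awk_program(command):
    """Return the program text awk would receive once the quotes are stripped."""
    argv = shlex.split(command)  # POSIX-style quote handling
    return argv[1]               # argv[0] is "awk"; argv[1] is its program


print(awk_program(COMMAND))
```

The five adjacent quoted strings come out as a single argument, with the two double-quoted single quotes landing right after the percent signs.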

And that wasn’t the end of it. Even though this awk command worked at the command line, it didn’t work in Automator because Quick Actions don’t run in my normal command line environment.

The problem was with the locale, which isn’t set in the environment under which Quick Actions run.[3] Luckily, the same web page that showed me the multiple quoting trick also showed me how to set the environment variable. Ultimately, the shell script step in my Quick Action was[4]

wc -wc | LC_ALL="en_US" awk '{printf "%'"'"'d words and %'"'"'d characters", $1, $2}'
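The locale dependence is easy to see from Python, too, because its locale module does the same kind of locale-aware grouping as the %'d code. The sketch below is an illustration under assumptions: `grouped` is a hypothetical helper, and the "en_US.UTF-8" locale name may not exist on every system (macOS also accepts plain "en_US"), so that call is wrapped in a try.

```python
import locale


def grouped(n, locale_name):
    """Format n with locale-aware digit grouping under the named locale."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # remember the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.format_string("%d", n, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # always restore it


# The bare "C" locale defines no thousands separator, so nothing is inserted.
print(grouped(1234567, "C"))

# With a US English locale, if one is installed, the separators appear.
try:
    print(grouped(1234567, "en_US.UTF-8"))
except locale.Error:
    print("en_US.UTF-8 locale is not available here")
```

The first call shows what happens with no locale set: the grouping request is silently ignored, which matches the awk behavior inside the Quick Action.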

Did I say “ultimately”? That was premature. The Quick Action worked fine with this shell script, but those five consecutive quotation marks bothered me. I knew I’d have trouble understanding them later (even if I had this blog post to explain them). I also knew that Python has a straightforward way to format numbers with commas. So I threw away the awk command and substituted in a longer, but easier to read, chunk of Python:

wc -wc | python3 -c 'import sys
w, c = map(int, sys.stdin.read().split())
print(f"{w:,d} words and {c:,d} characters")'

Python is a modular language and doesn’t automatically parse its input, so I needed to do some extra work on the front end. The second line reads in the standard input, splits it on whitespace, and converts the resulting strings into integers. That sets up the variables w and c to be interpreted by the f-string in the print command. This is distinctly longer than the awk solution, but it’s also distinctly clearer.[5]
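A side benefit worth noting is that the comma option in Python's format specification doesn't consult the locale at all, which is why the Python version gets by without LC_ALL. Here's a small sketch of just the formatting step; the function name `report` is invented for the example.

```python
def report(words, chars):
    """Build the dialog text from two integer counts."""
    # The "," option always groups digits with commas, whatever the locale.
    return f"{words:,d} words and {chars:,d} characters"


print(report(1234, 56789))  # 1,234 words and 56,789 characters
```

Counts under a thousand pass through unchanged, so short selections look the same as they did with the plain %d version.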

Here’s a screenshot of the Quick Action in Automator:

[Screenshot: Word Count Quick Action]

And here’s an example of its output:

[Screenshot: Word count window]

I hope Ms. Sadun forgives me for what I did to her simple automation.


  1. In this way, she’s a lot like Michael Tsai. I have to skip past many of his posts because they’re way over my head, but I stay subscribed for the ones I can follow. 

  2. For me, the value of this Quick Action isn’t for counting words I’m writing; BBEdit will tell me that in the status bar at the bottom of the window. This is for counting words in other people’s writing on (mainly) web pages. 

  3. You may have run into similar problems in which the Quick Action environment doesn’t set the PATH to what you expect. 

  4. Strictly speaking, setting LC_ALL is overkill. Just setting LC_NUMERIC to “en_US” would be sufficient to get the comma separators working. 

  5. Note that this script works only in Python 3. Apple has recently (starting with Catalina?) supplied Python 3 in addition to Python 2, but you have to install the Command Line Developer Tools to get it.