How many words have I written?

I’m about an hour into the current episode of The Talk Show (“Prison Oreos” with Jason Snell), and as I was listening to John Gruber talk about the word count statistics for Daring Fireball—two million words and growing—I thought about how I could do the same thing here.

My task is simplified by the way I have the blog structured. All the Markdown source files (and all the source files are in Markdown format) have a .md extension and are in a set of nested folders—a folder for each year and a folder for each month within each year.

Let’s start by going to the top level source folder and figuring out how many posts I’ve written.

find . -iname "*.md" | wc -l

The find command prints out all the .md files in the underlying folders, one per line, and the wc command counts the lines. That yields 2,395 blog posts.

A simple word count can be done by altering the pipeline a bit.

find . -iname "*.md" | xargs wc -w

The xargs command gets the file names from find and feeds them as arguments to wc. This prints out the word counts for each file in turn and then gives the total.

     562 ./2017/08/a-riveting-show.md
     345 ./2017/08/apple-sales.md
     569 ./2017/08/bulk.md
     203 ./2017/08/familiar-tools.md
     891 ./2017/08/my-jxa-problem.md
     537 ./2017/08/return-to-textexpander.md
     610 ./2017/08/subscriptions.md
 1422476 total

It must be those extra 600,000 words Gruber’s written that make his site more popular than mine.

But I really should refine my word count down. In 2008 and 2009, I had the bad idea to automatically post a summary of my tweets from the previous day. There were, I’m sorry to say, 333 such posts, and although all the words in them were mine, including them is cheating. They all had titles like “Tweets for January 15, 2009,” so we can filter them out of what find returns by looking for the string tweets-for- in the file name.

find -E . -not -regex '.+tweets-for-.+' -iname "*.md" | xargs wc -w

The -regex '.+tweets-for-.+' part finds files with the telltale string, and the -not excludes them from the list. The -E tells find to use “extended” regular expressions, which is the kind I’m familiar with.
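One detail that's easy to miss is that find's -regex has to match the whole path, not just a piece of it, which is why the pattern has .+ on both sides. Here's a rough JavaScript equivalent of the filter, run on a few made-up paths. The tweet file name is a guess at the format, not copied from the real archive.

```javascript
// Hypothetical paths; only the second resembles an auto-posted tweet summary.
const paths = [
  './2017/08/bulk.md',
  './2009/01/tweets-for-january-15-2009.md',
  './2017/08/notes.txt',
];

// find's -regex must match the entire path, hence the ^ and $ anchors here.
const isTweetPost = (p) => /^.+tweets-for-.+$/.test(p);

// -iname "*.md" is a case-insensitive match on the file name.
const isMarkdown = (p) => /\.md$/i.test(p);

// Keep Markdown files that are not tweet summaries.
const posts = paths.filter((p) => isMarkdown(p) && !isTweetPost(p));
console.log(posts);  // [ './2017/08/bulk.md' ]
```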

Excluding the tweet posts brings us down to 1,391,790 words. What else can we filter out?

Every source file includes header lines at the top with the title, date, keywords, and other metadata that shouldn’t be counted as writing. These lines are separated from the body of the post by at least one blank line, and we can use sed to get rid of them:

find -E . -not -regex '.+tweets-for-.+' -iname "*.md" | xargs -I {} sed '1,/^$/ d' {} | wc -w

Here, sed '1,/^$/ d' deletes the lines from the top of the file through the first blank line. To make sure sed acts on each individual file instead of the concatenation of all of them, we add the -I {} part to xargs and {} to the end of the sed command. This feeds the file names to sed and generates a long, long string of text that’s piped to wc -w to get the word count.

Without headers, we’re down to 1,347,380 words. I think this is a legitimate word count, but maybe we should filter out the URLs of links. Because I use reference links, they’re pretty easy to find and delete with another sed invocation.

find -E . -not -regex '.+tweets-for-.+' -iname "*.md" | xargs -I {} sed '1,/^$/ d' {} | sed -E '/^\[.+\]: / d' | wc -w

The sed -E '/^\[.+\]: / d' command deletes all lines that start with bracketed text followed by a colon and space. This is the format for reference links. The -E flag tells sed to use extended regex syntax.

Now we’re down to 1,283,149 words, and I suppose if we’re going to exclude link URLs we should exclude source code of scripts, too. No matter how long they took to write, the scripts (usually) weren’t written for the purposes of blogging, they were written to solve a problem and then pasted into the blog.

Source code blocks start with four or more spaces at the beginning of each line, so the filter for them is easy to add:

find -E . -not -regex '.+tweets-for-.+' -iname "*.md" | xargs -I {} sed '1,/^$/ d' {} | sed -E '/^\[.+\]: / d' | sed '/^    / d' | wc -w

Now we’re down to 1,147,413 words. Frankly, I thought deleting the source code would bring it down further than that.
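To check the three filters on a single post before trusting the grand total, the same logic can be sketched as a plain JavaScript function. This is only an approximation of the sed pipeline under the assumptions stated in the post (a header ending at the first blank line, reference links on their own lines, code indented four spaces). The function name countWords and the sample post are made up for illustration.

```javascript
// Count the words in one post's source, applying the same three filters
// as the sed commands: header, reference links, indented code.
function countWords(source) {
  const lines = source.split('\n');

  // Like sed '1,/^$/ d': drop everything through the first blank line
  // after line 1. If there is no blank line, nothing is left.
  const blank = lines.findIndex((line, i) => i > 0 && line === '');
  const body = blank === -1 ? [] : lines.slice(blank + 1);

  const prose = body
    .filter((line) => !/^\[.+\]: /.test(line))  // reference links
    .filter((line) => !/^    /.test(line));     // indented code

  // Like wc -w: count runs of non-whitespace characters.
  return prose.join('\n').split(/\s+/).filter(Boolean).length;
}

// A made-up post, not one from the blog.
const sample = [
  'Title: Sample post',
  'Date: 2017-08-27',
  '',
  'Here are [five][1] words total.',
  '',
  '    print("this line is code")',
  '',
  '[1]: http://example.com/',
].join('\n');

console.log(countWords(sample));  // 5
```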

This was a fun exercise and didn’t take too long. I’d never used -not or -regex with the find command before, so I’m a little smarter than I was at the beginning of the day. Similarly for the -I {} option to xargs. The sed commands were nothing new, but I probably should get in the habit of adding the -E option regardless of whether it’s needed. The syntax of “basic” regular expressions is something I don’t know and don’t ever want to learn.


A riveting show

If I were to rank Marvel’s recent Defenders Netflix series, I’d put it more or less even with the second series of Daredevil and below Jessica Jones, Luke Cage, and the first series of Daredevil. I didn’t watch Iron Fist when it came out, based mainly on the awful reviews. Now, after seeing the Danny Rand character in Defenders, I feel good about that decision.

But this post isn’t about the quality of The Defenders or the ins and outs of its plot. For that, I suggest you subscribe to the Defenders episodes of The Incomparable’s TeeVee podcast, with Lisa Schmeiser, Phil Mozolak, and Tony Sindelar. They’re going to do ten episodes: an introduction, one for each episode of the show, and—I assume—a wrapup. The introductory episode was posted a couple of days ago.

What I want to talk about here is something that made me smile in the last episode of the show. I don’t think this is any kind of spoiler, but if you’re super sensitive about that, stop reading.

Claire Temple (the Rosario Dawson character who’s all over the Marvel Netflix series) and Colleen Wing (Jessica Henwick) go into the bowels of a brand new high-rise armed with explosives to bring the building down.

I’m not going to complain that they seem to go to only one column in the building, nor am I going to point out how unlikely it is that a nurse and a martial arts specialist would be able to find the key column based on a single architectural drawing. I’m not even going to cast doubt on the ability of a mere architect (as opposed to a structural engineer) to determine which column is the key.

No, I’m just going to say how funny I found it when we finally get to see this all-important column. Here’s our first view of it, in the background just left of center:

Defenders column 1

Here’s a closer view, as they start stacking the packages of explosive around it:

Defenders column 2

Look at those beautiful rivets. The art department did a great job making sure they look rough instead of pristine. They even misaligned one of the rivets in the bottom row to give the column a gritty verisimilitude. Even a superhero show benefits from attention to detail like this. Except…

Rivets aren’t used in buildings anymore and haven’t been in decades.

That doesn’t mean, though, that it was wrong for the set designer to make up a fake column with fake rivets. Rivets say “steel structure”—even to those of us who know better—in a way that bolts and welds simply don’t. It’s a kind of skeuomorphism. Despite it looking completely unlike any steel structure built in the past half-century or so, it manages to draw on some cultural memory and seem right.

It’s like the Save button icon that looks like a floppy disk. Nobody’s used floppies in a dozen years or more. Many full-fledged adults have never even seen one. But through some sort of self-perpetuating habit, we see those buttons and think “save.”

Anyway, my thanks to the makers of The Defenders for some unplanned entertainment.


A Reminders informational macro

My motivation for exploring JavaScript for Automation—as outlined in the last post—was the simple Keyboard Maestro macro we’ll talk about today.

Back in January, I wrote about how I use Reminders to keep track of outstanding invoices at work and remind me to follow up with clients who are late in paying. I’ve changed some of the details since then (which means there’ll probably be a post about that soon), but the basics are the same:

Invoice reminders

The job name, invoice number, and invoice amount are all in the reminder’s name, which is formatted this way:

Invoice reminder format

What I wanted was a way to see the number of outstanding invoices and the unpaid total. I can do that through our accounting system, of course, but that involves launching the accounting software. Doing it through Reminders, which is always running on my iMac, takes no time at all.

Here’s the Keyboard Maestro macro, which I’ve bound to the ⌃⌥⌘N keystroke combination and set to run only when Reminders is the current application.

Keyboard Maestro invoice reminder macro

The macro has only one action, which is this JXA script:

javascript:
 1:  var app = Application('Reminders')
 2:  
 3:  // get the names of the invoice reminders
 4:  var invoices = app.lists.byName('Invoices').reminders.whose({completed: false}).name()
 5:  
 6:  // extract the amounts and add them
 7:  var sum = 0.0
 8:  var invCount = invoices.length
 9:  for (var i = 0; i < invCount; i++) {
10:    var amtStart = invoices[i].lastIndexOf('$') + 1
11:    var amtEnd = invoices[i].lastIndexOf(')')
12:    var amt = invoices[i].slice(amtStart, amtEnd)
13:    amt = amt.replace(/,/g, '')
14:    sum += parseFloat(amt)
15:  }
16:  
17:  invCount + " invoices for $" + sum.toLocaleString()

Lines 1 and 4 are what we talked about in the previous post; they generate an array of all the names of the active reminders in the Invoices list and put it in the variable invoices.

Line 7 initializes the variable sum, which is used to accumulate the unpaid total. Line 8 puts the number of unpaid invoices into invCount. Lines 9–15 loop through invoices, extracting the amount of each invoice and adding it to sum. I’m sure I could’ve used a clever regex to get the amount in one step, but I didn’t feel like taking the time to work that out.
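For the record, the one-step regex might look something like the sketch below. It's plain JavaScript with no Reminders calls, and it assumes the amount is the last thing in the name, written like ($1,250.00). The sample names and the invoiceAmount function are invented for the example.

```javascript
// Pull the dollar amount out of a reminder name with one regex.
// Assumes the name ends with the amount in parentheses, e.g. "($1,250.00)".
function invoiceAmount(name) {
  const match = name.match(/\$([\d,.]+)\)\s*$/);
  // No match means the name isn't in the expected format.
  if (!match) return NaN;
  // Strip every thousands separator before parsing.
  return parseFloat(match[1].replace(/,/g, ''));
}

// Made-up reminder names in the assumed format.
const names = [
  'Acme warehouse 1701 ($1,250.00)',
  'Main St. bridge 1702 ($980.50)',
  'Tower review 1703 ($12,000)',
];

// Add up the amounts.
const total = names.reduce((sum, name) => sum + invoiceAmount(name), 0);
console.log(total);  // 14230.5
```

Whether this is clearer than the lastIndexOf and slice version is a matter of taste. It does make the assumed name format explicit in one place.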

Finally, Line 17 returns a string with the information I want. The toLocaleString method is a convenient way to format the amount with a comma as the thousands separator.
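One caveat with toLocaleString is that, called with no arguments, it follows the system locale and doesn't pad the cents. The sketch below shows the difference in plain JavaScript with an explicit en-US locale. I'm assuming the options argument is available; it is in current Node and browsers, but I haven't confirmed it for every version of the JavaScript engine behind JXA.

```javascript
// A made-up unpaid total.
const sum = 14230.5;

// Explicit locale, default options: thousands separator, no padded cents.
const plain = sum.toLocaleString('en-US');
console.log(plain);     // 14,230.5

// Currency style adds the dollar sign and always shows two decimals.
const currency = sum.toLocaleString('en-US', {
  style: 'currency',
  currency: 'USD',
});
console.log(currency);  // $14,230.50
```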

Keyboard Maestro’s “display results briefly” setting (the popup menu just above the text field with the source code) puts the result in a notification box that slides out from the top right corner of my screen and slides back after a few seconds.

Keyboard Maestro notification

I decided to use JXA instead of AppleScript because I thought it would give me some JXA practice and because I hate doing text manipulation in AppleScript. I ended up with the utility I wanted, and I learned some things along the way.


My JXA problem

I was optimistic when Apple introduced JavaScript for Automation (curiously abbreviated JXA) a few years ago. Even though I’m not a fan of JavaScript, it is at least a “normal” language, unlike JXA’s much older cousin, AppleScript. And because JavaScript is the language of the web, the web itself can provide the answer to almost any JavaScript question at the top of Google’s search results. While I didn’t expect JXA to fully displace AppleScript—there’s just too much AppleScript history for that—I did expect an explosion of JXA example code that I could read and learn from (and steal).

But things haven’t worked out that way. JXA often seems weird and unnatural in how it interacts with apps and in the results it returns. It reminds me of appscript, which worked well but never really felt Pythonic.

But Sal Soghoian keeps telling us JXA is good, so I feel compelled to keep trying. The other day, I wanted to write a script that extracted text from a Reminders list and manipulated it into another form for printing. Because text manipulation in AppleScript can be a horror show,[1] I gave JXA a shot. The text manipulation part was a breeze, but the extraction of the text from Reminders (the interaction with an app that’s the heart of JXA) made no sense to me. But after lots of experimentation with a stripped-down version of the code, I’m starting to understand. At least I think so.

Let’s assume we have a Reminders list called “Test” that looks like this

Reminders list

and we want to put the text of all the items in “Test” into an array.

This JXA script will do it:

javascript:
Application('Reminders').lists.byName('Test').reminders.name()

Executing it in the Script Editor gives the result

["First", "Second", "Third", "Fourth", "Fifth"]

This is nicely compact, but how does it work, and why?

Most JXA tutorials suggest that the equivalent of AppleScript’s

applescript:
tell application "Reminders"
  whatever
end tell

is

javascript:
rem = Application('Reminders')
rem.whatever

so the Application('Reminders') part makes sense. What about the next part, lists?

Let’s look at an excerpt from the JavaScript dictionary for Reminders

JavaScript dictionary for Reminders

It says the Reminders application contains lists, but if we try to run

javascript:
Application('Reminders').lists

we’ll get this unhelpful result in the lower pane of the Script Editor window:

Application("Reminders").lists

Nice tautology. Let’s try this:

javascript:
typeof(Application('Reminders').lists)

Now our result is "function". Well, if it’s a function, maybe we should see what happens if we execute it.

javascript:
Application('Reminders').lists()

This gives us the more useful result

[Application("Reminders").lists.byId("47012FD8-87CB-490D-8C27-D957EBCB2C86"), 
Application("Reminders").lists.byId("0DAC8B42-EF53-4F50-AF6B-EE8B9D98BE99"), 
Application("Reminders").lists.byId("24611E1A-712A-4FD3-9908-AD30E9CD9825"), 
Application("Reminders").lists.byId("FBC65F34-1F32-460F-B11B-C4AAFCFD3F75")]

We’d get essentially the same result from this AppleScript:

applescript:
tell application "Reminders" to get every list

OK, so now we see that lists is a function, but doesn’t the dictionary say lists are objects? Yes, but in JavaScript functions are objects and they can have their own properties and functions (or methods, if you prefer). The documentation in the dictionary isn’t lying to us, but it isn’t being entirely forthcoming, either.
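The "functions are objects" idea is easy to see in plain JavaScript. The toy below is not how JXA is implemented, and the data is made up. It only shows that a single name can be both callable and a holder of methods, the way lists appears to be.

```javascript
// Made-up stand-ins for Reminders lists.
const data = [{ name: 'Test' }, { name: 'Invoices' }];

// Calling the function returns every list...
function lists() {
  return data;
}

// ...and the function object can also carry its own methods.
lists.byName = (name) => data.find((list) => list.name === name);

console.log(typeof lists);              // function
console.log(lists().length);            // 2
console.log(lists.byName('Test').name); // Test
```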

It was when I realized that I could execute some of the things identified as objects in the dictionary that JXA started to make some sense to me.

The result indicates that we can access individual Reminders lists by their ID through the cleverly named byId function. Is there a similar byName function? Yep. This will get the list we want:

javascript:
Application('Reminders').lists.byName('Test')

And if we want the reminders in that list, we run

javascript:
Application('Reminders').lists.byName('Test').reminders()

which returns

[Application("Reminders").reminders.byId("x-apple-reminder://5F123EA1-681B-4A70-822B-734B23145A3C"), 
Application("Reminders").reminders.byId("x-apple-reminder://A017D812-A6A5-4E7D-A8CE-57F08B711DA8"), 
Application("Reminders").reminders.byId("x-apple-reminder://356AA6D9-E7B0-40F5-B46C-07697E6E2249"), 
Application("Reminders").reminders.byId("x-apple-reminder://B15B20C7-5DDD-4BB2-B985-EADD33628803"), 
Application("Reminders").reminders.byId("x-apple-reminder://68AE8F47-2D73-45CF-AAAB-2CD65E3374F6")]

And if we want to get the names of these reminders, we’re back to where we started:

javascript:
Application('Reminders').lists.byName('Test').reminders.name()

Suppose we’ve completed the fourth reminder and want only the names of the reminders that haven’t been completed. In AppleScript, this would mean adding a whose clause in the middle of the specification, and JXA works basically the same way.

javascript:
Application('Reminders').lists.byName('Test').reminders.whose({completed: false}).name()

yields

["First", "Second", "Third", "Fifth"]

What you pass to whose is a property/value pair in curly braces. If you need to do a more complicated comparison (not just a simple equality), I suggest you look at the Filtering Arrays section of the JXA 10.10 release notes. For example, to show all the reminders completed before a certain date, you’d do something like this:

javascript:
targetDate = new Date(2017, 8, 26, 0, 0, 0, 0)
Application('Reminders').lists.byName('Test').reminders.whose({completionDate: {_lessThan: targetDate}}).name()

For our example, that would yield the result

["Fourth"]

This is too long for comfortable reading, so it would be better to break it up into at least two parts:

javascript:
targetDate = new Date(2017, 8, 26, 0, 0, 0, 0)
rem = Application('Reminders').lists.byName('Test').reminders
rem.whose({completionDate: {_lessThan: targetDate}}).name()

The intent of this is easy to work out, but I don’t think I’ll ever be able to carry this kind of syntax in my head. I’m OK with the braces within braces, because that’s basically JSON, but the leading underscore just seems stupid. But now I have a blog post that’ll help me find it the next time I need it.
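A note on the Date constructor used in those scripts: months count from zero, so a month argument of 8 means September, and the target date is September 26, 2017. If August were intended, the argument would be 7. The sketch below confirms that in plain JavaScript and shows an ordinary-array analog of the _lessThan filter. The reminder objects and the completion date are invented for the example; real JXA object specifiers don't work this way.

```javascript
// Months in the Date constructor are zero-based: 8 is September.
const targetDate = new Date(2017, 8, 26, 0, 0, 0, 0);
console.log(targetDate.toDateString());  // Tue Sep 26 2017

// Made-up stand-ins for the five reminders; only "Fourth" is completed.
const reminders = [
  { name: 'First', completionDate: null },
  { name: 'Second', completionDate: null },
  { name: 'Third', completionDate: null },
  { name: 'Fourth', completionDate: new Date(2017, 7, 25) },
  { name: 'Fifth', completionDate: null },
];

// Plain-array version of whose({completionDate: {_lessThan: targetDate}}).
const done = reminders
  .filter((r) => r.completionDate !== null && r.completionDate < targetDate)
  .map((r) => r.name);
console.log(done);  // [ 'Fourth' ]
```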


  1. If I never have to set and reset AppleScript text item delimiters again, I’ll die happy.