Passphrases via shell pipeline

I read this article by Micah Lee at The Intercept on how to generate secure passphrases using the Diceware technique and thought it would be fun to see if I could do the equivalent from the command line. It turned out to be fairly easy, although I did need to use a GNU utility that doesn’t come with OS X.

Before we start, I should point out that the purpose of the shell command is not to generate passwords for websites—that’s what 1Password is for. It’s for generating things like a master password for 1Password or a passphrase for GPG. My 1P master password is getting old and was never as secure as it should have been. I’m going to use the shell command to make a newer and better one.

In the Diceware method, you roll dice to generate a five-digit number in which every digit is between 1 and 6 (so the numbers run from 11111 to 66666, with no zeros and no sevens, eights, or nines) and turn that number into a word using a table. Repeat that six or seven times and you’ll have a passphrase that’s very hard to break. The Intercept article has the straightforward mathematical details on calculating the difficulty of breaking a passphrase through brute force. The trick is estimating how fast guesses can be generated; Lee assumes a trillion guesses per second in his calculations.
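The arithmetic behind those estimates is short enough to sketch in the shell. The helper names below (passphrase_bits, crack_years) are made up for illustration, and the model is the simple one just described: every word is drawn independently from a list of known size, the attacker knows the list, and the phrase turns up after searching half the possibilities on average. The default rate is Lee’s trillion guesses per second; a year is taken as 365 days.

```shell
# Bits of entropy in a passphrase of N words drawn independently
# from a list of SIZE words: N * log2(SIZE).
# Usage: passphrase_bits N SIZE
passphrase_bits() {
  awk -v n="$1" -v size="$2" 'BEGIN { printf "%.1f\n", n * log(size) / log(2) }'
}

# Average years to find the phrase by brute force: half the keyspace
# divided by the guessing rate (default: one trillion guesses per second).
# Usage: crack_years N SIZE [GUESSES_PER_SECOND]
crack_years() {
  awk -v n="$1" -v size="$2" -v rate="${3:-1000000000000}" 'BEGIN {
    seconds = (size ^ n) / 2 / rate
    printf "%.0f\n", seconds / (365 * 24 * 60 * 60)
  }'
}

passphrase_bits 6 7776   # six Diceware words: prints 77.5
crack_years 6 7776       # prints 3505
passphrase_bits 7 7776   # seven Diceware words
crack_years 7 7776
```

Each extra word multiplies the cracking time by the size of the list, which is why going from six words to seven matters so much more than it sounds.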

Calling the result a passphrase is a stretch, because the words won’t bear any relation to one another, but that’s one of the reasons the passphrase will be hard to guess. It won’t be a sequence of words found in any database of literature, song lyrics, newspaper articles, etc. The passphrase won’t be memorable, but it will be memorizable.

My first thought was to use the list of 7776 (that’s 6^5) Diceware words as the corpus from which to pluck random words. There’s a plain text version of the Diceware table, and it’s easy to edit out the numbers, leaving just a 7776-line file with one word per line. Then I could use the shuf command from the GNU core utilities set to pull out six words at random from the file:

shuf --random-source=/dev/random -r -n 6 < diceware.wordlist.asc
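The file fed to that command is the edited, words-only version of the table. The editing can itself be done in the shell. This is a minimal sketch, assuming each entry in the table is a five-digit dice roll followed by whitespace and the word, and that any other lines in the file should be dropped. The function name and the sample lines are hypothetical.

```shell
# Keep only lines that start with five dice digits (1-6) plus whitespace,
# and strip that prefix, leaving one word per line.
diceware_words() {
  sed -n -E 's/^[1-6]{5}[[:space:]]+//p'
}

# Small made-up sample in the assumed format, to show the effect.
printf 'some header line\n11111\ta\n11112\ta&p\n66666\tzzz\n' | diceware_words

# On the real table, something like this (output name is hypothetical):
#   diceware_words < diceware.wordlist.asc > diceware-words.txt
#   wc -l diceware-words.txt    # should report 7776 if the assumption holds
```

The sample prints the three words and silently drops the header line, which is the behavior you’d want for anything in the file that isn’t a table entry.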

In general, shuf shuffles the lines of standard input and sends the results to standard output. The -r option tells shuf to allow lines (in this case, words) to be repeated, and the -n 6 option tells it to output just six lines. (Update: see below about the --random-source option.) This is the command line equivalent of the Diceware method.
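The effect of -r is easiest to see on a list that’s shorter than the number of lines requested. This is just a demonstration with a made-up three-word list; it leaves off --random-source because the point here is the counting, not the quality of the randomness.

```shell
# Three-line stand-in for the word list (hypothetical sample data).
sample=$(printf '%s\n' alpha beta gamma)

# Without -r, shuf outputs a permutation: at most one copy of each input
# line, so asking for six lines from three inputs yields only three.
printf '%s\n' "$sample" | shuf -n 6 | wc -l      # prints 3

# With -r, each output line is drawn independently, so all six are produced
# and duplicates can occur, just as dice can come up the same way twice.
printf '%s\n' "$sample" | shuf -r -n 6 | wc -l   # prints 6
```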

The shuf command isn’t distributed with OS X, but it’s easily installed as part of the coreutils package via Homebrew:

brew install coreutils

Because many of the commands in coreutils are GNU versions of commands already installed with OS X, the Homebrew versions are all prefixed with “g.” So the actual command I ran was

gshuf --random-source=/dev/random -rn6 < diceware.wordlist.asc

where I took advantage of the fact that you can mush single-character options together. This gives a list of six words, like

jb
raj
adult
ob
stash
reveal

But I wasn’t happy with this solution. The words in the Diceware table are short and easy to type, but too many of them have no style. I’d almost rather have my 1Password file cracked than have “ob” or “a&p” (yes, that’s one of the words in the table) in my passphrase. The same can be said for the Unix spelling dictionary at /usr/share/dict/words. Too many of its words are nasty agglomerations of prefixes and suffixes, like “irreconcilableness.”

So where to get a stylish corpus? The same place I got the source for my dissociated Darwin TextExpander snippet: books from Project Gutenberg. I already have the full text of Origin of Species, Alice’s Adventures in Wonderland, Pride and Prejudice, and A Princess of Mars sitting in a folder in my home directory. It would be easy to add others.

Using well-known texts will not make our passphrases easier to guess because we won’t be using consecutive words from them. After all, the Diceware list is very well known, but using it is still secure because the order of the words picked from it is random. We’ll do the same with our Gutenberg files.

The Gutenberg files, though, aren’t in the format that gshuf wants. We need to process them to get all of the unique words into a list, one word per line. That happens to be part of a very well-known Unix solution: Doug McIlroy’s six-command pipeline for counting words. Because we don’t need to count, we can get by with just the first four commands:

tr -cs A-Za-z '\n' < species.txt | tr A-Z a-z | sort | uniq

This will output all the unique “words” in Origin of Species in alphabetical order. A word here is defined as any string of consecutive letters separated by non-letters. We could make the definition more inclusive—to add contractions, for example—but why bother? We’re looking for a passphrase, not doing textual analysis.
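To see what that definition of a word does in practice, here are the same four commands run on a single made-up sentence instead of a book. The function name unique_words is only a convenience for this illustration.

```shell
# The four-command pipeline, reading text on standard input.
unique_words() {
  tr -cs A-Za-z '\n' |   # every run of non-letters becomes one newline
    tr A-Z a-z |         # fold to lowercase so Panic and panic match
    sort |               # bring duplicates together
    uniq                 # keep one copy of each
}

printf '%s\n' "Don't panic; DON'T Panic." | unique_words
```

The output is three lines: don, panic, and t. The contraction gets split at the apostrophe, so fragments like “t” end up in the list, which is the price of the simple definition.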

How many unique words are there in Origin of Species? The pipeline

tr -cs A-Za-z '\n' < species.txt | tr A-Z a-z | sort | uniq | wc -l

gives us the answer: 8185. Because this is more than the 7776 words in the Diceware list, our passphrase will be a little bit harder to crack than a Diceware phrase, even if we don’t allow repeated words. The pipeline

tr -cs A-Za-z '\n' < species.txt | tr A-Z a-z | sort | uniq | gshuf --random-source=/dev/random -n6

gives us output like

rate
tobacco
invertebrate
dashing
perpetuation
reindeer

You have to admit, “dashing perpetuation reindeer” is more fun than “jb raj adult,” even if it does take longer to type.
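If you’ll be doing this with more than one book, the pipeline can be wrapped in a small function. This is a sketch rather than anything I’d call finished: the name book_passphrase is made up, it assumes GNU shuf is installed as either gshuf or shuf, and it joins the words onto one line with paste for easier reading.

```shell
# Print N random distinct words from a plain-text book, on one line.
# Usage: book_passphrase FILE [N]     (N defaults to 6)
book_passphrase() {
  # Homebrew installs GNU shuf as gshuf; elsewhere it is usually plain shuf.
  if command -v gshuf >/dev/null 2>&1; then _shuf=gshuf; else _shuf=shuf; fi

  tr -cs A-Za-z '\n' < "$1" |
    tr A-Z a-z |
    sort | uniq |
    "$_shuf" --random-source=/dev/random -n "${2:-6}" |
    paste -sd ' ' -
}

# Example calls (file names are whatever you saved the books as):
#   book_passphrase species.txt
#   book_passphrase alice.txt 7
```

Because there’s no -r here, the words in a phrase are always distinct, matching the pipeline above.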

Using Pride and Prejudice as our corpus leads to passphrases like

wickedest
disgraceful
regain
natural
confirms
lock

and using Alice’s Adventures in Wonderland yields phrases like

terms
mushroom
stolen
leave
treat
furiously

Alice, though, has only 2569 unique words, so maybe you shouldn’t use a six-word passphrase with it. At a trillion guesses per second, it would take only about four and a half years, on average, to crack. Assuming, of course, the attacker knew you were using Alice for your word list.
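The comparisons between the book lists and Diceware can be checked with a few lines of awk. This sketch uses the no-repeats model that matches the book pipelines (six distinct words, order matters) and the same assumptions as the estimate above: the attacker knows the list and succeeds, on average, after half the possibilities. The function name is hypothetical and a year is taken as 365 days.

```shell
# Keyspace for N distinct words drawn from a list of SIZE words:
# SIZE * (SIZE-1) * ... * (SIZE-N+1) possible ordered phrases.
# Prints bits of entropy and average years to crack at RATE guesses/second.
# Usage: norepeat_stats N SIZE [RATE]
norepeat_stats() {
  awk -v n="$1" -v size="$2" -v rate="${3:-1000000000000}" 'BEGIN {
    keyspace = 1
    for (k = 0; k < n; k++) keyspace *= size - k
    bits  = log(keyspace) / log(2)
    years = keyspace / 2 / rate / (365 * 24 * 60 * 60)
    printf "%.1f bits, %.1f years\n", bits, years
  }'
}

norepeat_stats 6 8185   # Origin of Species word list
norepeat_stats 6 2569   # Alice word list: prints 68.0 bits, 4.5 years
```

Six distinct words from the 8185-word Origin list come out at about 78 bits, a little above the roughly 77.5 bits of six Diceware words, while the Alice list falls about ten bits short of that.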

Update 4/3/15 9:22 PM
I know a bit about pseudorandom number generators (PRNGs) because I’ve had to do Monte Carlo simulations and develop random sampling plans at various points in my career. But I don’t consider myself an expert on them. And I’m really out of my depth when it comes to cryptographically secure pseudorandom number generators (CSPRNGs). So it was nice to get this advice from Jeffrey Goldberg, the Chief Defender Against the Dark Arts at AgileBits (makers of 1Password) and author of some very good security posts on the Agile Blog (i.e., someone who knows what he’s talking about):

@drdrang Really nice job with leancrew.com/all-this/2015/… But please use --random-source=/dev/random with shuf. Also …
Jeffrey Goldberg (@jpgoldberg) Apr 3 2015 7:37 PM

I’ve updated the invocations of shuf and gshuf above.