July 1, 2008 at 1:26 PM by Dr. Drang
In a footnote to my last post, I mentioned that typing “bookkeeper” over and over made me realize that it had three consecutive doubled letters, and I wondered if it was the only such word in English. Today I learned that the answer is basically “yes.”
Unix and Unix-like systems (Linux, OS X) have a built-in spellchecker that accesses a simple text-only dictionary at /usr/share/dict/words. This is a file with, on my machine, 234,936 words, one per line. Finding out how many have three consecutive doubled letters is a simple Perl one-liner:
perl -ne 'print if /(.)\1(.)\2(.)\3/' /usr/share/dict/words
The -n option tells Perl to read the file one line at a time and apply the program to each line in turn. The -e option tells it that the program is on the command line, not in a file. The program itself--between the single quotes--prints the line if it matches the regular expression between the slashes. The regular expression looks for
(.)   any character (captured)
\1    the first captured expression
(.)   any character (captured)
\2    the second captured expression
(.)   any character (captured)
\3    the third captured expression
which is regular expression-speak for three consecutive doubled characters. The result is
bookkeeper
bookkeeping
subbookkeeper
So “bookkeeper” and other forms of it are the only words in the dictionary with three consecutive doubled letters. I find “subbookkeeper” rather suspect; it seems like a word coined specifically to have four consecutive doubled letters. It can be found on the Internet, but I’m not buying it.
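Not every machine has /usr/share/dict/words, so here is a minimal sketch for trying the same regular expression on a tiny word list piped in through standard input. The function names `triple_doubled` and `sample_words`, and the sample words themselves, are made up for illustration; only the regular expression comes from the one-liner above.

```shell
# Hypothetical wrapper around the one-liner; with no file argument,
# perl -n reads the words from standard input instead.
triple_doubled() {
  perl -ne 'print if /(.)\1(.)\2(.)\3/' "$@"
}

# Hypothetical stand-in for /usr/share/dict/words, one word per line.
sample_words() {
  printf '%s\n' bookkeeper bookkeeping committee balloon subbookkeeper mississippi
}

# "committee" and "mississippi" have three doubled letters, but not in a row,
# and "balloon" has only two, so none of them is printed.
sample_words | triple_doubled
# prints bookkeeper, bookkeeping, subbookkeeper (one per line)
```

Because the wrapper passes its arguments straight through, `triple_doubled /usr/share/dict/words` should behave like the original command on a system that has the file.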
The one-liner can be generalized to find all the words with three doubled letters, regardless of whether they’re consecutive.
perl -ne 'print if /(.)\1.*(.)\2.*(.)\3/' /usr/share/dict/words
This gives a list of 170 words. Some, like “whippoorwill,” “successfully,” and “committee,” are pretty common. Most, like “unpossessedness” and “buttressless,” are weird constructions with “-ness” or “-less” suffixes that only a lawyer or bureaucrat would use.
There are four words with four doubled letters. The one-liner
perl -ne 'print if /(.)\1.*(.)\2.*(.)\3.*(.)\4/' /usr/share/dict/words
gives
killeekillee
possessionlessness
subbookkeeper
successlessness
I love “successlessness” because it defines whoever would use it.
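Adding another `.*(.)\N` for every extra doubled letter gets tedious. A possible generalization, sketched here under my own assumptions, counts the non-overlapping doubled characters on each line and compares the count to a threshold. The function name `doubled_at_least` and the `MIN` environment variable are hypothetical, and the sample words are only for illustration.

```shell
# Hypothetical helper: print lines with at least N doubled characters.
# The threshold is handed to Perl through the MIN environment variable.
doubled_at_least() {
  min="$1"; shift
  # "() = /(.)\1/g" collects every non-overlapping doubled character;
  # assigning that list to a scalar yields how many there were.
  MIN="$min" perl -ne 'my $n = () = /(.)\1/g; print if $n >= $ENV{MIN}' "$@"
}

printf '%s\n' balloon committee possessionlessness | doubled_at_least 3
# prints committee and possessionlessness (one per line)

printf '%s\n' balloon committee possessionlessness | doubled_at_least 4
# prints possessionlessness
```

On a system with the dictionary file, `doubled_at_least 4 /usr/share/dict/words` should give the same kind of list as the four-pair one-liner, though I have only checked the sketch against small samples like these.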