Wordle permutations

Last week, as a followup to my most recent Wordle/grep post, John Gruber sent me an email in which he told me about a script he wrote to do the same thing but with less fuss. Inspired by this, I wrote my own script.

As you may recall, I had beaten my family by getting this Wordle in three guesses:

Wordle guesses

At the time, I went with TOTAL as my third guess because I couldn’t think of another word that fit the restrictions of the first two guesses. But a quick review with grep,

egrep [^irepch][^irepcha]t[^irepch]{2} wordle.txt | egrep a

revealed 22 acceptable guesses,

altos    dotal    lotsa    outta
antas    gotta    lotta    sutta
antsy    gutta    lytta    total
astun    jotas    motza    untax
autos    kutas    notal
botas    lotas    oktas

of which TOTAL and ANTSY seemed like the only likely Wordle solutions.

Two things are annoying about using grep the way I did:

There might be a clever way to avoid the second pass through grep but if there is, it would make the regular expression even longer than it is.

Gruber’s script avoids the repetition in both calls to grep and the building of the regex:

fives -without IREPCH -with a23 ..t..

As you can see, the argument to fives is a simple regex with the correct (green) letters given in their positions and periods elsewhere. This is supplemented by the -without option, which is a list of the (black) letters that can’t appear in the solution, and the -with option, which gives the (yellow) letters which must appear and a list of positions where they cannot appear.

It wasn’t clear from this example how fives handles situations in which we have more than one yellow letter. It could be that you should use one -with for each yellow letter; it could be that the argument to -with combines the information for all the yellow letters. Either way, -with is a very smart way of handling the yellow letters. It combines the positive—this is a letter that must be in the solution—with the negative—these are the positions it cannot be.

Because Gruber didn’t send me fives itself, I was compelled to write my own utility, with the less clever name of wordle. Here’s how I’d use it to get the possible third guesses for the game above:

wordle -g ..t.. -b irepch -y a23

As you can see, I can’t be bothered with long option names, so I used the colors as their mnemonic. Also, I put the green letter string as an option instead of the argument to wordle itself. That seemed more symmetric.

As you might expect, while Gruber wrote fives in Perl, I wrote wordle in Python. Here it is:

python:
  1:  #!/usr/bin/env python
  2:  
  3:  import re
  4:  import itertools
  5:  from docopt import docopt
  6:  import os
  7:  import sys
  8:  
  9:  # Usage message
 10:  usage = '''Usage:
 11:    wordle [-g GREEN -y YELLOW -b BLACK]
 12:  
 13:  Print possible Wordle guesses based on the current state of green,
 14:  yellow, and black letters.
 15:  
 16:  Options:
 17:    -g GREEN    correct letter pattern [default: .....]
 18:    -y YELLOW   string of present letters and positions [default: ]
 19:    -b BLACK    string of absent letters [default: ]
 20:  
 21:  GREEN is a 5-character string like '..e.t', where the correct
 22:  letters are at the solved positions and the periods are at the
 23:  unsolved positions.
 24:  
 25:  BLACK is a list of letters that aren't in the word.
 26:  
 27:  YELLOW is a string of yellow letters followed by their positions.
 28:  For example, if your previous guesses have yellow Rs in the second
 29:  and fourth positions and a yellow E in the third position, the
 30:  argument would be 'r24e3'.
 31:  '''
 32:  
 33:  # Get all the words as a string with each word on its own line
 34:  wordle = open(os.environ['HOME'] + '/blog-stuff/wordle/wordle.txt').read()
 35:  
 36:  # Process the options
 37:  args = docopt(usage)
 38:  
 39:  # Green letters
 40:  green = args['-g']
 41:  greenPositions = [ i for i, v in enumerate(green) if v != '.' ]
 42:  greenPositions = set(greenPositions)
 43:  
 44:  # Black letters
 45:  black = args['-b']
 46:  
 47:  # Yellow letters. In the dictionary, the keys are the letters, and
 48:  # the values are sets of yellow positions.
 49:  yellow = {}
 50:  for m in re.finditer(r'([a-z])(\d+)', args['-y']):
 51:    yellow[m.group(1)] = set( int(i) - 1 for i in m.group(2) )
 52:  
 53:  # Dictionary of impossible positions for the yellow letters. Like
 54:  # the yellow dictionary above, but with the green letter positions
 55:  # added.
 56:  impossible = {}
 57:  for k in yellow.keys():
 58:    impossible[k] = yellow[k] | greenPositions
 59:  
 60:  # Base regex patterns for each character position. Start with the
 61:  # green positions, and then turn the periods into negated character
 62:  # classes from the black and yellow letters.
 63:  basePattern = list(green)
 64:  unsolved = sorted(list(set(range(5)) - greenPositions))
 65:  for i in unsolved:
 66:    basePattern[i] = '[^' + black + ']'
 67:    for k in yellow.keys():
 68:      if i in yellow[k]:
 69:        basePattern[i] = basePattern[i].replace(']', k + ']')
 70:    if basePattern[i] == '[^]':
 71:      basePattern[i] = '.'
 72:  
 73:  # Starting point for permuting the yellow letters
 74:  start = list(yellow.keys()) + ['~']*(5 - len(yellow.keys()))
 75:  
 76:  # Set of regexes for searching the wordle string. Each regex is
 77:  # based on the basePattern but with some of the negated character
 78:  # classes replaced by possible permutations of the yellow letters.
 79:  regexes = set()
 80:  
 81:  def possible(s):
 82:    for k in yellow.keys():
 83:      if s.index(k) in impossible[k]:
 84:        return False
 85:    return True
 86:  
 87:  for s in filter(possible, set(itertools.permutations(start))):
 88:    newPattern = basePattern[:]
 89:    for k in yellow.keys():
 90:      newPattern[s.index(k)] = k
 91:    regexes |= {'^' + ''.join(newPattern) + '$'}
 92:  
 93:  # Accumulate Wordle words that match
 94:  matches = set()
 95:  for r in regexes:
 96:    for m in re.finditer(r, wordle, re.M):
 97:      matches |= {m.group(0)}
 98:  
 99:  # Print out the matches in alphabetical order
100:  print('\n'.join(sorted(list(matches))))

The overall idea behind wordle is to create a set of regexes, each of which does the following:

The first three of these is basically what I was doing by hand in my grep solution. The fourth is a sort of combinatoric way of dealing with the presence of the yellow letters in positions where they haven’t yet been. For the example we’ve been looking at, the regular expressions wordle searches on are

^a[^irepcha]t[^irepch][^irepch]$
^[^irepch][^irepcha]t[^irepch]a$
^[^irepch][^irepcha]ta[^irepch]$

As you can see, the A is put sequentially in all of its possible positions: first, fourth, and fifth. Because this example has only one letter, it’s very simple, but wordle can handle multiple yellow letters.

Suppose my first guess was LATER. What could the next guess be? Here’s the wordle command I’d run:

wordle -g ..t.. -b er -y l1a2

Note that the -y argument combines the yellow letters and their positions into a single string. The results are

altho    cital    ictal    total
altos    dital    notal    vital
aptly    dotal    octal

and the regexes that were searched were

^alt[^er][^er]$
^[^erl][^era]tal$
^[^erl]lt[^er]a$
^a[^era]tl[^er]$
^a[^era]t[^er]l$
^[^erl][^era]tla$
^[^erl]lta[^er]$

As you can see, the L and A are both prevented from being in the first and second positions, respectively, and are otherwise placed in all of their possible permutations.

Let’s go through the code and see what it does.

First, the options are handled by docopt, a lovely library that parses the options from the usage message instead of creating a usage message from an options specification. It’s my favorite way of writing scripts that need options.

The newline-separated list of possible Wordle guesses is stored in $HOME/blog-stuff/wordle/wordle.txt which is where Line 34 reads the text that we’re going to search.

Lines 40–42 parse the -g option and create both a green regex and a greenPositions set of known character positions. Line 45 parses the -b option and creates the black string of letters that cannot be in the solution. We’ll use that to create a negated character class.

Lines 49–51 parse the -y option and build a yellow dictionary from it. yellow’s keys are the yellow letters, and its corresponding values are sets of the positions of the yellow letters. Note that because Python lists are zero-based and most human beings are one-based, the positions in yellow are one less than what’s given in the -y argument. Note also that I’m using sets instead of lists to avoid repeating positions in the next step.

Lines 56–58 create an impossible dictionary that extends the yellow dictionary to include the green positions.

Lines 63–71 build a list of regexes for each character based on the green letters, the black letters and where the yellow letters can’t be. It’s basically the regex I would use in a grep-based solution, except each character position is an item in a list. For the game at the beginning of the post, basePattern would be

['[^irepch]', '[^irepcha]', 't', '[^irepch]', '[^irepch]']

Lines 74–91 use basePattern and what we know about the yellow letters to build the set of regular expressions we’re going to search for. The start variable in Line 74 is a five-character list that begins with the yellow characters and is filled out with tildes (any non-letter character would do for this fill). In Line 87, we generate all the permutations of this list using the permuatations function from the itertools library. This will always yield an iterator that’s 120 lists long (that’s 5 factorial), but many of the permutations will be, for our purposes, identical and can be eliminated by converting the iterator into a set.

Let’s use our LATER example to see how this works. Recall that if our first guess in the game above had been LATER, we would have executed wordle this way:

wordle -g ..t.. -b er -y l1a2

That would give us a start list of ['l', 'a', '~', '~', '~']. A call to

itertools.permutations(start)

would yield these 120 lists (where I’ve collapsed the lists into strings to make it easier to read):

al~~~    a~~~l    l~~a~    ~a~~l    ~l~~a    ~~la~
al~~~    a~~~l    l~~a~    ~a~~l    ~l~~a    ~~la~
al~~~    a~~~l    l~~~a    ~a~~l    ~l~~a    ~~l~a
al~~~    a~~~l    l~~~a    ~a~~l    ~l~~a    ~~l~a
al~~~    la~~~    l~~~a    ~a~~l    ~~al~    ~~l~a
al~~~    la~~~    l~~~a    ~a~~l    ~~al~    ~~l~a
a~l~~    la~~~    l~~~a    ~la~~    ~~al~    ~~l~a
a~l~~    la~~~    l~~~a    ~la~~    ~~al~    ~~l~a
a~l~~    la~~~    ~al~~    ~la~~    ~~al~    ~~~al
a~l~~    la~~~    ~al~~    ~la~~    ~~al~    ~~~al
a~l~~    l~a~~    ~al~~    ~la~~    ~~a~l    ~~~al
a~l~~    l~a~~    ~al~~    ~la~~    ~~a~l    ~~~al
a~~l~    l~a~~    ~al~~    ~l~a~    ~~a~l    ~~~al
a~~l~    l~a~~    ~al~~    ~l~a~    ~~a~l    ~~~al
a~~l~    l~a~~    ~a~l~    ~l~a~    ~~a~l    ~~~la
a~~l~    l~a~~    ~a~l~    ~l~a~    ~~a~l    ~~~la
a~~l~    l~~a~    ~a~l~    ~l~a~    ~~la~    ~~~la
a~~l~    l~~a~    ~a~l~    ~l~a~    ~~la~    ~~~la
a~~~l    l~~a~    ~a~l~    ~l~~a    ~~la~    ~~~la
a~~~l    l~~a~    ~a~l~    ~l~~a    ~~la~    ~~~la

When creating permutations, each tilde is considered a separate item, which is why so many of these lists look the same. There are 6 identical lists (3 factorial) for each unique position of L and A. We don’t need the duplicates and can get rid of them with

set(itertools.permutations(start))

to reduce the number of lists down to just 20:

al~~~    la~~~    ~al~~    ~l~a~    ~~la~
a~l~~    l~a~~    ~a~l~    ~l~~a    ~~l~a
a~~l~    l~~a~    ~a~~l    ~~al~    ~~~al
a~~~l    l~~~a    ~la~~    ~~a~l    ~~~la

Of course, even these 20 lists are more than we need because many of them are impossible. Anything with an L in the first position or an A in the second should be filtered out. That’s what the possible function in Lines 81–85 and the filter function in Line 87 are for. possible returns False for all the permutations that have one or more letters in an impossible position and True for all the others. Ultimately, the

for s in filter(possible, set(itertools.permutations(start))):

loop that starts in Line 87 goes through seven of these permutations and combines them with basePattern to return the regexes set that we showed above:

^alt[^er][^er]$
^[^erl][^era]tal$
^[^erl]lt[^er]a$
^a[^era]tl[^er]$
^a[^era]t[^er]l$
^[^erl][^era]tla$
^[^erl]lta[^er]$

Finally, Lines 94–97 search the wordle string for each of these patterns in turn and collect all of them into the matches set. Line 100 then sorts the collection of matches alphabetically and prints them out, one per line.

I could have written a script that hewed more closely to the logic of my grep pipeline. Such a script would have created basePattern, searched on that, and then searched that intermediate result for words that have all of the yellow letters. It would have been easier to write and might have run faster, too. But I wanted to do something new. While I’ve used the itertools library before, I’ve never used permutations. And the filter function was new to me, too, despite it being built-in. I had fun writing wordle and learned some things I may find useful in the future.

While I’m happy with my script, I hope John Gruber publishes his. He used to publish more scripty articles, and I miss them. He’s told me he might write about it, or at least publish it as a Gist, and I look forward to that. Don’t pester him about it, though—he had a tough weekend.