Learning from (Wordle) failure

This morning, after 108 straight wins, I lost at Wordle. I got to four correct letters by my fourth guess, but there were too many possibilities for the remaining letter, and I didn’t guess right in either of my last two guesses.

Wordle failure

(If you’re still playing Wordle and it seems inconceivable to you that I could have failed to guess today’s middle letter, it may be because you’re playing the NY Times version of Wordle. I’m not. I grabbed the source code of the original Wordle and put it on a server so my family could continue to play no matter what the Times did with it. The two versions are no longer in sync.)

I wondered whether I would have solved this puzzle even if I had gotten that four right/one wrong result on my very first guess. After a bit of thinking, I realized there was no guarantee—there are more than six words with that 🟩🟩⬜🟩🟩 pattern.

Which led to this thought: How many four right/one wrong patterns are there that have more than six words associated with them, and what are they? It seemed like something that could be solved with a fairly simple script.

It was. Here’s the script, which I named patterns.py:

 1:  #!/usr/bin/env python3
 2:
 3:  from collections import Counter
 4:  import re
 5:
 6:  # Initialize list of patterns
 7:  patterns = []
 8:
 9:  # Build list of patterns from list of words
10:  with open('wordle.txt') as f:
11:    for word in f:
12:      word = word.strip()
13:      patterns += [ word[:i] + '.' + word[i+1:] for i in range(5) ]
14:
15:  # Dictionary of counts for each pattern
16:  counts = Counter(patterns)
17:
18:  # Eliminate plural nouns and singular third-person verbs
19:  # Eliminate those that could be guessed in an exhaustive search
20:  pkeys = [ k for k in counts.keys() ]
21:  for k in pkeys:
22:    if re.search(r'[^s]s$', k) or counts[k] <= 6:
23:      del counts[k]
24:
25:  # Print them out in order
26:  for k, v in counts.most_common():
27:    print(f'{k}: {v:2d}')

As you’ve probably guessed by looking at Line 10, I have extracted all the legal Wordle guesses from its source code and saved them, one word per line, in a file named wordle.txt.

The script is as short as it is because Python has a class that’s perfectly suited for this type of problem: the Counter class from the collections module. A Counter is a dictionary in which the keys are the elements of some sequence—a list, for example—and the values are the numbers of times those elements appear in the sequence. To create a Counter, which we do on Line 16, we provide it the sequence whose elements we want to count. There are no explicit loops, no branches—just a single statement that does exactly what we want.
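Here is a tiny example of Counter on its own. It uses a made-up list of letters rather than Wordle patterns. It shows the count for each element, and it shows that looking up an element that never appeared returns 0 instead of raising a KeyError.

```python
from collections import Counter

# Count how often each element appears; no explicit loop is needed.
letters = Counter(['a', 'b', 'a', 'c', 'a', 'b'])

print(letters)        # Counter({'a': 3, 'b': 2, 'c': 1})
print(letters['a'])   # 3
print(letters['z'])   # 0: missing keys count as zero instead of raising KeyError
```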

Of course, we have to create the sequence to build our Counter from, and that’s what’s done in Lines 7 and 10–13. For each word in the list of 12,972 legal guesses, there are five patterns that would lead to four correct letters and one incorrect letter. For example, here are the patterns for the word zygon:

.ygon   z.gon   zy.on   zyg.n   zygo.

I’ve used a period in each pattern to represent the position of the incorrect letter.[1] After we’ve gone through all the words in wordle.txt, patterns will be a list that’s 64,860 (12,972 × 5) items long.
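The comprehension on Line 13 can be pulled out into a function and checked against the zygon example. The function name wrong_letter_patterns and the three-word list are for illustration only. The last lines show how words that share a pattern, like light, might, and night, pile up under a single key in the Counter.

```python
from collections import Counter

def wrong_letter_patterns(word):
    """Return the five patterns for a five-letter word, using a period
    to mark the position of the one incorrect letter."""
    return [word[:i] + '.' + word[i+1:] for i in range(5)]

print(wrong_letter_patterns('zygon'))
# ['.ygon', 'z.gon', 'zy.on', 'zyg.n', 'zygo.']

# Every word contributes five patterns, so the list is 5 × (number of words) long.
words = ['light', 'might', 'night']
patterns = [p for w in words for p in wrong_letter_patterns(w)]
print(len(patterns))               # 15
print(Counter(patterns)['.ight'])  # 3: all three words share this pattern
```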

Line 16 creates the Counter counts from this list. At this point, the information in counts is useful, but it needs some pruning. It includes thousands of plural nouns and third-person singular verbs that are simply not going to be Wordle answers. Such words are allowed as guesses, but in 10 months of playing Wordle, I’ve never seen any show up as answers. And if we don’t eliminate them from counts, they’re going to overwhelm our output.

Lines 20–23 do the pruning. The regular expression in Line 22 is my imperfect attempt to find plural nouns and third-person singular verbs so they can be eliminated from consideration on Line 23. It finds all the patterns that end with an s but not a double s. This test also catches patterns from some legitimate answers that end in a single s: basis, for example. But there aren’t that many such words, so I doubt I’m missing much by eliminating them.
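Here is a quick check of how that regular expression treats a few patterns. The patterns were chosen for illustration and are not taken from the script’s output. The period counts as a “not s” character. The pattern .asis, which comes from basis, is one of the false positives.

```python
import re

# Same test as Line 22: a final "s" preceded by any character other than "s".
plural_like = re.compile(r'[^s]s$')

samples = ['.oats', '.ress', '.asis', '.ight', 'char.']
for p in samples:
    verdict = 'dropped' if plural_like.search(p) else 'kept'
    print(f'{p}: {verdict}')
# .oats: dropped
# .ress: kept      (double s at the end)
# .asis: dropped   (from basis, a possible answer)
# .ight: kept
# char.: kept      (doesn't end in s at all)
```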

Line 22 also finds patterns that match six or fewer words. Our goal is to find patterns that can’t be guaranteed to fall to an exhaustive, one-word-per-guess search within Wordle’s six guesses, so we want to retain only those patterns that match seven or more words.

You might be wondering about Line 20, which uses a comprehension to create the list pkeys that’s then used in the loop that starts on the next line. Why do we need pkeys? Why couldn’t the loop just start with

for k in counts.keys():

The reason is that keys() returns a view object that’s tied to counts, not a fully independent list. Because we’re deleting items from counts in the body of the loop, iterating over that live view would raise a RuntimeError (“dictionary changed size during iteration”). Building pkeys through a list comprehension gives us an independent copy of the keys.
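The failure is easy to reproduce with a toy Counter. The counts for .ight and co.ed come from the table below. The entry for zyg.n is a hypothetical low count, added so that something gets deleted. The helper make_counts exists only for this demonstration.

```python
from collections import Counter

def make_counts():
    # Toy data; the zyg.n count is hypothetical and is the entry that gets pruned.
    return Counter({'zyg.n': 1, '.ight': 15, 'co.ed': 14})

# Deleting entries while looping over the live keys() view fails.
counts = make_counts()
error_message = None
try:
    for k in counts.keys():
        if counts[k] <= 6:
            del counts[k]
except RuntimeError as err:
    error_message = str(err)
print(error_message)   # dictionary changed size during iteration

# Looping over an independent list of keys (what pkeys provides) works.
counts = make_counts()
for k in list(counts):
    if counts[k] <= 6:
        del counts[k]
print(counts)          # Counter({'.ight': 15, 'co.ed': 14})

# Another option: build a new Counter holding only the entries worth keeping.
kept = Counter({k: v for k, v in make_counts().items() if v > 6})
print(kept == counts)  # True
```

Here, list(counts) does the same job as the comprehension on Line 20. The dictionary-comprehension version avoids mutating the Counter at all.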

Finally, Lines 26–27 print out the results in descending order of commonality using the most_common method of the Counter class. Usually, most_common is given an argument that limits the number of results returned, but when it’s called with no argument, it returns all the key/value pairs.
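Here is a short illustration of both forms of the call. It uses the top four counts from the table below as sample data. The final loop uses the same format string as Line 27. There, 2d right-aligns each count in a field two characters wide, so single-digit counts line up with double-digit ones.

```python
from collections import Counter

counts = Counter({'.ight': 15, 'co.ed': 14, '.owed': 13, '.aker': 12})

print(counts.most_common(2))      # [('.ight', 15), ('co.ed', 14)]
print(len(counts.most_common()))  # 4: no argument returns every pair

for k, v in counts.most_common():
    print(f'{k}: {v:2d}')         # e.g. ".ight: 15"

print(repr(f'{7:2d}'))            # ' 7': padded to two characters
```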

The script found 164 four right/one wrong patterns that had 7 or more examples in the word list. Here they are along with their counts.

.ight: 15    .ooky:  9    .inge:  8    .adge:  7    .icky:  7
co.ed: 14    .oral:  9    bo.ed:  8    .anty:  7    .inky:  7
.owed: 13    .ound:  9    .oked:  8    .arer:  7    .ived:  7
.aker: 12    .ouse:  9    .oner:  8    .ased:  7    do.er:  7
.ared: 12    .uddy:  9    .oppy:  8    .aser:  7    .ying:  7
.ater: 12    .unny:  9    .ough:  8    .atty:  7    .amed:  7
ho.ed: 12    .ushy:  9    .ubby:  8    .ayer:  7    .anga:  7
ra.ed: 12    .utty:  9    .ully:  8    .iddy:  7    fi.er:  7
.iver: 11    ca.ed:  9    .urry:  8    .iner:  7    .iler:  7
.erry: 11    .aver:  9    chir.:  8    .iter:  7    .appy:  7
.awed: 11    char.:  9    co.er:  8    .oody:  7    gra.e:  7
.olly: 11    .oled:  9    .oped:  8    .oose:  7    gri.e:  7
la.ed: 11    .osed:  9    .oted:  8    .otch:  7    .oker:  7
la.er: 11    .ined:  9    cra.e:  8    .otty:  7    li.er:  7
.itch: 10    do.ed:  9    .rate:  8    .oxed:  7    pare.:  7
.atch: 10    fa.ed:  9    .aunt:  8    .reed:  7    prim.:  7
.ated: 10    lo.ed:  9    .ewed:  8    .rier:  7    pro.e:  7
.ower: 10    ma.ed:  9    .ummy:  8    .rill:  7    ri.er:  7
.ager: 10    pa.ed:  9    .inny:  8    .uggy:  7    s.red:  7
.ippy: 10    pa.er:  9    .ammy:  8    .umpy:  7    s.art:  7
goo.y: 10    sa.er:  9    ha.ed:  8    .unch:  7    s.ore:  7
ra.er: 10    scra.:  9    mo.ed:  8    .usty:  7    sha.e:  7
ta.er: 10    .ider:  8    po.ed:  8    ca.er:  7    s.eer:  7
to.ed: 10    .ired:  8    ro.ed:  8    .aped:  7    s.ell:  7
wa.ed: 10    ba.ed:  8    sa.ed:  8    .aper:  7    si.ed:  7
.iled:  9    .aggy:  8    s.are:  8    .aved:  7    spa.e:  7
.andy:  9    .aked:  8    sta.e:  8    .ease:  7    spi.e:  7
.ayed:  9    .aled:  8    wa.er:  8    clo.e:  7    spoo.:  7
.eare:  9    .ally:  8    wi.ed:  8    co.ey:  7    stee.:  7
.elly:  9    .aned:  8    a.ing:  7    .oper:  7    sto.e:  7
.iked:  9    .ardy:  8    .ided:  7    .orse:  7    stra.:  7
.obby:  9    .enny:  8    .ural:  7    .rone:  7    tri.e:  7
boo.y:  9    .illy:  8    .addy:  7    .azed:  7    

The ight pattern at the top of the list isn’t surprising. We’ve seen it before. What is surprising—to me, at least—is how many of these patterns have the wrong character in the leading position: 102 of the 164. The next most common, with 34, have the wrong character in the center position, just like my pattern this morning.
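A tally like that is another natural job for Counter. The function below, wrong_position_tally, is a name made up for this sketch. It counts the position of the period in each pattern, numbering positions 1 through 5. Here it runs on a small hand-picked sample from the table. Applied to the keys of counts after the pruning loop, it should give the breakdown described above.

```python
from collections import Counter

def wrong_position_tally(patterns):
    """Count how many patterns have the wrong letter ('.') at each position, 1-5."""
    return Counter(p.index('.') + 1 for p in patterns)

# A few patterns from the table above, for illustration only.
sample = ['.ight', 'co.ed', '.owed', 'chir.', 'a.ing', 'goo.y', 'ho.ed']
print(sorted(wrong_position_tally(sample).items()))
# [(1, 2), (2, 1), (3, 2), (4, 1), (5, 1)]
```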

You may have noticed that there’s nothing in patterns.py that provides the columnar output you see above. I piped the output of patterns.py through the rs command I wrote about a couple of weeks ago.

python3 patterns.py | rs -et -g3 33
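For anyone without rs, here is a rough pure-Python sketch of the same column-major layout. Items run down each column and then across, and each entry is padded to a fixed width. The function name columns and its parameters were invented for this sketch and are not part of rs or patterns.py. The five sample entries come from the top of the table.

```python
def columns(items, nrows, width):
    """Lay out items down each column, then across, padding each to `width`.

    Assumes every item fits in `width` characters; the last column may be short.
    """
    ncols = -(-len(items) // nrows)  # ceiling division
    lines = []
    for r in range(nrows):
        row = [items[c * nrows + r] for c in range(ncols) if c * nrows + r < len(items)]
        lines.append(''.join(f'{s:<{width}}' for s in row).rstrip())
    return '\n'.join(lines)

pairs = [('.ight', 15), ('co.ed', 14), ('.owed', 13), ('.aker', 12), ('.ared', 12)]
entries = [f'{k}: {v:2d}' for k, v in pairs]
print(columns(entries, nrows=2, width=13))
# .ight: 15    .owed: 13    .ared: 12
# co.ed: 14    .aker: 12
```

With all 164 entries, a call like columns(entries, nrows=33, width=13) should approximate the five-column table above.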

I knew that command would come in handy again.

  1. An underscore might have been a more evocative choice for the incorrect letter, but a period is more in keeping with regular expression syntax.