Wordle permutations
January 24, 2023 at 2:54 PM by Dr. Drang
Last week, as a followup to my most recent Wordle/grep post, John Gruber sent me an email in which he told me about a script he wrote to do the same thing but with less fuss. Inspired by this, I wrote my own script.
As you may recall, I had beaten my family by getting this Wordle in three guesses:
At the time, I went with TOTAL as my third guess because I couldn’t think of another word that fit the restrictions of the first two guesses. But a quick review with grep
,
egrep [^irepch][^irepcha]t[^irepch]{2} wordle.txt | egrep a
revealed 22 acceptable guesses,
altos dotal lotsa outta
antas gotta lotta sutta
antsy gutta lytta total
astun jotas motza untax
autos kutas notal
botas lotas oktas
of which TOTAL and ANTSY seemed like the only likely Wordle solutions.
Two things are annoying about using grep
the way I did:
- It required two passes through
grep
. - Even with copy/paste, building the initial regular expression is more work than it ought to be.
There might be a clever way to avoid the second pass through grep
but if there is, it would make the regular expression even longer than it is.
Gruber’s script avoids the repetition in both calls to grep
and the building of the regex:
fives -without IREPCH -with a23 ..t..
As you can see, the argument to fives
is a simple regex with the correct (green) letters given in their positions and periods elsewhere. This is supplemented by the -without
option, which is a list of the (black) letters that can’t appear in the solution, and the -with
option, which gives the (yellow) letters which must appear and a list of positions where they cannot appear.
It wasn’t clear from this example how fives
handles situations in which we have more than one yellow letter. It could be that you should use one -with
for each yellow letter; it could be that the argument to -with
combines the information for all the yellow letters. Either way, -with
is a very smart way of handling the yellow letters. It combines the positive—this is a letter that must be in the solution—with the negative—these are the positions it cannot be.
Because Gruber didn’t send me fives
itself, I was compelled to write my own utility, with the less clever name of wordle
. Here’s how I’d use it to get the possible third guesses for the game above:
wordle -g ..t.. -b irepch -y a23
As you can see, I can’t be bothered with long option names, so I used the colors as their mnemonic. Also, I put the green letter string as an option instead of the argument to wordle
itself. That seemed more symmetric.
As you might expect, while Gruber wrote fives
in Perl, I wrote wordle
in Python. Here it is:
python:
1: #!/usr/bin/env python
2:
3: import re
4: import itertools
5: from docopt import docopt
6: import os
7: import sys
8:
9: # Usage message
10: usage = '''Usage:
11: wordle [-g GREEN -y YELLOW -b BLACK]
12:
13: Print possible Wordle guesses based on the current state of green,
14: yellow, and black letters.
15:
16: Options:
17: -g GREEN correct letter pattern [default: .....]
18: -y YELLOW string of present letters and positions [default: ]
19: -b BLACK string of absent letters [default: ]
20:
21: GREEN is a 5-character string like '..e.t', where the correct
22: letters are at the solved positions and the periods are at the
23: unsolved positions.
24:
25: BLACK is a list of letters that aren't in the word.
26:
27: YELLOW is a string of yellow letters followed by their positions.
28: For example, if your previous guesses have yellow Rs in the second
29: and fourth positions and a yellow E in the third position, the
30: argument would be 'r24e3'.
31: '''
32:
33: # Get all the words as a string with each word on its own line
34: wordle = open(os.environ['HOME'] + '/blog-stuff/wordle/wordle.txt').read()
35:
36: # Process the options
37: args = docopt(usage)
38:
39: # Green letters
40: green = args['-g']
41: greenPositions = [ i for i, v in enumerate(green) if v != '.' ]
42: greenPositions = set(greenPositions)
43:
44: # Black letters
45: black = args['-b']
46:
47: # Yellow letters. In the dictionary, the keys are the letters, and
48: # the values are sets of yellow positions.
49: yellow = {}
50: for m in re.finditer(r'([a-z])(\d+)', args['-y']):
51: yellow[m.group(1)] = set( int(i) - 1 for i in m.group(2) )
52:
53: # Dictionary of impossible positions for the yellow letters. Like
54: # the yellow dictionary above, but with the green letter positions
55: # added.
56: impossible = {}
57: for k in yellow.keys():
58: impossible[k] = yellow[k] | greenPositions
59:
60: # Base regex patterns for each character position. Start with the
61: # green positions, and then turn the periods into negated character
62: # classes from the black and yellow letters.
63: basePattern = list(green)
64: unsolved = sorted(list(set(range(5)) - greenPositions))
65: for i in unsolved:
66: basePattern[i] = '[^' + black + ']'
67: for k in yellow.keys():
68: if i in yellow[k]:
69: basePattern[i] = basePattern[i].replace(']', k + ']')
70: if basePattern[i] == '[^]':
71: basePattern[i] = '.'
72:
73: # Starting point for permuting the yellow letters
74: start = list(yellow.keys()) + ['~']*(5 - len(yellow.keys()))
75:
76: # Set of regexes for searching the wordle string. Each regex is
77: # based on the basePattern but with some of the negated character
78: # classes replaced by possible permutations of the yellow letters.
79: regexes = set()
80:
81: def possible(s):
82: for k in yellow.keys():
83: if s.index(k) in impossible[k]:
84: return False
85: return True
86:
87: for s in filter(possible, set(itertools.permutations(start))):
88: newPattern = basePattern[:]
89: for k in yellow.keys():
90: newPattern[s.index(k)] = k
91: regexes |= {'^' + ''.join(newPattern) + '$'}
92:
93: # Accumulate Wordle words that match
94: matches = set()
95: for r in regexes:
96: for m in re.finditer(r, wordle, re.M):
97: matches |= {m.group(0)}
98:
99: # Print out the matches in alphabetical order
100: print('\n'.join(sorted(list(matches))))
The overall idea behind wordle
is to create a set of regexes, each of which does the following:
- Puts the green letters in the positions they must be.
- Prevents black letters from appearing anywhere else.
- Prevents the yellow letters from appearing where they’ve been.
- Puts the yellow letters, in turn, where they can be.
The first three of these is basically what I was doing by hand in my grep
solution. The fourth is a sort of combinatoric way of dealing with the presence of the yellow letters in positions where they haven’t yet been. For the example we’ve been looking at, the regular expressions wordle
searches on are
^a[^irepcha]t[^irepch][^irepch]$
^[^irepch][^irepcha]t[^irepch]a$
^[^irepch][^irepcha]ta[^irepch]$
As you can see, the A is put sequentially in all of its possible positions: first, fourth, and fifth. Because this example has only one letter, it’s very simple, but wordle
can handle multiple yellow letters.
Suppose my first guess was LATER. What could the next guess be? Here’s the wordle
command I’d run:
wordle -g ..t.. -b er -y l1a2
Note that the -y
argument combines the yellow letters and their positions into a single string. The results are
altho cital ictal total
altos dital notal vital
aptly dotal octal
and the regexes that were searched were
^alt[^er][^er]$
^[^erl][^era]tal$
^[^erl]lt[^er]a$
^a[^era]tl[^er]$
^a[^era]t[^er]l$
^[^erl][^era]tla$
^[^erl]lta[^er]$
As you can see, the L and A are both prevented from being in the first and second positions, respectively, and are otherwise placed in all of their possible permutations.
Let’s go through the code and see what it does.
First, the options are handled by docopt
, a lovely library that parses the options from the usage message instead of creating a usage message from an options specification. It’s my favorite way of writing scripts that need options.
The newline-separated list of possible Wordle guesses is stored in $HOME/blog-stuff/wordle/wordle.txt
which is where Line 34 reads the text that we’re going to search.
Lines 40–42 parse the -g
option and create both a green
regex and a greenPositions
set of known character positions. Line 45 parses the -b
option and creates the black
string of letters that cannot be in the solution. We’ll use that to create a negated character class.
Lines 49–51 parse the -y
option and build a yellow
dictionary from it. yellow
’s keys are the yellow letters, and its corresponding values are sets of the positions of the yellow letters. Note that because Python lists are zero-based and most human beings are one-based, the positions in yellow
are one less than what’s given in the -y
argument. Note also that I’m using sets instead of lists to avoid repeating positions in the next step.
Lines 56–58 create an impossible
dictionary that extends the yellow
dictionary to include the green positions.
Lines 63–71 build a list of regexes for each character based on the green letters, the black letters and where the yellow letters can’t be. It’s basically the regex I would use in a grep
-based solution, except each character position is an item in a list. For the game at the beginning of the post, basePattern
would be
['[^irepch]', '[^irepcha]', 't', '[^irepch]', '[^irepch]']
Lines 74–91 use basePattern
and what we know about the yellow letters to build the set of regular expressions we’re going to search for. The start
variable in Line 74 is a five-character list that begins with the yellow characters and is filled out with tildes (any non-letter character would do for this fill). In Line 87, we generate all the permutations of this list using the permuatations
function from the itertools
library. This will always yield an iterator that’s 120 lists long (that’s 5 factorial), but many of the permutations will be, for our purposes, identical and can be eliminated by converting the iterator into a set.
Let’s use our LATER example to see how this works. Recall that if our first guess in the game above had been LATER, we would have executed wordle
this way:
wordle -g ..t.. -b er -y l1a2
That would give us a start
list of ['l', 'a', '~', '~', '~']
. A call to
itertools.permutations(start)
would yield these 120 lists (where I’ve collapsed the lists into strings to make it easier to read):
al~~~ a~~~l l~~a~ ~a~~l ~l~~a ~~la~
al~~~ a~~~l l~~a~ ~a~~l ~l~~a ~~la~
al~~~ a~~~l l~~~a ~a~~l ~l~~a ~~l~a
al~~~ a~~~l l~~~a ~a~~l ~l~~a ~~l~a
al~~~ la~~~ l~~~a ~a~~l ~~al~ ~~l~a
al~~~ la~~~ l~~~a ~a~~l ~~al~ ~~l~a
a~l~~ la~~~ l~~~a ~la~~ ~~al~ ~~l~a
a~l~~ la~~~ l~~~a ~la~~ ~~al~ ~~l~a
a~l~~ la~~~ ~al~~ ~la~~ ~~al~ ~~~al
a~l~~ la~~~ ~al~~ ~la~~ ~~al~ ~~~al
a~l~~ l~a~~ ~al~~ ~la~~ ~~a~l ~~~al
a~l~~ l~a~~ ~al~~ ~la~~ ~~a~l ~~~al
a~~l~ l~a~~ ~al~~ ~l~a~ ~~a~l ~~~al
a~~l~ l~a~~ ~al~~ ~l~a~ ~~a~l ~~~al
a~~l~ l~a~~ ~a~l~ ~l~a~ ~~a~l ~~~la
a~~l~ l~a~~ ~a~l~ ~l~a~ ~~a~l ~~~la
a~~l~ l~~a~ ~a~l~ ~l~a~ ~~la~ ~~~la
a~~l~ l~~a~ ~a~l~ ~l~a~ ~~la~ ~~~la
a~~~l l~~a~ ~a~l~ ~l~~a ~~la~ ~~~la
a~~~l l~~a~ ~a~l~ ~l~~a ~~la~ ~~~la
When creating permutations, each tilde is considered a separate item, which is why so many of these lists look the same. There are 6 identical lists (3 factorial) for each unique position of L and A. We don’t need the duplicates and can get rid of them with
set(itertools.permutations(start))
to reduce the number of lists down to just 20:
al~~~ la~~~ ~al~~ ~l~a~ ~~la~
a~l~~ l~a~~ ~a~l~ ~l~~a ~~l~a
a~~l~ l~~a~ ~a~~l ~~al~ ~~~al
a~~~l l~~~a ~la~~ ~~a~l ~~~la
Of course, even these 20 lists are more than we need because many of them are impossible. Anything with an L in the first position or an A in the second should be filtered out. That’s what the possible
function in Lines 81–85 and the filter
function in Line 87 are for. possible
returns False
for all the permutations that have one or more letters in an impossible
position and True
for all the others. Ultimately, the
for s in filter(possible, set(itertools.permutations(start))):
loop that starts in Line 87 goes through seven of these permutations and combines them with basePattern
to return the regexes
set that we showed above:
^alt[^er][^er]$
^[^erl][^era]tal$
^[^erl]lt[^er]a$
^a[^era]tl[^er]$
^a[^era]t[^er]l$
^[^erl][^era]tla$
^[^erl]lta[^er]$
Finally, Lines 94–97 search the wordle
string for each of these patterns in turn and collect all of them into the matches
set. Line 100 then sorts the collection of matches alphabetically and prints them out, one per line.
I could have written a script that hewed more closely to the logic of my grep
pipeline. Such a script would have created basePattern
, searched on that, and then searched that intermediate result for words that have all of the yellow letters. It would have been easier to write and might have run faster, too. But I wanted to do something new. While I’ve used the itertools
library before, I’ve never used permutations
. And the filter
function was new to me, too, despite it being built-in. I had fun writing wordle
and learned some things I may find useful in the future.
While I’m happy with my script, I hope John Gruber publishes his. He used to publish more scripty articles, and I miss them. He’s told me he might write about it, or at least publish it as a Gist, and I look forward to that. Don’t pester him about it, though—he had a tough weekend.
What’s a Twitter client?
January 20, 2023 at 5:34 PM by Dr. Drang
Thanks to yesterday’s Engadget report from Karissa Bell, we all know the official Twitter rule that Twiterrific and Tweetbot broke. It’s this bit of Section II.A of the Twitter Developer Agreement:
You will not or attempt to (and will not allow others to)
…
c) use or access the Licensed Materials to create or attempt to create a substitute or similar service or product to the Twitter Applications;
…
Of note is that this section of the agreement is entitled “Reverse Engineering and other Restrictions,” and Twitter reverse engineered it just yesterday to add this new restriction.
Michael Tsai had a good question today:
I don’t understand what this means for API users such as NetNewsWire that are not trying to create their own client. What counts as a substitute?
Because you can only read tweets through NetNewsWire that certainly shouldn’t count as a “substitute or similar service or product.” But who knows what’s going through the mind of Twitter nowadays?
On this week’s episode of Upgrade, Jason and Myke (mostly Myke) talk about how it would be a perfectly reasonable business decision for Twitter to shut down its API entirely. That goes too far, and I suspect they didn’t really mean it, because a lot of the API is valuable to Twitter’s most popular tweeters. Large, well-known organizations (like media companies) that tweet a constant stream of links to and excerpts from their own websites need the API to automate that process and keep it running smoothly. Twitter can’t kill the API without pissing off their most valuable content producers.
Which, I assume, is why the new section of the Developer Agreement is written the way it is. Rewriting the API and to make Tweetbot impossible while still giving CNN the freedom to spit out automatically generated tweets every few minutes would be really hard. Adding a provision that allows Twitter itself to decide what is and is not a “substitute or similar service or product” is really easy.
Twitter via RSS
January 15, 2023 at 10:39 AM by Dr. Drang
I canceled my Tweetbot subscription a month or two ago and don’t go to Twitter anymore—the former followed from the latter—so I haven’t been caught up in the recent mess. But it was a reminder that I wanted to share a simple tip for fellow Twitter-leavers on the off-chance you didn’t already know it.
There are plenty of Twitter accounts I still want to read that haven’t set up parallel accounts on Mastodon. For example, @BridgesCanal is just photos of bridges over English canals. Most of them are quaint old masonry arch bridges that look like something you’d see in a period drama on PBS. They’re soothing to look at, and they speak to the structural engineer in me.
I read these accounts via RSS. Once upon a time, RSS feeds were part of Twitter, but those days have long since passed. Now you have to use either the Twitter features built into your RSS reader or use a specialized service like RSS.app.1
As a NetNewsWire user, I set up its Twitter extension, which uses my Twitter account’s credentials, to access these accounts’ timelines and present them to me as if they were any other RSS subscription.
If you’re not a NetNewsWire user, your feed reader probably has the same feature but in a slightly different guise. I know Feedbin has it. I would have checked out Feedly, too, but it now presents itself as this weird AI-driven thing that frankly scares me a little.
Oh, and if you’re one of those people who chucked RSS years ago on the “Twitter is my RSS” theory, you can always come back.
Update 1/16/2023 4:44 PM
I saw today, via this post by John Voorhees, that Federico Viticci wrote about using RSS to follow Twitter last week on Club MacStories. I missed Federico’s article because I dropped my subscription to Club MacStories shortly after I got my M1 MacBook Air and moved away from the iPad and Shortcuts.
I wouldn’t be surprised if Federico also mentioned this on Connected or MacStories Unwind. If so, I missed it, as I haven’t listened to an Apple-focused podcast in several weeks. Too many other things going on in my life recently. I’ve been skipping a lot of my RSS feed, too, although luckily I did see John’s post.
Anyway, I imagine that most of the people who read me also read/listen to Federico and may have wondered why I didn’t link to him. It’s because I didn’t know until now.
-
Nota bene: I don’t have an account with RSS.app and know nothing about it other than that it can make RSS feeds from Twitter timelines. ↩
Not cheating at Wordle
January 14, 2023 at 3:53 PM by Dr. Drang
I got yesterday’s Wordle1 in three guesses and was immediately (and jokingly) accused of cheating by my family—mainly because they needed more guesses, but also because the word had a repeated letter and we usually don’t start guessing repeated letters that early in the game.
We all accuse each other of cheating from time to time, especially when the day’s winner gets the word in two guesses. But this was, I think the first time repeated letters were used as evidence. As evidence goes, it’s not bad. I normally would try to use five unique letters at this stage of the game. The only reason I didn’t was that I spent a good ten minutes after the second guess trying to come up with another word that had five unique letters and used what I knew from the board. Because I couldn’t, I went with TOTAL.
Later, when I opened my MacBook Air for the day (I always play Wordle on my phone), I decided to see if there were words I could have played other than TOTAL. As in this year-old post, I used grep
to explore, entering this at the command line:
egrep [^irepch][^irepcha]t[^irepch]{2} wordle.txt | egrep a
The pattern used by the first call to egrep
2 searches wordle.txt
—the file with all 12,972 legal guesses—for
- A first letter that wasn’t previously rejected.
- A second letter that wasn’t previously rejected or A.
- A T as the third letter.
- Two more letters that weren’t previously rejected.
The results from this first call will include words that have no As. The second call filters those out, resulting in
altos dotal lotsa outta
antas gotta lotta sutta
antsy gutta lytta total
astun jotas motza untax
autos kutas notal
botas lotas oktas
I can’t imagine any of these being a solution other than TOTAL and ANTSY. I confess that ANTSY never occurred to me. ALTOS and AUTOS are both reasonable words, but they violate the unwritten rule of no plurals. If any of the others were an actual solution, I—and most Wordle users, I suspect—would be really pissed. I mean, LOTSA?
By the way, the command shown above outputs the 22 words on separate lines. To get them in the more compact four-column form, I piped the output through the reshaping command I recently learned, rs
:
egrep [^irepch][^irepcha]t[^irepch]{2} wordle.txt | egrep a | rs -t -g4 0 4
The 0 4
tells rs
to output four columns and as many rows as necessary (the zero is kind of a placeholder that rs
ignores). The -t
tells rs
to print the output column-by-column instead of row-by-row. And the -g4
tells rs
to use a four-space gutter to separate the columns.
-
Longtime readers may remember that I copied the pre-New York Times version of Wordle and installed it on a server I control. If you play it at the Times site, what you see here will have nothing to do with game you played yesterday. ↩
-
egrep
is the same asgrep
, except it uses “extended” regular expressions. Extended regular expressions are reasonably close to Perl-compatible regular expressions, which is what I’m used to. ↩