Python Countdown

One of my wastes of time is watching celebrity-based British game shows on YouTube. This started with Never Mind the Buzzzcocks, expanded to QI, and now covers a handful of shows, including 8 Out of 10 Cats Does Countdown.1 One of the puzzles in this show is to form the longest possible word from a string of nine letters. I don’t have a head for anagrams, but I do like recreational programming, so I’ve always thought it would be fun to write a program that generates solutions to the show’s “letters round.”

The obvious way to do this—generate all permutations of the nine letters, then eight of the letters, then seven of the letters, etc., and filter them through a list of real words—didn’t appeal to me. There are 362,880 ways to arrange nine letters, the same number of ways to arrange eight of the nine letters, 181,440 ways to arrange seven of the nine letters, and so on. I’m not above brute force programming when I need to get a job done, but I prefer more elegant solutions when programming for fun. So my Countdown script didn’t get written.

Then last month John C. Cook wrote this post about anagram frequency and the light went on. The trick is to start with a list of words and build a data structure that allows you to look up all the anagrams of a given string of letters. A convenient data structure for this is an associative array, called a dictionary in Python. The clever bit is that the dictionary keys are the letters of the anagrams in alphabetical order and the values are lists of anagrams corresponding to those letters.

For example, one entry for six-letter words would have the key aeprss and the value

'aspers', 'parses', 'passer', 'prases', 'repass', 'spares', 'sparse', 'spears'


So if we were searching for anagrams in the string psasre, we’d first alphabetize it to aeprss and then look it up directly in our dictionary of six-letter anagrams.

The starting point, then, is to build a dictionary of anagrams. Because I wanted a bit more structure, I built four dictionaries, one each for nine-letter, eight-letter, seven-letter, and six-letter words.2 I then assembled those into a higher-level dictionary in which the word lengths were the keys.

I got the lists of words from which to build the dictionaries from this Scrabble help site. It provides long lists of legal Scrabble words of whatever length you like. By copying the pages for words of six through nine letters and doing a little editing, I created files of words of each length, with every word on its own line. I named these files word6.txt through word9.txt. Then I ran this script:

python:
1:  from collections import defaultdict
2:  import pickle
3:
4:  def sig(word):
5:    return ''.join(sorted(word)).strip()
6:
7:  words = {}
8:  for count in range(6, 10):
9:    lines = [ x.strip() for x in open(f'word{count}.txt') ]
10:    words[count] = defaultdict(set)
11:    for word in lines:
12:      words[count][sig(word)].add(word)
13:
14:  wordsfile = open('scrabble-words', 'wb')
15:  pickle.dump(words, wordsfile)


The sig function returns the alphabetized letters of the argument string and is used to generate the key, or “signature,” of each word in the file. It’s a simplified version of the sig function John C. Cook wrote.

The script goes through each of the wordn.txt files, adding the words in that file to the subdictionary associated with its word length. The defaultdict type from the collections module is used to avoid the initialization problem that arises when using regular Python dictionaries. The anagrams are stored as sets of strings (see Line 10 for how the defaultdicts are generated), which we’ll find useful later to avoid repetition when blending sets together.

When the looping in Lines 8—12 is done, we have a dictionary of dictionaries of sets assembled in words. The anagrams we looked at earlier can be found in words[6]['aeprss'].

This complex data structure is then stored in a file called scrabble-words using Python’s pickle method of data serialization. The idea is that we create this data structure once and then use it later without having to rebuild it every time we play Countdown.3

Which brings us to the script that gives us all the anagrams in a string:

python:
1:  import pickle
2:  from itertools import combinations
3:
4:  wordfile = open('scrabble-words', 'rb')
5:  words = pickle.load(wordfile)
6:
7:  def sig(word):
8:    return ''.join(sorted(word)).strip()
9:
10:  w = input("Letters: ").strip()
11:  print()
12:  for count in range(9, 5, -1):
13:    found = set()
14:    for s in combinations(w, count):
15:      t = ''.join(s)
16:      anagrams = sorted(words[count][sig(t)])
17:      if len(anagrams) > 0:
18:        found |= set(anagrams)
19:    if len(found) > 0:
20:      print(f'{len(found)} {count}-letter word{"s" if len(found)>1 else ""}:')
21:      print(' '.join(sorted(found)))
22:    else:
23:      print(f'No {count}-letter words')
24:    print()


Line 3–4 read in the scrabble-words file and convert it back into the data structure we want. Line 10 asks for the string of letters, and the rest of the script generates the anagrams.

The loop that begins on Line 12 sets the length of anagrams to search for. It starts at nine letters and works its way down to six, which means we’ll see the highest-scoring answers first. Line 13 creates an empty set of found words and Line 14 starts a loop that goes through all the count-letter combinations of the nine given letters. These combinations are generated by the aptly named combinations function in the itertools library.

Because the combinations function returns a tuple of letters, we use Line 15 to turn the tuple into a string. Line 16 then gets all the anagrams of that string. If there are any (Line 17), they’re added to the found set through the union operation to avoid repeats (Line 18).

The rest of the script is just output. Here’s an example of a run in Pythonista on my phone:

Is this cheating? Sure, but I’m not a contestant. I think of it more as an expansion of what Susie Dent does.

1. This is, oddly, an amalgam of two shows I find unwatchable: the Family Feud-like celebrity show, 8 Out of 10 Cats and the long-running traditional game show, Countdown

2. Words of five or fewer letters are fair game in Countdown and could have been included in my program, but I wasn’t interested in words that short.

3. This could be a false efficiency. While it’s true that reading the scrabble-words file is faster than building the data structure from scratch each time—about twice as fast according to my timings—both are so fast that the difference in speed is practically unnoticeable. And the speed difference comes at a price in storage space. The four wordn.txt files take up less than one megabyte of space in aggregate, while the scrabble-words file is just over five megabytes. I may test other storage schemes to see how they balance space and speed.