Lagrange and the JWST

The James Webb Space Telescope reached its destination today. Since its launch a month ago, there’s been a lot of explanation on space-centric sites of what and where the Lagrange points are. Most of these have neat animations or other graphics, and they all talk about how L2 (the JWST’s Lagrange point) is about 1.5 million kilometers from Earth. But they tend to avoid the math that explains why L2 is at that distance. Longtime readers of ANIAT know that avoiding the math is not what I do here.

Longtime readers of ANIAT also know that I’ve written about Lagrange points a few times already. In one of those, I showed how to find all five Lagrange points and included one of my favorite graphs.

Contour lines for Lagrange points

This is a contour plot of the potential energy of a small satellite, and the stationary points (i.e., points where the slope is zero) are the Lagrange points. The great thing about doing an energy-based solution is that you don’t have to know roughly where the Lagrange points are before you start—their existence and positions fall out of the analysis naturally. The downside is that the math gets complicated.

So let’s cheat to make the math simpler. We know the L2 point is beyond Earth’s orbit on the line that connects the Sun and the Earth. We’ll start there and figure out how far out from the Earth it is.

Here’s the layout:

Lagrange point 2

The orbits of the Earth and L2 are assumed circular (pretty close to true), and whatever is at L2 is taken to be so small that it doesn’t interfere with the Earth’s orbit in any measurable way (definitely true).

Let’s start by looking at the Earth’s orbit. Newton’s second law says the force acting on the Earth is equal to its mass times its acceleration. The force comes from Newton’s law of gravitation,

\[\frac{G M m}{r^2}\]

where \(G\) is the universal gravitational constant, \(M\) is the mass of the Sun, \(m\) is the mass of the Earth, and \(r\) is the distance between the two. The force is directed toward the Sun. Because the Earth is taken to be moving in a circle of radius \(r\) at constant angular velocity, its acceleration is

\[r \omega^2\]

where \(\omega\) is the angular velocity. Like the force, this acceleration points toward the Sun.

Putting these together, we get

\[\frac{G M m}{r^2} = m r \omega^2\]

and so

\[\omega^2 = \frac{G M}{r^3}\]

Now let’s look at our satellite at L2. It’s seeing two gravitational forces, one from the Sun

\[\frac{G M \tilde{m}}{(r + d)^2}\]

and one from the Earth

\[\frac{G m \tilde{m}}{d^2}\]

where \(\tilde{m}\) is the mass of the satellite. By definition, L2 orbits at the same angular velocity, \(\omega\), as the Earth, so

\[\frac{G M \tilde{m}}{(r + d)^2} + \frac{G m \tilde{m}}{d^2} = \tilde{m} (r + d) \omega^2 = \tilde{m} (r + d) \frac{G M}{r^3}\]

We’re now going to introduce a couple of nondimensional quantities:

\[\mu = \frac{m}{M} \qquad \text{and} \qquad \delta = \frac{d}{r}\]

Plugging these definitions into the equation for the satellite lets us eliminate \(m\) and \(d\) to get

\[\frac{G M \tilde{m}}{r^2 (1 + \delta)^2} + \frac{G M \mu \tilde{m}}{\delta^2 r^2} = \tilde{m} r(1 + \delta) \frac{G M}{r^3}\]

We can cancel out \(G\), \(M\), \(\tilde{m}\), and \(r\) from this equation, leaving

\[\frac{1}{(1 + \delta)^2} + \frac{\mu}{\delta^2} = (1 + \delta)\]

or, after multiplying through by \((1 + \delta)^2\),

\[1 + \mu \left( \frac{1}{\delta} + 1 \right)^2 = (1 + \delta)^3\]

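From NASA’s Sun Fact Sheet, we see that \(\mu = 3.00\times 10^{-6}\), a very small number. That means \(\delta\) will also be a small number, so we can keep just the leading terms, \((1 + \delta)^3 \approx 1 + 3\delta\) and \((1/\delta + 1)^2 \approx 1/\delta^2\), and approximate the above equation with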

\[1 + \frac{\mu}{\delta^2} \approx 1 + 3 \delta\]

and therefore

\[\mu \approx 3 \delta^3\]

Plugging in the value for \(\mu\) gives us

\[\delta \approx 0.01\]

so the distance from the Earth to L2 is one one-hundredth of the distance from the Sun to the Earth. Since the Earth is about 150 million kilometers from the Sun, L2 is about 1.5 million kilometers from the Earth. Which is what everyone has been saying, but now you can prove it.
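If you want to see how good the small-\(\delta\) approximation is, here’s a minimal Python sketch that solves the nondimensional equation numerically by bisection. It uses the \(\mu = 3.00\times 10^{-6}\) from above; the Sun–Earth distance of 1.496×10⁸ km is my assumed value for “about 150 million kilometers,” and the function names are just mine.

```python
"""Solve 1/(1+δ)^2 + μ/δ^2 = 1 + δ for the L2 distance ratio δ."""

MU = 3.00e-6      # Earth/Sun mass ratio (from the text)
AU_KM = 1.496e8   # assumed Sun-Earth distance in kilometers


def residual(delta, mu=MU):
    """Left side minus right side of the nondimensional L2 equation."""
    return 1 / (1 + delta) ** 2 + mu / delta ** 2 - (1 + delta)


def bisect(f, lo, hi, tol=1e-14):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # sign change is in the upper half
        else:
            hi = mid                # sign change is in the lower half
    return (lo + hi) / 2


# Every term of the residual decreases as delta grows, so it has exactly one
# root for delta > 0; it is positive at 0.001 and negative at 0.1.
delta_exact = bisect(residual, 1e-3, 0.1)
delta_approx = (MU / 3) ** (1 / 3)   # from mu ≈ 3 delta^3

if __name__ == "__main__":
    print(f"approximate delta: {delta_approx:.6f} ({delta_approx * AU_KM:,.0f} km)")
    print(f"exact delta:       {delta_exact:.6f} ({delta_exact * AU_KM:,.0f} km)")
```

The exact root comes out a fraction of a percent larger than the approximation, which doesn’t budge the “about 1.5 million kilometers” answer.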

What’s the diff?

Last week, Rob Griffiths—late of Mac OS X Hints and currently of Many Tricks—asked a really tough question on the Keyboard Maestro forum:

I’m trying to run a very simple shell script:

  cd /tmp
  echo "$KMVAR_var1" > file1.txt
  echo "$KMVAR_var2" > file2.txt
  diff file1.txt file2.txt > diff_result.txt

I don’t actually want the diff results in a file, I want them in a variable returned by the shell script action. But because that was failing, I tried the above to write it to a file instead. But it still failed (with the generic failed in shell script message).

Rob learned that the macro failed only when diff was the last command in the shell script. Adding a final innocuous command, like echo "foo", to the end of it got rid of the error. And of course, the script (without the echo "foo" line) worked just fine when run from the command line.

How can this be? My first thought was that the error had something to do with interactive vs. noninteractive shells, but that led nowhere. So I made my own version of Rob’s macro and changed the last line from diff to comm:

KM Griffdiff

The shell script in the last action is

cd ~/Desktop/griffdiff
echo "$KMVAR_InstanceVar1" > file1.txt
echo "$KMVAR_InstanceVar2" > file2.txt
comm file1.txt file2.txt

This worked fine. I think of comm and diff as being similar, so success with comm and failure with diff was a real stumper. There were no clues in diff’s man page. I put the problem aside to think about in the evening.

As luck would have it, when I returned to the problem I was using my iPad, so when I decided to review the man page again, I used this online version instead of the one included with macOS. And there, down at the end of the DESCRIPTION section, was the answer:

FILES are 'FILE1 FILE2' or 'DIR1 DIR2' or 'DIR FILE' or 'FILE
DIR'.  If --from-file or --to-file is given, there are no
restrictions on FILE(s).  If a FILE is '-', read standard input.
Exit status is 0 if inputs are the same, 1 if different, 2 if trouble.

Emphasis mine. I went back to look at the error message Keyboard Maestro gave when the shell script action ended with diff. Without the timestamps it was

Execute macro “Griffdiff” from trigger Editor
Action 222451 failed: Task failed with status 1
Task failed with status 1. Macro “Griffdiff” cancelled (while executing Execute Shell Script).

The “Task failed with status 1” message was not—as I had previously thought—giving me a status code generated by Keyboard Maestro itself. Instead, KM was just passing along the status code it had received from the shell. diff had returned an exit code of 1 because the inputs were different. Keyboard Maestro then interpreted the nonzero exit code as an error and bailed out. So everything was working just as it was supposed to.

But that didn’t fix Rob’s problem. Luckily, I remembered that I’d run into a situation some time ago in which I had to turn off Keyboard Maestro’s normal error handling. I did it by changing the “Failure Aborts Macro” and “Notify on Failure” settings in the action’s gear menu from ✔︎ to ✖︎.

KM gear menu

With those two changes, the macro ran fine. Now I had two questions:

  1. How had I missed the exit status stuff when I looked at diff’s man page earlier?
  2. Why does diff return a nonzero exit code when it does exactly what you want?

The answer to the first question was easy: Here’s what macOS Catalina’s man diff says at the end of the DESCRIPTION section:

FILES  are  `FILE1  FILE2'  or `DIR1 DIR2' or `DIR FILE...' or `FILE...
DIR'.  If --from-file or --to-file is given, there are no  restrictions
on FILES.  If a FILE is `-', read standard input.

Nothing about exit status in that paragraph or anywhere else. According to the copyright notice, Catalina’s diff man page was written in 2002 (way to keep on top of things, Apple!). The online version was updated in 2019.

I’m not sure about the answer to the second question, but my guess is that it works that way so diff can be used in if statements or those short-circuit statements with && or || you often see in shell scripts, like

diff file1.txt file2.txt && echo "Identical files"

where the part after the && is executed only if the part before it returns a zero (success) exit code. Still, I found it surprising. diff is probably used most often on files that are known to be different—it’s weird that using it that way produces an exit code that typically indicates failure.

By the way, if you’re wondering about the exit status of a command, you can learn what it is by running

echo $?

immediately after the command. This works in both bash and zsh.
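The same convention matters anywhere diff is run as a subprocess. Here’s a minimal Python sketch, assuming a diff on the PATH that follows the 0/1/2 exit status convention quoted above; the helper name diff_strings is hypothetical. It treats 1 as a normal result and only other codes as errors, which is exactly the distinction Keyboard Maestro wasn’t making.

```python
import os
import subprocess
import tempfile


def diff_strings(text1, text2):
    """Run diff on two strings; return (exit status, diff output).

    Exit status 0 means identical and 1 means different. Any other status
    is treated as a real failure and raised.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path1 = os.path.join(tmp, "file1.txt")
        path2 = os.path.join(tmp, "file2.txt")
        with open(path1, "w") as f:
            f.write(text1)
        with open(path2, "w") as f:
            f.write(text2)
        result = subprocess.run(["diff", path1, path2],
                                capture_output=True, text=True)
    if result.returncode not in (0, 1):
        raise RuntimeError(f"diff failed: {result.stderr.strip()}")
    return result.returncode, result.stdout


if __name__ == "__main__":
    status, output = diff_strings("apple\nbanana\n", "apple\ncherry\n")
    print("exit status:", status)   # 1, because the inputs differ
    print(output)
```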

Wordle letters

Like all internet hipsters, I started playing Wordle a few days before the New York Times article that introduced it to the great unwashed. I don’t think I’ll stick with it for very long—the universe of five-letter words seems like something that will wear thin soon—but I am interested in the strategy. So I did a little scripting.

Clearly, the idea is to identify as many letters in the target word as quickly as possible. Letter frequencies in English text famously follow the ETAOIN SHRDLU order, an ordering that was built into Linotype keyboards back in the days of hot metal type. But Wordle isn’t based on general English text; it’s based specifically on five-letter words. So we need the letter frequencies for that restricted set.

Mac and Linux computers carry on the Unix tradition of including a file, /usr/share/dict/words, that’s used for spell checking. It’s an alphabetical list with one word per line, which is very convenient for working out letter frequencies. But first, we’ll need to pull out just the five-letter words, leaving behind any proper nouns. That can be done with a simple Perl one-liner:¹

perl -nle 'print if /^[a-z]{5}$/' /usr/share/dict/words > words5.txt

The regular expression that is the backbone of this command matches only five-letter words with no capitals. After running this, we have a file, words5.txt, that contains just the words we need for Wordle. It has about 8,500 entries.

Now that we have a file with just five-letter words, we can compute the letter frequencies with this script:

 1:  #!/usr/bin/perl
 2:
 3:  while($word = <>){
 4:    chomp $word;
 5:    foreach (split //, $word){
 6:      $freq{$_}++;
 7:    }
 8:  }
 9:
10:  foreach $letter (sort keys %freq){
11:    print "$letter\t$freq{$letter}\n";
12:  }

Lines 3–8 loop through the lines of the input file and build up a hash (or associative array) named %freq with letters as the keys and their counts as the values. Lines 10–12 then print out the hash in alphabetical order:

a   4467
b   1162
c   1546
d   1399
e   4255
f   661
g   1102
h   1323
i   2581
j   163
k   882
l   2368
m   1301
n   2214
o   2801
p   1293
q   84
r   3043
s   2383
t   2381
u   1881
v   466
w   685
x   189
y   1605
z   250

I could have included a sorting command to reorder the hash in frequency order, but it was easier to just copy the output, paste it into a spreadsheet, and do the reordering there. I also used the spreadsheet to sum the counts and present the frequencies as percentages.

Letter Count Frequency
a 4467 10.5%
e 4255 10.0%
r 3043 7.2%
o 2801 6.6%
i 2581 6.1%
s 2383 5.6%
t 2381 5.6%
l 2368 5.6%
n 2214 5.2%
u 1881 4.4%
y 1605 3.8%
c 1546 3.6%
d 1399 3.3%
h 1323 3.1%
m 1301 3.1%
p 1293 3.0%
b 1162 2.7%
g 1102 2.6%
k 882 2.1%
w 685 1.6%
f 661 1.6%
v 466 1.1%
z 250 0.6%
x 189 0.4%
j 163 0.4%
q 84 0.2%
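For anyone who’d rather skip the spreadsheet, here’s a Python sketch that does the filtering, counting, sorting, and percentage steps in one go. It’s a translation of the approach, not the Perl above, and it assumes a word file with one word per line; the function names are hypothetical.

```python
import re
import sys
from collections import Counter

FIVE_LOWER = re.compile(r"^[a-z]{5}$")   # same pattern as the Perl one-liner


def five_letter_words(lines):
    """Keep only five-letter, all-lowercase words (drops proper nouns)."""
    return [w.strip() for w in lines if FIVE_LOWER.match(w.strip())]


def letter_table(words):
    """Return (letter, count, percent of all letters), most common first."""
    counts = Counter(ch for word in words for ch in word)
    total = sum(counts.values())
    # Sort by count (highest first), breaking ties alphabetically.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(letter, n, 100 * n / total) for letter, n in rows]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        words = five_letter_words(f)
    print("Letter Count Frequency")
    for letter, n, pct in letter_table(words):
        print(f"{letter} {n} {pct:.1f}%")
```

Run it with the word file as its argument, e.g. python3 letterfreq.py /usr/share/dict/words (the script name is made up).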

Using /usr/share/dict/words as a source of words was convenient, as it was already on my computer, but I doubted it was the source of legal words in Wordle. Wouldn’t a word gamer use a Scrabble dictionary? A little searching led me to this page and this one. I copied the source code for each, opened them in BBEdit, and after a few search-and-replaces, had two new lists of five-letter words. They were nearly the same, differing by about 60 words out of 8,900. I merged the two and named the result scrabble5.txt.²

Running the letter frequency script on this new file and doing the same sorting and percentage calculations as before gave me this list:

Letter Count Frequency
s 4623 10.4%
e 4585 10.3%
a 3986 8.9%
o 2977 6.7%
r 2916 6.5%
i 2633 5.9%
l 2440 5.5%
t 2319 5.2%
n 2022 4.5%
d 1727 3.9%
u 1697 3.8%
c 1475 3.3%
y 1403 3.1%
p 1384 3.1%
m 1339 3.0%
h 1214 2.7%
g 1113 2.5%
b 1096 2.5%
k 949 2.1%
f 790 1.8%
w 690 1.5%
v 475 1.1%
z 249 0.6%
x 213 0.5%
j 186 0.4%
q 79 0.2%

The leap of s from 5.6% to 10.4% suggests plurals play a big role in Scrabble dictionaries and not much of one in /usr/share/dict/words. I checked this by running

perl -nle 'print if /s$/' words5.txt | wc -l

and

perl -nle 'print if /s$/' scrabble5.txt | wc -l

to tell me how many words end in s in each of the two files. There were 357 such words in words5.txt and 2771 in scrabble5.txt. This told me two things:

  1. Spell checkers that use /usr/share/dict/words must use algorithmic methods to deal with plurals.
  2. A lot of legal five-letter Scrabble words are just pluralized four-letter words.

I did this on Monday and was pretty happy with it until I read that Times article on Wednesday, where it said that Josh Wardle, the creator of Wordle, had started with a list of 12,000 words but then

…narrowed down the list of Wordle words to about 2,500, which should last for a few years.

That would mean my frequencies are based on a much broader set of words than Wordle considers legal, which could throw off my calculated frequencies.

And yet…

Today I sacrificed my score by trying out some oddball Scrabble words to see if Wordle would accept them. It did. Here’s my game:

Bad Wordle game

I don’t know about you, but if I were limiting myself to just 2,500 words, things like heuch and vrows wouldn’t make the cut. (I would definitely include rebus and tapir, which some dorks—I would also include dorks—have apparently complained about.) So I’m wondering if the Times got this part of the story mixed up somehow. (Update: Nope, see below.)

You might be wondering if counting the number of times each letter appears in the list of legal five-letter words is the right way to characterize the frequency of letters. Maybe we should be counting the number of words each letter appears in. This script, which uses the uniq function in the List::Util module to filter out repeated letters, does just that:

#!/usr/bin/perl

use List::Util qw(uniq);

while($word = <>){
  chomp $word;
  foreach (uniq split //, $word){
    $freq{$_}++;
  }
}

foreach $letter (sort keys %freq){
  print "$letter\t$freq{$letter}\n";
}

Using this way of counting gives us another letter frequency table:

Letter Count Frequency
s 4106 46.1%
e 3993 44.8%
a 3615 40.5%
r 2751 30.9%
o 2626 29.5%
i 2509 28.1%
l 2231 25.0%
t 2137 24.0%
n 1912 21.4%
u 1655 18.6%
d 1615 18.1%
c 1403 15.7%
y 1371 15.4%
p 1301 14.6%
m 1267 14.2%
h 1185 13.3%
g 1050 11.8%
b 1023 11.5%
k 913 10.2%
f 707 7.9%
w 686 7.7%
v 465 5.2%
z 227 2.5%
x 212 2.4%
j 184 2.1%
q 79 0.9%

In this table, the frequency column gives the percentage of words in which each letter appears. The ordering of the letters is basically the same as before, so I don’t think this way of counting will change your strategy.
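Here’s the same per-word counting as a Python sketch, a translation rather than the Perl above, with a hypothetical function name. Using set(word) plays the role of uniq, so each word contributes at most once per letter. Because a single word can contain up to five different letters, these percentages add up to well over 100%, as they do in the table.

```python
from collections import Counter


def word_letter_table(words):
    """Return (letter, words containing it, percent of words), most common first.

    Each word counts at most once per letter, like the Perl uniq version.
    """
    counts = Counter(ch for word in words for ch in set(word))
    n_words = len(words)
    # Sort by count (highest first), breaking ties alphabetically.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(letter, n, 100 * n / n_words) for letter, n in rows]


if __name__ == "__main__":
    for letter, n, pct in word_letter_table(["apple", "eerie", "sassy"]):
        print(f"{letter} {n} {pct:.1f}%")
```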

Update 1/8/2022 5:29 PM
People have gently tweeted me that the Wordle source code—which you can easily download, and I could have easily downloaded before writing this post—has two lists. One is words that might be answers (2,315), and the other is additional words that can be guessed (10,657). You can run my scripts on either of these lists (or their concatenation) to refine your strategies. If you’re a serious Wordle player, think carefully about whether doing so would spoil your fun.

Thanks to Tim Dierks and Antonio Bueno.

Update 1/9/2022 11:22 AM
Todd Wells devised a way to grade five-letter words according to how well they match a decent-sized corpus of five-letter words. His grade is based on both the number of letters that match and whether they’re in the right position. Both his Python code and his explanation of how it works are very clear, and you should go take a look.

I suppose I should have said this in the original post, but I’ll say it here: These methods of scoring letters and words are really just for your first—and maybe your second—guess (Todd makes this explicit by naming his script wordle_starting_guess). They are ways of increasing the probability of “hits” early in the game. After that, it’s a matter of vocabulary and logic to bring you home.
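As a rough illustration of scoring a first guess (this is not Todd’s method, which also accounts for letter position), one simple approach is to add up the per-word frequencies of a candidate’s distinct letters. Here’s a minimal sketch with hypothetical names, using the top ten per-word percentages from the last table; letters outside that list score zero.

```python
# Per-word letter frequencies (percent of words) for the top ten letters
# in the per-word table above.
WORD_FREQ = {
    "s": 46.1, "e": 44.8, "a": 40.5, "r": 30.9, "o": 29.5,
    "i": 28.1, "l": 25.0, "t": 24.0, "n": 21.4, "u": 18.6,
}


def guess_score(word, freq=WORD_FREQ):
    """Sum the frequencies of a word's distinct letters.

    A repeated letter counts once, since it can't reveal an additional
    letter. Letters missing from freq score zero.
    """
    return sum(freq.get(ch, 0.0) for ch in set(word))


def best_guesses(candidates, k=3):
    """Return the k highest-scoring candidate words."""
    return sorted(candidates, key=guess_score, reverse=True)[:k]


if __name__ == "__main__":
    print(best_guesses(["eerie", "arose", "fuzzy", "tapir"]))
```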

  1. Please don’t tweet me shell commands that can do this with less typing. I know they exist, but this was the most efficient for me because I could type it out with virtually no thought at all. 

  2. You may remember I did a similar thing a couple of years ago to create lists of words with 6–9 letters to help me cheat at Countdown.

Automating the annotation of PDFs

A few days ago, on the Automators podcast forum, thatchrisharper asked about a way to automatically add the filename to the first page of a PDF. While I knew of many tools that allow you to overlay one PDF on top of another, I didn’t know of any to directly add text to a specific spot of a PDF. But it was the sort of thing I’ve occasionally needed to do, so I went looking for a solution. This post, most of which is in my answer to thatchrisharper’s question, is what I found.

The solution comes from a combination of Ghostscript, which you can install through Homebrew, and pdfMark (or maybe pdfmark without the intercap—the naming isn’t consistent), a system created by Adobe and described this way:

The pdfmark operator is a PostScript-language extension that describes features that are present in PDF, but not in standard PostScript.

Basically, pdfMark was a way for Adobe’s Distiller application to add PDF-specific bits (like annotations) to PostScript files as it was converting them to PDF. It came out back in the 90s, when PostScript was well established, but PDF was still in its infancy. We can use Ghostscript in place of Distiller.

Let’s say we have a PDF that consists of several letter-sized pages, and we want to add some text centered in the top margin. If our original looks like this,

PDF before annotation

we want the annotated version to look like this,

PDF after annotation

where the annotation appears on the first page only. Here’s what we do:

First, create a text file (we’ll call it pdfmark.txt, but the name can be anything) with the following contents:

1:  [
2:  /Subtype /FreeText
3:  /SrcPg 1
4:  /Rect [206 758 406 774]
5:  /Color [1 1 .75]
6:  /DA (/HeBo 14 Tf 0 0 .5 rg)
7:  /Contents (My Annotation Text Here)
8:  /ANN pdfmark

This file can be saved anywhere, but for convenience we’ll assume it’s in the same folder as the original PDF, which I will cleverly name original.pdf.

Now we run this Ghostscript command,

gs -dBATCH -dNOPAUSE -dQUIET -sDEVICE=pdfwrite -sOutputFile=annotated.pdf  pdfmark.txt original.pdf

and we end up with a new PDF, annotated.pdf, with the annotation shown above.

You can probably figure out what each line of pdfmark.txt does, but let’s run through it anyway.

The opening bracket on Line 1 is the necessary start of every pdfMark command. If you go looking for the matching closing bracket, you won’t find one. If you want to know why there’s no closing bracket, you’ll have to ask Adobe. Seems like really dumb syntax to me.

Line 2 declares this mark to be of the FreeText subtype. There are over a dozen subtypes you can use; see page 16 of the manual.

Line 3 tells the annotation to appear on the first page of the output document. As far as I can tell, there’s no convenient way to extend this command to multiple pages. /SrcPg can only be followed by a single integer argument, so if you want the same thing on several pages, your pdfmark.txt file will have to have this command repeated for each page.
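Since /SrcPg takes only one page number, the repetition is easier to generate than to type. Here’s a minimal Python sketch (the function name is hypothetical) that emits one FreeText mark per page using the same keys as pdfmark.txt. It assumes the annotation text has no backslashes or unbalanced parentheses, which would need escaping inside a PostScript string.

```python
def pdfmark_for_pages(text, pages, rect=(206, 758, 406, 774)):
    """Return pdfmark source with one FreeText annotation per listed page."""
    llx, lly, urx, ury = rect
    blocks = []
    for page in pages:
        blocks.append(
            "[\n"
            "/Subtype /FreeText\n"
            f"/SrcPg {page}\n"
            f"/Rect [{llx} {lly} {urx} {ury}]\n"
            "/Color [1 1 .75]\n"
            "/DA (/HeBo 14 Tf 0 0 .5 rg)\n"
            f"/Contents ({text})\n"
            "/ANN pdfmark\n"
        )
    return "".join(blocks)


if __name__ == "__main__":
    # Marks for pages 1 through 3; redirect stdout to pdfmark.txt.
    print(pdfmark_for_pages("My Annotation Text Here", range(1, 4)), end="")
```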

Lines 4 and 5 define the bounding box for the annotation and set its background color. PostScript and PDF coordinates are in points (1/72 inch) with the origin at the lower left corner of the page. Unlike a lot of graphics formats, but like most graphs you see in math class, the y-coordinate increases as you go up. A letter-sized page is 612 points wide and 792 points tall, so the bounding box in Line 4 is 200 points wide, centered left/right, and its top edge is ¼ inch down from the top of the page. Colors are defined by a red-green-blue triplet of numbers that run from 0 to 1. Black is 0 0 0 and white is 1 1 1. White is the default, so if you leave out Line 5, it’s equivalent to /Color [1 1 1].
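The arithmetic behind Line 4 is easy to wrap in a function if you want boxes of other sizes. This is a minimal sketch with a hypothetical name; the 16-point box height is read off Line 4 (774 minus 758), and the page defaults are the letter-size dimensions given above.

```python
POINTS_PER_INCH = 72


def centered_top_rect(box_width, box_height, top_margin,
                      page_width=612, page_height=792):
    """Return [llx, lly, urx, ury] for a box centered left/right near the top.

    Units are points, origin at the lower left, y increasing upward.
    top_margin is the distance from the page's top edge to the box's top edge.
    """
    llx = (page_width - box_width) / 2
    urx = llx + box_width
    ury = page_height - top_margin
    lly = ury - box_height
    return [llx, lly, urx, ury]


if __name__ == "__main__":
    # 200 points wide, 16 points tall, top edge 1/4 inch below the page top
    print(centered_top_rect(200, 16, POINTS_PER_INCH / 4))
```

The same function with a 300-point width gives the [156 758 456 774] rectangle used in the script below.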

Line 6 defines the font used in the annotation. /DA means default appearance, and the rest of the line tells the text to appear in 14-point Helvetica Bold with a dark blue color.

Line 7 defines the text of the annotation between the parentheses.

Finally, Line 8 identifies the type of pdfmark as an annotation.

(There’s a nice document with other examples of pdfMark commands and another Adobe reference manual.)

So how do we automate this? Fundamentally, we create a temporary file for the pdfMark commands, run the Ghostscript command, and then delete the temporary file. Here’s a quickly written script, annotatePDF:

 1:  #!/usr/bin/env python
 2:
 3:  import os
 4:  import subprocess
 5:  import sys
 6:  import tempfile
 7:
 8:  # Set the parameters
 9:  annText = sys.argv[1]
10:  originalPDF = sys.argv[2]
11:  annotatedPDF = sys.argv[3]
12:
13:  # Build the pdfMark command
14:  pdfMarkCommand = f"""[
15:  /Subtype /FreeText
16:  /SrcPg 1
17:  /Rect [156 758 456 774]
18:  /DA (/HeBo 14 Tf)
19:  /Contents ({annText})
20:  /ANN pdfmark
21:  """
22:
23:  # Create a temporary file for the pdfMark commands and write to it
24:  fh, fpath = tempfile.mkstemp()
25:  with os.fdopen(fh, 'w') as f:
26:    f.write(pdfMarkCommand)
27:
28:  # Run the Ghostscript command to make the annotated file
29:  subprocess.run(['gs', '-dBATCH', '-dNOPAUSE', '-dQUIET', '-sDEVICE=pdfwrite', f'-sOutputFile={annotatedPDF}', fpath, originalPDF])
30:
31:  # Delete the pdfMark command file
32:  os.remove(fpath)

This is not a great script. No error handling, no options, and no way to automate the naming of the output file. But it works.

annotatePDF 'Hello, world!' original.pdf annotated.pdf

As you can see from Lines 14–21, I don’t really want a yellow box with dark blue text. That was just to show some of pdfMark’s features.
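One more gap worth noting: annText goes straight into a PostScript string, so a backslash or an unbalanced parenthesis in the annotation text could break the pdfMark file. PostScript string literals let you escape \, (, and ) with a backslash, so a small helper like this (the name ps_escape is hypothetical) could be applied to annText before it goes into pdfMarkCommand.

```python
def ps_escape(text):
    """Escape the characters that are special inside a PostScript (...) string."""
    # Do backslashes first so the escapes added for parentheses aren't doubled.
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


if __name__ == "__main__":
    print(ps_escape("Report (draft) \\ v2"))   # Report \(draft\) \\ v2
```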

By the way, the annotations you add through pdfMark are just like annotations you add in Preview or PDFpen. They can be selected, moved around, and edited. Here’s what the output file looks like in Preview after clicking on the added text.

Annotation selected in Preview