Other wavy paths

I got some good replies on Mastodon after Saturday’s post. Longtime friend of the blog Nathan Grigg said he’s always assumed that GPS-measured lengths would be long because no matter how straight your path, the GPS error would make it zigzagged. Then wherami and thvedt pointed out that the official rules of these events say that the race’s distance is supposed to be measured along the shortest path. If the road is curving left, it should be measured along the left edge; if the road is curving right, it should be measured along the right edge. This makes sense, as you don’t want runners to be able to run less than the official distance. The result is that basically everyone goes farther than the official course distance.

As you can see from this Fitness app screenshot of my route, the 5k at the Morton Arboretum has both right and left curves, so the route measurement must be done carefully if it’s to be by the rules.

Morton Arboretum 5k route

I began thinking about calculating the length of a zigzag path along a road that went in a circuit. The simplest circuit is a circle, and I thought it would be easier to define my path as a sinusoid within the roadway. Like this:

Wavy path around a circle

Following the principle that you should walk before you run, I started with a simpler problem: a sinusoidal path along a straight road.

Wavy path along a straight road

Taking the length of the road as d and the width as w, I defined the wavy path as

$$y = \frac{w}{2}\left(1 - \cos\frac{n\pi x}{d}\right)$$

where the x axis runs along the bottom edge of the road, and the y axis runs across the road.

A differential length of arc is

$$ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + \left(\frac{n\pi w}{2d}\right)^2 \sin^2\frac{n\pi x}{d}}\;dx$$

Integrating this from 0 to d gives us the length of the wavy path. This can be done through elliptic integrals, but I’ve never felt comfortable with them, so I just did it numerically, using d=5000, w=5, and plugging in different values of n until I got a result of 5070, which is, as you might recall, the distance my watch gave as I finished the race.

The answer I got was n=151.5. Here’s the Mathematica code that got me there:

Mathematica solution for straight road
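For readers without Mathematica, here is a rough Python sketch of the same trial-and-error calculation. It is not the code in the screenshot: the `simpson` helper, the function names, and the choice of 20,000 intervals are my own assumptions, and only the standard library is used.

```python
import math

def simpson(f, a, b, intervals=20000):
    """Composite Simpson's rule on [a, b]; `intervals` must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def straight_road_length(n, d=5000.0, w=5.0):
    """Length of y = (w/2)(1 - cos(n*pi*x/d)) for x from 0 to d."""
    k = n * math.pi * w / (2 * d)  # amplitude of dy/dx

    def ds_dx(x):
        return math.sqrt(1 + (k * math.sin(n * math.pi * x / d)) ** 2)

    return simpson(ds_dx, 0.0, d)

# Trial and error, as in the post: adjust n until the length is about 5070.
for n in (140, 150, 151.5, 160):
    print(f"n = {n:6.1f}   length = {straight_road_length(n):8.2f} m")
```

The interval count is generous on purpose. Each half-wave of the path is about 33 m long, so a step of 0.25 m resolves it comfortably.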

This is slightly less than the n=167.9 I got in Saturday’s analysis, where I took the path to be a series of straight-line segments. The lower number for a sinusoid makes sense. The path distance from one edge of the road to another is longer when following a sinusoid than when going in a straight line, so it takes fewer zigs and zags to get to a particular distance.
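A rough check on that reasoning, assuming the waviness is gentle enough that a first-order expansion of each square root is adequate (this approximation is my addition, not part of either analysis). Here w plays the role that b did in Saturday's post, and the average of sin² over the road is taken as 1/2:

$$L_{\text{sine}} \approx d\left[1 + \frac{1}{4}\left(\frac{n\pi w}{2d}\right)^2\right] = d + \frac{\pi^2 n^2 w^2}{16\,d}, \qquad L_{\text{zigzag}} = \sqrt{d^2 + n^2 w^2} \approx d + \frac{n^2 w^2}{2d}$$

For the same n, the extra length of the sinusoid is about π²/8 ≈ 1.23 times that of the zigzag. Matching the same extra length therefore takes about 2√2/π ≈ 0.90 as many crossings, and 0.90 × 167.9 ≈ 151, which is close to the numerical result.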

Now let’s tackle the problem of a wavy path along a circular road. Polar coordinates seem like our best bet for this. We’ll define the radial coordinate of the wavy path as

$$r = \left(r_i + \frac{w}{2}\right) - \frac{w}{2}\cos\frac{n\theta}{2}$$

where

$$r_i = \frac{5000}{2\pi}$$

is the radius of the inside edge of the circle, the circumference of which is the 5000 m length of the race. We’ll use w=5 as before.

In polar coordinates, differential arc length has a more complicated definition:

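$$ds = \sqrt{dr^2 + (r\,d\theta)^2} = \sqrt{\left(\frac{nw}{4}\right)^2 \sin^2\frac{n\theta}{2} + \left[\left(r_i + \frac{w}{2}\right) - \frac{w}{2}\cos\frac{n\theta}{2}\right]^2}\;d\theta$$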

Numerical integration of this over θ from 0 to 2π with different values of n led to a solution of n=133.4 to get a path length of 5070. Here’s a screenshot of the Mathematica code:

Mathematica solution for a circular road

I used t for θ because it’s easier to type.

That this value of n is smaller than for the straight road makes sense because all of the path is beyond the inner edge of the roadway. The waviness is centered on a path that’s already longer than the course.

I suppose I could have set up iterative solutions in Mathematica to get the values of n that led to path lengths of 5070. But NIntegrate worked so quickly that it was faster to just work my way to n via trial-and-error.
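As an illustration of what such an iterative solution might look like, here is a minimal Python sketch for the circular road. It is not a translation of the Mathematica code: the Simpson's-rule integrator, the bisection search, the bracket of 100 to 200, and all the names are assumptions of mine.

```python
import math

W = 5.0                         # road width, m
R_INNER = 5000 / (2 * math.pi)  # radius of the inner edge, m

def simpson(f, a, b, intervals=20000):
    """Composite Simpson's rule on [a, b]; `intervals` must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def circular_road_length(n):
    """Length of the wavy path r(theta) for theta from 0 to 2*pi."""
    def ds_dtheta(t):
        r = (R_INNER + W / 2) - (W / 2) * math.cos(n * t / 2)
        dr_dt = (n * W / 4) * math.sin(n * t / 2)
        return math.hypot(dr_dt, r)

    return simpson(ds_dtheta, 0.0, 2 * math.pi)

def solve_n(target, lo=100.0, hi=200.0, tol=1e-3):
    """Bisection; assumes length(lo) < target < length(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if circular_road_length(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = solve_n(5070.0)
print(f"n = {n:.1f}, length = {circular_road_length(n):.2f} m")
```

Bisection is slower than a Newton-style method but needs no derivative, and each evaluation of the integral is cheap enough that the difference hardly matters here.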

I should also mention that Mathematica has an ArcLength function, which seemed at first like the right way to go. But it was extremely slow, possibly because it was trying to get an analytical solution. By doing a little thinking to get the equations for the differential arc length, I saved myself a lot of time.


Zigzag 5k

I did the Fall Color 5k Run and Walk at the Morton Arboretum this morning. I say “did” to downplay the fact that I don’t run races like this, I walk them. I got terrible shin splints 20+ years ago and haven’t run more than a few hundred meters at a time since then. But I do try to walk at a decent clip. My goal is to do these races in under 45 minutes, and I managed that this morning. Here’s a screenshot of my watch as I crossed the finish line.

Watch screenshot at end of 5k

The time of 43:38 matched my official time to the second, so I’m confident the start/stop I did on my watch was accurate. What’s off, though, is the distance. My watch says I went 70 meters more than 5k. How can that be?

The obvious answers are that either my watch is slightly off through an accumulation of small GPS errors or the course distance is slightly off because it wasn’t measured perfectly. Could also be some combination of both.

But these answers—although probably right—are dull and don’t let me try out any calculations. Because I started the race late (very long traffic backup at the entrance; also today was the first time I slept past 6:30 for as long as I can remember), I had to pass a lot of people on the route. Could it be that my zigzag path along the road through the Arboretum added 70 meters to my walk?

Let’s simplify the problem to make it tractable. Imagine a straight road 5 m wide and 5000 m long. I take a zigzag path down the road, with each zig and zag of the same length and the full width of the road. How many of these zigzags are needed to end up with a path length of 5070 m, and is that answer reasonable?

Zigzag path diagram

We’ll take n as the number of zigs and zags, so

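$$d = na$$

The total length of my walk is

$$\ell = n\sqrt{a^2 + b^2} = n\sqrt{\frac{d^2}{n^2} + b^2} = \sqrt{d^2 + n^2 b^2}$$

With d = 5000 m, b = 5 m, and ℓ = 5070 m, what is n?

Since

$$\ell^2 = d^2 + n^2 b^2$$

we can solve for n:

$$n = \sqrt{\frac{\ell^2 - d^2}{b^2}}$$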

Now, when you’re taking the difference of two numbers that are relatively close to one another, you’re supposed to be careful about losing significant digits. Here, ℓ and d differ by enough that we really don’t have to worry about that, but let’s be good boys and girls anyway.

Rewrite ℓ as

$$\ell = d + \epsilon$$

Then

$$\ell^2 = d^2 + 2d\epsilon + \epsilon^2$$

and

$$\ell^2 - d^2 = 2d\epsilon + \epsilon^2$$

so

$$n = \sqrt{\frac{2d\epsilon + \epsilon^2}{b^2}}$$

and we won’t lose any significant digits because we’re adding, not subtracting.

Plugging in the numbers, we get

$$n = \sqrt{\frac{2 \cdot 5000 \cdot 70 + 70^2}{5^2}} = 167.9$$

Is it reasonable to think I zigged and zagged 168 times? No. That’s one full crossing of the road every 30 meters or so, which is ridiculous.
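The arithmetic is easy to check in a few lines of Python. This is only a sketch. The variable names are mine, with `ell` standing for the measured path length and `eps` for the 70 m excess.

```python
import math

d = 5000.0    # road length, m
b = 5.0       # road width, m
ell = 5070.0  # path length reported by the watch, m

# Direct form: subtracts two large squares.
n_direct = math.sqrt(ell**2 - d**2) / b

# Rearranged form: only additions under the square root.
eps = ell - d
n_stable = math.sqrt(2 * d * eps + eps**2) / b

spacing = d / n_stable  # distance along the road per crossing, m

print(f"n (direct)     = {n_direct:.1f}")
print(f"n (rearranged) = {n_stable:.1f}")
print(f"one crossing every {spacing:.1f} m")
```

With numbers this far apart the two forms agree, which is the point made above: the rearrangement is good practice rather than a necessity here.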

So we’re left with the duller but more reasonable answer that my 5.07 km distance is some combination of measurement errors. Disappointing.


LLM proofreading update

I’m still experimenting with ChatGPT and Claude as proofreaders for my blog posts. Because I’ve been writing more posts lately, I’ve been learning the good and the bad more rapidly than I would have expected when I started this. Time for a preliminary assessment.

Overall, ChatGPT does a better job than Claude, but they both have found errors that the other has missed. So far, this has cost me time but not money, as I’m still using the free versions of both. If I decide to subscribe, it will be to only one of them, so I see this as a sort of playoff round.

(I haven’t given Apple’s Writing Tools a serious test yet, mainly because I doubt they can navigate the mixture of text and code that my posts usually have. But I suppose it’s time to start and see if my doubts are justified.)

Both LLMs are pretty good at finding my most common mistakes, which are missing words (I often think I’ve typed a word that I haven’t) and inconsistent tense (which usually crops up as I rewrite part of a sentence). By “pretty good,” I mean they tend to find both real errors and hallucinatory errors. So far, the number of false positives has been acceptably small.

The two main problems I’ve had have been stopping them from giving me stylistic advice and getting them to give me good directions on where the errors are. My proofreading instructions have evolved over the past couple of weeks to address these deficiencies. Here’s my current set of instructions, which I insert via Typinator:

Find typographical errors and grammatical mistakes in the following Markdown text. Do not make any changes, just tell me what you think should be changed. Ignore all embedded URLs, quoted text, and programming code. Only report actual errors, not style suggestions. I am using the linefeed character to end each line and have put line numbers at the beginning of each line; use those line numbers to report where the errors are.

Claude suggested some of this prompt. In particular, the redundancy of including “Only report actual errors, not style suggestions” in addition to “Find typographical errors and grammatical mistakes” was Claude’s doing.

The wordy final sentence is my latest attempt to solve the navigation problem. I write my posts in BBEdit with line numbers along the left side of the window.

BBEdit editing window

As you can see, line numbers in BBEdit are based not on how the lines appear as wrapped but on where the linefeed characters (ASCII decimal 10) are. This is how I want the location of errors reported to me, and it’s been surprisingly difficult to get both ChatGPT and Claude to do this right.

I started by telling them to report the line numbers of the errors. That gave me nonsensical and inconsistent line numbers. Then I told them to use the linefeed character as the line separator, and they still came back with nonsense. I had to repeat the instruction and tell them to do the analysis again. That second instruction in the same session managed to get both of them to give me correct line numbers, but ChatGPT spent a lot of time—maybe as long as a minute—doing the second analysis.

I asked ChatGPT why it took so long to process the text with proper line numbers in the analysis. Here’s what it said:

Good question — it’s because I have to process the text exactly as you specify (counting literal linefeed characters, not wrapped lines), which means I can’t rely on my usual tokenization or formatting tools. I have to manually parse and number every line in the order they appear, skipping over things like embedded HTML or math markup but still counting their linefeeds. That’s slower and more error-prone than my normal text analysis, but it’s the only way to follow your instruction precisely.

Apparently ChatGPT doesn’t know how to use wc -l. I asked it for the best way to handle this problem. It suggested four options:

  1. I could add section markers to the text. It would then report errors as “section 2, paragraph 3.” This was a nonstarter. The point is to analyze the text as written.
  2. It could report character offsets like “character 1,085.” Yeah, that’s very helpful.
  3. It could present to me a version of the text with line numbers at the beginning of each line. After I confirm the line numbering, it could then report errors using those numbers. This was tempting, but I didn’t really want my text spit back at me for confirmation. And what if I disagree with the line numbers it generated?
  4. I could give it the text with line numbers already in place at the start of each line. I was going to reject this suggestion until I realized how easily I could add line numbers to what I paste into ChatGPT without changing the text in BBEdit.

So I chose the fourth option and made this simple Keyboard Maestro macro to insert line numbers before pasting:

KM Paste line-numbered text

The macro is available from the Keyboard Maestro menu only when ChatGPT or Claude is the active application. The single-line shell script that adds the line numbers is

nl -ba | sed -E 's/^ +//'

The nl command adds the line numbers. The -ba option tells it to number all the lines instead of just the nonblank lines, which makes the line numbering consistent with BBEdit’s. The sed command then strips the leading space characters, which I thought might confuse ChatGPT, leaving just the line number and a tab character before each line proper.
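For anyone who would rather not shell out, the same transformation can be sketched in Python. This is an approximation of what the pipeline produces, not part of the macro. The function name `number_lines` is hypothetical, and it assumes the text uses bare linefeeds as line endings.

```python
def number_lines(text: str) -> str:
    """Prefix every line, blank or not, with its 1-based number and a tab."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing linefeed ends the last line; it doesn't start a new one
    return "".join(f"{i}\t{line}\n" for i, line in enumerate(lines, start=1))

sample = "First paragraph.\n\nSecond paragraph.\n"
print(number_lines(sample), end="")
```

Splitting on "\n" rather than calling `splitlines()` is deliberate: `splitlines()` also breaks on characters such as form feeds, which would throw the count off from BBEdit's.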

Because the first action puts the line-numbered text on the clipboard, I added the second action to remove it, leaving the clipboard the way it was before the macro was called.

Although I wrote this macro based on ChatGPT’s responses, it works with Claude, too. Both LLMs now give me the line numbers I want.

Does it seem right that computer systems that use billions of dollars of hardware, electricity, and clean water need my help to add line numbers to a few kilobytes of plain text? No, it does not. But at least this gives me deterministic results for one part of the processing.

I still have my Mac recite my posts to me after passing them through the LLMs. This isn’t much help in finding mistakes anymore, but it still leads me to rewrite paragraphs here and there.


Ties and percentages

I was looking at the NFC standings yesterday, particularly the NFC North, the Chicago Bears’ division. The hated Packers are at the top of the division with a 4-1-1 record and a winning percentage of .750.

NFC Standings from Week 7

(Screenshot taken this morning from ESPN.)

How does a 4-1-1 record turn into .750? I guess the obvious answer is that the tie is treated as half a win and

$$\frac{4 + \tfrac{1}{2}}{6} = .750$$

But for some reason, my initial thought was more complicated. I thought of it as the weighted average of the winning percentage of non-ties with the winning percentage of ties:

$$\frac{5}{6}(.800) + \frac{1}{6}(.500) = .750$$

(There may be a philosophical argument as to whether a tie truly constitutes a .500 winning percentage, but I’m just going to accept it as a given.)

The calculation works out the same, of course, because

$$\frac{5}{6}(.800) + \frac{1}{6}(.500) = \frac{5(.800) + 1(.500)}{6} = \frac{4 + \tfrac{1}{2}}{6} = .750$$

I think this says something about me and my tendency to overcomplicate things. On the other hand, it’s always nice to be able to work out a solution in more than one way.
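Both ways of getting to .750 can be checked with exact fractions. The sketch below is mine, not anything the NFL or ESPN publishes. The function names are made up, and it assumes at least one game has been played.

```python
from fractions import Fraction

def win_pct(wins, losses, ties):
    """Tie counted as half a win: (wins + ties/2) / games."""
    games = wins + losses + ties
    return Fraction(2 * wins + ties, 2 * games)

def weighted_pct(wins, losses, ties):
    """Weighted average of the non-tie percentage and a .500 tie percentage."""
    games = wins + losses + ties
    decided = wins + losses
    # If every game was a tie, the non-tie term has zero weight anyway.
    pct_decided = Fraction(wins, decided) if decided else Fraction(0)
    return (Fraction(decided, games) * pct_decided
            + Fraction(ties, games) * Fraction(1, 2))

def sports_format(p):
    """Three decimal places with no leading zero, e.g. '.750'."""
    return f"{float(p):.3f}".lstrip("0")

packers = (4, 1, 1)
print(sports_format(win_pct(*packers)), sports_format(weighted_pct(*packers)))
```

Using `Fraction` keeps the comparison exact, so the two functions can be tested for equality without worrying about floating-point rounding.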

If you find referring to numbers like .750, .800, and .500 as “percentages” a tiny bit off-putting, you’re my kind of people. But it’s common in sports to divide one number by another and call it a percentage, a tribute to the elasticity of language.

Two other common features of sports percentages:

  1. They are always (?) written to three decimal places. I think this was arrived at through experience—enough digits to distinguish close races but not enough to waste space.
  2. They are always spoken as if the decimal point isn’t there. The examples above are “seven-fifty,” “eight hundred,” and “five hundred.”

Baseball also uses “average” to express one number divided by another: batting average and earned run average come to mind. But it also has the slugging percentage, which isn’t remotely like a percentage.

I’m a big fan of “modulus,” mainly because of its many uses in engineering mechanics, e.g., Young’s modulus, bulk modulus, and section modulus. But I don’t expect to see it on ESPN anytime soon.