LLM proofreading update

I’m still experimenting with ChatGPT and Claude as proofreaders for my blog posts. Because I’ve been writing more posts lately, I’ve been learning the good and the bad more rapidly than I would have expected when I started this. Time for a preliminary assessment.

Overall, ChatGPT does a better job than Claude, but each has found errors that the other missed. So far, this has cost me time but not money, as I’m still using the free versions of both. If I decide to subscribe, it will be to only one of them, so I see this as a sort of playoff round.

(I haven’t given Apple’s Writing Tools a serious test yet, mainly because I doubt they can navigate the mixture of text and code that my posts usually have. But I suppose it’s time to start and see if my doubts are justified.)

Both LLMs are pretty good at finding my most common mistakes, which are missing words (I often think I’ve typed a word that I haven’t) and inconsistent tense (which usually crops up as I rewrite part of a sentence). By “pretty good,” I mean they tend to find both real errors and hallucinatory errors. So far, the number of false positives has been acceptably small.

My two main problems have been stopping them from giving me stylistic advice and getting them to tell me clearly where the errors are. My proofreading instructions have evolved over the past couple of weeks to address these deficiencies. Here’s my current set of instructions, which I insert via Typinator:

Find typographical errors and grammatical mistakes in the following Markdown text. Do not make any changes, just tell me what you think should be changed. Ignore all embedded URLs, quoted text, and programming code. Only report actual errors, not style suggestions. I am using the linefeed character to end each line and have put line numbers at the beginning of each line; use those line numbers to report where the errors are.

Claude suggested some of this prompt. In particular, the redundancy of including “Only report actual errors, not style suggestions” in addition to “Find typographical errors and grammatical mistakes” was Claude’s doing.

The wordy final sentence is my latest attempt to solve the navigation problem. I write my posts in BBEdit with line numbers along the left side of the window.

[Image: BBEdit editing window]

As you can see, line numbers in BBEdit are based not on how the lines appear as wrapped but on where the linefeed characters (ASCII decimal 10) are. This is how I want the location of errors reported to me, and it’s been surprisingly difficult to get both ChatGPT and Claude to do this right.
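
As a rough illustration of that definition of a line, here is a small sketch that counts linefeed characters with wc -l. The sample text and the count_lines helper name are made up for this example; the tr step only trims the padding that some versions of wc put in front of the number.

```shell
# Hypothetical helper: count lines by counting linefeed characters.
count_lines() {
  wc -l | tr -d ' '
}

# The first paragraph would wrap across several rows in a narrow window,
# but it still ends with a single linefeed, so it counts as one line.
printf 'A long paragraph that would wrap across several rows in a narrow editing window.\n\nShort line.\n' | count_lines
# prints 3
```

The blank line between the two paragraphs counts as a line, too, which is why the total is three and not two.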

I started by telling them to report the line numbers of the errors. That gave me nonsensical and inconsistent line numbers. Then I told them to use the linefeed character as the line separator, and they still came back with nonsense. I had to repeat the instruction and tell them to do the analysis again. That second instruction in the same session managed to get both of them to give me correct line numbers, but ChatGPT spent a lot of time—maybe as long as a minute—doing the second analysis.

I asked ChatGPT why it took so long to process the text with proper line numbers in the analysis. Here’s what it said:

Good question — it’s because I have to process the text exactly as you specify (counting literal linefeed characters, not wrapped lines), which means I can’t rely on my usual tokenization or formatting tools. I have to manually parse and number every line in the order they appear, skipping over things like embedded HTML or math markup but still counting their linefeeds. That’s slower and more error-prone than my normal text analysis, but it’s the only way to follow your instruction precisely.

Apparently ChatGPT doesn’t know how to use wc -l. I asked it for the best way to handle this problem. It suggested four options:

  1. I could add section markers to the text. It would then report errors as “section 2, paragraph 3.” This was a nonstarter. The point is to analyze the text as written.
  2. It could report character offsets like “character 1,085.” Yeah, that’s very helpful.
  3. It could present to me a version of the text with line numbers at the beginning of each line. After I confirm the line numbering, it could then report errors using those numbers. This was tempting, but I didn’t really want my text spit back at me for confirmation. And what if I disagree with the line numbers it generated?
  4. I could give it the text with line numbers already in place at the start of each line. I was going to reject this suggestion until I realized how easily I could add line numbers to what I paste into ChatGPT without changing the text in BBEdit.

So I chose the fourth option and made this simple Keyboard Maestro macro to insert line numbers before pasting:

[Image: KM Paste line-numbered text]

The macro is available from the Keyboard Maestro menu only when ChatGPT or Claude is the active application. The single-line shell script that adds the line numbers is

nl -ba | sed -E 's/^ +//'

The nl command adds the line numbers. The -ba option tells it to number all the lines instead of just the nonblank lines, which makes the line numbering consistent with BBEdit’s. The sed command then strips the leading space characters, which I thought might confuse ChatGPT, leaving just the line number and a tab character before each line proper.
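
To see what the pipeline produces, here is a minimal sketch that runs it on three lines of made-up text. The number_lines wrapper and the sample input are only for illustration; the macro itself just uses the one-liner.

```shell
# Hypothetical wrapper around the one-liner, named here for illustration only.
number_lines() {
  nl -ba | sed -E 's/^ +//'
}

# Sample input: two lines of text with a blank line between them.
printf 'First line\n\nThird line\n' | number_lines
```

Each output line starts with its number and a tab, and the blank line gets number 2 instead of being skipped, so "Third line" is reported as line 3, the same as it would be in BBEdit.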

Because the first action puts the line-numbered text on the clipboard, I added the second action to remove it, leaving the clipboard the way it was before the macro was called.

Although I wrote this macro based on ChatGPT’s responses, it works with Claude, too. Both LLMs now give me the line numbers I want.

Does it seem right that computer systems that use billions of dollars of hardware, electricity, and clean water need my help to add line numbers to a few kilobytes of plain text? No, it does not. But at least this gives me deterministic results for one part of the processing.

I still have my Mac recite my posts to me after passing them through the LLMs. This isn’t much help in finding mistakes anymore, but it still leads me to rewrite paragraphs here and there.