Automating the annotation of PDFs

A few days ago, on the Automators podcast forum, thatchrisharper asked about a way to automatically add the filename to the first page of a PDF. While I knew of many tools that allow you to overlay one PDF on top of another, I didn’t know of any to directly add text to a specific spot of a PDF. But it was the sort of thing I’ve occasionally needed to do, so I went looking for a solution. This post, most of which is in my answer to thatchrisharper’s question, is what I found.

The solution comes from a combination of Ghostscript, which you can install through Homebrew, and pdfMark (or maybe pdfmark without the intercap—the naming isn’t consistent), a system created by Adobe and described this way:

The pdfmark operator is a PostScript-language extension that describes features that are present in PDF, but not in standard PostScript.

Basically, pdfMark was a way for Adobe’s Distiller application to add PDF-specific bits (like annotations) to PostScript files as it was converting them to PDF. It came out back in the 90s, when PostScript was well established, but PDF was still in its infancy. We can use Ghostscript in place of Distiller.

Let’s say we have a PDF that consists of several letter-sized pages, and we want to add some text centered in the top margin. If our original looks like this,

PDF before annotation

we want the annotated version to look like this,

PDF after annotation

where the annotation appears on the first page only. Here’s what we do:

First, create a text file (we’ll call it pdfmark.txt, but the name can be anything) with the following contents:

1:  [
2:  /Subtype /FreeText
3:  /SrcPg 1
4:  /Rect [206 758 406 774]
5:  /Color [1 1 .75]
6:  /DA (/HeBo 14 Tf 0 0 .5 rg)
7:  /Contents (My Annotation Text Here)
8:  /ANN pdfmark

This file can be saved anywhere, but for convenience we’ll assume it’s in the same folder as the original PDF, which I will cleverly name original.pdf.

Now we run this Ghostscript command,

gs -dBATCH -dNOPAUSE -dQUIET -sDEVICE=pdfwrite -sOutputFile=annotated.pdf  pdfmark.txt original.pdf

and we end up a new PDF, annotated.pdf, with the annotation shown above.

You can probably figure out what each line of pdfmark.txt does, but let’s run through it anyway.

The opening bracket on Line 1 is the necessary start of every pdfMark command. If you go looking for the matching closing bracket, you won’t find one. If you want to know why there’s no closing bracket, you’ll have to ask Adobe. Seems like really dumb syntax to me.

Line 2 declares this mark to be of the FreeText subtype. There are over a dozen subtypes you can use; see page 16 of the manual.

Line 3 tells the annotation to appear on the first page of the output document. As far as I can tell, there’s no convenient way to extend this command to multiple pages. /SrcPg can only be followed by a single integer argument, so if you want the same thing on several pages, your pdfmark.txt file will have to have this command repeated for each page.

Lines 4 and 5 define the bounding box for the annotation and set its background color. PostScript and PDF coordinates are in points (1/72 inch) with the origin at the lower left corner of the page. Unlike a lot of graphics formats, but like most graphs you see in math class, the y-coordinate increases as you go up. A letter-sized page is 612 points wide and 792 points tall, so the bounding box in Line 4 is 200 points wide, centered left/right, and its top edge is ¼ inch down from the top of the page. Colors are defined by an red-green-blue triplet of numbers that run from 0 to 1. Black is 0 0 0 and white is 1 1 1. White is the default, so if you leave out Line 5, it’s equivalent to /Color [1 1 1]

Line 6 defines the font used in the annotation. /DA means default appearance, and the rest of the line tells the text to appear in 14-point Helvetica Bold with a dark blue color.

Line 7 defines the text of the annotation between the parentheses.

Finally, Line 8 identifies the type of pdfmark as an annotation.

(There’s a nice document with other examples of pdfMark commands and another Adobe reference manual.)

So how do we automate this? Fundamentally, we create a temporary file for the pdfMark commands, run the Ghostscript command, and then delete the temporary file. Here’s a quickly written script, annotatePDF:

python:
 1:  #!/usr/bin/env python
 2:  
 3:  import os
 4:  import subprocess
 5:  import sys
 6:  import tempfile
 7:  
 8:  # Set the parameters
 9:  annText = sys.argv[1]
10:  originalPDF = sys.argv[2]
11:  annotatedPDF = sys.argv[3]
12:  
13:  # Build the pdfMark command
14:  pdfMarkCommand = f"""[
15:  /Subtype /FreeText
16:  /SrcPg 1
17:  /Rect [156 758 456 774]
18:  /DA (/HeBo 14 Tf)
19:  /Contents ({annText})
20:  /ANN pdfmark
21:  """
22:  
23:  # Create a temporary file for the pdfMark commands and write to it
24:  fh, fpath = tempfile.mkstemp()
25:  with open(fpath, 'w') as f:
26:    f.write(pdfMarkCommand)
27:  
28:  # Run the Ghostscript command to make the annotated file
29:  subprocess.run(['gs', '-dBATCH', '-dNOPAUSE', '-dQUIET', '-sDEVICE=pdfwrite', f'-sOutputFile={annotatedPDF}',  fpath, originalPDF])
30:  
31:  # Delete the pdfMark command file
32:  os.remove(fpath)

This is not a great script. No error handling, no options, and no way to automate the naming of the output file. But it works.

annotatePDF 'Hello, world!' original.pdf annotated.pdf

As you can see from Lines 14—21, I don’t really want a yellow box with dark blue text. That was just to show some of pdfMark’s features.

By the way, the annotations you add through pdfMark are just like annotations you add in Preview or PDFpen. They can be selected, moved around, and edited. Here’s what the output file looks like in Preview after clicking on the added text.

Annotation selected in Preview


How many Thursdays?

A couple of weeks ago, this question from ldebritto appeared in the Automators forum:

I was looking for a way to have Shortcuts peek into last month’s calendar and count how many Thursdays were that month (4 or 5).

Any ideas on how to make it?

Stephen Millard, who posts there under the name sylumer, answered the question with a straightforward shortcut that loops through the days of last month, counting the Thursdays it meets along the way. It’s the kind of solution I employ often: a brute force approach that can be written quickly and uses looping instead of excessive cleverness. And in this case, since there can’t be more than 31 trips around the loop, it will run extremely fast. It’s a good solution.

But it nagged at me. This blog has a long history of showing calendrical calculations, and I had written a post on a similar problem (finding months with five Fridays, Saturdays, and Sundays) eleven years ago. The scripts in that post had some brute force aspects, but they also took advantage of how months are structured to filter the loops. I wanted to do something similar for the “how many Thursdays” problem.

As ldebritto said in his question, every month has either four or five Thursdays (indeed, four or five of any weekday). Whether it’s four or five depends on the number of days in the month and the date of the first Thursday of the month. We’ll consider each of these pieces of information and how they affect the answer.

Since every month has 28–31 days, we know that each month has four weeks plus 0–3 “extra” days. If the date of the first Thursday is less than or equal to the number of extra days, there will be five Thursdays in that month. For example:

The logic is simple, but implementing it in Shortcuts takes a little effort because there’s no “date of the first Thursday” function. Instead, we’ll figure out how many days from the first of the month to the first Thursday. And because this is one less than the date of the first Thursday the “less than or equal to” comparison we used above will turn into a simple “less than” comparison.

So how do we get the number of days from the first of the month to its first Thursday? We’ll need to use custom date formatting codes and do a little modulo arithmetic. Here’s the whole shortcut:

StepActionComment
1 Thursdays Last Month Step 01 Get the current date
2 Thursdays Last Month Step 02 Move to the start of this month. We call this magic variable StartOfThisMonth.
3 Thursdays Last Month Step 03 Go back one month. We call this magic variable StartOfLastMonth.
4 Thursdays Last Month Step 04 Get the number of days between StartOfLastMonth and StartOfThisMonth, which is the length of last month.
5 Thursdays Last Month Step 05 Get the number of days over 4 weeks in last month. We call this magic variable ExtraDaysInMonth.
6 Thursdays Last Month Step 06 Format StartOfLastMonth with the c code, which gives the day of the week as a number from 1 (Sunday in my locale) to 7 (Saturday).
7 Thursdays Last Month Step 07 This is where we do the modulo arithmetic to get the number of days from the start of last month to last month’s first Thursday. We call this magic variable DaysToFirstThursday.
8 Thursdays Last Month Step 08 If this is true…
9 Thursdays Last Month Step 09 Return 5.
10 Thursdays Last Month Step 10 Otherwise…
11 Thursdays Last Month Step 11 Return 4.
12 Thursdays Last Month Step 12 End If block.
13 Thursdays Last Month Step 13 Show the answer.

The only tricky parts here are Steps 6 and 7. Custom date formatting patterns in Shortcuts follow Unicode Technical Standard #35, where c will give the day of the week as a digit from 1 to 7. In my locale Day 1 is Sunday and Day 7 is Saturday; it may be different where you live.

Step 7 uses the number we got in Step 6 and calculates the number of days from the start of the month to the first Thursday. This involves modulo 7 arithmetic, so the first thing we have to do is subtract 1 from the Step 6 result. Numbers in mod 7 go from 0 to 6, not 1 to 7. After subtraction, we’re in a day numbering system where Thursday is 4. To get the number of days from the start of the month to the first Thursday, we subtract the weekday number from 4, add 7 (to force a positive result), and then get the remainder (mod) after dividing by 7.

Let’s do a couple of examples. If I run the shortcut today, last month is September 2021, which started on a Wednesday. Step 6 returns 4, and Step 7 returns

(4 - (4 - 1) + 7) mod 7 = (4 - 3 + 7) mod 7 = 8 mod 7 = 1

which makes sense, as there’s one day from the start of September 2021 to the first Thursday of the month.

If you’re wondering why we bothered adding 7 and doing the mod stuff—why not just do 4 - (4 - 1) = 1?—think of what happens when we run the shortcut next month. October 2021 starts on a Friday. Step 6 returns 6, and Step 7 returns

(4 - (6 - 1) + 7) mod 7 = (4 - 5 + 7) mod 7 = 6 mod 7 = 6

and there are, in fact, 6 days from the start of October 2021 to the first Thursday of the month. Here, without the + 7 we would have gotten a result of -1 which is not the number of days from the start of October to its first Thursday. By including the + 7 and mod 7 terms, we have an equation that handles all cases.

This is not nearly as easy to understand as Stephen Millard’s solution, so I didn’t post it to the forum. On the other hand, his solution didn’t exercise my modulo muscles.

By the way, it didn’t take me two weeks to work this out. I had the shortcut written the day of the question, but I soon learned that my splitflow script for creating the table of Shortcuts steps you see above was broken. The Shortcuts team changed the look of the app, and the computer vision functions that splitflow uses weren’t able to figure out the top and bottom boundaries of the steps. So I put this post on ice until I had energy to update splitflow. It still needs some work—the padding above the steps shouldn’t be wider than the padding below—but it’s passable.


Semiautomated LaTeX tables

As a followup to my last post about automating the creation of Markdown tables, here’s a simple pair of functions that’ve been very helpful in making LaTeX tables quickly.

A few years ago, I wrote about how much I hated the syntax of LaTeX tables and how I was shifting to building tables as graphics files that I could insert using the \includegraphics{} command. In the early stages, I was doing this by hand, as outlined in that post, but after I developed a sense of the kind of spacing I liked, I translated that into a set of Python functions that used the ReportLab module to create decent-looking tables as PDF files. A Python solution made the most sense, as the data from which I made the tables typically came out of data analysis done in Python.

The functions I wrote worked well enough, but the overall system was more fussy than it should have been, and I realized I’m better at programming the manipulation of text than the manipulation of graphics. So I started thinking about ways to make LaTeX tables directly from Python data.

Here’s an artificial example that isn’t too far away from the kinds of tables I need to make. Let’s say we want a short table of (base 10) logarithms. We can generate the data—a list of values and a parallel list of their logs—like this:

python:
from math import log10

x = [ (10+i)/10 for i in range(90) ]
lx = [ f'{log10(y):.4f}' for y in x ]

And what I want is a table that looks like this:

Log table

I use the booktabs package for making tables. I like its clean look, and I like its nice \addlinespace command for adding a little extra space between certain rows to make reading long tables easier. The code that produced the table above is

\setlength{\tabcolsep}{.125in}
\begin{table}[htbp]
\begin{center}
\begin{tabular}{
    @{\hspace*{5pt}}
    cc@{\hspace{.75in}}cc@{\hspace{.75in}}cc
    @{\hspace*{5pt}}
}
\toprule
$x$ & $\log x$ & $x$ & $\log x$ & $x$ & $\log x$ \\
\midrule
1.0 & 0.0000 & 4.0 & 0.6021 & 7.0 & 0.8451 \\
1.1 & 0.0414 & 4.1 & 0.6128 & 7.1 & 0.8513 \\
1.2 & 0.0792 & 4.2 & 0.6232 & 7.2 & 0.8573 \\
1.3 & 0.1139 & 4.3 & 0.6335 & 7.3 & 0.8633 \\
1.4 & 0.1461 & 4.4 & 0.6435 & 7.4 & 0.8692 \\
\addlinespace
1.5 & 0.1761 & 4.5 & 0.6532 & 7.5 & 0.8751 \\
    .
    .
    .
3.7 & 0.5682 & 6.7 & 0.8261 & 9.7 & 0.9868 \\
3.8 & 0.5798 & 6.8 & 0.8325 & 9.8 & 0.9912 \\
3.9 & 0.5911 & 6.9 & 0.8388 & 9.9 & 0.9956 \\
\bottomrule
\end{tabular}
\caption{Partial logarithm table}
\label{table}
\end{center}
\end{table}

The boilerplate at the top and bottom is produced by a Keyboard Maestro macro. Except for the column alignment line and caption—which need to be set for each table—it never changes. The troublesome parts are the header and, especially, the body. For those, I use two functions, theader and tbody, to build the LaTeX code without having to touch the ampersand or backslash keys. To make the table in the same script as the data, I adjust the script to this:

python:
from math import log10
from latex_tables import tbody, theader

x = [ (10+i)/10 for i in range(90) ]
lx = [ f'{log10(y):.4f}' for y in x ]
headers = ['$x$', '$\log x$']*3
print(theader(headers))
print(tbody(x[:30], lx[:30], x[30:60], lx[30:60], x[60:], lx[60:], group=5))

The output is

$x$ & $\log x$ & $x$ & $\log x$ & $x$ & $\log x$ \\
\midrule
1.0 & 0.0000 & 4.0 & 0.6021 & 7.0 & 0.8451 \\
1.1 & 0.0414 & 4.1 & 0.6128 & 7.1 & 0.8513 \\
1.2 & 0.0792 & 4.2 & 0.6232 & 7.2 & 0.8573 \\
1.3 & 0.1139 & 4.3 & 0.6335 & 7.3 & 0.8633 \\
1.4 & 0.1461 & 4.4 & 0.6435 & 7.4 & 0.8692 \\
\addlinespace
1.5 & 0.1761 & 4.5 & 0.6532 & 7.5 & 0.8751 \\
    .
    .
    .
3.7 & 0.5682 & 6.7 & 0.8261 & 9.7 & 0.9868 \\
3.8 & 0.5798 & 6.8 & 0.8325 & 9.8 & 0.9912 \\
3.9 & 0.5911 & 6.9 & 0.8388 & 9.9 & 0.9956 \\

which, as you can see, is exactly what goes between the top and bottom boilerplate.

The functions are defined in the file latex_tables.py, which is saved in my site-packages directory. Here’s the code:

python:
 1:  def tbody(*cols, group=0):
 2:    "Given the columns, return the body of a LaTeX table."
 3:  
 4:    # Add blanks at the ends of short columns
 5:    lens = [ len(c) for c in cols ]
 6:    nrows = max(lens)
 7:    cols = [ c + [' ']*(nrows - len(c)) for c in cols ]
 8:  
 9:    # Assemble the rows of the table
10:    rows = []
11:    for i in range(nrows):
12:      if (group > 0) and (i > 0) and (i % group == 0):
13:        rows.append(r'\addlinespace')
14:      row = [ str(c[i]) for c in cols ]
15:      rows.append(' & '.join(row) + r' \\')
16:  
17:    return '\n'.join(rows)
18:  
19:  def theader(hcols):
20:    "Given a list of column headers, return the header lines of a LaTeX table."
21:  
22:    # Figure out the maximum number of lines in the header
23:    hcols = [ x.splitlines() for x in hcols ]
24:    maxlines = max(len(x) for x in hcols)
25:    hcols = [ [' ']*(maxlines - len(x)) + x for x in hcols ]
26:  
27:    # Assemble the rows of the header
28:    rows = []
29:    for i in range(maxlines):
30:      row = [ str(c[i]) for c in hcols ]
31:      rows.append(' & '.join(row) + r' \\')
32:  
33:    return '\n'.join(rows) + '\n\\midrule'

Like the example log table, most of the tables I make come from lists of data that are meant to go into the columns of the table. So the main feature of tbody is assembling the rows from those lists. The other significant feature is placing the \addlinespace command according to the group parameter.

The theader function is even simpler. The only interesting thing about it is how it handles multiline headers. If it’s called like this,

print(theader(['One line', 'Two\nlines', 'And\nthree\nlines']))

the output will be

    &   & And \\
    & Two & three \\
One line & lines & lines \\
\midrule

which will produce a table that has a header that looks like this:

Multiline header

The header lines are bottom-aligned instead of top-aligned. The main purpose of theader is to automate that alignment.

Trickier table arrangements still have me hauling out my copy of Kopka & Daly, but theader and tbody do the bulk of the work even in those cases.


Under the table

In my last post, I mentioned an AppleScript I wrote some time ago that turns a table in Numbers into a MultiMarkdown1 table. A couple of days later, jacobio asked a question in the Mac Power Users forum, that led me to posting the script there. After looking it over—and seeing jacobio’s independently developed script—I made a few small changes to the script and decided to talk about it here.

The premise for the script is that I’m writing a blog post in BBEdit, and I want to include a table. Writing a Markdown table isn’t especially hard, but doing so involves lots of typing of formatting characters, the pipes and colons that tell the Markdown processor how to make the table but aren’t part of the table’s content. It’s much easier to write a table in a spreadsheet or some other dedicated table-writing tool and then invoke a script that converts it to Markdown. And if the table I want to insert is already in a spreadsheet, using a conversion script is even faster.

(Let me pause here and mention TableFlip, which Rosemary Orchard talked about in that same thread. I haven’t used it, but it sounds great, especially if most of the tables you deal with have to be written from scratch. It’s killer feature is how it ties the Markdown table in your document to the spreadsheet-like form in TableFlip itself. Changes you make to the table in TableFlip are conveyed automatically to the Markdown document. The downside to using TableFlip is that you don’t have a version of the table in a spreadsheet, which is often very useful.)

I’ve created a BBEdit package of scripts and text filters that help me write blog posts. The Table from Numbers script is in that package and has a few lines that are package-specific. But most of it could be used as a standalone conversion script. Here’s the code:

applescript:
 1:  -- Get the path to the Normalize Table text filter in the package.
 2:  tell application "Finder" to set cPath to container of container of container of (path to me) as text
 3:  set normalize to quoted form of (POSIX path of cPath & "Text Filters/06)Blogging/02)Normalize Table.py")
 4:  
 5:  -- Construct a MultiMarkdown table from the top Numbers table
 6:  tell application "Numbers"
 7:    tell table 1 of sheet 1 of document 1
 8:      set tbl to ""
 9:      set h to header row count
10:      set c to column count
11:      set r to row count
12:      
13:      -- header lines
14:      repeat with i from 1 to h
15:        set tbl to tbl & "|"
16:        repeat with j from 1 to c
17:          set val to formatted value of cell j of row i
18:          if val = missing value then
19:            set val to ""
20:          end if
21:          set tbl to tbl & " " & val & " |"
22:        end repeat
23:        set tbl to tbl & linefeed
24:      end repeat
25:      
26:      -- formatting line
27:      set tbl to tbl & "|"
28:      repeat with j from 1 to c
29:        set a to alignment of cell j of row h
30:        if a = right then
31:          set tbl to tbl & "---:|"
32:        else if a = center then
33:          set tbl to tbl & ":--:|"
34:        else
35:          set tbl to tbl & ":---|"
36:        end if
37:      end repeat
38:      set tbl to tbl & linefeed
39:      
40:      -- body lines
41:      repeat with i from h + 1 to r
42:        set tbl to tbl & "|"
43:        repeat with j from 1 to c
44:          set val to formatted value of cell j of row i
45:          if val = missing value then
46:            set val to ""
47:          end if
48:          set tbl to tbl & " " & val & " |"
49:        end repeat
50:        set tbl to tbl & linefeed
51:      end repeat
52:      
53:    end tell -- table
54:  end tell -- Numbers
55:  
56:  -- Normalize the table
57:  set the clipboard to tbl
58:  set tbl to do shell script ("pbpaste | " & normalize)
59:  
60:  tell application "BBEdit" to set selection to tbl

Lines 1–3 are some of the package-specific stuff I talked about before. Let’s skip over that for now and get to the meat of the script, which starts on Line 6 with a long tell block that controls Numbers.

The script assumes that the table of interest is the first table of the first sheet of the frontmost Numbers document. Line 7 starts a tell block that speaks to that table. Line 8 initializes the tbl variable, which is where we’re going to put all the text of the Markdown table. Lines 9–11 get the sizes of the header and the table as a whole.

A word about headers is in order. Some Markdown implementations allow for only one header line, but others—including MultiMarkdown—allow for any number of header lines. This script accommodates the more general case. In Numbers, the header lines (rows) are set in the Table section of the Format side panel. There’s a pop-up button that lets you set the number of header rows, and this is typically reflected in the number of shaded rows at the top of the table.

Header rows

Lines 14–24 use nested loops to create the header lines of the Markdown table. The outer loop, which starts on Line 14, goes through the header rows of the table. The inner loop, which starts on Line 16, goes through the columns of each row. As is often the case, AppleScript’s idiosyncrasies are more difficult to deal with than the logic of the process. To make our Markdown table show what the Numbers table shows, we need to ask for the formatted value of each cell. Also, instead of asking for column j of row i in Line 17, we have to ask for cell j of row i. And finally, if a cell is empty, AppleScript will set val to the literal text “missing value” in Line 17, so we need Lines 18–20 to correct that to an empty string before appending val and the appropriate space and pipe (|) characters to tbl in Line 21.

Lines 27–38 use the alignment of the cells in the last header row (row h) to create the formatting line in the Markdown table. I think the logic here is pretty easy to understand. If the alignment of the cell is right, we add a short string of hyphens with a colon at the right end; if the alignment is left, we add a short string of hyphens with a colon at each end; for any other alignment, we treat the column as left-aligned and add a short string of hyphens with a colon at the left end. “Any other alignment” includes auto align; this is the default and is presented as left-aligned for text, which is what header cells usually contain.

Lines 41–51 loop through the body of the table. Apart from the limits on the repeat command in Line 41, this is the same code as in Lines 14–24. I probably should factor this out into a function.

When the code exits the tell block on Line 54, tbl contains a valid but ugly Markdown table. Something like this:

|  | County | Yes | No | Margin | CMargin |
|:---|:---|---:|---:|---:|---:|
| 1 | Kern | 126,999 | 78,477 | 48,522 | 48,522 |
| 2 | Placer | 114,643 | 85,302 | 29,341 | 77,863 |
| 3 | Shasta | 49,141 | 21,655 | 27,486 | 105,349 |
| 4 | Tulare | 64,372 | 41,009 | 23,363 | 128,712 |
| 5 | El Dorado | 58,393 | 39,907 | 18,486 | 147,198 |
| 6 | Stanislaus | 82,911 | 69,247 | 13,664 | 160,862 |
| 7 | Tehama | 15,958 | 6,186 | 9,772 | 170,634 |
| 8 | Madera | 25,638 | 16,233 | 9,405 | 180,039 |
| 9 | Sutter | 20,458 | 11,593 | 8,865 | 188,904 |
| 10 | Kings | 19,710 | 11,242 | 8,468 | 197,372 |

This can be made nicer looking by running it through my “Normalize Table” text filter, which is also part of my Blogging package and which I’ve written about before. Lines 2–3 figure out the path to the text filter, and Lines 57–58 apply it to the contents of tbl, putting the nicely-formatted result, e.g.,

|    | County     |     Yes |     No | Margin | CMargin |
|---:|:-----------|--------:|-------:|-------:|--------:|
|  1 | Kern       | 126,999 | 78,477 | 48,522 |  48,522 |
|  2 | Placer     | 114,643 | 85,302 | 29,341 |  77,863 |
|  3 | Shasta     |  49,141 | 21,655 | 27,486 | 105,349 |
|  4 | Tulare     |  64,372 | 41,009 | 23,363 | 128,712 |
|  5 | El Dorado  |  58,393 | 39,907 | 18,486 | 147,198 |
|  6 | Stanislaus |  82,911 | 69,247 | 13,664 | 160,862 |
|  7 | Tehama     |  15,958 |  6,186 |  9,772 | 170,634 |
|  8 | Madera     |  25,638 | 16,233 |  9,405 | 180,039 |
|  9 | Sutter     |  20,458 | 11,593 |  8,865 | 188,904 |
| 10 | Kings      |  19,710 | 11,242 |  8,468 | 197,372 |

back into tbl. Note that the code uses the clipboard and pbpaste to send the text through the “Normalize Table” filter. I could avoid the clipboard by setting up a shell command that uses a here-document, but that code is messy and easy to screw up (I speak from experience). Because I use Keyboard Maestro’s clipboard history manager, I can always get back to what was on the clipboard before I ran this script.

Finally, Line 60 puts the text of tbl into my current BBEdit document. If there’s a selection, it replaces the selection; if not, it inserts the text at the cursor.

If you want to adapt this script for use with any text editor, delete Lines 60 and 1–3 and change Line 58 to

set tbl to do shell script "pbpaste | /path/to/normalize | pbcopy"

For this last part to work, you’ll need to have a normalization script and replace /path/to/normalize with the path to it. When you get it working, you’ll have a script that will put the formatted Markdown table on your clipboard, ready to be pasted anywhere.

This script is no speed demon. AppleScript is very deliberate as it walks through the table, so there’s always a pause between choosing the Table from Numbers command and seeing the table appear in BBEdit. If you adapt it to put the table on your clipboard, you might want to add a command at the end to play a sound when the clipboard is ready for pasting.


  1. From now on, I’m just going to call these Markdown tables. Gruber’s Markdown doesn’t include tables, but pretty much every Markdown implementation has them—either built in or as an option. I bet most Markdown users don’t even know that tables are an extension.