# Highlighting with Highlights and LiquidText

I decided to get a copy of Highlights after reading this John Voorhees review in MacStories. After trying it out on a few PDF documents that needed to be summarized for my job, I learned that it wasn’t going to work for me. Ironically, it was very bad at highlighting text in the kinds of documents I deal with. But my experiment with Highlights led me to giving LiquidText another try, and with a new perspective on how to use it, LiquidText fits my highlighting needs pretty well.

Highlights is a focused app with a straightforward user interface for highlighting and commenting on text in a PDF, and it can export the highlighted text as plain text.1 It seems like a perfect fit for how I want to work. I subscribed to the Pro version (necessary to get plain text exporting) so I could give it a try.

As luck would have it, I had just finished a couple of projects in which I had done a fair amount of document summarizing. So I put copies of those documents in a new folder and started going through them in Highlights to see how much more effective it would be than my current system.2

On the very first document I tried, Highlights wouldn’t select the text I wanted it to, overselecting in certain areas and underselecting in others. (I can’t show screenshots of it because that would expose the client’s work product.) The trouble seemed to be most commonly associated with footnotes. The selection jumped around in unpredictable ways whenever a footnote or endnote was within the desired selection or nearby.

My history with Microsoft products leads me to believe that this PDF, which I know started its life as a Word document, has a convoluted internal structure, and that may be part of the reason Highlights had so much trouble with it. But I don’t have any control over the history of PDFs I need to summarize, and documents with footnotes that were written in Word make up a large enough percentage of the material I get from clients to make Highlights effectively useless to me. I cancelled my Pro subscription.

Shortly after my Highlights experiment, this thread about PDF note-taking apps appeared on the Mac Power Users forum. It reminded me of that copy of LiquidText I got a long time ago and decided not to use. Maybe it was worth another shot.

The signature feature of LiquidText is the ability to grab excerpts from PDFs and combine them into new documents that show the linkages between different PDFs and different sections of the same PDF. It’s very impressive but struck me as more a presentation tool than a research tool. What I learned from the forum (and some new testing of my own) was that I don’t have to use LiquidText’s cool linking feature; I can just highlight and comment on text as I read along and generate a summary of the highlights and comments when I’m done. And, most important, LiquidText is much better at selecting the text I want than Highlights is. Not perfect—I have found a couple of glitches—but definitely good enough that I expect the average PDF to give me no trouble at all.

After the text is highlighted, it needs to come out, and if you’ve looked through LiquidText’s sharing options, you might think it’s not possible to get a plain text summary out of it.

But if you choose the “Notes Outline” option, which is intended to create a Word file, you’ll see another window appear that lets you put the notes onto the clipboard instead of into a DOCX file.

Copying those notes into Drafts makes for a pretty decent summary.

As with the Markdown export from Highlights, I’m not thrilled with the way the notes are formatted, but I’ve written a Drafts action that cleans them up, distinguishing between highlights and comments and getting rid of the extra spaces that often appear in selections from fully justified text.

(I should point out that this particular PDF had several equations that I had highlighted. They’re hard to express in plain text and will need to be cleaned up.)

The Drafts action that does the cleanup consists of just one JavaScript step:

javascript:
1:  var summary = editor.getText();
2:
3:  // Inexplicably, LiquidText uses CRs as line endings.
4:  var reformatted = summary.replace(/\r/g, '\n');
5:
6:  // Get rid of the second header.
7:  reformatted = reformatted.replace(/\nNotes in Document \n[^\n]+\n/, '');
8:
9:  // Reformat the comments and highlights.
10:  function noteReplace(full, m1, m2, m3, offset, string) {
11:    if (m1 == "Highlight") {
12:      return 'Page ' + m3 + ' quote:\n' + m2;
13:    }
14:    else {
15:      return 'Page ' + m3 + ' summary:\n' + m2;
16:    }
17:  }
18:
19:  reformatted = reformatted.replace(/(Highlight|Comment):\s?:?\s?(.+)\u2028.+p\.(\d+)$/gm, noteReplace);
20:
21:  // Extra spaces are probably from full justification.
22:  reformatted = reformatted.replace(/  +/g, ' ');
23:
24:  editor.setText(reformatted);


A couple of comments on the script:

• As you can see in Lines 3–4, the notes from LiquidText use carriage return (CR) characters as line endings. Not linefeed (LF) characters, as would be expected on iOS. Not the CRLF combination common to Microsoft products. No, it’s the bare CR that Macs used to use back in the pre-OS X days. Bizarre.
• Nearly as odd is the use of the Unicode LINE SEPARATOR character (U+2028) within each note. That gets cleaned up in the large regex in Line 19.
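If you’d rather do this kind of cleanup outside Drafts, the normalization steps translate easily to Python. What follows is only a sketch: the helper name normalize_notes and the sample string are made up for illustration, and it assumes the clipboard text uses bare CRs between notes and U+2028 inside them, as described above. Unlike the Drafts action, it simply turns each LINE SEPARATOR into a newline instead of reformatting the notes, since that step depends on the exact layout of LiquidText’s output.

```python
import re

def normalize_notes(text: str) -> str:
    """Hypothetical helper: normalize line endings and spacing in copied notes."""
    # Convert CRLF pairs first, then any remaining bare CRs, to LF.
    text = text.replace('\r\n', '\n').replace('\r', '\n')
    # Turn the Unicode LINE SEPARATOR (U+2028) into an ordinary newline.
    text = text.replace('\u2028', '\n')
    # Collapse runs of two or more spaces (often left by full justification).
    text = re.sub(r' {2,}', ' ', text)
    return text

# Made-up sample, not actual LiquidText output.
sample = 'Highlight: The  quick  fox\u2028p.3\rComment: check this\u2028p.4\r'
print(repr(normalize_notes(sample)))
# prints 'Highlight: The quick fox\np.3\nComment: check this\np.4\n'
```

Converting CRLF before bare CR keeps a Windows-style line ending from turning into two newlines.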

I don’t expect the summary that comes out of this process to be in final form, but my early experiments have shown that there’s less editing needed in these summaries than in those I dictate.

Overall, I like the experience of summarizing a document with LiquidText. I can still sit with my iPad on my lap, and I prefer swiping with the Pencil to reading text into my phone. In those places where I need to make a comment instead of a highlight, I can still dictate—I just dictate into the iPad instead of my phone. I no longer have to wake up a phone that’s gone to sleep between comments. Unless I run into a showstopper, LiquidText is how I’ll be summarizing from now on.

1. The exported plain text is Markdown with a header structure that I wouldn’t want to use, but I don’t consider that a significant problem. It’s easy to write a filter that reformats well-structured text to get the output I want.

2. Which is to dictate quotes and comments into Drafts on my phone as I read a document on my iPad.

# Siri and context, four years on

Reading the latest Daring Fireball post this morning, I immediately thought of Effingham and my frustration, four years ago, with Siri’s inability to make reasonable guesses as to what we want from it no matter how many contextual clues it has.

I was driving up through Central Illinois… My iPhone was charging and sitting upside-down in a cupholder in the center console. I pushed the home button, waited for the Siri beep to come through my car’s speakers, and asked “How far is it to Effingham?”

Siri’s response: “Which Effingham? Tap the one you want.”

On the positive side, Siri understood the word “Effingham” and recognized it as a place name. But those successes made its two context failures even more annoying.

First, I’m driving north on I-57 in Illinois between Mount Vernon and Effingham. Which effing Effingham do you think I want?!

And Siri knows damned well I’m driving. It’s connected to my car via Bluetooth. It can use its GPS to figure out I’m moving 80 mph. It has no business asking me to tap on a choice.

The interesting difference between my 2016 experience and John Gruber’s and Nilay Patel’s 2020 experiences is that I did want the nearest city with the name I gave. It’s fun to see the wide variety of ways in which Siri manages to choose the worthless answer, but we really should have a better assistant by now.

# Derangement extra postage

I left something out of last night’s post on derangements: the Python program that generated the table of values I used to search the OEIS for the sequences associated with one, two, three, etc. fixed points. It was a deliberate omission, as I thought the post was long enough as it was. Consider this post the equivalent of Numberphile’s “extra footage.”

Recall that the table looked like this (you may have to scroll horizontally to see it all):

                                       m
           0      1      2      3      4      5      6      7      8
     ----------------------------------------------------------------
  2 |      1      0      1
  3 |      2      3      0      1
  4 |      9      8      6      0      1
n 5 |     44     45     20     10      0      1
  6 |    265    264    135     40     15      0      1
  7 |   1854   1855    924    315     70     21      0      1
  8 |  14833  14832   7420   2464    630    112     28      0      1


where $n$ is the number of cards we’re shuffling and $m$ is the number of fixed points, that is, the number of cards that are in their original position after the shuffling.

Shuffling cards is a permuting operation, and the itertools module in Python’s standard library has a permutations function that returns all the permutations of the list (or other iterable) passed to it. The program I wrote does nothing more than call that function to generate the permutations and then search through them, counting up the number of fixed points in each. It does that for $n = 2 \ldots 8$.

Here it is:

python:
1:  #!/usr/bin/env python3
2:
3:  from itertools import permutations
4:
5:  def countMatches(o, p):
6:    count = 0
7:    for i in range(len(o)):
8:      if p[i] == o[i]:
9:        count += 1
10:    return count
11:
12:  # Table header
13:  print(' '*3, end='')
14:  for m in range(9):
15:    print(f'{m:7d}', end='')
16:  print()
17:  print(' '*3 + '-'*64)
18:
19:  # Table body
20:  for n in range(2, 9):
21:    cards = [chr(i+65) for i in range(n)]
22:    counts = [0]*(n+1)
23:
24:    for a in permutations(cards):
25:      matches = countMatches(a, cards)
26:      counts[matches] += 1
27:
28:    print(f'{n} |', end='')
29:    for i in range(n+1):
30:      print(f'{counts[i]:7d}', end='')
31:    print()


The key to the script is the countMatches function in Lines 5–10. When given the original list and a permutation, it marches through each and returns the number of fixed points. I’m sure there’s a more clever way of doing this, but I wasn’t interested in spending the time to be clever. Since the program as a whole is a brute force approach to the problem, I didn’t care if there was a brute force subroutine lurking within it.
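For the curious, here’s one of those more compact ways, offered as a sketch rather than a correction. It keeps the name countMatches so it could drop into Lines 5–10, and it relies on zip pairing up corresponding items and on Python counting True as 1 in a sum:

```python
def countMatches(o, p):
    """Return the number of positions where sequences o and p agree (fixed points)."""
    # zip pairs corresponding items; each True comparison adds 1 to the sum.
    return sum(a == b for a, b in zip(o, p))

print(countMatches('ABCD', ('B', 'A', 'C', 'D')))  # prints 2
```

Like the original, it expects both sequences to be the same length. If they aren’t, zip quietly stops at the shorter one instead of raising an IndexError.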

Lines 12–17 just set up the table’s header, with the values of $m$ and a row of hyphens. The rest of the program loops through all the values of $n$ to create the row labels and body of the table.

Those of you who know your ASCII will recognize that the list of cards created in Line 21 consists of n capital letters in alphabetical order, starting with A. Line 22 initializes the counts list to a bunch of zeros.

The loop in Lines 24–26 goes through all the permutations and uses the countMatches function to increment the appropriate member of counts. For each item in counts, the index is the number of fixed points, $m$, and the value is the number of permutations with that many fixed points.
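The counts list could also be built with collections.Counter instead of indexing into a list of zeros. This is a swap-in technique, not what the script does, and the function name fixed_point_counts is hypothetical:

```python
from collections import Counter
from itertools import permutations

def fixed_point_counts(n):
    """Return a list whose index m is the number of permutations of n cards with m fixed points."""
    cards = [chr(i + 65) for i in range(n)]
    # Tally how many permutations have each number of fixed points.
    tally = Counter(sum(a == b for a, b in zip(perm, cards))
                    for perm in permutations(cards))
    # A Counter returns 0 for missing keys, so every m from 0 to n gets an entry.
    return [tally[m] for m in range(n + 1)]

print(fixed_point_counts(4))  # prints [9, 8, 6, 0, 1]
```

Because a Counter returns 0 for a key it has never seen, the $m = n - 1$ entry comes out as zero without any special handling.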

Finally, Lines 28–31 print out the row of the table associated with the current value of n.

The “m” label above the top header row and the “n” label to the left of the row labels were added by hand. It was easier than programming them in.

You’ll note that some of the printing is done with f-strings, so this will only run in Python 3.6 and above. But you could rewrite those lines to use the format method if you’re really stuck with an older version of Python.
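For anyone on an older Python, here’s a sketch of how the f-string formatting might be rewritten with str.format. The helper names header_line and row_line are hypothetical, and they build each line as a string rather than printing piece by piece as the script does:

```python
def header_line(max_m=8):
    # '{:7d}'.format(m) does the same job as f'{m:7d}'
    return ' ' * 3 + ''.join('{:7d}'.format(m) for m in range(max_m + 1))

def row_line(n, counts):
    # '{} |'.format(n) replaces f'{n} |'
    return '{} |'.format(n) + ''.join('{:7d}'.format(c) for c in counts)

print(header_line())
print(' ' * 3 + '-' * 64)
print(row_line(4, [9, 8, 6, 0, 1]))  # prints "4 |      9      8      6      0      1"
```

The format spec inside the braces is the same one used in the f-strings; only the way it gets applied changes.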

If you ever run into a problem that requires rearranging or combining lists, keep itertools in mind.

# Arrangements and derangements

I’ve never thought of myself as particularly good at combinatorics or integer math in general. Most of my formal math study and the math I use in my professional life deals with the full continuity of real numbers, so that’s what I’m best at. But I do enjoy playing around outside my comfort zone.

I recently watched this Numberphile video from a few years ago. It’s about rearranging a set of items, with the goal being to determine the likelihood, after a random shuffle, that none of the items are in their original position.

When James Grime said the probability was 37%, I knew he was really talking about an asymptotic approach to $1/e$. Not because I’m a math genius, but because I’ve seen that come up many times in dealing with the binomial distribution and its asymptotic movement toward the Poisson distribution as the number of “events” gets large.

Why the probability of derangement (a lovely word choice there) tends toward $1/e$ as the number of shuffled items gets larger is explained, although not completely, in the video’s extra footage.

While it’s nice to see the series expansion of $e^{-1}$, what got me curious was the form of the individual terms of the series. It was the formula

$\frac{n!}{k!}$

for the alternating terms of the series. While writing out these terms (starting at about 3:30 in the second video), Dr. Grime says “You might have to think about this” and “Have a check about that.” So I did, and now I’m going to inflict my thinking on you.

I generally approach combinatoric problems by enumerating all the possibilities of a small number of items and then looking for the patterns of interest. Here, I started with a set of four cards, labeled A, B, C, and D. If we start with them in that order and shuffle, we could end up with any of the following 24 permutations:

A B C D        B A C D
A B D C        B A D C
A C B D        B C A D
A C D B        B C D A
A D B C        B D A C
A D C B        B D C A

C A B D        D A B C
C A D B        D A C B
C B A D        D B A C
C B D A        D B C A
C D A B        D C A B
C D B A        D C B A


Jumping immediately to the final answer of the first video, we can see there are 9 permutations in which none of the letters are in their original place, so the likelihood of a well-shuffled set of four cards coming out with none of the cards in their original place (i.e., zero fixed points) is

$\frac{9}{24} = \frac{3}{8} = 0.375$

which is, as Dr. Grime says, pretty close to $1/e \approx 0.3679$ or about 37%.
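The same count can be checked by brute force. This sketch uses Python’s itertools to list every arrangement of the four cards, keeps those with no card in its original position, and compares the ratio to $1/e$. The variable names are just illustrative:

```python
from itertools import permutations
from math import e

cards = 'ABCD'
perms = list(permutations(cards))

# A derangement leaves no card in its original position.
derangements = [p for p in perms if all(a != b for a, b in zip(p, cards))]

print(len(perms), len(derangements))          # prints 24 9
print(len(derangements) / len(perms), 1 / e)  # 0.375 versus roughly 0.3679
```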

Now let’s try to come up with the individual terms of the series he wrote out in the second video. First, we’ll count up all the examples of where some of the cards are in their right places.

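| Card(s) in right place | Count |
|---|---:|
| A | 6 |
| B | 6 |
| C | 6 |
| D | 6 |
| A and B | 2 |
| A and C | 2 |
| A and D | 2 |
| B and C | 2 |
| B and D | 2 |
| C and D | 2 |
| A and B and C | 1 |
| A and B and D | 1 |
| A and C and D | 1 |
| B and C and D | 1 |
| A and B and C and D | 1 |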

Grouping these, we see that we have four 6s, six 2s, four 1s, and then another 1. These counts, of course, are not mutually exclusive. For example, the two cases in which A and B are in the right place overlap with cases of A being in the right place and B being in the right place. So if we wanted to count the permutations for which A or B are in their right place, we’d have to subtract off the overlap:

$6 + 6 - 2 = 10$

You can confirm this by going back to the list of all permutations. You’ll see that there are 10 with A or B in the right place: all six from the subset with A in the first position, two from the subset with C in the first position, and two from the subset with D in the first position.
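The 10 can also be checked with sets. In this sketch (names are illustrative), each set holds the permutations with one particular card in place, so the union and intersection correspond directly to the “or” and the overlap:

```python
from itertools import permutations

cards = 'ABCD'
perms = list(permutations(cards))

a_fixed = {p for p in perms if p[0] == 'A'}  # A in its original spot
b_fixed = {p for p in perms if p[1] == 'B'}  # B in its original spot

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
print(len(a_fixed), len(b_fixed), len(a_fixed & b_fixed))  # prints 6 6 2
print(len(a_fixed | b_fixed))                              # prints 10
```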

Returning to our table of counts, we can figure out the number of permutations with at least one fixed point:

$4 \cdot 6 - 6 \cdot 2 + 4 \cdot 1 - 1 = 24 - 12 + 4 - 1 = 15$

Therefore, the number of permutations with zero fixed points is

$24 - (24 - 12 + 4 - 1) = 24 - 15 = 9$

which is what we got by looking through the list directly.
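Here’s a sketch that does the whole inclusion-exclusion calculation by brute force for the four cards. For each subset size $k$, it counts the permutations that leave every card of each $k$-card subset in place, then combines the terms with alternating signs. The helper fixes_all is hypothetical:

```python
from itertools import combinations, permutations

cards = 'ABCD'
perms = list(permutations(cards))

def fixes_all(p, positions):
    """True if permutation p leaves the card at every given position in place."""
    return all(p[i] == cards[i] for i in positions)

# Alternating sum over subset sizes, as in 4*6 - 6*2 + 4*1 - 1.
at_least_one = 0
for k in range(1, len(cards) + 1):
    term = sum(sum(fixes_all(p, s) for p in perms)
               for s in combinations(range(len(cards)), k))
    at_least_one += (-1) ** (k + 1) * term
    print(k, term)

print(at_least_one, len(perms) - at_least_one)  # prints 15 9
```

The printed terms are 24, 12, 4, and 1, matching the grouped counts in the table.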

How can we generalize this to any number of cards, $n$? The leading term, the one outside the parentheses, is the total number of permutations, so we already know its general form: $n!$.

Now let’s think about the first term within the parentheses, which accounts for the permutations in which at least one card is in the right spot. We’ll start by considering the cases where the A card is in the right spot. The number of arrangements in which A is in the right spot is the number of permutations of the other cards, which is $(n - 1)!$. We can say the same thing for each of the $n$ cards, so adding up these cases (without yet correcting for the overlaps) gives a first term of

$n (n - 1)!$

In our example with $n = 4$, that’s the four 6s ($4\cdot3!$) at the top of the counts table. (Yes, we could simplify this to $n!$, but we’ll see soon that it makes more sense to leave it in this form).

Now we move on to the permutations for which at least two cards are in their original spots. If there are two cards in their original places, there are $n - 2$ other cards and $(n - 2)!$ ways to arrange them. And how many ways can 2 cards out of $n$ be in their right places? It’s the number of combinations of $n$ items taken 2 at a time. Therefore, the second term inside the parentheses is

$\binom{n}{2} (n - 2)!$

which matches the six 2s we got for $n = 4$.

At this point, we should step back and realize that our expression for the first term in the parentheses,

$n (n - 1)!$

could have been written as

$\binom{n}{1} (n - 1)!$

The pattern should be clear. Each term for $k$ fixed points is

$\binom{n}{k} (n - k)!$

and the terms have alternating signs for the reasons given by Dr. Grime in the second video.

Now let’s look again at the leading term outside the parentheses: $n!$. It may seem silly to do so, but this can be written as

$\binom{n}{0} (n - 0)!$

because

$\binom{n}{0} = 1$

for all values of $n$.

The method to this madness is that we now have all the terms—the leading term outside the parentheses and all the terms inside the parentheses—in the same form, and we can bundle them together into a nice, compact summation:

$\sum_{k=0}^{n} (-1)^k \binom{n}{k} (n - k)!$

where the $(-1)^k$ term handles the alternating signs.
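As a sanity check on the summation, this sketch evaluates it for $n = 2$ through 8 and compares it with a brute-force count. It assumes Python 3.8 or later for math.comb, and the function names are hypothetical:

```python
from itertools import permutations
from math import comb, factorial

def derangements_formula(n):
    """Sum over k of (-1)^k * C(n, k) * (n - k)!, the inclusion-exclusion count."""
    return sum((-1) ** k * comb(n, k) * factorial(n - k) for k in range(n + 1))

def derangements_brute(n):
    """Count permutations of range(n) with no fixed points by brute force."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

for n in range(2, 9):
    print(n, derangements_formula(n), derangements_brute(n))
```

Both columns come out as 1, 2, 9, 44, 265, 1854, and 14833.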

We still haven’t reached the form Dr. Grime showed. For that, we need to expand out the binomial terms and do a little algebra.

Recall that

$\binom{n}{k} = \frac{n!}{k! (n - k)!}$

So

$\sum_{k = 0}^n (-1)^k \binom{n}{k} (n - k)! = \sum_{k = 0}^n (-1)^k \frac{n!}{k! (n - k)!} (n - k)! = n! \sum_{k = 0}^n (-1)^k \frac{1}{k!}$

The summation part is

$\sum_{k=0}^n \frac{(-1)^k}{k!} = \frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \dots \pm \frac{1}{n!}$

which is exactly what the second video shows (at about 4:35).

This is a series that tends toward $1/e$ as $n$ gets “large.” Since the denominators are growing by factorials, “large” isn’t very large at all. As we can see for $n = 4$, the series is already within 2% of $1/e$.
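To see how quickly the partial sums close in on $1/e$, this sketch computes them exactly with the fractions module and prints the relative error. The function name partial_sum is hypothetical, and the f-strings need Python 3.6 or later:

```python
from fractions import Fraction
from math import e, factorial

def partial_sum(n):
    """Exact value of the sum of (-1)^k / k! for k = 0..n."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

for n in range(1, 9):
    s = partial_sum(n)
    rel_err = abs(float(s) - 1 / e) / (1 / e)
    print(f'{n}  {s}  {float(s):.6f}  {rel_err:.2%}')
```

For $n = 4$ the exact sum is 3/8, and multiplying any partial sum by $n!$ gives back the whole number of derangements.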

Now that we know the formula for the number of permutations with zero fixed points, we should be able to figure out the formulas for the numbers of permutations with exactly one, two, three, etc. fixed points. And we can.

For exactly one fixed point, the number of permutations is

$n! \sum_{k = 0}^{n-1} (-1)^k \frac{1}{k!}$

For exactly two fixed points, it’s

$\frac{n!}{2} \sum_{k = 0}^{n-2} (-1)^k \frac{1}{k!}$

For exactly three fixed points, it’s

$\frac{n!}{6} \sum_{k = 0}^{n-3} (-1)^k \frac{1}{k!}$

You have the pattern now. For exactly $m$ fixed points,1 it’s

$\frac{n!}{m!} \sum_{k = 0}^{n-m} (-1)^k \frac{1}{k!}$
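Here’s a sketch that evaluates the general formula exactly (using Fraction to avoid floating-point rounding) and compares it with a brute-force tally of fixed points. Both function names are hypothetical:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def exactly_m(n, m):
    """n!/m! times the sum of (-1)^k / k! for k = 0..n-m, computed exactly."""
    s = sum(Fraction((-1) ** k, factorial(k)) for k in range(n - m + 1))
    result = Fraction(factorial(n), factorial(m)) * s
    assert result.denominator == 1  # sanity check: a count must be a whole number
    return int(result)

def brute_counts(n):
    """Tally permutations of range(n) by their number of fixed points."""
    counts = [0] * (n + 1)
    for p in permutations(range(n)):
        counts[sum(p[i] == i for i in range(n))] += 1
    return counts

for n in range(2, 8):
    print(n, [exactly_m(n, m) for m in range(n + 1)], brute_counts(n))
```

Each pair of lists matches, and exactly_m(8, m) for $m$ from 0 to 8 reproduces the $n = 8$ row of the brute-force table.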

I’d like to say I worked these formulas out on my own, but I didn’t. I cheated. I do think, though, that I cheated in an interesting way.

I wrote a short, brute force Python program that counted, for $n$ from 2 to 8 and $m$ from 0 to $n$, the number of permutations with exactly $m$ fixed points. It used the itertools library to generate all the permutations and looped through them, counting all the occurrences of zero, one, two, three, etc. fixed points. Here they are:2

                                       m
           0      1      2      3      4      5      6      7      8
     ----------------------------------------------------------------
  2 |      1      0      1
  3 |      2      3      0      1
  4 |      9      8      6      0      1
n 5 |     44     45     20     10      0      1
  6 |    265    264    135     40     15      0      1
  7 |   1854   1855    924    315     70     21      0      1
  8 |  14833  14832   7420   2464    630    112     28      0      1


I then took these sequences (the columns) and searched for them in the Online Encyclopedia of Integer Sequences. The sequence for one fixed point is A000240, the one for two fixed points is A000387, the one for three fixed points is A000449, and the general one for any number of fixed points is A008290.

The nice thing about the OEIS is that it doesn’t just identify the sequences, it also gives you formulas and recurrence relations in several forms, tables, graphs, references, connections to other sequences, and even musical interpretations.

Apart from helping me find the formulas in the OEIS, the table showed a few other things:

• The numbers on the main diagonal ($m = n$) are all ones. It should be obvious that there’s only one way to arrange all the cards in their original places, but it’s nice to see that the math works out that way.
• The numbers immediately to the left of the main diagonal ($m = n-1$) are all zeros. This is only slightly less obvious: there’s no way to have just one of the cards out of its original place.
• The numbers in the 0 and 1 columns always differ by one. It’s clear from the formulas why this is so, but I can’t think of a simple narrative explanation.
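A quick check of that third point, sketched with a hypothetical derangements helper that evaluates the summation formula in integer arithmetic. The $m = 1$ count is $n$ times the derangement count for $n - 1$ cards (pick the one fixed card, derange the rest), which is what the exactly-one formula works out to. Subtracting the two formulas leaves only the last term of the series, $n! \cdot (-1)^n / n! = (-1)^n$, so the difference should alternate between 1 and −1:

```python
from math import factorial

def derangements(n):
    """Permutations of n items with no fixed points: n! * sum of (-1)^k / k!, in integers."""
    # n!/k! is always a whole number for k <= n, so // is exact here.
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

for n in range(2, 9):
    zero = derangements(n)         # column m = 0
    one = n * derangements(n - 1)  # column m = 1
    print(n, zero, one, zero - one)
```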

I think I’ve mentioned before that my doctor likes his patients to do puzzles and other mental stimulation as they move toward senior citizen status. Whether or not the “use it or lose it” theory is valid, it can’t hurt. Maybe I’ll show him this post at my next checkup as proof that I’m following doctor’s orders.

1. I’ve carefully included the word “exactly” in these descriptions because I want to emphasize that they’re not the “at least” calculations we did earlier. Now that you’ve seen it a few times, we’ll take the “exactly” as given for the remainder of the post.

2. Sorry, I was too lazy to turn this into an HTML table.