Summaries

Following up on this post on note-taking, I now have a set of scripts for turning my Siri-dictated plain-text notes into a nice PDF summary that includes links to specific pages within the document I’m summarizing. The scripts are in this GitHub repository.

I’m not especially proud of these scripts. They were whipped together quickly from a few other scripts I already had lying around, which is why one is Python, one is PHP, and one is a shell script that wraps everything together. They work, but they could stand some cleaning up and consolidation.

The key is wkpdf, a RubyCocoa script that uses WebKit to convert HTML to PDF. Federico Viticci mentioned it a few months ago in a MacStories article, and this is the first time I’ve had a chance to use it. It really doesn’t deal well with standard input and output, which is why I needed to create and later delete a temporary HTML file in the shell script. Apart from that, wkpdf seems to work well.
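The temporary-file workaround looks roughly like this. It's a minimal sketch in Python, not the shell script from the repository. The function name html_to_pdf and its run parameter are hypothetical, and the --source and --output options are how I understand wkpdf's command line to work, so check them against the version you have installed.

```python
import os
import subprocess
import tempfile


def html_to_pdf(html, pdf_path, run=subprocess.check_call):
    """Write html to a temporary file, hand that file to wkpdf, then delete it.

    run is the callable that executes the command; it defaults to
    subprocess.check_call and can be swapped out for testing.
    """
    # wkpdf wants a real file rather than standard input, so make one.
    fd, html_path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(html)
        # Assumed invocation: wkpdf --source <html file> --output <pdf file>
        run(["wkpdf", "--source", html_path, "--output", pdf_path])
    finally:
        # Clean up the temporary HTML file whether or not the conversion worked.
        os.remove(html_path)
```

The try/finally is there so a failed conversion doesn't leave stray HTML files behind.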

There’s nothing especially tricky or interesting about any of the three scripts. How they work together is described in the repository’s README, which I’ve reproduced below.


Summary

A set of scripts for producing a nicely formatted summary of a document in PDF form from a simple text file. The summary is organized according to the page numbers in the original (PDF) document and includes links to those pages in the original.

This system was inspired by Walton Jones’s method of summarizing academic papers.

Simple text file

The original file is assumed to be a PDF. The summary, which must be saved in the same directory as the original file, is written in this form:

Title: Example document
Subtitle: November 16, 2012
File: example.pdf
Starting page: 1
Starting sheet: 1
Pages per sheet: 1

5
The material on page 5 is interesting and this is a brief
description of what's on it.

7-8
Pages 7 and 8 also have something I think is worth remembering,
and I've described it here.

11
Page 11 is the last page with interesting content.

Of the header lines at the top, only Title and File are required. I often use the Subtitle line for the date of the original document, but it can be used for any extra information.

The other lines account for two possibilities: that the page numbers printed on the original document don’t match the page numbers of the PDF, and that the PDF is formatted two-up or four-up, with more than one logical page per physical page.
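The arithmetic behind these header lines can be sketched as follows. This is an illustration under stated assumptions, not the code in the repository: it assumes that Starting page is the printed number of some page, that Starting sheet is the PDF page on which that printed page appears, and that each PDF page holds Pages per sheet consecutive printed pages. The function name sheet_for_page is hypothetical.

```python
def sheet_for_page(page, starting_page=1, starting_sheet=1, pages_per_sheet=1):
    """Return the PDF page (sheet) on which a printed page number appears.

    Assumes printed page starting_page is the first page on sheet
    starting_sheet and that every sheet holds pages_per_sheet printed pages.
    """
    if page < starting_page:
        raise ValueError("page comes before the starting page")
    # Count how many printed pages past the start we are, then convert
    # that offset into whole sheets.
    return starting_sheet + (page - starting_page) // pages_per_sheet


# With the defaults, printed page and PDF page are the same.
print(sheet_for_page(5))
# A two-up PDF: printed pages 1-2 are on sheet 1, 3-4 on sheet 2, 5-6 on sheet 3.
print(sheet_for_page(5, pages_per_sheet=2))
# Printed page 3 is on PDF page 5, so printed page 11 is eight sheets later.
print(sheet_for_page(11, starting_page=3, starting_sheet=5))
```

When the three optional header lines are omitted, the defaults of 1 make the printed page number and the PDF page number identical.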

The body of the summary consists of several stanzas, each starting with a line giving the page number(s) and followed by a brief description of what’s on that page. Hyphens are used to indicate page ranges.
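To make the file format concrete, here is a minimal sketch of a parser for it. It is not the code in the repository. The names parse_summary and SAMPLE are hypothetical, and the sketch assumes that the header and the stanzas are separated by blank lines, as in the example above.

```python
import re

SAMPLE = """Title: Example document
Subtitle: November 16, 2012
File: example.pdf

5
The material on page 5 is interesting and this is a brief
description of what's on it.

7-8
Pages 7 and 8 also have something I think is worth remembering,
and I've described it here.
"""


def parse_summary(text):
    """Split a summary file into a header dict and a list of stanzas.

    Each stanza is returned as (first_page, last_page, description).
    """
    # Blank lines separate the header from the body and the stanzas
    # from each other.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]

    # Header lines have the form "Key: value"; split on the first colon only.
    header = {}
    for line in blocks[0].splitlines():
        key, _, value = line.partition(":")
        header[key.strip()] = value.strip()

    stanzas = []
    for block in blocks[1:]:
        # The first line is a page number or a hyphenated range.
        first, _, rest = block.partition("\n")
        start, _, end = first.strip().partition("-")
        first_page = int(start)
        last_page = int(end) if end else first_page
        # Rejoin the wrapped description lines into one paragraph.
        stanzas.append((first_page, last_page, " ".join(rest.split())))
    return header, stanzas


header, stanzas = parse_summary(SAMPLE)
print(header["Title"])
for first_page, last_page, description in stanzas:
    print(first_page, last_page, description)
```

A single page such as 5 comes back with the same first and last page, so the rest of the code can treat every stanza as a range.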

I describe how I go about dictating the body of the summary in this blog post.

Scripts

There are three scripts in the repository:

wkpdf is a processor written in RubyCocoa that uses WebKit to transform an HTML document into a PDF. It must be installed somewhere in the user’s $PATH so summary2pdf can execute it.

A couple of customizations you’ll likely need to make are:

You may also want to change the fonts and other <style> choices given in summary.php.

Use

If your simple summary file is named example-summary.txt, then running

summary2pdf example-summary.txt

will generate a new file, example-summary.pdf, in the same folder as example.pdf and example-summary.txt.

The page numbers in example-summary.pdf will be links to the corresponding pages in example.pdf. I’ve found that neither Preview nor PDFpenPro will follow these links properly, but Skim will.