Summaries
November 16, 2012 at 7:55 PM by Dr. Drang
Following up on this post on note-taking, I now have a set of scripts for turning my Siri-dictated plain-text notes into a nice PDF summary that includes links to specific pages within the document I’m summarizing. The scripts are in this GitHub repository.
I’m not especially proud of these scripts. They were whipped together quickly from a few other scripts I already had lying around, which is why one is Python, one is PHP, and one is a shell script that wraps everything together. They work, but they could stand some cleaning up and consolidation.
The key is wkpdf, a Ruby Cocoa script that uses WebKit to convert HTML to PDF. Federico Viticci mentioned it a few months ago in a MacStories article, and this is the first time I’ve had a chance to use it. Apart from the fact that it really doesn’t deal well with standard input and output (which is why I needed to create and later delete a temporary HTML file in the shell script), wkpdf seems to work well.
There’s nothing especially tricky or interesting about any of the three scripts. How they work together is described in the repository’s README, which I’ve reproduced below.
Summary
A set of scripts for producing a nicely formatted summary of a document in PDF form from a simple text file. The summary is organized according to the page numbers in the original (PDF) document and includes links to those pages in the original.
This system was inspired by Walton Jones’s method of summarizing academic papers.
Simple text file
The original file is assumed to be a PDF. The summary, which must be saved in the same directory as the original file, is written in this form:
Title: Example document
Subtitle: November 16, 2012
File: example.pdf
Starting page: 1
Starting sheet: 1
Pages per sheet: 1
5
The material on page 5 is interesting and this is a brief
description of what's on it.
7-8
Pages 7 and 8 also have something I think is worth remembering,
and I've described it here.
11
Page 11 is the last page with interesting content.
Of the header lines at the top, only Title and File are required. I often use the Subtitle line for the date of the original document, but it can be used for any extra information.
The other three lines handle two common mismatches: the page numbers printed on the original document don’t always correspond to the page numbers in the PDF, and PDFs are often formatted as two-up or four-up, with more than one logical page per physical page.
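To make that arithmetic concrete, here is a minimal sketch of how a printed page number could be mapped to the physical sheet in the PDF. It is my reading of what the three header lines imply, not code taken from summary2md; the function name, the parameter names, and the default of 1 for each value are assumptions.

```python
def sheet_for_page(page, starting_page=1, starting_sheet=1, pages_per_sheet=1):
    """Return the physical PDF sheet that holds a printed page number.

    starting_page   -- first page number printed on the document (assumed meaning)
    starting_sheet  -- sheet of the PDF on which that page appears (assumed meaning)
    pages_per_sheet -- 1 for normal layout, 2 for two-up, 4 for four-up
    """
    if page < starting_page:
        raise ValueError("page comes before the first numbered page")
    # Whole sheets skipped since the first numbered page, plus the offset.
    return starting_sheet + (page - starting_page) // pages_per_sheet


# Two-up layout: printed pages 5 and 6 share the third sheet.
print(sheet_for_page(5, pages_per_sheet=2))
print(sheet_for_page(6, pages_per_sheet=2))
```

With all three values left at 1, the sheet number is just the printed page number, which is why those header lines can be omitted for most documents.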
The body of the summary consists of several stanzas, each starting with a line giving the page number(s) and followed by a brief description of what’s on that page. Hyphens are used to indicate page ranges.
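The format is simple enough that the parsing can be sketched in a few lines. The following is an illustration of the idea, not the actual summary2md: the function names are made up, the `#page=N` link fragment is an assumption about how the links are written, and for brevity it ignores the three optional page-offset headers and links straight to the first printed page number of each stanza.

```python
import re

PAGE_LINE = re.compile(r"^(\d+)(?:-(\d+))?$")        # "5" or "7-8"
HEADER_LINE = re.compile(r"^([A-Za-z ]+):\s*(.*)$")  # "Title: Example document"


def parse_summary(text):
    """Split a summary into a header dict and a list of (first, last, note)."""
    header, stanzas = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        page = PAGE_LINE.match(line)
        if page:
            # A bare page number or range starts a new stanza.
            first = int(page.group(1))
            last = int(page.group(2) or first)
            stanzas.append([first, last, []])
        elif stanzas:
            # Anything after the first stanza line is description text.
            if line:
                stanzas[-1][2].append(line)
        else:
            field = HEADER_LINE.match(line)
            if field:
                header[field.group(1).strip()] = field.group(2).strip()
    return header, [(f, l, " ".join(words)) for f, l, words in stanzas]


def to_markdown(header, stanzas):
    """Render the parsed summary as Markdown with page links (assumed form)."""
    parts = ["# " + header["Title"]]
    if "Subtitle" in header:
        parts.append("## " + header["Subtitle"])
    for first, last, note in stanzas:
        label = str(first) if first == last else "%d-%d" % (first, last)
        parts.append("[%s](%s#page=%d)" % (label, header["File"], first))
        parts.append(note)
    return "\n\n".join(parts) + "\n"
```

One limitation of this sketch: a description line consisting of nothing but a number would be mistaken for the start of a new stanza.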
I describe how I go about dictating the body of the summary in this blog post.
Scripts
There are three scripts in the repository:
- summary2md is a Python script that transforms the simple summary format above into Markdown. The page numbers are turned into links to those pages in the original document.
- summary.php is a PHP script that transforms, via PHP Markdown and SmartyPants processors, the Markdown produced by summary2md into a full HTML document.
- summary2pdf is a shell script that processes the simple summary file through summary2md, summary.php, and, finally, wkpdf to generate a nicely formatted PDF version of the summary.
wkpdf is a processor written in Ruby Cocoa that uses WebKit to transform an HTML document into a PDF. It must be installed in the user’s $PATH so summary2pdf can execute it.
A couple of customizations you’ll likely need to make are:
- Lines 3 and 4 of summary.php give the paths to the PHP Markdown and SmartyPants processors.
- Line 4 of summary2pdf gives the paths to summary2md and summary.php.
You may also want to change the fonts and other <style> choices given in summary.php.
Use
If your simple summary file is named example-summary.txt, then running
summary2pdf example-summary.txt
will generate a new file, example-summary.pdf, in the same folder as example.pdf and example-summary.txt.
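For readers who would rather see the wrapper logic spelled out, here is a rough Python sketch of the same three-stage flow, including the temporary HTML file that the last stage needs. The real summary2pdf is a shell script, and this is not a translation of it: the function names are invented, the commands are passed in as placeholders rather than hard-coded, and whether the first two stages really read standard input is an assumption made to keep the example short.

```python
import os
import subprocess
import tempfile


def pdf_path_for(summary_path):
    """example-summary.txt -> example-summary.pdf, in the same folder."""
    root, _ext = os.path.splitext(summary_path)
    return root + ".pdf"


def summary_to_pdf(summary_path, md_cmd, html_cmd, make_pdf_cmd):
    """Run the three stages and return the path of the resulting PDF.

    md_cmd, html_cmd -- argv lists for filters that read stdin and write
                        stdout (assumed behaviour of the first two stages)
    make_pdf_cmd     -- function (html_path, pdf_path) -> argv list for a
                        converter that needs real files rather than pipes
    """
    pdf_path = pdf_path_for(summary_path)
    folder = os.path.dirname(os.path.abspath(summary_path))

    with open(summary_path, "rb") as source:
        markdown = subprocess.run(md_cmd, stdin=source,
                                  stdout=subprocess.PIPE, check=True).stdout
    html = subprocess.run(html_cmd, input=markdown,
                          stdout=subprocess.PIPE, check=True).stdout

    # The converter can't be fed through a pipe, so write the HTML to a
    # temporary file next to the summary and remove it afterwards.
    fd, html_path = tempfile.mkstemp(suffix=".html", dir=folder)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(html)
        subprocess.run(make_pdf_cmd(html_path, pdf_path), check=True)
    finally:
        os.remove(html_path)
    return pdf_path
```

The `finally` clause means the temporary file is cleaned up even if the converter fails, which the create-then-delete approach in a plain shell script doesn't give you for free.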
The page numbers in example-summary.pdf will be links to the corresponding pages in example.pdf. I’ve found that neither Preview nor PDFpenPro will follow these links properly, but Skim will.