My report writing workflow

Blame Kieran Healy for this. Last night he posted a nice account of his writing workflow, with flow charts, lists of tools, examples of input and output, and, most important, a cogent explanation of how and why he developed his way of working. If you haven’t already seen it, go there now—I’ll wait.

♫ Tall and tan and young and lovely, the girl from Ipanema goes… ♫

Oh, you’re back. Good. Yes, that was nice of him to quote from one of my posts. But let’s move on to the blame thing. Prof. Healy’s post inspired me to draw up my own workflow for the reports I write.[1]

[Image: Writing workflow]

There’s a lot of stuff in there, but my input comes only at the points where you see my picture. The workflow starts with me writing the text of the report in Markdown format. That file invariably contains references to photographs, drawings, or plots that’ll be included in the final output. These are usually prepared before I start writing, but are, more often than not, edited, annotated, or otherwise adjusted as I write the text, because the process of writing gives me new ideas on how to present the information.

The Markdown file is processed by a shell script, md2report, which spits out a LaTeX version of the report. Most of the time this file is ready to be processed by pdflatex, which also gathers in all the image files and creates the PDF output. If the report needs a particularly complex table, or if I need to tweak the spacing somewhere, I’ll go in and edit the LaTeX file directly. This is pretty rare, which is why I’ve grayed out my input to the LaTeX file. The great majority of my reports need no direct LaTeX input from me.

My goal is for almost all of my effort to be spent on the content of the report and almost none on styling and processing. The styles are all defined in auxiliary files and scripts that I wrote long ago. The only processing I do consists of invocations of the two executables on the main spine of the workflow:

md2report report
pdflatex report

(They’re smart enough to know the file extensions.)
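If you wanted to chain the two steps, a minimal sketch might look like the following. The function name build_report and the MD2REPORT and PDFLATEX override variables are hypothetical, introduced here only for illustration; nothing like them is part of the workflow described above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands above; a sketch, not the author's tooling.
# MD2REPORT and PDFLATEX are assumed override hooks (handy for testing or alternate tools).

build_report() {
    base=${1%.md}    # accept either "report" or "report.md"
    # Run the PDF step only if the Markdown-to-LaTeX step succeeded.
    ${MD2REPORT:-md2report} "$base" && ${PDFLATEX:-pdflatex} "$base"
}
```

After sourcing the file, calling build_report report would run both steps in order. The && means a failed conversion never reaches pdflatex, so a stale LaTeX file is not typeset by mistake.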

The md2report script is really just a pipeline. It runs the Markdown file through mmmd, which is my fork of Fletcher Penney’s MultiMarkdown, and John Gruber’s SmartyPants to produce XHTML. This is then piped through xsltproc to produce LaTeX. The XSL file that defines the transformation, xhtml2article.xsl, is my fork of Fletcher’s original. Finally, the LaTeX is piped through a couple of small scripts, addsignature and separateplates, to add some final formatting touches.
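The pipeline described above could be sketched roughly as follows. This is not the actual md2report script, which isn't shown here. The function name md2report_sketch, the SmartyPants.pl command name, the order of the first two stages, the assumption that every stage reads stdin and writes stdout, and the location of xhtml2article.xsl in the current directory are all guesses made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of an md2report-style pipeline, NOT the author's script.
# Every command line below is an assumption and can be overridden from the environment.
: "${TO_XHTML:=mmmd}"                          # Markdown in, XHTML out (assumed stdin filter)
: "${SMARTEN:=SmartyPants.pl}"                 # typographic quotes and dashes (assumed name)
: "${TO_LATEX:=xsltproc xhtml2article.xsl -}"  # "-" makes xsltproc read the XHTML from stdin
: "${ADD_SIGNATURE:=addsignature}"             # final formatting filters (assumed stdin filters)
: "${SEPARATE_PLATES:=separateplates}"

md2report_sketch() {
    base=${1%.md}    # accept either "report" or "report.md"
    # The stage variables are left unquoted on purpose so that a value
    # such as "xsltproc xhtml2article.xsl -" splits into command and arguments.
    $TO_XHTML < "$base.md" \
        | $SMARTEN \
        | $TO_LATEX \
        | $ADD_SIGNATURE \
        | $SEPARATE_PLATES > "$base.tex"
}
```

After sourcing the file, md2report_sketch report would read report.md and write report.tex. Keeping each stage as a plain filter means any one of them can be swapped out or dropped without touching the others.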

I should mention here that I forked MultiMarkdown and the XSL file in 2005 or 2006 so I could include equations in my reports (which MultiMarkdown didn’t support at the time) and use a LaTeX style file, report.sty, that I’d developed some years earlier, when I wrote my reports in LaTeX directly.[2] If I were starting today, I’d use the current MultiMarkdown (or Pandoc, as Kieran does), but since what I have works, there’s no incentive for me to change. My job is to produce reports, not workflows.

I did spend quite a while developing this workflow, but that was many years ago and the effort has paid off several times over. Randall Munroe isn’t always right.


  1. I’m pretty sure that by linking the image below directly to the original size—which you can get to by clicking on it—I’m violating Flickr’s terms of service. I hope you appreciate the risk I’m taking. Also, here’s the Flickr image page I’m supposed to link to. 

  2. I wrote about this a few years ago.