Converting PDFs to JPEGs

Update 11/24/09
The script given in this post worked for me in Leopard but not in Snow Leopard. See this post for more information.

Last week I had several multipage PDFs I needed to print. The PDFs were sets of annotated photographs, and I wanted to have the printing done on the nice photo printers at my local Costco. Costco, it seems, doesn’t like PDFs; its photo uploading software won’t even let you choose PDF files from your hard disk to send to its servers. So I had to generate JPEGs, one for every page of every PDF.

In theory, OS X has a couple of obvious ways to do this. Plan A was to open each PDF in Preview and do a Save As to convert it to a JPEG. But this proved to be far too slow to be practical, because I had to do a four-step Save As routine—choose JPEG format, choose resolution, close the new JPEG, reopen the original PDF—for each page in each document, of which there were dozens. Plan B was the creation of an Automator workflow. Automator has a “Render PDF Pages as Images” action that seems perfectly suited to the task, but for some reason I could never get the rendered JPEGs saved onto my Desktop. Even if I had gotten the workflow to work, Automator’s file renaming capabilities are so limited that I wouldn’t have liked the result.

So, off to Google, where I found this wonderful AppleScript/Python hybrid droplet from Martin Michel and Dinu Gherman. It takes one or more PDFs and spits out all the pages as individual JPEGs. Even better, it uses the name of the PDF file as the base for the JPEG names. I dumped the AppleScript part (I have no use for droplets) and pared down the Python to just the bits I needed:

#!/usr/bin/python

# author: Martin Michel
# eMail: martin.michel@macscripter.net
# created: 01.04.2008
# modified by Dr. Drang (drdrang@gmail.com) 2009-11-20

# Thanks to Dinu C. Gherman for providing the
# code example on http://python.net/~gherman/pdf2tiff.html

import sys
import os
from os.path import splitext
from objc import YES, NO
from Foundation import NSData
from AppKit import *

NSApp = NSApplication.sharedApplication()

def pdf2jpg(pdfpath, resolution=300):
    """I am converting all pages of a PDF file to JPG images."""

    pdfdata = NSData.dataWithContentsOfFile_(pdfpath)
    pdfrep = NSPDFImageRep.imageRepWithData_(pdfdata)
    pagecount = pdfrep.pageCount()
    for i in range(0, pagecount):
        pdfrep.setCurrentPage_(i)
        pdfimage = NSImage.alloc().init()
        pdfimage.addRepresentation_(pdfrep)
        origsize = pdfimage.size()
        width, height = origsize
        pdfimage.setScalesWhenResized_(YES)
        rf = resolution / 72.0
        pdfimage.setSize_((width*rf, height*rf))
        tiffimg = pdfimage.TIFFRepresentation()
        bmpimg = NSBitmapImageRep.imageRepWithData_(tiffimg)
        data = bmpimg.representationUsingType_properties_(NSJPEGFileType, {NSImageCompressionFactor: 1.0})
        jpgpath = "%s-%02d.jpg" % (splitext(pdfpath)[0], i+1)
        if not os.path.exists(jpgpath):
            data.writeToFile_atomically_(jpgpath, False)

if __name__ == '__main__':
    for pdfpath in sys.argv[1:]:
        pdf2jpg(pdfpath)

My only changes to the pdf2jpg function were in:

The code around pdf2jpg was changed so I could use it as a command-line tool instead of a droplet.

Since I know virtually no Cocoa, my understanding of the guts of pdf2jpg is pretty limited. I can make a pretty good guess as to what all those NS calls are doing, but I wouldn’t feel comfortable trying to change any of them. Fortunately, I don’t have to; the script works really well as is.
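
The scaling step, at least, is plain arithmetic and can be followed without any Cocoa. As far as I can tell, the size reported for a PDF page is in points, 72 to the inch, so multiplying by resolution/72 gives the pixel dimensions at the requested resolution. Here is a minimal sketch of just that calculation. The helper name scaled_size is made up for illustration and is not part of the script, and the US Letter page is only an assumed example.

```python
# Sketch of the scaling arithmetic in pdf2jpg, with no Cocoa calls.
# scaled_size is a hypothetical helper, not part of the original script.

def scaled_size(width_pt, height_pt, resolution=300):
    """Return the pixel dimensions for a page whose size is given in points."""
    rf = resolution / 72.0   # same scale factor the script computes
    return (width_pt * rf, height_pt * rf)

# Assumed example: a US Letter page is 612 x 792 points (8.5 x 11 inches).
width_px, height_px = scaled_size(612, 792)
print("%.0f x %.0f pixels" % (width_px, height_px))
```

At the default 300 dpi, that Letter page comes out at roughly 2550 by 3300 pixels, which is 8.5 by 11 inches times 300.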

I have the script saved as pdf2jpg. It’s invoked from the command line like this:

pdf2jpg docA.pdf docB.pdf docC.pdf

and the result is a series of JPEGs,

docA-01.jpg       docB-01.jpg       docC-01.jpg
docA-02.jpg       docB-02.jpg       docC-02.jpg
docA-03.jpg       docB-03.jpg       docC-03.jpg
docA-04.jpg                         docC-04.jpg
docA-05.jpg                         docC-05.jpg
docA-06.jpg

where docA.pdf, docB.pdf, and docC.pdf are 6-, 3-, and 5-page PDFs, respectively.
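
The names come from the format string in the script, which strips the .pdf extension and appends a hyphen and a zero-padded page number. A minimal sketch of just that step follows. The helper name jpeg_names is made up for illustration and is not in the script.

```python
from os.path import splitext

# jpeg_names is a hypothetical helper that reproduces only the
# naming logic of pdf2jpg; it does no rendering and writes no files.
def jpeg_names(pdfpath, pagecount):
    """List the JPEG paths pdf2jpg would use for a PDF with pagecount pages."""
    base = splitext(pdfpath)[0]          # "docB.pdf" -> "docB"
    return ["%s-%02d.jpg" % (base, i + 1) for i in range(pagecount)]

for name in jpeg_names("docB.pdf", 3):
    print(name)
# prints:
# docB-01.jpg
# docB-02.jpg
# docB-03.jpg
```

Because the script checks os.path.exists before writing, a JPEG that already has one of these names is left alone rather than overwritten.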
