# PDFs and JPEGs again

So I tried that pdf2jpg script today at work and it failed. Oh, it made a JPEG for every page of the PDF I gave it, but the JPEGs were at only a 72 dpi resolution. That’s 792×612, not nearly enough for a decent print. What went wrong?

Well, in one sense, nothing went wrong. The script still works on my iBook G4 running Leopard, which is where I used it last week. Today’s failure occurred on my office computer, which is an Intel iMac running Snow Leopard. I really doubt the processor has anything to do with the difference in behavior; my guess is that something has changed in either Cocoa or PyObjC.

Here’s the pdf2jpg script again:

 1:  #!/usr/bin/python
2:
3:  # author: Martin Michel
4:  # eMail: martin.michel@macscripter.net
5:  # created: 01.04.2008
6:  # modified by Dr. Drang (drdrang@gmail.com) 2009-11-20
7:
8:  # Thanks to Dinu C. Gherman for providing the
9:  # code example on http://python.net/~gherman/pdf2tiff.html
10:
11:  import sys
12:  import os
13:  from os.path import splitext
14:  from objc import YES, NO
15:  from Foundation import NSData
16:  from AppKit import *
17:
18:  NSApp = NSApplication.sharedApplication()
19:
20:  def pdf2jpg(pdfpath, resolution=300):
21:      """I am converting all pages of a PDF file to JPG images."""
22:
23:      pdfdata = NSData.dataWithContentsOfFile_(pdfpath)
24:      pdfrep = NSPDFImageRep.imageRepWithData_(pdfdata)
25:      pagecount = pdfrep.pageCount()
26:      for i in range(0, pagecount):
27:          pdfrep.setCurrentPage_(i)
28:          pdfimage = NSImage.alloc().init()
30:          origsize = pdfimage.size()
31:          width, height = origsize
32:          pdfimage.setScalesWhenResized_(YES)
33:          rf = resolution / 72.0
34:          pdfimage.setSize_((width*rf, height*rf))
35:          tiffimg = pdfimage.TIFFRepresentation()
36:          bmpimg = NSBitmapImageRep.imageRepWithData_(tiffimg)
37:          data = bmpimg.representationUsingType_properties_(NSJPEGFileType, {NSImageCompressionFactor: 1.0})
38:          jpgpath = "%s-%02d.jpg" % (splitext(pdfpath)[0], i+1)
39:          if not os.path.exists(jpgpath):
40:              data.writeToFile_atomically_(jpgpath, False)
41:
42:  if __name__ == '__main__':
43:      for pdfpath in sys.argv[1:]:
44:          pdf2jpg(pdfpath)


The resolution of the JPEG is set in Lines 33-37, where the image of the PDF page is supposed to be upsized (Lines 33-34), made into a TIFF image (Line 35), and then converted into a JPEG, (Lines 36-37). After a bit of experimentation involving writing out TIFF files before the JPEG conversion, it appears to me that the problem is in Line 35: the TIFF image isn’t being created at the 300 dpi resolution. So there’s either something different in Snow Leopard’s version TIFFRepresentation, or the PyObjC that comes with Snow Leopard is calling it wrong. Maybe the setScalesWhenResized_ call in Line 32 isn’t working right.

Or maybe my diagnosis is completely off and the problem lies somewhere else entirely.

So how can I convert multipage PDFs to a series of JPEGs when I’m at work? It turns out that Automator, which last week I couldn’t get to do the conversion on the iBook (but see below), can do the conversion just fine on the iMac. Here’s what the workflow looks like:

This isn’t the greatest solution in the world, because every new file is called “Sheet-xx.jpeg” instead of taking its name from the original PDF, but I suppose that can be fixed by adding an AppleScript or shell script to the workflow. Or, conversely, maybe the workflow can be called from a script that handles the names. Even if the names stay as they are, at least I’ve got the JPEGs I need.

And about the Automator workflow not performing on my iBook? Well…er…I tried it again tonight, and it worked just fine. Don’t know what I was doing wrong last week.

So now I’m left with at decision: should I try to get pdf2jpg to work on the iMac, or should I try to improve the file naming in the Automator workflow. The former appeals to me because I might actually learn something about Cocoa or PyObjC, but the latter seems more likely to yield quick results.