PDFs and JPEGs again
November 24, 2009 at 10:54 PM by Dr. Drang
So I tried that pdf2jpg script today at work and it failed. Oh, it made a JPEG for every page of the PDF I gave it, but the JPEGs were at only a 72 dpi resolution. That’s 792×612 pixels, not nearly enough for a decent print. What went wrong?
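To put numbers on the shortfall, here is a minimal sketch of the arithmetic, assuming a letter-size page in landscape (792×612 points, with 72 points to the inch). The function name `pixel_size` is made up for this illustration and is not part of the script.

```python
def pixel_size(width_pt, height_pt, dpi):
    """Convert a page size in points (1/72 inch) to pixels at the given dpi."""
    scale = dpi / 72.0
    return int(round(width_pt * scale)), int(round(height_pt * scale))

# A landscape letter page is 11 x 8.5 inches, i.e. 792 x 612 points.
print(pixel_size(792, 612, 72))   # what the script produced
print(pixel_size(792, 612, 300))  # what the script was asked for
```

At 300 dpi the same page should come out at 3300×2550 pixels, a bit more than four times larger in each direction.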
Well, in one sense, nothing went wrong. The script still works on my iBook G4 running Leopard, which is where I used it last week. Today’s failure occurred on my office computer, which is an Intel iMac running Snow Leopard. I really doubt the processor has anything to do with the difference in behavior; my guess is that something has changed in either Cocoa or PyObjC.
Here’s the pdf2jpg script again:
 1: #!/usr/bin/python
 2:
 3: # author: Martin Michel
 4: # eMail: martin.michel@macscripter.net
 5: # created: 01.04.2008
 6: # modified by Dr. Drang (drdrang@gmail.com) 2009-11-20
 7:
 8: # Thanks to Dinu C. Gherman for providing the
 9: # code example on http://python.net/~gherman/pdf2tiff.html
10:
11: import sys
12: import os
13: from os.path import splitext
14: from objc import YES, NO
15: from Foundation import NSData
16: from AppKit import *
17:
18: NSApp = NSApplication.sharedApplication()
19:
20: def pdf2jpg(pdfpath, resolution=300):
21:     """I am converting all pages of a PDF file to JPG images."""
22:
23:     pdfdata = NSData.dataWithContentsOfFile_(pdfpath)
24:     pdfrep = NSPDFImageRep.imageRepWithData_(pdfdata)
25:     pagecount = pdfrep.pageCount()
26:     for i in range(0, pagecount):
27:         pdfrep.setCurrentPage_(i)
28:         pdfimage = NSImage.alloc().init()
29:         pdfimage.addRepresentation_(pdfrep)
30:         origsize = pdfimage.size()
31:         width, height = origsize
32:         pdfimage.setScalesWhenResized_(YES)
33:         rf = resolution / 72.0
34:         pdfimage.setSize_((width*rf, height*rf))
35:         tiffimg = pdfimage.TIFFRepresentation()
36:         bmpimg = NSBitmapImageRep.imageRepWithData_(tiffimg)
37:         data = bmpimg.representationUsingType_properties_(NSJPEGFileType, {NSImageCompressionFactor: 1.0})
38:         jpgpath = "%s-%02d.jpg" % (splitext(pdfpath)[0], i+1)
39:         if not os.path.exists(jpgpath):
40:             data.writeToFile_atomically_(jpgpath, False)
41:
42: if __name__ == '__main__':
43:     for pdfpath in sys.argv[1:]:
44:         pdf2jpg(pdfpath)
The resolution of the JPEG is set in Lines 33-37, where the image of the PDF page is supposed to be upsized (Lines 33-34), made into a TIFF image (Line 35), and then converted into a JPEG (Lines 36-37). After a bit of experimentation, which involved writing out TIFF files before the JPEG conversion, it appears to me that the problem is in Line 35: the TIFF image isn’t being created at the 300 dpi resolution. So either there’s something different in Snow Leopard’s version of TIFFRepresentation, or the PyObjC that comes with Snow Leopard is calling it wrong. Maybe the setScalesWhenResized_ call in Line 32 isn’t working right.
Or maybe my diagnosis is completely off and the problem lies somewhere else entirely.
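For checking what a conversion actually produced without opening each file in an image viewer, the pixel dimensions can be read straight from the JPEG header. The following is only a sketch in plain Python 3, independent of Cocoa and PyObjC. The helper name `jpeg_size` is hypothetical, and it assumes a well-formed file whose start-of-frame segment appears before any compressed image data.

```python
import struct

def jpeg_size(data):
    """Return (width, height) in pixels from the raw bytes of a JPEG file."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # skip a fill byte before the real marker
            continue
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF0..SOF15 carry the frame size; C4, C8 and CC are other segments.
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    raise ValueError("no start-of-frame segment found")

# Usage (the file name is only an example):
#   with open("drawing-01.jpg", "rb") as f:
#       print(jpeg_size(f.read()))
```

If a letter-size page comes back as 792×612 rather than 3300×2550, the upsizing step did not take effect.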
So how can I convert multipage PDFs to a series of JPEGs when I’m at work? It turns out that Automator, which last week I couldn’t get to do the conversion on the iBook (but see below), can do the conversion just fine on the iMac. Here’s what the workflow looks like:
This isn’t the greatest solution in the world, because every new file is called “Sheet-xx.jpeg” instead of taking its name from the original PDF, but I suppose that can be fixed by adding an AppleScript or shell script to the workflow. Or, conversely, maybe the workflow can be called from a script that handles the names. Even if the names stay as they are, at least I’ve got the JPEGs I need.
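As a rough sketch of the "script that handles the names" idea, the following Python function renames the workflow's output after the fact. It assumes the files are named exactly `Sheet-` followed by digits and `.jpeg`, and it borrows the `name-NN.jpg` pattern from pdf2jpg. The function name `rename_sheets` and its arguments are hypothetical, and nothing here calls Automator itself.

```python
import os
import re

def rename_sheets(folder, pdfpath):
    """Rename Sheet-NN.jpeg files in folder to <pdf name>-NN.jpg.

    Returns the list of new file names. Existing files are never overwritten.
    """
    stem = os.path.splitext(os.path.basename(pdfpath))[0]
    renamed = []
    for name in sorted(os.listdir(folder)):
        match = re.match(r"^Sheet-(\d+)\.jpeg$", name)
        if not match:
            continue  # leave unrelated files alone
        newname = "%s-%02d.jpg" % (stem, int(match.group(1)))
        newpath = os.path.join(folder, newname)
        if os.path.exists(newpath):
            continue  # same caution as pdf2jpg: don't clobber
        os.rename(os.path.join(folder, name), newpath)
        renamed.append(newname)
    return renamed
```

Called as `rename_sheets(output_folder, "report.pdf")`, this would turn `Sheet-01.jpeg` into `report-01.jpg`, and so on. It only makes sense if the folder holds the output of a single PDF at a time.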
And about the Automator workflow not performing on my iBook? Well…er…I tried it again tonight, and it worked just fine. Don’t know what I was doing wrong last week.
So now I’m left with a decision: should I try to get pdf2jpg to work on the iMac, or should I try to improve the file naming in the Automator workflow? The former appeals to me because I might actually learn something about Cocoa or PyObjC, but the latter seems more likely to yield quick results.