July 12, 2015 at 8:29 PM by Dr. Drang
My iMac recently got a new hard drive from Apple through the 3TB Hard Drive Replacement Program. Because my old hard drive was still working fine, I thought Apple was going to clone it onto the new one, but that didn’t happen. Probably a privacy concern. Then I thought I’d be able to use either my Time Machine or SuperDuper backup to make the new drive like the old one, but for reasons I’m still not sure of—and which are too painful to recount, anyway—those options didn’t work out, either. So I’m in the process of rebuilding my system starting with a stock version of Yosemite 10.10.4. Copying over my home directory was simple enough, and I’m slowly reinstalling the various applications, utilities, and other digital toolsets my work computer has accreted over the years.
Python modules are usually pretty easy to install: pip install modname almost always does the trick. But I decided not to install the pyexiv2 library that a few of my photo-related scripts use. Pyexiv2 accesses the EXIF metadata embedded in JPEG files and is a wrapper around the exiv2 C++ library, which in turn relies on Boost and SCons. I’ve been successful in installing all these in the past, but I really didn’t want to go through that rigamarole again. I’d much rather have a pure Python solution, because it’s easier to install now and easier to reinstall later.
I decided to go with Ben Leslie’s pexif module. It’s under active development, and unlike the exifread package, it allows for both reading and writing of the EXIF metadata. I don’t need writing for the script presented below, but I do for another photo script. So although I think exifread is a bit easier to use, I’m going with pexif because I’ll need it eventually.
The first script I’m rewriting to use pexif is called canonize, which renames photo files according to the date on which they were taken. This is a script I use almost every week at work, so it’s important that I get a working version of it up and running right away. I’m sure there are ways to use Hazel or some other utility to do the same thing, but I’ve been using one form of canonize or another for about 15 years, and I’m comfortable with it. The basic logic has remained the same for all that time (starting with a Perl version and now through three Python versions), so rewriting it to use a new EXIF library was no big deal.
Here’s the latest version of canonize:
python:
 1:  #!/usr/bin/env python
 2:
 3:  import docopt
 4:  import pexif
 5:  import os
 6:  import os.path
 7:  import sys
 8:
 9:  usage = """Usage:
10:    canonize [options] FILE...
11:    canonize [options] -f
12:
13:  Rename JPEG photo files according to the date taken.
14:
15:  Options:
16:    -f       get filenames from STDIN instead of command line
17:    -s SSS   optional suffix [default: drang]
18:    -n NNN   start with this number [default: 1]
19:    -t       show the renaming but don't do it
20:    -h       show this help message
21:
22:  The format for the file name is yyyymmddsss-nnn.jpg, where
23:  yyyy is the year, mm is the month number, dd is the day, sss
24:  is the optional suffix (which can be any length), and nnn is
25:  the (zero-padded) photo number for that day. By default, the
26:  original file names are given on the command line; if the -f
27:  option is used, the original file names are taken from
28:  STDIN."""
29:
30:  # Handle the command line options.
31:  args = docopt.docopt(usage)
32:  suffix = args['-s']
33:  start = int(args['-n'])
34:  test = args['-t']
35:  filtrate = args['-f']
36:
37:  # Get the file list and create a list of (filedate, filename) tuples.
38:  if filtrate:
39:    filenames = sys.stdin.read().split()
40:  else:
41:    filenames = args['FILE']
42:  filedates = []
43:  for f in filenames:
44:    info = pexif.JpegFile.fromFile(f).exif.primary
45:    try:
46:      d = info.ExtendedEXIF.DateTimeOriginal
47:      filedates.append((d, f))
48:    except AttributeError:   # skip over files without EXIF info
49:      continue
50:
51:  # Don't bother going on if there aren't any files in the list.
52:  if len(filedates) == 0:
53:    sys.exit()
54:
55:  # Some background info:
56:  # DateTimeOriginal is a string in the form 'yyyy:mm:dd hh:mm:ss'.
57:  # All the numbers use leading zeros if necessary; the hours use a
58:  # 24-hour clock format. An alphabetic sort on strings in this form
59:  # also sorts on date and time. Running split() on this string yields
60:  # a [date, time] list.
61:
62:  # Sort the files according to date and time taken.
63:  filedates.sort()
64:
65:  # Create a list of (oldfilename, newfilename) tuples.
66:  newnames = []
67:  i = start - 1                         # initialize the sequence number
68:  prev = filedates[0][0].split()[0]     # initialize the date
69:  for date, old in filedates:
70:    current = date.split()[0]           # keep only the date part
71:    if current == prev:                 # still on same date
72:      i += 1
73:    else:                               # starting new date
74:      i = 1
75:      prev = current
76:    path = os.path.dirname(old)
77:    new = os.path.join(path,
78:      "%s%s-%03d.jpg" % (current.replace(':', ''), suffix, i))
79:    if new in filenames:
80:      sys.stderr.write("Error: %s is already being used\n" % new)
81:      sys.exit()
82:    else:
83:      newnames.append((old, new))
84:
85:  # Rename the files or print out how they would be renamed.
86:  if test:
87:    for o,n in newnames:
88:      print("%s -> %s" % (o, n))
89:  else:
90:    for o,n in newnames:
91:      os.rename(o,n)
What canonize does is pretty simple and is, I think, fully explained in the usage string on Lines 9–28. It renames each photo file according to the date on which it was taken and the order in which it was taken on that date. (I know some people like to include the time in their photo names, but I’ve always preferred a simple counter.) There are some options that I seldom use for changing the details of the renaming or the source of the filenames.
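To make the naming scheme concrete, here is a minimal sketch that builds one file name from an EXIF-style timestamp. The function name canonical_name is hypothetical and isn't part of canonize, which does this inline; the format string is the one the script uses.

```python
def canonical_name(stamp, suffix="drang", number=1):
    """Build a yyyymmddsss-nnn.jpg name from a 'yyyy:mm:dd hh:mm:ss' stamp.

    Hypothetical helper for illustration; canonize does this inline.
    """
    date = stamp.split()[0]              # keep 'yyyy:mm:dd', drop the time
    return "%s%s-%03d.jpg" % (date.replace(":", ""), suffix, number)


print(canonical_name("2015:07:12 20:29:00"))              # 20150712drang-001.jpg
print(canonical_name("2015:07:12 20:29:00", "site", 12))  # 20150712site-012.jpg
```

The %03d in the format string is what zero-pads the counter, so names for the same day sort correctly in a directory listing up to 999 photos.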
On Line 31, docopt uses the usage string to parse the options and arguments and returns them all in a dictionary named args. Lines 32–35 turn the items of args into simple variables with nicer names.
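For a sense of what that unpacking works with, here is a sketch of the dictionary I'd expect for a command like canonize -n 5 a.jpg b.jpg. The dictionary is written out by hand as an assumption about docopt's output, not produced by running docopt, so treat the exact set of keys as approximate.

```python
# Assumed shape of the dictionary docopt would return for
#   canonize -n 5 a.jpg b.jpg
# Written out by hand for illustration; not generated by docopt.
args = {
    "-f": False,
    "-s": "drang",               # default taken from the usage string
    "-n": "5",                   # option values arrive as strings
    "-t": False,
    "-h": False,
    "FILE": ["a.jpg", "b.jpg"],
}

suffix = args["-s"]
start = int(args["-n"])          # convert before doing arithmetic with it
test = args["-t"]
filtrate = args["-f"]

print(suffix, start, test, filtrate)
```

The point to notice is the int() call: the value of -n is text, and the script later subtracts 1 from it.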
The pexif module gets used in the loop in Lines 43–49. For each file, the EXIF information is read in Line 44 and the DateTimeOriginal field is plucked out in Line 46. This is the date and time, in yyyy:mm:dd hh:mm:ss format, at which the photo was taken. That date/time stamp is then used to sort the files in chronological order in Line 63 and determine their new names in Lines 66–83.
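The sort-then-count logic can be tried without any photos or EXIF library. The sketch below pulls it into a hypothetical function, number_by_day, and feeds it made-up timestamps and file names; it follows the same steps as the script but is not the script itself.

```python
def number_by_day(filedates, start=1):
    """Return (filename, date, counter) triples in chronological order.

    filedates is a list of ('yyyy:mm:dd hh:mm:ss', filename) tuples.
    The counter begins at `start` for the earliest date and at 1 for
    each later date, mirroring the loop in canonize.
    """
    if not filedates:
        return []
    ordered = sorted(filedates)          # string sort is also a time sort
    result = []
    prev = ordered[0][0].split()[0]      # date of the earliest photo
    i = start - 1
    for stamp, name in ordered:
        current = stamp.split()[0]
        if current == prev:              # still on same date
            i += 1
        else:                            # starting new date
            i = 1
            prev = current
        result.append((name, current, i))
    return result


# Made-up sample data, deliberately out of order.
photos = [
    ("2015:07:12 09:15:00", "IMG_0003.JPG"),
    ("2015:07:11 17:40:10", "IMG_0001.JPG"),
    ("2015:07:12 08:02:33", "IMG_0002.JPG"),
]
for row in number_by_day(photos):
    print(row)
```

Sorting plain strings works here only because every field in the timestamp is zero-padded and ordered from largest unit to smallest.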
Lines 44 and 46 show why I’m not entirely thrilled with pexif. While it makes perfect sense that the attribute I’m after is called DateTimeOriginal, it’s not at all obvious that it should be buried under the three layers of exif.primary.ExtendedEXIF. I understand that this structure comes from the EXIF spec, but I shouldn’t have to dig through the spec (or pexif’s source code) to find clues to the module’s attribute hierarchy. That’s what documentation is for, something pexif is sorely lacking.
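One way to cope with a deep hierarchy like this is to walk it one attribute at a time and give up quietly when a layer is missing. The sketch below does that with a hypothetical helper, dig, and uses types.SimpleNamespace objects (Python 3) as stand-ins for what pexif returns. It doesn't call pexif at all, and the stand-in structure is assumed only from the attribute path used in the script.

```python
from types import SimpleNamespace


def dig(obj, path, default=None):
    """Follow a dotted attribute path; return default if any layer is missing."""
    for name in path.split("."):
        try:
            obj = getattr(obj, name)
        except AttributeError:
            return default
    return obj


# Stand-ins for parsed JPEGs; these are NOT real pexif objects.
with_date = SimpleNamespace(
    exif=SimpleNamespace(
        primary=SimpleNamespace(
            ExtendedEXIF=SimpleNamespace(
                DateTimeOriginal="2015:07:12 20:29:00"))))
without_date = SimpleNamespace(exif=SimpleNamespace(primary=SimpleNamespace()))

path = "exif.primary.ExtendedEXIF.DateTimeOriginal"
print(dig(with_date, path))      # 2015:07:12 20:29:00
print(dig(without_date, path))   # None
```

This is the same idea as the try/except AttributeError in the script, just spread over every layer instead of the last one.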
On the other hand, pexif has some very nice helper functions for getting and setting GPS data, and they’ll be very helpful when I rewrite my other photo-handling script.
docopt really does work as well as Rob Wells said. The only thing I’m uncomfortable with is its name—I’m not sure why.