A double library transplant

With a new EXIF library in place, I rewrote my canonize photo renaming utility to take advantage of it. Canonize was my motivation for finding a new EXIF library in the first place. It’s a command-line program that renames photos based on the date and time they were taken. It does so by reading the EXIF metadata in the photo file and extracting the DateTimeOriginal field. The name is a bad pun on the idea of a canonical filename for the photos and the fact that I use Canon cameras.

Until now, canonize relied on the pyexif library, which works fine but can only read EXIF data, not write it. Canonize doesn’t need to write EXIF data, but I plan to write other scripts that do, and I want to standardize on a single library for all my EXIF work.

Substituting pyexiv2 methods for pyexif methods was really easy: only three or four lines needed to be changed, and the changes themselves were obvious. I’ll point them out in a bit. But since I had the hood open, it seemed like a good time to switch out another library, the option-parsing library.

Before today, canonize used the optparse library, which was the standard high-level library for handling command-line options in Python 2.6. It was deprecated in 2.7 in favor of the argparse library. You might think I’d upgrade to argparse, but I decided to move instead to the simpler getopt library. It doesn’t have all the bells and whistles of the other two, but it’s plenty capable for my elementary needs, and it’s unlikely to be deprecated because it’s written to mimic the venerable getopt() C function.
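If you haven’t used getopt before, here’s a minimal sketch of what it hands back for the option string canonize uses. The argument list and file names are made up for illustration; only the 's:fth' string comes from the script.

```python
import getopt

# A made-up command line, as if typed: canonize -t -s zoo a.jpg b.jpg
argv = ['-t', '-s', 'zoo', 'a.jpg', 'b.jpg']

# 's:fth' means -s takes an argument (the colon); -f, -t, and -h are flags.
options, filenames = getopt.getopt(argv, 's:fth')

# options is a list of (option, value) pairs; flags get an empty value.
# options   -> [('-t', ''), ('-s', 'zoo')]
# filenames -> ['a.jpg', 'b.jpg']

# An option that isn't in the string raises getopt.GetoptError.
try:
    getopt.getopt(['-x'], 's:fth')
    unknown_rejected = False
except getopt.GetoptError:
    unknown_rejected = True
```

That’s the whole interface: a list of pairs to loop over and a list of whatever arguments are left.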

So here’s the new source code:

python:
  1:  #!/usr/bin/env python
  2:  
  3:  import pyexiv2
  4:  import getopt
  5:  import os
  6:  import os.path
  7:  import sys
  8:  
  9:  # Options and help messages.
 10:  usage = """Usage: canonize [options] [list of files]
 11:  
 12:  Options:
 13:    -s sss    optional suffix
 14:    -f        get filenames from STDIN instead of command line
 15:    -t        show the renaming but don't do it
 16:    -h        show this help message
 17:  
 18:  Rename a list of photo files (JPEGs) according to the date
 19:  on which they were taken. The format for the file name is
 20:  yyyymmddsss-nnn.jpg, where yyyy is the year, mm is the month
 21:  number, dd is the day, sss is the optional suffix (which can
 22:  be any length), and nnn is the (zero-padded) photo number
 23:  for that day. By default, the original file names are given
 24:  on the command line; if the -f option is used, the original
 25:  file names are taken from STDIN."""
 26:  
 27:  # Handle the command line options.
 28:  try:
 29:    options, filenames = getopt.getopt(sys.argv[1:], 's:fth')
 30:  except getopt.GetoptError, err:
 31:    print str(err)
 32:    sys.exit(2)
 33:  
 34:  filtrate = False    # default for -f
 35:  suffix = ''         # default for -s
 36:  test = False        # default for -t
 37:  for o, a in options:
 38:    if o == '-s':
 39:      suffix = a
 40:    elif o == '-f':
 41:      filtrate = True
 42:    elif o == '-t':
 43:      test = True
 44:    else:
 45:      print usage
 46:      sys.exit()
 47:  
 48:  # Get the file list and create a list of (filedate, filename) tuples.
 49:  if filtrate:
 50:    filenames = sys.stdin.read().split()
 51:  filedates = []
 52:  for f in filenames:
 53:    info = pyexiv2.ImageMetadata(f)
 54:    try:                              # skip over files without EXIF info
 55:      info.read()
 56:      d = info['Exif.Photo.DateTimeOriginal'].raw_value
 57:      filedates.append((d, f))
 58:    except KeyError:
 59:      continue
 60:  
 61:  # Don't bother going on if there aren't any files in the list.
 62:  if len(filedates) == 0:
 63:    sys.exit()
 64:  
 65:  # Some background info:
 66:  # DateTimeOriginal is a string in the form 'yyyy:mm:dd hh:mm:ss'.
 67:  # All the numbers use leading zeros if necessary; the hours use a
 68:  # 24-hour clock format. An alphabetic sort on strings in this form
 69:  # also sorts on date and time. Running split() on this string yields
 70:  # a (date, time) tuple.
 71:  
 72:  # Sort the files according to date and time taken.   
 73:  filedates.sort()
 74:  
 75:  # Create a list of (oldfilename, newfilename) tuples.
 76:  newnames = []
 77:  i = 0                               # initialize the sequence number
 78:  prev = filedates[0][0].split()[0]   # initialize the date 
 79:  for date, old in filedates:
 80:    current = date.split()[0]
 81:    if current == prev:               # still on same date
 82:      i += 1
 83:    else:                             # starting new date
 84:      i = 1
 85:      prev = current
 86:    path = os.path.dirname(old)
 87:    new = os.path.join(path,
 88:      "%s%s-%03d.jpg" % (current.replace(':', ''), suffix, i))
 89:    if new in filenames:
 90:      sys.stderr.write("Error: %s is already being used\n" % new)
 91:      sys.exit(1)
 92:    else:
 93:      newnames.append((old, new))
 94:  
 95:  # Rename the files or print out how they would be renamed.
 96:  if test:
 97:    for o,n in newnames:
 98:      print "%s -> %s" % (o, n)
 99:  else:
100:    for o,n in newnames:
101:      os.rename(o,n)

One of the nice things about using getopt is that it doesn’t cobble together a usage message from help strings scattered across the code, as the other option-parsing libraries do, so it encourages you to put together a nice, monolithic usage message. There’s no advantage in this to the user, but there’s a big advantage to the programmer in seeing the code’s raison d’être together in one spot.

The command-line handling is done in Lines 27-46 and it’s pretty obvious what’s going on. The most important things to know are:

The new EXIF library is called in Lines 53, 55, and 56. If you look back to an earlier version of canonize, you’ll see that these lines are nearly one-for-one replacements of lines that called the previous library. That’s why it was so easy to make the switch.
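Once raw_value has delivered the DateTimeOriginal string, everything else is plain string handling, with no EXIF library involved. Here’s a rough sketch of that part pulled out into a function so it can be tried without any photos. The function name plan_renames and the sample data are hypothetical; the logic is my restatement of the sorting and numbering steps in the listing, not a copy of them.

```python
import os.path

def plan_renames(filedates, suffix=''):
    """Return (old, new) pairs for a list of (DateTimeOriginal, filename) tuples.

    Hypothetical helper: sorts by timestamp and restarts the sequence
    number at 1 each time the date changes.
    """
    pairs = []
    prev = None
    count = 0
    for stamp, old in sorted(filedates):
        day = stamp.split()[0]            # 'yyyy:mm:dd'
        if day == prev:                   # still on same date
            count += 1
        else:                             # starting new date
            count = 1
            prev = day
        name = "%s%s-%03d.jpg" % (day.replace(':', ''), suffix, count)
        pairs.append((old, os.path.join(os.path.dirname(old), name)))
    return pairs

# Made-up timestamps in the 'yyyy:mm:dd hh:mm:ss' form, deliberately out of order.
sample = [
    ('2010:06:13 09:15:00', 'IMG_0003.JPG'),
    ('2010:06:12 14:03:55', 'IMG_0001.JPG'),
    ('2010:06:12 16:40:10', 'IMG_0002.JPG'),
]
for old, new in plan_renames(sample, 'zoo'):
    print("%s -> %s" % (old, new))
```

With the sample above, the two June 12 photos come out as 20100612zoo-001.jpg and 20100612zoo-002.jpg, and the June 13 photo starts over at 20100613zoo-001.jpg.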

Now that I’m comfortable with pyexiv2, I’ll start putting it into scripts that do more interesting things. My first thoughts are