A double library transplant

With a new EXIF library in place, I rewrote my canonize photo renaming utility to take advantage of it. Canonize was my motivation for finding a new EXIF library in the first place. It’s a command-line program that renames photos based on the date and time they were taken. It does so by reading the EXIF metadata in the photo file and extracting the DateTimeOriginal field. The name is a bad pun on the idea of a canonical filename for the photos and the fact that I use Canon cameras.

Canonize relied on the pyexif library, which works fine but doesn’t allow for writing EXIF data, only reading. Canonize doesn’t need to write EXIF data, but I plan to write other scripts that do, and I want to standardize on a single library for all my EXIF work.

Substituting pyexiv2 methods for pyexif methods was really easy—only 3-4 lines needed to be changed and the changes themselves were obvious. I’ll point them out in a bit. But since I had the hood open, it seemed like a good time to switch out another library, the options parsing library.

Before today, canonize used the optparse library, which was the standard high-level library for handling command-line options in Python 2.6. It was deprecated in 2.7 in favor of the argparse library. You might think I’d upgrade to argparse, but I decided to move instead to the simpler getopt library. It doesn’t have all the bells and whistles of the other libraries, but it’s plenty capable for my elementary needs, and it’s unlikely to be deprecated because it’s written to mimic the venerable getopt() C function.

So here’s the new source code:

  1  #!/usr/bin/env python
  2  
  3  import pyexiv2
  4  import getopt
  5  import os
  6  import os.path
  7  import sys
  8  
  9  # Options and help messages.
 10  usage = """Usage: canonize [options] [list of files]
 11  
 12  Options:
 13    -s sss    optional suffix
 14    -f        get filenames from STDIN instead of command line
 15    -t        show the renaming but don't do it
 16    -h        show this help message
 17  
 18  Rename a list of photo files (JPEGs) according to the date
 19  on which they were taken. The format for the file name is
 20  yyyymmddsss-nnn.jpg, where yyyy is the year, mm is the month
 21  number, dd is the day, sss is the optional suffix (which can
 22  be any length), and nnn is the (zero-padded) photo number
 23  for that day. By default, the original file names are given
 24  on the command line; if the -f option is used, the original
 25  file names are taken from STDIN."""
 26  
 27  # Handle the command line options.
 28  try:
 29    options, filenames = getopt.getopt(sys.argv[1:], 's:fth')
 30  except getopt.GetoptError, err:
 31    print str(err)
 32    sys.exit(2)
 33  
 34  filtrate = False    # default for -f
 35  suffix = ''         # default for -s
 36  test = False        # default for -t
 37  for o, a in options:
 38    if o == '-s':
 39      suffix = a
 40    elif o == '-f':
 41      filtrate = True
 42    elif o == '-t':
 43      test = True
 44    else:
 45      print usage
 46      sys.exit()
 47  
 48  # Get the file list and create a list of (filedate, filename) tuples.
 49  if filtrate:
 50    filenames = sys.stdin.read().split()
 51  filedates = []
 52  for f in filenames:
 53    info = pyexiv2.ImageMetadata(f)
 54    try:                              # skip over files without EXIF info
 55      info.read()
 56      d = info['Exif.Photo.DateTimeOriginal'].raw_value
 57      filedates.append((d, f))
 58    except KeyError:
 59      continue
 60  
 61  # Don't bother going on if there aren't any files in the list.
 62  if len(filedates) == 0:
 63    sys.exit()
 64  
 65  # Some background info:
 66  # DateTimeOriginal is a string in the form 'yyyy:mm:dd hh:mm:ss'.
 67  # All the numbers use leading zeros if necessary; the hours use a
 68  # 24-hour clock format. An alphabetic sort on strings in this form
 69  # also sorts on date and time. Running split() on this string yields
 70  # a (date, time) tuple.
 71  
 72  # Sort the files according to date and time taken.   
 73  filedates.sort()
 74  
 75  # Create a list of (oldfilename, newfilename) tuples.
 76  newnames = []
 77  i = 0                               # initialize the sequence number
 78  prev = filedates[0][0].split()[0]   # initialize the date 
 79  for date, old in filedates:
 80    current = date.split()[0]
 81    if current == prev:               # still on same date
 82      i += 1
 83    else:                             # starting new date
 84      i = 1
 85      prev = current
 86    path = os.path.dirname(old)
 87    new = os.path.join(path,
 88      "%s%s-%03d.jpg" % (current.replace(':', ''), suffix, i))
 89    if new in filenames:
 90      sys.stderr.write("Error: %s is already being used\n" % new)
 91      sys.exit(1)
 92    else:
 93      newnames.append((old, new))
 94  
 95  # Rename the files or print out how they would be renamed.
 96  if test:
 97    for o,n in newnames:
 98      print "%s -> %s" % (o, n)
 99  else:
100    for o,n in newnames:
101      os.rename(o,n)

One of the nice things about using getopt is that it doesn’t cobble together a usage message from help strings scattered across the code, as the other option parsing libraries do, so it encourages you to put together a nice, monolithic usage message. The user gains nothing from this, but the programmer gets to see the code’s raison d’être laid out in one spot.

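The command-line handling is done in Lines 27-46, and it’s pretty obvious what’s going on. The most important things to know are that the colon in the 's:fth' option string tells getopt that -s takes an argument while -f, -t, and -h are simple flags, and that getopt.getopt returns two lists: the (option, argument) pairs it recognized and the leftover arguments, which here are the file names.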
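
To see what getopt hands back, here is a minimal sketch that wraps the same 's:fth' option string in a function. The function name parse_options and the sample arguments are made up for illustration and aren’t part of canonize, and the sketch is written to run under Python 3 rather than the Python 2 of the listing above.

```python
import getopt


def parse_options(argv):
    """Parse canonize-style options from argv (program name already removed).

    Hypothetical helper for illustration only.
    Returns (suffix, filtrate, test, show_help, filenames).
    """
    # In 's:fth' the colon means -s takes an argument; -f, -t, -h are flags.
    options, filenames = getopt.getopt(argv, 's:fth')

    suffix, filtrate, test, show_help = '', False, False, False
    for opt, arg in options:
        if opt == '-s':
            suffix = arg
        elif opt == '-f':
            filtrate = True
        elif opt == '-t':
            test = True
        elif opt == '-h':
            show_help = True
    return suffix, filtrate, test, show_help, filenames


if __name__ == '__main__':
    # Made-up command line: canonize -s xmas -t a.jpg b.jpg
    print(parse_options(['-s', 'xmas', '-t', 'a.jpg', 'b.jpg']))
```

Run on its own, this prints ('xmas', False, True, False, ['a.jpg', 'b.jpg']). An unrecognized option such as -x raises getopt.GetoptError, which is the exception the try block in Lines 28-32 catches.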

The new EXIF library is called in Lines 53, 55, and 56. If you look back to an earlier version of canonize, you’ll see that these lines are nearly one-for-one replacements of lines that called the previous library. That’s why it was so easy to make the switch.
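
Because the EXIF calls only supply the DateTimeOriginal string, the rest of the renaming logic can be tried out without any photos or any EXIF library. The following is a rough sketch of that logic, not a replacement for the listing: the helper name canonical_names and the sample dates and file names are hypothetical, and it’s written for Python 3.

```python
import os.path


def canonical_names(filedates, suffix=''):
    """Map (datetime_string, old_path) pairs to (old_path, new_path) pairs.

    Hypothetical helper for illustration only. Each datetime_string is
    expected in the EXIF form 'yyyy:mm:dd hh:mm:ss'.
    """
    renames = []
    count = 0
    prev = None
    # Sorting the strings also sorts chronologically, because every field
    # is zero-padded and the fields run from year down to second.
    for stamp, old in sorted(filedates):
        day = stamp.split()[0]                    # 'yyyy:mm:dd'
        count = count + 1 if day == prev else 1   # restart numbering each day
        prev = day
        name = "%s%s-%03d.jpg" % (day.replace(':', ''), suffix, count)
        renames.append((old, os.path.join(os.path.dirname(old), name)))
    return renames


if __name__ == '__main__':
    # Made-up data standing in for what the EXIF reads would return.
    sample = [
        ('2010:12:25 09:30:00', 'IMG_0002.JPG'),
        ('2010:12:24 18:05:12', 'IMG_0001.JPG'),
        ('2010:12:25 14:00:41', 'IMG_0003.JPG'),
    ]
    for old, new in canonical_names(sample, 'xmas'):
        print("%s -> %s" % (old, new))
```

With the sample data, the three files map to 20101224xmas-001.jpg, 20101225xmas-001.jpg, and 20101225xmas-002.jpg, in that order, which is the same kind of output the -t option shows.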

Now that I’m comfortable with pyexiv2, I’ll start putting it into scripts that do more interesting things. My first thoughts are