Photo file renaming, again

Quite a while ago, I wrote a post about a Perl script that renamed digital photo files according to the date on which they were taken. I’ve recently rewritten that program in Python, making the logic cleaner, taking out some features that I never used, and adding some features that were missing. I still like the name canonize, so that’s what I’ve named the new version; the old version is still hanging around in my ~/bin directory with the clever name old-canonize.

Here’s the listing for the new version. (Changed as of 8/5/07. See the update at the bottom of the post.) I like to think the source code is self-explanatory, mainly because of the long description I gave it in lines 12-19.

 1:  #!/usr/bin/env python
 2:  
 3:  from pyexif import parse
 4:  from optparse import OptionParser
 5:  import os
 6:  import os.path
 7:  import sys
 8:  
 9:  # Options and help messages.
10:  use = '%prog [options] [list of files]'
11:  
12:  desc = '''Rename a list of photo files (JPEGs) according to
13:  the date on which they were taken. The format for the file
14:  name is yyyymmddsss-nnn.jpg, where yyyy is the year, mm is the
15:  month number, dd is the day, sss is the optional suffix (which
16:  can be any length), and nnn is the (zero-padded) photo number
17:  for that day. By default, the original file names are given
18:  on the command line; if the -f option is used, the original
19:  file names are taken from STDIN.'''
20:  
21:  parser = OptionParser(usage = use, description = desc)
22:  parser.add_option('-f', '--filter',
23:    action = 'store_true',
24:    dest = 'filter',
25:    help = 'get file names from STDIN instead of the command line')
26:  parser.add_option('-s', '--suffix',
27:    action = 'store', type='string',
28:    dest='suffix',
29:    help = 'suffix string' )
30:  parser.add_option('-t', '--test',
31:    action = 'store_true',
32:    dest = 'test',
33:    help = "print how the names will change, but don't do it")
34:  parser.set_defaults(filter = False, suffix = '', test = False)
35:  (options, args) = parser.parse_args()
36:  
37:  # Get the file list and create a list of (filedate, filename) tuples.
38:  # Also create a list of files that don't have the DateTimeOriginal tag.
39:  # These "odd" files may be non-JPEGs, or they may be JPEGs that don't have
40:  # the right EXIF data.
41:  if options.filter:
42:    filenames = sys.stdin.read().split()
43:  else:
44:    filenames = args
45:  filedates = []
46:  oddfiles = []
47:  for f in filenames:
48:    info = parse(f)
49:    try:
50:      d = str(info['DateTimeOriginal'])
51:      filedates.append((d, f))
52:    except KeyError:
53:      oddfiles.append(os.path.basename(f))
54:  
55:  # Report the odd files and stop
56:  if len(oddfiles) > 0:
57:    print 'No EXIF dates:'
58:    print '  '.join(oddfiles)
59:    sys.exit()
60:  
61:  # Some background info:
62:  # DateTimeOriginal is a string in the form 'yyyy:mm:dd hh:mm:ss'.
63:  # All the numbers use leading zeros if necessary; the hours use a
64:  # 24-hour clock format. An alphabetic sort on strings in this form
65:  # also sorts on date and time. Running split() on this string yields
66:  # a (date, time) tuple.
67:  
68:  # Sort the files according to date and time taken.   
69:  filedates.sort()
70:  
71:  # Create a list of (oldfilename, newfilename) tuples.
72:  newnames = []
73:  i = 0                               # initialize the sequence number
74:  prev = filedates[0][0].split()[0]   # initialize the date 
75:  for date,old in filedates:
76:    current = date.split()[0]
77:    if current == prev:               # still on same date
78:      i += 1
79:    else:                             # starting new date
80:      i = 1
81:      prev = current
82:    dir = os.path.dirname(old)
83:    new = os.path.join(dir,
84:      '%s%s-%03d.jpg' % (current.replace(':', ''), options.suffix, i))
85:    if new in filenames:
86:      print 'Name conflict:'
87:      print "'%s' is already being used." % os.path.basename(new)
88:      sys.exit()
89:    else:
90:      newnames.append((old, new))
91:  
92:  # Rename the files or print out how they would be renamed.
93:  if options.test:
94:    for o,n in newnames:
95:      print '%s -> %s' % (o, n)
96:  else:
97:    for o,n in newnames:
98:      os.rename(o,n)

As you can see on line 3, it uses a module called pyexif. This is a pure Python module that you can download from its SourceForge site. The file is actually called exif.py, but because I already had a file named EXIF.py in my library, and the Mac doesn’t do case-sensitive file names, I changed the name to reflect the SourceForge project name.

I had an EXIF.py in my library because I had already downloaded and tested a similarly-themed module from this site. In fact, the pyexif home page suggests that EXIF.py is an improvement. It may be an improvement in some ways, but speed isn’t one of them. A version of canonize using EXIF.py took over 5 times as long to rename a typical (for me) list of files. For example, on my G4 iBook, renaming a set of about 200 files takes just a few seconds using the pyexif module, but almost a minute using EXIF.py.

What I removed from the old version of canonize was the ability to have the file numbering start at something other than 1. When I first wrote the program, I thought that would be very important, but I never used it. What I added to the program is the ability to put a suffix after the date string. I’ve found myself cataloging photos taken on the same day by several photographers, and appending the photographers’ initials has made the cataloging much easier.
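To make the naming scheme concrete, here is a minimal sketch of the date-suffix-number logic, pulled out of the script into a standalone function. It is not part of canonize itself: the name plan_renames and the sample paths are made up for illustration, and it is written in Python 3 syntax rather than the Python 2 of the listing above. The input is assumed to be (DateTimeOriginal, path) pairs like the ones the script collects.

```python
import os.path

def plan_renames(filedates, suffix=''):
    """Return (old, new) path pairs for (DateTimeOriginal, path) tuples.

    plan_renames is a hypothetical helper, not a function in canonize.
    """
    renames = []
    prev_day = None
    count = 0
    # 'yyyy:mm:dd hh:mm:ss' strings sort chronologically as plain text.
    for stamp, old in sorted(filedates):
        day = stamp.split()[0]                    # keep 'yyyy:mm:dd'
        count = count + 1 if day == prev_day else 1   # restart each day
        prev_day = day
        name = '%s%s-%03d.jpg' % (day.replace(':', ''), suffix, count)
        renames.append((old, os.path.join(os.path.dirname(old), name)))
    return renames

# Made-up sample data: two photos on one day, one on the day before.
sample = [
    ('2007:08:05 10:00:00', 'pics/b.jpg'),
    ('2007:08:04 09:00:00', 'pics/a.jpg'),
    ('2007:08:05 09:00:00', 'pics/c.jpg'),
]
plan = plan_renames(sample, suffix='dd')
```

With the suffix dd, the three sample files get the base names 20070804dd-001.jpg, 20070805dd-001.jpg, and 20070805dd-002.jpg, in that order. Starting from prev_day = None also means an empty list simply produces an empty plan.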

I also added the ability to use canonize as a filter, so I could pipe the list of files to it rather than putting them on the command line. I haven’t used this so far, but I think it will be helpful if I incorporate this program into an Automator workflow.
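One thing to keep in mind with the filter mode: the listing splits STDIN on any whitespace, so a file name containing a space would be broken into pieces. The sketch below shows one possible alternative, reading one name per line. The function read_names is hypothetical and not in canonize, and the code uses Python 3 syntax; whether an Automator workflow delivers names one per line is an assumption I haven't checked.

```python
import io

def read_names(stream):
    """Return one file name per non-blank line of a text stream.

    read_names is a hypothetical helper, not a function in canonize.
    """
    # Strip only the line ending so spaces inside a name survive.
    return [line.rstrip('\r\n') for line in stream if line.strip()]

# Stand-in for sys.stdin, using made-up file names.
fake_stdin = io.StringIO('IMG 0001.jpg\n\nIMG_0002.jpg\n')
names = read_names(fake_stdin)
```

In the script itself, sys.stdin would be passed in place of the StringIO object.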

Update
By using a combination of Platypus and CocoaDialog, I’ve created a GUI application that renames the files when you drop them on it. It works pretty well--and I’ll be writing a post about it soon--but I found while testing it that canonize needed to be a bit more strict.

As written in the original version of this post, canonize would silently ignore files passed to it that did not have a DateTimeOriginal EXIF tag. This was in keeping with the “don’t complain, do something” philosophy of Larry Wall (can’t find a link, but I’m pretty sure it’s in Programming Perl). I figured that any files that didn’t have a DateTimeOriginal tag would not be digital photo files (even if they were JPEGs), and no harm would be done by skipping over them and renaming all the others. It turns out that I have several digital photo files that don’t have the DateTimeOriginal tag--it appears to have been stripped out when the photos were rotated--and leaving them alone while renaming the others can make a big mess of inconsistent file names. It’s a mess that can be recovered from, but it’s much better not to have the mess in the first place.

The changes to canonize are in lines 37-59. The names of any files without the DateTimeOriginal tag are collected in a list. If that list has any items in it, canonize quits and reports the oddball files. No files are renamed, not even the ones that do have the DateTimeOriginal tag. Canonize will not rename any of the files passed to it unless all of them can be renamed. This is safer behavior, and it’s more in keeping with the way name conflicts are handled in lines 85-90.
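The all-or-nothing rule can be sketched on its own, separate from the EXIF parsing and the actual renaming. This is an illustration in Python 3 syntax, not the code in the listing: rename_all_or_nothing is a made-up name, and the EXIF data is a hand-built dictionary standing in for whatever the parser returns.

```python
def rename_all_or_nothing(filenames, exif_lookup, rename):
    """Call rename(name) for every file, but only if all have a date.

    Returns the files missing DateTimeOriginal; an empty list means
    the renames were carried out. Hypothetical helper, not in canonize.
    """
    odd = [f for f in filenames
           if 'DateTimeOriginal' not in exif_lookup(f)]
    if not odd:
        for f in filenames:
            rename(f)
    return odd

# Made-up EXIF data standing in for a real parser.
fake_exif = {
    'a.jpg': {'DateTimeOriginal': '2007:08:05 10:00:00'},
    'rotated.jpg': {},   # tag stripped, as with my rotated photos
}
renamed = []
odd = rename_all_or_nothing(['a.jpg', 'rotated.jpg'],
                            fake_exif.get, renamed.append)
```

Here odd comes back as the one-item list containing rotated.jpg, and renamed stays empty, because a single oddball file blocks the whole batch.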
