August 4, 2007 at 6:33 PM by Dr. Drang
Quite a while ago, I wrote a post about a Perl script that renamed digital photo files according to the date on which they were taken. I've recently rewritten that program in Python, making the logic cleaner, taking out some features that I never used, and adding some features that were missing. I still like the name canonize, so that's what I've named the new version; the old version is still hanging around in my ~/bin directory with the clever name
Here's the listing for the new version. (Changed as of 8/5/07; see the update at the bottom of the post.) I like to think the source code is self-explanatory, mainly because of the long description I gave it in lines 12-19.
```
 1: #!/usr/bin/env python
 2: 
 3: from pyexif import parse
 4: from optparse import OptionParser
 5: import os
 6: import os.path
 7: import sys
 8: 
 9: # Options and help messages.
10: use = '%prog [options] [list of files]'
11: 
12: desc = '''Rename a list of photo files (JPEGs) according to
13: the date on which they were taken. The format for the file
14: name is yyyymmddsss-nnn.jpg, where yyyy is the year, mm is the
15: month number, dd is the day, sss is the optional suffix (which
16: can be any length), and nnn is the (zero-padded) photo number
17: for that day. By default, the original file names are given
18: on the command line; if the -f option is used, the original
19: file names are taken from STDIN.'''
20: 
21: parser = OptionParser(usage = use, description = desc)
22: parser.add_option('-f', '--filter',
23:                   action = 'store_true',
24:                   dest = 'filter',
25:                   help = 'get file names from STDIN instead of the command line')
26: parser.add_option('-s', '--suffix',
27:                   action = 'store', type = 'string',
28:                   dest = 'suffix',
29:                   help = 'suffix string')
30: parser.add_option('-t', '--test',
31:                   action = 'store_true',
32:                   dest = 'test',
33:                   help = "print how the names will change, but don't do it")
34: parser.set_defaults(filter = False, suffix = '', test = False)
35: (options, args) = parser.parse_args()
36: 
37: # Get the file list and create a list of (filedate, filename) tuples.
38: # Also create a list of files that don't have the DateTimeOriginal tag.
39: # These "odd" files may be non-JPEGs, or they may be JPEGs that don't have
40: # the right EXIF data.
41: if options.filter:
42:     filenames = sys.stdin.read().split()
43: else:
44:     filenames = args
45: filedates = []
46: oddfiles = []
47: for f in filenames:
48:     info = parse(f)
49:     try:
50:         d = str(info['DateTimeOriginal'])
51:         filedates.append((d, f))
52:     except KeyError:
53:         oddfiles.append(os.path.basename(f))
54: 
55: # Report the odd files and stop.
56: if len(oddfiles) > 0:
57:     print 'No EXIF dates:'
58:     print ' '.join(oddfiles)
59:     sys.exit()
60: 
61: # Some background info:
62: # DateTimeOriginal is a string in the form 'yyyy:mm:dd hh:mm:ss'.
63: # All the numbers use leading zeros if necessary; the hours use a
64: # 24-hour clock format. An alphabetic sort on strings in this form
65: # also sorts on date and time. Running split() on this string yields
66: # a [date, time] list.
67: 
68: # Sort the files according to date and time taken.
69: filedates.sort()
70: 
71: # Create a list of (oldfilename, newfilename) tuples.
72: newnames = []
73: i = 0                               # initialize the sequence number
74: prev = filedates[0][0].split()[0]   # initialize the date
75: for date, old in filedates:
76:     current = date.split()[0]
77:     if current == prev:             # still on same date
78:         i += 1
79:     else:                           # starting new date
80:         i = 1
81:         prev = current
82:     dir = os.path.dirname(old)
83:     new = os.path.join(dir,
84:             '%s%s-%03d.jpg' % (current.replace(':', ''), options.suffix, i))
85:     if new in filenames:
86:         print 'Name conflict:'
87:         print "'%s' is already being used." % os.path.basename(new)
88:         sys.exit()
89:     else:
90:         newnames.append((old, new))
91: 
92: # Rename the files or print out how they would be renamed.
93: if options.test:
94:     for o, n in newnames:
95:         print '%s -> %s' % (o, n)
96: else:
97:     for o, n in newnames:
98:         os.rename(o, n)
```
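The comment at lines 61-66 leans on the claim that sorting DateTimeOriginal strings alphabetically also sorts them chronologically. A quick way to convince yourself is to compare a plain string sort with a sort keyed on parsed `datetime` values. The timestamps below are made up for illustration.

```python
from datetime import datetime

# Made-up DateTimeOriginal values in the EXIF 'yyyy:mm:dd hh:mm:ss' form.
stamps = [
    '2007:08:04 18:33:00',
    '2007:08:04 09:05:12',
    '2006:12:31 23:59:59',
    '2007:01:15 07:00:00',
]

def parsed(stamp):
    """Turn an EXIF timestamp into a datetime for a true chronological key."""
    return datetime.strptime(stamp, '%Y:%m:%d %H:%M:%S')

# Every field is zero-padded and ordered from most to least significant,
# so a plain string sort gives the same order as a datetime sort.
by_string = sorted(stamps)
by_time = sorted(stamps, key=parsed)
print(by_string == by_time)

# split() separates the date from the time, which is how the script
# gets the day used for numbering.
print(stamps[0].split()[0])
```

The same reasoning is why the script never has to convert the strings to dates at all.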
As you can see on line 3, it uses a module called pyexif. This is a pure Python module that you can download from its SourceForge site. The file is actually called exif.py, but because I already had a file named EXIF.py in my library, and the Mac's file system is case-insensitive, I changed the name to match the SourceForge project name.
I had an EXIF.py in my library because I had already downloaded and tested a similarly themed module from this site. In fact, the pyexif home page suggests that EXIF.py is an improvement. It may be an improvement in some ways, but speed isn't one of them. A version of canonize that used EXIF.py took over five times as long to rename a typical (for me) list of files. On my G4 iBook, for example, renaming a set of about 200 files takes just a few seconds with the pyexif module, but almost a minute with EXIF.py.
What I removed from the old version of canonize was the ability to start the file numbering at something other than 1. When I first wrote the program, I thought that would be very important, but I never used it. What I added is the ability to put a suffix after the date string. I've found myself cataloging photos taken on the same day by several photographers, and appending the photographers' initials has made the cataloging much easier.
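The naming rule from the description string (date, optional suffix, then a per-day sequence number) can be pulled out into a small function. This is a sketch, not canonize's actual code: `new_names` is a hypothetical helper that takes (timestamp, filename) pairs like the script's `filedates` list, and it uses `itertools.groupby` in place of the explicit counter in lines 73-81. The file names and the `drd` suffix are made up.

```python
import os.path
from itertools import groupby

def new_names(filedates, suffix=''):
    """Hypothetical helper: map each old name to yyyymmdd<suffix>-nnn.jpg.

    filedates is a list of (DateTimeOriginal string, filename) tuples.
    """
    pairs = []
    # Sorting the tuples puts them in chronological order.
    ordered = sorted(filedates)
    # Group by the date part so the counter restarts at 1 for each day.
    for day, group in groupby(ordered, key=lambda t: t[0].split()[0]):
        for n, (_, old) in enumerate(group, start=1):
            base = '%s%s-%03d.jpg' % (day.replace(':', ''), suffix, n)
            pairs.append((old, os.path.join(os.path.dirname(old), base)))
    return pairs

photos = [
    ('2007:08:04 18:33:00', 'pics/IMG_0412.JPG'),
    ('2007:08:04 09:05:12', 'pics/IMG_0398.JPG'),
    ('2007:08:05 10:00:00', 'pics/IMG_0420.JPG'),
]
for old, new in new_names(photos, suffix='drd'):
    print('%s -> %s' % (old, new))
```

`groupby` only merges adjacent items, so the sort has to come first; that's the same reason the script sorts `filedates` before numbering.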
I also added the ability to use canonize as a filter, so I can pipe the list of files to it rather than putting them on the command line. I haven't used this yet, but I think it will be helpful if I incorporate this program into an Automator workflow.
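One thing to keep in mind about filter mode: line 42 splits STDIN on any whitespace, so a file name containing a space gets broken into several bogus names. If the names arrive one per line, splitting on newlines instead avoids that. The sketch below uses `io.StringIO` as a stand-in for `sys.stdin` and a made-up file list; the line-based version is an alternative, not what canonize does.

```python
import io

# Stand-in for sys.stdin: one file name per line, one of them with spaces.
fake_stdin = io.StringIO('IMG_0398.JPG\nBeach Day 01.JPG\nIMG_0412.JPG\n')
text = fake_stdin.read()

# What line 42 does: split on any run of whitespace.
by_whitespace = text.split()

# Alternative: one name per line, skipping blank lines.
by_line = [name for name in text.splitlines() if name.strip()]

print(by_whitespace)
print(by_line)
```

For names without spaces, which is what a camera produces, the two approaches give the same list.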
Update 8/5/07

By using a combination of Platypus and CocoaDialog, I've created a GUI application that renames the files when you drop them on it. It works pretty well (I'll be writing a post about it soon), but while testing it I found that canonize needed to be a bit more strict.
As written in the original version of this post, canonize would silently ignore files passed to it that did not have a DateTimeOriginal EXIF tag. This was in keeping with Larry Wall's "don't complain, do something" philosophy (I can't find a link, but I'm pretty sure it's in Programming Perl). I figured that any files without a DateTimeOriginal tag would not be digital photo files (even if they were JPEGs), and no harm would be done by skipping over them and renaming all the others. It turns out that I have several digital photo files that don't have the DateTimeOriginal tag (it appears to have been stripped out when the photos were rotated), and leaving them alone while renaming the others can make a big mess of inconsistent file names. It's a mess that can be recovered from, but it's much better not to make it in the first place.
The changes to canonize are in lines 37-59. The names of any files without the DateTimeOriginal tag are collected in a list. If that list has any items in it, canonize quits and reports the oddball files. No files are renamed, not even the ones that do have the DateTimeOriginal tag.
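Stripped of the option handling, the check in lines 37-59 is a partition followed by an early exit. Here is a minimal sketch of that logic. The `exif` dictionary is a made-up stand-in for what `pyexif.parse` would return for each file, so the example runs without any photos on disk, and `split_by_date` is a hypothetical helper name.

```python
import os.path

# Made-up EXIF data standing in for pyexif.parse(f); the rotated photo
# has lost its DateTimeOriginal tag.
exif = {
    'pics/IMG_0398.JPG': {'DateTimeOriginal': '2007:08:04 09:05:12'},
    'pics/IMG_0412.JPG': {'DateTimeOriginal': '2007:08:04 18:33:00'},
    'pics/IMG_0415.JPG': {'Orientation': 6},
}

def split_by_date(filenames, lookup):
    """Return (filedates, oddfiles), roughly as lines 45-53 build them."""
    filedates, oddfiles = [], []
    for f in filenames:
        try:
            filedates.append((str(lookup[f]['DateTimeOriginal']), f))
        except KeyError:
            oddfiles.append(os.path.basename(f))
    return filedates, oddfiles

filedates, oddfiles = split_by_date(sorted(exif), exif)
if oddfiles:
    # Nothing gets renamed if even one file lacks a date.
    print('No EXIF dates:')
    print(' '.join(oddfiles))
else:
    print('%d files ready to rename' % len(filedates))
```

With the sample data, the two dated files are never touched because the third one fails the check.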
Canonize will not rename any of the files passed to it unless all of them can be renamed. This is safer behavior, and it’s more in keeping with the way name conflicts are handled in lines 85-90.
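The same all-or-nothing rule can be written as a two-phase pattern: build the complete list of (old, new) pairs, check every pair, and only then touch the disk. The sketch below is a hypothetical `rename_all` function, not canonize itself. Besides checking each new name against the input list, as line 85 does, it also checks for an existing file on disk with that name, an extra guard the script doesn't have. It works in a temporary directory with empty placeholder files, so it's safe to try.

```python
import os
import tempfile

def rename_all(pairs, filenames):
    """Hypothetical helper: rename every (old, new) pair, or none of them.

    Phase 1 collects every proposed name that would clobber something.
    Phase 2 renames only if phase 1 found nothing.
    """
    conflicts = [new for _, new in pairs
                 if new in filenames or os.path.exists(new)]
    if conflicts:
        return conflicts              # nothing has been renamed
    for old, new in pairs:
        os.rename(old, new)
    return []

with tempfile.TemporaryDirectory() as d:
    olds = [os.path.join(d, name) for name in ('IMG_0398.JPG', 'IMG_0412.JPG')]
    for path in olds:
        open(path, 'w').close()       # empty placeholder "photos"
    pairs = [(olds[0], os.path.join(d, '20070804-001.jpg')),
             (olds[1], os.path.join(d, '20070804-002.jpg'))]

    # A stray file already holds one of the target names.
    stray = os.path.join(d, '20070804-002.jpg')
    open(stray, 'w').close()
    first = rename_all(pairs, olds)   # reports the conflict
    listing = sorted(os.listdir(d))   # both originals are still there

    # With the stray file gone, the whole batch goes through.
    os.remove(stray)
    second = rename_all(pairs, olds)
    after = sorted(os.listdir(d))

print(first)
print(listing)
print(after)
```

Because nothing is renamed until every check passes, a failure leaves the directory exactly as it was, which is the property that makes the mess described above impossible.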