Not dead - finished

Last week, I wanted to run my photo renaming script, canonize, on my (relatively) new MacBook Air. I had the script itself installed, but I’d forgotten to install a critical Python library for reading EXIF data.1 As I was hunting it down (easy_install couldn’t find it), I was reminded that many people have uninformed prejudices against old software. There’s a tendency to think that a program or library that hasn’t been updated in a long time is worthless and has been abandoned. Often, though, it’s because the code is done.

The library I needed was pyexif by Martin Blais and Chris Stromberger. It’s hosted at SourceForge and hasn’t been updated since 2001. But it does exactly what I need, so I downloaded and installed it. Canonize works perfectly with it.

The pyexif home page mentions another Python EXIF library. While Googling to see if that’s been improved since the last time I looked at it (it was really slow compared to pyexif), I came across this blog post from 2006 in which the author complains about both libraries:

Oh great, the last comment in big red letters (from 2002) said that they hadn’t done anything on the [pyexif] library for a long time, f***. And I should go and check out “Gene Cash’s library”, which I finally found here. To judge by the comments this one has also been last updated in 2004. Oh man, I think I am going back to PHP, there will surely be up-to-date libraries.

This is stupid. The issue is not how old the software is, it’s whether it does the job. I have several scripts sitting on my computer that I haven’t updated in a decade or more. But I know they work because I use them at least a few times a month.

In fact, older software should be more reliable than newer software. The reliability of mechanical systems is typically governed by what’s called, for obvious reasons, the bathtub curve.

[Figure: Bathtub curve for mechanical systems]

When a mechanical system is brand new, inherent errors in design or manufacturing lead to high early failure rates—this is the wear in phase. When a system is really old, corrosion, fatigue, and wear take their toll, and the failure rate becomes high again—this is the wear out phase. In between, the failure rate is generally low and steady; failures in this phase tend to come from random overloading or misuse.2

Software, though, doesn’t wear out. Once the major bugs get shaken out, it should be at a steady state reliability level indefinitely. Now it’s true that old software can be made completely unusable if an upgrade to the underlying operating system changes some vital library. GUI applications are particularly susceptible to this because OS vendors like to fiddle with things to make their products look new and exciting, but console-based programs—like canonize and the pyexif library—can last for a very long time.
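To make the contrast concrete, here is a rough sketch of a bathtub-shaped failure rate and of what happens when the wear out term is dropped. It models the failure rate as the sum of a decreasing Weibull hazard (wear in), a constant (random failures), and an increasing Weibull hazard (wear out). That decomposition is a common textbook simplification, not something taken from the figure above, and the function names and every numeric parameter are made up purely for illustration.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def failure_rate(t, wears_out=True):
    """Illustrative failure rate at age t > 0 (arbitrary time units).

    All parameters below are invented for the sketch, not measured values.
    """
    wear_in = weibull_hazard(t, shape=0.5, scale=10.0)    # shape < 1: falls with age
    random_failures = 0.01                                # constant background rate
    wear_out = weibull_hazard(t, shape=5.0, scale=100.0)  # shape > 1: rises with age
    if not wears_out:
        wear_out = 0.0  # the "software" case: nothing corrodes or fatigues
    return wear_in + random_failures + wear_out


if __name__ == "__main__":
    for age in (1, 40, 150):
        print(age, round(failure_rate(age), 4), round(failure_rate(age, wears_out=False), 4))
```

With the wear out term included, the rate is high early, low in the middle, and high again late. Without it, the rate only falls as the early bugs are shaken out, which is the sense in which old software should be at least as reliable as new software.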

When I was writing the second installment of my “Text files and me” series, I went searching for links to some of the tools I used back in the ’90s. James Clark’s NSGMLS and David Megginson’s SGMLS Perl module haven’t been updated in about 15 years. They’re still fine.

The best example of an old program that no one would call dead is TeX. Don Knuth declared it feature-complete over 20 years ago and—because he’s Don Knuth—hasn’t had to do many bug fixes since.

Kernighan and Pike’s The Practice of Programming, which came out about 10 years ago, was written in troff, some parts of which would have been 25 years old. I believe Kernighan still uses awk, which is now over 30 years old.

Two morals to this story, I guess:

  1. Don’t ignore old software; if it still runs at all, it’s likely to be quite reliable.
  2. Do a good job when writing your own programs. You may find yourself still using them when you’re old and gray.

  1. Canonize renames photos based on the day they were taken, turning filenames like IMG_2345.JPG into 20110309-001.jpg. It does this by reading the file’s EXIF metadata to get the date and time when the picture was taken. The part before the dash is the date in yyyymmdd format, and the part after is the photo index for that date. The advantages of this format are that it tells you immediately when you took the photo and that alphabetical order is the same as chronological order.

  2. This also applies to people and animals. The wear in phase is infant mortality, and the wear out phase is old age.
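The naming scheme from the first footnote can be sketched in a few lines of Python. This is not the actual canonize script, and it deliberately skips the EXIF step: it assumes the date and time of each photo have already been read by some other means and are passed in as datetime objects. The function name canonical_names is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import PurePath


def canonical_names(photos):
    """Map each original filename to a yyyymmdd-NNN name.

    `photos` is an iterable of (filename, taken) pairs, where `taken` is a
    datetime assumed to have been extracted from the EXIF data elsewhere.
    """
    counters = defaultdict(int)  # photos seen so far on each date
    renamed = {}
    # Process photos in the order they were taken so indexes are chronological.
    for filename, taken in sorted(photos, key=lambda pair: pair[1]):
        day = taken.strftime("%Y%m%d")
        counters[day] += 1
        extension = PurePath(filename).suffix.lower()
        renamed[filename] = f"{day}-{counters[day]:03d}{extension}"
    return renamed


if __name__ == "__main__":
    sample = [
        ("IMG_2346.JPG", datetime(2011, 3, 9, 14, 5)),
        ("IMG_2345.JPG", datetime(2011, 3, 9, 9, 30)),
        ("IMG_2350.JPG", datetime(2011, 3, 10, 8, 0)),
    ]
    for old, new in canonical_names(sample).items():
        print(old, "->", new)
```

Because the date comes first and the index is zero-padded, sorting the new names alphabetically gives the same order as sorting by the time the photos were taken.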