Better option parsing in Python (maybe)

I’ve bitched in the past about Python’s option parsing modules. It’s not just the modules themselves I dislike, it’s the Python Standard Library’s back-and-forth support for different modules. Perhaps it’s unsurprising that the most attractive option parsing module comes from outside the Standard Library.

Option parsing modules are collections of functions or methods that help you write command-line programs that conform to the usual Unix pattern: the command itself, followed by optional switches designated by leading dashes, followed by one or more arguments. The ncdc script I wrote about a couple of days ago is an example. Its usage message is

usage: ncdc [options] date
Return NCDC weather report for the given date.
Options:
  -a     : return ASCII file instead of HTML
  -d     : return daily summary instead of hourly reports
  -m     : entire month, not just a single day
  -p     : precipitation (hourly only, overrides -d)
  -s STA : the station abbreviation
           O'Hare     ORD (default)
           Midway     MDW
           Palwaukee  PWK
           Aurora     ARR
           Waukegan   UGN
           Lewis      LOT
           DuPage     DPA
           Lansing    IGQ
           Joliet     JOT
           Kankakee   IKK
           Gary       GYY
  -h     : print this message

which lays out all of the options for changing the behavior of the script.

Instead of having to write your own code to tediously work through the sys.argv list and account for every permutation of the switches (including combinations like -adm), you import an option parsing module and use its functions to do that work for you. The problem is that in Python 2.7, there are three such modules.

There’s getopt, which is the simplest and least powerful. It’s based on the C function and shell command of the same name and provides a sort of bare-bones approach. I tend to use getopt in my scripts because it’s the easiest to remember, it allows me to write my usage message the way I want, and it doesn’t force me into a coding style I find convoluted. The downside is that parsing with getopt often takes up many lines, even when you don’t have that many options. Here’s an example of getopt from the Python docs:

python:
import getopt, sys

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print(str(err)) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    output = None
    verbose = False
    for o, a in opts:
        if o == "-v":
            verbose = True
        elif o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-o", "--output"):
            output = a
        else:
            assert False, "unhandled option"
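
To tie this back to ncdc, here is a minimal sketch of how getopt deals with a clustered switch like -adm. The option string 'adhmps:' is an assumption built from the usage message above, not necessarily what the actual script uses, and the argument list is typed in by hand rather than taken from sys.argv.

```python
import getopt

# Hypothetical command line: ncdc -adm -s MDW 12/25/2014
argv = ['-adm', '-s', 'MDW', '12/25/2014']

# In the option string, "s:" means -s takes a value; the other letters
# are plain on/off switches. (Assumed from the usage message.)
opts, args = getopt.getopt(argv, 'adhmps:')

# getopt has already split the cluster -adm into three separate switches.
flags = set(o for o, a in opts if o != '-s')
station = dict(opts).get('-s', 'ORD')  # ORD is the default station

print(sorted(flags))  # ['-a', '-d', '-m']
print(station)        # MDW
print(args)           # ['12/25/2014']
```

The permutations are handled for you, but turning the list of (switch, value) pairs into something the rest of the script can use is still your job, which is where the line count comes from.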

Then there’s optparse, which was the hot thing several years ago. Back in the Python 2.6 days, optparse was the recommended library, and I wrote a few scripts with it, despite finding it difficult. Yes, it required fewer lines of code than getopt to do the parsing, but the lines tended to be long, and I had a hard time remembering all the arguments to the add_option method. Here’s an example of optparse from the docs:

python:
from optparse import OptionParser
[...]
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose", default=True,
                  help="don't print status messages to stdout")

(options, args) = parser.parse_args()

What’s supposed to be cool about optparse is how it automatically puts the help arguments together into a usage message. It is cool, but I could never wrap my mind around all that action, dest, metavar stuff.
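
For what it's worth, here is a small sketch of what those keywords do, using the same two options as the docs example. The argument list (including the file name out.txt) is made up for illustration and is passed to parse_args directly instead of coming from sys.argv.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose", default=True,
                  help="don't print status messages to stdout")

# Hypothetical command line: prog -f out.txt -q leftover
(options, args) = parser.parse_args(["-f", "out.txt", "-q", "leftover"])

# dest names the attribute: the value given to -f lands in options.filename.
print(options.filename)  # out.txt
# action="store_false" means -q flips verbose from its default of True.
print(options.verbose)   # False
# Whatever isn't a switch or a switch's value ends up in args.
print(args)              # ['leftover']
# metavar only matters in the generated help, where it stands in for
# the value that -f expects.
```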

In some ways, I’m glad I didn’t spend a lot of time getting comfortable with optparse, because with Python 2.7, it was deprecated. Now we’re all supposed to use argparse. But to me, argparse feels an awful lot like optparse. Here’s an example from the docs:

python:
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                   help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                   const=sum, default=max,
                   help='sum the integers (default: find the max)')

args = parser.parse_args()
print(args.accumulate(args.integers))

You will not be surprised that I found this no easier to use than optparse. You will also not be surprised that I was annoyed to learn that the optparse code I felt compelled to write in 2.6 wasn’t going to survive another Python version bump. This is when I decided to go back to getopt—it’s too traditional to get deprecated.

This morning, though, my RSS feed had this post from Rob Wells, describing how option parsing in Python programs in general (and in my ncdc script in particular) could be streamlined by using Vladimir Keleshev’s docopt module.

I won’t recapitulate Rob’s post; you should just go read it. But I will say that what’s very appealing about docopt is that it turns optparse and argparse on their heads. Instead of writing code that generates a usage message, you write a usage message that docopt turns into code. Well, not code exactly, but a dictionary of all the switches and arguments that can be easily incorporated into code. For example, if I had used docopt in my ncdc code, and run the script like this,

ncdc -da -s MDW 12/25/2014

the parsing phase would end with this dictionary:

python:
{'-a': True,
 '-d': True,
 '-h': False,
 '-m': False,
 '-p': False,
 '-s': 'MDW',
 'DATE': '12/25/2014'}

You can imagine how straightforward it would be to fit this dictionary into the subsequent logic of the script.
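
As a rough sketch of that fit, the function below turns the dictionary into a handful of settings. The function name, the setting names, and the fallback to ORD when -s is missing are all assumptions made for illustration; the only things taken from the source are the dictionary itself and the meanings of the switches in the usage message. No docopt import is needed here because the dictionary is typed in by hand.

```python
# The dictionary from the parsing phase, copied from the example above.
parsed = {'-a': True,
          '-d': True,
          '-h': False,
          '-m': False,
          '-p': False,
          '-s': 'MDW',
          'DATE': '12/25/2014'}

def settings(parsed):
    """Translate switches into (hypothetical) settings for the script."""
    # Per the usage message, -p overrides -d.
    if parsed['-p']:
        report = 'precipitation'
    elif parsed['-d']:
        report = 'daily'
    else:
        report = 'hourly'
    return {'format': 'ascii' if parsed['-a'] else 'html',
            'report': report,
            'span': 'month' if parsed['-m'] else 'day',
            'station': parsed['-s'] or 'ORD',  # assumed fallback to the default
            'date': parsed['DATE']}

result = settings(parsed)
print(result['format'])   # ascii
print(result['report'])   # daily
print(result['station'])  # MDW
```

There's no loop over switches and no chain of elif clauses to match them; every switch is already sitting in the dictionary under its own name.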

The usefulness of docopt depends on how easy it is to follow its rules for writing the usage message. Although the rules are relatively flexible, you can’t just format your usage message any way you please. Fortunately, the rules aren’t too different from my usual practice (see how little Rob had to change to get a docopt-compliant message), so I think I could use it without the kind of mental gymnastics I had to use to work with optparse.