Better option parsing in Python (maybe)
June 30, 2015 at 11:52 PM by Dr. Drang
I’ve bitched in the past about Python’s option parsing modules. It’s not just the modules themselves I dislike, it’s the Python Standard Library’s back-and-forth support for different modules. Perhaps it’s unsurprising that the most attractive option parsing module comes from outside the Standard Library.
Option parsing modules are collections of functions or methods that help you write command-line programs that conform to the usual Unix pattern: the command itself, followed by one or more optional switches designated by leading dashes, followed by one or more arguments. The ncdc script I wrote about a couple of days ago is an example. The usage message is
usage: ncdc [options] date
Return NCDC weather report for the given date.
Options:
  -a     : return ASCII file instead of HTML
  -d     : return daily summary instead of hourly reports
  -m     : entire month, not just a single day
  -p     : precipitation (hourly only, overrides -d)
  -s STA : the station abbreviation
             O'Hare     ORD (default)
             Midway     MDW
             Palwaukee  PWK
             Aurora     ARR
             Waukegan   UGN
             Lewis      LOT
             DuPage     DPA
             Lansing    IGQ
             Joliet     JOT
             Kankakee   IKK
             Gary       GYY
  -h     : print this message
which lays out all of the options for changing the behavior of the script.
Instead of having to write your own code to tediously work through the sys.argv list and account for every permutation of the switches (including combinations like -adm), you import an option parsing module and use its functions to do that work for you. The problem is that in Python 2.7, there are three such modules.
There’s getopt, which is the simplest and least powerful. It’s based on the C and shell functions of the same name and provides a sort of bare-bones approach. I tend to use getopt in my scripts because it’s the easiest to remember, it allows me to write my usage message the way I want, and it doesn’t force me into a coding style I find convoluted. The downside is that parsing with getopt often takes up many lines, even when you don’t have that many options. Here’s an example of getopt from the Python docs:
python:
import getopt, sys

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print str(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    output = None
    verbose = False
    for o, a in opts:
        if o == "-v":
            verbose = True
        elif o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-o", "--output"):
            output = a
        else:
            assert False, "unhandled option"
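To make the line-count complaint concrete, here is a rough sketch of what getopt parsing for the ncdc switches listed above might look like. It is not the actual ncdc code: the function name parse_ncdc and the keys of the settings dictionary are made up for illustration, and the argument list is passed in explicitly instead of being read from sys.argv.

```python
import getopt

def parse_ncdc(argv):
    """Parse ncdc-style switches with getopt; return (settings, args).

    Hypothetical helper: the name and the settings keys are assumptions,
    not taken from the real ncdc script.
    """
    # In the option string, "s:" means -s takes a value (the station);
    # every other letter is a plain on/off switch.
    opts, args = getopt.getopt(argv, "adhmps:")
    settings = {"ascii": False, "daily": False, "month": False,
                "precip": False, "station": "ORD", "help": False}
    for o, a in opts:
        if o == "-a":
            settings["ascii"] = True
        elif o == "-d":
            settings["daily"] = True
        elif o == "-m":
            settings["month"] = True
        elif o == "-p":
            settings["precip"] = True
        elif o == "-s":
            settings["station"] = a
        elif o == "-h":
            settings["help"] = True
    return settings, args

# getopt splits the combined switch -da into -d and -a on its own.
settings, args = parse_ncdc(["-da", "-s", "MDW", "12/25/2014"])
print(settings)
print(args)
```

Six switches cost about a dozen lines of if/elif, which is the trade-off described above: nothing to memorize, but a lot of typing.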
Then there’s optparse, which was the hot thing several years ago. Back in the Python 2.6 days, optparse was the recommended library, and I wrote a few scripts with it, despite finding it difficult. Yes, it required fewer lines of code than getopt to do the parsing, but the lines tended to be long, and I had a hard time remembering all the arguments to the add_option method. Here’s an example of optparse from the docs:
python:
from optparse import OptionParser
[...]
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose", default=True,
                  help="don't print status messages to stdout")
(options, args) = parser.parse_args()
What’s supposed to be cool about optparse is how it automatically puts the help arguments together into a usage message. It is cool, but I could never wrap my mind around all that action, dest, metavar stuff.
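For what it’s worth, here is a small sketch of what those three keywords do in the docs example. The only changes are assumptions made for illustration: the parser is given an explicit argument list (with a made-up file name, out.txt) instead of reading sys.argv, and the results are printed.

```python
from optparse import OptionParser

parser = OptionParser()
# dest names the attribute the value is stored under (options.filename);
# metavar only changes how the value is shown in the generated help text.
parser.add_option("-f", "--file", dest="filename",
                  help="write report to FILE", metavar="FILE")
# action="store_false" stores False in options.verbose when -q is given;
# default=True is what you get when it is not.
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose", default=True,
                  help="don't print status messages to stdout")

# Parse an explicit list (hypothetical arguments) instead of sys.argv[1:].
options, args = parser.parse_args(["-q", "-f", "out.txt", "extra"])
print(options.filename)   # value that followed -f
print(options.verbose)    # False, because -q was given
print(args)               # leftover positional arguments

# With no switches at all, the defaults apply.
defaults, _ = parser.parse_args([])
print(defaults.filename, defaults.verbose)

# The usage message assembled from the help strings.
print(parser.format_help())
```

So dest decides where a value lands, action decides what gets stored, and metavar is purely cosmetic.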
In some ways, I’m glad I didn’t spend a lot of time getting comfortable with optparse, because with Python 2.7, it was deprecated. Now we’re all supposed to use argparse. But to me, argparse feels an awful lot like optparse. Here’s an example from the docs:
python:
import argparse
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')
args = parser.parse_args()
print args.accumulate(args.integers)
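To see what that keyword soup actually produces, here is the same parser fed explicit argument lists instead of sys.argv. The lists and the variable names with_sum and without_sum are made up for illustration; print is written with parentheses so the sketch also runs under Python 3.

```python
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
# nargs='+' collects one or more positionals; type=int converts each one.
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
# store_const puts const (the sum function) in args.accumulate when --sum
# is present; otherwise args.accumulate is the default (the max function).
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

with_sum = parser.parse_args(['1', '2', '3', '4', '--sum'])
without_sum = parser.parse_args(['1', '2', '3', '4'])

print(with_sum.integers)                              # [1, 2, 3, 4]
print(with_sum.accumulate(with_sum.integers))         # 10
print(without_sum.accumulate(without_sum.integers))   # 4
```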
You will not be surprised that I found this no easier to use than optparse. You will also not be surprised that I was annoyed to learn that the optparse code I felt compelled to write in 2.6 wasn’t going to survive another Python version bump. This is when I decided to go back to getopt; it’s too traditional to get deprecated.
This morning, though, my RSS feed had a post from Rob Wells, describing how option parsing in Python programs in general (and in my ncdc script in particular) could be streamlined by using Vladimir Keleshev’s docopt module.
I won’t recapitulate Rob’s post; you should just go read it. But I will say that what’s very appealing about docopt is that it turns optparse and argparse on their heads. Instead of writing code that generates a usage message, you write a usage message that docopt turns into code. Well, not code exactly, but a dictionary of all the switches and arguments that can be easily incorporated into code. For example, if I had used docopt in my ncdc code, and run the script like this,
ncdc -da -s MDW 12/25/2014
the parsing phase would end with this dictionary:
python:
{'-a': True,
 '-d': True,
 '-h': False,
 '-m': False,
 '-p': False,
 '-s': 'MDW',
 'DATE': '12/25/2014'}
You can imagine how straightforward it would be to fit this dictionary into the subsequent logic of the script.
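As a rough sketch of that fit, the function below takes the dictionary from the example run and boils it down to the handful of settings the rest of the script would need. The function name summarize and the setting names are hypothetical, chosen only for illustration; the rule that -p overrides -d comes from the usage message.

```python
# The dictionary from the example run, written out as a literal so the
# sketch does not depend on docopt being installed.
opts = {'-a': True,
        '-d': True,
        '-h': False,
        '-m': False,
        '-p': False,
        '-s': 'MDW',
        'DATE': '12/25/2014'}

def summarize(opts):
    """Turn the parsed switches into settings (hypothetical names)."""
    # Per the usage message, -p is hourly only and overrides -d.
    if opts['-p']:
        report = 'precipitation'
    elif opts['-d']:
        report = 'daily'
    else:
        report = 'hourly'
    return {
        'format': 'ascii' if opts['-a'] else 'html',
        'report': report,
        'span': 'month' if opts['-m'] else 'day',
        'station': opts['-s'] or 'ORD',   # O'Hare is the stated default
        'date': opts['DATE'],
    }

print(summarize(opts))
```

No loop over option tuples and no if/elif chain just to record which switches were present; the dictionary lookups are the whole interface.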
The usefulness of docopt depends on how easy it is to follow its rules for writing the usage message. Although the rules are relatively flexible, you can’t just format your usage message any way you please. Fortunately, the rules aren’t too different from my usual practice (see how little Rob had to change to get a docopt-compliant message), so I think I could use it without the kind of mental gymnastics I needed to work with optparse.
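Here is a minimal sketch of what a docopt version of the ncdc interface might look like. Treat all of it as an assumption: the usage text is a guess at a docopt-compliant rewrite of the message above (it is not the version from Rob’s post, and the station list is left out for brevity), the function name parse is made up, and docopt itself is a third-party module that has to be installed separately.

```python
# Guess at a docopt-compliant usage message for ncdc (an assumption, not
# the text from Rob's post). docopt expects a "Usage:" section, then an
# "Options:" section with at least two spaces between each switch and
# its description.
USAGE = """Usage:
  ncdc [options] DATE

Return NCDC weather report for the given date.

Options:
  -a      return ASCII file instead of HTML
  -d      return daily summary instead of hourly reports
  -m      entire month, not just a single day
  -p      precipitation (hourly only, overrides -d)
  -s STA  the station abbreviation [default: ORD]
  -h      print this message
"""

def parse(argv):
    """Return docopt's dictionary for an explicit argument list."""
    # Imported here so the usage text can be read and tested even on a
    # machine without docopt (third-party: pip install docopt).
    from docopt import docopt
    return docopt(USAGE, argv=argv)

if __name__ == "__main__":
    import sys
    print(parse(sys.argv[1:]))
```

With docopt installed, calling parse(["-da", "-s", "MDW", "12/25/2014"]) should give a dictionary like the one shown earlier, and the [default: ORD] annotation should fill in '-s' when the switch is omitted.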