The Matplotlib documentation problem

I was on Myke Hurley’s CMD+Space podcast this week, which was a lot of fun, except for the parts where I felt myself sliding into incoherency. In any event, if you want to hear about my addiction to British TV quiz shows on YouTube, give it a listen.

I mentioned during the interview that production here at the ol’ ANIAT factory has dropped recently, primarily due to an uptick in work at my real job. One of the things that’s been eating up my time the past couple of weeks is a long series of mechanical tests and the analysis of their results. In the process, I’ve been generating lots of plots—hundreds if you count each subplot in a small multiples set—and I’ve learned a few things about Matplotlib, the Python 2D plotting library. It’s fair to say that I’ve learned them almost in spite of the documentation.

If you’ve used Matplotlib yourself, you surely know that its documentation is a mess. There’s plenty of good information in there, but it’s so poorly organized that a new user can’t help but feel adrift. The problem is that Matplotlib is a multilevel library. At its core is a hierarchical object-oriented set of classes and methods for the various parts of a plot. Wrapped around this core—in the pyplot module—is a skin of convenience functions that allow you to do a lot of plotting without concerning yourself with the objects underneath. I like this structure, but the documentation just doesn’t explain it well.
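To make the two levels concrete, here is a minimal sketch of the same plot drawn both ways. The data and variable names are made up for illustration, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# pyplot level: convenience functions act on an implicit "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
pyplot_axes = plt.gca()  # the Axes object pyplot has been managing behind the scenes

# Object level: hold on to the Figure and Axes and call their methods directly
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Both halves end up with an Axes object holding one line; the pyplot version just keeps track of it for you.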

There is, for example, no single place in which the hierarchy of objects is explained. An organization chart would do wonders in making the relationships clear. I’ve thought about making one myself, but I don’t feel confident enough in my understanding of the relationships to do so.
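Short of a real chart, a few lines of Python can at least walk part of the hierarchy. This is only a partial sketch of the relationships I'm sure of (a Figure contains Axes, an Axes contains an x-axis, a y-axis, and the plotted lines); it is not a complete or official map of the classes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2], [0, 1, 4])

# Walking down: Figure -> Axes -> axis objects and plotted lines
print(type(fig).__name__)        # the top-level container
print(fig.axes)                  # list of the Axes objects in this figure
print(type(ax.xaxis).__name__)   # the object that owns x ticks and tick labels
print(type(ax.yaxis).__name__)   # likewise for y
print(list(ax.lines))            # the lines drawn by ax.plot

# Walking back up: each child keeps a reference to its parent
print(line.axes is ax)
print(ax.figure is fig)
```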

One thing I can say with confidence is that the new user should stay away from all but the first few sections of the User’s Guide. After the pyplot tutorial, which can get you started with some simple plots using the convenience functions, the User’s Guide wanders off into a series of topics for which the new user is totally unprepared. The terminology is unfamiliar, the ordering of the sections is inexplicable, and even the motivation for most of the sections is vague.

Speaking of terminology, I must pause here and complain about one aspect of Matplotlib itself: the Axes class is horribly named. First, it’s a plural word used for singular instances, so the documentation is littered with references to “an Axes,” which is jarring to any native speaker of English. Even worse, the name of the Axes class is thoroughly misleading. If you know what the word axes means, you would think that the Axes class would be restricted to things like plot limits, log vs. linear scaling, tick mark spacing, tick labeling, and so on. But you’d be wrong. All of the methods for plotting data are Axes methods, which makes no sense. Why the Axes class wasn’t called Plot or Frame or Graph is beyond me, but that name has been my biggest source of confusion when reading the documentation. Even after I knew that Axes covered more than just the axes, my natural tendency to assign the usual meaning to the word would lead me astray.
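A short sketch shows the mix. The numbers are arbitrary; the point is that the call that draws the data and the calls that configure the actual axes are all methods of the same Axes object.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Plotting data: an Axes method
ax.plot([1, 10, 100], [1, 2, 3])

# What the name "axes" would lead you to expect: also Axes methods
ax.set_xscale("log")       # log vs. linear scaling
ax.set_ylim(0, 4)          # plot limits
ax.set_yticks([0, 2, 4])   # tick mark positions
```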

I’ve mentioned before Edsger Dijkstra’s dictum:

Besides a mathematical inclination, an exceptionally good mastery of one’s native tongue is the most vital asset of a competent programmer.

The Matplotlib people could have made their library easier to use if they’d thought more about naming this critical class.

Update 7/28/13
Reader Ilya Brook (@ibrook) tells me via Twitter that the Axes name was copied from MATLAB. The Matplotlib people have tried to make transitioning from MATLAB as easy as possible, so I guess that extends to reusing bad class names. We should be grateful they didn’t call it MATPLOTLIB.

Getting back to my main thread, there are two good sections of the Matplotlib documentation. The first is the Usage FAQ, which starts with this paragraph:

matplotlib has an extensive codebase that can be daunting to many new users. However, most of matplotlib can be understood with a fairly simple conceptual framework and knowledge of a few important points.

The FAQ goes on to explain what I mentioned earlier, that Matplotlib[1] has an object-oriented core with a procedural wrapper. It gives a decent explanation of why the library is structured that way and an overall philosophy of how to use it. Why this is buried in a FAQ instead of being the very first thing in the documentation is beyond me, but my advice to new users is to go here right after the pyplot tutorial and to return to this section whenever you’re confused about how to use Matplotlib effectively.

The other good section of the documentation is the pyplot plotting commands summary and the subsequent detail of each of the commands. When the Matplotlib documentation discusses specifics, which it does in this section, it’s very good. It might seem a little terse in places, especially when you see a **kwargs argument in the command definition and don’t see a list of what those named arguments can be. Don’t despair: the full explanation of the named arguments is given elsewhere and there will be a link to it.
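As an example of how the **kwargs work, the named arguments to plot are properties of the line being drawn. Here is a minimal sketch with three of them; the particular values are arbitrary choices of mine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

# The extra keyword arguments are passed through to the line object that plot creates
(line,) = plt.plot([0, 1, 2], [0, 1, 4],
                   color="red", linewidth=2.5, linestyle="--")

# The same properties can be read back from the returned line
print(line.get_color(), line.get_linewidth(), line.get_linestyle())
```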

Before I sat down to write, this post was going to give a couple of Matplotlib tips on using multiple vertical axes in a single plot and creating minor tick marks, but I seem to have gone astray. I’ll try to do the tips post tonight.

  1. The documentation is inconsistent in its capitalization of Matplotlib. Usually it’s uncapitalized, but it is capitalized in titles and section headings. Then again, it’s uncapitalized at the beginnings of sentences. I’ve decided that trusting the Matplotlib documentation writers on matters of English usage is foolish and will treat the name as a proper noun.