December 31, 2017 at 10:05 AM by Dr. Drang
I’ve been reading political blogger Kevin Drum’s work since his Calpundit days. I like his writing, and I particularly like that he tends to illustrate his points with graphs, even though his graphing style has often left me cold. Yesterday he wrote a post with a graph and discussed how he went about making it. I have some thoughts.
First, let’s look at the graph itself.
The idea is to look for a correlation between GDP growth and employment. A better way to do this might be to plot these two quantities against one another, but that would lose the temporal aspect of the data, which Drum apparently wants to include in the presentation. His main concern is whether it was right to use two y axes on the same plot.
Kieran Healy wrote about the evils of two y axes a couple of years ago, and I wrote a response shortly after. The problem with this kind of plot is the freedom it gives the chartmaker to fiddle with the scales of the axes to make the items being plotted look more or less correlated.
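To put a number on that freedom, here is a minimal sketch using made-up values (they are not Drum's data). The helper names `screen_height` and `visual_span` are hypothetical. It computes how much of the plot height a series swings through under two different choices of limits for the second y axis.

```python
# Made-up values for illustration only; these are not Drum's data.
jobs = [140.0, 141.0, 143.0, 142.0]  # hypothetical right-axis series


def screen_height(values, axis_min, axis_max):
    """Return each value's position as a fraction of the plot height."""
    return [(v - axis_min) / (axis_max - axis_min) for v in values]


def visual_span(values, axis_min, axis_max):
    """Return the share of the plot height the series swings through."""
    heights = screen_height(values, axis_min, axis_max)
    return max(heights) - min(heights)


# Same data, two choices of axis limits for the second y axis.
tight = visual_span(jobs, 139.0, 144.0)  # limits hug the data
loose = visual_span(jobs, 0.0, 200.0)    # limits start at zero

print(f"tight limits: series spans {tight:.1%} of the plot height")
print(f"loose limits: series spans {loose:.1%} of the plot height")
```

With the tight limits the series swings through about 60% of the plot height; with the loose limits it covers about 1.5% and looks like a flat line. Nothing about the data changed, only the scale on the second axis.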
But Drum’s not worried about being misleading; he’s worried about being too complicated:
So the question is: Does the clearer presentation of the relationship make up for the added complexity of the chart? And is there a better way to show it? I’d answer definitely yes to the first question, and usually no to the second. Sometimes there is a better way, but not always. Sometimes it’s either a dual y-axis or nothing.
No, it isn’t. In my post referenced above, I took a chart with two y axes that I’d made several years ago and recast it with two subplots. I made a similar graph recently that combined three categories of data into a single plot:
Here, I’m plotting data for the cooling system in an industrial vehicle during use. The top subplot covers the coolant pressure at various points in the system, the middle subplot covers the coolant temperature at those same points, and the bottom subplot gives the engine speed. This was a quick chart for my own use; if I were going to put it in a report for others to read, I would have moved the subplots closer together and eliminated the tick labels from the x axes of the top and middle subplots. I also would have used colors more friendly to colorblind readers.
Is this cheating? Am I really making three plots and calling them one? Sort of, but I think it’s fair to call it a single plot because of the alignment and choice of scales.[1] You need to read up and down the whole set to see what’s going on.
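A stacked layout like this can be sketched in Matplotlib roughly as follows. This is not the script behind the chart above. The function name `make_stacked_plot`, the data, and the units in the axis labels are all assumptions made for illustration. The key piece is `sharex=True`, which ties the three subplots to one time scale and leaves tick labels on the bottom x axis only.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt


def make_stacked_plot(time, pressure, temperature, speed):
    """Stack three quantities with different units on one shared time axis."""
    fig, axes = plt.subplots(
        3, 1,
        sharex=True,                  # one time scale for all three subplots
        figsize=(6, 7),
        gridspec_kw={"hspace": 0.1},  # pull the subplots close together
    )
    # Axis labels and units are hypothetical.
    series = [
        (pressure, "Pressure (psi)"),
        (temperature, "Temperature (°F)"),
        (speed, "Engine speed (rpm)"),
    ]
    for ax, (values, label) in zip(axes, series):
        ax.plot(time, values, color="black", linewidth=1)
        ax.set_ylabel(label)
        ax.grid(True, color="0.85", linewidth=0.5)  # muted grid, both ways
    axes[-1].set_xlabel("Time (s)")
    return fig, axes


# Made-up data, purely to exercise the layout.
time = [0, 1, 2, 3, 4, 5]
fig, axes = make_stacked_plot(
    time,
    pressure=[10, 12, 15, 14, 13, 11],
    temperature=[150, 160, 175, 180, 178, 170],
    speed=[800, 1500, 2200, 2100, 1800, 900],
)
```

Each quantity keeps its own honest y scale, and because the x axes are linked, an event at a given time lines up vertically across all three subplots. Calling `fig.savefig(...)` writes the result to a file.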
So there are ways of presenting data of different units and wildly different scales without resorting to multiple y axes. Even so, the two y axes aren’t what I dislike most about Drum’s chart. My main complaints are with his x axis:
- First, the choice of tick spacing and positioning makes no sense. If you’re making a political point, you should use four-year spacing aligned with presidential administrations. If you’re not going to use administrations, just label the decades and use tick marks between them. Five-year spacing on the twos and sevens is something clearly decided by software; it’s not how humans think.
- Second, there should be vertical gridlines. If precision in reading the data is important enough to add a horizontal grid, it’s important enough to add a vertical grid as well. I’ve been seeing a lot of graphs with half-assed grids lately, so I assume this is the default in some graphing software.
- Third—and I admit this is a personal peeve—there’s no point in tilting the x axis tick labels. It’s just an affectation. The years are tilted 30–35° from the horizontal, which means they take up over 80% of the space that untilted labels would. The savings isn’t worth it, and it draws attention away from the data.
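The three complaints above can be sketched in Matplotlib as follows. The function names and the 1977–2017 year range are assumptions for illustration; they are not taken from Drum's chart. The cosine helper only measures the horizontal extent of a label's baseline, so it slightly understates the footprint of a real tilted label.

```python
import math

import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt


def administration_ticks(first_year, last_year):
    """Years in which a US presidential term begins (1981, 1985, ...)."""
    return [y for y in range(first_year, last_year + 1) if y % 4 == 1]


def tilted_width_fraction(angle_deg):
    """Horizontal extent of a rotated label's baseline, relative to flat.

    Ignores the label's height, which only adds to the footprint.
    """
    return math.cos(math.radians(angle_deg))


def style_x_axis(ax, first_year, last_year):
    """Apply the three suggestions: aligned ticks, full grid, flat labels."""
    ax.set_xlim(first_year, last_year)
    ax.set_xticks(administration_ticks(first_year, last_year))
    ax.grid(True, axis="both", color="0.85", linewidth=0.5)  # muted, both ways
    ax.set_axisbelow(True)                     # keep the grid behind the data
    ax.tick_params(axis="x", labelrotation=0)  # no tilt


fig, ax = plt.subplots(figsize=(8, 3))
style_x_axis(ax, 1977, 2017)  # hypothetical year range

for angle in (30, 35):
    print(f"{angle} degrees: {tilted_width_fraction(angle):.0%} of flat width")
```

The loop prints roughly 87% for a 30° tilt and 82% for 35°, which is where the "over 80%" figure comes from.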
Which leads me to Drum’s interpretation of Edward Tufte:
I’ve been a big fan of chart guru Edward Tufte for decades, and his mantra was to simplify as much as possible and to ruthlessly eliminate “chart junk.” This is good advice, but ever since Tufte became popular it’s become advice that many people take too far (as Tufte himself did later in life, I think). Eventually you get to the point where you’re making it harder to read a chart because it’s become so spare that it lacks the visual cues readers expect. You can eliminate gridlines entirely, for example, but that makes it harder on the reader who wants to look at a chart carefully and get a real sense of the data behind it. When you sacrifice that, you can easily end up with a wiggly curve that’s more just a directional symbol (something is going up, or down, or U-shaped) than a true chart.
I appreciate the argument against Bezos-style graphs, but this is a misreading of Tufte. His concern is with the primacy of the data, not simplifying. Gridlines should be muted (which Drum does a good job of) but not eliminated if they serve the presentation of the data. Tufte’s main admonition is to keep the skeleton of the chart from drawing attention away from the flesh. Which is why I dislike tilted tick labels.
Furthermore, Tufte’s writings argue strongly for information-dense graphs. No advocate for simplification would love Minard’s map of Napoleon’s invasion of Russia as much as Tufte does.
[1] By the way, the y axis scales were chosen to be consistent over a series of charts like this one. That’s why the plotted data don’t fill the space.