Plotting Apple
July 22, 2015 at 1:06 AM by Dr. Drang
Big day here at Drang World Headquarters. My position as a leading thinkfluencer in the Apple graphing vertical was solidified, as both Six Colors and MacStories began using moving average plots to illustrate Apple’s sales and revenue figures. This is a major blow to Kieran Healy, who was hoping for more LOESS.
But amid the celebration, I realized that I had screwed up three months ago. It was my intention to follow up the post in which I showed a simple moving average plot with one in which I presented the Python/Matplotlib code that generated it. The moving average calculations are no big deal, but I thought some of the script’s date manipulations were interesting and something I’d like to memorialize for later reference. I just forgot to do it.
Here’s the plot, updated with the sales figures Apple released today. Every data point in the graph (except for the first three for each device) represents an average of the sales for four quarters: the current quarter and the previous three. The idea behind the averaging is to smooth out the strongly seasonal ups and downs in the raw quarterly data.
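Put as a formula, the plotted value for quarter i is the mean over a window that starts at k_i. Here s_j stands for the sales in quarter j of one device’s data file, numbered from 0 at the first line. Both symbols are just notation for this sketch.

```latex
\bar{s}_i = \frac{1}{i - k_i + 1} \sum_{j = k_i}^{i} s_j,
\qquad k_i = \max(0,\, i - 3)
```

From the fourth quarter on, the denominator is 4. For the first three quarters it is 1, 2, and 3. Using the four sample quarters listed below, the four-quarter average plotted at 2014-Q4 works out as (26.04 + 16.35 + 13.28 + 12.32)/4 = 67.99/4 = 16.9975.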
The data came from text files with lines that look like this:
2014-Q1 26.04
2014-Q2 16.35
2014-Q3 13.28
2014-Q4 12.32
where the quarter and the sales (in millions of units) are separated by a tab character. There’s a file for the Mac, one for the iPhone, and one for the iPad. The following script reads the data files and generates a plot in PNG format.
```python
 1: #!/usr/bin/env python
 2:
 3: from dateutil.relativedelta import *
 4: from datetime import date
 5: from sys import stdin, argv, exit
 6: import numpy as np
 7: import matplotlib.pyplot as plt
 8: import matplotlib.dates as mdates
 9: from matplotlib.ticker import MultipleLocator
10:
11: # Initialize
12: phoneFile = 'iphone-sales.txt'
13: padFile = 'ipad-sales.txt'
14: macFile = 'mac-sales.txt'
15: lastYear = 2000
16: plotFile = argv[1]
17: if plotFile[-4:] != '.png':
18:     plotFile = plotFile + '.png'
19:
20: # Get the last Saturday of the given month.
21: def lastSaturday(y, m):
22:     return date(y, m, 1) + relativedelta(day=31, weekday=SA(-1))
23:
24: # Read the given data file and return the series. Also update the
25: # global variable lastYear to the last year in the data.
26: def getSeries(fname):
27:     global lastYear
28:     qmonths = {'Q1': 12, 'Q2': 3, 'Q3': 6, 'Q4': 9}
29:     dates = []
30:     sales = []
31:     for line in open(fname):
32:         quarter, units = line.strip().split('\t')
33:         units = float(units)
34:         year, q = quarter.split('-')
35:         year = int(year)
36:         month = qmonths[q]
37:         if month == 12:
38:             qend = lastSaturday(year-1, month)
39:         else:
40:             qend = lastSaturday(year, month)
41:         if qend.year > lastYear:
42:             lastYear = qend.year
43:         dates.append(qend)
44:         sales.append(units)
45:     ma = [0]*len(sales)
46:     for i in range(len(sales)):
47:         lower = max(0, i-3)
48:         chunk = sales[lower:i+1]
49:         ma[i] = sum(chunk)/len(chunk)
50:     return dates, sales, ma
51:
52: # Read in the data
53: macDates, macRaw, macMA = getSeries(macFile)
54: phoneDates, phoneRaw, phoneMA = getSeries(phoneFile)
55: padDates, padRaw, padMA = getSeries(padFile)
56:
57: # Tick marks and tick labels
58: y = mdates.YearLocator()
59: m = mdates.MonthLocator(bymonth=[1, 4, 7, 10])
60: yFmt = mdates.DateFormatter(' %Y')
61: ymajor = MultipleLocator(10)
62: yminor = MultipleLocator(2)
63:
64: # Plot the moving averages with major gridlines.
65: fig, ax = plt.subplots(figsize=(8,5))
66: ax.plot(macDates, macMA, 'g-', linewidth=4, label='Mac')
67: ax.plot(phoneDates, phoneMA, 'b-', linewidth=4, label='iPhone')
68: ax.plot(padDates, padMA, 'r-', linewidth=4, label='iPad')
69: ax.grid(linewidth=1, which='major', color='#dddddd', linestyle='-')
70:
71: # Set the upper limit to show all of the last year in the data set.
72: plt.xlim(xmax=date(lastYear, 12, 31))
73:
74: # Set the labels
75: plt.ylabel('Sales (millions)')
76: plt.xlabel('Calendar year')
77: t = plt.title('Four-quarter moving averages')
78: t.set_y(1.03)
79: ax.xaxis.set_major_locator(y)
80: ax.xaxis.set_minor_locator(m)
81: ax.xaxis.set_major_formatter(yFmt)
82: ax.yaxis.set_minor_locator(yminor)
83: ax.yaxis.set_major_locator(ymajor)
84: ax.set_axisbelow(True)
85: plt.legend(loc=(.13, .6))
86: fig.set_tight_layout({'pad': 1.5})
87:
88: # Save the plot file as a PNG.
89: plt.savefig(plotFile, format='png', dpi=150)
```
It’s a long one because I had very particular ideas on how the plot should look.
First, I wanted the time series to go by calendar years, not Apple’s dumbass fiscal quarters. So the `getSeries` function, on Lines 26–50, had to translate dates given as strings in the form `2014-Q1` into real Python dates.
This leads to a question: with what date should the sales be associated? Since the sales represent an entire quarter, a good case could be made for plotting them at the midpoint of the quarter. On the other hand, sales are reported as of the endpoint of the quarter, so that’s also a reasonable choice. There’s little practical difference between the two, just a slight horizontal shift in all the data points. I decided to use the endpoint.
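To put a rough number on that shift, here is a minimal sketch that compares the midpoint and the endpoint of an ordinary calendar quarter, October 1 through December 31, 2013. That quarter is an assumed example, not one of Apple’s fiscal quarters, and the helper name `quarter_midpoint` is hypothetical.

```python
from datetime import date, timedelta

def quarter_midpoint(start, end):
    """Return the date halfway between start and end, rounded down to a whole day."""
    half = (end - start).days // 2
    return start + timedelta(days=half)

# Assumed example: a plain calendar quarter, not an Apple fiscal quarter.
q_start = date(2013, 10, 1)
q_end = date(2013, 12, 31)

mid = quarter_midpoint(q_start, q_end)
shift = q_end - mid

print(mid)         # 2013-11-15
print(shift.days)  # 46
```

Because the quarters are all about the same length, choosing midpoints instead of endpoints would slide every point left by roughly a month and a half, which is the uniform horizontal shift described above.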
But Apple doesn’t end its fiscal quarters where you might expect. If you look through the quarterly reports, you’ll see that they end on the last Saturdays of March, June, September, and December. So I wrote the `lastSaturday` function to get that date for any given month. It’s very short because all the hard work is done by the `relativedelta` type, imported from the `dateutil` module. Was this really necessary? No, I could’ve just used the last day of the month and the plots wouldn’t have looked any different. But I thought it was worth getting some practice with `relativedelta`. And I like doing date calculations.
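For anyone without `dateutil`, here is a minimal standard-library sketch of the same calculation. It mirrors the two steps that `relativedelta(day=31, weekday=SA(-1))` performs: `day=31` is clamped to the last day of the month, and `weekday=SA(-1)` then backs up to the nearest Saturday on or before that day. The name `last_saturday` is hypothetical; this is an illustrative equivalent, not the code the plot was made with.

```python
import calendar
from datetime import date, timedelta

SATURDAY = 5  # date.weekday() numbers Monday as 0 and Sunday as 6

def last_saturday(year, month):
    """Return the last Saturday of the given month (stdlib-only sketch)."""
    # Step 1: last day of the month, like relativedelta(day=31).
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # Step 2: back up to the nearest Saturday on or before it, like weekday=SA(-1).
    days_back = (last_day.weekday() - SATURDAY) % 7
    return last_day - timedelta(days=days_back)

print(last_saturday(2013, 12))  # 2013-12-28
print(last_saturday(2015, 1))   # 2015-01-31 (the month already ends on a Saturday)
```

The modulo keeps `days_back` between 0 and 6, so a month that already ends on a Saturday comes back unchanged, just as `SA(-1)` leaves a Saturday alone.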
The `getSeries` function uses `lastSaturday` to make the translation from, for example, `2014-Q1` to the Python form `datetime.date(2013, 12, 28)`. You can see in Lines 28 and 36–40 how it recognizes that `2014-Q1` is really the last calendar quarter of 2013, and therefore the day it wants is the last Saturday of December 2013.
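Here is a minimal sketch of that translation step on its own. It parses one tab-separated line from a data file and works out the calendar year and month in which the fiscal quarter ends, which is the pair `getSeries` passes to `lastSaturday`. The function name `quarter_end_month` is hypothetical.

```python
# Fiscal quarter label -> calendar month in which that quarter ends.
QUARTER_END_MONTH = {'Q1': 12, 'Q2': 3, 'Q3': 6, 'Q4': 9}

def quarter_end_month(line):
    """Parse 'YYYY-Qn<TAB>units' and return ((calendar_year, month), units)."""
    label, units = line.strip().split('\t')
    fiscal_year, q = label.split('-')
    month = QUARTER_END_MONTH[q]
    # Fiscal Q1 ends in December of the previous calendar year.
    year = int(fiscal_year) - 1 if month == 12 else int(fiscal_year)
    return (year, month), float(units)

print(quarter_end_month('2014-Q1\t26.04\n'))  # ((2013, 12), 26.04)
print(quarter_end_month('2014-Q4\t12.32\n'))  # ((2014, 9), 12.32)
```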
The moving average calculation in `getSeries` is pretty simple. It starts by initializing `ma` to a list of zeros in Line 45. It then marches through the `sales` list, averaging the current and three (or fewer) previous values and stuffing the result into the corresponding spot in `ma`.
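The slicing approach recomputes each sum from scratch, which is perfectly fine for a few dozen quarters. As an alternative, here is a sketch that keeps a running total in a `collections.deque`. This is not what the script does. It adds the newest value and subtracts the one that falls out of the window. It gives the same averages, up to floating-point rounding, including the shorter averages for the first three quarters. The name `trailing_average` and its `window` parameter are hypothetical.

```python
from collections import deque

def trailing_average(values, window=4):
    """Average each value with up to window-1 preceding values, using a running sum."""
    recent = deque()
    total = 0.0
    averages = []
    for v in values:
        recent.append(v)
        total += v
        if len(recent) > window:
            total -= recent.popleft()  # drop the value that just left the window
        averages.append(total / len(recent))
    return averages

# Toy numbers (not sales data) to show the window sliding past four values.
print(trailing_average([1, 2, 3, 4, 5, 6]))  # [1.0, 1.5, 2.0, 2.5, 3.5, 4.5]
```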
One other thing `getSeries` does is figure out the last calendar year for which there are sales figures and store that in the global variable `lastYear`. Using a global variable like this probably makes real programmers feel queasy, but since I learned to program in Fortran, this kind of dangerous crap is second nature to me.
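If the global makes you queasy too, one alternative is to compute the last year after all the series are read, using the dates that `getSeries` already returns. This is a sketch, not the script’s approach. The helper name `last_calendar_year` is hypothetical, and the dates below are chosen only for illustration.

```python
from datetime import date

def last_calendar_year(*date_lists):
    """Return the latest calendar year found in any of the given lists of dates."""
    return max(d.year for dates in date_lists for d in dates)

# Illustrative quarter-end dates of the kind getSeries returns.
mac_dates = [date(2013, 12, 28), date(2014, 3, 29)]
phone_dates = [date(2014, 9, 27), date(2015, 6, 27)]

last_year = last_calendar_year(mac_dates, phone_dates)
x_max = date(last_year, 12, 31)  # right edge of the plot, as in Line 72
print(last_year, x_max)  # 2015 2015-12-31
```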
The rest of the script is just wrestling with Matplotlib and its weird ideas about how plots should be specified. Whenever I work with Matplotlib, I’m reminded that although Perl’s motto is “There’s more than one way to do it” (TMTOWTDI), the Zen of Python says “There should be one—and preferably only one—obvious way to do it.” Matplotlib’s developers are apparently unfamiliar with the Zen of Python, as there are seemingly dozens of ways to do every goddamned thing in Matplotlib, and few of them are obvious.
Here are a few formatting items worth mentioning:
- If you’re going to use time series, you’ll probably need to import the `matplotlib.dates` submodule. I used its `YearLocator` and `MonthLocator` classes in Lines 58–59 to set the major and minor tick marks on the horizontal axis.
- The `YearLocator` puts its ticks at January 1 of each year, and the default location for the tick label is centered under the tick. This is fine if you’re going to use a full date specification for the label (e.g., “Jan 1, 2014”), but not so good if you just want to use the year, as I did. Having “2014” appear directly under the tick (and grid line) that represents the boundary between 2013 and 2014 is just wrong, so I cheated by adding a bunch of spaces before the year in the format specification for the tick label in Line 60. This pushed the year over to the right, roughly centering it between the year boundaries. How did I know how many spaces to use? Trial and error.
- To get the right edge of the plot to always end on a year boundary, regardless of the date associated with the last data point, I used the `lastYear` variable in Line 72 to set the upper limit on the horizontal axis to December 31 of that year. This was the whole point of creating the `lastYear` variable.
- I used the `set_tight_layout` method in Line 86 to give a little extra padding below the plot. Without it, the “Calendar year” label was in danger of getting clipped.
- The overall size of the plot is governed by the `figsize` argument in Line 65 and the `dpi` argument in Line 89. These two lines make the plot 1200×750 pixels. Why does it take two lines to set the size of the plot? Forget it, Jake, it’s Matplotlib.
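The arithmetic behind that 1200×750 figure is just inches times dots per inch. Here is a small sketch with a hypothetical helper name. It assumes Matplotlib’s default `savefig` behavior, which saves the whole figure without the `bbox_inches='tight'` cropping that would change the output size.

```python
def output_pixels(figsize, dpi):
    """Pixel dimensions of a saved figure: size in inches times dots per inch."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)

print(output_pixels((8, 5), 150))  # (1200, 750), from figsize=(8,5) and dpi=150
```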
There is, admittedly, no reason for me to spend so much time on this plot. I’m not an Apple pundit, and no one comes here to see a summary of Apple’s quarterly earnings report. But I do use time series plots in my work, and having documented examples is very helpful in getting them made quickly.
And if you’re wondering why I don’t just use Numbers or Excel for plotting, the answers are easy. The plots from Numbers always look like cartoons to me, and I would never feel comfortable putting them in a professional report. Excel’s plots can be wrangled into a presentable form, and I spent years doing just that—I have no interest in doing it anymore. Even Matplotlib is less annoying to work with than Microsoft.