Not quite repeating calendar events

I have sleep apnea, and to reduce the hundreds of brief times that I would otherwise stop breathing every night, I use a CPAP machine to keep my air passages open while I sleep. There are certain parts of the machine that need to be rinsed daily, some that need to be rinsed weekly, and some that need to be replaced either every month (kind of) or every six months (kind of).

The daily schedule is easy to maintain; I take those parts into the bathroom with me when I wake up and rinse them out. The weekly schedule is easy, too. I do those parts every Sunday and have never had any trouble remembering. It’s the less frequent maintenance that I need help with, partly because the interval between actions is longer but mostly because of the “kind of” aspect to their schedule.

The “kind of” comes from how the replacement parts get to me. The supplier is coordinated with my doctor and insurance company to regularly get me the replacement parts I need. About four times a year, I get an automated call that asks me how I’m doing (fine) and whether I want all of the replacement parts I’m due (of course). About a week later the parts show up: three sets of filters and masks that get replaced approximately monthly, one hose that gets replaced approximately quarterly, and, with every other delivery, another hose and a reservoir that get replaced approximately biannually.

I was originally told the replacements would be done monthly, quarterly, and biannually, but I learned after the first year (which had some scheduling hiccups because the supplier didn’t have me properly entered in their database) that the deliveries were more like every 15 weeks and that I should replace the mask and filter every 5 weeks. I find a 5-week schedule hard to track without help, so I set up a recurring event in Calendar to remind me to replace whatever needs replacing on Sundays spaced 5 weeks apart.

That worked reasonably well, but unfortunately the deliveries aren’t consistent. Too often I find myself with an alert to change the filter and mask before a new set of filters and masks has arrived. Waiting a few extra days to do the replacement doesn’t bother me, but having to reset my recurring calendar event does. Yesterday, after another late delivery of supplies showed up at my door, I built a shortcut to create all the calendar events for the 15 (or so) week period between deliveries.

Here it is:

  1. [Image: CPAP calendar Step 01] Use the current date as the default starting date, but let me choose another in case I run this a day or two later. The magic variable result is renamed to “First set.”
  2. [Image: CPAP calendar Step 02] Create the calendar event for when the delivery came. I’ve shown this one in expanded view so you can see that it’s an all-day event in my home calendar. The others are the same except that I’ve added alerts to them. This one doesn’t need an alert because presumably I’ve already replaced the parts when I run the shortcut.
  3. [Image: CPAP calendar Step 03] Calculate when I’ll need to replace the filter and mask. Rename this magic variable “Second set.”
  4. [Image: CPAP calendar Step 04] Create a calendar event for replacing the filter and mask with the second set that came in the delivery.
  5. [Image: CPAP calendar Step 05] Calculate when I’ll next need to replace the filter and mask. Rename this magic variable “Third set.”
  6. [Image: CPAP calendar Step 06] Create a calendar event for replacing the filter and mask with the third and final set that came in the delivery.
  7. [Image: CPAP calendar Step 07] Calculate when the next delivery should come. Rename this magic variable “Next delivery.”
  8. [Image: CPAP calendar Step 08] Create a calendar event for the next delivery.

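The idea is to run this shortcut on the day a delivery arrives and I make whatever replacements go with that delivery. It creates four calendar entries: one for the delivery day itself, one each for swapping in the second and third sets of filters and masks, and one for when the next delivery should come.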

Whenever the next delivery comes, I’ll run the shortcut again, and the schedule for the next 15 weeks will be set. No need to dig into Calendar to delete or reset recurring events.
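For anyone who would rather see the date arithmetic than the Shortcuts steps, here is a rough Python sketch of the same idea. It is not the shortcut itself. It assumes the entries are spaced exactly 5 weeks apart (so the next delivery lands 15 weeks out), the function name `cpap_schedule` is made up for the illustration, and the starting date is just an example Sunday.

```python
from datetime import date, timedelta


def cpap_schedule(delivery_day, weeks_between_sets=5):
    """Return (label, date) pairs for one delivery cycle.

    Assumes evenly spaced replacements; the labels mirror the
    magic variable names used in the shortcut steps.
    """
    labels = ("First set", "Second set", "Third set", "Next delivery")
    return [
        (label, delivery_day + timedelta(weeks=i * weeks_between_sets))
        for i, label in enumerate(labels)
    ]


# Example starting date (hypothetical): a Sunday delivery.
for label, day in cpap_schedule(date(2020, 4, 26)):
    print(f"{day:%a %b %d, %Y}  {label}")
```

Because every date is computed from the actual delivery day, a late delivery just shifts the whole cycle; nothing has to be reset.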

Small multiples and normalization

In my last post, I mentioned Kieran Healy’s article on Apple sales data and how he created graphs that showed a cyclic (or seasonal) component on top of an overall trend. Yesterday, he did it again, although this time the seasonal aspects were only a minor part of the article and he worked with a different Apple time series.

The time series was what Apple is calling its Mobility Trends, a set of data that summarizes requests for directions in Apple Maps. The data are categorized by country, by city, and by type of direction (driving, walking, and transit). They start on January 13 of this year (more about that later) and are continually updated. You can graph certain data directly on the Mobility Trends site or download the data set as a CSV file and play with it on your own.

Apple’s primary goal is to show how requests for directions have changed as people respond to the COVID-19 pandemic. To focus on the change rather than the gross number of requests, Apple has normalized all the data. The number of requests for each country, city, and type of direction request has been set to 100 for January 13 and all subsequent data are relative to that starting point. A 50, for example, for the Chicago walking directions on April 1 would mean that there were half as many requests for walking directions in Chicago on that day as there were on January 13.
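In code, this kind of single-day normalization amounts to dividing every value by the value on the baseline day and multiplying by 100. The sketch below is only an illustration: Apple publishes the normalized figures, not the raw counts, so the raw numbers here are invented, and `normalize_to_baseline` is a hypothetical helper name.

```python
def normalize_to_baseline(counts, baseline_index=0):
    """Scale a series so the value on the baseline day becomes 100."""
    base = counts[baseline_index]
    return [100 * c / base for c in counts]


# Invented raw request counts; the first entry plays the role of January 13.
raw = [2000, 2400, 1000]
print(normalize_to_baseline(raw))  # [100.0, 120.0, 50.0]
```

The last value comes out as 50 because there were half as many requests as on the baseline day, which is the Chicago example above in miniature.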

My favorite part of Kieran’s post is this small multiples chart that plots all 89 cities in the data set, with the walking, driving, and transit direction requests shown in black, yellow-orange, and light blue, respectively.

Healy small multiples

I’ve always thought the small in small multiples means the number of subplots; here it’s better applied to the size of the subplots. Despite the size of the subplots, arraying them this way allows you to pick out things that you might not otherwise see.

In his post, Kieran talks about the obvious Mardi Gras spike in Rio de Janeiro and how that led him to question the pre-Mardi Gras spikes in several French and German cities. The likely answer is both funny and a lesson in data analysis; I won’t spoil it here.

What jumped out at me when looking at the small multiples was Seattle’s transit data. Seattle is along the left edge, third up from the bottom, and you can see the blue line (transit) running below the other two for the entire length of the graph. Having the transit line below the others in late March and April is relatively common, but for it to be distinctly lower all the way back to mid-January is unique. What makes Seattle different in this way?

To answer the question, let’s look at the Seattle data by itself.

Seattle data from Apple

The vertical gridlines are positioned at every Monday to make the weekly cycle easier to see. Two things are clear:

  1. The number of requests for transit directions on January 13 was particularly high for a Monday and was near the peak for all the days in the data set.
  2. The numbers of requests for walking and driving directions on January 13 were particularly low for a Monday and were near the bottom for all the days before social distancing kicked in.

These coinciding oddities conspired to push the transit line down from the walking and driving lines. When viewed this way, it becomes clear that setting the January 13 data to 100 was probably not Apple’s best idea—not for Seattle, certainly, and not for the other cities, either, because they all show weekly cycles. Normalizing the data to a Monday implies that Mondays are the norm, even though they clearly are not. Seattle’s data stood out from the other cities only because the particular Monday Apple chose to normalize to happened to be an unusual one there.

One effect of this normalization choice is to make the recent walking and driving requests in Seattle look higher than they should. Apple’s scores suggest that they are currently averaging 50–65% of what they were pre-COVID, but those are artificially high numbers because the norm was set artificially low.

A better way to normalize the data would be to take a week’s average, or a few weeks’ average, before social distancing and scale all the data with that set to 100. If we do that, the plot shifts to look like this:

Seattle data rescaled

Now the early data are cycling about 100, which makes more sense if we think of 100 as an average day, and the walking and driving requests are seen to be running currently in the 35–50% range. The transit requests are slightly higher in this graph, but the difference isn’t easy to see because it’s only about two percentage points.

To be specific, what I did to make this plot was get the average of each set of requests over the first four weeks of data and scale all the data by dividing by those averages. The averages were:

Type      Average
Walking   128.6
Driving   135.0
Transit    90.4

So you can see why the walking and driving lines were reduced by about 25% and the transit line was lifted by about 10%.
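The percentages follow directly from the three averages. Here is a small check that uses only the numbers in the table; the helper name `scale_factors` is made up for the illustration.

```python
def scale_factors(averages):
    """Multiplier that turns Apple's score into a score where the average is 100."""
    return {kind: 100 / avg for kind, avg in averages.items()}


averages = {"Walking": 128.6, "Driving": 135.0, "Transit": 90.4}

for kind, factor in scale_factors(averages).items():
    change = (factor - 1) * 100
    print(f"{kind}: multiply by {factor:.3f} ({change:+.1f}%)")
```

Walking and driving come out a bit more than 20% lower and transit a bit more than 10% higher, in line with the rough figures above.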

If you look back at the small multiples, you can see that rescaling Apple’s numbers according to an initial 3- or 4-week average would probably be reasonable for most of the cities but certainly not all of them. Seoul, for example, which is sitting right next to Seattle, was already starting its social distancing in January, so it would be hard to get a pre-COVID average there.

The particular oddity of January 13 as a starting point for Seattle reminds me of how climate change deniers used to make global temperature plots that started in 1998 because that was a strong El Niño year. For a while, at least, subsequent years appeared cooler,[1] and they would argue that their plots were proof that global warming was a hoax. In this case, of course, there’s nothing dishonest in Apple’s numbers; it’s just that they could have been scaled in a more useful way.

If you’re interested in how I made my plots, it was pretty simple. I downloaded Apple’s CSV file, opened it in Numbers, deleted all the non-Seattle data, transposed the sheet (Apple had the dates in columns; I wanted them in rows), and exported a new CSV with a Date column and one column each for Driving, Transit, and Walking.


If I intended to do more with Apple’s data, I’d have written a script for some of this. For a one-off, it was faster to do all the data prep by hand.
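For the record, a scripted version of that prep might look something like the sketch below. It is not what I ran. The header names (`geo_type`, `region`, `transportation_type`) are assumptions about the layout of Apple’s file, the function name `extract_city` is hypothetical, and apart from the 100s on January 13 the sample values are invented just to show the reshaping.

```python
import csv
import io


def extract_city(rows, city, region_col="region", type_col="transportation_type",
                 meta_cols=("geo_type", "region", "transportation_type")):
    """Filter wide rows to one city and transpose them to one row per date."""
    city_rows = [r for r in rows if r[region_col] == city]
    if not city_rows:
        raise ValueError(f"no rows found for {city!r}")
    # Every column that isn't metadata is assumed to be a date.
    dates = [k for k in city_rows[0] if k not in meta_cols]
    by_type = {r[type_col].capitalize(): r for r in city_rows}
    types = sorted(by_type)
    table = [["Date"] + types]
    for d in dates:
        table.append([d] + [by_type[t][d] for t in types])
    return table


# Tiny stand-in for the downloaded file (column names assumed, values invented).
sample = """geo_type,region,transportation_type,2020-01-13,2020-01-14
city,Seattle,driving,100,101.5
city,Seattle,transit,100,99.2
city,Seattle,walking,100,103.1
city,Seoul,driving,100,98.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for line in extract_city(rows, "Seattle"):
    print(",".join(line))
```

With a real download, the `io.StringIO(sample)` would be replaced by an open file handle, and the output could be written with `csv.writer` instead of printed.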

With the data in the shape I wanted, I wrote this script to make the two plots:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    from matplotlib.ticker import MultipleLocator
    from pandas.plotting import register_matplotlib_converters

    # Import the data
    register_matplotlib_converters()
    df = pd.read_csv('seattle.csv', parse_dates=['Date'])

    # Get the 4-week means for each type
    means = {}
    for t in ('Driving', 'Transit', 'Walking'):
        means[t] = np.mean(df[t][:28])
    print(means)

    # Rescale the data
    for t in ('Driving', 'Transit', 'Walking'):
        df[t + 'Scale'] = df[t]/means[t]*100

    # Set the date ticks
    othermondays = mdates.WeekdayLocator(byweekday=mdates.MO, interval=2)
    mondays = mdates.WeekdayLocator(byweekday=mdates.MO, interval=1)

    # Plot the rescaled data
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(df.Date, df.WalkingScale, '-', lw=2, color='#000000', label='Walking')
    ax.plot(df.Date, df.DrivingScale, '-', lw=2, color='#dba237', label='Driving')
    ax.plot(df.Date, df.TransitScale, '-', lw=2, color='#306fac', label='Transit')

    ax.xaxis.set_major_locator(othermondays)
    ax.xaxis.set_minor_locator(mondays)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %-d'))
    ax.set_ylim(0, 200)
    ax.grid(linewidth=.5, which='major', color='#dddddd', linestyle='-')
    ax.legend()

    plt.savefig('20200423-Seattle data rescaled.png', format='png', dpi=150)

    # Plot the Apple data
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(df.Date, df.Walking, '-', lw=2, color='#000000', label='Walking')
    ax.plot(df.Date, df.Driving, '-', lw=2, color='#d95f02', label='Driving')
    ax.plot(df.Date, df.Transit, '-', lw=2, color='#7570b3', label='Transit')

    ax.xaxis.set_major_locator(othermondays)
    ax.xaxis.set_minor_locator(mondays)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %-d'))
    ax.set_ylim(0, 200)
    ax.grid(linewidth=.5, which='major', color='#dddddd', linestyle='-')
    ax.legend()

    plt.savefig('20200423-Seattle data from Apple.png', format='png', dpi=150)

Nothing special here. I got the colors for the driving and transit plots by importing Kieran’s chart into a graphics program and using the eyedropper tool.

I can’t say I learned anything new in this exercise, but it did reinforce some things that are easy to forget. First, the value of small multiples in seeing patterns and deviations from patterns. And second, that some normalizations are more useful than others.

Update Apr 29, 2020 1:16 PM 
Apple has added several dozen cities to its data set, and Prof. Healy has found that January 13 was also a highly unrepresentative normalization date for New Orleans. Remember college football?

  1. There was some dishonesty in that, too. 

The Tuesday jump

Today, like every Tuesday recently, I looked at my updated graph of COVID-19 data for the US, and saw that the number of deaths over the past day had jumped up.

Tuesday jumps

These plots are for the daily figures, and you can see by the arrows in the top subplot that there’s been a distinct jump every Tuesday since late March. Why is this?

My strong suspicion is that this is due to the weekly work schedule of the people who report the figures the COVID Tracking Project compiles. Deaths over the weekend probably don’t get reported in full and the backlog of paperwork doesn’t get finished until Tuesday. And it may be due specifically to the weekly schedule of health officials in New York, as it’s New York’s numbers that dominate the country’s totals.

Update Apr 26, 2020 9:35 PM 
Nope. A review of the figures from individual states shows that New Jersey is primarily responsible for the Tuesday jump, with Massachusetts also contributing.

There appear to be cyclic components superimposed on the overall trends in the positive and completed test graphs, too. If you’re interested in teasing out the details, I suggest you look at this post by Kieran Healy from about five years ago in which he analyzed the cyclic aspects of Apple sales figures. If you’re really interested in the topic, William Cleveland’s Visualizing Data is probably the best source for the underlying ideas. Kieran’s own Data Visualization is also really good, but I’m pretty sure he doesn’t get into the cyclic stuff in that book.

Update Apr 23, 2020 1:27 PM 
Kieran has a new post with cyclic data from Apple. This time it’s the mobility data Apple recently published that shows, among other things, the change in requests for directions in Maps since January. I think my favorite part of his post is the small multiples chart showing the mobility trends in all the cities in Apple’s dataset. The subplots are really small and there are many multiples.

Half an ounce

This post is my victory lap. A couple of weeks ago, I predicted, using some very high-level analysis and mathematics you probably didn’t understand, that the Magic Keyboard for the 12.9″ iPad would weigh in at 684 g. Today, Matt Panzarino published a review of the Magic Keyboard in which he said the 12.9″ version weighs… 700 g.

I was just half an ounce off. Less than one-sixth of a newton. About 2%. That’s top-shelf punditry.
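For the skeptics, the unit conversions are easy to check. This quick sketch uses the standard values for grams per avoirdupois ounce and for standard gravity; the variable names are just for the illustration.

```python
GRAMS_PER_OUNCE = 28.349523125   # avoirdupois ounce
STANDARD_GRAVITY = 9.80665       # m/s^2

predicted_g, reported_g = 684, 700
miss_g = reported_g - predicted_g

miss_oz = miss_g / GRAMS_PER_OUNCE
miss_newtons = miss_g / 1000 * STANDARD_GRAVITY
miss_percent = miss_g / reported_g * 100

print(f"{miss_g} g = {miss_oz:.2f} oz = {miss_newtons:.3f} N = {miss_percent:.1f}%")
```

The miss works out to a bit over half an ounce, a little under one-sixth of a newton, and a little over 2%.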

You might point out that my methods weren’t serious. You might also note that immediately after making my prediction I suggested that 684 g might be too high. Honestly, that’s beneath you. Let’s not bicker and argue. This is supposed to be a happy occasion.

Update Apr 20, 2020 7:46 PM 
Well, well, well! My eerie accuracy has been noted at the highest levels.