January 8, 2009 at 5:23 PM by Dr. Drang
Because I was away from decent internet access for a few days, I wasn’t able to comment (except in a tweet) on the leap year bug that bit Zune owners on New Year’s Eve. The bug is interesting to me because it’s in a simple bit of code that would have been bug-free had it been even simpler.
First, let’s talk about what happened. On December 31, Zune owners found their little music players freezing during startup. Microsoft soon issued what was usually reported as a “fix.” In truth, it wasn’t so much a fix as it was a workaround—the bug is still there, waiting to strike any Zune 30 still in use on December 31, 2012.
The bug is in the Zune’s low level clock software, apparently written by Freescale Semiconductor for its hardware. The full C source code is here, and the relevant buggy part is isolated here. Translated to Python, the code looks like this:
1: #!/usr/bin/python 2: 3: def isLeapYear(y): 4: leap = False 5: if y % 4 == 0: 6: leap = True; 7: if y % 100 == 0: 8: leap = True if y % 400 == 0 else False 9: return leap 10: 11: def getYear(days): 12: year = 1980 13: while days > 365: 14: if isLeapYear(year): 15: if days > 366: 16: days -= 366 17: year += 1 18: else: 19: days -=365 20: year += 1 21: return year 22: 23: # We'll test on: 24: # Jan 01, 1980 25: # Dec 30, 1980 26: # Jan 01, 1981 27: # Dec 31, 1981 28: # Jan 01, 1982 29: # Jan 01, 2009 30: # Dec 31, 1980 31: # Dec 31, 2008 32: testDays = [1, 365, 366+1, 366+365, 366+365+1, 10593+1, 366, 10593] 33: for d in testDays: 34: print "%5d: %4d" %(d, getYear(d)
I’ve added some tests at the end of the code, so you can play with it and run it yourself. I figured Python would be easier to run, since you don’t have to compile it.
In a nutshell, the Zune’s clock keeps track of time through two integers: one counts seconds—which we’re not interested in here—and the other counts days from January 1, 1980. The day counter is fed to the
getYear routine, which is supposed to return the year. It’s the
getYear function (which is actually part of a larger subroutine in the original code; I’ve put the part that calculates the year into its own function because that’s the part with the bug) that caused the Zunes to freeze.
The problem is the
while loop that starts on Line 13. The loop is supposed to keep decrementing the
days variable by 365 (or 366 for leap years) until less than a year’s worth of days are left. With each decrement in days, the year counter is incremented by one. Unfortunately, the condition that controls the loop assumes that a year has 365 days, which is obviously not true in leap years. On December 31, 2008—day 366 of that year—the loop ran as usual until the value of
days was decremented to 366 and
year was incremented to 2008. At that point, the loop just kept cycling. The
days variable stayed at 366 and the
year stayed at 2008 because:
- the condition in Line 13 remained true, keeping us in the
- the condition in Line 14 remained true, keeping us in that
- the condition in Line 15 remained false, keeping us out of that
ifblock and cycling back to Line 13.
It was this infinite loop that froze the Zunes. If the Zunes had existed on December 31, 2004, it would have happened then. If any of these Zunes are still working on December 31, 2012, it will happen again. Microsoft’s “fix” was to tell its customers to let the battery run down until the Zune shut itself off, then recharge and restart it after the new year. With the hardware-supplied
days counter at a different value, the infinite loop was avoided.
If you run the program as I’ve presented it above, it will go into an infinite loop when it tries to determine the year for Day 366. You’ll have to kill the process some other way (Control-C on Unix-y machines) to get it to stop.
The Zuneboards page has some comments in which it is suggested that the loop can be tweaked with a
break statement to get out of the loop on the last day of a leap year. While that might work, it adds more complexity to the algorithm and makes it harder to follow. A better solution would be to simplify the algorithm like so:
1: #!/usr/bin/python 2: 3: def daysInYear(y): 4: if (y % 4 == 0 and y % 100 != 0) or y % 400 == 0: 5: return 366 6: else: 7: return 365 8: 9: def getYear(days): 10: year = 1980 11: days -= daysInYear(year) 12: while days > 0: 13: year += 1 14: days -= daysInYear(year) 15: return year 16: 17: # We'll test on: 18: # Jan 01, 1980 19: # Dec 30, 1980 20: # Jan 01, 1981 21: # Dec 31, 1981 22: # Jan 01, 1982 23: # Jan 01, 2009 24: # Dec 31, 1980 25: # Dec 31, 2008 26: testDays = [1, 365, 366+1, 366+365, 366+365+1, 10593+1, 366, 10593] 27: for d in testDays: 28: print "%5d: %4d" %(d, getYear(d))
Now the loop in
getYear stops when the
days counter is exhausted (zero or negative) by the previous cycle (or by Line 11 if it’s 1980). This loop will always stop because there are no conditionals within it—
days is decremented every cycle and will eventually be zero or negative.
Instead of an
isLeapYear function, we now have a
daysInYear function, which goes through basically the same logic, but returns 366 or 365 instead of True or False. (I’ve also taken the liberty of rewriting the leap year logic to a more common and easily read form—a Python version of the leap year code found in Kernighan and Ritchie. Why a C programmer wouldn’t use the well-tested K&R code is beyond me.)
If you run this script, you’ll find that it has no trouble with the last day of leap years. I also think it’s easier to read than the original.
There’s nothing especially brilliant about this corrected code. I’m simply following the ideas laid out by Nachum Dershowitz and Edward M. Reingold in their well known “Calendrical Calculations” papers, books, and software. They use a similar day counter, except that it starts on January 1 of the Year 1 instead of 19801. And instead of cycling through every year from Year 1, they
- make a quick (under)estimate of the year by doing an integer division of the day number by 366, and
- cycle forward from that estimate to get the precise year value.
Step 1 isn’t necessary; it just speeds things up by eliminating about 2000 cycles for dates near the present. To see exactly how they do it, go to the software link and look for the
gregorian-from-absolute function. How’s your Lisp?
The Freescale programmers made two mistakes that even amateurs like me should be embarrassed by. First, they didn’t bother to discover that there were well known, public domain solutions for their problem. Second, they didn’t test their code. Clearly, the dates on which edge case problems will arise are going to be the first and last days of leap and non-leap years. Had they run a test case against the last day of any leap year, they would have found the bug immediately.
Yes, of course they know there was no Gregorian calendar in the Year 1. They use a “proleptic” Gregorian calendar, calculating dates as if the Gregorian calendar were in existence throughout the Common Era. Read their work to see how they handle it. ↩︎