Dr. Twoot and Twitpocalypse II

Back in mid-October, Twitter warned developers that another integer overflow problem was looming and that they should prepare their code to deal with it. I didn’t heed the warning, and for the past couple of weeks, Dr. Twoot has been behaving strangely. I think I have things fixed now.

The original Twitpocalypse occurred back in 2009, when Twitter’s message ID numbers (the sequential number that uniquely identifies each tweet) first exceeded 2^31 (2,147,483,648), the largest number a signed 32-bit integer can handle. Several Twitter clients failed back then, but Dr. Twoot handled it with aplomb, because JavaScript uses more than 32 bits.[1]

But sometime in the past few weeks, the message IDs exceeded JavaScript’s integer limit. JavaScript uses a 64-bit floating point representation for all numbers. The mantissa is 53 bits long (52 bits of explicitly stored mantissa plus an implicit leading digit of 1), so the largest integer it can accurately represent is 2^53 (9,007,199,254,740,992). That’s the number that got rolled over in late November. Here’s a tweet of mine from November 28:

In which I get to the bottom of a longstanding Christmas mystery: http://xrl.us/bh9gtv

9:22 AM Sun Nov 28, 2010

It has an ID of 8,903,561,186,381,824, just before the rollover. Here’s another from later that same day:

I don’t mind the traffic, I don’t mind the crowded stores, but I hate putting up outdoor Christmas lights.

4:33 PM Sun Nov 28, 2010

Its ID is 9,012,004,878,557,184, just after the rollover.
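The rollover can be checked in any JavaScript console. This is just a quick sketch, not code from Dr. Twoot, and the variable names are made up for illustration. Past 2^53, consecutive integers start collapsing onto the same floating point value:

```javascript
// 2^53, the limit discussed above.
var limit = Math.pow(2, 53);

// Below the limit, neighboring integers are still distinct numbers.
var belowIsDistinct = (limit - 1) !== (limit - 2);   // true

// At the limit, adding 1 is lost to rounding: 2^53 + 1 is stored as 2^53.
var aboveCollapses = (limit + 1) === limit;          // true

// The two tweet IDs quoted above sit on either side of the limit.
var before = 8903561186381824 < limit;               // true
var after = 9012004878557184 > limit;                // true

console.log(belowIsDistinct, aboveCollapses, before, after);
```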

Since that time, Twitter has implemented a new method of generating IDs called Snowflake, which isn’t sequential but does increase with later tweets, so sorting by ID still works as a chronological sort. This is important for Dr. Twoot, because it mixes mentions and the home timeline together and sorts them.

Interestingly, Dr. Twoot didn’t die when Twitpocalypse II[2] hit; it just started acting flaky under one particular set of conditions. Dr. Twoot updates every three minutes. If the last tweet in an update was a retweet, and there were no tweets in the three-minute span between that update and the next, that retweet would appear again in the next update. At slack times, the retweet would get repeated several times.

[Image: Repeated retweets]

I can’t say I know exactly why this happened, but I’m sure it has to do with the interaction of two facts:

  1. The ID of the retweet is larger than that of the original tweet.
  2. All the IDs are getting rounded off in the floating point representation.
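The second fact is easy to demonstrate in isolation. The sketch below doesn’t reproduce Dr. Twoot’s bug, and the JSON is a made-up, stripped-down stand-in for what Twitter returns: only the id and id_str property names come from Twitter, and the ID values are hypothetical neighbors of the real one quoted earlier. It shows three different IDs collapsing into fewer distinct numbers once they’re parsed, while the string versions stay distinct:

```javascript
// Hypothetical response: three messages with consecutive IDs above 2^53.
var json = '[' +
  '{"id": 9012004878557184, "id_str": "9012004878557184"},' +
  '{"id": 9012004878557185, "id_str": "9012004878557185"},' +
  '{"id": 9012004878557186, "id_str": "9012004878557186"}' +
  ']';

var messages = JSON.parse(json);

// Count the distinct values seen for a given property.
function countDistinct(list, prop) {
  var seen = {};
  var count = 0;
  for (var i = 0; i < list.length; i++) {
    var key = String(list[i][prop]);
    if (!seen.hasOwnProperty(key)) {
      seen[key] = true;
      count++;
    }
  }
  return count;
}

var distinctNumbers = countDistinct(messages, 'id');      // fewer than 3
var distinctStrings = countDistinct(messages, 'id_str');  // still 3

console.log(distinctNumbers, distinctStrings);
```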

I lived with the repeated retweets for a while, but eventually decided enough was enough and implemented the changes recommended by Twitter. All references in the code to a message’s id property have been changed to id_str, a string that contains the message ID. The biggest change was in the sorting. I could no longer use a numerical sort of the IDs (because they weren’t numbers anymore), and I couldn’t do a string sort, either, because there was no guarantee that the ID strings would all be the same length. So I wrote a comparison function that accounted for that possibility:

    // Compare two numeric strings, returning a negative number, 0, or a positive
    // number. To be used for sorting message IDs.
    // We can't just subtract one ID from the other because the ID numbers have grown
    // beyond JavaScript's ability to parse them exactly. So we have to do a string
    // comparison that accounts for the possibility that the strings are of
    // different length.
    function cmpID (a, b) {
      if (a.length < b.length) return -1;
      else if (a.length > b.length) return 1;
      else return a.localeCompare(b);
    }
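To see why a plain string sort isn’t enough, here’s a small usage sketch. The cmpID function is repeated from above so the snippet runs on its own; the messages array and the 17-digit ID are invented for illustration.

```javascript
function cmpID (a, b) {
  if (a.length < b.length) return -1;
  else if (a.length > b.length) return 1;
  else return a.localeCompare(b);
}

// Hypothetical messages; only id_str matters here.
var messages = [
  {id_str: '9012004878557184'},
  {id_str: '12345678901234567'},   // made-up 17-digit ID, so it's the newest
  {id_str: '8903561186381824'}
];

// The default sort compares character by character, so the 17-digit ID
// lands first because '1' sorts before '8' and '9'.
var naive = messages.map(function (m) { return m.id_str; }).sort();

// Sorting with cmpID puts shorter IDs first, then compares equal-length ones.
var sorted = messages.slice().sort(function (m, n) {
  return cmpID(m.id_str, n.id_str);
}).map(function (m) { return m.id_str; });

console.log(naive);
console.log(sorted);
```

The naive sort puts the longest (and therefore largest) ID first, while the cmpID sort puts it last, which is the chronological order.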

The new and improved Dr. Twoot seems to be working fine now. All the code can be found in its GitHub repository.

Update 12/24/10
Sharp-eyed readers will see a couple of off-by-one errors in the post. The largest number that can be represented by n bits is 2^n - 1, not 2^n. The reason is zero. For example, three bits can represent eight numbers, but the numbers are 0, 1, 2, 3, 4, 5, 6, and 7. Eight doesn’t make the cut because zero pushes it out. When I’m just trying to figure out the order of magnitude, I don’t bother subtracting the one; but when I give a precise value, as I did in the post, I should.
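The corrected values are easy to check in a JavaScript console. This is a quick sketch; maxUnsigned is a hypothetical helper name, and it simply treats all n bits as value bits, as in the three-bit example:

```javascript
// Largest value that n bits can hold when all n bits count toward the value.
function maxUnsigned(n) {
  return Math.pow(2, n) - 1;
}

console.log(maxUnsigned(3));   // 7
console.log(maxUnsigned(31));  // 2147483647
console.log(maxUnsigned(53));  // 9007199254740991
```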

If you spotted the mistake, reward yourself with an extra holiday cookie.

  1. In my post about why Dr. Twoot survived the first Twitpocalypse, I said it was because Dr. Twoot treated IDs as strings instead of integers. That was both wrong and a bit of foreshadowing for Dr. Twoot’s recent problems. 

  2. It took every bit of inner strength I could muster to avoid adding “Electric Boogaloo” to the title of this post. I hope you appreciate it.