December 23, 2010 at 11:00 PM by Dr. Drang
Back in mid-October, Twitter warned developers that another integer overflow problem was looming and that they should prepare their code to deal with it. I didn’t heed the warning, and for the past couple of weeks, Dr. Twoot has been behaving strangely. I think I have things fixed now.
In which I get to the bottom of a longstanding Christmas mystery: http://xrl.us/bh9gtv
It has an ID of 8,903,561,186,381,824, just before the rollover. Here’s another from later that same day:
I don’t mind the traffic, I don’t mind the crowded stores, but I hate putting up outdoor Christmas lights.
Its ID is 9,012,004,878,557,184, just after the rollover.
Since that time, Twitter has implemented a new method of generating IDs called Snowflake, which isn’t sequential but does increase with later tweets, so sorting by ID still works as a chronological sort. This is important for Dr. Twoot, because it mixes mentions and the home timeline together and sorts them.
Interestingly, Dr. Twoot didn’t die when Twitpocalypse II² hit; it just started acting flaky under one particular set of conditions. Dr. Twoot updates every three minutes. If the last tweet in an update was a retweet, and there were no tweets in the three-minute span between that update and the next, that retweet would appear again in the next update. At slack times, the retweet would get repeated several times.
I can’t say I know exactly why this happened, but I’m sure it has to do with the interaction of two facts:
- The ID of the retweet is larger than that of the original tweet.
- All the IDs are getting rounded off in the floating point representation.
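The rounding in the second point is easy to see in isolation. This is only a sketch, assuming the IDs end up as ordinary JavaScript numbers (64-bit floating point); the variable names are made up for illustration and are not from Dr. Twoot.

```javascript
// Assumption: IDs are held as JavaScript numbers, i.e. 64-bit doubles,
// which can only count by ones up to 2^53.
var limit = Math.pow(2, 53);                  // 9007199254740992

// Above the limit, adjacent integers collapse onto the same double.
var neighborsCollide = (limit + 1 === limit); // true

// A (made-up) ID one past the limit does not survive a round trip.
var parsed = Number("9007199254740993");
var roundTripped = String(parsed);            // "9007199254740992"
```

Two different IDs that round to the same value will compare as equal, which is the kind of thing that could confuse a "have I already seen this one?" check.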
I lived with the repeated retweets for a while, but eventually decided enough was enough and implemented the changes recommended by Twitter. All references in the code to a message’s `id` property have been changed to `id_str`, a string that contains the message ID. The biggest change was in the sorting. I could no longer use a numerical sort of the IDs (because they weren’t numbers anymore), and I couldn’t do a string sort, either, because there was no guarantee that the ID strings would all be the same length. So I wrote a comparison function that accounted for that possibility.
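A comparison along those lines might look like the following. This is a minimal sketch and not necessarily the function in Dr. Twoot: the name `compareIdStr` is hypothetical, and it assumes the ID strings are plain decimal digits with no leading zeros.

```javascript
// Hypothetical helper: order two decimal ID strings as if they were
// numbers, without ever converting them to numbers.
// Assumption: digits only, no leading zeros.
function compareIdStr(a, b) {
  // With no leading zeros, the longer string is the larger number.
  if (a.length !== b.length) {
    return a.length - b.length;
  }
  // Same length: character-by-character order matches numeric order.
  if (a < b) { return -1; }
  if (a > b) { return 1; }
  return 0;
}

// Ascending sort of a few ID strings of different lengths.
var ids = ["9012004878557184", "999", "8903561186381824"];
ids.sort(compareIdStr);
// → ["999", "8903561186381824", "9012004878557184"]
```

A plain string sort would have put "999" last, because "9" sorts after "8" regardless of length. To sort message objects, the same function can be applied to their `id_str` properties, with the arguments swapped if the newest should come first.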
The new and improved Dr. Twoot seems to be working fine now. All the code can be found in its GitHub repository.
Sharp-eyed readers will see a couple of off-by-one errors in the post. The largest number that can be represented by n bits is 2^n - 1, not 2^n. The reason is zero. For example, three bits can represent eight numbers, but the numbers are 0, 1, 2, 3, 4, 5, 6, and 7. Eight doesn’t make the cut because zero pushes it out. When I’m just trying to figure out the order of magnitude, I don’t bother subtracting the one; but when I give a precise value, as I did in the post, I should.
If you spotted the mistake, reward yourself with an extra holiday cookie.
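For anyone who wants to check the arithmetic, here is a small sketch. The function name `maxUnsigned` is made up for illustration, and the result is only exact for bit counts up to 53, where JavaScript numbers still represent every integer.

```javascript
// Hypothetical helper: largest value representable in the given number
// of bits, counting from zero. Exact only for bits <= 53.
function maxUnsigned(bits) {
  return Math.pow(2, bits) - 1;
}

var threeBits = maxUnsigned(3);       // 7
var fiftyThreeBits = maxUnsigned(53); // 9007199254740991
```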
1. In my post about why Dr. Twoot survived the first Twitpocalypse, I said it was because Dr. Twoot treated IDs as strings instead of integers. That was both wrong and a bit of foreshadowing for Dr. Twoot’s recent problems.
2. It took every bit of inner strength I could muster to avoid adding “Electric Boogaloo” to the title of this post. I hope you appreciate it.