How I lost 50 lbs with the new Twitter API
April 10, 2009 at 12:05 AM by Dr. Drang
Twitter just changed its API, which screwed up my Twitter client, Dr. Twoot. I was forced to rethink and rewrite an important section of the code, which pissed me off. But now that I’ve done it—and the changes didn’t take that long to make—I think Dr. Twoot is better than it was: more fault-tolerant and in better shape to have new features added later. In this post, I’ll discuss the changes Twitter made to its API and how I fixed my code to adjust to those changes.
When I started up Dr. Twoot this morning, I was greeted with far more tweets than I’m used to. Instead of getting tweets from just the past 24 hours, I got about three days of my friends’ updates and replies to me that were months old. As I scrolled down to see the recent tweets, this one from Daniel Sandler caught my eye:
Hmm. Recent @twitterapi change (no more POST; only GET) deals brdfdr a critical hit. Healing now.
So I started looking at the new API. It turned out that the GET/POST change wasn’t the culprit in Dr. Twoot’s weird new behavior; it was Twitter’s decision to eliminate from the API a parameter that Dr. Twoot, like its predecessor Twoot, relied on to set the maximum age of the tweets it downloads.
Interaction with the Twitter API is through HTTP queries. For example, to get the most recent 20 tweets in your friends’ timeline (what you see when you go to your Twitter home page) you send this request:
http://twitter.com/statuses/friends_timeline.json
and Twitter will respond with the set of 20 tweets in JSON format. Dr. Twoot downloads tweets by running a variant of this URL through jQuery’s getJSON function. You can do the same thing from the command line via curl. Open the Terminal and execute
curl -u name:password http://twitter.com/statuses/friends_timeline.xml
with name and password replaced by your Twitter login credentials, and you’ll see your Terminal window fill with that same data, this time in XML format.
You can alter the results by adding parameters to the end of the URL. For example,
curl -u name:password 'http://twitter.com/statuses/friends_timeline.xml?count=40'
will get you 40 tweets instead of the default 20. And before today, you could do something like
curl -u name:password 'http://twitter.com/statuses/friends_timeline.xml?count=200&since=Wed%2C%2008%20Apr%202009%2010:00:00%20GMT'
where that mess at the end is the URL-encoded form of “Wed, 08 Apr 2009 10:00:00 GMT.” (The quotes around the URL keep the shell from treating the & as its own operator.) This would get the tweets since that time, up to a maximum of 200.
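If you'd rather not encode that timestamp by hand, JavaScript can do it. This is only a sketch: sinceParam is a hypothetical helper, not something from Dr. Twoot. Note that encodeURIComponent also turns the colons into %3A, which the hand-encoded form above leaves alone; both decode to the same string.

```javascript
// Hypothetical helper (not part of Dr. Twoot): build the old-style
// `since` value for a given moment.
function sinceParam(date) {
  // toUTCString() yields e.g. "Wed, 08 Apr 2009 10:00:00 GMT".
  return encodeURIComponent(date.toUTCString());
}

// 10:00 GMT on April 8, 2009. Months are zero-based in Date.UTC.
const when = new Date(Date.UTC(2009, 3, 8, 10, 0, 0));

const url = 'http://twitter.com/statuses/friends_timeline.xml' +
            '?count=200&since=' + sinceParam(when);
console.log(url);
```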
But no more. The since parameter, which could be used with the friends’ timeline, the user’s timeline, and replies (now called “mentions”), was eliminated. If you issue that command today, you’ll get 200 tweets; the since part is ignored.
Dr. Twoot used the since parameter in two ways:
- On startup, it would request the friends’ timeline and the replies with a since set to 24 hours before the current time.
- At every refresh (set to three-minute intervals by default) it would request the friends’ timeline and the replies with since set to the time of the last request.
Without since, Dr. Twoot’s requests were no longer limited to tweets after a certain date and time, which is why it downloaded and displayed so many tweets when I started it this morning.
The closest analog to the since parameter is since_id. Executing something like
curl -u name:password 'http://twitter.com/statuses/friends_timeline.xml?count=200&since_id=1488170000'
will get the tweets that came after the one with a message id of 1488170000, up to a maximum of 200.
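To make the combined effect of since_id and count concrete, here is a toy model of the selection. It is an assumption about server behavior drawn only from the description above, not Twitter's code; selectTweets and the sample ids are made up for illustration.

```javascript
// Toy model (an assumption, not Twitter's implementation): keep tweets
// strictly newer than sinceId, newest first, capped at count.
function selectTweets(timeline, sinceId, count) {
  return timeline
    .filter(t => sinceId == null || t.id > sinceId) // strictly after since_id
    .sort((a, b) => b.id - a.id)                    // newest (highest id) first
    .slice(0, count);                               // at most `count` tweets
}

// Made-up sample timeline, deliberately out of order.
const timeline = [
  {id: 1488170002}, {id: 1488169999}, {id: 1488170000}, {id: 1488170001}
];

const picked = selectTweets(timeline, 1488170000, 200);
console.log(picked.map(t => t.id)); // [ 1488170002, 1488170001 ]
```

The tweet whose id equals since_id is excluded, which is what lets a client pass in the id of the last tweet it already has.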
[As near as I can tell, the message id is simply a counter. If so, Twitter has processed almost 1.5 billion tweets. Maybe the counter didn’t start at 1? Well, if you execute
curl http://twitter.com/statuses/show/86.xml
you’ll see a tweet by Biz Stone from March 21, 2006. For some reason, numbers lower than 86 return errors, but it’s pretty clear the message id started at or very near 1.]
To use since_id instead of since, I had to change the logic of Dr. Twoot.
- On startup, it gets the last 100 tweets from the friends’ timeline and gets all the mentions (née replies) since the earliest of those.
- At every refresh it gets the friends’ timeline and mentions since the latest tweet retrieved.
The code required to do this is a bit more involved than the code that used since. This excerpt shows the new code.
8: // The initial update looks back COUNT updates in your friends' timeline. Must be <= 200.
9: var COUNT = 100;
10: // The id of the most recently retrieved update.
11: var LAST_UPDATE;
12: // The times, in milliseconds, between status refreshes and timestamp recalculations.
13: var REFRESH = 3*60*1000;
14: var RECALC = 60*1000;
15: // The id of the message you are replying to or retweeting.
16: var MSG_ID;
17: // The twitter URLs for getting tweets.
18: var BASE_URL = {'friends' : 'http://twitter.com/statuses/friends_timeline.json',
19:                 'mentions': 'http://twitter.com/statuses/mentions.json',
20:                 'directs' : 'http://twitter.com/direct_messages.json',
21:                 'mine'    : 'http://twitter.com/statuses/user_timeline.json'};
22:
23: $.fn.gettweets = function(){
24:   return this.each(function(){
25:     var list = $('ul.tweet_list').appendTo(this);
26:     var friendsURL = BASE_URL['friends'] + '?count=' + COUNT;
27:     var mentionsURL = BASE_URL['mentions'] + '?count=' + COUNT;
28:     if (LAST_UPDATE != null) friendsURL += "&since_id=" + LAST_UPDATE;
29:
30:     $.getJSON(friendsURL, function(friends){
31:       if (LAST_UPDATE != null) mentionsURL += "&since_id=" + LAST_UPDATE;
32:       else mentionsURL += "&since_id=" + friends[friends.length - 1].id;
33:
34:       $.getJSON(mentionsURL, function(mentions){
35:         friends = $.merge(friends, mentions);
36:         if (friends.length > 0) LAST_UPDATE = friends[0].id;
The gettweets function is called on startup and on each refresh. COUNT is the number of tweets initially downloaded; I’ve set it, somewhat arbitrarily, at 100. LAST_UPDATE holds the message id of the last successfully downloaded tweet. Because it has no initial value, the conditionals in Lines 28 and 31 are, in effect, tests for whether gettweets is being called on startup or on a refresh.
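The startup-versus-refresh decision can be pulled out into plain functions that are easy to check without touching the network. This is a sketch, not the code in the repository: friendsUrl and mentionsUrl are hypothetical names, and the guard for an empty friends' timeline is an addition that the excerpt does not have.

```javascript
// Same constants as the excerpt, trimmed to the two URLs used here.
const COUNT = 100;
const BASE_URL = {
  friends:  'http://twitter.com/statuses/friends_timeline.json',
  mentions: 'http://twitter.com/statuses/mentions.json'
};

// Hypothetical helper: lastUpdate is null/undefined on startup,
// a message id on a refresh.
function friendsUrl(lastUpdate) {
  let url = BASE_URL.friends + '?count=' + COUNT;
  if (lastUpdate != null) url += '&since_id=' + lastUpdate;
  return url;
}

// Hypothetical helper: on startup, look back to the oldest friend tweet
// just downloaded (the last element, since tweets arrive newest first).
function mentionsUrl(lastUpdate, friends) {
  let url = BASE_URL.mentions + '?count=' + COUNT;
  if (lastUpdate != null) {
    url += '&since_id=' + lastUpdate;
  } else if (friends.length > 0) {
    url += '&since_id=' + friends[friends.length - 1].id;
  }
  // Added guard (an assumption): with no friend tweets at all,
  // send no since_id rather than indexing into an empty array.
  return url;
}

console.log(friendsUrl(null));
console.log(mentionsUrl(null, [{id: 30}, {id: 20}, {id: 10}]));
```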
You can get all the code at its GitHub repository. Don’t worry if you don’t use Git; you can simply download the latest version of the code as a zip file.
The new code works better when Twitter is acting flaky. (Twitter acting flaky? Surely I jest!) For example:
Say Dr. Twoot is on a schedule to refresh at 10:00, 10:03, 10:06, etc. And suppose Twitter fails to respond at the 10:03 refresh. Under the old logic, the 10:06 refresh would look back only as far as 10:03 and would miss tweets between 10:00 and 10:03. This was not simply a theoretical problem; it had happened many times, especially in the past few days, as the fail whale has been making regular appearances. Under the new logic, Dr. Twoot’s requests always go back to the last downloaded tweet—Twitter can fail many refreshes in a row, but when it finally comes back all the tweets since the last will be downloaded.
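The 10:00/10:03/10:06 scenario can be played out in a small simulation. Everything here is made up for illustration (the tweet list, the minute offsets, the function names); it only models the two bookkeeping strategies described above, assuming everything through 10:00 has already been downloaded.

```javascript
// Made-up tweets, posted at minute offsets after 10:00; ids rise with time.
const tweets = [
  {id: 1, minute: 1}, {id: 2, minute: 2},  // between 10:00 and 10:03
  {id: 3, minute: 4}, {id: 4, minute: 5}   // between 10:03 and 10:06
];
// The 10:03 refresh fails; the 10:06 refresh succeeds.
const refreshes = [{minute: 3, ok: false}, {minute: 6, ok: true}];

// Old logic: ask for tweets since the time of the last request,
// whether or not that request succeeded.
function runTimeBased(tweets, refreshes) {
  const seen = [];
  let since = 0;
  for (const r of refreshes) {
    if (r.ok) {
      for (const t of tweets) {
        if (t.minute > since && t.minute <= r.minute) seen.push(t.id);
      }
    }
    since = r.minute; // the clock advances even after a failure
  }
  return seen;
}

// New logic: ask for tweets after the last id actually downloaded.
function runIdBased(tweets, refreshes) {
  const seen = [];
  let lastId = 0;
  for (const r of refreshes) {
    if (!r.ok) continue; // a failure leaves lastId untouched
    const fresh = tweets.filter(t => t.id > lastId && t.minute <= r.minute);
    for (const t of fresh) seen.push(t.id);
    if (fresh.length > 0) lastId = Math.max(...fresh.map(t => t.id));
  }
  return seen;
}

console.log(runTimeBased(tweets, refreshes)); // [ 3, 4 ]
console.log(runIdBased(tweets, refreshes));   // [ 1, 2, 3, 4 ]
```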
Another advantage of the new logic is that it should be easy to extend to include direct messages. The API for direct messages never included the since parameter, which meant that mixing direct messages into the stream displayed by Dr. Twoot would be awkward, so awkward I didn’t even try it. But the direct message API does support the since_id parameter, so the code for direct messages should parallel the existing code for the friends’ timeline and mentions. I don’t think I’m being unrealistic in believing that direct messages can be added with fewer than 20 lines of code.
In summary, then, my initial anger at the new Twitter API was misplaced. Now that I’ve used it, my code is better, my teeth whiter, my hair glossier, and my cholesterol lower.
Update 4/10/09
Looking through Twitter’s list of direct message elements, I see that the awkwardness of mixing direct messages with regular tweets comes not from the lack of a since parameter (in fact, there may well have been a since parameter for downloading direct messages; I can’t find an older version of the API to check), but from the difference in element names between direct messages and regular tweets. In particular, the sender of a regular tweet is identified by the user.id element, whereas the sender of a direct message is identified by the sender_id. Differences like this make it still too much of a hassle for me to bother incorporating direct messages into Dr. Twoot.
Since very few direct messages come my way (the last was about two months ago), and I get an email notification when they do, their absence makes little difference to me.
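For anyone who does want to mix the two, one way to paper over the naming difference is to map both kinds of item onto a common shape before display. This is a sketch and not something Dr. Twoot does: normalize is a hypothetical function, the sample objects are invented, and the only field names taken from the API are user.id and sender_id as mentioned above; id and text are assumed to exist on both.

```javascript
// Hypothetical adapter: give regular tweets and direct messages the
// same shape so display code need not care which one it has.
function normalize(item) {
  // A direct message carries sender_id at the top level;
  // a regular tweet nests the sender under user.id.
  const isDirect = item.sender_id !== undefined;
  return {
    id: item.id,
    text: item.text,
    senderId: isDirect ? item.sender_id : item.user.id,
    kind: isDirect ? 'direct' : 'tweet'
  };
}

// Invented sample data.
const tweet  = {id: 101, text: 'a regular tweet', user: {id: 7}};
const direct = {id: 55, text: 'a direct message', sender_id: 9};

console.log(normalize(tweet));
console.log(normalize(direct));
```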