July 6, 2012 at 12:52 AM by Dr. Drang
If I were a little less anally retentive, it wouldn’t have bothered me that the timestamps in the tweet archive I generated from ThinkUp were off. But I am and it did. Some of the timestamps were right, some were off by one hour, some by five hours, and some by six hours. It drove me crazy—not just to get the timestamps right, but to figure out why they were wrong.
Before we get into that, though, let me offer some advice to those of you who want to extract your old tweets from ThinkUp and put them in a form that’s compatible with future archiving via hugovk’s IFTTT recipe. If you’ve installed ThinkUp in the past year or so—after August or September of 2011—chances are that the CSV file you get when you export your tweets from TU to your local computer will have the tweet timestamps set to the UTC time zone. If this is the case, use Jan Marcel Gentil’s (@jgmarcel’s) modification of my tu2ifttt script. It’ll change the timestamps in the output of that script to reflect the time zone of your local computer (which is, presumably, the same time zone as the location given in your Twitter profile).
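For the simple all-UTC case, the shift amounts to something like the sketch below. This is not Jan’s code; it’s a minimal Python 3 illustration using only the standard library, and the function name `utc_string_to_local` and the sample timestamp are made up for the example.

```python
from datetime import datetime, timezone

def utc_string_to_local(stamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a naive timestamp string as UTC and shift it to this machine's zone."""
    # The exported string carries no zone, so we attach UTC explicitly.
    aware_utc = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    # astimezone() with no argument converts to the system's local time zone.
    return aware_utc.astimezone()

# Made-up sample timestamp, assumed to be in UTC.
local = utc_string_to_local("2012-07-06 05:52:00")
print(local.strftime("%Y-%m-%d %H:%M:%S %z"))
```

What gets printed depends on the time zone of the computer running it, which is the whole point: the instant stays the same and only the wall-clock representation changes.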
Unfortunately, I can’t use a script quite that simple to fix the timestamps in my tweet archive. Because I’ve been using ThinkUp since the spring of 2011, and ThinkUp has gone through some changes in how it handles timestamps since then, my tweet archive can be divided into three regimes:
- Tweets posted before March 13, 2011 use the time zone of the server that hosts my copy of ThinkUp. This happens to be US/Eastern, close to my home timezone, but not quite.
- Tweets posted on or after March 13 but before September 1, 2011 use my home time zone, US/Central.
- Tweets posted on or after September 1, 2011 use UTC.
How did I learn this? Well, it was obvious to me that the timestamps of the earliest tweets in my archive were either correct or within an hour of being correct.1 Probably the best example was this tweet,
Starting the TiVo-delayed Oscars now. Entering Twitter silence.
— Dr. Drang (@drdrang) Sun Feb 22 2009
which had a timestamp of “2009-02-22 21:25:38” in my ThinkUp archive. Since the 2009 Oscars were broadcast during the evening of February 22 in the US, but in the early morning hours of February 23, UTC, this timestamp had to be for Central or Eastern time.
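That reasoning can be checked with a few lines of Python. This is a sketch using Python 3’s standard `zoneinfo` module rather than pytz, with the IANA names `America/Chicago` and `America/New_York` standing in for US/Central and US/Eastern. It reads the archived timestamp under each candidate zone and shows what Central time that would imply.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")  # the zone pytz calls US/Central

# The three zones the archived timestamp might have been recorded in.
CANDIDATES = {
    "UTC": timezone.utc,
    "US/Eastern": ZoneInfo("America/New_York"),
    "US/Central": CENTRAL,
}

# The naive timestamp as it appears in the ThinkUp archive.
stamp = datetime.strptime("2009-02-22 21:25:38", "%Y-%m-%d %H:%M:%S")

def as_central(naive, zone):
    """Interpret a naive datetime in `zone`, then express it in Central time."""
    return naive.replace(tzinfo=zone).astimezone(CENTRAL)

for name, zone in CANDIDATES.items():
    print(name, "->", as_central(stamp, zone).strftime("%Y-%m-%d %H:%M:%S"))
# UTC -> 2009-02-22 15:25:38
# US/Eastern -> 2009-02-22 20:25:38
# US/Central -> 2009-02-22 21:25:38
```

If the stamp were UTC, the tweet would have gone out at 3:25 in the afternoon, Central time, well before an evening broadcast. Only the Eastern and Central readings put it in the evening.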
Since I almost never tweet between 1:00 am and 6:00 am, I searched through the ThinkUp archive—by which I mean the CSV export file—for a regex pattern that would give me timestamps purportedly in the wee hours. Because my wife and I have gone on a few overnight bicycle rides, some of the hits on this search were for tweets that really were posted at that time. But I soon found some that must have been posted at a significantly different time and then narrowed down the change to UTC timestamps to September 1, 2011. I assume this is the day I installed an updated version of ThinkUp.
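A search like that can be sketched in Python. The pattern here is a guess at the kind of regex that would do the job, not necessarily the one I typed into my editor, and the sample rows are invented. The field names `post_id`, `pub_date`, and `post_text` are the ones in the ThinkUp export.

```python
import csv
import io
import re

# Hypothetical pattern: a date followed by an hour from 01 through 05,
# i.e. a timestamp from 1:00:00 am up to, but not including, 6:00:00 am.
WEE_HOURS = re.compile(r"^\d{4}-\d{2}-\d{2} 0[1-5]:\d{2}:\d{2}$")

def wee_hour_rows(csv_file, date_field="pub_date"):
    """Return the rows whose timestamp falls in the wee hours."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if WEE_HOURS.match(row[date_field])]

# Invented sample data standing in for the real CSV export.
sample = io.StringIO(
    "post_id,pub_date,post_text\n"
    "1,2011-09-02 03:15:00,Suspicious\n"
    "2,2011-08-30 21:40:10,Evening\n"
    "3,2011-09-03 06:00:01,Just after six\n"
)
hits = wee_hour_rows(sample)
print([row["post_id"] for row in hits])
# ['1']
```

The hits still have to be checked by hand, since a match only says the timestamp is suspicious, not that it’s wrong.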
The change from Eastern to Central wasn’t as easy to find, but I did a kind of divide-and-conquer search on the tweets before September of last year, comparing a tweet’s time in the ThinkUp archive to that shown by Twitter when I went to that tweet’s page. It didn’t take more than 5-10 minutes to find that this
I confess I haven’t been paying attention to the NCAA this year, but what’s the deal with play-in games at the 11th and 12 seeds?
— Dr. Drang (@drdrang) Sun Mar 13 2011
was the first tweet to have a US/Central timestamp in the archive, and that all previous tweets were US/Eastern.
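The divide-and-conquer search is an ordinary bisection. I did it by hand, clicking through to Twitter, but the logic looks like the sketch below. The function name `first_switched` is made up, and so is the sample data, which pairs fake tweet IDs with the number of hours the archive timestamp differs from the time Twitter shows. It assumes the tweets are in chronological order and that the switch happened exactly once.

```python
def first_switched(tweets, is_switched):
    """Return the index of the first tweet for which is_switched() is true.

    Assumes every tweet before the changeover fails the test and every
    tweet from the changeover onward passes it.
    """
    lo, hi = 0, len(tweets)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_switched(tweets[mid]):
            hi = mid        # the changeover is at mid or earlier
        else:
            lo = mid + 1    # the changeover is after mid
    return lo

# Invented data: (tweet ID, hours the archive is ahead of Twitter's time).
# An offset of 1 means an Eastern timestamp; 0 means Central.
archive = [(101, 1), (102, 1), (103, 1), (104, 0), (105, 0)]

index = first_switched(archive, lambda tw: tw[1] == 0)
print(archive[index][0])
# 104
```

Each comparison halves the range, which is why a couple of thousand tweets took only a handful of lookups.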
Knowing the three timezones used in the archive, and knowing the ID numbers of the tweets at the changeovers, I was able to modify my tu2ifttt script to output a plain text version of all the tweets in the ThinkUp archive with the timestamps changed to reflect the US/Central time zone. Here’s the new version of the script:
python:
 1:  #!/usr/bin/python
 2:
 3:  import csv
 4:  import os
 5:  from datetime import datetime
 6:  import sys
 7:  import pytz
 8:
 9:  # Twitter username here.
10:  me = "drdrang"
11:
12:  # My ThinkUp archive has three regimes:
13:  #
14:  # 1. Tweets with the timestamp of the server running ThinkUp (US/Eastern).
15:  # 2. Tweets with the timestamp of my home (US/Central).
16:  # 3. Tweets with a UTC timestamp.
17:  #
18:  # With a little digging, I was able to determine the boundaries between
19:  # these regimes. With that information, I can adjust the times so that
20:  # all the output tweets will be set to US/Central.
21:
22:  serverTZ = pytz.timezone('US/Eastern')
23:  homeTZ = pytz.timezone('US/Central')
24:  utc = pytz.utc
25:  firstHome = 46785535590678528
26:  firstUTC = 109318305277427712
27:
28:  # Archive format.
29:  single = "%s\n%s\nhttp://twitter.com/" + me + "/status/%s"
30:
31:  # Open the CSV file specified on the command line and read the field names.
32:  tfile = open(sys.argv[1])
33:  treader = csv.reader(tfile)
34:  fields = next(treader)
35:
36:  # Fill a list with the tweets, with each tweet a dictionary.
37:  allInfo = []
38:  for row in treader:
39:      allInfo.append(dict(zip(fields,row)))
40:
41:  # Collect only the info we need in a list of lists. Convert the date string
42:  # into a datetime object that's time zone aware.
43:  tweets = []
44:  for tw in allInfo:
45:      if int(tw['post_id']) < firstHome:
46:          t = serverTZ.localize(datetime.strptime(tw['pub_date'], \
47:              "%Y-%m-%d %H:%M:%S")).astimezone(homeTZ)
48:      elif int(tw['post_id']) < firstUTC:
49:          t = homeTZ.localize(datetime.strptime(tw['pub_date'], \
50:              "%Y-%m-%d %H:%M:%S"))
51:      else:
52:          t = utc.localize(datetime.strptime(tw['pub_date'], \
53:              "%Y-%m-%d %H:%M:%S")).astimezone(homeTZ)
54:      tweets.append([t, tw['post_id'], tw['post_text']])
55:
56:  # We put the date first so we can sort on it easily.
57:  tweets.sort()
58:
59:  # Construct a new list of tweets formatted the way the IFTTT recipe does.
60:  out = [ single % \
61:          (x[2], x[0].strftime("%B %d, %Y at %I:%M%p"), x[1]) \
62:          for x in tweets ]
63:
64:  print('\n- - - - -\n\n'.join(out))
65:  print('- - - - -')
The time zone conversions were done using pytz, a non-standard but very useful library for dealing with dates and times in multiple time zones. If you program in Python and need to deal with time zones, install it.2
The big changes are in Lines 12-26, which set out the different zones we’ll encounter and where they switch over, and Lines 41-54, which do the conversions. The script runs slower than it did before because I exchanged a simple list comprehension for the loop and branching in Lines 44-54. Now it takes a couple of seconds to run instead of finishing in the blink of an eye.
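For readers without pytz, the branching in Lines 44-54 can be approximated with Python 3’s standard `zoneinfo` module. This is a sketch under assumptions, not a drop-in replacement for the script: the function name `to_home` is made up, the IANA names `America/New_York` and `America/Chicago` stand in for US/Eastern and US/Central, and the tweet ID `1` in the example is a placeholder, not a real ID. The two boundary IDs are the ones from the script.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

SERVER_TZ = ZoneInfo("America/New_York")  # US/Eastern
HOME_TZ = ZoneInfo("America/Chicago")     # US/Central
FIRST_HOME = 46785535590678528     # first tweet stamped in US/Central
FIRST_UTC = 109318305277427712     # first tweet stamped in UTC

def to_home(post_id, pub_date):
    """Return a tweet's archived timestamp as an aware US/Central datetime."""
    naive = datetime.strptime(pub_date, "%Y-%m-%d %H:%M:%S")
    if int(post_id) < FIRST_HOME:
        source = SERVER_TZ
    elif int(post_id) < FIRST_UTC:
        source = HOME_TZ
    else:
        source = timezone.utc
    # zoneinfo zones can be attached with replace(); pytz zones need
    # localize() instead, which is why the script above uses it.
    return naive.replace(tzinfo=source).astimezone(HOME_TZ)

# Placeholder ID below FIRST_HOME, so the stamp is read as US/Eastern.
early = to_home(1, "2009-02-22 21:25:38")
print(early.strftime("%Y-%m-%d %H:%M:%S"))
# 2009-02-22 20:25:38

# An ID at the UTC boundary, with a made-up UTC timestamp.
late = to_home(FIRST_UTC, "2011-09-01 17:00:00")
print(late.strftime("%Y-%m-%d %H:%M:%S"))
# 2011-09-01 12:00:00
```

Note that the September example comes out five hours behind UTC, not six, because daylight saving time is in effect. Handling that correctly is the reason to use a real time zone library instead of adding and subtracting fixed offsets.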
I used the output of this script acting on an up-to-date ThinkUp archive to replace my ~/Dropbox/ifttt/twitter/twitter.txt file, and now the IFTTT recipe is archiving newer tweets to that file. All the timestamps are consistent and all’s right with the world.
Before closing, I should mention this script by Chris Kinniburgh (@CKinniburgh). If you don’t have a ThinkUp archive to convert into a simple text file, his script will create the simple text file directly. Be aware of three things before you use it:
- Its output is in a Markdown-flavored format that’s similar to, but not the same as, that of Brett Terpstra’s IFTTT recipe.
- It may take forever to run because Twitter allows only 200 tweets to be downloaded in a single request, and it limits the number of requests you can make per hour.
- Regardless of how long the script runs, Twitter won’t allow it (or any script, including ThinkUp) to download more than the 3200 most recent tweets.
Update: This list originally had a fourth item, which said that the script didn’t appear to do any time zone conversion and that all the tweets it archived would have UTC timestamps. (I’m not a Ruby programmer, so I asked to be told if I’d missed something.) Chris’s script didn’t convert from UTC to local time initially, but now it does.
It’s a good thing I’ve lived in the same house throughout the time I’ve been on Twitter. If I had moved from one time zone to another, I probably would have felt obligated to make yet another adjustment to the timestamps.
In fact, I initially thought I didn’t have to correct the timestamps because I looked at the first dozen or so and it was obvious to me that they were US times, not UTC. ↩
Unlike some in the Python community, I don’t give a rat’s ass how you install it. You like easy_install? Fine with me. pip? That’s great. setup.py install? Go for it. In other words, if you think there’s One True Installer, post about it on your own blog. I will ruthlessly delete any comment prescribing “the best” Python library installer. And that goes for Ruby people who want to tell me about gem, too. ↩