Archiving tweets

Last night, Justin Blanton wrote about this interesting IFTTT recipe that archives your tweets in a plain text file in Dropbox. This morning, people began tweeting about it as they realized how nice it would be to have an archive like that sitting on your computer, always up to date within a few minutes. Brett Terpstra wrote a variation on the recipe that put the entries into Markdown format. I wrote a little script to convert my ThinkUp archive into the format the IFTTT recipe uses, and now I have about 6,500 of my tweets in a Dropbox file.

I’ve been using Gina Trapani’s ThinkUp for quite a while to archive my tweets. It’s been rock solid for me, but I’ve been thinking about using a different method because:

  1. It’s overkill. I understand that many people need the statistics it provides, but I really just need a tweet archive.
  2. It’s inconvenient for searching. Log in, click to go to Tweets page, click to go to More page, click to go to Search. It would be far easier to search by grepping or acking a local file on the computer in front of me.
  3. It stores the tweets in a database rather than a plain text file. For ThinkUp’s purposes, a database is probably necessary, but for my limited needs, a text file is preferable because it’s more portable and easier to understand.

I’d been thinking about writing a script, to be run periodically by launchd, that would go get my recent tweets from Twitter and append them to a text file. I know how to write such a script; back in 2008, I wrote one to grab my tweets from the previous day and post them here.[1] A script that just appended the tweets to a file should be even easier. But I started thinking about things like what sort of information should be included in the file and what the best format would be. The perfect became the enemy of the good, and I never wrote the script.

When I saw the IFTTT recipe for archiving tweets—which was, by the way, written by Hugo (@hugovk on Twitter), not by Justin Blanton as several people think—I saw that it had all I really needed. The format wasn’t what I would have chosen, but so what? It’s clear and should be easy to parse. Here’s what the tail of my archive looks like now:

I believe @jblanton owes @ttscoff one (1) full day of productivity.
July 03, 2012 at 03:22PM
http://twitter.com/drdrang/status/220251096495570945
- - - - - 

@ttscoff @jaheppler If you can install @thinkup, it’ll go back and pull old tweets into a database.
July 03, 2012 at 03:27PM
http://twitter.com/drdrang/status/220252567698018304
- - - - - 

@thanland @jaheppler Yep. About 1,000 of my pearls of wisdom are lost to me forever. Oh, the humanity!
July 03, 2012 at 03:33PM
http://twitter.com/drdrang/status/220253916225474560
- - - - - 

@BenjaminBrooks @gruber I had 9 SunOS visitors. According to Google Analytics, every one of them had a beard longer than @jdalrymple’s.
July 03, 2012 at 04:10PM
http://twitter.com/drdrang/status/220263296400498691
- - - - - 
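Since every entry has the same shape (text, date line, URL, separator), reading the archive back into a program takes only a few lines. The following is a minimal sketch in Python 3, not part of the recipe; the function name `parse_archive` and the dictionary keys are hypothetical, and it assumes that no tweet contains a line consisting solely of the separator.

```python
import re
from datetime import datetime

# Separator line used by the recipe; trailing whitespace is tolerated.
SEPARATOR = re.compile(r"^- - - - -[ \t]*$", re.MULTILINE)
DATE_FORMAT = "%B %d, %Y at %I:%M%p"

def parse_archive(text):
    """Return a list of dicts with 'text', 'date', and 'url' keys."""
    tweets = []
    for block in SEPARATOR.split(text):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip empty or incomplete blocks
        tweets.append({
            # The last two lines are the date and URL; anything before
            # them is the (possibly multi-line) tweet text.
            "text": "\n".join(lines[:-2]),
            "date": datetime.strptime(lines[-2].strip(), DATE_FORMAT),
            "url": lines[-1].strip(),
        })
    return tweets

sample = """I believe @jblanton owes @ttscoff one (1) full day of productivity.
July 03, 2012 at 03:22PM
http://twitter.com/drdrang/status/220251096495570945
- - - - -

@ttscoff @jaheppler If you can install @thinkup, it'll go back and pull old tweets into a database.
July 03, 2012 at 03:27PM
http://twitter.com/drdrang/status/220252567698018304
- - - - -
"""

for tweet in parse_archive(sample):
    print(tweet["date"].isoformat(), tweet["url"])
```

The month name and AM/PM marker are parsed according to the current locale, so this assumes an English locale.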

The one problem with the IFTTT recipe is that it’s prospective only; it won’t go back and grab the tweets you posted before activating the recipe. But since I had most of my tweets[2] in ThinkUp, all I needed to do was get the tweets out of its database and into a text file in the right format.

That was actually pretty easy because I’d done a similar thing before. ThinkUp has a command that will export your tweets to your local hard disk in CSV format.

[Image: ThinkUp tweet export]

A CSV library is part of the standard Python distribution, so extracting the desired information and formatting it didn’t take much programming:

python:
 1:  #!/usr/bin/python
 2:  
 3:  import csv
 4:  import os
 5:  from datetime import datetime
 6:  import sys
 7:  
 8:  # Put your Twitter username here.
 9:  me = "drdrang"
10:  
11:  # Archive format.
12:  single = "%s\n%s\nhttp://twitter.com/" + me + "/status/%s"
13:  
14:  # Open the CSV file specified on the command line and read the field names.
15:  tfile = open(sys.argv[1])
16:  treader = csv.reader(tfile)
17:  fields = treader.next()
18:  
19:  # Fill a list with the tweets, with each tweet a dictionary.
20:  allInfo = []
21:  for row in treader:
22:    allInfo.append(dict(zip(fields,row)))
23:  
24:  # Collect only the info we need in a list of lists. Convert the date string
25:  # into a datetime object.
26:  tweets = [ [datetime.strptime(x['pub_date'], "%Y-%m-%d %H:%M:%S"), \
27:              x['post_id'], x['post_text']] \
28:              for x in allInfo ]
29:  
30:  # We put the date first so we can sort by date easily.
31:  tweets.sort()
32:  
33:  # Construct a new list of tweets formatted the way the IFTTT recipe does.
34:  out = [ single % \
35:          (x[2], x[0].strftime("%B %d, %Y at %I:%M%p"), x[1]) \
36:          for x in tweets ]
37:  
38:  print '\n- - - - -\n\n'.join(out)
39:  print '\n- - - - -'

Update 7/4/12
The original version of this script had a bug on Line 35. I had the hour code as %H (24-hour clock) instead of %I (12-hour clock) as IFTTT uses. If you used the original, as I did, your evening tweets will have stupid timestamps like “18:08PM.”

To fix this:

  1. Run the new version of the script without piping it to pbcopy. Note the last tweet.
  2. Open your ~/Dropbox/ifttt/twitter/twitter.txt file and delete all the tweets from the beginning through the one you just noted.
  3. Rerun the new version of the script and pipe the result to pbcopy.
  4. Paste the fixed tweets at the beginning of your ~/Dropbox/ifttt/twitter/twitter.txt file.

Sorry about the extra work this mistake caused.
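An alternative to the steps above would be to repair the bad timestamps in place, since a time like “18:08PM” still carries the correct hour. This is only a sketch in Python 3, not a tested tool; `fix_timestamps` and `to_12_hour` are hypothetical names, the sample entry is made up, and it assumes that no tweet text contains a line that looks exactly like a date line.

```python
import re

# A date line as written by the script, e.g. "July 03, 2012 at 18:08PM".
DATE_LINE = re.compile(
    r"^([A-Z][a-z]+ \d{2}, \d{4} at )(\d{2})(:\d{2}(?:AM|PM))[ \t]*$",
    re.MULTILINE,
)

def to_12_hour(match):
    """Rewrite the hour of one matched date line on a 12-hour clock."""
    hour = int(match.group(2)) % 12 or 12  # 0 -> 12, 13-23 -> 1-11
    return "%s%02d%s" % (match.group(1), hour, match.group(3))

def fix_timestamps(text):
    """Return text with every date line's hour converted to 12-hour form."""
    return DATE_LINE.sub(to_12_hour, text)

# A made-up entry with the bad 24-hour timestamp.
broken = (
    "An evening tweet\n"
    "July 03, 2012 at 18:08PM\n"
    "http://twitter.com/drdrang/status/0\n"
    "- - - - -\n"
)
print(fix_timestamps(broken))
```

Hours from 01 through 12 pass through unchanged, so running this over the whole file should leave the entries IFTTT wrote correctly as they were.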

The script, called tu2ifttt, expects the exported CSV file to be its argument, and it prints the transformed (and much simplified) archive to standard out. I piped the output to the clipboard,

python tu2ifttt ~/Downloads/posts-drdrang-twitter.csv | pbcopy

and pasted it at the beginning of the twitter.txt file in my Dropbox folder that the IFTTT recipe had created.[3] I may have needed to add an extra empty line between the old tweets I’d just pasted in and the newer ones that IFTTT had archived.

If you have a ThinkUp archive and would like to add your old tweets to your new IFTTT archive, just change the drdrang in Line 9 of the script to your username and run the pipeline on your downloaded CSV file. You’ll have all your old tweets on the clipboard, ready for pasting.
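The script above is written for Python 2 (it uses `treader.next()` and the `print` statement). Under Python 3, the same transformation might look like the sketch below. It assumes the same `pub_date`, `post_id`, and `post_text` columns and the same date layout that the script above expects; the function name `convert` is hypothetical, and the two CSV rows are made up for illustration.

```python
import csv
import io
from datetime import datetime

def convert(rows, username):
    """Format ThinkUp CSV rows (dicts) as IFTTT-style archive text."""
    # Date first in each tuple so that sorting orders tweets chronologically.
    tweets = sorted(
        (datetime.strptime(r["pub_date"], "%Y-%m-%d %H:%M:%S"),
         r["post_id"], r["post_text"])
        for r in rows
    )
    entries = [
        "%s\n%s\nhttp://twitter.com/%s/status/%s\n- - - - -\n"
        % (text, date.strftime("%B %d, %Y at %I:%M%p"), username, post_id)
        for date, post_id, text in tweets
    ]
    # A blank line separates consecutive entries.
    return "\n".join(entries)

# Two made-up rows standing in for a ThinkUp export.
sample_csv = io.StringIO(
    "post_id,pub_date,post_text\n"
    '2,2012-07-03 18:08:00,"Second tweet, posted in the evening"\n'
    "1,2012-07-03 09:15:00,First tweet\n"
)
print(convert(csv.DictReader(sample_csv), "drdrang"), end="")
```

For a real export, the `io.StringIO` object would be replaced by the downloaded file, opened with `open(path, newline="")` as the `csv` module documentation recommends.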


So now I have a 1.2 MB file in my Dropbox folder that contains over 6,500 tweets of mine. I can quickly find that tweet from when my wife was watching Downfall by searching for “Hitler.”

My wife is watching “Downfall.” She’s mad because every time I hear Hitler yelling I start laughing.
  — Dr. Drang (@drdrang) Wed Jul 14 2010

Which is of vital importance.
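grep or ack will find the matching line, but because each tweet spans several lines, it can be handy to get the whole entry back, date and URL included. Here is a minimal sketch of that in Python 3; `find_tweets` is a hypothetical helper, and the sample text is taken from the tail of the archive shown earlier.

```python
import re

# Separator line between entries; trailing whitespace is tolerated.
SEPARATOR = re.compile(r"^- - - - -[ \t]*$", re.MULTILINE)

def find_tweets(archive_text, term):
    """Return whole entries (text, date, URL) containing term, ignoring case."""
    entries = (e.strip() for e in SEPARATOR.split(archive_text))
    return [e for e in entries if e and term.lower() in e.lower()]

sample = """I believe @jblanton owes @ttscoff one (1) full day of productivity.
July 03, 2012 at 03:22PM
http://twitter.com/drdrang/status/220251096495570945
- - - - -

@ttscoff @jaheppler If you can install @thinkup, it'll go back and pull old tweets into a database.
July 03, 2012 at 03:27PM
http://twitter.com/drdrang/status/220252567698018304
- - - - -
"""

for entry in find_tweets(sample, "thinkup"):
    print(entry)
    print()
```

Note that the match is against the entire entry, so a term that appears in the date or URL lines will also produce hits. To search the real archive, `sample` would be replaced by the contents of ~/Dropbox/ifttt/twitter/twitter.txt.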

The questions I need to answer now are:

  1. Should I trust IFTTT to keep running?
  2. Should I continue to use ThinkUp as a backup?
  3. Should I just write my own script for archiving each day’s tweets so I don’t have to rely on ThinkUp or IFTTT?

[1] Don’t judge me too harshly. People were doing that sort of thing back then. I shut it down in 2009.

[2] I don’t have all of my tweets in ThinkUp because Twitter won’t let any program collect more than the most recent 3,200 tweets, and by the time I started using ThinkUp I was already past 4,200.

[3] I had made my own copy of the IFTTT recipe and started it running before I went back and collected my older archive from ThinkUp.