Fighting Twitter spam in Dr. Twoot
June 13, 2011 at 11:36 PM by Dr. Drang
Twitter spam takes many forms, but every spam tweet I’ve gotten in the past couple of weeks has looked like this:
The body of the tweet has been, apart from my handle and some whitespace, just a URL. I’ve been reporting the spammers right away, but I’m sure they just start up new accounts with different user names.
I decided to add automatic spam reporting to Dr. Twoot, my homemade Twitter client. The logic of the addition is pretty simple: A tweet is considered spam if
- the tweet consists of just my name and a URL, and
- I don’t follow the sender.
The function that decides whether a tweet is spam is just a few lines long:
javascript:
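function isSpam(body, sender, friendList) {
  // Matches a tweet that is nothing but my handle followed by a URL.
  var spamRE = new RegExp('^@' + SNAME + '\\s+' + 'https?://[^ \\n]+[^ \\n.,;:?!&\'"’”)}\\]]' + '\\s*$');
  // It's spam only if the sender isn't someone I follow.
  return body.match(spamRE) && ($.inArray(sender, friendList) == -1);
}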
It’s passed the body of the tweet, the user ID of the sender, and a list of the user IDs of people I follow. The regex that identifies a URL is reused from the htmlify function that turns URLs into clickable links; the friends list is generated from the friends/ids call to the Twitter API.
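To see how the two conditions play out, here is a minimal standalone sketch of the same test. It is not the code in Dr. Twoot: the handle assigned to SNAME and the user IDs are made up for illustration, the function is renamed looksLikeSpam, and Array.prototype.indexOf stands in for jQuery's $.inArray so the snippet runs without jQuery.

```javascript
// Hypothetical stand-in for the SNAME global used in Dr. Twoot.
var SNAME = 'drdrang';

// Same test as isSpam, but with indexOf in place of $.inArray
// so the sketch has no jQuery dependency.
function looksLikeSpam(body, sender, friendList) {
  var spamRE = new RegExp('^@' + SNAME + '\\s+' +
    'https?://[^ \\n]+[^ \\n.,;:?!&\'"’”)}\\]]' + '\\s*$');
  return spamRE.test(body) && friendList.indexOf(sender) === -1;
}

var friends = [101, 102, 103];   // made-up user IDs

// Handle plus URL from a stranger: flagged.
var bareLink = looksLikeSpam('@drdrang http://example.com/abc', 999, friends);
// Same tweet from someone on the friends list: not flagged.
var fromFriend = looksLikeSpam('@drdrang http://example.com/abc', 101, friends);
// Extra words before the URL: not flagged, because the regex is anchored.
var withText = looksLikeSpam('@drdrang look at http://example.com/abc', 999, friends);

console.log(bareLink, fromFriend, withText); // true false false
```

Because the pattern is anchored at both ends, any text besides the handle and the URL keeps a tweet from being flagged, which is why this check is so narrow.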
If a tweet is identified as spam, a notice is added to the end of the tweet, and the sender is reported through the report_spam API call. I suppose I could have Dr. Twoot just delete the offending tweet from the stream, but actually knowing that the spam is being found and reported seems more satisfying.
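The flag-and-report step might be sketched roughly like this. Everything here is an assumption made for illustration: the flagSpam name, the shape of the tweet objects, the wording of the notice, and the reportFn callback, which stands in for whatever code actually sends the report_spam request. The predicate is passed in so the sketch doesn't depend on isSpam or jQuery.

```javascript
// Hypothetical sketch: append a notice to each spam tweet and hand
// its sender to a reporting callback. Non-spam tweets pass through as-is.
function flagSpam(tweets, friendList, isSpamFn, reportFn) {
  return tweets.map(function (tweet) {
    if (!isSpamFn(tweet.body, tweet.sender, friendList)) {
      return tweet;
    }
    reportFn(tweet.sender);   // stand-in for the report_spam call
    return {
      body: tweet.body + ' [Reported as spam]',   // made-up notice text
      sender: tweet.sender
    };
  });
}

// Toy run with a simplified predicate and a reporter that just records IDs.
var reported = [];
var flagged = flagSpam(
  [{ body: '@drdrang http://example.com/abc', sender: 999 },
   { body: 'Lunch?', sender: 101 }],
  [101],
  function (body, sender, friends) {
    return /https?:\/\//.test(body) && friends.indexOf(sender) === -1;
  },
  function (sender) { reported.push(sender); }
);
```

Keeping the flagged tweet in the stream, rather than filtering it out, is what makes the notice visible.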
The notice was also helpful in debugging the code. As you no doubt guessed, the examples you see above were made by me, not by a real spammer. I created the drangspam account so I could test my new functions without waiting around for real spammers to appear.
This won’t catch the keyword spam that usually pops up right after you tweet about an iPhone or an iPad, and it won’t report bots like the delightful @ro_bot_dylan, who will tweet a Bob Dylan lyric to you if you mention Dylan in one of your tweets, but it will take care of these brain-dead random spams that have been getting under my skin.