Settling my hash

Today I ran across a nasty bug that affects several of my TextExpander URL snippets. Fixing the bug was easy—it took exactly one keystroke—but I was surprised by two things:

  1. I hadn’t run into this bug before.
  2. It was a bug at all.

The bug was in a Python script used by all my URL shortening snippets. The script is called escapeurl, and it used to look like this:

python:
1:  #!/usr/bin/python
2:  
3:  from urllib import quote
4:  from sys import argv
5:  
6:  if argv[1]:
7:      print quote(argv[1],'/:#')

The purpose of the script is to percent-encode the URL before sending it off to Metamark for shortening. The bug was having the hash character in the second argument to quote in Line 7. That argument is a string of the characters that aren’t to be encoded, beyond the letters, digits, and handful of punctuation marks that quote always leaves alone.

Metamark apparently wants that hash encoded. Changing Line 7 to

python:
7:      print quote(argv[1],'/:')

fixed the problem.
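
The effect of that one keystroke is easy to see in isolation. This is only a minimal sketch, assuming Python 3, where quote lives in urllib.parse instead of urllib; the example.com address and the variable names are made up for illustration.

```python
from urllib.parse import quote

# Hypothetical URL containing a hash, for illustration only.
url = 'http://example.com/#!/some/page'

# Old behavior: '#' is in the safe string, so it passes through unencoded.
with_hash_safe = quote(url, safe='/:#')

# Fixed behavior: '#' is no longer safe, so it becomes %23.
with_hash_encoded = quote(url, safe='/:')

print(with_hash_safe)     # http://example.com/#%21/some/page
print(with_hash_encoded)  # http://example.com/%23%21/some/page
```

In both cases the exclamation point is encoded as %21, because it was never in the safe string.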

I ran into the bug when shortening a Twitter URL. Some time ago—possibly with the rollout of “New Twitter”—Twitter changed the URLs of individual tweets from this format

http://twitter.com/drdrang/status/32691848535351296

to this format

http://twitter.com/#!/drdrang/status/32691848535351296

The addition of the #! part is in keeping with Google’s recommendations for making Ajax pages “crawlable.” I first noticed this after forking the blackbirdpy repository; Jeff Miller’s original, written before Twitter changed the format, didn’t allow for the #! part in the URL, and I had to change a regex to get it to work.
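
I haven’t reproduced blackbirdpy’s actual regex here. What follows is only a sketch of the kind of change involved, assuming Python 3; the pattern and the tweet_id function are hypothetical and not taken from either version of the repository.

```python
import re

# Hypothetical pattern, not the one in blackbirdpy: the "#!/" segment is
# optional, so both the old and the new tweet URL formats match.
TWEET_URL = re.compile(r'^https?://twitter\.com/(?:#!/)?(\w+)/status/(\d+)$')

def tweet_id(url):
    """Return the tweet ID as a string, or None if the URL doesn't match."""
    m = TWEET_URL.match(url)
    return m.group(2) if m else None

old_style = 'http://twitter.com/drdrang/status/32691848535351296'
new_style = 'http://twitter.com/#!/drdrang/status/32691848535351296'

print(tweet_id(old_style))
print(tweet_id(new_style))
```

Both calls pull out the same ID, because the only difference between the two formats is the optional group.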

I suppose the reason this bug hadn’t bitten me before today was that I hadn’t shortened any Twitter URLs since the format change.

As to why it’s a bug, I’m still not sure. Hashes are perfectly acceptable characters in URLs and shouldn’t need to be encoded; that’s why I put them in the “exclusion” string passed to quote. But I guess Metamark thinks otherwise. When you give its REST API a URL with an unencoded hash, it considers only the portion before the hash. This is tricky behavior because a shortened URL is returned—it just doesn’t point to the address you gave it. For a Twitter URL, the shortened version ends up pointing to the main Twitter page.
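
I don’t know how Metamark parses what it’s given, but the result is consistent with treating everything after the hash as a fragment, which is how a standard URL parser splits things up. A sketch of that split, assuming Python 3’s urllib.parse (the variable names are mine):

```python
from urllib.parse import quote, urlsplit, urlunsplit

tweet = 'http://twitter.com/#!/drdrang/status/32691848535351296'

# A standard parser treats everything after the '#' as the fragment.
parts = urlsplit(tweet)
print(parts.path)      # /
print(parts.fragment)  # !/drdrang/status/32691848535351296

# Dropping the fragment leaves only the main Twitter page.
without_fragment = urlunsplit(
    (parts.scheme, parts.netloc, parts.path, parts.query, ''))
print(without_fragment)  # http://twitter.com/

# With the hash percent-encoded, nothing is split off as a fragment.
encoded = quote(tweet, safe='/:')
print(urlsplit(encoded).fragment == '')  # True
```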

The Metamark service is run by the folks at perl.org, so it’s not surprising that it uses the Perl philosophy of “don’t complain, do something.” In this particular case, though, the something is not what you want. A little Python pedantry would have let me know about this bug earlier.

Anyway, the bug is fixed now, and I can shorten Twitter URLs to my heart’s content.