Twoot, again

I’ve really been enjoying my customized version of Twoot (here’s its GitHub repository). It works the way I want it to and is easy to adjust when my needs change. Credit for that goes to:

This morning, I was confronted with a bug. @PhilGeek posted a link to a funny poem/research paper, but because there was no space between the link and the preceding text, Twoot screwed up the link URL.

I knew where to go: a gnarly regular expression that looks for URLish text and turns it into a link. It started like this:


It’s the \w+ that’s the problem. It collects all the consecutive “word” characters (letters, numbers, underscores) before the colon and two slashes, so it ran past the “http” and grabbed the “Seuss” before it.
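Here is a minimal JavaScript sketch of the greedy match in action. It uses a simplified stand-in pattern, \w+:\/\/\S+, and not Twoot’s full regex: only the \w+ opening is described above, so the \S+ tail is an assumption. The tweet text is also made up to mimic the missing space.

```javascript
// Simplified stand-in for the original pattern (assumed): a run of word
// characters, then "://", then any non-whitespace characters.
const generalURL = /\w+:\/\/\S+/;

// Hypothetical tweet text with no space before the link.
const tweet = "Dr. Seusshttp://example.com/poem.pdf";

// \w+ is greedy, so it swallows "Seuss" along with "http".
const match = tweet.match(generalURL);
console.log(match[0]); // Seusshttp://example.com/poem.pdf
```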

Obviously, the reason Peter Krantz wrote the regex that way was to be able to handle all URL types, not just http. I didn’t need or want such generality, so I changed the regex to


which would recognize only http URLs. I later decided this was too restrictive and changed the regex to


which allows http, https, and ftp URLs. The runaway collection of word characters has been eliminated, and Twoot can now handle links that don’t have a preceding space or punctuation character.
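A minimal sketch of how the narrower pattern behaves, again in JavaScript. The linkify function name and the \S+ tail are assumptions for illustration, not Twoot’s actual code. The point is only that a match can no longer begin before the scheme.

```javascript
// Assumed stand-in for the narrowed pattern: only http, https, or ftp,
// followed by "://" and non-whitespace. The \S+ tail is a simplification.
const narrowURL = /(?:https?|ftp):\/\/\S+/g;

// Hypothetical helper: wrap each recognized URL in an anchor tag.
function linkify(text) {
  return text.replace(narrowURL, (url) => `<a href="${url}">${url}</a>`);
}

console.log(linkify("Dr. Seusshttp://example.com/poem.pdf"));
// Dr. Seuss<a href="http://example.com/poem.pdf">http://example.com/poem.pdf</a>

console.log(linkify("get it via ftp://files.example.org/a.txt or https://example.org"));
```

Because the alternation requires the literal letters of a scheme, nothing in front of “http” or “ftp” can be absorbed into the match, which is why the missing space stops mattering.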