Smart quotes in JavaScript

With my Twitter client Dr. Twoot back on its feet, I wanted to add a smart quotes feature that would give my tweets typographically correct quotation marks and apostrophes. I figured someone would have a JavaScript library or function just waiting for me to download and plug into my code, but alas, I ended up writing my own.

“‘It’s quoted!’ he said.”

A smart quotes feature is one that turns the ambidextrous, easily-typed straight quotation marks that originated (I think) with typewriters into the curved quotation marks that books and magazines use. It’s a feature common to word processors nowadays, but I first ran into it in the mid ’80s in David Dunham’s desk accessory text editor, miniWriter. According to Dunham—and I’m sure he’s right—this was the very first implementation of the smart quotes algorithm.

MiniWriter and the other programs that picked up the algorithm did the straight-to-curly replacement as you typed, something not available to me in the text field of Dr. Twoot. I’d be using batch processing to make the substitutions after the tweet was written but before it was sent off to Twitter for posting. The premier smart quotes batch processor is John Gruber’s SmartyPants, which also does replacements for dashes and ellipses. I couldn’t use SmartyPants in Dr. Twoot because it’s written in Perl, but I figured someone had done a similar implementation in JavaScript.

I found jsPrettify, which was much more elaborate than I needed, but looked promising. I pulled out the smart quote substitution function, made a few changes, and put it in Dr. Twoot. Unfortunately, the changes broke something, and my third tweet with the “improved” code was this embarrassment, with an open single quotation mark where an apostrophe should be:

@PhilGeek A good tip. Here‘s another: when a guy on UTC is starting tomorrow it’s time for me to end today.

12:34 AM Thu Nov 18, 2010

I decided to abandon jsPrettify and do something simpler:

// Change straight quotes to curly and double hyphens to em-dashes.
function smarten(a) {
  a = a.replace(/(^|[-\u2014\s(\["])'/g, "$1\u2018");       // opening singles
  a = a.replace(/'/g, "\u2019");                            // closing singles & apostrophes
  a = a.replace(/(^|[-\u2014/\[(\u2018\s])"/g, "$1\u201c"); // opening doubles
  a = a.replace(/"/g, "\u201d");                            // closing doubles
  a = a.replace(/--/g, "\u2014");                           // em-dashes
  return a;
}

Can five regex substitutions take the place of SmartyPants’s 750 lines of Perl? Of course not, but SmartyPants does a lot of things I don’t need. Most important, because it deals with HTML, SmartyPants has to distinguish between the straight quotes that have to stay straight—for tag attributes, and inside <code> blocks—and those that should get curled. I don’t have to worry about that, nor do I have to worry about deeply nested quotes within quotes. Tweets just don’t get that complex. Also, because I’m working in the UTF-8 environment of the web, I don’t have to work around HTML entities.
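As a quick sanity check, here is a sketch that runs the five substitutions on a few short strings. The function is repeated so the block runs on its own; the sample strings are illustrative and not taken from Dr. Twoot. The last call shows a limit of this simple approach: an apostrophe at the start of a word is treated as an opening quote.

```javascript
// Same five substitutions as above, repeated so this block is self-contained.
function smarten(a) {
  a = a.replace(/(^|[-\u2014\s(\["])'/g, "$1\u2018");        // opening singles
  a = a.replace(/'/g, "\u2019");                             // closing singles & apostrophes
  a = a.replace(/(^|[-\u2014\/\[(\u2018\s])"/g, "$1\u201c"); // opening doubles
  a = a.replace(/"/g, "\u201d");                             // closing doubles
  a = a.replace(/--/g, "\u2014");                            // em-dashes
  return a;
}

// Nested quotes plus an apostrophe.
console.log(smarten(`"'It's quoted!' he said."`)); // “‘It’s quoted!’ he said.”

// Apostrophes in the middle of words become closing singles.
console.log(smarten("Here's another: it's time")); // Here’s another: it’s time

// Two hyphens become one em-dash (U+2014).
console.log(smarten("wait--what") === "wait\u2014what"); // true

// Known limit: a word-initial apostrophe gets an opening quote (U+2018).
console.log(smarten("the mid '80s") === "the mid \u201880s"); // true
```

The order of the substitutions matters: the opening marks are picked off first by looking at the character before them, and whatever straight quotes are left are assumed to be closing marks or apostrophes.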

At present, smarten doesn’t convert three periods to an ellipsis, mainly because my habit is to type the ellipsis directly with ⌥;. I may change my mind on that one; it’d be an easy addition to make.
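For what it's worth, the ellipsis conversion could look something like the sketch below. The function name smartenEllipses is hypothetical; inside smarten the same replace call would simply be one more line before the return.

```javascript
// Hypothetical helper: turn three consecutive periods into a single
// ellipsis character (U+2026). Not part of smarten as written.
function smartenEllipses(a) {
  return a.replace(/\.\.\./g, "\u2026");
}

console.log(smartenEllipses("Well... maybe") === "Well\u2026 maybe"); // true
```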

You’ll notice also that I get an em-dash from two hyphens, not three. By default, SmartyPants uses three hyphens for an em-dash and two for an en-dash, a convention first used by TeX and LaTeX. My typing habits go back further than that (my high school typing course was taught in a room full of actual typewriters [1]), and the convention I learned was to use two hyphens. Since I usually just type the em-dash directly (⌥⇧-), this conversion will seldom be done, but sometimes my fingers follow their old ways.
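For comparison, the three-hyphen convention could be sketched as follows. The function name smartenDashes is hypothetical and this is not what smarten does; it only illustrates that the longer pattern has to be replaced first.

```javascript
// Hypothetical alternative: three hyphens become an em-dash (U+2014),
// two hyphens become an en-dash (U+2013).
function smartenDashes(a) {
  a = a.replace(/---/g, "\u2014"); // em-dashes first, so "---" isn't split up
  a = a.replace(/--/g, "\u2013");  // whatever pairs remain are en-dashes
  return a;
}

console.log(smartenDashes("pages 10--20 --- roughly") === "pages 10\u201320 \u2014 roughly"); // true
```

If the two replacements were swapped, the first two hyphens of every "---" would be turned into an en-dash, leaving a stray hyphen behind.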

(Strictly speaking, I don’t need a smart quotes converter at all; I could just type the curly quotation marks directly. But that’s a habit I doubt I’ll ever develop.)

One last thing. To help me debug the smarten function, I made this simple page to run it through a test suite. I never worked up a comprehensive set of tests, but the tests that are there are probably good enough; it’s hard to get really complex in 140 characters. The page is set up to make new tests easy to add and run interactively.
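The test page itself isn't reproduced here, but the idea behind it can be sketched as a small table-driven check. Everything in this block is an assumption for illustration: runTests is a hypothetical helper, and dashesOnly is a stand-in for the function under test so the block runs by itself. In practice smarten would be passed in instead.

```javascript
// Hypothetical helper: run fn over [input, expected] pairs and
// return a list describing every mismatch.
function runTests(fn, cases) {
  const failures = [];
  for (const [input, expected] of cases) {
    const actual = fn(input);
    if (actual !== expected) {
      failures.push({ input, expected, actual });
    }
  }
  return failures;
}

// Stand-in for the function under test; it only converts double hyphens.
const dashesOnly = (s) => s.replace(/--/g, "\u2014");

// Adding a test means adding one more [input, expected] pair.
const cases = [
  ["wait--what", "wait\u2014what"],
  ["no dashes here", "no dashes here"],
];

const failures = runTests(dashesOnly, cases);
console.log(failures.length === 0 ? "all passed" : failures);
```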

  1. They were electric typewriters, at least. I’m not that old.