ASCIIfying

I’ve been adding more automation to my static blog publishing workflow.[1] The scripts themselves are of no use to anyone else, but some bits and pieces may be of wider interest. For example, this morning I wrote a script using a library that converts Unicode strings to their nearest ASCII equivalent.

The script, written to be used as a Text Filter in BBEdit, automates the generation of header lines in the Markdown source code of a post. The header of this post, for example, looks like this:

Title: ASCIIfying
Keywords: python, programming
Date: 2014-10-19 22:10:00
Slug: asciifying
Link: http://www.leancrew.com/all-this/2014/10/asciifying/

I write the Title and Keywords lines as I start the post, using a simple BBEdit Clipping. But before I publish, I need the other lines. The Date is easy to generate using the datetime library. That’s also the library I use to generate the year and month portions of the Link URL. The tricky thing is automating the creation of the Slug, which also shows up in the Link.
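Here is a rough sketch of how the Date and Link lines could be assembled with datetime. It is written for Python 3 and is not the actual script; the `header_lines` name is made up for illustration, and the URL pattern is simply copied from the header shown above.

```python
from datetime import datetime

def header_lines(slug, now=None):
    """Return the Date, Slug, and Link header lines for a post."""
    now = now or datetime.now()
    date = now.strftime('%Y-%m-%d %H:%M:%S')
    # The year and month in the URL come from the same datetime object.
    link = 'http://www.leancrew.com/all-this/{:%Y/%m}/{}/'.format(now, slug)
    return ['Date: ' + date, 'Slug: ' + slug, 'Link: ' + link]

print('\n'.join(header_lines('asciifying', datetime(2014, 10, 19, 22, 10, 0))))
```

Passing the timestamp in explicitly, as in the last line, reproduces the three generated lines of this post's header; leaving it out uses the current time.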

Oh, it’s very easy to make a slug when the title is as simple as this one, but suppose we started with this:

Title: Çingleton/Montréal isn't done
Keywords: test

Non-ASCII characters are allowed in URLs, but they can be troublesome, and I prefer to avoid them. Also, we can’t have the slash in there, and the apostrophe ought to go, too. Finally, I don’t want any spaces, because they cause nothing but trouble in the file system, and I hate seeing %20 in a URL.

The function I settled on is this:

python:
1:  def slugify(u):
2:    "Convert Unicode string into blog slug."
3:    u = re.sub(u'[–—/:;,.]', '-', u)  # replace separating punctuation
4:    a = unidecode(u).lower()          # best ASCII substitutions, lowercased
5:    a = re.sub(r'[^a-z0-9 -]', '', a) # delete any other characters
6:    a = a.replace(' ', '-')           # spaces to hyphens
7:    a = re.sub(r'-+', '-', a)         # condense repeated hyphens
8:    return a

All of the lines are straightforward and obvious except the unidecode call in Line 4. That is the one function exported by the unidecode library, and it does the substitutions that make slugify generate strings that are much more useful than anything I could write with the standard encode and decode methods. My script turns that two-line header above into

Title: Çingleton/Montréal isn't done
Keywords: test
Date: 2014-10-19 21:31:22
Slug: cingleton-montreal-isnt-done
Link: http://www.leancrew.com/all-this/2014/10/cingleton-montreal-isnt-done/

which has a perfectly readable URL that includes nothing but lowercase ASCII characters, numerals, and hyphens.
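For comparison, here is a sketch of what a standard-library-only version might look like in Python 3. It swaps unidecode for `unicodedata.normalize` plus an ASCII encode that ignores whatever is left over, so it is not the function above; the `slugify_stdlib` name is hypothetical.

```python
import re
import unicodedata

def slugify_stdlib(title):
    """Approximate slugify without unidecode (narrower character coverage)."""
    s = re.sub('[–—/:;,.]', '-', title)           # replace separating punctuation
    s = unicodedata.normalize('NFKD', s)          # split letters from their accents
    s = s.encode('ascii', 'ignore').decode('ascii').lower()  # drop what isn't ASCII
    s = re.sub(r'[^a-z0-9 -]', '', s)             # delete any other characters
    s = s.replace(' ', '-')                       # spaces to hyphens
    return re.sub(r'-+', '-', s)                  # condense repeated hyphens

print(slugify_stdlib("Çingleton/Montréal isn't done"))  # cingleton-montreal-isnt-done
print(slugify_stdlib("Smørrebrød"))                      # smrrebrd
```

The first title works because Ç and é decompose into a plain letter plus a combining accent. The second shows the limitation: ø has no such decomposition, so it simply disappears, whereas unidecode, as I understand its tables, substitutes an o.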

The unidecode library is a Python port of a Perl module, and its documentation is sparse. If you want to know what it does and why it does it, go to Sean Burke’s writeup of his original Perl module, Text::Unidecode. It lays out his goals for the module, explains its limitations, and includes little gems like this:

I discourage you from being yet another German who emails me, trying to impel me to consider a typographical nicety of German to be more important than all other languages.

If you ever need to ASCIIfy some text, Text::Unidecode or one of its ports (here’s one for Ruby) will come in handy.


  1. “Static blog publishing workflow” may be the most jargon-filled four-word phrase I’ve ever written. 


Circling the drain with Drafts

A Mac-only solution isn’t very satisfying anymore, so last night’s post on using Services to create ⓒⓘⓡⓒⓛⓔⓓ, pǝddılɟ, and s̸t̸r̸u̸c̸k̸ ̸t̸h̸r̸o̸u̸g̸h̸ text felt incomplete. Combining the logic of the Python conversion scripts in that post with what I learned about using JavaScript in Drafts 4, I made three new keyboard scripts to convert selected characters in Drafts.

Each script works pretty much as you’d expect. Type in some normal text, select the portion you want converted, and tap the appropriate button. Voila!

[Image: Drafts encircler]

The script that does the encircling is this:

javascript:
 1:  function encircle(s) {
 2:    var pchars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
 3:      , cchars = "ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ⓪①②③④⑤⑥⑦⑧⑨"
 4:      , count  = pchars.length
 5:      , regex  = new RegExp('.', 'g')
 6:      , trans  = {}
 7:      , lookup = function(c) { return trans[c] || c; };
 8:    
 9:    for (var i=0; i<count; i++) {
10:      trans[pchars[i]] = cchars[i];
11:    }
12:    
13:    return s.replace(regex, lookup);
14:  }
15:  
16:  setSelectedText(encircle(getSelectedText()))

The structure of the encircle function is pretty much stolen wholesale from one of the answers to this Stack Overflow question. It takes advantage of a feature of JavaScript familiar to old Perl programmers: the replace method can take a function as the replacement argument. That function, lookup, is defined in Line 7 and the trans object it uses to replace characters is built in Lines 9–11. The regular expression in Line 5 that serves as the first argument to replace is very simple because all the work is done by lookup. Line 16 gets the selected text from the draft, converts it, and replaces the selection with the converted text.
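The function-as-replacement trick isn't unique to JavaScript and Perl. As a minimal sketch, here is the same idea in Python 3, where `re.sub` accepts a function in place of a replacement string. To keep it short, the table covers only lowercase letters and is built from code points (U+24D0 is ⓐ) instead of being typed in.

```python
import re

# Lowercase only, to keep the sketch short: a..z -> U+24D0..U+24E9
TRANS = {chr(ord('a') + i): chr(0x24D0 + i) for i in range(26)}

def encircle(s):
    # re.sub calls the function once per match and uses its return value
    # as the replacement; characters missing from TRANS pass through.
    return re.sub('.', lambda m: TRANS.get(m.group(0), m.group(0)), s,
                  flags=re.DOTALL)
```

Calling `encircle('circle it')` should return ⓒⓘⓡⓒⓛⓔ ⓘⓣ, with the space left alone because it has no entry in the table.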

Update 10/18/14
People who think I know what I’m doing with this stuff are so misguided. Nathan Grigg, an actual programmer who’s actually smart, pointed out on Twitter that the regular expression I was initially using was far more complicated than it needed to be. He was, of course, absolutely correct, so I’ve simplified it in both this script and the character flipping script below. The descriptions have been edited to match the new code. Thanks, Nathan!

The key that applies this script is built by tapping the pencil icon at the end of Drafts’ custom key strip, tapping the + button to add a new key, choosing the Script option, and then entering the label, name, and script. When you’re done, a new key with that label will appear in the strip.[1]

[Image: Creating the Encircle script key]

The flipping script is built the same way.

javascript:
 1:  function flip(s) {
 2:    var pchars = "abcdefghijklmnopqrstuvwxyz,.?!'(){}[]"
 3:      , fchars = "ɐqɔpǝɟƃɥıɾʞlɯuodbɹsʇnʌʍxʎz'˙¿¡,)(}{]["
 4:      , count  = pchars.length
 5:      , regex  = new RegExp('.', 'g')
 6:      , trans  = {}
 7:      , t      = s.toLowerCase()
 8:      , lookup = function(c) { return trans[c] || c; };
 9:    
10:    for (var i=0; i<count; i++) {
11:      trans[pchars[i]] = fchars[i];
12:    }
13:    var a = t.split("");
14:    a.reverse();
15:    return a.join("").replace(regex, lookup);
16:  }
17:  
18:  setSelectedText(flip(getSelectedText()))

Apart from the two strings that define the conversion, there are two other differences between this script and the encircler:

  1. Because there aren’t good upside-down versions of all the capital letters, everything is converted to lowercase first. That’s done in Line 7.
  2. To look decent in the flipped condition, the order of the letters has to be reversed. That’s done by creating an array of characters in Line 13, reversing it in Line 14, and then joining them back together in Line 15.

Finally, there’s the strikethrough script.

javascript:
function strikeout(unstruck) {
  var s = String.fromCharCode(824)   // COMBINING LONG SOLIDUS OVERLAY
    , a = unstruck.split('');
  if (a.length > 0) {
    // join puts the overlay between characters; the trailing + s covers the last one
    return a.join(s) + s;
  }
  else {
    return '';
  }
}

setSelectedText(strikeout(getSelectedText()))

This one doesn’t do any replacing, because there aren’t “struck out” versions of all the letters. Instead, it puts the COMBINING LONG SOLIDUS OVERLAY character (decimal 824, hex 0338) after every character in the string. The combination appears as a character with a diagonal line through it.
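The same approach is easy to sketch in Python 3, which also makes it simple to confirm the character's name and number. This is an illustration, not one of the Drafts scripts.

```python
import unicodedata

OVERLAY = '\u0338'   # decimal 824

def strikeout(text):
    """Follow every character in text with the combining overlay."""
    return ''.join(ch + OVERLAY for ch in text)

print(unicodedata.name(OVERLAY))   # COMBINING LONG SOLIDUS OVERLAY
print(ord(OVERLAY))                # 824
```

Because the overlay is appended to each character individually, an empty string comes back empty with no special case.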

It should be easy enough to use these scripts as guidelines for creating other conversions. sᴍᴀʟʟ ᴄᴀᴘs, for example, seems like something Than Tibbetts should be all over.

Because these key definitions seem fairly simple and robust, I’ve uploaded them to the Keyboard Extensions section of the Drafts Actions Directory.

You can install them from there if you don’t want to go through the character-building[2] exercise of making them yourself.

Update 10/19/14
Here’s a clever idea from Jamie Jenkins on Twitter. Put the circled characters at the end of the pchars string and the plain characters at the end of the cchars string, like this:

var pchars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ⓪①②③④⑤⑥⑦⑧⑨"
  , cchars = "ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ⓪①②③④⑤⑥⑦⑧⑨abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"     

The rest of the script remains the same. With this change, you can toggle between circled and plain using the same key command. Same thing can be done with the pchars and fchars variables in the flip function.
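A minimal Python 3 sketch of the toggle idea, assuming a table built with `str.maketrans` from the doubled-up strings. The circled string is generated from code points here so no stray invisible characters can sneak in; the names are illustrative.

```python
PLAIN = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
CIRCLED = (''.join(chr(c) for c in range(0x24D0, 0x24EA))    # circled a to z
           + ''.join(chr(c) for c in range(0x24B6, 0x24D0))  # circled A to Z
           + '\u24EA'                                         # circled 0
           + ''.join(chr(c) for c in range(0x2460, 0x2469)))  # circled 1 to 9

# Plain maps to circled and circled maps back to plain, all in one table.
TOGGLE = str.maketrans(PLAIN + CIRCLED, CIRCLED + PLAIN)

def toggle(s):
    return s.translate(TOGGLE)

assert toggle(toggle('Drafts 4')) == 'Drafts 4'
```

One consequence worth knowing: a selection that mixes plain and circled characters gets each character swapped individually, so it comes out mixed the other way around.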


  1. Yes, one screenshot shows the label as Ⓐ, and the other shows it as ⓐ. I changed the name between screenshots and didn’t feel like going back and redoing the earlier one. 

  2. I have no shame. 


Circle service

Greg Scown of Smile Software wrote a blog post today in which he described a TextExpander group for writing ⓒⓘⓡⓒⓛⓔⓓ ⓛⓔⓣⓣⓔⓡⓢ. He was inspired by the excessive use of such letters in the tweets of Stephen Hackett.

ⓣⓗⓔ ⓐⓟⓟⓛⓔ ⓢⓣⓞⓡⓔ ⓘⓢ ⓓⓞⓦⓝ ⓙⓤⓢⓣ ⓛⓘⓚⓔ ⓔⓥⓔⓡⓨ ⓞⓣⓗⓔⓡ ⓐⓟⓟⓛⓔ ⓔⓥⓔⓝⓣ ⓑⓡⓑ ⓑⓛⓞⓖⓖⓘⓝⓖ
Stephen Hackett (@ismh) Oct 16 2014 7:56 AM

What’s good about Greg’s snippet group is that it works on both the Mac and iOS; what’s less good is the length of the abbreviations:

For example:

oooT gets you: Ⓣ
ooox gets you: ⓧ
ooo4 gets you: ④

Now it’s true that typing three o’s in a row isn’t much more time-consuming than typing just one, but I still prefer to type the text normally and then convert it to circled form. So I fired up Automator and made a Service to do it.

It took almost no time because I’d already made a few services that did similar things: s̸t̸r̸i̸k̸e̸t̸h̸r̸o̸u̸g̸h̸, dılɟ, and EBG13. The circled text service was most like the text flipper, so I copied it and did some quick editing.

In Automator, the circle service looks like this:

[Image: Circle service]

The Python script that runs when the Service is invoked is this:

python:
1:  # coding: utf8
2:  
3:  from sys import stdin, stdout
4:  
5:  pchars = u"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
6:  cchars = u"ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ⓪①②③④⑤⑥⑦⑧⑨"
7:  circler = dict(zip(map(ord, pchars), cchars))
8:  stdout.write(stdin.read().decode('utf8').translate(circler).encode('utf8'))

Update 10/18/14
OK, this is kind of weird. There are apparently a couple of ways to get the Ⓜ character, and what I did originally caused the translation to fail with capital letters beyond M. The Unicode code point for Ⓜ, in hex, is 24C2. But for some reason, when you insert it from the Mac Character Viewer, as I did when I first wrote this script, it inserts not just 24C2, but also FE0E, which is Variation Selector-15, a zero-width character.

[Image: Circled M in Character Viewer]

That messed up the definition of the dictionary in Line 7 and caused all of the higher capital letters to point to a circled capital letter one lower than they should have. For example, S would turn into Ⓡ. To fix this problem, I opened IPython and gave it these two commands

m = unichr(9410)
print m

Because decimal 9410 is hex 24C2, this caused Ⓜ to print without the trailing zero-width character. I used it to clean up the cchars string, and now the service works as it should.
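If you run into something similar, a quick way to expose an invisible character is to list the code points in the string. The sketch below is Python 3 (so `chr` instead of `unichr`), and the `describe` helper is a made-up name.

```python
import unicodedata

def describe(s):
    """List each code point in s with its Unicode name."""
    return ['U+{:04X} {}'.format(ord(c), unicodedata.name(c, '<unnamed>'))
            for c in s]

suspect = '\u24c2\ufe0e'      # circled M followed by Variation Selector-15
print(len(suspect))           # 2, although only one glyph is visible
for line in describe(suspect):
    print(line)

clean = suspect.replace('\ufe0e', '')   # strip the selector
print(len(clean))             # 1
```

A check that the plain and circled strings have the same length would also have caught the problem, since `zip` pairs characters by position and says nothing when one string is longer than the other.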

You can’t see any difference between the current script and what I originally posted, because the difference is an invisible character. This is the sort of thing that makes people hate computers.

What I should hate, though, is Apple for sticking an invisible character in where it doesn’t belong. I wonder if that bug has always been there.

The first line tells the Python interpreter that the source code is going to include UTF-8 characters. Lines 5–7 set up a dictionary in which the keys are the character codes of the regular characters and the values are the corresponding circled characters. This kind of dictionary is what the string translate method uses as its argument.[1]

Line 8 looks more complicated than it is. Basically, it reads standard input, runs the translation, and writes the result to standard output. The messiness comes from the decode and encode methods, which are there to handle the non-ASCII characters. This arrangement is called the “Unicode sandwich,” and I learned about it by watching this excellent talk by Ned Batchelder.
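Spelled out step by step, the sandwich looks something like this. The sketch is Python 3, not the Python 2 of the Service, and the `circle_bytes` name is hypothetical; the table is generated from code points to avoid any invisible extras.

```python
PLAIN = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
CIRCLED = (''.join(chr(c) for c in range(0x24D0, 0x24EA))    # circled a to z
           + ''.join(chr(c) for c in range(0x24B6, 0x24D0))  # circled A to Z
           + '\u24EA'                                         # circled 0
           + ''.join(chr(c) for c in range(0x2460, 0x2469)))  # circled 1 to 9
CIRCLER = str.maketrans(PLAIN, CIRCLED)

def circle_bytes(raw):
    text = raw.decode('utf8')           # bread: bytes to text on the way in
    circled = text.translate(CIRCLER)   # filling: work only with text
    return circled.encode('utf8')       # bread: text to bytes on the way out

result = circle_bytes('Caf\u00e9 24'.encode('utf8'))
assert result.decode('utf8') == '\u24b8\u24d0\u24d5\u00e9 \u2461\u2463'
```

In a Python 3 script fed by standard input, the bytes would come from `sys.stdin.buffer.read()` and go out through `sys.stdout.buffer.write()`.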

The service is called Circle selection and it appears in the Services submenu when text is selected. But using the Services submenu is a pain in the ass, so I defined shortcuts for this and the other text conversion services.

[Image: Keyboard shortcuts for text translation]

I’ve restricted the shortcuts to work only in Dr. Twoot, because Twitter is the only place[2] I’ll use them.

The other services are structured the same way in Automator; the only differences are the Python scripts. The post from last year shows older versions of the scripts, which generally worked, but aren’t as robust as what I’m using now.

Here’s the script for flipping characters:

python:
# coding: utf8

from sys import stdin, stdout

pchars = u"abcdefghijklmnopqrstuvwxyz,.?!'()[]{}"
fchars = u"ɐqɔpǝɟƃɥıɾʞlɯuodbɹsʇnʌʍxʎz'˙¿¡,)(][}{"
# map each plain character's code point to its flipped counterpart
flipper = dict(zip(map(ord, pchars), fchars))
# decode, lowercase, translate, and drop the final character of the input
a = list(stdin.read().decode('utf8').lower().translate(flipper))[:-1]
a.reverse()
stdout.write(''.join(a).encode('utf8'))

The script for striking through characters is this:

python:
from sys import stdin, stdout

unstruck = stdin.read().decode('utf8')
# put COMBINING LONG SOLIDUS OVERLAY (U+0338) between the characters
struck = u'\u0338'.join(unstruck)
stdout.write(struck.encode('utf-8'))

Strikethrough works for both A̸S̸C̸I̸I̸ and most Ü̸ñ̸î̸ç̸ø̸ƌ̸é̸ characters, but it won’t work on Emoji. I guess Emoji don’t have whatever characteristics are necessary to function with combining characters.

The script for doing a ROT13 is this:

python:
from string import maketrans
from sys import stdin, stdout

alpha = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
rot13 = 'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
# byte-for-byte table; bytes that aren't ASCII letters map to themselves
r13table = maketrans(alpha, rot13)
stdout.write(stdin.read().translate(r13table))

You’ll notice that I used maketrans in this script and that there’s no Unicode sandwich. That’s because the only characters that get converted are ASCII. Everything that isn’t an ASCII letter passes through untouched, so multi-byte characters don’t need to be decoded and encoded. That’s what my testing shows, anyway.
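There is a reason the byte-level approach is safe: in UTF-8, every byte of a multi-byte character is 0x80 or higher, so none of them can be mistaken for an ASCII letter. Here is a small Python 3 sketch of the same table, using `bytes.maketrans` because `string.maketrans` is a Python 2 function; the `rot13_bytes` name is made up.

```python
ALPHA = b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
ROT13 = b'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
TABLE = bytes.maketrans(ALPHA, ROT13)

def rot13_bytes(raw):
    """Rotate ASCII letters in UTF-8 bytes; every other byte passes through."""
    return raw.translate(TABLE)

sample = 'Montr\u00e9al'.encode('utf8')
# The two bytes that encode the accented e come out exactly as they went in.
assert rot13_bytes(sample).decode('utf8') == 'Zbage\u00e9ny'
assert rot13_bytes(rot13_bytes(sample)) == sample
```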

Since all four of these services are basically just Python scripts, there’s probably some clever way to run them via Pythonista and a URL scheme on iOS. I haven’t looked into it. Frankly, I wrote these scripts mostly because they were fun, not because I expect to use them much. Unless I need to communicate with Stephen.


  1. In theory, you can use the string.maketrans function to create this kind of table, but it’s never worked for me when Unicode characters are involved. 

  2. Other than this post. 


Three things for Drafts 4

Greg Pierce at Agile Tortoise released Drafts 4 yesterday, and it’s a doozy. A new look and many new features, especially for those of us who like programmable tools. Brett Terpstra has threatened to port a bunch of his Markdown stuff to Drafts, and Gabe Weatherhead has already dipped his toes into the new JavaScript system for adding scriptable buttons to the keyboard. If you want a more conventional review of all the new features, Alex Guyot has a good one in the usual comprehensive MacStories style. I’m just going to show a few things I’ve done today as I explore the new things Drafts 4 can do.


First, there’s the Web Capture Template. Drafts 4 comes with a sharing action that lets you quickly create a new draft from the web page you’re viewing in Safari. The three components it’ll capture are the page’s title, its URL, and whatever text you may have selected on the page. This is a great way to collect information during online research, and it’s also a quick way to make a link post for a blog. I don’t do much link posting here, but with a handy tool like the Web Capture Template, I might do more.

Here’s my Web Capture Template for starting a link post.

[Image: Drafts web capture template]

It’s a short chunk of Markdown that puts the title of the page first and makes it a reference link with reference number 1. Then comes a quote from the web page. Because Drafts uses double brackets for its template tags and Markdown uses brackets for links, the template has a shitload of brackets in it, but they all serve a purpose. Here it is in text form:

[[[title]]][1]:

> [[selection]]


[1]: [[url]]
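To see how the brackets sort themselves out, here is a small Python 3 sketch that mimics the tag substitution. It is only an imitation for illustration, not how Drafts itself expands tags; the `render` function and the sample values are made up.

```python
import re

TEMPLATE = "[[[title]]][1]:\n\n> [[selection]]\n\n\n[1]: [[url]]"

def render(template, values):
    """Replace each [[tag]] with its value; unknown tags are left alone."""
    return re.sub(r'\[\[(\w+)\]\]',
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)

post = render(TEMPLATE, {
    'title': 'Example page',          # hypothetical page title
    'selection': 'Quoted text.',      # hypothetical selected text
    'url': 'http://example.com/',     # hypothetical URL
})
print(post)
```

The inner pair of brackets around title is consumed by the tag, leaving a single outer pair that becomes the Markdown link text: the first line of the output is `[Example page][1]:`.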

The images below show the template in use. In the left image, I’m in Safari and have selected the text I want to appear in my post. I then bring up the share sheet and tap on the Drafts action icon. That generates the text shown in the right image from the template. Tapping the Capture button saves it as a draft that I can edit or expand on later.

[Image: Drafts action in Safari]

This is all done without leaving Safari and without any use of the clipboard. (The Copy button is in the left image because it appears automatically whenever you select text in Safari; it doesn’t get tapped.)


Let’s say I switched over to Drafts and finished writing the text of my link post. The natural thing to do would be to post it, but my static blogging system isn’t quite ready for that. I can, however, save the post into the source directory where all the Markdown files go. This directory is in Dropbox, and Drafts makes it easy to write an action that saves a file there.

[Image: Drafts blog action]

The path is cut off in the screenshot. It’s

/Elements/all-this/source/[[date|%Y/%m]]

The [[date]] tag adds the year and month to the end of the path.[1]

Like my blogging system, this is a work in progress. The name of the file should really come from the Slug line in the file header, not from the first line of the file, but I’ll get that worked out in due time.

By the way, if you think I’m cheating here because this action doesn’t use any of the new features of Drafts 4, I’ll point you to the immortal words of Ring Lardner:

“Shut up,” he explained.


Finally, here’s a keyboard script inspired by Gabe’s line sorting script. Gabe’s script takes a scrambled set of numbered lines and rearranges them in ascending order. Mine does essentially the opposite.

When I write an ordered list, I often find myself moving the items up and down the list during editing,[2] which messes up the numbering. Markdown processors don’t care about that, but I do. I want the items to stay in the order I put them but to be renumbered from one up. So I wrote this keyboard script:

javascript:
function renumber(s) {
  var list = s.split('\n');
  for (var i=0; i<list.length; i++) {
    // swap whatever precedes the first period for the line's position
    var parts = list[i].split('.');
    parts.shift();
    parts.unshift((i+1).toString());
    list[i] = parts.join('.');
  }
  return list.join('\n');
}

setSelectedText(renumber(getSelectedText()));

Here it is in action. The left screenshot shows the list with messed up numbers. I select the list, tap the 🔢 button, and the script renumbers the list as shown in the right screenshot.

[Image: Renumbering list items in Drafts]

In case you’re wondering, I do know that it’s “almost fanatical dedication to the Pope,” but I left out the qualifier so the item would fit on one line. I’ll punish myself by going to sit in… The Comfy Chair!

Update 10/16/14
Rob Trew is using Drafts 4, so you can expect lots of clever scripts as he digs deeper into its capabilities. Today, he posted an improvement to my renumbering script, which allows the user to select only portions of the first and last lines of the list. I don’t know about you, but I often find it hard to lift my finger off the screen without shifting the selection a bit. My original script forces you to select the list exactly from start to finish; Rob’s is more forgiving.

After playing around with it, though, I realized that neither of our scripts could handle an important case: a numbered list with paragraphs within one or more of the list items. Like this:

[Image: Improved list renumbering]

For this, we need a script that splits the string into list items, not lines. Stealing some ideas from Rob, this is what I came up with:

javascript:
 1:  function renumber(s) {
 2:    var rgx=/^\d+\. /m, 
 3:        list=s.split(rgx),
 4:        count;
 5:    list.shift();
 6:    count = list.length;
 7:    for(var i=0; i<count; i++) {
 8:      list[i] = (i+1).toString() + '. ' + list[i];
 9:    }
10:    return list.join('');
11:  }
12:  
13:  var rngLines = getSelectedLineRange(),
14:      iFrom=rngLines[0],
15:      iTo=rngLines[1];
16:  
17:  setTextInRange(iFrom, iTo, renumber(getTextInRange(iFrom, iTo)));

In Line 3, we split the string on the digits, a period, and a space that start a list item. This gives us an array with an empty first item, which we shift off in Line 5.
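The split-on-item-markers idea carries over directly to other languages. Here is a minimal Python 3 sketch, assuming the text starts at an item marker; it is an illustration, not a replacement for the Drafts script.

```python
import re

ITEM_START = re.compile(r'^\d+\. ', re.MULTILINE)

def renumber(s):
    items = ITEM_START.split(s)
    items.pop(0)   # whatever precedes the first marker (empty if s starts with one)
    return ''.join('{}. {}'.format(n, item)
                   for n, item in enumerate(items, 1))

messy = "3. First\n1. Second\n\n   More of the second item.\n7. Third\n"
print(renumber(messy))
```

The indented paragraph stays attached to the second item because the split happens only where a line begins with digits, a period, and a space. The flip side is that a paragraph line inside an item that happens to start that way would be treated as a new item.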

There are, undoubtedly, more improvements that could be made, but I’ll leave it at this. Too much JavaScript and I feel like a cross beam that’s gone out of skew on the treadle.


  1. The Elements directory name is a holdover from when I used Elements as my iOS text editor. I don’t use it anymore (I’m not even sure it’s available anymore), but I always point whatever editor I’m using (currently Notesy) to that directory because that’s where the old files are. 

  2. Drafts 4 makes list reordering very easy. Tapping the hamburger button at the bottom of the screen (when the keyboard is hidden) splits the text at newlines and lets you drag each line (or paragraph) up and down. Much faster than selecting, cutting, and pasting.