Patterns

A few months ago, I bought Patterns in the Mac App Store when it was on sale for 99¢. As I said at the time, I’ve been using regular expressions for years[1] and am pretty good at constructing them, but it’s helpful to have the visual feedback Patterns gives you.

For reasons I can’t quite explain, I only recently used Patterns for a replacement. The first thing I noticed is that backreferences are done in a Perl style, regardless of which language you have selected at the top.

[Screenshot: the Patterns window]

By “Perl style,” I mean the use of dollar signs to refer to the parenthesized subpatterns. Python uses backslashes (as Perl did, once upon a time). I expected Patterns to use the syntax of the chosen language, but I guess the idea is that it uses a single syntax in the GUI and then outputs the syntax specific to the chosen language when the Copy Code button is pushed. That’s fine with me.
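For comparison, here is a minimal sketch of how the two backreference styles relate in Python. The helper `perl_to_python_repl` is hypothetical, written only for illustration; it is not how Patterns does its conversion, and it ignores edge cases such as escaped dollar signs.

```python
import re

def perl_to_python_repl(template):
    """Convert Perl-style $1, $2, ... into Python's \\g<1>, \\g<2>, ...

    Hypothetical helper for illustration only.
    """
    # In the replacement, \\ emits a literal backslash and \1 is the digit(s).
    return re.sub(r"\$(\d+)", r"\\g<\1>", template)

# Python accepts both \1 and the unambiguous \g<1> form in replacements.
swapped_short = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
swapped_long = re.sub(r"(\w+) (\w+)", r"\g<2> \g<1>", "hello world")

# The \g<1> form matters when a digit follows the reference:
# r"\10" would mean group 10, while r"\g<1>0" means group 1 followed by "0".
padded = re.sub(r"(\d)", r"\g<1>0", "5")

python_repl = perl_to_python_repl("$1:$2:$3")
```

Here `python_repl` ends up as the string `\g<1>:\g<2>:\g<3>`, which `re.sub` treats the same as `\1:\2:\3`.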

What isn’t fine is the non-Pythonic code that Patterns produces from Copy Code. Here’s what the expressions above generate:

import re
result = re.sub("^ *(\\d\\d?).?(\\d\\d).?(\\d\\d) *$", "\\1:\\2:\\3", searchText)

This works, but the doubled backslashes are both ugly and unnecessary. As the documentation for the re module says:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.

Most Python programmers would heed this advice and use

result = re.sub(r"^ *(\d\d?).?(\d\d).?(\d\d) *$", r"\1:\2:\3", searchText)

which is easier to both write and read.
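A quick way to confirm that the two spellings are the same strings as far as Python is concerned is sketched below. The sample input in `search_text` is made up for illustration; it doesn’t come from Patterns.

```python
import re

# Doubled-backslash strings, as produced by Copy Code
escaped_pattern = "^ *(\\d\\d?).?(\\d\\d).?(\\d\\d) *$"
escaped_repl = "\\1:\\2:\\3"

# Raw-string equivalents
raw_pattern = r"^ *(\d\d?).?(\d\d).?(\d\d) *$"
raw_repl = r"\1:\2:\3"

# Both spellings yield identical string objects, so re sees no difference.
same_pattern = escaped_pattern == raw_pattern
same_repl = escaped_repl == raw_repl

search_text = " 9.05.30 "  # hypothetical input
result = re.sub(raw_pattern, raw_repl, search_text)
```

Both comparisons come out `True`, and `result` is the string `9:05:30`.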

(Actually, for all but the simplest regexes, I tend to compile the pattern and then use the search, sub, split, or whatever method on the compiled pattern. This makes the code extend over two or more lines, but each line is simpler.)
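As a rough sketch, the compiled form of the same substitution might look like this. The names `time_pattern` and `search_text` and the sample input are assumptions chosen for the example.

```python
import re

# Compile once, then call methods on the compiled pattern object.
time_pattern = re.compile(r"^ *(\d\d?).?(\d\d).?(\d\d) *$")

search_text = "123456"  # hypothetical input
result = time_pattern.sub(r"\1:\2:\3", search_text)
```

With this input, `result` is `12:34:56`. The same `time_pattern` object can then be reused with `search`, `split`, and the other methods without repeating the regex.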

I’m going to shoot an email to Patterns’ developer, Nikolai Krill, suggesting he use raw strings for the Python code. Until he does, I’ll just go back to what I’ve been doing for the past three months: selecting the regexes directly from the fields in the window and pasting them into my code, ignoring the Copy Code button.


  1. I’m not sure you can program in Perl for as long as I did without being comfortable with regexes. And it certainly helped that I read Friedl’s book.