November 29, 2013 at 10:30 PM by Dr. Drang
A few months ago, I bought Patterns in the Mac App Store when it was on sale for 99¢. As I said at the time, I’ve been using regular expressions for years and am pretty good at constructing them, but it’s helpful to have the visual feedback Patterns gives you.
For reasons I can’t quite explain, I only recently used Patterns for a replacement. The first thing I noticed is that backreferences are done in a Perl style, regardless of which language you have selected at the top.
By “Perl style,” I mean the use of dollar signs to refer to the parenthesized subpatterns. Python uses backslashes (as Perl did, once upon a time). I expected Patterns to use the syntax of the chosen language, but I guess the idea is that it uses a single syntax in the GUI and then outputs the syntax specific to the chosen language when the Copy Code button is pushed. That’s fine with me.
What isn’t fine is the non-Pythonic code that Patterns produces from Copy Code. Here’s what the expressions above generate:
import re
result = re.sub("^ *(\\d\\d?).?(\\d\\d).?(\\d\\d) *$", "\\1:\\2:\\3", searchText)
This works, but the doubled backslashes are both ugly and unnecessary. As the documentation for the re module says:
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
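The quoted passage is easy to check in an interpreter. Here is a small sketch; the variable names are mine, chosen only for illustration.

```python
import re

raw = r"\n"     # raw literal: a backslash followed by the letter n
cooked = "\n"   # regular literal: a single newline character

print(len(raw), len(cooked))   # 2 1
print(raw == "\\n")            # True: same two characters, written the hard way

# The literal-backslash pattern from the quote: four backslashes in a
# regular string literal are the same pattern as two in a raw one.
print(r"\\" == "\\\\")         # True
print(re.search(r"\\", "C:\\temp") is not None)   # True: finds the backslash
```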
Most Python programmers would heed this advice and use
result = re.sub(r"^ *(\d\d?).?(\d\d).?(\d\d) *$", r"\1:\2:\3", searchText)
which is easier to both write and read.
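The two versions really are the same strings, so nothing changes but the readability. A quick sketch to confirm it, using a sample input I made up for the purpose:

```python
import re

# The pattern and replacement as Copy Code emits them...
escaped_pattern = "^ *(\\d\\d?).?(\\d\\d).?(\\d\\d) *$"
escaped_repl = "\\1:\\2:\\3"

# ...and the raw-string equivalents.
raw_pattern = r"^ *(\d\d?).?(\d\d).?(\d\d) *$"
raw_repl = r"\1:\2:\3"

# Python sees identical strings either way.
assert escaped_pattern == raw_pattern
assert escaped_repl == raw_repl

search_text = " 10.30.45 "   # hypothetical input, not from Patterns
result = re.sub(raw_pattern, raw_repl, search_text)
print(result)   # 10:30:45
```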
(Actually, for all but the simplest regexes, I tend to compile the pattern and then use the split, or whatever method on the compiled pattern. This makes the code extend over two or more lines, but each line is simpler.)
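A minimal sketch of that compile-first style, applied to the same regex. The names time_re and normalize are hypothetical, as are the sample inputs.

```python
import re

# Compile once, then call methods on the compiled pattern object.
time_re = re.compile(r"^ *(\d\d?).?(\d\d).?(\d\d) *$")

def normalize(text):
    """Rewrite a loosely formatted time as h:mm:ss; non-matching text comes back unchanged."""
    return time_re.sub(r"\1:\2:\3", text)

print(normalize("10-30-45"))   # 10:30:45
print(normalize("93045"))      # 9:30:45
```

The pattern and the substitution now sit on separate lines, and the compiled object can be reused wherever the same regex is needed.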
I’m going to shoot an email to Patterns’ developer, Nikolai Krill, suggesting he use raw strings for the Python code. Until he does, I’ll just go back to what I’ve been doing for the past three months: selecting the regexes directly from the fields in the window and pasting them into my code, ignoring the Copy Code button.