Sorting by parts, redux

I got some nice feedback on my “Sorting by parts” post from a few days ago. Enough, I thought, to write a new post instead of just updating the old one. Recall that all of this stems from T.J. Luoma’s original.

The idea was to sort a list of domains alphabetically, indexing first by the top-level domain, then by the regular domain, then by the subdomain (if present).

I gave a couple of solutions: first a shell pipeline that used awk and sort, then a short Python script.
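To make the ordering concrete, here is a minimal sketch in Python. The domains are made up for illustration (they are not the list from the earlier post), and `parts_key` is a hypothetical helper name. It assumes every entry has at least a regular domain and a top-level domain.

```python
# Hypothetical sample data; these are not the domains from the original post.
domains = [
    "www.example.org",
    "example.com",
    "blog.example.com",
    "alpha.net",
    "example.org",
]

def parts_key(domain):
    """Return (top-level domain, regular domain, subdomain) for sorting."""
    pieces = domain.split(".")
    tld = pieces[-1]
    name = pieces[-2]  # assumes at least two dot-separated parts
    # An absent subdomain becomes "", which sorts before any real one.
    sub = ".".join(pieces[:-2])
    return (tld, name, sub)

ordered = sorted(domains, key=parts_key)
print("\n".join(ordered))
```

With this sample, all the .com entries come first, then .net, then .org, and within each regular domain the bare domain precedes its subdomains.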

Aristotle Pagaltzis suggested a Perl replacement for the awk portion of the pipeline:

@drdrang: perl -lpe '$_=join" ",reverse $_,"",split/\./' # half as long and twice as clear as the awk code
Aristotle (@apag) May 5 2014 6:04 PM

I think Aristotle’s own mastery of Perl is making him overestimate the ease with which such a brief and elegant chunk of code can be constructed. There was a time when Perl one-liners came naturally to me, but those years are firmly in the past. If I had tried Perl, it would have taken me ten times as long to come up with something one-tenth as good as this.
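For anyone whose Perl is as rusty as mine, here is my reading of what the one-liner does to each input line, sketched in Python. The function name `decorate` is my own invention, and this is an approximation rather than an exact port (for instance, Perl’s split drops trailing empty fields, which Python’s does not).

```python
def decorate(line):
    """Mimic the Perl one-liner on a single line with no trailing newline."""
    parts = line.split(".")
    # Perl builds the list ($_, "", parts...) and reverses it, so the
    # reversed parts come first, then an empty field, then the original line.
    fields = [line, ""] + parts
    return " ".join(reversed(fields))

print(decorate("blog.example.com"))
```

The empty field shows up as a doubled space just before the original domain, so a later stage in the pipeline can sort on the leading fields and still have the untouched domain available at the end of the line.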

On the other hand, Nathan Grigg’s suggestion reminded me that I still have some aspects of Perl stuck in my brain. He tweeted an improvement to my Python script:

I would use key-based sorting:

f = lambda s: list(reversed(s.split(".")))
print "".join(sorted(fileinput.input(), key=f))

Nathan Grigg (@nathangrigg) May 4 2014 8:41 AM

Python’s sorted function (and the in-place sort method of lists) accepts a named argument, key, which is a function that returns the value used to determine the sort order. That’s what Nathan’s f is, and his solution is more Pythonic than my rearrangement of the parts of the line before and after the sort. What I was doing was harkening back to my Perl days and writing a half-assed Schwartzian Transform. You could also say I was mimicking the logic of my shell pipeline. Either way, I wasn’t writing idiomatic Python.
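The difference between the two styles is easier to see side by side. What follows is not my original script, just a small sketch with made-up domains; `reversed_parts` is a hypothetical name for a function that does the same job as Nathan’s f on lines without trailing newlines.

```python
# Hypothetical sample data for illustration.
domains = ["www.example.org", "example.com", "blog.example.com", "example.org"]

def reversed_parts(s):
    """Sort key: the dot-separated parts, last part first."""
    return list(reversed(s.split(".")))

# Decorate-sort-undecorate, in the spirit of a Schwartzian Transform:
# pair each domain with its key, sort the pairs, then throw the keys away.
decorated = [(reversed_parts(d), d) for d in domains]
decorated.sort()
dsu_result = [d for _, d in decorated]

# Key-based sorting: sorted() computes the key once per item
# and handles the bookkeeping itself.
key_result = sorted(domains, key=reversed_parts)

print(key_result)
```

Both approaches produce the same order for this sample. The key-based version just leaves the decorating and undecorating to Python.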

Joe Rosensteel wondered why I didn’t use pyp and wrote a blog post with a nice pyp-based solution. Pyp is an amalgam of Python and pipes developed at Sony Pictures Imageworks and released under a BSD license.

I’d read about pyp a year or two ago but didn’t think it’d be worth learning. Much of Python’s power comes from its libraries, and the typing required to import modules negates much of the value of a one-liner. In particular, pyp didn’t have regular expressions, which are very handy in one-liners.

Joe’s pyp solution is pretty compelling,[1] but I was still skeptical. The domain sorting problem didn’t require any regexes and therefore didn’t expose pyp’s main weakness. But as I was giving pyp a second look, I learned that some recent beta versions have added regex functions. Those might be worth exploring in detail.

Thanks to Aristotle, Nathan, and Joe for expanding my horizons. There’s always more to learn, isn’t there?

  1. Although I would argue that his solution could be improved by Nathan’s suggestion, too.