May 8, 2014 at 10:55 PM by Dr. Drang
I got some nice feedback on my “Sorting by parts” post from a few days ago. Enough, I thought, to write a new post instead of just updating the old one. Recall that all of this stems from T.J. Luoma’s original.
The idea was to sort a list of domains, like
foo.tjluoma.com
a.luo.ma
bar.luo.ma
b.tjluoma.com
leancrew.com
drdrang.com
daringfireball.net
6by6.5by5.fm
4by4.5by5.fm
5by5.fm
tjluoma.com
atp.fm
wordpress.com
wordpress.net
wordpress.co
alphabetically, indexing first by the top-level domain, then by the regular domain, then by the subdomain (if present). The sorted list should be
wordpress.co
drdrang.com
leancrew.com
tjluoma.com
b.tjluoma.com
foo.tjluoma.com
wordpress.com
5by5.fm
4by4.5by5.fm
6by6.5by5.fm
atp.fm
a.luo.ma
bar.luo.ma
daringfireball.net
wordpress.net
I gave a couple of solutions: first a shell pipeline that used awk and sort, then a short Python script.
Aristotle Pagaltzis suggested a Perl replacement for the awk portion of the pipeline:
@drdrang: perl -lpe '$_=join" ",reverse $_,"",split/\./' # half as long and twice as clear as the awk code
— Aristotle (@apag) May 5 2014 6:04 PM
I think Aristotle’s own mastery of Perl is making him overestimate the ease with which such a brief and elegant chunk of code can be constructed. There was a time when Perl one-liners came naturally to me, but those years are firmly in the past. If I had tried Perl, it would have taken me ten times as long to come up with something one-tenth as good as this.
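For anyone whose Perl is as rusty as mine, here is a rough Python rendering of what I take the one-liner to be doing to each line. The function name decorate is my own label, not anything from Aristotle's tweet, and this is a sketch of the idea rather than an exact translation (Perl's split, for instance, treats trailing empty fields differently).

```python
def decorate(domain):
    """Prefix a domain with its dot-separated parts in reverse order."""
    # Mirrors the Perl expression: reverse $_, "", split /\./
    # Build the list (original, "", part1, part2, ...) and reverse it, so the
    # top-level domain comes first and the untouched original comes last.
    fields = [domain, ""] + domain.split(".")
    return " ".join(reversed(fields))


if __name__ == "__main__":
    for d in ["foo.tjluoma.com", "5by5.fm"]:
        print(decorate(d))
```

The empty string in the middle shows up as a doubled space in the output, so foo.tjluoma.com becomes "com tjluoma foo  foo.tjluoma.com". Sorting those lines orders them by the reversed parts, and the original domain rides along at the end.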
On the other hand, Nathan Grigg’s suggestion reminded me that I still have some aspects of Perl stuck in my brain. He tweeted an improvement to my Python script:
I would use key-based sorting:
f = lambda s: list(reversed(s.split(".")))
print "".join(sorted(fileinput.input(), key=f))
— Nathan Grigg (@nathangrigg) May 4 2014 8:41 AM
The sorted function (and the in-place sort method) can take as a named argument a function that returns the key used to determine the sort order. That’s what Nathan’s f is, and his solution is more Pythonic than my rearrangement of the parts of the line before and after the sort. What I was doing was harkening back to my Perl days and writing a half-assed Schwartzian Transform. You could also say I was mimicking the logic of my shell pipeline. Either way, I wasn’t writing idiomatic Python.
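To make the contrast concrete, here is a minimal Python 3 sketch of both approaches applied to the list of domains from the top of the post. The names domain_key, sort_by_key, and sort_by_decoration are mine, chosen for illustration, and the decorate-sort-undecorate version is only an approximation of the rearranging idea, not a copy of my earlier script.

```python
def domain_key(domain):
    """Return the parts of a domain reversed: top-level domain first."""
    return list(reversed(domain.strip().split(".")))


def sort_by_key(domains):
    """Key-based sorting: let sorted() call the key function for us."""
    return sorted(domains, key=domain_key)


def sort_by_decoration(domains):
    """Decorate-sort-undecorate: attach the key, sort, then strip it off."""
    decorated = [(domain_key(d), d) for d in domains]
    decorated.sort()
    return [d for _, d in decorated]


DOMAINS = """foo.tjluoma.com a.luo.ma bar.luo.ma b.tjluoma.com leancrew.com
drdrang.com daringfireball.net 6by6.5by5.fm 4by4.5by5.fm 5by5.fm tjluoma.com
atp.fm wordpress.com wordpress.net wordpress.co""".split()


if __name__ == "__main__":
    print("\n".join(sort_by_key(DOMAINS)))
```

Both functions return the same ordering for this list. If the domains are already in a list you don't need to preserve, DOMAINS.sort(key=domain_key) does the same job in place.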
Joe Rosensteel wondered why I didn’t use pyp and wrote a blog post with a nice pyp solution. Pyp is an amalgam of Python and pipes developed at Sony Pictures Imageworks and released under a BSD license.
I’d read about pyp a year or two ago but didn’t think it’d be worth learning. Much of Python’s power comes from its libraries, and the typing required to import modules negates much of the value of a one-liner. In particular, pyp didn’t have regular expressions, which are very handy in one-liners.
Joe’s pyp solution is pretty compelling,[1] but I was still skeptical. The domain sorting problem didn’t require any regexes and therefore didn’t expose pyp’s main weakness. But as I was giving pyp a second look, I learned that some recent beta versions have added regex functions. Those might be worth exploring in detail.
Thanks to Aristotle, Nathan, and Joe for expanding my horizons. There’s always more to learn, isn’t there?
[1] Although I would argue that his solution could be improved by Nathan’s suggestion, too.