Sorting by parts, redux
May 8, 2014 at 10:55 PM by Dr. Drang
I got some nice feedback on my “Sorting by parts” post from a few days ago. Enough, I thought, to write a new post instead of just updating the old one. Recall that all of this stems from T.J. Luoma’s original.
The idea was to sort a list of domains, like
foo.tjluoma.com
a.luo.ma
bar.luo.ma
b.tjluoma.com
leancrew.com
drdrang.com
daringfireball.net
6by6.5by5.fm
4by4.5by5.fm
5by5.fm
tjluoma.com
atp.fm
wordpress.com
wordpress.net
wordpress.co
alphabetically, indexing first by the top-level domain, then by the regular domain, then by the subdomain (if present). The sorted list should be
wordpress.co
drdrang.com
leancrew.com
tjluoma.com
b.tjluoma.com
foo.tjluoma.com
wordpress.com
5by5.fm
4by4.5by5.fm
6by6.5by5.fm
atp.fm
a.luo.ma
bar.luo.ma
daringfireball.net
wordpress.net
I gave a couple of solutions: first a shell pipeline that used awk and sort, then a short Python script.
Aristotle Pagaltzis suggested a Perl replacement for the awk portion of the pipeline:
@drdrang: perl -lpe '$_=join" ",reverse $_,"",split/\./' # half as long and twice as clear as the awk code
— Aristotle (@apag) May 5 2014 6:04 PM
I think Aristotle’s own mastery of Perl is making him overestimate the ease with which such a brief and elegant chunk of code can be constructed. There was a time when Perl one-liners came naturally to me, but those years are firmly in the past. If I had tried Perl, it would have taken me ten times as long to come up with something one-tenth as good as this.
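For anyone whose Perl is as rusty as that, here is a rough Python reading of what the one-liner appears to do: split the line on dots, reverse the parts, and put them in front of the original line, separated by spaces (with an empty field in between). The names decorate and undecorate and the short sample list are illustrative assumptions, not part of Aristotle's code, and the sort and strip steps stand in for the rest of the shell pipeline.

```python
def decorate(line):
    """Prefix a domain with its dot-separated parts in reverse order.

    Mirrors my reading of the Perl one-liner: the reversed parts,
    an empty field, then the original line, all joined by spaces.
    """
    parts = line.split(".")
    return " ".join(list(reversed(parts)) + ["", line])


def undecorate(line):
    """Recover the original domain, which is the last space-separated field."""
    return line.rsplit(" ", 1)[-1]


# Illustrative sample, taken from the list at the top of the post.
sample = ["foo.tjluoma.com", "a.luo.ma", "5by5.fm", "tjluoma.com", "wordpress.co"]

# Decorate, sort the decorated lines as plain strings, then strip the prefix.
decorated = sorted(decorate(d) for d in sample)
ordered = [undecorate(d) for d in decorated]

for d in ordered:
    print(d)
```

Because the reversed parts come first, an ordinary string sort on the decorated lines orders them by top-level domain, then domain, then subdomain.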
On the other hand, Nathan Grigg’s suggestion reminded me that I still have some aspects of Perl stuck in my brain. He tweeted an improvement to my Python script:
@drdrang
I would use key-based sorting:
f = lambda s: list(reversed(s.split(".")))
print "".join(sorted(fileinput.input(), key=f))
— Nathan Grigg (@nathangrigg) May 4 2014 8:41 AM
Python’s sorted function (and the in-place sort method) can take as a named argument a function that returns the key used to determine the sort order. That’s what Nathan’s f is, and his solution is more Pythonic than my rearrangement of the parts of the line before and after the sort. What I was doing was harkening back to my Perl days and writing a half-assed Schwartzian Transform. You could also say I was mimicking the logic of my shell pipeline. Either way, I wasn’t writing idiomatic Python.
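As a minimal sketch of key-based sorting applied to the domain list from the top of the post: the version below is written for Python 3 and sorts an in-memory list instead of reading from fileinput, so it differs from Nathan's tweet in those respects. The names domain_key, domains, and result are my own placeholders.

```python
def domain_key(domain):
    """Return the dot-separated parts in reverse order: TLD, domain, subdomain."""
    return list(reversed(domain.strip().split(".")))


domains = [
    "foo.tjluoma.com", "a.luo.ma", "bar.luo.ma", "b.tjluoma.com",
    "leancrew.com", "drdrang.com", "daringfireball.net", "6by6.5by5.fm",
    "4by4.5by5.fm", "5by5.fm", "tjluoma.com", "atp.fm",
    "wordpress.com", "wordpress.net", "wordpress.co",
]

# sorted() returns a new list; the key function is called once per item.
result = sorted(domains, key=domain_key)

# The in-place list method takes the same named argument.
in_place = list(domains)
in_place.sort(key=domain_key)

print("\n".join(result))
```

Lists compare element by element, and a list that is a prefix of another sorts first, which is why tjluoma.com lands ahead of b.tjluoma.com without any special handling.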
Joe Rosensteel wondered why I didn’t use pyp and wrote a blog post with a nice pyp-based solution. Pyp is an amalgam of Python and pipes developed at Sony Pictures Imageworks and released under a BSD license.
I’d read about pyp a year or two ago but didn’t think it’d be worth learning. Much of Python’s power comes from its libraries, and the typing required to import modules negates much of the value of a one-liner. In particular, pyp didn’t have regular expressions, which are very handy in one-liners.
Joe’s pyp solution is pretty compelling,[1] but I was still skeptical. The domain sorting problem didn’t require any regexes and therefore didn’t expose pyp’s main weakness. But as I was giving pyp a second look, I learned that some recent beta versions have added regex functions. Those might be worth exploring in detail.
Thanks to Aristotle, Nathan, and Joe for expanding my horizons. There’s always more to learn, isn’t there?
[1] Although I would argue that his solution could be improved by Nathan’s suggestion, too.