Discontinuous ranges in Python
September 22, 2019 at 10:58 AM by Dr. Drang
This will be the third in an unplanned trilogy of posts on generating sequences. The first two, the jot and seq one and the brace expansion one, were about making sequences in the shell. But most of the sequences I make are in Python programs, and Python has some interesting quirks.
The fundamental sequence maker is range. In Python 2 and earlier, range created a list. For example,
python:
range(2, 10)
returns
[2, 3, 4, 5, 6, 7, 8, 9]
For very long sequences, you could save space by using the xrange function, which would generate the sequence on demand (“lazy evaluation” is the term of art) rather than creating it in full all at once.
python:
r = xrange(5000)
In this case, r would not be a list but would be of the special xrange type.
In Python 3, range stopped generating lists and became essentially what xrange used to be. And the now-redundant xrange was removed from the language. So
python:
r = range(5000)
would make r an instance of the range class, and
python:
r2 = xrange(5000)
would raise a NameError, because the name xrange no longer exists.
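Here is a small sketch of what the Python 3 range class gives you. The variable names are mine, and the byte counts reported by sys.getsizeof vary by platform, so the code only compares them rather than printing specific sizes.

```python
import sys

r = range(5000)

# A range is its own type, not a list.
kind = type(r).__name__

# It still supports len(), indexing, and membership tests
# without building the full sequence in memory.
length = len(r)
tenth = r[10]
has_4999 = 4999 in r

# The range object stays small however long the sequence is;
# the equivalent list has to store a reference to every element.
range_is_smaller = sys.getsizeof(r) < sys.getsizeof(list(r))

print(kind, length, tenth, has_4999, range_is_smaller)
```

So a range behaves like a read-only sequence in most respects; it just isn't a list.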
For most uses, the change in range made very little difference in how I write Python scripts. But there is one use I’ve had to modify.
I mentioned in the previous two posts that I often have to create a list of apartment or unit numbers for a building. I use the list to assist in developing inspection plans and keeping track of the inspection results. In the simplest case, the list could be made like this:
python:
units = [ '{}{:02d}'.format(f, u) for f in range(2, 10) for u in range(1, 6) ]
with units having the value
['201', '202', '203', '204', '205', '301', '302', '303',
'304', '305', '401', '402', '403', '404', '405', '501',
'502', '503', '504', '505', '601', '602', '603', '604',
'605', '701', '702', '703', '704', '705', '801', '802',
'803', '804', '805', '901', '902', '903', '904', '905']
But for taller buildings, a simple range for the floors wasn’t possible, because residential buildings generally don’t have 13th floors, at least as far as addressing is concerned. The numbering scheme of the units skips directly from the 12xx set to the 14xx set.
In Python 2, I handled this by adding the lists created by range:
python:
floors = range(2, 13) + range(14, 25)
which gave floors the value
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24]
The + operator concatenated the lists produced by range, and it made for compact and easy-to-read code.
In Python 3, this doesn’t work because the new range class doesn’t understand the + operator.
python:
floors = range(2, 13) + range(14, 25)
raises a TypeError because + is unsupported for ranges.
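If you want to see the failure for yourself without crashing a script, a minimal sketch is to wrap the expression in a try block. The names floors and message are just for illustration.

```python
floors = None

try:
    # Works in Python 2, fails in Python 3.
    floors = range(2, 13) + range(14, 25)
except TypeError as err:
    # The assignment never happens, so floors is still None.
    message = str(err)
    print(message)
```

The message names both operand types, which makes it plain that the range class itself is the problem.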
How do we get around this? One way is to turn the ranges into lists before concatenation:
python:
floors = list(range(2, 13)) + list(range(14, 25))
This is certainly clear, but it’s ugly. Another way to do it, assuming we don’t need floors to be a list, is to use the chain function from the itertools module:
python:
from itertools import chain
floors = chain(range(2, 13), range(14, 25))
This is less ugly than the list() construct, but still not to my taste. I would use it if I had a huge discontinuous sequence to deal with, but not when I have only dozens of items.
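One thing to keep in mind with chain is that it returns a one-shot iterator, not a reusable sequence like range. A short sketch, with first_pass and second_pass as illustrative names:

```python
from itertools import chain

floors = chain(range(2, 13), range(14, 25))

first_pass = list(floors)   # consumes the iterator
second_pass = list(floors)  # nothing left to yield

print(len(first_pass), len(second_pass))
```

If the floors will be looped over more than once, as they would be in an inspection plan, the chain has to be rebuilt each time or converted to a list or tuple up front.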
With Python 3.5, a new way to unpack iterables was introduced to the language, extending the definition of the unary * operator. I didn’t learn about it until I was already on Python 3.7, but I’ve been making up for lost time.
You probably know about using * to unpack a list variable when calling a function. Say you have a function f that takes five positional arguments and a five-element list variable x that has its items ordered just the way f wants. Instead of calling f like this:
python:
a = f(x[0], x[1], x[2], x[3], x[4])
you can call it like this:
python:
a = f(*x)
The extension introduced in Python 3.5 allows us to unpack more than one list in the function call. If list variable y has two items (corresponding to the first two arguments to f) and list variable z has three (corresponding to the final three arguments), we can call f this way:
python:
a = f(*y, *z)
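To make the three calling styles concrete, here is a sketch with a stand-in f. The function body and the contents of x, y, and z are made up for illustration; any five-argument function would do.

```python
def f(a, b, c, d, e):
    # Hypothetical stand-in for the five-argument function in the text.
    return (a, b, c, d, e)

x = [1, 2, 3, 4, 5]
y = [1, 2]
z = [3, 4, 5]

explicit = f(x[0], x[1], x[2], x[3], x[4])
single = f(*x)
double = f(*y, *z)   # multiple unpackings need Python 3.5 or later

print(explicit == single == double)
```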
This by itself doesn’t help with the problem of discontinuous floor numbering, but the unpacking extension also allowed the multiple * construct to be used outside of function calls. Thus,
python:
b = *y, *z
will assign to b a tuple consisting of the concatenated elements of y and z. And this works for other iterables, too. So for the floor problem, I can do
python:
floors = *range(2, 13), *range(14, 25)
to get a tuple of the floors without 13. If I want a list, it’s
python:
floors = [ *range(2, 13), *range(14, 25) ]
This is neither as compact nor as clear as the old Python 2 way, but it’s not too bad, and it avoids the cluster of parentheses I was using with list() and chain().
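Putting the unpacked floors back into the unit-number comprehension from earlier gives something like the sketch below. It assumes the taller building still has five units per floor, which is carried over from the simple example rather than from any real building.

```python
# Floors 2 through 24, skipping 13.
floors = [*range(2, 13), *range(14, 25)]

# Assumes five units per floor, as in the simple example.
units = ['{}{:02d}'.format(f, u) for f in floors for u in range(1, 6)]

print(len(units))
print(units[50:60])   # the slice that straddles the missing 13th floor
```

The slice shows the unit numbers jumping straight from the 12xx set to the 14xx set.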
I’ve made a promise to myself to read the release notes when I switch to Python 3.8.
Update Sep 22, 2019 2:15 PM
Joe Lion suggested this:
python:
floors = [ f for f in range(2, 25) if f != 13 ]
What I like about this is how explicit it is that we are excluding 13 from the list. What I’m less enthused about is the f for f in part, which is a lot of typing for essentially a no-op.
So it got me thinking about other ways to exclude the 13. To my surprise, I’m beginning to favor this:
python:
floors = list(range(2, 25))
floors.remove(13)
I still don’t like the nesting of range within list, and I have the common tendency to dislike using two lines when it can be done in one, but the intent of this, just like with Joe’s comprehension, is very clear: I want a list that goes from 2 through 24 but without 13. I’ll have to give it some thought.
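As a sanity check, here is a sketch that builds the floor list all five ways discussed in this post and confirms they agree. The variable names are mine.

```python
from itertools import chain

concatenated = list(range(2, 13)) + list(range(14, 25))
chained = list(chain(range(2, 13), range(14, 25)))
unpacked = [*range(2, 13), *range(14, 25)]
filtered = [f for f in range(2, 25) if f != 13]

removed = list(range(2, 25))
removed.remove(13)   # raises ValueError if 13 isn't in the list

all_equal = concatenated == chained == unpacked == filtered == removed
print(all_equal)
```

One small difference between the last two: the comprehension silently does nothing if 13 is outside the range, while remove raises a ValueError, which may or may not be what you want.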