Discontinuous ranges in Python

This will be the third in an unplanned trilogy of posts on generating sequences. The first two, the jot and seq one and the brace expansion one, were about making sequences in the shell. But most of the sequences I make are in Python programs, and Python has some interesting quirks.

The fundamental sequence maker is range. In Python 2 and earlier, range created a list. For example,

python:
range(2, 10)


returns

[2, 3, 4, 5, 6, 7, 8, 9]


For very long sequences, you could save space by using the xrange function, which would generate the sequence on demand (“lazy evaluation” is the term of art) rather than creating it in full all at once.

python:
r = xrange(5000)


In this case, r would not be a list but would be of the special xrange type.

In Python 3, range stopped generating lists and became essentially what xrange used to be. And the now-redundant xrange was removed from the language. So

python:
r = range(5000)


would make r into a variable of the range class, and

python:
r2 = xrange(5000)


would raise a NameError, because xrange no longer exists.
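If you want to see the difference yourself, a few lines in a Python 3 interpreter will do. This quick sketch pokes at a range object. It has its own type, and it supports len(), indexing, and membership tests without ever building the full list. The try block shows what happens when leftover Python 2 code calls xrange.

```python
# In Python 3, range() returns a lazy range object rather than a list
r = range(5000)

print(type(r).__name__)  # range
print(len(r))            # 5000, computed from start/stop/step, no list built
print(r[-1])             # 4999, indexing works as it would on a list
print(4999 in r)         # True

# xrange was removed in Python 3, so calling it raises NameError
try:
    r2 = xrange(5000)
except NameError as err:
    print('NameError:', err)
```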

For most uses, the change in range made very little difference in how I write Python scripts. But there is one use I’ve had to modify.

I mentioned in the previous two posts that I often have to create a list of apartment or unit numbers for a building. I use the list to assist in developing inspection plans and keeping track of the inspection results. In the simplest case, the list could be made like this:

python:
units = [ '{}{:02d}'.format(f, u) for f in range(2, 10) for u in range(1, 6) ]


with units having the value

['201', '202', '203', '204', '205', '301', '302', '303',
'304', '305', '401', '402', '403', '404', '405', '501',
'502', '503', '504', '505', '601', '602', '603', '604',
'605', '701', '702', '703', '704', '705', '801', '802',
'803', '804', '805', '901', '902', '903', '904', '905']


But for taller buildings, a simple range for the floors wasn’t possible, because residential buildings generally don’t have 13th floors, at least as far as addressing is concerned. The numbering scheme of the units skips directly from the 12xx set to the 14xx set.

In Python 2, I handled this by adding the lists created by range:

floors = range(2, 13) + range(14, 25)


which gave floors the value

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24]


The + operator concatenated the lists produced by range, and it made for compact and easy-to-read code.

In Python 3, this doesn’t work because the new range class doesn’t understand the + operator.

python:
floors = range(2, 13) + range(14, 25)


raises a TypeError because + is unsupported for range objects.

How do we get around this? One way is to turn the ranges into lists before concatenation:

python:
floors = list(range(2, 13)) + list(range(14, 25))


This is certainly clear, but it’s ugly. Another way to do it, assuming we don’t need floors to be a list, is to use the chain function from the itertools module:

python:
from itertools import chain
floors = chain(range(2, 13), range(14, 25))


This is less ugly than the list() construct, but still not to my taste. I would use it if I had a huge discontinuous sequence to deal with, but not when I have only dozens of items.
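One caveat if you go the chain route: chain returns an iterator, not a reusable sequence like a range or a list, so you can walk through it only once. The sketch below shows the second pass coming up empty. If floors has to be looped over more than once, wrapping it in list() (or calling chain again) is the safer choice.

```python
from itertools import chain

# chain() returns a one-shot iterator over both ranges
floors = chain(range(2, 13), range(14, 25))

first_pass = list(floors)
second_pass = list(floors)   # the iterator is already exhausted

print(first_pass)            # 2 through 12, then 14 through 24
print(second_pass)           # []
```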

With Python 3.5, a new way to unpack iterators was introduced to the language, extending the definition of the unary * operator. I didn’t learn about it until I was already on Python 3.7, but I’ve been making up for lost time.

You probably know about using * to unpack a list variable when calling a function. Say you have a function f that takes five positional arguments and a five-element list x whose items are ordered just the way f wants. Instead of calling f like this:

python:
a = f(x[0], x[1], x[2], x[3], x[4])


you can call it like this:

python:
a = f(*x)


The extension introduced in Python 3.5 allows us to unpack more than one list in the function call. If list variable y has two items (corresponding to the first two arguments to f) and list variable z has three (corresponding to the final three arguments), we can call f this way:

python:
a = f(*y, *z)
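Here is a small sketch of the three calls side by side. The function f is hypothetical. It is just a stand-in that takes five positional arguments and returns them as a list, which makes the results easy to compare.

```python
def f(a, b, c, d, e):
    """Hypothetical five-argument function, used only for illustration."""
    return [a, b, c, d, e]

x = [1, 2, 3, 4, 5]
y = [1, 2]        # the first two arguments
z = [3, 4, 5]     # the last three arguments

# All three calls pass the same five positional arguments
a1 = f(x[0], x[1], x[2], x[3], x[4])
a2 = f(*x)
a3 = f(*y, *z)    # multiple unpacking in one call needs Python 3.5+

print(a1 == a2 == a3)   # True
```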


This by itself doesn’t help with the problem of discontinuous floor numbering, but the unpacking extension also allowed the multiple * construct to be used outside of function calls. Thus,

python:
b = *y, *z


will assign to b a tuple consisting of the concatenated elements of y and z. And this works for other iterables, too. So for the floor problem, I can do

python:
floors = *range(2, 13), *range(14, 25)


to get a tuple of the floors without 13. If I want a list, it’s

python:
floors = [ *range(2, 13), *range(14, 25) ]


This is neither as compact nor as clear as the old Python 2 way, but it’s not too bad, and it avoids the cluster of parentheses I was using with list() and chain().
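To put the unpacked floors to work, here is the unit-number comprehension from earlier applied to a hypothetical building that runs from floor 2 through floor 24, skips 13, and has five units per floor. The floor and unit counts are invented for illustration. Only the technique comes from above.

```python
# Hypothetical building: floors 2 through 24 with no 13th floor,
# five units per floor numbered 01 through 05
floors = *range(2, 13), *range(14, 25)

units = ['{}{:02d}'.format(f, u) for f in floors for u in range(1, 6)]

print(len(floors))    # 22
print(len(units))     # 110
print(units[53:57])   # ['1204', '1205', '1401', '1402'], the jump over 13
```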

I’ve made a promise to myself to read the release notes when I switch to Python 3.8.

Update Sep 22, 2019 2:15 PM
Joe Lion suggested this:

python:
floors = [ f for f in range(2, 25) if f != 13 ]


What I like about this is how explicit it is that we are excluding 13 from the list. What I’m less enthused about is the f for f in part, which is a lot of typing for essentially a no-op.

So it got me thinking about other ways to exclude the 13. To my surprise, I’m beginning to favor this:

python:
floors = list(range(2, 25))
floors.remove(13)


I still don’t like the nesting of range within list, and I have the common tendency to dislike using two lines when it can be done in one, but the intent of this—just like with Joe’s comprehension—is very clear: I want a list that goes from 2 through 24 but without 13. I’ll have to give it some thought.
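There are now several Python 3 ways to get the same floors, and a quick check that they really agree takes only a few lines. One difference is worth knowing. list.remove raises ValueError if the value isn't in the list, so the two-line version will complain loudly if it is ever pointed at a range without a 13. The comprehension, by contrast, just does nothing in that case.

```python
from itertools import chain

# The Python 3 approaches discussed above, each producing a list
via_lists = list(range(2, 13)) + list(range(14, 25))
via_chain = list(chain(range(2, 13), range(14, 25)))
via_unpack = [*range(2, 13), *range(14, 25)]
via_comprehension = [f for f in range(2, 25) if f != 13]

via_remove = list(range(2, 25))
via_remove.remove(13)   # removes the first (and only) 13 in place

print(via_lists == via_chain == via_unpack == via_comprehension == via_remove)  # True
```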

Brace yourself, I’m in an expansive mood

A longstanding truth of this blog is that whenever I write a post about shell t̸r̸i̸c̸k̸s̸ features, I get a note from Aristotle Pagaltzis letting me know of a shorter, faster, or better way to do it. Normally, I add a short update to the post with Aristotle’s improvements, or I explain why his faster way wouldn’t be faster for me (because some things just won’t stick in my head). But his response to my jot and seq post got me exploring an area of the shell that I’ve seen but never used before, and I thought it deserved a post of its own. I even learned something useful without his help.

Here’s Aristotle’s tweet:

@drdrang Bash/zsh brace expansion can replace 99% of jot/seq uses (though bash < 4.x doesn’t support padding or step size 🙁). Your last example becomes much simpler:

printf '%s\n' 'Apt. '{2..5}{A..D}

You even get your preferred argument order:

printf '%s\n' {10..40..5}

Brace expansion in bash and zsh doesn’t seem like a very important feature because it takes up so little space in either manual. The brief exposure I’ve had to it has been in articles that talked about using it to run an operation on several files at once. For example, if I have a script called file.py that generates text, CSV, PDF, and PNG output files, all named file but with different extensions, I might want to delete all the output files while leaving the script intact. I can’t do

rm file.*


because that would delete the script file. What works is

rm file.{txt,csv,pdf,png}


The shell expands this into

rm file.txt file.csv file.pdf file.png


and then runs the command.

This is cute, but I never thought it worth committing to memory because tab completion and command-line editing through the Readline library make it very easy to generate the file names interactively.

What I didn’t realize until Aristotle’s tweet sent me to the manuals was that the expansion could also be specified as a numeric or alphabetic sequence using the two-dot syntax. Thus,

mkdir folder{A..T}


creates 20 folders in one short step, which is the sort of thing that can be really useful.

And you can use two sets of braces to do what is effectively a nested loop. With apologies to Aristotle, here’s how I would do the apartment number generation from my earlier post:

printf "Apt. %s\n" {2..5}{A..D}


This gives output of

Apt. 2A
Apt. 2B
Apt. 2C
Apt. 2D
Apt. 3A
Apt. 3B
Apt. 3C
Apt. 3D
Apt. 4A
Apt. 4B
Apt. 4C
Apt. 4D
Apt. 5A
Apt. 5B
Apt. 5C
Apt. 5D


just like my more complicated jot/seq command.
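Here is a rough Python analogue of the two-brace expansion, included only for comparison. Nested braces behave like a Cartesian product with the rightmost set varying fastest, and itertools.product uses the same order. Unlike brace expansion, range excludes its end point, so it needs a stop value of 6 to include floor 5.

```python
from itertools import product
from string import ascii_uppercase

# Rough Python analogue of the shell's {2..5}{A..D}
floors = range(2, 6)            # 2 through 5; range excludes its stop value
letters = ascii_uppercase[:4]   # 'ABCD'

# product() varies the rightmost input fastest, matching brace expansion order
apartments = ['Apt. {}{}'.format(f, letter) for f, letter in product(floors, letters)]
print('\n'.join(apartments))    # Apt. 2A, Apt. 2B, ..., Apt. 5D, one per line
```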

The main limitation to brace expansion when compared to jot and seq is that you can’t generate sequences with fractional steps. If you want numbers from 0 through 100 with a step size of 0.5,

seq 0 .5 100


is the way to go.
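Python's range has the same limitation because it accepts only integers, so something like range(0, 100.5, 0.5) raises a TypeError. A common workaround, sketched here, is to count with integers and scale the result. This also avoids the rounding drift you can get from repeatedly adding a step like 0.1 to a float.

```python
# range() rejects float arguments, so count with integers and scale instead.
# This reproduces the values of seq 0 .5 100 (though not seq's output formatting).
halves = [i / 2 for i in range(201)]   # 201 values: 0.0, 0.5, ..., 100.0

print(halves[:4])   # [0.0, 0.5, 1.0, 1.5]
print(halves[-1])   # 100.0
```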

And if you’re using the stock version of bash that comes with macOS (bash version 3.2.57), you’ll run into other limitations.

First, you won’t be able to left-pad the generated numbers with zeros. In zsh and more recent versions of bash, you can say

echo {005..12}


and get¹

005 006 007 008 009 010 011 012


where the prefixed zeros (which can be put in front of either number or both) tell the expansion to zero-pad the results to the same length. If you run that same command in the stock bash, you just get

5 6 7 8 9 10 11 12
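To make the padding rule concrete, here is a small Python sketch that mimics it for the simple case of ascending, non-negative integers. The function name brace_seq is made up. The rule it encodes is my reading of the behavior described above: a leading zero on either endpoint pads every term to the width of the wider endpoint as written. It is not a port of bash's code.

```python
def brace_seq(first, last):
    """Hypothetical helper mimicking {first..last} in zsh and bash 4+.
    Handles ascending, non-negative integer endpoints given as strings."""
    # A zero in front of a multi-digit endpoint switches on padding
    padded = any(len(s) > 1 and s.startswith('0') for s in (first, last))
    width = max(len(first), len(last)) if padded else 0
    # zfill(0) leaves the number unpadded
    return [str(n).zfill(width) for n in range(int(first), int(last) + 1)]

print(' '.join(brace_seq('005', '12')))  # 005 006 007 008 009 010 011 012
print(' '.join(brace_seq('5', '12')))    # 5 6 7 8 9 10 11 12
```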


Similarly, the old bash that comes with macOS doesn’t understand the three-parameter type of brace sequence expansion (mentioned by Aristotle), in which the third parameter is the (integer) step size:

echo {5..100..5}


which gives

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100


in zsh and newer versions of bash. Old bash doesn’t understand the three-parameter form at all and just outputs the input string:

{5..100..5}
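The Python counterpart is range's third argument. Keep one difference in mind: brace expansion includes its end point and range doesn't. A sketch:

```python
# {5..100..5} includes 100; range() stops before its second argument,
# so the stop value has to be past 100 (anything from 101 to 105 works here)
multiples = list(range(5, 101, 5))
print(' '.join(str(n) for n in multiples))
# 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

print(list(range(5, 100, 5))[-1])   # 95, the off-by-one version
```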


We’ve been told that Catalina will ship with zsh as the default shell, which means we shouldn’t have to worry about these deficiencies for long. Because I don’t want to learn a new system of configuration files, I’m sticking with bash, but I have switched to version 5.0.11 that’s installed by Homebrew. My default shell is now /usr/local/bin/bash.²

One more thing. I said last time that seq needs a weirdly long formatting code to get zero-padded numbers embedded in another string. The example was

seq -f "file%02.0f.txt" 5


to get

file01.txt
file02.txt
file03.txt
file04.txt
file05.txt


What I didn’t understand was how the %g specifier works. Based on my skimming of the printf man page, I thought it just chose the shorter output of the equivalent %f and %e specifiers. But it turns out to do further shortening, eliminating all trailing zeros and the decimal point if there’s no fractional part to the number. Therefore, we can use the simpler

seq -f "file%02g.txt" 5


to get the output above. Because printf-style formatting is used in lots of places, this is good to know outside the context of seq.
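Python's % operator uses the same printf-style conversions, so it's a convenient place to try these specifiers without running seq. Here is a quick sketch. Python follows C's rules for these cases, but I wouldn't assume it matches every platform's printf in every corner.

```python
# Same formatting strings as the seq examples, applied to the number 1
for fmt in ('file%02.0f.txt', 'file%02g.txt', 'file%02f.txt'):
    print(fmt, '->', fmt % 1)
# file%02.0f.txt -> file01.txt
# file%02g.txt -> file01.txt
# file%02f.txt -> file1.000000.txt

# %g drops trailing zeros and switches to exponent form for large numbers
print('%g' % 2.50, '%g' % 100000.0, '%g' % 1000000.0)   # 2.5 100000 1e+06
```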

Of course, now that I understand brace expansion, I wouldn’t use seq at all. I’d go with something like

echo file{01..5}.txt


1. I’m using echo here to save vertical space in the output.

2. Fair warning: I will ignore or block all attempts to get me to change to zsh. I’m glad you like it, but I’m not interested.

Making lists with jot and seq

The Mac comes with two command-line tools for making sequential lists: jot and seq. In the past, I’ve tended to use jot, mainly because that’s the one I learned first. But seq can do almost everything jot can, and its argument ordering makes more sense in many situations.

The point of both commands is to generate a list of strings that include sequential numbers.¹ At their most basic, jot and seq each produce a set of lines with numbers that count up from one. Both

jot 5


and

seq 5


produce the same output:

1
2
3
4
5


More interesting results come from adding arguments and options.

Arguments

jot can take up to four arguments; seq up to three. Here’s a table of how the arguments are ordered.

Arguments   jot                     seq
1           count                   last
2           count first             first last
3           count first last        first step last
4           count first last step

As you can see, additional arguments get added to the end for jot but are inserted in the middle for seq. As we’ve seen above, for the one-argument case, jot’s count and seq’s last are effectively the same, because both commands have default starting points and step sizes of 1.²

For the two-argument case, both commands keep the default step size as 1 but allow you to change the starting point. To generate numbers from 5 through 10, you’d use either

jot 6 5


or

seq 5 10


To me, the seq arguments are easier to remember and a more natural expression of what’s intended. It’s easy to make an off-by-one error and try

jot 5 5


which will get you only up to 9.

As far as I’m concerned, jot with three arguments is worthless. If I know the starting and stopping points, and the step size is still at its default of one, the count parameter is redundant. You can avoid doing the subtraction in your head (and then remembering to add 1 to avoid the off-by-one error) by filling the count slot with a hyphen:

jot - 5 10


This is fine, but seq 5 10 is simpler.

The three-argument seq and the four-argument jot allow you to specify the step size. Again, jot’s count is redundant and it’s best to put a hyphen in its place to allow the computer to do the arithmetic. To get from 10 to 40 counting by 5s, enter

jot - 10 40 5


to get

10
15
20
25
30
35
40


The seq command to get the same result is

seq 10 5 40


This is the one place where I find seq’s arguments hard to remember. So many programming languages use the order first, last, step that my fingers do that before I can stop them. There’s nothing illogical about the first, step, last ordering, but it’s not what I’m used to.
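Python's range uses the familiar first, last, step ordering, but its end is exclusive. The sketch below is a hypothetical, integer-only seq function in Python that shows how seq's first, step, last ordering maps onto range. It covers only the argument handling. It is not a reimplementation of the command and has no float support and no formatting options.

```python
def seq(*args):
    """Hypothetical integer-only emulation of seq's argument order:
    seq(last), seq(first, last), seq(first, step, last). Ends are inclusive."""
    if len(args) == 1:
        first, step, last = 1, 1, args[0]
    elif len(args) == 2:
        first, step, last = args[0], 1, args[1]
    elif len(args) == 3:
        first, step, last = args
    else:
        raise TypeError('seq takes 1 to 3 arguments')
    # range() excludes its stop value, so move the stop one past the last value
    stop = last + 1 if step > 0 else last - 1
    return list(range(first, stop, step))

print(seq(5))          # [1, 2, 3, 4, 5]
print(seq(5, 10))      # [5, 6, 7, 8, 9, 10]
print(seq(10, 5, 40))  # [10, 15, 20, 25, 30, 35, 40]
```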

Options

Just printing out lines with numbers is dull. The real value of jot and seq comes from applying options (often called switches) that allow us to format the numbers and include them in other text. Here are the options for jot and seq:

Option               jot            seq
output format        -w string      -f string
repeated string      -b string
equal width                         -w
separator            -s string      -s string
omit final newline   -n
precision            -p decimals
characters           -c
random               -r

The option I use most often is -w for jot and the nearly equivalent -f for seq. In both cases, the string argument is a printf-style formatting string. So if I wanted to generate a bunch of sequential file names, I could use

jot -w "file%02d.txt" 5


or

seq -f "file%02.0f.txt" 5


to get³

file01.txt
file02.txt
file03.txt
file04.txt
file05.txt


Here’s a situation where jot is a little more sensible. Even though both commands treat the numbers they generate as floating point, jot recognizes when the numbers have no decimal part and allows you to use the %d format specifier. seq forces you to use a floating point specifier, like %f or %e, which then forces you to explicitly state that the number of digits after the decimal point is 0. If you don’t, you’ll get a mess. For example,

seq -f "file%02f.txt" 5


returns

file1.000000.txt
file2.000000.txt
file3.000000.txt
file4.000000.txt
file5.000000.txt


The -b option for jot will repeat the given string as many times as specified. The string isn’t a formatting string and won’t include the numbers generated by jot.

jot -b Hello 4


returns

Hello
Hello
Hello
Hello


which I guess could be useful. You might think you could use the -w option with a string that doesn’t include a formatting specifier. But if you try that, jot will add the number to the end anyway.

jot -w Hello 4


returns

Hello1
Hello2
Hello3
Hello4


which is kind of presumptuous of it.

seq is better behaved.

seq -f Hello 4


returns

Hello
Hello
Hello
Hello


just as you’d expect.

By the way, if you want just a list of numbers, but you want them zero-padded to be of the same character length, seq has you covered:

seq -w 8 12


returns

08
09
10
11
12


You could achieve the same thing with a formatting string, but the -w option is a little shorter.

We’ve been looking at output where each item in the sequence is on its own line—i.e., separated by newlines—but both commands have an -s option for specifying another separator. The option works a little differently for the two commands.

jot -s, 5 8


gives

8,9,10,11,12


whereas

seq -s, 8 12


gives

8,9,10,11,12,


You see that seq’s “separator” (that’s what it’s called in the man page) is really a suffix. To get the same output as jot, you can pipe the output through sed to delete the last character:

seq -s, 8 12 | sed 's/.$//'

Kind of annoying. One advantage seq has over jot with regard to separators is that it understands tabs.

seq -s "\t" 8 12 | sed 's/.$//'


will give you the numbers separated by tab characters. But jot treats the backslashed t literally.

jot -s "\t" 5 8


will give you

8\t9\t10\t11\t12


One more thing related to -s. Because jot treats the separator as an actual separator, the newline that’s added by default to the end of the output isn’t handled by -s. If you don’t want jot to add it, including the -n option will omit it. This is just like the -n option to echo.
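Python draws the same distinction. str.join puts its separator only between items, as jot -s does, while appending a string to every item gives seq-style suffixes that have to be trimmed afterward. A short sketch:

```python
numbers = [str(n) for n in range(8, 13)]

# join() puts the separator only between items, like jot -s
print(','.join(numbers))     # 8,9,10,11,12
print('\t'.join(numbers))    # same, tab-separated ('\t' is a real tab in Python)

# Appending to every item gives seq-style output, where the "separator" is a suffix
suffixed = ''.join(n + ',' for n in numbers)
print(suffixed)              # 8,9,10,11,12,
print(suffixed[:-1])         # 8,9,10,11,12, trimmed like sed 's/.$//'
```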

As mentioned above, both commands handle floating point numbers, but they have different defaults.

seq -w 7 .5 9


returns

7.0
7.5
8.0
8.5
9.0


as you would expect. But the similar

jot - 7 9 .5


returns

7
8
8
8
9


which seems bizarre. The trick is that jot has a default precision of zero decimal places, so it’s rounding to the nearest integer. The default precision can be changed with the -p option.

jot -p1 - 7 9 .5


returns

7.0
7.5
8.0
8.5
9.0


which is almost certainly what you wanted.
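The odd 7 8 8 8 9 pattern has a plausible explanation. Assume, without my having checked jot's source, that jot formats with a printf-style %.0f when the precision is zero. Then round-half-to-even explains the output: 7.5 rounds up to 8, 8.5 rounds down to 8, and the middle 8 is 8.0 itself. Python's % formatting uses the same rounding, so the idea is easy to try:

```python
# The values jot - 7 9 .5 steps through (all exactly representable in binary)
values = [7 + 0.5 * i for i in range(5)]   # 7.0, 7.5, 8.0, 8.5, 9.0

# Zero decimal places: %.0f rounds exact halves to the even neighbor
print(['%.0f' % v for v in values])   # ['7', '8', '8', '8', '9']

# One decimal place, the equivalent of jot -p1
print(['%.1f' % v for v in values])   # ['7.0', '7.5', '8.0', '8.5', '9.0']
```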

Finally, I want to talk about one clear advantage that jot has over seq: the ability to generate sequences of characters using the -c option.

jot -c 5 65


returns

A
B
C
D
E


What it’s doing is interpreting each number as an ASCII code and returning its corresponding character. Even better, you can use character arguments:

jot -c - a e


returns

a
b
c
d
e


You can get the same effect by using the %c formatting specifier in the -w option. I’ve used this feature of jot to generate apartment addresses when working with a building that uses letters for its units. For example, say I am dealing with a five-story building in which the apartments start on the second floor and are given the letters A through D. I want to quickly generate a list of all the apartment addresses. The command is

for n in $(seq 2 5); do jot -w "Apt. $n%c" - A D; done


and the output is

Apt. 2A
Apt. 2B
Apt. 2C
Apt. 2D
Apt. 3A
Apt. 3B
Apt. 3C
Apt. 3D
Apt. 4A
Apt. 4B
Apt. 4C
Apt. 4D
Apt. 5A
Apt. 5B
Apt. 5C
Apt. 5D


As a practical matter, with only 16 apartments, I probably wouldn’t bother with seq or jot. It’s faster to just type out the apartment addresses when the number is small. But with larger buildings, a command like this saves a lot of time and ensures that I don’t skip or duplicate addresses. As with any automation, accuracy is as important as the time saved.

1. jot also has an option for generating random numbers, but I’ve never had any use for that.

2. That the counts start at 1 instead of 0 is an indication that these commands were written for normal humans, not programmers.

3. The quotation marks aren’t necessary in either of these cases because the shell won’t process the formatting string before sending it to jot or seq. But I have a hard time remembering which characters are special to the shell, so I tend to use quotation marks any time I have a string that includes potentially special characters like %.

How to mock your Apple Card

I feel the need to expand on this tweet from last night:

I use thousands of dollars of equipment from the company that wrote this.

The quote comes from Apple’s “How to clean your Apple Card” support document, which went up earlier this week.

The one-paragraph jump from “leather and denim may stain your card” to “keep your card in your wallet or your pocket” generated lots of complaints on Twitter, mostly of the form “That’s Apple, putting form over function.”¹ ²

My complaint is not that the Apple Card may lose its luster in a wallet. I’m not sure anything will maintain its looks when put between sheets of leather and compressed by my butt. My complaint is that Apple wrote a support document that looks absurd and invites snarky comments. Everything Apple does generates derision from Apple haters; this generated derision from Apple’s best customers.

The support document is, in fact, putting function over form. Apple wants to tell its customers that the card won’t look brand new forever and advise them on the best way to store it. That’s the function of the document. But through bad writing—how many people read this before it was published?—it looks like Apple made a fragile card and is advising you to store it in a way that will destroy it. Instead of invoking Louis Sullivan, we should be turning to Casey Stengel: Can’t anybody here play this game?

1. If Louis Sullivan knew how often his words would be abused by people with no sense of form or function, he might have bitten his tongue. As reader Scott Wright said, whatever staining might occur doesn’t affect the function of the card.

2. Apple critics would argue that the real function of the Apple Card is not to pay for things but to look cool. If that’s the case, though, form and function are the same, and Apple can’t put one over the other.