Reshaping output
October 28, 2014 at 12:18 AM by Dr. Drang
I often need to take a long list of character strings—one per line—and reformat them into a table so they’ll be easier to read. In the past, I’ve done this by pasting the data into a spreadsheet and moving the cells around to build the table by hand. Today, I decided there had to be a Unix command line tool that would do this for me, and I set out looking for it. I found two.
The data set I was dealing with was a list of devices, each identified by serial number and having a set of characteristics. I wrote a short Python script that filtered the data, printing out only the serial numbers of the devices that fit particular criteria. The easiest way to write the script was to have the serial numbers come out one per line, but that’s not how I wanted to present them in an email to my client. I wanted them laid out in a nice table.
I found a Python module, pycolumnize, that will do this, but I wanted to learn a more general tool, one that I could use on any set of textual data, not just a list within a Python program.
The first command I ran across was column, which is pretty easy to use and doesn’t have many options. Here’s an example:
$ jot 36 | column -c 32
1 10 19 28
2 11 20 29
3 12 21 30
4 13 22 31
5 14 23 32
6 15 24 33
7 16 25 34
8 17 26 35
9 18 27 36
The jot command generates the numbers 1–36, one per line. Then column rearranges them into columns. The number of columns is determined by the value of the -c option; column packs as many columns as it can into an output block that many characters wide.[1] While I can understand why someone might like this way of specifying the output, I’d rather specify the number of rows or columns directly.
Also, column puts a tab character between columns.[2] Again, I can see why some people might like that, but I prefer spaces between the columns.
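Because the columns are separated by tabs, one plausible reading of the -c behavior is that column sizes every column to the next 8-character tab stop and then divides the requested width by that size. The Python sketch below assumes that rule; it is a guess at the arithmetic, not a copy of column's source, and the names TAB and column_count are made up for illustration.

```python
TAB = 8  # assumed tab-stop width; not taken from the post


def column_count(width, longest):
    """Estimate how many columns fit in `width` characters.

    Assumes each column is as wide as the longest entry rounded up
    to the next tab stop, with at least one character of separation.
    """
    col_width = ((longest + TAB) // TAB) * TAB
    return max(1, width // col_width)


print(column_count(32, 2))  # 4
print(column_count(28, 2))  # 3
```

Under that assumption, two-character entries each occupy an 8-character column, so a width of 32 holds four columns and a width of 28 holds only three, even though the visible text is narrower.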
My dissatisfaction with column led me to rs, the reshape command. This is a much more flexible command than column. In its most basic form, you give it the number of rows and columns (in that order) you want in the output.
$ jot 36 | rs 4 9
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36
Having to specify both rows and columns shouldn’t be necessary, and it isn’t. You can use 0 as a dummy value for either the row or column count, and rs will figure out what that value ought to be.
$ jot 36 | rs 0 4
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
17 18 19 20
21 22 23 24
25 26 27 28
29 30 31 32
33 34 35 36
As you can see, by default rs enters the values row by row instead of column by column. That can be changed with the -t (transpose) option:
$ jot 36 | rs -t 0 4
1 10 19 28
2 11 20 29
3 12 21 30
4 13 22 31
5 14 23 32
6 15 24 33
7 16 25 34
8 17 26 35
9 18 27 36
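To make the two fill orders concrete, here is a rough Python sketch of the arrangement that rs 0 N and rs -t 0 N produce in the examples here. It only mimics the observed layout; the function name reshape and its parameters are my own, and it assumes the missing row count is the entry count divided by the column count, rounded up.

```python
import math


def reshape(items, cols, transpose=False):
    """Arrange items into rows of at most `cols` entries each."""
    # Assumed rule for the 0 placeholder: just enough rows to hold everything.
    rows = math.ceil(len(items) / cols)
    if not transpose:
        # Row by row: each row is the next slice of `cols` items.
        return [items[r * cols:(r + 1) * cols] for r in range(rows)]
    # Column by column: item i goes into row i % rows.
    table = [[] for _ in range(rows)]
    for i, item in enumerate(items):
        table[i % rows].append(item)
    return table


numbers = [str(n) for n in range(1, 37)]
for row in reshape(numbers, 4, transpose=True):
    print(" ".join(row))
```

When the entries don't divide evenly, the short rows simply end early, which matches the partly filled tables that rs prints.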
Each column of rs output is as wide as the widest entry. Other values are padded with spaces. By default, the entries are left justified within that width, but you can make them right justified by using the -j option:
$ jot 36 | rs -tj 0 6
   1   7  13  19  25  31
   2   8  14  20  26  32
   3   9  15  21  27  33
   4  10  16  22  28  34
   5  11  17  23  29  35
   6  12  18  24  30  36
The “gutter” is the space between the columns, and rs uses a two-space gutter by default. That can be changed with the -g option:
$ jot 36 | rs -tj -g4 0 4
     1    10    19    28
     2    11    20    29
     3    12    21    30
     4    13    22    31
     5    14    23    32
     6    15    24    33
     7    16    25    34
     8    17    26    35
     9    18    27    36
Be careful with -g. The width number has to come immediately after it, with no space. If you put a space between the g and the width number, rs will misinterpret it.
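The way column width, justification, and gutter combine can be sketched in a few lines of Python. This is an illustration of the idea only, not how rs is written: format_table and its gutter and right parameters are hypothetical names, and the sketch puts the gutter strictly between columns, so the spacing at the start of each line may differ from what rs actually prints.

```python
def format_table(rows, gutter=2, right=False):
    """Lay out rows of strings in padded columns separated by a gutter."""
    ncols = max(len(row) for row in rows)
    # Each column is as wide as its widest entry.
    widths = [
        max(len(row[c]) for row in rows if c < len(row))
        for c in range(ncols)
    ]
    lines = []
    for row in rows:
        cells = [
            cell.rjust(widths[c]) if right else cell.ljust(widths[c])
            for c, cell in enumerate(row)
        ]
        # Join with the gutter and drop trailing padding on short rows.
        lines.append((" " * gutter).join(cells).rstrip())
    return "\n".join(lines)


print(format_table([["9", "10"], ["10", "11"]], gutter=4, right=True))
```

Calling it with right=True and gutter=4 is roughly the counterpart of combining -j with -g4.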
The rs command has, perhaps unfortunately, many more options, but -t, -j, and -g are probably the most useful.
By the way, I don’t want to leave the impression that you need to have a full table. If you ask for 36 entries to be arranged in 5 columns, you’ll get this:
$ jot 36 | rs -tj -g4 0 5
1 9 17 25 33
2 10 18 26 34
3 11 19 27 35
4 12 20 28 36
5 13 21 29
6 14 22 30
7 15 23 31
8 16 24 32
Or this if you don’t transpose:
$ jot 36 | rs -j -g4 0 5
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36
For the data I was dealing with today, the serial numbers were all four digits long and there were, after filtering, about 200 of them, so you can see why I wanted to reshape them into a table. I chose six columns and a four-space gutter to get a table that was relatively easy to read. I piped the output of my Python script through rs -t -g4 0 6 and got output that looked something like this:
1013 2337 3908 5808 7374 8919
1021 2358 3962 5819 7384 8932
1095 2419 3980 5843 7467 8936
1125 2481 4007 5843 7494 8960
1173 2485 4059 6028 7497 8998
1194 2501 4076 6110 7510 9118
1250 2595 4119 6128 7600 9168
1255 2603 4260 6206 7623 9196
1314 2724 4315 6233 7688 9208
1326 2867 4363 6260 7766 9307
1330 2904 4372 6310 7850 9308
1346 2905 4376 6352 7914 9394
1355 3021 4379 6407 7935 9411
1404 3032 4502 6450 7990 9435
1408 3196 4503 6457 7991 9442
1449 3234 4572 6501 8022 9473
1462 3246 4699 6652 8078 9536
1477 3246 4767 6660 8127 9544
1477 3293 4819 6665 8158 9563
1529 3293 4988 6724 8222 9603
1537 3355 5035 6747 8299 9603
1551 3356 5042 6765 8333 9630
1552 3372 5101 6783 8402 9742
1809 3421 5111 6845 8436 9749
1930 3458 5261 6861 8474 9783
1996 3499 5315 6862 8507 9892
2045 3569 5522 6875 8519 9906
2047 3575 5525 6897 8524 9944
2070 3597 5526 7036 8586 9948
2078 3686 5569 7050 8664 9959
2124 3691 5583 7077 8720
2146 3722 5592 7109 8751
2158 3739 5617 7310 8891
2336 3810 5755 7333 8905
Yes, it’s long, but it’s probably as easy to read as 200 serial numbers can be.
[1] Sort of. In practice, I’ve found that you usually need to give -c a greater number than is absolutely necessary. For example, in the table above, each line is only 26 characters long, but column -c 28 would result in only three columns of data. I don’t know why.
[2] HTML doesn’t like tab characters, so to match the look of the output in Terminal, I’ve shown the output of column with spaces between the columns.