Formatting MultiMarkdown tables with NumPy and tabulate

January 21, 2016 at 6:55 PM by Dr. Drang

I’ve been spending a lot of time lately making lists of random samples for testing. In the past, I’ve used Python’s random module to draw the samples from a population list and print them, and then I’ve reformatted the printed list into a MultiMarkdown table in BBEdit for presentation in a report. But now I do everything in Python, using NumPy to manipulate the list and the tabulate module to format it as a MultiMarkdown table.

Let’s say I have a bunch of components available for testing, all with serial numbers.¹ I don’t need to test all of them, only a sample, but I want to make sure that I don’t subconsciously cherry-pick the best ones—or the worst ones, for that matter. To ensure that my prejudices don’t play any role in the selection of components to test, I write a Python script to do the selecting for me. In general form, this is what it looks like:

python:
 1:  #!/usr/bin/env python
 2:  # coding=utf-8
 3:  
 4:  import numpy as np
 5:  from tabulate import tabulate
 6:  import random
 7:  
 8:  # Define the population. Serial numbers normally aren't
 9:  # this simplistic, but this is just an example. 
10:  population = range(5668, 7023)
11:  
12:  # Draw a sample of 28 from the population.
13:  sample = random.sample(population, 28)
14:  
15:  # Pad the list out with zeros to fill a 10x3 table.
16:  sample = np.append(sample, [0, 0])
17:  
18:  # Turn the list into a 10x3 table, filling down each column.
19:  table = np.reshape(sample, (10, 3), order='F')
20:  
21:  # Print the table.
22:  print('| 1–10 | 11–20 | 21–30 |')
23:  print(tabulate(table, tablefmt='pipe'))

Line 10 defines the serial numbers of all the available components. For this example, I’m using a nonsense range of numbers from 5668 through 7022. This is the population. Then Line 13 uses the sample function from the random module to generate a new list of 28 items chosen at random from the population. Because sample draws without replacement, no serial number can be picked twice.
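If you ever need to show how a sample was drawn, a seeded generator makes the draw repeatable. This is a minimal sketch under that assumption: seeding is not something the script above does, and the seed value here is arbitrary. It also confirms that sample gives distinct serial numbers.

```python
import random

# The same toy population as the script: serial numbers 5668 through 7022.
population = range(5668, 7023)

# Assumption: seeding is not in the original script. A private Random
# instance with a recorded (arbitrary) seed lets someone repeat the exact draw.
rng = random.Random(20160121)

# sample() draws without replacement, so every serial number is distinct.
sample = rng.sample(population, 28)
print(sample)
```

Using a separate random.Random instance rather than random.seed keeps the seeded draw from affecting any other code that uses the module-level generator.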

That’s the easy and obvious part. The part that saves me a lot of editing time is what comes next. First, I use NumPy’s append function on Line 16 to add two zeros to the end of the list. I want to end up with a 10×3 table of serial numbers, and 28 samples are two short of the 30 cells that table needs. Then the reshape function on Line 19 turns the padded 30-item array into a 10×3 matrix.
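The two zeros on Line 16 are specific to 28 samples in three columns. Here is a sketch of a hypothetical helper, pad_for_table, that works out the padding for any sample size and column count. The name and the fill parameter are my own, not part of the original script.

```python
import numpy as np

def pad_for_table(sample, ncols, fill=0):
    """Pad a flat sample with `fill` so it reshapes evenly into `ncols` columns.

    Hypothetical helper, not part of the original script.
    """
    arr = np.asarray(sample)
    # -len % ncols is how many cells are still needed to complete the table:
    # 0 when the length is already a multiple of ncols.
    missing = -len(arr) % ncols
    # np.full with the sample's dtype keeps integers as integers, even when
    # no padding is needed (appending a bare [] would upcast to float).
    padded = np.append(arr, np.full(missing, fill, dtype=arr.dtype))
    nrows = len(padded) // ncols
    # Fortran order fills down each column, as in the script.
    return np.reshape(padded, (nrows, ncols), order='F')

table = pad_for_table(list(range(28)), 3)
print(table.shape)  # (10, 3)
```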

The F argument tells reshape to use Fortran ordering for the reshaped result, which means that the first index changes fastest. By default, reshape uses C ordering, which means the last index in the matrix changes fastest. Here’s an example with an ordered list of numbers:

python:
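import numpy as np

a = list(range(1000, 1012))
b = np.reshape(a, (4, 3))             # default C order: last index changes fastest
c = np.reshape(a, (4, 3), order='F')  # Fortran order: first index changes fastest

print(a)
print()
print(b)
print()
print(c)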

The output is

[1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011]

[[1000 1001 1002]
 [1003 1004 1005]
 [1006 1007 1008]
 [1009 1010 1011]]

[[1000 1004 1008]
 [1001 1005 1009]
 [1002 1006 1010]
 [1003 1007 1011]]

Note the difference in ordering in the two reshaped matrices. I want my samples to be read down each column in turn, which is what Fortran ordering gives me.
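Another way to see what Fortran ordering does: it produces the same matrix you would get by filling rows in the default C order with the shape flipped, then transposing. A small check using the same numbers as the example above:

```python
import numpy as np

a = np.arange(1000, 1012)

# Fortran ('F') order: fill down the columns of a 4x3 matrix.
f = np.reshape(a, (4, 3), order='F')

# Default C order: fill the rows of a 3x4 matrix, then transpose it
# so those rows become the columns of a 4x3 matrix.
t = np.reshape(a, (3, 4)).T

print(np.array_equal(f, t))  # True
```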

Now, you might argue that since the samples are random, the order they appear in the matrix makes no difference. That’s true, but by keeping them in the order they came out of the sample function, I’m keeping my hands out of the process entirely.

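Finally, Line 22 prints the header row of the MultiMarkdown table (those are en dashes between the numbers, which is why you see the coding=utf-8 directive at the top of the file: Python 2 needs it for any non-ASCII characters in the source), and Line 23 uses tabulate to print the alignment line and the body of the table. The colons in the alignment line tell MultiMarkdown to right-align the columns, which tabulate does by default for numbers. Here’s the output: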

| 1–10 | 11–20 | 21–30 |
|-----:|-----:|-----:|
| 6940 | 5839 | 6007 |
| 6615 | 6957 | 6314 |
| 6169 | 6877 | 6224 |
| 6142 | 6324 | 6210 |
| 6492 | 6685 | 6961 |
| 6908 | 5964 | 6475 |
| 6604 | 6387 | 6192 |
| 6189 | 6860 | 6090 |
| 6444 | 6162 |    0 |
| 5812 | 6950 |    0 |

And here’s what it looks like after processing,

1–10    11–20   21–30
6940    5839    6007
6615    6957    6314
6169    6877    6224
6142    6324    6210
6492    6685    6961
6908    5964    6475
6604    6387    6192
6189    6860    6090
6444    6162
5812    6950

where I’ve edited out those padding zeros to avoid any confusion.

There’s not much to this, I know, but by using this as my template and changing the individual parts to fit the particular problem at hand, I save myself a lot of time and can concentrate on the real work and not the fiddly formatting.

¹ Or I could assign serial numbers if they don’t have them already.

And now it’s all this

I just said what I said and it was wrong
Or was taken wrong