Formatting MultiMarkdown tables with NumPy and tabulate

I’ve been spending a lot of time lately making lists of random samples for testing. In the past, I’ve used Python’s random module to generate and print a list of samples from a population list, and then I’ve reformatted the list into a MultiMarkdown table in BBEdit for presentation in a report. But now I do everything in Python by using NumPy to manipulate the list and the tabulate module to format it as a MultiMarkdown table.

Let’s say I have a bunch of components available for testing, all with serial numbers.[1] I don’t need to test all of them, only a sample, but I want to make sure that I don’t subconsciously cherry-pick the best ones (or the worst ones, for that matter). To ensure that my prejudices don’t play any role in the selection of components to test, I write a Python script to do the selecting for me. In general form, this is what it looks like:

python:
 1:  #!/usr/bin/env python
 2:  # coding=utf-8
 3:  
 4:  import numpy as np
 5:  from tabulate import tabulate
 6:  import random
 7:  
 8:  # Define the population. Serial numbers normally aren't
 9:  # this simplistic, but this is just an example. 
10:  population = range(5668, 7023)
11:  
12:  # Draw a sample of 28 from the population.
13:  sample = random.sample(population, 28)
14:  
15:  # Pad the list out with zeros to fill a 10x3 table.
16:  sample = np.append(sample, [0, 0])
17:  
18:  # Turn the list into a 10x3 table.
19:  table = np.reshape(sample, (10, 3), 'F')
20:  
21:  # Print the table.
22:  print '| 1–10 | 11–20 | 21–30 |'
23:  print tabulate(table, tablefmt='pipe')

It starts by creating a list of the serial numbers on Line 10 for all the available components. For this example, I’m using a nonsense range of numbers from 5668 through 7022. This is the population. Then I use the sample function from the random module on Line 13 to generate a new list of items chosen at random from the population.
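As a quick check of what the sampling step does, here is a minimal sketch, written for Python 3 rather than the Python 2 of the script above. The seed value and the `rng` name are assumptions added for illustration; the script itself doesn’t seed the generator. Seeding only matters if you want to be able to reproduce a particular draw later.

```python
import random

# Same nonsense range of serial numbers as in the script: 5668 through 7022.
population = range(5668, 7023)

# Hypothetical seed, added only so this draw can be repeated;
# the original script uses the module-level generator unseeded.
rng = random.Random(20240101)
sample = rng.sample(population, 28)

# sample() draws without replacement, so no serial number appears twice.
print(len(sample), len(set(sample)))

# Every drawn item comes from the population.
print(all(s in population for s in sample))
```

Because the draw is without replacement, asking for more items than the population holds raises a ValueError instead of repeating serial numbers.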

That’s the easy and obvious part. The part that saves me a lot of editing time is what comes next. First, I use NumPy’s append function on Line 16 to add zeros to the end of the list. I want to end up with a 10×3 table of serial numbers, so I need two more items to fill out the list. Then the reshape function on Line 19 turns the flat list into a 10×3 matrix.
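To see what the 'F' argument to reshape does, here is a small sketch with seven made-up numbers and a three-row table, written for Python 3. The `rows`, `cols`, and `pad` names are hypothetical, introduced only to show how the number of padding zeros could be computed rather than typed in by hand.

```python
import math

import numpy as np

sample = [11, 12, 13, 14, 15, 16, 17]   # seven made-up items
rows = 3

# Enough columns to hold every item, and the zeros needed to fill the rest.
cols = math.ceil(len(sample) / rows)     # 3
pad = rows * cols - len(sample)          # 2

# Integer zeros keep the array's dtype integer even when pad is 0.
padded = np.append(sample, np.zeros(pad, dtype=int))

# order='F' fills the table column by column (Fortran order),
# so the items run down the first column before starting the second.
table = np.reshape(padded, (rows, cols), order='F')
print(table)
# [[11 14 17]
#  [12 15  0]
#  [13 16  0]]
```

With the default row-major order, the same call would put 11, 12, 13 across the first row instead of down the first column.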

Finally, Line 22 prints the header row of the MultiMarkdown table (those are en dashes between the numbers, which is why you see the coding=utf-8 directive at the top of the file), and Line 23 uses tabulate to print the format line and the body of the table. Here’s the output:

| 1–10 | 11–20 | 21–30 |
|-----:|-----:|-----:|
| 6940 | 5839 | 6007 |
| 6615 | 6957 | 6314 |
| 6169 | 6877 | 6224 |
| 6142 | 6324 | 6210 |
| 6492 | 6685 | 6961 |
| 6908 | 5964 | 6475 |
| 6604 | 6387 | 6192 |
| 6189 | 6860 | 6090 |
| 6444 | 6162 |    0 |
| 5812 | 6950 |    0 |

And here’s what it looks like after processing,

 1–10   11–20   21–30
 6940    5839    6007
 6615    6957    6314
 6169    6877    6224
 6142    6324    6210
 6492    6685    6961
 6908    5964    6475
 6604    6387    6192
 6189    6860    6090
 6444    6162
 5812    6950

where I’ve edited out those padding zeros to avoid any confusion.

There’s not much to this, I know, but by using this as my template and changing the individual parts to fit the particular problem at hand, I save myself a lot of time and can concentrate on the real work and not the fiddly formatting.


  1. Or I could assign serial numbers if they don’t have them already.