Formatting MultiMarkdown tables with NumPy and tabulate
January 21, 2016 at 6:55 PM by Dr. Drang
I’ve been spending a lot of time lately making lists of random samples for testing. In the past, I’ve used Python’s random module to generate and print the list of samples from a population list, and then I’ve reformatted the list into a MultiMarkdown table in BBEdit for presentation in a report. But now I do everything in Python by using NumPy to manipulate the list and the tabulate module to format it as a MultiMarkdown table.
Let’s say I have a bunch of components available for testing, all with serial numbers.1 I don’t need to test all of them, only a sample, but I want to make sure that I don’t subconsciously cherry-pick the best ones—or the worst ones, for that matter. To ensure that my prejudices don’t play any role in the selection of components to test, I write a Python script to do the selecting for me. In general form, this is what it looks like:
python:
1: #!/usr/bin/env python
2: # coding=utf-8
3:
4: import numpy as np
5: from tabulate import tabulate
6: import random
7:
8: # Define the population. Serial numbers normally aren't
9: # this simplistic, but this is just an example.
10: population = range(5668, 7023)
11:
12: # Draw a sample of 28 from the population.
13: sample = random.sample(population, 28)
14:
15: # Pad the list out with zeros to fill a 10x3 table.
16: sample = np.append(sample, [0, 0])
17:
18: # Turn the list into a 10x3 table.
19: table = np.reshape(sample, (10, 3), order='F')
20:
21: # Print the table.
22: print('| 1–10 | 11–20 | 20–30 |')
23: print(tabulate(table, tablefmt='pipe'))
It starts by creating a list of the serial numbers on Line 10 for all the available components. For this example, I’m using a nonsense range of numbers from 5668 through 7022. This is the population. Then I use the sample function from the random module on Line 13 to generate a new list of items chosen at random from the population.
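For anyone who wants to check what sample is doing, here is a minimal sketch of that step in isolation. The seeded generator (the name rng and the seed value 2016) is an assumption added for illustration and is not part of the script above; seeding only makes a draw repeatable, which may or may not be what you want when selecting real test specimens.

```python
import random

# Same nonsense population as in the script: 5668 through 7022.
population = range(5668, 7023)

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(2016)
sample = rng.sample(population, 28)

# sample() draws without replacement, so no serial number appears twice.
print(len(sample), len(set(sample)))   # prints: 28 28
```

Because the draw is without replacement, there is no need to check the result for duplicates before building the table.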
That’s the easy and obvious part. The part that saves me a lot of editing time is what comes next. First, I use NumPy’s append function on Line 16 to add zeros to the end of the list. I want to end up with a 10×3 table of serial numbers, so I need two more items to fill out the list. Then the reshape function on Line 19 turns the flat list into a 10×3 matrix.
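The padding and reshaping can be generalized a bit so the number of filler items doesn’t have to be counted by hand. What follows is only a sketch under a few assumptions: to_columns and its fill parameter are hypothetical names, and it uses np.pad in place of np.append to add the filler. The 'F' (Fortran, or column-major) order is what makes the items run down each column before moving on to the next one.

```python
import numpy as np

def to_columns(items, rows, fill=0):
    """Arrange a flat sequence into a table with `rows` rows, filled
    column by column and padded at the end with `fill`."""
    items = np.asarray(items)
    cols = -(-len(items) // rows)        # ceiling division
    pad = rows * cols - len(items)       # number of filler items needed
    padded = np.pad(items, (0, pad), mode='constant', constant_values=fill)
    # order='F' fills down the first column, then the second, and so on.
    return np.reshape(padded, (rows, cols), order='F')

# Stand-in for a sample of 28 items: the integers 1 through 28.
table = to_columns(np.arange(1, 29), 10)
print(table.shape)    # (10, 3)
print(table[:, 0])    # first column holds items 1 through 10
print(table[9])       # last row holds items 10 and 20, then a filler 0
```

With the default C order, the same reshape would run the items across each row instead, which is not how a "1–10, 11–20" table is meant to read.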
Finally, Line 22 prints the header row of the MultiMarkdown table (those are n-dashes between the numbers, which is why you see the coding=utf-8 directive at the top of the file), and Line 23 uses tabulate to print the format line and the body of the table. Here’s the output:
| 1–10 | 11–20 | 20–30 |
|-----:|-----:|-----:|
| 6940 | 5839 | 6007 |
| 6615 | 6957 | 6314 |
| 6169 | 6877 | 6224 |
| 6142 | 6324 | 6210 |
| 6492 | 6685 | 6961 |
| 6908 | 5964 | 6475 |
| 6604 | 6387 | 6192 |
| 6189 | 6860 | 6090 |
| 6444 | 6162 | 0 |
| 5812 | 6950 | 0 |
And here’s what it looks like after processing,
| 1–10 | 11–20 | 20–30 |
|-----:|------:|------:|
| 6940 |  5839 |  6007 |
| 6615 |  6957 |  6314 |
| 6169 |  6877 |  6224 |
| 6142 |  6324 |  6210 |
| 6492 |  6685 |  6961 |
| 6908 |  5964 |  6475 |
| 6604 |  6387 |  6192 |
| 6189 |  6860 |  6090 |
| 6444 |  6162 |       |
| 5812 |  6950 |       |
where I’ve edited out those padding zeros to avoid any confusion.
There’s not much to this, I know, but by using this as my template and changing the individual parts to fit the particular problem at hand, I save myself a lot of time and can concentrate on the real work and not the fiddly formatting.
1. Or I could assign serial numbers if they don’t have them already.