Aligning text tables on the decimal point
November 22, 2014 at 8:54 PM by Dr. Drang
I’ve recently written several reports with tables of numbers, and as I made more and more tables of this type, it became clear that my Normalize Table filter for BBEdit needed some updating. Now it can align numbers on the decimal point.
Here’s an example. Suppose I start with a table in MultMarkdown format
| Very long description | Short | Intermediate |
|.--.|.--.|.--.|
| 17,750 | 9,240.152 | 7,253.0 |
| 3,517 | 13,600.0 | 6,675 |
| 18,580.586 | 8,353.3 | 13,107.3 |
| 7,725 | 355.3 | 14,823.2 |
| None | 10,721.6 | 7,713 |
| 12,779.1592 | 14,583.867 | 3,153.2 |
| 18,850.03 | -4,756.6686 | 13,081.74 |
| -2,264.80 | 13,772.729 | 12,557 |
| -17,141.001 | 26,227.27 | — |
| 2,772.35 | 14,772.1509 | -3,814.0 |
| 934 | 20,853 | -10,272.2 |
| 8,082.139 | 7,864.0048 | 1,583.010 |
This is legal, but it’s impossible to read as plain text. Since readability is the chief virtue of Markdown, I want to adjust the spacing so all the columns line up. The older version of my filter worked well for left-aligned, right-aligned, and centered columns, but it made no attempt to align numeric columns on the decimal point. Now it does. After running the above through the Normalize Table filter, it comes out looking like this:
| Very long description | Short | Intermediate |
|.---------------------.|.-----------.|.------------.|
| 17,750 | 9,240.152 | 7,253.0 |
| 3,517 | 13,600.0 | 6,675 |
| 18,580.586 | 8,353.3 | 13,107.3 |
| 7,725 | 355.3 | 14,823.2 |
| None | 10,721.6 | 7,713 |
| 12,779.1592 | 14,583.867 | 3,153.2 |
| 18,850.03 | -4,756.6686 | 13,081.74 |
| -2,264.80 | 13,772.729 | 12,557 |
| -17,141.001 | 26,227.27 | — |
| 2,772.35 | 14,772.1509 | -3,814.0 |
| 934 | 20,853 | -10,272.2 |
| 8,082.139 | 7,864.0048 | 1,583.010 |
The key features of the new filter are:
- Decimal points are aligned (duh).
- Numbers without decimal points are right-aligned just before the decimal point.
- Decimal-aligned lists of figures are centered in their columns.
- Entries that aren’t numbers are also centered in their columns.
The decimal points at either end of the format line (.----.
) tell the filter to use decimal alignment on that column. I should point out that the extra spaces the filter adds have no effect on the processed output.
HTML doesn’t have a good and widely supported way to align table columns on decimal points, but that’s OK, because my reports get transformed into LaTeX, not HTML, and LaTeX has a nice dcolumn package for decimal alignment.
Here’s the new version of the filter:
1 #!/usr/bin/python
2
3 import sys
4 import re
5
6 decimal = re.compile(r'^( -?[0-9,]*)\.(\d* )$')
7 integer = re.compile(r'^( -?[0-9,]+) $')
8
9 def just(string, type, n, b, d):
10 "Justify a string to length n according to type."
11
12 if type == '::':
13 return string.center(n)
14 elif type == '-:':
15 return string.rjust(n)
16 elif type == ':-':
17 return string.ljust(n)
18 elif type == '..':
19 isdec = decimal.search(string)
20 isint = integer.search(string)
21 if isdec:
22 before = len(isdec.group(1))
23 after = len(isdec.group(2))
24 string = ' ' * (b - before) + string + ' ' * (d - after)
25 elif isint:
26 before = len(isint.group(1))
27 string = ' ' * (b - before) + string + ' ' * d
28 return string.center(n)
29 else:
30 return string
31
32
33 def normtable(text):
34 "Aligns the vertical bars in a text table."
35
36 # Start by turning the text into a list of lines.
37 lines = text.splitlines()
38 rows = len(lines)
39
40 # Figure out the cell formatting.
41 # First, find the separator line.
42 for i in range(rows):
43 if set(lines[i]).issubset('|:.- '):
44 formatline = lines[i]
45 formatrow = i
46 break
47
48 # Delete the separator line from the content.
49 del lines[formatrow]
50
51 # Determine how each column is to be justified.
52 formatline = formatline.strip(' ')
53 if formatline[0] == '|': formatline = formatline[1:]
54 if formatline[-1] == '|': formatline = formatline[:-1]
55 fstrings = formatline.split('|')
56 justify = []
57 for cell in fstrings:
58 ends = cell[0] + cell[-1]
59 if ends in ['::', ':-', '-:', '..']:
60 justify.append(ends)
61 else:
62 justify.append(':-')
63
64 # Assume the number of columns in the separator line is the number
65 # for the entire table.
66 columns = len(justify)
67
68 # Extract the content into a matrix.
69 content = []
70 for line in lines:
71 line = line.strip(' ')
72 if line[0] == '|': line = line[1:]
73 if line[-1] == '|': line = line[:-1]
74 cells = line.split('|')
75 # Put exactly one space at each end as "bumpers."
76 linecontent = [ ' ' + x.strip() + ' ' for x in cells ]
77 content.append(linecontent)
78
79 # Append cells to rows that don't have enough.
80 rows = len(content)
81 for i in range(rows):
82 while len(content[i]) < columns:
83 content[i].append('')
84
85 # Get the width of the content in each column. The minimum width will
86 # be 2, because that's the shortest length of a formatting string and
87 # because that matches an empty column with "bumper" spaces.
88 widths = [2] * columns
89 beforedots = [0] * columns
90 afterdots = [0] * columns
91 for row in content:
92 for i in range(columns):
93 isdec = decimal.search(row[i])
94 isint = integer.search(row[i])
95 if isdec:
96 beforedots[i] = max(len(isdec.group(1)), beforedots[i])
97 afterdots[i] = max(len(isdec.group(2)), afterdots[i])
98 elif isint:
99 beforedots[i] = max(len(isint.group(1)), beforedots[i])
100 widths[i] = max(len(row[i]), beforedots[i] + afterdots[i] + 1, widths[i])
101
102 # Add whitespace to make all the columns the same width.
103 formatted = []
104 for row in content:
105 formatted.append('|' + '|'.join([ just(s, t, n, b, d) for (s, t, n, b, d) in zip(row, justify, widths, beforedots, afterdots) ]) + '|')
106
107 # Recreate the format line with the appropriate column widths.
108 formatline = '|' + '|'.join([ s[0] + '-'*(n-2) + s[-1] for (s, n) in zip(justify, widths) ]) + '|'
109
110 # Insert the formatline back into the table.
111 formatted.insert(formatrow, formatline)
112
113 # Return the formatted table.
114 return '\n'.join(formatted)
115
116
117 # Read the input, process, and print.
118 unformatted = unicode(sys.stdin.read(), "utf-8")
119 print normtable(unformatted)
The main additions come in Lines 88–100 and Lines 18–28. Instead of just tracking the length of each cell, the filter now also tracks the number of characters before and after the decimal point.
Although I use this in BBEdit, there’s no reason it couldn’t be adapted to other intelligent text editors. It uses standard input and standard output, so it ought to be easy to incorporate into a Sublime Text, TextMate, Vim, or Emacs workflow.
-
If you use commas as your decimal indicator, you’ll have to edit the filter a bit to get it to work for you. Be aware, though, that you’ll still need to use periods in the formatting line—MultiMarkdown doesn’t allow commas there. ↩