Aligning text tables on the decimal point

I’ve recently written several reports with tables of numbers, and as I made more and more tables of this type, it became clear that my Normalize Table filter for BBEdit needed some updating. Now it can align numbers on the decimal point.

Here’s an example. Suppose I start with a table in MultMarkdown format

| Very long description | Short | Intermediate |
|.--.|.--.|.--.|
| 17,750 | 9,240.152 | 7,253.0 |
| 3,517 | 13,600.0 | 6,675 |
| 18,580.586 | 8,353.3 | 13,107.3 |
| 7,725 | 355.3 | 14,823.2 |
| None | 10,721.6 | 7,713 |
| 12,779.1592 | 14,583.867 | 3,153.2 |
| 18,850.03 | -4,756.6686 | 13,081.74 |
| -2,264.80 | 13,772.729 | 12,557 |
| -17,141.001 | 26,227.27 | — |
| 2,772.35 | 14,772.1509 | -3,814.0 |
| 934 | 20,853 | -10,272.2 |
| 8,082.139 | 7,864.0048 | 1,583.010 |

This is legal, but it’s impossible to read as plain text. Since readability is the chief virtue of Markdown, I want to adjust the spacing so all the columns line up. The older version of my filter worked well for left-aligned, right-aligned, and centered columns, but it made no attempt to align numeric columns on the decimal point. Now it does. After running the above through the Normalize Table filter, it comes out looking like this:

| Very long description |    Short    | Intermediate |
|.---------------------.|.-----------.|.------------.|
|       17,750          |  9,240.152  |   7,253.0    |
|        3,517          | 13,600.0    |   6,675      |
|       18,580.586      |  8,353.3    |  13,107.3    |
|        7,725          |    355.3    |  14,823.2    |
|          None         | 10,721.6    |   7,713      |
|       12,779.1592     | 14,583.867  |   3,153.2    |
|       18,850.03       | -4,756.6686 |  13,081.74   |
|       -2,264.80       | 13,772.729  |  12,557      |
|      -17,141.001      | 26,227.27   |      —       |
|        2,772.35       | 14,772.1509 |  -3,814.0    |
|          934          | 20,853      | -10,272.2    |
|        8,082.139      |  7,864.0048 |   1,583.010  |

The key features of the new filter are:

  1. Decimal points are aligned (duh).
  2. Numbers without decimal points are right-aligned just before the decimal point.
  3. Decimal-aligned lists of figures are centered in their columns.
  4. Entries that aren’t numbers are also centered in their columns.

The decimal points at either end of the format line (.----.) tell the filter to use decimal alignment on that column. I should point out that the extra spaces the filter adds have no effect on the processed output.

HTML doesn’t have a good and widely supported way to align table columns on decimal points, but that’s OK, because my reports get transformed into LaTeX, not HTML, and LaTeX has a nice dcolumn package for decimal alignment.

Here’s the new version of the filter:

  1  #!/usr/bin/python
  2  
  3  import sys
  4  import re
  5  
  6  decimal = re.compile(r'^( -?[0-9,]*)\.(\d* )$')
  7  integer = re.compile(r'^( -?[0-9,]+) $')
  8  
  9  def just(string, type, n, b, d):
 10    "Justify a string to length n according to type."
 11    
 12    if type == '::':
 13      return string.center(n)
 14    elif type == '-:':
 15      return string.rjust(n)
 16    elif type == ':-':
 17      return string.ljust(n)
 18    elif type == '..':
 19      isdec = decimal.search(string)
 20      isint = integer.search(string)
 21      if isdec:
 22        before = len(isdec.group(1))
 23        after = len(isdec.group(2))
 24        string = ' ' * (b - before) + string + ' ' * (d - after)
 25      elif isint:
 26        before = len(isint.group(1))
 27        string = ' ' * (b - before) + string + ' ' * d
 28      return string.center(n)
 29    else:
 30      return string
 31  
 32  
 33  def normtable(text):
 34    "Aligns the vertical bars in a text table."
 35    
 36    # Start by turning the text into a list of lines.
 37    lines = text.splitlines()
 38    rows = len(lines)
 39    
 40    # Figure out the cell formatting.
 41    # First, find the separator line.
 42    for i in range(rows):
 43      if set(lines[i]).issubset('|:.- '):
 44        formatline = lines[i]
 45        formatrow = i
 46        break
 47    
 48    # Delete the separator line from the content.
 49    del lines[formatrow]
 50    
 51    # Determine how each column is to be justified.
 52    formatline = formatline.strip(' ')
 53    if formatline[0] == '|': formatline = formatline[1:]
 54    if formatline[-1] == '|': formatline = formatline[:-1]
 55    fstrings = formatline.split('|')
 56    justify = []
 57    for cell in fstrings:
 58      ends = cell[0] + cell[-1]
 59      if ends in ['::', ':-', '-:', '..']:
 60        justify.append(ends)
 61      else:
 62        justify.append(':-')
 63    
 64    # Assume the number of columns in the separator line is the number
 65    # for the entire table.
 66    columns = len(justify)
 67    
 68    # Extract the content into a matrix.
 69    content = []
 70    for line in lines:
 71      line = line.strip(' ')
 72      if line[0] == '|': line = line[1:]
 73      if line[-1] == '|': line = line[:-1]
 74      cells = line.split('|')
 75      # Put exactly one space at each end as "bumpers."
 76      linecontent = [ ' ' + x.strip() + ' ' for x in cells ]
 77      content.append(linecontent)
 78    
 79    # Append cells to rows that don't have enough.
 80    rows = len(content)
 81    for i in range(rows):
 82      while len(content[i]) < columns:
 83        content[i].append('')
 84    
 85    # Get the width of the content in each column. The minimum width will
 86    # be 2, because that's the shortest length of a formatting string and
 87    # because that matches an empty column with "bumper" spaces.
 88    widths = [2] * columns
 89    beforedots = [0] * columns
 90    afterdots = [0] * columns
 91    for row in content:
 92      for i in range(columns):
 93        isdec = decimal.search(row[i])
 94        isint = integer.search(row[i])
 95        if isdec:
 96          beforedots[i] = max(len(isdec.group(1)), beforedots[i])
 97          afterdots[i] = max(len(isdec.group(2)), afterdots[i])
 98        elif isint:
 99          beforedots[i] = max(len(isint.group(1)), beforedots[i])
100        widths[i] = max(len(row[i]), beforedots[i] + afterdots[i] + 1, widths[i])
101        
102    # Add whitespace to make all the columns the same width. 
103    formatted = []
104    for row in content:
105      formatted.append('|' + '|'.join([ just(s, t, n, b, d) for (s, t, n, b, d) in zip(row, justify, widths, beforedots, afterdots) ]) + '|')
106    
107    # Recreate the format line with the appropriate column widths.
108    formatline = '|' + '|'.join([ s[0] + '-'*(n-2) + s[-1] for (s, n) in zip(justify, widths) ]) + '|'
109    
110    # Insert the formatline back into the table.
111    formatted.insert(formatrow, formatline)
112    
113    # Return the formatted table.
114    return '\n'.join(formatted)
115  
116          
117  # Read the input, process, and print.
118  unformatted = unicode(sys.stdin.read(), "utf-8")
119  print normtable(unformatted)

The main additions come in Lines 88–100 and Lines 18–28. Instead of just tracking the length of each cell, the filter now also tracks the number of characters before and after the decimal point.

1. The overall width of the column is then either the longest text entry (including the header entry) or the sum of the longest lengths before and after the decimal point. If necessary, extra spaces are added before and after the number to create a new string that’s then centered in column.

Although I use this in BBEdit, there’s no reason it couldn’t be adapted to other intelligent text editors. It uses standard input and standard output, so it ought to be easy to incorporate into a Sublime Text, TextMate, Vim, or Emacs workflow.


  1. If you use commas as your decimal indicator, you’ll have to edit the filter a bit to get it to work for you. Be aware, though, that you’ll still need to use periods in the formatting line—MultiMarkdown doesn’t allow commas there.