Aligning text tables on the decimal point

I’ve recently written several reports with tables of numbers, and as I made more and more tables of this type, it became clear that my Normalize Table filter for BBEdit needed some updating. Now it can align numbers on the decimal point.

Here’s an example. Suppose I start with a table in MultMarkdown format

| Very long description | Short | Intermediate |
|.--.|.--.|.--.|
| 17,750 | 9,240.152 | 7,253.0 |
| 3,517 | 13,600.0 | 6,675 |
| 18,580.586 | 8,353.3 | 13,107.3 |
| 7,725 | 355.3 | 14,823.2 |
| None | 10,721.6 | 7,713 |
| 12,779.1592 | 14,583.867 | 3,153.2 |
| 18,850.03 | -4,756.6686 | 13,081.74 |
| -2,264.80 | 13,772.729 | 12,557 |
| -17,141.001 | 26,227.27 | — |
| 2,772.35 | 14,772.1509 | -3,814.0 |
| 934 | 20,853 | -10,272.2 |
| 8,082.139 | 7,864.0048 | 1,583.010 |

This is legal, but it’s impossible to read as plain text. Since readability is the chief virtue of Markdown, I want to adjust the spacing so all the columns line up. The older version of my filter worked well for left-aligned, right-aligned, and centered columns, but it made no attempt to align numeric columns on the decimal point. Now it does. After running the above through the Normalize Table filter, it comes out looking like this:

| Very long description |    Short    | Intermediate |
|.---------------------.|.-----------.|.------------.|
|       17,750          |  9,240.152  |   7,253.0    |
|        3,517          | 13,600.0    |   6,675      |
|       18,580.586      |  8,353.3    |  13,107.3    |
|        7,725          |    355.3    |  14,823.2    |
|          None         | 10,721.6    |   7,713      |
|       12,779.1592     | 14,583.867  |   3,153.2    |
|       18,850.03       | -4,756.6686 |  13,081.74   |
|       -2,264.80       | 13,772.729  |  12,557      |
|      -17,141.001      | 26,227.27   |      —       |
|        2,772.35       | 14,772.1509 |  -3,814.0    |
|          934          | 20,853      | -10,272.2    |
|        8,082.139      |  7,864.0048 |   1,583.010  |

The key features of the new filter are:

  1. Decimal points are aligned (duh).
  2. Numbers without decimal points are right-aligned just before the decimal point.
  3. Decimal-aligned lists of figures are centered in their columns.
  4. Entries that aren’t numbers are also centered in their columns.

The decimal points at either end of the format line (.----.) tell the filter to use decimal alignment on that column. I should point out that the extra spaces the filter adds have no effect on the processed output.

HTML doesn’t have a good and widely supported way to align table columns on decimal points, but that’s OK, because my reports get transformed into LaTeX, not HTML, and LaTeX has a nice dcolumn package for decimal alignment.

Here’s the new version of the filter:

python:
  1:  #!/usr/bin/python
  2:  
  3:  import sys
  4:  import re
  5:  
  6:  decimal = re.compile(r'^( -?[0-9,]*)\.(\d* )$')
  7:  integer = re.compile(r'^( -?[0-9,]+) $')
  8:  
  9:  def just(string, type, n, b, d):
 10:    "Justify a string to length n according to type."
 11:    
 12:    if type == '::':
 13:      return string.center(n)
 14:    elif type == '-:':
 15:      return string.rjust(n)
 16:    elif type == ':-':
 17:      return string.ljust(n)
 18:    elif type == '..':
 19:      isdec = decimal.search(string)
 20:      isint = integer.search(string)
 21:      if isdec:
 22:        before = len(isdec.group(1))
 23:        after = len(isdec.group(2))
 24:        string = ' ' * (b - before) + string + ' ' * (d - after)
 25:      elif isint:
 26:        before = len(isint.group(1))
 27:        string = ' ' * (b - before) + string + ' ' * d
 28:      return string.center(n)
 29:    else:
 30:      return string
 31:  
 32:  
 33:  def normtable(text):
 34:    "Aligns the vertical bars in a text table."
 35:    
 36:    # Start by turning the text into a list of lines.
 37:    lines = text.splitlines()
 38:    rows = len(lines)
 39:    
 40:    # Figure out the cell formatting.
 41:    # First, find the separator line.
 42:    for i in range(rows):
 43:      if set(lines[i]).issubset('|:.- '):
 44:        formatline = lines[i]
 45:        formatrow = i
 46:        break
 47:    
 48:    # Delete the separator line from the content.
 49:    del lines[formatrow]
 50:    
 51:    # Determine how each column is to be justified.
 52:    formatline = formatline.strip(' ')
 53:    if formatline[0] == '|': formatline = formatline[1:]
 54:    if formatline[-1] == '|': formatline = formatline[:-1]
 55:    fstrings = formatline.split('|')
 56:    justify = []
 57:    for cell in fstrings:
 58:      ends = cell[0] + cell[-1]
 59:      if ends in ['::', ':-', '-:', '..']:
 60:        justify.append(ends)
 61:      else:
 62:        justify.append(':-')
 63:    
 64:    # Assume the number of columns in the separator line is the number
 65:    # for the entire table.
 66:    columns = len(justify)
 67:    
 68:    # Extract the content into a matrix.
 69:    content = []
 70:    for line in lines:
 71:      line = line.strip(' ')
 72:      if line[0] == '|': line = line[1:]
 73:      if line[-1] == '|': line = line[:-1]
 74:      cells = line.split('|')
 75:      # Put exactly one space at each end as "bumpers."
 76:      linecontent = [ ' ' + x.strip() + ' ' for x in cells ]
 77:      content.append(linecontent)
 78:    
 79:    # Append cells to rows that don't have enough.
 80:    rows = len(content)
 81:    for i in range(rows):
 82:      while len(content[i]) < columns:
 83:        content[i].append('')
 84:    
 85:    # Get the width of the content in each column. The minimum width will
 86:    # be 2, because that's the shortest length of a formatting string and
 87:    # because that matches an empty column with "bumper" spaces.
 88:    widths = [2] * columns
 89:    beforedots = [0] * columns
 90:    afterdots = [0] * columns
 91:    for row in content:
 92:      for i in range(columns):
 93:        isdec = decimal.search(row[i])
 94:        isint = integer.search(row[i])
 95:        if isdec:
 96:          beforedots[i] = max(len(isdec.group(1)), beforedots[i])
 97:          afterdots[i] = max(len(isdec.group(2)), afterdots[i])
 98:        elif isint:
 99:          beforedots[i] = max(len(isint.group(1)), beforedots[i])
100:        widths[i] = max(len(row[i]), beforedots[i] + afterdots[i] + 1, widths[i])
101:        
102:    # Add whitespace to make all the columns the same width. 
103:    formatted = []
104:    for row in content:
105:      formatted.append('|' + '|'.join([ just(s, t, n, b, d) for (s, t, n, b, d) in zip(row, justify, widths, beforedots, afterdots) ]) + '|')
106:    
107:    # Recreate the format line with the appropriate column widths.
108:    formatline = '|' + '|'.join([ s[0] + '-'*(n-2) + s[-1] for (s, n) in zip(justify, widths) ]) + '|'
109:    
110:    # Insert the formatline back into the table.
111:    formatted.insert(formatrow, formatline)
112:    
113:    # Return the formatted table.
114:    return '\n'.join(formatted)
115:  
116:          
117:  # Read the input, process, and print.
118:  unformatted = unicode(sys.stdin.read(), "utf-8")
119:  print normtable(unformatted)

The main additions come in Lines 88–100 and Lines 18–28. Instead of just tracking the length of each cell, the filter now also tracks the number of characters before and after the decimal point.1. The overall width of the column is then either the longest text entry (including the header entry) or the sum of the longest lengths before and after the decimal point. If necessary, extra spaces are added before and after the number to create a new string that’s then centered in column.

Although I use this in BBEdit, there’s no reason it couldn’t be adapted to other intelligent text editors. It uses standard input and standard output, so it ought to be easy to incorporate into a Sublime Text, TextMate, Vim, or Emacs workflow.


  1. If you use commas as your decimal indicator, you’ll have to edit the filter a bit to get it to work for you. Be aware, though, that you’ll still need to use periods in the formatting line—MultiMarkdown doesn’t allow commas there.