Factories

Since writing this post about using Pandas and Matplotlib to plot the progression of the National League’s wildcard race, I’ve been periodically updating the input text files and remaking the graph. Here’s what it looked like as of this morning:

Wildcard plot

As I mentioned in that earlier post, I get the data for each game by copying and pasting from the mobile version of the Baseball Reference site. The game results come in looking like this:

CHC 7 v. ATL 1, Aug 20th
CHC 5 v. ATL 3, Aug 21st
CHC 9 v. ATL 7, Aug 22nd
CHC 9 v. ATL 3, Aug 23rd
CHC 2 v. CLE 1, Aug 24th
CHC 8 @ SFG 5, Aug 25th
CHC 2 @ SFG 4, Aug 26th

and I need to transform them into this

7     1     ATL   H     Aug-20
5     3     ATL   H     Aug-21
9     7     ATL   H     Aug-22
9     3     ATL   H     Aug-23
2     1     CLE   H     Aug-24
8     5     SFG   A     Aug-25
2     4     SFG   A     Aug-26

with tabs between each column. When I first did this transformation for that post, I just ran through a few manipulations using BBEdit’s find and replace tools, kind of making it up as I went along. But when I found myself in the habit of updating the graph a couple of times a week, I realized I needed something more automated. The quickest way to get what I wanted was to build a Text Factory.

Text Factories are a BBEdit feature that I don’t use as often as I should. They consist of a simple list of transformations (replacements, deletions, case changes, entabbing/detabbing, etc.) that are applied one after another to either the selected text (if there is any) or the document as a whole. Their genius lies in their simplicity: there’s nothing a Text Factory can do that a Perl, Python, or Ruby script couldn’t do, but for a simple series of transformations, Text Factories take less time to build and debug.

Here’s my Text Factory for transforming the Baseball Reference game data:

BBRef Text Factory

The first step is the trickiest. It’s a Grep (regular expression) that searches for

^[A-Z]{3}\s+(\d+)\s+([^ ]+)\s+([A-Z]{3})\s+(\d+),\s+([A-Za-z]+)\s(\d+)(st|nd|rd|th)$

and replaces it with

\1\t\4\t\3\t\2\t\5-\6

By itself, it turns lines like

CHC 2 v. CLE 1, Aug 24th
CHC 8 @ SFG 5, Aug 25th

into

2     1     CLE   v.    Aug-24
8     5     SFG   @     Aug-25
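For readers without BBEdit, here is a rough Python equivalent of this first step. It is a sketch for illustration, not part of the Text Factory itself. The search and replacement strings are copied from above; the `re.MULTILINE` flag is an assumption on my part, added so that `^` and `$` match at every line of a multi-line string. The names `SEARCH`, `REPLACE`, `raw`, and `converted` are made up for the example.

```python
import re

# Search pattern copied from the Text Factory's first step.
SEARCH = re.compile(
    r"^[A-Z]{3}\s+(\d+)\s+([^ ]+)\s+([A-Z]{3})\s+(\d+),"
    r"\s+([A-Za-z]+)\s(\d+)(st|nd|rd|th)$",
    re.MULTILINE,  # assumption: lets ^ and $ match at each line
)

# Replacement copied from above: Python's re.sub understands
# numbered backreferences and \t in a raw replacement string.
REPLACE = r"\1\t\4\t\3\t\2\t\5-\6"

raw = "CHC 2 v. CLE 1, Aug 24th\nCHC 8 @ SFG 5, Aug 25th"
converted = SEARCH.sub(REPLACE, raw)
print(converted)
```

The printed lines should match the two tab-separated lines shown above, with the “v.” and “@” markers still in place.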

The regex search pattern is probably longer than it needs to be because I’ve added some defensive features, like using \s+ as field separators even though every example I’ve seen separates the parts with just a single space character. I’ve had enough experience with regexes breaking because of an unexpected extra space to know that sacrificing brevity for robustness is a worthwhile tradeoff.
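To make that tradeoff concrete, here is a small Python sketch comparing the pattern above with a shorter one that assumes exactly one space between fields. The `STRICT` pattern and the doubled-space input line are both hypothetical, invented only for this comparison; nothing like them appears in the real data.

```python
import re

# Hypothetical "brief" pattern: exactly one space between fields.
STRICT = re.compile(
    r"^[A-Z]{3} (\d+) ([^ ]+) ([A-Z]{3}) (\d+), ([A-Za-z]+) (\d+)(st|nd|rd|th)$"
)

# The defensive pattern from the post, with \s+ as the separator.
DEFENSIVE = re.compile(
    r"^[A-Z]{3}\s+(\d+)\s+([^ ]+)\s+([A-Z]{3})\s+(\d+),"
    r"\s+([A-Za-z]+)\s(\d+)(st|nd|rd|th)$"
)

clean = "CHC 8 @ SFG 5, Aug 25th"
sloppy = "CHC 8  @ SFG 5, Aug 25th"  # made-up line with a doubled space

for line in (clean, sloppy):
    # Report whether each pattern matches the line.
    print(repr(line), bool(STRICT.match(line)), bool(DEFENSIVE.match(line)))
```

Both patterns match the clean line, but only the defensive one matches the line with the doubled space. One caveat: the pattern as written uses a single `\s` between the month and the day, so a doubled space in that spot would still cause a miss.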

With the hardest part done, the following two steps change the “v.” into “H” and the “@” into “A.” That completes the transformation and makes the input files ready for the plotting script.
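For comparison, the three steps could be sketched in Python along the following lines. This is an illustration under stated assumptions, not a script that is actually used for the plot: the function name `convert` is made up, and the exact settings of the Text Factory's second and third steps aren't spelled out above, so the sketch assumes they only need to touch “v.” and “@” where they sit alone between two tabs.

```python
import re

# Pattern from the first step; MULTILINE so ^ and $ work per line.
GAME = re.compile(
    r"^[A-Z]{3}\s+(\d+)\s+([^ ]+)\s+([A-Z]{3})\s+(\d+),"
    r"\s+([A-Za-z]+)\s(\d+)(st|nd|rd|th)$",
    re.MULTILINE,
)


def convert(text):
    """Apply the three transformation steps in order (hypothetical helper)."""
    # Step 1: reorder the fields and separate them with tabs.
    text = GAME.sub(r"\1\t\4\t\3\t\2\t\5-\6", text)
    # Steps 2 and 3: home/away markers. Assumption: replace a marker
    # only when it is a whole tab-delimited field.
    text = text.replace("\tv.\t", "\tH\t")
    text = text.replace("\t@\t", "\tA\t")
    return text


sample = "CHC 7 v. ATL 1, Aug 20th\nCHC 8 @ SFG 5, Aug 25th"
print(convert(sample))
```

Restricting the last two replacements to whole fields is a cautious choice for the sketch; with input this regular, a plain replacement of “v.” and “@” would give the same result.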

The downside of using a Text Factory is that it locks me into BBEdit. I can’t do the transformation from the command line or use it as part of some longer pipeline, as I could if I’d written it as a Python script. For this transformation, that loss of flexibility is no big deal, as I don’t expect to be making this plot much longer. Rewriting it as a regular script isn’t worth the effort.

What’s kept me interested the past few weeks has been the tremendous run of success the Cubs have had recently, but I know myself and the Cubs too well to expect that to continue.

So I’m enjoying this August but waiting for the inevitable disgust to set in. When it does, the Text Factory will be shuttered and its regular expressions laid off.