Flight schedule reformatting in BBEdit

One of the handiest things to have when I’m on a business trip is a short list of alternate return flights. I schedule my return flight based on a best guess as to how long the work will last, but it’s common for me to be done a couple of hours early or to run a couple of hours long. The fastest way to choose a new return flight is to have the alternatives already saved on my phone and available for quick consultation.

I used to print this information out on an index card clipped to my Hipster PDA. You may think that connected smartphones and airline apps have made this sort of thing obsolete, but in my experience it’s still faster to have the flight numbers and times at your fingertips than it is to search for them in an app. I don’t print my alternate return flight schedules on index cards anymore, but I do save them as a text file on my phone.

The biggest problem in creating the list of alternative return flights is the convoluted formatting used on airline websites. I usually fly Southwest, and when I search for flights, Southwest’s website returns information in this form:

Southwest flight schedule

If I select the text of the schedule, copy it, and paste it into my text editor (BBEdit), it looks like this:

Schedule pasted in BBEdit

You can see all the information, but it’s a mess. Every cell of the table is on its own line, and different flights are represented by different numbers of lines.

What I want is something that looks more like this:

Reformatted schedule in BBEdit

The text is a little jagged because I don’t bother adding spaces to single-digit hours. I usually view this on my phone in a proportional font, so aligning by adding spaces won’t work. Here’s what it looks like in Editorial:

Reformatted schedule in Editorial

This list typically goes at the end of a small file that has all the hotel, car rental, and meeting place information I need for my trip. I create the file in BBEdit as I’m planning the trip and save it to Dropbox, where almost any text editor can pick it up.

Each line starts with the flight times, because that’s what I base my rescheduling decision on. Then come the flight number(s) and, if necessary, the intermediate airport at which I’ll change planes. Honestly, if there’s a change of planes, I usually avoid the flight; it’s too easy to miss connections and get stranded.
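The layout of a single line can be sketched as a small function. The name format_flight and the sample times, flight numbers, and MDW connection are invented for illustration. The spacing (a hyphen between the times, two spaces before SW, the connecting airport in parentheses) follows what the reformatting script produces.

```python
def format_flight(depart, arrive, flights, via=None):
    """Build one line of the alternate-flights list (illustrative sketch).

    depart, arrive: times already shortened, e.g. "4:15p"
    flights: flight-number strings, one per leg
    via: airport where planes change, or None if there is no change
    """
    line = f"{depart} - {arrive}  SW {'/'.join(flights)}"
    if via is not None:
        line += f" ({via})"
    return line

# Hypothetical values, not real Southwest flights.
print(format_flight("6:05a", "9:40a", ["1234"]))
print(format_flight("4:15p", "9:50p", ["567", "890"], via="MDW"))
```

Putting the times first keeps the departure time at the left edge, where the eye lands first when scanning for the next flight out.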

The reformatting is done by the following script, named Reformat SWA Schedule.py and saved in BBEdit’s Text Filters folder, which makes it available from the Text‣Apply Text Filter submenu.

 1:  #!/usr/bin/env python3
 2:
 3:  import re
 4:  import sys
 5:
 6:  # Read standard input into lines.
 7:  lines = sys.stdin.readlines()
 8:
 9:  # Filter out the junk lines.
10:  lines = [ x for x in lines if 'Unavailable' not in x ]
11:  lines = [ x for x in lines if 'Sold Out' not in x ]
12:  lines = [ x for x in lines if 'No Plane Change' not in x ]
13:  lines = [ x for x in lines if not re.search(r'\$\d+', x) ]  # dollar amounts
14:  lines = [ x for x in lines if not re.search(r'(1 |2 |Non)stop', x) ]  # stop counts
15:  lines = [ x for x in lines if not re.search(r'^(\d+h \d+m|\d+h|\d+m)$', x) ]  # durations
16:
17:  # Assemble into a single chunk of text.
18:  text = ''.join(lines)
19:
20:  # Filter out extraneous text.
21:  text = text.replace(' (opens popup)', '')
22:  text = text.replace(' Connecting Flight\n', '/')
23:
24:  # Assemble flight numbers and plane changes.
25:  text = re.sub(r'^((\d{2,4})(/\d{2,4}){0,2})$', r'SW \1', text, flags=re.M)
26:  text = re.sub(r'\nChange Planes (\S+)$', r' (\1)', text, flags=re.M)
27:
28:  # Assemble flight times.
29:  text = re.sub(r'(^\d+:\d+ (A|P)M)\n(\d+:\d+ (A|P)M)$',
30:                r'\1 - \3', text, flags=re.M)
31:  text = re.sub(r'(:\d\d) AM', r'\1a', text)
32:  text = re.sub(r'(:\d\d) PM', r'\1p', text)
33:
34:  # Put the flight numbers on the same lines as the times.
35:  text = re.sub(r'(:\d\d(a|p))\nSW', r'\1  SW', text)
36:
37:  print('Alternate return flights')
38:  print(text)

Most of this is pretty straightforward deletion and substitution, and because it uses standard input and output, it can be used with other text editors or from the command line.
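Because everything between reading standard input and printing is plain string manipulation, one way to check the steps without BBEdit is to move them into a function that takes and returns a string. The sketch below does that. The function name reformat_swa is hypothetical, the three status-word filters from Lines 10–12 are merged into one test, and the sample input is invented to fit the patterns the regexes look for; it is not copied from Southwest’s site.

```python
import re

def reformat_swa(text):
    """Same steps as the filter script, but taking and returning a string."""
    # Drop whole junk lines: status words, dollar amounts, stop counts, durations.
    lines = text.splitlines(keepends=True)
    junk_words = ('Unavailable', 'Sold Out', 'No Plane Change')
    lines = [x for x in lines if not any(w in x for w in junk_words)]
    lines = [x for x in lines if not re.search(r'\$\d+', x)]
    lines = [x for x in lines if not re.search(r'(1 |2 |Non)stop', x)]
    lines = [x for x in lines if not re.search(r'^(\d+h \d+m|\d+h|\d+m)$', x)]
    text = ''.join(lines)
    # Remove link text and join connecting flight numbers with slashes.
    text = text.replace(' (opens popup)', '')
    text = text.replace(' Connecting Flight\n', '/')
    # Prefix flight numbers with "SW" and attach any plane-change airport.
    text = re.sub(r'^((\d{2,4})(/\d{2,4}){0,2})$', r'SW \1', text, flags=re.M)
    text = re.sub(r'\nChange Planes (\S+)$', r' (\1)', text, flags=re.M)
    # Put departure and arrival on one line; shorten AM/PM to a/p.
    text = re.sub(r'(^\d+:\d+ (A|P)M)\n(\d+:\d+ (A|P)M)$',
                  r'\1 - \3', text, flags=re.M)
    text = re.sub(r'(:\d\d) AM', r'\1a', text)
    text = re.sub(r'(:\d\d) PM', r'\1p', text)
    # Pull each flight-number line up onto its times line.
    text = re.sub(r'(:\d\d(a|p))\nSW', r'\1  SW', text)
    return 'Alternate return flights\n' + text

# Invented sample in the shape the regexes expect.
SAMPLE = """\
6:05 AM
9:40 AM
1234 (opens popup)
Nonstop
No Plane Change
3h 35m
$129
4:15 PM
9:50 PM
567 (opens popup) Connecting Flight
890 (opens popup)
Change Planes MDW
1 stop
5h 35m
Sold Out
"""

print(reformat_swa(SAMPLE), end='')
# Prints:
# Alternate return flights
# 6:05a - 9:40a  SW 1234
# 4:15p - 9:50p  SW 567/890 (MDW)
```

To use this version as a BBEdit text filter, the script would import sys and end with sys.stdout.write(reformat_swa(sys.stdin.read())), so the editor still sees a plain stdin-to-stdout filter.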

One thing that may seem odd to you is that I start, on Line 7, by reading standard input into a list of lines rather than just a single chunk of text. I found it was easier to delete entire lines of junk that way. After the junk lines are deleted in Lines 10–15, I put the remaining lines together in Line 18 for the rest of the reformatting.
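A small illustration of why line-by-line deletion is simpler, using only two of the script’s junk patterns and an invented snippet. When the text is one chunk, the regex has to match the whole line plus its newline, and a junk line at the very end with no trailing newline slips through:

```python
import re

# Two of the script's junk tests, combined into one pattern for this sketch.
JUNK = re.compile(r'Sold Out|\$\d+')

def drop_junk_lines(text):
    """Split into lines first; each test then only has to find the junk."""
    lines = text.splitlines(keepends=True)
    return ''.join(x for x in lines if not JUNK.search(x))

def drop_junk_chunk(text):
    """One chunk: the regex must also consume the rest of the line and its newline."""
    return re.sub(r'^.*(?:Sold Out|\$\d+).*\n', '', text, flags=re.M)

snippet = "6:05 AM\nSold Out\n$129\n9:40 AM\n"   # invented sample
print(repr(drop_junk_lines(snippet)))   # '6:05 AM\n9:40 AM\n'
print(repr(drop_junk_chunk(snippet)))   # '6:05 AM\n9:40 AM\n'

tail = "9:40 AM\nSold Out"              # last line lacks a newline
print(repr(drop_junk_lines(tail)))      # '9:40 AM\n'
print(repr(drop_junk_chunk(tail)))      # '9:40 AM\nSold Out'
```

The chunk version could be patched with (\n|\Z), but every junk pattern would need that same wrapping, which is the bookkeeping the list of lines avoids.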

I’ve never seen a Southwest trip that has more than two flight numbers, but I’ve tried, in Line 25, to allow for the possibility of three. Since I have no examples to work from, there’s a decent chance that this line of code won’t work if I ever run into a three-legged trip.
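One cheap hedge against that uncertainty is to exercise Line 25’s regex by itself on made-up flight numbers. This tests the regex in isolation only. Whether a real three-leg itinerary reaches Line 25 as a single slash-separated line depends on how Line 22 joins the Connecting Flight lines, and that can’t be checked without a real example.

```python
import re

# Line 25 of the script: prefix "SW " to a line holding one to three
# flight numbers of 2-4 digits each, separated by slashes.
FLIGHTS = re.compile(r'^((\d{2,4})(/\d{2,4}){0,2})$', flags=re.M)

def tag_flights(text):
    return FLIGHTS.sub(r'SW \1', text)

# Invented inputs: one, two, three, and four legs, a one-digit number, a time.
for case in ["1234", "567/890", "12/345/6789", "12/345/678/901", "7", "6:05 AM"]:
    print(case, '->', tag_flights(case))
```

The three-number case is tagged, but four legs or a one-digit flight number would be left untouched, because the pattern allows at most three numbers of two to four digits each.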

In fact, this whole script will stop working the next time Southwest decides to redesign its website. Several years ago, I had a similar script that reformatted the tab-separated schedule Southwest was using back then. Because the changes in the schedule layout were so extensive, that script was of absolutely no help in writing this one. Such is the fragility of web scraping.