Weather history without the web

Weather plays an important role in many failures of engineering systems. Snow can overload a roof, ice can build up on transmission lines and towers, rain can infiltrate electrical boxes, and low temperatures can make oils viscous and difficult to pump. When I’m trying to figure out why a failure occurred, I often find myself collecting weather data from the website of the National Centers for Environmental Information. The information kept by the NCEI is both broad and deep, but its web interface is cryptic and clumsy to use, so I wrote a script to speed up the process. The script isn’t as comprehensive as the web interface, but it’s much faster at getting to the information I usually need.

My first complaint about the NCEI website is that the names for its various data products are long and obscure. The name of the information you’re looking for is seldom obvious, which makes it easy to forget from one visit to the next. Also, navigating the many pages that are supposed to lead you to the data set you want is tortuous, and it’s easy to wander down the wrong side path. My most useful tool for cutting through the NCEI nomenclature fog is this bookmark, which takes me directly to the access page for the data I usually want.

NCDC starting page

From this point on, the path to the data I want is no longer obscure, just tedious. Because there are so many weather stations across the country, putting them all in a single list would lead to endless scrolling, so the website has you drill down through a series of pages.

I don’t blame NCEI for this five-step process—they are, after all, serving a national audience—but because I’m almost always looking for data from just a handful of stations here in the Chicago area, I wanted a faster way to zip through these steps. So I studied the source code of the NCEI pages, figured out the names and input types of all the form elements, and wrote a script to replicate the drill-down process in a single step.
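
The kind of source-reading described above can be partly automated. Here’s a minimal sketch, using only Python 3’s standard html.parser module, that lists the name, type, and value of every form control inside a page’s <form> elements. The FormInputs class and form_fields function are hypothetical names invented for this sketch, and the sample HTML is made up for illustration (it borrows two field names from ncdc’s payload); it is not NCEI’s actual markup.

```python
from html.parser import HTMLParser


class FormInputs(HTMLParser):
    """Collect (name, type, value) for each input/select/textarea inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self.in_form = True
        elif self.in_form and tag in ('input', 'select', 'textarea'):
            # <select> and <textarea> have no type attribute, so report the tag name.
            ftype = attrs.get('type', 'text') if tag == 'input' else tag
            self.fields.append((attrs.get('name'), ftype, attrs.get('value')))

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False


def form_fields(html):
    """Return the list of form controls found in an HTML string."""
    parser = FormInputs()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up sample page, not the real NCEI form.
sample = '''
<form method="post" action="/example">
  <input type="hidden" name="prior" value="N">
  <select name="which"><option>A</option></select>
  <input type="submit" value="Go">
</form>
'''

for name, ftype, value in form_fields(sample):
    print(name, ftype, value)
```

Hidden inputs like prior in the sample are the easiest to overlook, because the browser never displays them, yet the server may still expect them in the POST.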

I call the script ncdc, because the NCEI was until recently known as the National Climatic Data Center, and I still think of it as the NCDC.[1] The script is set up with defaults that match my most common interests, so I usually need to type very little to get the data I want. For example,

ncdc 12/25/2014 > ord.html

will get me an HTML-formatted document with the hourly observation data at O’Hare for last Christmas.

Hourly snapshot page

If I want an ASCII-formatted file of that same data (which is easier to import into an analysis program), I use

ncdc -a 12/25/2014 > ord.txt

which gives me a CSV file. Here are examples of the files the script can generate:

Because ncdc works like any other Unix command, it’s easy to incorporate into pipelines and other scripts.
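
As an illustration of driving ncdc from another script, here’s a sketch in Python 3 (3.7 or later, for subprocess.run’s capture_output) that builds one ncdc command line per day over a date range and can optionally run them. It assumes ncdc is executable and on your PATH; the helper names ncdc_commands and fetch_all are hypothetical. The command lines follow the usage message: options first, then a single month/day/year date.

```python
import subprocess
from datetime import date, timedelta


def ncdc_commands(start, end, station='ORD', ascii=True):
    """Return one ncdc argument list per day from start to end, inclusive."""
    cmds = []
    day = start
    while day <= end:
        cmd = ['ncdc']
        if ascii:
            cmd.append('-a')                 # CSV output instead of HTML
        cmd += ['-s', station, day.strftime('%m/%d/%Y')]
        cmds.append(cmd)
        day += timedelta(days=1)
    return cmds


def fetch_all(cmds):
    """Run each command and collect its standard output (needs ncdc on the PATH)."""
    return [subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
            for cmd in cmds]


# Print the commands rather than running them.
for cmd in ncdc_commands(date(2014, 12, 24), date(2014, 12, 26), station='MDW'):
    print(' '.join(cmd))
```

For a whole calendar month, the -m option already gets everything in one request; a loop like this is more useful for ranges that don’t line up with a calendar month.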

The usage message (available via ncdc -h) gives a brief rundown of what it can do:

usage: ncdc [options] date
Return NCDC weather report for the given date.
Options:
  -a     : return ASCII file instead of HTML
  -d     : return daily summary instead of hourly reports
  -m     : entire month, not just a single day
  -p     : precipitation (hourly only, overrides -d)
  -s STA : the station abbreviation
           O'Hare     ORD (default)
           Midway     MDW
           Palwaukee  PWK
           Aurora     ARR
           Waukegan   UGN
           Lewis      LOT
           DuPage     DPA
           Lansing    IGQ
           Joliet     JOT
           Kankakee   IKK
           Gary       GYY
  -h     : print this message

Here’s the source code:

python:
  1:  #!/usr/bin/env python
  2:  
  3:  import getopt
  4:  import requests
  5:  import sys
  6:  from dateutil.parser import parse
  7:  
  8:  help = '''usage: ncdc [options] date
  9:  Return NCDC weather report for the given date.
 10:  Options:
 11:    -a     : return ASCII file instead of HTML
 12:    -d     : return daily summary instead of hourly reports
 13:    -m     : entire month, not just a single day
 14:    -p     : precipitation (hourly only, overrides -d)
 15:    -s STA : the station abbreviation
 16:             O'Hare     ORD (default)
 17:             Midway     MDW
 18:             Palwaukee  PWK
 19:             Aurora     ARR
 20:             Waukegan   UGN
 21:             Lewis      LOT
 22:             DuPage     DPA
 23:             Lansing    IGQ
 24:             Joliet     JOT
 25:             Kankakee   IKK
 26:             Gary       GYY
 27:    -h     : print this message
 28:  '''
 29:  
 30:  # The NCDC location.
 31:  url = 'http://www.ncdc.noaa.gov/qclcd/QCLCD'
 32:  
 33:  # Dictionary of stations.           
 34:  stations = {'MDW': '14819',
 35:              'ORD': '94846',
 36:              'PWK': '04838',
 37:              'ARR': '04808',
 38:              'UGN': '14880',
 39:              'LOT': '04831',
 40:              'DPA': '94892',
 41:              'IGQ': '04879',
 42:              'JOT': '14834',
 43:              'IKK': '04880',
 44:              'GYY': '04807'}
 45:  
 46:  # Dictionary of report types. The keys are a tuple of (ascii, daily, precip).
 47:  reports = {(False, False, False): 'LCD Hourly Obs (10A)',
 48:             (True, False, False):  'ASCII Download (Hourly Obs.) (10A)',
 49:             (False, True, False):  'LCD Daily Summary (10B)',
 50:             (True, True, False):   'ASCII Download (Daily Summ.) (10B)',
 51:             (False, False, True):  'LCD Hourly Precip',
 52:             (True, False, True):   'ASCII Download (Hourly Precip.)'}
 53:  
 54:  # Handle options.
 55:  sta = 'ORD'
 56:  ascii = False
 57:  daily = False
 58:  month = False
 59:  precip = False
 60:  try:
 61:    optlist, args = getopt.getopt(sys.argv[1:], 'adhmps:')
 62:  except getopt.GetoptError as err:
 63:    sys.stderr.write(str(err) + '\n')
 64:    sys.stderr.write(help)
 65:    sys.exit(2)
 66:  for o, a in optlist:
 67:    if o == '-h':
 68:      sys.stderr.write(help)
 69:      sys.exit()
 70:    elif o == '-a':
 71:      ascii = True
 72:    elif o == '-d':
 73:      daily = True
 74:    elif o == '-m':
 75:      month = True
 76:    elif o == '-p':
 77:      precip = True
 78:    elif o == '-s':
 79:      sta = a.upper()
 80:    else:
 81:      sys.stderr.write(help)
 82:      sys.exit(2)
 83:  if precip:
 84:    daily = False
 85:  
 86:  # The date is the first argument. All other arguments will be ignored.
 87:  d = parse(args[0], dayfirst=False)
 88:  
 89:  # Assemble the payload for the POST.
 90:  # Start with the parts that don't change.
 91:  payload = {'stnid': 'n/a', 'prior': 'N', 'version': 'VER2'}
 92:  
 93:  # Add the day.
 94:  if month:
 95:    payload['reqday'] = 'E'
 96:  else:
 97:    payload['reqday'] = '{:02d}'.format(d.day)
 98:  
 99:  # Add the station id/year/month string.
100:  try:
101:    payload['yearid'] = '{}{:4d}{:02d}'.format(stations[sta], d.year, d.month)
102:    payload['VARVALUE'] = payload['yearid']
103:  except KeyError:
104:    sys.stderr.write('No such station!\n')
105:    sys.stderr.write(help)
106:    sys.exit(2)
107:  
108:  # Add the report type.
109:  payload['which'] = reports[(ascii, daily, precip)]
110:  
111:  # Go get the report and print it.
112:  r = requests.post(url, payload)
113:  print(r.text)

As you can see from Line 4, ncdc uses Kenneth Reitz’s excellent requests module, which you’ll have to install yourself because it’s not in the Python Standard Library, nor does Apple provide it with OS X. Line 6 imports the non-standard dateutil module, but you don’t have to worry about installing it because Apple provides it.

Lines 30–52 set up the global variables that are used to assemble the HTTP request later in the script. If you wanted to customize ncdc for your own use, the dictionary of stations in Lines 34–44 is what you’d want to edit. The keys are the three-letter airport codes, and the values are the five-digit station IDs. Most of these can be found in one of the files listed on this NCEI page, but I’ve found that the best way to get the codes and IDs is from the station list on the second drill-down page. They’re in parentheses at the end of each station’s entry in the list.

Station list with codes
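
Because the station ID ends up as the first five characters of the yearid field, it can help to check a new entry by building that string yourself. This short Python 3 sketch repeats the format used in Line 101 with a two-station subset of the dictionary; the yearid function name is just for illustration.

```python
from datetime import date

# Two entries copied from the stations dictionary (Lines 34-44).
stations = {'ORD': '94846', 'MDW': '14819'}


def yearid(sta, d):
    """Five-digit station ID + four-digit year + two-digit month, as in Line 101."""
    return '{}{:4d}{:02d}'.format(stations[sta], d.year, d.month)


print(yearid('ORD', date(2014, 12, 25)))   # 94846201412
print(yearid('MDW', date(2015, 3, 1)))     # 14819201503
```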

Lines 54–84 handle the options you can give to ncdc. As is my habit, I’m using the simple getopt module because I don’t trust the more advanced modules to remain in the Standard Library. Also, I find their “simplified” way of generating a usage message harder to understand than just writing my own.
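
For anyone unfamiliar with getopt, this sketch shows what it hands back to the loop in Lines 66–82, using the same option string as Line 61. It runs under Python 3, and the argument lists are made-up command lines.

```python
import getopt

OPTS = 'adhmps:'   # the option string Line 61 passes to getopt

# "ncdc -a -s mdw 12/25/2014": flags get an empty-string value, -s takes the next word.
optlist, args = getopt.getopt(['-a', '-s', 'mdw', '12/25/2014'], OPTS)
print(optlist, args)   # [('-a', ''), ('-s', 'mdw')] ['12/25/2014']

# Short flags can be bundled; -s at the end of a bundle still takes the next word.
optlist, args = getopt.getopt(['-ams', 'mdw', '1/1/2015'], OPTS)
print(optlist, args)   # [('-a', ''), ('-m', ''), ('-s', 'mdw')] ['1/1/2015']

# getopt stops at the first non-option, so flags after the date are not parsed.
optlist, args = getopt.getopt(['12/25/2014', '-a'], OPTS)
print(optlist, args)   # [] ['12/25/2014', '-a']

# An unknown flag raises GetoptError, which ncdc reports before printing its usage.
try:
    getopt.getopt(['-x', '12/25/2014'], OPTS)
except getopt.GetoptError as err:
    print(err)         # option -x not recognized
```

The third case is why the usage message puts the options before the date: anything after the first non-option argument lands in args, and ncdc uses only args[0].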

Lines 86–109 parse the date and assemble all the information the NCEI server needs in a POST request. Lines 111–113 send the request and print the response.
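
Since the payload is just a dictionary, the assembly in Lines 89–109 can be pulled into a function and inspected without touching the network. This is a hypothetical refactoring in Python 3, not how ncdc itself is organized; build_payload and its keyword arguments are invented names, and the stations dictionary is trimmed to two entries. The last line uses urllib.parse.urlencode to show the form-encoded body, since requests.post form-encodes a dictionary passed as its data argument.

```python
from datetime import date
from urllib.parse import urlencode

# Trimmed copy of the stations dictionary (Lines 34-44).
stations = {'ORD': '94846', 'MDW': '14819'}

# Full copy of the report-type dictionary (Lines 47-52); keys are (ascii, daily, precip).
reports = {(False, False, False): 'LCD Hourly Obs (10A)',
           (True, False, False):  'ASCII Download (Hourly Obs.) (10A)',
           (False, True, False):  'LCD Daily Summary (10B)',
           (True, True, False):   'ASCII Download (Daily Summ.) (10B)',
           (False, False, True):  'LCD Hourly Precip',
           (True, False, True):   'ASCII Download (Hourly Precip.)'}


def build_payload(d, sta='ORD', ascii=False, daily=False, month=False, precip=False):
    """Return the form fields ncdc would POST for date d (mirrors Lines 83-109)."""
    if precip:
        daily = False                      # -p overrides -d
    payload = {'stnid': 'n/a', 'prior': 'N', 'version': 'VER2'}
    payload['reqday'] = 'E' if month else '{:02d}'.format(d.day)
    payload['yearid'] = '{}{:4d}{:02d}'.format(stations[sta], d.year, d.month)
    payload['VARVALUE'] = payload['yearid']
    payload['which'] = reports[(ascii, daily, precip)]
    return payload


p = build_payload(date(2014, 12, 25), ascii=True)
print(p['reqday'], p['yearid'], p['which'])   # 25 94846201412 ASCII Download (Hourly Obs.) (10A)

# The form-encoded body; requests builds essentially the same string from a dict.
print(urlencode(p))
```

An unknown station raises KeyError here, just as it does at Line 101, where ncdc catches it and prints the usage message.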

Although this script is longer than most I post here, it didn’t take that long to write. The source code of the NCEI web pages is straightforward, and it was easy to find the appropriate <form> element and pick out all the required inputs.


  1. You may have noticed that the URLs of the NCEI pages still carry the NCDC legacy.