# Weather history without the web

Weather plays an important role in many failures of engineering systems. Snow can overload a roof, ice can build up on transmission lines and towers, rain can infiltrate electrical boxes, and low temperatures can make oils viscous and difficult to pump. In trying to figure out why a failure occurred, I often find myself collecting weather data at the website of the National Centers for Environmental Information. The information kept by the NCEI is both broad and deep, but the web-based interface used to access it is cryptic and clumsy to use, so I wrote a script to speed up the process. The script isn’t as comprehensive as the web interface, but it’s much faster at getting to the information I usually need.

My first complaint about the NCEI website is that the names for its various data products are long and obscure. The name of the information you’re looking for is seldom obvious, which makes it easy to forget from one visit to the next. Also, navigating the many pages that are supposed to lead you to the data set you want is tortuous, and it’s easy to go off on the wrong side path. My most useful tool in cutting through the NCEI nomenclature fog is this bookmark, which takes me directly to the access page for the data I usually want.

From this point on, the path to the data I want is no longer obscure, just tedious. Because there are so many weather stations across the country, organizing them in a single list would lead to endless scrolling, so the website has you drill down:

• First to the state.
• Then to the station within your state.
• Then to the year and month of interest.
• Then to the day of interest.
• And finally to the type of information (hourly snapshots or daily summary) you want.

I don’t blame NCEI for this five-step process—they are, after all, serving a national audience—but because I’m almost always looking for data from just a handful of stations here in the Chicago area, I wanted a faster way to zip through these steps. So I studied the source code of the NCEI pages, figured out the names and input types of all the form elements, and wrote a script to replicate the drill-down process in a single step.

I call the script ncdc, because the NCEI was until recently known as the National Climatic Data Center, and I still think of it as the NCDC.[1] The script is set up with defaults that match my most common interests, so I usually need to type very little to get the data I want. For example,

ncdc 12/25/2014 > ord.html


will get me an HTML-formatted document with the hourly observation data at O’Hare for last Christmas.

If I want an ASCII-formatted file of that same data (which is easier to import into an analysis program), I use

ncdc -a 12/25/2014 > ord.txt


which gives me a CSV file. Here are examples of the files the script can generate:

Because ncdc works like any other Unix command, it’s easy to incorporate into pipelines and other scripts.
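
As a rough illustration of that kind of scripting, here is a minimal sketch that builds the ncdc command lines for several stations on one date. The helper name `ncdc_commands` and the output file naming are assumptions made up for this example; only the -a and -s options come from the script itself.

```python
def ncdc_commands(date, stations, ascii_output=True):
    """Return (argv, filename) pairs, one per station, for the given date."""
    commands = []
    for sta in stations:
        argv = ['ncdc']
        if ascii_output:
            argv.append('-a')           # ask for the ASCII report
        argv += ['-s', sta, date]       # station abbreviation, then the date
        ext = 'txt' if ascii_output else 'html'
        commands.append((argv, '{}.{}'.format(sta.lower(), ext)))
    return commands

# Print the equivalent shell commands rather than running them.
for argv, filename in ncdc_commands('12/25/2014', ['ORD', 'MDW']):
    print(' '.join(argv), '>', filename)
```

Each argv list could be handed to `subprocess.run` with its standard output redirected to the matching file, assuming ncdc is installed somewhere on the PATH.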

The usage message (available via ncdc -h) gives a brief rundown of what it can do:

usage: ncdc [options] date
Return NCDC weather report for the given date.
Options:
  -a     : return ASCII file instead of HTML
  -d     : return daily summary instead of hourly reports
  -m     : entire month, not just a single day
  -p     : precipitation (hourly only, overrides -d)
  -s STA : the station abbreviation
           O'Hare     ORD (default)
           Midway     MDW
           Palwaukee  PWK
           Aurora     ARR
           Waukegan   UGN
           Lewis      LOT
           DuPage     DPA
           Lansing    IGQ
           Joliet     JOT
           Kankakee   IKK
           Gary       GYY
  -h     : print this message


Here’s the source code:

python:
1:  #!/usr/bin/env python
2:
3:  import getopt
4:  import requests
5:  import sys
6:  from dateutil.parser import parse
7:
8:  help = '''usage: ncdc [options] date
9:  Return NCDC weather report for the given date.
10:  Options:
11:    -a     : return ASCII file instead of HTML
12:    -d     : return daily summary instead of hourly reports
13:    -m     : entire month, not just a single day
14:    -p     : precipitation (hourly only, overrides -d)
15:    -s STA : the station abbreviation
16:             O'Hare     ORD (default)
17:             Midway     MDW
18:             Palwaukee  PWK
19:             Aurora     ARR
20:             Waukegan   UGN
21:             Lewis      LOT
22:             DuPage     DPA
23:             Lansing    IGQ
24:             Joliet     JOT
25:             Kankakee   IKK
26:             Gary       GYY
27:    -h     : print this message
28:  '''
29:
30:  # The NCDC location.
31:  url = 'http://www.ncdc.noaa.gov/qclcd/QCLCD'
32:
33:  # Dictionary of stations.
34:  stations = {'MDW': '14819',
35:              'ORD': '94846',
36:              'PWK': '04838',
37:              'ARR': '04808',
38:              'UGN': '14880',
39:              'LOT': '04831',
40:              'DPA': '94892',
41:              'IGQ': '04879',
42:              'JOT': '14834',
43:              'IKK': '04880',
44:              'GYY': '04807'}
45:
46:  # Dictionary of report types. The keys are a tuple of (ascii, daily, precip).
47:  reports = {(False, False, False): 'LCD Hourly Obs (10A)',
49:             (False, True, False):  'LCD Daily Summary (10B)',
51:             (False, False, True):  'LCD Hourly Precip',
53:
54:  # Handle options.
55:  sta = 'ORD'
56:  ascii = False
57:  daily = False
58:  month = False
59:  precip = False
60:  try:
61:    optlist, args = getopt.getopt(sys.argv[1:], 'adhmps:')
62:  except getopt.GetoptError as err:
63:    sys.stderr.write(str(err) + '\n')
64:    sys.stderr.write(help)
65:    sys.exit(2)
66:  for o, a in optlist:
67:    if o == '-h':
68:      sys.stderr.write(help)
69:      sys.exit()
70:    elif o == '-a':
71:      ascii = True
72:    elif o == '-d':
73:      daily = True
74:    elif o == '-m':
75:      month = True
76:    elif o == '-p':
77:      precip = True
78:    elif o == '-s':
79:      sta = a.upper()
80:    else:
81:      sys.stderr.write(help)
82:      sys.exit(2)
83:  if precip:
84:    daily = False
85:
86:  # The date is the first argument. All other arguments will be ignored.
87:  d = parse(args[0], dayfirst=False)
88:
89:  # Assemble the payload for the POST.
91:  payload = {'stnid': 'n/a', 'prior': 'N', 'version': 'VER2'}
92:
94:  if month:
96:  else:
98:
99:  # Add the station id/year/month string.
100:  try:
101:    payload['yearid'] = '{}{:4d}{:02d}'.format(stations[sta], d.year, d.month)
103:  except KeyError:
104:    sys.stderr.write('No such station!\n')
105:    sys.stderr.write(help)
106:    sys.exit(2)
107:
108:  # Add the report type.
109:  payload['which'] = reports[(ascii, daily, precip)]
110:
111:  # Go get the report and print it.
112:  r = requests.post(url, data=payload)
113:  print(r.text)


As you can see from Line 4, ncdc uses Kenneth Reitz’s excellent requests module, which you’ll have to install yourself because it’s not in the Python Standard Library, nor does Apple provide it with OS X. Line 6 imports the non-standard dateutil module, but you don’t have to worry about installing it because Apple provides it.

Lines 30–52 set up the global variables that are used to assemble the HTTP request later in the script. If you wanted to customize ncdc for your own use, the dictionary of stations in Lines 34–44 is what you’d want to edit. The keys are the three-letter airport codes, and the values are the five-digit station IDs. Most of these can be found in one of the files listed on this NCEI page, but I’ve found that the best way to get the codes and IDs is from the station list on the second drill-down page. They’re in parentheses at the end of each station in the list.
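
To make the role of those station IDs concrete, here is a small sketch of how the ID is combined with the year and month into the string the script sends as `yearid`. The format string and the two station entries are taken from the listing; the wrapper function `yearid` is just a convenience invented for this example.

```python
# Two entries copied from the script's stations dictionary.
stations = {'ORD': '94846', 'MDW': '14819'}

def yearid(sta, year, month):
    """Station ID followed by a four-digit year and a two-digit month."""
    return '{}{:4d}{:02d}'.format(stations[sta], year, month)

print(yearid('ORD', 2014, 12))   # 94846201412
print(yearid('MDW', 2015, 3))    # 14819201503
```

An abbreviation that isn’t in the dictionary raises a KeyError, which is the exception the script catches to print its “No such station!” message.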

Lines 54–84 handle the options you can give to ncdc. As is my habit, I’m using the simple getopt module because I don’t trust the more advanced modules to remain in the Standard Library. Also, I find their “simplified” way of generating a usage message harder to understand than just writing my own.
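
For anyone who hasn’t used getopt, the following is a condensed sketch of the same option handling, pulled out into a function so it can be tried on its own. The function name `parse_options` and the use of a dictionary for the settings are assumptions for this example; the option string and the rule that -p overrides -d are from the script.

```python
import getopt

def parse_options(argv):
    """Parse ncdc-style options; return (settings dict, remaining args)."""
    settings = {'sta': 'ORD', 'ascii': False, 'daily': False,
                'month': False, 'precip': False}
    # 's:' means -s takes a value; the other letters are simple flags.
    optlist, args = getopt.getopt(argv, 'adhmps:')
    flags = {'-a': 'ascii', '-d': 'daily', '-m': 'month', '-p': 'precip'}
    for o, a in optlist:
        if o in flags:
            settings[flags[o]] = True
        elif o == '-s':
            settings['sta'] = a.upper()
        # -h is accepted but ignored in this sketch.
    if settings['precip']:
        settings['daily'] = False   # -p overrides -d
    return settings, args

settings, args = parse_options(['-a', '-s', 'mdw', '12/25/2014'])
print(settings['sta'], settings['ascii'], args)   # MDW True ['12/25/2014']
```

One consequence of using `getopt.getopt` is that parsing stops at the first non-option argument, so the options have to come before the date.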

Lines 86–109 assemble all the information needed to pass to the NCEI server in a POST request. Lines 111–113 send the request and print the response.
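
To see roughly what that POST body looks like without contacting the server, here is a sketch that assembles part of the payload and form-encodes it with the standard library. It is written for Python 3, the function name `build_payload` is an assumption, only the three HTML report types are included, and the field that selects the day is left out, so this is not a complete request.

```python
from urllib.parse import urlencode

# HTML report types only, keyed by (ascii, daily, precip) as in the script.
reports = {(False, False, False): 'LCD Hourly Obs (10A)',
           (False, True, False):  'LCD Daily Summary (10B)',
           (False, False, True):  'LCD Hourly Precip'}

def build_payload(station_id, year, month, daily=False, precip=False):
    """Assemble a partial form payload (the day field is omitted here)."""
    payload = {'stnid': 'n/a', 'prior': 'N', 'version': 'VER2'}
    payload['yearid'] = '{}{:4d}{:02d}'.format(station_id, year, month)
    payload['which'] = reports[(False, daily, precip)]
    return payload

payload = build_payload('94846', 2014, 12)
# urlencode produces the key=value&key=value string sent as the form body.
print(urlencode(payload))
```

Passing a dictionary like this as the `data` argument of `requests.post` gets it form-encoded in the same general way, which is why the script never has to build the string itself.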

Although this script is longer than most I post here, it didn’t take that long to write. The source code of the NCEI web pages is straightforward, and it was easy to find the appropriate <form> element and pick out all the required inputs.
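
If you’d rather not read the page source by eye, the inputs can also be listed programmatically. This is a minimal sketch using the standard library’s HTML parser (Python 3). The sample markup and its field names are invented for illustration and are not the actual NCEI form.

```python
from html.parser import HTMLParser

class FormInputs(HTMLParser):
    """Collect (tag, name, type) for every input and select element."""
    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag in ('input', 'select'):
            a = dict(attrs)
            self.inputs.append((tag, a.get('name'), a.get('type')))

# Hypothetical form, standing in for a saved copy of a real page.
sample = '''<form method="post" action="/example">
<select name="state"><option value="IL">Illinois</option></select>
<input type="hidden" name="year" value="2014">
<input type="submit" name="report" value="Hourly">
</form>'''

finder = FormInputs()
finder.feed(sample)
for tag, name, kind in finder.inputs:
    print(tag, name, kind)
```

The names collected this way are the keys that would go into the payload dictionary, and for a form with several submit buttons, the value of the one clicked is what identifies the report.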

1. You may have noticed that the URLs of the NCEI pages still carry the NCDC legacy.