November 15, 2011 at 8:17 PM by Dr. Drang
I have a project going on at work that involves the measuring, testing, and analysis of hundreds of machine parts. I need to keep track of all the work done on each part I receive and to see if there are any trends in the data.
Normally, I’d just keep all this information in a text file and manipulate it as needed through some simple one-off scripts. But because I have to share the information with others, I’m putting it in a Numbers spreadsheet instead. I’m not a big fan of spreadsheets in general or of Numbers in particular—they’re great for entering tabular data but limited when it comes to manipulating it—but we do what we must. The people I’m sharing this with will be more comfortable with a spreadsheet than a text file.
One of the first things I wanted to do was determine the manufacturing dates. Each part has a unique ID number, part of which is a simple encoding of the date on which it was made. The first digit of the ID number is last digit of the year of manufacture. The parts were first made in 2009, so the year digits are 9, 0, and 1. The project may extend into next year, so it’s possible I’ll be seeing IDs that start with 2 in a few months. The next three digits of the ID number represent the day of the year on which the part was made: January 1 is 001, February 1 is 032, etc.
Every row of my spreadsheet represents one part, and the ID numbers are in the first column. This morning I opened up a new column next to it and began writing a function to calculate the date from the ID. As the function grew and grew I realized that debugging it would be hell, and I began to look for a way out.
After a bit of thinking, I decided there was no need for the date to be calculated from the ID within the spreadsheet. The rows of my spreadsheet were going to be like the records of a database, and some of its fields would be referred to, searched for, and sorted on, but never edited. Because the date was going to be one of those non-edited fields, it could consist of just static data—data that I calculate outside the spreadsheet and paste into it.
With that decision made, I wrote this little Python script to decipher the date code part of the ID and print out a list of dates.
python: 1: #!/usr/bin/env python 2: 3: import fileinput 4: import datetime 5: 6: for code in fileinput.input(): 7: try: 8: yearCode = int(code) 9: year = 2009 if yearCode == 9 else 2010 + yearCode 10: dayOfYear = int(code[1:4]) 11: prevYearEnd = datetime.date(year - 1, 12, 31).toordinal() 12: codeDate = datetime.date.fromordinal(prevYearEnd + dayOfYear) 13: print codeDate.strftime("%b %d, %Y") 14: except ValueError: 15: print "Unknown"
I used the
fileinput library, which I’ve talked about before, to give me the flexibility of either piping the data to the script or having it read from a file. Date calculations are done through the
For each line of input, the script
- Gets the first character of the ID and turns it into a year (Lines 8 and 9).
- Gets the next three characters and turns it into the day of year (Line 10).
- Calculates the “proleptic Gregorian ordinal”—an integer count of the day number in an assumed Gregorian calendar that starts with Day 1 being January 1 of the year 1—of the last day of the previous year (Line 11).
- Calculates the proleptic Gregorian ordinal of the manufacturing date and turns it into a date object (Line 12).
- Prints the result in “Nov 15, 2011” form (Line 13).
I’ve found that the ID numbers on some of the parts are unreadable, in whole or in part. Unreadable digits are marked as an x, so the
int() operations on Lines 8 and 10 will throw errors when unreadable date codes are encountered. The
try/except is there to gracefully handle those situations.
You’ll note that there’s no code explicitly distinguishing between leap and non-leap years. That’s the advantage of using a library that already handles leap years. The calculations in Lines 11 and 12 will work in any year.
The script is called
get-dates.py, and using it was really simple. I copied the column of IDs from the spreadsheet, which gave me a big list of numbers, one per line. I then issued this command in the Terminal,
pbpaste | python get-dates.py | pbcopy
which processed the IDs from the clipboard and wrote the results back out onto the clipboard. I then pasted the list of calculated dates into the appropriate column of my spreadsheet. Boom.
This was much easier to write and debug than a spreadsheet function, and the combination of Unix pipelines with the
pbcopy commands meant I didn’t even have to litter my hard disk with temporary files.