September 17, 2017 at 3:46 PM by Dr. Drang
I updated my curriculum vitae the other day and realized that the script I ran to generate the PDF file is twenty years old. Time flies.
In late ’96/early ’97, I transitioned away from the Mac and started using Linux as my full-time desktop operating system. Since 1985, my CV had been saved in many forms: MacWrite, MS Word, WriteNow, PageMaker, MS Word again, and Claris Works. Six different file formats over twelve years. I was sick of word processors, which was good because Linux didn’t have any to speak of. It would be years before OpenOffice arrived. I had to decide on a plain text solution to generate a printed CV.[1]
The two obvious solutions were troff and LaTeX. I hated LaTeX, mainly because I hated the Computer Modern font and didn’t know how to change it. That left me with troff and its macro packages, ms, mm, and me. I wasn’t too thrilled with them, either. But there were other ways to use troff.
Like most people learning HTML in the 90s, I’d read that it was derived from SGML, a much more flexible markup system that could be specialized for, seemingly, any type of document. I was already figuring out ways to use SGML to produce reports, correspondence, and other documents, so it was no big deal to whip out a system for CVs.
Here’s the document type declaration (DTD) for my CV:
xml:
 1: <!ELEMENT cv - - (name, pos, intro?, s+)>
 2: <!ELEMENT name - O (#PCDATA)>
 3: <!ELEMENT pos - O (#PCDATA)>
 4: <!ELEMENT intro - O (#PCDATA)>
 5: <!ELEMENT s - O (h,(item|ditem)*)>
 6: <!ELEMENT h O O (#PCDATA)>
 7: <!ELEMENT item - O (#PCDATA | cite | br)*>
 8: <!ELEMENT ditem - O (#PCDATA | cite | br)*>
 9: <!ATTLIST ditem date CDATA #REQUIRED>
10: <!ELEMENT br - O EMPTY>
11: <!ELEMENT cite - - (#PCDATA)>
By this definition, a CV (Line 1) consists of a name, a position (or job title), an optional introductory paragraph, and a series of sections. The name, position, and intro (Lines 2–4) are just text. A section (Line 5) consists of a header followed by a set of items or dated items. A header (Line 6) is just text. Items and dated items (Lines 7–8) are a mixture of text, citations, and line breaks. Dated items have a single attribute (Line 9): text that defines the date. Line breaks (Line 10) contain no data, and citations (Line 11) contain text.
One of the great things about SGML (as opposed to its duller child, XML) is that you could define your document in such a way as to make certain element tags optional. The middle column of the DTD, the one that’s mostly hyphens and letter O’s, controls this: the first character of each pair applies to the opening tag and the second to the closing tag, with a hyphen meaning the tag is required and an O meaning it can be omitted. So the closing tag is optional for most of the elements; the opening of the next item implies the closing of the current item. In fact, headers need neither opening nor closing tags; because they’re the first thing in a section, there’s no ambiguity about where the header is.
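To make the hyphen/O column concrete, here is a minimal Python sketch (not part of the original toolchain) that reads the element declarations from the DTD above and reports which tags may be left out. The names `tag_minimization` and `ELEMENT` are made up for this illustration, and the regular expression only handles simple declarations like the ones in this DTD, not SGML DTDs in general.

```python
import re

# The CV DTD from above, without the listing's line numbers.
DTD = """
<!ELEMENT cv - - (name, pos, intro?, s+)>
<!ELEMENT name - O (#PCDATA)>
<!ELEMENT pos - O (#PCDATA)>
<!ELEMENT intro - O (#PCDATA)>
<!ELEMENT s - O (h,(item|ditem)*)>
<!ELEMENT h O O (#PCDATA)>
<!ELEMENT item - O (#PCDATA | cite | br)*>
<!ELEMENT ditem - O (#PCDATA | cite | br)*>
<!ATTLIST ditem date CDATA #REQUIRED>
<!ELEMENT br - O EMPTY>
<!ELEMENT cite - - (#PCDATA)>
"""

# Shape of a declaration: <!ELEMENT name start-flag end-flag content-model>
# A flag of "-" means the tag is required; "O" means it is omissible.
ELEMENT = re.compile(r"<!ELEMENT\s+(\w+)\s+([-Oo])\s+([-Oo])\s+(.*?)>", re.S)


def tag_minimization(dtd):
    """Return {element: (start_tag_omissible, end_tag_omissible)}."""
    return {
        name: (start in "Oo", end in "Oo")
        for name, start, end, _model in ELEMENT.findall(dtd)
    }


for name, (start_opt, end_opt) in tag_minimization(DTD).items():
    print(f"{name:6} start tag {'optional' if start_opt else 'required'}, "
          f"end tag {'optional' if end_opt else 'required'}")
```

Only `cv` and `cite` come out with both tags required, and `h` is the only element whose opening tag can be dropped.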
Here are the education and professional membership sections of my CV:
xml:
<s> Education
<ditem date=1985> Ph.D. in Civil Engineering; University of Illinois at Urbana-Champaign
<ditem date=1982> M.S. in Civil Engineering; University of Illinois at Urbana-Champaign
<ditem date=1981> B.S. in Civil Engineering; University of Illinois at Urbana-Champaign
<s> Professional societies
<item> American Society of Civil Engineers
<item> American Society of Mechanical Engineers
<item> American Institute of Steel Construction
<item> American Concrete Institute
<item> American Welding Society
<item> ASM International
<item> American Statistical Association
You can see how the optional tags keep the clutter to a minimum compared to XML.
How do you parse an SGML document? With James Clark’s NSGMLS parser, part of his SP package, which has been around since 1994. There’s a parallel system called OpenSP, the purpose of which I’ve never understood, as Clark’s work was all open source. The easiest way to get this software on your Mac is by installing Homebrew’s
What NSGMLS does is convert your SGML document into a line-based format called ESIS that’s very easy to parse. What I did was write a simple Perl script, called cv2roff, to read the ESIS and output the troff codes necessary to format the CV. The pipeline to generate a PDF is
onsgmls cv.sgml | cv2roff | groff | ps2pdf - cv.pdf
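The real cv2roff is Perl and isn’t posted, so what follows is only a rough Python sketch of the same idea, not a reconstruction of it. It relies on the ESIS conventions that each line starts with a one-character code (`(` opens an element, `)` closes it, `A` declares an attribute for the next element to open, `-` carries text) and that names are folded to uppercase by default. The mapping from elements to troff requests, and the names `START`, `END`, `troff_text`, and `esis_to_troff`, are invented for the illustration; the sample input is hand-written rather than captured from the parser.

```python
import re

# ESIS data escapes handled here: "\n" (record end) and "\\" (backslash).
_ESCAPE = re.compile(r"\\(n|\\)")


def unescape(data):
    """Decode the two most common ESIS escapes in a data line."""
    return _ESCAPE.sub(lambda m: " " if m.group(1) == "n" else "\\", data)


def troff_text(text):
    """Make plain text safe to hand to troff."""
    text = text.strip().replace("\\", "\\e")   # print a literal backslash
    if text.startswith((".", "'")):
        text = "\\&" + text                    # keep it from being a request
    return text


# Hypothetical mapping from element names to plain troff requests.
START = {"NAME": [".ft B", ".ce 1"], "POS": [".ce 1"], "INTRO": [".sp"],
         "H": [".sp", ".ft B"], "ITEM": [".br"], "DITEM": [".br"],
         "BR": [".br"], "CITE": [".ft I"]}
END = {"NAME": [".ft R"], "H": [".ft R"], "CITE": [".ft R"]}


def esis_to_troff(lines):
    """Translate an iterable of ESIS lines into a list of troff lines."""
    out = []
    attrs = {}                       # attributes waiting for the next "("
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":              # e.g. "ADATE CDATA 1985"
            parts = rest.split(" ", 2)
            if len(parts) == 3:      # skip IMPLIED attributes (no value)
                attrs[parts[0]] = parts[2]
        elif code == "(":
            out.extend(START.get(rest, []))
            if rest == "DITEM" and "DATE" in attrs:
                out.append(troff_text(attrs["DATE"]))
            attrs = {}
        elif code == ")":
            out.extend(END.get(rest, []))
        elif code == "-":
            text = troff_text(unescape(rest))
            if text:
                out.append(text)
    return out


# Hand-written ESIS for the start of the education section.
sample = [
    "(S",
    "(H",
    "-Education",
    ")H",
    "ADATE CDATA 1985",
    "(DITEM",
    "-Ph.D. in Civil Engineering; University of Illinois at Urbana-Champaign",
    ")DITEM",
    ")S",
]
print("\n".join(esis_to_troff(sample)))
```

To slot something like this into the pipeline, it would read from `sys.stdin` instead of `sample`. Because the parser has already supplied every omitted tag, the script never has to know which tags were optional in the source.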
This seems ridiculous, I know, but you have to consider this:
- Writing the DTD took almost no time after I’d already worked out the DTDs for more complex documents.
- Similarly, writing cv2roff consisted mainly of stealing code from the more complicated script that generated troff for reports.
- I don’t actually type out the pipeline. It’s invoked through a shell script called
The upfront complexity in building this system led to great simplicity in use. Once or twice a year I add a couple of lines to cv.sgml and run drangCV. Changes in text editors and even operating systems have not affected this. The only change I’ve made in 20 years has been to add ps2pdf to the pipeline. The groff output used to be piped to lpr, which sent the PostScript directly to a printer.
James Thomson tweeted this yesterday:
Never assume that code will be short-lived. I’m still maintaining the first app I ever wrote when I was at university…
…25 years ago.
— James Thomson (@jamesthomson) Sep 16 2017 7:52 AM
I don’t think I’m still running any code of mine that’s 25 years old, but I’m close. And although the Perl in cv2roff is pretty ugly (no, I’m not posting it), it’s been quite reliable.
[1] Paper was still the norm, at least among the people I worked with. It would be quite a while before I could email a PDF to clients and be certain they’d know how to open it, and that’s if they had an email address, which most didn’t.