SGML nostalgia
September 13, 2014 at 9:25 PM by Dr. Drang
When I switched to Linux in the late ’90s, I needed a way to write reports and correspondence for work. At the time, there weren’t any open source word processors worth mentioning, and I was done with word processors anyway. So I set up a report-writing workflow based on SGML, HTML’s big brother, and groff, the GNU version of the ancient Unix text formatter, troff.
I actually enjoyed writing in SGML. Creating a DTD for my reports forced me to think hard about how they ought to be structured. Although my current workflow is different, and I write my reports in Markdown, I still structure them according to the rules I had to formalize back in 1997. And SGML isn’t the straitjacket that XML is; you don’t need closing tags, or even opening tags, if there’s no way to misinterpret an element.
I kind of went SGML-happy in the late ’90s, creating DTDs for every type of structured document I wrote, including my CV. The workflow for generating a PostScript version of my CV was basically the same as the one for reports. Here’s my CV DTD:
xml:
<!ELEMENT cv - - (name, pos, intro?, s+)>
<!ELEMENT name - O (#PCDATA)>
<!ELEMENT pos - O (#PCDATA)>
<!ELEMENT intro - O (#PCDATA)>
<!ELEMENT s - O (h, (item|ditem)*)>
<!ELEMENT h O O (#PCDATA)>
<!ELEMENT item - O (#PCDATA | cite | br)*>
<!ELEMENT ditem - O (#PCDATA | cite | br)*>
<!ATTLIST ditem date CDATA #REQUIRED>
<!ELEMENT br - O EMPTY>
<!ELEMENT cite - - (#PCDATA)>
The structure isn’t too hard to work out. The CV as a whole consists of my name, my position with the company, an optional introductory paragraph, and then one or more sections. Each section consists of a header followed by some number of items or dated items. Dated items must have a date attribute; otherwise they’re identical to regular items. Items of either sort can contain citations and line breaks.
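One way to see the content models is as regular expressions over the names of an element's children. The Python below is only a sketch of that idea, not anything an SGML parser actually runs; the `CONTENT_MODELS` table and the `children_are_valid` function are names made up for this illustration, and `#text` stands in for `#PCDATA`.

```python
import re

# Content models from the DTD, hand-translated into regular expressions over
# space-terminated child names. "#text" stands in for #PCDATA.
# (Hypothetical helper table, written for illustration only.)
CONTENT_MODELS = {
    "cv": r"name pos (intro )?(s )+",
    "s": r"h ((item|ditem) )*",
    "item": r"((#text|cite|br) )*",
    "ditem": r"((#text|cite|br) )*",
}


def children_are_valid(parent, children):
    """Return True if the list of child names fits the parent's content model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(CONTENT_MODELS[parent], sequence) is not None


# A CV needs a name, a position, and at least one section.
print(children_are_valid("cv", ["name", "pos", "s", "s"]))  # True
print(children_are_valid("cv", ["name", "pos", "intro"]))   # False

# A section must start with its header.
print(children_are_valid("s", ["h", "ditem", "item"]))      # True
print(children_are_valid("s", ["item", "h"]))               # False
```

The sketch checks only the order of children. It says nothing about the required date attribute on dated items, which the DTD handles separately in the ATTLIST declaration.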
Here’s an example:
xml:
<!DOCTYPE cv SYSTEM "/Users/drang/dtd/cv.dtd">
<cv>
<name>
Dr. Drang, Ph.D., P.E.
<pos>
Engineering Mechanics
<s>
Employment
<ditem date="1991-present">
Principal, Drang Engineering, Inc.
<ditem date="1985-1990">
Assistant Professor, Small Big Ten University
<s>
Education
<ditem date=1985>
Ph.D. in Civil Engineering; University of Illinois at Urbana-Champaign<br>
Thesis: <cite>An Approach To Structural Analysis That No One Uses</cite>
<ditem date=1982>
M.S. in Civil Engineering; University of Illinois at Urbana-Champaign
<ditem date=1981>
B.S. in Civil Engineering; University of Illinois at Urbana-Champaign
<s>
Professional societies
<item>
American Society of Civil Engineers
<item>
American Institute of Steel Construction
<item>
American Concrete Institute
<s>
Professional licenses and registrations
<item>
Professional Engineer, State of Illinois
<item>
Professional Engineer, State of Indiana
<item>
Professional Engineer, State of Ohio
</cv>
Note that the only closing tags are for the <cv> and <cite> elements. If you look in the DTD, you’ll see - O (a hyphen and a capital letter O) in most of the element definitions. That means the opening tag is required but the closing tag is optional. Both the opening and closing tags are optional for the <h> element; because it’s always the first element within an <s> and it’s always followed by either an <item> or a <ditem>, there’s no need for tags. The SGML processor will know that things like “Employment” and “Education” are <h> elements.
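To make the inference concrete, here is a rough Python sketch of how the missing tags could be filled back in for this one DTD. It is a toy, not how a real SGML processor works: the `ALLOWED` table and the `add_omitted_tags` function are hypothetical names written for this example, the input is assumed to be valid and to have no DOCTYPE line, and whitespace between tags is simply thrown away.

```python
import re

# Which children each element may contain, taken from the DTD in this post.
# "#text" stands in for #PCDATA. (Hypothetical table for illustration.)
ALLOWED = {
    "cv": {"name", "pos", "intro", "s"},
    "name": {"#text"},
    "pos": {"#text"},
    "intro": {"#text"},
    "s": {"h", "item", "ditem"},
    "h": {"#text"},
    "item": {"#text", "cite", "br"},
    "ditem": {"#text", "cite", "br"},
    "cite": {"#text"},
}
EMPTY = {"br"}  # declared EMPTY: no content and no closing tag

# Matches either a tag (optional slash, name, attributes) or a run of text.
TOKEN = re.compile(r"<(/?)(\w+)([^>]*)>|([^<]+)")


def add_omitted_tags(sgml):
    """Return a list of tokens with the implied <h> and closing tags written out."""
    out, stack = [], []

    def close_until_allowed(child):
        # Close open elements that cannot contain `child`.
        while stack and child not in ALLOWED[stack[-1]]:
            out.append("</%s>" % stack.pop())

    for slash, name, attrs, text in TOKEN.findall(sgml):
        if text:
            text = text.strip()
            if not text:
                continue  # simplification: ignore whitespace between tags
            if stack and stack[-1] == "s":
                stack.append("h")  # text directly inside <s> implies <h>
                out.append("<h>")
            close_until_allowed("#text")
            out.append(text)
        elif slash:
            # Explicit closing tag: close everything down to and including it.
            while stack:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        else:
            close_until_allowed(name)
            out.append("<%s%s>" % (name, attrs))
            if name not in EMPTY:
                stack.append(name)
    return out


short_cv = ("<cv><name>Dr. Drang<pos>Engineering Mechanics"
            "<s>Education<ditem date=1985>Ph.D.</cv>")
print("".join(add_omitted_tags(short_cv)))
```

For the short CV at the bottom, the printed result has every element explicitly opened and closed, including an <h> wrapped around “Education”.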
For several years I kept my CV in this form, updating it as necessary. Sometime after switching back to the Mac, I stopped maintaining the SGML version, updating only the troff version. Even though troff isn’t the easiest markup language to write in, adding an item to my CV was pretty simple. I’d just copy a chunk of formatting code from one item, paste it in, and then add the new text.
Yesterday, though, I needed to update a few items in the CV and had the bright idea to return to the SGML form. I still had an old SGML version, so it wasn’t too hard to add the stuff necessary to bring it up to date. But I soon realized I didn’t have an SGML processor—I’d never installed it on my iMac at work.
Back when I was using SGML regularly, the standard processor was nsgmls, part of James Clark’s SP suite of programs. I couldn’t find a precompiled version for OS X, so I decided to download the source and build it myself. Unfortunately, some of the commands in the makefile threw errors; something in either OS X’s compiler or its libraries wasn’t what the makefile expected. So I started a little yak-shaving adventure.
Installing gcc via Homebrew so I can compile an SGML processor so I can run a Perl program I wrote in 1996.
As you do.
— Dr. Drang (@drdrang) Sep 12 2014 9:47 AM
Luckily, while gcc was compiling, continued Googling led me to a Homebrew recipe for OpenSP. I would never have guessed there was an OpenSP—Clark’s SP has always been open source. But after a
brew install open-sp
I was in business and was able to stop the installation of gcc and delete the dependencies that had already been built. I generated my CV just as I had in the ’90s, with only two differences:
- The SGML processor of the OpenSP project is called onsgmls, not nsgmls.
- I had to convert the PostScript generated by groff to PDF. I don’t print my CV very often anymore. I usually email the PDF to prospective clients.
Neither of these was a big deal. The pipeline looked like this:
onsgmls drangCV.sgml | cv2roff | groff | ps2pdf - > drangCV.pdf
The cv2roff part is a Perl script that converts the ESIS output of onsgmls into a troff document. I won’t be showing it here because it’s embarrassing. I had been programming Perl for less than a year when I wrote it, and it’s a mess. Worse, even, than my early Perl is the mixture of tabs and spaces in the source code. I’m sure I was using Emacs at the time and must not have known how to configure it yet. Ick.
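For anyone who hasn't seen ESIS, it's a line-oriented format: each line starts with a one-character code, such as ( for the start of an element, ) for the end, - for character data, and A for an attribute of the element that's about to start. The Python below is a minimal sketch of that kind of conversion and is definitely not the real cv2roff. The `esis_to_troff` function, the `START` and `END` tables, and the troff layout they produce are all invented for illustration. It also assumes element and attribute names arrive in upper case, and it decodes only the `\n` and `\\` escapes in data lines.

```python
import re

# Hypothetical layout: plain troff requests chosen for illustration only.
START = {
    "NAME": [".ft B", ".ce"],   # bold, centered name
    "POS": [".ce"],             # centered position
    "H": [".sp", ".ft B"],      # blank line, then a bold section header
    "ITEM": [".br"],            # each item starts on a new line
    "DITEM": [".br"],
    "CITE": [".ft I"],          # italic citation
    "BR": [".br"],
}
END = {"NAME": [".ft R"], "H": [".ft R"], "CITE": [".ft R"]}


def unescape(data):
    r"""Decode two ESIS escapes: \n (record end) and \\ (a literal backslash)."""
    return re.sub(r"\\(n|\\)", lambda m: "\n" if m.group(1) == "n" else "\\", data)


def esis_to_troff(esis):
    """Turn ESIS lines for a CV into troff source using the tables above."""
    out = []
    date = None  # pending DATE attribute, waiting for its start tag
    for line in esis.splitlines():
        code, rest = line[:1], line[1:]
        if code == "A":
            # Attribute lines look like "ADATE CDATA 1985".
            parts = rest.split(" ", 2)
            if parts[0] == "DATE" and len(parts) == 3:
                date = parts[2]
        elif code == "(":
            out.extend(START.get(rest, []))
            if date is not None:
                out.append(date)  # put the date ahead of the item's text
                date = None
        elif code == ")":
            out.extend(END.get(rest, []))
        elif code == "-":
            for text in unescape(rest).split("\n"):
                if text.strip():
                    # \& keeps a leading "." or "'" from being read as a request
                    out.append("\\&" + text if text[0] in ".'" else text)
    return "\n".join(out) + "\n"


sample = "\n".join([
    "(CV", "(NAME", "-Dr. Drang", ")NAME",
    "(S", "(H", "-Education", ")H",
    "ADATE CDATA 1985", "(DITEM", "-Ph.D. in Civil Engineering", ")DITEM",
    ")S", ")CV", "C",
])
print(esis_to_troff(sample), end="")
```

A table-driven design like this keeps the formatting choices in one place, so changing how a header or citation looks means editing a list of requests and leaving the parsing loop alone.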
Was it worth the trouble? I think so. Because of increased continuing education requirements to maintain my professional engineering licenses, and because I expect to be getting licensed in more states, I’ll be updating my CV more often. Having it in a concise SGML form will make it easier to edit. And even though my old Perl code is ugly, it’s fun to still be able to use a script I wrote over 15 years ago.