Text files and me - Part 2

I ended the first installment of this reminiscence in late 1996, a time at which I was dealing with two sources of frustration:

  1. Everything I’d written in the previous decade was either lost or on the way to being lost. I’d done my writing in a variety of word processors (see Part 1 for the gory details); several of the file formats had already become extinct and those that hadn’t soon would. The only way to prevent this gradual rot of word processing files was to continually translate older files into newer formats when I switched or updated applications. In theory this was possible, but in practice I knew I’d never go back to a client report written two years earlier just to update it from Word N to Word N+1.
  2. Apple had admitted to the world, through its courting of NeXT and Be, that it had lost the ability to write its own OS. Longtime Mac users had suspected as much, but the public search for a set of brains to buy made it official. Even if Apple made a good choice (which history has shown it did), it would be years before the transition to a useful, stable operating system would be complete.

It was these two things that led me to switch to Linux.

Let me pause here to say that Apple’s financial condition didn’t play into my decision. I’d read all the obituaries, of course, but I knew Apple had lots of cash (holding large reserves isn’t a new thing) and wouldn’t be going out of business anytime soon. It wasn’t the state of Apple that was the problem, it was the state of the Mac itself.

Why Linux instead of Windows? Partly because the plain text document creation tools I’d been exploring were more naturally suited to a Unix environment. And partly because I tried Windows and just didn’t like it. In fact, I started out on a dual-boot machine with Red Hat Linux in one partition and Windows 95 in the other and used ClarisWorks for Windows to write one large report for work while getting my new text-based workflow established. That month or two convinced me that Windows and I weren’t meant for each other.

SGML and troff

The workflow I built was based on SGML for writing and troff for formatting. In between were James Clark’s NSGMLS tool for processing the SGML into an intermediate format (ESIS) and a homemade Perl script that turned the intermediate format into troff code. Groff, the open source version of troff, generated PostScript from the troff code and that’s what got printed.

If you’ve read any of my equation-laden posts here, you might be wondering why I went with troff over TeX or LaTeX. It was a near thing, but what convinced me to go with troff was that it had commands for putting text at specific positions on the page. While there are LaTeX packages that are supposed to be able to do this, it isn’t a natural part of the LaTeX way.

Precise text placement was important to me at the time because my reports at work almost always included photographs with captions. And when I say they “included photographs” I mean that literally. This was the late ’90s, remember, and I was still shooting with film and taking it to a lab for processing and printing. I’d get the 3½×5 prints back from the lab and paste them onto the pages of my reports.[1] The captions were printed out before the pasting and had to go in certain spots to end up adjacent to the photos. I could fit two photos per page. Landscape photos had their captions underneath; portrait photos had their captions to the side.[2] This was much more important to me than nicely formatted equations, and the equations I put in reports were easily handled by troff’s eqn preprocessor.
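The placement rule, including the details in the second footnote, can be sketched in a few lines. This is a Python illustration, not the Perl I actually ran. The `Placement` fields and the `place_caption` name are made up for the example. It assumes photos are numbered from 1 and fill pages two at a time, and it applies the left/right alternation to every portrait photo, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    slot: str          # "top" or "bottom" half of the page
    photo_side: str    # where the photo is pasted: "left", "right", or "full"
    caption_side: str  # where the caption is printed relative to the photo
    narrow: bool       # True when the caption needs a short line length


def place_caption(photo_number: int, orientation: str) -> Placement:
    # Assumption: two photos per page, numbered from 1, so odd numbers
    # land in the top slot and even numbers in the bottom slot.
    slot = "top" if photo_number % 2 == 1 else "bottom"
    if orientation == "landscape":
        # Landscape photos get their caption underneath at normal width.
        return Placement(slot, "full", "below", False)
    if orientation == "portrait":
        # Top portrait: photo at left margin, caption to the right.
        # Bottom portrait: photo at right margin, caption to the left.
        if slot == "top":
            return Placement(slot, "left", "right", True)
        return Placement(slot, "right", "left", True)
    raise ValueError(f"unknown orientation: {orientation!r}")
```

In the real workflow the result of a decision like this would be turned into troff positioning commands rather than returned as an object.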

So I made a DTD for my reports and set up a workflow that went this way:

[Figure: SGML/troff workflow. SGML → NSGMLS → ESIS → Perl script → troff code → groff → PostScript]

I know this looks complicated, but it was all wrapped up in a single shell script that piped the output from one process to the next. Except when debugging my script, I never looked at the ESIS or troff versions of the report—they were never even saved as files. I just wrote the SGML version of the report, passed it to the shell script, and printed out the PostScript.

It was actually much easier than writing in a word processor. I never got bogged down in formatting because that was all predetermined by the troff codes my Perl script produced. Similarly, all the figures, photos, and equations were numbered automatically. If I decided late in the game to add another photo to a report, it was no big deal. All the photos and all the references to the photos would renumber themselves.
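The renumbering trick is a two-pass affair: first assign numbers to the photos in order of appearance, then fill in every reference. Here is a rough Python sketch of the idea. The `{photo:ID}` and `{ref:ID}` markers are invented for the example and are not the SGML I used, and `number_photos` is a hypothetical name.

```python
import re


def number_photos(source: str) -> str:
    """Replace {photo:ID} and {ref:ID} markers with 'Photo N'."""
    numbers = {}
    # First pass: number each photo by its order of appearance.
    for match in re.finditer(r"\{photo:(\w+)\}", source):
        numbers.setdefault(match.group(1), len(numbers) + 1)

    # Second pass: definitions and references both become 'Photo N'.
    # A reference to an ID that was never defined raises KeyError.
    def substitute(match):
        return f"Photo {numbers[match.group(2)]}"

    return re.sub(r"\{(photo|ref):(\w+)\}", substitute, source)
```

Because references carry IDs rather than numbers, adding a photo near the top of a draft shifts every later number without any hand editing.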

I wrote the Perl script that converted ESIS to troff code. Initially, it used ideas I found in Norman Smith’s Practical Guide to SGML Filters, but later I rewrote it to take advantage of David Megginson’s SGMLS Perl module.

When I say the script output troff code, I mean that literally. It didn’t use any of the common macro packages, just pure raw troff. I didn’t like the formatting of mm, ms, me, et al., and I figured that since I was writing in SGML, that was my macro package.
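To give a flavor of what that conversion involves, here is a minimal sketch in Python rather than Perl. It reads the line-oriented ESIS output of NSGMLS, where `(` starts an element, `)` ends one, and `-` carries character data, and emits bare troff requests. The element names, the particular requests chosen, and the function names are all assumptions for illustration. Attribute lines and most ESIS escapes are ignored.

```python
import re

# Hypothetical element names and formatting; the real DTD is not shown here.
START = {
    "TITLE": [".ft B", ".ps 14", ".ce 1"],   # bold, 14 point, center next line
    "PARA": [".sp 0.5v"],                    # half a line of space before
}
END = {
    "TITLE": [".ft R", ".ps 10", ".sp 1v"],  # back to roman, 10 point
}


def unescape(data):
    # In ESIS data lines, backslash-n marks a record end and a doubled
    # backslash is a literal backslash. Other escapes are left untouched.
    return re.sub(r"\\([\\n])",
                  lambda m: "\n" if m.group(1) == "n" else "\\",
                  data)


def protect(text):
    # Keep troff from treating ordinary text as an escape or a request.
    text = text.replace("\\", "\\e")
    if text.startswith((".", "'")):
        text = "\\&" + text
    return text


def esis_to_troff(esis):
    out = []
    for line in esis.splitlines():
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "(":
            out.extend(START.get(rest, []))
        elif code == ")":
            out.extend(END.get(rest, []))
        elif code == "-":
            out.extend(protect(t) for t in unescape(rest).split("\n") if t)
        # Attribute ("A") lines and the final "C" line are skipped in this sketch.
    return "\n".join(out) + "\n"
```

Feeding it the ESIS for a title followed by a paragraph yields the title's font, size, and centering requests, the title text, the requests that restore the body font, and then the paragraph text, one record per line.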

Nowadays SGML has a reputation for being incredibly complex, but it was actually easy to write in if you spent time making yourself a good DTD. Unlike XML, which has rules designed to make it easy for programs to parse, SGML is quite flexible. My report DTD allowed me to omit most end tags, which was a big boost to productivity.

This workflow was definitely geared to getting ink on paper. That’s what my clients wanted and that’s what I needed to supply. In that sense, I was still working the way I had been when I used word processors. The difference, though, was that my words, the plain text SGML file at the top of the workflow, were no longer subject to file format obsolescence. Even if NSGMLS and my Perl program were lost, the words themselves were still there in a plain text file. I could always retrieve them.

Text editors

Another advantage of moving to plain text is the freedom to switch text editors at will. Any text editor can read and write a file created by any other text editor. I was quite promiscuous in my Linux years, flitting between GNU Emacs, XEmacs, vim, and NEdit, always looking for that elusive “perfect” editor.

NEdit was the editor I used the most. It wasn’t as powerful as the Emacsen, and it wouldn’t work in the console the way vim would, but it looked nice and it fit me well. It’s not surprising to me that Allan Odgaard, the developer of my current editor, was a NEdit user.

NEdit had a Perl-like scripting language, and I wrote several little macros to help me write my reports. One that was particularly useful was a capitalization macro. I have a tendency to lift my pinky off the Shift key a little too soon and don’t realize it until I’m a few letters into the word I want to capitalize. I needed a command I could quickly invoke that would jump to the beginning of the current word, capitalize the first letter, then jump back to where I’d stopped typing. NEdit’s scripting language made that easy.

(Years later, when I’d moved back to the Mac, I rewrote that macro, first for BBEdit and then for TextMate. I just used it when writing this paragraph.)

Coming attractions

Part 3 will cover my switch from SGML/troff to LaTeX, my switch from Linux back to the Mac, and the BBEdit/TextMate dilemma.


  1. Several interesting low-tech tools have been made obsolete by digital photos. Chartpak made rolls of tiny arrow labels in several colors that I’d stick on the photos and reference in the text. They were too small to handle with my fingers, so I’d use tweezers or the tip of an Xacto knife to get them placed just right. Even getting the photos to stick to the paper required a specialty tool. Glue sticks weren’t sticky enough; double-sided tape took too long to apply. Eventually I settled on glue tape, a sort of weird amalgam of the two. 

  2. It was a bit more complicated than that. If a page had two portrait photos, the top one would be placed at the left margin with its caption to the right and the bottom one would be placed at the right margin with its caption to the left. Because more than half the page width would be consumed by the photo, portrait captions had to be formatted with short line lengths. The SGML entries for photo captions included an attribute that indicated whether the photo was landscape or portrait, and my Perl program used that information, along with the photo number, to put the captions in the right spots.