Mechanics lipsum

In yesterday’s Back to Work, during the commercial for Smile Software and TextExpander, Merlin Mann talked about a snippet he uses that inserts a specially crafted lorem ipsum that’s exactly 500 characters and 100 words long. Most people use lorem ipsums (or lipsums) as placeholder text when designing the layout of a publication or web site. Merlin uses his, he said, when he needs to write a piece of a certain length. Maybe it was the blunt I was smoking at the time, but I didn’t understand how a precise amount of nonsense text would be helpful in writing that same length of non-nonsense. I hope he’ll explain in a later episode.

It reminded me, though, that I’d updated my lipsum snippet without writing about it. Time to fix that.

My lipsum snippet doesn’t do the usual things. First, it doesn’t insert pseudo-Latin nonsense; it inserts real English nonsense. I prefer English words because they don’t cause the spell checker to fill my text editor with angry red lines the way “Lorem ipsum dolor…” does.

Second, it doesn’t insert a fixed amount of unchanging text; both the word count and the words themselves change every time the snippet is used. I prefer random text because it looks more natural. Fixed text leads to uniformly spaced paragraph breaks and rivers of whitespace as the same chunks keep fitting together in the same way.

My snippet mimics Dissociated Press. It opens a text file and grabs word pairs from it, using a Markov chain-like approach. It’s written in Perl and uses Avi Finkel’s Games::Dissociate library. If you want to use it, you’ll have to:

  1. Download the library from the linked page.
  2. Expand the gzipped tarfile.
  3. Run perl Makefile.PL. There are likely to be warnings that you’re missing other libraries. Ignore them. The libraries you’re missing are test libraries that don’t affect the functioning of Games::Dissociate.
  4. Run make.
  5. Run sudo make install. Enter your administrator password when prompted.
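If you’re curious what “grabbing word pairs” amounts to, here is a rough sketch in plain Perl with no library at all. It is an ordinary order-2 Markov chain, not the algorithm Games::Dissociate actually uses, and the `babble` function and the tiny sample corpus are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return $count words generated from $text with an
# order-2 (word pair) Markov chain. Not the Games::Dissociate algorithm.
sub babble {
    my ($text, $count) = @_;
    my @words = split ' ', $text;
    return join(' ', @words) if @words < 3;

    # Map each adjacent word pair to the list of words that follow it.
    my %next;
    for my $i (0 .. $#words - 2) {
        push @{ $next{"$words[$i] $words[$i+1]"} }, $words[$i+2];
    }

    # Start from a randomly chosen pair.
    my @keys = sort keys %next;
    my @out  = split ' ', $keys[ rand @keys ];

    while (@out < $count) {
        my $followers = $next{"$out[-2] $out[-1]"};
        if ($followers) {
            push @out, $followers->[ rand @$followers ];
        }
        else {
            # Dead end (this pair only closes the corpus): jump to a new pair.
            push @out, split ' ', $keys[ rand @keys ];
        }
    }
    return join ' ', @out[0 .. $count - 1];
}

# Made-up sample corpus, just large enough to show the idea.
my $sample = 'the ether is a medium and the ether is not matter '
           . 'and the atom is a vortex ring in the ether';
print babble($sample, 12), "\n";
```

Because every word is chosen from the words that really followed the previous pair somewhere in the corpus, short runs read as plausible English even though the whole is nonsense.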

In my earlier version of this snippet, the text file I used to pull word pairs from was Darwin’s Origin of Species. It sort of fit the type of writing I do because it was technical nonfiction, but it wasn’t a great fit because its vocabulary was biological, not mechanical. So I hunted around on Project Gutenberg and found a book that was a better match with my usual subject matter: The Machinery of the Universe by A. E. Dolbear.1

I downloaded the text of the book and stripped out the Project Gutenberg boilerplate text from the beginning and end. Scanning through the text, I noticed a few other things that I didn’t want creeping into my snippets:

  1. Illustration references in brackets. I got rid of these by doing a regex find on

    \[Illustration[^]]+\]
    

    and replacing it with nothing.

  2. All-caps headings, some of which were preceded by numbers. I deleted these by doing a regex find on

    ^(\d+\. )?[A-Z .,;]+$
    

    and replacing it with nothing.

  3. Underscores to indicate italics. A simple (non-regex) find and replace stripped out all of these.
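The same three cleanups could also be scripted instead of done by hand in an editor. What follows is only a sketch: the `clean_corpus` function and the sample text are invented for the example, and the heading pattern is anchored with `$` and run with the `/m` flag so that it removes only lines made up entirely of capitals and punctuation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper that applies the three cleanup steps to a string.
sub clean_corpus {
    my ($text) = @_;

    # 1. Remove bracketed illustration references.
    $text =~ s/\[Illustration[^]]+\]//g;

    # 2. Empty out all-caps heading lines, with or without a leading number.
    #    /m lets ^ and $ match at the start and end of every line.
    $text =~ s/^(\d+\. )?[A-Z .,;]+$//mg;

    # 3. Delete the underscores that mark italics.
    $text =~ tr/_//d;

    return $text;
}

# Made-up sample in the style of a Project Gutenberg text.
my $raw = "CHAPTER I.\n\n1. THE ETHER.\n\n"
        . "The _ether_ fills space.\n[Illustration: FIG. 1]\n";
print clean_corpus($raw);
```

The blank lines left behind by the deleted headings are harmless here, because the snippet script only cares about the sequence of words.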

When I was done, I saved the file as ~/text/machinery.txt.

The snippet itself is defined in TextExpander as a shell script. Its content is

perl:
 1:  #!/usr/bin/perl
 2:  
 3:  use Games::Dissociate;
 4:  
 5:  # Slurp in the given corpus as a single string.
 6:  open(my $fh, '<', "$ENV{'HOME'}/text/machinery.txt") or die "Can't open: $!";
 7:  {local $/; $corpus = <$fh>;}
 8:  
 9:  # Dissociate the corpus, using word pairs, and return 15-49 pairs.
10:  $length = int(15 + rand(35));
11:  $dis = dissociate($corpus, -2, $length);
12:  
13:  # Insert periods before capitals.
14:  $dis =~ s/( [A-Z])/.$1/g;
15:  $dis =~ s/[,.;?!]\././g;
16:  
17:  # Capitalize the first word and end with a period.
18:  $dis =~ s/^(.)/\u$1/;
19:  $dis =~ s/[.);:?'", -]+$/./;
20:  
21:  print $dis;

and its abbreviation is ;mech.

Most of this script is explained in my earlier post. The only additions are Lines 13-15. I didn’t like the way the initial version put capitalized words in the middle of sentences; these two lines ensure that capitalized words are always preceded by a period.
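To see what the substitutions in Lines 14-19 do, it helps to run them on a fixed string rather than on random output. In this sketch the `punctuate` wrapper and the sample fragment, including its trailing space, are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the substitutions from Lines 14-19.
sub punctuate {
    my ($dis) = @_;
    $dis =~ s/( [A-Z])/.$1/g;       # put a period before each capitalized word
    $dis =~ s/[,.;?!]\././g;        # collapse "punctuation + period" to a period
    $dis =~ s/^(.)/\u$1/;           # capitalize the first character
    $dis =~ s/[.);:?'", -]+$/./;    # turn trailing punctuation/space into a period
    return $dis;
}

# Made-up fragment with a capital in the middle and a trailing space.
print punctuate('of the ether, The atoms are in motion and '), "\n";
# prints "Of the ether. The atoms are in motion and."
```

Note that the last substitution only fires when the text ends in punctuation or whitespace; a fragment that ended on a bare word would be left without a final period.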

Each time the snippet is invoked, it inserts roughly 30-100 words of Dolbearian nonsense. Some examples:

Gravitative attraction of the others. These various stresses have been confounded. Let us consider what is important to bear in mind is, that when a form of energy, are convertible into one another. We are asked.

Friction, which causes the latter to vibrate at the above rate per second is to be visible as red light. If the densest and hardest substances are sufficiently heated they will become gaseous. This is only one hypothesis.

Than the one assigned by. Maxwell, yet nearly all the qualities that belonged to it before it was common to have the same difficulties to meet as the underlying stratum of matter, opposes to a change in the room could. The only significance any or all of these have since been discovered, long ago. It is not needful for most scientific purposes that another magnet in circuit, so there will of course be 1728 times that number. One may if.

Amount of matter in the universe. The attraction of atoms or molecules, may cohere for other reasons, gravitative or magnetic, effects; which one of the essential qualities are modified in any degree, but vibratory, in the molecule that it can be written down in a minute drop of water, but there would be interspaces and unoccupied spaces which would present us with phenomena which imply that molecules originate them by one who.

Of the positive plate of a dynamo; and steam-engine is at liberty to say, or think, that the remotest visible stars are so well known that a magnet field.

No more feathers, beaks and flowers from Darwin. Now it’s hard substances, circuits, and confounded stresses. Frighteningly close to the writing I do for work.

A month ago, in the comments to this post about spam, I threatened to use the comment spam caught by Akismet as the corpus for a spam lipsum. I haven’t followed through on that, but you can see how easy it would be.


  1. You really should read the Wikipedia entry for Dolbear. It’s short and interesting. His work anticipated Bell in telephony and Marconi in radio. But he’s best known now for correlating temperature to the chirping rate of crickets.