My contribution to Markdown

I was listening to the most recent episode of The Talk Show this morning, and my extremely important contribution to Markdown came up. Sort of.

Rosemary Orchard is the guest, and she does a great job with the single most important task of a Talk Show guest: staying engaged and making relevant contributions while Gruber is on a long digression. Lots of guests find it hard to balance letting Gruber go while also making sure the audience knows you’re still there. Rosemary does this perfectly.

Anyway, it was during a digression—actually a digression within a digression—that Gruber talked about code blocks in Markdown and how one of his favorite features is that you don’t have to escape anything in a code block. You can paste source code directly into your Markdown document without any changes, and it will appear as expected in the rendered HTML.1 That’s my doing.

In 2004, any code that contained backslashes (\) was likely to need some editing when placed in a Markdown code block. Backslashes weren’t treated literally within a code block; they acted as escape characters and made the output look different from the input. Very un-Markdown-ish.

I pointed this out in the Markdown mailing list, and Gruber agreed that it should be changed. In the next Markdown release—which was, I believe, his last—he made the change, and all the text in code blocks has been treated literally ever since.

Undoubtedly, as Markdown became more popular, someone else would have pointed out this problem. Gruber himself would have been annoyed by it if he ever needed to write a code block with backslashes in it. But I was there first. And you’re welcome.

  1. The code block has to be indented, of course, or (in many implementations, but not Gruber’s) surrounded by fences.

Filtering my RSS reading

A couple of weeks ago, I decided to cut back on my RSS feed reading.1 Not by reducing the number of feeds I’m subscribed to, but by filtering articles to eliminate those that would just be a waste of my time. The change was inspired by a particularly stupid post by Erik Loomis at Lawyers, Guns & Money. I realized that in all the years I’ve been reading LGM, I’ve liked very few Loomis articles. I start out thinking “maybe this one will be different,” but it seldom is. I just needed to cut him out.

My feedreader is NetNewsWire, which has been working well for me since I started using it about a year ago. Although there’s been some talk of adding filtering to NNW, it hasn’t happened yet. So what I need to do is set up filtered feeds and subscribe to them.

In olden times, I might have used Yahoo Pipes to do the filtering. Today’s equivalents are Zapier and IFTTT. After a bit of reading, it seemed like the parts of Zapier I’d need would require a $20/month subscription. And while I feel certain IFTTT could do what I wanted, I’m not interested in learning to write IFTTT applets—if I’m going to write filtering code, I’d rather do it in a more general purpose way.

I could subscribe to Feedbin or a similar service and point NetNewsWire to my subscription. This would be the right choice if, in addition to filtering, I wanted to fold a bunch of other things Feedbin does—like email newsletters, for example—into my RSS reading, but I’m not interested in that. If I’m going to spend $5/month, I’ll get a lot more out of a low-end virtual machine at Linode or Digital Ocean, which could host both my RSS filtering and other cloud-related services I build. And since I already have such a subscription…

My approach is very Web 1.0. For each feed I want to filter, I create a CGI script on my server. The script reads the original feed, filters out the articles I don’t want, and returns the rest. The URL of that script is what I subscribe to in NetNewsWire.

So what should the script be? My first thought was to use Python. It has the feedparser library, which I’ve used before. It parses the feed—from almost any format—and builds a dictionary from it. At that point, it’s easy to filter the articles using standard dictionary methods. Unfortunately, the filtered dictionary then has to be converted back out into a feed, which feedparser can’t do. I got around this by printing out the filtered dictionary as a JSON Feed. Since Brent Simmons is the driving force behind both NetNewsWire and the JSON Feed standard, I knew NNW would be able to parse the output of my filtering script.
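A minimal sketch of that first approach, using only the standard library. It assumes the entries have already been parsed into dictionaries with `title`, `link`, `author`, and `summary` keys, roughly the shape a parser such as feedparser produces. The function name, the sample data, and the example.com URLs are all made up for illustration.

```python
import json

def to_json_feed(title, home_url, entries, blocked_author):
    """Drop entries by blocked_author and return the rest as a JSON Feed string."""
    items = []
    for entry in entries:
        # Skip anything whose author field mentions the blocked name.
        if blocked_author in entry.get("author", ""):
            continue
        items.append({
            "id": entry.get("id", entry["link"]),
            "url": entry["link"],
            "title": entry["title"],
            "content_html": entry.get("summary", ""),
        })
    feed = {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": home_url,
        "items": items,
    }
    return json.dumps(feed, indent=2)

# Hypothetical parsed entries standing in for a real feed.
sample_entries = [
    {"title": "Dropped post", "link": "https://example.com/1",
     "author": "Erik Loomis", "summary": "<p>One</p>"},
    {"title": "Kept post", "link": "https://example.com/2",
     "author": "Someone Else", "summary": "<p>Two</p>"},
]

print(to_json_feed("Example feed", "https://example.com/", sample_entries, "Loomis"))
```

In a CGI script, the output would be preceded by a `Content-Type: application/feed+json` header and a blank line.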

This worked fine, and I used it for a couple of days, but it felt wrong. RSS and Atom feeds are XML files, and XML is supposed to be filtered using XSLT. The thing is, I haven’t used XSLT in ages, and I didn’t much care for it then. It was invented back when clever people thought everything was going to be put in XML format, so they built a programming language in XML. I’m sure they thought this was great—just like Lisp programs being written as Lisp lists—but it wasn’t. I’m sure there are many reasons XML hasn’t turned out to be as revolutionary as was thought 20 years ago, but one of them has to be the shitty language used for XML transformations.

Still, all I wanted to do was search for certain text in a certain node and prevent those records from appearing in the output. Everything else would be passed through as-is. Sal Mangano’s XSLT Cookbook has an example of a simple pass-through XSLT file (also known as the identity transform), which I used as the basis for my script:2

1:  <xsl:stylesheet version="1.0" xmlns:xsl="">
3:  <xsl:template match="node() | @*">
4:    <xsl:copy>
5:      <xsl:apply-templates select="@* | node()"/>
6:    </xsl:copy>
7:  </xsl:template>
9:  </xsl:stylesheet>

XSLT is a rule-based language. The rules define how the various parts of the incoming XML document are to be treated. In the pass-through example, the match in the template rule on Line 3 matches all the nodes (node(), which covers elements, text, comments, and the like) and all the attributes (@*). The copy command then copies whatever was matched, which was everything.

With the pass-through rule in place, the script can be expanded to add additional rules that are more specific matches to particular elements or attributes. The Lawyers, Guns & Money feed identifies the author of each post this way:

  [other tags]
  <dc:creator><![CDATA[Erik Loomis]]></dc:creator>
  [more tags]

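So I needed to add two things to the pass-through script: a namespace declaration for the dc prefix and a rule that matches any <item> whose <dc:creator> contains “Loomis.” Here’s what I came up with: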

 1:  <xsl:stylesheet version="1.0"
 2:    xmlns:xsl=""
 3:    xmlns:dc="">
 5:    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
 7:    <xsl:template match="node() | @*">
 8:      <xsl:copy>
 9:         <xsl:apply-templates select="node() | @*"/>
10:      </xsl:copy>
11:    </xsl:template>
13:    <xsl:template match="item[contains(dc:creator, 'Loomis')]"/>
14:  </xsl:stylesheet>

You can see the namespace addition in Line 3 and the new rule for the <dc:creator> element in Line 13. Because there’s no action within this rule, nothing is done when an <item> contains “Loomis” in its <dc:creator> tag. And by “nothing,” I really mean nothing—there’s no output associated with this rule, which means Loomis’s posts are omitted.
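For checking what a filter like this should produce, here is a rough Python equivalent. It is not XSLT and not what runs on the server: it uses the standard library’s ElementTree to do the same job of dropping every `<item>` whose `<dc:creator>` contains a given name. The function name and the sample feed are invented. The namespace URI is the standard Dublin Core one, which I’m assuming is what the feed’s `dc` prefix is bound to.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core namespace (assumed to be what the feed's dc prefix uses).
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # keep the dc: prefix when serializing

def drop_items_by_author(rss_text, name):
    """Return the RSS text with every item by `name` removed."""
    root = ET.fromstring(rss_text)
    for channel in root.iter("channel"):
        # Copy the list first, because we remove items while looping.
        for item in list(channel.findall("item")):
            creator = item.findtext(f"{{{DC}}}creator", default="")
            if name in creator:
                channel.remove(item)
    return ET.tostring(root, encoding="unicode")

# Made-up feed with the same <dc:creator> structure as the excerpt above.
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <item><title>Dropped</title><dc:creator><![CDATA[Erik Loomis]]></dc:creator></item>
    <item><title>Kept</title><dc:creator><![CDATA[Someone Else]]></dc:creator></item>
  </channel>
</rss>"""

print(drop_items_by_author(SAMPLE, "Loomis"))
```

Like the XSLT rule, this passes everything else through and simply emits nothing for the matched items.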

With this XSLT file in place, I just needed a shell script to download the original feed and process it through the filter.

 1:  #!/bin/bash
 3:  echo "Content-Type: application/rss+xml"
 4:  echo
 6:  curl -s \
 7:  | xsltproc loomis-filter.xslt -

Lines 3–4 provide the header and blank separator line. Lines 6–7 contain the pipeline that downloads the LGM feed via curl and passes it to xsltproc for filtering with the above XSLT file. xsltproc is part of the GNOME XML/XSLT project. It’s not the most capable XSLT processor around (it’s limited to XSLT 1.0, which is missing a lot of nice features), but it’s perfectly fine for this simple application, and it’s quite fast.

Assuming the CGI shell script is named filtered-lgm-feed and it’s on a server called, the URL I use for the subscription is

Once I had this filtered feed working, I thought of other parts of my regular reading that could use some pruning. Here’s the filter I wrote for the Mac Power Users forum:

 1:  <xsl:stylesheet version="1.0"
 2:    xmlns:xsl=""
 3:    xmlns:dc="">
 5:    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
 7:    <xsl:template match="node() | @*">
 8:      <xsl:copy>
 9:         <xsl:apply-templates select="node() | @*"/>
10:      </xsl:copy>
11:    </xsl:template>
13:    <xsl:template match="item[contains(translate(title, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'obsidian')]"/>
14:    <xsl:template match="item[contains(translate(title, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'devon')]"/>
15:  </xsl:stylesheet>

I wish all of you who use Obsidian and DEVONthink the best, but I don’t want to read about them anymore.

The translate function in Lines 13 and 14 converts all uppercase letters to lowercase before passing the result on to the contains function. Unlike the previous filter, which expects “Loomis” to have consistent capitalization (it does), this one doesn’t trust the forum users to capitalize the trigger words in any standardized way. This is especially important for the various products from DEVONtechnologies, which get almost every possible permutation of capitalization: DevonThink, devonTHINK, DevonTHINK, etc.

Using translate is a verbose way of making the change, but unfortunately XSLT 1.0 doesn’t have a lower-case function. XSLT 2.0 does, but xsltproc doesn’t support XSLT 2.0. The Java-based XSLT processor, Saxon, does, and for a while I had an XSLT 2.0 version of the MPU filter running through Saxon. But it was way slower than using xsltproc, so I returned to the more clumsy filter you see above.
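To make the translate trick concrete, here is a small Python sketch of the same logic. XPath’s translate replaces each character in its second argument with the character at the same position in its third, which is what `str.maketrans` does with two equal-length strings. The function name and test titles are made up.

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# Character-by-character mapping, like translate(title, UPPER, LOWER) in XPath.
TO_LOWER = str.maketrans(UPPER, LOWER)

def is_blocked(title, words=("obsidian", "devon")):
    """True if the lowercased title contains any of the trigger words."""
    lowered = title.translate(TO_LOWER)
    return any(word in lowered for word in words)

for title in ["DevonTHINK tips", "devonTHINK 3 sync", "Moving to OBSIDIAN", "Shortcuts question"]:
    print(title, is_blocked(title))
```

Only the 26 ASCII letters are mapped, so an accented capital would pass through unchanged, which is fine for these trigger words.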

The script that runs the filter and returns the Obsidian- and DEVON-less MPU posts looks pretty much like the LGM script:


#!/bin/bash

echo "Content-Type: application/rss+xml"
echo

curl -s \
| xsltproc topic-filter.xslt -

Although this post is kind of long-winded, building the filters didn’t take much time. It’s easy to download an RSS feed and look through it to see which nodes and attributes to use for the filter. Now that I have a couple of examples to build on, I expect to be adding more filters soon.

  1. Throughout this post, I’ll be using “RSS” as a catch-all term for any kind of feed, regardless of format. My apologies to all the Atom fans out there. 

  2. No, I don’t own the XSLT Cookbook, but my local library provides its patrons with a subscription to O’Reilly’s ebooks and courseware. It’s a good service, and you should look into whether your library does the same. 

Integers, decibels, and graphs

Earlier this week, John Cook wrote a couple of posts (first, second) about decibels and their analog to bases other than 10. In the second post, he included a graph for which he has some reservations. I thought I’d see if I could improve the graph. I think I did, but I have my own reservations.

First, let’s remind ourselves of what decibels are. Decibels are a way of representing the ratio of signal strengths on a logarithmic scale. The ratio of the powers of two signals, \(r\), is said to be \(n\) decibels when that ratio is

\[r = 10^{n/10}\]

An interesting thing about the decibel scale is that the power ratio gets very close to integer values for certain values of \(n\). Here’s a table where I’ve highlighted the four ratios that are pretty close to integers.

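| Decibels, n | Power ratio, r |
|-------------|----------------|
| 1           | 1.259          |
| 2           | 1.585          |
| 3           | 1.995 *        |
| 4           | 2.512          |
| 5           | 3.162          |
| 6           | 3.981 *        |
| 7           | 5.012 *        |
| 8           | 6.310          |
| 9           | 7.943 *        |

(* marks the ratios that are close to integers.)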
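The table is easy to regenerate from the formula. This sketch flags a ratio when it is within 0.1 of an integer, a threshold I’m picking just for illustration. The function name is made up.

```python
def power_ratio(n, base=10):
    """Power ratio for n 'decibels' in the given base: base**(n/base)."""
    return base ** (n / base)

for n in range(1, 10):
    r = power_ratio(n)
    # Flag ratios within 0.1 of the nearest integer (illustrative threshold).
    flag = "*" if abs(r - round(r)) < 0.1 else ""
    print(f"{n}  {r:.3f} {flag}")
```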

This relatively large set of near-integers sparked Cook’s curiosity.

Is base 10 unique in this regard? If we were to look at the analogs of decibels in other bases, would we see a higher or lower proportion of near integers? For example, we could look at the base 12 (duodecimal) analog of decibels, or “duodecibels” for short. Or we could look at the base 16 analog “hexadecibels.”

For these analog decibels, which I will hereby refer to as “decibels” (in quotes), the equation for the power ratio is

\[r = b^{n/b}\]

Cook calculated the ratios for a large number of bases and counted how many were close to being integers. “Close” being a matter of taste, he counted for three levels of closeness: within 0.10 of an integer, within 0.05 of an integer, and within 0.02 of an integer. He called these levels \(\varepsilon\) and produced this graph:

Cook original graph
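A sketch of the counting behind a graph like this. I’m assuming that for base \(b\) the values of \(n\) run from 1 to \(b - 1\), since \(n = b\) gives a ratio of exactly \(b\) and would always count. This is my reading, not Cook’s code, and the function name is made up.

```python
def near_integer_count(base, eps):
    """Count n in 1..base-1 for which base**(n/base) is within eps of an integer."""
    count = 0
    for n in range(1, base):
        r = base ** (n / base)
        if abs(r - round(r)) < eps:
            count += 1
    return count

for base in (10, 12, 16):
    counts = [near_integer_count(base, eps) for eps in (0.10, 0.05, 0.02)]
    print(base, counts)
```

For base 10, the 9 dB ratio of 7.943 is the one that counts at the 0.10 level but drops out at 0.05.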

Cook uses matplotlib to make most of his graphs, and here we can see that he’s allowing its defaults to give the graph a couple of stylistic shortfalls. The bases are all integers, but the x-axis tick labels are in multiples of 2.5 units. The legend doesn’t use consistent numbers of decimal places. But these are small things, easily fixed. Cook’s reservation is about using connecting lines.

I first made this plot using discrete markers rather than connected lines. Generally that’s a good thing to do for functions only defined on integers. But the plot was hard to read. Connecting the lines makes it easier to see which values correspond to the same value of ε.

It is easier to see what’s going on than if he’d just done a scatter plot, but there are still problems, mainly because of the common results for different values of \(\varepsilon\) and the order of plotting that covers up certain parts of the orange and blue plots.

My fix, to avoid both connecting lines and cover-ups, was to apply a technique that Kieran Healy often uses to good effect: jittering. Or something very like jittering, anyway. Because the bases are discrete and our audience knows they take on only integer values, we can shift them horizontally just a bit for each value of \(\varepsilon\) without fear of confusion. Here’s what I came up with:

Near-integer decibels

The change in horizontal position for each value of \(\varepsilon\) prevents overlap. The light vertical lines connect the results to the appropriate base, and by not drawing tick marks on the horizontal axis, we avoid any suggestion that the blue and purple values represent non-integer bases. In a sense, we’re treating the bases as categories and plotting columns for each \(\varepsilon\).

I did try this first with columns only, but it didn’t make a strong enough visual impression, mainly because the columns had to be thin to stay clustered near the integer bases. Adding the marker at the top of each column improved the appearance, and when I did that, I was able to lighten the color of the columns to make the markers the focal point while still providing a visual connection to integer bases. Here’s a closeup of the cluster at \(b = 16\), where you can see the lightening clearly:

Lollipop closeup
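The layout described above could be sketched in matplotlib along these lines. This is not the code behind my graph. The function names, spacing, figure size, and colors are placeholders, and it labels every other base on the assumption that the first base in the list is one you want labeled. matplotlib is imported inside the plotting function so the position helper can be used on its own.

```python
def dodged_positions(bases, n_series, spacing=0.2):
    """x positions for each series, shifted so the group is centered on each base."""
    offsets = [(k - (n_series - 1) / 2) * spacing for k in range(n_series)]
    return [[b + off for b in bases] for off in offsets]

def lollipop_plot(bases, counts_by_eps, colors):
    """counts_by_eps maps each epsilon to a list of counts, one per base."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 4))  # wide, to separate the clusters
    xs = dodged_positions(bases, len(counts_by_eps))
    for (eps, counts), x, color in zip(counts_by_eps.items(), xs, colors):
        # Light stem tying each marker back toward its base...
        ax.vlines(x, 0, counts, colors=color, alpha=0.3, linewidth=2)
        # ...and a solid marker on top as the focal point.
        ax.plot(x, counts, "o", color=color, label=f"ε = {eps:.2f}")
    ax.set_xticks(bases[::2])            # label every other base
    ax.tick_params(axis="x", length=0)   # labels only, no tick marks
    ax.grid(axis="y", alpha=0.3)         # light horizontal grid lines
    ax.legend()
    return fig

print(dodged_positions([10, 11], 3))
```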

You’ll also notice that I changed the aspect ratio of the graph, stretching it out horizontally to give better separation of the clusters. Even with the stretching, there wasn’t enough room to label every base along the horizontal axis. Even without those labels, I think it’s clear where the odd bases are.

Minor changes included shifting the colors to a “colorblind safe” set from ColorBrewer and adding light horizontal grid lines to make the values easier to read, especially those from the higher bases.

Overall, I think this addresses Cook’s proper concern with having connecting lines that don’t represent actual connections while still providing a clear indication that base 10 is special in how many near-integer powers it has. It is a bit on the anemic side, but I can’t make the markers any bigger without overlapping.

Vintage mechanics

Last week I read Vintage Murder by Ngaio Marsh and was surprised to find that the setup for the murder was very much like the homework problems I used to assign in my sophomore-level mechanics class. Solving the problem—which didn’t help in solving the murder—was a fun little exercise.

Marsh’s hero, Detective Chief Inspector Roderick Alleyn, is on holiday in New Zealand. An English theatrical company happens to be travelling the same route as Alleyn, and they all stop in some small town on the North Island. Because he’s gotten to know the members of the company, Alleyn is invited to a birthday party for the company’s leading lady, Carolyn Dacres.

As a surprise, Dacres’s husband and the manager/owner of the company, Alfred Meyer, has brought along a jeroboam of champagne and has worked in secret with the stagehands to have it magically revealed at the party, which takes place on stage after a show.

Here’s Marsh’s description of the setup:

The giant bottle was suspended in the flies with a counterweight across a pulley. A crimson cord from the counterweight came down to the stage and was anchored to the table. At the climax of her party, Carolyn was to cut this cord. The counterweight would then rise and the jeroboam slowly descend into a nest of maiden-hair fern and exotic flowers, that was to be held, by Mr. Meyer himself, in the centre of the table.

And here’s my sketch of it. I’ve drawn the cord connecting the bottle to the counterweight in blue to help distinguish it from the cord that’s supposed to get cut.

Bottle setup

Alfred and the stagehands practiced the cord-cutting several times to make sure the bottle would come down at the right speed and in the right place. But at some point before the party, the murderer removed the counterweight, connected the red and blue cords, and shifted the pulley a couple of feet sideways. When Carolyn cut the red cord, the jeroboam fell rapidly and smashed into poor Alfred’s head, killing him instantly and ruining the party.

The dynamics of the altered setup aren’t very interesting, but the dynamics of the original setup are. As a homework problem, this could be assigned in two ways:

  1. Given the masses of the jeroboam and the counterweight, determine the downward acceleration of the jeroboam when the red cord is cut.
  2. Given the masses of the jeroboam and the counterweight and the height of the jeroboam above the table, determine the downward velocity of the jeroboam just before it reaches the table.

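Let’s solve both. We’ll make the usual sophomore-level assumptions that the following values are negligible and can be ignored in the analysis: the mass of the cords, the mass (and so the rotational inertia) of the pulley, and the friction in the pulley’s axle.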

The acceleration problem is best solved using Newton’s Second Law. We’ll start with the free-body diagrams of the bottle (I’m tired of typing jeroboam) and the counterweight.

Free-body diagrams

\(g\) is the acceleration due to gravity. \(m\) is the mass of the bottle. \(\mu\) is the mass ratio of the counterweight to the bottle; it’s a dimensionless parameter that’s less than one. \(T\) is the tension in the blue cord, which is equal on both sides of the pulley because of the assumptions above.

Applying Newton’s Second Law to the bottle FBD, and taking the positive direction to be downward, we get

\[mg - T = ma \quad \Longrightarrow \quad T = m \left( g - a \right)\]

where \(a\) is the downward acceleration of the bottle.

Note that the blue cord provides a kinematic constraint that makes the downward movement—displacement, velocity, and acceleration—of the bottle equal to the upward movement of the counterweight. Using this, Newton’s Second Law applied to the counterweight, with the positive direction upward, is

\[T - \mu m g = \mu ma \quad \Longrightarrow \quad T = \mu m \left( g + a \right)\]

Setting the two expressions for \(T\) equal to each other gives

\[m \left( g - a \right) = \mu m \left( g + a \right)\]

and solving for \(a\) gives

\[a = g \frac{1 - \mu}{1 + \mu}\]

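This is a nice, neat little solution, and it makes sense at the two extremes: with no counterweight (\(\mu = 0\)), \(a = g\) and the bottle is in free fall; with a counterweight as heavy as the bottle (\(\mu = 1\)), \(a = 0\) and nothing moves.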

You might feel uncomfortable about my flipping the coordinate system around—down is positive for the bottle, up is positive for the counterweight. Don’t. We are in charge of the coordinate systems we use, not the other way around. We just have to stay consistent within the equations for each free-body diagram, which we did.

Now let’s move on to the velocity problem. We could take the acceleration result we just got and do some kinematics, but it’s more fun to use energy principles. Also, when the homework problem has things moving up or down and asks for velocity, it’s a good bet that your professor wants you to solve the problem using energy.

Before the red cord is cut, the bottle and counterweight are at rest and the kinetic energy of the system is zero. When the bottle reaches the level of the table, with both it and the counterweight traveling at a speed of \(v\), the kinetic energy is

\[\frac{1}{2} m v^2 + \frac{1}{2} \mu m v^2 = \frac{1}{2} m v^2 \left( 1 + \mu \right)\]

Because we start at zero, this is also the change in kinetic energy.

There are no springs in the system, so the potential energy is all due to gravity. The bottle loses potential energy as it goes down, and the counterweight gains potential energy as it goes up. The overall change in potential energy of the system is

\[-mgh + \mu mgh = - \left( 1 - \mu \right) m g h\]

I’ve written this with a leading negative sign to emphasize that we have a loss of potential energy because the bottle weighs more than the counterweight.

The gain in kinetic energy must equal the loss in potential energy, so

\[\frac{1}{2} m v^2 \left( 1 + \mu \right) = \left( 1 - \mu \right) m g h\]

and, after a little algebra,

\[v = \sqrt{2gh} \sqrt{\frac{1 - \mu}{1 + \mu}}\]

This solution also makes sense at the extremes. You may not remember that an object dropped from a height of \(h\) hits the ground at a speed of \(\sqrt{2gh}\), but that is the well-known result.
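As a cross-check, the kinematics route gives the same answer. For constant acceleration \(a\) starting from rest and covering a distance \(h\), the standard result is \(v^2 = 2ah\), and substituting the acceleration found earlier gives

```latex
\[v^2 = 2 a h = 2 g h \, \frac{1 - \mu}{1 + \mu}
\quad \Longrightarrow \quad
v = \sqrt{2gh} \sqrt{\frac{1 - \mu}{1 + \mu}}\]
```

Setting \(\mu = 0\) recovers the free-fall speed \(\sqrt{2gh}\), and setting \(\mu = 1\) gives \(v = 0\).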

This is fine as far as the homework-style problems are concerned, but Alfred and the stagehands had to solve the inverse problem: What counterweight is necessary to get the bottle to reach the table at a gentle speed? For that, we just rearrange the above solution to solve for \(\mu\):

\[\mu = \frac{2gh - v^2}{2gh + v^2}\]

Let’s apply some numbers to this and see what we get. I would put the original height of the bottle at 4 m above the table, and I doubt Alfred would want the bottle moving faster than 1 m/s when it reaches the table. With those values, and the usual gravitational acceleration, we get

\[\mu = \frac{2\cdot 9.81 \cdot 4 - 1^2}{2\cdot 9.81 \cdot 4 + 1^2} = 0.9748\]
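The three formulas are simple enough to check numerically. In this sketch the function names are mine, and the 4 m height and 1 m/s landing speed are the guesses from above.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def acceleration(mu, g=G):
    """Downward acceleration of the bottle for mass ratio mu."""
    return g * (1 - mu) / (1 + mu)

def landing_speed(mu, h, g=G):
    """Speed of the bottle after descending a height h from rest."""
    return math.sqrt(2 * g * h * (1 - mu) / (1 + mu))

def mass_ratio(v, h, g=G):
    """Counterweight-to-bottle mass ratio needed to land at speed v from height h."""
    return (2 * g * h - v**2) / (2 * g * h + v**2)

mu = mass_ratio(v=1.0, h=4.0)
print(f"mu = {mu:.4f}")                             # mu = 0.9748
print(f"v  = {landing_speed(mu, 4.0):.3f} m/s")     # v  = 1.000 m/s
print(f"a  = {acceleration(mu):.3f} m/s^2")         # a  = 0.125 m/s^2
```

The acceleration of 0.125 m/s² agrees with \(v^2 / 2h\), as it should.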

So the counterweight had better be pretty damned close to the weight of the bottle, which is in the 10–11 pound range. You would think the stagehands would have to cobble together a few weights to hit this target, but the book says it was just a single counterweight—an item that was up in the flies and would normally be used to help lift scenery. How the company happened to have a counterweight of exactly this size is a mystery even DCI Alleyn couldn’t solve.