I subscribe to two feeds from TypePad: Lisa Schmeiser’s Penny Wiseacre and the pseudonymous Lance Mannion. You may be surprised to hear that TypePad is still up and running, but it is, and I’m sure those who use it have spent a lot more time writing than they have fiddling with styles, plugins, and scripts. But however convenient TypePad may be for writers, the feeds it produces are a pain for me to read.

The problem is that all the formatting is stripped away, leaving nothing but the text itself. You might say “well, yes, that’s what RSS does,” but I mean all the tags are stripped away: I get no links, no text styling, no images, and, most importantly, no paragraph breaks. Just a long string of text. Here’s an example in Reeder.

This isn’t so bad for very short posts, but it ruins the experience of reading anything longer. Only James Joyce would make it through this filter unscathed.

I thought maybe the RSS readers I use (Reeder on the iPhone and ReadKit on the Mac) were misinterpreting what TypePad was sending. To check on that, I downloaded the feeds directly with curl and took a look:

curl http://lancemannion.typepad.com/lance_mannion/atom.xml


The content portion of the entry for the post listed above is this:

<content type="xhtml" xml:lang="en-US" xml:base="http://lancemannion.typepad.com/lance_mannion/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><em>Barnes
&amp; Noble. Thursday. April 17, 2014. Six forty five
p.m.</em></p>  <p><a
href="http://www.planet-of-the-blind.com/"
target="_blank">Steve,</a></p>  <p>A hundred less than
solitudinous years ago when I was in Boston working in a
bookstore and in charge of our literature section, Avon
Books was publishing a series of paperback editions of the
great Latin American writers of the day.&#0160; Jorge Amado,
Julio Cortazar, Mario Vargas Llosa, many others. The covers
were white with fragmented paintings on the front, all in a
similar style, maybe by the same artist.&#0160; They were
bright and rich with lush greens and sugary browns
dominating the motifs, calling up images of jungles and

[etc.]

</content>


Well, shucks, there are certainly tags in there, including paragraphs. At this point, it dawned on me that neither Reeder nor ReadKit was seeing this feed. Because I use Feed Wrangler as my RSS sync service, the feed readers display what they get from it, not what TypePad produces directly. So I used the Feed Wrangler API to see what it’s putting out. For the entry above, the JSON structure looks like this:

{"feed_item_id":731542571,
"published_at":1397790488,
"created_at":1397793684,
"version_key":1397800399,
"updated_at":1397800399,
"starred":false,
"author":"Lance Mannion",
"feed_id":78301,"feed_name":"Lance Mannion",
"body":"Barnes & Noble. Thursday. April 17, 2014. Six forty
five p.m.   Steve,  A hundred less than solitudinous years
ago when I was in Boston working in a bookstore and in
charge of our literature section, Avon Books was publishing
a series of paperback editions of the great Latin American
writers of the day.\u00a0 Jorge Amado, Julio Cortazar, Mario
Vargas Llosa, many others. The covers were white with
fragmented paintings on the front, all in a similar style,
maybe by the same artist.\u00a0 They were bright and rich
with lush greens and sugary browns dominating the motifs,
calling up images of jungles and

[etc.]

"title":"Marquez"}

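A quick way to check an item like this for surviving markup is to count the HTML tags in its `body` field. The Python sketch below assumes only the `body` key shown above. The two sample items are trimmed stand-ins, not real API responses, and `tags_in_body` is a made-up helper name.

```python
import json
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Collect the name of every start tag seen in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_in_body(item_json):
    """Return the start tags found in the "body" field of one feed item."""
    item = json.loads(item_json)
    counter = TagCounter()
    counter.feed(item.get("body", ""))
    counter.close()
    return counter.tags


# Hypothetical, shortened items modeled on the two JSON samples in this post.
STRIPPED = json.dumps(
    {"title": "Marquez", "body": "Barnes & Noble. Thursday.   Steve,  A hundred"}
)
INTACT = json.dumps(
    {"body": "<p>Because of the OpenSSL bug</p>\n\n<p>As we all know</p>"}
)

print(tags_in_body(STRIPPED))
print(tags_in_body(INTACT))
```

An empty list means the paragraph breaks were gone before the reader ever saw the item.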

So it’s Feed Wrangler that’s stripping out the tags. Is it because the TypePad feeds are in Atom format? Nope. I have an Atom feed for this blog,[1] and Feed Wrangler delivers it with all tags intact:

{"feed_item_id":735156914,
"published_at":1397767278,
"created_at":1398016519,
"version_key":1398016519,
"updated_at":1398016519,
"starred":false,
"author":"Dr. Drang",
"feed_id":90437,
"feed_name":"And now it\u2019s all this",
"body":"<p>Because of the OpenSSL bug, I&#8217;ve been doing
a lot of password changing recently. <a
</a> has generally been a big help in this, but my continual
interaction with it has revealed one annoying aspect of its
design.</p>\n\n<p>As we all know, 1Password is exceptionally
good at recognizing which web site you&#8217;re browsing and
page automatically. But when you need that information in
another application\u2014to set up an email account or a
calendar subscription, for example\u2014you have to search
is best done through <a
app that has access to the most used features of the full

[etc.]



The only difference I can see between my Atom feed and those from Lance Mannion and Penny Wiseacre is that my <content>s are wrapped in a CDATA structure,

<content type="html" xml:base="http://www.leancrew.com/all-this/2014/04/the-wrong-sided-arrow-in-1password/">
<![CDATA[<p>Because of the
OpenSSL bug, I&#8217;ve been doing a lot of password
changing recently. <a
/a> has generally been a big help in this, but my continual
interaction with it has revealed one annoying aspect of its
design.</p>

<p>As we all know, 1Password is exceptionally good at
recognizing which web site you&#8217;re browsing and filling
automatically. But when you need that information in another
application—to set up an email account or a calendar
subscription, for example—you have to search the 1Password
database and copy out your credentials. This is best done
through <a
app that has access to the most used features of the full

[etc.]

</content>


and theirs aren’t. Is TypePad violating the Atom format by not using CDATA? By my reading of the Atom spec, no.
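
To see why the two styles can trip up a parser, here is a minimal Python sketch using the standard library's ElementTree. With `type="xhtml"` the markup arrives as real XML child elements, so code that only collects character data loses every tag. With CDATA the markup is itself character data and survives. This is only a guess at the kind of thing that could go wrong, not a description of how Feed Wrangler or ReadKit actually work. The sample entries and the helper names `naive_text` and `content_markup` are invented for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened, hypothetical entries in the two styles discussed above.
XHTML_ENTRY = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<content type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    "<p><em>Six forty five p.m.</em></p>  <p>Steve,</p>"
    "</div></content></entry>"
)
CDATA_ENTRY = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<content type="html"><![CDATA[<p>First.</p><p>Second.</p>]]></content>'
    "</entry>"
)


def get_content(entry_xml):
    """Parse one entry and return its <content> element."""
    return ET.fromstring(entry_xml).find(f"{{{ATOM_NS}}}content")


def naive_text(content):
    """Join all character data; inline XHTML elements are silently dropped."""
    return "".join(content.itertext())


def content_markup(content):
    """Return the entry's markup as a string, whichever style it uses."""
    if content.get("type") != "xhtml":
        # Escaped or CDATA-wrapped HTML is already plain character data.
        return content.text or ""
    div = content.find(f"{{{XHTML_NS}}}div")
    if div is None:
        return content.text or ""
    # Drop the namespace from each tag so the output reads as plain HTML.
    for el in div.iter():
        el.tag = el.tag.split("}", 1)[-1]
    # Serialize the children of the wrapper div (tostring keeps tail text).
    return (div.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in div
    )


print(naive_text(get_content(XHTML_ENTRY)))
print(naive_text(get_content(CDATA_ENTRY)))
print(content_markup(get_content(XHTML_ENTRY)))
```

The first line printed is tag-free text much like the Feed Wrangler body above, while the CDATA entry comes through the same naive function with its paragraphs intact.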

One last bit of investigation: Both Reeder and ReadKit can subscribe to feeds directly—they don’t need to go through a syncing service. So I subscribed to the three Atom feeds directly to see how they were displayed. Reeder displayed all three just fine. Here’s an excerpt from the Mannion entry we’ve been looking at:

Much nicer with the paragraph breaks, isn’t it?

Here, on the other hand, is what it looks like in ReadKit:

Nada. Oh, it gets the feed name, the author, the date, and the title, but not the content. Same with Penny Wiseacre. On the plus side, it displays entries from ANIAT with no trouble at all.

If you’re scoring at home, that’s

1. Reeder at the top, able to parse all Atom feeds I’ve given it with the expected formatting.
2. Feed Wrangler in second, able to parse some Atom feeds as expected, but stripping the formatting from others.
3. ReadKit bringing up the rear, able to parse some Atom feeds as expected, but failing to deliver even the unformatted text of others.

I have no idea where in this ranking Feedbin, Feedly, or the other sync services would fall. Gabe Weatherhead did a nice series of reviews of these services last year, but he didn’t cover this issue, probably because he never ran across it in his set of subscriptions.[2]

I started this exercise with a vague understanding that RSS and Atom were tricky devils. Now I have a better sense of just how tricky they are. Skimming through the Atom spec, I saw so many options and exceptions, it’s a wonder they can be parsed at all. I have a feeling that if you program a reader for this stuff, you’ll spend more time on heuristics than on reading the spec.

I’ll show what I’ve found to Underscore David Smith and Balazs Varkonyi. They’re both nice guys who want to improve their products, so I wouldn’t be surprised to see both Feed Wrangler and ReadKit better able to handle TypePad’s feeds in short order.

Update 4/22/14
Brent Simmons, who has a bit of experience parsing feeds, explains what’s wrong with the Atom feed and how he handled problems like this:

Just look at it. This feature is a giant invitation to screwed-up feeds. The HTML [inside the <content>] — which is probably a blog post, typed by a human — has to be valid XML. People writing scripts to generate these feeds have to make sure they can turn that HTML into valid XML.
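
The requirement that the HTML be valid XML is easy to illustrate. The Python sketch below wraps a fragment in a `div` and asks the standard library's XML parser whether it is well-formed. The fragments and the helper name `is_well_formed_xml` are made up for the example. Note that the numeric reference `&#0160;`, which TypePad uses in the feed above, passes, while the named entity `&nbsp;` that people commonly type does not.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment):
    """Return True if an HTML fragment parses as XML when wrapped in a div."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


# Ordinary HTML habits that are fine in a browser but not in XML.
samples = [
    "<p>Line one<br/>line two</p>",   # self-closed tag: fine
    "<p>Line one<br>line two</p>",    # unclosed <br>: not XML
    "<p>Barnes & Noble</p>",          # bare ampersand: not XML
    "<p>day.&nbsp; Jorge</p>",        # named HTML entity: undefined in XML
    "<p>day.&#0160; Jorge</p>",       # numeric reference: fine
]

for sample in samples:
    print(is_well_formed_xml(sample), sample)
```

Any script that generates `type="xhtml"` content has to catch every one of these cases in human-typed posts, which is the burden Brent is describing.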

The unshortened version of the example URL is

http://schmeiser.typepad.com/penny_wiseacre/rss.xml


TypePad’s RSS feed doesn’t wrap the article’s HTML in a CDATA structure, either, but it does encode all the tags inside the <content:encoded> element. Using Mannion’s Marquez post as an example again:

<content:encoded>&lt;p&gt;&lt;em&gt;Barnes &amp;amp; Noble.
Thursday. April 17, 2014. Six forty five
p.m.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;a
href=&quot;http://www.planet-of-the-blind.com/&quot;
target=&quot;_blank&quot;&gt;Steve,&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A hundred less than solitudinous years ago when I
was in Boston working in a bookstore and in charge of our
literature section, Avon Books was publishing a series of
paperback editions of the great Latin American writers of
the day.&amp;#0160; Jorge Amado, Julio Cortazar, Mario
Vargas Llosa, many others. The covers were white with
fragmented paintings on the front, all in a similar style,
maybe by the same artist.&amp;#0160; They were bright and
rich with lush greens and sugary browns dominating the
motifs, calling up images of jungles and

[etc.]

</content:encoded>


This allows proper parsing in the testing I’ve done so far.
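
The reason the entity-encoded form is easier on parsers can be shown in a few lines of Python. Because every tag is escaped, the XML parser sees nothing but character data inside the element, and decoding that character data hands back the original HTML in one piece. The item below is a shortened, hypothetical version of the Marquez entry, and `encoded_html` is an invented helper name. The namespace URI is the one defined for the RSS content module.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS content module that defines <content:encoded>.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Shortened stand-in for the RSS item shown above.
RSS_ITEM = (
    '<item xmlns:content="http://purl.org/rss/1.0/modules/content/">'
    "<title>Marquez</title>"
    "<content:encoded>&lt;p&gt;&lt;em&gt;Barnes &amp;amp; Noble."
    "&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Steve,&lt;/p&gt;</content:encoded>"
    "</item>"
)


def encoded_html(item_xml):
    """Return the HTML carried in an item's <content:encoded> element."""
    item = ET.fromstring(item_xml)
    node = item.find(f"{{{CONTENT_NS}}}encoded")
    # The parser has already turned &lt; and &gt; back into < and >.
    return node.text if node is not None and node.text else ""


print(encoded_html(RSS_ITEM))
```

The printed string still has its `<p>` and `<em>` tags, with the paragraph structure ready for a reader to render.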

In the grand internet tradition, I’ll promote this as

How a Naperville dad fixed his TypePad subscriptions with one simple trick!

1. WordPress can create feeds in two or three formats. I used to think it was cool to give readers a choice between RSS2 and Atom. Then I decided that was stupid because feed readers could handle either type, and I stopped linking to the Atom feed (although I kept generating it so the subscribers who used it wouldn’t get cut off). Now I’m not so sure about the universality of feeds.

2. Gabe does talk about NewsBlur having a mode in which it grabs the content directly from the site instead of displaying the content portion of the feed. That would avoid this problem entirely.