ROT13: A Twitter Odyssey

This started with Chuck Wendig’s recent tweet storm about the new Star Wars movie, the first of which is this:

Gur zbivr jnf n qryvtug. Ybggn sha, cerggl shaal, qrsvavgryl nf GSN jnf n erzvkrq NAU naq GYW jnf n erzvkrq Rzcver, fb gbb vf guvf bar n erzvk bs Erghea bs gur Wrqv.

To avoid inadvertent spoilers to his followers, he encoded the tweets using the old ROT13 system.
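For anyone who hasn't met ROT13: it rotates each letter 13 places around the alphabet and leaves digits, punctuation, and spaces alone. A minimal Python sketch, using the standard library's `rot_13` codec, decodes the opening of the tweet above. Python plays no part in any of the shortcuts discussed here, and the `rot13` helper name is just for illustration.

```python
import codecs

def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; leave every other character untouched."""
    return codecs.encode(text, "rot_13")

print(rot13("Gur zbivr jnf n qryvtug."))  # The movie was a delight.
```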

Serenity Caldwell linked to the tweet storm and then mentioned that she read it using Matthew Cassinelli’s shortcut for ROT13 decoding, first written two years ago. Jason Snell then mentioned that he’d written his own ROT13-decoding shortcut that used the Text Case app.

@settern @mattcassinelli (Unhelpfully, @Twitterrific seems to only pass URLs in the share sheet? So I load the twitter web page, snip out the title text, then use Title Case to unrot13 it.)

Yeah, he says “Title Case,” but that’s probably because converting to title case is his most frequent use of the app. Text Case does all sorts of text manipulations and, as Jason found very handy, provides a Shortcuts action that you can use to access its features (including ROT13) from your own shortcuts.

Matt’s shortcut doesn’t rely on Text Case, but it does go off-device via a link to decode.org. I don’t think there’s a good way to do ROT13 within Shortcuts itself; you either follow Jason’s route by using another app or Matt’s by using an online service.1

I installed both Matt’s and Jason’s shortcuts and tried them out. They both work well, but I found some weird output due to Twitter’s oddball handling of certain characters.

For example, as I used them on Chuck Wendig’s spoiler tweets, I found that Twitter uses the &#39; HTML entity to depict straight apostrophes. It’s weird that they use an entity for a perfectly legal HTML character, but I guess they do it to avoid single quotes within single quotes in their code. The upshot is that both shortcuts translate

srryf yvxr vg'f


into

feels like it&#39;s


which is certainly readable, but not ideal.

As I looked further into Twitter’s output (see below), I also saw &quot; for straight double quotes (why they weren’t done as &#34; is beyond me) and &#10; for newlines. Jason’s shortcut handles newlines, but neither displays straight double quotes the way you’d like.
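For comparison outside Shortcuts, here is a minimal Python sketch of what decoding these three entities amounts to. It uses the standard library's `html.unescape`. The first sample string is the one from the example above; the second is made up purely for illustration.

```python
import html

# The decoded-but-still-escaped text from the example above.
print(html.unescape("feels like it&#39;s"))  # feels like it's

# A made-up sample containing the other two entities.
sample = "&quot;line one&quot;&#10;line two"
print(repr(html.unescape(sample)))  # '"line one"\nline two'
```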

I decided to steal bits of both shortcuts and try to adjust the way they handle entities to get output that looks more normal. The first step is to get the tweet’s text from its URL. Jason does it with these steps:

Matt does it with these:

They both do a GET on the tweet’s URL, treat the contents returned as HTML and then extract the tweet text from within the HTML. But they get it from different places. After looking at the HTML of a handful of tweets, I’ve found different variations on the tweet text in five places. How you process the text depends on where you get it from.

The first place you find the text of the tweet is where Jason’s shortcut extracts it: in the <title> tag. Here’s the <title> of Jason’s tweet above:

<title>Jason Snell on Twitter: &quot;(Unhelpfully,
@Twitterrific seems to only pass URLs in the share
sheet? So I load the twitter web page, snip out the
title text, then use Title Case to unrot13 it.)…
https://t.co/ACJHPAYJNk&quot;</title>


As you can see, this includes a preamble, an ellipsis after the actual text, and also a URL. In this case, the URL points to Jason’s tweet itself, which is a little weird. In other tweets I looked at, there is no trailing URL, so I’m not sure what the criteria are for including it. And, frankly, I’m not all that interested in finding out. I just know that I’d prefer to use one of the other versions of the tweet text, if possible.

The second place you find it is in an attribute of a <link> tag:

<link rel="alternate" type="application/json+oembed"
@Twitterrific seems to only pass URLs in the share
sheet? So I load the twitter web page, snip out the
title text, then use Title Case to unrot13 it.)…
https://t.co/ACJHPAYJNk&quot;">


This also includes the preamble, ellipsis, and URL, so let’s move on.

The third place you find the tweet text is as an attribute of a <meta> tag:

<meta  property="og:description" content="“@settern
@mattcassinelli (Unhelpfully, @Twitterrific seems to only
pass URLs in the share sheet? So I load the twitter web
page, snip out the title text, then use Title Case to
unrot13 it.)”">


This looks like what I want. It has no preamble or trailing cruft and it includes the handles of the people being replied to. Basically, it’s exactly what you think of as “the tweet text” when you read a tweet.

The fourth place is the content of a <p> tag. This is where Matt’s shortcut extracts the text.

<p class="TweetTextSize TweetTextSize--jumbo js-tweet-text
js-nav" dir="ltr" data-mentioned-user-id="643443" ><s>@</s>
<b>Twitterrific</b></a> seems to only pass URLs in the share
sheet? So I load the twitter web page, snip out the title
text, then use Title Case to unrot13 it.)</p>


I don’t like this one because it presents a new problem: the @Twitterrific part is presented as the HTML of a link to the Twitterrific user profile page. I suppose I could work out a way to strip out all the anchor tag crap, but why bother when there’s already a string with exactly what I want?

The final spot for the tweet text is in some absurdly long HTML-encoded JSON string embedded in an <input type="hidden"> tag. And when I say “absurdly long,” I mean over 300 kilobytes. Here’s just the bit near the tweet text:

&quot;title&quot;:&quot;Jason Snell on Twitter: \&quot;
(Unhelpfully, @Twitterrific seems to only pass URLs in
the share sheet? So I load the twitter web page, snip out
the title text, then use Title Case to unrot13 it.)\u2026
https:\/\/t.co\/ACJHPAYJNk\&quot;


I’m rejecting this out of hand because it’s in such a long tag.

If you’re wondering why I bothered looking at tweets that weren’t already ROT13 encoded, it’s because there are more examples of non-encoded tweets than encoded ones. Looking at non-encoded tweets gave me a wider variety of input to test. Also, encoding is in the eye of the beholder, and the ROT13 algorithm doesn’t care whether the characters look like English text or not.2

After deciding on which version of the tweet text to extract and playing around with different ways to handle the HTML entities, this is the shortcut I came up with:

The first four steps are just like Matt’s and Jason’s, except I extract the text from a different part of the HTML.

The best way to handle most HTML entities is to convert the HTML to rich text, which is done in the 6th step. Unfortunately, this doesn’t preserve the newlines; it flattens them into a single space. So the 5th step converts every &#10; into a series of ten equals signs prior to the conversion to rich text. Then after Text Case does the ROT13 encoding in Step 7, Steps 8 and 9 convert the equals sign string into a newline. (The text in Step 8 is just a blank line. I would have preferred to use \n as the replacement text in Step 9, but that doesn’t work, even if you tell Shortcuts to use regular expressions in the Show More part of that step.)
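The placeholder trick in Steps 5 through 9 can be sketched in Python. This is only an analogy: `to_plain_text` is a hypothetical stand-in that mimics the rich text conversion by decoding entities and flattening whitespace, and the standard `rot_13` codec stands in for Text Case. The sample input is made up.

```python
import codecs
import html

PLACEHOLDER = "=" * 10  # ten equals signs, as in Step 5

def to_plain_text(fragment: str) -> str:
    """Stand-in for the rich text step: decode entities, flatten all whitespace."""
    return " ".join(html.unescape(fragment).split())

def decode_tweet(fragment: str) -> str:
    protected = fragment.replace("&#10;", PLACEHOLDER)  # Step 5: hide the newlines
    plain = to_plain_text(protected)                    # Step 6: decode entities
    decoded = codecs.encode(plain, "rot_13")            # Step 7: ROT13
    return decoded.replace(PLACEHOLDER, "\n")           # Steps 8-9: restore newlines

sample = "srryf yvxr vg&#39;f&#10;&quot;bx&quot;"
print(repr(to_plain_text(sample)))  # newline flattened to a space
print(repr(decode_tweet(sample)))   # newline preserved
```

The placeholder survives the ROT13 step because ROT13 changes only letters, so a run of equals signs passes through untouched.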

I decided to use Text Case for the ROT13 encoding instead of decode.org for a few reasons:

1. I already own Text Case, so using it doesn’t cost me anything.
2. It’s faster to do the conversion on-device than calling out to a website.
3. In some of my tests, Matt’s shortcut returned an explanation of ROT13 instead of the decoded text. I suspect decode.org doesn’t have the most robust API in the world.

If you want to use this shortcut without copying all the regex crap, you can download it from iCloud. This is what it looks like after running:

Given that I see very few ROT13 tweets, I can’t imagine using this shortcut too often. On strictly economic terms, it wasn’t worth the effort. But it was fun to go through the logic of Matt’s and Jason’s shortcuts, dissect the structure of the HTML returned by a Twitter URL, and then figure out how to handle the edge cases (of which I’m sure there are more).

During a checkup a year or so ago, my doctor asked if I do puzzles and things like that. It was a more clear indication of my advancing years than a prostate exam or colonoscopy. I’m not sure how definitive the research is on the connection between keeping your brain active and fending off Alzheimer’s, but I guess he thought of it as “it couldn’t hurt” advice. He’d probably approve of this shortcut.

1. After typing this, I thought about using JavaScript with a Data URI. This would get rid of the dependence on both 3rd party apps and online resources, but I couldn’t get it to work within Shortcuts.

2. Furthermore, ROT13 is unique among the Caesar ciphers in that you use the same letter shift for both encoding and decoding. So you could argue that there’s no real difference between its encoded and decoded text.
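The claim in the second footnote is easy to check with a generic Caesar shift. In this Python sketch (the `caesar` function is a hypothetical helper, handling ASCII letters only), applying a shift of 13 twice returns the original text, while a shift of 3 needs a complementary shift of 23 to undo it.

```python
def caesar(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # digits, spaces, and punctuation pass through
    return "".join(out)

msg = "Return of the Jedi"
print(caesar(caesar(msg, 13), 13) == msg)  # True: 13 + 13 = 26, a full turn
print(caesar(caesar(msg, 3), 3) == msg)    # False: 3 + 3 = 6
print(caesar(caesar(msg, 3), 23) == msg)   # True: 3 + 23 = 26
```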