Searching through my Mastodon graveyard

I’ve been sketching out a post on the ChatGPT/Bing/Sydney fuss, but I’m not sure I’ll finish it. It’s kind of meandering and not deliberately so. One thing I’m sure of is that I want to start by quoting this Mastodon post of mine:

At some point, it will be revealed that ChatGPT is a Pentium in a closet running Emacs with a mashup of Eliza and Dissociated Press.

Because this was written before I switched instances, and because Mastodon is deliberately bad at searching for text that isn’t in a hashtag, finding this post isn’t as easy as it would be if it were on Twitter. Here are the few ways of searching I know about.

First, you can try the site: feature of your search engine. A query of

drdrang eliza site:mastodon.cloud

returned the post I wanted as the first hit on DuckDuckGo.

Top result from DuckDuckGo search

A Google search didn’t work as well.

Top result of Google search

It certainly looks like it found the post, but the URL is wrong. Instead of

https://mastodon.cloud/@drdrang/109621019690225449

it’s

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=
web&cd=&ved=2ahUKEwiOueOw4qL9AhWzk4kEHaMHAMEQFnoECAkQAQ
&url=https%3A%2F%2Fmastodon.cloud%2F%40drdrang%3Fmax_id
%3D109633044504924855&usg=AOvVaw2Pz4o7DP5l6Po9YeIoRVCF

or, after the Google cruft is stripped out and the percent-encoding is undone,

https://mastodon.cloud/@drdrang?max_id=109633044504924855

which takes me to my old profile page, not the page with the post. And this is the only result Google returned.
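The stripping and decoding can be done by hand, but here is a minimal sketch of the same cleanup using Python's standard urllib.parse module. The function name unwrap_google_redirect is made up for illustration, and the sketch assumes the target always rides in the url query parameter, as it does in the link above.

```python
from urllib.parse import urlsplit, parse_qs

def unwrap_google_redirect(redirect_url):
    """Return the target carried in the `url` query parameter, or None."""
    query = urlsplit(redirect_url).query
    # parse_qs undoes the percent-encoding while splitting the query string
    targets = parse_qs(query).get('url')
    return targets[0] if targets else None

# The Google result link from above, rejoined into one string
google_link = (
    'https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source='
    'web&cd=&ved=2ahUKEwiOueOw4qL9AhWzk4kEHaMHAMEQFnoECAkQAQ'
    '&url=https%3A%2F%2Fmastodon.cloud%2F%40drdrang%3Fmax_id'
    '%3D109633044504924855&usg=AOvVaw2Pz4o7DP5l6Po9YeIoRVCF'
)
print(unwrap_google_redirect(google_link))
```

Run on that link, it prints the max_id profile URL shown above, which confirms that Google is pointing at the profile page rather than the post.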

What about Bing? The post of interest was the second item returned,

Top results using Bing

but at least it’s linked to the post I was searching for.

A second option would be to search through the archive you downloaded from your old Mastodon instance before you switched. You did that, right? Certainly you did if you’re the kind of person who wants to link to yourself.

In your archive will be a file named outbox.json, which has all of your posts. Unfortunately, it’s one long string—no linebreaks, no formatting of any kind. Mine starts off like this:

{"@context":"https://www.w3.org/ns/activitystreams","id
":"outbox.json","type":"OrderedCollection","totalItems"
:304,"orderedItems":[{"id":"https://mastodon.cloud/
users/drdrang/statuses/100580643948101506/activity","
type":"Create","actor":"https://mastodon.cloud/users/
drdrang","published":"2018-08-20T04:20:29Z","to":["

You can open this file in any text editor and search, but it’s going to be very hard to read. I used jq to save mine in a more readable format,

jq . < outbox.json > outbox-formatted.json

which starts out like this:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "outbox.json",
  "type": "OrderedCollection",
  "totalItems": 304,
  "orderedItems": [
    {
      "id": "https://mastodon.cloud/users/drdrang/statuses/100580643948101506/activity",
      "type": "Create",
      "actor": "https://mastodon.cloud/users/drdrang",
      "published": "2018-08-20T04:20:29Z",

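If jq isn't installed, roughly the same reformatting can be sketched with Python's json module. This is an alternative, not what was used above; the function name pretty_print_json is hypothetical, and the two-space indent is chosen to match the jq output.

```python
import json

def pretty_print_json(src_path, dst_path):
    """Rewrite a one-line JSON file with two-space indentation."""
    with open(src_path, encoding='utf-8') as src:
        data = json.load(src)
    with open(dst_path, 'w', encoding='utf-8') as dst:
        # ensure_ascii=False keeps non-ASCII characters readable
        json.dump(data, dst, indent=2, ensure_ascii=False)
        dst.write('\n')
```

Calling pretty_print_json('outbox.json', 'outbox-formatted.json') from the archive folder should produce a file laid out like the one above.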
The posts themselves are in the orderedItems list, where each post looks like this:

{
  "id": "https://mastodon.cloud/users/drdrang/statuses/109621019690225449/activity",
  "type": "Create",
  "actor": "https://mastodon.cloud/users/drdrang",
  "published": "2023-01-02T18:26:56Z",
  "to": [
    "https://www.w3.org/ns/activitystreams#Public"
  ],
  "cc": [
    "https://mastodon.cloud/users/drdrang/followers"
  ],
  "object": {
    "id": "https://mastodon.cloud/users/drdrang/statuses/109621019690225449",
    "type": "Note",
    "summary": null,
    "inReplyTo": null,
    "published": "2023-01-02T18:26:56Z",
    "url": "https://mastodon.cloud/@drdrang/109621019690225449",
    "attributedTo": "https://mastodon.cloud/users/drdrang",
    "to": [
      "https://www.w3.org/ns/activitystreams#Public"
    ],
    "cc": [
      "https://mastodon.cloud/users/drdrang/followers"
    ],
    "sensitive": false,
    "atomUri": "https://mastodon.cloud/users/drdrang/statuses/109621019690225449",
    "inReplyToAtomUri": null,
    "conversation": "tag:mastodon.cloud,2023-01-02:objectId=196633804:objectType=Conversation",
    "content": "<p>At some point, it will be revealed that ChatGPT is a Pentium in a closet running Emacs with a mashup of Eliza and Dissociated Press.</p>",
    "contentMap": {
      "en": "<p>At some point, it will be revealed that ChatGPT is a Pentium in a closet running Emacs with a mashup of Eliza and Dissociated Press.</p>"
    },
    "attachment": [],
    "tag": [],
    "replies": {
      "id": "https://mastodon.cloud/users/drdrang/statuses/109621019690225449/replies",
      "type": "Collection",
      "first": {
        "type": "CollectionPage",
        "next": "https://mastodon.cloud/users/drdrang/statuses/109621019690225449/replies?only_other_accounts=true&page=true",
        "partOf": "https://mastodon.cloud/users/drdrang/statuses/109621019690225449/replies",
        "items": []
      }
    }
  },
  "signature": {
    "type": "RsaSignature2017",
    "creator": "https://mastodon.cloud/users/drdrang#main-key",
    "created": "2023-01-25T15:26:07Z",
    "signatureValue": "blahblahblah"
  }
}

(I’ve changed the signatureValue because it has no bearing on the discussion here and might be a security problem.)

As you can see, the text we’ll be searching for is in the content and contentMap items of the object. Once you’ve found the text you want, the URL to the post is the url value in that same object.

I think we can all agree that this is a clumsy way to search for the post, mainly because you have to remember where you’ve saved the archive and open the outbox-formatted.json file before you can do any searching. But by looking at the structure of the JSON, I was able to write a short script that did the opening and searching for me.

So the third option is to run that script:

mastodon-cloud-find eliza

This will find all the posts with “eliza” (regardless of case) and open them as tabs in Safari (or whatever my default browser happens to be). Here’s the script:

python:
 1:  #!/usr/bin/env python3
 2:  
 3:  import json
 4:  import os
 5:  import sys
 6:  import subprocess
 7:  
 8:  # Combine the arguments to set the search term
 9:  term = ' '.join(sys.argv[1:])
10:  
11:  # Get the JSON data from my old Mastodon account
12:  mfile = open('/full/path/to/outbox.json')
13:  mastodon = json.load(mfile)
14:  
15:  # Open every post with the search term.
16:  for p in mastodon['orderedItems']:
17:    if p['type'] == 'Create':
18:      if term.lower() in p['object']['content'].lower():
19:        subprocess.run(['open', p['object']['url']])
20:        # print(p['object']['url'])
21:        # print(p['object']['published'])
22:        # print(p['object']['content'])
23:        # print()
24:  
25:  # Button up
26:  mfile.close()

Line 9 combines all the arguments into a single search string. This makes it easier to search for multi-word strings. For example, I can run

mastodon-cloud-find steamboat willie

instead of having to remember to put the phrase in quotes,

mastodon-cloud-find 'steamboat willie'
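To see why the two invocations are equivalent, here is a small sketch that applies the same join as Line 9 to the argument lists the shell would hand over in each case. The helper name search_term is made up for the example.

```python
def search_term(argv):
    """Join every argument after the script name into one search string."""
    return ' '.join(argv[1:])

# Unquoted: the shell passes two separate arguments
unquoted = search_term(['mastodon-cloud-find', 'steamboat', 'willie'])
# Quoted: the shell passes one argument containing a space
quoted = search_term(['mastodon-cloud-find', 'steamboat willie'])

print(unquoted == quoted)
```

One caveat: the shell collapses runs of spaces between unquoted words, so a phrase that depends on exact spacing would still need quotes.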

Lines 12–13 open up the outbox.json file and parse the JSON contents into the appropriate Python structure—in this case a dictionary. By putting the full path to the file into Line 12, I don’t have to remember where I saved it anymore.

Lines 16–19 then loop through the posts in orderedItems and look for the search term in the content item. Note that Line 17 checks the type of each post before doing the search. That’s because boosts, which have a type of “Announce” instead of “Create,” don’t have a content item to search. Line 19 then runs the very useful open command, which, when given a URL as an argument, opens that URL in a new tab in the default browser.
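The type check is easier to see with the browser step taken out. The sketch below pulls the loop into a function that returns the matching URLs instead of opening them. The function name matching_urls and the sample data are illustrative, and the shape of the boost entry (an object that is just a URL string) is an assumption about the archive format, not something shown in the listing above.

```python
def matching_urls(items, term):
    """Return URLs of original posts whose content contains term, ignoring case."""
    needle = term.lower()
    urls = []
    for item in items:
        # Boosts ("Announce") carry no content to search, so skip them
        if item.get('type') != 'Create':
            continue
        note = item['object']
        if needle in (note.get('content') or '').lower():
            urls.append(note['url'])
    return urls

sample_items = [
    {
        'type': 'Create',
        'object': {
            'content': '<p>At some point, it will be revealed that ChatGPT '
                       'is a Pentium in a closet running Emacs with a mashup '
                       'of Eliza and Dissociated Press.</p>',
            'url': 'https://mastodon.cloud/@drdrang/109621019690225449',
        },
    },
    # Assumed shape of a boost: the object is a bare URL string
    {'type': 'Announce', 'object': 'https://example.social/users/someone/statuses/1'},
]

print(matching_urls(sample_items, 'eliza'))
```

Without the type check, indexing into the boost's object with 'content' would raise an exception, which is the problem Line 17 avoids.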

Lines 20–23 are commented out, but I left them there in case I need to debug the script. They print out certain information about the found post.

I have no idea how long my old posts will remain available at mastodon.cloud. At some point, I assume, they will flush them out, and the links to them—however I might find them—will be dead. But I’ll still have my archive, and even if the links are dead, I can still get the text of my old posts by uncommenting Lines 20–23.
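One wrinkle with reading posts straight from the archive is that the content item is HTML, so the printed text comes wrapped in tags. Here is a hedged sketch of a tag stripper built on the standard html.parser module. The names plain_text and _TextExtractor are hypothetical, and the treatment of paragraph and br tags as line breaks is a guess at what reads well, not anything the archive format requires.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect the text of an HTML fragment, turning <br> and </p> into newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        if tag == 'br':
            self.parts.append('\n')

    def handle_endtag(self, tag):
        if tag == 'p':
            self.parts.append('\n')

def plain_text(content_html):
    """Return the readable text of a post's content value."""
    parser = _TextExtractor()
    parser.feed(content_html)
    parser.close()
    return ''.join(parser.parts).strip()

print(plain_text('<p>At some point, it will be revealed that ChatGPT is a '
                 'Pentium in a closet running Emacs with a mashup of Eliza '
                 'and Dissociated Press.</p>'))
```

Wrapping the content in plain_text inside the commented-out print on Line 22 would give the post's text without the markup.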

Was it necessary to write a script to do the searching? Certainly not, especially since I’m unlikely to do this kind of searching very often. But it’s good to practice your scripting skills, and this was a simple script that allowed me to remind myself of how the json and subprocess modules work. It was worth it for that, even if I never use the script again.