Feed reading

Gabe Weatherhead has a nice article today on how he uses Feedbin to handle his reading on the web. If you follow Gabe—and if not, why don’t you?—it will not surprise you to learn that he’s formidably organized. I don’t see myself following in his footsteps, but it’s always useful to learn how smart people do things. His article also reminded me that I’ve been meaning to write about my feed reading setup.

A couple of years ago, I let my subscription to Feed Wrangler lapse and started using a homemade, web-based RSS reading system. The heart of the system is still the script described in this post, but with some changes as I thought of better ways of doing things.

The biggest change came in the past few months. Initially, I created a single static web page with all of today’s articles from the feeds I subscribe to. A cron task updated the page a few times an hour throughout the day. In this system, “today” was defined as “from 10:00 pm last night until now,” and the page would grow in size from morning to night.
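
A minimal sketch of that time window, not the actual script: the function names are hypothetical, and it assumes publication times have already been converted to naive local datetimes.

```python
from datetime import datetime, time, timedelta


def window_start(now):
    """Return 10:00 pm on the calendar day before `now`."""
    return datetime.combine(now.date() - timedelta(days=1), time(hour=22))


def is_today(published, now):
    """True if `published` falls between 10:00 pm last night and `now`."""
    return window_start(now) <= published <= now
```

With this definition the window resets at midnight, so the page is at its shortest first thing in the morning and grows until the end of the day.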

The advantage of this temporal arrangement from a programming point of view was that I didn’t have to write any code to keep track of whether I’d already read an article or not, and there were no external dependencies. If it was published “today,” it got on the page.

The disadvantage was from the reading point of view. As I visited the page throughout the day, it became more and more filled with articles I’d already read. This wasn’t as terrible as you might think. The articles were arranged in reverse chronological order by publication time, so the ones I’d read were typically at the bottom of the page. I say “typically” because some feeds (XKCD comes to mind) are very bad at providing accurate publication times, and their articles would sometimes end up at the bottom despite being recently published.

Eliminating the reading disadvantage meant keeping track of what I’d read and showing only what I hadn’t—eliminating the programming advantage. I decided to keep track of read articles in an SQLite database and to add items to that database through a button placed at the bottom of each article on my RSS page.

RSS buttons

Making that change meant:

  1. Building a database that would uniquely identify every article. This was pretty simple. Each record has just two fields: the name of the website and the unique article ID (which is often just the article’s URL but is sometimes a long alphanumeric string generated by the site’s blogging software).
  2. Altering the existing script that builds the RSS page to filter out feed items that are in the database. Because Python has an SQLite module as part of its standard library and the syntax of SQL commands is straightforward, this wasn’t as tricky as I thought it would be. In fact, the new code is easier to read than the time-based filtering code I removed.
  3. Writing a server script (basically just a CGI script) to add an article to the database when given the blog name and article ID via the POST method. It’s been a while since I last wrote a CGI script, but it was like riding a bicycle.
  4. Adding some JavaScript with XMLHttpRequest to the RSS page to call the server script when a button is pressed. This took the most time, mainly because everyone in the world (except me) knows how to do AJAX now, and finding references written at an appropriately low level was harder than I expected. I found this Stack Overflow discussion helpful.
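
Steps 1 and 2 above can be sketched roughly as follows, using the standard library's `sqlite3` module. The table name, column names, and the shape of the feed items (dicts with `site` and `id` keys) are assumptions made for illustration; the original script may organize things differently.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the two-field table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS read_items ("
        " site TEXT NOT NULL,"
        " article_id TEXT NOT NULL,"
        " PRIMARY KEY (site, article_id))"
    )
    return conn


def mark_read(conn, site, article_id):
    """Record an article as read; marking it twice is harmless."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO read_items (site, article_id) VALUES (?, ?)",
            (site, article_id),
        )


def unread(conn, items):
    """Return only the feed items that are not yet in the database."""
    query = "SELECT 1 FROM read_items WHERE site = ? AND article_id = ?"
    return [
        item for item in items
        if conn.execute(query, (item["site"], item["id"])).fetchone() is None
    ]
```

Making the pair of fields the primary key is one way to get the "uniquely identify every article" property, since two sites could in principle reuse the same ID string.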

So now I usually tap the Mark as read button when I get to the end of an article. If it’s a long article that I want to read later, I don’t mark it as read, and it’ll be there the next time I bring up the RSS page.
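
The server side of that button press might look something like the sketch below. It is an illustration under assumptions, not the real script: the form field names (`site`, `id`), the table layout, and the database filename are all hypothetical, and the request body is parsed by hand with `urllib.parse` rather than with any particular CGI helper.

```python
import os
import sqlite3
import sys
from urllib.parse import parse_qs


def handle_post(body, conn):
    """Store the posted article in the database and return a CGI response."""
    fields = parse_qs(body)
    site = fields.get("site", [""])[0]
    article_id = fields.get("id", [""])[0]
    if not site or not article_id:
        return "Status: 400 Bad Request\r\nContent-Type: text/plain\r\n\r\nmissing field\n"
    with conn:  # commit both statements together
        conn.execute(
            "CREATE TABLE IF NOT EXISTS read_items ("
            " site TEXT NOT NULL, article_id TEXT NOT NULL,"
            " PRIMARY KEY (site, article_id))"
        )
        conn.execute(
            "INSERT OR IGNORE INTO read_items (site, article_id) VALUES (?, ?)",
            (site, article_id),
        )
    return "Status: 200 OK\r\nContent-Type: text/plain\r\n\r\nok\n"


def main():
    """CGI entry point: the web server passes the POST body on stdin."""
    length = int(os.environ.get("CONTENT_LENGTH", "0"))
    body = sys.stdin.read(length)
    conn = sqlite3.connect("read_items.db")  # hypothetical filename
    sys.stdout.write(handle_post(body, conn))
```

Keeping the logic in `handle_post` and the stdin/stdout plumbing in `main` makes the handler easy to try out without a web server.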

Fearing I’d forget how to use XMLHttpRequest, I quickly included another form at the end of each article for adding that article to my Pinboard account. I didn’t bother adding labels to the text field, because I’m the only one who uses this and I know the field is for tags. I did, however, include some DOM stuff to make it obvious when I’d marked an article as read or added it to Pinboard.
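
The post doesn't say how the request reaches Pinboard, so the following is only a guess at one approach: build a URL for the `posts/add` method of Pinboard's version 1 API from the article's link, its title, and the tags typed into the text field. The function name and the choice to pass tags as a list are assumptions.

```python
from urllib.parse import urlencode

PINBOARD_ADD = "https://api.pinboard.in/v1/posts/add"


def pinboard_add_url(token, url, title, tags):
    """Build the request URL for bookmarking one article on Pinboard."""
    params = {
        "auth_token": token,       # in the form "username:TOKEN"
        "url": url,
        "description": title,      # Pinboard calls the title "description"
        "tags": " ".join(tags),    # space-separated list of tags
        "format": "json",
    }
    return PINBOARD_ADD + "?" + urlencode(params)
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) would submit the bookmark; that step is left out here so the sketch makes no network calls.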

RSS buttons marked

What I like about this system is how portable and (I hope) future-proof it is. I’ve been reluctant to sign on with Feedbin or Feedly or BazQux or any of the other Google Reader replacements because I worry they’ll write a Medium post and disappear with my subscription list and whatever organization scheme I’ve created. My system can run on any web server with Python, SQLite, and a cgi-bin directory. I think that’ll mean “any server, anywhere” for a very long time.