Blogging from stdin

August 15, 2012 at 11:15 PM by Dr. Drang

Lately I’ve been figuring out ways to wean myself from TextMate. While it’s possible that the open-sourcing of TM2 will turn out to be a great success, I wouldn’t lay money on it. And even if it does, it’s unlikely to happen quickly (remember when Netscape open-sourced Navigator?) and there’s no guarantee I’ll like the result.

So I need some tools to help me work in other editors. One of my favorite TextMate tools is Brad Choate’s Blogging Bundle, and I wanted something similar that I could call from any editor or from the command line. There’s probably a way to dig into the bundle and extract a few standalone scripts, but that seemed as time-consuming as just writing the few commands I need from scratch—especially since the Blogging Bundle is written in Ruby, a language I’m not comfortable with.

The two essential parts of a blogging system are:

1. A way of previewing posts locally before publishing. Luckily, the blog previewing command I’d written for TextMate wasn’t much more than a simple call to a standalone PHP script. Calling that script from another editor is easy.
2. A way of publishing posts after they’ve been written. This is where the Blogging Bundle shines, and I had nothing to replace it.

I decided to write a Python script to handle the publishing. There are a few Python WordPress libraries floating around, but none of them seem to be *the* library, so I decided to write my script without such a library, using xmlrpclib instead. Apart from some idiosyncrasies in both xmlrpclib and the WordPress MetaWeblog implementation, it went pretty smoothly.

My goal was to start with text that looks like this:

Title: Blogging from stdin
Keywords: programming, python, blogging, wordpress, xmlrpc

Lately I've been figuring out ways to wean myself from
TextMate. While it's possible that the open-sourcing of
TM2 will turn out to be a great success, I wouldn't lay
money on it. And even if it does, it's unlikely to
happen quickly (remember when Netscape open-sourced
Navigator?) and there's no guarantee…

I’d then run it through a script that

1. Parses the header lines (which could also include a line for the date and time at which the post is to be published).
2. Publishes the post.
3. Returns text that looks like the above, but with more header lines. This text could, if necessary, be edited and the post republished.

The return value should look like this:

Title: Blogging from stdin
Keywords: blogging, programming, python, wordpress, xmlrpc
Date: 2012-08-15 23:15:27
Post: 1913
Slug: blogging-from-stdin
Link: http://leancrew.com/all-this/2012/08/blogging-from-stdin/
Status: publish
Comments: 1

Lately I've been figuring out ways to wean myself from
TextMate. While it's possible that the open-sourcing of
TM2 will turn out to be a great success, I wouldn't lay
money on it. And even if it does, it's unlikely to
happen quickly (remember when Netscape open-sourced
Navigator?) and there's no guarantee…

This is basically the way the Blogging Bundle’s Post to Blog command works, but with fewer header lines. Because I’m not writing a general-purpose tool, I don’t need a line for the blog’s name; and because I don’t use categories here on ANIAT, I don’t need a line for that, either.
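
The header-and-body format is simple enough that the round trip can be sketched in a few lines. What follows is a Python 3 illustration under my own assumptions, not the script itself: `parse_post`, `format_post`, and `FIELD_ORDER` are hypothetical names, and the field order is simply copied from the sample output above.

```python
# Hypothetical helpers illustrating the header/body round trip (Python 3).
# The field order mirrors the sample output shown earlier in the post.
FIELD_ORDER = ['Title', 'Keywords', 'Date', 'Post',
               'Slug', 'Link', 'Status', 'Comments']

def parse_post(source):
    """Split text into a header dict and a body string at the first blank line."""
    header_text, body = source.split('\n\n', 1)
    header = dict(line.split(': ', 1) for line in header_text.split('\n'))
    return header, body

def format_post(header, body):
    """Rebuild the text, emitting the known header fields in a fixed order."""
    lines = ['%s: %s' % (f, header[f]) for f in FIELD_ORDER if f in header]
    return '\n'.join(lines) + '\n\n' + body

sample = ("Title: Blogging from stdin\n"
          "Keywords: programming, python\n"
          "\n"
          "Lately I've been figuring out ways...\n"
          "\n"
          "A second paragraph.\n")
header, body = parse_post(sample)
```

Splitting only on the first blank line keeps later paragraph breaks inside the body, and splitting each header line only on the first colon-space keeps values like URLs intact.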

Here’s the script, named publish-post:

python:
 1:  #!/usr/bin/python
 2:  
 3:  '''
 4:  Take text from standard input in the format
 5:  
 6:    Title: Blog post title
 7:    Keywords: key1, key2, etc
 8:  
 9:    Body of post after the first blank line.
10:  
11:  and publish it to my WordPress blog. Return in standard output
12:  the same post after publishing. It will then have more header
13:  fields (see hFields for the list) and can be edited and re-
14:  published again and again.
15:  
16:  The goal is to work the same way TextMate's Blogging Bundle does
17:  but with fewer headers.
18:  '''  
19:  
20:  import xmlrpclib
21:  import sys
22:  import os
23:  from datetime import datetime, timedelta
24:  import pytz
25:  
26:  # Blog parameters (url, user, pw) are stored in ~/.blogrc.
27:  # One parameter per line, with name and value separated by colon-space.
28:  p = {}
29:  with open(os.environ['HOME'] + '/.blogrc') as bloginfo:
30:    for line in bloginfo:
31:      k, v = line.split(': ')
32:      p[k] = v.strip()
33:  
34:  # The header fields and their metaWeblog synonyms.
35:  hFields = [ 'Title', 'Keywords', 'Date', 'Post',
36:              'Slug', 'Link', 'Status', 'Comments' ]
37:  wpFields = [ 'title', 'mt_keywords', 'date_created_gmt',  'postid', 
38:               'wp_slug', 'link', 'post_status', 'mt_allow_comments' ]
39:  h2wp = dict(zip(hFields, wpFields))         
40:  
41:  def makeContent(header):
42:    "Make the content dict from the header dict."
43:    content = {}
44:    for k, v in header.items():
45:      content.update({h2wp[k]: v})
46:    content.update(description=body)
47:    return content
48:  
49:  # Read and parse the source.
50:  source = sys.stdin.read()
51:  header, body = source.split('\n\n', 1)
52:  header = dict( [ x.split(': ', 1) for x in header.split('\n') ])
53:  
54:  # For uploading, the date must be in UTC and a DateTime instance.
55:  utc = pytz.utc
56:  myTZ = pytz.timezone('US/Central')
57:  if 'Date' in header:
58:    # Get the date from the string in the header.
59:    dt = datetime.strptime(header['Date'], "%Y-%m-%d %H:%M:%S")
60:    dt = myTZ.localize(dt)
61:    header['Date'] = xmlrpclib.DateTime(dt.astimezone(utc))
62:  else:
63:    # Use the current date and time.
64:    dt = myTZ.localize(datetime.now())
65:    header.update({'Date': xmlrpclib.DateTime(dt.astimezone(utc))})
66:  
67:  # Connect and upload the post.
68:  blog = xmlrpclib.Server(p['url'])
69:  
70:  if 'Post' in header:
71:    # Editing an old post.
72:    postID = int(header['Post'])
73:    del header['Post']
74:    content = makeContent(header)
75:    blog.metaWeblog.editPost(postID, p['user'], p['pw'], content, True)
76:  else:
77:    # Publishing a new post.
78:    content = makeContent(header)
79:    postID = blog.metaWeblog.newPost(0, p['user'], p['pw'], content, True)
80:  
81:  # Return the post as text in header/body format for possible editing.
82:  post = blog.metaWeblog.getPost(postID, p['user'], p['pw'])
83:  header = ''
84:  for f in hFields:
85:    if f == 'Date':
86:      # Change the date from UTC to local and from DateTime to string.
87:      dt = datetime.strptime(post[h2wp[f]].value, "%Y%m%dT%H:%M:%S")
88:      dt = utc.localize(dt).astimezone(myTZ)
89:      header += "%s: %s\n" % (f, dt.strftime("%Y-%m-%d %H:%M:%S"))
90:    else:
91:      header += "%s: %s\n" % (f, post[h2wp[f]])
92:  print header.encode('utf8')
93:  print
94:  print post['description'].encode('utf8')
95:

I think it’s commented well enough, but there are a few points worth expanding on:

- I keep my blog’s XMLRPC server URL and my username and password in a .blogrc file in my home directory. The file is formatted like this:
```
url: http://blahblahblah
user: myusername
pw: mypassword
```
- The script gets its input from stdin and returns its output to stdout rather than using files. This seemed like the most flexible arrangement, as I can always use piping and redirection if I need to hook the script up to particular files. An unlikely advantage of doing it this way: as I was debugging, I ran the script directly from the Terminal, piping pbpaste into it and pbcopy out, with no need for test files.
- The documentation for WordPress’s MetaWeblog API has some errors, which I learned by exploring the return values from the metaWeblog.getPost command. The documentation says that the value of the mt_allow_comments field will be either open or closed; the value is actually either 1 or 0. It says the value of the mt_keywords field will be an array; it’s actually a string with the keywords separated by commas.
- The ~~dateCreated~~ date_created_gmt field has to be expressed as a DateTime object. Confusingly, this is not an instance of the standard Python datetime class. It’s a special class defined in xmlrpclib. Some of the messing around you see in the code consists of handling this distinction.
- When publishing, the ~~dateCreated~~ date_created_gmt field has to be given in UTC. Because I prefer to work in US/Central, that field ~~is given in~~ must be converted back to my local time zone when a post is retrieved. There’s more messing around in the code to convert back and forth between time zones. I use the nonstandard pytz library to do the conversions.
- Lines 35 through 39, where I define the header fields and relate them to the field names in the WordPress MetaWeblog API, may look weird to you. Why am I defining two lists and then zipping them into a dictionary? Why not just make the h2wp dictionary directly? It’s because you can’t define the order of a dictionary’s keys, and I want the header fields ordered in a particular way in the returned text. The hFields list seemed like the simplest way to do that.
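
The date handling is the fiddliest of these points, so here is a stripped-down sketch of the two conversions. It is written for Python 3, where xmlrpclib is named xmlrpc.client, and it makes one loud simplification: a fixed UTC-5 offset stands in for pytz’s US/Central, so it is only right during daylight saving time. The names `to_wire` and `from_wire` are hypothetical.

```python
# A Python 3 sketch; xmlrpclib became xmlrpc.client in Python 3.
from datetime import datetime, timedelta, timezone
from xmlrpc.client import DateTime

# Assumption: a fixed UTC-5 offset (Central Daylight Time) stands in for
# pytz.timezone('US/Central'), which also handles the switch to standard time.
CDT = timezone(timedelta(hours=-5))

def to_wire(local_string):
    """Header date string in local time -> xmlrpc DateTime in UTC."""
    dt = datetime.strptime(local_string, "%Y-%m-%d %H:%M:%S")
    dt = dt.replace(tzinfo=CDT)
    return DateTime(dt.astimezone(timezone.utc))

def from_wire(wire):
    """xmlrpc DateTime in UTC -> header date string in local time."""
    # DateTime keeps its timestamp as a string in the .value attribute.
    dt = datetime.strptime(wire.value, "%Y%m%dT%H:%M:%S")
    dt = dt.replace(tzinfo=timezone.utc).astimezone(CDT)
    return dt.strftime("%Y-%m-%d %H:%M:%S")

wire = to_wire("2012-08-15 23:15:27")
print(wire.value)        # 20120816T04:15:27
print(from_wire(wire))   # 2012-08-15 23:15:27
```

Note that `wire` is not a datetime at all, which is why the return trip has to go through `strptime` on its `.value` string.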

So far, the script is working well, but I’m under no illusions—its error handling is practically nonexistent, and I’m sure I’ll run into problems eventually. I’ll solve them as they come along.

Update 8/16/12
As expected, there were bugs, and they didn’t take long to appear.

First, I had forgotten to encode the output to handle non-ASCII characters. That was a pretty easy fix.
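
For anyone wondering what that kind of failure looks like, here is a small Python 3 sketch. It assumes the trouble comes from characters like the ellipsis in the sample post being pushed through an ASCII codec, which is my reading of the bug rather than something confirmed above.

```python
# Python 3 sketch: text containing a non-ASCII character (the ellipsis, U+2026).
text = "there's no guarantee\u2026"

# Encoding as ASCII fails because the ellipsis has no ASCII representation.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding as UTF-8 works and turns the ellipsis into three bytes.
encoded = text.encode('utf8')
```

The `.encode('utf8')` calls at the end of the script do the same job for the header and the body before they are printed.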

More troublesome was my confusion over the dateCreated field. When I’d upload a post with dateCreated set to UTC, the publication time appeared correct (in US/Central) in the WordPress web interface, but the post wouldn’t get published at the indicated time. Very frustrating. After examining the metaWeblog.getPost output, I saw that the date_created_gmt field was 10 hours ahead of dateCreated, not 5 hours as it should be. Somehow, the time zone correction was being doubled.

I decided to dispense with dateCreated entirely and just use date_created_gmt.¹ I convert from local time to UTC before publishing and convert the other way after retrieving. I’m sure there’s a way to use dateCreated correctly, but I don’t have the patience to look into it.

Thanks to reader Adam Tinsley for pointing out the publication time bug.

One last tool: I made this TextExpander snippet for inserting the header:

(Yes, I see the typo in the keywords. I fixed it before publishing.)

The snippet uses the new optional fill-in feature for the date. If I include the Date line, the post gets the date/time in it. If I don’t, the post gets the date/time when the command is run.

¹ Do you get the sense that the WordPress MetaWeblog API was written by different people at different times with very different ideas about naming standards? CamelCase for one date field, underscores for another. Every time I write them out, I have to check which is which.

And now it’s all this

I just said what I said and it was wrong
Or was taken wrong
