Updating scripts

Yesterday’s post ended with this little script for searching a Mastodon archive:

python:
 1:  #!/usr/bin/env python3
 2:  
 3:  import json
 4:  import os
 5:  import sys
 6:  import subprocess
 7:  
 8:  # Combine the arguments to set the search term
 9:  term = ' '.join(sys.argv[1:])
10:  
11:  # Get the JSON data from my old Mastodon account
12:  mfile = open('/full/path/to/outbox.json')
13:  mastodon = json.load(mfile)
14:  
15:  # Open every post with the search term.
16:  for p in mastodon['orderedItems']:
17:    if p['type'] == 'Create':
18:      if term.lower() in p['object']['content'].lower():
19:        subprocess.run(['open', p['object']['url']])
20:        # print(p['object']['url'])
21:        # print(p['object']['published'])
22:        # print(p['object']['content'])
23:        # print()
24:  
25:  # Button up
26:  mfile.close()

The script works fine, but it would have been better to write it like this (thanks to saska for pointing this out):

python:
 1:  #!/usr/bin/env python3
 2:  
 3:  import json
 4:  import os
 5:  import sys
 6:  import subprocess
 7:  
 8:  # Combine the arguments to set the search term
 9:  term = ' '.join(sys.argv[1:])
10:  
11:  # Get the JSON data from my old Mastodon account
12:  with open('/full/path/to/outbox.json') as mfile:
13:    mastodon = json.load(mfile)
14:  
15:  # Open every post with the search term.
16:  for p in mastodon['orderedItems']:
17:    if p['type'] == 'Create':
18:      if term.lower() in p['object']['content'].lower():
19:        subprocess.run(['open', p['object']['url']])
20:        # print(p['object']['url'])
21:        # print(p['object']['published'])
22:        # print(p['object']['content'])
23:        # print()

Why is this better? It’s not because it’s a few lines shorter. It’s because Python’s file handling has evolved over the years, and the with construct in Lines 12–13 of the updated script is cleaner than the open/close construct in Lines 12–13 and 26 of the original script. Here’s an explanation of why I wrote the script the way I did, why the update is better, and how your history of programming over many years can often work to the detriment of your programming now.

As I was writing out the script the first time, there was no open command at all. I thought I could simply write

mastodon = json.load('/full/path/to/outbox.json')

where the argument to json.load was a file path (which is a string) instead of a file object (which is… an object). I wrote it this way because I’ve been doing a lot of programming with the pandas library over the past several years, and a line that appears in almost every one of my pandas scripts imports the data from a CSV file into a dataframe like this:

df = pandas.csv('/full/path/to/data.csv')

The read_csv function has two really nice features:

  1. As you can see, it accepts a file path as its argument. It can also take a file object or other Python objects that understand the read method. I always call it with a file path argument because that’s the easiest way to use it.
  2. It talks to the underlying operating system to open the file for reading and then close it after the dataframe is constructed. It gives you three operations in a single function.

Unfortunately, json.load isn’t as helpful as pandas.read_csv, and my initial version of the script failed at that line with this error message:

AttributeError: 'str' object has no attribute 'read'

Oh, right, I thought, I’m doing json, not pandas. I need to open the file first. The word “open” led me to edit that single line into two:

mfile = open('/full/path/to/outbox.json')
mastodon = json.load(mfile)

This is a perfectly legal way to open a file for reading, but it hasn’t been the preferred way in many years. For some reason, my mind jumped back to the Python 1.5 days, and I wrote the script as if it were 1999.1

Why isn’t this the preferred way to open a file? Because it leaves the file open and leaving files open can—in theory, at least—cause trouble. You have to explicitly close the file, as I did on Line 26.

The preferred way to open files nowadays2 is to use open as part of a with block, as you can see in Lines 12–13 of the updated script:

with open('/full/path/to/outbox.json') as mfile:
    mastodon = json.load(mfile)

You put everything that needs to be done while the file is open inside the block; Python closes the file when it leaves the block.

As a practical matter, closing the file isn’t all that important in this particular script, as Python will also close the file when the script finishes. The dangers of leaving a file open are

  1. Having so many files open the OS chokes. That’s not going to happen in a script that opens only one file.
  2. Corrupting the file if the script doesn’t run to completion. That’s definitely a concern if the file is being written to, but this script is only reading.

Still, it’s better to be in the habit of closing files, and that’s best done with a with block (or using read_csv in pandas). As I said in yesterday’s post, I wrote this script mainly to keep in practice, and using out-of-date methods is poor practice.

There’s an old saying among programmers that you can write Fortran in any language. That is, you can always use outdated methods, but when you’re writing in a newer language—or a newer version of a language—it’s best to take advantage of the cleaner techniques it makes available. In this case, a brain fart sent me back to the Python I first learned instead of the Python I should be writing now.


  1. I was dreaming when I wrote it. Forgive me if I went too fast. 

  2. “Nowadays” being since Python 2.5