Sort of handy

This afternoon, I wanted to see how much disk space I had left on the server that runs this blog. As is often the case with Unix/Linux commands, after getting the needed information, I started thinking about other ways to do things. And, as is also often the case, I learned something new. New to me, anyway.

First, I logged into the server and ran the df command:

df -h .

The output, which summarizes the disk usage of the file system containing the given file (in this case the current directory, ., which was my home directory), was

Filesystem                 Size  Used Avail Use% Mounted on
/dev/disk/by-label/DOROOT   25G  9.5G   14G  41% /

This shows I’m using 41% of the 25 GB I’m paying for. The -h option told df to use a “human” format for the output. Instead of showing the usage in 1K blocks, it shows it in kilobytes, megabytes, and gigabytes. Lots of GNU utilities have an -h option that works this way.
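For the curious, here is a rough Python sketch of the kind of conversion -h does. GNU's -h scales by powers of 1024 (the --si option uses powers of 1000), so a “kilobyte” here is really 1,024 bytes. The `human` function is my own approximation, not the coreutils code: it rounds up, which I believe is what GNU does, and keeps one decimal place for values under 10. The `shutil.disk_usage` call returns the total, used, and free bytes of the file system holding a path, which is roughly what df reports.

```python
import math
import shutil

UNITS = "KMGTPE"  # successive powers of 1024


def human(nbytes: int) -> str:
    """Approximate GNU's -h output: powers of 1024, rounded up,
    one decimal place for values under 10."""
    if nbytes < 1024:
        return str(nbytes)
    value = nbytes / 1024
    i = 0
    # Keep dividing until the value fits under 1024 or we run out of units.
    while value >= 1024 and i < len(UNITS) - 1:
        value /= 1024
        i += 1
    if value < 10:
        rounded = math.ceil(value * 10) / 10
        if rounded < 10:
            return f"{rounded:.1f}{UNITS[i]}"
        value = rounded  # e.g. 9.99 rounds up to 10, so show a whole number
    whole = math.ceil(value)
    if whole >= 1024 and i < len(UNITS) - 1:
        return f"1.0{UNITS[i + 1]}"  # rounding up pushed us into the next unit
    return f"{whole}{UNITS[i]}"


if __name__ == "__main__":
    # Something like `df -h .`: the file system holding the current directory.
    usage = shutil.disk_usage(".")
    print("Size", human(usage.total), "Used", human(usage.used),
          "Avail", human(usage.free))
```

The numbers won't necessarily match df's to the last digit, but the scaling is the same idea.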

This disk usage includes everything on my virtual server—all the executables, libraries, and support files in addition to the files specific to the blog. I wanted to refine this to see just what the blog was using. That called for the du command:

du -hd 1 .

Once again, the -h option meant “human formatted values.” The -d 1 option told du to go only one directory level deep. The output was

8.0K    ./.gnupg
68K     ./pagelogs
114M    ./all-this
9.0M    ./.local
116K    ./php-markdown
1.5M    ./.cache
68K     ./.ipython
20K     ./.pip
8.0K    ./.ssh
522M    ./tmp
16K     ./bin
8.0K    ./.conda
1.1G    ./public_html
4.0K    ./.nano
3.4G    ./anaconda3
5.1G    .

The line for public_html was what I was looking for: 1.1 GB. So relatively little of the space on the server is being used for the blog.
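What du adds up is the disk space allocated to every file and directory under each argument. Here's a minimal Python sketch of `du -d 1`, assuming a Unix system where `os.lstat` reports `st_blocks` in 512-byte units. Like du, it counts a hard-linked file only once. It skips some things the real du handles (symbolic links to directories aren't counted, for example), so treat it as an approximation. The function names are made up for this example.

```python
import os


def allocated(path, seen):
    """Bytes allocated on disk for one entry, or 0 if this inode was already
    counted (so hard links are counted once, as du does)."""
    st = os.lstat(path)
    key = (st.st_dev, st.st_ino)
    if key in seen:
        return 0
    seen.add(key)
    return st.st_blocks * 512  # st_blocks is in 512-byte units on Unix


def tree_bytes(top, seen):
    """Total allocation for a directory tree, including the directories themselves."""
    total = 0
    for root, dirs, files in os.walk(top):  # os.walk doesn't follow symlinks by default
        total += allocated(root, seen)
        for name in files:
            total += allocated(os.path.join(root, name), seen)
    return total


def du_depth1(top="."):
    """Rough equivalent of `du -d 1 top`: one (bytes, path) pair per immediate
    subdirectory, followed by the grand total for top itself."""
    seen = set()
    results = []
    total = allocated(top, seen)
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                size = tree_bytes(entry.path, seen)
                results.append((size, entry.path))
                total += size
            else:
                total += allocated(entry.path, seen)
    results.append((total, top))
    return results


if __name__ == "__main__":
    for size, path in du_depth1("."):
        print(f"{(size + 1023) // 1024}\t{path}")  # kilobytes, rounded up, like `du -kd 1 .`
```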

As I looked at the du output, I thought it would be more useful to have it in numerical order. I thought about adding an -s option to du, but that doesn’t work: for du, -s means “summarize,” not “sort.” The du man page shows no option for sorting the output.

The standard Unix way of doing things would suggest piping the output to sort, but I was sure that wouldn’t work here. Although sort has an -n option for sorting numerically, the numbers in du’s human output weren’t what needed to be sorted. It’s the quantities I wanted sorted, and that means the suffixes had to be accounted for. A test with

du -hd 1 . | sort -n

gave me

1.1G    ./public_html
1.5M    ./.cache
3.4G    ./anaconda3
4.0K    ./.nano
5.1G    .
8.0K    ./.conda
8.0K    ./.gnupg
8.0K    ./.ssh
9.0M    ./.local
16K     ./bin
20K     ./.pip
68K     ./.ipython
68K     ./pagelogs
114M    ./all-this
116K    ./php-markdown
522M    ./tmp

which confirmed my suspicions: perfect for sorting the numbers but useless for sorting the quantities.
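The reason is that sort -n reads only the number at the front of each line and ignores whatever follows it, so the K, M, and G never figure into the comparison. This small Python sketch mimics that behavior with a sort key. It is my own approximation of -n, handling only the non-negative decimals du produces. When two keys tie, GNU sort falls back to comparing whole lines, so the key includes the line as a tiebreaker.

```python
import re

NUMBER = re.compile(r"\s*(\d+(?:\.\d+)?)")


def numeric_key(line):
    """Approximate `sort -n`: the leading number only (suffix ignored),
    then the whole line as a last resort."""
    m = NUMBER.match(line)
    value = float(m.group(1)) if m else 0.0
    return (value, line)


du_lines = [
    "1.1G\t./public_html",
    "522M\t./tmp",
    "3.4G\t./anaconda3",
    "68K\t./pagelogs",
    "4.0K\t./.nano",
]

for line in sorted(du_lines, key=numeric_key):
    print(line)  # 1.1G, 3.4G, 4.0K, 68K, 522M: sorted numbers, scrambled quantities
```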

I could use a different output switch for du:

du -kd 1 . | sort -n

The -k tells du to output the sizes in kilobytes. The sorted output is

4       ./.nano
8       ./.conda
8       ./.gnupg
8       ./.ssh
16      ./bin
20      ./.pip
68      ./.ipython
68      ./pagelogs
116     ./php-markdown
1448    ./.cache
9184    ./.local
115896  ./all-this
534104  ./tmp
1098316 ./public_html
3470796 ./anaconda3
5333936 .

which is great until the numbers get up past five or six digits and you lose track of the order of magnitude.

But here comes the part where I learn something. It turns out the GNU folks recognized the need to read human-formatted values as well as write them, and they added an -h option to sort. So

du -hd 1 . | sort -hr

gives

5.1G    .
3.4G    ./anaconda3
1.1G    ./public_html
522M    ./tmp
114M    ./all-this
9.0M    ./.local
1.5M    ./.cache
116K    ./php-markdown
68K     ./pagelogs
68K     ./.ipython
20K     ./.pip
16K     ./bin
8.0K    ./.ssh
8.0K    ./.gnupg
8.0K    ./.conda
4.0K    ./.nano

which is exactly what I wanted: easy to read and properly sorted. (The -r switch tells sort to reverse so the biggest directories come first.)
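As I read the GNU coreutils documentation, sort -h doesn't convert each value to bytes. It compares suffixes first (no suffix, then K, M, G, T, and so on) and compares the numbers only when the suffixes match. That works for output like du's, where each value is already expressed in the largest suitable unit, but it would put 2000K below 1M. Here's a Python sketch of that comparison applied to the du output above; `human_key` is my name for it, and it ignores negative numbers, which du never produces.

```python
import re

SUFFIXES = "KMGTPEZY"  # GNU's order; a lowercase k is accepted too
HUMAN = re.compile(r"\s*(\d+(?:\.\d+)?)([kKMGTPEZY]?)")


def human_key(line):
    """Approximate `sort -h`: compare the suffix first, then the number,
    then (as a last resort) the whole line."""
    m = HUMAN.match(line)
    if not m:
        return (0, 0.0, line)
    number, suffix = m.groups()
    rank = SUFFIXES.index(suffix.upper()) + 1 if suffix else 0
    return (rank, float(number), line)


du_output = """\
8.0K\t./.gnupg
68K\t./pagelogs
114M\t./all-this
9.0M\t./.local
116K\t./php-markdown
1.5M\t./.cache
68K\t./.ipython
20K\t./.pip
8.0K\t./.ssh
522M\t./tmp
16K\t./bin
8.0K\t./.conda
1.1G\t./public_html
4.0K\t./.nano
3.4G\t./anaconda3
5.1G\t.
"""

# reverse=True plays the part of sort's -r switch.
result = sorted(du_output.splitlines(), key=human_key, reverse=True)
print("\n".join(result))
```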

The -h option was added to GNU sort in 2009, four or five years after I moved back to the Mac and stopped being as intense a command line user as I had been. I don’t feel too bad about not knowing of it.


Timing in SSH

A lot of the work I do “on” my iPad consists of me essentially using the iPad as a terminal to edit files and run commands on one of my iMacs. Editing files is done in Textastic and synced to the Mac through iCloud (used to be through Dropbox). Running commands is done through either Textastic’s internal terminal, which works well enough for simple stuff, or Prompt, which has more robust terminal emulation and is better for more complex sessions. Since a lot of the terminal work I do isn’t complex, and consists of repeated invocations of

pdflatex report

and

python script.py

the responsiveness of the SSH connection to the Mac isn’t important to my productivity. The most common key I use is ↑ to rerun the last command.

But when I do need to construct and run longer commands, I’ve noticed a lag between the keyboard and the screen. I am by no means a fast typist, but I regularly get ahead of the display. This leads to typos that take a while to correct because repetition on the ⌫ key also outruns the display. I often just don’t know where the cursor is on the Mac because it’s not necessarily where I see it on the iPad.

This is not a bug in Prompt or Textastic. When I log into the Linux box that hosts this blog, both terminal programs are smoothly responsive. That the web server is halfway across the country makes it all the more frustrating when I have a choppy terminal session to the Mac that’s no more than 20 feet away from me.

When I decided to try to fix this annoyance, my working assumption was that there was something in the SSH server configuration causing the hiccups. So that was where my Googling started. I found an older post with a suggested change to the DNS settings in /etc/ssh/sshd_config, but that didn’t help because the change had already been made in more recent versions of macOS.

Somehow, though, I ran into this answer from Pistos on StackExchange. Apparently, in its never-ending quest to save battery, Apple is powering down the wifi system between packets, which means a delay when new packets arrive or need to be sent. This doesn’t materially affect file transfers or streaming because the packets keep coming, but it plays havoc with intermittent communication like a terminal session.

Pistos’s solution was to set up two connections: one that keeps up a constant, albeit low-volume, flow of bytes between the Mac and whatever was connected to it; and another for what he really wanted to do. I took his solution and turned it into this short shell script, which I called nolag:

#!/usr/bin/env bash
while true; do echo -n .; sleep 0.5; done

Pistos used a sleep argument of 0.1 seconds, but I have found 0.5 works just as well.
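The whole trick is keeping a trickle of traffic flowing so the wifi never gets a chance to nap. If you'd rather not use a shell script, a Python version of the same loop might look like this sketch. The `interval` and `count` parameters are my additions, the latter only so the loop can be stopped in a test. The `flush=True` matters: a terminal's stdout is line-buffered, so without it the dots (which have no newline) could sit in Python's buffer instead of going over the connection.

```python
import itertools
import sys
import time


def nolag(interval=0.5, count=None, out=sys.stdout):
    """Print a dot every `interval` seconds, forever unless `count` is given,
    to keep a small, steady stream of bytes moving over the SSH connection."""
    ticks = itertools.count() if count is None else range(count)
    for _ in ticks:
        print(".", end="", file=out, flush=True)  # send each dot immediately
        time.sleep(interval)


if __name__ == "__main__":
    try:
        nolag()
    except KeyboardInterrupt:
        pass  # stop quietly on Control-C
```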

Now what I do is start a Prompt connection to my Mac and run nolag. Then I duplicate that session in Prompt and do my real work there.

Prompt session menu

This works perfectly and saves me much frustration.

You might be wondering why I don’t just plug my iMac into a wired network. It’s because my home wired network was strung back in the 90s and has bottlenecks that make it slower than my current wifi setup. A better question might be why Apple is trying to save battery life on a Mac that doesn’t run on battery.


Timing is everything

Most people describe Shortcuts as being like Automator, but that misses the mark. Yes, there is a resemblance to Automator because of the block-like visual way you program them and I would never deny the connection, but Shortcuts’ deeper similarity is with AppleScript. Both Shortcuts and AppleScript have significant capabilities on their own, but their real power comes from exploiting hooks into apps and the underlying operating system. And, significantly, those hooks have to be programmed into the apps by the developers. For ages, Mac users have bemoaned the lack of AppleScript support in many of our apps; in the past few years, iOS users have had the same frustrations with apps and Shortcuts.

Recently, I’ve run across another connection between AppleScript and Shortcuts: they both have filtering operations that are really easy to use but which can be unbelievably slow. In AppleScript, there’s the whose construct, which provides a very compact and English-like way of filtering lists.

tell application "ThisApp"
  set aVariable to every appClass whose property is value
end tell

The property is value part can be changed to any legal AppleScript condition:

property is less than value
property contains value
property starts with value

And so on.

I use variations on this snippet quite often, both in one-off scripts to solve a single problem and in automations that run frequently. A couple of months ago, I wrote about how slow this kind of filtering can be with Calendar events, but it isn’t confined to that app. Another slowpoke is part of my system for following up on unpaid invoices at work.

I don’t want to get into the details of my invoicing system here. Suffice it to say that I have an Invoices list with an entry for every outstanding invoice. These entries have both a due date and a recurrence relation that keeps reminding me to send followup emails to the client until the invoice is paid. Whenever I send a followup email, I click or tap the completion button in Reminders, which

  1. Marks that entry as completed.
  2. Uses the recurrence relation to create an identical entry with a due date of (typically) two weeks later.

Note that marking an entry as complete does not delete it from the Invoices list. Over time, completed entries can build up unless they’re deleted.
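To make the bookkeeping concrete, here is a toy Python model of what happens when a recurring entry is completed. It is not how Reminders is implemented: the `Reminder` class, its fields, the sample name and dates, and the 14-day interval are all hypothetical, with the interval chosen to match the “(typically) two weeks” above. What it shows is that completion adds an entry rather than replacing one, so the list only grows.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass
class Reminder:  # hypothetical stand-in for a Reminders entry
    name: str
    due: date
    completed: bool = False
    repeat_every: timedelta = timedelta(days=14)


def complete(reminders, index):
    """Mark one entry done and append its next occurrence, the way a recurring
    reminder behaves. The completed entry stays in the list."""
    done = reminders[index]
    done.completed = True
    reminders.append(replace(done, due=done.due + done.repeat_every, completed=False))


invoices = [Reminder("Invoice 1234 followup", date(2020, 6, 1))]
complete(invoices, 0)
complete(invoices, 1)
for r in invoices:
    print(r.due, "done" if r.completed else "open")
```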

The automation that goes along with this system runs at 5:00 am. It figures out which active invoice reminders will be due that day and generates the followup emails that will be sent to the appropriate clients. The emails are saved as drafts and are available on all my devices through iCloud. That way, whenever I get a notification to send a followup email, the email is already written and waiting for me.

The snippet of AppleScript that figures out which invoices need a followup is this:

set tonight to (current date)
set time of tonight to 18 * 60 * 60
tell application "Reminders"
  tell list "Invoices"
    set duns to name of every reminder whose (due date < tonight) and (completed is not true)
  end tell
end tell    

The filtering, which is based on both the due date and the completed status, can take a surprisingly long time to run. With just a couple hundred entries in Invoices (all but a dozen or so of which are completed), it can take over 10 seconds to run this little snippet of AppleScript (on my Late 2012 27″ iMac).[1]
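For comparison, here's the same test in Python, run against a plain in-memory list: find the names of uncompleted reminders due before 6:00 pm today. (AppleScript's time property is the number of seconds after midnight, so 18 * 60 * 60 is 6:00 pm.) The records and the `due_by_tonight` function are made up for illustration. Nothing here talks to Reminders; it only shows how little work the filter itself is.

```python
from datetime import datetime, time

# Hypothetical (name, due, completed) records standing in for the Invoices list.
reminders = [
    ("Invoice 101", datetime(2020, 6, 1, 9, 0), True),
    ("Invoice 102", datetime(2020, 6, 15, 9, 0), False),
    ("Invoice 103", datetime(2020, 6, 29, 9, 0), False),
]


def due_by_tonight(records, today):
    """Names of uncompleted reminders due before 6:00 pm on `today`."""
    tonight = datetime.combine(today, time(18, 0))  # like setting the time to 18 * 60 * 60
    return [name for name, due, completed in records if due < tonight and not completed]


print(due_by_tonight(reminders, datetime(2020, 6, 15).date()))
```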

Now, as a practical matter, I don’t care how long this automation takes, because it runs on my office computer while I’m still at home in bed. But it is weird that a list of only a couple hundred entries takes so long to filter. And I’ve used this same filtering construct in other AppleScripts that I run while I’m sitting at the computer, and it’s always frustrating to wait for something you know shouldn’t take that long.

I learned that Shortcuts can have this same problem after listening to Episode 49 of the Automators podcast. David and Rosemary’s guest was Scotty Jackson, and one of the shortcuts he discussed needs certain contact information for people he works with. He’s using Data Jar for storing this information, which struck me as odd, because surely all that information is already in Contacts. Couldn’t he just organize his coworkers in groups and use the group names to filter them according to context?

So after listening to the episode, I decided to see if a shortcut that used Contacts would work. I have a “Work” group in Contacts, so I wrote a shortcut with just this one step:

Contacts Filter

You can’t beat the simplicity, but on my 2018 iPad Pro, with about 950 entries in Contacts and 5 people in the Work group, this takes 13–15 seconds to run. That’s way too long and explains why Scotty J violated the DRY principle and duplicated some of his Contacts information in Data Jar. The time he spent duplicating that data will be made up for in faster run times and less frustration.

As I learned a couple of months ago, the slowdown in filtering Calendar events via AppleScript has to do with the interaction between the Open Scripting Architecture and the Calendar app. I suspect that basically the same interaction bottleneck happens with Reminders, as Reminders is at least partially built on Calendar. And maybe there’s a similar structure to Shortcuts’ interaction with Contacts that makes it so slow.

Whatever it is, it’s especially annoying when you know how little time it should take. To get a sense of this, I exported my contacts as a CSV file and ran this little Python script:

#!/usr/bin/env python

import pandas as pd

# Read the exported contacts and print the names of the people in the Work group.
df = pd.read_csv('contacts.csv')
print(df[['First name', 'Last name']][df.Group=='Work'])

It took about 0.7 seconds to run on my 2012 iMac and I bet most of that was importing the Pandas library. There’s no excuse for Apple giving us tools that run slower than something an amateur like me can whip up in a couple of minutes.


  1. You might well ask, “Why don’t you clean up your completed reminders using that script you wrote a few years ago?” All I can say is we are all sinners. 


Acceptance

There’s a certain type of work I do for which I write a short proposal in single-page letter format and send it to the prospective client as a PDF attached to an email. When I started doing this type of work a few years ago, my thinking was that the client would simply email me back saying they accept the proposal and that I could start the work. But many clients wanted something more formal, or at least more analog.

They wanted something they could sign and send back to me. So I made a small PDF image I could use as a sort of stamp.

Acceptance stamp

After writing the proposal in LaTeX and generating a PDF, I’d open it in PDFpen and add the stamp. At first I’d just open the stamp file, copy it out of that document, and paste it into the proposal. Then I got more clever and added the stamp to my PDFpen Library, so there was no need to open a second file.

The problem with adding the stamp this way was twofold: I had to remember to do it, and then I had to do some manual work in PDFpen. Given that the stamp would always be in the same spot—lower right corner, with a ¾″ margin from the page edges—it seemed like I could get rid of the manual part.

I probably could have written an AppleScript to control PDFpen, but when I finally got around to automating it, I used PDFtk and the stamp option I wrote about last year. What I built was a short shell script that was basically the same as the overlay script I described in that post, but with one of the PDF files being set to the acceptance stamp file, which I had to “grow out” to be a full page.

Acceptance overlay page
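The arithmetic behind that full-page version is simple. PDF measures in points, 72 to the inch, so on a US letter page (612 × 792 points) a ¾″ margin is 54 points, and a stamp w points wide needs its lower left corner at x = 612 − 54 − w, y = 54. Here's a quick Python sketch. The letter page size is an assumption built into the defaults, and the `stamp_origin` function and the 3″ × 1½″ stamp in the example are invented for illustration, not taken from my real stamp.

```python
POINTS_PER_INCH = 72  # PDF user-space units


def stamp_origin(stamp_w, stamp_h,
                 page_w=8.5 * POINTS_PER_INCH, page_h=11 * POINTS_PER_INCH,
                 margin=0.75 * POINTS_PER_INCH):
    """Lower left corner (x, y), in points, that puts a stamp in the lower right
    corner of the page, `margin` points from the right and bottom edges."""
    x = page_w - margin - stamp_w
    y = margin
    if x < 0 or y + stamp_h > page_h:
        raise ValueError("stamp doesn't fit on the page")
    return x, y


# Hypothetical 3″ × 1½″ stamp.
print(stamp_origin(3 * POINTS_PER_INCH, 1.5 * POINTS_PER_INCH))  # (342.0, 54.0)
```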

This PDFtk automation didn’t last long. It solved the manual placement problem, but I still had to remember to run the script, and it turned out that that was the more important problem to solve. I began searching for a simple way to add the stamp directly in LaTeX.

There are undoubtedly ways to add the stamp using TikZ or some similar drawing package, but I didn’t want to learn a graphics macro language just to remake a drawing I already had. What I ended up using is the pdfoverlay package, which let me put my acceptance stamp in the background and layer the proposal letter over it.[1]

The pdfoverlay package is included in recent TeX Live and MacTeX distributions. I don’t remember how old my installation was, but I had to update it to get pdfoverlay installed and working. Once I did, though, these two lines were all I needed to add to the preamble of my proposal letters:

\usepackage{pdfoverlay}
\pdfoverlaySetPDF{/path/to/acceptance.pdf}

I edited the TextExpander snippet I use for my proposal template to include these two lines, and now I don’t have to remember anything. Which is one of the main goals of automation, right?


  1. So it’s more like the inverse of a stamp.