Do quote me

After Friday’s post was published, Leon Cowle tweeted a good question:

Isn’t it a safer habit to always quote/escape args containing wildcards?

To put this question in context, recall that the command I ran to get all the reports I’d written in the past 60 days was

find . -name *report*.pdf -mtime -60

The command was issued from within my ~/projects directory. Within it are subdirectories for every project and subsubdirectories within each of them for the various aspects of those projects. The idea behind the find command is to search down through the current directory (.) for files with names that match the glob *report*.pdf that were last modified less than 60 days ago.

Leon’s question, which was really a suggestion politely formed as a question, was about my leaving the argument to the -name expression unquoted. He thinks I should have used

find . -name '*report*.pdf' -mtime -60

He’s right. Quoting the argument to -name is a good habit to get into. But it’s a habit I find hard to form.

The reason to quote the argument is to prevent the shell from expanding the glob before find has a chance to get at it. If there were a file at the top level of the ~/projects directory with a name that matched the glob—and if it were less than 60 days old—that would be the only file that find would have returned.
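To see the effect without touching a real project tree, here is a minimal Python sketch that imitates the two steps: the shell's expansion (via glob) and find's name test (via fnmatch). The directory layout and file names are made up for illustration, and the sketch only approximates what a shell and find really do.

```python
import fnmatch
import glob
import os
import tempfile

PATTERN = "*report*.pdf"

def make_tree(root):
    """Create a hypothetical projects tree with one stray report at the top level."""
    paths = [
        "old-report.pdf",                          # stray file at the top level
        "projectA/results/structural-report.pdf",
        "projectB/results/glass-report.pdf",
    ]
    for p in paths:
        full = os.path.join(root, p)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

def find_by_name(root, name_pattern):
    """Rough stand-in for `find . -name PATTERN`: test each file's base name."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for f in filenames:
            if fnmatch.fnmatchcase(f, name_pattern):
                hits.append(os.path.relpath(os.path.join(dirpath, f), root))
    return sorted(hits)

with tempfile.TemporaryDirectory() as root:
    make_tree(root)

    # Quoted: find receives the pattern itself and tests it at every level.
    quoted_hits = find_by_name(root, PATTERN)

    # Unquoted: the shell expands the glob in the current directory first,
    # so find receives the literal name of the top-level match instead.
    expanded = [os.path.basename(p)
                for p in glob.glob(os.path.join(glob.escape(root), PATTERN))]
    unquoted_hits = find_by_name(root, expanded[0])

print("quoted:  ", quoted_hits)
print("unquoted:", unquoted_hits)
```

In this made-up tree the quoted pattern finds all three reports, while the expanded one finds only the stray top-level file.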

I got away with leaving out the quotes because there were no such files in ~/projects. Except for a couple of SQLite database files, ~/projects has nothing but subdirectories at its top level. I knew that, which is why the command worked without quoting. And although I know that quoting is a safer habit, I wrote the post using the command just as I used it—I didn’t add the quotes to model better behavior than I usually engage in.

It’s not that I never use quotes when working in the shell. But I do tend to forget them more often than I should. One thing I will say in my defense: I build up my shell commands and pipelines incrementally, making sure every step works the way I expect before adding another. I would never write

find . -name something | xargs rm

without first checking the output of

find . -name something

Also, on those rare occasions when I write a shell script, I am much more diligent about quoting than I am when working interactively at the command line. Commands given interactively are written for a particular set of circumstances in which shortcuts can be perfectly fine. Shell scripts get used more widely, where unforeseen conditions are more likely.

There is, by the way, a way to get at the same information without find. If your shell is zsh (or if, like me, you use a recent version of bash installed via Homebrew), you can use the **/ wildcard pattern:

ls -lt **/*report*.pdf | head

In this command, the shell looks down the whole subdirectory tree for files that match the globbing pattern, ls -lt prints them in long form in reverse chronological order, and then head extracts just the first ten, which will be the ten most recent. Because the long form of ls includes the modification date, I could’ve looked through the output of this command and easily determined which reports were written in the past two months.
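For comparison, roughly the same listing can be sketched in Python with pathlib. The function name newest_reports and its defaults are my own choices for illustration, not anything from the commands above.

```python
from pathlib import Path

def newest_reports(root=".", pattern="*report*.pdf", count=10):
    """Return up to `count` matching files under root, most recently modified first."""
    # rglob searches root and every directory below it, much like **/ does
    matches = [p for p in Path(root).rglob(pattern) if p.is_file()]
    matches.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return matches[:count]
```

Calling newest_reports() from the projects directory and printing each path would give the ten most recently modified reports, although without the dates that ls -lt shows.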

And because this is a situation where I want the shell to interpret the globbing pattern, quoting the pattern would be wrong. Which is a better fit to my sloppy habits.


Finding my way after getting lost

Things were busy at work at the end of July, and I didn’t get my invoices sent out. No problem, I thought, the first week of August will be fine. I had forgotten that I’d be traveling most of the first week of August, and travel usually means I’m extra busy the days before a trip.

So the bills had to go out this week, which was also fine—albeit less fine—but I had another problem. During the busy July—and going back into a kind of busy June—I had stopped updating my todo lists in Things. So I didn’t have a list of the projects that needed to be billed out. This meant trusting my memory, always fraught with peril, or…

Trusting find. I opened an iTerm window, cd’d into my ~/projects directory, and issued this command:

find . -name *report*.pdf -mtime -60

A few seconds later, I had a list of file paths to the reports I’d written in the past couple of months. It looked sort of like this:

./projectA/results/structural-report.pdf
./projectB/tower1/results/glass-report.pdf
./projectB/tower1/results/concrete-report.pdf
./projectC/results/foundation-report.pdf
[etc]

Because I always put the word “report” in the names of my reports, and they’re always PDF files, I knew that the first expression in my find command

-name *report*.pdf

would do the trick. The asterisks work just as you’d expect: they’re a wildcard globbing pattern. The second expression,

-mtime -60

restricted the results to files that were last modified (that’s what the m means) less than 60 days ago. There are two parts of the -mtime expression that I find tricky:

  1. That the -mtime argument is given in days. There’s a similar expression -mmin, which works just like -mtime, but its argument is given in minutes—the units are built into the name. For consistency, I wish -mtime were called -mday.
  2. That the argument to -mtime needs a minus sign to mean “less than.” I tend to write -mtime 60, which finds only files that were modified exactly 60 days ago. Then when I get no results, I grit my teeth and rerun the command with the minus sign added.
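The three forms of the argument can be sketched in Python. This follows the POSIX description of -mtime, in which a file's age is counted in whole 24-hour periods with the remainder discarded; the function names are hypothetical, and real find implementations may differ at the edges.

```python
import os
import time

SECONDS_PER_DAY = 86400

def age_in_days(path, now=None):
    """Whole 24-hour periods since the file was last modified (remainder discarded)."""
    now = time.time() if now is None else now
    return int((now - os.path.getmtime(path)) // SECONDS_PER_DAY)

def mtime_matches(path, arg, now=None):
    """Mimic find's -mtime test for an argument such as "-60", "60", or "+60"."""
    days = age_in_days(path, now)
    n = int(arg.lstrip("+-"))
    if arg.startswith("-"):
        return days < n      # modified less than n days ago
    if arg.startswith("+"):
        return days > n      # modified more than n days ago
    return days == n         # modified exactly n days ago
```

Seen this way, a bare 60 matches only files whose age falls in one particular 24-hour window, which is why it so often returns nothing.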

Anyway, despite my quarrels with -mtime, I got the information I needed. Once I knew which projects I’d written a recent report on, I could check whether they’d been billed out. A couple had been, but most had not. That has since been fixed and a fiscal crisis averted.

A better solution to getting my invoices out on time would be to hop back on the Things wagon. That would be better than dusting off my find skills again in September.


Scripts for my homemade wiki

In yesterday’s post, we went through the directory structure of my personal wiki and the Makefile that builds and uploads the various HTML pages that comprise it. Today, we’ll go through the three little scripts the Makefile runs to do the work. As a reminder, here’s what a wiki page looks like:

[Screenshot: Example of Hints page]

The fundamental script is md2html, which does exactly what you’d expect from its name. It takes a Markdown file as input and produces an HTML file as output. Here it is:

python:
 1:  #!/usr/bin/python3
 2:  
 3:  import sys
 4:  import os
 5:  import markdown as md
 6:  
 7:  # Markdown extensions
 8:  myExt = ["tables", "codehilite", "smarty", "md_in_html"]
 9:  
10:  # Read in the Markdown source
11:  source = open(sys.argv[1], "r", encoding="utf8").read()
12:  
13:  # Get the page title from the first line
14:  title = source.splitlines(keepends=False)[0]
15:  title = title.strip('# ')
16:  
17:  # Convert it to a section of HTML
18:  mainSnippet = md.markdown(source, extensions=myExt)
19:  
20:  # Page template
21:  fullHTML=f'''<html>
22:  <head>
23:    <title>Hints: {title}</title>
24:    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
25:    <link rel="stylesheet" type="text/css" media="all and (max-device-width:480px) and (orientation:portrait)" href="../resources/mobile.css" />
26:    <link rel="stylesheet" type="text/css" media="all and (max-device-width:480px) and (orientation:landscape)" href="../resources/style.css" />
27:    <link rel="stylesheet" type="text/css" media="all and (min-device-width:480px)" href="../resources/style.css" />
28:    <link rel="stylesheet" type="text/css" media="all" href="../resources/pygstyle.css" />
29:    <script type="text/javascript"
30:      id="MathJax-script" async
31:      src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
32:    </script>
33:  </head>
34:  <body>
35:    <div id="container">
36:    <div id="content">
37:  
38:    {mainSnippet}
39:  
40:    <!--#config timefmt="%b %d, %Y at %-I:%M %Z" -->
41:    <p style="text-align: right;font-size: 75%;margin-top: 2em;">Last modified: <!--#echo var="LAST_MODIFIED" -->
42:  
43:    </div> <!-- content -->
44:  
45:    <div id="sidebar">
46:  
47:    <h1><a href="../AAAA/index.html">Hints home</a></h1>
48:  
49:    <!--#include virtual="/path/to/resources/sidebar.html" -->
50:  
51:    </div><!-- sidebar -->
52:    </div><!-- container -->
53:  </body>
54:  </html>
55:  '''
56:  
57:  print(fullHTML)

As you can see, most of the code, Lines 21–55, is just an HTML template. The real work is done by the lines above it.

I could do the Markdown-to-HTML conversion in any number of ways, but I chose to do it in Python for two reasons:

  1. I’m most comfortable in Python.
  2. Python has the well-tested and wide-ranging Pygments library for highlighting code. As I said yesterday, my wiki is expected to include a lot of code snippets, and I want that code to be both easy to read and easy to copy from. Pygments gives me both of those.

The main library imported by md2html is Python-Markdown, which has a variety of extensions that go beyond Gruber’s Markdown. The ones I’m using are shown in Line 8: tables, codehilite, smarty, and md_in_html.

Line 11 reads the Markdown input file into the variable source. All the source files start with a level-one header that defines the page’s title. Here, for example, is the source of the page shown above:

# Skipping header lines while reading file #

If a data file that's going to be read line-by-line starts with a set of header lines, skip over them using [`next()`][1]:

    :::python
    headerlines = 12

    with open('data.txt', 'r') as f:
        for dummy in range(headerlines):
            next(f)
        for line in f:
            <process data line>

If the data portion is in CSV format, and I want to use Pandas to read it into a dataframe, use the `skiprows` parameter to the [`read_csv()`][2] function:

    :::python
    import pandas as pd

    df = pd.read_csv('data.csv', skiprows=12)

In this case, don't skip the header row that provides the names of the columns.


[1]: https://docs.python.org/3/library/functions.html#next
[2]: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

As you can see, the first line starts with (and optionally ends with) the single hash mark that identifies a level-one header. Lines 14–15 of md2html get that line from source, strip off the hashes and spaces from the beginning and end, and define the remainder as the page’s title.
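A quick illustration of how that stripping behaves, using the header from the example page. The one thing to keep in mind is that strip treats its argument as a set of characters rather than a prefix or suffix, so a title that legitimately ends in a hash (the "C#" header here is a made-up case) would lose it.

```python
first_line = "# Skipping header lines while reading file #"

# Any run of hashes and spaces is removed from both ends of the line.
title = first_line.strip("# ")
print(title)

# A header without the optional closing hash gives the same result.
same_title = "# Skipping header lines while reading file".strip("# ")

# Hypothetical edge case: a trailing hash that belongs to the title is stripped too.
clipped = "# Notes on C#".strip("# ")
print(clipped)
```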

Line 18 then does all the conversion of source into a section of HTML code. It uses the extensions described above in the conversion.

The rest of the code is just an HTML page template with three spots for insertion:

  1. Line 38 is where the mainSnippet from Line 18 is inserted.
  2. Lines 40–41 are server side include (SSI) lines for inserting the modification date of the file.
  3. Line 49 is an SSI line for inserting the table of contents in the right sidebar.

mainSnippet is inserted when md2html is run. The SSI lines are left verbatim in the resulting HTML file—their insertions are done by the server when the page is requested by a browser.

Line 57 prints out the HTML. It gets redirected to a file in the site directory by the shell command we discussed yesterday:

mkdir -p ../site/Python && ./md2html ../source/Python/skip-header-lines.md > ../site/Python/skip-header-lines.html

Where this shell command comes from is the subject of our next script: buildPages.

python:
 1:  #!/usr/bin/python3
 2:  
 3:  import os
 4:  
 5:  # The root directory is the top level of the source tree
 6:  rootDir = '../source'
 7:  
 8:  # Walk the source tree to get all the Markdown files and
 9:  # create the commands for writing the HTML files
10:  for sourceDir, subdirList, fileList in os.walk(rootDir):
11:    if sourceDir == '../source':
12:      # Skip the top level of the source tree
13:      continue
14:    else:
15:      # The HTML files go in the site tree
16:      siteDir = sourceDir.replace('/source/', '/site/')
17:  
18:      # Go through all the files in the current subdirectory
19:      for f in [ x for x in fileList if x[-3:] == '.md' ]:
20:        # The name of the HTML file is the same but with a different extension
21:        hf = f.replace('.md', '.html')
22:  
23:        # The command to write an HTML file
24:        print(f'mkdir -p {siteDir} && ./md2html {sourceDir}/{f} > {siteDir}/{hf}')

This script walks through the source directory, identifies all the Markdown source files, and generates a shell command like the one we just saw for creating a new folder in the site directory (if necessary) and making an HTML file from each Markdown file. buildPages doesn’t actually do any building; it just creates a file of shell commands that the Makefile will call upon to do the building.

The key to buildPages is Python’s os.walk command, called on Line 10. When used in a loop, it recursively works its way down the directory structure, finding all the subdirectories and files within. Since none of our Markdown source files are in the top level of the source directory, Lines 11–13 skip over that to concentrate on the topic subdirectories.

Line 16 figures out where the resulting HTML file should go in the site directory by simply replacing source with site in the file’s path. Then Lines 19–24 loop through the Markdown source files (identified by a .md extension) and build a shell command similar to the one we saw above.
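The path handling can be pulled out into a small function to make it easier to see. This is a sketch rather than the script itself: the name build_command is hypothetical, and the shlex.quote calls are an addition that buildPages doesn't have. They only matter if a folder or file name contains spaces or other characters the shell treats specially (the "Mac tips" folder below is invented), and make itself copes poorly with such names, so treat the quoting as a defensive habit rather than a complete fix.

```python
import shlex

def build_command(source_dir, md_name):
    """Return the shell line that builds one page, with paths quoted where needed."""
    site_dir = source_dir.replace("/source/", "/site/")
    html_name = md_name[:-3] + ".html"      # swap the trailing .md for .html
    src = shlex.quote(f"{source_dir}/{md_name}")
    dst = shlex.quote(f"{site_dir}/{html_name}")
    return f"mkdir -p {shlex.quote(site_dir)} && ./md2html {src} > {dst}"

print(build_command("../source/Python", "skip-header-lines.md"))
print(build_command("../source/Mac tips", "sudo-by-touchid.md"))
```

For ordinary names like those in the Python folder, the quoting changes nothing and the output is the same command shown earlier.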

The Makefile calls buildPages during the prep stage and redirects the output to a file called buildAll. When that’s done, buildAll contains a line for creating each HTML file. As we saw yesterday, the Makefile executes whichever lines are necessary to build new pages or update edited ones.

The last script we need is pagelist, which creates the table of contents that’s put in the sidebar.

python:
 1:  #!/usr/bin/python3
 2:  
 3:  import os
 4:  import os.path
 5:  import sys
 6:  
 7:  # The root directory is the top level of the source tree
 8:  rootDir = '../source'
 9:  
10:  # Walk through all the subdirectories in rootDir except AAAA
11:  # and create category headers and page lists
12:  for sourceDir, subdirList, fileList in os.walk(rootDir):
13:    subdirList.sort()
14:    if sourceDir in (rootDir, os.path.join(rootDir, 'AAAA')):
15:      continue
16:  
17:    # The category name is the last part of sourceDir
18:    category = os.path.basename(sourceDir)
19:    print(f"<h1>{category}</h1>")
20:    print("<ul>")
21:  
22:    # Loop through all the Markdown files in fileList
23:    fileList.sort()
24:    for f in [ x for x in fileList if x[-3:] == '.md' ]:
25:      # HTML file name
26:      hf = f.replace('.md', '.html')
27:  
28:      # Get the name of the page from the first line of the source file
29:      firstLine = open(f'{os.path.join(sourceDir, f)}', 'r', encoding='utf8').readline()
30:      name = firstLine.strip(' #\n')
31:  
32:      # Set the path to the HTML file
33:      htmlPath = os.path.join('..', category, hf)
34:  
35:      # Add the page link to the list
36:      print(f'<li><a href="{htmlPath}">{name}</a></li>')
37:  
38:    # Close out the category list
39:    print("</ul>")

The basic structure of pagelist is the same as that of buildPages. It uses os.walk to work its way down the source directory tree and identify all the Markdown documents in it. The key difference is that pagelist alphabetizes the subdirectories (Line 13) and uses the names of those subdirectories to create a header (Lines 18–19). It also alphabetizes the individual Markdown source files (Line 23) before looping through them (Lines 24–36).
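The in-place sort matters because os.walk, in its default top-down mode, consults the same list to decide which subdirectories to visit next. Here is a small demonstration with a throwaway directory; the helper name walk_order is hypothetical, and the folder names are just borrowed from the sidebar.

```python
import os
import tempfile

def walk_order(root):
    """Return the subdirectories of root in the order os.walk visits them."""
    visited = []
    for current, subdirs, files in os.walk(root):
        subdirs.sort()           # in-place sort steers the rest of the walk
        if current != root:
            visited.append(os.path.relpath(current, root))
    return visited

with tempfile.TemporaryDirectory() as root:
    # Topic folders created out of alphabetical order
    for name in ("Shapely", "LaTeX", "Python", "Mac"):
        os.mkdir(os.path.join(root, name))
    order = walk_order(root)

print(order)
```

Assigning a new sorted list to the loop variable instead of sorting in place would not have this effect, because os.walk would never see the new list.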

For each Markdown source file, Line 26 determines the name of the corresponding HTML file by replacing the extension. Lines 29–30 then get the page title by reading the first line of the source file and stripping off the hashes and whitespace from the line’s beginning and end. Because the readline command keeps the trailing newline character, the strip command has to include \n in its argument to get rid of it.
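The difference between the two ways of getting the first line can be seen side by side. The text below is a stand-in for a Markdown source file, not one of the real pages.

```python
import io

text = "# Breaking pages #\nSome body text.\n"

# md2html's approach: splitlines() drops the line endings
from_splitlines = text.splitlines(keepends=False)[0]

# pagelist's approach: readline() keeps the trailing newline
from_readline = io.StringIO(text).readline()

print(repr(from_splitlines))
print(repr(from_readline))

# With the newline added to the strip set, both routes give the same name
name = from_readline.strip(" #\n")
print(name)
```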

Line 33 then sets up the path to the HTML page, and Line 36 builds the list item that links to that page. Line 39 closes out the list of pages for the current category that was started on Line 20.

The output of pagelist is redirected by the Makefile to the sidebar.html file of the resources subdirectory. The file looks like this,

html:
<h1>LaTeX</h1>
<ul>
<li><a href="../LaTeX/breaking_pages.html">Breaking pages</a></li>
<li><a href="../LaTeX/set-up-mactex.html">Set up MacTeX</a></li>
<li><a href="../LaTeX/splitting_big_tables.html">Splitting big tables</a></li>
</ul>
<h1>Mac</h1>
<ul>
<li><a href="../Mac/realphabetize-mail-folders.html">Realphabetize Mail folders</a></li>
<li><a href="../Mac/sudo-by-touchid.html">Sudo via TouchID</a></li>
</ul>
<h1>Matplotlib</h1>
<ul>
<li><a href="../Matplotlib/tweakable-plot-template.html">Tweakable plot template</a></li>
</ul>
<h1>Pandas</h1>
<ul>
<li><a href="../Pandas/combine_rows_through_groupby.html">Combine rows through groupby</a></li>
<li><a href="../Pandas/three_value_booleans.html">Three-value booleans</a></li>
</ul>
<h1>Python</h1>
<ul>
<li><a href="../Python/skip-header-lines.html">Skipping header lines while reading file</a></li>
</ul>
<h1>Shapely</h1>
<ul>
<li><a href="../Shapely/defining-irregular-shapes.html">Defining an irregular shape</a></li>
<li><a href="../Shapely/inset.html">Inset of a shape</a></li>
<li><a href="../Shapely/point-position-checks.html">Point position checks</a></li>
</ul>

and is inserted into every page by the SSI directive described above.

So there you have it. Just 136 lines of code (these three scripts plus the Makefile), with over half of that being HTML boilerplate, comments, or blank lines. Yes, you’re right, I’m not including the CSS files that organize the layout of the pages; but as I said yesterday, that was already written. I just had to edit the color values in 5–10 lines.

As I said on MPU, I did try Obsidian, which has the nice feature of starting from a folder of Markdown files. But I was appalled by its out-of-the-box looks, and most of the community themes were just as bad.1 And then when I looked into the structure of Obsidian’s themes, I realized that rolling my own wiki would be just as fast as tweaking a theme to my liking.

The growing enthusiasm over Obsidian hasn’t changed my mind. If anything, it’s hardened me against it. The plethora of plugins and tweaks available for it has convinced me that if I were to use it for my wiki of hints, most of the hints would be about Obsidian itself. That’s not my goal. If you can use Obsidian without continually adjusting it, you have a stronger mind (and stomach) than I do. I’ll stick with plain ol’ HTML.


  1. I agree with just about everything Adam Engst said about dark mode. It can be useful in certain graphics applications, but it’s miserable for dealing with text. I know lots of people think it looks cool. They’re wrong. 


My homemade wiki

David and Stephen recently had me on the Mac Power Users podcast and teased me (gently) about rolling my own wiki1 instead of using one of the many off-the-shelf systems. I explained on the show the various trials I went through before landing on my current system and—to some extent, at least—the reason I prefer what I’m doing to Obsidian/Craft/Roam Research/Notion/Whatever. In this post I’ll give some details on how my system works.

First, you should know that I have a very specific use for this wiki. It tells me how to do certain things on my computer. These are typically programming hints—techniques and code excerpts that have helped me solve problems that I expect to run into again—but some are little configuration hacks that make using my computer easier. They’re mostly the culmination of long stretches of research and trial-and-error testing, put into my own words with links back to the original sources. Despite my calling it a wiki, there are very few internal links.

Here’s a screenshot of one page:

[Screenshot: Example of Hints page]

A couple of things should be clear from the screenshot:

  1. I don’t have many pages yet. The table of contents along the right side has a link to every entry. If I continue with this system, I’ll eventually have to turn that into some sort of dropdown menu system to keep it compact. And I’ll need a search feature. But there was no point in building niceties like that in the first iteration.
  2. The layout is basically a stripped-down and recolored version of the layout of this blog. That’s not a coincidence. I copied the ANIAT CSS files, changed a few color values, and boom—I had a layout for my hints system with almost no effort. Again, I wanted to spend as little time as possible on this until I knew it was going to work for me. If I abandon it tomorrow, I won’t feel like I wasted a lot of time.

Now that you know what it looks like, here’s how the files are organized:

[Image: Hints directory structure]

The base directory is named Hints, and its three subdirectories are bin, source, and site. The bin directory holds the scripts that build the site. The source directory contains subdirectories for each main topic, and those subdirectories contain the Markdown source files for the individual pages.

The site subdirectory is what gets synced to the server. Its images subdirectory holds all the image files (mostly screenshots) used in the pages. Its resources subdirectory holds the CSS style files and the partial HTML file used to fill in the table of contents. At present, none of the pages use JavaScript, but if that changes as the site gets more complicated, resources is where the JavaScript files will go. Finally, there’s a set of subdirectories that match the topic subdirectories in source. For every Markdown source file, there’s a corresponding HTML file.

The Hints directory is on my MacBook Air, but because it’s saved in the ~/Library/Mobile Documents/com~apple~CloudDocs tree, it’s also on iCloud, accessible from all my Apple devices. I can add new Markdown source files from any device and then update the site by running the scripts in the bin directory.2

Although there are a handful of small scripts in bin, they’re all controlled by this single Makefile:

md=$(shell find ../source -name *.md)
html=$(patsubst ../source/%.md, ../site/%.html, $(md))

all: prep update sync

prep:
    ./pagelist > ../site/resources/sidebar.html
    ./buildPages > buildAll

update: $(html)

../site/%.html: ../source/%.md
    fgrep $@ buildAll | bash

sync:
    rsync -arz --exclude .DS_Store ../site/ drdrang@site.com:path/to/Hints/

I’m not a make expert, but fortunately I don’t need to be. This one is pretty simple.

The first two lines define variables that include all the Markdown files in the source tree and all the HTML files that should be in the site tree. I say “should be” because when a new Markdown file is created, there is initially no corresponding HTML file. But that’s OK, because the HTML file list is created by taking the Markdown file list and substituting site for source in the path and .html for .md as the file extension. This means the html variable will contain names of all the HTML pages that ought to be there. This is exactly what we want.
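For anyone who finds make's patsubst notation opaque, here is the same substitution sketched in Python. The function name is hypothetical, and the second file name is inferred from the sidebar listing rather than taken from the real source tree.

```python
def patsubst_md_to_html(md_paths):
    """Map ../source/<stem>.md to ../site/<stem>.html, leaving other names unchanged."""
    prefix, suffix = "../source/", ".md"
    html_paths = []
    for path in md_paths:
        if path.startswith(prefix) and path.endswith(suffix):
            stem = path[len(prefix):-len(suffix)]    # the part matched by %
            html_paths.append(f"../site/{stem}.html")
        else:
            html_paths.append(path)                  # non-matching words pass through
    return html_paths

md = ["../source/Python/skip-header-lines.md", "../source/LaTeX/breaking_pages.md"]
html = patsubst_md_to_html(md)
print(html)
```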

The all rule, which is the default, triggers three other rules in order: prep, update, and sync.

The prep rule runs two scripts in the bin directory: the pagelist script, which creates the table of contents that will be put in the right sidebar; and the buildPages script, which defines a series of commands for building each page.

The update rule depends on every file named in the html variable, so it triggers the HTML rule for each of them.

Next, we have the rule for HTML files. It depends on Markdown files (that is, if the corresponding Markdown file is newer, the HTML file is rebuilt) and runs lines plucked from the buildAll file created by the prep rule.

Finally, the sync rule runs rsync to transfer all the updated files in site (except the ubiquitous and annoying .DS_Store) to the server. The server information in this command isn’t real, but the overall construct of the command is.

I’ll go through the various scripts called by the Makefile in a later post. Here I’ll just describe their effects.

The pagelist script walks through the source tree to figure out the names and relative URLs of all the HTML files. It uses that information to create the table of contents HTML snippet that’s saved in the resources/sidebar.html file of the site directory. That file currently looks like this:

<h1>LaTeX</h1>
<ul>
<li><a href="../LaTeX/breaking_pages.html">Breaking pages</a></li>
<li><a href="../LaTeX/set-up-mactex.html">Set up MacTeX</a></li>
<li><a href="../LaTeX/splitting_big_tables.html">Splitting big tables</a></li>
</ul>
<h1>Mac</h1>
<ul>
<li><a href="../Mac/realphabetize-mail-folders.html">Realphabetize Mail folders</a></li>
<li><a href="../Mac/sudo-by-touchid.html">Sudo via TouchID</a></li>
</ul>
<h1>Matplotlib</h1>
<ul>
<li><a href="../Matplotlib/tweakable-plot-template.html">Tweakable plot template</a></li>
</ul>
<h1>Pandas</h1>
<ul>
<li><a href="../Pandas/combine_rows_through_groupby.html">Combine rows through groupby</a></li>
<li><a href="../Pandas/three_value_booleans.html">Three-value booleans</a></li>
</ul>
<h1>Python</h1>
<ul>
<li><a href="../Python/skip-header-lines.html">Skipping header lines while reading file</a></li>
</ul>
<h1>Shapely</h1>
<ul>
<li><a href="../Shapely/defining-irregular-shapes.html">Defining an irregular shape</a></li>
<li><a href="../Shapely/inset.html">Inset of a shape</a></li>
<li><a href="../Shapely/point-position-checks.html">Point position checks</a></li>
</ul>

This HTML snippet gets injected into every page by a server side include, an ancient web technology that’s still very convenient for adding boilerplate to several pages.

The buildPages script also walks through the source tree. It creates a set of shell commands for building each page. For example, the command for building the “Skipping header lines” page shown above is this:

mkdir -p ../site/Python && ./md2html ../source/Python/skip-header-lines.md > ../site/Python/skip-header-lines.html

The mkdir part creates (if necessary) the appropriate subdirectory in site and then runs the md2html script (also kept in bin) to build an HTML file from the Markdown source. buildPages creates a line like this for every page, and all of these lines are saved in the buildAll file in the bin directory.

If an HTML file doesn’t exist or is older than its corresponding Markdown file, this command from the Makefile is run:

fgrep $@ buildAll | bash

What this does is search through buildAll for the line that contains the path to the HTML file (that’s what $@ expands to in the Makefile) and then pipe that line to bash to be run. If I had recently updated the Markdown source of the “Skipping header lines” page, fgrep would search for

../site/Python/skip-header-lines.html

in buildAll and would find the

mkdir -p ../site/Python && ./md2html ../source/Python/skip-header-lines.md > ../site/Python/skip-header-lines.html

shell command we saw above. Piping this to bash would update the HTML file to match the current Markdown source.
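The lookup half of that pipeline is simple enough to sketch in Python. The two-line stand-in for buildAll below is illustrative (the LaTeX source file name is inferred from the sidebar), and the function name is hypothetical.

```python
build_all = """\
mkdir -p ../site/LaTeX && ./md2html ../source/LaTeX/breaking_pages.md > ../site/LaTeX/breaking_pages.html
mkdir -p ../site/Python && ./md2html ../source/Python/skip-header-lines.md > ../site/Python/skip-header-lines.html
"""

def lines_for_target(text, target):
    """Like fgrep: keep every line that contains target as a fixed string."""
    return [line for line in text.splitlines() if target in line]

matches = lines_for_target(build_all, "../site/Python/skip-header-lines.html")
for line in matches:
    print(line)
```

Because the match is on a fixed string rather than a regular expression, the dots in the path are taken literally.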

The make command is clever. If you write rules with the proper dependencies, only the files that need updating will be rebuilt. HTML files whose Markdown source hasn’t changed are skipped over because make knows they won’t be any different.
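The test make applies to each HTML file boils down to a comparison of modification times. A minimal sketch of that decision, with a hypothetical function name:

```python
import os

def needs_rebuild(source, target):
    """True if target is missing or older than source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(target) < os.path.getmtime(source)
```

A call like needs_rebuild("../source/Python/skip-header-lines.md", "../site/Python/skip-header-lines.html") would return True right after the Markdown file was edited and False once the page had been rebuilt.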

The rsync command is clever in the same way make is. It compares the local files to the remote ones and only uploads those that need updating. That doesn’t save much time with the HTML files, none of which are more than a few kilobytes, but it prevents uploading hundreds or thousands of kilobytes of unchanged image files.

The Makefile is run by cding into the Hints/bin directory and simply running

make

I have a Keyboard Maestro macro that does this, triggered by ⌃⌥⌘H.

As mentioned above, there will be a followup post with explanations of the scripts called by the Makefile. They’re pretty short—the longest one has fewer than 60 lines and most of that is HTML—but I thought it would be best to give an overview first.


  1. The current term of art is Personal Knowledge Management (PKM) systems, but I still think of them as wikis. In my case, as you’ll see, it’s really just a set of web pages with a table of contents. 

  2. If I’m on an iPad or iPhone, I’ll have to connect to one of my Macs by using Prompt or some other SSH client.