Scripts for my homemade wiki
August 8, 2021 at 11:52 AM by Dr. Drang
In yesterday’s post, we went through the directory structure of my personal wiki and the Makefile that builds and uploads the various HTML pages that comprise it. Today, we’ll go through the three little scripts the Makefile runs to do the work. As a reminder, here’s what a wiki page looks like:
The fundamental script is md2html, which does exactly what you’d expect from its name. It takes a Markdown file as input and produces an HTML file as output. Here it is:
python:
1: #!/usr/bin/python3
2:
3: import sys
4: import os
5: import markdown as md
6:
7: # Markdown extensions
8: myExt = ["tables", "codehilite", "smarty", "md_in_html"]
9:
10: # Read in the Markdown source
11: source = open(sys.argv[1], "r", encoding="utf8").read()
12:
13: # Get the page title from the first line
14: title = source.splitlines(keepends=False)[0]
15: title = title.strip('# ')
16:
17: # Convert it to a section of HTML
18: mainSnippet = md.markdown(source, extensions=myExt)
19:
20: # Page template
21: fullHTML=f'''<html>
22: <head>
23: <title>Hints: {title}</title>
24: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
25: <link rel="stylesheet" type="text/css" media="all and (max-device-width:480px) and (orientation:portrait)" href="../resources/mobile.css" />
26: <link rel="stylesheet" type="text/css" media="all and (max-device-width:480px) and (orientation:landscape)" href="../resources/style.css" />
27: <link rel="stylesheet" type="text/css" media="all and (min-device-width:480px)" href="../resources/style.css" />
28: <link rel="stylesheet" type="text/css" media="all" href="../resources/pygstyle.css" />
29: <script type="text/javascript"
30: id="MathJax-script" async
31: src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
32: </script>
33: </head>
34: <body>
35: <div id="container">
36: <div id="content">
37:
38: {mainSnippet}
39:
40: <!--#config timefmt="%b %d, %Y at %-I:%M %Z" -->
41: <p style="text-align: right;font-size: 75%;margin-top: 2em;">Last modified: <!--#echo var="LAST_MODIFIED" -->
42:
43: </div> <!-- content -->
44:
45: <div id="sidebar">
46:
47: <h1><a href="../AAAA/index.html">Hints home</a></h1>
48:
49: <!--#include virtual="/path/to/resources/sidebar.html" -->
50:
51: </div><!-- sidebar -->
52: </div><!-- container -->
53: </body>
54: </html>
55: '''
56:
57: print(fullHTML)
As you can see, most of the code, Lines 21–55, is just an HTML template. The real work is done by the lines above it.
I could do the Markdown-to-HTML conversion in any number of ways, but I chose to do it in Python for two reasons:
- I’m most comfortable in Python.
- Python has the well-tested and wide-ranging Pygments library for highlighting code. As I said yesterday, my wiki is expected to include a lot of code snippets, and I want that code to be both easy to read and easy to copy from. Pygments gives me both of those.
The main library imported by md2html is Python-Markdown, which has a variety of extensions that go beyond Gruber’s Markdown. The ones I’m using are shown in Line 8:
- tables is an implementation of tables similar to MultiMarkdown’s.
- codehilite calls Pygments to highlight code snippets.
- smarty is an implementation of Gruber’s SmartyPants code for converting straight quotation marks and apostrophes into the appropriate curly versions.
- md_in_html allows me to use Markdown inside blocks of straight HTML code. I don’t expect to use this much, but it’s very handy to have available.
Line 11 reads the Markdown input file into the variable source. All the source files start with a level-one header that defines the page’s title. Here, for example, is the source of the page shown above:
# Skipping header lines while reading file #

If a data file that's going to be read line-by-line starts with a set of header lines, skip over them using [`next()`][1]:

    :::python
    headerlines = 12
    with open('data.txt', 'r') as f:
        for dummy in range(headerlines):
            next(f)
        for line in f:
            <process data line>

If the data portion is in CSV format, and I want to use Pandas to read it into a dataframe, use the `skiprows` parameter to the [`read_csv()`][2] function:

    :::python
    import pandas as pd
    df = pd.read_csv('data.csv', skiprows=12)

In this case, don't skip the header row that provides the names of the columns.

[1]: https://docs.python.org/3/library/functions.html#next
[2]: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
As you can see, the first line starts with (and optionally ends with) the single hash mark that identifies a level-one header. Lines 14–15 of md2html get that line from source, strip off the hashes and spaces from the beginning and end, and define the remainder as the page’s title.
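Here is a minimal sketch of that title extraction, run on a made-up string instead of a real file. The helper name page_title and the sample text are assumptions for illustration; the two steps inside are the same ones as Lines 14–15.

```python
def page_title(source):
    """Return the first line of source with '#' and spaces removed from both ends."""
    first_line = source.splitlines(keepends=False)[0]
    return first_line.strip('# ')


# Hypothetical page source: a level-one header, then some body text
sample = "# Skipping header lines while reading file #\n\nBody text here.\n"
print(page_title(sample))
```

One thing to keep in mind: strip removes every leading and trailing character that appears in its argument, so a title that legitimately ends in a hash mark (a page about C#, say) would lose it.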
Line 18 then does all the conversion of source into a section of HTML code. It uses the extensions described above in the conversion.
The rest of the code is just an HTML page template with three spots for insertion:
- Line 38 is where the mainSnippet from Line 18 is inserted.
- Lines 40–41 are server side include (SSI) lines for inserting the modification date of the file.
- Line 49 is an SSI line for inserting the table of contents in the right sidebar.
mainSnippet is inserted when md2html is run. The SSI lines are left verbatim in the resulting HTML file; their insertions are done by the server when the page is requested by a browser.
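The difference between the two kinds of insertion can be seen in a cut-down template. This is only a sketch: the variable names and the tiny template are stand-ins, not the real one from md2html. The f-string fills in the braces when the script runs, while the SSI comment passes through untouched for the server to handle later.

```python
# Stand-in for the HTML that Python-Markdown would produce
main_snippet = "<h1>Example page</h1>"

# The f-string substitutes {main_snippet} now; the SSI line is plain text to Python
page = f'''<div id="content">
{main_snippet}
<p>Last modified: <!--#echo var="LAST_MODIFIED" -->
</div>'''

print(page)
```

This works without any escaping because the SSI directives contain no curly braces. A template that needed literal braces (inline CSS or JavaScript, for instance) would have to double them inside an f-string.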
Line 57 prints out the HTML. It gets redirected to a file in the site directory by the shell command we discussed yesterday:
mkdir -p ../site/Python && ./md2html ../source/Python/skip-header-lines.md > ../site/Python/skip-header-lines.html
Where this shell command comes from is the subject of our next script: buildPages.
python:
1: #!/usr/bin/python3
2:
3: import os
4:
5: # The root directory is the top level of the source tree
6: rootDir = '../source'
7:
8: # Walk the source tree to get all the Markdown files and
9: # create the commands for writing the HTML files
10: for sourceDir, subdirList, fileList in os.walk(rootDir):
11:     if sourceDir == '../source':
12:         # Skip the top level of the source tree
13:         continue
14:     else:
15:         # The HTML files go in the site tree
16:         siteDir = sourceDir.replace('/source/', '/site/')
17:
18:         # Go through all the files in the current subdirectory
19:         for f in [ x for x in fileList if x[-3:] == '.md' ]:
20:             # The name of the HTML file is the same but with a different extension
21:             hf = f.replace('.md', '.html')
22:
23:             # The command to write an HTML file
24:             print(f'mkdir -p {siteDir} && ./md2html {sourceDir}/{f} > {siteDir}/{hf}')
This script walks through the source directory, identifies all the Markdown source files, and, for each one, generates a shell command like the one we just saw, which creates a new folder in the site directory (if necessary) and makes the HTML file. buildPages doesn’t actually do any building; it just creates a file of shell commands that the Makefile will call upon to do the building.
The key to buildPages is Python’s os.walk command, called on Line 10. When used in a loop, it recursively works its way down the directory structure, finding all the subdirectories and files within. Since none of our Markdown source files are in the top level of the source directory, Lines 11–13 skip over that to concentrate on the topic subdirectories.
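To see what os.walk hands back on each pass, here is a small self-contained sketch. It builds a throwaway source tree in a temporary directory (the two topic folders and file names are just examples) and then walks it, skipping the top level the same way buildPages does.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'source')

    # Hypothetical layout: two topic folders with one Markdown page each
    for topic, page in [('Python', 'skip-header-lines.md'),
                        ('Mac', 'sudo-by-touchid.md')]:
        os.makedirs(os.path.join(root, topic))
        with open(os.path.join(root, topic, page), 'w', encoding='utf8') as fh:
            fh.write('# Placeholder #\n')

    visited = []
    for source_dir, subdir_list, file_list in os.walk(root):
        if source_dir == root:
            # The top level holds only the topic folders, so skip it
            continue
        visited.append((os.path.basename(source_dir), sorted(file_list)))

# os.walk makes no promise about directory order, so sort before printing
print(sorted(visited))
```

Each pass through the loop yields one directory: its path, a list of its subdirectories, and a list of its files.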
Line 16 figures out where the resulting HTML file should go in the site directory by simply replacing source with site in the file’s path. Then Lines 19–24 loop through the Markdown source files (identified by a .md extension) and build a shell command similar to the one we saw above.
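As a quick check, the path handling can be pulled into a function and run on the example page. The function name build_command is an assumption made for this sketch; the three lines inside mirror Lines 16, 21, and 24 of buildPages.

```python
def build_command(source_dir, md_name):
    """Return the shell command that builds one HTML page from one Markdown file."""
    site_dir = source_dir.replace('/source/', '/site/')
    html_name = md_name.replace('.md', '.html')
    return (f'mkdir -p {site_dir} && '
            f'./md2html {source_dir}/{md_name} > {site_dir}/{html_name}')


print(build_command('../source/Python', 'skip-header-lines.md'))
```

Because the search string includes both slashes, the bare top-level path ../source would not be changed by the replacement, which is harmless here since that level is skipped anyway. The sketch also assumes, as the original seems to, that directory and file names contain no spaces; names with spaces would need quoting in the generated command.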
The Makefile calls buildPages during the prep stage and redirects the output to a file called buildAll. When that’s done, buildAll contains a line for creating each HTML file. As we saw yesterday, the Makefile executes whichever lines are necessary to build new pages or update edited ones.
The last script we need is pagelist, which creates the table of contents that’s put in the sidebar.
python:
1: #!/usr/bin/python3
2:
3: import os
4: import os.path
5: import sys
6:
7: # The root directory is the top level of the source tree
8: rootDir = '../source'
9:
10: # Walk through all the subdirectories in rootDir except AAAA
11: # and create category headers and page lists
12: for sourceDir, subdirList, fileList in os.walk(rootDir):
13:     subdirList.sort()
14:     if sourceDir in (rootDir, os.path.join(rootDir, 'AAAA')):
15:         continue
16:
17:     # The category name is the last part of sourceDir
18:     category = os.path.basename(sourceDir)
19:     print(f"<h1>{category}</h1>")
20:     print("<ul>")
21:
22:     # Loop through all the Markdown files in fileList
23:     fileList.sort()
24:     for f in [ x for x in fileList if x[-3:] == '.md' ]:
25:         # HTML file name
26:         hf = f.replace('.md', '.html')
27:
28:         # Get the name of the page from the first line of the source file
29:         firstLine = open(f'{os.path.join(sourceDir, f)}', 'r', encoding='utf8').readline()
30:         name = firstLine.strip(' #\n')
31:
32:         # Set the path to the HTML file
33:         htmlPath = os.path.join('..', category, hf)
34:
35:         # Add the page link to the list
36:         print(f'<li><a href="{htmlPath}">{name}</a></li>')
37:
38:     # Close out the category list
39:     print("</ul>")
The basic structure of pagelist is the same as that of buildPages. It uses os.walk to work its way down the source directory tree and identify all the Markdown documents in it. The key difference is that pagelist alphabetizes the subdirectories (Line 13) and uses the names of those subdirectories to create a header (Lines 18–19). It also alphabetizes the individual Markdown source files (Line 23) before looping through them (Lines 24–36).
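The sort on Line 13 works because os.walk, in its default top-down mode, lets the caller reorder the subdirectory list in place and then visits the subdirectories in that new order. A small sketch, using a temporary directory and folder names borrowed from the sidebar (created in deliberately scrambled order):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create the topic folders out of alphabetical order
    for name in ['Shapely', 'LaTeX', 'Pandas', 'Mac']:
        os.mkdir(os.path.join(root, name))

    order = []
    for current_dir, subdir_list, file_list in os.walk(root):
        # Sorting in place changes the order os.walk will descend in
        subdir_list.sort()
        if current_dir != root:
            order.append(os.path.basename(current_dir))

print(order)
```

Note that the sort has to be in place. Assigning a new sorted list to the loop variable would leave the list that os.walk is holding unchanged.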
For each Markdown source file, Line 26 determines the name of the corresponding HTML file by replacing the extension. Lines 29–30 then get the page title by reading the first line of the source file and stripping off the hashes and whitespace from the line’s beginning and end. Because the readline command keeps the trailing newline character, the strip command has to include \n in its argument to get rid of it.
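The effect of leaving \n out of the strip argument is easy to demonstrate. This sketch uses io.StringIO as a stand-in for an open source file; the page text is made up.

```python
import io

# Stand-in for an open Markdown source file
fake_file = io.StringIO("# Sudo via TouchID #\nRest of the page\n")
first_line = fake_file.readline()

without_newline_arg = first_line.strip(' #')    # trailing newline blocks the right-hand strip
with_newline_arg = first_line.strip(' #\n')     # what pagelist does on Line 30

print(repr(first_line))
print(repr(without_newline_arg))
print(repr(with_newline_arg))
```

Without \n in the argument, the newline at the very end stops the stripping on the right side, so the closing hash and the newline both survive.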
Line 33 then sets up the path to the HTML page, and Line 36 builds the list item that links to that page. Line 39 closes out the list of pages for the current category that was started on Line 20.
The output of pagelist is redirected by the Makefile to the sidebar.html file of the resources subdirectory. The file looks like this,
html:
<h1>LaTeX</h1>
<ul>
<li><a href="../LaTeX/breaking_pages.html">Breaking pages</a></li>
<li><a href="../LaTeX/set-up-mactex.html">Set up MacTeX</a></li>
<li><a href="../LaTeX/splitting_big_tables.html">Splitting big tables</a></li>
</ul>
<h1>Mac</h1>
<ul>
<li><a href="../Mac/realphabetize-mail-folders.html">Realphabetize Mail folders</a></li>
<li><a href="../Mac/sudo-by-touchid.html">Sudo via TouchID</a></li>
</ul>
<h1>Matplotlib</h1>
<ul>
<li><a href="../Matplotlib/tweakable-plot-template.html">Tweakable plot template</a></li>
</ul>
<h1>Pandas</h1>
<ul>
<li><a href="../Pandas/combine_rows_through_groupby.html">Combine rows through groupby</a></li>
<li><a href="../Pandas/three_value_booleans.html">Three-value booleans</a></li>
</ul>
<h1>Python</h1>
<ul>
<li><a href="../Python/skip-header-lines.html">Skipping header lines while reading file</a></li>
</ul>
<h1>Shapely</h1>
<ul>
<li><a href="../Shapely/defining-irregular-shapes.html">Defining an irregular shape</a></li>
<li><a href="../Shapely/inset.html">Inset of a shape</a></li>
<li><a href="../Shapely/point-position-checks.html">Point position checks</a></li>
</ul>
and is inserted into every page by the SSI directive described above.
So there you have it. Just 136 lines of code (these three scripts plus the Makefile), with over half of that being either HTML boilerplate, comments, or blank lines. Yes, you’re right, I’m not including the CSS files that organize the layout of the pages; but as I said yesterday, that was already written. I just had to edit the color values in 5–10 lines.
As I said on MPU, I did try Obsidian, which has the nice feature of starting from a folder of Markdown files. But I was appalled by its out-of-the-box looks, and most of the community themes were just as bad.[1] And then when I looked into the structure of Obsidian’s themes, I realized that rolling my own wiki would be just as fast as tweaking a theme to my liking.
The growing enthusiasm over Obsidian hasn’t changed my mind. If anything, it’s hardened me against it. The plethora of plugins and tweaks available for it have convinced me that if I were to use it for my wiki of hints, most of the hints would be about Obsidian itself. That’s not my goal. If you can use Obsidian without continually adjusting it, you have a stronger mind (and stomach) than I do. I’ll stick with plain ol’ HTML.
[1] I agree with just about everything Adam Engst said about dark mode. It can be useful in certain graphics applications, but it’s miserable for dealing with text. I know lots of people think it looks cool. They’re wrong.