A name display bug in Files

I’ve written in the past about how I’ve had to use tags to work around a misfeature of Files: that it doesn’t display full file names, including the extensions. Today I learned that there’s a bug in that misfeature.

I was working from my iPad on some files that were stored in iCloud Drive. Here’s Files showing the contents of the folder I was working in:

Files inconsistent in showing extensions

The file with the blue dot has a .tex extension. The extension isn’t shown, which is the normal (and frustrating) way Files behaves. But the file with the green dot has a .py extension and its extension is shown. What’s going on?

My first concern was that the green file somehow had two extensions (that its full name was sampling.py.py), so I opened a Terminal in Textastic,[1] navigated to the appropriate iCloud folder, and ran ls to see the real, full names of both files.

Terminal session showing full file names

As you can see, there was no double extension.[2] The problem wasn’t that I had somehow misnamed the Python file; it was that Files has a bug in the name display. It happens to be a bug I approve of, and one that I wish were consistent across all file types at all times, but it’s a bug nonetheless.

It says something about the state of software quality at Apple that it can’t even get its design errors right.

  1. Normally, I’d do this in Prompt, but I was in the middle of editing one of the files in Textastic when I noticed the extension anomaly in Files, so it was quicker to start a terminal session there. I really like Textastic. 

  2. If you’re looking at the path in the terminal session and wondering why an iCloud folder isn’t in ~/Library/Mobile Documents/com~apple~CloudDocs, it’s because I made a symbolic link to the projects folder in iCloud from a projects directory in my home directory. Saves time when using the cd command, whether I’m working directly on my Mac or remotely through terminal emulation. 

Floor plans with Python and Shapely

In yesterday’s post, we outlined portions of a bitmapped floor plan drawing of the US Capitol, saved the resulting graphic as an SVG file, and used one of Python’s XML libraries to extract the coordinates of the various boundaries. We ended up with seven CSV files: outline.csv, scale.csv, and five files named cutout-01.csv through cutout-05.csv. Today, we’ll use the Shapely library to calculate some properties of the shapes we drew.

Annotated Capitol floor plan

Recall that the blue lines are the overall outline of the building’s third floor interior, the red lines are the outlines of the openings in the third floor that accommodate various multistory rooms in the building, and the green box is set to match the length of the 64-foot scale below the title.

The feature of the Shapely library that does the lion’s share of the calculating for us is the Polygon object, which has some very nice properties for handling real-world shapes. First, it doesn’t require the polygon to be convex, so we can model the reentrant corners that are common in buildings, landscaping, and property boundaries. Second, it allows for holes within the outer shell of the polygon, which is perfect for handling floor openings, atriums, and other items that make shapes multiply connected.
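Here’s a minimal sketch of both features, using a made-up L-shaped room with a square opening instead of the Capitol data. The coordinates and variable names are invented for illustration.

```python
from shapely.geometry import Polygon

# Hypothetical L-shaped room, in feet. The corner at (2, 2) is reentrant,
# so the shape is not convex.
shell = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

# Hypothetical 1 ft x 1 ft opening inside the room.
opening = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]

gross = Polygon(shell)            # outer boundary only
net = Polygon(shell, [opening])   # second argument is a list of holes

print(gross.area)   # 12.0
print(net.area)     # 11.0
```

The second argument to Polygon is a list because a shape can have any number of holes, which is what we need for the five openings in the Capitol floor.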

Our goal is to work out the floor area of the third floor. That’s the area enclosed by the outer walls minus the openings. Here’s the code that does it:

 1:  from shapely.geometry import Polygon
 2:  import pandas as pd
 4:  # Work out the scale of the drawing
 5:  dfscale = pd.read_csv('scale.csv')
 6:  scale = 64/(dfscale.x.max() - dfscale.x.min())
 8:  # Get the outline and scale it
 9:  dfoutline = pd.read_csv('outline.csv')
10:  dfoutline.x = dfoutline.x*scale
11:  dfoutline.y = dfoutline.y*scale
13:  # Do the same for the cutouts
14:  dfcutouts = []
15:  for i in range(5):
16:    dfcutouts.append(pd.read_csv(f'cutout-{i+1:02}.csv'))
17:    dfcutouts[i].x = dfcutouts[i].x*scale
18:    dfcutouts[i].y = dfcutouts[i].y*scale
20:  # Make a polygon without holes
21:  ocoords = list(zip(dfoutline.x, dfoutline.y))
22:  outline = Polygon(ocoords)
23:  print(f'{"Area of outline":>20s}: {outline.area:7,.0f} ft²')
25:  # Make a polygon with holes to represent the floor
26:  ccoords = []
27:  for i in range(5):
28:    ccoords.append(list(zip(dfcutouts[i].x, dfcutouts[i].y)))
29:  floor = Polygon(ocoords, ccoords)
30:  print(f'{"Floor area":>20s}: {floor.area:7,.0f} ft²')
32:  # Check all the holes individually
33:  holes = ['House Chamber', 'Statuary Hall', 'Great Rotunda', 'Old Senate Chamber', 'Senate Chamber']
34:  for i in range(5):
35:    hole = Polygon(ccoords[i])
36:    print(f'{holes[i]:>20s}: {hole.area:7,.0f} ft²')

And here are the results:

     Area of outline: 134,107 ft²
          Floor area:  95,848 ft²
       House Chamber:  13,493 ft²
       Statuary Hall:   5,128 ft²
       Great Rotunda:   7,010 ft²
  Old Senate Chamber:   2,869 ft²
      Senate Chamber:   9,759 ft²

The script starts by using the Pandas library to read in the CSV files and arrange them into dataframes. Pandas isn’t really necessary, as the Python standard library has a CSV module, but I’m used to Pandas and I like how quickly you can do calculations on dataframes.

Lines 5 and 6 use the coordinates of the green box saved in scale.csv to calculate the scale of the drawing in feet per pixel. That value, saved in the variable scale, is later used to convert the coordinates of the outline and the cutouts from pixels to feet.
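As a sketch of the same calculation without Pandas, here is the scale worked out with the standard library’s csv module. The CSV text is what I’d expect scale.csv to contain, derived from the green box’s path in the SVG from yesterday’s post; reading it from a string rather than a file is just to keep the example self-contained.

```python
import csv
import io

# Assumed contents of scale.csv, taken from the scale path in the SVG.
scale_csv = """x,y
927.108,1295.893
1081.541,1295.893
1081.541,1313.799
927.108,1313.799
"""

# Pull out the x coordinates of the box's corners.
xs = [float(row['x']) for row in csv.DictReader(io.StringIO(scale_csv))]

# The box spans 64 feet horizontally, so this is feet per pixel.
scale = 64/(max(xs) - min(xs))
print(f'{scale:.4f} ft/pixel')   # 0.4144 ft/pixel
```

The box is about 154 pixels wide, so each pixel is a little under 5 inches.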

When Lines 9–11 are done, the dfoutline dataframe contains the coordinates in feet of all the vertices of the outer boundary. Similarly, when Lines 14–18 are done, dfcutouts is a list of dataframes with coordinates in feet, one for each of the floor openings.

Now we start using Shapely. Line 21 puts the x and y values of dfoutline into a list of coordinate pairs. Line 22 constructs a Polygon, named outline, from that list, and Line 23 uses the area property to print out the area. This is the gross area enclosed by the boundary. Our next step is to figure out the net area.
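For an independent check on what the area property reports, here’s a sketch of the shoelace formula in plain Python, applied to the simplest boundary in the drawing, the four-sided Senate Chamber cutout. I’m not claiming this is how Shapely works internally. The function name is mine, and the pixel coordinates are copied from the SVG in yesterday’s post.

```python
def shoelace_area(pts):
    '''Area of a simple polygon from its vertices, given in order.'''
    total = 0.0
    # Pair each vertex with the next one, wrapping around at the end.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1*y2 - x2*y1
    return abs(total)/2

# Feet per pixel, from the 64-foot scale box.
scale = 64/(1081.541 - 927.108)

# Senate Chamber cutout vertices in pixels, then in feet.
senate_px = [(1611.074, 553.144), (1814.482, 549.597),
             (1811.978, 833.534), (1611.32, 831.757)]
senate_ft = [(x*scale, y*scale) for x, y in senate_px]

print(f'{shoelace_area(senate_ft):,.0f} ft²')
```

That prints 9,759 ft², which matches the Senate Chamber line in the results.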

Lines 26–28 set up a list of lists of coordinate pairs for all the floor openings. The floor is then defined in Line 29 using a variant on the Polygon constructor that includes holes. This means floor is a multiply connected polygon, and the area property used in Line 30 accounts for the holes.

Note that if all we wanted was the net floor area, we didn’t have to create the outline polygon. I just did that so we could see the difference.

Another unnecessary bit is the set of calculations in Lines 33–36, which tells us the area of each opening. It just lets us check that the sum of the holes is equal to the difference between the outline and floor areas.
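The check is simple arithmetic on the printed results. The values below are the rounded ones from the output, so exact agreement isn’t guaranteed in general, but here it works out to the square foot.

```python
# Areas in ft², copied from the rounded results printed by the script.
outline_area = 134_107
floor_area = 95_848
hole_areas = [13_493, 5_128, 7_010, 2_869, 9_759]

print(outline_area - floor_area)   # 38259
print(sum(hole_areas))             # 38259
```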

While Shapely would be nice if all it did was calculate the areas of irregular shapes, that isn’t all it does. One of the things I use it for is to generate random sampling spots for floor tile, grout, slab concrete, etc. I can generate (x, y) coordinates over the enclosing rectangle of a floor and then use the contains method to filter out those that aren’t within the irregular floor shape of interest. There are also some very helpful set-theoretic functions like intersection, union, and difference, for handling relationships between several shapes.
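The sampling idea can be sketched like this. The function name, the seed argument, and the rectangular floor with one opening are all made up for the example; the real thing would use the floor polygon built from the drawing.

```python
import random
from shapely.geometry import Point, Polygon

def sample_points(shape, n, seed=None):
    '''Return n random Points inside shape, by rejection sampling.'''
    rng = random.Random(seed)
    minx, miny, maxx, maxy = shape.bounds   # enclosing rectangle
    points = []
    while len(points) < n:
        p = Point(rng.uniform(minx, maxx), rng.uniform(miny, maxy))
        # Keep only the points that land on the floor itself,
        # not outside the boundary or in an opening.
        if shape.contains(p):
            points.append(p)
    return points

# Hypothetical floor: a 40 ft x 30 ft room with a 10 ft x 10 ft opening.
floor = Polygon([(0, 0), (40, 0), (40, 30), (0, 30)],
                [[(15, 10), (25, 10), (25, 20), (15, 20)]])

for p in sample_points(floor, 5, seed=1):
    print(f'{p.x:6.2f}, {p.y:6.2f}')
```

Because contains respects the holes, points that fall in an opening are rejected along with those outside the outer boundary.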

I should mention that breaking the work into two scripts, one for creating the vertex CSVs and the other for calculating the area, is inefficient. A single script that reads the SVG and puts the vertex coordinates directly into lists would have eliminated all the CSV stuff. But sometimes, especially when you’re first learning the details of a library, it’s nice to have the intermediate steps saved out to files so you can satisfy yourself that things are working the way you expect.
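A rough sketch of what the single-script version might look like follows. The helper names are mine, and the tiny SVG string with made-up coordinates stands in for the exported drawing. It assumes paths made only of M, L, and Z commands with plain decimal numbers.

```python
import re
import xml.etree.ElementTree as et

SVG_NS = {'svg': 'http://www.w3.org/2000/svg'}

def path_coords(path):
    '''Return the vertices of an M/L/Z-only SVG path as (x, y) floats.'''
    numbers = re.findall(r'-?\d+(?:\.\d+)?', path.get('d'))
    values = [float(n) for n in numbers]
    # Numbers alternate x, y, x, y, ...
    return list(zip(values[0::2], values[1::2]))

def layer_coords(svg, layer_id):
    '''Return one list of vertices for each path in the named layer.'''
    layer = svg.find(f'.//svg:g[@id="{layer_id}"]', SVG_NS)
    return [path_coords(p) for p in layer.findall('.//svg:path', SVG_NS)]

# A stand-in for the exported drawing.
doc = '''<svg xmlns="http://www.w3.org/2000/svg">
  <g id="outline"><path d="M0,0 L100,0 L100,50 L0,50 z"/></g>
  <g id="cutouts">
    <path d="M10,10 L20,10 L20,20 L10,20 z"/>
    <path d="M60,10 L80,10 L80,30 L60,30 z"/>
  </g>
</svg>'''
svg = et.fromstring(doc)

ocoords = layer_coords(svg, 'outline')[0]
ccoords = layer_coords(svg, 'cutouts')
print(ocoords)
print(len(ccoords))
```

With a real file, et.parse('capitol.svg').getroot() would replace the et.fromstring call, and ocoords and ccoords could go straight into the Polygon constructor after scaling.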

A couple of years ago, I wrote a library for calculating section properties. It does a lot of things that Shapely doesn’t, but it also repeats a lot of what Shapely does and isn’t as versatile in handling input data. On my list of future projects is merging the two, so I can get all the important section properties from a Shapely Polygon.

Processing layered SVGs

Over the past few months, I’ve been creating and processing a lot of SVG files. Initially, the processing was mostly manual, via cutting, pasting, finding, and replacing in BBEdit. But I’ve gradually learned how to use Python’s XML modules to automate the processing, which has both sped up my work and increased its accuracy.

Most of my work has started with me tracing over the areas of interest in plan view drawings so I can calculate distances and areas. The drawings usually come to me as bitmapped images[1] like this one of the US Capitol, which I downloaded from Wikipedia.

Third floor Capitol floor plan

Let’s say I want to get the area of this floor. I start by importing the file into a vector graphics program like Graphic, OmniGraffle, or Affinity Designer. Then I make layers above the bitmap and draw in the outer boundary, the holes or cutouts in the floor, and some length or feature that I can use to scale the drawing. Here’s what it looks like in Graphic on the iPad.

Annotating floor plan in Graphic

You can see the layers to the right of the image. The drawing layer is for the bitmap; the outline layer is for the outer boundary in blue; the cutouts layer is for the open areas of the Rotunda, the House and Senate chambers, the Old Senate Chamber, and Statuary Hall, outlined in red; and the scale layer is for the green box drawn over the 64-foot scale just below the title.

I’ve made this graphic higher in resolution than I normally do for images here so you can zoom in to see the structure if you like. If you do, you’ll notice that the curved areas of the cutouts are approximated by a series of straight lines. I could have used Bezier curves for those portions, but straight lines work better with the Shapely library, which is what I’m going to use to calculate the area (in the next blog post; this one is going to concentrate on processing the SVG, which I’m about to get to).
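For a rough sense of how much area is lost by replacing a curve with straight lines, here’s an idealized sketch: a regular polygon with its vertices on a circle. Hand-traced vertices won’t be evenly spaced like this, so treat it as an order-of-magnitude estimate and not a statement about the actual drawing.

```python
import math

def inscribed_polygon_area(r, n):
    '''Area of a regular n-sided polygon with vertices on a circle of radius r.'''
    return 0.5*n*r**2*math.sin(2*math.pi/n)

r = 1.0
circle = math.pi*r**2
for n in (10, 20, 40):
    # Fraction of the circle's area that the polygon misses.
    shortfall = 1 - inscribed_polygon_area(r, n)/circle
    print(f'{n:3d} sides: {shortfall:.2%} under')
```

With a few dozen segments around a full circle, the shortfall is well under one percent.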

So now I have a document in Graphic’s native format, which doesn’t do me much good. What I need to do next is export it as an SVG file, which, conveniently, Graphic can do on both the Mac and iPad. What comes out is a very large text file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg"
  xmlns:xlink="http://www.w3.org/1999/xlink" x="0" y="0" width="2000" height="1500"
  viewBox="0, 0, 2000, 1500">
  <g id="Background">
    <rect x="0" y="0" width="2000" height="1500" fill="#FFFFFF"/>
    <clipPath id="Clip_1">
      <path d="M80.5,144.5 L1919.5,144.5 L1919.5,1355.5 L80.5,1355.5 z"/>
    </clipPath>
  </g>
  <g id="drawing">

      [lots and lots of base64 snipped out]

      opacity="1" x="80.5" y="144.5" width="1839" height="1211"
      preserveAspectRatio="xMidYMid" clip-path="url(#Clip_1)"/>
  </g>
  <g id="outline">
    <path d="M123.599,412.613 L446.663,412.9 L452.08,664.029 L578.999,662.481
      L579.773,552.975 L796.461,552.575 L797.666,380.968 L1200.281,379.81
      L1199.042,551.939 L1420.726,551.234 L1422.843,657.932 L1547.177,658.461
      L1547.066,407.545 L1876.229,404.746 L1872.399,972.508 L1548.302,975.134
      L1547.528,831.19 L1419.062,831.19 L1419.448,910.514 L1195.406,911.675
      L1195.793,895.036 L803.041,895.423 L803.815,912.836 L579.367,914.412
      L577.414,836.858 L453.829,837.974 L452.992,978.854 L124.207,981.041
      L123.599,412.613 z" fill-opacity="0" stroke="#0000FF" stroke-width="12"
      stroke-opacity="0.518" id="Shape"/>
  </g>
  <g id="cutouts">
    <path d="M173.197,526.183 L401.5,526 L404.38,869.031 L175.5,870.5 z" 
      fill-opacity="0" stroke="#FF0000" stroke-width="12"/>
    <path d="M677.405,581.473 L701.206,587.925 L724.173,600.382 L743.054,615.953 
      L759.793,635.807 L769.33,658.19 L775.753,680.185 L776.142,709.575
      L769.33,733.516 L760.377,755.121 L743.248,776.532 L725.536,791.714
      L702.374,803.976 L679.406,809.815 L654.687,810.594 L654.687,789.962
      L614.591,789.962 L614.172,600.618 L650.926,600.486 L650.753,581.647
      L677.405,581.473 z" fill-opacity="0" stroke="#FF0000" stroke-width="12"/>
    <path d="M1005.316,580.723 L1024.764,584.138 L1043.23,589.835 L1059.928,598.479
      L1075.054,608.89 L1087.037,621.659 L1096.663,633.249 L1103.342,648.965
      L1110.218,663.305 L1112.182,677.449 L1113.164,694.933 L1113.361,709.273
      L1110.807,723.221 L1103.735,741.294 L1094.699,757.992 L1084.483,773.118
      L1070.339,784.905 L1057.767,794.138 L1043.426,801.603 L1027.121,807.692
      L1012.192,810.594L998.244,810.594 L983.314,808.675 L966.813,804.942
      L949.919,797.87 L934.399,789.227 L918.684,774.886 L908.862,761.921
      L898.646,747.384 L891.574,732.847 L889.021,717.721 L886.86,703.969
      L886.467,692.969 L887.449,678.628 L892.557,661.537 L899.039,641.696
      L907.29,628.927 L919.666,614.587 L933.614,602.211 L947.168,593.764
      L961.705,586.692 L976.832,582.763 L987.44,580.723 L1005.316,580.723 z"
      fill-opacity="0" stroke="#FF0000" stroke-width="12"/>
    <path d="M1221.709,806.373 L1222.147,790.763 L1225.794,776.028 L1232.213,760.419
      L1240.383,747.581 L1250.157,735.91 L1263.141,726.865 L1276.416,719.716
      L1289.108,715.923 L1302.238,713.443 L1313.471,712.568 L1325.58,713.881
      L1337.396,716.507 L1348.338,720.592 L1360.009,725.114 L1371.242,732.846
      L1379.703,741.453 L1386.852,750.644 L1393.271,761.44 L1398.814,773.111
      L1401.44,784.781 L1402.316,792.805 L1403.483,805.643 L1377.223,805.497
      L1375.473,831.757 L1246.51,830.444 L1245.926,806.518 L1221.709,806.373 z"
      fill-opacity="0" stroke="#FF0000" stroke-width="12"/>
    <path d="M1611.074,553.144 L1814.482,549.597 L1811.978,833.534 L1611.32,831.757
      L1611.074,553.144 z" fill-opacity="0" stroke="#FF0000" stroke-width="12"/>
  </g>
  <g id="scale">
    <path d="M927.108,1295.893 L1081.541,1295.893 L1081.541,1313.799 L927.108,1313.799 z"
    fill="#008100" fill-opacity="0.588"/>
  </g>
</svg>

I’ve redacted most of the data associated with the bitmapped drawing, and I’ve added hard line breaks to make it easier to read. In the original, for example, each of the <path> elements is just one long line.

As you can see, each of the layers is inside a <g> element, and the id attribute is the layer’s name. The individual boundaries are <path> elements, and the coordinates of their vertices are inside the d attribute.

If you’ve ever programmed in PostScript or similar graphical languages, the d attributes might look familiar. The M is equivalent to PostScript’s moveto command, the L is like lineto, and the Z is like closepath.[2]
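To make the correspondence concrete, here’s a small sketch that splits a d string into PostScript-style commands. The function is hypothetical and handles only the M, L, and Z commands that appear in this drawing. The path is the scale box from the SVG above.

```python
import re

def path_commands(d):
    '''Split an M/L/Z-only path string into (command, x, y) tuples.'''
    commands = []
    # Each command letter is followed by its (possibly empty) arguments.
    for cmd, args in re.findall(r'([MLZz])([^MLZz]*)', d):
        nums = [float(n) for n in re.findall(r'-?\d+(?:\.\d+)?', args)]
        if cmd in 'Zz':
            commands.append(('closepath', None, None))
        else:
            name = 'moveto' if cmd == 'M' else 'lineto'
            commands.append((name, nums[0], nums[1]))
    return commands

d = 'M927.108,1295.893 L1081.541,1295.893 L1081.541,1313.799 L927.108,1313.799 z'
for c in path_commands(d):
    print(c)
```

The output is one moveto, three linetos, and a closepath, which is how you’d draw the same box in PostScript.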

To prepare for analysis by the Shapely library, I want to pull out all the vertex coordinates and write them out into CSV files. There will be a CSV file for each of the boundaries, and each line of a CSV file will be the x, y coordinates of a vertex. Here’s the code that does it:

 1:  import xml.etree.ElementTree as et
 2:  import re
 4:  def path2csv(path):
 5:    '''Transform an SVG path into vertex coordinates for a CSV file.
 7:    The path must consist of only moveto (M) and lineto (L)
 8:    commands with an optional closepath (Z) at the end.'''
10:    pstr = path.get('d')
11:    # Delete the starting moveto command
12:    pstr = re.sub(r'^M *', '', pstr)
13:    # Delete the closepath command
14:    pstr = re.sub(r' *[zZ]$', '', pstr)
15:    # Put all the vertex points on separate lines
16:    pstr = re.sub(r' *L *', '\n', pstr)
17:    # Separate the x and y coordinates with a comma only
18:    pstr = re.sub(r' +, +| +,|, +| +', ',', pstr)
19:    # Return the multiline string of comma-separated coordinates
20:    return pstr
22:  # Parse the SVG file and get the root
23:  tree = et.parse('capitol.svg')
24:  svg = tree.getroot()
26:  # Handle the outline layer
27:  layer = svg.find('.//{http://www.w3.org/2000/svg}g[@id="outline"]')
28:  opath = layer.find('.//{http://www.w3.org/2000/svg}path')
29:  f = open('outline.csv', 'w')
30:  f.write('x,y\n')
31:  f.write(path2csv(opath))
32:  f.close()
34:  # Handle the cutouts layer
35:  layer = svg.find('.//{http://www.w3.org/2000/svg}g[@id="cutouts"]')
36:  cpaths = layer.findall('.//{http://www.w3.org/2000/svg}path')
37:  for i, p in enumerate(cpaths):
38:    f = open(f'cutout-{i+1:02}.csv', 'w')
39:    f.write('x,y\n')
40:    f.write(path2csv(p))
41:    f.close()
43:  # Handle the scale layer
44:  layer = svg.find('.//{http://www.w3.org/2000/svg}g[@id="scale"]')
45:  spath = layer.find('.//{http://www.w3.org/2000/svg}path')
46:  f = open('scale.csv', 'w')
47:  f.write('x,y\n')
48:  f.write(path2csv(spath))
49:  f.close()

The module we’ll use for parsing and traversing the SVG is xml.etree.ElementTree, imported as et on Line 1. This treats the SVG as a tree structure, with the elements of the document acting as nodes. Lines 23 and 24 read in the SVG file and parse it, leaving us with the root element of the tree saved in the svg variable. We can now traverse down the tree[3] from svg to find the data we’re looking for.

The next three sections of the script deal with each layer of the drawing in turn, apart from the bitmap layer. Starting with the outline layer, we

  1. Find the first (and only) <g> element in the tree that has an id attribute of outline. We’re using XPath syntax to define the query passed to the find method. The query starts at the current node (.) and searches at all levels below it (//) for a g node with the desired id ([@id="outline"]). The URL in curly braces is the XML namespace that defines the structure of an SVG. You can see it as one of the xmlns attributes of the <svg> root node.
  2. Search the outline layer to find its first (and only) path. The XPath syntax is similar to what we used to find the layer.
  3. Create a CSV file and write the path’s d data to it. The data is formatted through the path2csv function defined on Lines 4–20.

These three steps are then repeated to handle the cutouts and scale layers. The only difference is that cutouts has more than one <path>, so we have to use findall instead of find and then loop through all the paths.[4] I suppose I should have refactored these stanzas into a single function that gets called repeatedly.
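One way that refactoring might look is sketched below. The write_layer function and its filenames argument are my invention, not part of the script above. The example runs against a tiny stand-in SVG and writes to a temporary directory so it can be tried without the real drawing.

```python
import os
import re
import tempfile
import xml.etree.ElementTree as et
from itertools import count

NS = '{http://www.w3.org/2000/svg}'

def path2csv(path):
    '''Same substitutions as the path2csv function in the script.'''
    pstr = path.get('d')
    pstr = re.sub(r'^M *', '', pstr)
    pstr = re.sub(r' *[zZ]$', '', pstr)
    pstr = re.sub(r' *L *', '\n', pstr)
    return re.sub(r' +, +| +,|, +| +', ',', pstr)

def write_layer(svg, layer_id, filenames):
    '''Write each path in the named layer to the next file name.'''
    layer = svg.find(f'.//{NS}g[@id="{layer_id}"]')
    for p, name in zip(layer.findall(f'.//{NS}path'), filenames):
        with open(name, 'w') as f:
            f.write('x,y\n')
            f.write(path2csv(p))

# Stand-in drawing with made-up coordinates.
doc = '''<svg xmlns="http://www.w3.org/2000/svg">
  <g id="outline"><path d="M0,0 L100,0 L100,50 L0,50 z"/></g>
  <g id="cutouts">
    <path d="M10,10 L20,10 L20,20 L10,20 z"/>
    <path d="M60,10 L80,10 L80,30 L60,30 z"/>
  </g>
  <g id="scale"><path d="M0,60 L64,60 L64,62 L0,62 z"/></g>
</svg>'''
svg = et.fromstring(doc)

outdir = tempfile.mkdtemp()
def out(name):
    return os.path.join(outdir, name)

write_layer(svg, 'outline', [out('outline.csv')])
write_layer(svg, 'cutouts', (out(f'cutout-{i:02}.csv') for i in count(1)))
write_layer(svg, 'scale', [out('scale.csv')])
print(sorted(os.listdir(outdir)))
```

Passing the file names as an iterable lets the single-path layers and the multi-path cutouts layer share one function. The zip stops when the layer runs out of paths, so the endless generator of cutout names is harmless.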

I skipped over the path2csv function because I think it’s pretty easy to follow. Basically, it pulls out the d string from the supplied path and applies a series of regular expression substitutions to make a series of text lines with comma-separated coordinates. The trickiest part is that the syntax for d can vary a bit with regard to spaces and commas. I’ve tried to cover the variations I know about and end up with a consistent CSV format. The outline.csv file, for example, looks like this:

x,y
123.599,412.613
446.663,412.9
452.08,664.029
578.999,662.481
579.773,552.975
796.461,552.575
797.666,380.968
1200.281,379.81
1199.042,551.939
1420.726,551.234
1422.843,657.932
1547.177,658.461
1547.066,407.545
1876.229,404.746
1872.399,972.508
1548.302,975.134
1547.528,831.19
1419.062,831.19
1419.448,910.514
1195.406,911.675
1195.793,895.036
803.041,895.423
803.815,912.836
579.367,914.412
577.414,836.858
453.829,837.974
452.992,978.854
124.207,981.041
123.599,412.613


OK, so now we have a series of CSV files with the boundary coordinates given in pixels (or points, depending on which graphics software was used to create the SVG). The next step is to use the Pandas and Shapely libraries to do the analysis. That’ll be next time.

  1. If they came as CAD drawings, the dimensions would be immediately available and I wouldn’t need to go through this rigmarole. 

  2. OK, the Z doesn’t look like closepath at all, but C was already taken for curveto.

  3. Computer scientists live in a world in which trees have their roots at the top. This explains a lot about how computers work. 

  4. The output file names for the cutouts include a sequential number to distinguish between them. The i+1 in Line 38 accounts for the fact that computer scientists like to start counting at zero instead of one. I don’t know if this is related to their trees being upside down. 


Congress’s latest retreat from even the barest pretense of being an equal branch of government, setting precedents that will make it impossible for any future Congress to re-establish its prerogatives (assuming any future Congress would even want to do so), has been weighing on me. I’m finding it harder to believe the US will ever climb out of the hole we’ve dug for ourselves. So I was in the perfect mood for this post from John Gruber today.

When I read William Shirer’s The Rise and Fall of the Third Reich[1] a year or so ago, I was struck by how open Hitler and the Nazi party were about rolling back democracy as they consolidated control in the 30s. I had a typical US education, in which the sweep of history is shown as an inevitable move toward democracy, and it was shocking to learn that the Nazis didn’t even bother to pay lip service to those pieties.

So far, our slide into autocracy hasn’t been as overt—partially because people like David Frum have, until recently at least, been smoothly telling us that everything was fine—but we’re getting there. I’m waiting for the GOP to drop the fig leaf of “voter fraud” and just come out and say they’re changing the laws because certain people shouldn’t be allowed to vote. I don’t know what they’re waiting for; most of their voters wouldn’t blink an eye.

  1. Yes, I know there are better histories, and I have another one lined up, but Shirer was there, and I wanted his perspective.