Followup on Wolfram county data
July 16, 2024 at 12:34 PM by Dr. Drang
You may remember a post I wrote back in December about inexplicable holes in US county-level data in the Wolfram Knowledgebase. I say “you may remember” because I certainly didn’t until I got an email from Wolfram last week telling me that the holes had been filled. Yes and no, as it turns out.
I first discovered the problem in August of last year. I wanted to use Mathematica to get a list of all the county seats in Illinois, and I noticed that the data for DeKalb and DuPage counties were missing. They weren’t available through a Mathematica function call or through a WolframAlpha query. Given that Wolfram is based in Illinois, I thought this was a particularly glaring error, so I sent them feedback.
By the time I wrote the post in December, I had looked into the issue further and discovered that the Knowledgebase was missing county seats for counties all over the country. I didn’t follow up with Wolfram on that because I thought they’d already memory-holed my original complaint.
But no! The email that came last week told me that the problem (for Illinois) had been fixed and said I could go to this link to see for myself. If you follow the link, you’ll have to tap the More button on that page, but when you do, you’ll see that DeKalb and DuPage counties now have their county seats. Which is nice.
I changed the WolframAlpha query to look for California counties and found that the county seats for Mono and Sierra counties had also been filled in. Similarly for DeKalb and LaPorte counties in Indiana. So it looked like Wolfram had cleaned up the data across the country.
But that was in WolframAlpha. When I tried to get the same information through an equivalent function call in Mathematica, the county seats for DeKalb and DuPage were still missing.
It might not be immediately obvious, but if you look carefully, you’ll see Missing[NotAvailable] in the 19th and 22nd positions of the list.
The same problem can be found in other states: county seats that now appear in WolframAlpha on the web are still missing from the equivalent Mathematica function calls. This inconsistency is even weirder than the original missing data. How can calls to what should be the same database produce different results?
So I’ve sent feedback on this to Wolfram, and I’ll let you know when they answer. Given their previous speed, you can expect that post sometime next June.
A good (or bad) example
July 9, 2024 at 3:24 PM by Dr. Drang
The one rule of plotting that every newly minted data scientist can repeat without fail is that your graphs should always start at zero. A graph that doesn’t start at zero is misleading, dishonest, and possibly nefarious. This rule is, of course, bullshit. Not a rule at all.
Oh sure, there are plenty of times when you should start at zero, but the best range to plot depends on what you’re plotting. This morning I saw a great example of a plot that started at zero when it really shouldn’t have.
It’s in an app that, among other things, is tracking my weight. Here’s the graph:
Because my weight over these seven days hasn’t changed by more than about 1.5%, the columns are all basically the same height. Even if I were to lose a lot of weight, I’d never go anywhere near zero. The only useful part of this chart is the labeling over the columns; it could just as well be a table.
Where should the y-axis start? Depends on what the graph is being used for. If I had a target weight, I’d probably start at some round number below that target. I might put a thin horizontal line at the target. But I’d never do this.
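To put rough numbers on this, here is a small Python sketch. The weights, the target, and the helper names axis_floor and visible_spread are all made up for illustration; nothing here comes from the app. It picks a round y-axis minimum below the lowest value that has to be visible and then compares how much of the tallest column's height separates the highest and lowest readings.

```python
import math

# Hypothetical week of readings; not the data from the app's chart.
weights = [183.0, 182.4, 184.1, 183.6, 182.9, 181.8, 182.2]


def axis_floor(values, target=None, step=5):
    """Largest multiple of `step` strictly below the data minimum
    (or below the target, if one is given and it is lower)."""
    lowest = min(values) if target is None else min(min(values), target)
    return step * (math.ceil(lowest / step) - 1)


def visible_spread(values, floor):
    """Fraction of the tallest column's height that separates the
    highest and lowest readings when the axis starts at `floor`."""
    return (max(values) - min(values)) / (max(values) - floor)


zero_based = visible_spread(weights, 0)
trimmed = visible_spread(weights, axis_floor(weights, target=175))
print(f"axis from 0: {zero_based:.1%}   trimmed axis: {trimmed:.1%}")
```

With a zero baseline the week's variation is a sliver of the column height; with the trimmed axis it takes up a readable share. The step of 5 is just one choice of "round number."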
The makers of the app have either internalized the rule that isn’t a rule or have decided that following it fends off criticism by idiots who think it is a rule. Too bad.
Numeronymize
July 5, 2024 at 5:07 PM by Dr. Drang
Anil Dash quote-tweeted this post from Ian Brown on Mastodon this morning:
I was so excited to learn that abbreviations like “a11y” or “c14n” are called “numeronyms” that I wrote an #Emacs^H^H^H^H^H^H #E3s extension to make it easier to use them.
The tweet[1] links to this GitHub page. First I thought it was funny to see a link to an Emacs Lisp script the day after I posted about my date-convert script. Then I thought “I want to be able to do that, but not in Emacs.” So I built this Keyboard Maestro macro, which I also called Numeronymize:
If you download and install it, it will appear in your Global Macro Group.
Here’s how to use it: Type the word you want to numeronymize in any editable text field. For example,
accessibility
With the cursor blinking at the end of the word, type ⌃⌥⌘3 (on standard American keyboards, the number symbol, #, is on the 3 key). The word before the cursor will be selected and shortened to its numeronym,
a11y
Honestly, I don’t think I’ll be using this macro very much, but it was fun and easy to write. The key is the Perl one-liner in the third step:
/usr/bin/perl -C -pe 's/(.)(.+)(.)/$1 . length($2) . $3/e'
I have more than one Perl executable on my computer, so I’m being explicit here about calling the one in /usr/bin that comes with macOS. The -C switch tells Perl to treat the input and output as UTF-8 (more on this below); the -p switch tells it to loop through the input, apply the code to it, and print out the result; and the -e switch tells it to treat the following string as the code to execute:
s/(.)(.+)(.)/$1 . length($2) . $3/e
This is the cool part, because Perl’s substitute command has an e option that means “evaluate.” It treats the replacement as a chunk of Perl code, evaluates it, and returns the result. Here, it concatenates the first letter, the length of the middle string of characters, and the final letter. When I saw what Brown’s numeronymize extension did, I immediately thought of this feature of Perl and knew I could do it in a one-liner.
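The idea of computing the replacement instead of writing it out literally isn't unique to Perl. As a rough sketch, and not what the macro actually runs, Python's re.sub accepts a function as the replacement, which plays much the same role as the e option. The function name numeronymize is only for illustration.

```python
import re


def numeronymize(word: str) -> str:
    """Replace the middle of a word with a count of its characters."""
    return re.sub(
        r"(.)(.+)(.)",
        # The replacement is computed from the match, like Perl's s///e.
        lambda m: f"{m.group(1)}{len(m.group(2))}{m.group(3)}",
        word,
        count=1,  # Perl's s/// without the g flag substitutes only once
    )


print(numeronymize("accessibility"))     # a11y
print(numeronymize("canonicalization"))  # c14n
```

Because the pattern needs at least three characters to match, shorter words pass through unchanged.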
The -C switch isn’t needed if all we care about are words made of ASCII characters. But what if we want to shorten this?
streichholzschächtelchen
Without the -C, we’d get
s23n
because the length function normally returns the length in bytes, and the ä takes up two bytes. But with the -C, length understands that we want characters, not bytes, so the macro returns
s22n
which is what we want.
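The byte-versus-character difference is easy to reproduce outside Perl. Here is a minimal Python sketch; the two helper names are hypothetical, and the byte version assumes the first and last letters are plain ASCII, as they are in this word. It counts the middle of the word both ways.

```python
# The ä is written as the escape \u00e4 so the count doesn't depend on
# how this file itself is encoded.
word = "streichholzsch\u00e4chtelchen"


def numeronym_by_chars(word: str) -> str:
    """Count the middle in characters, as Perl does with -C."""
    return f"{word[0]}{len(word) - 2}{word[-1]}"


def numeronym_by_bytes(word: str) -> str:
    """Count the middle in UTF-8 bytes, as Perl does without -C.
    Assumes the first and last characters are single-byte ASCII."""
    raw = word.encode("utf-8")
    return f"{word[0]}{len(raw) - 2}{word[-1]}"


print(numeronym_by_chars(word))  # s22n
print(numeronym_by_bytes(word))  # s23n
```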
A couple of other notes:
- The macro starts out by simulating the ⌥⇧← keyboard combination to select the word to the left of the cursor. My thinking was that I’d most often use this macro right after typing or pasting a long word. So it made sense to have the macro do the word selection this way. If you think it would be better to do the selection yourself, delete this first step.
- Because I use the clipboard to copy the long word and to hold the shortened numeronym before pasting, two items are added to Keyboard Maestro’s clipboard history. But because they’re just temporary items, I don’t think they belong in the history, so the last two steps in the macro delete the most recent two items in the clipboard history.
Thanks to Ian Brown for making the Emacs extension and to Anil Dash for bringing it to wide attention. Dash says his favorite numeronym is “e13n,” but I think he’s leaving out one of the T’s.
[1] Yes, I’m using “tweet” and “retweet” to refer to posts on Mastodon. Since Twitter has given up its name, I feel these words are now fair game for any Twitter-like service. After all, they were invented by the Twitter users, not the company.
Happy 2,460,496!
July 4, 2024 at 2:42 PM by Dr. Drang
I was thinking about calendar conversions the other day and remembered that it had been years since I used my date-convert script. I wondered if it would still run. It didn’t, but it was easy to fix.
The first problem was the shebang line. date-convert is written in Emacs Lisp, and the first line used to be
#!/usr/bin/emacs --script
Because Apple no longer includes Emacs as part of macOS, there’s no executable with that name at that location anymore. The Emacs on my computer now was installed via Homebrew and is where all the other Homebrew executables are. So I changed the shebang line to
#!/opt/homebrew/bin/emacs --script
That made it run fine if called with no arguments, but it failed if run with arguments, i.e.,
$ date-convert 7 4 1776
Error: void-function (string-to-int)
[many lines of error messages]
Symbol’s function definition is void: string-to-int
I looked up the Emacs Lisp string conversion functions and found that string-to-int had been replaced with string-to-number.[1] So I changed that function call and everything was hunky-dory.
For reasons lost in the mists of time, I originally had one of the conversions be to the Mayan calendar. Maybe that’s because there was a lot of silly talk back then about the Mayan calendar predicting the end of the world. Whatever the reason, I didn’t care about that conversion anymore, so I dropped it.
I then added the Julian Day Number, which seemed like it could conceivably be useful if I ever need to look up some astronomical observation from back when Western calendars were in flux. OK, that’s pretty unlikely, but it’s more likely than needing the Mayan calendar. I labeled the JDN “Astro” in the script’s output.
So the source code for date-convert is now this:
#!/opt/homebrew/bin/emacs --script
(require 'calendar)
; Use current date if no date is given on the command line
(if (= 3 (length command-line-args-left))
    (setq my-date (mapcar 'string-to-number command-line-args-left))
  (setq my-date (calendar-current-date)))
; Make the conversions and print the results
(princ
  (concat
    "Gregorian: " (calendar-date-string my-date) "\n"
    "      ISO: " (calendar-iso-date-string my-date) "\n"
    "    Astro: " (calendar-astro-date-string my-date) "\n"
    "   Julian: " (calendar-julian-date-string my-date) "\n"
    "   Hebrew: " (calendar-hebrew-date-string my-date) "\n"
    "  Islamic: " (calendar-islamic-date-string my-date) "\n"
    "  Chinese: " (calendar-chinese-date-string my-date) "\n" ))
It works like this. With no arguments, it converts today:
$ date-convert
Gregorian: Thursday, July 4, 2024
      ISO: Day 4 of week 27 of 2024
    Astro: 2460496
   Julian: June 21, 2024
   Hebrew: Sivan 28, 5784
  Islamic: Dhu al-Hijjah 27, 1445
  Chinese: Cycle 78, year 41 (Jia-Chen), month 5 (Geng-Wu), day 29 (Ji-Si)
With three arguments (in the American ordering of month, day, year) it converts the given day:
$ date-convert 7 4 1776
Gregorian: Thursday, July 4, 1776
      ISO: Day 4 of week 27 of 1776
    Astro: 2369916
   Julian: June 23, 1776
   Hebrew: Tammuz 17, 5536
  Islamic: Jumada I 17, 1190
  Chinese: Cycle 74, year 33 (Bing-Shen), month 5 (Jia-Wu), day 19 (Ji-Chou)
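For readers without Emacs, the first three lines of that output can be approximated with Python's standard library. This is only a partial sketch, not a port of date-convert: the names parse_date, describe, and JDN_OFFSET are invented here, and the Julian, Hebrew, Islamic, and Chinese conversions are left out because the standard library has no counterpart to Emacs's calendar functions for them.

```python
import calendar
from datetime import date

# Difference between the Julian Day Number and Python's proleptic
# Gregorian ordinal (where 0001-01-01 is day 1).
JDN_OFFSET = 1721425


def parse_date(args):
    """Month, day, year strings give that date; anything else gives today."""
    if len(args) == 3:
        month, day, year = (int(a) for a in args)
        return date(year, month, day)
    return date.today()


def describe(d):
    """Return right-aligned, labeled lines for the three conversions."""
    weekday = calendar.day_name[d.weekday()]
    month = calendar.month_name[d.month]
    iso_year, iso_week, iso_day = d.isocalendar()
    rows = [
        ("Gregorian", f"{weekday}, {month} {d.day}, {d.year}"),
        ("ISO", f"Day {iso_day} of week {iso_week} of {iso_year}"),
        ("Astro", str(d.toordinal() + JDN_OFFSET)),
    ]
    return [f"{label:>9}: {value}" for label, value in rows]


print("\n".join(describe(parse_date(["7", "4", "1776"]))))
```

Passing an empty list to parse_date mirrors running the script with no arguments.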
So it’s been 90,580 days since the signing of the Declaration of Independence. If you’re wondering when we’ll hit the 100,000-day mark, that’ll be on JDN 2,469,916, or April 19, 2050. I’m hoping to still be around then, and I hope the country is, too.
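The arithmetic is easy to check. The following Python sketch starts from the two Julian Day Numbers quoted above and turns the milestone back into a calendar date; the constant JDN_OFFSET and the helper date_from_jdn are names made up for this example, and the conversion assumes the proleptic Gregorian calendar that Python's datetime uses.

```python
from datetime import date

# Julian Day Number minus Python's proleptic Gregorian ordinal.
JDN_OFFSET = 1721425


def date_from_jdn(jdn: int) -> date:
    """Convert a Julian Day Number to a Gregorian calendar date."""
    return date.fromordinal(jdn - JDN_OFFSET)


signing = 2369916      # July 4, 1776
this_post = 2460496    # July 4, 2024

days_elapsed = this_post - signing
milestone = signing + 100_000

print(days_elapsed)              # 90580
print(date_from_jdn(milestone))  # 2050-04-19
```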
[1] The documentation makes it sound as if both functions existed simultaneously and string-to-int was dropped as an unnecessary redundancy. I don’t feel like digging through change logs to get the full story.