Followup on Wolfram county data

You may remember a post I wrote back in December about inexplicable holes in US county-level data in the Wolfram Knowledgebase. I say “you may remember” because I certainly didn’t until I got an email from Wolfram last week telling me that the holes had been filled. Yes and no, as it turns out.

I first discovered the problem in August of last year. I wanted to use Mathematica to get a list of all the county seats in Illinois, and I noticed that the data for DeKalb and DuPage counties were missing. The data weren’t available through a Mathematica function call or through a WolframAlpha query. Given that Wolfram is based in Illinois, I thought this was a particularly glaring error, so I sent them feedback.

By the time I wrote the post in December, I had looked into the issue further and discovered that the Knowledgebase was missing county seats for counties all over the country. I didn’t follow up with Wolfram on that because I thought they’d already memory-holed my original complaint.

But no! The email that came last week told me that the problem (for Illinois) had been fixed and said I could go to this link to see for myself. If you follow the link, you’ll have to tap the More button on that page, but when you do, you’ll see that DeKalb and DuPage counties now have their county seats. Which is nice.

Excerpt from WolframAlpha showing Sycamore and Wheaton

I changed the WolframAlpha query to look for California counties and found that the county seats for Mono and Sierra counties had also been filled in. Similarly for DeKalb and LaPorte counties in Indiana. So it looked like Wolfram had cleaned up the data across the country.

But that was in WolframAlpha. When I tried to get the same information through an equivalent function call in Mathematica, the county seats for DeKalb and DuPage were still missing.

It might not be immediately obvious, but if you look carefully at the list Mathematica returns, you’ll see Missing[NotAvailable] in the 19th and 22nd positions.
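Rather than scanning the output by eye, the holes can be located with a pattern search. What follows is only a minimal sketch: the variable name seats is hypothetical, and the short list assigned to it is a made-up stand-in for the real query result, not actual Knowledgebase output.

```wolfram
(* seats is a hypothetical stand-in for the list of county seats
   returned by the Mathematica query; these four entries are
   illustrative only *)
seats = {"Sycamore", Missing["NotAvailable"], "Wheaton", Missing["NotAvailable"]};

(* Position returns the index of every element whose head is Missing;
   Flatten turns {{2}, {4}} into a plain list of positions *)
Flatten[Position[seats, _Missing]]
(* {2, 4} *)
```

Applied to the full Illinois list, the same expression should report the positions of whichever entries are still missing.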

The same problem can be found in other states: county seats that now appear in WolframAlpha on the web are still missing from the equivalent Mathematica function calls. This inconsistency is even weirder than the original missing data. How can calls to what should be the same database produce different results?

So I’ve sent feedback on this to Wolfram, and I’ll let you know when they answer. Given their previous speed, you can expect that post sometime next June.