Holes in the Wolfram Knowledgebase
December 3, 2023 at 3:02 PM by Dr. Drang
Wolfram touts its Knowledgebase as “the world’s largest and broadest repository of computable knowledge” and “carefully curated expert knowledge directly derived from primary sources.” There’s certainly a lot in there, but there are some inexplicable holes that could be filled with little effort.
I used the Knowledgebase last year in my post about orbital curvature. Things like
Entity["Planet", "Earth"]["AverageOrbitDistance"]
and
Entity["Planet", "Earth"]["Mass"]
pulled information out of the Knowledgebase so I didn’t have to look it up outside of Mathematica and paste it into my code. Very convenient.
But I learned a few months ago that another part of the Knowledgebase was missing data, which could get in the way of other types of calculation. I was testing out the kind of state- and county-level information I could access, and my initial explorations focused on where I live: DuPage County, Illinois.
To be sure, the Knowledgebase has lots of info on DuPage County. It knows, for example, the area, the population, the per capita income, and the number of annual births and deaths. But it doesn’t know the county seat, which I would think is easier to determine and enter into the Knowledgebase than most of the other stuff—not to mention more stable than transient figures like population and income.
Broadening my exploration to all the counties in Illinois, I learned that of our 102 counties, Wolfram knew the capitals of all of them except DuPage and DeKalb counties. So this command
AdministrativeDivisionData[
Entity["AdministrativeDivision", {"CookCounty", "Illinois",
"UnitedStates"}], "CapitalName"]
returns
Chicago
as expected, while both of these commands,
AdministrativeDivisionData[
Entity["AdministrativeDivision", {"DuPageCounty", "Illinois",
"UnitedStates"}], "CapitalName"]
and
AdministrativeDivisionData[
Entity["AdministrativeDivision", {"DeKalbCounty", "Illinois",
"UnitedStates"}], "CapitalName"]
return
Missing["NotAvailable"]
This is not exactly obscure information, and there are reliable sources from which to get it. Here, for example, is a map from the Illinois Blue Book, an official publication of the state.
As you (and the folks at Wolfram) can see, the DuPage and DeKalb county seats are Wheaton and Sycamore, respectively.
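The three queries above differ only in the county name, so they can be folded into a small helper. This is a minimal sketch, not the code behind this post: `countySeat` and `seats` are names made up here, and the helper simply wraps the same `AdministrativeDivisionData` call and `"CapitalName"` property shown above.

```wolfram
(* countySeat is a hypothetical helper name; it wraps the same query used above *)
countySeat[county_String] :=
  AdministrativeDivisionData[
    Entity["AdministrativeDivision", {county, "Illinois", "UnitedStates"}],
    "CapitalName"]

(* Build an association of county name -> seat (or Missing[...]) *)
seats = AssociationMap[countySeat, {"CookCounty", "DuPageCounty", "DeKalbCounty"}];

(* Keep only the counties whose seat came back as Missing[...] *)
Keys[Select[seats, MissingQ]]
```

Going by the results reported above, the last line would be expected to list DuPageCounty and DeKalbCounty, though that depends on the state of the Knowledgebase when the code is run.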
I sent an email to Wolfram about the missing county seat data and got a boilerplate reply saying their development team would review it. That was in August; the Knowledgebase still returns Missing["NotAvailable"].
Recently, I decided to look for missing county seats in every state. Here are all the counties (or administrative divisions that Wolfram treats like counties) that are missing their capitals in the Knowledgebase:
County w/o seat | State |
---|---|
DeKalb County | Alabama |
Aleutians West | Alaska |
Bethel | Alaska |
Chugach | Alaska |
Copper River | Alaska |
Dillingham | Alaska |
Hoonah-Angoon | Alaska |
Nome | Alaska |
Prince of Wales-Hyder | Alaska |
Petersburg | Alaska |
Skagway | Alaska |
Southeast Fairbanks | Alaska |
Kusilvak Census Area | Alaska |
Wrangell | Alaska |
Yukon-Koyukuk | Alaska |
Mono County | California |
Sierra County | California |
Conejos County | Colorado |
Wakulla County | Florida |
Columbia County | Georgia |
Crawford County | Georgia |
DeKalb County | Georgia |
Echols County | Georgia |
Kalawao County | Hawaii |
Owyhee County | Idaho |
DeKalb County | Illinois |
DuPage County | Illinois |
DeKalb County | Indiana |
LaPorte County | Indiana |
Plaquemines Parish | Louisiana |
St. James Parish | Louisiana |
Keweenaw County | Michigan |
Lake of the Woods County | Minnesota |
DeSoto County | Mississippi |
Franklin County | Mississippi |
DeKalb County | Missouri |
McPherson County | Nebraska |
Esmeralda County | Nevada |
Eureka County | Nevada |
Lincoln County | Nevada |
Storey County | Nevada |
Burlington County | New Jersey |
Mora County | New Mexico |
Rio Arriba County | New Mexico |
Bronx County (The Bronx) | New York |
Broome County | New York |
Kings County (Brooklyn) | New York |
New York County (Manhattan) | New York |
Queens County (Queens) | New York |
Richmond County (Staten Island) | New York |
Camden County | North Carolina |
Currituck County | North Carolina |
Hyde County | North Carolina |
Dunn County | North Dakota |
Bristol County | Rhode Island |
Kent County | Rhode Island |
Buffalo County | South Dakota |
DeKalb County | Tennessee |
Borden County | Texas |
Glasscock County | Texas |
Kenedy County | Texas |
King County | Texas |
Loving County | Texas |
McMullen County | Texas |
Montague County | Texas |
Palo Pinto County | Texas |
Young County | Texas |
Rich County | Utah |
Alexandria (independent city) | Virginia |
Amelia County | Virginia |
Bath County | Virginia |
Bland County | Virginia |
Bristol (independent city) | Virginia |
Buckingham County | Virginia |
Buena Vista (independent city) | Virginia |
Charles City County | Virginia |
Charlottesville (independent city) | Virginia |
Chesapeake (independent city) | Virginia |
Colonial Heights (independent city) | Virginia |
Covington (independent city) | Virginia |
Cumberland County | Virginia |
Danville (independent city) | Virginia |
Dinwiddie County | Virginia |
Emporia (independent city) | Virginia |
Fairfax (independent city) | Virginia |
Falls Church (independent city) | Virginia |
Fluvanna County | Virginia |
Franklin (independent city) | Virginia |
Fredericksburg (independent city) | Virginia |
Galax (independent city) | Virginia |
Goochland County | Virginia |
Hampton (independent city) | Virginia |
Hanover County | Virginia |
Harrisonburg (independent city) | Virginia |
Hopewell (independent city) | Virginia |
Isle of Wight County | Virginia |
King and Queen County | Virginia |
King George County | Virginia |
King William County | Virginia |
Lancaster County | Virginia |
Lexington (independent city) | Virginia |
Lunenburg County | Virginia |
Lynchburg (independent city) | Virginia |
Manassas (independent city) | Virginia |
Manassas Park (independent city) | Virginia |
Martinsville (independent city) | Virginia |
Mathews County | Virginia |
Middlesex County | Virginia |
Nelson County | Virginia |
New Kent County | Virginia |
Newport News (independent city) | Virginia |
Norfolk (independent city) | Virginia |
Northumberland County | Virginia |
Norton (independent city) | Virginia |
Nottoway County | Virginia |
Petersburg (independent city) | Virginia |
Poquoson (independent city) | Virginia |
Portsmouth (independent city) | Virginia |
Powhatan County | Virginia |
Prince George County | Virginia |
Radford (independent city) | Virginia |
Richmond County | Virginia |
Roanoke County | Virginia |
Salem (independent city) | Virginia |
Stafford County | Virginia |
Staunton (independent city) | Virginia |
Suffolk (independent city) | Virginia |
Sussex County | Virginia |
Virginia Beach (independent city) | Virginia |
Waynesboro (independent city) | Virginia |
Williamsburg (independent city) | Virginia |
Winchester (independent city) | Virginia |
Quite a list. Now there are legitimate (or at least arguable) reasons some of these counties are missing their county seat:
- Some counties really don’t have a county seat. Kalawao County in Hawaii, for example.
- Some administrative districts in Alaska don’t have a capital. Alaska has boroughs rather than counties, and one of them, called the Unorganized Borough,[1] is further subdivided into “census areas,” none of which have a capital.
- Virginia has, in addition to counties, “independent cities,” which appear to be at the same level as counties.[2] While I think Wolfram would be justified in saying these independent cities are their own capitals, it’s decided to say the capitals are missing, which is a legitimate choice, too.
- Five counties in New York match up with the boroughs of New York City. Pretty hard to choose the capital of a portion of a city, so these counties also have missing capitals.
But most of the counties with missing county seats are like DuPage and DeKalb counties: regular counties with regular county seats that are just not included in the Knowledgebase, despite being easy to look up and verify. There are fewer than 100 of them. I don’t know why they’re missing, but filling in missing values like this is a standard data cleaning operation. And as I said earlier, this is pretty much a one-time operation; counties just don’t change seats very often.
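To make that cleaning step concrete, here is a rough sketch that works on a hand-built association instead of the Knowledgebase itself. The names `reported`, `fixes`, and `filled` are illustrative only, and the two replacement seats are the ones read off the Blue Book map earlier in the post.

```wolfram
(* Hand-entered stand-in for query results; not pulled from the Knowledgebase *)
reported = <|
  "Cook County" -> "Chicago",
  "DuPage County" -> Missing["NotAvailable"],
  "DeKalb County" -> Missing["NotAvailable"]
|>;

(* Seats looked up by hand, here from the Illinois Blue Book *)
fixes = <|"DuPage County" -> "Wheaton", "DeKalb County" -> "Sycamore"|>;

(* Join keeps the last value for a repeated key, so the fixes override the gaps *)
filled = Join[reported, fixes];

(* List any counties that are still missing a seat after the merge *)
Keys[Select[filled, MissingQ]]
```

The same pattern would scale to the full list: one association of query results, one of hand-verified corrections, and a final check that nothing is still `Missing`.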
I haven’t sent this list off to Wolfram. If the people on its development team can’t be bothered to clean the data in their own home state, how likely is it that they’ll fill in all the other states’ data? But they should.
Update 4 Dec 2023 11:45 AM
Chon Torres on Mastodon informed me that the two California counties, Mono and Sierra, do have county seats, but they’re unincorporated, and that might explain why they’re missing from the Knowledgebase. That’s a good explanation, but I would argue with Wolfram that it’s a poor reason for excluding a capital. A county seat should have county government offices—Chon mentioned that he’s been at the Sierra County Courthouse in Downieville—but I don’t see why it needs a municipal government.
Adding to the madness of Virginia government, Sam Davies told me that an independent city can also be the county seat of a county that it’s been carved out of. The example he gave was Charlottesville, which is both an independent city and the capital of Albemarle County. To me, a more disturbing example is Fairfax, which is an independent city but also the county seat of (yes, that’s right) Fairfax County.
Thanks to Chon and Sam for the local government expertise.
1. Which is apparently not actually a borough itself, despite its name. This is more than I wanted to know about Alaska’s government.
2. As with the Unorganized Borough in Alaska, this is more than I wanted to know about Virginia’s government.