Doug McIlroy and Bing Copilot

This morning, I read a short article describing certain deficiencies in Bing Copilot when it comes to doing math. The article interested me for two reasons:

The article investigates three math problems: one having to do with time zones, one an elementary logic problem, and the third a simple calculus problem. At first, I thought the time zone problem wasn’t really a math problem, but I soon learned that it was. McIlroy asked Copilot for the minimum time difference between Oregon and Florida. Copilot knew that both states have two time zones, and it could regurgitate the definition of “minimum,” but it didn’t know how to put those two pieces of information together.
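To see why this is a math problem after all, here is a minimal sketch of the calculation. It is not taken from McIlroy's article. The dictionaries and the function name `min_time_difference` are made up for illustration, and the offsets are standard-time UTC offsets, on the assumption that both states shift together for daylight saving time so the differences stay the same.

```python
from itertools import product

# Standard-time UTC offsets in hours for the zones each state spans.
# (Assumed for illustration; daylight saving shifts both states alike.)
OREGON = {"Pacific": -8, "Mountain": -7}
FLORIDA = {"Eastern": -5, "Central": -6}

def min_time_difference(state_a, state_b):
    """Return (hours, (zone_a, zone_b)) for the closest pair of zones."""
    return min(
        (abs(off_a - off_b), (zone_a, zone_b))
        for (zone_a, off_a), (zone_b, off_b)
        in product(state_a.items(), state_b.items())
    )

hours, zones = min_time_difference(OREGON, FLORIDA)
print(f"Minimum difference: {hours} hour(s), between {zones[0]} and {zones[1]} time")
# prints "Minimum difference: 1 hour(s), between Mountain and Central time"
```

The point is that "minimum" has to be taken over every pairing of one Oregon zone with one Florida zone, which is exactly the step of putting the two pieces of information together.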

I’ll let you explore the logic problem on your own. Suffice it to say that you shouldn’t worry about beating Copilot in a game of rock-paper-scissors.

The calculus problem was the most complicated: given a general ellipse defined by the usual equation,

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

find the points on the ellipse where the magnitude of the slope is one.

McIlroy gave Copilot several opportunities to solve this problem. Although it always got the formula for the slope correct, it failed in different ways each time to apply that formula to the problem at hand.
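For reference, here is a sketch of how the problem works out by implicit differentiation. This is the standard textbook approach, not a reproduction of McIlroy's working or of any of Copilot's attempts, and it assumes $a, b > 0$.

$$
\begin{aligned}
\frac{2x}{a^2} + \frac{2y}{b^2}\,\frac{dy}{dx} &= 0
  &&\Longrightarrow\quad \frac{dy}{dx} = -\frac{b^2 x}{a^2 y} \\
\left|\frac{dy}{dx}\right| = 1
  &\;\Longrightarrow\; b^2\,|x| = a^2\,|y|
  &&\Longrightarrow\quad y^2 = \frac{b^4 x^2}{a^4} \\
\frac{x^2}{a^2} + \frac{b^2 x^2}{a^4} &= 1
  &&\Longrightarrow\quad x^2 = \frac{a^4}{a^2 + b^2} \\
(x, y) &= \left(\pm\frac{a^2}{\sqrt{a^2 + b^2}},\ \pm\frac{b^2}{\sqrt{a^2 + b^2}}\right)
\end{aligned}
$$

All four sign combinations are valid, one point in each quadrant. As a sanity check, a circle of radius $r$ (that is, $a = b = r$) gives $(\pm r/\sqrt{2}, \pm r/\sqrt{2})$, the points where the tangent runs at 45 degrees.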

The value of the article isn’t in simply pointing out that LLMs can be wrong. It’s in McIlroy’s detailed review of Copilot’s answers, showing where each one went wrong at every step. By taking small problems and exploring the errors thoroughly, McIlroy does a better job of finding faults in large language models than any of the broad-brush criticisms I’ve read.

In the notes at the end of the article, McIlroy gives us its editing history: it was first written in December of last year and last edited just a few days ago. I should mention here that McIlroy will be celebrating his 92nd birthday tomorrow.