June 11, 2016 at 10:55 PM by Dr. Drang
Some of the most common predictions for Monday’s WWDC keynote have to do with Siri. Siri coming to the Mac is a popular prediction, as is the opening of a Siri API to third-party developers. Behind both of these is the unspoken prediction—or maybe it’s just a hope—that Siri will be improved. Apple was out in front in voice assistance when Siri was introduced in 2011, but now it’s seen to be lagging behind both Google (unsurprisingly) and Amazon (quite surprisingly).
In a recent episode of The Talk Show, John Gruber and Merlin Mann talked about the troubles with Siri, and their key complaint, I think, could be summed up in one word: context. Siri doesn’t seem to take advantage of what it knows, or should know, to give reasonable responses. I agree and find Siri’s inability to put requests in context far more frustrating than its mistakes in voice recognition. In fact, just before I listened to The Talk Show, Siri displayed for me a particularly egregious example of contextual ignorance.
I was driving up through central Illinois, a trip I’ve made more times than I can count. I was not using Siri/Maps to give me turn-by-turn directions. I was just listening to some podcast or another and watching the scenery, such as it is, go by. Somewhere south of Effingham, I realized that I’d lost track of how far away it was. Not a big deal, but Effingham is about the halfway point of my trip, and I usually stop there for gas, a bathroom break, and to text my wife an ETA.
My iPhone was charging and sitting upside-down in a cupholder in the center console. I pushed the home button, waited for the Siri beep to come through my car’s speakers, and asked “How far is it to Effingham?”
Siri’s response: “Which Effingham? Tap the one you want.”
On the positive side, Siri recognized the word “Effingham” and recognized it as a place name. But those successes made its two context failures even more annoying.
First, I’m driving north on I-57 in Illinois between Mount Vernon and Effingham. Which effing Effingham do you think I want?!
Now you might argue that Siri has no way of telling which Effingham I want because I’m not using turn-by-turn directions. Don’t be such an apologist. Siri knows it’s being asked for the distance between where I am and some other place. The only way it can get that data is through Maps. In fact, it knows there are multiple Effinghams because of Maps. If it’s done all that already, why not make a reasonable guess as to which one I mean? Isn’t that the point of “artificial intelligence”?
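Just to show how cheap that guess would be: given several candidate places with the same name, pick the one nearest the phone’s current position. Here’s a minimal Python sketch of the idea — the candidate list, the `here` coordinates, and the whole approach are my illustration, not anything Siri actually does:

```python
from math import radians, sin, cos, asin, sqrt

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return 2 * 3956 * asin(sqrt(a))  # 3956 mi ≈ Earth's radius

# A few of the Effinghams Maps might return (approximate coordinates).
candidates = {
    "Effingham, IL": (39.120, -88.543),
    "Effingham, KS": (39.521, -95.400),
    "Effingham, SC": (34.066, -79.748),
    "Effingham, NH": (43.760, -71.022),
}

# Roughly where I was: on I-57 between Mount Vernon and Effingham, IL.
here = (38.6, -88.9)

# The "reasonable guess": the candidate closest to the current location.
best = min(candidates, key=lambda name: haversine(*here, *candidates[name]))
print(best)  # → Effingham, IL
```

A few dozen miles away versus a few hundred — not a hard call to automate.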
But let’s say Siri is being extra careful. It’s taken that “ass out of you and me” thing to heart and refuses to assume. That leads us to its second contextual error: asking me to tap on something while I’m driving instead of allowing me to answer verbally.
And Siri knows damned well I’m driving. It’s connected to my car via Bluetooth. It can use its GPS to figure out I’m moving 80 mph. It has no business asking me to tap on a choice.
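Combining those signals is trivial. A hypothetical heuristic — the function names, the 15 mph threshold, and the prompt wording are all my inventions, not Apple’s:

```python
def probably_driving(speed_mph: float, car_bluetooth: bool) -> bool:
    """Guess whether the user is driving. Either signal alone is enough:
    a car Bluetooth connection, or GPS speed well above walking pace."""
    return car_bluetooth or speed_mph > 15

def disambiguation_prompt(choices, speed_mph, car_bluetooth):
    """Read choices aloud for a driver; offer a tappable list otherwise."""
    if probably_driving(speed_mph, car_bluetooth):
        return "Say the number you want: " + "; ".join(
            f"{i}. {c}" for i, c in enumerate(choices, 1))
    return "Tap the one you want."

print(disambiguation_prompt(["Effingham, IL", "Effingham, KS"], 80, True))
# → Say the number you want: 1. Effingham, IL; 2. Effingham, KS
```

Two boolean-ish inputs and one branch. This isn’t a research problem.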
(Frankly, I could have picked up my phone and tapped on a choice safely. But because Siri’s stupidity had stunned me into a second or two of inaction, it disconnected as I was picking the phone out of the cupholder.)
This is the kind of thing Siri must get better at. Its failure to handle my request had nothing to do with big data or privacy concerns—things that are often cited as reasons Apple can’t compete with the data-hoovering Google. Siri knew everything it needed to know to answer my question. It just wasn’t smart enough to put it all together.
This kind of frustration isn’t exclusive to Siri, though. A few days later, I was at a Sonic drive-in, placing an order. I pushed the red button at the bottom of the menu board and had the following conversation with the young woman who answered.
“Welcome to Sonic. Can I take your order?”
“Yes, I’ll have the Number One Combo with a Coke.”
“Do you want ketchup, mustard, and mayo on the burger?”
“No mayo. Everything else.”
“So you want ketchup and mustard?”
“Yes.”
“Do you want fries with that?”
(I had never been to a Sonic before, and I confess I hadn’t read the menu carefully enough to know there was a choice between fries and “tots.” Had I known, I would have given my choice when I first ordered. I’ve since been told tots are better.)
“Yes.”
“I’m sorry, could you repeat that?”
“Yes.”
“Are you saying ‘tots’?”
“No, I’m saying ‘yes’. Yes, to your question of whether I want fries. Yes, I want fries.”
“What would you like to drink?”
People often say they want Siri to act more like a human being. I hope Apple sets the bar higher than that.