Ahmed Bouzid, Weiye Ma
On a Sunday afternoon a few months ago, we called the local number of a large national retail store to find out what time the store was closing. As expected, we were greeted by an automated interactive voice response (IVR) system that opened with the customary, "Thank you for calling...." But then it did something completely unexpected. Instead of rambling on with a long menu of options, it simply said, "Our Sunday hours are from 10 a.m. to 6 p.m. Our regular weekday hours are from 10 a.m. to 9 p.m." Then it politely asked: "Do you need anything else?"
We were astounded!
As IVR professionals and voice user interface (VUI) practitioners, we are always attentive to the types of IVR experiences we encounter in our daily lives. This was not a typical experience. It was as if the system had read our mind, as if it knew why we were calling. The entire call lasted no more than 30 seconds, and when we hung up, we had the exact information we needed!
After the call, we smiled and shook our heads. "It doesn't really take much to delight customers these days, does it?" we said to each other. "All it takes is for the company to care about its customers and to go that extra mile to understand their needs." In this case, someone at this retail chain had figured out that the majority of calls to their local stores on a Sunday afternoon were about store hours; he or she then designed and built an IVR system that volunteered that information upfront. The result: shorter calls, fewer calls to busy store associates and managers, and perhaps most important, satisfied customers (in our case, astounded and delighted ones).
But the real reason for our astonishment has to do with the extremely low expectations we have of the IVR systems that we tolerate on a regular basis. Not only do we not expect such systems to read our mind and anticipate our needs, but we outright expect them to frustrate and abuse us. We expect them to be stupid and boorish. We expect them to waste our time and annoy us. All while we patiently wait for a human to come to our rescue.
The IVRs with which we regularly interact force us to listen to several minutes of announcements that have nothing to do with us and our needs. We endure long menus that often list unclear or ambiguous options, so that we often don't know which option applies to us or the difference between options. We are bounced from one IVR to another; sometimes we are made to wait a long time only to be directed to voicemail, and sometimes the IVR just hangs up. At such times, the outcome is the opposite of delight: it is outrage.
That is why, of all the technologies that people use on a daily basis, IVR systems are by far the most despised. People react emotionally rather than intellectually to their IVR experiences. They experience a range of emotions, from irritation to annoyance, confusion, and bewilderment, to frustration and disgust, frequently ending up in a state of anger or even fury. And when, once in a blue moon, the IVR spares them those experiences and gives them what they want, they are elated.
In fact, it is difficult to think of another ubiquitous technology that its users perceive so negatively. People are relatively happy using ATMs, even though ATMs have not evolved in any meaningful way since their introduction in the early 1980s. No one complains anymore about having to pump their own gas; in fact, people are annoyed if they can't serve themselves at a gas station. And even the relatively newly introduced self-checkout machines in supermarkets are gaining wider acceptance by customers who prefer to serve themselves rather than wait in line for a cashier.
So, clearly, it is not the self-service aspect of IVRs to which people are reacting. When provided with a technology that enables them to serve themselves, people will embrace that technology as an empowering tool. But they will embrace it only if they are able to use it, and they will see it as empowering only if they are able to systematically accomplish their goals without frustration.
Currently deployed IVRs are generally not viewed as empowering tools, or ones that can effectively serve caller needs. Instead, users tend to perceive them as obstacles installed by companies to keep callers from reaching expensive human agents.
The glaring disconnect between what companies aim to achieve in deploying IVR systems (better customer service) and what they actually do achieve (customer frustration) can be squarely laid on the shoulders of shabby voice user interface (VUI) design and implementation. The vast majority of today's IVRs are, simply put, shamefully unusable, and customers detest them.
The viscerally strong reaction that callers have to IVRs is fully justified. IVRs not only fail to do their job, but they also fail while pushing some of our most sensitive emotional buttons. They treat us with little intelligence and thoughtfulness and exhibit an unsettling degree of irrationality that breeds contempt, if not revulsion, against them. Rare and precious is the IVR system that respects your time, anticipates your needs, and gives you exactly what you need, quickly and efficiently.
Now envision a new breed of IVR system.
Imagine this: You are at home, lying on the couch watching your favorite TV show, and you want to quickly ascertain your checking-account balance. So you reach for your cell phone, fling it open, press the "9" key, press the "call" button, place the cell phone to your ear, and engage an IVR system as follows:
System: Hi there! The last four digits?
Caller: (says the last four digits of the account number)
System: Okay. Hang on. Your balance is 5,235 dollars and 23 cents. Anything else?
Caller: No.
System: Great. Goodbye.
At which point you would flip your cell phone shut and then go back to watching your show. The whole interaction would have taken you between 20 and 30 seconds, no more.
Compare this with getting your information from the Web. If you are like us and you use a desktop at home, it means you would have to get up from your sofa, walk to the room where the desktop is, turn the computer's monitor on, log in to the machine (we have it password-protected), open the browser, click on the tab that points to your bank's login page, type the login credentials, and then navigate to where your checking balance is displayed. After that, you'd log out from the account and bring the browser down (to minimize any security risks), switch the monitor off, and shuffle back to your sofa. At best, it would have taken you between four and five minutes to accomplish your task.
What if you had a laptop? Well, maybe you would be able to shave off a minute or so, but only if you had the laptop nearby and it was connected to the Internet (which would probably mean that you had WiFi at home).
What if you had a smart phone (BlackBerry, Palm, iPhone, etc.)? You wouldn't have to get up from your sofa, right? Yes, but have you tried navigating the Internet with any of those devices? At best, the experience is less than gratifying; usually it's downright painful. The iPhone has made great strides over its competitors in the display of Web pages, but it took a step backward in information entry: It is relatively easier to type with a BlackBerry or a Palm than it is with an iPhone. We say "relatively easier," because typing with the BlackBerry or the Palm is no mean feat: You need to learn to firmly grip the device in both hands, hold it about 10 inches away from your face, chin down, and then hunt and peck with both thumbs. This is an uncomfortable activity that no doubt we will learn, sooner or later, is not healthy for our hands. (The American Society of Hand Therapists is already issuing warnings, and an informal name for pain experienced when engaging in excessive typing on these devices has already arisen: "BlackBerry Thumb.")
So, then, it turns out that when it comes to the simple task we described here, the most cutting-edge technologies (desktops, laptops, smart phones) do not compare with our humble telephone. In other words, even with expensive hardware and the service charges that come with it, learning your current checking-account balance takes more time and effort with such technology than it does with a phone and an IVR system.
What does this tell us? Simply that IVR technology is here to stay. It is here to stay because for certain tasks, it can do the job at a lower cost, more quickly, and with less effort on the part of the end user than any of the most cutting-edge communication technologies out there.
But then, you ask, why do people hate IVRs? Why do they groan and shake their heads in dismay when they realize they are about to interact with a machine over the telephone?
The interaction we described is not your typical exchange between a user and an IVR system. Your typical IVR would have greeted you with some 30 seconds of chest-thumping messaging about the company, followed by some mindless instructions, such as "For English, press 1" or "Please listen carefully, as our menu options have changed." It would have listed a long menu of options; required you to select the "check balance" option, then the "checking account" option; and then required you to enter your full checking-account number and, for security purposes, a PIN. Only then would it have given you the balance. A grueling three or four minutes would have elapsed. And if you had not committed your 14-digit checking-account number to memory, you would have had to get up and retrieve your checkbook, unless you happened to have it nearby!
So, what did it take for the ideal IVR system we initially described to behave as it did?
Here are the keys to its effectiveness: (1) it recognized who the caller was, (2) it knew that they were calling to retrieve their checking-account balance, (3) it did not waste time talking, saying only what it needed to say, and (4) it let the caller speak their answers.
Can this interaction be implemented with today's technology? Absolutely. With caller ID and the last four digits of the caller's checking account (easy to memorize, especially if you are calling once a week), the user can be identified and validated, and the balance retrieved and spoken back to the user in a matter of seconds. With some intelligence in the backend (a simple Naïve Bayesian algorithm would amply do), the system can quickly learn that most of your calls are about balance inquiries. With that knowledge, the system can adapt its interaction to shorten all of its verbal prompts to the bare minimum (e.g., "The last four digits?" rather than "What are the last four digits of your checking-account number?"), ask only for the information needed to accomplish its task, and then execute that task. And with the current state of speech recognition, letting the user speak back the last four digits of their account and say "no" is a trivial task.
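To make the "simple Naïve Bayesian algorithm" concrete, here is a minimal sketch of how a backend might guess a caller's likely intent from past calls. It is an illustration under stated assumptions, not a description of any deployed system: the class name `IntentPredictor`, the context features (`day`, `part`), the intent labels, the sample call history, and the fallback prompt are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class IntentPredictor:
    """Categorical Naive Bayes over call-context features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.intent_counts = Counter()            # intent -> number of past calls
        self.value_counts = defaultdict(Counter)  # (intent, feature) -> value counts
        self.known_values = defaultdict(set)      # feature -> distinct values seen

    def observe(self, context, intent):
        """Record one completed call: its context and what the caller wanted."""
        self.intent_counts[intent] += 1
        for feature, value in context.items():
            self.value_counts[(intent, feature)][value] += 1
            self.known_values[feature].add(value)

    def predict(self, context):
        """Return the most probable intent, or None when there is no history."""
        total = sum(self.intent_counts.values())
        if total == 0:
            return None
        best_intent, best_score = None, -math.inf
        for intent, count in self.intent_counts.items():
            score = math.log(count / total)  # log prior for this intent
            for feature, value in context.items():
                seen = self.value_counts[(intent, feature)][value]
                n_values = len(self.known_values[feature] | {value})
                # Smoothed log likelihood of this feature value given the intent.
                score += math.log(
                    (seen + self.alpha) / (count + self.alpha * n_values)
                )
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent


# Hypothetical call history for one customer.
predictor = IntentPredictor()
for _ in range(8):
    predictor.observe({"day": "weekday", "part": "evening"}, "balance")
for _ in range(2):
    predictor.observe({"day": "weekend", "part": "morning"}, "transfer")

# Pick the terse prompt only when a balance inquiry is the likely intent;
# the fallback wording is an assumption made for this example.
likely = predictor.predict({"day": "weekday", "part": "evening"})
prompt = "The last four digits?" if likely == "balance" else "How can I help?"
print(prompt)
```

The smoothing constant keeps a feature value that has never been seen for a given intent from zeroing out that intent entirely, which matters when a caller has only a handful of past calls on record.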
There is no reason, then, why every single IVR system in use today cannot be as effective as the one described here. Give the customer a system that helps them, that solves their problem without wasting their time, and they will use it and love it every time.
Ahmed Bouzid has been practicing professionally in the IVR industry since 1995, when he joined Unisys Corporation's Natural Spoken Language group. He was in charge of building tools for designing and deploying speech-enabled, natural language-powered IVR systems. Since 1995 he has worked extensively both on the technical aspect of deploying IVR solutions and on the customer-interfacing, business side of the industry. Bouzid has published several papers related to voice user interface design, natural language processing, and speech. He is currently head of product at Angel.com.
Weiye Ma obtained her Ph.D. in speech processing and recognition from Katholieke Universiteit Leuven (Belgium) in 1999. She has been practicing professionally in the speech recognition field since 1994. She joined the Unisys Corporation's Natural Spoken Language group in 1995 and worked there at the speech division, focusing on integrating speech recognizer software with IVR telephony platforms. Ma has worked extensively in voice interface design of IVR systems in the context of speech and has published several papers on speech recognition. She is currently a speech scientist at Talvent Corporation.
©2010 ACM 1072-5220/10/0300 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.