Crossing the thresholds of indignation and inclusiveness

XV.2 March + April 2008
Page: 76

UNDER DEVELOPMENT

Raising a billion voices

Sheetal Agarwal, Arun Kumar, Sougata Mukherjea, Amit Nanavati, Nitendra Rajput


Almost a year ago, we started working on an exploratory research project, Pyr.mea.IT [1]. "The bottom of the pyramid is the largest but poorest socio-economic group. In global terms, this is the four billion people who live on less than $2 per day, typically in developing countries" [2]. Since almost all of our research in computer science and information technology had, until recently, focused on the top of the pyramid, we thought it would be a good idea to become at least somewhat acquainted with our end users. All of the authors have lived and grown up in India, so we have a reasonable understanding of the people around us, or so we thought. We conducted initial surveys in about 10 cities and towns in India with fruit sellers, milk-delivery men, auto-rickshaw drivers, plumbers, and the like, to get a firsthand idea of how technology, not just IT, affects their lives and how comfortable they are using it.

Two seemingly contradictory things are inescapable in today's India: the lack of literacy and the spread of mobile phones. While the former is long-standing, the latter is a recent phenomenon. Even people whose monthly salary is one-fifth the cost of a mobile phone carry one around (and share it with their family). One milkman we talked to does not use the address book to store and retrieve numbers! He dials the number every time. Another young man plays games on his phone, although he cannot read or write. Without exception, everyone we surveyed uses the mobile phone to talk and stay connected with family and clients. Although sending a text message is often cheaper than making a phone call, illiteracy rules that option out for most of these people. Interestingly, many educated people in India also use their mobiles only for talking. One thing is clear: services relevant to various sections of society either do not exist or have practically unusable interfaces.

Until as recently as four or five years ago, mobile phones were still expensive, and getting a landline connection was complicated. Many plumbers, electricians, and carpenters come to the city from neighboring towns and villages and stay with friends and relatives, so address verification was a problem, and getting a connection could take several months. The processes were slow, and the telecom company faced no competition. As a result, freelance plumbers, electricians, and carpenters used to associate themselves with an electrical shop or a hardware store to find job assignments. People would call these shops for such services, and the shopkeeper would send the workers out on assignments and collect a fee from them. The falling price of mobile phones has changed this system. Workers can now buy a prepaid connection over the counter, gaining independence from the shopkeepers. They get assignments by word of mouth and through inexpensive advertisements in the local yellow pages.

In almost all developing countries, Internet penetration is much lower than mobile-phone penetration, and mobile penetration is growing far faster than Internet penetration. This fact, coupled with the obvious preference for speech interfaces over textual ones among people with limited literacy, led us to the vision of the Telecom Web [3, 4]. The Telecom Web is a worldwide network of Voicesites, just as the World Wide Web is a network of websites. A Voicesite is a voice-driven application made up of voice pages (VoiceXML files, for example) hosted in the telecom infrastructure.
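To make the idea of a voice page concrete, here is a minimal sketch in Python that emits a VoiceXML 2.0 page: it plays a recorded welcome message and then offers a spoken menu whose choices lead to other voice pages. The function `build_voice_page`, the audio file name, and the page layout are hypothetical; the article only says that voice pages may be VoiceXML files, not how its Voicesites are structured.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_voice_page(welcome_audio: str, choices: dict[str, str]) -> str:
    """Return a minimal VoiceXML 2.0 document as a string (hypothetical page layout).

    welcome_audio: URI of a recorded greeting played to the caller.
    choices: spoken keyword -> URI of the voice page that keyword leads to.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    prompt = ET.SubElement(menu, "prompt")
    ET.SubElement(prompt, "audio", {"src": welcome_audio})  # play the owner's recording
    for keyword, target in choices.items():
        # The text of each <choice> is the phrase the caller says; `next` is where it leads.
        choice = ET.SubElement(menu, "choice", {"next": target})
        choice.text = keyword
    return ET.tostring(vxml, encoding="unicode")


print(build_voice_page("welcome.wav", {"working hours": "hours.vxml", "appointment": "book.vxml"}))
```

A `<menu>` is used here because it is the simplest VoiceXML construct that maps spoken keywords to other pages; a real Voicesite would likely need `<field>` and `<grammar>` elements for richer dialogs.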

The Telecom Web operates on the telephony network. People browse Voicesites by talking with them, move from one Voicesite to another via VoiLinks, and even conduct transactions over voice. The Telecom Web figure shows several Voicesites connected to each other via VoiLinks, which make it possible to move from one Voicesite to another by uttering commands or keywords. This creates a "browsing-by-talking" experience that can support "back buttons" ("go to the previous Voicesite"), bookmarks, and so on. Voicesites can be identified by phone numbers, which play the role of URLs. Traversing a VoiLink from one Voicesite to another is more than a simple call transfer: the context of the conversation must also be transferred along with the call [4].
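The browsing behavior described above (VoiLinks keyed by spoken words, a spoken back button, bookmarks, and context carried across a transfer) can be sketched as a small state machine. This is a hypothetical Python model, not the HSTP protocol of [4]: the class names, the command phrases `go back`, `bookmark <name>`, and `go to <name>`, and the idea of copying a context dictionary to the target site are all our assumptions. Speech recognition is assumed to have already turned the caller's speech into text.

```python
from dataclasses import dataclass, field


@dataclass
class Voicesite:
    number: str  # phone number playing the role of a URL
    voilinks: dict[str, str] = field(default_factory=dict)  # spoken keyword -> target number
    received_context: dict[str, str] = field(default_factory=dict)  # context handed over on arrival


@dataclass
class BrowsingSession:
    """One caller browsing Voicesites by talking (hypothetical model, not HSTP itself)."""

    sites: dict[str, Voicesite]
    current: str
    context: dict[str, str] = field(default_factory=dict)  # e.g. the caller's language
    history: list[str] = field(default_factory=list)
    bookmarks: dict[str, str] = field(default_factory=dict)

    def say(self, utterance: str) -> str:
        """Interpret one recognized utterance; return the number of the Voicesite now browsed."""
        words = utterance.strip().lower()
        if words == "go back":
            if self.history:  # the voice equivalent of a browser's back button
                self.current = self.history.pop()
        elif words.startswith("bookmark "):
            self.bookmarks[words[len("bookmark "):]] = self.current
        elif words.startswith("go to ") and words[len("go to "):] in self.bookmarks:
            self._follow(self.bookmarks[words[len("go to "):]])
        elif words in self.sites[self.current].voilinks:
            self._follow(self.sites[self.current].voilinks[words])
        # Anything else is ignored and the caller stays on the current Voicesite.
        return self.current

    def _follow(self, target: str) -> None:
        # More than a plain call transfer: the conversation context travels with the call.
        self.sites[target].received_context = dict(self.context)
        self.history.append(self.current)
        self.current = target
```

Keeping the history and bookmarks in the caller's session, rather than in any one Voicesite, mirrors how a web browser owns its back stack and bookmarks independently of the sites it visits.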

A common objection to such an approach is the frustrating experience most of us have had with voice applications so far. However, we believe there is reason for cautious optimism. In developed regions, alternatives to voice have been available, so expectations are different. For our target users, voice applications will make it possible to do things they have never been able to do, and by starting with small applications [5], we may find the right way to use voice. Just as the proliferation of the World Wide Web hinged on the simplicity of creating a website in HTML, the proliferation of the Telecom Web will depend on how easily Voicesites can be created. We have built a system called VoiGen [6] that lets you create your Voicesite just by making a phone call.

For these micro-business freelancers, a missed call is missed revenue. Now suppose our freelancers had their own Voicesites; this would give them an online presence. What if a potential client could reach a plumber's Voicesite and schedule an appointment with him? We created a plumber template that included questions such as "Enter your welcome message," "What are your working hours," and "Would you like to mention references for your work?" VoiGen records the plumber's answers and uses them to create a Voicesite, so that a potential client who calls the plumber hears the plumber's own voice guiding the client through the various interactions the Voicesite supports. The system can be set up in two ways: calls the plumber is unable to pick up are redirected to his Voicesite, or all calls go to the Voicesite first and the caller is connected to the plumber only if the caller needs to speak with him. VoiGen thus becomes the equivalent of a "talking HTML editor" for creating Voicesites.
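A rough sketch of the two mechanisms just described, template-driven Voicesite creation and the two call-routing options, might look like this in Python. The template prompts are the ones quoted above; everything else (`create_voicesite`, `RoutingMode`, `route_call`, and the `record_answer` callback standing in for recording the owner's voice during the call) is hypothetical, since the article does not describe VoiGen's internals.

```python
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum

# Prompts from the plumber template quoted in the text.
PLUMBER_TEMPLATE = [
    "Enter your welcome message",
    "What are your working hours",
    "Would you like to mention references for your work?",
]


@dataclass
class GeneratedVoicesite:
    owner_number: str
    recordings: dict[str, str]  # template prompt -> location of the owner's recorded answer


def create_voicesite(owner_number: str, record_answer: Callable[[str], str]) -> GeneratedVoicesite:
    """Ask each template question in turn during one call and keep the recorded answers.

    record_answer is a hypothetical hook: it plays a prompt and returns where the reply was stored.
    """
    recordings = {prompt: record_answer(prompt) for prompt in PLUMBER_TEMPLATE}
    return GeneratedVoicesite(owner_number, recordings)


class RoutingMode(Enum):
    VOICESITE_ON_NO_ANSWER = 1  # ring the plumber first; fall back to his Voicesite
    VOICESITE_FIRST = 2  # every call starts at the Voicesite


def route_call(mode: RoutingMode, plumber_answers: bool, caller_asks_for_plumber: bool) -> list[str]:
    """Return the sequence of endpoints a client's call passes through."""
    if mode is RoutingMode.VOICESITE_ON_NO_ANSWER:
        return ["plumber"] if plumber_answers else ["plumber", "voicesite"]
    # VOICESITE_FIRST: connect to the plumber only if the caller needs to speak with him.
    return ["voicesite", "plumber"] if caller_asks_for_plumber else ["voicesite"]
```

For example, `route_call(RoutingMode.VOICESITE_ON_NO_ANSWER, plumber_answers=False, caller_asks_for_plumber=False)` returns `["plumber", "voicesite"]`: the call rings the plumber and then falls through to his Voicesite. The sketch deliberately leaves out what happens if the plumber fails to answer after a Voicesite-first caller asks for him.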

To try this with real target users, we sampled 12 freelancers in South Delhi. None of them had ever used an interactive voice response (IVR) system, let alone browsed the Internet. We explained the idea of having a Voicesite to them, as well as the mechanism for creating one. Ten of the 12 created their Voicesites in under four minutes, including the time it took us to explain things, which suggests that the concept of a Voicesite and the interface for creating one were reasonably compelling and intuitive. Two could not. Our very first session took place in a noisy environment, and that participant did not have the patience to repeat what he was supposed to say; to reduce noise, we moved subsequent sessions into a car. The other failed because he thought he was talking with a human at the other end and assumed that free speech would work.

In parts of the world where Internet penetration is deep and literacy is not an issue, the World Wide Web suffices, and several efforts are under way to make the Web accessible over voice; in such regions, a Telecom Web is superfluous. But in regions where telephone (largely mobile) penetration is far higher than Internet penetration, and rising faster, the Telecom Web has a major role to play: online presence, information, and commerce for everyone. For greater impact, the two webs will have to leverage each other. It should become possible to reach websites from the Telecom Web and Voicesites from the World Wide Web. Excuse me, I hear my Voicesite calling!

References

1. Pyr.mea.IT: Permeating IT towards the Base of the Pyramid. projects.nsf/pages/pyrmeait.index.html.

2. Bottom of the Pyramid.

3. Kumar, A., Rajput, N., Chakraborty, D., Agarwal, S., and Nanavati, A. "WWTW: World-Wide Telecom Web." ACM SIGCOMM Workshop on Networked Systems for Developing Regions, ACM SIGCOMM 2007.

4. Agarwal, S., Chakraborty, D., Kumar, A., Nanavati, A., and Rajput, N. "HSTP: Hyperspeech Transfer Protocol." ACM Conference on Hypertext and Hypermedia, Manchester, UK, 2007.

5. Kumar, A., Rajput, N., Agarwal, S., Chakraborty, D., and Nanavati, A. "Organizing the Unorganized: Employing IT to Empower the Underprivileged." To appear in the 17th International World Wide Web Conference, Beijing, China, 2008.

6. Kumar, A., Rajput, N., Chakraborty, D., Agarwal, S., and Nanavati, A. "VOISERV: Creation and Delivery of Converged Services through Voice for Emerging Economies." IEEE WoWMoM Industry Track, 2007.

Authors

Sheetal K. Agarwal
IBM India Research Laboratory (New Delhi)

Arun Kumar
IBM India Research Laboratory (New Delhi)

Sougata Mukherjea
IBM India Research Laboratory (New Delhi)

Amit A. Nanavati
IBM India Research Laboratory (New Delhi)

Nitendra Rajput
IBM India Research Laboratory (New Delhi)

About the Authors

Sheetal Agarwal is a technical staff member at IBM India Research Lab. She received her master's in computer science from the University of Maryland, Baltimore County. Her research interests include pervasive and ubiquitous computing, mobile applications, and, more recently, information and communication technologies for emerging economies.

Arun Kumar is a research staff member at IBM Research, India. He obtained a master's degree in computer science and engineering in 1999 and is currently pursuing a Ph.D. at IIT Madras, India. He served on the program committee for ACM SAC 2007 and 2008 and has published in reputed international conferences and journals. His research interests include ICT for developing regions, service-oriented computing, object-oriented programming, semantic Web services, and distributed systems.

Sougata Mukherjea is a research staff member and manager of the Telecom Research Innovation Center at the IBM India Research Lab. He received his bachelor's from Jadavpur University, Calcutta, a master's from Northeastern University, and a Ph.D. from Georgia Institute of Technology (all in computer science). Before IBM, he held research and software architect positions at NEC USA, Inktomi, and BEA Systems. His research interests include middleware technologies and their applications to telecom, data analysis, information retrieval, and visualization.

Amit Anil Nanavati is a research staff member at IBM India Research Lab. He has a Ph.D. in computer science from Louisiana State University. Prior to IBM Research, he worked for Netscape Communications. His recent research focus has been on telecom solutions, especially for emerging economies. He is also interested in applications of graph theory in various domains. Before completing his Ph.D., he spent a summer at the Jet Propulsion Laboratory, Caltech, NASA.

Nitendra Rajput has been working as a researcher at the IBM India Research Lab, New Delhi, since March 1998. His areas of interest include speech processing, image processing, and dialog management. He has worked on projects in audio-visual speech recognition, Hindi speech recognition, and conversational systems for pervasive devices. His current work involves applying speech technology interfaces for developing countries. Before joining IBM Research, he completed a master's in communications at IIT Bombay.

Gary Marsden


Figures

Figure. Creating/accessing a Voicesite: Scene from a village in India.


©2008 ACM  1072-5220/08/0300  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2008 ACM, Inc.
