“Zhège,” she said, pointing emphatically at the top right of her iPhone screen. She leaned farther into the gap between the passenger and driver seats of the taxi. Then, lifting her head, she pointed forward through the windshield in a direction that, I assumed, was where we were hoping to soon be headed.
The taxi driver looked at her quizzically.
Undeterred, she repeated the motion, accompanied by a slower, more carefully enunciated rendition of the word: “zhège.” This time she added a new motion. She pointed at the bottom left of her iPhone screen, at herself, at the taxi driver himself, and then at the ground below us. Balletic though this motion was, it did not reduce the look of confusion on the driver’s face.
Gently taking the device from her hand, he studied the screen. A moment later, his expression changed. He smiled and nodded. He stretched out the index finger on his right hand, pointed to the location on the screen she had isolated, and said, “Zhège.” He handed the device back to her, flipped on the meter, and grasped the steering wheel. A second later we accelerated out of the taxi rank. He had understood the point of her point(s).
My traveling partner, Shelly, and I know precisely six words of Chinese. “Zhège,” meaning “this,” is one of them. We cannot actually pronounce any of the words we know with any consistency. Sometimes, people nod in understanding. Mostly they don’t. However, the scenario I painted here is how we navigated two weeks in China. The word “navigated” is intentional: it is about the physical and conceptual charting of courses, the traversal of options. We navigated space and location. We navigated food. We navigated products. We navigated shopping locations, shopping possibilities, and shopping traps (always a concern for tourists, wherever they may be). We did all this navigation speechlessly; owing to our linguistic ignorance, we accomplished it by pointing. We pointed at menus. We pointed at paper and digital maps. We pointed at applications on our phones. We pointed at ourselves. We pointed at desired products. We pointed in space toward unknown distant locations… Basically, we pointed our way to just about all we needed and/or wanted, and we got around Beijing with surprisingly few troubles.
Pointing of this kind is a deictic gesture. The Wikipedia definition of “deixis” is the “phenomenon wherein understanding the meaning of certain words and phrases in an utterance requires contextual information. Words are deictic if their semantic meaning is fixed but their denotational meaning varies depending on time and/or place” [1]. In simpler language, if you point and say “this,” what “this” refers to is fixed to the thing at which you are pointing. In the taxi scenario, it was the location on a map where we wanted to go. Linguists, anthropologists, psychologists, and computer scientists have chewed over deixis for decades, examining when the words “this” and “that” are uttered, how they function in effective communication, and what happens when misunderstandings occur. In his book Lectures on Deixis, Charles Fillmore describes deixis as “lexical items and grammatical forms that can be interpreted only when the sentences in which they occur are understood as being anchored in some social context, that context defined in such a way as to identify the participants in the communication act, their location in space, and the time during which the communication act is performed.” In his 1983 book, Pragmatics, Stephen Levinson states that deixis is “the single most obvious way in which the relationship between language and context is reflected.”
Pointing does not necessitate an index finger. If conversants are savvy to each other’s body movements (that is, their body language), it is possible to point with a minute flicker of the eyes. Even a twitch can be an indicator of where to look for those who are tuned in to the signals. Arguably, the better you know someone, the more likely you will be to pick up on subtle cues because of well-trodden interactional synchrony. But even with unfamiliar others, with whom there is no shared culture or shared experience, human beings as a species are surprisingly good at seeing what others are orienting toward, even when the gesture is not as obvious as an index finger jabbing into the air. Perhaps it is because we are a fundamentally social species with all the nosiness that entails; we love to observe what others are up to, including what they are turning their attention toward. Try it out sometime: Stop in the street and just point. See how many people stop and look in the direction in which you are pointing.
Within the field of human-computer interaction, much of the research on pointing has been done in the context of remote collaboration and telematics. However, pointing has been grabbing my interest of late as a result of a flurry of recent conversations in which it was suggested that we are on the brink of a gestural revolution in HCI.
In human-device/application interaction, deictic pointing establishes the identity and/or location of an object within an application domain. Pointing may be used in conjunction with speech input, but not necessarily. Pointing also does not necessarily imply touch, although touch-based gestural interaction is increasingly familiar to us as we swipe, shake, slide, pinch, and poke our way around our applications. Pointing can be a touchless, directive gesture, in which what is denoted is determined through the use of cameras and/or sensors. Most people’s first exposure to this kind of touchless, gesture-based interaction was when Tom Cruise swatted information around by swiping his arms through space in the 2002 film Minority Report. However, while science fiction interfaces often inspire innovations in technology (it is well worth watching presentations by Nathan Shedroff and Chris Noessel [2] and by Mark Coleran [3] on the relationship between science fiction and the design of nonfiction interfaces, devices, and systems), there really wasn’t anything innovative in the 2002 Minority Report cinematic rendition of gesture-based interaction, nor in John Underkoffler’s [4] presentation of the nonfiction version of it, g-speak, in a TED Talk in 2010 [5]. Thirty years before that talk, Richard Bolt created the “Put that there” system (demoed at the 1984 CHI conference). In 1983 Gary Grimes at Bell Laboratories patented the first glove that recognized gestures, the Digital Data Entry Glove. Pierre Wellner’s work in the early 1990s explored desktop-based, gesture-based interaction, and Thomas Zimmerman and colleagues used gestures to identify objects in virtual worlds using the VPL DataGlove in the mid-1980s.
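To make the camera-and-sensor variant concrete: one common way a system decides what a touchless point denotes is to cast a ray from the pointer’s body through the fingertip and intersect it with the plane of the display. The sketch below is a minimal, hypothetical illustration of that geometry (the coordinate values and function name are my own, not tied to any real sensor API such as the Kinect’s):

```python
# Hypothetical sketch: resolving a touchless pointing gesture to a screen location.
# A depth sensor is assumed to report 3-D positions for the shoulder and fingertip;
# the pointing ray they define is intersected with the plane of the display.

def resolve_point(shoulder, fingertip, screen_origin, screen_normal):
    """Return the 3-D point where the shoulder->fingertip ray meets the screen
    plane, or None if the user is pointing away from (or parallel to) it."""
    # Direction of the pointing ray
    direction = tuple(f - s for f, s in zip(fingertip, shoulder))
    denom = sum(d * n for d, n in zip(direction, screen_normal))
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the screen plane
    # Distance along the ray: t = ((origin - shoulder) . normal) / (direction . normal)
    t = sum((o - s) * n for o, s, n in zip(screen_origin, shoulder, screen_normal)) / denom
    if t < 0:
        return None  # the screen is behind the pointer
    return tuple(s + t * d for s, d in zip(shoulder, direction))

# Example: a person two meters from a screen in the z = 0 plane points
# slightly to their left; the ray lands at x = 1.0 m, y = 1.4 m on the screen.
hit = resolve_point((0, 1.4, 2.0), (0.1, 1.4, 1.8), (0, 0, 0), (0, 0, 1))
```

Real systems layer considerable filtering, calibration, and smoothing on top of this bare geometry, but the deictic core is just this: a ray anchored in the body, and a surface that gives the “this” its referent.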
This is not to undermine the importance of Underkoffler’s demonstration; gesture-based interfaces are now more affordable and more robust than these early laboratory prototypes. Indeed, consumers are experiencing the possibilities every day. Devices like the Nintendo Wii and the Kinect for Xbox 360 system from Microsoft are driving consumer enthusiasm for the idea that swatting digital information with swinging arms is just around the corner. Anecdotally, an evening stroll around my neighborhood during a holiday weekend reveals that a lot of people are spending their evenings jumping around, gesticulating wildly at large TV screens, trying to beat their friends at flailing.
There is still much research to be done here, however. The technologies (their usability, but also the conceptual design space around them) need exploration. For example, current informal narratives around gesture-based computing regularly suggest that gesture-based interactions are more natural than other input methods. But, along with Don Norman, who opined on this topic last year [6], I wonder, what is “natural”? When I ask people this question, I usually get two answers: better for the body and/or simpler to learn and use. One could call these physical and cognitive ergonomics. Frankly, in agreement with Norman, I am not sure I buy either of these answers yet for the landscape of current technologies. I still feel constrained and find myself repeating micro actions with current gesture-based interfaces. Flicking the wrist to control the Wii does not feel natural to me, neither in terms of my body nor in terms of the simulated activity in which I am engaged. Mastering the exact motion on any of these systems feels like cognitive work, too. We may, indeed, have species-specific and genetic predispositions for being able to pick up certain movements more easily than others, but that doesn’t make most physical skills natural, in the sense of being effortless. Actually, with the exception of lying on my couch gorging on chocolate biscuits, I am not sure anything feels very natural to me. I used to be pretty good at the movements for the game “Dance Dance Revolution,” but I would not claim that these movements are in any sense natural, and these skills were hard won through hours of practice. It took hours of stomping in place before stomping felt natural. Postures and motions that some of my more nimble friends call “simple” and “natural” require focused concentration for me. “Natural” is also sometimes used to imply physical skill transfer from one context of execution to another. Not so.
Although the motions in the Wii have a metaphoric or inspired-by relationship to their real-world counterparts, I know I can win a marimba dancing competition by sitting on the sofa twitching, and I can scuba-dive around reefs while lying on the floor more or less motionless, moving only my wrist. And, yes: it is usually better not to throw the Wii remote at the TV even in the most perceptually “realistic” of game play.
An occupational therapist friend of mine claims there would be a serious reduction in repetitive-strain injuries if we could just get everyone full-body gesturing rather than sitting and tapping on keyboards with their eyes staring at screens. A technologist friend said we would soon be able to “personalize” user interfaces, that is, to map physical actions and gestures of our choosing to computer commands. It made me smile to think about the transformation that cube-land offices would undergo if we redesigned them to allow employees to physically engage with digital data through full-body motion. It also perturbs me that I might have to do a series of yoga sun salutations to find my files or deftly execute a “downward-facing dog” pose to send an email. In any case, watching my friends prance around with their Wiis and Kinects gives me pause and makes me think we are still nowhere near anything that is not repetitive-strain-injury-inducing; we are, I fear, far from something of which my occupational therapist friend would truly approve.
We know that different cultures differ in their gestural repertoires. Beyond differences in the kind of gestures that are common to a culture, though, a subtler social point is at play. That you gesture at all can be seen by others as a problem, a sign of their moral superiority. Erasmus’s 1530 bestseller De Civilitate Morum Puerilium published an admonition that translates as “[Do not] shrug or wrygg thy shoulders as we see in many Italians”. Adam Smith compared the English and the French in terms of the plenitude, form, and size of their gesturing: “Foreigners observe that there is no nation in the world that uses so little gesticulation in their conversation as the English. A Frenchman, in telling a story that is of no consequence to him or anyone else, will use a thousand gestures and contortions of his face, whereas a well-bred Englishman will tell you one wherein his life and fortune are concerned without altering a muscle”. Much work was done in the first half of the 20th century on gestural and postural characteristics of different cultural groups. This work was inspired in part by Wilhelm Wundt’s premise in Völkerpsychologie that primordial speech was a gesture and that gesticulation was a mirror to the soul.
Not only that you gesture, but also how you gesture or gesticulate is socially grounded and up for public scrutiny. Even the way people execute what could be seen as the same gesture is socially prescribed and sanctioned; like regional accents in people’s speech, the way you gesture can be a sign of your standing as part of the in-group or your relegation to the out-group. People learn appropriate and inappropriate ways to perform gestures. To Adam Smith’s point, it’s not simply that you need to perform a gesture well enough for others to recognize it; the velocity and magnitude of the gesture are ripe for scrutiny.
A recent trip to Disneyland exemplified how gestures can be interpreted differently across cultures. Disney guides point with two fingers, not just an outstretched index finger but both the index finger and middle finger. When I asked why, I was informed that in some cultures pointing with a single index finger is considered rude. Indeed, the Urban Dictionary calls the two-finger point the “Disney Point,” stating “Cast Members must point using two fingers or their whole hand, as it’s rude to point with one finger. When pointing with one finger a guest may think that the Cast Member is pointing at him/her” [9]. The tidbits of wisdom peppering the Internet suggest that cultural concerns about pointing are rife. A (draft) Wikipedia page on etiquette in North America states “Pointing is to be avoided, unless specifically pointing to an object and not a person” [10]. An informal, comparative “study” of cafe-based interactions in the U.S., Canada, and Australia suggests to me that most people, in Western culture at least, are blissfully unaware of this particular gem of everyday etiquette. However, it may be that pointing in general is not the problem; rather, it could be that just pointing with a finger is the social faux pas. Possibly apocryphally, I was told that in some Native American cultures it is considered appropriate to point with the nose. And that some cultures are quite happy with lip pointing (I note that lip pointing looked more like I was learning to blow a kiss or executing a distorted pout when I tried it).
So why bother with this pondering on pointing? I am wondering what research lies ahead as this gestural interface revolution takes hold. What are we, as designers and developers, going to observe and going to create? Don Norman offers some cautionary tales, some pitfalls, and some excellent design suggestions of which we should be aware, including “well-defined modes of expression,” “a clear conceptual model” of system interaction, “means of navigating unintended consequences,” and, of course, a way to undo [6]. But beyond those, what are we going to do to get systems learning with us as we point, gesture, gesticulate, and communicate? As humans, we know that getting to know someone often involves a subtle mirroring of posture, the development of an interpersonal choreography of motion: I learn how you move and learn to move as you move, in concert with you, creating a subtle feedback loop of motion that signifies connection and intimacy. Will this happen with our technologies? And how will they manage with multiple masters and mistresses of micro-motion, of physical-emotional choreography? More prosaically, as someone out and about in the world, as digital interactions with walls and floors become commonplace, am I going to be struck by people pointing? (Who has not been run into by people looking at their mobile phone screens, walking without looking where they are going?) Am I going to be abashed or offended by their ways of pointing? Julie Rico and Stephen Brewster of Glasgow University in Scotland have been doing field and survey work on just this, addressing how social settings affect the acceptability of interactional gestures. What would people prefer not to do in public when interacting with digital devices, and how much difference does it make if they do or don’t know others who are present? Head nodding and nose tapping apparently are more likely to be unacceptable than wrist rotation and foot tapping.
And what happens when augmented reality becomes a reality and meets gestural interaction? Remembering that to thumb one’s nose at someone is the highest order of rudeness and the cause of many deadly fights in Shakespearean plays, I may not even be able to see what you are thumbing your nose at, and may assume, for lack of a shared referent, that it is in fact me, not the unseen, digital interlocutor at whom the gesture is directed. And finally, will our digital devices also develop subtle sensibilities about how a gesture is performed beyond simple system calibration? Will they ignore us if we are being culturally rude? Or will they accommodate us, just as the poor taxi driver in China did, forgiving us for being linguistically ignorant, and possibly posturally and gesturally ignorant, too? I confess: I don’t know if pointing with one index finger is rude in China or not. I didn’t have the language, spoken or body, to find out.
1. Deixis; http://en.wikipedia.org/wiki/Deixis
2. Noessel, C. and Shedroff, N. Make it so: Learning from SciFi interfaces. UX Week 2010; http://www.youtube.com/watch?v=JMlyO8F0jxg
3. Coleran, M. The reality of fantasy. UX Week 2010; http://www.youtube.com/watch?v=Ep4nLFjEu20
4. John Underkoffler was the designer of Minority Report’s interface. The g-speak tracks hand movements and allows users to manipulate 3-D objects in space. See also SixthSense, developed by Pranav Mistry at the MIT Media Lab.
5. John Underkoffler points to the future of UI; http://www.ted.com/talks/john_underkoffler_drive_3d_data_with_a_gesture.html
6. Norman, D. Natural user interfaces are not natural. 2010; http://jnd.org/dn.mss/natural_user_interfaces_are_not_natural.html
9. The Urban Dictionary on the “Disney point”; http://www.urbandictionary.com/define.php?term=disney%20point
10. Etiquette in North America; http://en.wikipedia.org/wiki/Etiquette_in_North_America
Elizabeth Churchill is a principal research scientist and manager of the Internet Experiences Group at Yahoo! Research. She is also the Vice President of ACM SIGCHI. Her research focuses on social media.
If you are bewildered by the array of work on gesture-based interaction that has been published, it is useful to have a framework. Happily, one exists. In her Ph.D. thesis, Maria Karam elaborates a taxonomy of ways of looking at gesture-based interaction in the literature on human-computer interaction. Also, in a very readable technical report written with m.c. schraefel that draws on work by Francis Quek from 2002 and earlier work by Alan Wexelblat in the late 1990s, she lays out the following ways of carving up the literature: by application domain, by emerging technologies, by system response to input, and by gesture types. The gesture styles, elaborated with examples, are gesticulation, manipulations, semaphores, deictic gestures, and language gestures.
©2011 ACM 1072-5220/11/0900 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2011 ACM, Inc.