Candace Sidner, Christopher Lee
At Mitsubishi Electric Research Laboratories, we are investigating the role of hosting activities with humanoid robots. Hosting activities are activities where an agent in an environment provides services, particularly information and entertainment services. They include tasks such as being a docent in a museum or host in a laboratory, and greeting and guiding visitors in stores and shopping malls. This research is driven by three scientific questions:
- How do people convey connection both verbally and non-verbally in their interactions with one another?
- Given what is learned from the above, how can robots convey engagement in interactions with people?
- Under the constraints of current robotics, vision and language technologies, can humanoid robots act as hosts to people and successfully engage with them?
Our robot behavior has been informed by study of human-to-human interactions involving laboratory hosting. From videotapes of people hosting visitors to our lab, we have extracted an understanding of the engagement process, the process by which people begin, maintain and end their perceived connection to one another. While the abilities to converse and collaborate are central to this process, non-verbal behaviors are also critical; getting them wrong is likely to cause a human to misinterpret the robot’s behavior.
In our study we focused on the use of the face to look at another or the surrounding environment as a feature of engagement. In general, looking at another conveys engagement, while long looks away indicate a decline in maintaining engagement. However, since people cannot and do not always look at their conversational partners, even when engaged, what signals their engagement? Experience from that study has resulted in the principle of conversational tracking: participants in a collaborative conversation track the other’s face during the conversation in balance with the requirement to look away to participate in actions relevant to the collaboration, or multi-task activities unrelated to the collaboration at hand, such as scanning the surrounding scene for interest, avoidance of damaging encounters, or personal activities.
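The observation above, that looks at the partner or at task-relevant objects are compatible with engagement while long looks elsewhere suggest a decline, can be sketched as a simple labeling rule. This is only an illustration under stated assumptions: the target labels, the `Look` type, and the three-second threshold are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Look:
    # Hypothetical target labels: "partner", "task_object", or "elsewhere".
    target: str
    seconds: float


def engagement_signals(looks: List[Look], max_away_seconds: float = 3.0) -> List[str]:
    """Label each look according to what it suggests about engagement.

    max_away_seconds is an assumed threshold, not a measured value.
    """
    labels = []
    for look in looks:
        if look.target in ("partner", "task_object"):
            # Tracking the partner's face, or looking at something the
            # collaboration requires, is consistent with engagement.
            labels.append("engaged")
        elif look.seconds <= max_away_seconds:
            # Short glances away are normal multi-tasking.
            labels.append("brief_glance_away")
        else:
            # Long looks away indicate a possible decline in engagement.
            labels.append("possible_disengagement")
    return labels


looks = [
    Look("partner", 2.0),
    Look("task_object", 5.0),
    Look("elsewhere", 1.0),
    Look("elsewhere", 6.0),
]
labels = engagement_signals(looks)
```

A real system would need to estimate gaze targets from vision and would likely weigh looks against the state of the conversation rather than apply a fixed threshold.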
Lessons learned from the human-to-human study are used in developing a robot acting as a laboratory host to a visitor in our lab. The MERL robot, named Mel, depicts a penguin and is illustrated in Figure 1. It is a seven-degrees-of-freedom robot, with one dof in its beak, two in each wing, and two in its head. It uses a stereoptic camera for viewing its human partner, and a single lens camera for other vision. It does not move about, but, from a stationary position, collaborates with a human visitor to demonstrate laboratory equipment called IGlassware. The equipment with the robot is shown in Figure 1. The hosting process includes beginning the interaction (finding a person to host), undertaking a hosting activity collaboratively with the person, and ending the interaction when the goals of host and visitor are met.
The MERL robot combines sensory input from a vision system and a sound location system (for finding a person to talk to), and provides the fused sensory information to the cognitive component. That component consists of a speech recognition system providing input to the Collagen system to interpret utterances and participate in conversations and collaborations. The Collagen system was developed at MERL for use with collaborative interfaces. The robot also has motor output to control its head and limb movements during the conversation, so that it can track the human visitor’s face, point to objects during the demo, direct its attention to or away from the visitor and indicate beat movements. Motor control is partially determined by information from the conversational state because the robot chooses some of its movements based on the conversational process.
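The flow described above, from fused sensing to motor choices that depend on the conversational state, might be sketched as follows. This is a minimal illustration, not the robot's actual software: the `Percept` type, the bearing representation, the state names, and the fallback rule (prefer the seen face, otherwise use the sound bearing) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Percept:
    # Bearings in degrees relative to the robot; None means "nothing detected".
    face_bearing_deg: Optional[float]    # from the vision system
    sound_bearing_deg: Optional[float]   # from the sound location system


def fuse(percept: Percept) -> Optional[float]:
    """Assumed fusion rule: trust a visible face, else fall back to sound."""
    if percept.face_bearing_deg is not None:
        return percept.face_bearing_deg
    return percept.sound_bearing_deg


def choose_head_target(
    person_bearing_deg: Optional[float],
    conversational_state: str,
    object_bearing_deg: float = -30.0,
) -> Optional[float]:
    """Pick where the head should point, given a hypothetical state name."""
    if person_bearing_deg is None:
        return None  # nobody to attend to yet
    if conversational_state == "pointing_at_object":
        # The conversation calls for directing attention to a demo object.
        return object_bearing_deg
    # Otherwise keep tracking the visitor's face.
    return person_bearing_deg


bearing = fuse(Percept(face_bearing_deg=None, sound_bearing_deg=40.0))
target = choose_head_target(bearing, "listening")
```

The point of the sketch is the dependency: the motor decision takes the conversational state as an input rather than reacting to sensor data alone.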
Our robot tracks its human visitor with its vision system. In contrast, the robot does not expect its human visitor to track the robot closely, in part because a robot is not a human counterpart. However, the robot does encourage the human partner to look at demo objects during their conversational interaction and can determine if the human has done so. The robot also expects that the human visitor will take appropriate turns in the conversation, and when the human fails to do so, it queries about the human’s interest in continuing the demo.
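The turn-taking expectation described above can be illustrated with a small monitor that asks about continuing when the visitor stays silent too long. This is a sketch under stated assumptions: the `TurnMonitor` class, its method names, the action labels, and the ten-second timeout are hypothetical and do not describe how Mel or Collagen implement this behavior.

```python
from typing import Optional


class TurnMonitor:
    """Tracks whether the human has taken an expected conversational turn."""

    def __init__(self, timeout_seconds: float = 10.0):
        # Assumed timeout; the article does not state a value.
        self.timeout_seconds = timeout_seconds
        self.waiting_since: Optional[float] = None

    def robot_finished_turn(self, now: float) -> None:
        # The robot has spoken and now expects a reply.
        self.waiting_since = now

    def human_took_turn(self) -> None:
        # The visitor responded, so nothing is pending.
        self.waiting_since = None

    def next_action(self, now: float) -> str:
        if self.waiting_since is None:
            return "continue"
        if now - self.waiting_since >= self.timeout_seconds:
            # The visitor has not taken a turn: query their interest.
            return "ask_about_continuing"
        return "wait"


monitor = TurnMonitor(timeout_seconds=10.0)
monitor.robot_finished_turn(now=0.0)
early = monitor.next_action(now=5.0)
late = monitor.next_action(now=10.0)
```

Times are passed in explicitly so the logic can be exercised without a real clock; a deployed version would read them from the system timer.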
We have conducted several experiments with human visitors interacting with the robot. They collaborate (with no training) in a laboratory demonstration with the robot. Our first experiments were used to judge the success of such demos and the robot’s engagement abilities. A sample conversation is found in Figure 3, although the human’s and robot’s gestures are not described in the figure.
More recent experiments have focused on improving the robot’s ability to understand nodding in conversation.
Does the robot’s engagement gestural behavior have an impact on the human partner? The answer is a qualified yes. While details are reported elsewhere, in summary, all participants could perform the lab demo successfully in a conversational manner. They reported liking the robot and the interaction. A majority of participants in two different conditions were found to turn their gaze to the robot whenever they took a turn in the conversation, an indication that the robot was real enough to be worthy of conversation. Furthermore, participants in a condition where the robot moved in the ways discussed above looked back at the robot significantly more often than a control group interacting with a non-moving robot whenever they were attending to the demonstration in front of them. The participants with the moving robot also responded by following the robot’s gaze at the table somewhat more than the other participants. A participant interacting with our robot is shown in Figure 2.
Future plans for our robot include further experiments concerning nodding behavior. Recently Mel has become mobile. Using mobile Mel, we plan to explore the robot’s use of locomotion to find a person to interact with, as well as to understand how body stance during an interaction contributes to conveying engagement.
The authors wish to acknowledge the work of Charles Rich and Neal Lesh on aspects of Collagen and Mel, and Cory Kidd in experiments on Mel with people.
1. C.L. Sidner, C.H. Lee, and N. Lesh. "Engagement when looking: behaviors for robots when collaborating with people," in DiaBruck: Proceedings of the 7th Workshop on the Semantics and Pragmatics of Dialogue, I. Kruijff-Korbayová and C. Kosny (eds.), University of Saarland, 2003, pp. 123-130.
2. C. Rich, C.L. Sidner, and N. Lesh. "COLLAGEN: Applying Collaborative Discourse Theory to Human-Computer Interaction," AI Magazine, Special Issue on Intelligent User Interfaces, AAAI Press, Menlo Park, CA, Vol. 22, No. 4, pp. 15-25, 2001.
4. C. Lee, N. Lesh, C. Sidner, L. Morency, A. Kapoor, T. Darrell. "Nodding in Conversations with a Robot," in Proceedings of the ACM International Conference on Human Factors in Computing Systems, ACM Press, 2004.
6. J. Cassell. "Nudge nudge wink wink: Elements of face-to-face conversation for embodied conversational agents," in Embodied Conversational Agents, J. Cassell, J. Sullivan, S. Prevost, and E. Churchill (eds.), Cambridge, MA: MIT Press, 2000.
Candace L. Sidner is a senior research scientist at Mitsubishi Electric Research Labs. Her research concerns natural language processing and human-computer collaboration. She is a fellow and past councillor of the American Association for Artificial Intelligence, a senior member of IEEE, and a past president of the Association for Computational Linguistics. She received a Ph.D. in Computer Science from MIT. firstname.lastname@example.org
Christopher Lee is a Visiting Scientist at the Mitsubishi Electric Research Laboratories in Cambridge, MA. He received the degree of Ph.D. in Robotics from Carnegie Mellon University in 2000, and was a Postdoctoral Associate at the Massachusetts Institute of Technology from 2000-2002. His research is motivated by the potential of robots to work with and to learn from people. email@example.com
©2005 ACM 1072-5220/05/0300 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.