Alan Schultz, J. Trafton
Imagine mobile robots of the future, working side by side with humans, collaborating in a shared workspace. For this to become a reality, robots must be able to do something that humans do constantly: understand how others perceive space and the relative positions of objects around them. They need the ability to see things from another person's point of view. Our research group and others are building computational, cognitive, and linguistic models that can deal with frames of reference. The issues include handling constantly changing frames of reference and changes in spatial perspective, understanding what actions to take, and using new words and common ground.
Our approach is an implementation informed by cognitive and computational theories. It is based on developing computational cognitive models (CCMs) of certain high-level cognitive skills that humans possess and that are relevant for collaborative tasks. We then use these models as reasoning mechanisms for our robots. Why do we propose using CCMs as opposed to more traditional programming paradigms for robots? We believe that by giving robots representations and reasoning mechanisms similar to those used by humans, we will build robots that act in a way that is more compatible with humans.
Our foray into this area started when we were developing computational cognitive models of how young children learn the game of hide and seek. The purpose was to enable our robots to use human-level cognitive skills to decide where to look for people or for things hidden by people. The research resulted in a hybrid architecture with a reactive/probabilistic system for robot mobility and a high-level cognitive system based on ACT-R that made the high-level decisions about where to hide or seek (depending on which role the robot was playing). Videos of the robot playing a game of hide and seek can be seen at www.nrl.navy.mil/aic/iss/aas.
While this work was interesting in its own right, the system led us to the realization that the ability to do perspective-taking is a critical cognitive ability for humans, particularly when they want to collaborate.
To determine just how important perspective and frames of reference are in collaborative tasks in shared space (and also because we were working on a DARPA-funded project to move these capabilities to the NASA Robonaut), we analyzed a series of tapes of two astronauts and a ground controller training in the NASA Neutral Buoyancy Tank facility for an assembly task for Space Station mission 9A. We performed a protocol analysis of several hours of these tapes, focusing on the use of spatial language and on commands from one person to another. We found that the astronauts changed their frame of reference (as seen in their dialog) approximately every other utterance. As an example of how prevalent these changes in frame of reference are, consider the following utterance from ground control:
"... if you come straight down from where you are, uh, and uh, kind of peek down under the rail on the nadir side, by your right hand, almost straight nadir, you should see the..."
Here we see five changes in frame of reference (highlighted in italics) in a single sentence! These rates of change in frame of reference are consistent with work by Franklin, Tversky, and Coon (1992). In addition, we found that the astronauts had to take other perspectives, or forced others to take their perspective, about 25 percent of the time. Clearly, the ability to handle changing frames of reference and to understand spatial perspective will be a critical skill for robots such as NASA Robonaut and, we would argue, for any other robotic system that needs to communicate with people in spatial contexts (e.g., construction tasks, direction giving, etc.).
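To make the idea of a change in frame of reference concrete, here is a minimal sketch of how one egocentric term can be reinterpreted from a listener's point of view. It is an illustration only, not the models described in this article: it assumes a flat 2D world and headings given in radians, whereas the astronauts' language also uses terms such as nadir that require 3D frames. The names EGOCENTRIC, rotate, and reinterpret are hypothetical.

```python
import math

# Unit vectors for egocentric terms in a body frame (assumption: x is "front", y is "left").
EGOCENTRIC = {
    "front": (1.0, 0.0),
    "left": (0.0, 1.0),
    "back": (-1.0, 0.0),
    "right": (0.0, -1.0),
}


def rotate(vector, angle):
    """Rotate a 2D vector counterclockwise by angle (radians)."""
    x, y = vector
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


def reinterpret(term, speaker_heading, listener_heading):
    """Map a term in the speaker's egocentric frame to the closest term in the listener's."""
    # Speaker's body frame -> world frame.
    world = rotate(EGOCENTRIC[term], speaker_heading)
    # World frame -> listener's body frame.
    local = rotate(world, -listener_heading)
    # Pick the listener-frame term whose direction best matches (largest dot product).
    return max(
        EGOCENTRIC,
        key=lambda name: local[0] * EGOCENTRIC[name][0] + local[1] * EGOCENTRIC[name][1],
    )


# Speaker and listener face each other: the speaker's "right" is the listener's "left".
facing_each_other = reinterpret("right", 0.0, math.pi)
print(facing_each_other)
```

Even in this simplified setting, the same word refers to a different direction depending on whose frame is in use, which is why a robot must track the frame behind each spatial term.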
Imagine the following task. An astronaut and his robotic assistant are working together to assemble a structure in shared space. The human says to the robot, "Pass me the wrench." From the robot's point of view, two wrenches are visible, but the human has a partially occluded view and can see only one of them. What should the robot do? Evidence suggests that humans, in similar situations, will pass the wrench that they know the other human can see, since this is a jointly salient feature.
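The decision rule in this scenario can be sketched in a few lines. This is a hedged illustration of the joint-salience idea only, not the ACT-R/S or Polyscheme models: it assumes the robot has already computed which objects the human can see, which is the hard part of perspective taking. SceneObject, resolve_reference, and the object names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    name: str
    kind: str


def resolve_reference(kind, robot_visible, human_visible):
    """Return candidate referents of the given kind, preferring jointly visible ones."""
    # Everything of the requested kind that the robot can see.
    robot_candidates = [obj for obj in robot_visible if obj.kind == kind]
    # Narrow to objects the human can also see (jointly salient).
    joint = [obj for obj in robot_candidates if obj in human_visible]
    # Fall back to the robot's own candidates if nothing is jointly visible.
    return joint if joint else robot_candidates


wrench_a = SceneObject("wrench_a", "wrench")
wrench_b = SceneObject("wrench_b", "wrench")

robot_view = [wrench_a, wrench_b]
human_view = {wrench_a}  # assumption: wrench_b is occluded from the human

choice = resolve_reference("wrench", robot_view, human_view)
print([obj.name for obj in choice])
```

If more than one candidate remains after filtering, the request is still ambiguous and the robot would need another strategy, such as asking for clarification.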
We developed two models of perspective taking that could handle the above scenario in a general sense. The first approach used the ACT-R/S system to model perspective taking using a cognitively plausible spatial representation. The second approach used Polyscheme and modeled the cognitive process of mental simulation; humans tend to mentally simulate situations in order to resolve problems.
Using these models we have demonstrated a robot being able to solve problems similar to the wrench problem. Videos of a robot and human in this task can be seen at http://www.nrl.navy.mil/aic/iss/aas/.
It is clear that if humans are to work as peers with robots in shared space, the robot must be able to understand the natural human tendency to use different frames of reference and must be able to take the human's perspective. To create robots with these capabilities, we propose using CCMs, as opposed to more traditional programming paradigms for robots, for three reasons. First, a natural and intuitive interaction results in reduced cognitive load. Second, more predictable behavior engenders trust. Finally, more understandable decisions allow the human to recognize and more quickly repair mistakes.
1. J. G. Trafton, A. C. Schultz, N. L. Cassimatis, L. Hiatt, D. Perzanowski, D. P. Brock, M. Bugajska, and W. Adams (2004). "Using Similar Representations to Improve Human-Robot Interaction," Agents and Architectures (in press), Erlbaum, 2004.
2. N. Cassimatis, J. G. Trafton, M. Bugajska, and A. C. Schultz (2004). "Integrating Cognition, Perception, and Action through Mental Simulation in Robots." Robotics and Autonomous Systems, 49(1-2), Elsevier, Nov. 2004, pp. 13-23.
8. Harrison, A. M., & Schunn, C. D. (2002). ACT-R/S: A computational and neurologically inspired model of spatial reasoning. In W. D. Gray & C. D. Schunn (Eds.), Proceedings of the Twenty Fourth Annual Meeting of the Cognitive Science Society (pp. 1008). Fairfax, VA: Lawrence Erlbaum Associates.
Alan C. Schultz is head of the Intelligent Systems Section, Navy Center for Applied Research in Artificial Intelligence at the Naval Research Laboratory (NRL) in Washington, D.C. He has 18 years of experience and over 70 publications in robotics, human-robot interaction, and machine learning, and is responsible for establishing and running the robotics laboratory at NRL. Mr. Schultz taught at the first IEEE/RAS Summer School on Human-Robot Interaction, and has chaired many conferences and workshops in robotics and human-robot interaction. His research is in the areas of human-robot interaction, machine learning, autonomous robotics, and adaptive systems. email@example.com
Greg Trafton is a cognitive scientist at the Naval Research Laboratory. He received a B.S. in computer science (second major in psychology) from Trinity University in San Antonio, TX in 1989. He received a master's and Ph.D. in psychology from Princeton University in 1994. He is interested in putting appropriate computational cognitive models on robots to facilitate human-robot interaction. firstname.lastname@example.org
©2005 ACM 1072-5220/05/0300 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2005 ACM, Inc.