When people communicate with each other about spatially oriented tasks, they tend to use qualitative spatial references rather than precise quantitative terms. For example, verbal instructions may include phrases such as "hand me the wrench in the top drawer of the toolbox," or "go through the double doors and turn left at the elevator." These examples include the spatial references in, top, through, and left. Maps may also be used to give navigation directions, but these are often informal, hand-drawn sketches. Such route maps are typically not drawn to scale, but they include qualitatively accurate landmark positions and turns so that the user can reach the destination.
Although natural for people, such interfaces are problematic for robots, which "think" and move in terms of mathematical expressions and numbers. Yet giving robots the ability to understand and communicate with these spatial references has great potential for creating a more natural interface for robot users. It would allow them to instruct and interact with a robot much as they would with another human.
Motivated by this vision, we have been working on human-robot interfaces that use qualitative spatial referencing, both in spatial language and in sketching interfaces. The spatial referencing is based on Matsakis' histogram of forces, which captures a quantitative model of the spatial relationship between two objects.
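To make the histogram idea concrete, here is a minimal Python sketch that approximates a histogram of forces for two objects represented as sampled 2D points. It is a simplification and not Matsakis' algorithm, which integrates forces along longitudinal sections of the objects. The point-pair approximation, the function names, and the 36-bin layout are illustrative assumptions. The exponent r selects the kind of force: r = 0 gives constant forces and r = 2 gives gravitational forces.

```python
import math

def force_histogram(argument, referent, r=0.0, bins=36):
    """Approximate the histogram of forces F_r between two point-sampled objects.

    Simplified point-pair version (an assumption for illustration): every pair
    (a, b) with a in `argument` and b in `referent` adds a force of magnitude
    1 / d**r to the bin for the direction from b to a. The histogram therefore
    shows in which directions the argument lies relative to the referent.
    Bin k is centred on k * (360 / bins) degrees, with 0 degrees along +x.
    """
    width = 360.0 / bins
    hist = [0.0] * bins
    for ax, ay in argument:
        for bx, by in referent:
            dx, dy = ax - bx, ay - by
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue  # coincident samples have no direction
            theta = math.degrees(math.atan2(dy, dx)) % 360.0
            k = int(((theta + width / 2.0) % 360.0) // width)
            hist[k] += 1.0 / d ** r
    return hist

def main_direction(hist):
    """Centre (degrees) of the strongest bin, a crude stand-in for histogram features."""
    width = 360.0 / len(hist)
    return max(range(len(hist)), key=hist.__getitem__) * width

if __name__ == "__main__":
    # A small box sampled along +y of a robot treated as a single point.
    box = [(x, y) for x in (-1.0, 0.0, 1.0) for y in (4.0, 5.0)]
    print(main_direction(force_histogram(box, [(0.0, 0.0)], r=2.0)))  # 90.0
```

With gravitational forces (r = 2), closer parts of an object weigh more. With constant forces (r = 0), every point pair counts equally. Comparing the two is one way to get extra shape information about the relationship.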
The language developed for robots is an extension of a system originally created for automated scene analysis of images. A representation of the environment is built using range sensors on a mobile robot. A robot-centric view is created by computing force histograms between the robot and each sensed environment object. Features are then extracted from the histograms and fed into a system of fuzzy rules. These rules output a three-part linguistic expression: a primary direction (e.g., "mostly to the left"), an optional secondary direction (e.g., "and a little forward"), and an assessment of how well these expressions satisfy the relationship. If the assessment is poor, other relationships are investigated, such as different levels of being surrounded (e.g., "surrounded on the left," "surrounded with an opening on the right," and "completely surrounded"). A qualitative range expression (e.g., close, very close) is also generated for each object, based on the shortest distance between the object and the robot. In addition, objects may be grouped and described with a coarse linguistic phrase (e.g., "there is a group of objects in front").
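The shape of that output can be illustrated with a heavily simplified Python sketch. It assumes a single robot-centric angle (0 degrees forward, 90 degrees left) in place of the histogram features. It also uses triangular fuzzy memberships in place of the actual rule base. The thresholds, range cut-offs, and the word "far" are hypothetical. The assessment and "surrounded" relations are omitted.

```python
# Robot-centric convention assumed here: 0 deg = forward, 90 = left,
# 180 = behind, 270 = right (counterclockwise angles).
DIRECTIONS = {"forward": 0.0, "to the left": 90.0,
              "behind": 180.0, "to the right": 270.0}

def membership(angle, center, half_width=90.0):
    """Triangular fuzzy membership of `angle` in the direction centred at `center`."""
    diff = abs((angle - center + 180.0) % 360.0 - 180.0)  # smallest angular gap
    return max(0.0, 1.0 - diff / half_width)

def range_term(distance_m):
    """Qualitative range from the shortest distance (hypothetical cut-offs)."""
    if distance_m < 0.5:
        return "very close"
    if distance_m < 1.5:
        return "close"
    return "far"

def describe(angle, distance_m):
    """Primary direction, optional secondary direction, and a range term."""
    ranked = sorted(((membership(angle, c), name) for name, c in DIRECTIONS.items()),
                    reverse=True)
    (p, primary), (s, secondary) = ranked[0], ranked[1]
    hedge = "" if p >= 0.95 else "mostly "  # hypothetical threshold
    text = f"The object is {hedge}{primary}"
    if s >= 0.2:  # hypothetical threshold for mentioning a secondary direction
        text += f" and a little {secondary}"
    return f"{text}; it is {range_term(distance_m)}."

if __name__ == "__main__":
    print(describe(60.0, 1.0))
    # The object is mostly to the left and a little forward; it is close.
```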
Spatial linguistic terms are also used to direct a mobile robot. The robot may be instructed to position itself to the left or right of an object, or in front of or behind an object. It may also be asked to find the nearest object on its right, left, front, or rear, or in a diagonal direction such as right-front. In addition, the robot can be instructed to go between two objects. Each of these commands relies on an algorithm that computes a target position for the robot with respect to its acquired environment map.
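A target-position computation of this kind might look like the following Python sketch. The geometry is an assumption, not the published algorithm. "Between" is taken as the midpoint of the narrowest gap between two objects. "Left of" or "right of" is taken as an offset perpendicular to the robot's line of sight to the object, cleared by a rough object radius. Objects are lists of (x, y) map points, and y points up so that "left" is counterclockwise.

```python
import math

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def between(obj_a, obj_b):
    """Target midway across the narrowest gap between two objects (point lists)."""
    p, q = min(((p, q) for p in obj_a for q in obj_b),
               key=lambda pq: math.dist(pq[0], pq[1]))
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def beside(robot, obj, side, clearance=0.5):
    """Target to the 'left' or 'right' of an object as seen from the robot."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    cx, cy = centroid(obj)
    dx, dy = cx - robot[0], cy - robot[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        raise ValueError("robot is at the object's centroid")
    ux, uy = dx / d, dy / d                               # unit vector robot -> object
    nx, ny = (-uy, ux) if side == "left" else (uy, -ux)   # perpendicular to it
    extent = max(math.dist((cx, cy), p) for p in obj)     # rough object radius
    offset = extent + clearance
    return (cx + nx * offset, cy + ny * offset)
```

For a square box centred at (5, 0) with the robot at the origin, `beside(..., "left")` returns a point at about (5, 1.91). That is the box's half-diagonal plus 0.5 m of clearance on the robot's left.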
Using these simple concepts in a human-robot dialog, the user can ask about detected landmarks and direct the robot around its environment with respect to those landmarks. The dialog also helps overcome limitations in the robot's vision system, because the user can name environment landmarks using the spatial referencing language. For example, the user can tell the robot, "The closest object on your right is a desk," and then say, "Go behind the desk." Figure 1 illustrates an example.
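The naming step in that example can be sketched in Python as a small store that binds a user-supplied word to whichever sensed object satisfies the spatial reference. The class, the 90-degree sectors, and the use of object centroids to decide the sector are hypothetical simplifications for illustration. The robot sits at the origin facing +x, with +y to its left.

```python
import math

# Robot-centric frame assumed: robot at the origin facing +x, +y to its left.
SECTORS = {"front": 0.0, "left": 90.0, "behind": 180.0, "right": -90.0}

def bearing(point):
    return math.degrees(math.atan2(point[1], point[0]))

def in_sector(angle, center, half_width=45.0):
    return abs((angle - center + 180.0) % 360.0 - 180.0) <= half_width

class LandmarkMemory:
    """Hypothetical store linking user-supplied names to sensed objects."""

    def __init__(self, objects):
        self.objects = objects  # {object_id: [(x, y), ...]} from the range map
        self.labels = {}        # {name: object_id}

    def closest_in(self, sector):
        """Id of the nearest object whose centroid lies in `sector`, or None."""
        candidates = []
        for oid, pts in self.objects.items():
            cx = sum(p[0] for p in pts) / len(pts)
            cy = sum(p[1] for p in pts) / len(pts)
            if in_sector(bearing((cx, cy)), SECTORS[sector]):
                candidates.append((min(math.hypot(*p) for p in pts), oid))
        return min(candidates)[1] if candidates else None

    def name(self, sector, label):
        """Handle 'The closest object on your <sector> is a <label>.'"""
        oid = self.closest_in(sector)
        if oid is not None:
            self.labels[label] = oid
        return oid

    def lookup(self, label):
        """Resolve the object in a later command such as 'Go behind the <label>'."""
        return self.labels.get(label)
```

Once a label is bound, a later command can resolve "the desk" to an object id. It can then reuse the target-position computation for "behind."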
In creating such an interface, many supporting components must also be developed. The system described here uses the multimodal framework developed at the Naval Research Lab, which includes natural language understanding, an evidence grid map, planning, and obstacle avoidance. One challenge in supporting the robot spatial language was to modify the semantic representations of the language system to support generalized expressions for different spatial references. See  for details.
One of the biggest challenges in this work has been designing and running user studies that capture how people interpret spatial references. We have had to draw on studies in psychology and computational linguistics, fields not traditionally studied by robotics engineers. Developing the algorithms that compute spatial references has been less difficult than determining which references a person would use and how those references would be interpreted. Our studies have identified some interesting variations and cultural differences that affect spatial referencing.
Another challenge has been incorporating vision-based object recognition. To make the interface usable in realistic settings, the robot should be able to reliably recognize at least some environment objects and landmarks. It should also be able to learn to recognize additional objects through the dialog. To this end, we have been investigating the SIFT algorithm proposed by Lowe. Scale-invariant keypoints are computed to represent an object and are matched against those stored in a database. An object may be learned from a single example image.
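The database-matching step can be sketched in Python under a few stated assumptions. Descriptors are taken as already computed by an existing SIFT implementation (for example, OpenCV's `cv2.SIFT_create().detectAndCompute`) and are represented here as plain tuples. Matching uses the common nearest-to-second-nearest distance ratio test associated with Lowe, with his suggested 0.8 threshold. Recognition is a simple vote with a hypothetical minimum of three matches. Lowe's full method also clusters matches and verifies object geometry, which is omitted here.

```python
import math
from collections import Counter

class KeypointDatabase:
    """Toy object database over SIFT-style descriptor vectors."""

    def __init__(self, ratio=0.8):
        self.ratio = ratio   # nearest / second-nearest distance threshold
        self.entries = []    # list of (descriptor, object label)

    def learn(self, label, descriptors):
        """Learn an object from the descriptors of a single example image."""
        self.entries.extend((tuple(d), label) for d in descriptors)

    def match(self, query):
        """Label of the nearest stored descriptor if it passes the ratio test."""
        if len(self.entries) < 2:
            return None
        ranked = sorted(self.entries, key=lambda e: math.dist(query, e[0]))
        nearest = math.dist(query, ranked[0][0])
        second = math.dist(query, ranked[1][0])
        return ranked[0][1] if nearest < self.ratio * second else None

    def recognize(self, descriptors, min_votes=3):
        """Object with the most ratio-test matches, or None if too few agree."""
        labels = [self.match(d) for d in descriptors]
        votes = Counter(lbl for lbl in labels if lbl is not None)
        if not votes:
            return None
        label, count = votes.most_common(1)[0]
        return label if count >= min_votes else None
```

The ratio test discards ambiguous keypoints whose best match is barely better than the runner-up. This check is what lets a database built from one image per object stay usable as more objects are added.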
We are currently extending the system with algorithms for more relations and with support for 3D spatial language. The SIFT algorithm is used with stereo vision to compute the 3D positions of keypoints. The extensions will be used on both mobile robots and humanoids to support expressions such as "Find the cup on top of the desk to the right of the computer." Notice that the horizontal relation (right) is combined with the vertical relation (on top).
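Computing a keypoint's 3D position from a stereo match can be illustrated with the textbook pinhole relation for a rectified stereo pair, Z = f·B/d. Here f is the focal length in pixels, B the camera baseline, and d the horizontal disparity. The Python sketch below assumes rectified cameras and a keypoint already matched between the left and right images. The calibration values in the usage note are hypothetical.

```python
def triangulate(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """3D position, in the left camera's frame, of a keypoint matched in a
    rectified stereo pair, using the pinhole relation Z = f * B / d."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("a point in front of the rig needs positive disparity")
    z = focal_px * baseline_m / disparity
    x = (u_left - cx) * z / focal_px
    y = (v - cy) * z / focal_px   # image rows grow downward, so +y points down
    return (x, y, z)
```

For example, with f = 500 px, a 0.12 m baseline, and the principal point at (320, 240), a keypoint at column 370 in the left image and 340 in the right has a disparity of 30 px. It therefore lies 2.0 m away and 0.2 m to the right of the optical axis. Vertical relations such as "on top of" can then be judged from the recovered heights rather than from image rows alone.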
We anticipate that adding support for 3D spatial language will dramatically extend the usefulness of the robots and their potential applications. For example, NASA's humanoid robot, Robonaut, has been designed to work as an assistant side by side with an astronaut. Providing spatial referencing language to Robonaut will allow an astronaut to speak requests such as "put the beam between the two structures" or "tighten the bolt on the left." This type of interaction may also be applied to service robots aiding the elderly or the disabled, with requests such as "Pick up the book on top of the desk and bring it to me" or "Look to the left of the telephone for my address book."
The spatial referencing system may also be applicable to directing unmanned military vehicles with high-level commands, whether the commands are issued through language or another medium. Interacting with a robot at a high level allows the user to concentrate on the strategic aspects of the mission without the burden of manual control required in teleoperation.
The spatial referencing language, with SIFT object recognition, is also being integrated into a test platform for evaluating a cognitive model of working memory. The main goal of the project is to evaluate whether the working memory model can help robots learn and focus on the information most essential for carrying out a task. However, the system could also be used to test other cognitive theories and see how well they scale up in realistic environments with real sensory uncertainty.
We have also developed a sketch-based PDA interface for mobile robots, in which the user draws a route map directing the robot through an environment of landmarks to a target destination. The user first sketches an approximate layout of the scene, drawing closed polygons to represent landmarks. Then a path is sketched through the field of landmarks to represent the robot's specified route. Editing commands are supported so that the user can delete, move, or label landmarks.
When the sketched route map is finished, the user transmits it to a server, where it is analyzed for spatial references. A robot path is extracted in the form of a sequence of path segments, each containing a set of referenced landmarks and an associated turn (e.g., "Turn left when the library is on your right").
The sketch is analyzed by moving a virtual robot along the sketched path and computing force histograms between each sketched landmark and the virtual robot at the major turning points. Features are extracted from the histograms to form a qualitative landmark state that the robot attempts to match while executing the route.
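The virtual-robot walk can be sketched in Python to show how a sketched path becomes turn-plus-landmark steps like the library example. This is a simplification under stated assumptions. Landmarks are reduced to centroids, and each is classified by its bearing relative to the robot's heading, rather than by features of force histograms. A "major" turn is hypothetically any heading change of at least 30 degrees. Sketch coordinates are taken with y pointing up.

```python
import math

def heading(p, q):
    """Direction of travel from p to q, in degrees (sketch coordinates, y up)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def wrap(angle):
    """Map an angle to [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def side(rel):
    """Qualitative position of a landmark at bearing `rel` from the robot's heading."""
    if -45.0 <= rel <= 45.0:
        return "in front"
    if 45.0 < rel < 135.0:
        return "on your left"
    if -135.0 < rel < -45.0:
        return "on your right"
    return "behind you"

def extract_route(path, landmarks, min_turn=30.0):
    """Walk a virtual robot along a sketched polyline and record, at each major
    turning point, the turn and the qualitative state of every landmark.

    path: [(x, y), ...] sketched route; landmarks: {name: (cx, cy)} centroids.
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")
    steps = []
    for i in range(1, len(path) - 1):
        h_in = heading(path[i - 1], path[i])
        turn = wrap(heading(path[i], path[i + 1]) - h_in)
        if abs(turn) < min_turn:
            continue  # not a major turning point
        state = {name: side(wrap(heading(path[i], c) - h_in))
                 for name, c in landmarks.items()}
        nearest = min(landmarks, key=lambda n: math.dist(path[i], landmarks[n]))
        direction = "left" if turn > 0 else "right"
        steps.append({"turn": direction, "state": state,
                      "phrase": f"Turn {direction} when the {nearest} is {state[nearest]}"})
    return steps
```

For a path that heads east and then turns north next to a landmark on the robot's right, the single extracted step reads "Turn left when the library is on your right." The per-landmark `state` dictionary is what the robot would try to match during execution.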
Thus far, we have been able to demonstrate route extraction and successful navigation for sketches similar to Figure 2. In a user study, 21 participants were asked to sketch route maps of a given scene. The sketches were processed, and five test runs per sketch were made with the robot in the same environment configuration. The robot reached the goal in 71 percent of the runs.
In analyzing the failures from this test, we found that the landmark matching algorithm works very well, but the route extraction algorithm currently works only for fairly accurate sketches. The problem is aggravated by the limited screen size and resolution of the PDA. Our current efforts are directed at improving the sketch analysis. This is especially challenging because we are not using landmark recognition; we look only for a configuration of obstacles. Incorporating vision-based landmark recognition (e.g., with SIFT) would relax the need for accurate extraction of the qualitative landmark state.
Another challenge has been reliably recognizing sketched symbols on the PDA. To address it, we have been working on a trainable, HMM-based symbol recognizer that is invariant to orientation. This invariance is necessary to support symbols such as arrows, which may point in any direction. However, although the method is robust, a separate model must be trained for each drawing order of a symbol.
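How an HMM can score an orientation-invariant stroke description is shown in the Python sketch below. It is not the authors' recognizer. It assumes a stroke is encoded by its turning angles (straight, left turn, right turn), which do not change when the whole stroke is rotated. Each symbol class is scored with the scaled forward algorithm. The model parameters are hand-set and hypothetical; a trainable recognizer would estimate them from examples, for instance with Baum-Welch.

```python
import math

def turning_symbols(stroke, straight_tol=30.0):
    """Rotation-invariant encoding of a stroke: 0 = straight, 1 = left turn,
    2 = right turn, one symbol per interior point of the polyline."""
    dirs = [math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in zip(stroke, stroke[1:])]
    symbols = []
    for a, b in zip(dirs, dirs[1:]):
        turn = math.degrees((b - a + math.pi) % (2.0 * math.pi) - math.pi)
        if abs(turn) < straight_tol:
            symbols.append(0)
        else:
            symbols.append(1 if turn > 0 else 2)
    return symbols

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | discrete HMM)."""
    n = len(start)
    log_p, alpha = 0.0, None
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [start[i] * emit[i][o] for i in range(n)]
        else:
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][o]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return log_p

# Hypothetical two-state models (start, transition, emission). A box drawn
# counterclockwise and one drawn clockwise need separate models, which
# mirrors the drawing-order limitation described above.
MODELS = {
    "line":    ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]],
                [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05]]),
    "ccw_box": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]],
                [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "cw_box":  ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]],
                [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
}

def classify(stroke, models=MODELS):
    """Name of the model with the highest likelihood, or None for tiny strokes."""
    obs = turning_symbols(stroke)
    if not obs:
        return None  # fewer than three points: no turning information
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

Rotating a counterclockwise box leaves its symbol sequence, and therefore its classification, unchanged. Tracing the same box in the opposite order flips every left turn into a right turn, so it is recognized only by the clockwise model.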
We have targeted defense applications for the sketch-based interface, specifically the control of unmanned ground vehicles. Although the military prefers to have a map of the environment, an existing map is often out of date because new landmarks have been built or old ones removed. In these situations, an on-site sketch can provide quick direction for manned as well as unmanned operations. Geospatial intelligence work may also use sketches as a means of updating a GIS database. In addition, the sketch-based platform may provide an interesting interface for computer games.
Professors Jim Keller (MU) and Pascal Matsakis (University of Guelph) and MU students Sam Blisard, George Chronis, Craig Bailey, and Derek Anderson have contributed to this work. Funding has been provided by the National Science Foundation (EIA-0325641) and the U.S. Naval Research Lab.
3. Skubic, M., Matsakis, P., Chronis, G. & Keller, J. (2003) Generating Multi-Level Linguistic Spatial Descriptions from Range Sensor Readings Using the Histogram of Forces. Autonomous Robots, 14(1), 51-69.
5. Skubic, M., Perzanowski, D., Blisard, S., Schultz, A., Adams, W., Bugajska, M. & Brock, D. (2004) Spatial Language for Human-Robot Dialogs. IEEE Trans. SMC, Part C, Special Issue on HRI, 34(2), 154-167.
Dr. Marjorie Skubic is an associate professor at the University of Missouri-Columbia, with joint appointments in Electrical and Computer Engineering and in Computer Science. Her research interests are in robotics and sensory perception, with a special focus on human-robot interaction and technology for eldercare.
©2005 ACM 1072-5220/05/0300 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2005 ACM, Inc.