The problem of finding information in large volumes of imagery is a challenging one, with few good solutions. While most search engines allow users to search collections of text quite efficiently, similar solutions for searching imagery are lacking. The problem is that computers aren’t able to interpret imagery very well. They can’t deal with novelty or variability, or exploit contextual information and prior knowledge, to the extent that humans can.
Unfortunately, most manual image analysis tools currently in use are inefficient, tapping into slow and deliberate cognitive processes. Most image search and analysis tools do not exploit the reliable split-second perceptual judgments that people make all the time: think of returning a tennis serve or reacting to an obstacle on the highway while driving. The question we have been asking is whether we can tap into these fleeting perceptual judgments to find visual information within large image sets efficiently.
Split-Second Perceptual Judgments
Our efforts have relied on a combination of the rapid serial visual presentation (RSVP) technique and the event-related potential (ERP) signal detected using electroencephalography (EEG) sensors. We have largely focused on broad area image analysis, a domain where users have to extract critical information from large collections of high-resolution satellite imagery. In our approach, broad area images, spanning tens of thousands of pixels in width and height, are decomposed into a grid of image chips a few hundred pixels wide and tall. These chips are presented to users in high-speed bursts of 10 to 15 chips per second. A set of head-worn EEG sensors records neural responses to each chip. Images that elicit an ERP signal are classified as targets.
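The chip decomposition and burst timing described above can be sketched as follows. This is a minimal illustration, assuming square chips of a fixed size and a constant presentation rate; `tile_image`, `presentation_schedule`, and the `Chip` record are hypothetical names, and edge chips are simply truncated to the image bounds rather than padded or overlapped.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chip:
    row: int  # grid row
    col: int  # grid column
    x0: int   # left edge in pixels (inclusive)
    y0: int   # top edge in pixels (inclusive)
    x1: int   # right edge in pixels (exclusive)
    y1: int   # bottom edge in pixels (exclusive)

def tile_image(width, height, chip_size):
    """Cut a width x height image into a row-major grid of chip_size chips.

    Chips on the right and bottom edges are truncated to the image bounds.
    """
    chips = []
    for row, y0 in enumerate(range(0, height, chip_size)):
        for col, x0 in enumerate(range(0, width, chip_size)):
            chips.append(Chip(row, col, x0, y0,
                              min(x0 + chip_size, width),
                              min(y0 + chip_size, height)))
    return chips

def presentation_schedule(n_chips, rate_hz):
    """Onset time in seconds of each chip in a fixed-rate RSVP sequence."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return [i / rate_hz for i in range(n_chips)]
```

For a hypothetical 20,000 × 15,000-pixel image and 256-pixel chips, `tile_image` returns 79 × 59 = 4,661 chips; at 10 chips per second the sequence takes about 466 seconds, and at 15 chips per second about 311 seconds.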
The ERP signal is thought to reflect the activity of both “bottom-up” perceptual processes and “top-down” interpretive processes that play a role in recognizing and coordinating a response to critical task-relevant information. The ERP signal originates in the visual areas of the brain and, within a few hundred milliseconds, propagates to frontal areas responsible for interpretation. The ERP signal has several advantages over an overt response. It precedes an overt physical response by several hundred milliseconds. It also exhibits far less variability than an overt physical response, since it does not require the deliberation needed to initiate a physical response.
Detecting the ERP Signal
Reliable detection of the ERP signal is difficult: background EEG is often an order of magnitude higher in amplitude than the ERP signal. Overcoming this signal-to-noise problem requires signal-processing steps that minimize the impact of noise artifacts and pattern-recognition techniques that reliably identify the signal. Pattern-recognition algorithms estimate the likelihood that a given image is a target from the pattern of neural activity associated with that image. These probability estimates can be used to produce a probability map, which is overlaid on the original broad area image to highlight target hotspots. The analyst can then confirm or rule out the presence of targets by zooming into these hotspots.
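The article does not specify which signal-processing or pattern-recognition methods are used, so the sketch below substitutes deliberately simple stand-ins for the pipeline just described. Each chip's EEG epoch is reduced to one baseline-corrected mean amplitude per channel in an assumed 250 to 600 ms post-onset window; a logistic-regression classifier (a generic choice, not necessarily the method used in this work) turns those features into target probabilities; and the probabilities are placed back on the chip grid to form a probability map, from which cells above an assumed 0.5 threshold are reported as hotspots. All function names, the window bounds, and the threshold are hypothetical, and the artifact rejection and filtering a real system needs are omitted.

```python
import math

def window_features(epoch, fs, baseline_s=0.1, window_s=(0.25, 0.6)):
    """Baseline-corrected mean amplitude per channel (hypothetical features).

    epoch: list of channels, each a list of samples that starts
    baseline_s seconds before chip onset; fs is the sampling rate in Hz.
    """
    n_base = int(round(baseline_s * fs))
    start = n_base + int(round(window_s[0] * fs))
    stop = n_base + int(round(window_s[1] * fs))
    features = []
    for channel in epoch:
        baseline = sum(channel[:n_base]) / n_base
        segment = channel[start:stop]
        features.append(sum(segment) / len(segment) - baseline)
    return features

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Estimated probability that the chip with feature vector x is a target."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Batch gradient descent on L2-regularized logistic loss (bias unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probability_map(probs, cells, n_rows, n_cols):
    """Place each chip's probability at its (row, col) cell in the chip grid."""
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for p, (r, c) in zip(probs, cells):
        grid[r][c] = p
    return grid

def hotspots(grid, threshold=0.5):
    """Grid cells whose target probability meets the threshold."""
    return [(r, c) for r, row in enumerate(grid)
            for c, p in enumerate(row) if p >= threshold]
```

Because the classifier outputs probabilities rather than hard labels, the hotspot threshold can be tuned afterward to trade missed targets against false alarms without retraining.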
The search technique described has been evaluated in experiments with both professional image analysts and users drawn from the general population. Users searched for a variety of targets in satellite imagery, including ships, surface-to-air missile sites, oil-storage depots, and golf courses. Compared with conventional broad area image analysis, the neurally driven search technique has produced a sixfold speed-up in search efficiency with no loss of accuracy.
While the approach just described has shown promise as a way to boost the efficiency of searching for information within imagery, several important issues have to be addressed. As a manual search technique, its performance hinges on human perceptual abilities. Targets embedded in cluttered scenes can be difficult to detect. Targets also become harder to detect as they move away from the center of a user’s field of view; there is little time for a user to search within a chip during the brief presentation of each image. Performance also suffers if chips are at an inappropriate scale with respect to the targets of interest. We are currently developing computer vision techniques to raise the salience of features associated with targets. Computer vision techniques can also be used to detect segments of imagery where target detection may be challenging, so that presentation rates can be adjusted based on estimates of detection difficulty. Other factors that can affect target detection performance include distractions and lapses of attention. We are currently working on identifying EEG signatures associated with these states. Images processed during periods of low attention could be marked for later review.
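The adaptive pacing and review flagging proposed above might look like the following sketch. Everything here is an assumption: the per-chip difficulty estimate (imagined as coming from a computer vision model) and the per-chip attention index (imagined as derived from EEG) are hypothetical inputs scaled to [0, 1], the linear mapping from difficulty to rate is just one simple choice, the 0.3 attention threshold is arbitrary, and the 10 to 15 chips-per-second bounds are borrowed from the rates quoted earlier. All function names are hypothetical.

```python
def presentation_rate(difficulty, min_hz=10.0, max_hz=15.0):
    """Map a difficulty estimate in [0, 1] to chips per second.

    Easy chips (difficulty 0) are shown at max_hz; the hardest (1) at min_hz.
    Out-of-range estimates are clamped to [0, 1].
    """
    d = min(max(difficulty, 0.0), 1.0)
    return max_hz - d * (max_hz - min_hz)

def build_schedule(chip_ids, difficulties, min_hz=10.0, max_hz=15.0):
    """Return (chip_id, onset_s, duration_s) for a variable-rate sequence."""
    schedule, t = [], 0.0
    for chip_id, difficulty in zip(chip_ids, difficulties):
        duration = 1.0 / presentation_rate(difficulty, min_hz, max_hz)
        schedule.append((chip_id, t, duration))
        t += duration
    return schedule

def flag_for_review(chip_ids, attention, threshold=0.3):
    """Chips shown while the (hypothetical) attention index was below threshold."""
    return [cid for cid, a in zip(chip_ids, attention) if a < threshold]
```

Flagging low-attention chips, rather than discarding their EEG responses outright, keeps them in the workflow so the analyst can revisit them after the main pass.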
From Clinical Tool to HCI Modality
Until recently, the use of EEG sensor technology was largely limited to clinical contexts. However, researchers are now exploring a broad range of neurally based HCI application areas, including input mechanisms, interruptibility estimation, and usability evaluation. A variety of factors (lower sensor costs, practical form factors, robust signal processing, and classification algorithms) have combined to bring EEG technology out of the obscurity of the clinical laboratory and into numerous real-world task contexts. It may not be too long before you find yourself interacting with your computer via a neural interface.
Santosh Mathan is a researcher in the Human Centered Systems Group at Honeywell Laboratories in Redmond, Washington. His research focuses on the development of techniques for estimating cognitive state in challenging real-world application contexts. He is currently principal investigator on the DARPA-funded Neurotechnology for Intelligence Analysts program. He is also exploring the feasibility of using sensor-based workload estimates for usability evaluation. Santosh has a Ph.D. in human-computer interaction from Carnegie Mellon University.
©2008 ACM 1072-5220/08/0700 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.