Neel Patel, Darin Hughes
Human-computer interfaces (HCIs) have led to enormous advances in both computing and everyday life. Fields from finance to medicine are adopting new forms of interaction between humans and computers to enable faster and more accurate decisions. Scientists are benefiting from the ability to interact with their data, with the computer as an intermediary, in the form of graphs and charts; such interactions help researchers to better understand the data itself.
Though the uses of computers are innumerable, data manipulation and display are prominent among them. Whether a doctor is examining a patient's heart-rate variance or an economist is studying long-term market fluctuations, the computer is a tool that allows users to extract meaning from data. With today's HCIs, this usually entails looking at visual displays of data, from line graphs to scatter plots. Data display has certainly advanced since the pioneering graphical user interfaces of Xerox's Palo Alto Research Center, but even the modern version still relies almost entirely on visual data representation.
One may not immediately see the drawbacks of purely visual displays; indeed, they have served us very well until now. Why should they suddenly need to change? The answer is twofold: large amounts of data and the physical and cognitive limitations of visual displays.
A single desktop monitor can display only a limited amount of data, and even 20 monitors arranged in a rectangle will eventually reach their limits. Viewing large amounts of data can lead to visual fatigue and overload, causing a user to miss patterns and meanings [1, 2]. Physically, a display can limit users' mobility or distract them from the task at hand. Consider a doctor performing a delicate surgery. The doctor must continually be apprised of her patient's health but must also focus on the surgery itself. Moreover, visual displays represent only one method of thinking. A line graph may reveal much more information than a table, but other sensory modalities could hold even more potential for increasing meaning extraction and decreasing cognitive load.
The solution to these problems is to move beyond purely visual human-computer interfaces. Over the past three years, our team has researched the use of nonspeech sound to represent information in HCIs, a process called sonification. Imagine the possibilities of such auditory and visual HCIs: Researchers could examine larger amounts of data using the two senses combined; soldiers could receive crucial data in combat without pausing to look at a display; and the visually impaired could interact with data for the first time. In addition to taking data display to new areas, sonification may also let us think about data in entirely new ways, extracting previously hidden meanings by examining data in a different cognitive context.
Before the benefits of sonification can be realized, however, we must first understand how to design sonifications based on how we comprehend them. Visual displays have benefited from decades of research in visual graph comprehension theory, covering everything from working memory and graph variables to the process of extracting meaning from graphs. Our team has focused on extending this field into the auditory realm, from the roots of the comprehension process to the accuracy of various forms of comprehension.
Our first step was to study the theoretical applications and ranges of sonification. We studied the just-noticeable difference (the smallest change a listener can reliably detect) of various auditory variables, including pitch, intensity, and tempo. Our goal was to determine how many different values a sound could take while each value remained distinguishable from the others and within normal hearing ranges. As it turned out, pitch had a range of 140 possible values, intensity could take one of 41 levels, and tempo had up to 18 possibilities. But was this enough?
Our goal was to compare the potential viability of sonifications to that of visualizations, so we compared the potential values of auditory variables with those of visual variables. The visual characteristics of color, size, and flashing rate have 107, 77, and eight possibilities, respectively, which suggests that the dimensions of sound hold just as many possibilities for data display as those of visualization, if not more.
We wanted sonification to eventually have its place in the real world, creating tangible benefits and affecting a wide range of people. Before going on to human testing of sonification, we needed to study the sonificability of common, everyday data sets, or how well these data sets could be converted into sound. More than 150 graphs were selected from publications ranging from National Geographic Kids to The Economist. For each graph, a sonificability score was assigned based on the number of ways the data could be represented using audio, with a score of 0 indicating that it was impossible to sonify the graph, and a score of 3 meaning the graph was easily sonifiable in a number of ways.
The results showed that 3 percent of graphs had a sonificability score of 0; 34 percent had a score of 1; 14 percent had a score of 2; and 49 percent had a score of 3 (see Figure 1). To study the relationship between the complexity of the source material and how well the data could be sonified, the Gunning Fog Index (GFI) was calculated for each publication. The GFI approximates the "grade level" of a passage, or how many years of education are needed to comprehend it. This analysis revealed the interesting fact that sonificability decreased only slightly as GFI increased, without a significant difference between publications in different GFI ranges. With the theoretical viability of sonification demonstrated, it was time to take the next step and start listening to our data, literally.
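For readers unfamiliar with the Gunning Fog Index, the standard formula is 0.4 times the sum of the average sentence length and the percentage of complex words (conventionally, words of three or more syllables). The sketch below is illustrative only: the function name and the decision to pass in pre-computed counts are assumptions made for this example, not a description of the tooling used in the study.

```python
def gunning_fog_index(num_words: int, num_sentences: int, num_complex_words: int) -> float:
    """Return the Gunning Fog Index of a passage from pre-computed counts.

    Hypothetical helper for illustration. "Complex" words are conventionally
    those with three or more syllables; counting them is left to the caller.
    """
    if num_words <= 0 or num_sentences <= 0:
        raise ValueError("num_words and num_sentences must be positive")
    if not 0 <= num_complex_words <= num_words:
        raise ValueError("num_complex_words must be between 0 and num_words")

    average_sentence_length = num_words / num_sentences
    percent_complex_words = 100.0 * num_complex_words / num_words
    return 0.4 * (average_sentence_length + percent_complex_words)


# Example: a 100-word passage with 5 sentences and 10 complex words.
print(gunning_fog_index(num_words=100, num_sentences=5, num_complex_words=10))
```

A passage averaging 20 words per sentence with 10 percent complex words therefore scores about 12, roughly a twelfth-grade reading level.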
To help us study how humans responded to sonifications, we needed a fast and easy way to create the sonifications based on user-specified parameters. We created a tool that let us import a data set, choose a range of data and sampling rate, and then generate a sonification in MusicXML format, playable by a simple open-source MusicXML player. Using a set of 12 pitch-based sonifications, we began testing basic comprehension and reaction time.
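The core idea of a pitch-based sonification can be sketched in a few lines. The example below is a minimal illustration, not the tool described above: it selects a range of the data, subsamples it, and maps each value linearly onto MIDI note numbers. The function name, the parameters, and the default pitch range (MIDI 48 to 84) are all assumptions, and the sketch stops short of writing MusicXML.

```python
from typing import List, Optional, Sequence


def sonify_to_midi(
    values: Sequence[float],
    start: int = 0,
    stop: Optional[int] = None,
    step: int = 1,
    low_note: int = 48,   # assumed lower bound (C3)
    high_note: int = 84,  # assumed upper bound (C6)
) -> List[int]:
    """Map a slice of a data set onto MIDI note numbers (hypothetical helper).

    start/stop choose the range of data; step plays the role of a
    sampling rate by keeping every step-th point.
    """
    selected = list(values[start:stop:step])
    if not selected:
        raise ValueError("the selected range contains no data points")

    lowest, highest = min(selected), max(selected)
    if highest == lowest:
        # Flat data: play a single pitch in the middle of the range.
        return [(low_note + high_note) // 2] * len(selected)

    note_span = high_note - low_note
    return [
        low_note + round((value - lowest) / (highest - lowest) * note_span)
        for value in selected
    ]


# Example: a rising series becomes a rising sequence of pitches.
print(sonify_to_midi([0, 5, 10]))
```

A linear mapping is only one design choice; a real tool might instead snap pitches to a musical scale or map values logarithmically.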
The most basic type of comprehension we tested was pattern matching. Subjects received a sonification and four visual graphs and were asked to match the sonification to the correct graph. To do this, listeners had to extract key patterns from the sonification (a distinctive peak in pitch at the end of the pattern, for example) and match them to a similar pattern in one of the graphs (which would show a visual peak near the end). A total of 12 trials was used, each with a different level of difficulty. Difficulty was measured in terms of the similarity of graphs in the trial; finding the most similar graph to a sonification becomes much harder as the graphs become increasingly similar themselves.
The comprehension test was presented to 70 untrained listeners from various demographic groups. The results (see Figure 2) indicated a 60 percent accuracy rate overall, significantly higher than the 25 percent expected through random guessing alone. A measure called visual gap index (VGI) was calculated for each trial based on the visual similarity of the four visual graphs, with a high VGI indicating a large visual difference between graph choices. Accuracy remained high even with similar graphs, a testament to the intuitiveness of highlighting distinctive patterns in a sonification and matching them to a visual graph. However, results indicated that listeners could not identify the relative location of a pattern within a graph; though they could easily identify a distinctive pattern, if the same pattern was placed at a slightly different position in each graph, accuracy plummeted. This indicates the temporal-to-spatial translation of data may require further training or exposure to sonifications.
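To give a sense of how far a 60 percent accuracy rate sits from the 25 percent chance level, the sketch below computes a normal-approximation z-score for a proportion. It rests on assumptions not stated in the article: that every one of the 70 listeners completed all 12 trials (840 responses) and that responses can be treated as independent, which repeated trials from the same listener do not strictly satisfy. It is not the analysis used in the study.

```python
import math


def z_score_vs_chance(observed_rate: float, chance_rate: float, num_trials: int) -> float:
    """Normal-approximation z-score of an observed proportion against chance.

    Hypothetical helper; treats all trials as independent.
    """
    if num_trials <= 0:
        raise ValueError("num_trials must be positive")
    if not 0.0 < chance_rate < 1.0:
        raise ValueError("chance_rate must be strictly between 0 and 1")

    standard_error = math.sqrt(chance_rate * (1.0 - chance_rate) / num_trials)
    return (observed_rate - chance_rate) / standard_error


# Assumed trial count: 70 listeners x 12 trials each.
num_trials = 70 * 12
z = z_score_vs_chance(observed_rate=0.60, chance_rate=0.25, num_trials=num_trials)
print(f"z = {z:.1f}")
```

Under these assumptions the z-score is well above 20, so the gap between observed accuracy and guessing is far too large to attribute to chance.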
The next step tested listener reaction time and the ability to discern a relevant pattern from background "noise." Subjects were first trained to memorize a specific pattern that was one, five, or nine tones in length. They then received a 20-minute stream of randomly generated tones and were asked to indicate each time their specific pattern was played.
Subjects in the reaction-time test correctly identified their pattern within one second 82 percent of the time, with an average reaction time of 0.3 seconds. Interestingly, differences did exist between the patterns of various lengths. Possibly because it was long enough not to be missed but short enough to be recognized quickly, the five-tone pattern had the highest accuracy rate overall (see Figure 3). Additionally, a slight decrease in accuracy over the course of the 20-minute test suggests that long, continuous streams of data should be avoided in order to maintain pattern-recognition accuracy and limit cognitive fatigue or loss of vigilance.
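Scoring a test like this can be sketched as follows. The example is illustrative and makes several assumptions: reaction time is measured from the moment the target pattern finishes playing, a response counts as a hit only if it falls within a one-second window, and each response can be credited to at most one pattern occurrence. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def score_reaction_times(
    pattern_end_times: Sequence[float],
    response_times: Sequence[float],
    window: float = 1.0,
) -> Tuple[float, Optional[float]]:
    """Return (hit rate, mean reaction time in seconds) for one subject.

    All times are seconds from the start of the tone stream.
    """
    if not pattern_end_times:
        raise ValueError("at least one pattern occurrence is required")

    responses = sorted(response_times)
    used = [False] * len(responses)
    reaction_times: List[float] = []

    for pattern_end in sorted(pattern_end_times):
        # Credit the earliest unused response inside the window.
        for index, response in enumerate(responses):
            delay = response - pattern_end
            if not used[index] and 0.0 <= delay <= window:
                used[index] = True
                reaction_times.append(delay)
                break

    hit_rate = len(reaction_times) / len(pattern_end_times)
    mean_reaction_time = (
        sum(reaction_times) / len(reaction_times) if reaction_times else None
    )
    return hit_rate, mean_reaction_time


# Example: three pattern occurrences, one response arriving too late.
print(score_reaction_times([10.0, 20.0, 30.0], [10.25, 21.5, 30.5]))
```

In the example, two of the three occurrences are detected in time, and the late response at 21.5 seconds is not counted as a hit.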
Our final phase of experimentation looked at a much more involved and relevant form of comprehension: replication accuracy. Subjects received a sonification and were asked to draw a visual graph representing the same pattern. Because subjects must now extract and process the entire pattern, rather than simply identify a small sub-pattern, as in the first comprehension test, this experiment is much closer to a "real" application of sonification. Beyond simply testing the accuracy of comprehension, we also aimed to study the cognitive context of comprehension and all of the mental processes that make comprehension possible.
What we really want to study is the process of comprehension itself. However, because comprehension is difficult to observe directly without invasive brain-scanning equipment, we used the subject's visual replication as a proxy to measure the accuracy of comprehension. The subject must first mentally process and extract meaning from the sonification (the comprehension task) and then create a visual replica of the same pattern (the replication task). This novel two-phase model of comprehension allowed us to conduct one of the first studies of the core processes involved in comprehending sonifications.
Each of 50 subjects was presented with 12 sonifications during the experiment and was randomly assigned to the control or experimental group. The control group drew their graphs on only blank slides, meaning they listened to sonifications only in reference to a blank visual context. The experimental group, however, received four blank slides, four slides with x- and y-axes, and four slides with axes and grid points. They then listened to the sonifications in reference to these cognitive contexts, using the same pattern-extraction techniques on the sonification as they would with a visual graph.
As shown in Figure 4, for each sonification, most of the subjects' drawings were more than 50 percent accurate. The overall accuracy rate, as measured by comparing the subjects' (normalized) drawn graph with the original data set, was 76 percent, demonstrating a very high degree of comprehension as well as the viability of sonification in real-life scenarios. The experimental group tended to have higher accuracy rates than the control group, indicating that comprehension of sonifications takes place in the same cognitive context, and uses the same methods of pattern extraction, as comprehension of visual graphs. This is one of our first clues as to how we extract data from sonifications; hopefully, such findings will let us design sonifications in terms of how they are perceived.
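One plausible way to compare a normalized drawing with the original data is sketched below. The article does not give the exact formula, so this metric is an assumption: both series are min-max normalized to the range 0 to 1, and accuracy is defined as one minus the mean absolute difference between them. The sketch also assumes the drawing has already been sampled at the same number of points as the original data; all names are hypothetical.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalize a series to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A flat series carries no shape information; place it mid-range.
        return [0.5] * len(values)
    return [(value - lowest) / (highest - lowest) for value in values]


def replication_accuracy(original: Sequence[float], drawn: Sequence[float]) -> float:
    """Return 1 minus the mean absolute error between the normalized series.

    Assumed metric for illustration: 1.0 means the shapes match exactly,
    0.0 means they are as far apart as the normalized range allows.
    """
    if not original or len(original) != len(drawn):
        raise ValueError("series must be non-empty and of equal length")

    normalized_original = normalize(original)
    normalized_drawn = normalize(drawn)
    mean_absolute_error = sum(
        abs(a - b) for a, b in zip(normalized_original, normalized_drawn)
    ) / len(normalized_original)
    return 1.0 - mean_absolute_error


# Example: a drawing with the right shape but a different scale scores 1.0.
print(replication_accuracy([1, 2, 3], [10, 20, 30]))
```

Normalizing first means a subject is scored on the shape of the drawn pattern rather than on the absolute scale of the drawing.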
Sonifications represent a new way of thinking about and extracting information from data. They open a new frontier in human-computer interfaces, including the possibility of augmenting typical screen-based interfaces or even expanding HCIs into previously impossible realms. Even more important, sonifications are a new way of interpreting the data itself and may allow us to uncover previously hidden meanings by changing the very way we think.
The field of sonification is still relatively young, and we hope this article spreads knowledge of its potential. Examples of practical uses of sonification in HCIs already exist, ranging from basic navigation interfaces for the visually impaired to complex uses of ambient (background) sonification of stock indices. Sonification is quickly becoming a relevant topic for those involved in HCI design, and we hope the next five to 10 years bring breakthroughs on par with the development of visual GUIs. Let's hear what the future holds.
The material presented in this article is based on work supported by the National Science Foundation (DRL0840297). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
6. Georgia Tech Sonification Lab; http://sonify.psych.gatech.edu/research/
Neel S. Patel is a researcher at the University of Central Florida's Institute for Simulation and Training. He is interested in innovative human-computer interfaces, the physiology of perception, and inquiry-based educational simulations.
Darin E. Hughes is research faculty at the University of Central Florida's Institute for Simulation and Training. His research interests include auditory perception, human-computer interfaces, educational simulation and gaming, and mixed and augmented reality. Hughes holds degrees in children's literature, information technology, and modeling and simulation.
©2012 ACM 1072-5220/12/0100 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.