Paul Robare, Jodi Forlizzi
John Cage is said to have once sat in an anechoic chamber for some time. Upon exiting, Cage remarked to the engineer on duty that after some time he was able to perceive two discrete sounds, one high pitched and one low. The engineer then explained that the high-pitched sound was his nervous system and the low his circulatory system. There really is no escaping sound.
Recently, one of us embarked on a similar experiment with our desktop computer. A surprising number of sounds emanated from the machine: the whirr of the fans, the clicking of the drives, and a whole suite of sounds from the interface, which had previously gone unnoticed. Even more surprising, many of these sounds play informational roles. For example, the fans speed up when the processor is doing double time; the quality of sound changes just before the graphic interface provides an alert.
Though it rarely receives much formal consideration, sound has been a part of computing for as long as digital computers have existed. These days, many designers must make decisions regarding the use of sound in products at some point in their career, but there are few resources regarding how sounds can and should be used. As a result, sound is generally underutilized by designers and underappreciated by users. To help establish a framework for understanding sound in digital products, this article briefly traces the historical use of sound in computing.
Many of the earliest computers were equipped with a speaker (known as a "hooter") that could sonify the machine's operations. The hooters were wired directly into a mainframe's accumulator (or other likely spot); they produced rhythmic noises that engineers and operators could passively monitor. In the event of a bug, the noises would change. The interaction paradigm used was thus quite similar to that of a car engine: The user passively monitors a steady-state noise for any change in sound that might signal malfunction.
It was not long before smart people figured out that if a computer could make noise, it could make music. In 1951 the CSIR Mk1 (a hulking mainframe in Sydney, Australia) was used to give the world's first public computer music performance. According to the University of Melbourne, "The sound production technique used on the CSIR Mk1 was as crude as is possible to imagine on a computer.... However, this occurred when there were no digital-to-analog converters, there was no digital audio practice and little in the way of complete digital audio theories."
Musicians, of course, cannot leave a new instrument alone once discovered, and thus continued to invent new ways of producing music with computers. In 1964 or 1965, for instance, undergraduate students at Reed College discovered a way to coax music out of the IBM 1620 and later the faster IBM 1130 (a table-sized machine with a whopping 8K of memory and a punch-card-only interface). The computers generated RF interference that could be picked up and sonified by a nearby radio. A student named Peter Langston developed a way to use this "feature" to produce music, and a friend named Lenny Schrieber wrote an algorithm to produce a specific frequency N by writing code that branched to a new cycle every k/N cycles (k being some constant). Langston recalls that during this time a duet between a classical violinist and the computer was performed for a local news program.
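The timing idea behind that algorithm can be sketched in a few lines of modern code. This is only an illustrative model, not a reconstruction of the Reed College programs: it assumes that k is the machine's cycle rate in cycles per second, so that one period of a tone at frequency N lasts k/N cycles, and it stands in for the RF interference with a simple on/off signal. All function names here are hypothetical.

```python
from typing import List

# Hypothetical model of the timing trick; none of these names come from
# the original IBM 1620/1130 programs.


def cycles_per_period(k: int, freq_hz: float) -> int:
    """Machine cycles in one period of the tone: k / N, rounded to whole cycles.

    Assumes k is the cycle rate in cycles per second (an assumption; the
    article only calls k "some constant").
    """
    return max(2, round(k / freq_hz))


def square_wave(k: int, freq_hz: float, total_cycles: int) -> List[int]:
    """Simulate an on/off signal that restarts its pattern every k/N cycles."""
    period = cycles_per_period(k, freq_hz)
    half = period // 2
    # The signal is "on" for the first half of each period and "off" for the rest.
    return [1 if (c % period) < half else 0 for c in range(total_cycles)]


def count_rising_edges(wave: List[int]) -> int:
    """Count off-to-on transitions, i.e. how many new periods begin."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


if __name__ == "__main__":
    K = 100_000  # assumed cycle rate, chosen only for illustration
    one_second = square_wave(K, 440.0, K)
    print(cycles_per_period(K, 440.0), count_rising_edges(one_second))
```

Because the period must be a whole number of cycles, the pitch actually produced is only approximately N; the slower the machine, the coarser the available tuning.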
Though the 1970s saw great strides in computing technology, researchers still gave sound little formal thought. Most computer systems of the time relied on centralized computing resources (due to the high cost of hardware), so users were often provided only a green-screen text terminal. Most such terminals could produce a beeping sound and would do so whenever a process completed, but otherwise they did not utilize sound. This use of sound as a functional alert may have been a carryover from 1960s Teletype machines, which would ring a bell upon completing a transmission.
The important human-factors work of B.H. Deatherage, published during this period, remains relevant today. In 1972 Deatherage proposed the following set of criteria for deciding when to use audio rather than visual displays (see Table 1).
Though Deatherage was primarily concerned with human-factors issues related to the design of human-operated "equipment," his guidelines are a useful starting point for designers considering functional sound in interfaces today.
The most important practical development in the use of sound in the 1970s came from games, which, beginning with Atari's Pong, utilized sound as a standard element. What is particularly interesting about Pong is that its sounds performed absolutely no function and made no obvious aesthetic contribution. Pong could be played equally well with the sounds turned off; the noises were little more than a monotonous beeping. Pong thus represented a new form of sound aesthetic in computing: sound as character (or branding). Though Pong's sounds were unhelpful (and potentially grating), they were a key element of the Pong experience. Modern examples of this are the startup sounds used by the Apple and Windows operating systems.
As technology improved, sound rapidly took on a larger aesthetic role in video games. The simple music of Space Invaders, which debuted in 1978, sped up as the game progressed, creating a purposefully heightened state of anxiety in the player. Abstract sound was thus used to create emotional responses (much as film scores had been doing for decades).
The 1980s marked a new era for sound and computing. With faster microprocessors and cheaper memory, the ability to record, play back, and synthesize sound became widespread, stirring greater interest in its use. Accompanying a broader interest in human-computer interaction, significant attention was given to sound and computing in the 1980s and '90s.
In 1982 the Commodore 64 was released; it went on to become the best-selling personal computer of all time. A built-in sound chip allowed it to produce a greater range and depth of sounds than many of its consumer-oriented predecessors, and it proved a popular platform for early computer games. As a result, computer and video game music became an entire genre in and of itself. Sound in computer games also began to take on more than a purely aesthetic role: Music often alerted players to a change in the game (such as a level timer running out), and sounds with an identifiable source were used to signal off-screen events. In 1984 the classic Epyx game Impossible Mission even used musical puzzles as game elements, thus bringing the use of sound back into the functional realm.
Also in 1982, Sara Bly presented her seminal work on information sonification to the Conference on Human Factors in Computing Systems. Bly's dissertation focused on whether or not sound could be used to assist people in the exploratory analysis of multivariate data sets. In particular, Bly was interested in how sound might help users find patterns in data that contained too many variables to be accurately displayed with a purely visual computer interface. Bly's work was especially notable as one of the first investigations to experimentally determine if sound representation could, in fact, provide useful information. Bly found that sonic representation did indeed improve people's ability to find patterns in the data. Her work marked the beginning of sound as display, a new functional role.
An important development at Apple Computer in the mid-1980s was Bill Gaver's SonicFinder, an aural layer on top of the Macintosh's graphical interface. SonicFinder was an effort to provide a greater depth of information than could be delivered by the visual interface alone without a risk of information overload. The SonicFinder added sounds to the interface based on the theory that "it is better when possible to map the attributes of computer events to those of everyday sound-producing events."
SonicFinder, probably the most thorough implementation of "auditory icons" to date, offered a new paradigm for sound design. As Buxton noted, Gaver's "intention is to design user interfaces that use the same skills employed in everyday tasks such as crossing the street.... His use of sound is based on a theory of sources. That is, what is important is what you think made the sound, rather than the psychophysical properties of the sound itself." Gaver's use of sound was thus a functional use of recognizable sounds to augment the GUI, representing a new theory of how sound elements should be designed. Asked whether he would use SonicFinder on his current computer, Gaver says, "[I] would be happy to have it on my own computer, as long as it was very quiet, which is how it always worked best!"
An important paper by Blattner, Sumikawa, and Greenberg introduced the idea of earcons in a 1989 issue of Human-Computer Interaction. Earcons use single pitches or rhythmic sequences of pitches to represent information. Unlike auditory icons, earcons have no inherent meaning; they are constructed using musical principles and must be learned. Earcons were thus the abstract (and potentially more aesthetically pleasing) complement to Gaver's recognizable sounds: functional and capable of representing several layers of information simultaneously, but without inherent meaning.
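To make the definition concrete, an earcon can be modeled as a short sequence of pitches with durations. The sketch below is a hypothetical illustration, not the notation or design method of Blattner, Sumikawa, and Greenberg: the two motifs, their meanings, and the choice to layer information by playing motifs back to back are all assumptions made for the example. Pitches are given as MIDI note numbers and converted to frequencies with the standard equal-temperament formula (A4 = note 69 = 440 Hz).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Note:
    midi_pitch: int  # MIDI note number; 69 is the A above middle C
    beats: float     # duration in beats


# An earcon is modeled here as an ordered sequence of notes.
Earcon = Tuple[Note, ...]


def frequency_hz(midi_pitch: int) -> float:
    """Equal-temperament frequency for a MIDI note number (A4 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_pitch - 69) / 12)


# Hypothetical motifs: the pitches mean nothing by themselves, so a
# listener would have to learn that one stands for "file" and the other
# for "delete".
FILE_MOTIF: Earcon = (Note(60, 0.5), Note(64, 0.5))
DELETE_MOTIF: Earcon = (Note(67, 0.25), Note(60, 0.75))


def compound(*earcons: Earcon) -> Earcon:
    """Layer information by playing several motifs in sequence (an assumption)."""
    return tuple(note for earcon in earcons for note in earcon)


def total_beats(earcon: Earcon) -> float:
    """Total duration of an earcon in beats."""
    return sum(note.beats for note in earcon)


if __name__ == "__main__":
    delete_file = compound(DELETE_MOTIF, FILE_MOTIF)
    for note in delete_file:
        print(round(frequency_hz(note.midi_pitch), 1), note.beats)
```

The contrast with an auditory icon is visible in the data itself: nothing in these pitch numbers resembles the event they stand for, which is why the mapping has to be taught.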
In the 1990s, as personal computing continued to advance technologically and interaction design emerged as a distinct discipline, sound became truly pervasive in computers and other digital devices. Video games spurred the development of a new art of sound design, combining music with Foley effects and dynamically triggered audio events to produce entire auditory worlds. As more personal computers were produced with built-in surround sound, games took advantage of this technology to fully spatialize audio, resulting in games that cannot be played without sound.
Since 1992 the International Conference on Auditory Display (ICAD) has been held annually. The first conference's organizer described ICAD as having grown "out of a desire to pull together researchers in auditory data display and to fill this mutual need for sharing results, stimulating new ideas, and identifying the field as a whole." Judging from the published proceedings, which included an appendix of informal comments regarding the conference, some attendees were disappointed in the state of the field. Sara Bly stated that "it was discouraging that we've not made greater advances in the area of sonification," and Bill Buxton wrote, "I continue to be disappointed at the lack of science and experimental/human validation of so much of the work." By 1997, however, members of ICAD wrote that "[t]he question is no longer whether it works, or even whether it is useful, but rather, how one designs a successful sonification." The discussion thus moved from questions of whether or not sound had a legitimate functional role to a focus on design methods. ICAD continues to serve as the primary forum for presenting research in the area.
The Macintosh and Windows operating systems of the 1990s began allowing users to customize the sounds generated by desktop interactions. Users could choose the sounds associated with various events and (perhaps most important) disable sounds completely. Interestingly, few of the sounds produced by the Apple and Microsoft operating systems appear to have been designed to take full advantage of the published research. Standard interface sounds generally consist of semi-abstract beeps and clicks to alert the user of system and application errors and events (such as clicking on an icon), but they rarely provide any information that is not readily available visually.
Since the mid-1990s, there has been a steady stream of conference papers related to sound and computing but few major publications. One notable essay is Stephen Brewster's chapter in the 2008 Human-Computer Interaction Handbook, which provides an extremely useful and comprehensive review of research related to sound and interface.
Future work in sound is likely to focus on mobile devices, which have unique constraints that may make them well suited to sound-enhanced (or purely auditory) interfaces. Brewster concludes his review of nonspeech sound research by stating that "nonspeech sound has a large part to play in the near future ... in mobile computing devices." Bill Gaver says, "It seems to me that there's lots of opportunity for exploring sound in design still. Certainly, auditory feedback plays an important, if fairly crude, role in mobile devices."
Unfortunately, generalized guidelines regarding the use of sound have yet to emerge. This lack of design theory is all the more unfortunate because in the past 10 years, technology has advanced to the point where sampled and synthesized sounds can be easily included in almost any product at low cost. An ever-increasing range of products, from mobile phones to washing machines, includes myriad sounds, and thus an ever-increasing number of practicing interaction designers make decisions about sounds in products.
Designers should begin considering the sounds of their products at the concept-generation stage. If sound is not considered until late in the design process, it will not play a larger role than at present, and it is unlikely to prove any more useful than it has in the past.
Special thanks to Bill Gaver, Lenny Shrieber, Peter Sisk, Mei Lin, Bill English, and an anonymous editor for their help with and contributions to this article.
1. "The Music played by CSIRAC." University of Melbourne, Melbourne School of Engineering, Department of Computer Science and Software Engineering, 15 May 2006. <http://www.csse.unimelb.edu.au/dept/about/csirac/music/music.html>.
2. Deatherage, B. H. "Auditory and Other Sensory Forms of Information Presentation." In Human Engineering Guide to Equipment Design (Revised Edition), edited by H. P. Van Cott and R. G. Kinkade. Washington D.C.: U.S. Government Printing Office, 1972.
3. Bly, S. "Presenting Information in Sound." In the Proceedings of the 1982 Conference on Human Factors in Computing Systems. New York: ACM Press, 1982. For a discussion of the historical significance of Bly's work see Bly, Gaver, and Buxton's unfinished manuscript from 1994, Auditory Interfaces: The Use of Non-Speech Audio at the Interface (http://www.billbuxton.com/Audio.TOC.html).
4. Gaver, W. "The SonicFinder: An Interface that Uses Auditory Icons," Human-Computer Interaction 4, no. 1 (1989). This same issue contained Blattner, Sumikawa, and Greenberg's seminal "Earcons and Icons: Their Structure and Common Design Principles," an article by Alistair Edwards describing an interface for blind users, and an introduction by Bill Buxton.
5. Kramer, G., ed. Auditory Display: Sonification, Audification and Auditory Interfaces, Santa Fe Institute Studies in the Sciences of Complexity, 55758. Reading, MA: Addison Wesley Longman, 1994; and Kramer, G. et al. Sonification Report: Status of the Field and Research Agenda. Arlington, VA: National Science Foundation, 1997. <http://www.icad.org/websiteV2.0/References/nsf.html>.
6. Brewster, S. "Non-Speech Auditory Output." In The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications edited by Andrew Sears and Julie A. Jacko. 2d ed. Boca Raton, FL: CRC Press, 2007. For an exhaustive survey of research into sound and interfaces prior to the mid-1990s, see Bill Buxton's "Speech, Language & Audition," chap. 8 in Readings in Human Computer Interaction: Toward the Year 2000, edited by R.M. Baecker, J. Grudin, W. Buxton, and S. Greenberg, Morgan Kaufmann: 1995.
Paul Robare is currently pursuing a master's in interaction design at Carnegie Mellon University. His interests range from nonspeech sound and multimodal interfaces to service strategy, experience design, and games. He maintains a portfolio at www.paulrobare.com.
Jodi Forlizzi is an associate professor of design and human-computer interaction and the A. Nico Habermann Chair of Computer Science at Carnegie Mellon University, and an interaction designer contributing to design theory and practice. Her theoretical research examines theories of experience, emotion, and social product use as they relate to interaction design. Other research and practice centers on notification systems ranging from peripheral displays to embodied robots, with a special focus on the social behavior evoked by these systems. Jodi was trained as an illustrator at Philadelphia College of Arts and as an interaction designer at the Carnegie Mellon School of Design. She holds a self-defined Ph.D. in design in HCI from Carnegie Mellon.
©2009 ACM 1072-5220/09/0100 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.