Enrico Bertini, Catherine Plaisant, Giuseppe Santucci
Not every new visualization is useful. How can we better separate the wheat from the chaff, report the thrill or gauge utility? We organized BELIV'06, a workshop about evaluation in information visualization, to answer this kind of question. BELIV'06 was born from an acknowledgment of current methods' limitations, and some frustration with time-consuming evaluations that too often lead to unsatisfactory results. Good information visualization (InfoVis) provides users with accurate visual representations of data as well as powerful interaction tools that facilitate exploration, understanding, and discovery. InfoVis is successful when users can gather new, nontrivial insights, a process that takes place over days or months and can rarely be simulated with a series of short tasks. We judge metrics such as task completion time and number of errors to be insufficient to quantify the success of a system, thus the name of the workshop: "Beyond Time and Errors."
We received a surprisingly diverse set of papers, many of which addressed topics not covered in InfoVis or HCI conferences. About 35 researchers in the field participated in lively discussions. John Stasko launched the workshop with a presentation titled "Evaluating information evaluation: Issues and Opportunities." Presentation of papers followed as outlined below.
To start the debate, Keith Andrews reported on his experience with formative testing, summative testing, and usage studies, stating specific limitations for each technique. Ellis and Dix analyzed 170 InfoVis papers, showing that very few of them contained an evaluation. The authors made the provocative claim that even the papers containing some form of evaluation all had problems, and a heated discussion followed about the need to report challenges while confirming the value of evaluation.
Henry and Fekete presented an experiment exploring how users perceive patterns in matrix visualizations. They used an interesting evaluation method of providing a printout of the visual representation and asking users to freely annotate findings and patterns. All the results could be seen at once by stacking them and using transparency. Mazza presented his experience using focus groups to evaluate a visualization for instructors. Focus groups proved useful to elicit unanticipated questions and thus to cover a broader range of issues. Rester et al. discussed the relative advantages and disadvantages of evaluation techniques, comparing three different solutions to aid psychologists coping with data reported by patients. These last two papers agreed that using a mix of methods can produce better results than using a single method.
Plaisant and Shneiderman proposed MILC (Multidimensional In-Depth Long-Term case studies), a methodology inspired by ethnographic methods and the evaluation of creativity support tools. They commented on the need for long-term studies in InfoVis and on their potential advantages over traditional methods, notably the possibility of documenting expert users' success in achieving their professional goals. Their paper gives specific guidelines and tips on how to conduct this type of study.
Bertini and Santucci presented a review and analysis of "visual quality metrics," a set of measures to quantify intrinsic qualities of visualizations in three distinct classes: size metrics, visual effectiveness metrics, and feature preservation metrics. They described the scope and potential of each, emphasizing feature preservation metrics, a set of metrics inspired by Tufte's "lie factor." Feature preservation metrics qualify the goodness of a visualization in terms of how well it preserves the underlying data features. Goodell, Grinstein et al. proposed a system to record session histories in a visualization environment that allows for moving from the current visualization state to any of the previous ones, recording all the steps taken in the exploration. Characteristics of the graphs representing the user's paths might help detect usability problems.
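To make the idea of a recorded session history more concrete, here is a minimal sketch in Python. It is not the system Goodell, Grinstein et al. presented; the class names, the state labels, and the choice of "branch points" as a path characteristic are all assumptions made for illustration. Each visualization state becomes a node, the user can jump back to any earlier node, and every step is appended to a log.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryNode:
    """One recorded visualization state (hypothetical structure)."""
    state: str
    parent: "HistoryNode | None" = None
    children: list = field(default_factory=list)


class SessionHistory:
    """Tree of visited states plus a flat log of every step taken."""

    def __init__(self, initial_state):
        self.root = HistoryNode(initial_state)
        self.current = self.root
        self.log = [initial_state]

    def apply(self, new_state):
        """Record a new state reached from the current one."""
        node = HistoryNode(new_state, parent=self.current)
        self.current.children.append(node)
        self.current = node
        self.log.append(new_state)
        return node

    def back_to(self, node):
        """Return to a previously recorded state; the jump is logged too."""
        self.current = node
        self.log.append(node.state)

    def branch_points(self):
        """States the user came back to and then explored differently."""
        found, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if len(node.children) > 1:
                found.append(node.state)
            stack.extend(node.children)
        return found


# Hypothetical exploration session.
h = SessionHistory("scatterplot")
filtered = h.apply("scatterplot + filter")
h.apply("zoomed")
h.back_to(filtered)
h.apply("recolored")
print(h.branch_points())
print(h.log)
```

A state with many branches, or a log full of returns to the same state, is the kind of path characteristic that could be inspected as a possible sign of a usability problem; whether it actually indicates one is an empirical question the sketch does not answer.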
Two papers dealt with heuristics. Zuk et al. described three different sets of heuristics, showing that the kinds of problems discovered by the evaluators are highly dependent on the chosen set, thus raising the problem of finding a minimal set of heuristics able to cover a larger group of problems. Discussion highlighted the benefit of having InfoVis heuristics, but also the challenge of having too many heuristics, making them impractical to use. Ardito et al. also confronted the problem of heuristic evaluation for InfoVis. They proposed to adapt an established methodology, the Abstract Task (AT) methodology, to the visualization domain. The methodology defines a precise protocol that evaluators must follow, allowing even novice evaluators to use the heuristics in a rigorous way.
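The question of finding a small set of heuristics that still covers many problems can be read as a set-cover problem. The sketch below is one way to explore it, not a method proposed by Zuk et al.: it assumes we already know which problems each heuristic helped uncover (the heuristic names and problem numbers are invented) and applies the standard greedy approximation, which gives a small covering set but not necessarily the smallest one.

```python
def greedy_heuristic_subset(finds):
    """Pick heuristics greedily until every known problem is covered.

    finds: dict mapping a heuristic name to the set of problem ids
    it helped uncover (hypothetical input format).
    """
    uncovered = set().union(*finds.values())
    chosen = []
    while uncovered:
        # Take the heuristic that covers the most still-uncovered problems.
        best = max(finds, key=lambda h: len(finds[h] & uncovered))
        chosen.append(best)
        uncovered -= finds[best]
    return chosen


# Invented example: four heuristics, seven problems.
finds = {
    "H1": {1, 2, 3},
    "H2": {3, 4},
    "H3": {4, 5, 6, 7},
    "H4": {1, 7},
}
subset = greedy_heuristic_subset(finds)
print(subset)
```

In this invented example two of the four heuristics are enough to cover all seven problems. With real evaluation data the result would depend on the evaluators and the system under study, which is precisely the dependence the paper pointed out.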
This last group of papers dealt with benchmark datasets and metrics to facilitate evaluation and allow the comparison of visualization systems.
Guiard et al. proposed a platform based on Shakespeare's complete works to compare task performance in the navigation of multiscale documents. The platform can accommodate novel navigation techniques, input devices, and contexts. Participants found it a good example of how shared datasets and tasks can be very effective when designed for specific domains. A common debate about benchmark datasets is whether they should be designed to be as general as possible or tailored to a series of different areas.
Whiting et al. presented the "Threat Stream Data Generator," a tool to generate a synthetic dataset with ground truth. First a believable scenario is designed. The scenario is then expressed semimanually in data and injected into an existing real dataset. The benefit of having ground truth available was discussed extensively during the workshop: Known findings permit deriving precise figures about the effectiveness of a tool in supporting the discovery process. Synthetic data generation gives control over data characteristics and levels of complexity, but with the risk of being perceived as unrealistic by analysts. Melancon discussed the generation of synthetic graphs, highlighting how the generation process might produce misleading results. One key point of the paper is that it is important to take into account "scale-free networks" and "small world networks."
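The value of ground truth can be illustrated with a small sketch. It does not reproduce the Threat Stream Data Generator; the record format, the function names, and the use of precision and recall as the "precise figures" are assumptions for illustration. Scenario records are injected at random positions into a real dataset, and an analyst's reported findings are then scored against the known injected records.

```python
import random


def inject_scenario(real_records, scenario_records, seed=0):
    """Mix scenario records into real data and remember which ids are planted."""
    rng = random.Random(seed)
    merged = list(real_records)
    truth_ids = set()
    for rec in scenario_records:
        # randint is inclusive at both ends, so a record may land at the end.
        merged.insert(rng.randint(0, len(merged)), rec)
        truth_ids.add(rec["id"])
    return merged, truth_ids


def score_findings(reported_ids, truth_ids):
    """Return (precision, recall) of reported records against ground truth."""
    reported = set(reported_ids)
    hits = reported & truth_ids
    precision = len(hits) / len(reported) if reported else 0.0
    recall = len(hits) / len(truth_ids) if truth_ids else 0.0
    return precision, recall


# Invented data: five routine records and a two-record planted scenario.
real = [{"id": f"r{i}", "text": "routine report"} for i in range(5)]
scenario = [
    {"id": "s1", "text": "planted event A"},
    {"id": "s2", "text": "planted event B"},
]
merged, truth = inject_scenario(real, scenario)

# Suppose an analyst using some tool flags these two records.
precision, recall = score_findings(["s1", "r3"], truth)
print(precision, recall)
```

Without the planted ids there would be no way to compute either figure, which is why ground truth makes tools comparable. The sketch says nothing about whether the injected scenario looks realistic to analysts, the risk noted in the discussion.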
Two papers discussed task taxonomies. Lee et al. described a task taxonomy for graphs. The authors started from an established low-level task taxonomy and proposed higher-level objects and tasks for the domain of graphs. Each high-level task is described with examples taken from multiple real-world applications. The discussion focused on the challenge of refining task taxonomies and the possible use of wikis (Web pages supporting collaborative editing) to support this activity. Valiati et al. provided an extensive review of previous models and a novel taxonomy for multidimensional data. Tasks are organized in a hierarchical fashion, thus enabling the selection of tasks at different levels of detail. The authors conducted two user studies to test the taxonomy, checking for inconsistency and incompleteness. While all the observed tasks were covered by the taxonomy, the authors noticed that some tasks required a finer level of description and that the hierarchical organization was too rigid. These two papers demonstrated how task taxonomies can operate at different levels of detail and how useful specific sets of tasks can be for particular domains. The workshop's participants discussed the relative usefulness of low-level/high-level tasks and specific/generic tasks. Indeed, low-level tasks cannot always easily be reformulated in terms of specific domains. On the other hand, crafting a particular set of tasks for each domain can become a daunting activity. We discussed the difficulty of judging the quality of one taxonomy over another. One factor seems to be the number of tasks, favoring compact and easily sharable taxonomies.
Participants were satisfied with the workshop format and content, and we discussed the possibility of a second workshop in 2008. More important, we discussed the need for appropriate funding for evaluation activities, and the development of a formal infrastructure to support research on evaluation methods, encourage collaboration, provide benchmark datasets, and share results.
For more info on BELIV'06 see http://www.dis.uniroma1.it/~beliv06/
University of Fribourg
University of Maryland
University of Rome "La Sapienza"
About the Authors
Enrico Bertini is a researcher at the University of Fribourg, Switzerland. He earned a Ph.D. in computer engineering in 2006 from the University of Rome "La Sapienza," where he was involved in various HCI and InfoVis projects. His main research interest is information visualization, with a specific focus on clutter reduction, evaluation techniques, and visualization for network security.
Catherine Plaisant is an associate research scientist at the Human-Computer Interaction Laboratory of the University of Maryland. She recently coauthored with Ben Shneiderman the fourth edition of Designing the User Interface. She earned a doctorat d'ingenieur degree in France and enjoys working with multidisciplinary teams on designing and evaluating new interfaces, in particular information visualization techniques.
Giuseppe Santucci is an associate professor in the department of computer science at the University of Rome "La Sapienza." He graduated in electrical engineering from the same institution in 1987. Since then he has been teaching courses in computer science. His main research activities involve user interfaces to databases, human computer interaction, and information visualization. He is a member of the steering committee of the International Working Conference on Advanced Visual Interfaces (AVI) and a member of the Institute of Electrical and Electronics Engineers (IEEE).
©2007 ACM 1072-5220/07/0500 $5.00