If you ask people outside the Human-Computer Interaction (HCI) field about usability, many will mention the "classic" discount methods popularized by Jakob Nielsen and others. Discount methods have the appeal of seeming easy to do and, more importantly for business, of being inexpensive. This is especially attractive to smaller startup companies with low budgets. But are discount methods really too risky to justify the "low" cost? This month's business column authors think so, based on their research and experience. Indeed, they believe that these discount methods may actually backfire and end up discrediting the field. Following a lively discussion on the CHI-WEB listserv, we asked them to explain what they see the risks to be, and what they believe we, as a profession, can and should do about it.
David Siegel and Susan Dray
From Cost Cutting to Cost-Benefit
The value of so-called "discount" usability methods is much discussed. These methods cut corners in the hope that "some usability is better than no usability." They cut costs by reducing demands on the critical resources of time, facilities, cash, and skill. For example, discount user testing saves time and cash by testing up to five users, and reduces demands on skill by allowing limited test planning, simple test execution and "lite" analysis of usability "results." Inspection methods discount by examining only some aspects of usability problems, thus saving on analysis time and analyst skill requirements (Cockton, Lavery and Woolrych 2002).
Discount methods certainly eased industry uptake of HCI approaches in the 1990s. However, the real determinant of appropriateness is not "discountability," per se, but rather cost-benefit, and must include not only an assessment of benefits but also of risks, especially risks of errors. In our research, we have used the concept of Effectiveness as developed by Sears (1997). Figure 1 shows the definitions and formulae used in Sears' measures, which address two kinds of error: missed usability issues and false alarms. Our research has led us to the conclusion that discount methods may be so error-prone that they discredit usability practitioners, and should be cleared off the HCI store's shelves.
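Figure 1 is not reproduced here. The sketch below assumes the definitions commonly used in this literature: thoroughness is the share of real problems that a method predicts, validity is the share of predictions that turn out to be real problems, and effectiveness is their product. Misses lower thoroughness and false alarms lower validity. The names `score_method` and `MethodScore` are hypothetical, chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MethodScore:
    hits: int             # predicted problems that really occurred
    misses: int           # real problems the method failed to predict
    false_alarms: int     # predictions that never occurred
    thoroughness: float   # hits / real problems
    validity: float       # hits / predictions
    effectiveness: float  # thoroughness * validity


def score_method(predicted: set[str], actual: set[str]) -> MethodScore:
    """Score one method's problem predictions against the problems actually observed."""
    hits = len(predicted & actual)
    misses = len(actual - predicted)
    false_alarms = len(predicted - actual)
    thoroughness = hits / len(actual) if actual else 0.0
    validity = hits / len(predicted) if predicted else 0.0
    return MethodScore(hits, misses, false_alarms,
                       thoroughness, validity, thoroughness * validity)
```

As a made-up example, `score_method({"P1", "P2", "P3", "P4"}, {"P1", "P2", "P5"})` gives two hits, one miss and two false alarms. That is a thoroughness of 2/3, a validity of 0.5 and an effectiveness of 1/3. Because the product is used, a method cannot score well by excelling on one measure alone.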
Three ways to get usability on the cheap
To evaluate discount methods, we have to look at what is being traded off in order to reduce costs, and consider the risks. Let us examine these trade-offs in three broad categories of cost-cutting tactics.
Cost Cut Tactic 1: Reduce the range of factors considered
Understanding usability problems requires attention to three key facets: (1) the contexts in which they arise, (2) the actual immediate and eventual difficulties, and (3) the assumed cause(s) of these difficulties (Lavery and Cockton 1997). Discount methods operate by reducing the facets under consideration. Inspection methods necessarily concentrate on causes. However, inspection methods cannot find actual difficulties, and do not pay attention to contexts. Discount-testing methods avoid the experimental controls that can confidently establish causation. Therefore, uncertainty over real difficulties or actual causes inevitably has an impact on the quality of recommended solutions.
Discount inspection methods save on time and skill by reducing the theory space for potential usability problems. Put simply, they narrow the scope of what the analyst has to consider. With less to look at and think about, analysts can work more quickly. Thus in Cognitive Walkthrough (Wharton et al. 1994), causes of potential problems are limited to "labels" that are hard to find or interpret, given a user's assumed knowledge. In Heuristic Evaluation (HE, Nielsen 1994), the theory space is limited to (classes of) system features that can cause problems. No discount method takes analysts systematically through a search space. Analysts must essentially pick sample user tasks or system features at random.
In discount user testing, limited user differences and data collection instruments restrict the range of difficulties that can be recorded and/or reliably analyzed, as well as severely limiting consideration of the contexts under which difficulties can arise in the first place.
User difficulties result from a complex interaction of user and system factors. Strengths in one may compensate for weaknesses in the other. So, expert or highly motivated users may be able to overcome design problems. Conversely, user incapability may not lead to usability problems if the system is not demanding. Discount methods are generally too simple to take such complex interactions into account. One result is false alarms, such as failing to realize that a system defect is neutralized within a particular interaction context (e.g., misleading status bar messages have no effect when users never read them!). Equally, a failure to consider complex interaction contexts can lead to problems being missed, for example when task breakdown occurs following several previous seemingly harmless user actions.
Inspection methods do not encourage analysts to take a rich or comprehensive view of interaction. Too often, most system features and user tasks get ignored, as does consideration of likely user knowledge or capabilities. Worst of all, inspection methods very rarely lead analysts to consider how system, user and task attributes will interact to either avoid or guarantee the emergence of a usability problem. Similarly, discount user testing inevitably restricts the range of user capabilities, knowledge and tasks sampled, and may similarly fail to expose test users to the system features that are most likely to result in unsatisfactory interaction.
One discounting tactic that has been advocated for user testing is restricting the number of users tested to five or fewer. This will inevitably reduce the range of user differences sampled. Furthermore, many problems can often still be found with additional users (Spool and Schroeder 2001), even for a small subsystem. Figure 2 shows the results of a study involving 12 users (Woolrych and Cockton 2001). Random selections of even six participants would result in widely differing views on the existence, frequency and severity of problems. For example, contrast the problems for Participants 1 to 6 with those for Participants 5, 6, 8 and (the more expert) 10 to 12. The testing order was due to participant availability, so the yield from the 'first' six is accidental.
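The sampling effect can be illustrated with a small sketch. The problem-by-participant table below is invented and does not contain the study's data. The code enumerates every possible six-person panel drawn from 12 participants and records which problems each panel would have seen. The names `observed`, `problems_found` and `subset_yields` are hypothetical.

```python
from itertools import combinations

# Hypothetical data: problems (A-H) each of 12 participants ran into.
# Invented purely for illustration; NOT the figures from Woolrych and Cockton (2001).
observed = {
    1: {"A", "B"}, 2: {"A", "C"}, 3: {"A", "B", "D"}, 4: {"A"},
    5: {"B", "E"}, 6: {"A", "E"}, 7: {"A", "B"}, 8: {"E", "F"},
    9: {"A", "C"}, 10: {"F", "G"}, 11: {"G", "H"}, 12: {"F", "H"},
}


def problems_found(participants):
    """Union of the problems seen by a panel of participants."""
    found = set()
    for p in participants:
        found |= observed[p]
    return found


def subset_yields(k):
    """Problem sets found by every possible panel of k participants."""
    return [problems_found(panel) for panel in combinations(observed, k)]
```

With this made-up table, the "first" six participants (1 to 6) find A to E but none of F, G or H. The panel of 5, 6, 8, 10, 11 and 12 finds F, G and H but misses C and D. Across all 924 six-person panels, the number of distinct problems found ranges from 4 to 8. Which panel you happen to test determines what you believe the product's problems are.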
Cost Cut Tactic 2: Pint-sized methods inside a big box
Discount stores often offer small goods in large boxes. It looks like you are getting more than you actually are. Discount methods can do the same thing. In our research on HE, the predictions genuinely attributable to HE rattled around within a big box of predictions based on analysts' common sense. We could separate the two because all analysts had read training materials containing conformance questions that set out when each heuristic applies, and had to base their predictions on these questions. We could thus code each heuristic cited in a report as appropriate or inappropriate. We found that, overall, 69 percent of predictions were associated with inappropriate heuristics. Furthermore, the thoroughness of evaluations was mostly attributable to the individual skills of different analyst groups (Cockton and Woolrych 2001). Thus, it appeared that the analysts themselves, rather than the heuristics they were supposedly applying, provided the discovery resource. Interestingly, further analysis indicated that hits associated with correct applications of the heuristics tended to be problems of minor impact and/or low frequency. High-frequency or high-severity problems may simply be more "obvious" to analysts using common sense, who then cite arbitrary heuristics as post hoc justifications for the most critical usability problems. This can only impair the quality of recommendations for resolving a usability problem. Indeed, another weakness of HE is that it does little to support analysis of problem causes, leading to inappropriate solution generation.
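The coding step described above amounts to tagging each prediction and then cross-tabulating the tags. The sketch below is illustrative and rests on several assumptions. The `Prediction` record, its two-level severity scale and the sample records are hypothetical, and the authors' actual coding scheme may well have been richer.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple


class Prediction(NamedTuple):
    problem: str
    heuristic_appropriate: bool  # coded against the training-material conformance questions
    hit: bool                    # True if the prediction was confirmed as a real problem
    severity: str                # hypothetical two-level scale: "low" or "high"


def inappropriate_share(predictions: list[Prediction]) -> float:
    """Fraction of predictions that cite an inappropriate heuristic."""
    if not predictions:
        return 0.0
    return sum(not p.heuristic_appropriate for p in predictions) / len(predictions)


def hits_by_coding_and_severity(predictions: list[Prediction]) -> Counter:
    """Count confirmed hits by (heuristic appropriate?, severity)."""
    return Counter((p.heuristic_appropriate, p.severity) for p in predictions if p.hit)
```

Consider four invented records: two high-severity hits with inappropriate heuristics, one low-severity hit with an appropriate heuristic, and one false alarm with an inappropriate heuristic. Here `inappropriate_share` returns 0.75, and the cross-tabulation shows the serious hits sitting entirely in the "inappropriate" column, which is the pattern reported above.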
Cost Cut Tactic 3: Self-Assembly Problem Sets
With self-assembly furniture, although the price may be reduced, the cost is not. The time that the purchaser spends assembling the furniture may be worth more than the price discount. The same is often true of discount usability methods, especially when multiple methods, or HE carried out by multiple analysts, are used. Who integrates and prioritizes the predictions and draws the conclusions from them? Often this is left to the consumer of the report. Without standard report formats, merging the predictions of individual analysts can be a frustrating experience (Connell and Hammond 1999). Even with a standard format, merging remains time-consuming and requires skill. Analysts' meetings to jointly prioritize problem predictions are effective, but these again require time and skill, undoing any potential cost savings of the so-called discount method. You never get anything for nothing.
Using multiple analysts may not be a safe way to compensate for the weaknesses of discount methods. Multiple analysts improve thoroughness because it takes only one analyst to discover a problem for it to be predicted. The impact on validity is less positive: it takes only one analyst to fail to eliminate a false alarm for it to remain among the predictions. So, without a consensus-based, prioritized master list, multiple analysts will reduce a method's effectiveness (Woolrych and Cockton 2002).
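The asymmetry between thoroughness and validity can be shown by pooling analysts' reports without any consensus step. In the sketch below, a prediction survives if any analyst made it. It assumes the usual definitions of the measures (thoroughness = hits / real problems, validity = hits / predictions, effectiveness = their product). The function name and the sample reports are hypothetical.

```python
from __future__ import annotations


def pooled_scores(reports: list[set[str]], actual: set[str]) -> list[tuple[float, float, float]]:
    """(thoroughness, validity, effectiveness) of the union of the first n reports, n = 1..N."""
    pooled: set[str] = set()
    results = []
    for report in reports:
        pooled |= report  # no elimination: one analyst's prediction is enough to keep it
        hits = len(pooled & actual)
        thoroughness = hits / len(actual) if actual else 0.0
        validity = hits / len(pooled) if pooled else 0.0
        results.append((thoroughness, validity, thoroughness * validity))
    return results
```

Take three invented reports, `{"A", "B", "X"}`, `{"A", "C", "Y"}` and `{"B", "Z", "W"}`, scored against real problems `{"A", "B", "C", "D"}`. Thoroughness runs 0.5, 0.75, 0.75, while validity falls from 2/3 to 0.6 to 3/7. Effectiveness therefore peaks with the second analyst and drops with the third, who adds only false alarms.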
Figure 3 shows data from our study (Cockton and Woolrych 2002) on how thoroughness, validity, and effectiveness change as analysts are added. Thoroughness rose asymptotically, as expected (Landauer and Nielsen 1993), but validity got worse as analysts were added. The effectiveness trend was the most interesting: it peaked at seven analysts, remained unchanged at eight, and then declined slightly because of falling validity.
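The asymptotic thoroughness curve is usually modelled, following Landauer and Nielsen, as Found(n) = N(1 - (1 - λ)^n). Here N is the number of existing problems and λ is the probability that a single evaluator finds any given problem. The sketch below assumes, as that model does, that evaluators work independently with the same λ. The value λ = 0.3 used here is purely illustrative. Note that the model counts only hits, so it says nothing about the falling validity discussed above.

```python
from __future__ import annotations


def expected_thoroughness(detection_rate: float, n: int) -> float:
    """Expected share of existing problems found by n independent evaluators,
    each finding any given problem with probability `detection_rate`."""
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be a probability")
    return 1.0 - (1.0 - detection_rate) ** n


def marginal_gains(detection_rate: float, max_n: int) -> list[float]:
    """Extra thoroughness contributed by each additional evaluator, for n = 1..max_n."""
    return [
        expected_thoroughness(detection_rate, n) - expected_thoroughness(detection_rate, n - 1)
        for n in range(1, max_n + 1)
    ]
```

With λ = 0.3, five evaluators are expected to find about 83 percent of problems (1 - 0.7^5 ≈ 0.832), and each further evaluator adds less than the one before. This is the flattening thoroughness curve. Any false alarms those extra evaluators contribute are invisible to this model.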
If analysts cannot be brought together to form a consensus, then perhaps simple frequency-based elimination (as used in some method assessments) may help. Our study suggests that this is not necessarily true. Thirteen predictions were made by only one analyst, and these included nine false alarms and four hits (Cockton and Woolrych 2001). Had these been eliminated on the basis of prediction frequency, thoroughness, validity and effectiveness would all have dropped. Analyst consensus is safer than independent frequency-based elimination. However, analyst groups (or a more expert evaluation manager) could still eliminate actual problems or preserve false alarms.
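Frequency-based elimination keeps only those predictions made independently by at least some minimum number of analysts. The sketch below applies that rule and rescores the result. The function names, the threshold of two analysts and the sample reports are all hypothetical. Thoroughness can never rise under this rule, because removing predictions cannot add hits. Validity falls whenever the discarded predictions contain a higher proportion of hits than the pooled list as a whole.

```python
from __future__ import annotations

from collections import Counter


def eliminate_by_frequency(reports: list[set[str]], min_analysts: int = 2) -> set[str]:
    """Keep only predictions made independently by at least `min_analysts` analysts."""
    counts = Counter(p for report in reports for p in report)
    return {p for p, c in counts.items() if c >= min_analysts}


def score(predicted: set[str], actual: set[str]) -> tuple[float, float, float]:
    """(thoroughness, validity, effectiveness) of a set of predictions."""
    hits = len(predicted & actual)
    thoroughness = hits / len(actual) if actual else 0.0
    validity = hits / len(predicted) if predicted else 0.0
    return thoroughness, validity, thoroughness * validity
```

Take three invented reports, `{"A", "B", "X"}`, `{"A", "C", "X"}` and `{"A", "D", "Y"}`, scored against real problems `{"A", "B", "C", "D"}`. The pooled list scores (1.0, 2/3, 2/3). Applying the two-analyst rule leaves only `{"A", "X"}`, which scores (0.25, 0.5, 0.125). In this case the single-analyst predictions were mostly hits, so discarding them lowered all three measures.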
What does all this mean for our field?
Our recommendations for practice reflect input from colleagues at HCI 2001 (Ken Dye/Microsoft, David Roberts/IBM Warwick) and on CHI-WEB (Amanda Prail/Netusability, Fraser Hamilton/IconMediaLab, Josh Paluch/Ovo Studios).
First, there will probably always be a place for discount methods. The challenge is to improve all HCI methods, so that discount methods are less discounted and "full strength" methods can be applied in more contexts. Discount methods must become more effective and other methods must become more practical.
Second, discount methods are most appropriately used to drive design iterations, as opposed to providing summative evaluation, benchmarking or competitor analysis. However, even here they are risky. In most cases, a little more planning, better analysts, more users and more analysis will all pay off.
Third, participants are only one cost in user testing. Planning and analysis generally take more time than testing, and the difference in cost between five and 10 users can be relatively low. Clients on a limited budget have reduced costs by carrying out some planning themselves and by having developers attend during testing (thus reducing analysis costs). Look at the real costs of user testing and know where the costs originate. Try to find cost savings in planning and analysis as well as on participants. For inspections, too, look for ways to reduce hidden costs, such as problem merging.
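The point about where test costs originate can be made concrete with a simple cost model. The sketch below splits costs into a fixed part (planning, synthesis and reporting) and a per-participant part (running sessions, reviewing each session, incentives). Every figure in it is a hypothetical placeholder, and the class name `StudyCosts` is invented. Substitute your own rates and hours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyCosts:
    # All defaults are hypothetical placeholders, not data from any study.
    hourly_rate: float = 100.0            # cost of one practitioner hour
    planning_hours: float = 40.0          # fixed: independent of participant count
    fixed_analysis_hours: float = 24.0    # fixed: synthesis and reporting
    session_hours: float = 1.5            # per participant: running the session
    analysis_hours_per_user: float = 1.0  # per participant: reviewing notes or recordings
    incentive_per_user: float = 75.0      # per participant: payment

    def total(self, n_users: int) -> float:
        """Total study cost for n_users participants."""
        hours = (self.planning_hours + self.fixed_analysis_hours
                 + n_users * (self.session_hours + self.analysis_hours_per_user))
        return hours * self.hourly_rate + n_users * self.incentive_per_user
```

With these made-up defaults, `StudyCosts().total(5)` is 8,025 and `StudyCosts().total(10)` is 9,650. Doubling the number of participants raises the total by only about 20 percent, because planning and fixed analysis dominate. The model also shows where savings can come from: for example, lowering `analysis_hours_per_user` by having developers observe sessions.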
Fourth, we must acknowledge that our studies only look at prediction effectiveness, and not at method impact. In real working contexts, impact comes not from usability experts generating solution recommendations in isolation, but from working together with multidisciplinary project teams to generate solutions. However, it seems fair to say that prediction effectiveness should be considered a prerequisite for impact effectiveness.
Fifth, the value of discount methods as training devices should not be underestimated! One valuable outcome of collaborative inspections may well be that the developer team will see that user testing is essential.
Sixth, errors arising from discount methods may be more costly in some contexts than others. Different business models make different demands. In some contexts, hits may always be wins irrespective of the misses. However, in contexts such as online shopping, misses can be fatal. Savings on support costs and a few more attractive features are a benefit for retail software, but for free-to-use software on the Web, it may be vital to eliminate all severe problems. Once software is bought, most users will (have to) struggle on with it. This is not true of free Web applications such as e-commerce sites, which users can simply abandon. In general, discount methods are unable to address the whole product or site experience that is a key concern to dot-com managers.
Don't believe everything you read on the Web! Discount methods aren't very safe. They can and should be improved. Research has a key role here. In the meantime, understand method risks and do what you can to mitigate them.
1. Cockton, G. and Woolrych, A. (2001). "Understanding Inspection Methods: Lessons from an Assessment of Heuristic Evaluation," in Blandford, A., Vanderdonckt, J. and Gray, P. (Eds.), People and Computers XV, Springer-Verlag, 171-192.
3. Connell, I. W. and Hammond, N. V. (1999). "Comparing Usability Evaluation Principles with Heuristics: Problem Instances vs. Problem Types," in Sasse, M. A. and Johnson, C. (Eds.), Proc. INTERACT '99, IOS Press, 621-629.
5. Lavery, D. and Cockton, G. (1997). "Representing Predicted and Actual Usability Problems," in Johnson, H., Johnson, P. and O'Neill, E. (Eds.), Proceedings of International Workshop on Representations in Interactive Software Development, Queen Mary and Westfield College, University of London, 97-108.
9. Wharton, C., Rieman, J., Lewis, C. and Polson, P. (1994). "The Cognitive Walkthrough: A Practitioner's Guide," in Nielsen, J. and Mack, R. L. (Eds.), Usability Inspection Methods, John Wiley and Sons, 105-140.
10. Woolrych, A. and Cockton, G. (2001). "Why and When Five Test Users aren't Enough," in Vanderdonckt, J., Blandford, A. and Derycke, A. (Eds.), Proceedings of IHM-HCI 2001 Conference, Volume 2, Cépadèus Éditions, Toulouse, 105-108.
11. Woolrych, A. and Cockton, G. (2002). "Testing a Conjecture based on the DR-AR Model of Usability Inspection Method Effectiveness," in Sharp, H. et al. (Eds.), Proc. HCI 2002 Conference, Volume 2, British Computer Society, London, to appear.
Gilbert Cockton and Alan Woolrych, School of Computing and Technology, University of Sunderland, UK
Business Column Editors
Susan Dray & David A. Siegel
©2002 ACM 1072-5220/02/0900 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.