Reflections

IX.6 November + December 2002

Must the sale end?


I have some practical issues with Cockton and Woolrych’s article “Sale Must End: Should Discount Methods be Cleared off HCI’s Shelves?” in interactions IX.5 (September-October 2002).

First, they misrepresent how evaluations work in practice. For example, when I read, “Analysts must essentially pick sample user tasks or system features at random,” my eyes bulged. Why must analysts do this? We don’t do that here at Forrester. When we review a site for a client, we get the client to describe the most important target users and detail those users’ most important tasks. Then we evaluate how well the site supports those tasks.

Working with the client up front in this way accomplishes two important goals. First, it gives the review focus. Sure, the reviewer will not find every problem with a 100,000-page site. However, the problems the reviewer does find will be ones that are important to fix because they interfere with critical tasks. Second, it gives the client focus. I wish I could say that every company begins design work by doing the research to craft user personas and detail critical user goals. Most don’t. So sometimes the most important value we bring is altering the clients’ perspective by awakening them to the fact that they have skipped two of the first, most critical steps in the design process. (Cockton and Woolrych touch on this in their fifth summary point; however, they grossly underestimate whom this affects and the impact it can have. I’ve seen a good heuristic evaluation change the thinking of C-level executives at Fortune 1000 companies, as well as senior people in government.)

Second, much of the authors’ argument depends on the assumption that there is an objective way to determine that a “miss” is, indeed, a miss. If only this were true! Unfortunately, the state of the art in automated testing can’t find 100 percent of all problems any more than an analyst can.

What about usability labs? Molich found that multiple lab teams testing the same program or site don’t find the same problems (Molich et al. 1998; Molich et al. 1999). In other words, the teams miss many problems, so why should a lab’s call on the validity of a problem found by an analyst be taken as the final word? Cockton and Woolrych even cite Spool and Schroeder’s research showing that the number of users needed to find all the problems “even for a small subsystem” is closer to 80 than it is to 8. Did Cockton and Woolrych really test that 100th user to make sure that “misses” didn’t eventually show up as hits?
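
To see why 8 and 80 can both sound plausible, consider the cumulative problem-discovery model often used to justify small test samples, in which the share of problems found by n users is 1 - (1 - p)^n for a per-user detection probability p. The sketch below is purely illustrative; the detection rates are assumptions I picked for the sake of argument, not figures from Spool and Schroeder’s study:

def users_needed(p, target=0.99):
    """Smallest n such that 1 - (1 - p)**n reaches the target coverage."""
    n = 1
    while 1 - (1 - p) ** n < target:
        n += 1
    return n

print(users_needed(0.31))  # optimistic per-problem detection rate: 13 users
print(users_needed(0.05))  # a low rate, plausible for a large site: 90 users

When each problem is hard for any one user to hit, the required sample balloons, which is the heart of the “closer to 80 than 8” point.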

So, in the absence of a magical Error Meter, it’s just as possible that an analyst’s “false alarm” might actually be a problem that other methods didn’t reveal.

Third, I groaned at Cockton and Woolrych’s positioning of the role of the analyst versus the heuristics. They cite one of their studies, in which they found that “analysts themselves, rather than the heuristics, provided the discovery resource.” What’s odd is that they also point out (in their Figure 3) that adding more analysts to the process drives up the number of both misses and false alarms. So what’s going on? Are analysts the solution or the problem?

The answer is not really so difficult to understand in light of a basic reality: some analysts do a better job of conducting evaluations than others. Compared with a poor or inexperienced analyst, a superior analyst will find more real problems and better understand their causes, even when using the same heuristics. But great analysts are in short supply. So the more analysts you add, the more likely you are to drive down the average rate of effectiveness.

Does this therefore mean that heuristics aren’t valuable? In other words, would a superior analyst do just as good a job without the aid of any heuristics? I can’t imagine why this would be true unless the heuristics were so bad that they misled. (And I think it goes without saying that reviewers should use only valid, relevant heuristics.)

And, of course, there are lessons to be learned about how to identify valid, relevant heuristics—but I feel like I’ve flamed long enough.

References

1. Molich, Bevan, Curson, Butler, Kindlund, Miller, and Kirakowski. “Comparative evaluation of usability tests.” Proceedings of the Usability Professionals Association, 1998.

2. Molich, Thomsen, Karyukina, Schmidt, Ede, Oel, and Arcuri. “Comparative evaluation of usability tests.” CHI ’99 Extended Abstracts, 1999, pp. 83-84.

Author

Harley Manning, hmanning@forrester.com, Research Director, Forrester Research


 
