Robert Schumacher, Kirsten Jerch
Recently, NIST released guidance NISTIR 7804, “Technical Evaluation, Testing and Validation of the Usability of Electronic Health Records,” also known as the EHR Usability Protocol (EUP). The document is part of the NIST Health IT Usability initiative, which is focused on establishing a framework to define and assess the usability of health information technology (HIT). The initiative rests on the belief that usability is a critical yet often overlooked factor affecting the acceptance, adoption, and use of electronic health record (EHR) systems. Simply put, measuring the usability of HIT is an essential step toward improving the usability and safety of these applications. The response to this guidance, however, has focused on the assumption that usability is difficult, if not impossible, to measure, and on the resulting fear that asymmetrical measurement will create unfair standards of comparison. This concerns us, because the fact is, good usability research has the opposite effect on software development: It spurs people-focused innovation, which begets more effective, efficient, and satisfied users, which begets more satisfied customers.
To summarize the arguments made in interactions by Swanson and Lind, the authors claim there are too many moving parts to conduct valid and reliable validation tests on the usability of EHRs. They argue that an EHR could be configured completely differently in different medical environments, where any number of unique users, user tasks, patient data, and connecting data sources are possible. We agree that, mathematically, the combinations are endless, making systematic user testing in every scenario seem impractical. EHRs undoubtedly represent a complex domain, but where Swanson and Lind err is in assuming it’s too hard to conduct validation (aka summative) testing if the end goal is to use the results “as a measure of comparative usability.” The authors’ critique is built upon a fallacious warrant that the EUP is mainly about comparability of results. As a result, much of the criticism is at cross-purposes to the premise of the EUP.
Comparison of EHR performance in summative validation testing is expressly not the objective; the body of the EUP (NISTIR 7804) says so:
“The EUP is not intended to result in a relative or comparable score of usability…each case involves the development of metrics that are unique in their level of detail to the specific context of use, user characteristics, potential risks, and tasks of the system against which they are tested” [2, p. 29].
The authors of the EUP fully understand that comparative usability metrics are not possible (or even necessarily meaningful) given the complexities and unique environments of EHRs. But this was never the point. Measuring usability is a practice. The only standard of comparison is original intent.
The expert evaluation procedures in the EUP are intended to drive a process whereby qualified experts focus on the user interface and then make positive changes based on the findings. The critics of the EUP should look beyond the premise of comparison or competition, and cease the unproductive conversations about why it is impossible to create a level playing field. Instead, they should focus on the critical task of developing effective methodologies to evaluate whether a user interface puts patient safety at unnecessary risk.
Let’s reflect on how we got here. The EUP was developed primarily to define a standard protocol for evaluating and testing the usability of EHRs. Many have shown that people are harmed, sometimes fatally, by use errors brought about by poorly designed user interfaces. The usability of EHRs must be held up to greater scrutiny; as far as we know, Microsoft Word has never killed anyone, but EHRs have.
Most HIT vendors are very concerned about patient safety. Many will test data interoperability with other systems ad nauseam but resist formally integrating usability testing into the software-development process. What they may not appreciate is that users are the most complex variable in their applications. Without adequate user testing, we see a lack of systematic task analysis to get workflows right, resulting in too many steps for simple tasks and too few precautions in the workflow for dangerous tasks. We see egregious mistakes in EHR layouts, leading to chaotic visual scan patterns and increased cognitive load (e.g., pediatric growth charts that violate decades-old conventions). We also see misapplications of techniques for assessing user behavior and performance, such as using focus groups to test performance. Focus groups assess users’ subjective opinions, not objective end-user performance. As psychologists and human factors professionals well versed in process control and engineering psychology, we see the risks that such oversight creates in medical environments.
A handful of application vendors are doing the right things and do recognize the pivotal role that improving the user experience plays in the market. Others have convinced themselves they are doing the right things or have a tin ear about any serious professional human factors assessment of their products. And some, unfortunately, are totally ignorant of the problem.
Measuring user performance in a valid and repeatable way is challenging. Some have taken the curious approach of indicting the whole discipline of usability by invoking Rolf Molich’s Comparative Usability Evaluation (CUE) studies [3], suggesting there is some kind of universal axiom about the fallibility of usability testing. But the CUE studies have not focused on validation testing, so the whole argument is off base from the start. Moreover, the point of the CUE studies is not that the methods themselves are unsound, but that the execution of those methods is unsound. Appealing to the validity and reliability arguments is misguided at best; it does not respect the established science of the field, but rather focuses on the commercially popular.
The alternative to confronting measurement challenges in the EUP is not to walk away, but to refine methodologies. One might imagine in situ testing over an extended period of time, where collection and evaluation of user performance measures drive improvements in EHR user interfaces. Clearly, this would be more valid and reliable than doing a lab-based test. These are the issues that human factors professionals and HIT developers should be working out, instead of fixating on the idea that we are at odds because we can never achieve comparable usability data with so many variables on the table.
A presentation by Harris and North at the 2012 Human Factors and Ergonomics Society Healthcare Conference gives us a peek at what we might learn if we measured behavior in situ. Many EHR users have taken to reporting issues with their EHR to the FDA adverse events database for medical devices (called MAUDE). Harris and North’s analysis of these reports reveals dozens of unique adverse events over the course of five years in which patients were harmed due to poor user interface design [4]. Controlled, systematic study of these self-reported events could likely expose a multitude of issues.
We have a long history of successful application of human factors validation testing in other fields and industries (e.g., aviation, transportation, and power generation), especially where safety is a concern. In healthcare, validation testing is currently a key part of the FDA medical device 510(k) submission process. In fact, the frequent, dramatic, and deadly incidents in aviation, transportation, and power generation prodded government and industry to embrace human factors approaches. The result of this approach is that one is now orders of magnitude safer in a plane than in a hospital.
The call to give up on measurement is at best organizational languor, or at worst, subjugation to profit. The growth in size and sophistication of a life-and-death industry such as HIT should not be met with the response of “let’s measure less”; there are too many lives on the line.
2. Lowry, S. et al. Technical evaluation, testing and validation of the usability of electronic health records (NISTIR 7804). (Feb. 2012); http://www.nist.gov/healthcare/usability/upload/EUP_WERB_Version_2_23_12-Final-2.pdf
3. A general representation of the CUE studies can be found at http://www.dialogdesign.dk/CUE.html.
4. Harris, S. and North, R. Mining MAUDE: Perspectives on EHR and device design from the FDA. Proc. of the 2012 Symposium on Human Factors and Ergonomics in Health Care: Bridging the Gap. Human Factors and Ergonomics Society, 2012.
Bob Schumacher is managing director and co-owner of User Centric. He is a co-author of NISTIRs 7804, 7742, and 7741, and served on the National Research Council’s Committee on the Role of Human Factors in Home Healthcare.
Kirsten Jerch is a user experience specialist at User Centric. She has conducted both formative and human factors validation studies with medical devices intended for clinical and patient use. She served as editor for NIST 7804.
©2012 ACM 1072-5220/12/0700 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.