Art Swanson, Scott Lind
Recently there has been a lot of discussion about the usability of electronic health record (EHR) systems, with many ongoing efforts to standardize the usability practices and metrics, and even the user interfaces, of EHR systems. Specifically, the recently proposed NIST Usability Evaluation Protocol (UEP) seeks to standardize summative usability testing for all EHR products and to allow those results to be compared as a measure of the systems' usability. An important note is that the stated goal of the UEP is to reduce patient-safety errors and that the data will be published as a measure of comparative usability. The task faced by NIST is not inconsequential, as pointed out in the conclusion of a 2005 article by Bob Bailey: "The research literature is fairly clear that even highly experienced usability specialists cannot agree on which usability issues will have the greatest impact on usability" [1].
Adding to this difficulty, during the initial discussions on the UEP, it became clear to us that there were tactical challenges in creating meaningful summative usability tests of EHRs that were not being explicitly addressed. We have done extensive usability testing of our companies' EHR products and have faced significant challenges in developing comparative summative usability tests. Here we detail these challenges, not only to start conversations about how to solve them but also to educate the broader community on the specific complexities of working in health IT. At a higher level, the challenges we present are not unique to health IT. Recruiting, realistic tasks, system configuration: these are all issues when usability testing any large enterprise productivity system. The reason we begin at this level is to create a baseline of knowledge about the complexity of these systems to inform standards protocols like the NIST UEP. Additionally, we propose strategies that may accomplish the same goals as the NIST UEP (reduction of patient-safety errors) in a more robust and pragmatic manner.
Having significant experience in usability testing EHRs, we hope that our experiences on the front lines can help inform the discussions among healthcare providers, industry, academia, and government as they begin these standardization efforts.
Challenge 1: Realistic Clinical Data
Jamie Heywood, CEO of the medical social networking site PatientsLikeMe, said at a recent conference: "The biggest problem with healthcare is doctors" [2]. There are many challenges in designing for physicians: They are smart, demanding, and exacting. They are used to being the experts and not having their judgment challenged. Like all people, physicians are pattern-matching machines, but their training intensifies this behavior: They can't help looking for patterns and abnormalities in clinical data. This characteristic means that having real (but de-identified) clinical data for usability testing is absolutely critical. Any data that does not fit or does not make clinical sense is like a big red flashing beacon that physicians and other clinicians simply cannot ignore. When we built test systems for our first usability tests (see Challenge 7) and loaded sample data for the testing scenario, we thought we needed valid data only for those elements with which our doctors would be asked to interact. Our assumption was that the surrounding data could be fake (as has often been the case when testing in other domains), but this was proven false time and again, as doctors were continually distracted from their primary tasks by our unrealistic data.
The data in EHR usability tests must be real and complete, or it becomes a serious distraction to participants. Finding or creating actual patient data that is de-identified according to HIPAA regulations and that fits the scenario of the task is difficult and time consuming, but it would be necessary for standardized testing. And this means creating a standardized data set for each user type (see Challenge 4) and each patient type. Having a standard data set for a 78-year-old male congestive heart failure (CHF) patient will do nothing for evaluating the usability of the system for a pediatrician or an obstetrician.
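One way to make such a standardized data set reproducible across test sites is to build it from seeded generators, so every lab loads byte-identical records for each patient type. A minimal sketch in Python; the record fields and the `chf_elderly_male` profile are invented for illustration, not drawn from any real EHR schema:

```python
import random
from dataclasses import dataclass, field

@dataclass
class TestPatient:
    """One de-identified record in a standardized test data set (hypothetical schema)."""
    patient_type: str
    age: int
    sex: str
    problems: list = field(default_factory=list)
    medications: list = field(default_factory=list)

def build_patient(patient_type: str, seed: int) -> TestPatient:
    """Deterministically generate the same synthetic patient for a given
    (patient_type, seed), so every test site works from identical data."""
    rng = random.Random(seed)  # local RNG: reproducible across runs and machines
    if patient_type == "chf_elderly_male":
        return TestPatient(
            patient_type=patient_type,
            age=78,
            sex="M",
            problems=["Congestive heart failure", "Hypertension"],
            # Clinically plausible medications for the scenario; dose varies by seed
            medications=[f"furosemide {rng.choice([20, 40, 80])} mg daily",
                         "lisinopril 10 mg daily"],
        )
    raise ValueError(f"no generator defined for {patient_type!r}")

# Same seed -> same record, so task results are comparable across sites.
a = build_patient("chf_elderly_male", seed=42)
b = build_patient("chf_elderly_male", seed=42)
assert a == b
```

In practice, each patient type (and each user type's supporting data) would need its own clinically reviewed generator, with physicians confirming that the output "makes clinical sense" before it reaches a test session.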
Challenge 2: System Configuration

The modern enterprise EHR system is much more of an interdisciplinary workflow system than a stand-alone data-recording tool. Real-life workflows are both highly variable and complex, requiring extensive configuration and adaptation of the system to enable successful use by organizations, practices, and even specific users. These configurations, including master file definitions (for example, drug master, order sets, pick lists, and so on), will cause systems to behave in dramatically different ways, resulting in different steps, different labels, and even different default values for the prescribed workflows. This raises some questions regarding the logistics of traditional usability testing. How do we perform a common testing protocol against systems that by necessity allow such diverse configuration? Do we standardize the configuration instead of having it represent the normal workflow? If so, how do we then compare results from a common scenario across multiple EHR systems?
We could attempt to prescribe an out-of-the-box configuration that might standardize the results (at least within a particular system), but this approach would not be representative of the optimal or even the intended level of usability for these systems. Conversely, we could take a much more investigative approach and allow users to use their pre-configured and "optimized" systems, but then the task time, steps, and even error rates would not be directly comparable.
We have used both approaches and learned that testing with a standardized configuration very often introduces issues and distractions into tasks ("That isn't how my system does it"), whereas using an optimized system yields results that are not comparable in any meaningful way. If the goal is to quantitatively compare the usability of two products, then a common configuration is the only mechanism that provides results that can reasonably be evaluated against each other. However, developing an initial configuration of these systems that enables a standard workflow for the task and is both representative and standard across EHR systems is another area of research that could benefit the broader community, particularly because a full workflow often involves many modules from various systems.
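One pragmatic middle ground between out-of-the-box and site-optimized setups is to express the baseline test configuration declaratively, so the same system-neutral definitions can be mapped onto each product's master files before a session. A rough sketch; every key and value here is invented for illustration, not a real configuration schema:

```python
import json

# Invented schema: a minimal, system-neutral baseline configuration that each
# vendor would map onto its own master files (order sets, pick lists, defaults).
baseline_config = {
    "order_sets": {
        "chf_admission": ["furosemide 40 mg IV", "basic metabolic panel"],
    },
    "pick_lists": {
        "visit_type": ["office visit", "follow-up", "annual physical"],
    },
    "defaults": {
        "rx_refills": 0,
        "lab_result_units": "SI",
    },
}

# Serializing the baseline lets it be versioned, diffed, and shared across
# test sites, so every system under test starts from the same definitions.
serialized = json.dumps(baseline_config, indent=2, sort_keys=True)
restored = json.loads(serialized)
assert restored == baseline_config
```

The hard part, as noted above, is not the mechanism but agreeing on baseline content that is representative across very different EHR products.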
Challenge 3: Representative Tasks

The tasks within an EHR system are very frequently triggered by outside systems (for example, a lab result, an appointment, or an order). That flow of tasks has a context and a series of associated actions that vary substantially based on the connected systems, their degree of integration, the organization's processes and guidelines, and even the communication between providers. Defining tasks that are representative and yet consistent across systems and use environments is challenging. Thus, for usability testing, the tasks need to be defined in terms of the broader scenario, including how the scenario was influenced by previous data and/or other systems, as well as by the test subjects' model of how the clinical tasks flow in their work environment. Different practices and healthcare facilities have different processes, so a system may test well in one context and poorly in another. This is obvious to usability professionals and is important to learn about in formative testing, but it poses a challenge in terms of the level of granularity that would be required to have meaningful comparative summative tests.
As an example, the task of a refill request depends significantly on how that request was triggered. If the refill request came from a patient phone call, then it is frequently handled as a normal task or message. However, if the system is connected to a pharmacy or a pharmacy benefit manager (PBM), the refill may come in as a dedicated refill message with much more data attached to it. Similarly, some systems connect to patient portals, and different kinds of patient portals vary in their abilities to send refill messages to the EHR and in the granularity of those messages.
Challenge 4: User Types

When trying to capture the usability of these systems, having complete coverage of all the user types is critical. The high-level roles are obvious: doctor, nurse, medical assistant, medical coder, front desk, appointment scheduler, and so on, but a thorough evaluation requires much more granularity within these roles. For example, doctors should at least be broken out into primary care, procedure-based specialties like surgery, and non-procedure-based specialties like endocrinology. While there are some common tasks across user types, such as prescribing medications, the goals and workflows for the tasks can be radically different. How a surgeon prescribes medications in a hospital is very different from how a primary care doctor prescribes medications in an outpatient clinic.
We must compare the usability of core tasks within a specialty (or group of specialties). In addition, users need to be further categorized based on the environment (office, clinic, hospital, or home), because the types of tasks and related information may differ depending on the location. And even within a hospital, the Intensive Care Unit is a very different environment from the Emergency Department, not to mention the combinations of care-team configurations and cross-team configurations of staff and environments that patients' care takes them through.
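Simply enumerating the role-by-environment combinations described above shows how quickly the required test coverage grows. A small sketch; the role and environment lists are illustrative, not a complete taxonomy:

```python
from itertools import product

# Illustrative, not exhaustive: roles broken out as the text suggests.
roles = [
    "primary care physician",
    "procedural specialist (e.g., surgeon)",
    "non-procedural specialist (e.g., endocrinologist)",
    "nurse",
    "medical assistant",
    "medical coder",
    "front desk",
    "appointment scheduler",
]
environments = ["office", "clinic", "hospital ward", "ICU", "ED", "home"]

# Each (role, environment) pair is a distinct test profile, before even
# considering patient types, care-team combinations, or cross-team handoffs.
profiles = [(r, e) for r, e in product(roles, environments)]
print(len(profiles))  # 8 roles x 6 environments = 48 profiles
```

Multiply those 48 profiles by even a handful of patient types and standardized data sets, and the scale of a truly comparative summative protocol becomes apparent.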
Challenge 5: Training and Familiarity

The intended use of a system, its complexity, and the risks involved in using it necessarily assume a level of up-front training or user certification. A related issue is system familiarity: Should you test only users of the current system, or should you also test users of other EHR systems? We've tested users along the continuum from minimal to extensive training on our systems and levels of experience with other EHR systems, and we consistently found that the worst performers are users of other EHR systems with no training. This is likely because the conceptual models of different EHR systems often differ so much that it is harder for users to shift their existing conceptual model than to learn a new one.
Ensuring consistent training and familiarity with a system is required to enable reasonable comparison of results across EHR systems. This requires more targeted recruiting and standards for task-level training.
Challenge 6: Recruiting
As noted earlier, the breadth of users and profiles that must be tested is both significant and specific. The overall recruiting process to create a qualified, representative pool of testers can be challenging, as clinicians who use other EHR systems want to help improve their own products, not those of competitors. And those who do not use an EHR system often believe it is a sales effort. Attempting to get enough time and focus with physicians is problematic. They are very busy and highly paid, which makes finding incentives particularly difficult. Finally, consistently finding new physicians (who are not biased by previous testing sessions) will be challenging as we expand the scope and number of usability tests that must be performed.
Challenge 7: Test Systems
The process of configuring test systems is complex. These systems are large and complicated, and they must be installed and configured properly. Additionally, external interfaces to other systems must be enabled or at least simulated to allow realistic task completion. Few if any systems cover all workflows of interest, particularly considering that an EHR is only part of a provider's health IT infrastructure; this is especially true in medium to large institutional settings. Systems also must be configured with the proper external results and queues to support the scenario and tasks. Finally, the systems must be structured such that the data can be reset to a default state.
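The last requirement, resetting data to a default state, is often the easiest to automate. A hedged sketch, assuming (unrealistically, purely for illustration) that the test EHR's data can be reduced to a single SQLite file that is snapshotted once and restored between participant sessions:

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical setup: a test EHR database reduced to one SQLite file.
workdir = Path(tempfile.mkdtemp())
live_db = workdir / "ehr_test.db"
snapshot = workdir / "ehr_default.db"

con = sqlite3.connect(str(live_db))
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
con.commit()
con.close()

# 1. Capture the configured default state once, before any sessions run.
shutil.copyfile(live_db, snapshot)

# 2. A participant session mutates the data...
con = sqlite3.connect(str(live_db))
con.execute("INSERT INTO orders (status) VALUES ('refill pending')")
con.commit()
con.close()

# 3. ...and between sessions the database is restored from the snapshot.
shutil.copyfile(snapshot, live_db)
con = sqlite3.connect(str(live_db))
count = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
con.close()
assert count == 0  # back to the default state for the next participant
```

A real enterprise EHR spreads state across multiple databases, queues, and simulated interfaces, so the snapshot/restore step is correspondingly more involved, but the pattern is the same.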
Challenge 8: Second- and Third-Order Effects

Much of the complexity of these systems (and much of the potential for patient-safety errors) actually comes from second- and third-order effects that are almost impossible to meaningfully replicate in a usability-testing scenario. Because these systems are often only one component within a broader network of other health IT systems and manual processes, it is nearly impossible to predict how the flow of information and/or tasks will propagate. We can define static scenarios, but that will not account for the non-linearity inherent in a complex and chaotic work environment.
The challenges described here all contribute to making the summative usability testing of EHR systems complex and expensive. To be clear, we support usability testing; the issue is in developing summative tests that are comparable and meaningful across different contexts and systems. In our experience, deriving a valid, standard, and generalizable summative usability test is not the most effective use of limited resources to achieve the goals of increased patient safety, improved patient outcomes, and reduced healthcare costs. We believe there are more efficient mechanisms and suggest three areas to pursue.
Establish high-level design guidelines. Collaboratively establish basic design guidelines with a focus on reducing the chance for user error. For example, provide general guidance and conventions for the display of patient context, advance-directive instructions, and lethal-dose indications, as well as standards for terminology and data. These guidelines must be flexible enough to allow for innovation as we rapidly move to new platforms and interaction models, such as tablets and touch-based UIs. A similar undertaking by Microsoft with the National Health Service in the U.K. yielded the Microsoft Common User Interface (www.mscui.net), which could serve as a model for collaboration and standards.
Instrument applications. By capturing and reviewing real-world usage data, we could better analyze usage patterns and look for opportunities to improve usability and safety. This could provide valuable insight into preventing wrong-patient selection, wrong treatment actions, wrong medications, delays in care, and the unintended-care "never events" outlined by Emily Patterson at the NIST Usability Workshop in June 2011 [3]. Some combination of data capture and user reports (similar to the FAA's Aviation Safety Reporting System) would likely yield optimization and error information to improve EHR systems.
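Parts of this instrumentation are straightforward to prototype. A sketch of one possible heuristic, with a wholly invented event format: log patient-context switches with timestamps and flag chart opens that are abandoned within seconds for a different patient, a pattern that could indicate wrong-patient selection:

```python
from datetime import datetime, timedelta

# Hypothetical event stream: (timestamp, action, patient_id) tuples that an
# instrumented EHR client might emit. The format is invented for illustration.
events = [
    (datetime(2011, 6, 7, 9, 0, 0), "open_chart", "patient-17"),
    (datetime(2011, 6, 7, 9, 0, 8), "open_chart", "patient-98"),  # fast switch
    (datetime(2011, 6, 7, 9, 15, 0), "open_chart", "patient-17"),
]

def possible_wrong_patient(events, window=timedelta(seconds=15)):
    """Flag chart opens abandoned within `window` for a different patient --
    a heuristic proxy for wrong-patient selection, not a diagnosis."""
    flagged = []
    opens = [e for e in events if e[1] == "open_chart"]
    for (t1, _, p1), (t2, _, p2) in zip(opens, opens[1:]):
        if p1 != p2 and (t2 - t1) <= window:
            flagged.append((t1, p1, p2))
    return flagged

print(possible_wrong_patient(events))
# flags the 9:00:00 open of patient-17, abandoned 8 seconds later
```

Heuristics like this only generate candidates for human review; combining them with voluntary user reports, as suggested above, would separate true near-misses from benign navigation.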
Support education of EHR users. Help provider organizations become more informed customers when making EHR purchasing decisions. Offer guidance on evaluating EHR usability specifically for their workflows and users, such as the Healthcare Information and Management Systems Society (HIMSS) guide to usability in EHR vendor selection [4]. Consumer-level awareness will foster competition and help the market value user experience, including usability, more highly.
We strive to design systems that help clinicians perform their jobs as effectively and efficiently as possible. We also believe it is critical that vendors be able to innovate and compete based on the best combination of features, usability, and user experience.
We are encouraged by the increased focus on improving usability and are optimistic that vendors, providers, government, and other stakeholders can work together to develop practical, effective techniques to continue to enhance the usability and safety of EHR systems.
1. Bailey, B. Judging the severity of usability issues on web sites: This doesn't work. Usability.gov, Oct. 2005; http://www.usability.gov/articles/newsletter/pubs/102005news.html
2. Heywood, J. Closing Keynote: the power is in your hands. Healthcare Experience Design Conference 2011; http://2011.healthcareexperiencedesign.com/speakers2-2/
3. Patterson, E. NIST Usability Workshop 2011; http://www.nist.gov/healthcare/usability/upload/patterson-NIST-June-7-2011.pdf
4. HIMSS EHR Usability Task Force. Selecting an EHR for Your Practice: Evaluating Usability. Aug. 2010; http://www.himss.org/content/files/HIMSS%20Guide%20to%20Usability_Selecting%20an%20EMR.pdf
Art Swanson is the director of user experience for Allscripts and has worked on Ambulatory Care EHRs for the past six years. He has performed usability and design consulting for a number of industries, including aviation, automotive, and mobile devices, as well as enterprise productivity applications in finance and healthcare. The views expressed here are his own and do not represent the views of Allscripts Healthcare Solutions.
Scott Lind has been a director of user experience at Siemens Healthcare since 2005. Prior to that, he was director of usability engineering at Telcordia Technologies (formerly Bellcore) for seven years, where he worked on numerous telecommunications products for customers such as Verizon and AT&T. The views expressed here are his own and do not represent the views of Siemens Medical Solutions USA.
©2011 ACM 1072-5220/11/11 $10.00