Susan Dray, David Siegel
The growth of our field has produced both an increasing range of user-centered design (UCD) services and a drive to find the most cost-effective ways of doing things. Increased demand within companies places more pressure on budgets. Suppliers of usability services respond inventively to the need to find ways to provide useful information for less money. This type of evolution always brings new challenges to quality. Unfortunately, the people who control budgets may not be very sensitive to the fine points of methodology and may need help evaluating the pros and cons of approaches that seem simple and inexpensive.
Remote usability testing for international studies is one case in point. On the one hand, companies increasingly recognize that succeeding in business globally means they need to take specific steps to understand and design for their international users. On the other hand, international UCD typically includes costs beyond those for local UCD projects. On the face of it, remote usability testing seems like an obvious answer to some of the difficulties and costs of doing international UCD. There is a large and growing literature on remote evaluation methods [1, 6, 7], but we have seen little on their application in international work.
"Remote evaluation" can refer to many different types of research that fall into two major categories, synchronous and asynchronous. In synchronous methods, a facilitator and observers can receive the data and manage the evaluation in real time with a participant who is remote. Depending on the nature of the evaluation, this can be done using methods such as video conferencing, or using remote application sharing tools such as WebEx or NetMeeting (to name just two). In contrast, with asynchronous methods observers do not have access to the data in real time, and there is no facilitator interacting with the user during data collection. Asynchronous methods can be as simple as user logs of critical incidents or diaries of activities. Asynchronous methods also include automated approaches, whereby a user's click stream is collected automatically.
Asynchronous methods add a set of issues beyond remoteness per se. Although they may be less intrusive to the user once the data collection process is set up, they essentially abandon real-time observation, and therefore, for their qualitative aspects, they are limited to self-report with its associated validity problems. Automated methods are best adapted to collecting quantitative information. In fact, the ability to sample a large group of users in order to collect statistical data is one of their key advantages. Even if supplemented by self-report qualitative measures, they cannot correlate specific user intentions with specific user actions. For example, a study of remote versus laboratory usability testing for mobile devices [9] used a tool called WebQuilt to capture click streams. The authors point out that the method did not allow them to distinguish clicks that took the user off the task path out of curiosity from those that were errors. The international context multiplies the opportunities for misinterpretation associated with asynchronous methods. For this reason, and because synchronous evaluations come closest to being analogous to laboratory testing, we will limit our focus to them.
Three main advantages tend to be cited for remote testing: cost, freedom from facilities, and time saving. Although they overlap to some degree, we will discuss them in turn, with a particular focus on their implications for international work.
Cost
The most often-cited reason for considering remote testing is that it is perceived to be less costly than non-remote testing. Of course, there should be a significant saving in travel, especially if the alternative includes bringing a team along. However, the overall saving may not be as great as you expect. Even if the data collection is remote, you will have many of the same costs as when doing an international test: onsite recruiting and honoraria, a realistic setup, translation of materials, moderator, and simultaneous translation if needed. You will likely need some kind of local coordination. If you are conducting the study in a facility, you will need to plan for facility charges and for the costs of data transmission. Charges for remote video links can be quite high. A video feed can be useful even when evaluating software, but if you are testing hardware or firmware, you may have no alternative to using a facility's video conferencing capability as your main source of data.
Freedom from Facilities
Of course, when the product being tested is software that can be distributed electronically, remote testing can in principle be done without a facility. In addition to the saving in facility costs, Jean Scholtz [7] points out that remote testing can facilitate evaluations being done in the context of the user's other tasks and technology. Velda Bartek and Dean Cheatham of IBM's Pervasive Computing Division [2, 3] point out that not being tied to a facility can allow you to sample from a larger geographical area without proportionally increasing costs. This can be an attractive benefit if you are sampling a specialized population that is not heavily concentrated geographically. The same argument may apply when the development team is widely dispersed.
However, there are a number of potential obstacles to consider. First, to be useful, application sharing tools require broadband connections, which are still quite rare in many countries, especially for home users. Restricting your sample to users with broadband obviously introduces a bias. Second, if your application or Web site is oriented toward the work context, you often have to deal with many sensitive issues in gaining access to the workplace. We have usually found face-to-face time with the people who must give permission and encourage employees' participation to be extremely important in this process. The international context already complicates this. Remote testing raises additional issues and may be perceived as more intrusive than if you were actually present. The fact that you need to interact with users through their information technology (IT) systems and network can introduce an additional level of security issues, both organizational and technical. Finally, don't forget that "freedom from facilities" is not unique to remote testing; it can also be achieved by doing field usability testing in person.
Time Saving
Many people perceive remote testing as quicker and easier than testing in person. In our opinion, they are probably underestimating the time needed to arrange and set up a remote test, especially internationally. You will still need to recruit participants, schedule time with users, prepare and translate materials (assuming you are testing in the local language), and arrange to set up participants with whatever technology you are using for the remote evaluation. In addition, if you are using simultaneous translation, you will have to figure out where in the communication setup it will take place.
One possible advantage to remote testing is that users can be spread out over time. This may allow for more flexibility in scheduling. On the negative side, don't forget that your observation team may be many time zones away from the user. Be prepared to watch your usability sessions at three o'clock in the morning.
When the Unexpected Happens
In all usability evaluations, but especially international ones, you must be prepared for the unexpected. Most of our international projects have needed some improvisational problem solving, changes of protocol, and so on. In our opinion, it would be highly risky to leave these things to chance and difficult to handle them remotely.
The infrastructure required for remote testing itself introduces additional management challenges. As with any technology, there are risks associated with system compatibility and stability, getting people through the setup, network issues, and so on. You may have to do remote troubleshooting when a user is affected by a crash or is simply confused about the application-sharing tool. Setting up users for remote data collection may be difficult to coordinate at a distance [Morgan Ames, personal communication], especially if you are trying to coach them through it via a translator. In considering these risks and their cost implications, take into account the overall project budget that could be put in jeopardy if you run into unforeseen technical problems you can't solve remotely.
Missing Indirect Cues
When you are not physically present with a participant, it is difficult to judge nonverbal and paraverbal cues (body language, tone of voice, and nonverbal sounds) accurately. These can affect interpretation not only of the user's affective responses, but also of the user's moment-by-moment intentions. We see this even when we compare the observations of an in-room facilitator with the perceptions of people watching from behind a one-way mirror in a traditional lab. With remote testing, even when a video link is used, it can be very difficult to interpret these cues. Admittedly, when you are conducting an evaluation through a native-speaking facilitator and an interpreter, there is already some filtering of this information. However, it is easier to catch these cues, even in a foreign language, when you are present, even if only to explore them in the debriefing. In any case, remoteness inevitably adds an additional layer of filtering.
Difficulty Managing the Interpersonal Dynamics of the Evaluation
One way or another, in international testing, the interpersonal dynamics of the evaluation situation must be managed across cultural and linguistic barriers and may require different approaches in different countries. Remote evaluations add an additional layer of technology between the user and the facilitator that can reduce your flexibility in handling these dynamics. Furthermore, the dynamics of remote evaluation itself will interact with the inherent challenges of international evaluations.
For example, building trust with remote evaluators can be a real challenge. As Bos and colleagues [4] found in exploring the development of trust in virtual groups, using remote methods, such as video- and audio-conferencing, can delay the development of trust and increase vulnerability to problems and miscommunications. In a remote test, it can be harder to monitor trust, rapport, or negative affect about the evaluation process, and to take timely steps to manage problems if they arise.
In addition, we have to remember that usability evaluation itself often needs to be adjusted for people in different cultures. In general, it will be much easier to manage these issues sensitively in person. Remote testing might make the process seem even more alien, especially for nontechnical users. Indeed, Masaaki Kurosu [personal communication] suggests that remote testing with Japanese participants may be culturally inappropriate because of the general cultural challenges posed by even in-person usability evaluations.
Missing the Opportunity to Understand Cultural Context
Users will often dutifully attempt tasks that you give them, even ones that don't make sense in their context, and it can be easy to misinterpret the resulting difficulties they experience. Being physically present in another culture exposes you to clues that help clarify the meaning of behaviors you observe in user testing. Some of this insight can come informally from the opportunity to experience an unfamiliar culture firsthand, and some of it can come from taking advantage of your time in country to do field research along with usability evaluation. Sometimes this can lead to insights that change the interpretation of usability findings, allowing the team to understand, for example, that the problem was not that the functionality was hard to discover, but that it did not make sense in the lifestyle of the country. In short, if opting for remote international testing decreases your chances to experience the country in person, or makes opportunities for field research harder to come by, it increases the risk that "you won't know what you don't know."
Taking the above issues into account, here are some recommendations about the appropriate role of synchronous remote evaluation in international usability work:
- Don't do remote testing as your first international study for a given application or product, and don't do your first remote test as an international one.
- Consider reserving remote methods for metric or summative tests, while doing formative evaluations in person. The rich (and interpretable) qualitative data you want in formative testing are easier to get when you are there in person. Especially if your remote summative test is a follow-up to an earlier, in-person formative evaluation, you may benefit from the personal contacts you made by doing the earlier testing in person. Even if you are not going back to the same test participants, but simply to a local facilitator or to a company where you carried out the earlier study, the pre-existing relationship and their prior experience with your process may help you manage some of the downsides of remote evaluation.
- Consider hybrid approaches in which remote capabilities are used not to eliminate UCD personnel on location with the user, but to extend the team by allowing more stakeholders to observe and participate remotely. At a minimum, we recommend having at least one person on location. This person should not simply be a local coordinator, but a usability expert, preferably with international experience, who is knowledgeable about the design issues and the research protocol and able to take a high level of responsibility for onsite direction of the process, interacting with the user, and interpreting the data.
- In international studies, sometimes no alternative exists to using separate teams working in different countries simultaneously, which introduces serious challenges of coordination among the teams. Synchronous remote methods might help address this difficulty by allowing different teams to see some of each other's sessions, or by allowing the primary coordinator to observe different teams' evaluations.
- Supplement your remote evaluation with other sources of information about the social, physical, technical, and cultural contexts of the country you are testing in, whether this comes from prior in-person formal ethnographic research or from accumulated knowledge from other sources. The point is to not allow the seeming attractiveness of remote testing to reduce the perceived need to spend time with your users in person and in context.
- Be very clear about any technical requirements for the users, and carefully consider any biases this may introduce into your sampling and logistical planning.
- Be sure to carefully pilot your procedure, to minimize unpleasant surprises.
It is our responsibility as a field to develop methods that apply our skills cost-effectively to help our employers and clients design products that meet users' needs effectively. Consumers of UCD services who are presented with a range of options need to make intelligent choices in order to get real value for their investment. We need to help them by carefully weighing the pros and cons of different approaches and using them in appropriate ways. Remote international testing is a good example of a tempting approach that can seem to offer much-desired cost savings, but that needs to be applied in a way that capitalizes on its strengths and avoids or manages its weaknesses.
2. Bartek, V. and Cheatham, D. Experience remote usability testing, Part 1: Examine study results on the benefits and downside of remote usability testing, 2003. Available at www-106.ibm.com/developerworks/library/wa-rmusts1/
3. Bartek, V. and Cheatham, D. Experience remote usability testing, Part 2: Examine the benefits and downside of remote usability testing, 2003. Available at www-106.ibm.com/developerworks/web/library/wa-rmusts2.html
4. Bos, N., Olson, J., Gergle, D., Olson, G., and Wright, Z. Effects of four computer-mediated communications channels on trust development. CHI 2002: Conference on Human Factors and Computing Systems. Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves (Minneapolis, Minnesota, 2002), pp. 135-140.
5. Dray, S. (Organizer), Cohen-Kiel, A., Siegel, D., Sturm, C., Thrift, N., and Wixon, D. Ethnography in organizations: Questions of validity and value. Panel presentation at British Computer Society-Human Computer Interaction annual conference, Bath, UK, 2003.
7. Scholtz, J. Adaptation of traditional usability testing methods for remote testing, HICSS (Hawaii International Conference on System Science), 2000. Available at www.hicss.hawaii.edu/HICSS_34/PDFs/ETNON06.pdf
8. Tullis, T., Fleischman, S., McNulty, M., Cianchette, C., and Bergel, M. An empirical comparison of lab and remote usability testing of Web sites. In Proceedings of Usability Professionals Association Conference, July 2002. Available at http://hci.stanford.edu/cs377/nardi-schiano/AW.Tullis.pdf
9. Waterson, S., Landay, J., and Matthews, T. In the lab and out in the wild: Remote usability testing for mobile devices. Extended Abstracts of ACM CHI Conference on Human Factors in Computing Systems (Minneapolis, MN, 2002).
Susan Dray & David A. Siegel
Dray & Associates, Inc.
2007 Kenwood Parkway
Minneapolis, MN 55405, USA
612-377-1980 fax: 617-377-0363
©2004 ACM 1072-5220/04/0300 $5.00