How Far Can We Go with Synthetic User Experience Research?

Authors:
Jie Li

User experience research (UXR) plays an important role in uncovering user personas, their contexts of use, and their detailed needs. UXR employs a range of research methodologies, such as surveys, interviews, diary studies, and observations, all of which can be time-consuming and costly. With the advent of advanced generative AI (GenAI) tools such as ChatGPT-4 and Gemini, which can produce humanlike responses from multimodal inputs, UX researchers are exploring whether large language models (LLMs), trained on trillions of pieces of text data from a variety of domains accumulated over decades, can effectively simulate user interactions, preferences, needs, and scenarios for product development. UX researchers have also realized, however, that while these vast amounts of text data typically come from the Internet and cover various topics, they are not necessarily user research data. The reliability of AI-facilitated research is now a topic often debated among researchers.

Despite the current hype surrounding GenAI and synthetic UXR, I recently flew 10 hours to a coastal area to conduct an "old-fashioned" field user study, using pen-and-paper observation grids and conducting in-person interviews with real human participants, to test a new product.

Conducting Field User Research: Complaints and Joy

Since the start of my Ph.D. in 2012, I have had only a few chances to conduct field research. To study crowd management strategies and crowd behavior, I spent days at festivals, theme parks, concert halls, and train stations, observing the construction of sites and architecture, as well as the placement of facilities such as fences and one-way gates. A few years ago, I spent a week conducting observational studies on an outside broadcasting truck at the MotoGP races at Silverstone, aiming to understand live sports broadcasting workflows and design new solutions for them. Recently, I conducted an outdoor field study to evaluate a physical product for water activities. I spent hours on the water, observing how users interacted with the product and talking with them about the culture of water activities in that region. One of the reasons for the limited opportunities for field studies is the cost involved. Preparing research materials and immersing oneself in the real environment for extended periods of time is costly in terms of human labor and resources. In addition, conducting field studies often requires the physical presence of study subjects.

In addition to cost, many other challenges can arise during a field study. Weather, in particular, is unpredictable when the study is conducted outdoors. Despite careful planning, unexpected rain can disrupt observations. Sometimes, we are required to conduct observations under unpleasant weather conditions, such as observing how 50,000 people were evacuated from a music festival when a thunderstorm was forecast. Getting quickly adapted to the user environment can also be challenging, as evidenced by my experience with the MotoGP study. As I had to remain silent during observation and refrain from asking the broadcasting team any questions to avoid distractions, I struggled to understand each person's responsibility within the team and capture their jargon-heavy conversations and workflows during the extremely fast-paced live broadcasting sessions. Recruitment is often challenging as well. Recruiting participants through our convenient but relatively narrow network may lead to a nonrepresentative sample and bias in study results. Alternatively, using professional recruitment agencies can be costly, with fees often reaching hundreds of euros per participant.

Despite these challenges, I enjoy conducting field studies. The authentic connections with participants, the observations, and the shared laughter not only help us study the product or user behavior but also contribute to a deeper contextual understanding of, and reflection on, the collected data. Moreover, conducting challenging field research can foster closer bonds within the research team, turning each project into not only a research endeavor but also a valuable team-building opportunity.

Let's get back to the question I asked in the title: How far can we go with synthetic UXR? It sounds really fancy when we hear advertisements from some synthetic user research platforms that "we don't need users, recruitment, scheduling, synthesizing, and high costs," or comments like "soon we will have AI-generated researchers conduct research on AI-generated users." As a researcher, I often ask: How much do we want AI to be involved in our research process—as an obedient tool, as an assistive companion, or even as a collaborator? Do we really want the joy of connecting with real human users, stepping into their shoes, and designing solutions based on an in-depth understanding of their needs and contexts of use to be replaced by AI?

What Experts Say About Synthetic User Research

I asked these questions to a number of researchers and professors in the field of human-computer interaction and UXR, including Cher Lowies, Sara Bouzit, Ye Dong, Hamada Rizk, Hiroshi Ishii, and Kai Kunze.

Lowies expressed skepticism about the effectiveness of synthetic user research. She notes that, based on her experience with AI user research platforms such as Synthetic Users (https://www.syntheticusers.com), a significant 70 percent of the AI-generated data does not delve deep enough into nuanced user experiences compared with research studies conducted by human researchers, leading to generalized themes and averaged results that might not accurately represent the complexity and diversity of real user interactions. Similarly, Dong emphasized the "human touch (人味)" [1] in data sources. She pointed out that synthetic user data often provides only summary answers, leaving out context-rich, unexpected responses linked to the personal and emotional experiences of unique users. Dong criticizes synthetic user research for being overly reliant on average, past, or sample data and cautions against using it to predict the future. As Ishii commented, "There is no such thing as an average user."

Taking a different point of view, Rizk and Bouzit both believe in the added value of using AI-generated synthetic user data to increase the volume of minority data and reduce bias in training data. For instance, in the medical domain, a shortage of annotated data poses a significant limitation in medical image processing, particularly for diagnosing rare diseases and ensuring data diversity that includes various ethnic groups. To address this data shortage and potential bias, synthetic user datasets are often generated to provide an effective amount of diverse training data for medical applications (e.g., [2]). In other words, AI-generated synthetic data could reveal overlooked nuances in user studies by simulating user groups that are often not represented (e.g., users with color vision deficiencies in HCI studies). This approach helps mitigate bias when real human data resources are lacking [3].

Kunze, on the other hand, spoke to the limitations of using synthetic UXR in innovative technology fields where established user data is scarce. Synthetic UXR based on established user data may provide predictions within the range of existing data, but may fall short of capturing the full complexity and unpredictability of real human behavior and making predictions beyond the known data range. As I commented in my previous column, artificial intelligence interpolates, while we humans extrapolate [4]. While my colleagues and I are convinced that the empathy of human researchers is a unique skill that cannot be easily replaced by AI [5], Kunze makes a critical distinction between empathy and sympathy in user research. He argues that while it's vital to understand users, being too empathetic with them may hinder our ability to make objective evaluations and decisions. Thus, he questions whether sympathy is a better quality for researchers to cultivate in order to maintain a rational distance and better inspect user needs. This then raises the question of how effectively AI-facilitated research can simulate the empathy and sympathy of human researchers. Instead of creating synthetic users, AI tools could potentially play a crucial role in helping researchers and developers maintain a balanced approach to empathy in their work.

What GenAI Can and Cannot Do

A survey published by User Interviews showed that 77.1 percent of the researchers in its sample are using AI in at least some of their work, with thematic coding being the most popular analysis use case for AI [6]. When I asked my peer researchers what GenAI can and cannot do, all of them mentioned that AI can help them overcome the fear of a blank canvas by providing a starting point to develop from. This includes generating a list of questions for the interview guide as inspiration or offering a template for building an observation grid. They expressed concerns, however, about accidentally leaking sensitive data while using AI tools such as ChatGPT for thematic analysis, as well as concerns about the accuracy of the analysis.

Reflecting on the typical methods in the four main stages of UXR [7] and considering the comments from HCI and UXR experts, I have summarized in Figure 1 how GenAI, or synthetic UXR, can assist in accelerating our research process and enhance the quality of research. The figure also highlights areas where human researchers are essential for validation and making final decisions.

Figure 1. The author's reflections on what GenAI can and cannot do across the four stages of the UX research cycle: Discover, Explore, Test, and Listen [7].

During the Discover stage, GenAI can provide templates, offer cultural insights, and analyze existing research to identify gaps in knowledge, but it cannot fully replace the need for in-depth fieldwork or grasp subtle cultural dynamics without human interpretation. In the Explore phase, it is useful for transcribing and coding audio data, assisting with user pattern identification, and suggesting additional variables during data collection. GenAI falls short, however, in capturing the complete emotional journey of users or providing the context-rich qualitative insights that stem from direct user interaction. In the Test phase, GenAI can standardize usability and accessibility assessments, automate video captioning of user actions, and cross-reference findings to suggest best practices. Its limitations here include an inability to provide nuanced critiques like human testers, detect subtle emotional reactions, or account for unstated emotional cues. In the Listen phase, GenAI excels at offering survey templates, simulating survey answers for pilot testing, identifying patterns in usability issues, and automating feedback categorization. Yet it cannot replicate the human experience or strategic decision making derived from deep, human-led analysis of feedback. To conclude, while GenAI may be a helpful tool for augmenting UXR processes, it does not replace the nuanced understanding, empathy, and contextual awareness that human researchers contribute.

The Joy and Meaning of Doing Research

Another important aspect we shouldn't allow GenAI to deprive us of is the joy and meaning we find in doing our work. In my recent research study about UX professionals' perceptions of GenAI, enjoyment and meaning were two recurring keywords in our conversations [5]. The meaning of our work is not just about work per se but also the pleasure derived from it. A recent study found that the user experience of early users of ChatGPT was affected not just by pragmatic attributes but also by hedonic attributes such as entertainment and creative interactions that impressed or surprised the users [8]. Enjoyment in work can be viewed through the lens of motivation theory, including both extrinsic and intrinsic motivations [9]. Extrinsic motivations revolve around whether the technology efficiently serves as a means to accomplish tasks, treating technology such as GenAI as a tool. Intrinsic motivations center on whether we find our work inherently rewarding and the engagement with technology enjoyable, considering technology such as GenAI as a toy. Our work is driven not just by the end results; we also seek fulfillment and enjoyment throughout the process. The joy of connecting with real users is irreplaceable, and the joy of engaging in thematic analysis with the team, as well as smiling at participants' funny quotes, is irreplaceable.

So, how far can we go with synthetic UXR? GenAI is inevitably going to affect how we conduct user research. There's a potential future scenario for human-AI collaboration where we'll need to start considering AI as both a stakeholder and a user in our UXR. In this scenario, AI is not just a basic obedient tool but rather a collaborator that can cross-reference findings, suggest overlooked factors, and identify best practices using existing databases that are difficult for human researchers to comprehend quickly [3]. However, I still believe that no matter how advanced AI becomes in facilitating research, and regardless of how well it can simulate real users or how extensively it's been programmed for empathy or sympathy, we must ensure that real data from real users is not overshadowed by AI-generated insights and that it continues to play the major role in depicting human experiences. Relying solely on synthetic user research data to guide strategic product development and predict the future is risky.

References

1. 人味 (pronounced "rénwèi" in Mandarin) is a Chinese concept that translates literally to "human flavor" or "human taste." In this context, it is better understood as referring to the human touch, or the qualities that convey human warmth, personality, and authenticity.

2. Bauer, D.F. et al. Generation of annotated multimodal ground truth datasets for abdominal medical image registration. International Journal of Computer Assisted Radiology and Surgery 16 (2021), 1277–1285.

3. Schmidt, A., Elagroudy, P., Draxler, F., Kreuter, F., and Welsch, R. Simulating the human in HCD with ChatGPT: Redesigning interaction design with AI. Interactions 31, 1 (2024), 24–31.

4. Li, J. Experimentation everywhere and every day: Running A/B testing in corporate environments. Interactions 31, 1 (2024), 20–22.

5. Li, J., Cao, H., Lin, L., Hou, Y., Zhu, R., and El Ali, A. User experience design professionals' perceptions of generative artificial intelligence. Proc. of the CHI Conference on Human Factors in Computing Systems. ACM, New York, 2024; https://doi.org/10.1145/3613904.3642114

6. Burnam, L. We surveyed 1093 researchers about how they use AI—here's what we learned. User Interviews. Sep. 15, 2023; https://www.userinterviews.com/ai-in-ux-research-report

7. Farrell, S. UX research cheat sheet. Nielsen Norman Group. Feb. 12, 2017; https://www.nngroup.com/articles/ux-research-cheat-sheet/

8. Skjuve, M., Følstad, A., and Brandtzaeg, P.B. The user experience of ChatGPT: Findings from a questionnaire study of early users. Proc. of the 5th International Conference on Conversational User Interfaces. ACM, New York, 2023, 1–10.

9. Isen, A.M. and Reeve, J. The influence of positive affect on intrinsic and extrinsic motivation: Facilitating enjoyment of play, responsible work behavior, and self-control. Motivation and Emotion 29 (2005), 295–323.

Author

Jie Li is an HCI researcher with a background in industrial design engineering. Her research focuses on developing evaluation metrics for immersive experiences. She is head of research and insights at EPAM Netherlands as well as a creative cake designer and the owner of Cake Researcher, a boutique cafe.

ACM Interactions
