The precise, quantitative measurement of user experience (UX) based on one or more metrics is invaluable for design, research, and product teams in assessing the impact of UX designs and identifying opportunities for improvement. Yet these teams often employ supposed UX metrics like conversion rate (CR) and average order value (AOV), which can't provide that measurement [1]. In fact, I believe this can be extended to an even more general statement: In themselves, none of the metrics that are usually readily and easily available from Web analytics data can reliably measure UX. I understand that this is frustrating news to many, since resources are always limited, attention spans short, and Web analytics so very, very convenient.
Whenever I discuss this, I encounter objections like, "But we have to do something," or "It's easy to just state what one shouldn't do, but that doesn't help much." And while it's a perfectly fine start to know what not to do (cf. Nassim Nicholas Taleb's The Black Swan), in the case of UX we need not despair, because there are ways to reliably measure it, albeit ones that are not as simple as pulling some number out of Google Analytics (as nice as that would be).
A simple outcome of measuring UX could be, "The last release improved checkout UX from 75/100 to 80/100," but there could also be more nuanced measurements for different aspects of UX (e.g., usability, aesthetics, joy of use) and for different user groups. Before diving deeper into how we can do this, let's first get familiar with three concepts:
- Latent variables (like UX) "are variables that are not directly observed but are rather inferred through a mathematical model from other variables that are observed" [2]. Take, for example, the Big Five personality assessment. You can't just ask someone, "What's your personality?" and expect an objective answer. Rather, you have to ask a set of specifically designed questions and then infer personality from those.
- A research instrument is a set of such specifically designed questions, often in the form of a questionnaire. Through an instrument, we can collect the observable variables that help us infer the latent variable we're after.
- We're dealing with composite indicators when we combine individual variables from an instrument into a single metric.
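To make the last two concepts concrete, here is a minimal Python sketch of a composite indicator. It assumes a hypothetical four-item questionnaire answered on a 1 to 7 scale (the item names are made up), combines the items with an unweighted mean, and then rescales the result to 0 to 100 so it reads like the "75/100" in the example above. Real instruments such as AttrakDiff, UEQ, or meCUE prescribe their own items, scales, and scoring rules, which should be used instead of this simplification.

```python
from statistics import mean

# Hypothetical questionnaire: every item is answered on a 1-7 scale.
# Item names are invented for illustration; real instruments define
# their own items, scales, and scoring rules.
SCALE_MIN, SCALE_MAX = 1, 7


def composite_score(responses: dict[str, int]) -> float:
    """Combine one respondent's item answers into a single composite indicator
    (here simply the unweighted mean of all items)."""
    if not responses:
        raise ValueError("no item responses given")
    for item, value in responses.items():
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{item}: {value} is outside {SCALE_MIN}-{SCALE_MAX}")
    return mean(responses.values())


def to_0_100(score: float) -> float:
    """Linearly rescale a score from [SCALE_MIN, SCALE_MAX] to [0, 100]."""
    return 100 * (score - SCALE_MIN) / (SCALE_MAX - SCALE_MIN)


# One hypothetical respondent after using a checkout flow.
answers = {"easy_to_use": 6, "clear": 5, "pleasant": 6, "efficient": 5}
score = composite_score(answers)
print(score, round(to_0_100(score), 1))
```

Here the four answers average to 5.5, which maps to 75 on the 0 to 100 scale. Averaging over many respondents, per release or per user group, would then give the kind of before/after comparison described above.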
Additionally, it's necessary to understand that the user's experience is not a property of a digital product or user interface. An app doesn't have a UX. Rather, the experience "happens" in the individual user's head as a reaction to their interaction with a digital product. Hence, the only way to directly observe UX would be to look into the user's head using an EEG or a similar technique, and even if you had the means, that would be pretty complicated. Short of that, the next best alternative is to ask users about it. In contrast, Web analytics data (and metrics like CR and AOV) are relatively "far away" from what happens in the user's head. They're collected directly on a website or in an app, aggregated over many users (even though there is no such thing as an "average user"), and influenced by myriad other non-UX factors.
So, do we always have to ask users to fill out a questionnaire if we want to reliably measure UX? And does that mean we can't make use of analytics data at all? The answer to both is "not necessarily," but first things first.
Plenty of work has already gone into instruments for measuring UX, and this remains an ongoing topic in the human-computer interaction research community (which strongly suggests this is not a trivial matter). Three examples of scientifically well-founded instruments are AttrakDiff (http://attrakdiff.de/index-en.html), UEQ (https://www.ueq-online.org), and meCUE (http://mecue.de/english/home.html), with the first two currently better known and more widespread. The first research article on AttrakDiff was published as early as 2003. All three also define composite indicators for different aspects of UX. For instance, AttrakDiff provides one composite indicator each for hedonic and pragmatic quality.
So, there exist proper instruments for reliably measuring UX, and you can always have users answer one of those in a controlled study or by means of a live intercept survey. But how might analytics data come into the picture? As so often, the answer lies in statistics. To use analytics metrics for approximating UX, you first have to determine how strongly individual metrics correlate with, and predict, actual UX as measured by such an instrument; otherwise you're simply guessing. For instance, if you have a sufficient number of sessions for which you both record user behavior and collect answers to one of the questionnaires, you might be able to use them as training data for machine-learning models that predict individual items, composite indicators, or both, for different user segments. Sounds easy enough, right? Collect some analytics data, collect some answers to a questionnaire, put it all in a Jupyter Notebook, and you're good to go?
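The two steps described above, first checking correlations and only then training a predictor, might look roughly like the following sketch. Everything in it is an assumption for illustration: the three metric names are hypothetical, the data are randomly generated rather than collected, and ordinary least-squares linear regression (via NumPy) stands in for whatever machine-learning model one would actually use. In practice, you would fit and evaluate one such model per user segment.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# --- Synthetic stand-in data (NOT real measurements) ---------------------
# Each row is one session for which we have both analytics metrics and a
# questionnaire-based composite UX score on a 0-100 scale.
n_sessions = 300
metric_names = ["scroll_depth", "time_on_page_s", "cursor_distance_px"]
X = np.column_stack([
    rng.uniform(0.0, 1.0, n_sessions),       # fraction of the page scrolled
    rng.gamma(2.0, 30.0, n_sessions),        # seconds spent on the page
    rng.normal(5000.0, 1500.0, n_sessions),  # pixels the cursor travelled
])
# Made-up ground truth: UX depends on the first two metrics plus noise.
ux = 60 + 20 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(0.0, 8.0, n_sessions)

# --- Step 1: how strongly does each metric correlate with the UX score? --
for name, column in zip(metric_names, X.T):
    r = np.corrcoef(column, ux)[0, 1]
    print(f"{name:>20}: r = {r:+.2f}")

# --- Step 2: train a predictor on part of the data, test on the rest -----
idx = rng.permutation(n_sessions)
train, test = idx[:200], idx[200:]


def with_intercept(m: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the regression can fit an intercept."""
    return np.column_stack([np.ones(len(m)), m])


coef, *_ = np.linalg.lstsq(with_intercept(X[train]), ux[train], rcond=None)
pred = with_intercept(X[test]) @ coef
rmse = np.sqrt(np.mean((pred - ux[test]) ** 2))
# Baseline: always predict the mean UX score of the training sessions.
baseline = np.sqrt(np.mean((ux[train].mean() - ux[test]) ** 2))
print(f"test RMSE: {rmse:.1f} (predicting the training mean: {baseline:.1f})")
```

Comparing against the trivial predictor that always outputs the training mean is a cheap sanity check: a model that cannot beat it has learned nothing useful about UX from the metrics.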
Incidentally, that's pretty much what I investigated in my Ph.D. thesis, only I did it for usability (using an instrument named Inuit [4]) and interaction data like mouse-cursor movements and scrolling behavior. But the basic principle was the same. Based on this, I have good news and bad news for you. The good news: It worked. I was able to train models that could predict the usability of a Web interface reasonably well. The bad news: It was not all that easy.
First, I didn't emphasize "sufficient number" above for no reason. It turns out it's really difficult to collect enough training data when people must fill out questionnaires, even with an instrument like Inuit, which has only seven items. UEQ, for instance, has 26! Second, models based on interaction data seem to be very sensitive to interface changes. My results suggested that, at best, one could apply models trained on one website within a narrow cluster of very similarly structured websites, but not beyond, and that one would also need different models per user type. Colleagues and I have, however, already started looking into these problems [5].
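To get a feeling for why "sufficient" is the hard part, a standard textbook approximation based on the Fisher z-transformation (not taken from the thesis) estimates how many paired observations are needed merely to detect a correlation of a given size between one analytics metric and a questionnaire score, using a two-sided test at α = 0.05 with 80% power. Detecting a single correlation is a much easier goal than training and validating a predictive model, possibly one per user type, so these numbers are best read as optimistic.

```python
import math
from statistics import NormalDist


def sessions_needed(r: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of sessions (each with a completed questionnaire)
    needed to detect a Pearson correlation r with a two-sided test, using the
    Fisher z-approximation n = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


for r in (0.5, 0.3, 0.1):
    print(f"r = {r}: about {sessions_needed(r)} questionnaire responses")
```

For r = 0.3 this gives 85 responses, and for a weak correlation of r = 0.1 it gives 783, where every single response means a user answering all items of the instrument.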
In conclusion, because UX is so elusive, there's no way around using a scientifically well-founded instrument if you want to measure it properly. Although often applied as supposed "UX measures" in industry, off-the-shelf analytics data are in themselves not suited for this. Simply put, this is because 1) there's a significant distance between the interface where they are collected and the user's head, where UX "happens," and 2) they are influenced by too many non-UX factors to lend themselves to meaningful manual analysis. Yet, under certain conditions, it is possible to predict UX from analytics data if we combine them with answers to a proper UX instrument and use all of that to train, for example, regression or machine-learning models. In the latter case, you can use methods like SHAP values to find out how each analytics metric affects a model's UX prediction. And if there are just a few strong predictors, it might even be possible to take a step back and work with simple equal-weight models, as Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein describe in their book Noise.
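The last two ideas can be sketched together in Python, again on randomly generated stand-in data with hypothetical metrics. For a linear model, and assuming independent features, exact SHAP values reduce to each coefficient times the feature's deviation from its background mean, so the sketch computes them directly with NumPy instead of calling the shap library (which one would reach for with non-linear models). The mean absolute SHAP value then ranks the metrics, and an equal-weight model built from only the strongest ones (z-scored, oriented by the sign of their correlation with UX, and averaged) is compared with the fitted regression on held-out sessions. Keeping exactly two metrics is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Synthetic stand-in data (NOT real measurements): three hypothetical
# analytics metrics, of which only the first two actually drive the
# made-up 0-100 UX score.
n = 400
X = rng.normal(size=(n, 3))
ux = 70 + 6 * X[:, 0] - 4 * X[:, 1] + rng.normal(0.0, 5.0, n)
train, test = np.arange(300), np.arange(300, n)

# Fit an ordinary least-squares model: intercept + one weight per metric.
A = np.column_stack([np.ones(len(train)), X[train]])
coef, *_ = np.linalg.lstsq(A, ux[train], rcond=None)
intercept, weights = coef[0], coef[1:]


def linear_shap(x: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Exact SHAP values of the linear model, assuming independent features:
    phi_j = w_j * (x_j - mean of metric j in the background data).
    Works for a single session (1-D) or many sessions (2-D)."""
    return weights * (x - background.mean(axis=0))


# Local explanation for one held-out session.
x0 = X[test][0]
phi = linear_shap(x0, X[train])
prediction = intercept + x0 @ weights
base_value = intercept + X[train].mean(axis=0) @ weights
# The per-metric contributions add up to prediction minus base value.
assert np.isclose(phi.sum(), prediction - base_value)
print("contribution of each metric:", np.round(phi, 2))

# Global importance: mean absolute SHAP value over the training sessions.
importance = np.abs(linear_shap(X[train], X[train])).mean(axis=0)
strong = np.argsort(importance)[::-1][:2]  # keep the two strongest metrics


def equal_weight_scores(X_fit: np.ndarray, y_fit: np.ndarray, X_new: np.ndarray) -> np.ndarray:
    """Equal-weight model: z-score each metric with the fitting data's statistics,
    flip it if it correlates negatively with y there, and average the results."""
    mu, sd = X_fit.mean(axis=0), X_fit.std(axis=0)
    signs = np.sign([np.corrcoef(col, y_fit)[0, 1] for col in X_fit.T])
    return ((X_new - mu) / sd) @ signs / X_fit.shape[1]


ols_pred = intercept + X[test] @ weights
ew_pred = equal_weight_scores(X[train][:, strong], ux[train], X[test][:, strong])
r_ols = np.corrcoef(ols_pred, ux[test])[0, 1]
r_ew = np.corrcoef(ew_pred, ux[test])[0, 1]
print(f"held-out correlation with UX: regression {r_ols:.2f}, equal-weight {r_ew:.2f}")
```

Because the equal-weight model's output is on a z-score scale rather than the 0 to 100 UX scale, the sketch compares the two models by their correlation with the held-out UX scores rather than by prediction error.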
What could be an ultimate solution to the practical problem of measuring UX? Probably some kind of universal model where anyone can put in their analytics data, user segments, and parameters of their digital product to get instant UX predictions. Does such a model exist? Absolutely not. Might it exist in the future? I can't say for sure, but I'll keep looking.
1. Speicher, M. Conversion rate & average order value are not UX metrics. UX Collective (Jan. 2022); https://medium.com/user-experience-design-1/conversion-rate-average-order-value-are-not-ux-metrics-9d6e7e40e286
2. Wikipedia contributors. Latent variable. Wikipedia; https://en.wikipedia.org/w/index.php?title=Latent_variable&oldid=1090312176
4. Speicher, M., Both, A., and Gaedke, M. Inuit: The interface usability instrument. In Design, User Experience, and Usability: Design Discourse. Springer, Cham, 2015, 256–268; https://link.springer.com/chapter/10.1007/978-3-319-20886-2_25
5. Bakaev, M., Speicher, M., Heil, S., and Gaedke, M. I don't have that much data! Reusing user behavior models for websites from different domains. In International Conference on Web Engineering. Springer, Cham, 2020, 146–162; https://link.springer.com/chapter/10.1007/978-3-030-50578-3_11
Maximilian Speicher is a computer scientist, designer, researcher, and ring tennis player. Currently, he is director of product design at BestSecret and cofounder of UX consulting firm Jagow Speicher. His research interests lie primarily with novel ways to do digital design, usability evaluation, augmented and virtual reality, and sustainable design.
Copyright held by author
The Digital Library is published by the Association for Computing Machinery. Copyright © 2023 ACM, Inc.