Seven heuristics for identifying proper UX instruments and metrics


Authors: Maximilian Speicher
Posted: Tue, September 19, 2023 - 11:09:00

In the two previous articles of this series, we first learned that metrics such as conversion rate, average order value, or Net Promoter Score are not suitable for reliably measuring user experience (UX) [1]. The second article then explained that UX is a latent variable and that we must therefore rely on research instruments and corresponding composite indicators (which produce a metric) to measure it [2]. The logical next question is how we can identify those instruments and metrics that do reliably measure UX. This boils down to what is called construct validity and reliability, which this final article briefly introduces before deriving easily applicable heuristics for practitioners and researchers alike who don’t know which UX instrument or metric to choose.

Construct validity refers to the extent to which a test measures what it is supposed to measure [3]. In the case of UX, this means that the instrument or metric should measure the concept of UX as it is understood in the research literature, and not, for example, only usability. One good way to establish construct validity is through factor analysis [3].
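To make this more tangible, here is a minimal sketch of how one could run an exploratory factor analysis in Python. It assumes responses to a hypothetical six-item questionnaire stored in a CSV file and uses the third-party factor_analyzer package; the file name, item names, and two-factor structure are illustrative, not taken from any specific instrument.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical questionnaire data: one row per participant,
# one column per item (q1..q6), values on a 1-7 scale.
responses = pd.read_csv("questionnaire_responses.csv")

# Fit a two-factor model (e.g., expecting a "pragmatic" and a "hedonic" factor).
fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(responses)

# Items should load strongly on the factor they were designed for and weakly
# on the other one; unexpected cross-loadings hint at construct validity problems.
loadings = pd.DataFrame(fa.loadings_, index=responses.columns,
                        columns=["factor_1", "factor_2"])
print(loadings.round(2))
```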

Construct reliability refers to the consistency of a test or measure [4]. Put differently, it is a measure of how reproducible the results of an instrument or metric are. A good way to establish construct reliability is through studies that assess the test-retest reliability of the instrument or metric, as well as its internal consistency, such as Cronbach’s alpha [4].
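Cronbach’s alpha, in particular, is straightforward to compute from raw item responses. The following is a minimal sketch based on the standard formula, using a hypothetical DataFrame of questionnaire responses (one row per participant, one column per item); the example data are made up.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a DataFrame with one column per item."""
    k = items.shape[1]                        # number of items
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses to a five-item scale (1-7 ratings).
responses = pd.DataFrame({
    "q1": [5, 6, 4, 7, 5], "q2": [5, 7, 4, 6, 5],
    "q3": [4, 6, 3, 7, 4], "q4": [6, 6, 4, 7, 5],
    "q5": [5, 7, 4, 6, 6],
})
print(round(cronbach_alpha(responses), 2))  # values >= 0.7 are commonly deemed acceptable
```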

In addition, the Joint Research Centre of the European Commission (JRC) provides a “Handbook on Constructing Composite Indicators” [5], which summarizes the proper process in a 10-step checklist. We build on all of the above for the following list of seven heuristics for identifying proper UX instruments and metrics.

Heuristic 1: Is there a paper about it? If there is no paper about the instrument and/or metric in question, there’s barely a chance you’ll be able to answer any of the following questions with yes. So, this should be the first thing to look for. A peer-reviewed paper published in a scientific journal or conference would be the best case, but there should be at the very least some kind of white paper available.

Heuristic 2: Is there a sound theoretical basis? In the case of UX, this means, does the provider of the instrument and/or metric clearly explain their understanding of UX and, therefore, what their construct actually measures? The JRC states: “What is badly defined is likely to be badly measured” [5].

Heuristic 3: Is the choice of items explained in detail? Why were these specific variables of the instrument chosen, and not others? And how do they relate to the theoretical framework, that is, the understanding of UX? The JRC states: “The strengths and weaknesses of composite indicators largely derive from the quality of the underlying variables” [5].

Heuristic 4: Is an evaluation of construct validity reported? This could be reported in terms of, for example, a confirmatory factor analysis [3]. If not, you can’t be sure whether the instrument or metric actually measures what it’s supposed to measure.

Heuristic 5: Is an evaluation of construct reliability reported? This could be reported in terms of, for example, Cronbach’s alpha [4]. If not, you can’t be sure whether the measurements you obtain are proper and reproducible approximations of the actual UX you want to measure.

Heuristic 6: Is the data that’s combined to form the metric properly normalized? This is necessary if the items in an instrument have different units of measurement. The JRC states: “Avoid adding up apples and oranges” [5].
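As a concrete, purely hypothetical illustration: if a metric combines a 1–7 satisfaction rating with a task time measured in seconds, the raw values live on completely different scales and must first be brought onto a common one, for instance via min–max normalization or z-scores. A minimal sketch:

```python
import pandas as pd

# Hypothetical study data: a 1-7 satisfaction rating and task time in seconds.
data = pd.DataFrame({
    "satisfaction": [6, 5, 7, 4, 6],
    "task_time_s":  [42, 95, 30, 120, 55],
})

# Min-max normalization maps every item to the range [0, 1].
normalized = (data - data.min()) / (data.max() - data.min())

# Task time is a "lower is better" item, so invert it before combining.
normalized["task_time_s"] = 1 - normalized["task_time_s"]
print(normalized.round(2))
```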

Heuristic 7: Is the weighting of the different factors that form the metric explained? Factors should be weighted according to their importance. “Combining variables with a high degree of correlation” (double counting) should be avoided [5].
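Continuing the hypothetical sketch from above: the weights used to combine the normalized items should be made explicit and justified, and highly correlated items should be checked for double counting, for example via a simple correlation matrix. The item names and weights here are again purely illustrative.

```python
# Explicit, documented weights (these must sum to 1 and should be justified
# by the theoretical framework, not chosen arbitrarily).
weights = {"satisfaction": 0.6, "task_time_s": 0.4}

# Check for highly correlated items before combining them: a correlation close
# to 1 between two items means the underlying aspect would be counted twice.
print(data.corr().round(2))

# Weighted composite score per participant.
composite = sum(normalized[item] * w for item, w in weights.items())
print(composite.round(2))
```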

In the following, we demonstrate the application of these heuristics in two very brief case studies.

Case Study 1: UEQ
The User Experience Questionnaire (UEQ) is a popular UX instrument developed at SAP AG.

  • H1: There is a peer-reviewed research paper about UEQ, which is available at [6]. ✓
  • H2: The paper clearly defines the authors’ understanding of UX, and they elaborate on the theoretical background. ✓
  • H3: The paper explains the selection of the item pool and how it relates to the theoretical background. ✓
  • H4: The paper describes, in detail, two studies in which the validity of UEQ was investigated. ✓
  • H5: The paper reports Cronbach’s alpha for all subscales of the instrument. ✓
  • H6: Not applicable, since UEQ doesn’t explicitly define a composite indicator. However, a composite indicator can be constructed from the instrument (see the sketch after this list).
  • H7: See H6.
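Regarding H6 and H7, here is one simple, purely illustrative way such a composite could be constructed from the six UEQ scales, assuming scale means on the usual −3 to +3 range. This is not an official UEQ metric; it merely shows that the already normalized scales can be aggregated with explicit (here: equal) weights.

```python
import pandas as pd

# Hypothetical UEQ scale means per participant, on the usual -3..+3 range.
scale_means = pd.DataFrame({
    "attractiveness": [1.8, 2.1, 0.9],
    "perspicuity":    [2.0, 1.5, 1.2],
    "efficiency":     [1.6, 1.9, 0.8],
    "dependability":  [1.4, 1.7, 1.0],
    "stimulation":    [1.2, 2.0, 0.5],
    "novelty":        [0.9, 1.4, 0.3],
})

# Equal weighting of all six scales; no further normalization is needed here
# because all scales already share the same -3..+3 range.
composite = scale_means.mean(axis=1)
print(composite.round(2))
```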

Case Study 2: QX score “for measuring user experience”
This metric was developed by SaaS provider UserZoom and is now provided by UserTesting. It is a composite of two parts: 1) the widely used SUPR-Q instrument and 2) the individual task success rates from the user study where the metric was measured, in a 50/50 proportion.

  • H1: There is no research paper, but at least a blog post explaining the instrument and metric. ✓
  • H2: There is no clear definition of UX given. The theoretical basis for the metric is the assumption that all existing UX metrics use either only behavioral or only attitudinal data. There is no well-founded explanation given why this is considered problematic. The implicit reasoning is that only by mixing behavioral and attitudinal data can we properly measure UX, which is factually incorrect (cf. [2]). ❌
  • H3: The metric mixes attitudinal (SUPR-Q) and behavioral (task success) items, but no well-founded reasoning is given as to why only task success rate was chosen, or why this would improve SUPR-Q, which is already a valid and reliable UX instrument in itself. ❌
  • H4: No evaluation of construct validity is reported. ❌
  • H5: No evaluation of construct reliability is reported. ❌
  • H6: No approach to data normalization is reported; the metric seemingly adds up apples and oranges. ❌
  • H7: There is no reasoning given for the weighting of the attitudinal and behavioral items. ❌

In conclusion, the seven heuristics provided in this article serve as a useful guide for identifying proper UX instruments and metrics. By considering construct validity and reliability, as well as the JRC’s 10-step checklist, practitioners and researchers alike can make informed decisions when choosing a UX instrument or metric. Not every instrument or metric will pass all of these heuristics, but the more of them that are met, the more confident one can be that the chosen instrument or metric properly measures UX. If in doubt, choose the one that checks more boxes. Note that some heuristics, like H1, are not strictly necessary, and H6 and H7 apply only to composite indicators; there may also be valid instruments or metrics that fail some of these heuristics, and invalid ones that pass some. The heuristics are intended as a quick and robust framework for evaluating UX instruments and metrics, but ultimately, the best approach will depend on the specific research or design project.

Endnotes
1. Speicher, M. Conversion rate & average order value are not UX metrics. UX Collective. Jan. 2022; https://uxdesign.cc/conversion...
2. Speicher, M. So, How Can We Measure UX? Interactions 30, 1 (2023), 6–7; https://doi.org/10.1145/357096...
3. Kline, R.B. Principles and Practice of Structural Equation Modeling. Guilford Publications, 2015.
4. Cronbach, L.J. and Meehl, P.E. Construct validity in psychological tests. Psychological Bulletin 52, 4 (1955), 281.
5. Joint Research Centre of the European Commission. Handbook on Constructing Composite Indicators: Methodology and User Guide. OECD publishing, 2008; https://www.oecd.org/sdd/42495...
6. Laugwitz, B., Held, T., and Schrepp, M. Construction and evaluation of a user experience questionnaire. Symposium of the Austrian HCI and Usability Engineering Group. Springer, Berlin, Heidelberg, 2008, 63–76.



Maximilian Speicher

Maximilian Speicher is a computer scientist, designer, researcher, and ringtennis player. Currently, he is director of product design at BestSecret and cofounder of UX consulting firm Jagow Speicher. His research interests lie primarily with novel ways to do digital design, usability evaluation, augmented and virtual reality, and sustainable design. [email protected]