Card sorting is a knowledge-elicitation technique often used by information architects, interaction designers, and usability professionals to establish or assess the navigation hierarchy of a Web site. The items are typically menu entries or hyperlinks, while the groups are categories or headings. The process involves asking participants to sort items into meaningful groups. In open card sorts, the number and names of the groups are decided by each participant; in closed card sorts, they are fixed by the researcher in advance.
Analysis of card-sorting results ranges from simple counting of the number of times items were grouped together to the rather intimidating "monothetic agglomerative cluster analysis" (known simply as cluster analysis in most cases). Unfortunately, no single technique provides everything a researcher needs to know, especially if convincing evidence is needed to persuade colleagues or customers of the effectiveness of a proposed design. The evidence we need falls into three categories:
Participants. Are these the right participants for our site? Are they all thinking about the items and their groupings in a similar way? Do they have a clear understanding of the card-sorting task itself?
Items. Are the item names well understood by participants? Are there alternatives that should be considered, perhaps terms users are more familiar with?
Groups. For closed card sorts, have we chosen the right number of groups and names for each? For open sorts, are participants largely in agreement about the number of groups needed? How well do participants feel the items fit into their groups?
Happily, the answer to this last question (how well participants feel the items fit into their groups) can also help us with many of the other issues listed. Coupled with a few data-collection guidelines and alternative presentations of results, we can collect fairly comprehensive evidence about what will and will not work in our navigation hierarchies.
So let's examine this last question in some more detail: How well do participants feel the items fit into their groups? It is possible to argue that this question is redundant, that the items must fit into their groups relatively well in any given set of results, because that is how the participant decided to group them. However, practical experience says otherwise. Consider the following example that I use as a practice sorting exercise when teaching: Participants are given the names of 14 wines and asked to sort them into three groups (full-bodied reds, dry whites, and sparkling). Participants are instructed to omit any items they feel do not really belong to any of the groups. The cluster analysis dendrogram shown in figure 1 is a fairly typical set of results for 12 participants.
The dendrogram shows the three groups, connected in the characteristic tree-like structure that gives this form of presentation its name. The vertical connections between branches indicate the strength of the relationship between items, with stronger relationships to the right and weaker to the left. So, for example, the relationship between Riesling and White Zinfandel is the strongest in this dendrogram, meaning that those two items appeared in the same group more frequently than any other pair of items. The relationship between Beaujolais and Claret is only slightly less strong, while the weakest relationship between any single item and the rest of its group belongs to Pinot Grigio.
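Under the hood, a cluster analysis of card-sort data starts from how often each pair of items was grouped together. The sketch below (with illustrative wine names and made-up sorts, not the article's raw data) counts co-occurrences, converts them to distances, and runs SciPy's agglomerative clustering, from which a dendrogram like figure 1 can be drawn:

```python
# Sketch: agglomerative cluster analysis of card-sort results.
# The sorts below are invented for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

items = ["Claret", "Beaujolais", "Riesling", "White Zinfandel", "Cava"]
index = {name: i for i, name in enumerate(items)}

# Each participant's sort: a list of groups, each group a set of items.
sorts = [
    [{"Claret", "Beaujolais"}, {"Riesling", "White Zinfandel"}, {"Cava"}],
    [{"Claret", "Beaujolais"}, {"Riesling", "White Zinfandel", "Cava"}],
]

# Count how often each pair of items appeared in the same group.
n = len(items)
counts = np.zeros((n, n))
for sort in sorts:
    for group in sort:
        for a in group:
            for b in group:
                if a != b:
                    counts[index[a], index[b]] += 1

# Items sorted together often are "close"; items never sorted
# together are maximally distant.
dist = len(sorts) - counts
np.fill_diagonal(dist, 0)
Z = linkage(squareform(dist), method="average")
# dendrogram(Z, labels=items)  # render with matplotlib if available
```

The co-occurrence counts here are exactly the "number of times items were grouped together" from simple card counting; cluster analysis just turns them into a tree.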
But for wine lovers, there is something fishy about this result. If you remember, participants were asked to group the wines into three categories, one of which was full-bodied reds. While Beaujolais is a red wine, it certainly cannot be described as full-bodied. (There is also a problem with White Zinfandel that I am not going to deal with here: it was a nasty trick played on participants that will become immediately obvious if you actually try to buy a bottle of the stuff, since it is rosé, not white.) So what went wrong? It seems that participants are sometimes reluctant to admit defeat and omit a card, even though they were asked to do so if they felt an item did not fit any of the groups. So although Beaujolais is not full-bodied, most participants failed to omit the card. Why do I believe this rather than entertaining the possibility that participants were not aware of the difference? Because I asked them to indicate how well each item fit within the group they chose, on a simple three-point scale: Fair (1); Good (2); Perfect (3).
This "quality of fit" measure can be incorporated into the cluster analysis as part of the strength of relationship between items. Figure 2 shows the same card-sort results with quality of fit taken into account.
We still have the three groups, but now in a slightly different order (which is not important for our discussion). Notice, though, that with quality of fit taken into account, the relationship Beaujolais has with the other members of its group has changed from being the strongest among the red wines to being the weakest. This is because while participants recognized Beaujolais as a red wine, they also knew it was a poor fit for a group labeled "full-bodied." (In a full-scale exercise, participants would have been invited to create new groups or to annotate the cards, which may have produced a similar result. However, as we will shortly see, quality of fit has other benefits that are worth pursuing.)
I promised earlier that the answer to "How well do items fit into their groups?" would help us deal with issues surrounding participants and the items themselves. So far, all that we have considered is how the quality-of-fit measure changes the "proximity matrix" used as the basis of cluster analysis. The matrix shows the strength of relationship between items and is simply the sum of individual matrices such as that shown in figure 3. Without quality of fit, each cell of a participant's matrix is either 0 (blank in this case) or 1, depending on whether the items in question appeared in the same group. In figure 3, Beaujolais was placed by this participant in the same group as Cabernet Sauvignon, but Beaujolais and Cava were in separate groups (and so on through each possible pairing).
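One participant's contribution to that proximity matrix can be sketched as follows; the groups shown are illustrative stand-ins for an actual sort, and the 0/1 cells correspond to the blank/1 cells of figure 3:

```python
# Sketch of one participant's binary proximity matrix: cell (a, b) is 1
# if this participant placed items a and b in the same group.
# The groups below are invented for illustration.
import numpy as np

items = ["Claret", "Beaujolais", "Cabernet Sauvignon", "Cava", "Riesling"]
index = {name: i for i, name in enumerate(items)}

groups = [
    {"Claret", "Beaujolais", "Cabernet Sauvignon"},  # full-bodied reds
    {"Cava"},                                        # sparkling
    {"Riesling"},                                    # dry whites
]

matrix = np.zeros((len(items), len(items)), dtype=int)
for group in groups:
    for a in group:
        for b in group:
            if a != b:
                matrix[index[a], index[b]] = 1

# Beaujolais co-occurs with Cabernet Sauvignon (1) but not Cava (0).
```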
With quality of fit taken into account, the sample matrix changes to that shown in figure 4. Quality of fit has been averaged between items. So, for example, Claret and Beaujolais have been placed in the same group, but Claret was given a "perfect" 3 while Beaujolais had a "fair" 1. The average that appeared in the matrix was a "good" 2.
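The same per-participant matrix with quality of fit folded in might look like this; the ratings are invented for illustration, but the averaging matches the Claret and Beaujolais example above:

```python
# Sketch: folding the three-point quality-of-fit rating (Fair=1, Good=2,
# Perfect=3) into one participant's proximity matrix by averaging the
# two items' ratings. Groups and ratings are invented for illustration.
import numpy as np

items = ["Claret", "Beaujolais", "Riesling"]
index = {name: i for i, name in enumerate(items)}

groups = [{"Claret", "Beaujolais"}, {"Riesling"}]
fit = {"Claret": 3, "Beaujolais": 1, "Riesling": 2}  # per-item ratings

matrix = np.zeros((len(items), len(items)))
for group in groups:
    for a in group:
        for b in group:
            if a != b:
                # Average the pair's ratings, as described for figure 4.
                matrix[index[a], index[b]] = (fit[a] + fit[b]) / 2

# Claret ("perfect" 3) and Beaujolais ("fair" 1) average to a "good" 2.
```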
We can see from this simple analysis that Beaujolais has the lowest quality of fit of any of the items, which should make us a little suspicious either of the group names we have provided to users or of their understanding of the item itself. Asking users to annotate the cards would help to clarify this, but if time allows, asking individual users to think aloud while sorting should also be revealing.
In figure 5 we can see that Pinot Grigio also has a low quality of fit and has the weakest relationship with other members of its group, as shown in both figures 1 and 2. We need to find out why this is so, but the analysis techniques discussed so far have nothing more to add in this respect. If we add together all of the proximity matrices from each participant, we get the result shown in figure 6.
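Summing the individual matrices is a straightforward element-wise addition; a minimal sketch with made-up two-item matrices:

```python
# Sketch: the overall proximity matrix is the element-wise sum of every
# participant's individual matrix (binary or fit-weighted). The tiny
# two-item matrices below are invented for illustration.
import numpy as np

participant_matrices = [
    np.array([[0, 2], [2, 0]]),  # pair grouped, average fit 2
    np.array([[0, 3], [3, 0]]),  # pair grouped, average fit 3
    np.array([[0, 0], [0, 0]]),  # this participant split the pair
]
overall = np.sum(participant_matrices, axis=0)
# overall[0, 1] holds the combined strength of relationship for the pair
```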
It is not easy to see at first glance, but Pinot Grigio has often been grouped with red wines such as Claret. The use of a simple spreadsheet chart called a "surface map" (in Microsoft Excel) shows the situation a little more clearly:
The surface map shows three main groups: full-bodied reds in the top left corner, sparkling wines in the center, and dry whites in the bottom right corner. But there are some odd embellishments at the bottom and right edges. (The matrix is mirrored about the diagonal running from top left to bottom right; I have filled in dummy values for the diagonal cells themselves, since they would otherwise be zero and produce a distracting pattern through the middle of each group.) These embellishments are the consequence of Pinot Grigio being grouped with both red and white wines. Discussion with the participants revealed that the term "Pinot" was strongly associated with Pinot Noir, hence the tendency for the item to appear in both the red and white wine groups. Careful observers will also notice that Muscat was occasionally grouped with the sparkling wines and, as it happens, for a reason similar to the Pinot Grigio confusion: "Muscat" is very similar to "Muscatel," a popular sparkling wine. Again, this information is present in the matrix itself, but it is not visually obvious.
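A comparable chart can be produced outside Excel; the sketch below uses matplotlib's imshow on an illustrative proximity matrix (not the article's data), including the dummy diagonal values described above:

```python
# Sketch: rendering a summed proximity matrix as a heatmap, the
# matplotlib analogue of Excel's "surface map" chart. Item names and
# matrix values are invented for illustration.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no display needed
import matplotlib.pyplot as plt

items = ["Claret", "Beaujolais", "Cava", "Riesling", "Pinot Grigio"]
proximity = np.array([
    [0, 10,  1,  0,  4],
    [10, 0,  1,  1,  2],
    [1,  1,  0,  2,  1],
    [0,  1,  2,  0,  7],
    [4,  2,  1,  7,  0],
], dtype=float)

# Dummy maximum on the diagonal so it does not appear as a blank
# stripe through the middle of each group.
np.fill_diagonal(proximity, proximity.max())

fig, ax = plt.subplots()
ax.imshow(proximity, cmap="Greys")
ax.set_xticks(range(len(items)))
ax.set_xticklabels(items, rotation=45, ha="right")
ax.set_yticks(range(len(items)))
ax.set_yticklabels(items)
fig.tight_layout()
fig.savefig("surface_map.png")
```

With this invented data, the Pinot Grigio row shows dark cells against both a red (Claret) and a white (Riesling), the kind of edge embellishment described above.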
Finally, how does quality of fit help us decide whether participants understand the card-sorting process itself? The solution here is to average quality of fit for each participant across all items. This gives us an idea of the degree of confidence participants had in the groupings they made, a factor that is especially important for open card sorts, where the number and name of each group is a matter of individual choice. The scattergram shown in figure 8 is for an open card sort with 18 participants. The axes are average quality of fit (vertical) and average group size (horizontal).
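Computing those per-participant averages is simple; the sketch below uses invented ratings and group sizes to show how an outlier like the one in figure 8 would surface:

```python
# Sketch: per-participant averages behind a figure-8-style scattergram.
# Each participant contributes an average quality of fit across items
# and an average group size. All data below is invented.
participants = {
    "P1": {"fits": [2, 3, 2, 2], "group_sizes": [4, 5, 5]},
    "P2": {"fits": [3, 3, 3, 3], "group_sizes": [14]},  # possible outlier
}

points = {
    name: (sum(d["fits"]) / len(d["fits"]),
           sum(d["group_sizes"]) / len(d["group_sizes"]))
    for name, d in participants.items()
}
# P2 rated every fit "perfect" but used one huge group: a point far
# from the central cluster that may be worth investigating.
```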
The scattergram shows most participants clustered around the center, but with one in the top right whose results might be worth investigating to see if they should be excluded from the overall analysis as an outlier.
Naturally, quality of fit in card sorting is not a magic solution and is no substitute for careful planning and qualitative data collection. But coupled with a few data-analysis techniques beyond traditional card-counting and cluster analysis, I believe we can reach much more robust conclusions in Web-navigation design.
About the Author:
William Hudson is a leading authority on user-centered design with over 30 years' experience in the development of interactive systems. He is the founder and principal consultant of Syntagm, a consultancy specializing in the design of interactive systems, established in 1985.
Encourage user feedback:
- for one-on-one sessions (single participant and researcher), ask participants to think aloud
- for group sessions, use paper or cards for sorting and ask users to make annotations, suggest alternative item and group names, and to add groups and items as required
- for on-screen card sorting (Web- or desktop-based), provide and encourage use of a separate notepad or email facility for participants to make notes and queries
Do not rely on a single method of analysis. Examine the raw data, generate cluster analyses, and make sure that unexpected results can be explained.
Exclude results from participants where there is evidence that they did not understand the process or where their results were substantially inconsistent with the majority of participants.
Consider providing a simple trial card sort to give participants practice in the technique. Fruit and vegetables make an easy introduction but may not be appropriate in all cases.
Use open card sorting for exploration and closed for assessment.
©2005 ACM 1072-5220/05/0900 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2005 ACM, Inc.