Card Sorting
Uncover how real users mentally group content so your IA reflects their expectations, not your team's assumptions.
Before you finalize a navigation structure or label a single menu item, you need to know how your users mentally organize the content you’re building around. Card sorting is one of the fastest and most direct ways to capture those mental models at scale.
It generates the raw grouping data that drives every IA decision: what goes where, what gets called what, and which categories should exist at all. Skip it, and you’re building an architecture around your team’s assumptions instead of your users’ expectations.
What Card Sorting Is
Card sorting is a research method where participants group labeled items into categories that make sense to them. Each item — a piece of content, a feature, or a task — is written on a separate “card” (physical or digital). Participants arrange the cards into piles and then optionally name those piles.
The core output is a co-occurrence (similarity) matrix showing which items participants grouped together most often. In open and hybrid sorts, you also get the names participants gave those groups. That data becomes the empirical foundation for your organization scheme and labeling system.
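The following minimal Python sketch shows how the co-occurrence matrix is built. It assumes each participant's sort is stored as a list of groups, that each group is a list of card labels, and that every participant sorted every card. The function name and the data are hypothetical. Dedicated card sorting tools compute this matrix for you.

```python
from itertools import combinations


def similarity_matrix(sorts):
    """Return {(card_a, card_b): share of participants who grouped the pair together}.

    `sorts` holds one entry per participant; each entry is a list of groups,
    and each group is a list of card labels. Keys use alphabetical order.
    Assumes every participant sorted every card.
    """
    counts = {}
    cards = set()
    for participant in sorts:
        for group in participant:
            cards.update(group)
            # Every pair inside one group counts as one co-occurrence.
            for pair in combinations(sorted(set(group)), 2):
                counts[pair] = counts.get(pair, 0) + 1
    n = len(sorts)
    # Include never-grouped pairs explicitly as 0.0.
    return {pair: counts.get(pair, 0) / n for pair in combinations(sorted(cards), 2)}


# Three hypothetical participants sorting four cards.
sorts = [
    [["Invoices", "Payment methods"], ["Password", "Profile"]],
    [["Invoices", "Payment methods", "Profile"], ["Password"]],
    [["Invoices", "Payment methods"], ["Password", "Profile"]],
]
sim = similarity_matrix(sorts)
print(sim[("Invoices", "Payment methods")])  # 1.0
print(round(sim[("Password", "Profile")], 2))  # 0.67
```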
Card sorting is a generative research method, not an evaluative one. It tells you how users think content should be organized. It does not test whether a finished navigation structure actually works — that is the job of tree testing. The two methods are designed to work in sequence: card sorting generates a candidate IA, tree testing validates it.
Three Variants and When to Use Each
Open Card Sort
Participants group cards and create their own category names. This is the most generative variant. Use it when:
- You are designing a new IA from scratch and have no existing category hypotheses.
- You want to capture the vocabulary users naturally reach for — the labels they coin become candidates for your navigation text.
- You have fewer than 60–80 cards (open sorts become unwieldy at larger scales).
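Open sorts require more facilitation effort and more analysis time. Before you analyze results by category, you must reconcile the varied names participants give to equivalent groups (for example, “Billing” and “Payments”). The similarity matrix itself does not depend on group names, so you can compute it before this step.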
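Deciding which labels are equivalent takes judgment, but a script can apply those decisions consistently. This minimal sketch assumes a hypothetical, researcher-maintained `SYNONYMS` map from cleaned-up raw labels to standardized categories. It surfaces any label not yet mapped for review rather than dropping it.

```python
from collections import Counter

# Hypothetical researcher-maintained map: cleaned raw label -> standard category.
SYNONYMS = {
    "billing": "Billing",
    "payments": "Billing",
    "money stuff": "Billing",
    "account": "Account",
    "my account": "Account",
    "profile": "Account",
}


def standardize(raw_label, synonyms=SYNONYMS):
    """Map one participant-coined group name to a standardized category.

    Lowercases and collapses whitespace first; unmapped labels come back as
    "UNMAPPED: <label>" so they stay visible for review.
    """
    key = " ".join(raw_label.lower().split())
    return synonyms.get(key, f"UNMAPPED: {key}")


raw_labels = ["Billing", "  Payments", "Money stuff", "My Account", "Settings"]
tally = Counter(standardize(label) for label in raw_labels)
print(tally["Billing"], tally["UNMAPPED: settings"])  # 3 1
```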
Closed Card Sort
Participants group cards into predefined categories you supply. Use it when:
- You already have a candidate set of top-level categories and want to test whether content fits into them as expected.
- You are adding new content to an established IA and need to know where users expect it to live.
- You want faster analysis because categories are already normalized.
Closed sorts generate cleaner data, but they cannot reveal whether your predefined categories are wrong in the first place. They can only show fit within the categories you provided.
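For each card, a closed sort tells you what share of participants placed it in each predefined category. This minimal sketch assumes each participant's sort is a hypothetical dict from card to category and that every participant placed every card:

```python
from collections import Counter, defaultdict


def placement_shares(sorts):
    """Return {card: {category: share of participants who placed it there}}.

    `sorts` holds one dict per participant mapping card -> predefined category.
    Assumes every participant placed every card.
    """
    tallies = defaultdict(Counter)
    for participant in sorts:
        for card, category in participant.items():
            tallies[card][category] += 1
    n = len(sorts)
    return {
        card: {category: count / n for category, count in counter.items()}
        for card, counter in tallies.items()
    }


# Four hypothetical participants, two cards.
sorts = [
    {"Refund policy": "Help", "Update card": "Billing"},
    {"Refund policy": "Billing", "Update card": "Billing"},
    {"Refund policy": "Help", "Update card": "Billing"},
    {"Refund policy": "Help", "Update card": "Account"},
]
shares = placement_shares(sorts)
print(shares["Refund policy"])  # {'Help': 0.75, 'Billing': 0.25}
```

A card that lands in one category for most participants fits that category well. A card spread evenly across several categories fits none of them clearly.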
Hybrid Card Sort
Participants can place cards into predefined categories or create new ones if nothing fits. This is useful when you have a partially established IA but suspect gaps.
Hybrid sorts add analytical complexity. The “escape hatch” categories need separate analysis to determine whether they represent real grouping needs or participant confusion.
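One way to begin analyzing escape-hatch categories is to measure two things for each card: how often participants bypassed the predefined categories, and which new labels they coined. This minimal sketch uses the same hypothetical card-to-category dicts as a closed sort, plus a hypothetical `PREDEFINED` set:

```python
from collections import defaultdict

# Hypothetical categories supplied to participants.
PREDEFINED = {"Help", "Billing", "Account"}


def escape_hatch_report(sorts, predefined=PREDEFINED):
    """Return {card: (share placed in participant-created categories, new labels)}.

    `sorts` holds one dict per participant mapping card -> category name; any
    name outside `predefined` was created by the participant.
    """
    outside = defaultdict(int)
    new_labels = defaultdict(set)
    cards = set()
    for participant in sorts:
        for card, category in participant.items():
            cards.add(card)
            if category not in predefined:
                outside[card] += 1
                new_labels[card].add(category)
    n = len(sorts)
    return {card: (outside[card] / n, sorted(new_labels[card])) for card in cards}


sorts = [
    {"Gift cards": "Shopping", "Reset password": "Account"},
    {"Gift cards": "Promotions", "Reset password": "Account"},
    {"Gift cards": "Billing", "Reset password": "Help"},
]
report = escape_hatch_report(sorts)
share, labels = report["Gift cards"]
print(round(share, 2), labels)  # 0.67 ['Promotions', 'Shopping']
```

When several participants coin the same new label for a card, that hints at a missing category. Scattered, one-off labels are more likely to reflect confusion.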
Sample Size: Getting the Numbers Right
The right sample size depends on what you are trying to learn.
| Study type | Recommended n | What the data supports |
|---|---|---|
| Qualitative problem-finding | 15–20 participants | Directional; surfaces major patterns |
| Quantitative analysis (similarity matrix, dendrogram) | 30–50 participants | ~85–90% stability on cluster patterns |
| Quantitative benchmarking with statistical rigor | 50+ participants | Narrower 95% confidence intervals on cluster membership |
A common mistake is applying the “5-user rule” to card sorting. Five participants can surface most major usability problems in a think-aloud test, but card sorting data is analyzed statistically. A similarity matrix built from five participants will be dominated by individual variation, not shared mental models. The patterns you see will be unreliable and can actively mislead your IA decisions.
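You can see why five participants are too few by treating a single pair's co-occurrence as a proportion and computing a 95% confidence interval for it. This sketch uses the Wilson score interval with z = 1.96. That choice of interval is ours; the lesson does not prescribe one.

```python
import math


def wilson_interval(together, n, z=1.96):
    """Approximate 95% Wilson score interval for a pair's co-occurrence share.

    `together` is how many of `n` participants grouped the pair together.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = together / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# The same observed 60% co-occurrence at three sample sizes.
for together, n in [(3, 5), (18, 30), (30, 50)]:
    low, high = wilson_interval(together, n)
    print(f"n={n}: {low:.0%} to {high:.0%}")
```

With the same observed 60 percent, the interval runs from roughly 23% to 88% at n = 5. At n = 30 it narrows to roughly 42% to 75%, and at n = 50 to roughly 46% to 72%. A five-person sort cannot distinguish a strong pair from a weak one.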
For most product teams, 20–30 participants hits the practical sweet spot. You get enough data to identify the top two or three grouping patterns with confidence, without the recruitment overhead of a large quantitative study.
Running a Card Sort: Step by Step
1. Define scope and create cards
Aim for roughly 30–100 items. Fewer than 20 items produce thin data; more than 100 overwhelm participants and degrade quality. Each card should represent one discrete concept, not a compound idea. “Account settings” is a good card. “Account settings and billing and notification preferences” is three cards collapsed into one.
Write card labels in plain user language. Internal taxonomy terms, jargon, and acronyms will confuse participants and contaminate your vocabulary data. If you are unsure whether a term is user language, check support ticket verbatims, search query logs, and prior research notes.
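You can check some of these rules mechanically before launch. This minimal sketch uses hypothetical heuristics:
- The size bounds follow the 30–100 guideline.
- A label is flagged as a possible compound if it contains “and”, “&”, or “/”.
- All-caps words are flagged as possible acronyms.

Treat the flags as prompts for review, not verdicts.

```python
import re


def check_card_list(labels, min_cards=30, max_cards=100):
    """Return human-readable warnings for a draft card list.

    Heuristics (hypothetical, tune for your content):
    - card count outside the min/max guideline
    - possible compound cards joined by "and", "&", or "/"
    - possible acronyms: words of two or more capital letters
    """
    warnings = []
    if len(labels) < min_cards:
        warnings.append(f"only {len(labels)} cards; aim for at least {min_cards}")
    elif len(labels) > max_cards:
        warnings.append(f"{len(labels)} cards; consider trimming to {max_cards}")
    for label in labels:
        if re.search(r"\band\b|&|/", label.lower()):
            warnings.append(f"possible compound card: {label!r}")
        if re.search(r"\b[A-Z]{2,}\b", label):
            warnings.append(f"possible acronym: {label!r}")
    return warnings


draft = [
    "Account settings",
    "Account settings and billing and notification preferences",
    "SSO setup",
]
for warning in check_card_list(draft):
    print(warning)
```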
2. Choose a modality: remote vs. in-person
In-person card sorting with physical index cards is excellent for generating rich think-aloud data as you watch participants sort. The tradeoff is fewer participants for the same effort and manual data entry for analysis.
Remote unmoderated card sorting — using tools such as Optimal Workshop’s OptimalSort, Maze, or UserZoom — scales to 50-plus participants with automated similarity matrix and dendrogram outputs. You lose the think-aloud data, but you gain statistical power. For generative IA work, behavioral grouping data from a large remote sample is usually more valuable than verbal commentary from a small in-person sample.
For complex or novel content domains, a mixed-method approach works well: run 5–8 moderated sessions first to surface vocabulary surprises and confusions, then run a larger unmoderated study to quantify the patterns.
3. Facilitate (or launch) the study
For moderated sessions: ask participants to think aloud as they sort, but avoid explaining the content. Your job is to observe and ask neutral probing questions (“Why did you put those together?”), not to guide participants toward groupings you prefer. Anchoring bias is a real risk — a single careless comment can corrupt a participant’s entire sort.
For unmoderated studies: write clear, concise instructions. Include two or three practice cards before the real sort begins. Set a time expectation (“This typically takes 15–20 minutes”). Avoid showing examples of completed sorts, which prime participants toward specific patterns.
4. Analyze results
The primary analysis artifacts are:
- Similarity matrix: a grid showing, for every pair of cards, what percentage of participants grouped them together. Pairs with 60-plus percent co-occurrence are strong candidates for the same category.
- Dendrogram: a hierarchical cluster diagram generated from the similarity matrix. It visualizes which cards cluster tightly (high similarity) and which sit at the edges of clusters (ambiguous placement).
- Category label analysis (open sorts only): a qualitative review of what participants named their groups. Recurring words and phrases are candidate navigation labels.
- Participant agreement score: the degree to which all participants produced similar groupings. Low agreement signals genuinely ambiguous content, not a failed study — it tells you where your users disagree, which is important data.
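Tools generate the dendrogram for you. The sketch below shows the kind of computation underneath. It assumes average linkage, one common choice; a given tool may use a different linkage method. Clusters are merged in order of highest average pairwise similarity, which is equivalent to lowest average distance, where distance = 1 − similarity. The input layout matches a similarity dict keyed by alphabetically ordered card pairs, and the data is hypothetical.

```python
from itertools import combinations


def average_linkage(cards, sim):
    """Agglomerative clustering with average linkage (a sketch, not a tool's exact method).

    `sim` maps alphabetically ordered (card_a, card_b) pairs to co-occurrence
    shares between 0 and 1. Returns the merge sequence a dendrogram draws:
    (cluster_a, cluster_b, average similarity at which they merged).
    """
    def pair_sim(a, b):
        return sim[(a, b) if a < b else (b, a)]

    clusters = [frozenset([card]) for card in cards]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two clusters with the highest average pairwise similarity.
        for i, j in combinations(range(len(clusters)), 2):
            x, y = clusters[i], clusters[j]
            score = sum(pair_sim(a, b) for a in x for b in y) / (len(x) * len(y))
            if best is None or score > best[0]:
                best = (score, i, j)
        score, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), score))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


cards = ["Invoices", "Password", "Payment methods", "Profile"]
sim = {
    ("Invoices", "Password"): 0.0,
    ("Invoices", "Payment methods"): 1.0,
    ("Invoices", "Profile"): 0.33,
    ("Password", "Payment methods"): 0.0,
    ("Password", "Profile"): 0.67,
    ("Payment methods", "Profile"): 0.33,
}
for left, right, score in average_linkage(cards, sim):
    print(left, "+", right, f"merged at {score:.2f}")
```

Early, high-similarity merges are tight clusters; late, low-similarity merges join loosely related groups. For larger analyses in Python, `scipy.cluster.hierarchy.linkage` with `method="average"` does the same job on a condensed distance matrix, and `scipy.cluster.hierarchy.dendrogram` plots the result.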
Do
- Treat the dendrogram as a starting hypothesis, not a final answer — clusters inform your category candidates, they don’t dictate them.
- Note outlier cards that consistently appear in multiple clusters or that participants place alone — these signal content that doesn’t fit any obvious category and may need to be rewritten, split, or surfaced differently.
- Compare open sort label data with your existing draft labels to identify gaps, mismatches, or opportunities to use more natural language.
- Follow card sorting with tree testing to validate the IA you derived from the sorting data.
Don't
- Don’t build your final IA directly from the dendrogram without applying design judgment — statistical clusters don’t account for business constraints, legal requirements, or content that must be co-located for reasons users wouldn’t know about.
- Don’t run a card sort with only 5–10 participants and then report quantitative similarity percentages — the numbers will be statistically meaningless and potentially misleading.
- Don’t use card sorting as your only IA research method — pair it with search log analysis, analytics, and tree testing for triangulated confidence.
- Don’t interpret low participant agreement as a failure — it’s actionable data that tells you where users genuinely differ and where your navigation will need extra wayfinding support.
Interpreting the Data: Common Patterns and What They Mean
Strong consensus clusters
When 70-plus percent of participants group the same cards together, you have strong evidence for a category. These clusters are your highest-confidence IA decisions — build around them first.
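This minimal sketch pulls candidate consensus groups out of a similarity dict by linking every pair at or above 70 percent and taking connected components. Because of that component approach, two cards can end up in one group through a chain of strong pairs without being strongly paired themselves. Check each group against the matrix before treating it as a category. The data is hypothetical.

```python
def consensus_groups(cards, sim, threshold=0.7):
    """Group cards linked by pairs at or above `threshold` co-occurrence.

    Uses connected components via a small union-find; singletons are omitted.
    `sim` maps (card_a, card_b) pairs to shares between 0 and 1.
    """
    parent = {card: card for card in cards}

    def find(card):
        while parent[card] != card:
            parent[card] = parent[parent[card]]  # path halving
            card = parent[card]
        return card

    for (a, b), share in sim.items():
        if share >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for card in cards:
        groups.setdefault(find(card), []).append(card)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


cards = ["Invoices", "Password", "Payment methods", "Profile", "Refunds"]
sim = {
    ("Invoices", "Payment methods"): 0.85,
    ("Invoices", "Refunds"): 0.72,
    ("Password", "Profile"): 0.64,
    ("Payment methods", "Refunds"): 0.66,
}
print(consensus_groups(cards, sim))
# [['Invoices', 'Payment methods', 'Refunds']]
```

In this example, “Payment methods” and “Refunds” share a group even though their own pair is only 66 percent. That is the chaining effect to watch for.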
Split clusters
When a card’s placements split between two clusters, with each receiving roughly 40–60 percent of participants, users are genuinely divided. This often happens with content that has characteristics of two categories, such as a help article about billing that could live under “Help” or “Billing.”
Your options: place the item in the category where it has higher co-occurrence with surrounding items, create a cross-reference, or surface it through search and contextual links rather than primary navigation.
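You can flag split cards automatically once you have per-card placement shares, either from a closed sort or from standardized open-sort categories. This minimal sketch flags a card when its top two categories each hold between 40 and 60 percent. The data is hypothetical.

```python
def split_cards(shares, low=0.4, high=0.6):
    """Flag cards whose top two categories each hold between `low` and `high`.

    `shares` maps card -> {category: share of participants}.
    """
    flagged = {}
    for card, by_category in shares.items():
        top = sorted(by_category.items(), key=lambda kv: kv[1], reverse=True)[:2]
        if len(top) == 2 and all(low <= share <= high for _, share in top):
            flagged[card] = top
    return flagged


shares = {
    "Billing help article": {"Help": 0.55, "Billing": 0.45},
    "Update card": {"Billing": 0.9, "Account": 0.1},
}
print(split_cards(shares))
# {'Billing help article': [('Help', 0.55), ('Billing', 0.45)]}
```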
Orphan cards
Cards that participants consistently place alone — or that never co-occur with the same other cards — represent content that doesn’t fit users’ mental models at all. This is a content problem, not just a navigation problem. Before deciding where to put the card, ask whether the underlying content needs to be rewritten, renamed, or consolidated with related items.
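This minimal sketch finds orphan candidates in raw sort data using two signals. The first is the share of participants who placed a card alone. The second is the card's strongest co-occurrence with any other card; a low maximum means its partners keep changing. Both cutoffs are hypothetical starting points, not established thresholds.

```python
from itertools import combinations


def orphan_report(sorts, alone_cutoff=0.5, max_pair_cutoff=0.3):
    """Flag cards that participants often sort alone or never pair strongly.

    `sorts` holds one list of groups (lists of cards) per participant.
    Returns {card: (share placed alone, highest co-occurrence with any card)}.
    """
    n = len(sorts)
    alone, pairs, cards = {}, {}, set()
    for participant in sorts:
        for group in participant:
            members = set(group)
            cards |= members
            if len(members) == 1:
                (card,) = members
                alone[card] = alone.get(card, 0) + 1
            for pair in combinations(sorted(members), 2):
                pairs[pair] = pairs.get(pair, 0) + 1
    report = {}
    for card in sorted(cards):
        alone_share = alone.get(card, 0) / n
        # Strongest co-occurrence this card has with any other card.
        best = max(
            (count / n for pair, count in pairs.items() if card in pair),
            default=0.0,
        )
        if alone_share > alone_cutoff or best < max_pair_cutoff:
            report[card] = (alone_share, best)
    return report


sorts = [
    [["Invoices", "Payment methods"], ["Accessibility statement"]],
    [["Invoices", "Payment methods", "Accessibility statement"]],
    [["Invoices", "Payment methods"], ["Accessibility statement"]],
]
for card, (alone_share, best) in orphan_report(sorts).items():
    print(card, round(alone_share, 2), round(best, 2))  # Accessibility statement 0.67 0.33
```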
Low overall agreement
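A similarity matrix in which few or no card pairs exceed 40 percent means users don’t share a consistent mental model for this content. Most pairs score low in any sort, because most cards belong in different groups; the warning sign is the absence of strong pairs. This pattern is most common when your content spans genuinely distinct domains or when your card labels are ambiguous.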
Address ambiguous labels first (rerun the sort with clearer cards), then consider whether you need to analyze user segments as separate groups.
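The lesson does not fix a formula for the participant agreement score. One simple option is the Rand index: the share of card pairs on which two participants agree (both grouped the pair together, or both kept it apart), averaged over every pair of participants. This minimal sketch scores each segment separately and assumes every participant sorted the same cards. The segments and data are hypothetical.

```python
from itertools import combinations


def _together(sort):
    """Alphabetically ordered card pairs that one participant grouped together."""
    pairs = set()
    for group in sort:
        pairs.update(combinations(sorted(set(group)), 2))
    return pairs


def mean_rand_agreement(sorts):
    """Average Rand index over all pairs of participants (1.0 = identical sorts)."""
    if len(sorts) < 2:
        raise ValueError("need at least two participants")
    cards = sorted({card for group in sorts[0] for card in group})
    all_pairs = list(combinations(cards, 2))
    together = [_together(s) for s in sorts]
    scores = []
    for p, q in combinations(together, 2):
        # A card pair "agrees" when both participants treated it the same way.
        agree = sum((pair in p) == (pair in q) for pair in all_pairs)
        scores.append(agree / len(all_pairs))
    return sum(scores) / len(scores)


segment_a = [
    [["Invoices", "Payment methods"], ["Password", "Profile"]],
    [["Invoices", "Payment methods"], ["Password", "Profile"]],
]
segment_b = [
    [["Invoices", "Payment methods"], ["Password", "Profile"]],
    [["Invoices", "Profile"], ["Password", "Payment methods"]],
]
for name, segment in [("A", segment_a), ("B", segment_b), ("pooled", segment_a + segment_b)]:
    print(name, round(mean_rand_agreement(segment), 2))
```

Here segment A agrees perfectly (1.0) and segment B agrees on only a third of card pairs. Pooling them yields about 0.67, a middling score that hides both facts.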
Integrating Card Sorting into the Design Process
Card sorting fits naturally at the start of an IA design phase — after initial user research has established goals and vocabulary, and before sitemaps and navigation are designed.
A practical workflow:
- User research (interviews, contextual inquiry) — establishes goals, tasks, vocabulary, and pain points
- Content audit — inventories what content exists or is planned; produces the card list
- Card sort (open, 20–30 participants) — generates grouping data and candidate label vocabulary
- IA synthesis — uses sort clusters as hypotheses; applies design judgment and business constraints to produce a candidate sitemap
- Tree testing (30–50 participants) — validates the candidate IA behaviorally
- Iterate — refine problem areas identified in tree test; re-test if changes are substantial
This sequence separates the generative work (sorting) from the evaluative work (tree testing), avoiding the common mistake of building a navigation UI and then asking whether it works.
Tooling in 2026
Remote unmoderated card sorting is the default for most teams because of its speed and statistical power. The leading purpose-built tools are:
- Optimal Workshop (OptimalSort + Treejack) — the most widely used dedicated IA research suite; exports dendrogram and similarity matrix natively
- Maze — integrates card sorting alongside other unmoderated usability methods; good for teams already using it for other studies
- UserZoom / UserTesting — enterprise platforms with card sorting modules; useful when you need to route studies to an existing panel
- Figma + FigJam (moderated only) — viable for in-person or moderated remote sessions with a small number of participants; requires manual analysis
Avoid running card sorts in generic survey tools (Google Forms, Typeform). These tools cannot capture grouping behavior; they can only collect self-reported categorization preferences, which introduces a significant say/do gap. Asking users which category they think an item belongs to is meaningfully different from watching them actually place it in a pile.