Usability Testing

Usability testing is watching real people try to use your product. Not asking them what they think. Not having them fill out a survey. Watching them attempt actual tasks and observing where they struggle, succeed, and give up. It's the highest-impact research method and the one most teams skip.

The 5-User Rule

Nielsen Norman Group's research shows that 5 users find approximately 85% of usability problems. You don't need a massive study. You need a small study done often.

Usability problems found vs. number of test participants:

100% ─┤
      │                             ●────●────●
 85% ─┤                        ●
      │
 75% ─┤                   ●
      │              ●
 50% ─┤         ●
      │
      │    ●
  0% ─┼────┼────┼────┼────┼────┼────┼────┼────┼
      0    1    2    3    4    5    6    7    8
                    Participants

5 users → ~85% of problems found
Beyond 5 → diminishing returns

Better approach: Test with 5 users, fix the problems, then test with 5 more users. Two rounds of 5 find more problems than one round of 10.
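
The curve above follows a simple probability model (Nielsen & Landauer). As a rough illustration, here is a minimal Python sketch of that model, assuming each participant independently uncovers about 31% of the problems, the rate reported in their studies; your product's discovery rate may differ.

```python
# Expected share of usability problems found by n participants,
# using the Nielsen & Landauer model: found(n) = 1 - (1 - p)^n.
# p = probability that one participant uncovers a given problem
# (~0.31 in their research; treat it as an assumption, not a constant).

def problems_found(n_participants: int, p: float = 0.31) -> float:
    return 1 - (1 - p) ** n_participants

for n in range(1, 9):
    print(f"{n} participants -> {problems_found(n):.0%} of problems")
# With p = 0.31, 5 participants land at roughly 84-85%.
```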

Test Types

| Type | Participants | Facilitator | Duration | Cost | Best For |
|---|---|---|---|---|---|
| Moderated in-person | 5-8 | Present in room | 45-60 min | High | Deep insights, complex products, early-stage designs |
| Moderated remote | 5-8 | Video call | 30-60 min | Medium | Geographic reach, pandemic-friendly, screen sharing |
| Unmoderated remote | 10-50 | Recorded, no facilitator | 10-20 min | Low | Quick validation, high volume, simple tasks |
| Guerrilla | 5-10 | Informal, in a cafe or hallway | 5-15 min | Very Low | Fast feedback on specific interactions |

Choosing the Right Type

| Question | Type to Use |
|---|---|
| Need to understand why users struggle? | Moderated (ask follow-up questions) |
| Need fast results from many users? | Unmoderated |
| Testing a complex, multi-step workflow? | Moderated (need to guide and observe) |
| Testing a specific button/layout/page? | Unmoderated (simple, focused task) |
| No budget and need feedback today? | Guerrilla (find people and ask them) |
| Users are geographically distributed? | Remote (moderated or unmoderated) |

Moderated vs. Unmoderated: Tradeoffs

| Aspect | Moderated | Unmoderated |
|---|---|---|
| Follow-up questions | Yes, probe deeper on interesting moments | No, limited to pre-set tasks |
| Cost per participant | $100-300+ (incentive + facilitator time) | $10-50 (platform fee + small incentive) |
| Time to results | 1-2 weeks (sessions + analysis) | 2-3 days (automated recording) |
| Depth of insight | Deep: observe body language, ask "why?" | Shallow: observe behavior only |
| Facilitator bias | Risk of leading participants | No facilitator bias |
| Participant quality | Higher (screened, scheduled) | Lower (faster, less committed) |

Planning a Usability Test

Define Your Goals

Before writing tasks, answer these questions:

| Question | Example Answer |
|---|---|
| What are you testing? | The new checkout flow redesign |
| What do you want to learn? | Can users complete a purchase without help? Where do they get stuck? |
| What will you do with the results? | Fix the top 3 usability issues before launch |
| What's your success benchmark? | 80%+ task completion rate, average time under 3 minutes |

Recruit the Right Participants

| Criterion | Why It Matters |
|---|---|
| Match your target user profile | Testing with the wrong people gives misleading results |
| Mix of tech comfort levels | Your users aren't all power users |
| Not employees or close friends | They know too much about the product |
| Screener survey | Filter for relevant experience and demographics |
| Incentive | $50-100 per session (or equivalent gift card) |

Screener question examples:

  • "How often do you shop online?" (Filter for frequency)
  • "Which of these tools have you used in the past 6 months?" (Filter for experience)
  • "What is your primary role at work?" (Filter for job relevance)

Writing Good Tasks

Tasks are the heart of usability testing. Bad tasks produce useless data. Good tasks reveal real usability problems.

Task Structure

Every task should include:

  1. Context/scenario: Why the user is doing this
  2. Goal: What they need to accomplish
  3. No instructions: Don't tell them how to do it

Good vs. Bad Tasks

| Bad Task | Why It's Bad | Good Task |
|---|---|---|
| "Click the Search button and type 'running shoes'" | Tells them exactly what to do, tests nothing | "You need new running shoes for a marathon. Find a pair you'd buy." |
| "Navigate to Settings > Account > Security" | Gives the answer away | "You want to change your password. Go ahead and do that." |
| "Test the checkout flow" | Too vague, no scenario | "You've found a gift for a friend. Complete the purchase using any payment method." |
| "Find the FAQ page" | Tests navigation, not whether the FAQ solves their problem | "Your order hasn't arrived. Find out what to do." |
| "Do you like the new design?" | Opinion question, not a task | "You want to track your monthly spending. Walk me through how you'd do that." |

Task Difficulty Spectrum

Include a mix of task difficulties:

| Difficulty | Purpose | Example |
|---|---|---|
| Easy (warm-up) | Build confidence, calibrate | "Find the pricing page" |
| Medium | Core functionality testing | "Add an item to your cart and apply a promo code" |
| Hard | Edge cases, complex flows | "Return a gift item purchased by someone else" |
| Exploratory | Discovery, open-ended | "Explore the dashboard and tell me what you can learn about your account" |

How Many Tasks?

| Session Length | Number of Tasks | Notes |
|---|---|---|
| 15 min (guerrilla/unmoderated) | 3-5 tasks | Quick, focused |
| 30 min (remote) | 5-7 tasks | Standard session |
| 45-60 min (moderated) | 6-10 tasks | Includes follow-up questions |

Facilitating a Session

Before the Session

  • Test your recording software
  • Prepare a printed/digital test script
  • Have the prototype/product ready
  • Remove any personal data from test accounts
  • Silence your phone

During the Session

| Do | Don't |
|---|---|
| Read the task verbatim | Paraphrase and accidentally add hints |
| Stay silent while they work | Fill silences with hints or explanations |
| Ask "What are you thinking?" when they pause | Ask "Why did you click that?" (feels judgmental) |
| Say "There are no wrong answers" | Say "That's not right" or "Try the other button" |
| Note what they do, not just what they say | Only record verbal feedback |
| Ask follow-up questions after they finish the task | Interrupt them mid-task |
| End on time even if tasks remain | Run over and exhaust the participant |

The Think-Aloud Protocol

Ask participants to narrate their thoughts as they work:

"Please think out loud as you go through this. Tell me what you're looking for, what you expect to happen, and what you're thinking as you make decisions."

If they go silent:

  • "What are you thinking right now?"
  • "What are you looking for?"
  • "What do you expect to happen next?"

Don't: Ask "Why did you do that?" during the task. It feels like being tested. Save "why" questions for after the task is complete.

Metrics to Track

Quantitative Metrics

| Metric | What It Measures | How to Capture | Benchmark |
|---|---|---|---|
| Task success rate | Can users complete the task? | Pass/fail per task per user | > 78% (industry average) |
| Time on task | How long does it take? | Timer per task | Compare to your target/baseline |
| Error rate | How often do users make mistakes? | Count wrong clicks, backtracking | Lower is better; compare over time |
| Lostness score | How much do users wander? | (Unique pages visited - optimal pages) / total pages | 0 = optimal, higher = more lost |
| Misclick rate | How often do users click the wrong thing? | Count clicks on non-target elements | Compare between designs |
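
To make the quantitative metrics concrete, here is a minimal Python sketch that rolls per-participant results for one task up into a success rate, median time on task, and the lostness score as defined in the table above. The records and field names are hypothetical example data, not output from any particular tool.

```python
from statistics import median

# One record per participant for a single task; field names are hypothetical.
results = [
    {"success": True,  "seconds": 142, "pages_visited": 6,  "unique_pages": 5, "optimal_pages": 4},
    {"success": True,  "seconds": 97,  "pages_visited": 4,  "unique_pages": 4, "optimal_pages": 4},
    {"success": False, "seconds": 300, "pages_visited": 11, "unique_pages": 8, "optimal_pages": 4},
]

success_rate = sum(r["success"] for r in results) / len(results)
median_time = median(r["seconds"] for r in results)

def lostness(r: dict) -> float:
    # Simplified formula from the table above:
    # (unique pages visited - optimal pages) / total pages,
    # interpreting "total pages" as total page views in the session.
    return (r["unique_pages"] - r["optimal_pages"]) / r["pages_visited"]

mean_lostness = sum(lostness(r) for r in results) / len(results)

print(f"Task success rate: {success_rate:.0%}")   # 67%
print(f"Median time on task: {median_time}s")     # 142s
print(f"Mean lostness: {mean_lostness:.2f}")      # 0.18
```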

Qualitative Metrics

| Metric | What It Measures | How to Capture |
|---|---|---|
| Frustration indicators | Points of struggle | Sighs, swearing, face touching, long pauses |
| Confusion points | Where users don't understand | "I'm not sure what this means", scanning behavior |
| Delight moments | What works well | "Oh, that's nice!", smiles, quick task completion |
| Mental model gaps | Mismatch between expectation and reality | "I expected this to be under Settings" |
| Workarounds | When users find unofficial solutions | "I usually just Google the answer instead" |

Post-Task Questionnaires

Capture perception after each task:

Single Ease Question (SEQ):

"Overall, how easy or difficult was this task?" 1 (Very Difficult) to 7 (Very Easy) Average: 5.5. Below 5 indicates a problem.

After all tasks, administer the System Usability Scale (SUS):

10 alternating positive/negative statements, rated 1-5:

  1. I think I would like to use this system frequently
  2. I found the system unnecessarily complex
  3. I thought the system was easy to use
  4. I think I would need tech support to use this
  5. I found the functions well integrated
  6. I thought there was too much inconsistency
  7. I imagine most people would learn this quickly
  8. I found the system cumbersome to use
  9. I felt confident using the system
  10. I needed to learn a lot before using this

SUS Scoring:

  • Odd-numbered items (1, 3, 5, 7, 9): score - 1
  • Even-numbered items (2, 4, 6, 8, 10): 5 - score
  • Sum the adjusted values and multiply by 2.5
  • Result: a score on a 0-100 scale
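
As a sketch of that arithmetic, here is the scoring rule in Python; the responses are made-up example data for a single participant.

```python
# SUS scoring: 10 items rated 1-5. Odd-numbered items contribute
# (score - 1), even-numbered items contribute (5 - score); the sum of
# the contributions is multiplied by 2.5 to give a 0-100 score.

def sus_score(responses: list[int]) -> float:
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = 0
    for item, score in enumerate(responses, start=1):
        total += (score - 1) if item % 2 == 1 else (5 - score)
    return total * 2.5

# Example: one participant's (hypothetical) ratings for items 1-10.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 2]))  # -> 82.5
```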

| SUS Score | Interpretation | Grade |
|---|---|---|
| 0-50 | Poor usability | F |
| 51-67 | Below average | D |
| 68 | Average (industry benchmark) | C |
| 69-80 | Good | B |
| 81-90 | Excellent | A |
| 91-100 | Best imaginable | A+ |

Analyzing Results

Step 1: Compile Findings

For each task, document:

  • Success/failure for each participant
  • Time taken
  • Errors observed
  • Quotes and reactions
  • Observations about behavior

Step 2: Categorize Severity

| Severity | Definition | Examples | Action |
|---|---|---|---|
| Critical | Prevents task completion | Can't find checkout button, form doesn't submit | Fix before launch |
| Major | Causes significant difficulty or frustration | Confusing error message, unclear navigation | Fix in next sprint |
| Minor | Causes slight hesitation but users recover | Unexpected label, small layout issue | Fix when convenient |
| Cosmetic | Noticed but doesn't affect task success | Color seems off, spacing feels uneven | Backlog |

Step 3: Prioritize

Plot issues on an impact/frequency matrix:

              High Impact
                   │
     FIX SOON      │      FIX NOW
  (severe, rare)   │  (severe, common)
                   │
───────────────────┼───────────────────
                   │
     BACKLOG       │      MONITOR
  (minor, rare)    │  (minor, common)
                   │
              Low Impact
   Low Frequency        High Frequency
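
If you track findings in a spreadsheet or script, a small helper like the sketch below can sort them into these quadrants. The severity labels match the table above; the 50% frequency threshold is an arbitrary illustration you should tune to your study size.

```python
# Place each finding in a quadrant of the impact/frequency matrix.
# "affected" = number of participants who hit the issue.

def quadrant(severity: str, affected: int, participants: int) -> str:
    severe = severity in ("critical", "major")
    common = affected / participants >= 0.5   # illustrative threshold
    if severe and common:
        return "FIX NOW"
    if severe:
        return "FIX SOON"
    if common:
        return "MONITOR"
    return "BACKLOG"

print(quadrant("major", affected=4, participants=5))  # FIX NOW
print(quadrant("minor", affected=1, participants=5))  # BACKLOG
```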

Step 4: Create Recommendations

For each issue, document:

ISSUE: Users can't find the "Apply Coupon" field during checkout
SEVERITY: Major
OBSERVED: 4 of 5 participants missed it
EVIDENCE: "I have a coupon code but I don't see where to enter it"
          Participants scrolled past it, looked in cart summary
RECOMMENDATION: Move coupon field above the order summary,
                add a visible "Have a promo code?" link
BEFORE: [screenshot]
AFTER: [mockup]

Step 5: Share Results

| Format | Audience | Content |
|---|---|---|
| Highlight reel (2-5 min video) | Everyone | The most impactful moments clipped together |
| Executive summary (1 page) | Leadership | Top 3-5 findings, severity, business impact |
| Detailed report | Product/design team | All findings, severity ratings, recommendations |
| Live debrief (30 min meeting) | Cross-functional team | Walk through findings, discuss priorities |

Running Unmoderated Tests

Tools

| Tool | Strengths | Price Range |
|---|---|---|
| Maze | Prototype testing, heatmaps, metrics | Free-$99/mo |
| UserTesting | Large participant pool, video recordings | $$$$ |
| Lookback | Live and unmoderated, screen + webcam | $$-$$$ |
| Hotjar | In-context feedback, heatmaps, recordings | Free-$99/mo |
| UsabilityHub/Lyssna | Quick preference tests, 5-second tests | $-$$ |

Unmoderated Test Structure

1. WELCOME SCREEN
   "Thanks for participating! This test takes about 10 minutes.
   We're testing a design, not you. There are no wrong answers."

2. SCREENER QUESTIONS (2-3)
   Filter out non-qualifying participants

3. TASKS (3-5)
   Task description → Participant completes task → Post-task question (SEQ)

4. POST-TEST QUESTIONS (3-5)
   SUS or overall impressions

5. THANK YOU
   "Thanks! Your $X gift card will arrive within 24 hours."

Specialized Test Types

First-Click Testing

Show a design and ask: "Where would you click first to [accomplish goal]?" If the first click is correct, users succeed 87% of the time. If the first click is wrong, success drops to 46%.

5-Second Test

Show a design for 5 seconds, then ask:

  • "What is this page about?"
  • "What do you remember?"
  • "What would you do on this page?"

Tests first impressions and visual hierarchy.

A/B Testing

| Factor | Guideline |
|---|---|
| Sample size | Minimum 1000+ per variant for statistical significance |
| Duration | Run for at least 1-2 full business cycles (2-4 weeks) |
| Variables | Change only ONE thing at a time |
| Statistical significance | Wait for 95% confidence before declaring a winner |
| Metric | Define your primary metric BEFORE the test starts |
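
As an illustration of the 95%-confidence guideline, here is a minimal two-proportion z-test using only Python's standard library. It's a sketch with made-up numbers, not a substitute for a proper experimentation platform.

```python
import math

def conversion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for 'variant B converts differently than A',
    using a two-proportion z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical results: 1200 visitors per variant.
p = conversion_p_value(conv_a=96, n_a=1200, conv_b=132, n_b=1200)
print(f"p-value: {p:.3f}")  # below 0.05 -> significant at 95% confidence
```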

Common Mistakes

| Mistake | Impact | Fix |
|---|---|---|
| Testing with colleagues | They know too much, they'll always succeed | Recruit external participants matching your user profile |
| Leading participants | They follow your hints instead of their instincts | Read tasks verbatim, stay silent while they work |
| Only testing the happy path | You miss edge cases and error scenarios | Include tasks that may trigger errors or dead ends |
| Too many tasks per session | Participant fatigue, rushing on later tasks | 5-7 tasks in 30 min, 6-10 in 60 min |
| Not recording sessions | Rely on memory, miss details, can't share clips | Always record (with consent); video is more persuasive than notes |
| Testing too late | Design is finished, findings can't be acted on | Test early with wireframes or prototypes |
| Only reporting problems | Team doesn't know what's working well | Include positive findings, e.g., "5/5 users completed this easily" |
| Not iterating | Fix issues but never verify the fix works | Test again after making changes |

Key Takeaways

  • Test with 5 users, fix, test again. Two small rounds beat one large study.
  • Write scenario-based tasks that give context and goals, not step-by-step instructions.
  • Stay silent during tasks. The urge to help is strong. Resist it.
  • Watch what users do, not what they say. Behavior reveals truth.
  • Categorize findings by severity (critical/major/minor/cosmetic) and prioritize accordingly.
  • Share findings with video clips. A 30-second clip of a user struggling is worth more than a 20-page report.
  • SUS score of 68 is average. Below that, you have usability problems to fix.
  • Test early with wireframes. Don't wait for a polished product.