Usability Testing
Usability testing is watching real people try to use your product. Not asking them what they think. Not having them fill out a survey. Watching them attempt actual tasks and observing where they struggle, succeed, and give up. It's the highest-impact research method and the one most teams skip.
The 5-User Rule
Nielsen Norman Group's research shows that 5 users find approximately 85% of usability problems. You don't need a massive study. You need a small study done often.
Usability problems found vs. number of test participants:
[Chart: cumulative share of problems found vs. number of participants. The curve climbs steeply over the first few users, reaches roughly 85% at 5 participants, and flattens toward 100% beyond that.]
5 users → ~85% of problems found
Beyond 5 → diminishing returns
Better approach: Test with 5 users, fix the problems, then test with 5 more users. Two rounds of 5 find more problems than one round of 10.
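The curve comes from a simple model: if each participant independently uncovers a fixed share of the problems (Nielsen and Landauer estimated roughly 31% per user, though the rate varies by project), the expected proportion found by n participants is 1 - (1 - 0.31)^n. A minimal sketch in Python:

```python
def problems_found(n_participants: int, discovery_rate: float = 0.31) -> float:
    """Expected share of usability problems found by n participants,
    assuming each one independently uncovers `discovery_rate` of them
    (0.31 is Nielsen and Landauer's estimate; it varies by project)."""
    return 1 - (1 - discovery_rate) ** n_participants

for n in range(1, 9):
    print(f"{n} participants: {problems_found(n):.0%} of problems found")
# 5 participants: ~84%, which is roughly where the curve starts to flatten
```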
Test Types
| Type | Participants | Facilitator | Duration | Cost | Best For |
|---|---|---|---|---|---|
| Moderated in-person | 5-8 | Present in room | 45-60 min | High | Deep insights, complex products, early-stage designs |
| Moderated remote | 5-8 | Video call | 30-60 min | Medium | Geographic reach, pandemic-friendly, screen sharing |
| Unmoderated remote | 10-50 | Recorded, no facilitator | 10-20 min | Low | Quick validation, high volume, simple tasks |
| Guerrilla | 5-10 | Informal, in a cafe or hallway | 5-15 min | Very Low | Fast feedback on specific interactions |
Choosing the Right Type
| Question | Type to Use |
|---|---|
| Need to understand why users struggle? | Moderated (ask follow-up questions) |
| Need fast results from many users? | Unmoderated |
| Testing a complex, multi-step workflow? | Moderated (need to guide and observe) |
| Testing a specific button/layout/page? | Unmoderated (simple, focused task) |
| No budget and need feedback today? | Guerrilla (find people and ask them) |
| Users are geographically distributed? | Remote (moderated or unmoderated) |
Moderated vs. Unmoderated: Tradeoffs
| Aspect | Moderated | Unmoderated |
|---|---|---|
| Follow-up questions | Yes, probe deeper on interesting moments | No, limited to pre-set tasks |
| Cost per participant | $100-300+ (incentive + facilitator time) | $10-50 (platform fee + small incentive) |
| Time to results | 1-2 weeks (sessions + analysis) | 2-3 days (automated recording) |
| Depth of insight | Deep: observe body language, ask "why?" | Shallow: observe behavior only |
| Facilitator bias | Risk of leading participants | No facilitator bias |
| Participant quality | Higher (screened, scheduled) | Lower (faster, less committed) |
Planning a Usability Test
Define Your Goals
Before writing tasks, answer these questions:
| Question | Example Answer |
|---|---|
| What are you testing? | The new checkout flow redesign |
| What do you want to learn? | Can users complete a purchase without help? Where do they get stuck? |
| What will you do with the results? | Fix the top 3 usability issues before launch |
| What's your success benchmark? | 80%+ task completion rate, average < 3 minutes |
Recruit the Right Participants
| Criterion | Why It Matters |
|---|---|
| Match your target user profile | Testing with the wrong people gives misleading results |
| Mix of tech comfort levels | Your users aren't all power users |
| Not employees or close friends | They know too much about the product |
| Screener survey | Filter for relevant experience and demographics |
| Incentive | $50-100 per session (or equivalent gift card) |
Screener question examples:
- "How often do you shop online?" (Filter for frequency)
- "Which of these tools have you used in the past 6 months?" (Filter for experience)
- "What is your primary role at work?" (Filter for job relevance)
Writing Good Tasks
Tasks are the heart of usability testing. Bad tasks produce useless data. Good tasks reveal real usability problems.
Task Structure
Every task should include:
- Context/scenario: Why the user is doing this
- Goal: What they need to accomplish
- No instructions: Don't tell them how to do it
Good vs. Bad Tasks
| Bad Task | Why It's Bad | Good Task |
|---|---|---|
| "Click the Search button and type 'running shoes'" | Tells them exactly what to do, tests nothing | "You need new running shoes for a marathon. Find a pair you'd buy." |
| "Navigate to Settings > Account > Security" | Gives the answer away | "You want to change your password. Go ahead and do that." |
| "Test the checkout flow" | Too vague, no scenario | "You've found a gift for a friend. Complete the purchase using any payment method." |
| "Find the FAQ page" | Tests navigation, not whether FAQ solves their problem | "Your order hasn't arrived. Find out what to do." |
| "Do you like the new design?" | Opinion question, not a task | "You want to track your monthly spending. Walk me through how you'd do that." |
Task Difficulty Spectrum
Include a mix of task difficulties:
| Difficulty | Purpose | Example |
|---|---|---|
| Easy (warm-up) | Build confidence, calibrate | "Find the pricing page" |
| Medium | Core functionality testing | "Add an item to your cart and apply a promo code" |
| Hard | Edge cases, complex flows | "Return a gift item purchased by someone else" |
| Exploratory | Discovery, open-ended | "Explore the dashboard and tell me what you can learn about your account" |
How Many Tasks?
| Session Length | Number of Tasks | Notes |
|---|---|---|
| 15 min (guerrilla/unmoderated) | 3-5 tasks | Quick, focused |
| 30 min (remote) | 5-7 tasks | Standard session |
| 45-60 min (moderated) | 6-10 tasks | Includes follow-up questions |
Facilitating a Session
Before the Session
- Test your recording software
- Prepare a printed/digital test script
- Have the prototype/product ready
- Remove any personal data from test accounts
- Silence your phone
During the Session
| Do | Don't |
|---|---|
| Read the task verbatim | Paraphrase and accidentally add hints |
| Stay silent while they work | Fill silences with hints or explanations |
| Ask "What are you thinking?" when they pause | Ask "Why did you click that?" (feels judgmental) |
| Say "There are no wrong answers" | Say "That's not right" or "Try the other button" |
| Note what they do, not just what they say | Only record verbal feedback |
| Ask follow-up after they finish the task | Interrupt them mid-task |
| End on time even if tasks remain | Run over and exhaust the participant |
The Think-Aloud Protocol
Ask participants to narrate their thoughts as they work:
"Please think out loud as you go through this. Tell me what you're looking for, what you expect to happen, and what you're thinking as you make decisions."
If they go silent:
- "What are you thinking right now?"
- "What are you looking for?"
- "What do you expect to happen next?"
Don't: Ask "Why did you do that?" during the task. It feels like being tested. Save "why" questions for after the task is complete.
Metrics to Track
Quantitative Metrics
| Metric | What It Measures | How to Capture | Benchmark |
|---|---|---|---|
| Task success rate | Can users complete the task? | Pass/fail per task per user | > 78% (industry average) |
| Time on task | How long does it take? | Timer per task | Compare to your target/baseline |
| Error rate | How often do users make mistakes? | Count wrong clicks, backtracking | Lower is better, compare over time |
| Lostness score | How much do users wander? | Smith's formula: sqrt((N/S - 1)^2 + (R/N - 1)^2), where S = total pages visited, N = unique pages visited, R = minimum pages needed | 0 = optimal, higher = more lost |
| Misclick rate | How often do users click the wrong thing? | Count clicks on non-target elements | Compare between designs |
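These metrics are straightforward to compute once each attempt is logged. A minimal sketch, assuming a hypothetical list of per-attempt records (the field names are illustrative, not from any particular tool), with lostness computed via Smith's formula from the table above:

```python
from math import sqrt

# Hypothetical attempt records -- field names are illustrative only.
attempts = [
    {"participant": "P1", "success": True,  "seconds": 94,
     "pages_visited": 7, "unique_pages": 5, "optimal_pages": 4},
    {"participant": "P2", "success": False, "seconds": 210,
     "pages_visited": 12, "unique_pages": 8, "optimal_pages": 4},
    {"participant": "P3", "success": True,  "seconds": 120,
     "pages_visited": 5, "unique_pages": 4, "optimal_pages": 4},
]

def lostness(total: int, unique: int, optimal: int) -> float:
    """Smith's lostness measure: 0 = perfectly efficient navigation."""
    return sqrt((unique / total - 1) ** 2 + (optimal / unique - 1) ** 2)

success_rate = sum(a["success"] for a in attempts) / len(attempts)
avg_time = sum(a["seconds"] for a in attempts) / len(attempts)
avg_lostness = sum(
    lostness(a["pages_visited"], a["unique_pages"], a["optimal_pages"])
    for a in attempts
) / len(attempts)

print(f"Task success rate: {success_rate:.0%}")
print(f"Average time on task: {avg_time:.0f}s")
print(f"Average lostness: {avg_lostness:.2f}")
```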
Qualitative Metrics
| Metric | What It Measures | How to Capture |
|---|---|---|
| Frustration indicators | Points of struggle | Sighs, swearing, face touching, long pauses |
| Confusion points | Where users don't understand | "I'm not sure what this means", scanning behavior |
| Delight moments | What works well | "Oh, that's nice!", smiles, quick task completion |
| Mental model gaps | Mismatch between expectation and reality | "I expected this to be under Settings" |
| Workarounds | When users find unofficial solutions | "I usually just Google the answer instead" |
Post-Task Questionnaires
Capture perception after each task:
Single Ease Question (SEQ):
"Overall, how easy or difficult was this task?" 1 (Very Difficult) to 7 (Very Easy) Average: 5.5. Below 5 indicates a problem.
After all tasks, System Usability Scale (SUS):
10 alternating positive/negative statements, rated 1-5:
- I think I would like to use this system frequently
- I found the system unnecessarily complex
- I thought the system was easy to use
- I think I would need tech support to use this
- I found the functions well integrated
- I thought there was too much inconsistency
- I imagine most people would learn this quickly
- I found the system cumbersome to use
- I felt confident using the system
- I needed to learn a lot before using this
SUS Scoring:
- Odd-numbered items (positive statements): response - 1
- Even-numbered items (negative statements): 5 - response
- Sum the adjusted values and multiply by 2.5
- Result: a 0-100 scale
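A small sketch of that arithmetic in Python (the example responses are made up):

```python
def sus_score(responses: list[int]) -> float:
    """Compute a System Usability Scale score from ten 1-5 responses,
    given in the order the statements are asked (items 1, 3, 5... are positive)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    adjusted = [
        (r - 1) if i % 2 == 0 else (5 - r)  # odd-numbered vs. even-numbered items
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) * 2.5  # scales the 0-40 sum to a 0-100 score

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # hypothetical participant -> 80.0
```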
| SUS Score | Interpretation | Grade |
|---|---|---|
| 0-50 | Poor usability | F |
| 51-67 | Below average | D |
| 68 | Average (industry benchmark) | C |
| 69-80 | Good | B |
| 81-90 | Excellent | A |
| 91-100 | Best imaginable | A+ |
Analyzing Results
Step 1: Compile Findings
For each task, document:
- Success/failure for each participant
- Time taken
- Errors observed
- Quotes and reactions
- Observations about behavior
Step 2: Categorize Severity
| Severity | Definition | Examples | Action |
|---|---|---|---|
| Critical | Prevents task completion | Can't find checkout button, form doesn't submit | Fix before launch |
| Major | Causes significant difficulty or frustration | Confusing error message, unclear navigation | Fix in next sprint |
| Minor | Causes slight hesitation but users recover | Unexpected label, small layout issue | Fix when convenient |
| Cosmetic | Noticed but doesn't affect task success | Color seems off, spacing feels uneven | Backlog |
Step 3: Prioritize
Plot issues on an impact/frequency matrix:
|  | Low Frequency | High Frequency |
|---|---|---|
| High Impact | Fix soon (rare but severe) | Fix now (critical) |
| Low Impact | Backlog | Monitor (common but minor) |
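Once findings are logged with an impact rating and the share of participants affected, the same sorting can be scripted. A minimal sketch with hypothetical issue records and an arbitrary 50% frequency threshold:

```python
# Hypothetical findings -- "frequency" is the share of participants affected.
issues = [
    {"issue": "Coupon field not found",   "impact": "major",    "frequency": 0.8},
    {"issue": "Form does not submit",     "impact": "critical", "frequency": 0.4},
    {"issue": "Label wording unexpected", "impact": "minor",    "frequency": 0.6},
]

IMPACT_RANK = {"critical": 3, "major": 2, "minor": 1, "cosmetic": 0}

def quadrant(issue: dict, freq_threshold: float = 0.5) -> str:
    """Place an issue in the impact/frequency matrix."""
    high_impact = IMPACT_RANK[issue["impact"]] >= 2  # major or critical
    high_freq = issue["frequency"] >= freq_threshold
    if high_impact and high_freq:
        return "FIX NOW"
    if high_impact:
        return "FIX SOON"
    return "MONITOR" if high_freq else "BACKLOG"

for i in sorted(issues, key=lambda x: (IMPACT_RANK[x["impact"]], x["frequency"]), reverse=True):
    print(f'{quadrant(i):8}  {i["issue"]} ({i["impact"]}, {i["frequency"]:.0%} of participants)')
```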
Step 4: Create Recommendations
For each issue, document:
ISSUE: Users can't find the "Apply Coupon" field during checkout
SEVERITY: Major
OBSERVED: 4 of 5 participants missed it
EVIDENCE: "I have a coupon code but I don't see where to enter it"
Participants scrolled past it, looked in cart summary
RECOMMENDATION: Move coupon field above the order summary,
add a visible "Have a promo code?" link
BEFORE: [screenshot]
AFTER: [mockup]
Step 5: Share Results
| Format | Audience | Content |
|---|---|---|
| Highlight reel (2-5 min video) | Everyone | The most impactful moments clipped together |
| Executive summary (1 page) | Leadership | Top 3-5 findings, severity, business impact |
| Detailed report | Product/design team | All findings, severity ratings, recommendations |
| Live debrief (30 min meeting) | Cross-functional team | Walk through findings, discuss priorities |
Running Unmoderated Tests
Tools
| Tool | Strengths | Price Range |
|---|---|---|
| Maze | Prototype testing, heatmaps, metrics | Free-$99/mo |
| UserTesting | Large participant pool, video recordings | $$$$ |
| Lookback | Live and unmoderated, screen + webcam | $$-$$$ |
| Hotjar | In-context feedback, heatmaps, recordings | Free-$99/mo |
| UsabilityHub/Lyssna | Quick preference tests, 5-second tests | $-$$ |
Unmoderated Test Structure
1. WELCOME SCREEN
"Thanks for participating! This test takes about 10 minutes.
We're testing a design, not you. There are no wrong answers."
2. SCREENER QUESTIONS (2-3)
Filter out non-qualifying participants
3. TASKS (3-5)
Task description → Participant completes task → Post-task question (SEQ)
4. POST-TEST QUESTIONS (3-5)
SUS or overall impressions
5. THANK YOU
"Thanks! Your $X gift card will arrive within 24 hours."
Specialized Test Types
First-Click Testing
Show a design and ask: "Where would you click first to [accomplish goal]?" If the first click is correct, users succeed 87% of the time. If the first click is wrong, success drops to 46%.
5-Second Test
Show a design for 5 seconds, then ask:
- "What is this page about?"
- "What do you remember?"
- "What would you do on this page?"
Tests first impressions and visual hierarchy.
A/B Testing
| Factor | Guideline |
|---|---|
| Sample size | Depends on baseline rate and the effect size you want to detect; run a power calculation first (1,000+ per variant is a bare minimum) |
| Duration | Run for at least 1-2 full business cycles (2-4 weeks) |
| Variables | Change only ONE thing at a time |
| Statistical significance | Wait for 95% confidence before declaring a winner |
| Metric | Define your primary metric BEFORE the test starts |
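The 95%-confidence check on a conversion-style metric is typically a two-proportion z-test. A minimal sketch using only the Python standard library (the conversion counts are made up):

```python
from math import sqrt, erf

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Normal CDF via the error function; doubled for a two-sided test.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical results: 230/4000 conversions for A vs. 290/4000 for B.
p = two_proportion_p_value(230, 4000, 290, 4000)
print(f"p = {p:.3f} -> {'significant at 95%' if p < 0.05 else 'keep the test running'}")
```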
Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Testing with colleagues | They know too much, they'll always succeed | Recruit external participants matching your user profile |
| Leading participants | They follow your hints instead of their instincts | Read tasks verbatim, stay silent while they work |
| Only testing the happy path | You miss edge cases and error scenarios | Include tasks that may trigger errors or dead ends |
| Too many tasks per session | Participant fatigue, rushing on later tasks | 5-7 tasks in 30 min, 6-10 in 60 min |
| Not recording sessions | Rely on memory, miss details, can't share clips | Always record (with consent). Video is more persuasive than notes. |
| Testing too late | Design is finished, findings can't be acted on | Test early with wireframes or prototypes |
| Only reporting problems | Team doesn't know what's working well | Include positive findings, e.g., "5/5 users completed this easily" |
| Not iterating | Fix issues but never verify the fix works | Test again after making changes |
Key Takeaways
- Test with 5 users, fix, test again. Two small rounds beat one large study.
- Write scenario-based tasks that give context and goals, not step-by-step instructions.
- Stay silent during tasks. The urge to help is strong. Resist it.
- Watch what users do, not what they say. Behavior reveals truth.
- Categorize findings by severity (critical/major/minor/cosmetic) and prioritize accordingly.
- Share findings with video clips. A 30-second clip of a user struggling is worth more than a 20-page report.
- SUS score of 68 is average. Below that, you have usability problems to fix.
- Test early with wireframes. Don't wait for a polished product.