Ch4 Performance metrics

Chapter 4: Performance Metrics

Presenter: 00335011 魏傳諺

Agenda

• Preface

• Task Success

• Time-on-Task

• Errors

• Efficiency

• Learnability

Preface of Performance Metrics

• Based on specific user behaviors

– User behaviors

– The use of scenarios or tasks

• How well users are actually using a product

• Useful to estimate the magnitude of a specific usability issue

– How many people are likely to encounter the same issue after the product is released?

– How many users are able to successfully complete a core set of tasks using a product?

• Not the magical elixir for every situation

– Require an adequate sample size

– Cost time and money

– Tell you the what very effectively, but not the why

Five Basic Types

• Task Success

– The most widely used performance metric

– How effectively users are able to complete a given set of tasks

• Time-on-Task

– How much time is required to complete a task

• Errors

– Reflect the mistakes made during a task

• Efficiency

– The amount of effort a user expends to complete a task

• Learnability

– How performance changes over time

TASK SUCCESS

Task Success

• The most common usability metric

• As long as the user has a well-defined task, you can measure

success

Collecting Any Type of Success Metric

• Each task must have a clear end-state

– Define the success criteria before data collection

• Find the current price for a share of Google stock (clear end-state)

• Research ways to save for your retirement (not a clear end-state)

• Ways to collect success data

– Verbally articulate the answer after completing the task

– Provide their answers in a more structured way

• Try to avoid write-in answers if possible

• In some cases the correct solution to a task may not be verifiable

– depends on the user’s specific situation

– testing is not being performed in person

Binary Success

• Either participants complete a task successfully or they don’t

• How to Collect and Measure

– Code each task as 1 (success) or 0 (failure)

• How to Analyze and Present

– By individual task

– By user or type of user

• Frequency of use

• Previous experience using the product

• Domain expertise

• Age group

• Can calculate the percentage of tasks that each participant successfully completed

– Converts binary data into continuous data

• Calculating Confidence Intervals
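
As a minimal sketch of the binary-success analysis above, the following computes a task's success rate from 0/1 codes together with a confidence interval. The slide does not say which interval method is used; the adjusted Wald (Agresti-Coull) interval here is an assumption, chosen because it behaves reasonably at the small sample sizes typical of usability tests. The data are hypothetical.

```python
import math

def success_rate_with_ci(outcomes, z=1.96):
    """Return (rate, lower, upper) for a list of 0/1 task outcomes.

    Uses the adjusted Wald (Agresti-Coull) interval, an assumed choice;
    z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(outcomes)
    successes = sum(outcomes)
    rate = successes / n
    # Adjusted Wald: add z^2/2 successes and z^2 trials before computing.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp the bounds to the valid range for a proportion.
    return rate, max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical data: 1 = success, 0 = failure for one task, 10 participants.
task_outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
rate, low, high = success_rate_with_ci(task_outcomes)
print(f"Success rate: {rate:.0%} (95% CI {low:.0%} to {high:.0%})")
```

With only ten participants the interval is wide, which is why a success rate is best reported together with its confidence interval.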

Levels of Success

• Partially completing a task?

– coming close to fully completing a task may provide value to the

participant

– Helpful for you to know

• Why some participants failed to complete a task

• With which particular tasks they needed help

Levels of Success (cont’d)

• How to Collect and Measure

– Must define the various levels

– Based on the extent or degree to which a participant completed the task

• Complete Success, Partial Success, and Failure

• Define what constitutes “giving assistance” to the participant

• Assign a numeric value for each level

• Does not differentiate between different types of failure

– Based on the experience in completing a task

• No Problem, Minor Problem, Major Problem, and Failure/Gave up

• Ordinal data, so do not report an average score

– Based on the participant accomplishing the task in different ways

• Depending on the quality of the answer (does not need a numeric score)

Levels of Success (cont’d)

• How to Analyze and Present

– To create a stacked bar chart

– To report a “usability score”
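
The two presentation options above can be sketched as follows: the per-level percentages are the segments of a stacked bar for one task, and the weighted average is one possible “usability score”. The numeric values (1.0 for complete success, 0.5 for partial success, 0.0 for failure) are an assumption for illustration, as are the sample data.

```python
from collections import Counter

# Assumed numeric value for each level of success (illustrative only).
LEVEL_SCORES = {"complete": 1.0, "partial": 0.5, "failure": 0.0}

def level_percentages(levels):
    """Share of participants at each level: one stacked bar per task."""
    counts = Counter(levels)
    total = len(levels)
    return {level: counts.get(level, 0) / total for level in LEVEL_SCORES}

def usability_score(levels):
    """Average of the numeric values assigned to each participant's level."""
    return sum(LEVEL_SCORES[level] for level in levels) / len(levels)

# Hypothetical results for one task, eight participants.
task_levels = ["complete", "complete", "partial", "failure",
               "complete", "partial", "complete", "complete"]

shares = level_percentages(task_levels)
score = usability_score(task_levels)
print(shares)
print(f"Usability score: {score:.2f}")
```

This scoring applies only to the completion-based levels. For experience-based levels such as No Problem or Major Problem the data are ordinal, so report the frequencies rather than an average.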

Issues in Measuring Success

• How to define whether a task was successful?

– When unexpected situations arise

• Make note of them

• Afterward try to reach a consensus

• How or when to end a task

– Stopping rule

• Complete the task, or reach the point at which they would give up or seek assistance

• “Three strikes and you’re out”

• Set a time limit

– If the participant is becoming particularly frustrated or agitated

TIME-ON-TASK

Time-on-Task

• A way to measure the efficiency of any product

– The faster a participant can complete a task, the better the experience

• Exceptions to the assumption that faster is better

– Games

– Learning

Importance of Measuring Time-on-Task

• Particularly important for products

– where tasks are performed repeatedly by the user

• The side benefits of measuring time-on-task

– Increased efficiency leads to cost savings, which can be translated into actual ROI

How to Collect and Measure Time-on-Task

• The time elapsed between the start of a task and the end of a task

– In minutes

– In seconds

• Measure by any time-keeping device

– Start time & End time

– Two people record the times

• Automated Tools for Measuring Time-on-Task

– Less error-prone

– Much less obtrusive

• Turning on and off the Clock

– Rules about how to measure time

• Start the clock as soon as they finish reading the task

• Stop the clock when the participant hits the “answer” button

• Stop timing when the participant has stopped interacting with the product

How to Collect and Measure Time-on-Task (cont’d)

• Tabulating Time Data

Analyzing and Presenting Time-on-Task Data

• Ways to present

– Mean

– Median

– Geometric mean

• Ranges

– Time interval

• Thresholds

– Whether users can complete certain tasks within an acceptable amount of time

• Distributions and Outliers

– Exclude outliers (> 3 SD above the mean)

– Set up thresholds

– Determine the fastest possible time
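
A minimal sketch of the summary statistics listed above, using only the Python standard library. The data and the 50-second threshold are hypothetical; the outlier rule follows the slide (drop times more than 3 SD above the mean).

```python
import statistics

def summarize_times(times, threshold):
    """Summarize time-on-task values (seconds) for one task."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times)
    # Exclude outliers: times more than 3 SD above the mean.
    kept = [t for t in times if t <= mean + 3 * sd]
    return {
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "geometric_mean": statistics.geometric_mean(kept),
        "range": (min(kept), max(kept)),
        # Share of participants who finished within the acceptable time.
        "within_threshold": sum(t <= threshold for t in kept) / len(kept),
        "excluded": len(times) - len(kept),
    }

# Hypothetical times in seconds for 20 participants; 600 is an outlier.
task_times = [35, 42, 38, 51, 47, 33, 44, 39, 56, 41,
              36, 48, 45, 37, 52, 40, 43, 49, 34, 600]

summary = summarize_times(task_times, threshold=50)
for name, value in summary.items():
    print(name, value)
```

Time data are usually skewed toward long times, so the median and geometric mean are less affected by slow participants than the arithmetic mean. Note that with very small samples no value can lie more than 3 SD above the mean, so the exclusion rule only has an effect with larger samples.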

Issues to Consider When Using Time Data

• Only Successful Tasks or All Tasks?

– Advantage of only including successful tasks

• A cleaner measure of efficiency

– Advantage of including all tasks

• A more accurate reflection of the overall user experience

• An independent measure in relation to the task success data

– If the participant always determined when to end the task, include all times

– If the moderator sometimes decided when to end the task, include only successful tasks

• Using a Think-Aloud Protocol?

– Think-aloud protocol: to gain important insight

– Have an impact on the time-on-task data

– Retrospective probing technique

• Should You Tell the Participants about the Time Measurement?

– Perform the tasks as quickly and accurately as possible

ERRORS

Errors

• Usability issue vs. Error

– A usability issue is the underlying cause of a problem

– One or more errors are a possible outcome

• Errors

– incorrect actions that may lead to task failure

When to Measure Errors

• When you want to understand the specific action or set of actions

that may result in task failure

• Errors can tell

– How many mistakes were made

– Where they were made within the product

– How various designs produce different frequencies and types of errors

– How usable something really is

• Three general situations where measuring errors might be useful

– When an error will result in a significant loss in efficiency

– When an error will result in significant costs

– When an error will result in task failure

What Constitutes an Error?

• No widely accepted definition of what constitutes an error

• Based on many different types of incorrect actions by the user

– Entering incorrect data into a form field

– Making the wrong choice in a menu or drop-down list

– Taking an incorrect sequence of actions

– Failing to take a key action

• Determine what constitutes an error

– Make a list of all the possible actions

– Define many of the different types of errors that can be made

What Constitutes an Error? (cont’d)

Collecting and Measuring Errors

• Not always easy

– Need to know what the correct (set of) action(s) should be

• Consideration

– Only a single error opportunity

– Multiple error opportunities

• Way of organizing error data

– Record the number of errors for each task and each user

– Values range from 0 to the maximum number of error opportunities

Analyzing and Presenting Errors

• Tasks with a Single Error Opportunity

– Look at the frequency of the error for each task

• Frequency of errors

• Percentage of participants who made an error for each task

– From an aggregate perspective

• Average the error rates for each task into a single error rate

• Take an average of all the tasks that had a certain number of errors

• Establish maximum acceptable error rates for each task

• Tasks with Multiple Error Opportunities

– Look at the frequency of errors for each task, expressed as an error rate

– The average number of errors made by each participant for each task

– Which tasks fall above or below a threshold

– Weight each type of error with a different value and then calculate an “error score”
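
The measures for tasks with multiple error opportunities can be sketched as below. The error rate is taken here as total errors divided by total error opportunities, which is an assumed definition. The error types, weights, and data are hypothetical.

```python
def error_rate(errors_per_participant, opportunities):
    """Total errors divided by total error opportunities for one task."""
    total_errors = sum(errors_per_participant)
    total_opportunities = opportunities * len(errors_per_participant)
    return total_errors / total_opportunities

def mean_errors(errors_per_participant):
    """Average number of errors made per participant on one task."""
    return sum(errors_per_participant) / len(errors_per_participant)

# Hypothetical weights: more serious error types count for more.
ERROR_WEIGHTS = {"trivial": 1, "moderate": 2, "serious": 3}

def error_score(error_counts):
    """Weighted sum of errors, given a count for each error type."""
    return sum(ERROR_WEIGHTS[kind] * count for kind, count in error_counts.items())

# Hypothetical task with 6 error opportunities and 5 participants.
errors = [0, 2, 1, 3, 0]
rate = error_rate(errors, opportunities=6)
average = mean_errors(errors)
score = error_score({"trivial": 2, "serious": 1})
print(f"Error rate: {rate:.0%}, mean errors: {average}, error score: {score}")
```

Comparing the rate or the mean against a maximum acceptable value shows which tasks fall above or below the threshold.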

Issues to Consider When Using Error Metrics

• Make sure you are not double-counting errors

• Need to know

– An error rate, and

– Why different errors are occurring

• In some cases an error is the same as failing to complete a task

– Report these errors as task failures

EFFICIENCY

Efficiency

• Time-on-task is often used as a measure of efficiency

• Look at the amount of effort required to complete a task

– In most products, the goal is to minimize the amount of effort

– Two types of effort

• Cognitive

– Finding the right place to perform an action

– Deciding what action is necessary

– Interpreting the results of the action

• Physical

– The physical activity required to take action

Collecting and Measuring Efficiency

• Identify the action(s) to be measured

• Define the start and end of an action

• Count the actions

• Actions must be meaningful

– Incremental increase in cognitive effort

– Incremental increase in physical effort

• Look only at successful tasks

Analyzing and Presenting Efficiency Data

• The number of actions each participant takes to complete a task

– if some tasks are more complicated than others, it may be misleading

• Lostness

– N: The number of different web pages visited while performing the task

– S: The total number of pages visited while performing the task

– R: The minimum (optimum) number of pages that must be visited to accomplish the task

– A perfect lostness score would be 0

– Participants with a lostness score greater than 0.5 definitely did appear to be lost

– The average lostness value
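
The formula itself did not survive on the slide. Assuming the standard definition of lostness in terms of N, S, and R, L = sqrt((N/S − 1)² + (R/N − 1)²), a minimal sketch looks like this (the function and parameter names are illustrative):

```python
import math

def lostness(n_unique, s_total, r_minimum):
    """Lostness from unique pages (N), total pages (S), and minimum pages (R)."""
    return math.sqrt((n_unique / s_total - 1) ** 2
                     + (r_minimum / n_unique - 1) ** 2)

# A participant who visits exactly the minimum pages, each once, is not lost.
perfect = lostness(n_unique=3, s_total=3, r_minimum=3)

# Hypothetical participant: 8 page visits, 6 different pages, 3 pages needed.
wandering = lostness(n_unique=6, s_total=8, r_minimum=3)

print(f"perfect: {perfect:.2f}, wandering: {wandering:.2f}")
```

The first term grows when pages are revisited (S larger than N), and the second grows when unnecessary pages are visited (N larger than R). Averaging the values across participants gives the average lostness for a task.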

Analyzing and Presenting Efficiency Data (cont’d)

Efficiency as a Combination of Task Success and Time

• Task Success + Time-on-Task

• Core measure of efficiency

– The ratio of the task completion rate to the mean time per task
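
As a sketch of this ratio, the following divides the task completion rate by the mean time per task. Expressing time in minutes is an assumption made so the result reads as successes per minute. The data are hypothetical.

```python
def efficiency(successes, times_seconds):
    """Task completion rate divided by mean time-on-task in minutes."""
    completion_rate = sum(successes) / len(successes)
    mean_minutes = (sum(times_seconds) / len(times_seconds)) / 60
    return completion_rate / mean_minutes

# Hypothetical task: 3 of 4 participants succeeded, mean time 120 seconds.
task_successes = [1, 1, 0, 1]
task_times = [90, 120, 150, 120]

result = efficiency(task_successes, task_times)
print(f"Efficiency: {result:.1%} per minute")
```

A higher value means more task success per unit of time, so it can be used to compare tasks or designs on a single scale.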

LEARNABILITY

Learnability

• Most products, especially new ones, require some amount of learning

• Experience

– Based on the amount of time spent using a product

– Based on the variety of tasks performed

• Learning

– Sometimes quick and painless

– At other times quite arduous and time consuming

• Learnability

– The extent to which something can be learned

– How much time and effort are required to become proficient

– When learning happens over a short period of time, the focus is on maximizing efficiency

– When learning happens over a longer period of time, it relies heavily on memory

Collecting and Measuring Learnability Data

• Collection and measurement are basically the same as for the other performance metrics

• Collect the data at multiple times

– Based on expected frequency of use

• Decide which metrics to use

• Decide how much time to allow between trials

• Alternatives

– Trials within the same session

– Trials within the same session but with breaks between tasks

– Trials between sessions

Analyzing and Presenting Learnability Data

• By examining a specific performance metric

• Interpret the chart

– Notice the slope of the line(s)

– Notice the point of asymptote, or essentially where the line starts to flatten out

– Look at the difference between the highest and lowest values on the y-axis

• Compare learnability across different conditions
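
The three chart readings above can also be computed directly from the per-trial averages. This is a rough sketch: the 5% tolerance used to decide where the curve flattens is an arbitrary assumption, and the data are hypothetical mean times-on-task.

```python
def learning_summary(trial_means, tolerance=0.05):
    """Summarize a learning curve given one mean value per trial."""
    # Trial-to-trial change: the slope of each segment of the line.
    changes = [later - earlier
               for earlier, later in zip(trial_means, trial_means[1:])]
    # First trial whose relative change from the previous trial is small,
    # taken here as the approximate point where the line flattens out.
    plateau_trial = None
    for trial, (previous, change) in enumerate(zip(trial_means, changes), start=2):
        if abs(change) / previous < tolerance:
            plateau_trial = trial
            break
    return {
        "changes": changes,
        "spread": max(trial_means) - min(trial_means),
        "first_to_last_ratio": trial_means[0] / trial_means[-1],
        "plateau_trial": plateau_trial,
    }

# Hypothetical mean time-on-task (seconds) across five trials.
mean_times = [120, 90, 75, 70, 69]
curve = learning_summary(mean_times)
print(curve)
```

Running the same summary for each condition gives one way to compare learnability across designs or user groups.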

Issues to Consider When Measuring Learnability

• What Is a Trial?

– Learning is continuous and without breaks in time

• Memory is much less a factor in this situation

• More about developing and modifying different strategies to complete a set

of tasks

• Take measurements at specified time intervals

• Number of Trials

– There must be at least two

– In most cases there should be at least three or four

– You should err on the side of more trials than you think you might need to reach stable performance

Thanks for listening
