BEST PRACTICES FOR STATISTICS

Statistics for Librarians, Session 4: Statistics best practices


DESCRIPTION

Final session in a series of four seminars presented to University of North Texas librarians. This presentation brings together some best practices for gathering, organizing, analyzing, and presenting statistics and data.


Page 1: Statistics for Librarians, Session 4: Statistics best practices

BEST PRACTICES FOR STATISTICS

Page 2

BEST PRACTICES

Know what you know and what you don’t know

Have a comparison group

Use validated measures

Have a data entry plan

Get to know your data

If it doesn’t fit, change it

Place your bets before you collect the data

Use the best methods of analysis for your question & your data

Go beyond the p-value

Page 3

What is Statistics?

The study of data:
• Collecting
• Organizing
• Summarizing
• Analyzing
• Presenting
• Storing & sharing

Why is it Important?

• Make sense of the data
• Explain what happens and (possibly) why
• Make sound decisions
• Know how close we are to the truth

Page 4

PURPOSE OF STATISTICS

Results

• Bias?
• Sampling error?
• Invalid measures?
• Random error?
• Other factors?

Page 5

BEST PRACTICE: KNOW WHAT YOU ALREADY KNOW, WHAT YOU WANT TO KNOW, AND WHAT YOU DON’T KNOW

Page 6

STARTING WITH YOUR RESEARCH QUESTION

How do users differ when (searching, finding, selecting) (articles, books, Web sites)?

What are the effects of ___________ on ____________?

Which is better at improving _________?

How are people (finding, selecting, using) _______?

What are factors associated with ___________?

Page 7

KINDS OF VARIABLES

Independent
• Subjects
• Factors
• Effects of…

Dependent
• Objects
• Outcomes
• Effects on…

Page 8

LEVELS OF MEASUREMENT (NOIR)

Nominal
• Counts by category
• No order or meaning between the categories (Blue is not better than Red)

Ordinal
• Ranks
• Scales
• Order is meaningful, but the space between ranks is subjective

Interval
• Space between values is equal and objective
• No true zero (baseline)

Ratio
• Interval data with a true zero (baseline)
• Ratios between values are meaningful (e.g., 10 is twice 5)

Page 9

ANOTHER WAY

Qualitative
• Counts by categories
• Ranks
• Scales

Quantitative
• Measurements
• Composite scores
• Simple counts

Page 10

LIKERT-TYPE SCALE?

Ordinal?
• Arbitrary
• Few levels
• Individual questions

Interval?
• Symmetrical
• Many levels
• Composite score
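
One common way to act on this distinction is to summarize each individual Likert item as ordinal data (for example, with a median) and to treat a composite score built from several items more like interval data (for example, with a mean). The Python sketch below assumes a hypothetical four-item, 5-point scale and made-up responses.

```python
import statistics

# Hypothetical responses: each row is one respondent's answers to four
# 5-point Likert items (1 = strongly disagree ... 5 = strongly agree).
responses = [
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 4, 3, 4],
]

# Individual items: treat as ordinal, so summarize each item with a median.
item_medians = [statistics.median(item) for item in zip(*responses)]

# Composite score: sum each respondent's items, then summarize as interval data.
composites = [sum(row) for row in responses]

print("item medians:", item_medians)
print("composite scores:", composites, "mean:", statistics.mean(composites))
```

Whether a particular composite really behaves like interval data is still a judgment call. The code only shows the two summaries side by side.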

Page 11

BEST PRACTICE: HAVE A COMPARISON GROUP

Page 12

WAYS OF COMPARING…

Time Periods

Other Libraries

National Surveys

Patron Types

Material Types

Page 13

KINDS OF COMPARISON

Expected ranks or ratios
• Qualitative
• Comparison

Two variables
• Quantitative
• Correlations

Samples or groups
• Quantitative or qualitative
• Paired or not paired

Page 14

BEST PRACTICE: USE A VALID MEASURE

Page 15

VALIDITY OF MEASURES

Are you actually measuring what you are trying to measure?

Page 16

USE A TOOL WITH ESTABLISHED VALIDITY

Approaches and Study Skills Inventory for Students (ASSIST)

User Engagement Scale (UES)

Page 17

ESTABLISH VALIDITY OF MEASURES

Reliability
• Consistency

Content or face validity
• Common sense

Construct validity
• Based on theory

Criterion validity
• Comparison with other valid measures
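
The slides list consistency (reliability) without naming a statistic. One widely used measure of internal consistency for multi-item instruments is Cronbach's alpha. It is not mentioned in the slides and is shown here only as an illustration. The Python sketch below uses the standard formula; the answers are hypothetical.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha: internal-consistency reliability of a multi-item scale.

    responses: one list per respondent, holding that respondent's score on each item.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # regroup the scores by item
    sum_item_variances = sum(statistics.variance(item) for item in items)
    total_scores = [sum(row) for row in responses]
    total_variance = statistics.variance(total_scores)
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical answers from five respondents to a three-item scale (1-5).
answers = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
print(round(cronbach_alpha(answers), 2))
```

Values closer to 1 indicate that the items vary together. Alpha speaks only to consistency, not to the other kinds of validity listed above.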

Page 18

BEST PRACTICE: HAVE A DATA PLAN

Page 19

GOAL OF DATA COLLECTION IN STATISTICS

Reliability

Bias

Page 20

BIAS

Systematic (not random) deviation from the true value (Statistics.com)

Selection bias

Measurement bias
• Observer bias
• Non-response bias

Analysis bias

Page 21

DATA INPUT

Have a data entry plan

Train the inputters

Use data validation tricks

Double-entry
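
The last two tips can be sketched in Python rather than Excel. The first check compares entries against a list of allowed values, which is one kind of data validation. The second compares two independently keyed copies of the same records (double entry). The field names and allowed codes are hypothetical.

```python
# Hypothetical codes for a "patron type" field; adjust to your own data entry plan.
ALLOWED_PATRON_TYPES = {"faculty", "staff", "graduate", "undergraduate"}


def invalid_entries(records, field, allowed):
    """Return (row number, value) for every entry not in the allowed set."""
    return [
        (i, rec[field])
        for i, rec in enumerate(records, start=1)
        if rec[field] not in allowed
    ]


def double_entry_mismatches(first_pass, second_pass):
    """Compare two independently keyed copies, row by row and field by field."""
    mismatches = []
    for i, (a, b) in enumerate(zip(first_pass, second_pass), start=1):
        for field in a:
            if a[field] != b.get(field):
                mismatches.append((i, field, a[field], b.get(field)))
    if len(first_pass) != len(second_pass):
        mismatches.append(("row count", None, len(first_pass), len(second_pass)))
    return mismatches


entry_1 = [
    {"patron_type": "faculty", "visits": 3},
    {"patron_type": "grad", "visits": 5},
]
entry_2 = [
    {"patron_type": "faculty", "visits": 3},
    {"patron_type": "grad", "visits": 6},
]

print(invalid_entries(entry_1, "patron_type", ALLOWED_PATRON_TYPES))  # [(2, 'grad')]
print(double_entry_mismatches(entry_1, entry_2))  # [(2, 'visits', 5, 6)]
```

Each flagged row can then be checked against the paper form or source record before analysis begins.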

Page 22

BEST PRACTICE: GET TO KNOW YOUR DATA

Page 23

EXPLORATORY DATA ANALYSIS

Central tendency

Spread

Error

Page 24

MEASURES OF CENTRAL TENDENCY

Mean
• Average
• For quantitative data
• Excel function: =AVERAGE(range)

Median
• Middle value
• For quantitative or rank data
• Excel function: =MEDIAN(range)

Mode
• Most common value
• Primarily for qualitative data
• Excel function: =MODE(range)
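
For readers working outside Excel, Python's standard `statistics` module has direct counterparts to the three functions above. The data are hypothetical.

```python
import statistics

# Hypothetical example: years of service for seven staff members.
years = [1, 2, 2, 3, 5, 8, 21]

mean = statistics.mean(years)      # like Excel =AVERAGE(range)
median = statistics.median(years)  # like Excel =MEDIAN(range)
mode = statistics.mode(years)      # like Excel =MODE(range): most common value

print(mean, median, mode)  # 6 3 2

# The mode also works for qualitative (categorical) data.
formats = ["book", "article", "book", "website", "book"]
print(statistics.mode(formats))  # book
```

Here the single long-serving person (21 years) pulls the mean above the median. This is one reason to look at more than one measure of central tendency.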

Page 25

SPREAD & DISTRIBUTION

Page 26

DISTRIBUTION OR SPREAD OF QUALITATIVE DATA

Tables
• Counts
• Percentages/ratios
• Averages of counts

Excel
• Pivot tables

Page 27

PIVOT TABLES IN EXCEL

Select data
• Highlight the table
• Insert > PivotTable

Select variables
• Categories (Row Labels)
• Values

Change settings
• Percentage of Grand Total
• Average
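
The same summary a pivot table produces can be sketched in a few lines of Python: a count per category (row label), each count as a share of the grand total, and the average of a value field per category. The reference-desk log below is hypothetical, and `pivot_summary` is an illustrative helper, not an Excel feature.

```python
from collections import Counter, defaultdict


def pivot_summary(rows):
    """Count, % of grand total, and average value per category."""
    counts = Counter(category for category, _ in rows)
    grand_total = sum(counts.values())
    values = defaultdict(list)
    for category, value in rows:
        values[category].append(value)
    return {
        category: (
            counts[category],                               # Count of rows
            counts[category] / grand_total,                 # % of Grand Total
            sum(values[category]) / len(values[category]),  # Average of values
        )
        for category in sorted(counts)
    }


# Hypothetical reference-desk log: (question type, minutes spent).
log = [
    ("directional", 1),
    ("research", 15),
    ("research", 25),
    ("technology", 5),
    ("directional", 2),
    ("research", 20),
]

for category, (count, share, avg) in pivot_summary(log).items():
    print(f"{category:<12} {count:>3} {share:>7.1%} {avg:>6.1f}")
```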

Page 28

DEMONSTRATION OF PIVOT TABLES FOR SPREAD OF QUALITATIVE DATA

Page 29

GRAPH & CHART RULES OF THUMB

Trends
• Connection across the x-axis

Categorical comparisons
• Grouped
• Stacked
• Relative stacked

Categorical
• Few categories
• Differences are wide

Page 30

QUANTITATIVE DISTRIBUTIONS

Stem & Leaf

Histogram

Distribution graphs

Page 31

EXPLORATORY DATA ANALYSIS

John W. Tukey, Exploratory Data Analysis
• Examining your data visually
• Stem & leaf
• Hinges
• Box plots
• Scatter plots, etc.

Page 32

STEM-AND-LEAF

Years at UNT (stem = first digit(s); leaf = last digit)

Stem | Leaf
0 | 01112222222222222233333344445556666677788899
1 | 0000000011122223333356778899
2 | 00122234444799
3 | 0245

Raw values shown on the slide (partial excerpt, in sorted order): 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 15, 16, 17, 17, 18, 18, 19, 29, 29, 30, 32, 34, 35
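
A stem-and-leaf display like the one above is easy to build in a few lines of Python. This sketch assumes non-negative whole numbers. It splits each value into a stem (all digits but the last) and a leaf (the last digit), and it keeps empty stems so that gaps stay visible. The sample values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: sorted leaves}; assumes a non-empty list of non-negative ints."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 29 -> stem 2, leaf 9
        plot[stem].append(leaf)
    # Include empty stems so gaps in the data remain visible.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


def print_stem_and_leaf(values):
    for stem, leaves in stem_and_leaf(values).items():
        print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")


# Hypothetical years-of-service values.
print_stem_and_leaf([0, 1, 1, 1, 2, 2, 5, 13, 13, 15, 29, 30, 34])
# 0 | 0111225
# 1 | 335
# 2 | 9
# 3 | 04
```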

Page 33

FROM STEM-AND-LEAF TO HISTOGRAMS

Page 34

Stem | Leaf | Count
0 | 1122223334445555666666677777899 | 31
1 | 000011122222222333346677889 | 27
2 | 0122234468 | 10
3 | 1112355888 | 11
4 | 12 | 2

Range | Count
0-9 | 31
10-19 | 27
20-29 | 10
30-39 | 11
40-49 | 2

Histogram of Years at UNT (x-axis: ranges 0-9 to 40-49; y-axis: count, 0 to 40)

Page 35

HISTOGRAMS IN EXCEL

Analysis ToolPak
• Options
• Add-ins
• Manage Add-ins

Set ranges
• Equal-size ranges
• Ceiling (“More”)

Create histogram
• Data
• Data Analysis
• Histogram

Create graph
• Insert bar chart
• Highlight histogram
• Select bars & Format Selection
• Gap Width = 0%

Bin values for the histogram: 9, 19, 29, 39, 49
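
The binning step can be sketched outside Excel as well. This Python sketch assumes that each bin value is an inclusive upper limit (ceiling): 9 falls in the "9" bin, 10 in the "19" bin, and anything above the last limit goes into "More". That is how the bin values 9, 19, 29, 39, 49 above are meant to be read. The data are hypothetical.

```python
from bisect import bisect_left


def histogram_counts(values, bin_upper_limits):
    """Count values per bin, treating each bin value as an inclusive upper limit."""
    limits = sorted(bin_upper_limits)
    counts = [0] * (len(limits) + 1)  # the extra last slot is "More"
    for v in values:
        # bisect_left finds the first limit >= v, i.e. the bin whose ceiling holds v.
        counts[bisect_left(limits, v)] += 1
    labels = [str(limit) for limit in limits] + ["More"]
    return dict(zip(labels, counts))


years = [0, 3, 9, 10, 19, 20, 35, 41, 52]  # hypothetical values
for label, n in histogram_counts(years, [9, 19, 29, 39, 49]).items():
    print(f"{label:>4} {'#' * n}")  # a text "bar" per bin
```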

Page 36

DEMONSTRATION OF HISTOGRAM IN EXCEL

Page 37

SPREAD OF QUANTITATIVE DATA

How variable is the data?

• Range
• Quantiles
• Standard deviation

Page 38

RANGE & QUARTILES

Page 39

PRESENTATION OF SPREAD

Box plots
• Median
• Upper & lower quartiles
• Outliers
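
The numbers behind a box plot can be computed directly. The slide does not say how outliers are defined. This Python sketch uses the common 1.5 × IQR rule (points beyond 1.5 interquartile ranges from the quartiles), which is an assumption. Quartile definitions also differ slightly between tools. Here `method="inclusive"` interpolates the same way as Excel's QUARTILE.INC. The data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Five-number summary plus outliers flagged with the 1.5 x IQR rule."""
    data = sorted(values)
    # method="inclusive" interpolates like Excel's QUARTILE.INC.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# Hypothetical data with one unusually large value.
print(box_plot_summary([2, 3, 3, 4, 5, 5, 6, 7, 8, 30]))
# {'min': 2, 'q1': 3.25, 'median': 5.0, 'q3': 6.75, 'max': 30, 'outliers': [30]}
```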

Page 40

STANDARD DEVIATION

Measure of the dispersion of the data

Square root of the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by n - 1 rather than n)
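
The definition can be checked step by step. This Python sketch computes a sample standard deviation by hand (the same idea as Excel's STDEV.S) and compares it with the standard library. It also shows the population version, which divides by n. The data are hypothetical.

```python
import math
import statistics


def sample_sd(values):
    """Square root of the summed squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; mean = 5
print(sample_sd(data))           # by hand, like Excel =STDEV.S(range)
print(statistics.stdev(data))    # library equivalent (sample SD)
print(statistics.pstdev(data))   # population SD divides by n instead: 2.0
```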

Page 41

WHAT DOES THE SD TELL YOU?

Greater variation, less certainty

Lower variation, more certainty

Page 42

SPREAD IN EXCEL

Range
• =MIN(range)
• =MAX(range)

Quantiles
• =PERCENTILE.INC(range, k), where k is a proportion from 0 to 1
• =QUARTILE.INC(range, quart), where quart is 0, 1, 2, 3, or 4 (0 = minimum, 2 = median, 4 = maximum)

Standard deviation
• =STDEV.S(range) (sample standard deviation)

Page 45

DEMONSTRATION OF DISTRIBUTIONS

Distribution of the population: the “truth”

N is the number of samples

n is the number of items in each sample

Watch the cumulative means and medians slowly converge on the population values
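
A demonstration like this can be approximated with a short simulation. The Python sketch below draws N samples of n items each from a hypothetical right-skewed population and tracks the running mean of the sample means. It illustrates the idea and is not the tool used in the session. The population, seed, and sizes are all assumptions. Medians could be tracked the same way with `statistics.median`.

```python
import random
import statistics


def cumulative_sample_means(population, n, N, seed=0):
    """Draw N samples of n items each; return the running mean of the sample means."""
    rng = random.Random(seed)
    running = []
    total = 0.0
    for k in range(1, N + 1):
        sample = rng.sample(population, n)  # n items, without replacement
        total += statistics.fmean(sample)
        running.append(total / k)
    return running


# Hypothetical right-skewed "population" standing in for the truth.
rng = random.Random(42)
population = [rng.expovariate(1 / 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

running = cumulative_sample_means(population, n=25, N=2000)
for k in (1, 10, 100, 2000):
    print(k, round(running[k - 1], 2), "vs", round(true_mean, 2))
```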

Page 46

BEST PRACTICE: IF IT DOESN’T FIT, CHANGE IT

Transformation of data

Page 47

WHY TRANSFORM?

Histogram of Years at UNT (ranges 0-9 to 30-39; y-axis: count, 0 to 50) compared with a histogram of Log10(Years at UNT) (bins 0.1 to 1.6, plus “More”; y-axis: count, 0 to 16)

Page 48

HOW TRANSFORMATION WORKS

$Y = a + bx$

$\log(Y) = \log(a + bx)$

$1/Y = 1/(a + bx)$
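
A log transformation like the Log10(Years at UNT) histogram can be sketched in Python. Because log10(0) is undefined and years of service can be 0, this sketch adds 1 before taking logs. That offset is an assumption; the slides do not say how zeros were handled. The values are hypothetical. The sketch also shows that a mean computed on the log scale can be converted back to the original units.

```python
import math
import statistics

# Hypothetical right-skewed years-of-service values (0 is possible).
years = [0, 1, 1, 2, 2, 3, 4, 5, 6, 8, 12, 13, 18, 29, 35]

# Assumed offset of 1 so that a value of 0 can be logged.
logged = [math.log10(y + 1) for y in years]

print("raw:    mean", round(statistics.mean(years), 2), "median", statistics.median(years))
print("logged: mean", round(statistics.mean(logged), 2), "median", round(statistics.median(logged), 2))

# Back-transform the log-scale mean to the original units
# (this equals the geometric mean of years + 1, minus 1).
back = 10 ** statistics.mean(logged) - 1
print("back-transformed mean:", round(back, 2))
```

On the raw scale the mean sits well above the median. The transformation pulls in the long right tail, so the two move closer together on the log scale.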

Page 50

BEST PRACTICE: PLACE YOUR BETS BEFORE YOU START

Page 51

INFERENTIAL STATISTICS

Tests of hypotheses
• Associations
• Expectations

Accounts for uncertainty
• Random error
• Confidence intervals

Page 52

HYPOTHESIS TESTING

Your hypothesis (H1)

Null hypothesis (H0)

Page 53

EXAMPLE HYPOTHESIS

UNT Libraries provides access to…
• ≥ 75%*
• < 75%*

*…of journal articles cited by UNT PACS faculty in journal articles published between 2008 and 2011.
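
Suppose the hypothesis is that access exceeds 75% and the null value is 75%. One option for a single proportion is a one-sided exact binomial test, which this Python sketch implements with `math.comb`. The sample counts are hypothetical, and the slides do not say which test was used for this example.

```python
from math import comb


def binomial_p_value_at_least(successes, n, p0):
    """One-sided exact binomial test: P(X >= successes) if the true proportion were p0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes, n + 1))


# Hypothetical sample: 170 of 200 cited articles were available through the library.
n, available = 200, 170
print(f"observed proportion: {available / n:.2f}")
print(f"one-sided p-value vs. 75%: {binomial_p_value_at_least(available, n, 0.75):.4f}")
```

Writing down the null value (75%) and the direction of the test before collecting the citations is what "placing your bets" looks like in practice.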

Page 54

HYPOTHESIS TESTING

p
• Sample size
• Central tendency
• Spread
• Distribution
• Significance level

Page 55

TESTING HYPOTHESES

Page 56

BEST PRACTICE: CHOOSE THE BEST METHOD FOR YOUR QUESTION AND DATA

Page 57

KNOW THE TESTS

• Assumptions
• Limitations
• Appropriate data type
• What the test tests

Page 58

FACTORS ASSOCIATED WITH CHOICE OF STATISTICAL METHOD

• Variable type
• What is being compared
• Independence of units
• Underlying variance in the population
• Distribution
• Sample size
• Number of comparison groups

Page 59

USE A FLOW CHART

Page 60

BEST PRACTICE: GOING BEYOND THE P-VALUE

Page 61

AND THE P-VALUE SAYS…

• Much about the distributions
• More about the H0 than the H1
• Little about the size of differences

Page 62

MORE USEFUL STATISTICS

Effect sizes
• Tell the real story

Confidence intervals
• State your certainty

Page 63

EFFECT SIZES OF QUANTITATIVE DATA

Correlations
• Cohen’s guidelines for Pearson’s r

Differences between means
• Standardized: the difference is divided by the standard deviation
• Cohen’s d

Effect size | r
Small | .10
Medium | .30
Large | .50
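
Both effect sizes can be computed with the standard library. This Python sketch computes Pearson's r from its definition and Cohen's d as the difference in means divided by the pooled sample standard deviation, which is one common choice of standardizer. The data are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled (sample) standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd


# Hypothetical data: hours of instruction vs. quiz score, and two groups of scores.
hours = [1, 2, 3, 4, 5]
score = [60, 65, 72, 70, 80]
trained = [78, 85, 88, 90, 92]
untrained = [70, 74, 80, 82, 85]

print("Pearson's r:", round(pearson_r(hours, score), 2))
print("Cohen's d:", round(cohens_d(trained, untrained), 2))
```

The value of r can then be read against the small/medium/large guidelines in the table above.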

Page 64

EFFECT SIZES OF QUALITATIVE DATA

Based on a contingency table

Odds ratio
• Odds of the event in one group divided by the odds of the event in the other group
• Case-control studies

Relative risk
• Uses probabilities rather than odds
• Experiments, RCTs

Test A/B | Yes | No | Total
Yes | 10 | 15 | 25
No | 50 | 25 | 75
Total | 60 | 40 | 100
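
Both measures can be computed from the table's counts. This Python sketch reads Test A as the rows (the two groups) and Test B = Yes as the outcome. That is an assumption about the table's layout.

```python
def odds_ratio_and_relative_risk(a, b, c, d):
    """2 x 2 table with groups in rows and the outcome (Yes, No) in columns:

                 outcome Yes   outcome No
        group 1      a             b
        group 2      c             d
    """
    odds_ratio = (a / b) / (c / d)                 # odds in group 1 / odds in group 2
    relative_risk = (a / (a + b)) / (c / (c + d))  # probability in group 1 / group 2
    return odds_ratio, relative_risk


# Counts from the table: Test A = Yes row (10, 15), Test A = No row (50, 25).
or_, rr = odds_ratio_and_relative_risk(10, 15, 50, 25)
print(f"odds ratio = {or_:.2f}, relative risk = {rr:.2f}")
# odds ratio = 0.33, relative risk = 0.60
```

The two numbers differ because the odds ratio compares odds (10:15 versus 50:25), while relative risk compares probabilities (10/25 versus 50/75).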

Page 65

CONFIDENCE INTERVALS

Point estimates
• Expressed as a single value, e.g., the mean

Intervals
• Express the degree of uncertainty: a range of certainty around the point estimate
• Based on the point estimate (e.g., the mean), the confidence level (usually .95), and the standard deviation (together with the sample size)
• Example: The mean score of the students who had the IL training was 83.5, with a 95% CI of 78.3 to 89.4.
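
This Python sketch builds a confidence interval for a mean from the ingredients listed above: the point estimate, the confidence level, and the standard deviation with the sample size (the standard error). It uses a normal (z) critical value because the standard library has no t distribution. For small samples a t critical value, which is slightly larger, would be more appropriate. The scores are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation CI for a mean: mean +/- z * (SD / sqrt(n))."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return mean, (mean - margin, mean + margin)


# Hypothetical test scores for students who had the IL training.
scores = [72, 95, 80, 88, 91, 76, 84, 79, 90, 83]
mean, (low, high) = mean_confidence_interval(scores)
print(f"mean = {mean:.1f}, 95% CI {low:.1f} to {high:.1f}")
```

Because this interval is built as the mean plus or minus a margin, it is symmetric around the point estimate.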

Page 66

STATISTICAL ANALYSIS

Noise

Signal

Page 67

BEST PRACTICES

Know what you know and what you don’t know

Have a comparison group

Use validated measures

Have a data entry plan

Get to know your data

If it doesn’t fit, change it

Place your bets before you collect the data

Use the best methods of analysis for your question & your data

Go beyond the p-value