Descriptive Statistics with R. 2012-10-12 @HSPH. Kazuki Yoshida, M.D. MPH-CLE student. FREEDOM TO KNOW

Descriptive Statistics with R


Page 1: Descriptive Statistics with R

Descriptive Statistics with R

2012-10-12 @HSPH

Kazuki Yoshida, M.D. MPH-CLE student

FREEDOM TO KNOW

Page 2: Descriptive Statistics with R

Group Website is at:

http://rpubs.com/kaz_yos/useR_at_HSPH

Page 3: Descriptive Statistics with R

Previously in this group

- Introduction to R
- Reading Data into R (1)
- Reading Data into R (2)

Group Website: http://rpubs.com/kaz_yos/useR_at_HSPH

Page 4: Descriptive Statistics with R

Menu

- mean and sd
- median, quantiles, IQR, max, min, and range
- skewness and kurtosis
- smarter ways of doing these

Page 5: Descriptive Statistics with R

Ingredients

Statistics

- Summary statistics for continuous data
  - Normal data
  - Non-normal data
- Normality check

Programming

- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
  - skewness(), kurtosis()
  - summary()
  - describe(), describeBy()

Page 6: Descriptive Statistics with R

http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html

Data loaded

What’s next?

Page 7: Descriptive Statistics with R

Descriptive Statistics

http://www.ehow.com/info_8650637_descriptive-statistical-methods.html

Page 8: Descriptive Statistics with R

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data

http://en.wikipedia.org/wiki/Descriptive_statistics

Page 9: Descriptive Statistics with R

Open RStudio

Page 11: Descriptive Statistics with R

Read in BONEDEN.DAT.txt

Name it bone
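A minimal sketch of this step, assuming the file is comma-separated with a header row (the slides do not state the format). The temporary file and its three rows are hypothetical stand-ins so that the example runs on its own; in the session, the path to BONEDEN.DAT.txt would take their place.

```r
# Hypothetical stand-in for BONEDEN.DAT.txt: a tiny comma-separated file
path <- tempfile(fileext = ".txt")
writeLines(c("id,age,zyg", "1,27,1", "2,42,1", "3,36,2"), path)

# Read the file and name the resulting data frame `bone`
# (assumption: comma-separated with a header row)
bone <- read.csv(path, header = TRUE)

str(bone)   # column names and types
head(bone)  # first rows
```

If the real file uses another separator, read.table() with a matching sep argument may be the better fit.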

Page 12: Descriptive Statistics with R

Accessing a single variable in a data set

DATA$VAR

e.g., mean(bone$age)

DATA is the dataset name; VAR is the variable name
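The extraction can be tried on a small example. The data frame below is a hypothetical stand-in for bone with made-up values; only the column names age and zyg come from the slides.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(
  id  = 1:8,
  age = c(27, 42, 36, 51, 45, 33, 60, 38),
  zyg = c(1, 1, 2, 2, 1, 2, 1, 2)
)

age <- bone$age   # DATA$VAR: extract the age variable
is.vector(age)    # the extracted variable is a plain vector
mean(bone$age)    # functions can take DATA$VAR directly
```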

Page 13: Descriptive Statistics with R

vector

Page 15: Descriptive Statistics with R

DATA$VAR is a vector

like strings with values attached

1 2 3 4 5 6 7 8

OR

“A” “B” “C” “D” “E” “F” “G” “H”

Page 16: Descriptive Statistics with R

DATA is a data frame

Multiple vectors of same length tied together

[Figure: number vectors (1 2 3 4 5 6 7 8) and character vectors (“A” “B” “C” “D” “E” “F” “G” “H”) drawn side by side, labeled “Tied here”]
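As a rough illustration of vectors being tied together, the sketch below builds a data frame from one numeric and one character vector. The names num, chr, and DATA are chosen for this example only.

```r
num <- 1:8                                         # numeric vector
chr <- c("A", "B", "C", "D", "E", "F", "G", "H")   # character vector, same length

# Tie the two vectors together as columns of one data frame
DATA <- data.frame(num = num, chr = chr, stringsAsFactors = FALSE)

dim(DATA)   # 8 rows, 2 columns
DATA$chr    # each column comes back out as a vector
```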

Page 17: Descriptive Statistics with R

Indexing: extraction of data from data frame

bone[1:15 , 1:12]

1:15 extracts the 1st to 15th rows; 1:12 extracts the 1st to 12th columns

Don’t forget the comma between rows and columns. A colon goes in between the first and last index.
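A small sketch of [row, col] indexing. The data frame is a hypothetical stand-in for bone with only 8 rows and 3 columns, so the ranges are smaller than the 1:15 and 1:12 on the slide.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(
  id  = 1:8,
  age = c(27, 42, 36, 51, 45, 33, 60, 38),
  zyg = c(1, 1, 2, 2, 1, 2, 1, 2)
)

sub <- bone[1:3, 1:2]   # rows 1 to 3, columns 1 to 2
sub

first_rows <- bone[1:3, ]   # leaving the column slot empty keeps all columns
age_col    <- bone[, 2]     # leaving the row slot empty keeps all rows
```

Note that the comma is still written when one of the two slots is left empty.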

Page 18: Descriptive Statistics with R

age vector within bone data frame

Page 19: Descriptive Statistics with R

bone$age

Extracted as a vector

Page 20: Descriptive Statistics with R

mean

mean(x, trim = 0, na.rm = FALSE)

Page 21: Descriptive Statistics with R

Your turn

- What is the mean of age?

adopted from Hadley Wickham
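One possible way to work through this exercise, using a small hypothetical stand-in for bone (made-up values, so the result will differ from the real data). The sketch also shows what the trim and na.rm arguments do.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(age = c(27, 42, 36, 51, 45, 33, 60, 38))

m <- mean(bone$age)
m

# trim = 0.25 drops the lowest and highest 25% of values before averaging
m_trim <- mean(bone$age, trim = 0.25)

# A missing value makes the result NA unless na.rm = TRUE
mean(c(bone$age, NA))
m_na <- mean(c(bone$age, NA), na.rm = TRUE)
```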

Page 22: Descriptive Statistics with R

sd

sd(x, na.rm = FALSE)

Page 23: Descriptive Statistics with R

Your turn

- What is the sd of age?

adopted from Hadley Wickham
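A sketch for this exercise on a hypothetical stand-in for bone (made-up values). It also writes the sample standard deviation out by hand to show that sd() divides by n - 1.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(age = c(27, 42, 36, 51, 45, 33, 60, 38))

s <- sd(bone$age)
s

# The same quantity by hand: squared deviations divided by n - 1
n <- length(bone$age)
s_manual <- sqrt(sum((bone$age - mean(bone$age))^2) / (n - 1))
```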

Page 24: Descriptive Statistics with R

median

median(x, na.rm = FALSE)

Page 25: Descriptive Statistics with R

Your turn

- What is the median of age?

adopted from Hadley Wickham
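A sketch for this exercise on a hypothetical stand-in for bone (made-up values). The second half replaces one value with an extreme one to illustrate why the median is often preferred for non-normal data.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(age = c(27, 42, 36, 51, 45, 33, 60, 38))

med <- median(bone$age)
med

# Replace the largest value with an extreme one (purely illustrative)
age_outlier <- replace(bone$age, bone$age == 60, 600)
med_outlier  <- median(age_outlier)   # unchanged by the extreme value
mean_outlier <- mean(age_outlier)     # pulled upward by the extreme value
```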

Page 26: Descriptive Statistics with R

quantile

quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7)

0th, 25th, 50th, 75th, and 100th percentiles by default

Page 27: Descriptive Statistics with R

Your turn

- What are the 25th and 75th percentiles of age?

adopted from Hadley Wickham
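A sketch for this exercise on a hypothetical stand-in for bone (made-up values). The probs argument selects which percentiles are returned.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(age = c(27, 42, 36, 51, 45, 33, 60, 38))

quantile(bone$age)   # default: 0%, 25%, 50%, 75%, 100%

# Only the 25th and 75th percentiles
q <- quantile(bone$age, probs = c(0.25, 0.75))
q
```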

Page 28: Descriptive Statistics with R

IQR

IQR(x, na.rm = FALSE, type = 7)

75th percentile - 25th percentile

Page 29: Descriptive Statistics with R

Your turn

- What is the IQR of age?

adopted from Hadley Wickham
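A sketch for this exercise on a hypothetical stand-in for bone (made-up values), checking IQR() against the difference of the two quartiles.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(age = c(27, 42, 36, 51, 45, 33, 60, 38))

iqr <- IQR(bone$age)
iqr

# Same value from the quartiles: 75th percentile minus 25th percentile
q <- quantile(bone$age, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])
```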

Page 30: Descriptive Statistics with R

max

max(..., na.rm = FALSE)

Page 31: Descriptive Statistics with R

min

min(..., na.rm = FALSE)

Page 32: Descriptive Statistics with R

Your turn

- What are the minimum and maximum of age?

adopted from Hadley Wickham
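A sketch for this exercise on a hypothetical stand-in for bone (made-up values), including the effect of a missing value.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(age = c(27, 42, 36, 51, 45, 33, 60, 38))

lo <- min(bone$age)
hi <- max(bone$age)

# With a missing value, the result is NA unless na.rm = TRUE
max(c(bone$age, NA))
hi_na <- max(c(bone$age, NA), na.rm = TRUE)
```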

Page 33: Descriptive Statistics with R

range

range(..., na.rm = FALSE)

Page 34: Descriptive Statistics with R

Your turn

- What is the range of age?

adopted from Hadley Wickham
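A sketch for this exercise on a hypothetical stand-in for bone (made-up values). range() returns the minimum and maximum as a vector of two values, not their difference; diff() gives the width if that is wanted.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(age = c(27, 42, 36, 51, 45, 33, 60, 38))

r <- range(bone$age)   # c(minimum, maximum)
r

width <- diff(r)       # maximum minus minimum
```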

Page 35: Descriptive Statistics with R

We now resort to external packages

Page 36: Descriptive Statistics with R

e1071, psych

Install and Load

Page 37: Descriptive Statistics with R

To load a package by command

library(package)

Put the package name in place of package. The double quotes “” around the name can be omitted.
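A short sketch of installing and loading. The stats package is used in the library() calls only because it ships with R, so the example runs without any installation; in the session the names would be e1071 and psych.

```r
# Installation is needed once per computer and requires internet access:
# install.packages(c("e1071", "psych"))

# Loading is needed once per session; both forms do the same thing
library(stats)     # package name without quotes
library("stats")   # package name with quotes

# Check that the package is attached
loaded <- "package:stats" %in% search()
loaded
```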

Page 38: Descriptive Statistics with R

Assessment of normality

Page 39: Descriptive Statistics with R

Load e1071 package

Page 40: Descriptive Statistics with R

skewness

skewness(x, na.rm = FALSE, type = 3)

type = 2: SAS; type = 1: Stata

library(e1071)

Page 41: Descriptive Statistics with R

kurtosis

kurtosis(x, na.rm = FALSE, type = 3)

type = 2: SAS; type = 1: Stata

library(e1071)

Page 42: Descriptive Statistics with R

Your turn

- What are the skewness and kurtosis of age by the Stata method?

adopted from Hadley Wickham
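A sketch for this exercise on a hypothetical stand-in for bone (made-up values). The helper functions skew_type1 and kurt_type1 are written for this example and follow the moment-based definitions that, to my understanding, e1071 uses for type = 1; the calls to e1071 itself run only if the package is installed. Note that the kurtosis computed here is excess kurtosis (3 is subtracted, so a normal distribution gives 0); software that reports raw kurtosis would show a value larger by 3.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(age = c(27, 42, 36, 51, 45, 33, 60, 38))

# Moment-based skewness: m3 / m2^(3/2)
skew_type1 <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Moment-based excess kurtosis: m4 / m2^2 - 3
kurt_type1 <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

skew_type1(bone$age)
kurt_type1(bone$age)

# With e1071 installed, the packaged functions can be compared to the above
if (requireNamespace("e1071", quietly = TRUE)) {
  print(e1071::skewness(bone$age, type = 1))
  print(e1071::kurtosis(bone$age, type = 1))
}
```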

Page 43: Descriptive Statistics with R

Multiple variables at once

Page 44: Descriptive Statistics with R

summary

summary(object, ...)

Page 45: Descriptive Statistics with R

Your turn

- Try summary() on the dataset (data frame).

adopted from Hadley Wickham
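A sketch for this exercise on a hypothetical stand-in for bone (made-up values). Called on a data frame, summary() reports each column; called on a single numeric vector, it returns the minimum, quartiles, median, mean, and maximum.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(
  id  = 1:8,
  age = c(27, 42, 36, 51, 45, 33, 60, 38),
  zyg = c(1, 1, 2, 2, 1, 2, 1, 2)
)

summary(bone)            # every column at once

s <- summary(bone$age)   # one variable
s
```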

Page 46: Descriptive Statistics with R

describe

describe(x, na.rm = TRUE, interp = FALSE, skew = TRUE, ranges = TRUE, trim = .1, type = 3)

type = 2: SAS; type = 1: Stata

library(psych)

Various summary measures

Page 47: Descriptive Statistics with R

Your turn

- describe(bone[,-1], type = 2)

adopted from Hadley Wickham
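A sketch of what this call does, on a hypothetical stand-in for bone (made-up values, with the assumption that the first column is an identifier). The negative index drops the first column before the summary; the describe() call itself runs only if psych is installed.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(
  id  = 1:8,
  age = c(27, 42, 36, 51, 45, 33, 60, 38),
  zyg = c(1, 1, 2, 2, 1, 2, 1, 2)
)

no_id <- bone[, -1]   # a negative index drops the 1st column
names(no_id)

# Summary measures for the remaining columns, SAS-style skewness and kurtosis
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(no_id, type = 2))
}
```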

Page 48: Descriptive Statistics with R

describeBy

describeBy(x, group=NULL, mat=FALSE, type=3, ...)

type = 2: SAS; type = 1: Stata

library(psych)

Groupwise summary

Page 49: Descriptive Statistics with R

Your turn

- describeBy(bone[ , c(-1)] , bone$zyg , type = 2)

adopted from Hadley Wickham

bone[ , c(-1)]: bone data frame without the 1st column

bone$zyg: zyg vector for grouping

type = 2: SAS method for skewness and kurtosis
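A sketch of a groupwise summary on a hypothetical stand-in for bone (made-up values). The tapply() line is a base R way to get one statistic per group and is shown only for comparison; the describeBy() call from the slide runs only if psych is installed.

```r
# Hypothetical stand-in for the bone data frame (made-up values)
bone <- data.frame(
  id  = 1:8,
  age = c(27, 42, 36, 51, 45, 33, 60, 38),
  zyg = c(1, 1, 2, 2, 1, 2, 1, 2)
)

# Base R: mean age within each zyg group
group_means <- tapply(bone$age, bone$zyg, mean)
group_means

# psych: full set of summary measures within each zyg group
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describeBy(bone[, c(-1)], bone$zyg, type = 2))
}
```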

Page 50: Descriptive Statistics with R

Ingredients

Statistics

- Summary statistics for continuous data
  - Normal data
  - Non-normal data
- Normality check

Programming

- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
  - skewness(), kurtosis()
  - summary()
  - describe(), describeBy()

Page 51: Descriptive Statistics with R