Descriptive Statistics with R


2012-10-12 @HSPH

Kazuki Yoshida, M.D. MPH-CLE student


Group Website is at:

http://rpubs.com/kaz_yos/useR_at_HSPH

Previously in this group

- Introduction to R
- Reading Data into R (1)
- Reading Data into R (2)


Menu

- mean and sd
- median, quantiles, IQR, max, min, and range
- skewness and kurtosis
- smarter ways of doing these

Ingredients

Statistics

- Summary statistics for continuous data
  - Normal data
  - Non-normal data
- Normality check

Programming

- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
  - skewness(), kurtosis()
  - summary()
  - describe(), describeBy()

http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html

Data loaded. What’s next?

Descriptive Statistics

http://www.ehow.com/info_8650637_descriptive-statistical-methods.html

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data.

http://en.wikipedia.org/wiki/Descriptive_statistics

Open R Studio

Read in BONEDEN.DAT.txt

Name it bone

DATA$VAR

Accessing a single variable in a data set

DATA = dataset name, VAR = variable name

e.g., mean(bone$age)

vector

like strings with values attached

1 2 3 4 5 6 7 8

OR

“A” “B” “C” “D” “E” “F” “G” “H”

DATA$VAR is a vector

Multiple vectors of same length tied together

1 2 3 4 5 6 7 8

“A” “B” “C” “D” “E” “F” “G” “H”

1 2 3 4 5 6 7 8

“A” “B” “C” “D” “E” “F” “G” “H”

Tied here

DATA is a data frame
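The picture above can be reproduced in a few lines. This is a minimal sketch with made-up values (the names id, grp, and toy are hypothetical and are not taken from BONEDEN.DAT.txt):

```r
# Hypothetical values for illustration only
id  <- 1:8                                          # a numeric vector
grp <- c("A", "B", "C", "D", "E", "F", "G", "H")    # a character vector

# Tie two vectors of the same length together into a data frame
toy <- data.frame(id = id, grp = grp, stringsAsFactors = FALSE)

is.data.frame(toy)   # the whole object is a data frame
is.vector(toy$id)    # a single column pulled out with $ is a vector
dim(toy)             # number of rows, then number of columns
```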

Indexing: extraction of data from data frame

bone[1:15 , 1:12]

1:15 extracts the 1st to 15th rows; 1:12 extracts the 1st to 12th columns

Colon in between the first and last index

Don’t forget the comma

age vector within bone data frame

bone$age

Extracted as a vector
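A minimal sketch of both extraction styles, using a small made-up data frame (toy and its values are hypothetical; the real bone data has more rows and columns):

```r
# Hypothetical stand-in for the bone data frame
toy <- data.frame(
  id  = 1:6,
  age = c(27, 42, 42, 35, 58, 58),
  zyg = c(1, 1, 2, 2, 1, 2)
)

first_rows <- toy[1:3, 1:2]   # rows 1 to 3, columns 1 to 2
all_cols   <- toy[1:3, ]      # empty slot after the comma = all columns
no_id      <- toy[, -1]       # a negative index drops the 1st column
ages       <- toy$age         # DATA$VAR gives a plain vector
```

Leaving a slot empty keeps everything on that side of the comma, which is why the comma itself cannot be dropped.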

mean

mean(x, trim = 0, na.rm = FALSE)

Your turn

- What is the mean of age?

adopted from Hadley Wickham

sd

sd(x, na.rm = FALSE)

Your turn

- What is the sd of age?
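To see what the trim and na.rm arguments do, here is a small sketch on made-up numbers (the age values below are hypothetical, not the bone data):

```r
# Hypothetical ages for illustration only
age <- c(20, 30, 40, 50, 60)

mean_age <- mean(age)   # arithmetic mean
sd_age   <- sd(age)     # sample standard deviation (n - 1 denominator)

# A missing value makes the result NA unless na.rm = TRUE
age_na       <- c(age, NA)
mean_with_na <- mean(age_na)
mean_no_na   <- mean(age_na, na.rm = TRUE)

# trim = 0.2 drops 20% of the observations from each end before averaging,
# which damps the effect of the outlier 1000
trimmed <- mean(c(age, 1000), trim = 0.2)
```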


median

median(x, na.rm = FALSE)

Your turn

- What is the median of age?


quantile

quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7)

0th, 25th, 50th, 75th, and 100th percentiles by default

Your turn

- What are the 25th and 75th percentiles of age?


IQR

IQR(x, na.rm = FALSE, type = 7)

75th percentile - 25th percentile

Your turn

- What is the IQR of age?
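The three functions fit together as sketched below, again on made-up numbers (hypothetical ages, default type = 7 quantiles):

```r
# Hypothetical ages for illustration only
age <- c(20, 30, 40, 50, 60)

med <- median(age)                           # the 50th percentile
q   <- quantile(age)                         # 0%, 25%, 50%, 75%, 100% by default
q25_75 <- quantile(age, probs = c(0.25, 0.75))   # only the requested percentiles
iqr <- IQR(age)                              # 75th percentile minus 25th percentile

# The IQR can be rebuilt from the two quartiles
iqr_by_hand <- unname(q25_75[2] - q25_75[1])
```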


max

max(..., na.rm = FALSE)

min

min(..., na.rm = FALSE)

Your turn

- What are the minimum and maximum of age?


range

range(..., na.rm = FALSE)

Your turn

- What is the range of age?
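One point worth checking by hand: range() returns the minimum and the maximum as a pair, not their difference. A short sketch on hypothetical ages:

```r
# Hypothetical ages for illustration only
age <- c(20, 30, 40, 50, 60)

lo <- min(age)          # smallest value
hi <- max(age)          # largest value
r  <- range(age)        # c(min, max), a vector of length 2
width <- diff(r)        # max minus min, if a single number is wanted
```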


We now resort to external packages

e1071, psych

Install and Load

To load a package by command

library(package)

Put the package name in place of package; the double quotes “” around it can be omitted
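A sketch of the install-then-load pattern. The install line is commented out because it needs internet access and only has to be run once; the stats package, which ships with R, stands in here only so that the snippet runs anywhere:

```r
# One-time installation of the two packages used below (uncomment to run)
# install.packages(c("e1071", "psych"))

# Loading: the package name works with or without double quotes.
# stats is used as a stand-in because it is always available.
library(stats)
library("stats")

# After installation, the same pattern applies:
# library(e1071)
# library(psych)
```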

Assessment of normality

Load e1071 package

skewness

library(e1071)

skewness(x, na.rm = FALSE, type = 3)

type = 2: SAS; type = 1: Stata

kurtosis

library(e1071)

kurtosis(x, na.rm = FALSE, type = 3)

type = 2: SAS; type = 1: Stata

Your turn

- What are the skewness and kurtosis of age by the Stata-method?
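To make the type = 1 definition concrete, the sketch below computes it by hand from the central moments, following the formulas given in the e1071 documentation, and compares against e1071 only if that package is installed. The age values are hypothetical. Note that e1071 reports excess kurtosis (3 is subtracted, so a normal distribution gives 0).

```r
# Hypothetical, right-skewed ages for illustration only
age <- c(20, 30, 40, 50, 90)

n  <- length(age)
d  <- age - mean(age)      # deviations from the mean
m2 <- sum(d^2) / n         # 2nd central moment
m3 <- sum(d^3) / n         # 3rd central moment
m4 <- sum(d^4) / n         # 4th central moment

skew_type1 <- m3 / m2^(3 / 2)    # type = 1 skewness
kurt_type1 <- m4 / m2^2 - 3      # type = 1 (excess) kurtosis

# Optional cross-check, run only when e1071 is available
if (requireNamespace("e1071", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(skew_type1, e1071::skewness(age, type = 1))))
  stopifnot(isTRUE(all.equal(kurt_type1, e1071::kurtosis(age, type = 1))))
}
```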


Multiple variables at once

summary

summary(object, ...)

Your turn

- Try summary on the dataset (data frame).
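A sketch of summary() on a small made-up data frame (toy, age, and ht are hypothetical names and values). Applied to a data frame it summarizes every column; applied to a single numeric vector it returns six named values:

```r
# Hypothetical data for illustration only
toy <- data.frame(
  age = c(20, 30, 40, 50, 60),
  ht  = c(150, 160, 165, 170, 180)
)

summary(toy)              # min, quartiles, median, mean, max for each column

s <- summary(toy$age)     # same six numbers for one variable
age_median <- s[["Median"]]
age_mean   <- s[["Mean"]]
```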


describe

library(psych)

describe(x, na.rm = TRUE, interp = FALSE, skew = TRUE, ranges = TRUE, trim = .1, type = 3)

type = 2: SAS; type = 1: Stata

Various summary measures

Your turn

- describe(bone[ , -1], type = 2)


describeBy

library(psych)

describeBy(x, group = NULL, mat = FALSE, type = 3, ...)

type = 2: SAS; type = 1: Stata

Groupwise summary

Your turn

- describeBy(bone[ , c(-1)] , bone$zyg , type = 2)


bone[ , c(-1)]: bone data frame without 1st column

bone$zyg: zyg vector for grouping

type = 2: SAS method for skewness and kurtosis
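The idea of a groupwise summary can be sketched in base R with tapply(), which is a swapped-in stand-in here and not what describeBy() does internally. The psych calls are shown as well but only run if the package is installed. All data below are hypothetical (toy, age, ht, and the zyg codes are made up):

```r
# Hypothetical stand-in for the bone data frame
toy <- data.frame(
  id  = 1:8,
  age = c(20, 30, 40, 50, 35, 45, 55, 85),
  ht  = c(150, 155, 160, 165, 158, 162, 170, 175),
  zyg = rep(1:2, each = 4)
)

# Base-R groupwise mean and sd of age, grouped by zyg
mean_by_zyg <- tapply(toy$age, toy$zyg, mean)
sd_by_zyg   <- tapply(toy$age, toy$zyg, sd)

# psych versions, run only when the package is available
if (requireNamespace("psych", quietly = TRUE)) {
  vars <- toy[, c("age", "ht")]                        # variables to summarize
  print(psych::describe(vars, type = 2))               # whole sample
  print(psych::describeBy(vars, group = toy$zyg, type = 2))   # one table per zyg level
}
```

Selecting the variables by name keeps the id and the grouping column out of the summary tables; the slide's bone[ , c(-1)] drops only the first column.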

Ingredients

Statistics

- Summary statistics for continuous data
  - Normal data
  - Non-normal data
- Normality check

Programming

- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
  - skewness(), kurtosis()
  - summary()
  - describe(), describeBy()

Recommended