Descriptive Statistics with R
2012-10-12 @HSPH
Kazuki Yoshida, M.D., MPH-CLE student
FREEDOM TO KNOW
Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
Previously in this group
- Introduction to R
- Reading Data into R (1)
- Reading Data into R (2)
Menu
- mean and sd
- median, quantiles, IQR, max, min, and range
- skewness and kurtosis
- smarter ways of doing these
Ingredients
Statistics
- Summary statistics for continuous data
- Normal data
- Non-normal data
- Normality check
Programming
- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
- skewness(), kurtosis()
- summary()
- describe(), describeBy()
http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html
Data loaded
What’s next?
Descriptive Statistics
http://www.ehow.com/info_8650637_descriptive-statistical-methods.html
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data.
http://en.wikipedia.org/wiki/Descriptive_statistics
Open RStudio
http://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20bI&product_isbn_issn=9780538733496
Download comma-separated and Excel
BONEDEN.DAT.txt
Put them in folder
Read in BONEDEN.DAT.txt
Name it bone
Accessing a single variable in data set
DATA$VAR
e.g., mean(bone$age)
DATA: dataset name; VAR: variable name
vector
http://healthy-india.org/enviromentalhealth/direct_indirect2.html
like strings with values attached
1 2 3 4 5 6 7 8
OR
“A” “B” “C” “D” “E” “F” “G” “H”
DATA$VAR is a vector
Multiple vectors of same length tied together
1 2 3 4 5 6 7 8
“A” “B” “C” “D” “E” “F” “G” “H”
1 2 3 4 5 6 7 8
“A” “B” “C” “D” “E” “F” “G” “H”
Tied here
DATA is a data frame
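The picture of vectors tied together can be tried directly in R. This is a minimal sketch with made-up toy values (not the BONEDEN data); the names num, chr, and toy are assumptions chosen for illustration.

```r
# Toy values, made up for illustration (not the BONEDEN data).
num <- c(1, 2, 3, 4, 5, 6, 7, 8)                   # a numeric vector
chr <- c("A", "B", "C", "D", "E", "F", "G", "H")   # a character vector

# Tie two vectors of the same length together into a data frame.
toy <- data.frame(num = num, chr = chr, stringsAsFactors = FALSE)

# DATA$VAR pulls one column back out as a vector.
toy$num
toy$chr
```

Each column keeps its own type, which is why a data frame can hold numbers and text side by side.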
Indexing: extraction of data from data frame
bone[1:15 , 1:12]
Extract 1st to 15th rows; extract 1st to 12th columns
Don’t forget the comma between rows and columns; use a colon between the first and last index
age vector within bone data frame
bone$age
Extracted as a vector
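A short sketch of both extraction styles, assuming a small made-up data frame named toy in place of bone (the column names id, age, and zyg and all values are placeholders).

```r
# Hypothetical stand-in for bone: 20 rows, 3 columns (values are made up).
toy <- data.frame(id = 1:20, age = 41:60, zyg = rep(c(1, 2), 10))

first_rows <- toy[1:15, 1:2]   # rows 1 to 15, columns 1 to 2
all_cols   <- toy[1:15, ]      # empty column slot keeps every column
no_id      <- toy[, -1]        # a negative index drops that column
age_vec    <- toy$age          # a single column, extracted as a vector

dim(first_rows)
head(age_vec)
```

The comma is still required when one side is left empty; without it, R treats the index as a column selection.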
mean
mean(x, trim = 0, na.rm = FALSE)
Your turn
- What is the mean of age?
adapted from Hadley Wickham
sd
sd(x, na.rm = FALSE)
Your turn
- What is the sd of age?
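The na.rm and trim arguments are easiest to see on a tiny vector. The sketch below uses made-up ages, not the bone data, so it does not answer the exercises above; it only shows how the arguments behave.

```r
# Made-up ages with one missing value (not the bone data).
age <- c(30, 40, 50, 60, NA)

m_with_na <- mean(age)                 # NA: missing values propagate by default
m         <- mean(age, na.rm = TRUE)   # 45, computed after dropping the NA
s         <- sd(age, na.rm = TRUE)     # sample sd (n - 1 denominator)

# trim = 0.2 drops 20% of the observations from each end before averaging.
m_trim <- mean(c(1, 2, 3, 4, 100), trim = 0.2)   # mean of 2, 3, 4
```

On the real data the same calls would take bone$age as x.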
median
median(x, na.rm = FALSE)
Your turn
- What is the median of age?
quantile
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
         names = TRUE, type = 7)
0th, 25th, 50th, 75th, and 100th percentiles by default
Your turn
- What are the 25th and 75th percentiles of age?
IQR
IQR(x, na.rm = FALSE, type = 7)
75th percentile - 25th percentile
Your turn
- What is the IQR of age?
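A minimal sketch of median(), quantile(), and IQR() on a made-up vector (not the bone data), using the default type = 7 definition shown above.

```r
# Made-up ages (not the bone data).
age <- c(20, 30, 40, 50, 60)

med <- median(age)                           # middle value
q   <- quantile(age, probs = c(0.25, 0.75))  # only the requested percentiles
iqr <- IQR(age)                              # 75th minus 25th percentile

q
iqr
```

Passing probs explicitly avoids printing all five default percentiles when only the quartiles are wanted.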
max
max(..., na.rm = FALSE)
min
min(..., na.rm = FALSE)
Your turn
- What are the minimum and maximum of age?
range
range(..., na.rm = FALSE)
Your turn
- What is the range of age?
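Note that range() returns the minimum and the maximum as a pair, not their difference. A small sketch on made-up values (not the bone data):

```r
# Made-up ages with one missing value (not the bone data).
age <- c(20, 30, NA, 60)

lo    <- min(age, na.rm = TRUE)     # smallest non-missing value
hi    <- max(age, na.rm = TRUE)     # largest non-missing value
r     <- range(age, na.rm = TRUE)   # c(min, max) in one call
width <- diff(r)                    # max minus min, if a single number is wanted
```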
We now resort to external packages
e1071, psych
Install and Load
To load a package by command
library(package)
package: package name here
double quotes “” can be omitted
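A sketch of the install-then-load pattern. The install line is commented out because it needs an internet connection and only has to be run once per machine. The runnable part uses stats, which ships with R, purely as a stand-in so the example works anywhere; on your machine you would substitute e1071 or psych.

```r
# Run once per machine (commented out here; requires internet access):
# install.packages(c("e1071", "psych"))

# Run in every new R session. Both forms load the same package.
library(stats)     # package name without quotes
library("stats")   # package name with quotes

# Check what is currently attached.
search()
```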
Assessment of normality
Load e1071 package
skewness
library(e1071)
skewness(x, na.rm = FALSE, type = 3)
type = 2: SAS; type = 1: Stata
kurtosis
library(e1071)
kurtosis(x, na.rm = FALSE, type = 3)
type = 2: SAS; type = 1: Stata
Your turn
- What are the skewness and kurtosis of age by the Stata method?
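To see what the type argument changes, the three definitions can be written out in base R. This is a sketch based on the formulas given in the e1071 help pages; skew_by_type and kurt_by_type are hypothetical helper names, not functions from e1071, and unlike the package functions they always drop missing values. The kurtosis returned is excess kurtosis (3 is subtracted).

```r
# Hypothetical helpers (not part of e1071), written from the definitions
# in the e1071 help pages. Missing values are always dropped here.
skew_by_type <- function(x, type = 3) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  g1 <- mean(d^3) / mean(d^2)^1.5                  # moment-based skewness
  switch(as.character(type),
         "1" = g1,                                 # type 1 (Stata on the slide)
         "2" = g1 * sqrt(n * (n - 1)) / (n - 2),   # type 2 (SAS on the slide)
         "3" = g1 * ((n - 1) / n)^1.5,             # type 3 (e1071 default)
         stop("type must be 1, 2, or 3"))
}

kurt_by_type <- function(x, type = 3) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  g2 <- mean(d^4) / mean(d^2)^2 - 3                # moment-based excess kurtosis
  switch(as.character(type),
         "1" = g2,
         "2" = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
         "3" = (g2 + 3) * (1 - 1 / n)^2 - 3,
         stop("type must be 1, 2, or 3"))
}

x <- c(1, 2, 3, 4, 5)      # made-up symmetric data, so skewness is 0
skew_by_type(x, type = 1)
kurt_by_type(x, type = 1)
```

With e1071 loaded, skewness(x, type = 1) and kurtosis(x, type = 1) should give the same numbers for data without missing values.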
Multiple variables at once
summary
summary(object, ...)
Your turn
- Try summary on the dataset (data frame).
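A sketch of summary() on a single vector and on a whole data frame, using a made-up data frame named toy (the columns age and ht and their values are placeholders, not the bone data).

```r
# Made-up data frame (not the bone data).
toy <- data.frame(age = c(20, 30, 40, 50, 60),
                  ht  = c(150, 160, 170, 180, 190))

s_age <- summary(toy$age)   # min, quartiles, median, mean, max for one vector
s_all <- summary(toy)       # the same set of statistics for every column

s_age
s_all
```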
describe
library(psych)
describe(x, na.rm = TRUE, interp = FALSE, skew = TRUE,
         ranges = TRUE, trim = .1, type = 3)
type = 2: SAS; type = 1: Stata
Various summary measures
Your turn
- describe(bone[,-1], type = 2)
describeBy
library(psych)
describeBy(x, group = NULL, mat = FALSE, type = 3, ...)
type = 2: SAS; type = 1: Stata
Groupwise summary
Your turn
- describeBy(bone[ , c(-1)] , bone$zyg , type = 2)
bone[ , c(-1)]: bone data frame without the 1st column
bone$zyg: zyg vector for grouping
type = 2: SAS method for skewness and kurtosis
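A sketch of describe() and describeBy() on a made-up data frame named toy (the columns id, zyg, age, and ht and all values are placeholders, not the bone data). The psych calls are guarded so the script still runs where psych is not installed. The last lines compute a groupwise mean with base R's tapply(), which is a different technique from describeBy() and is included only as a quick cross-check of one statistic.

```r
# Made-up data frame (not the bone data): two groups of four.
toy <- data.frame(id  = 1:8,
                  zyg = rep(c(1, 2), each = 4),
                  age = c(20, 30, 40, 50, 60, 70, 80, 90),
                  ht  = c(150, 155, 160, 172, 165, 170, 181, 190))

vars <- toy[, c("age", "ht")]   # the variables to summarise

# Only run the psych calls if the package is available.
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(vars, type = 2))              # whole-sample summary
  print(psych::describeBy(vars, toy$zyg, type = 2))   # one summary per zyg group
}

# Base R cross-check: mean age within each zyg group.
age_by_zyg <- tapply(toy$age, toy$zyg, mean)
age_by_zyg
```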
Ingredients
Statistics
- Summary statistics for continuous data
- Normal data
- Non-normal data
- Normality check
Programming
- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
- skewness(), kurtosis()
- summary()
- describe(), describeBy()