
Correlation & Regression


Page 1: Correlation & Regression

Correlation & Regression

Page 2: Correlation & Regression

Correlation

• Measures the strength of the linear relation between 2 random variables (X & Y)
• ρ = Corr(X,Y) = Cov(X,Y)/(σx σy)
• ρ = E[(X - μx)(Y - μy)] / [E(X - μx)² E(Y - μy)²]^(1/2)
• Standardized Cov(X,Y), so -1 ≤ ρ ≤ 1

Page 3: Correlation & Regression

Strength of ρ

• ρ = -1: perfect negative linear relation
• ρ = 1: perfect positive linear relation
• ρ = 0: no linear relation
• As |ρ| increases, so does the strength of the relationship

Page 4: Correlation & Regression

Sample

• Cov(X,Y) = 1/(n-1) Σ(xᵢ - x̄)(yᵢ - ȳ)
• Corr(X,Y) = r = Σ(xᵢ - x̄)(yᵢ - ȳ) / [Σ(xᵢ - x̄)² Σ(yᵢ - ȳ)²]^(1/2)
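The two sample formulas above can be checked numerically. The following is a minimal sketch in plain Python; the function names `sample_cov` and `sample_corr` are illustrative choices, and the data are the inches/centimeter pairs from the deck's SAS listing, which are exactly linear and so should give r = 1.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Sample correlation r, following the slide's formula term by term."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Inches/centimeter pairs copied from the deck's SAS listing.
inches = [1, 2, 24, 4, 5, 16, 7, 8, 19, 10, 20, 25]
centimeters = [2.54, 5.08, 60.96, 10.16, 12.7, 40.64,
               17.78, 20.32, 48.26, 25.4, 50.8, 63.5]

print(sample_cov(inches, centimeters))
print(round(sample_corr(inches, centimeters), 6))  # 1.0
```

Because centimeters = 2.54 × inches for every pair, the covariance is 2.54 times the sample variance of inches, and standardizing it yields a correlation of 1.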

Page 5: Correlation & Regression

Hypothesis Test

• Null: H₀: ρ = 0
• Test statistic: t = r√(n-2) / √(1-r²), with n-2 degrees of freedom
• Alternative: H_A: ρ ≠ 0; reject H₀ if |t| > t(n-2, α/2)
• Alternative: H_A: ρ > 0; reject H₀ if t > t(n-2, α)
• Alternative: H_A: ρ < 0; reject H₀ if t < -t(n-2, α)
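The rejection rules on this slide can be sketched as a small function. This is an illustration under stated assumptions: the statistic is the usual t = r√(n-2)/√(1-r²) for testing ρ = 0, the names `corr_t_statistic` and `reject_null` are hypothetical, the critical value is looked up in a t table by the caller, and the inputs r = 0.5 and n = 27 are made-up numbers rather than data from the deck.

```python
import math

def corr_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); undefined when |r| == 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def reject_null(t, t_crit, alternative="two-sided"):
    """Apply the slide's rejection rules for H0: rho = 0.

    t_crit must match the alternative: t(n-2, alpha/2) for "two-sided",
    t(n-2, alpha) for "greater" or "less".
    """
    if alternative == "two-sided":
        return abs(t) > t_crit
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# Hypothetical inputs for illustration only: r = 0.5 observed on n = 27 pairs.
r, n = 0.5, 27
t = corr_t_statistic(r, n)

# 2.060 is the tabulated t(25, 0.025), i.e. a two-sided test at alpha = 0.05.
print(round(t, 3), reject_null(t, 2.060))  # 2.887 True
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could supply it instead.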

Page 6: Correlation & Regression

Rank Correlation (Spearman's)

• Sample correlation (r) can be affected by extreme observations
• Spearman's Rank
– First rank the xᵢ and the yᵢ, then calculate the sample correlation of these ranks
– r_s = 1 - [6 Σdᵢ² / (n(n² - 1))]
– Where dᵢ = the difference between the ranks of the i-th pair
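Both routes to Spearman's coefficient described above can be compared directly. The sketch below is an illustration, not a reference implementation: the helper names are hypothetical, tied values receive the average of their ranks (a common convention the slide does not specify), and the shortcut formula with Σd² is exact only when there are no ties. The toy data are made up so the arithmetic is easy to verify by hand.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Ordinary sample correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_by_ranks(xs, ys):
    """Sample correlation of the ranks (works with or without ties)."""
    return pearson(average_ranks(xs), average_ranks(ys))

def spearman_shortcut(xs, ys):
    """1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only when there are no ties."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical tie-free toy data: rank differences are -1, 1, -1, 1, 0.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman_by_ranks(x, y), 6), round(spearman_shortcut(x, y), 6))  # 0.8 0.8
```

Here Σd² = 4, so the shortcut gives 1 - 24/120 = 0.8, matching the correlation of the ranks.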

Page 7: Correlation & Regression

Linear Regression

• Find/define the relationship between the dependent variable and the independent variable
• Use the independent variable to explain the behavior of the dependent variable
• Separate variation in the data into explained variation and unexplained variation (noise)
• Predict the value of the dependent variable given a value for the independent variable

Page 8: Correlation & Regression

Linear Regression Model

• Predict Y given X
• E(Y|X=x) = β₀ + β₁x
• Yᵢ = β₀ + β₁xᵢ + εᵢ
• Assumptions: the εᵢ are random variables
– E[εᵢ] = 0
– E[εᵢ²] = σ²
– E[εᵢ εₖ] = 0 for i ≠ k; they are uncorrelated
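The slide states the model but not how β₀ and β₁ are estimated. A minimal sketch, assuming the standard least-squares estimates b₁ = Σ(xᵢ - x̄)(yᵢ - ȳ)/Σ(xᵢ - x̄)² and b₀ = ȳ - b₁x̄, is shown below. The function name `fit_line` is hypothetical; the data are the inches/centimeter pairs from the deck's SAS listing, fitted in the same direction as its `Model inches=centimeter` statement.

```python
def fit_line(xs, ys):
    """Return (b0, b1), the least-squares intercept and slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Inches/centimeter pairs copied from the deck's SAS listing.
inches = [1, 2, 24, 4, 5, 16, 7, 8, 19, 10, 20, 25]
centimeters = [2.54, 5.08, 60.96, 10.16, 12.7, 40.64,
               17.78, 20.32, 48.26, 25.4, 50.8, 63.5]

# Dependent variable: inches; independent variable: centimeter.
b0, b1 = fit_line(centimeters, inches)
print("intercept:", b0)
print("slope:", b1)
```

Since the data satisfy inches = centimeter / 2.54 exactly, the slope should come out close to 1/2.54 and the intercept close to 0, up to floating-point rounding.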

Page 9: Correlation & Regression

Sum of Squares

• Total sum of squares = Regression sum of squares + Error sum of squares
• SST = SSR + SSE
• Σ(yᵢ - ȳ)² = Σ(ŷᵢ - ȳ)² + Σeᵢ²

Page 10: Correlation & Regression

Coefficient of Determination (R²)

• Measures how well x explains the variation in Y
• R² = SSR/SST = 1 - SSE/SST = r²
• R² is the proportion of the variation in the data that is explained by the regression
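The decomposition SST = SSR + SSE and the three expressions for R² can be verified numerically. The sketch below is one way to do it under the assumption of an ordinary least-squares fit; `regression_sums` is a hypothetical helper name, and the data are the GRE/GPA pairs from the deck's SAS listing, fitted as GPA on GRE.

```python
def regression_sums(xs, ys):
    """Return (SST, SSR, SSE, r^2) for a least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained
    r_squared_from_r = sxy ** 2 / (sxx * sst)  # square of the sample correlation
    return sst, ssr, sse, r_squared_from_r

# GRE/GPA pairs copied from the deck's SAS listing.
gre = [2100, 1920, 2290, 1580, 1400, 1300, 2020, 1060, 1500, 1900, 1900,
       1800, 2200, 1990, 2000, 1650, 1640, 1800, 2300, 2000, 2000]
gpa = [4, 3.8, 3.8, 3.9, 3.77, 3.95, 3.8, 3.54, 3, 4, 3.7,
       3.5, 4, 3.51, 4, 3.8, 3.75, 3.9, 3.91, 3.75, 3.9]

sst, ssr, sse, r2 = regression_sums(gre, gpa)
print("SST - (SSR + SSE):", sst - (ssr + sse))
print("SSR/SST:", ssr / sst)
print("1 - SSE/SST:", 1 - sse / sst)
print("r^2:", r2)
```

The first printed value should be zero up to rounding error, and the last three should agree with each other.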

Page 11: Correlation & Regression

Confidence Interval

• Error variance: S²ₑ = Σeᵢ²/(n-2) = SSE/(n-2)
• Unbiased estimate of σ²_b: S²_b = S²ₑ / Σ(xᵢ - x̄)²
• t = (b - β)/S_b
• C.I. for regression slope: b - t(n-2, α/2) S_b < β < b + t(n-2, α/2) S_b
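The interval above can be computed step by step from the slide's quantities. The following sketch assumes a least-squares fit and takes the critical value from a t table; `slope_confidence_interval` is a hypothetical name, and the data are the GRE/GPA pairs from the deck's SAS listing (n = 21, so 19 degrees of freedom).

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Return (b, S_b, (lower, upper)) for the slope of y on x.

    t_crit is t(n-2, alpha/2), supplied by the caller from a t table.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope estimate
    a = y_bar - b * x_bar              # intercept estimate
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2_e = sse / (n - 2)               # error variance S_e^2
    s_b = math.sqrt(s2_e / sxx)        # standard error of the slope
    return b, s_b, (b - t_crit * s_b, b + t_crit * s_b)

# GRE/GPA pairs copied from the deck's SAS listing.
gre = [2100, 1920, 2290, 1580, 1400, 1300, 2020, 1060, 1500, 1900, 1900,
       1800, 2200, 1990, 2000, 1650, 1640, 1800, 2300, 2000, 2000]
gpa = [4, 3.8, 3.8, 3.9, 3.77, 3.95, 3.8, 3.54, 3, 4, 3.7,
       3.5, 4, 3.51, 4, 3.8, 3.75, 3.9, 3.91, 3.75, 3.9]

# 2.093 is the tabulated t(19, 0.025), giving a 95% interval.
b, s_b, (lower, upper) = slope_confidence_interval(gre, gpa, 2.093)
print("slope:", b)
print("standard error:", s_b)
print("95% C.I.:", lower, upper)
```

If the resulting interval contains 0, the two-sided test of β = 0 at the same α would not reject.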

Page 12: Correlation & Regression

Regression Slope Tests

• H₀: β = β₀ or H₀: β ≤ β₀ vs. H₁: β > β₀
• Reject H₀ if (b - β₀)/S_b > t(n-2, α)
• H₀: β = β₀ or H₀: β ≥ β₀ vs. H₁: β < β₀
• Reject H₀ if (b - β₀)/S_b < -t(n-2, α)
• H₀: β = β₀ vs. H₁: β ≠ β₀
• Reject H₀ if (b - β₀)/S_b > t(n-2, α/2) or (b - β₀)/S_b < -t(n-2, α/2)
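The three decision rules above differ only in the direction of the comparison and in which critical value is used. A small sketch follows; the function names are hypothetical, β₀ denotes the hypothesized slope as on the slide, and the numbers b = 0.8 and S_b = 0.4 are made up for illustration, with critical values for 10 degrees of freedom taken from a t table.

```python
def slope_t_statistic(b, s_b, beta_0=0.0):
    """t = (b - beta_0) / S_b, where beta_0 is the hypothesized slope."""
    return (b - beta_0) / s_b

def slope_test(b, s_b, beta_0, t_crit, alternative):
    """Return True when H0 is rejected under the slide's rules.

    Use t_crit = t(n-2, alpha) for "greater" or "less",
    and t_crit = t(n-2, alpha/2) for "two-sided".
    """
    t = slope_t_statistic(b, s_b, beta_0)
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit
    if alternative == "two-sided":
        return t > t_crit or t < -t_crit
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Hypothetical estimates for illustration only; t = 0.8 / 0.4 = 2.0.
b, s_b = 0.8, 0.4

# Tabulated values for 10 degrees of freedom: t(10, 0.05) = 1.812, t(10, 0.025) = 2.228.
print(slope_test(b, s_b, 0.0, 1.812, "greater"))    # True: 2.0 > 1.812
print(slope_test(b, s_b, 0.0, 2.228, "two-sided"))  # False: |2.0| < 2.228
```

The same estimate can be significant in a one-sided test and not in a two-sided one, because the two-sided rule splits α between both tails.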

Page 13: Correlation & Regression

SAS: Inches-Centimeter

/* Read the paired measurements */
Data Height;
Input inches centimeter;
Datalines;
1 2.54
2 5.08
24 60.96
4 10.16
5 12.7
16 40.64
7 17.78
8 20.32
19 48.26
10 25.4
20 50.8
25 63.5
;
/* Scatter plot: inches (vertical) against centimeter (horizontal) */
Proc Plot Data=Height;
Plot inches*centimeter;
/* Sample correlation between the two variables */
Proc Corr Data=Height;
Title 'Correlation Matrix of Inches vs. Centimeter';
Var inches centimeter;
/* Regress inches on centimeter; overlay fitted values, 95% limits for the mean, and data */
Proc Reg Data=Height;
Title 'Regression Line for Inches-Centimeter Data';
Model inches=centimeter;
Plot Predicted.*centimeter = 'P'
     U95M.*centimeter = '-' L95M.*centimeter = '_'
     inches*centimeter = '*' / overlay;
/* Residuals against the independent variable */
Plot Residual.*centimeter = 'o';
Quit;

Page 14: Correlation & Regression

SAS: GRE-GPA Data

/* Read the paired GRE scores and GPAs */
Data GRE_GPA;
Input GRE GPA;
Datalines;
2100 4
1920 3.8
2290 3.8
1580 3.9
1400 3.77
1300 3.95
2020 3.8
1060 3.54
1500 3
1900 4
1900 3.7
1800 3.5
2200 4
1990 3.51
2000 4
1650 3.8
1640 3.75
1800 3.9
2300 3.91
2000 3.75
2000 3.9
;
/* Scatter plot: GRE (vertical) against GPA (horizontal) */
Proc Plot Data=GRE_GPA;
Plot GRE*GPA;
/* Sample correlation between the two variables */
Proc Corr Data=GRE_GPA;
Title 'Correlation Matrix of GRE vs. GPA';
Var GRE GPA;
/* Regress GPA on GRE; overlay fitted values, 95% limits for the mean, and data */
Proc Reg Data=GRE_GPA;
Title 'Regression Line for GRE-GPA Data';
Model GPA=GRE;
Plot Predicted.*GRE = 'P'
     U95M.*GRE = '-' L95M.*GRE = '_'
     GPA*GRE = '*' / overlay;
/* Residuals against the independent variable */
Plot Residual.*GRE = 'o';
Quit;