Correlation
Hal Whitehead
BIOL4062/5062
• The correlation coefficient
• Tests
• Non-parametric correlations
• Partial correlation
• Multiple correlation
• Autocorrelation
• Many correlation coefficients
The correlation coefficient
Linked observations: x1,x2,...,xn    y1,y2,...,yn
Mean: x̄ = Σ xi / n    ȳ = Σ yi / n
Variance: S²(x) = Σ(xi-x̄)²/(n-1)    S²(y) = Σ(yi-ȳ)²/(n-1)
Standard Deviation: S(x) = √S²(x)    S(y) = √S²(y)
Covariance: S²(x,y) = Σ(xi-x̄) ∙ (yi-ȳ) / (n-1)
Correlation coefficient (“Pearson” or “product-moment”):
r = {Σ(xi-x̄) ∙ (yi-ȳ) / (n-1)} / {S(x) ∙ S(y)}
r = S²(x,y) / {S(x) ∙ S(y)}
-1 ≤ r ≤ +1
If no linear relationship: r = 0
r²: proportion of variance accounted for by linear regression
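The definitions above translate almost line for line into code. The following is a minimal sketch in plain Python; the function name `pearson_r` and the example data are illustrative assumptions, not part of the lecture.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation, following the slide definitions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample variances and covariance, each with the n-1 divisor.
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # r = S²(x,y) / {S(x) ∙ S(y)}
    return cov_xy / (sqrt(var_x) * sqrt(var_y))

# Hypothetical paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

Because the n-1 divisors cancel, the same value results from dividing the sum of cross-products by the square root of the product of the two sums of squares.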
[Figure: example scatterplots with r = -0.01, 0.38, -0.31, 0.95, 0.04, 0.64, -0.46, 0.99 and -0.0]
Tests on Correlation Coefficients
• Assume:
– Independence
– Bivariate Normality
• Then:
z = Ln[(1+r)/(1-r)] / 2 (Fisher's z transformation) is approximately normally distributed with variance 1/(n-3)
And, if ρ (true population value of r) = 0:
r ∙ √(n-2) / √(1-r²) is distributed as Student's t with n-2 degrees of freedom
We can test:
a) r ≠ 0
b) r > 0 or r < 0
c) r = constant
d) r(x,y) = r(z,w)
Also confidence intervals for r
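A minimal sketch of how the z transformation and the t statistic above can be used, in standard-library Python only. The function names are illustrative, the sample size n = 20 is a hypothetical value (the slides do not state n), and the test against a constant uses the normal approximation for z rather than an exact distribution.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def fisher_z(r):
    """Fisher's z = Ln[(1+r)/(1-r)] / 2, which equals atanh(r)."""
    return atanh(r)

def confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for r, built on the z scale."""
    z = fisher_z(r)
    se = 1.0 / sqrt(n - 3)                      # variance of z is 1/(n-3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return tanh(z - crit * se), tanh(z + crit * se)

def t_statistic(r, n):
    """t for testing rho = 0; refer to Student's t with n-2 df."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)

def p_value_vs_constant(r, n, rho0):
    """Two-sided P for the hypothesis rho = rho0 (normal approximation)."""
    z_stat = (fisher_z(r) - fisher_z(rho0)) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))

# Hypothetical example: r = 0.75 from an assumed sample of n = 20.
low, high = confidence_interval(0.75, 20)
print(f"95% C.I. {low:.2f} to {high:.2f}")
print(f"t = {t_statistic(0.75, 20):.2f} with {20 - 2} df")
print(f"P (rho = 0.5) = {p_value_vs_constant(0.75, 20, 0.5):.3f}")
```

A one-sided test (r > 0 or r < 0) halves the two-sided P when the observed sign agrees with the alternative.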
Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)
[Figure: scatterplot of Relative Melon Area against Sexual Size Dimorphism]
r = 0.75
(SE = 0.15)
(95% C.I. 0.47-0.89)
Tests:
r ≠ 0 : P = 0.0001
r > 0 : P = 0.00005
More sexually dimorphic species have relatively larger melons
Why do Large Animals have Large Brains?
(Schoenemann Brain Behav. Evol. 2004)
• Correlations among mammals
– Log brain size with
• Log muscle mass r=0.984
• Log fat mass r=0.942
• Are these significantly different?
t=5.50; df=36; P<0.01
(Hotelling-Williams test)
• Brain mass is more closely related to muscle than fat
[Figure: Brain mass (g) against Fat/Muscle mass (g), logarithmic axes, with separate series for Muscle and Fat]
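Comparing two correlations that share a variable (here brain size with muscle, and brain size with fat) also needs the correlation between the two non-shared variables, which the slide does not report. The sketch below uses the Williams form of the t statistic as commonly given for dependent correlations; the function name, the value r(muscle, fat) = 0.93 and the sample size n = 39 (inferred from df = n - 3 = 36) are assumptions for illustration, so the output is not expected to reproduce t = 5.50.

```python
from math import sqrt

def williams_t(r_jk, r_jh, r_kh, n):
    """t statistic for r(j,k) versus r(j,h) when both share variable j.

    r_kh is the correlation between the two non-shared variables.
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3x3 correlation matrix of j, k and h.
    det = 1.0 - r_jk**2 - r_jh**2 - r_kh**2 + 2.0 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2.0
    denom = 2.0 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1.0 - r_kh) ** 3
    t = (r_jk - r_jh) * sqrt((n - 1) * (1.0 + r_kh) / denom)
    return t, n - 3

# 0.984 and 0.942 are from the slide; 0.93 and n = 39 are ASSUMED values.
t, df = williams_t(r_jk=0.984, r_jh=0.942, r_kh=0.93, n=39)
print(f"t = {t:.2f}, df = {df}")
```

The result is sensitive to the assumed r(muscle, fat): the more strongly the two non-shared variables are correlated, the smaller the difference needed for significance.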
Non-Parametric Correlation
• If one variable normally distributed
– can test r=0 as before.
• If neither normally distributed:
– Spearman's rS rank correlation coefficient (replace values by ranks)
or:
– Kendall's τ correlation coefficient
• Use Spearman's when there is less certainty about the close rankings
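Both rank coefficients are straightforward to compute by hand. The sketch below is an illustration in plain Python with hypothetical data: Spearman's rS is the Pearson correlation of the ranks (tied values get their average rank), and Kendall's τ is computed in its simplest form (often called tau-a), which makes no adjustment for ties.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rS: Pearson correlation after replacing values by ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / total pairs."""
    n = len(x)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)

# Hypothetical data: a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(pearson_r(x, y), spearman_rs(x, y), kendall_tau_a(x, y))
```

For a perfectly monotonic relationship both rank coefficients equal 1 even though the Pearson r is below 1.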
Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)
r = 0.75
rS = 0.62
τ = 0.47
[Figure: scatterplot of Relative Melon Area against Sexual Size Dimorphism]
Partial Correlation
• Correlation between X and Y controlling for Z
r (X,Y|Z) = {r(X,Y) - r(X,Z)∙r(Y,Z)} / √{(1 - r(X,Z)²)∙(1 - r(Y,Z)²)}
• Correlation between X and Y controlling for W,Z
r (X,Y|W,Z) = {r(X,Y|W) - r(X,Z|W)∙r(Y,Z|W)} / √{(1 - r(X,Z|W)²)∙(1 - r(Y,Z|W)²)}
n-2-c degrees of freedom
(c is number of control variables)
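The two formulas above are the same calculation applied recursively: controlling for W and Z means applying the first-order formula to correlations that have already been controlled for W. A minimal sketch, with illustrative function names and hypothetical correlation values:

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r(X,Y|Z)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_r_two_controls(r_xy, r_xz, r_xw, r_yz, r_yw, r_zw):
    """Second-order partial correlation r(X,Y|W,Z), built recursively."""
    r_xy_w = partial_r(r_xy, r_xw, r_yw)   # r(X,Y|W)
    r_xz_w = partial_r(r_xz, r_xw, r_zw)   # r(X,Z|W)
    r_yz_w = partial_r(r_yz, r_yw, r_zw)   # r(Y,Z|W)
    return partial_r(r_xy_w, r_xz_w, r_yz_w)

def partial_t(r_partial, n, c):
    """t statistic for a partial correlation, with n-2-c degrees of freedom."""
    df = n - 2 - c
    return r_partial * sqrt(df) / sqrt(1 - r_partial**2), df

# Hypothetical pairwise correlations among X, Y, Z and W.
r1 = partial_r(0.6, 0.4, 0.5)
r2 = partial_r_two_controls(0.6, 0.4, 0.3, 0.5, 0.2, 0.1)
print(f"r(X,Y|Z) = {r1:.3f}, r(X,Y|W,Z) = {r2:.3f}")
print("t, df =", partial_t(r2, n=40, c=2))
```

The order in which the control variables are removed does not matter: swapping the roles of W and Z gives the same second-order coefficient.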
Why do Large Animals have Large Brains?
(Schoenemann Brain Behav. Evol. 2004)
• Correlations among mammals
– Log brain size with
• Log muscle mass, controlling for Log body mass: r=0.466
• Log fat mass, controlling for Log body mass: r=-0.299
• Fatter species have relatively smaller brains and more muscular species relatively larger brains
Semi-partial Correlation Coefficient
• Correlation between X & Y controlling Y for Z
r (X,(Y|Z)) = {r(X,Y) - r(X,Z)∙r(Y,Z)} / √(1 - r(Y,Z)²)
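The semi-partial coefficient differs from the partial coefficient only in its denominator, since Z is removed from Y alone. A short sketch with an illustrative function name and hypothetical correlation values:

```python
from math import sqrt

def semipartial_r(r_xy, r_xz, r_yz):
    """Semi-partial correlation r(X,(Y|Z)): Z is removed from Y only."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_yz**2)

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation r(X,Y|Z), shown for comparison."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical correlations.
r_xy, r_xz, r_yz = 0.6, 0.4, 0.5
print(f"semi-partial = {semipartial_r(r_xy, r_xz, r_yz):.3f}")
print(f"partial      = {partial_r(r_xy, r_xz, r_yz):.3f}")
```

Since the semi-partial equals the partial multiplied by √(1 - r(X,Z)²), its absolute value never exceeds that of the partial coefficient.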
Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)
(SSD = sexual size dimorphism, MA = melon area, L = length)
Correlation
r = 0.75
Partial Correlation
r (SSD,MA|L) = 0.73
Semi-partial Correlations
r (SSD,(MA|L)) = 0.69
r ((SSD |L),MA) = 0.71
[Figure: scatterplot matrix of MELAREA, LENGTH and SSD]
Multiple Correlation
Multiple Correlation Coefficient
• Correlation between one dependent variable and its best estimate from a regression on several independent variables:
r(Y∙X1,X2,X3,...)
• Square of multiple correlation coefficient is:
– proportion of variance accounted for by multiple regression
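For the special case of two independent variables, the squared multiple correlation can be written directly in terms of the three pairwise correlations, without fitting the regression. The sketch below uses that closed form; the function name and the input values are hypothetical, and more than two predictors would need a full regression instead.

```python
from math import sqrt

def multiple_r_squared(r_y1, r_y2, r_12):
    """Squared multiple correlation of Y on X1 and X2 from pairwise r values."""
    if abs(r_12) >= 1.0:
        raise ValueError("the two predictors must not be perfectly correlated")
    return (r_y1**2 + r_y2**2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12**2)

# Hypothetical correlations: Y with X1, Y with X2, and X1 with X2.
r_squared = multiple_r_squared(0.5, 0.5, 0.5)
print(f"R squared = {r_squared:.3f}, R = {sqrt(r_squared):.3f}")
```

When the two predictors are uncorrelated, the expression reduces to the sum of the two squared simple correlations.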
Multiple Partial Correlation Coefficient
!
Autocorrelation
• Purposes
– Examine time series
– Look at (serial) independence
Data
(e.g. Feeding rate on consecutive days,
plankton biomass at each station on a transect):
1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6
Autocorrelation of lag=1 is correlation between:
1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7
1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6
r = 0.508
Autocorrelation of lag=2 is correlation between:
1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9
4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6
r = -0.053
…….
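The lag-1 and lag-2 values quoted above can be reproduced with the usual sample autocorrelation estimator, which uses the mean and sum of squares of the whole series rather than those of each shifted sub-series. That this is the estimator behind the slide's numbers is an inference from the arithmetic, not something the slides state. A plain Pearson correlation between the two shifted series gives a somewhat different value (about 0.62 at lag 1).

```python
from math import sqrt

def autocorrelation(series, lag):
    """Sample autocorrelation using the full-series mean and sum of squares."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    numerator = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

def pearson_r(x, y):
    """Ordinary Pearson correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Data series from the slide.
data = [1.5, 1.7, 4.3, 5.4, 5.7, 6.2, 3.9, 4.4, 5.2, 4.8, 3.9, 3.7, 3.6]

for lag in (1, 2):
    print(f"lag {lag}: r = {autocorrelation(data, lag):.3f}")

# Pearson correlation between the series and itself shifted by one step.
print(f"Pearson, lag 1: {pearson_r(data[:-1], data[1:]):.3f}")
```

Evaluating the function over a range of lags gives the values plotted in a correlogram.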
Autocorrelation Plot (Correlogram)
[Figure: correlogram, Correlation (-1.0 to 1.0) against Lag (0 to 15)]
Many Correlation Coefficients
Many Correlation Coefficients: [Behaviour of Sperm Whale Groups]
        NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L  1.00
SST     0.12    1.00
SHITR   -0.21   -0.33*  1.00
LSPEED  0.10    -0.28+  0.06    1.00
APROP   -0.15   -0.34*  0.07    0.18    1.00
SOCV    -0.05   0.08    -0.16   -0.01   -0.33*  1.00
SHR2    -0.18   -0.12   0.01    -0.20   0.19    -0.03   1.00
LFMECS  0.08    0.14    -0.13   -0.12   -0.22   0.29+   -0.18   1.00
LAERR   -0.10   0.03    -0.21   -0.24   -0.02   0.24    -0.08   0.23    1.00
Listwise deletion, n=40; + P<0.10; * P<0.05; uncorrected
Expected no. with P<0.10 = 3.6; with P<0.05 = 1.8
Listwise deletion, n=40; Bonferroni corrected
With a Bonferroni correction, none of the coefficients in the same matrix is significant at P<0.10 or P<0.05:
P=1.0 for all coefficients
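The expected counts and the effect of the Bonferroni correction follow from simple arithmetic on the number of coefficients: nine variables give 36 distinct pairs. A minimal sketch, in which the function names and the two example P-values are hypothetical:

```python
def n_pairs(k):
    """Number of distinct correlation coefficients among k variables."""
    return k * (k - 1) // 2

def expected_false_positives(k, alpha):
    """Expected number of coefficients with P < alpha if all true r = 0."""
    return n_pairs(k) * alpha

def bonferroni_adjust(p_values, m):
    """Bonferroni-adjusted P-values for m tests, capped at 1.0."""
    return [min(1.0, p * m) for p in p_values]

k = 9  # variables in the matrix
m = n_pairs(k)
print(m, "coefficients")
print("expected with P<0.10:", round(expected_false_positives(k, 0.10), 1))
print("expected with P<0.05:", round(expected_false_positives(k, 0.05), 1))

# Hypothetical uncorrected P-values, for illustration only.
print(bonferroni_adjust([0.03, 0.001], m))
```

An uncorrected P of about 0.03 or more is pushed to the cap of 1.0 once multiplied by 36, which is why a matrix whose strongest coefficients are only modestly significant loses all of them after correction.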
Pairwise deletion, n=59-118; + P<0.10; * P<0.05; uncorrected
        NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L  1.00
SST     0.11    1.00
SHITR   -0.17+  -0.46*  1.00
LSPEED  0.05    -0.17   0.05    1.00
APROP   -0.05   -0.20+  0.04    0.31*   1.00
SOCV    -0.00   -0.05   -0.06   -0.02   -0.25*  1.00
SHR2    -0.15   -0.13   0.07    -0.14   0.05    0.01    1.00
LFMECS  0.01    0.07    -0.02   -0.14   -0.25*  0.43*   -0.26+  1.00
LAERR   -0.06   0.06    0.09    -0.27*  -0.20+  0.06    -0.06   0.21+   1.00
Many Correlation Coefficients
• Missing values:
– Listwise deletion (comparability), or
– Pairwise deletion (power)
• P-values:
– Uncorrected: type 1 errors
– Bonferroni, etc.: type 2 errors
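The two ways of handling missing values differ only in which records are kept before each coefficient is computed. The sketch below illustrates this with a small hypothetical data set in which `None` marks a missing value; the function names are illustrative.

```python
def listwise_complete(rows):
    """Keep only records with no missing value in any variable."""
    return [row for row in rows if all(value is not None for value in row)]

def pairwise_complete(rows, i, j):
    """Keep the (i, j) value pairs from records where both are present."""
    return [(row[i], row[j]) for row in rows
            if row[i] is not None and row[j] is not None]

# Hypothetical records of three variables; None marks a missing value.
rows = [
    (1.0, 2.0, 3.0),
    (2.0, None, 1.0),
    (3.0, 4.0, None),
    (4.0, 5.0, 6.0),
    (None, 7.0, 8.0),
]

print("listwise n =", len(listwise_complete(rows)))
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"pairwise n for variables {i},{j} =", len(pairwise_complete(rows, i, j)))
```

Listwise deletion bases every coefficient on the same records, so the coefficients are comparable; pairwise deletion uses more data per coefficient, so n varies from pair to pair.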
Beware!
Correlation ≠ Causation
[Figure: diagrams of possible causal relationships among variables Y1 to Y6]