Angewandte statistische Regression I

Dr. Matteo Tanadini (matteo.tanadini@math.ethz.ch)

Autumn Semester 2019 (ETHZ)

Lecture 7

Outline

1 Introduction

2 Designs

3 Get to know data

4 Fitting a Mixed Effects Model with lme4

5 Checking the model assumptions

6 Inference procedure

7 Concluding remarks


Section 1

Introduction


What is a Mixed Model?

Definition: a statistical model that contains both random and fixed effects.

Example: a clinical trial where a new drug is tested along with the gold standard.

Usually fixed effects are predictors of interest, for which we want to draw conclusions (e.g. drug).

Random effects are predictors that need to be considered in the analysis (design variables, e.g. patient), but are usually not of primary interest (see footnote 1).

Footnote 1: Nevertheless, we may be interested in their variability.

What is a Mixed Model?

It is natural to use Mixed Effects Models when “grouped data” are present. Such data arise in experiments where an “experimental unit” is measured more than once (e.g. the diameter of a tree measured yearly).

How do we decide whether a variable must be treated as random or not?

Design of the experiment/study

Questions addressed

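Hospitals example:
Response variable: length of stay [days]
Design: 10 hospitals, about 1,000 hospitalisations per hospital
Q1: What is the average length of stay? (Hospital is a design variable here and would typically be treated as random.)
Q2: Which hospital has, on average, the longest stays? (The individual hospitals are of interest here, so hospital would typically be treated as fixed.)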


Orthodont Example

data(Orthodont, package = "nlme")

names(Orthodont)

[1] "distance" "age" "Subject" "Sex"

head(Orthodont)

Grouped Data: distance ~ age | Subject

distance age Subject Sex

1 26 8 M01 Male

2 25 10 M01 Male

3 29 12 M01 Male

4 31 14 M01 Male

5 22 8 M02 Male

6 22 10 M02 Male


Orthodont Example

[Figure: distance (about 20 to 30) against age (8 to 14), in two panels: Male and Female.]

This graph was produced with xyplot() from the lattice package [6]. Similar graphs can be produced with the ggplot2 package [7].
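A plot of this kind can be sketched as follows. This is not necessarily the call used for the slide: the `groups` and `type` settings are assumptions, chosen here to draw one connected line per subject, and `p.ort` is just an illustrative name.

```r
library(lattice)
data(Orthodont, package = "nlme")

# One panel per Sex; one connected line per Subject (assumed layout).
p.ort <- xyplot(distance ~ age | Sex, data = Orthodont,
                groups = Subject, type = "b",
                xlab = "age", ylab = "distance")
print(p.ort)
```

Drawing the subjects as separate lines makes the grouping structure visible before any model is fitted.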


Orthodont Example

Aims: model the growth of the measured distance (see footnote 2) and inspect gender differences. For more information about the data set, type help(Orthodont) in R.

Response variable: distance

Fixed: Sex and age

Random: Subject

We are interested in growth (i.e. age) and gender (i.e. Sex). We must account for the person (Subject), but this is not our main interest.

Footnote 2: Distance between the pituitary gland and the pterygomaxillary fissure [mm].

Can we drop “Subject” from the analysis?

Given that we are not interested in “Subject”, can we drop it from the analysis? (Justify your answer.)

[Figure: distance (about 20 to 30) against age (8 to 14), in two panels: Male and Female.]


Can we drop “Subject” from the analysis?

Let’s fit a simple linear model:

lm.Ort <- lm(distance ~ age * Sex, data = Orthodont)

summary(lm.Ort)

Call:

lm(formula = distance ~ age * Sex, data = Orthodont)

Residuals:

Min 1Q Median 3Q Max

-5.616 -1.322 -0.168 1.330 5.247

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 16.341 1.416 11.54 < 2e-16 ***

age 0.784 0.126 6.22 1.1e-08 ***

SexFemale 1.032 2.219 0.47 0.64

age:SexFemale -0.305 0.198 -1.54 0.13

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 2.3 on 104 degrees of freedom

Multiple R-squared: 0.423,Adjusted R-squared: 0.406

F-statistic: 25.4 on 3 and 104 DF, p-value: 2.11e-12

Is that fine? Let’s check the model assumptions


Can we drop “Subject” from the analysis?

par(mfrow = c(2, 2), oma = c(0, 0, 0, 0), mar = c(2, 4, 2, 1))

plot(lm.Ort)

[Figure: the four diagnostic plots produced by plot(lm.Ort): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Observations 35, 39 and 49 are labelled in the first three panels; observations 49, 101 and 104 are labelled in the last.]

The main model assumptions look good! What about the independence of the observations?


Can we drop “Subject” from the analysis?

[Figure: residuals(lm.Ort), ranging from about -6 to 4, shown separately for each of the 27 subjects (M01 to M16, F01 to F11).]

There is a clear “Subject” effect! Omitting it from the analysis is incorrect.
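A graph of this kind can be sketched as follows. The use of boxplots and the ordering of subjects by their mean residual are assumptions, not necessarily the choices made for the slide; `res.ort` is an illustrative name.

```r
data(Orthodont, package = "nlme")
lm.Ort <- lm(distance ~ age * Sex, data = Orthodont)

# Residuals of the model that ignores Subject, stored next to the subject label.
res.ort <- data.frame(Subject = as.character(Orthodont$Subject),
                      resid   = residuals(lm.Ort))

# Order subjects by their mean residual (assumed ordering).
res.ort$Subject <- reorder(res.ort$Subject, res.ort$resid, FUN = mean)

boxplot(resid ~ Subject, data = res.ort, las = 2,
        xlab = "", ylab = "residuals(lm.Ort)")
abline(h = 0, lty = 2)
```

If the observations were independent, the residuals of each subject would scatter around zero. Subjects whose residuals lie entirely above or below zero indicate a subject effect.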


Can we drop “Subject” from the analysis?

The previous graph clearly showed that if we don’t consider “Subject” in our analysis, we cannot assume that observations are independent.

Thus, the inference procedure (i.e. computing p-values and estimating confidence intervals) is not valid when subjects are omitted (see footnote 3).

Design variables should always be there!

Footnote 3: Although the estimated regression coefficients are still unbiased.

Section 2

Designs


Experiments can have several random effects:

Agricultural study with plots and subplots; response variable = plant height

Educational study with schools, classes and students; response variable = grade

Fish reproduction study with mothers and fathers; response variable = larval survival


Experiments can have several random effects, and things can become complex:

Educational study (II): schools, teachers and students; response variable = grade

Pedomorphological study: sites; response variable = [Zn] and [Cu]

Survival study: computer processors monitored over time; response variable = working/broken

Note: these are the 3 most common situations that yield “grouped data”.

Warning: not all studies that have “grouped data” are best analysed within the framework of Mixed Effects Models.


Section 3

Get to know data


When dealing with Mixed Models it is essential to understand the data structure.

There are 2 fundamental steps to carry out before fitting models:

summaries (str(), head(), table(), ...)

graphics

The structure of the data determines what questions you can answer.
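For the Orthodont data, the summary step might look like the sketch below. The object names `obs.per.subject` and `design` are illustrative, and the use of xtabs() is an addition to the functions named above.

```r
data(Orthodont, package = "nlme")

str(Orthodont)     # variable types and dimensions
head(Orthodont)    # first few rows

# How many measurements are there per subject?
obs.per.subject <- table(Orthodont$Subject)
obs.per.subject

# How many subjects of each Sex were measured at each age?
design <- xtabs(~ Sex + age, data = Orthodont)
design
```

Tables like these show whether the design is balanced and which factor is nested in which. Here every subject is measured at each of the four ages and belongs to exactly one Sex.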


See MixedModels Lab.pdf


Section 4

Fitting a Mixed Effects Model with lme4


lme4

Currently, the best and most powerful way to fit Mixed Models in R is to use the lme4 package [1].

To fit a Linear Mixed Effects Model we use the function lmer().

There are 2 arguments that need to be specified: formula (see footnote 4) and data.

library(lme4)
mod.0 <- lmer(distance ~ age * Sex + (1 | Subject),
              data = Orthodont)

Footnote 4: The formula must contain at least one random effect.

Models can contain several random effects:

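mem.0 <- lmer(grade ~ subject + (1 | School) + (1 | Class) + (1 | Student),
              data = MathGrades)

fish.0 <- lmer(weight ~ days + (1 | Mother) + (1 | Father) + (1 | larvae),
               data = FishDeveloppement)

We don’t need to specify whether random effects are nested or crossed, provided that every level of a grouping factor has a unique label (e.g. class labels are not reused across schools). lme4 then infers the structure from the data.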
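The role of the level labels can be illustrated with simulated data. Everything in this sketch is hypothetical (the design, the variable names and the variance values); it only uses lmer() and ngrps() from lme4.

```r
library(lme4)

set.seed(1)
# Hypothetical design: 10 schools, 4 classes per school, 15 students per class.
d <- expand.grid(student = 1:15, class.in.school = 1:4, School = 1:10)
d$School <- factor(d$School)
# Class labels reused across schools ("1" to "4") versus unique labels.
d$ClassLocal  <- factor(d$class.in.school)
d$ClassUnique <- interaction(d$School, d$ClassLocal, drop = TRUE)

school.eff <- rnorm(10, sd = 1)[as.integer(d$School)]
class.eff  <- rnorm(40, sd = 0.7)[as.integer(d$ClassUnique)]
d$grade    <- 4 + school.eff + class.eff + rnorm(nrow(d), sd = 0.5)

# Unique labels: the nesting is implied by the data.
m.unique  <- lmer(grade ~ 1 + (1 | School) + (1 | ClassUnique), data = d)
# Reused labels: the nesting has to be written explicitly.
m.nested  <- lmer(grade ~ 1 + (1 | School/ClassLocal), data = d)
# Reused labels without nesting: class is treated as crossed with school.
m.crossed <- lmer(grade ~ 1 + (1 | School) + (1 | ClassLocal), data = d)

ngrps(m.unique)    # number of levels per grouping factor
ngrps(m.nested)
ngrps(m.crossed)
```

The first two fits describe the same model, with one random intercept for each of the 40 classes. The third fit sees only 4 class levels shared by all schools, which is a different (and here inappropriate) model.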


Section 5

Checking the model assumptions


Model assumptions

There are two classes of assumptions:

Structural assumptions (model equation)

Distributional assumptions

All assumptions are assessed simultaneously using graphs.


Structural assumptions

When fitting mod.0 we assumed growth to be linear, and we allowed the two gender groups to have different slopes.

We also assumed that the growth rate is the same for all subjects within a gender group. In other words, the regression lines for the subjects run parallel to the line of their gender group (i.e. a simple random intercept).

We may instead assume that growth (the slope) is the same in both groups; in that case we would fit the model:

mem.1 <- lmer(distance ~ age + Sex + (1 | Subject),
              data = Orthodont)

However, the graphical analysis suggested quite strongly that the two groups have different slopes.
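The assumption of parallel subject lines can itself be relaxed. The sketch below adds a random slope for age; this model does not appear in the slides, and the name `mod.slope` is hypothetical. Depending on the lme4 version, the fit may produce convergence or singularity messages.

```r
library(lme4)
data(Orthodont, package = "nlme")

# Random intercept only: subject lines run parallel to their gender line.
mod.0 <- lmer(distance ~ age * Sex + (1 | Subject), data = Orthodont)

# Hypothetical extension: each subject also gets its own slope for age.
mod.slope <- lmer(distance ~ age * Sex + (age | Subject), data = Orthodont)

# Estimated standard deviations of intercepts and slopes, and their correlation.
VarCorr(mod.slope)
```

Whether the extra flexibility is needed is a structural question that the graphical checks of this section are meant to inform.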


Distributional assumptions

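\[
\text{distance}_{i,j,k} = \beta^{(0)}_{i} + \beta^{(1)}_{i} \cdot \text{age}_{i,j,k} + b_{j} + \varepsilon_{i,j,k}
\]

\[
i = 1, \dots, n_{\text{sex}}, \qquad j = 1, \dots, n_{\text{sub}}, \qquad k = 1, \dots, N
\]

\[
\varepsilon \sim \mathcal{N}(0, \sigma^{2}_{\varepsilon})
\]

\[
b \sim \mathcal{N}(0, \sigma^{2}_{b})
\]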


See MixedModels Lab.pdf


The fundamental graphics to assess model assumptions are:

Tukey-Anscombe plot

Scale-location plot

Quantile-Quantile plots

Residuals against predictor plots

Further graphics (if required)
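For the random intercept model, these graphics might be produced as in the sketch below. It uses base graphics and scales the residuals by the estimated residual standard deviation for the scale-location plot; both are assumptions, and the lab may do this differently.

```r
library(lme4)
data(Orthodont, package = "nlme")
mod.0 <- lmer(distance ~ age * Sex + (1 | Subject), data = Orthodont)

res <- resid(mod.0)     # residuals (estimated errors)
fit <- fitted(mod.0)    # fitted values, including the random intercepts
b   <- ranef(mod.0)$Subject[, "(Intercept)"]   # estimated random intercepts

par(mfrow = c(2, 3))
# Tukey-Anscombe plot: residuals against fitted values.
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Scale-location plot: square root of absolute scaled residuals.
plot(fit, sqrt(abs(res / sigma(mod.0))),
     xlab = "Fitted values", ylab = "sqrt(|scaled residuals|)")
# Quantile-quantile plots: one for the errors, one for the random effects.
qqnorm(res, main = "Residuals"); qqline(res)
qqnorm(b, main = "Random intercepts"); qqline(b)
# Residuals against each predictor.
plot(Orthodont$age, res, xlab = "age", ylab = "Residuals")
boxplot(res ~ Orthodont$Sex, xlab = "Sex", ylab = "Residuals")
```

Two quantile-quantile plots are needed because the model makes a normality assumption for both the errors and the random effects.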


Section 6

Inference procedure


p-values and confidence intervals can be computed with the functions anova(), drop1() and confint().

Fixed effects are usually tested with Likelihood Ratio Tests.

For random effects, we prefer to estimate confidence intervals via the profile likelihood (profile()).
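Applied to the Orthodont models, these functions could be used as in the following sketch. The object names are illustrative, and the reduced model has the same formula as mem.1 from Section 5.

```r
library(lme4)
data(Orthodont, package = "nlme")

mod.0 <- lmer(distance ~ age * Sex + (1 | Subject), data = Orthodont)
mod.1 <- lmer(distance ~ age + Sex + (1 | Subject), data = Orthodont)

# Likelihood Ratio Test for the age:Sex interaction.
# anova() refits both models with maximum likelihood before comparing them.
lrt <- anova(mod.1, mod.0)
lrt

# The same kind of test for every term that can be dropped from mod.0.
drop1(mod.0, test = "Chisq")

# Profile likelihood confidence intervals for all parameters,
# including the standard deviations of the random effect and of the errors.
pr <- profile(mod.0)
ci <- confint(pr)
ci
```

In the confint() output, lme4 labels the random-effect standard deviation `.sig01` and the residual standard deviation `.sigma`.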


See MixedModels Lab.pdf


Section 7

Concluding remarks


Take home message

Mixed Effects Models are very powerful methods, well suited to analysing a wide variety of modern experiments

They are relatively easy to use

Complex experiments/studies require the use of appropriate visualisation techniques

There is some debate on how to perform inference. However, remember that statistical significance does not imply practical/biological significance.

lme4 is one of the most up-to-date software packages for analysing these experiments. It does not implement *everything*, for good reasons

Extensions to LMMs and GLMMs exist (e.g. mgcv)


Further topics (not covered here):

Generalised Linear Mixed Models (GLMMs; glmer(..., family = ...))

Generalised Additive Mixed Models (GAMMs; mgcv)

R² and stepwise model selection (don’t use them with LMMs!)

Partial residual plots

Dynamic documents (knitr and RMarkdown)

Getting advice (Internet, consulting offices)

Literature (see next slide)

...


Literature

lme4: Mixed-effects modeling with R [2] (short!)

Mixed Effects Models in S and S-plus [4]

Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models [3]

Generalized additive models: an introduction with R [8]

Lattice: multivariate data visualization with R [5]


Bibliography I

[1] Douglas Bates et al. lme4: Linear Mixed-Effects Models using ’Eigen’ and S4. R package version 1.1-18-1. 2018. URL: https://CRAN.R-project.org/package=lme4.

[2] Douglas M Bates. “lme4: Mixed-effects modeling with R”. 2010. URL: http://lme4.r-forge.r-project.org/book.

[3] Julian J Faraway. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. CRC press, 2005.

[4] Jose Pinheiro and Douglas Bates. Mixed-effects models in S and S-PLUS. Springer Science & Business Media, 2006.

[5] Deepayan Sarkar. Lattice: multivariate data visualization with R. Springer Science & Business Media, 2008.


Bibliography II

[6] Deepayan Sarkar. lattice: Trellis Graphics for R. R package version 0.20-35. 2017. URL: https://CRAN.R-project.org/package=lattice.

[7] Hadley Wickham et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.0.0. 2018. URL: https://CRAN.R-project.org/package=ggplot2.

[8] Simon Wood. Generalized additive models: an introduction with R. CRC press, 2006.
