A Rag Aw 2


  • 8/8/2019 A Rag Aw 2

    1/106


CHAPTER 1. THE LINEAR MIXED MODEL AND THE RANDOMIZED BLOCK DESIGN

A basic linear model has only one random effect, the experiment error. In the above dataset, Subjects are a random sample from a population and are random effects, so the model has two random effects: subjects and error. The terminology mixed models is used when there are models for the fixed effects and more than one random effect, but the generic term linear model is consistent for models where the fixed and random components are additive.

For the vector of responses for Subject i,

$$y_i = X_i \beta + Z_i b_i + \epsilon_i \qquad (1.1)$$
$$b_i \sim N(0, \sigma_b^2), \quad \epsilon_i \sim N(0, \sigma^2 I), \quad \mathrm{cov}(b, \epsilon) = 0$$

where

$$X_i = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \qquad Z_i = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \qquad (1.2)$$

The Z matrix is a matrix of ones and zeroes indicating subject i. Over the whole data set,

$$\begin{pmatrix} y_1 \\ \vdots \\ y_9 \end{pmatrix}_{36,1} = \begin{pmatrix} X_1 \\ \vdots \\ X_9 \end{pmatrix}_{36,4} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}_{4,1} + \begin{pmatrix} Z_1 & & \\ & \ddots & \\ & & Z_9 \end{pmatrix}_{36,9} \begin{pmatrix} b_1 \\ \vdots \\ b_9 \end{pmatrix}_{9,1} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_{36} \end{pmatrix}_{36,1}$$
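The dimensions in this stacked system can be checked numerically. A minimal sketch in Python (the notes' own analyses use R; NumPy stands in here purely for illustration), assuming 9 subjects each measured on 4 stool types:

```python
import numpy as np

# Per-subject design under treatment contrasts (equation 1.2):
# column 1 is the intercept (stool 1), columns 2-4 compare stools 2-4 with stool 1.
X_i = np.array([[1, 0, 0, 0],
                [1, 1, 0, 0],
                [1, 0, 1, 0],
                [1, 0, 0, 1]], dtype=float)
Z_i = np.ones((4, 1))  # indicator of this subject's random intercept

# Stack over the 9 subjects: X repeats, Z becomes block diagonal.
X = np.vstack([X_i] * 9)        # 36 x 4
Z = np.kron(np.eye(9), Z_i)     # 36 x 9

print(X.shape, Z.shape)  # (36, 4) (36, 9)
```

Each row of Z contains a single one flagging that observation's subject, matching the description of Z as a matrix of ones and zeroes.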

The contrasts used in equation (1.2) are treatment contrasts, leading to the following interpretations of the parameters:

β1  mean of stool 1
β2  effect of stool 2 compared to stool 1
β3  effect of stool 3 compared to stool 1
β4  effect of stool 4 compared to stool 1

Other contrasts are simply different ways of spanning the parameter space, and one set of linearly independent contrasts can be transformed into another.

The analysis is done via the lme() function, and so continuing on from the previous code,

mmodel <- lme(effort ~ Type, random = ~1 | Subject)


The fits from

lme(effort ~ Type, random = ~1 | Subject)

and

lm(effort ~ Type + Subject)

would differ. The usual caveats apply; accept the model if the assumptions are not contravened. To finish off this job, you would need to check the residuals,

plot(mmodel)

A newer version of lme is lmer, which is contained in the package lme4. This is widely used in the text Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill [3]. They augment lme4 with other functions contained in the package arm, which is a companion to the textbook. An example of these is the function display.

library(lme4)
mmodel2 <- lmer(effort ~ Type + (1 | Subject))
display(mmodel2)


Figure 1.2: Worker productivity on machines. [Dotplot of score (45 to 70) by worker; workers ordered 6, 2, 4, 1, 3, 5.]

Figure 1.3: Worker machines interaction. [Mean of score (45 to 70) plotted against Machine (A, B, C), one profile per Worker (5, 3, 1, 4, 2, 6).]


machine2 <- lme(score ~ Machine, random = ~1 | Worker)


Note that the columns of Z_1 span the space of X_1, but the strategy of REML estimation does not have these matrices in conflict, because the analysis projects the data into the orthogonal spaces of (i) treatment contrasts and (ii) error contrasts by

$$\begin{pmatrix} L_1^T \\ L_2^T \end{pmatrix} y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$$

where L_1, L_2 satisfy $L_1^T X = X$ and $L_2^T X = 0_{(n-p)}$, for example $L_1 = X(X^TX)^{-1}X^T$ and $L_2 = I - X(X^TX)^{-1}X^T$. Then

$$\ell(y) = \underbrace{\ell(y_1)}_{\text{fixed}} + \underbrace{\ell(y_2)}_{\text{random}}$$

We pursue this theory (see Verbyla (1990) [13]) in detail later on.
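The two projections can be verified numerically. A sketch in Python with NumPy (the notes use R; the design matrix below is simply the stool design repeated three times, an arbitrary full-rank choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([np.array([[1., 0, 0, 0],
                         [1, 1, 0, 0],
                         [1, 0, 1, 0],
                         [1, 0, 0, 1]])] * 3)
y = rng.normal(size=X.shape[0])

L1 = X @ np.linalg.inv(X.T @ X) @ X.T  # projection onto the treatment space
L2 = np.eye(X.shape[0]) - L1           # projection onto the error contrasts

y1, y2 = L1 @ y, L2 @ y
assert np.allclose(L1.T @ X, X)   # L1' X recovers X
assert np.allclose(L2.T @ X, 0)   # L2' X = 0: y2 carries no treatment information
assert np.allclose(y, y1 + y2)    # the data decompose into the two pieces
assert np.isclose(y @ y, y1 @ y1 + y2 @ y2)  # and the pieces are orthogonal
```

Because the two subspaces are orthogonal, likelihood contributions from y1 and y2 separate, which is what makes the decomposition of the log-likelihood possible.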


    Chapter 2

    Balanced incomplete blocks

Incomplete blocks comprise a set of designs where it is not possible to allocate every treatment in each block. This may arise if there is insufficient homogeneous material to which the treatments are to be applied. A randomized block, where the blocks are known to be heterogeneous, will lead to over-inflated experiment error, which of course reduces the power of the design.

1. In taste testing, the palate fatigues so it is advisable to restrict the number of samples. If 10 treatments need to be compared and each taster (block) is reliable for only 5, then the treatments need to be allocated so that all 10 treatments can be compared. Were we to test only 5 at a time, we would incur the extra variance due to different panels.

2. A physiotherapist is researching the use of the web for remote treatment. The experiment requires that practitioners rate 15 conditions by video. A randomized complete block would require each practitioner to examine 15 videos, each of about 1 hour duration. The fatigue factor suggests extra variance due to the order of the assessment (ie. first or last etc.), and the possibility of raters not completing their assessments puts the viability of the experiment at risk. The problem is averted with an incomplete block design.

3. In a factory experiment, different factorial combinations are to be trialled. But as the experiment progresses, the environment is likely to change as the factory warms up, so that the treatments are measured under different ambient conditions. To get fair comparisons, block size should be restricted so that treatments are measured under homogeneous conditions.

4. In a 20 team competition, there is not enough time for each team to play the others twice in a season. After round 1, the competition draw becomes an incomplete block experiment which allows favourable comparison of points.

From (10.6), $\mathrm{var}(\hat\beta) = \sigma^2(X^TX)^{-1}$. The optimal variance of the parameters is a balance between the degrees of freedom for the design matrix X and the variance, σ². A large


RCB may reduce $|(X^TX)^{-1}|$ but at the expense of σ². Small block sizes will ameliorate this, and the right balance is sought. This balance may be influenced as much by practical experiment reasons as by the mathematics.

In chapter 1, it was mentioned that in a RCB (ie. each treatment occurs the same number of times across blocks), blocks and treatments are orthogonal, so blocks could be fitted as fixed effects and Total SS = Block SS + Treat SS. However the mixed model and REML estimation do not demand balance, because the model has cov(ε, b) = 0 and the estimation is via a fixed model and a random model which are orthogonal. Hence the balance restriction is freed by REML and, since incomplete blocks do not have each treatment occurring in each block an equal number of times, the convenient analysis is via the mixed model.

A major class of nonorthogonal designs are those known as balanced incomplete block designs. As the word incomplete suggests, not all treatments occur in each block. The balance referred to is a general balance (i.e. it refers to such matters as the number of times each pair of treatments occur together in a block) rather than meaning all treatments occur together in all blocks.

2.1 Balanced incomplete blocks

When systematic differences exist between units, blocking is often used as a device to improve precision. The blocks are formed from groups of similar units, and if all the treatments under investigation are applied randomly within each block, then it is possible to make a fair comparison between the treatments. Unfortunately the blocks may not always contain enough units to accommodate all the treatments, and an alternative solution to Randomised Blocks is needed if the number of treatments is not to be reduced. If all treatment comparisons are equally important then the most satisfactory design is Balanced Incomplete Blocks (BIB).

    The BIB design has three main properties

    (a) all blocks have the same number of units,

    (b) all treatments are equally replicated,

    (c) all treatment pairs occur in the same block equally often over the whole design.

It is convenient to introduce a standard notation to describe the features of a BIB design, and these parameters are now widely accepted.

b  number of blocks
v  number of treatments (sometimes t)
k  number of units in a block
r  number of replicates for each treatment
n  total number of units
λ  concurrence


The concurrence λ is the number of times two treatments occur together in the same block. The properties of a BIB design may therefore be stated in the alternative form that v > k and k, r, λ are constant over all blocks and treatments.

    Example 2.1

    A typical example of a BIB design is given below.

        Blocks
units   1  2  3  4  5  6  7  8  9  10
(i)     1  1  1  1  1  1  2  2  2  3
(ii)    2  2  2  3  3  4  3  3  4  4
(iii)   3  4  5  4  5  5  4  5  5  5

Observe that treatments are conventionally described by numbers. The above plan is in its basic form before randomisation. In practice, the experimenter should allocate the numbers randomly to the treatments, randomise the block order and the assignment of the selected treatments to the units within each block. In the example given it is seen that b = 10, v = 5, k = 3, r = 6, n = 30, λ = 3.
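The parameter counts can be verified directly from the plan. A sketch in Python (reading the ten blocks off the columns of the table above; the notes themselves work in R):

```python
from itertools import combinations
from collections import Counter

# The ten blocks of Example 2.1, read column by column from the plan.
blocks = [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5),
          (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)]

reps = Counter(t for blk in blocks for t in blk)          # replicates per treatment
pairs = Counter(p for blk in blocks for p in combinations(blk, 2))  # concurrences

assert len(blocks) == 10            # b = 10
assert set(reps.values()) == {6}    # r = 6: every treatment equally replicated
assert set(pairs.values()) == {3}   # lambda = 3: every pair concurs equally often
assert sum(reps.values()) == 30     # n = bk = vr = 30
```

The design also satisfies the counting identities n = bk = vr and λ(v − 1) = r(k − 1), which link the five parameters.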

If the rows are considered as blocks the design is a BIB with b = v = 7, k = r = 4 and λ = 2. Notice however that each column contains each treatment exactly once, so this type of design is suitable when blocking is required in two directions. The double blocking is reminiscent of the properties of a Latin Square, and this type of design is called a Youden Rectangle. An alternative description for Youden Rectangles is Incomplete Latin Squares, and a method of generating them is to omit certain columns of particular Latin Square designs. It is not always possible to find a Youden Rectangle of a prescribed specification, but it is always possible to find a Youden Rectangle where the number of treatments is one more than the block size. All that is needed is for the last column to be dropped from the appropriate Latin Square and a Youden Rectangle will always result. However it is not so easy if more than two columns need to be dropped.

For the particular design in Example ?? the complete Latin Square is formed through augmentation with a further Youden rectangle.

1 2 3 6 4 5 7
2 3 4 7 5 6 1
3 4 5 1 6 7 2
4 5 6 2 7 1 3
5 6 7 3 1 2 4
6 7 1 4 2 3 5
7 1 2 5 3 4 6
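The augmented array can be checked to be a Latin Square. A sketch in Python; splitting off the first four columns as the Youden Rectangle is an illustrative choice here, not necessarily the split used in the omitted example:

```python
import numpy as np

square = np.array([[1, 2, 3, 6, 4, 5, 7],
                   [2, 3, 4, 7, 5, 6, 1],
                   [3, 4, 5, 1, 6, 7, 2],
                   [4, 5, 6, 2, 7, 1, 3],
                   [5, 6, 7, 3, 1, 2, 4],
                   [6, 7, 1, 4, 2, 3, 5],
                   [7, 1, 2, 5, 3, 4, 6]])

full = set(range(1, 8))
assert all(set(row) == full for row in square)    # every row a permutation of 1..7
assert all(set(col) == full for col in square.T)  # every column too: a Latin Square

# Dropping columns leaves each remaining column free of repeats,
# i.e. an Incomplete Latin Square (Youden-type) arrangement.
rect = square[:, :4]
assert all(len(set(col)) == 7 for col in rect.T)
```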


The advantage of a resolvable BIB is that it can also be analysed as a RCB if the blocks are not significantly different (or if they do not account for sufficient extraneous variation). The extra degrees of freedom can be pooled into a single random term, ie the error, and we may improve the power of the design to detect differences by increasing error df without inflating the variance. But we may make big inroads into the error by removing block effects.

We can compare 2 designs by the ratio of the variance of a treatment contrast from design 1 to its counterpart of design 2, and this is termed EFFICIENCY. The efficiency of a BIB relative to a RCB is given by,

$$E = \frac{\mathrm{var}(\hat\tau_i - \hat\tau_j)_{RCB}}{\mathrm{var}(\hat\tau_i - \hat\tau_j)_{BIB}} \qquad (2.1)$$
$$= \frac{2\sigma^2/r}{2k\sigma^2/(\lambda v)} = \frac{\lambda v}{rk} = \frac{1 - 1/k}{1 - 1/v} \le 1$$

Certain resolvable designs have arisen to maximise efficiency compared to a RCB:

v = b², k = b is called a square lattice,
v = b(b−1), k = b−1 is called a rectangular lattice,
v = b(b−l), k = (b−l) is called an alpha-design.
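For the BIB of Example 2.1 (v = 5, k = 3, r = 6, λ = 3) the efficiency factor evaluates as follows; a minimal check in Python:

```python
# Efficiency of a BIB relative to an RCB: E = lambda*v/(r*k) = (1 - 1/k)/(1 - 1/v).
v, k, r, lam = 5, 3, 6, 3
E = lam * v / (r * k)
assert abs(E - (1 - 1/k) / (1 - 1/v)) < 1e-12  # the two closed forms agree
print(round(E, 3))  # 0.833
```

So each intra-block treatment contrast from this BIB carries about 83% of the information it would carry in a complete block of the same unit variance; the loss is the price of the smaller, more homogeneous blocks.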

2.2 Statistical model for BIB designs

The model is similar to the randomized block design,

$$y_{ij} = \tau_i + b_j + \epsilon_{ij} \qquad (2.2)$$

but each block contains only a subset of the treatments. If we treat blocks as fixed effects and remove their effects prior to examining treatments, the information about the treatments comes only from comparisons within blocks and is known as an intra-block analysis. However, we should treat blocks as another random component because they are random samples from a population, so the random parts of the model (2.2) are modelled by

inter-block:  $b_j \sim N(0, \sigma_b^2)$   (2.3)
intra-block:  $\epsilon_{ij} \sim N(0, \sigma^2)$   (2.4)

$$E(Y_{ij}) = \tau_i \qquad (2.5)$$
$$\mathrm{var}(Y_{ij}) = \sigma_b^2 + \sigma^2 \qquad (2.6)$$

Maximum likelihood estimates can be derived from

$$\ell(y; \tau, \sigma_b^2, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma_b^2 + \sigma^2) - \frac{1}{2}\sum_i \sum_j \frac{(y_{ij} - \tau_i)^2}{\sigma_b^2 + \sigma^2}$$


2.3 The old way

For balanced incomplete blocks, the appropriate variances can be determined by AOV.

Source   df          MS          E(MS)
Treat    (v−1)       MS(Treat)   σ² + ((kb−b)/(v−1)) Σᵢ τᵢ²
Blocks   (b−1)       MS(Blocks)  σ² + ((kb−v)/(b−1)) σ_b²
Error    bk−v−b+1    MS(Error)   σ²
Total    (bk−1)

Thus for the balanced case, equating expected and observed mean squares leads to estimates of the variance components and better estimates of the variances of treatment means, eg (2.6). This is termed an inter-block analysis.

Each block is intended to be relatively homogeneous, and so intra-block variance should be less than inter-block variance.

If the treatments are not balanced across blocks, the AOV method will not work because we cannot plug in the block size k. REML estimation arose from this situation (Patterson & Thompson, 1971), but rapidly found widespread application in all facets of statistics.

    Example 2.2

Below is a set of data whose design is a balanced incomplete block, followed by an old-fashioned analysis.

blocks treat response
1      A     20
1      B     23
1      C     16
2      A     22
2      C     19
2      D     17
3      B     25
3      C     20
3      D     21
4      A     16
4      B     20
4      D     14


lmodel <- aov(response ~ treat + blocks)
summary(lmodel)

          Df Sum of Sq Mean Sq F Value  Pr(F)
treat      3      48.2    16.1    13.6 0.0077 **
blocks     3      58.8    19.6    16.6 0.0050 **
Residuals  5       5.9     1.2
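The sequential sums of squares in this table can be reproduced from the data by least squares projections. A sketch in Python with NumPy (the notes' analysis is done in R; fitting treat before blocks matches the table's ordering):

```python
import numpy as np

# Example 2.2 data, one row per observation.
blocks = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
treat = list("ABCACDBCDABD")
y = np.array([20, 23, 16, 22, 19, 17, 25, 20, 21, 16, 20, 14], float)

def rss(*factors):
    """Residual SS after regressing y on an intercept plus factor dummies."""
    cols = [np.ones(len(y))]
    for f in factors:
        for lev in sorted(set(f))[1:]:   # treatment-contrast dummies
            cols.append(np.array([x == lev for x in f], float))
    X = np.column_stack(cols)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(y @ y - y @ X @ beta)  # y'y - y'X beta-hat at the LS solution

ss_treat = rss() - rss(treat)                # treat, fitted first
ss_blocks = rss(treat) - rss(treat, blocks)  # blocks, adjusted for treat
ss_resid = rss(treat, blocks)
print(ss_treat, ss_blocks, ss_resid)
```

The three values agree with the AOV table (48.2, 58.8 and 5.9 after rounding), and the sequential sums add to the total SS because each is a drop in residual SS between nested fits.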

If the blocks term was not significant, we would resort to a completely random design. However, it is significant here and, because of non-orthogonality, there is some information about the treatments contained in the blocks.

To get estimates of the treatment means, we get a prediction of what each treatment would have been if it had been allocated to each block. This is done by using the linear model. In the above example, the block and treatment effects are given by

BIB.effects <- summary(lm(response ~ blocks + treat))$coefficients

            Estimate Std.Error t value Pr(>|t|)
(Intercept)   19.792    0.8308  23.822 2.428e-06
blocks2        1.625    0.9421   1.725 1.451e-01
blocks3        3.375    0.9421   3.583 1.583e-02
blocks4       -3.000    0.9421  -3.184 2.442e-02
treatB         2.750    0.9421   2.919 3.305e-02
treatC        -3.125    0.9421  -3.317 2.107e-02
treatD        -3.125    0.9421  -3.317 2.107e-02

The predictions of treatment means for each block are shown in Table 2.1. We know that treatment A was not in block 3, but we get an estimate of it as if it did occur there. Likewise treatment B in block 2, treatment C in block 4 and treatment D in block 1 are estimated.

With less effort, we can see that the estimates of treatment means for A, B, C, D are

$$\begin{pmatrix} \hat\mu_A \\ \hat\mu_B \\ \hat\mu_C \\ \hat\mu_D \end{pmatrix} = \begin{pmatrix} 1 & \frac14 & \frac14 & \frac14 & 0 & 0 & 0 \\ 1 & \frac14 & \frac14 & \frac14 & 1 & 0 & 0 \\ 1 & \frac14 & \frac14 & \frac14 & 0 & 1 & 0 \\ 1 & \frac14 & \frac14 & \frac14 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 19.79 \\ 1.625 \\ 3.375 \\ -3 \\ 2.75 \\ -3.125 \\ -3.125 \end{pmatrix}$$
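This matrix product can be checked numerically; a sketch in Python (NumPy standing in for R), assuming the rounded coefficients from the fit:

```python
import numpy as np

# Coefficients in the order (Intercept, blocks2, blocks3, blocks4, treatB, treatC, treatD).
coef = np.array([19.792, 1.625, 3.375, -3.0, 2.75, -3.125, -3.125])

# Each treatment mean = intercept + the average of the four block effects
# (block 1's effect is 0 under treatment contrasts) + its own treatment effect.
C = np.zeros((4, 7))
C[:, 0] = 1.0
C[:, 1:4] = 0.25
C[1, 4] = C[2, 5] = C[3, 6] = 1.0

means = C @ coef
print(np.round(means, 2))  # [20.29 23.04 17.17 17.17]
```

These are exactly the block-averaged estimates that appear in the final column of Table 2.1.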

For this experiment b = 4, k = 3, v = 4, so equating the observed Mean Squares in the AOV to their expected values yields,

$$\hat\sigma_b^2 = \frac{19.6 - 1.2}{(3 \cdot 4 - 4)/(4 - 1)} = 6.9$$
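Equating MS(Blocks) to its expectation σ² + ((kb − v)/(b − 1))σ_b² gives the same figure; a one-line check in Python:

```python
# Moment estimate of the block variance component from the AOV mean squares.
b, k, v = 4, 3, 4
ms_blocks, ms_error = 19.6, 1.2
sigma_b2 = (ms_blocks - ms_error) / ((k * b - v) / (b - 1))
print(round(sigma_b2, 1))  # 6.9
```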


Table 2.1: Estimates of treatment means for each block of a BIB

Block Treat Intercept + block eff + treat eff   treat est   mean
1     A     19.79                               = 19.79
2     A     19.79 + 1.625                       = 21.42
3     A     19.79 + 3.375                       = 23.17
4     A     19.79 - 3                           = 16.79     20.29
1     B     19.79 + 2.75                        = 22.54
2     B     19.79 + 1.625 + 2.75                = 24.17
3     B     19.79 + 3.375 + 2.75                = 25.92
4     B     19.79 - 3 + 2.75                    = 19.54     23.04
1     C     19.79 - 3.125                       = 16.67
2     C     19.79 + 1.625 - 3.125               = 18.29
3     C     19.79 + 3.375 - 3.125               = 20.04
4     C     19.79 - 3 - 3.125                   = 13.67     17.17
1     D     19.79 - 3.125                       = 16.67
2     D     19.79 + 1.625 - 3.125               = 18.29
3     D     19.79 + 3.375 - 3.125               = 20.04
4     D     19.79 - 3 - 3.125                   = 13.67     17.17

2.4 Analysis using a mixed model

By modelling the block effects as nuisance effects orthogonal to the treatment effects, we simplify the analysis.

library(nlme)
blocks.lme <- lme(response ~ treat, random = ~1 | blocks)
summary(blocks.lme)


StdDev:   2.6   1.1

Fixed effects: response ~ treat
            Value Std.Error DF t-value p-value
(Intercept)  20.2      1.47  5    13.8


    Chapter 3

    Split plot designs

3.1 The model

The model for a randomised block design arranged in split plots is

$$y_{ijk} = \underbrace{\mu + \alpha_i + \beta_j + \eta_{ij}}_{\text{main plots}} + \underbrace{\tau_k + (\alpha\tau)_{ik} + \epsilon_{ijk}}_{\text{sub plots}}$$

where β_j is a block effect, α_i is a main treatment effect and τ_k is a sub-plot treatment effect. It is important to notice that there are two error terms, η and ε. Both errors are assumed to be independent, $\eta \sim N(0, \sigma_m^2)$ and $\epsilon \sim N(0, \sigma^2)$, and the blocks are random factors, $\beta_j \sim N(0, \sigma_b^2)$.

In the following section, b represents the number of blocks and m the number of main plots. We have from the linear model the following variances,

$$\mathrm{var}(Y_{ijk}) = \sigma_m^2 + \sigma^2$$
$$\mathrm{var}(\bar Y_{i\cdot k}) = \frac{\sigma_m^2 + \sigma^2}{b}$$
$$\mathrm{var}(\bar Y_{\cdot\cdot k}) = \frac{\sigma_m^2 + \sigma^2}{bm}$$
$$\mathrm{cov}(Y_{ijk}, Y_{ijl}) = \sigma_m^2, \quad \text{so } \mathrm{corr}(Y_{ijk}, Y_{ijl}) = \frac{\sigma_m^2}{\sigma^2 + \sigma_m^2}$$

The model formula for covariance amongst sub-plot means is all important in designing and interpreting split-plot designs. We see that if the split-plot model holds, there is a positive and constant correlation between sub-plot means. If this does not hold, the model is awry and the design is a dud.
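The implied within-main-plot covariance structure can be written down directly. A sketch in Python, using for concreteness the variance estimates obtained later for Example 3.1 (σ² = 0.022, σ_m² = 0.099):

```python
import numpy as np

# Covariance matrix of the t sub-plot observations sharing one main plot:
# the common main-plot error contributes sigma_m^2 everywhere,
# the sub-plot error adds sigma^2 on the diagonal only.
t, sigma_m2, sigma2 = 3, 0.099, 0.022
Sigma = sigma_m2 * np.ones((t, t)) + sigma2 * np.eye(t)

assert np.allclose(np.diag(Sigma), sigma_m2 + sigma2)    # var(Y_ijk)
corr = Sigma[0, 1] / Sigma[0, 0]
assert np.isclose(corr, sigma_m2 / (sigma2 + sigma_m2))  # constant and positive
print(round(corr, 3))  # 0.818
```

A compound-symmetric matrix like this, with equal positive off-diagonal correlation, is precisely the structure the split-plot analysis assumes within each main plot.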

    17


Table 3.1: Expected Mean Squares for a split plot experiment

Source         df                        E(MS)
Blocks         (b−1)
Main           (m−1)                     σ² + tσ_m² + (tb/(m−1)) Σᵢ αᵢ²
B×M (Error a)  (b−1)(m−1)                σ² + tσ_m²
Treat          (t−1)                     σ² + (mb/(t−1)) Σₖ τₖ²
M×T            (m−1)(t−1)                σ² + (b/((m−1)(t−1))) Σᵢₖ (ατ)ᵢₖ²
(Error b)      (b−1)[(t−1)+(m−1)(t−1)]   σ²
Total          (bmt−1)

The following excerpt is from the Splus manual, ch 12. Split-plots are also encountered because of restricted randomization. For example, an experiment involving oven temperature and baking time will probably not randomize the oven temperature totally, but rather only change the temperature after all of the runs for that temperature have been made. This type of design is often mistakenly analysed as if there were no restrictions on the randomisation.

    Example 3.1

Five varieties of spring wheat were sown in a randomised blocks design in four blocks. The soil was treated with three different levels of nitrogen randomly allocated to equal areas within each plot. The design and yields in tons/ha were as given.

        V2            V5            V1            V4            V3
        N1  N3  N2    N2  N3  N1    N1  N2  N3    N1  N3  N2    N2  N1  N3
Block 1 4.6 5.5 5.3   5.0 5.4 4.7   5.5 6.1 6.4   5.0 6.0 5.7   5.5 4.9 5.8

        V1            V3            V2            V5            V4
        N3  N1  N2    N1  N3  N2    N3  N2  N1    N2  N3  N1    N2  N1  N3
Block 2 5.8 5.0 5.5   4.9 5.5 5.4   5.4 5.0 4.7   4.6 5.0 4.2   6.2 5.7 6.5

        V5            V1            V3            V2            V4
        N2  N3  N1    N2  N3  N1    N3  N1  N2    N1  N3  N2    N1  N3  N2
Block 3 4.8 5.0 4.6   5.4 5.9 5.0   5.5 4.8 4.7   5.0 5.8 5.1   5.3 6.7 5.8

        V2            V3            V4            V1            V5
        N3  N1  N2    N3  N2  N1    N2  N1  N3    N1  N2  N3    N3  N2  N1
Block 4 5.9 5.0 5.6   4.8 4.6 4.0   5.1 4.7 5.4   5.2 5.5 5.8   5.2 4.8 4.4

    The treatment levels were


V1 Timmo        N1 30 kg/ha
V2 Sicco        N2 60 kg/ha
V3 Sappo        N3 90 kg/ha
V4 Highbury
V5 Maris Dove

    The split plot AOV is:-

Error: mainplot
          Df Sum of Sq Mean Sq F Value  Pr(F)
block      3     1.005   0.335   1.047 0.4073
variety    4     6.449   1.612   5.040 0.0128
Residuals 12     3.839   0.320

Error: Within
                 Df Sum of Sq Mean Sq F Value  Pr(F)
nitrogen          2     6.487   3.244   147.1 0.0000
variety:nitrogen  8     0.105   0.013     0.6 0.7756
Residuals        30     0.662   0.022

The analysis has detected large differences between Nitrogen levels, and a smaller, but significant, difference between Varieties. The interaction between Nitrogen and Variety is not significant, so the varieties can be assumed to respond similarly to the various levels of Nitrogen. Notice that the Main Plot Error is considerably greater than the Sub Plot Error, which indicates that the use of a split plot design is justified for this experiment, for otherwise the F value for Nitrogen, and its interaction, would have been much lower and differences would not have been detected with the same precision.

3.2 The comparison of treatment means

When significant differences have been detected it is reasonable to ask where these differences occur. There are many methods available for an examination of treatment differences in comparative experiments, but the standard error of the difference between two treatment means is often helpful for this purpose. Its evaluation depends upon the result that if X and Y are independent random variables from the distributions $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$, and n₁, n₂ independent samples are drawn from these distributions, then

$$\bar X - \bar Y \sim N\left(\mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$

If the variances are equal, then the variance of the difference between two treatment means is $\sigma^2(\frac{1}{n_1} + \frac{1}{n_2})$, and if the levels of replication are both equal to r the variance of the mean difference between any two treatments is given by $\frac{2\sigma^2}{r}$. The numbers of levels of M, T and Blocks are m, t and b, the Error variance for main plots is σ_m², and for sub plots σ². It can be shown that the expected values of the Error Mean Squares for main plots and sub plots are σ² + tσ_m² and σ².


The error difference between Main Plot Treatments p and q is

$$\bar\eta_{p\cdot} + \bar\epsilon_{p\cdot\cdot} - \bar\eta_{q\cdot} - \bar\epsilon_{q\cdot\cdot}.$$

Since each $\bar\eta$ is the mean of b values, each $\bar\epsilon$ is the mean of bt values, and all errors in this expression are independent, the variance of the difference between Main Plot treatment means is

$$\frac{2}{b}\sigma_m^2 + \frac{2}{bt}\sigma^2 = \frac{2}{bt}\left(\sigma^2 + t\sigma_m^2\right).$$

The error difference between Sub Plot treatment means r and s is

$$\bar\epsilon_{\cdot\cdot r} - \bar\epsilon_{\cdot\cdot s}$$

and the Main Plot errors cancel, since each Sub Plot treatment occurs in every Main Plot. As each $\bar\epsilon$ contains mb independent errors, the variance of the difference between Sub Plot treatment means is thus $\frac{2}{mb}\sigma^2$.

The interaction is a little more complicated, as the error difference is

$$\bar\eta_{p\cdot} + \bar\epsilon_{p\cdot r} - \bar\eta_{q\cdot} - \bar\epsilon_{q\cdot s}.$$

Suppose that two treatment combinations have the same main plot treatment; then the η terms cancel as they are exactly the same error term. The variance of the error difference $\bar\epsilon_{p\cdot r} - \bar\epsilon_{p\cdot s}$ is $\frac{2}{b}\sigma^2$. However for different main plots the η terms do not cancel, so the variance of the error difference is

$$\frac{2\sigma_m^2}{b} + \frac{2\sigma^2}{b} = \frac{2}{b}\left(\sigma^2 + \sigma_m^2\right).$$

Table 3.2: Summary table for variances of treatment differences

Main plot treatments                             (2/bt)(σ² + tσ_m²)
Sub plot treatments                              (2/mb)σ²
Combinations (same Main Plot Treatment)          (2/b)σ²
Combinations (different Main Plot Treatments)    (2/b)(σ² + σ_m²)


In the analysis of variance table the main plot error estimates σ² + tσ_m² and the sub plot error estimates σ². Hence the variance of the difference between T×M means with differing levels of M is

$$\frac{2}{b}\left(\sigma^2 + \sigma_m^2\right).$$

It follows therefore that the variance of the T×M treatment difference is

$$\frac{2}{bt}\left(\text{Main Plot Error} + (t-1)\,\text{Sub Plot Error}\right).$$

In the example b = 4, t = 3, m = 5, so σ̂² = 0.022 and hence σ̂_m² = 0.099. The Variety means are

V1    V2    V3    V4    V5
5.59  5.24  5.03  5.68  4.81

and so 0.053 is the variance of the difference. It should be remembered that this estimate is not very precise, as Varieties were not the main source of interest. The Nitrogen means are

N1     N2     N3
4.860  5.285  5.665

and the variance of the difference is 0.0022. However, since the Nitrogen levels are evenly spaced, it would seem better to examine the Nitrogen effect for linear and quadratic components. The interaction is not significant, so does not justify close scrutiny. However, the variance of the difference is 0.01103 for Variety×Nitrogen combinations with the same level of Variety, 0.06067 otherwise. An inspection of the two way table of means for Variety and Nitrogen indicates that the response for each factor behaves in a parallel fashion in the presence of the other.

Means table
       V1     V2     V3     V4     V5     all V
N1     5.175  4.825  4.050  5.175  4.475  4.860
N2     5.625  5.250  5.050  5.700  4.800  5.285
N3     5.975  5.650  5.400  6.150  5.150  5.665
all N  5.592  5.242  5.033  5.675  4.808  5.270

One point that emerges from this analysis is the comparison of the error variances. For comparisons of Sub Plot treatments the error variance is σ², but if the mt treatments had been arranged in b randomised blocks a much greater error variance could have been expected, since the Main Plot error is 14.5 times as great as the Sub Plot error. Some of the comparisons that can be made, e.g. of Main Plot treatment levels at fixed levels of the Sub Plot treatment, or across different Main Plot and Sub Plot treatment levels, have major inferential problems. These are due to the estimates of variance involving two different Residual Mean Squares with different d.f. and different expectations.
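The quoted variances all follow from Table 3.2. A sketch in Python recomputing them from the AOV mean squares (values as reported above; the notes' own analysis is in R):

```python
# Example 3.1: b = 4 blocks, m = 5 varieties (main plots), t = 3 nitrogen levels.
b, m, t = 4, 5, 3
ms_main, ms_sub = 0.320, 0.022        # Error a and Error b mean squares
sigma2 = ms_sub                       # sigma^2
sigma_m2 = (ms_main - ms_sub) / t     # from E(MS) = sigma^2 + t*sigma_m^2

v_main = (2 / (b * t)) * (sigma2 + t * sigma_m2)  # main plot treatment difference
v_sub = (2 / (m * b)) * sigma2                    # sub plot treatment difference
v_same = (2 / b) * sigma2                         # combinations, same main treatment
v_diff = (2 / b) * (sigma2 + sigma_m2)            # combinations, different main treatment

print(round(v_main, 3), round(v_sub, 4), round(v_same, 3), round(v_diff, 3))
# 0.053 0.0022 0.011 0.061
```

Note that v_main could equally be computed as (2/(b*t)) times the Error a mean square directly, since E(MS, Error a) = σ² + tσ_m².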


3.3 The mixed model analysis

See [12]. For a vector of observations from block i and main plot i, j, we write the model as

$$y_{ij} = X_{ij}\beta + Z_{1,ij} b_i + Z_{2,ij} d_{ij} + \epsilon_{ij} \qquad (3.1)$$
$$b_i \sim N(0, \Psi_1), \quad d_{ij} \sim N(0, \Psi_2), \quad \epsilon_{ij} \sim N(0, \sigma^2 I)$$

Furthermore, the random effects are uncorrelated. The design matrix for blocks is Z₁, Z₂ is the design matrix for the block by main treatment interaction, and Ψ₂ = diag(σ_m²). In the lme() function, we include a random term for blocks and main plots (variety in this case) within blocks,

Wheat.lme <- lme(yield ~ variety * nitrogen, random = ~1 | block/variety)


Hence the estimates of variances are:-

$$\hat\sigma^2 = 0.022$$
$$\hat\sigma_m^2 = \frac{0.32 - 0.022}{3} = 0.1$$

and these estimates (moment estimates) agree with the mixed model estimates.


    Chapter 4

    Mixed Model Theory

The linear mixed model has the general form,

$$y = X\beta + Zu + \epsilon \qquad (4.1)$$
$$= \begin{pmatrix} X \mid Z \end{pmatrix} \begin{pmatrix} \beta \\ u \end{pmatrix} + \epsilon \qquad (4.2)$$
$$u \sim N(0, D), \quad \epsilon \sim N(0, R), \quad E(u\epsilon^T) = 0$$

4.1 Residual Maximum Likelihood

Temporarily consider model (4.1) as

$$y = X\beta + \epsilon^*, \quad \epsilon^* \sim N(0, \sigma^2 V) \qquad (4.3)$$

where

$$\epsilon^* = Z_1 u_1 + Z_2 u_2 + \ldots + \epsilon$$
$$V = I + \sum_{i=1}^{n_u} \gamma_i Z_i Z_i^T, \quad \gamma_i = \sigma_i^2/\sigma_e^2, \quad E(u_i u_j^T) = 0, \quad E(u_i \epsilon^T) = 0$$

The purpose of rewriting the model this way is to indicate that the variance components are from the space of error contrasts.

For known Z_i, the variance matrix contains unknown parameters γ_i and the dispersion parameter σ². In a model with only one random component, the measurement error, it is not necessary to model the random components because they are residuals. With more than one random component, we need to model the sources of the variance.

    In mixed models we estimate variance components for

(i) efficiency of fixed effects,


(ii) reliable estimates of fixed effects,

    (iii) of interest in themselves, not nuisance parameters.

In a model containing only random effects, the variance components can be estimated by equating observed mean squares to their expected values. In a balanced mixed effects model, the old way was to estimate fixed effects, subtract them from the data and apply moment estimation to the residuals (Henderson's methods). In cases where there is a lack of balance, matrix multiplication becomes exorbitant, so we turn to likelihood methods.

The normal likelihood function is

$$L(\beta, \gamma, \sigma \mid y) = (2\pi\sigma^2)^{-\frac{n}{2}} |V|^{-\frac{1}{2}} \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)^T V^{-1}(y - X\beta)\right\}$$

and its logarithm is

$$\ell(\beta, \gamma, \sigma \mid y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2}\log|V| - \frac{1}{2\sigma^2}(y - X\beta)^T V^{-1}(y - X\beta). \qquad (4.4)$$

Then,

$$\frac{\partial \ell}{\partial \beta} = 0 \implies \hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y \qquad (4.5)$$
$$\frac{\partial \ell}{\partial \gamma_i} = 0 \implies \hat\gamma_i = \ldots \qquad (4.6)$$
$$\frac{\partial \ell}{\partial \sigma^2} = 0 \implies \hat\sigma^2 = \frac{1}{n}(y - X\hat\beta)^T \hat V^{-1}(y - X\hat\beta) \qquad (4.7)$$

The equations are solved by the iterative Newton-Raphson algorithm,

$$\begin{pmatrix} \gamma_1 \\ \vdots \\ \sigma^2 \end{pmatrix}_j = \begin{pmatrix} \gamma_1 \\ \vdots \\ \sigma^2 \end{pmatrix}_{j-1} - \begin{pmatrix} \frac{\partial^2\ell}{\partial\gamma_1^2} & \cdots & \frac{\partial^2\ell}{\partial\gamma_1\,\partial\sigma^2} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2\ell}{\partial\sigma^2\,\partial\gamma_1} & \cdots & \frac{\partial^2\ell}{\partial(\sigma^2)^2} \end{pmatrix}_{j-1}^{-1} \begin{pmatrix} \frac{\partial\ell}{\partial\gamma_1} \\ \vdots \\ \frac{\partial\ell}{\partial\sigma^2} \end{pmatrix}_{j-1} \qquad (4.8)$$

that is, $\theta_j = \theta_{j-1} + I_{j-1}^{-1} s_{j-1}$ for parameter vector $\theta = (\gamma_1, \ldots, \sigma^2)^T$ and score vector s. The matrix of second derivatives is called the Hessian; its negative is the observed information matrix, I, and the expectation of the negative Hessian is the Fisher Information matrix.

In (4.5) and (4.7), we see that an estimate of β is required to get an estimate of σ², which is a double dipping of the information contained in the data. Residual Maximum Likelihood arose from the observation that estimates of variance components in mixed model estimation were often biased downwards, whereas they were not for the variance component model (ie all random effects).


    Patterson and Thompson [6] subdivided the likelihood into 2 spaces,

    (i) the space spanned by treatment contrasts, and

    (ii) the space spanned by error contrasts, orthogonal to the treatment contrasts.

The treatment given below is well explained in Verbyla (1990) [13]. For L = (L₁, L₂) with L₁ such that $L_1^T X = I_p$ and L₂ such that $L_2^T X = 0_{(n-p)}$,

$$L^T y = \begin{pmatrix} L_1^T \\ L_2^T \end{pmatrix} y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$$

then,

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \beta \\ 0_{(n-p)} \end{pmatrix},\; \sigma^2 \begin{pmatrix} L_1^T V L_1 & L_1^T V L_2 \\ L_2^T V L_1 & L_2^T V L_2 \end{pmatrix} \right)$$

One choice for $L_1^T$ (of rank p) is $(X^TX)^{-1}X^T$. The likelihood can be written as the sum of orthogonal components,

$$\ell(\beta, \gamma, \sigma \mid y) = \underbrace{\ell_1(\beta \mid y_1)}_{\text{fixed}} + \underbrace{\ell_2(\gamma, \sigma \mid y_2)}_{\text{error contrasts}}$$

Variance parameter estimates are obtained by maximizing ℓ₂, while estimation of β is through ℓ₁.

The forms of −2ℓ₂ and −2ℓ are given below in (4.9) and (4.10),

$$-2\ell_2 = (n-p)\log(2\pi) - \log|X^T X| + \log|V| + \log|X^T V^{-1} X| + (y - X\hat\beta)^T V^{-1}(y - X\hat\beta) \qquad (4.9)$$
$$-2\ell = (n-p)\log(2\pi) + \log|V| + \log|X^T V^{-1} X| + (y - X\hat\beta)^T V^{-1}(y - X\hat\beta) \qquad (4.10)$$

The difference between (4.9) and (4.10) is the term log|XᵀX|, which penalises the likelihood of the nuisance parameters, whose information is in y₂, for the degrees of freedom used in calculating the interest parameters, β.
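A criterion of this shape can be minimised directly on a toy example. A sketch in Python, assuming a balanced one-way random effects model with σ² fixed at 1, a grid search in place of Newton-Raphson, and simulated data (none of which comes from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
b, k = 6, 4                          # 6 groups (random effects), 4 observations each
n, p = b * k, 1
X = np.ones((n, p))                  # intercept-only fixed part
Z = np.kron(np.eye(b), np.ones((k, 1)))
# Simulate with gamma = sigma_b^2 / sigma^2 = 2 and sigma^2 = 1.
y = 5 + Z @ rng.normal(scale=np.sqrt(2.0), size=b) + rng.normal(size=n)

def neg2_reml(gamma):
    """-2 x REML log-likelihood in the style of (4.10), with sigma^2 = 1."""
    V = np.eye(n) + gamma * Z @ Z.T
    Vi = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)  # GLS beta-hat as in (4.5)
    r = y - X @ beta
    return ((n - p) * np.log(2 * np.pi) + np.linalg.slogdet(V)[1]
            + np.linalg.slogdet(X.T @ Vi @ X)[1] + r @ Vi @ r)

grid = np.linspace(0.01, 10, 500)
gamma_hat = grid[np.argmin([neg2_reml(g) for g in grid])]
print(round(gamma_hat, 2))
```

The log|XᵀVi X| term is the REML penalty discussed above; dropping it would give an ordinary profile-likelihood criterion that tends to underestimate the variance ratio.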

4.2 Properties of REML

(i) For every set of (n − p) linear error contrasts, the REML log-likelihood is the same.

(ii) No matter what (n − p) linearly independent error contrasts are used, maximising their likelihood always leads to the same equations for estimating the variance components γ_i and σ².

(iii) Optimal property. The derivation which shows ℓ₂ to be a marginal likelihood for $Y_2 = L_2^T y$ also shows that ℓ₂ is the conditional likelihood of y given β̂.


(a) For given (γ_i, σ²), the estimator β̂ is sufficient for β, so it is this conditional likelihood which is used to find the most powerful similar tests of hypotheses concerning (γ_i, σ²).

(b) The estimators β̂, s² are jointly sufficient for β, σ² for any given γ, so that most powerful similar tests for γ alone may be constructed using the conditional distribution (y | β̂, s²). Now $s^2/\sigma^2 \sim \chi^2_{n-p}$, so that the density function of s² is

$$f_1(s^2) = \frac{s^{n-p-2} \exp\left(-\frac{s^2}{2\sigma^2}\right)}{2^{\frac{n-p}{2}}\, \sigma^{n-p}\, \Gamma\left(\frac{n-p}{2}\right)}$$

and the distribution of (y | β̂, s²),

$$f_2(y \mid \hat\beta, s^2) \propto \exp\left\{-\tfrac{1}{2}\left[\log|V| + \log|X^T V^{-1} X| + (n - p - 2)\log s^2\right]\right\}$$

does not involve β, σ². At γ = γ₀, V = V(γ₀),

$$\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y$$
$$s^2 = (y - X\hat\beta)^T V^{-1}(y - X\hat\beta)$$

A most powerful similar test of

$$H_0: \gamma = \gamma_0 \quad \text{versus} \quad H_1: \gamma = \gamma_1$$

is ℓ₃(γ₁) − ℓ₃(γ₀) ≥ K, where K is chosen to give some significance level.

(iv) Unbiased estimating equations.

$$\ell = -\frac{1}{2}\left[n\log(2\pi\sigma^2) + \log|V| + \sigma^{-2}(y - X\beta)^T V^{-1}(y - X\beta)\right]$$

where V = V(γ). The MLEs are derived by equating derivatives to zero,

$$\frac{\partial \ell}{\partial \beta} = \sigma^{-2} X^T V^{-1}(y - X\beta)$$
$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)^T V^{-1}(y - X\beta)$$
$$\frac{\partial \ell}{\partial \gamma_i} = -\frac{1}{2}\left[\frac{\partial \log|V|}{\partial \gamma_i} + \sigma^{-2}(y - X\beta)^T \frac{\partial V^{-1}}{\partial \gamma_i}(y - X\beta)\right]$$

The expectation of each of these expressions is zero. Moreover $E(\partial\ell/\partial\beta) = 0$ when σ̂², γ̂ are substituted for σ², γ. Thus ∂ℓ/∂β remains an unbiased estimating equation, giving

$$\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y.$$


However if β̂ is substituted for β in ∂ℓ/∂σ² and ∂ℓ/∂γ_i, the expectation is no longer zero,

$$E\left[(y - X\hat\beta)^T V^{-1}(y - X\hat\beta)\right] = (n - p)\sigma^2.$$

Therefore,

$$E\left[\frac{\partial \ell}{\partial \sigma^2}\right] = -\frac{n}{2\sigma^2} + \frac{(n-p)\sigma^2}{2\sigma^4} = -\frac{n}{2\sigma^2} + \frac{n-p}{2\sigma^2}.$$

Hence,

$$\frac{\partial \ell}{\partial \sigma^2} - E\left[\frac{\partial \ell}{\partial \sigma^2}\right] = -\frac{n-p}{2\sigma^2} + \frac{(y - X\hat\beta)^T V^{-1}(y - X\hat\beta)}{2\sigma^4}$$

becomes the unbiased estimating equation for the true σ² used in V⁻¹. For estimating γ_i, first note that

    E (y X )(y X )T = 2 V X (X T V 1X )1X T .The derivative for i has expectation given by,

    2E i

    = log |V |

    i+ 2 E tr

    i

    (V 1(y X )(y X )T

    = log |V |

    i+ tr

    V 1 i

    V X (X T V 1X )1X T

    = tr X T V 1 i

    X (X T V 1X )1

    = log

    |X T V 1X

    | i .The unbiassed estimating equation after is replaced by in i is

    i

    log |V |+

    ilog |X T V 1X |+ 2(y X )T

    V 1 i

    (y X ) = 0which is the same as in previous expressions of REML.The construction of estimating equations via

    E

    = 0

    is known as prole likelihood estimation [1].Write

    = E

    = 0

    ie. = where = E d .


Hence Δ can be considered an adjustment term for the log-likelihood. For normal distributions, the adjustment is
\[
\Delta = -\tfrac12 \log|2\pi\,\mathrm{var}(\hat\beta)| = -\tfrac12 \log|2\pi I_\beta^{-1}|
\]
where I_β is the information matrix for β, XᵀV⁻¹X.

4.3 Orthogonality of parameters

Our aim is to estimate the conditional distribution of the interest parameters, given the MLE for the nuisance parameters.

In (4.2), the parameter set is partitioned into 2 vectors of lengths p₁, p₂. In the mixed model we have stipulated that random effects are uncorrelated. Cox and Reid (1987) define the orthogonality property for θ = (ψ, λ), of lengths (p₁, p₂), by
\[
i_{\psi,\lambda} = \frac{1}{n} E\!\left[\frac{\partial \ell}{\partial \psi}\frac{\partial \ell}{\partial \lambda}\right]
= -\frac{1}{n} E\!\left[\frac{\partial^2 \ell}{\partial \psi\,\partial \lambda}\right] = 0 \qquad (4.11)
\]
where i_{ψ,λ} is an element of the information matrix. Orthogonality between interest and nuisance effects is a key condition of the REML method, even if it is local rather than global orthogonality. The element i is assumed to be O(1) as n → ∞.¹

If (4.11) holds at only 1 parameter value θ₀, θ is locally orthogonal at θ₀. Local orthogonality can always be achieved in a Hilbert space but global orthogonality is possible only in special cases.

For θ = (ψ, λ),

(a) the MLEs ψ̂ and λ̂ are asymptotically independent,

(b) se(ψ̂) is the same whether λ is known or not,

(c) the estimation of (ψ, λ) is simpler,

(d) the MLE of ψ when λ is known, ψ̂_λ, varies only slowly with λ.

To study the last point, expand the joint likelihood near (ψ̂, λ̂),
\[
\ell(\psi, \lambda) = \ell(\hat\psi, \hat\lambda)
+ \underbrace{\left[(\psi - \hat\psi)\frac{\partial \ell}{\partial \psi} + (\lambda - \hat\lambda)\frac{\partial \ell}{\partial \lambda}\right]}_{=0}
+ \frac12
\begin{pmatrix} \psi - \hat\psi & \lambda - \hat\lambda \end{pmatrix}
\begin{pmatrix}
\dfrac{\partial^2 \ell}{\partial \psi^2} & \dfrac{\partial^2 \ell}{\partial \psi\,\partial \lambda} \\[1ex]
\dfrac{\partial^2 \ell}{\partial \psi\,\partial \lambda} & \dfrac{\partial^2 \ell}{\partial \lambda^2}
\end{pmatrix}
\begin{pmatrix} \psi - \hat\psi \\ \lambda - \hat\lambda \end{pmatrix}
+ O_p\!\left(\|(\psi - \hat\psi,\, \lambda - \hat\lambda)\|^3\right)
\]
\[
\approx \ell(\hat\psi, \hat\lambda)
- \frac12\left[\, n\, j_{\psi\psi}(\psi - \hat\psi)^2
+ 2n\, j_{\psi\lambda}(\psi - \hat\psi)(\lambda - \hat\lambda)
+ n\, j_{\lambda\lambda}(\lambda - \hat\lambda)^2 \,\right] \qquad (4.12)
\]

¹O(1) is explained in the appendix.


where j = i + Z/√n, E(Z) = 0 and Z = O_p(1). Rewriting (4.12) in terms of i and Z and differentiating wrt λ, λ̂_ψ satisfies the relation
\[
n\, i_{\psi\lambda}(\psi - \hat\psi) + \sqrt{n}\, Z_{\psi\lambda}(\psi - \hat\psi)
+ \tfrac12 (\psi - \hat\psi)^2 \sqrt{n}\, \frac{\partial Z_{\psi\psi}}{\partial \lambda}
+ \tfrac12 (\psi - \hat\psi)^2\, n\, \frac{\partial i_{\psi\psi}}{\partial \lambda} + \cdots = 0
\]
Because of the following results,
\[
Z = O_p(1), \qquad i = O(1), \qquad \psi - \hat\psi = O_p\!\left(\tfrac{1}{\sqrt{n}}\right), \qquad \lambda - \hat\lambda = O_p\!\left(\tfrac{1}{\sqrt{n}}\right),
\]
the first term is O_p(√n) unless i_{ψλ} = 0 (ie orthogonality), while the remaining terms are O_p(1).

The idea that λ̂_ψ is similar for different ψ is illustrated in Figure 4.1, where ψ = ν and λ = σ².

If λ̂_ψ = λ̂ for all ψ, then ψ and λ are orthogonal parameters. Models from the exponential family contain ψ as part of the canonical parameter and λ as the complementary part of the expectation, eg.

- Normal (μ, σ²)
- Gamma (shape and scale parameters)

Typically ψ is orthogonal to (λ₁, …, λ_{p₂}), where ψ is interest and the λs are nuisance, but they can be of interest. This is the basis of REML: finding a representation of the nuisance effects orthogonal to the interest effects.

4.4 Mixed Model Equations (MME)

From (4.1), we have
\[
\mathrm{var}(y) = V = ZDZ^T + R \qquad (4.13)
\]
\[
(X^T V^{-1} X)\hat B = X^T V^{-1} y \qquad (4.14)
\]
The estimating equations for the mixed model are
\[
\begin{pmatrix}
X^T R^{-1} X & X^T R^{-1} Z \\
Z^T R^{-1} X & Z^T R^{-1} Z + D^{-1}
\end{pmatrix}
\begin{pmatrix} \hat B \\ \hat u \end{pmatrix}
=
\begin{pmatrix} X^T R^{-1} y \\ Z^T R^{-1} y \end{pmatrix} \qquad (4.15)
\]
Because the equations are solved iteratively, D may become singular, and a preferable alternative to (4.15) is
\[
\begin{pmatrix}
X^T R^{-1} X & X^T R^{-1} Z D \\
Z^T R^{-1} X & Z^T R^{-1} Z D + I
\end{pmatrix}
\begin{pmatrix} \hat B \\ \hat \nu \end{pmatrix}
=
\begin{pmatrix} X^T R^{-1} y \\ Z^T R^{-1} y \end{pmatrix} \qquad (4.16)
\]


Figure 4.1: Estimation of interest, given nuisance parameters and orthogonality. [Perspective plot of the log-likelihood surface against ν and σ².]

where D ν̂ = û.

Taking the matrix results on faith,
\[
\begin{pmatrix} \hat B \\ \hat u \end{pmatrix}
=
\begin{pmatrix}
(X^T V^{-1} X)^{-1} X^T V^{-1} y \\
D Z^T V^{-1}(y - X\hat B)
\end{pmatrix} \qquad (4.17)
\]


    The algorithm is

    1. Derive variance components of V by REML.

    2. Given the variance parameters, calculate B by Generalised Least Squares (GLS).

    3. Given variance components and B , calculate u .

Some points of note:-

(i) û is referred to as the Best Linear Unbiased Predictor (BLUP). Although B̂ is not invariant to the choice of (XᵀV⁻¹X)⁻¹, the occurrence of B̂ in û = DZᵀV⁻¹(y − XB̂) is such that û is invariant to (XᵀV⁻¹X)⁻¹, and û is the same for all solutions of B̂.

(ii) Under normality assumptions, û is exactly the same as E(u|y), save for B̂ in place of β. Since under normality B̂ is the m.l.e. of β when D and V are known, we can refer to
\[
\hat u = \hat E(u \mid y) = D Z^T V^{-1}(y - X\hat B)
\]
as the ML estimator of the conditional mean E(u|y).


    Chapter 5

    Variance Models

Systematic effects are modelled by curves, means etc. and random effects are modelled by density functions. At the heart of the multivariate density function is the covariance matrix. In the same way as we seek to generalise the systematic effects through models, we also find smooth models for densities to conserve degrees of freedom yet retain the important information.

Before launching into a mixed-model analysis, recall that there must be repeated measures (in time, space or measurement) from a sampling unit to have a mixed model with random subject effects. That means we have to mentally organise the data according to its source, ie. subject, repeated measures, treatment etc.

5.1 Variance Components

Write the mixed model separating the mean and covariance structures,
\[
Y = X\tau + E \qquad (5.1)
\]
and
\[
E \sim N(0, V(\theta)) \qquad (5.2)
\]
We assume that E can be additively decomposed into

1. random effects,

2. serially correlated variation and

3. measurement error,

\[
E_{i,j} = Z_i u_i + W_i(t_{ij}) + \epsilon_{i,j}, \qquad (5.3)
\]
and
\[
\mathrm{var}(E_{i,j}) = ZDZ^T + \sigma_r^2 R + \sigma^2 I . \qquad (5.4)
\]


    We decompose the systematic part of the model into components guided by the experi-ment design and exploratory data analysis. Once we recognise the structure of the randompart, we can explore for likely components.

5.1.1 Pure Serial Correlation

Assume that there are repeated measurements over time from a single subject,
\[
y_1, y_2, \ldots, y_n,
\]
so that equation (5.3) reduces to
\[
E_{i,j} = W_i(t_{ij})
\]
and (5.4) is
\[
\mathrm{var}(E_{i,j}) = \sigma^2 R .
\]
The sample covariance amongst the E_{i,j} is
\[
\begin{pmatrix}
\gamma_{11} & & & \\
\gamma_{21} & \gamma_{22} & & \\
\vdots & & \ddots & \\
\gamma_{n1} & \cdots & & \gamma_{nn}
\end{pmatrix}
= \sigma^2
\begin{pmatrix}
\rho_{11} & & & \\
\rho_{21} & \rho_{22} & & \\
\vdots & & \ddots & \\
\rho_{n1} & \cdots & & \rho_{nn}
\end{pmatrix}
\]
If we use an autocorrelation function ρ(u) to model the correlation of observations that are u apart, eg. ρ(u) = exp(−φu), the covariance matrix is modelled in terms of the parameter φ.

\[
V = \sigma^2
\begin{pmatrix}
\rho(0) & & & & \\
\rho(1) & \rho(0) & & & \\
\rho(2) & \rho(1) & \ddots & & \\
\vdots & & & \ddots & \\
\rho(n-1) & \cdots & & \rho(1) & \rho(0)
\end{pmatrix}
\]

5.1.2 Random effects plus measurement error

The components from equation (5.3) are
\[
E_{i,j} = Z_i u_i + \epsilon_{i,j} .
\]

Random Intercepts

If the random effects are scalar intercepts,
\[
\mathrm{var}(E) = \sigma^2\left( \frac{\sigma_u^2}{\sigma^2} J + I \right) .
\]
The variance of each E_j is σ² + σ_u², and the correlation between any 2 measurements from the same unit is ρ = σ_u²/(σ² + σ_u²).
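A quick numerical check of this compound-symmetric structure; the variance values are invented for the illustration (the notes themselves work in R):

```python
import numpy as np

sigma2, sigma2_u, K = 1.0, 3.0, 4          # invented variances, 4 repeats per unit
J = np.ones((K, K))
V = sigma2 * np.eye(K) + sigma2_u * J      # sigma^2 (I + (sigma_u^2/sigma^2) J)

var_Ej = V[0, 0]                           # sigma^2 + sigma_u^2
rho = V[0, 1] / var_Ej                     # sigma_u^2 / (sigma^2 + sigma_u^2)
```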

  • 8/8/2019 A Rag Aw 2

    41/106

    5.1. VARIANCE COMPONENTS 37

Random Slopes

If
\[
Y = X\tau +
\begin{pmatrix}
1 & t_1 & & \\
\vdots & \vdots & & \\
1 & t_K & & \\
& & 1 & t_1 \\
& & \vdots & \vdots \\
& & 1 & t_K
\end{pmatrix}
\begin{pmatrix} u_{11} \\ u_{12} \\ u_{21} \\ u_{22} \end{pmatrix}
+ \epsilon
\]
where u₁ and u₂ are Normal random vectors with variances σ₁² and σ₂², representing intercepts and slopes respectively,
\[
\mathrm{var}(E_j) = \sigma_1^2 + t_j^2\, \sigma_2^2 + \sigma^2 .
\]
For j ≠ k,
\[
\mathrm{Cov}(E_j, E_k) = \sigma_1^2 + t_j t_k\, \sigma_2^2
\]
where we see that covariance increases as t increases.
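The variance and covariance formulas can be verified by building var(E) = ZGZᵀ + σ²I for one unit directly; the times and variance values below are invented:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])         # invented sampling times for one unit
s1, s2, se = 2.0, 0.5, 1.0                 # sigma_1^2, sigma_2^2, sigma^2 (invented)
Z = np.column_stack([np.ones_like(t), t])  # random intercept and slope columns
G = np.diag([s1, s2])
V = Z @ G @ Z.T + se * np.eye(len(t))      # var(E) for the unit
```

The diagonal of V is σ₁² + t_j² σ₂² + σ², and the off-diagonal entries are σ₁² + t_j t_k σ₂², growing with t.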

Correlation may be represented by random effects

Diggle, Liang and Zeger (1994), page 88 state

    Whilst serial correlation would appear to be a natural feature of any longitudinal model, in specific applications its effect may be dominated by the combination of random effects and measurement error.

5.1.3 Random effects, Serial Correlation and Measurement Error

If in the above example the time variable was not included in the Z matrix, the variance model would have components due to random intercepts, measurement error and, as likely, there would be serial correlation.

The variance matrix (5.4) becomes
\[
\mathrm{var}(E) = \sigma^2\left( I + \frac{\sigma_u^2}{\sigma^2} J + \frac{\sigma_r^2}{\sigma^2} R \right), \qquad (5.5)
\]
where I is the identity matrix, J is a matrix of 1s (ie. 11ᵀ), and R is a correlation matrix. The information on σ_u² comes from replication of treatment units, σ_r² is derived from amongst times within units, and σ² is residual.


5.1.4 The variogram

The diagnostic to show the relative importance of each of the random components is the variogram (see [2]), defined as
\[
\gamma(v) = \tfrac12\, E\left\{ [Y(t) - Y(t - v)]^2 \right\}, \qquad v \ge 0 .
\]
In a first pass at the data, a saturated model is fitted for the fixed effects, saving the residuals. The sample variogram is a plot of the observed half squared differences between pairs of residuals (v_{ijk}) versus the corresponding time (or space) differences (u_{ijk}),
\[
v_{ijk} = \tfrac12 (r_{ij} - r_{ik})^2, \qquad u_{ijk} = t_{ij} - t_{ik} .
\]
A smooth line is regressed through the points (u, v). The information contained in the v_{ijk} has contributions from

- measurement error,
- components due to random effects,
- serial correlation.

Figure 5.1 is an idealised variogram which shows how the components affect the shape of the curve. From the sample variogram of the residuals, we can gauge which variance components should be modelled; when the variance model is satisfactory, the variogram should be flat.

    Figure 5.1: Variogram
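A minimal sketch of computing the sample variogram ordinates for a single unit, assuming the residuals come from a saturated fixed-effects fit (here stand-in simulated noise); in practice the pairs are pooled over units and a smooth line drawn through (u, v):

```python
import numpy as np

def sample_variogram(times, resids):
    """Half squared differences v_ijk between residual pairs within one unit,
    with the corresponding time separations u_ijk."""
    u, v = [], []
    for j in range(len(times)):
        for k in range(j):
            v.append(0.5 * (resids[j] - resids[k]) ** 2)
            u.append(abs(times[j] - times[k]))
    return np.array(u), np.array(v)

rng = np.random.default_rng(0)
t = np.arange(10.0)
r = rng.normal(size=10)        # stand-in residuals for one sampling unit
u, v = sample_variogram(t, r)
```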


    5.2 Matrix results

5.2.1 Tearing

Solution for B̂ requires inverting V, but as D is often block diagonal and R either diagonal or patterned (eg banded), we use the result that
\[
V^{-1}_{n,n} = (Z_{n,q} D_{q,q} Z^T_{q,n} + R)^{-1}
= R^{-1} - R^{-1} Z (D^{-1} + Z^T R^{-1} Z)^{-1} Z^T R^{-1} \qquad (5.6)
\]
Now if R has a simple inverse and q < n, only the q × q matrix D⁻¹ + ZᵀR⁻¹Z need be inverted.
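Identity (5.6) can be confirmed numerically; on the right-hand side only a q × q matrix is inverted (the dimensions and variances below are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 8, 2                                 # invented dimensions, q < n
Z = rng.normal(size=(n, q))
D = np.diag([2.0, 0.5])
R = np.diag(rng.uniform(1, 2, size=n))      # a "simple" (diagonal) R

V = Z @ D @ Z.T + R
Ri = np.linalg.inv(R)                       # trivial for diagonal R
small = np.linalg.inv(np.linalg.inv(D) + Z.T @ Ri @ Z)   # the only q x q inversion
Vi = Ri - Ri @ Z @ small @ Z.T @ Ri         # identity (5.6)
```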


5.3 Nested Random Effects

Consider a field trial where spatial variation can be anticipated and where yields from adjacent plots may be more alike (ie. higher correlation) than when the plots are more widely separated.

[Diagram: an r × c grid of field plots, rows 1, …, r by columns 1, …, c, with plots indexed (1,1) to (r,c).]

One model would be to regard rows and columns as random effects,
\[
\begin{pmatrix} Y_{11} \\ \vdots \\ Y_{1c} \\ \vdots \\ Y_{r1} \\ \vdots \\ Y_{rc} \end{pmatrix}
= X\tau
+ (I_r \otimes 1_c)\begin{pmatrix} U_{r_1} \\ \vdots \\ U_{r_r} \end{pmatrix}
+ (1_r \otimes I_c)\begin{pmatrix} U_{c_1} \\ \vdots \\ U_{c_c} \end{pmatrix}
+ \epsilon
\]
where the first design matrix repeats each row effect down its c plots and the second stacks an identity for the column effects within each row.

Rather than estimating individual row and column effects (which are nuisance), we could detrend the spatial variation amongst residuals across the field with a model of random effects. This has the effect of mathematically adjusting the yields from individual plots to what they would be on the average plot, ie. subtract the plot effect.

If the correlation amongst rows is ρ_r(|i − j|) and amongst columns is ρ_c(|m − n|), the correlation between 2 plots which are d_r rows apart and d_c columns apart may be reasonably represented by ρ_r(d_r) · ρ_c(d_c); this is called a separable process. The rc × rc correlation matrix is
\[
\Sigma_{rc,rc} = R_{r,r} \otimes C_{c,c} .
\]
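A sketch of the separable structure, assuming (hypothetically) exponential-decay correlation functions for rows and columns; the entry for two plots d_r rows and d_c columns apart factorises as claimed:

```python
import numpy as np

def corr(k, rho):
    """Correlation matrix with entries rho(|i - j|) = rho**|i - j|."""
    idx = np.arange(k)
    return rho ** np.abs(idx[:, None] - idx[None, :])

r, c = 3, 4
Rr, Cc = corr(r, 0.6), corr(c, 0.3)        # hypothetical row/column correlations
Sigma = np.kron(Rr, Cc)                    # rc x rc separable correlation matrix

# plots (row 0, col 1) and (row 2, col 3): d_r = 2, d_c = 2
pair_corr = Sigma[0 * c + 1, 2 * c + 3]
```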


5.4 Patterned Matrices in R

Define the distributions
\[
b_i \sim N(0, \Sigma_i), \qquad \epsilon_i \sim N(0, \sigma^2 I), \qquad \mathrm{cov}(b_i, \epsilon_i) = 0,
\]
where
\[
\Sigma_i =
\begin{pmatrix}
\sigma_1^2 + \sigma_2^2 & \sigma_1^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma_1^2 + \sigma_2^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma_1^2 & \sigma_1^2 + \sigma_2^2
\end{pmatrix} .
\]
The Σ_i matrix is compound symmetric. The R code for this model and the output are listed below.

library(nlme)
data(Oats)
mm1.Oats


5.4.2 Split-plot experiment on Oats - alternative way 2

Define the design matrix and random effects as
\[
Z_i =
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{pmatrix},
\qquad
b_i = \begin{pmatrix} b_i \\ b_{i,1} \\ b_{i,2} \\ b_{i,3} \end{pmatrix}
\]
and b_i ∼ N(0, Ψ). The covariance matrix for the random effects is
\[
\Psi_i = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_2^2, \sigma_2^2) .
\]
The R code and output are

mm2.Oats


5.5 Crossed Random Effects

Crossed random effects are modelled by a combination of pdBlocked and pdIdent objects.

Example

The data are log(optical density) measures from cell cultures in 2 blocks of 30 wells.

[Diagram: each block is a 6 × 5 grid of wells, 6 rows by 5 columns of dilutions.]

There are 6 samples (treatments) randomly assigned to rows and 5 serial dilutions randomly assigned to columns. The data are Assay in the nlme library.

The systematic effects are sample*dilut and the random effects are block/(row + column).

Index the blocks by i, the rows by j, the columns by k. Then
\[
y_{ijk} = \mu + s_j + d_k + (s{:}d)_{jk} + b_i + r_{ij} + c_{ik} + \epsilon_{ijk},
\]
\[
b_i \sim N(0, \sigma_b^2), \quad r_{ij} \sim N(0, \sigma_r^2), \quad c_{ik} \sim N(0, \sigma_c^2), \quad \epsilon_{ijk} \sim N(0, \sigma^2),
\]
\[
\mathrm{cov}(b, r) = 0, \quad \mathrm{cov}(b, c) = 0, \quad \mathrm{cov}(r, c) = 0 .
\]
Write the vector of random effects as
\[
U_i = \left( b_i,\; r_{i,1}, \ldots, r_{i,6},\; c_{i,1}, \ldots, c_{i,5} \right)^T .
\]
Then
\[
\mathrm{var}(U_i) =
\begin{pmatrix}
\sigma_b^2 & & \\
& \sigma_r^2 I_6 & \\
& & \sigma_c^2 I_5
\end{pmatrix} .
\]
That is, U_i has a block-diagonal structure with each block being a multiple of the identity. The R code to construct this is


data(Assay)
mm.assay
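The block-diagonal var(U_i) itself can be assembled directly; a sketch with invented variance component values (the pdBlocked/pdIdent combination in nlme builds the same pattern):

```python
import numpy as np

s2b, s2r, s2c = 0.5, 0.2, 0.1              # invented variance components
blocks = [s2b * np.ones((1, 1)), s2r * np.eye(6), s2c * np.eye(5)]

m = sum(b.shape[0] for b in blocks)        # 1 + 6 + 5 = 12
varU = np.zeros((m, m))
at = 0
for b in blocks:                           # place each block on the diagonal
    k = b.shape[0]
    varU[at:at + k, at:at + k] = b
    at += k
```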



    Chapter 6

    Change Over designs

Designs in which each experimental unit receives a cyclical sequence of several treatments in successive periods are known as change-over designs. Historically they found favour because subject effects could be eliminated from experiment error, but with the penalty that performance in a given period might reflect not only the direct effect of the treatment but also the residual effects of preceding treatments. Another reason for using change-over designs is to get replication over periods when there is a shortage of experiment material and the researcher feels it is safe to get extra replication by giving more than 1 treatment to each experiment unit. Animal pen studies are sometimes done this way.

Direct and residual effects can be separated by appropriate choice of treatment sequences when it can be assumed that the residual effects persist only one period. We consider (i) balanced and (ii) partially balanced change-over designs, which may have an extra period formed by simply repeating the final period. The extra period provides the property that each treatment is preceded by the others equally often - ie a treatment is also preceded by itself. If first residuals are important, the extra period design is better, but the extra period is unnecessary if residual effects are negligible.

In balanced designs, all treatment contrasts have equal precision; in partially balanced designs, some contrasts are estimated with greater precision than others.

6.1 Latin squares and incomplete blocks

The notation is

t - treatments
p - periods
b - blocks
k - number of units per block


Table 6.1: Change-over design for 4 treatments using orthogonal Latin squares

            Square 1   Square 2   Square 3      Square 4
                           UNITS
  Period   1 2 3 4    5 6 7 8    9 10 11 12   13 14 15 16
     1     1 2 3 4    1 2 3 4    1  2  3  4    1  2  3  4
     2     2 1 4 3    3 4 1 2    4  3  2  1    2  4  1  3
     3     3 4 1 2    4 3 2 1    2  1  4  3    3  1  4  2
     4     4 3 2 1    2 1 4 3    3  4  1  2    4  3  2  1

Table 6.2: Change-over design for 7 treatments and 4 periods in blocks of 7

  Period   Incomplete Square 1   Incomplete Square 2
     1     1 2 3 4 5 6 7         1 2 3 4 5 6 7
     2     2 3 4 5 6 7 1         7 1 2 3 4 5 6
     3     4 5 6 7 1 2 3         5 6 7 1 2 3 4
     4     7 1 2 3 4 5 6         2 3 4 5 6 7 1

The simplest type of change-over design is a Latin square with rows representing periods of time and columns representing experimental units. If there were no carry-over effects, the data would simply be analysed as if they arose from an ordinary Latin square. Residual effects are allowed for by including terms for them in the statistical model.

Complete sets of orthogonal Latin squares, eg Table 6.1, will ensure that each treatment is preceded by each other equally often, but limits on resources will usually not allow this.

Many of the design properties are retained when rows are dropped from the Latin square (p < t) or for an incomplete block where k < t. In an incomplete Latin square, each unit receives an incomplete set of treatments.

In order to estimate direct and residual effects, block size must be at least 3.

6.2 Analysis

If t₁, t₂, t₃ represent the direct effects of 3 treatments and r₁, r₂, r₃ the residual effects, the total effects of a sequence of treatments 1, 2, 3 are represented by

1. t₁

2. t₂ + r₁

3. t₃ + r₂

Thus the statistical model has components due to


1. Blocks (random effects)

2. Periods within Blocks (random effects)

3. Units within Blocks (random effects)

4. Direct effects (fixed effects)

5. Residual effects (fixed effects)

and we would regard items 1, 2, 3 as nuisance and items 4 and 5 as interest.
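The mapping from a treatment sequence to total effects (direct plus one-period residual) can be written with incidence matrices; a sketch for the sequence 1, 2, 3 with hypothetical effect values:

```python
import numpy as np

seq = [1, 2, 3]                          # treatments applied in periods 1..3
t_eff = np.array([1.0, 2.0, 3.0])        # hypothetical direct effects t1, t2, t3
r_eff = np.array([0.5, 0.25, 0.0])       # hypothetical residual effects r1, r2, r3

p, ntrt = len(seq), 3
Td = np.zeros((p, ntrt))                 # direct-effect incidence
Rs = np.zeros((p, ntrt))                 # residual incidence (one-period carry-over)
for period, trt in enumerate(seq):
    Td[period, trt - 1] = 1
    if period > 0:
        Rs[period, seq[period - 1] - 1] = 1

total = Td @ t_eff + Rs @ r_eff          # t1, t2 + r1, t3 + r2
```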

Example 6.1

The design is 4 blocks of orthogonal 3 × 3 Latin squares with an extra period, where the rows are Periods and the columns are Units.

           Block 1   Block 2   Block 3   Block 4
             Unit      Unit      Unit      Unit
  Period   1  2  3   1  2  3   1  2  3   1  2  3
     1     1  4  3   1  3  2   3  4  2   1  2  4
     2     4  3  1   3  2  1   4  2  3   2  4  1
     3     3  1  4   2  1  3   2  3  4   4  1  2
     4     3  1  4   2  1  3   2  3  4   4  1  2

The data shown in Table 6.3 are milk yields (fcm), and the extra periods could be dropped to compare the basic design with the extra period design.

Within each Block we have the additional blocking factors of Periods (the rows) and Units (the columns), so the random model is

Block + Block.Period + Block.Unit

where Block.Period would require 3 df for each block, ie 12 df, and there would be 8 df for Block.Unit.

The systematic effects are the direct and residual treatment effects. The statistical model for these data is
\[
y_{itsu} = \tau_i + R_{i,(t-1)} + B_s + B_s.P_t + B_s.U_u + \epsilon_{itsu}
\]
\[
B_s \sim N(0, \sigma_B^2), \quad B_s.P_t \sim N(0, \sigma_{BP}^2), \quad B_s.U_u \sim N(0, \sigma_{BU}^2), \quad \epsilon \sim N(0, \sigma^2)
\]


Example 6.2

An Incomplete Block design for 6 treatments in 4 blocks, 4 periods and 4 units per block is shown below.

           Block 1    Block 2    Block 3
             Unit       Unit       Unit
  Period   1 2 3 4    1 2 3 4    1 2 3 4
     1     1 2 5 4    4 6 3 1    5 2 6 3
     2     4 1 2 5    1 4 6 3    6 5 3 2
     3     2 5 4 1    6 3 1 4    2 3 5 6
     4     5 4 1 2    3 1 4 6    3 6 2 5

    The model is the same as before and the essential R code is

    XOIB.g


6.3 Change Over Designs - computing Guide

milk


Chapter 7

Semi-parametric Regression

Hermite polynomials,
\[
b_0(x) = 1, \quad b_1(x) = 2x, \quad b_2(x) = 4x^2 - 2, \quad b_3(x) = 8x^3 - 12x,
\]
\[
b_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2},
\qquad
\int_{-\infty}^{\infty} b_m(x)\, b_n(x)\, e^{-x^2}\, dx = 0, \quad m \ne n;
\]
cubic splines,
\[
b_1 = x, \qquad b_j = |x - x_j|^3, \quad j > 1
\]
where x_j are the knot points.

Hence (7.2) can be written as a linear model and in the case of cubic splines,
\[
\eta = \beta_0 + \beta_1 x_1 + \sum_{j=1}^{k_1} \beta_{j+1}\, |x_1 - x_{1,j}|^3 + \ldots
\]
or
\[
\eta = X\beta .
\]
At this point the degrees of freedom depend on the number of knot points. To avoid influence of the knot points, the model overfits and controls smoothness by a penalty function. So it may be considered as putting in a lot of df and then taking out the redundant ones where the data indicate they are not needed. Consequently we may end up with fractional df.

It is timely to note the Littlewood-Paley-Stein theorem.

Theorem 1 If 1 < p < ∞, there exist 2 constants C_p ≥ c_p > 0 such that for all functions belonging to L^p(Rⁿ),
\[
c_p \|g\|_p \le \|f\|_p \le C_p \|g\|_p,
\]
where
\[
\|f\|_p = \left( \int_{R^n} |f(x)|^p \right)^{\frac1p}
\]
and
\[
g(x) = \left( \sum_j \Big| \sum_{2^j \le k < 2^{j+1}} (a_k \cos kx + b_k \sin kx) \Big|^2 \right)^{\frac12} .
\]

The term g(x) is built from the Fourier representation of the data y, and this theorem is stating properties of the basis functions used to represent y. With well chosen basis functions, c_p and C_p will be close for all p. With poorly chosen basis functions, the addition of extra terms is not so much fine tuning as fixing the discrepancies of the low order terms, and so the representation of y by the b_i(x) has localized hot spots of spectral energies. If the spectral energy is distributed evenly across x, ‖f‖_p will be approximately the same for p = 2, 4, 6, ….


The penalty function for splines is the roughness defined by
\[
\int [s''(x)]^2\, dx = \beta^T S \beta
\]
and the βs are estimated by minimizing
\[
\ell(\beta) + \sum_i \lambda_i\, \beta^T S_i \beta
\]
where the λ_i are weights.

In the GLM setting we require the working quantities Γ_{i,i} = ∂η_i/∂μ_i and W_{i,i} = (∂μ_i/∂η_i)²/var(y_i), with θ the canonical parameter. Then the steps in fitting a semiparametric model are:

1. form the working response
\[
z = \eta + \Gamma(y - \mu)
\]

2. find the λ_i that minimize
\[
\frac{\left\| W^{\frac12}(z - X\hat\beta) \right\|^2}{\left[\mathrm{tr}(I - A)\right]^2}
\]
where A = X(XᵀWX + Σ_i λ_i S_i)⁻¹XᵀW is the hat matrix and tr(A) = df.

This algorithm is known as generalized cross validation.
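A sketch of choosing the smoothing weight by generalized cross validation in the Gaussian case (W = I, z = y), using the |x − x_j|³ basis from above; the data and knot layout are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# basis 1, x, |x - x_j|^3 at invented knots
knots = np.linspace(0.1, 0.9, 8)
X = np.column_stack([np.ones(n), x] + [np.abs(x - k) ** 3 for k in knots])
S = np.diag([0.0, 0.0] + [1.0] * len(knots))   # penalise only the knot terms

def gcv(lam):
    # hat matrix A = X (X'WX + lam S)^-1 X'W, with W = I here
    A = X @ np.linalg.solve(X.T @ X + lam * S, X.T)
    resid = y - A @ y
    return n * resid @ resid / (n - np.trace(A)) ** 2

lams = 10.0 ** np.arange(-6.0, 3.0)
best = min(lams, key=gcv)
```

The effective degrees of freedom, tr(A), shrink as λ grows, which is the "taking out the redundant df" described earlier.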

    7.2 GAMs in R

The mgcv library in R fits Generalized Additive Models using generalized cross validation. The example is for modelling elbow flux over time. A subject receives some physiotherapy treatment and the blood flow through the elbow is measured (as elbow flux) for the next hour. In the study there were many subjects, but only the data from one is used here to demonstrate GAMs. The data are saved in elbowdat.csv and an exploratory data analysis shows a highly nonlinear trend of ef over time.

library(mgcv); library(lattice)
elbow.df


Figure 7.1: The systematic effect of time on elbow flux. [Left panel: fitted smooth s(time, 3.87) against time; right panel: observed elbow flux against time.]

    fit.plot


Example

The next example shows how to fit GAMs within a group, such as the levels of a treatment.

The data are Glucose concentrations in sheep taken over about 2 hours after the sheep had been injected with hormones: (a) Cortisol (C), (b) Glucagon (G), (c) Adrenalin (A), (d) C+A, (e) C+G+A. The full data are on the web site and only the first few lines are shown here.

 Sheep Weight Treatment Time Glucose
   148     32         C    0    2.97
   148     32         C    2    3.06
   148     32         C    5    3.21
   148     32         C   10    3.31
   148     32         C   15    3.39
   148     32         C   20    3.44

library(mgcv)
Glucose.df


The concept of an interaction does not apply in semi-parametric regression, and the effect of time is estimated within each treatment. This requires that we set up an indicator variable for each treatment and use this to specify that the spline fit is within that group.

    trt.levels


Figure 7.3: Fitted responses from GAM models within treatment levels. [Five panels of fitted curves against time 0-120, one per treatment (A, C, CA, CGA, G), plus a panel comparing the treatment levels.]


The profile of fitted values is used to estimate, with 95% CIs, the time to maximum concentration.

#_____________________________________________
closest


[Figure: observed Glucose (nanomoles/litre) against Time for each treatment panel A, C, CA, CGA, G.]

7.3 Smoothing splines are BLUPs

In his comment upon Geoff Robinson's 1991 paper, That BLUP is a Good Thing: The Estimation of Random Effects [8], Terry Speed [11] showed that splines are BLUPs. This permits the non-linear smooth splines to be implemented in a linear mixed model, which gives substantial power for modelling data. This is expertly developed in Verbyla et al. (1999) and the models can be fitted using the specialized package ASREML written by Arthur Gilmour.

Theory and implementation are also given in the book Semi-parametric Regression by David Ruppert, Matt Wand and Ray Carroll [9].

Consider a linear model
\[
y = g(x) + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 R)
\]
and write the −2 log-likelihood, with a roughness penalty, as
\[
\ell = \log|\sigma^2 R| + \frac{1}{\sigma^2}(y - g)^T R^{-1}(y - g) + \lambda_s \int g''(x)^2\, dx .
\]
The solution through REML estimation of variance components and mixed model estimating equations utilizes matrices which are functions of the differences between samplings of the variable in the spline function. Define these differences as h_j = x_{j+1} − x_j. Then Δ is the n × (n − 2) matrix whose only nonzero entries in column i are
\[
\Delta_{i,i} = \frac{1}{h_i}, \qquad
\Delta_{i+1,i} = -\left(\frac{1}{h_i} + \frac{1}{h_{i+1}}\right), \qquad
\Delta_{i+2,i} = \frac{1}{h_{i+1}},
\]


and G is the (n − 2) × (n − 2) tridiagonal matrix with
\[
G_{i,i} = \frac{h_i + h_{i+1}}{3}, \qquad G_{i,i+1} = G_{i+1,i} = \frac{h_{i+1}}{6} .
\]
At the design points,
\[
\hat g = \left( R^{-1} + \lambda_s\, \Delta G^{-1} \Delta^T \right)^{-1} R^{-1} y \qquad (7.3)
\]
Equation (7.3) can be reparameterized into the standard form for a mixed model by utilizing
\[
X_s = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix},
\qquad
Z_s = \Delta(\Delta^T \Delta)^{-1},
\qquad
H = \sigma^2 R + \lambda_s^{-1} Z_s G Z_s^T
\]
so that
\[
\hat g = X_s \hat\beta_s + Z_s \hat u_s
\]
where
\[
\hat\beta_s = (X_s^T H^{-1} X_s)^{-1} X_s^T H^{-1} y
\]
\[
\hat u_s = (Z_s^T R^{-1} Z_s + \lambda_s G^{-1})^{-1} Z_s^T R^{-1}(y - X_s \hat\beta_s)
\]
\[
\lambda_s = \frac{\sigma^2}{\sigma_s^2}
\]
The log-likelihood is a conditional likelihood of y|u_s and the penalty is the log-density function of u_s. Splines in GLMs are fitted using the method of Schall (1991) [10].
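The Δ and G matrices and the smoother (7.3) can be built directly; a sketch with invented design points and R = I. A useful check is that Δᵀ annihilates linear functions of x, so straight lines are unpenalized:

```python
import numpy as np

x = np.array([0.0, 0.5, 1.2, 2.0, 3.1, 4.0])   # invented design points
y = np.array([0.1, 0.4, 0.9, 1.1, 1.0, 0.7])
n = len(x)
h = np.diff(x)                                  # h_j = x_{j+1} - x_j

Delta = np.zeros((n, n - 2))
G = np.zeros((n - 2, n - 2))
for i in range(n - 2):
    Delta[i, i] = 1 / h[i]
    Delta[i + 1, i] = -(1 / h[i] + 1 / h[i + 1])
    Delta[i + 2, i] = 1 / h[i + 1]
    G[i, i] = (h[i] + h[i + 1]) / 3
    if i + 1 < n - 2:
        G[i, i + 1] = G[i + 1, i] = h[i + 1] / 6

# (7.3) with R = I: ghat = (I + lam * Delta G^-1 Delta')^-1 y
lam = 1.0
K = Delta @ np.linalg.solve(G, Delta.T)
ghat = np.linalg.solve(np.eye(n) + lam * K, y)
```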

7.4 Splines in the GLM using ASREML

The ASREML program [4] is not simple, but its power makes it worth having and learning. The routines are in GENSTAT and an S-PLUS module is being fine tuned (I think). I use a primitive R function called asr() which sets the job up and, after some translation, passes it to the compiled ASREML program. My R code is

attach("/ASREML/.RData", pos=2)
model2


elbow flux
 SUBNO 25 !A
 TRT 19 !A
 int
 treatment 1 !A aet uet aht uht asc usc ef hf
elbow.asd !skip 1
ef !GAMMA !LOG ~ mu int !r spl(int) SUBNO SUBNO.int

The fit is similar to that with gam() in R.

Figure 7.4: Smooth splines for a Gamma model in ASREML. [Fitted Ŷ against int.]


    Chapter 8

    Longitudinal Data

Repeated measures data are those which constitute a set of repeated measures, over time, from the same sampling unit. Because the sampling is from the same unit, the data contain information from the unit (ie random effect) and the systematic effect of time. In conjunction with this, the measurements are correlated because they all have something in common, the unit itself. Another potential source of correlation is that an observation is influenced by previous observations; this is termed autoregressive correlation.

Modelling of repeated measures data needs to account for the correlations so that the model can best allocate the information between systematic and random effects.

8.1 The data file

Early repeated measures models were cast as multivariate models in order to capture the correlations amongst the data. This idea was superseded and now we analyze repeated measures in a univariate fashion,
\[
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_g \end{pmatrix}
=
\begin{pmatrix}
X_1 & 0 & 0 & 0 \\
0 & X_2 & 0 & 0 \\
0 & 0 & \ddots & 0 \\
0 & 0 & 0 & X_g
\end{pmatrix}
\begin{pmatrix} \tau_1 \\ \tau_2 \\ \vdots \\ \tau_p \end{pmatrix}
+ \epsilon .
\]
We have the time variable as a column of the X_i's,
\[
X_i =
\begin{pmatrix}
1 & 0 & \ldots & 1 & \ldots & t_{i,1} \\
1 & 0 & \ldots & 1 & \ldots & t_{i,2} \\
\vdots & \vdots & & \vdots & & \vdots \\
1 & 0 & \ldots & 1 & \ldots & t_{i,n_i}
\end{pmatrix}
\]
The important difference between this model and simple linear models is that the elements of ε are not assumed to be independent or to have constant variance.


An example of a data file on carbon from Tillage and Rotation treatments, measured at 0, 3, 4, 6, 12 months is:-

 Plot Sample Rotation Tillage Block Time   CP
  P1    S1      R1      T1     B1    0   1.53
  P2    S2      R1      T1     B1    0   1.47
  P3    S3      R1      T1     B1    0   1.29
  ...
  P9    S3      R2      T2     B1    0   2.06
  P10   S1      R1      T1     B1    0   1.4
  P11   S2      R2      T1     B1    0   1.46
  P12   S3      R2      T1     B1    0   1.53
  P1    S1      R1      T1     B1    3   1.21
  P2    S2      R1      T1     B1    3   1.17
  P3    S3      R1      T1     B1    3   1.31
  ...
  P22   S1      R1      T2     B2    3   1.39
  P23   S2      R1      T2     B2    3   1.91
  P24   S3      R1      T2     B2    3   1.42
  P1    S1      R1      T1     B1    4   1.04
  P2    S2      R1      T1     B1    4   1.12
  P3    S3      R1      T1     B1    4   1.07
  ...
  P22   S1      R1      T2     B2    4   1.25
  P23   S2      R1      T2     B2    4   1.21
  P24   S3      R1      T2     B2    4   1.23
  P1    S1      R1      T1     B1    6   1.44
  P2    S2      R1      T1     B1    6   1.39
  P3    S3      R1      T1     B1    6   1.15
  ...
  P10   S1      R2      T1     B1    6   1.23
  P11   S2      R2      T1     B1    6   1.17
  P13   S1      R2      T1     B2    6   1.29
  P14   S2      R2      T1     B2    6   1.14
  ...
  P22   S1      R1      T2     B2   12   1.83
  P23   S2      R1      T2     B2   12   1.70
  P24   S3      R1      T2     B2   12   1.40

In this case all treatments are sampled at the same times 0, 3, 4, 6 and 12 months and the data have been blocked for each sample time. Note that Plot=P12 at Time=6 is missing, so that datum is either deleted or denoted as a missing value NA.
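Detecting such a missing plot-by-time cell in a long-format file is a simple set operation; a sketch using a hypothetical subset of the (Plot, Time) keys:

```python
# hypothetical subset of the (Plot, Time) keys from a long-format file
rows = [("P1", 0), ("P2", 0), ("P1", 3), ("P2", 3),
        ("P1", 4), ("P2", 4), ("P1", 6),            # P2 at Time 6 is missing
        ("P1", 12), ("P2", 12)]

plots = sorted({p for p, _ in rows})
times = sorted({t for _, t in rows})
missing = [(p, t) for p in plots for t in times if (p, t) not in rows]
```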


The next example is measurements of Metabolizable Energy of deer which are born at different times and hence are sampled on different days. There are 25 animals, and the data are blocked as all the repeated measurements for the first animal, followed by the repeated measurements for the second, and so on.

 Tag Strain Sex Age Agewks     ME    Lwt
 204    3    1  227  32.43  123.22  71.50
 204    3    1  234  33.43  125.56  73.00
 204    3    1  241  34.43  122.94  75.00
 204    3    1  248  35.43  123.00  74.50
 204    3    1  255  36.43  117.83  75.00
 204    3    1  262  37.43  107.25  75.50
 204    3    1  269  38.43  108.58  75.00
 ...
 207    3    1  224  32.00  133.05  67.50
 207    3    1  231  33.00  131.76  67.50
 207    3    1  238  34.00  137.45  67.50
 207    3    1  245  35.00  128.62  68.00
 207    3    1  252  36.00  129.41  69.00
 207    3    1  259  37.00  129.24  69.00
 ...
 293    4    2  540  77.14  108.91  96.50
 293    4    2  547  78.14  151.51  98.50
 293    4    2  554  79.14  131.97  98.50
 293    4    2  563  80.43  125.17  99.50
 293    4    2  568  81.14  138.74 101.00
 293    4    2  575  82.14  128.27  96.50
 293    4    2  582  83.14  121.92  99.50


    70 CHAPTER 8. LONGITUDINAL DATA

If there are repeated measurements in two directions, e.g. time and depth, the data file is set up similarly to the above but with both classifying variables, e.g.

Hole Treatment Stratum Depth Calendar    WC
E2B  Control  Beneath   20   07/08/95 19.62
E2B  Control  Beneath   20   07/25/95 18.98
 .      .        .       .       .       .
E2B  Control  Beneath   40   07/08/95 17.64
E2B  Control  Beneath   40   07/25/95 17.74
 .      .        .       .       .       .
E2B  Control  Beneath   60   07/08/95 16.97
E2B  Control  Beneath   60   07/25/95 16.86
 .      .        .       .       .       .
E2B  Control  Beneath   80   07/08/95 17.88
E2B  Control  Beneath   80   07/25/95 17.92
 .      .        .       .       .       .
E2B  Control  Beneath  100   07/08/95 11.80
E2B  Control  Beneath  100   07/25/95 11.50
 .      .        .       .       .       .
E2B  Control  Beneath  120   07/08/95  9.41
E2B  Control  Beneath  120   07/25/95  9.41
 .      .        .       .       .       .
F1B  Deforest Beneath   20   01/27/94 16.00
F1B  Deforest Beneath   20   02/11/94 14.70
 .      .        .       .       .       .
F1B  Deforest Beneath   40   01/27/94 17.30
F1B  Deforest Beneath   40   02/11/94 14.91
 .      .        .       .       .       .
F1B  Deforest Beneath   60   01/27/94 18.54
F1B  Deforest Beneath   60   02/11/94 14.40
 .      .        .       .       .       .
F1B  Deforest Beneath   80   01/27/94 19.82
F1B  Deforest Beneath   80   02/11/94 14.86
 .      .        .       .       .       .
F1B  Deforest Beneath  100   01/27/94 23.69
F1B  Deforest Beneath  100   02/11/94 18.65
 .      .        .       .       .       .
F1B  Deforest Beneath  120   01/27/94 23.86
F1B  Deforest Beneath  120   02/11/94 23.69
 .      .        .       .       .       .
F1B  Deforest Beneath  140   01/27/94 20.10
F1B  Deforest Beneath  140   02/11/94 21.63

The data are unbalanced in that sampling times differ between sites and the numbers of depths differ between treatments. The imbalance does not complicate the analysis.


8.3 Statistical model for repeated measures

The correlations and changing variances which are characteristic of longitudinal data are often induced by the units such as animals or plots. This is because over time, observations within a unit may be more alike than those of the same treatment but sampled from another unit. Hence the random component is affected by sampling units.

    The basic model is

\[
y = \text{treatment} + \text{time} + \text{treatment{:}time} + \text{ERROR}, \qquad \text{ERROR} \sim \text{some distribution}
\]

Note how general this is: the error model is not restricted to a simplistic (and unrealistic) i.i.d. structure. Momentarily we regard both treatment and time as factors for which we need to construct contrasts to get a more specific model. Also, we have recognised that the random structure may contain a number of model terms.

    The model may then contain these components,

\[
y = \text{treatment} + \text{time} + \text{treatment{:}time} + \text{unit} + \text{unit{:}time} + \text{error}
\]
\[
\text{unit} \sim N(0, \sigma^2_u J_1), \qquad
\text{unit{:}time} \sim N(0, \sigma^2_{ut} J_2), \qquad
\text{error} \sim N(0, \sigma^2 R)
\]

The symbols J_1, J_2 and R denote matrices which give the weights to be applied to the variance components for each part of the model.

That is, the model for the random components may include:
(i) measurement error,
(ii) components due to random effects,
(iii) serial correlation.

8.3.1 Random components

A line plot of the response over time for each unit can sometimes tell us about the random component. If the plot reveals basically parallel lines but with different intercepts, the appropriate model regards the intercept as a random effect. In this case, the correlations amongst the repeated measures are uniform.

The more usual case is that the lines fan out because the individuals change differently over time. In that case, the interaction of the individual and the systematic time effect would be fitted as a random effect, and this representation leads to a covariance structure that is dependent upon the differences between sampling times.
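These two cases can be seen from the model itself. As a sketch, write the response for unit i at time t_j as y_ij = μ_ij + u_i + b_i t_j + e_ij with u_i ∼ N(0, σ²_u), b_i ∼ N(0, σ²_ut) and e_ij ∼ N(0, σ²) mutually independent. Then

```latex
\operatorname{cov}(y_{ij}, y_{ik}) = \sigma^2_u + \sigma^2_{ut}\, t_j t_k + \sigma^2 \delta_{jk}
```

With σ²_ut = 0 (random intercepts only) the correlation between any two repeated measures is the constant σ²_u/(σ²_u + σ²), the uniform case; with σ²_ut > 0 the covariance depends on the sampling times, which matches the fanning-out pattern.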

The following 3 examples are (i) Ca intake, (ii) egg weight measured on the same birds and (iii) the ME of deer. The Ca intake data show a fan out over time suggesting an animal × time interaction, and the egg weight data suggest that differences are primarily due to animals only, i.e. a random intercept model. The deer data show a fanning out at the maximum.


xyplot(Caint ~ weekno | Ca*feed, type="l", data=Caint,
       panel=panel.superpose, groups=an)

[Figure: Caint plotted against weekno, one panel for each Ca × feed combination, with a line trace for each animal.]

Traces display fanning out, suggesting a random animal × time effect.


xyplot(ME ~ Agewks | Sex*Strain, data=deerdf, type="l",
       panel=panel.superpose, groups=Tag)

[Figure: ME plotted against Agewks, one panel for each Sex × Strain combination, with a line trace for each animal (Tag).]

Separation at the maximum is bigger than at the start, and the traces close up again.


xyplot(ewt ~ weekno | Ca*feed, data=ewt,
       panel=function(x,y){ panel.xyplot(x,y); panel.loess(x,y,span=1/4) })

[Figure: egg weight (ewt) plotted against weekno, one panel for each Ca × feed combination, with points and a loess smooth.]


xyplot(ME ~ Agewks | Sex*Strain, data=deerdf,
       panel=function(x,y){ panel.xyplot(x,y); panel.loess(x,y,span=1/4) })

[Figure: ME plotted against Agewks, one panel for each Sex × Strain combination, with points and a loess smooth.]


8.4 Linear Mixed Models for Repeated Measures

Example 8.1

The example shown in Figure 8.1 is the body weights of rats published by Crowder and Hand ([ ?? ]); the data set is included in the nlme library of R, see [7]. The essential R code loads the nlme, grid and lattice libraries and plots the traces of each rat with a separate frame for each Diet.

library(nlme); library(grid); library(lattice)
data(BodyWeight)

    xyp2


8.5 A statistical model for repeated measures

Although the rat bodyweights data are simplistic, they suffice to identify the essential ingredients of a model and the steps required to fit the model to the data.

In following a unit through time, the response will be a combination of these effects:
(i) treatment,
(ii) time,
(iii) treatment × time,
(iv) unit,
(v) unit × time.

The remaining component is the residuals, which may not be independent if an observation at say time t_i is influenced by the outcome at a previous sampling, say t_{i-1}. This occurs when there is feedback. Figure ?? shows the responses of subjects' elbow flux (blood flow through the elbow) over 15 minutes. Peaks are followed by troughs; the saw-toothing is measurement error and successive observations are negatively correlated. If we identify this correlation, the resultant fit will be a smooth track through the observations, but otherwise the model will struggle to interpret the saw-toothing as systematic information if residuals are assumed independent.
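One way to represent such serial correlation in lme() is through a residual correlation structure. The sketch below builds a first-order autoregressive structure with a negative parameter, which gives exactly this alternation of sign between neighbouring residuals; the model call in the trailing comment uses hypothetical data and variable names.

```r
library(nlme)

# An AR(1) residual correlation with a negative parameter: adjacent
# observations are negatively correlated, with the correlation
# alternating in sign and decaying geometrically with lag.
cs <- Initialize(corAR1(value = -0.5, form = ~ 1 | subject),
                 data = data.frame(subject = rep("s1", 4)))
cm <- corMatrix(cs)
if (is.list(cm)) cm <- cm[[1]]   # corMatrix may return one matrix per group
round(cm, 2)

# In a model fit this would enter as, e.g. (names hypothetical):
# lme(flux ~ minute, random = ~ 1 | subject, data = elbow,
#     correlation = corAR1(form = ~ 1 | subject))
```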

The linear response for each rat is found conveniently in nlme by the lmList() function, and these individual fits may help in explaining random effects when the model is fitted. Confidence intervals for these estimates are derived using intervals().

    bwlist1


16        503      0.905

Degrees of freedom: 176 total; 144 residual
Residual standard error: 4.44

We fit the model (??) to these data with the lme() function. In the next sequence of code, 2 models are fitted. The purpose of the first is to get the AOV and the second is to get the regression coefficients in a convenient form.

    mod1


12   99.806   0.1730
13  -40.910   0.2406
15   19.478  -0.1524
14   21.131  -0.3182
16    0.300   0.2300

    Example 8.2

These data are chick weights and are akin to the rats data but with more features. They are also included as part of the nlme package, Pinheiro and Bates [7]. In the following R code, we load the library nlme to get the data and to use the lme() function. Plotting is done using the xyplot() function, which is part of the lattice library. Many of these plotting functions also rely on functions from the grid library, so we load it as well.

    library(nlme);library(grid);library(lattice)

data(ChickWeight); print(names(ChickWeight))

    xyp2


Figure 8.4: Profiles of ME for 2 strains and M, F deer.

[Panels MRed, FRed, MPD and FPD: ME plotted against Agewks.]


Figure 8.5: Fitted profiles and observed ME for 2 strains and M, F deer.

[Panels FPD, MPD, FRed and MRed: observed ME and fitted profiles plotted against Agewks.]


    Chapter 9

    Generalised Linear Mixed Models

The content of this chapter is drawn from [2]. The ideas of a linear mixed model extend to regression models for discrete and non-Gaussian continuous data. In the linear model, a random effects model is appropriate if the coefficients can be thought of as a sample from a distribution. There must be repeat samples from each experiment unit to define a random effects model.

In the non-Gaussian setting it is assumed that the data for an experiment unit are independent observations following a GLM, but regression coefficients can vary from unit to unit according to a distribution. So the assumptions are:

1. The conditional distribution of Y_ij given u_i follows a distribution from the exponential family with density f(y_ij | u_i, φ),

\[
f(y_{ij} \mid u_i) = \exp\left[ \{ y_{ij}\theta_{ij} - \psi(\theta_{ij}) \}/\phi + c(y_{ij}; \phi) \right]
\]
\[
\mu_{ij} = E(Y_{ij} \mid u_i) = \psi'(\theta_{ij}) = g^{-1}(x_{ij}'\beta + z_{ij}'u_i) \tag{9.1}
\]
\[
v_{ij} = \operatorname{var}(Y_{ij} \mid u_i) = \phi\,\psi''(\theta_{ij}) = \phi\, v(\mu_{ij}) \tag{9.2}
\]

where g(·) is the link function (g^{-1}(·) is the inverse of the link function) and v(·) is the variance function. The vectors x_ij and z_ij have dimensions p and q respectively.

2. Given u_i, the repeated measurements Y_{i1}, ..., Y_{i n_i} are independent.

3. The u_i are independent and identically distributed with density function f(u_i; G). Commonly, f(u_i; G) is a normal distribution with zero mean and variance matrix G.

Correlation amongst observations from a unit, Y_{i1}, ..., Y_{i n_i}, arises from their sharing unobservable variables u_i. The random effect model is most useful when the objective is to make inference about individuals rather than population averages.
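Assumptions 2 and 3 together determine the likelihood: conditional on u_i the repeated measures multiply, and the unobserved u_i is integrated out, so

```latex
L(\beta, G) = \prod_{i=1}^{m} \int \left[ \prod_{j=1}^{n_i} f(y_{ij} \mid u_i) \right] f(u_i; G)\, du_i
```

For non-Gaussian responses this integral generally has no closed form, which motivates the approximate methods discussed next.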


For a fixed G, updated values of β and U are obtained by iteratively solving

\[
\hat{\beta} = \left( \sum_{i=1}^{m} X_i' V_i^{-1} X_i \right)^{-1} \sum_{i=1}^{m} X_i' V_i^{-1} Y_i^* \tag{9.5}
\]
\[
\hat{U}_i = G Z_i' V_i^{-1} (Y_i^* - X_i \hat{\beta}) \tag{9.6}
\]

These are the mixed model equations applied to the transformed (working) variable Y^*. From equation (9.4),

\[
\hat{G} = m^{-1} \sum_{i=1}^{m} E(U_i U_i' \mid y_i) \tag{9.7}
\]
\[
\phantom{\hat{G}} = m^{-1} \sum_{i=1}^{m} E(U_i \mid y_i)\, E(U_i \mid y_i)' + m^{-1} \sum_{i=1}^{m} \operatorname{var}(U_i \mid y_i) \tag{9.8}
\]

Plug-in values are

\[
E(U_i \mid y_i) = \hat{U}_i, \qquad \operatorname{var}(U_i \mid y_i) = (Z_i' Q_i^{-1} Z_i + G^{-1})^{-1}
\]

The approximation gives reasonable estimates of β, but the approximation is not reliable if there are few observations and the density of the transformed random variable is far from normal.

9.2 Penalised Quasi-likelihood methods

An implementation of the above technique is the glmmPQL function in the MASS library. This is a wrapper that implements linearization within the lme() function.

The data in the file AlmondFlowers.txt are the frequencies of flower stages from 162 trees on August 24th, 2006. The data were determined from photographs and the purpose was to select late-flowering trees that would minimise frost damage. The first 10 trees are listed in Table 9.1.

The flower stages are ordered categories which can be analysed as a proportional odds logistic regression. This requires that we calculate the cumulative counts across the ordered categories. The trees are random samples, so they enter the model as a random effect.

\[
\operatorname{logit}(p) = X\beta + ZU
\]

where p are the cumulative probabilities, X is the design matrix for categories and β are the category effects, Z is the indicator matrix for trees and U are the random tree effects.
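As a worked illustration of the cumulative counts, take the first tree (1at02) in Table 9.1. Accumulating the frequencies across the ordered stages Bud, PtlTip, HalfOpen, Anth, X1stPtlFall, PostAnth, OvarySwell gives

```latex
0,\quad 0,\quad 0+2 = 2,\quad 2+84 = 86,\quad 86+18 = 104,\quad 104+29 = 133,\quad 133+0 = 133
```

out of the total n = 133. Each cumulative count (other than the last, which necessarily equals the total and so carries no information) is then treated as a binomial response whose probability is modelled on the logit scale as above.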

The following R program reads the data, transforms to cumulative counts, and reshapes the data into the long format prior to doing the analysis using glmmPQL.


    Table 9.1: Frequencies of Flower stages at 1 sampling

No  TreeTag Date Bud PtlTip HalfOpen Anth X1stPtlFall PostAnth OvarySwell Total
 1  1at02    1     0    0      2      84      18         29        0       133
 2  1at03    1     0    0      0      45      17         87        0       149
 3  1at04    1     3   12      2      67       2          1        0        87
 4  1at05    1     4   11      0     118      11          9        0       153
 5  1at09    1    68   53     25      33       1          0        0       180
 6  1at12    1    49   49     20      43       0          0        0       161
 7  1at16    1     1    1      1      80       8          1        0        92
 8  1at18    1    24   55     32     161       0          0        0       272
 9  1at19    1    52   57     62     117       0          0        0       288
10  1at20    1     1    1      2      82      15         40        0       141

#_____________ AlmondFlowers.R ________________
options(digits=2)
library(MASS); library(nlme)

    FlT


plot(tree.distribution$y ~ tree.distribution$x,
     main="distribution of random tree effects", type="l",
     xlab="Tree effect on logistic scale", las=1,
     ylab="distribution function")

rug(side=1, x=Tree.effects$effect)

Figure 9.1: Distribution of tree effects for flower stages

[Plot: distribution function of the random tree effects on the logistic scale, with a rug of the estimated tree effects.]

Negative tree effects are associated with late flowering. From the model we observe that when logit(p) is reduced, this is a shift to the left of the distribution function, and a shift to the left is towards the early stages of flowering. That is, on August 26th, those trees found to have negative tree effects will not be highly represented in the later flowering stages; they have not developed too early.

(The final analysis of these data used the bayespolr function in arm, but the model given here serves as an example.)


9.3 GEE

Estimators of φ and α.

Pearson residuals are given by

\[
r_{it} = \{ y_{it} - \hat{\mu}_{it} \} / \{ v(\hat{\mu}_{it}) \}^{1/2}
\]

where \hat{\mu}_{it} depends upon the current value for β. The scale parameter is estimated by

\[
\hat{\phi}^{-1} = \sum_{i=1}^{m} \sum_{t=1}^{n_i} r_{it}^2 / (N - p)
\qquad (N = \sum_i n_i,\; p = \text{regression df}).
\]

An empirical estimate of R is

\[
\hat{R}_{uv} = \sum_{i=1}^{m} r_{iu} r_{iv} / (N - p)
\]

Suppose that we wish to model the correlation structure as a function of the parameter α and the distance between observations,

\[
\operatorname{corr}(y_{it}, y_{it'}) = \alpha^{|t - t'|}.
\]

Since E(r_it r_it') ≈ α^{|t−t'|}, log α can be estimated as the slope in the regression of log(r_it r_it') on |t − t'|.

Hence we see the algorithm iterates between estimating (β | α, φ) and (φ, α | β).
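For example, with n_i = 4 equally spaced sampling times, the working correlation corr(y_it, y_it') = α^{|t−t'|} is

```latex
R(\alpha) =
\begin{pmatrix}
1 & \alpha & \alpha^2 & \alpha^3 \\
\alpha & 1 & \alpha & \alpha^2 \\
\alpha^2 & \alpha & 1 & \alpha \\
\alpha^3 & \alpha^2 & \alpha & 1
\end{pmatrix}
```

so a single parameter α describes how the correlation decays with the separation between sampling times.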


Example

The data in PastureSpecies.txt contain percentages of 6 groups of pastures measured at 7 times from 3 farming systems (A, B, C). Within each farm type there are several paddocks, with a Farm × Paddock combination being an experimental unit. These are analysed by a binomial GEE using geeglm in the geepack package.

Table 9.2: First 12 rows of PastureSpecies.txt

    Farm Padd Days Group        Frequency ExpUnit Total fDays
 1  B    1      0  SowFertResp         54 B.1       100     0
 2  B    1      0  NatCoolPeren         4 B.1       100     0
 3  B    1      0  NatWarmPeren         0 B.1       100     0
 4  B    1      0  YrlGrnPer           26 B.1       100     0
 5  B    1      0  WarmAnn              1 B.1       100     0
 6  B    1      0  CoolAnn              9 B.1       100     0
 7  B    1    256  SowFertResp         42 B.1       100   256
 8  B    1    256  NatCoolPeren        21 B.1       100   256
 9  B    1    256  NatWarmPeren         3 B.1       100   256
10  B    1    256  YrlGrnPer           23 B.1       100   256
11  B    1    256  WarmAnn              0 B.1       100   256
12  B    1    256  CoolAnn              8 B.1       100   256

The analysis accounts for 2 sources of correlation:
(i) correlation amongst groups, and
(ii) correlation amongst repeated measures.

Define R_g as the amongst-groups correlation matrix and R_t as the correlation due to repeated measures from the same experiment unit. These are estimated separately and the working correlation is the Kronecker product R_B = R_t ⊗ R_g. The program below does the job in 3 stages:

1. Run a GLM on the univariate frequency of each group at each sampling and save the residuals. For each sampling, the residuals are saved in a matrix with 6 columns (1 for each group) which is used to calculate the correlation matrix. There are 7 such correlation matrices, one for each sampling, and in this exercise the average of the 7 is used as the estimate of R_g. This may not be a suitable choice as the correlation could change with time. This working correlation matrix is named workcorr1.

2. Using workcorr1, a GEE analysis is done of the multivariate frequencies from the 6 groups at each sampling. Residuals are saved and used to construct the working correlation for amongst samplings, R_t, which is named workcorr2 in the program.

3. The big working correlation matrix, R_B, is calculated as above. ( WORKCORR
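Stage 3 can be sketched with small illustrative matrices; in the actual program workcorr1 and workcorr2 are estimated from the residuals of stages 1 and 2, and the 2 × 2 matrices below merely stand in for those estimates.

```r
# Sketch of stage 3: the big working correlation as the Kronecker product
# of the repeated-measures correlation (R_t) and the amongst-groups
# correlation (R_g).
workcorr2 <- matrix(c(1, 0.4, 0.4, 1), 2, 2)      # R_t: 2 samplings
workcorr1 <- matrix(c(1, -0.2, -0.2, 1), 2, 2)    # R_g: 2 groups
WORKCORR  <- kronecker(workcorr2, workcorr1)      # R_B = R_t (x) R_g
dim(WORKCORR)                                     # 4 x 4
```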


    Bibliography

[1] D.R. Cox and N. Reid. Parameter orthogonality and approximate conditional inference. J. Royal Statist. Soc. B, 49:1–39, 1987.

[2] P. Diggle, K-Y Liang, and S. Zeger. Analysis of Longitudinal Data. Oxford University Press, 1994.

[3] Andrew Gelman and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.

[4] A. Gilmour, B. Gogel, B. Cullis, S. Welham, and R. Thompson. ASReml user guide. Technical report, VSN International Ltd, Hemel Hempstead, UK, 2002.

[5] K-Y Liang and S. Zeger. Longitudinal data analysis using generalized linear models. Biometrika, 73:13–22, 1986.

[6] H.D. Patterson and R. Thompson. Recovery of inter-block information when block sizes are unequal. Biometrika, 58:545–554, 1971.

[7] J. Pinheiro and D. Bates. Mixed Effects Models in S and S-PLUS. Springer, NY, 2000.

[8] G. Robinson. That BLUP is a good thing: the estimation of random effects. Statistical Science, 6:15–51, 1991.

[9] David Ruppert, Matt Wand, and Ray Carroll. Semiparametric Regression. Cambridge University Press, 2004.

[10] R. Schall. Estimation in generalized linear models with random effects. Biometrika, 78:719–727, 1991.

[11] T. Speed. Comment on Robinson (1991). Statistical Science, 6:43, 1991.

[12] W.N. Venables and B.D. Ripley. Modern Applied Statistics with S. Springer, 2004.

[13] A. Verbyla. A conditional derivation of residual maximum likelihood. Australian Journal of Statistics, 32:227–230, 1990.



    Chapter 10

    Appendix - Linear least squares

10.1 Least squares in matrix form

The linear regression relationship between a dependent variable Y and a set of independent (explanatory) variables x_j, j = 1 ... p can be expressed in matrix form as

\[
Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I) \tag{10.1}
\]

where Y is a vector of length n, X is a matrix of dimensions n × p and β is a vector of length p. For example, for a straight line E(y_i) = β_1 + β_2 x_i, where E represents expected value. In matrix form this becomes

\[
\begin{pmatrix} E(y_1) \\ \vdots \\ E(y_n) \end{pmatrix}
=
\begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
\]

    The residual sum of squares (SS) is given by

\[
S_r = (Y - X\beta)^T (Y - X\beta). \tag{10.2}
\]

The least squares estimate of β (the vector of model parameters) i