8/8/2019 A Rag Aw 2
CHAPTER 1. THE LINEAR MIXED MODEL AND THE RANDOMIZED BLOCK DESIGN

A basic linear model has only one random effect, the experiment error. In the above dataset, Subjects are a random sample from a population and are random effects, so the model has two random effects: subjects and error. The terminology mixed models is used when there are models for the fixed effects and more than one random effect, but the generic term linear model is consistent for models where the fixed and random components are additive.
For the vector of responses for Subject i,

\[
y_i = X_i \beta + Z_i b_i + \epsilon_i, \qquad b_i \sim N(0, \sigma_b^2), \quad \epsilon_i \sim N(0, \sigma^2 I), \quad \mathrm{cov}(b, \epsilon) = 0 \tag{1.1}
\]

where

\[
X_i = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \qquad
Z_i = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \tag{1.2}
\]
The matrix Z_i is a column of ones indicating subject i. Over the whole data set,

\[
\underbrace{\begin{pmatrix} y_1 \\ \vdots \\ y_9 \end{pmatrix}}_{36 \times 1}
=
\underbrace{\begin{pmatrix} X_1 \\ \vdots \\ X_9 \end{pmatrix}}_{36 \times 4}
\underbrace{\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}}_{4 \times 1}
+
\underbrace{\begin{pmatrix} Z_1 & & \\ & \ddots & \\ & & Z_9 \end{pmatrix}}_{36 \times 9}
\underbrace{\begin{pmatrix} b_1 \\ \vdots \\ b_9 \end{pmatrix}}_{9 \times 1}
+
\underbrace{\begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_{36} \end{pmatrix}}_{36 \times 1}
\]
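As a sketch of the dimensions involved (illustrative only; the layout of 9 subjects each measured on 4 stool types follows the text, but the variable names are not from it), the stacked design matrices can be assembled with Kronecker products:

```python
import numpy as np

n_subj, n_trt = 9, 4

# Per-subject fixed-effects design: intercept plus treatment contrasts for stools 2-4
Xi = np.column_stack([np.ones(n_trt), np.eye(n_trt)[:, 1:]])      # 4 x 4

X = np.kron(np.ones((n_subj, 1)), Xi)                             # X_1,...,X_9 stacked: 36 x 4
Z = np.kron(np.eye(n_subj), np.ones((n_trt, 1)))                  # block diagonal of Z_i: 36 x 9

print(X.shape, Z.shape)  # (36, 4) (36, 9)
```

Each row of Z contains a single 1 picking out the subject to which the observation belongs.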
The contrasts used in equation (1.2) are treatment contrasts, leading to the following interpretations of the parameters:

β1  mean of stool 1
β2  effect of stool 2 compared to stool 1
β3  effect of stool 3 compared to stool 1
β4  effect of stool 4 compared to stool 1
Other contrasts are simply different ways of spanning the parameter space, and one set of linearly independent contrasts can be transformed into another.
The analysis is done via the lme() function, and so continuing on from the previous code,

mmodel <- lme(effort ~ Type, random = ~1 | Subject)

Note that the fits from this model and from the fixed-effects model

lm(effort ~ Type + Subject)

would differ. The usual caveats apply; accept the model if the assumptions are not contravened. To finish off this job, you would need to check the residuals,
plot(mmodel)
A newer version of lme is lmer, which is contained in the package lme4. This is widely used in the text Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill [3]. They augment lme4 with other functions contained in the package arm, which is a companion to the textbook. An example of these is the function display.
library(lme4)
mmodel2 <- lmer(effort ~ Type + (1 | Subject))
Figure 1.2: Worker productivity on machines (dotplot of score, roughly 45 to 70, by Workers 1 to 6)

Figure 1.3: Worker by Machine interaction (mean of score against Machine A, B, C, one profile per Worker)
machine2
Note that the columns of Z_1 span the space of X_1, but the strategy of REML estimation does not have these matrices in conflict, because the analysis projects the data into the orthogonal spaces of (i) treatment contrasts and (ii) error contrasts by

\[
\begin{pmatrix} L_1^T \\ L_2^T \end{pmatrix} y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\]

where L_1, L_2 satisfy L_1^T X = X_p and L_2^T X = 0_{(n-p)}; for example L_1 = X(X^T X)^{-1} X^T and L_2 = I - X(X^T X)^{-1} X^T. Then

\[
\ell(y) = \underbrace{\ell(y_1)}_{\text{fixed}} + \underbrace{\ell(y_2)}_{\text{random}}
\]

We pursue this theory (see Verbyla (1990) [13]) in detail later on.
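The projection can be checked numerically. The sketch below (illustrative, using a small random X rather than the stool design) takes L1 = X(XᵀX)⁻¹Xᵀ and L2 = I − X(XᵀX)⁻¹Xᵀ as in the text and verifies that the error contrasts annihilate X:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = rng.standard_normal(n)

L1 = X @ np.linalg.inv(X.T @ X) @ X.T  # projector onto the column space of X
L2 = np.eye(n) - L1                    # projector onto the error contrasts

y1, y2 = L1 @ y, L2 @ y

assert np.allclose(L2 @ X, 0)          # L2^T X = 0: no fixed-effect information in y2
assert np.allclose(y1 + y2, y)         # the two pieces reassemble the data
assert abs(y1 @ y2) < 1e-10            # and they are orthogonal
```

Because L2 X = 0, the likelihood of y2 depends only on the variance parameters, which is exactly the separation REML exploits.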
Chapter 2
Balanced incomplete blocks
Incomplete blocks comprise a set of designs where it is not possible to allocate every treatment in each block. This may arise if there is insufficient homogeneous material to which the treatments are to be applied. A randomized block whose blocks are known to be heterogeneous will lead to an over-inflated experiment error, which of course reduces the power of the design.
1. In taste testing, the palate fatigues, so it is advisable to restrict the number of samples. If 10 treatments need to be compared and each taster (block) is reliable for only 5, then the treatments need to be allocated so that all 10 treatments can be compared. Were we to test only 5 at a time, we would incur the extra variance due to different panels.
2. A physiotherapist is researching the use of the web for remote treatment. The experiment requires that practitioners rate 15 conditions by video. A randomized complete block would require each practitioner to examine 15 videos, each of about 1 hour duration. The fatigue factor suggests extra variance due to the order of the assessment (ie. first or last etc.), and the possibility of raters not completing their assessments puts the viability of the experiment at risk. The problem is averted with an incomplete block design.
3. In a factory experiment, different factorial combinations are to be trialled. But as the experiment progresses, the environment is likely to change as the factory warms up, so that the treatments are measured under different ambient conditions. To get fair comparisons, block size should be restricted so that treatments are measured under homogeneous conditions.
4. In a 20 team competition, there is not enough time for each team to play the others twice in a season. After round 1, the competition draw becomes an incomplete block experiment, which allows favourable comparison of points.
From (10.6), var(\hat\beta) = \sigma^2 (X^T X)^{-1}. The optimal variance of the parameters is a balance between the degrees of freedom for the design matrix X and the variance, \sigma^2. A large RCB may reduce |(X^T X)^{-1}|, but at the expense of \sigma^2. Small block sizes will ameliorate this, and the right balance is sought. This balance may be influenced as much by practical experimental reasons as by the mathematics.
In chapter 1, it was mentioned that in a RCB (ie. each treatment occurs the same number of times across blocks), blocks and treatments are orthogonal, so blocks could be fitted as fixed effects and Total SS = Block SS + Treat SS. However, the mixed model and REML estimation do not demand balance, because the model has cov(\epsilon, b) = 0 and the estimation is via a fixed model and a random model which are orthogonal. Hence the balance restriction is freed by REML, and since incomplete blocks do not have each treatment occurring in each block an equal number of times, the convenient analysis is via the mixed model.
A major class of nonorthogonal designs is that of balanced incomplete block designs. As the word incomplete suggests, not all treatments occur in each block. The balance referred to is a general balance (i.e. it refers to such matters as the number of times each pair of treatments occurs together in a block) rather than meaning all treatments occur together in all blocks.
2.1 Balanced incomplete blocks

When systematic differences exist between units, blocking is often used as a device to improve precision. The blocks are formed from groups of similar units, and if all the treatments under investigation are applied randomly within each block, then it is possible to make a fair comparison between the treatments. Unfortunately the blocks may not always contain enough units to accommodate all the treatments, and an alternative solution to Randomised Blocks is needed if the number of treatments is not to be reduced. If all treatment comparisons are equally important then the most satisfactory design is Balanced Incomplete Blocks (BIB).
The BIB design has three main properties
(a) all blocks have the same number of units,
(b) all treatments are equally replicated,
(c) all treatment pairs occur in the same block equally often over the whole design.
It is convenient to introduce a standard notation to describe the features of a BIB design,and these parameters are now widely accepted.
b   number of blocks
v   number of treatments (sometimes t)
k   number of units in a block
r   number of replicates for each treatment
n   total number of units
λ   concurrence
The concurrence λ is the number of times two treatments occur together in the same block. The properties of a BIB design may therefore be stated in the alternative form that v > k and k, r, λ are constant over all blocks and treatments.
Example 2.1
A typical example of a BIB design is given below.
        Blocks
units   1  2  3  4  5  6  7  8  9  10
(i)     1  1  1  1  1  1  2  2  2  3
(ii)    2  2  2  3  3  4  3  3  4  4
(iii)   3  4  5  4  5  5  4  5  5  5
Observe that treatments are conventionally described by numbers. The above plan is in its basic form, before randomisation. In practice, the experimenter should allocate the numbers randomly to the treatments, randomise the block order and the assignment of the selected treatments to the units within each block. In the example given it is seen that b = 10, v = 5, k = 3, r = 6, n = 30, λ = 3.
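These parameter values can be verified directly from the plan. A small check (the blocks are transcribed from the table above; the counting identity vr = bk and the BIB identity λ(v − 1) = r(k − 1) are standard):

```python
from itertools import combinations
from collections import Counter

# Blocks of the Example 2.1 plan (one tuple per column of the table)
blocks = [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5),
          (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)]

b = len(blocks)
v = len({t for blk in blocks for t in blk})
k = len(blocks[0])
n = b * k
r = Counter(t for blk in blocks for t in blk)[1]   # replicates of treatment 1

# Concurrence: how often each pair of treatments appears together in a block
pairs = Counter(p for blk in blocks for p in combinations(sorted(blk), 2))
lam = pairs[(1, 2)]

print(b, v, k, r, n, lam)               # 10 5 3 6 30 3
assert v * r == b * k                   # counting the units two ways
assert lam * (v - 1) == r * (k - 1)     # standard BIB identity
assert len(set(pairs.values())) == 1    # every pair concurs equally often
```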
If the rows are considered as blocks, the design is a BIB with b = v = 7, k = r = 4 and λ = 2. Notice however that each column contains each treatment exactly once, so this type of design is suitable when blocking is required in two directions. The double blocking is reminiscent of the properties of a Latin Square, and this type of design is called a Youden Rectangle. An alternative description for Youden Rectangles is Incomplete Latin Squares, and a method of generating them is to omit certain columns of particular Latin Square designs. It is not always possible to find a Youden Rectangle of a prescribed specification, but it is always possible to find a Youden Rectangle where the number of treatments is one more than the block size. All that is needed is for the last column to be dropped from the appropriate Latin Square and a Youden Rectangle will always result. However it is not so easy if more than two columns need be dropped.
For the particular design in Example ?? the complete Latin Square is formed throughaugmentation with a further Youden rectangle.
1 2 3 6 4 5 7
2 3 4 7 5 6 1
3 4 5 1 6 7 2
4 5 6 2 7 1 3
5 6 7 3 1 2 4
6 7 1 4 2 3 5
7 1 2 5 3 4 6
The advantage of a resolvable BIB is that it can also be analysed as a RCB if the blocks are not significantly different (or if they do not account for sufficient extraneous variation). The extra degrees of freedom can be pooled into a single random term, ie the error, and we may improve the power of the design to detect differences by increasing the error df without inflating the variance. But we may make big inroads into the error by removing block effects.
We can compare 2 designs by the ratio of the variance of a treatment contrast from design 1 to its counterpart from design 2, and this ratio is termed EFFICIENCY. The efficiency of a BIB relative to a RCB is given by,

\[
E = \frac{\mathrm{var}(\hat\tau_i - \hat\tau_j)_{RCB}}{\mathrm{var}(\hat\tau_i - \hat\tau_j)_{BIB}}
  = \frac{2\sigma^2/r}{2k\sigma^2/(\lambda v)}
  = \frac{\lambda v}{rk}
  = \frac{1 - 1/k}{1 - 1/v} \le 1 \tag{2.1}
\]

Certain resolvable designs have arisen to maximise efficiency compared to a RCB:

v = b^2, k = b, is called a square lattice,
v = b(b-1), k = b-1, is called a rectangular lattice,
v = b(b-l), k = (b-l), is called an alpha-design.
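For Example 2.1 the efficiency factor of (2.1) can be evaluated both ways; a quick arithmetic check, under the stated assumption of equal σ² in the two designs:

```python
# Parameters of the Example 2.1 BIB: v treatments, blocks of size k,
# r replicates, concurrence lam.
v, k, r, lam = 5, 3, 6, 3

E = lam * v / (r * k)                # lambda*v / (r*k)
E_alt = (1 - 1 / k) / (1 - 1 / v)    # the equivalent form

print(round(E, 4))                   # 0.8333
assert abs(E - E_alt) < 1e-12 and E <= 1
```

So the BIB pays roughly a 17% variance penalty per contrast relative to a complete block design with the same σ², which is recouped only if the smaller blocks reduce σ².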
2.2 Statistical model for BIB designs

The model is similar to the randomized block design,

\[
y_{ij} = \tau_i + b_j + \epsilon_{ij} \tag{2.2}
\]

but each block contains only a subset of the treatments.

If we treat blocks as fixed effects and remove their effects prior to examining treatments, the information about the treatments comes only from comparisons within blocks and is known as an intra-block analysis. However, we should treat blocks as another random component, because they are random samples from a population, so the random parts of the model (2.2) are modelled by

\[
\text{inter-block,} \quad b_j \sim N(0, \sigma_b^2) \tag{2.3}
\]
\[
\text{intra-block,} \quad \epsilon_{ij} \sim N(0, \sigma^2) \tag{2.4}
\]
\[
E(Y_{ij}) = \tau_i \tag{2.5}
\]
\[
\mathrm{var}(Y_{ij}) = \sigma_b^2 + \sigma^2 \tag{2.6}
\]

Maximum likelihood estimates can be derived from

\[
\ell(y; \tau, \sigma_b^2, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma_b^2 + \sigma^2) - \frac{1}{2}\sum_i \sum_j \frac{(y_{ij} - \tau_i)^2}{\sigma_b^2 + \sigma^2}
\]
2.3 The old way

For balanced incomplete blocks, the appropriate variances can be determined by AOV.

Source   df            MS           E(MS)
Treat    (v-1)         MS(Treat)    \sigma^2 + \frac{kb - b}{v - 1}\sum_i \tau_i^2
Blocks   (b-1)         MS(Blocks)   \sigma^2 + \frac{kb - v}{b - 1}\sigma_b^2
Error    bk-v-b+1      MS(Error)    \sigma^2
Total    (bk-1)

(The Error degrees of freedom are bk - v - b + 1, so that the rows sum to bk - 1.)
Thus for the balanced case, equating expected and observed mean squares leads to estimates of the variance components and better estimates of the variances of treatment means, eg (2.6). This is termed an inter-block analysis.

Each block is intended to be relatively homogeneous, and so intra-block variance should be less than inter-block variance.

If the treatments are not balanced across blocks, the AOV method will not work because we cannot plug in the block size k. REML estimation arose from this situation (Patterson & Thompson, 1971), but rapidly found widespread application in all facets of statistics.
Example 2.2
Below is a set of data whose design is a balanced incomplete block, followed by an old-fashioned analysis.
blocks  treat  response
1       A      20
1       B      23
1       C      16
2       A      22
2       C      19
2       D      17
3       B      25
3       C      20
3       D      21
4       A      16
4       B      20
4       D      14
lmodel <- lm(response ~ treat + blocks)
anova(lmodel)

          Df  Sum of Sq  Mean Sq  F value  Pr(>F)
treat      3  48.2       16.1     13.6     0.0077 **
blocks     3  58.8       19.6     16.6     0.0050 **
Residuals  5   5.9        1.2
If the blocks term was not significant, we would resort to a completely random design. However, it is significant here, and because of non-orthogonality there is some information about the treatments contained in the blocks.

To get estimates of the treatment means, we get a prediction of what each treatment would have been if it had been allocated to each block. This is done by using the linear model. In the above example, the block and treatment effects are given by
BIB.effects

             Estimate  Std.Error  t value  Pr(>|t|)
(Intercept)  19.792    0.8308     23.822   2.428e-06
blocks2       1.625    0.9421      1.725   1.451e-01
blocks3       3.375    0.9421      3.583   1.583e-02
blocks4      -3.000    0.9421     -3.184   2.442e-02
treatB        2.750    0.9421      2.919   3.305e-02
treatC       -3.125    0.9421     -3.317   2.107e-02
treatD       -3.125    0.9421     -3.317   2.107e-02
The predictions of treatment means for each block are shown in Table 2.1. We know that treatment A was not in block 3, but we get an estimate of it as if it did occur there. Likewise treatment B in block 2, treatment C in block 4 and treatment D in block 1 are estimated.
With less effort, we can see that the estimates of the treatment means for A, B, C, D are

\[
\begin{pmatrix} \hat\mu_A \\ \hat\mu_B \\ \hat\mu_C \\ \hat\mu_D \end{pmatrix}
=
\begin{pmatrix}
1 & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & 0 & 0 & 0 \\
1 & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & 1 & 0 & 0 \\
1 & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & 0 & 1 & 0 \\
1 & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} 19.79 \\ 1.625 \\ 3.375 \\ -3 \\ 2.75 \\ -3.125 \\ -3.125 \end{pmatrix}
\]
For this experiment b = 4, k = 3, v = 4, so equating the observed Mean Squares in the AOV to their expected values yields,

\[
\hat\sigma_b^2 = \frac{19.6 - 1.2}{(3 \times 4 - 4)/(4 - 1)} = 6.9
\]
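The moment estimate follows directly from the E(MS) column of the AOV table above; a check of the arithmetic:

```python
# Equate MS(Blocks) = sigma^2 + ((kb - v)/(b - 1)) * sigma_b^2 and solve,
# using the mean squares from the AOV of Example 2.2.
b, k, v = 4, 3, 4
ms_blocks, ms_error = 19.6, 1.2

sigma_b2 = (ms_blocks - ms_error) / ((k * b - v) / (b - 1))
print(round(sigma_b2, 1))  # 6.9
```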
Table 2.1: Estimates of treatment means for each block of a BIB

Block  Treat  Intercept + block eff + treat eff  = treat est  mean
1      A      19.79                              = 19.79
2      A      19.79 + 1.625                      = 21.42
3      A      19.79 + 3.375                      = 23.17
4      A      19.79 - 3                          = 16.79      20.29
1      B      19.79 + 2.75                       = 22.54
2      B      19.79 + 1.625 + 2.75               = 24.17
3      B      19.79 + 3.375 + 2.75               = 25.92
4      B      19.79 - 3 + 2.75                   = 19.54      23.04
1      C      19.79 - 3.125                      = 16.67
2      C      19.79 + 1.625 - 3.125              = 18.29
3      C      19.79 + 3.375 - 3.125              = 20.04
4      C      19.79 - 3 - 3.125                  = 13.67      17.17
1      D      19.79 - 3.125                      = 16.67
2      D      19.79 + 1.625 - 3.125              = 18.29
3      D      19.79 + 3.375 - 3.125              = 20.04
4      D      19.79 - 3 - 3.125                  = 13.67      17.17
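Table 2.1 can be reproduced directly from the coefficient estimates: each treatment mean is the intercept, plus the average of the four block effects (block 1 contributing zero), plus the treatment effect. A check:

```python
# Coefficients from the lm() fit of Example 2.2 (block 1 and treatment A
# are the baseline levels under treatment contrasts).
intercept = 19.792
block_eff = {1: 0.0, 2: 1.625, 3: 3.375, 4: -3.0}
treat_eff = {"A": 0.0, "B": 2.75, "C": -3.125, "D": -3.125}

mean_block = sum(block_eff.values()) / len(block_eff)   # average block effect: 0.5
means = {t: round(intercept + mean_block + e, 2) for t, e in treat_eff.items()}

print(means)  # {'A': 20.29, 'B': 23.04, 'C': 17.17, 'D': 17.17}
```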
2.4 Analysis using a mixed model

By modelling the block effects as nuisance effects orthogonal to the treatment effects, we simplify the analysis.

library(nlme)
blocks
StdDev: 2.6 1.1
Fixed effects: response ~ treat
             Value  Std.Error  DF  t-value
(Intercept)  20.2   1.47       5   13.8
Chapter 3
Split plot designs
3.1 The model

The model for a randomised block design arranged in split plots is

\[
y_{ijk} = \mu + \underbrace{\alpha_i + \gamma_j + \eta_{ij}}_{\text{main plots}} + \underbrace{\beta_k + (\alpha\beta)_{ik} + \epsilon_{ijk}}_{\text{sub plots}}
\]

where \gamma_j is a block effect, \alpha_i is a main treatment effect and \beta_k is a sub-plot treatment effect. It is important to notice that there are two error terms, \eta and \epsilon. Both errors are assumed to be independent, \eta \sim N(0, \sigma_m^2) and \epsilon \sim N(0, \sigma^2), and the blocks are random factors, \gamma_j \sim N(0, \sigma_b^2).
In the following section, b represents the number of blocks and m is the number of main plots. We have from the linear model the following variances:

\[
\mathrm{var}(Y_{ijk}) = \sigma_m^2 + \sigma^2
\]
\[
\mathrm{var}(\bar Y_{i \cdot k}) = \frac{\sigma_m^2 + \sigma^2}{b}
\]
\[
\mathrm{var}(\bar Y_{\cdot \cdot k}) = \frac{\sigma_m^2 + \sigma^2}{bm}
\]
\[
\mathrm{cov}(Y_{ijk}, Y_{ijl}) = \sigma_m^2, \qquad \text{so that} \quad \mathrm{corr}(Y_{ijk}, Y_{ijl}) = \frac{\sigma_m^2}{\sigma^2 + \sigma_m^2}
\]

The model formula for covariance amongst sub-plot observations is all important in designing and interpreting split-plot designs. We see that if the split-plot model holds, there is a positive and constant correlation between sub-plot observations within a main plot. If this does not hold, the model is awry and the design is a dud.
Table 3.1: Expected Mean Squares for a split-plot experiment

Source           df                               E(MS)
Blocks           (b-1)
Main             (m-1)                            \sigma^2 + t\sigma_m^2 + \frac{tb}{m-1}\sum \alpha_i^2
B x M (Error a)  (b-1)(m-1)                       \sigma^2 + t\sigma_m^2
Treat            (t-1)                            \sigma^2 + \frac{mb}{t-1}\sum \beta_k^2
M x T            (m-1)(t-1)                       \sigma^2 + \frac{b}{(m-1)(t-1)}\sum (\alpha\beta)_{ik}^2
(Error b)        (b-1)[(t-1)+(m-1)(t-1)] = m(b-1)(t-1)   \sigma^2
Total            (bmt-1)
The following excerpt is from the Splus manual, ch 12.Split-plots are also encountered because of restricted randomization. For example, an experiment involving oven temperature and baking time will probably not randomize theoven temperature totally; but rather only change the temperature after all of the runs for that temperature have been made. This type of design is often mistakenly analysed as if there were no restrictions on the randomisation.
Example 3.1
Five varieties of spring wheat were sown in a randomised blocks design in four blocks.The soil was treated with three different levels of nitrogen randomly allocated to equal
areas within each plot. The design and yields in tons/ha were as given.
Block 1:
        V2              V5              V1              V4              V3
        N1   N3   N2    N2   N3   N1    N1   N2   N3    N1   N3   N2    N2   N1   N3
        4.6  5.5  5.3   5.0  5.4  4.7   5.5  6.1  6.4   5.0  6.0  5.7   5.5  4.9  5.8

Block 2:
        V1              V3              V2              V5              V4
        N3   N1   N2    N1   N3   N2    N3   N2   N1    N2   N3   N1    N2   N1   N3
        5.8  5.0  5.5   4.9  5.5  5.4   5.4  5.0  4.7   4.6  5.0  4.2   6.2  5.7  6.5

Block 3:
        V5              V1              V3              V2              V4
        N2   N3   N1    N2   N3   N1    N3   N1   N2    N1   N3   N2    N1   N3   N2
        4.8  5.0  4.6   5.4  5.9  5.0   5.5  4.8  4.7   5.0  5.8  5.1   5.3  6.7  5.8

Block 4:
        V2              V3              V4              V1              V5
        N3   N1   N2    N3   N2   N1    N2   N1   N3    N1   N2   N3    N3   N2   N1
        5.9  5.0  5.6   4.8  4.6  4.0   5.1  4.7  5.4   5.2  5.5  5.8   5.2  4.8  4.4
The treatment levels were
V1  Timmo         N1  30 kg/ha
V2  Sicco         N2  60 kg/ha
V3  Sappo         N3  90 kg/ha
V4  Highbury
V5  Maris Dove
The split plot AOV is:-
Error: mainplotDf Sum of Sq Mean Sq F Value Pr(F)
block 3 1.005 0.335 1.047 0.4073variety 4 6.449 1.612 5.040 0.0128Residuals 12 3.839 0.320
Error: WithinDf Sum of Sq Mean Sq F Value Pr(F)
nitrogen 2 6.487 3.244 147.1 0.0000variety:nitrogen 8 0.105 0.013 0.6 0.7756Residuals 30 0.662 0.022
The analysis has detected large differences between Nitrogen levels, and a smaller, but significant, difference between Varieties. The interaction between Nitrogen and Variety is not significant, so the varieties can be assumed to respond similarly to the various levels of Nitrogen. Notice that the Main Plot Error is considerably greater than the Sub Plot Error, which indicates that the use of a split plot design is justified for this experiment, for otherwise the F value for Nitrogen, and its interaction, would have been much lower and differences would not have been detected with the same precision.
3.2 The comparison of treatment means

When significant differences have been detected, it is reasonable to ask where these differences occur. There are many methods available for an examination of treatment differences in comparative experiments, but the standard error of the difference between two treatment means is often helpful for this purpose. Its evaluation depends upon the result that if X and Y are independent random variables from the distributions X \sim N(\mu_1, \sigma_1^2), Y \sim N(\mu_2, \sigma_2^2), and n_1, n_2 independent samples are drawn from these distributions, then

\[
\bar X - \bar Y \sim N\left( \mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right).
\]

If the variances are equal, then the variance of the difference between two treatment means is \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right), and if the levels of replication are both equal to r, the variance of the mean difference between any two treatments is given by \frac{2\sigma^2}{r}.

The numbers of levels of M, T and Blocks are m, t and b, the Error variance for main plots is \sigma_m^2, and for sub plots \sigma^2. It can be shown that the expected values of the Error Mean Squares for main plots and sub plots are \sigma^2 + t\sigma_m^2 and \sigma^2.
The error difference between Main Plot treatments p and q is

\[
\bar\eta_{\cdot p} + \bar\epsilon_{\cdot p \cdot} - \bar\eta_{\cdot q} - \bar\epsilon_{\cdot q \cdot}.
\]

Since each \bar\eta is the mean of b values, each \bar\epsilon is the mean of bt values, and all errors in this expression are independent, the variance of the difference between Main Plot treatment means is

\[
\frac{2}{b}\sigma_m^2 + \frac{2}{bt}\sigma^2 = \frac{2}{bt}\left(\sigma^2 + t\sigma_m^2\right).
\]

The error difference between Sub Plot treatment means r and s is

\[
\bar\epsilon_{\cdot \cdot r} - \bar\epsilon_{\cdot \cdot s}
\]

and the Main Plot errors cancel, since each Sub Plot treatment occurs in every Main Plot. As each \bar\epsilon contains mb independent errors, the variance of the difference between Sub Plot treatment means is thus \frac{2}{mb}\sigma^2.

The interaction is a little more complicated, as the error difference is

\[
\bar\eta_{p \cdot} + \bar\epsilon_{p \cdot r} - \bar\eta_{q \cdot} - \bar\epsilon_{q \cdot s}.
\]

Suppose that two treatment combinations have the same main plot treatment; then the \eta terms cancel, as they are exactly the same error term. The variance of the error difference \bar\epsilon_{p \cdot r} - \bar\epsilon_{p \cdot s} is \frac{2}{b}\sigma^2. However, for different main plots the \eta terms do not cancel, so the variance of the error difference is

\[
\frac{2\sigma_m^2}{b} + \frac{2\sigma^2}{b} = \frac{2}{b}\left(\sigma^2 + \sigma_m^2\right).
\]
Table 3.2: Summary table for variances of treatment differences

Main plot treatments                             \frac{2}{bt}(\sigma^2 + t\sigma_m^2)
Sub plot treatments                              \frac{2}{mb}\sigma^2
Combinations (same Main Plot treatment)          \frac{2}{b}\sigma^2
Combinations (different Main Plot treatments)    \frac{2}{b}(\sigma^2 + \sigma_m^2)
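Plugging the Example 3.1 estimates into Table 3.2 (b = 4 blocks, m = 5 varieties, t = 3 nitrogen levels; σ̂² from Error b, σ̂ₘ² from Error a) reproduces the variances of treatment differences used in the discussion of this example:

```python
# Variances of treatment differences for the split-plot wheat example.
b, m, t = 4, 5, 3
sigma2 = 0.022                        # sub-plot error MS (Error b)
sigma2_m = (0.320 - sigma2) / t       # from Error a MS = sigma^2 + t*sigma_m^2

main      = 2 / (b * t) * (sigma2 + t * sigma2_m)   # main plot treatment means
sub       = 2 / (m * b) * sigma2                    # sub plot treatment means
same_main = 2 / b * sigma2                          # same main plot treatment
diff_main = 2 / b * (sigma2 + sigma2_m)             # different main plot treatments

print(round(main, 3), round(sub, 4), round(same_main, 3), round(diff_main, 5))
# 0.053 0.0022 0.011 0.06067
```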
In the analysis of variance table, the main plot error estimates \sigma^2 + t\sigma_m^2 and the sub plot error estimates \sigma^2. Hence the variance of the difference between T x M means with differing levels of M is

\[
\frac{2}{b}\left(\sigma^2 + \sigma_m^2\right).
\]

It follows therefore that the variance of the T x M treatment difference is

\[
\frac{2}{bt}\left(\text{Main Plot Error} + (t - 1)\,\text{Sub Plot Error}\right).
\]

In the example b = 4, t = 3, m = 5, so \hat\sigma^2 = 0.022 and hence \hat\sigma_m^2 = 0.099. The Variety means are

V1    V2    V3    V4    V5
5.59  5.24  5.03  5.68  4.81

and so 0.053 is the variance of the difference. It should be remembered that this estimate is not very precise, as Varieties were not the main source of interest.

The Nitrogen means are

N1     N2     N3
4.860  5.285  5.665

and the variance of the difference is 0.0022. However, since the Nitrogen levels are evenly spaced, it would seem better to examine the Nitrogen effect for linear and quadratic components.

The interaction is not significant, so does not justify close scrutiny. However, the variance of the difference is 0.01103 for Variety-Nitrogen combinations with the same level of Variety, and 0.06067 otherwise. An inspection of the two-way table of means for Variety and Nitrogen indicates that the response for each factor behaves in a parallel fashion in the presence of the other.
Means table

        V1     V2     V3     V4     V5     all V
N1      5.175  4.825  4.650  5.175  4.475  4.860
N2      5.625  5.250  5.050  5.700  4.800  5.285
N3      5.975  5.650  5.400  6.150  5.150  5.665
all N   5.592  5.242  5.033  5.675  4.808  5.270
One point that emerges from this analysis is the comparison of the error variances. For comparisons of Sub Plot treatments the error variance is \sigma^2, but if the mt treatments had been arranged in b randomised blocks, a much greater error variance could have been expected, since the Main Plot error is 14.5 times as great as the Sub Plot error. Some of the comparisons that can be made, e.g. of Main Plot treatment levels at fixed levels of the Sub Plot treatment, or across different Main Plot and Sub Plot treatment levels, have major inferential problems. These are due to the estimates of variance involving two different Residual Mean Squares with different d.f. and different expectations.
3.3 The mixed model analysis

See [12].

For a vector of observations from block i and mainplot i,j, we write the model as

\[
y_{ij} = X_{ij}\beta + Z_{1,ij} b_i + Z_{2,ij} d_{ij} + \epsilon_{ij} \tag{3.1}
\]
\[
b_i \sim N(0, \psi_1), \qquad d_{ij} \sim N(0, \psi_2), \qquad \epsilon_{ij} \sim N(0, \sigma^2 I)
\]

Furthermore, the random effects are uncorrelated. The design matrix for blocks is Z_1, Z_2 is the design matrix for the block by main treatment interaction, and \psi_2 = \mathrm{diag}(\sigma_m^2).

In the lme() function, we include a random term for blocks and for main plots (variety in this case) within blocks,
Wheat
Hence the estimates of variances are:

\[
\hat\sigma^2 = 0.022
\]
\[
\hat\sigma_m^2 = \frac{0.32 - 0.022}{3} = 0.1
\]

and these estimates (moment estimates) agree with the mixed model estimates.
Chapter 4
Mixed Model Theory
The linear mixed model has the general form,

\[
y = X\beta + Zu + \epsilon \tag{4.1}
\]
\[
\phantom{y} = \left( X \mid Z \right) \begin{pmatrix} \beta \\ u \end{pmatrix} + \epsilon \tag{4.2}
\]
\[
u \sim N(0, D), \qquad \epsilon \sim N(0, R), \qquad E(u\epsilon^T) = 0
\]
4.1 Residual Maximum Likelihood

Temporarily consider model (4.1) as

\[
y = X\beta + \epsilon^*, \qquad \epsilon^* \sim N(0, \sigma^2 V) \tag{4.3}
\]

where

\[
\epsilon^* = Z_1 u_1 + Z_2 u_2 + \ldots + \epsilon
\]
\[
V = I + \sum_{i=1}^{n_u} \gamma_i Z_i Z_i^T, \qquad \gamma_i = \sigma_i^2/\sigma_e^2, \quad E(u_i u_j^T) = 0, \quad E(u_i \epsilon^T) = 0
\]
The purpose of rewriting the model this way is to indicate that the variance components are from the space of error contrasts.

For known Z_i, the variance matrix contains the unknown parameters \gamma_i and the dispersion parameter \sigma^2. In a model with only one random component, the measurement error, it is not necessary to model the random components because they are residuals. With more than one random component, we need to model the sources of the variance.
In mixed models we estimate variance components for
(i) efficiency of fixed effects,
(ii) reliable estimates of fixed effects,
(iii) of interest in themselves, not nuisance parameters.
In a model containing only random effects, the variance components can be estimated by equating observed mean squares to their expected values. In a balanced mixed effects model, the old way was to estimate the fixed effects, subtract them from the data and apply moment estimation to the residuals (Henderson's methods). In cases where there is a lack of balance, matrix multiplication becomes exorbitant, so we turn to likelihood methods.
The normal likelihood function is

\[
L(\beta, \gamma, \sigma \mid y) = (2\pi\sigma^2)^{-\frac{n}{2}} |V|^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2\sigma^2}(y - X\beta)^T V^{-1}(y - X\beta) \right\}
\]

and its logarithm is

\[
\ell(\beta, \gamma, \sigma \mid y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2}\log|V| - \frac{1}{2\sigma^2}(y - X\beta)^T V^{-1}(y - X\beta). \tag{4.4}
\]

Then,

\[
\frac{\partial\ell}{\partial\beta} = 0 \;\Rightarrow\; \hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y \tag{4.5}
\]
\[
\frac{\partial\ell}{\partial\gamma_i} = 0 \;\Rightarrow\; \hat\gamma_i \quad \text{(no closed form)} \tag{4.6}
\]
\[
\frac{\partial\ell}{\partial\sigma^2} = 0 \;\Rightarrow\; \hat\sigma^2 = n^{-1}(y - X\hat\beta)^T \hat V^{-1}(y - X\hat\beta) \tag{4.7}
\]

The equations are solved by the iterative Newton-Raphson algorithm,
\[
\begin{pmatrix} \gamma_1 \\ \vdots \\ \sigma^2 \end{pmatrix}_j
=
\begin{pmatrix} \gamma_1 \\ \vdots \\ \sigma^2 \end{pmatrix}_{j-1}
-
\begin{pmatrix}
\frac{\partial^2\ell}{\partial\gamma_1^2} & \cdots & \frac{\partial^2\ell}{\partial\gamma_1\,\partial\sigma^2} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2\ell}{\partial\sigma^2\,\partial\gamma_1} & \cdots & \frac{\partial^2\ell}{\partial(\sigma^2)^2}
\end{pmatrix}^{-1}_{j-1}
\begin{pmatrix} \frac{\partial\ell}{\partial\gamma_1} \\ \vdots \\ \frac{\partial\ell}{\partial\sigma^2} \end{pmatrix}_{j-1}
\tag{4.8}
\]

that is, \theta_j = \theta_{j-1} + I^{-1}_{j-1} \left.\frac{\partial\ell}{\partial\theta}\right|_{j-1}.

The matrix of second derivatives is called the Hessian, and its negative is the observed information matrix, I. The expectation of the negative of the Hessian is the Fisher information matrix.

In (4.5) and (4.7), we see that an estimate of \beta is required to get an estimate of \sigma^2, which is a double dipping of the information contained in the data. Residual Maximum Likelihood arose from the observation that estimates of variance components in mixed model estimation were often biassed downwards, whereas they were not for the variance component model (ie all random effects).
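The downward bias from this double dipping can be seen in the simplest possible case. The illustration below is not from the text: for i.i.d. normal data with unknown mean (p = 1 fixed effect), the ML variance estimate divides the residual sum of squares by n, while the REML-style estimate divides by n − p:

```python
import random

# ML divides RSS by n and is biased downward because the sample mean has
# already used the data; dividing by n - p (here p = 1) removes the bias.
random.seed(1)
n, reps = 5, 20000
ml_sum = reml_sum = 0.0
for _ in range(reps):
    y = [random.gauss(0, 1) for _ in range(n)]   # true variance is 1
    ybar = sum(y) / n
    rss = sum((v - ybar) ** 2 for v in y)
    ml_sum += rss / n
    reml_sum += rss / (n - 1)

ml_mean, reml_mean = ml_sum / reps, reml_sum / reps
print(round(ml_mean, 2), round(reml_mean, 2))  # near (n-1)/n = 0.8 and near 1.0
```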
Patterson and Thompson [6] subdivided the likelihood into 2 spaces,
(i) the space spanned by treatment contrasts, and
(ii) the space spanned by error contrasts, orthogonal to the treatment contrasts.
The treatment given below is well explained in Verbyla (1990) [13]. For L = (L_1, L_2) such that L_1^T X = I_p and L_2^T X = 0_{(n-p)},

\[
L^T y = \begin{pmatrix} L_1^T \\ L_2^T \end{pmatrix} y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\]

then,

\[
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \beta_p \\ 0_{(n-p)} \end{pmatrix},\ \sigma^2 \begin{pmatrix} L_1^T V L_1 & L_1^T V L_2 \\ L_2^T V L_1 & L_2^T V L_2 \end{pmatrix} \right)
\]

One choice for L_1 (of rank p) is L_1 = X(X^T X)^{-1}, so that L_1^T = (X^T X)^{-1} X^T.

The likelihood can be written as the sum of orthogonal components,

\[
\ell(\beta, \gamma, \sigma \mid y) = \underbrace{\ell_1(\beta \mid y_1)}_{\text{fixed}} + \underbrace{\ell_2(\gamma, \sigma \mid y_2)}_{\text{set of error contrasts}}.
\]

Variance parameter estimates are obtained by maximizing \ell_2, while estimation of \beta is through \ell_1.
The forms of -2\log(\ell_2) and -2\log(\ell) are given below in (4.9) and (4.10),

\[
-2\log(\ell_2) = (n - p)\log(2\pi) - \log|X^T X| + \log|V| + \log|X^T V^{-1} X| + (y - X\hat\beta)^T V^{-1}(y - X\hat\beta) \tag{4.9}
\]
\[
-2\log(\ell) = (n - p)\log(2\pi) + \log|V| + \log|X^T V^{-1} X| + (y - X\hat\beta)^T V^{-1}(y - X\hat\beta) \tag{4.10}
\]

The difference between (4.9) and (4.10) is the term \log|X^T X|, which penalises the likelihood of the nuisance parameters, whose information is in y_2, for the degrees of freedom used in calculating the interest parameters, \beta.
4.2 Properties of REML

(i) For every set of (n - p) linear error contrasts, the REML log-likelihood is the same.

(ii) No matter what (n - p) linearly independent error contrasts are used, maximising their likelihood always leads to the same equations for estimating the variance components \gamma_i and \sigma^2.

(iii) Optimal property. The derivation which shows \ell_2 to be a marginal likelihood for Y_2 = L_2^T y also shows that \ell_2 is the conditional likelihood of y \mid \hat\beta.
(a) For given (\gamma_i, \sigma^2), the estimator \hat\beta is sufficient for \beta, so it is this conditional likelihood which is used to find the most powerful similar tests of hypotheses concerning (\gamma_i, \sigma^2).

(b) The estimators \hat\beta, s^2 are jointly sufficient for \beta, \sigma^2 for any given \gamma, so that most powerful similar tests for \gamma alone may be constructed using the conditional distribution of (y \mid \hat\beta, s^2). Now s^2/\sigma^2 \sim \chi^2_{n-p}, so that the density function of s^2 is

\[
f_1(s^2) = \frac{(s^2)^{\frac{n-p}{2}-1} \exp\left( -\frac{s^2}{2\sigma^2} \right)}{2^{\frac{n-p}{2}}\, \sigma^{n-p}\, \Gamma\!\left( \frac{n-p}{2} \right)}
\]

and the distribution of (y \mid \hat\beta, s^2),

\[
f_2(y \mid \hat\beta, s^2) \propto \exp\left\{ -\tfrac{1}{2}\left( \log|V| + \log|X^T V^{-1} X| + (n - p - 2)\log s^2 \right) \right\},
\]

does not involve \beta, \sigma^2. At \gamma = \gamma_0, V = V(\gamma_0),

\[
\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y, \qquad s^2 = (y - X\hat\beta)^T V^{-1}(y - X\hat\beta).
\]

A most powerful similar test of

\[
H_0: \gamma = \gamma_0 \quad \text{versus} \quad H_1: \gamma = \gamma_1
\]

is \ell_3(\gamma_1) - \ell_3(\gamma_0) \ge K, where K is chosen to give some significance level.

(iv) Unbiassed estimating equations.
\[
\ell = -\frac{1}{2}\left[ n\log(2\pi\sigma^2) + \log|V| + \sigma^{-2}(y - X\beta)^T V^{-1}(y - X\beta) \right]
\]

where V = V(\gamma). The MLEs are derived by equating derivatives to zero,

\[
\frac{\partial\ell}{\partial\beta} = \sigma^{-2} X^T V^{-1}(y - X\beta)
\]
\[
\frac{\partial\ell}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)^T V^{-1}(y - X\beta)
\]
\[
\frac{\partial\ell}{\partial\gamma_i} = -\frac{1}{2}\left[ \frac{\partial\log|V|}{\partial\gamma_i} + \sigma^{-2}(y - X\beta)^T \frac{\partial V^{-1}}{\partial\gamma_i}(y - X\beta) \right]
\]

The expectation of each of these expressions is zero. Moreover, E\left(\frac{\partial\ell}{\partial\beta}\right) = 0 when \hat\sigma^2, \hat\gamma are substituted for \sigma^2, \gamma. Thus \frac{\partial\ell}{\partial\beta} remains an unbiassed estimating equation, giving

\[
\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y.
\]
However, if \hat\beta is substituted for \beta in \frac{\partial\ell}{\partial\sigma^2} and \frac{\partial\ell}{\partial\gamma_i}, the expectation is no longer zero,

\[
E\left[ (y - X\hat\beta)^T V^{-1}(y - X\hat\beta) \right] = (n - p)\sigma^2.
\]

Therefore,

\[
E\left[ \frac{\partial\ell}{\partial\sigma^2} \right] = -\frac{n}{2\sigma^2} + \frac{(n - p)\sigma^2}{2\sigma^4} = -\frac{n}{2\sigma^2} + \frac{n - p}{2\sigma^2}.
\]

Hence,

\[
\frac{\partial\ell}{\partial\sigma^2} - E\left[ \frac{\partial\ell}{\partial\sigma^2} \right] = -\frac{n - p}{2\sigma^2} + \frac{(y - X\hat\beta)^T V^{-1}(y - X\hat\beta)}{2\sigma^4}
\]

becomes the unbiassed estimating equation for the true \sigma^2 used in V^{-1}. For estimating \gamma_i, first note that

\[
E\left[ (y - X\hat\beta)(y - X\hat\beta)^T \right] = \sigma^2\left[ V - X(X^T V^{-1} X)^{-1} X^T \right].
\]

The derivative for \gamma_i has expectation given by

\[
-2E\left[ \frac{\partial\ell}{\partial\gamma_i} \right]
= \frac{\partial\log|V|}{\partial\gamma_i} + \sigma^{-2} E\,\mathrm{tr}\left( \frac{\partial V^{-1}}{\partial\gamma_i}(y - X\hat\beta)(y - X\hat\beta)^T \right)
\]
\[
= \frac{\partial\log|V|}{\partial\gamma_i} + \mathrm{tr}\left( \frac{\partial V^{-1}}{\partial\gamma_i}\left[ V - X(X^T V^{-1} X)^{-1} X^T \right] \right)
\]
\[
= -\mathrm{tr}\left( X^T \frac{\partial V^{-1}}{\partial\gamma_i} X (X^T V^{-1} X)^{-1} \right)
\]
\[
= -\frac{\partial\log|X^T V^{-1} X|}{\partial\gamma_i}.
\]

The unbiassed estimating equation after \beta is replaced by \hat\beta in \frac{\partial\ell}{\partial\gamma_i} is

\[
\frac{\partial\log|V|}{\partial\gamma_i} + \frac{\partial\log|X^T V^{-1} X|}{\partial\gamma_i} + \sigma^{-2}(y - X\hat\beta)^T \frac{\partial V^{-1}}{\partial\gamma_i}(y - X\hat\beta) = 0,
\]

which is the same as in previous expressions of REML. The construction of estimating equations via

\[
\frac{\partial\ell}{\partial\theta} - E\left[ \frac{\partial\ell}{\partial\theta} \right] = 0
\]

is known as profile likelihood estimation [1]. Write

\[
\frac{\partial(\ell - a)}{\partial\theta} = \frac{\partial\ell}{\partial\theta} - E\left[ \frac{\partial\ell}{\partial\theta} \right] = 0, \quad \text{where } a = \int E\left[ \frac{\partial\ell}{\partial\theta} \right] d\theta.
\]
Hence a can be considered an adjustment term for the log-likelihood. For normal distributions, the adjustment is

\[
a = -\tfrac{1}{2}\log\left\{ 2\pi\,\mathrm{var}(\hat\beta) \right\} = -\tfrac{1}{2}\log\left| 2\pi I^{-1} \right|
\]

where I is the information matrix for \beta, namely X^T V^{-1} X.
4.3 Orthogonality of parameters

Our aim is to estimate the conditional distribution of the interest parameters, given the MLE for the nuisance parameters.

In (4.2), the parameter set is partitioned into 2 vectors of lengths p_1, p_2. In the mixed model we have stipulated that random effects are uncorrelated. Cox and Reid (1987) define the orthogonality property for \theta = (\psi, \lambda), of lengths (p_1, p_2), by

\[
i_{\psi,\lambda} = \frac{1}{n} E\left( \frac{\partial\ell}{\partial\psi}\frac{\partial\ell}{\partial\lambda} \right) = -\frac{1}{n} E\left( \frac{\partial^2\ell}{\partial\psi\,\partial\lambda} \right) = 0 \tag{4.11}
\]

where i_{\psi,\lambda} is an element of the information matrix. Orthogonality between interest and nuisance effects is a key condition of the REML method, even if it is local rather than global orthogonality. The element i is assumed to be O(1) as n \to \infty.^1 If (4.11) holds at only one parameter value \theta_0, it is locally orthogonal at \theta_0. Local orthogonality can always be achieved in a Hilbert space, but global orthogonality is possible only in special cases.
For \theta = (\psi, \lambda),

(a) the MLEs \hat\psi and \hat\lambda are asymptotically independent,

(b) se(\hat\psi) is the same whether \lambda is known or not,

(c) the estimation of (\psi, \lambda) is simpler,

(d) the MLE of \psi when \lambda is known, \hat\psi_\lambda, varies only slowly with \lambda.

To study the last point, expand the joint likelihood near (\hat\psi, \hat\lambda),

\[
\ell(\psi, \lambda) = \ell(\hat\psi, \hat\lambda)
+ \underbrace{\left[ (\psi - \hat\psi)\frac{\partial\ell}{\partial\psi} + (\lambda - \hat\lambda)\frac{\partial\ell}{\partial\lambda} \right]}_{=0}
+ \frac{1}{2}\left[ (\psi - \hat\psi)^2 \frac{\partial^2\ell}{\partial\psi^2} + 2(\psi - \hat\psi)(\lambda - \hat\lambda)\frac{\partial^2\ell}{\partial\psi\,\partial\lambda} + (\lambda - \hat\lambda)^2 \frac{\partial^2\ell}{\partial\lambda^2} \right]
+ O_p\!\left( \left\|(\psi - \hat\psi,\ \lambda - \hat\lambda)\right\|^3 \right)
\]
\[
\approx \ell(\hat\psi, \hat\lambda) - \frac{1}{2}\left[ n\,j_{\psi\psi}(\psi - \hat\psi)^2 + 2n\,j_{\psi\lambda}(\psi - \hat\psi)(\lambda - \hat\lambda) + n\,j_{\lambda\lambda}(\lambda - \hat\lambda)^2 \right] \tag{4.12}
\]

^1 O(1) is explained in the appendix.
where j = i + Z/√n, E(Z) = 0 and Z = O_p(1). Rewriting (4.12) in terms of i and Z and differentiating w.r.t. ψ satisfies the relation

n i (ψ − ψ̂) + √n Z (ψ − ψ̂) + (1/2)(ψ − ψ̂)² √n ∂Z/∂ψ + (1/2)(ψ − ψ̂)² n ∂i/∂ψ + ... = 0

Because of the following results,

Z = O_p(1),  i = O(1),  λ − λ̂ = O_p(1/√n),  ψ − ψ̂ = O_p(1/√n),

then iff i_{λ,ψ} = 0 (ie orthogonality), the first term dominates and the remaining terms are O_p(1). The idea that λ̂_ψ is similar for different ψ is illustrated in Figure 4.1, where λ = ν and ψ = σ².
If λ̂_ψ = λ̂ for all ψ, then λ and ψ are orthogonal parameters. Models from the exponential family contain ψ as part of the canonical parameter and λ as the complementary part of the expectation, eg.

Normal (μ, σ²)
Gamma (shape and scale parameters)

Typically ψ is orthogonal to λ_1, ..., λ_{p2}, where ψ is interest and the λs are nuisance, but they can be of interest. This is the basis of REML - finding a representation of the nuisance effects orthogonal to the interest effects.
4.4 Mixed Model Equations (MME)

From (4.1), we have

var(y) = V = ZDZ^T + R    (4.13)

X^T V^{-1} X β = X^T V^{-1} y    (4.14)

The estimating equations for the mixed model are

[ X^T R^{-1} X    X^T R^{-1} Z          ] [ B ]   [ X^T R^{-1} y ]
[ Z^T R^{-1} X    Z^T R^{-1} Z + D^{-1} ] [ u ] = [ Z^T R^{-1} y ]    (4.15)

Because the equations are solved iteratively, D may become singular and a preferable alternative to (4.15) replaces u by Dν, removing D^{-1},

[ X^T R^{-1} X    X^T R^{-1} Z D       ] [ B ]   [ X^T R^{-1} y ]
[ Z^T R^{-1} X    Z^T R^{-1} Z D + I   ] [ ν ] = [ Z^T R^{-1} y ]    (4.16)
Figure 4.1: Estimation of interest, given nuisance parameters and orthogonality. [The figure plots the log-likelihood surface against ν and σ².]

where D = var(u).
Taking the matrix results on faith,

[ B ]   [ (X^T V^{-1} X)^{-} X^T V^{-1} y ]
[ u ] = [ D Z^T V^{-1} (y − XB)          ]    (4.17)
The algorithm is
1. Derive variance components of V by REML.
2. Given the variance parameters, calculate B by Generalised Least Squares (GLS).
3. Given variance components and B , calculate u .
Some points of note:-
(i) u is referred to as the Best Linear Unbiased Predictor (BLUP).
Although B is not invariant to the choice of generalized inverse (X^T V^{-1} X)^{-}, the occurrence of B in u = DZ^T V^{-1}(y − XB) is such that u is invariant to that choice, and u is the same for all solutions of B.

(ii) Under normality assumptions, u is exactly the same as E(u|y), save for β̂ in place of β. Since under normality β̂ is the m.l.e. of β when D and V are known, we can refer to

u = Ê(u|y) = DZ^T V^{-1}(y − XB)

as the ML estimator of the conditional mean E(u|y).
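The algorithm above can be checked numerically. The sketch below (pure Python, an illustration rather than the notes' R) builds a toy data set with one fixed intercept and a random intercept for each of two units, and verifies that the closed-form GLS/BLUP solution (4.17) coincides with the solution of the mixed model equations (4.15).

```python
# Toy check (not from the notes): (4.17) solves Henderson's equations (4.15).
# Two units with two observations each; one fixed intercept, one random
# intercept per unit, D = d * I, R = sigma^2 * I.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def solve(A, b):
    # Gaussian elimination with partial pivoting; returns x with A x = b.
    n = len(A)
    M = [row[:] + [b[i][0]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            M[r] = [mr - f * mk for mr, mk in zip(M[r], M[k])]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return [[v] for v in x]

def inverse(A):
    n = len(A)
    cols = [solve(A, [[1.0 if i == j else 0.0] for i in range(n)]) for j in range(n)]
    return [[cols[j][i][0] for j in range(n)] for i in range(n)]

y = [[3.1], [2.9], [5.2], [4.8]]
X = [[1.0]] * 4
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
d, s2 = 2.0, 0.5
D = [[d, 0.0], [0.0, d]]
R = [[s2 if i == j else 0.0 for j in range(4)] for i in range(4)]

# (4.17): V = Z D Z' + R; beta by GLS; u = D Z' V^-1 (y - X beta)
V = [[zdz + r for zdz, r in zip(ra, rb)]
     for ra, rb in zip(matmul(matmul(Z, D), transpose(Z)), R)]
Vi = inverse(V)
Xt, Zt = transpose(X), transpose(Z)
beta = solve(matmul(matmul(Xt, Vi), X), matmul(matmul(Xt, Vi), y))
resid = [[yi[0] - xb[0]] for yi, xb in zip(y, matmul(X, beta))]
u = matmul(matmul(D, matmul(Zt, Vi)), resid)

# (4.15): Henderson's equations solved jointly for (beta, u)
Ri, Di = inverse(R), inverse(D)
A11 = matmul(matmul(Xt, Ri), X)
A12 = matmul(matmul(Xt, Ri), Z)
A21 = matmul(matmul(Zt, Ri), X)
A22 = [[a + di for a, di in zip(r1, r2)]
       for r1, r2 in zip(matmul(matmul(Zt, Ri), Z), Di)]
A = [A11[0] + A12[0]] + [A21[i] + A22[i] for i in range(2)]
rhs = matmul(matmul(Xt, Ri), y) + matmul(matmul(Zt, Ri), y)
sol = solve(A, rhs)

assert abs(sol[0][0] - beta[0][0]) < 1e-9
assert all(abs(sol[1 + i][0] - u[i][0]) < 1e-9 for i in range(2))
print("BLUP from (4.17) matches MME (4.15)")
```

The agreement is exact up to rounding: Henderson's equations are just a rearrangement of the GLS/BLUP solution that avoids inverting the n × n matrix V.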
Chapter 5
Variance Models
Systematic effects are modelled by curves, means etc. and random effects are modelled by density functions. At the heart of the multivariate density function is the covariance matrix. In the same way as we seek to generalise the systematic effect through models, we also find smooth models for densities to conserve degrees of freedom yet retain the important information.

Before launching into a mixed-model analysis, recall that there must be repeated measures (in time, space or measurement) from a sampling unit to have a mixed model with random subject effects. That means we have to mentally organise the data according to its source, ie. subject, repeated measures, treatment etc.
5.1 Variance Components

Write the mixed model separating the mean and covariance structures,

Y = Xβ + E    (5.1)

and

E ∼ N(0, V(θ))    (5.2)

We assume that E can be additively decomposed into

1. random effects,

2. serially correlated variation and

3. measurement error,

E_{i,j} = Z_i u_i + W_i(t_{ij}) + ε_{i,j},    (5.3)

and

var(E_{i,j}) = ZDZ^T + σ_r² R + σ² I.    (5.4)
We decompose the systematic part of the model into components guided by the experi-ment design and exploratory data analysis. Once we recognise the structure of the randompart, we can explore for likely components.
5.1.1 Pure Serial Correlation

Assume that there are repeated measurements over time from a single subject,

y_1, y_2, ..., y_n,

so that equation (5.3) reduces to

E_{i,j} = W_i(t_{ij})

and (5.4) is

var(E_{i,j}) = σ² R
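For a concrete R, a sketch (Python rather than the notes' R, and not from the notes) using the exponential auto-correlation function ρ(u) = exp(−φu) discussed below, for equally spaced times:

```python
import math

def serial_corr_matrix(n, phi):
    # n x n correlation matrix with rho(u) = exp(-phi * u), u = |i - j|
    return [[math.exp(-phi * abs(i - j)) for j in range(n)] for i in range(n)]

R = serial_corr_matrix(4, 0.5)
# diagonal is rho(0) = 1 and the correlation decays with separation
print(R[0][0], R[0][1] > R[0][2] > R[0][3])
```

The single parameter φ controls how quickly the within-subject correlation dies away with time separation.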
The sample covariance amongst the E_{i,j} is

[ γ_11               ]        [ ρ_11               ]
[ γ_21  γ_22         ]  = σ²  [ ρ_21  ρ_22         ]
[  ...         ...   ]        [  ...         ...   ]
[ γ_n1   ...   γ_nn  ]        [ ρ_n1   ...   ρ_nn  ]

If we use an auto-correlation function ρ(u) to model the correlation of observations that are u apart, eg. ρ(u) = exp(−φu), the covariance matrix is modelled in terms of the parameter φ.

        [ ρ(0)                      ]
V = σ²  [ ρ(1)    ρ(0)              ]
        [ ρ(2)    ρ(1)    ...       ]
        [ ...             ...       ]
        [ ρ(n−1)  ...  ρ(1)  ρ(0)   ]

5.1.2 Random effects plus measurement error

The components from equation (5.3) are
E_{i,j} = Z_i u_i + ε_{i,j}.
Random Intercepts
If the random effects are scalar intercepts,
var(E) = σ_u² J + σ² I.

The variance of each E_j is σ² + σ_u² and the correlation between any 2 measurements from the same unit is ρ = σ_u² / (σ² + σ_u²).
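The uniform correlation can be read directly off a small numeric example (a sketch, not from the notes, in Python rather than R):

```python
# var(E) = sigma_u^2 * J + sigma^2 * I for one unit with n repeated measures;
# check the common variance and the uniform correlation rho.

n, s2u, s2 = 4, 3.0, 1.0
V = [[s2u + (s2 if i == j else 0.0) for j in range(n)] for i in range(n)]

var_each = V[0][0]           # sigma^2 + sigma_u^2
rho = V[0][1] / var_each     # correlation of any two measures from the unit
print(var_each, rho)         # 4.0 0.75
```

Every pair of measurements from the unit shares the same correlation, which is why this is called the uniform (or compound symmetric) structure.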
Random Slopes
If

Y = Xβ + Zu + ε

where, for two units measured at times t_1, ..., t_K, Z is block diagonal with blocks

[ 1  t_1 ]
[ ...    ]
[ 1  t_K ]

and u = (u_{11}, u_{12}, u_{21}, u_{22})^T, where the u_{i1} and u_{i2} are Normal random variables with variances σ_1² and σ_2², representing intercepts and slopes respectively,

var(E_j) = σ_1² + t_j² σ_2² + σ².

For j ≠ k,

Cov(E_j, E_k) = σ_1² + t_j t_k σ_2²

where we see that covariance increases as t increases.
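These two formulae can be confirmed by forming Z D Z' + σ² I directly (a sketch, not from the notes, in Python rather than R):

```python
# Random intercept (var sigma1^2) + random slope (var sigma2^2) + error sigma^2:
# check var(E_j) = sigma1^2 + t_j^2 sigma2^2 + sigma^2 and
# cov(E_j, E_k) = sigma1^2 + t_j t_k sigma2^2.

s1, s2sl, se = 1.0, 0.25, 0.5          # sigma1^2, sigma2^2, sigma^2
t = [0.0, 1.0, 2.0, 3.0]
Z = [[1.0, tj] for tj in t]
D = [[s1, 0.0], [0.0, s2sl]]

V = [[sum(Z[i][a] * D[a][b] * Z[j][b] for a in range(2) for b in range(2))
      + (se if i == j else 0.0) for j in range(4)] for i in range(4)]

assert V[2][2] == s1 + t[2] ** 2 * s2sl + se   # variance grows with t^2
assert V[1][3] == s1 + t[1] * t[3] * s2sl      # covariance grows with t_j t_k
print(V[2][2], V[1][3])                        # 2.5 1.75
```

Unlike the random-intercept model, the covariance now depends on the actual sampling times, not just on whether two measures share a unit.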
Correlation may be represented by random effects

Diggle, Liang and Zeger (1994), page 88 state

"Whilst serial correlation would appear to be a natural feature of any longitudinal model, in specific applications its effect may be dominated by the combination of random effects and measurement error."
5.1.3 Random effects, Serial Correlation and Measurement Error

If in the above example the time variable was not included in the Z matrix, the variance model would have components due to random intercepts, measurement error and, as likely, there would be serial correlation.

The variance matrix (5.4) becomes

var(E) = σ² I + σ_u² J + σ_r² R,    (5.5)

where I is the identity matrix, J is a matrix of 1s, or 1 1^T, and R is a correlation matrix. The information on σ_u² comes from replication of treatment units, σ_r² is derived from amongst times within units, and σ² is residual.
5.1.4 The variogram

The diagnostic to show the relative importance of each of the random components is the variogram (see [2]), defined as

γ(v) = (1/2) E{ Y(t) − Y(t − v) }²,  v ≥ 0

In a first pass at the data, a saturated model is fitted for the fixed effects, saving the residuals. The sample variogram is a plot of the observed half squared differences between pairs of residuals (v_{ijk}) plotted versus the corresponding time (or space) differences (u_{ijk}),

v_{ijk} = (1/2)(r_{ij} − r_{ik})²,  u_{ijk} = t_{ij} − t_{ik}.

A smooth line is regressed through the points (u, v). The information contained in the v_{ijk} has contributions from measurement error, components due to random effects, and serial correlation. Figure 5.1 is an idealised variogram which shows how the components affect the shape of the curve. From the sample variogram of the residuals, we can gauge what variance components should be modelled; when the variance model is satisfactory, the variogram should be flat.
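The sample variogram points are simple to compute for one unit's residuals (a sketch, not from the notes, in Python rather than R):

```python
# v_ijk = 0.5 * (r_ij - r_ik)^2 against u_ijk = t_ij - t_ik for one unit.

def sample_variogram(times, resid):
    pts = []
    for j in range(len(times)):
        for k in range(j):
            u = times[j] - times[k]
            v = 0.5 * (resid[j] - resid[k]) ** 2
            pts.append((u, v))
    return pts

times = [0, 1, 2, 4]
resid = [0.2, -0.1, 0.3, -0.4]
pts = sample_variogram(times, resid)
print(len(pts), pts[0])
```

In practice the (u, v) points from every unit are pooled and a smooth curve is drawn through them, as described above.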
Figure 5.1: Variogram
5.2 Matrix results
5.2.1 Tearing

Solution for B requires inverting V, but as D is often block diagonal and R either diagonal or patterned (eg banded), we use the result that

V^{-1}_{n,n} = ( Z_{n,q} D_{q,q} Z^T_{q,n} + R )^{-1}_{n,n}
             = R^{-1} − R^{-1} Z ( D^{-1} + Z^T R^{-1} Z )^{-1} Z^T R^{-1}    (5.6)

Now if R has a simple inverse and q ≪ n, the only substantial inversion required is of the q × q matrix D^{-1} + Z^T R^{-1} Z.
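A numeric sketch of (5.6) in its simplest case, q = 1, where it reduces to the Sherman-Morrison formula (an illustration, not from the notes, in Python rather than R):

```python
# V = R + d * z z'; check V^-1 = R^-1 - R^-1 z (1/d + z' R^-1 z)^-1 z' R^-1
# by multiplying V by the claimed inverse.

n, d = 3, 2.0
z = [1.0, 1.0, 1.0]
Rdiag = [0.5, 1.0, 2.0]                     # R diagonal, so R^-1 is trivial

V = [[d * z[i] * z[j] + (Rdiag[i] if i == j else 0.0) for j in range(n)]
     for i in range(n)]

Ri_z = [z[i] / Rdiag[i] for i in range(n)]  # R^-1 z
denom = 1.0 / d + sum(z[i] * Ri_z[i] for i in range(n))
Vi = [[(1.0 / Rdiag[i] if i == j else 0.0) - Ri_z[i] * Ri_z[j] / denom
       for j in range(n)] for i in range(n)]

prod = [[sum(V[i][k] * Vi[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)]
ok = all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
         for i in range(n) for j in range(n))
print(ok)  # True
```

The n × n inverse is obtained from a scalar (here 1 × 1, in general q × q) inversion plus cheap diagonal work, which is the whole point of tearing.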
5.3 Nested Random Effects

Consider a field trial where spatial variation can be anticipated and where yields from adjacent plots may be more alike (ie. higher correlation) than when the plots are more widely separated.

[Diagram: the field laid out as a grid of plots, rows 1, ..., r by columns 1, ..., c.]
One model would be to regard rows and columns as random effects,

[ Y_11 ]
[ ...  ]
[ Y_1c ]          [ U_r1 ]         [ U_c1 ]
[ ...  ] = Xβ + Z_r [ ... ] + Z_c  [ ... ] + ε
[ Y_r1 ]          [ U_rr ]         [ U_cc ]
[ ...  ]
[ Y_rc ]

where Z_r = I_r ⊗ 1_c is the indicator matrix of 1s and 0s allocating each plot to its row effect and Z_c = 1_r ⊗ I_c allocates each plot to its column effect.

Rather than estimating individual row and column effects (which are nuisance), we could detrend the spatial variation amongst residuals across the field with a model of random effects. This has the effect of mathematically adjusting the yields from individual plots to what they would be on the average plot, ie. subtract the plot effect.
If the correlation amongst rows is ρ_r(|i − j|) and columns is ρ_c(|m − n|), the correlation between 2 plots which are d_r rows apart and d_c columns apart may be reasonably represented by ρ_r(d_r) ρ_c(d_c); this is called a separable process. The rc × rc correlation matrix is

Σ_{rc,rc} = R_{r,r} ⊗ C_{c,c}.
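The Kronecker structure is easy to verify on a small grid (a sketch, not from the notes, in Python rather than R):

```python
import math

def exp_corr(n, phi):
    # exponential correlation in one direction
    return [[math.exp(-phi * abs(i - j)) for j in range(n)] for i in range(n)]

def kron(A, B):
    # Kronecker product of two matrices
    ra, ca, rb, cb = len(A), len(A[0]), len(B), len(B[0])
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(ca * cb)] for i in range(ra * rb)]

R = exp_corr(3, 0.3)   # row correlations rho_r
C = exp_corr(2, 0.7)   # column correlations rho_c
S = kron(R, C)

# plot (row 0, col 0) against plot (row 2, col 1): S indices 0 and 2*2 + 1 = 5
print(len(S), len(S[0]), S[0][5] == R[0][2] * C[0][1])  # 6 6 True
```

Every entry of the rc × rc matrix factorises as a row correlation times a column correlation, which is exactly the separability assumption.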
Define distributions,

b_i ∼ N(0, Σ_i),  ε_i ∼ N(0, σ² I),  cov(b_i, ε_i) = 0,

where

       [ σ_1² + σ_2²   σ_1²           σ_1²         ]
Σ_i =  [ σ_1²          σ_1² + σ_2²    σ_1²         ]
       [ σ_1²          σ_1²           σ_1² + σ_2²  ]

The Σ_i matrix is compound symmetric. The R code for this model and the output are listed below.
library(nlme)
data(Oats)
mm1.Oats
5.4. PATTERNED MATRICES IN R 43
5.4.2 Split-plot experiment on Oats - alternative way 2

Define the design matrix and random effects as

       [ 1  0  0  0 ]           [ b_i     ]
Z_i =  [ 1  1  0  0 ] ,  b_i =  [ b_{i,1} ]
       [ 1  0  1  0 ]           [ b_{i,2} ]
       [ 1  0  0  1 ]           [ b_{i,3} ]

and b_i ∼ N(0, Ψ). The covariance matrix for random effects is

Ψ_i = diag( σ_1², σ_2², σ_2², σ_2² )

the R code and output is

mm2.Oats
5.5 Crossed Random Effects

Crossed random effects are modelled by a combination of pdBlocked and pdIdent objects.

Example

The data are log(optical density) measures from cell cultures in 2 blocks of 30 wells.

[Diagram: each block is a 6 × 5 grid of wells, rows 1-6 by dilution columns 1-5.]

There are 6 samples (treatments) randomly assigned to rows and 5 serial dilutions randomly assigned to columns. The data are Assay in the nlme library.
The systematic effects are sample*dilut and the random effects are block/(row + column).
Index the blocks by i, the rows by j, the columns by k. Then

y_{ijk} = μ + s_j + d_k + (s:d)_{jk} + b_i + r_{ij} + c_{ik} + ε_{ijk},

b_i ∼ N(0, σ_b²),  r_{ij} ∼ N(0, σ_r²),  c_{ik} ∼ N(0, σ_c²),  ε_{ijk} ∼ N(0, σ²),

cov(b, r) = 0,  cov(b, c) = 0,  cov(r, c) = 0

Write the vector of random effects as

U_i = ( b_i, r_{i,1}, ..., r_{i,6}, c_{i,1}, ..., c_{i,5} )^T.

Then

            [ σ_b²                    ]
var(U_i) =  [        σ_r² I_6         ]
            [                 σ_c² I_5 ]

That is, U_i has a block-diagonal structure with each block being a multiple of the identity. The R code to construct this is
data(Assay)
mm.assay
Chapter 6
Change Over designs
Designs in which each experimental unit receives a cyclical sequence of several treatments in successive periods are known as change-over designs. Historically they found favour because subject effects could be eliminated from experiment error, but with the penalty that performance in a given period might reflect not only the direct effect of the treatment but also the residual effects of preceding treatments. Another reason for using change-over designs is to get replication over periods when there is a shortage of experiment material and the researcher feels it is safe to get extra replication by giving more than 1 treatment to each experiment unit. Animal pen studies are sometimes done this way.

Direct and residual effects can be separated by appropriate choice of treatment sequences when it can be assumed that the residual effects persist only one period. We consider (i) balanced and (ii) partially balanced change-over designs, which may have an extra period formed by simply repeating the final period. The extra period provides the property that each treatment is preceded by the others equally often - ie a treatment is also preceded by itself. If first residuals are important, the extra-period design is better, but the extra period is unnecessary if residual effects are negligible.

In balanced designs, all treatment contrasts have equal precision, and in partially balanced designs, some contrasts are estimated with greater precision than others.
6.1 Latin squares and incomplete blocks

The notation is

t - treatments

p - periods

b - blocks

k - number of units per block
Table 6.1: Change-over design for 4 treatments using orthogonal Latin squares

         Square 1    Square 2    Square 3     Square 4
                           UNITS
Period   1 2 3 4    5 6 7 8    9 10 11 12   13 14 15 16
  1      1 2 3 4    1 2 3 4    1  2  3  4    1  2  3  4
  2      2 1 4 3    3 4 1 2    4  3  2  1    2  4  1  3
  3      3 4 1 2    4 3 2 1    2  1  4  3    3  1  4  2
  4      4 3 2 1    2 1 4 3    3  4  1  2    4  3  2  1

Table 6.2: Change-over design for 7 treatments and 4 periods in blocks of 7

Period   Incomplete Square 1    Incomplete Square 2
  1      1 2 3 4 5 6 7          1 2 3 4 5 6 7
  2      2 3 4 5 6 7 1          7 1 2 3 4 5 6
  3      4 5 6 7 1 2 3          5 6 7 1 2 3 4
  4      7 1 2 3 4 5 6          2 3 4 5 6 7 1
The simplest type of change-over design is a Latin square with rows representing periods of time and columns representing experimental units. If there were no carry-over effects, the data would simply be analysed as if it arose from an ordinary Latin square. Residual effects are allowed for by including terms for them in the statistical model.

Complete sets of orthogonal Latin squares, eg Table 6.1, will ensure that each treatment is preceded by each other equally often, but limits on resources will usually not allow this.

Many of the design properties are retained when rows are dropped from the Latin square (p < t) or for an incomplete block where k < t. In an incomplete Latin square, each unit receives an incomplete set of treatments.

In order to estimate direct and residual effects, block size must be at least 3.
6.2 Analysis

If t_1, t_2, t_3 represent the direct effects of 3 treatments and r_1, r_2, r_3 the residual effects, the total effects of a sequence of treatments 1, 2, 3 are represented by

1. t_1

2. t_2 + r_1

3. t_3 + r_2

Thus the statistical model has components due to
1. Blocks (random effects)

2. Periods within Blocks (random effects)

3. Units within Blocks (random effects)

4. Direct effects (fixed effects)

5. Residual effects (fixed effects)

and we would regard items 1, 2, 3 as nuisance and items 4 and 5 as interest.
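The direct and first-residual columns of the design matrix for one unit's sequence can be sketched as follows (an illustration, not from the notes, in Python rather than R; residual effects are assumed to last exactly one period, so period 1 carries none):

```python
def direct_residual_columns(seq, t):
    # seq: the unit's treatment sequence; t: number of treatments.
    direct, residual = [], []
    for period, trt in enumerate(seq):
        direct.append([1 if trt == a else 0 for a in range(1, t + 1)])
        prev = seq[period - 1] if period > 0 else None
        residual.append([1 if prev == a else 0 for a in range(1, t + 1)])
    return direct, residual

d, r = direct_residual_columns([1, 2, 3], t=3)
print(d)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(r)  # [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
```

Reading across periods, the total effects are t_1, then t_2 + r_1, then t_3 + r_2, matching the decomposition above.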
Example 6.1

The design is 4 blocks of orthogonal 3 × 3 Latin squares with an extra period, where the rows are Periods and the columns are Units.

         Block 1    Block 2    Block 3    Block 4
           Unit       Unit       Unit       Unit
Period   1  2  3   1  2  3   1  2  3   1  2  3
  1      1  4  3   1  3  2   3  4  2   1  2  4
  2      4  3  1   3  2  1   4  2  3   2  4  1
  3      3  1  4   2  1  3   2  3  4   4  1  2
  4      3  1  4   2  1  3   2  3  4   4  1  2

The data shown in Table 6.3 are milk yields (fcm); the extra periods (the repeated final rows) could be dropped to compare the basic design with the extra-period design.

Within each Block we have the additional blocking factors of Periods (the rows) and Units (the columns); the random model is

Block + Block.Period + Block.Unit

where Block.Period would require 3 df for each block, ie 12 df, and there would be 8 df for Block.Unit.
The systematic effects are the direct and residual treatment effects. The statistical model for these data is

y_{itsu} = τ_i + R_{i′,(t−1)} + B_s + B_s.P_t + B_s.U_u + ε_{itsu}

B_s ∼ N(0, σ_B²),  B_s.P_t ∼ N(0, σ_BP²),  B_s.U_u ∼ N(0, σ_BU²),  ε ∼ N(0, σ²)
Example 6.2

An Incomplete Block design for 6 treatments in 4 blocks, 4 periods and 4 units per block is shown below.

         Block 1      Block 2      Block 3
           Unit         Unit         Unit
Period   1 2 3 4     1 2 3 4     1 2 3 4
  1      1 2 5 4     4 6 3 1     5 2 6 3
  2      4 1 2 5     1 4 6 3     6 5 3 2
  3      2 5 4 1     6 3 1 4     2 3 5 6
  4      5 4 1 2     3 1 4 6     3 6 2 5

The model is the same as before and the essential R code is

XOIB.g
6.3 Change Over Designs - computing Guide

milk
56 CHAPTER 7. SEMI-PARAMETRIC REGRESSION
Hermite polynomials,

b_0(x) = 1,  b_1(x) = 2x,  b_2(x) = 4x² − 2,  b_3(x) = 8x³ − 12x,

which are orthogonal with respect to the weight e^{−x²},

∫ e^{−x²} b_m(x) b_n(x) dx = 0,  m ≠ n

cubic splines

b_1 = x,  b_j = |x − x_j|³,  j > 1

where x_j are the knot points.
Hence (7.2) can be written as a linear model and, in the case of cubic splines,

η = β_0 + β_1 x_1 + Σ_{j=1}^{k_1} β_{j+1} |x_1 − x_{1,j}|³ + ...

or

η = Xβ

At this point the degrees of freedom depend on the number of knot points. To avoid influence of the knot points, the model overfits and controls smoothness by a penalty function. So it may be considered as putting in a lot of df and then taking out the redundant ones where the data indicate they are not needed. Consequently we may end up with fractional df.
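The truncated-power cubic spline basis described above is simple to evaluate (a sketch, not from the notes, in Python rather than R):

```python
def spline_basis(x, knots):
    # basis row for one value x: [x, |x - k_1|^3, |x - k_2|^3, ...]
    return [x] + [abs(x - k) ** 3 for k in knots]

knots = [0.25, 0.5, 0.75]
row = spline_basis(0.5, knots)
print(row)  # [0.5, 0.015625, 0.0, 0.015625]
```

Stacking such rows over the observed x values gives the X matrix of the linear model η = Xβ; the penalty then controls how much the knot terms contribute.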
It is timely to note the Littlewood-Paley-Stein theorem,

Theorem 1  If 1 < p < ∞, there exist 2 constants C_p ≥ c_p > 0 such that for all functions f belonging to L_p(R^n),

c_p ||g||_p ≤ ||f||_p ≤ C_p ||g||_p,

where

||f||_p = ( ∫_{R^n} |f(x)|^p )^{1/p}

and

g(x) = ( Σ_j | Σ_{2^j ≤ k < 2^{j+1}} (a_k cos kx + b_k sin kx) |² )^{1/2}

The term g(x) is built from the Fourier transform of the data y and this theorem is stating properties of the basis functions used to represent y. With well chosen basis functions, c_p and C_p will be close for every p. With poorly chosen basis functions, the addition of extra terms is not so much fine tuning as fixing the discrepancies of the low order terms, and so the representation of y by the b_i(x) has localized hot spots of spectral energies. If the spectral energy is distributed evenly across x, ||f||_p will be approximately the same for p = 2, 4, 6, ....
The penalty function for splines is the roughness defined by

∫ [s″(x)]² dx = β^T S β

and the βs are estimated by minimizing

ℓ(β) + Σ_i λ_i β^T S_i β

where the λ_i are weights.

In the GLM setting we require Δ_{i,i} = ∂η_i/∂μ_i and W_{i,i} = (∂μ_i/∂η_i)² / (∂μ_i/∂θ_i), θ being the canonical parameter. Then the steps in fitting a semiparametric model are:-

1. z = η + Δ(y − μ)

2. find β that minimizes

||W^{1/2}(z − Xβ)||² / [tr(I − A)]²

where A = X(X^T W X + Σ_i λ_i S_i)^{-1} X^T W is the hat matrix and tr(A) = df.

This algorithm is known as generalized cross validation.
7.2 GAMs in R

The mgcv library in R fits Generalized Additive Models using generalized cross validation. The example is for modelling elbow flux over time. A subject receives some physiotherapy treatment and the blood flow through the elbow is measured (as elbow flux) for the next hour. In the study there were many subjects, but only the data from one is used here to demonstrate GAMs. The data are saved in elbowdat.csv and an exploratory data analysis shows a highly nonlinear trend of ef over time.

library(mgcv); library(lattice)
elbow.df
Figure 7.1: The systematic effect of time on elbow flux. [Left panel: the fitted smooth s(time, 3.87) against time (0-40); right panel: observed elbow flux (roughly 300-500) against time.]

fit.plot
Example

The next example shows how to fit GAMs within a group such as the levels of a treatment. The data are Glucose concentrations in sheep taken over about 2 hours after the sheep had been injected with hormones: (a) Cortisol (C), (b) Glucagon (G), (c) Adrenalin (A), (d) C+A, (e) C+G+A. The full data are on the web site and only the first few lines are shown here.

Sheep  Weight  Treatment  Time  Glucose
148    32      C          0     2.97
148    32      C          2     3.06
148    32      C          5     3.21
148    32      C          10    3.31
148    32      C          15    3.39
148    32      C          20    3.44

library(mgcv)
Glucose.df
The concept of an interaction does not apply in semi-parametric regression, and the effect of time is estimated within each treatment. This requires that we set up an indicator variable for each treatment and use this to specify that the spline fit is within that group.

trt.levels
Figure 7.3: Fitted responses from GAM models within treatment levels. [Five panels, one per treatment (A, C, CA, CGA, G), each showing the fitted smooth over time 0-120, with a final panel comparing the treatment levels.]
The profile of fitted values is used to estimate, with 95% CIs, the time to max concentration.

#_____________________________________________
closest
[Figure: observed Glucose (nanomoles/litre) against Time (0-120) for each treatment panel A, C, CA, CGA, G.]
7.3 Smoothing splines are BLUPs

In his comment upon Geoff Robinson's 1991 paper, "That BLUP is a Good Thing: The Estimation of Random Effects" [8], Terry Speed [11] showed that splines are BLUPs. This permits the non-linear smooth splines to be implemented in a linear mixed model, which gives substantial power for modelling data. This is expertly developed in Verbyla et al. (1999) and the models can be fitted using the specialized package ASREML written by Arthur Gilmour.

Theory and implementation are also given in the book Semi-parametric Regression by David Ruppert, Matt Wand and Ray Carroll [9].
Consider a linear model

y = g(x) + ε,  ε ∼ N(0, σ² R)

and write the -2 log-likelihood as

ℓ = log|2πσ² R| + (1/σ²)(y − g)^T R^{-1} (y − g) + λ_s ∫ [g″(x)]² dx

The solution through REML estimation of variance components and mixed model estimating equations utilizes matrices which are functions of differences between samplings of the variable in the spline function. Define these differences as h_j = x_{j+1} − x_j. Then Δ is the n × (n − 2) matrix whose i-th column contains

1/h_i,  −(1/h_i + 1/h_{i+1}),  1/h_{i+1}

in rows i, i+1, i+2, with zeros elsewhere,
and G is the (n − 2) × (n − 2) tridiagonal matrix with diagonal entries (h_i + h_{i+1})/3 and off-diagonal entries h_{i+1}/6.

At the design points,

ĝ = ( R^{-1} + λ_s Δ G^{-1} Δ^T )^{-1} R^{-1} y    (7.3)
Equation (7.3) can be reparameterized into the standard form for a mixed model by utilizing,

       [ 1  x_1 ]
X_s =  [ ...    ]      Z_s = Δ(Δ^T Δ)^{-1},    H = σ² R + λ_s^{-1} Z_s G Z_s^T
       [ 1  x_n ]

so that

ĝ = X_s β_s + Z_s u_s

where

β̂_s = ( X_s^T H^{-1} X_s )^{-1} X_s^T H^{-1} y

û_s = ( Z_s^T R^{-1} Z_s + λ_s G^{-1} )^{-1} Z_s^T R^{-1} ( y − X_s β̂_s )

λ_s = σ² / σ_s² = 1/γ_s

The log-likelihood is a conditional likelihood of y|u_s and the penalty is the log-density function of u_s. Splines in GLMs are fitted using the method of Schall (1991) [10].
7.4 Splines in the GLM using ASREML

The ASREML program [4] is not simple but its power makes it worth having and learning. The routines are in GENSTAT and an S-PLUS module is being fine tuned (I think). I use a primitive R function called asr() which sets the job up and, after some translation, passes it to the compiled ASREML program. My R code is

attach("/ASREML/.RData", pos=2)
model2
elbow flux
SUBNO 25 !A
TRT 19 !A
int
treatment 1 !A
aet uet aht uht ascu sc ef hf

elbow.asd !skip 1
ef !GAMMA !LOG ~ mu int !r spl(int) SUBNO SUBNO.int
The fit is similar to that with gam() in R.
Figure 7.4: Smooth splines for a Gamma model in ASREML. [The fitted smooth Ŷ is plotted against int over 0-60.]
Chapter 8
Longitudinal Data
Repeated measures data are those which constitute a set of repeated measures, over time, from the same sampling unit. Because the sampling is from the same unit, the data contain information from the unit (ie random effect) and the systematic effect of time. In conjunction with this, the measurements are correlated because they all have something in common, the unit itself. Another potential source of correlation is that an observation is influenced by previous observations; this is termed autoregressive correlation.

Modelling of repeated measures data needs to account for the correlations so that the model can best allocate the information between systematic and random effects.
8.1 The data file

Early repeated measures models were cast as multivariate models in order to capture the correlations amongst the data. This idea was superseded and now we analyze repeated measures in a univariate fashion,

[ Y_1 ]   [ X_1  0    0   0  ] [ β_1 ]
[ Y_2 ]   [ 0    X_2  0   0  ] [ β_2 ]
[ ... ] = [ 0    0   ...  0  ] [ ... ] + ε.
[ Y_p ]   [ 0    0    0  X_p ] [ β_p ]

We have the time variable as a column of the X_i's,

       [ 1  0  ...  1  ...  t_{i,1}   ]
X_i =  [ 1  0  ...  1  ...  t_{i,2}   ]
       [ ...                          ]
       [ 1  0  ...  1  ...  t_{i,n_i} ]

The important difference between this model and simple linear models is that elements of ε are not assumed to be independent or to have constant variance.
An example of a data file on carbon from Tillage and Rotation treatments, measured at 0, 3, 4, 6, 12 months is:-
Plot Sample Rotation Tillage Block Time CP1 S1 R1 T1 B1 0 1.53P2 S2 R1 T1 B1 0 1.47P3 S3 R1 T1 B1 0 1.29
. . . . . . .P9 S3 R2 T2 B1 0 2.06P10 S1 R1 T1 B1 0 1.4P11 S2 R2 T1 B1 0 1.46P12 S3 R2 T1 B1 0 1.53P1 S1 R1 T1 B1 3 1.21P2 S2 R1 T1 B1 3 1.17
P3 S3 R1 T1 B1 3 1.31. . . . . . .
P22 S1 R1 T2 B2 3 1.39P23 S2 R1 T2 B2 3 1.91P24 S3 R1 T2 B2 3 1.42P1 S1 R1 T1 B1 4 1.04P2 S2 R1 T1 B1 4 1.12P3 S3 R1 T1 B1 4 1.07
. . . . . . .P22 S1 R1 T2 B2 4 1.25P23 S2 R1 T2 B2 4 1.21P24 S3 R1 T2 B2 4 1.23P1 S1 R1 T1 B1 6 1.44P2 S2 R1 T1 B1 6 1.39P3 S3 R1 T1 B1 6 1.15
. . . . . . .P10 S1 R2 T1 B1 6 1.23P11 S2 R2 T1 B1 6 1.17P13 S1 R2 T1 B2 6 1.29P14 S2 R2 T1 B2 6 1.14
. . . . . . .P22 S1 R1 T2 B2 12 1.83P23 S2 R1 T2 B2 12 1.70P24 S3 R1 T2 B2 12 1.40
In this case all treatments are sampled at the same times 0, 3, 4, 6 and 12 months and the data have been blocked for each sample time. Note that Plot=P12 at Time=6 is missing, so that datum is either deleted or denoted by the missing value NA.
The next example is measurements of Metabolizable Energy of deer which are born at different times and hence are sampled on different days. There are 25 animals and the data are blocked as all the repeated measurements for the first animal, followed by the repeated measurements for the second, and so on.
Tag Strain Sex Age Agewks ME Lwt204 3 1 227 32.43 123.22 71.50204 3 1 234 33.43 125.56 73.00204 3 1 241 34.43 122.94 75.00204 3 1 248 35.43 123.00 74.50204 3 1 255 36.43 117.83 75.00204 3 1 262 37.43 107.25 75.50204 3 1 269 38.43 108.58 75.00
. . . . . . .
207 3 1 224 32.00 133.05 67.50207 3 1 231 33.00 131.76 67.50207 3 1 238 34.00 137.45 67.50207 3 1 245 35.00 128.62 68.00207 3 1 252 36.00 129.41 69.00207 3 1 259 37.00 129.24 69.00
. . . . . . .293 4 2 540 77.14 108.91 96.50293 4 2 547 78.14 151.51 98.50293 4 2 554 79.14 131.97 98.50293 4 2 563 80.43 125.17 99.50293 4 2 568 81.14 138.74 101.00293 4 2 575 82.14 128.27 96.50293 4 2 582 83.14 121.92 99.50
If there are repeated measurements in two directions, eg time and depth, the data file is set up similar to the above but with both classifying variables, eg
Hole Treatment Stratum Depth Calendar WCE2B Control Beneath 20 07/08/95 19.62E2B Control Beneath 20 07/25/95 18.98
. . . . . .E2B Control Beneath 40 07/08/95 17.64E2B Control Beneath 40 07/25/95 17.74
. . . . . .E2B Control Beneath 60 07/08/95 16.97E2B Control Beneath 60 07/25/95 16.86
. . . . . .E2B Control Beneath 80 07/08/95 17.88
E2B Control Beneath 80 07/25/95 17.92. . . . . .
E2B Control Beneath 100 07/08/95 11.8E2B Control Beneath 100 07/25/95 11.5
. . . . . .E2B Control Beneath 120 07/08/95 9.41E2B Control Beneath 120 07/25/95 9.41
. . . . . .F1B Deforest Beneath 20 01/27/94 16.0F1B Deforest Beneath 20 02/11/94 14.70
. . . . . .F1B Deforest Beneath 40 01/27/94 17.30F1B Deforest Beneath 40 02/11/94 14.91
. . . . . .F1B Deforest Beneath 60 01/27/94 18.54F1B Deforest Beneath 60 02/11/94 14.40
. . . . . .F1B Deforest Beneath 80 01/27/94 19.82F1B Deforest Beneath 80 02/11/94 14.86
. . . . . .F1B Deforest Beneath 100 01/27/94 23.69F1B Deforest Beneath 100 02/11/94 18.65
. . . . . .F1B Deforest Beneath 120 01/27/94 23.86F1B Deforest Beneath 120 02/11/94 23.69
. . . . . .F1B Deforest Beneath 140 01/27/94 20.10F1B Deforest Beneath 140 02/11/94 21.63
The data are unbalanced in that sampling times are different for sites and the numbers of depthsdiffer between treatments. The imbalance does not complicate the analysis.
8.3 Statistical model for repeated measures

The correlations and changing variances which are characteristic of longitudinal data are often induced by the units, such as animals or plots. This is because over time, observations within a unit may be more alike than those of the same treatment but sampled from another unit. Hence the random component is affected by sampling units.

The basic model is

y = treatment + time + treatment:time + ERROR,   ERROR ∼ A distribution

Note how general this is: the error model is not restricted to a simplistic (and unrealistic) i.i.d. structure. Momentarily we regard both treatment and time as factors for which we need to construct contrasts to get a more specific model. Also, we have recognised that the random structure may contain a number of model terms.

The model may then contain these components,

y = treatment + time + treatment:time + unit + unit:time + error

unit ∼ N(0, σ_u² J_1)

unit:time ∼ N(0, σ_ut² J_2)

error ∼ N(0, σ² R)

The symbols J_1, J_2, R denote matrices which give the weights to be applied to the variance components for each part of the model. That is, the model for the random components may include measurement error, components due to random effects, and serial correlation.
8.3.1 Random components

A line plot of the response over time for each unit can sometimes tell us about the random component. If the plot reveals basically parallel lines but with different intercepts, the appropriate model regards the intercept as a random effect. In this case, the correlations amongst the repeated measures are uniform.

The more usual case is that the lines fan out because the individuals change differently over time. In that case, the interaction of the individual and the systematic time effect would be fitted as a random effect, and this representation leads to a covariance structure that is dependent upon the differences between sampling times.

The following 3 examples are (i) Ca intake, (ii) egg weight measured on the same birds and (iii) the ME of deer. The Ca intake data show a fan out over time, suggesting an animal × time interaction, and the egg weight data suggest that differences are primarily due to animals only, or a random intercept model. The deer data show a fanning out at the maximum.
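The fanning-out diagnostic can be sketched numerically (a simulation for illustration, not from the notes, in Python rather than R):

```python
import random

# Random intercepts give roughly parallel traces, so the between-unit spread
# is stable over time; adding a random unit x time effect (random slopes)
# makes the traces fan out, so the spread grows with time.

random.seed(1)
times = list(range(10))
n_units = 200

def spread(slope_sd):
    # between-unit SD of the response at the first and last time point
    first, last = [], []
    for _ in range(n_units):
        a = random.gauss(0, 1)            # random intercept
        b = random.gauss(0, slope_sd)     # random slope
        first.append(a + b * times[0])
        last.append(a + b * times[-1])
    def sd(v):
        m = sum(v) / len(v)
        return (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5
    return sd(first), sd(last)

f0, l0 = spread(0.0)    # intercept-only: spread stable over time
f1, l1 = spread(0.5)    # random slopes: spread grows with time
print(abs(l0 - f0) < 0.5, l1 > 2 * f1)
```

This is exactly what the trace plots below are read for: parallel lines point to a random intercept, fanning lines to a random unit × time effect.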
xyplot(Caint ~ weekno | Ca * feed, type = "l", data = Caint,
       panel = panel.superpose, groups = an)
[Figure: Ca intake (Caint, 0-25) against weekno (20-50) for each Ca × feed panel, one line per animal.]
Traces display fanning out, suggesting a random animal × time effect.
xyplot(ME ~ Agewks | Sex * Strain, data = deerdf, type = "l",
       panel = panel.superpose, groups = Tag)
[Figure: ME (0-300) against Agewks (30-80) for each Sex × Strain panel, one line per animal.]
Separation at maximum is bigger than at the start and the traces close up again.
xyplot(ewt ~ weekno | Ca * feed, data = ewt,
       panel = function(x, y) { panel.xyplot(x, y); panel.loess(x, y, span = 1/4) })
[Figure: egg weight (ewt, 40-70) against weekno (30-50) for each Ca × feed panel, with loess smooths.]
xyplot(ME ~ Agewks | Sex * Strain, data = deerdf,
       panel = function(x, y) { panel.xyplot(x, y); panel.loess(x, y, span = 1/4) })
[Figure: ME (0-300) against Agewks (30-80) for each Sex × Strain panel, with loess smooths.]
8.4 Linear Mixed Models for Repeated Measures

Example 8.1

The example shown in Figure 8.1 is the body weights of rats published by Crowder and Hand, and the data set is included in the nlme library of R, see [7]. The essential R code loads the nlme, grid and lattice libraries and plots the traces of each rat with a separate frame for each Diet.

library(nlme); library(grid); library(lattice)
data(BodyWeight)
xyp2
8.5 A statistical model for repeated measuresAlthough the rat bodyweights data are simplistic, they suffice to identify the essential ingredientsof a model and the steps required to t the model to the data.
In following a unit through time, the response will be a combination of these effects:

(i) treatment,
(ii) time,
(iii) treatment × time,
(iv) unit,
(v) unit × time.

The remaining component is the residuals, which may not be independent if an observation at, say, time $t_i$ is influenced by the outcome at a previous sampling, say $t_{i-1}$. This occurs when there is feedback. Figure ?? shows the responses of subjects' elbow flux (blood flow through the elbow) over 15 minutes. Peaks are followed by troughs; the saw-toothing is measurement error and successive observations are negatively correlated. If we identify this correlation, the resultant fit will be a smooth track through the observations; otherwise, if the residuals are assumed independent, the model will struggle, trying to interpret the saw-toothing as systematic information.
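The negative lag-1 correlation produced by this kind of measurement error can be sketched with simulated data in base R. The series below is illustrative only, not the elbow-flux measurements:

```r
# Alternating measurement error around a smooth trend produces
# negatively correlated successive residuals (a saw-tooth).
set.seed(1)
trend <- sin(seq(0, 2 * pi, length = 60))
resid <- rep(c(1, -1), 30) * 0.5 + rnorm(60, sd = 0.1)  # saw-tooth error
y     <- trend + resid
lag1  <- cor(resid[-1], resid[-60])   # correlation of successive residuals
lag1                                  # strongly negative
```

A model that assumes independent residuals has no way to absorb this structure except as spurious systematic signal.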
The linear response for each rat is found conveniently with the lmList() function in nlme, and these individual fits may help in explaining the random effects when the model is fitted. Confidence intervals for these estimates are derived using intervals().
bwlist1 <- lmList(weight ~ Time | Rat, data = BodyWeight)
intervals(bwlist1)
16  503  0.905

Degrees of freedom: 176 total; 144 residual
Residual standard error: 4.44
We fit the model ( ?? ) to these data with the lme() function. In the next sequence of code, two models are fitted: the purpose of the first is to get the AOV, and the second is to get the regression coefficients in a convenient form.
mod1 <- lme(weight ~ Diet * Time, random = ~ 1 | Rat, data = BodyWeight)
anova(mod1)
mod2 <- lme(weight ~ Diet * Time - 1, random = ~ 1 | Rat, data = BodyWeight)
summary(mod2)
12   99.806   0.1730
13  -40.910   0.2406
15   19.478  -0.1524
14   21.131  -0.3182
16    0.300   0.2300
Example 8.2
These data are chick weights and are akin to the rats data but with more features. They are also included in the nlme package, Pinheiro and Bates [7]. In the following R code, we load the nlme library to get the data and to use the lme() function. Plotting is done using the xyplot() function, which is part of the lattice library. Many of these plotting functions also rely on functions from the grid library, so we load it as well.
library(nlme);library(grid);library(lattice)
data(ChickWeight); print(names(ChickWeight))
xyp2 <- xyplot(weight ~ Time | Diet, data = ChickWeight, groups = Chick, type = "l")
print(xyp2)
Figure 8.4: Profiles of ME for 2 strains and M,F deer.
Figure 8.5: Fitted profiles and observed ME for 2 strains and M,F deer.
Chapter 9
Generalised Linear Mixed Models
The content of this chapter is drawn from [2]. The ideas of a linear mixed model extend to regression models for discrete and non-Gaussian continuous data. In the linear model, a random effects model is appropriate if the coefficients can be thought of as a sample from a distribution. There must be repeat samples from each experimental unit to define a random effects model.
In the non-Gaussian setting it is assumed that the data for an experimental unit are independent observations following a GLM, but the regression coefficients can vary from unit to unit according to a distribution. So the assumptions are:
1. The conditional distribution of $Y_{ij}$ given $u_i$ follows a distribution from the exponential family with density $f(y_{ij} \mid u_i, \phi)$,

$$f(y_{ij} \mid u_i) = \exp\left[\,\{y_{ij}\theta_{ij} - \psi(\theta_{ij})\}/\phi + c(y_{ij};\phi)\,\right]$$

\begin{align}
\mu_{ij} &= E(Y_{ij} \mid u_i) = \psi'(\theta_{ij}) = g^{-1}(x_{ij}'\beta + z_{ij}'u_i) \tag{9.1}\\
v_{ij} &= \operatorname{var}(Y_{ij} \mid u_i) = \phi\,\psi''(\theta_{ij}) = \phi\, v(\mu_{ij}) \tag{9.2}
\end{align}

where $g(\cdot)$ is the link function ($g^{-1}(\cdot)$ is its inverse) and $v(\cdot)$ is the variance function. The vectors $x_{ij}$ and $z_{ij}$ have dimensions $p$ and $q$ respectively.
2. Given $u_i$, the repeated measurements $Y_{i1}, \ldots, Y_{i n_i}$ are independent.

3. The $u_i$ are independent and identically distributed with density function $f(u_i; G)$. Commonly, $f(u_i; G)$ is a normal distribution with zero mean and variance matrix $G$.
Correlation amongst observations from a unit, $Y_{i1}, \ldots, Y_{i,n_i}$, arises from their sharing the unobservable variables $u_i$. The random effects model is most useful when the objective is to make inference about individuals rather than population averages.
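A small simulation sketches how a shared random effect induces within-unit correlation. All names and values below are illustrative assumptions, not taken from the chapter's data:

```r
# Two binary measurements per unit share a random intercept u_i;
# marginally, the two measurements are positively correlated.
set.seed(2)
m  <- 2000                        # number of units
u  <- rnorm(m, sd = 2)            # random unit effects
p  <- plogis(-1 + u)              # conditional success probability
y1 <- rbinom(m, 1, p)             # first repeat measurement
y2 <- rbinom(m, 1, p)             # second repeat measurement
cor(y1, y2)                       # positive, induced purely by the shared u
```

Conditionally on u the two measurements are independent; the correlation is a marginal consequence of sharing u.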
For a fixed $G$, updated values of $\hat\beta$ and $\hat U$ are obtained by iteratively solving

\begin{align}
\hat\beta &= \left( \sum_{i=1}^{m} X_i' V_i^{-1} X_i \right)^{-1} \sum_{i=1}^{m} X_i' V_i^{-1} Y_i^* \tag{9.5}\\
\hat U_i &= G Z_i' V_i^{-1} (Y_i^* - X_i \hat\beta) \tag{9.6}
\end{align}

These are the mixed model equations applied to the transformed variable $Y^*$. From equation (9.4),

\begin{align}
\hat G &= m^{-1} \sum_{i=1}^{m} E(U_i U_i' \mid y_i) \tag{9.7}\\
&= m^{-1} \sum_{i=1}^{m} E(U_i \mid y_i) E(U_i \mid y_i)' + m^{-1} \sum_{i=1}^{m} \operatorname{var}(U_i \mid y_i) \tag{9.8}
\end{align}

Plug-in values are

$$E(U_i \mid y_i) = \hat U_i, \qquad m^{-1} \sum_{i=1}^{m} \operatorname{var}(U_i \mid y_i) = m^{-1} \sum_{i=1}^{m} \left( Z_i' Q_i^{-1} Z_i + G^{-1} \right)^{-1}$$
The approximation gives reasonable estimates of $\beta$, but it is not reliable if there are few observations or the density of the transformed random variable is far from normal.
9.2 Penalised Quasi-likelihood methods

An implementation of the above technique is the glmmPQL function in the MASS library. This is a wrapper that implements the linearization within the lme() function.
The data in the file AlmondFlowers.txt are the frequencies of flower stages from 162 trees on August 24th, 2006. The data were determined from photographs, and the purpose was to select late-flowering trees that would minimise frost damage. The first 10 trees are listed in Table 9.1.
The flower stages are ordered categories which can be analysed by proportional odds logistic regression. This requires that we calculate the cumulative counts across the ordered categories. The trees are a random sample, so they enter the model as a random effect.
$$\operatorname{logit}(p) = X\beta + ZU$$

where $p$ are the cumulative probabilities, $X$ is the design matrix for categories and $\beta$ are the category effects, $Z$ is the indicator matrix for trees and $U$ are the random tree effects.
The following R program reads the data, transforms the counts to cumulative counts, and reshapes the data into long format before doing the analysis with glmmPQL.
Table 9.1: Frequencies of Flower stages at 1 sampling

No  TreeTag  Date  Bud  PtlTip  HalfOpen  Anth  X1stPtlFall  PostAnth  OvarySwell  Total
 1  1at02       1    0       0         2    84           18        29           0    133
 2  1at03       1    0       0         0    45           17        87           0    149
 3  1at04       1    3      12         2    67            2         1           0     87
 4  1at05       1    4      11         0   118           11         9           0    153
 5  1at09       1   68      53        25    33            1         0           0    180
 6  1at12       1   49      49        20    43            0         0           0    161
 7  1at16       1    1       1         1    80            8         1           0     92
 8  1at18       1   24      55        32   161            0         0           0    272
 9  1at19       1   52      57        62   117            0         0           0    288
10  1at20       1    1       1         2    82           15        40           0    141
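The cumulative-count step can be checked against the first row of Table 9.1 (tree 1at02):

```r
# Counts for tree 1at02 across the ordered flower stages (from Table 9.1).
counts <- c(Bud = 0, PtlTip = 0, HalfOpen = 2, Anth = 84,
            X1stPtlFall = 18, PostAnth = 29, OvarySwell = 0)
cum <- cumsum(counts)     # cumulative counts across the ordered categories
cum                       # the final value equals the row Total, 133
```

Each cumulative count is the numerator of one cumulative probability in the proportional odds model.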
#_____________ AlmondFlowers.R ________________
options(digits=2)
library(MASS);library(nlme)
FlT <- read.table("AlmondFlowers.txt", header = TRUE)
plot(tree.distribution$y ~ tree.distribution$x, main = "distribution of random tree effects",
     type = "l", xlab = "Tree effect on logistic scale", las = 1, ylab = "distribution function")
rug(side=1,x=Tree.effects$effect)
Figure 9.1: Distribution of tree effects for flower stages
Negative tree effects are associated with late flowering. From the model we observe that when logit(p) is reduced, there is a shift to the left of the distribution function, and a shift to the left is towards the early stages of flowering. That is, on August 26th those trees found to have negative tree effects will not be highly represented in the later flowering stages; they have not developed too early.
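This interpretation can be sketched numerically. The cutpoints and the tree effect below are hypothetical values, not the fitted ones:

```r
# A negative tree effect lowers every cumulative probability on the
# logistic scale, i.e. it shifts mass toward the early flower stages.
cutpoints <- c(-2, -1, 0, 1, 2)          # hypothetical category boundaries
p_avg  <- plogis(cutpoints)              # average tree (effect = 0)
p_late <- plogis(cutpoints - 1.5)        # tree with effect -1.5
all(p_late < p_avg)                      # every cumulative probability is lower
```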
(The final analysis of these data used the bayespolr function in arm, but the model given here serves as an example.)
9.3 GEE
Estimators of $\phi$ and $R$.

Pearson residuals are given by

$$r_{it} = \{y_{it} - \hat\mu_{it}\} \big/ \{v(\hat\mu_{it})\}^{1/2}$$

where $\hat\mu_{it}$ depends upon the current value of $\hat\beta$. The scale parameter is estimated by

$$\hat\phi^{-1} = \sum_{i=1}^{m} \sum_{t=1}^{n_i} r_{it}^2 \big/ (N - p) \qquad \left( N = \sum_i n_i, \; p = \text{regression df} \right).$$

An empirical estimate of $R$ is

$$\hat R_{uv} = \sum_{i=1}^{m} r_{iu} r_{iv} \big/ (N - p)$$
Suppose that we wish to model the correlation structure as a function of a parameter $\alpha$ and the distance between observations,

$$\operatorname{corr}(y_{it}, y_{it'}) = \alpha^{|t - t'|}.$$

Since $E(r_{it} r_{it'}) \approx \alpha^{|t-t'|}$, $\log\alpha$ can be estimated as the slope in the regression of $\log(r_{it} r_{it'})$ on $|t - t'|$. Hence we see that the algorithm iterates between estimating $(\hat\beta \mid \hat\alpha, \hat\phi)$ and $(\hat\phi, \hat\alpha \mid \hat\beta)$.
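The slope-based estimate of $\alpha$ can be sketched with simulated residual products. The value of $\alpha$ and the noise model below are illustrative assumptions:

```r
# Residual products with mean alpha^lag; regress log(product) on lag
# and exponentiate the slope to recover alpha.
set.seed(3)
alpha <- 0.6
lags  <- rep(1:4, each = 500)                         # |t - t'| values
prods <- alpha^lags * exp(rnorm(length(lags), sd = 0.2))  # positive, log-safe
fit   <- lm(log(prods) ~ lags)                        # slope estimates log(alpha)
exp(coef(fit)[2])                                     # close to alpha = 0.6
```

In practice some residual products are negative, so moment estimators are usually preferred; the regression version shown here is the idea in its simplest form.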
Example

The data in PastureSpecies.txt contain percentages of 6 groups of pasture species measured at 7 times from 3 farming systems (A, B, C). Within each farm type there are several paddocks, a Farm × Paddock combination being an experimental unit. These are analysed by a binomial GEE using geeglm in the geepack package.
Table 9.2: First 12 rows of PastureSpecies.txt

No  Farm  Padd  Days  Group         Frequency  ExpUnit  Total  fDays
 1  B     1       0   SowFertResp          54  B.1        100      0
 2  B     1       0   NatCoolPeren          4  B.1        100      0
 3  B     1       0   NatWarmPeren          0  B.1        100      0
 4  B     1       0   YrlGrnPer            26  B.1        100      0
 5  B     1       0   WarmAnn               1  B.1        100      0
 6  B     1       0   CoolAnn               9  B.1        100      0
 7  B     1     256   SowFertResp          42  B.1        100    256
 8  B     1     256   NatCoolPeren         21  B.1        100    256
 9  B     1     256   NatWarmPeren          3  B.1        100    256
10  B     1     256   YrlGrnPer            23  B.1        100    256
11  B     1     256   WarmAnn               0  B.1        100    256
12  B     1     256   CoolAnn               8  B.1        100    256
The analysis accounts for 2 sources of correlation: correlation amongst groups and correlation amongst repeated measures. Define $R_g$ as the amongst-groups correlation matrix and $R_t$ as the correlation due to repeated measures from the same experimental unit. These are estimated separately and the working correlation is $R_B = R_t \otimes R_g$. The program below does the job in 3 stages:
1. Run a GLM on the univariate frequency of each group at each sampling and save the residuals. For each sampling, the residuals are saved in a matrix with 6 columns (1 for each group) which is used to calculate the correlation matrix. There are 7 such correlation matrices, one for each sampling, and in this exercise the average of the 7 is used as the estimate of $R_g$. This may not be a suitable choice as the correlation could change with time. This working correlation matrix is named workcorr1.
2. Using workcorr1, a GEE analysis is done of the multivariate frequencies from the 6 groups at each sampling. Residuals are saved and used to construct the working correlation amongst samplings, $R_t$, which is named workcorr2 in the program.
3. The big working correlation matrix, $R_B$, is calculated as above.
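One way to combine two working correlations into a single big matrix is a Kronecker product; a small illustrative sketch follows (the matrices are made up, not the fitted workcorr1/workcorr2):

```r
# Combine a 2x2 repeated-measures correlation and a 2x2 amongst-groups
# correlation into a 4x4 working correlation via the Kronecker product.
Rt <- matrix(c(1, 0.5, 0.5, 1), 2, 2)   # correlation between 2 samplings
Rg <- matrix(c(1, 0.3, 0.3, 1), 2, 2)   # correlation between 2 groups
RB <- kronecker(Rt, Rg)                 # 4 x 4 working correlation
dim(RB)                                 # 4 4
RB[1, 4]                                # 0.5 * 0.3 = 0.15
```

Each entry of $R_B$ is a product of a between-sampling and a between-group correlation, and the diagonal remains 1.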
Bibliography
[1] D.R. Cox and N. Reid. Parameter orthogonality and approximate conditional inference. J. Royal Statist. Soc. B, 49:1-39, 1987.

[2] P. Diggle, K-Y. Liang, and S. Zeger. Analysis of Longitudinal Data. Oxford University Press, 1994.

[3] Andrew Gelman and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.

[4] A. Gilmour, B. Gogel, B. Cullis, S. Welham, and R. Thompson. ASReml user guide. Technical report, VSN International Ltd, Hemel Hempstead, UK, 2002.

[5] K-Y. Liang and S. Zeger. Longitudinal data analysis using generalized linear models. Biometrika, 73:13-22, 1986.

[6] H.D. Patterson and R. Thompson. Recovery of inter-block information when block sizes are unequal. Biometrika, 58:545-554, 1971.

[7] J. Pinheiro and D. Bates. Mixed Effects Models in S and S-PLUS. Springer, NY, 2000.

[8] G. Robinson. That BLUP is a good thing: the estimation of random effects. Statistical Science, 6:15-51, 1991.

[9] David Ruppert, Matt Wand, and Ray Carroll. Semiparametric Regression. Cambridge University Press, 2004.

[10] R. Schall. Estimation in generalized linear models with random effects. Biometrika, 78:719-727, 1991.

[11] T. Speed. Comment on Robinson (1991). Statistical Science, 6:43, 1991.

[12] W.N. Venables and B.D. Ripley. Modern Applied Statistics with S. Springer, 2004.

[13] A. Verbyla. A conditional derivation of residual maximum likelihood. Australian Journal of Statistics, 32:227-230, 1990.
Chapter 10
Appendix - Linear least squares
10.1 Least squares in matrix form

The linear regression relationship between a dependent variable $Y$ and a set of independent (explanatory) variables $x_j$, $j = 1, \ldots, p$ can be expressed in matrix form as
$$Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I) \tag{10.1}$$
where $Y$ is a vector of length $n$, $X$ is a matrix of dimensions $n \times p$ and $\beta$ is a vector of length $p$. For example, for a straight line $E(y_i) = \beta_1 + \beta_2 x_i$, where $E$ represents expected value. In matrix form this becomes
$$\begin{pmatrix} E(y_1) \\ \vdots \\ E(y_n) \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$$
The residual sum of squares (SS) is given by

$$S_r = (Y - X\beta)^T (Y - X\beta). \tag{10.2}$$

The least squares estimate of $\beta$ (the vector of model parameters) is the value that minimises $S_r$; setting the derivative with respect to $\beta$ to zero gives

$$\hat\beta = (X^T X)^{-1} X^T Y.$$
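The least squares estimate $\hat\beta = (X^T X)^{-1} X^T Y$ can be checked numerically against lm() on a tiny made-up data set:

```r
# Closed-form least squares versus lm() for a straight line.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
X <- cbind(1, x)                           # design matrix: intercept and slope
beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # (X'X)^{-1} X'y
beta_hat
coef(lm(y ~ x))                            # agrees with the matrix solution
```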