Journal of Econometrics 137 (2007) 68–111

Asymptotic distribution of the cointegrating vector estimator in error correction models with conditional heteroskedasticity

Byeongseon Seo

Department of Economics, Soongsil University and Texas A&M University, Seoul 156-743, Korea

Tel.: +82 2 820 0552; fax: +82 2 824 4384.

Available online 12 May 2006

doi:10.1016/j.jeconom.2006.03.008

Abstract

This paper explores the asymptotic distribution of the cointegrating vector estimator in error correction models with conditionally heteroskedastic errors. Asymptotic properties of the maximum likelihood estimator (MLE) of the cointegrating vector, which estimates the cointegrating vector and the multivariate GARCH process jointly, are provided. The MLE of the cointegrating vector follows a mixture normal distribution, and its asymptotic distribution depends on the conditional heteroskedasticity and the kurtosis of standardized innovations. The reduced rank regression (RRR) estimator and the regression-based cointegrating vector estimators do not consider conditional heteroskedasticity, and thus the efficiency gain of the MLE emerges as the magnitude of conditional heteroskedasticity increases. The simulation results indicate that the relative power of the t-statistics based on the MLE improves significantly as the GARCH effect increases.

© 2006 Elsevier B.V. All rights reserved.

JEL classification: C13; C32

Keywords: Cointegrating vector; Efficiency gain; Multivariate GARCH

1. Introduction

The notion of cointegration was developed by Engle and Granger (1987), and since then it has been considered important in the recent development of time series econometrics. Many statistical methods have been developed for the analysis of cointegrated systems, and several methods of estimating the cointegrating vector have been proposed. Another development, generalized autoregressive conditional heteroskedasticity (GARCH), was made by Engle (1982) and Bollerslev (1986) to explain the time-varying volatility in the data. This paper explores the asymptotic properties of the maximum likelihood estimator (MLE) of the cointegrating vector in the vector error correction model with conditional heteroskedasticity. Because the existing estimation methods do not consider conditional heteroskedasticity in the data, such a study is needed.

The main objective is to develop the asymptotic properties of the MLE of the
cointegrating vector, which estimates the error correction model and the multivariate GARCH process jointly. The existing estimation methods, including the reduced rank regression (RRR) and the regression-based estimators, allow for, but do not explicitly treat, conditional heteroskedasticity. Their asymptotic distributions are invariant to conditional heteroskedasticity. However, these estimators ignore the information coming from conditional heteroskedasticity. Many authors, including Bollerslev et al. (1992), show that economic variables such as stock prices and exchange rates have time-varying variances. Clustered volatility and thick-tailed distributions are typical characteristics of these variables. Although there is a vast literature on the cointegrating vector and GARCH, the literature on the distribution theory for the cointegrating vector estimator with conditionally heteroskedastic errors is still sparse. This paper fills this gap in the literature by developing an asymptotic theory for the cointegrating vector estimator in error correction models with conditional heteroskedasticity.

In this paper, we find that the MLE of the cointegrating vector follows a mixture normal distribution, and its asymptotic distribution depends on the conditional heteroskedasticity and the kurtosis of standardized errors. The RRR and the regression-based cointegrating vector estimators do not consider conditional heteroskedasticity in the data, and thus the MLE improves efficiency significantly. Statistical inference on the cointegrating vector also depends on heteroskedasticity. The simulation study reveals that the efficiency gain of the MLE emerges significantly as the GARCH effect increases.

The limiting distribution of the cointegrating vector estimator with heteroskedastic errors has been explored by Li et al. (2001) and Seo (2001). Li et al. (2001) investigated the limiting distribution of the cointegrating vector estimator in the partially nonstationary vector autoregressive model with ARCH(1) errors. We consider multivariate GARCH errors, which is a natural extension considering the stylized facts of real data. The distribution theory of the cointegrating vector estimator found by Li et al. (2001) depends on two correlated Brownian motions, which implies a nonstandard asymptotic distribution. In this paper, we show that the MLE of the cointegrating vector follows the mixed normal distribution, and provide an explicit analysis of the efficiency gain. This study also extends Seo (2001) by allowing for multiple cointegration rank.

There are other related papers by Wong and Li (1997), Ling and McAleer (2003), Ling and Li (1998, 2003), and Seo (1999). Wong and Li (1997) and Ling and McAleer (2003) consider the vector autoregressive model with GARCH errors, but they do not consider nonstationarity and cointegration. Ling and Li (1998, 2003) and Seo (1999) explore the asymptotic theory for unit root tests with conditional heteroskedasticity. Here, we consider the cointegrating vector, and thus extend the former results to nonstationary cointegrated models.

We denote $\to_p$ as convergence in probability and $\to_d$ as convergence in distribution, respectively, and $\Rightarrow$ as weak convergence with respect to the uniform metric. $BM(\Omega)$ represents a Brownian motion with long-run variance $\Omega$. Also, $[\cdot]$ is the integer operator, $|\cdot|$ is the Euclidean norm, and vec is the column-stacking operator.

The paper is organized as follows. Section 2 introduces the model and the cointegrating vector estimators. Section 3 develops the asymptotic theory for the cointegrating vector estimators. The error correction model with an intercept is analyzed in Section 4. Section 5 deals with simulation results on the properties of the cointegrating vector estimators.

2. The model

Consider a $p$-dimensional time series $x_t$ generated by the error correction model (ECM) as follows:

\[
\Delta x_t = \alpha \begin{pmatrix} I_r \\ -\beta \end{pmatrix}' x_{t-1} + \sum_{i=1}^{l} \Gamma_i \Delta x_{t-i} + u_t, \tag{1}
\]

where $\alpha$ is the $p \times r$ adjustment vector, and $\beta$ is the $(p-r) \times r$ cointegrating vector.

We assume that the cointegration rank is known and equals $r$. Thus, if we denote Eq. (1) as $\Pi(L)x_t = u_t$, then the rank of $\Pi = -\Pi(1)$ is $r$. We use the normalization of the cointegrating vector with respect to the first $r$ elements of $x_t$. According to our normalization, the cointegrating relationship $w_t$ is defined as follows:

\[
w_t(\beta) = x_{1t} - \beta' x_{2t}, \tag{2}
\]

where $x_{1t}$ is $r$-dimensional and $x_{2t}$ is $(p-r)$-dimensional.

As defined in Engle and Granger (1987), the cointegrating relationship is stationary. Our model is based on the normalization (2). The cointegrating vector can be identified from this representation. The same normalization has been used in many studies such as Phillips (1991).

The error process $u_t$ is assumed to be a vector-valued martingale difference sequence
(MDS) satisfying $E(u_t \mid \mathcal{F}_{t-1}) = 0$ and $E(u_t u_t' \mid \mathcal{F}_{t-1}) = \Omega_t$, where $\mathcal{F}_t$ is the $\sigma$-field generated by $x_{t-i}$ for $i = 0, 1, 2, \ldots$. Thus, our model allows for a time-varying conditional variance, which generalizes the error condition of Engle and Granger (1987) and Johansen (1988, 1991).

Many models of multivariate conditional heteroskedasticity have been developed to explain time-varying covariance, common persistence, and volatility causality. Bollerslev et al. (1988) proposed vector GARCH and diagonal GARCH models. Each element of the covariance follows the GARCH process, and thus we need to estimate a huge number of parameters.¹ Bollerslev (1990) proposed a multivariate GARCH model with constant conditional correlation. This model reduces the number of parameters to a manageable size and it satisfies positive definiteness, and so the model has been used in many empirical studies.

Our model is based on the constant-correlation GARCH specification, which has been proposed by Bollerslev (1990).

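\[
\Omega_t = L^{-1} \Sigma_t L^{-1\prime}, \tag{3}
\]

where $L$ is a lower triangular matrix and $\Sigma_t$ is a diagonal matrix as follows:

\[
\Sigma_t = \begin{pmatrix}
\sigma_{1t}^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma_{2t}^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma_{3t}^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_{pt}^2
\end{pmatrix}, \qquad
L = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
\lambda_{21} & 1 & 0 & \cdots & 0 \\
\lambda_{31} & \lambda_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\lambda_{p1} & \lambda_{p2} & \lambda_{p3} & \cdots & 1
\end{pmatrix}.
\]

¹ For example, if $p = 3$, the vector GARCH model has 78 parameters and the diagonal GARCH model has 18 parameters even though we assume a minimal lag order.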
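The parameter counts quoted in footnote 1 can be checked with a short sketch. It assumes that "minimal lag order" means GARCH(1,1) and that the vector and diagonal models are parameterized on the vech of the covariance matrix; the function names are hypothetical and only illustrate the arithmetic. The last function counts the volatility parameters of the constant-correlation specification used here, namely $3p$ GARCH parameters plus the $p(p-1)/2$ free elements of $L$.

```python
def vech_dim(p: int) -> int:
    """Number of distinct elements in a symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def vector_garch_params(p: int) -> int:
    """vech(H_t) = c + A vech(u u') + B vech(H_{t-1}) with full A and B (assumed GARCH(1,1))."""
    m = vech_dim(p)
    return m + 2 * m * m


def diagonal_garch_params(p: int) -> int:
    """Same model with diagonal A and B: three parameters per covariance element."""
    return 3 * vech_dim(p)


def constant_correlation_params(p: int) -> int:
    """(omega_j, psi_j, phi_j) for each j plus the lower-triangular lambda elements."""
    return 3 * p + p * (p - 1) // 2


if __name__ == "__main__":
    for p in (2, 3, 4):
        print(p, vector_garch_params(p), diagonal_garch_params(p),
              constant_correlation_params(p))
```

For $p = 3$ the first two counts reproduce the 78 and 18 of the footnote, against 12 volatility parameters for the specification in Eq. (3).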

Define $e_t = L u_t$, where $e_t$ is an orthogonalized innovation of $u_t$, satisfying $E(e_t \mid \mathcal{F}_{t-1}) = 0$ and $E(e_t e_t' \mid \mathcal{F}_{t-1}) = \Sigma_t$. We assume that each element of $e_t$ follows the GARCH process as follows:

\[
\sigma_{jt}^2 = \omega_j + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2, \tag{4}
\]

where $\omega_j > 0$, $\psi_j \ge 0$, and $\phi_j \ge 0$ for $j = 1, 2, \ldots, p$.

We note that our model is the vector error correction model with multivariate GARCH errors. The RRR estimator is based on the mean equation, but the MLE estimates the mean and volatility equations jointly. We use the multivariate GARCH model with constant correlation coefficient, and our analysis can be extended to other specifications such as the factor GARCH and the asymmetric GARCH models.

If we denote $X_{t-1}(\beta)$ as the vector of stationary regressors and $\Upsilon$ as its coefficient matrix, then the mean equation (1) can be written as follows:

\[
\Delta x_t = \Upsilon X_{t-1}(\beta) + u_t, \tag{5}
\]

where $X_{t-1}(\beta) = (w_{t-1}(\beta)', \Delta x_{t-1}', \ldots, \Delta x_{t-l}')'$ and $\Upsilon = (\alpha, \Gamma_1, \Gamma_2, \ldots, \Gamma_l)$.

We define the parameter vector $\theta = (b', \theta_2')'$, where $b = \mathrm{vec}(\beta)$, $\theta_2 = (\mathrm{vec}(\Upsilon)', \gamma', \lambda')'$, $\gamma = (\gamma_1', \gamma_2', \ldots, \gamma_p')'$, $\gamma_j = (\omega_j, \psi_j, \phi_j)'$ for $j = 1, 2, \ldots, p$, and $\lambda = (\lambda_{21}, \lambda_{31}, \lambda_{32}, \ldots, \lambda_{pp-1})'$.

Let $\theta_0$ be the true parameter value. We denote $u_t = u_t(\theta_0)$, $e_t = e_t(\theta_0)$, and $\Sigma_t = \Sigma_t(\theta_0)$. Define the parameter space $\Theta$ as $\theta \in \Theta \subset R^k$, where $k = r(2p - r) + p(pl + (p-1)/2 + 3)$.

Let $\Sigma = E\Sigma_t < \infty$ be the unconditional variance of the orthogonalized errors $e_t$, which requires $\phi_j + \psi_j < 1$ for all $j = 1, 2, \ldots, p$. Thus, the volatility process is stationary, which implies a moving average representation.

We define the following:

\[
\sigma_{jt}^2 = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{\infty} \phi_j^k e_{jt-k-1}^2,
\]

for $j = 1, 2, \ldots, p$.

The process $\sigma_{jt}^2$ follows the law of motion (4) with infinite past history. However, based on a sample of $\{x_1, x_2, \ldots, x_n\}$, the volatility process $\sigma_{jt}^2$ cannot be observed by an econometrician.

The volatility process (4), given the startup condition $\sigma_{j0}^2 = \omega_j/(1 - \phi_j)$, has a moving average representation in the form

\[
\sigma_{jt}^2 = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{t-1} \phi_j^k e_{jt-k-1}^2,
\]

for $t = 1, 2, 3, \ldots, n$ and $j = 1, 2, \ldots, p$.
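The equivalence between the recursion (4) with the startup condition and the finite-horizon moving average representation can be verified numerically. The following is a minimal sketch for a single element $j$, assuming NumPy is available; the function names and the convention that the array `e` holds $e_{j0}, \ldots, e_{j,n-1}$ are illustrative choices, not part of the paper.

```python
import numpy as np


def sigma2_recursion(e, omega, psi, phi):
    """sigma2[t], t = 0..n, from Eq. (4) with startup sigma2[0] = omega / (1 - phi)."""
    n = len(e)
    sigma2 = np.empty(n + 1)
    sigma2[0] = omega / (1.0 - phi)
    for t in range(1, n + 1):
        sigma2[t] = omega + psi * e[t - 1] ** 2 + phi * sigma2[t - 1]
    return sigma2


def sigma2_moving_average(e, omega, psi, phi):
    """Same quantity from the finite-horizon moving average representation."""
    n = len(e)
    sigma2 = np.empty(n + 1)
    sigma2[0] = omega / (1.0 - phi)
    for t in range(1, n + 1):
        k = np.arange(t)  # k = 0, ..., t - 1
        sigma2[t] = omega / (1.0 - phi) + psi * np.sum(phi ** k * e[t - k - 1] ** 2)
    return sigma2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.standard_normal(20)
    a = sigma2_recursion(e, omega=1.0, psi=0.2, phi=0.7)
    b = sigma2_moving_average(e, omega=1.0, psi=0.2, phi=0.7)
    print(np.max(np.abs(a - b)))
```

The two routines agree up to floating-point error because $\omega_j + \phi_j \omega_j/(1-\phi_j) = \omega_j/(1-\phi_j)$, so the constant term is preserved at every step of the recursion.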

The distribution theory of the GARCH process has been based on the unobserved volatility representation because the finite horizon representation is not stationary. As discussed in Lee and Hansen (1994) and Lumsdaine (1996), the initial conditions are asymptotically negligible, and the distribution theory using the finite horizon representation is asymptotically equivalent to that of the infinite horizon representation given some regularity conditions. This paper develops the distribution theory in accordance with the asymptotic equivalence of these two volatility representations.

The log-likelihood function, with the auxiliary condition that $u_t \mid \mathcal{F}_{t-1} \sim N(0, \Omega_t)$, is given by

\[
L_n(\theta) = n^{-1} \sum_{t=1}^{n} l_t(\theta), \tag{6}
\]

where

\[
\begin{aligned}
l_t(\theta) &= -0.5 \log |\Omega_t(\theta)| - 0.5\, u_t(\theta)' \Omega_t^{-1}(\theta) u_t(\theta) \\
&= -0.5 \log |\Sigma_t(\theta)| - 0.5\, e_t(\theta)' \Sigma_t^{-1}(\theta) e_t(\theta) \\
&= -0.5 \sum_{j=1}^{p} \left( \log \sigma_{jt}^2(\theta) + \frac{e_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} \right),
\end{aligned}
\]

where $\sigma_{jt}^2(\theta)$ and $e_{jt}(\theta)$ satisfy Eqs. (1)–(4).

The MLE $\hat{\theta}_n$ can be defined as follows:

\[
\hat{\theta}_n = \arg\max_{\theta \in \Theta} L_n(\theta). \tag{7}
\]
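As an illustration of Eq. (6), the average log-likelihood can be evaluated for given mean-equation residuals. This is a sketch only: it takes the residuals $u_t(\theta)$ as an input rather than building them from Eq. (1), assumes NumPy, and sets the variance of the first observation to the startup value $\omega_j/(1-\phi_j)$, which is a simplifying assumption. The function name and argument layout are hypothetical.

```python
import numpy as np


def average_log_likelihood(u, lam, omega, psi, phi):
    """Average of l_t over t for residuals u (n x p), up to the Gaussian constant.

    lam holds (lambda_21, lambda_31, lambda_32, ...) in row-major order;
    omega, psi, phi are length-p GARCH parameters with phi < 1.
    """
    u = np.asarray(u, dtype=float)
    omega, psi, phi = (np.asarray(v, dtype=float) for v in (omega, psi, phi))
    n, p = u.shape

    # Unit lower-triangular matrix L of Eq. (3).
    L = np.eye(p)
    L[np.tril_indices(p, -1)] = lam

    # Orthogonalized innovations e_t = L u_t, stored row by row.
    e = u @ L.T

    # Element-wise GARCH(1,1) recursion of Eq. (4).
    sigma2 = np.empty((n, p))
    sigma2[0] = omega / (1.0 - phi)  # assumed startup for the first observation
    for t in range(1, n):
        sigma2[t] = omega + psi * e[t - 1] ** 2 + phi * sigma2[t - 1]

    l_t = -0.5 * np.sum(np.log(sigma2) + e ** 2 / sigma2, axis=1)
    return l_t.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal((250, 2))
    print(average_log_likelihood(u, [0.3], [1.0, 1.0], [0.1, 0.1], [0.8, 0.8]))
```

A numerical optimizer applied to the negative of this function, with $u_t(\theta)$ recomputed from the mean equation at each trial value, would give one possible route to the maximization in Eq. (7).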

We use the following derivatives:

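\[
\frac{\partial l_t(\theta)}{\partial b} = -\sum_{j=1}^{p} A_j' \otimes \left[ \frac{x_{2t-1} e_{jt}(\theta)}{\sigma_{jt}^2(\theta)} - \frac{h_j(L) x_{2t-2} e_{jt-1}(\theta)\, \eta_{jt}(\theta)}{\sigma_{jt}^2(\theta)} \right],
\]

\[
\begin{aligned}
\frac{\partial^2 l_t(\theta)}{\partial b\, \partial b'} = -\sum_{j=1}^{p} A_j' A_j \otimes \Bigg[ & \frac{x_{2t-1} x_{2t-1}'}{\sigma_{jt}^2(\theta)} + 2\, \frac{h_j(L) x_{2t-2} e_{jt-1}(\theta)\, h_j(L) x_{2t-2}' e_{jt-1}(\theta)}{\sigma_{jt}^4(\theta)} \left( 1 + 2 \eta_{jt}(\theta) \right) \\
& - 2\, \frac{h_j(L) x_{2t-2} e_{jt-1}(\theta)\, x_{2t-1}' e_{jt}(\theta)}{\sigma_{jt}^4(\theta)} - 2\, \frac{x_{2t-1} e_{jt}(\theta)\, h_j(L) x_{2t-2}' e_{jt-1}(\theta)}{\sigma_{jt}^4(\theta)} \\
& - \frac{h_j(L) x_{2t-2} x_{2t-2}'\, \eta_{jt}(\theta)}{\sigma_{jt}^2(\theta)} \Bigg],
\end{aligned}
\]

where $\eta_{jt}(\theta) = e_{jt}^2(\theta)/\sigma_{jt}^2(\theta) - 1$, $h_j(L) x_{2t-2} e_{jt-1}(\theta) = \psi_j \sum_{k=0}^{t-1} \phi_j^k x_{2t-k-2} e_{jt-k-1}(\theta)$, and $A_j$ is the $j$th row of $A = L\alpha$ for $j = 1, 2, \ldots, p$.

Because the MLE $\hat{\theta}_n$ maximizes the likelihood function, we get $\sum_{t=1}^{n} \partial l_t(\hat{\theta}_n)/\partial \theta = 0$. In our model, the conditional variance depends on the cointegrating vector, and thus the first-order condition accompanies the volatility adjustment. The Hessian matrix and the outer product of gradients also entail the volatility adjustment.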

The likelihood function depends on a number of parameters, and redundant parameters may lead to a singularity error. In particular, the Hessian matrix tends to be near-singular when the volatility equation is specified with redundant parameters. Thus, it is necessary to achieve a parsimonious specification by using the associated diagnostic tests. In some cases, the factor GARCH model and the conditional error correction model can be used to reduce the number of redundant parameters. Once the likelihood function is specified, the MLE can be computed in any statistical software that is capable of running a maximum likelihood procedure.

The RRR estimator is based on the mean equation (5) and can be computed by using RRR (Ahn and Reinsel, 1988) or canonical analysis (Box and Tiao, 1977). Other slope parameters can be estimated by least squares once the cointegrating vector is estimated. We denote the RRR estimator as $\tilde{\beta}_n$ and other estimators as $\tilde{\Upsilon}_n$. The RRR estimator $\tilde{\beta}_n$ is super-consistent, and thus its estimates can be used as the initial values for an algorithm to maximize the likelihood function.
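A reduced rank regression of this kind can be sketched as follows. The sketch makes several simplifying assumptions that are not part of the paper: no lagged differences ($l = 0$), no intercept, and NumPy as the only dependency. It solves the usual canonical-correlation eigenvalue problem between $\Delta x_t$ and $x_{t-1}$ and then imposes the normalization (2); the function name is hypothetical.

```python
import numpy as np


def rrr_cointegrating_vector(x, r=1):
    """Reduced rank regression of dx_t on x_{t-1} (no lags, no intercept).

    x is an (n + 1) x p array of levels. Returns the (p - r) x r matrix beta
    under the normalization w_t = x_1t - beta' x_2t.
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x, axis=0)  # Delta x_t
    x_lag = x[:-1]           # x_{t-1}
    n = dx.shape[0]

    s00 = dx.T @ dx / n
    s01 = dx.T @ x_lag / n
    s11 = x_lag.T @ x_lag / n

    # Solve S10 S00^{-1} S01 v = lambda S11 v through a Cholesky transformation.
    c = np.linalg.cholesky(s11)  # S11 = c c'
    c_inv = np.linalg.inv(c)
    m = s01.T @ np.linalg.solve(s00, s01)
    _, eigvec = np.linalg.eigh(c_inv @ m @ c_inv.T)  # eigenvalues ascending

    # Eigenvectors belonging to the r largest eigenvalues span the cointegration space.
    v = c_inv.T @ eigvec[:, ::-1][:, :r]

    # Normalize so that the first r rows form an identity matrix: beta* = (I_r, -beta')'.
    beta_star = v @ np.linalg.inv(v[:r, :])
    return -beta_star[r:, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, alpha = 500, np.array([-0.2, 0.2])
    x = np.zeros((n + 1, 2))
    for t in range(1, n + 1):
        w = x[t - 1, 0] - 1.0 * x[t - 1, 1]
        x[t] = x[t - 1] + alpha * w + rng.standard_normal(2)
    print(rrr_cointegrating_vector(x, r=1))
```

The output of such a routine could serve as the initial value of $\beta$ in a numerical maximization of the likelihood, in line with the super-consistency argument above.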

3. Main results

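If the cointegration rank is known and equals $r$, then there exist $p \times r$ full column rank matrices $\alpha$ and $\beta^*$ satisfying $\Pi = \alpha \beta^{*\prime}$. Let $\alpha_\perp$ and $\beta^*_\perp$ be $p \times (p - r)$ full column rank matrices such that $\alpha_\perp' \alpha = 0$ and $\beta_\perp^{*\prime} \beta^* = 0$. From the representation theorem by Engle and Granger (1987), the error correction model (1) has the following representation:

\[
\Delta x_t = C(L) u_t, \tag{8}
\]

\[
x_t = C(1) \sum_{i=1}^{t} u_i + \Phi(L) u_t, \tag{9}
\]

\[
w_t = \beta^{*\prime} x_t = \beta^{*\prime} \Phi(L) u_t, \tag{10}
\]

where $C(1) = \beta^*_\perp (\alpha_\perp' \Pi^*(1) \beta^*_\perp)^{-1} \alpha_\perp'$, $\Pi^*(L) = (\Pi(L) - \Pi(1))/(1 - L)$, and $\Phi(L) = (C(L) - C(1))/(1 - L)$.

The ECM representation holds if $E|u_t|^2 < \infty$ and $\sum_{k=1}^{\infty} k |C_k| < \infty$. Thus, $x_t$ involves stochastic trends and a stationary component. Because the null space of $C(1)$ is spanned by the cointegration space, $\beta^{*\prime} C(1) = 0$ and $C(1)\alpha = 0$. The cointegrating vector eliminates the stochastic trends; hence, the cointegrating relationship $w_t$ is stationary. We denote $C_2(1)$ as the partitioned matrix of $C(1)$ which corresponds to $x_{2t}$; hence its dimension is $(p - r) \times p$.

Define the standardized innovations $\varepsilon_t = \Sigma_t^{-1/2} e_t$. That is, $\varepsilon_{jt} = e_{jt}/\sigma_{jt}$ for $j = 1, 2, \ldots, p$.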

We assume the following conditions.

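Assumption 1.
(a) $\varepsilon_{jt} \sim$ i.i.d.$(0, 1)$, $E\varepsilon_{jt}^6 < \infty$, and $\varepsilon_{jt}$ has a continuous and symmetric density for $j = 1, 2, \ldots, p$.
(b) $E|u_t|^{m_1} < \infty$ for some $m_1 > 2$.
(c) $\sum_{k=1}^{\infty} k^2 |C_k| < \infty$, where $C(L) = \sum_{k=0}^{\infty} C_k L^k$ and $\Delta x_t = C(L) u_t$.
(d) $\omega_j \ge \underline{\omega}_j > 0$ for $j = 1, 2, \ldots, p$.
(e) $\Theta$ is compact.
(f) For some $m_1 > m_2 > 2$, $\{w_{t-1}/\sigma_{jt}^2,\ \Delta x_{t-i}/\sigma_{jt}^2,\ w_{t-k-1} e_{jt-k}^2/\sigma_{jt}^4,\ \Delta x_{t-k-1} e_{jt-k}^2/\sigma_{jt}^4;\ i = 1, 2, \ldots, l;\ j = 1, 2, \ldots, p;\ k \ge 1\}$ is a zero mean, strictly stationary, and strong mixing process with mixing coefficient $\alpha_k = O(k^{-c})$ such that $c > m_1 m_2/(m_1 - m_2)$.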

Assumptions 1(a) and 1(b) imply that $\{\sigma_{jt}^2, e_{jt}\}$ is strictly stationary and $\beta$-mixing (or absolutely regular) with exponential decay for $j = 1, 2, \ldots, p$ as shown by Carrasco and Chen (2002) and He and Teräsvirta (1999). In addition, the volatility process $\{\sigma_{jt}^2\}$ is weakly stationary from Assumption 1(b), which justifies the moving average representation of $\{\sigma_{jt}^2\}$. Assumptions 1(b) and 1(c) imply that $\{\Delta x_t, w_t\}$ is square integrable. The volatility processes are strictly positive from Assumption 1(d). Assumption 1(e) implies that the parameter space is bounded. The volatility parameters are bounded from Assumption 1(b). Assumption 1(f) can be verified by assuming the smooth density condition because the ECM representations (8) and (10) imply that the process $\{w_t, \Delta x_t\}$ is stationary and satisfies the sufficient conditions for strong mixing suggested in Chanda (1974) and Gorodetskii (1977).

The multivariate invariance principle of Phillips and Durlauf (1986) implies the following:

Lemma 1. Under Assumption 1,

\[
n^{-1/2} \sum_{t=1}^{[ns]} u_t \Rightarrow U(s) \sim BM(\Omega), \tag{11}
\]

\[
n^{-1/2} x_{2[ns]} \Rightarrow C_2(1) U(s), \tag{12}
\]

\[
n^{-1/2} \sum_{t=1}^{[ns]} w_t \Rightarrow \beta^{*\prime} \Phi(1) U(s), \tag{13}
\]

where $\Omega = E\Omega_t$.

3.1. Stochastic equicontinuity

The asymptotic theory of the cointegrating vector estimator involves the tightness of the Hessian matrix and the parameter restriction, which can be verified if consistency holds. As Saikkonen (1993, 1995) has shown, the asymptotic distribution and consistency of the cointegrating vector estimator in nonstationary cointegrated models cannot be achieved by the standard tightness condition, which has been used in models with stationary variables, as in Andrews (1987) and Newey (1991). The cointegrated systems involve nonstationary variables with unbounded variance. Besides, the convergence rate of the cointegrating vector estimator is different from that of the short-run parameters. Thus, an appropriate tightness condition is necessary to establish the distribution theory.

We define a diagonal matrix $D_n = \mathrm{diag}(D_{1n}, D_{2n})$, where $D_{1n} = \mathrm{diag}(n, n, \ldots, n)$ and $D_{2n} = \mathrm{diag}(\sqrt{n}, \sqrt{n}, \ldots, \sqrt{n})$ correspond to the parameter vectors $b$ and $\theta_2$, respectively. The gradient vector, the Hessian matrix, and the outer product of gradients can be defined as follows:

\[
G_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial \theta},
\]

\[
H_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial \theta\, \partial \theta'} D_n^{-1},
\]

and

\[
P_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial \theta} \frac{\partial l_t(\theta)}{\partial \theta'} D_n^{-1}.
\]

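Definition 1 (Stochastic equicontinuity). $X_n(\theta)$ is stochastically equicontinuous on $\Theta_\delta$ if, for every $\epsilon > 0$ and $\eta > 0$, there exists $N(\epsilon, \eta)$ such that $n \ge N(\epsilon, \eta)$ implies, for all $\delta > 0$,

\[
P\left( \sup_{\theta \in \Theta_\delta} |X_n(\theta) - X_n(\theta_0)| > \epsilon \right) \le \eta,
\]

where $\Theta_\delta = \{\theta \in \Theta : |D_n(\theta - \theta_0)| \le \delta\}$ and $D_n \theta = \sqrt{n}\,(b', \theta_2')'$.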

Our definition is the tightness condition of Saikkonen (1993, 1995), and it is based on the normalized parameter space to allow for differences in the convergence rates. We denote $X_n(\theta) - X_n(\theta_0) = o_p(1)$ if $X_n(\theta)$ is stochastically equicontinuous.

Lemma 2. (1) Under Assumption 1, $L_n(\theta) - L_n(\theta_0) = o_p(1)$.

(2) Assumption 1 implies

\[
\lim_n P_{n,\theta_0} \left\{ \sup_{\theta \in N_\delta} \left[ L_n(\theta) - L_n(\theta_0) \right] < 0 \right\} = 1,
\]

for every $\delta > 0$, where $N_\delta = \{\theta \in \Theta : |\sqrt{n}(b - b_0)| > \delta\}$. Therefore, $\lim_n P_{n,\theta_0}\{\hat{\theta}_n \in N_\delta^c\} = 1$ and $\sqrt{n}(\hat{b}_n - b_0) \to_p 0$, where $N_\delta^c = \{\theta \in \Theta : |\sqrt{n}(b - b_0)| \le \delta\}$.

(3) If Assumption 1 holds and $\|u_t\|_6 < \infty$, then $H_n(\theta) - H_n(\theta_0) = o_p(1)$.

Lemma 2(1) shows that the likelihood function is stochastically equicontinuous on the local neighborhood of the true parameter value. As the parameter values deviate farther from the local neighborhood, the integrated regressors amplify the squared errors, which lowers the likelihood function sharply. However, the volatility increases at the same time, which moderates the decline in the likelihood function.

The MLE $\hat{\theta}_n$ exists because the likelihood function is continuous and the parameter space is compact. The consistency of the MLE $\hat{b}_n$ in Lemma 2(2) is based on the sufficient condition for consistency, which has been used in Wu (1981) and Saikkonen (1995). The standard theory of consistency does not apply as the model involves different rates of convergence. However, this condition holds under Assumption 1, and hence the MLE $\hat{b}_n$ is consistent.

The consistency of the short-run parameters can be based on the consistency of the long-run parameters. Given the convergence rate of $\hat{b}_n$, the analysis of the ECM reduces to that of the stationary VAR. The standard theory of consistency such as Ling and McAleer (2003) can be applied to show the consistency of the short-run parameters.

Lemma 2(3) shows that the tightness condition of the Hessian matrix can be satisfied under the moment condition $\|u_t\|_6 < \infty$. Lee and Hansen (1994) and Lumsdaine (1996) derived the stochastic equicontinuity of the Hessian matrix in the GARCH model. In particular, Lee and Hansen (1994) have shown that $\|\varepsilon_t\|_4 < \infty$ is sufficient for stochastic equicontinuity. However, this result cannot be applied to our analysis as our model allows nonstationary regressors in the mean equation.

Because the likelihood function and the Hessian matrix of our model contain volatility adjustments, the asymptotic theory of the MLE depends on heavy moment conditions. In particular, the distribution theory requires the tightness of the Hessian matrix, which can be justified when the errors $u_t$ satisfy strong moment conditions. For example, Ling and McAleer (2003) developed the distribution theory for the vector ARMA-GARCH model under the moment condition $\|u_t\|_6 < \infty$. Li et al. (2001) developed the limiting distribution of the cointegrating vector estimator with the ARCH(1) process under the condition $\|u_t\|_4 < \infty$. However, it is not easy to compare the moment conditions directly because their distribution theory is based on finite-dimensional convergence.

The condition of bounded moments may well be treated as a sufficient condition for the main results, and it may not be a crucial burden in a practical sense. However, as noted by Lumsdaine (1996), the moment condition restricts the parameter space severely, and hence the estimated parameter values in empirical studies often fail to satisfy even the fourth moment condition.

As shown by Carrasco and Chen (2002), the moment condition $\|u_t\|_{2m} < \infty$ can be implied by $\|\varepsilon_t\|_{2m} < \infty$ and $E|\phi_j + \psi_j \varepsilon_{jt}^2|^m < 1$ for an integer $m \ge 1$ and for all $j = 1, 2, \ldots, p$. As mentioned before, the moment condition restricts the parameter space seriously when we set $m = 2$ or 3. Therefore, we assume stochastic equicontinuity of the Hessian matrix directly and explore the asymptotic distribution of the cointegrating vector estimator under the minimal restriction on the parameter space.

Assumption 2. $\sup_{\theta \in \Theta_\delta} |H_n(\theta) - H_n(\theta_0)| \to_p 0$.

\[
H_{12n}(\theta_0) \to_p 0,
\]

where $H_{12n}(\theta) = n^{-3/2} \sum_{t=1}^{n} \partial^2 l_t(\theta)/\partial b\, \partial \theta_2'$.

Therefore, under Assumptions 1–2,

\[
n(\hat{b} - b_0) = -h_n(\theta_0)^{-1} g_n(\theta_0) + o_p(1), \tag{14}
\]

where

\[
g_n(\theta) = n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial b},
\]

and

\[
h_n(\theta) = n^{-2} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial b\, \partial b'}.
\]

3.2. Asymptotic distribution

Let $z_{jt} = e_{jt}/\sigma_{jt}^2$ and $q_{jt,k} = e_{jt-k}\, \eta_{jt}/\sigma_{jt}^2$ for $k \ge 1$ and $j = 1, 2, \ldots, p$. Because $e_{jt}$, $z_{jt}$, and $q_{jt,k}$ are martingale difference sequences for all $j = 1, 2, \ldots, p$ and $k \ge 1$, Assumptions 1(a) and 1(b) imply the following from the invariance principle of Phillips and Durlauf (1986).

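\[
n^{-1/2} \sum_{t=1}^{[ns]} \begin{pmatrix} e_{jt} \\ z_{jt} \\ q_{jt,k} \end{pmatrix}
\Rightarrow
\begin{pmatrix} E_j(s) \\ Z_j(s) \\ Q_{j,k}(s) \end{pmatrix}
\sim BM \begin{pmatrix} \sigma_j^2 & 1 & 0 \\ 1 & 1/B_j^2 & 0 \\ 0 & 0 & (\kappa_j - 1)\xi_{j,k}^2 \end{pmatrix}, \tag{15}
\]

where $\sigma_j^2 = E\sigma_{jt}^2$, $1/B_j^2 = E(1/\sigma_{jt}^2)$, $\xi_{j,k}^2 = E(e_{jt-k}^2/\sigma_{jt}^4)$, and $\kappa_j = E\varepsilon_{jt}^4$.

We note that $E_j(s)$ and $Q_{j,k}(s)$ are independent for each $j = 1, 2, \ldots, p$ and $k \ge 1$. Also, $Z_j(s)$ is independent of $Q_{j,k}(s)$ for each $j = 1, 2, \ldots, p$ and $k \ge 1$.

Let $Q_j(s) = \sum_{k=1}^{\infty} h_{j,k} Q_{j,k}(s) \sim BM((\kappa_j - 1)\Xi_j)$, where $h_{j,k} = \psi_j \phi_j^{k-1}$ and $\Xi_j = \sum_{k=1}^{\infty} h_{j,k}^2 \xi_{j,k}^2$, which is finite because $h_{j,k}$ decays exponentially and $\xi_{j,k}^2$ is finite for all $j$ and $k$. We denote $E(s)$, $Z(s)$, and $Q(s)$ as $p$-dimensional vectors of $E_j(s)$, $Z_j(s)$, and $Q_j(s)$, respectively. Note that $E(s) = L U(s)$.

Define $W_1(s)$ and $W_2(s)$ as follows: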

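\[
\begin{pmatrix} W_1(s) \\ W_2(s) \end{pmatrix}
= \begin{pmatrix} -A'\{Z(s) - Q(s)\} \\ C_2(1) U(s) \end{pmatrix}
\sim BM \begin{pmatrix} \mu & 0 \\ 0 & C_2(1)\, \Omega\, C_2(1)' \end{pmatrix},
\]

where $\mu = A' \Sigma^{-1/2} M \Sigma^{-1/2} A$, $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)$, $M$ is a diagonal matrix with the element $\sigma_j^2/B_j^2 + (\kappa_j - 1)H_j$, and $H_j = \sum_{k=1}^{\infty} h_{j,k}^2 E(\sigma_j^2 e_{jt-k}^2/\sigma_{jt}^4)$ for $j = 1, 2, \ldots, p$.

We can show that $W_1(s)$ and $W_2(s)$ are mutually independent by using the ECM representation theorem and Eq. (15). We also define $\nu = A' \Sigma^{-1/2} N \Sigma^{-1/2} A$, where $N$ is a diagonal matrix with the element $\sigma_j^2/B_j^2 + 2H_j$ for $j = 1, 2, \ldots, p$. If $r = 1$, then $\mu = \sum_{j=1}^{p} (A_j^2/\sigma_j^2)\,(\sigma_j^2/B_j^2 + (\kappa_j - 1)H_j)$ and $\nu = \sum_{j=1}^{p} (A_j^2/\sigma_j^2)\,(\sigma_j^2/B_j^2 + 2H_j)$.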


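\[
g_n(\theta_0) \Rightarrow \int_0^1 dW_1(s) \otimes W_2(s), \tag{16}
\]

\[
h_n(\theta_0) \Rightarrow -\nu \otimes \int_0^1 W_2(s) W_2(s)'\, ds, \tag{17}
\]

and

\[
p_n(\theta_0) \Rightarrow \mu \otimes \int_0^1 W_2(s) W_2(s)'\, ds, \tag{18}
\]

where $p_n(\theta) = n^{-2} \sum_{t=1}^{n} (\partial l_t(\theta)/\partial b)(\partial l_t(\theta)/\partial b')$.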

If $\varepsilon_{jt}$ is Gaussian for each $j$, then $\kappa_j = 3$ and $\mu = \nu$, which implies that the negative Hessian and the outer product have the same distribution. However, if the distribution of $\varepsilon_{jt}$ is not normal for some $j$, then the variance of the score does not coincide with that based on the Hessian matrix, as discussed in White (1982). Thus, statistical inference on the cointegrating vector depends on the covariance estimation.

To derive the asymptotic distribution of the RRR estimator $\tilde{\beta}_n$, we define $W_{1r}(s) = -A' \Sigma^{-1} E(s) \sim BM(\mu_r)$, where $\mu_r = A' \Sigma^{-1} A$. We note that $W_{1r}(s)$ is independent of $W_2(s)$.

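Theorem 1. Under Assumptions 1–2,

\[
n(\hat{\beta}_n - \beta_0) \Rightarrow \left( \int_0^1 W_2 W_2' \right)^{-1} \int_0^1 W_2\, dW_1'\, \nu^{-1}. \tag{19}
\]

If Assumptions 1(b), (c), and (e) hold,

\[
n(\tilde{\beta}_n - \beta_0) \Rightarrow \left( \int_0^1 W_2 W_2' \right)^{-1} \int_0^1 W_2\, dW_{1r}'\, \mu_r^{-1}. \tag{20}
\]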

First, we note that the asymptotic distribution of the MLE is a mixture normal with a variance of $\nu^{-1} \mu \nu^{-1} \otimes (\int_0^1 W_2 W_2')^{-1}$. Li et al. (2001) considered the cointegrating vector estimator in the vector autoregressive model with ARCH(1) errors. The limiting distribution of the cointegrating vector estimator found by Li et al. (2001) is a functional of two correlated Brownian motions, which implies that the cointegrating vector estimator follows a nonstandard asymptotic distribution. Theorem 1 shows that the MLE of the cointegrating vector has the mixed normal distribution, and therefore the inference on the cointegrating vector can be based on the standard theory.

Second, the RRR estimator is also asymptotically distributed as mixture normal with a variance of $\mu_r^{-1} \otimes (\int_0^1 W_2 W_2')^{-1}$. Because the first-order condition and the Hessian matrix of the RRR estimator do not accompany the volatility adjustment, Assumptions 1(b), (c), and (e) are sufficient for the limiting distribution of the RRR estimator. The distribution theory for the RRR estimator has been explored by Johansen (1988, 1991) and Seo (1998), where white noise errors are assumed. This paper considers conditionally heteroskedastic errors, and we find that the asymptotic distribution of the RRR estimator is invariant to conditional heteroskedasticity.

Third, if there is no GARCH effect, then $\mu = \nu = \mu_r = A' \Sigma^{-1} A$ because $H_j = 0$ and $\sigma_j^2/B_j^2 = 1$ for all $j = 1, 2, \ldots, p$. In this case, the asymptotic distribution of the MLE is the same as that of the RRR estimator.

Fourth, the variance of the MLE depends on the adjustment vector $\alpha$, correlation matrix $L$, kurtosis $\kappa_j$, and the magnitude of the GARCH effect, which can be represented by $H_j$ and $\sigma_j^2/B_j^2$. As $\alpha$ approaches zero, the cointegrating relationship becomes weaker and the variance of the MLE increases to infinity. In the same way, a weak cointegrating relationship increases the variance of the RRR estimator.

The GARCH effect magnifies the unconditional variance of the error term, which leads to an increase in the variance of the MLE. In our model, the intercept of the volatility equation is fixed. The unconditional variance can be invariant to the GARCH effect when the intercept varies depending on the GARCH parameters. However, the MLE uses the information of conditional heteroskedasticity, which lowers the variance of the MLE. The GARCH effect increases $H_j$ and $\sigma_j^2/B_j^2$ for $j = 1, 2, \ldots, p$, which generates the gain in relative efficiency of the MLE. On the other hand, the RRR estimator does not consider the information of conditional heteroskedasticity. Thus, the variance of the RRR estimator always increases in the GARCH effect.

The relative efficiency gain of the MLE compared to the RRR estimator can be measured by $\gamma$ as follows:

\[
\gamma = \nu^{-1} \mu \nu^{-1} \mu_r, \tag{21}
\]

where $\Gamma = \mathrm{Var}(n\hat{b}_n)\, \mathrm{Var}(n\tilde{b}_n)^{-1} = \gamma \otimes I$.

The efficiency gain $\gamma$ depends on the magnitude of the GARCH effect, the fourth moment $\kappa_j$, and $\sigma_j^2/B_j^2$ for $j = 1, 2, \ldots, p$. To simplify analysis, we define the partial efficiency gain $\gamma_j$ as follows:

\[
\gamma_j = \frac{\sigma_j^2/B_j^2 + (\kappa_j - 1)H_j}{\left( \sigma_j^2/B_j^2 + 2H_j \right)^2}.
\]
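The formula for the partial efficiency gain is simple enough to evaluate directly once its three ingredients are available. The sketch below only codes the arithmetic; the ratio $\sigma_j^2/B_j^2$ and the term $H_j$ are treated as given inputs (in practice they would have to be computed or simulated from the volatility parameters), and the function and argument names are hypothetical.

```python
def partial_efficiency_gain(jensen_ratio, h, kappa=3.0):
    """gamma_j = (ratio + (kappa - 1) * H) / (ratio + 2 * H) ** 2.

    jensen_ratio : sigma_j^2 / B_j^2, equal to 1 when there is no GARCH effect
    h            : H_j, equal to 0 when there is no GARCH effect
    kappa        : fourth moment of the standardized innovation (3 if Gaussian)
    """
    numerator = jensen_ratio + (kappa - 1.0) * h
    denominator = (jensen_ratio + 2.0 * h) ** 2
    return numerator / denominator


if __name__ == "__main__":
    # No GARCH effect: the MLE and the RRR estimator are equally efficient.
    print(partial_efficiency_gain(1.0, 0.0))
    # Illustrative input values only; Gaussian versus fat-tailed innovations.
    print(partial_efficiency_gain(1.2, 0.3, kappa=3.0))
    print(partial_efficiency_gain(1.2, 0.3, kappa=6.0))
```

With Gaussian innovations the numerator and the square root of the denominator coincide, so $\gamma_j$ reduces to $1/(\sigma_j^2/B_j^2 + 2H_j)$, which falls below one as soon as there is a GARCH effect.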

As the GARCH effect increases, $H_j$ increases. Thus, $\gamma_j$ decreases, and the relative efficiency of the MLE improves. If the fourth moment $\kappa_j$ is larger than 3, $\gamma_j$ increases and lowers the efficiency gain. Thus, the efficiency of the MLE can be affected by the specification error. The efficiency gain also depends on Jensen's ratio $\sigma_j^2/B_j^2$ because this ratio increases in the GARCH effect.

Fig. 1 shows the partial efficiency gain of the MLE of the cointegrating vector in an error correction model with GARCH errors. The theoretical efficiency gain is calculated as a function of the volatility parameters $\psi_j$ and $\phi_j$. The standardized innovations $\varepsilon_{jt}$ are assumed to follow the standard normal distribution. If the volatility parameters are not large, the efficiency improves slowly. However, as the volatility parameters become larger, a significant amount of efficiency gain emerges. The overall efficiency gain $\gamma$ can be larger than the partial efficiency gain $\gamma_j$ depending on the correlation coefficient and the adjustment coefficient. Fig. 1 is based on the asymptotic theory, but the small sample performance will be affected by the estimation error, which increases uncertainty. As the sample size increases, the estimation error decreases and the efficiency gain approaches the theoretical values.

Fig. 1. GARCH effect and efficiency gain.

3.3. Statistical inference

Suppose we want to test a set of linear restrictions based on the null hypothesis as follows:

\[
H_0 : Rb = r_q,
\]

where $R$ is a $q \times r(p - r)$ matrix, and $r_q$ is the $q$-dimensional vector.

The covariance matrix of the cointegrating vector estimator can be estimated by using the Hessian matrix and the outer product of the gradient. When the model is correctly specified, the negative Hessian matrix is equivalent to the outer product matrix. However, our model may not be correctly specified, and in that case we use the robust covariance estimator. Thus, we may define three t-statistics or Wald statistics according to the covariance estimator.

The Wald statistics can be defined according to the covariance estimation method as follows:

\[
W_{jn} = n(R\hat{b} - r_q)' \left[ R\, \mathrm{Var}_j(n\hat{b})\, R' \right]^{-1} n(R\hat{b} - r_q),
\]

where $j = I$ using the information matrix, $j = P$ using the inverse of the outer product matrix, and $j = W$ using White's robust covariance estimator.

To derive the distribution of the Wald statistics, we define a $q$-dimensional random variable $J$ as follows:

\[
J = Q^{-1/2} R \left[ \nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2(s)'\, ds \right)^{-1} \right] \int_0^1 dW_1(s) \otimes W_2(s),
\]

where $Q = R \left[ \nu^{-1} \mu \nu^{-1} \otimes (\int_0^1 W_2(s) W_2(s)'\, ds)^{-1} \right] R'$.

The random variable $J$ follows the standard normal distribution. We define $Q_\nu$ and $Q_\mu$ as follows:

\[
Q_\nu = R \left[ \nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2(s)'\, ds \right)^{-1} \right] R',
\]

\[
Q_\mu = R \left[ \mu^{-1} \otimes \left( \int_0^1 W_2(s) W_2(s)'\, ds \right)^{-1} \right] R'.
\]

Theorem 2. Under the null hypothesis $H_0 : Rb = r_q$ and Assumptions 1–2,

\[
W_{Wn} \Rightarrow J'J, \tag{22}
\]

\[
W_{In} \Rightarrow J' Q_I J, \tag{23}
\]

and

\[
W_{Pn} \Rightarrow J' Q_P J, \tag{24}
\]

where $Q_I = Q^{1/2} Q_\nu^{-1} Q^{1/2}$ and $Q_P = Q^{1/2} Q_\mu^{-1} Q^{1/2}$.

n1=2qt

!L0Z QN0;L0S1=2MS1=2L,

ARTICLE IN PRESSt1

n1Xnt1

q2lty0qt qt0

!p L0S1=2NS1=2L,

n3=2Xnt1

q2lty0qb qt0

) A0S1=2NS1=2LZ 10

W 2,

where Z Z1 and Q Q1 are the vector-valued random variables defined in Section 3.

Dene the demeaned Brownian motion W 2s W 2s RW 2sds.

Theorem 3. Under Assumptions 12,

nb^n b0 )Z 1

W 2W02

1 Z 1W 2 dW

01n1. (25)The Wald statistic based on the robust covariance estimator follows the chi-squareddistribution with q degrees of freedom, where q is the number of restrictions in the nullhypothesis. If we use the information matrix, the Wald statistic follows a chi-squareddistribution up to the scale effect QI . If the cointegration rank r equals 1, m and n becomescalars, and thus QI m=nIq. If the covariance is estimated by the inverse of the outerproduct of gradients, the distribution of the Wald statistic is also chi-squared up to thescale effect QP. By the same token, QP m2=n2Iq if r 1. Therefore, if the covariancematrix is estimated by the information matrix or the outer product of gradients, excesskurtosis tends to amplify the Wald statistics, which may lead to the over-rejection of thenull hypothesis. If the distribution of jt is normal, or if kj 3 for all j, these nuisanceparameters disappear since m n. Thus, the scale effect disappears given normality, ormore generally, kj 3. However, statistical inference using the robust covarianceestimator works properly even without normality.

4. ECM with an intercept

Suppose the nonstationary variable xt contains nonzero mean. It is natural to include anintercept in the vector error correction model as follows:

Dxt t a1

b

!0xt1

Xli1

GiDxti ut.

The error ut follows the multivariate GARCH process as in the model without intercept.The parameter vector is dened as y b0; t0; y020. The likelihood function, score, andHessian matrix can be dened in the same way as before. We denote y^n as the MLE and ~ynas the RRR estimator.We use the following asymptotic results:


Xn qlty0 d

0 0

If Assumptions 1(b), (c), and (e) hold,

n ~bn b0 )Z 10

W 2W02

1 Z 10

W 2 dW01rm

1r . (26)

In the ECM with an intercept, the asymptotic distribution of the MLE is a mixture

normal with a variance of n1mn1 R 10 W 2W 02 1. Also, the RRR estimator isasymptotically distributed as mixture normal with a variance of m1r

R 10 W

2W

02 1.

Thus, the cointegrating vector estimators follow the mixed normal asymptotic distribution. Moreover, as in the model without deterministic trends, the efficiency gain of the MLE depends on the magnitude of the GARCH effect.

Our results can be extended to the ECM with deterministic trends. When the data generating process $x_t$ contains deterministic trends, we consider the ECM with the corresponding trend variables. In that case, the asymptotic distribution is based on the detrended Brownian motions. In addition, we may use the detrended variables to reduce the number of parameters to estimate. The detrended variables remove the deterministic trends, and the asymptotic distribution of the cointegrating vector estimator is based on the detrended Brownian motions. Therefore, the cointegrating vector estimator follows the mixed normal distribution in the ECM with deterministic trends.

5. Simulation evidence

In this section, we examine the finite sample properties of the cointegrating vector estimators using Monte Carlo simulation. The experiments are based on a bivariate error correction model as follows:

Dxt a1

a2

!1

b

!0xt1 ut.

We also assume et Lut and St Eete0tjFt, where

L 1 0

l 1

!; St

s21t 0

0 s22t

!,

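$\sigma_{jt}^2 = 1 + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2$, $e_{jt} = \sigma_{jt}\varepsilon_{jt}$, and $\varepsilon_{jt} \sim \text{i.i.d.}(0, 1)$ for $j = 1, 2$.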
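The data generating process above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used for the reported experiments. It rests on several assumptions of its own: the error correction term is taken to be $x_{1,t-1} - \beta x_{2,t-1}$, the default adjustment coefficient is set to $\alpha_1 = -1$, the lower-left element of the correlation matrix is taken to be $+\lambda$, and a burn-in period is discarded. The function name `simulate_ecm_garch` and the `burn_in` argument are hypothetical.

```python
import numpy as np


def simulate_ecm_garch(n=250, alpha=(-1.0, 0.0), beta=1.0,
                       psi=(0.25, 0.25), phi=(0.7, 0.7), lam=0.0,
                       burn_in=200, seed=0):
    """Simulate a bivariate ECM whose orthogonalized errors follow GARCH(1,1).

    Returns an array of shape (n + 1, 2) holding x_0, ..., x_n.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    psi = np.asarray(psi, dtype=float)
    phi = np.asarray(phi, dtype=float)
    coint = np.array([1.0, -beta])                # assumed: w_t = x_1t - beta * x_2t
    lam_mat = np.array([[1.0, 0.0], [lam, 1.0]])  # assumed sign: e_t = lam_mat @ u_t
    lam_inv = np.linalg.inv(lam_mat)

    total = n + burn_in
    x = np.zeros((total + 1, 2))
    sigma2 = np.ones(2)   # start-up value of the conditional variances (assumption)
    e_prev = np.zeros(2)
    for t in range(1, total + 1):
        # GARCH(1,1) recursion with unit intercept, as in the simulation design
        sigma2 = 1.0 + psi * e_prev ** 2 + phi * sigma2
        e = np.sqrt(sigma2) * rng.standard_normal(2)  # Gaussian standardized innovations
        u = lam_inv @ e                               # recover u_t from e_t
        w_prev = coint @ x[t - 1]                     # lagged error correction term
        x[t] = x[t - 1] + alpha * w_prev + u
        e_prev = e
    return x[burn_in:]                                # drop the burn-in observations
```

Calling `simulate_ecm_garch(n=250, psi=(0.95, 0.95), phi=(0.0, 0.0))` would produce one sample of the size used in the experiments for a strongly heteroskedastic design, under the sign conventions assumed here.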

We compare the finite sample performance of the MLE of the cointegrating vector to that of the RRR, the fully modified (FM), and the OLS estimators. The standard errors of the MLE are calculated from the robust covariance estimator. The experiments are based on a sample size of 250 and 1000 replications. The process $\varepsilon_{jt}$ is generated by the Gauss random number generator. The true value of $\beta$ is set at 1.

First, we study the efficiency gain of the MLE of the cointegrating vector. Table 1 shows the root mean squared error (RMSE) and the mean absolute error (MAE) of the cointegrating vector estimators. When there is no conditional heteroskedasticity, the MLE is almost equivalent to the RRR estimator. As the GARCH effect increases, the RMSE and MAE of the MLE decrease while those of the RRR estimator slowly increase. For example, at $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.95, 0, 0.95, 0, 0)$ the RMSE and MAE of the MLE are 75% and 60% lower than those of the RRR estimator, respectively. The RMSE of the MLE is 30% lower than that of the RRR estimator at $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.25, 0.7, 0.25, 0.7, 0)$.

Table 1 shows that the impact of the parameter $\psi_j$ on the efficiency gain is larger than that of the parameter $\phi_j$. If the parameter $\lambda$ is different from 0, the RMSE and the MAE decrease. Thus, compared to the other estimators, which do not consider heteroskedasticity, the relative efficiency of the MLE improves depending on the parameters in the volatility processes.

As Table 1 shows, the FM estimator is less efficient than the MLE and the RRR estimator because it considers neither the short-run dynamics nor conditional heteroskedasticity. The RMSE and MAE of the FM estimator increase slowly as the volatility parameters increase. Compared to the FM estimator, the OLS estimator does not treat asymptotic bias, and its RMSE and MAE increase with the volatility parameters.

Table 1
Efficiency gain

Parameters: $\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda$

                                    RMSE                              MAE
Parameters                          MLE     RRR     FM      OLS      MLE     RRR     FM      OLS
1; 0; 0; 0; 0; 0; 0                 0.0077  0.0079  0.0147  0.0158   0.0050  0.0054  0.0108  0.0106
1; 0; 0.25; 0; 0.25; 0; 0           0.0069  0.0074  0.0133  0.0146   0.0047  0.0051  0.0098  0.0097
1; 0; 0.5; 0; 0.5; 0; 0             0.0059  0.0080  0.0138  0.0157   0.0040  0.0053  0.0101  0.0102
1; 0; 0.75; 0; 0.75; 0; 0           0.0050  0.0088  0.0166  0.0179   0.0033  0.0056  0.0109  0.0108
1; 0; 0.95; 0; 0.95; 0; 0           0.0044  0.0173  0.0250  0.0219   0.0028  0.0072  0.0133  0.0128
1; 0; 0.25; 0.2; 0.25; 0.2; 0       0.0074  0.0081  0.0141  0.0159   0.0049  0.0055  0.0105  0.0108
1; 0; 0.25; 0.45; 0.25; 0.45; 0     0.0074  0.0080  0.0141  0.0156   0.0052  0.0055  0.0104  0.0104
1; 0; 0.25; 0.7; 0.25; 0.7; 0       0.0066  0.0093  0.0160  0.0177   0.0044  0.0058  0.0111  0.0119
1; 0; 0.25; 0.7; 0.25; 0.7; 0.5     0.0057  0.0072  0.0123  0.0115   0.0036  0.0045  0.008   0.0072
1;0.5; 0.25; 0.7; 0.25; 0.7; 0      0.0025  0.0031  0.0057  0.0067   0.0017  0.0021  0.0042  0.0050
1; 0.5; 0.25; 0.7; 0.25; 0.7; 0     0.0076  0.0093  0.0256  0.0478   0.0052  0.0066  0.0146  0.0297

Next, we examine the size performance of the t-statistics for the null hypothesis:

H0 : b 1.

Table 2 shows the descriptive statistics, the percentiles, and the coverage rates of the t-statistics based on the MLE, RRR, FM, and OLS estimators. The coverage rate is defined as $P(T < u_{0.05})$ for the lower 5% size and $P(T > u_{0.95})$ for the upper 5% size, where $T$ is the t-statistic and $u$ is the critical value. The standard errors of the MLE are based on the robust covariance matrix estimator.

The descriptive statistics of the t-statistics based on the MLE are close to the properties of the standard normal distribution. The coverage rates are very close to the true size, and thus statistical inference on the cointegrating vector can be based on the standard theory.

Fig. 2 shows the estimated kernel density of the t-statistics of the cointegrating vector estimators. The estimated density based on the MLE looks very close to the standard normal distribution for most values of the GARCH parameters. Also, the t-statistics based on the RRR and FM estimators can be closely approximated by the normal distribution. However, as Fig. 2 shows, the OLS estimator reveals a large amount of size distortion and asymmetry.

Table 2
Size performance of the t-statistics

       Descriptive statistics                 Percentiles               Coverage rate
       Mean    S.D.    Skewness  Kurtosis     5       50      95        0.05    0.95

$\alpha = (1, 0)'$, $\psi_1 = 0$, $\phi_1 = 0$, $\psi_2 = 0$, $\phi_2 = 0$, $\lambda = 0$
MLE    0.0484  0.9609  0.0493    3.3621       1.7115  0.0130  1.5387    0.0550  0.0390
RRR    0.0616  1.0225  0.0154    2.8984       1.7760  0.0526  1.6079    0.0650  0.0460
FM     0.0552  1.0791  0.1021    3.0048       1.7274  0.0436  1.7642    0.0610  0.0550
OLS    0.9415  0.8468  0.0786    3.0227       2.3296  0.9376  0.4512    0.2050  0.0000

$\alpha = (1, 0)'$, $\psi_1 = 0.25$, $\phi_1 = 0$, $\psi_2 = 0.25$, $\phi_2 = 0$, $\lambda = 0$
MLE    0.0046  0.9958  0.0746    3.2712       1.6091  0.0390  1.6593    0.0480  0.0530
RRR    0.0106  0.9843  0.0114    3.1257       1.6140  0.0274  1.6284    0.0470  0.0500
FM     0.0706  1.0138  0.0751    3.1097       1.7647  0.0875  1.6087    0.0600  0.0450
OLS    0.8922  0.8530  0.1537    2.8878       2.3297  0.8723  0.4928    0.1890  0.0000

$\alpha = (1, 0)'$, $\psi_1 = 0.5$, $\phi_1 = 0$, $\psi_2 = 0.5$, $\phi_2 = 0$, $\lambda = 0$
MLE    0.0705  0.9754  0.0058    3.2890       1.6957  0.0758  1.5011    0.0540  0.0410
RRR    0.0307  0.9609  0.0357    3.2549       1.5389  0.0387  1.5201    0.0460  0.0350
FM     0.1096  1.0009  0.0255    2.9200       1.7616  0.1311  1.5035    0.0620  0.0330
OLS    0.9032  0.8502  0.1717    3.0220       2.4042  0.8693  0.4231    0.1910  0.0020

$\alpha = (1, 0)'$, $\psi_1 = 0.75$, $\phi_1 = 0$, $\psi_2 = 0.75$, $\phi_2 = 0$, $\lambda = 0$
MLE    0.0417  1.0145  0.1359    3.2862       1.7106  0.0712  1.6367    0.0590  0.0500
RRR    0.0514  1.0088  0.1007    3.0358       1.6146  0.0590  1.6125    0.0450  0.0490
FM     0.0915  1.0160  0.2077    3.8452       1.7403  0.0772  1.5578    0.0620  0.0420
OLS    0.8917  0.8879  0.2214    3.4199       2.3956  0.8632  0.5092    0.1950  0.0030

$\alpha = (1, 0)'$, $\psi_1 = 0.95$, $\phi_1 = 0$, $\psi_2 = 0.95$, $\phi_2 = 0$, $\lambda = 0$
MLE    0.0012  1.0686  0.0244    3.1862       1.7797  0.0422  1.7125    0.0630  0.0570
RRR    0.0351  1.0649  0.0283    2.8242       1.7388  0.0268  1.7888    0.0620  0.0640
FM     0.1201  1.1283  1.7399    21.8168      1.9119  0.0813  1.6030    0.0660  0.0460
OLS    0.9188  0.9947  0.1727    5.6338       2.4226  0.9408  0.7530    0.2170  0.0120

$\alpha = (1, 0)'$, $\psi_1 = 0.25$, $\phi_1 = 0.2$, $\psi_2 = 0.25$, $\phi_2 = 0.2$, $\lambda = 0$
MLE    0.0906  1.0056  0.0208    3.3390       1.8587  0.0429  1.6206    0.0650  0.0490
RRR    0.0808  1.0342  0.0058    2.9102       1.8149  0.0878  1.6038    0.0670  0.0460
FM     0.0792  1.0507  0.0326    2.9888       1.7867  0.0875  1.6885    0.0630  0.0520
OLS    0.9417  0.8527  0.0207    2.7293       2.3466  0.9436  0.4890    0.2160  0.0000

$\alpha = (1, 0)'$, $\psi_1 = 0.25$, $\phi_1 = 0.45$, $\psi_2 = 0.25$, $\phi_2 = 0.45$, $\lambda = 0$
MLE    0.0259  1.0294  0.0217    2.8997       1.6937  0.0458  1.7371    0.0540  0.0590
RRR    0.0220  1.0079  0.0116    2.9878       1.6670  0.0185  1.7099    0.0530  0.0590
FM     0.0768  1.0339  0.0238    2.8758       1.7891  0.0739  1.6406    0.0650  0.0500
OLS    0.8899  0.8357  0.0080    3.0056       2.2193  0.8928  0.4802    0.1910  0.0020

$\alpha = (1, 0)'$, $\psi_1 = 0.25$, $\phi_1 = 0.7$, $\psi_2 = 0.25$, $\phi_2 = 0.7$, $\lambda = 0$
MLE    0.0340  1.0471  0.0032    3.1390       1.7393  0.0372  1.6738    0.0630  0.0540
RRR    0.0521  1.0511  0.1534    3.3659       1.7458  0.0793  1.6762    0.0610  0.0540
FM     0.1839  1.0140  0.0089    3.2349       1.8452  0.1695  1.5007    0.0730  0.0370
OLS    0.9524  0.9199  0.1455    3.2786       2.4333  0.9934  0.5298    0.2150  0.0060

$\alpha = (1, 0.5)'$, $\psi_1 = 0.25$, $\phi_1 = 0.7$, $\psi_2 = 0.25$, $\phi_2 = 0.7$, $\lambda = 0$
MLE    0.0366  1.0451  0.0570    3.0441       1.6961  0.0280  1.7383    0.0570  0.0590
RRR    0.0048  1.0213  0.0492    3.0431       1.6389  0.0060  1.6497    0.0490  0.0520
FM     0.0077  1.0662  0.0690    3.1926       1.6852  0.0442  1.7712    0.0550  0.0610
OLS    0.8649  1.2365  0.5086    3.6378       3.0223  0.7568  0.8945    0.2330  0.0150

$\alpha = (1, 0.5)'$, $\psi_1 = 0.25$, $\phi_1 = 0.7$, $\psi_2 = 0.25$, $\phi_2 = 0.7$, $\lambda = 0$
MLE    0.0435  1.0626  0.0083    3.0819       1.7816  0.0561  1.8055    0.0660  0.0660
RRR    0.0110  1.0519  0.0424    2.7972       1.7365  0.0084  1.7046    0.0610  0.0570
FM     0.2559  1.0325  0.2624    4.6975       2.0407  0.2643  1.2801    0.0810  0.0300
OLS    1.4933  1.0235  0.7144    3.9089       3.3739  1.3836  0.0030    0.3950  0.0000

Fig. 2. Kernel density estimation.

Next, we investigate the small sample properties on the power of the t-statistics by using the local alternative hypothesis:

$H_n : \beta_n = 1 + \delta/n$.
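The quantities reported in Table 2 are standard summaries of the simulated t-statistics. The sketch below shows one way to compute them from a vector of replications. It assumes standard normal critical values for $u_{0.05}$ and $u_{0.95}$ and the usual moment-based skewness and kurtosis. The function name `size_summary` and the dictionary keys are hypothetical, and the exact conventions used for the reported table (for example, the degrees-of-freedom correction in the standard deviation) are not known.

```python
import numpy as np
from statistics import NormalDist


def size_summary(t_stats):
    """Summarize simulated t-statistics in the layout of Table 2."""
    t = np.asarray(t_stats, dtype=float)
    mean = t.mean()
    z = (t - mean) / t.std(ddof=0)          # standardized values for the moment ratios
    u_low = NormalDist().inv_cdf(0.05)      # lower 5% critical value (assumed normal)
    u_high = NormalDist().inv_cdf(0.95)     # upper 5% critical value (assumed normal)
    return {
        "mean": float(mean),
        "sd": float(t.std(ddof=1)),         # sample standard deviation (assumption)
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),
        "percentiles": np.percentile(t, [5, 50, 95]),
        "lower_rate": float(np.mean(t < u_low)),   # estimate of P(T < u_0.05)
        "upper_rate": float(np.mean(t > u_high)),  # estimate of P(T > u_0.95)
    }
```

If the t-statistic is well approximated by the standard normal distribution, both rates should be close to 0.05 and the kurtosis close to 3.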

If $\delta = 0$, then the null hypothesis holds. As the local alternative parameter $\delta$ moves away from 0, the null hypothesis is no longer valid, and the t-statistics tend to reject the null hypothesis. Table 3 shows the frequency of rejecting the null hypothesis at $\delta = 1, 2, 3, 4, 5$. At the local alternative $\delta = 3$, the test based on the MLE rejects the null hypothesis in 65% of the replications and the test based on the RRR estimator in 66% at the 5% size if there is no conditional heteroskedasticity. At $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.95, 0, 0.95, 0, 0)$ and $\delta = 3$, the corresponding rejection frequencies are 92% for the MLE and 65% for the RRR estimator. Thus, the relative power of the t-statistic based on the MLE improves as the volatility parameters increase. On the other hand, the power of the t-statistic based on the RRR or the FM estimator is invariant to conditional heteroskedasticity.

As Table 3 shows, the impact of the parameter $\psi_j$ on the power is greater than that of the parameter $\phi_j$. The moving average representation of a GARCH(1,1) process has the exponentially decaying coefficient of $\phi_j$ multiplied by the parameter $\psi_j$ at each lag of $e_{jt}$. Besides, the power of the tests based on the MLE improves when the correlation parameter $\lambda$ is different from 0.
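The moving average representation mentioned above can be written out by repeated substitution. The derivation below is a sketch that assumes the GARCH(1,1) recursion $\sigma_{jt}^2 = \omega_j + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2$ with $0 \le \phi_j < 1$ and an infinite past, so that the remainder term $\phi_j^K \sigma_{jt-K}^2$ vanishes as $K \to \infty$. The simulation design corresponds to $\omega_j = 1$.

```latex
\begin{align*}
\sigma_{jt}^2
  &= \omega_j + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2 \\
  &= \omega_j (1 + \phi_j)
     + \psi_j \left( e_{jt-1}^2 + \phi_j e_{jt-2}^2 \right)
     + \phi_j^2 \sigma_{jt-2}^2 \\
  &= \frac{\omega_j}{1 - \phi_j}
     + \psi_j \sum_{k=0}^{\infty} \phi_j^k \, e_{jt-k-1}^2 .
\end{align*}
```

The weight on the $(k+1)$th lag of $e_{jt}^2$ is $\psi_j \phi_j^k$, so $\psi_j$ scales every lag while $\phi_j$ only governs how quickly the weights decay.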

Table 3
Power of the t-statistics

Parameters: $\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda$

                                               $\delta$
Parameters                          Estimator  1       2       3       4       5
1; 0; 0; 0; 0; 0; 0                 MLE        0.2050  0.4700  0.6450  0.7790  0.8520
                                    RRR        0.2140  0.4920  0.6610  0.7910  0.8540
                                    FM         0.1190  0.2320  0.3340  0.4990  0.6130
1; 0; 0.25; 0; 0.25; 0; 0           MLE        0.2660  0.5010  0.6860  0.8040  0.8850
                                    RRR        0.2360  0.4730  0.6510  0.7830  0.8670
                                    FM         0.1190  0.2140  0.3180  0.4540  0.6040
1; 0; 0.5; 0; 0.5; 0; 0             MLE        0.3000  0.5830  0.7510  0.8330  0.9080
                                    RRR        0.2340  0.4740  0.6590  0.7660  0.8520
                                    FM         0.1190  0.2320  0.3690  0.4770  0.6080
1; 0; 0.75; 0; 0.75; 0; 0           MLE        0.3790  0.7060  0.8420  0.9240  0.9520
                                    RRR        0.2390  0.4870  0.6370  0.8000  0.8360
                                    FM         0.1280  0.2390  0.3510  0.5070  0.5900
1; 0; 0.95; 0; 0.95; 0; 0           MLE        0.5280  0.7810  0.9160  0.9610  0.9800
                                    RRR        0.2720  0.4970  0.6500  0.7510  0.8020
                                    FM         0.1400  0.2610  0.3790  0.4880  0.5450
1; 0; 0.25; 0.7; 0.25; 0.7; 0       MLE        0.3180  0.5460  0.7430  0.8360  0.8950
                                    RRR        0.2360  0.4210  0.6600  0.7580  0.8140
                                    FM         0.1210  0.2320  0.3680  0.4900  0.5800
1; 0; 0.25; 0.7; 0.25; 0.7; 0.5     MLE        0.3970  0.6620  0.8100  0.9050  0.9520
                                    RRR        0.3090  0.5550  0.7360  0.8370  0.9200
                                    FM         0.1430  0.3090  0.4530  0.6170  0.7130
1;0.5; 0.25; 0.7; 0.25; 0.7; 0      MLE        0.6530  0.8880  0.9650  0.9910  0.9930
                                    RRR        0.5490  0.8380  0.9390  0.9860  0.9810
                                    FM         0.2760  0.5740  0.7760  0.9030  0.9420
1; 0.5; 0.25; 0.7; 0.25; 0.7; 0     MLE        0.2790  0.4940  0.6930  0.7980  0.8540
                                    RRR        0.1810  0.4010  0.5880  0.7010  0.7910
                                    FM         0.1310  0.2020  0.3100  0.3980  0.5310
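The entries of Table 3 are Monte Carlo rejection frequencies. As a minimal sketch, assuming a two-sided test against standard normal critical values (an assumption of this example, since the exact rejection rule is not restated here), the frequency can be computed from a vector of simulated t-statistics as follows. The function name `rejection_frequency` is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def rejection_frequency(t_stats, level=0.05):
    """Share of replications in which |t| exceeds the two-sided normal critical value."""
    t = np.asarray(t_stats, dtype=float)
    crit = NormalDist().inv_cdf(1.0 - level / 2.0)  # about 1.96 for level = 0.05
    return float(np.mean(np.abs(t) > crit))


if __name__ == "__main__":
    # Toy illustration only: t-statistics drawn as N(shift, 1), not from the ECM.
    rng = np.random.default_rng(0)
    for shift in (0.0, 1.0, 2.0, 3.0):
        draws = shift + rng.standard_normal(1000)
        print(f"shift = {shift:.0f}: rejection frequency = {rejection_frequency(draws):.3f}")
```

In the toy illustration the rejection frequency rises with the shift in the mean of the t-statistic, which mirrors how the entries of Table 3 rise with $\delta$.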

Many empirical studies have shown that the conditional variances of financial variables reveal common persistence and volatility causality. Therefore, an efficiency gain and more powerful inference on the cointegrating vector can be obtained when we use the information of conditional heteroskedasticity. The simulation evidence indicates the potential gain from the information contained in the volatility process.

6. Concluding remarks

In this paper, we find that the asymptotic distribution of the MLE of the cointegrating vector depends on the conditional heteroskedasticity. This fact implies that the efficiency of the MLE improves when the data contain conditional heteroskedasticity. Although the RRR estimator and the regression-based estimators allow for conditional heteroskedasticity, they do not use the information coming from conditional heteroskedasticity. As a result, the power of statistical inference on the cointegrating vector improves if we use the information of conditional heteroskedasticity.

The conventional methods of estimating the cointegrating vector are based on the mean equation. Because the OLS and GLS estimators are asymptotically equivalent in nonstationary cointegrated models, the volatility equation has been treated as less important. However, Amemiya (1973) has shown that the MLE improves the efficiency of estimators if the heteroskedasticity depends on the parameter of the mean equation in the linear model with stationary variables. Thus, this paper extends Amemiya's result to the nonstationary cointegrated model with conditionally heteroskedastic errors.

As many studies have shown, financial variables have time-varying variances, and the GARCH model has been widely used to estimate volatility. There exist many other specifications which are capable of explaining conditional heteroskedasticity. Although we consider a multivariate GARCH model with constant coefficients of correlation, our main results can be extended to other heteroskedastic models.

Statistical inference on the cointegration space can also be affected by conditional heteroskedasticity. If we use the information of heteroskedastic errors, the power of the cointegration test is expected to improve in the same way that the efficiency gain of the cointegrating vector estimator emerges. As this topic requires more complicated analysis, we leave it to future research.

Acknowledgments

I would like to thank Badi Baltagi, Valentina Corradi, David Drukker, Bruce Hansen, Dennis Jansen, Qi Li, Joon Park, Peter Robinson, Pentti Saikkonen, and participants at the 2004 North America Econometric Society Meeting and workshops at Rice University and Texas A&M University for useful comments and suggestions. Special thanks are owed to the co-editor and two anonymous referees, who provided detailed and extensive comments and suggestions. The author gratefully acknowledges the research support from Soongsil University.

Appendix A. Mathematical proofs

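In the appendix, we denote $|A| = (\mathrm{tr}\,A'A)^{1/2}$, $\|A\|_m = (E|A|^m)^{1/m}$, and $\Theta_\delta = \{\theta \in \Theta : |D_n(\theta - \theta_0)| \le \delta\}$. For simplicity, $\sup_t = \sup_{1 \le t \le n}$, and $\|\cdot\| = \|\cdot\|_1$. We denote $X_t(\theta) = X_t(\theta_0) + o_p(1)$ if, for all $\delta > 0$,

$$\sup_{\theta \in \Theta_\delta} |X_t(\theta) - X_t(\theta_0)| \to_p 0.$$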

Proof of Lemma 1. By the invariance principle of Phillips and Durlauf (1986),

n1=2Xnst1

ut ) Us BMO.

(1.1) Show n1=2xns ) C1Us.We need to show

P sups20;1

n1=2 xns C1Xnst1

ut

4

!pP sup

s20;1n1=2jFLunsj4

!! 0.

Note that suptkFLutk2o1 because

kFLutk2pX1j0

jFjjkutk2pX1j0

X1kj1

jCkjkutk2pX1k1

kjCk j kutk2o1.

Thus, fFLutg is uniformly square integrable, which implies

supt

n1=2jFLutj!p0.

(1.2) Show n1=2Pns

t1 wt ) bn0F1Us.

We need to show

P sups20;1

n1=2Xnst1

wt bn0F1

Xnst1

ut

4

!pP sup

s20;1n1=2jbn0F1Lunsj4

!! 0,

where F1L FL F1=1 L.P1k1 k

2jCkjo1, supy2Yjyjo1, and kutk2o1 imply

kbn0F1Lutk2p supbn

jbnjX1j0

X1kj2

k j 1Ckut

2

p supbn

jbnjX1k1

k2jCkjkutk2o1: &

Proof of Lemma 2. The error correction model (1) can be written as follows:

Dxt aIr

b

!0x1t1x2t1

! GDXt1 ut,

where $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_l)$ and $\Delta X_{t-1} = \mathrm{vec}(\Delta x_{t-1}, \Delta x_{t-2}, \ldots, \Delta x_{t-l})$.

Let $\Lambda_j$ be the $j$th row of the correlation matrix $\Lambda$. The orthogonalized innovation $e_{jt} = \Lambda_j u_t$ follows the GARCH(1,1) process

$$\sigma_{jt}^2 = \omega_j + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2.$$

We use the fact that $\sup_t \sup_\theta 1/\sigma_{jt}^2(\theta) \le 1/\omega_j < \infty$ by Assumption 1(d). Also, we use

0pfjo1 for j 1; 2; . . . ; p.First, we show suptkn1=2xtkmo1 for 1pmp6. We prove it for m 6.Since xt C1

Pti1 ui FLut, we need to show

supt

n1=2C1Xti1

ui

6

o1 and suptkn1=2FLutk6o1,

where ut uty0, and FL CL C1=1 L.By using Burkholders inequality and Minkowskis inequality, kutk6o1 implies

supt

n1=2C1Xti1

ui

6

p supt

C1 E n1Xt

i1u2i

3

0@

1A

1=6

pC1 supt

t=nkutk6pC1kutk6o1,

suptkn1=2FLutk6pn1=2

X1k1

kjCkjkutk6 op1,

where C1 1086=5

pjC1j.

Thus, suptkn1=2xtkmo1 for 1pmp6 by monotonicity.Also, we can show that suptkDxtkmo1 and suptkwtkmo1 for 1pmp6 because

suptkDxtkmp sup

tkCLutkmp

X1k0

jCkjkutkmo1

and

suptkwtkmp sup

tkbn0FLutkmp sup

bnjbnj

X1k1

kjCkjkutkmo1.

(2.1) Ln is stochastically equicontinuous.(2.1.a) Show e2jty e2jt op1.Because

uty ut an

p b b00n1=2x2t1 a a0wt1 G G0DXt1,utyu0ty utu0t op1 if kn1=2x2tk2o1, kwtk2o1, and kDxtk2o1.We use the following:

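$$e_{jt}(\theta) = \Lambda_j u_t(\theta) = e_{jt} + (\Lambda_j - \Lambda_{j0}) u_t + \Lambda_j (u_t(\theta) - u_t).$$

Since

$$e_{jt}^2(\theta) - e_{jt}^2 = (\Lambda_j - \Lambda_{j0}) u_t u_t' (\Lambda_j - \Lambda_{j0})' + \Lambda_j (u_t(\theta) - u_t)(u_t(\theta) - u_t)' \Lambda_j' + 2 e_{jt} u_t' (\Lambda_j - \Lambda_{j0})' + 2 e_{jt} (u_t(\theta) - u_t)' \Lambda_j' + 2 (\Lambda_j - \Lambda_{j0}) u_t (u_t(\theta) - u_t)' \Lambda_j',$$

$$\sup_{\theta \in \Theta_\delta} |e_{jt}^2(\theta) - e_{jt}^2| \to_p 0,$$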

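for all $j = 1, 2, \ldots, p$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.

(2.1.b) Show $\sigma_{jt}^2(\theta) = \sigma_{jt}^2 + o_p(1)$.

Note that

$$\sigma_{jt}^2(\theta) = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{t-1} \phi_j^k e_{jt-k-1}^2(\theta),$$

$$\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = \left[ \frac{\omega_j}{1 - \phi_j} - \frac{\omega_{j0}}{1 - \phi_{j0}} \right] + \psi_j \sum_{k=0}^{t-1} \phi_j^k \left[ e_{jt-k-1}^2(\theta) - e_{jt-k-1}^2 \right] + \sum_{k=0}^{t-1} \left( \psi_j \phi_j^k - \psi_{j0} \phi_{j0}^k \right) e_{jt-k-1}^2.$$

We use the following:

$$\psi_j \phi_j^k - \psi_{j0} \phi_{j0}^k = (\psi_j - \psi_{j0}) \phi_j^k + \psi_{j0} (\phi_j^k - \phi_{j0}^k),$$

$$|\phi_j^k - \phi_{j0}^k| = |\phi_j - \phi_{j0}| \, (\phi_j^{k-1} + \phi_j^{k-2} \phi_{j0} + \cdots + \phi_{j0}^{k-1}) \le |\phi_j - \phi_{j0}| \, k \, \bar\phi_j^{k-1},$$

where $\bar\phi_j = \max\{\phi_j, \phi_{j0}\}$.

Thus, $\sigma_{jt}^2(\theta) = \sigma_{jt}^2 + o_p(1)$ for all $j$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.

(2.1.c) Show $e_{jt}^2(\theta)/\sigma_{jt}^2(\theta) = e_{jt}^2/\sigma_{jt}^2 + o_p(1)$.

We use the following: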

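$$\frac{e_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} - \frac{e_{jt}^2}{\sigma_{jt}^2} = \frac{e_{jt}^2(\theta) - e_{jt}^2}{\sigma_{jt}^2(\theta)} - \frac{e_{jt}^2}{\sigma_{jt}^2 \, \sigma_{jt}^2(\theta)} \left( \sigma_{jt}^2(\theta) - \sigma_{jt}^2 \right).$$

Since $e_{jt}^2(\theta) = e_{jt}^2 + o_p(1)$, $\sigma_{jt}^2(\theta) = \sigma_{jt}^2 + o_p(1)$, and $1/\sigma_{1t}^2(\theta) \le 1/\omega_1$, $e_{jt}^2(\theta)/\sigma_{jt}^2(\theta) = e_{jt}^2/\sigma_{jt}^2 + o_p(1)$ for all $j$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.

(2.1.d) Show $l_t(\theta) = l_t(\theta_0) + o_p(1)$, where

$$l_t(\theta) - l_t(\theta_0) = -0.5 \sum_{j=1}^p \left[ \log \sigma_{jt}^2(\theta) - \log \sigma_{jt}^2 + \left( \frac{e_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} - \frac{e_{jt}^2}{\sigma_{jt}^2} \right) \right].$$

Since $\log(\sigma_{jt}^2(\theta)/\sigma_{jt}^2) \le (\sigma_{jt}^2(\theta) - \sigma_{jt}^2)/\sigma_{jt}^2$, $\log \sigma_{jt}^2(\theta) = \log \sigma_{jt}^2 + o_p(1)$.

Therefore, $l_t(\theta) = l_t(\theta_0) + o_p(1)$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.

Because

$$P\left( \sup_{\theta \in \Theta_\delta} |L_n(\theta) - L_n(\theta_0)| > \epsilon \right) \le \frac{1}{\epsilon n} \sum_{t=1}^n E\left( \sup_{\theta \in \Theta_\delta} |l_t(\theta) - l_t(\theta_0)| \right) \le \frac{1}{\epsilon} \sup_t E\left( \sup_{\theta \in \Theta_\delta} |l_t(\theta) - l_t(\theta_0)| \right),$$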

Assumption 1 implies that the likelihood function Lny is stochastically equicontinuous.(2.2) Show the consistency of b^n.We dene Nd fy 2 Yj j

n

p b b0jpdg and Nd fy 2 Yj jn

p b b0j4dg. We claimthat, for every d40,

limn

Pn;y0fLny04 supy2NdLnyg 1.

To prove the claim, we use the following:

Lny0 Lny

12n

Xpj1

Xnt1

log s2jty log s2jt e2jtys2jty

e2jt

s2jt

" #

12n

Xpj1

Xnt1

logs2jtys2jt

s2jty s2jts2jty

e2jty e2jts2jty

e2jt s2jts2jty s2jt

s2jts2jty

" #

12n

Xpj1

Xnt1

s2jts2jty

log s2jt

s2jty 1 e

2jty e2jts2jty

e2jt s2jts2jty s2jt

s2jts2jty

" #.

Note that

e2jty Ljutyu0tyL0j

s2jty oj

1 fj cj

Xt1k0

fkj e2jtk1y,

where uty ut ab b00x2t1 a a0wt1 G G0DXt1.We use the fact that x2t Op

n

p , supt n1=2jDxtj!p0, and supt n

1=2jwtj!p0 since Dxt

and wt are uniformly square integrable. Thus, uty Opn

p , e2jty Opn, and1=s2jty Opn1 if y 2 Nd.First, s2jt=s

2jty log s2jt=s2jty 1X0 for all s2jt=s2jty40. Thus, limn Pn;y0fK1jnyX0g

1 for all j and y 2 Y, where K1jny 1=nPn

t1 s2jt=s2jty log s2jt=s2jty 1. If y 2 Nd,

s2jt=s2jty ! 0 as n!1, which implies limn Pn;y0fK1jny40g 1.

Second, show limn Pn;y0fK2jnyX0g 1 for all j and y 2 Nd, where K2jny 1=nPn

t1e2jty e2jt=s2jty.We use the following:

n1Xnt1

e2jty n1Xnt1

e2jt Ljn1Xnt1

uty ututy ut0L0j

Lj Lj0n1Xnt1

utu0tLj Lj00

2n1Xnt1

ejtu0tLj Lj00 2Ljn1

Xnt1

ututy ut0L0j.

Since n1Pn

t1 x2t1u0t Op1, n1

Pnt1 wt1u

0t op1, and n1

Pnt1 DXt1u

0t op1,

n1Xnt1

e2jty e2jt Ljn1Xnt1

uty ututy ut0L0j opn.

As Lemma 3 shows, n1Pn

t1 x2t1X0t1 Op1 and n1

Pnt1Xt1X

0t1 Op1, where

Xt1 w0t1;DX 0t10.

n1Xnt1

uty ututy ut0 ab b00n1Xnt1

x2t1x02t1b b0a0 opn.

Because x2t1x02t1=s2jty Op1 for all y 2 Nd,

n1Xnt1

e2jty e2jts2jty

Ljab b00n1Xnt1

x2t1x02t1s2jty

b b0a0L0j op1

!p Ljab b00M22yb b0a0L0j,

where M22y plim n1Pn

t1 x2t1x02t1=s

2jty.

Thus, limn Pn;y0fK2jnyX0g 1 and for all j and y 2 Nd. If aa0 and M22y40 for ally 2 Nd, then limn Pn;y0fK2jny40g 1.Third, K3jny 1=n

Pnt1 e2jt s2jts2jty s2jt=s2jts2jty 1=n

Pnt1 2jt 11 s2jt=

s2jty!p0 for all j and y 2 Y because E2jt 11 s2jt=s2jtyjFt1 0 and k2jt 1

1 s2jt=s2jtykm=2pkjtk2m 11 kejtk2m=ojo1 for some m42.Thus, limn Pn;y0fLny04supy2NdLnyg 1. The claim implies that if Lny^nXLny0,

it must be that y^n 2 Nd.Therefore,

limn

Pn;y0fy^n 2 NdgX limn

Pn;y0fLny^nXLny0g 1.

Next, we show that Hny is stochastically equicontinuous.

(2.3) Show Hnbby Hnbby0 op1, where Hnbby n1Pn

t1 n1q2lty=qb qb0

and

q2ltyqb qb

0

Xpj1

A0jAj x2t1x02t1s2jty

2 hjLx2t2ejt1yhjLx02t2ejt1y

s4jty1 2Zjty

"

2 hjLx2t2ejt1yx02t1ejty

s4jty 2 x2t1ejtyhjLx

02t2ejt1y

s4jty

hjLx2t2x02t2Zjty

s2jty

#.

(2.3.a) Show A0jAj n1x2t1x02t1=s2jty A0j0Aj0 n1x2t1x02t1=s2jt op1.Because

A0jAj n1x2t1x02t1

s2jty

! A0j0Aj0

n1x2t1x02t1s2jt

!

A0jAj n1x2t1x02t1

s2jts2jty

s2jty s2jt !" #

A0jAj A0j0Aj0 n1x2t1x02t1

s2jt

" #

and A0jAj A0j0Aj0 Aj Aj00Aj A0j0Aj Aj0, we get A0jAj n1x2t1x02t1=s2jty A0j0Aj0 n1x2t1x02t1=s2jt op1 if kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1.(2.3.b) Show n1hjLx2t2ejt1yhjLx02t2ejt1y=s4jty n1hj0Lx2t2ejt1

hj0Lx02t2ejt1=s4jt op1, where

hjLx2t2e1t1yhjLx02t2ejt1y c2jXt1k1

f2kj x2tk1x02tk1e

2jtky

c2jXkal

fkj fljx2tk1x

02tl1ejtkyejtly.

If kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1, thenx2tk1x02tk1e

2jtky

s4jtyx2tk1x02tk1e

2jtk

s41j op1

because

x2tk1x02tk1e2jtky

s4jty

x2tk1x02tk1e2jtk

s4jtx2tk1x02tk1e2jtky e2jtk

s4jty

x2tk1x02tk1e

2jtk

s2s2 y1

s2 y 1

s2

!s2jty s2jt.

jt jt jt jt

As a result, if kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1, then

n1hjLx2t2ejt1yhjLx02t2ejt1y

s4jty

n1 hj0Lx2t2ejt1hj0Lx02t2ejt1

s4jt op1.

(2.3.c) Show

n1hjLx2t2ejt1yhjLx02t2ejt1y

s4jtyZjty

n1 hj0Lx2t2ejt1hj0Lx02t2ejt1

s4jtZjt op1,

where Zjt e2jt=s2jt 1.Because

x2tk1x02tk1e2jtkye2jty

s6jty

x2tk1x02tk1e

2jtke

2jt

s6jt x2tk1x

02tk1e2jtkye2jty e2jtke2jt

s6jty

x2tk1x02tk1e

2jtk

2jt

s2jty1

s4jty 1s2jty

1

s2jt 1s4jt

!s2jty s2jt,

we can get the desired results if kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1.(2.3.d) Show n1A0jAj hjLx2t2ejt1yx02t1ejty=s4jty n1A0j0Aj0 hj0 Lx2t2

ejt1x02t1ejt=s4jt op1.If kn1=2x2tk5o1, kwtk5o1, and kDxtk5o1, then

n1x2tk1ejtkyx02t1ejty

s4jty n1 x2tk1ejtkx

02t1ejt

s4jt op1

because

x2tk1ejtkyx02t1ejtys4jty

x2tk1ejtkx02t1ejt

s4jt

x2tk1x02t1ejtkyejty ejtkejt

s4t

x2tk1x02t1ejtkjt

sjts2jty1

s2jty 1s2jt

!s2jty s2jt

for all kX1.(2.3.e) In the same way, kn1=2x2tk4o1 and kwtk4o1 imply that

n1hjLx2t2x02t2Zjty

2 n1 hj0Lx2t2x

02t2Zjt

2 op1.

sjty sjt

Therefore, Assumption 1 and kutk6o1 imply that n1q2lty=qbqb0 n1q2lty0=qbqb

0 op1.(2.4) Show Hnaay Hnaay0 op1, where a veca, Hnaay n1

Pnt1 q

2lty=qa qa0 and

q2ltyqa qa0

Xpj1

L0jLj wt1bw0t1b

s2jty 2 hjLwt2bejt1yhjLw

0t2bejt1y

s4jty

"

1 2Zjty 2hjLwt2bejt1yw0t1bejty

s4jty

2 wt1bejtyhjLw0t2bejt1y

s4jty hjLwt2bw

0t2bZjty

s2jty

#,

where hjLwt2bejt1y cjPt1

k0 fkj wtk2bejtk1y.

Note that wtbw0tb wtw0t op1 if kn1=2x2tk2o1 and kwtk2o1 becausewtbw0tb wtw0t

n

p b b00n1x2tx02tn

p b b0 np b b00n1=2x2tw0t wtn1=2x02t np b b0.

(2.4.a) Show wt1bw0t1b=s21ty wt1w0t1=s2jt op1.If kn1=2x2tk4o1 and kwtk4o1, then wt1bw0tb=s21ty wt1w0t1=s2jt op1

because

wt1bw0tbs21ty

wt1w0t1

s2jt wt1bw

0tb wt1w0t1s2jty

wt1w0t1

s2jts2jty

s2jty s2jt.

In the same way as (2.3.b)(2.3.e), we can show that Assumption 1 and kutk6o1 implyq2ltyqa qa0

q2lty0qa qa0

op1.

(2.5) Show Hnggy Hnggy0 op1, where Hnggy n1Pn

t1 q2lty=qg qg0.

Because q2lty=qgi qgj 0 for iaj and i; j 1; 2; . . . ; p, we consider the following:q2ltyqo2j

121 fj2s4jty

1 2Zjty

q2ltyqc2j

Pt1k0 fkj e2jtk1y2

s4jty1 2Zjty

q2ltyqf2j

oj=1 fj2 cj

Pt1k1 kf

k1j e

2jtk1y2

2s4jty1 2Zjty

oj=1 fj3

s2jtyZjty.

The proof for stochastic equicontinuity in the GARCH model has been provided in Lee and Hansen (1994) and Lumsdaine (1996), where the mean equation does not contain regressors. However, in the same way as (2.3), we can show that

q2ltyqg qg0

q2lty0qg qg0

op1

if kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1.(2.6) Show Hnlly Hnlly0 op1, where Hnlly n1

Pnt1 q

2lty=ql ql0.Because qeity=qlji uity1 j4i,

q2ltyql2ji

u2itys2jty

2 c2j

Pt1k0 f

kj uitk1yejtk1ys4jty

1 2Zjty

4 cjuityejtyPt1

k0 fkj uitk1yejtk1y

s4jty

cjP1k0fkj u2itk1y

s2jtyZjty.

(2.6.a) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then u2ity=s2jty u2it=s2jt op1because

u2itys2jty

u2it

s2jt u

2ity u2its2jty

u2it

s2jts2jty

s2jty s2jt.

(2.6.b) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then

c2jPt1

k0 fkj uitk1yejtk1ys4jty

c2j0

Pt1k0 f

kj0uitk1ejtk1s4jt

op1,

because

uitk1yejtk1ys4jty

uitk1ejtk1s4jt

uitk1yejtk1y uitk1ejtk1s4jty

uitk1ejtk1s2jts

2jty

1

s2jty 1s2jt

!s2jty s2jt.

(2.6.c) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then

c2jPt1

k0 fkj uitk1yejtk1ys4jty

Zjty c2j0Pt1

k0 fkj0uitk1ejtk1s4jt

Zjt op1.

The cross derivatives entail similar variables, and the same method can be applied to theproof. Therefore, if Assumption 1 holds and kutk6o1, then

$H_n(\theta) = H_n(\theta_0) + o_p(1)$. □

Proof of Lemma 3. Show $n^{-3/2} \sum_{t=1}^n \partial^2 l_t(\theta_0)/\partial\beta \, \partial\theta_2' = o_p(1)$.

(3.1) We show $n^{-3/2} \sum_{t=1}^n \partial^2 l_t(\theta_0)/\partial\beta \, \partial\mathrm{vec}(\alpha')' = o_p(1)$, where

q2lty0qb qveca00

Xmj1

A0jLj x2t1w0t1

s2jt 2 hjLx2t2ejt1hjLw

0t2ejt1

s4jt1 2Zjt

"

2 hjLx2t2ejt1w0t1ejt

s4jt 2 x2t1ejthjLw

0t2ejt1

s4jt

hjLx2t2w0t2Zjt

s2jt

#,

where hjLx2t2ejt1 cjPt1

k0fkj x2tk2ejtk1.

First, we show that n1Pn

t1 x2t1w0t1 Op1.

n1Xnt1

xtw0t n1

Xnt1

xt1 Dxtw0t )Z 10

C1U dU 0F01bn K1,

where K1 EDx0w00 Ev0w01, and vt FLut.(3.1.a) Show n3=2

Pnt1 x2t1w

0t1=s

2jt op1.

Note that supt kwt1=s2jtkmo1 for some m42 because

kwt1=s2jtkm kbn0 FLut1=s2jtkmp1=oj sup

bnjbnj

X1k1

kjCkjkutkmo1.

Because fwt1=s2jtg is strong mixing from Assumption 1(f), we can appeal to Theorem 3.1of Hansen (1992) to show that n1

Pnt1 x2t1w

0t1=s

2jt Op1. Thus, n3=2

Pnt1 x2t1w

0t1=

s2jt op1.(3.1.b) Show n3=2

Pnt1 hjLx2t2ejt1hjLw0t2ejt1=s4jt op1, where

hjLx2t2ejt1hjLw0t2ejt1 Xt1k1

h2j;kx2tk1w0tk1e

2jtk

Xkal

hj;khj;lx2tk1w0tl1ejtkejtl .

First, we show that

n3=2Xnt1

Xt1k1

h2j;kx2tk1w0tk1e

2jtk=s

4jt op1.

Lemma 4 of Lee and Hansen (1994) has shown that, for all j and kX1,

E fkjs2jtks2jt

Ftk1 !

p1 a.s.

Thus, we can show that, for some m42,

sup kfkj wtk1e2jtk=s4jtkmp1=ojkwtk1kmkjtkk22mo1.

t

Because fwtk1e2jtk=s4jtg is strong mixing, we can show that

n1Xnt1

hj;kx2tk1w0tk1e2jtk=s

4jt Op1

for all j and kX1.Because hj;k cjfkj decays exponentially, the nite-dimensional convergence implies

n1Xnt1

Xt1k0

h2j;kx2tk1w0tk1e

2jtk=s

4jt Op1.

Second, we show that

n3=2Xnt1

Xkal

hj;khj;lx2tk1w0tl1ejtlejtk=s4jt op1,

for all j.Without loss of generality, we set kol.

n3=2Xnt1

Xkal

hj;khj;lx2tk1w0tl1ejtlejtk=s4jt

n3=2Xnt1

Xkal

hj;khj;lx2tl1 Dx2tl Dx2tk1w0tl1ejtlejtk=s4jt.

Because kfk=2j fl=2j wtl1ejtlejtk=s4jtkmp1=ojkwtl1kmkjtlkmkjtkkmo1 for somem42,

n1Xnt1

Xkal

hj;khj;lx2tl1w0tl1ejtlejtk=s4jt Op1.

Also,

n1Xnt1

Xkal

hj;khj;lDx2tl Dx2tk1w0tl1ejtlejtk=s4jt Op1.

(3.1.c) The other parts entail fwtk2ejtk1ejtl1Zjt=s4jtg, fejtk1wt1ejt=s4jtg, fwtk2Zjt=s2jtg, and fwtk2ejtk1ejt=s4jtg. These processes are Martingale difference sequences, andthus we can apply Theorem 2.1 of Hansen (1992) to get the desired results.(3.2) Show n1

Pnt1 q

2lty0=qb qvecG00 Op1, where

q2lty0qb qvecG00

Xmj1

A0jLj x2t1DX 0t1

s2jt 2 hjLx2t2ejt1hjLDX

0t2ejt1

s4jt

"

1 2Zjt 2hjLx2t2ejt1DX 0t1ejt

s4jt

2 x2t1ejthjLDX0t2ejt1

s4 hjLx2t2DX

0t2Zjt

s2

#.

jt jt

First, we show that n1Pn

t1 xtDx0ti Op1 for i 0; 1; 2; . . . ; l 1.

n1Xnt1

xtDx0ti n1Xnt1

xti1 Dxti DxtDx0ti

)Z 10

C1U dU 0C10 K2i,

where K2i EDx0Dx00 DxiDx00 Ev0Dx01 for i 0; 1; 2; . . . ; l 1.(3.2.a)(3.2.b) Note that supt kDxti=s2jtkmo1 for some m42 and i 1; 2; . . . ; l because

kDxti=s2jtkm kCLuti=s2jtkmp1=ojX1k0

jCkjkutkmo1.

Also,

suptkfkj Dxtk1e2jtk=s4jtkmp1=ojkDxtk1kmkjtkk22mo1.

Thus, we get the following in the same way as (3.1.a)(3.1.b):

n3=2Xnt1

x2t1DX 0t1s2jt

op1,

n3=2Xnt1

hjLx2t2ejt1hjLDX 0t2ejt1=s4jt op1.

(3.2.c) The other parts entail

DXtk2ejtk1ejtl1Zjts4jt

;ejtk1DXt1ejt

s4jt;DXtk2Zjt

s2jt;DXtk2ejtk1ejt

s4jt

( ).

These processes are Martingale difference sequences, and thus we can get the desiredresults.(3.3) Let gj oj ;cj ;fj0 for j 1; 2; . . . ;m.Show n1

Pnt1 q

2lty0=qb qg0j Op1, where

q2lty0qb qg0j

Xmj1

A0j x2t1ejtqs2jt=qg0j

s4jt hjLx2t2ejt1qs

2jt=qg

0j

s4jt1 2Zjt

" #

and

qs2jtqgj

1

1 fjPt1k0 f

kj e

2jtk1

oj1 fj2

cjPt1

k1 kfk1j e

2jtk

0BBBBBBB@

1CCCCCCCA.

Lemma 4 of Lee and Hansen (1994) and Lemma 3 of Lumsdaine (1996) have shown that

kqs2jt=qgj1=s2jtkmo1 for some 1pmp6. Furthermore, it can be shown that qs2jt=qgj1=

s2jto1 a.s. for gj oj ;cj. Thus, we show the proof for gj fj .

(3.3.a) Show n1Pn

t1 x2t1qs2jt=qfjejt=s4jt Op1, where qs2jt=qfj oj=1 fj2cjPt1

k1 kfk1j e

2jtk.

Note that qs2jt=qfjejt=s4jt is an MDS. Because kqs2jt=qfjejt=s4jtk2p1=o1=2j kqs2jt=qfj1=s2jtk2o1, we can show that

n1Xnt1

x2t1qs2jt=qfjejts4jt

Op1.

(3.3.b) Show n3=2Pn

t1 hjLx2t2ejt1Pt1

k1 kfk1j e

2jtk=s4jt op1, where

hjLx2t2ejt1Xt1k1

kfk1j e2jtk

! cj

Xt1k1

kf2kj x2tk1e3jtk

cj=fjXkal

lfkj fljx2tk1ejtke

2jtl .

First, we note that vjt;k f3k=2j e3jtk=s4jt is uniformly square integrable because

supt

Ev2jt;kfjvjt;kjXcgp1

ojsupt

E6jtkfj3jtkjXcojg ! 0

as c!1 if ktk6o1.We apply Theorem 3.1 of Hansen (1992) and get the desired result because kfk=2j decays

exponentially.Second, we show that

n3=2Xnt1

Xkal

lfkj fljx2tk1ejtke

2jtl=s

4jt op1.

In the same way as (3.1.b), kfk=2j fljejtke2jtl=s4jtkmo1 for some m42 and for all kal.Because lfk=2j decays exponentially, we can get the desired result.(3.3.c) The other part entails ejtkqs2jt=qfjZjt=s4jt. As the process is an MDS, we can

show the proof in the same way as (3.3.a).(3.4) Show n3=2

Pnt1 q

2lty0=qb ql0 op1, whereq2lty0qb qlji

A0j x2t1uits2jt

2 hjLx2t2ejt1hjLuit1ejt1s4jt

1 2Zjt"

4 hjLx2t2ejt1uitejts4jt

hjLx2t2uit1Zjts2jt

#

for ioj.We use the following:

qejtqlji

uit if ioj

0 otherwise.(3.4.a) Because uit=s2jt is an MDS and kuit=s2jtk2p1=ojkuitk2o1, we can get

n1Xn xt1uit

2 Op1.

t1 sjt

(3.4.b) Show n3=2Pn

t1 hjLx2t2ejt1hjLuit1ejt1=s4jt op1, where

hjLx2t2ejt1hjLuit1ejt1 Xt1k1

h2j;kx2tk1uitke2jtk

Xkal

hj;khj;lx2tk1ejtkuitlejtl .

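First, because $\|\phi_j^{k}u_{it-k}e_{jt-k}^2/\sigma_{jt}^4\|_m < \infty$ for some $m > 2$ and $\phi_j^{k}$ decays exponentially,

\[ n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-k-1}u_{it-k}e_{jt-k}^2/\sigma_{jt}^4 = o_p(1). \]

Next, we can show that $n^{-3/2}\sum_{t=1}^{n}\sum_{k\ne l}h_{j,k}h_{j,l}\,x_{2t-k-1}e_{jt-k}u_{it-l}e_{jt-l}/\sigma_{jt}^4 = o_p(1)$ in the same way as (3.3.b).

Therefore, $n^{-3/2}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\beta\,\partial\theta_2' = o_p(1)$.

Lemma 2 implies that $\hat\theta_n \in \Theta_\delta$, and hence $\bar\theta \in \Theta_\delta$, where $\bar\theta \in [\hat\theta_n, \theta_0]$. Therefore, by appealing to Proposition 3.2 of Saikkonen (1993), $H_n(\bar\theta) = H_n(\theta_0) + o_p(1)$, and

\[ D_n(\hat\theta_n - \theta_0) = -H_n(\bar\theta)^{-1}G_n(\theta_0) = -H_n(\theta_0)^{-1}G_n(\theta_0) + o_p(1). \]

Furthermore, block-diagonality implies that

\[ n(\hat\beta_n - \beta_0) = -h_n(\theta_0)^{-1}g_n(\theta_0) + o_p(1), \]

which completes the proof. □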

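Proof of Lemma 4. (4.1) Show $n^{-1}\sum_{t=1}^{n}\partial l_t(\theta_0)/\partial\beta \Rightarrow \int_0^1 W_2(s)\,dW_1(s)'$, where

\[ \frac{\partial l_t(\theta_0)}{\partial\beta} = \sum_{j=1}^{m}A_j' \otimes \left[\frac{x_{2t-1}e_{jt}}{\sigma_{jt}^2} - \frac{h_j(L)x_{2t-2}e_{jt-1}\,\eta_{jt}}{\sigma_{jt}^2}\right]. \]

We denote $z_{jt} = e_{jt}/\sigma_{jt}^2$ and $q_{jt,k} = e_{jt-k}\eta_{jt}/\sigma_{jt}^2$. Note that $\{z_{jt}, q_{jt,k}\}$ is strictly stationary and ergodic, and an MDS.

Since $\|z_{jt}\|_2 \le (1/\omega_j)\|e_{jt}\|_2 < \infty$, we can apply Kurtz and Protter (1991) and Hansen (1992) to get

\[ n^{-1}\sum_{t=1}^{n}x_{2t-1}z_{jt} \Rightarrow \int_0^1 W_2\,dZ_j. \]

In the same way, $\|q_{jt,k}\|_2 \le (1/\omega_j)\|e_{jt}\|_2\big(\|\varepsilon_{jt}\|_4^2 + 1\big) < \infty$ for all $k \ge 1$, where $\varepsilon_{jt} = e_{jt}/\sigma_{jt}$ denotes the standardized innovation, and hence

\[ n^{-1}\sum_{t=1}^{n}x_{2t-1}q_{jt,k} \Rightarrow \int_0^1 W_2\,dQ_{j,k}. \]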

We denote $F_{jn,k} = n^{-1}\sum_{t=1}^{n}x_{2t-1}q_{jt,k}$ and $F_{j,k} = \int_0^1 W_2\,dQ_{j,k}$. Now, we want to show $F_{jn} \Rightarrow F_j$, where $F_{jn} = (F_{jn,1}, F_{jn,2}, \ldots)$ and $F_j = (F_{j,1}, F_{j,2}, \ldots)$. We define a metric $d_\infty(f) = \sum_{k=1}^{\infty}k^{-r}|f_k|$, where $r > 2$ and $f \in R^\infty$. Then, the finite-dimensional convergence and the tightness of the probability measure $F_{jn}$ in $R^\infty$, with respect to the metric $d_\infty(f)$, imply the weak convergence $F_{jn} \Rightarrow F_j$. The detailed proof is given in Hansen (1995, pp. 1127-1128).

Because $\sum_{k=1}^{\infty}h_{j,k}f_{j,k}$ is $d_\infty$-continuous, we have the following result by using the CMT:

\[
\begin{aligned}
n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty}h_{j,k}\,x_{2t-k-1}q_{jt,k}
&= \sum_{k=1}^{\infty}h_{j,k}\,n^{-1}\sum_{t=1}^{n}x_{2t-1}q_{jt,k} + o_p(1)\\
&\Rightarrow \sum_{k=1}^{\infty}h_{j,k}\int_0^1 W_2\,dQ_{j,k}
= \int_0^1 W_2\sum_{k=1}^{\infty}h_{j,k}\,dQ_{j,k}
= \int_0^1 W_2\,dQ_j.
\end{aligned}
\]

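Also, show that $Q_{jn}(s) \Rightarrow Q_j(s)$, where $Q_{jn}(s) = n^{-1/2}\sum_{t=1}^{[ns]}\sum_{k=1}^{t-1}h_{j,k}q_{jt,k}$ and $Q_j(s) = \sum_{k=1}^{\infty}h_{j,k}Q_{j,k}$.

Because $\sum_{k=t}^{\infty}h_{j,k}\|q_{jt,k}\| = O(\phi_j^{t})$ and $\sum_{t=1}^{[ns]}\sum_{k=t}^{\infty}h_{j,k}\|q_{jt,k}\| < \infty$ for all $s \in [0,1]$, for any $\epsilon > 0$,

\[
\begin{aligned}
&P\left(\sup_{s\in[0,1]}\left|n^{-1/2}\Big(\sum_{t=1}^{[ns]}\sum_{k=1}^{\infty}h_{j,k}q_{jt,k} - \sum_{t=1}^{[ns]}\sum_{k=1}^{t-1}h_{j,k}q_{jt,k}\Big)\right| > \epsilon\right)\\
&\quad= P\left(\sup_{s\in[0,1]}\left|n^{-1/2}\sum_{t=1}^{[ns]}\sum_{k=t}^{\infty}h_{j,k}q_{jt,k}\right| > \epsilon\right)\\
&\quad\le P\left(n^{-1/2}\sum_{t=1}^{n}\sum_{k=t}^{\infty}h_{j,k}|q_{jt,k}| > \epsilon\right)\\
&\quad\le \frac{n^{-1/2}\sum_{t=1}^{n}\sum_{k=t}^{\infty}h_{j,k}\|q_{jt,k}\|}{\epsilon} \to 0.
\end{aligned}
\]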

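Thus, $Q_{jn}(s) \Rightarrow Q_j(s)$. Let $x_{2t-k-1} = x_{2t-1} - \sum_{i=1}^{k}\Delta x_{2t-i}$ for $k \ge 1$. Since $\big\|\sum_{k=1}^{t-1}h_{j,k}\sum_{i=1}^{k}\Delta x_{2t-i}\,q_{jt,k}\big\|_{m/2} \le \sum_{k=1}^{t-1}h_{j,k}\sum_{i=1}^{k}\|\Delta x_{2t-i}\|_m\|q_{jt,k}\|_m < \infty$ for some $m > 2$,

\[
\begin{aligned}
n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2t-k-1}q_{jt,k}
&= n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2t-1}q_{jt,k} - n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\Big(\sum_{i=1}^{k}\Delta x_{2t-i}\Big)q_{jt,k}\\
&= n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2t-1}q_{jt,k} + o_p(1).
\end{aligned}
\]

Thus,

\[
\begin{aligned}
n^{-1}\sum_{t=1}^{n}\frac{h_j(L)x_{2t-2}e_{jt-1}\,\eta_{jt}}{\sigma_{jt}^2}
&= n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2t-k-1}q_{jt,k}\\
&= n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2t-1}q_{jt,k} + o_p(1)
\Rightarrow \int_0^1 W_2\,dQ_j.
\end{aligned}
\]

Therefore, we have $n^{-1}\sum_{t=1}^{n}\partial l_t(\theta_0)/\partial\beta \Rightarrow \int_0^1 W_2\,dW_1'$, where $W_1(s) = A'(Z(s) - Q(s))$.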
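The truncation step in (4.1) replaces the infinite lag sum by a sum over $k \le t-1$, which works because the tail $\sum_{k\ge t}h_{j,k}$ is $O(\phi_j^{t})$. The following is a minimal numerical sketch of that point, assuming the lag weights have the geometric form $h_{j,k} = c_j\phi_j^{k-1}$ (an assumption made here for illustration) and using hypothetical function names.

```python
def geometric_tail(c, phi, t):
    """Closed form of sum_{k >= t} c * phi**(k - 1), valid for 0 <= phi < 1.

    The weight form c * phi**(k - 1) is an illustrative assumption.
    """
    return c * phi ** (t - 1) / (1.0 - phi)


def normalized_tail_sum(c, phi, n):
    """Compute n**(-1/2) * sum_{t=1}^{n} sum_{k >= t} c * phi**(k - 1)."""
    total = sum(geometric_tail(c, phi, t) for t in range(1, n + 1))
    return total / n ** 0.5


if __name__ == "__main__":
    # The double sum stays bounded by c / (1 - phi)**2, so dividing by
    # sqrt(n) drives the normalized tail to zero as n grows.
    for n in (10, 100, 10_000):
        print(n, normalized_tail_sum(c=0.1, phi=0.8, n=n))
```

Because the double sum is bounded by $c/(1-\phi)^2$ for every $n$, the $n^{-1/2}$ normalization alone is enough to make the truncation error negligible in this sketch.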

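(4.2) Show $n^{-2}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\beta\,\partial\beta' \Rightarrow -\nu \otimes \int_0^1 W_2(s)W_2(s)'\,ds$, where

\[
\begin{aligned}
\frac{\partial^2 l_t(\theta_0)}{\partial\beta\,\partial\beta'} = -\sum_{j=1}^{m}A_j'A_j \otimes \Bigg[&\frac{x_{2t-1}x_{2t-1}'}{\sigma_{jt}^2} + 2\,\frac{(h_j(L)x_{2t-2}e_{jt-1})(h_j(L)x_{2t-2}'e_{jt-1})}{\sigma_{jt}^4}(1+2\eta_{jt})\\
&- 2x_{2t-1}e_{jt}(h_j(L)x_{2t-2}'e_{jt-1})/\sigma_{jt}^4 - 2(h_j(L)x_{2t-2}e_{jt-1})x_{2t-1}'e_{jt}/\sigma_{jt}^4\\
&- h_j(L)x_{2t-2}x_{2t-2}'\,\eta_{jt}/\sigma_{jt}^2\Bigg].
\end{aligned}
\]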

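(4.2.a) Show $n^{-2}\sum_{t=1}^{n}x_{2t-1}x_{2t-1}'/\sigma_{jt}^2 \Rightarrow (1/B_j^2)\int_0^1 W_2W_2'$.

First, we show that $n^{-3/2}\sum_{t=1}^{n}x_{2t-1}x_{2t-1}'\big(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\big) = O_p(1)$.

Assumption 1(a) and 1(b) imply that $\{\sigma_{jt}^2\}$ is $\beta$-mixing with exponential decay, and so is $\{1/\sigma_{jt}^2\}$. Because $\|1/\sigma_{jt}^2\|_m \le 1/\omega_j < \infty$ for some $m > 2$, we can show that $n^{-3/2}\sum_{t=1}^{n}x_{2t-1}x_{2t-1}'\big(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\big) = O_p(1)$. Thus,

\[ n^{-2}\sum_{t=1}^{n}x_{2t-1}x_{2t-1}'/\sigma_{jt}^2 \Rightarrow E(1/\sigma_{jt}^2)\int_0^1 W_2W_2' = (1/B_j^2)\int_0^1 W_2W_2'. \]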
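Step (4.2.a) says that, in the normalized sum, the weight $1/\sigma_{jt}^2$ can be replaced by its mean. A small simulation sketch of this is given below, under assumptions that are not taken from the paper: a scalar GARCH(1,1) recursion $\sigma_t^2 = \omega + c\,e_{t-1}^2 + \phi\,\sigma_{t-1}^2$ with Gaussian innovations, a regressor built as the cumulative sum of $e_t$, and hypothetical function names and parameter values.

```python
import numpy as np


def simulate_garch11(n, omega, c, phi, seed=0):
    """Simulate e_t = sigma_t * innov_t under an assumed GARCH(1,1) recursion."""
    rng = np.random.default_rng(seed)
    innov = rng.standard_normal(n)        # Gaussian innovations: an assumption
    sigma2 = np.empty(n)
    e = np.empty(n)
    sigma2[0] = omega / (1.0 - c - phi)   # start at the unconditional variance
    e[0] = np.sqrt(sigma2[0]) * innov[0]
    for t in range(1, n):
        sigma2[t] = omega + c * e[t - 1] ** 2 + phi * sigma2[t - 1]
        e[t] = np.sqrt(sigma2[t]) * innov[t]
    return e, sigma2


def weighted_vs_averaged(n=20000, omega=0.1, c=0.1, phi=0.8, seed=0):
    """Compare n^-2 sum x_{t-1}^2 / sigma_t^2 with mean(1/sigma^2) * n^-2 sum x_{t-1}^2."""
    e, sigma2 = simulate_garch11(n + 1, omega, c, phi, seed)
    x_lag = np.cumsum(e)[:-1]             # integrated regressor, lagged once
    inv_s2 = 1.0 / sigma2[1:]             # 1 / sigma_t^2 aligned with x_{t-1}
    lhs = np.sum(x_lag ** 2 * inv_s2) / n ** 2
    rhs = np.mean(inv_s2) * np.sum(x_lag ** 2) / n ** 2
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = weighted_vs_averaged()
    print(lhs, rhs, lhs / rhs)
```

In this sketch the ratio of the two quantities should be close to one for large $n$, which mirrors the $O_p(1)$ bound on the centered sum after the $n^{-3/2}$ normalization.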

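(4.2.b) Show $n^{-2}\sum_{t=1}^{n}(h_j(L)x_{2t-2}e_{jt-1})(h_j(L)x_{2t-2}'e_{jt-1})/\sigma_{jt}^4 \Rightarrow \Xi_j\int_0^1 W_2W_2'$, where

\[ (h_j(L)x_{2t-2}e_{jt-1})(h_j(L)x_{2t-2}'e_{jt-1}) = \sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'e_{jt-k}^2 + \sum_{k\ne l}h_{j,k}h_{j,l}\,x_{2t-k-1}x_{2t-l-1}'e_{jt-k}e_{jt-l}. \]

First, we show that

\[ n^{-3/2}\sum_{t=1}^{n}h_{j,k}\,x_{2t-k-1}x_{2t-k-1}'\,s_{jt,k} = O_p(1), \]

where $s_{jt,k} = h_{j,k}\big(e_{jt-k}^2/\sigma_{jt}^4 - E(e_{jt-k}^2/\sigma_{jt}^4)\big)$ for all $k$. Note that $s_{jt,k}$ is $\beta$-mixing with exponential decay and $\|s_{jt,k}\|_m \le (2/\omega_j)\|\varepsilon_{jt}\|_{2m}^2 < \infty$ for some $m > 2$ and for all $k \ge 1$, where $\varepsilon_{jt} = e_{jt}/\sigma_{jt}$ denotes the standardized innovation. Thus, we get $n^{-3/2}\sum_{k=1}^{t-1}\sum_{t=1}^{n}h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'\,s_{jt,k} = O_p(1)$ by appealing to Theorem 3.1 of Hansen (1992).

Also, we get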

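\[ n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'E(e_{jt-k}^2/\sigma_{jt}^4) = n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-1}x_{2t-1}'E(e_{jt-k}^2/\sigma_{jt}^4) + o_p(1) \]

since $n^{-1}\sum_{t=1}^{n}h_{j,k}\,x_{2t-1}\sum_{i=1}^{k}\Delta x_{2t-i}' = O_p(1)$ and $n^{-1}\sum_{t=1}^{n}h_{j,k}\sum_{i=1}^{k}\Delta x_{2t-i}\sum_{i=1}^{k}\Delta x_{2t-i}' = O_p(1)$.

Because $n^{-1}x_{2t-1}x_{2t-1}' = O_p(1)$ and $\sum_{k=t}^{\infty}h_{j,k}^2E(e_{jt-k}^2/\sigma_{jt}^4) = O(\phi_j^{t})$,

\[ n^{-2}\sum_{t=1}^{n}\sum_{k=t}^{\infty}h_{j,k}^2\,x_{2t-1}x_{2t-1}'E(e_{jt-k}^2/\sigma_{jt}^4) = o_p(1). \]

Therefore,

\[
\begin{aligned}
n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'e_{jt-k}^2/\sigma_{jt}^4
&= n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'E(e_{jt-k}^2/\sigma_{jt}^4) + o_p(1)\\
&= n^{-2}\sum_{t=1}^{n}x_{2t-1}x_{2t-1}'\sum_{k=1}^{\infty}h_{j,k}^2E(e_{jt-k}^2/\sigma_{jt}^4) + o_p(1)\\
&\Rightarrow \Xi_j\int_0^1 W_2W_2'.
\end{aligned}
\]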

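Second, we show $n^{-2}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,x_{2t-k-1}x_{2t-l-1}'e_{jt-k}e_{jt-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \ne l$.

Without loss of generality, we set $l > k$. We have $n^{-3/2}\sum_{t=1}^{n}\sum_{k\ne l}h_{j,k}h_{j,l}\,x_{2t-k-1}x_{2t-l-1}'e_{jt-k}e_{jt-l}/\sigma_{jt}^4 = O_p(1)$ because $e_{jt-k}e_{jt-l}/\sigma_{jt}^4$ is $\beta$-mixing and $\|\phi_j^{k/2}\phi_j^{l/2}e_{jt-k}e_{jt-l}/\sigma_{jt}^4\|_m \le (1/\omega_j)\|\varepsilon_t\|_m^2 < \infty$ for some $m > 2$, where $\varepsilon_{jt} = e_{jt}/\sigma_{jt}$ denotes the standardized innovation.

(4.2.c) Also, $n^{-2}\sum_{t=1}^{n}(h_j(L)x_{2t-2}e_{jt-1})(h_j(L)x_{2t-2}'e_{jt-1})\eta_{jt}/\sigma_{jt}^4 = o_p(1)$ because

\[ n^{-3/2}\sum_{t=1}^{n}x_{2t-k-1}x_{2t-k-1}'e_{jt-k}^2\eta_{jt}/\sigma_{jt}^4 = O_p(1) \]

and

\[ n^{-3/2}\sum_{t=1}^{n}x_{2t-k-1}x_{2t-l-1}'e_{jt-k}e_{jt-l}\eta_{jt}/\sigma_{jt}^4 = O_p(1) \quad\text{for all } k \ge 1 \text{ and } k \ne l. \]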

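(4.2.d) In the same way, $n^{-3/2}\sum_{t=1}^{n}x_{2t-k-1}x_{2t-1}'e_{jt-k}e_{jt}/\sigma_{jt}^4 = O_p(1)$ for all $k \ge 1$. Hence, we have $n^{-2}\sum_{t=1}^{n}x_{2t-1}e_{jt}(h_j(L)x_{2t-2}'e_{jt-1})/\sigma_{jt}^4 = o_p(1)$.

(4.2.e) $n^{-2}\sum_{t=1}^{n}h_j(L)x_{2t-2}x_{2t-2}'\eta_{jt}/\sigma_{jt}^2 = o_p(1)$ since $n^{-3/2}\sum_{t=1}^{n}x_{2t-k-1}x_{2t-k-1}'\eta_{jt}/\sigma_{jt}^2 = O_p(1)$ for all $k \ge 1$.

Therefore, we have $n^{-2}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\beta\,\partial\beta' \Rightarrow -\nu \otimes \int_0^1 W_2(s)W_2(s)'\,ds$.

(4.3) Show $n^{-2}\sum_{t=1}^{n}(\partial l_t(\theta_0)/\partial\beta)(\partial l_t(\theta_0)/\partial\beta') \Rightarrow \mu \otimes \int_0^1 W_2W_2'$, where

\[
\begin{aligned}
\frac{\partial l_t(\theta_0)}{\partial\beta}\frac{\partial l_t(\theta_0)}{\partial\beta'} = \sum_{j=1}^{m}A_j'A_j \otimes \Big[&x_{2t-1}x_{2t-1}'e_{jt}^2/\sigma_{jt}^4 + (h_j(L)x_{2t-2}e_{jt-1})(h_j(L)x_{2t-2}'e_{jt-1})\eta_{jt}^2/\sigma_{jt}^4\\
&- 2(h_j(L)x_{2t-2}e_{jt-1})x_{2t-1}'e_{jt}\eta_{jt}/\sigma_{jt}^4\Big].
\end{aligned}
\]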

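(4.3.a) We apply Theorem 3.1 of Hansen (1992) to show that

\[ n^{-3/2}\sum_{t=1}^{n}x_{2t-1}x_{2t-1}'\big(e_{jt}^2/\sigma_{jt}^4 - E(e_{jt}^2/\sigma_{jt}^4)\big) = O_p(1). \]

Thus,

\[ n^{-2}\sum_{t=1}^{n}x_{2t-1}x_{2t-1}'e_{jt}^2/\sigma_{jt}^4 \Rightarrow E(e_{jt}^2/\sigma_{jt}^4)\int_0^1 W_2W_2' = (1/B_j^2)\int_0^1 W_2W_2'. \]

(4.3.b) Since $E\eta_{jt}^2 = \kappa_j - 1$,

\[ n^{-2}\sum_{t=1}^{n}(h_j(L)x_{2t-2}e_{jt-1})(h_j(L)x_{2t-2}'e_{jt-1})\eta_{jt}^2/\sigma_{jt}^4 \Rightarrow (\kappa_j - 1)\,\Xi_j\int_0^1 W_2W_2' \]

in the same way as (4.2.b).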
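The moment condition $E\eta_{jt}^2 = \kappa_j - 1$ used in (4.3.b) can be checked numerically in a simple case. The sketch below assumes that $\eta$ equals the squared standardized innovation minus one and that the innovation is standard normal (both are assumptions for illustration, and the function name is hypothetical), so that $\kappa = 3$ and $E\eta^2 = 2$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_kurtosis_and_eta_moment(deg=20):
    """Return (E[z^4], E[(z^2 - 1)^2]) for z ~ N(0, 1) via Gauss-Hermite quadrature.

    hermegauss uses the weight exp(-z^2 / 2); dividing the weights by
    sqrt(2 * pi) turns them into N(0, 1) probabilities.
    """
    nodes, weights = hermegauss(deg)
    probs = weights / np.sqrt(2.0 * np.pi)
    kurtosis = np.sum(probs * nodes ** 4)
    eta_second_moment = np.sum(probs * (nodes ** 2 - 1.0) ** 2)
    return kurtosis, eta_second_moment


if __name__ == "__main__":
    kappa, eta2 = gaussian_kurtosis_and_eta_moment()
    print(kappa, eta2, kappa - 1.0)
```

For heavier-tailed innovations the kurtosis exceeds 3, and this term then carries more weight in the limit of the outer product of the score.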

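(4.3.c) Since $\{e_{jt-k}e_{jt}\eta_{jt}/\sigma_{jt}^4\}$ is an MDS, $n^{-3/2}\sum_{t=1}^{n}x_{2t-k-1}x_{2t-1}'e_{jt-k}e_{jt}\eta_{jt}/\sigma_{jt}^4 = O_p(1)$ for all $k \ge 1$. Thus, $n^{-2}\sum_{t=1}^{n}(h_j(L)x_{2t-2}e_{jt-1})x_{2t-1}'e_{jt}\eta_{jt}/\sigma_{jt}^4 = o_p(1)$ in the same way as (4.2.c).

Therefore, $n^{-2}\sum_{t=1}^{n}g_t(\theta_0)g_t(\theta_0)' \Rightarrow \mu \otimes \int_0^1 W_2(s)W_2(s)'\,ds$. □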

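Proof of Theorem 1. By using Lemmas 1-4, we have

\[
\begin{aligned}
n(\hat\beta_n - \beta_0) &= -h_n(\theta_0)^{-1}g_n(\theta_0) + o_p(1)\\
&\Rightarrow \Big(\nu \otimes \int_0^1 W_2W_2'\Big)^{-1}\int_0^1 dW_1 \otimes W_2
= \mathrm{vec}\left(\Big(\int_0^1 W_2W_2'\Big)^{-1}\int_0^1 W_2\,dW_1'\,\nu^{-1}\right).
\end{aligned}
\]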
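The last equality in this display is an instance of the identity $(B' \otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(AXB)$. A minimal numerical check is sketched below, assuming a symmetric positive definite $\nu$ and column-stacking $\mathrm{vec}$; the matrices are random stand-ins (not quantities from the paper) and the function names are hypothetical.

```python
import numpy as np


def vec(mat):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return mat.reshape(-1, order="F")


def kron_vec_identity(r=2, q=3, seed=0):
    """Return both sides of (nu kron S)^{-1} vec(X) = vec(S^{-1} X nu^{-1})."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal((r, r))
    nu = b @ b.T + r * np.eye(r)      # symmetric positive definite stand-in for nu
    c = rng.standard_normal((q, q))
    s = c @ c.T + q * np.eye(q)       # stand-in for the integral of W2 W2'
    x = rng.standard_normal((q, r))   # stand-in for the integral of W2 dW1'
    lhs = np.linalg.solve(np.kron(nu, s), vec(x))
    rhs = vec(np.linalg.solve(s, x) @ np.linalg.inv(nu))
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = kron_vec_identity()
    print(np.max(np.abs(lhs - rhs)))
```

The symmetry of $\nu$ matters here: without it the right-hand side would involve the transpose of $\nu^{-1}$.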

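The RRR estimator $\tilde\beta_n$ is based on the following likelihood function:

\[ L_n(\beta, \Phi, \Lambda) = n^{-1}\sum_{t=1}^{n}l_t(\beta, \Phi, \Lambda), \]

where $l_t(\beta, \Phi, \Lambda) = -0.5\sum_{j=1}^{m}\log\sigma_j^2 - 0.5\sum_{j=1}^{m}e_{jt}^2(\beta, \Phi, \Lambda)/\sigma_j^2$, $e_t(\beta, \Phi, \Lambda) = \Lambda u_t(\beta, \Phi)$, $\sigma_j^2 = E\sigma_{jt}^2$, and $u_t(\beta, \Phi)$ satisfies Eq. (5).

We have the following derivatives:

\[ \frac{\partial l_t(\beta_0, \Phi_0, \Lambda_0)}{\partial\beta} = \sum_{j=1}^{m}A_j' \otimes \frac{x_{2t-1}e_{jt}}{\sigma_j^2}, \]

\[ \frac{\partial^2 l_t(\beta_0, \Phi_0, \Lambda_0)}{\partial\beta\,\partial\beta'} = -\sum_{j=1}^{m}A_j'A_j \otimes \frac{x_{2t-1}x_{2t-1}'}{\sigma_j^2}. \]

By using the previous results, we can show that the RRR estimator $\tilde\beta_n$ has an asymptotic distribution as (20). □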

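Proof of Theorem 2. If we use the information matrix,

\[
\begin{aligned}
W_{In} \Rightarrow{}& \Big[\int dW_1' \otimes W_2'\Big]\Big(\nu^{-1} \otimes \Big(\int_0^1 W_2W_2'\Big)^{-1}\Big)R'\left[R\Big(\nu^{-1} \otimes \Big(\int_0^1 W_2W_2'\Big)^{-1}\Big)R'\right]^{-1}\\
&\times R\Big(\nu^{-1} \otimes \Big(\int_0^1 W_2W_2'\Big)^{-1}\Big)\Big[\int dW_1 \otimes W_2\Big]
= J'Q^{1/2\prime}Q_\nu^{-1}Q^{1/2}J.
\end{aligned}
\]

We can show other parts in the same way. □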

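Proof of Lemma 5. (5.1) Show $n^{-1/2}\sum_{t=1}^{n}\partial l_t(\theta_0)/\partial\tau \to_d \Lambda'(Z - Q) \sim N(0, \Lambda'\Sigma^{-1/2}M\Sigma^{-1/2}\Lambda)$, where

\[ \frac{\partial l_t(\theta_0)}{\partial\tau} = \sum_{j=1}^{m}\Lambda_j'\left[\frac{e_{jt}}{\sigma_{jt}^2} - \frac{h_j(L)e_{jt-1}\,\eta_{jt}}{\sigma_{jt}^2}\right]. \]

Let $z_{jt} = e_{jt}/\sigma_{jt}^2$, and $q_{jt,k} = e_{jt-k}\eta_{jt}/\sigma_{jt}^2$. Since $\{z_{jt}\}$ and $\{q_{jt,k}\}$ are martingale difference sequences,

\[ n^{-1/2}\sum_{t=1}^{n}z_{jt} \to_d Z_j \sim N\Big(0, \frac{1}{B_j^2}\Big), \]

\[ n^{-1/2}\sum_{t=1}^{n}q_{jt,k} \to_d Q_{j,k} \sim N\big(0, (\kappa_j - 1)\xi_{j,k}^2\big), \]

for each $k \ge 1$.

Let $Q_{jn} = n^{-1/2}\sum_{t=1}^{n}\sum_{k=1}^{\infty}h_{j,k}q_{jt,k}$. Because $\|q_{jt,k}\|_2 < \infty$ and $\sum_{k=1}^{\infty}h_{j,k}\|q_{jt,k}\|_2 < \infty$, $Q_{jn} \to_d \sum_{k=1}^{\infty}h_{j,k}Q_{j,k} = Q_j$.

Since $\sum_{k=t}^{\infty}h_{j,k}\|q_{jt,k}\| = O(\phi_j^{t})$, we have the following result in the same way as (4.1).

\[ Q_{jn} = n^{-1/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}q_{jt,k} \to_d Q_j. \]

Therefore, $n^{-1/2}\sum_{t=1}^{n}\partial l_t(\theta_0)/\partial\tau \to_d \Lambda'(Z - Q)$.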

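(5.2) Show $n^{-1}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\tau\,\partial\tau' \to_p -\Lambda'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda$, where

\[ \frac{\partial^2 l_t(\theta_0)}{\partial\tau\,\partial\tau'} = -\sum_{j=1}^{m}\Lambda_j'\Lambda_j\left[\frac{1}{\sigma_{jt}^2} + 2\,\frac{(h_j(L)e_{jt-1})^2}{\sigma_{jt}^4}(1+2\eta_{jt}) - 4\,\frac{(h_j(L)e_{jt-1})e_{jt}}{\sigma_{jt}^4} - \frac{h_j(1)\,\eta_{jt}}{\sigma_{jt}^2}\right]. \]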

(5.2.a) Because $\{1/\sigma_{jt}^2\}$ is strictly stationary and ergodic, and $\|1/\sigma_{jt}^2\| \le 1/\omega_j < \infty$,

\[ n^{-1}\sum_{t=1}^{n}\frac{1}{\sigma_{jt}^2} \to_p E\Big(\frac{1}{\sigma_{jt}^2}\Big) = \frac{1}{B_j^2}. \]
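The bound $1/\sigma_{jt}^2 \le 1/\omega_j$ used in (5.2.a) holds path by path, since the conditional variance never falls below its intercept. The following sketch illustrates this on simulated data, assuming a scalar GARCH(1,1) recursion $\sigma_t^2 = \omega + c\,e_{t-1}^2 + \phi\,\sigma_{t-1}^2$ with Gaussian innovations; the parameter values and function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def simulate_garch11(n, omega, c, phi, seed=0):
    """Simulate e_t = sigma_t * innov_t under an assumed GARCH(1,1) recursion."""
    rng = np.random.default_rng(seed)
    innov = rng.standard_normal(n)        # Gaussian innovations: an assumption
    sigma2 = np.empty(n)
    e = np.empty(n)
    sigma2[0] = omega / (1.0 - c - phi)   # start at the unconditional variance
    e[0] = np.sqrt(sigma2[0]) * innov[0]
    for t in range(1, n):
        sigma2[t] = omega + c * e[t - 1] ** 2 + phi * sigma2[t - 1]
        e[t] = np.sqrt(sigma2[t]) * innov[t]
    return e, sigma2


def inverse_variance_summary(n=5000, omega=0.1, c=0.1, phi=0.8, seed=0):
    """Return (max of 1/sigma^2, sample mean of 1/sigma^2, the bound 1/omega)."""
    _, sigma2 = simulate_garch11(n, omega, c, phi, seed)
    inv = 1.0 / sigma2
    return inv.max(), inv.mean(), 1.0 / omega


if __name__ == "__main__":
    largest, average, bound = inverse_variance_summary()
    print(largest, average, bound)
```

The sample mean printed here plays the role of the estimate of $E(1/\sigma_{jt}^2)$ in this sketch; the ergodic theorem is what justifies its convergence.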

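(5.2.b) Show $n^{-1}\sum_{t=1}^{n}(h_j(L)e_{jt-1})^2/\sigma_{jt}^4 \to_p \Xi_j$, where

\[ (h_j(L)e_{jt-1})^2 = \sum_{k=1}^{t-1}h_{j,k}^2e_{jt-k}^2 + \sum_{k\ne l}h_{j,k}h_{j,l}\,e_{jt-k}e_{jt-l}. \]

Because $e_{jt-k}^2/\sigma_{jt}^4$ is strictly stationary and ergodic, for all $k \ge 1$, and $\big\|\sum_{k=1}^{\infty}h_{j,k}^2e_{jt-k}^2/\sigma_{jt}^4\big\| < \infty$,

\[ n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty}\frac{h_{j,k}^2e_{jt-k}^2}{\sigma_{jt}^4} \to_p \sum_{k=1}^{\infty}h_{j,k}^2E\Big(\frac{e_{jt-k}^2}{\sigma_{jt}^4}\Big). \]

Also, $n^{-1}\sum_{t=1}^{n}\sum_{k=t}^{\infty}h_{j,k}^2E(e_{jt-k}^2/\sigma_{jt}^4) = o(1)$ since $\sum_{k=t}^{\infty}h_{j,k}^2E(e_{jt-k}^2/\sigma_{jt}^4) = O(\phi_j^{t})$.

Thus,

\[ n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}\frac{h_{j,k}^2e_{jt-k}^2}{\sigma_{jt}^4} = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty}\frac{h_{j,k}^2e_{jt-k}^2}{\sigma_{jt}^4} + o_p(1) \to_p \sum_{k=1}^{\infty}h_{j,k}^2E(e_{jt-k}^2/\sigma_{jt}^4) = \Xi_j. \]

Because $n^{-1}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,e_{jt-k}e_{jt-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \ne l$, $n^{-1}\sum_{t=1}^{n}(h_j(L)e_{jt-1})^2/\sigma_{jt}^4 \to_p \Xi_j$.

The other parts entail $\{e_{jt-k}e_{jt-l}\eta_{jt}/\sigma_{jt}^4,\ e_{jt-k}e_{jt}/\sigma_{jt}^4,\ \eta_{jt}/\sigma_{jt}^2\}$ for $k, l \ge 1$. These processes are martingale difference sequences, and their sample moments converge to zero.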

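Therefore, $n^{-1}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\tau\,\partial\tau' \to_p -\Lambda'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda$.

(5.3) Show $n^{-3/2}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\beta\,\partial\tau' \Rightarrow -A'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda \otimes \int_0^1 W_2$, where

\[
\begin{aligned}
\frac{\partial^2 l_t(\theta_0)}{\partial\beta\,\partial\tau'} = -\sum_{j=1}^{m}A_j'\Lambda_j \otimes \Bigg[&\frac{x_{2t-1}}{\sigma_{jt}^2} + 2\,\frac{(h_j(L)x_{2t-2}e_{jt-1})(h_j(L)e_{jt-1})}{\sigma_{jt}^4}(1+2\eta_{jt})\\
&- 2\,\frac{x_{2t-1}e_{jt}(h_j(L)e_{jt-1})}{\sigma_{jt}^4} - 2\,\frac{(h_j(L)x_{2t-2}e_{jt-1})e_{jt}}{\sigma_{jt}^4} - \frac{h_j(L)x_{2t-2}\,\eta_{jt}}{\sigma_{jt}^2}\Bigg].
\end{aligned}
\]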

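(5.3.a) Show $n^{-3/2}\sum_{t=1}^{n}x_{2t-1}/\sigma_{jt}^2 \Rightarrow (1/B_j^2)\int_0^1 W_2$.

Assumption 1 implies that $\{\sigma_{jt}^2\}$ is $\beta$-mixing with exponential decay, and so is $\{1/\sigma_{jt}^2\}$. Because $\|1/\sigma_{jt}^2\|_m \le 1/\omega_j < \infty$ for some $m > 2$, we can show that $n^{-1}\sum_{t=1}^{n}x_{2t-1}\big(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\big) = O_p(1)$. Thus,

\[ n^{-3/2}\sum_{t=1}^{n}x_{2t-1}/\sigma_{jt}^2 \Rightarrow E(1/\sigma_{jt}^2)\int_0^1 W_2 = (1/B_j^2)\int_0^1 W_2. \]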

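(5.3.b) Show $n^{-3/2}\sum_{t=1}^{n}(h_j(L)x_{2t-2}e_{jt-1})(h_j(L)e_{jt-1})/\sigma_{jt}^4 \Rightarrow \Xi_j\int_0^1 W_2$, where

\[ (h_j(L)x_{2t-2}e_{jt-1})(h_j(L)e_{jt-1}) = \sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-k-1}e_{jt-k}^2 + \sum_{k\ne l}h_{j,k}h_{j,l}\,x_{2t-k-1}e_{jt-k}e_{jt-l}. \]

First, we show that

\[ n^{-1}\sum_{t=1}^{n}\phi_j^{k}\,x_{2t-k-1}s_{jt,k} = O_p(1), \]

where $s_{jt,k} = \phi_j^{k}\big(e_{jt-k}^2/\sigma_{jt}^4 - E(e_{jt-k}^2/\sigma_{jt}^4)\big)$ for all $k \ge 1$. Note that $s_{jt,k}$ is $\beta$-mixing with exponential decay and $\|s_{jt,k}\|_m \le (2/\omega_j)\|\varepsilon_{jt}\|_{2m}^2 < \infty$ for some $m > 2$ and for all $k \ge 1$, where $\varepsilon_{jt} = e_{jt}/\sigma_{jt}$ denotes the standardized innovation. Also, $n^{-3/2}\sum_{t=1}^{n}\sum_{k=t}^{\infty}h_{j,k}^2\,x_{2t-1}E(e_{jt-k}^2/\sigma_{jt}^4) = o_p(1)$ since $n^{-1/2}x_{2t-1} = O_p(1)$ and $\sum_{k=t}^{\infty}h_{j,k}^2E(e_{jt-k}^2/\sigma_{jt}^4) = O(\phi_j^{t})$.

Therefore,

\[
\begin{aligned}
n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}\frac{h_{j,k}^2\,x_{2t-k-1}e_{jt-k}^2}{\sigma_{jt}^4}
&= n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-k-1}E(e_{jt-k}^2/\sigma_{jt}^4) + o_p(1)\\
&= n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2t-1}E(e_{jt-k}^2/\sigma_{jt}^4) + o_p(1)\\
&= n^{-3/2}\sum_{t=1}^{n}x_{2t-1}\sum_{k=1}^{\infty}h_{j,k}^2E(e_{jt-k}^2/\sigma_{jt}^4) + o_p(1)\\
&\Rightarrow \Xi_j\int_0^1 W_2.
\end{aligned}
\]

Second, we show $n^{-3/2}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,x_{2t-k-1}e_{jt-k}e_{jt-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \ne l$.

Without loss of generality, we set $l > k$. We have

\[ n^{-1}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,x_{2t-k-1}e_{jt-k}e_{jt-l}/\sigma_{jt}^4 = O_p(1). \]

Because $h_{j,k}h_{j,l}$ decays exponentially, $n^{-1}\sum_{l>k}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,x_{2t-k-1}e_{jt-k}e_{jt-l}/\sigma_{jt}^4 = O_p(1)$.

The other parts entail martingale difference sequences, and are asymptotically negligible. Therefore,

\[ n^{-3/2}\sum_{t=1}^{n}\frac{\partial^2 l_t(\theta_0)}{\partial\beta\,\partial\tau'} \Rightarrow -A'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda \otimes \int_0^1 W_2. \quad □ \]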

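Proof of Theorem 3. The parameter vector of the model with an intercept can be defined as $\theta = (\beta', \tau', \theta_2')'$. First, show that $H_{n\theta_2\tau}(\theta_0) = o_p(1)$, where $H_{n\theta_2\tau}(\theta_0) = n^{-1}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\theta_2\,\partial\tau'$.

(6.1) Show $H_{n\alpha\tau}(\theta_0) = o_p(1)$, where $H_{n\alpha\tau}(\theta_0) = n^{-1}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\alpha\,\partial\tau'$ and

\[
\begin{aligned}
\frac{\partial^2 l_t(\theta_0)}{\partial\alpha\,\partial\tau'} = -\sum_{j=1}^{p}\Lambda_{j0}'\Lambda_{j0} \otimes \Bigg[&\frac{w_{t-1}}{\sigma_{jt}^2} + 2\,\frac{(h_j(L)w_{t-2}e_{jt-1})(h_j(L)e_{jt-1})}{\sigma_{jt}^4}(1+2\eta_{jt})\\
&- 2\,\frac{(h_j(L)w_{t-2}e_{jt-1})e_{jt}}{\sigma_{jt}^4} - 2\,\frac{w_{t-1}e_{jt}(h_j(L)e_{jt-1})}{\sigma_{jt}^4} - \frac{h_j(L)w_{t-2}\,\eta_{jt}}{\sigma_{jt}^2}\Bigg].
\end{aligned}
\]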

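(6.1.a) Because $w_{t-1}/\sigma_{jt}^2$ is strictly stationary and ergodic, and $\|w_{t-1}/\sigma_{jt}^2\| < \infty$, $n^{-1}\sum_{t=1}^{n}w_{t-1}/\sigma_{jt}^2 \to_p 0$.

(6.1.b) Show $n^{-1}\sum_{t=1}^{n}(h_j(L)w_{t-2}e_{jt-1})(h_j(L)e_{jt-1})/\sigma_{jt}^4 = o_p(1)$, where

\[ (h_j(L)w_{t-2}e_{jt-1})(h_j(L)e_{jt-1}) = \sum_{k=1}^{t-1}h_{j,k}^2\,w_{t-k-1}e_{jt-k}^2 + \sum_{k\ne l}h_{j,k}h_{j,l}\,w_{t-k-1}e_{jt-k}e_{jt-l}. \]

Since $\sup_t\|\phi_j^{k}w_{t-k-1}e_{jt-k}^2/\sigma_{jt}^4\|_m < \infty$ for some $m > 2$,

\[ n^{-1/2}\sum_{t=1}^{n}\phi_j^{k}\,w_{t-k-1}e_{jt-k}^2/\sigma_{jt}^4 = O_p(1). \]

Because $h_{j,k}$ decays exponentially, $n^{-1/2}\sum_{k=1}^{t-1}\sum_{t=1}^{n}h_{j,k}^2\,w_{t-k-1}e_{jt-k}^2/\sigma_{jt}^4 = O_p(1)$. In the same way, $n^{-1/2}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,w_{t-k-1}e_{jt-k}e_{jt-l}/\sigma_{jt}^4 = O_p(1)$ for $k \ne l$.

Thus, $n^{-1}\sum_{t=1}^{n}(h_j(L)w_{t-2}e_{jt-1})(h_j(L)e_{jt-1})/\sigma_{jt}^4 = o_p(1)$.

The other parts entail

\[ \left\{\frac{w_{t-k-1}e_{jt-k}e_{jt-l}\eta_{jt}}{\sigma_{jt}^4},\ \frac{w_{t-k-1}e_{jt-k}e_{jt}}{\sigma_{jt}^4},\ \frac{w_{t-k-1}\eta_{jt}}{\sigma_{jt}^2}\right\} \quad\text{for } k, l \ge 1. \]

These processes are MDS, and therefore $H_{n\alpha\tau}(\theta_0) = o_p(1)$.

In the same way, we can show that $H_{n\theta_2\tau}(\theta_0) = o_p(1)$.

Using the block diagonality of the Hessian matrix,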

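\[
\begin{pmatrix} n(\hat\beta_n - \beta_0)\\ \sqrt{n}(\hat\tau_n - \tau_0)\end{pmatrix}
\Rightarrow
\begin{pmatrix}
\nu \otimes \int W_2W_2' & A'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda \otimes \int W_2\\
\Lambda'\Sigma^{-1/2}N\Sigma^{-1/2}A \otimes \int W_2' & \Lambda'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda
\end{pmatrix}^{-1}
\begin{pmatrix}\int dW_1 \otimes W_2\\ \Lambda'(Z - Q)\end{pmatrix}.
\]

Thus,

\[ n(\hat\beta_n - \beta_0) \Rightarrow V^{-1}U, \]

where

\[ U = \int dW_1 \otimes W_2 - \Big(A'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda \otimes \int W_2\Big)\big(\Lambda'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda\big)^{-1}\Lambda'(Z - Q), \]

\[ V = \nu \otimes \int W_2W_2' - \Big(A'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda \otimes \int W_2\Big)\big(\Lambda'\Sigma^{-1/2}N\Sigma^{-1/2}\Lambda\big)^{-1}\Big(\Lambda'\Sigma^{-1/2}N\Sigma^{-1/2}A \otimes \int W_2'\Big). \]

We use the following:

\[ U = \int dW_1 \otimes W_2 - W_1(1) \otimes \int W_2 = \int dW_1 \otimes \bar W_2, \]

\[ V = \nu \otimes \int W_2W_2' - \nu \otimes \int W_2\int W_2' = \nu \otimes \int \bar W_2\bar W_2', \]

where $W_1 = \alpha'\Lambda'(Z - Q)$ and $\bar W_2 = W_2 - \int W_2$.

Therefore,

\[ n(\hat\beta_n - \beta_0) \Rightarrow \Big(\nu \otimes \int \bar W_2\bar W_2'\Big)^{-1}\int dW_1 \otimes \bar W_2 = \mathrm{vec}\left(\Big(\int \bar W_2\bar W_2'\Big)^{-1}\int \bar W_2\,dW_1'\,\nu^{-1}\right). \]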

In the same way, n ~bn b0 ) vecRW 2W

02 1

RW 2 dW

01mr. &

References

Ahn, S.K., Reinsel, G.C., 1988. Nested reduced-rank autoregressive models for multiple time series. Journal of the American Statistical Association 83, 849-856.

Amemiya, T., 1973. Regression analysis when the variance of the dependent variable is proportional to the square of its expectation. Journal of the American Statistical Association 68, 928-934.

Andrews, D., 1987. Consistency in nonlinear econometric models: a generic uniform law of large numbers. Econometrica 55, 1465-1471.

Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307-327.

Bollerslev, T., 1990. Modeling coherence in short-run nominal exchange rates: a multivariate generalized ARCH approach. Review of Economics and Statistics 72, 498-505.

Bollerslev, T., Engle, R., Wooldridge, J., 1988. A capital asset pricing model with time varying covariances. Journal of Political Economy 96, 116-131.

Bollerslev, T., Chou, R.Y., Kroner, K.F., 1992. ARCH modeling in finance: a review of the theory and empirical evidence. Journal of Econometrics 52, 5-59.

Box, G.E.P., Tiao, G.C., 1977. A canonical analysis of multiple time series. Biometrika 64, 355-365.

Carrasco, M., Chen, X., 2002. Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17-39.

Chanda, K., 1974. Strong mixing properties of linear stochastic processes. Journal of Applied Probability 11, 401-408.

Engle, R., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1008.

Engle, R., Granger, C., 1987. Cointegration and error correction: representation, estimation, and testing. Econometrica 55, 251-276.

Gorodetskii, V., 1977. On the strong mixing property for linear sequences. Theory of Probability and Its Applications 22, 411-413.

Hansen, B.E., 1992. Convergence to stochastic integrals for dependent heterogeneous processes. Econometric Theory 8, 489-500.

Hansen, B.E., 1995. Regression with nonstationary volatility. Econometrica 63, 1113-1132.

He, C., Terasvirta, T., 1999. Properties of moments of a family of GARCH processes. Journal of Econometrics 92, 173-192.

Johansen, S., 1988. Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control 12, 231-254.

Johansen, S., 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551-1580.

Kurtz, T., Protter, P., 1991. Weak limit theorems to stochastic integrals and stochastic differential equations. Annals of Probability 19, 1035-1070.

Lee, S.W., Hansen, B.E., 1994. Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator. Econometric Theory 10, 29-52.

Li, W.K., Ling, S., Wong, H., 2001. Estimation for partially nonstationary multivariate autoregressive models with conditional heteroskedasticity. Biometrika 88, 1135-1152.

Ling, S., Li, W.K., 1998. Limiting distributions of maximum likelihood estimators for unstable autoregressive moving-average time series with general autoregressive heteroskedastic errors. Annals of Statistics 26, 84-125.

Ling, S., Li, W.K., 2003. Asymptotic inference for unit root processes with GARCH(1,1) errors. Econometric Theory 19, 541-564.

Ling, S., McAleer, M., 2003. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19, 280-310.

Lumsdaine, R.L., 1996. Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH (1,1) and covariance stationary GARCH (1,1) models. Econometrica 64, 575-596.

Newey, W., 1991. Uniform convergence in probability and stochastic equicontinuity. Econometrica 59, 1161-1167.

Phillips, P.C.B., 1991. Optimal inference in cointegrated system. Econometrica 59, 283-306.

Phillips, P.C.B., Durlauf, S.N., 1986. Multiple time series with integrated variables. Review of Economic Studies 53, 473-495.

Saikkonen, P., 1993. Continuous weak convergence and stochastic equicontinuity results for integrated processes with an application to the estimation of a regression model. Econometric Theory 9, 155-188.

Saikkonen, P., 1995. Problems with the asymptotic theory
