Journal of Econometrics 137 (2007) 68–111
www.elsevier.com/locate/jeconom

Asymptotic distribution of the cointegrating vector estimator in error correction models with conditional heteroskedasticity

Byeongseon Seo

Department of Economics, Soongsil University and Texas A&M University, Seoul 156-743, Korea
Tel.: +82 2 820 0552; fax: +82 2 824 4384. E-mail address: [email protected].

Available online 12 May 2006

Abstract

This paper explores the asymptotic distribution of the cointegrating vector estimator in error correction models with conditionally heteroskedastic errors. Asymptotic properties of the maximum likelihood estimator (MLE) of the cointegrating vector, which estimates the cointegrating vector and the multivariate GARCH process jointly, are provided. The MLE of the cointegrating vector follows mixture normal, and its asymptotic distribution depends on the conditional heteroskedasticity and the kurtosis of standardized innovations. The reduced rank regression (RRR) estimator and the regression-based cointegrating vector estimators do not consider conditional heteroskedasticity, and thus the efficiency gain of the MLE emerges as the magnitude of conditional heteroskedasticity increases. The simulation results indicate that the relative power of the t-statistics based on the MLE improves significantly as the GARCH effect increases.

0304-4076/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2006.03.008

JEL classification: C13; C32

Keywords: Cointegrating vector; Efficiency gain; Multivariate GARCH

1. Introduction

The notion of cointegration was developed by Engle and Granger (1987), and since then has been considered important in the recent development of time series econometrics. Many statistical methods have been developed for the analysis of the cointegrated systems,



and several methods of estimating the cointegrating vector have been proposed. Another development, generalized autoregressive conditional heteroskedasticity (GARCH), was made by Engle (1982) and Bollerslev (1986) to explain the time-varying volatility in the data. This paper explores the asymptotic properties of the maximum likelihood estimator (MLE) of the cointegrating vector in the vector error correction model with conditional heteroskedasticity. Because the existing estimation methods do not consider conditional heteroskedasticity in the data, this study is useful and required.

The main objective is to develop the asymptotic properties of the MLE of the cointegrating vector, which estimates the error correction model and the multivariate GARCH process jointly. The existing estimation methods, including the reduced rank regression (RRR) and the regression-based estimators, allow for, but do not treat explicitly, conditional heteroskedasticity. Their asymptotic distributions are invariant to conditional heteroskedasticity. However, these estimators ignore the information coming from conditional heteroskedasticity. Many authors, including Bollerslev et al. (1992), show that economic variables such as stock prices and exchange rates have time-varying variances. The clustered volatility and thick-tailed distribution are typical characteristics of these variables. Although there is vast literature on the cointegrating vector and GARCH, the literature on the distribution theory for the cointegrating vector estimator with conditionally heteroskedastic errors is still sparse. This paper fills this gap in the literature by developing an asymptotic theory for the cointegrating vector estimator in error correction models with conditional heteroskedasticity.

In this paper, we find that the MLE of the cointegrating vector follows mixture normal, and its asymptotic distribution depends on the conditional heteroskedasticity and the kurtosis of standardized errors. The RRR and the regression-based cointegrating vector estimators do not consider conditional heteroskedasticity in the data, and thus the MLE improves efficiency significantly. Statistical inference on the cointegrating vector also depends on heteroskedasticity. The simulation study reveals that the efficiency gain of the MLE emerges significantly as the GARCH effect increases.

The limiting distribution of the cointegrating vector estimator with heteroskedastic errors has been explored by Li et al. (2001) and Seo (2001). Li et al. (2001) investigated the limiting distribution of the cointegrating vector estimator in the partially nonstationary vector autoregressive model with ARCH(1) errors. We consider the multivariate GARCH errors, which is a natural extension considering the stylized facts of the real data. The distribution theory of the cointegrating vector estimator, found by Li et al. (2001), depends on two correlated Brownian motions, which implies nonstandard asymptotic distribution. In this paper, we show that the MLE of the cointegrating vector follows the mixed normal distribution, and provide an explicit analysis of efficiency gain. This study also extends Seo (2001) by allowing for multiple cointegration rank.

There are other related papers by Wong and Li (1997), Ling and McAleer (2003), Ling and Li (1998, 2003), and Seo (1999). Wong and Li (1997) and Ling and McAleer (2003) consider the vector autoregressive model with the GARCH errors, but they do not consider nonstationarity and cointegration. Ling and Li (1998, 2003) and Seo (1999) explore the asymptotic theory for unit root tests with conditional heteroskedasticity. Here, we consider the cointegrating vector, and thus extend the former results to the nonstationary cointegrated models.

We denote $\to_p$ as convergence in probability, $\to_d$ as convergence in distribution, respectively, and $\Rightarrow$ as weak convergence with respect to the uniform metric. $BM(\Omega)$ represents a Brownian motion with long-run variance $\Omega$. Also, $[\cdot]$ is the integer operator, $|\cdot|$ is the Euclidean norm, and $\mathrm{vec}(\cdot)$ is the column-stacking operator.

The paper is organized as follows. Section 2 introduces the model and the cointegrating vector estimators. Section 3 develops the asymptotic theory for the cointegrating vector estimators. The error correction model with an intercept is analyzed in Section 4. Section 5 deals with simulation results on the properties of the cointegrating vector estimators.

    2. The model

Consider a $p$-dimensional time series $x_t$ generated by the error correction model (ECM) as follows:

$$\Delta x_t = \alpha \begin{pmatrix} I_r \\ -\beta \end{pmatrix}' x_{t-1} + \sum_{i=1}^{l} \Gamma_i \Delta x_{t-i} + u_t, \tag{1}$$

where $\alpha$ is the $p \times r$ adjustment vector, and $\beta$ is the $(p-r) \times r$ cointegrating vector.

We assume that the cointegration rank is known and equals $r$. Thus, if we denote Eq. (1) as $\Pi(L)x_t = u_t$, then the rank of $\Pi = -\Pi(1)$ is $r$. We use the normalization of the cointegrating vector with respect to the first $r$ elements of $x_t$. According to our normalization, the cointegrating relationship $w_t$ is defined as follows:

$$w_t(\beta) = x_{1t} - \beta' x_{2t}, \tag{2}$$

where $x_{1t}$ is $r$-dimensional and $x_{2t}$ is $(p-r)$-dimensional.

As defined in Engle and Granger (1987), the cointegrating relationship is stationary. Our model is based on the normalization (2). The cointegrating vector can be identified from this representation. The same normalization has been used in many studies such as Phillips (1991).

The error process $u_t$ is assumed to be a vector-valued martingale difference sequence (MDS) satisfying $E(u_t \mid \mathcal{F}_{t-1}) = 0$ and $E(u_t u_t' \mid \mathcal{F}_{t-1}) = \Omega_t$, where $\mathcal{F}_t$ is the $\sigma$-field generated by $x_{t-i}$ for $i = 0, 1, 2, \ldots$. Thus, our model allows for the time-varying conditional variance, which generalizes the error condition of Engle and Granger (1987) and Johansen (1988, 1991).

Many models of multivariate conditional heteroskedasticity have been developed to explain time-varying covariance, common persistence, and volatility causality. Bollerslev et al. (1988) proposed vector GARCH and diagonal GARCH models. Each element of covariance follows the GARCH process, and thus we need to estimate a huge number of parameters.[1] Bollerslev (1990) proposed a multivariate GARCH model with constant conditional correlation. This model reduces the number of parameters to a manageable size and it satisfies positive definiteness, and so the model has been used in many empirical studies.

[1] For example, if $p = 3$, the vector GARCH model has 78 parameters and the diagonal GARCH model has 18 parameters even though we assume a minimal lag order.

Our model is based on the constant-correlation GARCH specification, which has been proposed by Bollerslev (1990):

$$\Omega_t = \Lambda^{-1} \Sigma_t \Lambda^{-1\prime}, \tag{3}$$

where $\Lambda$ is a lower triangular matrix and $\Sigma_t$ is a diagonal matrix as follows:

$$\Sigma_t = \begin{pmatrix} \sigma^2_{1t} & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2_{2t} & 0 & \cdots & 0 \\ 0 & 0 & \sigma^2_{3t} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2_{pt} \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \lambda_{21} & 1 & 0 & \cdots & 0 \\ \lambda_{31} & \lambda_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \lambda_{p1} & \lambda_{p2} & \lambda_{p3} & \cdots & 1 \end{pmatrix}.$$

Define $e_t = \Lambda u_t$, where $e_t$ is an orthogonalized innovation of $u_t$, satisfying $E(e_t \mid \mathcal{F}_{t-1}) = 0$ and $E(e_t e_t' \mid \mathcal{F}_{t-1}) = \Sigma_t$. We assume that each element of $e_t$ follows the GARCH process as follows:

$$\sigma^2_{jt} = \omega_j + \psi_j e^2_{j,t-1} + \phi_j \sigma^2_{j,t-1}, \tag{4}$$

where $\omega_j > 0$, $\psi_j \ge 0$, and $\phi_j \ge 0$ for $j = 1, 2, \ldots, p$.

We note that our model is the vector error correction model with multivariate GARCH errors. The RRR estimator is based on the mean equation, but the MLE estimates the mean and volatility equations jointly. We use the multivariate GARCH model with constant correlation coefficient, and our analysis can be extended to other specifications such as the factor GARCH and the asymmetric GARCH models.

If we denote $X_{t-1}(\beta)$ as the vector of stationary regressors and $U$ as its coefficient matrix, then the mean equation (1) can be written as follows:

$$\Delta x_t = U X_{t-1}(\beta) + u_t, \tag{5}$$

where $X_{t-1}(\beta) = (w_{t-1}(\beta)', \Delta x_{t-1}', \ldots, \Delta x_{t-l}')'$ and $U = (\alpha, \Gamma_1, \Gamma_2, \ldots, \Gamma_l)$.

We define the parameter vector $\theta = (b', \theta_2')'$, where $b = \mathrm{vec}(\beta)$, $\theta_2 = (\mathrm{vec}(U)', \gamma', \lambda')'$, $\gamma = (\gamma_1', \gamma_2', \ldots, \gamma_p')'$, $\gamma_j = (\omega_j, \psi_j, \phi_j)'$ for $j = 1, 2, \ldots, p$, and $\lambda = (\lambda_{21}, \lambda_{31}, \lambda_{32}, \ldots, \lambda_{p,p-1})'$. Let $\theta_0$ be the true parameter value. We denote $u_t = u_t(\theta_0)$, $e_t = e_t(\theta_0)$, and $\Sigma_t = \Sigma_t(\theta_0)$.
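The dimension of the parameter vector can be checked by counting its blocks. The sketch below is an illustration only: the function names `theta_dimension` and `theta_dimension_closed_form` are hypothetical, and the count assumes GARCH(1,1) volatility equations as in Eq. (4) and a unit lower triangular matrix in Eq. (3). It compares the block count with the closed form $r(2p-r) + p(pl + (p-1)/2 + 3)$.

```python
def theta_dimension(p: int, r: int, l: int) -> int:
    """Count the free parameters in theta block by block (illustrative helper)."""
    n_b = (p - r) * r            # b = vec(beta), beta is (p - r) x r
    n_u = p * (r + p * l)        # U = (alpha, Gamma_1, ..., Gamma_l) is p x (r + p*l)
    n_gamma = 3 * p              # gamma_j = (omega_j, psi_j, phi_j)' for j = 1, ..., p
    n_lambda = p * (p - 1) // 2  # below-diagonal elements of the unit lower triangular matrix
    return n_b + n_u + n_gamma + n_lambda


def theta_dimension_closed_form(p: int, r: int, l: int) -> int:
    """Closed form r(2p - r) + p(pl + (p - 1)/2 + 3), kept in integer arithmetic."""
    return r * (2 * p - r) + p * p * l + p * (p - 1) // 2 + 3 * p


if __name__ == "__main__":
    # Example: p = 3 variables, cointegration rank r = 1, one lagged difference.
    print(theta_dimension(3, 1, 1), theta_dimension_closed_form(3, 1, 1))
```

Even this small system has a few dozen parameters, which is why parsimony of the volatility specification matters for numerical maximization.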

Define the parameter space $\Theta$ as $\theta \in \Theta \subset R^k$, where $k = r(2p - r) + p(pl + (p-1)/2 + 3)$.

Let $\Sigma = E(\Sigma_t) < \infty$ be the unconditional variance of the orthogonalized errors $e_t$, which requires $\phi_j + \psi_j < 1$ for all $j = 1, 2, \ldots, p$. Thus, the volatility process is stationary, which implies a moving average representation.

We define the following:

$$\sigma^2_{jt} = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{\infty} \phi_j^k e^2_{j,t-k-1},$$

for $j = 1, 2, \ldots, p$.

The process $\sigma^2_{jt}$ follows the law of motion (4) with infinite past history. However, based on a sample of $\{x_1, x_2, \ldots, x_n\}$, the volatility process $\sigma^2_{jt}$ cannot be observed by an econometrician.

The volatility process (4), given the startup condition $\sigma^2_{j0} = \omega_j/(1 - \phi_j)$, has a moving average representation in the form

$$\sigma^2_{jt} = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{t-1} \phi_j^k e^2_{j,t-k-1},$$

for $t = 1, 2, 3, \ldots, n$ and $j = 1, 2, \ldots, p$.
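The finite-horizon moving average representation is an algebraic consequence of the GARCH(1,1) recursion and the startup value $\omega_j/(1-\phi_j)$, so it can be checked numerically for any input sequence. The following is a minimal sketch for a single series $j$: the function names are hypothetical, and the innovations are arbitrary placeholder draws rather than output from an estimated model.

```python
import random


def sigma2_recursive(e, omega, psi, phi):
    """sigma^2_t for t = 0, ..., n from the GARCH(1,1) recursion, started at omega / (1 - phi)."""
    s2 = [omega / (1.0 - phi)]
    for t in range(1, len(e)):
        s2.append(omega + psi * e[t - 1] ** 2 + phi * s2[t - 1])
    return s2


def sigma2_moving_average(e, omega, psi, phi, t):
    """sigma^2_t from the finite-horizon moving average representation."""
    return omega / (1.0 - phi) + psi * sum(phi ** k * e[t - k - 1] ** 2 for k in range(t))


def max_discrepancy(n=200, omega=0.1, psi=0.15, phi=0.8, seed=1):
    """Largest absolute gap between the two representations over t = 0, ..., n."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]  # placeholder values for e_0, ..., e_n
    s2 = sigma2_recursive(e, omega, psi, phi)
    return max(abs(s2[t] - sigma2_moving_average(e, omega, psi, phi, t)) for t in range(n + 1))
```

The two functions agree up to floating-point rounding, which is the sense in which the recursion "has" the moving average form once the startup value is fixed.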

The distribution theory of the GARCH process has been based on the unobserved volatility representation because the finite horizon representation is not stationary. As discussed in Lee and Hansen (1994) and Lumsdaine (1996), the initial conditions are asymptotically negligible, and the distribution theory using the finite horizon representation is asymptotically equivalent to that of the infinite horizon representation given some regularity conditions. This paper develops the distribution theory in accordance with the asymptotic equivalence of these two volatility representations.

The log-likelihood function, with the auxiliary condition that $u_t \mid \mathcal{F}_{t-1} \sim N(0, \Omega_t)$, is given by

$$L_n(\theta) = n^{-1} \sum_{t=1}^{n} l_t(\theta), \tag{6}$$

where

$$l_t(\theta) = -0.5 \log|\Omega_t(\theta)| - 0.5\, u_t(\theta)' \Omega_t^{-1}(\theta) u_t(\theta) = -0.5 \log|\Sigma_t(\theta)| - 0.5\, e_t(\theta)' \Sigma_t^{-1}(\theta) e_t(\theta) = -0.5 \sum_{j=1}^{p} \left( \log \sigma^2_{jt}(\theta) + \frac{e^2_{jt}(\theta)}{\sigma^2_{jt}(\theta)} \right),$$

where $\sigma^2_{jt}(\theta)$ and $e_{jt}(\theta)$ satisfy Eqs. (1)–(4).

The MLE $\hat\theta_n$ can be defined as follows:

$$\hat\theta_n = \arg\max_{\theta \in \Theta} L_n(\theta). \tag{7}$$
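Given residuals from the mean equation, the objective in Eq. (6) can be evaluated with one pass over the sample. The sketch below is a hedged illustration, not the author's implementation: the function name `average_loglik` is hypothetical, the first residual is treated as the time-0 observation used only to start the recursion, and the startup variance $\omega_j/(1-\phi_j)$ follows the finite-horizon representation. As in $l_t(\theta)$ above, the additive Gaussian constant is omitted.

```python
import math


def average_loglik(u, lam, omega, psi, phi):
    """L_n = n^{-1} sum_t l_t from residuals u_0, ..., u_n (each a length-p list).

    lam is the unit lower triangular matrix of Eq. (3) as a list of rows;
    omega, psi, phi hold the GARCH(1,1) parameters of each orthogonalized series.
    """
    p = len(omega)
    # Orthogonalized innovations: e_t = lam @ u_t.
    e = [[sum(lam[j][i] * u_t[i] for i in range(p)) for j in range(p)] for u_t in u]
    s2 = [omega[j] / (1.0 - phi[j]) for j in range(p)]  # startup sigma^2_{j0}
    total = 0.0
    for t in range(1, len(e)):
        for j in range(p):
            # Law of motion (4), then the j-th term of l_t.
            s2[j] = omega[j] + psi[j] * e[t - 1][j] ** 2 + phi[j] * s2[j]
            total += -0.5 * (math.log(s2[j]) + e[t][j] ** 2 / s2[j])
    return total / (len(e) - 1)
```

In practice the residuals themselves depend on the mean parameters, so a numerical optimizer would recompute them from Eq. (5) at each trial value before calling a function of this kind.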

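We use the following derivatives:

$$\frac{\partial l_t(\theta)}{\partial b} = -\sum_{j=1}^{p} A_j' \otimes \left[ \frac{x_{2,t-1} e_{jt}(\theta)}{\sigma^2_{jt}(\theta)} - \frac{h_j(L) x_{2,t-2} e_{j,t-1}(\theta)\, \eta_{jt}(\theta)}{\sigma^2_{jt}(\theta)} \right],$$

$$\frac{\partial^2 l_t(\theta)}{\partial b\, \partial b'} = -\sum_{j=1}^{p} A_j' A_j \otimes \Bigg[ \frac{x_{2,t-1} x_{2,t-1}'}{\sigma^2_{jt}(\theta)} + 2\, \frac{\big(h_j(L) x_{2,t-2} e_{j,t-1}(\theta)\big)\big(h_j(L) x_{2,t-2}' e_{j,t-1}(\theta)\big)}{\sigma^4_{jt}(\theta)} \big(1 + 2\eta_{jt}(\theta)\big) - 2\, \frac{\big(h_j(L) x_{2,t-2} e_{j,t-1}(\theta)\big) x_{2,t-1}' e_{jt}(\theta)}{\sigma^4_{jt}(\theta)} - 2\, \frac{x_{2,t-1} e_{jt}(\theta) \big(h_j(L) x_{2,t-2}' e_{j,t-1}(\theta)\big)}{\sigma^4_{jt}(\theta)} - \frac{h_j(L) x_{2,t-2} x_{2,t-2}'\, \eta_{jt}(\theta)}{\sigma^2_{jt}(\theta)} \Bigg],$$

where $\eta_{jt}(\theta) = e^2_{jt}(\theta)/\sigma^2_{jt}(\theta) - 1$, $h_j(L) x_{2,t-2} e_{j,t-1}(\theta) = \psi_j \sum_{k=0}^{t-1} \phi_j^k x_{2,t-k-2} e_{j,t-k-1}(\theta)$, and $A_j$ is the $j$th row of $A = \Lambda\alpha$ for $j = 1, 2, \ldots, p$, with $\Lambda$ the lower triangular matrix in Eq. (3).

Because the MLE $\hat\theta_n$ maximizes the likelihood function, we get $\sum_{t=1}^{n} \partial l_t(\hat\theta_n)/\partial\theta = 0$. In our model, the conditional variance depends on the cointegrating vector, and thus the first-order condition accompanies the volatility adjustment. The Hessian matrix and the outer product of gradients also entail the volatility adjustment.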

The likelihood function depends on a number of parameters, and redundant parameters may lead to the singularity error. In particular, the Hessian matrix tends to be near-singular when the volatility equation is specified with redundant parameters. Thus, it is necessary to achieve the parsimonious specification by using the associated diagnostic tests. In some cases, the factor GARCH model and the conditional error correction model can be used to reduce the number of redundant parameters. If the likelihood function is specified, the computation of the MLE is feasible in any statistical software that is capable of operating the maximum likelihood procedure.

The RRR estimator is based on the mean equation (5), which can be computed by using RRR (Ahn and Reinsel, 1988) or canonical analysis (Box and Tiao, 1977). Other slope parameters can be estimated by least squares once the cointegrating vector is estimated. We denote the RRR estimator as $\tilde\beta_n$ and other estimators as $\tilde U_n$. The RRR estimator $\tilde\beta_n$ is super-consistent, and thus its estimates can be used as the initial values for an algorithm to maximize the likelihood function.

    3. Main results

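If the cointegration rank is known and equals $r$, then there exist $p \times r$ full column rank matrices $\alpha$ and $\beta^*$ satisfying $\Pi = \alpha\beta^{*\prime}$. Let $\alpha_\perp$ and $\beta^*_\perp$ be $p \times (p-r)$ full column rank matrices such that $\alpha_\perp'\alpha = 0$ and $\beta^{*\prime}_\perp \beta^* = 0$. From the representation theorem by Engle and Granger (1987), the error correction model (1) has the following representation:

$$\Delta x_t = C(L) u_t, \tag{8}$$

$$x_t = C(1) \sum_{i=1}^{t} u_i + F(L) u_t, \tag{9}$$

$$w_t = \beta^{*\prime} x_t = \beta^{*\prime} F(L) u_t, \tag{10}$$

where $C(1) = \beta^*_\perp (\alpha_\perp' \Pi^*(1) \beta^*_\perp)^{-1} \alpha_\perp'$, $\Pi^*(L) = (\Pi(L) - \Pi(1))/(1 - L)$, and $F(L) = (C(L) - C(1))/(1 - L)$.

The ECM representation holds if $E|u_t|^2 < \infty$ and $\sum_{k=1}^{\infty} k|C_k| < \infty$. Thus, $x_t$ involves stochastic trends and a stationary component. Because the null space of $C(1)$ is spanned by the cointegration space, $\beta^{*\prime} C(1) = 0$ and $C(1)\alpha = 0$. The cointegration vector eliminates the stochastic trends; hence, the cointegrating relationship $w_t$ is stationary. We denote $C_2(1)$ as a partitioned matrix of $C(1)$ which corresponds to $x_{2t}$, hence its dimension is $(p-r) \times p$.

Define the standardized innovations $\varepsilon_t = \Sigma_t^{-1/2} e_t$. That is, $\varepsilon_{jt} = e_{jt}/\sigma_{jt}$ for $j = 1, 2, \ldots, p$. We assume the following conditions.

Assumption 1.
(a) $\varepsilon_{jt} \sim \text{i.i.d.}(0, 1)$, $E\varepsilon^6_{jt} < \infty$, and $\varepsilon_{jt}$ has a continuous and symmetric density for $j = 1, 2, \ldots, p$.
(b) $E|u_t|^{m_1} < \infty$ for some $m_1 > 2$.
(c) $\sum_{k=1}^{\infty} k^2 |C_k| < \infty$, where $C(L) = \sum_{k=0}^{\infty} C_k L^k$ and $\Delta x_t = C(L) u_t$.
(d) $\omega_j \ge \underline{\omega}_j > 0$ for $j = 1, 2, \ldots, p$.
(e) $\Theta$ is compact.
(f) For some $m_1 > m_2 > 2$, $\{w_{t-1}/\sigma^2_{jt},\ \Delta x_{t-i}/\sigma^2_{jt},\ w_{t-k-1} e^2_{j,t-k}/\sigma^4_{jt},\ \Delta x_{t-k-1} e^2_{j,t-k}/\sigma^4_{jt};\ i = 1, 2, \ldots, l;\ j = 1, 2, \ldots, p;\ k \ge 1\}$ is a zero mean, strictly stationary, and strong mixing process with mixing coefficient $\alpha_k = O(k^{-c})$ such that $c > m_1 m_2/(m_1 - m_2)$.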

Assumptions 1(a) and 1(b) imply that $\{\sigma^2_{jt}, e_{jt}\}$ is strictly stationary and $\beta$-mixing (or absolutely regular) with exponential decay for $j = 1, 2, \ldots, p$ as shown by Carrasco and Chen (2002) and He and Teräsvirta (1999). In addition, the volatility process $\{\sigma^2_{jt}\}$ is weakly stationary from Assumption 1(b), which justifies the moving average representation of $\{\sigma^2_{jt}\}$. Assumptions 1(b) and 1(c) imply that $\{\Delta x_t, w_t\}$ is square integrable. The volatility processes are strictly positive from Assumption 1(d). Assumption 1(e) implies that the parameter space is bounded. The volatility parameters are bounded from Assumption 1(b). Assumption 1(f) can be verified by assuming the smooth density condition because the ECM representations (8) and (10) imply that the process $\{w_t, \Delta x_t\}$ is stationary and satisfies the sufficient conditions for strong mixing suggested in Chanda (1974) and Gorodetskii (1977).

The multivariate invariance principle of Phillips and Durlauf (1986) implies the following:

Lemma 1. Under Assumption 1,

$$n^{-1/2} \sum_{t=1}^{[ns]} u_t \Rightarrow U(s) \equiv BM(\Omega), \tag{11}$$

$$n^{-1/2} x_{2[ns]} \Rightarrow C_2(1) U(s), \tag{12}$$

$$n^{-1/2} \sum_{t=1}^{[ns]} w_t \Rightarrow \beta^{*\prime} F(1) U(s), \tag{13}$$

where $\Omega = E(\Omega_t)$.

3.1. Stochastic equicontinuity

The asymptotic theory of the cointegrating vector estimator involves the tightness of the Hessian matrix and the parameter restriction, which can be verified if consistency holds. As Saikkonen (1993, 1995) has shown, the asymptotic distribution and consistency of the cointegrating vector estimator in nonstationary cointegrated models cannot be achieved by the standard tightness condition, which has been used in the model with stationary variables such as Andrews (1987) and Newey (1991). The cointegrated systems involve nonstationary variables with unbounded variance. Besides, the convergence rate of the cointegrating vector estimator is different from that of short-run parameters. Thus, an appropriate tightness condition is necessary to show the distribution theory.

We define a diagonal matrix $D_n = \mathrm{diag}(D_{1n}, D_{2n})$, where $D_{1n} = \mathrm{diag}(n, n, \ldots, n)$ and $D_{2n} = \mathrm{diag}(\sqrt{n}, \sqrt{n}, \ldots, \sqrt{n})$ correspond to the parameter vectors $b$ and $\theta_2$, respectively. The gradient vector, the Hessian matrix, and the outer product of gradients can be defined as follows:

$$G_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial\theta},$$

$$H_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial\theta\, \partial\theta'} D_n^{-1},$$

and

$$P_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial\theta} \frac{\partial l_t(\theta)}{\partial\theta'} D_n^{-1}.$$

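Definition 1 (Stochastic equicontinuity). $X_n(\theta)$ is stochastically equicontinuous on $\Theta_\delta$ if, for every $\epsilon > 0$ and $\eta > 0$, there exists $N(\epsilon, \eta)$ such that $n \ge N(\epsilon, \eta)$ implies, for all $\delta > 0$,

$$P\left( \sup_{\theta \in \Theta_\delta} |X_n(\theta) - X_n(\theta_0)| > \epsilon \right) \le \eta,$$

where $\Theta_\delta$ is the set of $\theta = (b', \theta_2')' \in \Theta$ whose normalized deviation $(\sqrt{n}\,(b - b_0)', (\theta_2 - \theta_{2,0})')'$ has Euclidean norm at most $\delta$.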

Our definition is the tightness condition of Saikkonen (1993, 1995), and it is based on the normalized parameter space to allow for difference in the convergence rates. We denote $X_n(\theta) - X_n(\theta_0) = o_p(1)$ if $X_n(\theta)$ is stochastically equicontinuous.

Lemma 2. (1) Under Assumption 1, $L_n(\theta) - L_n(\theta_0) = o_p(1)$.
(2) Assumption 1 implies

$$\lim_n P_{n,\theta_0}\left\{ \sup_{\theta \in N_\delta^c} \big(L_n(\theta) - L_n(\theta_0)\big) < 0 \right\} = 1,$$

for every $\delta > 0$, where $N_\delta^c = \{\theta \in \Theta : |\sqrt{n}(\beta - \beta_0)| > \delta\}$. Therefore, $\lim_n P_{n,\theta_0}\{\hat\theta_n \in N_\delta\} = 1$ and $\sqrt{n}(\hat\beta_n - \beta_0) \to_p 0$, where $N_\delta = \{\theta \in \Theta : |\sqrt{n}(\beta - \beta_0)| \le \delta\}$.
(3) If Assumption 1 holds and $\|u_t\|_6 < \infty$, then $H_n(\theta) - H_n(\theta_0) = o_p(1)$.

Lemma 2(1) shows that the likelihood function is stochastically equicontinuous on the local neighborhood of the true parameter value. As the parameter values deviate farther from the local neighborhood, the integrated regressors amplify the squared errors, which lowers the likelihood function sharply. However, the volatility increases at the same time, which moderates the decline in the likelihood function.

The MLE $\hat\theta_n$ exists because the likelihood function is continuous and the parameter space is compact. The consistency of the MLE $\hat\beta_n$ in Lemma 2(2) is based on the sufficient condition for consistency, which has been used in Wu (1981) and Saikkonen (1995). The standard theory of consistency does not apply as the model involves the different rates of convergence. However, this condition holds under Assumption 1, and hence the MLE $\hat\beta_n$ is consistent.

The consistency of the short-run parameters can be based on the consistency of the long-run parameters. Given the convergence rate of $\hat\beta_n$, the analysis of the ECM reduces to that of the stationary VAR. The standard theory of consistency such as Ling and McAleer (2003) can be applied to show the consistency of the short-run parameters.

Lemma 2(3) shows that the tightness condition of the Hessian matrix can be satisfied under the moment condition $\|u_t\|_6 < \infty$. Lee and Hansen (1994) and Lumsdaine (1996) derived the stochastic equicontinuity of the Hessian matrix in the GARCH model. In particular, Lee and Hansen (1994) have shown that $\|\varepsilon_t\|_4 < \infty$ is sufficient for stochastic equicontinuity. However, this result cannot be applied to our analysis as our model allows nonstationary regressors in the mean equation.

Because the likelihood function and the Hessian matrix of our model contain volatility adjustments, the asymptotic theory of the MLE depends on the heavy moment condition. In particular, the distribution theory requires the tightness of the Hessian matrix, which can be justified when the errors $u_t$ satisfy strong moment conditions. For example, Ling and McAleer (2003) developed the distribution theory for the vector ARMA-GARCH model under the moment condition $\|u_t\|_6 < \infty$. Li et al. (2001) developed the limiting distribution of the cointegrating vector estimator with ARCH(1) process under the condition $\|u_t\|_4 < \infty$. However, it is not easy to compare the moment conditions directly because their distribution theory is based on the finite-dimensional convergence.

The condition of bounded moments may well be treated as the sufficient condition for the main results, and it may not be a crucial burden in a practical sense. However, as noted by Lumsdaine (1996), the moment condition restricts the parameter space severely, and hence the estimated parameter values in empirical studies often fail to satisfy even the fourth moment condition.

As shown by Carrasco and Chen (2002), the moment condition $\|u_t\|_{2m} < \infty$ can be implied by $\|\varepsilon_t\|_{2m} < \infty$ and $E|\phi_j + \psi_j \varepsilon^2_{jt}|^m < 1$ for an integer $m \ge 1$ and for all $j = 1, 2, \ldots, p$. As mentioned before, the moment condition restricts the parameter space seriously when we set $m = 2$ or $3$. Therefore, we assume stochastic equicontinuity of the Hessian matrix directly and explore the asymptotic distribution of the cointegrating vector estimator under the minimal restriction on the parameter space.

Assumption 2. $\sup_{\theta \in \Theta_\delta} |H_n(\theta) - H_n(\theta_0)| \to_p 0$.

Lemma 3. Under Assumption 1,

$$H_{12n}(\theta_0) \to_p 0,$$

where $H_{12n}(\theta) = n^{-3/2} \sum_{t=1}^{n} \partial^2 l_t(\theta)/\partial b\, \partial\theta_2'$.

Therefore, under Assumptions 1–2,

$$n(\hat b - b_0) = -h_n(\theta_0)^{-1} g_n(\theta_0) + o_p(1), \tag{14}$$

where

$$g_n(\theta) = n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial b},$$

and

$$h_n(\theta) = n^{-2} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial b\, \partial b'}.$$

3.2. Asymptotic distribution

Let $\zeta_{jt} = e_{jt}/\sigma^2_{jt}$ and $q_{jt,k} = e_{j,t-k}\,\eta_{jt}/\sigma^2_{jt}$ for $k \ge 1$ and $j = 1, 2, \ldots, p$. Because $e_{jt}$, $\zeta_{jt}$, and $q_{jt,k}$ are martingale difference sequences for all $j = 1, 2, \ldots, p$ and $k \ge 1$, Assumptions 1(a) and 1(b) imply the following from the invariance principle of Phillips and Durlauf (1986).

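$$\begin{pmatrix} n^{-1/2} \sum_{t=1}^{[ns]} e_{jt} \\ n^{-1/2} \sum_{t=1}^{[ns]} \zeta_{jt} \\ n^{-1/2} \sum_{t=1}^{[ns]} q_{jt,k} \end{pmatrix} \Rightarrow \begin{pmatrix} E_j(s) \\ Z_j(s) \\ Q_{j,k}(s) \end{pmatrix} \equiv BM \begin{pmatrix} \sigma_j^2 & 1 & 0 \\ 1 & 1/B_j^2 & 0 \\ 0 & 0 & (\kappa_j - 1)\xi^2_{j,k} \end{pmatrix}, \tag{15}$$

where $\sigma_j^2 = E\sigma^2_{jt}$, $1/B_j^2 = E(1/\sigma^2_{jt})$, $\xi^2_{j,k} = E(e^2_{j,t-k}/\sigma^4_{jt})$, and $\kappa_j = E\varepsilon^4_{jt}$.

We note that $E_j(s)$ and $Q_{j,k}(s)$ are independent for each $j = 1, 2, \ldots, p$ and $k \ge 1$. Also, $Z_j(s)$ is independent of $Q_{j,k}(s)$ for each $j = 1, 2, \ldots, p$ and $k \ge 1$.

Let $Q_j(s) = \sum_{k=1}^{\infty} h_{j,k} Q_{j,k}(s) \equiv BM((\kappa_j - 1)\Xi_j)$, where $h_{j,k} = \psi_j \phi_j^{k-1}$ and $\Xi_j = \sum_{k=1}^{\infty} h^2_{j,k} \xi^2_{j,k}$, which is finite because $h_{j,k}$ decays exponentially and $\xi^2_{j,k}$ is finite for all $j$ and $k$. We denote $E(s)$, $Z(s)$, and $Q(s)$ as $p$-dimensional vectors of $E_j(s)$, $Z_j(s)$, and $Q_j(s)$, respectively. Note that $E(s) = \Lambda U(s)$, with $\Lambda$ the lower triangular matrix in Eq. (3).

Define $W_1(s)$ and $W_2(s)$ as follows:

$$\begin{pmatrix} W_1(s) \\ W_2(s) \end{pmatrix} = \begin{pmatrix} A'(Z(s) - Q(s)) \\ C_2(1) U(s) \end{pmatrix} \equiv BM \begin{pmatrix} \mu & 0 \\ 0 & C_2(1)\,\Omega\, C_2(1)' \end{pmatrix},$$

where $\mu = A' \Sigma^{-1/2} M \Sigma^{-1/2} A$, $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)$, $M$ is a diagonal matrix with the element $\sigma_j^2/B_j^2 + (\kappa_j - 1) H_j$, and $H_j = \sum_{k=1}^{\infty} h^2_{j,k} E(\sigma_j^2 e^2_{j,t-k}/\sigma^4_{jt})$ for $j = 1, 2, \ldots, p$.

We can show that $W_1(s)$ and $W_2(s)$ are mutually independent by using the ECM representation theorem and Eq. (15). We also define $\nu = A' \Sigma^{-1/2} N \Sigma^{-1/2} A$, where $N$ is a diagonal matrix with the element $\sigma_j^2/B_j^2 + 2H_j$ for $j = 1, 2, \ldots, p$. If $r = 1$, then $\mu = \sum_{j=1}^{p} (A_j^2/\sigma_j^2)\big(\sigma_j^2/B_j^2 + (\kappa_j - 1) H_j\big)$ and $\nu = \sum_{j=1}^{p} (A_j^2/\sigma_j^2)\big(\sigma_j^2/B_j^2 + 2H_j\big)$.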

    Lemma 4. Under Assumption 1,

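$$g_n(\theta_0) \Rightarrow \int_0^1 dW_1(s) \otimes W_2(s), \tag{16}$$

$$h_n(\theta_0) \Rightarrow -\nu \otimes \int_0^1 W_2(s) W_2(s)'\, ds, \tag{17}$$

and

$$p_n(\theta_0) \Rightarrow \mu \otimes \int_0^1 W_2(s) W_2(s)'\, ds, \tag{18}$$

where $p_n(\theta) = n^{-2} \sum_{t=1}^{n} (\partial l_t(\theta)/\partial b)(\partial l_t(\theta)/\partial b')$.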

If $\varepsilon_{jt}$ is Gaussian for each $j$, then $\kappa_j = 3$ and $\mu = \nu$, which implies that the negative Hessian and the outer product have the same distribution. However, if the distribution of $\varepsilon_{jt}$ is not normal for some $j$, then the variance of score does not coincide with that based on the Hessian matrix, as discussed in White (1982). Thus, statistical inference on the cointegrating vector depends on the covariance estimation.

To derive the asymptotic distribution of the RRR estimator $\tilde\beta_n$, we define $W_{1r}(s) = A' \Sigma^{-1} E(s) \equiv BM(\mu_r)$, where $\mu_r = A' \Sigma^{-1} A$. We note that $W_{1r}(s)$ is independent of $W_2(s)$.

Theorem 1. Under Assumptions 1–2,

$$n(\hat\beta_n - \beta_0) \Rightarrow \left( \int_0^1 W_2 W_2' \right)^{-1} \int_0^1 W_2\, dW_1'\, \nu^{-1}. \tag{19}$$

If Assumptions 1(b), (c), and (e) hold,

$$n(\tilde\beta_n - \beta_0) \Rightarrow \left( \int_0^1 W_2 W_2' \right)^{-1} \int_0^1 W_2\, dW_{1r}'\, \mu_r^{-1}. \tag{20}$$

First, we note that the asymptotic distribution of the MLE is a mixture normal with a variance of $\nu^{-1}\mu\nu^{-1} \otimes (\int_0^1 W_2 W_2')^{-1}$. Li et al. (2001) considered the cointegrating vector estimator in the vector autoregressive model with ARCH(1) errors. The limiting distribution of the cointegrating vector estimator, found by Li et al. (2001), is a functional of two correlated Brownian motions, which implies that the cointegrating vector estimator follows nonstandard asymptotic distribution. Theorem 1 shows that the MLE of the cointegrating vector has the mixed normal distribution, and therefore the inference on the cointegrating vector can be based on the standard theory.

Second, the RRR estimator is also asymptotically distributed as mixture normal with a variance of $\mu_r^{-1} \otimes (\int_0^1 W_2 W_2')^{-1}$. Because the first-order condition and the Hessian matrix of the RRR estimator do not accompany the volatility adjustment, Assumptions 1(b), (c), and (e) are sufficient for the limiting distribution of the RRR estimator. The distribution theory for the RRR estimator has been explored by Johansen (1988, 1991) and Seo (1998), where white noise errors are assumed. This paper considers conditional heteroskedastic errors, and we find that the asymptotic distribution of the RRR estimator is invariant to conditional heteroskedasticity.

Third, if there is no GARCH effect, then $\mu = \nu = \mu_r = A' \Sigma^{-1} A$ because $H_j = 0$ and $\sigma_j^2/B_j^2 = 1$ for all $j = 1, 2, \ldots, p$. In this case, the asymptotic distribution of the MLE is the same as that of the RRR estimator.

Fourth, the variance of the MLE depends on the adjustment vector $\alpha$, correlation matrix $\Lambda$, kurtosis $\kappa_j$, and the magnitude of the GARCH effect, which can be represented by $H_j$ and $\sigma_j^2/B_j^2$. As $\alpha$ approaches zero, the cointegrating relationship becomes weaker and the variance of the MLE increases to infinity. In the same way, a weak cointegrating relationship increases the variance of the RRR estimator.

The GARCH effect magnifies the unconditional variance of the error term, which leads to the increase in the variance of the MLE. In our model, the intercept of the volatility equation is fixed. The unconditional variance can be invariant to the GARCH effect when the intercept varies depending on the GARCH parameters. However, the MLE uses the information of conditional heteroskedasticity, which lowers the variance of the MLE. The GARCH effect increases $H_j$ and $\sigma_j^2/B_j^2$ for $j = 1, 2, \ldots, p$, which generates the gain in relative efficiency of the MLE. On the other hand, the RRR estimator does not consider the information of conditional heteroskedasticity. Thus, the variance of the RRR estimator always increases in the GARCH effect.

The relative efficiency gain of the MLE compared to the RRR estimator can be measured by $\gamma$ as follows:

$$\gamma = \nu^{-1} \mu \nu^{-1} \mu_r, \tag{21}$$

where $\Gamma = \mathrm{Var}(n\hat b_n)\, \mathrm{Var}(n\tilde b_n)^{-1} = \gamma \otimes I$.

The efficiency gain $\gamma$ depends on the magnitude of the GARCH effect, the fourth moment $\kappa_j$, and $\sigma_j^2/B_j^2$ for $j = 1, 2, \ldots, p$. To simplify analysis, we define the partial efficiency gain $\gamma_j$ as follows:

$$\gamma_j = \frac{\sigma_j^2/B_j^2 + (\kappa_j - 1) H_j}{\big(\sigma_j^2/B_j^2 + 2H_j\big)^2}.$$
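The partial efficiency gain can be evaluated numerically once Jensen's ratio $\sigma_j^2/B_j^2$ and $H_j$ are available. The sketch below is one possible approach and not necessarily how the paper's theoretical values were computed: it approximates the two expectations by Monte Carlo averages from a long simulated GARCH(1,1) path with Gaussian standardized innovations, and truncates the infinite sum in $H_j$ at `max_lag` terms. All function names and default settings are hypothetical.

```python
import math
import random


def partial_efficiency_gain(jensen_ratio, kappa, H):
    """gamma_j = (ratio + (kappa - 1) H) / (ratio + 2 H)^2, with ratio = sigma_j^2 / B_j^2."""
    return (jensen_ratio + (kappa - 1.0) * H) / (jensen_ratio + 2.0 * H) ** 2


def simulated_moments(omega, psi, phi, n=20000, max_lag=50, burn=500, seed=0):
    """Monte Carlo estimates of (sigma_j^2 / B_j^2, H_j) under Gaussian innovations."""
    rng = random.Random(seed)
    s2_prev, e_prev = omega / (1.0 - psi - phi), 0.0  # start near the unconditional variance
    e, s2 = [], []
    for t in range(n + burn):
        s2_t = omega + psi * e_prev ** 2 + phi * s2_prev
        e_t = math.sqrt(s2_t) * rng.gauss(0.0, 1.0)
        if t >= burn:  # discard the burn-in draws
            e.append(e_t)
            s2.append(s2_t)
        s2_prev, e_prev = s2_t, e_t
    m = len(s2)
    sigma2 = sum(s2) / m                    # estimate of E(sigma_jt^2)
    inv_b2 = sum(1.0 / v for v in s2) / m   # estimate of E(1 / sigma_jt^2)
    H = 0.0
    for k in range(1, max_lag + 1):
        h_k = psi * phi ** (k - 1)
        xi2 = sum(e[t - k] ** 2 / s2[t] ** 2 for t in range(k, m)) / (m - k)
        H += h_k ** 2 * sigma2 * xi2        # truncated sum defining H_j
    return sigma2 * inv_b2, H


if __name__ == "__main__":
    ratio, H = simulated_moments(omega=0.05, psi=0.2, phi=0.7)
    print(partial_efficiency_gain(ratio, kappa=3.0, H=H))
```

With $\psi_j = 0$ the simulated path has constant variance, so the ratio is one, $H_j$ is zero, and the function returns a gain of one, which matches the no-GARCH case discussed under Theorem 1.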

As the GARCH effect increases, $H_j$ increases. Thus, $\gamma_j$ decreases, and the relative efficiency of the MLE improves. If the fourth moment $\kappa_j$ is larger than 3, $\gamma_j$ increases and lowers the efficiency gain. Thus, the efficiency of the MLE can be affected by the specification error. The efficiency gain also depends on Jensen's ratio $\sigma_j^2/B_j^2$ because this ratio increases in the GARCH effect.

Fig. 1 shows the partial efficiency gain of the MLE of the cointegrating vector in an error correction model with GARCH errors. The theoretical efficiency gain is calculated as the function of the volatility parameters $\psi_j$ and $\phi_j$. The standardized innovations $\varepsilon_{jt}$ are assumed to follow the standard normal distribution. If the volatility parameters are not large, the efficiency improves slowly. However, as the volatility parameters become larger, a significant amount of efficiency gain emerges. The overall efficiency gain $\gamma$ can be larger than the partial efficiency gain $\gamma_j$ depending on the correlation coefficient and the adjustment coefficient. Fig. 1 is based on the asymptotic theory, but the small sample performance will be affected by the estimation error, which increases uncertainty. As the sample size increases, the estimation error decreases and the efficiency gain approaches the theoretical values.

Fig. 1. GARCH effect and efficiency gain.

3.3. Statistical inference

Suppose we want to test a set of linear restrictions based on the null hypothesis as follows:

$$H_0 : Rb = r_q,$$

where $R$ is a $q \times r(p-r)$ matrix, and $r_q$ is the $q$-dimensional vector.

The covariance matrix of the cointegrating vector estimator can be estimated by using the Hessian matrix and the outer product of the gradient. When the model is correctly specified, the negative Hessian matrix is equivalent to the outer product matrix. However, our model may not be correctly specified, and in that case we use the robust covariance estimator. Thus, we may define three t-statistics or Wald statistics according to the covariance estimator.

The Wald statistics can be defined according to the covariance estimation method as follows:

$$W_{jn} = n(R\hat b - r_q)' \big( R\, \mathrm{Var}_j(n\hat b)\, R' \big)^{-1} n(R\hat b - r_q),$$

where $j = I$ using the information matrix, $j = P$ using the inverse of the outer product matrix, and $j = W$ using White's robust covariance estimator.

To derive the distribution of the Wald statistics, we define a $q$-dimensional random variable $J$ as follows:

$$J = Q^{-1/2} R \left[ \nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2(s)'\, ds \right)^{-1} \right] \int_0^1 dW_1(s) \otimes W_2(s),$$

where $Q = R \big[ \nu^{-1}\mu\nu^{-1} \otimes (\int_0^1 W_2(s) W_2(s)'\, ds)^{-1} \big] R'$.

The random variable $J$ follows the standard normal distribution. We define $Q_\nu$ and $Q_\mu$ as follows:

$$Q_\nu = R \left[ \nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2(s)'\, ds \right)^{-1} \right] R',$$

$$Q_\mu = R \left[ \mu^{-1} \otimes \left( \int_0^1 W_2(s) W_2(s)'\, ds \right)^{-1} \right] R'.$$

Theorem 2. Under the null hypothesis $H_0 : Rb = r_q$ and Assumptions 1–2,

$$W_{Wn} \Rightarrow J'J, \tag{22}$$

$$W_{In} \Rightarrow J' Q_I J, \tag{23}$$

and

$$W_{Pn} \Rightarrow J' Q_P J, \tag{24}$$

where $Q_I = Q^{1/2} Q_\nu^{-1} Q^{1/2}$ and $Q_P = Q^{1/2} Q_\mu^{-1} Q^{1/2}$.

  • n1=2qt

    !L0Z QN0;L0S1=2MS1=2L,

    ARTICLE IN PRESSt1

    n1Xnt1

    q2lty0qt qt0

    !p L0S1=2NS1=2L,

    n3=2Xnt1

    q2lty0qb qt0

    ) A0S1=2NS1=2LZ 10

    W 2,

    where $Z = Z(1)$ and $Q = Q(1)$ are the vector-valued random variables defined in Section 3.

    Define the demeaned Brownian motion $\bar{W}_2(s) = W_2(s) - \int_0^1 W_2(s)\,ds$.

    Theorem 3. Under Assumptions 1–2,

    $$n(\hat{\beta}_n - \beta_0) \Rightarrow \left(\int_0^1 \bar{W}_2\bar{W}_2'\right)^{-1} \int_0^1 \bar{W}_2\,dW_1'\,\nu^{-1}. \qquad (25)$$

    The Wald statistic based on the robust covariance estimator follows the chi-squared distribution with $q$ degrees of freedom, where $q$ is the number of restrictions in the null hypothesis. If we use the information matrix, the Wald statistic follows a chi-squared distribution up to the scale effect $Q_I$. If the cointegration rank $r$ equals 1, $\mu$ and $\nu$ become scalars, and thus $Q_I = (\mu/\nu)I_q$. If the covariance is estimated by the inverse of the outer product of gradients, the distribution of the Wald statistic is also chi-squared up to the scale effect $Q_P$. By the same token, $Q_P = (\mu^2/\nu^2)I_q$ if $r = 1$. Therefore, if the covariance matrix is estimated by the information matrix or the outer product of gradients, excess kurtosis tends to amplify the Wald statistics, which may lead to over-rejection of the null hypothesis. If the standardized innovations are normally distributed, or if $\kappa_j = 3$ for all $j$, these nuisance parameters disappear since $\mu = \nu$. Thus, the scale effect disappears given normality, or more generally, $\kappa_j = 3$. However, statistical inference using the robust covariance estimator works properly even without normality.
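To see how a scale effect translates into over-rejection, consider a single restriction ($q = 1$) with $r = 1$, so that the Wald statistic based on the information matrix behaves asymptotically like $c\,\chi^2_1$ with $c = \mu/\nu$ (and $c = \mu^2/\nu^2$ for the outer product version). The sketch below computes the asymptotic rejection probability at the nominal 5% level. The values of $c$ are hypothetical, and the function name `rejection_prob` is illustrative only.

```python
import math

CHI2_1_CRIT_5PCT = 3.841458820694124  # 95th percentile of chi-squared(1)


def rejection_prob(scale: float, crit: float = CHI2_1_CRIT_5PCT) -> float:
    """P(scale * X > crit) for X ~ chi-squared with 1 degree of freedom.

    Uses P(X > x) = erfc(sqrt(x / 2)), which holds for one degree of freedom.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return math.erfc(math.sqrt(crit / (2.0 * scale)))


# scale = 1 reproduces the nominal size; scale > 1 inflates it.
for c in (1.0, 1.5, 2.0):
    print(f"scale {c:.1f}: rejection probability {rejection_prob(c):.3f}")
```

With a scale of one the nominal size is recovered, which corresponds to the case $\mu = \nu$; any scale above one pushes the rejection probability above 5%.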

    4. ECM with an intercept

    Suppose the nonstationary variable $x_t$ has a nonzero mean. It is natural to include an intercept in the vector error correction model as follows:

    $$\Delta x_t = \tau + \alpha \begin{pmatrix} 1 \\ \beta \end{pmatrix}' x_{t-1} + \sum_{i=1}^{l} \Gamma_i\,\Delta x_{t-i} + u_t.$$

    The error $u_t$ follows the multivariate GARCH process as in the model without an intercept. The parameter vector is defined as $\theta = (b', \tau', \theta_2')'$. The likelihood function, score, and Hessian matrix can be defined in the same way as before. We denote $\hat{\theta}_n$ as the MLE and $\tilde{\theta}_n$ as the RRR estimator.

    We use the following asymptotic results:

    Lemma 5. Under Assumption 1,

    Xn qlty0 d

    B. Seo / Journal of Econometrics 137 (2007) 68111 810 0

    If Assumptions 1(b), (c), and (e) hold,

    $$n(\tilde{\beta}_n - \beta_0) \Rightarrow \left(\int_0^1 \bar{W}_2\bar{W}_2'\right)^{-1} \int_0^1 \bar{W}_2\,dW_{1r}'\,\mu_r^{-1}. \qquad (26)$$

    In the ECM with an intercept, the asymptotic distribution of the MLE is mixed normal with a variance of $\nu^{-1}\mu\nu^{-1} \otimes \left(\int_0^1 \bar{W}_2\bar{W}_2'\right)^{-1}$. Also, the RRR estimator is asymptotically distributed as mixed normal with a variance of $\mu_r^{-1} \otimes \left(\int_0^1 \bar{W}_2\bar{W}_2'\right)^{-1}$. Thus, the cointegrating vector estimators follow the mixed normal asymptotic distribution. Besides, the efficiency gain of the MLE depends on the magnitude of the GARCH effect as in the model without deterministic trends.

    Our results can be extended to the ECM with deterministic trends. When the data generating process $x_t$ contains deterministic trends, we consider the ECM with the corresponding trend variables. In that case, the asymptotic distribution is based on the detrended Brownian motions. In addition, we may use the detrended variables to reduce the number of parameters to estimate. Because the detrended variables remove the deterministic trends, the cointegrating vector estimator again follows the mixed normal distribution in the ECM with deterministic trends.

    5. Simulation evidence

    In this section, we examine the finite sample properties of the cointegrating vector estimators using Monte Carlo simulation. The experiments are based on a bivariate error correction model as follows:

    $$\Delta x_t = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} 1 \\ \beta \end{pmatrix}' x_{t-1} + u_t.$$

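    We also assume $\varepsilon_t = \Lambda u_t$ and $\Sigma_t = E(\varepsilon_t\varepsilon_t' \mid \mathcal{F}_{t-1})$, where

    $$\Lambda = \begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}, \qquad \Sigma_t = \begin{pmatrix} \sigma_{1t}^2 & 0 \\ 0 & \sigma_{2t}^2 \end{pmatrix},$$

    $$\sigma_{jt}^2 = 1 + \psi_j\,\varepsilon_{jt-1}^2 + \phi_j\,\sigma_{jt-1}^2,$$

    and $\varepsilon_{jt}$ equals $\sigma_{jt}$ times a standardized innovation that is i.i.d. with mean 0 and variance 1, for $j = 1, 2$.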
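The data generating process above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the code used for the experiments: the sign conventions are assumed (the error correction term is taken as $x_{1,t-1} - \beta x_{2,t-1}$ with $\alpha_1 = -1$, which yields a stable system), the start-up values are arbitrary, and the function name `simulate_ecm_garch` is hypothetical.

```python
import random


def simulate_ecm_garch(n, alpha, beta, psi, phi, lam, seed=0):
    """Simulate a bivariate ECM with GARCH(1,1) errors (illustrative only).

    Assumed conventions (not confirmed by the text):
      dx_t = alpha * (x1_{t-1} - beta * x2_{t-1}) + u_t
      e_t = Lambda u_t with Lambda = [[1, 0], [lam, 1]]
      sigma2_jt = 1 + psi_j * e_{j,t-1}^2 + phi_j * sigma2_{j,t-1}
    """
    rng = random.Random(seed)
    x = [(0.0, 0.0)]              # arbitrary starting point
    e_prev = [0.0, 0.0]           # arbitrary start-up errors
    sig2_prev = [1.0, 1.0]        # arbitrary start-up variances
    for _ in range(n):
        e, sig2 = [], []
        for j in range(2):
            s2 = 1.0 + psi[j] * e_prev[j] ** 2 + phi[j] * sig2_prev[j]
            sig2.append(s2)
            e.append(s2 ** 0.5 * rng.gauss(0.0, 1.0))
        # Invert e = Lambda u: u1 = e1 and u2 = e2 - lam * e1.
        u = (e[0], e[1] - lam * e[0])
        x1, x2 = x[-1]
        w = x1 - beta * x2        # lagged equilibrium error
        x.append((x1 + alpha[0] * w + u[0], x2 + alpha[1] * w + u[1]))
        e_prev, sig2_prev = e, sig2
    return x


path = simulate_ecm_garch(250, alpha=(-1.0, 0.0), beta=1.0,
                          psi=(0.25, 0.25), phi=(0.7, 0.7), lam=0.0, seed=1)
```

The call at the end uses a sample size of 250, matching the experiments, and returns the starting point plus 250 simulated observations of $(x_{1t}, x_{2t})$.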

    We compare the finite sample performance of the MLE of the cointegrating vector to that of the RRR, the fully modified (FM), and the OLS estimators. The standard errors of the MLE are calculated from the robust covariance estimator. The experiments are based on a sample size of 250 and 1000 replications. The standardized innovations are generated by the Gauss random number generator. The true value of $\beta$ is set at 1.

    First, we study the efficiency gain of the MLE of the cointegrating vector. Table 1 shows the root mean squared error (RMSE) and the mean absolute error (MAE) of the cointegrating vector estimators. When there is no conditional heteroskedasticity, the MLE is almost equivalent to the RRR estimator. As the GARCH effect increases, the RMSE and MAE of the MLE decrease while those of the RRR estimator slowly increase. For example, at $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.95, 0, 0.95, 0, 0)$ the RMSE and MAE of the MLE are 75% and 60% lower than those of the RRR estimator, respectively. The RMSE of the MLE is 30% lower than that of the RRR estimator at $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.25, 0.7, 0.25, 0.7, 0)$.

    Table 1 shows that the impact of the parameter $\psi_j$ on the efficiency gain is larger than that of the parameter $\phi_j$. If the parameter $\lambda$ is different from 0, the RMSE and the MAE decrease. Thus, relative to the other estimators, which do not consider heteroskedasticity, the efficiency of the MLE improves depending on the parameters in the volatility processes.

    As Table 1 shows, the FM estimator is less efficient than the MLE and the RRR estimator because it considers neither the short-run dynamics nor conditional heteroskedasticity. The RMSE and MAE of the FM estimator increase slowly as the volatility parameters increase. Compared to the FM estimator, the OLS estimator does not treat asymptotic bias, and its RMSE and MAE increase in the volatility parameters.

    Table 1
    Efficiency gain

    Parameters: $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda)$

                                        RMSE                              MAE
    Parameters                          MLE     RRR     FM      OLS       MLE     RRR     FM      OLS
    1, 0, 0, 0, 0, 0, 0                 0.0077  0.0079  0.0147  0.0158    0.0050  0.0054  0.0108  0.0106
    1, 0, 0.25, 0, 0.25, 0, 0           0.0069  0.0074  0.0133  0.0146    0.0047  0.0051  0.0098  0.0097
    1, 0, 0.5, 0, 0.5, 0, 0             0.0059  0.0080  0.0138  0.0157    0.0040  0.0053  0.0101  0.0102
    1, 0, 0.75, 0, 0.75, 0, 0           0.0050  0.0088  0.0166  0.0179    0.0033  0.0056  0.0109  0.0108
    1, 0, 0.95, 0, 0.95, 0, 0           0.0044  0.0173  0.0250  0.0219    0.0028  0.0072  0.0133  0.0128
    1, 0, 0.25, 0.2, 0.25, 0.2, 0       0.0074  0.0081  0.0141  0.0159    0.0049  0.0055  0.0105  0.0108
    1, 0, 0.25, 0.45, 0.25, 0.45, 0     0.0074  0.0080  0.0141  0.0156    0.0052  0.0055  0.0104  0.0104
    1, 0, 0.25, 0.7, 0.25, 0.7, 0       0.0066  0.0093  0.0160  0.0177    0.0044  0.0058  0.0111  0.0119
    1, 0, 0.25, 0.7, 0.25, 0.7, 0.5     0.0057  0.0072  0.0123  0.0115    0.0036  0.0045  0.008   0.0072
    1, -0.5, 0.25, 0.7, 0.25, 0.7, 0    0.0025  0.0031  0.0057  0.0067    0.0017  0.0021  0.0042  0.0050
    1, 0.5, 0.25, 0.7, 0.25, 0.7, 0     0.0076  0.0093  0.0256  0.0478    0.0052  0.0066  0.0146  0.0297

    Next, we examine the size performance of the $t$-statistics for the null hypothesis:

    $$H_0: \beta = 1.$$

    Table 2 shows the descriptive statistics, the percentiles, and the coverage rates of the $t$-statistics based on the MLE, RRR, FM, and OLS estimators. The coverage rate is defined as $P(T < u_{0.05})$ for the lower 5% size and $P(T > u_{0.95})$ for the upper 5% size, where $T$ is the $t$-statistic and $u$ is the critical value. The standard errors of the MLE are based on the robust covariance matrix estimator.

    The descriptive statistics of the $t$-statistics based on the MLE are close to the properties of the standard normal distribution. The coverage rates are very close to the true size, and thus statistical inference on the cointegrating vector can be based on the standard theory.

    Fig. 2 shows the estimated kernel density of the $t$-statistics of the cointegrating vector estimators. The estimated density based on the MLE looks very close to the standard normal distribution for most values of the GARCH parameters. Also, the $t$-statistics based on the RRR and FM estimators can be closely approximated by the normal distribution. However, as Fig. 2 shows, the OLS estimator reveals a large amount of size distortion and asymmetry.

    Table 2
    Size performance of the t-statistics

           Descriptive statistics                  Percentiles                 Coverage rate
           Mean    S.D.    Skewness  Kurtosis      5       50      95          0.05    0.95

    $\alpha = (1, 0)'$, $\psi_1 = 0$, $\phi_1 = 0$, $\psi_2 = 0$, $\phi_2 = 0$, $\lambda = 0$
    MLE    0.0484  0.9609  0.0493    3.3621        1.7115  0.0130  1.5387      0.0550  0.0390
    RRR    0.0616  1.0225  0.0154    2.8984        1.7760  0.0526  1.6079      0.0650  0.0460
    FM     0.0552  1.0791  0.1021    3.0048        1.7274  0.0436  1.7642      0.0610  0.0550
    OLS    0.9415  0.8468  0.0786    3.0227        2.3296  0.9376  0.4512      0.2050  0.0000

    $\alpha = (1, 0)'$, $\psi_1 = 0.25$, $\phi_1 = 0$, $\psi_2 = 0.25$, $\phi_2 = 0$, $\lambda = 0$
    MLE    0.0046  0.9958  0.0746    3.2712        1.6091  0.0390  1.6593      0.0480  0.0530
    RRR    0.0106  0.9843  0.0114    3.1257        1.6140  0.0274  1.6284      0.0470  0.0500
    FM     0.0706  1.0138  0.0751    3.1097        1.7647  0.0875  1.6087      0.0600  0.0450
    OLS    0.8922  0.8530  0.1537    2.8878        2.3297  0.8723  0.4928      0.1890  0.0000

    $\alpha = (1, 0)'$, $\psi_1 = 0.5$, $\phi_1 = 0$, $\psi_2 = 0.5$, $\phi_2 = 0$, $\lambda = 0$
    MLE    0.0705  0.9754  0.0058    3.2890        1.6957  0.0758  1.5011      0.0540  0.0410
    RRR    0.0307  0.9609  0.0357    3.2549        1.5389  0.0387  1.5201      0.0460  0.0350
    FM     0.1096  1.0009  0.0255    2.9200        1.7616  0.1311  1.5035      0.0620  0.0330
    OLS    0.9032  0.8502  0.1717    3.0220        2.4042  0.8693  0.4231      0.1910  0.0020

    $\alpha = (1, 0)'$, $\psi_1 = 0.75$, $\phi_1 = 0$, $\psi_2 = 0.75$, $\phi_2 = 0$, $\lambda = 0$
    MLE    0.0417  1.0145  0.1359    3.2862        1.7106  0.0712  1.6367      0.0590  0.0500
    RRR    0.0514  1.0088  0.1007    3.0358        1.6146  0.0590  1.6125      0.0450  0.0490
    FM     0.0915  1.0160  0.2077    3.8452        1.7403  0.0772  1.5578      0.0620  0.0420
    OLS    0.8917  0.8879  0.2214    3.4199        2.3956  0.8632  0.5092      0.1950  0.0030

    $\alpha = (1, 0)'$, $\psi_1 = 0.95$, $\phi_1 = 0$, $\psi_2 = 0.95$, $\phi_2 = 0$, $\lambda = 0$
    MLE    0.0012  1.0686  0.0244    3.1862        1.7797  0.0422  1.7125      0.0630  0.0570
    RRR    0.0351  1.0649  0.0283    2.8242        1.7388  0.0268  1.7888      0.0620  0.0640
    FM     0.1201  1.1283  1.7399    21.8168       1.9119  0.0813  1.6030      0.0660  0.0460
    OLS    0.9188  0.9947  0.1727    5.6338        2.4226  0.9408  0.7530      0.2170  0.0120

    $\alpha = (1, 0)'$, $\psi_1 = 0.25$, $\phi_1 = 0.2$, $\psi_2 = 0.25$, $\phi_2 = 0.2$, $\lambda = 0$
    MLE    0.0906  1.0056  0.0208    3.3390        1.8587  0.0429  1.6206      0.0650  0.0490
    RRR    0.0808  1.0342  0.0058    2.9102        1.8149  0.0878  1.6038      0.0670  0.0460
    FM     0.0792  1.0507  0.0326    2.9888        1.7867  0.0875  1.6885      0.0630  0.0520
    OLS    0.9417  0.8527  0.0207    2.7293        2.3466  0.9436  0.4890      0.2160  0.0000

    $\alpha = (1, 0)'$, $\psi_1 = 0.25$, $\phi_1 = 0.45$, $\psi_2 = 0.25$, $\phi_2 = 0.45$, $\lambda = 0$
    MLE    0.0259  1.0294  0.0217    2.8997        1.6937  0.0458  1.7371      0.0540  0.0590
    RRR    0.0220  1.0079  0.0116    2.9878        1.6670  0.0185  1.7099      0.0530  0.0590
    FM     0.0768  1.0339  0.0238    2.8758        1.7891  0.0739  1.6406      0.0650  0.0500
    OLS    0.8899  0.8357  0.0080    3.0056        2.2193  0.8928  0.4802      0.1910  0.0020

    $\alpha = (1, 0)'$, $\psi_1 = 0.25$, $\phi_1 = 0.7$, $\psi_2 = 0.25$, $\phi_2 = 0.7$, $\lambda = 0$
    MLE    0.0340  1.0471  0.0032    3.1390        1.7393  0.0372  1.6738      0.0630  0.0540
    RRR    0.0521  1.0511  0.1534    3.3659        1.7458  0.0793  1.6762      0.0610  0.0540
    FM     0.1839  1.0140  0.0089    3.2349        1.8452  0.1695  1.5007      0.0730  0.0370
    OLS    0.9524  0.9199  0.1455    3.2786        2.4333  0.9934  0.5298      0.2150  0.0060

    $\alpha = (1, -0.5)'$, $\psi_1 = 0.25$, $\phi_1 = 0.7$, $\psi_2 = 0.25$, $\phi_2 = 0.7$, $\lambda = 0$
    MLE    0.0366  1.0451  0.0570    3.0441        1.6961  0.0280  1.7383      0.0570  0.0590
    RRR    0.0048  1.0213  0.0492    3.0431        1.6389  0.0060  1.6497      0.0490  0.0520
    FM     0.0077  1.0662  0.0690    3.1926        1.6852  0.0442  1.7712      0.0550  0.0610
    OLS    0.8649  1.2365  0.5086    3.6378        3.0223  0.7568  0.8945      0.2330  0.0150

    $\alpha = (1, 0.5)'$, $\psi_1 = 0.25$, $\phi_1 = 0.7$, $\psi_2 = 0.25$, $\phi_2 = 0.7$, $\lambda = 0$
    MLE    0.0435  1.0626  0.0083    3.0819        1.7816  0.0561  1.8055      0.0660  0.0660
    RRR    0.0110  1.0519  0.0424    2.7972        1.7365  0.0084  1.7046      0.0610  0.0570
    FM     0.2559  1.0325  0.2624    4.6975        2.0407  0.2643  1.2801      0.0810  0.0300
    OLS    1.4933  1.0235  0.7144    3.9089        3.3739  1.3836  0.0030      0.3950  0.0000

    Fig. 2. Kernel density estimation.

    Next, we investigate the small sample properties on the power of the $t$-statistics by using the local alternative hypothesis:

    $$H_n: \beta_n = 1 + \frac{\delta}{n}.$$

    If $\delta = 0$, then the null hypothesis holds. As the local alternative parameter $\delta$ moves away from 0, the null hypothesis is no longer valid, and the $t$-statistics tend to reject the null hypothesis. Table 3 shows the frequency of rejecting the null hypothesis at $\delta = 1, 2, 3, 4, 5$. At the local alternative $\delta = 3$, the test based on the MLE rejects the null hypothesis in 65% of the replications, and the test based on the RRR estimator in 66%, at the 5% size if there is no conditional heteroskedasticity. At $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.95, 0, 0.95, 0, 0)$ and $\delta = 3$, the test based on the MLE rejects in 92% of the replications, and the test based on the RRR estimator in 65%, at the 5% size. Thus, the relative power of the $t$-statistic based on the MLE improves as the volatility parameters increase. On the other hand, the power of the $t$-statistic based on the RRR or the FM estimator is essentially invariant to conditional heteroskedasticity.

    As Table 3 shows, the impact of the parameter $\psi_j$ on the power is greater than that of the parameter $\phi_j$. The moving average representation of a GARCH(1,1) process has the exponentially decaying coefficient of $\phi_j$ multiplied by the parameter $\psi_j$ at each lag of $\varepsilon_{jt}$. Besides, the power of the tests based on the MLE improves when the correlation parameter $\lambda$ differs from 0.
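The tail frequencies used in the size comparison, $P(T < u_{0.05})$ and $P(T > u_{0.95})$, and the rejection frequencies used in the power comparison are simple shares over the Monte Carlo replications. A minimal sketch follows, assuming that the critical values are standard normal quantiles; the function name `rejection_rates` and the example $t$-statistics are hypothetical.

```python
from statistics import NormalDist


def rejection_rates(t_stats, level=0.05):
    """Share of t-statistics below the lower and above the upper critical value."""
    lower = NormalDist().inv_cdf(level)         # u_{0.05}, about -1.645
    upper = NormalDist().inv_cdf(1.0 - level)   # u_{0.95}, about 1.645
    n = len(t_stats)
    below = sum(t < lower for t in t_stats) / n
    above = sum(t > upper for t in t_stats) / n
    return below, above


# Hypothetical t-statistics from six replications.
example = [-2.1, -0.3, 0.4, 1.2, 1.9, 2.5]
below, above = rejection_rates(example)
```

In an actual experiment the list would hold one $t$-statistic per replication, so with 1000 replications each reported frequency is a multiple of 0.001.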

    Table 3
    Power of the t-statistics

    Parameters: $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda)$

                                                   $\delta$
    Parameters                          Estimator  1       2       3       4       5
    1, 0, 0, 0, 0, 0, 0                 MLE        0.2050  0.4700  0.6450  0.7790  0.8520
                                        RRR        0.2140  0.4920  0.6610  0.7910  0.8540
                                        FM         0.1190  0.2320  0.3340  0.4990  0.6130
    1, 0, 0.25, 0, 0.25, 0, 0           MLE        0.2660  0.5010  0.6860  0.8040  0.8850
                                        RRR        0.2360  0.4730  0.6510  0.7830  0.8670
                                        FM         0.1190  0.2140  0.3180  0.4540  0.6040
    1, 0, 0.5, 0, 0.5, 0, 0             MLE        0.3000  0.5830  0.7510  0.8330  0.9080
                                        RRR        0.2340  0.4740  0.6590  0.7660  0.8520
                                        FM         0.1190  0.2320  0.3690  0.4770  0.6080
    1, 0, 0.75, 0, 0.75, 0, 0           MLE        0.3790  0.7060  0.8420  0.9240  0.9520
                                        RRR        0.2390  0.4870  0.6370  0.8000  0.8360
                                        FM         0.1280  0.2390  0.3510  0.5070  0.5900
    1, 0, 0.95, 0, 0.95, 0, 0           MLE        0.5280  0.7810  0.9160  0.9610  0.9800
                                        RRR        0.2720  0.4970  0.6500  0.7510  0.8020
                                        FM         0.1400  0.2610  0.3790  0.4880  0.5450
    1, 0, 0.25, 0.7, 0.25, 0.7, 0       MLE        0.3180  0.5460  0.7430  0.8360  0.8950
                                        RRR        0.2360  0.4210  0.6600  0.7580  0.8140
                                        FM         0.1210  0.2320  0.3680  0.4900  0.5800
    1, 0, 0.25, 0.7, 0.25, 0.7, 0.5     MLE        0.3970  0.6620  0.8100  0.9050  0.9520
                                        RRR        0.3090  0.5550  0.7360  0.8370  0.9200
                                        FM         0.1430  0.3090  0.4530  0.6170  0.7130
    1, -0.5, 0.25, 0.7, 0.25, 0.7, 0    MLE        0.6530  0.8880  0.9650  0.9910  0.9930
                                        RRR        0.5490  0.8380  0.9390  0.9860  0.9810
                                        FM         0.2760  0.5740  0.7760  0.9030  0.9420
    1, 0.5, 0.25, 0.7, 0.25, 0.7, 0     MLE        0.2790  0.4940  0.6930  0.7980  0.8540
                                        RRR        0.1810  0.4010  0.5880  0.7010  0.7910
                                        FM         0.1310  0.2020  0.3100  0.3980  0.5310

    Many empirical studies have shown that the conditional variances of financial variables reveal common persistence and volatility causality. Therefore, the efficiency gain and powerful inference on the cointegrating vector can be obtained when we use the information of conditional heteroskedasticity. The simulation evidence indicates the potential gain of the information contained in the volatility process.
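The two accuracy measures used in the efficiency comparison, the RMSE and the MAE, can be computed from the replicated estimates as follows. This is only a sketch: the function name `rmse_mae` and the four example estimates are hypothetical.

```python
import math


def rmse_mae(estimates, true_value):
    """Root mean squared error and mean absolute error of Monte Carlo estimates."""
    errors = [b - true_value for b in estimates]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Hypothetical estimates of beta from four replications; the true value is 1.
rmse, mae = rmse_mae([0.99, 1.01, 1.02, 0.98], true_value=1.0)
```

Because the RMSE weights large errors more heavily than the MAE, a few extreme replications raise the RMSE more than the MAE.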

    6. Concluding remarks

    In this paper, we find that the asymptotic distribution of the MLE of the cointegrating vector depends on the conditional heteroskedasticity. This fact implies that the efficiency of the MLE can be improved when the data contain conditional heteroskedasticity. Although the RRR estimator and the regression-based estimators allow for conditional heteroskedasticity, they do not use the information coming from it. As a result, the power of statistical inference on the cointegrating vector improves if we use the information of conditional heteroskedasticity.

    The conventional methods of estimating the cointegrating vector are based on the mean equation. Because the OLS and GLS estimators are asymptotically equivalent in nonstationary cointegrated models, the volatility equation has received less attention. However, Amemiya (1973) has shown that the MLE improves the efficiency of estimators if the heteroskedasticity depends on the parameter of the mean equation in the linear model with stationary variables. Therefore, this paper extends Amemiya's result to the nonstationary cointegrated model with conditionally heteroskedastic errors.

    As many studies have shown, financial variables have time-varying variances, and the GARCH model has been widely used to estimate volatility. There exist many other specifications which are capable of explaining conditional heteroskedasticity. Although we consider a multivariate GARCH model with constant coefficients of correlation, our main results can be extended to other heteroskedastic models.

    Statistical inference on the cointegration space can also be affected by conditional heteroskedasticity. If we use the information in heteroskedastic errors, the power of the cointegration test is expected to improve in the same way that the efficiency gain of the cointegrating vector estimator emerges. As this topic requires more complicated analysis, we leave it to future research.

    Acknowledgments

    I would like to thank Badi Baltagi, Valentina Corradi, David Drukker, Bruce Hansen, Dennis Jansen, Qi Li, Joon Park, Peter Robinson, Pentti Saikkonen, and participants at the 2004 North America Econometric Society Meeting and workshops at Rice University and Texas A&M University for useful comments and suggestions. Special thanks are owed to the co-editor and two anonymous referees, who provided detailed and extensive comments and suggestions. The author gratefully acknowledges the research support from Soongsil University.

    Appendix A. Mathematical proofs

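    In the appendix, we denote $|A| = (\mathrm{tr}\,A'A)^{1/2}$, $\|A\|_m = (E|A|^m)^{1/m}$, and $\Theta_\delta = \{\theta \in \Theta : |D_n(\theta - \theta_0)| \le \delta\}$. For simplicity, $\sup_t = \sup_{1 \le t \le n}$, and $\|\cdot\| = \|\cdot\|_1$. We denote $X_t(\theta) - X_t(\theta_0) = o_p(1)$ if, for all $\delta > 0$,

    $$\sup_{\theta \in \Theta_\delta} |X_t(\theta) - X_t(\theta_0)| \to_p 0.$$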

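    Proof of Lemma 1. By the invariance principle of Phillips and Durlauf (1986),

    $$n^{-1/2}\sum_{t=1}^{[ns]} u_t \Rightarrow U(s) \equiv BM(\Omega).$$

    (1.1) Show $n^{-1/2}x_{[ns]} \Rightarrow C(1)U(s)$.

    We need to show

    $$P\left(\sup_{s\in[0,1]} n^{-1/2}\left|x_{[ns]} - C(1)\sum_{t=1}^{[ns]} u_t\right| > \epsilon\right) \le P\left(\sup_{s\in[0,1]} n^{-1/2}\left|\Phi(L)u_{[ns]}\right| > \epsilon\right) \to 0.$$

    Note that $\sup_t \|\Phi(L)u_t\|_2 < \infty$ because

    $$\|\Phi(L)u_t\|_2 \le \sum_{j=0}^{\infty}|\Phi_j|\,\|u_t\|_2 \le \sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}|C_k|\,\|u_t\|_2 \le \sum_{k=1}^{\infty}k|C_k|\,\|u_t\|_2 < \infty.$$

    Thus, $\{\Phi(L)u_t\}$ is uniformly square integrable, which implies

    $$\sup_t n^{-1/2}|\Phi(L)u_t| \to_p 0.$$

    (1.2) Show $n^{-1/2}\sum_{t=1}^{[ns]} w_t \Rightarrow \beta^{*\prime}\Phi(1)U(s)$.

    We need to show

    $$P\left(\sup_{s\in[0,1]} n^{-1/2}\left|\sum_{t=1}^{[ns]} w_t - \beta^{*\prime}\Phi(1)\sum_{t=1}^{[ns]} u_t\right| > \epsilon\right) \le P\left(\sup_{s\in[0,1]} n^{-1/2}\left|\beta^{*\prime}\Phi_1(L)u_{[ns]}\right| > \epsilon\right) \to 0,$$

    where $\Phi_1(L) = (\Phi(L) - \Phi(1))/(1 - L)$.

    $\sum_{k=1}^{\infty}k^2|C_k| < \infty$, $\sup_{\theta\in\Theta}|\theta| < \infty$, and $\|u_t\|_2 < \infty$ imply

    $$\|\beta^{*\prime}\Phi_1(L)u_t\|_2 \le \sup_{\beta^*}|\beta^*|\,\left\|\sum_{j=0}^{\infty}\sum_{k=j+2}^{\infty}(k - j - 1)C_k u_t\right\|_2 \le \sup_{\beta^*}|\beta^*|\sum_{k=1}^{\infty}k^2|C_k|\,\|u_t\|_2 < \infty. \qquad \square$$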

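    Proof of Lemma 2. The error correction model (1) can be written as follows:

    $$\Delta x_t = \alpha\begin{pmatrix} I_r \\ \beta \end{pmatrix}'\begin{pmatrix} x_{1t-1} \\ x_{2t-1} \end{pmatrix} + \Gamma\,\Delta X_{t-1} + u_t,$$

    where $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_l)$ and $\Delta X_{t-1} = \mathrm{vec}(\Delta x_{t-1}, \Delta x_{t-2}, \ldots, \Delta x_{t-l})$.

    Let $\Lambda_j$ be the $j$th row of the correlation matrix $\Lambda$. The orthogonalized innovation $\varepsilon_{jt} = \Lambda_j u_t$ follows the GARCH(1,1) process

    $$\sigma_{jt}^2 = \omega_j + \psi_j\,\varepsilon_{jt-1}^2 + \phi_j\,\sigma_{jt-1}^2.$$

    We use the fact that $\sup_t \sup_\theta 1/\sigma_{jt}^2(\theta) \le 1/\omega_j < \infty$ by Assumption 1(d). Also, we use $0 \le \phi_j < 1$ for $j = 1, 2, \ldots, p$.

    First, we show $\sup_t \|n^{-1/2}x_t\|_m < \infty$ for $1 \le m \le 6$. We prove it for $m = 6$. Since $x_t = C(1)\sum_{i=1}^{t} u_i + \Phi(L)u_t$, we need to show

    $$\sup_t \left\|n^{-1/2}C(1)\sum_{i=1}^{t} u_i\right\|_6 < \infty \quad \text{and} \quad \sup_t \|n^{-1/2}\Phi(L)u_t\|_6 < \infty,$$

    where $u_t = u_t(\theta_0)$, and $\Phi(L) = (C(L) - C(1))/(1 - L)$.

    By using Burkholder's inequality and Minkowski's inequality, $\|u_t\|_6 < \infty$ implies

    $$\sup_t \left\|n^{-1/2}C(1)\sum_{i=1}^{t} u_i\right\|_6 \le \sup_t C_1\left(E\left(n^{-1}\sum_{i=1}^{t} u_i^2\right)^3\right)^{1/6} \le C_1 \sup_t\,(t/n)\,\|u_t\|_6 \le C_1\|u_t\|_6 < \infty,$$

    $$\sup_t \|n^{-1/2}\Phi(L)u_t\|_6 \le n^{-1/2}\sum_{k=1}^{\infty}k|C_k|\,\|u_t\|_6 = o_p(1),$$

    where $C_1 = 108\sqrt{6/5}\,|C(1)|$.

    Thus, $\sup_t \|n^{-1/2}x_t\|_m < \infty$ for $1 \le m \le 6$ by monotonicity.

    Also, we can show that $\sup_t \|\Delta x_t\|_m < \infty$ and $\sup_t \|w_t\|_m < \infty$ for $1 \le m \le 6$ because

    $$\sup_t \|\Delta x_t\|_m \le \sup_t \|C(L)u_t\|_m \le \sum_{k=0}^{\infty}|C_k|\,\|u_t\|_m < \infty$$

    and

    $$\sup_t \|w_t\|_m \le \sup_t \|\beta^{*\prime}\Phi(L)u_t\|_m \le \sup_{\beta^*}|\beta^*|\sum_{k=1}^{\infty}k|C_k|\,\|u_t\|_m < \infty.$$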

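    (2.1) $L_n$ is stochastically equicontinuous.

    (2.1.a) Show $\varepsilon_{jt}^2(\theta) - \varepsilon_{jt}^2 = o_p(1)$.

    Because

    $$u_t(\theta) = u_t - \alpha\sqrt{n}(\beta - \beta_0)'\,n^{-1/2}x_{2t-1} - (\alpha - \alpha_0)w_{t-1} - (\Gamma - \Gamma_0)\Delta X_{t-1},$$

    $u_t(\theta)u_t(\theta)' - u_tu_t' = o_p(1)$ if $\|n^{-1/2}x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.

    We use the following:

    $$\varepsilon_{jt}(\theta) = \Lambda_j u_t(\theta) = \varepsilon_{jt} + (\Lambda_j - \Lambda_{j0})u_t + \Lambda_j(u_t(\theta) - u_t).$$

    Since

    $$\varepsilon_{jt}^2(\theta) - \varepsilon_{jt}^2 = (\Lambda_j - \Lambda_{j0})u_tu_t'(\Lambda_j - \Lambda_{j0})' + \Lambda_j(u_t(\theta) - u_t)(u_t(\theta) - u_t)'\Lambda_j' + 2\varepsilon_{jt}u_t'(\Lambda_j - \Lambda_{j0})' + 2\varepsilon_{jt}(u_t(\theta) - u_t)'\Lambda_j' + 2(\Lambda_j - \Lambda_{j0})u_t(u_t(\theta) - u_t)'\Lambda_j',$$

    $$\sup_{\theta\in\Theta_\delta}|\varepsilon_{jt}^2(\theta) - \varepsilon_{jt}^2| \to_p 0,$$

    for all $j = 1, 2, \ldots, p$ if $\|n^{-1/2}x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.

    (2.1.b) Show $\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = o_p(1)$.

    Note that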

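    $$\sigma_{jt}^2(\theta) = \frac{\omega_j}{1 - \phi_j} + \psi_j\sum_{k=0}^{t-1}\phi_j^k\,\varepsilon_{jt-k-1}^2(\theta),$$

    $$\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = \left[\frac{\omega_j}{1 - \phi_j} - \frac{\omega_{j0}}{1 - \phi_{j0}}\right] + \psi_j\sum_{k=0}^{t-1}\phi_j^k\left(\varepsilon_{jt-k-1}^2(\theta) - \varepsilon_{jt-k-1}^2\right) + \sum_{k=0}^{t-1}\left(\psi_j\phi_j^k - \psi_{j0}\phi_{j0}^k\right)\varepsilon_{jt-k-1}^2.$$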
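The expanded form of the conditional variance can be checked against the GARCH(1,1) recursion. The sketch below assumes that the recursion is started at $\omega_j/(1 - \phi_j)$, which is what makes the two expressions coincide exactly; the function names and the numerical inputs are hypothetical.

```python
def sigma2_recursive(eps, omega, psi, phi):
    """sigma2_t from the GARCH(1,1) recursion, started at omega / (1 - phi)."""
    s2 = omega / (1.0 - phi)
    for e in eps:                       # eps[0], ..., eps[t-1] are past errors
        s2 = omega + psi * e * e + phi * s2
    return s2


def sigma2_expanded(eps, omega, psi, phi):
    """sigma2_t = omega/(1-phi) + psi * sum_{k=0}^{t-1} phi^k * eps_{t-k-1}^2."""
    t = len(eps)
    tail = sum(phi ** k * eps[t - k - 1] ** 2 for k in range(t))
    return omega / (1.0 - phi) + psi * tail


eps = [0.5, -1.2, 0.3, 2.0]             # hypothetical past errors eps_0..eps_3
a = sigma2_recursive(eps, omega=1.0, psi=0.25, phi=0.7)
b = sigma2_expanded(eps, omega=1.0, psi=0.25, phi=0.7)
```

The two values agree up to floating point error, and the expanded form makes explicit that the weight on $\varepsilon_{jt-k-1}^2$ is $\psi_j\phi_j^k$.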

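    We use the following:

    $$\psi_j\phi_j^k - \psi_{j0}\phi_{j0}^k = (\psi_j - \psi_{j0})\phi_j^k + \psi_{j0}(\phi_j^k - \phi_{j0}^k),$$

    $$\phi_j^k - \phi_{j0}^k = (\phi_j - \phi_{j0})(\phi_j^{k-1} + \phi_j^{k-2}\phi_{j0} + \cdots + \phi_{j0}^{k-1}) \le (\phi_j - \phi_{j0})\,k\,\bar{\phi}_j^{k-1},$$

    where $\bar{\phi}_j = \max\{\phi_j, \phi_{j0}\}$.

    Thus, $\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = o_p(1)$ for all $j$ if $\|n^{-1/2}x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.

    (2.1.c) Show $\varepsilon_{jt}^2(\theta)/\sigma_{jt}^2(\theta) - \varepsilon_{jt}^2/\sigma_{jt}^2 = o_p(1)$.

    We use the following: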

    e2jtys2jty

    e2jt

    s2jt e

    2jty e2jts2jty

    2jt

    s2jtys2jty s2jt.

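    Since $\varepsilon_{jt}^2(\theta) - \varepsilon_{jt}^2 = o_p(1)$, $\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = o_p(1)$, and $1/\sigma_{jt}^2(\theta) \le 1/\omega_j$, $\varepsilon_{jt}^2(\theta)/\sigma_{jt}^2(\theta) - \varepsilon_{jt}^2/\sigma_{jt}^2 = o_p(1)$ for all $j$ if $\|n^{-1/2}x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.

    (2.1.d) Show $l_t(\theta) - l_t(\theta_0) = o_p(1)$, where

    $$l_t(\theta) - l_t(\theta_0) = -0.5\sum_{j=1}^{p}\left[\log\sigma_{jt}^2(\theta) - \log\sigma_{jt}^2 + \left(\frac{\varepsilon_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} - \frac{\varepsilon_{jt}^2}{\sigma_{jt}^2}\right)\right].$$

    Since $\log(\sigma_{jt}^2(\theta)/\sigma_{jt}^2) \le (\sigma_{jt}^2(\theta) - \sigma_{jt}^2)/\sigma_{jt}^2$, $\log\sigma_{jt}^2(\theta) - \log\sigma_{jt}^2 = o_p(1)$.

    Therefore, $l_t(\theta) - l_t(\theta_0) = o_p(1)$ if $\|n^{-1/2}x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.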
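The log-likelihood difference in step (2.1.d) is the difference of two Gaussian quasi-log-likelihood contributions. A minimal sketch follows, assuming the contribution takes the form $l_t = -0.5\sum_j[\log\sigma_{jt}^2 + \varepsilon_{jt}^2/\sigma_{jt}^2]$ with constants omitted; the function name `loglik_t` and the numerical inputs are hypothetical.

```python
import math


def loglik_t(eps, sigma2):
    """l_t = -0.5 * sum_j [log sigma2_j + eps_j^2 / sigma2_j], constants omitted."""
    return -0.5 * sum(math.log(s2) + e * e / s2 for e, s2 in zip(eps, sigma2))


# Hypothetical values for p = 2 orthogonalized errors at a single date t.
l_true = loglik_t(eps=[0.4, -1.1], sigma2=[1.3, 0.9])    # evaluated at theta_0
l_other = loglik_t(eps=[0.5, -1.0], sigma2=[1.2, 1.0])   # evaluated at theta
difference = l_other - l_true
```

The value of `difference` equals the bracketed expression in (2.1.d) evaluated term by term, which is why bounding the log-variance difference and the standardized squared error difference separately is sufficient.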

  • ARTICLE IN PRESSBecause

    P supy2Yd

    jLny Lny0j4 !

    p 1n

    Xnt1

    E supy2Yd

    jlty lty0j !

    p 1supt

    E supy2Yd

    jlty lty0j !

    ,

    Assumption 1 implies that the likelihood function Lny is stochastically equicontinuous.(2.2) Show the consistency of b^n.We dene Nd fy 2 Yj j

    n

    p b b0jpdg and Nd fy 2 Yj jn

    p b b0j4dg. We claimthat, for every d40,

    limn

    Pn;y0fLny04 supy2NdLnyg 1.

    To prove the claim, we use the following:

    Lny0 Lny

    12n

    Xpj1

    Xnt1

    log s2jty log s2jt e2jtys2jty

    e2jt

    s2jt

    " #

    12n

    Xpj1

    Xnt1

    logs2jtys2jt

    s2jty s2jts2jty

    e2jty e2jts2jty

    e2jt s2jts2jty s2jt

    s2jts2jty

    " #

    12n

    Xpj1

    Xnt1

    s2jts2jty

    log s2jt

    s2jty 1 e

    2jty e2jts2jty

    e2jt s2jts2jty s2jt

    s2jts2jty

    " #.

    Note that

    e2jty Ljutyu0tyL0j

    s2jty oj

    1 fj cj

    Xt1k0

    fkj e2jtk1y,

    where uty ut ab b00x2t1 a a0wt1 G G0DXt1.We use the fact that x2t Op

    n

    p , supt n1=2jDxtj!p0, and supt n

    1=2jwtj!p0 since Dxt

    and wt are uniformly square integrable. Thus, uty Opn

    p , e2jty Opn, and1=s2jty Opn1 if y 2 Nd.First, s2jt=s

    2jty log s2jt=s2jty 1X0 for all s2jt=s2jty40. Thus, limn Pn;y0fK1jnyX0g

    1 for all j and y 2 Y, where K1jny 1=nPn

    t1 s2jt=s2jty log s2jt=s2jty 1. If y 2 Nd,

    B. Seo / Journal of Econometrics 137 (2007) 68111 91s2jt=s2jty ! 0 as n!1, which implies limn Pn;y0fK1jny40g 1.

  • ARTICLE IN PRESSSecond, show limn Pn;y0fK2jnyX0g 1 for all j and y 2 Nd, where K2jny 1=nPn

    t1e2jty e2jt=s2jty.We use the following:

    n1Xnt1

    e2jty n1Xnt1

    e2jt Ljn1Xnt1

    uty ututy ut0L0j

    Lj Lj0n1Xnt1

    utu0tLj Lj00

    2n1Xnt1

    ejtu0tLj Lj00 2Ljn1

    Xnt1

    ututy ut0L0j.

    Since n1Pn

    t1 x2t1u0t Op1, n1

    Pnt1 wt1u

    0t op1, and n1

    Pnt1 DXt1u

    0t op1,

    n1Xnt1

    e2jty e2jt Ljn1Xnt1

    uty ututy ut0L0j opn.

    As Lemma 3 shows, n1Pn

    t1 x2t1X0t1 Op1 and n1

    Pnt1Xt1X

    0t1 Op1, where

    Xt1 w0t1;DX 0t10.

    n1Xnt1

    uty ututy ut0 ab b00n1Xnt1

    x2t1x02t1b b0a0 opn.

    Because x2t1x02t1=s2jty Op1 for all y 2 Nd,

    n1Xnt1

    e2jty e2jts2jty

    Ljab b00n1Xnt1

    x2t1x02t1s2jty

    b b0a0L0j op1

    !p Ljab b00M22yb b0a0L0j,

    where M22y plim n1Pn

    t1 x2t1x02t1=s

    2jty.

    Thus, limn Pn;y0fK2jnyX0g 1 and for all j and y 2 Nd. If aa0 and M22y40 for ally 2 Nd, then limn Pn;y0fK2jny40g 1.Third, K3jny 1=n

    Pnt1 e2jt s2jts2jty s2jt=s2jts2jty 1=n

    Pnt1 2jt 11 s2jt=

    s2jty!p0 for all j and y 2 Y because E2jt 11 s2jt=s2jtyjFt1 0 and k2jt 1

    1 s2jt=s2jtykm=2pkjtk2m 11 kejtk2m=ojo1 for some m42.Thus, limn Pn;y0fLny04supy2NdLnyg 1. The claim implies that if Lny^nXLny0,

    it must be that y^n 2 Nd.Therefore,

    limn

    Pn;y0fy^n 2 NdgX limn

    Pn;y0fLny^nXLny0g 1.

    B. Seo / Journal of Econometrics 137 (2007) 6811192Next, we show that Hny is stochastically equicontinuous.

  • ARTICLE IN PRESS(2.3) Show Hnbby Hnbby0 op1, where Hnbby n1Pn

    t1 n1q2lty=qb qb0

    and

    q2ltyqb qb

    0

    Xpj1

    A0jAj x2t1x02t1s2jty

    2 hjLx2t2ejt1yhjLx02t2ejt1y

    s4jty1 2Zjty

    "

    2 hjLx2t2ejt1yx02t1ejty

    s4jty 2 x2t1ejtyhjLx

    02t2ejt1y

    s4jty

    hjLx2t2x02t2Zjty

    s2jty

    #.

    (2.3.a) Show A0jAj n1x2t1x02t1=s2jty A0j0Aj0 n1x2t1x02t1=s2jt op1.Because

    A0jAj n1x2t1x02t1

    s2jty

    ! A0j0Aj0

    n1x2t1x02t1s2jt

    !

    A0jAj n1x2t1x02t1

    s2jts2jty

    s2jty s2jt !" #

    A0jAj A0j0Aj0 n1x2t1x02t1

    s2jt

    " #

    and A0jAj A0j0Aj0 Aj Aj00Aj A0j0Aj Aj0, we get A0jAj n1x2t1x02t1=s2jty A0j0Aj0 n1x2t1x02t1=s2jt op1 if kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1.(2.3.b) Show n1hjLx2t2ejt1yhjLx02t2ejt1y=s4jty n1hj0Lx2t2ejt1

    hj0Lx02t2ejt1=s4jt op1, where

    hjLx2t2e1t1yhjLx02t2ejt1y c2jXt1k1

    f2kj x2tk1x02tk1e

    2jtky

    c2jXkal

    fkj fljx2tk1x

    02tl1ejtkyejtly.

    If kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1, thenx2tk1x02tk1e

    2jtky

    s4jtyx2tk1x02tk1e

    2jtk

    s41j op1

    because

    x2tk1x02tk1e2jtky

    s4jty

    x2tk1x02tk1e2jtk

    s4jtx2tk1x02tk1e2jtky e2jtk

    s4jty

    x2tk1x02tk1e

    2jtk

    s2s2 y1

    s2 y 1

    s2

    !s2jty s2jt.

    B. Seo / Journal of Econometrics 137 (2007) 68111 93jt jt jt jt

  • ARTICLE IN PRESSAs a result, if kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1, then

    n1hjLx2t2ejt1yhjLx02t2ejt1y

    s4jty

    n1 hj0Lx2t2ejt1hj0Lx02t2ejt1

    s4jt op1.

    (2.3.c) Show

    n1hjLx2t2ejt1yhjLx02t2ejt1y

    s4jtyZjty

    n1 hj0Lx2t2ejt1hj0Lx02t2ejt1

    s4jtZjt op1,

    where Zjt e2jt=s2jt 1.Because

    x2tk1x02tk1e2jtkye2jty

    s6jty

    x2tk1x02tk1e

    2jtke

    2jt

    s6jt x2tk1x

    02tk1e2jtkye2jty e2jtke2jt

    s6jty

    x2tk1x02tk1e

    2jtk

    2jt

    s2jty1

    s4jty 1s2jty

    1

    s2jt 1s4jt

    !s2jty s2jt,

    we can get the desired results if kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1.(2.3.d) Show n1A0jAj hjLx2t2ejt1yx02t1ejty=s4jty n1A0j0Aj0 hj0 Lx2t2

    ejt1x02t1ejt=s4jt op1.If kn1=2x2tk5o1, kwtk5o1, and kDxtk5o1, then

    n1x2tk1ejtkyx02t1ejty

    s4jty n1 x2tk1ejtkx

    02t1ejt

    s4jt op1

    because

    x2tk1ejtkyx02t1ejtys4jty

    x2tk1ejtkx02t1ejt

    s4jt

    x2tk1x02t1ejtkyejty ejtkejt

    s4t

    x2tk1x02t1ejtkjt

    sjts2jty1

    s2jty 1s2jt

    !s2jty s2jt

    for all kX1.(2.3.e) In the same way, kn1=2x2tk4o1 and kwtk4o1 imply that

    n1hjLx2t2x02t2Zjty

    2 n1 hj0Lx2t2x

    02t2Zjt

    2 op1.

    B. Seo / Journal of Econometrics 137 (2007) 6811194sjty sjt

  • ARTICLE IN PRESSTherefore, Assumption 1 and kutk6o1 imply that n1q2lty=qbqb0 n1q2lty0=qbqb

    0 op1.(2.4) Show Hnaay Hnaay0 op1, where a veca, Hnaay n1

    Pnt1 q

    2lty=qa qa0 and

    q2ltyqa qa0

    Xpj1

    L0jLj wt1bw0t1b

    s2jty 2 hjLwt2bejt1yhjLw

    0t2bejt1y

    s4jty

    "

    1 2Zjty 2hjLwt2bejt1yw0t1bejty

    s4jty

    2 wt1bejtyhjLw0t2bejt1y

    s4jty hjLwt2bw

    0t2bZjty

    s2jty

    #,

    where hjLwt2bejt1y cjPt1

    k0 fkj wtk2bejtk1y.

    Note that wtbw0tb wtw0t op1 if kn1=2x2tk2o1 and kwtk2o1 becausewtbw0tb wtw0t

    n

    p b b00n1x2tx02tn

    p b b0 np b b00n1=2x2tw0t wtn1=2x02t np b b0.

    (2.4.a) Show wt1bw0t1b=s21ty wt1w0t1=s2jt op1.If kn1=2x2tk4o1 and kwtk4o1, then wt1bw0tb=s21ty wt1w0t1=s2jt op1

    because

    wt1bw0tbs21ty

    wt1w0t1

    s2jt wt1bw

    0tb wt1w0t1s2jty

    wt1w0t1

    s2jts2jty

    s2jty s2jt.

    In the same way as (2.3.b)(2.3.e), we can show that Assumption 1 and kutk6o1 implyq2ltyqa qa0

    q2lty0qa qa0

    op1.

    (2.5) Show Hnggy Hnggy0 op1, where Hnggy n1Pn

    t1 q2lty=qg qg0.

    Because q2lty=qgi qgj 0 for iaj and i; j 1; 2; . . . ; p, we consider the following:q2ltyqo2j

    121 fj2s4jty

    1 2Zjty

    q2ltyqc2j

    Pt1k0 fkj e2jtk1y2

    s4jty1 2Zjty

    q2ltyqf2j

    oj=1 fj2 cj

    Pt1k1 kf

    k1j e

    2jtk1y2

    2s4jty1 2Zjty

    oj=1 fj3

    s2jtyZjty.

    The proof for stochastic equicontinuity in the GARCH model has been provided in Lee

    B. Seo / Journal of Econometrics 137 (2007) 68111 95and Hansen (1994) and Lumsdaine (1996), where the mean equation does not contain

  • ARTICLE IN PRESSregressors. However, in the same way as (2.3), we can show that

    q2ltyqg qg0

    q2lty0qg qg0

    op1

    if kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1.(2.6) Show Hnlly Hnlly0 op1, where Hnlly n1

    Pnt1 q

    2lty=ql ql0.Because qeity=qlji uity1 j4i,

    q2ltyql2ji

    u2itys2jty

    2 c2j

    Pt1k0 f

    kj uitk1yejtk1ys4jty

    1 2Zjty

    4 cjuityejtyPt1

    k0 fkj uitk1yejtk1y

    s4jty

    cjP1k0fkj u2itk1y

    s2jtyZjty.

    (2.6.a) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then u2ity=s2jty u2it=s2jt op1because

    u2itys2jty

    u2it

    s2jt u

    2ity u2its2jty

    u2it

    s2jts2jty

    s2jty s2jt.

    (2.6.b) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then

    c2jPt1

    k0 fkj uitk1yejtk1ys4jty

    c2j0

    Pt1k0 f

    kj0uitk1ejtk1s4jt

    op1,

    because

    uitk1yejtk1ys4jty

    uitk1ejtk1s4jt

    uitk1yejtk1y uitk1ejtk1s4jty

    uitk1ejtk1s2jts

    2jty

    1

    s2jty 1s2jt

    !s2jty s2jt.

    (2.6.c) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then

    c2jPt1

    k0 fkj uitk1yejtk1ys4jty

    Zjty c2j0Pt1

    k0 fkj0uitk1ejtk1s4jt

    Zjt op1.

    The cross derivatives entail similar variables, and the same method can be applied to the proof. Therefore, if Assumption 1 holds and $\|u_t\|_6 < \infty$, then

    \[
    H_n(\theta) - H_n(\theta_0) = o_p(1). \qquad \square
    \]

    Proof of Lemma 3. Show $n^{-3/2}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\beta\,\partial\theta_2' = o_p(1)$.

    (3.1) We show $n^{-3/2}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\beta\,\partial\mathrm{vec}(\alpha')' = o_p(1)$, where

    q2lty0qb qveca00

    Xmj1

    A0jLj x2t1w0t1

    s2jt 2 hjLx2t2ejt1hjLw

    0t2ejt1

    s4jt1 2Zjt

    "

    2 hjLx2t2ejt1w0t1ejt

    s4jt 2 x2t1ejthjLw

    0t2ejt1

    s4jt

    hjLx2t2w0t2Zjt

    s2jt

    #,

    where $h_j(L)x_{2,t-2}e_{j,t-1} = \psi_j\sum_{k=0}^{t-1}\phi_j^k x_{2,t-k-2}e_{j,t-k-1}$.

    First, we show that $n^{-1}\sum_{t=1}^{n}x_{2,t-1}w_{t-1}' = O_p(1)$.

    n1Xnt1

    xtw0t n1

    Xnt1

    xt1 Dxtw0t )Z 10

    C1U dU 0F01bn K1,

    where K1 EDx0w00 Ev0w01, and vt FLut.(3.1.a) Show n3=2

    Pnt1 x2t1w

    0t1=s

    2jt op1.

    Note that supt kwt1=s2jtkmo1 for some m42 because

    kwt1=s2jtkm kbn0 FLut1=s2jtkmp1=oj sup

    bnjbnj

    X1k1

    kjCkjkutkmo1.

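    Because $\{w_{t-1}/\sigma_{jt}^2\}$ is strong mixing from Assumption 1(f), we can appeal to Theorem 3.1 of Hansen (1992) to show that $n^{-1}\sum_{t=1}^{n}x_{2,t-1}w_{t-1}'/\sigma_{jt}^2 = O_p(1)$. Thus, $n^{-3/2}\sum_{t=1}^{n}x_{2,t-1}w_{t-1}'/\sigma_{jt}^2 = o_p(1)$.

    (3.1.b) Show $n^{-3/2}\sum_{t=1}^{n}h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)w_{t-2}'e_{j,t-1}/\sigma_{jt}^4 = o_p(1)$, where

    \[
    h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)w_{t-2}'e_{j,t-1}
    = \sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}w_{t-k-1}'e_{j,t-k}^2
    + \sum_{k \neq l}h_{j,k}h_{j,l}\,x_{2,t-k-1}w_{t-l-1}'e_{j,t-k}e_{j,t-l}.
    \]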

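    First, we show that

    \[
    n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}w_{t-k-1}'e_{j,t-k}^2/\sigma_{jt}^4 = o_p(1).
    \]

    Lemma 4 of Lee and Hansen (1994) has shown that, for all $j$ and $k \geq 1$,

    \[
    E\!\left(\frac{\phi_j^k\sigma_{j,t-k}^2}{\sigma_{jt}^2}\,\Big|\,\mathcal{F}_{t-k-1}\right) \leq 1 \quad \text{a.s.}
    \]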

    Thus, we can show that, for some m42,

    sup kfkj wtk1e2jtk=s4jtkmp1=ojkwtk1kmkjtkk22mo1.

    Because $\{w_{t-k-1}e_{j,t-k}^2/\sigma_{jt}^4\}$ is strong mixing, we can show that

    \[
    n^{-1}\sum_{t=1}^{n}h_{j,k}\,x_{2,t-k-1}w_{t-k-1}'e_{j,t-k}^2/\sigma_{jt}^4 = O_p(1)
    \]

    for all $j$ and $k \geq 1$. Because $h_{j,k} = \psi_j\phi_j^k$ decays exponentially, the finite-dimensional convergence implies

    \[
    n^{-1}\sum_{t=1}^{n}\sum_{k=0}^{t-1}h_{j,k}^2\,x_{2,t-k-1}w_{t-k-1}'e_{j,t-k}^2/\sigma_{jt}^4 = O_p(1).
    \]

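    Second, we show that

    \[
    n^{-3/2}\sum_{t=1}^{n}\sum_{k \neq l}h_{j,k}h_{j,l}\,x_{2,t-k-1}w_{t-l-1}'e_{j,t-l}e_{j,t-k}/\sigma_{jt}^4 = o_p(1)
    \]

    for all $j$. Without loss of generality, we set $k < l$.

    \[
    n^{-3/2}\sum_{t=1}^{n}\sum_{k \neq l}h_{j,k}h_{j,l}\,x_{2,t-k-1}w_{t-l-1}'e_{j,t-l}e_{j,t-k}/\sigma_{jt}^4
    = n^{-3/2}\sum_{t=1}^{n}\sum_{k \neq l}h_{j,k}h_{j,l}\bigl(x_{2,t-l-1} + \Delta x_{2,t-l} + \cdots + \Delta x_{2,t-k-1}\bigr)w_{t-l-1}'e_{j,t-l}e_{j,t-k}/\sigma_{jt}^4.
    \]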

    Because kfk=2j fl=2j wtl1ejtlejtk=s4jtkmp1=ojkwtl1kmkjtlkmkjtkkmo1 for somem42,

    n1Xnt1

    Xkal

    hj;khj;lx2tl1w0tl1ejtlejtk=s4jt Op1.

    Also,

    n1Xnt1

    Xkal

    hj;khj;lDx2tl Dx2tk1w0tl1ejtlejtk=s4jt Op1.

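    (3.1.c) The other parts entail $\{w_{t-k-2}e_{j,t-k-1}e_{j,t-l-1}\eta_{jt}/\sigma_{jt}^4\}$, $\{e_{j,t-k-1}w_{t-1}e_{jt}/\sigma_{jt}^4\}$, $\{w_{t-k-2}\eta_{jt}/\sigma_{jt}^2\}$, and $\{w_{t-k-2}e_{j,t-k-1}e_{jt}/\sigma_{jt}^4\}$. These processes are martingale difference sequences, and thus we can apply Theorem 2.1 of Hansen (1992) to get the desired results.

    (3.2) Show $n^{-1}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\beta\,\partial\mathrm{vec}(\Gamma')' = O_p(1)$, where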

    q2lty0qb qvecG00

    Xmj1

    A0jLj x2t1DX 0t1

    s2jt 2 hjLx2t2ejt1hjLDX

    0t2ejt1

    s4jt

    "

    1 2Zjt 2hjLx2t2ejt1DX 0t1ejt

    s4jt

    2 x2t1ejthjLDX0t2ejt1

    s4 hjLx2t2DX

    0t2Zjt

    s2

    #.

    First, we show that $n^{-1}\sum_{t=1}^{n}x_t\Delta x_{t-i}' = O_p(1)$ for $i = 0, 1, 2, \ldots, l-1$.

    n1Xnt1

    xtDx0ti n1Xnt1

    xti1 Dxti DxtDx0ti

    )Z 10

    C1U dU 0C10 K2i,

    where K2i EDx0Dx00 DxiDx00 Ev0Dx01 for i 0; 1; 2; . . . ; l 1.(3.2.a)(3.2.b) Note that supt kDxti=s2jtkmo1 for some m42 and i 1; 2; . . . ; l because

    kDxti=s2jtkm kCLuti=s2jtkmp1=ojX1k0

    jCkjkutkmo1.

    Also,

    suptkfkj Dxtk1e2jtk=s4jtkmp1=ojkDxtk1kmkjtkk22mo1.

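    Thus, we get the following in the same way as (3.1.a)–(3.1.b):

    \[
    n^{-3/2}\sum_{t=1}^{n}\frac{x_{2,t-1}\Delta X_{t-1}'}{\sigma_{jt}^2} = o_p(1),
    \]

    \[
    n^{-3/2}\sum_{t=1}^{n}h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)\Delta X_{t-2}'e_{j,t-1}/\sigma_{jt}^4 = o_p(1).
    \]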

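    (3.2.c) The other parts entail

    \[
    \left\{\frac{\Delta X_{t-k-2}e_{j,t-k-1}e_{j,t-l-1}\eta_{jt}}{\sigma_{jt}^4},\;
    \frac{e_{j,t-k-1}\Delta X_{t-1}e_{jt}}{\sigma_{jt}^4},\;
    \frac{\Delta X_{t-k-2}\eta_{jt}}{\sigma_{jt}^2},\;
    \frac{\Delta X_{t-k-2}e_{j,t-k-1}e_{jt}}{\sigma_{jt}^4}\right\}.
    \]

    These processes are martingale difference sequences, and thus we can get the desired results.

    (3.3) Let $\gamma_j = (\omega_j, \psi_j, \phi_j)'$ for $j = 1, 2, \ldots, m$. Show $n^{-1}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\beta\,\partial\gamma_j' = O_p(1)$, where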

    q2lty0qb qg0j

    Xmj1

    A0j x2t1ejtqs2jt=qg0j

    s4jt hjLx2t2ejt1qs

    2jt=qg

    0j

    s4jt1 2Zjt

    " #

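    and

    \[
    \frac{\partial\sigma_{jt}^2}{\partial\gamma_j} =
    \begin{pmatrix}
    \dfrac{1}{1-\phi_j} \\[2mm]
    \sum_{k=0}^{t-1}\phi_j^k e_{j,t-k-1}^2 \\[2mm]
    \dfrac{\omega_j}{(1-\phi_j)^2} + \psi_j\sum_{k=1}^{t-1}k\phi_j^{k-1}e_{j,t-k}^2
    \end{pmatrix}.
    \]

    Lemma 4 of Lee and Hansen (1994) and Lemma 3 of Lumsdaine (1996) have shown that $\|(\partial\sigma_{jt}^2/\partial\gamma_j)(1/\sigma_{jt}^2)\|_m < \infty$ for some $1 \leq m \leq 6$. Furthermore, it can be shown that $(\partial\sigma_{jt}^2/\partial\gamma_j)(1/\sigma_{jt}^2) < \infty$ a.s. for $\gamma_j = \omega_j, \psi_j$. Thus, we show the proof for $\gamma_j = \phi_j$.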
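The derivative vector $\partial\sigma_{jt}^2/\partial\gamma_j$ can be checked numerically. The sketch below is an illustration, not the paper's code. It assumes the expanded GARCH(1,1) form $\sigma_t^2 = \omega/(1-\phi) + \psi\sum_{k=0}^{t-1}\phi^k e_{t-k-1}^2$ (the form implied by the derivatives with respect to $\omega$ and $\psi$), a zero-based time index, and the hypothetical function names `garch_variance_and_gradient` and `finite_difference_gradient`. The three partial derivatives are computed recursively and can be compared with central finite differences.

```python
import numpy as np


def garch_variance_and_gradient(e, omega, psi, phi):
    """Return sigma2[t] and its gradient with respect to (omega, psi, phi).

    Assumed form: sigma2[t] = omega / (1 - phi) + psi * s[t], where
    s[t] = sum_{k=0}^{t-1} phi**k * e[t-k-1]**2 and s[0] = 0.
    """
    e = np.asarray(e, dtype=float)
    n = e.shape[0]
    sigma2 = np.empty(n)
    grad = np.empty((n, 3))
    s = 0.0  # running sum s[t]
    d = 0.0  # d s[t] / d phi = sum_{k=1}^{t-1} k * phi**(k-1) * e[t-k-1]**2
    for t in range(n):
        if t > 0:
            d = s + phi * d              # uses s[t-1], so update d before s
            s = e[t - 1] ** 2 + phi * s
        sigma2[t] = omega / (1.0 - phi) + psi * s
        grad[t, 0] = 1.0 / (1.0 - phi)                    # d sigma2 / d omega
        grad[t, 1] = s                                    # d sigma2 / d psi
        grad[t, 2] = omega / (1.0 - phi) ** 2 + psi * d   # d sigma2 / d phi
    return sigma2, grad


def finite_difference_gradient(e, theta, step=1e-6):
    """Central finite-difference approximation of the same gradient."""
    theta = np.asarray(theta, dtype=float)
    out = np.empty((len(e), 3))
    for i in range(3):
        h = np.zeros(3)
        h[i] = step
        up, _ = garch_variance_and_gradient(e, *(theta + h))
        down, _ = garch_variance_and_gradient(e, *(theta - h))
        out[:, i] = (up - down) / (2.0 * step)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.standard_normal(200)
    theta = (0.1, 0.2, 0.7)  # illustrative (omega, psi, phi), not from the paper
    _, analytic = garch_variance_and_gradient(e, *theta)
    numeric = finite_difference_gradient(e, theta)
    print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

The recursion for the $\phi$ derivative avoids recomputing the weighted sum at every $t$, which is the usual way such derivatives are evaluated inside a likelihood routine.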

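    (3.3.a) Show $n^{-1}\sum_{t=1}^{n}x_{2,t-1}(\partial\sigma_{jt}^2/\partial\phi_j)e_{jt}/\sigma_{jt}^4 = O_p(1)$, where $\partial\sigma_{jt}^2/\partial\phi_j = \omega_j/(1-\phi_j)^2 + \psi_j\sum_{k=1}^{t-1}k\phi_j^{k-1}e_{j,t-k}^2$.

    Note that $(\partial\sigma_{jt}^2/\partial\phi_j)e_{jt}/\sigma_{jt}^4$ is an MDS. Because $\|(\partial\sigma_{jt}^2/\partial\phi_j)e_{jt}/\sigma_{jt}^4\|_2 \leq (1/\omega_j^{1/2})\,\|(\partial\sigma_{jt}^2/\partial\phi_j)(1/\sigma_{jt}^2)\|_2 < \infty$, we can show that

    \[
    n^{-1}\sum_{t=1}^{n}\frac{x_{2,t-1}(\partial\sigma_{jt}^2/\partial\phi_j)e_{jt}}{\sigma_{jt}^4} = O_p(1).
    \]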

    (3.3.b) Show n3=2Pn

    t1 hjLx2t2ejt1Pt1

    k1 kfk1j e

    2jtk=s4jt op1, where

    hjLx2t2ejt1Xt1k1

    kfk1j e2jtk

    ! cj

    Xt1k1

    kf2kj x2tk1e3jtk

    cj=fjXkal

    lfkj fljx2tk1ejtke

    2jtl .

    First, we note that vjt;k f3k=2j e3jtk=s4jt is uniformly square integrable because

    supt

    Ev2jt;kfjvjt;kjXcgp1

    ojsupt

    E6jtkfj3jtkjXcojg ! 0

    as c!1 if ktk6o1.We apply Theorem 3.1 of Hansen (1992) and get the desired result because kfk=2j decays

    exponentially.Second, we show that

    n3=2Xnt1

    Xkal

    lfkj fljx2tk1ejtke

    2jtl=s

    4jt op1.

    In the same way as (3.1.b), kfk=2j fljejtke2jtl=s4jtkmo1 for some m42 and for all kal.Because lfk=2j decays exponentially, we can get the desired result.(3.3.c) The other part entails ejtkqs2jt=qfjZjt=s4jt. As the process is an MDS, we can

    show the proof in the same way as (3.3.a).(3.4) Show n3=2

    Pnt1 q

    2lty0=qb ql0 op1, whereq2lty0qb qlji

    A0j x2t1uits2jt

    2 hjLx2t2ejt1hjLuit1ejt1s4jt

    1 2Zjt"

    4 hjLx2t2ejt1uitejts4jt

    hjLx2t2uit1Zjts2jt

    #

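    for $i < j$. We use the following:

    \[
    \frac{\partial e_{jt}}{\partial\lambda_{ji}} =
    \begin{cases}
    u_{it} & \text{if } i < j, \\
    0 & \text{otherwise.}
    \end{cases}
    \]

    (3.4.a) Because $u_{it}/\sigma_{jt}^2$ is an MDS and $\|u_{it}/\sigma_{jt}^2\|_2 \leq (1/\omega_j)\|u_{it}\|_2 < \infty$, we can get

    \[
    n^{-1}\sum_{t=1}^{n}\frac{x_{t-1}u_{it}}{\sigma_{jt}^2} = O_p(1).
    \]

    (3.4.b) Show $n^{-3/2}\sum_{t=1}^{n}h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)u_{i,t-1}e_{j,t-1}/\sigma_{jt}^4 = o_p(1)$, where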

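    \[
    h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)u_{i,t-1}e_{j,t-1}
    = \sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}u_{i,t-k}e_{j,t-k}^2
    + \sum_{k \neq l}h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}u_{i,t-l}e_{j,t-l}.
    \]

    First, because $\|\phi_j^k u_{i,t-k}e_{j,t-k}^2/\sigma_{jt}^4\|_m < \infty$ for some $m > 2$ and $\phi_j^k$ decays exponentially,

    \[
    n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}u_{i,t-k}e_{j,t-k}^2/\sigma_{jt}^4 = o_p(1).
    \]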

    Next, we can show that n3=2Pn

    t1P

    kal hj;khj;lx2tk1ejtkuitlejtl=s4jt op1 in the

    same as (3.3.b).Therefore, n3=2

    Pnt1 q

    2lty0=qb qy02 op1.Lemma 2implies that y^n 2 Yd, and hence y 2 Yd, where y 2 y^n; y0.Therefore, by appealing to Proposition 3.2 of Saikkonen (1993), Hny

    Hny0 op1, andDny^n y0 Hny1Gny0

    Hny01Gny0 op1,where y 2 y^n; y0. Furthermore, block-diagonality implies that

    n ^bn b0 hny01gny0 op1,which completes the proof. &

    Proof of Lemma 4. (4.1) Show n1Pn

    t1 qlty0=qb)R 10W 2sdW 01s, where

    qlty0qb

    Xmj1

    A0j x2t1ejts2jt

    hjLx2t2ejt1Zjts2jt

    " #.

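    We denote $z_{jt} = e_{jt}/\sigma_{jt}^2$ and $q_{jt,k} = e_{j,t-k}\eta_{jt}/\sigma_{jt}^2$. Note that $\{z_{jt}, q_{jt,k}\}$ is strictly stationary and ergodic, and an MDS.

    Since $\|z_{jt}\|_2 \leq (1/\omega_j)\|e_{jt}\|_2 < \infty$, we can apply Kurtz and Protter (1991) and Hansen (1992) to get

    \[
    n^{-1}\sum_{t=1}^{n}x_{2,t-1}z_{jt} \Rightarrow \int_0^1 W_2\,dZ_j.
    \]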
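As a purely illustrative check of the $n^{-1}$ normalisation (not part of the proof), the sketch below simulates $n^{-1}\sum_{t=1}^{n}x_{t-1}z_t$ for a scalar Gaussian random walk $x_t$ and an i.i.d. standard normal $z_t$ drawn independently of $x_t$. Both choices are simplifying assumptions: in the paper $z_{jt} = e_{jt}/\sigma_{jt}^2$ is a martingale difference that is correlated with the increments of $x_{2t}$. The function name `scaled_partial_sums` is hypothetical. Under these assumptions the statistic has mean zero and variance $(n-1)/(2n)$, so its spread settles near $\sqrt{1/2}$ instead of growing or vanishing with $n$.

```python
import numpy as np


def scaled_partial_sums(n, reps, seed=0):
    """Draw `reps` replications of n^{-1} * sum_t x_{t-1} z_t.

    x_t is a Gaussian random walk started at x_0 = 0 and z_t is i.i.d.
    N(0, 1), independent of x (a simplifying assumption).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((reps, n))  # increments of the random walk
    z = rng.standard_normal((reps, n))  # martingale differences
    x = np.cumsum(v, axis=1)
    # Lag the walk by one period so that each z_t multiplies x_{t-1}.
    x_lag = np.concatenate([np.zeros((reps, 1)), x[:, :-1]], axis=1)
    return (x_lag * z).sum(axis=1) / n


if __name__ == "__main__":
    for n in (200, 800):
        draws = scaled_partial_sums(n, reps=2000, seed=1)
        print(n, "mean:", draws.mean(), "std:", draws.std())
```

Replacing the $n^{-1}$ scaling with $n^{-3/2}$ in the same experiment would make the spread shrink with $n$, which is the sense in which terms of this type are $o_p(1)$ after the stronger normalisation used in Lemma 3.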

    In the same way, kqjt;kk2p1=ojkejtk2kjtk24 1o1 for all kX1, and hence

    n1Xnt1

    x2t1qjt;k )Z 10

    W 2 dQj;k.

    We denote $F_{jn,k} = n^{-1}\sum_{t=1}^{n}x_{2,t-1}q_{jt,k}$ and $F_{j,k} = \int_0^1 W_2\,dQ_{j,k}$. Now, we want to show $F_{jn} \Rightarrow F_j$, where $F_{jn} = (F_{jn,1}, F_{jn,2}, \ldots)$ and $F_j = (F_{j,1}, F_{j,2}, \ldots)$. We define a metric $d_\infty(f) = \sum_{k=1}^{\infty}k^{-r}|f_k|$, where $r > 2$ and $f \in R^\infty$. Then, the finite-dimensional convergence and the tightness of the probability measure $F_{jn}$ in $R^\infty$, with respect to the metric $d_\infty(f)$, imply the weak convergence $F_{jn} \Rightarrow F_j$. The detailed proof is given in Hansen (1995, pp. 1127–1128).

    Because $\sum_{k=1}^{\infty}h_{j,k}f_{j,k}$ is $d_\infty$-continuous, we have the following result by using the CMT:

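    \[
    n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty}h_{j,k}\,x_{2,t-k-1}q_{jt,k}
    = \sum_{k=1}^{\infty}h_{j,k}\,n^{-1}\sum_{t=1}^{n}x_{2,t-1}q_{jt,k} + o_p(1)
    \Rightarrow \sum_{k=1}^{\infty}h_{j,k}\int_0^1 W_2\,dQ_{j,k}
    = \int_0^1 W_2\sum_{k=1}^{\infty}h_{j,k}\,dQ_{j,k}
    = \int_0^1 W_2\,dQ_j.
    \]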

    Also, show that Qjns ) Qjs, where Qjns n1=2Pns

    t1Pt1

    k1 hj;kqjt;k and Qjs P1k1 hj;kQj;k.Because

    P1kt hj;kkqjt;kk Oftj and

    Pnst1P1

    kt hj;kkqjt;kko1 for all s 2 0; 1,

    P sups20;1

    n1=2Xnst1

    X1k1

    hj;kqjt;k Xnst1

    Xt1k1

    hj;kqjt;k

    4

    !

    P sups20;1

    n1=2Xnst1

    X1kt

    hj;kqjt;k

    4

    !

    pP n1=2Xnt1

    X1kt

    hj;kjqjt;kj4 !

    p n1=2Pn

    t1P1

    kt hj;kkqjt;kk

    ! 0.

    Thus, Qjns ) Qjs. Let x2tk1 x2t1 Pk

    i1 Dx2ti for kX1. Since kPt1

    k1 hj;kPki1 Dx2tiqjt;kkm=2pPt1k1 hj;kPki1 kDx2tikmkqjt;kkmo1 for some m42,

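    \[
    n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2,t-k-1}q_{jt,k}
    = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2,t-1}q_{jt,k}
    - n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\Bigl(\sum_{i=1}^{k}\Delta x_{2,t-i}\Bigr)q_{jt,k}
    = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2,t-1}q_{jt,k} + o_p(1).
    \]

    Thus,

    \[
    n^{-1}\sum_{t=1}^{n}\frac{h_j(L)x_{2,t-2}e_{j,t-1}\eta_{jt}}{\sigma_{jt}^2}
    = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2,t-k-1}q_{jt,k}
    = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}\,x_{2,t-1}q_{jt,k} + o_p(1)
    \Rightarrow \int_0^1 W_2\,dQ_j.
    \]

    Therefore, we have $n^{-1}\sum_{t=1}^{n}\partial l_t(\theta_0)/\partial\beta \Rightarrow \int_0^1 W_2\,dW_1'$, where $W_1(s) = A'(Z(s) - Q(s))$.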

    (4.2) Show n2Pnt1 q2lty0=qb qb0 ) n R 10 W 2sW 02sds, where q

    2lty0qb qb

    0 Xmj1

    A0jAj x2t1x02t1

    s2jt 2 hjLx2t2ejt1hjLx

    02t2ejt1

    s4jt1 2Zjt

    "

    2x2t1ejthjLx02t2ejt1=s4jt 2hjLx2t2ejt1x02t1ejt=s4jt

    hjLx2t2x02t2Zjt=s2jt#.

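    (4.2.a) Show $n^{-2}\sum_{t=1}^{n}x_{2,t-1}x_{2,t-1}'/\sigma_{jt}^2 \Rightarrow (1/B_j^2)\int_0^1 W_2W_2'$.

    First, we show that $n^{-3/2}\sum_{t=1}^{n}x_{2,t-1}x_{2,t-1}'\bigl(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\bigr) = O_p(1)$.

    Assumption 1(a) and 1(b) imply that $\{\sigma_{jt}^2\}$ is $\beta$-mixing with exponential decay, and so is $\{1/\sigma_{jt}^2\}$. Because $\|1/\sigma_{jt}^2\|_m \leq 1/\omega_j < \infty$ for some $m > 2$, we can show that $n^{-3/2}\sum_{t=1}^{n}x_{2,t-1}x_{2,t-1}'\bigl(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\bigr) = O_p(1)$. Thus,

    \[
    n^{-2}\sum_{t=1}^{n}x_{2,t-1}x_{2,t-1}'/\sigma_{jt}^2
    \Rightarrow E(1/\sigma_{jt}^2)\int_0^1 W_2W_2'
    = (1/B_j^2)\int_0^1 W_2W_2'.
    \]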
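The factorisation in (4.2.a) can be illustrated by simulation. The sketch below is a hedged illustration under simplifying assumptions that are not taken from the paper: a scalar GARCH(1,1) process with Gaussian standardized innovations, illustrative parameter values, a random walk $x_t$ built from the GARCH errors themselves, and the hypothetical function names `simulate_garch` and `weighted_and_factored_sums`. It compares $n^{-2}\sum x_{t-1}^2/\sigma_t^2$ with the product of the sample mean of $1/\sigma_t^2$ and $n^{-2}\sum x_{t-1}^2$; the argument above says their difference is of smaller order.

```python
import numpy as np


def simulate_garch(n, omega, psi, phi, rng):
    """Simulate e_t = sigma_t * eps_t with
    sigma_t^2 = omega + psi * e_{t-1}^2 + phi * sigma_{t-1}^2."""
    eps = rng.standard_normal(n)
    sigma2 = np.empty(n)
    e = np.empty(n)
    sigma2[0] = omega / (1.0 - psi - phi)  # start at the unconditional variance
    e[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, n):
        sigma2[t] = omega + psi * e[t - 1] ** 2 + phi * sigma2[t - 1]
        e[t] = np.sqrt(sigma2[t]) * eps[t]
    return e, sigma2


def weighted_and_factored_sums(n, omega=0.1, psi=0.1, phi=0.8, seed=0):
    """Return (n^-2 sum x_{t-1}^2 / sigma_t^2,
               mean(1 / sigma_t^2) * n^-2 sum x_{t-1}^2)."""
    rng = np.random.default_rng(seed)
    e, sigma2 = simulate_garch(n, omega, psi, phi, rng)
    x = np.cumsum(e)                         # integrated process
    x_lag = np.concatenate(([0.0], x[:-1]))  # x_{t-1}, with x_0 = 0
    weighted = np.sum(x_lag ** 2 / sigma2) / n ** 2
    factored = np.mean(1.0 / sigma2) * np.sum(x_lag ** 2) / n ** 2
    return weighted, factored


if __name__ == "__main__":
    w, f = weighted_and_factored_sums(20000)
    print("weighted:", w, "factored:", f, "ratio:", w / f)
```

In this setup the sample mean of $1/\sigma_t^2$ plays the role of $E(1/\sigma_{jt}^2) = 1/B_j^2$.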

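    (4.2.b) Show $n^{-2}\sum_{t=1}^{n}h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)x_{2,t-2}'e_{j,t-1}/\sigma_{jt}^4 \Rightarrow \Xi_j\int_0^1 W_2W_2'$, where

    \[
    h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)x_{2,t-2}'e_{j,t-1}
    = \sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}x_{2,t-k-1}'e_{j,t-k}^2
    + \sum_{k \neq l}h_{j,k}h_{j,l}\,x_{2,t-k-1}x_{2,t-l-1}'e_{j,t-k}e_{j,t-l}.
    \]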

    First, we show that

    n3=2Xnt1

    hj;kx2tk1x02tk1sjt;k Op1,

    where sjt;k hj;ke2jtk=s4jt Ee2jtk=s4jt for all k.Note that sjt;k is b-mixing with exponential decay and ksjt;kkmp2=ojkjtk22mo1 for some

    m42 and for all kX1. Thus, we get n3=2Pt1

    k1Pn

    t1 h2j;kx2tk1x

    02tk1sjt;k Op1 by

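    appealing to Theorem 3.1 of Hansen (1992). Also, we get

    \[
    n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}x_{2,t-k-1}'E(e_{j,t-k}^2/\sigma_{jt}^4)
    = n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-1}x_{2,t-1}'E(e_{j,t-k}^2/\sigma_{jt}^4) + o_p(1)
    \]

    since $n^{-1}\sum_{t=1}^{n}h_{j,k}\,x_{2,t-1}\sum_{i=1}^{k}\Delta x_{2,t-i}' = O_p(1)$ and $n^{-1}\sum_{t=1}^{n}h_{j,k}\sum_{i=1}^{k}\Delta x_{2,t-i}\sum_{i=1}^{k}\Delta x_{2,t-i}' = O_p(1)$.

    Because $n^{-1}x_{2,t-1}x_{2,t-1}' = O_p(1)$ and $\sum_{k=t}^{\infty}h_{j,k}^2E(e_{j,t-k}^2/\sigma_{jt}^4) = O(\phi_j^t)$,

    \[
    n^{-2}\sum_{t=1}^{n}\sum_{k=t}^{\infty}h_{j,k}^2\,x_{2,t-1}x_{2,t-1}'E(e_{j,t-k}^2/\sigma_{jt}^4) = o_p(1).
    \]

    Therefore,

    \[
    \begin{aligned}
    n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}x_{2,t-k-1}'e_{j,t-k}^2/\sigma_{jt}^4
    &= n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}x_{2,t-k-1}'E(e_{j,t-k}^2/\sigma_{jt}^4) + o_p(1) \\
    &= n^{-2}\sum_{t=1}^{n}x_{2,t-1}x_{2,t-1}'\sum_{k=1}^{\infty}h_{j,k}^2E(e_{j,t-k}^2/\sigma_{jt}^4) + o_p(1) \\
    &\Rightarrow \Xi_j\int_0^1 W_2W_2'.
    \end{aligned}
    \]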

    Second, we show n2Pn

    t1 hj;khj;lx2tk1x02tl1ejtkejtl=s

    4jt op1 for all kaj.

    Without loss of generality, we set l4k.We have n3=2

    Pnt1P

    kal hj;khj;lx2tk1x02tl1ejtkejtl=s

    4jt Op1 because ejtkejtl=s4jt

    is b-mixing and kfk=2j fl=2j ejtkejtl=s4jtkmp1=ojktk2mo1 for some m42.(4.2.c) Also, n2

    Pnt1 hjLx2t2ejt1hjLx02t2ejt1Zjt=s4jt op1 because

    n3=2Xnt1

    x2tk1x02tk1e2jtkZjt=s

    4jt Op1

    and

    n3=2Xnt1

    x2tk1x02tl1ejtkejtlZjt=s4jt Op1 for all kX1 and kal.

    (4.2.d) In the same way, n3=2Pn

    t1 x2tk1x02t1ejtkejt=s

    4jt Op1 for all kX1. Hence,

    we have n2Pn

    t1 x2t1ejthjLx02t2ejt1=s4jt op1.(4.2.e) n2

    Pnt1 hjLx2t2x02t2Zjt=s2jt op1 since n3=2

    Pnt1 x2tk1x

    02tk1Zjt=s

    2jt

    Op1 for all kX1.Therefore, we have n2Pnt1 q2lty0=qb qb0 ) n R 10 W 2sW 02sds.(4.3) Show n2

    Pnt1 qlty0=qb qlty0=qb

    0 ) m R 10 W 2W 02, whereqlty0qb

    qlty0qb

    0 Xmj1

    A0jAj x2t1x02t1e2jt=s4jthjLx2t2ejt1hjLx02t2ejt1Z2jt=s4jt

    2hjLx2t2ejt1x02t1ejtZjt=s4jt.

    (4.3.a) We apply Theorem 3.1 of Hansen (1992) to show that

    \[
    n^{-3/2}\sum_{t=1}^{n}x_{2,t-1}x_{2,t-1}'\bigl(e_{jt}^2/\sigma_{jt}^4 - E(e_{jt}^2/\sigma_{jt}^4)\bigr) = O_p(1).
    \]

    Thus,

    \[
    n^{-2}\sum_{t=1}^{n}x_{2,t-1}x_{2,t-1}'e_{jt}^2/\sigma_{jt}^4
    \Rightarrow E(e_{jt}^2/\sigma_{jt}^4)\int_0^1 W_2W_2'
    = (1/B_j^2)\int_0^1 W_2W_2'.
    \]

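    (4.3.b) Since $E\eta_{jt}^2 = \kappa_j - 1$,

    \[
    n^{-2}\sum_{t=1}^{n}h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)x_{2,t-2}'e_{j,t-1}\,\eta_{jt}^2/\sigma_{jt}^4
    \Rightarrow (\kappa_j - 1)\,\Xi_j\int_0^1 W_2W_2'
    \]

    in the same way as (4.2.b).

    (4.3.c) Since $\{e_{j,t-k}e_{jt}\eta_{jt}/\sigma_{jt}^4\}$ is an MDS, $n^{-3/2}\sum_{t=1}^{n}x_{2,t-k-1}x_{2,t-1}'e_{j,t-k}e_{jt}\eta_{jt}/\sigma_{jt}^4 = O_p(1)$ for all $k \geq 1$. Thus, $n^{-2}\sum_{t=1}^{n}h_j(L)x_{2,t-2}e_{j,t-1}\,x_{2,t-1}'e_{jt}\eta_{jt}/\sigma_{jt}^4 = o_p(1)$ in the same way as (4.2.c).

    Therefore, $n^{-2}\sum_{t=1}^{n}g_t(\theta_0)g_t(\theta_0)' \Rightarrow \mu \otimes \int_0^1 W_2(s)W_2'(s)\,ds$. $\square$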

    Proof of Theorem 1. By using Lemmas 1–4, we have

    n ^bn b0 hny01gny0 op1

    ) nZ 10

    W 2W02

    1 Z 10

    dW 1 W 2

    vecZ 10

    W 2W02

    1 Z 10

    W 2 dW01n1

    !.

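    The RRR estimator $\tilde\beta_n$ is based on the following likelihood function:

    \[
    L_n(\beta, U, \Lambda) = n^{-1}\sum_{t=1}^{n}l_t(\beta, U, \Lambda),
    \]

    where $l_t(\beta, U, \Lambda) = -0.5\sum_{j=1}^{m}\log\sigma_j^2 - 0.5\sum_{j=1}^{m}e_{jt}^2(\beta, U, \Lambda)/\sigma_j^2$, $e_t(\beta, U, \Lambda) = \Lambda u_t(\beta, U)$, $\sigma_j^2 = E\sigma_{jt}^2$, and $u_t(\beta, U)$ satisfies Eq. (5).

    We have the following derivatives: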

    qltb0; U0;L0qb

    Xmj1

    A0j x2t1ejts2j

    ,

    q2ltb0;U0;L0qb qb

    0 Xmj1

    A0jAj x2t1x02t1

    s2j.

    By using the previous results, we can show that the RRR estimator $\tilde\beta_n$ has an asymptotic distribution as (20). $\square$

    Proof of Theorem 2. If we use the information matrix,

    WIn )Z

    dW 01 W 02

    n1 Z 10

    W 2W02

    1 !

    R0 R n1 Z 10

    W 2W02

    1 !R0

    " #1

    R n1 Z 10

    W 2W02

    1 ! ZdW 1 W 2

    J 0Q1=20Q1n Q1=2J.We can show other parts in the same way. &

    Proof of Lemma 5. (5.1) Show n1=2Pn

    t1 qlty0=qt!dL0Z QN0;L0S1=2MS1=2L,

    where

    qlty0qt

    Xmj1

    L0jejt

    s2jt hjLejt1Zjt

    s2jt

    " #.

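    Let $z_{jt} = e_{jt}/\sigma_{jt}^2$ and $q_{jt,k} = e_{j,t-k}\eta_{jt}/\sigma_{jt}^2$. Since $\{z_{jt}\}$ and $\{q_{jt,k}\}$ are martingale difference sequences,

    \[
    n^{-1/2}\sum_{t=1}^{n}z_{jt} \to_d Z_j \sim N\!\left(0, \frac{1}{B_j^2}\right),
    \]

    \[
    n^{-1/2}\sum_{t=1}^{n}q_{jt,k} \to_d Q_{j,k} \sim N\bigl(0, (\kappa_j - 1)\xi_{j,k}^2\bigr),
    \]

    for each $k \geq 1$.

    Let $Q_{jn} = n^{-1/2}\sum_{t=1}^{n}\sum_{k=1}^{\infty}h_{j,k}q_{jt,k}$. Because $\|q_{jt,k}\|_2 < \infty$ and $\sum_{k=1}^{\infty}h_{j,k}\|q_{jt,k}\|_2 < \infty$, $Q_{jn} \to_d \sum_{k=1}^{\infty}h_{j,k}Q_{j,k} = Q_j$.

    Since $\sum_{k=t}^{\infty}h_{j,k}\|q_{jt,k}\| = O(\phi_j^t)$, we have the following result in the same way as (4.1):

    \[
    Q_{jn} = n^{-1/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}q_{jt,k} \to_d Q_j.
    \]

    Therefore, $n^{-1/2}\sum_{t=1}^{n}\partial l_t(\theta_0)/\partial\tau \to_d \Lambda'(Z - Q)$.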
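The variance of the first limit can be illustrated by simulation. The sketch below is an illustration under assumptions that are not taken from the paper: a scalar GARCH(1,1) process with Gaussian standardized innovations, illustrative parameter values, and the hypothetical function name `score_sums`. It draws many replications of $n^{-1/2}\sum_t e_t/\sigma_t^2$ and returns them together with the simulated average of $1/\sigma_t^2$, which plays the role of $1/B_j^2$. Because $e_t/\sigma_t^2$ is a martingale difference with conditional variance $1/\sigma_t^2$, the sample variance of the draws should be close to that average.

```python
import numpy as np


def score_sums(n, reps, omega=0.1, psi=0.1, phi=0.8, seed=0):
    """Simulate `reps` replications of n^{-1/2} * sum_t e_t / sigma_t^2.

    Returns the draws and the average of 1 / sigma_t^2 over all
    replications and periods. All replications advance in parallel.
    """
    rng = np.random.default_rng(seed)
    sigma2 = np.full(reps, omega / (1.0 - psi - phi))  # unconditional variance
    total = np.zeros(reps)
    inv_sigma2_sum = 0.0
    for _ in range(n):
        eps = rng.standard_normal(reps)
        e = np.sqrt(sigma2) * eps
        total += e / sigma2                      # z_t = e_t / sigma_t^2
        inv_sigma2_sum += np.mean(1.0 / sigma2)
        sigma2 = omega + psi * e ** 2 + phi * sigma2
    return total / np.sqrt(n), inv_sigma2_sum / n


if __name__ == "__main__":
    draws, inv_b2 = score_sums(n=500, reps=2000)
    print("variance of draws:", draws.var(), "average of 1/sigma^2:", inv_b2)
```

A histogram of `draws` would show the approximate normality that the martingale central limit theorem delivers here.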

    (5.2) Show n1Pnt1 q2lty0=qtqt0 !p L0S1=2NS1=2L, where q

    2lty0qt qt0

    Xmj1

    L0jLj1

    s2jt 2 hjLejt1

    2

    s4jt1 2Zjt 4

    hjLejt1ejts4jt

    hj1Zjts2jt

    " #.

    (5.2.a) Because $\{1/\sigma_{jt}^2\}$ is strictly stationary and ergodic, and $\|1/\sigma_{jt}^2\| \leq 1/\omega_j < \infty$,

    \[
    n^{-1}\sum_{t=1}^{n}\frac{1}{\sigma_{jt}^2} \to_p E\!\left(\frac{1}{\sigma_{jt}^2}\right) = \frac{1}{B_j^2}.
    \]

    (5.2.b) Show $n^{-1}\sum_{t=1}^{n}\bigl(h_j(L)e_{j,t-1}\bigr)^2/\sigma_{jt}^4 \to_p \Xi_j$, where

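    \[
    \bigl(h_j(L)e_{j,t-1}\bigr)^2 = \sum_{k=1}^{t-1}h_{j,k}^2e_{j,t-k}^2 + \sum_{k \neq l}h_{j,k}h_{j,l}\,e_{j,t-k}e_{j,t-l}.
    \]

    Because $e_{j,t-k}^2/\sigma_{jt}^4$ is strictly stationary and ergodic, for all $k \geq 1$, and $\|\sum_{k=1}^{\infty}h_{j,k}^2e_{j,t-k}^2/\sigma_{jt}^4\| < \infty$,

    \[
    n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty}\frac{h_{j,k}^2e_{j,t-k}^2}{\sigma_{jt}^4}
    \to_p \sum_{k=1}^{\infty}h_{j,k}^2E\!\left(\frac{e_{j,t-k}^2}{\sigma_{jt}^4}\right).
    \]

    Also, $n^{-1}\sum_{t=1}^{n}\sum_{k=t}^{\infty}h_{j,k}^2E(e_{j,t-k}^2/\sigma_{jt}^4) = o(1)$ since $\sum_{k=t}^{\infty}h_{j,k}^2E(e_{j,t-k}^2/\sigma_{jt}^4) = O(\phi_j^t)$.

    Thus,

    \[
    n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1}\frac{h_{j,k}^2e_{j,t-k}^2}{\sigma_{jt}^4}
    = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty}\frac{h_{j,k}^2e_{j,t-k}^2}{\sigma_{jt}^4} + o_p(1)
    \to_p \sum_{k=1}^{\infty}h_{j,k}^2E(e_{j,t-k}^2/\sigma_{jt}^4) = \Xi_j.
    \]

    Because $n^{-1}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \neq l$, $n^{-1}\sum_{t=1}^{n}\bigl(h_j(L)e_{j,t-1}\bigr)^2/\sigma_{jt}^4 \to_p \Xi_j$.

    The other parts entail $\{e_{j,t-k}e_{j,t-l}\eta_{jt}/\sigma_{jt}^4,\; e_{j,t-k}e_{jt}/\sigma_{jt}^4,\; \eta_{jt}/\sigma_{jt}^2\}$ for $k, l \geq 1$. These processes are martingale difference sequences, and their sample moments converge to zero.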

    Therefore, n1Pnt1 q2lty0=qtqt0 !p L0S1=2NS1=2L.(5.3) Show n3=2Pnt1 q2lty0=qb qt0 ) A0S1=2NS1=2L R 10 W 2, where

    q2lty0qb qt0

    Xmj1

    A0jLj x2t1s2jt

    2 hjLx2t2ejt1hjLejt1s4jt

    1 2Zjt"

    2 x2t1ejthjLejt1s4jt

    2 hjLx2t2ejt1ejts4jt

    hjLx2t2Zjts2jt

    #.

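    (5.3.a) Show $n^{-3/2}\sum_{t=1}^{n}x_{2,t-1}/\sigma_{jt}^2 \Rightarrow (1/B_j^2)\int_0^1 W_2$.

    Assumption 1 implies that $\{\sigma_{jt}^2\}$ is $\beta$-mixing with exponential decay, and so is $\{1/\sigma_{jt}^2\}$. Because $\|1/\sigma_{jt}^2\|_m \leq 1/\omega_j < \infty$ for some $m > 2$, we can show that $n^{-1}\sum_{t=1}^{n}x_{2,t-1}\bigl(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\bigr) = O_p(1)$. Thus,

    \[
    n^{-3/2}\sum_{t=1}^{n}x_{2,t-1}/\sigma_{jt}^2
    \Rightarrow E(1/\sigma_{jt}^2)\int_0^1 W_2
    = (1/B_j^2)\int_0^1 W_2.
    \]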

    (5.3.b) Show $n^{-3/2}\sum_{t=1}^{n}h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)e_{j,t-1}/\sigma_{jt}^4 \Rightarrow \Xi_j\int_0^1 W_2$, where

    \[
    h_j(L)x_{2,t-2}e_{j,t-1}\,h_j(L)e_{j,t-1}
    = \sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}e_{j,t-k}^2
    + \sum_{k \neq l}h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}e_{j,t-l}.
    \]

    First, we show that

    n1Xnt1

    fkj x2tk1sjt;k Op1,

    where sjt;k fkj e2jtk=s4jt Ee2jtk=s4jt for all kX1.Note that sjt;k is b-mixing with exponential decay and ksjt;kkmp2=ojkjtk22mo1 for some

    m42 and for all kX1. Also, n3=2Pn

    t1P1

    kt h2j;kx2t1Ee2jtk=s4jt op1 since n1=2x2t1

    Op1 andP1

    kt h2j;kEe2jtk=s4jt Ofkj .

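    Therefore,

    \[
    \begin{aligned}
    n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}\frac{h_{j,k}^2\,x_{2,t-k-1}e_{j,t-k}^2}{\sigma_{jt}^4}
    &= n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-k-1}E(e_{j,t-k}^2/\sigma_{jt}^4) + o_p(1) \\
    &= n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1}h_{j,k}^2\,x_{2,t-1}E(e_{j,t-k}^2/\sigma_{jt}^4) + o_p(1) \\
    &= n^{-3/2}\sum_{t=1}^{n}x_{2,t-1}\sum_{k=1}^{\infty}h_{j,k}^2E(e_{j,t-k}^2/\sigma_{jt}^4) + o_p(1) \\
    &\Rightarrow \Xi_j\int_0^1 W_2.
    \end{aligned}
    \]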

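    Second, we show $n^{-3/2}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \neq l$.

    Without loss of generality, we set $l > k$. We have

    \[
    n^{-1}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = O_p(1).
    \]

    Because $h_{j,k}h_{j,l}$ decays exponentially, $n^{-1}\sum_{l > k}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,x_{2,t-k-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = O_p(1)$.

    The other parts entail martingale difference sequences, and are asymptotically negligible. Therefore,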

    n3=2Xnt1

    q2lty0qb qt0

    ) A0S1=2NS1=2LZ 10

    W 2: &

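    Proof of Theorem 3. The parameter vector of the model with an intercept can be defined as $\theta = (\beta', \tau', \theta_2')'$. First, show that $H_{n\theta_2\tau}(\theta_0) = o_p(1)$, where $H_{n\theta_2\tau}(\theta_0) = n^{-1}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\theta_2\,\partial\tau'$.

    (6.1) Show $H_{n\alpha\tau}(\theta_0) = o_p(1)$, where $H_{n\alpha\tau}(\theta_0) = n^{-1}\sum_{t=1}^{n}\partial^2 l_t(\theta_0)/\partial\alpha\,\partial\tau'$ and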

    q2lty0qa qt0

    Xpj1

    L0j0Lj0 wt1s2jt

    2 hjLwt2ejt1hjLejt1s4jt

    1 2Zjt"

    2 hjLwt2ejt1ejts4jt

    2wt1ejthjLejt1s4jt

    hjLwt2Zjts2jt

    #.

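    (6.1.a) Because $w_{t-1}/\sigma_{jt}^2$ is strictly stationary and ergodic, and $\|w_{t-1}/\sigma_{jt}^2\| < \infty$, $n^{-1}\sum_{t=1}^{n}w_{t-1}/\sigma_{jt}^2 \to_p 0$.

    (6.1.b) Show $n^{-1}\sum_{t=1}^{n}h_j(L)w_{t-2}e_{j,t-1}\,h_j(L)e_{j,t-1}/\sigma_{jt}^4 = o_p(1)$, where

    \[
    h_j(L)w_{t-2}e_{j,t-1}\,h_j(L)e_{j,t-1}
    = \sum_{k=1}^{t-1}h_{j,k}^2\,w_{t-k-1}e_{j,t-k}^2
    + \sum_{k \neq l}h_{j,k}h_{j,l}\,w_{t-k-1}e_{j,t-k}e_{j,t-l}.
    \]

    Since $\sup_t\|\phi_j^k w_{t-k-1}e_{j,t-k}^2/\sigma_{jt}^4\|_m < \infty$ for some $m > 2$,

    \[
    n^{-1/2}\sum_{t=1}^{n}\phi_j^k\,w_{t-k-1}e_{j,t-k}^2/\sigma_{jt}^4 = O_p(1).
    \]

    Because $h_{j,k}$ decays exponentially, $n^{-1/2}\sum_{k=1}^{t-1}\sum_{t=1}^{n}h_{j,k}^2\,w_{t-k-1}e_{j,t-k}^2/\sigma_{jt}^4 = O_p(1)$. In the same way, $n^{-1/2}\sum_{t=1}^{n}h_{j,k}h_{j,l}\,w_{t-k-1}e_{j,t-k}e_{j,t-l}/\sigma_{jt}^4 = O_p(1)$ for $k \neq l$.

    Thus, $n^{-1}\sum_{t=1}^{n}h_j(L)w_{t-2}e_{j,t-1}\,h_j(L)e_{j,t-1}/\sigma_{jt}^4 = o_p(1)$.

    The other parts entail

    \[
    \left\{\frac{w_{t-k-1}e_{j,t-k}e_{j,t-l}\eta_{jt}}{\sigma_{jt}^4},\;
    \frac{w_{t-k-1}e_{j,t-k}e_{jt}}{\sigma_{jt}^4},\;
    \frac{w_{t-k-1}\eta_{jt}}{\sigma_{jt}^2}\right\}
    \quad \text{for } k, l \geq 1.
    \]

    These processes are MDS, and therefore $H_{n\alpha\tau}(\theta_0) = o_p(1)$.

    In the same way, we can show that $H_{n\theta_2\tau}(\theta_0) = o_p(1)$.

    Using the block diagonality of the Hessian matrix,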

    n ^bn b0n

    p t^n t0

    0@

    1A) n

    RW 2W

    02 A

    0S1=2NS1=2L R W 2L0S1=2NS1=2A R W 02 L0S1=2NS1=2L

    0@

    1A

    1

    RdW 1 W 2L0Z Q

    !.

    Thus,

    n ^bn b0 ) V1U ,

    where

    U Z

    dW 1 W 2

    A0S1=2NS1=2LZ

    W 2

    L0S1=2NS1=2L1L0Z Q,

    V nZ

    W 2W02

    A0S1=2NS1=2L

    ZW 2

    L0S1=2NS1=2L1

    L0S1=2NS1=2AZ

    W 02

    .

    We use the following:

    U ZdW 1 W 11 W 2

    ZW 2

    Z

    dW 1 W 2,

    V nZ

    W 2W02

    n

    ZW 2

    ZW 02

    n

    ZW 2W

    02 ,

    where W 1 a0L0Z Q and W 2 W 2 RW 2.

    Therefore,

    n ^bn b0 ) nZ

    W 2W02

    1 ZdW 1 W 2

    vecZ

    W 2W02

    1 ZW 2 dW

    01n

    !.

    In the same way, n ~bn b0 ) vecRW 2W

    02 1

    RW 2 dW

    01mr. &

    References

    Ahn, S.K., Reinsel, G.C., 1988. Nested reduced-rank autoregressive models for multiple time series. Journal of the American Statistical Association 83, 849–856.

    Amemiya, T., 1973. Regression analysis when the variance of the dependent variable is proportional to the square of its expectation. Journal of the American Statistical Association 68, 928–934.

    Andrews, D., 1987. Consistency in nonlinear econometric models: a generic uniform law of large numbers. Econometrica 55, 1465–1471.

    Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.

    Bollerslev, T., 1990. Modeling coherence in short-run nominal exchange rates: a multivariate generalized ARCH approach. Review of Economics and Statistics 72, 498–505.

    Bollerslev, T., Engle, R., Wooldridge, J., 1988. A capital asset pricing model with time varying covariances. Journal of Political Economy 96, 116–131.

    Bollerslev, T., Chou, R.Y., Kroner, K.F., 1992. ARCH modeling in finance: a review of the theory and empirical evidence. Journal of Econometrics 52, 5–59.

    Box, G.E.P., Tiao, G.C., 1977. A canonical analysis of multiple time series. Biometrika 64, 355–365.

    Carrasco, M., Chen, X., 2002. Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.

    Chanda, K., 1974. Strong mixing properties of linear stochastic processes. Journal of Applied Probability 11, 401–408.

    Engle, R., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1008.

    Engle, R., Granger, C., 1987. Cointegration and error correction: representation, estimation, and testing. Econometrica 55, 251–276.

    Gorodetskii, V., 1977. On the strong mixing property for linear sequences. Theory of Probability and Its Applications 22, 411–413.

    Hansen, B.E., 1992. Convergence to stochastic integrals for dependent heterogeneous processes. Econometric Theory 8, 489–500.

    Hansen, B.E., 1995. Regression with nonstationary volatility. Econometrica 63, 1113–1132.

    He, C., Teräsvirta, T., 1999. Properties of moments of a family of GARCH processes. Journal of Econometrics 92, 173–192.

    Johansen, S., 1988. Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control 12, 231–254.

    Johansen, S., 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551–1580.

    Kurtz, T., Protter, P., 1991. Weak limit theorems to stochastic integrals and stochastic differential equations. Annals of Probability 19, 1035–1070.

    Lee, S.W., Hansen, B.E., 1994. Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator. Econometric Theory 10, 29–52.

    Li, W.K., Ling, S., Wong, H., 2001. Estimation for partially nonstationary multivariate autoregressive models with conditional heteroskedasticity. Biometrika 88, 1135–1152.

    Ling, S., Li, W.K., 1998. Limiting distributions of maximum likelihood estimators for unstable autoregressive moving-average time series with general autoregressive heteroskedastic errors. Annals of Statistics 26, 84–125.

    Ling, S., Li, W.K., 2003. Asymptotic inference for unit root processes with GARCH(1,1) errors. Econometric Theory 19, 541–564.

    Ling, S., McAleer, M., 2003. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19, 280–310.

    Lumsdaine, R.L., 1996. Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH(1,1) and covariance stationary GARCH(1,1) models. Econometrica 64, 575–596.

    Newey, W., 1991. Uniform convergence in probability and stochastic equicontinuity. Econometrica 59, 1161–1167.

    Phillips, P.C.B., 1991. Optimal inference in cointegrated systems. Econometrica 59, 283–306.

    Phillips, P.C.B., Durlauf, S.N., 1986. Multiple time series with integrated variables. Review of Economic Studies 53, 473–495.

    Saikkonen, P., 1993. Continuous weak convergence and stochastic equicontinuity results for integrated processes with an application to the estimation of a regression model. Econometric Theory 9, 155–188.

    Saikkonen, P., 1995. Problems with the asymptotic theory