Journal of Econometrics 137 (2007) 68–111
www.elsevier.com/locate/jeconom

Asymptotic distribution of the cointegrating vector estimator in error correction models with conditional heteroskedasticity

Byeongseon Seo
Department of Economics, Soongsil University and Texas A&M University, Seoul 156-743, Korea
Available online 12 May 2006

Abstract
This paper explores the asymptotic distribution of the cointegrating vector estimator in error correction models with conditionally heteroskedastic errors. Asymptotic properties of the maximum likelihood estimator (MLE) of the cointegrating vector, which estimates the cointegrating vector and the multivariate GARCH process jointly, are provided. The MLE of the cointegrating vector follows a mixture normal distribution, and its asymptotic distribution depends on the conditional heteroskedasticity and the kurtosis of standardized innovations. The reduced rank regression (RRR) estimator and the regression-based cointegrating vector estimators do not consider conditional heteroskedasticity, and thus the efficiency gain of the MLE emerges as the magnitude of conditional heteroskedasticity increases. The simulation results indicate that the relative power of the t-statistics based on the MLE improves significantly as the GARCH effect increases.
© 2006 Elsevier B.V. All rights reserved.

JEL classification: C13; C32
Keywords: Cointegrating vector; Efficiency gain; Multivariate GARCH

Tel.: +82 2 820 0552; fax: +82 2 824 4384.
E-mail address: [email protected]
0304-4076/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2006.03.008

1. Introduction
The notion of cointegration was developed by Engle and Granger (1987), and since then has been considered important in the recent development of time series econometrics. Many statistical methods have been developed for the analysis of the cointegrated systems,
and several methods of estimating the cointegrating vector have been proposed. Another development, generalized autoregressive conditional heteroskedasticity (GARCH), was made by Engle (1982) and Bollerslev (1986) to explain the time-varying volatility in the data. This paper explores the asymptotic properties of the maximum likelihood estimator (MLE) of the cointegrating vector in the vector error correction model with conditional heteroskedasticity. Because the existing estimation methods do not consider conditional heteroskedasticity in the data, this study is both useful and necessary.

The main objective is to develop the asymptotic properties of the MLE of the
cointegrating vector, which estimates the error correction model and the multivariate GARCH process jointly. The existing estimation methods, including the reduced rank regression (RRR) and the regression-based estimators, allow for, but do not explicitly treat, conditional heteroskedasticity. Their asymptotic distributions are invariant to conditional heteroskedasticity. However, these estimators ignore the information coming from conditional heteroskedasticity. Many authors, including Bollerslev et al. (1992), show that economic variables such as stock prices and exchange rates have time-varying variances. Clustered volatility and thick-tailed distributions are typical characteristics of these variables. Although there is a vast literature on the cointegrating vector and GARCH, the literature on the distribution theory for the cointegrating vector estimator with conditionally heteroskedastic errors is still sparse. This paper fills this gap in the literature by developing an asymptotic theory for the cointegrating vector estimator in error correction models with conditional heteroskedasticity.

In this paper, we find that the MLE of the cointegrating vector follows a mixture normal distribution,
and its asymptotic distribution depends on the conditional heteroskedasticity and the kurtosis of standardized errors. The RRR and the regression-based cointegrating vector estimators do not consider conditional heteroskedasticity in the data, and thus the MLE improves efficiency significantly. Statistical inference on the cointegrating vector also depends on heteroskedasticity. The simulation study reveals that the efficiency gain of the MLE emerges significantly as the GARCH effect increases.

The limiting distribution of the cointegrating vector estimator with heteroskedastic
errors has been explored by Li et al. (2001) and Seo (2001). Li et al. (2001) investigated the limiting distribution of the cointegrating vector estimator in the partially nonstationary vector autoregressive model with ARCH(1) errors. We consider multivariate GARCH errors, which is a natural extension considering the stylized facts of real data. The distribution theory of the cointegrating vector estimator found by Li et al. (2001) depends on two correlated Brownian motions, which implies a nonstandard asymptotic distribution. In this paper, we show that the MLE of the cointegrating vector follows the mixed normal distribution, and provide an explicit analysis of the efficiency gain. This study also extends Seo (2001) by allowing for multiple cointegration rank.

There are other related papers by Wong and Li (1997), Ling and McAleer (2003), Ling
and Li (1998, 2003), and Seo (1999). Wong and Li (1997) and Ling and McAleer (2003) consider the vector autoregressive model with GARCH errors, but they do not consider nonstationarity and cointegration. Ling and Li (1998, 2003) and Seo (1999) explore the asymptotic theory for unit root tests with conditional heteroskedasticity. Here, we consider the cointegrating vector, and thus extend the former results to nonstationary cointegrated models.

We denote $\to_p$ as convergence in probability, $\to_d$ as convergence in distribution, and $\Rightarrow$ as weak convergence with respect to the uniform metric. $BM(\Omega)$ represents a Brownian motion with long-run variance $\Omega$. Also, $[\cdot]$ is the integer operator, $|\cdot|$ is the Euclidean norm, and $\mathrm{vec}(\cdot)$ is the column-stacking operator.

The paper is organized as follows. Section 2 introduces the model and the cointegrating vector estimators. Section 3 develops the asymptotic theory for the cointegrating vector estimators. The error correction model with an intercept is analyzed in Section 4. Section 5 deals with simulation results on the properties of the cointegrating vector estimators.
2. The model
Consider a $p$-dimensional time series $x_t$ generated by the error correction model (ECM) as follows:

$$\Delta x_t = \alpha \begin{pmatrix} I_r \\ \beta \end{pmatrix}' x_{t-1} + \sum_{i=1}^{l} \Gamma_i \Delta x_{t-i} + u_t, \tag{1}$$

where $\alpha$ is the $p \times r$ adjustment vector, and $\beta$ is the $(p-r) \times r$ cointegrating vector.

We assume that the cointegration rank is known and equals $r$. Thus, if we denote Eq. (1) as $\Pi(L)x_t = u_t$, then the rank of $\Pi = -\Pi(1)$ is $r$. We use the normalization of the cointegrating vector with respect to the first $r$ elements of $x_t$. According to our normalization, the cointegrating relationship $w_t$ is defined as follows:

$$w_t(\beta) = x_{1t} + \beta' x_{2t}, \tag{2}$$

where $x_{1t}$ is $r$-dimensional and $x_{2t}$ is $(p-r)$-dimensional.

As defined in Engle and Granger (1987), the cointegrating relationship is stationary. Our
model is based on the normalization (2). The cointegrating vector can be identified from this representation. The same normalization has been used in many studies such as Phillips (1991).

The error process $u_t$ is assumed to be a vector-valued martingale difference sequence (MDS) satisfying $E(u_t \mid \mathcal{F}_{t-1}) = 0$ and $E(u_t u_t' \mid \mathcal{F}_{t-1}) = \Omega_t$, where $\mathcal{F}_t$ is the $\sigma$-field generated by $x_{t-i}$ for $i = 0, 1, 2, \ldots$. Thus, our model allows for a time-varying conditional variance, which generalizes the error condition of Engle and Granger (1987) and Johansen (1988, 1991).

Many models of multivariate conditional heteroskedasticity have been developed to explain time-varying covariance, common persistence, and volatility causality. Bollerslev et al. (1988) proposed vector GARCH and diagonal GARCH models. Each element of the covariance follows a GARCH process, and thus a huge number of parameters must be estimated.$^1$ Bollerslev (1990) proposed a multivariate GARCH model with constant conditional correlation. This model reduces the number of parameters to a manageable size and satisfies positive definiteness, and so it has been used in many empirical studies.

Our model is based on the constant-correlation GARCH specification proposed by Bollerslev (1990):

$$\Omega_t = \Lambda^{-1} \Sigma_t \Lambda^{-1\prime}, \tag{3}$$
$^1$ For example, if $p = 3$, the vector GARCH model has 78 parameters and the diagonal GARCH model has 18 parameters even though we assume a minimal lag order.
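where $\Lambda$ is a lower triangular matrix and $\Sigma_t$ is a diagonal matrix as follows:

$$\Sigma_t = \begin{pmatrix} \sigma_{1t}^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma_{2t}^2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma_{3t}^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_{pt}^2 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \lambda_{21} & 1 & 0 & \cdots & 0 \\ \lambda_{31} & \lambda_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \lambda_{p1} & \lambda_{p2} & \lambda_{p3} & \cdots & 1 \end{pmatrix}.$$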
Define $e_t = \Lambda u_t$, where $e_t$ is an orthogonalized innovation of $u_t$, satisfying $E(e_t \mid \mathcal{F}_{t-1}) = 0$ and $E(e_t e_t' \mid \mathcal{F}_{t-1}) = \Sigma_t$. We assume that each element of $e_t$ follows the GARCH process as follows:

$$\sigma_{jt}^2 = \omega_j + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2, \tag{4}$$

where $\omega_j > 0$, $\psi_j \ge 0$, and $\phi_j \ge 0$ for $j = 1, 2, \ldots, p$.

We note that our model is the vector error correction model with multivariate GARCH errors. The RRR estimator is based on the mean equation, but the MLE estimates the mean and volatility equations jointly. We use the multivariate GARCH model with constant correlation coefficient, and our analysis can be extended to other specifications such as the factor GARCH and the asymmetric GARCH models.

If we denote $X_{t-1}(\beta)$ as the vector of stationary regressors and $U$ as its coefficient matrix, then the mean equation (1) can be written as follows:

$$\Delta x_t = U X_{t-1}(\beta) + u_t, \tag{5}$$

where $X_{t-1}(\beta) = (w_{t-1}'(\beta), \Delta x_{t-1}', \ldots, \Delta x_{t-l}')'$ and $U = (\alpha, \Gamma_1, \Gamma_2, \ldots, \Gamma_l)$.

We define the parameter vector $\theta = (\beta', \theta_2')'$, where, in the parameter vector, $\beta$ stands for $\mathrm{vec}(\beta)$, $\theta_2 = (\mathrm{vec}(U)', \gamma', \lambda')'$, $\gamma = (\gamma_1', \gamma_2', \ldots, \gamma_p')'$, $\gamma_j = (\omega_j, \psi_j, \phi_j)'$ for $j = 1, 2, \ldots, p$, and $\lambda = (\lambda_{21}, \lambda_{31}, \lambda_{32}, \ldots, \lambda_{p,p-1})'$. Let $\theta_0$ be the true parameter value. We denote $u_t = u_t(\theta_0)$, $e_t = e_t(\theta_0)$, and $\Sigma_t = \Sigma_t(\theta_0)$. Define the parameter space $\Theta$ as $\theta \in \Theta \subset R^k$, where $k = r(2p - r) + p\,(pl + (p-1)/2 + 3)$.

Let $\Sigma = E\Sigma_t < \infty$ be the unconditional variance of the orthogonalized errors $e_t$, which requires $\phi_j + \psi_j < 1$ for all $j = 1, 2, \ldots, p$. Thus, the volatility process is stationary, which implies a moving average representation.

We define the following:
$$\sigma_{jt}^2 = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{\infty} \phi_j^k e_{jt-k-1}^2,$$

for $j = 1, 2, \ldots, p$.

The process $\sigma_{jt}^2$ follows the law of motion (4) with infinite past history. However, based on a sample of $\{x_1, x_2, \ldots, x_n\}$, the volatility process $\sigma_{jt}^2$ cannot be observed by an econometrician.

The volatility process (4), given the startup condition $\sigma_{j0}^2 = \omega_j/(1 - \phi_j)$, has a moving average representation in the form

$$\sigma_{jt}^2 = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{t-1} \phi_j^k e_{jt-k-1}^2,$$

for $t = 1, 2, 3, \ldots, n$ and $j = 1, 2, \ldots, p$.
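The equivalence between the recursion (4) with the startup condition and the finite-horizon moving average form can be checked numerically. The following is a minimal sketch, assuming a single series with array index 0 standing for $t = 0$; the function names and parameter values are illustrative and not part of the paper.

```python
import numpy as np

def sigma2_recursive(e, omega, psi, phi):
    """Iterate Eq. (4) forward from sigma2_0 = omega / (1 - phi).

    e[t] holds e_t for t = 0, 1, ..., n (index convention assumed here).
    """
    n = len(e) - 1
    s2 = np.empty(n + 1)
    s2[0] = omega / (1.0 - phi)
    for t in range(1, n + 1):
        s2[t] = omega + psi * e[t - 1] ** 2 + phi * s2[t - 1]
    return s2

def sigma2_moving_average(e, omega, psi, phi):
    """Finite-horizon form: omega/(1-phi) + psi * sum_{k=0}^{t-1} phi^k e_{t-k-1}^2."""
    n = len(e) - 1
    s2 = np.empty(n + 1)
    s2[0] = omega / (1.0 - phi)
    for t in range(1, n + 1):
        k = np.arange(t)  # k = 0, ..., t-1
        s2[t] = omega / (1.0 - phi) + psi * np.sum(phi ** k * e[t - k - 1] ** 2)
    return s2
```

Both functions should return the same path for any input series, which is the algebraic content of the finite-horizon representation.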
The distribution theory of the GARCH process has been based on the unobserved volatility representation because the finite horizon representation is not stationary. As discussed in Lee and Hansen (1994) and Lumsdaine (1996), the initial conditions are asymptotically negligible, and the distribution theory using the finite horizon representation is asymptotically equivalent to that of the infinite horizon representation given some regularity conditions. This paper develops the distribution theory in accordance with the asymptotic equivalence of these two volatility representations.

The log-likelihood function, with the auxiliary condition that $u_t \mid \mathcal{F}_{t-1} \sim N(0, \Omega_t)$, is given by

$$L_n(\theta) = n^{-1} \sum_{t=1}^{n} l_t(\theta), \tag{6}$$

where

$$l_t(\theta) = -0.5 \log|\Omega_t(\theta)| - 0.5\, u_t(\theta)' \Omega_t^{-1}(\theta) u_t(\theta) = -0.5 \log|\Sigma_t(\theta)| - 0.5\, e_t(\theta)' \Sigma_t^{-1}(\theta) e_t(\theta) = -0.5 \sum_{j=1}^{p} \left( \log \sigma_{jt}^2(\theta) + \frac{e_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} \right),$$

where $\sigma_{jt}^2(\theta)$ and $e_{jt}(\theta)$ satisfy Eqs. (1)–(4).

The MLE $\hat\theta_n$ can be defined as follows:

$$\hat\theta_n = \arg\max_{\theta \in \Theta} L_n(\theta). \tag{7}$$
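To make the likelihood in Eq. (6) concrete, the sketch below evaluates $l_t$ from a given matrix of mean-equation residuals $u_t$. It is an illustration under stated assumptions, not the paper's implementation: the residuals are taken as given, the first observation's variance is set to $\omega_j/(1-\phi_j)$, and the names `garch_variances` and `log_likelihood` are hypothetical.

```python
import numpy as np

def garch_variances(e, omega, psi, phi):
    """Per-series GARCH(1,1) variances; row 0 uses omega/(1-phi) as an assumed startup."""
    n, p = e.shape
    s2 = np.empty((n, p))
    s2[0] = omega / (1.0 - phi)
    for t in range(1, n):
        s2[t] = omega + psi * e[t - 1] ** 2 + phi * s2[t - 1]
    return s2

def log_likelihood(u, Lam, omega, psi, phi):
    """Average log-likelihood L_n and its terms l_t, using e_t = Lambda u_t.

    The additive constant -0.5 * p * log(2*pi) is omitted, as in Eq. (6).
    """
    e = u @ Lam.T                      # row t is (Lambda u_t)'
    s2 = garch_variances(e, omega, psi, phi)
    l_t = -0.5 * np.sum(np.log(s2) + e ** 2 / s2, axis=1)
    return l_t.mean(), l_t, s2
```

Because $\Lambda$ is unit lower triangular, $|\Omega_t| = |\Sigma_t|$, so the sum over $j$ agrees with the form written in terms of $\Omega_t$. In practice the returned average would be passed to a numerical optimizer over $\theta$.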
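We use the following derivatives:

$$\frac{\partial l_t(\theta)}{\partial \beta} = \sum_{j=1}^{p} A_j' \otimes \left[ \frac{x_{2t-1} e_{jt}(\theta)}{\sigma_{jt}^2(\theta)} - \frac{h_j(L) x_{2t-2} e_{jt-1}(\theta)\, \eta_{jt}(\theta)}{\sigma_{jt}^2(\theta)} \right],$$

$$\frac{\partial^2 l_t(\theta)}{\partial \beta\, \partial \beta'} = -\sum_{j=1}^{p} A_j' A_j \otimes \left[ \frac{x_{2t-1} x_{2t-1}'}{\sigma_{jt}^2(\theta)} + \frac{2\, h_j(L) x_{2t-2} e_{jt-1}(\theta)\, h_j(L) x_{2t-2}' e_{jt-1}(\theta)}{\sigma_{jt}^4(\theta)} \big(1 + 2\eta_{jt}(\theta)\big) - \frac{2\, h_j(L) x_{2t-2} e_{jt-1}(\theta)\, x_{2t-1}' e_{jt}(\theta)}{\sigma_{jt}^4(\theta)} - \frac{2\, x_{2t-1} e_{jt}(\theta)\, h_j(L) x_{2t-2}' e_{jt-1}(\theta)}{\sigma_{jt}^4(\theta)} - \frac{h_j(L) x_{2t-2} x_{2t-2}'\, \eta_{jt}(\theta)}{\sigma_{jt}^2(\theta)} \right],$$

where $\eta_{jt}(\theta) = e_{jt}^2(\theta)/\sigma_{jt}^2(\theta) - 1$, $h_j(L) x_{2t-2} e_{jt-1}(\theta) = \psi_j \sum_{k=0}^{t-1} \phi_j^k x_{2t-k-2} e_{jt-k-1}(\theta)$, and $A_j$ is the $j$th row of $A = \Lambda\alpha$ for $j = 1, 2, \ldots, p$.

Because the MLE $\hat\theta_n$ maximizes the likelihood function, we get $\sum_{t=1}^{n} \partial l_t(\hat\theta_n)/\partial\theta = 0$. In our model, the conditional variance depends on the cointegrating vector, and thus the first-order condition accompanies the volatility adjustment. The Hessian matrix and the outer product of gradients also entail the volatility adjustment.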
The likelihood function depends on a number of parameters, and redundant parameters may lead to a singularity error. In particular, the Hessian matrix tends to be near-singular when the volatility equation is specified with redundant parameters. Thus, it is necessary to achieve a parsimonious specification by using the associated diagnostic tests. In some cases, the factor GARCH model and the conditional error correction model can be used to reduce the number of redundant parameters. Once the likelihood function is specified, the MLE can be computed in any statistical software that supports a maximum likelihood procedure.

The RRR estimator is based on the mean equation (5), which can be computed by using RRR (Ahn and Reinsel, 1988) or canonical analysis (Box and Tiao, 1977). Other slope parameters can be estimated by least squares once the cointegrating vector is estimated. We denote the RRR estimator as $\tilde\beta_n$ and other estimators as $\tilde U_n$. The RRR estimator $\tilde\beta_n$ is super-consistent, and thus its estimates can be used as the initial values for an algorithm to maximize the likelihood function.
3. Main results
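If the cointegration rank is known and equals $r$, then there exist $p \times r$ full column rank matrices $\alpha$ and $\beta^*$ satisfying $\Pi = \alpha\beta^{*\prime}$. Let $\alpha_\perp$ and $\beta^*_\perp$ be $p \times (p-r)$ full column rank matrices such that $\alpha_\perp'\alpha = 0$ and $\beta_\perp^{*\prime}\beta^* = 0$. From the representation theorem by Engle and Granger (1987), the error correction model (1) has the following representation:

$$\Delta x_t = C(L) u_t, \tag{8}$$

$$x_t = C(1) \sum_{i=1}^{t} u_i + F(L) u_t, \tag{9}$$

$$w_t = \beta^{*\prime} x_t = \beta^{*\prime} F(L) u_t, \tag{10}$$

where $C(1) = \beta^*_\perp (\alpha_\perp' \Pi^*(1) \beta^*_\perp)^{-1} \alpha_\perp'$, $\Pi^*(L) = (\Pi(L) - \Pi(1))/(1 - L)$, and $F(L) = (C(L) - C(1))/(1 - L)$.

The ECM representation holds if $E|u_t|^2 < \infty$ and $\sum_{k=1}^{\infty} k|C_k| < \infty$. Thus, $x_t$ involves stochastic trends and a stationary component. Because the null space of $C(1)$ is spanned by the cointegration space, $\beta^{*\prime} C(1) = 0$ and $C(1)\alpha = 0$. The cointegration vector eliminates the stochastic trends; hence, the cointegrating relationship $w_t$ is stationary. We denote $C_2(1)$ as the partitioned matrix of $C(1)$ which corresponds to $x_{2t}$; hence its dimension is $(p-r) \times p$.

Define the standardized innovations $\varepsilon_t = \Sigma_t^{-1/2} e_t$. That is, $\varepsilon_{jt} = e_{jt}/\sigma_{jt}$ for $j = 1, 2, \ldots, p$.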
We assume the following conditions.
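Assumption 1.
(a) $\varepsilon_{jt} \sim \text{i.i.d.}(0, 1)$, $E\varepsilon_{jt}^6 < \infty$, and $\varepsilon_{jt}$ has a continuous and symmetric density for $j = 1, 2, \ldots, p$.
(b) $E|u_t|^{m_1} < \infty$ for some $m_1 > 2$.
(c) $\sum_{k=1}^{\infty} k^2 |C_k| < \infty$, where $C(L) = \sum_{k=0}^{\infty} C_k L^k$ and $\Delta x_t = C(L) u_t$.
(d) $\omega_j \ge \underline{\omega}_j > 0$ for $j = 1, 2, \ldots, p$.
(e) $\Theta$ is compact.
(f) For some $m_1 > m_2 > 2$, $\{w_{t-1}/\sigma_{jt}^2,\ \Delta x_{t-i}/\sigma_{jt}^2,\ w_{t-k-1} e_{jt-k}^2/\sigma_{jt}^4,\ \Delta x_{t-k-1} e_{jt-k}^2/\sigma_{jt}^4;\ i = 1, 2, \ldots, l;\ j = 1, 2, \ldots, p;\ k \ge 1\}$ is a zero mean, strictly stationary, and strong mixing process with mixing coefficient $\alpha_k = O(k^{-c})$ such that $c > m_1 m_2/(m_1 - m_2)$.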
Assumptions 1(a) and 1(b) imply that $\{\sigma_{jt}^2, e_{jt}\}$ is strictly stationary and $\beta$-mixing (or absolutely regular) with exponential decay for $j = 1, 2, \ldots, p$, as shown by Carrasco and Chen (2002) and He and Teräsvirta (1999). In addition, the volatility process $\{\sigma_{jt}^2\}$ is weakly stationary from Assumption 1(b), which justifies the moving average representation of $\{\sigma_{jt}^2\}$. Assumptions 1(b) and 1(c) imply that $\{\Delta x_t, w_t\}$ is square integrable. The volatility processes are strictly positive from Assumption 1(d). Assumption 1(e) implies that the parameter space is bounded. The volatility parameters are bounded from Assumption 1(b). Assumption 1(f) can be verified by assuming the smooth density condition because the ECM representations (8) and (10) imply that the process $\{w_t, \Delta x_t\}$ is stationary and satisfies the sufficient conditions for strong mixing suggested in Chanda (1974) and Gorodetskii (1977).

The multivariate invariance principle of Phillips and Durlauf (1986) implies the following:

Lemma 1. Under Assumption 1,

$$n^{-1/2} \sum_{t=1}^{[ns]} u_t \Rightarrow U(s) \equiv BM(\Omega), \tag{11}$$

$$n^{-1/2} x_{2[ns]} \Rightarrow C_2(1) U(s), \tag{12}$$

$$n^{-1/2} \sum_{t=1}^{[ns]} w_t \Rightarrow \beta^{*\prime} F(1) U(s), \tag{13}$$

where $\Omega = E\Omega_t$.

3.1. Stochastic equicontinuity

The asymptotic theory of the cointegrating vector estimator involves the tightness of the Hessian matrix and the parameter restriction, which can be verified if consistency holds. As Saikkonen (1993, 1995) has shown, the asymptotic distribution and consistency of the cointegrating vector estimator in nonstationary cointegrated models cannot be achieved by the standard tightness condition, which has been used in models with stationary variables such as Andrews (1987) and Newey (1991). The cointegrated systems involve nonstationary variables with unbounded variance. Besides, the convergence rate of the cointegrating vector estimator is different from that of the short-run parameters. Thus, an appropriate tightness condition is necessary to show the distribution theory.

We define a diagonal matrix $D_n = \mathrm{diag}(D_{1n}, D_{2n})$, where $D_{1n} = \mathrm{diag}(n, n, \ldots, n)$ and $D_{2n} = \mathrm{diag}(\sqrt{n}, \sqrt{n}, \ldots, \sqrt{n})$ correspond to the parameter vectors $\beta$ and $\theta_2$, respectively. The gradient vector, the Hessian matrix, and the outer product of gradients can be defined as follows:

$$G_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial \theta},$$

$$H_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial \theta\, \partial \theta'} D_n^{-1},$$

and

$$P_n(\theta) = D_n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial \theta} \frac{\partial l_t(\theta)}{\partial \theta'} D_n^{-1}.$$
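Definition 1 (Stochastic equicontinuity). $X_n(\theta)$ is stochastically equicontinuous on $\Theta_\delta$ if, for every $\epsilon > 0$ and $\eta > 0$, there exists $N(\epsilon, \eta)$ such that $n \ge N(\epsilon, \eta)$ implies, for all $\delta > 0$,

$$P\left( \sup_{\theta \in \Theta_\delta} |X_n(\theta) - X_n(\theta_0)| > \epsilon \right) \le \eta,$$

where $\Theta_\delta = \{\theta \in \Theta : |D_n(\theta - \theta_0)| \le \delta\}$ and $D_n\theta = (\sqrt{n}\,\beta', \theta_2')'$.

Our definition is the tightness condition of Saikkonen (1993, 1995), and it is based on the normalized parameter space to allow for the difference in convergence rates. We denote $X_n(\theta) - X_n(\theta_0) = o_p(1)$ if $X_n(\theta)$ is stochastically equicontinuous.

Lemma 2. (1) Under Assumption 1, $L_n(\theta) - L_n(\theta_0) = o_p(1)$.
(2) Assumption 1 implies

$$\lim_n P_{n,\theta_0}\left\{ \sup_{\theta \in N_\delta} \big[ L_n(\theta) - L_n(\theta_0) \big] < 0 \right\} = 1,$$

for every $\delta > 0$, where $N_\delta = \{\theta \in \Theta : |\sqrt{n}(\beta - \beta_0)| > \delta\}$. Therefore, $\lim_n P_{n,\theta_0}\{\hat\theta_n \in N_\delta^c\} = 1$ and $\sqrt{n}(\hat\beta_n - \beta_0) \to_p 0$, where $N_\delta^c = \{\theta \in \Theta : |\sqrt{n}(\beta - \beta_0)| \le \delta\}$.
(3) If Assumption 1 holds and $\|u_t\|_6 < \infty$, then $H_n(\theta) - H_n(\theta_0) = o_p(1)$.

Lemma 2(1) shows that the likelihood function is stochastically equicontinuous on the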
local neighborhood of the true parameter value. As the parameter values deviate farther from the local neighborhood, the integrated regressors amplify the squared errors, which lowers the likelihood function sharply. However, the volatility increases at the same time, which moderates the decline in the likelihood function.

The MLE $\hat\theta_n$ exists because the likelihood function is continuous and the parameter space is compact. The consistency of the MLE $\hat\beta_n$ in Lemma 2(2) is based on the sufficient condition for consistency, which has been used in Wu (1981) and Saikkonen (1995). The standard theory of consistency does not apply as the model involves different rates of convergence. However, this condition holds under Assumption 1, and hence the MLE $\hat\beta_n$ is consistent.

The consistency of the short-run parameters can be based on the consistency of the long-run parameters. Given the convergence rate of $\hat\beta_n$, the analysis of the ECM reduces to that of the stationary VAR. The standard theory of consistency such as Ling and McAleer (2003) can be applied to show the consistency of the short-run parameters.

Lemma 2(3) shows that the tightness condition of the Hessian matrix can be satisfied under the moment condition $\|u_t\|_6 < \infty$. Lee and Hansen (1994) and Lumsdaine (1996) derived the stochastic equicontinuity of the Hessian matrix in the GARCH model. In particular, Lee and Hansen (1994) have shown that $\|\varepsilon_t\|_4 < \infty$ is sufficient for stochastic equicontinuity. However, this result cannot be applied to our analysis as our model allows
nonstationary regressors in the mean equation.

Because the likelihood function and the Hessian matrix of our model contain volatility adjustments, the asymptotic theory of the MLE depends on a heavy moment condition. In particular, the distribution theory requires the tightness of the Hessian matrix, which can be justified when the errors $u_t$ satisfy strong moment conditions. For example, Ling and McAleer (2003) developed the distribution theory for the vector ARMA-GARCH model under the moment condition $\|u_t\|_6 < \infty$. Li et al. (2001) developed the limiting distribution of the cointegrating vector estimator with an ARCH(1) process under the condition $\|u_t\|_4 < \infty$. However, it is not easy to compare the moment conditions directly because their distribution theory is based on finite-dimensional convergence.

The condition of bounded moments may well be treated as a sufficient condition for the main results, and it may not be a crucial burden in a practical sense. However, as noted by Lumsdaine (1996), the moment condition restricts the parameter space severely, and hence the estimated parameter values in empirical studies often fail to satisfy even the fourth moment condition.

As shown by Carrasco and Chen (2002), the moment condition $\|u_t\|_{2m} < \infty$ can be implied by $\|\varepsilon_t\|_{2m} < \infty$ and $E|\phi_j + \psi_j \varepsilon_{jt}^2|^m < 1$ for an integer $m \ge 1$ and for all $j = 1, 2, \ldots, p$. As mentioned before, the moment condition restricts the parameter space seriously when we set $m = 2$ or $3$. Therefore, we assume stochastic equicontinuity of the Hessian matrix directly and explore the asymptotic distribution of the cointegrating vector estimator under the minimal restriction on the parameter space.

Assumption 2. $\sup_{\theta \in \Theta_\delta} |H_n(\theta) - H_n(\theta_0)| \to_p 0$.

Lemma 3. Under Assumption 1,

$$H_{12n}(\theta_0) \to_p 0,$$

where $H_{12n}(\theta) = n^{-3/2} \sum_{t=1}^{n} \partial^2 l_t(\theta)/\partial\beta\, \partial\theta_2'$.

Therefore, under Assumptions 1–2,

$$n(\hat\beta_n - \beta_0) = -h_n(\theta_0)^{-1} g_n(\theta_0) + o_p(1), \tag{14}$$

where

$$g_n(\theta) = n^{-1} \sum_{t=1}^{n} \frac{\partial l_t(\theta)}{\partial \beta},$$

and

$$h_n(\theta) = n^{-2} \sum_{t=1}^{n} \frac{\partial^2 l_t(\theta)}{\partial \beta\, \partial \beta'}.$$

3.2. Asymptotic distribution

Let $z_{jt} = e_{jt}/\sigma_{jt}^2$ and $q_{jt,k} = e_{jt-k}\eta_{jt}/\sigma_{jt}^2$ for $k \ge 1$ and $j = 1, 2, \ldots, p$. Because $e_{jt}$, $z_{jt}$, and $q_{jt,k}$ are martingale difference sequences for all $j = 1, 2, \ldots, p$ and $k \ge 1$, Assumptions 1(a) and 1(b) imply the following from the invariance principle of Phillips and Durlauf (1986).
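$$\begin{pmatrix} n^{-1/2} \sum_{t=1}^{[ns]} e_{jt} \\ n^{-1/2} \sum_{t=1}^{[ns]} z_{jt} \\ n^{-1/2} \sum_{t=1}^{[ns]} q_{jt,k} \end{pmatrix} \Rightarrow \begin{pmatrix} E_j(s) \\ Z_j(s) \\ Q_{j,k}(s) \end{pmatrix} \equiv BM \begin{pmatrix} \sigma_j^2 & 1 & 0 \\ 1 & 1/B_j^2 & 0 \\ 0 & 0 & (\kappa_j - 1)\xi_{j,k}^2 \end{pmatrix}, \tag{15}$$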
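where $\sigma_j^2 = E\sigma_{jt}^2$, $1/B_j^2 = E(1/\sigma_{jt}^2)$, $\xi_{j,k}^2 = E(e_{jt-k}^2/\sigma_{jt}^4)$, and $\kappa_j = E\varepsilon_{jt}^4$.

We note that $E_j(s)$ and $Q_{j,k}(s)$ are independent for each $j = 1, 2, \ldots, p$ and $k \ge 1$. Also, $Z_j(s)$ is independent of $Q_{j,k}(s)$ for each $j = 1, 2, \ldots, p$ and $k \ge 1$.

Let $Q_j(s) = \sum_{k=1}^{\infty} h_{j,k} Q_{j,k}(s) \equiv BM((\kappa_j - 1)\Xi_j)$, where $h_{j,k} = \psi_j \phi_j^{k-1}$ and $\Xi_j = \sum_{k=1}^{\infty} h_{j,k}^2 \xi_{j,k}^2$, which is finite because $h_{j,k}$ decays exponentially and $\xi_{j,k}^2$ is finite for all $j$ and $k$. We denote $E(s)$, $Z(s)$, and $Q(s)$ as $p$-dimensional vectors of $E_j(s)$, $Z_j(s)$, and $Q_j(s)$, respectively. Note that $E(s) = \Lambda U(s)$.

Define $W_1(s)$ and $W_2(s)$ as follows:

$$\begin{pmatrix} W_1(s) \\ W_2(s) \end{pmatrix} = \begin{pmatrix} A'(Z(s) - Q(s)) \\ C_2(1) U(s) \end{pmatrix} \equiv BM \begin{pmatrix} \mu & 0 \\ 0 & C_2(1)\, \Omega\, C_2(1)' \end{pmatrix},$$

where $\mu = A' \Sigma^{-1/2} M \Sigma^{-1/2} A$, $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)$, $M$ is a diagonal matrix with the element $\sigma_j^2/B_j^2 + (\kappa_j - 1)H_j$, and $H_j = \sum_{k=1}^{\infty} h_{j,k}^2 E(\sigma_j^2 e_{jt-k}^2/\sigma_{jt}^4)$ for $j = 1, 2, \ldots, p$.

We can show that $W_1(s)$ and $W_2(s)$ are mutually independent by using the ECM representation theorem and Eq. (15). We also define $\nu = A' \Sigma^{-1/2} N \Sigma^{-1/2} A$, where $N$ is a diagonal matrix with the element $\sigma_j^2/B_j^2 + 2H_j$ for $j = 1, 2, \ldots, p$. If $r = 1$, then $\mu = \sum_{j=1}^{p} (A_j^2/\sigma_j^2)\big(\sigma_j^2/B_j^2 + (\kappa_j - 1)H_j\big)$ and $\nu = \sum_{j=1}^{p} (A_j^2/\sigma_j^2)\big(\sigma_j^2/B_j^2 + 2H_j\big)$.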
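The definitions of $\mu$ and $\nu$ can be illustrated with a short numerical sketch. It takes $A = \Lambda\alpha$ and the moments $\sigma_j^2$, $B_j^2$, $\kappa_j$, and $H_j$ as given inputs; the function name `mu_nu` and all input values are hypothetical, and $\mu_r = A'\Sigma^{-1}A$ is included for comparison with the no-GARCH case.

```python
import numpy as np

def mu_nu(A, sigma2, B2, kappa, H):
    """Return (mu, nu, mu_r) built from the diagonal matrices M and N.

    A is p x r; sigma2, B2, kappa, H are length-p arrays of assumed moments.
    """
    A = np.asarray(A, dtype=float)
    sigma2, B2, kappa, H = (np.asarray(v, dtype=float) for v in (sigma2, B2, kappa, H))
    ratio = sigma2 / B2                          # sigma_j^2 / B_j^2
    M = np.diag(ratio + (kappa - 1.0) * H)
    N = np.diag(ratio + 2.0 * H)
    S_inv_half = np.diag(sigma2 ** -0.5)         # Sigma^{-1/2}
    mu = A.T @ S_inv_half @ M @ S_inv_half @ A
    nu = A.T @ S_inv_half @ N @ S_inv_half @ A
    mu_r = A.T @ np.diag(1.0 / sigma2) @ A
    return mu, nu, mu_r
```

With $\kappa_j = 3$ the two matrices coincide, and with $H_j = 0$ and $\sigma_j^2/B_j^2 = 1$ all three coincide, matching the special cases discussed in the text.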
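Lemma 4. Under Assumption 1,

$$g_n(\theta_0) \Rightarrow \int_0^1 dW_1(s) \otimes W_2(s), \tag{16}$$

$$h_n(\theta_0) \Rightarrow -\nu \otimes \int_0^1 W_2(s) W_2'(s)\, ds, \tag{17}$$

and

$$p_n(\theta_0) \Rightarrow \mu \otimes \int_0^1 W_2(s) W_2'(s)\, ds, \tag{18}$$

where $p_n(\theta) = n^{-2} \sum_{t=1}^{n} (\partial l_t(\theta)/\partial\beta)(\partial l_t(\theta)/\partial\beta')$.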
If $\varepsilon_{jt}$ is Gaussian for each $j$, then $\kappa_j = 3$ and $\mu = \nu$, which implies that the negative Hessian and the outer product have the same distribution. However, if the distribution of $\varepsilon_{jt}$ is not normal for some $j$, then the variance of the score does not coincide with that based on the Hessian matrix, as discussed in White (1982). Thus, statistical inference on the cointegrating vector depends on the covariance estimation.

To derive the asymptotic distribution of the RRR estimator $\tilde\beta_n$, we define $W_{1r}(s) = A' \Sigma^{-1} E(s) \equiv BM(\mu_r)$, where $\mu_r = A' \Sigma^{-1} A$. We note that $W_{1r}(s)$ is independent of $W_2(s)$.
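Theorem 1. Under Assumptions 1–2,

$$n(\hat\beta_n - \beta_0) \Rightarrow \left( \int_0^1 W_2 W_2' \right)^{-1} \int_0^1 W_2\, dW_1'\, \nu^{-1}. \tag{19}$$

If Assumptions 1(b), (c), and (e) hold,

$$n(\tilde\beta_n - \beta_0) \Rightarrow \left( \int_0^1 W_2 W_2' \right)^{-1} \int_0^1 W_2\, dW_{1r}'\, \mu_r^{-1}. \tag{20}$$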
First, we note that the asymptotic distribution of the MLE is a mixture normal with a variance of $\nu^{-1}\mu\nu^{-1} \otimes (\int_0^1 W_2 W_2')^{-1}$. Li et al. (2001) considered the cointegrating vector estimator in the vector autoregressive model with ARCH(1) errors. The limiting distribution of the cointegrating vector estimator found by Li et al. (2001) is a functional of two correlated Brownian motions, which implies that the cointegrating vector estimator follows a nonstandard asymptotic distribution. Theorem 1 shows that the MLE of the cointegrating vector has the mixed normal distribution, and therefore the inference on the cointegrating vector can be based on the standard theory.

Second, the RRR estimator is also asymptotically distributed as mixture normal with a variance of $\mu_r^{-1} \otimes (\int_0^1 W_2 W_2')^{-1}$. Because the first-order condition and the Hessian matrix of the RRR estimator do not accompany the volatility adjustment, Assumptions 1(b), (c), and (e) are sufficient for the limiting distribution of the RRR estimator. The distribution theory for the RRR estimator has been explored by Johansen (1988, 1991) and Seo (1998), where white noise errors are assumed. This paper considers conditionally heteroskedastic errors, and we find that the asymptotic distribution of the RRR estimator is invariant to conditional heteroskedasticity.

Third, if there is no GARCH effect, then $\mu = \nu = \mu_r = A'\Sigma^{-1}A$ because $H_j = 0$ and $\sigma_j^2/B_j^2 = 1$ for all $j = 1, 2, \ldots, p$. In this case, the asymptotic distribution of the MLE is the same as that of the RRR estimator.

Fourth, the variance of the MLE depends on the adjustment vector $\alpha$, correlation matrix $\Lambda$, kurtosis $\kappa_j$, and the magnitude of the GARCH effect, which can be represented by $H_j$ and $\sigma_j^2/B_j^2$. As $\alpha$ approaches zero, the cointegrating relationship becomes weaker and the variance of the MLE increases to infinity. In the same way, a weak cointegrating relationship increases the variance of the RRR estimator.

The GARCH effect magnifies the unconditional variance of the error term, which leads to an increase in the variance of the MLE. In our model, the intercept of the volatility equation is fixed. The unconditional variance can be invariant to the GARCH effect when the intercept varies depending on the GARCH parameters. However, the MLE uses the information of conditional heteroskedasticity, which lowers the variance of the MLE. The GARCH effect increases $H_j$ and $\sigma_j^2/B_j^2$ for $j = 1, 2, \ldots, p$, which generates the gain in relative efficiency of the MLE. On the other hand, the RRR estimator does not consider the information of conditional heteroskedasticity. Thus, the variance of the RRR estimator always increases in the GARCH effect.

The relative efficiency gain of the MLE compared to the RRR estimator can be
measured by $\gamma$ as follows:

$$\gamma = \nu^{-1}\mu\nu^{-1}\mu_r, \tag{21}$$

where $\Gamma = \mathrm{Var}(n\hat\beta_n)\,\mathrm{Var}(n\tilde\beta_n)^{-1} = \gamma \otimes I$.

The efficiency gain $\gamma$ depends on the magnitude of the GARCH effect, the fourth moment $\kappa_j$, and $\sigma_j^2/B_j^2$ for $j = 1, 2, \ldots, p$. To simplify analysis, we define the partial efficiency gain $\gamma_j$ as follows:

$$\gamma_j = \frac{\sigma_j^2/B_j^2 + (\kappa_j - 1)H_j}{\big(\sigma_j^2/B_j^2 + 2H_j\big)^2}.$$
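The partial efficiency gain can be evaluated numerically once $\sigma_j^2/B_j^2$ and $H_j$ are available. The sketch below approximates both by sample averages from one simulated Gaussian GARCH(1,1) path and truncates the sum defining $H_j$ at $K$ lags. The simulation design, the truncation, and the function names are assumptions made for illustration and do not reproduce the calculation behind Fig. 1.

```python
import numpy as np

def partial_efficiency_gain(jensen_ratio, kappa, H):
    """gamma_j = (ratio + (kappa - 1) H) / (ratio + 2 H)^2, ratio = sigma_j^2 / B_j^2."""
    return (jensen_ratio + (kappa - 1.0) * H) / (jensen_ratio + 2.0 * H) ** 2

def garch_moments(psi, phi, omega=1.0, n=20000, K=50, seed=0):
    """Sample analogues of sigma_j^2 / B_j^2 and H_j (requires psi + phi < 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    s2 = np.empty(n)
    e = np.empty(n)
    s2[0] = omega / (1.0 - psi - phi)            # start at the unconditional variance
    e[0] = np.sqrt(s2[0]) * eps[0]
    for t in range(1, n):
        s2[t] = omega + psi * e[t - 1] ** 2 + phi * s2[t - 1]
        e[t] = np.sqrt(s2[t]) * eps[t]
    sigma2 = s2.mean()
    jensen_ratio = sigma2 * np.mean(1.0 / s2)    # E(sigma^2) * E(1/sigma^2)
    H = 0.0
    for k in range(1, K + 1):
        h_k = psi * phi ** (k - 1)               # h_{j,k} = psi_j * phi_j^(k-1)
        H += h_k ** 2 * sigma2 * np.mean(e[:-k] ** 2 / s2[k:] ** 2)
    return jensen_ratio, H
```

For Gaussian innovations ($\kappa_j = 3$) the formula reduces to $\gamma_j = 1/(\sigma_j^2/B_j^2 + 2H_j)$, so any positive GARCH effect gives $\gamma_j < 1$.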
As the GARCH effect increases, $H_j$ increases. Thus, $\gamma_j$ decreases, and the relative efficiency of the MLE improves. If the fourth moment $\kappa_j$ is larger than 3, $\gamma_j$ increases and lowers the efficiency gain. Thus, the efficiency of the MLE can be affected by the specification error. The efficiency gain also depends on Jensen's ratio $\sigma_j^2/B_j^2$ because this ratio increases in the GARCH effect.

Fig. 1 shows the partial efficiency gain of the MLE of the cointegrating vector in an error correction model with GARCH errors. The theoretical efficiency gain is calculated as a function of the volatility parameters $\psi_j$ and $\phi_j$. The standardized innovations $\varepsilon_{jt}$ are assumed to follow the standard normal distribution. If the volatility parameters are not large, the efficiency improves slowly. However, as the volatility parameters become larger, a significant amount of efficiency gain emerges. The overall efficiency gain $\gamma$ can be larger than the partial efficiency gain $\gamma_j$ depending on the correlation coefficient and the adjustment coefficient. Fig. 1 is based on the asymptotic theory, but the small sample performance will be affected by the estimation error, which increases uncertainty. As the
sample size increases, the estimation error decreases and the efficiency gain approaches the theoretical values.

Fig. 1. GARCH effect and efficiency gain.

3.3. Statistical inference

Suppose we want to test a set of linear restrictions based on the null hypothesis as follows:

$$H_0 : R\beta = r_q,$$

where $R$ is a $q \times r(p-r)$ matrix, and $r_q$ is a $q$-dimensional vector.

The covariance matrix of the cointegrating vector estimator can be estimated by using the Hessian matrix and the outer product of the gradient. When the model is correctly specified, the negative Hessian matrix is equivalent to the outer product matrix. However, the model may not be correctly specified, and in that case we use the robust covariance estimator. Thus, we may define three t-statistics or Wald statistics according to the covariance estimator.

The Wald statistics can be defined according to the covariance estimation method as follows:

$$W_{jn} = \big[ n(R\hat\beta - r_q) \big]' \big[ R\, \mathrm{Var}_j(n\hat\beta)\, R' \big]^{-1} \big[ n(R\hat\beta - r_q) \big],$$

where $j = I$ using the information matrix, $j = P$ using the inverse of the outer product matrix, and $j = W$ using White's robust covariance estimator.

To derive the distribution of the Wald statistics, we define a $q$-dimensional random variable $J$ as follows:

$$J = Q^{-1/2} R \left[ \nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2'(s)\, ds \right)^{-1} \right] \int_0^1 dW_1(s) \otimes W_2(s),$$

where $Q = R\big[\nu^{-1}\mu\nu^{-1} \otimes (\int_0^1 W_2(s) W_2'(s)\, ds)^{-1}\big] R'$.

The random variable $J$ follows the standard normal distribution. We define $Q_\nu$ and $Q_\mu$ as follows:

$$Q_\nu = R \left[ \nu^{-1} \otimes \left( \int_0^1 W_2(s) W_2'(s)\, ds \right)^{-1} \right] R',$$

$$Q_\mu = R \left[ \mu^{-1} \otimes \left( \int_0^1 W_2(s) W_2'(s)\, ds \right)^{-1} \right] R'.$$

Theorem 2. Under the null hypothesis $H_0 : R\beta = r_q$ and Assumptions 1–2,

$$W_{Wn} \Rightarrow J'J, \tag{22}$$

$$W_{In} \Rightarrow J' Q_I J, \tag{23}$$

and

$$W_{Pn} \Rightarrow J' Q_P J, \tag{24}$$

where $Q_I = Q^{1/2} Q_\nu^{-1} Q^{1/2}$ and $Q_P = Q^{1/2} Q_\mu^{-1} Q^{1/2}$.
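The three Wald statistics differ only in the covariance estimator plugged into the quadratic form. A minimal sketch follows, assuming the summed Hessian `hess` and the summed outer product of gradients `opg` for $\beta$ have already been computed; the function name and argument layout are hypothetical. The scaling by $n$ cancels between the restriction and the variance, so it is omitted.

```python
import numpy as np

def wald_statistics(b_hat, R, r_q, hess, opg):
    """Wald statistics under three covariance estimators of b_hat.

    'I': inverse of the negative Hessian (information matrix),
    'P': inverse of the outer product of gradients,
    'W': sandwich (White's robust) estimator.
    """
    d = R @ b_hat - r_q
    info_inv = np.linalg.inv(-hess)
    cov = {
        "I": info_inv,
        "P": np.linalg.inv(opg),
        "W": info_inv @ opg @ info_inv,
    }
    return {j: float(d @ np.linalg.solve(R @ V @ R.T, d)) for j, V in cov.items()}
```

When the outer product equals the negative Hessian, the three statistics coincide. When the outer product is proportionally larger, as happens with excess kurtosis, the information-based and outer-product-based statistics exceed the robust one.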
n1=2qt
!L0Z QN0;L0S1=2MS1=2L,
ARTICLE IN PRESSt1
n1Xnt1
q2lty0qt qt0
!p L0S1=2NS1=2L,
n3=2Xnt1
q2lty0qb qt0
) A0S1=2NS1=2LZ 10
W 2,
where Z Z1 and Q Q1 are the vector-valued random variables defined in Section 3.
Dene the demeaned Brownian motion W 2s W 2s RW 2sds.
Theorem 3. Under Assumptions 12,
nb^n b0 )Z 1
W 2W02
1 Z 1W 2 dW
01n1. (25)The Wald statistic based on the robust covariance estimator follows the chi-squareddistribution with q degrees of freedom, where q is the number of restrictions in the nullhypothesis. If we use the information matrix, the Wald statistic follows a chi-squareddistribution up to the scale effect QI . If the cointegration rank r equals 1, m and n becomescalars, and thus QI m=nIq. If the covariance is estimated by the inverse of the outerproduct of gradients, the distribution of the Wald statistic is also chi-squared up to thescale effect QP. By the same token, QP m2=n2Iq if r 1. Therefore, if the covariancematrix is estimated by the information matrix or the outer product of gradients, excesskurtosis tends to amplify the Wald statistics, which may lead to the over-rejection of thenull hypothesis. If the distribution of jt is normal, or if kj 3 for all j, these nuisanceparameters disappear since m n. Thus, the scale effect disappears given normality, ormore generally, kj 3. However, statistical inference using the robust covarianceestimator works properly even without normality.
4. ECM with an intercept
Suppose the nonstationary variable xt contains nonzero mean. It is natural to include anintercept in the vector error correction model as follows:
Dxt t a1
b
!0xt1
Xli1
GiDxti ut.
The error ut follows the multivariate GARCH process as in the model without intercept.The parameter vector is dened as y b0; t0; y020. The likelihood function, score, andHessian matrix can be dened in the same way as before. We denote y^n as the MLE and ~ynas the RRR estimator.We use the following asymptotic results:
Lemma 5. Under Assumption 1,
Xn qlty0 d
If Assumptions 1(b), (c), and (e) hold,

$$n(\tilde{\beta}_n - \beta_0) \Rightarrow \left(\int_0^1 \bar{W}_2 \bar{W}_2'\right)^{-1} \int_0^1 \bar{W}_2\, dW_{1r}'\, \mu_r^{-1}.$$ (26)
In the ECM with an intercept, the asymptotic distribution of the MLE is a mixture normal with a variance of $\nu^{-1}\mu\nu^{-1} \otimes \left(\int_0^1 \bar{W}_2 \bar{W}_2'\right)^{-1}$. Also, the RRR estimator is asymptotically distributed as mixture normal with a variance of $\mu_r^{-1} \otimes \left(\int_0^1 \bar{W}_2 \bar{W}_2'\right)^{-1}$. Thus, the cointegrating vector estimators follow the mixed normal asymptotic distribution. Besides, the efficiency gain of the MLE depends on the magnitude of the GARCH effect as in the model without deterministic trends.
Our results can be extended to the ECM with deterministic trends. When the data generating process $x_t$ contains deterministic trends, we consider the ECM with the corresponding trend variables. In that case, the asymptotic distribution is based on the detrended Brownian motions. In addition, we may use the detrended variables to reduce the number of parameters to estimate. The detrended variables remove the deterministic trends, and the asymptotic distribution of the cointegrating vector estimator is based on the detrended Brownian motions. Therefore, the cointegrating vector estimator follows the mixed normal distribution in the ECM with deterministic trends.
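As a finite-sample analogue of the demeaned and detrended Brownian motions, one can project a scaled random walk on a constant, or on a constant and a linear trend, and keep the residuals. The sketch below is only illustrative: the helper name `detrend`, the sample size of 250, and the Gaussian increments are assumptions made for the example.

```python
import random


def detrend(series, with_trend=False):
    """Residuals from an OLS projection of `series` on a constant
    (and, if with_trend is True, also on a linear time trend)."""
    n = len(series)
    mean = sum(series) / n
    demeaned = [value - mean for value in series]
    if not with_trend:
        return demeaned
    t_mean = (n - 1) / 2.0
    t_dev = [t - t_mean for t in range(n)]
    slope = sum(a * b for a, b in zip(t_dev, demeaned)) / sum(a * a for a in t_dev)
    return [d - slope * a for d, a in zip(demeaned, t_dev)]


# A scaled random walk n^{-1/2} x_t as a discrete stand-in for a Brownian motion.
rng = random.Random(0)
n = 250
level, walk = 0.0, []
for _ in range(n):
    level += rng.gauss(0.0, 1.0)
    walk.append(level / n ** 0.5)

demeaned_walk = detrend(walk)
detrended_walk = detrend(walk, with_trend=True)
```

The demeaned path averages to zero, and the detrended path is additionally orthogonal to the linear trend, mirroring the way the deterministic terms are removed from the limiting process.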
5. Simulation evidence
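In this section, we examine the finite sample properties of the cointegrating vector estimators using Monte Carlo simulation. The experiments are based on a bivariate error correction model as follows:

$$\Delta x_t = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} 1 \\ \beta \end{pmatrix}' x_{t-1} + u_t.$$

We also assume $e_t = \Lambda u_t$ and $\Sigma_t = E(e_t e_t' \mid \mathcal{F}_t)$, where

$$\Lambda = \begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}, \qquad \Sigma_t = \begin{pmatrix} \sigma_{1t}^2 & 0 \\ 0 & \sigma_{2t}^2 \end{pmatrix},$$

$$\sigma_{jt}^2 = 1 + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2,$$

$e_{jt} = \sigma_{jt}\varepsilon_{jt}$ and $\varepsilon_{jt} \sim \text{i.i.d.}(0, 1)$ for $j = 1, 2$.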
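The data generating process above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the reported experiments: the function name `simulate_ecm_garch`, the zero initial values, and the starting variances are hypothetical. The sign conventions are also assumptions: the cointegrating relation is taken to be $x_{1t} - \beta x_{2t}$ and the adjustment coefficient $\alpha_1 = -1$, so that the error-correction term is stable.

```python
import random


def simulate_ecm_garch(n=250, alpha=(-1.0, 0.0), beta=1.0,
                       psi=(0.25, 0.25), phi=(0.7, 0.7), lam=0.0, seed=0):
    """Simulate a bivariate ECM with GARCH(1,1) orthogonalized errors.

    Returns the path of x (length n + 1, starting at the origin) and the
    path of the errors u (length n). All defaults are illustrative.
    """
    rng = random.Random(seed)
    x = [(0.0, 0.0)]                 # assumed initial condition
    u_path = []
    e_prev = [0.0, 0.0]              # assumed pre-sample innovations
    sig2_prev = [1.0, 1.0]           # assumed pre-sample variances
    for _ in range(n):
        e, sig2 = [], []
        for j in range(2):
            # sigma2_jt = 1 + psi_j * e_{jt-1}^2 + phi_j * sigma2_{jt-1}
            s2 = 1.0 + psi[j] * e_prev[j] ** 2 + phi[j] * sig2_prev[j]
            sig2.append(s2)
            e.append(s2 ** 0.5 * rng.gauss(0.0, 1.0))
        # Invert e_t = Lambda u_t with Lambda = [[1, 0], [lam, 1]].
        u1 = e[0]
        u2 = e[1] - lam * u1
        x1, x2 = x[-1]
        w = x1 - beta * x2           # assumed cointegrating relation
        x.append((x1 + alpha[0] * w + u1, x2 + alpha[1] * w + u2))
        u_path.append((u1, u2))
        e_prev, sig2_prev = e, sig2
    return x, u_path


x_path, u_path = simulate_ecm_garch(n=250, alpha=(-1.0, 0.0), beta=1.0, seed=42)
```

With $\alpha = (-1, 0)'$ and $\beta = 1$ under these conventions, the equilibrium error $x_{1t} - x_{2t}$ collapses to $u_{1t} - u_{2t}$ each period, which gives a simple sanity check on the recursion.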
We compare the finite sample performance of the MLE of the cointegrating vector to that of the RRR, the fully modified (FM), and the OLS estimators. The standard errors of the MLE are calculated from the robust covariance estimator. The experiments are based on a sample size of 250 and 1000 replications. The process $\varepsilon_{jt}$ is generated by the Gauss random number generator. The true value of $\beta$ is set at 1.
First, we study the efficiency gain of the MLE of the cointegrating vector. Table 1 shows the root mean squared error (RMSE) and the mean absolute error (MAE) of the cointegrating vector estimators. When there is no conditional heteroskedasticity, the MLE is almost equivalent to the RRR estimator. As the GARCH effect increases, the RMSE and MAE of the MLE decrease while those of the RRR estimator slowly increase. For example, at $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.95, 0, 0.95, 0, 0)$ the RMSE and MAE of the MLE are 75% and 60% lower than those of the RRR estimator, respectively. The RMSE of the MLE is 30% lower than that of the RRR estimator at $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.25, 0.7, 0.25, 0.7, 0)$.
Table 1 shows that the impact of the parameter $\psi_j$ on the efficiency gain is larger than that of the parameter $\phi_j$. If the parameter $\lambda$ is different from 0, the RMSE and the MAE decrease. Thus, the relative efficiency of the MLE improves depending on the parameters in the volatility processes compared to other estimators, which do not consider heteroskedasticity.
As Table 1 shows, the FM estimator is less efficient than the MLE and the RRR estimator because it considers neither the short-run dynamics nor conditional heteroskedasticity. The RMSE and MAE of the FM estimator increase slowly as the volatility parameters increase. Compared to the FM estimator, the OLS estimator does not treat asymptotic bias, and its RMSE and MAE increase in the volatility parameters.

Table 1
Efficiency gain

(α1, α2, ψ1, φ1, ψ2, φ2, λ)          RMSE                              MAE
                                     MLE     RRR     FM      OLS      MLE     RRR     FM      OLS
(1, 0, 0, 0, 0, 0, 0)                0.0077  0.0079  0.0147  0.0158   0.0050  0.0054  0.0108  0.0106
(1, 0, 0.25, 0, 0.25, 0, 0)          0.0069  0.0074  0.0133  0.0146   0.0047  0.0051  0.0098  0.0097
(1, 0, 0.5, 0, 0.5, 0, 0)            0.0059  0.0080  0.0138  0.0157   0.0040  0.0053  0.0101  0.0102
(1, 0, 0.75, 0, 0.75, 0, 0)          0.0050  0.0088  0.0166  0.0179   0.0033  0.0056  0.0109  0.0108
(1, 0, 0.95, 0, 0.95, 0, 0)          0.0044  0.0173  0.0250  0.0219   0.0028  0.0072  0.0133  0.0128
(1, 0, 0.25, 0.2, 0.25, 0.2, 0)      0.0074  0.0081  0.0141  0.0159   0.0049  0.0055  0.0105  0.0108
(1, 0, 0.25, 0.45, 0.25, 0.45, 0)    0.0074  0.0080  0.0141  0.0156   0.0052  0.0055  0.0104  0.0104
(1, 0, 0.25, 0.7, 0.25, 0.7, 0)      0.0066  0.0093  0.0160  0.0177   0.0044  0.0058  0.0111  0.0119
(1, 0, 0.25, 0.7, 0.25, 0.7, 0.5)    0.0057  0.0072  0.0123  0.0115   0.0036  0.0045  0.008   0.0072
(1, 0.5, 0.25, 0.7, 0.25, 0.7, 0)    0.0025  0.0031  0.0057  0.0067   0.0017  0.0021  0.0042  0.0050
(1, 0.5, 0.25, 0.7, 0.25, 0.7, 0)    0.0076  0.0093  0.0256  0.0478   0.0052  0.0066  0.0146  0.0297

Next, we examine the size performance of the t-statistics for the null hypothesis:

$$H_0: \beta = 1.$$
Table 2 shows the descriptive statistics, the percentiles, and the coverage rates of the t-statistics based on the MLE, RRR, FM, and OLS estimators. The coverage rate is defined as $P(T < u_{0.05})$ for the lower 5% size and $P(T > u_{0.95})$ for the upper 5% size, where $T$ is the t-statistic and $u$ is the critical value. The standard errors of the MLE are based on the robust covariance matrix estimator.
The descriptive statistics of the t-statistics based on the MLE are close to the properties of the standard normal distribution. The coverage rates are very close to the true size, and thus statistical inference on the cointegrating vector can be based on the standard theory.
Fig. 2 shows the estimated kernel density of the t-statistics of the cointegrating vector estimators. The estimated density based on the MLE looks very close to the standard normal distribution for most values of the GARCH parameters. Also, the t-statistics based on the RRR and FM estimators can be closely approximated by the normal distribution. However, as Fig. 2 shows, the OLS estimator reveals a large amount of size distortion and asymmetry.

Table 2
Size performance of the t-statistics

       Descriptive statistics                 Percentiles                 Coverage rate
       Mean    S.D.    Skewness  Kurtosis     5       50      95          0.05    0.95

α = (1 0)′, ψ1 = 0, φ1 = 0, ψ2 = 0, φ2 = 0, λ = 0
MLE    0.0484  0.9609  0.0493    3.3621       1.7115  0.0130  1.5387      0.0550  0.0390
RRR    0.0616  1.0225  0.0154    2.8984       1.7760  0.0526  1.6079      0.0650  0.0460
FM     0.0552  1.0791  0.1021    3.0048       1.7274  0.0436  1.7642      0.0610  0.0550
OLS    0.9415  0.8468  0.0786    3.0227       2.3296  0.9376  0.4512      0.2050  0.0000

α = (1 0)′, ψ1 = 0.25, φ1 = 0, ψ2 = 0.25, φ2 = 0, λ = 0
MLE    0.0046  0.9958  0.0746    3.2712       1.6091  0.0390  1.6593      0.0480  0.0530
RRR    0.0106  0.9843  0.0114    3.1257       1.6140  0.0274  1.6284      0.0470  0.0500
FM     0.0706  1.0138  0.0751    3.1097       1.7647  0.0875  1.6087      0.0600  0.0450
OLS    0.8922  0.8530  0.1537    2.8878       2.3297  0.8723  0.4928      0.1890  0.0000

α = (1 0)′, ψ1 = 0.5, φ1 = 0, ψ2 = 0.5, φ2 = 0, λ = 0
MLE    0.0705  0.9754  0.0058    3.2890       1.6957  0.0758  1.5011      0.0540  0.0410
RRR    0.0307  0.9609  0.0357    3.2549       1.5389  0.0387  1.5201      0.0460  0.0350
FM     0.1096  1.0009  0.0255    2.9200       1.7616  0.1311  1.5035      0.0620  0.0330
OLS    0.9032  0.8502  0.1717    3.0220       2.4042  0.8693  0.4231      0.1910  0.0020

α = (1 0)′, ψ1 = 0.75, φ1 = 0, ψ2 = 0.75, φ2 = 0, λ = 0
MLE    0.0417  1.0145  0.1359    3.2862       1.7106  0.0712  1.6367      0.0590  0.0500
RRR    0.0514  1.0088  0.1007    3.0358       1.6146  0.0590  1.6125      0.0450  0.0490
FM     0.0915  1.0160  0.2077    3.8452       1.7403  0.0772  1.5578      0.0620  0.0420
OLS    0.8917  0.8879  0.2214    3.4199       2.3956  0.8632  0.5092      0.1950  0.0030

α = (1 0)′, ψ1 = 0.95, φ1 = 0, ψ2 = 0.95, φ2 = 0, λ = 0
MLE    0.0012  1.0686  0.0244    3.1862       1.7797  0.0422  1.7125      0.0630  0.0570
RRR    0.0351  1.0649  0.0283    2.8242       1.7388  0.0268  1.7888      0.0620  0.0640
FM     0.1201  1.1283  1.7399    21.8168      1.9119  0.0813  1.6030      0.0660  0.0460
OLS    0.9188  0.9947  0.1727    5.6338       2.4226  0.9408  0.7530      0.2170  0.0120

α = (1 0)′, ψ1 = 0.25, φ1 = 0.2, ψ2 = 0.25, φ2 = 0.2, λ = 0
MLE    0.0906  1.0056  0.0208    3.3390       1.8587  0.0429  1.6206      0.0650  0.0490
RRR    0.0808  1.0342  0.0058    2.9102       1.8149  0.0878  1.6038      0.0670  0.0460
FM     0.0792  1.0507  0.0326    2.9888       1.7867  0.0875  1.6885      0.0630  0.0520
OLS    0.9417  0.8527  0.0207    2.7293       2.3466  0.9436  0.4890      0.2160  0.0000

α = (1 0)′, ψ1 = 0.25, φ1 = 0.45, ψ2 = 0.25, φ2 = 0.45, λ = 0
MLE    0.0259  1.0294  0.0217    2.8997       1.6937  0.0458  1.7371      0.0540  0.0590
RRR    0.0220  1.0079  0.0116    2.9878       1.6670  0.0185  1.7099      0.0530  0.0590
FM     0.0768  1.0339  0.0238    2.8758       1.7891  0.0739  1.6406      0.0650  0.0500
OLS    0.8899  0.8357  0.0080    3.0056       2.2193  0.8928  0.4802      0.1910  0.0020

α = (1 0)′, ψ1 = 0.25, φ1 = 0.7, ψ2 = 0.25, φ2 = 0.7, λ = 0
MLE    0.0340  1.0471  0.0032    3.1390       1.7393  0.0372  1.6738      0.0630  0.0540
RRR    0.0521  1.0511  0.1534    3.3659       1.7458  0.0793  1.6762      0.0610  0.0540
FM     0.1839  1.0140  0.0089    3.2349       1.8452  0.1695  1.5007      0.0730  0.0370
OLS    0.9524  0.9199  0.1455    3.2786       2.4333  0.9934  0.5298      0.2150  0.0060

α = (1 0.5)′, ψ1 = 0.25, φ1 = 0.7, ψ2 = 0.25, φ2 = 0.7, λ = 0
MLE    0.0366  1.0451  0.0570    3.0441       1.6961  0.0280  1.7383      0.0570  0.0590
RRR    0.0048  1.0213  0.0492    3.0431       1.6389  0.0060  1.6497      0.0490  0.0520
FM     0.0077  1.0662  0.0690    3.1926       1.6852  0.0442  1.7712      0.0550  0.0610
OLS    0.8649  1.2365  0.5086    3.6378       3.0223  0.7568  0.8945      0.2330  0.0150

α = (1 0.5)′, ψ1 = 0.25, φ1 = 0.7, ψ2 = 0.25, φ2 = 0.7, λ = 0
MLE    0.0435  1.0626  0.0083    3.0819       1.7816  0.0561  1.8055      0.0660  0.0660
RRR    0.0110  1.0519  0.0424    2.7972       1.7365  0.0084  1.7046      0.0610  0.0570
FM     0.2559  1.0325  0.2624    4.6975       2.0407  0.2643  1.2801      0.0810  0.0300
OLS    1.4933  1.0235  0.7144    3.9089       3.3739  1.3836  0.0030      0.3950  0.0000

Fig. 2. Kernel density estimation.

Next, we investigate the small sample properties on the power of the t-statistics by using the local alternative hypothesis:

$$H_n: \beta_n = 1 + \frac{\delta}{n}.$$

If $\delta = 0$, then the null hypothesis holds. As the local alternative parameter $\delta$ varies, the null hypothesis is no longer valid, and the t-statistics tend to reject the null hypothesis. Table 3 shows the frequency of rejecting the null hypothesis at $\delta = 1, 2, 3, 4, 5$. At the local alternative $\delta = 3$, the test based on the MLE rejects the null hypothesis in 65% of the replications, and the test based on the RRR estimator in 66%, at the 5% size if there is no conditional heteroskedasticity. At $(\alpha_1, \alpha_2, \psi_1, \phi_1, \psi_2, \phi_2, \lambda) = (1, 0, 0.95, 0, 0.95, 0, 0)$ and $\delta = 3$, the test based on the MLE rejects in 92% of the replications, and the test based on the RRR estimator in 65%, at the 5% size. Thus, the relative power of the t-statistic based on the MLE improves as the volatility parameters increase. On the other hand, the power of the t-statistic based on the RRR or the FM estimator is invariant to conditional heteroskedasticity.
As Table 3 shows, the impact of the parameter $\psi_j$ on the power is greater than that of the parameter $\phi_j$. The moving average representation of a GARCH(1,1) process has the exponentially decaying coefficient of $\phi_j$ multiplied by the parameter $\psi_j$ at each lag of $e_{jt}$. Besides, the power of the tests based on the MLE improves when the correlation parameter $\lambda$ is different from 0.
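The moving average representation mentioned above can be checked numerically: iterating the GARCH(1,1) recursion from the starting value $\omega/(1-\phi)$ gives the same conditional variance as the expansion with weights $\psi\phi^k$ on the lagged squared innovations. The sketch below uses hypothetical function names and an arbitrary short input series chosen only for illustration.

```python
def garch_recursive(e, omega, psi, phi):
    """Conditional variances sigma2[0..len(e)] from the GARCH(1,1) recursion,
    started at omega / (1 - phi) (an assumed initialization)."""
    sig2 = [omega / (1.0 - phi)]
    for t in range(1, len(e) + 1):
        sig2.append(omega + psi * e[t - 1] ** 2 + phi * sig2[t - 1])
    return sig2


def garch_expanded(e, omega, psi, phi, t):
    """sigma2[t] written as omega / (1 - phi) + psi * sum_k phi^k * e[t-k-1]^2."""
    return omega / (1.0 - phi) + psi * sum(
        phi ** k * e[t - k - 1] ** 2 for k in range(t)
    )


innovations = [0.3, -1.2, 0.7, 2.0, -0.5]   # arbitrary illustrative values
recursive = garch_recursive(innovations, omega=1.0, psi=0.25, phi=0.7)
expanded = [garch_expanded(innovations, 1.0, 0.25, 0.7, t)
            for t in range(len(innovations) + 1)]
```

Because the weight on the $k$-th lag is $\psi\phi^k$, the parameter $\psi$ scales every lag while $\phi$ only governs how quickly the weights decay, which is consistent with the larger impact of $\psi_j$ reported in the simulations.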
Table 3
Power of the t-statistics

(α1, α2, ψ1, φ1, ψ2, φ2, λ)               δ = 1   2       3       4       5

(1, 0, 0, 0, 0, 0, 0)               MLE   0.2050  0.4700  0.6450  0.7790  0.8520
                                    RRR   0.2140  0.4920  0.6610  0.7910  0.8540
                                    FM    0.1190  0.2320  0.3340  0.4990  0.6130
(1, 0, 0.25, 0, 0.25, 0, 0)         MLE   0.2660  0.5010  0.6860  0.8040  0.8850
                                    RRR   0.2360  0.4730  0.6510  0.7830  0.8670
                                    FM    0.1190  0.2140  0.3180  0.4540  0.6040
(1, 0, 0.5, 0, 0.5, 0, 0)           MLE   0.3000  0.5830  0.7510  0.8330  0.9080
                                    RRR   0.2340  0.4740  0.6590  0.7660  0.8520
                                    FM    0.1190  0.2320  0.3690  0.4770  0.6080
(1, 0, 0.75, 0, 0.75, 0, 0)         MLE   0.3790  0.7060  0.8420  0.9240  0.9520
                                    RRR   0.2390  0.4870  0.6370  0.8000  0.8360
                                    FM    0.1280  0.2390  0.3510  0.5070  0.5900
(1, 0, 0.95, 0, 0.95, 0, 0)         MLE   0.5280  0.7810  0.9160  0.9610  0.9800
                                    RRR   0.2720  0.4970  0.6500  0.7510  0.8020
                                    FM    0.1400  0.2610  0.3790  0.4880  0.5450
(1, 0, 0.25, 0.7, 0.25, 0.7, 0)     MLE   0.3180  0.5460  0.7430  0.8360  0.8950
                                    RRR   0.2360  0.4210  0.6600  0.7580  0.8140
                                    FM    0.1210  0.2320  0.3680  0.4900  0.5800
(1, 0, 0.25, 0.7, 0.25, 0.7, 0.5)   MLE   0.3970  0.6620  0.8100  0.9050  0.9520
                                    RRR   0.3090  0.5550  0.7360  0.8370  0.9200
                                    FM    0.1430  0.3090  0.4530  0.6170  0.7130
(1, 0.5, 0.25, 0.7, 0.25, 0.7, 0)   MLE   0.6530  0.8880  0.9650  0.9910  0.9930
                                    RRR   0.5490  0.8380  0.9390  0.9860  0.9810
                                    FM    0.2760  0.5740  0.7760  0.9030  0.9420
(1, 0.5, 0.25, 0.7, 0.25, 0.7, 0)   MLE   0.2790  0.4940  0.6930  0.7980  0.8540
                                    RRR   0.1810  0.4010  0.5880  0.7010  0.7910
                                    FM    0.1310  0.2020  0.3100  0.3980  0.5310
Many empirical studies have shown that the conditional variances of financial variables reveal common persistence and volatility causality. Therefore, efficiency gains and more powerful inference on the cointegrating vector can be obtained when we use the information in conditional heteroskedasticity. The simulation evidence indicates the potential gain from the information contained in the volatility process.
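The percentage gains quoted in this section can be recomputed from the RMSE and MAE entries of Table 1. The short sketch below does this arithmetic; the helper name `pct_reduction` is hypothetical, and the inputs are the MLE and RRR entries reported for the two parameter settings discussed in the text.

```python
def pct_reduction(mle: float, rrr: float) -> float:
    """Percentage by which the MLE error measure lies below the RRR one."""
    return 100.0 * (1.0 - mle / rrr)


# Table 1, psi = 0.95 and phi = 0: RMSE and MAE of the MLE versus the RRR estimator.
rmse_gain_high_arch = pct_reduction(0.0044, 0.0173)
mae_gain_high_arch = pct_reduction(0.0028, 0.0072)
# Table 1, psi = 0.25, phi = 0.7 and lambda = 0: RMSE only.
rmse_gain_garch = pct_reduction(0.0066, 0.0093)

print(round(rmse_gain_high_arch), round(mae_gain_high_arch), round(rmse_gain_garch))
```

The recomputed reductions are close to the rounded figures of 75%, 60%, and 30% given in the discussion of Table 1.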
6. Concluding remarks
In this paper, we find that the asymptotic distribution of the MLE of the cointegrating vector depends on the conditional heteroskedasticity. This fact implies that the efficiency of the MLE improves when the data contain conditional heteroskedasticity. Although the RRR estimator and the regression-based estimators allow for conditional heteroskedasticity, they do not consider the information coming from conditional heteroskedasticity. As a result, the power of statistical inference on the cointegrating vector improves if we use the information of conditional heteroskedasticity.
The conventional methods of estimating the cointegrating vector are based on the mean equation. Because the OLS and GLS estimators are asymptotically equivalent in nonstationary cointegrated models, the volatility equation has been treated as less important. However, Amemiya (1973) has shown that the MLE improves the efficiency of estimators if the heteroskedasticity depends on the parameter of the mean equation in the linear model with stationary variables. Therefore, this paper extends Amemiya's result to the nonstationary cointegrated model with conditionally heteroskedastic errors.
As many studies have shown, financial variables have time-varying variances, and the GARCH model has been widely used to estimate volatility. There exist many other specifications which are capable of explaining conditional heteroskedasticity. Although we consider a multivariate GARCH model with constant coefficients of correlation, our main results can be extended to other heteroskedastic models.
Statistical inference on the cointegration space can also be affected by conditional heteroskedasticity. If we use information of heteroskedastic errors, the power of the cointegration test is expected to improve in the same way that the efficiency gain of the cointegrating vector estimator emerges. As this topic requires more complicated analysis, we leave it to future research.
Acknowledgments
I would like to thank Badi Baltagi, Valentina Corradi, David Drukker, Bruce Hansen,Dennis Jansen, Qi Li, Joon Park, Peter Robinson, Pentti Saikkonen, and participants atthe 2004 North America Econometric Society Meeting and workshops at Rice Universityand Texas A&M University for useful comments and suggestions. Special thanks are owedto the co-editor and two anonymous referees, who provided detailed and extensivecomments and suggestions. The author gratefully acknowledges the research support fromSoongsil University.
Appendix A. Mathematical proofs
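In the appendix, we denote $|A| = (\mathrm{tr}(A'A))^{1/2}$, $\|A\|_m = (E|A|^m)^{1/m}$, and $\Theta_\delta = \{\theta \in \Theta : |D_n(\theta - \theta_0)| \le \delta\}$. For simplicity, $\sup_t = \sup_{1 \le t \le n}$, and $\|\cdot\| = \|\cdot\|_1$. We denote $X_t(\theta) - X_t(\theta_0) = o_p(1)$ if, for all $\delta > 0$,

$$\sup_{\theta \in \Theta_\delta} |X_t(\theta) - X_t(\theta_0)| \to_p 0.$$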
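Proof of Lemma 1. By the invariance principle of Phillips and Durlauf (1986),

$$n^{-1/2} \sum_{t=1}^{[ns]} u_t \Rightarrow U(s) = BM(\Omega).$$

(1.1) Show $n^{-1/2} x_{[ns]} \Rightarrow C(1)U(s)$.
We need to show

$$P\left(\sup_{s \in [0,1]} n^{-1/2} \left| x_{[ns]} - C(1) \sum_{t=1}^{[ns]} u_t \right| > \varepsilon\right) \le P\left(\sup_{s \in [0,1]} n^{-1/2} |F(L)u_{[ns]}| > \varepsilon\right) \to 0.$$

Note that $\sup_t \|F(L)u_t\|_2 < \infty$ because

$$\|F(L)u_t\|_2 \le \sum_{j=0}^{\infty} |F_j| \|u_t\|_2 \le \sum_{j=0}^{\infty} \sum_{k=j+1}^{\infty} |C_k| \|u_t\|_2 \le \sum_{k=1}^{\infty} k|C_k| \|u_t\|_2 < \infty.$$

Thus, $\{F(L)u_t\}$ is uniformly square integrable, which implies

$$\sup_t n^{-1/2} |F(L)u_t| \to_p 0.$$

(1.2) Show $n^{-1/2} \sum_{t=1}^{[ns]} w_t \Rightarrow \beta^{*\prime} F(1) U(s)$.
We need to show

$$P\left(\sup_{s \in [0,1]} n^{-1/2} \left| \sum_{t=1}^{[ns]} w_t - \beta^{*\prime} F(1) \sum_{t=1}^{[ns]} u_t \right| > \varepsilon\right) \le P\left(\sup_{s \in [0,1]} n^{-1/2} |\beta^{*\prime} F_1(L) u_{[ns]}| > \varepsilon\right) \to 0,$$

where $F_1(L) = (F(L) - F(1))/(1 - L)$.
$\sum_{k=1}^{\infty} k^2 |C_k| < \infty$, $\sup_{\theta \in \Theta} |\theta| < \infty$, and $\|u_t\|_2 < \infty$ imply

$$\|\beta^{*\prime} F_1(L) u_t\|_2 \le \sup_{\beta^*} |\beta^*| \left\| \sum_{j=0}^{\infty} \sum_{k=j+2}^{\infty} (k - j - 1) C_k u_t \right\|_2 \le \sup_{\beta^*} |\beta^*| \sum_{k=1}^{\infty} k^2 |C_k| \|u_t\|_2 < \infty. \quad \square$$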
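The decomposition behind the proof of Lemma 1, a long-run multiplier times the partial sum of the errors plus a transitory moving-average remainder, can be verified numerically in a scalar setting. The sketch below is an illustration under simplifying assumptions: a finite-order scalar moving average, zero pre-sample errors, and hypothetical names (`bn_check`, `c`, `f`).

```python
import random


def bn_check(c, n=50, seed=1):
    """Compare direct accumulation of Delta x_t = sum_k c[k] * u_{t-k} with
    the decomposition x_t = c(1) * sum_{i<=t} u_i + sum_j f[j] * u_{t-j},
    where f[j] = -sum_{k>j} c[k] and u_s = 0 for s <= 0 (assumed)."""
    rng = random.Random(seed)
    q = len(c) - 1
    u = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(n)]  # u[0] is a placeholder

    def lagged(t, k):
        # u_{t-k}, set to zero before the sample starts.
        return u[t - k] if t - k >= 1 else 0.0

    # Direct accumulation of the first differences.
    direct, level = [], 0.0
    for t in range(1, n + 1):
        level += sum(c[k] * lagged(t, k) for k in range(q + 1))
        direct.append(level)

    # Permanent component plus transitory remainder.
    c_one = sum(c)
    f = [-sum(c[j + 1:]) for j in range(q)]
    decomposed, partial = [], 0.0
    for t in range(1, n + 1):
        partial += u[t]
        decomposed.append(c_one * partial + sum(f[j] * lagged(t, j) for j in range(q)))
    return direct, decomposed


direct_path, decomposed_path = bn_check([1.0, 0.5, -0.3, 0.2])
```

The two paths coincide up to rounding, and the remainder term stays bounded while the partial sum grows, which is why only the permanent component survives after scaling by $n^{-1/2}$.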
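Proof of Lemma 2. The error correction model (1) can be written as follows:

$$\Delta x_t = \alpha \begin{pmatrix} I_r \\ \beta \end{pmatrix}' \begin{pmatrix} x_{1t-1} \\ x_{2t-1} \end{pmatrix} + \Gamma \Delta X_{t-1} + u_t,$$

where $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_l)$ and $\Delta X_{t-1} = \mathrm{vec}(\Delta x_{t-1}, \Delta x_{t-2}, \ldots, \Delta x_{t-l})$.
Let $\Lambda_j$ be the $j$th row of the correlation matrix $\Lambda$. The orthogonalized innovation $e_{jt} = \Lambda_j u_t$ follows the GARCH(1,1) process

$$\sigma_{jt}^2 = \omega_j + \psi_j e_{jt-1}^2 + \phi_j \sigma_{jt-1}^2.$$

We use the fact that $\sup_t \sup_\theta 1/\sigma_{jt}^2(\theta) \le 1/\omega_j < \infty$ by Assumption 1(d). Also, we use $0 \le \phi_j < 1$ for $j = 1, 2, \ldots, p$.
First, we show $\sup_t \|n^{-1/2} x_t\|_m < \infty$ for $1 \le m \le 6$. We prove it for $m = 6$.
Since $x_t = C(1) \sum_{i=1}^{t} u_i + F(L)u_t$, we need to show

$$\sup_t \left\| n^{-1/2} C(1) \sum_{i=1}^{t} u_i \right\|_6 < \infty \quad \text{and} \quad \sup_t \|n^{-1/2} F(L) u_t\|_6 < \infty,$$

where $u_t = u_t(\theta_0)$, and $F(L) = (C(L) - C(1))/(1 - L)$.
By using Burkholder's inequality and Minkowski's inequality, $\|u_t\|_6 < \infty$ implies

$$\sup_t \left\| n^{-1/2} C(1) \sum_{i=1}^{t} u_i \right\|_6 \le \sup_t C_1 \left( E\left( n^{-1} \sum_{i=1}^{t} u_i^2 \right)^3 \right)^{1/6} \le C_1 \sup_t (t/n) \|u_t\|_6 \le C_1 \|u_t\|_6 < \infty,$$

$$\sup_t \|n^{-1/2} F(L) u_t\|_6 \le n^{-1/2} \sum_{k=1}^{\infty} k |C_k| \|u_t\|_6 = o_p(1),$$

where $C_1 = 108\sqrt{6/5}\,|C(1)|$.
Thus, $\sup_t \|n^{-1/2} x_t\|_m < \infty$ for $1 \le m \le 6$ by monotonicity.
Also, we can show that $\sup_t \|\Delta x_t\|_m < \infty$ and $\sup_t \|w_t\|_m < \infty$ for $1 \le m \le 6$ because

$$\sup_t \|\Delta x_t\|_m \le \sup_t \|C(L) u_t\|_m \le \sum_{k=0}^{\infty} |C_k| \|u_t\|_m < \infty$$

and

$$\sup_t \|w_t\|_m \le \sup_t \|\beta^{*\prime} F(L) u_t\|_m \le \sup_{\beta^*} |\beta^*| \sum_{k=1}^{\infty} k |C_k| \|u_t\|_m < \infty.$$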
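(2.1) $L_n(\theta)$ is stochastically equicontinuous.
(2.1.a) Show $e_{jt}^2(\theta) - e_{jt}^2 = o_p(1)$.
Because

$$u_t(\theta) = u_t - \alpha [\sqrt{n}(\beta - \beta_0)]' n^{-1/2} x_{2t-1} - (\alpha - \alpha_0) w_{t-1} - (\Gamma - \Gamma_0) \Delta X_{t-1},$$

$u_t(\theta) u_t(\theta)' - u_t u_t' = o_p(1)$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.
We use the following:

$$e_{jt}(\theta) = \Lambda_j u_t(\theta) = e_{jt} + (\Lambda_j - \Lambda_{j0}) u_t + \Lambda_j (u_t(\theta) - u_t).$$

Since

$$e_{jt}^2(\theta) - e_{jt}^2 = (\Lambda_j - \Lambda_{j0}) u_t u_t' (\Lambda_j - \Lambda_{j0})' + \Lambda_j (u_t(\theta) - u_t)(u_t(\theta) - u_t)' \Lambda_j' + 2 e_{jt} u_t' (\Lambda_j - \Lambda_{j0})' + 2 e_{jt} (u_t(\theta) - u_t)' \Lambda_j' + 2 (\Lambda_j - \Lambda_{j0}) u_t (u_t(\theta) - u_t)' \Lambda_j',$$

$$\sup_{\theta \in \Theta_\delta} |e_{jt}^2(\theta) - e_{jt}^2| \to_p 0,$$

for all $j = 1, 2, \ldots, p$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.
(2.1.b) Show $\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = o_p(1)$.
Note that

$$\sigma_{jt}^2(\theta) = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{t-1} \phi_j^k e_{jt-k-1}^2(\theta),$$

$$\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = \left[ \frac{\omega_j}{1 - \phi_j} - \frac{\omega_{j0}}{1 - \phi_{j0}} \right] + \psi_j \sum_{k=0}^{t-1} \phi_j^k \left( e_{jt-k-1}^2(\theta) - e_{jt-k-1}^2 \right) + \sum_{k=0}^{t-1} \left( \psi_j \phi_j^k - \psi_{j0} \phi_{j0}^k \right) e_{jt-k-1}^2.$$

We use the following:

$$\psi_j \phi_j^k - \psi_{j0} \phi_{j0}^k = (\psi_j - \psi_{j0}) \phi_j^k + \psi_{j0} (\phi_j^k - \phi_{j0}^k),$$

$$\phi_j^k - \phi_{j0}^k = (\phi_j - \phi_{j0}) (\phi_j^{k-1} + \phi_j^{k-2} \phi_{j0} + \cdots + \phi_{j0}^{k-1}) \le |\phi_j - \phi_{j0}|\, k\, \bar{\phi}_j^{k-1},$$

where $\bar{\phi}_j = \max\{\phi_j, \phi_{j0}\}$.
Thus, $\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = o_p(1)$ for all $j$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.
(2.1.c) Show $e_{jt}^2(\theta)/\sigma_{jt}^2(\theta) - e_{jt}^2/\sigma_{jt}^2 = o_p(1)$.
We use the following: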
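$$\frac{e_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} - \frac{e_{jt}^2}{\sigma_{jt}^2} = \frac{e_{jt}^2(\theta) - e_{jt}^2}{\sigma_{jt}^2(\theta)} - \frac{\varepsilon_{jt}^2}{\sigma_{jt}^2(\theta)} \left( \sigma_{jt}^2(\theta) - \sigma_{jt}^2 \right).$$

Since $e_{jt}^2(\theta) - e_{jt}^2 = o_p(1)$, $\sigma_{jt}^2(\theta) - \sigma_{jt}^2 = o_p(1)$, and $1/\sigma_{jt}^2(\theta) \le 1/\omega_j$, $e_{jt}^2(\theta)/\sigma_{jt}^2(\theta) - e_{jt}^2/\sigma_{jt}^2 = o_p(1)$ for all $j$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.
(2.1.d) Show $l_t(\theta) - l_t(\theta_0) = o_p(1)$, where

$$l_t(\theta) - l_t(\theta_0) = -0.5 \sum_{j=1}^{p} \left[ \left( \log \sigma_{jt}^2(\theta) - \log \sigma_{jt}^2 \right) + \left( \frac{e_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} - \frac{e_{jt}^2}{\sigma_{jt}^2} \right) \right].$$

Since $\log(\sigma_{jt}^2(\theta)/\sigma_{jt}^2) \le (\sigma_{jt}^2(\theta) - \sigma_{jt}^2)/\sigma_{jt}^2$, $\log \sigma_{jt}^2(\theta) - \log \sigma_{jt}^2 = o_p(1)$.
Therefore, $l_t(\theta) - l_t(\theta_0) = o_p(1)$ if $\|n^{-1/2} x_{2t}\|_2 < \infty$, $\|w_t\|_2 < \infty$, and $\|\Delta x_t\|_2 < \infty$.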
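Two elementary bounds carry the arguments in (2.1.b) and (2.1.d): the difference of two $k$-th powers of the GARCH coefficients is at most $k$ times the larger coefficient raised to $k-1$ times their distance, and the log of a variance ratio is at most the relative variance difference. The sketch below checks both on a grid of illustrative values; the function names and grid points are hypothetical.

```python
import math


def geometric_difference_bound(phi, phi0, k):
    """Return (|phi^k - phi0^k|, |phi - phi0| * k * max(phi, phi0)^(k-1))."""
    lhs = abs(phi ** k - phi0 ** k)
    rhs = abs(phi - phi0) * k * max(phi, phi0) ** (k - 1)
    return lhs, rhs


def log_ratio_bound(s2_theta, s2):
    """Return (log(s2_theta / s2), (s2_theta - s2) / s2)."""
    return math.log(s2_theta / s2), (s2_theta - s2) / s2


grid = [0.0, 0.1, 0.45, 0.7, 0.95]
power_checks = [geometric_difference_bound(a, b, k)
                for a in grid for b in grid for k in range(1, 30)]
log_checks = [log_ratio_bound(a, b)
              for a in (0.5, 1.0, 2.0, 10.0) for b in (0.5, 1.0, 2.0, 10.0)]
```

Because the coefficients lie in $[0, 1)$, the factor $k\bar{\phi}^{k-1}$ is summable over $k$, which is what keeps the infinite sums in (2.1.b) under control.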
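Because

$$P\left( \sup_{\theta \in \Theta_\delta} |L_n(\theta) - L_n(\theta_0)| > \varepsilon \right) \le \frac{1}{n\varepsilon} \sum_{t=1}^{n} E\left( \sup_{\theta \in \Theta_\delta} |l_t(\theta) - l_t(\theta_0)| \right) \le \frac{1}{\varepsilon} \sup_t E\left( \sup_{\theta \in \Theta_\delta} |l_t(\theta) - l_t(\theta_0)| \right),$$

Assumption 1 implies that the likelihood function $L_n(\theta)$ is stochastically equicontinuous.
(2.2) Show the consistency of $\hat{\beta}_n$.
We define $N_\delta = \{\theta \in \Theta : |\sqrt{n}(\beta - \beta_0)| \le \delta\}$ and $N_\delta^c = \{\theta \in \Theta : |\sqrt{n}(\beta - \beta_0)| > \delta\}$. We claim that, for every $\delta > 0$,

$$\lim_n P_{n,\theta_0}\left\{ L_n(\theta_0) > \sup_{\theta \in N_\delta^c} L_n(\theta) \right\} = 1.$$

To prove the claim, we use the following: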
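$$L_n(\theta_0) - L_n(\theta) = \frac{1}{2n} \sum_{j=1}^{p} \sum_{t=1}^{n} \left[ \log \sigma_{jt}^2(\theta) - \log \sigma_{jt}^2 + \frac{e_{jt}^2(\theta)}{\sigma_{jt}^2(\theta)} - \frac{e_{jt}^2}{\sigma_{jt}^2} \right]$$

$$= \frac{1}{2n} \sum_{j=1}^{p} \sum_{t=1}^{n} \left[ \log \frac{\sigma_{jt}^2(\theta)}{\sigma_{jt}^2} - \frac{\sigma_{jt}^2(\theta) - \sigma_{jt}^2}{\sigma_{jt}^2(\theta)} + \frac{e_{jt}^2(\theta) - e_{jt}^2}{\sigma_{jt}^2(\theta)} - \frac{(e_{jt}^2 - \sigma_{jt}^2)(\sigma_{jt}^2(\theta) - \sigma_{jt}^2)}{\sigma_{jt}^2 \sigma_{jt}^2(\theta)} \right]$$

$$= \frac{1}{2n} \sum_{j=1}^{p} \sum_{t=1}^{n} \left[ \frac{\sigma_{jt}^2}{\sigma_{jt}^2(\theta)} - \log \frac{\sigma_{jt}^2}{\sigma_{jt}^2(\theta)} - 1 + \frac{e_{jt}^2(\theta) - e_{jt}^2}{\sigma_{jt}^2(\theta)} - \frac{(e_{jt}^2 - \sigma_{jt}^2)(\sigma_{jt}^2(\theta) - \sigma_{jt}^2)}{\sigma_{jt}^2 \sigma_{jt}^2(\theta)} \right].$$

Note that

$$e_{jt}^2(\theta) = \Lambda_j u_t(\theta) u_t(\theta)' \Lambda_j',$$

$$\sigma_{jt}^2(\theta) = \frac{\omega_j}{1 - \phi_j} + \psi_j \sum_{k=0}^{t-1} \phi_j^k e_{jt-k-1}^2(\theta),$$

where $u_t(\theta) = u_t - \alpha(\beta - \beta_0)' x_{2t-1} - (\alpha - \alpha_0) w_{t-1} - (\Gamma - \Gamma_0) \Delta X_{t-1}$.
We use the fact that $x_{2t} = O_p(\sqrt{n})$, $\sup_t n^{-1/2} |\Delta x_t| \to_p 0$, and $\sup_t n^{-1/2} |w_t| \to_p 0$ since $\Delta x_t$ and $w_t$ are uniformly square integrable. Thus, $u_t(\theta) = O_p(\sqrt{n})$, $e_{jt}^2(\theta) = O_p(n)$, and $1/\sigma_{jt}^2(\theta) = O_p(n^{-1})$ if $\theta \in N_\delta^c$.
First, $\sigma_{jt}^2/\sigma_{jt}^2(\theta) - \log(\sigma_{jt}^2/\sigma_{jt}^2(\theta)) - 1 \ge 0$ for all $\sigma_{jt}^2/\sigma_{jt}^2(\theta) > 0$. Thus, $\lim_n P_{n,\theta_0}\{K_{1jn}(\theta) \ge 0\} = 1$ for all $j$ and $\theta \in \Theta$, where $K_{1jn}(\theta) = (1/n) \sum_{t=1}^{n} [\sigma_{jt}^2/\sigma_{jt}^2(\theta) - \log(\sigma_{jt}^2/\sigma_{jt}^2(\theta)) - 1]$. If $\theta \in N_\delta^c$, $\sigma_{jt}^2/\sigma_{jt}^2(\theta) \to 0$ as $n \to \infty$, which implies $\lim_n P_{n,\theta_0}\{K_{1jn}(\theta) > 0\} = 1$.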
Second, show $\lim_n P_{n,\theta_0}\{K_{2jn}(\theta) \ge 0\} = 1$ for all $j$ and $\theta \in N_\delta^c$, where $K_{2jn}(\theta) = (1/n) \sum_{t=1}^{n} (e_{jt}^2(\theta) - e_{jt}^2)/\sigma_{jt}^2(\theta)$.
We use the following:
n1Xnt1
e2jty n1Xnt1
e2jt Ljn1Xnt1
uty ututy ut0L0j
Lj Lj0n1Xnt1
utu0tLj Lj00
2n1Xnt1
ejtu0tLj Lj00 2Ljn1
Xnt1
ututy ut0L0j.
Since n1Pn
t1 x2t1u0t Op1, n1
Pnt1 wt1u
0t op1, and n1
Pnt1 DXt1u
0t op1,
n1Xnt1
e2jty e2jt Ljn1Xnt1
uty ututy ut0L0j opn.
As Lemma 3 shows, n1Pn
t1 x2t1X0t1 Op1 and n1
Pnt1Xt1X
0t1 Op1, where
Xt1 w0t1;DX 0t10.
n1Xnt1
uty ututy ut0 ab b00n1Xnt1
x2t1x02t1b b0a0 opn.
Because x2t1x02t1=s2jty Op1 for all y 2 Nd,
n1Xnt1
e2jty e2jts2jty
Ljab b00n1Xnt1
x2t1x02t1s2jty
b b0a0L0j op1
!p Ljab b00M22yb b0a0L0j,
where M22y plim n1Pn
t1 x2t1x02t1=s
2jty.
Thus, limn Pn;y0fK2jnyX0g 1 and for all j and y 2 Nd. If aa0 and M22y40 for ally 2 Nd, then limn Pn;y0fK2jny40g 1.Third, K3jny 1=n
Pnt1 e2jt s2jts2jty s2jt=s2jts2jty 1=n
Pnt1 2jt 11 s2jt=
s2jty!p0 for all j and y 2 Y because E2jt 11 s2jt=s2jtyjFt1 0 and k2jt 1
1 s2jt=s2jtykm=2pkjtk2m 11 kejtk2m=ojo1 for some m42.Thus, limn Pn;y0fLny04supy2NdLnyg 1. The claim implies that if Lny^nXLny0,
it must be that y^n 2 Nd.Therefore,
limn
Pn;y0fy^n 2 NdgX limn
Pn;y0fLny^nXLny0g 1.
Next, we show that $H_n(\theta)$ is stochastically equicontinuous.
(2.3) Show $H_{n\beta\beta}(\theta) - H_{n\beta\beta}(\theta_0) = o_p(1)$, where $H_{n\beta\beta}(\theta) = n^{-1} \sum_{t=1}^{n} n^{-1}\, \partial^2 l_t(\theta)/\partial\beta\, \partial\beta'$
and
q2ltyqb qb
0
Xpj1
A0jAj x2t1x02t1s2jty
2 hjLx2t2ejt1yhjLx02t2ejt1y
s4jty1 2Zjty
"
2 hjLx2t2ejt1yx02t1ejty
s4jty 2 x2t1ejtyhjLx
02t2ejt1y
s4jty
hjLx2t2x02t2Zjty
s2jty
#.
(2.3.a) Show A0jAj n1x2t1x02t1=s2jty A0j0Aj0 n1x2t1x02t1=s2jt op1.Because
A0jAj n1x2t1x02t1
s2jty
! A0j0Aj0
n1x2t1x02t1s2jt
!
A0jAj n1x2t1x02t1
s2jts2jty
s2jty s2jt !" #
A0jAj A0j0Aj0 n1x2t1x02t1
s2jt
" #
and A0jAj A0j0Aj0 Aj Aj00Aj A0j0Aj Aj0, we get A0jAj n1x2t1x02t1=s2jty A0j0Aj0 n1x2t1x02t1=s2jt op1 if kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1.(2.3.b) Show n1hjLx2t2ejt1yhjLx02t2ejt1y=s4jty n1hj0Lx2t2ejt1
hj0Lx02t2ejt1=s4jt op1, where
hjLx2t2e1t1yhjLx02t2ejt1y c2jXt1k1
f2kj x2tk1x02tk1e
2jtky
c2jXkal
fkj fljx2tk1x
02tl1ejtkyejtly.
If kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1, thenx2tk1x02tk1e
2jtky
s4jtyx2tk1x02tk1e
2jtk
s41j op1
because
x2tk1x02tk1e2jtky
s4jty
x2tk1x02tk1e2jtk
s4jtx2tk1x02tk1e2jtky e2jtk
s4jty
x2tk1x02tk1e
2jtk
s2s2 y1
s2 y 1
s2
!s2jty s2jt.
jt jt jt jt
As a result, if $\|n^{-1/2} x_{2t}\|_6 < \infty$, $\|w_t\|_6 < \infty$, and $\|\Delta x_t\|_6 < \infty$, then
n1hjLx2t2ejt1yhjLx02t2ejt1y
s4jty
n1 hj0Lx2t2ejt1hj0Lx02t2ejt1
s4jt op1.
(2.3.c) Show
n1hjLx2t2ejt1yhjLx02t2ejt1y
s4jtyZjty
n1 hj0Lx2t2ejt1hj0Lx02t2ejt1
s4jtZjt op1,
where Zjt e2jt=s2jt 1.Because
x2tk1x02tk1e2jtkye2jty
s6jty
x2tk1x02tk1e
2jtke
2jt
s6jt x2tk1x
02tk1e2jtkye2jty e2jtke2jt
s6jty
x2tk1x02tk1e
2jtk
2jt
s2jty1
s4jty 1s2jty
1
s2jt 1s4jt
!s2jty s2jt,
we can get the desired results if kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1.(2.3.d) Show n1A0jAj hjLx2t2ejt1yx02t1ejty=s4jty n1A0j0Aj0 hj0 Lx2t2
ejt1x02t1ejt=s4jt op1.If kn1=2x2tk5o1, kwtk5o1, and kDxtk5o1, then
n1x2tk1ejtkyx02t1ejty
s4jty n1 x2tk1ejtkx
02t1ejt
s4jt op1
because
x2tk1ejtkyx02t1ejtys4jty
x2tk1ejtkx02t1ejt
s4jt
x2tk1x02t1ejtkyejty ejtkejt
s4t
x2tk1x02t1ejtkjt
sjts2jty1
s2jty 1s2jt
!s2jty s2jt
for all kX1.(2.3.e) In the same way, kn1=2x2tk4o1 and kwtk4o1 imply that
n1hjLx2t2x02t2Zjty
2 n1 hj0Lx2t2x
02t2Zjt
2 op1.
sjty sjt
Therefore, Assumption 1 and kutk6o1 imply that n1q2lty=qbqb0 n1q2lty0=qbqb
0 op1.(2.4) Show Hnaay Hnaay0 op1, where a veca, Hnaay n1
Pnt1 q
2lty=qa qa0 and
q2ltyqa qa0
Xpj1
L0jLj wt1bw0t1b
s2jty 2 hjLwt2bejt1yhjLw
0t2bejt1y
s4jty
"
1 2Zjty 2hjLwt2bejt1yw0t1bejty
s4jty
2 wt1bejtyhjLw0t2bejt1y
s4jty hjLwt2bw
0t2bZjty
s2jty
#,
where hjLwt2bejt1y cjPt1
k0 fkj wtk2bejtk1y.
Note that wtbw0tb wtw0t op1 if kn1=2x2tk2o1 and kwtk2o1 becausewtbw0tb wtw0t
n
p b b00n1x2tx02tn
p b b0 np b b00n1=2x2tw0t wtn1=2x02t np b b0.
(2.4.a) Show wt1bw0t1b=s21ty wt1w0t1=s2jt op1.If kn1=2x2tk4o1 and kwtk4o1, then wt1bw0tb=s21ty wt1w0t1=s2jt op1
because
wt1bw0tbs21ty
wt1w0t1
s2jt wt1bw
0tb wt1w0t1s2jty
wt1w0t1
s2jts2jty
s2jty s2jt.
In the same way as (2.3.b)(2.3.e), we can show that Assumption 1 and kutk6o1 implyq2ltyqa qa0
q2lty0qa qa0
op1.
(2.5) Show Hnggy Hnggy0 op1, where Hnggy n1Pn
t1 q2lty=qg qg0.
Because q2lty=qgi qgj 0 for iaj and i; j 1; 2; . . . ; p, we consider the following:q2ltyqo2j
121 fj2s4jty
1 2Zjty
q2ltyqc2j
Pt1k0 fkj e2jtk1y2
s4jty1 2Zjty
q2ltyqf2j
oj=1 fj2 cj
Pt1k1 kf
k1j e
2jtk1y2
2s4jty1 2Zjty
oj=1 fj3
s2jtyZjty.
The proof for stochastic equicontinuity in the GARCH model has been provided in Lee and Hansen (1994) and Lumsdaine (1996), where the mean equation does not contain regressors. However, in the same way as (2.3), we can show that
q2ltyqg qg0
q2lty0qg qg0
op1
if kn1=2x2tk6o1, kwtk6o1, and kDxtk6o1.(2.6) Show Hnlly Hnlly0 op1, where Hnlly n1
Pnt1 q
2lty=ql ql0.Because qeity=qlji uity1 j4i,
q2ltyql2ji
u2itys2jty
2 c2j
Pt1k0 f
kj uitk1yejtk1ys4jty
1 2Zjty
4 cjuityejtyPt1
k0 fkj uitk1yejtk1y
s4jty
cjP1k0fkj u2itk1y
s2jtyZjty.
(2.6.a) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then u2ity=s2jty u2it=s2jt op1because
u2itys2jty
u2it
s2jt u
2ity u2its2jty
u2it
s2jts2jty
s2jty s2jt.
(2.6.b) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then
c2jPt1
k0 fkj uitk1yejtk1ys4jty
c2j0
Pt1k0 f
kj0uitk1ejtk1s4jt
op1,
because
uitk1yejtk1ys4jty
uitk1ejtk1s4jt
uitk1yejtk1y uitk1ejtk1s4jty
uitk1ejtk1s2jts
2jty
1
s2jty 1s2jt
!s2jty s2jt.
(2.6.c) If kn1=2x2tk4o1, kwtk4o1, and kDxtk4o1, then
c2jPt1
k0 fkj uitk1yejtk1ys4jty
Zjty c2j0Pt1
k0 fkj0uitk1ejtk1s4jt
Zjt op1.
The cross derivatives entail similar variables, and the same method can be applied to theproof. Therefore, if Assumption 1 holds and kutk6o1, then
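$$H_n(\theta) - H_n(\theta_0) = o_p(1). \quad \square$$

Proof of Lemma 3. Show $n^{-3/2} \sum_{t=1}^{n} \partial^2 l_t(\theta_0)/\partial\beta\, \partial\theta_2' = o_p(1)$.
(3.1) We show $n^{-3/2} \sum_{t=1}^{n} \partial^2 l_t(\theta_0)/\partial\beta\, \partial\mathrm{vec}(\alpha')' = o_p(1)$, where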
q2lty0qb qveca00
Xmj1
A0jLj x2t1w0t1
s2jt 2 hjLx2t2ejt1hjLw
0t2ejt1
s4jt1 2Zjt
"
2 hjLx2t2ejt1w0t1ejt
s4jt 2 x2t1ejthjLw
0t2ejt1
s4jt
hjLx2t2w0t2Zjt
s2jt
#,
where hjLx2t2ejt1 cjPt1
k0fkj x2tk2ejtk1.
First, we show that n1Pn
t1 x2t1w0t1 Op1.
n1Xnt1
xtw0t n1
Xnt1
xt1 Dxtw0t )Z 10
C1U dU 0F01bn K1,
where K1 EDx0w00 Ev0w01, and vt FLut.(3.1.a) Show n3=2
Pnt1 x2t1w
0t1=s
2jt op1.
Note that supt kwt1=s2jtkmo1 for some m42 because
kwt1=s2jtkm kbn0 FLut1=s2jtkmp1=oj sup
bnjbnj
X1k1
kjCkjkutkmo1.
Because fwt1=s2jtg is strong mixing from Assumption 1(f), we can appeal to Theorem 3.1of Hansen (1992) to show that n1
Pnt1 x2t1w
0t1=s
2jt Op1. Thus, n3=2
Pnt1 x2t1w
0t1=
s2jt op1.(3.1.b) Show n3=2
Pnt1 hjLx2t2ejt1hjLw0t2ejt1=s4jt op1, where
hjLx2t2ejt1hjLw0t2ejt1 Xt1k1
h2j;kx2tk1w0tk1e
2jtk
Xkal
hj;khj;lx2tk1w0tl1ejtkejtl .
First, we show that
n3=2Xnt1
Xt1k1
h2j;kx2tk1w0tk1e
2jtk=s
4jt op1.
Lemma 4 of Lee and Hansen (1994) has shown that, for all j and kX1,
E fkjs2jtks2jt
Ftk1 !
p1 a.s.
Thus, we can show that, for some m42,
$$\sup_t \|\phi_j^k w_{t-k-1} e_{jt-k}^2/\sigma_{jt}^4\|_m \le (1/\omega_j) \|w_{t-k-1}\|_m \|\varepsilon_{jt-k}\|_{2m}^2 < \infty.$$

Because $\{w_{t-k-1} e_{jt-k}^2/\sigma_{jt}^4\}$ is strong mixing, we can show that
n1Xnt1
hj;kx2tk1w0tk1e2jtk=s
4jt Op1
for all j and kX1.Because hj;k cjfkj decays exponentially, the nite-dimensional convergence implies
n1Xnt1
Xt1k0
h2j;kx2tk1w0tk1e
2jtk=s
4jt Op1.
Second, we show that
n3=2Xnt1
Xkal
hj;khj;lx2tk1w0tl1ejtlejtk=s4jt op1,
for all j.Without loss of generality, we set kol.
n3=2Xnt1
Xkal
hj;khj;lx2tk1w0tl1ejtlejtk=s4jt
n3=2Xnt1
Xkal
hj;khj;lx2tl1 Dx2tl Dx2tk1w0tl1ejtlejtk=s4jt.
Because kfk=2j fl=2j wtl1ejtlejtk=s4jtkmp1=ojkwtl1kmkjtlkmkjtkkmo1 for somem42,
n1Xnt1
Xkal
hj;khj;lx2tl1w0tl1ejtlejtk=s4jt Op1.
Also,
n1Xnt1
Xkal
hj;khj;lDx2tl Dx2tk1w0tl1ejtlejtk=s4jt Op1.
(3.1.c) The other parts entail fwtk2ejtk1ejtl1Zjt=s4jtg, fejtk1wt1ejt=s4jtg, fwtk2Zjt=s2jtg, and fwtk2ejtk1ejt=s4jtg. These processes are Martingale difference sequences, andthus we can apply Theorem 2.1 of Hansen (1992) to get the desired results.(3.2) Show n1
Pnt1 q
2lty0=qb qvecG00 Op1, where
q2lty0qb qvecG00
Xmj1
A0jLj x2t1DX 0t1
s2jt 2 hjLx2t2ejt1hjLDX
0t2ejt1
s4jt
"
1 2Zjt 2hjLx2t2ejt1DX 0t1ejt
s4jt
2 x2t1ejthjLDX0t2ejt1
s4 hjLx2t2DX
0t2Zjt
s2
#.
jt jt
First, we show that $n^{-1} \sum_{t=1}^{n} x_t \Delta x_{t-i}' = O_p(1)$ for $i = 0, 1, 2, \ldots, l - 1$.
n1Xnt1
xtDx0ti n1Xnt1
xti1 Dxti DxtDx0ti
)Z 10
C1U dU 0C10 K2i,
where K2i EDx0Dx00 DxiDx00 Ev0Dx01 for i 0; 1; 2; . . . ; l 1.(3.2.a)(3.2.b) Note that supt kDxti=s2jtkmo1 for some m42 and i 1; 2; . . . ; l because
kDxti=s2jtkm kCLuti=s2jtkmp1=ojX1k0
jCkjkutkmo1.
Also,
suptkfkj Dxtk1e2jtk=s4jtkmp1=ojkDxtk1kmkjtkk22mo1.
Thus, we get the following in the same way as (3.1.a)(3.1.b):
n3=2Xnt1
x2t1DX 0t1s2jt
op1,
n3=2Xnt1
hjLx2t2ejt1hjLDX 0t2ejt1=s4jt op1.
(3.2.c) The other parts entail
DXtk2ejtk1ejtl1Zjts4jt
;ejtk1DXt1ejt
s4jt;DXtk2Zjt
s2jt;DXtk2ejtk1ejt
s4jt
( ).
These processes are Martingale difference sequences, and thus we can get the desiredresults.(3.3) Let gj oj ;cj ;fj0 for j 1; 2; . . . ;m.Show n1
Pnt1 q
2lty0=qb qg0j Op1, where
q2lty0qb qg0j
Xmj1
A0j x2t1ejtqs2jt=qg0j
s4jt hjLx2t2ejt1qs
2jt=qg
0j
s4jt1 2Zjt
" #
and
qs2jtqgj
1
1 fjPt1k0 f
kj e
2jtk1
oj1 fj2
cjPt1
k1 kfk1j e
2jtk
0BBBBBBB@
1CCCCCCCA.
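Lemma 4 of Lee and Hansen (1994) and Lemma 3 of Lumsdaine (1996) have shown that $\|(\partial\sigma_{jt}^2/\partial\gamma_j)(1/\sigma_{jt}^2)\|_m < \infty$ for some $1 \le m \le 6$. Furthermore, it can be shown that $(\partial\sigma_{jt}^2/\partial\gamma_j)(1/\sigma_{jt}^2) < \infty$ a.s. for $\gamma_j = \omega_j, \psi_j$. Thus, we show the proof for $\gamma_j = \phi_j$.
(3.3.a) Show $n^{-1} \sum_{t=1}^{n} x_{2t-1} (\partial\sigma_{jt}^2/\partial\phi_j) e_{jt}/\sigma_{jt}^4 = O_p(1)$, where $\partial\sigma_{jt}^2/\partial\phi_j = \omega_j/(1 - \phi_j)^2 + \psi_j \sum_{k=1}^{t-1} k \phi_j^{k-1} e_{jt-k}^2$.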
Note that qs2jt=qfjejt=s4jt is an MDS. Because kqs2jt=qfjejt=s4jtk2p1=o1=2j kqs2jt=qfj1=s2jtk2o1, we can show that
n1Xnt1
x2t1qs2jt=qfjejts4jt
Op1.
(3.3.b) Show n3=2Pn
t1 hjLx2t2ejt1Pt1
k1 kfk1j e
2jtk=s4jt op1, where
hjLx2t2ejt1Xt1k1
kfk1j e2jtk
! cj
Xt1k1
kf2kj x2tk1e3jtk
cj=fjXkal
lfkj fljx2tk1ejtke
2jtl .
First, we note that vjt;k f3k=2j e3jtk=s4jt is uniformly square integrable because
supt
Ev2jt;kfjvjt;kjXcgp1
ojsupt
E6jtkfj3jtkjXcojg ! 0
as c!1 if ktk6o1.We apply Theorem 3.1 of Hansen (1992) and get the desired result because kfk=2j decays
exponentially.Second, we show that
n3=2Xnt1
Xkal
lfkj fljx2tk1ejtke
2jtl=s
4jt op1.
In the same way as (3.1.b), kfk=2j fljejtke2jtl=s4jtkmo1 for some m42 and for all kal.Because lfk=2j decays exponentially, we can get the desired result.(3.3.c) The other part entails ejtkqs2jt=qfjZjt=s4jt. As the process is an MDS, we can
show the proof in the same way as (3.3.a).(3.4) Show n3=2
Pnt1 q
2lty0=qb ql0 op1, whereq2lty0qb qlji
A0j x2t1uits2jt
2 hjLx2t2ejt1hjLuit1ejt1s4jt
1 2Zjt"
4 hjLx2t2ejt1uitejts4jt
hjLx2t2uit1Zjts2jt
#
for ioj.We use the following:
qejtqlji
uit if ioj
0 otherwise.(3.4.a) Because uit=s2jt is an MDS and kuit=s2jtk2p1=ojkuitk2o1, we can get
$$n^{-1} \sum_{t=1}^{n} \frac{x_{t-1} u_{it}}{\sigma_{jt}^2} = O_p(1).$$

(3.4.b) Show $n^{-3/2} \sum_{t=1}^{n} h_j(L) x_{2t-2} e_{jt-1}\, h_j(L) u_{it-1} e_{jt-1}/\sigma_{jt}^4 = o_p(1)$, where
hjLx2t2ejt1hjLuit1ejt1 Xt1k1
h2j;kx2tk1uitke2jtk
Xkal
hj;khj;lx2tk1ejtkuitlejtl .
First, because kfkj uitke2jtk=s4jtkmo1 for some m42 and fkj decays exponentially,
n3=2Xnt1
Xt1k1
h2j;kx2tk1uitke2jtk=s
4jt op1.
Next, we can show that n3=2Pn
t1P
kal hj;khj;lx2tk1ejtkuitlejtl=s4jt op1 in the
same as (3.3.b).Therefore, n3=2
Pnt1 q
2lty0=qb qy02 op1.Lemma 2implies that y^n 2 Yd, and hence y 2 Yd, where y 2 y^n; y0.Therefore, by appealing to Proposition 3.2 of Saikkonen (1993), Hny
Hny0 op1, andDny^n y0 Hny1Gny0
Hny01Gny0 op1,where y 2 y^n; y0. Furthermore, block-diagonality implies that
n ^bn b0 hny01gny0 op1,which completes the proof. &
Proof of Lemma 4. (4.1) Show n1Pn
t1 qlty0=qb)R 10W 2sdW 01s, where
qlty0qb
Xmj1
A0j x2t1ejts2jt
hjLx2t2ejt1Zjts2jt
" #.
We denote zjt ejt=s2jt and qjt;k ejtkZjt=s2jt. Note that fzjt; qjt;kg is strictly stationaryand ergodic, and an MDS.Since kzjtk2p1=ojkejtk2o1, we can apply Kurtz and Protter (1991) and Hansen (1992)
to get
n1Xnt1
x2t1zjt )Z 10
W 2 dZj.
In the same way, kqjt;kk2p1=ojkejtk2kjtk24 1o1 for all kX1, and hence
$$n^{-1}\sum_{t=1}^{n} x_{2t-1}q_{jt,k} \Rightarrow \int_0^1 W_2\,dQ_{j,k}.$$
We denote $F_{jn,k} = n^{-1}\sum_{t=1}^{n} x_{2t-1}q_{jt,k}$ and $F_{j,k} = \int_0^1 W_2\,dQ_{j,k}$. Now, we want to show $F_{jn} \Rightarrow F_j$, where $F_{jn} = (F_{jn,1}, F_{jn,2}, \ldots)$ and $F_j = (F_{j,1}, F_{j,2}, \ldots)$. We define a metric $d_\infty(f) = \sum_{k=1}^{\infty} k^{-r}|f_k|$, where $r > 2$ and $f \in R^\infty$. Then, the finite-dimensional convergence and the tightness of the probability measure $F_{jn}$ in $R^\infty$, with respect to the metric $d_\infty(f)$, imply the weak convergence $F_{jn} \Rightarrow F_j$. The detailed proof is given in Hansen (1995, pp. 1127–1128).
Because $\sum_{k=1}^{\infty} h_{j,k}f_{j,k}$ is $d_\infty$-continuous, we have the following result by using the CMT:
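$$n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty} h_{j,k}\,x_{2t-k-1}q_{jt,k} = \sum_{k=1}^{\infty} h_{j,k}\,n^{-1}\sum_{t=1}^{n} x_{2t-1}q_{jt,k} + o_p(1)$$
$$\Rightarrow \sum_{k=1}^{\infty} h_{j,k}\int_0^1 W_2\,dQ_{j,k} = \int_0^1 W_2 \sum_{k=1}^{\infty} h_{j,k}\,dQ_{j,k} = \int_0^1 W_2\,dQ_j.$$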
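Also, show that $Q_{jn}(s) \Rightarrow Q_j(s)$, where $Q_{jn}(s) = n^{-1/2}\sum_{t=1}^{[ns]}\sum_{k=1}^{t-1} h_{j,k}q_{jt,k}$ and $Q_j(s) = \sum_{k=1}^{\infty} h_{j,k}Q_{j,k}$.
Because $\sum_{k=t}^{\infty} h_{j,k}\|q_{jt,k}\| = O(\phi_j^t)$ and $\sum_{t=1}^{[ns]}\sum_{k=t}^{\infty} h_{j,k}\|q_{jt,k}\| < \infty$ for all $s \in [0, 1]$,
$$P\left(\sup_{s\in[0,1]} \left| n^{-1/2}\left(\sum_{t=1}^{[ns]}\sum_{k=1}^{\infty} h_{j,k}q_{jt,k} - \sum_{t=1}^{[ns]}\sum_{k=1}^{t-1} h_{j,k}q_{jt,k}\right)\right| > \delta\right)$$
$$= P\left(\sup_{s\in[0,1]} \left| n^{-1/2}\sum_{t=1}^{[ns]}\sum_{k=t}^{\infty} h_{j,k}q_{jt,k}\right| > \delta\right)$$
$$\le P\left(n^{-1/2}\sum_{t=1}^{n}\sum_{k=t}^{\infty} h_{j,k}|q_{jt,k}| > \delta\right)$$
$$\le \frac{n^{-1/2}\sum_{t=1}^{n}\sum_{k=t}^{\infty} h_{j,k}\|q_{jt,k}\|}{\delta} \to 0.$$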
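A brief sketch of why the last bound vanishes may help here. It assumes, as the $O(\phi_j^t)$ rate above suggests, that $\sum_{k=t}^{\infty} h_{j,k}\|q_{jt,k}\| \le C\phi_j^t$ for a constant $C$ (a hypothetical symbol, not from the original) and $0 < \phi_j < 1$:

```latex
n^{-1/2}\sum_{t=1}^{n}\sum_{k=t}^{\infty} h_{j,k}\,\|q_{jt,k}\|
  \;\le\; n^{-1/2}\, C \sum_{t=1}^{\infty} \phi_j^{t}
  \;=\; n^{-1/2}\,\frac{C\,\phi_j}{1-\phi_j}
  \;\longrightarrow\; 0 \quad (n \to \infty).
```

The geometric series is summable, so the double sum stays bounded in $n$ and the $n^{-1/2}$ factor drives the whole expression to zero.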
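Thus, $Q_{jn}(s) \Rightarrow Q_j(s)$. Let $x_{2t-k-1} = x_{2t-1} - \sum_{i=1}^{k} \Delta x_{2t-i}$ for $k \ge 1$. Since $\left\|\sum_{k=1}^{t-1} h_{j,k}\sum_{i=1}^{k} \Delta x_{2t-i}\,q_{jt,k}\right\|_{m/2} \le \sum_{k=1}^{t-1} h_{j,k}\sum_{i=1}^{k} \|\Delta x_{2t-i}\|_m \|q_{jt,k}\|_m < \infty$ for some $m > 2$,
$$n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}\,x_{2t-k-1}q_{jt,k} = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}\,x_{2t-1}q_{jt,k} - n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}\left(\sum_{i=1}^{k} \Delta x_{2t-i}\right)q_{jt,k}$$
$$= n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}\,x_{2t-1}q_{jt,k} + o_p(1).$$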
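Thus,
$$n^{-1}\sum_{t=1}^{n} \frac{h_j(L)x_{2t-2}\varepsilon_{jt-1}\,\eta_{jt}}{\sigma_{jt}^2} = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}\,x_{2t-k-1}q_{jt,k} = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}\,x_{2t-1}q_{jt,k} + o_p(1) \Rightarrow \int_0^1 W_2\,dQ_j.$$
Therefore, we have $n^{-1}\sum_{t=1}^{n} \partial l_t(\theta_0)/\partial\beta \Rightarrow \int_0^1 W_2\,dW_1'$, where $W_1(s) = A'(Z(s) - Q(s))$.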
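(4.2) Show $-n^{-2}\sum_{t=1}^{n} \partial^2 l_t(\theta_0)/\partial\beta\,\partial\beta' \Rightarrow \nu \otimes \int_0^1 W_2(s)W_2'(s)\,ds$, where
$$-\frac{\partial^2 l_t(\theta_0)}{\partial\beta\,\partial\beta'} = \sum_{j=1}^{m} A_j'A_j \otimes \Bigg[\frac{x_{2t-1}x_{2t-1}'}{\sigma_{jt}^2} + 2\,\frac{h_j(L)x_{2t-2}\varepsilon_{jt-1}\,h_j(L)x_{2t-2}'\varepsilon_{jt-1}}{\sigma_{jt}^4}\,(1 + 2\eta_{jt})$$
$$\qquad - 2\,x_{2t-1}\varepsilon_{jt}\,h_j(L)x_{2t-2}'\varepsilon_{jt-1}/\sigma_{jt}^4 - 2\,h_j(L)x_{2t-2}\varepsilon_{jt-1}\,x_{2t-1}'\varepsilon_{jt}/\sigma_{jt}^4 - h_j(L)x_{2t-2}x_{2t-2}'\,\eta_{jt}/\sigma_{jt}^2\Bigg].$$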
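(4.2.a) Show $n^{-2}\sum_{t=1}^{n} x_{2t-1}x_{2t-1}'/\sigma_{jt}^2 \Rightarrow (1/B_j^2)\int_0^1 W_2W_2'$.
First, we show that $n^{-3/2}\sum_{t=1}^{n} x_{2t-1}x_{2t-1}'\left(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\right) = O_p(1)$.
Assumption 1(a) and 1(b) imply that $\{\sigma_{jt}^2\}$ is $\beta$-mixing with exponential decay, and so is $\{1/\sigma_{jt}^2\}$. Because $\|1/\sigma_{jt}^2\|_m \le 1/\omega_j < \infty$ for some $m > 2$, we can show that $n^{-3/2}\sum_{t=1}^{n} x_{2t-1}x_{2t-1}'\left(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\right) = O_p(1)$. Thus,
$$n^{-2}\sum_{t=1}^{n} x_{2t-1}x_{2t-1}'/\sigma_{jt}^2 \Rightarrow E(1/\sigma_{jt}^2)\int_0^1 W_2W_2' = (1/B_j^2)\int_0^1 W_2W_2'.$$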
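The limit in (4.2.a) can be illustrated numerically. The following is a minimal sketch, not the paper's procedure: it assumes a univariate GARCH(1,1) error with hypothetical parameter values (`omega`, `alpha`, `phi` are illustrative names, and `alpha` here is the ARCH coefficient, not the adjustment matrix), and it takes the integrated regressor to be the cumulative sum of those errors. It compares $n^{-2}\sum x_{t-1}^2/\sigma_t^2$ with the sample analogue of $E(1/\sigma_t^2)\,n^{-2}\sum x_{t-1}^2$.

```python
import numpy as np

def simulate_garch(n, omega=0.1, alpha=0.1, phi=0.8, seed=0):
    """Simulate eps_t = sigma_t * z_t with z_t ~ N(0, 1) and GARCH(1,1) variance."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    sigma2 = np.empty(n)
    eps = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - phi)  # start at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + phi * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

def weighted_vs_unweighted(n=20000, seed=0):
    """Return (n^-2 sum x_{t-1}^2 / sigma_t^2, mean(1/sigma_t^2) * n^-2 sum x_{t-1}^2)."""
    eps, sigma2 = simulate_garch(n, seed=seed)
    x = np.cumsum(eps)                       # integrated (unit-root) regressor
    x_lag = np.concatenate(([0.0], x[:-1]))  # x_{t-1}, with x_0 = 0
    lhs = np.sum(x_lag ** 2 / sigma2) / n ** 2
    rhs = np.mean(1.0 / sigma2) * np.sum(x_lag ** 2) / n ** 2
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = weighted_vs_unweighted()
    print("weighted sum:", lhs, "E(1/sigma^2) * unweighted sum:", rhs)
```

For a long sample the two quantities should be close, which is the content of the $O_p(1)$ bound on the centred sum: the deviation of $1/\sigma_{jt}^2$ from its mean averages out against the slowly varying weights $x_{2t-1}x_{2t-1}'$.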
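(4.2.b) Show $n^{-2}\sum_{t=1}^{n} h_j(L)x_{2t-2}\varepsilon_{jt-1}\,h_j(L)x_{2t-2}'\varepsilon_{jt-1}/\sigma_{jt}^4 \Rightarrow \Xi_j\int_0^1 W_2W_2'$, where
$$h_j(L)x_{2t-2}\varepsilon_{jt-1}\,h_j(L)x_{2t-2}'\varepsilon_{jt-1} = \sum_{k=1}^{t-1} h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'\varepsilon_{jt-k}^2 + \sum_{k\ne l} h_{j,k}h_{j,l}\,x_{2t-k-1}x_{2t-l-1}'\varepsilon_{jt-k}\varepsilon_{jt-l}.$$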
First, we show that
n3=2Xnt1
hj;kx2tk1x02tk1sjt;k Op1,
where sjt;k hj;ke2jtk=s4jt Ee2jtk=s4jt for all k.Note that sjt;k is b-mixing with exponential decay and ksjt;kkmp2=ojkjtk22mo1 for some
m42 and for all kX1. Thus, we get n3=2Pt1
k1Pn
t1 h2j;kx2tk1x
02tk1sjt;k Op1 by
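appealing to Theorem 3.1 of Hansen (1992).
Also, we get
$$n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) = n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}^2\,x_{2t-1}x_{2t-1}'\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) + o_p(1)$$
since $n^{-1}\sum_{t=1}^{n} h_{j,k}\,x_{2t-1}\sum_{i=1}^{k} \Delta x_{2t-i}' = O_p(1)$ and $n^{-1}\sum_{t=1}^{n} h_{j,k}\sum_{i=1}^{k} \Delta x_{2t-i}\sum_{i=1}^{k} \Delta x_{2t-i}' = O_p(1)$.
Because $n^{-1}x_{2t-1}x_{2t-1}' = O_p(1)$ and $\sum_{k=t}^{\infty} h_{j,k}^2\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) = O(\phi_j^t)$,
$$n^{-2}\sum_{t=1}^{n}\sum_{k=t}^{\infty} h_{j,k}^2\,x_{2t-1}x_{2t-1}'\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) = o_p(1).$$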
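Therefore,
$$n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'\,\varepsilon_{jt-k}^2/\sigma_{jt}^4 = n^{-2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}^2\,x_{2t-k-1}x_{2t-k-1}'\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) + o_p(1)$$
$$= n^{-2}\sum_{t=1}^{n} x_{2t-1}x_{2t-1}'\sum_{k=1}^{\infty} h_{j,k}^2\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) + o_p(1) \Rightarrow \Xi_j\int_0^1 W_2W_2'.$$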
Second, we show n2Pn
t1 hj;khj;lx2tk1x02tl1ejtkejtl=s
4jt op1 for all kaj.
Without loss of generality, we set l4k.We have n3=2
Pnt1P
kal hj;khj;lx2tk1x02tl1ejtkejtl=s
4jt Op1 because ejtkejtl=s4jt
is b-mixing and kfk=2j fl=2j ejtkejtl=s4jtkmp1=ojktk2mo1 for some m42.(4.2.c) Also, n2
Pnt1 hjLx2t2ejt1hjLx02t2ejt1Zjt=s4jt op1 because
n3=2Xnt1
x2tk1x02tk1e2jtkZjt=s
4jt Op1
and
n3=2Xnt1
x2tk1x02tl1ejtkejtlZjt=s4jt Op1 for all kX1 and kal.
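(4.2.d) In the same way, $n^{-3/2}\sum_{t=1}^{n} x_{2t-k-1}x_{2t-1}'\varepsilon_{jt-k}\varepsilon_{jt}/\sigma_{jt}^4 = O_p(1)$ for all $k \ge 1$. Hence, we have $n^{-2}\sum_{t=1}^{n} x_{2t-1}\varepsilon_{jt}\,h_j(L)x_{2t-2}'\varepsilon_{jt-1}/\sigma_{jt}^4 = o_p(1)$.
(4.2.e) $n^{-2}\sum_{t=1}^{n} h_j(L)x_{2t-2}x_{2t-2}'\,\eta_{jt}/\sigma_{jt}^2 = o_p(1)$ since $n^{-3/2}\sum_{t=1}^{n} x_{2t-k-1}x_{2t-k-1}'\,\eta_{jt}/\sigma_{jt}^2 = O_p(1)$ for all $k \ge 1$.
Therefore, we have $-n^{-2}\sum_{t=1}^{n} \partial^2 l_t(\theta_0)/\partial\beta\,\partial\beta' \Rightarrow \nu \otimes \int_0^1 W_2(s)W_2'(s)\,ds$.
(4.3) Show $n^{-2}\sum_{t=1}^{n} \left(\partial l_t(\theta_0)/\partial\beta\right)\left(\partial l_t(\theta_0)/\partial\beta'\right) \Rightarrow \mu \otimes \int_0^1 W_2W_2'$, where
$$\frac{\partial l_t(\theta_0)}{\partial\beta}\frac{\partial l_t(\theta_0)}{\partial\beta'} = \sum_{j=1}^{m} A_j'A_j \otimes \Big[x_{2t-1}x_{2t-1}'\varepsilon_{jt}^2/\sigma_{jt}^4 + h_j(L)x_{2t-2}\varepsilon_{jt-1}\,h_j(L)x_{2t-2}'\varepsilon_{jt-1}\,\eta_{jt}^2/\sigma_{jt}^4 - 2\,h_j(L)x_{2t-2}\varepsilon_{jt-1}\,x_{2t-1}'\varepsilon_{jt}\,\eta_{jt}/\sigma_{jt}^4\Big].$$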
(4.3.a) We apply Theorem 3.1 of Hansen (1992) to show that
$$n^{-3/2}\sum_{t=1}^{n} x_{2t-1}x_{2t-1}'\left(\varepsilon_{jt}^2/\sigma_{jt}^4 - E(\varepsilon_{jt}^2/\sigma_{jt}^4)\right) = O_p(1).$$
Thus,
$$n^{-2}\sum_{t=1}^{n} x_{2t-1}x_{2t-1}'\varepsilon_{jt}^2/\sigma_{jt}^4 \Rightarrow E(\varepsilon_{jt}^2/\sigma_{jt}^4)\int_0^1 W_2W_2' = (1/B_j^2)\int_0^1 W_2W_2'.$$
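(4.3.b) Since $E\eta_{jt}^2 = \kappa_j - 1$,
$$n^{-2}\sum_{t=1}^{n} h_j(L)x_{2t-2}\varepsilon_{jt-1}\,h_j(L)x_{2t-2}'\varepsilon_{jt-1}\,\eta_{jt}^2/\sigma_{jt}^4 \Rightarrow (\kappa_j - 1)\,\Xi_j\int_0^1 W_2W_2'$$
in the same way as (4.2.b).
(4.3.c) Since $\{\varepsilon_{jt-k}\varepsilon_{jt}\eta_{jt}/\sigma_{jt}^4\}$ is an MDS, $n^{-3/2}\sum_{t=1}^{n} x_{2t-k-1}x_{2t-1}'\varepsilon_{jt-k}\varepsilon_{jt}\eta_{jt}/\sigma_{jt}^4 = O_p(1)$ for all $k \ge 1$. Thus, $n^{-2}\sum_{t=1}^{n} h_j(L)x_{2t-2}\varepsilon_{jt-1}\,x_{2t-1}'\varepsilon_{jt}\eta_{jt}/\sigma_{jt}^4 = o_p(1)$ in the same way as (4.2.c).
Therefore, $n^{-2}\sum_{t=1}^{n} g_t(\theta_0)g_t'(\theta_0) \Rightarrow \mu \otimes \int_0^1 W_2(s)W_2'(s)\,ds$. □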
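Proof of Theorem 1. By using Lemmas 1–4, we have
$$n(\hat\beta_n - \beta_0) = h_n(\theta_0)^{-1}g_n(\theta_0) + o_p(1) \Rightarrow \left(\nu \otimes \int_0^1 W_2W_2'\right)^{-1}\int_0^1 dW_1 \otimes W_2 = \mathrm{vec}\left(\left(\int_0^1 W_2W_2'\right)^{-1}\int_0^1 W_2\,dW_1'\,\nu^{-1}\right).$$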
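The RRR estimator $\tilde\beta_n$ is based on the following likelihood function:
$$L_n(\beta, \Upsilon, \Lambda) = n^{-1}\sum_{t=1}^{n} l_t(\beta, \Upsilon, \Lambda),$$
where $l_t(\beta, \Upsilon, \Lambda) = -0.5\sum_{j=1}^{m} \log\sigma_j^2 - 0.5\sum_{j=1}^{m} \varepsilon_{jt}^2(\beta, \Upsilon, \Lambda)/\sigma_j^2$, $\varepsilon_t(\beta, \Upsilon, \Lambda) = \Lambda u_t(\beta, \Upsilon)$, $\sigma_j^2 = E\sigma_{jt}^2$, and $u_t(\beta, \Upsilon)$ satisfies Eq. (5).
We have the following derivatives:
$$\frac{\partial l_t(\beta_0, \Upsilon_0, \Lambda_0)}{\partial\beta} = \sum_{j=1}^{m} A_j' \otimes \frac{x_{2t-1}\varepsilon_{jt}}{\sigma_j^2},$$
$$-\frac{\partial^2 l_t(\beta_0, \Upsilon_0, \Lambda_0)}{\partial\beta\,\partial\beta'} = \sum_{j=1}^{m} A_j'A_j \otimes \frac{x_{2t-1}x_{2t-1}'}{\sigma_j^2}.$$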
By using the previous results, we can show that the RRR estimator $\tilde\beta_n$ has an asymptotic distribution as (20). □
Proof of Theorem 2. If we use the information matrix,
WIn )Z
dW 01 W 02
n1 Z 10
W 2W02
1 !
R0 R n1 Z 10
W 2W02
1 !R0
" #1
R n1 Z 10
W 2W02
1 ! ZdW 1 W 2
J 0Q1=20Q1n Q1=2J.We can show other parts in the same way. &
Proof of Lemma 5. (5.1) Show n1=2Pn
t1 qlty0=qt!dL0Z QN0;L0S1=2MS1=2L,
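where
$$\frac{\partial l_t(\theta_0)}{\partial\tau} = \sum_{j=1}^{m} \Lambda_j'\left[\frac{\varepsilon_{jt}}{\sigma_{jt}^2} - \frac{h_j(L)\varepsilon_{jt-1}\,\eta_{jt}}{\sigma_{jt}^2}\right].$$
Let $z_{jt} = \varepsilon_{jt}/\sigma_{jt}^2$, and $q_{jt,k} = \varepsilon_{jt-k}\eta_{jt}/\sigma_{jt}^2$. Since $\{z_{jt}\}$ and $\{q_{jt,k}\}$ are martingale difference sequences,
$$n^{-1/2}\sum_{t=1}^{n} z_{jt} \xrightarrow{d} Z_j \sim N\left(0, \frac{1}{B_j^2}\right),$$
$$n^{-1/2}\sum_{t=1}^{n} q_{jt,k} \xrightarrow{d} Q_{j,k} \sim N\left(0, (\kappa_j - 1)\,\xi_{j,k}^2\right),$$
for each $k \ge 1$.
Let $Q_{jn} = n^{-1/2}\sum_{t=1}^{n}\sum_{k=1}^{\infty} h_{j,k}q_{jt,k}$. Because $\|q_{jt,k}\|_2 < \infty$ and $\sum_{k=1}^{\infty} h_{j,k}\|q_{jt,k}\|_2 < \infty$, $Q_{jn} \xrightarrow{d} \sum_{k=1}^{\infty} h_{j,k}Q_{j,k} = Q_j$.
Since $\sum_{k=t}^{\infty} h_{j,k}\|q_{jt,k}\| = O(\phi_j^t)$, we have the following result in the same way as (4.1):
$$Q_{jn} = n^{-1/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}q_{jt,k} \xrightarrow{d} Q_j.$$
Therefore, $n^{-1/2}\sum_{t=1}^{n} \partial l_t(\theta_0)/\partial\tau \xrightarrow{d} \Lambda'(Z - Q)$.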
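The normalisation of $Z_j$ in (5.1) can be checked by simulation. The sketch below is illustrative only: it assumes a univariate GARCH(1,1) process with standard normal innovations and hypothetical parameter values (`omega`, `alpha`, `phi` are illustrative names). It simulates many independent paths, forms $n^{-1/2}\sum_t \varepsilon_t/\sigma_t^2$ on each, and compares the cross-path variance with the sample mean of $1/\sigma_t^2$, which plays the role of $1/B_j^2$.

```python
import numpy as np

def score_sums(n=500, reps=2000, omega=0.1, alpha=0.1, phi=0.8, seed=1):
    """Return (n^{-1/2} sum_t eps_t / sigma_t^2 for each path, average of 1 / sigma_t^2)."""
    rng = np.random.default_rng(seed)
    # Every path starts at the unconditional variance omega / (1 - alpha - phi).
    sigma2 = np.full(reps, omega / (1.0 - alpha - phi))
    total = np.zeros(reps)
    inv_sigma2_mean = 0.0
    for _ in range(n):
        eps = np.sqrt(sigma2) * rng.standard_normal(reps)
        total += eps / sigma2                  # z_t = eps_t / sigma_t^2
        inv_sigma2_mean += np.mean(1.0 / sigma2)
        sigma2 = omega + alpha * eps ** 2 + phi * sigma2  # next-period variance
    return total / np.sqrt(n), inv_sigma2_mean / n

if __name__ == "__main__":
    s, inv_mean = score_sums()
    print("variance of normalised sums:", np.var(s))
    print("average of 1/sigma^2:", inv_mean)
```

Because $\varepsilon_t/\sigma_t^2$ equals the standardized innovation divided by $\sigma_t$, its conditional variance is $1/\sigma_t^2$, so the two printed numbers should be close when `reps` is large.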
(5.2) Show n1Pnt1 q2lty0=qtqt0 !p L0S1=2NS1=2L, where q
2lty0qt qt0
Xmj1
L0jLj1
s2jt 2 hjLejt1
2
s4jt1 2Zjt 4
hjLejt1ejts4jt
hj1Zjts2jt
" #.
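(5.2.a) Because $\{1/\sigma_{jt}^2\}$ is strictly stationary and ergodic, and $\|1/\sigma_{jt}^2\| \le 1/\omega_j < \infty$,
$$n^{-1}\sum_{t=1}^{n} \frac{1}{\sigma_{jt}^2} \xrightarrow{p} E\left(\frac{1}{\sigma_{jt}^2}\right) = \frac{1}{B_j^2}.$$
(5.2.b) Show $n^{-1}\sum_{t=1}^{n} \left(h_j(L)\varepsilon_{jt-1}\right)^2/\sigma_{jt}^4 \xrightarrow{p} \Xi_j$, where
$$\left(h_j(L)\varepsilon_{jt-1}\right)^2 = \sum_{k=1}^{t-1} h_{j,k}^2\,\varepsilon_{jt-k}^2 + \sum_{k\ne l} h_{j,k}h_{j,l}\,\varepsilon_{jt-k}\varepsilon_{jt-l}.$$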
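Because $\varepsilon_{jt-k}^2/\sigma_{jt}^4$ is strictly stationary and ergodic, for all $k \ge 1$, and $\left\|\sum_{k=1}^{\infty} h_{j,k}^2\,\varepsilon_{jt-k}^2/\sigma_{jt}^4\right\| < \infty$,
$$n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty} \frac{h_{j,k}^2\,\varepsilon_{jt-k}^2}{\sigma_{jt}^4} \xrightarrow{p} \sum_{k=1}^{\infty} h_{j,k}^2\,E\left(\frac{\varepsilon_{jt-k}^2}{\sigma_{jt}^4}\right).$$
Also, $n^{-1}\sum_{t=1}^{n}\sum_{k=t}^{\infty} h_{j,k}^2\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) = o(1)$ since $\sum_{k=t}^{\infty} h_{j,k}^2\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) = O(\phi_j^t)$.
Thus,
$$n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{t-1} \frac{h_{j,k}^2\,\varepsilon_{jt-k}^2}{\sigma_{jt}^4} = n^{-1}\sum_{t=1}^{n}\sum_{k=1}^{\infty} \frac{h_{j,k}^2\,\varepsilon_{jt-k}^2}{\sigma_{jt}^4} + o_p(1) \xrightarrow{p} \sum_{k=1}^{\infty} h_{j,k}^2\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) = \Xi_j.$$
Because $n^{-1}\sum_{t=1}^{n} h_{j,k}h_{j,l}\,\varepsilon_{jt-k}\varepsilon_{jt-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \ne l$, $n^{-1}\sum_{t=1}^{n} \left(h_j(L)\varepsilon_{jt-1}\right)^2/\sigma_{jt}^4 \xrightarrow{p} \Xi_j$.
The other parts entail $\{\varepsilon_{jt-k}\varepsilon_{jt-l}\eta_{jt}/\sigma_{jt}^4,\ \varepsilon_{jt-k}\varepsilon_{jt}/\sigma_{jt}^4,\ \eta_{jt}/\sigma_{jt}^2\}$ for $k, l \ge 1$. These processes are martingale difference sequences, and their sample moments converge to zero.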
Therefore, n1Pnt1 q2lty0=qtqt0 !p L0S1=2NS1=2L.(5.3) Show n3=2Pnt1 q2lty0=qb qt0 ) A0S1=2NS1=2L R 10 W 2, where
q2lty0qb qt0
Xmj1
A0jLj x2t1s2jt
2 hjLx2t2ejt1hjLejt1s4jt
1 2Zjt"
2 x2t1ejthjLejt1s4jt
2 hjLx2t2ejt1ejts4jt
hjLx2t2Zjts2jt
#.
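(5.3.a) Show $n^{-3/2}\sum_{t=1}^{n} x_{2t-1}/\sigma_{jt}^2 \Rightarrow (1/B_j^2)\int_0^1 W_2$.
Assumption 1 implies that $\{\sigma_{jt}^2\}$ is $\beta$-mixing with exponential decay, and so is $\{1/\sigma_{jt}^2\}$. Because $\|1/\sigma_{jt}^2\|_m \le 1/\omega_j < \infty$ for some $m > 2$, we can show that $n^{-1}\sum_{t=1}^{n} x_{2t-1}\left(1/\sigma_{jt}^2 - E(1/\sigma_{jt}^2)\right) = O_p(1)$. Thus,
$$n^{-3/2}\sum_{t=1}^{n} x_{2t-1}/\sigma_{jt}^2 \Rightarrow E(1/\sigma_{jt}^2)\int_0^1 W_2 = (1/B_j^2)\int_0^1 W_2.$$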
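(5.3.b) Show $n^{-3/2}\sum_{t=1}^{n} h_j(L)x_{2t-2}\varepsilon_{jt-1}\,h_j(L)\varepsilon_{jt-1}/\sigma_{jt}^4 \Rightarrow \Xi_j\int_0^1 W_2$, where
$$h_j(L)x_{2t-2}\varepsilon_{jt-1}\,h_j(L)\varepsilon_{jt-1} = \sum_{k=1}^{t-1} h_{j,k}^2\,x_{2t-k-1}\varepsilon_{jt-k}^2 + \sum_{k\ne l} h_{j,k}h_{j,l}\,x_{2t-k-1}\varepsilon_{jt-k}\varepsilon_{jt-l}.$$
First, we show that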
n1Xnt1
fkj x2tk1sjt;k Op1,
where sjt;k fkj e2jtk=s4jt Ee2jtk=s4jt for all kX1.Note that sjt;k is b-mixing with exponential decay and ksjt;kkmp2=ojkjtk22mo1 for some
m42 and for all kX1. Also, n3=2Pn
t1P1
kt h2j;kx2t1Ee2jtk=s4jt op1 since n1=2x2t1
Op1 andP1
kt h2j;kEe2jtk=s4jt Ofkj .
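Therefore,
$$n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} \frac{h_{j,k}^2\,x_{2t-k-1}\varepsilon_{jt-k}^2}{\sigma_{jt}^4} = n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}^2\,x_{2t-k-1}\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) + o_p(1)$$
$$= n^{-3/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}^2\,x_{2t-1}\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) + o_p(1)$$
$$= n^{-3/2}\sum_{t=1}^{n} x_{2t-1}\sum_{k=1}^{\infty} h_{j,k}^2\,E(\varepsilon_{jt-k}^2/\sigma_{jt}^4) + o_p(1) \Rightarrow \Xi_j\int_0^1 W_2.$$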
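Second, we show $n^{-3/2}\sum_{t=1}^{n} h_{j,k}h_{j,l}\,x_{2t-k-1}\varepsilon_{jt-k}\varepsilon_{jt-l}/\sigma_{jt}^4 = o_p(1)$ for all $k \ne l$.
Without loss of generality, we set $l > k$. We have
$$n^{-1}\sum_{t=1}^{n} h_{j,k}h_{j,l}\,x_{2t-k-1}\varepsilon_{jt-k}\varepsilon_{jt-l}/\sigma_{jt}^4 = O_p(1).$$
Because $h_{j,k}h_{j,l}$ decays exponentially, $n^{-1}\sum_{l>k}\sum_{t=1}^{n} h_{j,k}h_{j,l}\,x_{2t-k-1}\varepsilon_{jt-k}\varepsilon_{jt-l}/\sigma_{jt}^4 = O_p(1)$.
The other parts entail martingale difference sequences, and are asymptotically negligible. Therefore,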
n3=2Xnt1
q2lty0qb qt0
) A0S1=2NS1=2LZ 10
W 2: &
Proof of Theorem 3. The parameter vector of the model with an intercept can be denedas y b0; t0; y020. First, show that Hny2ty0 op1, where Hny2ty0 n1
Pnt1 q
2lty0=qy2 qt0.(6.1) Show Hnaty0 op1, where Hnaty0 n1
Pnt1 q
2lty0=qa qt0 and
q2lty0qa qt0
Xpj1
L0j0Lj0 wt1s2jt
2 hjLwt2ejt1hjLejt1s4jt
1 2Zjt"
2 hjLwt2ejt1ejts4jt
2wt1ejthjLejt1s4jt
hjLwt2Zjts2jt
#.
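(6.1.a) Because $w_{t-1}/\sigma_{jt}^2$ is strictly stationary and ergodic, and $\|w_{t-1}/\sigma_{jt}^2\| < \infty$, $n^{-1}\sum_{t=1}^{n} w_{t-1}/\sigma_{jt}^2 \xrightarrow{p} 0$.
(6.1.b) Show $n^{-1}\sum_{t=1}^{n} h_j(L)w_{t-2}\varepsilon_{jt-1}\,h_j(L)\varepsilon_{jt-1}/\sigma_{jt}^4 = o_p(1)$, where
$$h_j(L)w_{t-2}\varepsilon_{jt-1}\,h_j(L)\varepsilon_{jt-1} = \sum_{k=1}^{t-1} h_{j,k}^2\,w_{t-k-1}\varepsilon_{jt-k}^2 + \sum_{k\ne l} h_{j,k}h_{j,l}\,w_{t-k-1}\varepsilon_{jt-k}\varepsilon_{jt-l}.$$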
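Since $\sup_t \|\phi_j^k w_{t-k-1}\varepsilon_{jt-k}^2/\sigma_{jt}^4\|_m < \infty$ for some $m > 2$,
$$n^{-1/2}\sum_{t=1}^{n} \phi_j^k\,w_{t-k-1}\varepsilon_{jt-k}^2/\sigma_{jt}^4 = O_p(1).$$
Because $h_{j,k}$ decays exponentially, $n^{-1/2}\sum_{t=1}^{n}\sum_{k=1}^{t-1} h_{j,k}^2\,w_{t-k-1}\varepsilon_{jt-k}^2/\sigma_{jt}^4 = O_p(1)$. In the same way, $n^{-1/2}\sum_{t=1}^{n} h_{j,k}h_{j,l}\,w_{t-k-1}\varepsilon_{jt-k}\varepsilon_{jt-l}/\sigma_{jt}^4 = O_p(1)$ for $k \ne l$.
Thus, $n^{-1}\sum_{t=1}^{n} h_j(L)w_{t-2}\varepsilon_{jt-1}\,h_j(L)\varepsilon_{jt-1}/\sigma_{jt}^4 = o_p(1)$.
The other parts entail
$$\left\{\frac{w_{t-k-1}\varepsilon_{jt-k}\varepsilon_{jt-l}\eta_{jt}}{\sigma_{jt}^4},\ \frac{w_{t-k-1}\varepsilon_{jt-k}\varepsilon_{jt}}{\sigma_{jt}^4},\ \frac{w_{t-k-1}\eta_{jt}}{\sigma_{jt}^2}\right\} \quad \text{for } k, l \ge 1.$$
These processes are MDS, and therefore $H_{n\alpha\tau}(\theta_0) = o_p(1)$.
In the same way, we can show that $H_{n\theta_2\tau}(\theta_0) = o_p(1)$.
Using the block diagonality of the Hessian matrix,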
n ^bn b0n
p t^n t0
0@
1A) n
RW 2W
02 A
0S1=2NS1=2L R W 2L0S1=2NS1=2A R W 02 L0S1=2NS1=2L
0@
1A
1
RdW 1 W 2L0Z Q
!.
Thus,
n ^bn b0 ) V1U ,
where
U Z
dW 1 W 2
A0S1=2NS1=2LZ
W 2
L0S1=2NS1=2L1L0Z Q,
V nZ
W 2W02
A0S1=2NS1=2L
ZW 2
L0S1=2NS1=2L1
L0S1=2NS1=2AZ
W 02
.
We use the following:
U ZdW 1 W 11 W 2
ZW 2
Z
dW 1 W 2,
V nZ
W 2W02
n
ZW 2
ZW 02
n
ZW 2W
02 ,
where W 1 a0L0Z Q and W 2 W 2 RW 2.
Therefore,
n ^bn b0 ) nZ
W 2W02
1 ZdW 1 W 2
vecZ
W 2W02
1 ZW 2 dW
01n
!.
In the same way, n ~bn b0 ) vecRW 2W
02 1
RW 2 dW
01mr. &
References
Ahn, S.K., Reinsel, G.C., 1988. Nested reduced-rank autoregressive models for multiple time series. Journal of the American Statistical Association 83, 849–856.
Amemiya, T., 1973. Regression analysis when the variance of the dependent variable is proportional to the square of its expectation. Journal of the American Statistical Association 68, 928–934.
Andrews, D., 1987. Consistency in nonlinear econometric models: a generic uniform law of large numbers. Econometrica 55, 1465–1471.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.
Bollerslev, T., 1990. Modeling coherence in short-run nominal exchange rates: a multivariate generalized ARCH approach. Review of Economics and Statistics 72, 498–505.
Bollerslev, T., Engle, R., Wooldridge, J., 1988. A capital asset pricing model with time varying covariances. Journal of Political Economy 96, 116–131.
Bollerslev, T., Chou, R.Y., Kroner, K.F., 1992. ARCH modeling in finance: a review of the theory and empirical evidence. Journal of Econometrics 52, 5–59.
Box, G.E.P., Tiao, G.C., 1977. A canonical analysis of multiple time series. Biometrika 64, 355–365.
Carrasco, M., Chen, X., 2002. Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.
Chanda, K., 1974. Strong mixing properties of linear stochastic processes. Journal of Applied Probability 11, 401–408.
Engle, R., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1008.
Engle, R., Granger, C., 1987. Cointegration and error correction: representation, estimation, and testing. Econometrica 55, 251–276.
Gorodetskii, V., 1977. On the strong mixing property for linear sequences. Theory of Probability and Its Applications 22, 411–413.
Hansen, B.E., 1992. Convergence to stochastic integrals for dependent heterogeneous processes. Econometric Theory 8, 489–500.
Hansen, B.E., 1995. Regression with nonstationary volatility. Econometrica 63, 1113–1132.
He, C., Teräsvirta, T., 1999. Properties of moments of a family of GARCH processes. Journal of Econometrics 92, 173–192.
Johansen, S., 1988. Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control 12, 231–254.
Johansen, S., 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551–1580.
Kurtz, T., Protter, P., 1991. Weak limit theorems to stochastic integrals and stochastic differential equations. Annals of Probability 19, 1035–1070.
Lee, S.W., Hansen, B.E., 1994. Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator. Econometric Theory 10, 29–52.
Li, W.K., Ling, S., Wong, H., 2001. Estimation for partially nonstationary multivariate autoregressive models with conditional heteroskedasticity. Biometrika 88, 1135–1152.
Ling, S., Li, W.K., 1998. Limiting distributions of maximum likelihood estimators for unstable autoregressive moving-average time series with general autoregressive heteroskedastic errors. Annals of Statistics 26, 84–125.
Ling, S., Li, W.K., 2003. Asymptotic inference for unit root processes with GARCH(1,1) errors. Econometric Theory 19, 541–564.
Ling, S., McAleer, M., 2003. Asymptotic theory for a vector ARMA-GARCH model. Econometric Theory 19, 280–310.
Lumsdaine, R.L., 1996. Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH (1,1) and covariance stationary GARCH (1,1) models. Econometrica 64, 575–596.
Newey, W., 1991. Uniform convergence in probability and stochastic equicontinuity. Econometrica 59, 1161–1167.
Phillips, P.C.B., 1991. Optimal inference in cointegrated system. Econometrica 59, 283–306.
Phillips, P.C.B., Durlauf, S.N., 1986. Multiple time series with integrated variables. Review of Economic Studies 53, 473–495.
Saikkonen, P., 1993. Continuous weak convergence and stochastic equicontinuity results for integrated processes with an application to the estimation of a regression model. Econometric Theory 9, 155–188.
Saikkonen, P., 1995. Problems with the asymptotic theory