
Fixed-smoothing Asymptotics in a Two-step GMM Framework

Yixiao Sun*

Department of Economics, University of California, San Diego

January 8, 2014

Abstract

The paper develops the fixed-smoothing asymptotics in a two-step GMM framework. Under this type of asymptotics, the weighting matrix in the second-step GMM criterion function converges weakly to a random matrix and the two-step GMM estimator is asymptotically mixed normal. Nevertheless, the Wald statistic, the GMM criterion function statistic and the LM statistic remain asymptotically pivotal. It is shown that critical values from the fixed-smoothing asymptotic distribution are high order correct under the conventional increasing-smoothing asymptotics. When an orthonormal series covariance estimator is used, the critical values can be approximated very well by the quantiles of a noncentral F distribution. A simulation study shows that the new statistical tests based on the fixed-smoothing critical values are much more accurate in size than the conventional chi-square test.

JEL Classification: C12, C32

Keywords: F distribution, Fixed-smoothing Asymptotics, Heteroskedasticity and Autocorrelation Robust, Increasing-smoothing Asymptotics, Noncentral F Test, Two-step GMM Estimation

1 Introduction

Recent research on heteroskedasticity and autocorrelation robust (HAR) inference has been focusing on developing distributional approximations that are more accurate than the conventional chi-square approximation or the normal approximation. To a great extent and from a broad perspective, this development is in line with many other areas of research in econometrics where more accurate distributional approximations are the focus of interest. A common theme for coming up with a new approximation is to embed the finite sample situation in a different limiting thought experiment. In the case of HAR inference, the conventional limiting thought experiment assumes that the amount of smoothing increases with the sample size but at a slower rate. The new limiting thought experiment assumes that the amount of smoothing is held fixed as the sample size increases. This leads to two types of asymptotics: the conventional increasing-smoothing asymptotics and the more recent fixed-smoothing asymptotics. Sun (2014) coins these two inclusive terms so that they are applicable to different HAR variance estimators, including both kernel HAR variance estimators and orthonormal series (OS) HAR variance estimators.

*Email: [email protected]. For helpful comments, the author thanks Jungbin Hwang, David Kaplan, and Min Seong Kim; Elie Tamer, the coeditor; and anonymous referees. The author gratefully acknowledges partial research support from NSF under Grant No. SES-0752443. Correspondence to: Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0508.

There is a large and growing literature on the fixed-smoothing asymptotics. For kernel HAR variance estimators, the fixed-smoothing asymptotics is the so-called fixed-b asymptotics first studied by Kiefer and Vogelsang (2002a, 2002b, 2005, KV hereafter) in the econometrics literature. For other studies, see, for example, Jansson (2004), Sun, Phillips and Jin (2008), Sun and Phillips (2009), and Gonçalves and Vogelsang (2011) in the time series setting; Bester, Conley, Hansen and Vogelsang (2011, BCHV hereafter) and Sun and Kim (2014) in the spatial setting; and Gonçalves (2011), Kim and Sun (2013), and Vogelsang (2012) in the panel data setting. For OS HAR variance estimators, the fixed-smoothing asymptotics is the so-called fixed-K asymptotics. For its theoretical development and related simulation evidence, see, for example, Phillips (2005), Müller (2007), and Sun (2011, 2013). The approximation approaches in some other papers can also be regarded as special cases of the fixed-smoothing asymptotics. This includes, among others, Ibragimov and Müller (2010), Shao (2010), and Bester, Hansen and Conley (2011).

All of the recent developments on the fixed-smoothing asymptotics have been devoted to first-step GMM estimation and inference. In this paper, we establish the fixed-smoothing asymptotics in a general two-step GMM framework. For two-step estimation and inference, the HAR variance estimator not only appears in the covariance estimator but also plays the role of the optimal weighting matrix in the second-step GMM criterion function. Under the fixed-smoothing asymptotics, the weighting matrix converges to a random matrix. As a result, the second-step GMM estimator is not asymptotically normal but rather asymptotically mixed normal. On the one hand, the asymptotic mixed normality captures the estimation uncertainty of the GMM weighting matrix and is expected to be closer to the finite sample distribution of the second-step GMM estimator. On the other hand, the lack of asymptotic normality poses a challenge for pivotal inference. It is far from obvious that the Wald statistic is still asymptotically pivotal under the new asymptotics. To confront this challenge, we have to judiciously rotate and transform the asymptotic distribution and show that it is equivalent to a distribution that is free of nuisance parameters.

The fixed-smoothing asymptotic distribution not only depends on the kernel function or basis functions and the smoothing parameter, as in the one-step GMM framework, but also depends on the degree of over-identification, which is different from existing results. In general, the degree of over-identification or the number of moment conditions remains in the asymptotic distribution only under the so-called many-instruments or many-moments asymptotics; see, for example, Han and Phillips (2006) and references therein. Here the number of moment conditions and hence the degree of over-identification are finite and fixed. Intuitively, the degree of over-identification remains in the asymptotic distribution because it is indicative of the dimension of the limiting random weighting matrix.

In the case of OS HAR variance estimation, the fixed-smoothing asymptotic distribution is a mixed noncentral F distribution: a noncentral F distribution with a random noncentrality parameter. This is an intriguing result, as a noncentral F distribution is not expected under the null hypothesis. It is reassuring that the random noncentrality parameter becomes degenerate as the amount of smoothing increases. Replacing the random noncentrality parameter by its mean, we show that the mixed noncentral F distribution can be approximated extremely well by a noncentral F distribution. The noncentral F distribution is implemented in standard programming languages and packages such as the R language, MATLAB, Mathematica and STATA, so critical values are readily available. No computation-intensive simulation is needed. This can be regarded as an advantage of using the OS HAR variance estimator.
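For readers who want to see how readily such critical values can be obtained, the sketch below uses SciPy's noncentral F distribution (`scipy.stats.ncf`); the particular values of $p$, $K$, $q$ and the noncentrality $\delta^2$ are illustrative placeholders, not quantities taken from the paper.

```python
# Critical values for the noncentral F approximation F_{p, K-p-q+1}(delta^2).
# The numbers p, K, q and delta2 below are hypothetical placeholders.
from scipy.stats import f, ncf

p, K, q = 2, 12, 3            # restrictions, OS basis functions, over-identification
dfn, dfd = p, K - p - q + 1   # degrees of freedom (p, K - p - q + 1)
delta2 = 0.5                  # noncentrality parameter delta^2

cv = ncf.ppf(0.95, dfn, dfd, delta2)   # 95% noncentral F critical value
cv_central = f.ppf(0.95, dfn, dfd)     # central F benchmark (delta2 = 0)
```

Because the noncentral F distribution is stochastically increasing in its noncentrality parameter, `cv` exceeds the central F benchmark; ignoring the noncentrality would therefore yield a critical value that is too small.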

In the case of kernel HAR variance estimation, the fixed-smoothing asymptotic distribution is a mixed chi-square distribution: a chi-square distribution scaled by an independent and positive random variable. This resembles an F distribution except that the random denominator has a nonstandard distribution. Nevertheless, we are able to show that critical values from the fixed-smoothing asymptotic distribution are high order correct under the conventional increasing-smoothing asymptotics. This result is established on the basis of two distributional expansions. The first is the expansion of the fixed-smoothing asymptotic distribution as the amount of smoothing increases, and the second is the high order Edgeworth expansion established in Sun and Phillips (2009). We arrive at the two expansions via completely different routes, yet in the end some of the terms in the two expansions are exactly the same.

Our framework for establishing the fixed-smoothing asymptotics is general enough to accommodate both the kernel HAR variance estimators and the OS HAR variance estimators. The fixed-smoothing asymptotics is established under weaker conditions than what are typically assumed in the literature. More specifically, instead of maintaining a functional CLT assumption, we make use of a regular CLT, which is weaker than an FCLT. Our method of proof is also novel. It applies directly to both smooth and nonsmooth kernel functions. There is no need to give a separate treatment to nonsmooth kernels such as the Bartlett kernel. The unified method of proof leads to a unified representation of the fixed-smoothing asymptotic distribution.

The fixed-smoothing asymptotics is established for three commonly used test statistics in the GMM framework: the Wald statistic, the GMM criterion function statistic, and the score-type statistic or LM statistic. As in the conventional increasing-smoothing asymptotic framework, we show that these three test statistics are asymptotically equivalent and converge to the same limiting distribution.

In the Monte Carlo simulations, we examine the accuracy of the fixed-smoothing approximation to the distribution of the Wald statistic. We find that the tests based on the new fixed-smoothing approximation have much more accurate size than the conventional chi-square tests. This is especially true when the degree of over-identification is large. When the model is over-identified, the fixed-smoothing approximation that accounts for the randomness of the GMM weighting matrix is also more accurate than the fixed-smoothing approximation that ignores this randomness. When the OS HAR variance estimator is used, the convenient noncentral F test has size properties almost identical to those of the nonstandard test whose critical values have to be simulated.

The rest of the paper is organized as follows. Section 2 describes the estimation and testing problems at hand. Section 3 establishes the fixed-smoothing asymptotics for the variance estimators and the associated test statistics. Section 4 gives different representations of the fixed-smoothing asymptotic distribution. In the case of OS HAR variance estimation, a noncentral F distribution is shown to be a very accurate approximation to the nonstandard fixed-smoothing asymptotic distribution. In the case of kernel HAR variance estimation, the fixed-smoothing approximation is shown to provide a high order refinement over the chi-square approximation. Section 5 studies the fixed-smoothing asymptotic approximation under local alternatives. The next section reports simulation evidence and applies our method to GMM estimation and inference in a stochastic volatility model. The last section provides some concluding discussion. Technical lemmas and proofs of the main results are given in the Appendix.


A word on notation: we use $F_{p,K-p-q+1}(\delta^2)$ to denote a random variable that follows the noncentral F distribution with degrees of freedom $(p, K-p-q+1)$ and noncentrality parameter $\delta^2$. We use $F_{p,K-p-q+1}(z; \delta^2)$ to denote the CDF of the noncentral F distribution. For a positive semidefinite symmetric matrix $A$, $A^{1/2}$ is the symmetric matrix square root of $A$ when it is not defined explicitly.

2 Two-step GMM Estimation and Testing

We are interested in a $d \times 1$ vector of parameters $\theta \in \Theta \subseteq \mathbb{R}^d$. Let $v_t \in \mathbb{R}^{d_v}$ denote a vector of observations at time $t$. Let $\theta_0$ denote the true parameter value and assume that $\theta_0$ is an interior point of $\Theta$. The moment conditions
\[
E f(v_t, \theta) = 0, \quad t = 1, 2, \ldots, T \tag{1}
\]
hold if and only if $\theta = \theta_0$, where $f(v_t, \theta)$ is an $m \times 1$ vector of continuously differentiable functions. The process $f(v_t, \theta_0)$ may exhibit autocorrelation of unknown forms. We assume that $m \ge d$ and that the rank of $E[\partial f(v_t, \theta_0) / \partial \theta']$ is equal to $d$. That is, we consider a model that is possibly over-identified with the degree of over-identification $q = m - d$.

Define
\[
g_t(\theta) = \frac{1}{T} \sum_{j=1}^{t} f(v_j, \theta);
\]
then the GMM estimator of $\theta_0$ is given by
\[
\hat\theta_{GMM} = \arg\min_{\theta \in \Theta} \, g_T(\theta)' W_T^{-1} g_T(\theta),
\]
where $W_T$ is a positive definite weighting matrix. To obtain an initial first-step estimator, we often choose a simple weighting matrix $W_o$ that does not depend on model parameters, leading to
\[
\tilde\theta_T = \arg\min_{\theta \in \Theta} \, g_T(\theta)' W_o^{-1} g_T(\theta).
\]
As an example, we may set $W_o = I_m$ in the general GMM setting. In the IV regression, we may set $W_o = Z'Z/T$, where $Z$ is the data matrix for the instruments. We assume that $W_o \stackrel{p}{\to} W_{o,\infty}$, a matrix that is positive definite almost surely.

According to Hansen (1982), the optimal weighting matrix $W_T$ is the asymptotic variance matrix of $\sqrt{T}\, g_T(\theta_0)$. On the basis of the first-step estimate $\tilde\theta_T$, we can use $\tilde u_t := f(v_t, \tilde\theta_T)$ to estimate the asymptotic variance matrix. Many nonparametric estimators of the variance matrix are available in the literature. In this paper, we consider a class of quadratic variance estimators, which includes the conventional kernel variance estimators of Andrews (1991), Newey and West (1987), and Politis (2011), the sharp and steep kernel variance estimators of Phillips, Sun and Jin (2006, 2007), and the orthonormal series variance estimators of Phillips (2005), Müller (2007), and Sun (2011, 2013) as special cases. Following Phillips, Sun and Jin (2006, 2007), we refer to the conventional kernel estimators as contracted kernel estimators and the sharp and steep kernel estimators as exponentiated kernel estimators.


For a given value of $\theta$, the quadratic HAR variance estimator takes the form:
\[
W_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{s=1}^{T} Q_h\!\left(\frac{t}{T}, \frac{s}{T}\right) \left[ f(v_t,\theta) - \frac{1}{T} \sum_{\tau=1}^{T} f(v_\tau,\theta) \right] \left[ f(v_s,\theta) - \frac{1}{T} \sum_{\tau=1}^{T} f(v_\tau,\theta) \right]'.
\]
The feasible HAR variance estimator based on $\tilde\theta_T$ is then given by
\[
W_T(\tilde\theta_T) = \frac{1}{T} \sum_{t=1}^{T} \sum_{s=1}^{T} Q_h\!\left(\frac{t}{T}, \frac{s}{T}\right) \left( \tilde u_t - \frac{1}{T} \sum_{\tau=1}^{T} \tilde u_\tau \right) \left( \tilde u_s - \frac{1}{T} \sum_{\tau=1}^{T} \tilde u_\tau \right)'. \tag{2}
\]
In the above definitions, $Q_h(r,s)$ is a symmetric weighting function that depends on the smoothing parameter $h$. For conventional kernel estimators, $Q_h(r,s) = k((r-s)/b)$ and we take $h = 1/b$. For the sharp kernel estimator, $Q_h(r,s) = (1 - |r-s|)^\rho \cdot 1\{|r-s| < 1\}$ and we take $h = \rho$. For steep quadratic kernel estimators, $Q_h(r,s) = k^\rho(r-s)$ and we take $h = \sqrt{\rho}$. For the OS estimators, $Q_h(r,s) = K^{-1} \sum_{j=1}^{K} \phi_j(r) \phi_j(s)$ and we take $h = K$, where $\{\phi_j(r)\}$ are orthonormal basis functions on $L^2[0,1]$ satisfying $\int_0^1 \phi_j(r)\, dr = 0$. A prewhitened version of the above estimators can also be used. See, for example, Andrews and Monahan (1992) and Xiao and Linton (2002).

It is important to point out that the HAR variance estimator in (2) is based on the demeaned moment process $\tilde u_t - T^{-1} \sum_{\tau=1}^{T} \tilde u_\tau$. For over-identified GMM models, the sample average $T^{-1} \sum_{\tau=1}^{T} \tilde u_\tau$ is in general different from zero. Under the conventional asymptotic theory, demeaning is necessary to ensure the consistency of the HAR variance estimator for misspecified models. Under the new asymptotic theory, demeaning is necessary to ensure that the associated test statistics are asymptotically pivotal. Hall (2000, 2005: Sec 4.3) provides more discussion of the benefits of demeaning, including power improvement for over-identification tests. See also Sun and Kim (2012).
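To make the OS member of this class concrete, here is a minimal numerical sketch of the demeaned estimator in (2) with $Q_h(r,s) = K^{-1} \sum_{j=1}^{K} \phi_j(r) \phi_j(s)$. The paired Fourier functions $\sqrt{2}\cos(2\pi j r)$ and $\sqrt{2}\sin(2\pi j r)$ used here are one admissible basis (orthonormal on $L^2[0,1]$ with zero mean), not the only choice, and the AR(1) moment process is a toy stand-in for $\tilde u_t$.

```python
import numpy as np

def os_har_variance(u, K):
    """OS HAR variance estimator: (2) with Q_h(r,s) = K^{-1} sum_j phi_j(r) phi_j(s).

    u : (T, m) array of moment conditions u_t = f(v_t, theta_tilde)
    K : number of orthonormal basis functions (assumed even: cos/sin pairs)
    """
    T, m = u.shape
    r = np.arange(1, T + 1) / T
    u_dm = u - u.mean(axis=0)               # demeaned moment process
    W = np.zeros((m, m))
    for j in range(1, K // 2 + 1):
        for phi in (np.sqrt(2) * np.cos(2 * np.pi * j * r),
                    np.sqrt(2) * np.sin(2 * np.pi * j * r)):
            Lam = phi @ u_dm / np.sqrt(T)   # T^{-1/2} sum_t phi_j(t/T)(u_t - u_bar)
            W += np.outer(Lam, Lam)         # outer product for basis function j
    return W / K

# Toy AR(1) moment process standing in for the first-step residuals
rng = np.random.default_rng(0)
T, m = 200, 3
u = np.zeros((T, m))
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal(m)
W = os_har_variance(u, K=12)   # symmetric and positive semidefinite by construction
```

Being an average of outer products, the estimator is automatically positive semidefinite, which is what allows it to serve as the second-step weighting matrix.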

With the variance estimator $W_T(\tilde\theta_T)$, the two-step GMM estimator is:
\[
\hat\theta_T = \arg\min_{\theta \in \Theta} \, g_T(\theta)' W_T^{-1}(\tilde\theta_T)\, g_T(\theta).
\]
Suppose we want to test the linear null hypothesis $H_0 : R\theta_0 = r$ against the alternative $H_1 : R\theta_0 \ne r$, where $R$ is a $p \times d$ matrix with full row rank. Nonlinear restrictions can be converted to linear ones by the delta method. We consider three types of test statistics. The first type is the conventional Wald statistic. The normalized Wald statistic is
\[
\mathcal{W}_T := \mathcal{W}_T(\hat\theta_T) = T\,(R\hat\theta_T - r)' \left\{ R \left[ G_T(\hat\theta_T)' W_T^{-1}(\hat\theta_T) G_T(\hat\theta_T) \right]^{-1} R' \right\}^{-1} (R\hat\theta_T - r) / p,
\]
where $G_T(\theta) = \partial g_T(\theta) / \partial \theta'$.

When $p = 1$ and for one-sided alternative hypotheses, we can construct the t statistic $t_T = t_T(\hat\theta_T)$, where $t_T(\theta_T)$ is defined to be
\[
t_T(\theta_T) = \frac{\sqrt{T}\,(R\theta_T - r)}{\left\{ R \left[ G_T(\theta_T)' W_T^{-1}(\theta_T) G_T(\theta_T) \right]^{-1} R' \right\}^{1/2}}.
\]
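Given the pieces that enter the Wald formula above ($\hat\theta_T$, the Jacobian $G_T$, and the HAR weighting matrix $W_T$), the normalized statistic is a few lines of linear algebra. The sketch below is illustrative only; the matrices are randomly generated placeholders, not estimates from a real model.

```python
import numpy as np

def wald_stat(theta_hat, R, r, G, W, T):
    """Normalized Wald statistic:
    T (R theta - r)' { R [G' W^{-1} G]^{-1} R' }^{-1} (R theta - r) / p."""
    p = R.shape[0]
    V = np.linalg.inv(G.T @ np.linalg.solve(W, G))   # [G' W^{-1} G]^{-1}
    diff = R @ theta_hat - r
    return T * diff @ np.linalg.solve(R @ V @ R.T, diff) / p

# Hypothetical inputs purely to exercise the formula (m = 5 moments, d = 3 parameters)
rng = np.random.default_rng(1)
m, d, p, T = 5, 3, 2, 100
G = rng.standard_normal((m, d))                      # stand-in for G_T(theta_hat)
A = rng.standard_normal((m, m))
W = A @ A.T + m * np.eye(m)                          # positive definite stand-in for W_T
R, r = np.eye(d)[:p], np.zeros(p)                    # H0: first two coordinates are zero
theta_hat = rng.standard_normal(d) / np.sqrt(T)
stat = wald_stat(theta_hat, R, r, G, W, T)
```

Since $R[G'W^{-1}G]^{-1}R'$ is positive definite when $R$ has full row rank, the statistic is always nonnegative.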

The second type of test statistic is based on the likelihood ratio principle. Let $\hat\theta_{T,R}$ be the restricted second-step GMM estimator:
\[
\hat\theta_{T,R} = \arg\min_{\theta \in \Theta} \, g_T(\theta)' W_T^{-1}(\tilde\theta_T)\, g_T(\theta) \quad \text{s.t.} \quad R\theta = r.
\]


The likelihood ratio principle suggests the GMM distance statistic (or GMM criterion function statistic) given by
\[
\mathcal{D}_T := \left[ T g_T(\hat\theta_{T,R})' W_T^{-1}(\bar\theta_T)\, g_T(\hat\theta_{T,R}) - T g_T(\hat\theta_T)' W_T^{-1}(\bar\theta_T)\, g_T(\hat\theta_T) \right] / p,
\]
where $\bar\theta_T$ is a $\sqrt{T}$-consistent estimator of $\theta_0$, i.e., $\bar\theta_T - \theta_0 = O_p(1/\sqrt{T})$. Typically, we take $\bar\theta_T$ to be $\tilde\theta_T$, the unrestricted first-step estimator. To ensure that $\mathcal{D}_T \ge 0$, we use the same $W_T^{-1}(\bar\theta_T)$ in computing the restricted and unrestricted GMM criterion functions.

The third type of test statistic is the GMM counterpart of the score statistic or Lagrange Multiplier (LM) statistic. It is based on the score or gradient of the GMM criterion function, i.e., $\Delta_T(\theta) = G_T(\theta)' W_T^{-1}(\bar\theta_T)\, g_T(\theta)$. The test statistic is given by
\[
\mathcal{S}_T = T\, \Delta_T(\hat\theta_{T,R})' \left[ G_T(\hat\theta_{T,R})' W_T^{-1}(\bar\theta_T)\, G_T(\hat\theta_{T,R}) \right]^{-1} \Delta_T(\hat\theta_{T,R}) / p,
\]
where, as before, $\bar\theta_T$ is a $\sqrt{T}$-consistent estimator of $\theta_0$.

It is implicit in our notation that we use the same smoothing parameter $h$ in $W_T(\bar\theta_T)$, $W_T(\tilde\theta_T)$ and $W_T(\hat\theta_T)$. In principle, we can use different smoothing parameters, but the asymptotic distributions will become very complicated. For future research, it is interesting to explore the advantages of using different smoothing parameters. In this paper we employ the same smoothing parameter in all HAR variance estimators.

Under the usual asymptotics where $h \to \infty$ but at a slower rate than the sample size $T$, all three statistics $\mathcal{W}_T$, $\mathcal{D}_T$ and $\mathcal{S}_T$ are asymptotically distributed as $\chi^2_p / p$, and the t-statistic $t_T$ is asymptotically normal under the null. The question is, what are the limiting distributions of $\mathcal{W}_T$, $\mathcal{D}_T$, $\mathcal{S}_T$ and $t_T$ when $h$ is held fixed as $T \to \infty$? The rationale for considering this type of thought experiment is that it may deliver asymptotic approximations that are more accurate than the chi-square or normal approximations in finite samples.

When $h$ is fixed, that is, when $b$, $\rho$ or $K$ is fixed, the variance matrix estimator involves a fixed amount of smoothing in that it is approximately equal to an average of a fixed number of quantities from a frequency domain perspective. The fixed-$b$, fixed-$\rho$ or fixed-$K$ asymptotics may be collectively referred to as the fixed-smoothing asymptotics. Correspondingly, the conventional asymptotics under which $h \to \infty$ and $T \to \infty$ jointly is referred to as the increasing-smoothing asymptotics. In view of the definition of $h$ for each HAR variance estimator, the magnitude of $h$ indicates the amount or level of smoothing in each case.

3 The Fixed-smoothing Asymptotics

3.1 Fixed-smoothing asymptotics for the variance estimator

To establish the fixed-smoothing asymptotics, we first introduce another HAR variance estimator. Let
\[
Q^*_{T,h}(r,s) = Q_h(r,s) - \frac{1}{T} \sum_{\tau_1=1}^{T} Q_h\!\left(\frac{\tau_1}{T}, s\right) - \frac{1}{T} \sum_{\tau_2=1}^{T} Q_h\!\left(r, \frac{\tau_2}{T}\right) + \frac{1}{T^2} \sum_{\tau_1=1}^{T} \sum_{\tau_2=1}^{T} Q_h\!\left(\frac{\tau_1}{T}, \frac{\tau_2}{T}\right)
\]
be the demeaned version of $Q_h(r,s)$; then we have
\[
W_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{s=1}^{T} Q^*_{T,h}\!\left(\frac{t}{T}, \frac{s}{T}\right) f(v_t,\theta)\, f(v_s,\theta)'. \tag{3}
\]
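The equivalence of (2) and (3), namely that demeaning the moment process is the same as demeaning the weighting function, is an exact finite-sample identity that is easy to verify numerically. The Bartlett kernel below is just one admissible choice of $Q_h$, and the moment process is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(3)
T, m, b = 50, 2, 0.2
u = rng.standard_normal((T, m))         # stand-in for f(v_t, theta)
r = np.arange(1, T + 1) / T
Q = np.maximum(1 - np.abs(r[:, None] - r[None, :]) / b, 0)  # Bartlett Q_h(t/T, s/T)

# (2): demean the moment process, keep Q_h
u_dm = u - u.mean(axis=0)
W2 = u_dm.T @ Q @ u_dm / T

# (3): demean the weighting function Q_h instead, keep the raw moments
Qstar = Q - Q.mean(axis=0, keepdims=True) - Q.mean(axis=1, keepdims=True) + Q.mean()
W3 = u.T @ Qstar @ u / T                # identical to W2 up to rounding
```

Expanding the products in (2) shows the cross terms collect exactly into the row-mean, column-mean, and grand-mean corrections that define $Q^*_{T,h}$, so the two matrices agree to machine precision.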


As $T \to \infty$, we obtain the "continuous" version of $Q^*_{T,h}(r,s)$ given by
\[
Q^*_h(r,s) = Q_h(r,s) - \int_0^1 Q_h(\tau_1, s)\, d\tau_1 - \int_0^1 Q_h(r, \tau_2)\, d\tau_2 + \int_0^1 \!\! \int_0^1 Q_h(\tau_1, \tau_2)\, d\tau_1\, d\tau_2.
\]
$Q^*_h(r,s)$ can be regarded as a centered version of $Q_h(r,s)$, as it satisfies $\int_0^1 Q^*_h(r,\tau)\, dr = \int_0^1 Q^*_h(\tau,s)\, ds = 0$ for any $\tau$. For OS HAR variance estimators, we have $Q^*_h(r,s) = Q_h(r,s)$, and so centering is not necessary.

For our theoretical development, it is helpful to define the HAR variance estimator:
\[
W^*_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{s=1}^{T} Q^*_h\!\left(\frac{t}{T}, \frac{s}{T}\right) f(v_t,\theta)\, f(v_s,\theta)'.
\]
It is shown in Lemma 1(a) below that $W_T(\bar\theta_T)$ and $W^*_T(\bar\theta_T)$ are asymptotically equivalent for any $\sqrt{T}$-consistent estimator $\bar\theta_T$. So the limiting distribution of $W_T(\bar\theta_T)$ follows from that of $W^*_T(\bar\theta_T)$.

To establish the fixed-smoothing asymptotics of $W^*_T(\bar\theta_T)$, and hence that of $W_T(\bar\theta_T)$, we maintain Assumption 1 on the kernel function and basis functions.

Assumption 1 (a) For kernel HAR variance estimators, the kernel function $k(\cdot)$ satisfies the following condition: for any $b \in (0,1]$ and $\rho \ge 1$, $k_b(x)$ and $k^\rho(x)$ are symmetric, continuous, piecewise monotonic, and piecewise continuously differentiable on $[-1,1]$. (b) For the OS HAR variance estimator, the basis functions $\phi_j(\cdot)$ are piecewise monotonic, continuously differentiable and orthonormal in $L^2[0,1]$, and $\int_0^1 \phi_j(x)\, dx = 0$.

Assumption 1 on the kernel function is very mild. It includes many commonly used kernel functions such as the Bartlett kernel, Parzen kernel, QS kernel, Daniell kernel and Tukey-Hanning kernel. Assumption 1 ensures the asymptotic equivalence of $W_T(\bar\theta_T)$ and $W^*_T(\bar\theta_T)$. In addition, under Assumption 1, we can use the Fourier series expansion to show that $Q^*_h(r,s)$ has the following unified representation for all the HAR variance estimators we consider:
\[
Q^*_h(r,s) = \sum_{j=1}^{\infty} \lambda_j \Phi_j(r) \Phi_j(s), \tag{4}
\]
where $\{\Phi_j(\cdot)\}$ is a sequence of continuously differentiable functions satisfying $\int_0^1 \Phi_j(r)\, dr = 0$. The right-hand side of (4) converges absolutely and uniformly over $(r,s) \in [0,1] \times [0,1]$. For notational simplicity, the dependence of $\lambda_j$ and $\Phi_j(\cdot)$ on $h$ has been suppressed. For positive semi-definite kernels $k(\cdot)$, the above representation also follows from Mercer's theorem, in which case $\lambda_j$ is the eigenvalue and $\Phi_j(\cdot)$ is the corresponding orthonormal eigenfunction of the Fredholm operator with kernel $Q^*_h(\cdot,\cdot)$. Alternatively, the representation can be regarded as a spectral decomposition of this Fredholm operator, which can be shown to be compact. This representation enables us to give a unified proof for the contracted kernel HAR variance estimator, the exponentiated kernel HAR variance estimator, and the OS HAR variance estimator. For kernel HAR variance estimation, there is no need to single out non-smooth kernels such as the Bartlett kernel and treat them differently as in KV (2005).
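The representation (4) can be inspected numerically: the matrix $[Q^*_h(t/T, s/T)]/T$ discretizes the Fredholm operator, its eigenvalues approximate $\lambda_j$, and its eigenvectors approximate the mean-zero functions $\Phi_j$. The Bartlett kernel with $b = 0.2$ is an illustrative choice, not one singled out by the paper.

```python
import numpy as np

T, b = 400, 0.2
r = np.arange(1, T + 1) / T
Q = np.maximum(1 - np.abs(r[:, None] - r[None, :]) / b, 0)   # Bartlett k((r-s)/b)
# Centered version Q_h^* on the grid t/T
Qstar = Q - Q.mean(axis=0, keepdims=True) - Q.mean(axis=1, keepdims=True) + Q.mean()

evals, evecs = np.linalg.eigh(Qstar / T)   # eigh returns ascending eigenvalues
evals = evals[::-1]                        # reorder: lambda_1 >= lambda_2 >= ...
top_vec = evecs[:, -1]                     # discretized leading eigenfunction
```

The eigenvalues are nonnegative (the Bartlett kernel is positive semidefinite and centering is a projection) and decay quickly, and any eigenvector with a nonzero eigenvalue sums to zero, mirroring $\int_0^1 \Phi_j(r)\, dr = 0$.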

Define
\[
G_t(\theta) = \frac{\partial g_t(\theta)}{\partial \theta'} = \frac{1}{T} \sum_{j=1}^{t} \frac{\partial f(v_j,\theta)}{\partial \theta'} \quad \text{for } t \ge 1, \qquad \text{and} \quad G_0(\theta) = 0 \text{ for all } \theta \in \Theta.
\]


Let $u_t = f(v_t, \theta_0)$ and
\[
\Phi_0(t) \equiv 1, \qquad e_t \sim \text{iid } N(0, I_m).
\]
We make the following four assumptions on the GMM estimators and the data generating process.

Assumption 2 As $T \to \infty$, $\hat\theta_T = \theta_0 + o_p(1)$, $\hat\theta_{T,R} = \theta_0 + o_p(1)$, and $\tilde\theta_T = \theta_0 + o_p(1)$ for an interior point $\theta_0 \in \Theta$.

Assumption 3 $\sum_{j=-\infty}^{\infty} \|\Gamma_j\| < \infty$, where $\Gamma_j = E u_t u_{t-j}'$.

Assumption 4 For any $\theta_T = \theta_0 + o_p(1)$, $\mathrm{plim}_{T\to\infty} G_{[rT]}(\theta_T) = rG$ uniformly in $r$, where $G = G(\theta_0)$ and $G(\theta) = E\, \partial f(v_t,\theta)/\partial\theta'$.

Assumption 5 (a) $T^{-1/2} \sum_{t=1}^{T} \Phi_j(t/T)\, u_t$ converges weakly to a continuous distribution, jointly over $j = 0, 1, \ldots, J$ for every finite $J$.
(b) The following holds:
\[
P\!\left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \Phi_j\!\left(\frac{t}{T}\right) u_t \le x \text{ for } j = 0, 1, \ldots, J \right) = P\!\left( \Lambda \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \Phi_j\!\left(\frac{t}{T}\right) e_t \le x \text{ for } j = 0, 1, \ldots, J \right) + o(1) \quad \text{as } T \to \infty
\]
for every finite $J$, where $x \in \mathbb{R}^m$ and $\Lambda$ is the matrix square root of $\Omega$, i.e., $\Lambda \Lambda' = \Omega := \sum_{j=-\infty}^{\infty} \Gamma_j$.

Assumptions 2–4 are standard assumptions in the literature on the fixed-smoothing asymptotics. They are the same as those in Kiefer and Vogelsang (2005) and Sun and Kim (2012), among others. Assumption 5 is a variant of the standard multivariate CLT. If Assumption 1 holds and
\[
T^{1/2} g_{[rT]}(\theta_0) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[rT]} u_t \stackrel{d}{\to} \Lambda B_m(r),
\]
where $B_m(r)$ is a standard Brownian motion, then Assumption 5 holds. So Assumption 5 is weaker than the above FCLT, which is typically assumed in the literature on the fixed-smoothing asymptotics.

A great advantage of maintaining only the CLT assumption is that our results can be easily generalized to the case of higher-dimensional dependence, such as spatial dependence and spatial-temporal dependence. In fact, Sun and Kim (2014) have established the fixed-smoothing asymptotics in the spatial setting using only the CLT assumption. They avoid the more restrictive FCLT assumption maintained in BCHV (2011). With some minor notational change and a small change in Assumption 4 as in Sun and Kim (2014), our results remain valid in the spatial setting.

Assumption 5(b) only assumes that the approximation error is $o(1)$, which is enough for our first-order fixed-smoothing asymptotics. However, it is useful to discuss the composition of the approximation error. Let $\Phi_J(t) = (\Phi_0(t), \Phi_1(t), \ldots, \Phi_J(t))'$ and let $a_J \in \mathbb{R}^{J+1}$ be a vector of the same dimension. It follows from Lemma 1 in Taniguchi and Puri (1996) that under some additional conditions:
\[
P\!\left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} a_J' \Phi_J\!\left(\frac{t}{T}\right) u_t \le x \right) = P\!\left( \Lambda \frac{1}{\sqrt{T}} \sum_{t=1}^{T} a_J' \Phi_J\!\left(\frac{t}{T}\right) e_t \le x \right) + \frac{c(x)}{\sqrt{T}} + O\!\left(\frac{1}{T}\right)
\]


for some function $c(x)$. In the above Edgeworth expansion, the approximation error of order $c(x)/\sqrt{T}$ captures the skewness of $u_t$. When $u_t$ is Gaussian or has a symmetric distribution, this term disappears. Part of the approximation error of order $O(1/T)$ comes from the stochastic dependence of $\{u_t\}$. If we replace the iid Gaussian process $\{\Lambda e_t\}$ by a dependent Gaussian process $\{u_t^N\}$ that has the same covariance structure as $\{u_t\}$, then we can remove this part of the approximation error.

To present Assumption 5 and our theoretical results more compactly, we introduce the notion of asymptotic equivalence in distribution. Consider two stochastically bounded sequences of random vectors $\xi_T \in \mathbb{R}^p$ and $\eta_T \in \mathbb{R}^p$. We say that they are asymptotically equivalent in distribution, and write $\xi_T \overset{a}{\sim} \eta_T$, if and only if $E f(\xi_T) = E f(\eta_T) + o(1)$ as $T \to \infty$ for all bounded and continuous functions $f(\cdot)$ on $\mathbb{R}^p$. According to Lemma 2 in the appendix, Assumption 5 is equivalent to
\[
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \Phi_j\!\left(\frac{t}{T}\right) u_t \;\overset{a}{\sim}\; \Lambda \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \Phi_j\!\left(\frac{t}{T}\right) e_t
\]
jointly over $j = 0, 1, \ldots, J$.

Let
\[
\Omega_j(\theta) = \left[ \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \Phi_j\!\left(\frac{t}{T}\right) f(v_t, \theta) \right] \left[ \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \Phi_j\!\left(\frac{t}{T}\right) f(v_t, \theta) \right]'.
\]
Using the uniform series representation in (4), we can write
\[
W^*_T(\theta) = \sum_{j=1}^{\infty} \lambda_j \Omega_j(\theta),
\]
which is an infinite weighted sum of outer products. To establish the fixed-smoothing asymptotic distribution of $W^*_T(\theta)$ under Assumption 5, we split $W^*_T(\theta)$ into a finite-sum part and a remainder part: $W^*_T(\theta) = \sum_{j=1}^{J} \lambda_j \Omega_j(\theta) + \sum_{j=J+1}^{\infty} \lambda_j \Omega_j(\theta)$. For each finite $J$, we can use Assumptions 2–5 to obtain the asymptotically equivalent distribution of the first term. However, for the second term to vanish, we require $J \to \infty$. To close the gap in our argument, we use Lemma 4 in the appendix, which puts our proof on a rigorous footing and may be of independent interest.

Lemma 1 Let Assumptions 1–5 hold. Then for any $\sqrt{T}$-consistent estimator $\bar\theta_T$ and for a fixed $h$:
(a) $W_T(\bar\theta_T) = W^*_T(\bar\theta_T) + O_p(1/T)$;
(b) $W^*_T(\bar\theta_T) = W^*_T(\theta_0) + o_p(1)$;
(c) $W^*_T(\bar\theta_T) \overset{a}{\sim} W_{eT}$ and $W_T(\bar\theta_T) \overset{a}{\sim} W_{eT}$, where
\[
W_{eT} = \Lambda \left[ T^{-1} \sum_{t=1}^{T} \sum_{\tau=1}^{T} Q_h(t/T, \tau/T)\, (e_t - \bar e)(e_\tau - \bar e)' \right] \Lambda' \quad \text{and} \quad \bar e = \frac{1}{T} \sum_{t=1}^{T} e_t;
\]
(d) $W^*_T(\bar\theta_T) \overset{d}{\to} W_\infty$ and $W_T(\bar\theta_T) \overset{d}{\to} W_\infty$, where
\[
W_\infty = \Lambda \tilde W_\infty \Lambda' \quad \text{and} \quad \tilde W_\infty = \int_0^1 \!\! \int_0^1 Q^*_h(r,s)\, dB_m(r)\, dB_m(s)'.
\]


Lemma 1(c)(d) gives a unified representation of the asymptotically equivalent distribution and the limiting distribution of $W_T^*(\tilde\theta_T)$ and $W_T(\tilde\theta_T)$. The representation applies to smooth kernels as well as nonsmooth kernels such as the Bartlett kernel.

Although we do not make the FCLT assumption, the limiting distributions of $W_T^*(\tilde\theta_T)$ and $W_T(\tilde\theta_T)$ can still be represented by a functional of Brownian motion, as in Lemma 1(d). This representation serves two purposes. First, it gives an explicit representation of the limiting distribution. Second, it enables us to compare the results we obtain here with existing results. It is reassuring that the limiting distribution $W_\infty$ is the same as that obtained by Kiefer and Vogelsang (2005), Phillips, Sun and Jin (2006, 2007), and Sun and Kim (2012), among others.

It is standard practice to obtain the limiting distribution in order to conduct asymptotically valid inference. However, this is not necessary, as we can simply use the asymptotically equivalent distribution in Lemma 1(c). There are some advantages in doing so. First, it is easier to simulate the asymptotically equivalent distribution, as it involves only $T$ iid normal vectors; in contrast, to simulate the Brownian motion in the limiting distribution, we need to employ a large number of iid normal vectors. Second, the asymptotically equivalent distribution can provide a more accurate approximation in some cases. Note that $W_{eT}$ takes the same form as $W_T^*(\theta_0)$: if $f(v_t,\theta_0)$ is iid $N(0,\Omega)$, then $W_T^*(\theta_0)$ and $W_{eT}$ have the same distribution in finite samples, and there is no approximation error. In contrast, the limiting distribution $W_\infty$ is always an approximation to the finite sample distribution of $W_T^*(\theta_0)$.
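To make this concrete, a single draw of $W_{eT}$ requires only $T$ iid normal vectors. The sketch below takes $\Lambda = I_m$ for simplicity and treats the weighting function $Q_h$ as a user-supplied input; the Bartlett form $Q_h(r,s)=k((r-s)/b)$ and the bandwidth value in the example are illustrative choices of ours, not prescriptions from the paper:

```python
import numpy as np

def simulate_WeT(T, m, Qh, rng=None):
    """One draw of the random matrix
    W_eT = (1/T) * sum_{t,s} Q_h(t/T, s/T) (e_t - ebar)(e_s - ebar)'
    with Lambda = I_m and e_t iid N(0, I_m)."""
    rng = np.random.default_rng(rng)
    e = rng.standard_normal((T, m))
    e_tilde = e - e.mean(axis=0)            # demeaned: e_t - ebar
    grid = np.arange(1, T + 1) / T
    Q = Qh(grid[:, None], grid[None, :])    # T x T matrix of Q_h(t/T, s/T)
    return e_tilde.T @ Q @ e_tilde / T      # m x m random matrix

# Illustrative kernel weighting: Q_h(r, s) = k((r - s)/b) with the
# Bartlett kernel and a hypothetical bandwidth fraction b.
def bartlett_Qh(r, s, b=0.2):
    return np.maximum(1.0 - np.abs(r - s) / b, 0.0)

W = simulate_WeT(200, 3, bartlett_Qh, rng=0)
```

Repeating such draws and forming the quadratic forms below yields the asymptotically equivalent distributions directly.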

3.2 Fixed-smoothing asymptotics for the test statistics

We now establish the asymptotic distributions of $W_T$, $D_T$, $S_T$ and $t_T$ when $h$ is fixed. For kernel variance estimators, we focus on the case that $k(\cdot)$ is positive definite, as otherwise the two-step estimator may not even be consistent. In this case, we can show that, for all the variance estimators we consider, $W_\infty$ is nonsingular with probability one. Hence, using Lemmas 1 and 3 in the appendix, we have $W_T(\tilde\theta_T)^{-1} \overset{a}{\sim} W_{eT}^{-1}$ and $W_T(\tilde\theta_T)^{-1} \overset{d}{\to} W_\infty^{-1}$. Using these results and Assumptions 1-5, we have

\[
\begin{aligned}
\sqrt{T}\bigl(\hat\theta_T-\theta_0\bigr) &= -\bigl[G'W_T(\tilde\theta_T)^{-1}G\bigr]^{-1}G'W_T(\tilde\theta_T)^{-1}\sqrt{T}g_T(\theta_0)+o_p(1)\\
&\overset{a}{\sim} -\bigl(G'W_{eT}^{-1}G\bigr)^{-1}G'W_{eT}^{-1}\Lambda\frac{1}{\sqrt T}\sum_{t=1}^T e_t \qquad(5)\\
&\overset{d}{\to} -\bigl(G'W_\infty^{-1}G\bigr)^{-1}G'W_\infty^{-1}\Lambda B_m(1).
\end{aligned}
\]
The limiting distribution is not normal but rather mixed normal. More specifically, $W_\infty^{-1}$ is independent of $B_m(1)$, so conditional on $W_\infty^{-1}$, the limiting distribution is normal with the conditional variance depending on $W_\infty^{-1}$.

Using (5) and Lemma 3 in the appendix, we have $W_T \overset{a}{\sim} F_{eT}$ and $W_T \overset{d}{\to} F_\infty$, where
\[
\begin{aligned}
F_{eT} = {}& \left[R\bigl(G'W_{eT}^{-1}G\bigr)^{-1}G'W_{eT}^{-1}\Lambda\frac{1}{\sqrt T}\sum_{t=1}^T e_t\right]'
\left[R\bigl(G'W_{eT}^{-1}G\bigr)^{-1}R'\right]^{-1}\\
&\times\left[R\bigl(G'W_{eT}^{-1}G\bigr)^{-1}G'W_{eT}^{-1}\Lambda\frac{1}{\sqrt T}\sum_{t=1}^T e_t\right]\Big/\,p,
\end{aligned}
\]


and
\[
\begin{aligned}
F_\infty = {}&\bigl[R(G'W_\infty^{-1}G)^{-1}G'W_\infty^{-1}\Lambda B_m(1)\bigr]'\bigl[R(G'W_\infty^{-1}G)^{-1}R'\bigr]^{-1}\\
&\times\bigl[R(G'W_\infty^{-1}G)^{-1}G'W_\infty^{-1}\Lambda B_m(1)\bigr]\Big/\,p.
\end{aligned}
\]

To show that $F_{eT}$ and $F_\infty$ are pivotal, we let $e_t := (e_{t,p}', e_{t,d-p}', e_{t,q}')'$. The subscripts $p$, $d-p$, and $q$ on $e$ indicate not only the dimensions of the random vectors but also distinguish them, so that, for example, $e_{t,p}$ is different from and independent of $e_{t,q}$ for all values of $p$ and $q$. Denote
\[
C_{p,T}=\frac{1}{\sqrt T}\sum_{t=1}^T e_{t,p},\qquad C_{q,T}=\frac{1}{\sqrt T}\sum_{t=1}^T e_{t,q}
\]
and
\[
\begin{aligned}
C_{pp,T}&=\frac1T\sum_{t=1}^T\sum_{\tau=1}^T Q_h\!\left(\frac tT,\frac\tau T\right)\tilde e_{t,p}\tilde e_{\tau,p}',&
C_{pq,T}&=\frac1T\sum_{t=1}^T\sum_{\tau=1}^T Q_h\!\left(\frac tT,\frac\tau T\right)\tilde e_{t,p}\tilde e_{\tau,q}', \qquad(6)\\
C_{qq,T}&=\frac1T\sum_{t=1}^T\sum_{\tau=1}^T Q_h\!\left(\frac tT,\frac\tau T\right)\tilde e_{t,q}\tilde e_{\tau,q}',&
D_{pp,T}&=C_{pp,T}-C_{pq,T}C_{qq,T}^{-1}C_{pq,T}',
\end{aligned}
\]
where $\tilde e_{i,j}=e_{i,j}-T^{-1}\sum_{s=1}^T e_{s,j}$ for $i=t,\tau$ and $j=p,q$. Similarly, let
\[
B_m(r):=\bigl(B_p'(r),B_{d-p}'(r),B_q'(r)\bigr)'
\]
where $B_p(r)$, $B_{d-p}(r)$ and $B_q(r)$ are independent standard Brownian motion processes of dimensions $p$, $d-p$, and $q$, respectively. Denote
\[
\begin{aligned}
C_{pp}&=\int_0^1\!\!\int_0^1 Q_h^*(r,s)\,dB_p(r)\,dB_p(s)',&
C_{pq}&=\int_0^1\!\!\int_0^1 Q_h^*(r,s)\,dB_p(r)\,dB_q(s)', \qquad(7)\\
C_{qq}&=\int_0^1\!\!\int_0^1 Q_h^*(r,s)\,dB_q(r)\,dB_q(s)',&
D_{pp}&=C_{pp}-C_{pq}C_{qq}^{-1}C_{pq}'.
\end{aligned}
\]

Then using the theory of multivariate statistics, we obtain the following theorem.

Theorem 1 Let Assumptions 1-5 hold. Assume that the $k(\cdot)$ used in the kernel HAR variance estimation is positive definite. Then, for a fixed $h$,

(a) $W_T(\hat\theta_T)\overset{a}{\sim}\bigl[C_{p,T}-C_{pq,T}C_{qq,T}^{-1}C_{q,T}\bigr]'D_{pp,T}^{-1}\bigl[C_{p,T}-C_{pq,T}C_{qq,T}^{-1}C_{q,T}\bigr]/p \overset{d}{=} F_{eT}$;

(b) $W_T(\hat\theta_T)\overset{d}{\to}\bigl[B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)\bigr]'D_{pp}^{-1}\bigl[B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)\bigr]/p \overset{d}{=} F_\infty$;

(c) $t_T(\hat\theta_T)\overset{a}{\sim} t_{eT}:=\bigl[C_{p,T}-C_{pq,T}C_{qq,T}^{-1}C_{q,T}\bigr]\big/\sqrt{D_{pp,T}}$;

(d) $t_T(\hat\theta_T)\overset{d}{\to} t_\infty:=\bigl[B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)\bigr]\big/\sqrt{D_{pp}}$,

where $(C_{p,T}',C_{q,T}')'$ is independent of $C_{pq,T}C_{qq,T}^{-1}$ and $D_{pp,T}$, and $(B_p'(1),B_q'(1))'$ is independent of $C_{pq}C_{qq}^{-1}$ and $D_{pp}$.

Remark 1 Theorems 1(a) and (b) show that both the asymptotically equivalent distribution $F_{eT}$ and the limiting distribution $F_\infty$ are pivotal. In particular, they do not depend on the long run variance $\Omega$. Both $F_{eT}$ and $F_\infty$ are nonstandard but can be simulated. It is easier to simulate $F_{eT}$, as it involves only $T$ iid standard normal vectors.


Remark 2 Let $W_T(\tilde\theta_T)$ be the Wald statistic based on the first-step GMM estimator $\tilde\theta_T$, constructed in the same way as $W_T(\hat\theta_T)$. It is easy to see that $W_T(\tilde\theta_T)\overset{d}{\to} B_p(1)'C_{pp}^{-1}B_p(1)/p$. See KV (2005) for contracted kernel HAR variance estimators, Phillips, Sun and Jin (2006, 2007) for exponentiated kernel HAR variance estimators, and Sun (2013) for OS HAR variance estimators. Comparing the limit $B_p(1)'C_{pp}^{-1}B_p(1)/p$ with the $F_\infty$ given in Theorem 1, we can see that $F_\infty$ contains additional adjustment terms.

Remark 3 When the model is exactly identified, we have $q=0$. In this case,
\[
F_\infty\overset{d}{=}B_p(1)'C_{pp}^{-1}B_p(1)/p.
\]
This limit is the same as that in the one-step GMM framework. This is not surprising: when the model is exactly identified, the GMM weighting matrix becomes irrelevant and the two-step estimator is numerically identical to the one-step estimator.

Remark 4 It is interesting that the limiting distribution $F_\infty$ (and the asymptotically equivalent distribution) depends on the degree of over-identification. Typically such dependence shows up only when the number of moment conditions is assumed to grow with the sample size. Here the number of moment conditions is fixed. The degree of over-identification remains in the limiting distribution because it carries information on the dimension of the random limiting matrix $\tilde W_\infty$.

Theorem 2 Let Assumptions 1-5 hold. Then, for a fixed $h$,
\[
D_T=W_T+o_p(1)\quad\text{and}\quad S_T=W_T+o_p(1).
\]

It follows from Theorem 2 that all three statistics are asymptotically equivalent in distribution to $F_{eT}$, as given in Theorem 1(a). They also share the same limiting distribution $F_\infty$. So the asymptotic equivalence of the three statistics under the increasing-smoothing asymptotics remains valid under the fixed-smoothing asymptotics.

Careful inspection of the proofs of Theorems 1 and 2 shows that the two theorems hold regardless of which $\sqrt T$-consistent estimator of $\theta_0$ we use in estimating $W_T(\theta_0)$ and $G_T(\theta_0)$ in $D_T$ and $S_T$. However, the choice may make a difference at higher orders.

4 Representation and Approximation of the Fixed-smoothing Asymptotic Distribution

In this section, we establish different representations of the fixed-smoothing limiting distribution. These representations highlight the connection and the difference between the nonstandard approximation and the standard $\chi^2$ or normal approximation.

4.1 The case of OS HAR variance estimation

To obtain a new representation of $F_\infty$, we consider the matrix
\[
K\begin{pmatrix}C_{pp}&C_{pq}\\ C_{pq}'&C_{qq}\end{pmatrix}
\overset{d}{=}K\int_0^1\!\!\int_0^1 Q_h^*(r,s)\,dB_{p+q}(r)\,dB_{p+q}'(s)
=\sum_{j=1}^K\left(\int_0^1\Phi_j(r)\,dB_{p+q}(r)\right)\left(\int_0^1\Phi_j(r)\,dB_{p+q}(r)\right)'
\]


where $B_{p+q}(r)$ is the standard Brownian motion of dimension $p+q$. Since $\{\Phi_j(r)\}$ are orthonormal, $\int_0^1\Phi_j(r)\,dB_{p+q}(r)\sim\text{iid }N(0,I_{p+q})$ and the above matrix follows a standard Wishart distribution $\mathcal{W}_{p+q}(K,I_{p+q})$. A well known property of a Wishart random matrix is that $D_{pp}=C_{pp}-C_{pq}C_{qq}^{-1}C_{pq}'\sim\mathcal{W}_p(K-q,I_p)/K$ and is independent of both $C_{pq}$ and $C_{qq}$. This implies that $D_{pp}$ is independent of $B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)$, and that $F_\infty$ is equal in distribution to a quadratic form with an independent scaled Wishart matrix as the (inverse) weighting matrix.

This brings $F_\infty$ close to Hotelling's $T^2$ distribution, which is the same as a standard $F$ distribution after a multiplicative adjustment. The only difference is that $B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)$ is not normal, and hence $\|B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)\|^2$ does not follow a chi-square distribution. However, conditional on $C_{pq}C_{qq}^{-1}B_q(1)$, $\|B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)\|^2$ follows a noncentral $\chi_p^2$ distribution with noncentrality parameter
\[
\delta^2=\|C_{pq}C_{qq}^{-1}B_q(1)\|^2.
\]
It then follows that $F_\infty$ is conditionally distributed as a noncentral $F$ distribution. Let
\[
\kappa=\frac{K}{K-p-q+1},\qquad \lambda^2=E\delta^2=\frac{pq}{K-q-1}.
\]

The following theorem presents this result formally.

Theorem 3 For OS HAR variance estimation, we have

(a) $\kappa^{-1}F_\infty\overset{d}{=}F_{p,K-p-q+1}(\delta^2)$, a mixed noncentral $F$ random variable with random noncentrality parameter $\delta^2$;

(b) $P(\kappa^{-1}F_\infty<z)=F_{p,K-p-q+1}(z;\lambda^2)+o(K^{-1})$ as $K\to\infty$;

(c) $P(pF_\infty<z)=G_p(z)-G_p'(z)z\dfrac{p+2q-1}{K}+G_p''(z)z^2\dfrac1K+o\!\left(\dfrac1K\right)$ as $K\to\infty$, where $G_p(z)$ is the CDF of the chi-square distribution $\chi_p^2$.

Remark 5 According to Theorem 3(a),
\[
F_\infty\overset{d}{=}\frac{\chi_p^2(\delta^2)/p}{\chi_{K-p-q+1}^2/K}
\]
where $\chi_p^2(\delta^2)$ is independent of $\chi_{K-p-q+1}^2$. It is easy to see that
\[
\chi_p^2(\delta^2)\overset{d}{=}\bigl\|C_{p\times1}-C_{p\times K}C_{q\times K}'\bigl(C_{q\times K}C_{q\times K}'\bigr)^{-1}C_{q\times1}\bigr\|^2
\]
where each $C_{m\times n}$ is an $m\times n$ matrix (vector) with iid standard normal elements and $C_{p\times1}$, $C_{p\times K}$, $C_{q\times K}$ and $C_{q\times1}$ are mutually independent. This provides a simple way to simulate $F_\infty$.
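The representation in Remark 5 can be turned into a few lines of code. The sketch below is ours (function name and defaults are illustrative); it draws from $F_\infty$ by simulating the noncentral chi-square numerator from iid normal matrices and an independent central chi-square denominator:

```python
import numpy as np

def simulate_F_infinity(p, q, K, n_rep=10000, rng=None):
    """Draws from F_infinity = [chi2_p(delta^2)/p] / [chi2_{K-p-q+1}/K]
    using the iid-normal representation of Remark 5 (OS HAR case)."""
    rng = np.random.default_rng(rng)
    draws = np.empty(n_rep)
    for i in range(n_rep):
        Cp1 = rng.standard_normal(p)
        if q > 0:
            CpK = rng.standard_normal((p, K))
            CqK = rng.standard_normal((q, K))
            Cq1 = rng.standard_normal(q)
            num_vec = Cp1 - CpK @ CqK.T @ np.linalg.solve(CqK @ CqK.T, Cq1)
        else:
            num_vec = Cp1                        # just identified: no adjustment
        chi2_num = num_vec @ num_vec             # ~ chi^2_p(delta^2)
        chi2_den = rng.chisquare(K - p - q + 1)  # independent central chi-square
        draws[i] = (chi2_num / p) / (chi2_den / K)
    return draws
```

A simulated critical value is then a sample quantile of the draws, e.g. `np.quantile(simulate_F_infinity(2, 1, 24), 0.95)`.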

Remark 6 Parts (a) and (b) of Theorem 3 are intriguing in that the limiting distribution under the null is approximated by a noncentral $F$ distribution, even though the noncentrality parameter goes to zero (in probability) as $K\to\infty$.

Remark 7 When $q=0$, that is, when the model is just identified, we have $\delta^2=\lambda^2=0$ and so
\[
\frac{K-p+1}{K}F_\infty\overset{d}{=}F_{p,K-p+1}.
\]
This result is the same as that obtained by Sun (2013). So an advantage of using the OS HAR variance estimator is that the fixed-smoothing asymptotic distribution is exactly a standard $F$ distribution for just identified models.


Remark 8 The asymptotics obtained under the specification that $K$ is fixed, followed by letting $K\to\infty$, is a sequential asymptotics. As $K\to\infty$, we may show that
\[
\kappa^{-1}F_\infty\overset{d}{=}\frac{\chi_p^2/p}{\chi_{K-p-q+1}^2/(K-p-q+1)}+o_p(1)=\chi_p^2/p+o_p(1).
\]
To the first order, the fixed-smoothing asymptotic distribution reduces to the distribution of $\chi_p^2/p$. As a result, if first order asymptotics are used in both steps of the sequential asymptotic theory, then the sequential asymptotic distribution is the same as the conventional joint asymptotic distribution. However, Theorem 3 is based not on first order asymptotics but on a high order asymptotics. The high order sequential asymptotics can be regarded as a convenient way to obtain an asymptotic approximation that better reflects the finite sample distribution of the test statistic $D_T$, $S_T$ or $W_T$.

Remark 9 All three types of approximations are distributional approximations rather than approximations via asymptotic expansions. More specifically, the finite sample distributions of $D_T$, $S_T$, and $W_T$ are approximated by the distributions of
\[
\frac{\chi_p^2}{p},\qquad \frac{\chi_p^2(\delta^2)/p}{\chi_{K-p-q+1}^2/K},\qquad\text{and}\qquad \frac{\chi_p^2(\lambda^2)/p}{\chi_{K-p-q+1}^2/K} \qquad(8)
\]
under the conventional joint asymptotics, the fixed-smoothing asymptotics, and the higher order sequential asymptotics, respectively. An advantage of distributional approximations is that they hold uniformly over their supports (by Pólya's Lemma).

Remark 10 Theorem 3(c) gives a first order distributional expansion of $F_\infty$. It is clear that the difference between $pF_\infty$ and $\chi_p^2$ depends on $K$, $p$ and $q$. Let $\mathcal{X}_p^{1-\alpha}=G_p^{-1}(1-\alpha)$ be the $(1-\alpha)$ quantile of $G_p(z)$; then
\[
P\bigl(pF_\infty\ge\mathcal{X}_p^{1-\alpha}\bigr)=\alpha+G_p'\bigl(\mathcal{X}_p^{1-\alpha}\bigr)\mathcal{X}_p^{1-\alpha}\frac{p+2q-1}{K}-G_p''\bigl(\mathcal{X}_p^{1-\alpha}\bigr)\bigl(\mathcal{X}_p^{1-\alpha}\bigr)^2\frac1K+o\!\left(\frac1K\right).
\]
For a typical critical value $\mathcal{X}_p^{1-\alpha}$, $G_p'(\mathcal{X}_p^{1-\alpha})>0$ and $G_p''(\mathcal{X}_p^{1-\alpha})<0$, so we expect $P(F_\infty\ge\mathcal{X}_p^{1-\alpha}/p)>\alpha$, at least when $K$ is large. The critical value from $F_\infty$ is therefore expected to be larger than $\mathcal{X}_p^{1-\alpha}/p$. For given $p$ and large $K$, the difference between $P(F_\infty\ge\mathcal{X}_p^{1-\alpha}/p)$ and $\alpha$ increases with $q$, the degree of over-identification. A practical implication of Theorem 3(c) is that when the degree of over-identification is large, using the chi-square critical value rather than the critical value from $F_\infty$ may lead to the finding of a statistically significant relationship that does not actually exist.

Figure 1 reports 95% critical values from the two approximations, the noncentral (scaled) $F_{p,K-p-q+1}(\kappa,\lambda^2)$ approximation and the nonstandard $F_\infty$ approximation, for different combinations of $p$ and $q$. The nonstandard $F_\infty$ distribution is simulated according to the representation in Remark 5, and the nonstandard critical values are based on 10000 simulation replications. Since the noncentral $F$ distribution appears naturally in power analysis, it has been implemented in standard statistical and programming packages, so critical values from the noncentral $F$ distribution can be obtained easily. Let $NCF^{1-\alpha}:=F_{p,K-p-q+1}^{1-\alpha}(\lambda^2)$ be the $(1-\alpha)$ quantile of the noncentral $F_{p,K-p-q+1}(\lambda^2)$ distribution; then $\kappa\,NCF^{1-\alpha}$ corresponds to $F_\infty^{1-\alpha}$, the $(1-\alpha)$


[Figure 1 here: four panels ($p=1,2,3,4$) plotting 95% critical values against $K\in[10,50]$, with curves for $F_\infty$ and NCF at $q=0,1,2,3$.]

Figure 1: 95% quantiles of the nonstandard distribution and its noncentral F approximation

quantile of the nonstandard $F_\infty$ distribution. Figure 1 graphs $\kappa\,NCF^{1-\alpha}$ and $F_\infty^{1-\alpha}$ against different values of $K$. The figure exhibits several patterns, all of which are consistent with Theorem 3(c). First, the noncentral $F$ critical values are remarkably close to the nonstandard critical values, so approximating $\delta^2$ by its mean $\lambda^2$ does not incur much approximation error. Second, the noncentral $F$ and nonstandard critical values increase with the degree of over-identification $q$. The increase is more pronounced when $K$ is smaller. This is expected, as both the noncentrality parameter $\lambda^2$ and the multiplicative factor $\kappa$ increase with $q$; in addition, as $q$ increases, the denominators in the noncentral $F$ and nonstandard distributions become more random. All these effects shift the probability mass to the right as $q$ increases. Third, for any given $p$ and $q$ combination, the noncentral $F$ and nonstandard critical values decrease monotonically as $K$ increases and approach the corresponding critical values from the chi-square distribution.

We have also considered approximating the 90% critical values. In this case, too, the noncentral $F$ critical values are very close to the nonstandard critical values.
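As noted, noncentral $F$ quantiles are available in standard packages, so the scaled critical value $\kappa\,NCF^{1-\alpha}$ of Theorem 3 is one line of code. A sketch using SciPy (the function name is ours):

```python
from scipy.stats import ncf

def ncf_critical_value(p, q, K, alpha=0.05):
    """Approximate (1 - alpha) critical value of F_infinity via Theorem 3:
    kappa * F^{1-alpha}_{p, K-p-q+1}(lambda^2)."""
    kappa = K / (K - p - q + 1)
    lam2 = p * q / (K - q - 1)        # lambda^2 = E(delta^2)
    return kappa * ncf.ppf(1 - alpha, p, K - p - q + 1, lam2)
```

For $q=0$ the noncentrality parameter vanishes and this reduces to the scaled central $F$ critical value of Remark 7.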

In parallel with Theorem 3, we can represent and approximate $t_\infty$ as follows.

Theorem 4 Let $\eta=K/(K-q)$. For OS HAR variance estimation, we have

(a) $t_\infty/\sqrt\eta\overset{d}{=}t_{K-q}(\delta)$, a mixed noncentral $t$ random variable with random noncentrality parameter $\delta=C_{1q}C_{qq}^{-1}B_q(1)$;

(b) $P(t_\infty/\sqrt\eta<z)=\frac12+\mathrm{sgn}(z)\frac12F_{1,K-q}(z^2;\lambda^2)+o(K^{-1})$ as $K\to\infty$, where $\lambda^2=\frac{q}{K-q-1}$ and $\mathrm{sgn}(\cdot)$ is the sign function;


(c) $P(t_\infty<z)=\Phi(z)-|z|\phi(z)\bigl[z^2+(4q+1)\bigr]/(4K)+o(K^{-1})$ as $K\to\infty$, where $\Phi(z)$ and $\phi(z)$ are the CDF and pdf of the standard normal distribution.

Theorem 4(a) is similar to Theorem 3(a): with a scale adjustment, the fixed-smoothing asymptotic distribution of the $t$ statistic is a mixed noncentral $t$ distribution.

According to Theorem 4(b), we can approximate the quantiles of $t_\infty$ by those of a noncentral $F$ distribution. More specifically, let $t_\infty^{1-\alpha}$ be the $(1-\alpha)$ quantile of $t_\infty$; then
\[
t_\infty^{1-\alpha}\doteq
\begin{cases}
\sqrt{\eta\,F_{1,K-q}^{1-2\alpha}(\lambda^2)}, & \alpha<0.5,\\[4pt]
-\sqrt{\eta\,F_{1,K-q}^{2\alpha-1}(\lambda^2)}, & \alpha\ge0.5,
\end{cases}
\]
where $F_{1,K-q}^{1-2\alpha}(\lambda^2)$ is the $(1-2\alpha)$ quantile of the noncentral $F$ distribution. For a two-sided $t$ test, the $(1-\alpha)$ quantile $|t_\infty|^{1-\alpha}$ of $|t_\infty|$ can be approximated by $\sqrt{\eta\,F_{1,K-q}^{1-\alpha}(\lambda^2)}$. As in the case of the quantile approximation for $F_\infty$, the above approximation is remarkably accurate. Figure 2, which is similar to Figure 1, illustrates this. The figure graphs $t_\infty^{1-\alpha}$ and its approximation against the values of $K$ for different degrees of over-identification and for $\alpha=0.05$ and $0.95$. As $K$ increases, the quantiles approach the normal quantiles $\pm1.645$. However, when $K$ is small or $q$ is large, there is a significant difference between the normal quantiles and the corresponding quantiles of $t_\infty$. This is consistent with Theorem 4(c). For a given small $K$, the absolute difference increases with the degree of over-identification $q$. For a given $q$, the absolute difference decreases with $K$.

Theorem 4(c) and Figure 2 suggest that the quantiles of $t_\infty$ are larger than the corresponding normal quantiles in absolute value. So the test based on the normal approximation rejects the null more often than the test based on the fixed-smoothing approximation. This provides an explanation for the large size distortion of the asymptotic normal test.

4.2 The case of kernel HAR variance estimation

In the case of kernel HAR variance estimation, $D_{pp}=C_{pp}-C_{pq}C_{qq}^{-1}C_{pq}'$ is not independent of $C_{pq}$ or $C_{qq}$. It is not as easy to simplify the nonstandard distribution as in the case of OS HAR variance estimation, and different proofs are needed.

Let
\[
\nu_1=\int_0^1 Q_h^*(r,r)\,dr\qquad\text{and}\qquad \nu_2=\int_0^1\!\!\int_0^1\bigl[Q_h^*(r,s)\bigr]^2\,dr\,ds. \qquad(9)
\]
In Lemma 5 in the appendix, we show that $\nu_1-1\asymp1/h$ and $\nu_2\asymp1/h$, where "$a\asymp b$" indicates that $a$ and $b$ are of the same order of magnitude.

Theorem 5 For kernel HAR variance estimation, we have

(a) $pF_\infty\overset{d}{=}\chi_p^2/\xi^2$, where $\chi_p^2$ is independent of $\xi^2$,
\[
\xi^2\overset{d}{=}\frac{e_p'\bigl(I_p+C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\bigr)e_p}
{e_p'\bigl(I_p+C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\bigr)\bigl(C_{pp}-C_{pq}C_{qq}^{-1}C_{pq}'\bigr)^{-1}\bigl(I_p+C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\bigr)e_p},
\]
and $e_p=(1,0,\dots,0)'\in\mathbb{R}^p$.

(b) As $h\to\infty$, we have $\xi^2\overset{p}{\to}1$ and
\[
P(pF_\infty<z)=G_p(z)+G_p'(z)z\bigl[(\nu_1-1)-\nu_2(p+2q-1)\bigr]+G_p''(z)z^2\nu_2+o(\nu_2),
\]
where, as before, $G_p(z)$ is the CDF of the chi-square distribution $\chi_p^2$.


[Figure 2 here: four panels ($q=0,1,2,3$) plotting the 5% and 95% quantiles of $t_\infty$ and their approximations against $K\in[0,50]$.]

Figure 2: 5% and 95% quantiles of the nonstandard distribution $t_\infty$ and their noncentral F approximations

Remark 11 Theorem 5(a) shows that $pF_\infty$ follows a scale-mixed chi-square distribution. Since $\xi^2\overset{p}{\to}1$ as $h\to\infty$, the sequential limit of $pW_T$ is the usual chi-square distribution. The result is the same as in the OS HAR variance estimation case. The virtue of Theorem 5(a) is that it gives an explicit characterization of the random scaling factor $\xi^2$.

Remark 12 Using Theorem 5(a), we have $P(pF_\infty<z)=EG_p(z\xi^2)$. Theorem 5(b) follows by taking a Taylor expansion of $G_p(z\xi^2)$ around $G_p(z)$ and approximating the moments of $\xi^2$. In the case of contracted kernel HAR variance estimation, Sun (2014) establishes an expansion of the asymptotic distribution of $W_T(\tilde\theta_T)$, which is based on the one-step GMM estimator $\tilde\theta_T$. In the notation of this paper, it is shown that
\[
P\bigl(B_p(1)'C_{pp}^{-1}B_p(1)<z\bigr)=G_p(z)+G_p'(z)z\bigl[(\nu_1-1)-\nu_2(p-1)\bigr]+G_p''(z)z^2\nu_2+o(\nu_2).
\]
So up to order $O(\nu_2)$, the two expansions agree except for an additional term in Theorem 5(b) that reflects the degree of over-identification. If we use the chi-square distribution or the fixed-smoothing asymptotic distribution $B_p(1)'C_{pp}^{-1}B_p(1)$ as the reference distribution, then the probability of over-rejection increases with $q$, at least when $\nu_2$ is small.

We now focus on the case of contracted kernel HAR variance estimation. As $h\to\infty$, i.e., $b\to0$, we can show that
\[
\nu_1=1-bc_1+o(b)\quad\text{and}\quad \nu_2=bc_2+o(b) \qquad(10)
\]


where
\[
c_1=\int_{-1}^{1}k(x)\,dx,\qquad c_2=\int_{-1}^{1}k^2(x)\,dx.
\]
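For example, for the Bartlett kernel $k(x)=(1-|x|)\mathbf{1}\{|x|\le1\}$ these constants are $c_1=1$ and $c_2=2/3$; a quick numerical check:

```python
from scipy.integrate import quad

def bartlett(x):
    # Bartlett (triangular) kernel k(x) = (1 - |x|) on [-1, 1]
    return max(1.0 - abs(x), 0.0)

c1, _ = quad(bartlett, -1, 1, points=[0.0])                   # integral of k(x)   = 1
c2, _ = quad(lambda x: bartlett(x) ** 2, -1, 1, points=[0.0]) # integral of k(x)^2 = 2/3
```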

Using Theorem 5(b) and the identity $-G'(z^2)(z^2+1)=2G''(z^2)z^2$, we have, for $p=1$:
\[
P\bigl(F_\infty<z^2\bigr)=G\bigl(z^2\bigr)-c_1G'\bigl(z^2\bigr)z^2b-\frac{c_2}{2}\bigl[z^4+(4q+1)z^2\bigr]G'\bigl(z^2\bigr)b+o(b) \qquad(11)
\]
where $G(z^2)=G_1(z^2)$.

The above expansion is related to the high order Edgeworth expansion established in Sun and Phillips (2009) for linear IV regressions. Sun and Phillips (2009) consider the conventional joint limit, under which $b\to0$ and $T\to\infty$ jointly, and show that
\[
P\bigl(|t_T(\hat\theta_T)|<z\bigr)=\Phi(z)-\Phi(-z)-\Bigl[c_1z+\frac12c_2\bigl(z^3+z(4q+1)\bigr)\Bigr]b\phi(z)+(bT)^{-q_0}\varpi_{1,1}z\phi(z)+\text{s.o.} \qquad(12)
\]
where $(bT)^{-q_0}\varpi_{1,1}z$ captures the nonparametric bias of the kernel HAR variance estimator, and "s.o." stands for smaller order terms. Here $q_0$ is the order of the kernel used; the explicit expression for $\varpi_{1,1}$ is not important here but is given in Sun and Phillips (2009). Observing that $\Phi(z)-\Phi(-z)=G(z^2)$, we have $\phi(z)=G'(z^2)z$ and $\phi'(z)=G'(z^2)+2z^2G''(z^2)$. Using these equations, we can represent the high order expansion in (12) as
\[
\begin{aligned}
P\bigl(|t_T(\hat\theta_T)|^2<z^2\bigr)&=P\bigl(W_T<z^2\bigr) \qquad(13)\\
&=G\bigl(z^2\bigr)-c_1G'\bigl(z^2\bigr)z^2b-\frac{c_2}{2}\bigl[z^4+(4q+1)z^2\bigr]G'\bigl(z^2\bigr)b+(bT)^{-q_0}\varpi_{1,1}G'\bigl(z^2\bigr)z^2+\text{s.o.}
\end{aligned}
\]
Comparing this with (11), the terms of order $O(b)$ are exactly the same across the two expansions.

Let $z^2=F_\infty^{1-\alpha}$ be the $(1-\alpha)$ quantile of the distribution of $F_\infty$, i.e., $P(F_\infty<F_\infty^{1-\alpha})=1-\alpha$. Then
\[
G\bigl(F_\infty^{1-\alpha}\bigr)-c_1G'\bigl(F_\infty^{1-\alpha}\bigr)F_\infty^{1-\alpha}b-\frac{c_2}{2}\Bigl[\bigl(F_\infty^{1-\alpha}\bigr)^2+(4q+1)F_\infty^{1-\alpha}\Bigr]G'\bigl(F_\infty^{1-\alpha}\bigr)b=1-\alpha+o(b)
\]
and so
\[
P\bigl(W_T<F_\infty^{1-\alpha}\bigr)=1-\alpha+(bT)^{-q_0}\varpi_{1,1}G'\bigl(F_\infty^{1-\alpha}\bigr)F_\infty^{1-\alpha}+\text{s.o.}
\]
That is, using the critical value $F_\infty^{1-\alpha}$ eliminates a term in the higher order distributional expansion of $W_T$ under the conventional asymptotics. The nonstandard critical value $F_\infty^{1-\alpha}$ is thus high order correct under the conventional asymptotics. In other words, the nonstandard approximation provides a high order refinement of the conventional chi-square approximation.

Although the high order refinement is established for the case of $p=1$ and linear IV regressions, we expect it to hold more generally and for all three types of HAR variance estimators. All that is needed is a higher order Edgeworth expansion for the Wald statistic in a general GMM setting. In view of Sun and Phillips (2009), this is conceptually straightforward but can be technically very tedious.


5 The Fixed-smoothing Asymptotics under the Local Alternatives

In this section, we consider the fixed-smoothing asymptotics under the sequence of local alternatives $H_1: R\theta_0=r-\delta_0/\sqrt T$, where $\delta_0$ is a $p\times1$ vector. This configuration, known as the Pitman drift, is widely used in local power analysis. Under the local alternatives, both $\theta_0$ and the data generating process depend on the sample size. For notational convenience, we have suppressed the extra $T$ subscript on $\theta_0$.

If Assumptions 2-5 hold under the local alternatives, then we can use the same arguments as before to show that $W_T(\hat\theta_T)\overset{d}{\to}F_{\infty,\delta_0}$ and $t_T(\hat\theta_T)\overset{d}{\to}t_{\infty,\delta_0}$, where
\[
\begin{aligned}
F_{\infty,\delta_0}={}&\bigl[R(G'W_\infty^{-1}G)^{-1}G'W_\infty^{-1}\Lambda B_m(1)+\delta_0\bigr]'\bigl[R(G'W_\infty^{-1}G)^{-1}R'\bigr]^{-1}\\
&\times\bigl[R(G'W_\infty^{-1}G)^{-1}G'W_\infty^{-1}\Lambda B_m(1)+\delta_0\bigr]\big/\,p
\end{aligned}
\]
and
\[
t_{\infty,\delta_0}=\bigl[R(G'W_\infty^{-1}G)^{-1}G'W_\infty^{-1}\Lambda B_m(1)+\delta_0\bigr]'\bigl[R(G'W_\infty^{-1}G)^{-1}R'\bigr]^{-1/2}.
\]

Let
\[
V=R\bigl(G'\Omega^{-1}G\bigr)^{-1}R',\qquad \delta^*=V^{-1/2}\delta_0,
\]
and
\[
F_\infty(\|\delta^*\|^2)=\bigl[B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)+\delta^*\bigr]'D_{pp}^{-1}\bigl[B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)+\delta^*\bigr]\big/\,p, \qquad(14)
\]
\[
t_\infty(\delta^*)=\bigl[B_p(1)-C_{pq}C_{qq}^{-1}B_q(1)+\delta^*\bigr]\big/\sqrt{D_{pp}}\quad\text{when }p=1. \qquad(15)
\]

The following theorem shows that $F_{\infty,\delta_0}$ and $F_\infty(\|\delta^*\|^2)$ have the same distribution; similarly, $t_{\infty,\delta_0}$ and $t_\infty(\delta^*)$ have the same distribution.

Theorem 6 Let Assumption 1 hold with $k(\cdot)$ positive definite. In addition, let Assumptions 2-5 hold under $H_1$. Then, for a fixed $h$,

(a) $W_T(\hat\theta_T)\overset{d}{\to}F_\infty(\|\delta^*\|^2)$;

(b) $t_T(\hat\theta_T)\overset{d}{\to}t_\infty(\delta^*)$ when $p=1$,

where $(B_p'(1),B_q'(1))'$ is independent of $C_{pq}C_{qq}^{-1}$ and $D_{pp}$.

In the spirit of Theorem 1, we can also approximate the distributions of $W_T$ and $t_T$ by their respective asymptotically equivalent distributions. To conserve space, we omit the details here. In addition, we can show that, under the assumptions of Theorem 6, $D_T$ and $S_T$ are asymptotically equivalent in distribution to $W_T$; hence they also converge in distribution to $F_\infty(\|\delta^*\|^2)$. When $\delta_0=0$, Theorem 6 reduces to Theorems 1(b) and (d).

The proof of Theorem 6 shows that the right hand side of (14) depends on $\delta^*$ only through $\|\delta^*\|^2$. For this reason, we can write $F_\infty(\cdot)$ as a function of $\|\delta^*\|^2$ only. But $\|\delta^*\|^2=\delta_0'V^{-1}\delta_0$ is exactly the noncentrality parameter in the noncentral chi-square distribution under the conventional increasing-smoothing asymptotics; see, for example, Hall (2005, p. 163). When $h\to\infty$, $F_\infty(\|\delta^*\|^2)$ converges in distribution to $\chi_p^2(\|\delta^*\|^2)/p$. That is, the conventional limiting distribution can be recovered from $F_\infty(\|\delta^*\|^2)$. However, for a finite $h$, these two distributions can be substantially different.


For the Wald statistic based on the first-step estimator, it is not hard to show that, for a fixed $h$,
\[
W_T(\tilde\theta_T)\overset{d}{\to}\tilde F_\infty(\|\tilde\delta\|^2):=\bigl[B_p(1)+\tilde\delta\bigr]'C_{pp}^{-1}\bigl[B_p(1)+\tilde\delta\bigr]\big/\,p
\]
where
\[
\tilde V=R\bigl(G'W_{o,\infty}^{-1}G\bigr)^{-1}G'W_{o,\infty}^{-1}\Omega W_{o,\infty}^{-1}G\bigl(G'W_{o,\infty}^{-1}G\bigr)^{-1}R'\quad\text{and}\quad\tilde\delta=\tilde V^{-1/2}\delta_0. \qquad(16)
\]

Similar to $F_\infty(\|\delta^*\|^2)$, the limiting distribution $\tilde F_\infty(\|\tilde\delta\|^2)$ depends on $\tilde\delta$ only through $\|\tilde\delta\|^2$. Note that $\tilde V$ and $V$ are the asymptotic variances of the first-step and two-step estimators in the conventional asymptotic framework. It follows from standard GMM theory that $\|\delta^*\|^2\ge\|\tilde\delta\|^2$, so in general the two-step test has a larger noncentrality parameter than the first-step test. This is a source of the power advantage of the two-step test.

To characterize the difference between $\|\delta^*\|^2$ and $\|\tilde\delta\|^2$, we let $U_G\Sigma_GV_G'$ be a singular value decomposition of $G$, where
\[
\Sigma_{G,m\times d}=\bigl(A_G',O_G'\bigr)',
\]
$A_G$ is a nonsingular $d\times d$ matrix, $O_G$ is a $q\times d$ matrix of zeros, and $U_G$ and $V_G$ are orthonormal matrices. We rotate the moment conditions and partition the rotated moment conditions $U_G'f(v_t,\theta)$ according to
\[
U_G'f(v_t,\theta):=\begin{pmatrix}f_{U1}(v_t,\theta)\\ f_{U2}(v_t,\theta)\end{pmatrix} \qquad(17)
\]
where $f_{U1}(v_t,\theta)\in\mathbb{R}^d$ and $f_{U2}(v_t,\theta)\in\mathbb{R}^q$. The moment conditions in (1) are then equivalent to the rotated moment conditions
\[
EU_G'f(v_t,\theta)=0,\quad t=1,2,\dots,T, \qquad(18)
\]
which hold if and only if $\theta=\theta_0$. The corresponding rotated weighting matrix and long run variance matrix are
\[
W_{U,\infty}:=U_G'W_{o,\infty}U_G=\begin{pmatrix}W_{U,\infty}^{11}&W_{U,\infty}^{12}\\ W_{U,\infty}^{21}&W_{U,\infty}^{22}\end{pmatrix},\qquad
\Omega_U:=U_G'\Omega U_G=\begin{pmatrix}\Omega_U^{11}&\Omega_U^{12}\\ \Omega_U^{21}&\Omega_U^{22}\end{pmatrix},
\]
where the matrices are partitioned conformably with the two blocks in (17).

Let $B_{U,\infty}=\bigl(W_{U,\infty}^{22}\bigr)^{-1}W_{U,\infty}^{21}$. In the appendix we show that
\[
\tilde V=RV_GA_G^{-1}\begin{pmatrix}I_d\\ -B_{U,\infty}\end{pmatrix}'\Omega_U\begin{pmatrix}I_d\\ -B_{U,\infty}\end{pmatrix}\bigl(RV_GA_G^{-1}\bigr)', \qquad(19)
\]
\[
V=RV_GA_G^{-1}\begin{pmatrix}I_d\\ -\bigl(\Omega_U^{22}\bigr)^{-1}\Omega_U^{21}\end{pmatrix}'\Omega_U\begin{pmatrix}I_d\\ -\bigl(\Omega_U^{22}\bigr)^{-1}\Omega_U^{21}\end{pmatrix}\bigl(RV_GA_G^{-1}\bigr)',
\]
and
\[
\tilde V-V=RV_GA_G^{-1}\Bigl[\bigl(\Omega_U^{22}\bigr)^{-1}\Omega_U^{21}-B_{U,\infty}\Bigr]'\Omega_U^{22}\Bigl[\bigl(\Omega_U^{22}\bigr)^{-1}\Omega_U^{21}-B_{U,\infty}\Bigr]\bigl(RV_GA_G^{-1}\bigr)'. \qquad(20)
\]

So the difference between $\tilde V$ and $V$ depends on the difference between $\bigl(\Omega_U^{22}\bigr)^{-1}\Omega_U^{21}$ and $B_{U,\infty}$. While $\bigl(\Omega_U^{22}\bigr)^{-1}\Omega_U^{21}$ is the long run (population) regression matrix when $f_{U1}(v_t,\theta_0)$ is regressed


on $f_{U2}(v_t,\theta_0)$, $B_{U,\infty}$ can be regarded as the long run regression matrix implied by the weighting matrix $W_{U,\infty}$. When the two long run regression matrices are close to each other, $\tilde V$ and $V$ will also be close to each other. Otherwise, $\tilde V$ and $V$ can be very different.

The difference between $\tilde V$ and $V$ translates into a difference between the noncentrality parameters. To gain deeper insight, we let
\[
f_{U1}^*(v_t,\theta):=V_GA_G^{-1}\bigl[f_{U1}(v_t,\theta)-B_{U,\infty}'f_{U2}(v_t,\theta)\bigr].
\]
Note that the long run variance matrix of $Rf_{U1}^*(v_t,\theta_0)$ is $\tilde V$. It is easy to show that the GMM estimator based on the moment conditions $Ef_{U1}^*(v_t,\theta_0)=0$ has the same asymptotic distribution as the first-step estimator $\tilde\theta_T$. Hence $Ef_{U1}^*(v_t,\theta_0)=0$ can be regarded as the effective set of moment conditions behind $\tilde\theta_T$: intuitively, $\tilde\theta_T$ employs only the moment conditions $Ef_{U1}^*(v_t,\theta_0)=0$ while ignoring the second block of moment conditions $Ef_{U2}(v_t,\theta_0)=0$. In contrast, the two-step estimator $\hat\theta_T$ employs both blocks of moment conditions. The relative merits of the first-step and two-step estimators depend on whether the second block of moment conditions contains additional information about $\theta_0$. To measure the information content, we introduce the long run correlation matrix between $Rf_{U1}^*(v_t,\theta_0)$ and $f_{U2}(v_t,\theta_0)$:
\[
\Theta_{12}=\tilde V^{-1/2}\Bigl[RV_GA_G^{-1}\bigl(\Omega_U^{12}-B_{U,\infty}'\Omega_U^{22}\bigr)\Bigr]\bigl(\Omega_U^{22}\bigr)^{-1/2}
\]
where $RV_GA_G^{-1}\bigl(\Omega_U^{12}-B_{U,\infty}'\Omega_U^{22}\bigr)$ is the long run covariance between $Rf_{U1}^*(v_t,\theta_0)$ and $f_{U2}(v_t,\theta_0)$.

Proposition 7 characterizes $\|\delta^*\|^2-\|\tilde\delta\|^2$ in terms of $\Theta_{12}$.

Proposition 7 If $\tilde V$ is nonsingular, then
\[
\|\delta^*\|^2-\|\tilde\delta\|^2=\delta_0'\tilde V^{-1/2}\bigl(I_p-\Theta_{12}\Theta_{12}'\bigr)^{-1/2}\Theta_{12}\Theta_{12}'\bigl(I_p-\Theta_{12}\Theta_{12}'\bigr)^{-1/2}\tilde V^{-1/2}\delta_0.
\]
Let $\Theta_{12}=\sum_{i=1}^p\sigma_ia_ib_i'$ be the singular value decomposition of $\Theta_{12}$, where $\{a_i\}$ and $\{b_i\}$ are orthonormal vectors in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively. Sorted in descending order, the $\{\sigma_i^2\}$ are the (squared) long run canonical correlation coefficients between $Rf_{U1}^*(v_t,\theta_0)$ and $f_{U2}(v_t,\theta_0)$. Then
\[
\|\delta^*\|^2-\|\tilde\delta\|^2=\sum_{i=1}^p\frac{\sigma_i^2}{1-\sigma_i^2}\Bigl[a_i'\tilde V^{-1/2}\delta_0\Bigr]^2.
\]

Consider the special case in which $\max_{i=1}^p\sigma_i^2=0$, i.e., $\Theta_{12}$ is a matrix of zeros. In this case, the second block of moment conditions contains no additional information, and we have $\|\delta^*\|^2=\|\tilde\delta\|^2$. This happens when $B_{U,\infty}=\bigl(\Omega_U^{22}\bigr)^{-1}\Omega_U^{21}$, i.e., when the implied long run regression coefficient coincides with the population long run regression coefficient. Consider another special case in which $\sigma_1^2=\max_{i=1}^p\sigma_i^2$ approaches 1. If $a_1'\tilde V^{-1/2}\delta_0\ne0$, then $\|\delta^*\|^2-\|\tilde\delta\|^2$ approaches $\infty$ as $\sigma_1^2$ approaches 1 from below. This happens when the second block of moment conditions has very high long run prediction power for the first block.

There are many cases in between. Depending on the magnitudes of the long run canonical correlation coefficients and the direction of the local alternative hypothesis, $\|\delta^*\|^2-\|\tilde\delta\|^2$ can range from 0 to $\infty$. We expect that, under the fixed-smoothing asymptotics, there is a threshold value $c^*>0$ such that when $\|\delta^*\|^2-\|\tilde\delta\|^2>c^*$, the two-step test is asymptotically more powerful than the first-step test; otherwise, the two-step test is asymptotically less powerful. The threshold value $c^*$ should depend on $h$, $p$, $q$, $\|\delta^*\|^2$ and the significance level $\alpha$. It should be strictly greater than zero, as the two-step test entails the cost of estimating the optimal weighting matrix. We leave the determination of the threshold to future research.


6 Simulation Evidence and Empirical Application

6.1 Simulation Evidence

In this subsection, we study the finite sample performance of the fixed-smoothing approximations to the Wald statistic. We consider the following data generating process:
\[
y_t=x_{0,t}\alpha+x_{1,t}\beta_1+x_{2,t}\beta_2+x_{3,t}\beta_3+\varepsilon_{y,t}
\]
where $x_{0,t}\equiv1$ and $x_{1,t}$, $x_{2,t}$ and $x_{3,t}$ are scalar regressors that are correlated with $\varepsilon_{y,t}$. The dimension of the unknown parameter $\theta=(\alpha,\beta_1,\beta_2,\beta_3)'$ is $d=4$. We have $m$ instruments $z_{0,t},z_{1,t},\dots,z_{m-1,t}$ with $z_{0,t}\equiv1$. The reduced-form equations for $x_{1,t}$, $x_{2,t}$ and $x_{3,t}$ are given by
\[
x_{j,t}=z_{j,t}+\sum_{i=d-1}^{m-1}z_{i,t}+\varepsilon_{x_j,t}\quad\text{for }j=1,2,3.
\]

We assume that $z_{i,t}$ for $i\ge1$ follows either an AR(1) process
\[
z_{i,t}=\rho z_{i,t-1}+\sqrt{1-\rho^2}\,e_{z_i,t},
\]
or an MA(1) process
\[
z_{i,t}=\rho e_{z_i,t-1}+\sqrt{1-\rho^2}\,e_{z_i,t},
\]
where
\[
e_{z_i,t}=\frac{e_{zt}^i+e_{zt}^0}{\sqrt2}
\]
and $e_t=[e_{zt}^0,e_{zt}^1,\dots,e_{zt}^{m-1}]'\sim\text{iid }N(0,I_m)$. By construction, the variance of $z_{i,t}$ for any $i=1,2,\dots,m-1$ is 1. Due to the presence of the common shock $e_{zt}^0$, the correlation coefficient between the non-constant $z_{i,t}$ and $z_{j,t}$ for $i\ne j$ is 0.5. The DGP for $\varepsilon_t=(\varepsilon_{yt},\varepsilon_{x_1t},\varepsilon_{x_2t},\varepsilon_{x_3t})'$ is the same as that for $(z_{1,t},\dots,z_{m-1,t})$ apart from the difference in dimension. The two vector processes $\varepsilon_t$ and $(z_{1,t},\dots,z_{m-1,t})$ are independent of each other.

In the notation of this paper, we have f(vt; �) = zt (yt � x0t�) where vt = [y0t; x0t; z

0t]0, xt =

(1; x1t; x2t; x3t)0 , zt = (1; z1t; :::; zm�1;t)0. We take � = �0:8;�0:5; 0:0; 0:5; 0:8 and 0:9. We

consider m = 4; 5; 6 and the corresponding degrees of over-identi�cation are q = 0; 1; 2: The nullhypotheses of interest are

H01 : �1 = 0;

H02 : �1 = �2 = 0;

H03 : �1 = �2 = �3 = 0

where p = 1; 2 and 3 respectively. The corresponding matrix R is the 2 : p+1 rows of the identitymatrix Id: We consider two signi�cance levels � = 5% and � = 10% and three di¤erent samplesizes T = 100; 200; 500: The number of simulation replications is 10000.
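For concreteness, the data generating process above can be simulated in a few lines. The following is a minimal sketch for the AR(1) specification under the null ($\theta = 0$); the function name and defaults are ours, not the paper's.

```python
import numpy as np

def simulate_dgp(T=100, m=6, rho=0.5, seed=0):
    """Sketch of the Monte Carlo design: AR(1) instruments sharing a common
    shock (pairwise correlation 0.5), endogenous regressors, theta = 0."""
    rng = np.random.default_rng(seed)
    d = 4
    # instrument innovations: e_t ~ iid N(0, I_m), column 0 is the common shock
    e = rng.standard_normal((T, m))
    ez = (e[:, 1:] + e[:, [0]]) / np.sqrt(2)        # Var = 1, Corr(i, j) = 0.5
    z = np.empty((T, m - 1))
    z[0] = ez[0]                                    # the innovation itself is stationary
    for t in range(1, T):                           # z_it = rho z_{i,t-1} + sqrt(1-rho^2) e_zi,t
        z[t] = rho * z[t - 1] + np.sqrt(1 - rho**2) * ez[t]
    # errors (eps_y, eps_x1, eps_x2, eps_x3): same DGP, independent of z
    ee = rng.standard_normal((T, d + 1))
    eps_in = (ee[:, 1:] + ee[:, [0]]) / np.sqrt(2)
    eps = np.empty((T, d))
    eps[0] = eps_in[0]
    for t in range(1, T):
        eps[t] = rho * eps[t - 1] + np.sqrt(1 - rho**2) * eps_in[t]
    # reduced form: x_j = z_j + sum of the extra instruments z_d..z_{m-1} + eps_xj
    extra = z[:, d - 1:].sum(axis=1)
    x = np.column_stack([np.ones(T)] + [z[:, j] + extra + eps[:, j + 1] for j in range(3)])
    y = eps[:, 0]                                   # theta = 0 under the null
    zmat = np.column_stack([np.ones(T), z])
    return y, x, zmat
```

Under this design the moment function is simply $f(v_t, \theta) = z_t(y_t - x_t'\theta)$, computed row by row from the returned arrays.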

We first examine the finite sample size accuracy of different tests using the OS HAR variance estimator. The tests are based on the same Wald test statistic, so they have the same size-adjusted power; the difference lies in the reference distributions, and hence the critical values, used. We employ the following critical values:

$$\frac{\chi^{2,1-\alpha}_p}{p}, \qquad \frac{K}{K-p+1}\,F^{1-\alpha}_{p,K-p+1}, \qquad \frac{K}{K-p-q+1}\,F^{1-\alpha}_{p,K-p-q+1}(\delta^2) \ \text{ with } \ \delta^2 = \frac{pq}{K-q+1}, \qquad F^{1-\alpha}_\infty,$$

leading to the χ² test, the CF (central F) test, the NCF (noncentral F) test and the nonstandard F∞ test, respectively. The χ² test uses the conventional chi-square approximation. The CF test uses the fixed-smoothing approximation for the Wald statistic based on a first-step GMM estimator; equivalently, the CF test uses the fixed-smoothing approximation with q = 0. The NCF test uses the noncentral F approximation given in Theorem 3. The F∞ test uses the nonstandard limiting distribution F∞ with simulated critical values. For each test, the initial first-step estimator is the IV estimator with weighting matrix $W_o = Z'Z/T$, where $Z$ is the matrix of observed instruments.
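The first three critical values can be computed directly from standard distributions; only the nonstandard F∞ critical value requires simulation. A sketch using scipy, with the scaled central and noncentral F quantiles as displayed above (the function name is ours):

```python
from scipy.stats import chi2, f, ncf

def har_critical_values(alpha, p, q, K):
    """Critical values for the chi-square, CF and NCF tests with K basis
    functions; the simulated F-infinity critical value is omitted."""
    cv_chi2 = chi2.ppf(1 - alpha, p) / p                       # chi^2_p quantile / p
    cv_cf = K / (K - p + 1) * f.ppf(1 - alpha, p, K - p + 1)   # scaled central F
    delta2 = p * q / (K - q + 1)                               # noncentrality parameter
    cv_ncf = K / (K - p - q + 1) * ncf.ppf(1 - alpha, p, K - p - q + 1, delta2)
    return cv_chi2, cv_cf, cv_ncf
```

With q = 0 the noncentrality is zero and the NCF critical value collapses to the CF one; a larger q pushes the NCF critical value further above the chi-square value, which is the source of the size correction.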

To speed up the computation, we assume that $K$ is even and use the basis functions $\phi_{2j-1}(x) = \sqrt{2}\cos 2j\pi x$, $\phi_{2j}(x) = \sqrt{2}\sin 2j\pi x$, $j = 1, \ldots, K/2$. In this case, the OS HAR variance estimator can be computed using discrete Fourier transforms: it is a simple average of periodogram ordinates. We select $K$ based on the AMSE criterion implemented using the VAR(1) plug-in procedure in Phillips (2005). For completeness, we reproduce the MSE-optimal formula for $K$ here:

$$K_{MSE} = \left\lceil 0.5 \left( \frac{\operatorname{tr}\left[(I_{m^2} + K_{mm})(\Omega \otimes \Omega)\right]}{4\,\operatorname{vec}(B)'\operatorname{vec}(B)} \right)^{1/5} T^{4/5} \right\rceil,$$

where $\lceil \cdot \rceil$ is the ceiling function, $K_{mm}$ is the $m^2 \times m^2$ commutation matrix, and

$$B = -\frac{\pi^2}{6} \sum_{j=-\infty}^{\infty} j^2\, E f(v_t, \theta_0) f(v_{t-j}, \theta_0)'. \tag{21}$$

We compute $K_{MSE}$ on the basis of the initial first-step estimator $\tilde\theta_T$ and use it in computing both $W_T(\tilde\theta_T)$ and $W_T(\hat\theta_T)$.
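For reference, the OS HAR variance estimator with this paired sine/cosine basis can be written directly (a plain implementation; the DFT speedup mentioned above is omitted for clarity, and the function name is ours):

```python
import numpy as np

def os_lrv(u, K):
    """Orthonormal-series LRV estimator: average of K outer products of
    basis-weighted sums, using phi_{2j-1}(x) = sqrt(2) cos(2j pi x) and
    phi_{2j}(x) = sqrt(2) sin(2j pi x).  u is a (T, m) array; K must be even."""
    T, m = u.shape
    x = np.arange(1, T + 1) / T
    Omega = np.zeros((m, m))
    for j in range(1, K // 2 + 1):
        for phi in (np.sqrt(2.0) * np.cos(2 * np.pi * j * x),
                    np.sqrt(2.0) * np.sin(2 * np.pi * j * x)):
            Lam = u.T @ phi / np.sqrt(T)   # Lambda = T^{-1/2} sum_t phi(t/T) u_t
            Omega += np.outer(Lam, Lam)
    return Omega / K
```

Each basis-weighted sum is asymptotically normal with variance equal to the long run variance, so the estimator behaves like an average of K Wishart draws; this is exactly the randomness that the fixed-K asymptotics keeps track of.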

Table 1 gives the empirical size of the different tests for the AR(1) case with sample size T = 100. The nominal size of the tests is α = 5%; the results for α = 10% are qualitatively similar. First, as is clear from the table, the chi-square test can suffer severe size distortion. For example, when ρ = 0.9, p = 3 and q = 2, the empirical size of the chi-square test reaches 71.2%, far from the 5% nominal level. Second, the size distortion of the CF test is substantially smaller than that of the chi-square test when the degree of over-identification is small, because the CF test employs an asymptotic approximation that partially captures the estimation uncertainty of the HAR variance estimator. Third, the empirical size of the NCF test is nearly the same as that of the nonstandard F∞ test. This is consistent with Figure 1 and provides further evidence that the noncentral F distribution approximates the nonstandard F∞ distribution very well, at least at the 95% quantile. Finally, among the four tests, the NCF and F∞ tests have the most accurate size. For the two-step GMM estimator, the HAR variance estimator appears in two different places and plays two different roles: first as the inverse of the optimal weighting matrix and then as part of the asymptotic variance estimator for the GMM estimator. The nonstandard F∞ approximation and the noncentral F approximation capture the estimation uncertainty of the HAR variance estimator in both places. In contrast, a crucial step underlying the central F approximation is that the HAR variance estimator is treated as consistent when it acts as the optimal weighting matrix; as a result, the central F approximation does not adequately capture this estimation uncertainty. This explains why the NCF and F∞ tests have more accurate size than the CF test.

Table 2 presents the simulated empirical size for the MA(1) case. The qualitative observations for the AR(1) case remain valid: the chi-square test is the most size-distorted, the NCF and F∞ tests are the least, and the CF test is in between. As before, the size distortion increases with the serial dependence, the number of joint hypotheses, and the degree of over-identification. This table provides further evidence that the noncentral F distribution and the nonstandard F∞ distribution provide more accurate approximations to the sampling distribution of the Wald statistic.

Tables 3 and 4 report results for a sample size of 200. Compared with the sample size of 100, all tests become more accurate in size, and the size accuracy improves further when the sample size increases to 500, as expected. In terms of size accuracy, the NCF test and the F∞ test are close to each other; they dominate the CF test, which in turn dominates the chi-square test.

Next, we consider the empirical size of tests based on kernel HAR variance estimators. Both the contracted kernel approach and the exponentiated kernel approach were considered, but we report only the commonly used contracted kernel approach, as the results are qualitatively similar. We employ three kernels, the Bartlett, Parzen and QS kernels, all of which are positive semidefinite. For each kernel, we use the data-driven AMSE-optimal bandwidth and its VAR(1) plug-in implementation from Andrews (1991). For each Wald statistic, three critical values are used: χ²_p(1−α)/p, F∞^{1−α}[0], and F∞^{1−α}[q], where F∞^{1−α}[q] is the (1−α) quantile of the nonstandard F∞ distribution with degree of over-identification q. F∞^{1−α}[0] coincides with the critical value from the fixed-smoothing asymptotic distribution derived for the first-step IV estimator.

To save space, we report results only for the Parzen kernel. Tables 5-8 correspond to Tables 1-4, and the qualitative results exhibited there continue to apply. In terms of size accuracy, the F∞[q] test dominates the F∞[0] test, which in turn dominates the conventional χ² test.

6.2 Empirical Application

To illustrate the fixed-smoothing approximations, we consider the log-normal stochastic volatility model of the form:

$$r_t = \sigma_t Z_t,$$
$$\log \sigma_t^2 = \omega + \beta\left(\log \sigma_{t-1}^2 - \omega\right) + \sigma_u u_t,$$

where $r_t$ is the rate of return and $(Z_t, u_t)' \sim \text{iid } N(0, I_2)$. The first equation specifies the distribution of the return as heteroskedastic normal. The second equation specifies the dynamics of the log-volatility as an AR(1) process. The parameter vector is $\theta = (\omega, \beta, \sigma_u)'$. We impose the restriction $\beta \in (0, 1)$, which is an empirically relevant range. The model and the parameter restriction are the same as those considered by Andersen and Sorensen (1996), who give a detailed discussion of the motivation for stochastic volatility models and the GMM approach. For further discussion, see Ghysels, Harvey and Renault (1996) and the references therein.

We employ GMM to estimate the log-normal stochastic volatility model. The data are weekly returns on S&P 500 stocks, constructed by compounding daily returns with dividends from a CRSP index file. We consider both value-weighted returns (vwretd) and equal-weighted returns (ewretd). The weekly returns range from the first week of 2001 to the last week of 2012, with sample size T = 627. We use weekly data in order to minimize problems associated with daily data, such as asynchronous trading and bid-ask bounce. This is consistent with Jacquier, Polson and Rossi (1994).

The GMM approach relies on moments of the time series $\{r_t\}$ to identify the parameters of the model. For the log-normal stochastic volatility model, we can obtain the following moment conditions:

$$E|r_t|^\ell = c_\ell\, E(\sigma_t^\ell) \ \text{ for } \ell = 1, 2, 3, 4 \ \text{ with } \ (c_1, c_2, c_3, c_4) = \left(\sqrt{2/\pi},\, 1,\, 2\sqrt{2/\pi},\, 3\right),$$
$$E|r_t r_{t-j}| = 2\pi^{-1} E(\sigma_t \sigma_{t-j}) \ \text{ and } \ E r_t^2 r_{t-j}^2 = E\left(\sigma_t^2 \sigma_{t-j}^2\right) \ \text{ for } j = 1, 2, \ldots,$$

where

$$E(\sigma_t^\ell) = \exp\left( \frac{\omega \ell}{2} + \frac{\ell^2 \sigma_u^2}{8(1-\beta^2)} \right),$$
$$E\left(\sigma_t^{\ell_1} \sigma_{t-j}^{\ell_2}\right) = E\left(\sigma_t^{\ell_1}\right) E\left(\sigma_t^{\ell_2}\right) \exp\left( \frac{\sigma_u^2\, \ell_1 \ell_2\, \beta^j}{4(1-\beta^2)} \right).$$

Higher order moments can be computed, but we choose to focus on a subset of lower order moments. Andersen and Sorensen (1996) point out that it is generally not optimal to include too many moment conditions when the sample size is limited. On the other hand, it is not advisable to include only as many moment conditions as there are parameters. When $T = 500$ and $\theta = (-7.36, 0.90, 0.363)'$, which is an empirically relevant parameter vector, Table 1 in Andersen and Sorensen (1996) shows that it is MSE-optimal to employ nine moment conditions. For this reason, we employ two sets of nine moment conditions given in the appendix of Andersen and Sorensen (1996). The baseline set of nine moment conditions is $E f_i(r_t, \theta) = 0$ for $i = 1, \ldots, 9$ with

$$f_i(r_t, \theta) = |r_t|^i - c_i\, E(\sigma_t^i), \quad i = 1, \ldots, 4,$$
$$f_5(r_t, \theta) = |r_t r_{t-1}| - 2\pi^{-1} E(\sigma_t \sigma_{t-1}),$$
$$f_6(r_t, \theta) = |r_t r_{t-3}| - 2\pi^{-1} E(\sigma_t \sigma_{t-3}),$$
$$f_7(r_t, \theta) = |r_t r_{t-5}| - 2\pi^{-1} E(\sigma_t \sigma_{t-5}),$$
$$f_8(r_t, \theta) = r_t^2 r_{t-2}^2 - E\left(\sigma_t^2 \sigma_{t-2}^2\right),$$
$$f_9(r_t, \theta) = r_t^2 r_{t-4}^2 - E\left(\sigma_t^2 \sigma_{t-4}^2\right). \tag{22}$$
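The analytic moments above translate directly into code. The following sketch uses our own function names; the identity $E(\sigma_t^\ell \sigma_{t-0}^\ell) = E(\sigma_t^{2\ell})$, which follows from the two formulas, serves as an internal consistency check.

```python
import numpy as np

def sigma_moment(theta, ell):
    """E(sigma_t^ell) for the log-normal SV model, theta = (omega, beta, sigma_u)."""
    omega, beta, sigma_u = theta
    return np.exp(omega * ell / 2 + ell**2 * sigma_u**2 / (8 * (1 - beta**2)))

def sigma_cross_moment(theta, l1, l2, j):
    """E(sigma_t^{l1} sigma_{t-j}^{l2})."""
    omega, beta, sigma_u = theta
    return (sigma_moment(theta, l1) * sigma_moment(theta, l2)
            * np.exp(sigma_u**2 * l1 * l2 * beta**j / (4 * (1 - beta**2))))

def baseline_moments(r, theta):
    """The nine baseline moment conditions in (22), evaluated at each usable t."""
    c = np.array([np.sqrt(2 / np.pi), 1.0, 2 * np.sqrt(2 / np.pi), 3.0])
    t = np.arange(5, len(r))                          # largest lag used is 5
    cols = [np.abs(r[t]) ** i - c[i - 1] * sigma_moment(theta, i) for i in (1, 2, 3, 4)]
    for k in (1, 3, 5):                               # f5-f7: absolute cross moments
        cols.append(np.abs(r[t] * r[t - k]) - (2 / np.pi) * sigma_cross_moment(theta, 1, 1, k))
    for k in (2, 4):                                  # f8-f9: squared cross moments
        cols.append(r[t] ** 2 * r[t - k] ** 2 - sigma_cross_moment(theta, 2, 2, k))
    return np.column_stack(cols)
```

The sample averages of the nine columns form the moment vector $g_T(\theta)$ that enters the GMM criterion.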

The alternative set of nine moment conditions is $E f_i(r_t, \theta) = 0$ for $i = 1, \ldots, 9$ with

$$f_i(r_t, \theta) = |r_t|^i - c_i\, E(\sigma_t^i), \quad i = 1, \ldots, 4,$$
$$f_5(r_t, \theta) = |r_t r_{t-2}| - 2\pi^{-1} E(\sigma_t \sigma_{t-2}),$$
$$f_6(r_t, \theta) = |r_t r_{t-4}| - 2\pi^{-1} E(\sigma_t \sigma_{t-4}),$$
$$f_7(r_t, \theta) = |r_t r_{t-6}| - 2\pi^{-1} E(\sigma_t \sigma_{t-6}),$$
$$f_8(r_t, \theta) = r_t^2 r_{t-1}^2 - E\left(\sigma_t^2 \sigma_{t-1}^2\right),$$
$$f_9(r_t, \theta) = r_t^2 r_{t-3}^2 - E\left(\sigma_t^2 \sigma_{t-3}^2\right). \tag{23}$$

In each case m = 9 and d = 3, so the degree of over-identification is q = m − d = 6. We focus on constructing 90% and 95% confidence intervals (CIs) for β. Given the high nonlinearity of the moment conditions, we invert the GMM distance statistic to obtain the CIs; the CIs so obtained are invariant to reparametrization of the model parameters. For example, a 95% confidence interval is the set of β values in (0, 1) that the D_T test does not reject at the 5% level. We search over the grid from 0.01 to 0.99 with increment 0.01 to invert the D_T test. As in the simulation study, we employ three different critical values, χ²_p(1−α)/p, F∞^{1−α}[0], and F∞^{1−α}[q], which correspond to three different asymptotic approximations. For the series LRV estimator, F∞^{1−α}[0] is a critical value from the corresponding F approximation.
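The grid inversion just described is generic: keep every parameter value whose distance statistic does not exceed the chosen critical value. A minimal sketch with a placeholder statistic function; in the application, `stat_fn(beta)` would be the GMM distance statistic D_T with β held fixed at the grid point.

```python
import numpy as np

def invert_test(stat_fn, crit, grid=None):
    """Confidence set for a scalar parameter by test inversion: collect all
    grid points at which the test statistic does not exceed the critical value."""
    if grid is None:
        grid = np.round(np.arange(1, 100) * 0.01, 2)   # 0.01, 0.02, ..., 0.99
    accepted = [b for b in grid if stat_fn(b) <= crit]
    return (min(accepted), max(accepted)) if accepted else None
```

Plugging in the three critical values χ²_p(1−α)/p, F∞^{1−α}[0] and F∞^{1−α}[q] for `crit` yields the three confidence intervals compared in Table 9.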

Before presenting the empirical results, we note that there is no "hole" in the CIs we obtain: the values of β that are not rejected form an arithmetic sequence. Given this, we report only the minimum and maximum values of β over the grid that are not rejected. It turns out the maximum value is always equal to the upper bound 0.99, so it suffices to report the minimum value, which we take as the lower limit of the CI.

Table 9 presents the lower limits of the different 95% CIs together with the smoothing parameters used. The smoothing parameters are selected in the same way as in the simulation study. Since the selected K values are relatively large and the selected b values are relatively small, the CIs based on the χ² approximation and the F∞[0] approximation are close to each other. However, there is still a noticeable difference between the CIs based on the χ² approximation and the nonstandard F∞[q] approximation. This is especially true for the case with the baseline moment conditions, the Bartlett kernel, and equally-weighted returns. Taking into account the randomness of the estimated weighting matrix leads to wider CIs: the log-volatility may not be as persistent as previously thought.

7 Conclusion

The paper has developed the fixed-smoothing asymptotics for heteroskedasticity and autocorrelation robust inference in a two-step GMM framework. We have shown that the conventional chi-square test, which ignores the estimation uncertainty of the GMM weighting matrix and the covariance estimator, can have very large size distortion. This is especially true when the number of joint hypotheses being tested or the degree of over-identification is high, or when the underlying processes are persistent. The test based on our new fixed-smoothing approximation reduces the size distortion substantially and is thus recommended for practical use.

There are a number of interesting extensions. First, given that our proof uses only a CLT, the results of the paper can be extended easily to spatial, spatial-temporal, or panel data settings; see Kim and Sun (2013) for an extension along these lines. Second, in the Monte Carlo experiments, we use the conventional MSE criterion to select the smoothing parameters. We could instead have used the methods proposed in Sun and Phillips (2009), Sun, Phillips and Jin (2008) or Sun (2014), which are tailored toward confidence interval construction or hypothesis testing; in their present forms, however, these methods are available only for the t-test or work only in the first-step GMM framework, and it would be interesting to extend them to more general tests in the two-step GMM framework. Third, we may employ the asymptotically equivalent distribution to conduct more reliable inference. Simulation results not reported here show that the size difference between using the asymptotically equivalent distribution and the limiting distribution is negligible. However, it is easy to replace the iid process in the asymptotically equivalent distribution F̃_T by a dependent process that mimics the degree of autocorrelation in the moment process. Such a replacement may lead to even more accurate tests and is similar in spirit to Zhang and Shao (2013), who show a high order refinement of a Gaussian bootstrap procedure under the fixed-smoothing asymptotics. Finally, the general results of the paper can be extended to a nonparametric sieve GMM framework; see Chen, Liao and Sun (2014) for a recent development on autocorrelation robust sieve inference for time series models.


Table 1: Empirical size of the nominal 5% χ² test, central F test, noncentral F test and nonstandard F∞ test based on the OS HAR variance estimator for the AR(1) case with T = 100, number of joint hypotheses p and number of overidentifying restrictions q

              p = 1, q = 0              p = 2, q = 0              p = 3, q = 0
  ρ      χ²    CF   NCF    F∞      χ²    CF   NCF    F∞      χ²    CF   NCF    F∞
-0.8   0.114 0.072 0.072 0.073   0.197 0.087 0.087 0.084   0.310 0.109 0.109 0.111
-0.5   0.081 0.060 0.060 0.059   0.117 0.066 0.066 0.066   0.174 0.077 0.077 0.078
 0.0   0.063 0.051 0.051 0.050   0.083 0.052 0.052 0.053   0.112 0.060 0.060 0.062
 0.5   0.094 0.063 0.063 0.063   0.142 0.065 0.065 0.065   0.222 0.077 0.077 0.078
 0.8   0.134 0.086 0.086 0.088   0.229 0.100 0.100 0.097   0.355 0.119 0.119 0.122
 0.9   0.166 0.117 0.117 0.120   0.290 0.150 0.150 0.146   0.437 0.181 0.181 0.184

              p = 1, q = 1              p = 2, q = 1              p = 3, q = 1
-0.8   0.186 0.129 0.081 0.077   0.307 0.153 0.088 0.087   0.457 0.193 0.107 0.113
-0.5   0.113 0.086 0.065 0.065   0.175 0.099 0.069 0.068   0.247 0.116 0.079 0.080
 0.0   0.081 0.065 0.053 0.052   0.113 0.075 0.057 0.056   0.155 0.085 0.060 0.060
 0.5   0.128 0.091 0.064 0.063   0.204 0.110 0.073 0.072   0.308 0.130 0.079 0.080
 0.8   0.196 0.140 0.089 0.086   0.331 0.172 0.101 0.099   0.489 0.209 0.112 0.118
 0.9   0.252 0.184 0.126 0.123   0.420 0.242 0.155 0.153   0.589 0.301 0.183 0.192

              p = 1, q = 2              p = 2, q = 2              p = 3, q = 2
-0.8   0.260 0.190 0.080 0.079   0.425 0.256 0.090 0.083   0.602 0.318 0.100 0.097
-0.5   0.154 0.117 0.061 0.062   0.244 0.146 0.065 0.061   0.351 0.175 0.074 0.073
 0.0   0.104 0.085 0.055 0.055   0.148 0.101 0.062 0.058   0.211 0.119 0.065 0.063
 0.5   0.171 0.129 0.065 0.066   0.279 0.165 0.065 0.061   0.415 0.200 0.073 0.072
 0.8   0.268 0.199 0.085 0.082   0.449 0.269 0.090 0.082   0.623 0.330 0.100 0.097
 0.9   0.332 0.257 0.124 0.121   0.529 0.358 0.148 0.137   0.712 0.433 0.160 0.157

Table 2: Empirical size of the nominal 5% χ² test, central F test, noncentral F test and nonstandard F∞ test based on the series LRV estimator for the MA(1) case with T = 100, number of joint hypotheses p and number of overidentifying restrictions q

              p = 1, q = 0              p = 2, q = 0              p = 3, q = 0
  ρ      χ²    CF   NCF    F∞      χ²    CF   NCF    F∞      χ²    CF   NCF    F∞
-0.8   0.072 0.055 0.055 0.055   0.106 0.058 0.058 0.058   0.153 0.069 0.069 0.070
-0.5   0.074 0.055 0.055 0.055   0.101 0.058 0.058 0.059   0.144 0.069 0.069 0.070
 0.0   0.063 0.051 0.051 0.050   0.083 0.052 0.052 0.053   0.112 0.060 0.060 0.062
 0.5   0.078 0.054 0.054 0.054   0.119 0.060 0.060 0.061   0.182 0.071 0.071 0.072
 0.8   0.081 0.056 0.056 0.056   0.125 0.060 0.060 0.062   0.195 0.071 0.071 0.071
 0.9   0.077 0.055 0.055 0.055   0.114 0.059 0.059 0.060   0.170 0.070 0.070 0.070

              p = 1, q = 1              p = 2, q = 1              p = 3, q = 1
-0.8   0.099 0.074 0.057 0.056   0.160 0.090 0.061 0.060   0.223 0.099 0.067 0.068
-0.5   0.097 0.075 0.057 0.057   0.150 0.088 0.061 0.060   0.207 0.096 0.068 0.068
 0.0   0.081 0.065 0.053 0.052   0.113 0.075 0.057 0.056   0.155 0.085 0.060 0.060
 0.5   0.108 0.078 0.057 0.056   0.171 0.093 0.062 0.062   0.253 0.107 0.068 0.068
 0.8   0.116 0.082 0.057 0.057   0.185 0.097 0.062 0.062   0.274 0.115 0.069 0.070
 0.9   0.107 0.077 0.058 0.057   0.163 0.093 0.063 0.062   0.239 0.107 0.070 0.071

              p = 1, q = 2              p = 2, q = 2              p = 3, q = 2
-0.8   0.137 0.105 0.054 0.054   0.216 0.133 0.058 0.056   0.305 0.153 0.066 0.066
-0.5   0.131 0.100 0.055 0.055   0.204 0.128 0.063 0.060   0.284 0.146 0.066 0.065
 0.0   0.104 0.085 0.055 0.055   0.148 0.101 0.062 0.058   0.211 0.119 0.065 0.063
 0.5   0.143 0.109 0.060 0.061   0.231 0.138 0.063 0.060   0.339 0.168 0.067 0.067
 0.8   0.153 0.115 0.060 0.061   0.247 0.146 0.062 0.059   0.366 0.177 0.067 0.066
 0.9   0.137 0.105 0.058 0.059   0.218 0.134 0.061 0.059   0.316 0.159 0.070 0.070


Table 3: Empirical size of the nominal 5% χ² test, central F test, noncentral F test and nonstandard F∞ test based on the series LRV estimator for the AR(1) case with T = 200, number of joint hypotheses p and number of overidentifying restrictions q

              p = 1, q = 0              p = 2, q = 0              p = 3, q = 0
  ρ      χ²    CF   NCF    F∞      χ²    CF   NCF    F∞      χ²    CF   NCF    F∞
-0.8   0.102 0.071 0.071 0.071   0.150 0.073 0.073 0.073   0.234 0.091 0.091 0.091
-0.5   0.070 0.058 0.058 0.058   0.088 0.062 0.062 0.063   0.117 0.071 0.071 0.072
 0.0   0.060 0.054 0.054 0.054   0.065 0.052 0.052 0.052   0.081 0.059 0.059 0.057
 0.5   0.079 0.063 0.063 0.063   0.104 0.063 0.063 0.064   0.144 0.066 0.066 0.070
 0.8   0.114 0.072 0.072 0.073   0.187 0.079 0.079 0.078   0.283 0.092 0.092 0.093
 0.9   0.140 0.091 0.091 0.094   0.243 0.107 0.107 0.104   0.366 0.124 0.124 0.128

              p = 1, q = 1              p = 2, q = 1              p = 3, q = 1
-0.8   0.147 0.107 0.072 0.072   0.234 0.126 0.080 0.081   0.348 0.147 0.092 0.094
-0.5   0.083 0.070 0.058 0.057   0.117 0.081 0.066 0.065   0.154 0.094 0.073 0.072
 0.0   0.069 0.060 0.053 0.053   0.081 0.064 0.058 0.058   0.100 0.071 0.062 0.061
 0.5   0.099 0.080 0.062 0.061   0.139 0.088 0.065 0.062   0.182 0.092 0.070 0.072
 0.8   0.164 0.112 0.076 0.073   0.270 0.138 0.083 0.083   0.394 0.159 0.087 0.090
 0.9   0.205 0.145 0.095 0.090   0.347 0.178 0.110 0.108   0.498 0.219 0.121 0.127

              p = 1, q = 2              p = 2, q = 2              p = 3, q = 2
-0.8   0.195 0.148 0.071 0.073   0.319 0.192 0.075 0.070   0.460 0.233 0.083 0.082
-0.5   0.101 0.082 0.060 0.060   0.144 0.098 0.066 0.063   0.190 0.116 0.069 0.070
 0.0   0.074 0.066 0.053 0.054   0.096 0.074 0.056 0.057   0.117 0.083 0.059 0.060
 0.5   0.113 0.091 0.056 0.056   0.166 0.107 0.064 0.059   0.234 0.129 0.070 0.069
 0.8   0.225 0.170 0.072 0.073   0.371 0.216 0.075 0.070   0.539 0.264 0.081 0.078
 0.9   0.276 0.210 0.090 0.088   0.460 0.280 0.099 0.090   0.639 0.349 0.106 0.103

Table 4: Empirical size of the nominal 5% χ² test, central F test, noncentral F test and nonstandard F∞ test based on the series LRV estimator for the MA(1) case with T = 200, number of joint hypotheses p and number of overidentifying restrictions q

              p = 1, q = 0              p = 2, q = 0              p = 3, q = 0
  ρ      χ²    CF   NCF    F∞      χ²    CF   NCF    F∞      χ²    CF   NCF    F∞
-0.8   0.066 0.054 0.054 0.053   0.082 0.056 0.056 0.056   0.106 0.062 0.062 0.064
-0.5   0.066 0.056 0.056 0.055   0.079 0.054 0.054 0.056   0.103 0.061 0.061 0.064
 0.0   0.060 0.054 0.054 0.054   0.065 0.052 0.052 0.052   0.081 0.059 0.059 0.057
 0.5   0.070 0.058 0.058 0.057   0.089 0.057 0.057 0.057   0.120 0.060 0.060 0.063
 0.8   0.072 0.057 0.057 0.057   0.091 0.058 0.058 0.058   0.130 0.060 0.060 0.063
 0.9   0.069 0.056 0.056 0.056   0.084 0.057 0.057 0.057   0.114 0.061 0.061 0.063

              p = 1, q = 1              p = 2, q = 1              p = 3, q = 1
-0.8   0.080 0.066 0.055 0.055   0.105 0.075 0.063 0.061   0.141 0.084 0.069 0.068
-0.5   0.077 0.065 0.056 0.056   0.102 0.076 0.064 0.062   0.134 0.082 0.066 0.066
 0.0   0.069 0.060 0.053 0.053   0.081 0.064 0.058 0.058   0.100 0.071 0.062 0.061
 0.5   0.085 0.071 0.058 0.057   0.115 0.077 0.060 0.058   0.152 0.083 0.064 0.063
 0.8   0.089 0.072 0.057 0.056   0.123 0.077 0.058 0.057   0.164 0.085 0.064 0.065
 0.9   0.080 0.067 0.057 0.056   0.110 0.074 0.059 0.058   0.142 0.084 0.066 0.065

              p = 1, q = 2              p = 2, q = 2              p = 3, q = 2
-0.8   0.096 0.080 0.056 0.056   0.130 0.090 0.059 0.058   0.177 0.109 0.067 0.066
-0.5   0.095 0.079 0.055 0.057   0.123 0.089 0.060 0.060   0.165 0.106 0.065 0.064
 0.0   0.074 0.066 0.053 0.054   0.096 0.074 0.056 0.057   0.117 0.083 0.059 0.060
 0.5   0.100 0.080 0.054 0.055   0.136 0.094 0.059 0.058   0.184 0.110 0.065 0.066
 0.8   0.103 0.083 0.054 0.055   0.146 0.096 0.059 0.056   0.202 0.113 0.063 0.063
 0.9   0.093 0.079 0.055 0.056   0.131 0.093 0.059 0.058   0.172 0.105 0.065 0.065


Table 5: Empirical size of the nominal 5% χ² test, the F∞[0] test and the F∞[q] test based on the Parzen kernel LRV estimator for the AR(1) case with T = 100, number of joint hypotheses p and number of overidentifying restrictions q

              p = 1, q = 0            p = 2, q = 0            p = 3, q = 0
  ρ      χ²   F∞[0] F∞[q]       χ²   F∞[0] F∞[q]       χ²   F∞[0] F∞[q]
-0.8   0.132 0.091 0.091      0.215 0.112 0.112      0.326 0.153 0.153
-0.5   0.085 0.073 0.073      0.116 0.082 0.082      0.160 0.100 0.100
 0.0   0.063 0.059 0.059      0.071 0.058 0.058      0.093 0.070 0.070
 0.5   0.093 0.074 0.074      0.131 0.076 0.076      0.182 0.096 0.096
 0.8   0.164 0.102 0.102      0.273 0.126 0.126      0.408 0.154 0.154
 0.9   0.216 0.124 0.124      0.382 0.164 0.164      0.559 0.212 0.212

              p = 1, q = 1            p = 2, q = 1            p = 3, q = 1
-0.8   0.197 0.145 0.110      0.313 0.174 0.127      0.443 0.224 0.164
-0.5   0.105 0.091 0.076      0.150 0.104 0.088      0.201 0.126 0.104
 0.0   0.073 0.067 0.059      0.096 0.077 0.067      0.121 0.084 0.074
 0.5   0.117 0.095 0.079      0.169 0.111 0.090      0.236 0.132 0.104
 0.8   0.223 0.151 0.110      0.362 0.189 0.132      0.521 0.230 0.160
 0.9   0.315 0.204 0.145      0.521 0.271 0.182      0.704 0.329 0.226

              p = 1, q = 2            p = 2, q = 2            p = 3, q = 2
-0.8   0.257 0.195 0.115      0.406 0.242 0.139      0.555 0.305 0.166
-0.5   0.128 0.110 0.078      0.188 0.131 0.091      0.248 0.155 0.103
 0.0   0.086 0.079 0.060      0.114 0.091 0.070      0.150 0.108 0.080
 0.5   0.139 0.115 0.081      0.210 0.140 0.088      0.291 0.167 0.105
 0.8   0.282 0.201 0.113      0.456 0.260 0.135      0.613 0.319 0.157
 0.9   0.384 0.269 0.144      0.605 0.365 0.190      0.774 0.438 0.221

Table 6: Empirical size of the nominal 5% χ² test, the F∞[0] test and the F∞[q] test based on the Parzen kernel LRV estimator for the MA(1) case with T = 100, number of joint hypotheses p and number of overidentifying restrictions q

              p = 1, q = 0            p = 2, q = 0            p = 3, q = 0
  ρ      χ²   F∞[0] F∞[q]       χ²   F∞[0] F∞[q]       χ²   F∞[0] F∞[q]
-0.8   0.075 0.065 0.065      0.101 0.070 0.070      0.133 0.086 0.086
-0.5   0.073 0.065 0.065      0.096 0.070 0.070      0.126 0.085 0.085
 0.0   0.063 0.059 0.059      0.071 0.058 0.058      0.093 0.070 0.070
 0.5   0.079 0.064 0.064      0.107 0.069 0.069      0.150 0.084 0.084
 0.8   0.081 0.066 0.066      0.113 0.070 0.070      0.160 0.086 0.086
 0.9   0.076 0.063 0.063      0.103 0.070 0.070      0.140 0.082 0.082

              p = 1, q = 1            p = 2, q = 1            p = 3, q = 1
-0.8   0.091 0.079 0.067      0.133 0.094 0.078      0.173 0.108 0.088
-0.5   0.086 0.076 0.067      0.126 0.094 0.078      0.162 0.106 0.087
 0.0   0.073 0.067 0.059      0.096 0.077 0.067      0.121 0.084 0.074
 0.5   0.097 0.081 0.070      0.140 0.094 0.077      0.191 0.111 0.088
 0.8   0.104 0.083 0.069      0.149 0.097 0.077      0.203 0.116 0.090
 0.9   0.095 0.081 0.069      0.134 0.092 0.075      0.181 0.108 0.091

              p = 1, q = 2            p = 2, q = 2            p = 3, q = 2
-0.8   0.114 0.099 0.072      0.169 0.118 0.083      0.215 0.139 0.093
-0.5   0.110 0.096 0.069      0.157 0.114 0.081      0.201 0.133 0.091
 0.0   0.086 0.079 0.060      0.114 0.091 0.070      0.150 0.108 0.080
 0.5   0.120 0.100 0.071      0.168 0.117 0.083      0.229 0.138 0.092
 0.8   0.125 0.105 0.072      0.181 0.122 0.081      0.249 0.144 0.091
 0.9   0.113 0.098 0.071      0.160 0.114 0.081      0.216 0.133 0.091


Table 7: Empirical size of the nominal 5% χ² test, the F∞[0] test and the F∞[q] test based on the Parzen kernel LRV estimator for the AR(1) case with T = 200, number of joint hypotheses p and number of overidentifying restrictions q

              p = 1, q = 0            p = 2, q = 0            p = 3, q = 0
  ρ      χ²   F∞[0] F∞[q]       χ²   F∞[0] F∞[q]       χ²   F∞[0] F∞[q]
-0.8   0.110 0.090 0.090      0.150 0.098 0.098      0.218 0.123 0.123
-0.5   0.075 0.073 0.073      0.090 0.078 0.078      0.114 0.093 0.093
 0.0   0.059 0.061 0.061      0.061 0.058 0.058      0.074 0.070 0.070
 0.5   0.080 0.073 0.073      0.096 0.075 0.075      0.126 0.086 0.086
 0.8   0.124 0.091 0.091      0.186 0.102 0.102      0.261 0.122 0.122
 0.9   0.170 0.105 0.105      0.286 0.129 0.129      0.421 0.161 0.161

              p = 1, q = 1            p = 2, q = 1            p = 3, q = 1
-0.8   0.143 0.117 0.096      0.208 0.136 0.110      0.293 0.166 0.134
-0.5   0.083 0.079 0.071      0.108 0.091 0.083      0.137 0.109 0.100
 0.0   0.063 0.064 0.061      0.074 0.070 0.066      0.086 0.079 0.075
 0.5   0.091 0.085 0.074      0.117 0.091 0.079      0.147 0.102 0.088
 0.8   0.155 0.118 0.094      0.239 0.142 0.110      0.328 0.167 0.127
 0.9   0.235 0.160 0.116      0.391 0.199 0.140      0.542 0.243 0.170

              p = 1, q = 2            p = 2, q = 2            p = 3, q = 2
-0.8   0.168 0.138 0.094      0.257 0.174 0.114      0.349 0.213 0.137
-0.5   0.092 0.087 0.073      0.124 0.103 0.083      0.149 0.116 0.095
 0.0   0.066 0.067 0.061      0.078 0.074 0.065      0.092 0.084 0.074
 0.5   0.098 0.090 0.071      0.131 0.103 0.079      0.172 0.121 0.092
 0.8   0.193 0.154 0.096      0.295 0.185 0.113      0.415 0.230 0.129
 0.9   0.296 0.215 0.121      0.469 0.275 0.147      0.637 0.338 0.173

Table 8: Empirical size of the nominal 5% χ² test, the F∞[0] test and the F∞[q] test based on the Parzen kernel LRV estimator for the MA(1) case with T = 200, number of joint hypotheses p and number of overidentifying restrictions q

              p = 1, q = 0            p = 2, q = 0            p = 3, q = 0
  ρ      χ²   F∞[0] F∞[q]       χ²   F∞[0] F∞[q]       χ²   F∞[0] F∞[q]
-0.8   0.065 0.064 0.064      0.077 0.068 0.068      0.098 0.077 0.077
-0.5   0.065 0.065 0.065      0.075 0.067 0.067      0.093 0.077 0.077
 0.0   0.059 0.061 0.061      0.061 0.058 0.058      0.074 0.070 0.070
 0.5   0.070 0.066 0.066      0.082 0.066 0.066      0.104 0.074 0.074
 0.8   0.071 0.066 0.066      0.085 0.066 0.066      0.111 0.075 0.075
 0.9   0.069 0.066 0.066      0.079 0.065 0.065      0.101 0.076 0.076

              p = 1, q = 1            p = 2, q = 1            p = 3, q = 1
-0.8   0.076 0.074 0.065      0.095 0.082 0.074      0.118 0.093 0.085
-0.5   0.074 0.073 0.066      0.091 0.082 0.074      0.113 0.093 0.086
 0.0   0.063 0.064 0.061      0.074 0.070 0.066      0.086 0.079 0.075
 0.5   0.077 0.074 0.066      0.097 0.080 0.070      0.118 0.089 0.080
 0.8   0.081 0.076 0.066      0.102 0.081 0.071      0.126 0.090 0.078
 0.9   0.075 0.072 0.064      0.095 0.078 0.069      0.114 0.089 0.080

              p = 1, q = 2            p = 2, q = 2            p = 3, q = 2
-0.8   0.087 0.083 0.069      0.107 0.092 0.073      0.135 0.108 0.089
-0.5   0.085 0.083 0.069      0.104 0.088 0.072      0.131 0.106 0.086
 0.0   0.066 0.067 0.061      0.078 0.074 0.065      0.092 0.084 0.074
 0.5   0.085 0.082 0.066      0.111 0.092 0.072      0.137 0.104 0.083
 0.8   0.089 0.083 0.065      0.118 0.094 0.072      0.145 0.107 0.082
 0.9   0.083 0.081 0.066      0.108 0.090 0.071      0.132 0.102 0.085


Table 9: Lower limits of different 95% CIs for β and data-driven smoothing parameters in the log-normal stochastic volatility model

                        Equally-weighted Return       Value-weighted Return
                        Baseline    Alternative       Baseline    Alternative
Series    χ²             0.74        0.58              0.78        0.62
Series    F∞[0]          0.73        0.57              0.78        0.62
Series    F∞[q]          0.70        0.53              0.76        0.57
K                       (140)       (138)             (140)       (140)

Bartlett  χ²             0.56        0.52              0.78        0.60
Bartlett  F∞[0]          0.54        0.52              0.77        0.60
Bartlett  F∞[q]          0.35        0.42              0.75        0.53
b                       (.0155)     (.0157)           (.0155)     (.0155)

Parzen    χ²             0.72        0.52              0.78        0.50
Parzen    F∞[0]          0.73        0.52              0.78        0.50
Parzen    F∞[q]          0.69        0.47              0.77        0.43
b                       (.0136)     (.0139)           (.0136)     (.0136)

QS        χ²             0.74        0.52              0.78        0.52
QS        F∞[0]          0.74        0.52              0.78        0.52
QS        F∞[q]          0.71        0.48              0.77        0.46
b                       (.0068)     (.0069)           (.0067)     (.0067)

Baseline: CI using the baseline nine moment conditions given in (22).
Alternative: CI using the alternative nine moment conditions given in (23).
Series χ²: the CI constructed based on the OS HAR variance estimator and using χ²₁ as the reference distribution. Other row titles are similarly defined.


8 Appendix of Proofs

8.1 Technical Lemmas

Consider two stochastically bounded sequences of random variables $\{\xi_T \in \mathbb{R}^p\}$ and $\{\eta_T \in \mathbb{R}^p\}$. Given the stochastic boundedness, there exists a nondecreasing mapping $s(T): \mathbb{N} \to \mathbb{N}$ such that $\xi_{s(T)} \overset{d}{\to} \xi_\infty$ and $\eta_{s(T)} \overset{d}{\to} \eta_\infty$ for some random vectors $\xi_\infty$ and $\eta_\infty$. Let $\mathcal{C}(\xi_\infty, \eta_\infty)$ be the set of points at which the CDFs of $\xi_\infty$ and $\eta_\infty$ are continuous. The following lemma provides a characterization of asymptotic equivalence in distribution. The proof is similar to that of the Lévy-Cramér continuity theorem and is omitted here.

Lemma 2 The following two statements are equivalent:
(a) $P(\xi_{s(T)} \le x) = P(\eta_{s(T)} \le x) + o(1)$ for all $x \in \mathcal{C}(\xi_\infty, \eta_\infty)$;
(b) $\xi_{s(T)} \overset{a}{\sim} \eta_{s(T)}$, i.e., the two sequences are asymptotically equivalent in distribution.

Lemma 3 If $(\xi_{1T}, \eta_{1T})$ converges weakly to $(\xi, \eta)$ and

$$P(\xi_{1T} \le x, \eta_{1T} \le y) = P(\xi_{2T} \le x, \eta_{2T} \le y) + o(1) \ \text{ as } T \to \infty \tag{24}$$

for all continuity points $(x, y)$ of the CDF of $(\xi, \eta)$, then

$$g(\xi_{1T}, \eta_{1T}) \overset{a}{\sim} g(\xi_{2T}, \eta_{2T}),$$

where $g(\cdot, \cdot)$ is continuous on a set $C$ such that $P((\xi, \eta) \in C) = 1$.

Proof of Lemma 3. Since $(\xi_{1T}, \eta_{1T})$ converges weakly, we can apply Lemma 2 with $s(T) = T$. It follows from the condition in (24) that $Ef(\xi_{1T}, \eta_{1T}) - Ef(\xi_{2T}, \eta_{2T}) \to 0$ for any $f \in \mathcal{BC}$, the class of bounded continuous functions. But $Ef(\xi_{1T}, \eta_{1T}) \to Ef(\xi, \eta)$, and so $Ef(\xi_{2T}, \eta_{2T}) \to Ef(\xi, \eta)$ for any $f \in \mathcal{BC}$. That is, $(\xi_{2T}, \eta_{2T})$ also converges weakly to $(\xi, \eta)$. Using the same argument as in the proof of the continuous mapping theorem, we have $Ef(g(\xi_{1T}, \eta_{1T})) - Ef(g(\xi_{2T}, \eta_{2T})) \to 0$ for any $f \in \mathcal{BC}$. Therefore $g(\xi_{1T}, \eta_{1T}) \overset{a}{\sim} g(\xi_{2T}, \eta_{2T})$.

Lemma 4 Suppose $\omega_T = \xi_{T,J} + \eta_{T,J}$ and $\omega_T$ does not depend on $J$. Assume that there exist $\xi^*_{T,J}$ and $\xi_T$ such that
(i) $P(\xi_{T,J} < \lambda) - P(\xi^*_{T,J} < \lambda) = o(1)$ for each finite $J$ and each $\lambda \in \mathbb{R}$ as $T \to \infty$;
(ii) $P(\xi^*_{T,J} < \lambda) - P(\xi_T < \lambda) = o(1)$ uniformly over sufficiently large $T$ for each $\lambda \in \mathbb{R}$ as $J \to \infty$;
(iii) the CDF of $\xi_T$ is equicontinuous on $\mathbb{R}$ when $T$ is sufficiently large; and
(iv) for every $\delta > 0$, $\sup_T P(|\eta_{T,J}| > \delta) \to 0$ as $J \to \infty$.
Then

$$P(\omega_T < \lambda) = P(\xi_T < \lambda) + o(1) \ \text{ for each } \lambda \in \mathbb{R} \text{ as } T \to \infty.$$

Proof of Lemma 4. Let $\varepsilon > 0$ and $\lambda \in \mathbb{R}$. Under condition (iii), we can find a $\delta := \delta(\varepsilon) > 0$ such that for some integer $T_{\min} > 0$

$$P(\lambda - \delta \le \xi_T < \lambda + \delta) \le \varepsilon$$

for all $T \ge T_{\min}$. Here $T_{\min}$ does not depend on $\lambda$ or $\varepsilon$. Under condition (iv), we can find a $J_{\min} := J_{\min}(\varepsilon)$ that does not depend on $T$ such that

$$P(|\eta_{T,J}| > \delta) \le \varepsilon$$

for all $J \ge J_{\min}$ and all $T$. From condition (ii), we can find a $J'_{\min} \ge J_{\min}$ and a $T'_{\min} \ge T_{\min}$ such that

$$\left|P(\xi^*_{T,J} < \lambda) - P(\xi_T < \lambda)\right| \le \varepsilon$$

for all $J \ge J'_{\min}$ and all $T \ge T'_{\min}$. It follows from condition (i) that for any finite $J_0 \ge J'_{\min}$, there exists a $T''_{\min}(J_0) \ge T'_{\min} \ge T_{\min}$ such that

$$\left|P(\xi_{T,J_0} < \lambda + \delta) - P(\xi^*_{T,J_0} < \lambda + \delta)\right| \le \varepsilon,$$
$$\left|P(\xi_{T,J_0} < \lambda - \delta) - P(\xi^*_{T,J_0} < \lambda - \delta)\right| \le \varepsilon$$

for $T \ge T''_{\min}(J_0)$.

When $T \ge T''_{\min}(J_0)$, we have

$$P(\omega_T \le \lambda) = P(\xi_{T,J_0} + \eta_{T,J_0} \le \lambda) \le P(\xi_{T,J_0} \le \lambda + \delta) + P(|\eta_{T,J_0}| > \delta)$$
$$\le P(\xi^*_{T,J_0} \le \lambda + \delta) + 2\varepsilon \le P(\xi_T < \lambda + \delta) + 3\varepsilon \le P(\xi_T < \lambda) + 4\varepsilon.$$

Similarly,

$$P(\omega_T \le \lambda) = P(\xi_{T,J_0} + \eta_{T,J_0} \le \lambda) \ge P(\xi_{T,J_0} \le \lambda - \delta) - P(|\eta_{T,J_0}| \ge \delta)$$
$$\ge P(\xi^*_{T,J_0} \le \lambda - \delta) - 2\varepsilon \ge P(\xi_T \le \lambda - \delta) - 3\varepsilon \ge P(\xi_T \le \lambda) - 4\varepsilon.$$

Since the above two inequalities hold for all $\varepsilon > 0$, we must have $P(\omega_T < \lambda) = P(\xi_T < \lambda) + o(1)$ as $T \to \infty$.

8.2 Proof of the Main Results

Proof of Lemma 1. Part (a). Let
$$e_T(s) = \frac{1}{T}\sum_{r=1}^{T} Q_h\Big(\frac{r}{T}, s\Big) - \int_0^1 Q_h(r, s)\,dr.$$
Then
$$W_T(\tilde\theta_T) - W^*_T(\tilde\theta_T) = -\Big[\frac{1}{T}\sum_{t=1}^T f(v_t, \tilde\theta_T)\Big]\Big[\sum_{s=1}^T e_T(s) f(v_s, \tilde\theta_T)'\Big] - \Big[\frac{1}{T}\sum_{t=1}^T e_T(t) f(v_t, \tilde\theta_T)\Big]\Big[\sum_{s=1}^T f(v_s, \tilde\theta_T)'\Big]$$
$$\qquad + \Big[\frac{1}{T^2}\sum_{\tau=1}^T\sum_{r=1}^T Q_h\Big(\frac{\tau}{T}, \frac{r}{T}\Big) - \int_0^1\int_0^1 Q_h(\tau, r)\,d\tau\,dr\Big]\Big[\frac{1}{T}\sum_{t=1}^T\sum_{s=1}^T f(v_t, \tilde\theta_T) f(v_s, \tilde\theta_T)'\Big].$$
Under Assumption 1, we have, for any fixed $h$:
$$\frac{1}{T^2}\sum_{\tau=1}^T\sum_{r=1}^T Q_h\Big(\frac{\tau}{T}, \frac{r}{T}\Big) - \int_0^1\int_0^1 Q_h(\tau, r)\,d\tau\,dr = O\Big(\frac{1}{T}\Big).$$
Hence
$$\|W_T(\tilde\theta_T) - W^*_T(\tilde\theta_T)\| \le \Big\|\frac{1}{\sqrt T}\sum_{t=1}^T f(v_t, \tilde\theta_T)\Big\|\cdot\Big\|\frac{1}{\sqrt T}\sum_{s=1}^T e_T(s) f(v_s, \tilde\theta_T)\Big\| + \Big\|\frac{1}{\sqrt T}\sum_{t=1}^T e_T(t) f(v_t, \tilde\theta_T)\Big\|\cdot\Big\|\frac{1}{\sqrt T}\sum_{s=1}^T f(v_s, \tilde\theta_T)\Big\| + O\Big(\frac{1}{T}\Big)\Big\|\frac{1}{\sqrt T}\sum_{t=1}^T f(v_t, \tilde\theta_T)\Big\|^2. \quad (25)$$
We proceed to evaluate the orders of the terms in the above upper bound. First, for some $\ddot\theta_T$ between $\tilde\theta_T$ and $\theta_0$, we have
$$\frac{1}{\sqrt T}\sum_{t=1}^T f(v_t, \tilde\theta_T) = \frac{1}{\sqrt T}\sum_{t=1}^T f(v_t, \theta_0) + G_T(\ddot\theta_T)\sqrt T(\tilde\theta_T - \theta_0) = O_p(1) \quad (26)$$
by Assumptions 4 and 5. Here $\ddot\theta_T$ in the matrix $G_T(\ddot\theta_T)$ can be different for different rows. For notational economy, we do not make this explicit; we follow this practice in the rest of the proof. Next, using summation by parts, we have
$$\frac{1}{\sqrt T}\sum_{t=1}^T e_T(t) f(v_t, \tilde\theta_T) = \frac{1}{\sqrt T}\sum_{t=1}^T e_T(t) f(v_t, \theta_0) + \frac{1}{T}\sum_{t=1}^T e_T(t)\frac{\partial f(v_t, \ddot\theta_T)}{\partial\theta'}\sqrt T(\tilde\theta_T - \theta_0)$$
$$= \frac{1}{\sqrt T}\sum_{t=1}^T e_T(t) f(v_t, \theta_0) + \sum_{t=1}^{T-1}\big[e_T(t) - e_T(t+1)\big] G_t(\ddot\theta_T)\sqrt T(\tilde\theta_T - \theta_0) + e_T(T) G_T(\ddot\theta_T)\sqrt T(\tilde\theta_T - \theta_0).$$
Assumption 1 implies that $e_T(t) = O(1/T)$ uniformly over $t$, so $e_T(T) G_T(\ddot\theta_T)\sqrt T(\tilde\theta_T - \theta_0) = O_p(1/T)$. For any conformable vector $a$,
$$\mathrm{var}\Big[\frac{1}{\sqrt T}\sum_{t=1}^T e_T(t) a' f(v_t, \theta_0)\Big] \le \frac{1}{T^2}\sum_{j=-\infty}^{\infty}\big|a'\Gamma_j a\big| \le \frac{1}{T^2}\Big(\sum_{j=-\infty}^{\infty}\|\Gamma_j\|\Big)\|a\|^2 = O\Big(\frac{1}{T^2}\Big)$$
by Assumption 3. So $T^{-1/2}\sum_{t=1}^T e_T(t) f(v_t, \theta_0) = O_p(1/T)$. It then follows that
$$\frac{1}{\sqrt T}\sum_{t=1}^T e_T(t) f(v_t, \tilde\theta_T) = \sum_{t=1}^{T-1}\big[e_T(t) - e_T(t+1)\big] G_t(\ddot\theta_T)\sqrt T(\tilde\theta_T - \theta_0) + O_p\Big(\frac{1}{T}\Big). \quad (27)$$
Letting $\Delta_t = G_t(\ddot\theta_T) - \frac{t}{T}G$, we can write
$$\sum_{t=1}^{T-1}\big[e_T(t) - e_T(t+1)\big] G_t(\ddot\theta_T)\sqrt T(\tilde\theta_T - \theta_0) = \sum_{t=1}^{T-1}\big[e_T(t) - e_T(t+1)\big]\Delta_t\sqrt T(\tilde\theta_T - \theta_0) + \frac{1}{T}\sum_{t=1}^{T-1}\big[e_T(t) - e_T(t+1)\big]\,t\,G\sqrt T(\tilde\theta_T - \theta_0)$$
$$= \sum_{t=1}^{T-1}\big[e_T(t) - e_T(t+1)\big]\Delta_t\sqrt T(\tilde\theta_T - \theta_0) + \frac{1}{T}\sum_{t=1}^T e_T(t)\,G\sqrt T(\tilde\theta_T - \theta_0) - e_T(T)\,G\sqrt T(\tilde\theta_T - \theta_0)$$
$$= \sum_{t=1}^{T-1}\big[e_T(t) - e_T(t+1)\big]\Delta_t\sqrt T(\tilde\theta_T - \theta_0) + O_p\Big(\frac{1}{T}\Big). \quad (28)$$
Given that $\sup_t\|\Delta_t\| = o_p(1)$ under Assumption 4 and $e_T(\cdot)$ is piecewise monotonic under Assumption 1, we can easily show that $\sum_{t=1}^{T-1}[e_T(t) - e_T(t+1)]\Delta_t\sqrt T(\tilde\theta_T - \theta_0) = o_p(1/T)$. Combining this with (27) and (28) yields
$$T^{-1/2}\sum_{t=1}^T e_T(t) f(v_t, \tilde\theta_T) = O_p(1/T). \quad (29)$$

Part (a) now follows from (25), (26) and (29).

Part (b). For some $\ddot\theta_T$ between $\tilde\theta_T$ and $\theta_0$, we have
$$W^*_T(\tilde\theta_T) = \frac{1}{T}\sum_{t=1}^T\sum_{\tau=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)\Big[u_t + \frac{\partial f(v_t, \ddot\theta_T)}{\partial\theta'}(\tilde\theta_T - \theta_0)\Big]\Big[u_\tau + \frac{\partial f(v_\tau, \ddot\theta_T)}{\partial\theta'}(\tilde\theta_T - \theta_0)\Big]' = W^*_T(\theta_0) + I_1 + I_1' + I_2, \quad (30)$$
where
$$I_1 = \frac{1}{T}\sum_{\tau=1}^T\sum_{t=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)\frac{\partial f(v_t, \ddot\theta_T)}{\partial\theta'}(\tilde\theta_T - \theta_0)\,u_\tau',$$
$$I_2 = \frac{1}{T^2}\sum_{t=1}^T\sum_{\tau=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)\Big[\frac{\partial f(v_t, \ddot\theta_T)}{\partial\theta'}\sqrt T(\tilde\theta_T - \theta_0)\Big]\Big[\frac{\partial f(v_\tau, \ddot\theta_T)}{\partial\theta'}\sqrt T(\tilde\theta_T - \theta_0)\Big]'.$$
For a sequence of vectors or matrices $\{C_t\}$, define $S_0(C) = 0$ and $S_t(C) = T^{-1}\sum_{\tau=1}^t C_\tau$. Using summation by parts, we obtain, for two sequences of matrices $\{A_t\}$ and $\{B_t\}$:
$$\frac{1}{T^2}\sum_{\tau=1}^T\sum_{t=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big) A_t B_\tau' = \sum_{\tau=1}^{T-1}\sum_{t=1}^{T-1}\nabla Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big) S_t(A) S_\tau'(B) + \sum_{t=1}^{T-1}\Big[Q^*_h\Big(\frac{t}{T}, 1\Big) - Q^*_h\Big(\frac{t+1}{T}, 1\Big)\Big] S_t(A) S_T'(B)$$
$$\qquad + \sum_{\tau=1}^{T-1}\Big[Q^*_h\Big(1, \frac{\tau}{T}\Big) - Q^*_h\Big(1, \frac{\tau+1}{T}\Big)\Big] S_T(A) S_\tau'(B) + Q^*_h(1,1)\,S_T(A) S_T'(B),$$
where
$$\nabla Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big) := Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big) - Q^*_h\Big(\frac{t+1}{T}, \frac{\tau}{T}\Big) - Q^*_h\Big(\frac{t}{T}, \frac{\tau+1}{T}\Big) + Q^*_h\Big(\frac{t+1}{T}, \frac{\tau+1}{T}\Big).$$
Plugging $A_t = \frac{\partial f(v_t, \ddot\theta_T)}{\partial\theta'}(\tilde\theta_T - \theta_0)$ and $B_t = u_t$ into this expression and letting $\Delta_t = G_t(\ddot\theta_T) - \frac{t}{T}G$ as before, we obtain a new representation of $I_1$:
$$I_1 = T\sum_{\tau=1}^{T-1}\sum_{t=1}^{T-1}\nabla Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)\Big(\frac{t}{T}G + \Delta_t\Big)(\tilde\theta_T - \theta_0) S_\tau'(u) + T\sum_{t=1}^{T-1}\Big[Q^*_h\Big(\frac{t}{T}, 1\Big) - Q^*_h\Big(\frac{t+1}{T}, 1\Big)\Big]\Big(\frac{t}{T}G + \Delta_t\Big)(\tilde\theta_T - \theta_0) S_T'(u)$$
$$\qquad + T\sum_{\tau=1}^{T-1}\Big[Q^*_h\Big(1, \frac{\tau}{T}\Big) - Q^*_h\Big(1, \frac{\tau+1}{T}\Big)\Big](G + \Delta_T)(\tilde\theta_T - \theta_0) S_\tau'(u) + TQ^*_h(1,1)(G + \Delta_T)(\tilde\theta_T - \theta_0) S_T'(u)$$
$$= T\sum_{\tau=1}^{T-1}\sum_{t=1}^{T-1}\nabla Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)\Delta_t(\tilde\theta_T - \theta_0) S_\tau'(u) + T\sum_{t=1}^{T-1}\Big[Q^*_h\Big(\frac{t}{T}, 1\Big) - Q^*_h\Big(\frac{t+1}{T}, 1\Big)\Big]\Delta_t(\tilde\theta_T - \theta_0) S_T'(u)$$
$$\qquad + T\sum_{\tau=1}^{T-1}\Big[Q^*_h\Big(1, \frac{\tau}{T}\Big) - Q^*_h\Big(1, \frac{\tau+1}{T}\Big)\Big]\Delta_T(\tilde\theta_T - \theta_0) S_\tau'(u) + TQ^*_h(1,1)\,\Delta_T(\tilde\theta_T - \theta_0) S_T'(u) + G(\tilde\theta_T - \theta_0)\frac{1}{T}\sum_{\tau=1}^T\sum_{t=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)u_\tau'$$
$$:= I_{11} + I_{12} + I_{13} + I_{14} + I_{15}.$$

We now show that each of the five terms in the above expression is $o_p(1)$. It is easy to see that
$$I_{15} = G(\tilde\theta_T - \theta_0)\frac{1}{T}\sum_{\tau=1}^T\sum_{t=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)u_\tau' = G\sqrt T(\tilde\theta_T - \theta_0)\frac{1}{\sqrt T}\sum_{\tau=1}^T\Big[\frac{1}{T}\sum_{t=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)\Big]u_\tau'$$
$$= G\sqrt T(\tilde\theta_T - \theta_0)\frac{1}{\sqrt T}\sum_{\tau=1}^T\Big[\int_0^1 Q^*_h\Big(t, \frac{\tau}{T}\Big)dt + O\Big(\frac{1}{T}\Big)\Big]u_\tau' = G\sqrt T(\tilde\theta_T - \theta_0)\frac{1}{\sqrt T}\sum_{\tau=1}^T O\Big(\frac{1}{T}\Big)u_\tau' = o_p(1),$$
where we have used $\int_0^1 Q^*_h(t, s)\,dt = 0$ for the centered weighting function. Next,
$$I_{14} = TQ^*_h(1,1)\,\Delta_T(\tilde\theta_T - \theta_0) S_T'(u) = Q^*_h(1,1)\,\Delta_T\sqrt T(\tilde\theta_T - \theta_0)\big[\sqrt T S_T(u)\big]' = o_p(1),$$
and, by summation by parts,
$$I_{13} = T\sum_{\tau=1}^{T-1}\Big[Q^*_h\Big(1, \frac{\tau}{T}\Big) - Q^*_h\Big(1, \frac{\tau+1}{T}\Big)\Big]\Delta_T(\tilde\theta_T - \theta_0) S_\tau'(u) = \Delta_T\big[\sqrt T(\tilde\theta_T - \theta_0)\big]\Big[\frac{1}{\sqrt T}\sum_{\tau=1}^T Q^*_h\Big(1, \frac{\tau}{T}\Big)u_\tau\Big]' - \Delta_T\big[\sqrt T(\tilde\theta_T - \theta_0)\big]Q^*_h(1,1)\big[\sqrt T S_T(u)\big]' = o_p(1),$$
since $\Delta_T = o_p(1)$ while the remaining factors are $O_p(1)$.

To show that $I_{12} = o_p(1)$, we note that under the piecewise monotonicity condition in Assumption 1, $Q^*_h(t/T, 1)$ is piecewise monotonic in $t/T$. So for some finite $J$ we can partition the set $\{1, 2, \dots, T\}$ into $J$ maximal non-overlapping subsets $\cup_{j=1}^J I_j$ such that $Q^*_h(t/T, 1)$ is monotonic on each $I_j := \{I_{jL}, \dots, I_{jU}\}$. Now
$$\Big\|\sum_{t=1}^{T-1}\Big[Q^*_h\Big(\frac{t}{T},1\Big) - Q^*_h\Big(\frac{t+1}{T},1\Big)\Big]\Delta_t\Big\| \le \sum_{j=1}^J\Big\|\sum_{t\in I_j}\Big[Q^*_h\Big(\frac{t}{T},1\Big) - Q^*_h\Big(\frac{t+1}{T},1\Big)\Big]\Delta_t\Big\| + o_p(1)$$
$$\le \sum_{j=1}^J\sum_{t\in I_j}\Big|Q^*_h\Big(\frac{t}{T},1\Big) - Q^*_h\Big(\frac{t+1}{T},1\Big)\Big|\,\sup_s\|\Delta_s\| + o_p(1) = \sum_{j=1}^J(\pm)_j\sum_{t\in I_j}\Big[Q^*_h\Big(\frac{t+1}{T},1\Big) - Q^*_h\Big(\frac{t}{T},1\Big)\Big]\sup_s\|\Delta_s\| + o_p(1)$$
$$= \sum_{j=1}^J\Big|Q^*_h\Big(\frac{I_{jU}}{T},1\Big) - Q^*_h\Big(\frac{I_{jL}}{T},1\Big)\Big|\,\sup_s\|\Delta_s\| + o_p(1) = O(1)\sup_s\|\Delta_s\| + o_p(1) = o_p(1),$$
where the $o_p(1)$ term in the first inequality reflects the case in which $t$ and $t+1$ belong to different subsets, and $(\pm)_j$ takes $+$ or $-$ depending on whether $Q^*_h(t/T, 1)$ is increasing or decreasing on the interval $[I_{jL}, I_{jU}]$. As a result,
$$\|I_{12}\| = \Big\|T\sum_{t=1}^{T-1}\Big[Q^*_h\Big(\frac{t}{T},1\Big) - Q^*_h\Big(\frac{t+1}{T},1\Big)\Big]\Delta_t(\tilde\theta_T - \theta_0)S_T'(u)\Big\| \le \Big\|\sum_{t=1}^{T-1}\Big[Q^*_h\Big(\frac{t}{T},1\Big) - Q^*_h\Big(\frac{t+1}{T},1\Big)\Big]\Delta_t\Big\|\,\big\|\sqrt T(\tilde\theta_T - \theta_0)\big\|\,\big\|\sqrt T S_T(u)\big\| = o_p(1). \quad (31)$$
To show that $I_{11} = o_p(1)$, we use a similar argument. Under the piecewise monotonicity condition, we can partition the set of lattice points $\{(i,j): i, j = 1, 2, \dots, T\}$ into a finite number of maximal non-overlapping subsets
$$\mathcal{I}_\ell = \{(i,j): I^*_{\ell 1L} \le i \le I^*_{\ell 1U},\ I^*_{\ell 2L} \le j \le I^*_{\ell 2U}\}$$
for $\ell = 1, \dots, \ell_{\max}$ such that $\nabla Q^*_h(i/T, j/T)$ has the same sign for all $(i,j) \in \mathcal{I}_\ell$. Now
$$\|I_{11}\| = \Big\|\sum_{\tau=1}^{T-1}\sum_{t=1}^{T-1}\nabla Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)\Delta_t\big[\sqrt T(\tilde\theta_T - \theta_0)\big]\big[\sqrt T S_\tau'(u)\big]\Big\| = \Big\|\sum_{\ell=1}^{\ell_{\max}}\sum_{(i,j)\in\mathcal{I}_\ell}\nabla Q^*_h\Big(\frac{i}{T}, \frac{j}{T}\Big)\Delta_i\big[\sqrt T(\tilde\theta_T - \theta_0)\big]\big[\sqrt T S_j'(u)\big]\Big\|$$
$$\le \sum_{\ell=1}^{\ell_{\max}}\Big|(\pm)_\ell\sum_{(i,j)\in\mathcal{I}_\ell}\nabla Q^*_h\Big(\frac{i}{T}, \frac{j}{T}\Big)\Big|\,\sup_t\|\Delta_t\|\,\big\|\sqrt T(\tilde\theta_T - \theta_0)\big\|\,\sup_\tau\big\|\sqrt T S_\tau(u)\big\|$$
$$= \sum_{\ell=1}^{\ell_{\max}}\Big|Q^*_h\Big(\frac{I^*_{\ell 1L}}{T}, \frac{I^*_{\ell 2L}}{T}\Big) - Q^*_h\Big(\frac{I^*_{\ell 1U}}{T}, \frac{I^*_{\ell 2L}}{T}\Big) - Q^*_h\Big(\frac{I^*_{\ell 1L}}{T}, \frac{I^*_{\ell 2U}}{T}\Big) + Q^*_h\Big(\frac{I^*_{\ell 1U}}{T}, \frac{I^*_{\ell 2U}}{T}\Big)\Big|\cdot o_p(1) = o_p(1).$$
We have therefore proved that $I_1 = o_p(1)$. We can use similar arguments to show that $I_2 = o_p(1)$; details are omitted. This completes the proof of Part (b).

Parts (c) and (d). We start by establishing the series representation of the weighting function as given in (4). The representation trivially holds for the OS-HAR variance estimator, whose weighting function is defined directly through the basis functions $\Phi_j(\cdot)$. It remains to show that it also holds for the contracted kernel $k_b(\cdot)$ and the exponentiated kernel $k_\rho(\cdot)$. We focus on the former, as the proof for the latter is similar. Under Assumption 1(a), $k_b(x)$ has a Fourier cosine series expansion

$$k_b(x) = c + \sum_{j=1}^\infty \tilde\lambda_j \cos j\pi x, \quad (32)$$
where $c$ is the constant term in the expansion, $\sum_{j=1}^\infty|\tilde\lambda_j| < \infty$, and the right hand side converges uniformly over $x \in [-1, 1]$. This implies that
$$k_b(r - s) = c + \sum_{j=1}^\infty \tilde\lambda_j \cos j\pi r\cos j\pi s + \sum_{j=1}^\infty \tilde\lambda_j \sin j\pi r\sin j\pi s$$
and the right hand side converges uniformly over $(r, s) \in [0,1]\times[0,1]$. Now
$$k^*_h(r, s) = k_b(r - s) - \int_0^1 k_b(\tau - s)\,d\tau - \int_0^1 k_b(r - \tau)\,d\tau + \int_0^1\int_0^1 k_b(\tau_1 - \tau_2)\,d\tau_1 d\tau_2 \quad (33)$$
$$= \sum_{j=1}^\infty \tilde\lambda_j\cos j\pi r\cos j\pi s + \sum_{j=1}^\infty \tilde\lambda_j\Big(\sin j\pi r - \int_0^1\sin j\pi\tau\,d\tau\Big)\Big(\sin j\pi s - \int_0^1\sin j\pi\tau\,d\tau\Big) := \sum_{j=1}^\infty \lambda_j\Phi_j(r)\Phi_j(s)$$
by taking
$$\Phi_j(r) = \begin{cases}\cos\big(\tfrac12\pi j r\big), & j \text{ even},\\[2pt] \sin\big(\tfrac12\pi(j+1)r\big) - \int_0^1\sin\big(\tfrac12\pi(j+1)\tau\big)d\tau, & j \text{ odd},\end{cases}$$
and
$$\lambda_j = \begin{cases}\tilde\lambda_{j/2}, & j \text{ even},\\ \tilde\lambda_{(j+1)/2}, & j \text{ odd}.\end{cases}$$
Obviously $\Phi_j(r)$ is continuously differentiable and satisfies $\int_0^1\Phi_j(r)\,dr = 0$ for all $j = 1, 2, \dots$. The uniform convergence of $\sum_{j=1}^\infty\lambda_j\Phi_j(r)\Phi_j(s)$ to $k^*_h(r, s)$ is inherited from the uniform convergence of the Fourier series in (32).

In view of part (b), it suffices to show that $W^*_T(\theta_0) \overset{a}{\sim} W_{eT}$. Using the Cramér--Wold device, it suffices to show that
$$\mathrm{tr}\big[W^*_T(\theta_0)A\big] \overset{a}{\sim} \mathrm{tr}\big[W_{eT}A\big]$$
for any conformable symmetric matrix $A$. Note that
$$\mathrm{tr}\big[W^*_T(\theta_0)A\big] = \frac{1}{T}\sum_{t=1}^T\sum_{\tau=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)u_t'Au_\tau, \qquad \mathrm{tr}(W_{eT}A) = \frac{1}{T}\sum_{t=1}^T\sum_{\tau=1}^T Q^*_h\Big(\frac{t}{T}, \frac{\tau}{T}\Big)(\Lambda e_t)'A(\Lambda e_\tau).$$
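As a concrete illustration of the expansion in (32), the sketch below computes the Fourier cosine coefficients of the contracted Bartlett kernel $k_b(x) = (1 - |x|/b)1\{|x| \le b\}$ on $[-1, 1]$ in closed form and checks numerically that the truncated series reproduces $k_b$ uniformly. The closed-form coefficients $c = b/2$ and $\tilde\lambda_j = 2(1 - \cos j\pi b)/(j^2\pi^2 b)$ are derived here directly for this kernel; they are an illustration, not formulas taken from the paper.

```python
import numpy as np

def bartlett_contracted(x, b):
    """Contracted Bartlett kernel k_b(x) = k(x/b) with k(x) = max(0, 1 - |x|)."""
    return np.maximum(0.0, 1.0 - np.abs(x) / b)

def cosine_coeffs(b, J):
    """Closed-form Fourier cosine coefficients of k_b on [-1, 1]:
    k_b(x) = c + sum_{j>=1} lam_j cos(j pi x), derived by direct integration."""
    j = np.arange(1, J + 1)
    lam = 2.0 * (1.0 - np.cos(j * np.pi * b)) / (j ** 2 * np.pi ** 2 * b)
    c = b / 2.0  # constant term: half the integral of k_b over [-1, 1]
    return c, lam

b, J = 0.5, 2000
c, lam = cosine_coeffs(b, J)
x = np.linspace(-1.0, 1.0, 801)
series = c + (lam[None, :] * np.cos(np.outer(x, np.arange(1, J + 1)) * np.pi)).sum(axis=1)
max_err = np.abs(series - bartlett_contracted(x, b)).max()
```

The coefficients are nonnegative and absolutely summable (they decay like $j^{-2}$), which is what drives the uniform convergence used in the proof.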

We write $\mathrm{tr}[W^*_T(\theta_0)A]$ as
$$\mathrm{tr}\big[W^*_T(\theta_0)A\big] = \sum_{j=1}^\infty\lambda_j\Big(\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)u_t\Big)'A\Big(\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)u_t\Big) := \xi_{T,J} + \eta_{T,J},$$
where
$$\xi_{T,J} = \sum_{j=1}^J\lambda_j\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)u_t\Big]'A\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)u_t\Big], \qquad \eta_{T,J} = \sum_{j=J+1}^\infty\lambda_j\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)u_t\Big]'A\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)u_t\Big].$$
We proceed to apply Lemma 4 by verifying its four conditions. First, it follows from Assumption 5 that for every fixed $J$,
$$\xi_{T,J} \overset{a}{\sim} \xi^*_{T,J} := \sum_{j=1}^J\lambda_j\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)\Lambda e_t\Big]'A\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)\Lambda e_t\Big] \quad (34)$$
as $T \to \infty$. Second, let
$$\xi_T = \sum_{j=1}^\infty\lambda_j\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)\Lambda e_t\Big]'A\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)\Lambda e_t\Big] = \mathrm{tr}(W_{eT}A)$$
and let $A = \sum_{\ell=1}^m\sigma_\ell a_\ell a_\ell'$ be the spectral decomposition of $A$. Then
$$\xi_T - \xi^*_{T,J} = \sum_{j=J+1}^\infty\lambda_j\sum_{\ell=1}^m\sigma_\ell\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)a_\ell'\Lambda e_t\Big]^2.$$
So
$$E\big|\xi_T - \xi^*_{T,J}\big| \le C\sum_{j=J+1}^\infty|\lambda_j|\frac{1}{T}\sum_{t=1}^T\Phi_j^2\Big(\frac{t}{T}\Big) \le C\sum_{j=J+1}^\infty|\lambda_j|$$
for some constant $C$ that does not depend on $T$. That is, $E|\xi_T - \xi^*_{T,J}| \to 0$ uniformly in $T$ as $J \to \infty$. So for any $\delta > 0$ and $\varepsilon > 0$, there exists a $J_0$ that does not depend on $T$ such that
$$P\big(|\xi_T - \xi^*_{T,J}| \ge \delta\big) \le E\big|\xi_T - \xi^*_{T,J}\big|/\delta \le \varepsilon$$
for all $J \ge J_0$ and all $T$. It then follows that there exists a $T_0$ which does not depend on $J_0$ such that
$$P\big(\xi^*_{T,J} < x\big) = P\big(\xi_T + (\xi^*_{T,J} - \xi_T) < x\big) \le P(\xi_T < x + \delta) + P\big(|\xi_T - \xi^*_{T,J}| \ge \delta\big) \le P(\xi_T < x + \delta) + \varepsilon \le P(\xi_T < x) + 2\varepsilon$$
for $J \ge J_0$ and $T \ge T_0$. Here we have used the equicontinuity of the CDF of $\xi_T$ when $T$ is sufficiently large; this is verified below. Similarly, we can show that for $J \ge J_0$ and $T \ge T_0$ we have
$$P\big(\xi^*_{T,J} < x\big) \ge P(\xi_T < x) - 2\varepsilon.$$
Since the above two inequalities hold for all $\varepsilon > 0$, it must be true that $P(\xi^*_{T,J} < x) - P(\xi_T < x) = o(1)$ uniformly over sufficiently large $T$ as $J \to \infty$.

Third, we can use Lemma 4 to show that $\xi_T$ converges in distribution to $\xi_\infty$, where
$$\xi_\infty = \sum_{j=1}^\infty\lambda_j\Big[\int_0^1\Phi_j(r)\Lambda dB_m(r)\Big]'A\Big[\int_0^1\Phi_j(s)\Lambda dB_m(s)\Big] = \int_0^1\int_0^1 Q^*_h(r, s)\big[\Lambda dB_m(r)\big]'A\,\Lambda dB_m(s).$$
Details are omitted here. So for any $x$ and $\delta$, we have
$$P(x - \delta \le \xi_T < x + \delta) = P(x - \delta \le \xi_\infty < x + \delta) + o(1)$$
as $T \to \infty$. Given the continuity of the CDF of $\xi_\infty$, for any $\varepsilon > 0$ we can find a $\delta > 0$ such that $P(x - \delta \le \xi_T < x + \delta) \le \varepsilon$ when $T$ is sufficiently large. We have thus verified condition (iii) of Lemma 4. Finally, for every $\delta > 0$, we have
$$P\big(|\eta_{T,J}| > \delta\big) \le P\Big(\sum_{j=J+1}^\infty|\lambda_j|\sum_{\ell=1}^m|\sigma_\ell|\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)a_\ell'u_t\Big]^2 > \delta\Big) \le \delta^{-1}\sum_{\ell=1}^m|\sigma_\ell|\,E\sum_{j=J+1}^\infty|\lambda_j|\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)a_\ell'u_t\Big]^2.$$
But by Assumption 3,
$$E\sum_{j=J+1}^\infty|\lambda_j|\Big[\frac{1}{\sqrt T}\sum_{t=1}^T\Phi_j\Big(\frac{t}{T}\Big)a_\ell'u_t\Big]^2 = \sum_{j=J+1}^\infty|\lambda_j|\frac{1}{T}\sum_{t=1}^T\Phi_j^2\Big(\frac{t}{T}\Big)E\big(a_\ell'u_t\big)^2 + \sum_{j=J+1}^\infty|\lambda_j|\frac{1}{T}\sum_{t\ne s}\Phi_j\Big(\frac{t}{T}\Big)\Phi_j\Big(\frac{s}{T}\Big)E\,a_\ell'u_tu_s'a_\ell$$
$$\le C\sum_{j=J+1}^\infty|\lambda_j| + \sum_{j=J+1}^\infty|\lambda_j|\frac{1}{T}\sum_{t\ne s}\big|a_\ell'\Gamma_{t-s}a_\ell\big| = O\Big(\sum_{j=J+1}^\infty|\lambda_j|\Big) = o(1)$$
uniformly over $T$ as $J \to \infty$. So we have proved that $\sup_T P(|\eta_{T,J}| > \delta) = o(1)$ as $J \to \infty$.

Combining the above steps, we have
$$\mathrm{tr}\big[W^*_T(\theta_0)A\big] \overset{a}{\sim} \xi_T = \mathrm{tr}(W_{eT}A),$$
and
$$\mathrm{tr}(W_{eT}A) \overset{d}{\to} \int_0^1\int_0^1 Q^*_h(r, s)\big[\Lambda dB_m(r)\big]'A\,\Lambda dB_m(s) = \mathrm{tr}(W_\infty A).$$
As a consequence,
$$W^*_T(\tilde\theta_T) \overset{a}{\sim} W_{eT} \overset{d}{\to} W_\infty.$$
In view of Part (a), we also have
$$W_T(\tilde\theta_T) \overset{a}{\sim} W_{eT} \overset{d}{\to} W_\infty.$$
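The building blocks of $W_{eT}$ are the projection coefficients $T^{-1/2}\sum_t \Phi_j(t/T)u_t$. For concreteness, here is a minimal sketch of an orthonormal-series long-run variance estimator along these lines, using the standard paired sine/cosine basis (one common choice satisfying $\int_0^1\Phi_j(r)\,dr = 0$); the function names are illustrative and not from the paper.

```python
import numpy as np

def os_basis(T, K):
    """First K orthonormal-series basis functions evaluated at t/T.
    Paired cosines and sines, each integrating to zero on [0, 1]."""
    r = np.arange(1, T + 1) / T
    Phi = np.empty((K, T))
    for j in range(1, K + 1):
        k = (j + 1) // 2
        if j % 2 == 1:
            Phi[j - 1] = np.sqrt(2.0) * np.cos(2 * np.pi * k * r)
        else:
            Phi[j - 1] = np.sqrt(2.0) * np.sin(2 * np.pi * k * r)
    return Phi

def os_lrv(u, K):
    """OS-HAR long-run variance estimate: average of K outer products of
    the projection coefficients Lambda_j = T^{-1/2} sum_t Phi_j(t/T) u_t."""
    T, m = u.shape
    Lam = os_basis(T, K) @ u / np.sqrt(T)   # K x m matrix of coefficients
    return Lam.T @ Lam / K                  # m x m estimate

rng = np.random.default_rng(0)
u = rng.standard_normal((2000, 2))          # iid data: long-run variance = I_2
Omega = os_lrv(u, K=24)
```

Under fixed-smoothing asymptotics the number of basis functions $K$ is held fixed, which is why the estimate remains random in the limit (here its diagonal entries behave like $\chi^2_K/K$ draws rather than converging to 1).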

Proof of Theorem 1. We prove part (b) only, as the proofs of the other parts are similar. It suffices to show that
$$F_\infty \overset{d}{=} \big[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\big]'D_{pp}^{-1}\big[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\big]/p.$$
Let $G_\Lambda = \Lambda^{-1}G$, which is an $m\times d$ matrix. Then
$$F_\infty = \Big(R\big[G_\Lambda'\tilde W_\infty^{-1}G_\Lambda\big]^{-1}G_\Lambda'\tilde W_\infty^{-1}B_m(1)\Big)'\Big(R\big[G_\Lambda'\tilde W_\infty^{-1}G_\Lambda\big]^{-1}R'\Big)^{-1}\Big(R\big[G_\Lambda'\tilde W_\infty^{-1}G_\Lambda\big]^{-1}G_\Lambda'\tilde W_\infty^{-1}B_m(1)\Big)/p.$$
Let $U_{m\times m}\Sigma_{m\times d}V_{d\times d}'$ be a singular value decomposition (SVD) of $G_\Lambda$. By definition, $U'U = UU' = I_m$, $VV' = V'V = I_d$, and
$$\Sigma = \begin{pmatrix}A_{d\times d}\\ O_{q\times d}\end{pmatrix},$$
where $A$ is a diagonal matrix with the singular values on the main diagonal and $O$ is a matrix of zeros. Then
$$R\big[G_\Lambda'\tilde W_\infty^{-1}G_\Lambda\big]^{-1}G_\Lambda'\tilde W_\infty^{-1}B_m(1) = R\big[V\Sigma'U'\tilde W_\infty^{-1}U\Sigma V'\big]^{-1}V\Sigma'U'\tilde W_\infty^{-1}B_m(1) = RV\big[\Sigma'U'\tilde W_\infty^{-1}U\Sigma\big]^{-1}\Sigma'\big[U'\tilde W_\infty^{-1}U\big]\big[U'B_m(1)\big] \overset{d}{=} RV\big[\Sigma'\tilde W_\infty^{-1}\Sigma\big]^{-1}\Sigma'\tilde W_\infty^{-1}B_m(1),$$
and
$$R\big[G_\Lambda'\tilde W_\infty^{-1}G_\Lambda\big]^{-1}R' = R\big[V\Sigma'U'\tilde W_\infty^{-1}U\Sigma V'\big]^{-1}R' \overset{d}{=} RV\big[\Sigma'\tilde W_\infty^{-1}\Sigma\big]^{-1}V'R',$$
where we have used the distributional equivalence of $[\tilde W_\infty^{-1}, B_m(1)]$ and $[U'\tilde W_\infty^{-1}U,\ U'B_m(1)]$. Let
$$\tilde W_\infty = \begin{pmatrix}C_{11} & C_{12}\\ C_{21} & C_{22}\end{pmatrix} \quad\text{and}\quad \tilde W_\infty^{-1} = \begin{pmatrix}C^{11} & C^{12}\\ C^{21} & C^{22}\end{pmatrix},$$
where $C_{11}$ and $C^{11}$ are $d\times d$ matrices, $C_{22}$ and $C^{22}$ are $q\times q$ matrices, and $C_{12} = C_{21}'$, $C^{12} = (C^{21})'$. By definition,
$$C_{11} = \int_0^1\int_0^1 Q^*_h(r,s)\,dB_d(r)dB_d(s)' = \begin{pmatrix}C_{pp} & C_{p,d-p}\\ C_{p,d-p}' & C_{d-p,d-p}\end{pmatrix}, \quad (35)$$
$$C_{12} = \int_0^1\int_0^1 Q^*_h(r,s)\,dB_d(r)dB_q(s)' = \begin{pmatrix}C_{pq}\\ C_{d-p,q}\end{pmatrix}, \quad (36)$$
$$C_{22} = \int_0^1\int_0^1 Q^*_h(r,s)\,dB_q(r)dB_q(s)' = C_{qq}, \quad (37)$$
where $C_{pp}$, $C_{pq}$ and $C_{qq}$ are defined in (7), and $C_{d-p,d-p}$, $C_{p,d-p}$ and $C_{d-p,q}$ are similarly defined. It follows from the partitioned inverse formula that
$$C^{11} = \big(C_{11} - C_{12}C_{22}^{-1}C_{21}\big)^{-1}, \qquad C^{12} = -C^{11}C_{12}C_{22}^{-1}.$$
With the above partition of $\tilde W_\infty^{-1}$, we have
$$\big[\Sigma'\tilde W_\infty^{-1}\Sigma\big]^{-1} = \Big[\begin{pmatrix}A' & O'\end{pmatrix}\begin{pmatrix}C^{11} & C^{12}\\ C^{21} & C^{22}\end{pmatrix}\begin{pmatrix}A\\ O\end{pmatrix}\Big]^{-1} = \big(A'C^{11}A\big)^{-1} = A^{-1}\big(C^{11}\big)^{-1}\big(A'\big)^{-1}$$
and so
$$RV\big[\Sigma'\tilde W_\infty^{-1}\Sigma\big]^{-1}\Sigma'\tilde W_\infty^{-1} = RVA^{-1}\big(C^{11}\big)^{-1}\big(A'\big)^{-1}A'\begin{pmatrix}C^{11} & C^{12}\end{pmatrix} = RVA^{-1}\Big(I_d,\ \big(C^{11}\big)^{-1}C^{12}\Big), \quad (38)$$
and
$$RV\big[\Sigma'\tilde W_\infty^{-1}\Sigma\big]^{-1}V'R' = RVA^{-1}\big(C^{11}\big)^{-1}\big(A'\big)^{-1}V'R'.$$
As a result,
$$F_\infty \overset{d}{=} \Big[RVA^{-1}\Big(I_d,\ \big(C^{11}\big)^{-1}C^{12}\Big)B_m(1)\Big]'\Big[RVA^{-1}\big(C^{11}\big)^{-1}\big(A'\big)^{-1}V'R'\Big]^{-1}\Big[RVA^{-1}\Big(I_d,\ \big(C^{11}\big)^{-1}C^{12}\Big)B_m(1)\Big]/p$$
$$= \Big[RVA^{-1}\big(I_d,\ -C_{12}C_{22}^{-1}\big)B_m(1)\Big]'\Big[RVA^{-1}\big(C^{11}\big)^{-1}\big(A'\big)^{-1}V'R'\Big]^{-1}\Big[RVA^{-1}\big(I_d,\ -C_{12}C_{22}^{-1}\big)B_m(1)\Big]/p.$$
Let $B_m(1) = [B_d'(1), B_q'(1)]'$ and let $\tilde U_{p\times p}\tilde\Sigma_{p\times d}\tilde V_{d\times d}'$ be an SVD of $RVA^{-1}$, where $\tilde\Sigma_{p\times d} = (\tilde A_{p\times p},\ \tilde O_{p\times(d-p)})$. Then
$$F_\infty \overset{d}{=} \big\{\tilde U\tilde\Sigma\tilde V'\big[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\big]\big\}'\Big[\tilde U\tilde\Sigma\tilde V'\big(C^{11}\big)^{-1}\tilde V\tilde\Sigma'\tilde U'\Big]^{-1}\big\{\tilde U\tilde\Sigma\tilde V'\big[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\big]\big\}/p$$
$$= \big\{\tilde\Sigma\big[\tilde V'B_d(1) - \tilde V'C_{12}C_{22}^{-1}B_q(1)\big]\big\}'\Big[\tilde\Sigma\tilde V'\big(C^{11}\big)^{-1}\tilde V\tilde\Sigma'\Big]^{-1}\big\{\tilde\Sigma\big[\tilde V'B_d(1) - \tilde V'C_{12}C_{22}^{-1}B_q(1)\big]\big\}/p.$$
Noting that the joint distribution of $\big[\tilde V'B_d(1),\ \tilde V'C_{12},\ C_{22},\ \tilde V'(C^{11})^{-1}\tilde V\big]$ is invariant to the orthonormal matrix $\tilde V$, we have
$$F_\infty \overset{d}{=} \big\{\tilde\Sigma\big[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\big]\big\}'\Big[\tilde\Sigma\big(C^{11}\big)^{-1}\tilde\Sigma'\Big]^{-1}\tilde\Sigma\big[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\big]/p$$
$$= \big\{(\tilde A, \tilde O)\big[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\big]\big\}'\Big[(\tilde A, \tilde O)\big(C^{11}\big)^{-1}(\tilde A, \tilde O)'\Big]^{-1}(\tilde A, \tilde O)\big[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\big]/p.$$
Writing
$$\big(C^{11}\big)^{-1} = C_{11} - C_{12}C_{22}^{-1}C_{21} = \begin{pmatrix}D_{pp} & D_{12}\\ D_{21} & D_{22}\end{pmatrix},$$
where $D_{pp} = C_{pp} - C_{pq}C_{qq}^{-1}C_{pq}'$ and $D_{22}$ is a $(d-p)\times(d-p)$ matrix, and using equations (35)--(37), we have
$$F_\infty \overset{d}{=} \big[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\big]'D_{pp}^{-1}\big[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\big]/p.$$
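The SVD substitution above rests on the exact algebraic identity $[G'W^{-1}G]^{-1}G'W^{-1}b = V[\Sigma'U'W^{-1}U\Sigma]^{-1}\Sigma'(U'W^{-1}U)(U'b)$ for $G = U\Sigma V'$, which holds for any full-column-rank $G$ and positive definite $W$. A quick numerical sanity check with random matrices (a sketch; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 5, 3
G = rng.standard_normal((m, d))            # full column rank almost surely
A = rng.standard_normal((m, m))
W = A @ A.T + m * np.eye(m)                # positive definite weighting matrix
b = rng.standard_normal(m)

# Full SVD G = U S V' with S = [diag(s); 0], as in the proof of Theorem 1.
U, s, Vt = np.linalg.svd(G, full_matrices=True)
S = np.zeros((m, d))
S[:d, :d] = np.diag(s)
V = Vt.T
Winv = np.linalg.inv(W)
M = U.T @ Winv @ U                         # plays the role of U' W^{-1} U

lhs = np.linalg.solve(G.T @ Winv @ G, G.T @ Winv @ b)
rhs = V @ np.linalg.solve(S.T @ M @ S, S.T @ M @ (U.T @ b))
```

The proof then replaces $(U'W^{-1}U, U'B_m(1))$ by $(\tilde W_\infty^{-1}, B_m(1))$ using rotational invariance, which is a distributional step this deterministic check does not cover.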

Proof of Theorem 2. Let $\lambda_T$ be the $p\times 1$ vector of Lagrange multipliers for the constrained GMM estimation. The first order conditions for $\hat\theta_{T,R}$ are
$$\frac{\partial g_T'(\hat\theta_{T,R})}{\partial\theta}W_T^{-1}(\tilde\theta_T)g_T(\hat\theta_{T,R}) + R'\lambda_T = 0 \quad\text{and}\quad R\hat\theta_{T,R} = r. \quad (39)$$
Linearizing the first set of conditions and using Assumption 4, we obtain the following system of equations:
$$\begin{pmatrix}\tilde\Pi & R'\\ R & 0_{p\times p}\end{pmatrix}\begin{pmatrix}\sqrt T(\hat\theta_{T,R} - \theta_0)\\ \sqrt T\lambda_T\end{pmatrix} = \begin{pmatrix}-G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0)\\ 0_{p\times 1}\end{pmatrix} + o_p(1),$$
where $\tilde\Pi := \tilde\Pi(\tilde\theta_T) = G'W_T^{-1}(\tilde\theta_T)G$. From this, we get
$$\sqrt T(\hat\theta_{T,R} - \theta_0) = -\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0) + \tilde\Pi^{-1}R'\big\{R\tilde\Pi^{-1}R'\big\}^{-1}R\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0) + o_p(1) \quad (40)$$
and
$$\sqrt T\lambda_T = -\big\{R\tilde\Pi^{-1}R'\big\}^{-1}R\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0) + o_p(1). \quad (41)$$
Combining (5) with (40), we have
$$\sqrt T(\hat\theta_{T,R} - \hat\theta_T) = \tilde\Pi^{-1}R'\big\{R\tilde\Pi^{-1}R'\big\}^{-1}R\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0) + o_p(1), \quad (42)$$
which implies that $\sqrt T(\hat\theta_{T,R} - \hat\theta_T) = O_p(1)$. So
$$g_T(\hat\theta_{T,R}) = g_T(\hat\theta_T) + G_T(\hat\theta_T)(\hat\theta_{T,R} - \hat\theta_T) + o_p(1/\sqrt T).$$
Plugging this into the definition of $D_T$, we obtain
$$D_T = T(\hat\theta_{T,R} - \hat\theta_T)'G_T'(\hat\theta_T)W_T^{-1}(\dot\theta_T)G_T(\hat\theta_T)(\hat\theta_{T,R} - \hat\theta_T)/p + 2Tg_T'(\hat\theta_T)W_T^{-1}(\dot\theta_T)G_T(\hat\theta_T)(\hat\theta_{T,R} - \hat\theta_T)/p + o_p(1). \quad (43)$$
Using the first order conditions for $\hat\theta_T$, namely $g_T'(\hat\theta_T)W_T^{-1}(\tilde\theta_T)G_T(\hat\theta_T) = 0$, and Lemma 1(a), we obtain
$$D_T = T(\hat\theta_{T,R} - \hat\theta_T)'\tilde\Pi(\hat\theta_{T,R} - \hat\theta_T)/p + o_p(1). \quad (44)$$
Plugging (42) into (44) and simplifying the resulting expression, we have
$$D_T = \Big[R'\big\{R\tilde\Pi^{-1}R'\big\}^{-1}R\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0)\Big]'\tilde\Pi^{-1}\Big[R'\big\{R\tilde\Pi^{-1}R'\big\}^{-1}R\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0)\Big]/p + o_p(1)$$
$$= \Big[R\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0)\Big]'\big[R\tilde\Pi^{-1}R'\big]^{-1}\Big[R\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0)\Big]/p + o_p(1)$$
$$= \Big[\sqrt T R(\hat\theta_T - \theta_0)\Big]'\big[R\tilde\Pi^{-1}R'\big]^{-1}\Big[\sqrt T R(\hat\theta_T - \theta_0)\Big]/p + o_p(1) = \mathcal W_T + o_p(1), \quad (45)$$
where $\mathcal W_T$ is the Wald statistic. Next, we prove the second result in the theorem. In view of the first order conditions in (39) and equation (41), the (scaled) score of the GMM criterion evaluated at $\hat\theta_{T,R}$ satisfies
$$\frac{\partial g_T'(\hat\theta_{T,R})}{\partial\theta}W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\hat\theta_{T,R}) = -\sqrt T R'\lambda_T + o_p(1) = R'\big\{R\tilde\Pi^{-1}R'\big\}^{-1}R\tilde\Pi^{-1}G'W_T^{-1}(\tilde\theta_T)\sqrt T g_T(\theta_0) + o_p(1) = \tilde\Pi\sqrt T(\hat\theta_{T,R} - \hat\theta_T) + o_p(1),$$
and so
$$S_T = T(\hat\theta_{T,R} - \hat\theta_T)'\tilde\Pi(\hat\theta_{T,R} - \hat\theta_T)/p + o_p(1) = D_T + o_p(1) = \mathcal W_T + o_p(1).$$

Proof of Theorem 3. Part (a). Let
$$H = \Big(\frac{B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)}{\big\|B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\big\|},\ \tilde H\Big)$$
be an orthonormal matrix. Then
$$F_\infty \overset{d}{=} \big\|B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\big\|^2\Big[\frac{B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)}{\|B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\|}\Big]'H\,H'D_{pp}^{-1}H\,H'\Big[\frac{B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)}{\|B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\|}\Big]/p = \big\|B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\big\|^2\,e_p'\big(H'D_{pp}^{-1}H\big)e_p/p,$$
where $e_p = (1, 0, \dots, 0)' \in \mathbb{R}^p$. But $H'D_{pp}^{-1}H$ has the same distribution as $D_{pp}^{-1}$, and $D_{pp}$ is independent of $B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)$. So
$$F_\infty \overset{d}{=} \frac{\big\|B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\big\|^2/p}{\big(e_p'D_{pp}^{-1}e_p\big)^{-1}}. \quad (46)$$
That is, $F_\infty$ is equal in distribution to a ratio of two independent random variables. It is easy to see that
$$\big(e_p'D_{pp}^{-1}e_p\big)^{-1} \overset{d}{=} \Big[e_{p+q}'\Big(\int_0^1\int_0^1 Q^*_h(r,s)\,dB_{p+q}(r)dB_{p+q}'(s)\Big)^{-1}e_{p+q}\Big]^{-1} \overset{d}{=} \frac{1}{K}\chi^2_{K-p-q+1},$$
where $e_{p+q} = (1, 0, \dots, 0)' \in \mathbb{R}^{p+q}$. With this, we can now represent $F_\infty$ as
$$F_\infty \overset{d}{=} \frac{\chi^2_p(\delta^2)/p}{\chi^2_{K-p-q+1}/K}, \quad (47)$$
where $\delta^2 = B_q(1)'C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}B_q(1)$, and so, with $\kappa = K/(K-p-q+1)$,
$$\kappa^{-1}F_\infty \overset{d}{=} \frac{\chi^2_p(\delta^2)/p}{\chi^2_{K-p-q+1}/(K-p-q+1)} \overset{d}{=} F_{p,K-p-q+1}(\delta^2). \quad (48)$$
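In the central case $\delta^2 = 0$, (48) says that the fixed-smoothing critical value for $F_T$ is $\kappa$ times an $F_{p,K-p-q+1}$ quantile, which shrinks toward the conventional chi-square critical value $\chi^2_p(1-\alpha)/p$ as $K$ grows. A small sketch using SciPy (assumed available; names are illustrative):

```python
import numpy as np
from scipy import stats

def fixed_smoothing_cv(p, q, K, alpha=0.05):
    """Critical value implied by (48) in the central case (delta^2 = 0):
    kappa times the upper alpha quantile of F_{p, K-p-q+1}."""
    nu = K - p - q + 1
    kappa = K / nu
    return kappa * stats.f.ppf(1 - alpha, p, nu)

p, q = 2, 1
cv_chi2 = stats.chi2.ppf(0.95, p) / p       # conventional chi-square critical value
cv_small_K = fixed_smoothing_cv(p, q, K=20)
cv_large_K = fixed_smoothing_cv(p, q, K=200)
```

The fixed-smoothing critical values exceed the chi-square one for every finite $K$, which is the mechanism behind the size improvement reported in the paper's simulations.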

Part (b). Since the numerator and the denominator in (48) are independent, $\kappa^{-1}F_\infty$ follows a noncentral F distribution conditional on $\delta^2$. More specifically, we have
$$P\big(\kappa^{-1}F_\infty < z\big) = P\big(F_{p,K-p-q+1}(\delta^2) < z\big) = E\mathcal{F}_{p,K-p-q+1}\big(z; \delta^2\big),$$
where $\mathcal{F}_{p,K-p-q+1}(z; \lambda)$ is the CDF of the noncentral F distribution with degrees of freedom $(p, K-p-q+1)$ and noncentrality parameter $\lambda$, and $F_{p,K-p-q+1}(\lambda)$ is a random variable with CDF $\mathcal{F}_{p,K-p-q+1}(z; \lambda)$.

We proceed to compute the mean of $\delta^2$. Let
$$\xi_j = \int_0^1\Phi_j(r)\,dB_p(r) \sim \text{iid } N(0, I_p) \quad\text{and}\quad \eta_j = \int_0^1\Phi_j(r)\,dB_q(r) \sim \text{iid } N(0, I_q).$$
Note that $\{\xi_j\}$ are independent of $\{\eta_j\}$. We can represent $C_{pq}$ and $C_{qq}$ as
$$C_{pq} = K^{-1}\sum_{j=1}^K\xi_j\eta_j' \quad\text{and}\quad C_{qq} = K^{-1}\sum_{j=1}^K\eta_j\eta_j'. \quad (49)$$
So
$$E\delta^2 = EB_q(1)'C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}B_q(1) = E\,\mathrm{tr}\big(C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big)$$
$$= E\,\mathrm{tr}\Big[\Big(\frac{1}{K}\sum_{j=1}^K\eta_j\eta_j'\Big)^{-1}\Big(\frac{1}{K}\sum_{j=1}^K\eta_j\xi_j'\Big)\Big(\frac{1}{K}\sum_{j=1}^K\xi_j\eta_j'\Big)\Big(\frac{1}{K}\sum_{j=1}^K\eta_j\eta_j'\Big)^{-1}\Big] = p\,\mathrm{tr}\,E(\Delta),$$
where $\Delta := \big(\sum_{j=1}^K\eta_j\eta_j'\big)^{-1}$ follows an inverse-Wishart distribution and
$$E(\Delta) = \frac{I_q}{K - q - 1}$$
for $K$ large enough. Therefore $E\delta^2 = \frac{pq}{K-q-1} =: \bar\delta^2$.
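The mean calculation can be checked by Monte Carlo, drawing $\delta^2$ from its representation in (49). A sketch (the helper name is illustrative, and the tolerance is a loose simulation-error bound):

```python
import numpy as np

def delta2_draw(p, q, K, rng):
    """One draw of delta^2 = B_q(1)' Cqq^{-1} Cpq' Cpq Cqq^{-1} B_q(1),
    built from Cpq = K^{-1} sum_j xi_j eta_j' and Cqq = K^{-1} sum_j eta_j eta_j'."""
    xi = rng.standard_normal((K, p))
    eta = rng.standard_normal((K, q))
    Cpq = xi.T @ eta / K
    Cqq = eta.T @ eta / K
    Bq = rng.standard_normal(q)
    v = Cpq @ np.linalg.solve(Cqq, Bq)
    return v @ v

p, q, K, R = 2, 2, 14, 20000
rng = np.random.default_rng(7)
draws = np.array([delta2_draw(p, q, K, rng) for _ in range(R)])
target = p * q / (K - q - 1)               # = E delta^2 = p q / (K - q - 1)
mc_mean = draws.mean()
```

With $p = q = 2$ and $K = 14$ the target mean is $4/11 \approx 0.364$, and the simulated average should land close to it.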

Next we compute the variance of $\delta^2$. It follows from the law of total variance that
$$\mathrm{var}\big(\delta^2\big) = E\big[\mathrm{var}\big(\delta^2 \mid C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big)\big] + \mathrm{var}\big(\mathrm{tr}\big(C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big)\big).$$
Note that $B_q(1)$ is independent of $C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}$. So conditional on $C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}$, $\delta^2$ is a quadratic form in standard normals. Hence the conditional variance of $\delta^2$ is
$$\mathrm{var}\big(\delta^2 \mid C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big) = 2\,\mathrm{tr}\big(C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big).$$
Using the representation in (49), we have
$$E\,\mathrm{tr}\big(C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big) = E\,\mathrm{tr}\Big[\frac{1}{K^4}\sum_{j_1=1}^K\sum_{i_1=1}^K\sum_{j_2=1}^K\sum_{i_2=1}^K\eta_{j_1}\big(\xi_{j_1}'\xi_{i_1}\big)\eta_{i_1}'C_{qq}^{-2}\,\eta_{j_2}\big(\xi_{j_2}'\xi_{i_2}\big)\eta_{i_2}'C_{qq}^{-2}\Big]$$
$$= E\,\mathrm{tr}\Big[\frac{1}{K^4}\sum_{j_1, i_1, j_2, i_2}\eta_{j_1}\eta_{i_1}'C_{qq}^{-2}\,\eta_{j_2}\eta_{i_2}'C_{qq}^{-2}\,E\big(\xi_{j_1}'\xi_{i_1}\big)\big(\xi_{j_2}'\xi_{i_2}\big)\Big].$$
Since
$$E\big(\xi_{j_1}'\xi_{i_1}\big)\big(\xi_{j_2}'\xi_{i_2}\big) = \begin{cases}p^2, & j_1 = i_1,\ j_2 = i_2,\ \text{and } j_1 \ne j_2,\\ p, & j_1 = j_2,\ i_1 = i_2,\ \text{and } j_1 \ne i_1,\\ p, & j_1 = i_2,\ i_1 = j_2,\ \text{and } j_1 \ne i_1,\\ p^2 + 2p, & j_1 = j_2 = i_1 = i_2,\\ 0, & \text{otherwise},\end{cases}$$
we have
$$E\,\mathrm{tr}\big(C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big) = E\,\mathrm{tr}\Big[\frac{1}{K^4}\sum_{j_1=1}^K\sum_{j_2=1}^K\eta_{j_1}\eta_{j_1}'C_{qq}^{-2}\,\eta_{j_2}\eta_{j_2}'C_{qq}^{-2}\Big]p^2 + E\,\mathrm{tr}\Big[\frac{1}{K^4}\sum_{j_1=1}^K\sum_{i_1=1}^K\eta_{j_1}\eta_{i_1}'C_{qq}^{-2}\,\eta_{j_1}\eta_{i_1}'C_{qq}^{-2}\Big]p + E\,\mathrm{tr}\Big[\frac{1}{K^4}\sum_{j_1=1}^K\sum_{i_1=1}^K\eta_{j_1}\eta_{i_1}'C_{qq}^{-2}\,\eta_{i_1}\eta_{j_1}'C_{qq}^{-2}\Big]p$$
$$= E\,\mathrm{tr}\Big(\sum_{\ell=1}^K\eta_\ell\eta_\ell'\Big)^{-2}\big(p^2 + p\big) + E\,\mathrm{tr}\Big[\frac{1}{K^2}\sum_{j_1=1}^K\eta_{j_1}\eta_{j_1}'C_{qq}^{-2}\Big]\mathrm{tr}\Big[\frac{1}{K^2}\sum_{i_1=1}^K\eta_{i_1}'C_{qq}^{-2}\eta_{i_1}\Big]p$$
$$= E\big[\mathrm{tr}\big(\Delta^2\big)\big]\big(p^2 + p\big) + E\big[\mathrm{tr}(\Delta)\big]^2 p = \Big(\sum_{i=1}^q\sum_{j=1}^q E\Delta_{ij}^2\Big)\big(p^2 + p\big) + E\Big(\sum_{i=1}^q\Delta_{ii}\Big)^2 p.$$
It follows from Theorem 5.2.2 of Press (2005, p. 119, using the notation here) that
$$E\Delta_{ij}^2 = \frac{(K - q + 1)\delta_{ij} + (K - q - 1)}{(K - q)(K - q - 1)^2(K - q - 3)} + \frac{\delta_{ij}}{(K - q - 1)^2} = O\Big(\frac{1}{K^2}\Big)$$
and, for $i \ne j$,
$$E\Delta_{ii}\Delta_{jj} = \frac{2}{(K - q)(K - q - 1)^2(K - q - 3)} + \frac{1}{(K - q - 1)^2} = O\Big(\frac{1}{K^2}\Big).$$
Hence
$$E\,\mathrm{tr}\big(C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big) = O\Big(\frac{1}{K^2}\Big). \quad (50)$$


Next,
$$\mathrm{var}\big(\mathrm{tr}\big(C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big)\big) \le E\big[\mathrm{tr}\big(C_{qq}^{-1}C_{pq}'C_{pq}C_{qq}^{-1}\big)\big]^2 = E\,\mathrm{tr}\Big[\frac{1}{K^2}\sum_{i_1=1}^K\sum_{j_1=1}^K\eta_{i_1}\big(\xi_{i_1}'\xi_{j_1}\big)\eta_{j_1}'C_{qq}^{-2}\Big]\mathrm{tr}\Big[\frac{1}{K^2}\sum_{i_2=1}^K\sum_{j_2=1}^K\eta_{i_2}\big(\xi_{i_2}'\xi_{j_2}\big)\eta_{j_2}'C_{qq}^{-2}\Big]$$
$$= E\,\frac{1}{K^4}\sum_{i_1=1}^K\sum_{j_1=1}^K\sum_{i_2=1}^K\sum_{j_2=1}^K\mathrm{tr}\big(\eta_{i_1}\eta_{j_1}'C_{qq}^{-2}\big)\mathrm{tr}\big(\eta_{i_2}\eta_{j_2}'C_{qq}^{-2}\big)\,E\big(\xi_{i_1}'\xi_{j_1}\big)\big(\xi_{i_2}'\xi_{j_2}\big)$$
$$= E\big[\mathrm{tr}(\Delta)\big]^2 p^2 + E\,\frac{1}{K^4}\sum_{i_1=1}^K\sum_{j_1=1}^K\mathrm{tr}\big(C_{qq}^{-2}\eta_{j_1}\eta_{j_1}'C_{qq}^{-2}\eta_{i_1}\eta_{i_1}'\big)\,2p = E\big[\mathrm{tr}(\Delta)\big]^2 p^2 + E\,\mathrm{tr}\big(\Delta^2\big)\,2p.$$
Using the same formulae from Press (2005), we can show that both terms are of order $O(K^{-2})$. This, combined with (50), leads to $\mathrm{var}(\delta^2) = O(K^{-2})$.

Taking a Taylor expansion and using the mean and variance of $\delta^2$, we have
$$P\big(\kappa^{-1}F_\infty < z\big) = E\mathcal F_{p,K-p-q+1}\big(z; \delta^2\big) = \mathcal F_{p,K-p-q+1}\big(z; \bar\delta^2\big) + E\frac{\partial\mathcal F_{p,K-p-q+1}\big(z; \bar\delta^2\big)}{\partial\lambda}\big(\delta^2 - \bar\delta^2\big) + E\frac{\partial^2\mathcal F_{p,K-p-q+1}\big(z; \tilde\delta^2\big)}{\partial\lambda^2}\frac{\big(\delta^2 - \bar\delta^2\big)^2}{2} \quad (51)$$
$$= \mathcal F_{p,K-p-q+1}\big(z; \bar\delta^2\big) + E\frac{\partial^2\mathcal F_{p,K-p-q+1}\big(z; \tilde\delta^2\big)}{\partial\lambda^2}\frac{\big(\delta^2 - \bar\delta^2\big)^2}{2}$$
for some $\tilde\delta^2$ between $\delta^2$ and $\bar\delta^2$, where the first-order term vanishes because $E(\delta^2 - \bar\delta^2) = 0$. By definition,
$$\mathcal F_{p,K-p-q+1}(z; \lambda) = P\Big(\frac{\chi^2_p(\lambda)}{p}\Big[\frac{\chi^2_{K-p-q+1}}{K-p-q+1}\Big]^{-1} < z\Big) = E\mathcal G_p\Big(pz\Big[\frac{\chi^2_{K-p-q+1}}{K-p-q+1}\Big]; \lambda\Big),$$
where $\mathcal G_p(z; \lambda)$ is the CDF of the noncentral chi-square distribution $\chi^2_p(\lambda)$ with noncentrality parameter $\lambda$. In view of the relationship $\mathcal G_p(z; \lambda) = \exp(-\frac\lambda2)\sum_{j=0}^\infty\frac{(\lambda/2)^j}{j!}\mathcal G_{p+2j}(z)$, we have
$$\frac{\partial\mathcal F_{p,K-p-q+1}(z; \lambda)}{\partial\lambda} = \sum_{j=0}^\infty\frac{\partial}{\partial\lambda}\Big[\exp\Big(-\frac\lambda2\Big)\frac{(\lambda/2)^j}{j!}\Big]E\mathcal G_{p+2j}\Big(pz\Big[\frac{\chi^2_{K-p-q+1}}{K-p-q+1}\Big]\Big)$$
$$= -\frac12\exp\Big(-\frac\lambda2\Big)\sum_{j=0}^\infty\frac{(\lambda/2)^j}{j!}E\mathcal G_{p+2j}\Big(pz\Big[\frac{\chi^2_{K-p-q+1}}{K-p-q+1}\Big]\Big) + \frac12\exp\Big(-\frac\lambda2\Big)\sum_{j=0}^\infty\frac{(\lambda/2)^j}{j!}E\mathcal G_{p+2+2j}\Big(pz\Big[\frac{\chi^2_{K-p-q+1}}{K-p-q+1}\Big]\Big)$$
$$= \frac12\sum_{j=0}^\infty\exp\Big(-\frac\lambda2\Big)\frac{(\lambda/2)^j}{j!}E\Big[\mathcal G_{p+2+2j}\Big(pz\Big[\frac{\chi^2_{K-p-q+1}}{K-p-q+1}\Big]\Big) - \mathcal G_{p+2j}\Big(pz\Big[\frac{\chi^2_{K-p-q+1}}{K-p-q+1}\Big]\Big)\Big]$$
and so
$$\Big|\frac{\partial^2\mathcal F_{p,K-p-q+1}(z; \lambda)}{\partial\lambda^2}\Big| \le \frac12\sum_{j=0}^\infty\Big|\frac{\partial}{\partial\lambda}\Big[\exp\Big(-\frac\lambda2\Big)\frac{(\lambda/2)^j}{j!}\Big]\Big| \le \frac14\sum_{j=0}^\infty\exp\Big(-\frac\lambda2\Big)\frac{(\lambda/2)^j}{j!} + \frac14\sum_{j=1}^\infty\exp\Big(-\frac\lambda2\Big)\frac{(\lambda/2)^{j-1}}{(j-1)!} \le \frac14 + \frac14 = \frac12$$
for all $z$ and $\lambda$. Combining the boundedness of $\partial^2\mathcal F_{p,K-p-q+1}(z; \lambda)/\partial\lambda^2$ with (51) yields
$$P\big(\kappa^{-1}F_\infty < z\big) = \mathcal F_{p,K-p-q+1}\big(z; \bar\delta^2\big) + O\big(\mathrm{var}(\delta^2)\big) = \mathcal F_{p,K-p-q+1}\big(z; \bar\delta^2\big) + O\Big(\frac{1}{K^2}\Big) = \mathcal F_{p,K-p-q+1}\big(z; \bar\delta^2\big) + o\Big(\frac1K\Big).$$

Part (c). It follows from Part (b) that
$$P(pF_\infty < z) = E\mathcal G_p\Big(\frac{z\chi^2_{K-p-q+1}}{K}; \delta^2\Big) + o\Big(\frac1K\Big)$$
$$= E\mathcal G_p\Big(\frac{z\chi^2_{K-p-q+1}}{K}; 0\Big) + E\frac{\partial}{\partial\lambda}\mathcal G_p\Big(\frac{z\chi^2_{K-p-q+1}}{K}; 0\Big)\delta^2 + E\Big[\frac{\partial^2}{\partial\lambda^2}\mathcal G_p\Big(\frac{z\chi^2_{K-p-q+1}}{K}; \dot\delta^2\Big)\Big]\delta^4 + o\Big(\frac1K\Big), \quad (52)$$
where $\dot\delta^2$ is between $0$ and $\delta^2$. As in the proof of Part (b), we can show that $|\partial^2\mathcal G_p(z; \lambda)/\partial\lambda^2| \le 1$. As a result,
$$E\Big[\frac{\partial^2}{\partial\lambda^2}\mathcal G_p\Big(\frac{z\chi^2_{K-p-q+1}}{K}; \dot\delta^2\Big)\Big]\delta^4 = O\Big(\frac{1}{K^2}\Big) = o\Big(\frac1K\Big). \quad (53)$$
Consequently,
$$P(pF_\infty < z) = E\mathcal G_p\Big(\frac{z\chi^2_{K-p-q+1}}{K}; 0\Big) + E\frac{\partial}{\partial\lambda}\mathcal G_p\Big(\frac{z\chi^2_{K-p-q+1}}{K}; 0\Big)\delta^2 + o\Big(\frac1K\Big).$$
By direct calculations, it is easy to show that
$$\frac{\partial}{\partial\lambda}\mathcal G_p(z; 0) = -\frac12\big[\mathcal G_p(z) - \mathcal G_{p+2}(z)\big] = -\frac1p\frac{z^{p/2}e^{-z/2}}{2^{p/2}\Gamma(\frac p2)} = -\frac1p\mathcal G_p'(z)\,z, \quad (54)$$
where $\mathcal G_p(\cdot) := \mathcal G_p(\cdot; 0)$ is the central chi-square CDF. Therefore,
$$P(pF_\infty < z) = E\mathcal G_p\Big(\frac{z\chi^2_{K-p-q+1}}{K}\Big) - \frac1pE\,\mathcal G_p'\Big(\frac{z\chi^2_{K-p-q+1}}{K}\Big)\frac{z\chi^2_{K-p-q+1}}{K}\,\bar\delta^2 + o\Big(\frac1K\Big)$$
$$= \mathcal G_p(z) + \mathcal G_p'(z)\,z\,E\Big(\frac{\chi^2_{K-p-q+1}}{K} - 1\Big) + \frac12\mathcal G_p''(z)\,z^2\,\mathrm{var}\Big(\frac{\chi^2_{K-p-q+1}}{K}\Big) - \frac{\bar\delta^2}{p}\mathcal G_p'(z)\,z + o\Big(\frac1K\Big)$$
$$= \mathcal G_p(z) - \mathcal G_p'(z)\,z\,\frac{p+q-1}{K} + \mathcal G_p''(z)\,z^2\,\frac{K-p-q+1}{K^2} - \frac{q}{K-q-1}\mathcal G_p'(z)\,z + o\Big(\frac1K\Big)$$
$$= \mathcal G_p(z) - \mathcal G_p'(z)\,z\,\frac{p+2q-1}{K} + \mathcal G_p''(z)\,z^2\,\frac1K + o\Big(\frac1K\Big).$$
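The identity in (54) can be verified numerically: the closed-form expression $-\frac12[\mathcal G_p(z) - \mathcal G_{p+2}(z)]$ should match both $-(z/p)\mathcal G_p'(z)$ and a finite-difference approximation of $\partial\mathcal G_p(z; \lambda)/\partial\lambda$ at $\lambda = 0$. A sketch assuming SciPy; the step size $h$ is an arbitrary illustrative choice:

```python
import numpy as np
from scipy import stats

p = 3
z = np.linspace(0.5, 12.0, 24)

# Closed-form pieces of (54): -(1/2)[G_p(z) - G_{p+2}(z)] = -(1/p) G'_p(z) z.
lhs = -0.5 * (stats.chi2.cdf(z, p) - stats.chi2.cdf(z, p + 2))
rhs = -(z / p) * stats.chi2.pdf(z, p)

# Finite-difference check that this equals d G_p(z; lam) / d lam at lam = 0,
# using the noncentral chi-square CDF with a small noncentrality h.
h = 1e-3
fd = (stats.ncx2.cdf(z, p, h) - stats.chi2.cdf(z, p)) / h
```

The finite-difference error is controlled by the second-derivative bound $|\partial^2\mathcal G_p/\partial\lambda^2| \le 1$ established in the proof, so agreement to roughly $h/2$ is expected.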

Proof of Theorem 4. Part (a). Using the same argument as in the proof of Theorem 3(a), we have
$$t_\infty \overset{d}{=} \frac{B_1(1) - C_{1q}C_{qq}^{-1}B_q(1)}{\sqrt{\chi^2_{K-q}/K}} \quad (55)$$
and so, with $\kappa = K/(K-q)$ (the constant of Theorem 3 with $p = 1$),
$$\frac{t_\infty}{\sqrt\kappa} \overset{d}{=} \frac{B_1(1) - C_{1q}C_{qq}^{-1}B_q(1)}{\sqrt{\chi^2_{K-q}/(K-q)}} \overset{d}{=} t_{K-q}(\delta),$$
a noncentral $t$ random variable with $K-q$ degrees of freedom and (random) noncentrality parameter $\delta$.

Part (b). Since the distribution of $t_\infty$ is symmetric about $0$, we have for any $z \in \mathbb{R}^+$:
$$P\Big(\frac{t_\infty}{\sqrt\kappa} < z\Big) = \frac12 + \frac12P\big(|t_\infty/\sqrt\kappa| < |z|\big) = \frac12 + \frac12P\big(t_\infty^2/\kappa < z^2\big) = \frac12 + \frac12\mathcal F_{1,K-q}\big(z^2; \bar\delta^2\big) + o\Big(\frac1K\Big),$$
where the second-to-last equality follows from Theorem 3(b). When $z \in \mathbb{R}^-$, we have
$$P\Big(\frac{t_\infty}{\sqrt\kappa} < z\Big) = \frac12 - \frac12P\big(|t_\infty/\sqrt\kappa| < |z|\big) = \frac12 - \frac12P\big(t_\infty^2/\kappa < z^2\big) = \frac12 - \frac12\mathcal F_{1,K-q}\big(z^2; \bar\delta^2\big) + o\Big(\frac1K\Big).$$
Therefore
$$P\Big(\frac{t_\infty}{\sqrt\kappa} < z\Big) = \frac12 + \frac12\mathrm{sgn}(z)\,\mathcal F_{1,K-q}\big(z^2; \bar\delta^2\big) + o\Big(\frac1K\Big).$$
Part (c). Using Theorem 3(c) and the symmetry of the distribution of $t_\infty$ about $0$, we have for any $z \in \mathbb{R}^+$:
$$P(t_\infty < z) = \frac12 + \frac12P\big(|t_\infty| < |z|\big) = \frac12 + \frac12P\big(t_\infty^2 < z^2\big) = \frac12 + \frac12\mathcal G\big(z^2\big) - \mathcal G'\big(z^2\big)z^2\,\frac qK + \frac12\mathcal G''\big(z^2\big)z^4\,\frac1K + o\Big(\frac1K\Big),$$
where $\mathcal G := \mathcal G_1$. Using the relationships
$$\frac12 + \frac12\mathcal G\big(z^2\big) = \Phi(z)$$
and
$$-\mathcal G'\big(z^2\big)z^2\,\frac qK + \frac12\mathcal G''\big(z^2\big)z^4\,\frac1K = -\frac{1}{4K}\varphi(z)\big(z^3 + z(4q+1)\big),$$
we have
$$P(t_\infty < z) = \Phi(z) - \frac{1}{4K}z\varphi(z)\big(z^2 + 4q + 1\big) + o\Big(\frac1K\Big).$$
Similarly, when $z \in \mathbb{R}^-$, we have
$$P(t_\infty < z) = \Phi(z) + \frac{1}{4K}z\varphi(z)\big(z^2 + 4q + 1\big) + o\Big(\frac1K\Big).$$
Therefore
$$P(t_\infty < z) = \Phi(z) - \frac{1}{4K}|z|\varphi(z)\big(z^2 + 4q + 1\big) + o\Big(\frac1K\Big).$$
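The expansion in Part (c) can be inverted numerically to obtain a high-order-corrected one-sided critical value, i.e., the $z > 0$ solving $\Phi(z) - \frac{1}{4K}z\varphi(z)(z^2 + 4q + 1) = 1 - \alpha$. A sketch assuming SciPy; the helper name is illustrative:

```python
import numpy as np
from scipy import stats, optimize

def corrected_t_cv(alpha, q, K):
    """One-sided critical value from the expansion in Theorem 4(c):
    solves Phi(z) - (1/(4K)) z phi(z) (z^2 + 4q + 1) = 1 - alpha for z > 0."""
    def g(z):
        correction = z * stats.norm.pdf(z) * (z ** 2 + 4 * q + 1) / (4 * K)
        return stats.norm.cdf(z) - correction - (1 - alpha)
    return optimize.brentq(g, 0.5, 10.0)   # sign change is guaranteed on this bracket

z_normal = stats.norm.ppf(0.95)            # conventional normal critical value
cv_K20 = corrected_t_cv(0.05, q=1, K=20)
cv_K200 = corrected_t_cv(0.05, q=1, K=200)
```

The corrected critical value exceeds the normal one and shrinks toward it as $K$ grows, mirroring the downward size distortion of the conventional normal test.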

Before proving Theorem 5, we present a technical lemma. Part (i) of the lemma is proved in Sun (2014). Parts (ii) and (iii) of the lemma are proved in Sun, Phillips and Jin (2011).

Define $g_0 = \lim_{x\to0}[1 - k(x)]/|x|^{q_0}$, where $q_0$ is the Parzen exponent of the kernel function, $c_1 = \int_{-\infty}^\infty k(x)\,dx$, and $c_2 = \int_{-\infty}^\infty k^2(x)\,dx$. Recall the definitions of $\mu_1$ and $\mu_2$ in (9).

Lemma 5 (i) For conventional kernel HAR variance estimators, we have, as $h \to \infty$:
(a) $\mu_1 = 1 - bc_1 + O(b^2)$;
(b) $\mu_2 = bc_2 + O(b^2)$.
(ii) For sharp kernel HAR variance estimators, we have, as $h \to \infty$:
(a) $\mu_1 = 1 - \frac{2}{\rho + 2}$;
(b) $\mu_2 = \frac{1}{\rho + 1} + O\big(\frac{1}{\rho^2}\big)$.
(iii) For steep kernel HAR variance estimators, we have, as $h \to \infty$:
(a) $\mu_1 = 1 - \big(\frac{\pi}{\rho g_0}\big)^{1/2} + O\big(\frac1\rho\big)$;
(b) $\mu_2 = \big(\frac{\pi}{2\rho g_0}\big)^{1/2} + O\big(\frac1\rho\big)$.
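For the Bartlett kernel $k(x) = (1 - |x|)1\{|x| \le 1\}$, a standard example with Parzen exponent $q_0 = 1$, the constants entering Lemma 5(i) can be computed directly: $c_1 = 1$, $c_2 = 2/3$ and $g_0 = 1$, so $\mu_1 \approx 1 - b$ and $\mu_2 \approx 2b/3$ for small $b$. A sketch verifying these values numerically (this particular kernel is my illustrative choice, not one singled out in the lemma):

```python
import numpy as np
from scipy import integrate

def bartlett(x):
    """Bartlett kernel k(x) = (1 - |x|) on [-1, 1], zero elsewhere."""
    return np.maximum(0.0, 1.0 - np.abs(x))

# c1 = integral of k, c2 = integral of k^2 (support is [-1, 1]).
c1, _ = integrate.quad(bartlett, -1, 1)
c2, _ = integrate.quad(lambda x: bartlett(x) ** 2, -1, 1)

# Parzen exponent q0 = 1 for Bartlett: g0 = lim_{x->0} [1 - k(x)] / |x| = 1.
x_small = 1e-6
g0 = (1.0 - bartlett(np.array([x_small]))[0]) / x_small
```

Exact integration gives $c_1 = 1$ and $c_2 = 2/3$, so the numerical quadrature should agree to machine-level accuracy.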


Proof of Theorem 5. Part (a). Recall
$$pF_\infty = \left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\right]'D_{pp}^{-1}\left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\right].$$
Conditional on $(C_{pq}, C_{qq}, C_{pp})$, $B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)$ is normal with mean zero and variance $I_p + C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'$. Let $L$ be the lower triangular matrix such that $LL'$ is the Cholesky decomposition of $I_p + C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'$. Then the conditional distribution of $\xi := L^{-1}\left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\right]$ is $N(0, I_p)$. Since the conditional distribution does not depend on $(C_{pq}, C_{qq}, C_{pp})$, we can conclude that $\xi$ is independent of $(C_{pq}, C_{qq}, C_{pp})$. So we can write
$$pF_\infty \overset{d}{=} \xi'A\xi$$
where $A = L'D_{pp}^{-1}L$. Given that $A$ is a function of $(C_{pq}, C_{qq}, C_{pp})$, we know that $\xi$ and $A$ are independent. As a result, $\xi'A\xi \overset{d}{=} \xi'(OAO')\xi$ for any orthonormal matrix $O$ that is independent of $\xi$.

Let $H = \left(\xi/\|\xi\|, \tilde H\right)$ be an orthonormal matrix with first column $\xi/\|\xi\|$. We choose $H$ to be independent of $A$; this is possible as $\xi$ and $A$ are independent. Then
$$pF_\infty \overset{d}{=} \|\xi\|^2\,\frac{\xi'}{\|\xi\|}\left(OAO'\right)\frac{\xi}{\|\xi\|} = \|\xi\|^2\left(\frac{\xi'}{\|\xi\|}H\right)H'\left(OAO'\right)H\left(H'\frac{\xi}{\|\xi\|}\right) = \|\xi\|^2\,e_p'\left(H'OAO'H\right)e_p$$
where $e_p = (1, 0, \dots, 0)'$ is the first basis vector in $\mathbb{R}^p$. Since $\|\xi\|^2$, $H$ and $A$ are mutually independent of each other, we can write
$$pF_\infty \overset{d}{=} \|\xi\|^2\,e_p'\left(H'O'AOH\right)e_p$$
for any orthonormal matrix $H$ that is independent of both $\xi$ and $A$. Letting $O = H'$, we obtain
$$pF_\infty \overset{d}{=} \|\xi\|^2\left(e_p'Ae_p\right) = \frac{\|\xi\|^2}{\left(e_p'Ae_p\right)^{-1}} \overset{d}{=} \frac{\|\xi\|^2}{\left(e_p'L'D_{pp}^{-1}Le_p\right)^{-1}}.$$
Since $Le_p$ is the first column of $L$, we have, using the definition of the Cholesky decomposition:
$$Le_p = \frac{\left(I_p + C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\right)e_p}{\sqrt{e_p'\left(I_p + C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\right)e_p}}.$$
As a result,
$$pF_\infty \overset{d}{=} \frac{\|\xi\|^2}{\eta^2}$$
where
$$\eta^2 = \left(e_p'L'D_{pp}^{-1}Le_p\right)^{-1} = \frac{e_p'\left(I_p + C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\right)e_p}{e_p'\left(I_p + C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\right)D_{pp}^{-1}\left(I_p + C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\right)e_p}.$$

Part (b). It is easy to show that
$$EC_{qq} = \mu_1 I_q \quad\text{and}\quad \mathrm{var}\left[\mathrm{vec}(C_{qq})\right] = \mu_2\left(I_{qq} + K_{qq}\right)$$


where $\mu_1$ and $\mu_2$ are defined in (9), $I_{qq}$ is the $q^2\times q^2$ identity matrix, and $K_{qq}$ is the $q^2\times q^2$ commutation matrix. So $C_{qq} = \mu_1 I_q + o_p(1)$ and $C_{qq}^{-1} = \mu_1^{-1}I_q + o_p(1)$ as $h\to\infty$. Similarly, $C_{pp} = \mu_1 I_p + o_p(1)$ and $C_{pp}^{-1} = \mu_1^{-1}I_p + o_p(1)$ as $h\to\infty$. In addition, using the same argument, we can show that $C_{pq} = o_p(1)$. Therefore
$$\eta^2 = \frac{e_p'\left(I_p + C_{pq}C_{pq}'/\mu_1^2\right)e_p}{e_p'\left(I_p + C_{pq}C_{pq}'/\mu_1^2\right)\left(I_p - C_{pq}C_{pq}'/\mu_1^2\right)^{-1}\left(I_p + C_{pq}C_{pq}'/\mu_1^2\right)e_p}\left(1 + o_p(1)\right) = 1 + o_p(1).$$

That is, $\eta^2 \to_p 1$ as $h\to\infty$. We proceed to prove the distributional expansion. The $(i,j)$-th elements $C_{pp}(i,j)$, $C_{pq}(i,j)$ and $C_{qq}(i,j)$ of $C_{pp}$, $C_{pq}$ and $C_{qq}$ are equal to either $\int_0^1\int_0^1 Q_h^*(r,s)\,dB(r)\,dB(s)$ or $\int_0^1\int_0^1 Q_h^*(r,s)\,dB(r)\,d\tilde B(s)$, where $B(\cdot)$ and $\tilde B(\cdot)$ are independent standard Brownian motion processes. By direct calculations, we have, for any $\varsigma\in(0, 3/8)$:
$$P\left(\left|C_{ef}(i,j) - EC_{ef}(i,j)\right| > \mu_2^{\varsigma}\right) \le \frac{E\left|C_{ef}(i,j) - EC_{ef}(i,j)\right|^8}{\mu_2^{8\varsigma}} = O\left(\mu_2^{4-8\varsigma}\right) = o(\mu_2)$$
where $e, f = p$ or $q$. Define the event $\mathcal{E}$ as
$$\mathcal{E} = \left\{\omega: \left|C_{ef}(i,j) - EC_{ef}(i,j)\right| \le \mu_2^{\varsigma} \text{ for all } i, j \text{ and all } e \text{ and } f\right\};$$

then the complement $\mathcal{E}^c$ of $\mathcal{E}$ satisfies $P(\mathcal{E}^c) = o(\mu_2)$ as $h\to\infty$. Let $\tilde{\mathcal{E}}$ be another event; then
$$P(\tilde{\mathcal{E}}) = P(\tilde{\mathcal{E}}\cap\mathcal{E}) + P(\tilde{\mathcal{E}}\cap\mathcal{E}^c) = P(\tilde{\mathcal{E}}\cap\mathcal{E}) + o(\mu_2) = P(\tilde{\mathcal{E}}\,|\,\mathcal{E})P(\mathcal{E}) + o(\mu_2) = P(\tilde{\mathcal{E}}\,|\,\mathcal{E})\left(1 - o(\mu_2)\right) + o(\mu_2) = P(\tilde{\mathcal{E}}\,|\,\mathcal{E}) + o(\mu_2).$$
That is, up to an error of order $o(\mu_2)$, $P(\tilde{\mathcal{E}})$, $P(\tilde{\mathcal{E}}\cap\mathcal{E})$ and $P(\tilde{\mathcal{E}}\,|\,\mathcal{E})$ are equivalent. So for the purpose of proving the theorem, it is innocuous to condition on $\mathcal{E}$ or remove the conditioning, if needed.

Now, conditioning on $\mathcal{E}$, the numerator of $\eta^2$ satisfies
$$e_p'\left(I_p + C_{pq}C_{qq}^{-1}C_{qq}^{-1}C_{pq}'\right)e_p = 1 + \frac{1}{\mu_1^2}e_p'C_{pq}\left(I_q + \frac{C_{qq} - EC_{qq}}{\mu_1}\right)^{-1}\left(I_q + \frac{C_{qq} - EC_{qq}}{\mu_1}\right)^{-1}C_{pq}'e_p$$
$$= 1 + \frac{1}{\mu_1^2}e_p'C_{pq}\left(I_q - M_{qq}\right)\left(I_q - M_{qq}\right)C_{pq}'e_p = 1 + \frac{1}{\mu_1^2}e_p'C_{pq}\tilde M_{qq}C_{pq}'e_p$$
where $M_{qq}$ is a matrix with elements $M_{qq}(i,j)$ satisfying $|M_{qq}(i,j)| = O(\mu_2^{\varsigma})$ conditional on $\mathcal{E}$, and $\tilde M_{qq}$ is a matrix satisfying $\left|\tilde M_{qq}(i,j) - 1\{i=j\}\right| = O(\mu_2^{\varsigma})$ conditional on $\mathcal{E}$.

Let
$$C_{p+q,p+q} = \begin{pmatrix} C_{pp} & C_{pq} \\ C_{pq}' & C_{qq} \end{pmatrix}$$

and $e_{p+q,p}$ be the matrix consisting of the first $p$ columns of the identity matrix $I_{p+q}$. To evaluate the denominator of $\eta^2$, we note that
$$D_{pp}^{-1} = \frac{1}{\mu_1}e_{p+q,p}'\left(I_{p+q} + \frac{C_{p+q,p+q} - EC_{p+q,p+q}}{\mu_1}\right)^{-1}e_{p+q,p}$$
$$= \frac{1}{\mu_1}e_{p+q,p}'\left(I_{p+q} - \frac{C_{p+q,p+q} - EC_{p+q,p+q}}{\mu_1} + \frac{\left[C_{p+q,p+q} - EC_{p+q,p+q}\right]\left[C_{p+q,p+q} - EC_{p+q,p+q}\right]}{\mu_1^2}\right)e_{p+q,p} + M_{pp}$$
$$= \frac{1}{\mu_1}\left(I_p - \frac{C_{pp} - EC_{pp}}{\mu_1} + \frac{\left[C_{pp} - EC_{pp}\right]\left[C_{pp} - EC_{pp}\right] + C_{pq}C_{pq}'}{\mu_1^2}\right) + M_{pp},$$
where $M_{pp}$ is a matrix with elements $M_{pp}(i,j)$ satisfying $|M_{pp}(i,j)| = O\left(\mu_2^{3\varsigma}\right) = o(\mu_2)$ for $\varsigma > 1/3$, conditional on $\mathcal{E}$.

For the purpose of proving our result, $M_{pp}$ can be ignored, as its presence generates an approximation error of order $o(\mu_2)$, which is the same as the order of the approximation error given in the theorem. More specifically, let
$$\tilde C_{pp} = \frac{C_{pq}\tilde M_{qq}C_{pq}'}{\mu_1^2}, \qquad \tilde D_{pp}^* = I_p - \frac{C_{pp} - EC_{pp}}{\mu_1} + \frac{\left[C_{pp} - EC_{pp}\right]\left[C_{pp} - EC_{pp}\right] + C_{pq}C_{pq}'}{\mu_1^2}$$
and
$$\tilde\eta^2 := \tilde\eta^2\left(\tilde C_{pp}, \tilde D_{pp}^*\right) = \frac{\mu_1\left(1 + e_p'\tilde C_{pp}e_p\right)}{e_p'\left(I_p + \tilde C_{pp}\right)\tilde D_{pp}^*\left(I_p + \tilde C_{pp}\right)e_p};$$
then
$$P(pF_\infty < z) = EG_p\left(\eta^2 z\right) = EG_p\left(\tilde\eta^2 z\right) + o(\mu_2).$$

Note that for any $q\times q$ matrix $L_{qq}$, we have
$$EC_{pq}L_{qq}C_{pq}' = E\int_0^1\int_0^1 Q_h^*(r_1,s_1)Q_h^*(r_2,s_2)\,dB_p(r_1)\,dB_q(s_1)'L_{qq}\,dB_q(s_2)\,dB_p(r_2)'$$
$$= E\int_0^1\int_0^1 Q_h^*(r_1,s_1)Q_h^*(r_2,s_2)\,dB_p(r_1)\,dB_p(r_2)'\,\mathrm{tr}\left[dB_q(s_2)\,dB_q(s_1)'L_{qq}\right]$$
$$= \mathrm{tr}\left(L_{qq}\right)\int_0^1\int_0^1\left[Q_h^*(r,s)\right]^2 dr\,ds\,I_p = \mathrm{tr}\left(L_{qq}\right)\mu_2 I_p. \qquad (56)$$
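The identity (56) can be verified by simulation, replacing the double Wiener integrals with their discrete analogues built from Brownian increments on a grid; the kernel, bandwidth, sample sizes and test matrix below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, b, p, q, reps = 200, 0.1, 2, 2, 4000

# centered Bartlett kernel Q*_h on an n-point grid of [0,1]
r = (np.arange(n) + 0.5) / n
K = np.clip(1.0 - np.abs((r[:, None] - r[None, :]) / b), 0.0, None)
Q = K - K.mean(axis=1, keepdims=True) - K.mean(axis=0, keepdims=True) + K.mean()
mu2 = (Q ** 2).sum() / n ** 2

Lqq = np.array([[2.0, 0.3], [0.3, 1.0]])  # an arbitrary q x q matrix
acc = np.zeros((p, p))
for _ in range(reps):
    dBp = rng.standard_normal((p, n)) / np.sqrt(n)  # increments of B_p
    dBq = rng.standard_normal((q, n)) / np.sqrt(n)  # increments of an independent B_q
    Cpq = dBp @ Q @ dBq.T                           # discretized double Wiener integral
    acc += Cpq @ Lqq @ Cpq.T
emp = acc / reps
target = np.trace(Lqq) * mu2 * np.eye(p)            # tr(L_qq) * mu_2 * I_p, as in (56)
print(np.round(emp, 3), np.round(target, 3))
```

With the discrete construction, the expectation of `Cpq @ Lqq @ Cpq.T` equals the grid value of $\mathrm{tr}(L_{qq})\mu_2 I_p$ exactly, so only Monte Carlo noise separates `emp` from `target`.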

Taking an expansion of $\tilde\eta^2(\tilde C_{pp}, \tilde D_{pp}^*)$ around $\tilde C_{pp} = (q\mu_2/\mu_1^2)I_p$ and $\tilde D_{pp}^* = I_p$, we obtain
$$\tilde\eta^2 = \eta_0^2 + \mathrm{err}$$
where $\mathrm{err}$ is the approximation error and
$$\eta_0^2 = \mu_1 - \frac{2q\mu_2}{\mu_1} + e_p'\left[C_{pp} - EC_{pp}\right]e_p - \frac{e_p'\left[C_{pp} - EC_{pp}\right]\left[C_{pp} - EC_{pp}\right]e_p}{\mu_1} + \frac{\left(e_p'\left[C_{pp} - EC_{pp}\right]e_p\right)^2}{\mu_1}.$$
We keep enough terms in $\eta_0^2$ so that $EG_p(\tilde\eta^2 z) = EG_p(\eta_0^2 z) + o(\mu_2)$.


Now we write
$$P(pF_\infty < z) = EG_p\left(\eta_0^2 z\right) + o(\mu_2) = G_p(z) + G_p'(z)z\left(E\eta_0^2 - 1\right) + \frac{1}{2}G_p''(z)z^2 E\left(\eta_0^2 - 1\right)^2 + o(\mu_2).$$
In view of
$$E\eta_0^2 - 1 = (\mu_1 - 1) - \frac{2q\mu_2}{\mu_1} - \frac{1}{\mu_1}Ee_p'\left[C_{pp} - EC_{pp}\right]\left[C_{pp} - EC_{pp}\right]e_p + \frac{E\left(e_p'\left[C_{pp} - EC_{pp}\right]e_p\right)^2}{\mu_1}$$
$$= (\mu_1 - 1) - \frac{2q\mu_2}{\mu_1} - \frac{\mu_2}{\mu_1}(p+1) + \frac{2\mu_2}{\mu_1} = (\mu_1 - 1) - \frac{2q\mu_2}{\mu_1} - \frac{\mu_2}{\mu_1}(p-1)$$
and
$$E\left(\eta_0^2 - 1\right)^2 = E\left(\mu_1 - 1 - \frac{2q\mu_2}{\mu_1} + e_p'\left[C_{pp} - EC_{pp}\right]e_p\right)^2 + o(\mu_2) = E\left(e_p'\left[C_{pp} - EC_{pp}\right]e_p\right)^2 + (\mu_1 - 1)^2 + o(\mu_2) = 2\mu_2 + (\mu_1 - 1)^2 + o(\mu_2),$$
we have
$$P(pF_\infty < z) = G_p(z) + G_p'(z)z\left[(\mu_1 - 1) - \frac{2q\mu_2}{\mu_1} - \frac{\mu_2}{\mu_1}(p-1)\right] + G_p''(z)z^2\mu_2 + o(\mu_2)$$
$$= G_p(z) + G_p'(z)z\left[(\mu_1 - 1) - \frac{\mu_2}{\mu_1}(p+2q-1)\right] + G_p''(z)z^2\mu_2 + o(\mu_2). \qquad (57)$$

Proof of Theorem 6. We prove part (a) only, as the proof for part (b) is similar. Using the same arguments and notation as in the proof of Theorem 1, we have
$$R\left(G'W_\infty^{-1}G\right)^{-1}G'W_\infty^{-1}B_m(1) + \delta_0 \overset{d}{=} RVA^{-1}\left(I_d, -C_{12}C_{22}^{-1}\right)B_m(1) + \delta_0$$
and
$$R\left(G'W_\infty^{-1}G\right)^{-1}R' \overset{d}{=} RVA^{-1}C_{11}^{-1}\left(A'\right)^{-1}V'R'.$$
So
$$F_{\infty,\delta_0} \overset{d}{=} \left[RVA^{-1}\left(I_d, -C_{12}C_{22}^{-1}\right)B_m(1) + \delta_0\right]'\left[RVA^{-1}C_{11}^{-1}\left(A'\right)^{-1}V'R'\right]^{-1}\left[RVA^{-1}\left(I_d, -C_{12}C_{22}^{-1}\right)B_m(1) + \delta_0\right]/p.$$


Let $B_m(1) = \left[B_d(1)', B_q(1)'\right]'$ and let $RVA^{-1} = \tilde U\tilde\Lambda\tilde V'$ be an SVD of $RVA^{-1}$, where $\tilde\Lambda = \left(\tilde A, \tilde O\right)$. Then
$$F_{\infty,\delta_0} \overset{d}{=} \left\{\tilde U\tilde\Lambda\tilde V'\left[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\right] + \delta_0\right\}'\left[\tilde U\tilde\Lambda\tilde V'C_{11}^{-1}\tilde V\tilde\Lambda'\tilde U'\right]^{-1}\left\{\tilde U\tilde\Lambda\tilde V'\left[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\right] + \delta_0\right\}/p$$
$$= \left\{\tilde\Lambda\tilde V'\left[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\right] + \tilde U'\delta_0\right\}'\left[\tilde\Lambda\tilde V'C_{11}^{-1}\tilde V\tilde\Lambda'\right]^{-1}\left\{\tilde\Lambda\tilde V'\left[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\right] + \tilde U'\delta_0\right\}/p.$$
The above distribution does not depend on the orthonormal matrix $\tilde V$. So
$$F_{\infty,\delta_0} \overset{d}{=} \left\{\tilde\Lambda\left[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\right] + \tilde U'\delta_0\right\}'\left[\tilde\Lambda C_{11}^{-1}\tilde\Lambda'\right]^{-1}\left\{\tilde\Lambda\left[B_d(1) - C_{12}C_{22}^{-1}B_q(1)\right] + \tilde U'\delta_0\right\}/p$$
$$= \left\{\tilde A\left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\right] + \tilde U'\delta_0\right\}'\left[\tilde A D_{pp}\tilde A'\right]^{-1}\left\{\tilde A\left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1)\right] + \tilde U'\delta_0\right\}/p$$
$$= \left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1) + \tilde A^{-1}\tilde U'\delta_0\right]'D_{pp}^{-1}\left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1) + \tilde A^{-1}\tilde U'\delta_0\right]/p.$$

Let $H = \left(\frac{\tilde A^{-1}\tilde U'\delta_0}{\|\tilde A^{-1}\tilde U'\delta_0\|}, \tilde H\right)$ be a $p\times p$ orthonormal matrix; then
$$F_{\infty,\delta_0} \overset{d}{=} \left[H'B_p(1) - H'C_{pq}C_{qq}^{-1}B_q(1) + H'\tilde A^{-1}\tilde U'\delta_0\right]'H'D_{pp}^{-1}H\left[H'B_p(1) - H'C_{pq}C_{qq}^{-1}B_q(1) + H'\tilde A^{-1}\tilde U'\delta_0\right]/p$$
$$= \left[H'B_p(1) - H'C_{pq}C_{qq}^{-1}B_q(1) + e_p\|\tilde A^{-1}\tilde U'\delta_0\|\right]'H'D_{pp}^{-1}H\left[H'B_p(1) - H'C_{pq}C_{qq}^{-1}B_q(1) + e_p\|\tilde A^{-1}\tilde U'\delta_0\|\right]/p.$$
But the joint distribution of $\left(H'B_p(1), H'C_{pq}, H'D_{pp}^{-1}H\right)$ is invariant to $H$. Hence we can write
$$F_{\infty,\delta_0} \overset{d}{=} \left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1) + e_p\|\tilde A^{-1}\tilde U'\delta_0\|\right]'D_{pp}^{-1}\left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1) + e_p\|\tilde A^{-1}\tilde U'\delta_0\|\right]/p.$$
That is, the distribution of $F_{\infty,\delta_0}$ depends on $\delta_0$ only through $\|\tilde A^{-1}\tilde U'\delta_0\|$. Note that $\tilde U\tilde A\tilde A'\tilde U' = \tilde U\tilde\Lambda\tilde V'\left(\tilde U\tilde\Lambda\tilde V'\right)' = RVA^{-1}\left(RVA^{-1}\right)'$ and
$$\|\tilde A^{-1}\tilde U'\delta_0\|^2 = \delta_0'\tilde U\left(\tilde A^{-1}\right)'\tilde A^{-1}\tilde U'\delta_0 = \delta_0'\left[\tilde U\tilde A\tilde A'\tilde U'\right]^{-1}\delta_0 = \delta_0'\left[RVA^{-1}\left(A^{-1}\right)'V'R'\right]^{-1}\delta_0 = \delta_0'\left[R\left(VA'AV'\right)^{-1}R'\right]^{-1}\delta_0$$
$$= \delta_0'\left\{R\left[\left(\Lambda^{-1}G\right)'\Lambda^{-1}G\right]^{-1}R'\right\}^{-1}\delta_0 = \delta_0'\left\{R\left(G'\Omega^{-1}G\right)^{-1}R'\right\}^{-1}\delta_0 = \|V^{-1/2}\delta_0\|^2 = \|\delta^*\|^2,$$

we have
$$F_{\infty,\delta_0} \overset{d}{=} \left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1) + \delta^*\right]'D_{pp}^{-1}\left[B_p(1) - C_{pq}C_{qq}^{-1}B_q(1) + \delta^*\right]/p = F_\infty\left(\|\delta^*\|^2\right).$$

Proof of Proposition 7. We first prove (20). Using the singular value decomposition $G = U_G\Lambda_G V_G'$, we have
$$\tilde V = RV_G\left(\Lambda_G'W_{U,\infty}^{-1}\Lambda_G\right)^{-1}\left(\Lambda_G'W_{U,\infty}^{-1}\Omega_U W_{U,\infty}^{-1}\Lambda_G\right)\left(\Lambda_G'W_{U,\infty}^{-1}\Lambda_G\right)^{-1}V_G'R'.$$

Note that
$$\Lambda_G'W_{U,\infty}^{-1}\Lambda_G = A_G'\left(W_{U,\infty}^{11\cdot2}\right)^{-1}A_G$$
and
$$W_{U,\infty}^{-1}\Lambda_G = \begin{pmatrix} \left(W_{U,\infty}^{11\cdot2}\right)^{-1} \\ -\left(W_{U,\infty}^{22}\right)^{-1}W_{U,\infty}^{21}\left(W_{U,\infty}^{11\cdot2}\right)^{-1} \end{pmatrix}A_G,$$

we have
$$\tilde V = RV_GA_G^{-1}\left(W_{U,\infty}^{11\cdot2}\right)\begin{pmatrix} \left(W_{U,\infty}^{11\cdot2}\right)^{-1} \\ -\left(W_{U,\infty}^{22}\right)^{-1}W_{U,\infty}^{21}\left(W_{U,\infty}^{11\cdot2}\right)^{-1} \end{pmatrix}'\Omega_U\begin{pmatrix} \left(W_{U,\infty}^{11\cdot2}\right)^{-1} \\ -\left(W_{U,\infty}^{22}\right)^{-1}W_{U,\infty}^{21}\left(W_{U,\infty}^{11\cdot2}\right)^{-1} \end{pmatrix}\left(W_{U,\infty}^{11\cdot2}\right)\left(RV_GA_G^{-1}\right)'$$
$$= RV_GA_G^{-1}\begin{pmatrix} I_d \\ -B_{U,\infty} \end{pmatrix}'\begin{pmatrix} \Omega_U^{11} & \Omega_U^{12} \\ \Omega_U^{21} & \Omega_U^{22} \end{pmatrix}\begin{pmatrix} I_d \\ -B_{U,\infty} \end{pmatrix}\left(RV_GA_G^{-1}\right)'$$
$$= RV_GA_G^{-1}\left[\Omega_U^{11} - \Omega_U^{12}B_{U,\infty} - B_{U,\infty}'\Omega_U^{21} + B_{U,\infty}'\Omega_U^{22}B_{U,\infty}\right]\left(RV_GA_G^{-1}\right)',$$
where $B_{U,\infty} = \left(W_{U,\infty}^{22}\right)^{-1}W_{U,\infty}^{21}$.

In view of
$$V = RV_GA_G^{-1}\left[\Omega_U^{11} - \Omega_U^{12}\left(\Omega_U^{22}\right)^{-1}\Omega_U^{21}\right]\left(RV_GA_G^{-1}\right)',$$
we have
$$\tilde V - V = RV_GA_G^{-1}\left[\Omega_U^{12}\left(\Omega_U^{22}\right)^{-1}\Omega_U^{21} - \Omega_U^{12}B_{U,\infty} - B_{U,\infty}'\Omega_U^{21} + B_{U,\infty}'\Omega_U^{22}B_{U,\infty}\right]\left(RV_GA_G^{-1}\right)'$$
$$= RV_GA_G^{-1}\left[\left(\Omega_U^{22}\right)^{-1}\Omega_U^{21} - B_{U,\infty}\right]'\Omega_U^{22}\left[\left(\Omega_U^{22}\right)^{-1}\Omega_U^{21} - B_{U,\infty}\right]\left(RV_GA_G^{-1}\right)'.$$
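The last step rests on the factorization $\Omega_U^{12}(\Omega_U^{22})^{-1}\Omega_U^{21} - \Omega_U^{12}B - B'\Omega_U^{21} + B'\Omega_U^{22}B = [(\Omega_U^{22})^{-1}\Omega_U^{21} - B]'\Omega_U^{22}[(\Omega_U^{22})^{-1}\Omega_U^{21} - B]$, which holds for any symmetric $\Omega_U$ and conformable $B$; a quick numerical check with random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
d, q = 3, 2
M = rng.standard_normal((d + q, d + q))
Om = M @ M.T + (d + q) * np.eye(d + q)   # a random symmetric positive definite Omega_U
O11, O12 = Om[:d, :d], Om[:d, d:]
O21, O22 = Om[d:, :d], Om[d:, d:]
B = rng.standard_normal((q, d))          # plays the role of B_{U,infinity}

lhs = O12 @ np.linalg.inv(O22) @ O21 - O12 @ B - B.T @ O21 + B.T @ O22 @ B
res = np.linalg.inv(O22) @ O21 - B       # (Omega^22)^{-1} Omega^21 - B
rhs = res.T @ O22 @ res
print(np.allclose(lhs, rhs))
```

The identity uses only $\Omega_U^{12} = (\Omega_U^{21})'$, so it holds exactly up to floating-point rounding.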


Next, we prove the proposition, starting with
$$\|\delta^*\|^2 - \|\tilde\delta\|^2 = \delta_0'\tilde V^{-1/2}\left[\tilde V^{-1/2}V\tilde V^{-1/2}\right]^{-1/2}\left[I_p - \tilde V^{-1/2}V\tilde V^{-1/2}\right]\left[\tilde V^{-1/2}V\tilde V^{-1/2}\right]^{-1/2}\tilde V^{-1/2}\delta_0.$$
The above equality holds when $\tilde V$ is nonsingular. But
$$\tilde V^{-1/2}\left(\tilde V - V\right)\tilde V^{-1/2} = \tilde V^{-1/2}RV_GA_G^{-1}\left[\left(\Omega_U^{22}\right)^{-1}\Omega_U^{21} - B_{U,\infty}\right]'\Omega_U^{22}\left[\left(\Omega_U^{22}\right)^{-1}\Omega_U^{21} - B_{U,\infty}\right]\left(RV_GA_G^{-1}\right)'\tilde V^{-1/2}$$
$$= \left[\tilde V^{-1/2}RV_GA_G^{-1}\left(\Omega_U^{21} - \Omega_U^{22}B_{U,\infty}\right)'\left(\Omega_U^{22}\right)^{-1/2}\right]\left[\left(\Omega_U^{22}\right)^{-1/2}\left(\Omega_U^{21} - \Omega_U^{22}B_{U,\infty}\right)\left(RV_GA_G^{-1}\right)'\tilde V^{-1/2}\right] = \Delta_{12}\Delta_{12}'$$
and
$$\tilde V^{-1/2}V\tilde V^{-1/2} = I_p - \tilde V^{-1/2}\left(\tilde V - V\right)\tilde V^{-1/2} = I_p - \Delta_{12}\Delta_{12}';$$
hence
$$\|\delta^*\|^2 - \|\tilde\delta\|^2 = \delta_0'\tilde V^{-1/2}\left(I_p - \Delta_{12}\Delta_{12}'\right)^{-1/2}\Delta_{12}\Delta_{12}'\left(I_p - \Delta_{12}\Delta_{12}'\right)^{-1/2}\tilde V^{-1/2}\delta_0,$$
as desired.
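The chain of identities above can be verified numerically for generic positive definite $V$ and $\tilde V$ with $\tilde V - V$ positive semidefinite, writing $\|\delta^*\|^2 = \delta_0'V^{-1}\delta_0$ and $\|\tilde\delta\|^2 = \delta_0'\tilde V^{-1}\delta_0$ and using $\Delta_{12}\Delta_{12}' = \tilde V^{-1/2}(\tilde V - V)\tilde V^{-1/2}$; a sketch with symmetric matrix square roots:

```python
import numpy as np

def sqrtm_sym(A):
    # symmetric square root via eigendecomposition
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T

rng = np.random.default_rng(2)
p = 3
M = rng.standard_normal((p, p))
V = M @ M.T + np.eye(p)                         # positive definite V
N = rng.standard_normal((p, 2))
Vt = V + N @ N.T                                # Vtilde, with Vtilde - V psd
d0 = rng.standard_normal(p)                     # delta_0

lhs = d0 @ np.linalg.inv(V) @ d0 - d0 @ np.linalg.inv(Vt) @ d0
Vti = np.linalg.inv(sqrtm_sym(Vt))              # Vtilde^{-1/2}
DD = Vti @ (Vt - V) @ Vti                       # Delta_12 Delta_12'
Mid = np.linalg.inv(sqrtm_sym(np.eye(p) - DD))  # (I_p - DD)^{-1/2}
rhs = d0 @ Vti @ Mid @ DD @ Mid @ Vti @ d0
print(lhs, rhs)
```

Since $I_p - \Delta_{12}\Delta_{12}' = \tilde V^{-1/2}V\tilde V^{-1/2}$ is positive definite by construction, both sides are well defined and agree up to floating-point rounding.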

References

[1] Andersen, T. G. and Sorensen, B. E. (1996): "GMM Estimation of a Stochastic Volatility Model: A Monte Carlo Study." Journal of Business & Economic Statistics 14(3), 328-352.

[2] Andrews, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation." Econometrica 59, 817-858.

[3] Andrews, D. W. K. and J. C. Monahan (1992): "An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator." Econometrica 60(4), 953-966.

[4] Bester, A., Conley, T., Hansen, C., and Vogelsang, T. (2011): "Fixed-b Asymptotics for Spatially Dependent Robust Nonparametric Covariance Matrix Estimators." Working paper, Michigan State University.

[5] Bester, C. A., T. G. Conley, and C. B. Hansen (2011): "Inference with Dependent Data Using Cluster Covariance Estimators." Journal of Econometrics 165(2), 137-151.

[6] Chen, X., Z. Liao and Y. Sun (2014): "Sieve Inference on Possibly Misspecified Semi-nonparametric Time Series Models." Journal of Econometrics 178(3), 639-658.

[7] Gonçalves, S. (2011): "The Moving Blocks Bootstrap for Panel Linear Regression Models with Individual Fixed Effects." Econometric Theory 27(5), 1048-1082.

[8] Gonçalves, S. and T. Vogelsang (2011): "Block Bootstrap HAC Robust Tests: The Sophistication of the Naive Bootstrap." Econometric Theory 27(4), 745-791.

[9] Ghysels, E., A. C. Harvey and E. Renault (1996): "Stochastic Volatility." In G. S. Maddala and C. R. Rao (Eds.), Statistical Methods in Finance, 119-191.

[10] Han, C. and P. C. B. Phillips (2006): "GMM with Many Moment Conditions." Econometrica 74(1), 147-192.

[11] Hansen, L. P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators." Econometrica 50, 1029-1054.

[12] Hall, A. R. (2000): "Covariance Matrix Estimation and the Power of the Over-identifying Restrictions Test." Econometrica 68, 1517-1527.

[13] Hall, A. R. (2005): Generalized Method of Moments. Oxford University Press.

[14] Ibragimov, R. and U. K. Müller (2010): "t-Statistic Based Correlation and Heterogeneity Robust Inference." Journal of Business and Economic Statistics 28, 453-468.

[15] Jansson, M. (2004): "On the Error of Rejection Probability in Simple Autocorrelation Robust Tests." Econometrica 72, 937-946.

[16] Jacquier, E., N. G. Polson and P. E. Rossi (1994): "Bayesian Analysis of Stochastic Volatility Models." Journal of Business & Economic Statistics 12(4), 371-389.

[17] Kiefer, N. M. and T. J. Vogelsang (2002a): "Heteroskedasticity-Autocorrelation Robust Testing Using Bandwidth Equal to Sample Size." Econometric Theory 18, 1350-1366.

[18] Kiefer, N. M. and T. J. Vogelsang (2002b): "Heteroskedasticity-Autocorrelation Robust Standard Errors Using the Bartlett Kernel without Truncation." Econometrica 70, 2093-2095.

[19] Kiefer, N. M. and T. J. Vogelsang (2005): "A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests." Econometric Theory 21, 1130-1164.

[20] Kim, M. S. and Sun, Y. (2013): "Heteroskedasticity and Spatiotemporal Dependence Robust Inference for Linear Panel Models with Fixed Effects." Journal of Econometrics 177(1), 85-108.

[21] Müller, U. K. (2007): "A Theory of Robust Long-Run Variance Estimation." Journal of Econometrics 141, 1331-1352.

[22] Newey, W. K. and K. D. West (1987): "A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica 55, 703-708.

[23] Phillips, P. C. B. (2005): "HAC Estimation by Automated Regression." Econometric Theory 21, 116-142.

[24] Phillips, P. C. B., Y. Sun and S. Jin (2006): "Spectral Density Estimation and Robust Hypothesis Testing Using Steep Origin Kernels without Truncation." International Economic Review 47, 837-894.

[25] Phillips, P. C. B., Y. Sun and S. Jin (2007): "Consistent HAC Estimation and Robust Regression Testing Using Sharp Origin Kernels with No Truncation." Journal of Statistical Planning and Inference 137, 985-1023.

[26] Politis, D. (2011): "Higher-order Accurate, Positive Semi-definite Estimation of Large-sample Covariance and Spectral Density Matrices." Econometric Theory 27, 703-744.

[27] Press, J. (2005): Applied Multivariate Analysis, Second Edition. Dover Books on Mathematics.

[28] Shao, X. (2010): "A Self-normalized Approach to Confidence Interval Construction in Time Series." Journal of the Royal Statistical Society B 72, 343-366.

[29] Sun, Y. (2011): "Robust Trend Inference with Series Variance Estimator and Testing-optimal Smoothing Parameter." Journal of Econometrics 164(2), 345-366.

[30] Sun, Y. (2013): "A Heteroskedasticity and Autocorrelation Robust F Test Using Orthonormal Series Variance Estimator." Econometrics Journal 16, 1-26.

[31] Sun, Y. (2014): "Let's Fix It: Fixed-b Asymptotics versus Small-b Asymptotics in Heteroscedasticity and Autocorrelation Robust Inference." Journal of Econometrics 178(3), 659-677.

[32] Sun, Y. and D. Kaplan (2012): "Fixed-smoothing Asymptotics and Accurate F Approximation Using Vector Autoregressive Covariance Matrix Estimators." Working paper, UC San Diego.

[33] Sun, Y., P. C. B. Phillips and S. Jin (2008): "Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing." Econometrica 76(1), 175-194.

[34] Sun, Y., P. C. B. Phillips and S. Jin (2011): "Power Maximization and Size Control in Heteroscedasticity and Autocorrelation Robust Tests with Exponentiated Kernels." Econometric Theory 27(6), 1320-1368.

[35] Sun, Y. and P. C. B. Phillips (2009): "Bandwidth Choice for Interval Estimation in GMM Regression." Working paper, Department of Economics, UC San Diego.

[36] Sun, Y. and M. S. Kim (2012): "Simple and Powerful GMM Over-identification Tests with Accurate Size." Journal of Econometrics 166(2), 267-281.

[37] Sun, Y. and M. S. Kim (2014): "Asymptotic F Test in a GMM Framework with Cross Sectional Dependence." Review of Economics and Statistics, forthcoming.

[38] Taniguchi, M. and Puri, M. L. (1996): "Valid Edgeworth Expansions of M-estimators in Regression Models with Weakly Dependent Residuals." Econometric Theory 12, 331-346.

[39] Vogelsang, T. (2012): "Heteroskedasticity, Autocorrelation, and Spatial Correlation Robust Inference in Linear Panel Models with Fixed-effects." Journal of Econometrics 166, 303-319.

[40] Xiao, Z. and O. Linton (2002): "A Nonparametric Prewhitened Covariance Estimator." Journal of Time Series Analysis 23, 215-250.

[41] Zhang, X. and X. Shao (2013): "Fixed-Smoothing Asymptotics for Time Series." Annals of Statistics 41(3), 1329-1349.
